ASCII Code, Extended ASCII characters (8-bit system) and ANSI Code
Related Links:
1. VBA Chr & Asc functions explained; corresponding Excel CHAR and CODE functions.
2. Excel Text and String Functions: TRIM & CLEAN.
A ‘character set’ maps characters to their identifying code values. Unicode is a global standard for character encoding and is the most commonly used character set today. This section gives a brief overview of the 7-bit ASCII code, Extended ASCII characters (8-bit), ANSI coding and Unicode.
ASCII Code: Computers use the binary system internally for processing, i.e. they work with binary codes of 0s and 1s (zeroes and ones). The computer converts letters and characters into numbers, and then stores those numbers in binary. One of the earliest and most widely used systems for encoding letters and characters as numbers is the ASCII code (American Standard Code for Information Interchange). It is a set of 128 characters, each given its own number reference. As with most things computer-related, the characters are counted from zero (not one), so they cover a numerical range of 0-127. The first 32 (character codes 0-31) are control codes (used to control peripherals such as printers) and spacing characters such as tab and line feed, and are unprintable (refer Table 1). Character 127 represents the command DEL and is also a control code. The remaining 95 (character codes 32-126) are printable characters (the space, the digits 0-9, the uppercase and lowercase English letters, punctuation marks, and symbols) which you will find on your keyboard (refer Table 2). For example, the ASCII code for the capital letter “A” is the number 65, which is representable in binary using 0s and 1s (65 converts to the binary number 1000001).
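The character-to-number-to-binary conversion described above can be tried out with a short Python sketch. Python is used here only for illustration (the article itself is about Excel and VBA), and the variable names are made up for this example.

```python
# Look up the ASCII code of a few characters and show it as a 7-bit binary number.
for ch in ("A", "a", "0", " "):
    code = ord(ch)                    # character -> code number
    bits = format(code, "07b")        # code number -> 7 binary digits
    print(repr(ch), code, bits)

# Split the 128 ASCII codes into control codes and printable characters.
control_codes = [c for c in range(128) if c < 32 or c == 127]
printable_codes = [c for c in range(128) if 32 <= c <= 126]
print(len(control_codes), "control codes,", len(printable_codes), "printable characters")
```

The first line printed is 'A' 65 1000001, matching the example in the text, and chr(65) converts the number back to the letter.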
ASCII to Extended ASCII characters (8-bit system) and ANSI Code: In the 1960s, a need for standardization led to ASCII, which is a 7-bit system. A bit is a binary digit which can have either of two values, on or off. Seven bits can have 2^7 or 128 possible unique values, so 128 numbers (0-127 in decimal notation) are available to code characters. Later, however, almost everything was done in an 8-bit system, and ASCII was extended to 8 bits, giving 256 code points, 0-255 (8 bits correspond to 2^8, i.e. 256 possibilities). There are many variants of Extended ASCII characters (8-bit system) to cover regional characters and symbols. One example includes the various letters needed for writing the languages of Western Europe and certain special characters. This encoding is called ISO Latin-1 or ISO 8859-1 (ISO refers to the International Organization for Standardization), and it was long the default character set in most browsers. The ISO 8859-1 character set includes the original ASCII character set (values 0 to 127), plus an extended character set (codes from 160-255) which contains the characters used in Western European countries and some commonly used special characters. Many Windows systems use another related 8-bit encoding, Windows-1252, and this Microsoft-specific encoding is commonly referred to as ANSI. It is similar to ISO 8859-1 except that character codes 128-159 in ISO 8859-1 are reserved for controls whereas Windows-1252 uses most of them for printable characters. ANSI stands for American National Standards Institute, although Windows-1252 itself was never issued as an ANSI standard. The ANSI character set includes the standard ASCII character set (values 0 to 127), plus an extended character set (values 128 to 255; refer Table 3).
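The difference between ISO 8859-1 and Windows-1252 in the 128-159 range can be seen by decoding the same bytes both ways. The following is a minimal Python sketch; "latin-1" and "cp1252" are Python's codec names for the two encodings, and the sample bytes are chosen just for this illustration.

```python
# Number of code points available with 7 bits and with 8 bits.
seven_bit_values = 2 ** 7
eight_bit_values = 2 ** 8
print(seven_bit_values, eight_bit_values)

# Three sample bytes: 65 (ASCII "A"), 128 (in the 128-159 range), 233 (in the 160-255 range).
raw = bytes([65, 128, 233])

as_cp1252 = raw.decode("cp1252")    # Windows-1252, the encoding commonly called ANSI
as_latin1 = raw.decode("latin-1")   # ISO 8859-1

print(repr(as_cp1252))
print(repr(as_latin1))
```

Codes 65 and 233 give "A" and "é" in both encodings. Code 128 is the euro sign in Windows-1252 but an unprintable control code in ISO 8859-1.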
Extended ASCII characters (8-bit) and UNICODE: In addition to the ISO 8859-1 (Latin-1, West European languages) encoding, the ISO 8859 standard includes several other 8-bit extensions to the ASCII character set, viz. ISO 8859-2 (Latin-2, Central and East European languages); ISO 8859-3 (Latin-3, South European and miscellaneous languages); ISO 8859-4 (Latin-4, North European, i.e. Scandinavian/Baltic languages); ISO 8859-5 (Latin/Cyrillic); and so on. In these 8-bit extensions, the lower 128 characters (0 to 127) are the same ASCII characters, while the upper 128 characters (128 to 255) are for the appropriate language and symbols. However, the significant drawback of the 256-character limit remained, because languages such as Chinese and Japanese have thousands of characters. There was also a problem of incompatibility: if a user saves a file as Latin-1 while a co-user opens it on a Latin-7 system, the extended characters (128 to 255) might display wrongly. This led to Unicode, a language-independent code, the first version being a 16-bit encoding. Unicode data representations include UTF-8 (the popular encoding used on the web), UTF-16 (used by Java and Windows) and UTF-32 (UTF-8 and UTF-32 are used by Linux and various Unix systems). ASCII was the most commonly used character encoding on the World Wide Web until December 2007, when it was surpassed by UTF-8. UTF-8 is backward compatible with 7-bit ASCII: the 128 ASCII characters were incorporated into Unicode with the same numeric codes, and UTF-8 stores each of them as the same single byte.
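Both the incompatibility problem and the ASCII compatibility of UTF-8 can be illustrated with a short Python sketch. It assumes that "Latin-7" means ISO 8859-13 (Python codec "iso8859_13"), and the byte value 232 is an arbitrary pick from the 128-255 range.

```python
# One byte from the extended range, read under two different 8-bit encodings.
raw = bytes([232])
as_latin1 = raw.decode("latin-1")      # ISO 8859-1 (Latin-1)
as_latin7 = raw.decode("iso8859_13")   # ISO 8859-13 (assumed here to be "Latin-7")
print(repr(as_latin1), repr(as_latin7))

# The same text stored in the three Unicode encoding forms.
text = "Aé€"
byte_counts = {}
for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    # Record how many bytes each character needs in this encoding.
    byte_counts[encoding] = [len(ch.encode(encoding)) for ch in text]
print(byte_counts)

# An ASCII character is stored as the same single byte in ASCII and in UTF-8.
same_byte = "A".encode("ascii") == "A".encode("utf-8")
print(same_byte)
```

The same byte decodes to two different letters, which is the display problem described above. In UTF-8 the three characters take 1, 2 and 3 bytes respectively, while UTF-16 uses 2 bytes for each of them and UTF-32 uses 4.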
The Windows SDK provides function prototypes in generic, Windows code page (ANSI) and Unicode versions: Windows API functions that manipulate characters are available in 3 forms: (i) a generic version, which can be compiled for either Windows code pages (ANSI) or Unicode; (ii) a Windows code page version, with the letter “A” added at the end of the generic function name to indicate “ANSI”; (iii) a Unicode version, with the letter “W” added at the end of the generic function name to indicate “wide”. Some newer functions are provided only in Unicode versions.
Windows applications normally use UTF-16 to represent Unicode character data. A single 16-bit unit allows direct representation of 65,536 (2^16) unique characters. Unicode, however, has the potential to define 1,114,112 characters (2^16 * 17) within the code point range U+0000 to U+10FFFF, and UTF-16 can support all of them by encoding each code point above U+FFFF as a pair of 16-bit units (a surrogate pair).
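The arithmetic above, and the way UTF-16 reaches beyond 65,536 characters, can be checked with a small Python sketch. The helper name utf16_units is made up for this illustration, and U+1F600 is just one example of a code point above U+FFFF.

```python
# Total number of Unicode code points: 17 blocks ("planes") of 2^16 each.
total_code_points = 2 ** 16 * 17
print(total_code_points, total_code_points == 0x10FFFF + 1)


def utf16_units(ch):
    """Return the 16-bit units UTF-16 uses to store one character (illustrative helper)."""
    data = ch.encode("utf-16-be")  # big-endian, without a byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# "A" (U+0041) fits in one 16-bit unit.
units_a = utf16_units("A")
# U+1F600 lies above U+FFFF, so UTF-16 stores it as two 16-bit units.
units_high = utf16_units("\U0001F600")

print([hex(u) for u in units_a])
print([hex(u) for u in units_high])
```

The second character comes out as the two units 0xd83d and 0xde00, while "A" needs only the single unit 0x41.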
Excel uses the ANSI code system on Windows computers: Excel uses the standard ANSI character set on Windows systems. The Windows operating environment uses the ANSI character set, whereas Macintosh uses the Macintosh character set; hence the characters returned for the same numerical values by the Excel CHAR function may vary across operating environments. Alongside the Excel VBA Chr function there is the VBA ChrW function, which returns the Unicode character for the specified character code. Likewise, alongside the Excel VBA Asc function there is the AscW function, which returns the Unicode character code for the specified character.
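Why the same code number can give different characters on Windows and Macintosh can be imitated outside Excel. The Python sketch below is only an approximation: char_like is a hypothetical stand-in for the CHAR function, and it assumes that the ANSI character set is Windows-1252 (Python codec "cp1252") and that the Macintosh character set is Mac OS Roman (Python codec "mac_roman"). It does not call Excel or VBA.

```python
def char_like(code, platform="windows"):
    """Hypothetical stand-in for Excel's CHAR: decode one byte with the platform's 8-bit character set."""
    codecs_by_platform = {"windows": "cp1252", "mac": "mac_roman"}  # assumed character sets
    return bytes([code]).decode(codecs_by_platform[platform])


# Codes 0-127 are plain ASCII and agree on both platforms.
print(char_like(65, "windows"), char_like(65, "mac"))

# Codes above 127 depend on the platform's character set.
print(char_like(128, "windows"), char_like(128, "mac"))

# Rough Python counterparts of ChrW and AscW, which work with Unicode code numbers.
euro = chr(8364)           # comparable to ChrW(8364)
euro_code = ord("€")       # comparable to AscW("€")
# Comparable to Asc("€") on a Windows-1252 system: the 8-bit code of the same character.
euro_ansi_code = "€".encode("cp1252")[0]
print(euro, euro_code, euro_ansi_code)
```

Under these assumptions, code 128 gives the euro sign on Windows but "Ä" on Macintosh, while the euro sign has the Unicode code 8364 regardless of platform.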