Unicode

Introduction
ASCII
Back in the old days of punched cards, a 7-bit character coding system was used. 7 bits gave 2^7 = 128 possible combinations, enough for 26×2 letters, 10 digits, about 15 punctuation characters, and 20 or so symbols. Finally, 33 of the codes were used as control characters, e.g. line feed, tab, bell etc. The DEL (delete) code was taken as the last character, number 127, that is 1111111 in binary. This meant all 7 positions representing the character on the card were punched out, so any mistake could be deleted by punching out every hole over it.
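The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are made up for the example, and the split into "control" and "printable" codes follows the usual convention (codes 0 to 31 plus 127 are control characters), which is an assumption rather than something spelled out in the text.

```python
# 7 bits give 2**7 = 128 possible codes, numbered 0..127.
TOTAL_CODES = 2 ** 7

# Control characters by the usual convention: codes 0..31 plus DEL (127).
control_codes = [c for c in range(TOTAL_CODES) if c < 32 or c == 127]

# Everything else is printable: space, digits, letters, punctuation, symbols.
printable_codes = [c for c in range(TOTAL_CODES) if c not in control_codes]

# DEL is the last code; written as 7 binary digits it is all ones,
# i.e. every position punched out.
DEL = 127
del_bits = format(DEL, "07b")

print(TOTAL_CODES)                                # 128
print(len(control_codes), len(printable_codes))   # 33 95
print(del_bits)                                   # 1111111
```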
Extended ASCII
An eighth bit doubles the range to 2^8 = 256 characters, but even these were really not quite enough to cover every language. So the first 128 characters are usually the same as ASCII, but the last 128 depend on which language you are using. The Latin-1 set is for Western Europe, Latin-2 for Central and Eastern Europe, Latin-3 covers additional languages (e.g. Catalan, Turkish) and Latin-4 covers further ones (e.g. Estonian, Lappish). Other sets exist for Russian etc. An altogether different set also in common use is the symbol set, basically for use in mathematics, containing Greek letters and mathematical operators.
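The effect of these competing 8-bit sets can be seen by decoding the same byte under different code pages. The sketch below assumes that Python's built-in codecs `latin-1`, `iso8859-2`, `iso8859-3` and `iso8859-4` correspond to the Latin-1 to Latin-4 sets named above; the byte 0xA3 is just one convenient example.

```python
# Python codec names assumed to match Latin-1 .. Latin-4 (ISO 8859 parts 1-4).
CODECS = ["latin-1", "iso8859-2", "iso8859-3", "iso8859-4"]

# The lower half (codes 0..127) decodes to the same text under every codec.
lower_half = bytes(range(128))
lower_decodings = {lower_half.decode(name) for name in CODECS}

# A single byte from the upper half means different things per codec.
byte = bytes([0xA3])
as_latin1 = byte.decode("latin-1")     # pound sign
as_latin2 = byte.decode("iso8859-2")   # capital L with stroke

print(len(lower_decodings))            # 1: all four agree on the lower half
print(f"U+{ord(as_latin1):04X}", f"U+{ord(as_latin2):04X}")  # U+00A3 U+0141
```

The same stored byte therefore reads as a different character depending on which set the reader assumes, which is the practical problem with these extensions.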
Unicode Standard
Unicode provides a consistent way of encoding multilingual plain text and brings order to the chaotic state of affairs outlined above. The Unicode Standard provides the capacity to uniquely encode all of the characters used for the written languages of the world. It was originally designed as a 16 bit (2 byte) encoding allowing for over 65,000 characters; later versions extended the code space to U+10FFFF, which allows for over 1.1 million code points. Each character is assigned a unique code point and a unique name that specifies it and no other. For example, U+0041 is assigned the character name "LATIN CAPITAL LETTER A". The standard defines rules for the working of composite characters (characters generated by combining others, e.g. à). Many such characters also exist in their own right (as for à).
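Character names and the two spellings of à can be inspected with Python's standard `unicodedata` module. This is a small sketch rather than a full treatment: the normalization forms NFC and NFD are not discussed in the text and are used here only as one standard way to convert between the combined and precomposed spellings.

```python
import unicodedata

# Every character has a unique name tied to its code point.
name_a = unicodedata.name("\u0041")
print(name_a)  # LATIN CAPITAL LETTER A

# Two ways to write "à": one precomposed character, or "a" followed by
# a combining grave accent (U+0300).
precomposed = "\u00e0"
combined = "a\u0300"

# As raw code point sequences they differ, even though they display alike.
print(precomposed == combined)               # False
print(len(precomposed), len(combined))       # 1 2

# Normalization converts between the two spellings.
nfc = unicodedata.normalize("NFC", combined)      # compose
nfd = unicodedata.normalize("NFD", precomposed)   # decompose
print(nfc == precomposed, nfd == combined)   # True True
```

Comparing text only after normalizing both sides to the same form avoids treating the two spellings as different strings.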
UCS - Universal (Multiple-Octet Coded) Character Set
Useful Links