How the Unicode Standard Works
Instead of a 7-bit binary code, UTF-8 (the most widely used encoding of the Unicode standard) uses 8-bit “code units” and combines up to four code units per character. This extends the number of characters that can be encoded from 128 to 1,112,064, enough capacity for all the major world languages to be encoded in a single character set.
There is even a range of codes for Klingon, although it sits in the Private Use Area under the unofficial ConScript Unicode Registry and has not been officially endorsed by the Unicode Consortium.
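The 1,112,064 figure can be checked with a little arithmetic. The sketch below assumes the usual definition: every code point from U+0000 to U+10FFFF, minus the surrogate range U+D800 to U+DFFF, which is reserved for UTF-16 and never encoded as a character. The function name is illustrative only.

```python
# Unicode code points run from U+0000 to U+10FFFF inclusive.
TOTAL_CODE_POINTS = 0x10FFFF + 1       # 1,114,112

# U+D800..U+DFFF are surrogates: reserved for UTF-16, never characters.
SURROGATES = 0xDFFF - 0xD800 + 1       # 2,048


def encodable_code_points() -> int:
    """Number of code points that can actually be encoded as characters."""
    return TOTAL_CODE_POINTS - SURROGATES


print(f"{encodable_code_points():,}")  # prints 1,112,064
```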
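Cleverly, rather than using four code units to transmit every character, UTF-8 only uses as many as are necessary. For example, a capital “A” (code point U+0041) could be padded out to four code units as [00000000 00000000 00000000 01000001], but instead it is expressed as the single code unit [01000001] in order to save space. If the full four code units were used for every character, a 140-byte text message would hold only 35 characters. In practice, text messages that contain Unicode characters are sent with 16-bit code units, which reduces the number of characters allowed from 160 to 70.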
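As a rough illustration of the variable-length idea, the sketch below uses Python's built-in UTF-8 encoder to show how many 8-bit code units different characters need. It also works out how many characters fit in a text message at different character widths, assuming the standard 140-byte SMS payload. The helper names utf8_bits and sms_capacity are made up for this example.

```python
SMS_PAYLOAD_BYTES = 140  # assumed: standard single-SMS payload size


def utf8_bits(char: str) -> list[str]:
    """Return the UTF-8 code units of one character as 8-bit binary strings."""
    return [format(byte, "08b") for byte in char.encode("utf-8")]


def sms_capacity(bits_per_char: int) -> int:
    """Characters that fit in one SMS if every character uses a fixed width."""
    return (SMS_PAYLOAD_BYTES * 8) // bits_per_char


for ch in ["A", "é", "€", "😀"]:
    units = utf8_bits(ch)
    print(f"U+{ord(ch):04X}: {len(units)} code unit(s): {' '.join(units)}")

print(sms_capacity(7))   # 7-bit alphabet: 160
print(sms_capacity(16))  # 16-bit code units: 70
print(sms_capacity(32))  # four 8-bit code units per character: 35
```

A capital “A” comes out as the single code unit 01000001, while the emoji needs all four.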