When programming an application for a Latin-based language such as English, a single-byte character set can be used, and the logical size, storage size and print width of a character are all the same. For example, in ISO-8859-1, the ê character takes one logical position, has a storage size of one byte and a print width of one.
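When programming an international application using multiple languages and a multibyte character set encoding, you must distinguish three size units: the logical size (a number of characters), the storage size (a number of bytes) and the print width (a number of display columns).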
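The three units named above (logical size, storage size, print width) can be illustrated with a minimal Python sketch. The helper name `size_units` is hypothetical, and the print width is only an approximation that assumes wide and fullwidth East Asian characters occupy two columns and combining marks occupy none; real terminals and fonts may differ.

```python
import unicodedata


def size_units(text: str, encoding: str = "utf-8") -> tuple[int, int, int]:
    """Return (logical size, storage size, approximate print width)."""
    # Logical size: number of characters (Unicode code points).
    logical = len(text)
    # Storage size: number of bytes once encoded in the given character set.
    storage = len(text.encode(encoding))
    # Print width (approximation): 0 for combining marks,
    # 2 for wide/fullwidth East Asian characters, 1 otherwise.
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return logical, storage, width


# U+00EA is the precomposed "ê" character.
print(size_units("\u00ea", "iso-8859-1"))  # single-byte character set
print(size_units("\u00ea", "utf-8"))       # multibyte character set
print(size_units("\u65e5\u672c", "utf-8")) # two wide CJK characters
```

In ISO-8859-1 the three values for ê coincide, while in UTF-8 the storage size grows to two bytes and, for the CJK sample, all three values differ.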
Working with byte units in a multibyte character set can be difficult: you need to calculate sizes, lengths and substring offsets in bytes, although the natural way is to count in characters.
Length semantics define the unit used for character data type definitions, character string lengths and positions.
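As a rough illustration of why byte offsets are awkward, the following Python sketch (assuming UTF-8; the helper `char_to_byte_offset` is a hypothetical name) shows that a substring cut at a byte offset can split a multibyte character, and that a character position has to be converted before it can be used as a byte position.

```python
def char_to_byte_offset(text: str, char_index: int, encoding: str = "utf-8") -> int:
    """Convert a character position into the matching byte position."""
    # Encode only the prefix that precedes the position and count its bytes.
    return len(text[:char_index].encode(encoding))


text = "na\u00efve"            # "naïve": 5 characters
data = text.encode("utf-8")    # 6 bytes, because "ï" takes two bytes

# Counting in characters: the first three characters are "naï".
first_three_chars = text[:3]

# Counting in bytes: the first three bytes end in the middle of "ï",
# so the result is not valid UTF-8 on its own.
first_three_bytes = data[:3]
try:
    first_three_bytes.decode("utf-8")
    split_detected = False
except UnicodeDecodeError:
    split_detected = True

# The byte offset that corresponds to character position 3.
byte_offset = char_to_byte_offset(text, 3)
```

Here the third character ends at byte offset 4, not 3, which is the kind of bookkeeping that counting in characters avoids.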
With Byte Length Semantics, a length is expressed in bytes, while Character Length Semantics counts in characters.
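The difference between the two semantics can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `fits` and the `"byte"` / `"char"` labels are hypothetical names, not part of any product API, and UTF-8 is assumed as the encoding.

```python
def fits(value: str, declared_length: int, semantics: str,
         encoding: str = "utf-8") -> bool:
    """Check whether a value fits in a character type of the declared length."""
    if semantics == "byte":
        # Byte length semantics: the declared length is a number of bytes.
        return len(value.encode(encoding)) <= declared_length
    if semantics == "char":
        # Character length semantics: the declared length is a number of characters.
        return len(value) <= declared_length
    raise ValueError(f"unknown length semantics: {semantics!r}")


word = "\u00e9t\u00e9"  # "été": 3 characters, 5 bytes in UTF-8

print(fits(word, 3, "char"))  # length 3 counted in characters
print(fits(word, 3, "byte"))  # length 3 counted in bytes
print(fits(word, 5, "byte"))  # length 5 counted in bytes
```

With character length semantics a length of 3 is enough for this word, whereas with byte length semantics the same word needs a length of at least 5 in UTF-8.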