Version 2.50 introduces char length semantics (CLS) support to simplify
the development of UTF-8 applications. However, UTF-8 can also be used
with byte length semantics (BLS) in existing applications, especially
if the database uses BLS. If your application is based on UTF-8,
specify BLS or CLS usage with the FGL_LENGTH_SEMANTICS environment
variable, according to the length semantics used in your database.
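For example, to run a program with char length semantics matching a
CLS database (a minimal sketch, assuming a Bourne-style shell; myprog
is an illustrative program name):
export FGL_LENGTH_SEMANTICS=CHAR
fglrun myprog
Set the variable to BYTE instead when the database uses byte length
semantics.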
When using byte length semantics, the database drivers can truncate
multibyte (UTF-8) string data fetched from the database, to fit a
program variable whose size corresponds to the type of the database
column. This behavior is new compared to older versions. For example,
when fetching an NCHAR(1) Chinese character from SQL Server, 3 bytes
are required to store that character in a UTF-8 program variable.
However, by convention, the program variable is defined as a
CHAR(1 (bytes)), to limit the total number of ASCII characters that
can be stored in this variable, and the database drivers adapt the
intermediate fetch buffer to the size of that corresponding variable.
As a result, because the Chinese character does not fit in a
CHAR(1 (bytes)) buffer, you may not get the expected result, even when
fetching into a program variable that is big enough to hold the
Chinese UTF-8 character, such as a CHAR(20 (bytes)). This typically
happens when fetching a SUBSTR[ING] expression into a large character
variable:
DEFINE vc VARCHAR(20)
...
-- The substring function takes positions in character units
SELECT SUBSTRING(col, 5, 1) INTO vc
FROM mytable WHERE ...
DISPLAY "[", vc, "]"
-- Displays [ ] when the db character occupies more than 1 byte
Character data truncation can also occur with the LOAD and UNLOAD
instructions when byte length semantics is used.
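For example, multibyte characters written by UNLOAD and re-read with
LOAD can be lost when the receiving CHAR/VARCHAR sizes are counted in
bytes (a minimal sketch; mytable and the file name are illustrative):
-- Write the rows to a text file
UNLOAD TO "mytable.unl" SELECT * FROM mytable
-- Re-read the file; under BLS, values wider than the byte size
-- of the target columns can be truncated
LOAD FROM "mytable.unl" INSERT INTO mytable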
To avoid this truncation issue, use character length semantics in your
programs when the database uses char length semantics.
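With char length semantics, variable sizes are expressed in
characters, so the fetch from the earlier example returns the expected
data (a minimal sketch, reusing the hypothetical mytable and col):
-- Run with FGL_LENGTH_SEMANTICS=CHAR:
-- VARCHAR(20) now holds 20 characters, not 20 bytes
DEFINE vc VARCHAR(20)
SELECT SUBSTRING(col, 5, 1) INTO vc
 FROM mytable WHERE ...
DISPLAY "[", vc, "]"
-- Displays the multibyte character between the brackets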