Version 2.50 introduces char length semantics (CLS) support to simplify
the development of UTF-8 applications. However, UTF-8 can also be used
with byte length semantics (BLS) in existing applications, especially
if the database uses BLS. If your application is based on UTF-8,
specify BLS or CLS usage with the FGL_LENGTH_SEMANTICS environment
variable, according to the length semantics used in your database.
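For example, to run a program with char length semantics matching a
CLS database (a minimal sketch, assuming a Bourne-style shell; myprog
is an illustrative program name):
export FGL_LENGTH_SEMANTICS=CHAR
fglrun myprog
Set the variable to BYTE instead when the database uses byte length
semantics.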
When using byte length semantics, the database drivers can truncate
multibyte (UTF-8) string data fetched from the database, to fit a
program variable whose size corresponds to the type of the database
column. This behavior is new compared to older versions. For example,
when fetching an NCHAR(1) Chinese character from SQL Server, 3 bytes
are required to store that character in a UTF-8 program variable.
However, by convention, the program variable is defined as a
CHAR(1 (bytes)), to limit the total number of ASCII characters that
can be stored in this variable, and the database drivers adapt the
intermediate fetch buffer to the size of that corresponding variable.
As a result, because the Chinese character does not fit in a
CHAR(1 (bytes)) buffer, you may not get the expected result, even when
fetching into a program variable that is big enough to hold the
Chinese UTF-8 character, such as a CHAR(20 (bytes)). This typically
happens when fetching a SUBSTR[ING] expression into a large character
variable:
DEFINE vc VARCHAR(20)
...
-- The substring function takes positions in character units
SELECT SUBSTRING(col, 5, 1) INTO vc
FROM mytable WHERE ...
DISPLAY "[", vc, "]"
-- Displays [ ] when the db character occupies more than 1 byte
Character data truncation can also occur with the LOAD and UNLOAD
instructions when byte length semantics is used.
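For example, multibyte characters written by UNLOAD and re-read with
LOAD can be lost when the receiving CHAR/VARCHAR sizes are counted in
bytes (a minimal sketch; mytable and the file name are illustrative):
-- Write the rows to a text file
UNLOAD TO "mytable.unl" SELECT * FROM mytable
-- Re-read the file; under BLS, values wider than the byte size
-- of the target columns can be truncated
LOAD FROM "mytable.unl" INSERT INTO mytable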
To avoid this truncation issue, use character length semantics in your
programs when the database uses char length semantics.
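With char length semantics, variable sizes are expressed in
characters, so the fetch from the earlier example returns the expected
data (a minimal sketch, reusing the hypothetical mytable and col):
-- Run with FGL_LENGTH_SEMANTICS=CHAR:
-- VARCHAR(20) now holds 20 characters, not 20 bytes
DEFINE vc VARCHAR(20)
SELECT SUBSTRING(col, 5, 1) INTO vc
 FROM mytable WHERE ...
DISPLAY "[", vc, "]"
-- Displays the multibyte character between the brackets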