CHARACTER data types

Informix^® supports following character data types:

CHAR(N) with N<= 32767 bytes
VARCHAR(N[,M]) with N<=255 bytes
NCHAR(N) with N<= 32767 bytes
NVARCHAR(N[,M]) with N<=255 bytes
LVARCHAR(N), without the 255 bytes limit (max size varies according to IDS version)

In Informix, both CHAR/VARCHAR and NCHAR/NVARCHAR data types can be used to store single-byte or multibyte encoded character strings. The only difference between CHAR/VARCHAR and NCHAR/NVARCHAR is for sorting: N[VAR]CHAR types use the collation order, while [VAR]CHAR types use the byte order. The character set used to store strings in CHAR/VARCHAR/NCHAR/NVARCHAR columns is defined by the DB_LOCALE environment variable. The character set used by applications is defined by the CLIENT_LOCALE environment variable. Informix uses Byte Length Semantics (the size N that you specify in [VAR]CHAR(N) is expressed in bytes, not characters as in some other databases)

SQL Server provides the following data types to store character data:

CHAR(N) with N<= 8000 bytes (single or multibyte charset)
VARCHAR(N) with N<= 8000 bytes (single or multibyte charset)
VARCHAR(MAX) with a limit of 2^31-1 bytes (single or multibyte charset)
NCHAR(N) with N<= 4000 (Unicode/UCS-2) characters
NVARCHAR(N) with N<= 4000 (Unicode/UCS-2) characters
NVARCHAR(MAX) with a limit of 2^31-1 bytes (Unicode/UCS-2)

To store large text data (LOBs), Microsoft™ SQL Server version 2005 introduced the VARCHAR(MAX) type as a replacement for the old TEXT type.

The use of NCHAR, NVARCHAR character types is the same as CHAR, VARCHAR, TEXT respectively, except:

The encoding is UCS-2 (an UTF-16 subset).
The length N in N[VAR]CHAR(N) defines a number of characters, not bytes.
Since each character occupies 2 bytes, twice the space is needed to store the same strings as with CHAR/VARCHAR.
The maximum size of NCHAR and NVARCHAR column is 4000 characters, compared to 8000 chars for CHAR/VARCHAR using a single-byte character set.
Unicode string literals are specified with a leading N. For example: N'日本語'
The LIKE statement behaves differently with CHAR and NCHAR columns when using the N prefix before the search pattern.

Note that SQL Server uses Byte Length Semantics to define the size of CHAR/VARCHAR columns, while NCHAR and NVARCHAR sizes are expressed in character units.

SQL Server defines the character encoding for CHAR and VARCHAR columns with the database collation. The database collation can be specified when creating a new database. Character strings are always stored in the UCS-2 encoding for NCHAR/NVARCHAR columns.

Automatic charset conversion is supported by SQL Server between the client application and the server. The client charset is defined by the Windows™ operating system, in the language settings for non-Unicode applications.

Solution

According to the character set used by your application, you must either use CHAR/VARCHAR or NCHAR/NVARCHAR columns with SQL Server. If the charset is single-byte , you can use CHAR/VARCHAR columns. If the charset set is multibyte or Unicode (i.e. UTF-8), you must use NCHAR/NVARCHAR columns in SQL Server.

See also the section about Localization.

Make sure that the regional language settings for non-Unicode applications corresponds to the locale used by Genero programs.

Check that your database tables does not use CHAR or VARCHAR types with a length exceeding the SQL SERVER limit.

When using a multibyte character set (such as UTF-8), define database columns as NCHAR and NVARCHAR, with the size in character units, and use character length semantics in BDL programs with FGL_LENGTH_SEMANTICS=CHAR.

When extracting a database schema from a SQL Server database, the schema extractor uses the size of the column in characters, not the octet length. If you have created a NCHAR(10 (characters) ) column a in SQL Server database, the .sch file will get a size of 10, that will be interpreted according to FGL_LENGTH_SEMANTICS as a number of bytes or characters.

Do not forget to properly define the database client character set, which must correspond to the runtime system character set.

Using the SNC driver

The SNC driver can work in char or in wide-char mode. The character size mode can be controlled by the following FGLPROFILE entry:

dbi.database.dbname.snc.widechar= { true | false }

By default the SNC database driver works in Wide Char mode (dbi.database.dbname.snc.widechar= true).

The char mode must be used with applications defining character string columns with CHAR/VARCHAR/TEXT types. It is not mandatory (i.e. the wide-char mode could be used), but it appears that SQL Server behaves in different ways when wide-char bindings are used for CHAR/VARCHAR/TEXT columns. When defining CHAR(n)/VARCHAR(n) columns in SQL Server, you specify n as a number of bytes, therefore you should use byte length semantics (the default) in Genero programs, with FGL_LENGTH_SEMANTICS=BYTE.

The wide-char mode must be used for applications defining character string columns with NCHAR/NVARCHAR/NTEXT types. These SQL types are used to store Unicode data. In such case, the runtime system must use a UTF-8 locale, with character length semantics (FGL_LENGTH_SEMANTICS=CHAR). In wide-char mode, all string literals of an SQL statement are automatically changed to get the N prefix. Thus, you don't need to add the N prefix by hand in all of your programs. This solution makes your Genero code portable to other databases.

Using the ESM driver

When using the ESM (EasySoft) database driver, string literals get the N prefix only if the current locale (LANG / LC_ALL) defines a multibyte code set such as .big5 or .utf8. String literals are not touched if the locale uses a single-byte character set.

When using the ESM (EasySoft) database driver, SQL Statements are prepared with SQLPrepare(), by using the current character set. EasySoft makes the necessary charset conversions from the client charset to UCS-2 before sending the SQL text to the server. ODBC SQL parameters with character string data are bound (SQLBindParameter) with the C type SQL_C_CHAR and with the SQL type SQL_W[VAR]CHAR (=UNICODE) or with SQL_[VAR]CHAR, based on the current locale. The SQL_W[VAR]CHAR type is used if the current locale is set to multibyte encoding. When using a single-byte encoding, parameters are bound with the SQL_[VAR]CHAR type. As a result, the necessary character set conversion is controlled by Easysoft and is optimized when using a single-byte character set.

Important: It is critical to declare the correct client character set in Easysoft configuration files. The EasySoft client character set is defined by the "Client_CSet" parameter in odbc.ini.

Using the FTM driver

When using the FTM (FreeTDS) database driver, string literals get the N prefix only if the current locale (LANG / LC_ALL) defines a multibyte code set such as .big5 or .utf8. String literals are not touched if the locale uses a single-byte character set.

With the FTM (FreeTDS) database driver, SQL Statements are prepared with SQLPrepare(), by using the current character set. FreeTDS makes the necessary charset conversions from the client charset to UCS-2 before sending the SQL text to the server. ODBC SQL parameters with character string data are bound (SQLBindParameter) with the C type SQL_C_CHAR and with the SQL type SQL_W[VAR]CHAR (=UNICODE) or with SQL_[VAR]CHAR, based on the current locale. The SQL_W[VAR]CHAR type is used if the current locale is set to multibyte encoding. When using a single-byte encoding, parameters are bound with the SQL_[VAR]CHAR type. As a result, the necessary character set conversion is controlled by FreeTDS and is optimized when using a single-byte character set.

Important: It is critical to declare the correct client character set in FreeTDS configuration files. The FreeTDS client character set is defined with "ClientCharset" parameter in odbc.ini.