CHAR and VARCHAR data types

Informix®

Informix supports the following character data types:

Table 1. Informix character data types
Informix data type Description
CHAR(n) SBCS and MBCS character data (max is 32767 bytes)
VARCHAR(n[,m]) SBCS and MBCS character data (max is 255 bytes)
NCHAR(n) Same as CHAR, with specific collation order
NVARCHAR(n[,m]) Same as VARCHAR, with specific collation order
LVARCHAR(n) max size varies depending on the IDS version

With Informix, both CHAR/VARCHAR and NCHAR/NVARCHAR data types can be used to store single-byte or multibyte encoded character strings. The only difference between CHAR/VARCHAR and NCHAR/NVARCHAR is in how they use sorting: N[VAR]CHAR types use the collation order, while [VAR]CHAR types use the byte order.

The character set used to store strings in CHAR/VARCHAR/NCHAR/NVARCHAR columns is defined by the DB_LOCALE environment variable.

The character set used by applications is defined by the CLIENT_LOCALE environment variable.

Informix uses Byte Length Semantics (the size N that you specify in [VAR]CHAR(N) is expressed in bytes, not characters as in some other databases)

Microsoft™ SQL Server

Microsoft SQL Server supports following data types to store character data:

Table 2. Microsoft SQL Server character data types
Microsoft SQL Server data type Description
CHAR(n) SBCS or MBCS encoding using the database collation, where n is specified in bytes (maximum storage size is 8000 bytes)
VARCHAR(n) SBCS or MBCS encoding using the database collation, where n is specified in bytes (maximum storage size is 8000 bytes)
VARCHAR(MAX) SBCS or MBCS encoding using the database collation, to store large text data (maximum storage size is 2^31-1 bytes)
NCHAR(n) UCS-2/UTF-16 encoding, where n is specified in byte-pairs (maximum storage size is 4000 byte-pairs)
NVARCHAR(n) UCS-2/UTF-16 encoding, where n is specified in byte-pairs (maximum storage size is 4000 byte-pairs)
NVARCHAR(MAX) UCS-2/UTF-16 encoding, to store large text data (maximum storage size is 2^30-1 byte-pairs)
Note: To store large text data (LOBs), Microsoft SQL Server (2005) provides the VARCHAR(MAX)/NVARCHAR(MAX) type as a replacement for the old TEXT/NTEXT types. See TEXT and BYTE (LOB) types for more details.

The use of NCHAR, NVARCHAR character types is the same as CHAR, VARCHAR respectively, except:

  • For NCHAR / NVARCHAR, the character encoding is UCS-2 or UTF-16, depending if the database collation is using the SC option.
  • The length N in N[VAR]CHAR(N) defines a number of byte-pairs, not bytes. For UCS-2 this corresponds to a number of characters. But with UTF-16 there can be surrogate pairs using 4 bytes (2 byte-pairs).
  • The maximum size of NCHAR(N) and NVARCHAR(N) column is 4000 byte-pairs, compared to 8000 bytes for CHAR/VARCHAR using a single-byte character set.
  • Unicode string literals are specified with a leading N. For example: N'日本語'
  • The LIKE statement behaves differently with CHAR and NCHAR columns when using the N prefix before the search pattern.

Note that SQL Server uses Byte Length Semantics to define the size of CHAR/VARCHAR columns, while NCHAR and NVARCHAR sizes are expressed in byte-pair units, which for most cases (UCS-2) corresponds to a number of characters (but this is not true for UTF-16 characters encoded in surrogate pairs).

SQL Server defines the character encoding for CHAR and VARCHAR columns with the database collation. The database collation can be specified when creating a new database. Character strings are always stored in the UCS-2 or UTF-16 encoding for NCHAR/NVARCHAR columns. UTF-16 is used when the DB collation has the SC option.

Automatic charset conversion is supported by SQL Server between the client application and the server. The client charset is defined by the Windows® operating system, in the language settings for non-Unicode applications.

Solution

Depending on the character set used by your application, you must either use CHAR/VARCHAR or NCHAR/NVARCHAR columns with SQL Server. If the charset is single-byte , you can use CHAR/VARCHAR columns. If the charset set is multibyte or Unicode (i.e. UTF-8), you must use NCHAR/NVARCHAR columns in SQL Server.

Make sure that the regional language settings for non-Unicode applications corresponds to the locale used by Genero programs. For more details about program locale settings, see Localization.

Check that your database tables does not use CHAR or VARCHAR types with a length exceeding the SQL Server character types limits.

When using a multibyte character set (such as UTF-8), define database columns as NCHAR and NVARCHAR, with the size in byte-pair units, and use character length semantics in BDL programs with FGL_LENGTH_SEMANTICS=CHAR. When using a single-byte character set in the programs (like ISO-8859-15), the column types can be CHAR and VARCHAR, and the size in bytes corresponds to the number of characters.

By default, in DDL statements executed by programs (such as CREATE TABLE), the CHAR, VARCHAR and LVARCHAR type names are left untouched. To force the ODI drivers to replace CHAR/VARCHAR/LVARCHAR by NCHAR/NVARCHAR type names when the locals is multibyte, set the dbi.database.dsname.ifxemul.nationalchars FGLPROFILE entry to true (default is false):
dbi.database.dsname.ifxemul.nationalchars = true 
Note:
  1. With SQL Server ODI drivers, dbi.database.dsname.ifxemul.nationalchars will only take effect, if the current application locale is multibyte (typically, UTF-8)
  2. The dbi.database.dsname.ifxemul.nationalchars parameter is ignored, if the switch corresponding to the character type name (dbi.database.dsname.datatype.{char|varchar|text}) is set to false.
  3. With the SNC ODI driver, to have dbi.database.dsname.ifxemul.nationalchars take effect, the wide-char mode must be enabled (this is the default with multibyte character sets).

When extracting a database schema from a SQL Server database, the fgldbsch schema extractor uses the size of the column in byte-pair units, not the byte length. If you have created an NCHAR(10) column in an SQL Server database, the .sch file will get a size of 10, that will be interpreted as a number of bytes or characters, depending on FGL_LENGTH_SEMANTICS.

Do not forget to properly define the database client character set, which must correspond to the runtime system character set.

The CHAR/VARCHAR type translation can be controlled with the following FGLPROFILE entries:
dbi.database.dsname.ifxemul.datatype.char = { true | false }
dbi.database.dsname.ifxemul.datatype.varchar = { true | false }
For more details see IBM Informix emulation parameters in FGLPROFILE.

Using the SNC driver

The SNC driver can work in char or in wide-char mode:

The char mode must be used with applications defining database columns with CHAR/VARCHAR/TEXT types. It is not mandatory (i.e. the wide-char mode could be used), but it appears that SQL Server performs better, when char bindings are used for CHAR/VARCHAR/TEXT columns. In char mode, the SNC driver binds SQL parameters (SQLBindParameter) with the SQL_CHAR/SQL_VARCHAR ODBC SQL types. When defining CHAR(N)/VARCHAR(N) columns in SQL Server, you specify N as a number of bytes. Therefore, it is recommended that you use byte length semantics in Genero programs, with FGL_LENGTH_SEMANTICS=BYTE (this is the default).

The wide-char mode must be used for applications defining database columns with NCHAR/NVARCHAR/NTEXT types. These SQL types are used to store UNICODE data. In such case, the runtime system must use a UTF-8 locale, with character length semantics (FGL_LENGTH_SEMANTICS=CHAR). In wide-char mode, the SNC driver binds SQL parameters (SQLBindParameter) with the SQL_WCHAR/SQL_WVARCHAR ODBC SQL types. Furthermore, all string literals of an SQL statement are automatically changed to get the N prefix. Thus, you don't need to add the N prefix by hand in all of your programs. This solution makes your Genero code portable to other databases. When defining NCHAR(N)/NVARCHAR(N) columns in SQL Server, you specify N as a number of byte-pairs that corresponds to a number of characters in UCS-2. Therefore, it is recommended you use char length semantics in Genero programs, with FGL_LENGTH_SEMANTICS=CHAR.

By default, the SNC database driver selects the expected char or wide-char mode, depending on the current application locale, assuming that the database column types fit to the application locale:

  • If the application locale defines a single-byte encoding (such as ISO8859-15), the driver assumes that the database columns are defined with CHAR/VARCHAR/TEXT types to store single-byte characters, and the SNC driver will use SQL_[VAR]CHAR.
  • If the application locale defines a multibyte encoding (such as UTF-8 or BIG5), the driver assumes that the database columns are defined with NCHAR/NVARCHAR/NTEXT types to store UNICODE characters, and the SNC driver will use SQL_W[VAR]CHAR.
The char / wide-char modes can be forced with the following FGLPROFILE entry:
dbi.database.dsname.snc.widechar= { true | false }
Note: Set the dbi.database.dsname.snc.widechar to false, only if you are using a multibyte encoding such as BIG5, with CHAR/VARCHAR/TEXT column types in the database.

Using the ESM driver

When using the ESM (Easysoft) database driver, SQL Statements are prepared with SQLPrepare(), by using the current character set. Easysoft makes the necessary charset conversions from the client charset to UCS-2 or UTF-16 for the server. ODBC SQL parameters with character string data are bound (SQLBindParameter) with the C type SQL_C_CHAR and with the SQL type SQL_W[VAR]CHAR (=UNICODE) or with SQL_[VAR]CHAR, based on the current locale. The SQL_W[VAR]CHAR type is used if the current locale uses a multibyte encoding. When using a single-byte encoding, parameters are bound with the SQL_[VAR]CHAR type. String literals get the N prefix only if the current locale defines a multibyte encoding. String literals are not touched, if the locale uses a single-byte character set. As a result, the necessary character set conversion is controlled by Easysoft and is optimized for SQL Server when using a single-byte character set.

Important: Define the correct client character set in Easysoft configuration files. In the odbc.ini data source definition, the Easysoft client character set is specified with the "Client_CSet" parameter, and the server character set is defined by "Server_CSet" or "Server_UCSet" parameters. For example, to cover all UNICODE characters, define:
Client_CSet   = UTF-8
Server_UCSet  = UTF-16LE

Using the FTM driver

With the FTM (FreeTDS) database driver, SQL Statements are prepared with SQLPrepare(), by using the current character set. FreeTDS makes the necessary charset conversions from the client charset to UCS-2/UTF-16 before sending the SQL text to the server. ODBC SQL parameters with character string data are bound (SQLBindParameter) with the C type SQL_C_CHAR and with the SQL type SQL_W[VAR]CHAR (=UNICODE) or with SQL_[VAR]CHAR, based on the current locale. The SQL_W[VAR]CHAR type is used if the current locale uses a multibyte encoding. When using a single-byte encoding, parameters are bound with the SQL_[VAR]CHAR type. String literals get the N prefix only if the current locale defines a multibyte encoding. String literals are not touched, if the locale uses a single-byte character set. As a result, the necessary character set conversion is controlled by FreeTDS and is optimized for SQL Server when using a single-byte character set.

Important: Define the correct client character set in FreeTDS configuration files. The FreeTDS client character set is defined with "ClientCharset" parameter in odbc.ini.