CHARACTER data types

Informix® supports the following character data types:

In Informix, both CHAR/VARCHAR and NCHAR/NVARCHAR data types can be used to store single-byte or multibyte encoded character strings. The only difference between CHAR/VARCHAR and NCHAR/NVARCHAR is for sorting: N[VAR]CHAR types use the collation order, while [VAR]CHAR types use the byte order. The character set used to store strings in CHAR/VARCHAR/NCHAR/NVARCHAR columns is defined by the DB_LOCALE environment variable. The character set used by applications is defined by the CLIENT_LOCALE environment variable. Informix uses Byte Length Semantics (the size N that you specify in [VAR]CHAR(N) is expressed in bytes, not characters as in some other databases.)

Netezza® supports the following character data types:

Netezza stores single-byte character data in CHAR/VARCHAR columns, and stores UNICODE (UTF-8 encoded) character strings in NCHAR/NVARCHAR columns. You cannot store UTF-8 strings in CHAR/VARCHAR columns.

NCHAR/NVARCHAR data is always stored in UTF-8. The database character defines the encoding for CHAR and VARCHAR columns and is defined when creating the database with the CREATE DATABASE command; the default is latin9. Note that, at the time of writing these lines, Netezza V6 does not yet support a different database character set than latin9.

No automatic character set conversion is done by the Netezza software, this means that the application/client character set must match the database character set.

Solution

If your application uses a single-byte character set (i.e. latin9), you can create tables with the CHAR and VARCHAR types. However, if you want to store UNICODE (UTF-8) strings, you must use the NCHAR/NVARCHAR types instead when creating tables. In program sources you can use CHAR/VARCHAR; these types can hold single and multibyte character sets, according to the C POSIX locale.

Important: Netezza (V6 while writing these lines) supports only the latin9 database character set for CHAR / VARCHAR types. Since character set conversion is not supported, you can only implement either latin9 or UTF-8 based applications.

When using a multibyte character set (such as UTF-8), define database columns as NCHAR and NVARCHAR, with the size in character units, and use character length semantics in BDL programs with FGL_LENGTH_SEMANTICS=CHAR.

When extracting a database schema from a Netezza database, the schema extractor uses the size of the column in characters, not the octet length. If you have created a CHAR(10 (characters) ) column a in Netezza database using the UTF-8 character set, the .sch file will get a size of 10, that will be interpreted according to FGL_LENGTH_SEMANTICS as a number of bytes or characters.

Do not forget to properly define the database client character set, which must correspond to the runtime system character set.

See also the section about Localization.