CHAR and VARCHAR data types
Informix®
Informix supports the following character data types:
| Informix data type | Description |
|---|---|
CHAR(n) |
SBCS and MBCS character data (max is 32767 bytes) |
VARCHAR(n |
SBCS and MBCS character data (max is 255 bytes) |
NCHAR(n) |
Same as CHAR, with specific collation
order |
NVARCHAR(n |
Same as VARCHAR, with specific collation
order |
LVARCHAR(n) |
max size varies depending on the IDS version |
With Informix, both
CHAR/VARCHAR and NCHAR/NVARCHAR data types can be used to store
single-byte or multibyte encoded character strings. The only difference between
CHAR/VARCHAR and NCHAR/NVARCHAR is in how they use sorting:
N[VAR]CHAR types use the collation order, while [VAR]CHAR types
use the byte order.
The character set used to store strings in
CHAR/VARCHAR/NCHAR/NVARCHAR columns is defined by the DB_LOCALE environment
variable.
Informix uses Byte Length Semantics (the size
N that you specify in [VAR]CHAR(N) is expressed in bytes, not characters as in some
other databases)
Microsoft™ SQL Server
Microsoft SQL Server supports following data types to store character data:
| Microsoft SQL Server data type | Description |
|---|---|
CHAR(n) |
SBCS or MBCS encoding using the database collation, where n is specified in bytes (maximum storage size is 8000 bytes) |
VARCHAR(n) |
SBCS or MBCS encoding using the database collation, where n is specified in bytes (maximum storage size is 8000 bytes) |
VARCHAR(MAX) |
SBCS or MBCS encoding using the database collation, to store large text data (maximum storage size is 2^31-1 bytes) |
NCHAR(n) |
UCS-2/UTF-16 encoding, where n is specified in byte-pairs (maximum storage size is 4000 byte-pairs) |
NVARCHAR(n) |
UCS-2/UTF-16 encoding, where n is specified in byte-pairs (maximum storage size is 4000 byte-pairs) |
NVARCHAR(MAX) |
UCS-2/UTF-16 encoding, to store large text data (maximum storage size is 2^30-1 byte-pairs) |
VARCHAR(MAX)/NVARCHAR(MAX) type as a
replacement for the old TEXT/NTEXT types. See TEXT and BYTE (LOB) types for more details.The use of NCHAR, NVARCHAR character types is the same as
CHAR, VARCHAR respectively, except:
- For
NCHAR/NVARCHAR, the character encoding is UCS-2 or UTF-16, depending if the database collation is using the SC option. - The length
NinN[VAR]CHAR(N)defines a number of byte-pairs, not bytes. For UCS-2 this corresponds to a number of characters. But with UTF-16 there can be surrogate pairs using 4 bytes (2 byte-pairs). - The maximum size of
NCHAR(N)andNVARCHAR(N)column is 4000 byte-pairs, compared to 8000 bytes forCHAR/VARCHARusing a single-byte character set. - Unicode string literals are specified with a leading N. For example: N'日本語'
- The LIKE statement behaves
differently with
CHARandNCHARcolumns when using the N prefix before the search pattern.
Note that SQL Server uses Byte Length Semantics to define the size of
CHAR/VARCHAR columns, while NCHAR and NVARCHAR
sizes are expressed in byte-pair units, which for most cases (UCS-2) corresponds to a number of
characters (but this is not true for UTF-16 characters encoded in surrogate pairs).
SQL Server defines the character encoding for CHAR and VARCHAR
columns with the database collation. The database collation can be specified when creating a new
database. Character strings are always stored in the UCS-2 or UTF-16 encoding for
NCHAR/NVARCHAR columns. UTF-16 is used when the DB collation has the SC option.
Automatic charset conversion is supported by SQL Server between the client application and the server. The client charset is defined by the Windows® operating system, in the language settings for non-Unicode applications.
Solution
Depending on the character set used by your application, you must either use
CHAR/VARCHAR or NCHAR/NVARCHAR columns with SQL Server. If the
charset is single-byte , you can use CHAR/VARCHAR columns. If the charset set is
multibyte or Unicode (i.e. UTF-8), you must use NCHAR/NVARCHAR columns in SQL
Server.
Make sure that the regional language settings for non-Unicode applications corresponds to the locale used by Genero programs. For more details about program locale settings, see Localization.
Check that your database tables does not use CHAR or VARCHAR
types with a length exceeding the SQL Server character types limits.
When using a multibyte character set (such as UTF-8), define database columns as
NCHAR and NVARCHAR, with the size in byte-pair units, and use
character length semantics in BDL programs with FGL_LENGTH_SEMANTICS=CHAR. When using a single-byte
character set in the programs (like ISO-8859-15), the column types can be CHAR and
VARCHAR, and the size in bytes corresponds to the number of characters.
CREATE TABLE), the
CHAR, VARCHAR and LVARCHAR type names are left
untouched. To force the ODI drivers to replace CHAR/VARCHAR/LVARCHAR by
NCHAR/NVARCHAR type names when the locals is multibyte, set the
dbi.database.dsname.ifxemul.nationalchars FGLPROFILE entry to
true (default is
false):dbi.database.dsname.ifxemul.nationalchars = true - With SQL Server ODI drivers,
dbi.database.dsname.ifxemul.nationalcharswill only take effect, if the current application locale is multibyte (typically, UTF-8) - The
dbi.database.dsname.ifxemul.nationalcharsparameter is ignored, if the switch corresponding to the character type name (dbi.database.dsname.datatype.{char|varchar|text}) is set tofalse. - With the SNC ODI
driver, to have
dbi.database.dsname.ifxemul.nationalcharstake effect, the wide-char mode must be enabled (this is the default with multibyte character sets).
When extracting a database schema from a SQL Server database, the fgldbsch schema extractor uses the
size of the column in byte-pair units, not the byte length. If you have created an
NCHAR(10) column in an SQL Server database, the .sch file will
get a size of 10, that will be interpreted as a number of bytes or characters, depending on
FGL_LENGTH_SEMANTICS.
Do not forget to properly define the database client character set, which must correspond to the runtime system character set.
CHAR/VARCHAR type translation can be controlled with the following FGLPROFILE
entries:dbi.database.dsname.ifxemul.datatype.char = { true | false }
dbi.database.dsname.ifxemul.datatype.varchar = { true | false }For
more details see IBM Informix emulation parameters in FGLPROFILE.Using the SNC driver
The SNC driver can work in char or in wide-char mode:
The char mode must be used with applications defining database columns with
CHAR/VARCHAR/TEXT types. It is not mandatory (i.e. the wide-char mode
could be used), but it appears that SQL Server performs better, when char bindings are
used for CHAR/VARCHAR/TEXT columns. In char mode, the SNC driver binds
SQL parameters (SQLBindParameter) with the SQL_CHAR/SQL_VARCHAR ODBC SQL types.
When defining CHAR(N)/VARCHAR(N) columns in SQL Server, you specify N as a number
of bytes. Therefore, it is recommended that you use byte length semantics in Genero programs, with
FGL_LENGTH_SEMANTICS=BYTE (this is the default).
The wide-char mode must be used for applications defining database columns with
NCHAR/NVARCHAR/NTEXT types. These SQL types are used to store UNICODE data. In such
case, the runtime system must use a UTF-8 locale, with character length semantics
(FGL_LENGTH_SEMANTICS=CHAR). In wide-char mode, the SNC driver binds SQL parameters
(SQLBindParameter) with the SQL_WCHAR/SQL_WVARCHAR ODBC SQL types. Furthermore, all
string literals of an SQL statement are automatically changed to get the N prefix. Thus, you don't
need to add the N prefix by hand in all of your programs. This solution makes your Genero code
portable to other databases. When defining NCHAR(N)/NVARCHAR(N) columns in SQL
Server, you specify N as a number of byte-pairs that corresponds to a number of characters in UCS-2.
Therefore, it is recommended you use char length semantics in Genero programs, with
FGL_LENGTH_SEMANTICS=CHAR.
By default, the SNC database driver selects the expected char or wide-char mode, depending on the current application locale, assuming that the database column types fit to the application locale:
- If the application locale defines a single-byte encoding (such as ISO8859-15), the driver
assumes that the database columns are defined with
CHAR/VARCHAR/TEXTtypes to store single-byte characters, and the SNC driver will useSQL_[VAR]CHAR. - If the application locale defines a multibyte encoding (such as UTF-8 or BIG5), the driver
assumes that the database columns are defined with
NCHAR/NVARCHAR/NTEXTtypes to store UNICODE characters, and the SNC driver will useSQL_W[VAR]CHAR.
dbi.database.dsname.snc.widechar= { true | false }dbi.database.dsname.snc.widechar to
false, only if you are using a multibyte encoding such as BIG5, with
CHAR/VARCHAR/TEXT column types in the database.Using the ESM driver
When using the ESM (Easysoft) database driver, SQL Statements are prepared with SQLPrepare(), by
using the current character set. Easysoft makes the necessary charset conversions from the client
charset to UCS-2 or UTF-16 for the server. ODBC SQL parameters with character string data are bound
(SQLBindParameter) with the C type SQL_C_CHAR and with the SQL type
SQL_W[VAR]CHAR (=UNICODE) or with SQL_[VAR]CHAR, based on the
current locale. The SQL_W[VAR]CHAR type is used if the current locale uses a
multibyte encoding. When using a single-byte encoding, parameters are bound with the
SQL_[VAR]CHAR type. String literals get the N prefix only if the current locale
defines a multibyte encoding. String literals are not touched, if the locale uses a single-byte
character set. As a result, the necessary character set conversion is controlled by Easysoft and is
optimized for SQL Server when using a single-byte character set.
Client_CSet" parameter, and the server character set is defined by
"Server_CSet" or "Server_UCSet" parameters. For example, to cover
all UNICODE characters,
define:Client_CSet = UTF-8
Server_UCSet = UTF-16LEUsing the FTM driver
With the FTM (FreeTDS) database driver, SQL Statements are prepared with SQLPrepare(), by using
the current character set. FreeTDS makes the necessary charset conversions from the client charset
to UCS-2/UTF-16 before sending the SQL text to the server. ODBC SQL parameters with character string data
are bound (SQLBindParameter) with the C type SQL_C_CHAR and with the SQL type
SQL_W[VAR]CHAR (=UNICODE) or with SQL_[VAR]CHAR, based on the
current locale. The SQL_W[VAR]CHAR type is used if the current locale uses a
multibyte encoding. When using a single-byte encoding, parameters are bound with the
SQL_[VAR]CHAR type. String literals get the N prefix only if the current locale
defines a multibyte encoding. String literals are not touched, if the locale uses a single-byte
character set. As a result, the necessary character set conversion is controlled by FreeTDS and is
optimized for SQL Server when using a single-byte character set.
ClientCharset" parameter in
odbc.ini.