CHAR and VARCHAR data types
Informix®
Informix supports the following character data types:
| Informix data type | Description |
|---|---|
CHAR(n) |
SBCS and MBCS character data (max is 32767 bytes) |
VARCHAR(n |
SBCS and MBCS character data (max is 255 bytes) |
NCHAR(n) |
Same as CHAR, with specific collation
order |
NVARCHAR(n |
Same as VARCHAR, with specific collation
order |
LVARCHAR(n) |
max size varies depending on the IDS version |
With Informix, both
CHAR/VARCHAR and NCHAR/NVARCHAR data types can be used to store
single-byte or multibyte encoded character strings. The only difference between
CHAR/VARCHAR and NCHAR/NVARCHAR is in how they use sorting:
N[VAR]CHAR types use the collation order, while [VAR]CHAR types
use the byte order.
The character set used to store strings in
CHAR/VARCHAR/NCHAR/NVARCHAR columns is defined by the DB_LOCALE environment
variable.
Informix uses Byte Length Semantics (the size
N that you specify in [VAR]CHAR(N) is expressed in bytes, not characters as in some
other databases)
Microsoft™ SQL Server
Microsoft SQL Server supports following data types to store character data:
| Microsoft SQL Server data type | Description |
|---|---|
CHAR(n) |
SBCS or MBCS encoding using the database collation (starting with SQL Server 2019 it can be UTF-8), where n is specified in bytes (maximum storage size is 8000 bytes) |
VARCHAR(n) |
SBCS or MBCS encoding using the database collation (starting with SQL Server 2019 it can be UTF-8), where n is specified in bytes (maximum storage size is 8000 bytes) |
VARCHAR(MAX) |
SBCS or MBCS encoding using the database collation (starting with SQL Server 2019 it can be UTF-8), to store large text data (maximum storage size is 2^31-1 bytes) |
NCHAR(n) |
UCS-2/UTF-16 encoding, where n is specified in byte-pairs (maximum storage size is 4000 byte-pairs) |
NVARCHAR(n) |
UCS-2/UTF-16 encoding, where n is specified in byte-pairs (maximum storage size is 4000 byte-pairs) |
NVARCHAR(MAX) |
UCS-2/UTF-16 encoding, to store large text data (maximum storage size is 2^30-1 byte-pairs) |
To store large text data (LOBs), Microsoft SQL Server
(2005) provides the VARCHAR(MAX)/NVARCHAR(MAX) type as a
replacement for the old TEXT/NTEXT types. See TEXT and BYTE (LOB) types for more details.
The use of NCHAR, NVARCHAR character types is the same as
CHAR, VARCHAR respectively, except:
- For
NCHAR/NVARCHAR, the character encoding is UCS-2 or UTF-16, depending if the database collation is using the SC option. - The length
NinN[VAR]CHAR(N)defines a number of byte-pairs, not bytes. For UCS-2 this corresponds to a number of characters. But with UTF-16 there can be surrogate pairs using 4 bytes (2 byte-pairs). - The maximum size of
NCHAR(N)andNVARCHAR(N)column is 4000 byte-pairs, compared to 8000 bytes forCHAR/VARCHARusing a single-byte character set. - Unicode string literals are specified with a leading N. For example: N'日本語'
- The LIKE statement behaves
differently with
CHARandNCHARcolumns when using the N prefix before the search pattern.
Note that SQL Server uses Byte Length Semantics to define the size of
CHAR/VARCHAR columns, while NCHAR and NVARCHAR
sizes are expressed in byte-pair units, which for most cases (UCS-2) corresponds to a number of
characters (but this is not true for UTF-16 characters encoded in surrogate pairs).
SQL Server defines the character encoding for CHAR and VARCHAR
columns with the database collation, specified when creating a new database. When using the _UTF8
modifier, character strings are UTF-8 encoded in CHAR/VARCHAR columns. In
NCHAR/NVARCHAR columns, character strings are always encoded in UCS-2 or UTF-16.
UTF-16 is used when the DB collation has the SC option.
Automatic charset conversion is supported by SQL Server between the client application and the server. The client charset is defined by the Windows® operating system, in the language settings for non-Unicode applications.
Solution
Check that your database tables does not use CHAR or VARCHAR
types with a length exceeding the limits of SQL Server character types.
Table columns using a different character encoding than the database is not supported with Genero: All table columns must use the same character encoding defined at the database level.
CHAR/VARCHAR
or NCHAR/NVARCHAR columns with SQL Server:- When the application charset is single-byte (like ISO-8859-15 or CP-1252), use
CHAR/VARCHARcolumns and define a database collation matching the application locale. In this configuration, the number of bytes corresponds to the number of characters, for both programCHAR/VARCHARvariables and databaseCHAR/VARCHARcolumns. - When the application locale is UTF-8, and the length semantics is in characters
(FGL_LENGTH_SEMANTICS=CHAR), use
NCHAR/NVARCHARcolumns. In this case, the size ofCHAR/VARCHARprogram variables (as a number of chars) matches the size and unit of database columns. - When the application locale is UTF-8, and the length semantics is in bytes
(FGL_LENGTH_SEMANTICS=BYTE), use
CHAR/VARCHARcolumns with _UTF8 collation. In this case, the size ofCHAR/VARCHARprogram variables (as a number of bytes) matches the size and unit of database columns.
Make sure that the regional language settings for non-Unicode applications corresponds to the locale used by Genero programs. For more details about program locale settings, see Localization.
The widechar mode of SQL Server drivers defines the ODBC SQL parameters and fetch buffers
bindings for character strings (SQL_CHAR or SQL_WCHAR), as well as automatic adding of N prefix in SQL
string literals. To get best behavior and performance of SQL Server, the widechar mode should
be enabled when using NCHAR/NVARCHAR columns, and it should be disabled when using
CHAR/VARCHAR columns (ref: FGL-315).
By Default, with all SQL Server ODI drivers, when not setting the FGLPROFILE
*.widechar option to false, the widechar mode is automatically
enabled, if the application locale is multibyte (UTF-8) and FGL_LENGTH_SEMANTICS=CHAR. Otherwise,
with SBCS locale (ISO-8859-15) or MBCS locale (UTF-8/BIG5) and BYTE length semantics and SQL Server
2019 + DB collation is *_UTF8, the widechar mode defaults to false. If the SQL Server
version is older than 2019 or the DB collation is NOT *_UTF8, for backward compatibility, the
widechar mode is enabled by default when the application locale is MBCS.
When extracting a database schema from an SQL Server database, the fgldbsch schema extractor uses the
size of the column in byte-pair units, not the byte length. If you have created an
NCHAR(10) column in an SQL Server database, the .sch file will
get a size of 10, that will be interpreted as a number of bytes or characters, depending on
FGL_LENGTH_SEMANTICS.
CHAR/VARCHAR type translation can be controlled with the following FGLPROFILE
entries:dbi.database.dsname.ifxemul.datatype.char = { true | false }
dbi.database.dsname.ifxemul.datatype.varchar = { true | false }For
more details see IBM Informix emulation parameters in FGLPROFILE.Automatic conversion of CHAR/VARCHAR type names
By default, in DDL statements executed by programs (such as CREATE TABLE), the
CHAR, VARCHAR and LVARCHAR type names are left
untouched.
CHAR/VARCHAR/LVARCHAR by
NCHAR/NVARCHAR type names (when the widechar mode is enabled), set the
dbi.database.dsname.ifxemul.nationalchars FGLPROFILE entry to
true (default is
false):dbi.database.dsname.ifxemul.nationalchars = true
- The
dbi.database.dsname.ifxemul.nationalcharswill only take effect, if the current application locale is multibyte (typically, UTF-8) and the widechar mode is enabled (this is the default with UTF-8 and CHAR length semantics). - The
dbi.database.dsname.ifxemul.nationalcharsparameter is ignored, if the switch corresponding to the character type name (dbi.database.dsname.datatype.) is set to{char|varchar|text}false.
Using the SNC driver
setlocale() on Unix/Linux
and GetACP() on Windows.On Windows platforms, when not using
the widechar mode, the MS ODBC driver makes charset conversions based on the system
locale for non-unicode applications (GetACP()). It does not use the locale set by
setlocale() (driven by the LANG environment variable with Genero). Therefore, it is
not possible to force the application with LANG to another code page as the system locale, except
for LANG=.fglutf8.
According to the char or widechar mode, the SNC driver will do the
appropriate ODBC bindings, using SQL_CHAR/SQL_VARCHAR or
SQL_WCHAR/SQL_WVARCHAR SQL types, and let the MS ODBC driver do the charset
conversions when binding with SQL_C_CHAR for the C type, or do the appropriate
charset conversions when using SQL_C_WCHAR for the C type.
On Windows
platforms, the ODBC data source option "Perform translation for character data" must be
disabled, when using UTF-8 application locale defined by LANG=.fglutf8,
NLS_LENGTH_SEMANTICS=BYTE, and the database has _UTF8 collation for CHAR/VARCHAR columns (the
widechar mode is disabled).
dbi.database.dsname.snc.widechar= { true | false }Using the ESM driver
The database client application codeset needs to be defined when using the ESM driver.
Client_CSet" parameter,
and the server character set is defined by "Server_CSet" or
"Server_UCSet" parameters. For example, to cover all UNICODE characters,
define:Client_CSet = UTF-8
Server_UCSet = UTF-16LETo support all UNICODE characters when using UTF-8 with NCHAR/NVARCHAR columns,
you need to define Client_CSet=UTF-8 and
Server_UCSet=UTF-16LE.
According to the char or widechar mode, the ESM driver will do the
appropriate ODBC bindings, using SQL_CHAR/SQL_VARCHAR or
SQL_WCHAR/SQL_WVARCHAR as SQL type and SQL_C_CHAR as C type. The
Easysoft ODBC driver will then do the appropriate character set conversions for
SQL_C_CHAR data.
dbi.database.dsname.esm.widechar= { true | false }Using the FTM driver
The database client application codeset needs to be defined when using the FTM driver.
ClientCharset" parameter in
odbc.ini:ClientCharset = UTF-8SQL_CHAR/SQL_VARCHAR or SQL_WCHAR/SQL_WVARCHAR as
SQL type and SQL_C_CHAR as C type. The FreeTDS ODBC driver will then do the
appropriate character set conversions for SQL_C_CHAR data.dbi.database.dsname.ftm.widechar= { true | false }