Language and character set settings

Purpose of application locale definition

The locale setting matters both at compile time and at runtime. At runtime, the locale changes the behavior of character handling functions, such as UPSHIFT and DOWNSHIFT. It also changes the handling of character strings, which can be single-byte or multibyte encoded. At compile time, errors will occur if the source files contain characters that do not exist in the encoding defined by the current locale.
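
As a rough pre-compilation check, the sketch below tests whether a source file decodes cleanly in a given encoding. It is not part of Genero: the function name check_source_encoding and the file name main.4gl are hypothetical, and the check assumes that the standard iconv utility is installed and reports invalid input with a non-zero exit status. It only detects byte sequences that are invalid in the encoding, so it approximates, but does not replace, what the compiler does.

```shell
# Hypothetical helper: succeeds if FILE decodes cleanly in ENCODING.
# Relies on iconv exiting with a non-zero status on invalid input.
check_source_encoding() {
    file=$1
    encoding=$2
    iconv -f "$encoding" -t UTF-8 "$file" >/dev/null 2>&1
}

# Example (main.4gl is a placeholder file name):
#   check_source_encoding main.4gl UTF-8 || echo "main.4gl is not valid UTF-8"
```

Encoding names accepted by iconv can differ from locale codeset names, so the second argument may need adjusting on your system.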

Always check that the locale environment variables match the locale expected by your Genero application, both during development and at runtime:
$ fglrun -i
Charmap      : UTF-8
Multibyte    : yes
Stateless    : yes
Length Semantics : CHAR
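
To automate this check on UNIX, one possible approach is to extract the Charmap value from the output above and compare it with what the operating system reports. The helper name fgl_charmap is hypothetical, and the sketch assumes the "Charmap      : UTF-8" line format shown in the sample output.

```shell
# Hypothetical helper: extract the Charmap value from "fglrun -i" output on stdin.
# Assumes the "Charmap      : UTF-8" line format shown in the sample output.
fgl_charmap() {
    sed -n 's/^Charmap[[:space:]]*:[[:space:]]*//p'
}

# Example, on a machine where fglrun is installed:
#   [ "$(fglrun -i | fgl_charmap)" = "$(locale charmap)" ] || echo "charmap mismatch"
```

The two tools may spell the same character set differently, so treat a mismatch as a hint to investigate rather than as proof of a misconfiguration.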

Mobile platforms

On iOS and Android™ mobile platforms, the locale is automatically defined to be UTF-8. This cannot be changed.

The language conventions and system messages are defined by the device settings.

Microsoft™ Windows® platforms

On Windows platforms, when the LANG environment variable is undefined, the language and character set default to the system locale, which is defined by the regional settings for "non-Unicode applications". For example, on a US-English Windows system, this defaults to the 1252 code page.

Setting the LANG variable is not recommended unless your application uses a character set different from that of the Windows system locale.

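On Windows platforms, the value of the LANG variable is either a language name, optionally followed by a territory and a codeset, or a codeset alone prefixed with a dot:
  language[_territory[.codeset]]
  | .codeset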
For example:
C:\> set LANG=English_USA.1252

UNIX™ platforms

On UNIX-based platforms, the LANG/LC_ALL/LC_CTYPE environment variables define the locale settings for the application.

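The LC_CTYPE environment variable can be set to override the LANG setting for character handling. Under the POSIX rules, LC_ALL takes precedence over both LC_CTYPE and LANG when it is set. Make sure that LC_CTYPE does not define a different encoding than the one your application locale requires.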
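
The following sketch shows which of these variables ends up governing character handling, assuming the standard POSIX precedence (LC_ALL first, then LC_CTYPE, then LANG, with empty values treated as unset). The function name effective_ctype is hypothetical.

```shell
# Hypothetical helper: print the locale value that governs character handling,
# following the POSIX precedence LC_ALL > LC_CTYPE > LANG (empty counts as unset).
# Falls back to "C" when none of the three variables is set.
effective_ctype() {
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

# Example:
#   echo "Character handling locale: $(effective_ctype)"
```

Running this in the shell that starts the application is a quick way to spot a stray LC_CTYPE or LC_ALL value that disagrees with LANG.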

With the LANG/LC_ALL/LC_CTYPE environment variables, you define the language, the territory (also known as the country) and the codeset (also known as the character set or code page) to be used. The value generally has the following form, although the exact spelling can vary between operating systems:
language_territory.codeset
For example:
$ LC_ALL=en_US.iso88591; export LC_ALL
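
To make the structure of such a value concrete, here is a minimal sketch that splits it into its three parts with plain shell parameter expansion. The function name split_locale is hypothetical, and the sketch does not handle modifiers such as a trailing @euro, which some systems accept.

```shell
# Hypothetical helper: split a language_territory.codeset value into its parts.
# Prints three lines: language, territory, codeset (empty when a part is missing).
split_locale() {
    value=$1
    case $value in
        *.*) codeset=${value#*.}; value=${value%%.*} ;;
        *)   codeset= ;;
    esac
    case $value in
        *_*) territory=${value#*_}; language=${value%%_*} ;;
        *)   territory=; language=$value ;;
    esac
    printf '%s\n%s\n%s\n' "$language" "$territory" "$codeset"
}

# Example:
#   split_locale "$LC_ALL"
```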

What are possible locales on my platform?

Usually OS vendors define a specific set of values for the language, territory and codeset. For example, on a UNIX platform, you typically have the value "en_US.ISO8859-1" for a US English locale, while Microsoft Windows requires the "English_USA.1252" value. For more details about supported locales, refer to the operating system documentation.

On UNIX platforms, the list of available locales can be displayed by running the locale -a command. You may also want to read the man pages of the locale command and the setlocale function. On Windows platforms, search the Microsoft MSDN documentation for "Language and Country/Region Strings".
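
On UNIX, a small wrapper around locale -a can confirm that a locale is installed before you export it. This is only a sketch: the function name has_locale is hypothetical, and because systems spell codesets differently (for example utf8 versus UTF-8), a failed lookup may simply mean that the name needs a different spelling.

```shell
# Hypothetical helper: succeeds if the given locale name is listed by "locale -a".
# The comparison is case-insensitive and matches the whole line literally.
has_locale() {
    locale -a 2>/dev/null | grep -ixF -- "$1" >/dev/null
}

# Example:
#   has_locale en_US.utf8 && { LC_ALL=en_US.utf8; export LC_ALL; }
```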

UNICODE support (UTF-8)

To support multiple languages in your application, you must use UNICODE. The encoding supported by Genero for UNICODE applications is UTF-8.

On UNIX platforms, UTF-8 locales are natively supported with LANG/LC_ALL.

On Windows platforms, setting the LANG environment variable to code page 65001 does not work. According to the Microsoft C++ setlocale() documentation, recent Windows versions support the UTF-8 encoding. This encoding can be used with Genero by setting the LANG environment variable to "language_territory.UTF8":
C:\> set LANG=en_us.UTF8

For system-wide locale settings, see also language options for "non-Unicode applications", in the Language & region section of the Windows operating system settings.

If your Windows platform does not provide proper support for UTF-8, you can use the full UTF-8 support implemented by Genero on Windows by setting the LANG environment variable to the value .fglutf8:

C:\> set LANG=.fglutf8