Character set usage

The character set used to edit and compile .per form specification files is defined by the current locale. Form elements (typically, labels) can be written with non-ASCII characters of the current codeset.

Form element positions and sizes are determined by counting the display width of characters, not the number of bytes that encode those characters in the current codeset. This distinction can be ignored when using a single-byte character set such as ISO-8859-1 or CP-1252, where each character has a width of 1 and is encoded in 1 byte. It becomes important when using a multibyte character set such as BIG5 or UTF-8.

For example, in the UTF-8 multibyte codeset, a Chinese ideogram is encoded in three bytes, while its visual width is twice that of a Latin character. In the following example, the labels with three Chinese characters have the same width as the labels with six Latin characters. As a result, all labels get the same size (6 cells), and all fields are aligned properly in a proportional font display:

GRID
{
叽哱唶 [f001  ] abcdef [f002  ]
abcdef [f003  ] 叽哱唶 [f004  ]
}
END
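
The difference between character count, byte count, and cell width in the example above can be checked outside the form compiler. The following Python sketch is only an illustration: the helper name display_width is hypothetical, and it approximates cell width from the Unicode East Asian Width property, which is an assumption and not necessarily the rule the form compiler applies.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of display cells occupied by text.

    Hypothetical helper: wide (W) and fullwidth (F) characters count
    as two cells, combining marks as zero, everything else as one.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks occupy no cell of their own
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


for label in ("abcdef", "叽哱唶"):
    chars = len(label)                   # number of characters
    nbytes = len(label.encode("utf-8"))  # number of bytes in UTF-8
    cells = display_width(label)         # approximate display cells
    print(label, chars, nbytes, cells)

# prints:
# abcdef 6 6 6
# 叽哱唶 3 9 6
```

Both labels occupy 6 cells even though their character and byte counts differ, which is why the fields in the form above line up.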

For maximum portability, it is recommended to write all form specification files in 7-bit ASCII, and to use localized strings to internationalize your forms.