$utfencode
Returns the result of encoding the provided text in UTF-8.
$utfencode(text, C)
The parameter C is an 8-bit (0-255) number referencing a GDI charset number for specific code page to unicode resolution.
Available charset numbers
The following table describes the available charset numbers:
GDI charset number | Charset |
000 | ANSI_CHARSET |
001 | DEFAULT_CHARSET |
002 | SYMBOL_CHARSET |
077 | MAC_CHARSET |
128 | SHIFTJIS_CHARSET |
129 | HANGEUL_CHARSET |
130 | JOHAB_CHARSET |
134 | GB2312_CHARSET |
136 | CHINESEBIG5_CHARSET |
161 | GREEK_CHARSET |
162 | TURKISH_CHARSET |
163 | VIETNAMESE_CHARSET |
177 | HEBREW_CHARSET |
178 | ARABIC_CHARSET |
186 | BALTIC_CHARSET |
204 | RUSSIAN_CHARSET |
222 | THAI_CHARSET |
238 | EASTEUROPE_CHARSET |
255 | OEM_CHARSET |
If no charset number is provided, the default is assumed (001).
Note: GDI charsets 1 and 255 are system dependent and are therefore expected to return different results across different machines. Values not on the table are treated as a reference to the default (001).
Actual transition
Behind the scene, $utfencode does the following:
- Looks up the char code in the charset map to be of what abstract character, e.g. for C the abstract character is LATIN CAPITAL LETTER C.
- The abstract character is found in unicode's charset map.
- The located char code is used to encode the char in UTF-8 encoding.
It is meaningless to supply a value for C when attempting to encode Unicode code points beyond the first 256. The reason for this is simple: these code pages are byte based and cannot make sense of these extended values.
Examples
$asc($utfdecode($utfencode($chr(195), 161)))
This example will return 915, which is the unicode char code of the greek capital letter Gamma.