$utfencode

From Scriptwiki
Revision as of 18:33, 25 February 2011 by NaNg (talk | contribs) (Added "see also" reference + correction to actual transition)


Returns the result of encoding the provided text in UTF-8.

$utfencode(text, C)

The optional parameter C is an 8-bit number (0-255) naming a GDI charset. The charset selects the code page used to resolve each character code in the text to its Unicode equivalent.

Available charset numbers

The following table describes the available charset numbers:

GDI charset number   Charset
000                  ANSI_CHARSET
001                  DEFAULT_CHARSET
002                  SYMBOL_CHARSET
077                  MAC_CHARSET
128                  SHIFTJIS_CHARSET
129                  HANGEUL_CHARSET
130                  JOHAB_CHARSET
134                  GB2312_CHARSET
136                  CHINESEBIG5_CHARSET
161                  GREEK_CHARSET
162                  TURKISH_CHARSET
163                  VIETNAMESE_CHARSET
177                  HEBREW_CHARSET
178                  ARABIC_CHARSET
186                  BALTIC_CHARSET
204                  RUSSIAN_CHARSET
222                  THAI_CHARSET
238                  EASTEUROPE_CHARSET
255                  OEM_CHARSET

If no charset number is provided, the default (001) is assumed.

Note: GDI charsets 1 and 255 are system dependent, so they can be expected to return different results on different machines. Values not listed in the table are treated as the default (001).

Actual transition

Behind the scenes, $utfencode does the following for each character:

  1. Looks up the char code in the selected charset's map to find the abstract character it represents, e.g. for C the abstract character is LATIN CAPITAL LETTER C.
  2. Finds that abstract character in Unicode's charset map to obtain its Unicode char code.
  3. Encodes the located Unicode char code in UTF-8.
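
The three steps above can be modelled outside mIRC. The following is a minimal Python sketch, not mIRC's actual implementation: the function name utfencode_sketch and the GDI_TO_CODEC table are hypothetical, and the pairing of GDI charset numbers with Windows code pages (e.g. 161 with cp1253) is an assumption of this sketch rather than something stated on this page.

```python
# Hypothetical mapping from a few GDI charset numbers to Python codec names.
# The choice of Windows code pages is an assumption made for this sketch.
GDI_TO_CODEC = {
    161: "cp1253",  # GREEK_CHARSET
    204: "cp1251",  # RUSSIAN_CHARSET
    238: "cp1250",  # EASTEUROPE_CHARSET
}


def utfencode_sketch(char_code: int, charset: int) -> bytes:
    """Model the three steps for a single 8-bit char code."""
    codec = GDI_TO_CODEC[charset]
    # Steps 1 and 2: resolve the byte through the code page to the
    # abstract character, which Python represents by its Unicode code point.
    abstract_char = bytes([char_code]).decode(codec)
    # Step 3: encode that code point in UTF-8.
    return abstract_char.encode("utf-8")


encoded = utfencode_sketch(195, 161)
code_point = ord(encoded.decode("utf-8"))
print(encoded.hex(), code_point)  # ce93 915
```

Under these assumptions, char code 195 in the Greek charset resolves to code point 915, which UTF-8 stores as the two bytes CE 93.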

Supplying a value for C is meaningless when the text contains Unicode code points beyond the first 256. These code pages are byte based: each lookup takes a value from 0 to 255, so a larger char code has no entry to be resolved from.
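
The byte limit can be illustrated with a short Python sketch. The helper name as_code_page_byte is hypothetical and only models the constraint described above; it says nothing about how mIRC itself handles such input.

```python
def as_code_page_byte(char_code: int):
    """Return the single byte for char_code, or None if it cannot be one."""
    try:
        # bytes() accepts only values in range(0, 256), the same range
        # a byte-based code page can look up.
        return bytes([char_code])
    except ValueError:
        return None


print(as_code_page_byte(195))  # a valid 8-bit char code
print(as_code_page_byte(915))  # beyond the first 256: no byte, so no lookup
```

A char code such as 915 is already a Unicode value, so there is nothing left for a code page to translate.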

Examples

$asc($utfdecode($utfencode($chr(195), 161)))

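This example returns 915, the Unicode char code of the Greek capital letter gamma. With charset 161 (GREEK_CHARSET), char code 195 resolves to that letter, so decoding the UTF-8 result and taking its char code gives 915.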

See Also