Difference between revisions of "$utfencode"

Latest revision as of 19:33, 25 February 2011

Returns the result of encoding the provided text in UTF-8.

$utfencode(text, C)

The parameter C is an 8-bit number (0-255) naming a GDI charset. The charset selects the code page used to resolve each char code in text to a Unicode character.

Available charset numbers

The following table describes the available charset numbers:

GDI charset number	Charset
000	ANSI_CHARSET
001	DEFAULT_CHARSET
002	SYMBOL_CHARSET
077	MAC_CHARSET
128	SHIFTJIS_CHARSET
129	HANGEUL_CHARSET
130	JOHAB_CHARSET
134	GB2312_CHARSET
136	CHINESEBIG5_CHARSET
161	GREEK_CHARSET
162	TURKISH_CHARSET
163	VIETNAMESE_CHARSET
177	HEBREW_CHARSET
178	ARABIC_CHARSET
186	BALTIC_CHARSET
204	RUSSIAN_CHARSET
222	THAI_CHARSET
238	EASTEUROPE_CHARSET
255	OEM_CHARSET

If no charset number is provided, the default is assumed (001).

Note: GDI charsets 1 and 255 are system dependent and are therefore expected to return different results across different machines. Values not on the table are treated as a reference to the default (001).
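The defaulting rule described above can be sketched as a small lookup. This is a minimal illustration in Python, not mIRC's actual implementation; the names KNOWN_CHARSETS, DEFAULT_CHARSET and resolve_charset are hypothetical and exist only for this example.

```python
# Charset numbers listed in the table above.
KNOWN_CHARSETS = {
    0, 1, 2, 77, 128, 129, 130, 134, 136, 161, 162,
    163, 177, 178, 186, 204, 222, 238, 255,
}
DEFAULT_CHARSET = 1  # DEFAULT_CHARSET (001)


def resolve_charset(c=None):
    """Return the charset number that is effectively used (hypothetical helper)."""
    # A missing value, or a value not on the table, falls back to the default.
    if c is None or c not in KNOWN_CHARSETS:
        return DEFAULT_CHARSET
    return c
```

Under these assumptions, resolve_charset(161) keeps GREEK_CHARSET, while resolve_charset() and resolve_charset(5) both resolve to 1.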

Actual transition

Behind the scenes, $utfencode does the following:

Looks up each char code in the selected charset's map to find the abstract character it represents, e.g. the char code of the letter "C" resolves to the abstract character LATIN CAPITAL LETTER C.
Locates that abstract character in Unicode's charset map to obtain its Unicode char code.
Encodes the located Unicode char code in UTF-8.
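The steps above can be approximated outside mIRC with Python's built-in codecs. This is only a sketch: the function name utfencode_sketch and the CODEC_FOR_CHARSET mapping from GDI charset numbers to Python codec names are assumptions made for illustration, and only three charsets are covered.

```python
# Assumed mapping from a few GDI charset numbers to Python codec names.
CODEC_FOR_CHARSET = {
    0: "cp1252",    # ANSI_CHARSET (assumed to be the Western code page)
    161: "cp1253",  # GREEK_CHARSET
    204: "cp1251",  # RUSSIAN_CHARSET
}


def utfencode_sketch(char_codes, charset):
    """Turn byte-sized char codes into UTF-8 bytes via the given charset."""
    codec = CODEC_FOR_CHARSET[charset]
    # Steps 1 and 2: resolve each char code through the code page to the
    # Unicode character it stands for.
    text = bytes(char_codes).decode(codec)
    # Step 3: encode the resulting Unicode characters in UTF-8.
    return text.encode("utf-8")
```

For instance, utfencode_sketch([195], 161) yields the two UTF-8 bytes 0xCE 0x93, which is the encoding of code point 915.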

It is meaningless to supply a value for C when the text contains char codes beyond the first 256 (0-255). These code pages are byte based, so they cannot make sense of such extended values.
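The byte-based limitation can be illustrated with Python, where a code page lookup likewise only accepts values that fit in one byte. The helper name to_code_page_bytes is hypothetical, and the sketch says nothing about what mIRC itself returns for such input.

```python
def to_code_page_bytes(char_codes):
    """Pack char codes into the single bytes a code page lookup expects."""
    try:
        return bytes(char_codes)
    except ValueError:
        # bytes() rejects any value outside 0-255, so there is nothing
        # for a byte-based code page to look up.
        return None
```

Here to_code_page_bytes([195]) produces one byte, while to_code_page_bytes([915]) returns None because 915 does not fit in a byte.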

Examples

$asc($utfdecode($utfencode($chr(195), 161)))

This example returns 915, the Unicode char code of GREEK CAPITAL LETTER GAMMA. Charset 161 (GREEK_CHARSET) resolves char code 195 to that character.
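To see why the charset number matters, the same char code can be resolved through several code pages. The sketch below uses Python codecs as stand-ins; pairing cp1252, cp1253 and cp1251 with ANSI_CHARSET, GREEK_CHARSET and RUSSIAN_CHARSET is an assumption, and code_point_via is a hypothetical helper.

```python
def code_point_via(codec, char_code):
    """Resolve one byte-sized char code to a Unicode code point."""
    return ord(bytes([char_code]).decode(codec))


# Char code 195 resolved through three different code pages.
results = {
    codec: code_point_via(codec, 195)
    for codec in ("cp1252", "cp1253", "cp1251")
}
```

With these codecs, results maps cp1252 to 195, cp1253 to 915 and cp1251 to 1043, so one input byte stands for three different characters.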

See Also

$utfdecode
How to display ANSI chars in UTF-8 format

Category: Undocumented identifiers
