Difference between revisions of "$utfencode"

From Scriptwiki
Jump to: navigation, search
m (Added category)
m (Added "see also" reference + correction to actual transition)
 
Line 59: Line 59:
 
Behind the scene, $utfencode does the following:
 
Behind the scene, $utfencode does the following:
 
# Looks up the char code in the charset map to be of what abstract character, e.g. for ''C'' the abstract character is ''LATIN CAPITAL LETTER C''.
 
# Looks up the char code in the charset map to be of what abstract character, e.g. for ''C'' the abstract character is ''LATIN CAPITAL LETTER C''.
# Unicode's code page is referenced and the abstract character is located in unicode's charset map.
+
# The abstract character is found in unicode's charset map.
 
# The located char code is used to encode the char in UTF-8 encoding.
 
# The located char code is used to encode the char in UTF-8 encoding.
  
Line 70: Line 70:
 
== See Also ==
 
== See Also ==
 
* [[$utfdecode]]
 
* [[$utfdecode]]
 +
* [[How to display ANSI chars in UTF-8 format]]
  
 
[[Category:Undocumented identifiers]]
 
[[Category:Undocumented identifiers]]

Latest revision as of 19:33, 25 February 2011

Returns the result of encoding the provided text in UTF-8.

$utfencode(text, C)

The parameter C is an 8-bit (0-255) number referencing a GDI charset number for specific code page to unicode resolution.

Available charset numbers

The following table describes the available charset numbers:

GDI charset number Charset
000 ANSI_CHARSET
001 DEFAULT_CHARSET
002 SYMBOL_CHARSET
077 MAC_CHARSET
128 SHIFTJIS_CHARSET
129 HANGEUL_CHARSET
130 JOHAB_CHARSET
134 GB2312_CHARSET
136 CHINESEBIG5_CHARSET
161 GREEK_CHARSET
162 TURKISH_CHARSET
163 VIETNAMESE_CHARSET
177 HEBREW_CHARSET
178 ARABIC_CHARSET
186 BALTIC_CHARSET
204 RUSSIAN_CHARSET
222 THAI_CHARSET
238 EASTEUROPE_CHARSET
255 OEM_CHARSET

If no charset number is provided, the default is assumed (001).

Note: GDI charsets 1 and 255 are system dependent and are therefore expected to return different results across different machines. Values not on the table are treated as a reference to the default (001).

Actual transition

Behind the scene, $utfencode does the following:

  1. Looks up the char code in the charset map to be of what abstract character, e.g. for C the abstract character is LATIN CAPITAL LETTER C.
  2. The abstract character is found in unicode's charset map.
  3. The located char code is used to encode the char in UTF-8 encoding.

It is meaningless to supply a value for C when attempting to encode Unicode code points beyond the first 256. The reason for this is simple: these code pages are byte based and cannot make sense of these extended values.

Examples

$asc($utfdecode($utfencode($chr(195), 161)))

This example will return 915, which is the unicode char code of the greek capital letter Gamma.

See Also