$utfencode
Latest revision as of 18:33, 25 February 2011
Returns the result of encoding the provided text in UTF-8.
$utfencode(text, C)
The parameter C is an 8-bit number (0-255) giving the GDI charset number, which selects the code page used to resolve the characters in text to Unicode.
Available charset numbers
The following table describes the available charset numbers:
GDI charset number | Charset |
000 | ANSI_CHARSET |
001 | DEFAULT_CHARSET |
002 | SYMBOL_CHARSET |
077 | MAC_CHARSET |
128 | SHIFTJIS_CHARSET |
129 | HANGEUL_CHARSET |
130 | JOHAB_CHARSET |
134 | GB2312_CHARSET |
136 | CHINESEBIG5_CHARSET |
161 | GREEK_CHARSET |
162 | TURKISH_CHARSET |
163 | VIETNAMESE_CHARSET |
177 | HEBREW_CHARSET |
178 | ARABIC_CHARSET |
186 | BALTIC_CHARSET |
204 | RUSSIAN_CHARSET |
222 | THAI_CHARSET |
238 | EASTEUROPE_CHARSET |
255 | OEM_CHARSET |
If no charset number is provided, the default is assumed (001).
Note: GDI charsets 1 and 255 are system dependent, so they are expected to return different results on different machines. Values not listed in the table are treated as the default (001).
Actual transition
Behind the scenes, $utfencode does the following:
- Looks up each char code in the selected charset's map to find the abstract character it represents, e.g. for the character C the abstract character is LATIN CAPITAL LETTER C.
- Locates that abstract character in Unicode's character map, which yields its Unicode char code.
- Encodes the located Unicode char code in UTF-8.
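The three steps above can be sketched outside mIRC. The following Python snippet is only an illustration, not mIRC's actual implementation: the name utfencode_sketch is hypothetical, the table covers just a few charsets, and the pairing of GDI charset numbers with Windows code pages follows the usual Windows convention, which mIRC is assumed (not confirmed) to share.

```python
# Hypothetical subset of the charset table, mapped to Python codec names.
# Assumption: mIRC resolves each GDI charset to the usual Windows code page.
GDI_CHARSET_TO_CODEC = {
    161: "cp1253",  # GREEK_CHARSET
    162: "cp1254",  # TURKISH_CHARSET
    177: "cp1255",  # HEBREW_CHARSET
    204: "cp1251",  # RUSSIAN_CHARSET
    238: "cp1250",  # EASTEUROPE_CHARSET
}


def utfencode_sketch(char_codes, charset):
    """Mimic the three steps for a list of char codes in the range 0-255."""
    codec = GDI_CHARSET_TO_CODEC[charset]
    # Steps 1 and 2: resolve each byte through the code page. Python's
    # decoder goes straight from the byte to the Unicode character.
    text = bytes(char_codes).decode(codec)
    # Step 3: encode the resulting Unicode characters as UTF-8.
    return text.encode("utf-8")


encoded = utfencode_sketch([195], 161)
print(encoded)                       # b'\xce\x93'
print(ord(encoded.decode("utf-8")))  # 915
```

Char code 195 resolves to a different letter under each charset, which is why the choice of C matters for values above 127; plain ASCII values such as 67 (the letter C) come out unchanged in all of the code pages listed here.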
It is meaningless to supply a value for C when the text to encode contains Unicode code points beyond the first 256. These code pages are byte based, so their maps have no entries for values above 255.
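The byte limit can be illustrated with a short Python sketch. The function code_page_lookup is hypothetical and uses the Greek code page (cp1253) purely as an example; how mIRC itself reacts to out-of-range values is not shown here.

```python
def code_page_lookup(char_code, codec="cp1253"):
    """Resolve one char code through a byte-based code page."""
    # A code page map only has entries for single byte values.
    if not 0 <= char_code <= 255:
        raise ValueError("code pages are byte based: expected 0-255")
    return bytes([char_code]).decode(codec)


print(code_page_lookup(195))  # Γ (the byte is resolved through the code page)

try:
    code_page_lookup(915)     # already a Unicode code point, not a byte
except ValueError as exc:
    print(exc)

# A code point beyond 255 needs no code page: it is encoded directly.
print(chr(915).encode("utf-8"))  # b'\xce\x93'
```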
Examples
$asc($utfdecode($utfencode($chr(195), 161)))
This example returns 915, the Unicode char code of GREEK CAPITAL LETTER GAMMA, because char code 195 in the Greek charset (161) is that letter.
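For reference, the UTF-8 bytes behind char code 915 can be worked out by hand. The Python sketch below (the helper name utf8_two_byte is hypothetical) applies the standard two-byte UTF-8 layout, which covers code points U+0080 through U+07FF; it illustrates the encoding itself, not mIRC's internals.

```python
def utf8_two_byte(code_point):
    """Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("two-byte form covers U+0080..U+07FF only")
    lead = 0xC0 | (code_point >> 6)     # 110xxxxx carries the top five bits
    trail = 0x80 | (code_point & 0x3F)  # 10xxxxxx carries the low six bits
    return bytes([lead, trail])


print(utf8_two_byte(915))                              # b'\xce\x93'
print(utf8_two_byte(915) == chr(915).encode("utf-8"))  # True
```

So the text produced for Gamma is the byte pair 206 147, and decoding that pair gives back char code 915.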