William Jiang

JavaScript,PHP,Node,Perl,LAMP Web Developer – http://williamjxj.com; https://github.com/williamjxj?tab=repositories

Convert Unicode to UTF-8

I am converting 16-bit Unicode code points to UTF-8 in order to display Chinese characters. Here are the sources for the conversion:

1. UTF-8 -> http://en.wikipedia.org/wiki/Utf8
2. CJK Unified Ideographs

1. Unicode CJK

Chinese characters fall within the CJK ranges of Unicode.

The basic block, named CJK Unified Ideographs (4E00–9FFF), contains 20,941 basic Chinese characters in the range U+4E00 through U+9FCC. The charts are available in four parts:

4E00-62FF, 6300-77FF, 7800-8CFF, 8D00-9FFF.
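As a small illustration of the range above, the following sketch tests whether a character belongs to the CJK Unified Ideographs block. The names `CJK_START`, `CJK_END` and `isCjkUnifiedIdeograph` are made up for this example, and the check covers only the basic block (4E00-9FFF), not the CJK extension blocks.

```javascript
// Bounds of the basic CJK Unified Ideographs block, as quoted above.
const CJK_START = 0x4e00;
const CJK_END = 0x9fff;

// Hypothetical helper: true if the first code point of `ch`
// lies inside the basic CJK Unified Ideographs block.
function isCjkUnifiedIdeograph(ch) {
  const cp = ch.codePointAt(0);
  return cp >= CJK_START && cp <= CJK_END;
}

console.log(isCjkUnifiedIdeograph('大')); // true  (U+5927)
console.log(isCjkUnifiedIdeograph('A'));  // false (U+0041)
```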

2. UTF-8 / Unicode table

What I am going to do is translate each 16-bit Unicode code point in the table below into its 3-byte UTF-8 character.

UTF-8 (3 bytes, lead byte E_)        Unicode (16 bits, in hexadecimal)

Lead byte (hex)  Lead byte (decimal)  Unicode start  Block
E0               224                  0800*          Indic
E1               225                  1000           Misc.
E2               226                  2000           Symbol
E3               227                  3000           Kana, CJK
E4               228                  4000           CJK
E5               229                  5000           CJK
E6               230                  6000           CJK
E7               231                  7000           CJK
E8               232                  8000           CJK
E9               233                  9000           CJK
EA               234                  A000           Asian
EB               235                  B000           Hangul
EC               236                  C000           Hangul
ED               237                  D000           Hangul, Surrogates
EE               238                  E000           Private Use
EF               239                  F000           Forms
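The table follows a regular pattern: the lead byte is 0xE0 (224) plus the top four bits of the 16-bit code point. A minimal sketch of that relationship, assuming a hypothetical helper named `leadByte` that is not part of any library:

```javascript
// Hypothetical helper: lead byte of the 3-byte UTF-8 form
// for a code point in the range U+0800..U+FFFF.
function leadByte(cp) {
  if (cp < 0x0800 || cp > 0xffff) {
    throw new RangeError('code point is outside the 3-byte range');
  }
  // 1110xxxx: the fixed prefix 1110 followed by the top 4 bits of cp.
  return 0xe0 | (cp >> 12);
}

// Print the lead byte for a few of the row starts listed in the table.
for (const start of [0x0800, 0x3000, 0x5000, 0xf000]) {
  const label = start.toString(16).toUpperCase().padStart(4, '0');
  console.log(label, '->', leadByte(start));
}
```

For the row starts 0800, 3000, 5000 and F000 this gives 224, 227, 229 and 239, matching the decimal values in the table.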

3. Unicode -> UTF-8 conversion formula

In the CJK set, each 16-bit Unicode character is encoded as 3 bytes in UTF-8.

Bits  Last code point  Byte 1    Byte 2    Byte 3    Byte 4    Byte 5    Byte 6
 7    U+007F           0xxxxxxx
11    U+07FF           110xxxxx  10xxxxxx
16    U+FFFF           1110xxxx  10xxxxxx  10xxxxxx
21    U+1FFFFF         11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
26    U+3FFFFFF        111110xx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx
31    U+7FFFFFFF       1111110x  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx

Note: the 5- and 6-byte forms belong to the original UTF-8 design. Current UTF-8 (RFC 3629) stops at 4 bytes and at code point U+10FFFF.
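The "Last code point" column can be read as a set of thresholds. The sketch below picks the sequence length for a code point; `utf8Length` is a made-up name, and the function assumes the modern limit of 4 bytes (U+10FFFF) rather than the 5- and 6-byte rows.

```javascript
// Hypothetical helper: number of UTF-8 bytes needed for a code point,
// following the "Last code point" thresholds of the table.
// Assumption: only the 1- to 4-byte forms are supported.
function utf8Length(cp) {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff) {
    throw new RangeError('code point out of range');
  }
  if (cp <= 0x7f) return 1;   // 0xxxxxxx
  if (cp <= 0x7ff) return 2;  // 110xxxxx 10xxxxxx
  if (cp <= 0xffff) return 3; // 1110xxxx 10xxxxxx 10xxxxxx
  return 4;                   // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
}

console.log(utf8Length(0x41));   // 1
console.log(utf8Length(0x5927)); // 3
```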

4. Example

For the Chinese character ‘大’ (Unicode 0x5927), the conversion from Unicode to UTF-8 goes as follows:

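(1) Under the Unicode-to-UTF-8 encoding rules, Chinese characters use a 3-byte sequence,
so apply the three-byte conversion formula:
0800 - FFFF
1110xxxx 10xxxxxx 10xxxxxx
The 16 bits marked x are filled with the corresponding bits of the Unicode code point.

(2) 0x5927 in binary is 0101 1001 0010 0111.
Filling these bits into the x positions of the formula above gives
11100101 10100100 10100111
which in hexadecimal is E5 A4 A7.

(3) To verify:
type javascript:alert(encodeURI('大').replace(/%/g,'')) into the browser address bar and press Enter. The alert should show E5A4A7.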
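The three steps above can be sketched in code. This is only an illustration: `encodeThreeByte` is a hypothetical helper written for this post and handles just the 3-byte range (U+0800..U+FFFF). The result is compared with the built-in `encodeURI`, the same check as in step (3).

```javascript
// Hypothetical helper: encode a code point in U+0800..U+FFFF
// as three UTF-8 bytes using the pattern 1110xxxx 10xxxxxx 10xxxxxx.
function encodeThreeByte(cp) {
  if (cp < 0x0800 || cp > 0xffff) {
    throw new RangeError('code point is outside the 3-byte range');
  }
  const b1 = 0xe0 | (cp >> 12);         // top 4 bits
  const b2 = 0x80 | ((cp >> 6) & 0x3f); // middle 6 bits
  const b3 = 0x80 | (cp & 0x3f);        // low 6 bits
  return [b1, b2, b3];
}

// Manual encoding of U+5927, written as hex digits without separators.
const manual = encodeThreeByte(0x5927)
  .map((b) => b.toString(16).toUpperCase())
  .join('');

// Built-in percent-encoding with the '%' signs stripped, as in step (3).
const builtin = encodeURI(String.fromCharCode(0x5927)).replace(/%/g, '');

console.log(manual);             // E5A4A7
console.log(manual === builtin); // true
```

Running this under Node.js or in a browser console avoids the address-bar trick, which some browsers block for javascript: URLs.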