William Jiang

JavaScript,PHP,Node,Perl,LAMP Web Developer – http://williamjxj.com; https://github.com/williamjxj?tab=repositories

Perl, unicode/utf8/gb2312 convert

Perl, unicode/utf8/gb2312 convert

Here is a helpful chinese article which summarizes Perl’s unicode/utf8/gb2312 transfer. I list here for quick retrieve:

use utf8;
use Encode;
use URI::Escape;

$\ = "\n";

#从unicode得到utf8编码
$str = '%u6536';
$str =~ s/\%u([0-9a-fA-F]{4})/pack("U",hex($1))/eg;
$str = encode( "utf8", $str );
print uc unpack( "H*", $str );

# 从unicode得到gb2312编码
$str = '%u6536';
$str =~ s/\%u([0-9a-fA-F]{4})/pack("U",hex($1))/eg;
$str = encode( "gb2312", $str );
print uc unpack( "H*", $str );

# 从中文得到utf8编码
$str = "收";
print uri_escape($str);

# 从utf8编码得到中文
$utf8_str = uri_escape("收");
print uri_unescape($str);

# 从中文得到perl unicode
utf8::decode($str);
@chars = split //, $str;
foreach (@chars) {
    printf "%x ", ord($_);
}

# 从中文得到标准unicode
$a = "汉语";
$a = decode( "utf8", $a );
map { print "\\u", sprintf( "%x", $_ ) } unpack( "U*", $a );

# 从标准unicode得到中文
$str = '%u6536';
$str =~ s/\%u([0-9a-fA-F]{4})/pack("U",hex($1))/eg;
$str = encode( "utf8", $str );
print $str;

# 从perl unicode得到中文
my $unicode = "\x{505c}\x{8f66}";
print encode( "utf8", $unicode );

Actually, to convert GB2312 to Unicode, then insert into MySQL Unicode_general_ci table, the following strange way might be more efficient:

use Encode;
$gb=decode("euc-cn","$gb");
$unicode=$dbh->quote($gb);
# to insert $unicode to MySQL unicode general_ci table.

It seems strange, but works fine. Others, like Encode:from_to(), Encode:encode() all don’t work.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: