Possible codec bug in MS1252 (the Cp1252 encoding)
Eric Liang
eric.l.2046 at gmail.com
Thu Sep 1 12:12:27 PDT 2011
Hi all,
I recently ran into an encoding error while using Cp1252 with UTF-8: a
string converted from UTF-8 to Cp1252 cannot be converted back:
String name1 = new String("兆源".getBytes("UTF-8"), "Cp1252");
String name2 = new String(name1.getBytes("Cp1252"), "UTF-8");
// name2 is not equal to the original "兆源"
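A minimal, self-contained sketch of the same round trip, in case anyone wants to reproduce it. The class name RoundTripCheck and the helper roundTrips are made up for illustration, and StandardCharsets assumes Java 7 or later (on jdk6, Charset.forName("UTF-8") would do the same job). The ISO-8859-1 line is only there for contrast, since that charset maps every byte value to a character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class RoundTripCheck {

    // Returns true if the UTF-8 bytes of `text` survive being decoded
    // and re-encoded with the given single-byte `carrier` charset.
    static boolean roundTrips(String text, Charset carrier) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        String misdecoded = new String(utf8, carrier);   // name1 in the snippet above
        byte[] back = misdecoded.getBytes(carrier);
        return text.equals(new String(back, StandardCharsets.UTF_8)); // name2
    }

    public static void main(String[] args) {
        String name = "\u5146\u6E90"; // the two characters from the report
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println("Cp1252:     " + roundTrips(name, cp1252));
        System.out.println("ISO-8859-1: " + roundTrips(name, StandardCharsets.ISO_8859_1));
    }
}
```

If the Cp1252 line prints false while the ISO-8859-1 line prints true, the loss happens in the Cp1252 decode/encode step rather than in the UTF-8 handling.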
It looks like the JDK has some incorrect mappings in its Cp1252 codec.
The affected codes are:
0x83 0x0192 ;Latin Small Letter F With Hook
0x8d 0x008d
0x8f 0x008f
0x90 0x0090
0x9d 0x009d
(from the Cp1252-to-Unicode map at
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt)
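To check which of these bytes a given JDK actually refuses to decode, something like the following could be used. It is only a sketch: the class UnmappedBytes and the method undecodable are hypothetical names, and it assumes a single-byte charset so that each byte can be decoded on its own.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class UnmappedBytes {

    // Lists the byte values in [from, to] that the charset's decoder
    // reports as malformed or unmappable instead of silently replacing.
    static List<Integer> undecodable(Charset cs, int from, int to) {
        List<Integer> result = new ArrayList<Integer>();
        for (int b = from; b <= to; b++) {
            CharsetDecoder decoder = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(new byte[] { (byte) b }));
            } catch (CharacterCodingException e) {
                result.add(b); // this byte has no mapping in the decoder
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        for (int b : undecodable(cp1252, 0x80, 0x9F)) {
            System.out.printf("0x%02x has no mapping%n", b);
        }
    }
}
```

Any byte it prints is one that new String(bytes, "Cp1252") would turn into the replacement character, which is why the original bytes cannot be recovered afterwards.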
After I cloned the repository at http://hg.openjdk.java.net/jdk6/jdk6
and fixed these mappings in MS1252.java, the encoding error went away.
I guess this is the right place to discuss this problem; the patch is
attached. Any comments are appreciated.
Regards,
Eric
-------------- next part --------------
A non-text attachment was scrubbed...
Name: codec_bug_on_MS1252.diff
Type: text/x-patch
Size: 2327 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/jdk6-dev/attachments/20110902/834fbece/attachment.bin