Possible codec bug in MS1252 (the Cp1252 encoding)

Eric Liang eric.l.2046 at gmail.com
Thu Sep 1 12:12:27 PDT 2011


Hi all,
I recently ran into an encoding error when using Cp1252 together with
UTF-8. A string whose UTF-8 bytes are decoded as Cp1252 cannot be
converted back:

    // Decode the UTF-8 bytes as Cp1252, then reverse both steps.
    String name1 = new String("兆源".getBytes("UTF-8"), "Cp1252");
    String name2 = new String(name1.getBytes("Cp1252"), "UTF-8");
    // name2 should equal "兆源" again, but it does not.
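
To make the failure easier to reproduce, here is a minimal sketch of a
round-trip check. The class name RoundTripCheck and the helper survives()
are made up for illustration; the only assumption about the JDK is that
it provides the windows-1252 charset (Cp1252 is an alias for it). The
string is written with Unicode escapes so the source file encoding does
not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripCheck {
    // Hypothetical helper: true if decoding the bytes with cs and then
    // re-encoding the resulting string gives back exactly the same bytes.
    public static boolean survives(byte[] original, Charset cs) {
        String decoded = new String(original, cs);
        return Arrays.equals(original, decoded.getBytes(cs));
    }

    public static void main(String[] args) {
        // "\u5146\u6E90" is the string from the report; its UTF-8 form is
        // E5 85 86 E6 BA 90, and the trailing 0x90 is the byte in question.
        byte[] utf8 = "\u5146\u6E90".getBytes(StandardCharsets.UTF_8);

        // ISO-8859-1 maps every byte value, so it serves as a control.
        System.out.println("ISO-8859-1:   " + survives(utf8, StandardCharsets.ISO_8859_1));

        // The result for Cp1252 depends on how the JDK maps byte 0x90.
        System.out.println("windows-1252: " + survives(utf8, Charset.forName("windows-1252")));
    }
}
```

If the JDK leaves 0x90 unmapped in Cp1252, the second check is expected
to report a failed round trip, which matches the behaviour described
above.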

It looks like some of the JDK's mappings for Cp1252 are incorrect. The
relevant entries are:

    0x83    0x0192    ;Latin Small Letter F With Hook
    0x8d    0x008d
    0x8f    0x008f
    0x90    0x0090
    0x9d    0x009d

    (from the Cp1252-to-Unicode map in
    http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt)
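
One way to see which of these bytes a given JDK leaves unmapped is to
decode each byte in the 0x80..0x9F range on its own and look for the
replacement character U+FFFD. This is only a sketch: the class name
UnmappedBytes and the method unmappedC1() are hypothetical, and the
output depends on the JDK build being tested.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class UnmappedBytes {
    // Hypothetical helper: collects the byte values in 0x80..0x9F that the
    // given charset decodes to the replacement character U+FFFD.
    public static List<Integer> unmappedC1(Charset cs) {
        List<Integer> result = new ArrayList<>();
        for (int b = 0x80; b <= 0x9F; b++) {
            String decoded = new String(new byte[] {(byte) b}, cs);
            if (decoded.equals("\uFFFD")) {
                result.add(b);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Print each byte that has no mapping in this JDK's Cp1252 table.
        for (int b : unmappedC1(Charset.forName("windows-1252"))) {
            System.out.printf("0x%02x%n", b);
        }
    }
}
```

Any byte printed here is lost when a string is decoded as Cp1252 and
encoded back, because U+FFFD cannot be encoded to the original byte.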

After I cloned the repository at http://hg.openjdk.java.net/jdk6/jdk6
and fixed these mappings in MS1252.java, the encoding error went away.

I guess this is the right place to discuss this problem. The patch is
attached, and any comments are appreciated.

Regards,
Eric


-------------- next part --------------
A non-text attachment was scrubbed...
Name: codec_bug_on_MS1252.diff
Type: text/x-patch
Size: 2327 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/jdk6-dev/attachments/20110902/834fbece/attachment.bin 

