RFR: 8211382 ISO2022JP and GB18030 NIO converter issues
Ichiroh Takiguchi
takiguc at linux.vnet.ibm.com
Tue Oct 2 09:21:50 UTC 2018
Hello,
IBM would like to contribute an NIO converter patch to the OpenJDK project.
Bug: https://bugs.openjdk.java.net/browse/JDK-8211382
Change: https://cr.openjdk.java.net/~itakiguchi/8211382/webrev.00/
Issue:
The ISO2022JP decoder and the GB18030 decoder (in decodeBufferLoop()) have
code range definition issues.
With ISO2022JP, the byte sequence 0x1B, 0x28, 0x49, 0x60, 0x1B, 0x28, 0x42
is decoded to \uFFA0.
ISO2022JP is a Japanese encoding, but \uFFA0 is a Korean Hangul character
(HALFWIDTH HANGUL FILLER).
With GB18030, \uFFFE is encoded to 0x84, 0x31, 0xA4, 0x38, but decoding
0x84, 0x31, 0xA4, 0x38 yields the replacement character \uFFFD instead of
\uFFFE.
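The \uFFA0 result in the ISO2022JP case is consistent with a missing upper bound check. In 7-bit JIS X 0201 Katakana, bytes 0x21 to 0x5F correspond to U+FF61 to U+FF9F, which is an offset of 0xFF40, so 0x60 plus that offset lands on U+FFA0. The following is only a sketch of that arithmetic: the class KanaRangeSketch and the method mapKana are hypothetical names, and the offset-based mapping is an assumption about how such a decoder could work, not the actual JDK code or the proposed patch.

```java
public class KanaRangeSketch {
    // Unicode replacement character, used here for out-of-range input.
    static final char REPLACEMENT = (char) 0xFFFD;

    // Hypothetical helper: maps a 7-bit JIS X 0201 Katakana byte to its
    // halfwidth katakana character, rejecting bytes outside 0x21..0x5F.
    public static char mapKana(int b) {
        if (b < 0x21 || b > 0x5F) {
            return REPLACEMENT;
        }
        return (char) (b + 0xFF40);
    }

    public static void main(String[] args) {
        // Last valid katakana byte.
        System.out.printf("0x5F checked:   U+%04X%n", (int) mapKana(0x5F));
        // First byte past the valid range, with and without the check.
        System.out.printf("0x60 checked:   U+%04X%n", (int) mapKana(0x60));
        System.out.printf("0x60 unchecked: U+%04X%n", 0x60 + 0xFF40);
    }
}
```

With the range check in place, 0x60 is treated as invalid input instead of being mapped into the halfwidth Hangul block.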
Current result
$ java Test1
\uFFA0
\uFFFD
Expected result
$ java Test1
\uFFFD
\uFFFE
The test case is as follows:
========================
$ cat Test1.java
import java.nio.*;
import java.nio.charset.*;

public class Test1 {
    public static void main(String[] args) throws Exception {
        {
            // ISO2022JP: ESC ( I selects JIS X 0201 Katakana, then 0x60,
            // then ESC ( B switches back to ASCII.
            byte[] ba = new byte[] {0x1B, 0x28, 0x49, 0x60, 0x1B, 0x28, 0x42};
            for (char ch : (new String(ba, "ISO2022JP")).toCharArray()) {
                System.out.printf("\\u%04X", (int) ch);
            }
            System.out.println();
        }
        {
            // GB18030: encode U+FFFE, then decode it from a direct buffer
            // so that decodeBufferLoop() is exercised.
            Charset cs = Charset.forName("GB18030");
            CharsetDecoder cd = cs.newDecoder();
            cd.onMalformedInput(CodingErrorAction.REPLACE)
              .onUnmappableCharacter(CodingErrorAction.REPLACE);
            byte[] ba = "\uFFFE".getBytes(cs);
            ByteBuffer bb = ByteBuffer.allocateDirect(ba.length);
            bb.put(ByteBuffer.wrap(ba));
            bb.position(0);
            CharBuffer cb = cd.decode(bb);
            for (int i = 0; i < cb.limit(); i++) {
                System.out.printf("\\u%04X", (int) cb.get(i));
            }
            System.out.println();
        }
    }
}
========================
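Because the GB18030 problem is reported for decodeBufferLoop(), it can help to decode the same bytes from both a heap buffer and a direct buffer and compare the results. The sketch below does that. It assumes that a heap ByteBuffer is handled by the array-based loop while a direct ByteBuffer goes through decodeBufferLoop(). The class name Gb18030PathCheck and the helpers hex and decode are hypothetical and are not part of the webrev.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class Gb18030PathCheck {
    // Formats each char as a backslash-u escape with four hex digits.
    public static String hex(CharSequence cs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cs.length(); i++) {
            sb.append(String.format("\\u%04X", (int) cs.charAt(i)));
        }
        return sb.toString();
    }

    // Decodes bytes from either a heap or a direct ByteBuffer.
    // Assumption: the direct buffer has no backing array, so the decoder
    // has to use its buffer-based loop for it.
    public static String decode(Charset cs, byte[] bytes, boolean direct) {
        ByteBuffer bb = direct
                ? ByteBuffer.allocateDirect(bytes.length)
                : ByteBuffer.allocate(bytes.length);
        bb.put(bytes);
        bb.flip();
        CharsetDecoder cd = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return hex(cd.decode(bb));
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE, but decode() declares it.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset cs = Charset.forName("GB18030");
        byte[] bytes = "\uFFFE".getBytes(cs);
        System.out.println("heap:   " + decode(cs, bytes, false));
        System.out.println("direct: " + decode(cs, bytes, true));
    }
}
```

If the two lines differ, the discrepancy is specific to one decoding path rather than to the GB18030 mapping itself.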
I'd like to obtain a sponsor for this issue.
Thanks,
Ichiroh Takiguchi
IBM Japan, Ltd.