RFR: 8211382 ISO2022JP and GB18030 NIO converter issues
Hello,

IBM would like to contribute an NIO converter patch to the OpenJDK project.

Bug: https://bugs.openjdk.java.net/browse/JDK-8211382
Change: https://cr.openjdk.java.net/~itakiguchi/8211382/webrev.00/

Issue:
The ISO2022JP decoder and the GB18030 decoder (in decodeBufferLoop()) have code range definition issues.

ISO2022JP: the byte sequence 0x1B, 0x28, 0x49, 0x60, 0x1B, 0x28, 0x42 is converted to \uFFA0. ISO2022JP is a Japanese encoding, but \uFFA0 (HALFWIDTH HANGUL FILLER) belongs to the halfwidth Korean Hangul range.

GB18030: \uFFFE is encoded as 0x84, 0x31, 0xA4, 0x38, but 0x84, 0x31, 0xA4, 0x38 is decoded to the replacement character \uFFFD.

Current result:
$ java Test1
\uFFA0
\uFFFD

Expected result:
$ java Test1
\uFFFD
\uFFFE

Testcase is as follows:
========================
$ cat Test1.java
import java.nio.*;
import java.nio.charset.*;

public class Test1 {
    public static void main(String[] args) throws Exception {
        {
            // ISO2022JP: ESC ( I switches to JIS X 0201 Katakana; 0x60 lies
            // outside the Katakana range 0x21-0x5F; ESC ( B switches back.
            byte[] ba = new byte[] {0x1B, 0x28, 0x49, 0x60, 0x1B, 0x28, 0x42,};
            for (char ch : (new String(ba, "ISO2022JP")).toCharArray()) {
                System.out.printf("\\u%04X", (int) ch);
            }
            System.out.println();
        }
        {
            // GB18030: encode U+FFFE, then decode it from a direct buffer
            // (no backing array), which exercises decodeBufferLoop().
            Charset cs = Charset.forName("GB18030");
            CharsetDecoder cd = cs.newDecoder();
            cd.onMalformedInput(CodingErrorAction.REPLACE)
              .onUnmappableCharacter(CodingErrorAction.REPLACE);
            byte[] ba = "\uFFFE".getBytes(cs);
            ByteBuffer bb = ByteBuffer.allocateDirect(ba.length);
            bb.put(ByteBuffer.wrap(ba));
            bb.position(0);
            CharBuffer cb = cd.decode(bb);
            for (int i = 0; i < cb.limit(); i++) {
                System.out.printf("\\u%04X", (int) cb.get(i));
            }
            System.out.println();
        }
    }
}
========================

I'd like to obtain a sponsor for this issue.

Thanks,
Ichiroh Takiguchi
IBM Japan, Ltd.
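Both symptoms come down to range arithmetic. The Java sketch below is illustrative only and rests on stated assumptions: (a) in ISO2022JP's ESC ( I mode, JIS X 0201 Katakana bytes 0x21..0x5F map to U+FF61..U+FF9F by adding 0xFF40, so 0x60 lands on U+FFA0 unless it is range-checked; (b) the usual GB18030 four-byte linear index, under which 0x84, 0x31, 0xA4, 0x39 (index 39419) is the last four-byte code for the BMP (U+FFFF), so 0x84, 0x31, 0xA4, 0x38 (index 39418) must still be accepted. The helpers decodeKana, gb18030LinearIndex and decode are hypothetical; they are not the JDK decoder code or the webrev. The sketch also assumes, as is typical for the sun.nio.cs decoders, that heap buffers are handled by decodeArrayLoop() and direct buffers by decodeBufferLoop(), so decoding the same bytes both ways exposes a mistake that exists in only one loop.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class CodeRangeSketch {

    // Assumed layout: in ISO2022JP's "ESC ( I" (JIS X 0201 Katakana) mode,
    // bytes 0x21..0x5F map one-to-one onto U+FF61..U+FF9F.
    static final int KANA_FIRST = 0x21;
    static final int KANA_LAST = 0x5F;
    static final int KANA_OFFSET = 0xFF40;

    // Hypothetical helper: returns U+FFFD for bytes outside the Katakana
    // range instead of adding the offset blindly (0x60 + 0xFF40 = U+FFA0).
    static char decodeKana(int b) {
        if (b < KANA_FIRST || b > KANA_LAST) {
            return '\uFFFD';
        }
        return (char) (b + KANA_OFFSET);
    }

    // Hypothetical helper: linear index of a GB18030 four-byte sequence
    // b1 b2 b3 b4 (b1, b3 in 0x81..0xFE; b2, b4 in 0x30..0x39).
    static int gb18030LinearIndex(int b1, int b2, int b3, int b4) {
        return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10
                + (b4 - 0x30);
    }

    // Decodes through a heap buffer (hasArray() is true) or a direct buffer
    // (hasArray() is false); malformed and unmappable input is replaced.
    static String decode(Charset cs, byte[] bytes, boolean direct) {
        CharsetDecoder cd = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer bb;
        if (direct) {
            bb = ByteBuffer.allocateDirect(bytes.length);
            bb.put(bytes);
            bb.flip();
        } else {
            bb = ByteBuffer.wrap(bytes);
        }
        try {
            CharBuffer cb = cd.decode(bb);
            return cb.toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions; rethrown as unchecked.
            throw new UncheckedIOException(e);
        }
    }

    // Formats each char as "U+XXXX", separated by spaces.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.printf("U+%04X%n", (int) decodeKana(0x5F)); // U+FF9F
        System.out.printf("U+%04X%n", (int) decodeKana(0x60)); // U+FFFD
        System.out.println(gb18030LinearIndex(0x84, 0x31, 0xA4, 0x38)); // 39418
        System.out.println(gb18030LinearIndex(0x84, 0x31, 0xA4, 0x39)); // 39419

        byte[] jis = {0x1B, 0x28, 0x49, 0x60, 0x1B, 0x28, 0x42};
        byte[] gb = {(byte) 0x84, 0x31, (byte) 0xA4, 0x38};
        for (String name : new String[] {"ISO2022JP", "GB18030"}) {
            Charset cs = Charset.forName(name);
            byte[] input = name.equals("GB18030") ? gb : jis;
            // Heap path vs direct path (assumed: array loop vs buffer loop).
            String heap = decode(cs, input, false);
            String direct = decode(cs, input, true);
            System.out.println(name + ": heap=" + hex(heap) + " direct=" + hex(direct)
                    + (heap.equals(direct) ? "" : "  <-- paths disagree"));
        }
    }
}
```

On a JDK with the fix applied, the heap and direct results are expected to match for both charsets; a "paths disagree" marker would point at one of the two loops, which is the kind of "typo" discussed in this thread.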
+1

-Sherman

btw, since gb18030.decodeArrayLoop() is right, I would assume it's just a "typo" in decodeBufferLoop().

On 10/2/18, 2:21 AM, Ichiroh Takiguchi wrote:
Hello.

An additional reviewer is required. As Sherman explained, it's a typo issue.

Thanks,
Ichiroh Takiguchi
IBM Japan, Ltd.

On 2018-10-03 07:01, Xueming Shen wrote:
+1, looks fine.

If you need a sponsor, I can.

Regards, Roger

On 10/30/18 1:32 PM, Ichiroh Takiguchi wrote:
participants (3)
- Ichiroh Takiguchi
- Roger Riggs
- Xueming Shen