RFR: 8211382 ISO2022JP and GB18030 NIO converter issues

2 Oct 2018

Hello,
IBM would like to contribute an NIO converter patch to the OpenJDK project.

Bug:    https://bugs.openjdk.java.net/browse/JDK-8211382
Change: https://cr.openjdk.java.net/~itakiguchi/8211382/webrev.00/

Issue:
The ISO2022JP decoder and the GB18030 decoder (in decodeBufferLoop()) have 
code range definition issues.

ISO2022JP: the byte sequence 0x1B, 0x28, 0x49, 0x60, 0x1B, 0x28, 0x42 is 
decoded to \uFFA0.
ISO2022JP is a Japanese encoding, but \uFFA0 is a Korean Hangul character 
(HALFWIDTH HANGUL FILLER).
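
For reference, here is a minimal sketch of the kind of range check involved. It assumes the decoder maps 7-bit JIS X 0201 katakana bytes by adding a fixed offset of 0xFF40, so that 0x21..0x5F land on \uFF61..\uFF9F. The class and method names are hypothetical, and this is not the code in the webrev.

```java
public class KatakanaRange {
    // Assumed offset between 7-bit JIS X 0201 katakana bytes and U+FF61..U+FF9F.
    static final int OFFSET = 0xFF40;

    // Hypothetical helper: maps one byte received after ESC ( I.
    // Bytes outside 0x21..0x5F are treated as undecodable.
    static char decodeKana(int b) {
        if (b < 0x21 || b > 0x5F) {
            return '\uFFFD';
        }
        return (char) (b + OFFSET);
    }

    public static void main(String[] args) {
        // Last valid katakana byte: prints U+FF9F.
        System.out.printf("U+%04X%n", (int) decodeKana(0x5F));
        // One past the range: prints U+FFFD.
        System.out.printf("U+%04X%n", (int) decodeKana(0x60));
        // What an unchecked offset addition gives for 0x60: prints U+FFA0.
        System.out.printf("U+%04X%n", 0x60 + OFFSET);
    }
}
```

With an upper bound of 0x5F, the byte 0x60 from the sequence above falls outside the katakana range instead of landing in the Hangul block.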

GB18030: \uFFFE is encoded to 0x84, 0x31, 0xA4, 0x38, but decoding 
0x84, 0x31, 0xA4, 0x38 yields the replacement character \uFFFD instead of 
\uFFFE.
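
The arithmetic behind that four-byte sequence can be sketched as follows. This assumes the usual GB18030 four-byte layout (first and third bytes 0x81..0xFE, second and fourth bytes 0x30..0x39). The class and method names are hypothetical and are not taken from the JDK sources.

```java
public class Gb18030Index {
    // Hypothetical helper: linear index of a GB18030 four-byte sequence,
    // counted from 0x81 0x30 0x81 0x30.
    static int linearIndex(int b1, int b2, int b3, int b4) {
        return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10
                + (b4 - 0x30);
    }

    public static void main(String[] args) {
        int first = linearIndex(0x81, 0x30, 0x81, 0x30); // start of the four-byte area
        int fffe  = linearIndex(0x84, 0x31, 0xA4, 0x38); // the sequence from this report
        int last  = linearIndex(0x84, 0x31, 0xA4, 0x39); // next sequence, one index higher
        // prints: 0 39418 39419
        System.out.println(first + " " + fffe + " " + last);
    }
}
```

The sequence for \uFFFE sits one index below 0x84, 0x31, 0xA4, 0x39, so a range check with a slightly too small upper bound would reject it.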

Current result
$ java Test1
\uFFA0
\uFFFD

Expected result
$ java Test1
\uFFFD
\uFFFE

The test case is as follows:
========================
$ cat Test1.java
import java.nio.*;
import java.nio.charset.*;

public class Test1 {
  public static void main(String[] args) throws Exception {
    {
      byte[] ba = new byte[] {0x1B, 0x28, 0x49, 0x60, 0x1B, 0x28, 0x42};
      for(char ch : (new String(ba, "ISO2022JP")).toCharArray()) {
        System.out.printf("\\u%04X",(int)ch);
      }
      System.out.println();
    }
    {
      Charset cs = Charset.forName("GB18030");
      CharsetDecoder cd = cs.newDecoder();
      cd.onMalformedInput(CodingErrorAction.REPLACE)
        .onUnmappableCharacter(CodingErrorAction.REPLACE);
      byte[] ba = "\uFFFE".getBytes(cs);
      ByteBuffer bb = ByteBuffer.allocateDirect(ba.length);
      bb.put(ByteBuffer.wrap(ba));
      bb.position(0);
      CharBuffer cb = cd.decode(bb);
      for(int i=0; i<cb.limit(); i++) {
        System.out.printf("\\u%04X",(int)cb.get(i));
      }
      System.out.println();
    }
  }
}
========================
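
Since the GB18030 problem is reported for decodeBufferLoop(), it may also help to decode the same bytes through a heap buffer and a direct buffer and compare the two. The sketch below assumes that a direct ByteBuffer (which has no accessible backing array) is what routes the decoder to the buffer loop. BufferPathCheck, decode() and hex() are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class BufferPathCheck {
    // Decodes bytes with the named charset from a heap or a direct ByteBuffer.
    static String decode(String charsetName, byte[] bytes, boolean direct) {
        CharsetDecoder cd = Charset.forName(charsetName).newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer bb = direct
            ? ByteBuffer.allocateDirect(bytes.length)
            : ByteBuffer.allocate(bytes.length);
        bb.put(bytes);
        bb.flip(); // make the written bytes readable from position 0
        try {
            return cd.decode(bb).toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions; rethrown unchecked for brevity.
            throw new IllegalStateException(e);
        }
    }

    // Formats each char as a backslash-u escape, as Test1 does.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char ch : s.toCharArray()) {
            sb.append(String.format("\\u%04X", (int) ch));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] ba = {(byte) 0x84, 0x31, (byte) 0xA4, 0x38};
        System.out.println("heap:   " + hex(decode("GB18030", ba, false)));
        System.out.println("direct: " + hex(decode("GB18030", ba, true)));
    }
}
```

If the two lines differ, the array loop and the buffer loop disagree on the same input. The actual output depends on the JDK build being tested.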

I'd like to obtain a sponsor for this issue.

Thanks,
Ichiroh Takiguchi
IBM Japan, Ltd.
