<i18n dev> RFR: 8291916: Unexpected output on Arabic Windows command prompt

Ichiroh Takiguchi itakiguchi at openjdk.org
Mon Aug 8 01:37:38 UTC 2022


On Fri, 5 Aug 2022 16:44:37 GMT, Naoto Sato <naoto at openjdk.org> wrote:

>> To support Windows command prompt's codepage, following charsets should be moved from jdk.charsets module to java.base module.
>> 
>> - IBM860
>> - IBM861
>> - IBM863
>> - IBM864
>> - IBM865
>> - IBM869
>
> Hi @takiguc,
> I am not quite sure what is the rationale for moving those charsets into `java.base` module. IIUC, we typically did such a fix when the java runtime cannot boot in a supported configuration (https://bugs.openjdk.org/browse/JDK-8187910), but it seems that this issue does not warrant such a requirement. Will you elaborate more?

Hello @naotoj.
As Alan described, the Windows code page mapping table is as follows:

- 860 - Portuguese (DOS) - IBM860
- 861 - Icelandic (DOS) - IBM861
- 863 - French Canadian (DOS) - IBM863
- 864 - Arabic (864) - IBM864
- 865 - Nordic (DOS) - IBM865
- 869 - Greek, Modern (DOS) - IBM869
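
For reference, the mapping above can be checked against a given runtime with a small program like the one below. It is only a sketch and not part of the PR: the class name CodePageCharsets and the helper charsetNameFor are made up for illustration, and Class.getModule() needs Java 9 or later. It prints whether each charset is available and which module provides it.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the JDK or of the PR.
final class CodePageCharsets {
    // Windows code page number -> Java charset name, copied from the list above.
    static final Map<Integer, String> CODE_PAGES = new LinkedHashMap<>();
    static {
        CODE_PAGES.put(860, "IBM860");
        CODE_PAGES.put(861, "IBM861");
        CODE_PAGES.put(863, "IBM863");
        CODE_PAGES.put(864, "IBM864");
        CODE_PAGES.put(865, "IBM865");
        CODE_PAGES.put(869, "IBM869");
    }

    // Returns the charset name for a code page, or null if it is not in the list.
    static String charsetNameFor(int codePage) {
        return CODE_PAGES.get(codePage);
    }

    public static void main(String[] args) {
        for (Map.Entry<Integer, String> e : CODE_PAGES.entrySet()) {
            String name = e.getValue();
            if (Charset.isSupported(name)) {
                // getModule() requires Java 9+; the name shows which module supplies the charset.
                String module = Charset.forName(name).getClass().getModule().getName();
                System.out.printf("%d -> %s (module %s)%n", e.getKey(), name, module);
            } else {
                System.out.printf("%d -> %s (not available in this runtime)%n", e.getKey(), name);
            }
        }
    }
}
```

On a runtime image built without the jdk.charsets module, I would expect the "not available" branch to be taken for all six entries unless the charsets are moved to java.base.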

The Java 8 behavior is as follows.
Windows command prompt setting (this sample uses code page 864):

>chcp 864
Active code page: 864

Test program

>type termdump.java
import java.nio.charset.*;

public class termdump {
  public static void main(String[] args) throws Exception {
    String csname = System.getProperty("sun.stdout.encoding");
    if (csname == null) csname = System.getProperty("stdout.encoding");
    System.out.println(csname);
    Charset cs = Charset.forName(csname);
    for (int i0 = 0; i0 < 0x100; i0 += 0x10) {
      StringBuilder sb = new StringBuilder();
      for (int i1 = 0; i1 < 0x10; i1++) {
        byte[] ba = new byte[1];
        ba[0] = (byte) (i0 | i1);
        String s = new String(ba, csname);
        if (s.length() == 1) {
          char ch = s.charAt(0);
          if (ch < 0x7F) continue;
          if (Character.isISOControl(ch)) continue;
          if (ch == '\uFFFD') continue;
          sb.append(ch);
        }
      }
      if (sb.length() > 0) {
        System.out.printf("0x%02X %s%n", i0, sb.toString());
        System.out.print("    ");
        for (char ch : sb.toString().toCharArray()) {
          System.out.printf(" %04X", (int)ch);
        }
        System.out.println();
      }
    }
  }
}

Java 8 output

>jdk8u345-b01\jre\bin\java termdump
cp864
0x20 %
     066A
0x80 °·∙√▒─│┼┤┬├┴┐┌└┘
     00B0 00B7 2219 221A 2592 2500 2502 253C 2524 252C 251C 2534 2510 250C 2514 2518
0x90 β∞φ±½¼≈«»ﻷﻸﻻﻼ
     03B2 221E 03C6 00B1 00BD 00BC 2248 00AB 00BB FEF7 FEF8 FEFB FEFC
0xA0  ­ﺂ£¤ﺄﺎﺏﺕﺙ،ﺝﺡﺥ
     00A0 00AD FE82 00A3 00A4 FE84 FE8E FE8F FE95 FE99 060C FE9D FEA1 FEA5
0xB0 ٠١٢٣٤٥٦٧٨٩ﻑ؛ﺱﺵﺹ؟
     0660 0661 0662 0663 0664 0665 0666 0667 0668 0669 FED1 061B FEB1 FEB5 FEB9 061F
0xC0 ¢ﺀﺁﺃﺅﻊﺋﺍﺑﺓﺗﺛﺟﺣﺧﺩ
     00A2 FE80 FE81 FE83 FE85 FECA FE8B FE8D FE91 FE93 FE97 FE9B FE9F FEA3 FEA7 FEA9
0xD0 ﺫﺭﺯﺳﺷﺻﺿﻁﻅﻋﻏ¦¬÷×ﻉ
     FEAB FEAD FEAF FEB3 FEB7 FEBB FEBF FEC1 FEC5 FECB FECF 00A6 00AC 00F7 00D7 FEC9
0xE0 ـﻓﻗﻛﻟﻣﻧﻫﻭﻯﻳﺽﻌﻎﻍﻡ
     0640 FED3 FED7 FEDB FEDF FEE3 FEE7 FEEB FEED FEEF FEF3 FEBD FECC FECE FECD FEE1
0xF0 ﹽّﻥﻩﻬﻰﻲﻐﻕﻵﻶﻝﻙﻱ■
     FE7D 0651 FEE5 FEE9 FEEC FEF0 FEF2 FED0 FED5 FEF5 FEF6 FEDD FED9 FEF1 25A0

Java 20 output

>jdk-20\bin\java termdump
cp864
0x20 ﻋﺕ
     066A
0x80 ﺁ٠ﺁ٧ﻗ┤ﻷﻗ┤ﻸﻗ≈φﻗ½°ﻗ½∙ﻗ½ﺱﻗ½¤ﻗ½،ﻗ½œﻗ½٤ﻗ½βﻗ½┐ﻗ½½ﻗ½»
     00B0 00B7 2219 221A 2592 2500 2502 253C 2524 252C 251C 2534 2510 250C 2514 2518
0x90 ﺧ٢ﻗ┤ﻼﺩ│ﺁ١ﺁﺵﺁﺱﻗ┬┤ﺁﺙﺁ؛ﻡ؛٧ﻡ؛٨ﻡ؛؛ﻡ؛ﺱ
     03B2 221E 03C6 00B1 00BD 00BC 2248 00AB 00BB FEF7 FEF8 FEFB FEFC
0xA0 ﺁ ﺁﺝﻡﻑ∙ﺁ£ﺁ¤ﻡﻑ▒ﻡﻑ└ﻡﻑ┘ﻡﻑ¼ﻡﻑﻷﻅ┐ﻡﻑﻻﻡﻑ­ﻡﻑﺄ
     00A0 00AD FE82 00A3 00A4 FE84 FE8E FE8F FE95 FE99 060C FE9D FEA1 FEA5
0xB0 ﻋ ﻋ­ﻋﺂﻋ£ﻋ¤ﻋﺄﻋﻋﻋﺎﻋﺏﻡ؛∞ﻅ›ﻡﻑ١ﻡﻑ٥ﻡﻑ٩ﻅŸ
     0660 0661 0662 0663 0664 0665 0666 0667 0668 0669 FED1 061B FEB1 FEB5 FEB9 061F
0xC0 ﺁﺂﻡﻑ°ﻡﻑ·ﻡﻑ√ﻡﻑ─ﻡ؛├ﻡﻑ┴ﻡﻑ┌ﻡﻑ∞ﻡﻑ±ﻡﻑ«ﻡﻑ›ﻡﻑŸﻡﻑ£ﻡﻑﻡﻑﺏ
     00A2 FE80 FE81 FE83 FE85 FECA FE8B FE8D FE91 FE93 FE97 FE9B FE9F FEA3 FEA7 FEA9
0xD0 ﻡﻑﺙﻡﻑﺝﻡﻑﺥﻡﻑ٣ﻡﻑ٧ﻡﻑ؛ﻡﻑ؟ﻡ؛·ﻡ؛─ﻡ؛┴ﻡ؛┘ﺁﺁ،ﺃ٧ﺃ«ﻡ؛┬
     FEAB FEAD FEAF FEB3 FEB7 FEBB FEBF FEC1 FEC5 FECB FECF 00A6 00AC 00F7 00D7 FEC9
0xE0 ﻋ°ﻡ؛±ﻡ؛«ﻡ؛›ﻡ؛Ÿﻡ؛£ﻡ؛ﻡ؛ﺙﻡ؛ﺝﻡ؛ﺥﻡ؛٣ﻡﻑﺵﻡ؛┐ﻡ؛└ﻡ؛┌ﻡ؛­
     0640 FED3 FED7 FEDB FEDF FEE3 FEE7 FEEB FEED FEEF FEF3 FEBD FECC FECE FECD FEE1
0xF0 ﻡ٩ﺵﻋ∞ﻡ؛ﺄﻡ؛ﺏﻡ؛،ﻡ؛٠ﻡ؛٢ﻡ؛βﻡ؛¼ﻡ؛٥ﻡ؛٦ﻡ؛ﻻﻡ؛ﻷﻡ؛١ﻗ≈ 
     FE7D 0651 FEE5 FEE9 FEEC FEF0 FEF2 FED0 FED5 FEF5 FEF6 FEDD FED9 FEF1 25A0

Fixed output

>java -showversion termdump
openjdk version "20-internal" 2023-03-21
OpenJDK Runtime Environment (build 20-internal-adhoc.Administrator.jdk)
OpenJDK 64-Bit Server VM (build 20-internal-adhoc.Administrator.jdk, mixed mode, sharing)
cp864
0x20 %
     066A
0x80 °·∙√▒─│┼┤┬├┴┐┌└┘
     00B0 00B7 2219 221A 2592 2500 2502 253C 2524 252C 251C 2534 2510 250C 2514 2518
0x90 β∞φ±½¼≈«»ﻷﻸﻻﻼ
     03B2 221E 03C6 00B1 00BD 00BC 2248 00AB 00BB FEF7 FEF8 FEFB FEFC
0xA0  ­ﺂ£¤ﺄﺎﺏﺕﺙ،ﺝﺡﺥ
     00A0 00AD FE82 00A3 00A4 FE84 FE8E FE8F FE95 FE99 060C FE9D FEA1 FEA5
0xB0 ٠١٢٣٤٥٦٧٨٩ﻑ؛ﺱﺵﺹ؟
     0660 0661 0662 0663 0664 0665 0666 0667 0668 0669 FED1 061B FEB1 FEB5 FEB9 061F
0xC0 ¢ﺀﺁﺃﺅﻊﺋﺍﺑﺓﺗﺛﺟﺣﺧﺩ
     00A2 FE80 FE81 FE83 FE85 FECA FE8B FE8D FE91 FE93 FE97 FE9B FE9F FEA3 FEA7 FEA9
0xD0 ﺫﺭﺯﺳﺷﺻﺿﻁﻅﻋﻏ¦¬÷×ﻉ
     FEAB FEAD FEAF FEB3 FEB7 FEBB FEBF FEC1 FEC5 FECB FECF 00A6 00AC 00F7 00D7 FEC9
0xE0 ـﻓﻗﻛﻟﻣﻧﻫﻭﻯﻳﺽﻌﻎﻍﻡ
     0640 FED3 FED7 FEDB FEDF FEE3 FEE7 FEEB FEED FEEF FEF3 FEBD FECC FECE FECD FEE1
0xF0 ﹽّﻥﻩﻬﻰﻲﻐﻕﻵﻶﻝﻙﻱ■
     FE7D 0651 FEE5 FEE9 FEEC FEF0 FEF2 FED0 FED5 FEF5 FEF6 FEDD FED9 FEF1 25A0

The output for code page 863 is as follows (Java 8 first, then Java 20):

>chcp 863
Active code page: 863

>jdk8u345-b01\jre\bin\java termdump
cp863
0x80 ÇüéâÂà¶çêëèïî‗À§
     00C7 00FC 00E9 00E2 00C2 00E0 00B6 00E7 00EA 00EB 00E8 00EF 00EE 2017 00C0 00A7
0x90 ÉÈÊôËÏûù¤ÔÜ¢£ÙÛƒ
     00C9 00C8 00CA 00F4 00CB 00CF 00FB 00F9 00A4 00D4 00DC 00A2 00A3 00D9 00DB 0192
0xA0 ¦´óú¨¸³¯Î⌐¬½¼¾«»
     00A6 00B4 00F3 00FA 00A8 00B8 00B3 00AF 00CE 2310 00AC 00BD 00BC 00BE 00AB 00BB
0xB0 ░▒▓│┤╡╢╖╕╣║╗╝╜╛┐
     2591 2592 2593 2502 2524 2561 2562 2556 2555 2563 2551 2557 255D 255C 255B 2510
0xC0 └┴┬├─┼╞╟╚╔╩╦╠═╬╧
     2514 2534 252C 251C 2500 253C 255E 255F 255A 2554 2569 2566 2560 2550 256C 2567
0xD0 ╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀
     2568 2564 2565 2559 2558 2552 2553 256B 256A 2518 250C 2588 2584 258C 2590 2580
0xE0 αßΓπΣσµτΦΘΩδ∞φε∩
     03B1 00DF 0393 03C0 03A3 03C3 00B5 03C4 03A6 0398 03A9 03B4 221E 03C6 03B5 2229
0xF0 ≡±≥≤⌠⌡÷≈°∙·√ⁿ²■ 
     2261 00B1 2265 2264 2320 2321 00F7 2248 00B0 2219 00B7 221A 207F 00B2 25A0 00A0

>jdk-20\bin\java termdump
cp863
0x80 ├ç├╝├⌐├ó├é├¦┬╢├¯├¬├½├Î├»├«ΓÇù├Ç┬¯
     00C7 00FC 00E9 00E2 00C2 00E0 00B6 00E7 00EA 00EB 00E8 00EF 00EE 2017 00C0 00A7
0x90 ├ë├ê├è├┤├ï├§├╗├╣┬¨├Ë├£┬ó┬ú├Ô├¢╞Ê
     00C9 00C8 00CA 00F4 00CB 00CF 00FB 00F9 00A4 00D4 00DC 00A2 00A3 00D9 00DB 0192
0xA0 ┬³┬┤├│├║┬Î┬╕┬│┬»├ÀΓîÉ┬¼┬╜┬╝┬╛┬½┬╗
     00A6 00B4 00F3 00FA 00A8 00B8 00B3 00AF 00CE 2310 00AC 00BD 00BC 00BE 00AB 00BB
0xB0 ΓûÈΓûÊΓûôΓËéΓ˨ΓÏ´ΓÏóΓÏûΓÏÏΓÏúΓÏÈΓÏùΓÏÙΓÏ£ΓÏ¢ΓËÉ
     2591 2592 2593 2502 2524 2561 2562 2556 2555 2563 2551 2557 255D 255C 255B 2510
0xC0 ΓËËΓË┤Γ˼ΓË£ΓËÇΓË╝ΓÏÛΓσΓÏÜΓÏËΓÏ⌐ΓϳΓϦΓÏÉΓϼΓϯ
     2514 2534 252C 251C 2500 253C 255E 255F 255A 2554 2569 2566 2560 2550 256C 2567
0xD0 ΓÏÎΓϨΓϸΓÏÔΓϤΓÏÊΓÏôΓϽΓϬΓˤΓËîΓûêΓûÂΓûîΓûÉΓûÇ
     2568 2564 2565 2559 2558 2552 2553 256B 256A 2518 250C 2588 2584 258C 2590 2580
0xE0 ╬▒├ƒ╬ô╧Ç╬ú╧â┬╡╧Â╬³╬¤╬⌐╬┤ΓêÛ╧¶╬╡Γê⌐
     03B1 00DF 0393 03C0 03A3 03C3 00B5 03C4 03A6 0398 03A9 03B4 221E 03C6 03B5 2229
0xF0 Γë´┬▒Γë¸Γë¨Γî¦Γî´├╖Γëê┬░ΓêÔ┬╖ΓêÜΓü┐┬▓Γû¦┬¦
     2261 00B1 2265 2264 2320 2321 00F7 2248 00B0 2219 00B7 221A 207F 00B2 25A0 00A0
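
The garbled Java 20 lines above look consistent with UTF-8 encoded bytes being displayed in the console code page. This is my reading of the output, not something I have confirmed in the JDK sources. A minimal sketch to reproduce the effect (the class name MojibakeDemo and the method misdecode are hypothetical, and IBM863 must be available in the runtime):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK or of the PR.
final class MojibakeDemo {
    // Encodes text as UTF-8, then decodes those bytes with the console charset,
    // which is what a console does when it receives UTF-8 it does not expect.
    static String misdecode(String text, String consoleCharset) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName(consoleCharset));
    }

    public static void main(String[] args) {
        String console = "IBM863";
        if (!Charset.isSupported(console)) {
            System.out.println(console + " is not available in this runtime");
            return;
        }
        // U+00C7 is the first character of the 0x80 row for cp863.
        String garbled = misdecode("\u00C7", console);
        for (char ch : garbled.toCharArray()) {
            System.out.printf(" %04X", (int) ch);
        }
        System.out.println();
    }
}
```

U+00C7 is C3 87 in UTF-8. Going by the Java 8 table above, 0xC3 is 251C and 0x87 is 00E7 in cp863, so this should print 251C 00E7, which matches the first two characters of the 0x80 row in the Java 20 output.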

We are in the final phase of migrating from Java 8 to the next stable Java release.
I think it should be fixed.

-------------

PR: https://git.openjdk.org/jdk/pull/9761

