RFR: 8291916: Unexpected output on Windows command prompt

Ichiroh Takiguchi itakiguchi at openjdk.org
Fri Sep 9 06:29:44 UTC 2022


On Tue, 9 Aug 2022 20:38:25 GMT, Naoto Sato <naoto at openjdk.org> wrote:

>> To support Windows command prompt's codepage, following charsets should be moved from jdk.charsets module to java.base module.
>> 
>> - IBM860
>> - IBM861
>> - IBM863
>> - IBM864
>> - IBM865
>> - IBM869
>
> I looked at this issue a bit more. It looks to me that the issue is caused by the fact that the encoding of `System.out` falls back to the default encoding, as `IBM864` is not in `java.base`. This issue seems not new and reproducible with the releases since JDK9 where modularization has been introduced. Also, I think other encodings than those `IBM*` listed here, can possibly cause this issue. In order to fix this completely, those obscure encodings also have to be in `java.base` which I don't think we would want to do.

Hello @naotoj .
Sorry for my late response.

I checked these charsets against the IBM CDRA definitions.
They are largely the same, but some round-trip definitions differ, as in #9661 .
I think the Java implementations come from the files under https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/ .
As you know, `CP860/CP861/CP863/CP864/CP865/CP869` are defined in [IANA Character Sets](https://www.iana.org/assignments/character-sets/character-sets.xhtml) as aliases.
Even though the registered names are `IBM*`, these charset implementations come from Microsoft.
I think these charsets should be usable as the default charset on the Windows command prompt.
Please reconsider the current Java implementation.
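
For anyone who wants to check this on their own runtime, here is a minimal sketch, assuming JDK 9 or later. The class name `CodepageCheck` and the method `describe` are only illustrative and are not part of the PR. It reports whether each charset is available, which module provides it, and which aliases it has, then prints the encoding-related properties (these may be `null`, depending on the JDK version and on whether stdout is a console).

```java
import java.nio.charset.Charset;
import java.util.List;

// Illustrative helper only; names are hypothetical and not part of the PR.
final class CodepageCheck {

    // Charsets listed in this PR.
    static final List<String> NAMES =
            List.of("IBM860", "IBM861", "IBM863", "IBM864", "IBM865", "IBM869");

    // Returns a one-line summary: availability, implementing class,
    // providing module, and aliases of the named charset.
    static String describe(String name) {
        if (!Charset.isSupported(name)) {
            return name + ": not available in this runtime";
        }
        Charset cs = Charset.forName(name);
        Module module = cs.getClass().getModule();
        return name + ": " + cs.getClass().getName()
                + " in module " + module.getName()
                + ", aliases " + cs.aliases();
    }

    public static void main(String[] args) {
        for (String name : NAMES) {
            System.out.println(describe(name));
        }
        // stdout.encoding exists on newer JDKs; sun.stdout.encoding is an
        // internal property. Either one may be null.
        System.out.println("stdout.encoding     = " + System.getProperty("stdout.encoding"));
        System.out.println("sun.stdout.encoding = " + System.getProperty("sun.stdout.encoding"));
        System.out.println("default charset     = " + Charset.defaultCharset());
    }
}
```

Running this with `java --limit-modules java.base CodepageCheck.java` should show which of these charsets remain available when only `java.base` is resolved, which is the situation discussed above.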

-------------

PR: https://git.openjdk.org/jdk/pull/9761

