RFR: 8291916: Unexpected output on Windows command prompt
Ichiroh Takiguchi
itakiguchi at openjdk.org
Fri Sep 9 06:29:44 UTC 2022
On Tue, 9 Aug 2022 20:38:25 GMT, Naoto Sato <naoto at openjdk.org> wrote:
>> To support the Windows command prompt's code pages, the following charsets should be moved from the jdk.charsets module to the java.base module.
>>
>> - IBM860
>> - IBM861
>> - IBM863
>> - IBM864
>> - IBM865
>> - IBM869
>
> I looked at this issue a bit more. It looks to me that the issue is caused by the fact that the encoding of `System.out` falls back to the default encoding, as `IBM864` is not in `java.base`. This issue seems not new and reproducible with the releases since JDK9 where modularization has been introduced. Also, I think other encodings than those `IBM*` listed here, can possibly cause this issue. In order to fix this completely, those obscure encodings also have to be in `java.base` which I don't think we would want to do.
Hello @naotoj.
Sorry for my late response.
I checked these charsets against the IBM CDRA definitions.
These are also the same, except that some round-trip definitions differ, as in #9661.
I think they come from the files under https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/ .
As you know, `CP860/CP861/CP863/CP864/CP865/CP869` are registered in [IANA Character Sets](https://www.iana.org/assignments/character-sets/character-sets.xhtml) as aliases.
Even though the registered names are `IBM*`, these charset implementations come from Microsoft.
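
For reference, the alias relationship can be inspected on a running JDK with a small sketch like the one below. The class name `CharsetAliases` and the helper `aliasesOf` are made up for illustration; only `Charset.isSupported`, `Charset.forName`, and `Charset.aliases` are real API. The aliases printed depend on the JDK build, and the set is empty when the charset is not available in that runtime.

```java
import java.nio.charset.Charset;
import java.util.Set;
import java.util.TreeSet;

public class CharsetAliases {
    // Hypothetical helper: returns the aliases the running JDK reports for
    // the given charset, or an empty set if the charset is not available.
    static Set<String> aliasesOf(String name) {
        if (!Charset.isSupported(name)) {
            return new TreeSet<>();
        }
        // Copy into a TreeSet so the aliases print in sorted order.
        return new TreeSet<>(Charset.forName(name).aliases());
    }

    public static void main(String[] args) {
        String[] names = {"IBM860", "IBM861", "IBM863", "IBM864", "IBM865", "IBM869"};
        for (String name : names) {
            System.out.println(name + " -> " + aliasesOf(name));
        }
    }
}
```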
I think these charsets should be usable as the default charset on the Windows command prompt.
Please reconsider the current Java implementation.
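
As a rough illustration of the fallback described in the quoted message, the sketch below resolves a charset name and falls back to the default charset when the name cannot be resolved. This is not the JDK's actual initialization code; `ConsoleCharsetCheck` and `resolveOrDefault` are hypothetical names. The `stdout.encoding` property is only defined on newer JDK releases, so it may print `null`. A charset can be reported as supported here and still not be used for `System.out`, since the fallback in question is tied to the charset not being in `java.base`.

```java
import java.nio.charset.Charset;

public class ConsoleCharsetCheck {
    // Hypothetical helper: resolve the requested charset, or fall back to
    // the default charset when the name is null, illegal, or unsupported.
    static Charset resolveOrDefault(String name) {
        try {
            if (name != null && Charset.isSupported(name)) {
                return Charset.forName(name);
            }
        } catch (IllegalArgumentException e) {
            // Illegal charset name: fall through to the default.
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        String[] names = {"IBM860", "IBM861", "IBM863", "IBM864", "IBM865", "IBM869"};
        for (String name : names) {
            System.out.println(name + ": supported=" + Charset.isSupported(name)
                    + ", resolved=" + resolveOrDefault(name));
        }
        // May be null on JDK releases that do not define this property.
        System.out.println("stdout.encoding=" + System.getProperty("stdout.encoding"));
        System.out.println("default charset=" + Charset.defaultCharset());
    }
}
```

Running this in a Windows command prompt after `chcp 864` (or another of the code pages above) shows which charset the runtime reports for each name.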
-------------
PR: https://git.openjdk.org/jdk/pull/9761