<i18n dev> RFR: 8289834: Add SBCS and DBCS Only EBCDIC charsets

Fri Aug 26 09:28:07 UTC 2022

On Wed, 6 Jul 2022 14:05:39 GMT, Ichiroh Takiguchi <itakiguchi at openjdk.org> wrote:

> OpenJDK supports "Japanese EBCDIC - Katakana" and "Korean EBCDIC" SBCS and DBCS Only charsets.
> |Charset|Mix|SBCS|DBCS|
> | -- | -- | -- | -- |
> | Japanese EBCDIC - Katakana | Cp930 | Cp290 | Cp300 |
> | Korean | Cp933 | Cp833 | Cp834 |
> 
> But OpenJDK does not support some of the "Japanese EBCDIC - English" / "Simplified Chinese EBCDIC" / "Traditional Chinese EBCDIC" SBCS and DBCS Only charsets.
> 
> I'd like to request Cp1027/Cp835/Cp836/Cp837 for consistency
> |Charset|Mix|SBCS|DBCS|
> | ------------- | ------------- | ------------- | ------------- |
> | Japanese EBCDIC - English | Cp939 | **Cp1027** | Cp300 |
> | Simplified Chinese EBCDIC | Cp935 | **Cp836** | **Cp837** |
> | Traditional Chinese EBCDIC | Cp937 | (*1) | **Cp835** | 
> 
> *1: Cp037 compatible

> Use the following options:
> OpenJDK: `java -cp icu4j-71_1.jar:icu4j-charset-71_1.jar:. tc IBM-1047 20000 1 1`
> ICU4J: `java -cp icu4j-71_1.jar:icu4j-charset-71_1.jar:. tc IBM-1047_P100-1995 20000 1 1`
> 
> Actually, I'm confused by this result. Previously, I was just comparing A/A with B/B on OpenJDK's charset. I didn't think ICU4J's result would make a difference.

My initial reaction is one of relief that the icu4j provider can be used with current JDK builds. This means there is an option should we decide to stop adding more EBCDIC charsets to the JDK.

The test uses IBM-1047 and I can't tell whether the icu4j provider is used or not. Charset doesn't define a provider method, but I think it would be useful to print cs.getClass() or cs.getClass().getModule() so we know which Charset implementation is used. Also, I think any discussion on performance would be better served by a JMH benchmark rather than a standalone test.
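
To make the cs.getClass() suggestion concrete, something along these lines could be added to the test, or run on its own with the same class path. This is only a sketch: the class name CharsetInfo and the describe helper are made up for illustration and are not part of the PR or of the tc test.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of the PR: reports which class and module
// implement a given charset name.
final class CharsetInfo {

    // Builds a one-line description of the Charset implementation.
    static String describe(Charset cs) {
        Class<?> impl = cs.getClass();
        return cs.name() + " -> " + impl.getName() + " in " + impl.getModule();
    }

    public static void main(String[] args) {
        // Default to the name used in the test; pass another name as the first argument.
        String name = args.length > 0 ? args[0] : "IBM-1047";
        if (!Charset.isSupported(name)) {
            System.out.println(name + " is not supported in this configuration");
            return;
        }
        System.out.println(describe(Charset.forName(name)));
    }
}
```

A JDK built-in charset should report a class in a named JDK module, while a charset loaded from jars on the class path should report an unnamed module, which would be enough to tell the two runs apart.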

-------------

PR: https://git.openjdk.org/jdk/pull/9399