Adding new IBM extended charsets

Nasser Ebrahim enasser at in.ibm.com
Tue Jul 24 08:56:50 UTC 2018


Thank you Martin, Sherman and Alan for your valuable input.

I have done some initial analysis of ICU4J. There are some compatibility
issues between the ICU4J charsets and the JDK charsets, but I am more
concerned about performance, as the JDK optimizations do not exist in that
implementation. I think we need to work with the ICU4J community to
resolve those issues before we remove those charsets from the JDK.
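
As a rough illustration of the kind of compatibility check meant here, the
sketch below decodes every single-byte value with two charsets and reports
where the results differ. The class and method names are hypothetical, and
the two standard charsets in main are only stand-ins so the example runs on
any JDK; a real comparison would pass the JDK charset and the same-named
charset obtained from the other provider. It is only meaningful for
single-byte charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JDK or ICU4J.
final class CharsetDiff {

    /** Returns the single-byte values (0..255) that the two charsets decode differently. */
    static List<Integer> differingBytes(Charset a, Charset b) {
        List<Integer> diffs = new ArrayList<>();
        for (int value = 0; value < 256; value++) {
            byte[] input = { (byte) value };
            // new String(byte[], Charset) substitutes the replacement
            // character for malformed or unmappable input.
            String fromA = new String(input, a);
            String fromB = new String(input, b);
            if (!fromA.equals(fromB)) {
                diffs.add(value);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        // Stand-in charsets (assumption): replace with the pair under test.
        Charset a = StandardCharsets.ISO_8859_1;
        Charset b = StandardCharsets.US_ASCII;
        List<Integer> diffs = differingBytes(a, b);
        System.out.println(a + " vs " + b + ": " + diffs.size() + " differing byte values");
    }
}
```

The same loop could be extended to the encoding direction, and timed over
larger buffers to get a first impression of the performance difference.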

The primary reason we are interested in contributing the charsets to
OpenJDK is so that Java users in all locales get a seamless experience when
they move between OpenJDK and other implementations. I agree that reducing
the number of charsets would be good from a footprint and maintenance
perspective.

I believe the maintenance effort for the charsets is usually low, as we
hardly make any changes to a charset once it is developed. Also, the
charsets are independent of each other and hence will not affect Java users
unless they are used. As more members of my team would like to participate
actively in the OpenJDK community, I hope maintenance of any issues
reported against the IBM charsets will not be a problem going forward. As
we discussed before, the footprint issue can be avoided if we enable the
IBM charsets on an as-needed basis with a build flag.

As you advised, we can enable the IBM charsets by default only on the AIX
platform, and users can enable them on other platforms as needed. If all of
you agree, we can start working on moving all IBM charsets from
jdk.charsets to a separate module, jdk.ibm.charsets, and enabling them only
on AIX by default. We can consider removing them from the JDK in the future
if the community finds them to be an overhead or not adding value.
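
To make the separate-module idea a little more concrete, here is a minimal
sketch of how a module can expose charsets through the standard
java.nio.charset.spi.CharsetProvider SPI. The class name and the charset
name "x-sample-ibm" are hypothetical, and the provider simply hands back an
existing JDK charset as a placeholder; it is not how jdk.charsets is
actually implemented. A module would typically also declare the provider in
its module-info.java with a "provides java.nio.charset.spi.CharsetProvider
with ..." clause.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.charset.spi.CharsetProvider;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical provider. To be discovered as a service, a real provider
// must be a public class with a public no-argument constructor.
final class SampleIbmCharsetProvider extends CharsetProvider {

    // Lower-cased charset name -> charset instance.
    private final Map<String, Charset> byName = new LinkedHashMap<>();

    SampleIbmCharsetProvider() {
        // Placeholder (assumption): a real module would register its own
        // Charset subclasses here instead of reusing ISO-8859-1.
        byName.put("x-sample-ibm", StandardCharsets.ISO_8859_1);
    }

    @Override
    public Charset charsetForName(String name) {
        // Charset names are case-insensitive; return null when unknown.
        return byName.get(name.toLowerCase(Locale.ROOT));
    }

    @Override
    public Iterator<Charset> charsets() {
        return byName.values().iterator();
    }

    public static void main(String[] args) {
        CharsetProvider provider = new SampleIbmCharsetProvider();
        System.out.println(provider.charsetForName("x-sample-ibm"));
        System.out.println(provider.charsetForName("x-unknown"));
    }
}
```

Because lookup goes through the provider, leaving the module out of a build
or run-time image just means these names are no longer resolved.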

Please advise. 

Thank you,
Nasser Ebrahim



From:   Alan Bateman <Alan.Bateman at oracle.com>
To:     Xueming Shen <xueming.shen at oracle.com>, Nasser Ebrahim 
<enasser at in.ibm.com>
Cc:     core-libs-dev at openjdk.java.net
Date:   07/19/2018 03:44 PM
Subject:        Re: Adding new IBM extended charsets



On 19/07/2018 08:27, Xueming Shen wrote:
> Hi Nasser,
>
> From OpenJDK's perspective it would be preferable to direct developers
> to use the charset implementations provided by IBM, or by a reliable
> third party that has the appropriate knowledge, experience and resources
> to support/maintain those charsets, such as the icu4j charset project.
> I have been pulling the data from that huge icu-charset-data file and
> implementing/maintaining them to the best of my knowledge, but I'm sure
> engineers from IBM or the icu project can do a much better job of
> implementing/maintaining/updating those charsets going forward.
>
> As a first step we can separate those IBM charsets from jdk.charsets
> into a separate package somewhere and configure them to be built into
> java.base and jdk.charsets, for the AIX platform only. Then we can
> further discuss the best way to handle/distribute those charsets that
> are not needed for the java.base module (for VM startup). As I said, it
> would be ideal if we can remove them from the OpenJDK repo/binaries
> completely and direct developers/users to use the icu4j charset
> provider for those encodings, when needed. But given the possible
> compatibility concerns, we might want to phase this work gradually in
> the next major release.
I agree, and in terms of phasing I don't think it would be too
disruptive if the EBCDIC charsets were dropped from jdk.charsets in JDK
12, at least on the mainstream platforms. As we've established in this
thread, the ICU4J project does seem to publish its charset provider to
Maven, so there are alternatives for applications that really need these
charsets.

Nasser - do you do any testing with the ICU4J charsets? I quickly tried
62.1 and it seems to work fine on the class path. I didn't check for any
compatibility differences or compare the performance, but maybe you have.
It's a bit awkward to test this provider as an automatic module due to
the unusual naming of these JAR files. They may not have looked at
modules yet, but the ability to link the icu4j.charsets module into a
run-time image seems like something that people may want to do in the
future.
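
For a quick check of this kind, something along the following lines may be
enough: it asks the running JDK whether a charset name resolves and, if so,
prints the implementing class, which hints at which provider supplied it.
The class name and the default charset names are only examples, and whether
"IBM037" resolves depends on the run-time image and on any provider JARs on
the class path.

```java
import java.nio.charset.Charset;

// Hypothetical helper for a quick manual check; not part of any library.
final class CharsetProbe {

    /** Describes whether the named charset resolves in this runtime. */
    static String describe(String name) {
        if (!Charset.isSupported(name)) {
            return name + ": not available";
        }
        Charset cs = Charset.forName(name);
        // The implementing class shows where the charset came from,
        // for example a JDK-internal class or a third-party provider class.
        return name + ": " + cs.name() + " (" + cs.getClass().getName() + ")";
    }

    public static void main(String[] args) {
        // Example names (assumption): pass the names of interest as arguments.
        String[] names = args.length > 0 ? args : new String[] { "UTF-8", "IBM037" };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

Running it once with and once without the provider JAR on the class path
shows whether a given name is served by the JDK or by the extra provider.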

-Alan






