Adding new IBM extended charsets

Wed Aug 22 18:19:02 UTC 2018

Hi Alan,

Thank you for your valuable inputs. I will initiate a discussion with the 
ICU4J community to explore whether the compatibility and performance 
differences can be resolved, so that we can use ICU4J for most of the 
extended charsets and remove them from the JDK build. As we discussed 
earlier, significant changes are required on the ICU4J side to resolve the 
functional and performance differences before the JDK can consume it 
directly, so this should be considered a long-term solution.

In the meantime, I can explore the other option you suggested: making the 
IBM charsets specific to the AIX platform and optional on other platforms 
through makefile changes. I will try to create a prototype of the make/src 
changes that generates the IBM charsets as a separate module on AIX only 
and keeps them optional on other platforms.

Please let me know if you have any inputs.

Thank you,
Nasser Ebrahim

From:   Alan Bateman <Alan.Bateman at oracle.com>
To:     Nasser Ebrahim <enasser at in.ibm.com>, 
core-libs-dev at openjdk.java.net, Xueming Shen <xueming.shen at oracle.com>
Date:   08/06/2018 12:08 AM
Subject:        Re: Adding new IBM extended charsets

On 24/07/2018 09:56, Nasser Ebrahim wrote:
Thank you Martin, Sherman and Alan for your valuable inputs. 

I have done some initial analysis of ICU4J. There are some compatibility 
issues between the ICU4J charsets and the JDK charsets, but I am more 
concerned about performance, as the JDK optimizations do not exist in that 
implementation. I think we need to work with the ICU4J community to 
resolve those issues before we remove those charsets from the JDK.

If you can work with the ICU4J project on these issues then I think we 
have a way forward. An additional issue with their downloads is that they 
target JDK 6 and don't seem to have thought about deploying as modules 
with JDK 9 or newer yet. Their downloads can be used as automatic modules 
but it requires renaming their JAR files due to unusual naming that they 
use to encode the version string. A simple Automatic-Module-Name attribute 
would make it easy for developers to deploy their charset provider on the 
module path, and they could still target JDK 6.
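
To illustrate the Automatic-Module-Name point, here is a minimal sketch, 
assuming JDK 9 or newer. It writes a JAR whose manifest carries the 
attribute and asks ModuleFinder which module name it reports. The JAR file 
name (example-charsets.jar) and the module name (com.example.charsets) are 
made up for the example; they are not names the ICU4J project uses.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

class AutomaticModuleNameDemo {

    // Creates a JAR containing only a manifest with the given
    // Automatic-Module-Name, then returns the module name that
    // ModuleFinder derives for it. Both arguments are hypothetical values.
    static String moduleNameFor(String jarFileName, String automaticModuleName) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version must be present or the main attributes are not written.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(new Attributes.Name("Automatic-Module-Name"), automaticModuleName);

        try {
            Path dir = Files.createTempDirectory("amn-demo");
            Path jar = dir.resolve(jarFileName);
            try (OutputStream out = Files.newOutputStream(jar);
                 JarOutputStream jos = new JarOutputStream(out, manifest)) {
                // No class files needed: the manifest alone is enough here.
            }

            // A JAR without module-info.class is treated as an automatic module.
            ModuleDescriptor descriptor = ModuleFinder.of(jar)
                    .findAll().iterator().next().descriptor();
            return descriptor.name();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
                moduleNameFor("example-charsets.jar", "com.example.charsets"));
    }
}
```

With the attribute present, the module name comes from the manifest and 
not from the JAR file name, so the file would not need to be renamed.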

As regards the way forward, I think we have to put infrastructure into 
the build that makes it easy to include or exclude specific charsets on 
specific platforms. As things stand, and as you have found with your 
updates to the stdcs-<platform> files, the charsets are generated to be 
included in either java.base or jdk.charsets. We need another input to the 
configurability to make it possible to include or exclude charsets, so 
that the mainstream platforms do not have to include the IBM charsets. 
There are several details around this, particularly around aliases, but if 
we can get that done then we have a lot of flexibility. My personal view 
is that we should work towards excluding the IBM charsets from the 
mainstream platforms, starting with a cull of the EBCDIC 
charsets. If the ICU4J project can get their issues sorted out in a 
similar time frame then it makes for a simple migration story -- the JDK 
includes the standard charsets and many additional charsets. If you need 
others then download the ICU4J charset provider and deploy it on your 
class path or module path.
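
From the application side, that migration could look roughly like the 
sketch below: code asks for a charset by name and copes with it being 
absent, and the name resolves once a provider that supplies it is deployed. 
The class and method names are made up for the example, and IBM037 is only 
a sample EBCDIC charset name; whether it resolves depends on the runtime 
and on which providers are deployed.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.nio.charset.spi.CharsetProvider;
import java.util.Optional;
import java.util.ServiceLoader;

class CharsetLookupDemo {

    // Returns the charset if the runtime (or a deployed provider) supplies
    // it, or an empty Optional if the name is illegal or unsupported.
    static Optional<Charset> lookup(String name) {
        try {
            return Optional.of(Charset.forName(name));
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // List the charset providers that ServiceLoader can see. Which ones
        // show up depends on the runtime and on the class/module path.
        for (CharsetProvider p : ServiceLoader.load(CharsetProvider.class)) {
            System.out.println("provider: " + p.getClass().getName());
        }

        // Sample name only; pass another charset name as the first argument.
        String name = args.length > 0 ? args[0] : "IBM037";
        System.out.println(name + " -> "
                + lookup(name).map(Charset::name).orElse("not available"));
    }
}
```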

-Alan