Adding new IBM extended charsets
Nasser Ebrahim
enasser at in.ibm.com
Wed Aug 22 18:19:02 UTC 2018
Hi Alan,
Thank you for your valuable inputs. I will initiate a discussion with the
ICU4J community to explore resolving the compatibility and performance
differences, so that we can use ICU4J for most of the extended charsets
and remove them from the JDK build. As we discussed earlier, significant
changes are required on the ICU4J side to resolve the functional and
performance differences before the JDK can consume it directly, and hence
this may be considered a long-term solution.
In the meantime, I can explore the other option you suggested: making the
IBM charsets specific to the AIX platform and keeping them optional for
other platforms through makefile changes. I will try to create a prototype
of the make/src file changes that generates the IBM charsets as a separate
module only on AIX and keeps them optional on other platforms.
Please let me know if you have any inputs.
Thank you,
Nasser Ebrahim
From: Alan Bateman <Alan.Bateman at oracle.com>
To: Nasser Ebrahim <enasser at in.ibm.com>,
core-libs-dev at openjdk.java.net, Xueming Shen <xueming.shen at oracle.com>
Date: 08/06/2018 12:08 AM
Subject: Re: Adding new IBM extended charsets
On 24/07/2018 09:56, Nasser Ebrahim wrote:
> Thank you Martin, Sherman and Alan for your valuable inputs.
> I have done some initial analysis on ICU4J. There are some
> compatibility issues between the ICU4J charsets and the JDK charsets, but
> I am more concerned about its performance, as the JDK optimizations do not
> exist in that implementation. I think we need to work with the ICU4J
> community to resolve those issues before we remove those charsets from
> the JDK.
If you can work with the ICU4J project on these issues then I think we
have a way forward. An additional issue with their downloads is that they
target JDK 6 and don't seem to have thought about deploying as modules
with JDK 9 or newer yet. Their downloads can be used as automatic modules
but it requires renaming their JAR files due to unusual naming that they
use to encode the version string. A simple Automatic-Module-Name attribute
would make it easy for developers to deploy their charset provider on the
module path, and they could still target JDK 6.
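
As a rough illustration of the Automatic-Module-Name point, the sketch
below builds a JAR manifest carrying that attribute and reads it back with
java.util.jar. The module name com.ibm.icu.charset is a hypothetical
placeholder, not a name the ICU4J project has chosen, and the class and
method names are made up for the example.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class AutoModuleNameSketch {
    // Main-section manifest attribute read by the module system (JDK 9+).
    static final Attributes.Name AUTOMATIC_MODULE_NAME =
            new Attributes.Name("Automatic-Module-Name");

    // Build an in-memory manifest that declares the given module name.
    static Manifest withModuleName(String moduleName) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(AUTOMATIC_MODULE_NAME, moduleName);
        return manifest;
    }

    // Return the declared module name, or null if the attribute is absent.
    static String moduleNameOf(Manifest manifest) {
        return manifest.getMainAttributes().getValue(AUTOMATIC_MODULE_NAME);
    }

    public static void main(String[] args) {
        // Hypothetical module name, used only for illustration.
        Manifest manifest = withModuleName("com.ibm.icu.charset");
        System.out.println(moduleNameOf(manifest));
    }
}
```

Older runtimes ignore manifest attributes they do not know, which is why
adding this one does not prevent a JAR from targeting JDK 6.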
As regards the way forward, I think we have to put infrastructure into
the build to make it easy to include or exclude specific charsets on
specific platforms. As things stand, and as you have found with your
updates to the stdcs-<platform> files, the charsets are generated to be
included in either java.base or jdk.charsets. We need another input to the
configurability to make it possible to include or exclude charsets, so that
the mainstream platforms do not have to include the IBM charsets. There are
several details around this, particularly around aliases, but if we can get
that done then we have a lot of flexibility.
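
To make the alias concern a little more concrete, here is a minimal sketch
of how aliases surface through java.nio.charset: an alias resolves to the
same Charset as its canonical name, so whichever module supplies a charset
also has to supply its aliases. It uses UTF-8 because that charset is
always present. The class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.util.Set;

public class CharsetAliasSketch {
    // Resolve a canonical name or alias to the charset's canonical name.
    static String canonicalName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    // List the aliases registered for a charset.
    static Set<String> aliasesOf(String name) {
        return Charset.forName(name).aliases();
    }

    public static void main(String[] args) {
        // "UTF8" is an alias of the standard UTF-8 charset.
        System.out.println(canonicalName("UTF8"));
        System.out.println(aliasesOf("UTF-8"));
    }
}
```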
My personal view is that we should work towards excluding the IBM charsets
from the mainstream platforms, starting with a cull of the EBCDIC
charsets. If the ICU4J project can get their issues sorted out in a
similar time frame then it makes for a simple migration story -- the JDK
includes the standard charsets and many additional charsets. If you need
others then download the ICU4J charset provider and deploy it on your
class path or module path.
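
From the application side, that migration story could look roughly like
the sketch below: code asks for a charset by name and handles the case
where neither the JDK nor any deployed provider supplies it. IBM1047 is
used only as an example of an EBCDIC charset name. Whether it resolves
depends on the runtime and on what is deployed on the class path or module
path. The class and method names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Optional;

public class CharsetLookupSketch {
    // Look up a charset without throwing when it is missing or misnamed.
    static Optional<Charset> find(String name) {
        try {
            return Charset.isSupported(name)
                    ? Optional.of(Charset.forName(name))
                    : Optional.empty();
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not legal in a charset name.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // UTF-8 is always present; IBM1047 may or may not be available.
        for (String name : new String[] {"UTF-8", "IBM1047"}) {
            String status = find(name).isPresent() ? "available" : "not available";
            System.out.println(name + ": " + status);
        }
    }
}
```

Charset.forName consults installed charset providers as well as the
built-in charsets, so a provider JAR deployed alongside the application
would be picked up by the same lookup.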
-Alan