Adding new IBM extended charsets

Tue Jul 17 13:48:01 UTC 2018

Hi Alan,

Thank you for your input. I would like to clarify that not all the IBM 
charsets (IBMXXXX) in jdk.charsets are IBM platform-specific charsets. 
For example, only 43 of the 72 IBMXXXX charsets in jdk.charsets are EBCDIC 
or otherwise IBM platform-specific charsets. Similarly, many of the 75 
charsets that we would like to contribute are not EBCDIC charsets. 

I feel we should have a standard guideline for the extended charsets. If 
we keep the extended charsets in the JDK, then we may want to consider 
including all ICU/IANA approved charsets in the JDK. Otherwise, we may want 
to keep only the standard charsets in the JDK and remove all the extended 
charsets, so that all extended charsets can be taken from third-party 
libraries like ICU4J.

If we decide to keep the extended charsets, then maybe we can classify 
them as ASCII-based and EBCDIC-based charsets, with corresponding modules 
jdk.ascii.charset and jdk.ebcdic.charset. Then, depending on the platform, 
we can include either charset module or both. 
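As a rough sketch of how such an ASCII/EBCDIC split could be checked 
against a running JDK, the class below sorts every installed charset by 
how it encodes the letter 'A' (0x41 in ASCII-based charsets, 0xC1 in 
EBCDIC code pages). The class name CharsetFamily and the single-character 
heuristic are illustrative assumptions, not an existing JDK mechanism; 
charsets that cannot encode, or that need more than one byte for 'A', 
simply land in OTHER.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: a heuristic family check, not part of the JDK.
public class CharsetFamily {

    // Rough family of a charset, judged only by how it encodes 'A'.
    enum Family { ASCII, EBCDIC, OTHER }

    static Family classify(Charset cs) {
        // Decode-only charsets cannot be probed by encoding.
        if (!cs.canEncode()) {
            return Family.OTHER;
        }
        ByteBuffer bytes = cs.encode("A");
        // e.g. UTF-16 and UTF-32 need more than one byte for 'A'.
        if (bytes.remaining() != 1) {
            return Family.OTHER;
        }
        int b = bytes.get(0) & 0xFF;
        if (b == 0x41) return Family.ASCII;   // 'A' in US-ASCII
        if (b == 0xC1) return Family.EBCDIC;  // 'A' in EBCDIC code pages
        return Family.OTHER;
    }

    public static void main(String[] args) {
        // Count how many installed charsets fall into each family.
        Map<Family, Integer> counts = new TreeMap<>();
        for (Charset cs : Charset.availableCharsets().values()) {
            counts.merge(classify(cs), 1, Integer::sum);
        }
        System.out.println(counts);
    }
}
```

The counts printed depend on which charset modules and providers are 
present in the runtime, so the same program would show the effect of 
including or omitting a charset module on a given platform.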

Please advise.

Thank you,
Nasser Ebrahim

From:   Alan Bateman <Alan.Bateman at oracle.com>
To:     Nasser Ebrahim <enasser at in.ibm.com>, Xueming Shen 
<xueming.shen at oracle.com>, core-libs-dev at openjdk.java.net
Date:   07/09/2018 01:25 AM
Subject:        Re: Adding new IBM extended charsets

On 06/07/2018 14:56, Nasser Ebrahim wrote:
> :
> I understood your preferred option is 3 [Remove all extended charsets
> from JDK (keep only default charsets) and use the extended charsets from
> third party like ICU4J]. Just to confirm, did you mean we need to keep
> only the standard charsets in the JDK, remove all the extended charsets
> from the JDK and use them from ICU4J, OR did you mean to apply that only
> to the new extended charsets? I think it is better to keep consistency:
> either take all extended charsets from ICU4J or maintain all extended
> charsets within the JDK. Keeping some extended charsets within the JDK
> and using ICU4J for other extended charsets may confuse Java users.
I think the suggestion in Sherman's mail is to drop the 70 or so IBM 
charsets from jdk.charsets. This will reduce the size of jdk.charsets 
and eliminate the need to maintain these charsets (at least on non-AIX 
builds). If developers need these charsets, say when connecting to a 
database on an IBM system, then they can deploy the ICU4J provider on 
the class path or module path.
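A minimal sketch of what this means for application code, assuming 
Java 9 or later: look the charset up by name, treat "unsupported" as a 
signal that a provider such as ICU4J must be deployed, and report which 
module supplied the implementation. The class name, the charset names in 
main, and the fallback message are illustrative only; a provider loaded 
from the class path shows up here as "unnamed".

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Optional;

// Hypothetical helper for locating a charset and its origin.
public class CharsetLookup {

    // Returns the charset if any installed provider supplies it, else empty.
    static Optional<Charset> lookup(String name) {
        try {
            return Optional.of(Charset.forName(name));
        } catch (UnsupportedCharsetException | IllegalCharsetNameException e) {
            return Optional.empty();
        }
    }

    // Names the module defining the implementation class, e.g. "java.base"
    // or "jdk.charsets", or "unnamed" for a provider on the class path.
    static String origin(Charset cs) {
        String module = cs.getClass().getModule().getName();
        return module != null ? module : "unnamed";
    }

    public static void main(String[] args) {
        // Illustrative names: IBM037 is an EBCDIC code page, IBM437 a PC one.
        for (String name : new String[] {"IBM037", "IBM437", "UTF-8"}) {
            System.out.println(name + " -> " + lookup(name)
                    .map(cs -> cs.name() + " from " + origin(cs))
                    .orElse("not available; deploy a provider such as ICU4J"));
        }
    }
}
```

Because the lookup goes through Charset.forName, the calling code does 
not change whether a charset comes from java.base, jdk.charsets, or a 
third-party provider; only the deployment does.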

I don't think the suggestion impacts the 11 IBM charsets in java.base on 
non-AIX builds or the non-IBM charsets in jdk.charsets. There may be 
opportunities to drop some of these, but that can be looked at separately.

Also I don't think the suggestion impacts the additional 12 IBM charsets 
that are included in the AIX build of java.base at this time. From the 
review threads, it seems there are supported locales on AIX that map to 
these charsets so this is why they are in java.base.

-Alan.