Adding new IBM extended charsets

Wed Jul 4 12:41:59 UTC 2018

Hello,

I am starting this mail thread to discuss adding new IBM extended 
charsets. The question is whether we should add the new extended 
charsets to jdk.charsets or to a new, separate charset provider/module 
such as jdk.ibmcharsets. This discussion continues the suggestion from 
Alan Bateman in the mail chain 
http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-May/053316.html. 

I am copying his input from that mail thread to start the discussion:
"I think we should start a discussion here about moving some or all of the 
IBM charsets to their own service provider module. I realize the AIX port 
might want to include some of them in its build of java.base but they 
aren't interesting to include in java.base, or even jdk.charsets, on most 
platform"

First, let me clarify whether IBM charsets are applicable only to IBM 
platforms like AIX or to other platforms as well. All IBM charsets are 
applicable to any platform, including Linux and Windows, when that 
platform needs to communicate with an application or database on an IBM 
platform such as AIX. That is why we have traditionally added them to 
jdk.charsets. However, we agree with Alan that the IBM charsets are not 
required if the JDK does not communicate with any applications/databases 
on IBM platforms. Hence, it makes sense to consider a separate charset 
provider/module for the IBM charsets and to use build parameters to 
decide, for any platform, whether to generate the new charset provider.
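If the IBM charsets become an optional module, an application running on 
a build without that module will simply find those charsets unsupported. 
A minimal sketch of how an application might probe for a charset before 
using it follows. The class CharsetAvailability and the method lookup are 
hypothetical names; Charset.isSupported and Charset.forName are the 
standard java.nio.charset APIs. IBM1047 is used only as an example of an 
IBM charset name, and whether it resolves depends on which modules the 
runtime image includes.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Optional;

// Hypothetical helper: looks up a charset that may or may not be present in
// this JDK build (for example, because an optional charset module was left out).
public class CharsetAvailability {

    /** Returns the charset if this runtime supports it, or an empty Optional otherwise. */
    static Optional<Charset> lookup(String name) {
        try {
            return Charset.isSupported(name)
                    ? Optional.of(Charset.forName(name))
                    : Optional.empty();
        } catch (IllegalCharsetNameException e) {
            // A syntactically invalid name can never be supported, so treat it as absent.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // The result for IBM1047 depends on the modules present in the runtime image.
        for (String name : new String[] {"UTF-8", "IBM1047", "x-IBM-not-a-real-charset"}) {
            System.out.println(name + " -> "
                    + lookup(name).map(Charset::name).orElse("not available"));
        }
    }
}
```

The application can then fall back or report a clear error instead of 
failing with UnsupportedCharsetException deep inside its I/O code.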

Let me list all the options I can think of for adding new extended 
charsets, so that we can discuss them and decide which is the best 
option.

1) Continue to add new extended charsets to jdk.charsets.
The advantage of this approach is that there is no need to add a new 
charset provider, and all extended charsets stay in one module. Also, any 
extended charset may be needed on any platform that communicates with 
applications/databases on a different platform. The disadvantage is that 
the number of charsets in jdk.charsets keeps increasing and bloats its 
size. Also, many of those charsets may never be used during the lifetime 
of the JDK unless it communicates with applications/databases on those 
platforms.

2) Create a new charset provider and module (say jdk.ibmcharsets) for all 
IBM charsets and include the new module in the JDK as needed.
The advantage of this approach is that the footprint of jdk.charsets 
can be reduced, and the new module can be included only when it is 
required. The disadvantage is that a new charset provider needs to be 
created. Also, the extended charsets will be split across two different 
modules, and in many cases both modules will be required.

3) Remove all extended charsets from the JDK (keep only the default 
charsets) and use extended charsets from a third party such as ICU4J.
I believe this option may have been discussed in the past, and there may 
be valid reasons not to pursue it. I am still listing it to ensure that 
we have considered it as well. The advantage of this approach is that we 
avoid having two different open source communities maintain the same 
charsets. The disadvantage is that the release cycles of the two 
communities may differ, and we may need to maintain the third-party level 
ourselves for LTS releases, as we may not want to change the 
specification in a service stream.
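To make option 2 more concrete, here is a rough sketch of what the 
provider class in a separate module might look like. Only the module name 
jdk.ibmcharsets comes from the proposal above; the class IbmCharsetProvider 
and the charset x-IBM-example are hypothetical placeholders, and the 
example charset simply maps bytes 0x00-0xFF to U+0000-U+00FF instead of 
implementing a real IBM code page. CharsetProvider, Charset, CharsetEncoder 
and CharsetDecoder are the standard java.nio.charset APIs.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.spi.CharsetProvider;
import java.util.Iterator;
import java.util.List;

// Hypothetical provider for a separate charset module. A real module would put
// this class in a package and register it in module-info.java with
//   provides java.nio.charset.spi.CharsetProvider with <package>.IbmCharsetProvider;
public class IbmCharsetProvider extends CharsetProvider {

    // Placeholder single-byte charset: byte b <-> char U+00bb (not a real IBM code page).
    static final class ExampleCharset extends Charset {
        ExampleCharset() {
            super("x-IBM-example", new String[] {"ibm-example"});
        }

        @Override
        public boolean contains(Charset cs) {
            return cs instanceof ExampleCharset;
        }

        @Override
        public CharsetDecoder newDecoder() {
            return new CharsetDecoder(this, 1.0f, 1.0f) {
                @Override
                protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                    while (in.hasRemaining()) {
                        if (!out.hasRemaining()) return CoderResult.OVERFLOW;
                        out.put((char) (in.get() & 0xFF)); // every byte is mappable
                    }
                    return CoderResult.UNDERFLOW;
                }
            };
        }

        @Override
        public CharsetEncoder newEncoder() {
            return new CharsetEncoder(this, 1.0f, 1.0f) {
                @Override
                protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
                    while (in.hasRemaining()) {
                        char c = in.get(in.position()); // peek without consuming
                        if (c > 0xFF) return CoderResult.unmappableForLength(1);
                        if (!out.hasRemaining()) return CoderResult.OVERFLOW;
                        out.put((byte) c);
                        in.position(in.position() + 1);
                    }
                    return CoderResult.UNDERFLOW;
                }
            };
        }
    }

    private final List<Charset> charsets = List.of(new ExampleCharset());

    @Override
    public Iterator<Charset> charsets() {
        return charsets.iterator();
    }

    @Override
    public Charset charsetForName(String name) {
        // Match the canonical name or any alias, case-insensitively.
        for (Charset cs : charsets) {
            if (cs.name().equalsIgnoreCase(name)) return cs;
            for (String alias : cs.aliases()) {
                if (alias.equalsIgnoreCase(name)) return cs;
            }
        }
        return null; // not provided by this module
    }
}
```

With such a provider registered as a service, Charset.forName would find 
its charsets only when the module is present in the runtime image, which 
is what would let the build decide whether to include them.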

Please share your thoughts on your preferred option and list any other 
options I have missed. Thank you for your time.

Regards,
Nasser Ebrahim