<i18n dev> RFR: 8195686: ISO-8859-8-i charset cannot be decoded, should be mapped to ISO-8859-8

Jason Mehrens duke at openjdk.org
Fri Sep 13 03:15:05 UTC 2024


On Tue, 27 Aug 2024 17:01:19 GMT, Naoto Sato <naoto at openjdk.org> wrote:

>> Mapping ISO-8859-8-I charset to ISO-8859-8.
>> The following two aliases are added as part of this change:
>> **ISO-8859-8-I**
>> **ISO8859-8-I**
>> 
>> Bug report: https://bugs.openjdk.org/browse/JDK-8195686
>
> I looked at this issue a bit more. In the IANA Charset registry (https://www.iana.org/assignments/character-sets/character-sets.xhtml), on which the `Charset` class is based, `ISO-8859-8-I` is not an alias of `ISO-8859-8`; it is defined as a distinct `Preferred MIME name`. So I don't think the currently proposed solution is correct. (It would return ISO-8859-8-I as an alias of ISO-8859-8.) Also, RFC 1556, which defines the ISO-8859-8-I encoding, defines other encodings as well, i.e., ISO-8859-6-I, ISO-8859-6-E, and ISO-8859-8-E. Why are they not relevant, but ISO-8859-8-I is?
> Considering these, I am still not sure we should introduce these new encodings now, also because there has been no request since Bill Shannon worked on this (circa 2018), unless the Arabic/Hebrew-speaking communities jump in and provide a rationale to support them.

@naotoj does the mapping need to be removed from:

https://github.com/openjdk/jdk/blob/5e5942a282e14846404b68c65d43594d6b9226d9/src/java.xml/share/classes/com/sun/org/apache/xerces/internal/util/EncodingMap.java#L770

I ask because Jakarta Mail / Angus Mail has a similar use case to this code.
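
For anyone who wants to check how a given JDK treats these names, here is a minimal sketch. It assumes an application-level fallback table; the class name `CharsetLookupSketch`, the `FALLBACKS` map, and `resolve` are hypothetical names made up for illustration and are not part of the JDK, Xerces, or Jakarta Mail. Only `Charset.isSupported`, `Charset.forName`, `Charset.name`, and `Charset.aliases` are real JDK calls. Whether falling back from ISO-8859-8-I to ISO-8859-8 is appropriate is exactly the question raised above, so treat the table entry as an assumption, not a recommendation.

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class CharsetLookupSketch {
    // Hypothetical application-level fallback table (NOT part of the JDK).
    // Charset names are case-insensitive, so the map is too.
    static final Map<String, String> FALLBACKS =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    static {
        // Assumption for illustration: decode ISO-8859-8-I with ISO-8859-8.
        FALLBACKS.put("ISO-8859-8-I", "ISO-8859-8");
    }

    // Resolve a charset name: try the JDK first, then the fallback table.
    static Optional<Charset> resolve(String name) {
        if (Charset.isSupported(name)) {
            return Optional.of(Charset.forName(name));
        }
        String fallback = FALLBACKS.get(name);
        if (fallback != null && Charset.isSupported(fallback)) {
            return Optional.of(Charset.forName(fallback));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String[] names = {"ISO-8859-8", "ISO-8859-8-I", "ISO8859-8-I"};
        for (String name : names) {
            // Print the canonical name and aliases the running JDK reports.
            String result = resolve(name)
                    .map(cs -> cs.name() + " aliases=" + cs.aliases())
                    .orElse("unsupported");
            System.out.println(name + " -> " + result);
        }
    }
}
```

The output depends on the JDK build and on which charset modules are present in the runtime, so no expected output is shown here.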

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20690#issuecomment-2347953621

