<i18n dev> RFR: 8195686: ISO-8859-8-i charset cannot be decoded, should be mapped to ISO-8859-8

Thu Oct 10 16:54:13 UTC 2024

On Thu, 10 Oct 2024 16:32:48 GMT, Naoto Sato <naoto at openjdk.org> wrote:

> Sorry, but I still don't believe that making "ISO-8859-8-I" as an alias to "ISO-8859-8" is the right solution, per the IANA character sets definition (https://www.iana.org/assignments/character-sets/character-sets.xhtml). The current PR would make "ISO-8859-8-I" charset appear in `Charset.forName("ISO-8859-8").aliases()`, but not in `Charset.availableCharsets()` which is deemed incorrect to me.

I agree. From the Charset specification,

> If a charset listed in the IANA Charset Registry is supported by an implementation of the Java platform then its canonical name must be the name listed in the registry. Many charsets are given more than one name in the registry, in which case the registry identifies one of the names as MIME-preferred. If a charset has more than one registry name then its canonical name must be the MIME-preferred name and the other names in the registry must be valid aliases.

Practically speaking, it does seem to be an alias, but implementing it as such would violate the Charset specification. So either defining a new Charset for ISO-8859-8-I (if there is sufficient demand) or, as Naoto pointed out, using a CharsetProvider would seem like appropriate solutions to me. A pro of the SPI solution is that you can also easily include all the other bidi-aware implicit/explicit charsets as well.
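
To illustrate the SPI route, here is a minimal sketch of a provider that exposes ISO-8859-8-I as its own charset while reusing the ISO-8859-8 byte mapping. The class names (`Iso88598ICharset`, `BidiCharsetProvider`, `Iso88598IDemo`) are hypothetical and not part of the PR, the `csISO88598I` alias is my reading of the IANA registry and should be checked against it, and the sketch assumes ISO-8859-8 is available in the running JDK. The classes are package-private only to keep the example in one file.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.spi.CharsetProvider;
import java.util.Iterator;
import java.util.List;

// Hypothetical charset: a distinct canonical name, same mapping as ISO-8859-8.
final class Iso88598ICharset extends Charset {
    // Assumption: ISO-8859-8 is supported by the running JDK.
    private final Charset delegate = Charset.forName("ISO-8859-8");

    Iso88598ICharset() {
        // Alias taken from the IANA registry as I read it; verify before use.
        super("ISO-8859-8-I", new String[] { "csISO88598I" });
    }

    @Override
    public boolean contains(Charset cs) {
        return cs instanceof Iso88598ICharset || delegate.contains(cs);
    }

    // Note: the returned coders report the delegate from charset(), not this
    // charset. A fuller implementation would wrap them.
    @Override
    public CharsetDecoder newDecoder() {
        return delegate.newDecoder();
    }

    @Override
    public CharsetEncoder newEncoder() {
        return delegate.newEncoder();
    }
}

// Hypothetical provider. To be discovered by Charset.forName, a real one must
// be a public class with a public no-arg constructor and be listed in
// META-INF/services/java.nio.charset.spi.CharsetProvider.
final class BidiCharsetProvider extends CharsetProvider {
    private final Charset iso88598i = new Iso88598ICharset();

    @Override
    public Iterator<Charset> charsets() {
        // Other implicit/explicit bidi charsets could be added to this list.
        return List.of(iso88598i).iterator();
    }

    @Override
    public Charset charsetForName(String charsetName) {
        if (iso88598i.name().equalsIgnoreCase(charsetName)) {
            return iso88598i;
        }
        for (String alias : iso88598i.aliases()) {
            if (alias.equalsIgnoreCase(charsetName)) {
                return iso88598i;
            }
        }
        return null; // not ours; let other providers answer
    }
}

final class Iso88598IDemo {
    public static void main(String[] args) {
        // Calls the provider directly, so no service registration is needed here.
        Charset cs = new BidiCharsetProvider().charsetForName("iso-8859-8-i");
        byte[] alef = { (byte) 0xE0 }; // HEBREW LETTER ALEF in ISO-8859-8
        String decoded = new String(alef, cs);
        System.out.println(cs.name() + " -> U+"
                + String.format("%04X", (int) decoded.charAt(0)));
    }
}
```

With something like this registered, ISO-8859-8-I would show up under its own canonical name rather than among the aliases of ISO-8859-8, which is what the specification text quoted above seems to require.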

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20690#issuecomment-2405607186