[jdk8u-dev] RFR: 8301119: Support for GB18030-2022

Andrew John Hughes andrew at openjdk.org
Wed Jul 12 11:32:26 UTC 2023


This patch modifies GB18030 to handle both the 2000 and the 2022 variants. The 2022 variant is the default; the 2000 variant is available by setting `-Djdk.charset.GB18030=2000`. This PR replaces https://github.com/openjdk/jdk8u/pull/45.
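
As a rough illustration of how the switch can be observed from application code, the sketch below reads the property named above and prints whatever aliases the runtime reports for GB18030. The class name `Gb18030VariantCheck` is hypothetical, and treating any value other than `2000` as the 2022 default is an assumption about the patch's behaviour, not something taken from its source.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of the patch.
final class Gb18030VariantCheck {
    private Gb18030VariantCheck() {}

    // Assumption: only the exact value "2000" selects the older variant;
    // anything else (including an unset property) means 2022.
    static String requestedVariant() {
        return "2000".equals(System.getProperty("jdk.charset.GB18030")) ? "2000" : "2022";
    }

    public static void main(String[] args) {
        System.out.println("requested variant: " + requestedVariant());
        if (Charset.isSupported("GB18030")) {
            // The alias set is whatever this particular runtime exposes.
            System.out.println("aliases: " + Charset.forName("GB18030").aliases());
        } else {
            System.out.println("GB18030 is not available in this runtime");
        }
    }
}
```

Running it once with no option and once with `-Djdk.charset.GB18030=2000` should show whether the reported aliases differ on a given build.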

With the preceding test changes in place (https://github.com/openjdk/jdk8u/pull/43 and https://github.com/openjdk/jdk8u/pull/44), the changes needed for this are fairly minimal. The biggest divergence from 11u is in the character set providers. The changes in the `make` directory are not needed as 8u never moved to using a template for GB18030 in the first place (the 11u changes revert it back to being source-based). The change in the SPI.java generator tool moves into ExtendedCharsets.java in the class library, as the file is not auto-generated in 8u. Following additional work by @jerboaa, the alias is now set to `2022` initially, and then replaced in the `init()` method if `jdk.charset.GB18030` is `2000`.

In 8u, the standard charsets are generated from a text file by a shell script, while the extended charsets are handled by a standard class. 11u moves GB18030 from extended to standard. I experimented with this in 8u, but it seemed more problematic than just keeping it in the extended set. The only reason I can see for moving it in 11u is that it allowed `IS_2000` to be package-private to `sun.nio.cs`. This is resolved by @jerboaa's changes, which modify `aliasMap` appropriately on creation and so allow the use of a package-private method `isGB18030_2000` in `ExtendedCharsets` instead.
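
The alias handling described above can be sketched roughly as follows. This is a simplified, self-contained model and not the code in `ExtendedCharsets.java`: the class `AliasMapSketch`, the alias strings `gb18030-2022` and `gb18030-2000`, and the way `isGB18030_2000` is derived from the map are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of a provider's alias table; names are hypothetical.
final class AliasMapSketch {
    private final Map<String, String> aliasMap = new HashMap<>();

    AliasMapSketch() {
        // The 2022 alias is registered up front (assumed alias string).
        aliasMap.put("gb18030-2022", "GB18030");
    }

    // Called once during initialisation with the value of jdk.charset.GB18030.
    void init(String variantProperty) {
        if ("2000".equals(variantProperty)) {
            // Swap the alias so lookups reflect the 2000 variant.
            aliasMap.remove("gb18030-2022");
            aliasMap.put("gb18030-2000", "GB18030");
        }
    }

    // Package-private query, standing in for the method named in the text.
    boolean isGB18030_2000() {
        return aliasMap.containsKey("gb18030-2000");
    }

    String canonicalName(String alias) {
        return aliasMap.get(alias);
    }
}
```

Keeping the decision inside the provider is what lets the charset class ask a package-private question rather than exposing a flag across packages.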

Using the 11u solution would mean either major rewrites to the shell script or bringing over the whole change in how the standard charset provider is generated from 11u. Along with moving the package the character set is in, I think that is too risky and unnecessary for this change. The generation changes are necessary because the GB18030 character set needs to provide a different alias, depending on whether it is the 2000 or 2022 variant. `genCharsetProvider.sh` would need the alterations we have added to `ExtendedCharsets.java` to handle this, but converted to `awk`. I did experiment with this, but saw test failures.

The only adjustment to the `GB18030.java` changes is copyright headers and the removal of `IS_2000` as mentioned above.

With the tests, the adjustments are just due to differing bug IDs, the absence of `@modules` and the use of constructs (`var`) and library calls (`Set.of`) that don't exist in 8u. The `List.of` and `Set.of` calls are frequent issues in backports, so I used this as an opportunity to introduce a full set of equivalents into the test library. It should now be possible to just rewrite `Set.of` to `Utils.setOf` and `List.of` to `Utils.listOf`. The returned collections are expected to be unmodifiable, not contain `null` and (in the case of sets) not contain duplicates. Simple replacement with a newly constructed `ArrayList` or `HashSet` would not ensure this. While this test does not rely on this, others may, so it seemed worth providing a closer replacement for use in future backports.
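
For readers unfamiliar with the contract being emulated, here is a minimal sketch of what such 8u-compatible helpers could look like. It is not the code added to the test library: the class name `UtilsSketch` is a hypothetical stand-in, and the choice of exceptions (`NullPointerException` for `null`, `IllegalArgumentException` for duplicates) mirrors the documented behaviour of `List.of` and `Set.of` rather than anything confirmed about the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the test library helpers; compiles on Java 8.
final class UtilsSketch {
    private UtilsSketch() {}

    // Unmodifiable list that rejects null elements.
    @SafeVarargs
    static <E> List<E> listOf(E... elements) {
        List<E> list = new ArrayList<>(elements.length);
        for (E e : elements) {
            list.add(Objects.requireNonNull(e, "null element"));
        }
        return Collections.unmodifiableList(list);
    }

    // Unmodifiable set that rejects null elements and duplicates.
    @SafeVarargs
    static <E> Set<E> setOf(E... elements) {
        Set<E> set = new LinkedHashSet<>();
        for (E e : elements) {
            if (!set.add(Objects.requireNonNull(e, "null element"))) {
                throw new IllegalArgumentException("duplicate element: " + e);
            }
        }
        return Collections.unmodifiableSet(set);
    }
}
```

A plain `new HashSet<>(Arrays.asList(...))` would silently drop duplicates and accept `null`, which is the gap these checks are meant to close.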

All `sun.nio.cs` and `java.nio.charset` tests pass with this patch applied.

-------------

Commit messages:
 - Fix for JDK-8310947 and some suggested cleanup
 - Backport 5c4e744dabcf7785c35168db5d0458ccebfd41e6

Changes: https://git.openjdk.org/jdk8u-dev/pull/339/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk8u-dev&pr=339&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8301119
  Stats: 1035 lines in 9 files changed: 926 ins; 12 del; 97 mod
  Patch: https://git.openjdk.org/jdk8u-dev/pull/339.diff
  Fetch: git fetch https://git.openjdk.org/jdk8u-dev.git pull/339/head:pull/339

PR: https://git.openjdk.org/jdk8u-dev/pull/339

