[jdk8u-dev] RFR: 8301119: Support for GB18030-2022

Thomas Stuefe stuefe at openjdk.org
Wed Jul 12 16:58:14 UTC 2023


On Wed, 12 Jul 2023 11:21:53 GMT, Andrew John Hughes <andrew at openjdk.org> wrote:

> This patch modifies GB18030 to handle both the 2000 and the 2022 variant. The 2000 variant is available by setting `-Djdk.charset.GB18030=2000`.  This PR replaces https://github.com/openjdk/jdk8u/pull/45.
> 
> With the preceding test changes in place (https://github.com/openjdk/jdk8u/pull/43 and https://github.com/openjdk/jdk8u/pull/44), the changes needed for this are fairly minimal. The biggest divergence from 11u is in the character set providers. The changes in the `make` directory are not needed as 8u never moved to using a template for GB18030 in the first place (the 11u changes revert it back to being source-based). The change in the SPI.java generator tool moves into ExtendedCharsets.java in the class library, as the file is not auto-generated in 8u. Following additional work by @jerboaa, the alias is now set to `2022` initially, and then replaced in the `init()` method if `jdk.charset.GB18030` is `2000`.
> 
> In 8u, the standard charsets are generated from a text file by a shell script, while the extended charsets are handled by a standard class. 11u moves GB18030 from extended to standard. I experimented with this in 8u, but it seemed more problematic than just keeping it in the extended set. The only reason I can see for moving it in 11u is it allowed `IS_2000` to be package-private to sun.nio.cs. This is resolved by @jerboaa's changes which modify `aliasMap` appropriately on creation, and so allow the use of a package-private method `isGB18030_2000` in `ExtendedCharsets` instead.
> 
> To use the 11u solution would mean major rewrites to the shell script or bringing over the whole change in how the standard charset provider is generated from 11u, which I think, along with moving the package the character set is in, is too risky and unnecessary for this change. The generation changes are necessary because the GB18030 character set needs to provide a different alias, depending on whether it is the 2000 or 2002 variant. The `genCharsetProvider.sh` would need the alterations we have added to `ExtendedCharsets.java` to handle this, but converted to `awk`. I did experiment with this, but saw test failures.
> 
> The only adjustment to the `GB18030.java` changes is copyright headers and the removal of `IS_2000` as mentioned above.
> 
> With the tests, the adjustments are just due to differing bug IDs, the absence of `@modules` and the use of constructs (`var`) and library calls (`Set.of`) that don't exist in 8u. The `List.of` and `Set.of` calls are frequent issues in backp...

Hi Andrew,

I looked through the patch, and compared it with 11u, too, looks fine. Minor remarks inline, feel free to ignore the style nits.

jdk/src/share/classes/sun/nio/cs/ext/ExtendedCharsets.java line 1189:

> 1187:             new GetPropertyAction("sun.nio.cs.map"));
> 1188:         boolean isGB18030_2000 = "2000".equals(AccessController.doPrivileged(
> 1189:                 new GetPropertyAction("jdk.charset.GB18030")));

More user friendly would be to test for "2000" or "2023" and to throw IAE for other values; but since later JDK versions don't do this either this is probably fine.

jdk/src/share/classes/sun/nio/cs/ext/ExtendedCharsets.java line 1217:

> 1215:                     "gb18030-2000"
> 1216:                 });
> 1217:         }

Can you please add this replacement to the comment above listing the changes done in init?

jdk/src/share/classes/sun/nio/cs/ext/GB18030.java line 12640:

> 12638:             int hiByte = 0, loByte = 0;
> 12639:             currentState = GB18030_DOUBLE_BYTE;
> 12640:             boolean isGB18030_2000 = ExtendedCharsets.isGB18030_2000();

Here, and in other places: naming these vars IS_2000 would have simplified reviewing (mechanical comparison with 11)

jdk/test/lib/testlibrary/jdk/testlibrary/Utils.java line 990:

> 988:         if (e4 == null) { throw new NullPointerException("e4"); }
> 989:         if (e5 == null) { throw new NullPointerException("e5"); }
> 990:         if (e6 == null) { throw new NullPointerException("e6"); }

Here, and in other places: Objects.requireNonNull may be shorter.

jdk/test/lib/testlibrary/jdk/testlibrary/Utils.java line 1004:

> 1002:         if (!added) { throw new IllegalArgumentException("duplicate 5"); }
> 1003:         added = s.add(e6);
> 1004:         if (!added) { throw new IllegalArgumentException("duplicate 6"); }

Could be shortened, but thats up to you (if one does not care for the string, a simple `if (!(s.add(e1) && s.add(e2) && ...` would be sufficient.

-------------

Marked as reviewed by stuefe (Committer).

PR Review: https://git.openjdk.org/jdk8u-dev/pull/339#pullrequestreview-1526759863
PR Review Comment: https://git.openjdk.org/jdk8u-dev/pull/339#discussion_r1261429640
PR Review Comment: https://git.openjdk.org/jdk8u-dev/pull/339#discussion_r1261428343
PR Review Comment: https://git.openjdk.org/jdk8u-dev/pull/339#discussion_r1261437418
PR Review Comment: https://git.openjdk.org/jdk8u-dev/pull/339#discussion_r1261450643
PR Review Comment: https://git.openjdk.org/jdk8u-dev/pull/339#discussion_r1261455655


More information about the jdk8u-dev mailing list