[jdk8u-dev] RFR: 8301119: Support for GB18030-2022

Andrew John Hughes andrew at openjdk.org
Thu Jul 13 01:11:09 UTC 2023


On Wed, 12 Jul 2023 16:45:19 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> This patch modifies GB18030 to handle both the 2000 and the 2022 variant. The 2000 variant is available by setting `-Djdk.charset.GB18030=2000`.  This PR replaces https://github.com/openjdk/jdk8u/pull/45.
>> 
>> With the preceding test changes in place (https://github.com/openjdk/jdk8u/pull/43 and https://github.com/openjdk/jdk8u/pull/44), the changes needed for this are fairly minimal. The biggest divergence from 11u is in the character set providers. The changes in the `make` directory are not needed as 8u never moved to using a template for GB18030 in the first place (the 11u changes revert it back to being source-based). The change in the SPI.java generator tool moves into ExtendedCharsets.java in the class library, as the file is not auto-generated in 8u. Following additional work by @jerboaa, the alias is now set to `2022` initially, and then replaced in the `init()` method if `jdk.charset.GB18030` is `2000`.
>> 
>> In 8u, the standard charsets are generated from a text file by a shell script, while the extended charsets are handled by a standard class. 11u moves GB18030 from extended to standard. I experimented with this in 8u, but it seemed more problematic than just keeping it in the extended set. The only reason I can see for moving it in 11u is it allowed `IS_2000` to be package-private to sun.nio.cs. This is resolved by @jerboaa's changes which modify `aliasMap` appropriately on creation, and so allow the use of a package-private method `isGB18030_2000` in `ExtendedCharsets` instead.
>> 
>> To use the 11u solution would mean major rewrites to the shell script or bringing over the whole change in how the standard charset provider is generated from 11u, which I think, along with moving the package the character set is in, is too risky and unnecessary for this change. The generation changes are necessary because the GB18030 character set needs to provide a different alias, depending on whether it is the 2000 or 2022 variant. The `genCharsetProvider.sh` script would need the alterations we have added to `ExtendedCharsets.java` to handle this, but converted to `awk`. I did experiment with this, but saw test failures.
>> 
>> The only adjustment to the `GB18030.java` changes is copyright headers and the removal of `IS_2000` as mentioned above.
>> 
>> With the tests, the adjustments are just due to differing bug IDs, the absence of `@modules` and the use of constructs (`var`) and library calls (`Set.of`) that don't exist in 8u. The `List.of` and `Set.of` calls are fr...
>
> jdk/test/lib/testlibrary/jdk/testlibrary/Utils.java line 990:
> 
>> 988:         if (e4 == null) { throw new NullPointerException("e4"); }
>> 989:         if (e5 == null) { throw new NullPointerException("e5"); }
>> 990:         if (e6 == null) { throw new NullPointerException("e6"); }
> 
> Here, and in other places: Objects.requireNonNull may be shorter.

Completely forgot that method existed!
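
For reference, a minimal sketch of what that suggestion could look like. The class and method names (`RequireNonNullSketch`, `listOfVerbose`, `listOf`) are hypothetical stand-ins and not the actual code in `Utils.java`; the only API relied on is `java.util.Objects.requireNonNull(T, String)`, which is available in 8u:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for the List.of-style helpers in the test library;
// all names here are illustrative only.
class RequireNonNullSketch {

    // Current shape: one explicit null check per element.
    static <E> List<E> listOfVerbose(E e1, E e2) {
        if (e1 == null) { throw new NullPointerException("e1"); }
        if (e2 == null) { throw new NullPointerException("e2"); }
        return Collections.unmodifiableList(Arrays.asList(e1, e2));
    }

    // Suggested shape: Objects.requireNonNull throws a NullPointerException
    // carrying the supplied message, so the behaviour is unchanged.
    static <E> List<E> listOf(E e1, E e2) {
        Objects.requireNonNull(e1, "e1");
        Objects.requireNonNull(e2, "e2");
        return Collections.unmodifiableList(Arrays.asList(e1, e2));
    }
}
```

Both variants reject a null element with the same exception type and message, so tests that depend on the failure mode should not need to change.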

-------------

PR Review Comment: https://git.openjdk.org/jdk8u-dev/pull/339#discussion_r1261864914

