[jdk8u] RFR: 8186801: Add regression test to test mapping based charsets (generated at build time)
Andrew John Hughes
andrew at openjdk.org
Fri Jun 16 16:05:31 UTC 2023
This is a prerequisite for [JDK-8301119](https://bugs.openjdk.org/browse/JDK-8301119) ("Support for GB18030-2022") and so is being proposed for inclusion in 8u382 during rampdown, so that the changes are in place when GB18030-2022 enforcement begins in August. It introduces `GB18030.map`, containing the character mappings for GB18030, and tests that verify the mappings are correct at run time.
A number of changes were necessary for 8u. The main reason is that the OpenJDK 10+ version of this fix also included [JDK-8186803](https://bugs.openjdk.org/browse/JDK-8186803) ("Update Cp1140-Cp1149 EBCDIC euro charset to map \u000A to EBCDIC 0x15"). I have removed those changes from the 8u version to avoid making potentially incompatible library changes and to focus on testing the current character mappings in 8u.
Another is that the character set data is spread across three files - `dbcs`, `sbcs` and `extsbcs` - in 8u, whereas it was amalgamated into a single file, `charsets`, during the introduction of modules. The coding test (`TestCharsetMapping.java`) has been adapted to use the 8u data format.
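As a rough illustration of what checking a mapping at run time involves, the sketch below decodes and re-encodes each expected pair with the standard `java.nio.charset` API and counts disagreements. It is not taken from `TestCharsetMapping.java`; the class name `MappingCheck`, the method `countMismatches` and the in-memory expected map are all hypothetical, and the real test reads its expectations from the `.map` files.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the patch.
final class MappingCheck {

    /** Counts entries whose decode or encode result differs from the expected pair. */
    static int countMismatches(Charset cs, Map<byte[], String> expected) {
        int errors = 0;
        for (Map.Entry<byte[], String> e : expected.entrySet()) {
            try {
                // REPORT makes malformed or unmappable input throw
                // instead of being silently replaced.
                String decoded = cs.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(e.getKey()))
                        .toString();
                ByteBuffer bb = cs.newEncoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .encode(CharBuffer.wrap(e.getValue()));
                byte[] encoded = new byte[bb.remaining()];
                bb.get(encoded);
                if (!decoded.equals(e.getValue())
                        || !Arrays.equals(encoded, e.getKey())) {
                    errors++;
                }
            } catch (CharacterCodingException ex) {
                // A coding failure on an expected pair counts as a mismatch.
                errors++;
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Tiny hand-written expectation table; a real test would load a .map file.
        Map<byte[], String> expected = new LinkedHashMap<>();
        expected.put(new byte[] {0x41}, "A");
        expected.put(new byte[] {(byte) 0xE9}, "\u00E9");
        System.out.println(countMismatches(Charset.forName("ISO-8859-1"), expected));
    }
}
```

The example uses ISO-8859-1 only because it is guaranteed to be present on every Java platform; the same check applies unchanged to any charset returned by `Charset.forName`.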
The detailed changes from the OpenJDK 10 patch are as follows:
1. Drop the introduction of the `IBM114x.nr` files which implement JDK-8186803.
2. Drop the change to `charsets`, as that file doesn't exist in 8u and any equivalent change may lead to compatibility issues
3. `EUC_TW.java` has slightly different context in 8u (no `pkg` argument) but the filename change is the same
4. Drop the alias changes in `MS932_0213.java` & `x-MS932_0213` to avoid a compatibility risk.
5. Changes to `EuroConverter.java` are dropped as they relate to JDK-8186803.
6. Detect the IBM0114x character sets in `TestCharsetMapping.java` and expect an additional 0xA -> 25 mapping rather than counting this as an error
7. Remove the parsing and checking of aliases in `TestCharsetMapping.java` as the 8u data files don't store them
8. Handle the internal character sets in `TestCharsetMapping.java` as they are not marked as such in the 8u data files
9. Change the data file parsing in `TestCharsetMapping.java` to handle the three 8u data files. `dbcs` has more fields than the other two, but the first five fields, which are the ones we actually use, are mostly the same (`dbcs` has a type field, the other two assume a type of `sbcs`).
10. `TestEBCDICLineFeed.java` is modified to handle IBM0114x as is, without the JDK-8186803 change
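Items 6 and 9 above can be sketched roughly as follows. This is not the code in `TestCharsetMapping.java`: the class `CharsetDataLine`, its field names, and in particular the assumed column order (class name, charset name, historical name, then either a type column or a column that is ignored, then the package) are assumptions made for illustration, and the sample lines in the tests are invented rather than copied from the 8u data files.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; the column order is an assumption, not the 8u file format.
final class CharsetDataLine {
    final String className;
    final String charsetName;
    final String historicalName;
    final String type;
    final String pkg;

    // Matches IBM01140 through IBM01149, i.e. the "IBM0114x" euro charsets.
    private static final Pattern IBM_EURO = Pattern.compile("IBM0114[0-9]");

    private CharsetDataLine(String className, String charsetName,
                            String historicalName, String type, String pkg) {
        this.className = className;
        this.charsetName = charsetName;
        this.historicalName = historicalName;
        this.type = type;
        this.pkg = pkg;
    }

    /**
     * Parses one whitespace-separated line. Returns null for blank lines and
     * '#' comments. When hasTypeField is false (the sbcs/extsbcs case in this
     * sketch), the fourth column is ignored and the type defaults to "sbcs".
     */
    static CharsetDataLine parse(String line, boolean hasTypeField) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null;
        }
        String[] f = trimmed.split("\\s+");
        if (f.length < 5) {
            throw new IllegalArgumentException("Expected at least 5 fields: " + line);
        }
        String type = hasTypeField ? f[3] : "sbcs";
        return new CharsetDataLine(f[0], f[1], f[2], type, f[4]);
    }

    /** True for the charsets that need the extra line-feed mapping allowance. */
    static boolean isIbmEuro(String charsetName) {
        return IBM_EURO.matcher(charsetName).matches();
    }

    public static void main(String[] args) {
        // Invented sample line, parsed as if it came from a file with a type column.
        CharsetDataLine d = parse("Example x-Example Example basic example.pkg", true);
        System.out.println(d.charsetName + " " + d.type + " " + isIbmEuro(d.charsetName));
    }
}
```

Passing the presence of the type column as a flag keeps a single parser for all three files, which is one plausible way to cope with the `dbcs` layout differing from `sbcs` and `extsbcs`.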
-------------
Commit messages:
- Backport cfe34ed89c4f6ef9a49dceef30da1e43b418b152
Changes: https://git.openjdk.org/jdk8u/pull/43/files
Webrev: https://webrevs.openjdk.org/?repo=jdk8u&pr=43&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8186801
Stats: 144279 lines in 14 files changed: 130668 ins; 12678 del; 933 mod
Patch: https://git.openjdk.org/jdk8u/pull/43.diff
Fetch: git fetch https://git.openjdk.org/jdk8u.git pull/43/head:pull/43
PR: https://git.openjdk.org/jdk8u/pull/43