[jdk8u] RFR: 8186801: Add regression test to test mapping based charsets (generated at build time)
Thomas Stuefe
stuefe at openjdk.org
Thu Jun 22 08:54:10 UTC 2023
On Fri, 16 Jun 2023 15:55:38 GMT, Andrew John Hughes <andrew at openjdk.org> wrote:
> This is a pre-requisite for [JDK-8301119](https://bugs.openjdk.org/browse/JDK-8301119) ("Support for GB18030-2022") and so is being proposed for inclusion in 8u382 during rampdown, so that the changes are in place for when GB18030-2022 enforcement begins in August. It introduces `GB18030.map`, containing the character mappings for GB18030, and tests to verify the correct mappings happen at run-time.
>
> A number of changes were necessary for 8u. One main reason was the inclusion of [JDK-8186803](https://bugs.openjdk.org/browse/JDK-8186803) "Update Cp1140-Cp1149 EBEDIC euro charset to map \u000A to EBCDIC 0x15" as part of this fix in OpenJDK 10+. I have removed these changes in the 8u version so as to avoid making potentially incompatible library changes and focus on testing the current character mappings in 8u.
>
> Another is that the character set data is spread across three files - `dbcs`, `sbcs` and `extsbcs` - in 8u, whereas it was amalgamated into a single file, `charsets`, during the introduction of modules. The coding test (`TestCharsetMapping.java`) has been adapted to use the 8u data format.
>
> The detailed changes from the OpenJDK 10 patch are as follows:
>
> 1. Drop the introduction of the `IBM114x.nr` files which implement JDK-8186803.
> 2. Drop the change to `charsets` which doesn't exist in 8u and any equivalent change may lead to compatibility issues
> 3. `EUC_TW.java` has slightly different context in 8u (no `pkg` argument) but the filename change is the same
> 4. Drop the alias changes in `MS932_0213.java` & `x-MS932_0213` to avoid a compatibility risk.
> 5. Changes to `EuroConverter.java` are dropped as they relate to JDK-8186803.
> 6. Detect the IBM0114x character sets in `TestCharsetMapping.java` and expect an additional 0xA -> 25 mapping rather than counting this as an error
> 7. Remove the parsing and checking of aliases in `TestCharsetMapping.java` as the 8u data files don't store them
> 8. Handle the internal character sets in `TestCharsetMapping.java` as they are not marked as such in the 8u data files
> 9. Change the data file parsing in `TestCharsetMapping.java` to handle the three 8u data files. `dbcs` has additional fields to the other two, but the first five fields that we actually use are mostly the same (`dbcs` has a type field, the other two assume a type of `sbcs`).
> 10. `TestEBCDICLineFeed.java` is modified to handle IBM0114x as is, without the JDK-8186803 change
Hi,
First review round. (Please note that formally, I am not jdk8u reviewer).
looked at the delta between two patches (yours and the JDK 10 one) and the delta on the file system between the frozen jdk10 repo and jdk8u-dev with your patch applied.
GB18030.map
idendical after patch
Okay.
EUC_TW.map:
- identical after patch
- renamed uppercase in 8
- I looked for remnants of lower case "euc_tw" but all I found was the historical alias.
Okay.
Looked at standard charset definitions. So, 8 (dbcs sbcs extsbcs) => 10 (charsets), and they changed the file format too.
Looked at the EBCDIC linefeed test. Arguably, you would not have to do this, or could do this in a separate patch. It has no bearing on the upcoming GB18030. But it's good to have. JDK-8186803 was a brainteaser, though, and it's annoying they did not do an individual patch.
Left to review is the Testcase itself. Will do after lunch.
Cheers, Thomas
jdk/test/sun/nio/cs/TestEBCDICLineFeed.java line 56:
> 54: cs, bb[0] & 0xff);
> 55: errs++;
> 56: }
Took some thinking, but I think this is okay.
Are we sure the old (pre-8186803) translations are always "U+000A" -> 25? I am not sure, it looked to me like it could be either mapped to E15 or E25.
-------------
PR Review: https://git.openjdk.org/jdk8u/pull/43#pullrequestreview-1492697234
PR Review Comment: https://git.openjdk.org/jdk8u/pull/43#discussion_r1238206969
More information about the jdk8u-dev
mailing list