<i18n dev> RFR: 8364365: HKSCS encoder does not properly set the replacement character
Volkan Yazici
vyazici at openjdk.org
Tue Aug 5 08:38:02 UTC 2025
On Tue, 5 Aug 2025 08:17:40 GMT, Volkan Yazici <vyazici at openjdk.org> wrote:
> Fix `HKSCS` encoder to correctly set the replacement character, and add tests to verify the `CodingErrorAction.REPLACE` behavior of all available encoders.
test/jdk/sun/nio/cs/TestEncoderReplaceLatin1.java line 1:
> 1: /*
Without the `HKSCS` fix, this test fails for the following charsets:

- `Big5-HKSCS`
- `x-Big5-HKSCS-2001`
- `x-MS950-HKSCS`
- `x-MS950-HKSCS-XP`
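The `CodingErrorAction.REPLACE` behavior these tests verify can be illustrated with public `java.nio.charset` APIs alone. What follows is a minimal sketch, not the test's actual code. The class and method names are hypothetical, and US-ASCII with `é` (U+00E9) stands in for an encoder/unmappable-input pair.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReplaceCheckSketch {

    // Hypothetical helper: encodes `unmappable` with CodingErrorAction.REPLACE and reports whether
    // the output is exactly one copy of `replacement`. Assumes `unmappable` holds a single
    // unmappable character and that the encoder emits no shift sequences around the replacement.
    static boolean emitsReplacement(CharsetEncoder encoder, char[] unmappable, byte[] replacement) {
        encoder.reset()
               .onMalformedInput(CodingErrorAction.REPORT)
               .onUnmappableCharacter(CodingErrorAction.REPLACE)
               .replaceWith(replacement);
        // Room for every input char to be either encoded or replaced.
        int capacity = unmappable.length * (replacement.length + (int) Math.ceil(encoder.maxBytesPerChar()));
        ByteBuffer out = ByteBuffer.allocate(capacity);
        CoderResult result = encoder.encode(CharBuffer.wrap(unmappable), out, true);
        if (result.isError() || result.isOverflow() || encoder.flush(out).isError()) {
            return false;
        }
        out.flip();
        byte[] actual = new byte[out.remaining()];
        out.get(actual);
        return Arrays.equals(actual, replacement);
    }

    public static void main(String[] args) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
        // U+00E9 has no US-ASCII mapping; '#' differs from the default '?' replacement.
        System.out.println(emitsReplacement(encoder, new char[] {'\u00E9'}, new byte[] {'#'}));
    }
}
```

Running `main` prints `true`. The bug fixed here would show up as `false` for the HKSCS charsets listed above, because their encoders did not emit the configured replacement.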
test/jdk/sun/nio/cs/TestEncoderReplaceLatin1.java line 139:
> 137: return coderResult.isUnmappable();
> 138: };
> 139: }
I'd appreciate it if you could double-check these _"Is the given `char[]` unmappable for a particular encoder?"_ test generators.
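For reviewers, here is one way such a predicate could be phrased, assuming it mirrors the quoted `coderResult.isUnmappable()` check: encode with `REPORT` and inspect the resulting `CoderResult`. This is a sketch; the factory name and signature are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.util.function.Predicate;

public class UnmappablePredicateSketch {

    // Hypothetical factory for an "Is the given char[] unmappable for this encoder?" predicate.
    // The returned predicate mutates `encoder`, so it is not thread-safe.
    static Predicate<char[]> unmappableFor(CharsetEncoder encoder) {
        return chars -> {
            encoder.reset()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT);
            // Sized so that fully mappable input does not overflow the output buffer.
            int capacity = (int) Math.ceil(encoder.maxBytesPerChar() * Math.max(1, chars.length));
            CoderResult coderResult =
                    encoder.encode(CharBuffer.wrap(chars), ByteBuffer.allocate(capacity), true);
            // Malformed input (e.g. a lone surrogate) is reported as malformed, hence not "unmappable".
            return coderResult.isUnmappable();
        };
    }
}
```

Distinguishing `isUnmappable()` from `isMalformed()` matters here. A lone surrogate would otherwise be mistaken for an unmappable input, and the `REPLACE` check would then exercise the malformed-input path instead.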
test/jdk/sun/nio/cs/TestEncoderReplaceLatin1.java line 145:
> 143: * different from the given unmappable and the default one.
> 144: */
> 145: static byte[] findCustomReplacement(CharsetEncoder encoder, byte[] unmappable) {
I'd appreciate it if you could double-check this method.
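To make the documented contract concrete ("different from the given unmappable and the default one"), here is a brute-force sketch. It assumes that `unmappable` is a byte sequence the result must avoid, and that some 1- or 2-byte sequence is always a legal replacement. The class name and the search strategy are hypothetical, not the test's implementation.

```java
import java.nio.charset.CharsetEncoder;
import java.util.Arrays;

public class CustomReplacementSketch {

    // Hypothetical search mirroring the documented contract: returns the first 1- or 2-byte
    // sequence that is legal for `encoder` and differs from both `unmappable` and the encoder's
    // default replacement, or null if no such sequence exists within that range.
    static byte[] findCustomReplacement(CharsetEncoder encoder, byte[] unmappable) {
        byte[] defaultReplacement = encoder.replacement();
        for (int length = 1; length <= 2; length++) {
            int limit = 1 << (8 * length);  // 256 or 65536 candidates
            for (int value = 0; value < limit; value++) {
                byte[] candidate = new byte[length];
                for (int i = 0; i < length; i++) {
                    // Big-endian unpacking of `value` into `length` bytes.
                    candidate[i] = (byte) (value >>> (8 * (length - 1 - i)));
                }
                if (!Arrays.equals(candidate, unmappable)
                        && !Arrays.equals(candidate, defaultReplacement)
                        && encoder.isLegalReplacement(candidate)) {
                    return candidate;
                }
            }
        }
        return null;
    }
}
```

`isLegalReplacement` is the same check `replaceWith` applies, so any non-null result can be passed to `replaceWith` without an `IllegalArgumentException`.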
test/jdk/sun/nio/cs/TestEncoderReplaceUTF16.java line 50:
> 48: * @build TestEncoderReplaceLatin1
> 49: * @run junit/timeout=10 TestEncoderReplaceUTF16
> 50: * @run junit/timeout=10/othervm -XX:-CompactStrings TestEncoderReplaceUTF16
1. Exercising both compact and inflated layouts for UTF-16.
2. Timeouts ensure that, if the `CHARSETS_WITHOUT_UNMAPPABLE` fast-path in `findUnmappableNonLatin1()` becomes ineffective, we will notice.
test/jdk/sun/nio/cs/TestEncoderReplaceUTF16.java line 140:
> 138: * Finds an {@linkplain CoderResult#isUnmappable() unmappable} non-Latin-1 {@code char[]} for the given encoder.
> 139: */
> 140: private static char[] findUnmappableNonLatin1(CharsetEncoder encoder) {
I'd appreciate it if you could double-check this method.
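For comparison, a straightforward version of such a search might scan the BMP above U+00FF. This is a minimal sketch assuming that a single non-surrogate `char` suffices as the unmappable input; the class name is hypothetical.

```java
import java.nio.charset.CharsetEncoder;

public class UnmappableNonLatin1Sketch {

    // Hypothetical brute-force search: scans the BMP above U+00FF, skipping surrogates
    // (which are malformed on their own rather than unmappable), and returns the first
    // char the encoder cannot encode, or null if every such char is mappable.
    static char[] findUnmappableNonLatin1(CharsetEncoder encoder) {
        for (int c = 0x100; c <= 0xFFFF; c++) {
            if (Character.isSurrogate((char) c)) {
                continue;
            }
            if (!encoder.canEncode((char) c)) {
                return new char[] {(char) c};
            }
        }
        return null;
    }
}
```

`canEncode(char)` may reset the encoder's state. It therefore has to be called before, not during, an encoding operation.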
test/jdk/sun/nio/cs/TestEncoderReplaceUTF16.java line 146:
> 144: System.err.println("Character set is known to be absent of unmappable non-Latin-1 characters!");
> 145: return null;
> 146: }
Without this fast-path, this test takes several minutes to complete, because `findUnmappableNonLatin1()` takes ~20 seconds for each character set that has no unmappable non-Latin-1 characters.
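The fast-path idea can be sketched as a guard that checks a set of known-safe charset names before running the expensive scan. In this sketch the set's contents are an assumption: it lists Unicode encodings, which map every non-surrogate BMP char. It is not the test's actual `CHARSETS_WITHOUT_UNMAPPABLE`, and the slow scan is passed in as a parameter.

```java
import java.nio.charset.CharsetEncoder;
import java.util.Set;
import java.util.function.Function;

public class FastPathSketch {

    // Assumption: illustrative contents only, not the test's actual set. Unicode encodings
    // can encode every non-surrogate BMP char, so scanning them is wasted effort.
    static final Set<String> CHARSETS_WITHOUT_UNMAPPABLE =
            Set.of("UTF-8", "UTF-16", "UTF-16BE", "UTF-16LE", "UTF-32", "UTF-32BE", "UTF-32LE");

    // Hypothetical guard: consults the known-safe set before running the expensive scan.
    static char[] findUnmappableNonLatin1(
            CharsetEncoder encoder, Function<CharsetEncoder, char[]> slowScan) {
        if (CHARSETS_WITHOUT_UNMAPPABLE.contains(encoder.charset().name())) {
            System.err.println("Character set is known to be absent of unmappable non-Latin-1 characters!");
            return null;  // fast path: no scan needed
        }
        return slowScan.apply(encoder);  // slow path, e.g. a brute-force scan of the BMP
    }
}
```

Passing the scan as a `Function` lets a test assert that the slow path is never entered for the listed charsets. A per-test timeout offers a coarser version of the same guarantee.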
test/jdk/sun/nio/cs/TestEncoderReplaceUTF16.java line 184:
> 182: }
> 183: return sa;
> 184: }
Used `JavaLangAccess::uncheckedPutCharUTF16` to obtain, for a given `char[]`, the `byte[]` backing the associated `String`.
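`JavaLangAccess` lives in the internal `jdk.internal.access` package and is not available to ordinary code. A public-API approximation is sketched below. It assumes that an inflated (UTF-16) `String` stores each `char` as two bytes in the platform's native byte order. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Utf16LayoutSketch {

    // Hypothetical helper: lays out `chars` as a UTF-16 byte[] in native byte order,
    // assumed to match the internal layout of a non-compact String.
    static byte[] toUtf16Bytes(char[] chars) {
        ByteBuffer buffer = ByteBuffer.allocate(chars.length * Character.BYTES)
                                      .order(ByteOrder.nativeOrder());
        buffer.asCharBuffer().put(chars);  // the char view shares content with `buffer`
        return buffer.array();
    }
}
```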
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/26635#discussion_r2253526576
PR Review Comment: https://git.openjdk.org/jdk/pull/26635#discussion_r2253550878
PR Review Comment: https://git.openjdk.org/jdk/pull/26635#discussion_r2253554481
PR Review Comment: https://git.openjdk.org/jdk/pull/26635#discussion_r2253532986
PR Review Comment: https://git.openjdk.org/jdk/pull/26635#discussion_r2253556174
PR Review Comment: https://git.openjdk.org/jdk/pull/26635#discussion_r2253530248
PR Review Comment: https://git.openjdk.org/jdk/pull/26635#discussion_r2253542908