<i18n dev> RFR: 8364365: HKSCS encoder does not properly set the replacement character

Volkan Yazici vyazici at openjdk.org
Tue Aug 5 08:38:02 UTC 2025


On Tue, 5 Aug 2025 08:17:40 GMT, Volkan Yazici <vyazici at openjdk.org> wrote:

> Fix `HKSCS` encoder to correctly set the replacement character, and add tests to verify the `CodingErrorAction.REPLACE` behavior of all available encoders.

test/jdk/sun/nio/cs/TestEncoderReplaceLatin1.java line 1:

> 1: /*

Without the `HKSCS` fix, this test fails for the following charsets:

    Big5-HKSCS
    x-Big5-HKSCS-2001
    x-MS950-HKSCS
    x-MS950-HKSCS-XP

test/jdk/sun/nio/cs/TestEncoderReplaceLatin1.java line 139:

> 137:             return coderResult.isUnmappable();
> 138:         };
> 139:     }

I'd appreciate it if you can double-check these _"Is the given `char[]` unmappable for a particular encoder?"_ test generators.
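For reviewers who want a reference point, the check those generators perform can be sketched roughly as follows. This is a minimal illustration using only the public `CharsetEncoder` API, not the code in the PR; the class name `UnmappableCheck` and the helper `isUnmappable` are made up for this example, and US-ASCII is used only because it is always available.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the PR.
public class UnmappableCheck {

    // Returns true if encoding stops with an "unmappable" result for the given chars.
    static boolean isUnmappable(CharsetEncoder encoder, char[] chars) {
        // REPORT makes the encoder return the error instead of replacing or ignoring it.
        encoder.reset()
               .onUnmappableCharacter(CodingErrorAction.REPORT)
               .onMalformedInput(CodingErrorAction.REPORT);
        // Allocate generously so that an overflow does not mask the result.
        int capacity = (int) Math.ceil(encoder.maxBytesPerChar() * chars.length) + 8;
        ByteBuffer out = ByteBuffer.allocate(capacity);
        CoderResult result = encoder.encode(CharBuffer.wrap(chars), out, true);
        return result.isUnmappable();
    }

    public static void main(String[] args) {
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        System.out.println(isUnmappable(ascii, new char[] {'\u00E9'})); // U+00E9 has no US-ASCII mapping
        System.out.println(isUnmappable(ascii, new char[] {'A'}));
    }
}
```

A malformed input (for instance a lone surrogate) yields `isMalformed()` rather than `isUnmappable()`, so a generator built this way treats such inputs as "not unmappable".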

test/jdk/sun/nio/cs/TestEncoderReplaceLatin1.java line 145:

> 143:      * different from the given unmappable and the default one.
> 144:      */
> 145:     static byte[] findCustomReplacement(CharsetEncoder encoder, byte[] unmappable) {

I'd appreciate it if you can double-check this method.
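As context for what the custom replacement is then used for, here is a small sketch of the `CodingErrorAction.REPLACE` behavior with a non-default replacement. It is an illustration under assumptions, not the test code: the class `CustomReplacementSketch` and the method `encodeWithReplacement` are hypothetical names, and US-ASCII stands in for the charsets under test.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the PR.
public class CustomReplacementSketch {

    // Encodes the input, substituting the given bytes for every unmappable character.
    static byte[] encodeWithReplacement(Charset charset, byte[] replacement, String input) {
        CharsetEncoder encoder = charset.newEncoder()
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .onMalformedInput(CodingErrorAction.REPLACE);
        // A replacement must be decodable by the charset, otherwise replaceWith() rejects it.
        if (!encoder.isLegalReplacement(replacement)) {
            throw new IllegalArgumentException("illegal replacement for " + charset);
        }
        encoder.replaceWith(replacement);
        try {
            ByteBuffer out = encoder.encode(CharBuffer.wrap(input));
            byte[] bytes = new byte[out.remaining()];
            out.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] encoded = encodeWithReplacement(
                StandardCharsets.US_ASCII, new byte[] {'_'}, "a\u00E9b");
        System.out.println(new String(encoded, StandardCharsets.US_ASCII)); // prints "a_b"
    }
}
```

An encoder that ignores `replaceWith()` would emit its default replacement here instead of `_`, which is the kind of mismatch the new tests are meant to catch.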

test/jdk/sun/nio/cs/TestEncoderReplaceUTF16.java line 50:

> 48:  * @build TestEncoderReplaceLatin1
> 49:  * @run junit/timeout=10 TestEncoderReplaceUTF16
> 50:  * @run junit/timeout=10/othervm -XX:-CompactStrings TestEncoderReplaceUTF16

1. These two `@run` lines exercise both the compact and the inflated `String` layouts for UTF-16.
2. The timeouts ensure that, if the `CHARSETS_WITHOUT_UNMAPPABLE` fast-path in `findUnmappableNonLatin1()` becomes ineffective, we will notice.

test/jdk/sun/nio/cs/TestEncoderReplaceUTF16.java line 140:

> 138:      * Finds an {@linkplain CoderResult#isUnmappable() unmappable} non-Latin-1 {@code char[]} for the given encoder.
> 139:      */
> 140:     private static char[] findUnmappableNonLatin1(CharsetEncoder encoder) {

I'd appreciate it if you can double-check this method.

test/jdk/sun/nio/cs/TestEncoderReplaceUTF16.java line 146:

> 144:             System.err.println("Character set is known to be absent of unmappable non-Latin-1 characters!");
> 145:             return null;
> 146:         }

Without this fast-path, this test takes several minutes to complete, because `findUnmappableNonLatin1()` takes ~20 seconds for each character set that has no unmappable non-Latin-1 characters.

test/jdk/sun/nio/cs/TestEncoderReplaceUTF16.java line 184:

> 182:         }
> 183:         return sa;
> 184:     }

Used `JavaLangAccess::uncheckedPutCharUTF16` to obtain, for a given `char[]`, the `byte[]` that backs the associated `String`.
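For readers without access to the internal `JavaLangAccess` API, the resulting layout can be approximated with public classes. The sketch below assumes that the UTF-16 `byte[]` holds two bytes per `char` in the platform's native byte order; the class `Utf16BytesSketch` and the method `toUtf16Bytes` are hypothetical and are not what the test uses.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for the internal helper, built on public API only.
public class Utf16BytesSketch {

    // Packs each char into two bytes, using the platform's native byte order (an assumption).
    static byte[] toUtf16Bytes(char[] chars) {
        ByteBuffer buffer = ByteBuffer.allocate(chars.length * 2)
                                      .order(ByteOrder.nativeOrder());
        // The char view writes straight into the backing byte array.
        buffer.asCharBuffer().put(chars);
        return buffer.array();
    }

    public static void main(String[] args) {
        byte[] bytes = toUtf16Bytes(new char[] {'\u4E2D', 'A'});
        System.out.println(bytes.length); // prints 4
    }
}
```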

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26635#discussion_r2253526576
PR Review Comment: https://git.openjdk.org/jdk/pull/26635#discussion_r2253550878
PR Review Comment: https://git.openjdk.org/jdk/pull/26635#discussion_r2253554481
PR Review Comment: https://git.openjdk.org/jdk/pull/26635#discussion_r2253532986
PR Review Comment: https://git.openjdk.org/jdk/pull/26635#discussion_r2253556174
PR Review Comment: https://git.openjdk.org/jdk/pull/26635#discussion_r2253530248
PR Review Comment: https://git.openjdk.org/jdk/pull/26635#discussion_r2253542908

