RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v2]
Liam Miller-Cushon
cushon at openjdk.org
Wed Jan 14 10:59:36 UTC 2026
On Tue, 13 Jan 2026 22:02:13 GMT, Roger Riggs <rriggs at openjdk.org> wrote:
> The test has an odd mix of throwing Exception and RuntimeException. It would be good to upgrade the test to use JUnit (though it could/should be a separate PR).
It seemed like `test/jdk/java/lang/String/Encodings.java` mostly uses `Exception`, and `test/jdk/sun/nio/cs/TestStringCoding.java` mostly uses `RuntimeException`, I was trying to be consistent with the existing code. I can take a look at migrating to Junit as a separate cleanup.
> src/java.base/share/classes/java/lang/String.java line 2112:
>
>> 2110: *
>> 2111: * <p>The result will be the same value as {@code getBytes(charset).length}.
>> 2112: *
>
> An @implNote or @apiNote maybe useful to indicate that this may allocate memory to compute the length for some Charsets.
Done, thanks
> src/java.base/share/classes/java/lang/String.java line 2120:
>
>> 2118: return encodedLengthUTF8(coder, value);
>> 2119: }
>> 2120: if (bytesCompatible(cs, 0, value.length)) {
>
> BytesCompatible gives a non-optimal answer for a US_ASCII input that has chars > 0x7f.
I updated this to not use `bytesCompatible`, and optimized the US_ASCII case.
> src/java.base/share/classes/java/lang/String.java line 2125:
>
>> 2123: if (cs instanceof sun.nio.cs.UTF_16LE ||
>> 2124: cs instanceof sun.nio.cs.UTF_16BE) {
>> 2125: return value.length << (1 - coder());
>
> Please encapsulate this computation `byteFor(int length, coder) {...}` to make it easier to re-use and document.
Done
-------------
PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3748957104
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2689955692
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2689958009
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2689959343
More information about the core-libs-dev
mailing list