RFR: 8372353: API to compute the byte length of a String encoded in a given Charset
Roger Riggs
rriggs at openjdk.org
Tue Jan 13 22:04:39 UTC 2026
On Fri, 21 Nov 2025 14:58:55 GMT, Liam Miller-Cushon <cushon at openjdk.org> wrote:
> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background.
>
> ---
>
>
> Benchmark (encoding) (stringLength) Mode Cnt Score Error Units
> StringLoopJmhBenchmark.getBytes ASCII 10 thrpt 5 406782650.595 ± 16960032.852 ops/s
> StringLoopJmhBenchmark.getBytes ASCII 100 thrpt 5 172936926.189 ± 4532029.201 ops/s
> StringLoopJmhBenchmark.getBytes ASCII 1000 thrpt 5 38830681.232 ± 2413274.766 ops/s
> StringLoopJmhBenchmark.getBytes ASCII 100000 thrpt 5 458881.155 ± 12818.317 ops/s
> StringLoopJmhBenchmark.getBytes LATIN1 10 thrpt 5 37193762.990 ± 3962947.391 ops/s
> StringLoopJmhBenchmark.getBytes LATIN1 100 thrpt 5 55400876.236 ± 1267331.434 ops/s
> StringLoopJmhBenchmark.getBytes LATIN1 1000 thrpt 5 11104514.001 ± 41718.545 ops/s
> StringLoopJmhBenchmark.getBytes LATIN1 100000 thrpt 5 182535.414 ± 10296.120 ops/s
> StringLoopJmhBenchmark.getBytes UTF16 10 thrpt 5 113474681.457 ± 8326589.199 ops/s
> StringLoopJmhBenchmark.getBytes UTF16 100 thrpt 5 37854103.127 ± 4808526.773 ops/s
> StringLoopJmhBenchmark.getBytes UTF16 1000 thrpt 5 4139833.009 ± 70636.784 ops/s
> StringLoopJmhBenchmark.getBytes UTF16 100000 thrpt 5 57644.637 ± 1887.112 ops/s
> StringLoopJmhBenchmark.getBytesLength ASCII 10 thrpt 5 946701647.247 ± 76938927.141 ops/s
> StringLoopJmhBenchmark.getBytesLength ASCII 100 thrpt 5 396615374.479 ± 15167234.884 ops/s
> StringLoopJmhBenchmark.getBytesLength ASCII 1000 thrpt 5 100464784.979 ± 794027.897 ops/s
> StringLoopJmhBenchmark.getBytesLength ASCII 100000 thrpt 5 1215487.689 ± 1916.468 ops/s
> StringLoopJmhBenchmark.getBytesLength LATIN1 10 thrpt 5 221265102.323 ± 17013983.056 ops/s
> StringLoopJmhBenchmark.getBytesLength LATIN1 100 thrpt 5 137617873.887 ± 5842185.781 ops/s
> StringLoopJmhBenchmark.getBytesLength LATIN1 1000 thrpt 5 92540259.130 ± 3839233.582 ops/s
> StringLoopJmhBenchmark.ge...
The test has an odd mix of throwing Exception and RuntimeException.
It would be good to upgrade the test to use JUnit (though it could/should be a separate PR).
src/java.base/share/classes/java/lang/String.java line 2112:
> 2110: *
> 2111: * <p>The result will be the same value as {@code getBytes(charset).length}.
> 2112: *
An @implNote or @apiNote may be useful to indicate that, for some Charsets, computing the length may allocate memory.
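
For reference, the equivalence stated in the javadoc can be written against the existing public API. This is only a sketch of the allocating baseline, not the code in the PR; the class and method names are made up for illustration. It allocates a full byte array just to read its length, which is the cost such a note would warn about on any path that falls back to encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the PR.
final class EncodedLengthBaseline {

    // Baseline from the javadoc: encode the whole string, then read the array length.
    // This always allocates a byte[] proportional to the encoded size.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "na\u00EFve \u20AC"; // 7 chars: 'i' with diaeresis and the euro sign are non-ASCII
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));      // 10
        System.out.println(encodedLength(s, StandardCharsets.ISO_8859_1)); // 7 (euro sign becomes '?')
        System.out.println(encodedLength(s, StandardCharsets.UTF_16LE));   // 14
    }
}
```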
src/java.base/share/classes/java/lang/String.java line 2120:
> 2118: return encodedLengthUTF8(coder, value);
> 2119: }
> 2120: if (bytesCompatible(cs, 0, value.length)) {
`bytesCompatible` gives a non-optimal answer for US_ASCII when the input contains chars > 0x7f.
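
A small sketch of the observable behaviour behind this comment, using only the public `getBytes` API (the class name is made up, and this does not call the private `bytesCompatible`). With US_ASCII, a char in the range 0x80..0xff is unmappable and is replaced by a single `?` byte, so for such input the encoded length still equals the char count. Presumably that is why falling through to the general path is more work than needed here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the PR.
final class AsciiReplacementLength {

    // Encoded length via the allocating public API.
    static int asciiLength(String s) {
        return s.getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        String plain = "cafe";
        String accented = "caf\u00E9"; // last char is 0xE9, above 0x7f

        System.out.println(asciiLength(plain));    // 4
        System.out.println(asciiLength(accented)); // 4, the unmappable char becomes one '?'

        byte[] encoded = accented.getBytes(StandardCharsets.US_ASCII);
        System.out.println((char) encoded[3]);     // ?
    }
}
```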
src/java.base/share/classes/java/lang/String.java line 2125:
> 2123: if (cs instanceof sun.nio.cs.UTF_16LE ||
> 2124: cs instanceof sun.nio.cs.UTF_16BE) {
> 2125: return value.length << (1 - coder());
Please encapsulate this computation in a helper such as `byteFor(int length, coder) {...}` to make it easier to reuse and document.
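
One possible shape for such a helper, as a standalone sketch rather than a patch to String.java. The name `utf16ByteLength` and the local `LATIN1`/`UTF16` constants are assumptions for illustration; they mirror the coder values 0 and 1 used for compact strings. The shift doubles the length of a LATIN1 backing array (one byte per char) and leaves a UTF16 backing array (already two bytes per char) unchanged.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the PR.
final class Utf16LengthSketch {

    // Assumed to mirror the compact-string coder values.
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    /**
     * Returns the number of bytes needed to encode the string as UTF-16LE or
     * UTF-16BE (no byte order mark), given the length in bytes of its backing
     * array and its coder.
     */
    static int utf16ByteLength(int storageLength, byte coder) {
        // LATIN1: shift by 1 (one stored byte per char, two encoded bytes per char).
        // UTF16:  shift by 0 (storage is already two bytes per char).
        return storageLength << (1 - coder);
    }

    public static void main(String[] args) {
        // "hello" would be stored as 5 LATIN1 bytes.
        System.out.println(utf16ByteLength(5, LATIN1));                         // 10
        System.out.println("hello".getBytes(StandardCharsets.UTF_16LE).length); // 10

        // "h\u20ACllo" has 5 chars and would be stored as 10 UTF16 bytes.
        System.out.println(utf16ByteLength(10, UTF16));                              // 10
        System.out.println("h\u20ACllo".getBytes(StandardCharsets.UTF_16BE).length); // 10
    }
}
```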
-------------
PR Review: https://git.openjdk.org/jdk/pull/28454#pullrequestreview-3658097768
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2688260162
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2688257004
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2688253744