<i18n dev> RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v17]

Eirik Bjørsnøs eirbjo at openjdk.org
Mon Feb 9 21:31:47 UTC 2026


On Fri, 30 Jan 2026 15:56:20 GMT, Liam Miller-Cushon <cushon at openjdk.org> wrote:

>> This implements an API to return the byte length of a String encoded in a given charset. See [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background.
>> 
>> ---
>> 
>> 
>> Benchmark                              (encoding)  (stringLength)   Mode  Cnt          Score          Error  Units
>> StringLoopJmhBenchmark.getBytes             ASCII              10  thrpt    5  406782650.595 ± 16960032.852  ops/s
>> StringLoopJmhBenchmark.getBytes             ASCII             100  thrpt    5  172936926.189 ±  4532029.201  ops/s
>> StringLoopJmhBenchmark.getBytes             ASCII            1000  thrpt    5   38830681.232 ±  2413274.766  ops/s
>> StringLoopJmhBenchmark.getBytes             ASCII          100000  thrpt    5     458881.155 ±    12818.317  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1              10  thrpt    5   37193762.990 ±  3962947.391  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1             100  thrpt    5   55400876.236 ±  1267331.434  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1            1000  thrpt    5   11104514.001 ±    41718.545  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1          100000  thrpt    5     182535.414 ±    10296.120  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16              10  thrpt    5  113474681.457 ±  8326589.199  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16             100  thrpt    5   37854103.127 ±  4808526.773  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16            1000  thrpt    5    4139833.009 ±    70636.784  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16          100000  thrpt    5      57644.637 ±     1887.112  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII              10  thrpt    5  946701647.247 ± 76938927.141  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII             100  thrpt    5  396615374.479 ± 15167234.884  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII            1000  thrpt    5  100464784.979 ±   794027.897  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII          100000  thrpt    5    1215487.689 ±     1916.468  ops/s
>> StringLoopJmhBenchmark.getBytesLength      LATIN1              10  thrpt    5  221265102.323 ± 17013983.056  ops/s
>> StringLoopJmhBenchmark.getBytesLength      LATIN1             100  thrpt    5  137617873.887 ±  5842185.781  ops/s
>> StringLoopJmhBenchmark.getBytesLength      LATIN1            1000  thrpt    5   92540259.1...
>
> Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Rename getBytesLength to getByteLength

Should we also consider the inverse operation, that is, computing the length a String would have if it were decoded from a sequence of bytes?

`new String(byte[], Charset).length()` 

Someone will eventually ask for this. I can see a potential use case in the `ZipFile` implementation, where knowing the length before decoding would allow strings to be rejected efficiently, without decoding them and without looking at the String contents.
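
To make the idea concrete, here is a rough sketch (not code from the PR) of how a decoded length could be computed for UTF-8 without allocating a String. The class and method names are hypothetical, and it assumes the input is well-formed UTF-8; malformed input, which `new String(byte[], Charset)` replaces with U+FFFD, would need extra handling.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not an existing JDK API.
class Utf8DecodedLength {

    /**
     * Counts the UTF-16 code units that decoding would produce.
     * Assumes the input is well-formed UTF-8.
     */
    static int decodedLength(byte[] utf8) {
        int length = 0;
        for (byte b : utf8) {
            int u = b & 0xFF;
            if ((u & 0xC0) != 0x80) {
                length++;        // not a continuation byte: starts a code point
                if (u >= 0xF0) {
                    length++;    // 4-byte sequence decodes to a surrogate pair
                }
            }
        }
        return length;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo \uD83D\uDE00";
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        // Compare against the allocating equivalent.
        System.out.println(decodedLength(bytes)
                == new String(bytes, StandardCharsets.UTF_8).length());
    }
}
```

A caller could compare the returned count against a length limit before deciding whether to decode at all.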

Not saying we need to add it now, just that the name chosen here should leave room for a future addition of this inverse operation.

Something like:


str.getEncodedLength(Charset); // Encoded length of this string
String.getDecodedLength(byte[], Charset); // Decoded length of byte sequence 


or, with the current scheme:


str.getByteLength(Charset); // Encoded length of this string
String.getStringLength(byte[], Charset); // Decoded length of byte sequence 


EDIT:

Moving this out of `java.lang.String` and onto `Charset` unlocks:

* Symmetry, since both operations can be instance methods
* We would be free to support `ByteBuffer` and any `CharSequence`, not just strings:


Charset cs = StandardCharsets.UTF_8;
String h = "hello";
byte[] bytes = h.getBytes(cs);

cs.encodedLength(CharBuffer.wrap(h));
cs.encodedLength(new StringBuilder(h));

cs.decodedLength(bytes);
cs.decodedLength(ByteBuffer.wrap(bytes));
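
For comparison, something close to `encodedLength` for an arbitrary `CharSequence` can already be approximated with the existing `CharsetEncoder` API. The following is only a sketch under my own assumptions: the class name, the scratch buffer size and the choice of `REPLACE` error actions (to mirror `String.getBytes`) are not from the PR, and a real implementation would presumably use charset-specific fast paths instead.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not an existing JDK API.
class EncodedLengthSketch {

    /** Counts encoded bytes by encoding into a small reusable scratch buffer. */
    static long encodedLength(CharSequence chars, Charset cs) {
        CharsetEncoder encoder = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer in = CharBuffer.wrap(chars);
        ByteBuffer scratch = ByteBuffer.allocate(1024);
        long total = 0;
        CoderResult result;
        // Encode chunk by chunk; the bytes are discarded, only counted.
        do {
            result = encoder.encode(in, scratch, true);
            total += scratch.position();
            scratch.clear();
        } while (result.isOverflow());
        // Some encoders emit trailing bytes on flush.
        do {
            result = encoder.flush(scratch);
            total += scratch.position();
            scratch.clear();
        } while (result.isOverflow());
        return total;
    }

    public static void main(String[] args) {
        Charset cs = StandardCharsets.UTF_8;
        String h = "hello";
        System.out.println(encodedLength(new StringBuilder(h), cs) == h.getBytes(cs).length);
    }
}
```

This avoids allocating the full byte array but still performs the encoding work, which is the cost a dedicated API could skip.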

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3869643511

