RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v5]
Roger Riggs
rriggs at openjdk.org
Thu Jan 15 19:27:10 UTC 2026
On Thu, 15 Jan 2026 18:18:31 GMT, Liam Miller-Cushon <cushon at openjdk.org> wrote:
>> src/java.base/share/classes/java/lang/String.java line 2151:
>>
>>> 2149: } else if (cs == US_ASCII.INSTANCE) {
>>> 2150: return encodedLengthASCII(coder, value);
>>> 2151: } else if (cs instanceof sun.nio.cs.UTF_16LE || cs instanceof sun.nio.cs.UTF_16BE) {
>>
>> I see that `sun.nio.cs.UTF_16{LE,BE}` specialization is suggested by @ExE-Boss [here]. Though I'm not really sure if this is really needed. I cannot spot any other usage of these constants in `java.base`, except `jdk.internal.foreign.StringSupport`, which is irrelevant.
>>
>> [here]: https://github.com/openjdk/jdk/pull/28454/files#r2552768341
>
> I don't have a strong opinion about these charsets. It's nice that the encoded length for them can be calculated in constant time, but on the other hand if they are less frequently used and there isn't precedent for special casing them in `java.base`, then this part could be dropped.
While it is convenient that those UTF-16 charsets have an easy-to-compute size, I doubt those two are in sufficient use to justify committing to support them in the fast path.
If you are going to support charsets beyond the most common UTF-8, ASCII, and ISO-8859-1, then
computing the encoded length should be delegated to the Charset itself, with separate code in the respective packages; see the sketch below.
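A rough sketch of what that delegation might look like. The `EncodedLengthSupport` interface and its method are hypothetical names invented for illustration, not an existing JDK API; a real design would more likely live on the `sun.nio.cs` implementation classes themselves.

```java
import java.nio.charset.Charset;

// Purely illustrative: this interface does not exist in the JDK.
interface EncodedLengthSupport {
    // Exact encoded byte length, or -1 if this charset has no cheap answer.
    long encodedLength(String s);
}

class StringSketch {
    static long encodedLength(String s, Charset cs) {
        // A charset implementation that knows a constant-time answer
        // (e.g. 1 * length for ISO-8859-1, 2 * length for UTF-16LE/BE)
        // would implement the hook; everything else falls back to encoding.
        if (cs instanceof EncodedLengthSupport fast) {
            long n = fast.encodedLength(s);
            if (n >= 0) {
                return n;
            }
        }
        return s.getBytes(cs).length;
    }
}
```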
Have you looked at `CharsetEncoder.maxBytesPerChar()`? It might only be useful for single-byte encodings, but if `maxBytesPerChar` is equal to `averageBytesPerChar` that might be a useful shortcut.
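As a sketch of that shortcut (the `encodedLength` helper is a made-up name, not code from this PR; the assumption is that an encoder reporting `maxBytesPerChar() == averageBytesPerChar()` is fixed-width, which holds for ISO-8859-1, US-ASCII, and UTF-16LE/BE, but not for UTF-8 or for BOM-writing UTF-16):

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class EncodedLength {
    // Hypothetical helper: exact byte length of s encoded in cs, taking a
    // constant-time path for charsets that look fixed-width.
    static long encodedLength(String s, Charset cs) {
        CharsetEncoder enc = cs.newEncoder();
        float max = enc.maxBytesPerChar();
        if (max == enc.averageBytesPerChar()) {
            // Fixed-width charset: ISO-8859-1 and US-ASCII (1 byte/char),
            // UTF-16LE/BE (2 bytes/char; a surrogate pair is 2 chars -> 4
            // bytes, so the ratio still holds).
            return (long) s.length() * (long) max;
        }
        // Variable-width (e.g. UTF-8): encode and count the bytes.
        return s.getBytes(cs).length;
    }
}
```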
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695660230