RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v5]
Liam Miller-Cushon
cushon at openjdk.org
Thu Jan 15 20:05:05 UTC 2026
On Thu, 15 Jan 2026 19:23:43 GMT, Roger Riggs <rriggs at openjdk.org> wrote:
> While it is convenient that those UTF16 charsets have an easy-to-compute size, I doubt those two are in sufficient use to justify a commitment to support them in the fast path. If you are going to support charsets beyond the most common utf8, ascii, and ISO-8859-1, then computing the encoded length should be delegated to the Charset itself and have separate code in different packages.
Thanks, that makes sense to me. My opinion is that a large amount of the value here is in optimizing UTF-8, and that there's an argument to optimize the other standard charsets that `String` has other fast paths for, but sharply diminishing returns beyond that. I would be inclined to stop at the standard charsets, but also happy to make changes if there's a preference for having more or fewer fast paths.
> Have you looked at `CharsetEncoder.maxBytesPerChar()`? It might only be useful for single-byte formats, but if `maxBytesPerChar` is equal to `averageBytesPerChar`, that might be a useful shortcut.
I had a quick look at that, and saw errors for `IBM-Thai`:
    CharsetEncoder encoder = cs.newEncoder();
    if (encoder.maxBytesPerChar() == 1f && encoder.maxBytesPerChar() == encoder.averageBytesPerChar()) {
      return value.length * (int) encoder.maxBytesPerChar();
    }
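For anyone who wants to reproduce this kind of check outside the JDK, here is a minimal sketch that compares the shortcut against the length of `String.getBytes(Charset)` for every installed charset. The class name `ShortcutCheck`, the helpers `shortcutLength` and `actualLength`, and the sample strings are hypothetical and not part of the PR; it uses `String.length()` where the snippet above uses the internal `value.length`. Which charsets get reported, if any, depends on the JDK and its installed charsets, so no output is assumed here.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class ShortcutCheck {
    // Hypothetical helper: the length predicted by the shortcut,
    // or -1 when the shortcut's precondition does not hold for this charset.
    static int shortcutLength(String s, Charset cs) {
        if (!cs.canEncode()) {
            return -1; // decode-only charsets have no encoder to consult
        }
        CharsetEncoder encoder = cs.newEncoder();
        if (encoder.maxBytesPerChar() == 1f
                && encoder.maxBytesPerChar() == encoder.averageBytesPerChar()) {
            return s.length(); // one byte assumed per UTF-16 code unit
        }
        return -1;
    }

    // Reference value: actually encode the string and count the bytes.
    static int actualLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // Sample inputs are assumptions: ASCII, a Latin-1 character,
        // and a surrogate pair (two chars in String.length()).
        String[] samples = {"hello", "h\u00e9llo", "\uD83D\uDE00"};
        for (Charset cs : Charset.availableCharsets().values()) {
            for (String s : samples) {
                int predicted = shortcutLength(s, cs);
                if (predicted >= 0 && predicted != actualLength(s, cs)) {
                    System.out.printf("%s: predicted %d, actual %d%n",
                            cs.name(), predicted, actualLength(s, cs));
                }
            }
        }
    }
}
```

Any line printed by `main` is a charset and input for which the encoder's reported per-char figures alone do not predict the encoded length.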
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695769015