RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v11]

Liam Miller-Cushon cushon at openjdk.org
Mon Jan 19 08:16:17 UTC 2026


On Sun, 18 Jan 2026 09:06:31 GMT, Alan Bateman <alanb at openjdk.org> wrote:

> > Question: Have you considered the handling of replacement characters? They currently are counted into the returned length, but I wonder whether users actually want to print those characters as-is.
> 
> That is a good point. As `getBytes(Charset)` is specified to replace malformed-input and unmappable-character sequences, and the proposed method is specified to return the equivalent of `getBytes(Charset).length` then the returned length has to include them.

The motivating use cases I've seen for this method compute the length of encoded data that contains strings, where the strings will be encoded with `getBytes`. The CSR gives the example of encoding multiple large strings into a single array. Specifying the result in terms of `getBytes(cs).length` is necessary for that use case, and it requires the two methods to handle replacement characters and unpaired surrogates identically. Do you see alternatives that should be considered?
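
To make the use case concrete, here is a minimal sketch of packing several strings into one presized array. The class `EncodedLengthSketch` and the helpers `lengthViaGetBytes` and `pack` are hypothetical names for illustration only. `lengthViaGetBytes` stands in for the proposed method by calling `getBytes(cs).length`, so it shows the specified result, not the proposed implementation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class EncodedLengthSketch {

    // Hypothetical stand-in for the proposed method: by specification the
    // result equals getBytes(cs).length, replacement bytes included.
    static int lengthViaGetBytes(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    // Encodes all strings into a single array that is sized up front.
    // This only works if the length computation and getBytes agree on how
    // malformed input and unmappable characters are replaced.
    static byte[] pack(List<String> strings, Charset cs) {
        int total = 0;
        for (String s : strings) {
            total = Math.addExact(total, lengthViaGetBytes(s, cs));
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (String s : strings) {
            byte[] encoded = s.getBytes(cs);
            System.arraycopy(encoded, 0, out, pos, encoded.length);
            pos += encoded.length;
        }
        return out;
    }

    public static void main(String[] args) {
        String unpaired = "a\uD800b";           // lone high surrogate
        String unmappable = "caf\u00E9 \u20AC"; // not representable in US-ASCII

        // The lone surrogate is replaced during encoding, and the
        // replacement bytes are part of the reported length.
        System.out.println(lengthViaGetBytes(unpaired, StandardCharsets.UTF_8));
        System.out.println(lengthViaGetBytes(unmappable, StandardCharsets.US_ASCII));
        System.out.println(pack(List.of(unpaired, unmappable), StandardCharsets.US_ASCII).length);
    }
}
```

If the length computation skipped or rejected replaced characters, the `total` computed in `pack` would no longer match what `getBytes` writes, and the copy loop would either overflow the array or leave it partly unfilled.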

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3767013988

