<i18n dev> RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v17]

Liam Miller-Cushon cushon at openjdk.org
Mon Feb 9 21:31:55 UTC 2026


On Fri, 6 Feb 2026 16:34:38 GMT, Roger Riggs <rriggs at openjdk.org> wrote:

> The encoded form is always bytes, so I don't think 'byte' needs to be in the name. I'd be fine with getEncodedLength(Charset).

The javadoc would specify that it's a length in bytes, so perhaps that's sufficient without including 'bytes' in the method name.

I do think that some callers might expect `getEncodedLength(UTF_16)` to return a length in code units and not bytes. There was some related discussion in [JDK-8372338](https://bugs.openjdk.org/browse/JDK-8372338) and also Maurizio's [Pulling the (foreign) string](https://cr.openjdk.org/~mcimadamore/panama/strings_ffm.html#reading-strings-with-known-length) doc.
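
To make the ambiguity concrete, here is a small sketch that uses only existing `String` APIs. The class and helper names are hypothetical and exist only for illustration. For the same string, the number of UTF-16 code units and the number of encoded bytes differ, and the `UTF_16` charset also emits a byte order mark when encoding:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16LengthDemo {
    // Hypothetical helper: byte length obtained by actually encoding the string.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "abc";
        // Length in UTF-16 code units (chars).
        System.out.println(s.length());                              // 3
        // Two bytes per code unit, no byte order mark.
        System.out.println(byteLength(s, StandardCharsets.UTF_16LE)); // 6
        // UTF_16 encodes big-endian and prepends a 2-byte byte order mark.
        System.out.println(byteLength(s, StandardCharsets.UTF_16));   // 8
    }
}
```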

> The discoverability of the method if placed as Charset.getEncodedLength(String) would be very low and would require cross-package hacking to gain the performance advantage.

For completeness, here's a demo of it in `CharsetEncoder` (https://github.com/openjdk/jdk/pull/29639). As expected, it's possible to implement it that way and preserve equivalent performance by adding a package-private method to `String` and using `JavaLangAccess`. With that change, `string.getByteLength(UTF_8)` could be expressed as:


    try {
        int byteLength = StandardCharsets.UTF_8.newEncoder()
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .onMalformedInput(CodingErrorAction.REPLACE)
                .getByteLength(stringData);
    } catch (CharacterCodingException e) {
        throw new IllegalStateException(e);
    }


I can update the CSR to document this as an alternative.
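
For comparison, here is a minimal sketch of what callers can do today with the existing public `CharsetEncoder` API: encode into a small reusable scratch buffer and count the bytes, which avoids allocating the full `byte[]` but still performs the encoding work. The class name, method name, and buffer size are assumptions chosen for illustration, not part of either proposal:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodedLengthSketch {
    // Hypothetical helper: counts encoded bytes without keeping them.
    static long encodedLength(String s, Charset cs) {
        CharsetEncoder enc = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer in = CharBuffer.wrap(s);
        ByteBuffer scratch = ByteBuffer.allocate(1024); // arbitrary scratch size
        long total = 0;
        CoderResult cr;
        // With REPLACE actions the result is only ever UNDERFLOW or OVERFLOW.
        do {
            cr = enc.encode(in, scratch, true);
            total += scratch.position();
            scratch.clear(); // discard the bytes, keep only the count
        } while (cr.isOverflow());
        // Some encoders emit trailing bytes on flush.
        do {
            cr = enc.flush(scratch);
            total += scratch.position();
            scratch.clear();
        } while (cr.isOverflow());
        return total;
    }

    public static void main(String[] args) {
        // 'e' with acute accent takes two bytes in UTF-8, the rest one each.
        System.out.println(encodedLength("h\u00e9llo", StandardCharsets.UTF_8));
    }
}
```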

> Should we also consider the inverse operation, that is to compute the length of a String had it been decoded from a sequence of bytes? Someone will eventually ask for this. I see some potential use case for it in the ZipFile implementation where knowing the length ahead of decoding could provide efficient rejection of strings without decoding and without looking at String contents.

What is the use case for `decodedLength` in `ZipFile`? Does 'efficient rejection of strings without decoding' require knowing the decoded length, or only whether the data is a valid encoding?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3872766017

