<i18n dev> RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v17]

Eirik Bjørsnøs eirbjo at openjdk.org
Mon Feb 9 21:32:04 UTC 2026


On Mon, 9 Feb 2026 16:26:42 GMT, Liam Miller-Cushon <cushon at openjdk.org> wrote:

> What is the use-case for `decodedLength` in `ZipFile`? Does 'efficient rejection of strings without decoding' require knowing the decoded length, or just whether the data is a valid encoding?

The ZIP file CEN header format only records the length of the entry name in its encoded form. Knowing the decoded length could let us quickly reject a lookup String by comparing lengths alone, without decoding the name (ZipFile returns "directory/" as a match for a lookup of "directory", so a matching entry name must be either 9 or 10 chars long).
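A minimal sketch of that length-based rejection, assuming the CEN name is UTF-8 encoded and well-formed. It does not use the API proposed in this PR. `CenNameLengthCheck`, `utf16Length` and `mayMatch` are hypothetical names for illustration, not ZipFile internals. The decoded length is counted from the bytes without allocating a String, and then used as a necessary (not sufficient) condition for a match:

```java
import java.nio.charset.StandardCharsets;

public class CenNameLengthCheck {

    /**
     * Hypothetical helper: number of UTF-16 chars that a well-formed UTF-8
     * byte range decodes to. Assumes valid UTF-8: every non-continuation
     * byte starts one code point, and a 4-byte lead (11110xxx) decodes to
     * a surrogate pair (2 chars).
     */
    static int utf16Length(byte[] bytes, int off, int len) {
        int chars = 0;
        for (int i = off; i < off + len; i++) {
            int b = bytes[i] & 0xFF;
            if ((b & 0xC0) != 0x80) {                  // not a continuation byte
                chars += ((b & 0xF8) == 0xF0) ? 2 : 1;
            }
        }
        return chars;
    }

    /**
     * Hypothetical pre-check: an entry name of decodedLength chars can only
     * match lookupName exactly, or as lookupName + "/".
     */
    static boolean mayMatch(String lookupName, int decodedLength) {
        int n = lookupName.length();
        return decodedLength == n || decodedLength == n + 1;
    }

    public static void main(String[] args) {
        byte[] cenName = "directory/".getBytes(StandardCharsets.UTF_8);
        int decoded = utf16Length(cenName, 0, cenName.length);
        System.out.println(decoded);                         // 10
        System.out.println(mayMatch("directory", decoded));  // true
        System.out.println(mayMatch("dir", decoded));        // false
    }
}
```

A length mismatch proves the strings differ; a length match still requires the usual hash and string comparison.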

In practice, we compare hash codes before comparing strings, so this would only help in the case of hash collisions. These are rare and not worth optimizing for. There may be other opportunities, though: it would certainly be possible to reject lookups based on the minimum and maximum entry name lengths (or perhaps a bitset of the name lengths that occur).
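The min/max and bitset idea could look roughly like this. It is a sketch under the assumption that the decoded length of every entry name is recorded once while the central directory is read; `NameLengthFilter`, `addEntryLength` and `mightContain` are hypothetical names. For brevity the example feeds in `String.length()` values directly instead of lengths decoded from CEN bytes:

```java
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical filter: remembers which decoded entry name lengths occur,
 * so a lookup whose length (or length + 1, for the "name/" directory form)
 * matches no entry can be rejected before any hashing or comparison.
 */
public class NameLengthFilter {
    private final BitSet lengths = new BitSet();
    private int min = Integer.MAX_VALUE;
    private int max = -1;

    void addEntryLength(int decodedLength) {
        lengths.set(decodedLength);
        min = Math.min(min, decodedLength);
        max = Math.max(max, decodedLength);
    }

    /** True if some entry could be named lookup or lookup + "/". */
    boolean mightContain(String lookup) {
        int n = lookup.length();
        if (n + 1 < min || n > max) {
            return false;                          // cheap min/max rejection
        }
        return lengths.get(n) || lengths.get(n + 1); // bitset rejection
    }

    public static void main(String[] args) {
        NameLengthFilter f = new NameLengthFilter();
        for (String name : List.of("META-INF/", "META-INF/MANIFEST.MF", "directory/")) {
            f.addEntryLength(name.length());
        }
        System.out.println(f.mightContain("directory")); // true ("directory/" exists)
        System.out.println(f.mightContain("x"));         // false (shorter than any entry)
    }
}
```

The min/max check costs two comparisons; the bitset catches gaps between those bounds at the cost of one bit per possible length.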

But I'm sure there are other use cases where a `java.lang.String` is compared to its encoded form, and where knowing the decoded length without allocating a String could be useful. Input validation is another, possibly combined with rejecting malformed encoded data.
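For the validation case, a length computation can double as a strict well-formedness check. The sketch below assumes UTF-8 and follows the RFC 3629 byte ranges; it is not the API proposed in this PR, and `Utf8DecodedLength.validatedUtf16Length` is a hypothetical name. It returns the UTF-16 length, or -1 for malformed input, without allocating a String or CharBuffer:

```java
import java.nio.charset.StandardCharsets;

public class Utf8DecodedLength {

    /**
     * Hypothetical helper: number of UTF-16 chars bytes[off, off+len) decodes
     * to as UTF-8, or -1 if the input is malformed (stray continuation bytes,
     * truncated sequences, overlong forms, surrogates, or > U+10FFFF).
     */
    static int validatedUtf16Length(byte[] bytes, int off, int len) {
        int end = off + len;
        int chars = 0;
        int i = off;
        while (i < end) {
            int b0 = bytes[i] & 0xFF;
            if (b0 < 0x80) {                     // ASCII fast path
                i++;
                chars++;
                continue;
            }
            int n;                               // total sequence length
            int lo = 0x80, hi = 0xBF;            // allowed range for byte 2
            if (b0 >= 0xC2 && b0 <= 0xDF) {
                n = 2;
            } else if (b0 >= 0xE0 && b0 <= 0xEF) {
                n = 3;
                if (b0 == 0xE0) lo = 0xA0;       // reject overlong forms
                else if (b0 == 0xED) hi = 0x9F;  // reject surrogates
            } else if (b0 >= 0xF0 && b0 <= 0xF4) {
                n = 4;
                if (b0 == 0xF0) lo = 0x90;       // reject overlong forms
                else if (b0 == 0xF4) hi = 0x8F;  // reject > U+10FFFF
            } else {
                return -1;                       // 80..C1 or F5..FF as lead byte
            }
            if (end - i < n) return -1;          // truncated sequence
            int b1 = bytes[i + 1] & 0xFF;
            if (b1 < lo || b1 > hi) return -1;
            for (int k = 2; k < n; k++) {
                if ((bytes[i + k] & 0xC0) != 0x80) return -1;
            }
            i += n;
            chars += (n == 4) ? 2 : 1;           // supplementary -> surrogate pair
        }
        return chars;
    }

    public static void main(String[] args) {
        byte[] ok = "caf\u00e9 \uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        System.out.println(validatedUtf16Length(ok, 0, ok.length));   // 7
        byte[] bad = {(byte) 0xC3};                                   // truncated
        System.out.println(validatedUtf16Length(bad, 0, bad.length)); // -1
    }
}
```

Unlike `new String(bytes, UTF_8)`, which silently substitutes U+FFFD for malformed input, this rejects it outright.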

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3872836764

