<i18n dev> RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v17]

Mon Feb 9 21:30:52 UTC 2026

On Fri, 6 Feb 2026 14:08:07 GMT, Liam Miller-Cushon <cushon at openjdk.org> wrote:

> The main reason `getEncodedLength` wasn't used is that it doesn't make it clear that the unit of length is bytes. For UTF-8 a byte length is intuitive, but for e.g. UTF-16 or UTF-32 the "encoded length" could also be the count of int16 (number of wchar_t) or int32.

Emphasizing the unit of measurement is a laudable goal. I just feel that in this case it obscures what is being computed.

What’s computed here is the *encoded* length; the unit of measurement seems a secondary concern.

A String does not intrinsically have a «byte length»; the concept only seems meaningful in relation to an encoding operation.
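
To make that concrete, here is a minimal sketch using only the existing `String.getBytes(Charset)`. The class `EncodedLengthDemo` and the helper `encodedLength` are hypothetical names for illustration, not the API proposed in the PR. The same String yields a different byte count for each charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedLengthDemo {
    // Hypothetical helper: encode, then measure. This allocates a
    // temporary byte[], which is the cost a dedicated method could avoid.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo"; // "héllo": five chars, one of them non-ASCII
        System.out.println(s.length());                                      // 5 (UTF-16 code units)
        System.out.println(encodedLength(s, StandardCharsets.ISO_8859_1));   // 5
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));        // 6
        System.out.println(encodedLength(s, StandardCharsets.UTF_16LE));     // 10
    }
}
```

None of these numbers belongs to the String alone; each is a property of the String together with a Charset.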

Was `getEncodedByteLength` considered? `getEncodedLengthInBytes`?

Was Charset considered as a home for this method? There the operational context of encoding would be obvious.

> Including a `get` prefix or not was also considered, one benefit of `get` is that it aligns with `getBytes`, and also it may help convey that the method is doing computation (it's often going to be O(n), compared to e.g. `length()` which is O(1)).

Again a laudable goal, but the actual computation seems obscure.
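
For reference, the kind of computation under discussion might look like the following for UTF-8. This is only a sketch under my own assumptions, not the implementation in the PR; `Utf8Length` and `utf8Length` are hypothetical names. It walks the chars once without allocating, and counts an unpaired surrogate as one byte to match the `?` replacement that `getBytes(StandardCharsets.UTF_8)` produces:

```java
final class Utf8Length {
    // Hypothetical helper: number of bytes s would occupy in UTF-8.
    // One linear pass over the chars, no intermediate byte[].
    static long utf8Length(CharSequence s) {
        long bytes = 0;
        int n = s.length();
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;                      // ASCII
            } else if (c < 0x800) {
                bytes += 2;                      // two-byte sequence
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < n
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4;                      // supplementary code point
                i++;                             // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                bytes += 1;                      // unpaired surrogate, encoded as '?'
            } else {
                bytes += 3;                      // rest of the BMP
            }
        }
        return bytes;
    }
}
```

A caller would use it as `Utf8Length.utf8Length("h\u00e9llo")`, which should agree with `"h\u00e9llo".getBytes(StandardCharsets.UTF_8).length`.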

String is prime real estate for millions of programmers. We should get this right.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3861158061