<i18n dev> RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v17]
Eirik Bjørsnøs
eirbjo at openjdk.org
Mon Feb 9 21:30:52 UTC 2026
On Fri, 6 Feb 2026 14:08:07 GMT, Liam Miller-Cushon <cushon at openjdk.org> wrote:
> The main reason `getEncodedLength` wasn't used is that it doesn't make it clear that the unit of length is bytes. For UTF-8 a byte length is intuitive, for e.g. UTF-16 or UTF-32 the "encoded length" could also be the count of int16 (number of wchar_t) or int32.
Emphasizing the unit of measurement is a laudable goal. I just feel that in this case it obscures what is being computed.
What’s computed here is the *encoded* length; the unit of measurement seems a secondary concern.
A String does not intrinsically have a «byte length»; the concept only seems meaningful in relation to an encoding operation.
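To illustrate that point, here is a minimal sketch using only the existing `String.getBytes(Charset)`. The class `EncodedLengthDemo` and the helper `encodedLength` are hypothetical names made up for this example, not the API proposed in the PR.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedLengthDemo {
    // Hypothetical helper, not the API under review: encodes the whole
    // string and returns the size of the resulting byte array.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo"; // "hello" with an e-acute, 5 chars

        System.out.println(s.length());                                    // 5
        System.out.println(encodedLength(s, StandardCharsets.ISO_8859_1)); // 5
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));      // 6
        System.out.println(encodedLength(s, StandardCharsets.UTF_16LE));   // 10
    }
}
```

The same String yields three different byte counts, so the number only has meaning once a Charset has been chosen.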
Was `getEncodedByteLength` considered? `getEncodedLengthInBytes`?
Was Charset considered as a home for this method? There the operational context of encoding would be obvious.
> Including a `get` prefix or not was also considered, one benefit of `get` is that it aligns with `getBytes`, and also it may help convey that the method is doing computation (it's often going to be O(n), compared to e.g. `length()` which is O(1)).
Again a laudable goal, but the actual computation seems obscure.
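For readers wondering what kind of computation is involved, below is a rough sketch of an O(n) scan that counts UTF-8 bytes without allocating the encoded array. It is an assumption-laden illustration, not the implementation in the PR; `Utf8LengthSketch` and `utf8Length` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class Utf8LengthSketch {
    // Hypothetical helper: walks every char once, so the cost is O(n)
    // in s.length(), unlike String.length(), which is O(1).
    static int utf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;                  // ASCII
            } else if (c < 0x800) {
                bytes += 2;                  // two-byte sequence
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4;                  // surrogate pair, one code point
                i++;                         // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                bytes += 1;                  // unpaired surrogate: getBytes emits '?'
            } else {
                bytes += 3;                  // rest of the BMP
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo \u20ac \uD83D\uDE00";
        // Both lines print the same number.
        System.out.println(utf8Length(s));
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

The sketch ignores overflow for very long strings and covers UTF-8 only; other charsets would need their own counting rules.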
String is prime real estate for millions of programmers. We should get this right.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3861158061