<i18n dev> RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v17]
Eirik Bjørsnøs
eirbjo at openjdk.org
Mon Feb 9 21:31:20 UTC 2026
On Fri, 6 Feb 2026 16:34:38 GMT, Roger Riggs <rriggs at openjdk.org> wrote:
> I'd be fine with `getEncodedLength(Charset)`.
This name makes it clear that the method returns the length of this String after encoding. While it is not explicit that the length is in bytes, as Roger says, encoding implies bytes anyway, so at the very least "Encoded" strongly suggests that the unit of measurement is bytes.
The existence of a separate method for an encoded length makes it clear that this cannot have the same semantics as `String::length`. I don't think there is much room for confusion.
While the similarity to the `getBytes().length` idiom is neat in 2026, I don't think it is important enough to determine the name of the method. A method name should describe what the method does, not what its soon-to-be-outdated idiom used to look like back in the day.
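For readers who want the idiom spelled out, here is a minimal sketch in Java. `lengthViaGetBytes` is the idiom as it is written today. `utf8Length` is a hypothetical helper, not the API under review: it handles UTF-8 only, and its name and `long` return type are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

final class EncodedLengthSketch {

    // Today's idiom: encodes the whole String into a temporary byte[]
    // just to read the array's length.
    static int lengthViaGetBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Hypothetical helper (NOT a JDK API): counts UTF-8 bytes without
    // allocating the encoded array. Unpaired surrogates are counted as
    // one byte because String.getBytes replaces them with '?'.
    static long utf8Length(String s) {
        long bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;                       // ASCII
            } else if (c < 0x800) {
                bytes += 2;                       // two-byte sequence
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4;                       // valid surrogate pair
                i++;                              // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                bytes += 1;                       // unpaired surrogate -> '?'
            } else {
                bytes += 3;                       // rest of the BMP
            }
        }
        return bytes;
    }
}
```

Both methods answer the same question, "how many bytes does this String occupy once encoded", which is the behavior a name like `getEncodedLength(Charset)` describes directly.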
> The discoverability of the method if placed as `Charset.getEncodedLength(String)` would be very low and would require cross-package hacking to gain the performance advantage.
The target audience for this method is not discussed in the CSR, at least not explicitly. If the target audience is "most programmers", then String is a good home. If the target audience is specialist/framework/library developers working in very large companies with very large datasets, then perhaps discoverability is not such a big issue and it's better to "hide this away" in Charset, which is where encoding operations generally live. If performance is important enough, it can warrant use of `JavaLangAccess`.
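To make the trade-off concrete, here is a rough sketch of what code outside `java.lang` can do today using only the public `CharsetEncoder` API. The class and method names are hypothetical, and the 1024-byte scratch buffer is an arbitrary choice. The sketch cannot see String's internal representation, so it has to run the full encoder. It makes no claim about performance.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

final class CharsetLengthSketch {

    // Hypothetical helper (NOT a JDK API): counts encoded bytes by running
    // the charset's encoder into a small scratch buffer that is reused.
    static long encodedLength(String s, Charset cs) {
        CharsetEncoder enc = cs.newEncoder()
                // Mirror String.getBytes: replace instead of throwing.
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer in = CharBuffer.wrap(s);
        ByteBuffer out = ByteBuffer.allocate(1024);
        long total = 0;

        // Encode until the input is exhausted, draining the scratch
        // buffer each time it fills up (OVERFLOW).
        while (true) {
            CoderResult r = enc.encode(in, out, true);
            total += out.position();
            out.clear();
            if (r.isUnderflow()) {
                break;
            }
        }
        // Flush any trailing bytes a stateful encoder may still hold.
        while (true) {
            CoderResult r = enc.flush(out);
            total += out.position();
            out.clear();
            if (r.isUnderflow()) {
                break;
            }
        }
        return total;
    }
}
```

A caller would write `CharsetLengthSketch.encodedLength(text, StandardCharsets.UTF_8)`. An implementation inside `java.lang`, or one reached through `JavaLangAccess`, could instead inspect the String's internal representation directly, which is the performance advantage the quoted comment refers to.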
-------------
PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3861566063