RFR: 8376226: CharsetEncoder.canEncode(CharSequence) is much slower than necessary [v2]

Mon Jan 26 15:02:14 UTC 2026

On Sun, 25 Jan 2026 17:34:16 GMT, Alan Bateman <alanb at openjdk.org> wrote:

>> Daniel Gredler has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>> 
>>  - Merge master
>>  - Optimize CharsetEncoder.canEncode(CharSequence)
>
> `canEncode(CharSequence)` in the base classes wraps the String representation. Maybe you've tried this already but can you replace that with the loop over each char so it can be compared with the overrides?

@AlanBateman Wouldn't a simple char loop in the base `CharsetEncoder` class be incorrect for encodings like UTF-8 and UTF-16? Supplementary characters arrive as surrogate pairs, and neither half of a pair can be encoded in isolation. I think the current behavior (slow, but guaranteed to work if actual encoding works) is good as a baseline fallback.
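
To illustrate the concern, here is a minimal sketch that compares the existing `canEncode(CharSequence)` with a naive per-char loop on a UTF-8 encoder. The class `SurrogateLoopDemo` and the helper `perCharLoop` are hypothetical names made up for this example; they are not part of the PR.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class SurrogateLoopDemo {
    // Hypothetical naive fallback: test each UTF-16 code unit on its own.
    static boolean perCharLoop(CharsetEncoder enc, CharSequence cs) {
        for (int i = 0; i < cs.length(); i++) {
            if (!enc.canEncode(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        String emoji = "\uD83D\uDE00"; // U+1F600 as a high/low surrogate pair

        // The pair as a whole is encodable in UTF-8.
        System.out.println(enc.canEncode(emoji));   // prints "true"
        // canEncode(char) rejects a surrogate on its own, so the loop fails.
        System.out.println(perCharLoop(enc, emoji)); // prints "false"
    }
}
```

The loop and the real encoder disagree on any input containing a supplementary character, which is why a per-char loop cannot serve as the generic fallback.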

The `EUC_TW` encoder is currently the only class which overrides `canEncode(CharSequence)`, and it has to deal with surrogates. I can update `canEncode` in `UTF_8` and `UnicodeEncoder` (which handles UTF-16 variants) if you'd like, but I had initially tried to focus on the simpler cases (no surrogate pairs). Just let me know.
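
For reference, a surrogate-aware check for UTF-8 could look roughly like the sketch below. It assumes that every well-formed UTF-16 sequence is encodable in UTF-8, so only unpaired surrogates need to be rejected. `Utf8CanEncodeSketch` and `canEncodeUtf8` are hypothetical names for illustration, not the code in `UTF_8` or in this PR.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Utf8CanEncodeSketch {
    // Hypothetical helper: returns true unless the input contains an
    // unpaired surrogate, which UTF-8 cannot represent.
    static boolean canEncodeUtf8(CharSequence cs) {
        int len = cs.length();
        for (int i = 0; i < len; i++) {
            char c = cs.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate is only valid when a low surrogate follows it.
                if (i + 1 == len || !Character.isLowSurrogate(cs.charAt(i + 1))) {
                    return false;
                }
                i++; // skip the low surrogate that completes the pair
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate with no preceding high surrogate.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        String[] samples = { "abc", "a\uD83D\uDE00b", "\uD83D", "\uDE00\uD83D" };
        for (String s : samples) {
            // Both columns should agree for every sample.
            System.out.println(canEncodeUtf8(s) + " " + enc.canEncode(s));
        }
    }
}
```

The extra branching per char is the cost of handling surrogates, which is part of why the simpler encodings without surrogate pairs were the initial focus.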

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29391#issuecomment-3800030312