RFR: 8376226: CharsetEncoder.canEncode(CharSequence) is much slower than necessary [v2]

Sun Jan 25 17:36:56 UTC 2026

On Fri, 23 Jan 2026 19:00:31 GMT, Daniel Gredler <dgredler at openjdk.org> wrote:

>> Subclasses of `CharsetEncoder` often override `canEncode(char)` in order to make it very fast. This is not the case for `canEncode(CharSequence)`, which currently must usually perform the full encoding process. As a result, `canEncode(CharSequence)` is about 20x slower than `canEncode(char)` when the input is encodable, and about 1600x slower than `canEncode(char)` when the input is not encodable.
>> 
>> Performance is even worse for un-encodable input because the internal logic relies on a thrown exception to determine that the input cannot be encoded (requiring stack trace setup, etc.).
>> 
>> This PR overrides `canEncode(CharSequence)` to simply check `canEncode(char)` on each character in the sequence when the encoding allows this (ASCII, ISO-8859-1, etc). Where this is not possible (e.g. UTF-8, UTF-16) this PR removes the exception-based flow control in `CharsetEncoder` so that the un-encodable scenario is at least improved.
>> 
>> Regression tests run locally:
>> - `make test TEST="jtreg:test/jdk/java/nio/charset"`
>> - `make test TEST="jtreg:test/jdk/sun/nio/cs"`
>> 
>> The included benchmark can be run via `make test TEST="micro:java.nio.CharsetCanEncode"`.
>> 
>> JMH benchmark results **before** changes:
>> 
>> 
>> Benchmark                                    Mode  Cnt    Score    Error  Units
>> CharsetCanEncode.asciiCanEncodeCharNo        avgt   30    0.502 ±  0.004  ns/op
>> CharsetCanEncode.asciiCanEncodeCharYes       avgt   30    0.503 ±  0.003  ns/op
>> CharsetCanEncode.asciiCanEncodeStringNo      avgt   30  821.635 ±  7.055  ns/op
>> CharsetCanEncode.asciiCanEncodeStringYes     avgt   30    8.875 ±  0.115  ns/op
>> CharsetCanEncode.iso88591CanEncodeCharNo     avgt   30    0.508 ±  0.006  ns/op
>> CharsetCanEncode.iso88591CanEncodeCharYes    avgt   30    0.506 ±  0.004  ns/op
>> CharsetCanEncode.iso88591CanEncodeStringNo   avgt   30  833.165 ±  7.315  ns/op
>> CharsetCanEncode.iso88591CanEncodeStringYes  avgt   30   10.357 ±  1.427  ns/op
>> CharsetCanEncode.iso88592CanEncodeCharNo     avgt   30    0.957 ±  0.009  ns/op
>> CharsetCanEncode.iso88592CanEncodeCharYes    avgt   30    1.407 ±  0.010  ns/op
>> CharsetCanEncode.iso88592CanEncodeStringNo   avgt   30  826.478 ±  4.409  ns/op
>> CharsetCanEncode.iso88592CanEncodeStringYes  avgt   30   13.223 ±  1.479  ns/op
>> CharsetCanEncode.shiftjisCanEncodeCharNo     avgt   30    1.370 ±  0.012  ns/op
>> CharsetCanEncode.shiftjisCanEncodeCharYes    avgt   30    1.386 ±  0.010  ns/op
>> CharsetCanEncode.sh...
>
> Daniel Gredler has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
> 
>  - Merge master
>  - Optimize CharsetEncoder.canEncode(CharSequence)

`canEncode(CharSequence)` in the base classes wraps the String representation. Maybe you've tried this already, but could you replace that with a loop over each char so that it can be compared with the overrides?
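
For illustration only, a minimal sketch of the kind of per-char loop meant here, written as a standalone helper rather than as the actual change to `CharsetEncoder`. The class name `CanEncodeLoopSketch` and the method `canEncodeByChar` are hypothetical and not part of the PR. Note that `canEncode(char)` is specified to return false for surrogates, so a loop like this is only equivalent for charsets that cannot encode supplementary characters (ASCII, ISO-8859-1, etc.), not for UTF-8 or UTF-16.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class CanEncodeLoopSketch {

    // Hypothetical helper: checks each char with canEncode(char) instead of
    // running the full encoding process. Only valid for charsets where every
    // encodable character is a single non-surrogate char.
    static boolean canEncodeByChar(CharsetEncoder enc, CharSequence cs) {
        for (int i = 0; i < cs.length(); i++) {
            if (!enc.canEncode(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        String encodable = "hello";
        String unencodable = "h\u00e9llo"; // contains U+00E9, outside ASCII

        // Compare the loop against the existing canEncode(CharSequence).
        System.out.println(canEncodeByChar(ascii, encodable) == ascii.canEncode(encodable));
        System.out.println(canEncodeByChar(ascii, unencodable) == ascii.canEncode(unencodable));
    }
}
```

Timing such a loop in the benchmark alongside the per-charset overrides would show how much of the gain comes from the loop itself.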

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29391#issuecomment-3796994065