RFR: 8376226: CharsetEncoder.canEncode(CharSequence) is much slower than necessary [v2]
Alan Bateman
alanb at openjdk.org
Sun Jan 25 17:36:56 UTC 2026
On Fri, 23 Jan 2026 19:00:31 GMT, Daniel Gredler <dgredler at openjdk.org> wrote:
>> Subclasses of `CharsetEncoder` often override `canEncode(char)` in order to make it very fast. This is not the case for `canEncode(CharSequence)`, which currently must usually perform the full encoding process. As a result, `canEncode(CharSequence)` is about 20x slower than `canEncode(char)` when the input is encodable, and about 1600x slower than `canEncode(char)` when the input is not encodable.
>>
>> The reason that performance is even slower for un-encodable input is that the internal logic is relying on a thrown exception to determine that the input cannot be encoded (requiring stack trace setup, etc).
>>
>> This PR overrides `canEncode(CharSequence)` to simply check `canEncode(char)` on each character in the sequence when the encoding allows this (ASCII, ISO-8859-1, etc). Where this is not possible (e.g. UTF-8, UTF-16) this PR removes the exception-based flow control in `CharsetEncoder` so that the un-encodable scenario is at least improved.
>>
>> Regression tests run locally:
>> - `make test TEST="jtreg:test/jdk/java/nio/charset"`
>> - `make test TEST="jtreg:test/jdk/sun/nio/cs"`
>>
>> The included benchmark can be run via `make test TEST="micro:java.nio.CharsetCanEncode"`.
>>
>> JMH benchmark results **before** changes:
>>
>>
>> Benchmark                                    Mode  Cnt    Score   Error  Units
>> CharsetCanEncode.asciiCanEncodeCharNo        avgt   30    0.502 ± 0.004  ns/op
>> CharsetCanEncode.asciiCanEncodeCharYes       avgt   30    0.503 ± 0.003  ns/op
>> CharsetCanEncode.asciiCanEncodeStringNo      avgt   30  821.635 ± 7.055  ns/op
>> CharsetCanEncode.asciiCanEncodeStringYes     avgt   30    8.875 ± 0.115  ns/op
>> CharsetCanEncode.iso88591CanEncodeCharNo     avgt   30    0.508 ± 0.006  ns/op
>> CharsetCanEncode.iso88591CanEncodeCharYes    avgt   30    0.506 ± 0.004  ns/op
>> CharsetCanEncode.iso88591CanEncodeStringNo   avgt   30  833.165 ± 7.315  ns/op
>> CharsetCanEncode.iso88591CanEncodeStringYes  avgt   30   10.357 ± 1.427  ns/op
>> CharsetCanEncode.iso88592CanEncodeCharNo     avgt   30    0.957 ± 0.009  ns/op
>> CharsetCanEncode.iso88592CanEncodeCharYes    avgt   30    1.407 ± 0.010  ns/op
>> CharsetCanEncode.iso88592CanEncodeStringNo   avgt   30  826.478 ± 4.409  ns/op
>> CharsetCanEncode.iso88592CanEncodeStringYes  avgt   30   13.223 ± 1.479  ns/op
>> CharsetCanEncode.shiftjisCanEncodeCharNo     avgt   30    1.370 ± 0.012  ns/op
>> CharsetCanEncode.shiftjisCanEncodeCharYes    avgt   30    1.386 ± 0.010  ns/op
>> CharsetCanEncode.sh...
>
> Daniel Gredler has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>
> - Merge master
> - Optimize CharsetEncoder.canEncode(CharSequence)
`canEncode(CharSequence)` in the base class wraps the String representation of the input. Maybe you've tried this already, but can you replace that with a loop over each char so it can be compared with the overrides?
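
For illustration, here is a minimal sketch of the kind of per-char loop in question, next to the existing `canEncode(CharSequence)` call. `canEncodeByChar` is a hypothetical helper written for this example, not a JDK method and not the code in the PR. One caveat that matters for the comparison: the Javadoc for `canEncode(char)` says it returns false for surrogate characters, so a plain per-char loop would presumably report false for supplementary characters even in charsets such as UTF-8 that can encode a valid surrogate pair.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class CanEncodeLoopSketch {

    // Hypothetical helper (an assumption for this sketch, not JDK code):
    // asks the encoder about each char in turn and stops at the first
    // char it cannot encode.
    static boolean canEncodeByChar(CharsetEncoder encoder, CharSequence cs) {
        for (int i = 0; i < cs.length(); i++) {
            if (!encoder.canEncode(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        CharsetEncoder utf8 = StandardCharsets.UTF_8.newEncoder();

        String plain = "hello";
        String accented = "h\u00e9llo";      // contains U+00E9, outside ASCII
        String emoji = "\uD83D\uDE00";       // one supplementary code point

        // Per-char loop versus the existing CharSequence method.
        System.out.println(canEncodeByChar(ascii, plain) + " " + ascii.canEncode(plain));
        System.out.println(canEncodeByChar(ascii, accented) + " " + ascii.canEncode(accented));

        // The two approaches can disagree once surrogate pairs are involved.
        System.out.println(canEncodeByChar(utf8, emoji) + " " + utf8.canEncode(emoji));
    }
}
```

For single-byte charsets such as US-ASCII and ISO-8859-1 the two approaches should agree, since no supplementary character is encodable there. A loop in the base class would need some handling of surrogate pairs to stay correct for the other charsets.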
-------------
PR Comment: https://git.openjdk.org/jdk/pull/29391#issuecomment-3796994065