RFR: 8376226: CharsetEncoder.canEncode(CharSequence) is much slower than necessary [v2]
Alan Bateman
alanb at openjdk.org
Mon Jan 26 16:42:22 UTC 2026
On Fri, 23 Jan 2026 19:00:31 GMT, Daniel Gredler <dgredler at openjdk.org> wrote:
>> Subclasses of `CharsetEncoder` often override `canEncode(char)` in order to make it very fast. This is not the case for `canEncode(CharSequence)`, which currently must usually perform the full encoding process. As a result, `canEncode(CharSequence)` is about 20x slower than `canEncode(char)` when the input is encodable, and about 1600x slower than `canEncode(char)` when the input is not encodable.
>>
>> The reason that performance is even slower for un-encodable input is that the internal logic is relying on a thrown exception to determine that the input cannot be encoded (requiring stack trace setup, etc).
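
For illustration only (this is not the PR's code): a minimal sketch of how un-encodable input can be detected without a thrown exception, by inspecting the `CoderResult` returned from the buffer-based `CharsetEncoder.encode(CharBuffer, ByteBuffer, boolean)`. The class name `CoderResultCheck`, the method name `encodesWithoutError`, and the 64-byte scratch buffer are assumptions made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of the JDK or of the PR under review.
class CoderResultCheck {

    // Returns true if the whole sequence encodes cleanly. Errors are read from
    // the returned CoderResult, so no exception (and no stack trace) is created.
    // Note: this resets the encoder and changes its error actions to REPORT.
    static boolean encodesWithoutError(CharsetEncoder enc, CharSequence cs) {
        enc.reset()
           .onMalformedInput(CodingErrorAction.REPORT)
           .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer in = CharBuffer.wrap(cs);
        ByteBuffer out = ByteBuffer.allocate(64); // scratch space; contents are discarded

        while (true) {
            CoderResult cr = enc.encode(in, out, true);
            if (cr.isError()) {
                return false;      // malformed or unmappable input
            }
            if (cr.isUnderflow()) {
                break;             // all input consumed
            }
            out.clear();           // overflow: drop the bytes and keep going
        }
        while (true) {
            CoderResult cr = enc.flush(out);
            if (cr.isError()) {
                return false;
            }
            if (cr.isUnderflow()) {
                return true;       // flush complete
            }
            out.clear();
        }
    }

    public static void main(String[] args) {
        CharsetEncoder utf8 = StandardCharsets.UTF_8.newEncoder();
        System.out.println(encodesWithoutError(utf8, "h\u00e9llo"));
        System.out.println(encodesWithoutError(utf8, "\ud800")); // lone high surrogate
    }
}
```

The sketch still runs the full encoding loop, so it only avoids the exception cost in the un-encodable case; it does not make the encodable case cheaper.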
>>
>> This PR overrides `canEncode(CharSequence)` to simply check `canEncode(char)` on each character in the sequence when the encoding allows this (ASCII, ISO-8859-1, etc). Where this is not possible (e.g. UTF-8, UTF-16) this PR removes the exception-based flow control in `CharsetEncoder` so that the un-encodable scenario is at least improved.
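
As a rough illustration of the per-character approach described above (a sketch, not the code in the PR): for a charset in which each `char` is encodable independently of its neighbours, such as US-ASCII or ISO-8859-1, checking a sequence reduces to a loop over the existing `canEncode(char)`. The class name `PerCharCanEncode` and the helper `canEncodeEach` are hypothetical.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of the JDK or of the PR under review.
class PerCharCanEncode {

    // Assumption: the encoder's charset maps every char on its own, with no
    // dependence on surrounding chars (true for US-ASCII and ISO-8859-1).
    // Do not use this for charsets where surrogate pairs can be encodable
    // (e.g. UTF-8), since each surrogate half is rejected on its own.
    static boolean canEncodeEach(CharsetEncoder enc, CharSequence cs) {
        for (int i = 0; i < cs.length(); i++) {
            if (!enc.canEncode(cs.charAt(i))) {
                return false; // stop at the first un-encodable char
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        System.out.println(canEncodeEach(ascii, "hello"));
        System.out.println(canEncodeEach(ascii, "h\u00e9llo"));
    }
}
```

The early return means the un-encodable case costs no more than the encodable one, which is the property the exception-based path lacks.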
>>
>> Regression tests run locally:
>> - `make test TEST="jtreg:test/jdk/java/nio/charset"`
>> - `make test TEST="jtreg:test/jdk/sun/nio/cs"`
>>
>> The included benchmark can be run via `make test TEST="micro:java.nio.CharsetCanEncode"`.
>>
>> JMH benchmark results **before** changes:
>>
>>
>> Benchmark                                    Mode  Cnt    Score   Error  Units
>> CharsetCanEncode.asciiCanEncodeCharNo        avgt   30    0.502 ± 0.004  ns/op
>> CharsetCanEncode.asciiCanEncodeCharYes       avgt   30    0.503 ± 0.003  ns/op
>> CharsetCanEncode.asciiCanEncodeStringNo      avgt   30  821.635 ± 7.055  ns/op
>> CharsetCanEncode.asciiCanEncodeStringYes     avgt   30    8.875 ± 0.115  ns/op
>> CharsetCanEncode.iso88591CanEncodeCharNo     avgt   30    0.508 ± 0.006  ns/op
>> CharsetCanEncode.iso88591CanEncodeCharYes    avgt   30    0.506 ± 0.004  ns/op
>> CharsetCanEncode.iso88591CanEncodeStringNo   avgt   30  833.165 ± 7.315  ns/op
>> CharsetCanEncode.iso88591CanEncodeStringYes  avgt   30   10.357 ± 1.427  ns/op
>> CharsetCanEncode.iso88592CanEncodeCharNo     avgt   30    0.957 ± 0.009  ns/op
>> CharsetCanEncode.iso88592CanEncodeCharYes    avgt   30    1.407 ± 0.010  ns/op
>> CharsetCanEncode.iso88592CanEncodeStringNo   avgt   30  826.478 ± 4.409  ns/op
>> CharsetCanEncode.iso88592CanEncodeStringYes  avgt   30   13.223 ± 1.479  ns/op
>> CharsetCanEncode.shiftjisCanEncodeCharNo     avgt   30    1.370 ± 0.012  ns/op
>> CharsetCanEncode.shiftjisCanEncodeCharYes    avgt   30    1.386 ± 0.010  ns/op
>> CharsetCanEncode.sh...
>
> Daniel Gredler has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>
> - Merge master
> - Optimize CharsetEncoder.canEncode(CharSequence)
Okay, let's go with what you have. The src changes are fine; I only skimmed the benchmark (not a detailed review).
-------------
Marked as reviewed by alanb (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/29391#pullrequestreview-3706923124