RFR: 8376226: CharsetEncoder.canEncode(CharSequence) is much slower than necessary [v2]

Alan Bateman alanb at openjdk.org
Mon Jan 26 16:42:22 UTC 2026


On Fri, 23 Jan 2026 19:00:31 GMT, Daniel Gredler <dgredler at openjdk.org> wrote:

>> Subclasses of `CharsetEncoder` often override `canEncode(char)` in order to make it very fast. This is not the case for `canEncode(CharSequence)`, which currently must usually perform the full encoding process. As a result, `canEncode(CharSequence)` is about 20x slower than `canEncode(char)` when the input is encodable, and about 1600x slower than `canEncode(char)` when the input is not encodable.
>> 
>> The reason that performance is even slower for un-encodable input is that the internal logic is relying on a thrown exception to determine that the input cannot be encoded (requiring stack trace setup, etc).
>> 
>> This PR overrides `canEncode(CharSequence)` to simply check `canEncode(char)` on each character in the sequence when the encoding allows this (ASCII, ISO-8859-1, etc). Where this is not possible (e.g. UTF-8, UTF-16) this PR removes the exception-based flow control in `CharsetEncoder` so that the un-encodable scenario is at least improved.
>> 
>> Regression tests run locally:
>> - `make test TEST="jtreg:test/jdk/java/nio/charset"`
>> - `make test TEST="jtreg:test/jdk/sun/nio/cs"`
>> 
>> The included benchmark can be run via `make test TEST="micro:java.nio.CharsetCanEncode"`.
>> 
>> JMH benchmark results **before** changes:
>> 
>> 
>> Benchmark                                    Mode  Cnt    Score    Error  Units
>> CharsetCanEncode.asciiCanEncodeCharNo        avgt   30    0.502 ±  0.004  ns/op
>> CharsetCanEncode.asciiCanEncodeCharYes       avgt   30    0.503 ±  0.003  ns/op
>> CharsetCanEncode.asciiCanEncodeStringNo      avgt   30  821.635 ±  7.055  ns/op
>> CharsetCanEncode.asciiCanEncodeStringYes     avgt   30    8.875 ±  0.115  ns/op
>> CharsetCanEncode.iso88591CanEncodeCharNo     avgt   30    0.508 ±  0.006  ns/op
>> CharsetCanEncode.iso88591CanEncodeCharYes    avgt   30    0.506 ±  0.004  ns/op
>> CharsetCanEncode.iso88591CanEncodeStringNo   avgt   30  833.165 ±  7.315  ns/op
>> CharsetCanEncode.iso88591CanEncodeStringYes  avgt   30   10.357 ±  1.427  ns/op
>> CharsetCanEncode.iso88592CanEncodeCharNo     avgt   30    0.957 ±  0.009  ns/op
>> CharsetCanEncode.iso88592CanEncodeCharYes    avgt   30    1.407 ±  0.010  ns/op
>> CharsetCanEncode.iso88592CanEncodeStringNo   avgt   30  826.478 ±  4.409  ns/op
>> CharsetCanEncode.iso88592CanEncodeStringYes  avgt   30   13.223 ±  1.479  ns/op
>> CharsetCanEncode.shiftjisCanEncodeCharNo     avgt   30    1.370 ±  0.012  ns/op
>> CharsetCanEncode.shiftjisCanEncodeCharYes    avgt   30    1.386 ±  0.010  ns/op
>> CharsetCanEncode.sh...
>
> Daniel Gredler has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
> 
>  - Merge master
>  - Optimize CharsetEncoder.canEncode(CharSequence)

Okay, let's go with what you have. The src changes are fine; I only skimmed the benchmark (not a detailed review).
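
For reference, the two approaches described in the quoted PR description can be sketched roughly as below. This is not the actual patch: the class and method names (`CanEncodeSketch`, `canEncodeEachChar`, `canEncodeViaCoderResult`) are hypothetical, and the second helper only assumes that "removing exception-based flow control" means inspecting the `CoderResult` from the three-argument `encode` instead of catching `CharacterCodingException`.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; names are hypothetical and this is not the JDK code.
class CanEncodeSketch {

    // Per-character check: only valid for encodings where each char can be
    // judged in isolation (e.g. US-ASCII, ISO-8859-1).
    static boolean canEncodeEachChar(CharsetEncoder enc, CharSequence cs) {
        for (int i = 0; i < cs.length(); i++) {
            if (!enc.canEncode(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Full encode without exceptions: errors are reported through CoderResult.
    // Note: this resets the encoder and changes its error actions; a real
    // implementation would save and restore them.
    static boolean canEncodeViaCoderResult(CharsetEncoder enc, CharSequence cs) {
        enc.reset()
           .onMalformedInput(CodingErrorAction.REPORT)
           .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer in = CharBuffer.wrap(cs);
        ByteBuffer out = ByteBuffer.allocate(64); // scratch space, output is discarded
        while (true) {
            CoderResult r = enc.encode(in, out, true);
            if (r.isError()) {
                return false;          // malformed or unmappable input
            }
            if (r.isUnderflow()) {
                break;                 // all input consumed
            }
            out.clear();               // overflow: drop bytes and continue
        }
        while (true) {
            CoderResult r = enc.flush(out);
            if (r.isError()) {
                return false;
            }
            if (r.isUnderflow()) {
                return true;
            }
            out.clear();
        }
    }

    public static void main(String[] args) {
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        System.out.println(canEncodeEachChar(ascii, "hello"));        // true
        System.out.println(canEncodeEachChar(ascii, "h\u00e9llo"));   // false

        CharsetEncoder utf8 = StandardCharsets.UTF_8.newEncoder();
        System.out.println(canEncodeViaCoderResult(utf8, "h\u00e9llo")); // true
        System.out.println(canEncodeViaCoderResult(utf8, "\ud800"));     // false (lone surrogate)
    }
}
```

The per-character loop does not work for UTF-8 or UTF-16, since a surrogate pair is encodable as a whole while each half alone is not; that is why the second helper runs the full encode.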

-------------

Marked as reviewed by alanb (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/29391#pullrequestreview-3706923124