RFR: 8376226: CharsetEncoder.canEncode(CharSequence) is much slower than necessary [v2]

Daniel Gredler dgredler at openjdk.org
Fri Jan 23 19:00:31 UTC 2026


> Subclasses of `CharsetEncoder` often override `canEncode(char)` in order to make it very fast. This is not the case for `canEncode(CharSequence)`, which currently usually has to perform the full encoding process. As a result, `canEncode(CharSequence)` is about 20x slower than `canEncode(char)` when the input is encodable, and about 1600x slower than `canEncode(char)` when the input is not encodable.
> 
> Performance is even worse for un-encodable input because the internal logic relies on a thrown exception to determine that the input cannot be encoded (which requires stack trace setup, etc.).
> 
> This PR overrides `canEncode(CharSequence)` to simply check `canEncode(char)` on each character in the sequence when the encoding allows this (ASCII, ISO-8859-1, etc). Where this is not possible (e.g. UTF-8, UTF-16) this PR removes the exception-based flow control in `CharsetEncoder` so that the un-encodable scenario is at least improved.
> 
> Regression tests run locally:
> - `make test TEST="jtreg:test/jdk/java/nio/charset"`
> - `make test TEST="jtreg:test/jdk/sun/nio/cs"`
> 
> The included benchmark can be run via `make test TEST="micro:java.nio.CharsetCanEncode"`.
> 
> JMH benchmark results **before** changes:
> 
> 
> Benchmark                                    Mode  Cnt    Score    Error  Units
> CharsetCanEncode.asciiCanEncodeCharNo        avgt   30    0.502 ±  0.004  ns/op
> CharsetCanEncode.asciiCanEncodeCharYes       avgt   30    0.503 ±  0.003  ns/op
> CharsetCanEncode.asciiCanEncodeStringNo      avgt   30  821.635 ±  7.055  ns/op
> CharsetCanEncode.asciiCanEncodeStringYes     avgt   30    8.875 ±  0.115  ns/op
> CharsetCanEncode.iso88591CanEncodeCharNo     avgt   30    0.508 ±  0.006  ns/op
> CharsetCanEncode.iso88591CanEncodeCharYes    avgt   30    0.506 ±  0.004  ns/op
> CharsetCanEncode.iso88591CanEncodeStringNo   avgt   30  833.165 ±  7.315  ns/op
> CharsetCanEncode.iso88591CanEncodeStringYes  avgt   30   10.357 ±  1.427  ns/op
> CharsetCanEncode.iso88592CanEncodeCharNo     avgt   30    0.957 ±  0.009  ns/op
> CharsetCanEncode.iso88592CanEncodeCharYes    avgt   30    1.407 ±  0.010  ns/op
> CharsetCanEncode.iso88592CanEncodeStringNo   avgt   30  826.478 ±  4.409  ns/op
> CharsetCanEncode.iso88592CanEncodeStringYes  avgt   30   13.223 ±  1.479  ns/op
> CharsetCanEncode.shiftjisCanEncodeCharNo     avgt   30    1.370 ±  0.012  ns/op
> CharsetCanEncode.shiftjisCanEncodeCharYes    avgt   30    1.386 ±  0.010  ns/op
> CharsetCanEncode.shiftjisCanEncodeStringNo   avgt   30  850.336 ± 20.107  ns/op
> C...
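
The two strategies described in the PR summary above can be illustrated with a minimal sketch that uses only the public `java.nio.charset` API. This is not the code from the PR: the class `CanEncodeSketch` and both helper methods are hypothetical names chosen for illustration. The first helper assumes an encoding where every character can be judged independently (ASCII, ISO-8859-1, etc.). The second inspects the `CoderResult` returned by `CharsetEncoder.encode(CharBuffer, ByteBuffer, boolean)` rather than catching a `CharacterCodingException`, which is one way to avoid exception-based flow control.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class CanEncodeSketch {

    // Hypothetical helper: checks each char on its own via canEncode(char).
    // Only meaningful for encodings where chars can be judged independently
    // (e.g. US-ASCII, ISO-8859-1); surrogate pairs would be rejected.
    static boolean canEncodeEachChar(CharsetEncoder encoder, CharSequence cs) {
        for (int i = 0; i < cs.length(); i++) {
            if (!encoder.canEncode(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper: runs the encoder and inspects CoderResult
    // instead of relying on a thrown CharacterCodingException.
    static boolean canEncodeViaResult(Charset charset, CharSequence cs) {
        // A new encoder reports malformed and unmappable input by default.
        CharsetEncoder encoder = charset.newEncoder();
        CharBuffer in = CharBuffer.wrap(cs);
        ByteBuffer out = ByteBuffer.allocate(64); // scratch space, contents discarded
        while (true) {
            CoderResult result = encoder.encode(in, out, true);
            if (result.isError()) {
                return false; // malformed or unmappable input
            }
            if (result.isOverflow()) {
                out.clear(); // discard output and keep going
                continue;
            }
            break; // underflow: all input was consumed
        }
        while (true) {
            CoderResult result = encoder.flush(out);
            if (result.isError()) {
                return false;
            }
            if (result.isOverflow()) {
                out.clear();
                continue;
            }
            break;
        }
        return true;
    }

    public static void main(String[] args) {
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        System.out.println(canEncodeEachChar(ascii, "hello"));
        System.out.println(canEncodeEachChar(ascii, "h\u00e9llo"));
        System.out.println(canEncodeViaResult(StandardCharsets.UTF_8, "h\u00e9llo"));
    }
}
```

The per-character loop is not valid for encodings such as UTF-8 or UTF-16, where a supplementary character is a pair of `char` values that must be examined together; that is why the second approach still runs the encoder.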

Daniel Gredler has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:

 - Merge master
 - Optimize CharsetEncoder.canEncode(CharSequence)

-------------

Changes: https://git.openjdk.org/jdk/pull/29391/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29391&range=01
  Stats: 246 lines in 6 files changed: 238 ins; 1 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/29391.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/29391/head:pull/29391

PR: https://git.openjdk.org/jdk/pull/29391


More information about the nio-dev mailing list