RFR: 8376226: CharsetEncoder.canEncode(CharSequence) is much slower than necessary

Daniel Gredler dgredler at openjdk.org
Fri Jan 23 18:50:23 UTC 2026


Subclasses of `CharsetEncoder` often override `canEncode(char)` to make it very fast. `canEncode(CharSequence)` is usually not overridden, so it typically has to run the full encoding process. As a result, in the benchmarks below `canEncode(CharSequence)` is roughly 20x slower than `canEncode(char)` when the input is encodable, and roughly 1600x slower when the input is not encodable.

The un-encodable case is even slower because the internal logic relies on a thrown exception to determine that the input cannot be encoded (which requires stack trace setup, etc.).
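To illustrate the difference between the two styles of flow control, here is a minimal sketch that uses only the public `CharsetEncoder` API. It is not the JDK's internal code: the class name `CanEncodeSketch` and both method names are hypothetical, the 64-byte scratch buffer is an arbitrary choice, and the encoder is assumed to use the default `REPORT` error actions. The first method learns about failure from a thrown `CharacterCodingException`; the second inspects the returned `CoderResult` instead.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

// Hypothetical helper class, for illustration only.
final class CanEncodeSketch {

    // Exception-based check: failure is signalled by a thrown exception,
    // which has to be constructed (including its stack trace) every time.
    static boolean viaException(CharsetEncoder enc, CharSequence cs) {
        try {
            enc.encode(CharBuffer.wrap(cs)); // resets, encodes, and flushes
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Result-based check: failure is signalled by CoderResult.isError(),
    // so no exception is created on the un-encodable path.
    static boolean viaResult(CharsetEncoder enc, CharSequence cs) {
        enc.reset();
        CharBuffer in = CharBuffer.wrap(cs);
        ByteBuffer out = ByteBuffer.allocate(64); // scratch space, output is discarded
        while (true) {
            CoderResult cr = enc.encode(in, out, true);
            if (cr.isError()) {
                return false;      // malformed or unmappable input
            }
            if (cr.isUnderflow()) {
                break;             // all input consumed
            }
            out.clear();           // overflow: drop the bytes and continue
        }
        while (true) {
            CoderResult cr = enc.flush(out);
            if (cr.isError()) {
                return false;
            }
            if (cr.isUnderflow()) {
                return true;       // flush complete
            }
            out.clear();
        }
    }
}
```

Both methods should agree on every input; the second one simply avoids paying for an exception when the answer is "no".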

This PR overrides `canEncode(CharSequence)` to simply check `canEncode(char)` on each character in the sequence when the encoding allows this (ASCII, ISO-8859-1, etc.). Where this is not possible (e.g. UTF-8, UTF-16), this PR removes the exception-based flow control in `CharsetEncoder` so that the un-encodable scenario is at least improved.
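The per-character approach can be sketched roughly as follows. This is an illustration rather than the code in the PR: `PerCharCanEncode` and `canEncodeAll` are hypothetical names, and the helper is written as an external static method instead of an override inside an encoder subclass.

```java
import java.nio.charset.CharsetEncoder;

// Hypothetical helper class, for illustration only.
final class PerCharCanEncode {

    // Returns true only if every char is individually encodable.
    // This is only equivalent to canEncode(CharSequence) for encodings
    // where each char can be judged in isolation (e.g. ASCII, ISO-8859-1).
    static boolean canEncodeAll(CharsetEncoder enc, CharSequence cs) {
        for (int i = 0; i < cs.length(); i++) {
            if (!enc.canEncode(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

The shortcut does not carry over to UTF-8 or UTF-16 because of surrogate pairs: a supplementary character such as U+1F600 is encodable as a pair, but neither surrogate is encodable on its own, so a per-character loop would wrongly report the sequence as un-encodable.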

Regression tests run locally:
- `make test TEST="jtreg:test/jdk/java/nio/charset"`
- `make test TEST="jtreg:test/jdk/sun/nio/cs"`

The included benchmark can be run via `make test TEST="micro:java.nio.CharsetCanEncode"`.

JMH benchmark results **before** changes:


Benchmark                                    Mode  Cnt    Score    Error  Units
CharsetCanEncode.asciiCanEncodeCharNo        avgt   30    0.502 ±  0.004  ns/op
CharsetCanEncode.asciiCanEncodeCharYes       avgt   30    0.503 ±  0.003  ns/op
CharsetCanEncode.asciiCanEncodeStringNo      avgt   30  821.635 ±  7.055  ns/op
CharsetCanEncode.asciiCanEncodeStringYes     avgt   30    8.875 ±  0.115  ns/op
CharsetCanEncode.iso88591CanEncodeCharNo     avgt   30    0.508 ±  0.006  ns/op
CharsetCanEncode.iso88591CanEncodeCharYes    avgt   30    0.506 ±  0.004  ns/op
CharsetCanEncode.iso88591CanEncodeStringNo   avgt   30  833.165 ±  7.315  ns/op
CharsetCanEncode.iso88591CanEncodeStringYes  avgt   30   10.357 ±  1.427  ns/op
CharsetCanEncode.iso88592CanEncodeCharNo     avgt   30    0.957 ±  0.009  ns/op
CharsetCanEncode.iso88592CanEncodeCharYes    avgt   30    1.407 ±  0.010  ns/op
CharsetCanEncode.iso88592CanEncodeStringNo   avgt   30  826.478 ±  4.409  ns/op
CharsetCanEncode.iso88592CanEncodeStringYes  avgt   30   13.223 ±  1.479  ns/op
CharsetCanEncode.shiftjisCanEncodeCharNo     avgt   30    1.370 ±  0.012  ns/op
CharsetCanEncode.shiftjisCanEncodeCharYes    avgt   30    1.386 ±  0.010  ns/op
CharsetCanEncode.shiftjisCanEncodeStringNo   avgt   30  850.336 ± 20.107  ns/op
CharsetCanEncode.shiftjisCanEncodeStringYes  avgt   30   10.672 ±  0.088  ns/op
CharsetCanEncode.utf16leCanEncodeCharNo      avgt   30    0.518 ±  0.005  ns/op
CharsetCanEncode.utf16leCanEncodeCharYes     avgt   30    0.517 ±  0.005  ns/op
CharsetCanEncode.utf16leCanEncodeStringNo    avgt   30  857.907 ± 15.492  ns/op
CharsetCanEncode.utf16leCanEncodeStringYes   avgt   30   12.492 ±  1.444  ns/op
CharsetCanEncode.utf8CanEncodeCharNo         avgt   30    0.522 ±  0.008  ns/op
CharsetCanEncode.utf8CanEncodeCharYes        avgt   30    0.518 ±  0.004  ns/op
CharsetCanEncode.utf8CanEncodeStringNo       avgt   30  869.428 ± 11.116  ns/op
CharsetCanEncode.utf8CanEncodeStringYes      avgt   30   19.587 ±  0.190  ns/op


JMH benchmark results **after** changes:


Benchmark                                    Mode  Cnt   Score   Error  Units
CharsetCanEncode.asciiCanEncodeCharNo        avgt   30   0.509 ± 0.004  ns/op
CharsetCanEncode.asciiCanEncodeCharYes       avgt   30   0.504 ± 0.005  ns/op
CharsetCanEncode.asciiCanEncodeStringNo      avgt   30   0.608 ± 0.011  ns/op
CharsetCanEncode.asciiCanEncodeStringYes     avgt   30   0.508 ± 0.006  ns/op
CharsetCanEncode.iso88591CanEncodeCharNo     avgt   30   0.502 ± 0.004  ns/op
CharsetCanEncode.iso88591CanEncodeCharYes    avgt   30   0.502 ± 0.004  ns/op
CharsetCanEncode.iso88591CanEncodeStringNo   avgt   30   0.604 ± 0.004  ns/op
CharsetCanEncode.iso88591CanEncodeStringYes  avgt   30   0.507 ± 0.004  ns/op
CharsetCanEncode.iso88592CanEncodeCharNo     avgt   30   0.952 ± 0.008  ns/op
CharsetCanEncode.iso88592CanEncodeCharYes    avgt   30   1.394 ± 0.009  ns/op
CharsetCanEncode.iso88592CanEncodeStringNo   avgt   30   1.071 ± 0.006  ns/op
CharsetCanEncode.iso88592CanEncodeStringYes  avgt   30   1.375 ± 0.008  ns/op
CharsetCanEncode.shiftjisCanEncodeCharNo     avgt   30   1.360 ± 0.007  ns/op
CharsetCanEncode.shiftjisCanEncodeCharYes    avgt   30   1.381 ± 0.010  ns/op
CharsetCanEncode.shiftjisCanEncodeStringNo   avgt   30   1.581 ± 0.015  ns/op
CharsetCanEncode.shiftjisCanEncodeStringYes  avgt   30   1.388 ± 0.008  ns/op
CharsetCanEncode.utf16leCanEncodeCharNo      avgt   30   0.507 ± 0.004  ns/op
CharsetCanEncode.utf16leCanEncodeCharYes     avgt   30   0.509 ± 0.006  ns/op
CharsetCanEncode.utf16leCanEncodeStringNo    avgt   30  12.177 ± 0.139  ns/op
CharsetCanEncode.utf16leCanEncodeStringYes   avgt   30  10.717 ± 0.098  ns/op
CharsetCanEncode.utf8CanEncodeCharNo         avgt   30   0.511 ± 0.005  ns/op
CharsetCanEncode.utf8CanEncodeCharYes        avgt   30   0.516 ± 0.008  ns/op
CharsetCanEncode.utf8CanEncodeStringNo       avgt   30  24.626 ± 0.226  ns/op
CharsetCanEncode.utf8CanEncodeStringYes      avgt   30  20.593 ± 0.192  ns/op

-------------

Commit messages:
 - Optimize CharsetEncoder.canEncode(CharSequence)

Changes: https://git.openjdk.org/jdk/pull/29391/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29391&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8376226
  Stats: 246 lines in 6 files changed: 238 ins; 1 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/29391.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/29391/head:pull/29391

PR: https://git.openjdk.org/jdk/pull/29391

