RFR: 8376226: CharsetEncoder.canEncode(CharSequence) is much slower than necessary
Daniel Gredler
dgredler at openjdk.org
Fri Jan 23 18:50:23 UTC 2026
Subclasses of `CharsetEncoder` often override `canEncode(char)` to make it very fast. This is not the case for `canEncode(CharSequence)`, which currently has to run the full encoding process in most cases. As a result, `canEncode(CharSequence)` is about 20x slower than `canEncode(char)` when the input is encodable, and about 1600x slower than `canEncode(char)` when the input is not encodable.
Performance is even worse for un-encodable input because the internal logic relies on a thrown exception to determine that the input cannot be encoded, which adds the cost of creating the exception (stack trace setup, etc.).
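To illustrate the difference between the two styles of flow control, here is a minimal sketch that uses only the public `CharsetEncoder` API. It is not the code in this PR or in the JDK internals; the class name `FlowControlSketch`, the helper names, and the 256-byte scratch buffer are assumptions made for illustration. The first helper learns about failure by catching `CharacterCodingException`, the second by inspecting the returned `CoderResult`.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the JDK or of this PR.
class FlowControlSketch {

    // Fresh encoder that reports bad input instead of replacing it.
    static CharsetEncoder reportingEncoder(Charset cs) {
        return cs.newEncoder()
                 .onMalformedInput(CodingErrorAction.REPORT)
                 .onUnmappableCharacter(CodingErrorAction.REPORT);
    }

    // Exception-based: failure is signalled by a thrown exception,
    // which has to be constructed (including its stack trace).
    static boolean canEncodeViaException(Charset cs, CharSequence s) {
        try {
            reportingEncoder(cs).encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Result-based: failure is signalled by the returned CoderResult.
    static boolean canEncodeViaResult(Charset cs, CharSequence s) {
        CharsetEncoder enc = reportingEncoder(cs);
        CharBuffer in = CharBuffer.wrap(s);
        ByteBuffer out = ByteBuffer.allocate(256); // scratch space, output is discarded
        CoderResult r;
        // On overflow, discard the bytes produced so far and keep going.
        while ((r = enc.encode(in, out, true)).isOverflow()) {
            out.clear();
        }
        if (r.isError()) {
            return false; // malformed or unmappable input
        }
        while ((r = enc.flush(out)).isOverflow()) {
            out.clear();
        }
        return !r.isError();
    }

    public static void main(String[] args) {
        String bad = "h\u00e9llo"; // contains U+00E9, which US-ASCII cannot encode
        System.out.println(canEncodeViaException(StandardCharsets.US_ASCII, bad));
        System.out.println(canEncodeViaResult(StandardCharsets.US_ASCII, bad));
    }
}
```

Both helpers should agree on every input; only the mechanism used to report failure differs.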
This PR overrides `canEncode(CharSequence)` to simply check `canEncode(char)` on each character in the sequence when the encoding allows this (ASCII, ISO-8859-1, etc). Where this is not possible (e.g. UTF-8, UTF-16), this PR removes the exception-based flow control in `CharsetEncoder` so that the un-encodable scenario is at least improved.
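The per-character approach can be sketched roughly as follows. This is an external helper written against the public API, not the override added by this PR; the class name `PerCharCanEncode` and the method `canEncodeAll` are hypothetical. The shortcut is only valid for encodings in which each `char` can be judged independently (for example US-ASCII and ISO-8859-1); it would give wrong answers for encodings such as UTF-8 or UTF-16, where a surrogate is only encodable as part of a pair.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the JDK or of this PR.
class PerCharCanEncode {

    // Returns true only if every char in the sequence is individually encodable.
    // Assumes an encoding where chars can be checked in isolation.
    static boolean canEncodeAll(CharsetEncoder enc, CharSequence cs) {
        for (int i = 0; i < cs.length(); i++) {
            if (!enc.canEncode(cs.charAt(i))) {
                return false; // stop at the first un-encodable char
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        System.out.println(canEncodeAll(ascii, "hello"));
        System.out.println(canEncodeAll(ascii, "h\u00e9llo"));
    }
}
```

Because the loop exits on the first un-encodable character and never throws, the encodable and un-encodable cases cost roughly the same.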
Regression tests run locally:
- `make test TEST="jtreg:test/jdk/java/nio/charset"`
- `make test TEST="jtreg:test/jdk/sun/nio/cs"`
The included benchmark can be run via `make test TEST="micro:java.nio.CharsetCanEncode"`.
JMH benchmark results **before** changes:
Benchmark                                    Mode  Cnt    Score    Error  Units
CharsetCanEncode.asciiCanEncodeCharNo        avgt   30    0.502 ±  0.004  ns/op
CharsetCanEncode.asciiCanEncodeCharYes       avgt   30    0.503 ±  0.003  ns/op
CharsetCanEncode.asciiCanEncodeStringNo      avgt   30  821.635 ±  7.055  ns/op
CharsetCanEncode.asciiCanEncodeStringYes     avgt   30    8.875 ±  0.115  ns/op
CharsetCanEncode.iso88591CanEncodeCharNo     avgt   30    0.508 ±  0.006  ns/op
CharsetCanEncode.iso88591CanEncodeCharYes    avgt   30    0.506 ±  0.004  ns/op
CharsetCanEncode.iso88591CanEncodeStringNo   avgt   30  833.165 ±  7.315  ns/op
CharsetCanEncode.iso88591CanEncodeStringYes  avgt   30   10.357 ±  1.427  ns/op
CharsetCanEncode.iso88592CanEncodeCharNo     avgt   30    0.957 ±  0.009  ns/op
CharsetCanEncode.iso88592CanEncodeCharYes    avgt   30    1.407 ±  0.010  ns/op
CharsetCanEncode.iso88592CanEncodeStringNo   avgt   30  826.478 ±  4.409  ns/op
CharsetCanEncode.iso88592CanEncodeStringYes  avgt   30   13.223 ±  1.479  ns/op
CharsetCanEncode.shiftjisCanEncodeCharNo     avgt   30    1.370 ±  0.012  ns/op
CharsetCanEncode.shiftjisCanEncodeCharYes    avgt   30    1.386 ±  0.010  ns/op
CharsetCanEncode.shiftjisCanEncodeStringNo   avgt   30  850.336 ± 20.107  ns/op
CharsetCanEncode.shiftjisCanEncodeStringYes  avgt   30   10.672 ±  0.088  ns/op
CharsetCanEncode.utf16leCanEncodeCharNo      avgt   30    0.518 ±  0.005  ns/op
CharsetCanEncode.utf16leCanEncodeCharYes     avgt   30    0.517 ±  0.005  ns/op
CharsetCanEncode.utf16leCanEncodeStringNo    avgt   30  857.907 ± 15.492  ns/op
CharsetCanEncode.utf16leCanEncodeStringYes   avgt   30   12.492 ±  1.444  ns/op
CharsetCanEncode.utf8CanEncodeCharNo         avgt   30    0.522 ±  0.008  ns/op
CharsetCanEncode.utf8CanEncodeCharYes        avgt   30    0.518 ±  0.004  ns/op
CharsetCanEncode.utf8CanEncodeStringNo       avgt   30  869.428 ± 11.116  ns/op
CharsetCanEncode.utf8CanEncodeStringYes      avgt   30   19.587 ±  0.190  ns/op
JMH benchmark results **after** changes:
Benchmark                                    Mode  Cnt    Score    Error  Units
CharsetCanEncode.asciiCanEncodeCharNo        avgt   30    0.509 ±  0.004  ns/op
CharsetCanEncode.asciiCanEncodeCharYes       avgt   30    0.504 ±  0.005  ns/op
CharsetCanEncode.asciiCanEncodeStringNo      avgt   30    0.608 ±  0.011  ns/op
CharsetCanEncode.asciiCanEncodeStringYes     avgt   30    0.508 ±  0.006  ns/op
CharsetCanEncode.iso88591CanEncodeCharNo     avgt   30    0.502 ±  0.004  ns/op
CharsetCanEncode.iso88591CanEncodeCharYes    avgt   30    0.502 ±  0.004  ns/op
CharsetCanEncode.iso88591CanEncodeStringNo   avgt   30    0.604 ±  0.004  ns/op
CharsetCanEncode.iso88591CanEncodeStringYes  avgt   30    0.507 ±  0.004  ns/op
CharsetCanEncode.iso88592CanEncodeCharNo     avgt   30    0.952 ±  0.008  ns/op
CharsetCanEncode.iso88592CanEncodeCharYes    avgt   30    1.394 ±  0.009  ns/op
CharsetCanEncode.iso88592CanEncodeStringNo   avgt   30    1.071 ±  0.006  ns/op
CharsetCanEncode.iso88592CanEncodeStringYes  avgt   30    1.375 ±  0.008  ns/op
CharsetCanEncode.shiftjisCanEncodeCharNo     avgt   30    1.360 ±  0.007  ns/op
CharsetCanEncode.shiftjisCanEncodeCharYes    avgt   30    1.381 ±  0.010  ns/op
CharsetCanEncode.shiftjisCanEncodeStringNo   avgt   30    1.581 ±  0.015  ns/op
CharsetCanEncode.shiftjisCanEncodeStringYes  avgt   30    1.388 ±  0.008  ns/op
CharsetCanEncode.utf16leCanEncodeCharNo      avgt   30    0.507 ±  0.004  ns/op
CharsetCanEncode.utf16leCanEncodeCharYes     avgt   30    0.509 ±  0.006  ns/op
CharsetCanEncode.utf16leCanEncodeStringNo    avgt   30   12.177 ±  0.139  ns/op
CharsetCanEncode.utf16leCanEncodeStringYes   avgt   30   10.717 ±  0.098  ns/op
CharsetCanEncode.utf8CanEncodeCharNo         avgt   30    0.511 ±  0.005  ns/op
CharsetCanEncode.utf8CanEncodeCharYes        avgt   30    0.516 ±  0.008  ns/op
CharsetCanEncode.utf8CanEncodeStringNo       avgt   30   24.626 ±  0.226  ns/op
CharsetCanEncode.utf8CanEncodeStringYes      avgt   30   20.593 ±  0.192  ns/op
-------------
Commit messages:
- Optimize CharsetEncoder.canEncode(CharSequence)
Changes: https://git.openjdk.org/jdk/pull/29391/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29391&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8376226
Stats: 246 lines in 6 files changed: 238 ins; 1 del; 7 mod
Patch: https://git.openjdk.org/jdk/pull/29391.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/29391/head:pull/29391
PR: https://git.openjdk.org/jdk/pull/29391