Integrated: 8376226: CharsetEncoder.canEncode(CharSequence) is much slower than necessary
Daniel Gredler
dgredler at openjdk.org
Tue Jan 27 13:23:50 UTC 2026
On Fri, 23 Jan 2026 18:41:37 GMT, Daniel Gredler <dgredler at openjdk.org> wrote:
> Subclasses of `CharsetEncoder` often override `canEncode(char)` in order to make it very fast. This is not the case for `canEncode(CharSequence)`, which in most cases currently has to run the full encoding process. As a result, `canEncode(CharSequence)` is about 20x slower than `canEncode(char)` when the input is encodable, and about 1600x slower than `canEncode(char)` when the input is not encodable.
>
> The un-encodable case is even slower because the internal logic relies on a thrown exception to detect that the input cannot be encoded, which incurs the cost of stack trace setup, etc.
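
To illustrate the difference between the two styles of flow control, here is a minimal sketch that uses only the public `java.nio.charset` API. It is not the JDK's internal code: the class and method names (`CanEncodeFlowControl`, `viaException`, `viaCoderResult`) and the 64-byte scratch buffer are assumptions made for the example. The first method detects un-encodable input by catching an exception; the second inspects the returned `CoderResult`, so no exception is ever constructed.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the JDK or of the PR.
class CanEncodeFlowControl {

    // Exception-based check: un-encodable input surfaces as a thrown
    // CharacterCodingException (new encoders default to REPORT).
    static boolean viaException(CharsetEncoder enc, CharSequence cs) {
        enc.reset();
        try {
            enc.encode(CharBuffer.wrap(cs));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Result-based check: the lower-level encode method reports errors
    // through its CoderResult return value instead of throwing.
    static boolean viaCoderResult(CharsetEncoder enc, CharSequence cs) {
        enc.reset();
        CharBuffer in = CharBuffer.wrap(cs);
        ByteBuffer out = ByteBuffer.allocate(64); // scratch space, output is discarded
        while (true) {
            CoderResult cr = enc.encode(in, out, true);
            if (cr.isError()) {
                return false;           // malformed or unmappable input
            }
            if (cr.isUnderflow()) {
                break;                  // all input consumed
            }
            out.clear();                // overflow: discard output and continue
        }
        while (true) {
            CoderResult cr = enc.flush(out);
            if (cr.isError()) {
                return false;
            }
            if (cr.isUnderflow()) {
                return true;            // flush complete
            }
            out.clear();
        }
    }

    public static void main(String[] args) {
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        System.out.println(viaException(ascii, "hello"));
        System.out.println(viaCoderResult(ascii, "h\u00e9llo"));
    }
}
```

Both methods should give the same answer for any input; the second simply avoids paying for exception construction on the failure path.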
>
> This PR overrides `canEncode(CharSequence)` to simply check `canEncode(char)` on each character in the sequence when the encoding allows this (ASCII, ISO-8859-1, etc). Where this is not possible (e.g. UTF-8, UTF-16) this PR removes the exception-based flow control in `CharsetEncoder` so that the un-encodable scenario is at least improved.
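
The per-character approach described above can be sketched roughly as follows. This is an illustration written against the public API, not the code from the PR: the class name `PerCharCanEncode` and the helper `canEncodeAll` are hypothetical, and the loop is only correct for encodings in which every `char` can be judged independently of its neighbours.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the JDK or of the PR.
class PerCharCanEncode {

    // Returns true only if every char in the sequence is individually
    // encodable. Assumes an encoding with no multi-char units (for example
    // US-ASCII or ISO-8859-1), so that checking chars one at a time is valid.
    static boolean canEncodeAll(CharsetEncoder enc, CharSequence cs) {
        for (int i = 0; i < cs.length(); i++) {
            if (!enc.canEncode(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        System.out.println(canEncodeAll(latin1, "h\u00e9llo")); // e-acute is in ISO-8859-1
        System.out.println(canEncodeAll(latin1, "h\u20acllo")); // euro sign is not
    }
}
```

The same loop would give a wrong answer for UTF-8 or UTF-16: a supplementary character is represented as a surrogate pair, and each half on its own is not encodable even though the pair is. That is why the shortcut is limited to encodings that allow it.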
>
> Regression tests run locally:
> - `make test TEST="jtreg:test/jdk/java/nio/charset"`
> - `make test TEST="jtreg:test/jdk/sun/nio/cs"`
>
> The included benchmark can be run via `make test TEST="micro:java.nio.CharsetCanEncode"`.
>
> JMH benchmark results **before** changes:
>
>
> Benchmark                                     Mode  Cnt     Score    Error  Units
> CharsetCanEncode.asciiCanEncodeCharNo         avgt   30     0.502 ±  0.004  ns/op
> CharsetCanEncode.asciiCanEncodeCharYes        avgt   30     0.503 ±  0.003  ns/op
> CharsetCanEncode.asciiCanEncodeStringNo       avgt   30   821.635 ±  7.055  ns/op
> CharsetCanEncode.asciiCanEncodeStringYes      avgt   30     8.875 ±  0.115  ns/op
> CharsetCanEncode.iso88591CanEncodeCharNo      avgt   30     0.508 ±  0.006  ns/op
> CharsetCanEncode.iso88591CanEncodeCharYes     avgt   30     0.506 ±  0.004  ns/op
> CharsetCanEncode.iso88591CanEncodeStringNo    avgt   30   833.165 ±  7.315  ns/op
> CharsetCanEncode.iso88591CanEncodeStringYes   avgt   30    10.357 ±  1.427  ns/op
> CharsetCanEncode.iso88592CanEncodeCharNo      avgt   30     0.957 ±  0.009  ns/op
> CharsetCanEncode.iso88592CanEncodeCharYes     avgt   30     1.407 ±  0.010  ns/op
> CharsetCanEncode.iso88592CanEncodeStringNo    avgt   30   826.478 ±  4.409  ns/op
> CharsetCanEncode.iso88592CanEncodeStringYes   avgt   30    13.223 ±  1.479  ns/op
> CharsetCanEncode.shiftjisCanEncodeCharNo      avgt   30     1.370 ±  0.012  ns/op
> CharsetCanEncode.shiftjisCanEncodeCharYes     avgt   30     1.386 ±  0.010  ns/op
> CharsetCanEncode.shiftjisCanEncodeStringNo    avgt   30   850.336 ± 20.107  ns/op
> C...
This pull request has now been integrated.
Changeset: 992a8ef4
Author: Daniel Gredler <dgredler at openjdk.org>
URL: https://git.openjdk.org/jdk/commit/992a8ef46bc0a06c70fd5f4f307dbd20e402ed33
Stats: 246 lines in 6 files changed: 238 ins; 1 del; 7 mod
8376226: CharsetEncoder.canEncode(CharSequence) is much slower than necessary
Reviewed-by: alanb, naoto
-------------
PR: https://git.openjdk.org/jdk/pull/29391