RFR: 8261254: Initialize charset mapping data lazily
This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets. Testing: tier1-3 ------------- Commit messages: - Merge branch 'master' into lazy_charsets - More finals, copyrights - Improve finality - Initialize charset mapping data lazily Changes: https://git.openjdk.java.net/jdk/pull/2449/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2449&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261254 Stats: 274 lines in 21 files changed: 24 ins; 59 del; 191 mod Patch: https://git.openjdk.java.net/jdk/pull/2449.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2449/head:pull/2449 PR: https://git.openjdk.java.net/jdk/pull/2449
On Sun, 7 Feb 2021 19:08:18 GMT, Claes Redestad <redestad@openjdk.org> wrote:
This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.
Testing: tier1-3
Applications calling `Charset.availableCharsets()` might see a decent startup improvement (125-190ms -> 57-95ms first call overhead). All `DoubleByte`-derived charsets can see a throughput improvement on benchmarks thanks to removing DCL initialization checks with a Holder pattern implementation, which allows static data to be declared final and be more amenable to optimizations: Before: Benchmark (charsetName) Mode Cnt Score Error Units StringDecode.WithCharset.decodeCharset MS932 avgt 15 280.072 ± 17.217 ns/op StringDecode.WithCharset.decodeCharsetName MS932 avgt 15 289.368 ± 18.174 ns/op After: Benchmark (charsetName) Mode Cnt Score Error Units StringDecode.WithCharset.decodeCharset MS932 avgt 15 254.906 ± 15.095 ns/op StringDecode.WithCharset.decodeCharsetName MS932 avgt 15 268.241 ± 15.725 ns/op ------------- PR: https://git.openjdk.java.net/jdk/pull/2449
On Sun, 7 Feb 2021 19:08:18 GMT, Claes Redestad <redestad@openjdk.org> wrote:
This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.
Testing: tier1-3
I wouldn't expect enumerating all charsets with Charset::availableCharsets to be too common but moving the data to holder class looks okay. The missing "final" in a few places was an oversight. The replacement of the foreach and method ref in getServicesCatalog with imperative code is disappointment but okay here. ------------- Marked as reviewed by alanb (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2449
On Mon, 8 Feb 2021 08:36:21 GMT, Alan Bateman <alanb@openjdk.org> wrote:
This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.
Testing: tier1-3
I wouldn't expect enumerating all charsets with Charset::availableCharsets to be too common but moving the data to holder class looks okay. The missing "final" in a few places was an oversight. The replacement of the foreach and method ref in getServicesCatalog with imperative code is disappointment but okay here.
I spotted usage of this in a real application. While they could work around it and remove the usage to gain an even larger startup win I figured I should do those that can't do so a favor, too. ------------- PR: https://git.openjdk.java.net/jdk/pull/2449
On Sun, 7 Feb 2021 19:08:18 GMT, Claes Redestad <redestad@openjdk.org> wrote:
This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.
Testing: tier1-3
src/jdk.charsets/share/classes/sun/nio/cs/ext/AbstractCharsetProvider.java line 75:
73: 74: protected AbstractCharsetProvider(String pkgPrefixName) { 75: packagePrefix = pkgPrefixName.concat(".");
Hm, I wonder why not just `pkgPrefixName + '.'` here and below? ------------- PR: https://git.openjdk.java.net/jdk/pull/2449
On Mon, 8 Feb 2021 11:42:23 GMT, Сергей Цыпанов <github.com+10835776+stsypanov@openjdk.org> wrote:
This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.
Testing: tier1-3
src/jdk.charsets/share/classes/sun/nio/cs/ext/AbstractCharsetProvider.java line 75:
73: 74: protected AbstractCharsetProvider(String pkgPrefixName) { 75: packagePrefix = pkgPrefixName.concat(".");
Hm, I wonder why not just `pkgPrefixName + '.'` here and below? Is it something about early init of `StringConcatFactory`?
Yes, I wanted to measure the overhead of `Charset` class initialization done by `Charset.availableCharsets()` and the `StringConcatFactory` bootstraps was a reasonable chunk of the cost so I moved them out of the picture. I didn't mind what I ended up with, but if you prefer I can move back to `pkgPrefixName + '.'` here. ------------- PR: https://git.openjdk.java.net/jdk/pull/2449
On Mon, 8 Feb 2021 12:16:23 GMT, Claes Redestad <redestad@openjdk.org> wrote:
src/jdk.charsets/share/classes/sun/nio/cs/ext/AbstractCharsetProvider.java line 75:
73: 74: protected AbstractCharsetProvider(String pkgPrefixName) { 75: packagePrefix = pkgPrefixName.concat(".");
Hm, I wonder why not just `pkgPrefixName + '.'` here and below? Is it something about early init of `StringConcatFactory`?
Yes, I wanted to measure the overhead of `Charset` class initialization done by `Charset.availableCharsets()` and the `StringConcatFactory` bootstraps was a reasonable chunk of the cost so I moved them out of the picture. I didn't mind what I ended up with, but if you prefer I can move back to `pkgPrefixName + '.'` here.
Let's keep it as is, to me it's a readable as concatenation via `+` ------------- PR: https://git.openjdk.java.net/jdk/pull/2449
On Sun, 7 Feb 2021 19:08:18 GMT, Claes Redestad <redestad@openjdk.org> wrote:
This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.
Testing: tier1-3
Looks good to me. src/java.base/share/classes/java/lang/ModuleLayer.java line 936:
934: for (Module m : nameToModule.values()) { 935: servicesCatalog.register(m); 936: }
Seems to be unrelated, but it's not a bad change. ------------- Marked as reviewed by jkuhn (Author). PR: https://git.openjdk.java.net/jdk/pull/2449
On Sun, 7 Feb 2021 19:44:41 GMT, Johannes Kuhn <jkuhn@openjdk.org> wrote:
This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.
Testing: tier1-3
src/java.base/share/classes/java/lang/ModuleLayer.java line 936:
934: for (Module m : nameToModule.values()) { 935: servicesCatalog.register(m); 936: }
Seems to be unrelated, but it's not a bad change.
As with the `StringConcatFactory` changes @stsypanov commented on, I did this to reduce noise when zooming in on the cost of `Charset` class initialization stemming from `Charset.availableCharsets()`. I ended up preferring the desugared version, so I left it in. ------------- PR: https://git.openjdk.java.net/jdk/pull/2449
On Sun, 7 Feb 2021 19:08:18 GMT, Claes Redestad <redestad@openjdk.org> wrote:
This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.
Testing: tier1-3
Looks good to me. I initially thought of the same impression as Alan's, i.e., enumerating all charsets is not that common (in fact, same applies to Locale, and I once explored usages and it ended all up in test cases), but if real apps are using it, this is the right way to do so. ------------- Marked as reviewed by naoto (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2449
On Mon, 8 Feb 2021 17:40:00 GMT, Naoto Sato <naoto@openjdk.org> wrote:
This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.
Testing: tier1-3
Looks good to me. I initially thought of the same impression as Alan's, i.e., enumerating all charsets is not that common (in fact, same applies to Locale, and I once explored usages and it ended all up in test cases), but if real apps are using it, this is the right way to do so.
Thanks for reviewing! ------------- PR: https://git.openjdk.java.net/jdk/pull/2449
On Sun, 7 Feb 2021 19:08:18 GMT, Claes Redestad <redestad@openjdk.org> wrote:
This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.
Testing: tier1-3
This pull request has now been integrated. Changeset: 92c6e6df Author: Claes Redestad <redestad@openjdk.org> URL: https://git.openjdk.java.net/jdk/commit/92c6e6df Stats: 274 lines in 21 files changed: 24 ins; 59 del; 191 mod 8261254: Initialize charset mapping data lazily Reviewed-by: alanb, jkuhn, naoto ------------- PR: https://git.openjdk.java.net/jdk/pull/2449
participants (5)
-
Alan Bateman
-
Claes Redestad
-
Johannes Kuhn
-
Naoto Sato
-
Сергей Цыпанов