RFR: 8261254: Initialize charset mapping data lazily

older
RFR: 8259074: regex benchmarks and...

Claes Redestad

7 Feb 2021 7 Feb '21

7:36 p.m.

This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets. Testing: tier1-3 ------------- Commit messages: - Merge branch 'master' into lazy_charsets - More finals, copyrights - Improve finality - Initialize charset mapping data lazily Changes: https://git.openjdk.java.net/jdk/pull/2449/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2449&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261254 Stats: 274 lines in 21 files changed: 24 ins; 59 del; 191 mod Patch: https://git.openjdk.java.net/jdk/pull/2449.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2449/head:pull/2449 PR: https://git.openjdk.java.net/jdk/pull/2449

Show replies by date

Claes Redestad

7 Feb 7 Feb

7:36 p.m.

On Sun, 7 Feb 2021 19:08:18 GMT, Claes Redestad <redestad@openjdk.org> wrote:

...

This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.

Testing: tier1-3

Applications calling `Charset.availableCharsets()` might see a decent startup improvement (125-190ms -> 57-95ms first call overhead). All `DoubleByte`-derived charsets can see a throughput improvement on benchmarks thanks to removing DCL initialization checks with a Holder pattern implementation, which allows static data to be declared final and be more amenable to optimizations: Before: Benchmark (charsetName) Mode Cnt Score Error Units StringDecode.WithCharset.decodeCharset MS932 avgt 15 280.072 ± 17.217 ns/op StringDecode.WithCharset.decodeCharsetName MS932 avgt 15 289.368 ± 18.174 ns/op After: Benchmark (charsetName) Mode Cnt Score Error Units StringDecode.WithCharset.decodeCharset MS932 avgt 15 254.906 ± 15.095 ns/op StringDecode.WithCharset.decodeCharsetName MS932 avgt 15 268.241 ± 15.725 ns/op ------------- PR: https://git.openjdk.java.net/jdk/pull/2449

Alan Bateman

8 Feb 8 Feb

8:38 a.m.

On Sun, 7 Feb 2021 19:08:18 GMT, Claes Redestad <redestad@openjdk.org> wrote:

...

This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.

Testing: tier1-3

I wouldn't expect enumerating all charsets with Charset::availableCharsets to be too common but moving the data to holder class looks okay. The missing "final" in a few places was an oversight. The replacement of the foreach and method ref in getServicesCatalog with imperative code is disappointment but okay here. ------------- Marked as reviewed by alanb (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2449

Claes Redestad

11:33 a.m.

On Mon, 8 Feb 2021 08:36:21 GMT, Alan Bateman <alanb@openjdk.org> wrote:

...

...
This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.

Testing: tier1-3

I wouldn't expect enumerating all charsets with Charset::availableCharsets to be too common but moving the data to holder class looks okay. The missing "final" in a few places was an oversight. The replacement of the foreach and method ref in getServicesCatalog with imperative code is disappointment but okay here.

I spotted usage of this in a real application. While they could work around it and remove the usage to gain an even larger startup win I figured I should do those that can't do so a favor, too. ------------- PR: https://git.openjdk.java.net/jdk/pull/2449

Сергей Цыпанов

11:44 a.m.

On Sun, 7 Feb 2021 19:08:18 GMT, Claes Redestad <redestad@openjdk.org> wrote:

...

This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.

Testing: tier1-3

src/jdk.charsets/share/classes/sun/nio/cs/ext/AbstractCharsetProvider.java line 75:

...

73: 74: protected AbstractCharsetProvider(String pkgPrefixName) { 75: packagePrefix = pkgPrefixName.concat(".");

Hm, I wonder why not just `pkgPrefixName + '.'` here and below? ------------- PR: https://git.openjdk.java.net/jdk/pull/2449

Claes Redestad

12:18 p.m.

On Mon, 8 Feb 2021 11:42:23 GMT, Сергей Цыпанов <github.com+10835776+stsypanov@openjdk.org> wrote:

...

...
This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.

Testing: tier1-3

src/jdk.charsets/share/classes/sun/nio/cs/ext/AbstractCharsetProvider.java line 75:

...
73: 74: protected AbstractCharsetProvider(String pkgPrefixName) { 75: packagePrefix = pkgPrefixName.concat(".");

Hm, I wonder why not just `pkgPrefixName + '.'` here and below? Is it something about early init of `StringConcatFactory`?

Yes, I wanted to measure the overhead of `Charset` class initialization done by `Charset.availableCharsets()` and the `StringConcatFactory` bootstraps was a reasonable chunk of the cost so I moved them out of the picture. I didn't mind what I ended up with, but if you prefer I can move back to `pkgPrefixName + '.'` here. ------------- PR: https://git.openjdk.java.net/jdk/pull/2449

Сергей Цыпанов

12:50 p.m.

On Mon, 8 Feb 2021 12:16:23 GMT, Claes Redestad <redestad@openjdk.org> wrote:

...

...
src/jdk.charsets/share/classes/sun/nio/cs/ext/AbstractCharsetProvider.java line 75:

...
73: 74: protected AbstractCharsetProvider(String pkgPrefixName) { 75: packagePrefix = pkgPrefixName.concat(".");

Hm, I wonder why not just `pkgPrefixName + '.'` here and below? Is it something about early init of `StringConcatFactory`?

Yes, I wanted to measure the overhead of `Charset` class initialization done by `Charset.availableCharsets()` and the `StringConcatFactory` bootstraps was a reasonable chunk of the cost so I moved them out of the picture. I didn't mind what I ended up with, but if you prefer I can move back to `pkgPrefixName + '.'` here.

Let's keep it as is, to me it's a readable as concatenation via `+` ------------- PR: https://git.openjdk.java.net/jdk/pull/2449

Johannes Kuhn

12:18 p.m.

On Sun, 7 Feb 2021 19:08:18 GMT, Claes Redestad <redestad@openjdk.org> wrote:

...

This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.

Testing: tier1-3

Looks good to me. src/java.base/share/classes/java/lang/ModuleLayer.java line 936:

...

934: for (Module m : nameToModule.values()) { 935: servicesCatalog.register(m); 936: }

Seems to be unrelated, but it's not a bad change. ------------- Marked as reviewed by jkuhn (Author). PR: https://git.openjdk.java.net/jdk/pull/2449

Claes Redestad

12:21 p.m.

On Sun, 7 Feb 2021 19:44:41 GMT, Johannes Kuhn <jkuhn@openjdk.org> wrote:

...

...
This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.

Testing: tier1-3

src/java.base/share/classes/java/lang/ModuleLayer.java line 936:

...
934: for (Module m : nameToModule.values()) { 935: servicesCatalog.register(m); 936: }

Seems to be unrelated, but it's not a bad change.

As with the `StringConcatFactory` changes @stsypanov commented on, I did this to reduce noise when zooming in on the cost of `Charset` class initialization stemming from `Charset.availableCharsets()`. I ended up preferring the desugared version, so I left it in. ------------- PR: https://git.openjdk.java.net/jdk/pull/2449

Naoto Sato

5:42 p.m.

On Sun, 7 Feb 2021 19:08:18 GMT, Claes Redestad <redestad@openjdk.org> wrote:

...

This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.

Testing: tier1-3

Looks good to me. I initially thought of the same impression as Alan's, i.e., enumerating all charsets is not that common (in fact, same applies to Locale, and I once explored usages and it ended all up in test cases), but if real apps are using it, this is the right way to do so. ------------- Marked as reviewed by naoto (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2449

Claes Redestad

6:48 p.m.

On Mon, 8 Feb 2021 17:40:00 GMT, Naoto Sato <naoto@openjdk.org> wrote:

...

...
This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.

Testing: tier1-3

Looks good to me. I initially thought of the same impression as Alan's, i.e., enumerating all charsets is not that common (in fact, same applies to Locale, and I once explored usages and it ended all up in test cases), but if real apps are using it, this is the right way to do so.

Thanks for reviewing! ------------- PR: https://git.openjdk.java.net/jdk/pull/2449

Claes Redestad

6:48 p.m.

New subject: Integrated: 8261254: Initialize charset mapping data lazily

On Sun, 7 Feb 2021 19:08:18 GMT, Claes Redestad <redestad@openjdk.org> wrote:

...

This patch refactor JDK internal charsets to initialize charset mapping data lazily when needed via holder classes. This means both a startup improvement in some cases, and possible throughput improvements for all DoubleByte-based Charsets.

Testing: tier1-3

This pull request has now been integrated. Changeset: 92c6e6df Author: Claes Redestad <redestad@openjdk.org> URL: https://git.openjdk.java.net/jdk/commit/92c6e6df Stats: 274 lines in 21 files changed: 24 ins; 59 del; 191 mod 8261254: Initialize charset mapping data lazily Reviewed-by: alanb, jkuhn, naoto ------------- PR: https://git.openjdk.java.net/jdk/pull/2449

1856

Age (days ago)

1857

Last active (days ago)

List overview

Download

11 comments

5 participants

participants (5)

Alan Bateman
Claes Redestad
Johannes Kuhn
Naoto Sato
Сергей Цыпанов