Optimization (?) of HashSet(Collection)
Andrew Haley
aph-open at littlepinkcloud.com
Fri Oct 15 08:57:37 UTC 2021
On 10/4/21 12:57, Сергей Цыпанов wrote:
>
> in the code of HashSet(Collection) we have an optimization opportunity:
>
> public HashSet(Collection<? extends E> c) {
>     map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
>     addAll(c);
> }
>
> instead of using addAll(), whose implementation is inherited from j.u.AbstractCollection, we can use c.forEach(this::add):
>
> public HashSet(Collection<? extends E> c) {
>     map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
>     c.forEach(this::add);
> }
>
> This allows us to get rid of the Iterator and, e.g. for ArrayList, use a counted loop instead of hasNext()/next().
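>
> For ArrayList, for example, forEach() boils down to a counted loop over the backing array. Paraphrasing the OpenJDK sources from memory (details vary between releases):
>
> public void forEach(Consumer<? super E> action) {
>     Objects.requireNonNull(action);
>     final int expectedModCount = modCount;
>     final Object[] es = elementData;
>     final int size = this.size;
>     // straight indexed reads of the backing array, no Iterator allocated
>     for (int i = 0; modCount == expectedModCount && i < size; i++)
>         action.accept((E) es[i]);
>     if (modCount != expectedModCount)
>         throw new ConcurrentModificationException();
> }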
Well, perhaps, but this sort of low-level hackery is not IMO something we
should reflexively do in the core class libraries.
> We are also likely to benefit from this change when the argument collection is synchronized, as it's going to be locked only once.
That's a good point, but would a synchronized collection really lock
coarsely around forEach(), rather than in add()?
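
For what it's worth, if I'm reading java.util.Collections right, the wrapper's forEach() does take the mutex once around the whole traversal, while its iterator() returns the backing iterator and leaves locking to the user. Roughly (paraphrased from Collections.SynchronizedCollection):

    public void forEach(Consumer<? super E> consumer) {
        synchronized (mutex) { c.forEach(consumer); } // one lock for the whole loop
    }
    public Iterator<E> iterator() {
        return c.iterator(); // must be manually synched by the user!
    }

So the coarse locking would come from the source collection's forEach(), not from HashSet.add().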
> I've used the benchmark [1] to analyze the impact, and it demonstrates a measurable improvement [2].
>
> What puzzles me a bit is the result for j.u.ArrayList. At warm-up time it demonstrates better results than at measurement time:
>
> length = 10
>
> # Run progress: 4,44% complete, ETA 00:21:41
> # Fork: 3 of 5
> # Warmup Iteration 1: 134,699 ns/op
> # Warmup Iteration 2: 115,391 ns/op
> # Warmup Iteration 3: 130,110 ns/op
> # Warmup Iteration 4: 114,465 ns/op <----
> # Warmup Iteration 5: 114,849 ns/op
> # Warmup Iteration 6: 115,073 ns/op
> # Warmup Iteration 7: 113,988 ns/op
> # Warmup Iteration 8: 115,301 ns/op
> # Warmup Iteration 9: 115,452 ns/op
> # Warmup Iteration 10: 124,493 ns/op <----
> Iteration 1: 123,776 ns/op
> Iteration 2: 123,719 ns/op
> Iteration 3: 123,725 ns/op
> Iteration 4: 126,892 ns/op
> Iteration 5: 125,493 ns/op
> Iteration 6: 124,661 ns/op
> Iteration 7: 123,783 ns/op
> Iteration 8: 123,975 ns/op
> Iteration 9: 124,047 ns/op
> Iteration 10: 123,899 ns/op
>
> length = 100
>
> # Run progress: 11,11% complete, ETA 00:20:10
> # Fork: 1 of 5
> # Warmup Iteration 1: 1236,695 ns/op
> # Warmup Iteration 2: 1069,909 ns/op
> # Warmup Iteration 3: 1243,852 ns/op
> # Warmup Iteration 4: 1059,038 ns/op <----
> # Warmup Iteration 5: 1057,670 ns/op
> # Warmup Iteration 6: 1057,117 ns/op
> # Warmup Iteration 7: 1056,932 ns/op
> # Warmup Iteration 8: 1054,909 ns/op
> # Warmup Iteration 9: 1058,106 ns/op
> # Warmup Iteration 10: 1145,699 ns/op <----
> Iteration 1: 1139,155 ns/op
> Iteration 2: 1141,662 ns/op
> Iteration 3: 1139,061 ns/op
> Iteration 4: 1139,306 ns/op
> Iteration 5: 1138,475 ns/op
> Iteration 6: 1140,491 ns/op
> Iteration 7: 1144,017 ns/op
> Iteration 8: 1139,411 ns/op
> Iteration 9: 1142,729 ns/op
> Iteration 10: 1144,042 ns/op
>
>
> Here I use 1 s warm-up iterations and 2 s measurement iterations. It looks like the code is fully optimized by warm-up iteration 4,
> and then towards the end of warm-up a kind of deoptimization happens. There's still a visible improvement, however.
>
> The benchmark is run with 5 forks, and this deoptimization is visible for length = {10, 100}.
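>
> For reference, these settings correspond to roughly the following JMH setup (a sketch reconstructed from the numbers above; the actual benchmark is at [1], and the names here are made up):
>
> import java.util.ArrayList;
> import java.util.HashSet;
> import java.util.List;
> import java.util.concurrent.TimeUnit;
> import org.openjdk.jmh.annotations.*;
>
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.NANOSECONDS)
> @Warmup(iterations = 10, time = 1)       // 10 x 1 s warm-up iterations
> @Measurement(iterations = 10, time = 2)  // 10 x 2 s measurement iterations
> @Fork(5)
> @State(Scope.Thread)
> public class HashSetFromCollectionBenchmark {
>     @Param({"10", "100"})
>     int length;
>
>     List<Integer> list;
>
>     @Setup
>     public void setup() {
>         list = new ArrayList<>();
>         for (int i = 0; i < length; i++)
>             list.add(i);
>     }
>
>     @Benchmark
>     public HashSet<Integer> hashSetFromCollection() {
>         return new HashSet<>(list);
>     }
> }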
>
> So my two questions are:
> - What is the reason for the deoptimization, and can we do something about it?
C2 does this sort of thing all the time: it's a heuristic probabilistic
optimizer. I've seen plenty of bimodal examples, where inlining changes
and each time a program is run it's either fast or slow, quasi-randomly.
If you really want to know, use -prof perfasm in JMH.
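
Something like this, as a minimal sketch (the benchmark class name is made up, substitute your own; note that perfasm needs Linux perf plus the hsdis disassembler installed):

    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.RunnerException;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    public class PerfAsmRunner {
        public static void main(String[] args) throws RunnerException {
            Options opt = new OptionsBuilder()
                    .include("HashSetFromCollectionBenchmark") // regex over benchmark names
                    .addProfiler("perfasm") // annotate the hottest compiled regions with assembly
                    .forks(1)
                    .build();
            new Runner(opt).run();
        }
    }

That will show you exactly which compiled code each iteration is running, so you can see whether the slow runs really are a different compilation.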
> - Is the suggested change worth implementing?
IMO the gain is too small. Others may disagree.
--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671