Optimization (?) of HashSet(Collection)
Andrew Haley
aph-open at littlepinkcloud.com
Fri Oct 15 08:57:37 UTC 2021
On 10/4/21 12:57, Сергей Цыпанов wrote:
>
> in the code of HashSet(Collection) we have an optimization opportunity:
>
> public HashSet(Collection<? extends E> c) {
>     map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
>     addAll(c);
> }
>
> instead of using addAll(), whose implementation is inherited from j.u.AbstractCollection, we can use c.forEach(this::add):
>
> public HashSet(Collection<? extends E> c) {
>     map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
>     c.forEach(this::add);
> }
>
> This allows us to get rid of the Iterator and, e.g. for ArrayList, use a counted loop instead of hasNext()/next().
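>
> For ArrayList, for example, forEach() boils down to a counted loop over the backing array. Paraphrasing the OpenJDK sources from memory (details vary between releases):
>
> public void forEach(Consumer<? super E> action) {
>     Objects.requireNonNull(action);
>     final int expectedModCount = modCount;
>     final Object[] es = elementData;
>     final int size = this.size;
>     // straight indexed reads of the backing array, no Iterator allocated
>     for (int i = 0; modCount == expectedModCount && i < size; i++)
>         action.accept((E) es[i]);
>     if (modCount != expectedModCount)
>         throw new ConcurrentModificationException();
> }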
Well, perhaps, but this sort of low-level hackery is not IMO something we
should reflexively do in the core class libraries.
> We are also likely to benefit from this change when the argument collection is synchronized, as it's going to be locked only once.
That's a good point, but would a synchronized collection really lock
coarsely around forEach(), rather than in add()?
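
For what it's worth, if I'm reading java.util.Collections right, the wrapper's forEach() does take the mutex once around the whole traversal, while its iterator() returns the backing iterator and leaves locking to the user. Roughly (paraphrased from Collections.SynchronizedCollection):

    public void forEach(Consumer<? super E> consumer) {
        synchronized (mutex) { c.forEach(consumer); } // one lock for the whole loop
    }
    public Iterator<E> iterator() {
        return c.iterator(); // must be manually synched by the user!
    }

So the coarse locking would come from the source collection's forEach(), not from HashSet.add().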
> I've used the benchmark [1] to analyze the impact, and it demonstrates a measurable improvement [2].
>
> What puzzles me a bit is the result for j.u.ArrayList. At warm-up time it demonstrates better results than at measurement time:
>
> length = 10
>
> # Run progress: 4,44% complete, ETA 00:21:41
> # Fork: 3 of 5
> # Warmup Iteration 1: 134,699 ns/op
> # Warmup Iteration 2: 115,391 ns/op
> # Warmup Iteration 3: 130,110 ns/op
> # Warmup Iteration 4: 114,465 ns/op <----
> # Warmup Iteration 5: 114,849 ns/op
> # Warmup Iteration 6: 115,073 ns/op
> # Warmup Iteration 7: 113,988 ns/op
> # Warmup Iteration 8: 115,301 ns/op
> # Warmup Iteration 9: 115,452 ns/op
> # Warmup Iteration 10: 124,493 ns/op <----
> Iteration 1: 123,776 ns/op
> Iteration 2: 123,719 ns/op
> Iteration 3: 123,725 ns/op
> Iteration 4: 126,892 ns/op
> Iteration 5: 125,493 ns/op
> Iteration 6: 124,661 ns/op
> Iteration 7: 123,783 ns/op
> Iteration 8: 123,975 ns/op
> Iteration 9: 124,047 ns/op
> Iteration 10: 123,899 ns/op
>
> length = 100
>
> # Run progress: 11,11% complete, ETA 00:20:10
> # Fork: 1 of 5
> # Warmup Iteration 1: 1236,695 ns/op
> # Warmup Iteration 2: 1069,909 ns/op
> # Warmup Iteration 3: 1243,852 ns/op
> # Warmup Iteration 4: 1059,038 ns/op <----
> # Warmup Iteration 5: 1057,670 ns/op
> # Warmup Iteration 6: 1057,117 ns/op
> # Warmup Iteration 7: 1056,932 ns/op
> # Warmup Iteration 8: 1054,909 ns/op
> # Warmup Iteration 9: 1058,106 ns/op
> # Warmup Iteration 10: 1145,699 ns/op <----
> Iteration 1: 1139,155 ns/op
> Iteration 2: 1141,662 ns/op
> Iteration 3: 1139,061 ns/op
> Iteration 4: 1139,306 ns/op
> Iteration 5: 1138,475 ns/op
> Iteration 6: 1140,491 ns/op
> Iteration 7: 1144,017 ns/op
> Iteration 8: 1139,411 ns/op
> Iteration 9: 1142,729 ns/op
> Iteration 10: 1144,042 ns/op
>
>
> Here I use 1 s warm-up iterations and 2 s measurement iterations. It looks like the code is fully optimized by warm-up iteration 4,
> and then towards the end of warm-up a kind of deoptimization happens. There's still a visible improvement, however.
>
> The benchmark is run with 5 forks, and this deoptimization is visible for length = {10, 100}.
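>
> For reference, these settings correspond to roughly the following JMH setup (a sketch reconstructed from the numbers above; the actual benchmark is at [1], and the names here are made up):
>
> import java.util.ArrayList;
> import java.util.HashSet;
> import java.util.List;
> import java.util.concurrent.TimeUnit;
> import org.openjdk.jmh.annotations.*;
>
> @BenchmarkMode(Mode.AverageTime)
> @OutputTimeUnit(TimeUnit.NANOSECONDS)
> @Warmup(iterations = 10, time = 1)       // 10 x 1 s warm-up iterations
> @Measurement(iterations = 10, time = 2)  // 10 x 2 s measurement iterations
> @Fork(5)
> @State(Scope.Thread)
> public class HashSetFromCollectionBenchmark {
>     @Param({"10", "100"})
>     int length;
>
>     List<Integer> list;
>
>     @Setup
>     public void setup() {
>         list = new ArrayList<>();
>         for (int i = 0; i < length; i++)
>             list.add(i);
>     }
>
>     @Benchmark
>     public HashSet<Integer> hashSetFromCollection() {
>         return new HashSet<>(list);
>     }
> }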
>
> So my two questions are:
> - What is the reason for the deoptimization, and can we do something about it?
C2 does this sort of thing all the time: it's a heuristic probabilistic
optimizer. I've seen plenty of bimodal examples, where inlining changes
and each time a program is run it's either fast or slow, quasi-randomly.
If you really want to know, use -prof perfasm in JMH.
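
Something like this, as a minimal sketch (the benchmark class name is made up, substitute your own; note that perfasm needs Linux perf plus the hsdis disassembler installed):

    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.RunnerException;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    public class PerfAsmRunner {
        public static void main(String[] args) throws RunnerException {
            Options opt = new OptionsBuilder()
                    .include("HashSetFromCollectionBenchmark") // regex over benchmark names
                    .addProfiler("perfasm") // annotate the hottest compiled regions with assembly
                    .forks(1)
                    .build();
            new Runner(opt).run();
        }
    }

That will show you exactly which compiled code each iteration is running, so you can see whether the slow runs really are a different compilation.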
> - Is the suggested change worth implementing?
IMO the gain is too small. Others may disagree.
--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671