Optimization (?) of HashSet(Collection)

Mon Oct 4 11:57:17 UTC 2021

Hello,

in the code of HashSet(Collection) we have an optimization opportunity:

public HashSet(Collection<? extends E> c) {
  map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
  addAll(c);
}

instead of using addAll() inherited from j.u.Collection we can use c.forEach(this::add):

public HashSet(Collection<? extends E> c) {
  map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
  c.forEach(this::add);
}

This allows to rid Iterator and use counting loops for e.g. ArrayList instead of hasNext()/next().

To me the most common use cases of HashSet(Collection) are those:
- ArrayList/Arrays.asList() is passed to convert List->Set or/and excluded duplicates
- HashSet is passed to have a copy of original Set

We are also likely to benefit from this change in case the argument collection is synchronized as it's going to be locked only once.

I've used the benchmark [1] to analyze the impact and it demonstrates measurable improvement [2].

What puzzles we a bit is the result for j.u.ArrayList. At warm-up time it demonstrates better results than at measurement time:

 length = 10

# Run progress: 4,44% complete, ETA 00:21:41
# Fork: 3 of 5
# Warmup Iteration   1: 134,699 ns/op
# Warmup Iteration   2: 115,391 ns/op
# Warmup Iteration   3: 130,110 ns/op
# Warmup Iteration   4: 114,465 ns/op  <----
# Warmup Iteration   5: 114,849 ns/op
# Warmup Iteration   6: 115,073 ns/op
# Warmup Iteration   7: 113,988 ns/op
# Warmup Iteration   8: 115,301 ns/op
# Warmup Iteration   9: 115,452 ns/op
# Warmup Iteration  10: 124,493 ns/op  <----
Iteration   1: 123,776 ns/op
Iteration   2: 123,719 ns/op
Iteration   3: 123,725 ns/op
Iteration   4: 126,892 ns/op
Iteration   5: 125,493 ns/op
Iteration   6: 124,661 ns/op
Iteration   7: 123,783 ns/op
Iteration   8: 123,975 ns/op
Iteration   9: 124,047 ns/op
Iteration  10: 123,899 ns/op

length = 100

# Run progress: 11,11% complete, ETA 00:20:10
# Fork: 1 of 5
# Warmup Iteration   1: 1236,695 ns/op
# Warmup Iteration   2: 1069,909 ns/op
# Warmup Iteration   3: 1243,852 ns/op
# Warmup Iteration   4: 1059,038 ns/op  <----
# Warmup Iteration   5: 1057,670 ns/op
# Warmup Iteration   6: 1057,117 ns/op
# Warmup Iteration   7: 1056,932 ns/op
# Warmup Iteration   8: 1054,909 ns/op
# Warmup Iteration   9: 1058,106 ns/op
# Warmup Iteration  10: 1145,699 ns/op  <----
Iteration   1: 1139,155 ns/op
Iteration   2: 1141,662 ns/op
Iteration   3: 1139,061 ns/op
Iteration   4: 1139,306 ns/op
Iteration   5: 1138,475 ns/op
Iteration   6: 1140,491 ns/op
Iteration   7: 1144,017 ns/op
Iteration   8: 1139,411 ns/op
Iteration   9: 1142,729 ns/op
Iteration  10: 1144,042 ns/op

Here I use 1 s warm-up iteration on 2s measurement iteration. It looks like at iteration 4 the code is top-optimized,
and later at the end of warm-up a kind of deoptimization happens. There's still visible improvement however.

The benchmark is run with 5 forks, and this deoptimization is visible for length = {10, 100}.

So my two question are:
- What is the reason of the deoptimization and can we do something about it?
- Whether suggested changes is worth implementations?

Regards,
Sergey Tsypanov

1.

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"})
public class HashSetBenchmark {

  @Benchmark
  public HashSet<Integer> fromArrayList(Data data) {
    return new HashSet<>(data.arrayList);
  }

  @Benchmark
  public HashSet<Integer> fromHashSet(Data data) {
    return new HashSet<>(data.hashSet);
  }

  @Benchmark
  public HashSet<Integer> fromArraysAsList(Data data) {
    return new HashSet<>(data.arraysAsList);
  }

  @State(Scope.Thread)
  public static class Data {
    private ArrayList<Integer> arrayList;
    private HashSet<Integer> hashSet;
    private List<Integer> arraysAsList;

    @Param({"10", "100", "1000"})
    private int length;

    @Setup
    public void setup() throws IOException {
      var array = new Integer[length];
      for (int i = 0; i < length; i++) {
        array[i] = i;
      }
      arraysAsList = Arrays.asList(array);
      arrayList = new ArrayList<>(arraysAsList);
      hashSet = new HashSet<>(arraysAsList);
    }
  }
}

2.

baseline

Benchmark                          (length)  Mode  Cnt      Score     Error  Units
HashSetBenchmark.fromArrayList           10  avgt   50    127,838 ±   1,822  ns/op
HashSetBenchmark.fromArrayList          100  avgt   50   1209,543 ±   8,435  ns/op
HashSetBenchmark.fromArrayList         1000  avgt   50  11915,334 ± 155,908  ns/op
HashSetBenchmark.fromArraysAsList        10  avgt   50    130,049 ±   1,661  ns/op
HashSetBenchmark.fromArraysAsList       100  avgt   50   1138,407 ±  19,998  ns/op
HashSetBenchmark.fromArraysAsList      1000  avgt   50  11345,099 ±  83,086  ns/op
HashSetBenchmark.fromHashSet             10  avgt   50    184,559 ±   1,387  ns/op
HashSetBenchmark.fromHashSet            100  avgt   50   2001,577 ±  42,321  ns/op
HashSetBenchmark.fromHashSet           1000  avgt   50  17535,017 ± 199,892  ns/op

patched

Benchmark                          (length)  Mode  Cnt      Score     Error  Units
HashSetBenchmark.fromArrayList           10  avgt   50    125,075 ±   0,907  ns/op
HashSetBenchmark.fromArrayList          100  avgt   50   1127,378 ±  19,368  ns/op
HashSetBenchmark.fromArrayList         1000  avgt   50  11455,172 ±  61,543  ns/op
HashSetBenchmark.fromArraysAsList        10  avgt   50    117,634 ±   3,641  ns/op
HashSetBenchmark.fromArraysAsList       100  avgt   50   1034,538 ±  11,958  ns/op
HashSetBenchmark.fromArraysAsList      1000  avgt   50  10786,914 ± 137,330  ns/op
HashSetBenchmark.fromHashSet             10  avgt   50    173,727 ±   2,177  ns/op
HashSetBenchmark.fromHashSet            100  avgt   50   1748,858 ±  12,535  ns/op
HashSetBenchmark.fromHashSet           1000  avgt   50  16474,208 ±  97,578  ns/op