RFR: 8156071: List.of: reduce array copying during creation

Fri Oct 2 19:45:36 UTC 2020

On Thu, 1 Oct 2020 00:13:28 GMT, Stuart Marks <smarks at openjdk.org> wrote:

> Plumb a new internal static factory method that trusts the array passed in, avoiding unnecessary copying. JMH results
> for the benchmark show about 15% higher throughput for the cases that were optimized, namely the fixed-arg overloads
> taking 3 to 10 arguments.
> # VM options: -verbose:gc -XX:+UseParallelGC -Xms4g -Xmx4g --enable-preview -verbose:gc -XX:+UseParallelGC -Xms4g -Xmx4g -Xint
> # Warmup: 5 iterations, 1 s each
> # Measurement: 5 iterations, 2 s each
> 
> WITHOUT varargs optimization:
> 
> Benchmark         Mode  Cnt     Score     Error   Units
> ListArgs.list00  thrpt   15  6019.539 ± 144.040  ops/ms
> ListArgs.list01  thrpt   15  1985.009 ±  40.606  ops/ms
> ListArgs.list02  thrpt   15  1854.812 ±  17.488  ops/ms
> ListArgs.list03  thrpt   15   963.866 ±  10.262  ops/ms
> ListArgs.list04  thrpt   15   908.116 ±   6.278  ops/ms
> ListArgs.list05  thrpt   15   848.607 ±  16.701  ops/ms
> ListArgs.list06  thrpt   15   822.282 ±   8.905  ops/ms
> ListArgs.list07  thrpt   15   780.057 ±  11.214  ops/ms
> ListArgs.list08  thrpt   15   745.295 ±  19.204  ops/ms
> ListArgs.list09  thrpt   15   704.596 ±  14.003  ops/ms
> ListArgs.list10  thrpt   15   696.436 ±   4.914  ops/ms
> ListArgs.list11  thrpt   15   661.908 ±  11.041  ops/ms
> 
> WITH varargs optimization:
> 
> Benchmark         Mode  Cnt     Score    Error   Units
> ListArgs.list00  thrpt   15  6172.298 ± 62.736  ops/ms
> ListArgs.list01  thrpt   15  1987.724 ± 45.468  ops/ms
> ListArgs.list02  thrpt   15  1843.419 ± 10.693  ops/ms
> ListArgs.list03  thrpt   15  1126.946 ± 30.952  ops/ms
> ListArgs.list04  thrpt   15  1050.440 ± 17.859  ops/ms
> ListArgs.list05  thrpt   15   999.275 ± 23.656  ops/ms
> ListArgs.list06  thrpt   15   948.844 ± 19.615  ops/ms
> ListArgs.list07  thrpt   15   897.541 ± 15.531  ops/ms
> ListArgs.list08  thrpt   15   853.359 ± 18.755  ops/ms
> ListArgs.list09  thrpt   15   826.394 ±  8.284  ops/ms
> ListArgs.list10  thrpt   15   779.231 ±  4.104  ops/ms
> ListArgs.list11  thrpt   15   650.888 ±  3.948  ops/ms
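
For readers unfamiliar with the idea, the "trusted array" approach described in the quoted text can be sketched roughly
as follows. This is only an illustration under stated assumptions: `TrustedArrayDemo`, `ArrayBackedList`, `copyOf`, and
`trusted` are hypothetical names made up for this example, not the JDK's internal classes or methods, and the real
change may differ in detail.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for an array-backed immutable list; not the JDK's internal class.
final class TrustedArrayDemo {

    static final class ArrayBackedList<E> extends AbstractList<E> {
        private final E[] elements;

        private ArrayBackedList(E[] elements) {
            this.elements = elements;
        }

        // Copying factory: the caller may keep a reference to the array,
        // so a defensive copy is needed for the list to stay immutable.
        @SafeVarargs
        static <E> List<E> copyOf(E... input) {
            E[] copy = input.clone();
            for (E e : copy) {
                Objects.requireNonNull(e);
            }
            return new ArrayBackedList<>(copy);
        }

        // Trusting factory: adopts the array without copying. Only safe when
        // no other reference to the array survives the call.
        @SafeVarargs
        static <E> List<E> trusted(E... input) {
            for (E e : input) {
                Objects.requireNonNull(e);
            }
            return new ArrayBackedList<>(input);
        }

        @Override
        public E get(int index) {
            return elements[index];
        }

        @Override
        public int size() {
            return elements.length;
        }
    }

    // Fixed-arg overload: the varargs array is created at this call site and
    // never escapes, so the trusting factory can adopt it directly.
    static <E> List<E> of(E e1, E e2, E e3) {
        return ArrayBackedList.trusted(e1, e2, e3);
    }

    public static void main(String[] args) {
        String[] shared = {"a", "b", "c"};
        List<String> copied = ArrayBackedList.copyOf(shared);
        List<String> adopted = ArrayBackedList.trusted(shared);
        shared[0] = "z"; // mutate the caller's array afterwards
        System.out.println(copied.get(0));      // prints "a": the copy is unaffected
        System.out.println(adopted.get(0));     // prints "z": why the trusting path must stay internal
        System.out.println(of("x", "y", "z"));  // prints "[x, y, z]"
    }
}
```

The point of the sketch is that the fixed-arg overloads already allocate a fresh array for the varargs call, so copying
it a second time buys no extra safety.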

After a hint from @cl4es I ran the benchmarks with `-prof gc`. The normalized allocation rate (bytes allocated per
operation) drops by roughly 40% to 44% in the cases where the optimization was applied.

WITHOUT varargs optimization:

ListArgs.list00:·gc.alloc.rate.norm           thrpt    5    ≈ 10⁻⁴               B/op
ListArgs.list01:·gc.alloc.rate.norm           thrpt    5    24.000 ±    0.001    B/op
ListArgs.list02:·gc.alloc.rate.norm           thrpt    5    24.000 ±    0.001    B/op
ListArgs.list03:·gc.alloc.rate.norm           thrpt    5    80.000 ±    0.001    B/op
ListArgs.list04:·gc.alloc.rate.norm           thrpt    5    80.036 ±    0.309    B/op
ListArgs.list05:·gc.alloc.rate.norm           thrpt    5    96.037 ±    0.316    B/op
ListArgs.list06:·gc.alloc.rate.norm           thrpt    5    96.038 ±    0.326    B/op
ListArgs.list07:·gc.alloc.rate.norm           thrpt    5   112.042 ±    0.361    B/op
ListArgs.list08:·gc.alloc.rate.norm           thrpt    5   112.043 ±    0.367    B/op
ListArgs.list09:·gc.alloc.rate.norm           thrpt    5   128.045 ±    0.385    B/op
ListArgs.list10:·gc.alloc.rate.norm           thrpt    5   128.046 ±    0.391    B/op
ListArgs.list11:·gc.alloc.rate.norm           thrpt    5   144.047 ±    0.406    B/op

WITH varargs optimization:

ListArgs.list00:·gc.alloc.rate.norm           thrpt    5    ≈ 10⁻⁴               B/op
ListArgs.list01:·gc.alloc.rate.norm           thrpt    5    24.000 ±    0.001    B/op
ListArgs.list02:·gc.alloc.rate.norm           thrpt    5    24.000 ±    0.001    B/op
ListArgs.list03:·gc.alloc.rate.norm           thrpt    5    48.000 ±    0.001    B/op
ListArgs.list04:·gc.alloc.rate.norm           thrpt    5    48.000 ±    0.001    B/op
ListArgs.list05:·gc.alloc.rate.norm           thrpt    5    56.000 ±    0.001    B/op
ListArgs.list06:·gc.alloc.rate.norm           thrpt    5    56.000 ±    0.001    B/op
ListArgs.list07:·gc.alloc.rate.norm           thrpt    5    64.000 ±    0.001    B/op
ListArgs.list08:·gc.alloc.rate.norm           thrpt    5    64.000 ±    0.001    B/op
ListArgs.list09:·gc.alloc.rate.norm           thrpt    5    72.000 ±    0.001    B/op
ListArgs.list10:·gc.alloc.rate.norm           thrpt    5    72.000 ±    0.001    B/op
ListArgs.list11:·gc.alloc.rate.norm           thrpt    5   144.050 ±    0.427    B/op
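
As a sanity check on these numbers, the bytes saved per operation in the list03 to list10 rows appear to match the size
of exactly one reference array of the corresponding length. The sketch below works through that arithmetic. It assumes
a 64-bit HotSpot layout with compressed oops (16-byte array header, 4-byte references, 8-byte object alignment); that
layout is an assumption on my part and was not measured here. The class and method names are hypothetical.

```java
import java.util.Locale;

final class AllocationCheck {

    // ASSUMPTION: 64-bit HotSpot with compressed oops, i.e. a 16-byte array
    // header, 4-byte references, and objects aligned to 8 bytes.
    static long referenceArrayBytes(int length) {
        long raw = 16 + 4L * length;
        return (raw + 7) / 8 * 8; // round up to the next multiple of 8
    }

    // Relative reduction in bytes allocated per operation, in percent.
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        int[] sizes = {3, 4, 5, 6, 7, 8, 9, 10};
        // gc.alloc.rate.norm scores (B/op) from the two tables, rounded to whole bytes.
        int[] baseline = {80, 80, 96, 96, 112, 112, 128, 128};
        int[] optimized = {48, 48, 56, 56, 64, 64, 72, 72};

        for (int i = 0; i < sizes.length; i++) {
            long saved = baseline[i] - optimized[i];
            System.out.println(String.format(Locale.ROOT,
                    "list%02d: saved %d B/op, one Object[%d] = %d B, reduction %.1f%%",
                    sizes[i], saved, sizes[i],
                    referenceArrayBytes(sizes[i]),
                    reductionPercent(baseline[i], optimized[i])));
        }
    }
}
```

Under those layout assumptions the saved bytes equal one array per call in every optimized row, which is what one would
expect if the only change is that the redundant copy of the varargs array is no longer made.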

-------------

PR: https://git.openjdk.java.net/jdk/pull/449