RFR: 8156071: List.of: reduce array copying during creation

Fri Oct 2 20:41:45 UTC 2020

On Thu, 1 Oct 2020 00:13:28 GMT, Stuart Marks <smarks at openjdk.org> wrote:

> Plumb new internal static factory method to trust the array passed in, avoiding unnecessary copying. JMH results for
> the benchmark show about 15% improvement for the cases that were optimized, namely the 3 to 10 fixed arg cases.
> # VM options: -verbose:gc -XX:+UseParallelGC -Xms4g -Xmx4g --enable-preview -verbose:gc -XX:+UsePara
> llelGC -Xms4g -Xmx4g -Xint
> # Warmup: 5 iterations, 1 s each
> # Measurement: 5 iterations, 2 s each
> 
> WITHOUT varargs optimization:
> 
> Benchmark         Mode  Cnt     Score     Error   Units
> ListArgs.list00  thrpt   15  6019.539 ± 144.040  ops/ms
> ListArgs.list01  thrpt   15  1985.009 ±  40.606  ops/ms
> ListArgs.list02  thrpt   15  1854.812 ±  17.488  ops/ms
> ListArgs.list03  thrpt   15   963.866 ±  10.262  ops/ms
> ListArgs.list04  thrpt   15   908.116 ±   6.278  ops/ms
> ListArgs.list05  thrpt   15   848.607 ±  16.701  ops/ms
> ListArgs.list06  thrpt   15   822.282 ±   8.905  ops/ms
> ListArgs.list07  thrpt   15   780.057 ±  11.214  ops/ms
> ListArgs.list08  thrpt   15   745.295 ±  19.204  ops/ms
> ListArgs.list09  thrpt   15   704.596 ±  14.003  ops/ms
> ListArgs.list10  thrpt   15   696.436 ±   4.914  ops/ms
> ListArgs.list11  thrpt   15   661.908 ±  11.041  ops/ms
> 
> WITH varargs optimization:
> 
> Benchmark         Mode  Cnt     Score    Error   Units
> ListArgs.list00  thrpt   15  6172.298 ± 62.736  ops/ms
> ListArgs.list01  thrpt   15  1987.724 ± 45.468  ops/ms
> ListArgs.list02  thrpt   15  1843.419 ± 10.693  ops/ms
> ListArgs.list03  thrpt   15  1126.946 ± 30.952  ops/ms
> ListArgs.list04  thrpt   15  1050.440 ± 17.859  ops/ms
> ListArgs.list05  thrpt   15   999.275 ± 23.656  ops/ms
> ListArgs.list06  thrpt   15   948.844 ± 19.615  ops/ms
> ListArgs.list07  thrpt   15   897.541 ± 15.531  ops/ms
> ListArgs.list08  thrpt   15   853.359 ± 18.755  ops/ms
> ListArgs.list09  thrpt   15   826.394 ±  8.284  ops/ms
> ListArgs.list10  thrpt   15   779.231 ±  4.104  ops/ms
> ListArgs.list11  thrpt   15   650.888 ±  3.948  ops/ms

Looks good, i wondered why the performance results were so slow then i looked more closely and saw "-Xint" was used. I
usually don't ascribe much value to micro benchmarks run in interpreter only mode, but hey any shaving off startup time
is welcome. Less allocation is definitely welcome (although i do wish C2 was better at eliding redundant array
initialization and allocation).

-------------

Marked as reviewed by psandoz (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/449