RFR: 8156071: List.of: reduce array copying during creation

Tue Oct 6 03:14:41 UTC 2020

On Fri, 2 Oct 2020 20:38:40 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:

>> Plumb new internal static factory method to trust the array passed in, avoiding unnecessary copying. JMH results for
>> the benchmark show about 15% improvement for the cases that were optimized, namely the 3 to 10 fixed arg cases.
>> # VM options: -verbose:gc -XX:+UseParallelGC -Xms4g -Xmx4g --enable-preview -verbose:gc -XX:+UseParallelGC -Xms4g -Xmx4g -Xint
>> # Warmup: 5 iterations, 1 s each
>> # Measurement: 5 iterations, 2 s each
>> 
>> WITHOUT varargs optimization:
>> 
>> Benchmark         Mode  Cnt     Score     Error   Units
>> ListArgs.list00  thrpt   15  6019.539 ± 144.040  ops/ms
>> ListArgs.list01  thrpt   15  1985.009 ±  40.606  ops/ms
>> ListArgs.list02  thrpt   15  1854.812 ±  17.488  ops/ms
>> ListArgs.list03  thrpt   15   963.866 ±  10.262  ops/ms
>> ListArgs.list04  thrpt   15   908.116 ±   6.278  ops/ms
>> ListArgs.list05  thrpt   15   848.607 ±  16.701  ops/ms
>> ListArgs.list06  thrpt   15   822.282 ±   8.905  ops/ms
>> ListArgs.list07  thrpt   15   780.057 ±  11.214  ops/ms
>> ListArgs.list08  thrpt   15   745.295 ±  19.204  ops/ms
>> ListArgs.list09  thrpt   15   704.596 ±  14.003  ops/ms
>> ListArgs.list10  thrpt   15   696.436 ±   4.914  ops/ms
>> ListArgs.list11  thrpt   15   661.908 ±  11.041  ops/ms
>> 
>> WITH varargs optimization:
>> 
>> Benchmark         Mode  Cnt     Score    Error   Units
>> ListArgs.list00  thrpt   15  6172.298 ± 62.736  ops/ms
>> ListArgs.list01  thrpt   15  1987.724 ± 45.468  ops/ms
>> ListArgs.list02  thrpt   15  1843.419 ± 10.693  ops/ms
>> ListArgs.list03  thrpt   15  1126.946 ± 30.952  ops/ms
>> ListArgs.list04  thrpt   15  1050.440 ± 17.859  ops/ms
>> ListArgs.list05  thrpt   15   999.275 ± 23.656  ops/ms
>> ListArgs.list06  thrpt   15   948.844 ± 19.615  ops/ms
>> ListArgs.list07  thrpt   15   897.541 ± 15.531  ops/ms
>> ListArgs.list08  thrpt   15   853.359 ± 18.755  ops/ms
>> ListArgs.list09  thrpt   15   826.394 ±  8.284  ops/ms
>> ListArgs.list10  thrpt   15   779.231 ±  4.104  ops/ms
>> ListArgs.list11  thrpt   15   650.888 ±  3.948  ops/ms
>
> Looks good. I wondered why the performance results were so slow, then I looked more closely and saw "-Xint" was used. I
> usually don't ascribe much value to micro benchmarks run in interpreter-only mode, but hey, any shaving off startup time
> is welcome. Less allocation is definitely welcome (although I do wish C2 was better at eliding redundant array
> initialization and allocation).
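
As a quick sanity check on the "about 15%" figure quoted above, the per-benchmark gain can be recomputed from the two tables. The sketch below only copies the list03 to list10 scores from the quoted output; the class and method names are made up for illustration and are not part of the patch or of the JMH benchmark.

```java
final class ListOfSpeedup {
    // Scores in ops/ms copied from the quoted JMH tables, list03 through list10.
    static final double[] WITHOUT =
        {963.866, 908.116, 848.607, 822.282, 780.057, 745.295, 704.596, 696.436};
    static final double[] WITH =
        {1126.946, 1050.440, 999.275, 948.844, 897.541, 853.359, 826.394, 779.231};

    /** Relative throughput gain in percent for a single benchmark. */
    static double gainPercent(double before, double after) {
        return (after / before - 1.0) * 100.0;
    }

    /** Arithmetic mean of the per-benchmark gains (hypothetical summary metric). */
    static double meanGainPercent() {
        double sum = 0.0;
        for (int i = 0; i < WITHOUT.length; i++) {
            sum += gainPercent(WITHOUT[i], WITH[i]);
        }
        return sum / WITHOUT.length;
    }

    public static void main(String[] args) {
        for (int i = 0; i < WITHOUT.length; i++) {
            // i + 3 maps the array index back to the listNN benchmark name.
            System.out.printf("list%02d: %+.1f%%%n", i + 3, gainPercent(WITHOUT[i], WITH[i]));
        }
        System.out.printf("mean:   %+.1f%%%n", meanGainPercent());
    }
}
```

This ignores the reported error margins, so it is only a rough cross-check of the quoted numbers, not a statistical comparison.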

Sorry to be late to the party. I thought that all reviews labeled with core-libs should be mirrored to the core-libs-dev
mailing list, but I haven't seen this one there :(

Please note that the integrated implementation exposes listFromTrustedArray to everybody via the finisher of
Collectors.toUnmodifiableList(). No dirty unsafe reflection is necessary, only a single unchecked cast:

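  static <T> List<T> untrustedArrayToList(T[] array) {
    // Grab the collector's finisher; its accumulator type is a wildcard, hence the unchecked cast.
    @SuppressWarnings("unchecked")
    Function<List<T>, List<T>> finisher =
        (Function<List<T>, List<T>>) Collectors.<T>toUnmodifiableList().finisher();
    // An ArrayList subclass whose toArray() hands out the caller's array instead of a fresh copy.
    ArrayList<T> list = new ArrayList<>() {
      @Override
      public Object[] toArray() {
        return array;
      }
    };
    // The finisher passes list.toArray() on as a trusted array, so no copy is made.
    return finisher.apply(list);
  }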

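This might qualify as a security issue: the caller keeps a reference to the array that backs the supposedly
unmodifiable list and can still write to it.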
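
To see how a particular JDK build behaves, the construction can be wrapped in a small probe that mutates the array afterwards and checks whether the list sees the change. This is a minimal sketch: the class name, the Outcome enum and the probe() method are hypothetical helpers, and no particular result is assumed, since the outcome depends on how the finisher is implemented in the build under test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class TrustedArrayProbe {

    /** Possible results of the probe (names are illustrative only). */
    enum Outcome { ALIASED, COPIED, REJECTED }

    // Same construction as in the message: feed the collector's finisher
    // an ArrayList subclass whose toArray() returns the caller's array.
    static <T> List<T> untrustedArrayToList(T[] array) {
        @SuppressWarnings("unchecked")
        Function<List<T>, List<T>> finisher =
            (Function<List<T>, List<T>>) Collectors.<T>toUnmodifiableList().finisher();
        ArrayList<T> list = new ArrayList<>() {
            @Override
            public Object[] toArray() {
                return array;
            }
        };
        return finisher.apply(list);
    }

    static Outcome probe() {
        // Three elements, the smallest case the patch is said to optimize.
        Object[] backing = {"a", "b", "c"};
        List<Object> list;
        try {
            list = untrustedArrayToList(backing);
        } catch (RuntimeException e) {
            // Any runtime exception from the finisher is treated as a refusal.
            return Outcome.REJECTED;
        }
        // Write through the array, then look at the list.
        backing[0] = "mutated";
        return "mutated".equals(list.get(0)) ? Outcome.ALIASED : Outcome.COPIED;
    }

    public static void main(String[] args) {
        System.out.println(System.getProperty("java.version") + ": " + probe());
    }
}
```

ALIASED would mean the "unmodifiable" list can be changed through the caller's array, which is the concern raised here. COPIED or REJECTED would mean the build under test does not share the array with the caller.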

-------------

PR: https://git.openjdk.java.net/jdk/pull/449