RFR(s): 8060192: Add default method Collection.toArray(generator)
Stuart Marks
stuart.marks at oracle.com
Tue Dec 12 22:56:51 UTC 2017
On 12/7/17 5:03 PM, Martin Buchholz wrote:
> (I'm still trying to love this new API)
>
> The changes to the jsr166 tck are fine.
>
> I'm not convinced that the new implementation for ArrayList is progress. The
> current implementation of toArray(T[]) does
>
> // Make a new array of a's runtime type, but my contents:
> return (T[]) Arrays.copyOf(elementData, size, a.getClass());
>
> which does not have to pay the cost of zeroing the array, so is potentially
> faster. (depends on whether the VM provides cheap pre-zeroed memory?! what does
> jmh say?)
>
> If we're not getting type safety and not getting significantly better
> performance, all we have is a more natural API. But toArray(IntFunction) also
> seems not very natural to me, and we'll have to live with the toArray(new
> String[0]) wart forever anyways. (But is it really so bad?) (Maybe
> toArray(Class<componentType>) is actually more natural?)
A lot of ideas flying around here. (Thanks John! :-)) But for this message let
me focus on performance.
I think there's an implicit concern about whether the new API might suffer
inherent performance disadvantages compared to the existing APIs. It certainly
doesn't seem to provide any inherent advantages. I spent some time benchmarking
more rigorously, and I was able to get results similar to those from Aleksey
Shipilëv's article:
https://shipilev.net/blog/2016/arrays-wisdom-ancients/
That is, I was able to discern the differences among the copy-to-Object[] case,
the copy-to-presized-array case, and the copy-to-zero-sized-array case. I also
added a potential new API, toArray(Class<T>), for comparison.
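For reference, the two default-method shapes under test look roughly like this
(a sketch reconstructed from the legend below, not the exact patch text; the
Class<T> form is only a strawman for comparison):

import java.lang.reflect.Array;
import java.util.function.IntFunction;

interface CollectionToArraySketch<E> {
    <T> T[] toArray(T[] a);  // the existing method both defaults delegate to

    // "lambdaDf": ask the generator for a zero-length array, then delegate.
    default <T> T[] toArray(IntFunction<T[]> generator) {
        return toArray(generator.apply(0));
    }

    // "clazzDf": reflectively create a zero-length array, then delegate.
    @SuppressWarnings("unchecked")
    default <T> T[] toArray(Class<T> clazz) {
        return toArray((T[]) Array.newInstance(clazz, 0));
    }
}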
Here are the results. For brevity, I ran only the tests with an ArrayList of
size 1000:
Benchmark              (size)     (type)  Mode  Cnt     Score    Error  Units
ToArrayBench.clazz       1000  arraylist  avgt   30  1423.924 ± 11.437  ns/op
ToArrayBench.clazz0      1000  arraylist  avgt   30  1173.442 ± 11.614  ns/op
ToArrayBench.clazzDf     1000  arraylist  avgt   30  1179.648 ± 17.662  ns/op
ToArrayBench.lambda      1000  arraylist  avgt   30  1421.910 ± 21.320  ns/op
ToArrayBench.lambda0     1000  arraylist  avgt   30  1168.863 ± 11.109  ns/op
ToArrayBench.lambdaDf    1000  arraylist  avgt   30  1168.372 ±  9.512  ns/op
ToArrayBench.simple      1000  arraylist  avgt   30   462.371 ±  8.693  ns/op
ToArrayBench.sized       1000  arraylist  avgt   30  1417.483 ± 11.451  ns/op
ToArrayBench.zero        1000  arraylist  avgt   30  1182.932 ± 27.773  ns/op
Legend:
  clazz    - ArrayList override of toArray(Class<T>)
               T[] a = (T[]) java.lang.reflect.Array.newInstance(clazz, size);
               System.arraycopy(elementData, 0, a, 0, size);
  clazz0   - ArrayList override of toArray(Class<T>)
               T[] a = (T[]) java.lang.reflect.Array.newInstance(clazz, 0);
               return (T[]) Arrays.copyOf(elementData, size, a.getClass());
  clazzDf  - Collection default method toArray(Class<T>)
               toArray((T[]) java.lang.reflect.Array.newInstance(clazz, 0))
  lambda   - ArrayList override of toArray(IntFunction)
               T[] a = generator.apply(size);
               System.arraycopy(elementData, 0, a, 0, size);
  lambda0  - ArrayList override of toArray(IntFunction)
               (T[]) Arrays.copyOf(elementData, size, generator.apply(0).getClass())
  lambdaDf - Collection default method toArray(IntFunction)
               toArray(generator.apply(0))
  simple   - existing no-arg toArray() method
  sized    - existing toArray(T[]) called with presized array
               coll.toArray(new Foo[coll.size()])
  zero     - existing toArray(T[]) called with zero-sized array
               coll.toArray(new Foo[0])
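Spelling the "lambda" and "lambda0" fragments out as complete methods, the
overrides look roughly like this (my sketch, assuming the usual
elementData/size fields of ArrayList; not the actual patch):

import java.util.Arrays;
import java.util.function.IntFunction;

class ArrayListOverrideSketch {
    private Object[] elementData = new Object[0];
    private int size = 0;

    // "lambda": generator produces a presized array, filled with arraycopy.
    <T> T[] toArrayLambda(IntFunction<T[]> generator) {
        T[] a = generator.apply(size);
        System.arraycopy(elementData, 0, a, 0, size);
        return a;
    }

    // "lambda0": generator produces a zero-length array whose runtime type
    // drives Arrays.copyOf, which is the intrinsified path.
    @SuppressWarnings("unchecked")
    <T> T[] toArrayLambda0(IntFunction<T[]> generator) {
        return (T[]) Arrays.copyOf(elementData, size, generator.apply(0).getClass());
    }
}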
----------
A couple of observations from this. First, "lambda", the ArrayList override of
toArray(IntFunction), is slower than "lambda0" or "lambdaDf", both of which create
zero-length arrays and pass them elsewhere. So Martin, you're right: the
override I put into ArrayList isn't buying anything, and I'll take it out. Oddly
enough, just inheriting the default seems to be sufficient.
(And the same probably applies to Arrays$ArrayList as well.)
Second, setting aside the "simple" case (which is fast because it doesn't have
to do any array store checking), we see performance falling into two buckets:
one that takes around 1400 ns/op, and the other around 1170 ns/op. It turns out
that the faster cases all end up calling Arrays.copyOf(). This is an intrinsic
that can avoid the work of zeroing the array, because it allocates the array
immediately before filling it with data from the source.
I would have thought that allocating the array immediately prior to filling it
with System.arraycopy would get the same benefit, but apparently not.
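To illustrate the two shapes being compared, here's a standalone sketch with a
concrete String[] source (my illustration, not the benchmark code itself):

import java.util.Arrays;

class CopyShapes {
    static final String[] ELEMENTS = {"a", "b", "c"};

    // "copyOf" shape: allocation and copy both happen inside the
    // Arrays.copyOf intrinsic, so the destination need not be pre-zeroed.
    static Object[] viaCopyOf() {
        return Arrays.copyOf(ELEMENTS, ELEMENTS.length, Object[].class);
    }

    // "allocate then arraycopy" shape: the new array is allocated (and, per
    // the numbers above, apparently zeroed) before System.arraycopy fills it.
    static Object[] viaArraycopy() {
        Object[] dst = new Object[ELEMENTS.length];
        System.arraycopy(ELEMENTS, 0, dst, 0, ELEMENTS.length);
        return dst;
    }
}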
We can see the effect of intrinsics by disabling the Arrays.copyOf intrinsic,
using the -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_copyOf options:
Benchmark              (size)     (type)  Mode  Cnt     Score    Error  Units
ToArrayBench.clazz       1000  arraylist  avgt   30  1413.922 ±  9.570  ns/op
ToArrayBench.clazz0      1000  arraylist  avgt   30  1427.857 ± 13.669  ns/op
ToArrayBench.clazzDf     1000  arraylist  avgt   30  1448.274 ± 15.734  ns/op
ToArrayBench.lambda      1000  arraylist  avgt   30  1423.556 ± 17.312  ns/op
ToArrayBench.lambda0     1000  arraylist  avgt   30  1422.177 ± 20.650  ns/op
ToArrayBench.lambdaDf    1000  arraylist  avgt   30  1464.320 ± 26.199  ns/op
ToArrayBench.simple      1000  arraylist  avgt   30   399.856 ±  4.893  ns/op
ToArrayBench.sized       1000  arraylist  avgt   30  1417.668 ±  8.979  ns/op
ToArrayBench.zero        1000  arraylist  avgt   30  1438.527 ± 11.521  ns/op
Here, we can see that everything (except "simple") has moved into the 1400ns
range. (Not sure why "simple" appears to have gotten faster.)
My conclusion from this is that the choice of one API over another doesn't offer
any inherent performance advantage. You can make any of the APIs go fast if you
can arrange for it to call an intrinsic, even if this ends up allocating an
"extra" zero-length array.
So the issue is really about which API we want to expose. I'll reserve that
discussion for another message.
s'marks