RFR(s): 8060192: Add default method Collection.toArray(generator)
Stuart Marks
stuart.marks at oracle.com
Tue Dec 12 22:56:51 UTC 2017
On 12/7/17 5:03 PM, Martin Buchholz wrote:
> (I'm still trying to love this new API)
>
> The changes to the jsr166 tck are fine.
>
> I'm not convinced that the new implementation for ArrayList is progress. The
> current implementation of toArray(T[]) does
>
> // Make a new array of a's runtime type, but my contents:
> return (T[]) Arrays.copyOf(elementData, size, a.getClass());
>
> which does not have to pay the cost of zeroing the array, so is potentially
> faster. (depends on whether the VM provides cheap pre-zeroed memory?! what does
> jmh say?)
>
> If we're not getting type safety and not getting significantly better
> performance, all we have is a more natural API. But toArray(IntFunction) also
> seems not very natural to me, and we'll have to live with the toArray(new
> String[0]) wart forever anyways. (But is it really so bad?) (Maybe
> toArray(Class<componentType>) is actually more natural?)
A lot of ideas flying around here. (Thanks John! :-)) But for this message let
me focus on performance.
I think there's an implicit concern about whether the new API might suffer
inherent performance disadvantages compared to the existing APIs. It certainly
doesn't seem to provide any inherent advantages. I spent some time benchmarking
more rigorously, and I was able to get results similar to those from Aleksey
Shipilëv's article:
https://shipilev.net/blog/2016/arrays-wisdom-ancients/
That is, I was able to discern the differences among the copy-to-Object[] case,
the copy-to-presized-array case, and the copy-to-zero-sized-array case. I also
added a potential new API, toArray(Class<T>), for comparison.
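For reference, the two default-method shapes under test look roughly like this
(a sketch reconstructed from the legend below, not the exact patch text; the
Class<T> form is only a strawman for comparison):

import java.lang.reflect.Array;
import java.util.function.IntFunction;

interface CollectionToArraySketch<E> {
    <T> T[] toArray(T[] a);  // the existing method both defaults delegate to

    // "lambdaDf": ask the generator for a zero-length array, then delegate.
    default <T> T[] toArray(IntFunction<T[]> generator) {
        return toArray(generator.apply(0));
    }

    // "clazzDf": reflectively create a zero-length array, then delegate.
    @SuppressWarnings("unchecked")
    default <T> T[] toArray(Class<T> clazz) {
        return toArray((T[]) Array.newInstance(clazz, 0));
    }
}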
Here are the results. For brevity, I ran only the tests with an ArrayList of
size 1000:
Benchmark              (size)     (type)  Mode  Cnt     Score    Error  Units
ToArrayBench.clazz       1000  arraylist  avgt   30  1423.924 ± 11.437  ns/op
ToArrayBench.clazz0      1000  arraylist  avgt   30  1173.442 ± 11.614  ns/op
ToArrayBench.clazzDf     1000  arraylist  avgt   30  1179.648 ± 17.662  ns/op
ToArrayBench.lambda      1000  arraylist  avgt   30  1421.910 ± 21.320  ns/op
ToArrayBench.lambda0     1000  arraylist  avgt   30  1168.863 ± 11.109  ns/op
ToArrayBench.lambdaDf    1000  arraylist  avgt   30  1168.372 ±  9.512  ns/op
ToArrayBench.simple      1000  arraylist  avgt   30   462.371 ±  8.693  ns/op
ToArrayBench.sized       1000  arraylist  avgt   30  1417.483 ± 11.451  ns/op
ToArrayBench.zero        1000  arraylist  avgt   30  1182.932 ± 27.773  ns/op
Legend:
  clazz    - ArrayList override of toArray(Class<T>)
               T[] a = (T[]) java.lang.reflect.Array.newInstance(clazz, size);
               System.arraycopy(elementData, 0, a, 0, size);
  clazz0   - ArrayList override of toArray(Class<T>)
               T[] a = (T[]) java.lang.reflect.Array.newInstance(clazz, 0);
               return (T[]) Arrays.copyOf(elementData, size, a.getClass());
  clazzDf  - Collection default method toArray(Class<T>)
               toArray((T[]) java.lang.reflect.Array.newInstance(clazz, 0))
  lambda   - ArrayList override of toArray(IntFunction)
               T[] a = generator.apply(size);
               System.arraycopy(elementData, 0, a, 0, size);
  lambda0  - ArrayList override of toArray(IntFunction)
               (T[]) Arrays.copyOf(elementData, size, generator.apply(0).getClass())
  lambdaDf - Collection default method toArray(IntFunction)
               toArray(generator.apply(0))
  simple   - existing no-arg toArray() method
  sized    - existing toArray(T[]) called with presized array
               coll.toArray(new Foo[coll.size()])
  zero     - existing toArray(T[]) called with zero-sized array
               coll.toArray(new Foo[0])
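Spelling the "lambda" and "lambda0" fragments out as complete methods, the
overrides look roughly like this (my sketch, assuming the usual
elementData/size fields of ArrayList; not the actual patch):

import java.util.Arrays;
import java.util.function.IntFunction;

class ArrayListOverrideSketch {
    private Object[] elementData = new Object[0];
    private int size = 0;

    // "lambda": generator produces a presized array, filled with arraycopy.
    <T> T[] toArrayLambda(IntFunction<T[]> generator) {
        T[] a = generator.apply(size);
        System.arraycopy(elementData, 0, a, 0, size);
        return a;
    }

    // "lambda0": generator produces a zero-length array whose runtime type
    // drives Arrays.copyOf, which is the intrinsified path.
    @SuppressWarnings("unchecked")
    <T> T[] toArrayLambda0(IntFunction<T[]> generator) {
        return (T[]) Arrays.copyOf(elementData, size, generator.apply(0).getClass());
    }
}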
----------
A couple of observations from this. First, "lambda", the ArrayList override of
toArray(IntFunction), is slower than "lambda0" or "lambdaDf", both of which create
zero-length arrays and pass them elsewhere. So Martin, you're right: the
override I put into ArrayList isn't buying anything, and I'll take it out. Oddly
enough, just inheriting the default seems to be sufficient.
(And the same probably applies to Arrays$ArrayList as well.)
Second, setting aside the "simple" case (which is fast because it doesn't have
to do any array store checking), we see performance falling into two buckets:
one that takes around 1400 ns/op, and the other around 1170 ns/op. It turns out
that the faster cases all end up calling Arrays.copyOf(). This is an intrinsic
that can avoid the work of zeroing the array, because it allocates the array
immediately before filling it with data from the source.
I would have thought that allocating the array immediately prior to filling it
with System.arraycopy would get the same benefit, but apparently not.
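To illustrate the two shapes being compared, here's a standalone sketch with a
concrete String[] source (my illustration, not the benchmark code itself):

import java.util.Arrays;

class CopyShapes {
    static final String[] ELEMENTS = {"a", "b", "c"};

    // "copyOf" shape: allocation and copy both happen inside the
    // Arrays.copyOf intrinsic, so the destination need not be pre-zeroed.
    static Object[] viaCopyOf() {
        return Arrays.copyOf(ELEMENTS, ELEMENTS.length, Object[].class);
    }

    // "allocate then arraycopy" shape: the new array is allocated (and, per
    // the numbers above, apparently zeroed) before System.arraycopy fills it.
    static Object[] viaArraycopy() {
        Object[] dst = new Object[ELEMENTS.length];
        System.arraycopy(ELEMENTS, 0, dst, 0, ELEMENTS.length);
        return dst;
    }
}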
We can see the effect of intrinsics by disabling the Arrays.copyOf intrinsic,
using the -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_copyOf options:
Benchmark              (size)     (type)  Mode  Cnt     Score    Error  Units
ToArrayBench.clazz       1000  arraylist  avgt   30  1413.922 ±  9.570  ns/op
ToArrayBench.clazz0      1000  arraylist  avgt   30  1427.857 ± 13.669  ns/op
ToArrayBench.clazzDf     1000  arraylist  avgt   30  1448.274 ± 15.734  ns/op
ToArrayBench.lambda      1000  arraylist  avgt   30  1423.556 ± 17.312  ns/op
ToArrayBench.lambda0     1000  arraylist  avgt   30  1422.177 ± 20.650  ns/op
ToArrayBench.lambdaDf    1000  arraylist  avgt   30  1464.320 ± 26.199  ns/op
ToArrayBench.simple      1000  arraylist  avgt   30   399.856 ±  4.893  ns/op
ToArrayBench.sized       1000  arraylist  avgt   30  1417.668 ±  8.979  ns/op
ToArrayBench.zero        1000  arraylist  avgt   30  1438.527 ± 11.521  ns/op
Here, we can see that everything (except "simple") has moved into the 1400ns
range. (Not sure why "simple" appears to have gotten faster.)
My conclusion from this is that the choice of one API over another doesn't offer
any inherent performance advantage. You can make any of the APIs go fast if you
can arrange for it to call an intrinsic, even if this ends up allocating an
"extra" zero-length array.
So the issue is really about which API we want to expose. I'll reserve that
discussion for another message.
s'marks