RFR: 8193031: Collections.addAll is likely to perform worse than Collection.addAll

Stuart Marks stuart.marks at oracle.com
Tue Dec 5 07:27:41 UTC 2017



On 12/4/17 7:50 PM, Martin Buchholz wrote:
> https://bugs.openjdk.java.net/browse/JDK-8193031
> http://cr.openjdk.java.net/~martin/webrevs/jdk/Collections-addAll/

       * Adds all of the specified elements to the specified collection.
       * Elements to be added may be specified individually or as an array.
       * The behavior of this convenience method is identical to that of
-     * {@code c.addAll(Arrays.asList(elements))}, but this method is likely
-     * to run significantly faster under most implementations.
+     * {@code c.addAll(Arrays.asList(elements))}.
       *
       * <p>When elements are specified individually, this method provides a
       * convenient way to add a few elements to an existing collection:

I agree with the removal of "significantly faster." In some real cases, it's 
definitely not! Usually the behavior wording isn't "is identical to". I'd prefer 
something like "the behavior is as if...."

      public static <T> boolean addAll(Collection<? super T> c, T... elements) {
-        boolean result = false;
-        for (T element : elements)
-            result |= c.add(element);
-        return result;
+        return c.addAll(Arrays.asList(elements));
      }

So, this seems sensible, but the tradeoffs aren't obvious.

If the destination is an ArrayList, the first thing that addAll() does is call 
toArray() on the argument. And Arrays$ArrayList.toArray() makes a copy, as 
required by spec. So this redundantly copies every element, compared to the 
for-loop.

But I did some quick benchmarks and I found that bulk copies are so fast that 
even doing two of them is quite a bit faster than a single copy in a for-loop, 
ranging from 4x-6x faster over a variety of sizes.

This also requires temp space the size of the input array. This gives me (and 
the garbage collector) a bit of a pause.

Turning to other collections, the AbstractCollection.addAll() implementation 
runs essentially the same loop as here, except that it runs it over its 
Collection argument instead of an array. Thus it uses an Iterator to loop 
instead of indexing over an array. Here, converting to a list has a slight 
performance disadvantage, around 10-15% (for HashSet, which uses 
AbstractCollection.addAll).

You point out in the bug report that the implementation of Collections.addAll() 
misses out on optimizations that implementations can provide for bulk updates. 
This is true, but my feeling is that loses something by converting the array 
into a Collection before calling the virtual addAll(Collection) method.

How about this alternative: do the same thing we did with sort().

Add a new default method Collection.addEach(T...) whose default implementation 
is either the original implementation of Collections.addAll() or your proposed 
modification. (We should use a name other than addAll, since overloading with a 
varargs method is a bad idea.) Add overrides for key implementations like 
ArrayList. Then, have Collections.addAll(c, T... elements) delegate to 
c.addEach(elements).

For ArrayList, at least, this would avoid the redundant copy and temporary 
allocation, which seems nice.

s'marks


More information about the core-libs-dev mailing list