Review for CR 6728865 : Improved heuristics for Collections.disjoint() [updated]

Ulf Zibis Ulf.Zibis at gmx.de
Tue Dec 21 13:44:08 UTC 2010


Am 21.12.2010 02:48, schrieb Mike Duigou:
> I've updated the webrev with Ulf's feedback. http://cr.openjdk.java.net/~mduigou/6728865.2/webrev/

I think your explanation belongs to both variables, contains and iterate.
Furthermore the iterated collection is the one with the major importance and accessed first, so 
declare and later assign it at 1. position:

         // As performance of Set's contains() is  less than O(N/2),
         //  iteration is given to the remaining collection.
         // For collections who's contains() are of the same complexity then
         // best performance is achieved by iterating the smaller collection.
         Collection<?>  iterate;
         Collection<?>  contains;

But anyway, the explanation IMO would be better located close to the if...else branch (see for my 
2nd approach below).

Better concatenate else + if into one line, so the first 3 cases of your heuristics appear more clear:

         } else  if (c2 instanceof Set) {


Now "E.g." should be capitalized instead "if". ;-)


Anyway I still prefer my 2nd approach, as it's more compact and easier to follow.
I believe, GIT compiler will optimize the actual order of operations anyway, and javac maybe too.
But my code could be better documented:

         // collections to iterate and compare on
         Collection<?> iterate = c1;
         Collection<?> contains = c2;

         // performance optimization cases
         if (c1 instanceof Set) {
             // As Set's contains() performs better (less than O(N/2))
             //than mere Collection's,iterate on c2.
             iterate = c2;
             contains = c1;
         } else if (!(c2 instanceof Set)) {
             // Both are mere Collections. Iterate over smaller collection.
             // E.g. if c1 contains 3 elements and c2 contains 50 elements and
             // assuming contains() requires ceiling(n/2) comparisons then
             // checking for all c1 elements in c2 would require 75 comparisons
             // vs. all c2 elements in c1 would require 100.
             int c1size = c1.size();
             int c2size = c2.size();
             if (c1size == 0 || c2size == 0) {
                 // One of the collections is empty. Nothing will match.
                 return true;
             } else if (c1size > c2size) {
                 iterate = c2;
                 contains = c1;
             }
         }


The heuristics:

1. Generally iterate over c1.

2. If c1 is a Set then iterate over c2.

3. If either collection is empty then result is always true.

4. Iterate over the smaller Collection.

-Ulf




More information about the core-libs-dev mailing list