Review for CR 6728865 : Improved heuristics for Collections.disjoint() [updated]

Ulf Zibis Ulf.Zibis at gmx.de
Sat Dec 25 01:32:56 UTC 2010


Trying to correct white space ...

Am 23.12.2010 23:59, schrieb Paul Benedict:
> Ulf, a previous email by Remi said only to invoke size() if the collection is a Set.
Thanks I missed that.
My guess was, that size() should be always faster than instantiating an iterator for the final 
for-loop, and then seeing, that there is no element.
But:
Given Set c1 with 100 elements and Set c2 with 0 elements.
Then we iterate and compare over 100 elements for nothing.

Result map for method disjoint():

           c1    |         Set        |        mere
c2   | elements |   0  |   3  |  50  |   0  |   3  |  50
---------------------------------------------------------------------
      |    0     | true | true | true | true | true | true

Set  |    3     | true | t/f  | t/f  | true | t/f  | t/f

      |   50     | true | t/f  | t/f  | true | t/f  | t/f
---------------------------------------------------------------------
      |    0     | true | true | true | true | true | true

mere |    3     | true | t/f  | t/f  | true | t/f  | t/f

      |   50     | true | t/f  | t/f  | true | t/f  | t/f
---------------------------------------------------------------------


(1) Iteration over in current webrev.3:

           c1    |         Set        |        mere
c2   | elements |   0  |   3  |  50  |   0  |   3  |  50
---------------------------------------------------------------------
      |    0     |  c2  |  c2  |  c2  |  c1  |  c1  |  c1

Set  |    3     |  c2  |  c2  |  c2  |  c1  |  c1  |  c1

      |   50     |  c2  |  c2  |  c2  |  c1  |  c1  |  c1
---------------------------------------------------------------------
      |    0     |  c2  |  c2  |  c2  | none | none | none

mere |    3     |  c2  |  c2  |  c2  | none |  c1  | c2

      |   50     |  c2  |  c2  |  c2  | none |  c1  | c1
---------------------------------------------------------------------


(2) Ideal iteration over:

           c1    |         Set        |        mere
c2   | elements |   0  |   3  |  50  |   0  |   3  |  50
---------------------------------------------------------------------
      |    0     | none | none | none | none | none | none

Set  |    3     | none | c1/2*|  c2* | none |  c1  |  c1

      |   50     | none |  c1* | c1/2*| none |  c1  |  c1
---------------------------------------------------------------------
      |    0     | none | none | none | none | none | none

mere |    3     | none |  c2  |  c2  | none | c1/2 |  c2

      |   50     | none |  c2  |  c2  | none |  c1  | c1/2
---------------------------------------------------------------------
* Optimum could differ depending on types of Sets e.g. HashSet, TreeSet,
   so further optimization is thinkable.


(3) Ideal iteration over, according "Set.size() can be expensive":
(I only found ConcurrentSkipListSet, are there others? Anyway, isn't size() anyhow cheaper than 
superfluously looping contains() ?)

           c1    |         Set        |        mere
c2   | elements |   0  |   3  |  50  |   0  |   3  |  50
---------------------------------------------------------------------
      |    0     | c1/2 | c1/2 | c1/2 |  c1  |  c1  |  c1

Set  |    3     | c1/2 | c1/2 | c1/2 |  c1  |  c1  |  c1

      |   50     | c1/2 | c1/2 | c1/2 |  c1  |  c1  |  c1
---------------------------------------------------------------------
      |    0     |  c2  |  c2  |  c2  | none | none | none

mere |    3     |  c2  |  c2  |  c2  | none | c1/2 |  c2

      |   50     |  c2  |  c2  |  c2  | none |  c1  | c1/2
---------------------------------------------------------------------


Why you introduce variables iterate and contains?
You could simply swap c1 with c2...

Shortest code for (2) with minimal swapping:

         int c1size, c2size;
         if ((c1size= c1.size())== 0 || (c2size= c2.size())== 0) {
             // At least one collection is empty. Nothing will match.
             return true;
         }
         if (!(c2 instanceof Set)) Optimize:{
             // As Set's contains() impl predictably performs better (< O(N/2))
             // than mere Collection's, iterate over latter c2, except ...
             if (!(c1 instanceof Set)) {
                 // If both are mere collections, iterate over smaller collection.
                 // E.g. if c1 contains 3 elements and c2 contains 50 elements and
                 // assuming contains() requires (N+1)/2comparisons thenchecking
                 // for all c1 elements in c2 would require 76.5 comparisons
                 // vs. all c2 elements in c1 would require 100.
                 if (c1size <= c2size) {
                     break Optimize;
                 }
             }
             Collection<?> temp = c1;
             c1 = c2;
             c2 = temp;
         }

Shortest code for (3), on a par with (1) with minimal swapping:

         if (!(c2 instanceof Set)) Optimize:{
             // As Set's contains() impl predictably performs better(< O(N/2))
             // than mere Collection's, iterate over latter c2, except ...
             if (!(c1 instanceof Set)) {
                 // Both are mere collections.
                 int c1size, c2size;
                 if ((c1size= c1.size())== 0 || (c2size= c2.size())== 0) {
                     // At least one collection is empty. Nothing will match.
                     return true;
                 }
                 // Iterate over smaller collection.
                 // E.g. if c1 contains 3 elements and c2 contains 50 elements and
                 // assuming contains() requires (N+1)/2comparisons thenchecking
                 // for all c1 elements in c2 would require 76.5 comparisons
                 // vs. all c2 elements in c1 would require 100.
                 if (c1size <= c2size) {
                     break Optimize;
                 }
             }
             Collection<?> temp = c1;
             c1 = c2;
             c2 = temp;
         }

-Ulf



More information about the core-libs-dev mailing list