Review for CR 6728865 : Improved heuristics for Collections.disjoint() [updated]

Tue Jan 4 05:06:53 UTC 2011

Am 04.01.2011 02:24, schrieb Rémi Forax:
> On 01/04/2011 02:02 AM, Ulf Zibis wrote:
>> For the same reason you could set the empty-check at 1st place. This would result in the most 
>> simple (and fast) code (see my before post).
>> - in all implementations, I have seen, "isEmpty()" is equivalent to "size() == 0"
>
> It's semantically equivalent by definition but the complexities aren't the same.
> see 
> http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/1c72adc9d5f3/src/share/classes/java/util/concurrent/ConcurrentSkipListMap.java
> near line 1740
Yes, of coarse, but ...

For Collections.disjoint(c1, c2), having n1 = c1.size(), n2 = c2.size()
Cost at minimun:
- c1.isEmpty() || c2.isEmpty()
Cost at worst:
- n1 iterations over c1 for computing c1.size()
- if n1 != 0: n2 iterations over c2 for computing c2.size()
Benefit:
- if n1 == 0 or n2 == 0: instatiation of c1.iterator() or n1 * (iteration + c2.contains())
- if n1 > n2: (n1 - n2)/2 * (iteration + comparison)
- if n1 <= n2: nothing
Note: comparison is likely much more expensive than iteration

As the optimization against the collection's sizes only makes sense if at least 1 collection is very 
small, it would be advantageous to have a size(int max) method, which internally should iterate at 
most max times instead of size times.
In the meantime we could do a workaround by iterating let's say at most 5 steps on each collection 
to check it's size.
Similar: Resultset.sizeIsLessThan(int n).

BTW: containsKey() and size() could be simplified by introducing doFind(V value) similar to doGet(K 
key), similar for ConcurrentHashMap

> or
> http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/1c72adc9d5f3/src/share/classes/java/util/concurrent/ConcurrentLinkedQueue.java 
>
> near line 400
Thanks for the additional example.

> or
> http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/1c72adc9d5f3/src/share/classes/java/util/concurrent/ConcurrentHashMap.java 
>
> near line 700.
ConcurrentHashMap is NOT a Collection!

>> - "cost a small amount in others" applies rarely; the only implementation, I have seen, where 
>> "Set.size() can be expensive", is ConcurrentSkipListSet (are there others?).
>
> By example, Collections.newSetFromMap(new ConcurrentHashMap<>).
As substitute for missing ConcurrentHashSet.

> Also don't forget all Sets that map JDBC results.
JDBC results size computation could be much faster. See: 
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6409445

The few collections, for which "Set.size() can be expensive" holds, are from concurrent package or 
wrap a JDBC result.
Can't we check that?

-Ulf