Review for CR 6728865 : Improved heuristics for Collections.disjoint()

Mike Duigou mike.duigou at oracle.com
Sun Dec 19 22:42:39 UTC 2010


On Dec 17 2010, at 17:02 , Rémi Forax wrote:

> On 12/18/2010 01:31 AM, Mike Duigou wrote:
>> I've posted a webrev for review at
>> 
>> http://cr.openjdk.java.net/~mduigou/6728865.0/webrev/
>> 
>> which improves the behaviour of Collections.disjoint() when the collection c1 is not a Set and is larger than c2.
>> 
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6728865
>> 
>> I've included some other micro-optimizations suggested by the issue and by common usage.
>> 
>> One optimization, the checks whether either collection is empty using isEmpty(), may slightly degrade performance relative to the current implementation but should be a good tradeoff for cases where either of the collections are empty.
>> 
>> I've also upgraded the javadoc to newer style conventions and included the missing @return.
>> 
>> Any comments or feedback welcome.
>> 
>> Mike
> 
> Hi Mike,
> I think that comparing size() is not a good idea because
> - for some collections, size() is not a constant operation

Understood. I use isEmpty() in the case where only one of c1 and c2 is a set for this reason.

> - you compare size() when c1 and c2 are sets which
>  may cause a performance regression because
>  disjoint(treeSet, hashSet) has not the same complexity as
>  disjoint(hashSet, treeSet).

This is a good point but perhaps indicates that it might be worth adding further handling that prioritizes which collection is used for contains()

The order of preference would seem to be :

Collection : O(unspecified but probably no worse than N/2)
Set        : O(hopefully < N/2) 
TreeSet    : O(log2 N) 
HashSet    : O(1 + (1-loading))

There's always an issue though when c1 and c2 are of very unequal size particularly when either collection is very small. I wonder if there are cases where checking the relative size() is worth the cost.

For now I will remove the size check in the case where both are Sets.

> Otherwise, you declare some local variables final.
> Declaring a local variable as final doesn't appear in the generated bytecode.
> The usual convention for the source of the JDK is to use final
> in front of a local variable only when needed (inner class).

I agree it's overkill and unnecessary. I will remove it. I had put it in while checking for reassignment in an intermediate revision.

Mike


More information about the core-libs-dev mailing list