Review for CR 6728865 : Improved heuristics for Collections.disjoint()
Mike Duigou
mike.duigou at oracle.com
Sun Dec 19 22:42:39 UTC 2010
On Dec 17 2010, at 17:02 , Rémi Forax wrote:
> On 12/18/2010 01:31 AM, Mike Duigou wrote:
>> I've posted a webrev for review at
>>
>> http://cr.openjdk.java.net/~mduigou/6728865.0/webrev/
>>
>> which improves the behaviour of Collections.disjoint() when the collection c1 is not a Set and is larger than c2.
>>
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6728865
>>
>> I've included some other micro-optimizations suggested by the issue and by common usage.
>>
>> One optimization, the checks whether either collection is empty using isEmpty(), may slightly degrade performance relative to the current implementation but should be a good tradeoff for cases where either of the collections are empty.
>>
>> I've also upgraded the javadoc to newer style conventions and included the missing @return.
>>
>> Any comments or feedback welcome.
>>
>> Mike
>
> Hi Mike,
> I think that comparing size() is not a good idea because
> - for some collections, size() is not a constant operation
Understood. I use isEmpty() in the case where only one of c1 and c2 is a set for this reason.
> - you compare size() when c1 and c2 are sets which
> may cause a performance regression because
> disjoint(treeSet, hashSet) has not the same complexity as
> disjoint(hashSet, treeSet).
This is a good point but perhaps indicates that it might be worth adding further handling that prioritizes which collection is used for contains()
The order of preference would seem to be :
Collection : O(unspecified but probably no worse than N/2)
Set : O(hopefully < N/2)
TreeSet : O(log2 N)
HashSet : O(1 + (1-loading))
There's always an issue though when c1 and c2 are of very unequal size particularly when either collection is very small. I wonder if there are cases where checking the relative size() is worth the cost.
For now I will remove the size check in the case where both are Sets.
> Otherwise, you declare some local variables final.
> Declaring a local variable as final doesn't appear in the generated bytecode.
> The usual convention for the source of the JDK is to use final
> in front of a local variable only when needed (inner class).
I agree it's overkill and unnecessary. I will remove it. I had put it in while checking for reassignment in an intermediate revision.
Mike
More information about the core-libs-dev
mailing list