JDK-6982173: Small problem causing thousands of wasted CPU hours

Stuart Marks stuart.marks at oracle.com
Thu Feb 14 02:44:28 UTC 2019



On 2/8/19 5:13 AM, Tagir Valeev wrote:
>> I would argue that iterating the argument and calling remove() on "this" are the
>> right semantics, because you want set membership to be determined by this set,
>> not by whatever collection you pass as an argument. However, I note that
>> AbstractCollection.removeAll and other removeAll implementations iterate over
>> "this" and do a contains() check on the argument. The revised
>> AbstractSet.removeAll would be an outlier here, though it makes sense to me to
>> do it this way.
> 
> For complete picture it should be noted that there's a slight
> difference in remove and removeAll spec: remove removes at most one
> element while removeAll removes all elements from the specified
> collection.
> E.g. c.removeAll(Collections.singleton(foo)) would remove all
> instances of foo from c while c.remove(foo) would return only one foo.
> These should be equivalent for Set where repeating elements should not
> normally appear, but it's wrong for any Collection. That's why
> AbstractCollection.removeAll
> cannot be rewritten in the same way.

Yes, this is a good point. This increases my concern over my earlier proposal 
for AbstractSet.removeAll being an "outlier".

The main issue here is whose semantics for containership are used. I had 
proposed iterating "arg" and calling this.remove() because that makes sense in 
isolation. However, AbstractCollection iterates "this" and calls arg.contains(). 
As you note, the specification implies that it must be done this way. This 
conflicts with my AbstractSet proposal.

In addition, there is the matter of retainAll(). You can't iterate "arg" and 
call this.remove(), since you need to remove elements that *aren't* present in 
"arg". At least, you can't without building an intermediate data structure, 
which seems out of bounds here. So retainAll() pretty much needs to iterate 
"this" and call arg.contains().

Based on this, I've changed my mind. AbstractSet.removeAll() really should do 
the same thing as AbstractCollection.removeAll(), namely to iterate "this" and 
call arg.contains(). In fact, AbstractSet can simply inherit the 
AbstractCollection implementation, and its specification adjusted accordingly.

Unfortunately, this means that code like the following:

     someSet.removeAll(someList)

will have O(m*n) performance, since it will repeatedly call someList.contains(). 
This doesn't alleviate the performance concern. However, it's at least specified 
and predictable, so one can work around it if necessary.

Also,

     set1.removeAll(set2)

will unconditionally use set2's semantics for containership. This is a bit 
counterintuitive, as one might expect set1's semantics to be used. But it needs 
to be this way to be consistent with Collection.removeAll(), 
Collection.retainAll(), and Set.retainAll(). And at least it's predictable, and 
it doesn't vary based on the relative sizes of the sets!

> Another interesting thing is
> java.util.IdentityHashMap.KeySet#removeAll implementation [1]:
> it reimplements the AbstractCollection#removeAll, because of the same
> reason: now
> the first branch of AbstractSet#removeAll becomes incorrect. See the
> comment before it:
> 
>          /*
>           * Must revert from AbstractSet's impl to AbstractCollection's, as
>           * the former contains an optimization that results in incorrect
>           * behavior when c is a smaller "normal" (non-identity-based) Set.
>           */
> 
> Btw this comment should be updated to remove a "smaller" condition if
> the proposed
> change for AbstractSet will be implemented.
> 
> [1] http://hg.openjdk.java.net/jdk/jdk/file/e57bcfd7bf79/src/java.base/share/classes/java/util/IdentityHashMap.java#l990

Also a good point. In fact, with my current thinking, this comment and the 
override can be removed, since it can be inherited from AbstractCollection. 
(This also applies to EntrySet later in the same file.)

**

I also note that ConcurrentHashMap's CollectionView nested class [1] has a 
removeAll() implementation that uses a similar (but slightly different) 
heuristic for deciding whether to iterate "this" or "arg". I think this 
heuristic should be removed, since it suffers from the same semantic shift 
depending on relative sizes.

I further note that ConcurrentSkipListSet.removeAll() overrides 
AbstractSet.removeAll() in order to avoid an expensive call to size(). But it 
always iterates its arg and removes from this, which is backwards from 
everywhere else. (But at least it doesn't do any size-based "optimization".) I 
think this is simply a bug. I've filed JDK-8218945 [2] to cover this.

**

If we all agree on this, I'll proceed with working up a changeset for 
JDK-6394757.[3] It's time to fix this one.

s'marks


[1] 
http://hg.openjdk.java.net/jdk/jdk11/file/1ddf9a99e4ad/src/java.base/share/classes/java/util/concurrent/ConcurrentHashMap.java#l4538

[2] https://bugs.openjdk.java.net/browse/JDK-8218945

[3] https://bugs.openjdk.java.net/browse/JDK-6394757


More information about the core-libs-dev mailing list