JDK-6982173: Small problem causing thousands of wasted CPU hours
Stuart Marks
stuart.marks at oracle.com
Thu Feb 14 02:44:28 UTC 2019
On 2/8/19 5:13 AM, Tagir Valeev wrote:
>> I would argue that iterating the argument and calling remove() on "this" are the
>> right semantics, because you want set membership to be determined by this set,
>> not by whatever collection you pass as an argument. However, I note that
>> AbstractCollection.removeAll and other removeAll implementations iterate over
>> "this" and do a contains() check on the argument. The revised
>> AbstractSet.removeAll would be an outlier here, though it makes sense to me to
>> do it this way.
>
> For complete picture it should be noted that there's a slight
> difference in remove and removeAll spec: remove removes at most one
> element while removeAll removes all elements from the specified
> collection.
> E.g. c.removeAll(Collections.singleton(foo)) would remove all
> instances of foo from c while c.remove(foo) would return only one foo.
> These should be equivalent for Set where repeating elements should not
> normally appear, but it's wrong for any Collection. That's why
> AbstractCollection.removeAll
> cannot be rewritten in the same way.
Yes, this is a good point. This increases my concern over my earlier proposal
for AbstractSet.removeAll being an "outlier".
The main issue here is whose semantics for containership are used. I had
proposed iterating "arg" and calling this.remove() because that makes sense in
isolation. However, AbstractCollection iterates "this" and calls arg.contains().
As you note, the specification implies that it must be done this way. This
conflicts with my AbstractSet proposal.
In addition, there is the matter of retainAll(). You can't iterate "arg" and
call this.remove(), since you need to remove elements that *aren't* present in
"arg". At least, you can't without building an intermediate data structure,
which seems out of bounds here. So retainAll() pretty much needs to iterate
"this" and call arg.contains().
Based on this, I've changed my mind. AbstractSet.removeAll() really should do
the same thing as AbstractCollection.removeAll(), namely to iterate "this" and
call arg.contains(). In fact, AbstractSet can simply inherit the
AbstractCollection implementation, and its specification adjusted accordingly.
Unfortunately, this means that code like the following:
someSet.removeAll(someList)
will have O(m*n) performance, since it will repeatedly call someList.contains().
This doesn't alleviate the performance concern. However, it's at least specified
and predictable, so one can work around it if necessary.
Also,
set1.removeAll(set2)
will unconditionally use set2's semantics for containership. This is a bit
counterintuitive, as one might expect set1's semantics to be used. But it needs
to be this way to be consistent with Collection.removeAll(),
Collection.retainAll(), and Set.retainAll(). And at least it's predictable, and
it doesn't vary based on the relative sizes of the sets!
> Another interesting thing is
> java.util.IdentityHashMap.KeySet#removeAll implementation [1]:
> it reimplements the AbstractCollection#removeAll, because of the same
> reason: now
> the first branch of AbstractSet#removeAll becomes incorrect. See the
> comment before it:
>
> /*
> * Must revert from AbstractSet's impl to AbstractCollection's, as
> * the former contains an optimization that results in incorrect
> * behavior when c is a smaller "normal" (non-identity-based) Set.
> */
>
> Btw this comment should be updated to remove a "smaller" condition if
> the proposed
> change for AbstractSet will be implemented.
>
> [1] http://hg.openjdk.java.net/jdk/jdk/file/e57bcfd7bf79/src/java.base/share/classes/java/util/IdentityHashMap.java#l990
Also a good point. In fact, with my current thinking, this comment and the
override can be removed, since it can be inherited from AbstractCollection.
(This also applies to EntrySet later in the same file.)
**
I also note that ConcurrentHashMap's CollectionView nested class [1] has a
removeAll() implementation that uses a similar (but slightly different)
heuristic for deciding whether to iterate "this" or "arg". I think this
heuristic should be removed, since it suffers from the same semantic shift
depending on relative sizes.
I further note that ConcurrentSkipListSet.removeAll() overrides
AbstractSet.removeAll() in order to avoid an expensive call to size(). But it
always iterates its arg and removes from this, which is backwards from
everywhere else. (But at least it doesn't do any size-based "optimization".) I
think this is simply a bug. I've filed JDK-8218945 [2] to cover this.
**
If we all agree on this, I'll proceed with working up a changeset for
JDK-6394757.[3] It's time to fix this one.
s'marks
[1]
http://hg.openjdk.java.net/jdk/jdk11/file/1ddf9a99e4ad/src/java.base/share/classes/java/util/concurrent/ConcurrentHashMap.java#l4538
[2] https://bugs.openjdk.java.net/browse/JDK-8218945
[3] https://bugs.openjdk.java.net/browse/JDK-6394757
More information about the core-libs-dev
mailing list