RFR [15] 6394757: rev2: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes
Jason Mehrens
jason_mehrens at hotmail.com
Sat May 2 05:41:35 UTC 2020
1. I assume you are using "c instanceof List" instead of "!(c instanceof Set)" to correctly handle IdentitityHashMap.values()? The instanceof List seems like safe choice but it is too bad we can still fool that check by wrapping List as an unmodifiableCollection. If splitIterator().characteristics() could tell us that the collection used identity comparisons I think we would be able to determine if it was safe to swap the ordering in the general case as we could check for IDENTITY, SORTED, and comparator equality.
2. Should code applied to HashSet.removeAll also be applied to HashMap.keySet().removeAll and HashMap.entrySet().removeAll? Collections::newSetFromMap will end up having different behavior if it is not consistently applied.
3. Not to derail this thread but do think we need a new JIRA ticket to address Collections.disjoint? Seems like it has similar sins of calling size and using "!(c2 instanceof Set)" but I don't want to muddy the waters by covering any solutions to that method in this thread.
Jason
________________________________________
From: core-libs-dev <core-libs-dev-bounces at openjdk.java.net> on behalf of Stuart Marks <stuart.marks at oracle.com>
Sent: Friday, May 1, 2020 3:01 PM
To: core-libs-dev
Subject: RFR [15] 6394757: rev2: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes
Hi all,
I'm taking another swing at this one. I've taken a minimal approach compared to
my previous proposals.
Briefly, AbstractSet.removeAll switches from iterating this collection and
calling contains() on its argument, to iterating the argument and calling
this.contains(), if the argument collection is smaller than this collection.
This attempt at optimization is incorrect, because this collection and the
argument collection might have different semantics for contains().
There is a confounding performance problem, which is that if the argument is a
List, contains() is generally linear, which can result in pathologically slow
(O(m*n)) performance. Because of the iteration-switching above, this performance
problem can appear and disappear based on the relative sizes, which is startling.
The fix for the semantic problem is simply to remove the attempted optimization
from AbstractSet. This comports with the specification of Collection.removeAll
and Set.removeAll, which imply that contains() is called on the argument
collection. This allows sets to inherit the implementation in
AbstractCollection.removeAll. In addition, this ensures that removeAll is now
always the complement of retainAll (as it should be), which it sometimes was not
when the optimization was in place.
IdentityHashMap's keySet and entrySet views were broken by this optimization, so
they had to override removeAll with copies of the implementation from
AbstractCollection. Since they can now inherit from AbstractCollection, these
overrides have been removed.
This leaves a potential performance problem with removeAll when the argument is
a List. To mitigate this, HashSet.removeAll switches to iterating the argument
if the argument is a List. This is safe, as both HashSet and List use the same
semantics for contains() (at least, as far as I know).
I've opted not to pursue size-based optimizations at this time, since they
provide only incremental benefit, whereas the pathological performance problem
mentioned above is the primary issue. Also, it's actually quite difficult to
determine when it's safe to switch iteration.
Finally, I've added performance notes to the specifications of containsAll(),
removeAll(), and retainAll(), warning about potential performance problems that
can occur with repeated calls to contains() made by these methods.
Bug:
https://bugs.openjdk.java.net/browse/JDK-6394757
Webrev:
http://cr.openjdk.java.net/~smarks/reviews/6394757/webrev.2/
Previous discussions:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-July/007125.html
http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-January/058140.html
http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-February/058378.html
http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-May/060007.html
http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-May/060147.html
Thanks,
s'marks
More information about the core-libs-dev
mailing list