RFR [15] 6394757: rev2: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes
Jens Lideström
jens.lidestrom at fripost.org
Sat May 2 18:26:21 UTC 2020
I will chip in once more on this topic. (I did so once before [1], perhaps
in a much too wordy manner.)
The proposed changes in AbstractSet.removeAll will be inherited by
TreeSet. A TreeSet might use a comparator as a custom equality test.
Because of this it seems backwards if TreeSet.removeAll where to use the
equality semantics of the argument instead of the ones of the receiver,
which will be the case with the proposed changes. It seems much more
intuitive and regular to instead use the equality semantics of the receiver.
The same thing applies to other collections with custom equality tests.
It seems to be surprising and error prone to users if TreeSet where to
behave like this:
var s = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
s.addAll(Set.of("A", "B"));
s.remove("a"); // Removes "A" as expected
s.removeAll(List.of("b")); // Ops, doesn't remove "B"!
Many other common methods use the equality semantics of the receiver:
containsAll, equals, remove, contains. To let TreeSet.removeAll be
inconsistent with all these methods seems worse than it being inconsistent
with the less commonly used TreeSet.retainAll.
The conclusion is that I think it would be better if TreeSet or
AbstractSet gets an implementation of removeAll which iterates over the
argument collection.
Best regards,
Jens Lideström
[1]:
https://mail.openjdk.java.net/pipermail/core-libs-dev/2019-July/061581.html
On 2020-05-01 22:01, Stuart Marks wrote:
> Hi all,
>
> I'm taking another swing at this one. I've taken a minimal approach
> compared to my previous proposals.
>
> Briefly, AbstractSet.removeAll switches from iterating this collection and
> calling contains() on its argument, to iterating the argument and calling
> this.contains(), if the argument collection is smaller than this
> collection. This attempt at optimization is incorrect, because this
> collection and the argument collection might have different semantics for
> contains().
>
> There is a confounding performance problem, which is that if the argument
> is a List, contains() is generally linear, which can result in
> pathologically slow (O(m*n)) performance. Because of the
> iteration-switching above, this performance problem can appear and
> disappear based on the relative sizes, which is startling.
>
> The fix for the semantic problem is simply to remove the attempted
> optimization from AbstractSet. This comports with the specification of
> Collection.removeAll and Set.removeAll, which imply that contains() is
> called on the argument collection. This allows sets to inherit the
> implementation in AbstractCollection.removeAll. In addition, this ensures
> that removeAll is now always the complement of retainAll (as it should
> be), which it sometimes was not when the optimization was in place.
>
> IdentityHashMap's keySet and entrySet views were broken by this
> optimization, so they had to override removeAll with copies of the
> implementation from AbstractCollection. Since they can now inherit from
> AbstractCollection, these overrides have been removed.
>
> This leaves a potential performance problem with removeAll when the
> argument is a List. To mitigate this, HashSet.removeAll switches to
> iterating the argument if the argument is a List. This is safe, as both
> HashSet and List use the same semantics for contains() (at least, as far
> as I know).
>
> I've opted not to pursue size-based optimizations at this time, since they
> provide only incremental benefit, whereas the pathological performance
> problem mentioned above is the primary issue. Also, it's actually quite
> difficult to determine when it's safe to switch iteration.
>
> Finally, I've added performance notes to the specifications of
> containsAll(), removeAll(), and retainAll(), warning about potential
> performance problems that can occur with repeated calls to contains() made
> by these methods.
>
> Bug:
>
> https://bugs.openjdk.java.net/browse/JDK-6394757
>
> Webrev:
>
> http://cr.openjdk.java.net/~smarks/reviews/6394757/webrev.2/
>
> Previous discussions:
>
>
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-July/007125.html
>
>
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-January/058140.html
>
>
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-February/058378.html
>
>
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-May/060007.html
>
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-May/060147.html
>
> Thanks,
>
> s'marks
>
More information about the core-libs-dev
mailing list