RFR (M): JDK-6394757: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes
Stuart Marks
stuart.marks at oracle.com
Fri May 3 01:36:03 UTC 2019
Hi all,
Please review these spec and implementation changes to remove the "optimization"
to AbstractSet.removeAll. Briefly, this method was specified (and implemented)
to iterate one collection or the other depending on the relative sizes of the
collections. The problem is that this would cause an unexpected semantic shift,
since one or the other collection's contains() method would be called depending
on their relative sizes, and the contains() methods might implement different
semantics depending upon the kind of collection.
The fix is to remove the specification and implementation of
AbstractSet.removeAll and to inherit AbstractCollection.removeAll, which does
the iteration one way consistently.
I've removed overriding removeAll method implementations from IdentityHashMap's
view sets which were added in order to avoid the "optimization" inherited from
AbstractSet.removeAll.
I've added some words to the Collection interface to introduce the term
"membership semantics" for a concept that's been around for a long time but
which never had a name, essentially the contains() method. I've then updated the
specifications of containsAll, removeAll, retainAll, and (where necessary)
equals to specify which collection's membership semantics are used.
Finally, since this change may introduce some performance issues the
optimization was intended to avoid, I've added some implementation notes to the
various methods to warn about potential performance issues if this collection's
(or the other's) contains() method is linear or worse.
There have been various discussions in the past (see JDK-8178425 for example)
that propose optimizations to the various bulk operations. They usually involve
modifying the decision criteria for iterating this collection vs iterating the
other collection. However, as we concluded previously, doing this introduces
semantic problems. Another approach to optimizating the bulk cases is to copy
the other collection into an intermediate collection that has an O(1) contains()
method; however, this also changes the semantics and thus must be ruled out.
Such approaches should be left to the caller.
Bug:
https://bugs.openjdk.java.net/browse/JDK-6394757
Previous discussions:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-January/058140.html
http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-February/058378.html
(see also additional bugs and linked to the above bug)
Webrev:
http://cr.openjdk.java.net/~smarks/reviews/6394757/webrev.0/
Thanks,
s'marks
More information about the core-libs-dev
mailing list