RFR (M): JDK-6394757: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes

Fri May 3 01:36:03 UTC 2019

Hi all,

Please review these spec and implementation changes to remove the "optimization" 
to AbstractSet.removeAll. Briefly, this method was specified (and implemented) 
to iterate one collection or the other depending on the relative sizes of the 
collections. The problem is that this would cause an unexpected semantic shift, 
since one or the other collection's contains() method would be called depending 
on their relative sizes, and the contains() methods might implement different 
semantics depending upon the kind of collection.

The fix is to remove the specification and implementation of 
AbstractSet.removeAll and to inherit AbstractCollection.removeAll, which does 
the iteration one way consistently.

I've removed overriding removeAll method implementations from IdentityHashMap's 
view sets which were added in order to avoid the "optimization" inherited from 
AbstractSet.removeAll.

I've added some words to the Collection interface to introduce the term 
"membership semantics" for a concept that's been around for a long time but 
which never had a name, essentially the contains() method. I've then updated the 
specifications of containsAll, removeAll, retainAll, and (where necessary) 
equals to specify which collection's membership semantics are used.

Finally, since this change may introduce some performance issues the 
optimization was intended to avoid, I've added some implementation notes to the 
various methods to warn about potential performance issues if this collection's 
(or the other's) contains() method is linear or worse.

There have been various discussions in the past (see JDK-8178425 for example) 
that propose optimizations to the various bulk operations. They usually involve 
modifying the decision criteria for iterating this collection vs iterating the 
other collection. However, as we concluded previously, doing this introduces 
semantic problems. Another approach to optimizating the bulk cases is to copy 
the other collection into an intermediate collection that has an O(1) contains() 
method; however, this also changes the semantics and thus must be ruled out. 
Such approaches should be left to the caller.

Bug:

     https://bugs.openjdk.java.net/browse/JDK-6394757

Previous discussions:

     http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-January/058140.html
     http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-February/058378.html

     (see also additional bugs and linked to the above bug)

Webrev:

     http://cr.openjdk.java.net/~smarks/reviews/6394757/webrev.0/

Thanks,

s'marks