RFR [15] 6394757: rev2: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes

Stuart Marks stuart.marks at oracle.com
Fri May 8 20:49:27 UTC 2020



On 5/6/20 5:05 PM, Alan Snyder wrote:
> Let me clarify. I was referring to the non-support by the framework, which 
> restricts membership semantics:
> ...[specification text regarding equals()]...

OK, thanks for the clarification.

> Although the framework provides implementation classes that MAY be used to 
> create nonconforming collections, these classes are called out as potentially 
> non-conforming. For example:

[IdentityHashMap]

> *This class is /not/ a general-purpose |Map| implementation! While this class 
> implements the |Map| interface, it intentionally violates |Map's| general 
> contract, which mandates the use of the |equals| method when comparing objects. 
> This class is designed for use only in the rare cases wherein reference-equality 
> semantics are required.*
> 
> And in many places, the implications of non-conformance are also called out:
> 
> *While the object returned by this method implements the |Collection| interface, 
> it does /not/ obey |Collection's| general contract. Like its backing map, the 
> collection returned by this method defines element equality as 
> reference-equality rather than object-equality. This affects the behavior of its 
> |contains|, |remove| and |containsAll| methods.*
> 
> The phrase “affects the behavior” is open ended. It means all bets are off.

No, all bets are not off.

"All bets are off" isn't an unreasonable interpretation of the current wording 
of the specification, but in fact the specification is incredibly poorly worded 
and confusing. Plus there are several out-and-out bugs in the spec.

It's also somewhat odd to say that some collection implementations are 
conforming whereas others aren't. Implementations can have bugs that render them 
non-conformant to a specification. In this case, though, parts of the 
specification contradict each other. You could pick part A of the spec and say 
that part B is "non-conformant" but that really doesn't make sense if you 
consider the totality of the specification. The fact is that the spec as a whole 
is self-inconsistent. I intend to fix that.

> My point is that (fully) supporting custom membership semantics changes a 
> fundamental property of the original design and should be approached as a 
> redesign. It is not a matter of fixing a series of issues one at a time, as you 
> discover them. Calling these issues semantic bugs when there is no violation of 
> the specification is leading you down the wrong path, in my opinion.

The issue of membership semantics is part of the original design.

If you look at the history, the collections framework was added in JDK 1.2, and 
the JDK 1.2 specification includes both Set and SortedSet. There is the wording 
in the SortedSet specification that is very similar to what exists today: 
[edited for brevity]

     Note that the ordering maintained by a sorted set must be consistent with
     equals if the sorted set is to correctly implement the Set interface.
     This is so because the Set interface is defined in terms of the equals
     operation, but a sorted set performs all element comparisons using its
     compareTo (or compare) method. The behavior of a sorted set is well-defined
     even if its ordering is inconsistent with equals; it just fails to obey the
     general contract of the Set interface.

The original designers (including Josh Bloch, and probably others) clearly 
thought about these issues enough to put that wording in the specification. I 
don't think they thought through the full implications of this, or at least they 
didn't write it down, hence the vague wording above.

In addition, the issue was forgotten, and over time, bugs were introduced in the 
system that made things worse. For example, the "optimization" that is the root 
cause of the bug we are discussing (JDK-6394757) was introduced in JDK 1.3. In 
the comments on this bug report [1] Bloch is quoted as saying

     The spec clearly states that removeAll "Removes all this collection's
     elements that are also contained in the specified collection." So the
     current behavior is not just unpredictable, it's broken. It worked until
     1.3, and was broken in 1.3-1.5. Sigh...

     (And indeed, AbstractCollection.removeAll does precisely what the
     Collection.removeAll spec demands)

[1] 
https://bugs.openjdk.java.net/browse/JDK-6394757?focusedCommentId=12242803&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12242803

Another bug, a spec bug this time, was introduced later. (I haven't searched the 
history to find out exactly when.) The TreeSet.add() method specification says 
this: [2]

     Adds the specified element to this set if it is not already present. More
     formally, adds the specified element e to this set if the set contains no
     element e2 such that Objects.equals(e, e2). If this set already contains the
     element, the call leaves the set unchanged and returns false.

[2] 
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/TreeSet.html#add(E)

This is simply wrong, as TreeSet membership is determined by the comparison 
method, not by equals().

> What would (full) support for custom membership semantics look like? What 
> restrictions would you place on the contains() method? What axioms would you 
> specify for the generic operations? Is substitutability tossed out the window? 
> What does it mean for an object to be a member of a set (according to contains) 
> but not be returned by the iterator or included in the size? What should 
> Set.copyOf() do if its argument is a non-conforming collection? etc. etc.

I think it's fair to raise all of these questions if "all bets are off" but 
indeed all bets are not off. I think the root of the problem is in the statement 
from the specification,

     The behavior of a sorted set is well-defined even if its ordering is
     inconsistent with equals; it just fails to obey the general contract
     of the Set interface.

This statement is essentially impossible to understand without looking at the 
history, various bugs (such as this one), the design intent of the collections 
framework, etc. What it really means is the following.

A SortedSet being "well-defined" answers most of your questions above. The 
obvious axioms all still apply. If you add an element and add() returns true, 
then contains() on that element should return true, and iterating the set should 
at some point return that element, etc.

The "general contract" phrase that appears in several places doesn't mean "the 
totality of the Set interface" but instead it means specifically what we've been 
calling the "membership contract" of the set. That is,

     Sets contain no pair of elements e1 and e2 such that e1 $ e2.

Where $ is:

     compare(e1, e2) == 0   for SortedSet
     ==                     for sets built from IdentityHashMap
     equals()               for Sets excluding the above

Once it's clear that "general contract" really means membership contract, many 
things become clearer. Most bets in fact are still on. Issues arise only when 
there are cases where there are two sets at play, and the question of which 
set's membership contract is in use needs to be specified. For copy 
constructors, Set.copyOf, and addAll(), it's clear that elements are added 
according to the receiver's membership contract. Other methods require a bit 
more analysis:

     containsAll()
     equals()
     removeAll()
     retainAll()

The containsAll() and equals() methods both use the membership contract of the 
receiver, not the argument. Unfortunately, the equals() specification says,

     Returns true if the specified object is also a set, the two sets have the
     same size, and every member of the specified set is contained in this set
     (or equivalently, every member of this set is contained in the specified
     set).

As should be clear from this discussion, the "equivalently" clause is incorrect 
-- another spec bug.

Finally, removeAll() and retainAll() use the membership semantics of the 
argument. I've discussed this elsewhere in this thread.

To summarize, fixing this bug is part of a larger effort to clean up various 
ambiguities and bugs in the specifications, as well as bugs in implementations. 
The fact that I'm proceeding incrementally shouldn't be interpreted to mean that 
I'm fixing semantic issues one by one as I come across them.

> I understand that trade-offs are necessary, and if I understand correctly, the 
> CSR process should review intentional regressions. Do you agree?

Yes, I will definitely file a CSR for this, to include specification changes as 
well as the behavioral and performance tradeoffs.

s'marks



More information about the core-libs-dev mailing list