RFR [15] 6394757: rev2: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes
Stuart Marks
stuart.marks at oracle.com
Fri May 8 20:49:27 UTC 2020
On 5/6/20 5:05 PM, Alan Snyder wrote:
> Let me clarify. I was referring to the non-support by the framework, which
> restricts membership semantics:
> ...[specification text regarding equals()]...
OK, thanks for the clarification.
> Although the framework provides implementation classes that MAY be used to
> create nonconforming collections, these classes are called out as potentially
> non-conforming. For example:
[IdentityHashMap]
> *This class is /not/ a general-purpose |Map| implementation! While this class
> implements the |Map| interface, it intentionally violates |Map's| general
> contract, which mandates the use of the |equals| method when comparing objects.
> This class is designed for use only in the rare cases wherein reference-equality
> semantics are required.*
>
> And in many places, the implications of non-conformance are also called out:
>
> *While the object returned by this method implements the |Collection| interface,
> it does /not/ obey |Collection's| general contract. Like its backing map, the
> collection returned by this method defines element equality as
> reference-equality rather than object-equality. This affects the behavior of its
> |contains|, |remove| and |containsAll| methods.*
>
> The phrase “affects the behavior” is open ended. It means all bets are off.
No, all bets are not off.
"All bets are off" isn't an unreasonable interpretation of the current wording
of the specification, but in fact the specification is incredibly poorly worded
and confusing. Plus there are several out-and-out bugs in the spec.
It's also somewhat odd to say that some collection implementations are
conforming whereas others aren't. Implementations can have bugs that render them
non-conformant to a specification. In this case, though, parts of the
specification contradict each other. You could pick part A of the spec and say
that part B is "non-conformant" but that really doesn't make sense if you
consider the totality of the specification. The fact is that the spec as a whole
is self-inconsistent. I intend to fix that.
> My point is that (fully) supporting custom membership semantics changes a
> fundamental property of the original design and should be approached as a
> redesign. It is not a matter of fixing a series of issues one at a time, as you
> discover them. Calling these issues semantic bugs when there is no violation of
> the specification is leading you down the wrong path, in my opinion.
The issue of membership semantics is part of the original design.
If you look at the history, the collections framework was added in JDK 1.2, and
the JDK 1.2 specification includes both Set and SortedSet. There is the wording
in the SortedSet specification that is very similar to what exists today:
[edited for brevity]
Note that the ordering maintained by a sorted set must be consistent with
equals if the sorted set is to correctly implement the Set interface.
This is so because the Set interface is defined in terms of the equals
operation, but a sorted set performs all element comparisons using its
compareTo (or compare) method. The behavior of a sorted set is well-defined
even if its ordering is inconsistent with equals; it just fails to obey the
general contract of the Set interface.
The original designers (including Josh Bloch, and probably others) clearly
thought about these issues enough to put that wording in the specification. I
don't think they thought through the full implications of this, or at least they
didn't write it down, hence the vague wording above.
In addition, the issue was forgotten, and over time, bugs were introduced in the
system that made things worse. For example, the "optimization" that is the root
cause of the bug we are discussing (JDK-6394757) was introduced in JDK 1.3. In
the comments on this bug report [1] Bloch is quoted as saying
The spec clearly states that removeAll "Removes all this collection's
elements that are also contained in the specified collection." So the
current behavior is not just unpredictable, it's broken. It worked until
1.3, and was broken in 1.3-1.5. Sigh...
(And indeed, AbstractCollection.removeAll does precisely what the
Collection.removeAll spec demands)
[1]
https://bugs.openjdk.java.net/browse/JDK-6394757?focusedCommentId=12242803&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12242803
Another bug, a spec bug this time, was introduced later. (I haven't searched the
history to find out exactly when.) The TreeSet.add() method specification says
this: [2]
Adds the specified element to this set if it is not already present. More
formally, adds the specified element e to this set if the set contains no
element e2 such that Objects.equals(e, e2). If this set already contains the
element, the call leaves the set unchanged and returns false.
[2]
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/TreeSet.html#add(E)
This is simply wrong, as TreeSet membership is determined by the comparison
method, not by equals().
> What would (full) support for custom membership semantics look like? What
> restrictions would you place on the contains() method? What axioms would you
> specify for the generic operations? Is substitutability tossed out the window?
> What does it mean for an object to be a member of a set (according to contains)
> but not be returned by the iterator or included in the size? What should
> Set.copyOf() do if its argument is a non-conforming collection? etc. etc.
I think it's fair to raise all of these questions if "all bets are off" but
indeed all bets are not off. I think the root of the problem is in the statement
from the specification,
The behavior of a sorted set is well-defined even if its ordering is
inconsistent with equals; it just fails to obey the general contract
of the Set interface.
This statement is essentially impossible to understand without looking at the
history, various bugs (such as this one), the design intent of the collections
framework, etc. What it really means is the following.
A SortedSet being "well-defined" answers most of your questions above. The
obvious axioms all still apply. If you add an element and add() returns true,
then contains() on that element should return true, and iterating the set should
at some point return that element, etc.
The "general contract" phrase that appears in several places doesn't mean "the
totality of the Set interface" but instead it means specifically what we've been
calling the "membership contract" of the set. That is,
Sets contain no pair of elements e1 and e2 such that e1 $ e2.
Where $ is:
compare(e1, e2) == 0 for SortedSet
== for sets built from IdentityHashMap
equals() for Sets excluding the above
Once it's clear that "general contract" really means membership contract, many
things become clearer. Most bets in fact are still on. Issues arise only when
there are cases where there are two sets at play, and the question of which
set's membership contract is in use needs to be specified. For copy
constructors, Set.copyOf, and addAll(), it's clear that elements are added
according to the receiver's membership contract. Other methods require a bit
more analysis:
containsAll()
equals()
removeAll()
retainAll()
The containsAll() and equals() methods both use the membership contract of the
receiver, not the argument. Unfortunately, the equals() specification says,
Returns true if the specified object is also a set, the two sets have the
same size, and every member of the specified set is contained in this set
(or equivalently, every member of this set is contained in the specified
set).
As should be clear from this discussion, the "equivalently" clause is incorrect
-- another spec bug.
Finally, removeAll() and retainAll() use the membership semantics of the
argument. I've discussed this elsewhere in this thread.
To summarize, fixing this bug is part of a larger effort to clean up various
ambiguities and bugs in the specifications, as well as bugs in implementations.
The fact that I'm proceeding incrementally shouldn't be interpreted to mean that
I'm fixing semantic issues one by one as I come across them.
> I understand that trade-offs are necessary, and if I understand correctly, the
> CSR process should review intentional regressions. Do you agree?
Yes, I will definitely file a CSR for this, to include specification changes as
well as the behavioral and performance tradeoffs.
s'marks
More information about the core-libs-dev
mailing list