JDK-6982173: Small problem causing thousands of wasted CPU hours
Stuart Marks
stuart.marks at oracle.com
Mon Feb 25 19:06:15 UTC 2019
On 2/15/19 11:30 AM, Alan Snyder wrote:
> I think this situation is a mess.
>
> The “general contract” of a Collection, I believe, is that it contains zero
> or more identified member objects, as defined by appropriate equals() method.
> I know this is hard to specify precisely, but I presume we all “know it when
> we see it”.
>
> There is value to “collections” whose members are not objects but are
> equivalence classes of objects, as defined by a nonstandard equality test,
> but I think it is a mistake to call them Collections.
>
> If a method takes a parameter of type Collection, it should be free to assume
> that the parameter object supports the “general contract” of Collection. Is
> there any plausible alternative?
[back from vacation]
Yes, this situation is something of a mess.
I think the original idea was that the "general contract" was that collections
contained elements whose membership was defined by equals(). However, for a very
long time now there have been collections in the JDK such as SortedSet and
IdentityHashMap (more precisely, IdentityHashMap's keyset). I think we have to
accept the fact that these are collections even though their membership
semantics differ from those provided by equals().
> Changing one method in the JDK to support a non-standard Collection parameter
> does not solve the problem, because non-JDK methods with Collection (etc.)
> parameters could have similar “misbehavior”. How would the developer know
> when a specific TreeSet can or cannot be passed to a method? Does every
> method that accepts a Collection (etc.) parameter require boilerplate (as in
> the disjoint example) explaining its exact requirements or how it can go
> wrong?
>
> Perhaps it would be useful to define specific behaviors that these
> nonstandard “collections” might support. For example, a Membership interface
> (with method contains(e)) would be perfect for a removeAll(Membership) method
> on Collection, implemented as you propose.
The wording on Collections.disjoint() is particularly clumsy. I'm not entirely
sure what to do about it and its implementation; suggestions welcome. But I also
don't think Collections.disjoint() is a very important problem, compared to
what's going on in the rest of the framework.
Introducing a new interface type is basically a non-starter. Objects of type
Collection get passed around a lot, and users can usefully rely on things like
size(), they can iterate the Collection, etc.
Once we've admitted to the fact that different collections can have different
membership semantics, the main point of this bug (JDK-6982173) is to fix up the
semantics of AbstractSet.removeAll() to remove the semantic shifting that can
occur based on the relative sizes of the collections.
There's another bug JDK-8190545 that needs to be fixed at some point, which is a
larger scale cleanup to clarify and remove assumptions about collection
membership. It starts out from the point that SortedSet and its ilk define
membership using a Comparator; yet TreeSet.contains() still talks about using
equals(), which is simply incorrect. Pulling on that thread for a while reveals
a bunch of places in the spec that need adjustment. See the comments in the bug
report and the linked email discussions.
After these are fixed, the specification and implementation will at least be
self-consistent. However, it will still be the case that HashSet, TreeSet, and
IdentityHashMap's keyset will have different membership semantics, and if you
pass them around interchangeably you might get unexpected results. I think we
just have to live with that.
s'marks
More information about the core-libs-dev
mailing list