JDK-6982173: Small problem causing thousands of wasted CPU hours

Stuart Marks stuart.marks at oracle.com
Mon Feb 25 19:06:15 UTC 2019



On 2/15/19 11:30 AM, Alan Snyder wrote:
> I think this situation is a mess.
> 
> The “general contract” of a Collection, I believe, is that it contains zero
> or more identified member objects, as defined by appropriate equals() method.
> I know this is hard to specify precisely, but I presume we all “know it when
> we see it”.
> 
> There is value to “collections” whose members are not objects but are
> equivalence classes of objects, as defined by a nonstandard equality test,
> but I think it is a mistake to call them Collections.
> 
> If a method takes a parameter of type Collection, it should be free to assume
> that the parameter object supports the “general contract” of Collection. Is
> there any plausible alternative?

[back from vacation]

Yes, this situation is something of a mess.

I think the original idea was that the "general contract" was that collections 
contained elements whose membership was defined by equals(). However, for a very 
long time now there have been collections in the JDK such as SortedSet and 
IdentityHashMap (more precisely, IdentityHashMap's keyset). I think we have to 
accept the fact that these are collections even though their membership 
semantics differ from those provided by equals().

> Changing one method in the JDK to support a non-standard Collection parameter
> does not solve the problem, because non-JDK methods with Collection (etc.)
> parameters could have similar “misbehavior”. How would the developer know
> when a specific TreeSet can or cannot be passed to a method? Does every
> method that accepts a Collection (etc.) parameter require boilerplate (as in
> the disjoint example) explaining its exact requirements or how it can go
> wrong?
> 
> Perhaps it would be useful to define specific behaviors that these
> nonstandard “collections” might support. For example, a Membership interface
> (with method contains(e)) would be perfect for a removeAll(Membership) method
> on Collection, implemented as you propose.
The wording on Collections.disjoint() is particularly clumsy. I'm not entirely 
sure what to do about it and its implementation; suggestions welcome. But I also 
don't think Collections.disjoint() is a very important problem, compared to 
what's going on in the rest of the framework.

Introducing a new interface type is basically a non-starter. Objects of type 
Collection get passed around a lot, and users can usefully rely on things like 
size(), they can iterate the Collection, etc.

Once we've admitted to the fact that different collections can have different 
membership semantics, the main point of this bug (JDK-6982173) is to fix up the 
semantics of AbstractSet.removeAll() to remove the semantic shifting that can 
occur based on the relative sizes of the collections.

There's another bug JDK-8190545 that needs to be fixed at some point, which is a 
larger scale cleanup to clarify and remove assumptions about collection 
membership. It starts out from the point that SortedSet and its ilk define 
membership using a Comparator; yet TreeSet.contains() still talks about using 
equals(), which is simply incorrect. Pulling on that thread for a while reveals 
a bunch of places in the spec that need adjustment. See the comments in the bug 
report and the linked email discussions.

After these are fixed, the specification and implementation will at least be 
self-consistent. However, it will still be the case that HashSet, TreeSet, and 
IdentityHashMap's keyset will have different membership semantics, and if you 
pass them around interchangeably you might get unexpected results. I think we 
just have to live with that.

s'marks


More information about the core-libs-dev mailing list