JDK-6982173: Small problem causing thousands of wasted CPU hours

Alan Snyder javalists at cbfiddle.com
Mon Feb 25 20:33:56 UTC 2019


> I think the original idea was that the "general contract" was that collections contained elements whose membership was defined by equals(). However, for a very long time now there have been collections in the JDK such as SortedSet and IdentityHashMap (more precisely, IdentityHashMap's keyset). I think we have to accept the fact that these are collections even though their membership semantics differ from those provided by equals().

Perhaps we must accept their placement in the type hierarchy for compatibility reasons, but it seems reasonable to me to declare these exceptional classes to be nonstandard, which means that their behavior when used as a Collection might be unexpected.

An alternative, perhaps, would be to add methods that describe the nature of the membership, e.g. whether based on equals/hash or a comparator (or something else?), which would then become part of the general contract. You’ll probably call this a non-starter :-)

> After these are fixed, the specification and implementation will at least be self-consistent. However, it will still be the case that HashSet, TreeSet, and IdentityHashMap's keyset will have different membership semantics, and if you pass them around interchangeably you might get unexpected results. I think we just have to live with that.


I’m not sure what you mean by interchangeably. A TreeSet could misbehave when passed to a TreeSet if the method assumes the “general contract” is satisfied by its parameter. Or are you adding a special-case requirement that every method on a Collection that takes a Collection parameter must behave properly when the actual argument is an instance of the same class, or something like that?

> Once we've admitted to the fact that different collections can have different membership semantics, the main point of this bug (JDK-6982173) is to fix up the semantics of AbstractSet.removeAll() to remove the semantic shifting that can occur based on the relative sizes of the collections.

If we declare that collections can have different membership semantics and still satisfy the general contract of Collection, then we need to come up with a new specification of the general contract.

However, if we specify that membership based on anything other than equals/hash is nonstandard, then this “bug” is not a bug. It is a valid implementation on the assumption that the parameter satisfies the general contract of the parameter type. The behavior that you call a semantic shift, I would call an unexpected behavior resulting from using a nonstandard collection where a Collection was expected. I also believe that the performace penalty of fixing this bug is “astonishing”. One could argue whether the performance penalty is more or less astonishing than the unexpected behavior when a non-standard collection is used as the parameter.

> Introducing a new interface type is basically a non-starter.


Hopefully, what you mean is that introducting a new interface type and incompatibly rearranging the type hierarchy is a non-starter.

I still think it would be useful to:

1. introduce a Membership interface

2. have Collection extend Membership

3. add a method to Collection: removeAllMembers(Membership)

> Objects of type Collection get passed around a lot, and users can usefully rely on things like size(), they can iterate the Collection, etc.

Except when they are implementing something like removeAll()? They can iterate but not assume that they are getting all the objects that are members of the collection, but instead they are getting representative objects?

  Alan


> On Feb 25, 2019, at 11:06 AM, Stuart Marks <stuart.marks at oracle.com> wrote:
> 
> 
> 
> On 2/15/19 11:30 AM, Alan Snyder wrote:
>> I think this situation is a mess.
>> The “general contract” of a Collection, I believe, is that it contains zero
>> or more identified member objects, as defined by appropriate equals() method.
>> I know this is hard to specify precisely, but I presume we all “know it when
>> we see it”.
>> There is value to “collections” whose members are not objects but are
>> equivalence classes of objects, as defined by a nonstandard equality test,
>> but I think it is a mistake to call them Collections.
>> If a method takes a parameter of type Collection, it should be free to assume
>> that the parameter object supports the “general contract” of Collection. Is
>> there any plausible alternative?
> 
> [back from vacation]
> 
> Yes, this situation is something of a mess.
> 
> I think the original idea was that the "general contract" was that collections contained elements whose membership was defined by equals(). However, for a very long time now there have been collections in the JDK such as SortedSet and IdentityHashMap (more precisely, IdentityHashMap's keyset). I think we have to accept the fact that these are collections even though their membership semantics differ from those provided by equals().
> 
>> Changing one method in the JDK to support a non-standard Collection parameter
>> does not solve the problem, because non-JDK methods with Collection (etc.)
>> parameters could have similar “misbehavior”. How would the developer know
>> when a specific TreeSet can or cannot be passed to a method? Does every
>> method that accepts a Collection (etc.) parameter require boilerplate (as in
>> the disjoint example) explaining its exact requirements or how it can go
>> wrong?
>> Perhaps it would be useful to define specific behaviors that these
>> nonstandard “collections” might support. For example, a Membership interface
>> (with method contains(e)) would be perfect for a removeAll(Membership) method
>> on Collection, implemented as you propose.
> The wording on Collections.disjoint() is particularly clumsy. I'm not entirely sure what to do about it and its implementation; suggestions welcome. But I also don't think Collections.disjoint() is a very important problem, compared to what's going on in the rest of the framework.
> 
> Introducing a new interface type is basically a non-starter. Objects of type Collection get passed around a lot, and users can usefully rely on things like size(), they can iterate the Collection, etc.
> 
> Once we've admitted to the fact that different collections can have different membership semantics, the main point of this bug (JDK-6982173) is to fix up the semantics of AbstractSet.removeAll() to remove the semantic shifting that can occur based on the relative sizes of the collections.
> 
> There's another bug JDK-8190545 that needs to be fixed at some point, which is a larger scale cleanup to clarify and remove assumptions about collection membership. It starts out from the point that SortedSet and its ilk define membership using a Comparator; yet TreeSet.contains() still talks about using equals(), which is simply incorrect. Pulling on that thread for a while reveals a bunch of places in the spec that need adjustment. See the comments in the bug report and the linked email discussions.
> 
> After these are fixed, the specification and implementation will at least be self-consistent. However, it will still be the case that HashSet, TreeSet, and IdentityHashMap's keyset will have different membership semantics, and if you pass them around interchangeably you might get unexpected results. I think we just have to live with that.
> 
> s'marks
> 



More information about the core-libs-dev mailing list