RFR (M): JDK-6394757: rev1: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes

Peter Levart peter.levart at gmail.com
Fri May 17 06:48:34 UTC 2019


Hi Alan,

I can sympathize with the performance loss aspect of this patch, but I 
nevertheless think it is a move in the right direction. You must admit 
that the performance optimization of AbstractSet.removeAll was a rather 
shaky one (dependent on the relative sizes of the two collections) and 
as such very prone to sudden performance spikes that could occur for 
seemingly unknown reason. More importantly, the fact that the semantics 
of the method could change drastically dependent on such thing as 
relative sizes of the two collections was inexplicable. You may argue 
that the semantics of the method is undefined if called upon or passed 
an ersatz Collection instance, but even the undefined semantics may be 
consistent  - i.e. independent of such things as relative sizes of both 
collections.

It would be a perfect world if this optimization was not there to begin 
with. People would eventually find bottlenecks using this method and 
change their code to use explicit iteration or such. There may be 
perfectly valid code out there that uses conformant collection 
implementations that don't cause AbstractSet.removeAll to behave 
inconsistently and such code after this patch may perform badly. But it 
will perform consistently for cases that use ersatz Collections and that 
is more important in my opinion.

As Stuart says, optimizations mustn't change semantics. So is there a 
possibly narrower optimization, that doesn't change the semantics?

Here's a sketch of one:
- create a marker interface (could be JDK-internal) and mark all 
conforming Set implementations with it
- use AbstractSet.removeAll optimization only when both collections 
implement the marker interface


Regards, Peter

On 5/17/19 7:06 AM, Alan Snyder wrote:
> Hi Stuart,
>
> I believe I already accepted the fact of ersatz Collections.
>
> Could you explain the inconsistency in the specification that affects removeAll()? I don’t see it.
>
> In the current specification, the classes that can create ersatz Collections all call out the ersatz Collections in their documentation.
>
> SortedSet:
> Note that the ordering maintained by a sorted set (whether or not an
> explicit comparator is provided) must be <i>consistent with equals</i> if
> the sorted set is to correctly implement the {@code Set} interface.
> The behavior of a sorted set <i>is</i> well-defined even if its
> ordering is inconsistent with equals; it just fails to obey the general
> contract of the {@code Set} interface.
>
> TreeSet is similar
>
> IdentityHashMap:
>
> This class is <i>not</i> a general-purpose {@code Map} implementation!  While this class implements the {@code Map} interface, it
> intentionally violates {@code Map's} general contract, which mandates the
> use of the {@code equals} method when comparing objects.
>
> IdentityHashMap.keySet:
>
> While the object returned by this method implements the
> {@code Set} interface, it does <i>not</i> obey {@code Set's} general
> contract.  Like its backing map, the set returned by this method
> defines element equality as reference-equality rather than
> object-equality.  This affects the behavior of its {@code contains},
> {@code remove}, {@code containsAll}, {@code equals}, and
> {@code hashCode} methods.
>
> The logical implication of not supporting the general contract is that Collection operations performed on an ersatz Collection may have
> unexpected (implementation specific) effects, which implies that passing one of these ersatz Collections to a parameter of a Collection
> type may have unexpected (implementation specific) effects.
>
> In other words, whether removeAll is performed on an ersatz Collection or the parameter is an ersatz Collection, the effects may be
> implementation specific and inconsistent with the specification of removeAll.
>
> That is not to say that improving the design would not have benefit, of course it does. The bug reports you cite prove that.
> However, the compatibility impact of the change needs to be taken into account.
>
>    Alan
>
>
>
>> On May 16, 2019, at 8:42 PM, Stuart Marks <stuart.marks at oracle.com> wrote:
>>
>> Hi Alan,
>>
>> Whether you call them "ersatz" collections, "non-standard" collections, or collections with different membership semantics, they have been around in the collections framework from day one. This is perhaps unfortunate, and it may be considered to be bad design, but it is a fact.
>>
>> The issue is not with undefined or implementation-specific behavior. The issue is that the specification as it stands is self-contradictory. In particular, it assumes that certain operations are symmetric that in fact are not, if you read other parts of the spec and derive logical conclusions from them. This is what I'm trying to fix. This is not a mere a matter of intellectual consistency. This state of affairs has real consequences. There is a whole family of bugs linked to this one which complain both about the performance issue and the semantic issue. All of these bugs are directly related to the self-contradictory nature of the current specification. With my changes, the spec is self-contradictory in fewer places. (There is, however, more work to be done; see JDK-8190545.)
>>
>> Your lead point, though, is about performance. The performance optimization of AbstractSet.removeAll is one of those that assumes that the operation is symmetric, when in fact it isn't. As a rule, optimizations mustn't change semantics. Therefore, it has to go.
>>
>> s'marks
>>



More information about the core-libs-dev mailing list