RFR (M): JDK-6394757: rev1: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes
Peter Levart
peter.levart at gmail.com
Fri May 17 06:48:34 UTC 2019
Hi Alan,
I can sympathize with the performance loss aspect of this patch, but I
nevertheless think it is a move in the right direction. You must admit
that the performance optimization of AbstractSet.removeAll was a rather
shaky one (dependent on the relative sizes of the two collections) and
as such very prone to sudden performance spikes that could occur for
seemingly unknown reason. More importantly, the fact that the semantics
of the method could change drastically dependent on such thing as
relative sizes of the two collections was inexplicable. You may argue
that the semantics of the method is undefined if called upon or passed
an ersatz Collection instance, but even the undefined semantics may be
consistent - i.e. independent of such things as relative sizes of both
collections.
It would be a perfect world if this optimization was not there to begin
with. People would eventually find bottlenecks using this method and
change their code to use explicit iteration or such. There may be
perfectly valid code out there that uses conformant collection
implementations that don't cause AbstractSet.removeAll to behave
inconsistently and such code after this patch may perform badly. But it
will perform consistently for cases that use ersatz Collections and that
is more important in my opinion.
As Stuart says, optimizations mustn't change semantics. So is there a
possibly narrower optimization, that doesn't change the semantics?
Here's a sketch of one:
- create a marker interface (could be JDK-internal) and mark all
conforming Set implementations with it
- use AbstractSet.removeAll optimization only when both collections
implement the marker interface
Regards, Peter
On 5/17/19 7:06 AM, Alan Snyder wrote:
> Hi Stuart,
>
> I believe I already accepted the fact of ersatz Collections.
>
> Could you explain the inconsistency in the specification that affects removeAll()? I don’t see it.
>
> In the current specification, the classes that can create ersatz Collections all call out the ersatz Collections in their documentation.
>
> SortedSet:
> Note that the ordering maintained by a sorted set (whether or not an
> explicit comparator is provided) must be <i>consistent with equals</i> if
> the sorted set is to correctly implement the {@code Set} interface.
> The behavior of a sorted set <i>is</i> well-defined even if its
> ordering is inconsistent with equals; it just fails to obey the general
> contract of the {@code Set} interface.
>
> TreeSet is similar
>
> IdentityHashMap:
>
> This class is <i>not</i> a general-purpose {@code Map} implementation! While this class implements the {@code Map} interface, it
> intentionally violates {@code Map's} general contract, which mandates the
> use of the {@code equals} method when comparing objects.
>
> IdentityHashMap.keySet:
>
> While the object returned by this method implements the
> {@code Set} interface, it does <i>not</i> obey {@code Set's} general
> contract. Like its backing map, the set returned by this method
> defines element equality as reference-equality rather than
> object-equality. This affects the behavior of its {@code contains},
> {@code remove}, {@code containsAll}, {@code equals}, and
> {@code hashCode} methods.
>
> The logical implication of not supporting the general contract is that Collection operations performed on an ersatz Collection may have
> unexpected (implementation specific) effects, which implies that passing one of these ersatz Collections to a parameter of a Collection
> type may have unexpected (implementation specific) effects.
>
> In other words, whether removeAll is performed on an ersatz Collection or the parameter is an ersatz Collection, the effects may be
> implementation specific and inconsistent with the specification of removeAll.
>
> That is not to say that improving the design would not have benefit, of course it does. The bug reports you cite prove that.
> However, the compatibility impact of the change needs to be taken into account.
>
> Alan
>
>
>
>> On May 16, 2019, at 8:42 PM, Stuart Marks <stuart.marks at oracle.com> wrote:
>>
>> Hi Alan,
>>
>> Whether you call them "ersatz" collections, "non-standard" collections, or collections with different membership semantics, they have been around in the collections framework from day one. This is perhaps unfortunate, and it may be considered to be bad design, but it is a fact.
>>
>> The issue is not with undefined or implementation-specific behavior. The issue is that the specification as it stands is self-contradictory. In particular, it assumes that certain operations are symmetric that in fact are not, if you read other parts of the spec and derive logical conclusions from them. This is what I'm trying to fix. This is not a mere a matter of intellectual consistency. This state of affairs has real consequences. There is a whole family of bugs linked to this one which complain both about the performance issue and the semantic issue. All of these bugs are directly related to the self-contradictory nature of the current specification. With my changes, the spec is self-contradictory in fewer places. (There is, however, more work to be done; see JDK-8190545.)
>>
>> Your lead point, though, is about performance. The performance optimization of AbstractSet.removeAll is one of those that assumes that the operation is symmetric, when in fact it isn't. As a rule, optimizations mustn't change semantics. Therefore, it has to go.
>>
>> s'marks
>>
More information about the core-libs-dev
mailing list