JDK-6982173: Small problem causing thousands of wasted CPU hours

Jan Peter Stotz jpstotz at gmx.de
Sat Feb 2 17:42:24 UTC 2019


Hi Stuart.

It looks like the web sites and services I use when developing and those 
you use are fundamentally different or may be we use just different 
search engines, because when I have programming problems I usually not 
end up on Twitter or Reddit or DZone. And the existence of this mailing 
list seems to be hidden to the net or it's SE ranking is so bad that I 
can't remember that I ever found a link to it in the last years (before 
knowing it's existence and explicitly searching for it).
However when I search most likely the majority of links point to 
Stackoverflow.com.

Searching Stackoverflow for something like "[Java] [performance] 
Hashset" you immediatly get to the question
https://stackoverflow.com/questions/28671903 which has 65 upvotes and a 
view count of 8,085. For a 5 years old question this is IMHO pretty 
impressive.

Also one have to keep in mind that we are talking about a performance 
problem that requires usually an experienced developer and large data 
sets before the delay becomes so apparently that you start searching for 
the reason.

Your proposed replacement for AbstractSet.removeAll() looks nice and I 
agree with you regarding the arguments you pointed out on what 
collection used as "this" and what as argument. That should be clearly a 
decision of the developer.

That the change makes AbstractSet.removeAll "an outlier" is in my 
opinion only a problem from a very high-level perspective. Only people 
not aware of the differences between all the Collection implementations 
and the reason why they exist may expect to have a removeAll() 
implementation that works identical for every Collection implementation.

However regular programmers from my experience rarely come to a point 
where they use Collection. Personally for me Collection is more or less 
a synonym for Iterable because iterating over a Collection is one of the 
few properties that work without major (potential) performance issues 
for all implementations.

Jan

Am 25.01.2019 um 01:07 schrieb Stuart Marks:
> I wasn't aware that hundreds of developers are hitting this bug every 
> year. I haven't seen any mention of it (besides in the bug database) on 
> Twitter, on Reddit, on DZone, at the conferences I attend, or in several 
> years of core-libs-dev emails. Well, it was mentioned on core-libs-dev 
> once in 2011 [1] although that was a suggestion for improvement, not a 
> complaint about performance.
> 
> Nonetheless this is a real problem, and if it's causing difficulties we 
> can certainly take a look at it.
> 
> There is, however, more to the story. The ACTUAL problem is a semantic 
> one; see JDK-6394757. [2] Briefly, consider x.removeAll(y). Depending on 
> the relative sizes of x and y, this method might end up using either x's 
> or y's definition of membership, which could differ from each other. 
> (See the bug report for an example.) Thus the semantics of this method 
> depend upon the relative sizes of the collections, which is arguably 
> flawed.
> 
> Worse, this behavior is specified to iterate this set or the argument, 
> depending upon their relative sizes. [3] So, fixing this will require an 
> incompatible specification change.
> 
> The obvious way to fix this is to get rid of the "optimizations" (that 
> turn out not to be optimizations at all in some cases) and replace it 
> with a simple loop:
> 
>      public boolean removeAll(Collection<?> c) {
>          Objects.requireNonNull(c);
>          boolean modified = false;
>          for (Object e : c)
>              modified |= remove(e);
>          return modified;
>      }
> 
> I would argue that iterating the argument and calling remove() on "this" 
> are the right semantics, because you want set membership to be 
> determined by this set, not by whatever collection you pass as an 
> argument. However, I note that AbstractCollection.removeAll and other 
> removeAll implementations iterate over "this" and do a contains() check 
> on the argument. The revised AbstractSet.removeAll would be an outlier 
> here, though it makes sense to me to do it this way.
> 
> Is it worth the incompatibility?
> 
> s'marks
> 
> 
> 
> 
> [1] 
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-July/007125.html
> 
> [2] https://bugs.openjdk.java.net/browse/JDK-6394757
> 
> [3] 
> https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/AbstractSet.html#removeAll(java.util.Collection) 
> 
> 
> 



More information about the core-libs-dev mailing list