RFR (M): JDK-6394757: rev1: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes

Stuart Marks stuart.marks at oracle.com
Mon May 20 23:53:48 UTC 2019


On 5/16/19 11:48 PM, Peter Levart wrote:
> As Stuart says, optimizations mustn't change semantics. So is there a possibly 
> narrower optimization, that doesn't change the semantics?
> 
> Here's a sketch of one:
> - create a marker interface (could be JDK-internal) and mark all conforming Set 
> implementations with it
> - use AbstractSet.removeAll optimization only when both collections implement 
> the marker interface

Hi Peter,

It's certainly reasonable for there to be some concern over the potential 
performance loss. I don't particularly like the fact that even as we're 
improving the semantics, that some programs will suffer a slowdown, even those 
that aren't actually affected by the semantic issues.

However, I'm not convinced that we really understand the performance issue. When 
this was brought up in January, [1] the claim was that this wasted thousands of 
CPU hours. I'm skeptical. To back this up, in a followup message [2] the 
original poster linked to a Stack Overflow question [3] that turns out to have 
been derived from an old article by Jon Skeet [4]. To be sure, this is an 
interesting technical discussion, but it doesn't count as evidence that this is 
an actual performance problem.

If you look at the example closely, it turns out that it calls removeAll() and 
passes an argument a List with 300,000 elements. If you're doing operations on 
large lists, maybe you shouldn't be surprised to see a slowdown. Maybe instead 
you should have been surprised (in the iteration-switched case) that it ran 
reasonably fast at all.

Nonetheless it remains a possibility that some code that worked properly and 
performed reasonably well will suffer a slowdown. One way -- and I stress that 
this is only one way -- that performance might be improved is to switch the 
collection upon which iteration is done. The linked bug reports have a bunch of 
suggestions for doing this, including checking to see whether the argument is a 
List. To these I'd add the stipulation that, the iteration can be switched if it 
can be determined that it is *safe* to do so; and the safety depends on whether 
the two collections use compatible membership semantics.

Collections that use equals() are compatible with each other; keysets of 
IdentityHashMap are compatible with each other; SortedSets that use identical 
comparison methods are compatible with each other. But note, SortedSets that use 
a comparisons methods that are consistent with equals are also compatible with 
collections that use equals(). And it seems likely to be undecidable whether two 
different (!=) comparison methods are in fact compatible.

In any case it would interesting to try to chew on this notion of compatible 
membership semantics for a while. It might be difficult to come up with a 
complete and correct test. It's probably easier to come up with an 
approximation, like ConcurrentSkipListMap's equals() method. [5] I'm not sure 
such a discussion would be fruitful, though, until we're sure we understand how 
much of a performance problem this actually is.

s'marks




[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-January/058140.html

[2] http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-February/058378.html

[3] 
https://stackoverflow.com/questions/28671903/the-hashsett-removeall-method-is-surprisingly-slow

[4] 
https://codeblog.jonskeet.uk/2010/07/29/there-s-a-hole-in-my-abstraction-dear-liza-dear-liza/

[5] 
http://hg.openjdk.java.net/jdk/jdk11/file/1ddf9a99e4ad/src/java.base/share/classes/java/util/concurrent/ConcurrentSkipListMap.java#l1706


More information about the core-libs-dev mailing list