RFR [15] 6394757: rev2: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes

Stuart Marks stuart.marks at oracle.com
Tue May 12 20:34:18 UTC 2020



On 5/8/20 6:46 PM, Jason Mehrens wrote:
> I assume Deque/Queue would suffer the same issue as List.  It would be nice to some how ensure ArrayDeque.contains is not called as far as the heuristic goes.  Looks like PriorityQueue uses equals for contains but that is not to say there are not other queue implementations in the wild that do something different.

The difficulty here is how far to take the heuristic. The goal here is not to 
guarantee that behavior can never be O(M*N) but to catch what seem to be the 
most common cases. I think the most common case is where the argument is a List, 
but there are lots of possibilities that we'd inevitably miss.

> The more I think about it in general, it seems like your task would be easier if you could declare:
> 1. AbstractCollection.removeAll should assume this.contains is O(N) and iterate over this and check contains on arg.
> 2. AbstractSet.removeAll should iterate over argument and assume that this.contains/remove is at least O(log (N)) and assume argument.contains is O(N).
> 3. Concrete implementations of removeAll know the cost of their contains method.  If it is known to be O(N) then walk this otherwise walk the arg.
> 4. Include an example in removeAll that says if membership semantics differ between sets use: source.removeIf(e -> removals.contains(e)); or removals.forEach(e -> source.remove(e)); instead.  If they don't differ then use removeAll.

This sort of makes sense from a performance standpoint, but it doesn't address 
the semantic issues I raised in my reply the other day to Jens Lideström. 
Specifically, we don't want Set.removeAll or AbstractSet.removeAll to disagree 
with Collection.removeAll and AbstractCollection.removeAll. We also want 
removeAll() to be the complement of retainAll().

> Given this has been around since JDK 1.3 would it be safe to assume that code that is mixing collections with different membership semantics is already working around this issue and therefore the risk is a bit lower in changing the spec?  Lots of concrete implementations already override removeAll anyway.

I'm not sure that's true. People keep running into either the performance 
problem or the semantic problem. Sometimes people live with bugs for a long time 
and can't figure it out. "Yeah, removeAll sometimes gives this weird result. I 
don't know why. Maybe I just don't understand Java."

s'marks


More information about the core-libs-dev mailing list