RFR [15] 6394757: rev2: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes
Stuart Marks
stuart.marks at oracle.com
Tue May 12 20:34:18 UTC 2020
On 5/8/20 6:46 PM, Jason Mehrens wrote:
> I assume Deque/Queue would suffer the same issue as List. It would be nice to some how ensure ArrayDeque.contains is not called as far as the heuristic goes. Looks like PriorityQueue uses equals for contains but that is not to say there are not other queue implementations in the wild that do something different.
The difficulty here is how far to take the heuristic. The goal here is not to
guarantee that behavior can never be O(M*N) but to catch what seem to be the
most common cases. I think the most common case is where the argument is a List,
but there are lots of possibilities that we'd inevitably miss.
> The more I think about it in general, it seems like your task would be easier if you could declare:
> 1. AbstractCollection.removeAll should assume this.contains is O(N) and iterate over this and check contains on arg.
> 2. AbstractSet.removeAll should iterate over argument and assume that this.contains/remove is at least O(log (N)) and assume argument.contains is O(N).
> 3. Concrete implementations of removeAll know the cost of their contains method. If it is known to be O(N) then walk this otherwise walk the arg.
> 4. Include an example in removeAll that says if membership semantics differ between sets use: source.removeIf(e -> removals.contains(e)); or removals.forEach(e -> source.remove(e)); instead. If they don't differ then use removeAll.
This sort of makes sense from a performance standpoint, but it doesn't address
the semantic issues I raised in my reply the other day to Jens Lideström.
Specifically, we don't want Set.removeAll or AbstractSet.removeAll to disagree
with Collection.removeAll and AbstractCollection.removeAll. We also want
removeAll() to be the complement of retainAll().
> Given this has been around since JDK 1.3 would it be safe to assume that code that is mixing collections with different membership semantics is already working around this issue and therefore the risk is a bit lower in changing the spec? Lots of concrete implementations already override removeAll anyway.
I'm not sure that's true. People keep running into either the performance
problem or the semantic problem. Sometimes people live with bugs for a long time
and can't figure it out. "Yeah, removeAll sometimes gives this weird result. I
don't know why. Maybe I just don't understand Java."
s'marks
More information about the core-libs-dev
mailing list