AbstractCollection.removeAll(Collection) and AbstractSet.removeAll(Collection)
Mike Duigou
mike.duigou at oracle.com
Wed Jul 13 20:34:43 UTC 2011
It looks to me that AbstractCollection could benefit from a c.isEmpty() check but that's the only optimization I currently see. It's necessary to iterate the target collection because values can be repeated for some collection types.
I agree that AbstractSet.removeAll appears to make a poor judgement that the cost of this.contains() and c.contains() are similar. I'd like to know if there are common usage patterns where iteration over c isn't the best choice. (adding isEmpty() checks for both this and c of course).
Josh? Martin? Doug? Any history why AbstractSet has this optimization?
Mike
On Jul 13 2011, at 13:02 , Jeff Hain wrote:
>
>> It doesn't work because the complexity of size() can be O(n).
>> In my opinion, only the checks for emptyness are Ok.
>
> The calls to "size()" are already there in current implementation,
> and I think with the idea that in most cases they are O(1).
> If they are O(n), it's of course pointless to use them, since it
> makes the whole treatment at least O(max(n1,n2)) (with n1 = this.size()
> and n2 = c.size()), while simply iterating on "c" without checking
> sizes makes it O(n2) (considering this.remove(Object) to be in O(1) (*)).
> But, then, the whole treatment remains linear at worse, and if,
> as it is often the case, they are O(1), it allows to make it
> O(min(n2,n1)) if "c" is also a set.
> Considering sizes therefore allows to have O(max(n1,n2)) instead
> of O(n2) in rare cases, and O(min(n1,n2)) instead of O(n2) in a fair
> amount of cases, which can be considered a win. Also, if size()'s
> are O(n), you are most likely doing concurrency, so you are smart,
> know what you do, and can make your own removeAll ;)
> What seems to have been overlooked in current implementation,
> is that this optimization, that works for sets, makes things actually
> worse if "c" is a list, in which case it's best to always iterate on
> "c", iterating on "this" resulting in a quadratic behavior.
>
> To make things simple, and have a treatment always in O(n2),
> we could just stick to iterating on "c" without checking sizes:
>
> public boolean AbstractSet.removeAll(Collection<?> c) {
> if (isEmpty() || c.isEmpty()) {
> return false;
> }
> boolean modified = false;
> for (Iterator<?> i = c.iterator(); i.hasNext(); ) {
> if (remove(i.next())) {
> modified = true;
> if (isEmpty()) {
> break;
> }
> }
> }
> return modified;
> }
>
> (*) We could refine considering the case of tree sets.
>
>
>
>> The overriden equals method should be no problem
>> if the Implementation is an set. So it should be
>> possible to exit in this case in the AbstractSet
>> Implementation, isn't it?
>
> For AbstractCollection.removeAll(Collection),
> that wouldn't work, since each instance to remove
> in "c" can have multiple occurrences in "this".
>
> For AbstractSet.removeAll(Collection),
> that wouldn't work either, because if "c" is
> based on "equals", and "c" is an identity set
> (based on "=="), multiple elements of "this"
> could be equal to a same element of "c",
> and removing as many as "c" contains doesn't
> mean you can't find more elements in "this"
> that would be equal to elements of "c".
>
> A best effort implementation, that should
> work if collections are not sets of different
> kinds, would look like this:
>
> public boolean AbstractSet.removeAll(Collection<?> c) {
> if (isEmpty() || c.isEmpty()) {
> return false;
> }
> boolean modified = false;
> final int n1 = size();
> final int n2 = c.size();
> final int cContainsAssumedComplexity = (c instanceof Set<?>) ? 1 : n2;
> if (n2 <= n1 * (long)cContainsAssumedComplexity) {
> for (Iterator<?> i = c.iterator(); i.hasNext(); ) {
> if (remove(i.next())) {
> modified = true;
> if (isEmpty()) {
> break;
> }
> }
> }
> } else {
> // Here, we know "c" is a Set
> // (else, we tested "n2 <= n1 * (long)n2",
> // which is always true for n1 > 0 and n2 > 0).
> if (n2 != Integer.MAX_VALUE) {
> // Will stop iterating once
> // we removed from "this" as
> // many elements as "c" contains.
> // XXX Does not work if "this" is an identity set,
> // and "c" based on "equals".
> int toRemove = n2;
> for (Iterator<?> i = iterator(); i.hasNext(); ) {
> if (c.contains(i.next())) {
> i.remove();
> modified = true;
> if (--toRemove == 0) {
> break;
> }
> }
> }
> } else {
> // "c" might contain more than Integer.MAX_VALUE
> // elements, so we can't break early.
> for (Iterator<?> i = iterator(); i.hasNext(); ) {
> if (c.contains(i.next())) {
> i.remove();
> modified = true;
> }
> }
> }
> }
> return modified;
> }
>
>
>
> Jeff
>
>
>
> ________________________________
> De : Rémi Forax <forax at univ-mlv.fr>
> À : core-libs-dev at openjdk.java.net
> Envoyé le : Mer 13 juillet 2011, 18h 27min 48s
> Objet : Re: AbstractCollection.removeAll(Collection) and
> AbstractSet.removeAll(Collection)
>
> It doesn't work because the complexity of size() can be O(n).
> In my opinion, only the checks for emptyness are Ok.
>
> Rémi
>
>
> On 07/13/2011 05:40 PM, Sebastian Sickelmann wrote:
>> Jeff Hain wrote:
>>> Hello.
>>> I have some remarks about the methods named in the subject (JDK 6u26, JDK
>>> 7b147).
>>> 1) AbstractCollection.removeAll(Collection)
>>> This method could return immediately if the specified collection is
>>> empty, not to iterate uselessly over all "this", or if "this" is empty, not
> to
>>> create an iterator for nothing.
>>> I also thought about returning as soon as as many elements as the
>>> specified collection contains have been removed, but it would not behave
>>> properly with collections containing more than Integer.MAX_VALUE elements,
>>> or with some overriden equals methods etc.
>> The overriden equals method should be no problem if the Implementation is an
>> set. So it should be possible to exit in this case in the AbstractSet
>> Implementation, isn't it?
>>
>>> 2) AbstractSet.removeAll(Collection)
>>> Let n1 be the size of "this", and n2 the size of the specified
> collection.
>>> This method is in O(n1*n2) if the specified collection has a
>>> contains(Object)
>>> method in O(n2) (as an ArrayList does), and n1<= n2.
>>> I think a better implementation could be done by taking care of the
>>> "setness" of the specified collection, and taking care of collections sizes
>>> differently.
>>> Also:
>>> - when iterating on the specified collection, the method could return as
>>> soon as "this" is empty, not to continue useless iterations.
>>> - when iterating on "this", the method could return if the specified
>>> collection is empty, not to iterate over all "this" uselessly.
>>>
>>> # Current implementation:
>>> public boolean removeAll(Collection<?> c) {
>>> boolean modified = false;
>>> if (size()> c.size()) {
>>> for (Iterator<?> i = c.iterator(); i.hasNext(); )
>>> modified |= remove(i.next());
>>> } else {
>>> for (Iterator<?> i = iterator(); i.hasNext(); ) {
>>> if (c.contains(i.next())) {
>>> i.remove();
>>> modified = true;
>>> }
>>> }
>>> }
>>> return modified;
>>> }
>>>
>>> # Alternative implementation (not tested):public boolean
>>> removeAll(Collection<?> c) {
>>> // Quick check on emptinesses, to make later treatments simpler.
>>> if (isEmpty() || c.isEmpty()) {
>>> return false;
>>> }
>>> boolean modified = false;
>>> final int n1 = size();
>>> final int n2 = c.size();
>>> // If the "c" is not a set, we are pessimistic about its "contains"
>>> // method
>>> // complexity.
>>> final int cContainsAssumedComplexity = (c instanceof Set<?>) ? 1 : n2;
>>> // If we iterate over "this", assumed complexity is O(n2).
>>> // If we iterate over "c", assumed complexity is O(n1 *
>>> cContainsAssumedComplexity).
>>> // Cast to long to avoid overflow.
>>> if (n2< n1 * (long)cContainsAssumedComplexity) {
>>> for (Iterator<?> i = c.iterator(); i.hasNext(); ) {
>>> if (remove(i.next())) {
>>> modified = true;
>>> if (isEmpty()) {
>>> break;
>>> }
>>> }
>>> }
>>> } else {
>>> for (Iterator<?> i = iterator(); i.hasNext(); ) {
>>> if (c.contains(i.next())) {
>>> i.remove();
>>> modified = true;
>>> }
>>> }
>>> }
>>> return modified;
>>> }
>> look good to me.
>>
>>>
>>>
>>> Regards,
>>>
>>> Jeff
>>>
More information about the core-libs-dev
mailing list