A behavior mismatch in AbstractCollection.toArray(T[] )

Mike Duigou mike.duigou at oracle.com
Wed Dec 14 05:03:03 UTC 2011


I agree that most people probably won't check whether the array returned is the array they provided. It is possible that they incorrectly assume it is. ie.

Collection<Integer> collection;
...
Integer[] foo = new Integer[collection.size()]; // of sufficient size for all elements
// at this point collection grows.
collection.toArray(foo); // this will work, right?
for(Integer each : foo) {  
   // wait! foo is empty, why?
}

If collection is mutated and increased in size between the creation of foo and the toArray operation then foo will be empty and the user probably won't know why the elements weren't copied. Actually, the elements were copied but the result of toArray containing those elements was ignored.

This usage seems like just a bad idiom though and is avoidable.

More worrying is the toArray() case where the resultant array is currently documented to hold only result elements and no extra nulls. If the collection shrinks then it is currently possible that the resulting array could very unexpectedly hold nulls.

Mike

On Dec 13 2011, at 05:30 , Sean Chou wrote:

> Sorry for the confuse. By "ok", I mean "compare the size of array which is
> going to be
> returned and the size of the specified array, and copy the elements
> into the specified
> array if it is larger and return the specified array."
> 
> Nothing is causing problem for now, I just found a mismatch. I think most
> guys will
> just use the returned array without checking if it's the specified one; and
> this is also
> why I think it may be possible to modify the behavior without causing
> problems.
> 
> And I think modifying ConcurrentHashMap is as dangerous as modifying
> AbstractCollection
> if people are relying on implementation, is this right? So it seems we can
> do nothing
> to the mismatch now...
> 
> 
> On Tue, Dec 13, 2011 at 8:06 PM, David Holmes <david.holmes at oracle.com>wrote:
> 
>> On 13/12/2011 9:18 PM, Sean Chou wrote:
>> 
>>> Hi ,
>>> 
>>> Is it possible to change the spec ? I found it is defined in
>>> java.utils.Collection interface. It would be easy for
>>> AbstractCollection to state that it is not designed for concurrent
>>> operations, and its subclass should take care of them.
>>> 
>> 
>> Such a disclaimer might be added to the spec for AbstractCollection but
>> that doesn't really change anything - it just makes observed behaviour less
>> surprising.
>> 
>> 
>>     However, I think the simplest way may be modifying toArray(T[])
>>> method for an additional check, which would work for most subclasses of
>>> AbstractCollection...
>>> Is that ok ?
>>> 
>> 
>> "ok" in what sense? Do you want to change the spec or just change the
>> current behaviour? If you do the latter and people rely on that
>> implementation rather than the spec then code will not be portable across
>> different implementations of the platform.
>> 
>> I would not want to see a change in behaviour without a change in
>> specification and I do not think the specification for AbstractCollection
>> can, or should be, changed. Just my opinion of course.
>> 
>> What is the concrete concurrent collection that you have a problem with?
>> If it is ConcurrentHashMap, as per the example, then perhaps
>> ConcurrentHashMap should be where a change is considered.
>> 
>> David
>> -----
>> 
>> 
>>> On Tue, Dec 13, 2011 at 3:41 PM, David Holmes <david.holmes at oracle.com
>>> <mailto:david.holmes at oracle.**com <david.holmes at oracle.com>>> wrote:
>>> 
>>>   Hi Sean,
>>> 
>>> 
>>>   On 13/12/2011 5:21 PM, Sean Chou wrote:
>>> 
>>>       When I was reading the code of AbstractCollection.toArray(T[] ), I
>>>       found its behavior maybe different from the spec in multithread
>>>       environment. The spec says "If the collection fits in the specified
>>>       array, it is returned therein. Otherwise, a new array is allocated
>>>       with the runtime type of the specified array and the size of this
>>>       collection."  However, in multithread environment, it is not easy
>>> to
>>>       tell if the collection fits in the specified array, because the
>>>       items may be removed when toArray is copying.
>>> 
>>> 
>>>   Right. The problem is that AbstractCollection doesn't address
>>>   thread-safety or any other concurrency issues so doesn't account for
>>>   the collection growing or shrinking while the toArray snapshot is
>>>   being taken. Really the collection implementations that are designed
>>>   to support multiple threads should override toArray to make it clear
>>>   how it should behave. As it stands, in my opinion, it is more a
>>>   "quality of implementation" issue as to whether AbstractCollection
>>>   expends effort after creating the array to see if the array is
>>>   actually full or not; or whether after creating an array it turns
>>>   out it could have fit in the original.
>>> 
>>>   For a concurrent collection  I would write the spec for toArray
>>>   something like:
>>> 
>>>   "The current size of the collection is examined and if the
>>>   collection fits in the specified array it will be the target array,
>>>   else a new array is allocated based on that current size and it
>>>   becomes the target array. If the collection grows such that the
>>>   target array no longer fits then extra elements will not be copied
>>>   into the target array. If the collection shrinks then the target
>>>   array will contain null elements."
>>> 
>>>   Or for the last part "then the target array will be copied to a new
>>>   array that exactly fits the number of elements returned".
>>> 
>>>   David Holmes
>>>   ------------
>>> 
>>> 
>>> 
>>>            Here is a testcase:
>>>       //////////////////////////////**__////////////////////////////**
>>> //__//////////////
>>> 
>>>       import java.util.Map;
>>>       import java.util.concurrent.__**ConcurrentHashMap;
>>> 
>>> 
>>>       public class CollectionToArrayTest {
>>> 
>>>            static volatile Map<String, String>  map = new
>>>       TConcurrentHashMap<String, String>();
>>>            static volatile boolean gosleep = true;
>>> 
>>>            static class TConcurrentHashMap<K, V>  extends
>>>       ConcurrentHashMap<K, V>  {
>>>                public int size() {
>>>                    int oldresult = super.size();
>>>                    System.out.println("map size before concurrent
>>>       remove is "
>>>                            + oldresult);
>>>                    while (gosleep) {
>>>                        try {
>>>                            // Make sure the map is modified during
>>>       toArray is
>>>       called,
>>>                            // between getsize and being iterated.
>>>                            Thread.sleep(1000);
>>>                            // System.out.println("size called, size is "
>>> +
>>>       oldresult +
>>>                            // " take a sleep to make sure the element
>>>       is deleted
>>>       before size is returned.");
>>>                        } catch (Exception e) {
>>>                        }
>>>                    }
>>>                    return oldresult;
>>>                }
>>>            }
>>> 
>>>            static class ToArrayThread implements Runnable {
>>>                public void run() {
>>>                    for (int i = 0; i<  5; i++) {
>>>                        String str = Integer.toString(i);
>>>                        map.put(str, str);
>>>                    }
>>>                    String[] buffer = new String[4];
>>>                    String[] strings = map.values().toArray(buffer);
>>>                    // System.out.println("length is " + strings.length);
>>>                    if (strings.length<= buffer.length) {
>>>                        System.out.println("given array size is "
>>>                                        + buffer.length
>>>                                        + " \nreturned array size is "
>>>                                        + strings.length
>>>                                        + ", \nbuffer should be used
>>>       according to
>>>       spec. Is buffer used : "
>>>                                        + (strings == buffer));
>>>                    }
>>>                }
>>>            }
>>> 
>>>            static class RemoveThread implements Runnable {
>>>                public void run() {
>>>                    String str = Integer.toString(0);
>>>                    map.remove(str);
>>>                    gosleep = false;
>>>                }
>>>            }
>>> 
>>>            public static void main(String args[]) {
>>>                CollectionToArrayTest app = new CollectionToArrayTest();
>>>                app.test_concurrentRemove();
>>>            }
>>> 
>>>            public void test_concurrentRemove() {
>>> 
>>>       System.out.println("//////////**__////////////////////////////**
>>> //__//////\n"
>>> 
>>>       +
>>>       "The spec says if the given array is large\n "  +
>>>       "enough to hold all elements, the given array\n" +
>>>       "should be returned by toArray. This \n" +
>>>       "testcase checks this case. \n" +
>>>       "/////////////////////////////**__/////////////////");
>>> 
>>> 
>>>                Thread[] threads = new Thread[2];
>>>                threads[0] = new Thread(new ToArrayThread());
>>>                threads[1] = new Thread(new RemoveThread());
>>> 
>>>                threads[0].start();
>>> 
>>>                try {
>>>                    // Take a sleep to make sure toArray is already
>>> called.
>>>                    Thread.sleep(1200);
>>>                } catch (Exception e) {
>>>                }
>>> 
>>>                threads[1].start();
>>>            }
>>>       }
>>> 
>>>       //////////////////////////////**__////////////////////////////**
>>> //__//////////////
>>> 
>>> 
>>>       TConcurrentHashMap is used to make sure the collection is modified
>>>       during toArray is invoked. So the returned array fits in the
>>>       specified
>>>       array, but a new array is used because toArray checks the size
>>>       before copying.
>>> 
>>>       Is this a bug ?
>>> 
>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best Regards,
>>> Sean Chou
>>> 
>>> 
> 
> 
> -- 
> Best Regards,
> Sean Chou




More information about the core-libs-dev mailing list