Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream

Kelly O'Hair kelly.ohair at oracle.com
Thu May 13 00:52:58 UTC 2010


If it helps, I found an open test that should demonstrate it:

http://www.sourceware.org/cgi-bin/cvsweb.cgi/mauve/gnu/testlet/java/lang/StringBuffer/StringBufferTest.java?rev=1.7&content-type=text/x-cvsweb-markup&cvsroot=mauve

-kto


On May 12, 2010, at 5:31 PM, David Holmes wrote:

> Chris, Martin,
>
> The changes to AbstractStringBuilder violate the specification for  
> ensureCapacity: it has to grow from N to 2N+2. The new code in:
>
> http://hg.openjdk.java.net/jdk7/tl-gate/jdk/rev/ec45423a4700
>
> void expandCapacity(int minimumCapacity) {
> - int newCapacity = (value.length + 1) * 2;
> + int newCapacity = value.length * 2;
>
> Has dropped the +2 part.
>
> This is causing test failures.
>
> David
> -----
>
> Chris Hegarty said the following on 04/16/10 19:38:
>> Martin Buchholz wrote:
>>> Hi Chris,
>>>
>>> I recently discovered another place to handle huge arrays better -  
>>> in
>>> AbstractCollection.
>>> I've put those changes into
>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize2/
>>> I propose to qfold these into the original changes for this bug
>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
>>> which have not yet been committed.
>> Good catch Martin, these additional changes look good.
>> -Chris.
>>>
>>> Martin
>>>
>>> On Tue, Mar 9, 2010 at 04:41, Christopher Hegarty -Sun Microsystems
>>> Ireland <Christopher.Hegarty at sun.com> wrote:
>>>> Sorry Martin, I appear to have missed your original request to  
>>>> file this
>>>> bug. I since filed the following:
>>>>
>>>> 6933217: Huge arrays handled poorly in core libraries
>>>>
>>>> The changes you are proposing seem reasonable to me.
>>>>
>>>> -Chris.
>>>>
>>>> Martin Buchholz wrote:
>>>>> [Chris or Alan, please review and file a bug]
>>>>>
>>>>> OK, guys,
>>>>>
>>>>> Here's a patch:
>>>>>
>>>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
>>>>>
>>>>> Martin
>>>>>
>>>>> On Fri, Mar 5, 2010 at 02:48, Kevin L. Stern <kevin.l.stern at gmail.com 
>>>>> >
>>>>> wrote:
>>>>>> Hi Martin,
>>>>>>
>>>>>> Thank you for your reply.  If I may, PriorityQueue appears to  
>>>>>> employ the
>>>>>> simple strategy that I suggested above in its grow method:
>>>>>>
>>>>>>       int newCapacity = ((oldCapacity < 64)?
>>>>>>                          ((oldCapacity + 1) * 2):
>>>>>>                          ((oldCapacity / 2) * 3));
>>>>>>       if (newCapacity < 0) // overflow
>>>>>>           newCapacity = Integer.MAX_VALUE;
>>>>>>
>>>>>> It might be desirable to set a common strategy for capacity  
>>>>>> increase for
>>>>>> all
>>>>>> collections.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Kevin
>>>>>>
>>>>>> On Fri, Mar 5, 2010 at 3:04 AM, Martin Buchholz <martinrb at google.com 
>>>>>> >
>>>>>> wrote:
>>>>>>> Hi Kevin,
>>>>>>>
>>>>>>> As you've noticed, creating objects within a factor of two of
>>>>>>> their natural limits is a good way to expose lurking bugs.
>>>>>>>
>>>>>>> I'm the one responsible for the algorithm in ArrayList.
>>>>>>> I'm a bit embarrassed, looking at that code today.
>>>>>>> We could set the array size to Integer.MAX_VALUE,
>>>>>>> but then you might hit an independent buglet in hotspot
>>>>>>> that you cannot allocate an array with Integer.MAX_VALUE
>>>>>>> elements, but Integer.MAX_VALUE - 5 (or so) works.
>>>>>>>
>>>>>>> It occurs to me that increasing the size by 50% is better done  
>>>>>>> by
>>>>>>> int newCapacity = oldCapacity + (oldCapacity >> 1) + 1;
>>>>>>>
>>>>>>> I agree with the plan of setting the capacity to something near
>>>>>>> MAX_VALUE on overflow, and throw OutOfMemoryError on next  
>>>>>>> resize.
>>>>>>>
>>>>>>> These bugs are not known.
>>>>>>> Chris Hegarty, could you file a bug for us?
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>> On Wed, Mar 3, 2010 at 17:41, Kevin L. Stern <kevin.l.stern at gmail.com 
>>>>>>> >
>>>>>>> wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I've noticed bugs in java.util.ArrayList, java.util.Hashtable  
>>>>>>>> and
>>>>>>>> java.io.ByteArrayOutputStream which arise when the capacities  
>>>>>>>> of the
>>>>>>>> data
>>>>>>>> structures reach a particular threshold.  More below.
>>>>>>>>
>>>>>>>> When the capacity of an ArrayList reaches  
>>>>>>>> (2/3)*Integer.MAX_VALUE its
>>>>>>>> size
>>>>>>>> reaches its capacity and an add or an insert operation is  
>>>>>>>> invoked, the
>>>>>>>> capacity is increased by only one element.  Notice that in the
>>>>>>>> following
>>>>>>>> excerpt from ArrayList.ensureCapacity the new capacity is set  
>>>>>>>> to (3/2)
>>>>>>>> *
>>>>>>>> oldCapacity + 1 unless this value would not suffice to  
>>>>>>>> accommodate the
>>>>>>>> required capacity in which case it is set to the required  
>>>>>>>> capacity.  If
>>>>>>>> the
>>>>>>>> current capacity is at least (2/3)*Integer.MAX_VALUE, then  
>>>>>>>> (oldCapacity
>>>>>>>> *
>>>>>>>> 3)/2 + 1 overflows and resolves to a negative number  
>>>>>>>> resulting in the
>>>>>>>> new
>>>>>>>> capacity being set to the required capacity.  The major  
>>>>>>>> consequence of
>>>>>>>> this
>>>>>>>> is that each subsequent add/insert operation results in a  
>>>>>>>> full resize
>>>>>>>> of
>>>>>>>> the
>>>>>>>> ArrayList causing performance to degrade significantly.
>>>>>>>>
>>>>>>>>       int newCapacity = (oldCapacity * 3)/2 + 1;
>>>>>>>>           if (newCapacity < minCapacity)
>>>>>>>>       newCapacity = minCapacity;
>>>>>>>>
>>>>>>>> Hashtable breaks entirely when the size of its backing array  
>>>>>>>> reaches
>>>>>>>> (1/2) *
>>>>>>>> Integer.MAX_VALUE and a rehash is necessary as is evident  
>>>>>>>> from the
>>>>>>>> following
>>>>>>>> excerpt from rehash.  Notice that rehash will attempt to  
>>>>>>>> create an
>>>>>>>> array
>>>>>>>> of
>>>>>>>> negative size if the size of the backing array reaches (1/2) *
>>>>>>>> Integer.MAX_VALUE since oldCapacity * 2 + 1 overflows and  
>>>>>>>> resolves to a
>>>>>>>> negative number.
>>>>>>>>
>>>>>>>>   int newCapacity = oldCapacity * 2 + 1;
>>>>>>>>   HashtableEntry newTable[] = new HashtableEntry[newCapacity];
>>>>>>>>
>>>>>>>> When the capacity of the backing array in a  
>>>>>>>> ByteArrayOutputStream
>>>>>>>> reaches
>>>>>>>> (1/2) * Integer.MAX_VALUE its size reaches its capacity and a  
>>>>>>>> write
>>>>>>>> operation is invoked, the capacity of the backing array is  
>>>>>>>> increased
>>>>>>>> only by
>>>>>>>> the required number of elements.  Notice that in the  
>>>>>>>> following excerpt
>>>>>>>> from
>>>>>>>> ByteArrayOutputStream.write(int) the new backing array  
>>>>>>>> capacity is set
>>>>>>>> to 2
>>>>>>>> * buf.length unless this value would not suffice to  
>>>>>>>> accommodate the
>>>>>>>> required
>>>>>>>> capacity in which case it is set to the required capacity.   
>>>>>>>> If the
>>>>>>>> current
>>>>>>>> backing array capacity is at least (1/2) * Integer.MAX_VALUE  
>>>>>>>> + 1, then
>>>>>>>> buf.length << 1 overflows and resolves to a negative number  
>>>>>>>> resulting
>>>>>>>> in
>>>>>>>> the
>>>>>>>> new capacity being set to the required capacity.  The major  
>>>>>>>> consequence
>>>>>>>> of
>>>>>>>> this, like with ArrayList, is that each subsequent write  
>>>>>>>> operation
>>>>>>>> results
>>>>>>>> in a full resize of the ByteArrayOutputStream causing  
>>>>>>>> performance to
>>>>>>>> degrade
>>>>>>>> significantly.
>>>>>>>>
>>>>>>>>   int newcount = count + 1;
>>>>>>>>   if (newcount > buf.length) {
>>>>>>>>           buf = Arrays.copyOf(buf, Math.max(buf.length << 1,
>>>>>>>> newcount));
>>>>>>>>   }
>>>>>>>>
>>>>>>>> It is interesting to note that any statements about the  
>>>>>>>> amortized time
>>>>>>>> complexity of add/insert operations, such as the one in the  
>>>>>>>> ArrayList
>>>>>>>> javadoc, are invalidated by the performance related bugs.   
>>>>>>>> One solution
>>>>>>>> to
>>>>>>>> the above situations is to set the new capacity of the  
>>>>>>>> backing array to
>>>>>>>> Integer.MAX_VALUE when the initial size calculation results  
>>>>>>>> in a
>>>>>>>> negative
>>>>>>>> number during a resize.
>>>>>>>>
>>>>>>>> Apologies if these bugs are already known.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Kevin
>>>>>>>>




More information about the core-libs-dev mailing list