Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream

Chris Hegarty chris.hegarty at oracle.com
Fri Apr 16 09:38:07 UTC 2010


Martin Buchholz wrote:
> Hi Chris,
> 
> I recently discovered another place to handle huge arrays better - in
> AbstractCollection.
> I've put those changes into
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize2/
> I propose to fold these into the original changes for this bug
> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
> which have not yet been committed.

Good catch, Martin; these additional changes look good.

-Chris.

> 
> Martin
> 
> On Tue, Mar 9, 2010 at 04:41, Christopher Hegarty -Sun Microsystems
> Ireland <Christopher.Hegarty at sun.com> wrote:
>> Sorry Martin, I appear to have missed your original request to file this
>> bug. I since filed the following:
>>
>>  6933217: Huge arrays handled poorly in core libraries
>>
>> The changes you are proposing seem reasonable to me.
>>
>> -Chris.
>>
>> Martin Buchholz wrote:
>>> [Chris or Alan, please review and file a bug]
>>>
>>> OK, guys,
>>>
>>> Here's a patch:
>>>
>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
>>>
>>> Martin
>>>
>>> On Fri, Mar 5, 2010 at 02:48, Kevin L. Stern <kevin.l.stern at gmail.com>
>>> wrote:
>>>> Hi Martin,
>>>>
>>>> Thank you for your reply.  If I may, PriorityQueue appears to employ the
>>>> simple strategy that I suggested above in its grow method:
>>>>
>>>>        int newCapacity = ((oldCapacity < 64)?
>>>>                           ((oldCapacity + 1) * 2):
>>>>                           ((oldCapacity / 2) * 3));
>>>>        if (newCapacity < 0) // overflow
>>>>            newCapacity = Integer.MAX_VALUE;
>>>>
>>>> It might be desirable to set a common strategy for capacity increase for
>>>> all
>>>> collections.
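
As an aside, the PriorityQueue strategy quoted above can be exercised near its overflow point; a minimal, self-contained sketch (editorial illustration, not part of the original message; the class name is invented):

```java
// Editorial sketch: exercising the PriorityQueue growth expression
// quoted above near its overflow point.
public class PqGrowSketch {
    static int grow(int oldCapacity) {
        int newCapacity = ((oldCapacity < 64) ?
                           ((oldCapacity + 1) * 2) :
                           ((oldCapacity / 2) * 3));
        if (newCapacity < 0) // overflow
            newCapacity = Integer.MAX_VALUE;
        return newCapacity;
    }

    public static void main(String[] args) {
        int oldCapacity = Integer.MAX_VALUE / 3 * 2 + 2;
        // (oldCapacity / 2) * 3 overflows here, so the guard clamps
        // the result to Integer.MAX_VALUE instead of going negative.
        System.out.println(grow(oldCapacity) == Integer.MAX_VALUE); // true
    }
}
```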
>>>>
>>>> Regards,
>>>>
>>>> Kevin
>>>>
>>>> On Fri, Mar 5, 2010 at 3:04 AM, Martin Buchholz <martinrb at google.com>
>>>> wrote:
>>>>> Hi Kevin,
>>>>>
>>>>> As you've noticed, creating objects within a factor of two of
>>>>> their natural limits is a good way to expose lurking bugs.
>>>>>
>>>>> I'm the one responsible for the algorithm in ArrayList.
>>>>> I'm a bit embarrassed, looking at that code today.
>>>>> We could set the array size to Integer.MAX_VALUE,
>>>>> but then you might hit an independent buglet in hotspot
>>>>> that you cannot allocate an array with Integer.MAX_VALUE
>>>>> elements, but Integer.MAX_VALUE - 5 (or so) works.
>>>>>
>>>>> It occurs to me that increasing the size by 50% is better done by
>>>>> int newCapacity = oldCapacity + (oldCapacity >> 1) + 1;
>>>>>
>>>>> I agree with the plan of setting the capacity to something near
>>>>> MAX_VALUE on overflow, and throwing OutOfMemoryError on the next resize.
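
The two 50%-growth expressions behave differently near the limit; a small editorial sketch (not part of the original message; the class name is invented) comparing them:

```java
// Editorial sketch: the original ArrayList expression overflows in its
// intermediate multiply at ~(2/3)*Integer.MAX_VALUE, while Martin's
// shift-based form stays valid up to the true limit.
public class GrowthSketch {
    public static void main(String[] args) {
        int oldCapacity = Integer.MAX_VALUE / 3 * 2; // ~2/3 of MAX_VALUE

        // Original expression: oldCapacity * 3 wraps around first.
        int viaMultiply = (oldCapacity * 3) / 2 + 1;

        // Martin's expression: no intermediate exceeds the final result.
        int viaShift = oldCapacity + (oldCapacity >> 1) + 1;

        System.out.println(viaMultiply < 0); // true: already negative here
        System.out.println(viaShift > 0);    // true: still a valid capacity
    }
}
```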
>>>>>
>>>>> These bugs are not known.
>>>>> Chris Hegarty, could you file a bug for us?
>>>>>
>>>>> Martin
>>>>>
>>>>> On Wed, Mar 3, 2010 at 17:41, Kevin L. Stern <kevin.l.stern at gmail.com>
>>>>> wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I've noticed bugs in java.util.ArrayList, java.util.Hashtable and
>>>>>> java.io.ByteArrayOutputStream which arise when the capacities of
>>>>>> the data structures reach a particular threshold.  More below.
>>>>>>
>>>>>> When the capacity of an ArrayList reaches (2/3)*Integer.MAX_VALUE,
>>>>>> its size reaches its capacity, and an add or an insert operation is
>>>>>> invoked, the capacity is increased by only one element.  Notice that
>>>>>> in the following excerpt from ArrayList.ensureCapacity the new
>>>>>> capacity is set to (3/2) * oldCapacity + 1 unless this value would
>>>>>> not suffice to accommodate the required capacity, in which case it
>>>>>> is set to the required capacity.  If the current capacity is at
>>>>>> least (2/3)*Integer.MAX_VALUE, then (oldCapacity * 3)/2 + 1
>>>>>> overflows and resolves to a negative number, resulting in the new
>>>>>> capacity being set to the required capacity.  The major consequence
>>>>>> of this is that each subsequent add/insert operation results in a
>>>>>> full resize of the ArrayList, causing performance to degrade
>>>>>> significantly.
>>>>>>
>>>>>>     int newCapacity = (oldCapacity * 3)/2 + 1;
>>>>>>     if (newCapacity < minCapacity)
>>>>>>         newCapacity = minCapacity;
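
This collapse to minimum growth can be checked directly; a minimal editorial sketch (not part of the original message; the class name is invented):

```java
// Editorial sketch: the ensureCapacity arithmetic quoted above,
// evaluated just past the (2/3)*Integer.MAX_VALUE threshold.
public class EnsureCapacitySketch {
    public static void main(String[] args) {
        int oldCapacity = Integer.MAX_VALUE / 3 * 2 + 1; // past threshold
        int minCapacity = oldCapacity + 1;  // adding one more element
        int newCapacity = (oldCapacity * 3) / 2 + 1; // multiply overflows
        if (newCapacity < minCapacity)
            newCapacity = minCapacity;
        // Growth collapses to the bare minimum: one element per resize.
        System.out.println(newCapacity == minCapacity); // true
    }
}
```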
>>>>>>
>>>>>> Hashtable breaks entirely when the size of its backing array reaches
>>>>>> (1/2) * Integer.MAX_VALUE and a rehash is necessary, as is evident
>>>>>> from the following excerpt from rehash.  Notice that rehash will
>>>>>> attempt to create an array of negative size if the size of the
>>>>>> backing array reaches (1/2) * Integer.MAX_VALUE, since
>>>>>> oldCapacity * 2 + 1 overflows and resolves to a negative number.
>>>>>>
>>>>>>    int newCapacity = oldCapacity * 2 + 1;
>>>>>>    HashtableEntry newTable[] = new HashtableEntry[newCapacity];
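
The negative capacity is easy to demonstrate without actually allocating such an array; an editorial sketch (not part of the original message; the class name is invented):

```java
// Editorial sketch: the rehash arithmetic quoted above, evaluated at
// a backing-array size of 2^30, roughly (1/2) * Integer.MAX_VALUE.
public class RehashSketch {
    public static void main(String[] args) {
        int oldCapacity = 1 << 30;
        int newCapacity = oldCapacity * 2 + 1; // wraps to a negative value
        // The subsequent "new HashtableEntry[newCapacity]" would throw
        // NegativeArraySizeException.
        System.out.println(newCapacity); // -2147483647
    }
}
```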
>>>>>>
>>>>>> When the capacity of the backing array in a ByteArrayOutputStream
>>>>>> reaches (1/2) * Integer.MAX_VALUE, its size reaches its capacity,
>>>>>> and a write operation is invoked, the capacity of the backing array
>>>>>> is increased only by the required number of elements.  Notice that
>>>>>> in the following excerpt from ByteArrayOutputStream.write(int) the
>>>>>> new backing array capacity is set to 2 * buf.length unless this
>>>>>> value would not suffice to accommodate the required capacity, in
>>>>>> which case it is set to the required capacity.  If the current
>>>>>> backing array capacity is at least (1/2) * Integer.MAX_VALUE + 1,
>>>>>> then buf.length << 1 overflows and resolves to a negative number,
>>>>>> resulting in the new capacity being set to the required capacity.
>>>>>> The major consequence of this, like with ArrayList, is that each
>>>>>> subsequent write operation results in a full resize of the
>>>>>> ByteArrayOutputStream, causing performance to degrade significantly.
>>>>>>
>>>>>>    int newcount = count + 1;
>>>>>>    if (newcount > buf.length) {
>>>>>>        buf = Arrays.copyOf(buf, Math.max(buf.length << 1, newcount));
>>>>>>    }
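
The same one-element-at-a-time growth can be reproduced from the arithmetic alone; an editorial sketch (not part of the original message; the class name is invented):

```java
// Editorial sketch: the write(int) growth arithmetic quoted above,
// evaluated at a buffer length of 2^30, i.e. (1/2)*Integer.MAX_VALUE + 1.
public class BaosGrowthSketch {
    public static void main(String[] args) {
        int bufLength = 1 << 30;
        int newcount = bufLength + 1;
        int doubled = bufLength << 1; // overflows to Integer.MIN_VALUE
        int newLength = Math.max(doubled, newcount);
        // Math.max discards the negative doubled value, so the buffer
        // grows by a single byte on every subsequent write.
        System.out.println(newLength == newcount); // true
    }
}
```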
>>>>>>
>>>>>> It is interesting to note that any statements about the amortized
>>>>>> time complexity of add/insert operations, such as the one in the
>>>>>> ArrayList javadoc, are invalidated by these performance-related
>>>>>> bugs.  One solution to the above situations is to set the new
>>>>>> capacity of the backing array to Integer.MAX_VALUE when the initial
>>>>>> size calculation results in a negative number during a resize.
>>>>>>
>>>>>> Apologies if these bugs are already known.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Kevin
>>>>>>


