Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream

David Holmes David.Holmes at oracle.com
Thu May 13 00:31:16 UTC 2010


Chris, Martin,

The changes to AbstractStringBuilder violate the specification for 
ensureCapacity: it has to grow from N to 2N+2. The new code in:

http://hg.openjdk.java.net/jdk7/tl-gate/jdk/rev/ec45423a4700

  void expandCapacity(int minimumCapacity) {
- int newCapacity = (value.length + 1) * 2;
+ int newCapacity = value.length * 2;

Has dropped the +2 part.

This is causing test failures.

David
-----

Chris Hegarty said the following on 04/16/10 19:38:
> 
> Martin Buchholz wrote:
>> Hi Chris,
>>
>> I recently discovered another place to handle huge arrays better - in
>> AbstractCollection.
>> I've put those changes into
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize2/
>> I propose to qfold these into the original changes for this bug
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
>> which have not yet been committed.
> 
> Good catch Martin, these additional changes look good.
> 
> -Chris.
> 
>>
>> Martin
>>
>> On Tue, Mar 9, 2010 at 04:41, Christopher Hegarty -Sun Microsystems
>> Ireland <Christopher.Hegarty at sun.com> wrote:
>>> Sorry Martin, I appear to have missed your original request to file this
>>> bug. I since filed the following:
>>>
>>>  6933217: Huge arrays handled poorly in core libraries
>>>
>>> The changes you are proposing seem reasonable to me.
>>>
>>> -Chris.
>>>
>>> Martin Buchholz wrote:
>>>> [Chris or Alan, please review and file a bug]
>>>>
>>>> OK, guys,
>>>>
>>>> Here's a patch:
>>>>
>>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
>>>>
>>>> Martin
>>>>
>>>> On Fri, Mar 5, 2010 at 02:48, Kevin L. Stern <kevin.l.stern at gmail.com>
>>>> wrote:
>>>>> Hi Martin,
>>>>>
>>>>> Thank you for your reply.  If I may, PriorityQueue appears to 
>>>>> employ the
>>>>> simple strategy that I suggested above in its grow method:
>>>>>
>>>>>        int newCapacity = ((oldCapacity < 64)?
>>>>>                           ((oldCapacity + 1) * 2):
>>>>>                           ((oldCapacity / 2) * 3));
>>>>>        if (newCapacity < 0) // overflow
>>>>>            newCapacity = Integer.MAX_VALUE;
>>>>>
>>>>> It might be desirable to set a common strategy for capacity 
>>>>> increase for
>>>>> all
>>>>> collections.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Kevin
>>>>>
>>>>> On Fri, Mar 5, 2010 at 3:04 AM, Martin Buchholz <martinrb at google.com>
>>>>> wrote:
>>>>>> Hi Kevin,
>>>>>>
>>>>>> As you've noticed, creating objects within a factor of two of
>>>>>> their natural limits is a good way to expose lurking bugs.
>>>>>>
>>>>>> I'm the one responsible for the algorithm in ArrayList.
>>>>>> I'm a bit embarrassed, looking at that code today.
>>>>>> We could set the array size to Integer.MAX_VALUE,
>>>>>> but then you might hit an independent buglet in hotspot
>>>>>> that you cannot allocate an array with Integer.MAX_VALUE
>>>>>> elements, but Integer.MAX_VALUE - 5 (or so) works.
>>>>>>
>>>>>> It occurs to me that increasing the size by 50% is better done by
>>>>>> int newCapacity = oldCapacity + (oldCapacity >> 1) + 1;
>>>>>>
>>>>>> I agree with the plan of setting the capacity to something near
>>>>>> MAX_VALUE on overflow, and throw OutOfMemoryError on next resize.
>>>>>>
>>>>>> These bugs are not known.
>>>>>> Chris Hegarty, could you file a bug for us?
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>> On Wed, Mar 3, 2010 at 17:41, Kevin L. Stern 
>>>>>> <kevin.l.stern at gmail.com>
>>>>>> wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I've noticed bugs in java.util.ArrayList, java.util.Hashtable and
>>>>>>> java.io.ByteArrayOutputStream which arise when the capacities of the
>>>>>>> data
>>>>>>> structures reach a particular threshold.  More below.
>>>>>>>
>>>>>>> When the capacity of an ArrayList reaches (2/3)*Integer.MAX_VALUE 
>>>>>>> its
>>>>>>> size
>>>>>>> reaches its capacity and an add or an insert operation is 
>>>>>>> invoked, the
>>>>>>> capacity is increased by only one element.  Notice that in the
>>>>>>> following
>>>>>>> excerpt from ArrayList.ensureCapacity the new capacity is set to 
>>>>>>> (3/2)
>>>>>>> *
>>>>>>> oldCapacity + 1 unless this value would not suffice to 
>>>>>>> accommodate the
>>>>>>> required capacity in which case it is set to the required 
>>>>>>> capacity.  If
>>>>>>> the
>>>>>>> current capacity is at least (2/3)*Integer.MAX_VALUE, then 
>>>>>>> (oldCapacity
>>>>>>> *
>>>>>>> 3)/2 + 1 overflows and resolves to a negative number resulting in 
>>>>>>> the
>>>>>>> new
>>>>>>> capacity being set to the required capacity.  The major 
>>>>>>> consequence of
>>>>>>> this
>>>>>>> is that each subsequent add/insert operation results in a full 
>>>>>>> resize
>>>>>>> of
>>>>>>> the
>>>>>>> ArrayList causing performance to degrade significantly.
>>>>>>>
>>>>>>>        int newCapacity = (oldCapacity * 3)/2 + 1;
>>>>>>>            if (newCapacity < minCapacity)
>>>>>>>        newCapacity = minCapacity;
>>>>>>>
>>>>>>> Hashtable breaks entirely when the size of its backing array reaches
>>>>>>> (1/2) *
>>>>>>> Integer.MAX_VALUE and a rehash is necessary as is evident from the
>>>>>>> following
>>>>>>> excerpt from rehash.  Notice that rehash will attempt to create an
>>>>>>> array
>>>>>>> of
>>>>>>> negative size if the size of the backing array reaches (1/2) *
>>>>>>> Integer.MAX_VALUE since oldCapacity * 2 + 1 overflows and 
>>>>>>> resolves to a
>>>>>>> negative number.
>>>>>>>
>>>>>>>    int newCapacity = oldCapacity * 2 + 1;
>>>>>>>    HashtableEntry newTable[] = new HashtableEntry[newCapacity];
>>>>>>>
>>>>>>> When the capacity of the backing array in a ByteArrayOutputStream
>>>>>>> reaches
>>>>>>> (1/2) * Integer.MAX_VALUE its size reaches its capacity and a write
>>>>>>> operation is invoked, the capacity of the backing array is increased
>>>>>>> only by
>>>>>>> the required number of elements.  Notice that in the following 
>>>>>>> excerpt
>>>>>>> from
>>>>>>> ByteArrayOutputStream.write(int) the new backing array capacity 
>>>>>>> is set
>>>>>>> to 2
>>>>>>> * buf.length unless this value would not suffice to accommodate the
>>>>>>> required
>>>>>>> capacity in which case it is set to the required capacity.  If the
>>>>>>> current
>>>>>>> backing array capacity is at least (1/2) * Integer.MAX_VALUE + 1, 
>>>>>>> then
>>>>>>> buf.length << 1 overflows and resolves to a negative number 
>>>>>>> resulting
>>>>>>> in
>>>>>>> the
>>>>>>> new capacity being set to the required capacity.  The major 
>>>>>>> consequence
>>>>>>> of
>>>>>>> this, like with ArrayList, is that each subsequent write operation
>>>>>>> results
>>>>>>> in a full resize of the ByteArrayOutputStream causing performance to
>>>>>>> degrade
>>>>>>> significantly.
>>>>>>>
>>>>>>>    int newcount = count + 1;
>>>>>>>    if (newcount > buf.length) {
>>>>>>>            buf = Arrays.copyOf(buf, Math.max(buf.length << 1,
>>>>>>> newcount));
>>>>>>>    }
>>>>>>>
>>>>>>> It is interesting to note that any statements about the amortized 
>>>>>>> time
>>>>>>> complexity of add/insert operations, such as the one in the 
>>>>>>> ArrayList
>>>>>>> javadoc, are invalidated by the performance related bugs.  One 
>>>>>>> solution
>>>>>>> to
>>>>>>> the above situations is to set the new capacity of the backing 
>>>>>>> array to
>>>>>>> Integer.MAX_VALUE when the initial size calculation results in a
>>>>>>> negative
>>>>>>> number during a resize.
>>>>>>>
>>>>>>> Apologies if these bugs are already known.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Kevin
>>>>>>>



More information about the core-libs-dev mailing list