Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
Martin Buchholz
martinrb at google.com
Fri Apr 16 06:09:50 UTC 2010
Hi Chris,
I recently discovered another place to handle huge arrays better - in
AbstractCollection.
I've put those changes into
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize2/
I propose to qfold these into the original changes for this bug
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
which have not yet been committed.
Martin
On Tue, Mar 9, 2010 at 04:41, Christopher Hegarty -Sun Microsystems
Ireland <Christopher.Hegarty at sun.com> wrote:
> Sorry Martin, I appear to have missed your original request to file this
> bug. I since filed the following:
>
> 6933217: Huge arrays handled poorly in core libraries
>
> The changes you are proposing seem reasonable to me.
>
> -Chris.
>
> Martin Buchholz wrote:
>>
>> [Chris or Alan, please review and file a bug]
>>
>> OK, guys,
>>
>> Here's a patch:
>>
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
>>
>> Martin
>>
>> On Fri, Mar 5, 2010 at 02:48, Kevin L. Stern <kevin.l.stern at gmail.com>
>> wrote:
>>>
>>> Hi Martin,
>>>
>>> Thank you for your reply. If I may, PriorityQueue appears to employ the
>>> simple strategy that I suggested above in its grow method:
>>>
>>> int newCapacity = ((oldCapacity < 64)?
>>>                    ((oldCapacity + 1) * 2):
>>>                    ((oldCapacity / 2) * 3));
>>> if (newCapacity < 0) // overflow
>>>     newCapacity = Integer.MAX_VALUE;
>>>
>>> It might be desirable to set a common strategy for capacity
>>> increase for all collections.
>>>
>>> Regards,
>>>
>>> Kevin
>>>
>>> On Fri, Mar 5, 2010 at 3:04 AM, Martin Buchholz <martinrb at google.com>
>>> wrote:
>>>>
>>>> Hi Kevin,
>>>>
>>>> As you've noticed, creating objects within a factor of two of
>>>> their natural limits is a good way to expose lurking bugs.
>>>>
>>>> I'm the one responsible for the algorithm in ArrayList.
>>>> I'm a bit embarrassed, looking at that code today.
>>>> We could set the array size to Integer.MAX_VALUE,
>>>> but then you might hit an independent buglet in hotspot:
>>>> you cannot allocate an array with Integer.MAX_VALUE
>>>> elements, but Integer.MAX_VALUE - 5 (or so) works.
>>>>
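A quick, illustrative way to see that limit (a sketch only: the exact
message and whether the second allocation succeeds depend on the VM and
on how much heap is available; the class name is made up):

    public class MaxArrayAllocation {
        public static void main(String[] args) {
            try {
                byte[] a = new byte[Integer.MAX_VALUE]; // typically fails even with a large heap
            } catch (OutOfMemoryError e) {
                System.out.println(e); // e.g. "Requested array size exceeds VM limit"
            }
            byte[] b = new byte[Integer.MAX_VALUE - 5]; // can succeed given roughly 2 GB of heap
            System.out.println(b.length);
        }
    }
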
>>>> It occurs to me that increasing the size by 50% is better done by
>>>> int newCapacity = oldCapacity + (oldCapacity >> 1) + 1;
>>>>
>>>> I agree with the plan of setting the capacity to something near
>>>> MAX_VALUE on overflow, and throwing OutOfMemoryError on the next resize.
>>>>
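A minimal sketch of that plan (the method name and the MAX_ARRAY_SIZE
constant are illustrative, not necessarily what the final webrev uses):
grow by roughly 50%, compare in an overflow-conscious way, clamp near
Integer.MAX_VALUE, and throw OutOfMemoryError only once the required
capacity itself has overflowed int.

    // Slack below Integer.MAX_VALUE avoids the hotspot array-size limit
    // mentioned above.
    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    static int newCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1) + 1; // ~1.5x growth
        if (newCapacity - minCapacity < 0)   // overflow-conscious comparison
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0) {
            if (minCapacity < 0)             // required capacity overflowed int
                throw new OutOfMemoryError();
            newCapacity = (minCapacity > MAX_ARRAY_SIZE)
                        ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
        }
        return newCapacity;
    }

A caller would then resize with Arrays.copyOf(elementData,
newCapacity(oldCapacity, minCapacity)) on each grow.
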
>>>> These bugs are not known.
>>>> Chris Hegarty, could you file a bug for us?
>>>>
>>>> Martin
>>>>
>>>> On Wed, Mar 3, 2010 at 17:41, Kevin L. Stern <kevin.l.stern at gmail.com>
>>>> wrote:
>>>>>
>>>>> Greetings,
>>>>>
>>>>> I've noticed bugs in java.util.ArrayList, java.util.Hashtable and
>>>>> java.io.ByteArrayOutputStream which arise when the capacities of
>>>>> the data structures reach a particular threshold. More below.
>>>>>
>>>>> When the capacity of an ArrayList reaches (2/3) * Integer.MAX_VALUE,
>>>>> its size reaches its capacity, and an add or an insert operation is
>>>>> invoked, the capacity is increased by only one element. Notice that
>>>>> in the following excerpt from ArrayList.ensureCapacity the new
>>>>> capacity is set to (3/2) * oldCapacity + 1 unless this value would
>>>>> not suffice to accommodate the required capacity, in which case it
>>>>> is set to the required capacity. If the current capacity is at least
>>>>> (2/3) * Integer.MAX_VALUE, then (oldCapacity * 3)/2 + 1 overflows
>>>>> and resolves to a negative number, resulting in the new capacity
>>>>> being set to the required capacity. The major consequence of this is
>>>>> that each subsequent add/insert operation results in a full resize
>>>>> of the ArrayList, causing performance to degrade significantly.
>>>>>
>>>>> int newCapacity = (oldCapacity * 3)/2 + 1;
>>>>> if (newCapacity < minCapacity)
>>>>>     newCapacity = minCapacity;
>>>>>
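A standalone illustration of that arithmetic (the class name is made up
and the example capacity, chosen near (2/3) * Integer.MAX_VALUE, is
illustrative only, not taken from the JDK sources):

    public class ArrayListGrowthOverflow {
        public static void main(String[] args) {
            int oldCapacity = 1431655764;               // about (2/3) * Integer.MAX_VALUE
            int minCapacity = oldCapacity + 1;          // what a single add() requires
            int newCapacity = (oldCapacity * 3)/2 + 1;  // intermediate product overflows int
            System.out.println(newCapacity);            // -1
            if (newCapacity < minCapacity)
                newCapacity = minCapacity;              // capacity grows by only one element
            System.out.println(newCapacity);            // 1431655765
        }
    }
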
>>>>> Hashtable breaks entirely when the size of its backing array
>>>>> reaches (1/2) * Integer.MAX_VALUE and a rehash is necessary, as is
>>>>> evident from the following excerpt from rehash. Notice that rehash
>>>>> will attempt to create an array of negative size if the size of the
>>>>> backing array reaches (1/2) * Integer.MAX_VALUE, since
>>>>> oldCapacity * 2 + 1 overflows and resolves to a negative number.
>>>>>
>>>>> int newCapacity = oldCapacity * 2 + 1;
>>>>> HashtableEntry newTable[] = new HashtableEntry[newCapacity];
>>>>>
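For illustration (again with a made-up class name and an example
capacity), this failure is immediate rather than gradual: the computed
capacity is negative, so the array allocation itself throws.

    public class HashtableRehashOverflow {
        public static void main(String[] args) {
            int oldCapacity = 1073741824;                // 2^30, about (1/2) * Integer.MAX_VALUE
            int newCapacity = oldCapacity * 2 + 1;       // overflows to -2147483647
            Object[] newTable = new Object[newCapacity]; // NegativeArraySizeException
        }
    }
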
>>>>> When the capacity of the backing array in a ByteArrayOutputStream
>>>>> reaches (1/2) * Integer.MAX_VALUE, its size reaches its capacity,
>>>>> and a write operation is invoked, the capacity of the backing array
>>>>> is increased only by the required number of elements. Notice that in
>>>>> the following excerpt from ByteArrayOutputStream.write(int) the new
>>>>> backing array capacity is set to 2 * buf.length unless this value
>>>>> would not suffice to accommodate the required capacity, in which
>>>>> case it is set to the required capacity. If the current backing
>>>>> array capacity is at least (1/2) * Integer.MAX_VALUE + 1, then
>>>>> buf.length << 1 overflows and resolves to a negative number,
>>>>> resulting in the new capacity being set to the required capacity.
>>>>> The major consequence of this, as with ArrayList, is that each
>>>>> subsequent write operation results in a full resize of the
>>>>> ByteArrayOutputStream, causing performance to degrade significantly.
>>>>>
>>>>> int newcount = count + 1;
>>>>> if (newcount > buf.length) {
>>>>>     buf = Arrays.copyOf(buf, Math.max(buf.length << 1, newcount));
>>>>> }
>>>>>
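A short illustration of why Math.max picks the required count once the
shift overflows (class name and buffer length are illustrative; no real
buffer is allocated, only the length arithmetic is shown):

    public class BaosGrowthOverflow {
        public static void main(String[] args) {
            int bufLength = 1073741824;       // 2^30, about (1/2) * Integer.MAX_VALUE
            int newcount = bufLength + 1;     // one more byte written
            int doubled = bufLength << 1;     // overflows to -2147483648
            System.out.println(Math.max(doubled, newcount)); // 1073741825
        }
    }
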
>>>>> It is interesting to note that any statements about the amortized
>>>>> time complexity of add/insert operations, such as the one in the
>>>>> ArrayList javadoc, are invalidated by the performance-related bugs.
>>>>> One solution to the above situations is to set the new capacity of
>>>>> the backing array to Integer.MAX_VALUE when the initial size
>>>>> calculation results in a negative number during a resize.
>>>>>
>>>>> Apologies if these bugs are already known.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Kevin
>>>>>
>>>
>