Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream

Martin Buchholz martinrb at google.com
Thu Mar 11 01:03:32 UTC 2010


On Wed, Mar 10, 2010 at 01:58, Goktug Gokdogan <gokdogan at gmail.com> wrote:
> Similarly,
>   BitSet.ensureCapacity

I don't think BitSet has this problem, because the bits are stored in longs,
so the array can never overflow.  But don't believe me - prove me wrong!
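To make the BitSet case concrete, here is a minimal sketch (my own illustration, not JDK source) of why a long-backed bit array stays far below any int-overflow threshold:

```java
// Sketch of BitSet-style word indexing: 64 bits are packed per long,
// so the backing long[] needs only bitIndex/64 + 1 elements.
public class BitSetWords {
    // Word index for a given bit index (64 = 2^6 bits per long).
    static int wordIndex(int bitIndex) {
        return bitIndex >> 6;
    }

    public static void main(String[] args) {
        // Even the largest legal bit index needs only ~2^25 words,
        // so doubling the long[] during growth can never overflow an int.
        int maxWords = wordIndex(Integer.MAX_VALUE) + 1;
        System.out.println(maxWords);         // 33554432
        System.out.println(maxWords * 2 > 0); // doubling stays positive
    }
}
```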

>   AbstractStringBuilder.expandCapacity

Yup.

>   Vector.ensureCapacityHelper

Yup.

The scope of this fix is growing...

> methods need to have similar checks and/or throw proper exceptions.
>
> By the way, I did not understand why IdentityHashMap and HashMap have
> different MAXIMUM_CAPACITY and different logic to handle resize and
> overflow.

These two classes store their data in completely different ways.
In particular, HashMap need never fail because of overflow;
Integer.MAX_VALUE buckets should be enough for anybody!
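For reference, HashMap-style table growth can be sketched like this (the 1 << 30 cap matches HashMap's MAXIMUM_CAPACITY; the helper method itself is my own illustration, not JDK code):

```java
// Sketch of power-of-two table growth with a hard cap: the table
// doubles until it hits the cap, then simply stops resizing, so the
// size calculation can never overflow.
public class PowerOfTwoGrow {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Double the table size, clamping at the cap instead of overflowing.
    static int nextCapacity(int oldCapacity) {
        if (oldCapacity >= MAXIMUM_CAPACITY)
            return MAXIMUM_CAPACITY; // stop resizing; never goes negative
        return oldCapacity << 1;
    }

    public static void main(String[] args) {
        System.out.println(nextCapacity(16));      // 32
        System.out.println(nextCapacity(1 << 30)); // stays at 1 << 30
    }
}
```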

Webrev regenerated.
http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
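The general shape of the fix under review can be sketched as follows (a simplification for illustration; the headroom constant and helper name are assumptions, not the exact webrev code):

```java
// Sketch of overflow-safe 50% growth: subtraction-based comparisons
// stay correct even after newCapacity wraps to a negative int.
public class SafeGrow {
    // Some VMs refuse arrays of exactly Integer.MAX_VALUE elements,
    // so leave a little headroom below the limit.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    static int grow(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        return newCapacity;
    }

    static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) // the requested size itself overflowed
            throw new OutOfMemoryError();
        return (minCapacity > MAX_ARRAY_SIZE)
            ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(grow(10, 11)); // 15
        // Near the limit, growth clamps instead of going negative.
        System.out.println(grow(Integer.MAX_VALUE - 1, Integer.MAX_VALUE));
    }
}
```

Later adds past the clamp then throw OutOfMemoryError from hugeCapacity rather than silently resizing by one element each time.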

Martin

>
> On Mon, Mar 8, 2010 at 8:10 PM, Martin Buchholz <martinrb at google.com> wrote:
>>
>> [Chris or Alan, please review and file a bug]
>>
>> OK, guys,
>>
>> Here's a patch:
>>
>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
>>
>> Martin
>>
>> On Fri, Mar 5, 2010 at 02:48, Kevin L. Stern <kevin.l.stern at gmail.com>
>> wrote:
>> > Hi Martin,
>> >
>> > Thank you for your reply.  If I may, PriorityQueue appears to employ the
>> > simple strategy that I suggested above in its grow method:
>> >
>> >         int newCapacity = ((oldCapacity < 64)?
>> >                            ((oldCapacity + 1) * 2):
>> >                            ((oldCapacity / 2) * 3));
>> >         if (newCapacity < 0) // overflow
>> >             newCapacity = Integer.MAX_VALUE;
>> >
>> > It might be desirable to set a common strategy for capacity increase
>> > for all collections.
>> >
>> > Regards,
>> >
>> > Kevin
>> >
>> > On Fri, Mar 5, 2010 at 3:04 AM, Martin Buchholz <martinrb at google.com>
>> > wrote:
>> >>
>> >> Hi Kevin,
>> >>
>> >> As you've noticed, creating objects within a factor of two of
>> >> their natural limits is a good way to expose lurking bugs.
>> >>
>> >> I'm the one responsible for the algorithm in ArrayList.
>> >> I'm a bit embarrassed, looking at that code today.
>> >> We could set the array size to Integer.MAX_VALUE,
>> >> but then you might hit an independent buglet in hotspot
>> >> that you cannot allocate an array with Integer.MAX_VALUE
>> >> elements, but Integer.MAX_VALUE - 5 (or so) works.
>> >>
>> >> It occurs to me that increasing the size by 50% is better done by
>> >> int newCapacity = oldCapacity + (oldCapacity >> 1) + 1;
>> >>
>> >> I agree with the plan of setting the capacity to something near
>> >> MAX_VALUE on overflow, and throwing OutOfMemoryError on the next resize.
>> >>
>> >> These bugs are not known.
>> >> Chris Hegarty, could you file a bug for us?
>> >>
>> >> Martin
>> >>
>> >> On Wed, Mar 3, 2010 at 17:41, Kevin L. Stern <kevin.l.stern at gmail.com>
>> >> wrote:
>> >> > Greetings,
>> >> >
>> >> > I've noticed bugs in java.util.ArrayList, java.util.Hashtable and
>> >> > java.io.ByteArrayOutputStream which arise when the capacities of the
>> >> > data
>> >> > structures reach a particular threshold.  More below.
>> >> >
>> >> > When the capacity of an ArrayList reaches (2/3)*Integer.MAX_VALUE,
>> >> > its size reaches its capacity, and an add or insert operation is
>> >> > invoked, the capacity is increased by only one element.  Notice that
>> >> > in the following excerpt from ArrayList.ensureCapacity the new
>> >> > capacity is set to (3/2) * oldCapacity + 1 unless this value would
>> >> > not suffice to accommodate the required capacity, in which case it
>> >> > is set to the required capacity.  If the current capacity is at
>> >> > least (2/3)*Integer.MAX_VALUE, then (oldCapacity * 3)/2 + 1
>> >> > overflows to a negative number, resulting in the new capacity being
>> >> > set to the required capacity.  The major consequence of this is that
>> >> > each subsequent add/insert operation results in a full resize of the
>> >> > ArrayList, causing performance to degrade significantly.
>> >> >
>> >> >         int newCapacity = (oldCapacity * 3)/2 + 1;
>> >> >         if (newCapacity < minCapacity)
>> >> >             newCapacity = minCapacity;
>> >> >
>> >> > Hashtable breaks entirely when the size of its backing array
>> >> > reaches (1/2) * Integer.MAX_VALUE and a rehash is necessary, as is
>> >> > evident from the following excerpt from rehash.  Notice that rehash
>> >> > will attempt to create an array of negative size if the size of the
>> >> > backing array reaches (1/2) * Integer.MAX_VALUE, since
>> >> > oldCapacity * 2 + 1 overflows to a negative number.
>> >> >
>> >> >     int newCapacity = oldCapacity * 2 + 1;
>> >> >     HashtableEntry newTable[] = new HashtableEntry[newCapacity];
>> >> >
>> >> > When the capacity of the backing array in a ByteArrayOutputStream
>> >> > reaches (1/2) * Integer.MAX_VALUE, its size reaches its capacity,
>> >> > and a write operation is invoked, the capacity of the backing array
>> >> > is increased only by the required number of elements.  Notice that
>> >> > in the following excerpt from ByteArrayOutputStream.write(int) the
>> >> > new backing array capacity is set to 2 * buf.length unless this
>> >> > value would not suffice to accommodate the required capacity, in
>> >> > which case it is set to the required capacity.  If the current
>> >> > backing array capacity is at least (1/2) * Integer.MAX_VALUE + 1,
>> >> > then buf.length << 1 overflows to a negative number, resulting in
>> >> > the new capacity being set to the required capacity.  The major
>> >> > consequence of this, as with ArrayList, is that each subsequent
>> >> > write operation results in a full resize of the
>> >> > ByteArrayOutputStream, causing performance to degrade significantly.
>> >> >
>> >> >     int newcount = count + 1;
>> >> >     if (newcount > buf.length) {
>> >> >         buf = Arrays.copyOf(buf, Math.max(buf.length << 1, newcount));
>> >> >     }
>> >> >
>> >> > It is interesting to note that any statements about the amortized
>> >> > time complexity of add/insert operations, such as the one in the
>> >> > ArrayList javadoc, are invalidated by these performance bugs.  One
>> >> > solution to the above situations is to set the new capacity of the
>> >> > backing array to Integer.MAX_VALUE when the initial size calculation
>> >> > overflows to a negative number during a resize.
>> >> >
>> >> > Apologies if these bugs are already known.
>> >> >
>> >> > Regards,
>> >> >
>> >> > Kevin
>> >> >
>> >
>> >
>
>


