Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
Martin Buchholz
martinrb at google.com
Fri Mar 5 09:04:32 UTC 2010
Hi Kevin,
As you've noticed, creating objects within a factor of two of
their natural limits is a good way to expose lurking bugs.
I'm the one responsible for the algorithm in ArrayList.
I'm a bit embarrassed, looking at that code today.
We could set the array size to Integer.MAX_VALUE,
but then you might hit an independent buglet in hotspot
that you cannot allocate an array with Integer.MAX_VALUE
elements, but Integer.MAX_VALUE - 5 (or so) works.
It occurs to me that increasing the size by 50% is better done by
int newCapacity = oldCapacity + (oldCapacity >> 1) + 1;
I agree with the plan of setting the capacity to something near
MAX_VALUE on overflow, and throw OutOfMemoryError on next resize.
These bugs are not known.
Chris Hegarty, could you file a bug for us?
Martin
On Wed, Mar 3, 2010 at 17:41, Kevin L. Stern <kevin.l.stern at gmail.com> wrote:
> Greetings,
>
> I've noticed bugs in java.util.ArrayList, java.util.Hashtable and
> java.io.ByteArrayOutputStream which arise when the capacities of the data
> structures reach a particular threshold. More below.
>
> When the capacity of an ArrayList reaches (2/3)*Integer.MAX_VALUE its size
> reaches its capacity and an add or an insert operation is invoked, the
> capacity is increased by only one element. Notice that in the following
> excerpt from ArrayList.ensureCapacity the new capacity is set to (3/2) *
> oldCapacity + 1 unless this value would not suffice to accommodate the
> required capacity in which case it is set to the required capacity. If the
> current capacity is at least (2/3)*Integer.MAX_VALUE, then (oldCapacity *
> 3)/2 + 1 overflows and resolves to a negative number resulting in the new
> capacity being set to the required capacity. The major consequence of this
> is that each subsequent add/insert operation results in a full resize of the
> ArrayList causing performance to degrade significantly.
>
> int newCapacity = (oldCapacity * 3)/2 + 1;
> if (newCapacity < minCapacity)
> newCapacity = minCapacity;
>
> Hashtable breaks entirely when the size of its backing array reaches (1/2) *
> Integer.MAX_VALUE and a rehash is necessary as is evident from the following
> excerpt from rehash. Notice that rehash will attempt to create an array of
> negative size if the size of the backing array reaches (1/2) *
> Integer.MAX_VALUE since oldCapacity * 2 + 1 overflows and resolves to a
> negative number.
>
> int newCapacity = oldCapacity * 2 + 1;
> HashtableEntry newTable[] = new HashtableEntry[newCapacity];
>
> When the capacity of the backing array in a ByteArrayOutputStream reaches
> (1/2) * Integer.MAX_VALUE its size reaches its capacity and a write
> operation is invoked, the capacity of the backing array is increased only by
> the required number of elements. Notice that in the following excerpt from
> ByteArrayOutputStream.write(int) the new backing array capacity is set to 2
> * buf.length unless this value would not suffice to accommodate the required
> capacity in which case it is set to the required capacity. If the current
> backing array capacity is at least (1/2) * Integer.MAX_VALUE + 1, then
> buf.length << 1 overflows and resolves to a negative number resulting in the
> new capacity being set to the required capacity. The major consequence of
> this, like with ArrayList, is that each subsequent write operation results
> in a full resize of the ByteArrayOutputStream causing performance to degrade
> significantly.
>
> int newcount = count + 1;
> if (newcount > buf.length) {
> buf = Arrays.copyOf(buf, Math.max(buf.length << 1, newcount));
> }
>
> It is interesting to note that any statements about the amortized time
> complexity of add/insert operations, such as the one in the ArrayList
> javadoc, are invalidated by the performance related bugs. One solution to
> the above situations is to set the new capacity of the backing array to
> Integer.MAX_VALUE when the initial size calculation results in a negative
> number during a resize.
>
> Apologies if these bugs are already known.
>
> Regards,
>
> Kevin
>
More information about the core-libs-dev
mailing list