Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream

Martin Buchholz martinrb at google.com
Thu May 13 01:23:45 UTC 2010


Alright, I'm convinced this is a bug.  Please file one in bugtraq
(or expunge from tl in the unlikely event that would be permitted).
It appears we very carefully specified the capacity manipulation behavior,
but didn't have any jtreg tests for it.
Do y'all run the Mauve tests?  (Probably a good idea)

Martin

On Wed, May 12, 2010 at 17:31, David Holmes <David.Holmes at oracle.com> wrote:
> Chris, Martin,
>
> The changes to AbstractStringBuilder violate the specification for
> ensureCapacity: it has to grow from N to 2N+2. The new code in:
>
> http://hg.openjdk.java.net/jdk7/tl-gate/jdk/rev/ec45423a4700
>
>  void expandCapacity(int minimumCapacity) {
> - int newCapacity = (value.length + 1) * 2;
> + int newCapacity = value.length * 2;
>
> Has dropped the +2 part.
>
> This is causing test failures.
>
> David
> -----
>
> Chris Hegarty said the following on 04/16/10 19:38:
>>
>> Martin Buchholz wrote:
>>>
>>> Hi Chris,
>>>
>>> I recently discovered another place to handle huge arrays better - in
>>> AbstractCollection.
>>> I've put those changes into
>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize2/
>>> I propose to qfold these into the original changes for this bug
>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
>>> which have not yet been committed.
>>
>> Good catch Martin, these additional changes look good.
>>
>> -Chris.
>>
>>>
>>> Martin
>>>
>>> On Tue, Mar 9, 2010 at 04:41, Christopher Hegarty -Sun Microsystems
>>> Ireland <Christopher.Hegarty at sun.com> wrote:
>>>>
>>>> Sorry Martin, I appear to have missed your original request to file this
>>>> bug. I since filed the following:
>>>>
>>>>  6933217: Huge arrays handled poorly in core libraries
>>>>
>>>> The changes you are proposing seem reasonable to me.
>>>>
>>>> -Chris.
>>>>
>>>> Martin Buchholz wrote:
>>>>>
>>>>> [Chris or Alan, please review and file a bug]
>>>>>
>>>>> OK, guys,
>>>>>
>>>>> Here's a patch:
>>>>>
>>>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
>>>>>
>>>>> Martin
>>>>>
>>>>> On Fri, Mar 5, 2010 at 02:48, Kevin L. Stern <kevin.l.stern at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi Martin,
>>>>>>
>>>>>> Thank you for your reply.  If I may, PriorityQueue appears to employ
>>>>>> the
>>>>>> simple strategy that I suggested above in its grow method:
>>>>>>
>>>>>>       int newCapacity = ((oldCapacity < 64)?
>>>>>>                          ((oldCapacity + 1) * 2):
>>>>>>                          ((oldCapacity / 2) * 3));
>>>>>>       if (newCapacity < 0) // overflow
>>>>>>           newCapacity = Integer.MAX_VALUE;
>>>>>>
>>>>>> It might be desirable to set a common strategy for capacity increase
>>>>>> for
>>>>>> all
>>>>>> collections.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Kevin
>>>>>>
>>>>>> On Fri, Mar 5, 2010 at 3:04 AM, Martin Buchholz <martinrb at google.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Kevin,
>>>>>>>
>>>>>>> As you've noticed, creating objects within a factor of two of
>>>>>>> their natural limits is a good way to expose lurking bugs.
>>>>>>>
>>>>>>> I'm the one responsible for the algorithm in ArrayList.
>>>>>>> I'm a bit embarrassed, looking at that code today.
>>>>>>> We could set the array size to Integer.MAX_VALUE,
>>>>>>> but then you might hit an independent buglet in hotspot
>>>>>>> that you cannot allocate an array with Integer.MAX_VALUE
>>>>>>> elements, but Integer.MAX_VALUE - 5 (or so) works.
>>>>>>>
>>>>>>> It occurs to me that increasing the size by 50% is better done by
>>>>>>> int newCapacity = oldCapacity + (oldCapacity >> 1) + 1;
>>>>>>>
>>>>>>> I agree with the plan of setting the capacity to something near
>>>>>>> MAX_VALUE on overflow, and throw OutOfMemoryError on next resize.
>>>>>>>
>>>>>>> These bugs are not known.
>>>>>>> Chris Hegarty, could you file a bug for us?
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>> On Wed, Mar 3, 2010 at 17:41, Kevin L. Stern
>>>>>>> <kevin.l.stern at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I've noticed bugs in java.util.ArrayList, java.util.Hashtable and
>>>>>>>> java.io.ByteArrayOutputStream which arise when the capacities of the
>>>>>>>> data
>>>>>>>> structures reach a particular threshold.  More below.
>>>>>>>>
>>>>>>>> When the capacity of an ArrayList reaches (2/3)*Integer.MAX_VALUE
>>>>>>>> its
>>>>>>>> size
>>>>>>>> reaches its capacity and an add or an insert operation is invoked,
>>>>>>>> the
>>>>>>>> capacity is increased by only one element.  Notice that in the
>>>>>>>> following
>>>>>>>> excerpt from ArrayList.ensureCapacity the new capacity is set to
>>>>>>>> (3/2)
>>>>>>>> *
>>>>>>>> oldCapacity + 1 unless this value would not suffice to accommodate
>>>>>>>> the
>>>>>>>> required capacity in which case it is set to the required capacity.
>>>>>>>>  If
>>>>>>>> the
>>>>>>>> current capacity is at least (2/3)*Integer.MAX_VALUE, then
>>>>>>>> (oldCapacity
>>>>>>>> *
>>>>>>>> 3)/2 + 1 overflows and resolves to a negative number resulting in
>>>>>>>> the
>>>>>>>> new
>>>>>>>> capacity being set to the required capacity.  The major consequence
>>>>>>>> of
>>>>>>>> this
>>>>>>>> is that each subsequent add/insert operation results in a full
>>>>>>>> resize
>>>>>>>> of
>>>>>>>> the
>>>>>>>> ArrayList causing performance to degrade significantly.
>>>>>>>>
>>>>>>>>       int newCapacity = (oldCapacity * 3)/2 + 1;
>>>>>>>>           if (newCapacity < minCapacity)
>>>>>>>>       newCapacity = minCapacity;
>>>>>>>>
>>>>>>>> Hashtable breaks entirely when the size of its backing array reaches
>>>>>>>> (1/2) *
>>>>>>>> Integer.MAX_VALUE and a rehash is necessary as is evident from the
>>>>>>>> following
>>>>>>>> excerpt from rehash.  Notice that rehash will attempt to create an
>>>>>>>> array
>>>>>>>> of
>>>>>>>> negative size if the size of the backing array reaches (1/2) *
>>>>>>>> Integer.MAX_VALUE since oldCapacity * 2 + 1 overflows and resolves
>>>>>>>> to a
>>>>>>>> negative number.
>>>>>>>>
>>>>>>>>   int newCapacity = oldCapacity * 2 + 1;
>>>>>>>>   HashtableEntry newTable[] = new HashtableEntry[newCapacity];
>>>>>>>>
>>>>>>>> When the capacity of the backing array in a ByteArrayOutputStream
>>>>>>>> reaches
>>>>>>>> (1/2) * Integer.MAX_VALUE its size reaches its capacity and a write
>>>>>>>> operation is invoked, the capacity of the backing array is increased
>>>>>>>> only by
>>>>>>>> the required number of elements.  Notice that in the following
>>>>>>>> excerpt
>>>>>>>> from
>>>>>>>> ByteArrayOutputStream.write(int) the new backing array capacity is
>>>>>>>> set
>>>>>>>> to 2
>>>>>>>> * buf.length unless this value would not suffice to accommodate the
>>>>>>>> required
>>>>>>>> capacity in which case it is set to the required capacity.  If the
>>>>>>>> current
>>>>>>>> backing array capacity is at least (1/2) * Integer.MAX_VALUE + 1,
>>>>>>>> then
>>>>>>>> buf.length << 1 overflows and resolves to a negative number
>>>>>>>> resulting
>>>>>>>> in
>>>>>>>> the
>>>>>>>> new capacity being set to the required capacity.  The major
>>>>>>>> consequence
>>>>>>>> of
>>>>>>>> this, like with ArrayList, is that each subsequent write operation
>>>>>>>> results
>>>>>>>> in a full resize of the ByteArrayOutputStream causing performance to
>>>>>>>> degrade
>>>>>>>> significantly.
>>>>>>>>
>>>>>>>>   int newcount = count + 1;
>>>>>>>>   if (newcount > buf.length) {
>>>>>>>>           buf = Arrays.copyOf(buf, Math.max(buf.length << 1,
>>>>>>>> newcount));
>>>>>>>>   }
>>>>>>>>
>>>>>>>> It is interesting to note that any statements about the amortized
>>>>>>>> time
>>>>>>>> complexity of add/insert operations, such as the one in the
>>>>>>>> ArrayList
>>>>>>>> javadoc, are invalidated by the performance related bugs.  One
>>>>>>>> solution
>>>>>>>> to
>>>>>>>> the above situations is to set the new capacity of the backing array
>>>>>>>> to
>>>>>>>> Integer.MAX_VALUE when the initial size calculation results in a
>>>>>>>> negative
>>>>>>>> number during a resize.
>>>>>>>>
>>>>>>>> Apologies if these bugs are already known.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Kevin
>>>>>>>>
>



More information about the core-libs-dev mailing list