Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream

Martin Buchholz martinrb at google.com
Thu May 13 01:38:10 UTC 2010


Webrev at

http://cr.openjdk.java.net/~martin/webrevs/openjdk7/StringBuffer-capacity/StringBuffer-capacity.patch

Martin

On Wed, May 12, 2010 at 18:23, Martin Buchholz <martinrb at google.com> wrote:
> Alright, I'm convinced this is a bug.  Please file one in bugtraq
> (or expunge from tl in the unlikely event that would be permitted).
> It appears we very carefully specified the capacity manipulation behavior,
> but didn't have any jtreg tests for it.
> Do y'all run the Mauve tests?  (Probably a good idea)
>
> Martin
>
> On Wed, May 12, 2010 at 17:31, David Holmes <David.Holmes at oracle.com> wrote:
>> Chris, Martin,
>>
>> The changes to AbstractStringBuilder violate the specification for
>> ensureCapacity: it has to grow from N to 2N+2. The new code in:
>>
>> http://hg.openjdk.java.net/jdk7/tl-gate/jdk/rev/ec45423a4700
>>
>>  void expandCapacity(int minimumCapacity) {
>> - int newCapacity = (value.length + 1) * 2;
>> + int newCapacity = value.length * 2;
>>
>> Has dropped the +2 part.
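[The 2N+2 growth that the specification requires is observable from user code; a minimal check against the default StringBuilder constructor:]

```java
public class EnsureCapacityDemo {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder(); // default capacity is 16
        int n = sb.capacity();
        sb.ensureCapacity(n + 1);               // force a grow past current capacity
        // ensureCapacity is specified to grow to the larger of the
        // requested minimum and 2 * oldCapacity + 2.
        System.out.println(sb.capacity() == 2 * n + 2); // true
    }
}
```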
>>
>> This is causing test failures.
>>
>> David
>> -----
>>
>> Chris Hegarty said the following on 04/16/10 19:38:
>>>
>>> Martin Buchholz wrote:
>>>>
>>>> Hi Chris,
>>>>
>>>> I recently discovered another place to handle huge arrays better - in
>>>> AbstractCollection.
>>>> I've put those changes into
>>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize2/
>>>> I propose to qfold these into the original changes for this bug
>>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
>>>> which have not yet been committed.
>>>
>>> Good catch Martin, these additional changes look good.
>>>
>>> -Chris.
>>>
>>>>
>>>> Martin
>>>>
>>>> On Tue, Mar 9, 2010 at 04:41, Christopher Hegarty -Sun Microsystems
>>>> Ireland <Christopher.Hegarty at sun.com> wrote:
>>>>>
>>>>> Sorry Martin, I appear to have missed your original request to file this
>>>>> bug. I since filed the following:
>>>>>
>>>>>  6933217: Huge arrays handled poorly in core libraries
>>>>>
>>>>> The changes you are proposing seem reasonable to me.
>>>>>
>>>>> -Chris.
>>>>>
>>>>> Martin Buchholz wrote:
>>>>>>
>>>>>> [Chris or Alan, please review and file a bug]
>>>>>>
>>>>>> OK, guys,
>>>>>>
>>>>>> Here's a patch:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>> On Fri, Mar 5, 2010 at 02:48, Kevin L. Stern <kevin.l.stern at gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Martin,
>>>>>>>
>>>>>>> Thank you for your reply.  If I may, PriorityQueue appears to employ
>>>>>>> the simple strategy that I suggested above in its grow method:
>>>>>>>
>>>>>>>       int newCapacity = ((oldCapacity < 64)?
>>>>>>>                          ((oldCapacity + 1) * 2):
>>>>>>>                          ((oldCapacity / 2) * 3));
>>>>>>>       if (newCapacity < 0) // overflow
>>>>>>>           newCapacity = Integer.MAX_VALUE;
>>>>>>>
>>>>>>> It might be desirable to set a common strategy for capacity increase
>>>>>>> for all collections.
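[The quoted PriorityQueue formula, restated as a standalone method so the clamp-on-overflow behavior can be exercised directly; values in main are illustrative:]

```java
public class GrowDemo {
    // Mirrors the growth formula quoted above: double (+2) while small,
    // grow by 50% once at 64 elements, clamp to MAX_VALUE on overflow.
    static int newCapacity(int oldCapacity) {
        int newCapacity = (oldCapacity < 64)
            ? (oldCapacity + 1) * 2
            : (oldCapacity / 2) * 3;
        if (newCapacity < 0) // overflow
            newCapacity = Integer.MAX_VALUE;
        return newCapacity;
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(10));                // 22: doubles (+2)
        System.out.println(newCapacity(1_000_000));         // 1500000: grows 50%
        System.out.println(newCapacity(Integer.MAX_VALUE)); // clamped to MAX_VALUE
    }
}
```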
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Kevin
>>>>>>>
>>>>>>> On Fri, Mar 5, 2010 at 3:04 AM, Martin Buchholz <martinrb at google.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Kevin,
>>>>>>>>
>>>>>>>> As you've noticed, creating objects within a factor of two of
>>>>>>>> their natural limits is a good way to expose lurking bugs.
>>>>>>>>
>>>>>>>> I'm the one responsible for the algorithm in ArrayList.
>>>>>>>> I'm a bit embarrassed, looking at that code today.
>>>>>>>> We could set the array size to Integer.MAX_VALUE,
>>>>>>>> but then you might hit an independent buglet in hotspot:
>>>>>>>> you cannot allocate an array with Integer.MAX_VALUE
>>>>>>>> elements, but Integer.MAX_VALUE - 5 (or so) works.
>>>>>>>>
>>>>>>>> It occurs to me that increasing the size by 50% is better done by
>>>>>>>> int newCapacity = oldCapacity + (oldCapacity >> 1) + 1;
>>>>>>>>
>>>>>>>> I agree with the plan of setting the capacity to something near
>>>>>>>> MAX_VALUE on overflow, and throwing OutOfMemoryError on the next resize.
>>>>>>>>
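[A sketch of the plan described above, not the committed JDK code; the headroom of 8 below MAX_VALUE and the overflow-conscious subtraction idiom are illustrative choices:]

```java
public class HugeGrowth {
    // Some VMs cannot allocate arrays of exactly Integer.MAX_VALUE
    // elements, hence the small headroom (8 is an illustrative value).
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    static int grow(int oldCapacity, int minCapacity) {
        if (minCapacity < 0)              // requested capacity itself overflowed
            throw new OutOfMemoryError();
        int newCapacity = oldCapacity + (oldCapacity >> 1) + 1; // grow ~50%
        if (newCapacity - minCapacity < 0)    // overflow-conscious comparison
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0) // overflowed, or simply too large
            newCapacity = (minCapacity > MAX_ARRAY_SIZE)
                ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
        return newCapacity;
    }
}
```

Near the top of the int range the 50% step wraps negative, the subtraction-based checks still route it to the clamp, and only a minCapacity beyond MAX_ARRAY_SIZE pushes the capacity to MAX_VALUE itself.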
>>>>>>>> These bugs are not known.
>>>>>>>> Chris Hegarty, could you file a bug for us?
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>>> On Wed, Mar 3, 2010 at 17:41, Kevin L. Stern
>>>>>>>> <kevin.l.stern at gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I've noticed bugs in java.util.ArrayList, java.util.Hashtable and
>>>>>>>>> java.io.ByteArrayOutputStream which arise when the capacities of
>>>>>>>>> the data structures reach a particular threshold.  More below.
>>>>>>>>>
>>>>>>>>> When the capacity of an ArrayList reaches (2/3)*Integer.MAX_VALUE,
>>>>>>>>> its size reaches its capacity, and an add or insert operation is
>>>>>>>>> invoked, the capacity is increased by only one element.  Notice
>>>>>>>>> that in the following excerpt from ArrayList.ensureCapacity the
>>>>>>>>> new capacity is set to (3/2) * oldCapacity + 1 unless this value
>>>>>>>>> would not suffice to accommodate the required capacity, in which
>>>>>>>>> case it is set to the required capacity.  If the current capacity
>>>>>>>>> is at least (2/3)*Integer.MAX_VALUE, then (oldCapacity * 3)/2 + 1
>>>>>>>>> overflows and resolves to a negative number, resulting in the new
>>>>>>>>> capacity being set to the required capacity.  The major consequence
>>>>>>>>> of this is that each subsequent add/insert operation results in a
>>>>>>>>> full resize of the ArrayList, causing performance to degrade
>>>>>>>>> significantly.
>>>>>>>>>
>>>>>>>>>       int newCapacity = (oldCapacity * 3)/2 + 1;
>>>>>>>>>       if (newCapacity < minCapacity)
>>>>>>>>>           newCapacity = minCapacity;
>>>>>>>>>
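[The degenerate growth Kevin describes can be reproduced with the quoted formula in isolation; the capacity below is an illustrative value in the overflowing range:]

```java
public class ArrayListGrowthDemo {
    public static void main(String[] args) {
        int oldCapacity = 1_400_000_000;   // roughly (2/3) * Integer.MAX_VALUE
        int minCapacity = oldCapacity + 1; // one more element requested
        int newCapacity = (oldCapacity * 3) / 2 + 1; // intermediate product overflows
        System.out.println(newCapacity < 0);          // true
        if (newCapacity < minCapacity)
            newCapacity = minCapacity;                // falls back to required capacity
        System.out.println(newCapacity - oldCapacity); // 1: grew by one element
    }
}
```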
>>>>>>>>> Hashtable breaks entirely when the size of its backing array
>>>>>>>>> reaches (1/2) * Integer.MAX_VALUE and a rehash is necessary, as is
>>>>>>>>> evident from the following excerpt from rehash.  Notice that rehash
>>>>>>>>> will attempt to create an array of negative size if the size of the
>>>>>>>>> backing array reaches (1/2) * Integer.MAX_VALUE, since
>>>>>>>>> oldCapacity * 2 + 1 overflows and resolves to a negative number.
>>>>>>>>>
>>>>>>>>>   int newCapacity = oldCapacity * 2 + 1;
>>>>>>>>>   HashtableEntry newTable[] = new HashtableEntry[newCapacity];
>>>>>>>>>
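[The failed rehash can likewise be shown in isolation; the allocation below stands in for what the quoted rehash code would attempt:]

```java
public class HashtableRehashDemo {
    public static void main(String[] args) {
        int oldCapacity = Integer.MAX_VALUE / 2 + 1; // (1/2) * MAX_VALUE, rounded up
        int newCapacity = oldCapacity * 2 + 1;       // overflows to a negative value
        System.out.println(newCapacity < 0);          // true
        try {
            Object[] newTable = new Object[newCapacity]; // what rehash would do
        } catch (NegativeArraySizeException e) {
            System.out.println("rehash fails: " + e);
        }
    }
}
```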
>>>>>>>>> When the capacity of the backing array in a ByteArrayOutputStream
>>>>>>>>> reaches (1/2) * Integer.MAX_VALUE, its size reaches its capacity,
>>>>>>>>> and a write operation is invoked, the capacity of the backing array
>>>>>>>>> is increased only by the required number of elements.  Notice that
>>>>>>>>> in the following excerpt from ByteArrayOutputStream.write(int) the
>>>>>>>>> new backing array capacity is set to 2 * buf.length unless this
>>>>>>>>> value would not suffice to accommodate the required capacity, in
>>>>>>>>> which case it is set to the required capacity.  If the current
>>>>>>>>> backing array capacity is at least (1/2) * Integer.MAX_VALUE + 1,
>>>>>>>>> then buf.length << 1 overflows and resolves to a negative number,
>>>>>>>>> resulting in the new capacity being set to the required capacity.
>>>>>>>>> The major consequence of this, as with ArrayList, is that each
>>>>>>>>> subsequent write operation results in a full resize of the
>>>>>>>>> ByteArrayOutputStream, causing performance to degrade significantly.
>>>>>>>>>
>>>>>>>>>   int newcount = count + 1;
>>>>>>>>>   if (newcount > buf.length) {
>>>>>>>>>       buf = Arrays.copyOf(buf, Math.max(buf.length << 1, newcount));
>>>>>>>>>   }
>>>>>>>>>
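[The same arithmetic, extracted from the quoted write excerpt; buf.length here is the smallest capacity at which the shift wraps:]

```java
public class BaosGrowthDemo {
    public static void main(String[] args) {
        int bufLength = Integer.MAX_VALUE / 2 + 1; // capacity just past (1/2) * MAX_VALUE
        int newcount = bufLength + 1;              // one more byte to write
        int doubled = bufLength << 1;              // overflows to a negative value
        System.out.println(doubled);               // Integer.MIN_VALUE here
        int newCapacity = Math.max(doubled, newcount);
        System.out.println(newCapacity == newcount); // true: grows only to required size
    }
}
```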
>>>>>>>>> It is interesting to note that any statements about the amortized
>>>>>>>>> time complexity of add/insert operations, such as the one in the
>>>>>>>>> ArrayList javadoc, are invalidated by these performance-related
>>>>>>>>> bugs.  One solution to the above situations is to set the new
>>>>>>>>> capacity of the backing array to Integer.MAX_VALUE when the initial
>>>>>>>>> size calculation results in a negative number during a resize.
>>>>>>>>>
>>>>>>>>> Apologies if these bugs are already known.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Kevin
>>>>>>>>>
>>
>



More information about the core-libs-dev mailing list