Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
Kelly O'Hair
kelly.ohair at oracle.com
Thu May 13 00:52:58 UTC 2010
If it helps, I found an open test that should demonstrate it:
http://www.sourceware.org/cgi-bin/cvsweb.cgi/mauve/gnu/testlet/java/lang/StringBuffer/StringBufferTest.java?rev=1.7&content-type=text/x-cvsweb-markup&cvsroot=mauve
-kto
On May 12, 2010, at 5:31 PM, David Holmes wrote:
> Chris, Martin,
>
> The changes to AbstractStringBuilder violate the specification for
> ensureCapacity: it has to grow from N to 2N+2. The new code in:
>
> http://hg.openjdk.java.net/jdk7/tl-gate/jdk/rev/ec45423a4700
>
> void expandCapacity(int minimumCapacity) {
> -    int newCapacity = (value.length + 1) * 2;
> +    int newCapacity = value.length * 2;
>
> Has dropped the +2 part.
>
> This is causing test failures.
>
> David
> -----
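The 2N+2 contract David cites is easy to observe directly. A minimal sketch (class name mine), assuming the default StringBuilder capacity of 16:

```java
// Sketch: observes the documented ensureCapacity growth rule,
// "the larger of minimumCapacity and twice the old capacity, plus 2".
public class EnsureCapacityCheck {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();   // default capacity is 16
        int old = sb.capacity();
        sb.ensureCapacity(old + 1);               // force one growth step
        // Per the contract the new capacity should be 2 * old + 2.
        System.out.println(sb.capacity());
        if (sb.capacity() != 2 * old + 2)
            throw new AssertionError("2N+2 growth violated: " + sb.capacity());
    }
}
```

With the `+2` dropped, a zero-capacity builder would also fail to grow at all, since `0 * 2` is still `0`.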
>
> Chris Hegarty said the following on 04/16/10 19:38:
>> Martin Buchholz wrote:
>>> Hi Chris,
>>>
>>> I recently discovered another place to handle huge arrays better -
>>> in AbstractCollection.
>>> I've put those changes into
>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize2/
>>> I propose to qfold these into the original changes for this bug
>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
>>> which have not yet been committed.
>> Good catch Martin, these additional changes look good.
>> -Chris.
>>>
>>> Martin
>>>
>>> On Tue, Mar 9, 2010 at 04:41, Christopher Hegarty -Sun Microsystems
>>> Ireland <Christopher.Hegarty at sun.com> wrote:
>>>> Sorry Martin, I appear to have missed your original request to file
>>>> this bug. I have since filed the following:
>>>>
>>>> 6933217: Huge arrays handled poorly in core libraries
>>>>
>>>> The changes you are proposing seem reasonable to me.
>>>>
>>>> -Chris.
>>>>
>>>> Martin Buchholz wrote:
>>>>> [Chris or Alan, please review and file a bug]
>>>>>
>>>>> OK, guys,
>>>>>
>>>>> Here's a patch:
>>>>>
>>>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
>>>>>
>>>>> Martin
>>>>>
>>>>> On Fri, Mar 5, 2010 at 02:48, Kevin L. Stern
>>>>> <kevin.l.stern at gmail.com> wrote:
>>>>>> Hi Martin,
>>>>>>
>>>>>> Thank you for your reply. If I may, PriorityQueue appears to
>>>>>> employ the simple strategy that I suggested above in its grow
>>>>>> method:
>>>>>>
>>>>>> int newCapacity = ((oldCapacity < 64) ?
>>>>>>                    ((oldCapacity + 1) * 2) :
>>>>>>                    ((oldCapacity / 2) * 3));
>>>>>> if (newCapacity < 0) // overflow
>>>>>>     newCapacity = Integer.MAX_VALUE;
>>>>>>
>>>>>> It might be desirable to set a common strategy for capacity
>>>>>> increase for all collections.
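The overflow guard in that excerpt is what keeps PriorityQueue working near the limit: `(oldCapacity / 2) * 3` itself wraps negative for large inputs, and the guard then clamps to `Integer.MAX_VALUE`. A standalone sketch of the same strategy (class name mine):

```java
// Sketch of PriorityQueue's growth strategy, isolated for inspection.
public class GrowDemo {
    static int grow(int oldCapacity) {
        int newCapacity = ((oldCapacity < 64) ?
                           ((oldCapacity + 1) * 2) :
                           ((oldCapacity / 2) * 3));
        if (newCapacity < 0) // int arithmetic wrapped negative
            newCapacity = Integer.MAX_VALUE;
        return newCapacity;
    }

    public static void main(String[] args) {
        System.out.println(grow(10));    // small regime: (10 + 1) * 2 = 22
        System.out.println(grow(1000));  // 50% growth: 1500
        // Near the limit the multiply wraps negative and the guard
        // clamps the result to Integer.MAX_VALUE.
        System.out.println(grow(Integer.MAX_VALUE - 1)); // 2147483647
    }
}
```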
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Kevin
>>>>>>
>>>>>> On Fri, Mar 5, 2010 at 3:04 AM, Martin Buchholz
>>>>>> <martinrb at google.com> wrote:
>>>>>>> Hi Kevin,
>>>>>>>
>>>>>>> As you've noticed, creating objects within a factor of two of
>>>>>>> their natural limits is a good way to expose lurking bugs.
>>>>>>>
>>>>>>> I'm the one responsible for the algorithm in ArrayList.
>>>>>>> I'm a bit embarrassed, looking at that code today.
>>>>>>> We could set the array size to Integer.MAX_VALUE,
>>>>>>> but then you might hit an independent buglet in HotSpot:
>>>>>>> you cannot allocate an array with Integer.MAX_VALUE elements,
>>>>>>> though Integer.MAX_VALUE - 5 (or so) works.
>>>>>>>
>>>>>>> It occurs to me that increasing the size by 50% is better done by
>>>>>>>     int newCapacity = oldCapacity + (oldCapacity >> 1) + 1;
>>>>>>>
>>>>>>> I agree with the plan of setting the capacity to something near
>>>>>>> MAX_VALUE on overflow, and throwing OutOfMemoryError on the next
>>>>>>> resize.
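The advantage of the shift form Martin suggests is that it has no intermediate `* 3` to wrap: it stays correct for capacities up to roughly (2/3) * Integer.MAX_VALUE, a point at which `(oldCapacity * 3)/2 + 1` has already gone negative. A quick comparison (my own sketch, not from the thread):

```java
// Compares the two 50%-growth expressions at a capacity just under
// (2/3) * Integer.MAX_VALUE, where the multiply-first form wraps.
public class GrowthCompare {
    public static void main(String[] args) {
        int oldCapacity = 1431655764;                 // about (2/3) * Integer.MAX_VALUE
        int viaMultiply = (oldCapacity * 3) / 2 + 1;  // intermediate *3 wraps negative
        int viaShift = oldCapacity + (oldCapacity >> 1) + 1;
        System.out.println(viaMultiply);  // -1 (useless as a capacity)
        System.out.println(viaShift);     // 2147483647, i.e. Integer.MAX_VALUE
    }
}
```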
>>>>>>>
>>>>>>> These bugs were not previously known.
>>>>>>> Chris Hegarty, could you file a bug for us?
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>> On Wed, Mar 3, 2010 at 17:41, Kevin L. Stern
>>>>>>> <kevin.l.stern at gmail.com> wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I've noticed bugs in java.util.ArrayList, java.util.Hashtable
>>>>>>>> and java.io.ByteArrayOutputStream which arise when the
>>>>>>>> capacities of the data structures reach a particular threshold.
>>>>>>>> More below.
>>>>>>>>
>>>>>>>> When the capacity of an ArrayList reaches (2/3) *
>>>>>>>> Integer.MAX_VALUE, its size reaches its capacity, and an add or
>>>>>>>> insert operation is invoked, the capacity is increased by only
>>>>>>>> one element. Notice that in the following excerpt from
>>>>>>>> ArrayList.ensureCapacity the new capacity is set to (3/2) *
>>>>>>>> oldCapacity + 1 unless this value would not suffice to
>>>>>>>> accommodate the required capacity, in which case it is set to
>>>>>>>> the required capacity. If the current capacity is at least
>>>>>>>> (2/3) * Integer.MAX_VALUE, then (oldCapacity * 3)/2 + 1
>>>>>>>> overflows and resolves to a negative number, resulting in the
>>>>>>>> new capacity being set to the required capacity. The major
>>>>>>>> consequence of this is that each subsequent add/insert
>>>>>>>> operation results in a full resize of the ArrayList, causing
>>>>>>>> performance to degrade significantly.
>>>>>>>>
>>>>>>>> int newCapacity = (oldCapacity * 3)/2 + 1;
>>>>>>>> if (newCapacity < minCapacity)
>>>>>>>>     newCapacity = minCapacity;
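The arithmetic can be verified without allocating anything by replicating the ensureCapacity calculation in isolation (class and method names mine):

```java
// Replicates the capacity calculation from ArrayList.ensureCapacity
// to show the overflow-induced one-element growth.
public class ArrayListGrowthDemo {
    static int newCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = (oldCapacity * 3) / 2 + 1;
        if (newCapacity < minCapacity)
            newCapacity = minCapacity;
        return newCapacity;
    }

    public static void main(String[] args) {
        // Normal regime: roughly 50% growth.
        System.out.println(newCapacity(1000, 1001));   // 1501
        // Past (2/3) * Integer.MAX_VALUE the multiply wraps, the result
        // falls back to minCapacity: growth of exactly one slot.
        int big = 1431655766;
        System.out.println(newCapacity(big, big + 1)); // 1431655767
    }
}
```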
>>>>>>>>
>>>>>>>> Hashtable breaks entirely when the size of its backing array
>>>>>>>> reaches (1/2) * Integer.MAX_VALUE and a rehash is necessary, as
>>>>>>>> is evident from the following excerpt from rehash. Notice that
>>>>>>>> rehash will attempt to create an array of negative size if the
>>>>>>>> size of the backing array reaches (1/2) * Integer.MAX_VALUE,
>>>>>>>> since oldCapacity * 2 + 1 overflows and resolves to a negative
>>>>>>>> number.
>>>>>>>>
>>>>>>>> int newCapacity = oldCapacity * 2 + 1;
>>>>>>>> HashtableEntry newTable[] = new HashtableEntry[newCapacity];
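The wraparound is again visible in isolation: once the table has 2^30 buckets, `oldCapacity * 2 + 1` goes negative, and the subsequent array allocation would throw NegativeArraySizeException (class name mine; no huge allocation is actually attempted here):

```java
// Shows the doubling expression from Hashtable.rehash wrapping negative.
public class RehashDemo {
    public static void main(String[] args) {
        int oldCapacity = 1 << 30;              // 1073741824, above (1/2) * Integer.MAX_VALUE
        int newCapacity = oldCapacity * 2 + 1;  // wraps to a negative int
        System.out.println(newCapacity);        // -2147483647
        // "new HashtableEntry[newCapacity]" would throw
        // NegativeArraySizeException at this point.
    }
}
```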
>>>>>>>>
>>>>>>>> When the capacity of the backing array in a
>>>>>>>> ByteArrayOutputStream reaches (1/2) * Integer.MAX_VALUE, its
>>>>>>>> size reaches its capacity, and a write operation is invoked,
>>>>>>>> the capacity of the backing array is increased only by the
>>>>>>>> required number of elements. Notice that in the following
>>>>>>>> excerpt from ByteArrayOutputStream.write(int) the new backing
>>>>>>>> array capacity is set to 2 * buf.length unless this value would
>>>>>>>> not suffice to accommodate the required capacity, in which case
>>>>>>>> it is set to the required capacity. If the current backing
>>>>>>>> array capacity is at least (1/2) * Integer.MAX_VALUE + 1, then
>>>>>>>> buf.length << 1 overflows and resolves to a negative number,
>>>>>>>> resulting in the new capacity being set to the required
>>>>>>>> capacity. The major consequence of this, as with ArrayList, is
>>>>>>>> that each subsequent write operation results in a full resize
>>>>>>>> of the ByteArrayOutputStream, causing performance to degrade
>>>>>>>> significantly.
>>>>>>>>
>>>>>>>> int newcount = count + 1;
>>>>>>>> if (newcount > buf.length) {
>>>>>>>>     buf = Arrays.copyOf(buf, Math.max(buf.length << 1, newcount));
>>>>>>>> }
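The degenerate growth can be replicated the same way: once `buf.length` is 2^30, the doubled value wraps to Integer.MIN_VALUE and `Math.max` always picks the minimal count (class name mine):

```java
// Shows the sizing expression from ByteArrayOutputStream.write(int)
// degenerating to one-byte growth after the shift wraps.
public class BaosGrowthDemo {
    public static void main(String[] args) {
        int length = 1 << 30;        // backing array at 2^30 bytes
        int newcount = length + 1;   // one more byte written
        int newLength = Math.max(length << 1, newcount);
        System.out.println(length << 1); // -2147483648 (wrapped)
        System.out.println(newLength);   // 1073741825: grew by one byte
    }
}
```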
>>>>>>>>
>>>>>>>> It is interesting to note that any statements about the
>>>>>>>> amortized time complexity of add/insert operations, such as the
>>>>>>>> one in the ArrayList javadoc, are invalidated by these
>>>>>>>> performance-related bugs. One solution to the above situations
>>>>>>>> is to set the new capacity of the backing array to
>>>>>>>> Integer.MAX_VALUE when the initial size calculation results in
>>>>>>>> a negative number during a resize.
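Kevin's proposal could look roughly like the following sketch (my own, not code from the JDK): grow by about 50%, detect the wrapped-negative result, and clamp to Integer.MAX_VALUE, leaving any eventual OutOfMemoryError to the allocator.

```java
// Sketch of the proposed fix: saturating ~50% growth.
public class SafeGrowth {
    static int saturatedGrow(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1) + 1;
        if (newCapacity < 0)              // arithmetic wrapped negative
            newCapacity = Integer.MAX_VALUE;
        if (newCapacity < minCapacity)
            newCapacity = minCapacity;
        return newCapacity;
    }

    public static void main(String[] args) {
        System.out.println(saturatedGrow(1000, 1001)); // 1501
        // Near the limit the result clamps instead of going negative.
        System.out.println(saturatedGrow(Integer.MAX_VALUE - 1,
                                         Integer.MAX_VALUE)); // 2147483647
    }
}
```

A shared helper along these lines would also address the "common strategy for all collections" point raised earlier in the thread.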
>>>>>>>>
>>>>>>>> Apologies if these bugs are already known.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Kevin
>>>>>>>>
More information about the core-libs-dev mailing list