Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream
Martin Buchholz
martinrb at google.com
Fri May 14 05:09:49 UTC 2010
OK, I committed the fix for this.
I also added some gratuitous tests to the new test case.
Martin
On Thu, May 13, 2010 at 15:46, David Holmes <David.Holmes at oracle.com> wrote:
> CR: 6952330 Fix for 6933217 broke contract of StringBuffer.ensureCapacity
>
> Thanks,
> David
>
> David Holmes said the following on 05/13/10 11:59:
>>
>> Hi Martin,
>>
>> Bugtraq is offline so I can't file a CR right now.
>>
>> The was caught by Mauve tests. Kelly sent a link to the mailing list but I
>> don't think he's a member so it's probably held up for approval.
>>
>> Martin Buchholz said the following on 05/13/10 11:38:
>>>
>>> Webrev at
>>>
>>>
>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/StringBuffer-capacity/StringBuffer-capacity.patch
>>
>> Fix looks fine. Took me a few tries to confirm the test logic :)
>>
>> Thanks,
>> David
>> -----
>>
>>
>>> Martin
>>>
>>> On Wed, May 12, 2010 at 18:23, Martin Buchholz <martinrb at google.com>
>>> wrote:
>>>>
>>>> Alright, I'm convinced this is a bug. Please file one in bugtraq
>>>> (or expunge from tl in the unlikely event that would be permitted).
>>>> It appears we very carefully specified the capacity manipulation
>>>> behavior,
>>>> but didn't have any jtreg tests for it.
>>>> Do y'all run the Mauve tests? (Probably a good idea)
>>>>
>>>> Martin
>>>>
>>>> On Wed, May 12, 2010 at 17:31, David Holmes <David.Holmes at oracle.com>
>>>> wrote:
>>>>>
>>>>> Chris, Martin,
>>>>>
>>>>> The changes to AbstractStringBuilder violate the specification for
>>>>> ensureCapacity: it has to grow from N to 2N+2. The new code in:
>>>>>
>>>>> http://hg.openjdk.java.net/jdk7/tl-gate/jdk/rev/ec45423a4700
>>>>>
>>>>> void expandCapacity(int minimumCapacity) {
>>>>> - int newCapacity = (value.length + 1) * 2;
>>>>> + int newCapacity = value.length * 2;
>>>>>
>>>>> Has dropped the +2 part.
>>>>>
>>>>> This is causing test failures.
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>> Chris Hegarty said the following on 04/16/10 19:38:
>>>>>>
>>>>>> Martin Buchholz wrote:
>>>>>>>
>>>>>>> Hi Chris,
>>>>>>>
>>>>>>> I recently discovered another place to handle huge arrays better - in
>>>>>>> AbstractCollection.
>>>>>>> I've put those changes into
>>>>>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize2/
>>>>>>> I propose to qfold these into the original changes for this bug
>>>>>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
>>>>>>> which have not yet been committed.
>>>>>>
>>>>>> Good catch Martin, these additional changes look good.
>>>>>>
>>>>>> -Chris.
>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>> On Tue, Mar 9, 2010 at 04:41, Christopher Hegarty -Sun Microsystems
>>>>>>> Ireland <Christopher.Hegarty at sun.com> wrote:
>>>>>>>>
>>>>>>>> Sorry Martin, I appear to have missed your original request to file
>>>>>>>> this
>>>>>>>> bug. I since filed the following:
>>>>>>>>
>>>>>>>> 6933217: Huge arrays handled poorly in core libraries
>>>>>>>>
>>>>>>>> The changes you are proposing seem reasonable to me.
>>>>>>>>
>>>>>>>> -Chris.
>>>>>>>>
>>>>>>>> Martin Buchholz wrote:
>>>>>>>>>
>>>>>>>>> [Chris or Alan, please review and file a bug]
>>>>>>>>>
>>>>>>>>> OK, guys,
>>>>>>>>>
>>>>>>>>> Here's a patch:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~martin/webrevs/openjdk7/ArrayResize/
>>>>>>>>>
>>>>>>>>> Martin
>>>>>>>>>
>>>>>>>>> On Fri, Mar 5, 2010 at 02:48, Kevin L. Stern
>>>>>>>>> <kevin.l.stern at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Martin,
>>>>>>>>>>
>>>>>>>>>> Thank you for your reply. If I may, PriorityQueue appears to
>>>>>>>>>> employ
>>>>>>>>>> the
>>>>>>>>>> simple strategy that I suggested above in its grow method:
>>>>>>>>>>
>>>>>>>>>> int newCapacity = ((oldCapacity < 64)?
>>>>>>>>>> ((oldCapacity + 1) * 2):
>>>>>>>>>> ((oldCapacity / 2) * 3));
>>>>>>>>>> if (newCapacity < 0) // overflow
>>>>>>>>>> newCapacity = Integer.MAX_VALUE;
>>>>>>>>>>
>>>>>>>>>> It might be desirable to set a common strategy for capacity
>>>>>>>>>> increase
>>>>>>>>>> for
>>>>>>>>>> all
>>>>>>>>>> collections.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Kevin
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 5, 2010 at 3:04 AM, Martin Buchholz
>>>>>>>>>> <martinrb at google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Kevin,
>>>>>>>>>>>
>>>>>>>>>>> As you've noticed, creating objects within a factor of two of
>>>>>>>>>>> their natural limits is a good way to expose lurking bugs.
>>>>>>>>>>>
>>>>>>>>>>> I'm the one responsible for the algorithm in ArrayList.
>>>>>>>>>>> I'm a bit embarrassed, looking at that code today.
>>>>>>>>>>> We could set the array size to Integer.MAX_VALUE,
>>>>>>>>>>> but then you might hit an independent buglet in hotspot
>>>>>>>>>>> that you cannot allocate an array with Integer.MAX_VALUE
>>>>>>>>>>> elements, but Integer.MAX_VALUE - 5 (or so) works.
>>>>>>>>>>>
>>>>>>>>>>> It occurs to me that increasing the size by 50% is better done by
>>>>>>>>>>> int newCapacity = oldCapacity + (oldCapacity >> 1) + 1;
>>>>>>>>>>>
>>>>>>>>>>> I agree with the plan of setting the capacity to something near
>>>>>>>>>>> MAX_VALUE on overflow, and throw OutOfMemoryError on next resize.
>>>>>>>>>>>
>>>>>>>>>>> These bugs are not known.
>>>>>>>>>>> Chris Hegarty, could you file a bug for us?
>>>>>>>>>>>
>>>>>>>>>>> Martin
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 3, 2010 at 17:41, Kevin L. Stern
>>>>>>>>>>> <kevin.l.stern at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Greetings,
>>>>>>>>>>>>
>>>>>>>>>>>> I've noticed bugs in java.util.ArrayList, java.util.Hashtable
>>>>>>>>>>>> and
>>>>>>>>>>>> java.io.ByteArrayOutputStream which arise when the capacities of
>>>>>>>>>>>> the
>>>>>>>>>>>> data
>>>>>>>>>>>> structures reach a particular threshold. More below.
>>>>>>>>>>>>
>>>>>>>>>>>> When the capacity of an ArrayList reaches
>>>>>>>>>>>> (2/3)*Integer.MAX_VALUE
>>>>>>>>>>>> its
>>>>>>>>>>>> size
>>>>>>>>>>>> reaches its capacity and an add or an insert operation is
>>>>>>>>>>>> invoked,
>>>>>>>>>>>> the
>>>>>>>>>>>> capacity is increased by only one element. Notice that in the
>>>>>>>>>>>> following
>>>>>>>>>>>> excerpt from ArrayList.ensureCapacity the new capacity is set to
>>>>>>>>>>>> (3/2)
>>>>>>>>>>>> *
>>>>>>>>>>>> oldCapacity + 1 unless this value would not suffice to
>>>>>>>>>>>> accommodate
>>>>>>>>>>>> the
>>>>>>>>>>>> required capacity in which case it is set to the required
>>>>>>>>>>>> capacity.
>>>>>>>>>>>> If
>>>>>>>>>>>> the
>>>>>>>>>>>> current capacity is at least (2/3)*Integer.MAX_VALUE, then
>>>>>>>>>>>> (oldCapacity
>>>>>>>>>>>> *
>>>>>>>>>>>> 3)/2 + 1 overflows and resolves to a negative number resulting
>>>>>>>>>>>> in
>>>>>>>>>>>> the
>>>>>>>>>>>> new
>>>>>>>>>>>> capacity being set to the required capacity. The major
>>>>>>>>>>>> consequence
>>>>>>>>>>>> of
>>>>>>>>>>>> this
>>>>>>>>>>>> is that each subsequent add/insert operation results in a full
>>>>>>>>>>>> resize
>>>>>>>>>>>> of
>>>>>>>>>>>> the
>>>>>>>>>>>> ArrayList causing performance to degrade significantly.
>>>>>>>>>>>>
>>>>>>>>>>>> int newCapacity = (oldCapacity * 3)/2 + 1;
>>>>>>>>>>>> if (newCapacity < minCapacity)
>>>>>>>>>>>> newCapacity = minCapacity;
>>>>>>>>>>>>
>>>>>>>>>>>> Hashtable breaks entirely when the size of its backing array
>>>>>>>>>>>> reaches
>>>>>>>>>>>> (1/2) *
>>>>>>>>>>>> Integer.MAX_VALUE and a rehash is necessary as is evident from
>>>>>>>>>>>> the
>>>>>>>>>>>> following
>>>>>>>>>>>> excerpt from rehash. Notice that rehash will attempt to create
>>>>>>>>>>>> an
>>>>>>>>>>>> array
>>>>>>>>>>>> of
>>>>>>>>>>>> negative size if the size of the backing array reaches (1/2) *
>>>>>>>>>>>> Integer.MAX_VALUE since oldCapacity * 2 + 1 overflows and
>>>>>>>>>>>> resolves
>>>>>>>>>>>> to a
>>>>>>>>>>>> negative number.
>>>>>>>>>>>>
>>>>>>>>>>>> int newCapacity = oldCapacity * 2 + 1;
>>>>>>>>>>>> HashtableEntry newTable[] = new HashtableEntry[newCapacity];
>>>>>>>>>>>>
>>>>>>>>>>>> When the capacity of the backing array in a
>>>>>>>>>>>> ByteArrayOutputStream
>>>>>>>>>>>> reaches
>>>>>>>>>>>> (1/2) * Integer.MAX_VALUE its size reaches its capacity and a
>>>>>>>>>>>> write
>>>>>>>>>>>> operation is invoked, the capacity of the backing array is
>>>>>>>>>>>> increased
>>>>>>>>>>>> only by
>>>>>>>>>>>> the required number of elements. Notice that in the following
>>>>>>>>>>>> excerpt
>>>>>>>>>>>> from
>>>>>>>>>>>> ByteArrayOutputStream.write(int) the new backing array capacity
>>>>>>>>>>>> is
>>>>>>>>>>>> set
>>>>>>>>>>>> to 2
>>>>>>>>>>>> * buf.length unless this value would not suffice to accommodate
>>>>>>>>>>>> the
>>>>>>>>>>>> required
>>>>>>>>>>>> capacity in which case it is set to the required capacity. If
>>>>>>>>>>>> the
>>>>>>>>>>>> current
>>>>>>>>>>>> backing array capacity is at least (1/2) * Integer.MAX_VALUE +
>>>>>>>>>>>> 1,
>>>>>>>>>>>> then
>>>>>>>>>>>> buf.length << 1 overflows and resolves to a negative number
>>>>>>>>>>>> resulting
>>>>>>>>>>>> in
>>>>>>>>>>>> the
>>>>>>>>>>>> new capacity being set to the required capacity. The major
>>>>>>>>>>>> consequence
>>>>>>>>>>>> of
>>>>>>>>>>>> this, like with ArrayList, is that each subsequent write
>>>>>>>>>>>> operation
>>>>>>>>>>>> results
>>>>>>>>>>>> in a full resize of the ByteArrayOutputStream causing
>>>>>>>>>>>> performance to
>>>>>>>>>>>> degrade
>>>>>>>>>>>> significantly.
>>>>>>>>>>>>
>>>>>>>>>>>> int newcount = count + 1;
>>>>>>>>>>>> if (newcount > buf.length) {
>>>>>>>>>>>> buf = Arrays.copyOf(buf, Math.max(buf.length << 1,
>>>>>>>>>>>> newcount));
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> It is interesting to note that any statements about the
>>>>>>>>>>>> amortized
>>>>>>>>>>>> time
>>>>>>>>>>>> complexity of add/insert operations, such as the one in the
>>>>>>>>>>>> ArrayList
>>>>>>>>>>>> javadoc, are invalidated by the performance related bugs. One
>>>>>>>>>>>> solution
>>>>>>>>>>>> to
>>>>>>>>>>>> the above situations is to set the new capacity of the backing
>>>>>>>>>>>> array
>>>>>>>>>>>> to
>>>>>>>>>>>> Integer.MAX_VALUE when the initial size calculation results in a
>>>>>>>>>>>> negative
>>>>>>>>>>>> number during a resize.
>>>>>>>>>>>>
>>>>>>>>>>>> Apologies if these bugs are already known.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Kevin
>>>>>>>>>>>>
>
More information about the core-libs-dev
mailing list