Bugs in java.util.ArrayList, java.util.Hashtable and java.io.ByteArrayOutputStream

develop4lasu at gmail.com develop4lasu at gmail.com
Fri Mar 5 09:47:39 UTC 2010


Hello,

I'm using my own Collections if it's possible so I can add some thoughts:

1. I would decrease default array size to 4/6/8, for me it was few Mb more  
of free memory ( i suggest testing on application that use at least 300Mb)

I would test:

initial size: 4
long newCapacity = ((long)oldCapacity) + (oldCapacity >> 1) + 2;

initial size: 6
long newCapacity = ((long)oldCapacity) + (oldCapacity >> 1) + 2;

initial size: 8
long newCapacity = ((long)oldCapacity) + (oldCapacity >> 1) + 2;

initial size: 4
long newCapacity = ((long)oldCapacity) + (oldCapacity >> 2) + 4;

and then use:
> (int)Math.min(newCapacity, Integer.MAX_VALUE);


Would be nice then for:
Collections.addAll(...)
to ask for proper capacity before adding any elements

Greetings.

W dniu 05-03-2010 10:04 Martin Buchholz <martinrb at google.com> napisał(a):
> Hi Kevin,



> As you've noticed, creating objects within a factor of two of

> their natural limits is a good way to expose lurking bugs.



> I'm the one responsible for the algorithm in ArrayList.

> I'ma bit embarrassed, looking at that code today.

> We could set the array size to Integer.MAX_VALUE,

> but then you might hit an independent buglet in hotspot

> that you cannot allocate an array with Integer.MAX_VALUE

> elements, but Integer.MAX_VALUE - 5 (or so) works.



> It occurs to me that increasing the size by 50% is better done by

> int newCapacity = oldCapacity + (oldCapacity >> 1) + 1;



> I agree with the plan of setting the capacity to something near

> MAX_VALUE on overflow, and throw OutOfMemoryError on next resize.



> These bugs are not known.

> Chris Hegarty, could you file a bug for us?



> Martin



> On Wed, Mar 3, 2010 at 17:41, Kevin L. Stern kevin.l.stern at gmail.com>  
> wrote:

> > Greetings,

> >

> > I've noticed bugs in java.util.ArrayList, java.util.Hashtable and

> > java.io.ByteArrayOutputStream which arise when the capacities of the  
> data

> > structures reach a particular threshold. More below.

> >

> > When the capacity of an ArrayList reaches (2/3)*Integer.MAX_VALUE its  
> size

> > reaches its capacity and an add or an insert operation is invoked, the

> > capacity is increased by only one element. Notice that in the following

> > excerpt from ArrayList.ensureCapacity the new capacity is set to (3/2) *

> > oldCapacity + 1 unless this value would not suffice to accommodate the

> > required capacity in which case it is set to the required capacity. If  
> the

> > current capacity is at least (2/3)*Integer.MAX_VALUE, then (oldCapacity  
> *

> > 3)/2 + 1 overflows and resolves to a negative number resulting in the  
> new

> > capacity being set to the required capacity. The major consequence of  
> this

> > is that each subsequent add/insert operation results in a full resize  
> of the

> > ArrayList causing performance to degrade significantly.

> >

> > int newCapacity = (oldCapacity * 3)/2 + 1;

> > if (newCapacity
> > newCapacity = minCapacity;

> >

> > Hashtable breaks entirely when the size of its backing array reaches  
> (1/2) *

> > Integer.MAX_VALUE and a rehash is necessary as is evident from the  
> following

> > excerpt from rehash. Notice that rehash will attempt to create an array  
> of

> > negative size if the size of the backing array reaches (1/2) *

> > Integer.MAX_VALUE since oldCapacity * 2 + 1 overflows and resolves to a

> > negative number.

> >

> > int newCapacity = oldCapacity * 2 + 1;

> > HashtableEntry newTable[] = new HashtableEntry[newCapacity];

> >

> > When the capacity of the backing array in a ByteArrayOutputStream  
> reaches

> > (1/2) * Integer.MAX_VALUE its size reaches its capacity and a write

> > operation is invoked, the capacity of the backing array is increased  
> only by

> > the required number of elements. Notice that in the following excerpt  
> from

> > ByteArrayOutputStream.write(int) the new backing array capacity is set  
> to 2

> > * buf.length unless this value would not suffice to accommodate the  
> required

> > capacity in which case it is set to the required capacity. If the  
> current

> > backing array capacity is at least (1/2) * Integer.MAX_VALUE + 1, then

> > buf.length
> > new capacity being set to the required capacity. The major consequence  
> of

> > this, like with ArrayList, is that each subsequent write operation  
> results

> > in a full resize of the ByteArrayOutputStream causing performance to  
> degrade

> > significantly.

> >

> > int newcount = count + 1;

> > if (newcount > buf.length) {

> > buf = Arrays.copyOf(buf, Math.max(buf.length
> > }

> >

> > It is interesting to note that any statements about the amortized time

> > complexity of add/insert operations, such as the one in the ArrayList

> > javadoc, are invalidated by the performance related bugs. One solution  
> to

> > the above situations is to set the new capacity of the backing array to

> > Integer.MAX_VALUE when the initial size calculation results in a  
> negative

> > number during a resize.

> >

> > Apologies if these bugs are already known.

> >

> > Regards,

> >

> > Kevin

> >


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/core-libs-dev/attachments/20100305/833b39e4/attachment.html>


More information about the core-libs-dev mailing list