Proposal: Better HashMap.resize() when memory is tight

Peter Arrenbrecht peter.arrenbrecht at gmail.com
Wed Nov 21 20:01:38 UTC 2007


Hi Martin

Thanks for responding so quickly and thoughtfully. I just thought that
trying a little bit harder here could make the difference when an
already big HashMap overflows by just a few entries. While I agree
that testing this code under OOME conditions would be tiresome (it
took me quite a while to get the demo right, too), its basic soundness
would be very easy to test using direct calls to tightResize(). So a
single test showing it's really an improvement over fastResize() in
terms of max memory footprint would suffice, no? However, since you
say the scenario doesn't warrant the added maintenance burden, I'm
just going to take your word for it. After all, I'm not going to be
the one maintaining it, and I've never seen the problem in practice
myself, either. ;)
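
(For the record, the kind of direct test I have in mind would be roughly
the following. It is an untested sketch and assumes tightResize() is made
package-private so that a test living in ch.arrenbrecht.java.util can
call it:

        public void testTightResizeKeepsAllEntries()
        {
                BetterHashMap<Integer, Integer> m = new BetterHashMap<Integer, Integer>( 16, 0.75F );
                for (int i = 0; i < 1000; i++) m.put( i, i );
                m.tightResize( 4096 ); // call the slow path directly, no OOME needed
                assert m.size() == 1000;
                for (int i = 0; i < 1000; i++) assert m.get( i ).equals( i );
        }

Comparing the peak footprint against fastResize() could then reuse the
memory-blocking trick from the demo quoted below.)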

But, more generally speaking, does your "no bullet-proofing" argument
mean that in general you don't endorse switching algorithms to try to
cope with tight memory? I can see that starting to "bullet-proof"
could be a never-ending story. However, I think there is a distinction
between what I proposed and some of the wilder schemes one could
contemplate (like growing by less than doubling and having to switch
to less efficient hash->index conversions, or using a TreeMap to hold
overflows, etc.). The latter would affect the overall behaviour of
the HashMap significantly, leading to pervasive code changes. My
change only tries a little harder in one very isolated spot. It does
this with no significant code complexity, and with no effects on the
overall behaviour (other than making it still work in my scenario, of
course). Is that kind of "trying-harder" still bad? Isn't it kind of
similar to what the GC does, too?
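
To illustrate what I mean by "less efficient hash->index conversions":
with power-of-two capacities the table index is a single bit mask, while
capacities grown by, say, 1.5x would need a general modulo. A rough
sketch of the two variants:

        static int indexPowerOfTwo( int hash, int length )
        {
                return hash & (length - 1); // works only when length is a power of two
        }

        static int indexArbitrary( int hash, int length )
        {
                return (hash & 0x7FFFFFFF) % length; // clear the sign bit, then divide
        }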

Peter


On Nov 21, 2007 6:37 PM, Martin Buchholz <Martin.Buchholz at sun.com> wrote:
> Hi Peter,
>
> It's true that under low memory conditions your code would
> allow execution to continue under some circumstances.
> However, I'm not sure this would be an improvement to the JDK.
> Recovery from OOME is fraught with hazards.  We do occasionally
> try, but an application becomes much less reliable once OOMEs
> start appearing.  Perhaps it's better to fail than to pretend
> that the JDK has been bullet-proofed against OOME.
> OOME recovery code is rarely executed and hard to test.
> The new code would have to be maintained indefinitely,
> making future maintenance just a little bit harder for
> the maintainers.
>
> If the hashmap is fully populated, most of the memory is tied
> up in the Entry objects themselves, not in the table array.
> Each Entry object should be about 5 words of memory, while
> only approximately one word per entry is used within the table array.
> So I don't think we'll see anything close to the factor of
> two max memory saving that we might expect.
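>
> To put very rough numbers on that (an illustrative estimate only, assuming
> ~5 words per Entry as above, the default 0.75 load factor, and ignoring
> the key and value objects themselves):
>
>   per entry:  Entry object           ~5   words
>               slot in the old table  ~1.3 words  (1 / 0.75)
>               slot in the new table  ~2.7 words  (doubled table)
>
> So the old table is only about 1.3 of roughly 9 words per entry during a
> resize, and dropping it early saves on the order of 15%, not a factor of two.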
>
> I would prefer to see engineering work go into something
> like auto-reduction of the table array when many elements
> have been removed, but that's a hard problem.
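>
> (Very roughly the sort of check that would be involved, just to sketch the
> idea; the name and the thresholds here are made up, and the hard part is
> picking them so that mixed add/remove workloads don't thrash between
> growing and shrinking:
>
>   static boolean shouldShrink( int size, int tableLength, float loadFactor )
>   {
>       // shrink once the map holds far fewer entries than the load factor
>       // allows for the current table, but never shrink tiny tables
>       return tableLength > 16 && size < (int) (tableLength * loadFactor) / 4;
>   }
>
> followed by rebuilding the table at half its current capacity.)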
>
> Martin
>
>
> Peter Arrenbrecht wrote:
> > Hi all,
> >
> > I recently thought about how to resize hashmaps. When looking at the
> > JDK 6 source, I saw that java.util.HashMap always transfers old
> > entries from the old table. When memory is tight, it might be better
> > to first concatenate the entry lists into one big list, throw away the
> > old table, allocate the new one, and then fill it from the
> > concatenated list. So:
> >
> >       @Override
> >       void resize( int _newCapacity )
> >       {
> >               try {
> >                       fastResize( _newCapacity ); // This would be the current resize code.
> >               }
> >               catch (OutOfMemoryError e) {
> >                       tightResize( _newCapacity );
> >               }
> >       }
> >
> >       @SuppressWarnings( "unchecked" )
> >       private void tightResize( int newCapacity )
> >       {
> >               Entry head = joinLists();
> >               table = new Entry[ newCapacity ];
> >               threshold = (int) (newCapacity * loadFactor);
> >               transfer( head, table );
> >       }
> >
> >       @SuppressWarnings("unchecked")
> >       private Entry joinLists()
> >       {
> >               final Entry[] src = table;
> >               final int n = src.length;
> >               Entry head = null;
> >               int i = 0;
> >               while (i < n && null == head) {
> >                       head = src[ i++ ];
> >               }
> >               Entry tail = head;
> >               assert i >= n || null != tail;
> >               // walk tail to the end of the first chain, otherwise appending
> >               // the next chain below would drop the rest of these entries
> >               while (null != tail && null != tail.next) {
> >                       tail = tail.next;
> >               }
> >               while (i < n) {
> >                       Entry e = src[ i++ ];
> >                       if (null != e) {
> >                               tail.next = e;
> >                               do {
> >                                       tail = e;
> >                                       e = e.next;
> >                               } while (null != e);
> >                       }
> >               }
> >               return head;
> >       }
> >
> >       @SuppressWarnings("unchecked")
> >       private void transfer( Entry head, Entry[] tgt )
> >       {
> >               int n = capacity();
> >               while (head != null) {
> >                       Entry e = head;
> >                       head = head.next;
> >                       int i = indexFor( e.hash, n );
> >                       e.next = tgt[ i ];
> >                       tgt[ i ] = e;
> >               }
> >       }
> >
> >
> > What do you think?
> > -peo
>
>
> ---------- Forwarded message ----------
> From: Peter Arrenbrecht <peter.arrenbrecht at gmail.com>
> To: core-libs-dev at openjdk.java.net
> Date: Wed, 21 Nov 2007 16:51:28 +0100
> Subject: Re: Proposal: Better HashMap.resize() when memory is tight
> Hi again, I have gone over my code again and a) discovered a very
> stupid mistake rendering the desired effect null and void, and b)
> developed a test that demos the effect of the improvement. Here's the
> improved code:
>
>         private void tightResize( int newCapacity )
>         {
>                 Entry head = joinLists();
>                 table = null; // free it first
>                 table = new Entry[ newCapacity ]; // then reallocate
>                 threshold = (int) (newCapacity * loadFactor);
>                 transfer( head, table );
>         }
>
> Below you can find the test code. This shows the problem here on
> Ubuntu Linux 7.04 with jre 1.6.0:
>
> java version "1.6.0"
> Java(TM) SE Runtime Environment (build 1.6.0-b105)
> Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)
>
> The command-line for the improved map is:
>
>   java -server -Xmx7m -cp bin ch.arrenbrecht.java.util.BetterHashMap new
>
> and for the old map:
>
>   java -server -Xmx7m -cp bin ch.arrenbrecht.java.util.BetterHashMap
>
> I have only managed to demo the effect on the server VM. And it is
> necessary to leave an Object array of size initial*2+something free,
> rather than just initial+something, as I had expected. Maybe that is an
> effect of the generational collector. Also, there have been spurious
> cases where the test failed even with the new map. No idea why.
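>
> (For reference, the array sizes involved: with initial = 131072 and
> loadFactor = 0.5 the table starts out with 131072 slots and a threshold of
> 65536, so the put after priming triggers a resize to 262144 slots. The
> fast path needs the old and the new table live at the same time, roughly
> initial * 3 references, while the tight path only needs the new table,
> roughly initial * 2, once the old one has been dropped.)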
>
> Here is the test code:
>
>         public static void main( String[] args )
>         {
>                 final int initial = 131072;
>                 final float loadFactor = 0.5F;
>                 final HashMap<Integer, Integer> m;
>                 if (args.length > 0) {
>                         System.out.println( "Creating better map..." );
>                         m = new BetterHashMap<Integer, Integer>( initial, loadFactor );
>                 }
>                 else {
>                         System.out.println( "Creating standard map..." );
>                         m = new HashMap<Integer, Integer>( initial, loadFactor );
>                 }
>
>                 System.out.println( "Priming map (should see no resize here)..." );
>                 for (int i = 0; i < initial / 2; i++) {
>                         Integer o = i;
>                         m.put( o, o );
>                 }
>                 Integer o = initial;
>
>                 Entry head = blockMemExcept( initial * 2 + initial / 4 );
>                 System.out.println( "Filled with " + n + " entries." );
>
>                 System.out.println( "Adding next element (should see resize here)..." );
>                 m.put( o, o );
>
>                 if (head == null) System.out.println( "Bad." ); // force "head" to remain in scope
>                 System.out.println( "Done." );
>         }
>
>         /**
>          * Done separately so memBlock goes out of scope cleanly, leaving no
>          * local stack copies pointing to it.
>          */
>         private static Entry blockMemExcept( int exceptObjs )
>         {
>                 System.out.println( "Reserving memory..." );
>                 Object[] memBlock = new Object[ exceptObjs ];
>
>                 System.out.println( "Filling rest of memory..." );
>                 int i = 0;
>                 Entry head = null;
>                 try {
>                         while (true) {
>                                 head = new Entry( 0, null, null, head );
>                                 i++;
>                         }
>                 }
>                 catch (OutOfMemoryError e) {
>                         // ignore
>                 }
>
>                 if (memBlock[ 0 ] != null) return null;
>                 n = i;
>                 return head;
>         }
>         private static int n = 0;
>
> Cheers,
> -peo
>
>
> ps. This all runs on copies of HashMap and AbstractMap in
> ch.arrenbrecht.java.util.
>
>
> On Nov 21, 2007 9:52 AM, Peter Arrenbrecht <peter.arrenbrecht at gmail.com> wrote:
> > Hi all,
> >
> > I recently thought about how to resize hashmaps. When looking at the
> > JDK 6 source, I saw that java.util.HashMap always transfers old
> > entries from the old table. When memory is tight, it might be better
> > to first concatenate the entry lists into one big list, throw away the
> > old table, allocate the new one, and then fill it from the
> > concatenated list. So:
> >
> >         @Override
> >         void resize( int _newCapacity )
> >         {
> >                 try {
> >                         fastResize( _newCapacity ); // This would be the current resize code.
> >                 }
> >                 catch (OutOfMemoryError e) {
> >                         tightResize( _newCapacity );
> >                 }
> >         }
> >
> >         @SuppressWarnings( "unchecked" )
> >         private void tightResize( int newCapacity )
> >         {
> >                 Entry head = joinLists();
> >                 table = new Entry[ newCapacity ];
> >                 threshold = (int) (newCapacity * loadFactor);
> >                 transfer( head, table );
> >         }
> >
> >         @SuppressWarnings("unchecked")
> >         private Entry joinLists()
> >         {
> >                 final Entry[] src = table;
> >                 final int n = src.length;
> >                 Entry head = null;
> >                 int i = 0;
> >                 while (i < n && null == head) {
> >                         head = src[ i++ ];
> >                 }
> >                 Entry tail = head;
> >                 assert i >= n || null != tail;
> >                 // walk tail to the end of the first chain, otherwise appending
> >                 // the next chain below would drop the rest of these entries
> >                 while (null != tail && null != tail.next) {
> >                         tail = tail.next;
> >                 }
> >                 while (i < n) {
> >                         Entry e = src[ i++ ];
> >                         if (null != e) {
> >                                 tail.next = e;
> >                                 do {
> >                                         tail = e;
> >                                         e = e.next;
> >                                 } while (null != e);
> >                         }
> >                 }
> >                 return head;
> >         }
> >
> >         @SuppressWarnings("unchecked")
> >         private void transfer( Entry head, Entry[] tgt )
> >         {
> >                 int n = capacity();
> >                 while (head != null) {
> >                         Entry e = head;
> >                         head = head.next;
> >                         int i = indexFor( e.hash, n );
> >                         e.next = tgt[ i ];
> >                         tgt[ i ] = e;
> >                 }
> >         }
> >
> >
> > What do you think?
> > -peo
> >
>
>


