Trimming ArrayList and HashMap During GC Copy

Erik Osterlund erik.osterlund at oracle.com
Mon Jan 23 09:57:44 UTC 2023


Hi Nathan,

I can think of a number of concerning aspects of doing something like this:

1. Such a change would have user observable differences in behaviour, which could introduce bugs in user code, due to the optimization.

2. The JIT relies a lot on hoisting range checks out of loops to get good performance, but we need to safepoint poll inside of loops. If we take the slow path of such a safepoint poll, and perform a GC which shrinks the arrays, then the range checks that were only performed before the loop, are no longer valid, and we would essentially have to deoptimize all activation records (frames) in the system, which with virtual threads could be a lot of frames. Either that or stop hoisting range checks out of loops, but that comes with its own performance problems.

3. Many GCs allocate objects that are larger than a certain threshold in special storage that deals with unique concerns for larger objects. If said objects shrink to below that limit, I suspect we would have many interesting problems to deal with. Not impossible to fix, but would probably be rather painful to deal with.

Perhaps this could be modelled as a library problem instead. Sounds like perhaps the growing strategy of the library is too aggressive for your preferences?

Thanks,
/Erik

> On 21 Jan 2023, at 00:24, Nathan Reynolds <numeralnathan at gmail.com> wrote:
> 
> ArrayList, HashMap, and other collections have internal Object[]'s to store the elements.  The Object[]'s are lazily initialized to minimize heap usage.  However, there is still quite a bit of waste in these Object[]'s.  Some collections are oversized from the beginning.  Some collections are filled and then partially or fully emptied.  For an average heap dump, 10.1% of the heap is wasted on oversized Object[]'s.
> 
> These Object[]'s can be trimmed during GC's move operation but only when the process transitions to idle.  Since the trimming only happens when the application transitions to idle, then the extra CPU and/or pause time isn't a concern.  The performance penalty to the application to resize larger won't happen for  a while since the process is idle.  With the reduction of heap usage more idle processes can run on a single VM (i.e., the JVM will be more Docker Container friendly).  
> 
> The trimming may apply to other collections such as ArrayDeque, BitSet, HashSet, IdentityHashMap, LinkedHashMap, Stack, TreeMap, TreeSet, WeakHashMap, Hashtable, and Vector.  HashSet, IdentityHashMap, LinkedHashMap, TreeMap, and TreeSet may be implemented with HashMap so trimming HashMap will automatically make these other collections benefit.
> 
> How did I come up with the 10.1% heap reduction?  I analyzed 650 heap dumps and presented the findings in Java Memory Hogs.  By carefully adding up the right numbers, I calculate that the heap reduction will be 10.1%.  That's a big savings and it will impact the entire world of Java programs! 
> 
> I entertained the idea of having a setting to trim with every GC or when objects promote from young to old generation.  One concern is that the application would take a performance penalty to enlarge some or all the Object[]'s.  Another concern is that the extra CPU time and/or pause time will interfere with the application's performance.
> 
> What do you think of the overall idea?  Do you have ideas of how to trim more often than when the application transitions to idle?



More information about the hotspot-gc-dev mailing list