[External] : Re: Trimming ArrayList and HashMap During GC Copy

Sat Jan 28 08:06:48 UTC 2023

Hi Nathan,

Unfortunately, that section did not cover it. That section talks about add, and source code level range checks in the code. I’m talking about get and JVM level range checks that happen on array loads. While the array list level source code range checks are interesting in their own right, they can be fiddled with when for example racingly using the ArrayList from different threads, or for example using reflection, class redefinition or instrumentation on the ArrayList. While that seems dumb, we can never allow the JVM to crash, even when processing dumb java code. That’s something that we gladly would walk through fire to ensure if we have to, and will never compromise on. Therefore the integrity of array accesses have to stand on their own feet, and not rely on some array list properties that dumb code violates.

To ensure that, when you read elements[index], the JVM has to emit a JVM-internal range check to prevent erroneous code from accessing data outside of the array. So to be crystal clear - there is a separate source code level bounds check in the ArrayList against the size - that is not the one I am referring to. That one is *not* enough to guarantee the integrity of the array isn’t violated, and hence that we don’t crash the JVM.

My main point was that taming the JVM internal range check of the array accesses, will likely come at a big cost. Both engineering wise and performance wise. The JIT believes the array length is truly immutable (because it is), and hence doesn’t believe any form of fencing will be used to guard updates to this field; it only applies to mutable fields, and the array length simply isn’t mutable, and is treated as such. This assumption is ingrained in the very reasoning of JVM internal range check and counted loop optimizations (loops that run a bounded number of times, usually driving performance optimizations). Teaching the JIT that the length field of arrays is in fact mutable, would ruin many such optimizations, which are crucial for performance. That’s what I’m saying is a hard one to sell.

You say the benefit of solving this problem in the GC is that synchronization is free. But it isn’t free though when we have to tame the JIT and break loop optimizations to make it work. It seems to me like buying a free Mc Donalds lunch voucher for $1000.

I think you are on to something in the problem domain, I just think the solution domain isn’t focused where it ought to be. If these data structures got bloated in an unwanted way, I think it would help to identify how they got there, and then try to improve the sizing heuristics in the libraries instead. You say taming growth isn’t the issue. So is it the lack of a shrinking policy when removing elements perhaps? Whatever it is, I think it’s best fixed explicitly at the library level, and not by hacking the entire JVM around this issue.

/Erik

On 28 Jan 2023, at 00:16, Nathan Reynolds <numeralnathan at gmail.com> wrote:

> But first of all, you still didn’t address my concern that out of bounds accesses into ArrayList may sometimes crash the JVM, instead of just throwing innocent exceptions. The aaload does a range check to know if it should throw, that gets aggressively moved as early as possible.

I thought I covered your concern in the section "Loop Hoisting Performance Concern".  I covered hoisting the if statement and range/bounds checking.  If you still think I haven't covered your concern please elaborate.

On Thu, Jan 26, 2023 at 1:04 PM Erik Osterlund <erik.osterlund at oracle.com<mailto:erik.osterlund at oracle.com>> wrote:
There is a lot to say here. But first of all, you still didn’t address my concern that out of bounds accesses into ArrayList may sometimes crash the JVM, instead of just throwing innocent exceptions. The aaload does a range check to know if it should throw, that gets aggressively moved as early as possible. Destroying array range check optimizations is going to be a difficult sell as they are crucial for good performance, and with those optimizations in place, shrinking arrays is not valid, as it allows accesses outside the bounds of the array.

/Erik

On 26 Jan 2023, at 20:55, Nathan Reynolds <numeralnathan at gmail.com<mailto:numeralnathan at gmail.com>> wrote:

> Perhaps the reason for the power of two sizing there, is because then going from a hashCode to a bucket can be done quickly using bitwise and, instead of expensive modulo. I suspect that changing that policy would come at a significant CPU overhead. Also, when elements are scattered randomly in the backing array, if you don’t rehash the table, you would only be able to remove a couple of trailing nulls at the end of the array, while many would remain in the middle between the elements. And rehashing might run into collisions which requires allocating tree nodes - something we can’t do from a GC context.

You are correct.  Changing away from a power of 2 table size is going to be a hard sell.  Perhaps there are workloads where the memory savings is worth it, but I don't have any data to support or reject such a proposal.  I consider the non-power of 2 table sizes outside my proposal.  I am only proposing trimming from one larger power of 2 to a smaller power of 2 as long as the HashMap's load factor is adhered to.

> Having said that, perhaps if a HashTable is too sparse, a smaller default size of the table and more dense load factor, would save some memory.

A smaller default table size will save 0.5% heap space for the average program.  We would need tests to make sure a smaller default size isn't going to cause a problem.

As for a more dense load factor, this drastically impacts the performance of HashMap.  Perhaps some workloads can benefit from this.  We could make the default loading factor a System property.  On the other hand, if someone is going to set the System property, then they probably already thought about setting the loading factor in the HashMap constructor.  We would need more data to be able to figure this out.

Pivot

Okay.  I am pivoting the design a bit in order to address some concerns.  I am still proposing off, idle, and always modes.  I still have the caveat that the always mode may impact the Java application performance too much.

For Serial and Parallel GC, we can trim ArrayLists and HashMaps at any time while the Java application is paused.

For G1, the application is paused during the evacuation phase.  We can trim during the evacuation since the Java application is paused.

For ZGC, we can trim during the relocation operation.

Here's what I understand about how ZGC works.

  1.  Right before relocation, the good color for pointers is changed to remap.
  2.  There is a race between multiple Java and (multiple?) GC threads to change all the pointers to the remap pointer.
  3.  The racing threads check if the pointer is destined for relocation.
     *   If not, the threads change the color of the pointer and return.
  4.  The threads check if the forwarding table already has a new address.
     *   If so, the threads change the pointer to the new address and return.
  5.  All threads allocate their own new space.
  6.  All threads copy the object's contents to their own new space.
  7.  The threads use CAS to attempt to insert their copy's address into the forwarding table.
  8.  The winner's copy becomes the official copy of the object.
  9.  The losers deallocate their own new space and use the official address for the object.

At some point, ZGC selects which objects will be relocated.  ZGC then executes step #1.  At the time step #1 happens, all objects become temporarily immutable.  For objects that will stay in-place, the Java and GC threads detect this and simply change the pointers' color.  The object is no longer immutable.  For objects that will move, the Java and GC threads copy from the old location into a new location(s).  The old location is effectively immutable because the Java threads must copy the object before any mutation can happen.

Here's what needs to change to support trimming.  This only happens if trimming is enabled for this GC collection.  If idle mode is selected, then most GC collections won't have trimming enabled.

  1.  No change
  2.  No change
  3.  No change
  4.  No change
  5.  All threads allocate their own new space for the ArrayList and its trimmed Object[].
  6.  All threads copy the ArrayList's contents, set the Object[]'s length in the new location to the ArrayList's size, and copy the Object[]'s elements up to size.
  7.  No change
  8.  No change
  9.  No change

When an object must be relocated, none of the threads can modify the old location.  The existing and new relocation algorithms don't allow for it.  This means the relocation and trimming does not have a race condition even though threads are competing to see which can do the copy first.  For example, let's say Thread A is on step #5, #6, or #7.  (The other steps don't matter.)  Let's say Thread B wins the race and then executes ArrayList's Java code and modifies the size, content, grow(), and/or shrink.  This modification happens in the new location.  Thread A won't see it because Thread A only looks at the old location which the algorithm does not allow to be modified.  Thread A will detect it lost the race and throw away its stale new copy.  Data integrity is preserved.

As for HashMap, here are steps #5 and #6.

5. All threads allocate their own new space for the HashMap, its new power-of-2-length Object[], and all the HashMap Nodes.
6. All threads copy the HashMap's contents and the HashMap Nodes' key, value and hash into their new location (and not the next pointer).  All threads rebuild their own copy of the hashtable.  The threads only use the Nodes' hash value and hence don't need to compute hashes, check for equality, or execute Java code.

Don't we need to check if two Nodes have equal keys?  No.  The equality check happened when the keys were put into the HashMap in Java code.  They weren't equal then and so must remain as 2 different Nodes.

Java Code Changes

All the GC algorithms require some help from ArrayList and HashMap code.  Let's consider this pseudo code for ArrayList.add().

public boolean add(E item)
{
   int index, length;

   // Potential safepoint #1

   index = size++;

   // Potential safepoint #2
   // TODO Add fence
   // Potential safepoint #3

   length = elementData.length;

   // Potential safepoint #4

   if (size > length)
   {
      // Potential safepoint #5
      grow();
      // Potential safepoint #6
   }

   // Potential safepoint #7

   elementData[index] = item;

   // Potential safepoint #8

   return true;
}

In order to ensure this code works with GC's trimming, we need to add a fence between incrementing the size field and reading elementData.length.  This fence ensures that the trimming code sees the need for at least the new size of elements in the new Object[].  The fence prevents JIT from reordering the store to the size field after reading elementData's length.

We only need an opaque fence.  We don't need a full or acquire fence that impacts CPU behavior and hence performance.  GC pauses ensure adequate CPU fencing.  JIT can put a safepoint at any of the potential safepoint places and it will work.  Of course, potential safepoints #1 and #8 can also be located outside add().  Potential safepoints #5 and #6 can be located anywhere inside grow().

Potential safepoint #1: Sadly, the trim operation will see the smaller size and change elementData's length to match the current size.  When the Java thread wakes up, the Java thread will call grow().  Since trimming only happens when the program is transitioning to idle, the extra CPU time to trim and regrow is inconsequential.  (This scenario begs the question if the idle transition logic is correct because there is an active Java thread.)

Potential safepoint #2: The trim operation will see the larger size and change elementData's length to match the larger size.  When the Java thread wakes up, the Java thread will see that the elementData's length has room for 1 more element.  The Java thread will not call grow() and simply execute elementData[index] = item.

Potential safepoint #3: Same as #2

Potential safepoint #4: The trim operation will see the larger size and change elementData's length to match the larger size.  When the Java thread wakes up, the length local variable will not match elementData's length.  This is okay.  Here are 2 scenarios.  1) The length local variable could be ≤ than size.  Sadly, the Java thread will call grow() obviating the trim operation's work.  2) The length local variable could be > than size.  The Java code won't call grow() and will execute elementData[index] = item.  Setting the item parameter into the array will work because the trim operation left space at index.

Potential safepoint #5: The trim operation will see the larger size and change elementData's length to match the larger size.  Sadly, when the Java thread wakes up, it will call grow() obviating the trim operation's work.

Potential safepoint #6: The trim operation will see the larger size and change elementData's length to match the larger size.  When the Java thread wakes up, it will execute elementData[index] = item.  Setting the item parameter into the array will work because the trim operation left space at index.

Potential safepoint #7: Same as #6.

Potential safepoint #8: The Java thread finished adding the item parameter to elementData before the trim operation.  Thus, the trim operation will make elementData's length == size.

For remove operations, the size change must come after elementData[index] = null.  We need another opaque fence to ensure size changes after elementData[index] = null.

The same logic applies for HashMap, size field must be incremented before checking if the hashtable needs to enlarge.  The size field must be decremented after removing a node from the hashtable.

Loop Hoisting Performance Concern

Let's say JIT inlines add() into a loop where the loop will add N elements.  To hoist the if statement (and hence bounds check) out of the loop, JIT would have to hoist size += N outside the loop along with the if statement and the grow() call.  The opaque fence prevents JIT from hoisting anything.  We can make JIT handle this special case and hoist the opaque fence along with the rest of the hoisted logic.  No performance is lost.

This violates the meaning of an opaque fence.  So, either we allow the violation in this special case because the code still works or we create a new fence that is weaker than opaque.  We really only need the store to the size field to happen in program order before loading elementData.length.  All other loads and stores can move around.

Perhaps we can use VarHandle.setOpaque(size) and VarHandle.getOpaque(elementData) instead of a fence.  I don't know the minutia on how this works, but perhaps this change allows JIT to recognize it can hoist.

Concurrent Collections

Concurrent collections don't need trimming to happen during GC since the data structure already handles concurrent changes.  GC can simply collect pointers to all the applicable concurrent collections (i.e., ConcurrentHashMap, and CopyOnWriteArrayList).  We have a few options on what kind of thread can do the trimming.  1) Using the fork-join pool of threads would take advantage of all the cores but may interfere with the program.  2) A single dedicated Java thread means the trimming won't be able to take advantage of multiple cores yet the impact will be minimal to the program.  I know that thread priority is rarely used, but allowing a configuration option to set this thread's priority to idle would ensure trimming never takes CPU from the program.  3) Loom threads would take advantage of all the cores but will take CPU time away from the program since trimming is largely (completely?) a non-blocking operation.  A configuration option to set these threads' priority to idle or low would be ideal.

Notes

For ArrayList, the extra trimming work is negligible and probably ends up being a net zero or faster because fewer nulls have to be copied from the old Object[] to the new Object[] location.

For HashMap, the extra trimming work will probably have a minuscule impact.  However, this minuscule impact may be replaced with better overall program performance.  Years ago after String de-duplication was implemented, I was surprised that reducing the memory usage of programs had the benefit of improving response times.  So, the extra CPU spent to de-duplicate is well spent in improving program performance.  In the future, trim's CPU time will be compensated by better overall program performance.

We need to consider if young collections will be trimmed.  On one hand, trimming all objects makes the ZGC relocation algorithm easier.  However, the ZGC relocation algorithm only needs to check if the collection is old.  So, the algorithm only needs another if statement.  On the other hand, it may be a waste of CPU time to trim young collections since most of the young collections will be garbage soon.  However, if the program is transitioning to idle, then trimming the young collections makes sense since these collections could be around for a 4-day weekend.

I am still only suggesting that we do trimming when the program transitions to idle.  This way the extra CPU spent on trimming has an inconsequential impact.  Potentially the always mode for trimming will be fine for most workloads.  This will require performance testing.

For always mode, perhaps we only trim old collections each GC cycle and young collections when the program transitions to idle.  It would be interesting to know how many old collections have large variances in size.  Queues probably swing wildly in size and so trimming wouldn't be a good idea.  The data in caches probably change over time.  However, the cache size is usually fixed and maybe constant.  If constant, trimming will be of great benefit.  If we can determine which collections swing wildly in size, then trimming can avoid those and we are one step closer to being able to enable the always mode for all workloads.

On Mon, Jan 23, 2023 at 10:20 PM Erik Osterlund <erik.osterlund at oracle.com<mailto:erik.osterlund at oracle.com>> wrote:

On 24 Jan 2023, at 02:33, Nathan Reynolds <numeralnathan at gmail.com<mailto:numeralnathan at gmail.com>> wrote:

> The array list code itself will probably blow up if for example you GC in the middle of let’s say the add method. It checks if we have capacity, then we GC draining that capacity, then add the object out of bounds. GC safepoints can happen anywhere in Java code, including in any method on the array list, which uses the length for various reasons, assuming it won’t change seemingly concurrently.

You raise a valid point.  Thank you for catching this.  I wrote a response but came up with something better below.

> Loads from an array can speculatively float above user control checks like the ArrayList internal range check. That’s fine and the user can’t observe the difference in behaviour. However, an aaload has its own range check against the actual array length. Loads into the array can not float above that array range check, because that could perform a load outside the heap boundary. Therefore, with your proposal, a read out of bounds of the array list would not only throw an exception, but occasionally also access outside the heap and hence crash the JVM, when accessed in a loop within which the capacity can shrink, rendering hoisted range checks invalid. That’s my concern here. And combating that would likely involve a huge deopt hammer triggered by GC.

We can have GC only do the trimming operation when the only references to the ArrayList or HashMap are from other objects in the heap.  If the ArrayList or HashMap is referenced from a thread's stack or registers, then GC skips the ArrayList or HashMap.  Thus, a Java thread acquires a "lock" on the ArrayList or HashMap when it starts referencing the ArrayList or HashMap.  The Java thread releases a "lock" when it stops referencing the ArrayList or HashMap.  "lock" doesn't mean anything more than the thread having a reference to the ArrayList or HashMap.  It does not imply any synchronization mechanism that will cause a performance problem for the Java thread.

Non-ZGC algorithms pause the application during the move operation.  While the application is paused, GC can be sure that no Java threads are operating on the ArrayList or HashMap.

For ZGC, a GC thread can move the ArrayList and its internal Object[] with trimming.  Whenever a GC thread is moving an object, Java threads that attempt to access the object already block and wait for the move to finish, if I understand correctly.  If the Java thread wants to access the ArrayList before a GC thread has moved it, the Java thread does the move operation.  Since the Java thread is executing code outside of the ArrayList, then the Java thread will do the trimming before the Java thread starts looking at Object[] bounds.

Right. This takes us back to scanning all activation records in the system. As mentioned earlier, that tends to get rather expensive when you use virtual threads, and you can have activation records in the heap as well, which also similarly may have problematic references to array list internals. We have had to go over mechanisms that scan activation records in general, and rethink such algorithms to do something else, due to the scalability concerns.

> Yes, I am talking about large object allocation. While we certainly could deal with it, my point was rather that it would be painful and require a bunch of special handling.

The trimming can avoid large objects.  With that in mind, trimming large Object[] in ArrayList and HashMap is probably not worth the additional effort.  Large sparsely-used ArrayList doesn't even show up in the average heap dump.  Large sparsely-used HashMap only takes 0.4% of the heap.  Large sparsely-used ConcurrentHashMap only takes 0.1% of the heap.  We can revisit trimming large sparsely-used ArrayLists and HashMaps when we are looking for heap reductions ≤ 0.5%. :)

Fair enough.

>> Perhaps this could be modelled as a library problem instead. Sounds like perhaps the growing strategy of the library is too aggressive for your preferences?
> I was thinking of rather having a less aggressive growing strategy, or something along those lines.

This won't help in HashMap's case.  The table sizes are powers of 2.  The growth has to be a doubling.  HashMap is already growing as slowly as possible.

Perhaps the reason for the power of two sizing there, is because then going from a hashCode to a bucket can be done quickly using bitwise and, instead of expensive modulo. I suspect that changing that policy would come at a significant CPU overhead. Also, when elements are scattered randomly in the backing array, if you don’t rehash the table, you would only be able to remove a couple of trailing nulls at the end of the array, while many would remain in the middle between the elements. And rehashing might run into collisions which requires allocating tree nodes - something we can’t do from a GC context.

Having said that, perhaps if a HashTable is too sparse, a smaller default size of the table and more dense load factor, would save some memory.

Slowing the growth rate of ArrayList will only reduce waste by < 0.3%.

Okay.

/Erik

On Mon, Jan 23, 2023 at 2:34 PM Erik Osterlund <erik.osterlund at oracle.com<mailto:erik.osterlund at oracle.com>> wrote:

On 23 Jan 2023, at 18:45, Nathan Reynolds <numeralnathan at gmail.com<mailto:numeralnathan at gmail.com>> wrote:

> 1. Such a change would have user observable differences in behaviour, which could introduce bugs in user code, due to the optimization.

How is this user observable?  The Object[] is buried inside an ArrayList or HashMap.  This idea is not touching other Object[]'s outside a collection.

I suppose a performance impact from having to grow is somewhat observable.  This was noted in my original email.  However, growing is not functionally observable.

The only way to functionally observe this is through reflection.  Reflection is a moot concern because there are no guarantees when using reflection and switching to a new Java version.  So, any reflection into ArrayList or HashMap would have to be fixed like any other functional change to ArrayList or HashMap.

The array list code itself will probably blow up if for example you GC in the middle of let’s say the add method. It checks if we have capacity, then we GC draining that capacity, then add the object out of bounds. GC safepoints can happen anywhere in Java code, including in any method on the array list, which uses the length for various reasons, assuming it won’t change seemingly concurrently.

> 2. The JIT relies a lot on hoisting range checks out of loops to get good performance, but we need to safepoint poll inside of loops. If we take the slow path of such a safepoint poll, and perform a GC which shrinks the arrays, then the range checks that were only performed before the loop, are no longer valid, and we would essentially have to deoptimize all activation records (frames) in the system, which with virtual threads could be a lot of frames. Either that or stop hoisting range checks out of loops, but that comes with its own performance problems.

ArrayList uses its internal size field to determine what elements the code looks at.  If the Object[] changes length to anything ≥ size, then loops and hoisting still work.  The new Object[] length won't cause a problem.

HashMap may or may not be possible to trim.  It will take some effort to examine the code and determine if a GC trim will work or the code may need to be changed to be compatible.  One way to make the Java code compatible would be to make the Java code set the HashMap's table field to null while holding the reference to the table in a local variable.  GC will then not recognize the table as belonging to HashMap and hence GC won't do anything with it.

Loads from an array can speculatively float above user control checks like the ArrayList internal range check. That’s fine and the user can’t observe the difference in behaviour. However, an aaload has its own range check against the actual array length. Loads into the array can not float above that array range check, because that could perform a load outside the heap boundary. Therefore, with your proposal, a read out of bounds of the array list would not only throw an exception, but occasionally also access outside the heap and hence crash the JVM, when accessed in a loop within which the capacity can shrink, rendering hoisted range checks invalid. That’s my concern here. And combating that would likely involve a huge deopt hammer triggered by GC.

> 3. Many GCs allocate objects that are larger than a certain threshold in special storage that deals with unique concerns for larger objects. If said objects shrink to below that limit, I suspect we would have many interesting problems to deal with. Not impossible to fix, but would probably be rather painful to deal with.

Are you talking about large object allocation?  If so, this is beyond my expertise.  However, I would assume that GC could allocate a new location for the Object[], copy the data, and leave the large Object[] as garbage for collection later.  This is probably worth it since the memory saves could be very significant.

Yes, I am talking avout large object allocation. While we certainly could deal with it, my point was rather that it would be painful and require a bunch of special handling.

> Perhaps this could be modelled as a library problem instead. Sounds like perhaps the growing strategy of the library is too aggressive for your preferences?

I am not sure how a Java library could implement this.  The library could use reflection to get access to the Object[] in ArrayList or HashMap, allocate a new Object[], copy the elements, and use reflection to store the new Object[].  However, there is no synchronization mechanism to get the other Java threads to see the reference to the new Object[].  Some threads will continue to use the old Object[] and some threads will use the new Object[].  Also, if other Java threads change the elements in the old Object[] while the library is making the change, then those changes could be lost.

I was thinking of rather having a less aggressive growing strategy, or something along those lines.

The beauty of trimming during GC move is that GC and Java threads have already negotiated how to synchronize the object move.  In Serial GC, the Java threads are paused and GC can move anything it wants.  GC changes the stack (and registers?) of all the Java threads.  The Java threads have no way to tell that the object moved.  In ZGC, the Java and GC threads work together to perform the move.  If I understand correctly, other GC algorithms execute moves like Serial or ZGC.

As I mentioned earlier, they have not negotiated when such GC can shrink, if a java thread is in the middle of a method poking around at an array list.

Hope this makes sense.

/Erik

On Mon, Jan 23, 2023 at 1:57 AM Erik Osterlund <erik.osterlund at oracle.com<mailto:erik.osterlund at oracle.com>> wrote:
Hi Nathan,

I can think of a number of concerning aspects of doing something like this:

1. Such a change would have user observable differences in behaviour, which could introduce bugs in user code, due to the optimization.

2. The JIT relies a lot on hoisting range checks out of loops to get good performance, but we need to safepoint poll inside of loops. If we take the slow path of such a safepoint poll, and perform a GC which shrinks the arrays, then the range checks that were only performed before the loop, are no longer valid, and we would essentially have to deoptimize all activation records (frames) in the system, which with virtual threads could be a lot of frames. Either that or stop hoisting range checks out of loops, but that comes with its own performance problems.

3. Many GCs allocate objects that are larger than a certain threshold in special storage that deals with unique concerns for larger objects. If said objects shrink to below that limit, I suspect we would have many interesting problems to deal with. Not impossible to fix, but would probably be rather painful to deal with.

Perhaps this could be modelled as a library problem instead. Sounds like perhaps the growing strategy of the library is too aggressive for your preferences?

Thanks,
/Erik

> On 21 Jan 2023, at 00:24, Nathan Reynolds <numeralnathan at gmail.com<mailto:numeralnathan at gmail.com>> wrote:
>
> ArrayList, HashMap, and other collections have internal Object[]'s to store the elements.  The Object[]'s are lazily initialized to minimize heap usage.  However, there is still quite a bit of waste in these Object[]'s.  Some collections are oversized from the beginning.  Some collections are filled and then partially or fully emptied.  For an average heap dump, 10.1% of the heap is wasted on oversized Object[]'s.
>
> These Object[]'s can be trimmed during GC's move operation but only when the process transitions to idle.  Since the trimming only happens when the application transitions to idle, then the extra CPU and/or pause time isn't a concern.  The performance penalty to the application to resize larger won't happen for  a while since the process is idle.  With the reduction of heap usage more idle processes can run on a single VM (i.e., the JVM will be more Docker Container friendly).
>
> The trimming may apply to other collections such as ArrayDeque, BitSet, HashSet, IdentityHashMap, LinkedHashMap, Stack, TreeMap, TreeSet, WeakHashMap, Hashtable, and Vector.  HashSet, IdentityHashMap, LinkedHashMap, TreeMap, and TreeSet may be implemented with HashMap so trimming HashMap will automatically make these other collections benefit.
>
> How did I come up with the 10.1% heap reduction?  I analyzed 650 heap dumps and presented the findings in Java Memory Hogs.  By carefully adding up the right numbers, I calculate that the heap reduction will be 10.1%.  That's a big savings and it will impact the entire world of Java programs!
>
> I entertained the idea of having a setting to trim with every GC or when objects promote from young to old generation.  One concern is that the application would take a performance penalty to enlarge some or all the Object[]'s.  Another concern is that the extra CPU time and/or pause time will interfere with the application's performance.
>
> What do you think of the overall idea?  Do you have ideas of how to trim more often than when the application transitions to idle?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20230128/341b0156/attachment-0001.htm>