Trimming ArrayList and HashMap During GC Copy

Nathan Reynolds numeralnathan at gmail.com
Thu Jan 26 20:12:19 UTC 2023


> I think the issue is that is is observable by the Java code, in ArrayList
and Hashmap.

The change is observable by the collection's code, but this is okay.  The
change doesn't break anything.

> You're proposing specialised JVM
and Java behaviour for particular classes. If the behaviour could be
generalized and more widely used, then the cost of
the special code could be amortised across many classes, and could be more
consistently implemented.

Unfortunately, this optimization requires GC and the collection's code to
work together.  GC has to understand how each collection works to know how
to trim it.  For example, GC has to know that trimming ArrayList's Object[]
will work as long as the new Object[] length is == the ArrayList's size.
Also, GC has to know how to trim a HashMap.  GC can't simply shorten the
Object[].  GC has to redistribute the nodes in the HashMap.

> Observation can also apply in terms of concurrency, where it is not just
how, but when the change is observed that is
important. You may have several threads executing different parts of the
class, each with different ideas of the size of
the array. I think there would need to be thought given to how the Java
code and GC code co-ordinate with one another.

Agreed.  Please see my email I sent just before this one.  It tackles this
in depth.

> The debugging APIs (JVMTI, JDWP) also allow memory accesses to be
observed, and I suspect we'd need to be careful with
JNI and the panama features.

When JVMTI and JDWP tries to access an object, the internal JVM code
already deals with GC relocating objects.  Since the trimming happens
during relocation, JVMTI and JDWP won't be able to observe that trimming
happens.

Native methods are a different animal.  When the object address is given to
native methods, GC pins the object and so it can't relocate.  No relocation
means trimming won't happen.

> My feeling is that there should be clear boundary between what GC does
and what the language does.

GC already does String byte[] de-duplication.  GC understands the nature of
Strings and that it can de-duplicate the internal byte[].  This provides a
20% reduction in memory usage and surprisingly a significant boost to
response time.  String byte[] de-duplication crosses the boundary between
GC and Java code.

Trimming is just another optimization like String byte[] de-duplication.
Trimming provides a 10.1% reduction in memory usage and presumably a boost
to response time.  This cannot be ignored and outweighs any boundaries
between GC and Java code.

On Tue, Jan 24, 2023 at 3:57 AM Stuart Monteith <stumon01 at arm.com> wrote:

> On 23/01/2023 17:44, Nathan Reynolds wrote:
> >  > 1. Such a change would have user observable differences in behaviour,
> which could introduce bugs in user code, due to
> > the optimization.
> >
> > How is this user observable?  The Object[] is buried inside an ArrayList
> or HashMap.  This idea is not touching other
> > Object[]'s outside a collection.
>
> I think the issue is that is is observable by the Java code, in ArrayList
> and Hashmap. You're proposing specialised JVM
> and Java behaviour for particular classes. If the behaviour could be
> generalized and more widely used, then the cost of
> the special code could be amortised across many classes, and could be more
> consistently implemented. My first thoughts
> are that we could implement sparse arrays or arraylets in the java code.
> Another might be an annotation to ask for
> certain arrays to not be precommited.
>
> >
> > I suppose a performance impact from having to grow is somewhat
> observable.  This was noted in my original email.
> > However, growing is not functionally observable.
> >
> > The only way to functionally observe this is through reflection.
> Reflection is a moot concern because there are no
> > guarantees when using reflection and switching to a new Java version.
> So, any reflection into ArrayList or HashMap
> > would have to be fixed like any other functional change to ArrayList or
> HashMap.
> >
>
> Observation can also apply in terms of concurrency, where it is not just
> how, but when the change is observed that is
> important. You may have several threads executing different parts of the
> class, each with different ideas of the size of
> the array. I think there would need to be thought given to how the Java
> code and GC code co-ordinate with one another.By
> analogy, we have the Reference classes (Weak, Phantom, etc), where it is
> expected that the reference field will be
> cleared automatically by the garbage collector, and it is understood this
> will only happen when there are no strong
> references. Perhaps something like:
>
> class ResizeableArray<T> {
>      Object<T>[] reference;
> }
>
> The debugging APIs (JVMTI, JDWP) also allow memory accesses to be
> observed, and I suspect we'd need to be careful with
> JNI and the panama features.
>
> >  > 2. The JIT relies a lot on hoisting range checks out of loops to get
> good performance, but we need to safepoint poll
> > inside of loops. If we take the slow path of such a safepoint poll, and
> perform a GC which shrinks the arrays, then the
> > range checks that were only performed before the loop, are no longer
> valid, and we would essentially have to deoptimize
> > all activation records (frames) in the system, which with virtual
> threads could be a lot of frames. Either that or stop
> > hoisting range checks out of loops, but that comes with its own
> performance problems.
> >
> > ArrayList uses its internal size field to determine what elements the
> code looks at.  If the Object[] changes length to
> > anything ≥ size, then loops and hoisting still work.  The new Object[]
> length won't cause a problem.
> >
> By definition, arrays have constant lengths, so trimming the length would
> cause problems, and with larger arrays you
> could get inconsistencies between threads. I'd expect there would be
> reservations because of the security implications
> of resizing arrays.
>
> > HashMap may or may not be possible to trim.  It will take some effort to
> examine the code and determine if a GC trim
> > will work or the code may need to be changed to be compatible.  One way
> to make the Java code compatible would be to
> > make the Java code set the HashMap's table field to null while holding
> the reference to the table in a local variable.
> > GC will then not recognize the table as belonging to HashMap and hence
> GC won't do anything with it.
> >
> >  > 3. Many GCs allocate objects that are larger than a certain threshold
> in special storage that deals with unique
> > concerns for larger objects. If said objects shrink to below that limit,
> I suspect we would have many interesting
> > problems to deal with. Not impossible to fix, but would probably be
> rather painful to deal with.
> >
> > Are you talking about large object allocation?  If so, this is beyond my
> expertise.  However, I would assume that GC
> > could allocate a new location for the Object[], copy the data, and leave
> the large Object[] as garbage for collection
> > later.  This is probably worth it since the memory saves could be very
> significant.
> >
> >  > Perhaps this could be modelled as a library problem instead. Sounds
> like perhaps the growing strategy of the library
> > is too aggressive for your preferences?
> >
> > I am not sure how a Java library could implement this.  The library
> could use reflection to get access to the Object[]
> > in ArrayList or HashMap, allocate a new Object[], copy the elements, and
> use reflection to store the new Object[].
> > However, there is no synchronization mechanism to get the other Java
> threads to see the reference to the new Object[].
> > Some threads will continue to use the old Object[] and some threads will
> use the new Object[].  Also, if other Java
> > threads change the elements in the old Object[] while the library is
> making the change, then those changes could be lost.
> >
> > The beauty of trimming during GC move is that GC and Java threads have
> already negotiated how to synchronize the object
> > move.  In Serial GC, the Java threads are paused and GC can move
> anything it wants.  GC changes the stack (and
> > registers?) of all the Java threads.  The Java threads have no way to
> tell that the object moved.  In ZGC, the Java and
> > GC threads work together to perform the move.  If I understand
> correctly, other GC algorithms execute moves like Serial
> > or ZGC.
>
>
> My feeling is that there should be clear boundary between what GC does and
> what the language does. The language should
> provide all of the synchronization and array handling feature you need to
> implement efficient data structures, and GC
> should collect garbage efficiently. If java.utils.Arrays.copyOf() isn't
> efficient, perhaps that is an area that could be
> looked into. It would be under control of the class using the array, so it
> would be able to synchronize correctly, and
> unlike with shrinking arrays, prevents Java code from writing past array
> bounds.
>
>
> It is an interesting idea, but I think the implications go beyond what
> Erik and I have said,
>
> BR,
>         Stuart
>
> >
> > On Mon, Jan 23, 2023 at 1:57 AM Erik Osterlund <
> erik.osterlund at oracle.com <mailto:erik.osterlund at oracle.com>> wrote:
> >
> >     Hi Nathan,
> >
> >     I can think of a number of concerning aspects of doing something
> like this:
> >
> >     1. Such a change would have user observable differences in
> behaviour, which could introduce bugs in user code, due
> >     to the optimization.
> >
> >     2. The JIT relies a lot on hoisting range checks out of loops to get
> good performance, but we need to safepoint poll
> >     inside of loops. If we take the slow path of such a safepoint poll,
> and perform a GC which shrinks the arrays, then
> >     the range checks that were only performed before the loop, are no
> longer valid, and we would essentially have to
> >     deoptimize all activation records (frames) in the system, which with
> virtual threads could be a lot of frames.
> >     Either that or stop hoisting range checks out of loops, but that
> comes with its own performance problems.
> >
> >     3. Many GCs allocate objects that are larger than a certain
> threshold in special storage that deals with unique
> >     concerns for larger objects. If said objects shrink to below that
> limit, I suspect we would have many interesting
> >     problems to deal with. Not impossible to fix, but would probably be
> rather painful to deal with.
> >
> >     Perhaps this could be modelled as a library problem instead. Sounds
> like perhaps the growing strategy of the library
> >     is too aggressive for your preferences?
> >
> >     Thanks,
> >     /Erik
> >
> >      > On 21 Jan 2023, at 00:24, Nathan Reynolds <
> numeralnathan at gmail.com <mailto:numeralnathan at gmail.com>> wrote:
> >      >
> >      > ArrayList, HashMap, and other collections have internal
> Object[]'s to store the elements.  The Object[]'s are
> >     lazily initialized to minimize heap usage.  However, there is still
> quite a bit of waste in these Object[]'s.  Some
> >     collections are oversized from the beginning.  Some collections are
> filled and then partially or fully emptied.  For
> >     an average heap dump, 10.1% of the heap is wasted on oversized
> Object[]'s.
> >      >
> >      > These Object[]'s can be trimmed during GC's move operation but
> only when the process transitions to idle.  Since
> >     the trimming only happens when the application transitions to idle,
> then the extra CPU and/or pause time isn't a
> >     concern.  The performance penalty to the application to resize
> larger won't happen for  a while since the process is
> >     idle.  With the reduction of heap usage more idle processes can run
> on a single VM (i.e., the JVM will be more
> >     Docker Container friendly).
> >      >
> >      > The trimming may apply to other collections such as ArrayDeque,
> BitSet, HashSet, IdentityHashMap, LinkedHashMap,
> >     Stack, TreeMap, TreeSet, WeakHashMap, Hashtable, and Vector.
> HashSet, IdentityHashMap, LinkedHashMap, TreeMap, and
> >     TreeSet may be implemented with HashMap so trimming HashMap will
> automatically make these other collections benefit.
> >      >
> >      > How did I come up with the 10.1% heap reduction?  I analyzed 650
> heap dumps and presented the findings in Java
> >     Memory Hogs.  By carefully adding up the right numbers, I calculate
> that the heap reduction will be 10.1%.  That's a
> >     big savings and it will impact the entire world of Java programs!
> >      >
> >      > I entertained the idea of having a setting to trim with every GC
> or when objects promote from young to old
> >     generation.  One concern is that the application would take a
> performance penalty to enlarge some or all the
> >     Object[]'s.  Another concern is that the extra CPU time and/or pause
> time will interfere with the application's
> >     performance.
> >      >
> >      > What do you think of the overall idea?  Do you have ideas of how
> to trim more often than when the application
> >     transitions to idle?
> >
>
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy the
> information in any medium. Thank you.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20230126/09d8fdbf/attachment-0001.htm>


More information about the hotspot-gc-dev mailing list