[External] : Re: Trimming ArrayList and HashMap During GC Copy
Nathan Reynolds
numeralnathan at gmail.com
Tue Jan 24 01:33:01 UTC 2023
> The array list code itself will probably blow up if for example you GC in
the middle of let’s say the add method. It checks if we have capacity, then
we GC draining that capacity, then add the object out of bounds. GC
safepoints can happen anywhere in Java code, including in any method on the
array list, which uses the length for various reasons, assuming it won’t
change seemingly concurrently.
You raise a valid point. Thank you for catching this. I wrote a response
but came up with something better below.
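To restate the problem concretely, here is a simplified sketch (not the
actual JDK ArrayList source) of the add() pattern that breaks if the
backing Object[] can shrink in the middle of the method:

    class SimplifiedList {
        private Object[] elementData = new Object[10];
        private int size;

        void add(Object e) {
            if (size == elementData.length) {
                // grow by roughly 1.5x, as ArrayList does
                elementData = java.util.Arrays.copyOf(elementData,
                        size + (size >> 1) + 1);
            }
            // a safepoint here that shrank elementData would make the
            // store below land out of bounds
            elementData[size] = e;
            size++;
        }
    }

Nothing in a method like this can tolerate elementData.length changing
between the capacity check and the store.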
> Loads from an array can speculatively float above user control checks
like the ArrayList internal range check. That’s fine and the user can’t
observe the difference in behaviour. However, an aaload has its own range
check against the actual array length. Loads into the array can not float
above that array range check, because that could perform a load outside the
heap boundary. Therefore, with your proposal, a read out of bounds of the
array list would not only throw an exception, but occasionally also access
outside the heap and hence crash the JVM, when accessed in a loop within
which the capacity can shrink, rendering hoisted range checks invalid.
That’s my concern here. And combating that would likely involve a huge
deopt hammer triggered by GC.
We can have GC perform the trimming operation only when the sole
references to the ArrayList or HashMap are from other objects in the
heap. If the ArrayList or HashMap is referenced from any thread's stack
or registers, GC skips it. In effect, a Java thread acquires a "lock" on
the collection when it starts referencing it and releases the "lock"
when it stops. Here, "lock" means nothing more than the thread holding a
reference; it does not imply any synchronization mechanism that would
cost the Java thread performance.
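A small example of what that condition means in plain Java terms (purely
illustrative; the real check would live in the GC, not in library code):

    import java.util.ArrayList;
    import java.util.List;

    public class StackReferenceExample {
        static final List<Integer> SHARED = new ArrayList<>();  // heap-only reference

        static long sum(List<Integer> list) {                   // stack reference
            long total = 0;
            for (int i = 0; i < list.size(); i++) {
                total += list.get(i);
            }
            return total;
        }

        public static void main(String[] args) {
            SHARED.add(1);
            SHARED.add(2);
            // While sum() is executing, this thread's stack holds a reference
            // to the ArrayList, so under the proposal GC would skip trimming
            // it. Once sum() returns and no frame or register references the
            // list, only the heap reference in SHARED remains and trimming
            // becomes eligible.
            System.out.println(sum(SHARED));
        }
    }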
Non-ZGC algorithms pause the application during the move operation.
While the application is paused, and given the stack/register check
above, GC can be sure that no Java thread is in the middle of an
operation on the ArrayList or HashMap.
For ZGC, a GC thread can move the ArrayList and its internal Object[]
and trim it in the process. If I understand correctly, Java threads that
try to access an object while a GC thread is moving it already block and
wait for the move to finish. If a Java thread reaches the ArrayList
before a GC thread has moved it, the Java thread performs the move
itself. Since the Java thread is, at that point, still executing code
outside the ArrayList, it will have done the trimming before it starts
looking at the Object[] bounds.
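Here is a toy model of that hand-off (purely illustrative; it is not ZGC
code and the names are invented): whichever thread needs the object
first performs the relocation, and the relocation step is where the
trimming would happen.

    import java.util.Arrays;
    import java.util.concurrent.atomic.AtomicReference;

    class RelocateOnAccess {
        private final Object[] fromSpace;      // old, oversized copy
        private final int liveSize;            // analogous to ArrayList.size
        private final AtomicReference<Object[]> toSpace = new AtomicReference<>();

        RelocateOnAccess(Object[] fromSpace, int liveSize) {
            this.fromSpace = fromSpace;
            this.liveSize = liveSize;
        }

        // Called by GC threads and Java threads alike; the first caller wins.
        Object[] access() {
            Object[] relocated = toSpace.get();
            if (relocated == null) {
                // Trim while copying: the new array is only liveSize long.
                Object[] candidate = Arrays.copyOf(fromSpace, liveSize);
                if (!toSpace.compareAndSet(null, candidate)) {
                    relocated = toSpace.get();  // another thread relocated first
                } else {
                    relocated = candidate;
                }
            }
            return relocated;
        }
    }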
> Yes, I am talking about large object allocation. While we certainly could
deal with it, my point was rather that it would be painful and require a
bunch of special handling.
The trimming can simply skip large objects. With that in mind, trimming
large Object[]'s in ArrayList and HashMap is probably not worth the
additional effort. Large, sparsely used ArrayLists don't even show up in
the average heap dump. Large, sparsely used HashMaps take only 0.4% of
the heap, and large, sparsely used ConcurrentHashMaps only 0.1%. We can
revisit trimming them when we are looking for heap reductions ≤ 0.5%. :)
>> Perhaps this could be modelled as a library problem instead. Sounds like
perhaps the growing strategy of the library is too aggressive for your
preferences?
> I was thinking of rather having a less aggressive growing strategy, or
something along those lines.
This won't help in HashMap's case. The table sizes are powers of 2. The
growth has to be a doubling. HashMap is already growing as slowly as
possible.
Slowing the growth rate of ArrayList will only reduce waste by < 0.3%.
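For reference, a small sketch of the two growth policies (the 1.5x
factor for ArrayList is approximate and an implementation detail of
current JDKs, not a specified contract):

    public class GrowthSketch {
        public static void main(String[] args) {
            int arrayListCapacity = 10;   // default initial capacity
            int hashMapTable = 16;        // default initial table length
            for (int i = 0; i < 5; i++) {
                System.out.println("ArrayList capacity: " + arrayListCapacity
                        + ", HashMap table: " + hashMapTable);
                arrayListCapacity += arrayListCapacity >> 1;  // grows ~1.5x
                hashMapTable <<= 1;                           // must double (power of 2)
            }
        }
    }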
On Mon, Jan 23, 2023 at 2:34 PM Erik Osterlund <erik.osterlund at oracle.com>
wrote:
>
>
> On 23 Jan 2023, at 18:45, Nathan Reynolds <numeralnathan at gmail.com> wrote:
>
>
> > 1. Such a change would have user observable differences in behaviour,
> which could introduce bugs in user code, due to the optimization.
>
> How is this user observable? The Object[] is buried inside an ArrayList
> or HashMap. This idea is not touching other Object[]'s outside a
> collection.
>
> I suppose a performance impact from having to grow is somewhat
> observable. This was noted in my original email. However, growing is not
> functionally observable.
>
> The only way to functionally observe this is through reflection.
> Reflection is a moot concern because there are no guarantees when using
> reflection and switching to a new Java version. So, any reflection into
> ArrayList or HashMap would have to be fixed like any other functional
> change to ArrayList or HashMap.
>
>
> The array list code itself will probably blow up if for example you GC in
> the middle of let’s say the add method. It checks if we have capacity, then
> we GC draining that capacity, then add the object out of bounds. GC
> safepoints can happen anywhere in Java code, including in any method on the
> array list, which uses the length for various reasons, assuming it won’t
> change seemingly concurrently.
>
> > 2. The JIT relies a lot on hoisting range checks out of loops to get
> good performance, but we need to safepoint poll inside of loops. If we take
> the slow path of such a safepoint poll, and perform a GC which shrinks the
> arrays, then the range checks that were only performed before the loop, are
> no longer valid, and we would essentially have to deoptimize all activation
> records (frames) in the system, which with virtual threads could be a lot
> of frames. Either that or stop hoisting range checks out of loops, but that
> comes with its own performance problems.
>
> ArrayList uses its internal size field to determine what elements the code
> looks at. If the Object[] changes length to anything ≥ size, then loops
> and hoisting still work. The new Object[] length won't cause a problem.
>
> HashMap may or may not be possible to trim. It will take some effort to
> examine the code and determine if a GC trim will work or the code may need
> to be changed to be compatible. One way to make the Java code compatible
> would be to make the Java code set the HashMap's table field to null while
> holding the reference to the table in a local variable. GC will then not
> recognize the table as belonging to HashMap and hence GC won't do anything
> with it.
>
>
> Loads from an array can speculatively float above user control checks like
> the ArrayList internal range check. That’s fine and the user can’t observe
> the difference in behaviour. However, an aaload has its own range check
> against the actual array length. Loads into the array can not float above
> that array range check, because that could perform a load outside the heap
> boundary. Therefore, with your proposal, a read out of bounds of the array
> list would not only throw an exception, but occasionally also access
> outside the heap and hence crash the JVM, when accessed in a loop within
> which the capacity can shrink, rendering hoisted range checks invalid.
> That’s my concern here. And combating that would likely involve a huge
> deopt hammer triggered by GC.
>
> > 3. Many GCs allocate objects that are larger than a certain threshold in
> special storage that deals with unique concerns for larger objects. If said
> objects shrink to below that limit, I suspect we would have many
> interesting problems to deal with. Not impossible to fix, but would
> probably be rather painful to deal with.
>
> Are you talking about large object allocation? If so, this is beyond my
> expertise. However, I would assume that GC could allocate a new location
> for the Object[], copy the data, and leave the large Object[] as garbage
> for collection later. This is probably worth it since the memory savings
> could be very significant.
>
>
> Yes, I am talking about large object allocation. While we certainly could
> deal with it, my point was rather that it would be painful and require a
> bunch of special handling.
>
> > Perhaps this could be modelled as a library problem instead. Sounds like
> perhaps the growing strategy of the library is too aggressive for your
> preferences?
>
> I am not sure how a Java library could implement this. The library could
> use reflection to get access to the Object[] in ArrayList or HashMap,
> allocate a new Object[], copy the elements, and use reflection to store the
> new Object[]. However, there is no synchronization mechanism to get the
> other Java threads to see the reference to the new Object[]. Some threads
> will continue to use the old Object[] and some threads will use the new
> Object[]. Also, if other Java threads change the elements in the old
> Object[] while the library is making the change, then those changes could
> be lost.
>
>
> I was thinking of rather having a less aggressive growing strategy, or
> something along those lines.
>
> The beauty of trimming during GC move is that GC and Java threads have
> already negotiated how to synchronize the object move. In Serial GC, the
> Java threads are paused and GC can move anything it wants. GC changes the
> stack (and registers?) of all the Java threads. The Java threads have no
> way to tell that the object moved. In ZGC, the Java and GC threads work
> together to perform the move. If I understand correctly, other GC
> algorithms execute moves like Serial or ZGC.
>
>
> As I mentioned earlier, they have not negotiated when such GC can shrink,
> if a java thread is in the middle of a method poking around at an array
> list.
>
> Hope this makes sense.
>
> /Erik
>
>
> On Mon, Jan 23, 2023 at 1:57 AM Erik Osterlund <erik.osterlund at oracle.com>
> wrote:
>
>> Hi Nathan,
>>
>> I can think of a number of concerning aspects of doing something like
>> this:
>>
>> 1. Such a change would have user observable differences in behaviour,
>> which could introduce bugs in user code, due to the optimization.
>>
>> 2. The JIT relies a lot on hoisting range checks out of loops to get good
>> performance, but we need to safepoint poll inside of loops. If we take the
>> slow path of such a safepoint poll, and perform a GC which shrinks the
>> arrays, then the range checks that were only performed before the loop, are
>> no longer valid, and we would essentially have to deoptimize all activation
>> records (frames) in the system, which with virtual threads could be a lot
>> of frames. Either that or stop hoisting range checks out of loops, but that
>> comes with its own performance problems.
>>
>> 3. Many GCs allocate objects that are larger than a certain threshold in
>> special storage that deals with unique concerns for larger objects. If said
>> objects shrink to below that limit, I suspect we would have many
>> interesting problems to deal with. Not impossible to fix, but would
>> probably be rather painful to deal with.
>>
>> Perhaps this could be modelled as a library problem instead. Sounds like
>> perhaps the growing strategy of the library is too aggressive for your
>> preferences?
>>
>> Thanks,
>> /Erik
>>
>> > On 21 Jan 2023, at 00:24, Nathan Reynolds <numeralnathan at gmail.com>
>> wrote:
>> >
>> > ArrayList, HashMap, and other collections have internal Object[]'s to
>> store the elements. The Object[]'s are lazily initialized to minimize heap
>> usage. However, there is still quite a bit of waste in these Object[]'s.
>> Some collections are oversized from the beginning. Some collections are
>> filled and then partially or fully emptied. For an average heap dump,
>> 10.1% of the heap is wasted on oversized Object[]'s.
>> >
>> > These Object[]'s can be trimmed during GC's move operation but only
>> when the process transitions to idle. Since the trimming only happens when
>> the application transitions to idle, then the extra CPU and/or pause time
>> isn't a concern. The performance penalty to the application to resize
>> larger won't happen for a while since the process is idle. With the
>> reduction of heap usage more idle processes can run on a single VM (i.e.,
>> the JVM will be more Docker Container friendly).
>> >
>> > The trimming may apply to other collections such as ArrayDeque, BitSet,
>> HashSet, IdentityHashMap, LinkedHashMap, Stack, TreeMap, TreeSet,
>> WeakHashMap, Hashtable, and Vector. HashSet, IdentityHashMap,
>> LinkedHashMap, TreeMap, and TreeSet may be implemented with HashMap so
>> trimming HashMap will automatically make these other collections benefit.
>> >
>> > How did I come up with the 10.1% heap reduction? I analyzed 650 heap
>> dumps and presented the findings in Java Memory Hogs. By carefully adding
>> up the right numbers, I calculate that the heap reduction will be 10.1%.
>> That's a big savings and it will impact the entire world of Java programs!
>> >
>> > I entertained the idea of having a setting to trim with every GC or
>> when objects promote from young to old generation. One concern is that the
>> application would take a performance penalty to enlarge some or all the
>> Object[]'s. Another concern is that the extra CPU time and/or pause time
>> will interfere with the application's performance.
>> >
>> > What do you think of the overall idea? Do you have ideas of how to
>> trim more often than when the application transitions to idle?
>>
>>