<div dir="ltr"><div>> The impact of inserting a new element is a very observable thing. <br></div><div><br></div><div>I don't understand. The trimming does not change the in-use elements in ArrayList's and HashMap's Object[]. It simply removes nulls. For ArrayList, this means removing nulls at the end of the Object[]. These nulls occupy positions ≥ ArrayList's size. So, the Java code that uses the ArrayList can't detect the change. For HashMap, this means removing nulls between used buckets. So, the Java code that uses the HashMap can't detect any difference.</div><div><br></div><div>> A change to that sizing behavior can lead to very visible and surprising outcomes, like large latencies in adding elements to HashMap instances that were intentionally sized to prevent those high latencies and to gain predictability, with that predictability being legitimately based on the documented HashMap behavior.</div><div><br></div><div>
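The pre-sizing pattern the quoted concern refers to can be sketched as follows. This is an illustrative helper (the method name is made up); it relies on HashMap's documented behavior that no rehash occurs while size stays at or below capacity × load factor, with 0.75 being the default load factor:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of intentional pre-sizing: choose an initial capacity large enough
// that inserting a known number of entries never exceeds capacity * loadFactor,
// so no resize/rehash latency occurs mid-load. Helper name is illustrative.
public class PresizedMap {
    static <K, V> HashMap<K, V> withExpectedSize(int expectedSize) {
        // Round up so expectedSize entries fit under the default 0.75 load factor.
        int capacity = (int) Math.ceil(expectedSize / 0.75);
        return new HashMap<>(capacity);
    }

    public static void main(String[] args) {
        Map<Integer, String> m = withExpectedSize(100_000);
        for (int i = 0; i < 100_000; i++) {
            m.put(i, "v" + i); // no rehash pauses during this loop
        }
        System.out.println(m.size()); // prints 100000
    }
}
```

Newer JDKs (19+) also provide HashMap.newHashMap(int) to do this calculation for the caller. A GC that trims the table behind the application's back would silently undo exactly this kind of pre-sizing.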
This is why I suggested three configuration modes in my original email: off, idle, always. <br></div><div><br></div><div>I realize that some workloads idle for a while (e.g., stock trading programs after market close) and then start up with a burst of activity and must populate ArrayLists and HashMaps with a huge number of elements quickly. These kinds of applications may suffer from trimming. The off mode may be the right choice for them.<br></div><div><br></div><div>However, the average ArrayList has 12.1 elements. If the Object[] header is 12 bytes and we have 32-bit references, then the average Object[] is about 60.4 bytes. Allocating space and copying the data takes about 433 ns (guesstimate). Allocating and copying the Object[] arrays for 1,000 average-sized ArrayLists will therefore take about 433 𝛍s. So, perhaps many more workloads can handle trimming than we suspect.</div><div><br></div><div>If idle mode is used, then the trimming only happens when the program goes idle. If there is a performance impact while the program is idling, I doubt anyone cares. When the program transitions back to busy, there will be a performance impact; however, it may be negligible (i.e., on the order of 433 𝛍s per 1,000 trimmed ArrayLists). Only testing with the feature will tell. For most workloads, the load climbs slowly, so the impact will be in the noise (e.g., a server at a company where the workforce arrives gradually in the morning). For single-user programs, I doubt a user can detect it.<br></div><div><br></div><div>In my initial email, I didn't think the always mode would be good. I realize there would be performance problems. However, some workloads may still benefit. We've seen heap reduction improve response time since GC doesn't kick in as often. If we can somehow cheaply track which ArrayLists and HashMaps have a huge variance in size, then we can avoid trimming those, and always mode would be viable. This may be premature optimization, though. 
We would need the feature in place to determine exactly what problems arise with always mode.<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jan 23, 2023 at 1:10 PM Gil Tene <<a href="mailto:gil@azul.com">gil@azul.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><br><div><br><blockquote type="cite"><div>On Jan 23, 2023, at 7:44 AM, Nathan Reynolds <<a href="mailto:numeralnathan@gmail.com" target="_blank">numeralnathan@gmail.com</a>> wrote:</div><br><div><div dir="ltr"><div>> 1. Such a change would have user observable differences in behaviour,
which could introduce bugs in user code, due to the optimization.
</div><div><br></div><div>How is this user observable? The Object[] is buried inside an ArrayList or HashMap. This idea does not touch Object[]'s outside these collections.<br></div><div><br></div><div>I suppose a performance impact from having to grow is somewhat observable. This was noted in my original email. However, growing is not functionally observable.<br></div><div><br></div><div>The only way to functionally observe this is through reflection. Reflection is a moot concern because reflective access carries no compatibility guarantees across Java versions. So, any reflection into ArrayList or HashMap would have to be fixed like any other functional change to ArrayList or HashMap.<br></div></div></div></blockquote><div><br></div>The impact of inserting a new element is a very observable thing. <span style="color:rgb(0,0,0)">A change to that sizing behavior can lead to very visible and surprising outcomes, like large latencies in adding elements to HashMap instances that were intentionally sized to prevent those high latencies and to gain predictability, with that predictability being legitimately based on the documented HashMap behavior.</span></div><div><br></div><div>E.g. 
<a href="https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/HashMap.html" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/HashMap.html</a> includes:</div><div>"<span style="color:rgb(71,71,71);font-family:"DejaVu Serif",Georgia,"Times New Roman",Times,serif;font-size:14px;background-color:rgb(255,255,255)">An instance of </span><code style="font-family:"DejaVu Sans Mono",monospace;font-size:14px;padding-top:4px;margin-top:8px;line-height:1.4em;color:rgb(71,71,71)">HashMap</code><span style="color:rgb(71,71,71);font-family:"DejaVu Serif",Georgia,"Times New Roman",Times,serif;font-size:14px;background-color:rgb(255,255,255)"> has two parameters that affect its performance: </span><i style="color:rgb(71,71,71);font-family:"DejaVu Serif",Georgia,"Times New Roman",Times,serif;font-size:14px">initial capacity</i><span style="color:rgb(71,71,71);font-family:"DejaVu Serif",Georgia,"Times New Roman",Times,serif;font-size:14px;background-color:rgb(255,255,255)"> and </span><i style="color:rgb(71,71,71);font-family:"DejaVu Serif",Georgia,"Times New Roman",Times,serif;font-size:14px">load factor</i><span style="color:rgb(71,71,71);font-family:"DejaVu Serif",Georgia,"Times New Roman",Times,serif;font-size:14px;background-color:rgb(255,255,255)">. The </span><i style="color:rgb(71,71,71);font-family:"DejaVu Serif",Georgia,"Times New Roman",Times,serif;font-size:14px">capacity</i><span style="color:rgb(71,71,71);font-family:"DejaVu Serif",Georgia,"Times New Roman",Times,serif;font-size:14px;background-color:rgb(255,255,255)"> is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. 
The </span><i style="color:rgb(71,71,71);font-family:"DejaVu Serif",Georgia,"Times New Roman",Times,serif;font-size:14px">load factor</i><span style="color:rgb(71,71,71);font-family:"DejaVu Serif",Georgia,"Times New Roman",Times,serif;font-size:14px;background-color:rgb(255,255,255)"> is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is </span><i style="color:rgb(71,71,71);font-family:"DejaVu Serif",Georgia,"Times New Roman",Times,serif;font-size:14px">rehashed</i><span style="color:rgb(71,71,71);font-family:"DejaVu Serif",Georgia,"Times New Roman",Times,serif;font-size:14px;background-color:rgb(255,255,255)"> (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.</span>"</div><div><br></div><div>ArrayList has no such specification, but people may be relying empirically on the current behavior in performance-sensitive code as well. [Here a flag-driven control is more defensible, though.]</div><div><br></div><div><blockquote type="cite"><div><div dir="ltr"><div><br></div><div>> 2. The JIT relies a lot on hoisting range checks out of loops to get
good performance, but we need to safepoint poll inside of loops. If we
take the slow path of such a safepoint poll, and perform a GC which
shrinks the arrays, then the range checks that were only performed
before the loop, are no longer valid, and we would essentially have to
deoptimize all activation records (frames) in the system, which with
virtual threads could be a lot of frames. Either that or stop hoisting
range checks out of loops, but that comes with its own performance
problems.</div><div><br></div><div>ArrayList uses its internal size field to determine which elements the code looks at. If the Object[] changes length to anything ≥ size, then loops and hoisting still work; the new Object[] length won't cause a problem.</div><div><br></div><div>It may or may not be possible to trim HashMap. It will take some effort to examine the code and determine whether a GC trim will work as-is or whether the code needs to be changed to be compatible. One way to make the Java code compatible would be to have it set the HashMap's table field to null while holding a reference to the table in a local variable. GC would then not recognize the table as belonging to the HashMap and hence wouldn't touch it.</div><div><br>
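The point above, that all ArrayList access is bounded by the size field rather than by the backing array's length, can be demonstrated with the existing trimToSize() method, which already shrinks the Object[] with no observable change to the list:

```java
import java.util.ArrayList;
import java.util.List;

// Demonstration: shrinking ArrayList's backing array is invisible to users,
// because every access is bounded by the size field, not the array length.
// trimToSize() is the existing library-level form of the trimming discussed here.
public class TrimDemo {
    public static void main(String[] args) {
        ArrayList<String> list = new ArrayList<>(1_000); // backing array of 1000 slots
        list.add("a");
        list.add("b");
        List<String> before = List.copyOf(list); // snapshot of observable state

        list.trimToSize(); // backing array shrinks from 1000 slots to 2

        // Observable state is identical before and after the trim.
        System.out.println(before.equals(list)); // prints true
        System.out.println(list.size());         // prints 2
    }
}
```

A GC-driven trim would do the same thing, just without the application having to call trimToSize() itself.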
> 3. Many GCs allocate objects that are larger than a certain threshold in
special storage that deals with unique concerns for larger objects. If
said objects shrink to below that limit, I suspect we would have many
interesting problems to deal with. Not impossible to fix, but would
probably be rather painful to deal with.</div><div><br></div><div>Are you talking about large object allocation? If so, this is beyond my expertise. However, I would assume that GC could allocate a new location for the Object[], copy the data, and leave the large Object[] as garbage to collect later. This is probably worth it since the memory savings could be very significant.<br></div><div>
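For a sense of scale on the large-object point: under G1, an object occupying at least half a heap region is treated as "humongous" and goes into dedicated regions. A back-of-envelope sketch, assuming a 2 MiB region, a 16-byte array header, and 4-byte compressed references (all assumptions, since these values vary by heap size and JVM flags), gives the Object[] length at which that threshold is crossed:

```java
// Back-of-envelope estimate (assumed values, not queried from the JVM):
// under G1, an object of at least half the region size is "humongous".
public class HumongousEstimate {
    static final long REGION_BYTES = 2L * 1024 * 1024; // assumed G1 region size
    static final long HEADER_BYTES = 16;               // assumed array header size
    static final long REF_BYTES = 4;                   // compressed references

    static long arrayBytes(long length) {
        return HEADER_BYTES + REF_BYTES * length;
    }

    public static void main(String[] args) {
        long threshold = REGION_BYTES / 2;
        long lengthAtThreshold = (threshold - HEADER_BYTES) / REF_BYTES;
        System.out.println(lengthAtThreshold);                         // prints 262140
        System.out.println(arrayBytes(lengthAtThreshold) >= threshold); // prints true
        // Trimming such an array below this length would move it out of
        // humongous storage, which is the complication raised in point 3.
    }
}
```

So only Object[]'s with roughly a quarter-million elements are affected under these assumptions, which suggests the "shrink below the threshold" case is rare but not ignorable.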
<br>> Perhaps this could be modelled as a library problem instead. Sounds like
perhaps the growing strategy of the library is too aggressive for your
preferences? <br></div><div><br></div><div>I am not sure how a Java library could implement this. The library could use reflection to get access to the Object[] in ArrayList or HashMap, allocate a new Object[], copy the elements, and use reflection to store the new Object[]. However, there is no synchronization mechanism to get the other Java threads to see the reference to the new Object[]. Some threads will continue to use the old Object[] and some threads will use the new Object[]. Also, if other Java threads change the elements in the old Object[] while the library is making the change, then those changes could be lost.</div><div><br></div><div>The beauty of trimming during GC move is that GC and Java threads have already negotiated how to synchronize the object move. In Serial GC, the Java threads are paused and GC can move anything it wants. GC changes the stack (and registers?) of all the Java threads. The Java threads have no way to tell that the object moved. In ZGC, the Java and GC threads work together to perform the move. If I understand correctly, other GC algorithms execute moves like Serial or ZGC.<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jan 23, 2023 at 1:57 AM Erik Osterlund <<a href="mailto:erik.osterlund@oracle.com" target="_blank">erik.osterlund@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Nathan,<br>
<br>
I can think of a number of concerning aspects of doing something like this:<br>
<br>
1. Such a change would have user observable differences in behaviour, which could introduce bugs in user code, due to the optimization.<br>
<br>
2. The JIT relies a lot on hoisting range checks out of loops to get good performance, but we need to safepoint poll inside of loops. If we take the slow path of such a safepoint poll, and perform a GC which shrinks the arrays, then the range checks that were only performed before the loop, are no longer valid, and we would essentially have to deoptimize all activation records (frames) in the system, which with virtual threads could be a lot of frames. Either that or stop hoisting range checks out of loops, but that comes with its own performance problems.<br>
<br>
3. Many GCs allocate objects that are larger than a certain threshold in special storage that deals with unique concerns for larger objects. If said objects shrink to below that limit, I suspect we would have many interesting problems to deal with. Not impossible to fix, but would probably be rather painful to deal with.<br>
<br>
Perhaps this could be modelled as a library problem instead. Sounds like perhaps the growing strategy of the library is too aggressive for your preferences?<br>
<br>
Thanks,<br>
/Erik<br>
<br>
> On 21 Jan 2023, at 00:24, Nathan Reynolds <<a href="mailto:numeralnathan@gmail.com" target="_blank">numeralnathan@gmail.com</a>> wrote:<br>
> <br>
> ArrayList, HashMap, and other collections have internal Object[]'s to store the elements. The Object[]'s are lazily initialized to minimize heap usage. However, there is still quite a bit of waste in these Object[]'s. Some collections are oversized from the beginning. Some collections are filled and then partially or fully emptied. For an average heap dump, 10.1% of the heap is wasted on oversized Object[]'s.<br>
> <br>
> These Object[]'s can be trimmed during GC's move operation but only when the process transitions to idle. Since the trimming only happens when the application transitions to idle, then the extra CPU and/or pause time isn't a concern. The performance penalty to the application to resize larger won't happen for a while since the process is idle. With the reduction of heap usage more idle processes can run on a single VM (i.e., the JVM will be more Docker Container friendly). <br>
> <br>
> The trimming may apply to other collections such as ArrayDeque, BitSet, HashSet, IdentityHashMap, LinkedHashMap, Stack, TreeMap, TreeSet, WeakHashMap, Hashtable, and Vector. HashSet, IdentityHashMap, LinkedHashMap, TreeMap, and TreeSet may be implemented with HashMap so trimming HashMap will automatically make these other collections benefit.<br>
> <br>
> How did I come up with the 10.1% heap reduction? I analyzed 650 heap dumps and presented the findings in Java Memory Hogs. By carefully adding up the right numbers, I calculate that the heap reduction will be 10.1%. That's a big savings and it will impact the entire world of Java programs! <br>
> <br>
> I entertained the idea of having a setting to trim with every GC or when objects promote from young to old generation. One concern is that the application would take a performance penalty to enlarge some or all the Object[]'s. Another concern is that the extra CPU time and/or pause time will interfere with the application's performance.<br>
> <br>
> What do you think of the overall idea? Do you have ideas of how to trim more often than when the application transitions to idle?<br>
<br>
</blockquote></div>
</div></blockquote></div><br></div></blockquote></div>