OpenJDK G1 Patch

Tue May 22 07:28:47 UTC 2018

Hi Kirk,

-	In 2001, weak references were the only alternative for a C++ smart-pointer-like design, and I guess there is still no other solution available.
-	CMSTriggerInterval does not invoke the full collection cycle; it invokes a normal concurrent CMS cycle (1 ParNew + 2 STW), so that there is at least one concurrent CMS cycle per trigger interval (counting CMS runs caused by CMSTriggerRatio and other reasons).
-	We are fine with running a regular concurrent CMS cycle which, in addition to releasing the weak references, keeps our old gen clean at all times, giving us a much better chance of surviving a sudden increase in the promotion rate.
-	We never run the full cycle, or rather, we have never seen the end of a full cycle, as applications are killed and restarted before it can finish.
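
To illustrate the weak-reference pattern from the first point, here is a minimal Java sketch. It is not the framework's code: the class `SubscriptionRegistry`, the `Handle` type and the `feedId` names are all hypothetical, and the "feed" stands in for whatever CPU or network resource is coupled to an application object. The cleanup only happens after a collector has actually processed the weak reference and put it on the queue, which is why a regular concurrent cycle matters for this design.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry; names are illustrative and not taken from the framework discussed. */
final class SubscriptionRegistry {

    /** Weak handle that remembers which feed to cancel once its owner is unreachable. */
    static final class Handle extends WeakReference<Object> {
        final String feedId;

        Handle(Object owner, String feedId, ReferenceQueue<Object> queue) {
            super(owner, queue);
            this.feedId = feedId;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // The registry keeps the handles strongly reachable, but never the owners.
    private final Map<String, Handle> active = new ConcurrentHashMap<>();

    /** Couples a feed (the non-memory resource) to the lifetime of the owner object. */
    Handle subscribe(Object owner, String feedId) {
        Handle handle = new Handle(owner, feedId, queue);
        active.put(feedId, handle);
        return handle;
    }

    /** Drains the queue and drops feeds whose owners are gone; returns how many were released. */
    int releaseUnreachable() {
        int released = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Handle handle = (Handle) ref;
            // A real framework would stop the updates for handle.feedId here.
            if (active.remove(handle.feedId, handle)) {
                released++;
            }
        }
        return released;
    }

    int activeCount() {
        return active.size();
    }
}
```

A housekeeping thread would call `releaseUnreachable()` periodically. If no collection cycle processes the old generation's references, the queue stays empty and the feeds keep consuming resources even though nothing in the application layer uses them.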

Regards,
Michal

From: "Kirk Pepperdine" kirk.pepperdine at gmail.com
To: "Michal Frajt" michal at frajt.eu
Cc: "hotspot-gc-dev at openjdk.java.net" hotspot-gc-dev at openjdk.java.net, rbruno at gsd.inesc-id.pt, rs at jelastic.com
Date: Tue, 22 May 2018 10:36:04 +0900
Subject: Re: OpenJDK G1 Patch

Hi Michal,
From your description it seems as if there are two different problems at play here. One is a resource minimization issue and the other is a design issue with one of your frameworks. Do correct me if I’m wrong.

please remember that some years back we had to patch a similar parameter to the CMS (CMSTriggerInterval) which helps us to regularly clean/sweep the complete heap. 
My view might be overly tainted by my aversion to full collections with mostly concurrent collectors but… I’ve had customers use CMSTriggerInterval as a means to speculatively run the collector. In all cases it was a flag that we ended up removing because of pause time concerns. It is but one of a couple of flags that speculatively run the collector, often without benefit. I’m not really a fan of making speculative calls to run a collector. Unless you do this with care and purpose, the end results are often damaging to app performance. I use examples of both good and bad “speculative” calls in my workshop to expose the issues. I have 0 customers that enjoy experiencing a full collection when running either G1 or CMS.

We have an application framework where we massively use weak references as a sort of smart pointer, where we need the garbage collector to at least regularly tell us that objects are no longer referenced from the application layer. The application framework objects are memory-, CPU- and network-bound, because the existence of an object in the heap means receiving thousands of updates, with the necessary decoding and value updates. Unfortunately all CMS/G1/… collectors see an allocated object as a memory resource, ignoring the fact that there might be other machine resources coupled with the object being alive.
I’m not sure that I completely understand the point you are trying to make here, but isn’t the role of GC to manage memory? Finalize is an accidental technique that is used to release coupled resources in cases where you may not understand the life-cycle of that object. If the collector isn’t managing to release the resources that you need to be released, that suggests there is another issue at play. For example, in G1 dense tenured regions will not be collected unless you adjust G1MixedGCLiveThresholdPercent, which I’ve set to 100% on some occasions. However, in those cases the real issue was some design flaw in the application that, once corrected, allowed us to return to the default setting.

There is an issue with G1 that often results in a full collection, and I believe it involves the collection of reference types. I think I have a benchmark that exposes the flaw; I just need more time to confirm.

We are still using CMS, as all G1 tests indicate a 2-3x performance degradation of the ParNew-like phase. We have not yet looked for a replacement for the CMSTriggerInterval parameter in G1, but something like the GCFrequency parameter will be required for us as well to keep our application framework working. Obviously the regularly triggered evacuation of all regions should be as concurrent as it can be, and not just a simple full collection.
### PHASE Post-Marking @ 2.991
### HEAP  reserved: 0x00000004c0000000-0x00000007c0000000  region-size: 4194304
###
###   type                         address-range       used  prev-live  next-live          gc-eff     remset  code-roots
###                                                 (bytes)    (bytes)    (bytes)      (bytes/ms)    (bytes)    (bytes)
###   OLD  0x00000004c0000000-0x00000004c0400000     928768     928768     904368             0.0       5832         16
###   FREE 0x00000004c0400000-0x00000004c0800000          0          0          0             0.0       5832         16
###   FREE 0x00000004c0800000-0x00000004c0c00000          0          0          0             0.0       5832         16
###   FREE 0x00000004c0c00000-0x00000004c1000000          0          0          0             0.0       5832         16
…… huge snip …….
###   EDEN 0x00000007bf400000-0x00000007bf800000    4194304    4194304    4194304             0.0      12552         16
###   EDEN 0x00000007bf800000-0x00000007bfc00000    4194304    4194304    4194304             0.0       8712         16
###   EDEN 0x00000007bfc00000-0x00000007c0000000    4194304    4194304    4194304             0.0       8712         16
So, if you look at the map of regions, would the regions being used by Eden not be returned at the end of a young collection? Do we really need a full collection to achieve this?
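
As a rough way to read a region map like the one above, the following Java sketch sums the "used" column per region type. It assumes only the row layout visible in the dump (a `###` prefix, a type, an address range starting with `0x`, then the byte counts); the class and method names are made up for this example, and header or snipped lines are simply skipped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of any JDK tool. */
final class RegionLivenessSummary {

    /** Sums the "used" bytes per region type from rows shaped like the dump above. */
    static Map<String, Long> usedBytesByType(Iterable<String> lines) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            // Expected row: "###", type, address-range, used, prev-live, next-live, ...
            if (fields.length < 4 || !fields[0].equals("###") || !fields[2].startsWith("0x")) {
                continue; // header, phase line, or snipped text
            }
            try {
                totals.merge(fields[1], Long.parseLong(fields[3]), Long::sum);
            } catch (NumberFormatException e) {
                // Not a region row after all; ignore it.
            }
        }
        return totals;
    }
}
```

Run over the rows shown here, this reports three fully used EDEN regions of 4194304 bytes each and a single OLD region with 928768 bytes used, which is the basis for the question about whether the Eden regions need a full collection to be returned.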

We hope there will be another alternative when CMS is killed for G1 to live.
Agreed… 

Regards,
Kirk
