OpenJDK G1 Patch

Wed May 23 08:38:49 UTC 2018

> On May 23, 2018, at 9:36 AM, Michal Frajt <michal at frajt.eu> wrote:
> 
> Hi Kirk,
> 
> I’m just on the back end of the Oracle Dev tour in Japan… Once I land I’ll work on the test case to hopefully sort out what is happening with references and G1 (if this is indeed the cause of the grief I’ve been trying to sort through) and I do plan on sharing the results.
> -	please share your findings with us. We would like to know how G1 handles the processing of weak references that are spread across all regions; we simply need them cleared within a certain time at the latest (our CMSTriggerInterval)
> -	we have had a similar issue with Azul C4, where weak references behave(d) very differently: if you access a weak reference's referent even briefly, the weak reference won’t be cleared in the next C4 major cycle (an interval of minutes), so you have to stop accessing it, which goes completely against the idea of a weak (or soft) reference

Indeed, this feels like a bug. It certainly doesn’t meet the WeakReference definition as I understand it.
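
For reference, the behaviour expected from the java.lang.ref documentation can be sketched as below. This is not code from the thread: the class and method names are hypothetical, and System.gc() is only a request, so the outcome depends on the JVM and collector in use. The sketch assumes a stock HotSpot JVM where an explicit GC request is honoured.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class; not part of the thread or of any patch.
final class WeakRefSketch {

    // Requests a GC until the reference is cleared or the attempts run out.
    static boolean clearedAfterGc(WeakReference<?> ref, int attempts) {
        for (int i = 0; i < attempts && ref.get() != null; i++) {
            System.gc(); // only a hint; the JVM is free to ignore it
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    // The referent has no strong reference, so a collection may clear it.
    static boolean unreachableReferentIsCleared() {
        WeakReference<Object> ref = new WeakReference<>(new Object());
        return clearedAfterGc(ref, 20);
    }

    // The referent stays strongly reachable, so it must not be cleared.
    static boolean strongReferentSurvives() {
        Object strong = new Object();
        WeakReference<Object> ref = new WeakReference<>(strong);
        clearedAfterGc(ref, 3);
        return ref.get() == strong; // 'strong' is still in use here
    }

    public static void main(String[] args) {
        System.out.println("unreachable referent cleared: " + unreachableReferentIsCleared());
        System.out.println("strong referent survives:     " + strongReferentSurvives());
    }
}
```

Note that the polling loop itself calls get(), which briefly makes the referent strongly reachable. With a stop-the-world explicit GC that should not matter; with a concurrent collector it is exactly the kind of access described above.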

> 
> Yes, I understand but the ParNew before the CMS cycle has typically proven to be a huge loss overall.
> -	our finding is exactly the opposite: you should run ParNew before the initial marking is started
> -	I even had to patch HotSpot: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2012-August/001297.html, https://bugs.openjdk.java.net/browse/JDK-7189971

Right, still a loss in every prod system where I’ve seen it being used.

> 
> If you need to run frequent CMS cycles to keep tenured clean then it’s very very likely that you’re suffering from a premature promotion problem. We have tuning strategies to manage this situation that don’t require excessive triggering of CMS.
> -	we run frequent CMS cycles to keep the weak references cleaned, tenured getting cleaned is just a side effect

I can’t say that I understand your use case, but it sounds as if you *need* to have weak references cleaned frequently. Then again, I’m thinking there is something in the design of your framework that is bothering me. If you need weak references, you need weak references. That said, if you need them cleaned quickly, then I have to ask: why?

> 
> Right, I can typically have JVMs run for weeks or longer without suffering from concurrent mode failures. It’s about controlling frequency and promotion rates. And it’s not so easy as the aging process is affected by the size of Eden.
> -	right, in one of our distribution layers I was even thinking about reading the object age via Unsafe and re-creating the object right before it gets promoted, so that it very likely stays in the young generation
> -	now that I think more about it, I could actually just try to reduce the age via Unsafe; the object should then stay longer in the survivor spaces, need to try

Messing with the internal structures in this way seems like a gross violation… it feels worse than triggering a full collection. Again, I’d suggest there is a tuning strategy to manage this.
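
Whatever tuning is tried, its effect on collection frequency can be checked from inside the JVM without touching internal structures. The following is a minimal sketch, not something from the thread: it assumes only the standard java.lang.management API, and the class and method names are hypothetical. It reports counts and accumulated times per collector; it says nothing about object ages or promotion rates, for which GC logs are still needed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class; names are illustrative only.
final class GcActivitySketch {

    // Sums collection counts over all collectors; undefined values (-1) are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Sums accumulated collection time in milliseconds; undefined values (-1) are skipped.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Collector names depend on the GC selected on the command line.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total collections: " + totalCollections());
        System.out.println("total time (ms):   " + totalCollectionMillis());
    }
}
```

Sampling these totals at a fixed interval before and after a tuning change gives a rough view of how often collections run under each configuration.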

Again, this isn’t to say you aren’t facing real issues or that these fixes don’t work for you. The question is: is it appropriate to drop in patches that solve your specific problem for your domain without regard to other use cases? If so, what if I drop in a patch that is detrimental to your application’s performance? IMO, what is really missing here is general due diligence that confirms that this solves problems in the broader population. That we have different experiences with ParNew before CMS InitialMark/Remark isn’t surprising to me. And it’s not that I don’t trust your benchmarks; it’s that I don’t trust anyone’s benchmarks, including my own, unless they have gone through a serious level of vetting. Even then…
> 
> Regards,
> Michal
> 
> 
> From: "Kirk Pepperdine" kirk.pepperdine at gmail.com
> To: "Michal Frajt" michal at frajt.eu
> Cc:
> Date: Tue, 22 May 2018 19:11:12 +0200
> Subject: Re: OpenJDK G1 Patch
> 
> 
>> On May 22, 2018, at 9:28 AM, Michal Frajt <michal at frajt.eu> wrote:
>> 
>> 
>> Hi Kirk,
>> 
>> -	in 2001, weak references were the only way to build a design resembling C++ smart pointers, and I guess there is still no other solution available
> 
> I’m just on the back end of the Oracle Dev tour in Japan… Once I land I’ll work on the test case to hopefully sort out what is happening with references and G1 (if this is indeed the cause of the grief I’ve been trying to sort through) and I do plan on sharing the results.
> 
>> -	CMSTriggerInterval does not invoke a full collection cycle; it invokes a normal concurrent CMS cycle (1 ParNew + 2 STW), so that there is at least one concurrent CMS cycle per trigger interval (counting CMS runs due to CMSTriggerRatio and other reasons)
> 
> Yes, I understand but the ParNew before the CMS cycle has typically proven to be a huge loss overall.
> 
>> -	we are fine with running a regular concurrent CMS cycle which, besides releasing the weak references, keeps our old gen clean at all times, giving it a much better chance of surviving a sudden increase in the promotion rate
> 
> If you need to run frequent CMS cycles to keep tenured clean then it’s very very likely that you’re suffering from a premature promotion problem. We have tuning strategies to manage this situation that don’t require excessive triggering of CMS.
> 
> 
>> -	we never run the full cycle, or better we have never seen an end of the full cycle as applications are killed and restarted before it can finish
> 
> Right, I can typically have JVMs run for weeks or longer without suffering from concurrent mode failures. It’s about controlling frequency and promotion rates. And it’s not so easy as the aging process is affected by the size of Eden.
> 
> G1 pause times should cluster around some value that is approximately constant, or should mostly band around a narrow range of values. Achieving this with minimal pause times, without risking a full collection, requires that you allocate more memory and set a larger pause time goal. Unfortunately, larger heap sizes run counter to your goal. However, G1 was designed for big iron, and I find that if you under-resource it you’re asking for high GC overhead and other troubles. For smaller heaps, Parallel and/or CMS is still by far a better choice.
> 
> Kind regards,
> Kirk
>