Status of JEP-8204088/JDK-8236073
Jonathan Joo
jonathanjoo at google.com
Mon Jun 7 23:16:04 UTC 2021
Hi Thomas,
I took some time to read through the bugs related to GCTimeRatio.
I think GCTimeRatio *may* work for this purpose, if all of the relevant
open issues are addressed. As you mentioned in your email, I was able to
reproduce the problem: even when GCTimeRatio is set to an aggressive
value (e.g. GCTimeRatio=1), too much of the heap is still allocated. So
fixing the related bugs should help here, and I'll experiment more
with your proposed fixes. I'd also like to investigate how well
SoftMaxHeapSize works at keeping heap usage within the limit. You
mentioned in your earlier email that the heap sizing issues have been
addressed, but I wasn't sure of the exact status of that. I'll patch in your
changes at
https://github.com/tschatzl/jdk/tree/8238687-investigate-memory-uncommit-during-young-gc2
to get a firsthand idea.
However, one consideration against GCTimeRatio is that it relies
on GC pause times, whereas ideally we would use total GC CPU overhead. (The
latter would incorporate time spent by concurrent GC worker
threads, which may be constantly doing work in the background. As far as I
understand, this is not necessarily reflected in pause times.) Thus, I
believe CPU overhead is a more accurate measurement of "load" than GC
pause times (at least for the use
case we anticipate here at Google).
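
To make the two quantities concrete, here is a minimal sketch, not the internal patches mentioned in this thread. It assumes the documented meaning of -XX:GCTimeRatio=N as a target GC time fraction of 1/(1+N), and it samples the standard GarbageCollectorMXBean counters. The class and method names are hypothetical. Note that getCollectionTime() reports approximate elapsed collection time, not CPU time, so it says nothing about how much CPU the GC worker threads consumed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, for illustration only.
final class GcOverheadSketch {
    // Target fraction of time in GC implied by -XX:GCTimeRatio=N: 1 / (1 + N).
    static double targetGcFraction(int gcTimeRatio) {
        return 1.0 / (1.0 + gcTimeRatio);
    }

    // Sum of the accumulated collection time (ms) over all collector beans.
    // This is elapsed time as reported by the JVM, not CPU time.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = bean.getCollectionTime(); // -1 if the value is undefined
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Fraction of a wall-clock interval covered by reported collection time.
    static double gcFraction(long gcMillisDelta, long wallMillisDelta) {
        if (wallMillisDelta <= 0) {
            return 0.0;
        }
        return (double) gcMillisDelta / wallMillisDelta;
    }

    public static void main(String[] args) throws InterruptedException {
        long gc0 = totalGcTimeMillis();
        long t0 = System.nanoTime();
        Thread.sleep(1000); // stand-in for an application workload
        long gcDelta = totalGcTimeMillis() - gc0;
        long wallDelta = (System.nanoTime() - t0) / 1_000_000;
        System.out.printf("measured GC time fraction: %.4f%n", gcFraction(gcDelta, wallDelta));
        System.out.printf("target for GCTimeRatio=1:  %.4f%n", targetGcFraction(1));
    }
}
```

A CPU-based variant would replace totalGcTimeMillis() with a counter of CPU time consumed by GC threads, which the standard beans used here do not provide.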
We have already developed some internal patches which allow us to compute
GC CPU overhead, so using this metric to influence SoftMaxHeapSize
shouldn't be too much of a problem for us. Given that we have this
information:
1. Do you see any benefit to using pause times to determine SoftMaxHeapSize
   rather than CPU overhead? Is one more viable than the other?
2. Do you think there is value in modifying GCTimeRatio to measure CPU
   overhead rather than pause times?
3. If not, would it be helpful to still introduce this functionality into
   the JVM, perhaps as a new JVM flag like `GCCpuRatio`? (So as to not collide
   with GCTimeRatio's existing functionality.)
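
The idea of using a measured GC CPU overhead to influence SoftMaxHeapSize can be sketched as a simple feedback step. This is an illustrative sketch under stated assumptions, not the patches or heuristics discussed in this thread: the class name, the dead band, and the proportional step are all hypothetical, and how the measured overhead is obtained and how the new value is handed to the JVM are left out.

```java
// Hypothetical controller, for illustration only.
final class SoftMaxHeapController {
    private final double targetOverhead; // e.g. 0.10 = 10% of CPU time spent in GC
    private final double tolerance;      // dead band around the target to avoid oscillation
    private final double step;           // relative resize per adjustment, e.g. 0.25
    private final long minBytes;         // lower bound for the soft limit
    private final long maxBytes;         // upper bound, e.g. the -Xmx value

    SoftMaxHeapController(double targetOverhead, double tolerance, double step,
                          long minBytes, long maxBytes) {
        this.targetOverhead = targetOverhead;
        this.tolerance = tolerance;
        this.step = step;
        this.minBytes = minBytes;
        this.maxBytes = maxBytes;
    }

    // Returns the next soft heap limit given the current one and the measured overhead.
    long next(long currentSoftMax, double measuredOverhead) {
        long proposed = currentSoftMax;
        if (measuredOverhead > targetOverhead + tolerance) {
            // GC costs more than the target: allow a larger heap.
            proposed = (long) (currentSoftMax * (1.0 + step));
        } else if (measuredOverhead < targetOverhead - tolerance) {
            // GC is cheaper than the target: try a smaller heap.
            proposed = (long) (currentSoftMax * (1.0 - step));
        }
        return Math.max(minBytes, Math.min(maxBytes, proposed));
    }
}
```

For example, with a 10% target, a 2% tolerance, and a 25% step, a measured overhead of 20% would grow a 1 GiB soft limit to 1.25 GiB, while a measured overhead of 1% would shrink it to 0.75 GiB.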
Thank you for your insights and thoughtful responses, as always!
~ Jonathan
On Tue, May 25, 2021 at 2:38 AM Jonathan Joo <jonathanjoo at google.com> wrote:
> Hi Thomas,
>
> Again, thanks so much for the detailed response. Sounds good to me! I will
> take a more careful look at the bugs you mentioned.
>
> Also, once things have crystallized a bit more on our end I'll be sure to
> set up some time for further discussion.
>
> Thank you!
>
> ~ Jonathan
>
> On Fri, May 21, 2021 at 4:41 AM Thomas Schatzl <thomas.schatzl at oracle.com>
> wrote:
>
>> Hi,
>>
>> On 20.05.21 20:00, Jonathan Joo wrote:
>> > +cc Man Cao (manc at google.com)
>> >
>> > Hi Thomas,
>> >
>> > I've been thinking more about SoftMaxHeapSize and how we might use it.
>> > Our preliminary thoughts have revolved around using GC CPU overhead as a
>> > metric to determine a reasonable SoftMaxHeapSize value (assuming
>> > SoftMaxHeapSize is dynamic and can change at runtime). Do you think this
>> > is viable? For example, setting a predetermined target GC CPU overhead,
>> > and using this to either increase or decrease SoftMaxHeapSize accordingly.
>>
>> Yes.
>>
>> >
>> > Doing this may also have the benefit of removing the need for
>> > MinHeapFreeRatio, MaxHeapFreeRatio, and GCTimeRatio flags. Because the
>> > heap size will be changed solely based on GC CPU usage, we may not need
>> > these separate flags to trigger heap resizing events.
>>
>> What you suggest is exactly like the GCTimeRatio flag (specified in a
>> different way though), and actually that's what it's supposed to do.
>> Size the amount of committed and used heap so that the gc cpu overhead
>> (or in this case ratio between gc cpu usage and mutator cpu usage) is
>> kept at a certain level.
>>
>> However, as I at least indicated, the current heap sizing based on
>> GCTimeRatio is *broken* (basically since day one), and expands too much,
>> even on stable loads. This is exactly what these other CRs I mentioned
>> earlier are/were supposed to fix. (JDK-8253413 and JDK-8238687, really,
>> please have a look at what they do :) JDK-8247843 is then about re-tuning
>> the default GCTimeRatio value; note that I'm not sure that specifying it
>> as a ratio is ideal).
>>
>> I think they also remove or at least push back the use of
>> MinHeapFreeRatio and MaxHeapFreeRatio. (Removing them a bit more with
>> JDK-8248324 I think, but there were thoughts to go further for full gc,
>> because otherwise it won't complement what non-full gcs do).
>>
>> After that I wanted to add SoftMaxHeapSize as another sizing condition
>> for cases that this heuristic does not catch.
>>
>> >
>> > I'm sure there are a number of factors that go into deciding whether a
>> > heap is under or over-provisioned, but I'm wondering if there are any
>> > significant ones that need to be considered alongside GC CPU usage. I
>> > can also see long pause times as being an indicator that GC may need to
>> > run more frequently, etc.
>>
>> Long pause times are often an indication of a) changing application
>> behavior or b) the prediction being way off.
>>
>> One of these patches I mentioned earlier also improves the latter by
>> remodeling young gen sizing. Unfortunately it does not fix cardinality
>> estimations for the remembered sets, which impact young gen sizing a
>> lot; that would be JDK-82231731, for which there are ideas/prototypes.
>> This kind of needed an overhaul of the remembered sets though
>> (JDK-8017163, which is out for review _now_...).
>>
>> Problem a) is kind of a research question that needs to be addressed at
>> some point. There is some experience about the causes and how one could
>> detect them, but nothing concrete. It seems that getting problem b) out
>> of the way will likely decrease the work to be spent on a) significantly
>> anyway...
>>
>> > (Though I'm not sure whether these will be
>> > implicitly encompassed as part of GC CPU overhead already.)
>>
>> Pauses are counted towards GCTimeRatio.
>>
>> >
>> > Let me know what you think - happy to also set up a meeting to discuss
>> > this in more detail.
>>
>> I believe you can say that we are aware of these issues, and there is
>> already some "grand plan" sort of in place (if you look at the bugs
>> assigned to me) to get at least approximately where you want to be - I
>> think at least, if I understand your problem and ideas correctly :). In
>> any case, getting there takes time, and hence help would be very
>> appreciated.
>>
>> Talking about this in more detail is fine with me.
>>
>> Thanks,
>> Thomas
>>
>