RFR: 8293218: serviceability/tmtools/jstat/GcNewTest.java fails with "Error in the percent calculation" [v3]

Mon Sep 12 09:57:46 UTC 2022

On Fri, 9 Sep 2022 18:22:48 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

>> Kevin Walls has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Clarify that loop is for checking heap not changing.  Exception if continually changing.
>
> This code still isn't sitting right with me and I'm still not sure of it's original intent. The method is called `provokeGc`, but it seems to be quite a bit more complex than that. It (used to) loop 3 times, each time calculating the youngGen size as a % of the heap size (according to the MemoryPoolMXBean values). It then (on each iteration) allocates that much memory by applying the % to Runtime.maxMemory(), and does so twice. It could have just allocated the youngGen size twice, but instead used the round about way of determining how much to allocate. I'm not sure why.
> 
> In any case, the end result should be triggering at least one youngGen GC per iteration, but more likely two unless it was empty at the start. Then it follows this up with a full GC (on each iteration). So what is the point of any of this? It it testing consistency of the MemoryPoolMXBean values? Is it testing the relation between the MemoryPoolMXBean heap size and Runtime.maxMemory()?
> 
> You have changed it to only do the allocation part once rather than on every iteration, so now the loop is only checking if the heap is still shrinking, but just because heapSize == heapSize0 doesn't mean it's not shrinking. It can just mean no GC happened between the two Pools.getHeapCommittedSize() calls. I think maybe the original intend was to do the allocateHeap() part while the heap is shrinking, not just once after it has stabilized.
> 
> BTW, make sure you test with ZGC.

Thanks @plummercj 

Yes, the whole fraction/percentage calculation is odd.  Maybe it could just allocate eden size in bytes, but I was not rushing to throw it out.

I think it's clear that provokeGc wants to allocate and cause a GC.  Tripling the pair of allocations appears to be an effort make extra sure it does that, but I think we're agreeing there is no clear point to that.

The current provokeGc is from 2016:

8168396: Unexpected OOME in GcCauseTest02 and GcTest02
http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/1b7fd4c2f65e

...so there was a previous OOM problem, possibly because young gen was large and could not be promoted.  That probably inspired the fraction calculation.  I'm not sure it's the best way of doing things.

Yes, my update is to retry if we see a shrink between fetching eden and heap size.  It doesn't mean the heap has fully stabilized and has finished shrinking, it just means that we don't think a shrink occurred between fetching the two stats.  That means our fraction calculation should be OK.  No GC beteween the two stats being fetched is the common case of course, but now we are seeing enough GCs in between them to cause a frequent failures.

ZGC:
The tests in serviceability/tmtools/jstat do not run with ZGC or Shenandoah (requires vm.gc != "Z" & vm.gc != "Shenandoah").
They use e.g. "jstat -gcnew PID" which doesn't produce numbers with ZGC, so they would fail.

-------------

PR: https://git.openjdk.org/jdk/pull/10218