RFR: 8198668: MemoryPoolMBean/isUsageThresholdExceeded/isexceeded001/TestDescription.java still failing [v2]

Wed Jun 29 08:30:39 UTC 2022

On Tue, 28 Jun 2022 11:43:42 GMT, Kevin Walls <kevinw at openjdk.org> wrote:

>> Test has been problemlisted for a long time due to intermittent failures.
>> 
>> This is a difficult test as it tries to monitor usage thresholds on Memory Pools which are outside its control.
>> Not just Java heap pools, where the allocation it makes may or may not affect a particular pool, but also non-heap pools such as CodeHeap and Metadata, where other activity in the VM can change their usage and surprise the test.
>> 
>> The test iterates over the JMX memory pools that support usage thresholds, sets a threshold one byte higher than current usage, and makes an allocation. This only makes sense for Java heap pools. It is tempting to skip non-heap pools, but the test can still serve as a sanity check of threshold behaviour. That is actually its main purpose, as the allocation is unlikely to affect the pool being tested.
>> 
>> With the changes here, I'm seeing the test and all its variations pass reliably: 50 iterations on each tested platform.
>> 
>> Skip testing a non-heap memory pool, e.g. CodeHeap, if it reaches its threshold while we test, because that means its usage is changing outside our control. Also re-test isExceeded on failure, as fetching the usage and then calling isExceeded is racy.
>> 
>> Log more pool statistics to help understand failures.
>
> Kevin Walls has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Show log output

> 

Thanks Thomas -

It's not a great test. 8-)

It is an old test and has been problemlisted for a long time. That means it isn't run routinely, but it does still get run, and it can fail, which causes bug reports and takes up people's time.

I chose to make it more robust: I used to see false positives frequently, and now I see none. If it fails again, I will revisit it.

Yes, using peak usage is good, and the test already did that. But it was confusing: it printed the result of monitor.getPeakUsage(), then made the comparison by calling monitor.getPeakUsage() again. The second call may not return the same value, so what we logged and what we compared were not necessarily the same.
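To illustrate the point about calling getPeakUsage() once, here is a minimal sketch against the standard java.lang.management API. It is not the test's actual code: the class PeakUsageSnapshot and the helper peakReachesThreshold are hypothetical names. It captures one MemoryUsage snapshot per pool and uses that same object for both the comparison and the log line.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class PeakUsageSnapshot {

    // Hypothetical helper: decide from an already-captured MemoryUsage.
    // A usage threshold of 0 disables threshold checking, so treat it as "not reached".
    static boolean peakReachesThreshold(MemoryUsage peak, long threshold) {
        return threshold > 0 && peak.getUsed() >= threshold;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!pool.isValid() || !pool.isUsageThresholdSupported()) {
                continue;
            }
            // Fetch the peak once; a second getPeakUsage() call could return a
            // larger value if the pool grew in between.
            MemoryUsage peak = pool.getPeakUsage();
            long threshold = pool.getUsageThreshold();
            boolean reached = peakReachesThreshold(peak, threshold);
            // The value logged is exactly the value that was compared.
            System.out.println(pool.getName() + ": peak used " + peak.getUsed()
                    + ", threshold " + threshold + ", reached = " + reached);
        }
    }
}
```

The design choice is simply to pass the snapshot around rather than re-query the MXBean, so a failure message can never disagree with the check that produced it.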

I try to avoid the races in this far-from-ideal test, and noticing when a pool is changing outside our control avoids the false failures in the CodeHeap pools that I was seeing frequently.
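The two race mitigations discussed in this thread (skip a non-heap pool that is moving on its own, and re-check isExceeded before declaring a failure) could look roughly like the sketch below. This is an assumption about one way to do it, not the change in the PR: detecting "changing outside our control" via getUsageThresholdCount() is my guess, and PoolStabilityCheck, changingOutsideOurControl and matchesWithRetry are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.function.BooleanSupplier;

public class PoolStabilityCheck {

    // Hypothetical rule: a non-heap pool whose threshold count moved during
    // the test window is being changed by other VM activity, so skip it.
    static boolean changingOutsideOurControl(MemoryType type, long countBefore, long countAfter) {
        return type == MemoryType.NON_HEAP && countAfter != countBefore;
    }

    // Re-query isExceeded a few times before reporting a mismatch, because
    // reading the usage and then calling isUsageThresholdExceeded() is racy.
    static boolean matchesWithRetry(BooleanSupplier isExceeded, boolean expected,
                                    int attempts, long sleepMillis) {
        for (int i = 0; i < attempts; i++) {
            if (isExceeded.getAsBoolean() == expected) {
                return true;
            }
            if (sleepMillis > 0) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!pool.isValid() || !pool.isUsageThresholdSupported()) {
                continue;
            }
            long countBefore = pool.getUsageThresholdCount();
            long threshold = pool.getUsageThreshold();
            long used = pool.getUsage().getUsed();
            // Expected value per the documented semantics; 0 disables checking.
            boolean expected = threshold > 0 && used >= threshold;
            boolean ok = matchesWithRetry(pool::isUsageThresholdExceeded, expected, 3, 10);
            long countAfter = pool.getUsageThresholdCount();
            if (changingOutsideOurControl(pool.getType(), countBefore, countAfter)) {
                System.out.println(pool.getName() + ": changing outside our control, skipped");
            } else {
                System.out.println(pool.getName() + ": isExceeded as expected = " + ok);
            }
        }
    }
}
```

Retrying only narrows the race window rather than closing it, which matches the "more robust, not perfect" goal described above.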

The small allocation and then checking whether thresholds are reached: yes, as I said, this is unlikely to test much. However, I have seen it hit the G1 Old Gen pool at the right time: it observes 0 usage, makes the allocation, and then, presumably after a GC, that generation has significant usage and the threshold is reached.

5 pool java.lang:name=G1 Old Gen,type=MemoryPool of type: Heap memory
  supports usage thresholds
     used value is 0      max is 31675383808 isExceeded = false
  threshold set to 1
  threshold count  0
  reset peak usage. peak usage = 0 isExceeded = false
  Allocated heap. isExceeded = true
     used value is 16740352      max is 31675383808 isExceeded = true
peak used value is 16740352 peak max is 31675383808

I'm not claiming that makes it a great test, but I would like to get it back into circulation, so we can find out whether it causes more noise and then consider whether it is worth keeping or reworking further.
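For readers new to the test, the per-pool sequence from the PR description (threshold one byte above current usage, reset peak, small allocation, check isExceeded) can be sketched as follows. This is an illustrative sketch only, not the nsk test itself: ThresholdSketch, thresholdJustAbove and the 1024-byte allocation size are hypothetical choices. It restores each pool's original threshold afterwards, and skips pools where used + 1 would exceed a defined max, since setUsageThreshold rejects such values.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class ThresholdSketch {

    // Keeps the allocation reachable so it is not trivially optimised away.
    static volatile byte[] sink;

    // Hypothetical helper: one byte above current usage, or -1 if that would
    // exceed a defined max (getMax() == -1 means the max is undefined).
    static long thresholdJustAbove(MemoryUsage usage) {
        long candidate = usage.getUsed() + 1;
        long max = usage.getMax();
        return (max != -1 && candidate > max) ? -1 : candidate;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!pool.isValid() || !pool.isUsageThresholdSupported()) {
                continue;
            }
            long threshold = thresholdJustAbove(pool.getUsage());
            if (threshold < 0) {
                continue;
            }
            long saved = pool.getUsageThreshold();
            try {
                pool.setUsageThreshold(threshold);
                pool.resetPeakUsage();
                // Small allocation; it may or may not land in this pool.
                sink = new byte[1024];
                MemoryUsage after = pool.getUsage();
                System.out.println(pool.getName() + ": threshold " + threshold
                        + ", used " + after.getUsed()
                        + ", isExceeded = " + pool.isUsageThresholdExceeded());
            } finally {
                // Restore whatever threshold was set before (often 0, i.e. disabled).
                pool.setUsageThreshold(saved);
            }
        }
    }
}
```

As noted in the thread, whether isExceeded flips depends on GC timing and which pool the allocation lands in, so the output is not deterministic.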

-------------

PR: https://git.openjdk.org/jdk/pull/9309