RFR: 8198668: MemoryPoolMBean/isUsageThresholdExceeded/isexceeded001/TestDescription.java still failing [v2]
Kevin Walls
kevinw at openjdk.org
Wed Jun 29 08:30:39 UTC 2022
On Tue, 28 Jun 2022 11:43:42 GMT, Kevin Walls <kevinw at openjdk.org> wrote:
>> Test has been problemlisted for a long time due to intermittent failures.
>>
>> This is a difficult test as it tries to monitor usage thresholds on Memory Pools which are outside its control.
>> Not just Java heap pools, where the allocation it makes may or may not affect a particular pool, but non-heap pools such as CodeHeap and Metadata, where other activity in the VM can affect their usage and surprise the test.
>>
>> The test iterates JMX memory pools where thresholds are supported, sets a threshold one byte higher than current usage, and makes an allocation. This only makes sense on Java heap pools. It is tempting to skip non-heap pools, but this test can still give a sanity test about threshold behaviour. That is actually its main purpose, as the allocation is unlikely to affect the pool being tested.
>>
>> With the changes here, I'm seeing the test and all its variations pass reliably, i.e. 50 iterations in each tested platform.
>>
>> Skip testing a non-heap memory pool, e.g. CodeHeap, if it is hitting the threshold while we test, because that means it is changing outside our control. Also re-test isExceeded on failure, as fetching the usage and isExceeded is a race.
>>
>> Logging of more pool stats to better understand failures.
>
> Kevin Walls has updated the pull request incrementally with one additional commit since the last revision:
>
> Show log output
>
Thanks Thomas -
It's not a great test. 8-)
It is an old test, and has been problemlisted for a long time. That means it isn't run all the time, but does get run, and can fail, so causes bug reports and takes up people's time.
I went with making it more robust, in that I used to be able to see false positives frequently, and now I can see none. If there are future failures, I would revisit.
Yes, using peak usage is good, and the test already did that. What was confusing is that it printed the result of monitor.getPeakUsage() and then made the comparison by calling monitor.getPeakUsage() again: the two calls may not return the same value, so what we logged and what we compared were not necessarily the same.
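To illustrate reading the peak once: a minimal sketch, written against the standard java.lang.management.MemoryPoolMXBean rather than the test's own monitor wrapper. The names PeakSnapshot and peakReached are made up for illustration and are not taken from the test.

```java
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class PeakSnapshot {
    /**
     * Reads the pool's peak usage exactly once, logs it, and compares that
     * same value against the threshold, so the logged and compared values
     * cannot differ.
     */
    static boolean peakReached(MemoryPoolMXBean pool, long threshold) {
        MemoryUsage peak = pool.getPeakUsage(); // single read
        if (peak == null) {
            return false; // getPeakUsage() returns null if the pool is not valid
        }
        long peakUsed = peak.getUsed();
        System.out.println("peak used value is " + peakUsed
                + " peak max is " + peak.getMax());
        // A usage threshold counts as reached when usage reaches or exceeds it.
        return peakUsed >= threshold;
    }
}
```

It could be called for each pool returned by ManagementFactory.getMemoryPoolMXBeans().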
I try to avoid the races in this far from ideal test, and noticing when the pool is changing outside our control avoids false failures in CodeHeaps which I was seeing frequently.
The small allocation and then checking if thresholds are reached: yes I think I covered that this is really unlikely to test much. However I have seen it hit the G1 old gen at the right time, where it observes 0 usage, makes the allocation, and then presumably there's been a GC and that gen has significant usage and the threshold is reached.
5 pool java.lang:name=G1 Old Gen,type=MemoryPool of type: Heap memory
supports usage thresholds
used value is 0 max is 31675383808 isExceeded = false
threshold set to 1
threshold count 0
reset peak usage. peak usage = 0 isExceeded = false
Allocated heap. isExceeded = true
used value is 16740352 max is 31675383808 isExceeded = true
peak used value is 16740352 peak max is 31675383808
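For reference, the sequence in that log, plus the skip and re-test behaviour described in the PR, can be sketched roughly as below. This is not the test's code: it uses the standard MemoryPoolMXBean API directly, the names ThresholdSketch, checkPool and Result are hypothetical, and the 1 MB allocation size and the single retry are assumptions.

```java
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class ThresholdSketch {
    enum Result { PASSED, SKIPPED, FAILED }

    // Keeps the allocation reachable so it is not optimized away.
    static volatile byte[] sink;

    static Result checkPool(MemoryPoolMXBean pool) {
        if (!pool.isValid() || !pool.isUsageThresholdSupported()) {
            return Result.SKIPPED;
        }
        MemoryUsage before = pool.getUsage();
        if (before == null) {
            return Result.SKIPPED;
        }
        // Threshold one byte above current usage, as the test description says.
        long threshold = before.getUsed() + 1;
        if (before.getMax() != -1 && threshold > before.getMax()) {
            return Result.SKIPPED; // setUsageThreshold would reject this value
        }
        boolean nonHeap = pool.getType() == MemoryType.NON_HEAP;
        long original = pool.getUsageThreshold();
        try {
            pool.setUsageThreshold(threshold);
            pool.resetPeakUsage();
            // A non-heap pool already over the threshold is changing
            // outside our control, so skip it rather than fail.
            if (nonHeap && pool.isUsageThresholdExceeded()) {
                return Result.SKIPPED;
            }
            sink = new byte[1 << 20]; // assumed allocation size
            // Fetching isExceeded and the usage is a race, so re-test once.
            for (int attempt = 0; attempt < 2; attempt++) {
                boolean exceeded = pool.isUsageThresholdExceeded();
                MemoryUsage now = pool.getUsage();
                if (now == null) {
                    return Result.SKIPPED;
                }
                long used = now.getUsed(); // read once, then log and compare
                System.out.println(pool.getName() + ": used value is " + used
                        + " isExceeded = " + exceeded);
                if (exceeded == (used >= threshold)) {
                    return Result.PASSED;
                }
            }
            return nonHeap ? Result.SKIPPED : Result.FAILED;
        } finally {
            pool.setUsageThreshold(original); // restore the previous setting
            sink = null;
        }
    }
}
```

In the G1 Old Gen case above, the first usage read would be 0 and the threshold 1; a GC that promotes objects after the allocation then pushes usage past the threshold, and both isExceeded and the usage comparison agree.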
Not claiming that makes it a great test, but I would like to get it back in circulation, so we can find out if it causes more noise, and can consider whether it is worth keeping or reworking further.
-------------
PR: https://git.openjdk.org/jdk/pull/9309