RFR: 8198668: MemoryPoolMBean/isUsageThresholdExceeded/isexceeded001/TestDescription.java still failing [v2]

Wed Jun 29 05:46:40 UTC 2022

On Tue, 28 Jun 2022 11:43:42 GMT, Kevin Walls <kevinw at openjdk.org> wrote:

>> Test has been problemlisted for a long time due to intermittent failures.
>> 
>> This is a difficult test as it tries to monitor usage thresholds on Memory Pools which are outside its control.
>> Not just Java heap pools, where the allocation it makes may or may not affect a particular pool, but non-heap pools such as CodeHeap and Metadata, where other activity in the VM can affect their usage and surprise the test.
>> 
>> The test iterates JMX memory pools where thresholds are supported, sets a threshold one byte higher than current usage, and makes an allocation.  This only makes sense on Java heap pools.  It is tempting to skip non-heap pools, but this test can still give a sanity test about threshold behaviour.  That is actually its main purpose, as the allocation is unlikely to affect the pool being tested.
>> 
>> With the changes here, I'm seeing the test and all its variations pass reliably, i.e. 50 iterations in each tested platform.
>> 
>> Skip testing a non-heap memory pool, e.g. CodeHeap, if it is hitting the threshold while we test, because that means it is changing outside our control.  Also re-test isExceeded on failure, as fetching the usage and isExceeded is a race.
>> 
>> Logging of more pool stats to better understand failures.
>
> Kevin Walls has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Show log output

So, with your patch you now use peakUsage to notice transient usage increases that are not observable via the normal usage counter, right?
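
To make sure we mean the same thing by that, here is a minimal sketch using only the standard java.lang.management API, not the test's own monitor classes. The class PeakUsageProbe and the method peakRoseAbove are hypothetical names made up for illustration; the sketch only assumes that the peak is reset before the allocation and compared against a baseline afterwards.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of the test under review.
final class PeakUsageProbe {

    // True if the pool's peak "used" value is above the given baseline.
    // getPeakUsage() returns null for an invalid pool, which counts as "no".
    static boolean peakRoseAbove(MemoryPoolMXBean pool, long baseline) {
        MemoryUsage peak = pool.getPeakUsage();
        return peak != null && peak.getUsed() > baseline;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            pool.resetPeakUsage();                  // peak now equals current usage
            MemoryUsage before = pool.getUsage();
            if (before == null) {
                continue;                           // pool is no longer valid
            }
            long baseline = before.getUsed();
            byte[] junk = new byte[1 << 20];        // stand-in for the test's allocation
            // A transient spike shows up in the peak even if the plain usage
            // counter has already dropped back by the time we read it.
            System.out.println(pool.getName()
                    + ": peak rose above baseline = " + peakRoseAbove(pool, baseline)
                    + " (allocated " + junk.length + " bytes)");
        }
    }
}
```

Reading the baseline and the peak is still a race against other VM activity, so this only narrows the window; it does not close it.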

This is fine, but the test still looks too random for my taste; I'd be afraid of too many false positives.

The biggest issue is how the test tries to provoke a usage increase. That won't work for metaspace or the code heap, so whatever movements they make are purely random, determined by concurrent activity.

A more robust, albeit not perfect, way to test this would be to provoke usage from Java (for metaspace, heavy class loading that hopefully outpaces concurrent unloads; for the code heap, provoking compiles). Probably simpler and more predictable, though, would be whitebox functions that allocate memory directly in metaspace and the code heap.
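
As a rough illustration of the "provoke from Java" variant for metaspace, the sketch below defines one proxy class per fresh class loader and keeps the instances alive so the loaders cannot be unloaded. MetaspaceLoadSketch and loadClasses are hypothetical names, and using java.lang.reflect.Proxy as the class generator is my assumption, chosen only because it needs no bytecode library. The whitebox variant is not shown, since it depends on the test library and VM flags.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the test under review.
final class MetaspaceLoadSketch {

    // Defines 'count' distinct proxy classes, each in its own class loader.
    // The returned list must stay reachable, otherwise the loaders (and their
    // metaspace) become eligible for unloading.
    static List<Object> loadClasses(int count) {
        InvocationHandler noop = (proxy, method, args) -> null;
        ClassLoader parent = MetaspaceLoadSketch.class.getClassLoader();
        List<Object> keepAlive = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            ClassLoader loader = new ClassLoader(parent) { };
            keepAlive.add(Proxy.newProxyInstance(
                    loader, new Class<?>[] { Runnable.class }, noop));
        }
        return keepAlive;
    }

    public static void main(String[] args) {
        List<Object> held = loadClasses(1000);
        System.out.println("Holding " + held.size() + " proxy instances");
    }
}
```

Proxy classes are small, so the count would need to be high for the increase to stand out against concurrent activity.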

You'd still have the concurrency factor, but with large enough increases the chance is good that it registers as a net usage increase. One could also repeat the operation several times if one does not see a net increase, to lessen the chance that concurrent releases mask it.
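
The repeat-until-visible idea could look roughly like this. RetryUntilIncrease and provokeIncrease are hypothetical names; the usage reader and the allocation step are passed in as parameters, so the sketch makes no assumption about how the pool is read or how memory is allocated.

```java
import java.util.function.LongSupplier;

// Hypothetical helper, not part of the test under review.
final class RetryUntilIncrease {

    // Runs 'allocate' up to 'maxAttempts' times and returns true as soon as
    // one attempt leaves 'used' higher than it was just before that attempt.
    static boolean provokeIncrease(LongSupplier used, Runnable allocate, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            long before = used.getAsLong();
            allocate.run();
            if (used.getAsLong() > before) {
                return true;    // net increase observed despite concurrent activity
            }
        }
        return false;           // every attempt was masked or had no effect
    }
}
```

For a real pool, 'used' would be something like () -> pool.getUsage().getUsed(), and a false result would be a reason to skip the pool rather than fail the test.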

The increase should also be largish because Metaspace can satisfy allocations from an internal free list of blocks that were prematurely released by the current loader but remain cached on its behalf, since the arena lives as long as the loader does. Such allocations would not register as a usage increase.

> @kevinjwalls is this going to be impacted by:
> 
> #8831
> 
> ?

I think only in the sense that #8831 removes the compressed class space pool, so one less pool to test.

-------------

PR: https://git.openjdk.org/jdk/pull/9309