RFR: 8373022: serviceability/sa/ClhsdbScanOops.java assumes no GC should occur

Fri Dec 5 11:07:35 UTC 2025

On Thu, 4 Dec 2025 17:12:44 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

>> Hello,
>> 
>> If the initial heap size is set too low in serviceability/sa/ClhsdbScanOops.java, a GC might run, which will interfere with the test and might cause it to fail.
>> 
>> The test is scanning the oops in a region of the heap, and after a GC that region appears to be empty, so the output that the test expects is not present. Running the test with a larger explicit InitialHeapSize gives enough headroom to not run a GC.
>> 
>> Testing: 
>> * serviceability/sa/ClhsdbScanOops.java originally failed when run with `-XX:InitialRAMPercentage=0` (which is the new default). We now explicitly set `-XX:InitialHeapSize=100M`. I've rerun the test 10 times with Serial and Parallel for each test and they all pass.
>
>> > It's probably just the timing of the GC that determines whether the initial small heap is a problem or not. If you want the SA tests to be reliable with something like InitialRAMPercentage=0, probably all of the tests should be updated. However, personally I don't think this type of fix should be necessary unless you feel testing in this manner is something we want to support. There are plenty of tests that start failing when non-standard command line options are used.
>> 
>> FYI we just integrated a change that sets InitialRAMPercentage=0 for JDK 26 that we've been working on (see #28641). We've run up to Oracle's tier8 twice now, and apart from the tests that are included in this PR, we've not seen any other SA failures.
>> 
>> Of course there might be other intermittent failures in the future, in which case I see two approaches moving forward: problem listing or bumping the initial heap size for the affected tests, or going over all SA tests and making sure that they all run with a "large" initial heap size (like 100MB). Unless we start seeing many (for some definition of many) test failures from now on, a pragmatic compromise is to selectively bump the initial heap size of such tests, like I do in this PR.
>> 
>> Of course, the optimal approach would be to make any affected SA tests more robust to GC timings. But, since I'm not sure how much time we want to invest in improving SA tests, bumping the heap size is likely a good compromise here.
> 
> For the most part the SA tests are fine if there is a GC. The way they usually work is to launch the debuggee, wait for it to reach a stable point (all threads idle), and then start to query the debuggee. If a GC happens before reaching stability, that should be fine, and after stability we wouldn't expect any GCs no matter what the heap size is. There are some SA tests that run on active processes where GCs can happen, but they are written to allow for errors.

Thank you @plummercj for analysing the tests. I've removed the explicit InitialHeapSize from test/jdk/com/sun/jdi/MethodInvokeWithTraceOnTest.java in favor of https://github.com/openjdk/jdk/pull/28666. I've also removed test/hotspot/jtreg/serviceability/sa/ClhsdbScanOops.java from the ProblemList.

The two sets of tests that are associated with this issue but not addressed in this PR are about to be solved in https://github.com/openjdk/jdk/pull/28666 and https://github.com/openjdk/jdk/pull/28655. I'm holding off on this PR until those changes are integrated.

I've updated both the issue and this PR to reflect the new changes.
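
For readers unfamiliar with the heap-size bump discussed above, here is a minimal sketch of the idea: put an explicit `-XX:InitialHeapSize` on the command line of the JVM being launched, so the initial heap does not depend on the `InitialRAMPercentage` default. The class `HeapFlagSketch` and the helper `withInitialHeap` are hypothetical names made up for illustration; this is not the code in the PR and it does not use the test library's launch helpers.

```java
import java.util.ArrayList;
import java.util.List;

public class HeapFlagSketch {
    // Hypothetical helper (not from the PR): build a JVM command line that
    // pins the initial heap size explicitly, followed by the remaining args.
    static List<String> withInitialHeap(String javaBin, String heapSize, List<String> rest) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-XX:InitialHeapSize=" + heapSize);
        cmd.addAll(rest);
        return cmd;
    }

    public static void main(String[] args) {
        // "100M" is the value mentioned in this thread; "-version" is only a
        // placeholder for whatever the launched JVM is actually meant to run.
        List<String> cmd = withInitialHeap("java", "100M", List.of("-version"));
        System.out.println(String.join(" ", cmd));
    }
}
```

The resulting list could be handed to `ProcessBuilder` to start the child JVM. Whether 100M is enough headroom to avoid a GC depends on the test and the collector in use.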

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28637#issuecomment-3616404117