RFR: 8279016: JFR Leak Profiler is broken with Shenandoah [v9]
Erik Gahlin
egahlin at openjdk.org
Tue Nov 5 10:00:33 UTC 2024
On Mon, 4 Nov 2024 19:13:44 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> While testing unrelated Shenandoah patch, I caught a GC assert when Leak Profiler was running ([JDK-8337194](https://bugs.openjdk.org/browse/JDK-8337194)).
>>
>> Leak Profiler is notorious in using the mark words for its own needs. We have been trying to mitigate its impact on GCs by moving to separate bitsets for tracking marked objects, or by treating "marked without fwdptr" as "JFR marked" and handling it. But this is not reliable, since things like [putting indexes in mark word](https://github.com/openjdk/jdk/blob/3baff2af6a30cc6cb2e0d4391db1cf7e61c33f64/src/hotspot/share/jfr/leakprofiler/chains/edgeStore.cpp#L275-L280) sneak in. This is okay for Leak Profiler alone, since it restores the mark words after the operation completes, but that is still not enough when GC is already running.
>>
>> I say we side-step this whack-a-mole by cleanly bailing from JFR op, when we know it is unsafe to do. I thought to use `VM_Operation::doit_prologue`, but I think GC start may sneak in between checking in prologue and op start.
>>
>> This realistically only affects Shenandoah. All other STW collectors would never see what Leak Profiler did with mark words. ZGC would not see it, since it does not care about mark words for its own operation.
>>
>> Additional testing:
>> - [x] `jdk_jfr` pass by default
>> - [x] `jdk_jfr` now passes with `-XX:+UseShenandoah`
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits:
>
> - Merge branch 'master' into JDK-8279016-jfr-leak-profiler-shenandoah
> - Disable Shenandoah-targeted test specially
> - Redo the whole thing by disabling Leak Prof with Shenandoah
> - Merge branch 'master' into JDK-8279016-jfr-leak-profiler-shenandoah
> - Merge branch 'master' into JDK-8279016-jfr-leak-profiler-shenandoah
> - Merge branch 'master' into JDK-8279016-jfr-leak-profiler-shenandoah
> - Just exclude the tests for Shenandoah
> - Merge branch 'master' into JDK-8279016-jfr-leak-profiler-shenandoah
> - Tighten up comments prose
> - Roman's review: more precise GC state check, more includes
> - ... and 3 more: https://git.openjdk.org/jdk/compare/23fa1a33...d47b1665
I'm not sure why all these tests are run with Shenandoah (or any specific GC). The purpose of these unit tests is to check the Leak Profiler implementation, for example, that the object age is written correctly or that array information is serialized properly. It doesn't matter which GC, compiler, etc. is being used.
When the JFR tests were initially written, the purpose of the jtreg "jfr" tag was to filter out the JFR tests so they don't receive external flags. Since then, I think vm.flagless has been added. It may be more appropriate.
If the interaction with a certain GC needs to be tested, it's better to write a dedicated test for that, like TestG1.java and TestZ.java. If such a test doesn't work, it can be put on the ProblemList.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20328#issuecomment-2456731236
More information about the hotspot-jfr-dev
mailing list