RFR: 8322957: Generational ZGC: Relocation selection must join the STS [v2]
Stefan Karlsson
stefank at openjdk.org
Fri Jan 12 09:26:19 UTC 2024
On Thu, 11 Jan 2024 10:03:44 GMT, Stefan Karlsson <stefank at openjdk.org> wrote:
>> The concurrent ZGC threads don't automatically participate in the safepoint protocol, which means that they can run concurrently with safepoint VM Operations. Instead, they use other means to hook into the safepoint protocol whenever they need to make changes that could race with the various VM Operations. The most common way is to join the "suspendible thread set". For details, see `SafepointSynchronize::begin` and the call to `Universe::heap()->safepoint_synchronize_begin()`.
>>
>> It turns out that the relocation selection phase was updated to call oop_iterate to modify the oops of some of the objects. This was done without having the GC threads join the suspendible thread set, which means that various VM Operations could run concurrently with the oop_iterate. This caused the failure described in JDK-8322957: The JFR Leak Profiler modified the object header bits, while the GC's oop_iterate function used the same bits to determine if the oop iteration over an object should be skipped. This led to objects not being modified as they were supposed to, which led to broken oops and asserts.
>>
>> The fix is quite small and could be limited to the lines added to [src/hotspot/share/gc/z/zRelocationSet.cpp](https://github.com/openjdk/jdk/compare/master...stefank:jdk:8322957_sts_with_relocation_selection?expand=1#diff-883b7a72f757c1c5331769ad4a5c763335d0267ee33a0bc06896fa16d89ea58f). However, to lower the risk of reintroducing a bug like this, we've added extra verification code. Some of the infrastructure needed for that verification is placed outside of the GC code, which is why this PR is sent to the hotspot-dev list.
>>
>> This has been tested with the reproducer of the original bug + tier1-7 on linux-x64-debug.
>
> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix release builds
Tier1-7 passes.
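
For readers unfamiliar with the suspendible thread set described in the quoted PR text, the idea can be sketched in standard C++ roughly as follows. This is a toy model under stated assumptions, not the HotSpot code: the names `ToySuspendibleSet` and `ToyJoiner` are made up for illustration, and a joined thread here has no way to pause partway through its work. It only shows why work done while joined cannot overlap with the "safepoint" side.

```cpp
#include <condition_variable>
#include <mutex>

// Toy model of a suspendible thread set. Hypothetical names throughout;
// this is NOT the HotSpot implementation.
class ToySuspendibleSet {
  std::mutex _lock;
  std::condition_variable _cv;
  int _joined = 0;        // number of threads currently inside the set
  bool _suspend = false;  // true while the safepoint side wants exclusivity

public:
  // Worker side: enter the set, blocking while a safepoint is in progress.
  void join() {
    std::unique_lock<std::mutex> ml(_lock);
    _cv.wait(ml, [this] { return !_suspend; });
    ++_joined;
  }

  // Worker side: leave the set and wake a waiting safepoint, if any.
  void leave() {
    std::lock_guard<std::mutex> ml(_lock);
    --_joined;
    _cv.notify_all();
  }

  // Safepoint side: block new joiners, then wait until the set is empty.
  void synchronize() {
    std::unique_lock<std::mutex> ml(_lock);
    _suspend = true;
    _cv.wait(ml, [this] { return _joined == 0; });
  }

  // Safepoint side: allow workers to join again.
  void desynchronize() {
    std::lock_guard<std::mutex> ml(_lock);
    _suspend = false;
    _cv.notify_all();
  }

  // Number of currently joined threads (for inspection only).
  int joined() {
    std::lock_guard<std::mutex> ml(_lock);
    return _joined;
  }
};

// RAII helper: the thread is joined for the lifetime of this object.
class ToyJoiner {
  ToySuspendibleSet& _set;

public:
  explicit ToyJoiner(ToySuspendibleSet& set) : _set(set) { _set.join(); }
  ~ToyJoiner() { _set.leave(); }
  ToyJoiner(const ToyJoiner&) = delete;
  ToyJoiner& operator=(const ToyJoiner&) = delete;
};
```

In this model, a concurrent GC thread would wrap its oop iteration in a scope holding a `ToyJoiner`, and the safepoint side would bracket a VM Operation with `synchronize()` and `desynchronize()`, so the two can never run at the same time.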
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17368#issuecomment-1888729354