[jdk19] RFR: 8290867: Race freeing remembered set segments [v2]
Thomas Schatzl
tschatzl at openjdk.org
Wed Aug 3 08:01:58 UTC 2022
On Wed, 3 Aug 2022 07:51:06 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:
>> Hi all,
>>
>> please review this fix for a crash due to a race in remembered set segment deallocation. Here is the description (provided by chaeubl as reported):
>>
>> - Thread A executes `G1SegmentedArray::create_new_segment` and tries to pop an element from the `_free_segment_list`. For that, thread A executes `LockFreeStack::pop()`
>> - Thread A reads `LockFreeStack::top()`
>> - Thread B executes `LockFreeStack::pop()`, also reads `LockFreeStack::top()` and pops that element from the stack
>> - Thread B executes `Atomic::cmpxchg(&_first, prev, next);` in `G1SegmentedArray::create_new_segment` but it fails because another thread already registered a different segment
>> - Thread B calls `G1SegmentedArraySegment::delete_segment` and frees the value
>> - Thread A tries to access `top()->next` in `LockFreeStack::pop()`, which causes a segfault because `top()` was freed by thread B
>>
>> The fix is to delay the deletion of that memory segment until all readers (i.e. in `G1SegmentedArrayFreeList::get` calling `_list.pop()`) drop the references to that memory segment. The readers are already guarded by a `CriticalSection`.
>>
>> Testing: tier1-5 running; a reproducer that adds extra delays to significantly widen the window in which this race can occur passes on BigRAMTester (without the fix it crashes within a few seconds)
>>
>> Thanks,
>> Thomas
>
> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix crash when exiting
Testing showed an issue with VM shutdown: VM shutdown calls the destructor for the (global) free list pool, which in turn frees all segments; however, at that time the VM is not in a state where `GlobalCounter` works (threads are detached, no valid threads list), so the VM crashes.
The last commit provides a workaround for that issue: it skips the global synchronization when shutting down. That is safe because the process is single-threaded in the `VM_Exit` operation anyway.
This workaround needs to be cleaned up, but imho it is too late in the release to find a better solution.
Passes tier1-5 with no crashes.
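For readers who want the idea without the HotSpot sources at hand, here is a minimal standalone sketch of the scheme described above: readers pop inside a critical section, and a popped segment is only deleted after waiting for all readers to leave, except on the shutdown path. This is not the code from the PR. `ReaderGate`, `FreeList`, `Segment`, `release` and `demo` are hypothetical names, and `ReaderGate` is a plain reader counter that only stands in for `GlobalCounter`; it does not reproduce its implementation.

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-in for GlobalCounter: counts readers in critical sections.
class ReaderGate {
  std::atomic<int> _readers{0};
public:
  void enter() { _readers.fetch_add(1); }
  void leave() { _readers.fetch_sub(1); }
  // Wait until no reader is inside a critical section.
  void synchronize() const {
    while (_readers.load() != 0) std::this_thread::yield();
  }
};

struct Segment {
  Segment* next;
  int id;
};

// Simplified lock-free free list (assumption: popped segments are never re-pushed).
class FreeList {
  std::atomic<Segment*> _top{nullptr};
  ReaderGate _gate;
  std::atomic<int> _syncs{0};
  std::atomic<int> _deleted{0};
public:
  void push(Segment* s) {
    Segment* old = _top.load();
    do { s->next = old; } while (!_top.compare_exchange_weak(old, s));
  }

  // top->next is only dereferenced while inside the critical section.
  Segment* pop() {
    _gate.enter();
    Segment* top = _top.load();
    while (top != nullptr && !_top.compare_exchange_weak(top, top->next)) {
      // On failure, top has been reloaded; retry with the new value.
    }
    _gate.leave();
    return top;
  }

  // Delete a popped segment. Outside of shutdown, first wait for every
  // reader that might still hold a pointer to it.
  void release(Segment* s, bool vm_exiting) {
    if (!vm_exiting) {
      _gate.synchronize();
      _syncs++;
    }
    delete s;
    _deleted++;
  }

  int syncs() const { return _syncs.load(); }
  int deleted() const { return _deleted.load(); }
};

// Releases three segments, the last one on the shutdown path.
// Returns how many releases performed the global synchronization.
int demo() {
  FreeList list;
  for (int i = 0; i < 3; i++) list.push(new Segment{nullptr, i});
  list.release(list.pop(), false);
  list.release(list.pop(), false);
  list.release(list.pop(), true);  // shutdown: skip synchronization
  return list.syncs();
}
```

In this sketch the `vm_exiting` flag plays the role of the shutdown workaround: the delete still happens, only the wait for readers is skipped, which is acceptable only because no other thread can be in `pop()` at that point.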
-------------
PR: https://git.openjdk.org/jdk19/pull/152