RFR: 8325202: gc/g1/TestMarkStackOverflow.java intermittently crash: G1CMMarkStack::ChunkAllocator::allocate_new_chunk [v2]

Albert Mingkun Yang ayang at openjdk.org
Tue Feb 20 13:58:55 UTC 2024


On Tue, 20 Feb 2024 12:10:08 GMT, Ivan Walulya <iwalulya at openjdk.org> wrote:

>> Please review this bug fix to guarantee that expanding the G1CMMarkStack considers the expected capacity for the thread triggering the expansion instead of expanding based on current capacity. 
>> 
>> Failure Mode:
>> Assume current stack capacity is 1:
>> 
>> 1: Thread 1: Obtains `cur_idx` as 1 and notices that the associated bucket is not allocated, indicating insufficient capacity. It then attempts to double the capacity to 2.
>> 2: Thread 2: Obtains `cur_idx` as 2 and finds that the bucket associated with index 2 is also not allocated. Consequently, Thread 2 also tries to expand the stack.
>> 3: Due to a delay in Thread 1's execution, Thread 2 acquires the lock first and initiates the expansion by calling the `expand()` function. This function doubles the capacity from 1 to 2.
>> 4: However, upon returning from the expansion, the bucket associated with `cur_idx` 2 remains unallocated. Consequently, when Thread 2 tries to access this bucket, it crashes.
>> 
>> The problem is that the `expand()` function is called without considering the specific thread context that necessitated the expansion. Instead, it expands the capacity based on the current size of the stack, leading to races when multiple threads concurrently attempt to expand the stack's capacity.
>> 
>> Testing: - Tier 1 -3 
>>                - 2M test iterations (bug is reproducible ~1/100k test iterations)
>
> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Thomas Review

src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 159:

> 157:         return nullptr;
> 158:       }
> 159:     }

As I understand it, the issue is that `expand` doesn't guarantee that `_buckets[bucket]` is filled. What do you think of the idea of using a `while` loop here? (one doesn't need a new API and the `*2` logic stays centralized.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17912#discussion_r1495865601


More information about the hotspot-gc-dev mailing list