RFR: 8329088: Stack chunk thawing races with concurrent GC stack iteration [v2]
Patricio Chilano Mateo
pchilanomate at openjdk.org
Mon Apr 8 15:04:01 UTC 2024
On Fri, 5 Apr 2024 09:35:22 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:
>> When we thaw the last frame from a stack chunk, we non-atomically set the stack pointer (sp) and set the chunk's argsize to 0. Unfortunately, GC threads may iterate over the frames of the stack chunk concurrently. When initializing their stack frame iterator, they read sp and argsize racily. Since there is no synchronization between the threads, we may observe inconsistent pairs of sp and argsize, for example the updated sp with a stale argsize, or the updated argsize with a stale sp.
>>
>> At the core of the problem, the stack chunks store sp and argsize. The argsize is used to calculate where the bottom of the stack chunk is, which is required to determine whether it is empty. This patch proposes to switch things around: store the bottom directly in the chunk instead of argsize, and calculate argsize from the bottom. By changing which property is stored and which is calculated, we can simplify this code quite a bit.
>>
>> In the new model, is_empty() is true iff sp and bottom are exactly the same. Bottom is only set during freezing, never during thawing: it is initialized whenever the bottom frame is frozen and left untouched during thawing. Unlike thawing, the freeze operation does not race with the GC by design. Hence we have moved one of the racy mutations to the operation that doesn't race with the GC. The GC is now only exposed to changes of sp(). It doesn't matter whether it observes the old or the new sp(), now that we have removed the only source of inconsistency in the description of that frame (the racing argsize).
>>
>> Testing: tier1-5, manual testing of test/jdk/jdk/internal/vm/Continuation
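The race described above can be illustrated with a deliberately simplified, single-threaded C++ model. It is a sketch under stated assumptions, not HotSpot code: `OldChunk`, `Snapshot`, and the helper functions are hypothetical, offsets are in words, frame metadata is ignored, and thawing the last frame is assumed to set `sp` to `stack_size`. The GC thread's two unsynchronized loads are modeled as a snapshot whose fields may each come from before or after the thaw.

```cpp
#include <cassert>

// Hypothetical, simplified model of the OLD stack chunk layout; not
// HotSpot's stackChunkOopDesc. Offsets are in words, frame metadata is
// ignored, and bottom is derived from argsize.
struct OldChunk {
  int stack_size;
  int sp;       // start of the topmost frozen frame
  int argsize;  // stack-passed argument words of the bottom frame
};

// A GC thread reads sp and argsize with two independent loads, so the
// pair it ends up with may mix pre-thaw and post-thaw values.
struct Snapshot {
  int sp;
  int argsize;
};

// In the old layout, bottom is computed from argsize.
int derived_bottom(int stack_size, const Snapshot& s) {
  return stack_size - s.argsize;
}

// Words an iterator would walk: [sp, bottom), or none once sp >= bottom.
int words_to_walk(int stack_size, const Snapshot& s) {
  const int bottom = derived_bottom(stack_size, s);
  return s.sp >= bottom ? 0 : bottom - s.sp;
}

// Thawing the last frame in this model: two separate, unsynchronized stores.
void thaw_last_frame(OldChunk& c) {
  c.sp = c.stack_size;  // store 1: no frames left
  c.argsize = 0;        // store 2: derived bottom moves to stack_size
}
```

With `stack_size = 100`, `sp = 60` and `argsize = 4`, the consistent views walk 36 words (before the thaw) or none (after). The torn view `{sp = 60, argsize = 0}` walks 40 words, running over the 4 argument words as if they belonged to the frame, while the torn view `{sp = 100, argsize = 4}` happens to walk nothing.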
>
> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
>
> Nits

> In the new model, is_empty() is true iff sp and bottom are exactly the same. Bottom is only set during freezing, never during thawing: it is initialized whenever the bottom frame is frozen and left untouched during thawing. Unlike thawing, the freeze operation does not race with the GC by design. Hence we have moved one of the racy mutations to the operation that doesn't race with the GC. The GC is now only exposed to changes of sp(). It doesn't matter whether it observes the old or the new sp(), now that we have removed the only source of inconsistency in the description of that frame (the racing argsize).
>
So if the race happens only when resetting the stackChunk values when thawing the last frame, wouldn't it be enough to avoid clearing the argsize there? If we read the new sp when creating the stack frame iterator, is_done() will be true regardless of the argsize value read, so we won't iterate over any frame. I'm trying to understand whether the new model is needed to fix the race or whether it is part of a cleanup/refactoring.
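To compare the alternative raised in this question with the approach in the PR, the sketch below models both under the same simplifying assumptions: hypothetical names (not HotSpot's API), word offsets, no frame metadata, and, for the alternative, a thaw that sets `sp` to `stack_size`. `AltChunk` keeps the old layout but never clears `argsize` on thaw; `NewChunk` stores `bottom` and derives `argsize`. In both, thawing the last frame writes a single field, so a racing GC that reads either `sp` value sees either the whole pre-thaw frame or nothing.

```cpp
#include <cassert>

// (1) Hypothetical model of the alternative: keep storing argsize, but the
// thaw of the last frame writes only sp and leaves argsize alone.
struct AltChunk {
  int stack_size;
  int sp;
  int argsize;
  int bottom() const { return stack_size - argsize; }
  void thaw_last_frame() { sp = stack_size; }  // argsize not cleared
};

// (2) Hypothetical model of the PR's layout: store bottom, derive argsize.
struct NewChunk {
  int stack_size;
  int sp;
  int bottom;  // assumed to be written only when the bottom frame is frozen
  int argsize() const { return stack_size - bottom; }
  bool is_empty() const { return sp == bottom; }
  void thaw_last_frame() { sp = bottom; }  // bottom untouched
};

// A GC iterator walks [sp, bottom), or nothing once sp has reached bottom.
int words_to_walk(int sp, int bottom) {
  return sp >= bottom ? 0 : bottom - sp;
}

// True if both sp values a racing GC may read (pre- and post-thaw) give a
// consistent view: the full pre-thaw frame, or no frames at all.
bool alt_views_ok(AltChunk c) {
  const int expected = c.bottom() - c.sp;
  const int sp_before = c.sp;
  c.thaw_last_frame();
  // argsize is unchanged, so bottom() is identical for both sp reads.
  const int before = words_to_walk(sp_before, c.bottom());
  const int after = words_to_walk(c.sp, c.bottom());
  return before == expected && after == 0;
}

bool new_views_ok(NewChunk c) {
  const int expected = c.bottom - c.sp;
  const int sp_before = c.sp;
  c.thaw_last_frame();
  const int before = words_to_walk(sp_before, c.bottom);
  const int after = words_to_walk(c.sp, c.bottom);
  return before == expected && after == 0 && c.is_empty();
}
```

In this model both variants remove the torn read. They differ in the post-thaw state: `AltChunk` ends with `sp` (100) past its derived bottom (96), so emptiness cannot be judged by `sp == bottom()`, whereas `NewChunk` ends with `sp == bottom` exactly. The sketch does not show whether other code relies on `argsize` being reset to 0.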
-------------
PR Comment: https://git.openjdk.org/jdk/pull/18643#issuecomment-2042991266