RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v2]

Daniel D. Daugherty dcubed at openjdk.org
Tue May 16 20:16:45 UTC 2023


On Tue, 16 May 2023 14:33:05 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>> The following patch fixes a bug introduced while refactoring the VirtualThreadStart/End events. Specifically, the code to delete the JvmtiThreadState of a terminating vthread was moved before we start the VTMS transition. That allowed said code to run concurrently with recompute_enabled() leading to different crashing modes. I wrote the detailed sequence of events leading to the crash in the bug comments. 
>> 
>> To fix it I moved the cleanup code back after the call to VTMS_unmount_begin(). Now, since the rebinding of the JvmtiThreadState to that of the carrier has to be done after this cleanup code is executed, otherwise we would delete the wrong JvmtiThreadState state, I had to add a boolean argument to VTMS_unmount_begin() to differentiate the last unmount call from the other ones. This is unfortunate since ideally VTMS_unmount_begin() would be oblivious to these two cases as with VTMS_mount_end() where we don't need to check if this is the first mount.
>> I looked for other ways to solve it instead of the extra boolean argument but wasn't convinced. One way would be to have another JvmtiExport::cleanup_thread() that would handle this case. Another way which is very simple is to move the rebind_to_jvmti_thread_state_of() call to VTMS_unmount_end() instead. But that means during the transition the _jvmti_thread_state field of the carrier would be either null or that of the vthread, unlike today which is always that of the carrier during the transitions. I didn't want to change that behavior in this fix but I can also explore that route.
>> 
>> I tested the patch with the reproducer I attached to the bug, plus I also run tiers1-3 in mach5.
>> 
>> Thanks,
>> Patricio
>
> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
> 
>   remove extra assert

Summary: Thumbs up. This is most definitely NOT a trivial fix.

Normally the way I like to review these kinds of fixes is to map the failure
modes back to the fix just to make sure I understand how each of the
failure modes is covered by the changes. For this particular bug that has
NOT been an easy thing to do. I believe the failure modes are complicated
to follow because we "lost" the synchronization of being in the
`VTMS_unmount_begin` transition which allowed a thread calling
`recompute_enabled()` to race with our ending/exiting vthread/cthread
combo. Ouch.

The fix does restore the synchronization of being in the
`VTMS_unmount_begin` transition so I can see how we would no longer
be racing with the `recompute_enabled()` calling thread.

-------------

Marked as reviewed by dcubed (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13949#pullrequestreview-1429333688


More information about the hotspot-dev mailing list