Integrated: 8373106: JFR suspend/resume deadlock on macOS in pthreads library
Markus Grönlund
mgronlun at openjdk.org
Tue Jan 13 19:43:47 UTC 2026
On Mon, 12 Jan 2026 21:29:26 GMT, Markus Grönlund <mgronlun at openjdk.org> wrote:
> Greetings,
>
> this change effectively reverts [JDK-8358429](https://bugs.openjdk.org/browse/JDK-8358429), which attempted to minimize the time the Threads_lock is held during JFR sampling. That change was premised on the two reasons known at the time for holding the Threads_lock during the entire sampling interval.
>
> After that change, subtle and very intermittent deadlocks occurred on macOS in the pthreads library: a suspended thread could own an internal process lock, and that same lock was then needed when sending the pthread_kill signal to resume it.
>
> By rolling back to holding the Threads_lock for the entire duration of the sampling interval (as we did for many years before JFR Cooperative Sampling), we prevent JavaThreads from calling os::create_thread().
>
> I have decided to roll back to the version we know works, instead of attempting a more granular solution, perhaps using sigprocmask() to create a critical section around pthread_create in os_bsd.cpp. This is something we might want to do later, but more time is needed to falsify or verify the correct fix.
>
> Testing: jdk_jfr, stress testing
>
> Thanks
> Markus
>
> PS Indirect barriers removed are explicitly re-inserted as per [JDK-8373485](https://bugs.openjdk.org/browse/JDK-8373485)
This pull request has now been integrated.
Changeset: b070367b
Author: Markus Grönlund <mgronlun at openjdk.org>
URL: https://git.openjdk.org/jdk/commit/b070367bdf980ef1c257cab485927db39b544241
Stats: 62 lines in 1 file changed: 12 ins; 18 del; 32 mod
8373106: JFR suspend/resume deadlock on macOS in pthreads library
Reviewed-by: egahlin
-------------
PR: https://git.openjdk.org/jdk/pull/29178