Integrated: 8293864: Kitchensink24HStress.java fails with SIGSEGV in JfrCheckpointManager::lease

Markus Grönlund mgronlun at openjdk.org
Mon Oct 10 12:42:35 UTC 2022


On Wed, 28 Sep 2022 12:33:20 GMT, Markus Grönlund <mgronlun at openjdk.org> wrote:

> Greetings,
> 
> With [JDK-8289692](https://bugs.openjdk.org/browse/JDK-8289692), a need arose to introduce a mutex mechanism for the buffers on the global list to prevent the JFR Recorder Thread from resetting buffers currently in use.
> 
> The introduced mutex mechanism was adequate as far as preventing the race condition. However, its protective characteristic was overemphasized, leading to the mistake of introducing a release, or removal, operation onto a current epoch list, as part of periodic flushing. The problem is that although the callback operations are safe, the underlying list type used with the global mspace does not allow for concurrent excision of nodes. The crash occurs because a thread attempts to dereference the next pointer to traverse a node that is excised and deleted after the thread loaded it.
> 
> I have reviewed how to handle better the introduction of virtual threads and their checkpoint data. I have concluded that the best way is to preserve the old system (pre-Loom) as much as possible, keeping the global and thread-local mspaces as before. Instead, an additional mspace dedicated solely to virtual threads is introduced. This categorization preserves the matching of operations against the underlying list infrastructure. It also provides flexibility in handling virtual threads distinct from regular, or "carrier", threads. The separate mspace of virtual threads is more dynamic as it does not need to preallocate buffers, a memory win for systems not using virtual threads. Also, the sizes of the different buffer types can be better controlled. With this restructuring, there are no longer any list concurrency issues; hence the mutex mechanism introduced with [JDK-8289692](https://bugs.openjdk.org/browse/JDK-8289692) becomes obsolete. Applying a release operation for the curren
 t epoch list, of the global mspace, as part of flushing is a broken invariant and is therefore removed.
> 
> Testing: jdk_jfr, stress testing
> 
> Thanks
> Markus

This pull request has now been integrated.

Changeset: 35d17a00
Author:    Markus Grönlund <mgronlun at openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/35d17a00ab4028071a8fc7cd781b3306e6811970
Stats:     206 lines in 9 files changed: 89 ins; 33 del; 84 mod

8293864: Kitchensink24HStress.java fails with SIGSEGV in JfrCheckpointManager::lease

Reviewed-by: egahlin

-------------

PR: https://git.openjdk.org/jdk/pull/10467


More information about the hotspot-jfr-dev mailing list