RFR: 8358645: Access violation in ThreadsSMRSupport::print_info_on during thread dump

Aleksey Shipilev shade at openjdk.org
Wed Jun 25 09:11:28 UTC 2025


On Tue, 24 Jun 2025 23:43:05 GMT, David Holmes <dholmes at openjdk.org> wrote:

> We started observing these occasional crashes involving the JfrSamplerThread (which is a NonJavaThread) during JFR's termination thread dump. Examination of the `print_info_on` code showed that it made the invalid assumption that it is safe to walk a thread's thread-list at a safepoint. That is not true for a NonJavaThread: it is not held at the safepoint, so the list you are walking can disappear while you examine it. I added some test code to make this more likely and was able to trigger similar crashes - see JBS for details.
> 
> The simple fix is to only walk JavaThreads.
> 
> Testing
>  - tier 5 JFR tests
>  - selected tests with JFR explicitly enabled
>  - tier 1-3 (sanity)
> 
> Thanks

OK, this makes sense. I would probably phrase the comment to capture exactly the race we are trying to avoid. E.g.:

"We can only trust _threads_list_ptr when it is not actively updated. This is only guaranteed when we are inspecting a JavaThread and we are at a safepoint, or if any thread inspects itself."

This also raises a minor question of whether null-checking `_threads_list_ptr` in the `EnableThreadSMRStatistics` block above is benign. It looks like it is, but maybe we want to move that block under the safepoint/thread check as well.
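
To make the suggested condition concrete, here is a minimal standalone sketch of it. This is a model only, not the HotSpot code: `ModelThread`, `is_java_thread`, and `can_trust_threads_list` are hypothetical names introduced for illustration, and the real check in `print_info_on` may be shaped differently.

```cpp
// Hypothetical stand-in for a VM thread; this is NOT the HotSpot Thread class.
struct ModelThread {
    bool is_java_thread;  // true models a JavaThread, false a NonJavaThread
};

// Returns true when the target's thread-list pointer is not being actively
// updated from the point of view of the inspecting (current) thread:
//  - any thread inspecting itself cannot race with its own updates, or
//  - the target is a JavaThread and we are at a safepoint, so it is held.
// A NonJavaThread is not held at a safepoint, so it never qualifies unless
// it is inspecting itself.
bool can_trust_threads_list(const ModelThread* target,
                            const ModelThread* current,
                            bool at_safepoint) {
    if (target == current) {
        return true;
    }
    return target->is_java_thread && at_safepoint;
}
```

In this model, a NonJavaThread such as the JfrSamplerThread is rejected when another thread inspects it, even at a safepoint, which is the case the fix is meant to exclude.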

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25963#pullrequestreview-2957340492


More information about the hotspot-runtime-dev mailing list