RFR: 8369283: Improve trace logs in safepoint machinery [v4]

Aleksey Shipilev shade at openjdk.org
Wed Oct 8 07:36:36 UTC 2025


On Wed, 8 Oct 2025 05:28:20 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Rethink safepoint sync logging
>
> src/hotspot/share/runtime/safepoint.cpp line 348:
> 
>> 346:   _nof_threads_hit_polling_page = 0;
>> 347: 
>> 348:   log_debug(safepoint)("Safepoint synchronization initiated using %s wait barrier. (%d threads)", _wait_barrier->description(), nof_threads);
> 
> I don't see the point in moving this partially into `SafepointTracing::begin` but losing the information as to how many threads are being handled.

I went back and forth on this myself yesterday, but settled on this variant, because:

 1. Completeness: the older placing was IMO too late. I think the logging statement is currently there to capture the total number of threads, which you can only reliably get after taking a `Threads_lock`. But in doing so we miss the time we took to take a `Threads_lock` itself, not to mention `Universe::safepoint_synchronize_begin()` that coordinates with GC. Both are parts of safepoint sync, IMO.
 2. Usefulness: the number of total threads makes little sense for arming, since it is likely the majority of non-runnable threads would never actually reach the wait barrier. The number of threads, including the number of runnable threads we actually handled, is printed in `SafepointTracing::end()`. The minor problem with that: in debugging, vm op might have crashed before reaching the end of safepoint.
 3. Consistency: all logging statements of the same "level" are in `SafepointTracing`. 
  
So, to make this even more complete, I added another tracing log right before arming, and before after the initial filtering, where we know exactly how many runnable and total threads we are dealing with.

It would be like this:


[5.500s][debug][safepoint] Safepoint synchronization begins 
[5.500s][trace][safepoint] Arming futex wait barrier
[5.500s][trace][safepoint] Setting thread local yield flag for threads
[5.500s][trace][safepoint] 26 total threads, waiting for 12 threads to block
[5.500s][trace][safepoint] Checking thread status
[5.500s][trace][safepoint] Thread 0x000073cc5c47f670 is still running
...
[5.500s][trace][safepoint] Thread 0x000073cc5c494e90 is still running
[5.500s][trace][safepoint] Waiting for 12 threads to block
...

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27673#discussion_r2412870376


More information about the hotspot-dev mailing list