RFR: 8297154: Improve safepoint cleanup logging

Robbin Ehn rehn at openjdk.org
Mon Nov 21 08:14:08 UTC 2022


On Wed, 16 Nov 2022 18:52:50 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> When backporting [JDK-8280784](https://bugs.openjdk.org/browse/JDK-8280784), I realized that safepoint cleanup logging added by [JDK-8219436](https://bugs.openjdk.org/browse/JDK-8219436) is misleading. `-Xlog:safepoint` output includes "cleanup" time into "Reaching safepoint", which users can arguably expect to be only the time-to-safepoint. With large reported "Reaching safepoint" time, users would chase a safepoint polling / scheduling problem, rather than suspect the safepoint cleanup costs. I was confused about this myself, and only reading the safepoint timing code made the picture more clear.
> 
> It can be observed if you enable both `-Xlog:safepoint` and `-Xlog:safepoint+stats` and run this on busy machine:
> 
> 
> $ build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:+SafepointALot -XX:GuaranteedSafepointInterval=1000 [1.647s][info][safepoint,stats] VM Operation                 [ threads: total initial_running ][ time:       sync    cleanup       vmop      total ] page_trap_count
> [1.647s][info][safepoint,stats] Cleanup                      [             15               1 ][            26541     240544      30371     297456 ]               0
> [1.647s][info][safepoint      ] Safepoint "Cleanup", Time since last: 1499772987 ns, Reaching safepoint: 267085 ns, At safepoint: 30371 ns, Total: 297456 ns
> 
> 
> Note how "sync" takes only 26541 ns, "cleanup" takes 240544 ns, yet "Reaching safepoint" reports the sums of both.
> 
> To fix this, I propose we report "Cleanup" times separately. This changes the logging statement. If we want to keep the logging statement intact, we can move "cleanup" to "At safepoint" costs, which would be slightly less confusing. This is what we do with `RuntimeService::record_safepoint_end` anyway: management layer clearly separates "sync" and "safepoint" time, and "cleanup" more clearly belongs to "safepoint" time here.
> 
> While we are at it, we can also track and report max cleanup time.
> 
> Example output after:
> 
> 
> [1.099s][info][safepoint,stats] VM Operation                 [ threads: total initial_running ][ time:       sync    cleanup       vmop      total ] page_trap_count
> [1.099s][info][safepoint,stats] Cleanup                      [             15               1 ][            77391     124662       5420     207473 ]               0
> [1.099s][info][safepoint      ] Safepoint "Cleanup", Time since last: 1013526788 ns, Reaching safepoint: 77391 ns, Cleanup: 124662 ns, At safepoint: 5420 ns, Total: 207473 ns
> 
> 
> Additional testing:
>  - [x] Eyeballing the safepoint logs
>  - [x] Linux x86_64 fastdebug `tier1`, `tier2`

Looks good, thank you!

-------------

PR: https://git.openjdk.org/jdk/pull/11194


More information about the hotspot-runtime-dev mailing list