Integrated: 8297154: Improve safepoint cleanup logging
Aleksey Shipilev
shade at openjdk.org
Wed Nov 23 16:25:27 UTC 2022
On Wed, 16 Nov 2022 18:52:50 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
> When backporting [JDK-8280784](https://bugs.openjdk.org/browse/JDK-8280784), I realized that safepoint cleanup logging added by [JDK-8219436](https://bugs.openjdk.org/browse/JDK-8219436) is misleading. `-Xlog:safepoint` output includes "cleanup" time into "Reaching safepoint", which users can arguably expect to be only the time-to-safepoint. With large reported "Reaching safepoint" time, users would chase a safepoint polling / scheduling problem, rather than suspect the safepoint cleanup costs. I was confused about this myself, and only reading the safepoint timing code made the picture more clear.
>
> It can be observed if you enable both `-Xlog:safepoint` and `-Xlog:safepoint+stats` and run this on busy machine:
>
>
> $ build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:+SafepointALot -XX:GuaranteedSafepointInterval=1000 [1.647s][info][safepoint,stats] VM Operation [ threads: total initial_running ][ time: sync cleanup vmop total ] page_trap_count
> [1.647s][info][safepoint,stats] Cleanup [ 15 1 ][ 26541 240544 30371 297456 ] 0
> [1.647s][info][safepoint ] Safepoint "Cleanup", Time since last: 1499772987 ns, Reaching safepoint: 267085 ns, At safepoint: 30371 ns, Total: 297456 ns
>
>
> Note how "sync" takes only 26541 ns, "cleanup" takes 240544 ns, yet "Reaching safepoint" reports the sums of both.
>
> To fix this, I propose we report "Cleanup" times separately. This changes the logging statement. If we want to keep the logging statement intact, we can move "cleanup" to "At safepoint" costs, which would be slightly less confusing. This is what we do with `RuntimeService::record_safepoint_end` anyway: management layer clearly separates "sync" and "safepoint" time, and "cleanup" more clearly belongs to "safepoint" time here.
>
> While we are at it, we can also track and report max cleanup time.
>
> Example output after:
>
>
> [1.099s][info][safepoint,stats] VM Operation [ threads: total initial_running ][ time: sync cleanup vmop total ] page_trap_count
> [1.099s][info][safepoint,stats] Cleanup [ 15 1 ][ 77391 124662 5420 207473 ] 0
> [1.099s][info][safepoint ] Safepoint "Cleanup", Time since last: 1013526788 ns, Reaching safepoint: 77391 ns, Cleanup: 124662 ns, At safepoint: 5420 ns, Total: 207473 ns
>
>
> Additional testing:
> - [x] Eyeballing the safepoint logs
> - [x] Linux x86_64 fastdebug `tier1`, `tier2`
This pull request has now been integrated.
Changeset: 086763a6
Author: Aleksey Shipilev <shade at openjdk.org>
URL: https://git.openjdk.org/jdk/commit/086763a629fe8b0592f5566c6e346716868d5e63
Stats: 11 lines in 2 files changed: 9 ins; 0 del; 2 mod
8297154: Improve safepoint cleanup logging
Reviewed-by: jsjolen, rehn
-------------
PR: https://git.openjdk.org/jdk/pull/11194
More information about the hotspot-runtime-dev
mailing list