What gc.log lines reliably indicate STW pauses for ZGC?

Leslie Zhai zhaixiang at loongson.cn
Fri Sep 15 05:52:53 UTC 2023


Hi Misha,

Thanks for your bug report!

> 2023年9月15日 13:07,Misha Dmitriev <mdmitriev at linkedin.com> 写道:
> 
> Hi Leslie,
> 
> I think it's worth letting you know, just in case: GCViewer seems to not handle ZGC logs very well at this time. I hope you can pass this information to its maintainers. I guess it's also worth notifying the ZGC group about this issue, in case you guys want to use GCViewer at some point. I used it a lot in the past to analyze CMS and G1 logs, and I still consider it a very good tool. So it would be sad if it doesn't support more modern GCs well.
> 
> Anyway, here is why I think so. I am attaching a short ZGC log, which was created by taking one of our real production logs and cutting it to a few hundred lines. When I grep it for pauses, I get the following:
> 
> $ grep ") Pause" jdk17-zgc.gc.log 
> [2023-09-05T18:54:16.683+0000][506290.976s][info   ][gc,phases        ] GC(76751) Pause Mark End 0.070ms
> [2023-09-05T18:54:16.765+0000][506291.058s][info   ][gc,phases        ] GC(76751) Pause Relocate Start 0.013ms
> [2023-09-05T18:54:20.922+0000][506295.215s][info   ][gc,phases        ] GC(76752) Pause Mark Start 0.017ms
> [2023-09-05T18:54:23.456+0000][506297.749s][info   ][gc,phases        ] GC(76752) Pause Mark End 0.060ms
> [2023-09-05T18:54:23.566+0000][506297.859s][info   ][gc,phases        ] GC(76752) Pause Relocate Start 0.010ms
> [2023-09-05T18:54:23.623+0000][506297.916s][info   ][gc,phases        ] GC(76753) Pause Mark Start 0.012ms
> [2023-09-05T18:54:24.664+0000][506298.956s][info   ][gc,phases        ] GC(76753) Pause Mark End 0.064ms
> [2023-09-05T18:54:24.721+0000][506299.013s][info   ][gc,phases        ] GC(76753) Pause Relocate Start 0.012ms
> 
> There are clearly 8 pauses. However, when I feed it to GCViewer (I built it from the latest source code), it says that there are just 3 pauses. It also prints some NaNs, probably due to lack of precision. On bigger logs we observed similar discrepancies, though the difference between pauses obtained by grepping and reported by GCViewer was not so big.

I am fixing the bug, owing to the enhancement about ZGC and ShenandoahGC was implemented by ourselves.

Thanks,
Leslie Zhai

> 
> Misha
> 
> From: Misha Dmitriev <mdmitriev at linkedin.com>
> Sent: Friday, September 8, 2023 11:40 AM
> To: Stefan Karlsson <stefan.karlsson at oracle.com>; zgc-dev at openjdk.org <zgc-dev at openjdk.org>
> Subject: Re: What gc.log lines reliably indicate STW pauses for ZGC?
>  Thank you very much, Stefan! Really appreciate your response.
> 
> Misha
> 
> From: Stefan Karlsson <stefan.karlsson at oracle.com>
> Sent: Thursday, September 7, 2023 11:08 PM
> To: Misha Dmitriev <mdmitriev at linkedin.com>; zgc-dev at openjdk.org <zgc-dev at openjdk.org>
> Subject: Re: What gc.log lines reliably indicate STW pauses for ZGC?
>  Hi Misha,
> 
> On 2023-09-06 21:15, Misha Dmitriev wrote:
>> Hi ZGC developers,
>> 
>> I work at LinkedIn, where we have internal tooling for end-to-end GC performance monitoring and analysis. To record GC pauses and other interesting events, we parse GC logs as they are generated, in near real time. Since JDK 11, the unified logging format allowed us to extract STW pauses for all GCs in the same straightforward way: we look for  lines such as:
>> 
>> [2020-03-09T19:12:22.542+0000][info][gc             ] GC(62189) Pause Young (Allocation Failure) 14263M->9334M(14745M) 41.359ms
>> 
>> that is, we check every line for the following condition (we don't use pattern matching for speed):
>> 
>> if (line.endsWith("ms") && line.contains("[gc ") && line.contains("Pause ")) {
>>   // Extract pause time in ms from the end of the line
>> 
>> However, looks like ZGC, though it uses the unified logging format, has a different format for STW pause lines. From the logs that we have, I am not 100% sure I understand everything. So far looks like the candidates lines look like
>> 
>> [2023-09-05T18:47:27.323+0000][505881.615s][info   ][gc,phases        ] GC(76657) Pause Mark Start 0.024ms
>> 
>> i.e. the condition should be
>> 
>> if (line.endsWith("ms") && line.contains("[gc,phases ") && line.contains("Pause ")) {
>>   // Extract pause time in ms from the end of line
>> 
>> I wonder if this is all we need or I am missing something?
> 
> I think you also need to search for 'info' to make sure you don't get the more detailed breakdown of the pauses.
> 
>> 
>> Also, what's the meaning of lines like these:
>> 
>> [2023-09-05T18:47:44.003+0000][505898.295s][debug  ][gc,phases        ] GC(76661) Pause Mark Try Complete (ZWorker#2) 0.002ms
>> 
>> Formally they look similar to the previous line, but "debug" and "ZWorker#2" suggests that it may not be a STW pause but pausing just one thread? Please advise.
> 
> This line prints what one of the GC worker threads was doing during the Mark End pause. The "(ZWorker#2)" indicates that this a GC worker thread. You should filter out these lines if you try to collect the GC pauses.
> 
> FWIW, in the version you are running this line is printed on the 'debug' level. However, in the latest source code this has changed and worker logging has been moved to the 'trace' level. There we have this breakdown:
> 1) 'info' logs a full pause or concurrent phase
> 2) 'debug' logs a subphase of (1)
> 3) 'trace' logs what individual worker threads are doing
> 
> HTH,
> StefanK
> 
>> 
>> Thank you,
>> 
>> Misha
>> 
> 
> <jdk17-zgc.gc.log>



More information about the zgc-dev mailing list