Measuring STW Pauses

Per Liden per.liden at oracle.com
Wed Nov 21 09:09:06 UTC 2018


On 11/21/18 9:23 AM, Alex Elent wrote:
> Perfect! Watching the video now. Thanks for your help and have a great 
> night. Looking forward to sub 1ms pauses :)

Good! There are a more information and presentations available on the 
ZGC wiki, in case you haven't see that:

https://wiki.openjdk.java.net/display/zgc/Main

cheers,
Per

> 
> On Wed, Nov 21, 2018, 12:13 AM Per Liden <per.liden at oracle.com 
> <mailto:per.liden at oracle.com> wrote:
> 
>     Hi Alex,
> 
>     On 11/21/18 8:45 AM, Alex Elent wrote:
>      > Hi Per,
>      >
>      > Thanks for reaching out so quickly. I've subscribed to zgc-dev as
>     well.
>      >
>      > Complete transparency, I'm not that familiar with Java GC.
>     Although that
>      > was probably made obvious from my previous question.
>      >
>      > Let me give you some context. I'd like to migrate to ZGC from
>     Parallel
>      > GC. When using Parallel GC the GC log has clear entries for "Pause"
>      > events which I understand to be stop the world pauses. For
>     example Pause
>      > Full Ergonomics or Pause Full System.gc ().
>      >
>      > What gc log timings would one need to parse in ZGC logs in order to
>      > measure total stop the world pauses?
> 
>     The numbers you quoted in your initial mail are correct. I.e. they are
>     the pauses you had in that GC cycle. So 4.975ms was your longest pause
>     there. It's just that taking the sum of the three GC pauses isn't the
>     right way to look at it.
> 
>     If you enable -Xlog:gc*:gc.log you'll also see the "Garbage Collection
>     Statistics" table being printed every 10 seconds. There's you'll see
>     pause time average and max over time.
> 
>     I did a presentation at Oracle Code One, where there's a section on how
>     to interpret the GC logs. Watching that video might be helpful here.
>     It's available here https://www.youtube.com/watch?v=kF_r3GE3zOo and the
>     slides here:
>     http://cr.openjdk.java.net/~pliden/slides/ZGC-OracleCodeOne-2018.pdf
> 
>     Also note that when switching from ParallelGC to ZGC, you're switching
>     from a collector optimized for throughout to one optimized for low
>     latency. Maybe that's exactly what you want (i.e. you want better
>     application response times), but keep in mind that these collectors are
>     very different and make different trade-offs.
> 
>     cheers,
>     Per
> 
>      >
>      > Thanks again,
>      >
>      > On Tue, Nov 20, 2018, 11:20 PM Per Liden <per.liden at oracle.com
>     <mailto:per.liden at oracle.com>
>      > <mailto:per.liden at oracle.com <mailto:per.liden at oracle.com>> wrote:
>      >
>      >     Hi Alex,
>      >
>      >     (note that you need to be subscribed to
>     zgc-dev at openjdk.java.net <mailto:zgc-dev at openjdk.java.net>
>      >     <mailto:zgc-dev at openjdk.java.net
>     <mailto:zgc-dev at openjdk.java.net>> to be
>      >     able to send messages to the list, else a moderator has to
>     approve each
>      >     message).
>      >
>      >     On 11/21/18 12:52 AM, Alex Elent wrote:
>      >      > Hello,
>      >      >
>      >      > I read through JEP 333 and it looks incredibly promising. My
>      >     question is,
>      >      > how does one measure the STW pauses in ZGC?
>      >      >
>      >      > I read through the mailing list archives and understand
>     there are
>      >     3 pauses:
>      >      > Pause Mark Start
>      >      > Pause Mark End
>      >      > Pause Relocate Start
>      >      >
>      >      > Using the below snippet of gc log:
>      >      > [2018-11-20T23:34:46.180+0000][5135ms] GC(1) *Pause Mark Start
>      >     4.975ms*
>      >      > [2018-11-20T23:34:46.301+0000][5256ms] GC(1) Concurrent Mark
>      >     120.118ms
>      >      > [2018-11-20T23:34:46.301+0000][5256ms] GC(1) *Pause Mark
>     End 0.684ms*
>      >      > [2018-11-20T23:34:46.304+0000][5259ms] GC(1) Concurrent
>     Process
>      >     Non-Strong
>      >      > References 2.308ms
>      >      > [2018-11-20T23:34:46.304+0000][5259ms] GC(1) Concurrent Reset
>      >     Relocation
>      >      > Set 0.008ms
>      >      > [2018-11-20T23:34:46.304+0000][5259ms] GC(1) Concurrent
>     Destroy
>      >     Detached
>      >      > Pages 0.001ms
>      >      > [2018-11-20T23:34:46.305+0000][5260ms] GC(1) Concurrent Select
>      >     Relocation
>      >      > Set 1.072ms
>      >      > [2018-11-20T23:34:46.306+0000][5261ms] GC(1) Concurrent
>     Prepare
>      >     Relocation
>      >      > Set 1.078ms
>      >      > [2018-11-20T23:34:46.310+0000][5265ms] GC(1) *Pause Relocate
>      >     Start 3.764ms*
>      >      >
>      >      > Does that mean the STW pause for that cycle was 4.975 +
>     0.684 + 3.764
>      >      > (9.423)?
>      >
>      >     No, the goal of 10ms max pause time refers to each individual
>     pause,
>      >     not
>      >     the sum of them. The three pauses in ZGC are typically spaced
>     out in
>      >     time (by heavy operations such as concurrent marking and
>      >     relocation), so
>      >     treating them as one single pause would not make sense. Note
>     that we
>      >     also track MMU (Minimum Mutator Utilization, also printed in
>     the GC
>      >     log), which reflects pause times and how they are spaced out
>     in time.
>      >
>      >     cheers,
>      >     Per
>      >
>      >      >
>      >      > Thanks in advance,
>      >      >
>      >
> 


More information about the zgc-dev mailing list