Allocation stalls and Page Cache Flushed messages
Per Liden
per.liden at oracle.com
Fri Oct 25 12:11:09 UTC 2019
On 10/24/19 7:36 PM, Sundara Mohan M wrote:
> HI Per,
>
> > This means that ZGC had cached heap regions (ZPages) that were ready to
> > be used for new allocations, but they had the wrong size, so some memory
> > (one or more ZPage) was flushed out from the cache so that the physical
> > memory they occupied could be reused to build a new ZPage of the correct
> > size.
>
> "they had the wrong size": does that mean we didn't have a cached page
> of the required size?
> For example, we cached a few 2M pages and now we need a 4M page, so it
> will flush multiple cached pages to build the new 4M page?
Right, in this case it would flush two 2M pages and re-map them as a
single contiguous 4M page.
/Per
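To make the flush-and-remap behavior concrete, here is a minimal, hypothetical model of the page cache miss path (class and method names are invented for illustration; the real logic lives in the JVM's C++ ZPageCache code and is considerably more involved):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of ZGC's page cache: when a request misses (no cached
// page of the exact size), smaller cached pages are flushed until enough
// physical memory is freed to remap into one contiguous page of the
// requested size. Hypothetical sketch, not the real implementation.
public class PageCacheModel {
    private final Deque<Integer> cachedPageSizesMB = new ArrayDeque<>();

    public void cache(int sizeMB) {
        cachedPageSizesMB.push(sizeMB);
    }

    // Returns the number of cached pages flushed to satisfy the request.
    public int allocate(int requestedMB) {
        if (cachedPageSizesMB.remove(Integer.valueOf(requestedMB))) {
            return 0; // exact-size hit: reuse the cached page, no flush
        }
        int flushed = 0, freedMB = 0;
        while (freedMB < requestedMB && !cachedPageSizesMB.isEmpty()) {
            freedMB += cachedPageSizesMB.pop(); // flush a cached page
            flushed++;
        }
        // The freed physical memory is now remapped as one contiguous page.
        return flushed;
    }

    public static void main(String[] args) {
        PageCacheModel cache = new PageCacheModel();
        cache.cache(2);
        cache.cache(2);
        System.out.println(cache.allocate(4)); // two 2M pages flushed
    }
}
```

In the scenario from the question, two cached 2M pages are flushed to build the 4M page, matching the "X requested, Y flushed" accounting in the log messages below.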
>
>
> Thanks
> Sundar
>
> On Tue, Oct 22, 2019 at 2:44 AM Per Liden <per.liden at oracle.com
> <mailto:per.liden at oracle.com>> wrote:
>
> Hi,
>
> On 10/17/19 7:05 AM, Sundara Mohan M wrote:
> > Hi,
> > We started using ZGC and are seeing this pattern: whenever there is an
> > "Allocation Stall" message, I see messages like this right before it
> > ...
> > [2019-10-15T20:02:42.403+0000][71060.202s][info][gc,heap ] Page Cache Flushed: 10M requested, 18M(18530M->18512M) flushed
> > [2019-10-15T20:02:42.582+0000][71060.380s][info][gc,heap ] Page Cache Flushed: 12M requested, 18M(17432M->17414M) flushed
> > [2019-10-15T20:02:42.717+0000][71060.515s][info][gc,heap ] Page Cache Flushed: 14M requested, 36M(16632M->16596M) flushed
> > [2019-10-15T20:02:46.128+0000][71063.927s][info][gc,heap ] Page Cache Flushed: 4M requested, 18M(7742M->7724M) flushed
> > [2019-10-15T20:02:49.716+0000][71067.514s][info][gc,heap ] Page Cache Flushed: 2M requested, 28M(2374M->2346M) flushed
> > [2019-10-15T20:02:49.785+0000][71067.583s][info][gc,heap ] Page Cache Flushed: 2M requested, 76M(2346M->2270M) flushed
> > [2019-10-15T20:02:49.966+0000][71067.765s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2270M->2260M) flushed
> > [2019-10-15T20:02:50.006+0000][71067.805s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2260M->2250M) flushed
> > [2019-10-15T20:02:50.018+0000][71067.816s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2250M->2240M) flushed
> > [2019-10-15T20:02:50.098+0000][71067.896s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2240M->2230M) flushed
> > [2019-10-15T20:02:50.149+0000][71067.947s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2230M->2220M) flushed
> > [2019-10-15T20:02:50.198+0000][71067.996s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2220M->2210M) flushed
> > [2019-10-15T20:02:50.313+0000][71068.111s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2210M->2200M) flushed
> > [2019-10-15T20:02:50.327+0000][71068.125s][info][gc,heap ] Page Cache Flushed: 2M requested, 26M(2200M->2174M) flushed
> > [2019-10-15T20:02:50.346+0000][71068.145s][info][gc,heap ] Page Cache Flushed: 2M requested, 14M(2174M->2160M) flushed
> > [2019-10-15T20:02:50.365+0000][71068.163s][info][gc,heap ] Page Cache Flushed: 2M requested, 14M(2160M->2146M) flushed
> > [2019-10-15T20:02:50.371+0000][71068.170s][info][gc,heap ] Page Cache Flushed: 2M requested, 8M(2146M->2138M) flushed
> > [2019-10-15T20:02:50.388+0000][71068.187s][info][gc,heap ] Page Cache Flushed: 2M requested, 8M(2138M->2130M) flushed
> > [2019-10-15T20:02:50.402+0000][71068.201s][info][gc,heap ] Page Cache Flushed: 2M requested, 8M(2130M->2122M) flushed
> > [2019-10-15T20:02:50.529+0000][71068.327s][info][gc,heap ] Page Cache Flushed: 2M requested, 8M(2122M->2114M) flushed
> > [2019-10-15T20:02:50.620+0000][71068.418s][info][gc,heap ] Page Cache Flushed: 2M requested, 8M(2114M->2106M) flushed
> > [2019-10-15T20:02:50.657+0000][71068.455s][info][gc,heap ] Page Cache Flushed: 2M requested, 14M(2106M->2092M) flushed
> > ...
> >
> > 1. Any idea what this info message conveys about the system?
>
> This means that ZGC had cached heap regions (ZPages) that were ready to
> be used for new allocations, but they had the wrong size, so some memory
> (one or more ZPages) was flushed out from the cache so that the physical
> memory they occupied could be reused to build a new ZPage of the correct
> size. This is not in itself catastrophic, but it is more expensive than a
> regular allocation as it involves remapping memory. The only real way to
> avoid this is to increase the max heap size.
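Since these flush events are visible in the `gc,heap` log, one practical way to track how often they happen is to parse the message. A small sketch, assuming the exact line shape shown in the JDK 12 logs above (the class name is invented, and the message format may differ in other JDK versions):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts the "requested" and "flushed" sizes (in MB) from a
// "Page Cache Flushed" log line of the shape quoted above, e.g.
//   "... Page Cache Flushed: 10M requested, 18M(18530M->18512M) flushed"
// Sketch only; verify the format against your own JDK's output.
public class FlushLogParser {
    private static final Pattern FLUSH = Pattern.compile(
        "Page Cache Flushed: (\\d+)M requested, (\\d+)M\\((\\d+)M->(\\d+)M\\) flushed");

    // Returns {requestedMB, flushedMB}, or null if the line doesn't match.
    public static int[] parse(String line) {
        Matcher m = FLUSH.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)),
                           Integer.parseInt(m.group(2)) };
    }

    public static void main(String[] args) {
        int[] r = parse("[2019-10-15T20:02:42.403+0000][71060.202s][info][gc,heap ] "
                + "Page Cache Flushed: 10M requested, 18M(18530M->18512M) flushed");
        System.out.println(r[0] + " " + r[1]); // 10 18
    }
}
```

A sustained stream of such events (as in the log excerpt above) is a signal that the cached page sizes repeatedly fail to match the allocation pattern, which ties back to the heap-sizing advice here.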
>
> >
> > 2. Our application sometimes allocates objects at 5X the usual rate,
> > and an Allocation Stall happens immediately after that. Can the ZGC
> > heuristics learn about this increase and adjust accordingly later?
> > Other than increasing memory, is there any other tuning option to
> > consider that would help in this scenario?
>
> In general, increasing the heap size and/or adjusting the number of
> concurrent worker threads (-XX:ConcGCThreads=X) are your main tuning
> options. Using -XX:+UseLargePages is of course also good if you want max
> performance. There are a few other ZGC-specific options, but those
> matter a whole lot less and will likely not help in this situation.
>
> The ZGC wiki has some more information on tuning:
> https://wiki.openjdk.java.net/display/zgc/Main
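When experimenting with the options above, it can help to confirm from inside the application which settings actually took effect. A small sketch (class name invented; collector bean names are JVM-specific, e.g. they contain "ZGC" on a ZGC-enabled HotSpot):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints the configured max heap and the names of the active collectors,
// so you can verify that -Xmx and -XX:+UseZGC were applied as intended.
public class GcConfigCheck {
    public static void main(String[] args) {
        long maxHeapMB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxHeapMB + "M");
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Collector: " + gc.getName());
        }
    }
}
```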
>
> cheers,
> Per
>
> >
> > I am using JDK12 with 80G heap.
> >
> > Thanks
> > Sundar
> >
>
More information about the zgc-dev mailing list