Allocation stalls and Page Cache Flushed messages
Per Liden
per.liden at oracle.com
Fri Oct 25 12:11:09 UTC 2019
On 10/24/19 7:36 PM, Sundara Mohan M wrote:
> HI Per,
>
> > This means that ZGC had cached heap regions (ZPages) that were ready to
> > be used for new allocations, but they had the wrong size, so some memory
> > (one or more ZPage) was flushed out from the cache so that the physical
> > memory they occupied could be reused to build a new ZPage of the correct
> > size.
>
> "they had the wrong size": does that mean we didn't have a cached page
> of the required size?
> For example, we cached a few 2M pages and now we need a 4M page, so it
> will flush multiple cached pages to build the new 4M page?
Right, in this case it would flush two 2M pages and re-map them as a
single contiguous 4M page.
/Per
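To make the flush-and-remap behavior concrete, here is a minimal, hypothetical model of the page cache miss path (class and method names are invented for illustration; the real logic lives in the JVM's C++ ZPageCache code and is considerably more involved):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of ZGC's page cache: when a request misses (no cached
// page of the exact size), smaller cached pages are flushed until enough
// physical memory is freed to remap into one contiguous page of the
// requested size. Hypothetical sketch, not the real implementation.
public class PageCacheModel {
    private final Deque<Integer> cachedPageSizesMB = new ArrayDeque<>();

    public void cache(int sizeMB) {
        cachedPageSizesMB.push(sizeMB);
    }

    // Returns the number of cached pages flushed to satisfy the request.
    public int allocate(int requestedMB) {
        if (cachedPageSizesMB.remove(Integer.valueOf(requestedMB))) {
            return 0; // exact-size hit: reuse the cached page, no flush
        }
        int flushed = 0, freedMB = 0;
        while (freedMB < requestedMB && !cachedPageSizesMB.isEmpty()) {
            freedMB += cachedPageSizesMB.pop(); // flush a cached page
            flushed++;
        }
        // The freed physical memory is now remapped as one contiguous page.
        return flushed;
    }

    public static void main(String[] args) {
        PageCacheModel cache = new PageCacheModel();
        cache.cache(2);
        cache.cache(2);
        System.out.println(cache.allocate(4)); // two 2M pages flushed
    }
}
```

In the scenario from the question, two cached 2M pages are flushed to build the 4M page, matching the "X requested, Y flushed" accounting in the log messages below.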
>
>
> Thanks
> Sundar
>
> On Tue, Oct 22, 2019 at 2:44 AM Per Liden <per.liden at oracle.com
> <mailto:per.liden at oracle.com>> wrote:
>
> Hi,
>
> On 10/17/19 7:05 AM, Sundara Mohan M wrote:
> > Hi,
> > We started using ZGC and are seeing this pattern: whenever there is an
> > "Allocation Stall" message, I see messages like this right before it
> > ...
> > [2019-10-15T20:02:42.403+0000][71060.202s][info][gc,heap ] Page Cache Flushed: 10M requested, 18M(18530M->18512M) flushed
> > [2019-10-15T20:02:42.582+0000][71060.380s][info][gc,heap ] Page Cache Flushed: 12M requested, 18M(17432M->17414M) flushed
> > [2019-10-15T20:02:42.717+0000][71060.515s][info][gc,heap ] Page Cache Flushed: 14M requested, 36M(16632M->16596M) flushed
> > [2019-10-15T20:02:46.128+0000][71063.927s][info][gc,heap ] Page Cache Flushed: 4M requested, 18M(7742M->7724M) flushed
> > [2019-10-15T20:02:49.716+0000][71067.514s][info][gc,heap ] Page Cache Flushed: 2M requested, 28M(2374M->2346M) flushed
> > [2019-10-15T20:02:49.785+0000][71067.583s][info][gc,heap ] Page Cache Flushed: 2M requested, 76M(2346M->2270M) flushed
> > [2019-10-15T20:02:49.966+0000][71067.765s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2270M->2260M) flushed
> > [2019-10-15T20:02:50.006+0000][71067.805s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2260M->2250M) flushed
> > [2019-10-15T20:02:50.018+0000][71067.816s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2250M->2240M) flushed
> > [2019-10-15T20:02:50.098+0000][71067.896s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2240M->2230M) flushed
> > [2019-10-15T20:02:50.149+0000][71067.947s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2230M->2220M) flushed
> > [2019-10-15T20:02:50.198+0000][71067.996s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2220M->2210M) flushed
> > [2019-10-15T20:02:50.313+0000][71068.111s][info][gc,heap ] Page Cache Flushed: 2M requested, 10M(2210M->2200M) flushed
> > [2019-10-15T20:02:50.327+0000][71068.125s][info][gc,heap ] Page Cache Flushed: 2M requested, 26M(2200M->2174M) flushed
> > [2019-10-15T20:02:50.346+0000][71068.145s][info][gc,heap ] Page Cache Flushed: 2M requested, 14M(2174M->2160M) flushed
> > [2019-10-15T20:02:50.365+0000][71068.163s][info][gc,heap ] Page Cache Flushed: 2M requested, 14M(2160M->2146M) flushed
> > [2019-10-15T20:02:50.371+0000][71068.170s][info][gc,heap ] Page Cache Flushed: 2M requested, 8M(2146M->2138M) flushed
> > [2019-10-15T20:02:50.388+0000][71068.187s][info][gc,heap ] Page Cache Flushed: 2M requested, 8M(2138M->2130M) flushed
> > [2019-10-15T20:02:50.402+0000][71068.201s][info][gc,heap ] Page Cache Flushed: 2M requested, 8M(2130M->2122M) flushed
> > [2019-10-15T20:02:50.529+0000][71068.327s][info][gc,heap ] Page Cache Flushed: 2M requested, 8M(2122M->2114M) flushed
> > [2019-10-15T20:02:50.620+0000][71068.418s][info][gc,heap ] Page Cache Flushed: 2M requested, 8M(2114M->2106M) flushed
> > [2019-10-15T20:02:50.657+0000][71068.455s][info][gc,heap ] Page Cache Flushed: 2M requested, 14M(2106M->2092M) flushed
> > ...
> >
> > 1. Any idea what this info message conveys about the system?
>
> This means that ZGC had cached heap regions (ZPages) that were ready to
> be used for new allocations, but they had the wrong size, so some memory
> (one or more ZPages) was flushed out from the cache so that the physical
> memory they occupied could be reused to build a new ZPage of the correct
> size. This is not in itself catastrophic, but it is more expensive than a
> regular allocation as it involves remapping memory. The only real way to
> avoid this is to increase the max heap size.
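Since these flush events are visible in the `gc,heap` log, one practical way to track how often they happen is to parse the message. A small sketch, assuming the exact line shape shown in the JDK 12 logs above (the class name is invented, and the message format may differ in other JDK versions):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts the "requested" and "flushed" sizes (in MB) from a
// "Page Cache Flushed" log line of the shape quoted above, e.g.
//   "... Page Cache Flushed: 10M requested, 18M(18530M->18512M) flushed"
// Sketch only; verify the format against your own JDK's output.
public class FlushLogParser {
    private static final Pattern FLUSH = Pattern.compile(
        "Page Cache Flushed: (\\d+)M requested, (\\d+)M\\((\\d+)M->(\\d+)M\\) flushed");

    // Returns {requestedMB, flushedMB}, or null if the line doesn't match.
    public static int[] parse(String line) {
        Matcher m = FLUSH.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)),
                           Integer.parseInt(m.group(2)) };
    }

    public static void main(String[] args) {
        int[] r = parse("[2019-10-15T20:02:42.403+0000][71060.202s][info][gc,heap ] "
                + "Page Cache Flushed: 10M requested, 18M(18530M->18512M) flushed");
        System.out.println(r[0] + " " + r[1]); // 10 18
    }
}
```

A sustained stream of such events (as in the log excerpt above) is a signal that the cached page sizes repeatedly fail to match the allocation pattern, which ties back to the heap-sizing advice here.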
>
> >
> > 2. Our application sometimes allocates objects at 5X the usual rate,
> > and an Allocation Stall happens immediately after that. Can the ZGC
> > heuristics learn about this increase and adjust accordingly later?
> > Other than increasing memory, is there any other tuning option to
> > consider that would help in this scenario?
>
> In general, increasing the heap size and/or adjusting the number of
> concurrent worker threads (-XX:ConcGCThreads=X) are your main tuning
> options. Using -XX:+UseLargePages is of course also good if you want max
> performance. There are a few other ZGC-specific options, but those
> matter a whole lot less and will likely not help in this situation.
>
> The ZGC wiki has some more information on tuning:
> https://wiki.openjdk.java.net/display/zgc/Main
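When experimenting with the options above, it can help to confirm from inside the application which settings actually took effect. A small sketch (class name invented; collector bean names are JVM-specific, e.g. they contain "ZGC" on a ZGC-enabled HotSpot):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints the configured max heap and the names of the active collectors,
// so you can verify that -Xmx and -XX:+UseZGC were applied as intended.
public class GcConfigCheck {
    public static void main(String[] args) {
        long maxHeapMB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxHeapMB + "M");
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Collector: " + gc.getName());
        }
    }
}
```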
>
> cheers,
> Per
>
> >
> > I am using JDK12 with 80G heap.
> >
> > Thanks
> > Sundar
> >
>
More information about the zgc-dev mailing list