Discussion on ZGC's Page Cache Flush

Per Liden per.liden at oracle.com
Tue Jun 23 08:26:45 UTC 2020


Hi,

On 6/19/20 8:34 AM, Hao Tang wrote:
> Thanks for your reply.
> 
> This is our patch for "balancing" page cache: 
> https://github.com/tanghaoth90/jdk11u/commit/77631cf3 (based on jdk11u).

Sorry, but for IP clarity, could you please post that patch to 
cr.openjdk.java.net? Otherwise, I'm afraid I can't look at the patch.

> 
> We notice two cases in which "page cache flush" frequently happens:
> 
> * The number of cached pages is not sufficient for concurrent relocation.
> 
>      For example, 34 medium pages are "to-space", as the GC log below shows:
>      "[2020-03-06T05:46:31.618+0800] GC(10406) Relocation Set (Medium Pages): 54->34, 91 skipped"
>      In our scenario, hundreds of mutator threads are running. To my knowledge, these mutators may relocate medium-sized
>      objects in the relocation set. If there are fewer than 34 cached medium pages, "page cache flush" is likely to happen.
> 
>      Our strategy is to ensure at least 34 cached medium pages before relocation (see the sketch after these two cases).
> 
> * A lot of medium-sized (or small-sized) objects become unreachable at once (for example, when the root of these objects is removed).
>      Assume that the allocation rates of small and medium objects are in a 1:1 ratio. In this case, small-sized and medium-sized
>      objects occupy 50% and 50% of the total memory, respectively. If medium-sized objects amounting to 25% of the total memory
>      are removed, there are still cached medium pages holding 25% of the total memory when all small pages are used up. Since
>      ZDriver does not trigger a new GC cycle at this moment, 12.5% of the total memory has to be transformed from medium pages
>      into small pages for allocating small-sized objects.
> 
>      Our strategy is to make the ratio of the different types of cached pages match the ratio of their allocation rates.
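> 
>      To make these two rules concrete, here is a small, runnable toy model in Java. This is
>      not the actual patch and not HotSpot code; the class, its fields, and the page-size
>      constants are purely illustrative, with "pages" reduced to counters:
> 
>          // Toy model of the two balancing rules described above (sizes are ZGC
>          // defaults: small page = 2MB, medium page = 32MB).
>          class PageCacheModel {
>              static final int SMALL_PER_MEDIUM = 16; // 32MB / 2MB
>              long smallPages;   // cached small pages
>              long mediumPages;  // cached medium pages
> 
>              // Rule 1: keep enough cached medium pages for the upcoming relocation set,
>              // so mutators relocating medium objects never hit a "page cache flush".
>              void reserveMediumForRelocation(long mediumToSpacePages) {
>                  while (mediumPages < mediumToSpacePages && smallPages >= SMALL_PER_MEDIUM) {
>                      smallPages -= SMALL_PER_MEDIUM; // would require remapping in the real VM
>                      mediumPages += 1;
>                  }
>              }
> 
>              // Rule 2: keep the cached small/medium split close to the observed
>              // split of the allocation rate (smallFraction in [0, 1]).
>              void matchAllocationRatio(double smallFraction) {
>                  long totalSmallEquiv = smallPages + mediumPages * SMALL_PER_MEDIUM;
>                  long targetSmall = (long) (totalSmallEquiv * smallFraction);
>                  while (smallPages < targetSmall && mediumPages > 0) {
>                      mediumPages -= 1;               // splitting medium -> small needs no remapping
>                      smallPages += SMALL_PER_MEDIUM;
>                  }
>              }
>          }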
> 
> The patch works well on our application (by eliminating "page cache flush" and the 
> corresponding delay). However, this approach has shortcomings, as my previous mail 
> mentioned. It might not be a complete solution for general cases, but it is still 
> worth discussing. We are also thinking about alternative solutions, such as keeping 
> some cached pages as a buffer.
> 
> Looking forward to your feedback. Thanks.

As of JDK 13, having lots of medium/large pages in the page cache is not 
a problem, since ZGC will split such pages into small pages (which is 
inexpensive) when needed. However, going from small to medium/large is 
more problematic, as it involves (re)mapping memory. One possible 
solution to make this less expensive might be to fuse small pages into 
medium (or large) pages when they are freed, either by 1) just 
opportunistically fusing small pages that sit next to each other in the 
address space (which would be relatively inexpensive), or 2) by 
remapping memory (which would be more expensive, but that work would be 
done by GC threads).

Alt. 1 would require the page cache to keep pages sorted by virtual 
address. While that's doable, it would be slightly complicated by 
uncommit, which wants to keep pages sorted by LRU.

Alt. 2 might be too expensive to do all the time, but might perhaps be a 
useful complement to alt. 1, if a large set of cached small pages 
can't be fused.

Monitoring the distribution of small/medium page allocations (as you 
mention) might be useful to guide alt. 1 & 2.
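
For illustration, here is a small toy model in Java of what alt. 1 amounts 
to. This is not HotSpot code; the TreeMap, the class, and the page-size 
constants are just stand-ins for the real page cache structures, and it 
assumes the cache keeps small pages sorted by virtual address:

    import java.util.TreeMap;

    // Toy model of alt. 1: cached small pages are tracked sorted by virtual
    // address, and a run of 16 contiguous, 32MB-aligned small pages is "fused"
    // by replacing its entries with a single medium-sized entry.
    class FuseModel {
        static final long SMALL_SIZE  = 2L  * 1024 * 1024;  // ZGC default small page size
        static final long MEDIUM_SIZE = 32L * 1024 * 1024;  // ZGC default medium page size
        final TreeMap<Long, Long> cache = new TreeMap<>();   // start address -> page size

        // Called when a small page at 'start' is freed into the cache.
        boolean tryFuseSmall(long start) {
            cache.put(start, SMALL_SIZE);
            long mediumStart = start & ~(MEDIUM_SIZE - 1); // align down to 32MB
            // Every 2MB slot inside the aligned 32MB range must be a cached small page.
            for (long addr = mediumStart; addr < mediumStart + MEDIUM_SIZE; addr += SMALL_SIZE) {
                Long size = cache.get(addr);
                if (size == null || size != SMALL_SIZE) {
                    return false; // hole or non-small page: would need remapping (alt. 2)
                }
            }
            // All 16 small pages are contiguous and aligned: fuse into one medium page.
            for (long addr = mediumStart; addr < mediumStart + MEDIUM_SIZE; addr += SMALL_SIZE) {
                cache.remove(addr);
            }
            cache.put(mediumStart, MEDIUM_SIZE);
            return true;
        }
    }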

cheers,
Per

> 
> Sincerely,
> 
> Hao Tang
> 
> 
> 
> ------------------------------------------------------------------
> From:Per Liden <per.liden at oracle.com>
> Send Time: June 5, 2020, 18:54
> To:albert.th at alibaba-inc.com; hotspot-gc-dev openjdk.java.net 
> <hotspot-gc-dev at openjdk.java.net>; zgc-dev <zgc-dev at openjdk.java.net>
> Subject:Re: Discussion on ZGC's Page Cache Flush
> 
> Hi,
> 
> On 6/5/20 11:24 AM, Hao Tang wrote:
>  >
>  > Hi ZGC Team,
>  >
>  > We encountered "Page Cache Flushed" when we enabled the ZGC feature. Much longer response times can be observed when "Page Cache Flushed" happens. There is a case that reproduces this scenario: medium-sized objects are periodically cleaned up, and right after the clean-up, small pages are not sufficient for allocating small-sized objects, which requires flushing medium pages into small pages. We found that simply enlarging the max heap size cannot solve this problem. We believe that the "page cache flush" issue could be a general problem, because the ratio of small/medium/large objects is not always constant.
>  >
>  > Sample code:
>  > import java.util.Random;
>  > import java.util.concurrent.locks.LockSupport;
>  > public class TestPageCacheFlush {
>  >      /*
>  >       * Options: -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:+UnlockDiagnosticVMOptions -Xms10g -Xmx10g -XX:ParallelGCThreads=2 -XX:ConcGCThreads=4 -Xlog:gc,gc+heap
>  >       * small object: fast allocation
>  >       * medium object: slow allocation, periodic deletion
>  >       */
>  >      public static void main(String[] args) throws Exception {
>  >          long heapSizeKB = Runtime.getRuntime().totalMemory() >> 10;
>  >          System.out.println(heapSizeKB);
>  >          SmallContainer smallContainer = new SmallContainer((long)(heapSizeKB * 0.4));     // 40% heap for live small objects
>  >          MediumContainer mediumContainer = new MediumContainer((long)(heapSizeKB * 0.4));  // 40% heap for live medium objects
>  >          int totalSmall = smallContainer.getTotalObjects();
>  >          int totalMedium = mediumContainer.getTotalObjects();
>  >          int addedSmall = 0;
>  >          int addedMedium = 1; // should not be divided by zero
>  >          while (addedMedium < totalMedium * 10) {
>  >              if (totalSmall / totalMedium > addedSmall / addedMedium) { // keep the ratio of allocated small/medium objects
>  >                  smallContainer.createAndSaveObject();
>  >                  addedSmall ++;
>  >              } else {
>  >                  mediumContainer.createAndAppendObject();
>  >                  addedMedium ++;
>  >              }
>  >              if ((addedSmall + addedMedium) % 50 == 0) {
>  >                  LockSupport.parkNanos(500); // make allocation slower
>  >              }
>  >          }
>  >      }
>  >      static class SmallContainer {
>  >          private final int KB_PER_OBJECT = 64; // 64KB per object
>  >          private final Random RANDOM = new Random();
>  >          private byte[][] smallObjectArray;
>  >          private long totalKB;
>  >          private int totalObjects;
>  >          SmallContainer(long totalKB) {
>  >              this.totalKB = totalKB;
>  >              totalObjects = (int)(totalKB / KB_PER_OBJECT);
>  >              smallObjectArray = new byte[totalObjects][];
>  >          }
>  >          int getTotalObjects() {
>  >              return totalObjects;
>  >          }
>  >          // random insertion (with random deletion)
>  >          void createAndSaveObject() {
>  >              smallObjectArray[RANDOM.nextInt(totalObjects)] = new byte[KB_PER_OBJECT << 10];
>  >          }
>  >      }
>  >      static class MediumContainer {
>  >          private final int KB_PER_OBJECT = 512; // 512KB per object
>  >          private byte[][] mediumObjectArray;
>  >          private int mediumObjectArrayCurrentIndex = 0;
>  >          private long totalKB;
>  >          private int totalObjects;
>  >          MediumContainer(long totalKB) {
>  >              this.totalKB = totalKB;
>  >              totalObjects = (int)(totalKB / KB_PER_OBJECT);
>  >              mediumObjectArray = new byte[totalObjects][];
>  >          }
>  >          int getTotalObjects() {
>  >              return totalObjects;
>  >          }
>  >          void createAndAppendObject() {
>  >              if (mediumObjectArrayCurrentIndex == totalObjects) { // periodic deletion
>  >                  mediumObjectArray = new byte[totalObjects][]; // also delete all medium objects in the old array
>  >                  mediumObjectArrayCurrentIndex = 0;
>  >              } else {
>  >                  mediumObjectArray[mediumObjectArrayCurrentIndex] = new byte[KB_PER_OBJECT << 10];
>  >                  mediumObjectArrayCurrentIndex ++;
>  >              }
>  >          }
>  >      }
>  > }
>  >
>  > To avoid "page cache flush", we made a patch for converting small/medium pages to medium/small pages ahead of time. This patch works well on an application with a relatively stable allocation rate, and we have not encountered throughput problems with it. What do you think of this solution?
>  >
>  > We notice that you are improving the efficiency of map/unmap operations (https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-June/029936.html). It may be a step toward reducing the delay caused by "page cache flush". Do you have further plans for eliminating or mitigating "page cache flush"?
> 
> Yes, and as you might have seen, the latest incarnation of this patchset
> includes asynchronous unmapping, which helps reduce the time for page
> cache flushing. I ran your example program above with these patches and
> can see a ~30% reduction in average page allocation time, and a ~60%
> reduction in worst-case page allocation time. So, it will be an improvement.
> 
> However, I'd be more than happy to take a look at your patch and see
> what you've done. Making page cache flushing even less expensive is
> something we're interested in going forward.
> 
> cheers,
> Per
> 
>  >
>  > Sincerely, Hao Tang
>  >
> 


