Discussion on ZGC's Page Cache Flush

Hao Tang albert.th at alibaba-inc.com
Fri Jun 5 09:24:37 UTC 2020


Hi ZGC Team,

We encountered "Page Cache Flushed" events after we enabled ZGC, and much longer response times can be observed whenever "Page Cache Flushed" happens. The following test case reproduces this scenario: medium-sized objects are periodically cleaned up, and right after the clean-up the small page cache is not sufficient for allocating small objects, so ZGC has to flush medium pages into small pages. We found that simply enlarging the max heap size does not solve this problem. We believe the "page cache flush" issue could be a general problem, because the ratio of small/medium/large objects is not always constant.

Sample code: 
import java.util.Random;
import java.util.concurrent.locks.LockSupport;
public class TestPageCacheFlush {
    /*
     * Options: -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:+UnlockDiagnosticVMOptions -Xms10g -Xmx10g -XX:ParallelGCThreads=2 -XX:ConcGCThreads=4 -Xlog:gc,gc+heap
     * small object: fast allocation
     * medium object: slow allocation, periodic deletion
     */
    public static void main(String[] args) throws Exception {
        long heapSizeKB = Runtime.getRuntime().totalMemory() >> 10;
        System.out.println(heapSizeKB);
        SmallContainer smallContainer = new SmallContainer((long)(heapSizeKB * 0.4));     // 40% heap for live small objects
        MediumContainer mediumContainer = new MediumContainer((long)(heapSizeKB * 0.4));  // 40% heap for live medium objects
        int totalSmall = smallContainer.getTotalObjects();
        int totalMedium = mediumContainer.getTotalObjects();
        int addedSmall = 0;
        int addedMedium = 1; // start at 1 to avoid division by zero
        while (addedMedium < totalMedium * 10) {
            if (totalSmall / totalMedium > addedSmall / addedMedium) { // keep the ratio of allocated small/medium objects
                smallContainer.createAndSaveObject();
                addedSmall ++;
            } else {
                mediumContainer.createAndAppendObject();
                addedMedium ++;
            }
            if ((addedSmall + addedMedium) % 50 == 0) {
                LockSupport.parkNanos(500); // make allocation slower
            }
        }
    }
    static class SmallContainer {
        private final int KB_PER_OBJECT = 64; // 64KB per object
        private final Random RANDOM = new Random();
        private byte[][] smallObjectArray;
        private long totalKB;
        private int totalObjects;
        SmallContainer(long totalKB) {
            this.totalKB = totalKB;
            totalObjects = (int)(totalKB / KB_PER_OBJECT);
            smallObjectArray = new byte[totalObjects][];
        }
        int getTotalObjects() {
            return totalObjects;
        }
        // insert at a random index; the object previously stored there (if any) becomes garbage
        void createAndSaveObject() {
            smallObjectArray[RANDOM.nextInt(totalObjects)] = new byte[KB_PER_OBJECT << 10];
        }
    }
    static class MediumContainer {
        private final int KB_PER_OBJECT = 512; // 512KB per object
        private byte[][] mediumObjectArray;
        private int mediumObjectArrayCurrentIndex = 0;
        private long totalKB;
        private int totalObjects;
        MediumContainer(long totalKB) {
            this.totalKB = totalKB;
            totalObjects = (int)(totalKB / KB_PER_OBJECT);
            mediumObjectArray = new byte[totalObjects][];
        }
        int getTotalObjects() {
            return totalObjects;
        }
        void createAndAppendObject() {
            if (mediumObjectArrayCurrentIndex == totalObjects) { // periodic deletion
                mediumObjectArray = new byte[totalObjects][]; // also delete all medium objects in the old array
                mediumObjectArrayCurrentIndex = 0;
            } else {
                mediumObjectArray[mediumObjectArrayCurrentIndex] = new byte[KB_PER_OBJECT << 10];
                mediumObjectArrayCurrentIndex ++;
            }
        }
    }
}

To avoid "page cache flush", we made a patch that converts small/medium pages to medium/small pages ahead of time. This patch works well on an application with a relatively stable allocation rate, where we have not encountered any throughput problem. What do you think of this solution?
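
For reference, here is a rough, self-contained sketch of the idea (for illustration only; it is not our actual HotSpot patch, and the names, the assumed page sizes, and the PageCacheBalancer class are all hypothetical). The intent is that the page cache periodically compares its cached capacity per page size with the allocation volume recently observed for each size, and converts surplus cached pages of one size into pages of the other size before an allocation has to trigger a flush:

// Hypothetical model of proactive page-cache rebalancing (not ZGC internals).
public class PageCacheBalancer {
    static final long SMALL_PAGE_BYTES  = 2L << 20;   // assumed small page size (2MB)
    static final long MEDIUM_PAGE_BYTES = 32L << 20;  // assumed medium page size (32MB)

    private long cachedSmallBytes;   // bytes currently cached as small pages
    private long cachedMediumBytes;  // bytes currently cached as medium pages

    PageCacheBalancer(long cachedSmallBytes, long cachedMediumBytes) {
        this.cachedSmallBytes = cachedSmallBytes;
        this.cachedMediumBytes = cachedMediumBytes;
    }

    // Rebalance so that each size class can cover the allocation volume seen
    // during the last interval, converting surplus capacity from the other
    // size class ahead of time instead of flushing on the allocation path.
    void rebalance(long smallAllocBytesLastInterval, long mediumAllocBytesLastInterval) {
        long smallDeficit  = smallAllocBytesLastInterval - cachedSmallBytes;
        long mediumSurplus = cachedMediumBytes - mediumAllocBytesLastInterval;
        if (smallDeficit > 0 && mediumSurplus > 0) {
            // Break whole medium pages into small pages while the small side
            // is short and the medium side has more cached than it needs.
            long bytes = Math.min(smallDeficit, mediumSurplus);
            long pages = (bytes + MEDIUM_PAGE_BYTES - 1) / MEDIUM_PAGE_BYTES;
            long converted = Math.min(pages * MEDIUM_PAGE_BYTES, cachedMediumBytes);
            cachedMediumBytes -= converted;
            cachedSmallBytes  += converted; // remapped as small pages
            return;
        }
        long mediumDeficit = mediumAllocBytesLastInterval - cachedMediumBytes;
        long smallSurplus  = cachedSmallBytes - smallAllocBytesLastInterval;
        if (mediumDeficit > 0 && smallSurplus > 0) {
            // Symmetric case: build medium pages out of surplus small pages,
            // rounded down to whole small pages.
            long converted = Math.min(mediumDeficit, smallSurplus);
            converted = (converted / SMALL_PAGE_BYTES) * SMALL_PAGE_BYTES;
            cachedSmallBytes  -= converted;
            cachedMediumBytes += converted;
        }
    }
}

In this sketch, a periodic task would call rebalance() with the per-size allocation volume measured over the previous interval. The key point is that the conversion happens off the allocation path, so a small-object allocation does not have to wait for medium pages to be flushed synchronously.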

We noticed that you are improving the efficiency of map/unmap operations (https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-June/029936.html). That may be a step toward reducing the delay caused by "page cache flush". Do you have further plans for eliminating "page cache flush" or reducing its cost?

Sincerely,
Hao Tang

