RFR: 8339387: ZGC: Synchronize medium page allocation
Stefan Johansson
sjohanss at openjdk.org
Fri Sep 6 08:25:52 UTC 2024
On Fri, 6 Sep 2024 07:14:11 GMT, Stefan Johansson <sjohanss at openjdk.org> wrote:
> Please review this change to synchronize medium page allocations in ZGC.
>
> **Summary**
> In ZGC, objects of a certain size class are allocated in medium-sized pages. For each age there is a single medium page shared by all mutators. When this page gets full, all threads that try to do a medium page allocation will try to allocate and install a new medium page, but only one will succeed. This can lead to a lot of unnecessary medium page allocations, which in turn can lead to unnecessary page cache flushing.
>
> This change introduces synchronization to allow only a single thread to allocate the medium page in the common case.
>
> **Testing**
> * Functional testing through mach5 tier1-7 using ZGC
> * Performance testing through aurora to verify no regressions occur
> * Manual testing to verify performance
> * Manual testing to verify we avoid page cache flushing
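The synchronization described in the summary can be pictured roughly as follows. This is a minimal Python sketch of the general pattern (take a lock on the slow path, then re-check the shared page before allocating), not the HotSpot code. `Page`, `SharedMediumPage`, and `run_workers` are hypothetical names, and the per-page lock merely stands in for an atomic bump pointer.

```python
import threading


class Page:
    """Hypothetical stand-in for a medium page with a fixed number of object slots."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self._lock = threading.Lock()  # stands in for an atomic bump pointer

    def try_alloc(self):
        """Claim one slot; return False when the page is full."""
        with self._lock:
            if self.used == self.capacity:
                return False
            self.used += 1
            return True


class SharedMediumPage:
    """Illustrative allocator: one shared page, replaced under a lock."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.page = Page(capacity)
        self.pages_allocated = 1
        self._install_lock = threading.Lock()

    def alloc_object(self):
        while True:
            page = self.page
            if page.try_alloc():
                return
            # Slow path: the shared page is full.
            with self._install_lock:
                # Re-check under the lock. If another thread already installed
                # a fresh page, skip the allocation and retry on that page.
                if self.page is page:
                    self.page = Page(self.capacity)
                    self.pages_allocated += 1


def run_workers(num_threads, allocs_per_thread, capacity):
    """Allocate from several threads and return how many pages were created."""
    shared = SharedMediumPage(capacity)

    def worker():
        for _ in range(allocs_per_thread):
            shared.alloc_object()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared.pages_allocated
```

In this sketch a page is only replaced once it is full, so `run_workers(8, 100, 10)` creates 80 pages, the minimum needed for 800 objects, regardless of how the threads interleave.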
As mentioned in the summary, most benchmarks show no direct performance improvement from this change. But looking at memory usage in our logs, we can see improvements in how ZGC uses memory.
The statistics below, logged at the end of a benchmark run where medium objects are in use, show some of the improvements. Even if they don't translate into a score improvement, they will improve the latency of some allocation operations.
Baseline:
[369.264s][info][gc,stats ]                               Last 10s      Last 10m
[369.264s][info][gc,stats ]                               Avg / Max     Avg / Max
[369.264s][info][gc,stats ] Memory: Allocation Rate       438 / 950     684 / 2846    684 / 2846    684 / 2846    MB/s
[369.264s][info][gc,stats ] Memory: Defragment            0 / 0         18 / 190      18 / 190      18 / 190      ops/s
[369.264s][info][gc,stats ] Memory: Page Cache Flush      0 / 0         36 / 380      36 / 380      36 / 380      MB/s
[369.264s][info][gc,stats ] Memory: Undo Page Allocation  0 / 1         2 / 71        2 / 71        2 / 71        ops/s
With this change:
[369.104s][info][gc,stats ] Memory: Allocation Rate       465 / 620     612 / 1086    612 / 1086    612 / 1086    MB/s
[369.104s][info][gc,stats ] Memory: Defragment            0 / 0         0 / 0         0 / 0         0 / 0         ops/s
[369.104s][info][gc,stats ] Memory: Page Cache Flush      0 / 0         0 / 0         0 / 0         0 / 0         MB/s
[369.104s][info][gc,stats ] Memory: Undo Page Allocation  0 / 0         0 / 8         0 / 8         0 / 8         ops/s
Additional details about the different lines:
**Allocation rate** - The maximum allocation rate is down, because it's not inflated by many unnecessary medium page allocations happening at once.
**Defragment** - ZGC tries to defragment the virtual address space by remapping memory used by small pages from high addresses to low. This only happens when the page cache holds only medium and large pages, which might be the case after a set of medium page allocations that are later undone. In this run all such defragmentations were avoided.
**Page Cache Flush** - When there are no medium (or large) pages available in the cache, the cache needs to be flushed to allow creation of a new page. By not doing the unnecessary allocations, ZGC is able to avoid flushing in this benchmark.
**Undo Page Allocation** - When a page is allocated but later found to not be needed, we undo the page allocation. This can happen for small pages as well, so we still have some undos. But the ones for medium pages are avoided.
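To illustrate where the medium page undos in the baseline come from, here is a small Python model of the unsynchronized install race. It is only a sketch under stated assumptions: `racy_install` is a hypothetical helper, a lock emulates the atomic compare-and-swap, and a barrier forces the worst case where every thread has seen the full page before any replacement is installed.

```python
import threading


def racy_install(num_threads):
    """Return (pages_allocated, pages_undone) for one contended page replacement."""
    full_page = object()          # the shared page every thread finds full
    shared = {"page": full_page}
    cas_lock = threading.Lock()   # emulates an atomic compare-and-swap
    stats_lock = threading.Lock()
    stats = {"allocated": 0, "undone": 0}
    barrier = threading.Barrier(num_threads)

    def worker():
        observed = shared["page"]
        # Force the worst case: every thread has seen the full page before
        # any of them installs a replacement.
        barrier.wait()
        new_page = object()       # speculative page allocation
        with stats_lock:
            stats["allocated"] += 1
        with cas_lock:
            installed = shared["page"] is observed
            if installed:
                shared["page"] = new_page
        if not installed:
            with stats_lock:      # lost the race: give the page back
                stats["undone"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats["allocated"], stats["undone"]
```

In this model `racy_install(8)` returns `(8, 7)`: eight pages are allocated for a single replacement and seven are immediately undone, which is the kind of burst that inflates the maximum allocation rate and the undo count in the baseline.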
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20883#issuecomment-2333513053