RFR: JDK-8274249: ZGC: Bulk free empty relocated pages

Tue Sep 28 13:10:57 UTC 2021

On Fri, 24 Sep 2021 03:41:27 GMT, 王超 <github.com+25214855+casparcwang at openjdk.org> wrote:

> Similar to JDK-8255237, bulk free empty relocated pages can amortize the cost of freeing a page and speed up the relocation stage.
> 
> The following is the result of specjbb2015 after applying the patch (the tests turn off  the option`UseDynamicNumberOfGCThreads`): the average relocation time speeds up 14%, and the max relocation time speeds up 18%.
> 
> patch:
> [2021-09-18T13:11:51.736+0800][info][gc,stats    ]       Phase: Concurrent Relocate                           0.000 / 0.000       373.180 / 569.855     275.312 / 569.855     275.312 / 569.855     ms
> [2021-09-18T15:30:07.168+0800][info][gc,stats    ]       Phase: Concurrent Relocate                           0.000 / 0.000       381.266 / 577.812     277.272 / 577.812     277.272 / 577.812     ms
> [2021-09-18T17:37:56.305+0800][info][gc,stats    ]       Phase: Concurrent Relocate                           0.000 / 0.000       345.037 / 494.135     259.497 / 506.815     259.497 / 506.815     ms
> 
> 
> origin:
> [2021-09-18T01:01:32.897+0800][info][gc,stats    ]       Phase: Concurrent Relocate                           0.000 / 0.000       429.099 / 662.120     327.213 / 759.723     327.213 / 759.723     ms
> [2021-09-18T03:11:10.433+0800][info][gc,stats    ]       Phase: Concurrent Relocate                           0.000 / 0.000       413.014 / 613.035     307.625 / 613.035     307.625 / 613.035     ms
> [2021-09-18T05:21:12.743+0800][info][gc,stats    ]       Phase: Concurrent Relocate                           0.000 / 0.000       411.745 / 642.242     308.986 / 642.242     308.986 / 642.242     ms

1)  Instead of checking for the allocator type in the general code, this whole thing could be moved into the ZRelocateSmallAllocator, like this: https://github.com/openjdk/jdk/compare/master...pliden:8274249_zgc_bulk_free_empty_pages

2) How did you arrive at the bulk limit of 32? Did you try other numbers and this worked the best?

3) Freeing in bulk feels like a reasonable thing to do, and I'm sure it will cause less contention on the page cache lock. So this will probably help in the normal case. However, in the case were memory is low, GC workers are now hogging free memory could cause allocation stalls to be longer than needed and cause in-place relocation to happen more often than needed. This hogging also gets worse the more GC workers we have. So, I'm a bit hesitant to bring this in without some more thought. For example, instead of having a fixed bulk free limit (like 32) we might instead want to look at other ways of limiting the amount of times `free_page()` is called. I suspect the main problem with calling `free_page()` comes in the beginning of the relocation phase, where we might be relocating a lot very sparse pages. I.e. the time between calls to `free_page()` becomes very short. Later in the relocation phase, as pages get less and less sparse and we spend more and more time copying objects, the
  calls to `free_page()` becomes less frequent. So, perhaps we could instead track number of relocated bytes, and call `free_pages()` once we pass some limit (say 2M). They we get a fairly uniform time between the calls to `free_page()` and avoid hogging memory for too long. This is just a thought, there might be other/better strategies they would work too. One would have to implement and benchmark to figure out which works best.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5670