RFR: JDK-8274249: ZGC: Bulk free empty relocated pages [v2]

Wed Sep 29 08:10:29 UTC 2021

On Wed, 29 Sep 2021 07:55:42 GMT, 王超 <github.com+25214855+casparcwang at openjdk.org> wrote:

>> Similar to JDK-8255237, bulk freeing empty relocated pages can amortize the cost of freeing a page and speed up the relocation stage.
>> 
>> The following are the specjbb2015 results after applying the patch (the tests turn off the `UseDynamicNumberOfGCThreads` option): the average relocation time drops by 14%, and the max relocation time drops by 18%.
>> 
>> patch:
>> [2021-09-18T13:11:51.736+0800][info][gc,stats    ]       Phase: Concurrent Relocate                           0.000 / 0.000       373.180 / 569.855     275.312 / 569.855     275.312 / 569.855     ms
>> [2021-09-18T15:30:07.168+0800][info][gc,stats    ]       Phase: Concurrent Relocate                           0.000 / 0.000       381.266 / 577.812     277.272 / 577.812     277.272 / 577.812     ms
>> [2021-09-18T17:37:56.305+0800][info][gc,stats    ]       Phase: Concurrent Relocate                           0.000 / 0.000       345.037 / 494.135     259.497 / 506.815     259.497 / 506.815     ms
>> 
>> 
>> origin:
>> [2021-09-18T01:01:32.897+0800][info][gc,stats    ]       Phase: Concurrent Relocate                           0.000 / 0.000       429.099 / 662.120     327.213 / 759.723     327.213 / 759.723     ms
>> [2021-09-18T03:11:10.433+0800][info][gc,stats    ]       Phase: Concurrent Relocate                           0.000 / 0.000       413.014 / 613.035     307.625 / 613.035     307.625 / 613.035     ms
>> [2021-09-18T05:21:12.743+0800][info][gc,stats    ]       Phase: Concurrent Relocate                           0.000 / 0.000       411.745 / 642.242     308.986 / 642.242     308.986 / 642.242     ms
>
> 王超 has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Exit bulk free if in place relocation happens

Thank you for your review and suggestions.

> * Instead of checking for the allocator type in the general code, this whole thing could be moved into the ZRelocateSmallAllocator, like this: [master...pliden:8274249_zgc_bulk_free_empty_pages](https://github.com/openjdk/jdk/compare/master...pliden:8274249_zgc_bulk_free_empty_pages)

`ZRelocateSmallAllocator` is shared between the relocation tasks. If the logic were moved into `ZRelocateSmallAllocator`, it would need a lock-free structure to hold the pages waiting to be bulk freed. So I think keeping it in the thread-local closure makes things simpler.
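
To illustrate the point about thread-local buffering, here is a minimal standalone sketch. It is not the code in the patch: `Page`, `SharedPageCache` and `BulkFreeBuffer` are hypothetical stand-ins, and the shared cache is modelled as a plain vector behind a mutex. Each worker owns its own buffer, so adding a page needs no synchronization; the shared lock is taken only once per batch.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a page; not the HotSpot ZPage class.
struct Page { int id; };

// Hypothetical shared free list guarded by one lock, standing in for
// the structure that all relocation workers contend on.
class SharedPageCache {
public:
  // Frees a whole batch under a single lock acquisition.
  void free_pages(const std::vector<Page*>& pages) {
    std::lock_guard<std::mutex> guard(_lock);
    _lock_acquisitions++;
    _free.insert(_free.end(), pages.begin(), pages.end());
  }
  std::size_t lock_acquisitions() const {
    std::lock_guard<std::mutex> guard(_lock);
    return _lock_acquisitions;
  }
  std::size_t free_count() const {
    std::lock_guard<std::mutex> guard(_lock);
    return _free.size();
  }
private:
  mutable std::mutex _lock;
  std::size_t _lock_acquisitions = 0;
  std::vector<Page*> _free;
};

// Per-worker buffer. Only the owning thread touches _pages, so no
// lock-free structure is needed; flush() hands the batch over.
class BulkFreeBuffer {
public:
  BulkFreeBuffer(SharedPageCache* cache, std::size_t limit)
    : _cache(cache), _limit(limit) {
    assert(limit > 0);
    _pages.reserve(limit);
  }
  // Hand back whatever is still buffered when the worker finishes.
  ~BulkFreeBuffer() { flush(); }

  void add(Page* page) {
    _pages.push_back(page);
    if (_pages.size() >= _limit) {
      flush();
    }
  }
  void flush() {
    if (_pages.empty()) {
      return;
    }
    _cache->free_pages(_pages);
    _pages.clear();
  }
  std::size_t buffered() const { return _pages.size(); }
private:
  SharedPageCache* _cache;
  std::size_t _limit;
  std::vector<Page*> _pages;
};
```

In this sketch, a worker that relocates 70 empty pages with a limit of 32 takes the shared lock 3 times (two full batches plus the final flush) instead of 70 times.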

> * How did you arrive at the bulk limit of 32? Did you try other numbers and this worked the best?

64 produces a better result than 32 in my testing, but 128 produces the same result as 64. I chose 32 because I was worried about in-place relocation in low-memory corner cases.

> * Freeing in bulk feels like a reasonable thing to do, and I'm sure it will cause less contention on the page cache lock. So this will probably help in the normal case. However, in the case where memory is low, GC workers hogging free memory could cause allocation stalls to be longer than needed and cause in-place relocation to happen more often than needed. This hogging also gets worse the more GC workers we have. So, I'm a bit hesitant to bring this in without some more thought. For example, instead of having a fixed bulk free limit (like 32) we might instead want to look at other ways of limiting the number of times `free_page()` is called.

Buffering the relocated pages in a local private structure will make in-place relocation and allocation stalls happen more often when memory is low. So I added a check that exits bulk free mode if in-place relocation happens.
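
The check can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: `FreeSink`, `BulkFreeWithFallback` and `on_in_place_relocation()` are hypothetical names. Once an in-place relocation is observed, the buffer is flushed at once and every later page is freed immediately instead of being held back.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a page; not the HotSpot ZPage class.
struct Page { int id; };

// Hypothetical sink that records how pages were handed back.
struct FreeSink {
  std::size_t calls = 0;  // number of free operations
  std::size_t pages = 0;  // total pages freed
  void free_pages(const std::vector<Page*>& batch) {
    calls++;
    pages += batch.size();
  }
};

class BulkFreeWithFallback {
public:
  BulkFreeWithFallback(FreeSink* sink, std::size_t limit)
    : _sink(sink), _limit(limit) {
    assert(limit > 0);
  }

  // Called when the worker had to relocate in place, which is taken
  // here as a sign that memory is low. Stop hoarding pages from now on.
  void on_in_place_relocation() {
    _bulk_mode = false;
    flush();
  }

  void page_relocated(Page* page) {
    _pages.push_back(page);
    // Outside bulk mode every page is freed right away.
    if (!_bulk_mode || _pages.size() >= _limit) {
      flush();
    }
  }

  void flush() {
    if (_pages.empty()) {
      return;
    }
    _sink->free_pages(_pages);
    _pages.clear();
  }

  bool bulk_mode() const { return _bulk_mode; }

private:
  FreeSink* _sink;
  std::size_t _limit;
  bool _bulk_mode = true;
  std::vector<Page*> _pages;
};
```

The switch is one-way in this sketch; whether bulk mode should ever be re-enabled later in the same cycle is a separate design question.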

> I suspect the main problem with calling `free_page()` comes in the beginning of the relocation phase, where we might be relocating a lot of very sparse pages. I.e. the time between calls to `free_page()` becomes very short. Later in the relocation phase, as pages get less and less sparse and we spend more and more time copying objects, the calls to `free_page()` become less frequent. So, perhaps we could instead track the number of relocated bytes, and call `free_pages()` once we pass some limit (say 2M). Then we get a fairly uniform time between the calls to `free_page()` and avoid hogging memory for too long. This is just a thought, there might be other/better strategies that would work too. One would have to implement and benchmark to figure out which works best.

Using relocated bytes to control the frequency of bulk free is a good idea. I will test it.
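
A rough sketch of what the byte-based variant could look like, before any testing. The class name, the `page_relocated(page, relocated_bytes)` signature and the `FreeSink` helper are hypothetical; the 2M limit is the example value from the suggestion above, not a tuned number.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a page; not the HotSpot ZPage class.
struct Page { int id; };

// Hypothetical sink that records how pages were handed back.
struct FreeSink {
  std::size_t calls = 0;  // number of free operations
  std::size_t pages = 0;  // total pages freed
  void free_pages(const std::vector<Page*>& batch) {
    calls++;
    pages += batch.size();
  }
};

class ByteLimitedBulkFree {
public:
  ByteLimitedBulkFree(FreeSink* sink, std::size_t byte_limit)
    : _sink(sink), _byte_limit(byte_limit) {
    assert(byte_limit > 0);
  }

  // relocated_bytes is the amount of live data copied out of the page.
  // Sparse pages add little, so many of them are batched together;
  // dense pages add a lot, so they are freed after only a few.
  void page_relocated(Page* page, std::size_t relocated_bytes) {
    _pages.push_back(page);
    _bytes_since_flush += relocated_bytes;
    if (_bytes_since_flush >= _byte_limit) {
      flush();
    }
  }

  void flush() {
    if (!_pages.empty()) {
      _sink->free_pages(_pages);
      _pages.clear();
    }
    _bytes_since_flush = 0;
  }

private:
  FreeSink* _sink;
  std::size_t _byte_limit;
  std::size_t _bytes_since_flush = 0;
  std::vector<Page*> _pages;
};
```

With a 2M limit, pages holding 64K of live data each are freed in batches of 32, while pages holding 1.5M each are freed in pairs, so the batch size shrinks on its own as pages get denser.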

-------------

PR: https://git.openjdk.java.net/jdk/pull/5670