RFR: 8256390: ZGC: Relocate in-place instead of having a heap reserve [v2]

Thu Nov 19 15:34:03 UTC 2020

On Tue, 17 Nov 2020 16:30:23 GMT, Per Liden <pliden at openjdk.org> wrote:

>> The heap reserve is a portion of the heap that is set aside to cope with “emergency situations”, like when ZGC needs to compact the heap when it’s already full. However, due to a number of factors (mutator relocation, page pinning, etc) there is no guarantee that relocation will always succeed. In fact, we can end up in situations where we fail to reclaim memory even when 90% of the heap is garbage (JDK-8254346 is an example of where this happens).
>> 
>> So, even if the heap reserve serves its purpose well in most cases, it still doesn't offer any guarantees that relocation succeeds in all cases. To get the guarantees we want we could, instead of having a heap reserve, turn to doing in-place relocation. In other words, if a worker fails to allocate a new page, then it compacts the page currently being relocated, and then use that page as the target for new allocations. In addition, if a mutator fails to relocate an object, then it waits for a worker to relocate the object, instead of pinning the page. This patch does exactly that, and hence makes ZGC more resilient in situations where memory is scares. It also means that ZGC works even better when using small heap (where memory tends to be scares).
>> 
>> Testing: Tier1-7 on linux-x64, Tier-1-3 on all Oracle platforms, 50+ runs of gc-test-suite on linux-x64 (both with and without `-XX:+ZStressRelocateInPlace`).
>
> Per Liden has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Review Stefank

Looks awesome in general. Getting rid of the reserve and making sure relocation can't fail is indeed very nice. One question mark only about the _ref_count and lack of load_acquire in double-checked locking patterns. I think we should load_acquire also inside of the lock, so we don't rely on the release of the lock also being an acquire. Unless I missed something in the protocol.

src/hotspot/share/gc/z/zForwarding.cpp line 136:

> 134:     ZStatTimer timer(ZCriticalPhaseRelocationStall);
> 135:     ZLocker<ZConditionLock> locker(&_ref_lock);
> 136:     while (Atomic::load(&_ref_count) != 0) {

It would appear that we want acquire semantics for the _ref_count. The lock in this scope will give the load load_release() semantics. In practice, that boils down to the same fencing on all current platforms. But it is semantically incorrect in terms of memory model guarantees. I don't think we need to optimize this path, so just doing a load_acquire here seems more clear IMO. Unless I missed something. I see the elision of acquire on a few other double-locked patterns, and think we should consistently not do that.

-------------

Changes requested by eosterlund (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1228