RFR: 8342444: Shenandoah: Uncommit regions from a separate, STS aware thread [v2]

Tue Nov 12 19:27:18 UTC 2024

On Tue, 12 Nov 2024 18:17:29 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> Hmm, I'm not sure we can do that. Prior to this change, the control thread performed both clearing the bitmap and uncommitting the region's bitmap, so they could never happen concurrently. With this change, a separate thread could perform the uncommit. Consider:
>> 
>> 1. Control thread takes heap lock, observes that bitmap slice for region A is committed
>> 2. Control thread releases heap lock, begins clearing bitmap (writing zeros to bitmap slice)
>> 3. Uncommit thread takes heap lock, believes it must uncommit region A
>> 4. Uncommit thread uncommits bitmap slice for region A
>> 5. Segfault in Control Thread
>> 
>> I do _believe_ if we had a per region lock, it would be useful here. Holding a lock over the entire heap for this feels like overkill. Or, we could schedule the uncommit so that it does not occur during a GC cycle.
>
> So, wait a sec. This code is in `ShenandoahResetBitmapTask`, so it can run in parallel. Putting a lock here inhibits parallelism. I understand the failure mode, but I think we should really be optimizing for the case when `ShenandoahUncommit` is not enabled (e.g. `-Xmx` == `-Xms`).
> 
> Sounds like there is a hassle in allowing concurrent uncommit to overlap with the GC cycle. In addition to this particular problem, we might be stealing cycles from the GC threads and take additional TTSP lag to park the uncommitter for the in-cycle GC pauses. I have no clear solution for this yet, but I think we need to explore if we can suspend the uncommit before going into GC cycle...

We could have the control and uncommit threads coordinate their efforts. In the worst case, it could mean delaying concurrent reset while the control thread waits for the uncommit thread to yield.

We could also try a more targeted lock only for the region's bitmap slice, but it doesn't seem right that one thread would be trying to clear a bitmap, while the other is trying to uncommit it. A lock could preserve technical correctness, but contention here would just mean that one thread would have wasted its time (either clearing a bitmap that is then uncommitted, or attempting to clear a bitmap that was first uncommitted (in this case, we would need the control thread to detect this and skip the region)).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22019#discussion_r1838641673