RFR: 8327000: GenShen: Integrate updated Shenandoah implementation of FreeSet into GenShen [v8]
Kelvin Nilsen
kdnilsen at openjdk.org
Tue Jun 25 16:44:31 UTC 2024
On Tue, 25 Jun 2024 15:28:46 GMT, Kelvin Nilsen <kdnilsen at openjdk.org> wrote:
>> I'm now testing a configuration that honors the intent of this comment: only increment the gc_no_progress count if "consecutive Full GCs" fail to make progress. This is not how it was implemented before, because our implementation has also been incrementing the gc_no_progress count when a degenerated GC fails to make progress, and we have not been resetting the count when we experience a productive concurrent GC. My interpretation of the intent is that two unproductive Full GCs separated by a concurrent GC do not count as "consecutive" unproductive Full GCs.
>>
>> If this change behaves well, we may be able to remove the ShenandoahNoProgressThreshold override on TestThreadFailure#generational.
>
> In general, this new configuration works well and passes all GHA and all CI/CD pipeline tests. However, I tried removing the ShenandoahNoProgressThreshold override from this test, and it fails. In one execution that I carefully analyzed, the behavior is:
>
> 1. For NastyThread-0 through NastyThread-11, we perform a Full GC (which has good progress), but the good progress is not enough to satisfy the failed allocation request, so we throw OOM.
>
> 2. With NastyThread-12, we do not fail fast. GC(127) is concurrent young. GC(128) through GC(132) are Full GCs, each with Bad Progress, but each yielding enough free memory to satisfy at least one additional allocation by NastyThread-12.
>
> 3. GC(133) is a Full GC, also with bad progress. This time, the memory reclaimed is not enough to satisfy the pending alloc request (for 4112 bytes), so we throw OOM.
>
> 4. At this point, we have experienced 5 (the default value of ShenandoahNoProgressThreshold) consecutive Full GCs with no progress, so when the main thread attempts to allocate NastyThread-13 after joining with NastyThread-12, it does not even bother to attempt a Full GC. It just immediately throws OOM.
>
> 5. This causes the test to fail, because main is not "supposed" to experience OOM.
>
> Another complication is that the failure doesn't always happen with NastyThread-13. Sometimes it happens with NastyThread-5. GC degradation is not cumulative. Each NastyThread is supposed to start with a clean slate (after a Full GC reclaims all previously allocated memory).
>
> And finally, I have observed that this test will still occasionally fail even with the ShenandoahNoProgressThreshold=24 override.
>
> So I'm puzzling a bit over why GenShen inherently needs a larger value of ShenandoahNoProgressThreshold than traditional Shenandoah in order to pass this test. I think the explanation is that GenShen introduces more "heap fragmentation" between OLD and YOUNG generations. Full GC can help sift out these fragments of memory so that "smaller" allocation requests can still succeed, even though Full GC reports "bad progress".
>
> My current thought is to apply one more tweak to the GenShen behavior: If a pending allocation succeeds following a Full GC, I am inclined to count this as "good progress", regardless of what the other metrics think about progress. That we were able to allocate following Full GC and were not able to allocate before Full GC is the ultimate measure of "good progress". Will experiment with this.
With this change, I got 100 consecutive successful runs of the TestThreadFailure.java test without an override on ShenandoahNoProgressThreshold in the GenShen configuration of the test. I believe this is preferable to using the override, because it makes GenShen behave more like traditional Shenandoah.
I am now testing on the CI/CD pipeline to see if this change introduces any other performance or correctness regressions.
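
For readers following along, the accounting described above can be sketched roughly as follows. This is a toy model in Java, not the HotSpot code: the class `NoProgressTracker`, its method names, and the exact comparison against the threshold are all assumptions made for illustration. It only captures three ideas from this thread: a productive concurrent GC breaks a run of unproductive Full GCs, a Full GC counts as good progress if the pending allocation succeeds afterward, and a Full GC is no longer attempted once the count reaches the threshold.

```java
// Hypothetical model of the no-progress accounting discussed in this thread.
// This is NOT HotSpot code; all names and the threshold comparison are assumptions.
final class NoProgressTracker {
    private final int threshold;        // stands in for ShenandoahNoProgressThreshold
    private int consecutiveNoProgress;  // unproductive Full GCs seen in a row

    NoProgressTracker(int threshold) {
        this.threshold = threshold;
    }

    // Record the outcome of one Full GC.
    // metricsReportGoodProgress: what the existing progress metrics concluded.
    // pendingAllocSucceeded: whether the allocation that triggered the GC succeeded afterward.
    void recordFullGc(boolean metricsReportGoodProgress, boolean pendingAllocSucceeded) {
        if (metricsReportGoodProgress || pendingAllocSucceeded) {
            // A successful pending allocation counts as good progress,
            // regardless of what the other metrics say.
            consecutiveNoProgress = 0;
        } else {
            consecutiveNoProgress++;
        }
    }

    // A productive concurrent GC interrupts the run of unproductive Full GCs.
    void recordProductiveConcurrentGc() {
        consecutiveNoProgress = 0;
    }

    // Assumption: fail fast (skip the Full GC, throw OOM) once the count reaches the threshold.
    boolean shouldAttemptFullGc() {
        return consecutiveNoProgress < threshold;
    }

    int count() {
        return consecutiveNoProgress;
    }

    public static void main(String[] args) {
        NoProgressTracker tracker = new NoProgressTracker(5);
        // Replay GC(128) through GC(132): bad progress by the metrics,
        // but each one frees enough memory for the pending allocation.
        for (int gc = 128; gc <= 132; gc++) {
            tracker.recordFullGc(false, true);
        }
        // GC(133): bad progress, and the pending 4112-byte allocation still fails.
        tracker.recordFullGc(false, false);
        System.out.println("count=" + tracker.count()
                + ", attemptFullGc=" + tracker.shouldAttemptFullGc());
    }
}
```

In this model, replaying the sequence from the analyzed run leaves the count at 1 after GC(133), so the main thread would still attempt a Full GC before allocating NastyThread-13 instead of failing immediately.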
-------------
PR Review Comment: https://git.openjdk.org/shenandoah/pull/440#discussion_r1653173995