RFR: 8367646: [GenShen] Control thread may overwrite gc cancellation cause set by mutator
William Kemper
wkemper at openjdk.org
Mon Oct 6 22:31:50 UTC 2025
On Mon, 6 Oct 2025 22:00:01 GMT, William Kemper <wkemper at openjdk.org> wrote:
> I believe the following events could lead to this assertion failure:
> 1. Control thread reads the heap's gc cancellation cause as `shenandoah_concurrent_gc`
> 2. Mutator thread has an allocation failure and sets the heap's gc cancellation cause to `shenandoah_alloc_failure`
> 3. Control thread uses the stale value from step 1 and decides to unconditionally clear the cancellation cause
> 4. Mutator thread asserts that gc is still cancelled
>
> The proposed fix here has the control thread use a CAS operation to clear the cancellation cause only if the existing value is `shenandoah_concurrent_gc`. This stops the control thread from erroneously overwriting the value if a mutator has already set it to `shenandoah_alloc_failure`. A mutator thread may still have an allocation failure after the control thread has cleared the cancellation, but this is normal and expected.
I was never able to reproduce it. The crash was observed on somewhat exotic hardware. I'm fairly confident in the fix because in all other cases the cancellation cause is only cleared at a safepoint. That is to say, this code is the only place the collector clears the cancellation cause concurrently with mutators.
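
For readers following along, the interleaving in steps 1-4 and the compare-and-swap idea can be sketched in standalone C++. This is only an illustration under stated assumptions, not the code in the PR: the `Cause` enum, `cancelled_cause`, `clear_cancellation_if` and `simulate_race` are hypothetical names, and `std::atomic` stands in for whatever atomic primitives HotSpot uses.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the heap's gc cancellation cause (illustrative names).
enum class Cause { no_gc, concurrent_gc, alloc_failure };

// Hypothetical shared state, written by both the control thread and mutators.
std::atomic<Cause> cancelled_cause{Cause::no_gc};

// Clear the cause only if it still equals `expected`.
// Returns true if this call performed the clear, false if another
// thread changed the value first (in which case it is left untouched).
bool clear_cancellation_if(Cause expected) {
  return cancelled_cause.compare_exchange_strong(expected, Cause::no_gc);
}

// Replays steps 1-4 sequentially on one thread to show the outcome
// of the problematic interleaving when the clear is a CAS.
Cause simulate_race() {
  cancelled_cause.store(Cause::concurrent_gc);

  // Step 1: control thread reads the cause.
  Cause seen_by_control = cancelled_cause.load();

  // Step 2: mutator hits an allocation failure and records it.
  cancelled_cause.store(Cause::alloc_failure);

  // Step 3: control thread tries to clear using its stale view.
  // The CAS fails because the value is no longer concurrent_gc.
  bool cleared = clear_cancellation_if(seen_by_control);
  assert(!cleared);

  // Step 4: the mutator's view, gc is still cancelled.
  return cancelled_cause.load();
}
```

With an unconditional store in step 3 the function would return `Cause::no_gc`, which corresponds to the state the mutator's assertion rejects. With the conditional clear, the mutator's `alloc_failure` survives.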
-------------
PR Comment: https://git.openjdk.org/jdk/pull/27662#issuecomment-3374480022