RFR: 8305403: Shenandoah evacuation workers may deadlock

Y. Srinivas Ramakrishna ysr at openjdk.org
Fri Apr 14 18:09:36 UTC 2023


On Thu, 13 Apr 2023 17:03:57 GMT, William Kemper <wkemper at openjdk.org> wrote:

>>> Given that I can't remember why we (I) did the complex transition in the first place, let's get rid of it and see what explodes (if anything). There's a good chance that the original reasons are no longer relevant. We might even consider a more aggressive approach, see my comments (your call).
>> 
>> I did not follow the recent development closely. IIRC, the original reason for this complicated dance is that, if a thread encounters evacuation OOM and enters here, it needs to wait for all other threads to exit their EVAC OOM critical sections before it can proceed. Another thread may have evacuated the same oop successfully, so this thread has to read the forwarding pointer on its way out.
>
> @zhengyu123 - The code that is meant to suspend threads when they oom-during-evac is still there. That protocol would take over _after_ the evacuating thread has cancelled the GC.

@earthling-amzn : The changes look fine to me. Is there a reason why this piece of code that aided testing isn't in the PR:


diff --git a/src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp b/src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp
index 4158f4bee22..e261dd3a81b 100644
--- a/src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp
+++ b/src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp
@@ -292,8 +292,7 @@ inline oop ShenandoahHeap::evacuate_object(oop p, Thread* thread) {
   HeapWord* copy = nullptr;
 
 #ifdef ASSERT
-  if (ShenandoahOOMDuringEvacALot &&
-      (os::random() & 1) == 0) { // Simulate OOM every ~2nd slow-path call
+  if (ShenandoahOOMDuringEvacALot && thread->is_Worker_thread() && SuspendibleThreadSet::should_yield() && uint(os::random()) % 1000 < 1) {
         copy = nullptr;
   } else {
 #endif



I'd make the injection probability a general value specified on the command line rather than hard-coding it at 0.5.
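
As a rough illustration of what a configurable probability could look like, here is a minimal standalone sketch. The names `OOMDuringEvacPerMille` and `should_simulate_evac_oom` are hypothetical and do not exist in HotSpot; the per-mille granularity is also just an assumption, and in the real code the random value would come from `os::random()` and the threshold from a new flag.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a command-line flag (not an existing HotSpot option).
// Per-mille scale: 0 = never inject an OOM, 1000 = inject on every call.
static uint32_t OOMDuringEvacPerMille = 500;  // 500 matches the old fixed 0.5

// Decide whether to simulate an evacuation OOM for a given raw random value.
// Maps the value into [0, 1000) and compares it against the threshold.
inline bool should_simulate_evac_oom(uint32_t random_value, uint32_t per_mille) {
  assert(per_mille <= 1000 && "probability must be in [0, 1000]");
  return (random_value % 1000) < per_mille;
}

// Convenience overload that reads the hypothetical flag.
inline bool should_simulate_evac_oom(uint32_t random_value) {
  return should_simulate_evac_oom(random_value, OOMDuringEvacPerMille);
}
```

With something like this, the condition in the diff above could read `should_simulate_evac_oom(uint(os::random()))` in place of the hard-coded `% 1000 < 1`, and a setting of 1 would reproduce the 1-in-1000 rate used during testing.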

Can you talk about stress- or soak-tests with the fix in place?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13309#issuecomment-1509035530

