RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java

Mon Aug 8 12:16:11 UTC 2016

Hi Kim,

On 2016-08-01 20:47, Kim Barrett wrote:
> This updated webrev addresses concerns about the performance of the VM
> API used by the reference processor thread in the original webrev.
>
> New webrevs:
> full: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.03/
>       http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.03/
> incr: http://cr.openjdk.java.net/~kbarrett/8156500/jdk.03.incr/
>       http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.03.incr/

Overall I think you managed to hit a good balance here, keeping the VM 
API straight forward without loosing flexibility. Having the 
ReferenceHandler doing the actual work seems sound and avoids some 
complexity, which I like.

I have one suggestion though, regarding CheckReferencePendingList(). 
While reviewing I found that I had to check several times what its 
return value actually meant, the "check" part of the name doesn't quite 
reveal that. Further, it seems to me that the waiting path of this 
function has fairly little in common with the non-waiting path, e.g. it 
always returns true. So, to make both the naming and implementation more 
clear I'd like to suggest that we split this into two separate 
functions, HasReferencePendingList() and WaitForReferencePendingList(), 
like this:

JVM_ENTRY(jboolean, JVM_HasReferencePendingList(JNIEnv* env))
   JVMWrapper("JVM_HasReferencePendingList");
   MonitorLockerEx ml(Heap_lock);
   return Universe::has_reference_pending_list();
JVM_END

JVM_ENTRY(void, JVM_WaitForReferencePendingList(JNIEnv* env))
   JVMWrapper("JVM_WaitForReferencePendingList");
   MonitorLockerEx ml(Heap_lock);
   while (!Universe::has_reference_pending_list()) {
     ml.wait();
   }
JVM_END

And of course, this would then also be reflected in j.l.r.Reference, 
which would have:

private static native boolean hasReferencePendingList();
private static native void waitForReferencePendingList();

Other than this I think the patch looks good.

>
> The originally offered approach was an attempt to minimize changes.  I
> was trying to be as similar to the existing code as possible, while
> moving part of it into the VM, where we have the necessary control
> over suspension and the like.  We already know we want to make changes
> in this area, in order to eliminate the use of
> jdk.internal.ref.Cleaner (JDK-8149925).  But we've also agreed that
> JDK-8149925 wasn't urgent and to defer it to JDK 10.  I don't think we
> should revisit that.

Is JDK-8149925 the correct bug id? Looks like that bug has already been 
fixed in 9.

cheers,
Per

>
> As Peter pointed out, my original change introduces a small
> performance regression on a microbenchmark for reference processing
> throughput.  It also showed a small performance benefit on a different
> microbenchmark (allocation rate for a stress test of direct byte
> buffers), as noted in the original RFR.  I can reproduce both of
> those.
>
> I think reference processing thread throughput is the right measure to
> use, so long as others don't become horrible.  Focusing on that, it's
> better to just let the reference processing thread do the processing,
> rather than slowing it down to allow for the rare case when there's
> another thread that could possibly help.  This is especially true now
> that Cleaner has such limited usage.
>
> This leads to a different API for other threads; rather than
> tryHandlePending that other threads can call to help and to examine
> progress, we now have waitForReferenceProcessing.  The VM API is also
> different; rather than popReferencePendingList to get or wait, we now
> have getAndClearReferencePendingList and checkReferencePendingList,
> the latter with an optional wait.
>
> The splitting of the VM API allows us to avoid a narrow race condition
> discussed by Peter in his prototypes.  Peter suggested this race was
> ok because java.nio.Bits makes several retries.  However, those
> retries are only done before throwing OOME.  There are no retries
> before invoking GC, so this race could lead to unnecessary successive
> GCs.
>
> Doing all the processing on the reference processing thread eliminates
> execution of Cleaners on application threads, though that's not nearly
> as much an issue now that the use of Cleaner is so restricted.
>
> We've also eliminated a pre-existing issue where a helper could report
> no progress even though the reference processing thread (and other
> helpers) had removed a pending reference, but not yet processed it.
> This could result in the non-progressing helper invoking GC or
> reporting failure, even though it might have succeeded if it had
> waited for the other thread(s) to complete processing the reference(s)
> being worked on.
>
> I think the new waitForReferenceProcessing could be used to fix
> JDK-6306116, though that isn't part of this change, and was not really
> a motivating factor.
>
> I don't know if the new waitForReferenceProcessing will be used by
> whatever we eventually do for JDK-8149925, but I think the new VM API
> will suffice for that.  That is, I think JDK-8149925 might require
> changes to the core-libs API used by nio.Bits, and won't require
> further VM API changes.
>
>
>