RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java

Wed Jun 29 11:22:55 UTC 2016

Hi Kim,

Let me chime-in although it's a bit late...

I think this is a good change to finally get rid of OOME problems in 
this area.

On 06/28/2016 07:45 PM, Kim Barrett wrote:
>> On Jun 28, 2016, at 5:33 AM, Per Liden <per.liden at oracle.com> wrote:
>> Patch looks good. The only thing I don't feel qualified to review is the initialization order change in thread.cpp, so I'll let others comments on that.
> Thanks.  I’ll be following up on that area.
>
>> I like the pop-one-reference-at-a-time semantics, which simplifies things a lot and keeps the interface nice and clean. I was previously afraid that it might cause a noticeable performance degradation compared to lifting the whole list into Java in one go, but your testing seem to prove that's not the case.
> I was concerned about that too, and had tried a different approach that also still supported the existing "some callers wait and others don’t" API, but it was a bit messy.  Coleen convinced me to try this (since it was easy) and do the measurement, and it worked out well.
>

The repeated JNI invocations do not present a big overhead, but it is 
measurable. For example, the following benchmark measures the time it 
takes after a GC cycle for a bunch of references to be transfered to 
Java side, enqueued into ReferenceQueue and dequeued from it:

http://cr.openjdk.java.net/~plevart/misc/PendingReferenceHandling/ReferenceEnqueueBench.java

The results on my i7-4771 PC:

Original JDK 9:

Benchmark                                (refCount)  Mode  Cnt 
Score      Error  Units
ReferenceEnqueueBench.dequeueReferences      100000    ss  100 38410.515 
± 1011.769  us/op

Patched (by Kim):

Benchmark                                (refCount)  Mode  Cnt 
Score      Error  Units
ReferenceEnqueueBench.dequeueReferences      100000    ss  100 42197.522 
± 1161.451  us/op

So, about 10% worse for this benchmark.

Transfering the whole list in one JNI invocation has the potential for 
further optimizations on the Java side (like handling the whole popped 
list privately without additional synchronization - if we ever find a 
way for java.nio.Bits to wait for it reliably - or even enqueue-ing a 
chunk of consecutive references destined for the same queue using a 
single synchronized action on the queue, etc...)

If the JNI API was something like the following:

     /* Atomically pop the pending reference list if wholeList is true,
      * or just next pending reference if wholeList is false.
      * If wait is true and the pending reference list is empty, blocks 
until
      * it becomes non-empty, or returns null if wait is false.
      */
     private static native Reference<?> popReferencePendingList(boolean 
wholeList, boolean wait);

Then not too much complication is needed in Reference.java (diff to 
current JDK 9 sources) to consume this API and already have some benefit 
from it:

http://cr.openjdk.java.net/~plevart/misc/PendingReferenceHandling/webrev/

There is a possible race here between ReferenceHandler calling 
tryHandlePending(true) and java.nio.Bits calling tryHandlePending(false) 
that can make the method return false to java.nio.Bits when 
ReferenceHandler has just popped the whole list and not yet installed it 
in the private pendingList field, but java.nio.Bits makes several 
retries in this case with exponentially increasing delays, so that does 
not currently present a problem. 
java/nio/Buffer/DirectBufferAllocTest.java test passes with this change.

And the above benchmark shows improvement instead of regression:

Proposed (my Peter):

Benchmark                                (refCount)  Mode  Cnt 
Score      Error  Units
ReferenceEnqueueBench.dequeueReferences      100000    ss  100 34134.977 
± 1274.753  us/op

So what do you think? Is it worth the little additional logic on the 
Java side?

Regards, Peter