RFR: 8156500: deadlock provoked by new stress test com/sun/jdi/OomDebugTest.java

Kim Barrett kim.barrett at oracle.com
Mon Aug 15 03:15:26 UTC 2016


> On Aug 13, 2016, at 4:20 AM, Peter Levart <peter.levart at gmail.com> wrote:
> 
> Hi Mandy,
> 
> On 08/13/2016 01:55 AM, Mandy Chung wrote:
>>> On Aug 8, 2016, at 6:25 PM, Kim Barrett <kim.barrett at oracle.com>
>>>  wrote:
>>> 
>>> full: 
>>> http://cr.openjdk.java.net/~kbarrett/8156500/jdk.04/
>>> 
>>>      
>>> http://cr.openjdk.java.net/~kbarrett/8156500/hotspot.04/
>> This looks very good.  
>> 
>> Have you considered having the JVM_WaitForReferencePendingList method return the pending list?  i.e. make it a blocking version of getAndClearReferencePendingList rather than two operations.
>> 
>> waitForReferenceProcessing really waits for the pending reference(s) to be enqueued (additionally, cleaners get invoked).  What do you think about renaming this method to “waitForPendingReferencesEnqueued”?
>> 
> 
> I think the split is intentional. It's a clever way to avoid an otherwise inevitable race that could cause a DBB-allocating thread to fail with OOME prematurely.
> 
> waitForReferencePendingList() is invoked outside the synchronized block so that it does not prevent the progress of thread(s) invoking waitForReferenceProcessing(), while getAndClearReferencePendingList() is non-blocking and is invoked inside the synchronized block that also sets a boolean flag. Code in waitForReferenceProcessing() checks both the non-emptiness of the reference pending list and the boolean flag while holding the lock.
> 
> The following sequence of actions (as used in NIO Bits):
> 
> System.gc();
> Reference.waitForReferenceProcessing();
> 
> ...therefore either:
> 
> - completes immediately after System.gc() returns without discovering any pending Reference, returning false; or
> - completes after System.gc() discovers at least one pending Reference and the ReferenceHandler thread either processes at least one Cleaner, returning true, or enqueues all pending Reference(s), returning false.
> 
> When waitForReferenceProcessing() returns false consecutively even after a series of exponentially increasing pauses, the DBB allocating thread(s) can be sure they have exhausted all options to allocate the direct buffer and must fail with OOME.

Thanks for answering Mandy’s questions for me.  That all looks right.
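
For readers following along, here is a minimal sketch of the split Peter describes. It is a toy model, not the code in the webrev: the class name, the lock objects, and the use of a plain queue of Runnables in place of the VM's pending list are all assumptions made for illustration. The point is only that the blocking wait happens outside the processing lock, while the non-blocking get-and-clear and the flag update happen together under it.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of the protocol; names do not match the JDK sources.
class PendingListModel {
    private final Object vmLock = new Object();          // stands in for VM-side synchronization
    private final Queue<Runnable> pending = new ArrayDeque<>();
    private final Object processingLock = new Object();  // guards the "processing" flag
    private boolean processing = false;

    // "GC" side: publish a discovered reference (modelled as a Runnable).
    void addPending(Runnable r) {
        synchronized (vmLock) {
            pending.add(r);
            vmLock.notifyAll();
        }
    }

    private boolean hasPending() {
        synchronized (vmLock) { return !pending.isEmpty(); }
    }

    // Blocking step; deliberately called WITHOUT holding processingLock.
    private void waitForPendingList() throws InterruptedException {
        synchronized (vmLock) {
            while (pending.isEmpty()) vmLock.wait();
        }
    }

    // Non-blocking step; called while holding processingLock.
    private Queue<Runnable> getAndClearPendingList() {
        synchronized (vmLock) {
            Queue<Runnable> batch = new ArrayDeque<>(pending);
            pending.clear();
            return batch;
        }
    }

    // One iteration of the handler thread's loop.
    void processOnce() throws InterruptedException {
        waitForPendingList();
        Queue<Runnable> batch;
        synchronized (processingLock) {
            // Taking the list and setting the flag are atomic with respect to
            // waiters, so a waiter never sees "list empty, nothing active"
            // while a batch is still in flight.
            batch = getAndClearPendingList();
            processing = true;
        }
        try {
            for (Runnable r : batch) r.run();
        } finally {
            synchronized (processingLock) {
                processing = false;
                processingLock.notifyAll();
            }
        }
    }

    // Allocator side: returns false only if there was nothing to wait for.
    boolean waitForProcessing() throws InterruptedException {
        synchronized (processingLock) {
            if (processing || hasPending()) {
                processingLock.wait();
                return true;
            }
            return false;
        }
    }
}
```

If the wait and the get-and-clear were merged into one blocking call, the handler would either have to block while holding processingLock (stalling waiters) or take the list before setting the flag (leaving a window where a waiter wrongly concludes nothing is in progress).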

> I have a feeling that these pauses are now unnecessary. Will try to check with some experiments…

I found that the DirectBufferAllocTest will sometimes fail if the pauses are taken out.
I think what’s going on is that the multiple threads are competing for resources, and
some threads in that test lose out if all of them are waiting and wake up at the same
time.  The exponentially increasing back-off scatters the threads enough for that to
become very unlikely, though with sufficiently bad luck… But I think the current
implementation could also fail that test with similarly bad luck.  It just requires *very*
bad luck, so we’re not seeing it as a problem.  And that test is a pretty extreme stress
test.
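
To make the role of the pauses concrete, here is a rough sketch of a retry loop with exponentially increasing back-off. It is not the Bits code: the class, the RefProcessing interface, and the maxSleeps parameter are hypothetical stand-ins, and the reservation attempt is passed in as a BooleanSupplier. The thread only sleeps when the wait reports that no reference processing was in progress, and each sleep doubles, which is what spreads competing threads apart.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch; names and structure are assumptions, not the JDK code.
final class BackoffReserve {

    // Stand-in for a waitForReferenceProcessing()-style call.
    interface RefProcessing {
        boolean waitForProcessing() throws InterruptedException;
    }

    // Returns true once tryReserve succeeds, or false after maxSleeps
    // back-off pauses without success (where a caller would throw OOME).
    static boolean reserve(BooleanSupplier tryReserve,
                           RefProcessing refs,
                           int maxSleeps) throws InterruptedException {
        long sleepMillis = 1;
        int sleeps = 0;
        while (true) {
            if (tryReserve.getAsBoolean()) return true;
            if (sleeps >= maxSleeps) return false;
            if (!refs.waitForProcessing()) {
                // Nothing was being processed: pause, then double the pause
                // so that threads woken together drift apart.
                Thread.sleep(sleepMillis);
                sleepMillis <<= 1;
                sleeps++;
            }
            // Otherwise some processing happened; retry immediately.
        }
    }
}
```

With every waiter retrying at once and no pause, a thread can repeatedly lose the race for freed memory and give up early, which matches the occasional failures I saw.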

> 
> Regards, Peter
> 
>> Grammar question:
>> 
>> "there are no more references”
>> "If there aren't any pending {@link Reference}s"
>> 
>> - This is the case with zero pending reference, shouldn’t it use “is” such that:
>> 
>> "there is no more reference”
>> "If there isn't any pending {@link Reference}”
>> 
>> Please update the synopsis of JDK-8156500 to make it clear what this fix is (it’s nothing related to JDI).  Maybe something like “Move the pending reference list to VM”
>> 
>> Mandy



