Analysis on JDK-8022321 java/lang/ref/OOMEInReferenceHandler.java fails intermittently

Wed Mar 23 20:42:06 UTC 2016

Hi Kim,

Thinking more about your approach. Basically your idea is to detect that 
there are no more unprocessed but pending or enqueued Cleanables by 
timing out on waiting for next Cleanable to be processed. In that case 
the timeout should be reset when each Cleanable is detected to be 
processed so that when there's a "silence" detected for at least the 
whole timeout period, we can claim with enough probability that there 
are no more unprocessed Cleanables either pending or enqueued and that 
we can give up with OOME.

Let me try to see with a prototype if this approach leads to success...

Regards, Peter

On 03/23/2016 08:33 PM, Peter Levart wrote:
> Hi Kim,
>
> On 03/23/2016 07:55 PM, Kim Barrett wrote:
>>> On Mar 23, 2016, at 10:02 AM, Peter Levart<peter.levart at gmail.com>  wrote:
>>> ...so I checked what it would be needed if there was such getPendingReferences() native method. It turns out that a single native method would not be enough to support the precise direct ByteBuffer allocation. Here's a refactored webrev that introduces a getPendingReferences() method which could be turned into a native equivalent one day. There is another native method needed - int awaitEnqueuePhaseStart():
>>>
>>> http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.09.part2/
>> I don't think the Reference.awaitEnqueuePhaseStart thing is needed.
>>
>> Rather, I think the Direct-X-Buffer allocation should conspire with
>> the the Direct-X-Buffer cleanups directly to manage that sort of
>> thing, and not add anything to Reference and the reference processing
>> thread.  E.g. the phase and signal/wait are purely part of
>> Direct-X-Buffer.  (I also think something like that could/should have
>> been done instead of providing Direct-X-Buffer with access to
>> Reference.tryHandlePending, but that's likely water under the bridge
>> now.)
>>
>> Something very roughly like this:
>>
>> allocating thread, after allocation failed
>>
>> bool waitForCleanups() {
>>    int epoch = DXB.getCleanupCounter();
>>    long start = startTime();
>>    long timeout = calcTimeout(start)
>>    synchronized (DXB.getCleanupMonitor()) {
>>      while (epoch == DBX.getCleanupCounter()) {
>>        wait(timeout);
>>        timeout = calcTimeout(start);
>>        if (timeout <= 0) break;
>>      }
>>      return epoch != DBX.getCleanupCounter();
>>    }
>> }
>>
>> cleanup function, after freeing memory
>>
>>    synchronized (DBX.getCleanupMonitor()) {
>>      DBX.incCleanupCounter();
>>      DBX.getCleanupMonitor().notify_all();
>>    }
>>
>> Actually, epoch should probably have been obtained *before* the failed
>> allocation attempt, and should be an argument to waitForCleanups.
>>
>> That's all quite sketchy, but I need to do other things today.
>>
>> Peter, care to try filling this in?
>>
>
> There's no need to maintain a special cleanup counter as java.nio.Bits 
> already maintains the amount of currently allocated direct memory (in 
> bytes). What your suggestion leads to is similar to one of previous 
> versions of java.nio.Bits which waited for some 'timeout' time after 
> invoking System.gc() and then re-tried reservation, failing if it 
> didn't succeed. The problem with such "asynchronous" approach is that 
> there's no right value of 'timeout' for all situations. If you wait 
> for to short time, you might get OOME although there are plenty 
> unreachable but still uncleaned direct buffers. If you wait for to 
> long, your throughput will suffer. There has to be some "feedback" 
> from reference processing to know when there's still beneficial to 
> wait and when there's no point in waiting any more.
>
> Regards, Peter
>