RFR: JDK-8066859 java/lang/ref/OOMEInReferenceHandler.java failed with java.lang.Exception: Reference Handler thread died

Thu May 7 10:57:08 UTC 2015

Hi,

I see the intermittent failures have already been fixed (I hadn't 
noticed this before) under this issue:

     https://bugs.openjdk.java.net/browse/JDK-8067751

The fix was simply to add the "-XX:-UseGCOverheadLimit" option to the 
VM options.

So my conclusion that the remaining intermittent test failures stem from 
Cleaner.clean() throwing OOME was right. It's just that the cause of the 
OOME seems to be "GC overhead limit exceeded".
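
To make the approach from my previous message (quoted below) more 
concrete, here is a rough, self-contained Java sketch of the idea: if 
handling the first exception itself fails with an OOME, remember the 
first exception and retry the handling later. This is not the actual 
Cleaner code. DeferredHandlingSketch, Reporter, retryPending() and 
hasPending() are hypothetical names made up for illustration, and the 
Reporter only stands in for the real reporting and exit() logic.

```java
// Illustrative sketch only; all names here are hypothetical, not JDK API.
class DeferredHandlingSketch {

    /** Hypothetical stand-in for the code that reports a failure before exit(). */
    interface Reporter {
        void report(Throwable t);
    }

    // First exception whose handling had to be postponed, or null if none.
    private Throwable pending;

    /**
     * Runs the thunk. Returns true if it completed normally or its failure
     * was reported, false if reporting failed with OOME and was deferred.
     */
    boolean clean(Runnable thunk, Reporter reporter) {
        try {
            thunk.run();
            return true;
        } catch (Throwable first) {
            return tryReport(first, reporter);
        }
    }

    /** Retries reporting of a deferred exception, if there is one. */
    boolean retryPending(Reporter reporter) {
        Throwable t = pending;
        return t == null || tryReport(t, reporter);
    }

    boolean hasPending() {
        return pending != null;
    }

    private boolean tryReport(Throwable t, Reporter reporter) {
        try {
            reporter.report(t);
            pending = null;   // handled, nothing left to retry
            return true;
        } catch (OutOfMemoryError oome2) {
            pending = t;      // keep the 1st exception for "better times"
            return false;
        }
    }
}
```

A caller would invoke retryPending() on a later iteration of its loop, 
once memory pressure may have eased. Storing the exception in a 
preexisting field avoids allocating anything on the failure path.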

I always thought that a thread can only get OOME when it tries to 
allocate something. Now the above issue indicates that OOME can be 
thrown in a thread at any time. Is my understanding correct? How and 
when is a thread chosen then for this type of OOME?

Regards, Peter

On 05/07/2015 12:25 PM, Peter Levart wrote:
>
>
> On 05/07/2015 09:06 AM, Laurent Bourgès wrote:
>>
>> Peter,
>>
>> I looked at Cleaner out of curiosity and it does not seem to catch 
>> the OOME from thunk.run!
>>
>> If oome1 is thrown by thunk.run at line 150 then it is caught at 
>> line 157, but your new try/catch block (oome2) only encloses the 
>> doPrivileged block.
>>
>> If this block also throws a new oome2 because of the first oome1 (no 
>> memory left), it will work, but I would have preferred a more 
>> explicit solution that checks oome1 first ...
>>
>> My 2 cents (I am not a reviewer).
>>
>> Laurent
>>
>
> Laurent,
>
> You have a point and I asked myself the same question. The question is 
> how to treat OOME thrown from thunk.run(). The current behavior is to 
> exit() the JVM for any exception (Throwable). I kept those semantics. 
> I only added a handler for an OOME thrown in the handler of the 1st 
> exception. I could have just exit()-ed the VM when OOME is thrown, but 
> exiting the VM without leaving a trace would not help anyone diagnose 
> what went wrong. So I opted for keeping the VM running for a while by 
> delaying the handling of the 1st exception to "better times". If better 
> times never come, then the application is probably stuck anyway.
>
> An alternative would be to catch OOME from thunk.run() and ignore it 
> (printing it out would be ugly if the VM is left to run), but that 
> would silently ignore OOMEs thrown from thunk.run() and no one would 
> notice that Cleaner(s) might not have cleaned up the resources they 
> should.
>
> The complete fix would be to inspect the code paths of all 
> Cleaner.thunk's run() methods and see if and where they can throw 
> exceptions (OOMEs in particular) and whether they can be prevented. I 
> did that and found myself wandering deep in HotSpot code that I 
> don't understand completely. Cleaners are used in the following places:
>
> - in java.lang.invoke.CallSite, to invalidate the dependent nmethods 
> when the context class is GC-ed - that one was added recently. 
> Intermittent failures of the test predate its addition, so I would 
> not suspect this one immediately.
>
> - in sun.misc.Perf, to detach the memory of a native ByteBuffer 
> obtained from Perf.attach() when the ByteBuffer is GC-ed. The Java 
> code-path does not have any allocations and the native code that does 
> detaching is in hotspot/src/share/vm/prims/perf.cpp:
>
> static JNINativeMethod perfmethods[] = {
>     ...
>   {CC"detach",              CC"("BB")V", FN_PTR(Perf_Detach)},
>     ...
>
> PERF_ENTRY(void, Perf_Detach(JNIEnv *env, jobject unused, jobject buffer))
>
>   PerfWrapper("Perf_Detach");
>
>   if (!UsePerfData) {
>     // With -XX:-UsePerfData, detach is just a NOP
>     return;
>   }
>
>   void* address = 0;
>   jlong capacity = 0;
>
>   // get buffer address and capacity
>   {
>    ThreadToNativeFromVM ttnfv(thread);
>    address = env->GetDirectBufferAddress(buffer);
>    capacity = env->GetDirectBufferCapacity(buffer);
>   }
>
>   PerfMemory::detach((char*)address, capacity, CHECK);
>
> PERF_END
>
> - in sun.nio.ch.IOVecWrapper, to deallocate native memory associated 
> with the wrapper when it is GC-ed. The Java code-path does not perform 
> any allocations and finally just calls native 
> Unsafe.freeMemory(allocationAddress) where allocationAddress was 
> obtained from Unsafe.allocateMemory(size). The native code for 
> Unsafe.freeMemory is in hotspot/src/share/vm/prims/unsafe.cpp:
>
> static JNINativeMethod methods[] = {
>     ...
>     {CC"freeMemory",         CC"("ADR")V", FN_PTR(Unsafe_FreeMemory)},
>     ...
>
> UNSAFE_ENTRY(void, Unsafe_FreeMemory(JNIEnv *env, jobject unsafe, jlong addr))
>   UnsafeWrapper("Unsafe_FreeMemory");
>   void* p = addr_from_java(addr);
>   if (p == NULL) {
>     return;
>   }
>   os::free(p);
> UNSAFE_END
>
> - in sun.nio.fs.NativeBuffer, to deallocate native memory allocated 
> for NativeBuffer by Unsafe.allocateMemory(size) with exactly the same 
> Cleaner.thunk (Deallocator) as used in sun.nio.ch.IOVecWrapper.
>
> Can anyone confirm whether the above two native methods can throw any 
> exception or not?
>
> Anyway. If none of the Cleaner.thunk's run() methods can throw any 
> exception, then my handling of OOME is redundant and a code-path never 
> taken. But I would still leave it there in case some new Cleaner use 
> comes along which is not verified yet...
>
> Regards, Peter
>