deadlock with jni NewDirectByteBuffer called from multiple threads introduced in JDK 1.6.0_04

Thu Jan 8 15:43:42 PST 2009

Keith,

I can see the problem - in fact I think I was involved in the code 
change that has triggered the deadlock. :(

Two threads are concurrently trying to use direct buffers and they are
racing to initialize direct buffer support. The first thread has got the 
job of doing the initialization and is looking up classes which leads to 
one thing then another and ultimately a safepoint is requested. 
Meanwhile the other thread that lost the initialization race enters a 
polling loop waiting for the first thread to complete the initialization.

The problem is that while in this polling loop (a sched_yield() call) 
the thread is marked as ThreadInVM, which means that the VMThread will 
wait for it to reach the safepoint. As it never does anything to
reach a safepoint, the VMThread keeps waiting; so the
initialization thread keeps waiting, and so the second thread keeps 
waiting - deadlock.
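
To make the shape of the race concrete, here is a minimal model in portable C++. It is a sketch under stated assumptions, not HotSpot source: every name in it (init_started, init_done, init_runs, do_expensive_init, ensure_initialized) is hypothetical. It only shows the "winner initializes, loser polls with a yield" structure described above.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names throughout: a model of the pattern, not HotSpot code.
std::atomic<bool> init_started{false};
std::atomic<bool> init_done{false};
std::atomic<int>  init_runs{0};

// Stand-in for the one-time setup work (class lookups and so on).
void do_expensive_init() { ++init_runs; }

void ensure_initialized() {
    bool expected = false;
    if (init_started.compare_exchange_strong(expected, true)) {
        // Winner of the race: performs the initialization exactly once.
        do_expensive_init();
        init_done.store(true, std::memory_order_release);
    } else {
        // Loser of the race: polls until the winner has finished.
        // In the JVM case described above, the thread sits in a loop like
        // this while marked ThreadInVM, so a safepoint requested during the
        // winner's initialization can never complete.
        while (!init_done.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
}
```

On its own this model terminates, because nothing in it waits for the polling thread. The deadlock needs the third party from the description: a VMThread that will not let the winner continue until the polling thread reaches a safepoint.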

The code change that caused this was the change of the thread's state to 
ThreadInVM. This was done because on Solaris the os::yield_all call can 
turn into an os_sleep call and that requires the thread to be 
ThreadInVM. On Linux the os::yield_all call is just sched_yield and so
the state change is not only not needed but dangerous.

I will file a bug for this immediately.

There should be a workaround however: don't have a race to initialize
the direct buffers. If you can insert a call to NewDirectByteBuffer
early in the app's lifetime, from one thread, then initialization will
occur with no race and this problem won't occur.
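
A minimal sketch of that workaround, assuming the launcher can run one extra call on the thread that created the JVM before any client-handling threads start. The helper name warm_up_direct_buffers is hypothetical. It is templated on the environment type only so the sketch compiles without JDK headers; in a real build you would include <jni.h> and pass the JNIEnv* for the current thread.

```cpp
// Hypothetical helper (not part of JNI). Call it once, from a single thread,
// before any other thread can reach NewDirectByteBuffer.
// Env is assumed to be JNIEnv from <jni.h> in real code.
template <typename Env>
bool warm_up_direct_buffers(Env* env) {
    static char scratch[1];  // static storage, so the address stays valid

    // The first NewDirectByteBuffer call triggers the VM's one-time direct
    // buffer setup; making it here means no other thread can race with it.
    auto buf = env->NewDirectByteBuffer(scratch, sizeof scratch);
    if (buf == nullptr) {
        // Creation failed: clear any pending Java exception and report it.
        if (env->ExceptionCheck()) env->ExceptionClear();
        return false;
    }

    env->DeleteLocalRef(buf);  // the buffer itself is not needed
    return true;
}
```

The natural place to call it is immediately after JNI_CreateJavaVM returns, using the JNIEnv* that call hands back, and before the network layer starts accepting connections.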

David Holmes

Keith McNeill said the following on 01/09/09 09:15:
> Here is a gdb stack dump from linux64.  Look for NewDirectByteBuffer to 
> find the calls.
> 
> David Holmes - Sun Microsystems wrote:
>> Hi Keith,
>>
>> What platform are you on? Can you see where threads block inside 
>> NewDirectByteBuffer?
>>
>> On Solaris pstack would show you what state the process is in. I think
>> Linux has similar functionality, but don't know about Windows.
>>
>> David Holmes
>>
>> Keith McNeill said the following on 01/09/09 06:22:
>>> Our software has a C++ network layer using a large java runtime via 
>>> JNI.  When new clients connect to our server we make some 
>>> NewDirectByteBuffer calls so that we can pass data from the C++
>>> network layer to the Java runtime system.   We use the JVM
>>> invocation JNI interface (i.e. we startup with our own exe rather 
>>> than java.exe).  This same basic setup has been running for several 
>>> years.
>>>
>>> We have recently found that we can get what appears to be deadlock 
>>> within calls to NewDirectByteBuffer.   Debugging we can see multiple 
>>> threads down in the guts of NewDirectByteBuffer blocked.    Once the 
>>> deadlock occurs the JVM is hosed.  We can't get stack dumps from it, 
>>> can't do anything with it. This problem is complicated to reproduce 
>>> but we can do it reliably.
>>> We have been able to reproduce this with JDK 1.6.0_04 through JDK 
>>> 1.6.0_11.  We haven't been able to reproduce with JDK 1.6.0_03 down 
>>> through JDK 1.5.
>>>
>>> Any suggestions on the best way to debug this JDK problem?
>>>
>>> Keith
>>>
>>>