RFR: 8266310: deadlock while loading the JNI code [v2]

David Holmes david.holmes at oracle.com
Thu May 20 23:11:02 UTC 2021


Hi Peter,

On 21/05/2021 12:42 am, Peter Levart wrote:
> Hi Aleksei,
> 
> Are you trying to solve this in principle or do you have a concrete 
> problem at hand which triggers this deadlock? If it is the later, then 
> some rearrangement of code might do the trick... For example, native 
> libraries are typically loaded by a class initializer of some class that 
> is guaranteed to be initialized before the 1st invocation of a native 
> method from such library. But if such class can also be loaded and 
> initialized by some other trigger, deadlock can occur. Best remedy for 
> such situation is to move all native methods to a special class that 
> serves just for interfacing with native code and also contains an 
> initializer that loads the native library and nothing else. Such 
> arrangement would ensure that the order of taking locks is always the 
> same: classLoadingLock -> nativeLibraryLock ...

There were specific examples for this problem, but Aleksei is trying to 
define a general solution - which unfortunately doesn't exist.

The basic deadlock scenario is a special variant of the general class 
initialization deadlock:

Thread 1:
- loadLibrary
  - acquire loadLibrary global lock
    - call JNI_OnLoad
      - use class Foo (which needs to be loaded and initialized)
        - block acquiring <clinit> lock for Foo

Thread 2:
  - Initialize class Foo
   - Acquire <clinit> lock for Foo
     - <clinit>
       - call loadLibrary(x) // for any X
        - block acquiring loadLibrary global lock

We can reduce the chance of deadlock by using a per-native-library lock 
instead of the global loadLibrary lock - which is what Aleksei's initial 
version did. But we cannot remove all deadlock possibility because we 
must ensure only one thread can be executing JNI_OnLoad for any given 
native library.

Cheers,
David
-----


> Regards, Peter
> 
> On 5/20/21 12:31 AM, David Holmes wrote:
>> On 20/05/2021 2:29 am, Aleksei Voitylov wrote:
>>> On Wed, 19 May 2021 16:21:41 GMT, Aleksei Voitylov 
>>> <avoitylov at openjdk.org> wrote:
>>>
>>>>> Please review this PR which fixes the deadlock in ClassLoader 
>>>>> between the two lock objects - a lock object associated with the 
>>>>> class being loaded, and the ClassLoader.loadedLibraryNames hash 
>>>>> map, locked during the native library load operation.
>>>>>
>>>>> Problem being fixed:
>>>>>
>>>>> The initial reproducer demonstrated a deadlock between the 
>>>>> JarFile/ZipFile and the hash map. That deadlock exists even when 
>>>>> the ZipFile/JarFile lock is removed because there's another lock 
>>>>> object in the class loader, associated with the name of the class 
>>>>> being loaded. Such objects are stored in 
>>>>> ClassLoader.parallelLockMap. The deadlock occurs when JNI_OnLoad() 
>>>>> loads exactly the same class, whose signature is being verified in 
>>>>> another thread.
>>>>>
>>>>> Proposed fix:
>>>>>
>>>>> The proposed patch suggests to get rid of locking 
>>>>> loadedLibraryNames hash map and synchronize on each entry name, as 
>>>>> it's done with class names in see 
>>>>> ClassLoader.getClassLoadingLock(name) method.
>>>>>
>>>>> The patch introduces nativeLibraryLockMap which holds the lock 
>>>>> objects for each library name, and the getNativeLibraryLock() 
>>>>> private method is used to lazily initialize the corresponding lock 
>>>>> object. nativeLibraryContext was changed to ThreadLocal, so that in 
>>>>> any concurrent thread it would have a NativeLibrary object on top 
>>>>> of the stack, that's being currently loaded/unloaded in that 
>>>>> thread. nativeLibraryLockMap accumulates the names of all native 
>>>>> libraries loaded - in line with class loading code, it is not 
>>>>> explicitly cleared.
>>>>>
>>>>> Testing:  jtreg and jck testing with no regressions. A new 
>>>>> regression test was developed.
>>>>
>>>> Aleksei Voitylov has updated the pull request incrementally with one 
>>>> additional commit since the last revision:
>>>>
>>>>    address review comments, add tests
>>>
>>> Dear colleagues,
>>>
>>> The updated PR addresses review comment regarding ThreadLocal as well 
>>> as David' concern around the lock being held during 
>>> JNI_OnLoad/JNI_OnUnload calls, and ensures all lock objects are 
>>> deallocated. Multiple threads are allowed to enter 
>>> NativeLibrary.load() to prevent any thread from locking while another 
>>> thread loads a library. Before the update, there could be a class 
>>> loading lock held by a parallel capable class loader, which can 
>>> deadlock with the library loading lock. As proposed by David Holmes, 
>>> the library loading lock was removed because dlopen/LoadLibrary are 
>>> thread safe and they maintain internal reference counters on 
>>> libraries. There's still a lock being held while a pair of containers 
>>> are read/updated. It's not going to deadlock as there's no lock/wait 
>>> operation performed while that lock is held. Multiple threads may 
>>> create their own copies of NativeLibrary object and register it for 
>>> auto unloading.
>>>
>>> Tests for auto unloading were added along with the PR update. There 
>>> are now 3 jtreg tests:
>>> - one checks for deadlock, similar to the one proposed by Chris Hegarty
>>> - two other tests are for library unload.
>>>
>>> The major side effect of that multiple threads are allowed to enter 
>>> is that JNI_OnLoad/JNI_OnUnload may be called multiple (but same) 
>>> number of times from concurrent threads. In particular, the number of 
>>> calls to JNI_OnLoad must be equal to the number of calls to 
>>> JNI_OnUnload after the relevant class loader is garbage collected. 
>>> This may affect the behaviour that relies on specific order or the 
>>> number of JNI_OnLoad/JNI_OnUnload calls. The current JNI 
>>> specification does not mandate how many times JNI_OnLoad/JNI_OnUnload 
>>> are called. Also, we could not locate tests in jck/jtreg/vmTestbase 
>>> that would rely on the specific order or number of calls to 
>>> JNI_OnLoad/JNI_OnUnload.
>>
>> But you can't make such a change! That was my point. To fix the 
>> deadlock we must not hold a lock. But we must ensure only a single 
>> call to JNI_OnLoad is possible. It is an unsolvable problem with those 
>> constraints. You can't just change the behaviour of JNI_OnLoad like that.
>>
>> David
>> -----
>>
> 
> If this is really a problem that several people are facing, then perhaps 
> a change in the API could solve it. I'm thinking
> 
>>> Thank you Alan Bateman, David Holmes and Chris Hegarty for your 
>>> valuable input.
>>>
>>> -------------
>>>
>>> PR: https://git.openjdk.java.net/jdk/pull/3976
>>>
> 


More information about the core-libs-dev mailing list