RFR: 8266310: deadlock while loading the JNI code

David Holmes david.holmes at oracle.com
Thu May 13 13:05:35 UTC 2021


Hi Aleksei,

On 13/05/2021 9:54 pm, Aleksei Voitylov wrote:
> Hi David,
> 
> On 12/05/2021 10:56, David Holmes wrote:
>> Hi Aleksei,
>>
>> On 11/05/2021 11:19 pm, Aleksei Voitylov wrote:
>>> Please review this PR which fixes the deadlock in ClassLoader between
>>> the two lock objects - a lock object associated with the class being
>>> loaded, and the ClassLoader.loadedLibraryNames hash map, locked
>>> during the native library load operation.
>>>
>>> Problem being fixed:
>>>
>>> The initial reproducer demonstrated a deadlock between the
>>> JarFile/ZipFile and the hash map. That deadlock exists even when the
>>> ZipFile/JarFile lock is removed because there's another lock object
>>> in the class loader, associated with the name of the class being
>>> loaded. Such objects are stored in ClassLoader.parallelLockMap. The
>>> deadlock occurs when JNI_OnLoad() loads exactly the same class, whose
>>> signature is being verified in another thread.
>>>
>>> Proposed fix:
>>>
>>> The proposed patch suggests to get rid of locking loadedLibraryNames
>>> hash map and synchronize on each entry name, as it's done with class
>>> names in see ClassLoader.getClassLoadingLock(name) method.
>>
>> The ClassLoader per-name locking map has a number of problems so I'm
>> not sure using it as a model is a good idea. In particular your new
>> map only grows, with entries never being removed AFAICS.
> OK. I'm working on a version of a patch which addresses this concern.
>>
>> This is a difficult deadlock to solve. Even using a per-entry lock it
>> isn't obvious to me that you can't still get a deadlock.
> 
> I believe the patch solves the deadlock when different libraries are in
> play. Yes, it's still possible to get a deadlock when a library with a
> JNI_Onload instantiates a class that requires the same library to be
> loaded. Is this a valid use case to begin with for future use and why
> such a recursion should be allowed? If not, I'd like to file a CSR for
> JNI specification with the intent to forbid such behavior. It's obvious
> there are no current users of such behavior because it deadlocks.

So in the current case we have:

Thread 1:
  - loads native lib and acquires load-library lock
  - JNI_Onload tries to initialize class X but can't get the class init lock

Thread 2:
  - initializes class X and has the class init lock
  - <clinit> tries to load a native library and can't get the 
load-library lock

If the library lock is finer granularity then we can reduce the risk of 
deadlock but not remove it.

I don't think we can try to forbid anything through the JNI 
specification in such a case. This is really just one of a number of 
class initialization deadlocks that are possible. At most the spec could 
caution against such things, but there is no way to detect it and so 
"forbid" it. And you can construct similar deadlocks without class 
initialization being involved.

>>
>> My own thoughts on this problem were that we should not be calling
>> JNI_OnLoad with any lock held. But that risks use of a library in a
>> separate thread before the JNI+OnLoad has completed. As I said this is
>> a difficult deadlock to solve - and may not have a complete solution.
> 
> Thanks, this is a very appealing idea. While it's true dlopen on Linux
> and Mac OS are explicitly marked thread safe (and LoadLibrary on Windows
> can probably be assumed so as well, though is not marked so in Microsoft
> documentation), all implementations use reference counting techniques,
> and the number of calls to open has to be equal to the number of calls
> to close. I don't know how to achieve that without locks.

Not every problem has a solution :) If JNI_OnLoad has to execute 
atomically with respect to loading a library then there will always be a 
deadlock potential. The only complete solution would not hold a lock 
while JNI_OnLoad is executed, but that completely changes the semantics 
of native library loading.

Cheers,
David

> -Aleksei
> 
>>
>> David
>> -----
>>
>>> The patch introduces nativeLibraryLockMap which holds the lock
>>> objects for each library name, and the getNativeLibraryLock() private
>>> method is used to lazily initialize the corresponding lock object.
>>> nativeLibraryContext was changed to ThreadLocal, so that in any
>>> concurrent thread it would have a NativeLibrary object on top of the
>>> stack, that's being currently loaded/unloaded in that thread.
>>> nativeLibraryLockMap accumulates the names of all native libraries
>>> loaded - in line with class loading code, it is not explicitly cleared.
>>>
>>> Testing:  jtreg and jck testing with no regressions. A new regression
>>> test was developed.
>>>
>>> -------------
>>>
>>> Commit messages:
>>>    - JDK-8266310: deadlock while loading the JNI code
>>>    - JDK-8266310: deadlock while loading the JNI code
>>>
>>> Changes: https://git.openjdk.java.net/jdk/pull/3976/files
>>>    Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3976&range=00
>>>     Issue: https://bugs.openjdk.java.net/browse/JDK-8266310
>>>     Stats: 475 lines in 6 files changed: 456 ins; 0 del; 19 mod
>>>     Patch: https://git.openjdk.java.net/jdk/pull/3976.diff
>>>     Fetch: git fetch https://git.openjdk.java.net/jdk
>>> pull/3976/head:pull/3976
>>>
>>> PR: https://git.openjdk.java.net/jdk/pull/3976
>>>
> 


More information about the core-libs-dev mailing list