RFR: 8264634: CollectCLDClosure collects duplicated CLDs when dumping dynamic archive [v2]

Ioi Lam iklam at openjdk.java.net
Sat Apr 3 01:45:31 UTC 2021


On Fri, 2 Apr 2021 10:21:56 GMT, Yi Yang <yyang at openjdk.org> wrote:

>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (/home/qingfeng.yy/openjdk16_so_warning/jdk/src/hotspot/share/classfile/classLoaderData.cpp:316), pid=68929, tid=68930
>> #  assert(_keep_alive > 0) failed: Invalid keep alive decrement count
>> #
>> # JRE version: OpenJDK Runtime Environment (17.0) (slowdebug build 17-internal+0-adhoc.qingfengyy.jdk)
>> # Java VM: OpenJDK 64-Bit Server VM (slowdebug 17-internal+0-adhoc.qingfengyy.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
>> # Problematic frame:
>> # V  [libjvm.so+0x781087]  ClassLoaderData::dec_keep_alive()+0x31
>> 
>> Stack: [0x00007f1593072000,0x00007f1593173000],  sp=0x00007f1593171c00,  free space=1023k
>> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V  [libjvm.so+0x781087]  ClassLoaderData::dec_keep_alive()+0x31
>> V  [libjvm.so+0xef19e7]  MetaspaceShared::link_and_cleanup_shared_classes(Thread*)+0x181
>> V  [libjvm.so+0x1260834]  JavaThread::invoke_shutdown_hooks()+0x46
>> V  [libjvm.so+0x12609e5]  Threads::destroy_vm()+0xe7
>> V  [libjvm.so+0xbb40ec]  jni_DestroyJavaVM_inner+0x91
>> V  [libjvm.so+0xbb4147]  jni_DestroyJavaVM+0x1f
>> C  [libjli.so+0x4b4f]  JavaMain+0xc61
>> C  [libjli.so+0xad93]  ThreadJavaMain+0x27
>> We observed VM crashed when dumping dynamic archive in a simple springboot application(See detailed content on JBS attachment). I did some investigations. In rare case, both of the following paths may be stepped on when dumping dynamic archive:
>> 
>> 1. SIGINT
>> at java.lang.Shutdown.beforeHalt(java.base at 17-internal/Native Method)
>> at java.lang.Shutdown.exit(java.base at 17-internal/Shutdown.java:172)
>> - locked <0x00000007fef02040> (a java.lang.Class for java.lang.Shutdown)
>> at java.lang.Terminator$1.handle(java.base at 17-internal/Terminator.java:51)
>> at jdk.internal.misc.Signal$1.run(java.base at 17-internal/Signal.java:219)
>> at java.lang.Thread.run(java.base at 17-internal/Thread.java:831)
>> 
>> 2. Normal Exit
>> JavaThread::invoke_shutdown_hooks()+0x46
>> Threads::destroy_vm()+0xe7
>> jni_DestroyJavaVM_inner+0x91
>> jni_DestroyJavaVM+0x1f
>> JavaMain+0xc61
>> ThreadJavaMain+0x27
>> 
>> They would call MetaspaceShared::link_and_cleanup_shared_classes, and CollectCLDClosure collects duplicated CLDs into _loaded_cld, _keep_alive is decrementing twice, causing a negative _keep_alive.
>> 
>> Testing(linux_x64):
>> [+] test/hotspot/jtreg/runtime/cds
>> [+] test/hotspot/jtreg/gc
>
> Yi Yang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
> 
>   CollectCLDClosure collects duplicated CLDs when dumping dynamic archive

The fix looks reasonable. If MetaspaceShared::link_and_cleanup_shared_classes may be called twice, it's better to isolate the loaded_cld for each invocation. Allocating it locally will also avoid any potential threading issues.

I have some requests for cleaning up the code.

src/hotspot/share/memory/metaspaceShared.cpp line 569:

> 567:   ResourceMark rm;
> 568:   GrowableArray<ClassLoaderData*> loaded_cld;
> 569:   CollectCLDClosure collect_cld(&loaded_cld);

I think we should add a comment to say why it's necessary to first collect the ClassLoaderDatas first:

// ClassLoaderDataGraph::loaded_cld_do requires ClassLoaderDataGraph_lock.
// We cannot link the classes while holding this lock (or else we may run into deadlock).
// Therefore, we need to first collect all the CLDs, and then link their classes after
// releasing the lock.

src/hotspot/share/memory/metaspaceShared.cpp line 600:

> 598:     cld->dec_keep_alive();
> 599:   }
> 600:   loaded_cld.trunc_to(0);

There's no need for the trucate -- `loaded_cld` is locally allocated and will be freed after this function returns.

Also, to improve modularity, I think we should move the dec_keep_alive loop into the destructor of CollectCLDClosure.

Also, `loaded_cld` can be moved as a field into CollectCLDClosure.

-------------

Changes requested by iklam (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3320


More information about the hotspot-runtime-dev mailing list