A guarantee failure with CMSClassUnloadingMaxInterval

Hiroshi Yamauchi yamauchi at google.com
Wed May 15 21:20:06 UTC 2013


Hi,

I'm getting the following JVM crash (a guarantee failure) somewhat
intermittently with the flag -XX:CMSClassUnloadingMaxInterval=N (where
N > 0). I have a theory about why it's crashing. It'd be great if
someone could tell me whether I'm looking at this right.

Here goes:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (dictionary.cpp:267), pid=30153, tid=1325398848
#  guarantee(!is_alive->do_object_b(k_def_class_loader)) failed: defining loader should not be live if klass is not
#
# JRE version: 7.0_17-b02
# Java VM: Java HotSpot(TM) Server VM (23.7-b01 mixed mode linux-x86 )
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x4eeb9400):  VMThread [stack: 0x4ef7f000,0x4f000000] [id=30176]

Stack: [0x4ef7f000,0x4f000000],  sp=0x4effe7c0,  free space=509k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x720449]  VMError::report_and_die()+0x199
V  [libjvm.so+0x2e8829]  report_vm_error(char const*, int, char const*, char const*)+0x49
V  [libjvm.so+0x33b08f]  Dictionary::do_unloading(BoolObjectClosure*)+0x8ef
V  [libjvm.so+0x6aeec5]  SystemDictionary::do_unloading(BoolObjectClosure*)+0x25
V  [libjvm.so+0x39f741]  GenMarkSweep::mark_sweep_phase1(int, bool)+0xd1
V  [libjvm.so+0x39fce5]  GenMarkSweep::invoke_at_safepoint(int, ReferenceProcessor*, bool)+0x105
V  [libjvm.so+0x2d3b62]  CMSCollector::do_compaction_work(bool)+0x142
V  [libjvm.so+0x2d4d4e]  CMSCollector::acquire_control_and_collect(bool, bool)+0x19e
V  [libjvm.so+0x2d5226]  ConcurrentMarkSweepGeneration::collect(bool, bool, unsigned int, bool)+0xe6
V  [libjvm.so+0x39e805]  GenCollectedHeap::do_collection(bool, bool, unsigned int, bool, int)+0x4a5
V  [libjvm.so+0x28f4c7]  GenCollectorPolicy::satisfy_failed_allocation(unsigned int, bool)+0xe7
V  [libjvm.so+0x7215f4]  VM_GenCollectForAllocation::doit()+0x74
V  [libjvm.so+0x72ae61]  VM_Operation::evaluate()+0x41
V  [libjvm.so+0x729748]  VMThread::evaluate_operation(VM_Operation*)+0x78
V  [libjvm.so+0x729cd8]  VMThread::loop()+0x1e8
V  [libjvm.so+0x72a375]  VMThread::run()+0x85
V  [libjvm.so+0x5df1f1]  java_start(Thread*)+0x111
C  [libpthread.so.0+0x6d4c]  start_thread+0xcc

VM_Operation (0x4cd8d6b8): GenCollectForAllocation, mode: safepoint, requested by thread 0x4cc38c00


Background: when CMSClassUnloadingEnabled is on, this flag should make
class unloading run less frequently, namely once every N+1 concurrent
collections, as opposed to at every concurrent collection.
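
To make the N+1 cadence concrete, here is a minimal standalone model of
the counter. It is a toy, not HotSpot code. It assumes the
"cycles since last unload" counter is reset to 0 after a cycle that
unloads classes and incremented after a cycle that does not; that
bookkeeping is my assumption. The names unloading_cycles and
max_interval are made up for illustration, and the other unloading
triggers (perm gen occupancy, explicit GC) are ignored.

```cpp
#include <vector>

// Toy model, not HotSpot code. Assumption: the counter is reset to 0 after
// a cycle that unloads classes and incremented after one that does not.
// Returns, for each simulated concurrent cycle, whether it unloads classes.
std::vector<bool> unloading_cycles(int max_interval, int num_cycles) {
  std::vector<bool> unloads;
  int cycles_since_last_unload = 0;
  for (int i = 0; i < num_cycles; ++i) {
    // Same comparison as in the quoted update_should_unload_classes().
    bool should_unload = cycles_since_last_unload >= max_interval;
    unloads.push_back(should_unload);
    cycles_since_last_unload =
        should_unload ? 0 : cycles_since_last_unload + 1;
  }
  return unloads;
}
```

Under these assumptions, max_interval = 2 unloads on every third cycle
and max_interval = 0 unloads on every cycle.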

Here's what I found.

The failing guarantee is in Dictionary::do_unloading() in
dictionary.cpp, around line 262 in
http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/file/f8075a623349/src/share/vm/classfile/dictionary.cpp:

         // The loader in this entry is alive. If the klass is dead,
         // the loader must be an initiating loader (rather than the
         // defining loader). Remove this entry.
         if (!is_alive->do_object_b(e)) {
           guarantee(!is_alive->do_object_b(k_def_class_loader),
                     "defining loader should not be live if klass is not");
           // If we get here, the class_loader must not be the defining
           // loader, it must be an initiating one.
           assert(k_def_class_loader != class_loader,
                  "cannot have live defining loader and unreachable klass");

It seems that at the time of the crash, e is not alive but
k_def_class_loader is alive (hence the guarantee failure), and the
debugger indicates that k_def_class_loader == class_loader (which
means the assert that follows would also fail). I'm not sure what this
really means (I'm not a class (un)loading expert), but it appears that
a class (un)loading invariant is being broken when
CMSClassUnloadingMaxInterval is set.
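
Spelled out as a standalone sketch (a toy model, not HotSpot code), the
invariant behind the guarantee is: if a dictionary entry's klass is
dead, its defining loader must be dead too. The Entry struct and its
boolean fields are hypothetical stand-ins for what
is_alive->do_object_b() reports.

```cpp
// Toy model, not HotSpot code. The booleans stand in for what
// is_alive->do_object_b() would report for each object.
struct Entry {
  bool klass_alive;            // is_alive->do_object_b(e)
  bool defining_loader_alive;  // is_alive->do_object_b(k_def_class_loader)
};

// Returns true if the entry satisfies the invariant behind the guarantee:
// a dead klass must not have a live defining loader.
bool invariant_holds(const Entry& entry) {
  if (entry.klass_alive) {
    return true;  // the guarantee is only checked for dead klasses
  }
  return !entry.defining_loader_alive;
}
```

The crash corresponds to Entry{false, true}: the klass is dead but its
defining loader is still considered live.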

Here's a theory of what I think might be happening:

In the CMS initialization sequence, the root scanning option is set
statically (once per JVM invocation) to either SO_AllClasses or
SO_SystemClasses based on the flag value of CMSClassUnloadingEnabled
(at line 810 in
http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/file/f8075a623349/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp):

 // Choose what strong roots should be scanned depending on verification options
 // and perm gen collection mode.
 if (!CMSClassUnloadingEnabled) {
   // If class unloading is disabled we want to include all classes
   // into the root set.
   add_root_scanning_option(SharedHeap::SO_AllClasses);
 } else {
   add_root_scanning_option(SharedHeap::SO_SystemClasses);
 }

This code appears to assume that the choice depends solely on the
value of the CMSClassUnloadingEnabled flag.

Then, at the beginning of each concurrent collection, the following
code decides whether to perform class unloading (at line 3223 in
http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/file/f8075a623349/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp):

bool CMSCollector::update_should_unload_classes() {
  _should_unload_classes = false;
  // Condition 1 above
  if (_full_gc_requested && ExplicitGCInvokesConcurrentAndUnloadsClasses) {
    _should_unload_classes = true;
  } else if (CMSClassUnloadingEnabled) { // Condition 2.a above
    // Disjuncts 2.b.(i,ii,iii) above
    _should_unload_classes = (concurrent_cycles_since_last_unload() >=
                              CMSClassUnloadingMaxInterval)
                           || _permGen->should_concurrent_collect()
                           || _cmsGen->is_too_full();
  }
  return _should_unload_classes;
}

If CMSClassUnloadingMaxInterval is 0 (the default) and
CMSClassUnloadingEnabled is true, _should_unload_classes will always
be true (that is, class unloading is performed in every concurrent
collection), because concurrent_cycles_since_last_unload() always
returns 0 and 0 >= 0 holds.

But if CMSClassUnloadingMaxInterval > 0, _should_unload_classes won't
always be true (and we won't perform class unloading in every
concurrent collection), even if CMSClassUnloadingEnabled is true.
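
The two cases can be checked against a pure-function restatement of the
quoted decision. This is only a sketch: the UnloadInputs struct is
hypothetical, and its fields are plain values standing in for the flags
and collector state that the real code reads.

```cpp
// Toy restatement of the quoted decision, not HotSpot code. All inputs are
// plain values standing in for the flags and state read by the original.
struct UnloadInputs {
  bool full_gc_requested;             // _full_gc_requested
  bool explicit_gc_unloads_classes;   // ExplicitGCInvokesConcurrentAndUnloadsClasses
  bool class_unloading_enabled;       // CMSClassUnloadingEnabled
  unsigned cycles_since_last_unload;  // concurrent_cycles_since_last_unload()
  unsigned max_interval;              // CMSClassUnloadingMaxInterval
  bool perm_gen_should_collect;       // _permGen->should_concurrent_collect()
  bool cms_gen_too_full;              // _cmsGen->is_too_full()
};

bool should_unload_classes(const UnloadInputs& in) {
  if (in.full_gc_requested && in.explicit_gc_unloads_classes) {
    return true;
  }
  if (in.class_unloading_enabled) {
    return in.cycles_since_last_unload >= in.max_interval
        || in.perm_gen_should_collect
        || in.cms_gen_too_full;
  }
  return false;
}
```

With max_interval = 0 the first disjunct is always true. With
max_interval = 3 and a counter of 0, the result is false unless one of
the other two disjuncts fires.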

Should the collector instead pick SO_AllClasses or SO_SystemClasses
based on the result of the update_should_unload_classes() call at
every concurrent collection, as opposed to solely based on the value
of the CMSClassUnloadingEnabled flag once at initialization?

As an experiment, if I insert this code:

 remove_root_scanning_option(SharedHeap::SO_SystemClasses |
                             SharedHeap::SO_AllClasses);
 if (should_unload_classes()) {
   add_root_scanning_option(SharedHeap::SO_SystemClasses);
 } else {
   add_root_scanning_option(SharedHeap::SO_AllClasses);
 }

in CMSCollector::setup_cms_unloading_and_verification_state(), around
line 3254 in http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/file/f8075a623349/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp,

then the crash no longer seems to occur. I don't know whether this
does the right thing or is just masking the bug.
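
For reference, the bit-flag behaviour the experiment relies on can be
sketched in isolation. This is a toy model, not HotSpot code: the
ToyCollector type, the choose_class_roots name, and the enum values are
made up for illustration, and it says nothing about whether the change
is actually correct for the collector.

```cpp
// Toy model, not HotSpot code. The enum values are illustrative; only the
// bit-flag behaviour matters.
enum ScanningOption : unsigned {
  SO_None          = 0x0,
  SO_AllClasses    = 0x1,
  SO_SystemClasses = 0x2,
};

struct ToyCollector {
  unsigned root_scanning_options = SO_None;

  void add_root_scanning_option(unsigned o)    { root_scanning_options |= o; }
  void remove_root_scanning_option(unsigned o) { root_scanning_options &= ~o; }

  // Mirrors the experimental change: clear both class-root bits, then set
  // exactly one of them based on this cycle's unloading decision.
  void choose_class_roots(bool should_unload_classes) {
    remove_root_scanning_option(SO_SystemClasses | SO_AllClasses);
    if (should_unload_classes) {
      add_root_scanning_option(SO_SystemClasses);
    } else {
      add_root_scanning_option(SO_AllClasses);
    }
  }
};
```

Clearing both bits first means the two options never end up set at the
same time, and any unrelated option bits are left untouched.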

I could reproduce this crash with JDK 7u17. I could not reproduce it
with JDK 8, probably because the failing guarantee was removed in
JDK 8 (around line 168 in
http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/1ae0472ff3a0/src/share/vm/classfile/dictionary.cpp).
I'm not sure what the removal of the guarantee means, as it was part
of what looks like a perm gen removal change.

Anyhow, it'd be nice to be able to use CMSClassUnloadingMaxInterval.

Thanks,
Hiroshi



More information about the hotspot-gc-dev mailing list