RFR(S): 8248851: CMS: Missing memory fences between free chunk check and klass read

Yangfei (Felix) felix.yang at huawei.com
Tue Jul 7 03:22:20 UTC 2020


Hi,

We have been seeing random JVM crashes in our production environment.
We are running an aarch64 jdk8u release build with -XX:+UseConcMarkSweepGC.
We see three different crash logs, as reported on the issue.
Debugging shows that this is caused by missing memory fences in CMS on systems with a weak memory model, such as aarch64.

The overall procedure in CMS promotion looks like:

ConcurrentMarkSweepGeneration::par_promote {

  HeapWord* obj_ptr = ps->lab.alloc(alloc_sz);
          |---> CFLS_LAB::alloc
                          |--->FreeChunk::markNotFree
	 
  oop obj = oop(obj_ptr);
  OrderAccess::storestore();
  
  obj->set_mark(m);
  OrderAccess::storestore();
  
  // Finally, install the klass pointer (this should be volatile).
  OrderAccess::storestore();
  obj->set_klass(old->klass());

  ......

void markNotFree() {
     // Set _prev (klass) to null before (if) clearing the mark word below
     _prev = NULL;
#ifdef _LP64
     if (UseCompressedOops) {
       OrderAccess::storestore();
       set_mark(markOopDesc::prototype());
     }
#endif
     assert(!is_free(), "Error");
}

From the first crash log on the issue, the crash site was in CompactibleFreeListSpace::block_size().
We found that on aarch64 the klass load can be performed before the free chunk check in CompactibleFreeListSpace::block_size(), because the check only creates a control dependency between the two reads (see reference [1]).
The reader may then see an invalid non-null klass, which leads to the crash.  The same issue exists in CompactibleFreeListSpace::block_is_obj(), which leads to the other two crashes.
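
To make the ordering problem concrete, here is a minimal standalone sketch of the pattern, written with std::atomic instead of HotSpot's OrderAccess. It is not the CMS code and not the patch in the webrev: BlockHeader, Klass, kFreeBit, mark_not_free(), install_klass() and this block_size() are hypothetical stand-ins, and returning 0 for "klass not installed yet" is an assumption made to keep the example short. The point it illustrates is the acquire fence between the free check and the klass read on the reader side, which pairs with the release (storestore) fences on the promoting side.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins; the real HotSpot layouts differ.
struct Klass { std::size_t instance_words; };

constexpr std::uintptr_t kFreeBit = 1;  // assumed "this block is a free chunk" flag

struct BlockHeader {
    std::atomic<std::uintptr_t> mark;   // carries the free-chunk flag
    std::atomic<const Klass*>   klass;  // slot reused by free-list data while free
    std::size_t free_chunk_words;       // size while the block is a free chunk

    explicit BlockHeader(std::size_t free_words)
        : mark(kFreeBit), klass(nullptr), free_chunk_words(free_words) {}
};

// Promoting thread, step 1: clear the klass slot, then publish "not free".
void mark_not_free(BlockHeader& b) {
    b.klass.store(nullptr, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);  // plays the role of storestore
    b.mark.store(0, std::memory_order_relaxed);
}

// Promoting thread, step 2: install the klass pointer last.
void install_klass(BlockHeader& b, const Klass* k) {
    std::atomic_thread_fence(std::memory_order_release);  // plays the role of storestore
    b.klass.store(k, std::memory_order_relaxed);
}

// Concurrent reader: returns the block size in words, or 0 if the object
// is allocated but its klass is not installed yet (the caller would retry).
std::size_t block_size(const BlockHeader& b) {
    if (b.mark.load(std::memory_order_relaxed) & kFreeBit) {
        return b.free_chunk_words;
    }
    // Without this fence, a weakly ordered CPU may satisfy the klass load
    // before the free check above, because the branch is only a control
    // dependency. The klass slot could then hold leftover free-list data.
    std::atomic_thread_fence(std::memory_order_acquire);
    const Klass* k = b.klass.load(std::memory_order_relaxed);
    return k != nullptr ? k->instance_words : 0;
}
```

With the fence in place, a reader that observes the block as "not free" is guaranteed to see either a null klass or the installed klass, never the earlier contents of that slot.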

Webrev for jdk8u-dev: http://cr.openjdk.java.net/~fyang/8248851/webrev.00/ 
JTreg tested on x86_64-linux-gnu and aarch64-linux-gnu with jdk8u release builds.
Comments?

Thanks,
Felix

[1] Reference: ARMv8 Architecture Reference Manual, K11.6.1:
This restriction applies only when the data value returned by a read is used as a data value to calculate the
address of a subsequent read or write. It does not apply if the data value returned by a read determines the
condition flags values, and the values of the flags are used for condition code evaluation to determine the
address of a subsequent read, either through conditional execution or the evaluation of a branch. This is called
a control dependency.

