RFR(XS): 8248219: aarch64: missing memory barrier in fast_storefield and fast_accessfield

Yangfei (Felix) felix.yang at huawei.com
Wed Jun 24 09:38:49 UTC 2020


Hi,

  Bug: https://bugs.openjdk.java.net/browse/JDK-8248219 
  We were witnessing random JVM crashes that trigger about 2 or 3 times each day in one of our production environments.
  Debugging shows this is triggered after bytecode patching.

  Normal template interpreter sequence:
    TemplateTable::putfield_or_static  ->  TemplateTable::resolve_cache_and_index  ->  InterpreterRuntime::resolve_from_cache  -> InterpreterRuntime::resolve_get_put

  The cpCache entry is updated after that.

  Then we do bytecode patching in TemplateTable::putfield_or_static and dispatch to the next bytecode through InterpreterMacroAssembler::dispatch_next.

void InterpreterMacroAssembler::dispatch_next(TosState state, int step, bool generate_poll) {
  // load next bytecode
  ldrb(rscratch1, Address(pre(rbcp, step)));
  dispatch_base(state, Interpreter::dispatch_table(state), generate_poll);
}

  After that we switch to the fast path: TemplateTable::fast_storefield.  This will read the cpCache entry.
  But we may read invalid cpCache entry values because a memory barrier is missing:
  the cpCache entry loads may be scheduled before the bytecode load, and even before the setting of the cpCache entry.
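The race can be illustrated with a small C++ model (all names here are illustrative, not HotSpot code): the resolving thread fills in the entry and then publishes the patched bytecode; the executing thread must not let its entry loads drift before its bytecode load.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of the interpreter race; names are illustrative.
struct CpCacheEntry { long f2; int flags; };

std::atomic<int> bytecode{0};   // 0 = unpatched (slow path), 1 = fast_storefield
CpCacheEntry entry{0, 0};

void resolver() {
  entry.f2 = 42;                // plain stores: the entry is filled in first
  entry.flags = 7;
  bytecode.store(1, std::memory_order_release);  // then the bytecode is patched
}

long executor() {
  // Without acquire semantics (or an address dependency), the entry loads
  // below may be satisfied before the bytecode load on a weakly ordered
  // CPU such as AArch64 -- that is the bug being described.
  if (bytecode.load(std::memory_order_acquire) == 1) {
    return entry.f2;            // with acquire, guaranteed to observe 42
  }
  return -1;
}
```

In the interpreter, the bytecode load has neither acquire semantics nor an address dependency on the subsequent cp cache loads, which is exactly the gap the proposed patches close.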

Reference:
ARMv8 Architecture Reference Manual, section K11.6.1:
This restriction applies only when the data value returned by a read is used as a data value to calculate the
address of a subsequent read or write. It does not apply if the data value returned by a read determines the
condition flags values, and the values of the flags are used for condition code evaluation to determine the
address of a subsequent read, either through conditional execution or the evaluation of a branch. This is called
a control dependency.

One fix, which adds explicit memory barriers, looks like this:
diff -r abdc7ca79bdf src/hotspot/cpu/aarch64/templateTable_aarch64.cpp
--- a/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp Tue Jun 23 13:41:55 2020 +0300
+++ b/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp Wed Jun 24 15:01:21 2020 +0800
@@ -2975,6 +2975,9 @@
   // access constant pool cache
   __ get_cache_and_index_at_bcp(r2, r1, 1);

+  // Must prevent reordering of the following cp cache loads with bytecode load
+  __ membar(MacroAssembler::LoadLoad);
+
   // test for volatile with r3
   __ ldrw(r3, Address(r2, in_bytes(base +
                                    ConstantPoolCacheEntry::flags_offset())));
@@ -3067,6 +3070,10 @@

   // access constant pool cache
   __ get_cache_and_index_at_bcp(r2, r1, 1);
+
+  // Must prevent reordering of the following cp cache loads with bytecode load
+  __ membar(MacroAssembler::LoadLoad);
+
   __ ldr(r1, Address(r2, in_bytes(ConstantPoolCache::base_offset() +
                                   ConstantPoolCacheEntry::f2_offset())));
   __ ldrw(r3, Address(r2, in_bytes(ConstantPoolCache::base_offset() +


Another approach avoids the memory barrier by introducing an artificial data dependency:
--- a/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp Tue Jun 23 13:41:55 2020 +0300
+++ b/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp Wed Jun 24 15:43:49 2020 +0800
@@ -2975,6 +2975,11 @@
   // access constant pool cache
   __ get_cache_and_index_at_bcp(r2, r1, 1);

+  // use artificial data dependency from the bytecode load to the following
+  // cp cache loads to avoid a load-load barrier
+  __ eor(rscratch2, rscratch1, rscratch1);
+  __ orr(r2, r2, rscratch2);
+
   // test for volatile with r3
   __ ldrw(r3, Address(r2, in_bytes(base +
                                    ConstantPoolCacheEntry::flags_offset())));
@@ -3067,6 +3072,12 @@

   // access constant pool cache
   __ get_cache_and_index_at_bcp(r2, r1, 1);
+
+  // use artificial data dependency from the bytecode load to the following
+  // cp cache loads to avoid a load-load barrier
+  __ eor(rscratch2, rscratch1, rscratch1);
+  __ orr(r2, r2, rscratch2);
+
   __ ldr(r1, Address(r2, in_bytes(ConstantPoolCache::base_offset() +
                                   ConstantPoolCacheEntry::f2_offset())));
   __ ldrw(r3, Address(r2, in_bytes(ConstantPoolCache::base_offset() +
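Functionally, the eor/orr pair is a no-op: for any loaded value v, v ^ v is zero, so or-ing it into r2 leaves r2 unchanged, while at the ISA level the address register now depends on the loaded bytecode. A small C++ sketch of the identity (illustrative only, not part of the patch):

```cpp
#include <cassert>
#include <cstdint>

// Models the identity behind the trick:
//   eor rscratch2, rscratch1, rscratch1   ; rscratch2 = v ^ v = 0
//   orr r2, r2, rscratch2                 ; r2 = r2 | 0 = r2
// The result is always 'base', but the address register now carries a
// data dependency on the loaded bytecode value.
uint64_t with_dependency(uint64_t base, uint64_t loaded_bytecode) {
  uint64_t zero = loaded_bytecode ^ loaded_bytecode;  // always 0
  return base | zero;                                 // always base
}
```

Note that this only works in hand-written assembler: a C++ compiler would fold v ^ v to the constant 0 and discard the dependency, whereas AArch64 hardware must respect the register dependency of the emitted eor/orr and therefore orders the dependent cp cache loads after the bytecode load.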

  Suggestions?

Thanks,
Felix


More information about the hotspot-runtime-dev mailing list