Redundant barrier elimination

Lindenmaier, Goetz goetz.lindenmaier at sap.com
Wed Feb 12 08:20:57 PST 2014


Hi,

during the PPC port, we encountered some problems with the current
representation of the barriers.

1.) We can do load-acquire (ld-twi-isync) on PPC.  Therefore we implement
    MemBarAcquire as an empty node.
    But there are places where MemBarAcquire is issued without a
    corresponding dedicated load.  To distinguish these cases, we introduced
    MemBarLoadFence.
    Further, there are graphs where a ld.acq is followed by a membar
    instruction (sync or lwsync); in this case we can omit the -twi-isync.
    We check this during matching by calling followed_by_acquire() in the
    matcher predicate (comparable to Matcher::post_store_load_barrier());
    a rough sketch of such a check is given after this list.
2.) The same holds for st.rel on IA64.
3.) MemBarVolatileNode is specified to do a StoreLoad barrier.  On PPC,
    we match it to a node that issues the 'sync' instruction, which is the
    only instruction that provides a StoreLoad barrier.
    But 'sync' also implies all the other barriers, so we could coalesce it
    with any other MemBar node.
4.) We think that in do_exits() a MemBarStoreStore suffices.
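
For illustration, a very rough sketch of the kind of check
followed_by_acquire() performs; this is not the actual PPC port code,
the names and the graph walk are simplified:

// Sketch only: decide during matching whether the trailing twi/isync of
// a load-acquire can be omitted because a heavier barrier follows anyway.
static bool load_covered_by_later_membar(const Node* load) {
  // Look at the direct users of the load; a following MemBarAcquire or
  // MemBarVolatile (matched to lwsync/sync on PPC) already orders it.
  for (DUIterator_Fast imax, i = load->fast_outs(imax); i < imax; i++) {
    Node* u = load->fast_out(i);
    int op = u->Opcode();
    if (op == Op_MemBarAcquire || op == Op_MemBarVolatile) {
      return true;  // the membar's instruction covers the acquire
    }
  }
  return false;
}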

As a solution, I could imagine a generic node that indicates by four flags
which barriers it must enforce (LoadLoad, LoadStore, StoreLoad, StoreStore).
An optimization on the ideal graph could then coalesce adjacent nodes by
or-ing their flags, and the matcher could simply match the cheapest
instruction that provides the required barriers.
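
A minimal sketch of what we have in mind (hypothetical code, nothing
like this exists yet; the real node would of course be a MemBarNode
subclass):

// Hypothetical generic barrier node carrying its required orderings as
// a bit mask, so adjacent barriers can be coalesced by or-ing the masks.
enum BarrierBits {
  LoadLoad   = 1 << 0,
  LoadStore  = 1 << 1,
  StoreLoad  = 1 << 2,
  StoreStore = 1 << 3
};

class MemBarGenericNode {            // sketch; would derive from MemBarNode
  int _bits;                         // orderings this barrier must enforce
public:
  MemBarGenericNode(int bits) : _bits(bits) {}
  int  bits() const                  { return _bits; }
  // Ideal-graph optimization: merge an adjacent barrier into this one.
  void absorb(const MemBarGenericNode* other) { _bits |= other->bits(); }
  // With no bits set the node only constrains compiler (CPU) ordering.
  bool is_cpu_order_only() const     { return _bits == 0; }
};

// The matcher then picks the cheapest instruction whose effect is a
// superset of bits(), e.g. on PPC: nothing, lwsync, or sync.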

Barriers that are implemented as empty should not be issued into the
ideal graph at all.  There should be a way to configure per platform
which barriers are not needed; possibly such barriers should be replaced
by MemBarCPUOrder.  Also, MemBarCPUOrder should not be issued if there
already is another MemBar operation at that point.
CPU ordering could be modeled by the node proposed above with none of
the flags set.
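
A sketch of how the per-platform configuration could look; the names
are invented, nothing like this exists today:

// Hypothetical per-platform queries, in the spirit of the other Matcher
// constants; the parser would consult them before emitting a barrier.
//   static bool Matcher::needs_acquire_membar();  // false on PPC (ld-twi-isync)
//   static bool Matcher::needs_release_membar();  // false on IA64 (st.rel)
//
// Used roughly like this when parsing a volatile load (GraphKit kit):
//   if (Matcher::needs_acquire_membar()) {
//     kit.insert_mem_bar(Op_MemBarAcquire);
//   } else {
//     kit.insert_mem_bar(Op_MemBarCPUOrder);  // keep only compiler ordering
//   }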

In addition, on PPC we could peel the barrier operations off cmpxchg
and represent them as individual IR nodes.  These could then be subject
to further optimization, too; see the sketch below.
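
To illustrate the intended IR shape (hypothetical; today the PPC cmpxchg
match rules emit the fences inside one instruct):

//  today (fences hidden inside the matched instruction):
//      CompareAndSwapI   =>  sync; larx/stcx loop; isync
//
//  proposed (fences peeled off as ordinary ideal nodes):
//      MemBarRelease     =>  lwsync (or nothing, if coalesced away)
//      CompareAndSwapI   =>  bare larx/stcx loop
//      MemBarAcquire     =>  isync  (or nothing, if coalesced away)
//
//  The peeled-off MemBars are then visible to the same coalescing and
//  redundancy elimination as every other barrier.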

Best regards,
  Goetz and Martin


-----Original Message-----
From: hotspot-compiler-dev-bounces at openjdk.java.net [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doug Lea
Sent: Wednesday, 12 February 2014 15:41
To: hotspot compiler
Subject: Redundant barrier elimination


While exploring JMM update options, we noticed that hotspot
will tend to pile up memory barrier nodes without bothering to
coalesce them into one barrier instruction. Considering the likelihood
that JMM updates will accentuate pile-ups, I looked into improvements.

Currently, there is only one case where coalescing is attempted.
Matcher::post_store_load_barrier does a TSO-specific
forward pass that handles only MemBarVolatile. This is a
harder case than others, because it takes into account that
other MemBars are no-ops on TSO. It is (or
should be) called only from the dfa on x86 and sparc.
So it does not apply on processors for which MemBarAcquire and
MemBarRelease are not no-ops. But for all (known) processors,
you can always do an easier check for redundancy, buttressed
by hardware-model-specific ones like post_store_load_barrier
when applicable. I put together the following, which does a
basic check, but I don't offhand know of a cpu-independent
place to call it from. Needing to invoke this from each barrier
case in each .ad file seems suboptimal (a sketch of what that
would look like follows below). Any advice would be welcome.
Or perhaps suggestions about placing similar functionality
somewhere other than Matcher?
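
(For concreteness, the per-.ad-file plumbing I'd rather avoid would look
roughly like the existing x86 handling of post_store_load_barrier, i.e. a
zero-cost "unnecessary" variant guarded by a predicate; sketch only:

instruct unnecessary_membar_acquire()
%{
  predicate(Matcher::is_redundant_barrier(n));
  match(MemBarAcquire);
  ins_cost(0);

  size(0);
  format %{ "MEMBAR-acquire (redundant, empty encoding)" %}
  ins_encode();
  ins_pipe(empty);
%}

repeated for each barrier kind in each .ad file.)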

Thanks!

... diffs from JDK9 (warning: I haven't even tried to compile this)

diff -r 4c8bda53850f src/share/vm/opto/matcher.cpp
--- a/src/share/vm/opto/matcher.cpp	Thu Feb 06 13:08:44 2014 -0800
+++ b/src/share/vm/opto/matcher.cpp	Wed Feb 12 09:07:17 2014 -0500
@@ -2393,6 +2393,54 @@
    return false;
  }

+// Detect if current barrier is redundant.  Returns true if there is
+// another upcoming barrier or atomic operation with at least the same
+// properties before next store or load. Assumes that MemBarVolatile
+// and CompareAndSwap* provide "full" fences, and that non-biased
+// FastLock/Unlock provide acquire/release
+bool Matcher::is_redundant_barrier(const Node* vmb) {
+  Compile* C = Compile::current();
+  assert(vmb->is_MemBar(), "");
+  const MemBarNode* membar = vmb->as_MemBar();
+  int vop = vmb->Opcode();
+
+  // Get the Ideal Proj node, ctrl, that can be used to iterate forward
+  Node* ctrl = NULL;
+  for (DUIterator_Fast imax, i = membar->fast_outs(imax); i < imax; i++) {
+    Node* p = membar->fast_out(i);
+    assert(p->is_Proj(), "only projections here");
+    if ((p->as_Proj()->_con == TypeFunc::Control) &&
+        !C->node_arena()->contains(p)) { // Unmatched old-space only
+      ctrl = p;
+      break;
+    }
+  }
+  assert((ctrl != NULL), "missing control projection");
+
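+  // Scan the users of the control projection for a covering barrier/atomic.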
+  for (DUIterator_Fast jmax, j = ctrl->fast_outs(jmax); j < jmax; j++) {
+    Node *x = ctrl->fast_out(j);
+    int xop = x->Opcode();
+
+    if (xop == vop ||
+        xop == Op_MemBarVolatile ||
+        xop == Op_CompareAndSwapL ||
+        xop == Op_CompareAndSwapP ||
+        xop == Op_CompareAndSwapN ||
+        xop == Op_CompareAndSwapI ||
+        (!UseBiasedLocking &&
+         ((xop == Op_FastLock && vop == Op_MemBarAcquire) ||
+          (xop == Op_FastUnlock && vop == Op_MemBarRelease)))) {
+      return true;
+    }
+
+    if (x->is_Load() || x->is_Store() || x->is_LoadStore() ||
+        x->is_Call() || x->is_SafePoint() || x->is_block_proj()) {
+      break;
+    }
+  }
+  return false;
+}
+
  //=============================================================================
  //---------------------------State---------------------------------------------
  State::State(void) {




