Redundant barrier elimination
Lindenmaier, Goetz
goetz.lindenmaier at sap.com
Wed Feb 12 08:20:57 PST 2014
Hi,
During the PPC port, we encountered some problems with the current
representation of the memory barriers.
1.) We can do load-acquire (ld-twi-isync) on PPC, therefore we implement
MemBarAcquire as empty.
But there are places where a MemBarAcquire is issued that does not
correspond to a dedicated load. To distinguish these cases, we introduced
MemBarLoadFence.
Further, there are graphs where a ld.acq is followed by a membar instruction
(sync or lwsync); in that case we can omit the trailing twi-isync. We check this
during matching by calling followed_by_acquire() in the matcher predicate
(comparable to Matcher::post_store_load_barrier()); see the sketch after point 4 below.
2.) Something similar holds for st.rel on ia64.
3.) MemBarVolatileNode is specified to do a StoreLoad barrier. On PPC,
we match it to a node that issues the 'sync' instruction, which is the
only PPC instruction that provides a StoreLoad barrier.
But 'sync' also provides all the other barriers, so it could be coalesced
with any other MemBar node.
4.) We think that in do_exits() a MemBarStoreStore suffices.
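To illustrate point 1, here is a simplified sketch of that kind of check. This is
not the actual followed_by_acquire() from the PPC port; the helper name and the
exact set of opcodes tested are illustrative only.

#include "opto/memnode.hpp"
#include "opto/type.hpp"

// Simplified sketch: given the load of a load-acquire pattern, return true
// if the attached MemBarAcquire is anyway followed by a barrier that is
// emitted as sync or lwsync, so the trailing twi-isync can be omitted.
static bool acquire_covered_by_following_barrier(Node* load) {
  assert(load->is_Load(), "only loads expected here");

  for (DUIterator_Fast imax, i = load->fast_outs(imax); i < imax; i++) {
    Node* use = load->fast_out(i);
    if (use->Opcode() != Op_MemBarAcquire) continue;

    // Walk the control successors of the MemBarAcquire and look for a
    // barrier that already orders at least as strongly.
    Node* ctrl = use->as_MemBar()->proj_out(TypeFunc::Control);
    if (ctrl == NULL) continue;
    for (DUIterator_Fast jmax, j = ctrl->fast_outs(jmax); j < jmax; j++) {
      int op = ctrl->fast_out(j)->Opcode();
      if (op == Op_MemBarVolatile || op == Op_MemBarRelease) {
        return true;  // a sync/lwsync will be emitted anyway
      }
    }
  }
  return false;
}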
As a solution, I could imagine a generic node that indicates by four flags
which barriers it must execute (LoadLoad, LoadStore, etc.). An optimization
on the ideal graph could then coalesce adjacent nodes by or-ing the flags. The
matcher could then match the cheapest instruction that provides the required barriers.
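A hypothetical sketch of such a generic node follows. None of these names exist
in HotSpot; a real node would also need an entry in opto/classes.hpp and matching
rules in the .ad files.

#include "opto/memnode.hpp"

// The four barrier kinds are kept as a bit set, so coalescing two
// adjacent barriers is just or-ing the masks.
class MemBarGenericNode : public MemBarNode {
public:
  enum Kind {
    LoadLoad   = 1 << 0,
    LoadStore  = 1 << 1,
    StoreLoad  = 1 << 2,
    StoreStore = 1 << 3
  };

  MemBarGenericNode(Compile* C, int alias_idx, Node* precedent, int kinds)
    : MemBarNode(C, alias_idx, precedent), _kinds(kinds) {}

  int kinds() const { return _kinds; }

  // Ideal-graph coalescing: fold an adjacent barrier into this one by
  // or-ing its flags; the other node can then be removed.
  void absorb(const MemBarGenericNode* other) { _kinds |= other->kinds(); }

  // With no flags set the node only constrains compiler reordering,
  // i.e. it plays the role of today's MemBarCPUOrder.
  bool is_cpu_order_only() const { return _kinds == 0; }

private:
  int _kinds;  // or-ed Kind flags
};

The matcher would then pick, per platform, the cheapest instruction whose ordering
covers the or-ed flags (on PPC, for example, sync only if StoreLoad is among them).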
Barriers that are implemented as empty should not be issued to the
ideal graph at all. There should be a way to configure per platform which
barriers are not needed. Eventually they should be replaced by
MemBarCPUOrder. Also, MemBarCPUOrder should not be issued
if there is another MemBar operation at the same place.
CPU order could be modeled by the node proposed above if none of
the flags is set.
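One conceivable shape for that per-platform configuration, building on the
hypothetical node sketched above (again, the names here are invented):

// Hypothetical per-platform hook: mask out the barrier kinds the CPU
// provides implicitly before a barrier node is even created. If nothing
// remains, either no node is issued at all, or a flag-less node is kept
// where compiler ordering (the MemBarCPUOrder role) must still be preserved.
static int required_barrier_kinds(int requested) {
  // Example for a TSO machine (x86, sparc): only StoreLoad needs an
  // actual instruction, the other three kinds come for free.
  const int implicit = MemBarGenericNode::LoadLoad
                     | MemBarGenericNode::LoadStore
                     | MemBarGenericNode::StoreStore;
  return requested & ~implicit;
}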
In addition, on PPC we could peel the barrier operations off cmpxchg
and represent them as individual IR nodes. These could then be subject to
further optimization, too.
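A minimal sketch of what that could look like, assuming the existing
GraphKit::insert_mem_bar() API and leaving the CAS construction itself out
(the helper below is not real HotSpot code):

#include "opto/graphKit.hpp"

// Sketch only: the fences become ordinary MemBar ideal nodes instead of
// being hidden inside the cmpxchg instruct encoding in the .ad file, so
// the coalescing/redundancy checks discussed in this thread can see them.
static void emit_cmpxchg_with_explicit_fences(GraphKit* kit) {
  kit->insert_mem_bar(Op_MemBarRelease);   // was the leading sync/lwsync in the encoding
  // ... build the bare CompareAndSwap* node (no fences) and plumb memory here ...
  kit->insert_mem_bar(Op_MemBarAcquire);   // was the trailing isync in the encoding
}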
Best regards,
Goetz and Martin
-----Original Message-----
From: hotspot-compiler-dev-bounces at openjdk.java.net [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Doug Lea
Sent: Wednesday, February 12, 2014 15:41
To: hotspot compiler
Subject: Redundant barrier elimination
While exploring JMM update options, we noticed that hotspot
tends to pile up memory barrier nodes without bothering to
coalesce them into one barrier instruction. Considering the likelihood
that JMM updates will accentuate such pile-ups, I looked into improvements.
Currently, there is only one case where coalescing is attempted:
Matcher::post_store_load_barrier does a TSO-specific
forward pass that handles only MemBarVolatile. This is a
harder case than the others, because it takes into account that
other MemBars are no-ops on TSO. It is (or
should be) called only from the dfa on x86 and sparc.
So it does not apply on processors for which MemBarAcquire and
MemBarRelease are not no-ops. But for all (known) processors,
you can always do an easier check for redundancy, buttressed
by hardware-model-specific ones like post_store_load_barrier
where applicable. I put together the following, which does a
basic check, but I don't offhand know of a CPU-independent
place to call it from. Needing to invoke this from each barrier
case in each .ad file seems suboptimal. Any advice would be
welcome, or perhaps suggestions about placing similar functionality
somewhere other than Matcher?
Thanks!
... diffs from JDK9 (warning: I haven't even tried to compile this)
diff -r 4c8bda53850f src/share/vm/opto/matcher.cpp
--- a/src/share/vm/opto/matcher.cpp Thu Feb 06 13:08:44 2014 -0800
+++ b/src/share/vm/opto/matcher.cpp Wed Feb 12 09:07:17 2014 -0500
@@ -2393,6 +2393,54 @@
return false;
}
+// Detect if current barrier is redundant. Returns true if there is
+// another upcoming barrier or atomic operation with at least the same
+// properties before next store or load. Assumes that MemBarVolatile
+// and CompareAndSwap* provide "full" fences, and that non-biased
+// FastLock/Unlock provide acquire/release
+bool Matcher::is_redundant_barrier(const Node* vmb) {
+ Compile* C = Compile::current();
+ assert(vmb->is_MemBar(), "");
+ const MemBarNode* membar = vmb->as_MemBar();
+ int vop = vmb->Opcode();
+
+ // Get the Ideal Proj node, ctrl, that can be used to iterate forward
+ Node* ctrl = NULL;
+ for (DUIterator_Fast imax, i = membar->fast_outs(imax); i < imax; i++) {
+ Node* p = membar->fast_out(i);
+ assert(p->is_Proj(), "only projections here");
+ if ((p->as_Proj()->_con == TypeFunc::Control) &&
+ !C->node_arena()->contains(p)) { // Unmatched old-space only
+ ctrl = p;
+ break;
+ }
+ }
+ assert((ctrl != NULL), "missing control projection");
+
+ for (DUIterator_Fast jmax, j = ctrl->fast_outs(jmax); j < jmax; j++) {
+ Node *x = ctrl->fast_out(j);
+ int xop = x->Opcode();
+
+ if (xop == vop ||
+ xop == Op_MemBarVolatile ||
+ xop == Op_CompareAndSwapL ||
+ xop == Op_CompareAndSwapP ||
+ xop == Op_CompareAndSwapN ||
+ xop == Op_CompareAndSwapI ||
+ (!UseBiasedLocking &&
+ ((xop == Op_FastLock && vop == Op_MemBarAcquire) ||
+ (xop == Op_FastUnlock && vop == Op_MemBarRelease)))) {
+ return true;
+ }
+
+ if (x->is_Load() || x->is_Store() || x->is_LoadStore() ||
+ x->is_Call() || x->is_SafePoint() || x->is_block_proj()) {
+ break;
+ }
+ }
+ return false;
+}
+
//=============================================================================
//---------------------------State---------------------------------------------
State::State(void) {