/hg/icedtea8-forest/hotspot: 35 new changesets

andrew at icedtea.classpath.org
Sun Apr 10 00:29:53 UTC 2016


changeset 5cd005a0470b in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=5cd005a0470b
author: adinn
date: Wed Aug 26 17:13:59 2015 +0100

	8134322, PR2922: AArch64: Fix several errors in C2 biased locking implementation
	Summary: Several errors in C2 biased locking require fixing
	Reviewed-by: kvn
	Contributed-by: hui.shi at linaro.org


changeset babe8ca2d61e in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=babe8ca2d61e
author: enevill
date: Tue Sep 15 12:59:51 2015 +0000

	8136524, PR2922: aarch64: test/compiler/runtime/7196199/Test7196199.java fails
	Summary: Fix safepoint handlers to save 128 bits on vector poll
	Reviewed-by: kvn
	Contributed-by: felix.yang at linaro.org


changeset 0896e50fab35 in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=0896e50fab35
author: roland
date: Thu Feb 25 09:43:56 2016 -0500

	8136596, PR2922: Remove aarch64: MemBarRelease when final field's allocation is NoEscape or ArgEscape
	Summary: elide MemBar when AllocateNode _is_non_escaping
	Reviewed-by: kvn, roland
	Contributed-by: hui.shi at linaro.org


changeset b317b9da87e4 in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=b317b9da87e4
author: enevill
date: Wed Sep 16 13:50:57 2015 +0000

	8136615, PR2922: aarch64: elide DecodeN when followed by CmpP 0
	Summary: remove DecodeN when comparing a narrow oop with 0
	Reviewed-by: kvn, adinn
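
The elision is safe because a narrow oop of 0 always decodes to null and every non-zero narrow oop decodes to a non-null address, so "decode, then compare with 0" and "compare the narrow value with 0" give the same answer. A minimal C++ model of that equivalence follows; the heap base and shift are made-up values, and decode_heap_oop here is a simplified illustration, not HotSpot's MacroAssembler routine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical compressed-oop parameters, chosen only for illustration;
// real values depend on heap size and placement.
constexpr std::uint64_t kHeapBase = 0x800000000ULL;
constexpr unsigned kOopShift = 3;

// Decode a 32-bit narrow oop. The narrow value 0 is reserved for null and
// decodes to null; every other value decodes to a non-null address.
std::uint64_t decode_heap_oop(std::uint32_t narrow) {
  if (narrow == 0) {
    return 0;
  }
  return kHeapBase + (static_cast<std::uint64_t>(narrow) << kOopShift);
}

// Null check after decoding (DecodeN followed by CmpP against 0)...
bool is_null_after_decode(std::uint32_t narrow) {
  return decode_heap_oop(narrow) == 0;
}

// ...gives the same answer as testing the narrow value directly,
// so the decode can be skipped.
bool is_null_narrow(std::uint32_t narrow) { return narrow == 0; }

// Usage: assert(is_null_after_decode(7) == is_null_narrow(7));
```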


changeset c192885e7c16 in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=c192885e7c16
author: aph
date: Mon Sep 28 16:18:15 2015 +0000

	8136165, PR2922: AARCH64: Tidy up compiled native calls
	Summary: Do some cleaning
	Reviewed-by: roland, kvn, enevill


changeset 75ae9026eadd in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=75ae9026eadd
author: aph
date: Wed Sep 30 13:23:46 2015 +0000

	8138641, PR2922: Disable C2 peephole by default for aarch64
	Reviewed-by: roland
	Contributed-by: felix.yang at linaro.org


changeset 953c4e38008b in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=953c4e38008b
author: aph
date: Tue Sep 29 17:01:37 2015 +0000

	8138575, PR2922: Improve generated code for profile counters
	Reviewed-by: kvn


changeset f987924334cd in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=f987924334cd
author: enevill
date: Thu Oct 15 15:33:54 2015 +0000

	8139674, PR2922: aarch64: guarantee failure in TestOptionsWithRanges.java
	Summary: Fix negative overflow in instruction field
	Reviewed-by: kvn, roland, adinn, aph


changeset 6e4896ac5bbc in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=6e4896ac5bbc
author: ecaspole
date: Mon Sep 21 10:36:36 2015 -0400

	8131645, PR2922: [ARM64] crash on Cavium when using G1
	Summary: Add a fence when creating the CodeRootSetTable so the readers do not see invalid memory.
	Reviewed-by: aph, tschatzl
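
The underlying problem is safe publication: on a weakly ordered CPU such as AArch64, a reader can see the pointer to a new table before it sees the table's contents unless the writer orders the two. A minimal C++11 sketch of the pattern, assuming a hypothetical Table type and get_or_create_table helper (this is not the G1 code itself):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a lazily created table such as CodeRootSetTable.
struct Table {
  std::vector<int> buckets;
  explicit Table(std::size_t n) : buckets(n, 0) {}
};

std::atomic<Table*> g_table{nullptr};

// Writer: construct the table completely, then publish the pointer. The
// release half of acq_rel orders the construction before the publication.
Table* get_or_create_table(std::size_t n) {
  Table* existing = g_table.load(std::memory_order_acquire);
  if (existing != nullptr) {
    return existing;
  }
  Table* fresh = new Table(n);
  Table* expected = nullptr;
  if (g_table.compare_exchange_strong(expected, fresh,
                                      std::memory_order_acq_rel,
                                      std::memory_order_acquire)) {
    return fresh;
  }
  delete fresh;     // another thread published first; use its table
  return expected;  // expected now holds the winner's pointer
}

// Reader: the acquire load pairs with the release above, so a non-null
// pointer is never observed before the table contents it points to.
std::size_t table_size() {
  Table* t = g_table.load(std::memory_order_acquire);
  return t == nullptr ? 0 : t->buckets.size();
}
```

Checking for an existing table before allocating keeps the common path to a single allocation; the compare-and-swap only matters when two threads race to create the table.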


changeset 6a589c3915be in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=6a589c3915be
author: adinn
date: Thu Oct 08 11:06:07 2015 -0400

	PR2922: Backport optimization of volatile puts/gets and CAS to use ldar/stlr


changeset 0b5123ad9c31 in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=0b5123ad9c31
author: enevill
date: Wed Oct 28 17:47:45 2015 +0000

	PR2922: Fix thinko when backporting 8131645. Table ends up being allocated twice.


changeset 86b2d612adf1 in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=86b2d612adf1
author: enevill
date: Wed Oct 28 17:51:10 2015 +0000

	8140611, PR2922: aarch64: jtreg test jdk/tools/pack200/UnpackerMemoryTest.java SEGVs
	Summary: Fix register usage on calling native synchronized methods
	Reviewed-by: kvn, adinn


changeset 27acb51158b9 in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=27acb51158b9
author: enevill
date: Thu Feb 25 05:44:08 2016 -0500

	PR2922: Some 32 bit shifts still being anded with 0x3f instead of 0x1f.
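
For reference, Java defines int shifts to use only the low five bits of the shift count (mask 0x1f) and long shifts the low six bits (mask 0x3f). A small C++ model of those rules; java_ishl and java_lshl are illustrative names, not HotSpot functions.

```cpp
#include <cassert>
#include <cstdint>

// Java int shift: only the low 5 bits of the count are used (0..31).
std::int32_t java_ishl(std::int32_t value, std::int32_t count) {
  std::uint32_t bits = static_cast<std::uint32_t>(value);
  return static_cast<std::int32_t>(bits << (count & 0x1f));
}

// Java long shift: only the low 6 bits of the count are used (0..63).
std::int64_t java_lshl(std::int64_t value, std::int32_t count) {
  std::uint64_t bits = static_cast<std::uint64_t>(value);
  return static_cast<std::int64_t>(bits << (count & 0x3f));
}

// Usage: assert(java_ishl(1, 32) == 1);  // 32 & 0x1f == 0
```

Masking an int shift count with 0x3f instead would let a count of 32 through unchanged, which is wrong for Java (it must behave as a shift by 0) and, in C++, is undefined behaviour for a 32-bit operand.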


changeset 2bbfb04230ec in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=2bbfb04230ec
author: aph
date: Tue Sep 08 14:08:58 2015 +0100

	8135157, PR2922: DMB elimination in AArch64 C2 synchronization implementation
	Summary: Reduce memory barrier usage in C2 fast lock and unlock.
	Reviewed-by: kvn
	Contributed-by: wei.tang at linaro.org, aph at redhat.com


changeset 14f41a6da05f in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=14f41a6da05f
author: aph
date: Wed Nov 04 13:38:38 2015 +0100

	8138966, PR2922: Intermittent SEGV running ParallelGC
	Summary: Add necessary memory fences so that the parallel threads are unable to observe partially filled block tables.
	Reviewed-by: tschatzl


changeset a0284b5f2c3a in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=a0284b5f2c3a
author: enevill
date: Thu Nov 19 15:15:20 2015 +0000

	8143067, PR2922: aarch64: guarantee failure in javac
	Summary: Fix adrp going out of range during code relocation
	Reviewed-by: aph, kvn
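
ADRP materialises the address of a 4 KiB page relative to the page of the instruction itself, using a signed 21-bit page offset, so it only reaches about ±4 GiB. If code is relocated, an adrp that was in range at its old address can fall out of range at its new one. A sketch of the reachability check; adrp_reachable is a hypothetical helper, not a HotSpot function.

```cpp
#include <cassert>
#include <cstdint>

// ADRP computes (pc & ~0xfff) + (imm21 << 12), where imm21 is a signed
// 21-bit page offset, so the target's 4 KiB page must lie within
// [-2^20, 2^20 - 1] pages of the pc's page (about +/-4 GiB).
bool adrp_reachable(std::uint64_t pc, std::uint64_t target) {
  const std::int64_t pc_page = static_cast<std::int64_t>(pc >> 12);
  const std::int64_t target_page = static_cast<std::int64_t>(target >> 12);
  const std::int64_t delta = target_page - pc_page;
  const std::int64_t limit = std::int64_t{1} << 20;
  return delta >= -limit && delta < limit;
}
```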


changeset 498c0173ac25 in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=498c0173ac25
author: hshi
date: Tue Nov 24 09:02:26 2015 +0000

	8143285, PR2922: aarch64: Missing load acquire when checking if ConstantPoolCacheEntry is resolved
	Reviewed-by: roland, aph


changeset 285af921daec in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=285af921daec
author: enevill
date: Fri Feb 26 03:44:38 2016 -0500

	PR2922: Add support for large code cache


changeset 384b670295d9 in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=384b670295d9
author: enevill
date: Tue Jan 05 17:40:17 2016 +0000

	PR2922: Fix client build after addition of large code cache support


changeset 6ff8db505d54 in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=6ff8db505d54
author: enevill
date: Tue Dec 29 16:47:34 2015 +0000

	8146286, PR2922: aarch64: guarantee failures with large code cache sizes on jtreg test java/lang/invoke/LFCaching/LFMultiThreadCachingTest.java
	Summary: patch trampoline calls with special case bl to itself which does not cause guarantee failure
	Reviewed-by: aph
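
A direct bl encodes a signed 26-bit word offset, so it reaches only targets within about ±128 MiB of the call site. With a larger code cache, calls may have to go through a trampoline. A bl to its own address (offset 0) is always encodable, which is consistent with the special case described in the summary. A sketch of the range test; bl_reachable is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// BL encodes imm26, a signed offset in 4-byte words, so the byte offset
// must be a multiple of 4 in the range [-2^27, 2^27 - 4] (about +/-128 MiB).
bool bl_reachable(std::uint64_t pc, std::uint64_t target) {
  const std::int64_t offset = static_cast<std::int64_t>(target - pc);
  const std::int64_t limit = std::int64_t{1} << 27;
  return (offset & 3) == 0 && offset >= -limit && offset < limit;
}
```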


changeset 216100b310c3 in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=216100b310c3
author: hshi
date: Thu Nov 26 15:37:04 2015 +0000

	8143584, PR2922: Load constant pool tag and class status with load acquire
	Reviewed-by: roland, aph


changeset b286409be4b9 in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=b286409be4b9
author: aph
date: Wed Nov 25 18:13:13 2015 +0000

	8144028, PR2922: Use AArch64 bit-test instructions in C2
	Reviewed-by: kvn
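
tbz/tbnz ("test bit and branch if zero / nonzero") combine a single-bit test and a conditional branch, so a condition such as `(x & (1 << k)) == 0` does not need a separate and plus cmp. They encode only a signed 14-bit word offset (about ±32 KiB), which is why a long form is needed for distant targets, as the later 8145438 summary notes. The C++ below only shows the shape of conditions that qualify; bit5_path and is_negative are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// A single-bit condition: the shape that tbz/tbnz can implement directly.
int bit5_path(std::uint64_t x) {
  if ((x & (std::uint64_t{1} << 5)) == 0) {
    return 0;  // path taken when bit 5 is clear (tbz x, #5, ...)
  }
  return 1;    // path taken when bit 5 is set (tbnz x, #5, ...)
}

// A sign test on a 64-bit value only inspects bit 63 (tbnz x, #63, ...).
bool is_negative(std::int64_t v) { return v < 0; }
```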


changeset 27d7474e68ca in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=27d7474e68ca
author: fyang
date: Mon Dec 07 21:23:02 2015 +0800

	8144587, PR2922: aarch64: generate vectorized MLA/MLS instructions
	Summary: Add support for MLA/MLS (vector) instructions
	Reviewed-by: roland
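
MLA and MLS are the Advanced SIMD element-wise multiply-accumulate and multiply-subtract instructions. The loops below show the semantics they implement, `a[i] += b[i] * c[i]` and `a[i] -= b[i] * c[i]`. This is a C++ illustration of the equivalent Java loop shape; whether a given C++ compiler emits MLA for it depends on its vectoriser and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Element-wise multiply-accumulate: a[i] = a[i] + b[i] * c[i] (vector MLA).
void mla(std::int32_t* a, const std::int32_t* b, const std::int32_t* c,
         std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    a[i] += b[i] * c[i];
  }
}

// Element-wise multiply-subtract: a[i] = a[i] - b[i] * c[i] (vector MLS).
void mls(std::int32_t* a, const std::int32_t* b, const std::int32_t* c,
         std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    a[i] -= b[i] * c[i];
  }
}
```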


changeset 8fae3f3129fd in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=8fae3f3129fd
author: aph
date: Tue Dec 15 19:18:05 2015 +0000

	8145438, PR2922: Guarantee failures since 8144028: Use AArch64 bit-test instructions in C2
	Summary: Implement short and long versions of bit test instructions.
	Reviewed-by: kvn


changeset a0a416432508 in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=a0a416432508
author: aph
date: Wed Dec 16 11:35:59 2015 +0000

	8144582, PR2922: AArch64 does not generate correct branch profile data
	Reviewed-by: kvn


changeset 03c02db49d16 in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=03c02db49d16
author: fyang
date: Mon Dec 07 21:14:56 2015 +0800

	8144201, PR2922: aarch64: jdk/test/com/sun/net/httpserver/Test6a.java fails with --enable-unlimited-crypto
	Summary: Fix typo in stub generate_cipherBlockChaining_decryptAESCrypt
	Reviewed-by: roland


changeset 9b413b1b49a9 in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=9b413b1b49a9
author: enevill
date: Fri Jan 08 11:39:47 2016 +0000

	8146678, PR2922: aarch64: assertion failure: call instruction in an infinite loop
	Summary: Remove assertion
	Reviewed-by: aph


changeset 8344270ca8ca in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=8344270ca8ca
author: enevill
date: Tue Jan 12 14:55:15 2016 +0000

	8146843, PR2922: aarch64: add scheduling support for FP and vector instructions
	Summary: add pipeline classes for FP/vector pipeline
	Reviewed-by: aph


changeset 31421ce3f8a1 in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=31421ce3f8a1
author: aph
date: Tue Jan 19 17:52:52 2016 +0000

	8146709, PR2922: AArch64: Incorrect use of ADRP for byte_map_base
	Reviewed-by: roland


changeset 2d6aa4a52092 in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=2d6aa4a52092
author: hshi
date: Wed Jan 20 04:56:51 2016 -0800

	8147805, PR2922: aarch64: C1 segmentation fault due to inline Unsafe.getAndSetObject
	Summary: In Aarch64 LIR_Assembler.atomic_op, keep stored data reference register in decompressed forms as it may be used later
	Reviewed-by: aph
	Contributed-by: hui.shi at linaro.org, felix.yang at linaro.org


changeset b0a61be7e092 in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=b0a61be7e092
author: enevill
date: Tue Jan 26 14:04:01 2016 +0000

	8148240, PR2922: aarch64: random infrequent null pointer exceptions in javac
	Summary: Disable fp as an allocatable register
	Reviewed-by: aph


changeset ecca96e2dfcf in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=ecca96e2dfcf
author: andrew
date: Tue Mar 01 02:00:13 2016 +0000

	PR2922: Apply ReservedCodeCacheSize default limiting to AArch64 only.


changeset 15b7a15b9310 in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=15b7a15b9310
author: enevill
date: Thu Mar 31 08:30:30 2016 +0000

	PR2922: Add missing includes to macroAssembler_aarch64.cpp


changeset 5e587a29a6aa in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=5e587a29a6aa
author: aph
date: Thu Feb 25 14:59:44 2016 +0000

	8150652, PR2922: Remove unused code in AArch64 back end
	Reviewed-by: kvn


changeset 49b8cecd1bbe in /hg/icedtea8-forest/hotspot
details: http://icedtea.classpath.org/hg/icedtea8-forest/hotspot?cmd=changeset;node=49b8cecd1bbe
author: andrew
date: Sun Apr 10 01:08:29 2016 +0100

	Added tag icedtea-3.0.0 for changeset 5e587a29a6aa


diffstat:

 .hgtags                                                               |     1 +
 src/cpu/aarch64/vm/aarch64.ad                                         |  3743 +++++++++-
 src/cpu/aarch64/vm/assembler_aarch64.cpp                              |     5 -
 src/cpu/aarch64/vm/assembler_aarch64.hpp                              |    32 +-
 src/cpu/aarch64/vm/c1_CodeStubs_aarch64.cpp                           |    32 +-
 src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.cpp                        |    39 +-
 src/cpu/aarch64/vm/c1_LIRAssembler_aarch64.hpp                        |     8 +-
 src/cpu/aarch64/vm/c1_MacroAssembler_aarch64.cpp                      |     4 +-
 src/cpu/aarch64/vm/c1_MacroAssembler_aarch64.hpp                      |     1 +
 src/cpu/aarch64/vm/c1_Runtime1_aarch64.cpp                            |    57 +-
 src/cpu/aarch64/vm/c2_globals_aarch64.hpp                             |     2 +-
 src/cpu/aarch64/vm/compiledIC_aarch64.cpp                             |     6 +-
 src/cpu/aarch64/vm/globalDefinitions_aarch64.hpp                      |     4 +
 src/cpu/aarch64/vm/globals_aarch64.hpp                                |    12 +-
 src/cpu/aarch64/vm/icBuffer_aarch64.cpp                               |    21 +-
 src/cpu/aarch64/vm/interp_masm_aarch64.cpp                            |    23 +-
 src/cpu/aarch64/vm/macroAssembler_aarch64.cpp                         |   330 +-
 src/cpu/aarch64/vm/macroAssembler_aarch64.hpp                         |   118 +-
 src/cpu/aarch64/vm/methodHandles_aarch64.cpp                          |     4 +-
 src/cpu/aarch64/vm/nativeInst_aarch64.cpp                             |   141 +-
 src/cpu/aarch64/vm/nativeInst_aarch64.hpp                             |    75 +-
 src/cpu/aarch64/vm/relocInfo_aarch64.cpp                              |    29 +-
 src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp                          |   541 +-
 src/cpu/aarch64/vm/stubGenerator_aarch64.cpp                          |     6 +-
 src/cpu/aarch64/vm/templateInterpreter_aarch64.cpp                    |     4 +-
 src/cpu/aarch64/vm/templateTable_aarch64.cpp                          |    12 +-
 src/cpu/aarch64/vm/vm_version_aarch64.cpp                             |     8 +
 src/cpu/aarch64/vm/vtableStubs_aarch64.cpp                            |     2 +-
 src/os_cpu/linux_aarch64/vm/os_linux_aarch64.cpp                      |     9 +-
 src/share/vm/adlc/formssel.cpp                                        |     3 +-
 src/share/vm/gc_implementation/g1/g1CodeCacheRemSet.cpp               |     4 +-
 src/share/vm/gc_implementation/parallelScavenge/psParallelCompact.hpp |     7 +-
 src/share/vm/opto/callnode.hpp                                        |    14 +
 src/share/vm/opto/graphKit.cpp                                        |     2 +-
 src/share/vm/opto/macro.cpp                                           |     8 +-
 src/share/vm/opto/memnode.cpp                                         |     4 +-
 src/share/vm/runtime/arguments.cpp                                    |    12 +-
 src/share/vm/utilities/globalDefinitions.hpp                          |     5 +
 test/compiler/codegen/8144028/BitTests.java                           |   164 +
 39 files changed, 4620 insertions(+), 872 deletions(-)

diffs (truncated from 8663 to 500 lines):

diff -r 9a57d01ddf03 -r 49b8cecd1bbe .hgtags
--- a/.hgtags	Fri Dec 18 08:55:47 2015 +0100
+++ b/.hgtags	Sun Apr 10 01:08:29 2016 +0100
@@ -830,3 +830,4 @@
 ddd297e340b1170d3cec011ee64e729f8b493c86 jdk8u77-b01
 1b4072e4bb3ad54c4e894998486a8b33f0689160 jdk8u77-b02
 e9585e814cc954c06e870f3bdf37171029da0d5e icedtea-3.0.0pre10
+5e587a29a6aac06d6b5a7ebeea99a291d82520c8 icedtea-3.0.0
diff -r 9a57d01ddf03 -r 49b8cecd1bbe src/cpu/aarch64/vm/aarch64.ad
--- a/src/cpu/aarch64/vm/aarch64.ad	Fri Dec 18 08:55:47 2015 +0100
+++ b/src/cpu/aarch64/vm/aarch64.ad	Sun Apr 10 01:08:29 2016 +0100
@@ -545,7 +545,7 @@
     R26
  /* R27, */			// heapbase
  /* R28, */			// thread
-    R29,                        // fp
+ /* R29, */                     // fp
  /* R30, */			// lr
  /* R31 */			// sp
 );
@@ -579,7 +579,7 @@
     R26, R26_H,
  /* R27, R27_H,	*/		// heapbase
  /* R28, R28_H, */		// thread
-    R29, R29_H,                 // fp
+ /* R29, R29_H, */              // fp
  /* R30, R30_H, */		// lr
  /* R31, R31_H */		// sp
 );
@@ -952,20 +952,1864 @@
   static int emit_deopt_handler(CodeBuffer& cbuf);
 
   static uint size_exception_handler() {
-    // count up to 4 movz/n/k instructions and one branch instruction
-    return 5 * NativeInstruction::instruction_size;
+    return MacroAssembler::far_branch_size();
   }
 
   static uint size_deopt_handler() {
-    // count one adr and one branch instruction
-    return 2 * NativeInstruction::instruction_size;
+    // count one adr and one far branch instruction
+    // return 4 * NativeInstruction::instruction_size;
+    return NativeInstruction::instruction_size + MacroAssembler::far_branch_size();
   }
 };
 
+  // graph traversal helpers
+
+  MemBarNode *parent_membar(const Node *n);
+  MemBarNode *child_membar(const MemBarNode *n);
+  bool leading_membar(const MemBarNode *barrier);
+
+  bool is_card_mark_membar(const MemBarNode *barrier);
+  bool is_CAS(int opcode);
+
+  MemBarNode *leading_to_normal(MemBarNode *leading);
+  MemBarNode *normal_to_leading(const MemBarNode *barrier);
+  MemBarNode *card_mark_to_trailing(const MemBarNode *barrier);
+  MemBarNode *trailing_to_card_mark(const MemBarNode *trailing);
+  MemBarNode *trailing_to_leading(const MemBarNode *trailing);
+
+  // predicates controlling emit of ldr<x>/ldar<x> and associated dmb
+
+  bool unnecessary_acquire(const Node *barrier);
+  bool needs_acquiring_load(const Node *load);
+
+  // predicates controlling emit of str<x>/stlr<x> and associated dmbs
+
+  bool unnecessary_release(const Node *barrier);
+  bool unnecessary_volatile(const Node *barrier);
+  bool needs_releasing_store(const Node *store);
+
+  // predicate controlling translation of CompareAndSwapX
+  bool needs_acquiring_load_exclusive(const Node *load);
+
+  // predicate controlling translation of StoreCM
+  bool unnecessary_storestore(const Node *storecm);
 %}
 
 source %{
 
+  // Optimization of volatile gets and puts
+  // --------------------------------------
+  //
+  // AArch64 has ldar<x> and stlr<x> instructions which we can safely
+  // use to implement volatile reads and writes. For a volatile read
+  // we simply need
+  //
+  //   ldar<x>
+  //
+  // and for a volatile write we need
+  //
+  //   stlr<x>
+  // 
+  // Alternatively, we can implement them by pairing a normal
+  // load/store with a memory barrier. For a volatile read we need
+  // 
+  //   ldr<x>
+  //   dmb ishld
+  //
+  // for a volatile write
+  //
+  //   dmb ish
+  //   str<x>
+  //   dmb ish
+  //
+  // We can also use ldaxr and stlxr to implement compare and swap CAS
+  // sequences. These are normally translated to an instruction
+  // sequence like the following
+  //
+  //   dmb      ish
+  // retry:
+  //   ldxr<x>   rval, [raddr]
+  //   cmp       rval, rold
+  //   b.ne done
+  //   stlxr<x>  rval, rnew, [raddr]
+  //   cbnz      rval, retry
+  // done:
+  //   cset      r0, eq
+  //   dmb ishld
+  //
+  // Note that the exclusive store is already using an stlxr
+  // instruction. That is required to ensure visibility to other
+  // threads of the exclusive write (assuming it succeeds) before that
+  // of any subsequent writes.
+  //
+  // The following instruction sequence is an improvement on the above
+  //
+  // retry:
+  //   ldaxr<x>  rval, [raddr]
+  //   cmp       rval, rold
+  //   b.ne done
+  //   stlxr<x>  rval, rnew, [raddr]
+  //   cbnz      rval, retry
+  // done:
+  //   cset      r0, eq
+  //
+  // We don't need the leading dmb ish since the stlxr guarantees
+  // visibility of prior writes in the case that the swap is
+  // successful. Crucially we don't have to worry about the case where
+  // the swap is not successful since no valid program should be
+  // relying on visibility of prior changes by the attempting thread
+  // in the case where the CAS fails.
+  //
+  // Similarly, we don't need the trailing dmb ishld if we substitute
+  // an ldaxr instruction since that will provide all the guarantees we
+  // require regarding observation of changes made by other threads
+  // before any change to the CAS address observed by the load.
+  //
+  // In order to generate the desired instruction sequence we need to
+  // be able to identify specific 'signature' ideal graph node
+  // sequences which i) occur as a translation of volatile reads or
+  // writes or CAS operations and ii) do not occur through any other
+  // translation or graph transformation. We can then provide
+  // alternative adlc matching rules which translate these node
+  // sequences to the desired machine code sequences. Selection of the
+  // alternative rules can be implemented by predicates which identify
+  // the relevant node sequences.
+  //
+  // The ideal graph generator translates a volatile read to the node
+  // sequence
+  //
+  //   LoadX[mo_acquire]
+  //   MemBarAcquire
+  //
+  // As a special case when using the compressed oops optimization we
+  // may also see this variant
+  //
+  //   LoadN[mo_acquire]
+  //   DecodeN
+  //   MemBarAcquire
+  //
+  // A volatile write is translated to the node sequence
+  //
+  //   MemBarRelease
+  //   StoreX[mo_release] {CardMark}-optional
+  //   MemBarVolatile
+  //
+  // n.b. the above node patterns are generated with a strict
+  // 'signature' configuration of input and output dependencies (see
+  // the predicates below for exact details). The card mark may be as
+  // simple as a few extra nodes or, in a few GC configurations, may
+  // include more complex control flow between the leading and
+  // trailing memory barriers. However, whatever the card mark
+  // configuration these signatures are unique to translated volatile
+  // reads/stores -- they will not appear as a result of any other
+  // bytecode translation or inlining nor as a consequence of
+  // optimizing transforms.
+  //
+  // We also want to catch inlined unsafe volatile gets and puts and
+  // be able to implement them using either ldar<x>/stlr<x> or some
+  // combination of ldr<x>/stlr<x> and dmb instructions.
+  //
+  // Inlined unsafe volatile puts manifest as a minor variant of the
+  // normal volatile put node sequence containing an extra cpuorder
+  // membar
+  //
+  //   MemBarRelease
+  //   MemBarCPUOrder
+  //   StoreX[mo_release] {CardMark}-optional
+  //   MemBarVolatile
+  //
+  // n.b. as an aside, the cpuorder membar is not itself subject to
+  // matching and translation by adlc rules.  However, the rule
+  // predicates need to detect its presence in order to correctly
+  // select the desired adlc rules.
+  //
+  // Inlined unsafe volatile gets manifest as a somewhat different
+  // node sequence to a normal volatile get
+  //
+  //   MemBarCPUOrder
+  //        ||       \\
+  //   MemBarAcquire LoadX[mo_acquire]
+  //        ||
+  //   MemBarCPUOrder
+  //
+  // In this case the acquire membar does not directly depend on the
+  // load. However, we can be sure that the load is generated from an
+  // inlined unsafe volatile get if we see it dependent on this unique
+  // sequence of membar nodes. Similarly, given an acquire membar we
+  // can know that it was added because of an inlined unsafe volatile
+  // get if it is both fed by and feeds a cpuorder membar and if its feed
+  // membar also feeds an acquiring load.
+  //
+  // Finally an inlined (Unsafe) CAS operation is translated to the
+  // following ideal graph
+  //
+  //   MemBarRelease
+  //   MemBarCPUOrder
+  //   CompareAndSwapX {CardMark}-optional
+  //   MemBarCPUOrder
+  //   MemBarAcquire
+  //
+  // So, where we can identify these volatile read and write
+  // signatures we can choose to plant either of the above two code
+  // sequences. For a volatile read we can simply plant a normal
+  // ldr<x> and translate the MemBarAcquire to a dmb. However, we can
+  // also choose to inhibit translation of the MemBarAcquire and
+  // inhibit planting of the ldr<x>, instead planting an ldar<x>.
+  //
+  // When we recognise a volatile store signature we can choose to
+  // plant a dmb ish as a translation for the MemBarRelease, a
+  // normal str<x> and then a dmb ish for the MemBarVolatile.
+  // Alternatively, we can inhibit translation of the MemBarRelease
+  // and MemBarVolatile and instead plant a simple stlr<x>
+  // instruction.
+  //
+  // When we recognise a CAS signature we can choose to plant a dmb
+  // ish as a translation for the MemBarRelease, the conventional
+  // macro-instruction sequence for the CompareAndSwap node (which
+  // uses ldxr<x>) and then a dmb ishld for the MemBarAcquire.
+  // Alternatively, we can elide generation of the dmb instructions
+  // and plant the alternative CompareAndSwap macro-instruction
+  // sequence (which uses ldaxr<x>).
+  // 
+  // Of course, the above only applies when we see these signature
+  // configurations. We still want to plant dmb instructions in any
+  // other cases where we may see a MemBarAcquire, MemBarRelease or
+  // MemBarVolatile. For example, at the end of a constructor which
+  // writes final/volatile fields we will see a MemBarRelease
+  // instruction and this needs a 'dmb ish' lest we risk the
+  // constructed object being visible without making the
+  // final/volatile field writes visible.
+  //
+  // n.b. the translation rules below which rely on detection of the
+  // volatile signatures and insert ldar<x> or stlr<x> are failsafe.
+  // If we see anything other than the signature configurations we
+  // always just translate the loads and stores to ldr<x> and str<x>
+  // and translate acquire, release and volatile membars to the
+  // relevant dmb instructions.
+  //
+
+  // graph traversal helpers used for volatile put/get and CAS
+  // optimization
+
+  // 1) general purpose helpers
+
+  // if node n is linked to a parent MemBarNode by an intervening
+  // Control and Memory ProjNode return the MemBarNode otherwise return
+  // NULL.
+  //
+  // n may only be a Load or a MemBar.
+
+  MemBarNode *parent_membar(const Node *n)
+  {
+    Node *ctl = NULL;
+    Node *mem = NULL;
+    Node *membar = NULL;
+
+    if (n->is_Load()) {
+      ctl = n->lookup(LoadNode::Control);
+      mem = n->lookup(LoadNode::Memory);
+    } else if (n->is_MemBar()) {
+      ctl = n->lookup(TypeFunc::Control);
+      mem = n->lookup(TypeFunc::Memory);
+    } else {
+	return NULL;
+    }
+
+    if (!ctl || !mem || !ctl->is_Proj() || !mem->is_Proj()) {
+      return NULL;
+    }
+
+    membar = ctl->lookup(0);
+
+    if (!membar || !membar->is_MemBar()) {
+      return NULL;
+    }
+
+    if (mem->lookup(0) != membar) {
+      return NULL;
+    }
+
+    return membar->as_MemBar();
+  }
+
+  // if n is linked to a child MemBarNode by intervening Control and
+  // Memory ProjNodes return the MemBarNode otherwise return NULL.
+
+  MemBarNode *child_membar(const MemBarNode *n)
+  {
+    ProjNode *ctl = n->proj_out(TypeFunc::Control);
+    ProjNode *mem = n->proj_out(TypeFunc::Memory);
+
+    // MemBar needs to have both a Ctl and Mem projection
+    if (! ctl || ! mem)
+      return NULL;
+
+    MemBarNode *child = NULL;
+    Node *x;
+
+    for (DUIterator_Fast imax, i = ctl->fast_outs(imax); i < imax; i++) {
+      x = ctl->fast_out(i);
+      // if we see a membar we keep hold of it. we may also see a new
+      // arena copy of the original but it will appear later
+      if (x->is_MemBar()) {
+	  child = x->as_MemBar();
+	  break;
+      }
+    }
+
+    if (child == NULL) {
+      return NULL;
+    }
+
+    for (DUIterator_Fast imax, i = mem->fast_outs(imax); i < imax; i++) {
+      x = mem->fast_out(i);
+      // if we see a membar we keep hold of it. we may also see a new
+      // arena copy of the original but it will appear later
+      if (x == child) {
+	return child;
+      }
+    }
+    return NULL;
+  }
+
+  // helper predicate used to filter candidates for a leading memory
+  // barrier
+  //
+  // returns true if barrier is a MemBarRelease or a MemBarCPUOrder
+  // whose Ctl and Mem feeds come from a MemBarRelease otherwise false
+
+  bool leading_membar(const MemBarNode *barrier)
+  {
+    int opcode = barrier->Opcode();
+    // if this is a release membar we are ok
+    if (opcode == Op_MemBarRelease) {
+      return true;
+    }
+    // if it's a cpuorder membar . . .
+    if (opcode != Op_MemBarCPUOrder) {
+      return false;
+    }
+    // then the parent has to be a release membar
+    MemBarNode *parent = parent_membar(barrier);
+    if (!parent) {
+      return false;
+    }
+    opcode = parent->Opcode();
+    return opcode == Op_MemBarRelease;
+  }
+ 
+  // 2) card mark detection helper
+
+  // helper predicate which can be used to detect a volatile membar
+  // introduced as part of a conditional card mark sequence either by
+  // G1 or by CMS when UseCondCardMark is true.
+  //
+  // membar can be definitively determined to be part of a card mark
+  // sequence if and only if all the following hold
+  //
+  // i) it is a MemBarVolatile
+  //
+  // ii) either UseG1GC or (UseConcMarkSweepGC && UseCondCardMark) is
+  // true
+  //
+  // iii) the node's Mem projection feeds a StoreCM node.
+  
+  bool is_card_mark_membar(const MemBarNode *barrier)
+  {
+    if (!UseG1GC && !(UseConcMarkSweepGC && UseCondCardMark)) {
+      return false;
+    }
+
+    if (barrier->Opcode() != Op_MemBarVolatile) {
+      return false;
+    }
+
+    ProjNode *mem = barrier->proj_out(TypeFunc::Memory);
+
+    for (DUIterator_Fast imax, i = mem->fast_outs(imax); i < imax ; i++) {
+      Node *y = mem->fast_out(i);
+      if (y->Opcode() == Op_StoreCM) {
+	return true;
+      }
+    }
+  
+    return false;
+  }
+
+
+  // 3) helper predicates to traverse volatile put or CAS graphs which
+  // may contain GC barrier subgraphs
+
+  // Preamble
+  // --------
+  //
+  // for volatile writes we can omit generating barriers and employ a
+  // releasing store when we see a node sequence with a
+  // leading MemBarRelease and a trailing MemBarVolatile as follows
+  //
+  //   MemBarRelease
+  //  {      ||      } -- optional
+  //  {MemBarCPUOrder}
+  //         ||     \\
+  //         ||     StoreX[mo_release]
+  //         | \     /
+  //         | MergeMem
+  //         | /
+  //   MemBarVolatile
+  //
+  // where
+  //  || and \\ represent Ctl and Mem feeds via Proj nodes
+  //  | \ and / indicate further routing of the Ctl and Mem feeds
+  // 
+  // this is the graph we see for non-object stores. however, for a
+  // volatile Object store (StoreN/P) we may see other nodes below the
+  // leading membar because of the need for a GC pre- or post-write
+  // barrier.
+  //
+  // with most GC configurations we will see this simple variant which
+  // includes a post-write barrier card mark.
+  //
+  //   MemBarRelease______________________________
+  //         ||    \\               Ctl \        \\
+  //         ||    StoreN/P[mo_release] CastP2X  StoreB/CM
+  //         | \     /                       . . .  /
+  //         | MergeMem
+  //         | /
+  //         ||      /
+  //   MemBarVolatile
+  //
+  // i.e. the leading membar feeds Ctl to a CastP2X (which converts
+  // the object address to an int used to compute the card offset) and
+  // Ctl+Mem to a StoreB node (which does the actual card mark).
+  //
+  // n.b. a StoreCM node will only appear in this configuration when
+  // using CMS. StoreCM differs from a normal card mark write (StoreB)
+  // because it implies a requirement to order visibility of the card
+  // mark (StoreCM) relative to the object put (StoreP/N) using a
+  // StoreStore memory barrier (arguably this ought to be represented
+  // explicitly in the ideal graph but that is not how it works). This
+  // ordering is required for both non-volatile and volatile
+  // puts. Normally that means we need to translate a StoreCM using
+  // the sequence
+  //
+  //   dmb ishst
+  //   strb
+  //
+  // However, in the case of a volatile put if we can recognise this
+  // configuration and plant an stlr for the object write then we can
+  // omit the dmb and just plant an strb since visibility of the stlr
+  // is ordered before visibility of subsequent stores. StoreCM nodes
+  // also arise when using G1 or using CMS with conditional card
+  // marking. In these cases (as we shall see) we don't need to insert
+  // the dmb when translating StoreCM because there is already an
+  // intervening StoreLoad barrier between it and the StoreP/N.
+  //
+  // It is also possible to perform the card mark conditionally on it
+  // currently being unmarked in which case the volatile put graph
+  // will look slightly different
+  //
+  //   MemBarRelease____________________________________________
+  //         ||    \\               Ctl \     Ctl \     \\  Mem \
+  //         ||    StoreN/P[mo_release] CastP2X   If   LoadB     |
+  //         | \     /                              \            |
+  //         | MergeMem                            . . .      StoreB
+  //         | /                                                /
+  //         ||     /


More information about the distro-pkg-dev mailing list