RFR: AArch64: compiler/7196199/Test7196199.java fails on AArch64-Linux with MaxVectorSize > 8

Felix Yang felix.yang at linaro.org
Sun Sep 13 13:52:56 UTC 2015


Hi,

  Thanks for pointing out the MaxVectorSize issue.  I modified the patch
accordingly.  Also added the handling of oopmap for the call site.  The v2
patch is posted below.

  I can provoke the bug using OpenJDK8 on my 16 & 32-core AArch64 servers
by executing the following command:
        $java -Xmx32m -Xbatch -XX:+IgnoreUnrecognizedVMOptions
-XX:-TieredCompilation -XX:CompileCommand=exclude,Test7196199.test
-XX:+SafepointALot -XX:GuaranteedSafepointInterval=100 Test7196199
  And the latest Linaro OpenJDK8 & 9 releases both failed the test too.

  Here is the JTReg regression test result for AArch64 OpenJDK8:
  AArch64 JDK8 JTReg test result (without v2 patch):
        1. hotspot:     Test results: passed: 683; failed: 8; error: 5
        2. nashorn:     Test results: passed: 17
        3. langtools:   Test results: passed: 3,090; failed: 1; error: 7
        4. jdk:         Test results: passed: 4,763; failed: 1,052; error:
137

  AArch64 JDK8 JTReg test result (with v2 patch):
        1. hotspot:     Test results: passed: 684; failed: 7; error: 5
(New Passed:  compiler/7196199/Test7196199.java;        New Failed:  No)
        2. nashorn:     Test results: passed: 17
        3. langtools:   Test results: passed: 3,089; failed: 2; error: 7
    (New Passed:  tools/javac/generics/rawOverride/7062745/
GenericOverrideTest.java

                             New Failed:   tools/javac/cast/intersection/
IntersectionTypeCastTest.java

                             New Failed:   tools/javac/lambda/
TestInvokeDynamic.java)
        4. jdk:         Test results: passed: 4,762; failed: 1,054; error:
136  (New Passed:  com/sun/net/httpserver/Test9.java

                                     New Failed:
com/sun/net/httpserver/Test13.java

                                     New Failed:  sun/security/krb5/auto/
MaxRetries.java

                                     Error changed to Failed:
java/net/DatagramSocket/SetDatagramSocketImplFactory/ADatagramSocket.java

                                     Error changed to Failed:
java/rmi/activation/rmidViaInheritedChannel/InheritedChannelNotServerSocke
t.java

                                     Failed changed to Error:
java/rmi/activation/rmidViaInheritedChannel/RmidViaInheritedChannel.java)


The test result for langtools & jdk is just a reference here, because we
find that the testcases may fail randomly on our platform.  I also tested
specJBB2005, there is no noticeable performance change when running 32
warehouses.

The v1 patch is generated from AArch64 OpenJDK8 and it turns out that the
TABS in it comes from the original OpenJDK8 source code.  The v2 patch can
be applied cleanly to OpenJDK9.  Will move on to OpenJDK9 for future
development soon.

How about this one?  Thanks.


v2 PATCH:

diff -r 1c453a12be30 src/cpu/aarch64/vm/macroAssembler_aarch64.cpp
--- a/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp Fri Sep 11 13:00:54
2015 -0700
+++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.cpp Sun Sep 13 16:43:55
2015 +0800
@@ -2286,18 +2286,30 @@
 }
 #endif

-void MacroAssembler::push_CPU_state() {
-    push(0x3fffffff, sp);         // integer registers except lr & sp
-
+void MacroAssembler::push_CPU_state(bool save_vectors) {
+  push(0x3fffffff, sp);         // integer registers except lr & sp
+
+  if (!save_vectors) {
     for (int i = 30; i >= 0; i -= 2)
       stpd(as_FloatRegister(i), as_FloatRegister(i+1),
            Address(pre(sp, -2 * wordSize)));
+  } else {
+    for (int i = 30; i >= 0; i -= 2)
+      stpq(as_FloatRegister(i), as_FloatRegister(i+1),
+           Address(pre(sp, -4 * wordSize)));
+  }
 }

-void MacroAssembler::pop_CPU_state() {
-  for (int i = 0; i < 32; i += 2)
-    ldpd(as_FloatRegister(i), as_FloatRegister(i+1),
-         Address(post(sp, 2 * wordSize)));
+void MacroAssembler::pop_CPU_state(bool restore_vectors) {
+  if (!restore_vectors) {
+    for (int i = 0; i < 32; i += 2)
+      ldpd(as_FloatRegister(i), as_FloatRegister(i+1),
+           Address(post(sp, 2 * wordSize)));
+  } else {
+    for (int i = 0; i < 32; i += 2)
+      ldpq(as_FloatRegister(i), as_FloatRegister(i+1),
+           Address(post(sp, 4 * wordSize)));
+  }

   pop(0x3fffffff, sp);         // integer registers except lr & sp
 }
diff -r 1c453a12be30 src/cpu/aarch64/vm/macroAssembler_aarch64.hpp
--- a/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp Fri Sep 11 13:00:54
2015 -0700
+++ b/src/cpu/aarch64/vm/macroAssembler_aarch64.hpp Sun Sep 13 16:43:55
2015 +0800
@@ -777,8 +777,8 @@

   DEBUG_ONLY(void verify_heapbase(const char* msg);)

-  void push_CPU_state();
-  void pop_CPU_state() ;
+  void push_CPU_state(bool save_vectors = false);
+  void pop_CPU_state(bool restore_vectors = false) ;

   // Round up to a power of two
   void round_to(Register reg, int modulus);
diff -r 1c453a12be30 src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp
--- a/src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp Fri Sep 11 13:00:54 2015
-0700
+++ b/src/cpu/aarch64/vm/sharedRuntime_aarch64.cpp Sun Sep 13 16:43:55 2015
+0800
@@ -75,8 +75,8 @@
 // FIXME -- this is used by C1
 class RegisterSaver {
  public:
-  static OopMap* save_live_registers(MacroAssembler* masm, int
additional_frame_words, int* total_frame_words);
-  static void restore_live_registers(MacroAssembler* masm);
+  static OopMap* save_live_registers(MacroAssembler* masm, int
additional_frame_words, int* total_frame_words, bool save_vectors = false);
+  static void restore_live_registers(MacroAssembler* masm, bool
restore_vectors = false);

   // Offsets into the register save area
   // Used by deoptimization when it is managing result register
@@ -108,7 +108,17 @@

 };

-OopMap* RegisterSaver::save_live_registers(MacroAssembler* masm, int
additional_frame_words, int* total_frame_words) {
+OopMap* RegisterSaver::save_live_registers(MacroAssembler* masm, int
additional_frame_words, int* total_frame_words, bool save_vectors) {
+#ifdef COMPILER2
+  if (save_vectors) {
+    // Save upper half of vector registers
+    int vect_words = 32 * 8 / wordSize;
+    additional_frame_words += vect_words;
+  }
+#else
+  assert(!save_vectors, "vectors are generated only by C2");
+#endif
+
   int frame_size_in_bytes = round_to(additional_frame_words*wordSize +
                                      reg_save_size*BytesPerInt, 16);
   // OopMap frame size is in compiler stack slots (jint's) not bytes or
words
@@ -122,7 +132,7 @@
   // Save registers, fpu state, and flags.

   __ enter();
-  __ push_CPU_state();
+  __ push_CPU_state(save_vectors);

   // Set an oopmap for the call site.  This oopmap will map all
   // oop-registers and debug-info registers as callee-saved.  This
@@ -139,14 +149,14 @@
                                     // register slots are 8 bytes
                                     // wide, 32 floating-point
                                     // registers
-      oop_map->set_callee_saved(VMRegImpl::stack2reg(sp_offset),
+      oop_map->set_callee_saved(VMRegImpl::stack2reg(sp_offset +
additional_frame_slots),
                                 r->as_VMReg());
     }
   }

   for (int i = 0; i < FloatRegisterImpl::number_of_registers; i++) {
     FloatRegister r = as_FloatRegister(i);
-    int sp_offset = 2 * i;
+    int sp_offset = save_vectors ? (4 * i) : (2 * i);
     oop_map->set_callee_saved(VMRegImpl::stack2reg(sp_offset),
                               r->as_VMReg());
   }
@@ -154,8 +164,11 @@
   return oop_map;
 }

-void RegisterSaver::restore_live_registers(MacroAssembler* masm) {
-  __ pop_CPU_state();
+void RegisterSaver::restore_live_registers(MacroAssembler* masm, bool
restore_vectors) {
+#ifndef COMPILER2
+  assert(!restore_vectors, "vectors are generated only by C2");
+#endif
+  __ pop_CPU_state(restore_vectors);
   __ leave();
 }

@@ -177,9 +190,9 @@
 }

 // Is vector's size (in bytes) bigger than a size saved by default?
-// 16 bytes XMM registers are saved by default using fxsave/fxrstor
instructions.
+// 8 bytes vector registers are saved by default on AArch64.
 bool SharedRuntime::is_wide_vector(int size) {
-  return size > 16;
+  return size > 8;
 }
 // The java_calling_convention describes stack locations as ideal slots on
 // a frame with no abi restrictions. Since we must observe abi restrictions
@@ -2742,7 +2755,7 @@
   bool save_vectors = (poll_type == POLL_AT_VECTOR_LOOP);

   // Save registers, fpu state, and flags
-  map = RegisterSaver::save_live_registers(masm, 0, &frame_size_in_words);
+  map = RegisterSaver::save_live_registers(masm, 0, &frame_size_in_words,
save_vectors);

   // The following is basically a call_VM.  However, we need the precise
   // address of the call in order to generate an oopmap. Hence, we do all
the
@@ -2793,7 +2806,7 @@
   __ bind(noException);

   // Normal exit, restore registers and exit.
-  RegisterSaver::restore_live_registers(masm);
+  RegisterSaver::restore_live_registers(masm, save_vectors);

   __ ret(lr);



On 10 September 2015 at 00:47, Edward Nevill <edward.nevill at gmail.com>
wrote:

> On Wed, 2015-09-09 at 22:39 +0800, Felix Yang wrote:
> > Hi JIT developers,
> >
> >     The testcase passes if MaxVectorSize < 16 or UseLoopSafepoints option
> > is turned off.
> >     After warmup, the test_incrc method failed verification once loop
> safe
> > point interrupt triggers for the outer loop.
>
> Hi,
>
> Thanks for finding this!
>
> I am unable to provoke it using the test case you mention, but it is
> definitely a bug. What command do you use to provoke it? The command I
> tried was
>
> /home/ed/images/jdk/bin/java -jar lib/jtreg.jar -nr -conc:48 -timeout:3
> -othervm -jdk:/home/ed/images/jdk -vmoption:-XX:MaxVectorSize=16 -v1 -a
> -ignore:quiet
> /home/ed/jdk9-dev/hs-comp/hotspot/test/compiler/runtime/7196199/Test7196199.java
>
> > +void MacroAssembler::push_CPU_state(bool save_vectors) {
> > +  push(0x3fffffff, sp);         // integer registers except lr & sp
> > +
> > +  if (!save_vectors) {
> >      for (int i = 30; i >= 0; i -= 2)
> >        stpd(as_FloatRegister(i), as_FloatRegister(i+1),
> >     Address(pre(sp, -2 * wordSize)));
>
> Your patch has TABS in it which means it does not apply cleanly. You may
> like to turn off tab compression/expansion in your editor.
>
> > +#ifdef COMPILER2
> > +  if (save_vectors) {
> > +    assert(MaxVectorSize == 16, "only 128bit vectors are supported
> now");
> ------------------------------^^ ???
> > +    // Save upper half of vector registers
> > +    int vect_words = 32 * 8 / wordSize;
> > +    additional_frame_words += vect_words;
> > +  }
> > +#else
> > +  assert(!save_vectors, "vectors are generated only by C2");
> > +#endif
>
>
> > -void RegisterSaver::restore_live_registers(MacroAssembler* masm) {
> > -  __ pop_CPU_state();
> > +void RegisterSaver::restore_live_registers(MacroAssembler* masm, bool
> > restore_vectors) {
> > +#ifdef COMPILER2
> > +  if (restore_vectors) {
> > +    assert(MaxVectorSize == 16, "only 128bit vectors are supported
> now");
> -------------------------------^^ ???
> > +  }
> > +#else
> > +  assert(!restore_vectors, "vectors are generated only by C2");
> > +#endif
> > +  __ pop_CPU_state(restore_vectors);
> >    __ leave();
> >  }
> >
> > @@ -176,9 +194,9 @@
> >  }
>
> > +
> > +#ifdef COMPILER2
> > +  MaxVectorSize = 16;
> --------------------^^ ???
> > +#endif
> >  }
>
> Why is it necessary to restrict the vector size to 16? This stops the user
> from doing, for example, -XX:MaxVectorSize=8.
>
> aarch64.ad ceils the MaxVectorSize at 16
>
> // Vector width in bytes.
> const int Matcher::vector_width_in_bytes(BasicType bt) {
>   int size = MIN2(16,(int)MaxVectorSize);
>   // Minimum 2 values in vector
>   if (size < 2*type2aelembytes(bt)) size = 0;
>   // But never < 4
>   if (size < 4) size = 0;
>   return size;
> }
>
> All the best,
> Ed.
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20150913/cbf120f2/attachment-0001.html>


More information about the hotspot-compiler-dev mailing list