[aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

Mon Aug 17 06:00:13 UTC 2020

Hi Andrew,

Thanks a lot for the review! Sorry for the late reply, as I was on 
vacation last week. And thanks to Pengfei and Joshua for helping 
clarifying some details in the patch.

> 
> Testing:
> 
> I was able to test this patch on a loaned Fujitsu FX700. I replicated
> your results, passing tier1 tests and the jtreg compiler tests in
> vectorization, codegen, c2/cr6340864 and loopopts.
> 

Thanks for the testing.

> I also eyeballed /some/ of the generated code to check that it looked
> ok. I'd really like to be able to do that systematically for a
> comprehensive test suite that exercised every rule but I only had the
> machine for a few days. This really ought to be done as a follow-up to
> ensure that all the rules are working as expected.
> 
> 

Yes, we would expect Pengfei's OptoAssembly check patch can get merged 
in future.

> 
> General Comments:
> 
> Sizing the NEON registers using 8 slots -- even though there might
> actually be more (or less!) slots in use for a VecA is fine. However, I
> think this needs a little bit more explanation in the .ad. file (see
> comments on ra webrev below)
> 

OK, I will try to have some more clear comments in ad file.

> I'm ok with your choice to use p7 as an always true predicate register
> and also how you choose to init and re-init from code defined via the ad
> file based on C->max_vector_size().
> 
> I am not clear why you are choosing to re-init ptrue after certain JVM
> runtime calls (e.g. when Z calls into the runtime) and not others e.g.
> when we call a JVM_ENTRY. Could you explain the rationale you have
> followed here?
> 
> 

We do the re-init at any possible return points to c2 code, not in any 
runtime c++ functions, which will reduce the re-init calls.

Actually I found those entries by some hack of jvm. In the hacky code 
below we use gcc option -finstrument-functions to build hotspot. With 
this option, each C/C++ function entry/exit will call the instrument 
functions we defined. In instrument functions, we clobber p7 (or other 
reg for test) register, and in c2 function return we verify that p7 (or 
other reg) has been reinitialized.

http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch

> 
> Specific Comments (feature webrev):
> 
> 
> globals_aarch64.hpp:102
> 
> Just out of interest why does UseSVE have range(0,2)? It seems you are
> only testing for UseSVE > 0. Does value 2 correspond to  an optional subset?
> 
> 

Thanks to Pengfei's reply for this. :-)

> 
> Specific Comments (register allocator webrev):
> 
> 
> aarch64.ad:97-100
> 
> Why have you added a reg_def for R8 and R9 here and also to alloc_class
> chunk0 at lines 544-545? They aren't used by C2 so why define them?
> 

I think Pengfei has helped to explain that. I will either add clear 
comments or rename the register name as you suggested.

> 
> assembler_aarch64.hpp:280 (also 699)
> 
> prf sets a predicate register field. pgrf sets a governing predicate
> register field. Should the name not be gprf.
> 

Thanks to Pengfei's comment.

> 
> chaitin.cpp:648-660
> 
> The comment is rather oddly formatted.
> 

Thanks!

> At line 650 you guard the assert with a test for lrg._is_vector. Is that
> not always going to be guaranteed by the outer condition
> lrg._is_scalable? If so then you should really assert lrg._is_vector.
> 
> The special case code for computation of num_regs for a vector stack
> slot also appears in this file with a slightly different organization in
> find_first_set (line 1350) and in PhaseChaitin::Select (line 1590).
> There is another similar case in RegMask::num_registers at regmask.cpp:
> 98. It would be better to factor out the common code into methods of
> LRG. Maybe using the following?
> 
>    bool LRG::is_scalable_vector() {
>      if (_is_scalable) {
>        assert(_is_vector == 1);
>        assert(_num_regs == == RegMask::SlotsPerVecA)
>        return true;
>      }
>      return false;
>    }
> 
>    int LRG::scalable_num_regs() {
>      assert(is_scalable_vector());
>      if (OptoReg::is_stack(_reg)) {
>        return _scalable_reg_slots
>      } else {
>        return num_reg_slots;
>      }
>    }
> 
> 
> chaitin.cpp:1350
> 
> Once again the test for lrg._is_vector should be guaranteed by the outer
> test of lrg._is_scalable. Refactoring using the common methods of LRG as
> above ought to help.
> 
> chaitin.cpp:1591
> 
> Use common method code.
> 
> 
> postaloc.cpp:308/323
> 
> Once again you should be able to use common method code of LRG here.
> 
> 
> regmask.cpp:91
> 
> Once again you should be able to use common method code of LRG here.
> 

As Joshua clarified, we are also working on predicate scalable reg, 
which is not in this patch. Thanks for the suggestion, I will try to 
refactor this a bit.

> Specific Comments (c2 webrev):
> 
> 
> aarch64.ad:3815
> 
> very nice defensive check!
> 
> 
> assembler_aarch64.hpp:2469 & 2699+
> 
> Andrew Haley is definitely going to ask you to update function entry
> (assembler_aarch64.cpp:76) to call these new instruction generation
> methods and then validate the generated code using asm_check So, I guess
> you might as well do that now ;-)
> 
>

Yes! :-) Will add the test code. Thanks!

> zBarrierSetAssembler_aarch64.cpp:434
> 
> Can you explain why we need to check p7 here and not do so in other
> places where we call into the JVM? I'm not saying this is wrong. I just
> want to know how you decided where re-init of p7 was needed.
>

Actually I found this by my hack patch above while running jtreg tests. 
The stub slowpath here can be a c++ function.

> superword.cpp:97
> 
> Does this mean that is someone sets the maximum vector size to a
> non-power of two, such as 384, all superword operations will be
> bypassed? Including those which can be done using NEON vectors?
> 

Current SLP vectorizer only supports power-of-2 vector size. We are 
trying to work out a new vectorizer to support all SVE vector sizes, so 
we would expect a size like 384 could go to that path. I tried current 
patch on a 512-bit SVE hardware which does not support 384-bit:

$ java -XX:MaxVectorSize=16 -version # (32 and 64 are the same)
openjdk version "16-internal" 2021-03-16

$ java -XX:MaxVectorSize=48 -version
OpenJDK 64-Bit Server VM warning: Current system only supports max SVE 
vector length 32. Set MaxVectorSize to 32

(Fallbacks to 32 and issue a warning, as the prctl() call returns 32 
instead of unsupported 48: 
https://www.kernel.org/doc/Documentation/arm64/sve.txt)

Do you think we need to exit vm instead of warning and fallbacking to 32 
here?

Thanks,
Ningsheng