[aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

Mon Aug 10 04:45:17 UTC 2020

Hi Andrew Dinn,

Thank you so much for taking the time to review this.

As Ningsheng is on leave this week, I will attempt to answer your specific questions based on what I know. I'm sorry that I'm not able to answer all your questions since I'm not familiar with every detail of the patch. And you may still need to wait him coming back to update the webrev(s).

> I was able to test this patch on a loaned Fujitsu FX700. I replicated your
> results, passing tier1 tests and the jtreg compiler tests in vectorization,
> codegen, c2/cr6340864 and loopopts.
> 
> I also eyeballed /some/ of the generated code to check that it looked ok. I'd
> really like to be able to do that systematically for a comprehensive test suite
> that exercised every rule but I only had the machine for a few days. This
> really ought to be done as a follow-up to ensure that all the rules are working
> as expected.

Not sure if you have tried my newly added test in the vectorization folder. It checks if expected SVE/NEON instructions are generated as expected for each C2 vectornode by checking the OptoAssembly output. I put it in another webrev so you may have missed it.
http://cr.openjdk.java.net/~pli/rfr/8231441/jtreg.webrev.00/

> Specific Comments (feature webrev):
> 
> 
> globals_aarch64.hpp:102
> 
> Just out of interest why does UseSVE have range(0,2)? It seems you are only
> testing for UseSVE > 0. Does value 2 correspond to  an optional subset?

AArch64 SVE has multiple versions. Current Fujitsu FX machine supports SVE1 only. We leave 2 here for SVE2 support in the near future.
https://developer.arm.com/tools-and-software/server-and-hpc/compile/arm-instruction-emulator/resources/tutorials/sve/sve-vs-sve2/introduction-to-sve2

> Specific Comments (register allocator webrev):
> 
> 
> aarch64.ad:97-100
> 
> Why have you added a reg_def for R8 and R9 here and also to alloc_class
> chunk0 at lines 544-545? They aren't used by C2 so why define them?

This has no functionality change to the two scratch registers. But if these are missing in the register definition, the regmask for vector registers won't start at an aligned position. So we prefer adding them back to make the computation easier.

> assembler_aarch64.hpp:280 (also 699)
> 
> prf sets a predicate register field. pgrf sets a governing predicate register field.
> Should the name not be gprf.

I guess the reason is that the ArmARM doc says "the Pg field".

> chaitin.cpp:648-660
> 
> The comment is rather oddly formatted.

Thanks for catching this.

> At line 650 you guard the assert with a test for lrg._is_vector. Is that not
> always going to be guaranteed by the outer condition lrg._is_scalable? If so
> then you should really assert lrg._is_vector.
> 
> The special case code for computation of num_regs for a vector stack slot
> also appears in this file with a slightly different organization in find_first_set
> (line 1350) and in PhaseChaitin::Select (line 1590).
> There is another similar case in RegMask::num_registers at regmask.cpp:
> 98. It would be better to factor out the common code into methods of LRG.
> Maybe using the following?
> 
>   bool LRG::is_scalable_vector() {
>     if (_is_scalable) {
>       assert(_is_vector == 1);
>       assert(_num_regs == == RegMask::SlotsPerVecA)
>       return true;
>     }
>     return false;
>   }
> 
>   int LRG::scalable_num_regs() {
>     assert(is_scalable_vector());
>     if (OptoReg::is_stack(_reg)) {
>       return _scalable_reg_slots
>     } else {
>       return num_reg_slots;
>     }
>   }
> 
> 
> chaitin.cpp:1350
> 
> Once again the test for lrg._is_vector should be guaranteed by the outer test
> of lrg._is_scalable. Refactoring using the common methods of LRG as above
> ought to help.
> 
> chaitin.cpp:1591
> 
> Use common method code.
> 
> 
> postaloc.cpp:308/323
> 
> Once again you should be able to use common method code of LRG here.
> 
> 
> regmask.cpp:91
> 
> Once again you should be able to use common method code of LRG here.

Thanks for above suggestions. We will consider refactoring these parts.

> Specific Comments (c2 webrev):
> 
> 
> aarch64.ad:3815
> 
> very nice defensive check!
> 
> 
> assembler_aarch64.hpp:2469 & 2699+
> 
> Andrew Haley is definitely going to ask you to update function entry
> (assembler_aarch64.cpp:76) to call these new instruction generation
> methods and then validate the generated code using asm_check So, I guess
> you might as well do that now ;-)

Thanks for letting us know. We will check how to validate those.

> zBarrierSetAssembler_aarch64.cpp:434
> 
> Can you explain why we need to check p7 here and not do so in other places
> where we call into the JVM? I'm not saying this is wrong. I just want to know
> how you decided where re-init of p7 was needed.

Sorry I don't know how the places are decided. But I will ask Ningsheng to explain this question and reply you later.

> superword.cpp:97
> 
> Does this mean that is someone sets the maximum vector size to a non-
> power of two, such as 384, all superword operations will be bypassed?
> Including those which can be done using NEON vectors?

The existing SLP doesn't support non-power-of-2 vector size (there are some assertions inside) so we added this. Yes, it's better if we have some mechanism to fall back to NEON for non-power-of-2 size. But so far in practice, we don't know any real chip implements the non-power-of-2 vector size. Also, we are now working on a new predicate-driven auto-vectorization pass to support SVE better. Do you think it's ok if we print some warnings if someone sets a non-power-of-2 size in vm options? Or any other suggestions in the short term?

--
Thanks,
Pengfei