[aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

Mon Aug 10 08:43:34 UTC 2020

Hi Pengfei,

On 10/08/2020 05:45, Pengfei Li wrote:

>> I also eyeballed /some/ of the generated code to check that it
>> looked ok. I'd really like to be able to do that systematically for
>> a comprehensive test suite that exercised every rule but I only had
>> the machine for a few days. This really ought to be done as a
>> follow-up to ensure that all the rules are working as expected.
> 
> Not sure if you have tried my newly added test in the vectorization
> folder. It checks if expected SVE/NEON instructions are generated as
> expected for each C2 vectornode by checking the OptoAssembly output.
> I put it in another webrev so you may have missed it. 
> http://cr.openjdk.java.net/~pli/rfr/8231441/jtreg.webrev.00/

Ah, thank you. That was not in the patch I Ningsheng pointed me at. It
is exactly what is needed to check the generation rules are all working.

>> Just out of interest why does UseSVE have range(0,2)? It seems you
>> are only testing for UseSVE > 0. Does value 2 correspond to  an
>> optional subset?

> AArch64 SVE has multiple versions. Current Fujitsu FX machine
> supports SVE1 only. We leave 2 here for SVE2 support in the near
> future. 
> https://developer.arm.com/tools-and-software/server-and-hpc/compile/arm-instruction-emulator/resources/tutorials/sve/sve-vs-sve2/introduction-to-sve2

Ah ok, thanks. Got it. Being able to switch on level 1 without level 2
is a good idea.

>> Why have you added a reg_def for R8 and R9 here and also to
>> alloc_class chunk0 at lines 544-545? They aren't used by C2 so why
>> define them?
> 
> This has no functionality change to the two scratch registers. But if
> these are missing in the register definition, the regmask for vector
> registers won't start at an aligned position. So we prefer adding
> them back to make the computation easier.

It would be good to make this clear with a comment. Also, I think you
should change the name of the registers to R8_UNUSED and R9_UNUSED just
to emphasize that these are not expected to be included in any register
sets.

>> prf sets a predicate register field. pgrf sets a governing
>> predicate register field. Should the name not be gprf.
> 
> I guess the reason is that the ArmARM doc says "the Pg field".

Ok, let's leave it at that then and blame ARM ;-)

>> chaitin.cpp:648-660
>> 
>> The comment is rather oddly formatted.
> 
> Thanks for catching this.

Well, that's what reviews are for ...

>> At line 650 you guard the assert with a test for lrg._is_vector. Is
>> that not always going to be guaranteed by the outer condition
>> lrg._is_scalable? If so then you should really assert
>> lrg._is_vector.
>> 
>> . . .

> Thanks for above suggestions. We will consider refactoring these
> parts.

Ok, I'll wait for an updated webrev.

>> Andrew Haley is definitely going to ask you to update function
>> entry (assembler_aarch64.cpp:76) to call these new instruction
>> generation methods and then validate the generated code using
>> asm_check So, I guess you might as well do that now ;-)
> 
> Thanks for letting us know. We will check how to validate those.

Ok, thanks.

>> Can you explain why we need to check p7 here and not do so in other
>> places where we call into the JVM? I'm not saying this is wrong. I
>> just want to know how you decided where re-init of p7 was needed.
> 
> Sorry I don't know how the places are decided. But I will ask
> Ningsheng to explain this question and reply you later.

Sure, thanks.

>> Does this mean that is someone sets the maximum vector size to a
>> non- power of two, such as 384, all superword operations will be
>> bypassed? Including those which can be done using NEON vectors?
> 
> The existing SLP doesn't support non-power-of-2 vector size (there
> are some assertions inside) so we added this. Yes, it's better if we
> have some mechanism to fall back to NEON for non-power-of-2 size. But
> so far in practice, we don't know any real chip implements the
> non-power-of-2 vector size. Also, we are now working on a new
> predicate-driven auto-vectorization pass to support SVE better. Do
> you think it's ok if we print some warnings if someone sets a
> non-power-of-2 size in vm options? Or any other suggestions in the
> short term?

Well, the test for MaxVectorSize in vm_version.cpp currently only
ensures it has been set to a multiple of 16. I think you probably ought
to check for a power of two at that point and exit the VM otherwise. If
hardware comes along that supports a non-power of two we can deal with
it at that point.

regards,

Andrew Dinn
-----------
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill