[aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support
Ningsheng Jian
ningsheng.jian at arm.com
Wed Aug 19 10:40:49 UTC 2020
Hi Magnus,
Thanks for the review!
On 8/19/20 6:05 PM, Magnus Ihse Bursie wrote:
> On 2020-08-19 11:53, Ningsheng Jian wrote:
>> Hi Andrew,
>>
>> I have updated the patch based on the review comments. Would you mind
>> taking another look? Thanks!
>>
>> Full:
>> http://cr.openjdk.java.net/~njian/8231441/webrev.04/
> Build changes look good. Thank you for remembering to cc build-dev!
>
> This is maybe not relevant, but I was surprised to find
> src/hotspot/cpu/aarch64/aarch64-asmtest.py, because a) it's python code,
> and b) the name implies that it is a test, even though that it resides
> in src. Is this really proper?
This handy script is used to (manually) generate some code in
assembler_aarch64.cpp. The generated code is for assembler smoke test,
so it named that. It's helpful to make sure the assembler emits correct
binary code, but I am not sure whether a python code in the project is
proper or not.
Thanks,
Ningsheng
>
> /Magnus
>>
>> Incremental:
>> http://cr.openjdk.java.net/~njian/8231441/webrev.04-vs-03/
>>
>> Also add build-dev, as there's a makefile change.
>>
>> And the split parts:
>>
>> 1) SVE feature detection:
>> http://cr.openjdk.java.net/~njian/8231441/webrev.04-feature
>>
>> 2) c2 register allocation:
>> http://cr.openjdk.java.net/~njian/8231441/webrev.04-ra
>>
>> 3) SVE c2 backend:
>> http://cr.openjdk.java.net/~njian/8231441/webrev.04-c2
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231441
>> CSR: https://bugs.openjdk.java.net/browse/JDK-8248742
>>
>> JTreg tests are still running, and so far no new failure found.
>>
>> Thanks,
>> Ningsheng
>>
>> On 8/17/20 5:16 PM, Andrew Dinn wrote:
>>> Hi Pengfei,
>>>
>>> On 17/08/2020 07:00, Ningsheng Jian wrote:
>>>> Thanks a lot for the review! Sorry for the late reply, as I was on
>>>> vacation last week. And thanks to Pengfei and Joshua for helping
>>>> clarifying some details in the patch.
>>>
>>> Yes, they did a very good job of answering most of the pending
>>> questions.
>>>
>>>>> I also eyeballed /some/ of the generated code to check that it looked
>>>>> ok. I'd really like to be able to do that systematically for a
>>>>> comprehensive test suite that exercised every rule but I only had the
>>>>> machine for a few days. This really ought to be done as a follow-up to
>>>>> ensure that all the rules are working as expected.
>>>>
>>>> Yes, we would expect Pengfei's OptoAssembly check patch can get merged
>>>> in future.
>>>
>>> I'm fine with that as a follow-up patch if you raise a JIRA for it.
>>>
>>>>> I am not clear why you are choosing to re-init ptrue after certain JVM
>>>>> runtime calls (e.g. when Z calls into the runtime) and not others e.g.
>>>>> when we call a JVM_ENTRY. Could you explain the rationale you have
>>>>> followed here?
>>>>
>>>> We do the re-init at any possible return points to c2 code, not in any
>>>> runtime c++ functions, which will reduce the re-init calls.
>>>>
>>>> Actually I found those entries by some hack of jvm. In the hacky code
>>>> below we use gcc option -finstrument-functions to build hotspot. With
>>>> this option, each C/C++ function entry/exit will call the instrument
>>>> functions we defined. In instrument functions, we clobber p7 (or other
>>>> reg for test) register, and in c2 function return we verify that p7 (or
>>>> other reg) has been reinitialized.
>>>>
>>>> http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch
>>>>
>>>
>>> Nice work. It's very good to have that documented. I'm willing to accept
>>> i) that this has found all current cases and ii) that the verify will
>>> catch any cases that might get introduced by future changes (e.g. the
>>> callout introduced by ZGC that you mention below). As the above mot say
>>> there is a slim chance this might have missed some cases but I think it
>>> is pretty unlikely.
>>>
>>>
>>>>> Specific Comments (register allocator webrev):
>>>>>
>>>>>
>>>>> aarch64.ad:97-100
>>>>>
>>>>> Why have you added a reg_def for R8 and R9 here and also to
>>>>> alloc_class
>>>>> chunk0 at lines 544-545? They aren't used by C2 so why define them?
>>>>>
>>>>
>>>> I think Pengfei has helped to explain that. I will either add clear
>>>> comments or rename the register name as you suggested.
>>>
>>> Ok, good.
>>>
>>>> As Joshua clarified, we are also working on predicate scalable reg,
>>>> which is not in this patch. Thanks for the suggestion, I will try to
>>>> refactor this a bit.
>>>
>>> Ok, I'll wait for an updated patch. Are you planning to include the
>>> scalable predicate reg code as part of this patch? I think that would be
>>> better as it would help to clarify the need to distinguish vector regs
>>> as a subset of scalable regs.
>>>
>>>>> zBarrierSetAssembler_aarch64.cpp:434
>>>>>
>>>>> Can you explain why we need to check p7 here and not do so in other
>>>>> places where we call into the JVM? I'm not saying this is wrong. I
>>>>> just
>>>>> want to know how you decided where re-init of p7 was needed.
>>>>>
>>>>
>>>> Actually I found this by my hack patch above while running jtreg tests.
>>>> The stub slowpath here can be a c++ function.
>>>
>>> Yes, good catch.
>>>
>>>>> superword.cpp:97
>>>>>
>>>>> Does this mean that is someone sets the maximum vector size to a
>>>>> non-power of two, such as 384, all superword operations will be
>>>>> bypassed? Including those which can be done using NEON vectors?
>>>>>
>>>>
>>>> Current SLP vectorizer only supports power-of-2 vector size. We are
>>>> trying to work out a new vectorizer to support all SVE vector sizes, so
>>>> we would expect a size like 384 could go to that path. I tried current
>>>> patch on a 512-bit SVE hardware which does not support 384-bit:
>>>>
>>>> $ java -XX:MaxVectorSize=16 -version # (32 and 64 are the same)
>>>> openjdk version "16-internal" 2021-03-16
>>>>
>>>> $ java -XX:MaxVectorSize=48 -version
>>>> OpenJDK 64-Bit Server VM warning: Current system only supports max SVE
>>>> vector length 32. Set MaxVectorSize to 32
>>>>
>>>> (Fallbacks to 32 and issue a warning, as the prctl() call returns 32
>>>> instead of unsupported 48:
>>>> https://www.kernel.org/doc/Documentation/arm64/sve.txt)
>>>>
>>>> Do you think we need to exit vm instead of warning and fallbacking
>>>> to 32
>>>> here?
>>>
>>> Yes, I think a vm exit would probably be a better choice.
>>>
>>> regards,
>>>
>>>
>>> Andrew Dinn
>>> -----------
>>> Red Hat Distinguished Engineer
>>> Red Hat UK Ltd
>>> Registered in England and Wales under Company Registration No. 03798903
>>> Directors: Michael Cunningham, Michael ("Mike") O'Neill
>>>
>>
>
More information about the aarch64-port-dev
mailing list