RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4]
Andrew Haley
aph at openjdk.org
Wed Aug 13 12:30:15 UTC 2025
On Tue, 12 Aug 2025 09:50:21 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> For `loadConH`, LLVM and GCC use
>>
>> mov wscratch, #const
>> dup v0.4h, wscratch
>>
>> We should investigate that.
>>
>> As far as I can see, LLVM and GCC do this for all vector immediates that don't need more than 2 movz/movk instructions.
>
>> Hi @theRealAph, thanks a lot for your comment. I agree it is a good idea to modify `loadConH` to move a constant instead of doing an `ldr` from the constant pool (it could probably give us some performance benefit as well). However, the scope of this ticket was mainly to fix the JTREG errors that >16B SVE machines were running into due to illegal immediates being passed to the `sve_dup` instruction. Would it be acceptable if I push this fix first and then create a follow-up task to work on optimizing `loadConH`? I can create a new JBS ticket, assign it to myself, and tag it here as well if that helps. Thank you!
>
> Well, yes, but I'm proposing a simpler and better fix to the problem. Sure, if you want to do this in two steps go ahead.
>
> Apologies, I thought I could change just the replicate backend nodes to generate a `mov` to a scratch register followed by a `dup` to replicate the value, but I missed the point that I still can't get rid of the `loadConH` node that loads the immediate from the constant pool.
Why not?
> If we want to change `loadConH` to instead generate a `mov` of an immediate to a scratch register, then we might have to change the `dst` from being a `vRegF` to an `iRegI`
I don't understand.
Why not do something along these lines?
// Replicate a 16-bit half precision float
instruct replicateHF_imm8_gt128b(vReg dst, immHDupV con) %{
  predicate(Matcher::vector_length_in_bytes(n) > 16);
  match(Set dst (Replicate con));
  format %{ "replicateHF_imm8_gt128b $dst, $con\t# vector > 128 bits" %}
  ins_encode %{
    assert(UseSVE > 0, "must be sve");
    if (constant fits) {  // pseudocode: the constant is a legal sve_dup immediate
      __ sve_dup($dst$$FloatRegister, __ H, (int)($con$$constant));
    } else {
      // Otherwise materialize the constant in a scratch register and replicate that.
      __ mov(rscratch1, (int)($con$$constant));
      __ sve_dup($dst$$FloatRegister, __ H, rscratch1);
    }
  %}
  ins_pipe(pipe_slow);
%}
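
For illustration only, the `constant fits` test could look roughly like the standalone sketch below. It assumes the usual SVE `DUP (immediate)` encoding, a signed 8-bit immediate optionally shifted left by 8, and that the constant is the raw 16-bit pattern of the half-precision value. The helper name `fits_sve_dup_imm_h` is hypothetical and is not an existing HotSpot function.

```cpp
#include <cstdint>

// Hypothetical helper (not part of HotSpot): returns true if the 16-bit
// pattern `bits` can be encoded as the immediate of SVE DUP with .H elements,
// assuming the immediate is a signed 8-bit value with an optional LSL #8.
constexpr bool fits_sve_dup_imm_h(uint16_t bits) {
  // Interpret the pattern as a signed 16-bit integer without relying on
  // implementation-defined narrowing conversions.
  const int value = (bits & 0x8000) ? static_cast<int>(bits) - 0x10000
                                    : static_cast<int>(bits);
  // Unshifted form: the value itself is a signed 8-bit immediate.
  if (value >= -128 && value <= 127) {
    return true;
  }
  // Shifted form: the low byte must be zero. The high byte is then always a
  // valid signed 8-bit immediate for a 16-bit element.
  return (bits & 0xFF) == 0;
}

// Compile-time spot checks of the sketch.
static_assert(fits_sve_dup_imm_h(0x3C00), "FP16 1.0: low byte is zero");
static_assert(!fits_sve_dup_imm_h(0x3C01), "both bytes non-zero, out of imm8 range");
static_assert(fits_sve_dup_imm_h(0xFF80), "-128 fits unshifted");
static_assert(!fits_sve_dup_imm_h(0x0080), "128 is out of range and cannot be shifted");
```

Under these assumptions, patterns such as 0x3C01 would take the `mov` plus `sve_dup` path, while patterns with a zero low byte would keep using the immediate form.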
-------------
PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3183701776