RFR: 8361582: AArch64: Some ConH values cannot be replicated with SVE [v4]

Wed Aug 13 12:30:15 UTC 2025

On Tue, 12 Aug 2025 09:50:21 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> For `loadConH`, LLVM and GCC use
>> 
>>         mov     wscratch, #const
>>         dup     v0.4h, wscratch
>> 
>> We should investigate that.
>> 
>> As far as I can see, LLVM and GCC do this for all vector immediates that don't need more than 2 movz/movk instructions.
>
>> Hi @theRealAph, thanks a lot for your comment. I feel it is a good idea to modify `loadConH` to move a constant instead of doing an `ldr` from the constant pool (it could probably get us some performance benefit as well). However, the scope of this ticket was mainly to fix the jtreg failures that SVE machines with vectors wider than 16 bytes were running into because illegal immediates were being passed to the `sve_dup` instruction. Would it be acceptable if I push this fix first and then create a follow-up task to work on optimizing `loadConH`? I can create a new JBS ticket, assign it to myself and tag it here as well if that helps. Thank you!
> 
> Well, yes, but I'm proposing a simpler and better fix to the problem. Sure, if you want to do this in two steps go ahead.

> Apologies, I thought I could change just the replicate backend nodes to generate a `mov` to a scratch register followed by a `dup` to replicate the value, but I missed that I still can't get rid of the `loadConH` node that loads the immediate from the constant pool.

Why not?

> If we want to change `loadConH` to instead generate a `mov` of an immediate to a scratch register, then we might have to change the `dst` from being a `vRegF` to an `iRegI`.

I don't understand.

Why not do something along these lines?

```
// Replicate a 16-bit half-precision float
instruct replicateHF_imm8_gt128b(vReg dst, immHDupV con) %{
  predicate(Matcher::vector_length_in_bytes(n) > 16);
  match(Set dst (Replicate con));
  format %{ "replicateHF_imm8_gt128b $dst, $con\t# vector > 128 bits" %}
  ins_encode %{
    assert(UseSVE > 0, "must be sve");
    int imm = (int)($con$$constant);
    // SVE DUP (immediate) takes a signed 8-bit value, optionally shifted left by 8.
    bool fits = (imm >= -128 && imm <= 127) ||
                ((imm & 0xff) == 0 && imm >= -32768 && imm <= 32512);
    if (fits) {
      __ sve_dup($dst$$FloatRegister, __ H, imm);
    } else {
      // Materialize the 16-bit pattern in a scratch GPR, then broadcast it.
      __ mov(rscratch1, imm);
      __ sve_dup($dst$$FloatRegister, __ H, rscratch1);
    }
  %}
  ins_pipe(pipe_slow);
%}
```
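A minimal standalone sketch of the "constant fits" test and of the LLVM/GCC heuristic quoted earlier, assuming the lane value arrives as a signed 16-bit pattern. The helper names (`sve_dup_h_imm_fits`, `movz_movk_count`, `choose_h_replicate`, `Strategy`) are hypothetical and are not HotSpot APIs; the encoding rule is the architectural one for SVE DUP (immediate), a signed 8-bit value optionally shifted left by 8, and the MOVZ/MOVK count ignores MOVN and logical-immediate forms, so it is only an upper bound.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a HotSpot API): true if a 16-bit lane value can be
// encoded directly by SVE DUP (immediate), whose operand is a signed 8-bit
// value optionally shifted left by 8.
bool sve_dup_h_imm_fits(int16_t v) {
  if (v >= -128 && v <= 127) {
    return true;  // imm8 with LSL #0
  }
  // imm8 with LSL #8: the low byte must be zero. For an int16_t, v >> 8 then
  // always lies in [-128, 127], so no further range check is needed.
  return (v & 0xff) == 0;
}

// Hypothetical helper: MOVZ/MOVK count to materialize v, one per non-zero
// 16-bit chunk (at least one, for MOVZ #0). MOVN and logical-immediate forms
// are ignored, so this is an upper bound on the real cost.
int movz_movk_count(uint64_t v) {
  int n = 0;
  for (int shift = 0; shift < 64; shift += 16) {
    if (((v >> shift) & 0xffff) != 0) {
      n++;
    }
  }
  return n == 0 ? 1 : n;
}

enum class Strategy { DupImm, MovThenDup, ConstantPool };

// Sketch of the selection for replicating an H-lane constant: try the DUP
// immediate, else use "mov to GPR, then dup" when the value needs at most
// two MOVZ/MOVK, else fall back to a constant-pool load.
Strategy choose_h_replicate(int16_t v) {
  if (sve_dup_h_imm_fits(v)) {
    return Strategy::DupImm;
  }
  // Zero-extending a 16-bit pattern leaves at most one non-zero chunk,
  // so this branch is always taken for H lanes.
  if (movz_movk_count(static_cast<uint16_t>(v)) <= 2) {
    return Strategy::MovThenDup;
  }
  return Strategy::ConstantPool;  // unreachable for 16-bit lanes
}
```

For example, FP16 1.0 (0x3C00) has a zero low byte and takes the immediate path, while 0.1 (0x2E66) does not. Because any 16-bit pattern needs at most one MOVZ, the mov + dup fallback covers every H-lane constant without a constant-pool load.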

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26589#issuecomment-3183701776