RFR: 8374349: [VectorAPI]: AArch64: Prefer merging mode SVE CPY instruction

Thu Jan 29 01:24:48 UTC 2026

On Wed, 28 Jan 2026 10:17:30 GMT, Andrew Haley <aph at openjdk.org> wrote:

> > Therefore, when you test this change using the C case, you will see a significant performance improvement.
> > > I see 2% uplift on these numbers.
> > 
> > 
> > @theRealAph And I think this also explains your question on these numbers.
> 
> Not at all.
> 
> The performance claim above was:
> 
> > Microbenchmarks show this change brings performance uplift ranging from 11% to 33%, depending on the specific operation and data types.
> 
> But the real performance uplift, as measured in Java microbenchmarks, is 2%.

Sorry, this was my mistake; I should have been more precise. What I should have said is that when this optimization takes effect, the performance improvement is 11%-33%, depending on the specific operation and data types. Thanks for pointing this out!

> Definitions in Assembler should generate the instructions in the Architecture reference Manual. When doing this, please override sve_cpy in MacroAssembler instead of here.

Agreed, that's exactly what I was thinking too.

@theRealAph Thank you for your suggestion. I will address the issue you pointed out in the next commit.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29359#issuecomment-3814815084