RFR: 8300669: AArch64: Table based tails processing and wider stores for Arrays.fill() intrinsic [v6]

Fri Jan 27 11:38:18 UTC 2023

On Thu, 26 Jan 2023 18:13:31 GMT, Andrew Haley <aph at openjdk.org> wrote:

> The worst part of it is that, with separate code for each length of copy, we have something that looks good in benchmarks, but it occupies more instruction cache that could be used for other things. I would not be surprised if, when integrated into an application with high icache pressure, it didn't make things worse.

Those are definitely important points. Compared with the status quo, this dedicates a lot more instruction cache to a stub that addresses one specific need.

Also, as you say, this has the potential to noticeably increase cache pressure. One possible counter-argument is that an application doing a lot of array fills may well be filling arrays of a single size, or of a limited range of sizes (even if those sizes vary from app to app), so only a few of the per-length code paths would stay hot. But that won't always be true.

Stepping back a bit from that critique, I'm prompted to ask what is motivating this change. What evidence is there that we need this solution, or any solution, in preference to the one we already have?

> Also, it's a lot of complicated code, with only a small gain even on a specially-designed benchmark.

That's another important point. It underlines the need for a compelling answer to the question of motivation, preferably one that also gives reason to believe the benchmark results will translate into similar gains in a real app.

-------------

PR: https://git.openjdk.org/jdk/pull/12222