RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v4]
Emanuel Peter
epeter at openjdk.org
Thu May 22 10:42:55 UTC 2025
On Thu, 22 May 2025 06:17:40 GMT, Emanuel Peter <epeter at openjdk.org> wrote:
>> Hi @eme64 I removed the `@Warmup` entirely and the test does pass on aarch64. However, I am a bit afraid to fully remove it, as that could sometimes lead to the loop not being warm enough for C2 vectorization to kick in. I haven't tried different values for the warmup iterations though. Do you think it's ok to remove it entirely?
>
> @Bhavana-Kilambi The TestFramework actually forces C2 compilation:
> - runs warmup iterations, maybe C2 triggers automatically because there are enough iterations.
> - Once warmup is over, the TestFramework checks if the method is already compiled, if not, it enqueues it.
> - In the end, we know it is C2 compiled, which gives us the C2 IR we can match with.
>
> In my experience, having a low warmup count works in most cases, except when you need profiling data. If you have zero warmup, we basically have compilation with `-Xcomp`.
>
> So it really depends on your specific case. In general, I would avoid doing an `-Xcomp` compilation / zero warmup, because then we do not test normal compilation with profiling. And compilation with profiling is more important, I think.
>
> But in cases where you have a large loop in the test method, we would trigger OSR and normal compilation with profiling rather soon anyway. So lowering the warmup is ok. How many loop iterations do we need for OSR?
> `product(intx, Tier4BackEdgeThreshold, 40000`. We could round that up to `100_000`, just to be sure. With `LEN = 2048`, you would thus only need about `50` invocations of the tests during warmup to reach C2 compilation. Hence, the current `@Warmup(10000)` is much too high, I think. You could cut down the runtime by about a factor of `100` here, if my math is correct :exploding_head:
>
> What do you think?
> Hi @eme64 Thanks for the details and suggestions. I tried with a `@Warmup(50)` (my calculation is 50 * 2048 = 102400, which is around 100_000) and the test passes on aarch64 (it passes even with 0 warmup though). Do you think we can go ahead with `@Warmup(50)`?
Sounds good :)
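
For reference, the warmup arithmetic quoted above can be sanity-checked with a small sketch. The class and method names here are hypothetical and not part of the test or the TestFramework; the numbers (`Tier4BackEdgeThreshold` of 40000 rounded up to `100_000`, `LEN = 2048`) are taken from the discussion above, and the sketch assumes one back edge per loop iteration.

```java
// Hypothetical helper, only to illustrate the arithmetic from this thread.
public class WarmupEstimate {

    // Number of test invocations needed so that the loop executes at least
    // backEdgeTarget iterations in total, assuming loopLen iterations per call.
    static long invocationsNeeded(long backEdgeTarget, long loopLen) {
        return (backEdgeTarget + loopLen - 1) / loopLen; // ceiling division
    }

    public static void main(String[] args) {
        long len = 2048;        // LEN in the test, as quoted above
        long target = 100_000;  // Tier4BackEdgeThreshold (40000), rounded up

        // 100_000 / 2048 is about 48.8, so 49 invocations, i.e. "about 50".
        System.out.println("invocations needed: " + invocationsNeeded(target, len));

        // With @Warmup(50) the loop runs 50 * 2048 = 102400 iterations.
        System.out.println("iterations with @Warmup(50): " + 50 * len);

        // Ratio of old to new warmup count: 10000 / 50 = 200.
        System.out.println("warmup reduction factor: " + 10_000 / 50);
    }
}
```

The reduction in warmup invocations is the same order of magnitude as the "factor of `100`" estimate above; the actual runtime saving also depends on the parts of the test that are not warmup.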
> Also, can I ask if any other tests failed on your side (they shouldn't though, as I haven't touched any code other than FP16)?
There was no other related test failure :)
-------------
PR Comment: https://git.openjdk.org/jdk/pull/25096#issuecomment-2900746291
More information about the hotspot-compiler-dev mailing list