RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long)
Galder Zamarreño
galder at openjdk.org
Wed Jul 17 09:20:51 UTC 2024
On Wed, 10 Jul 2024 14:24:05 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:
> The C2 changes look nice! I just added one comment here about style. It would also be good to add some IR tests checking that the intrinsic is creating `MaxL`/`MinL` nodes before macro expansion, and a microbenchmark to compare results.
Thanks for the review. +1 to the IR tests, I'll work on those.
Re: microbenchmark - what exactly do you have in mind? For vectorization performance there is `ReductionPerf`, though it's not a microbenchmark per se. Do you want a microbenchmark for the performance of vectorized max/min long? For non-vectorization performance there is `MathBench`.
I would not expect performance differences in `MathBench`, because the backend is still the same and this change mainly benefits vectorization. I've run the min/max long tests on darwin/aarch64 and linux/x64 and indeed I see no difference:
linux/x64
Benchmark          (seed)   Mode  Cnt        Score       Error   Units
MathBench.maxLong       0  thrpt    8  1464197.164 ± 27044.205  ops/ms  # base
MathBench.minLong       0  thrpt    8  1469917.328 ± 25397.401  ops/ms  # base
MathBench.maxLong       0  thrpt    8  1469615.250 ± 17950.429  ops/ms  # patched
MathBench.minLong       0  thrpt    8  1456290.514 ± 44455.727  ops/ms  # patched
darwin/aarch64
Benchmark          (seed)   Mode  Cnt        Score        Error   Units
MathBench.maxLong       0  thrpt    8  1739341.447 ± 210983.444  ops/ms  # base
MathBench.minLong       0  thrpt    8  1659547.649 ± 260554.159  ops/ms  # base
MathBench.maxLong       0  thrpt    8  1660449.074 ± 254534.725  ops/ms  # patched
MathBench.minLong       0  thrpt    8  1729728.021 ±  16327.575  ops/ms  # patched
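For context, the kind of code where I'd expect vectorized max/min long to matter is a reduction loop over a `long[]`, rather than the single scalar calls that `MathBench` measures. A minimal sketch of such a loop follows; the class and method names are hypothetical and not taken from `ReductionPerf`, `MathBench`, or the PR, and whether C2 actually vectorizes it depends on the platform and the JDK build:

```java
public class LongMinMaxReduction {

    // Max reduction over a long array using Math.max(long, long).
    // Returns Long.MIN_VALUE for an empty array.
    static long maxReduce(long[] values) {
        long result = Long.MIN_VALUE;
        for (int i = 0; i < values.length; i++) {
            result = Math.max(result, values[i]);
        }
        return result;
    }

    // Min reduction over a long array using Math.min(long, long).
    // Returns Long.MAX_VALUE for an empty array.
    static long minReduce(long[] values) {
        long result = Long.MAX_VALUE;
        for (int i = 0; i < values.length; i++) {
            result = Math.min(result, values[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Deterministic input so that runs are comparable.
        long[] data = new long[1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = (i * 31L) % 1000L - 500L;
        }
        System.out.println("max = " + maxReduce(data));
        System.out.println("min = " + minReduce(data));
    }
}
```

A microbenchmark for the vectorized case could wrap loops of this shape, if that is what you have in mind.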
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2232836799