RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long)

Wed Jul 17 09:20:51 UTC 2024

On Wed, 10 Jul 2024 14:24:05 GMT, Jasmine Karthikeyan <jkarthikeyan at openjdk.org> wrote:

> The C2 changes look nice! I just added one comment here about style. It would also be good to add some IR tests checking that the intrinsic is creating `MaxL`/`MinL` nodes before macro expansion, and a microbenchmark to compare results.

Thanks for the review. +1 to the IR tests, I'll work on those.

Re: microbenchmark - what do you have exactly in mind? For vectorization performance there is `ReductionPerf` though it's not a microbenchmark per se. Do you want a microbenchmark for the performance of vectorized max/min long? For non-vectorization performance there is `MathBench`.

I would not expect performance differences in `MathBench` because the backend is still the same and this change really benefits vectorization. I've run the min/max long tests on darwin/aarch64 and linux/x64 and indeed I see no difference:

linux/x64

Benchmark          (seed)   Mode  Cnt        Score       Error   Units
MathBench.maxLong       0  thrpt    8  1464197.164 ± 27044.205  ops/ms # base
MathBench.minLong       0  thrpt    8  1469917.328 ± 25397.401  ops/ms # base
MathBench.maxLong       0  thrpt    8  1469615.250 ± 17950.429  ops/ms # patched
MathBench.minLong       0  thrpt    8  1456290.514 ± 44455.727  ops/ms # patched

darwin/aarch64

Benchmark          (seed)   Mode  Cnt        Score        Error   Units
MathBench.maxLong       0  thrpt    8  1739341.447 ? 210983.444  ops/ms # base
MathBench.minLong       0  thrpt    8  1659547.649 ? 260554.159  ops/ms # base
MathBench.maxLong       0  thrpt    8  1660449.074 ? 254534.725  ops/ms # patched
MathBench.minLong       0  thrpt    8  1729728.021 ?  16327.575  ops/ms # patched

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2232836799