RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12]

Thu Feb 20 06:27:57 UTC 2025

On Wed, 19 Feb 2025 19:50:50 GMT, Evgeny Astigeevich <eastigeevich at openjdk.org> wrote:

>> I will run a comparison next with the same batch of tests but looking at `int` and see if there are any differences compared with `long` or not.
>
> Hi @galderz,
> Results from Graviton 3 (Neoverse-V1).
> Without the patch:
> 
> Benchmark                       (probability)  (range)  (seed)  (size)   Mode  Cnt      Score    Error   Units
> MinMaxVector.intClippingRange             N/A       90       0    1000  thrpt    8  12565.427 ± 37.538  ops/ms
> MinMaxVector.intClippingRange             N/A      100       0    1000  thrpt    8  12462.072 ± 84.067  ops/ms
> MinMaxVector.intLoopMax                    50      N/A     N/A    2048  thrpt    8   5113.090 ± 68.720  ops/ms
> MinMaxVector.intLoopMax                    80      N/A     N/A    2048  thrpt    8   5129.857 ± 35.005  ops/ms
> MinMaxVector.intLoopMax                   100      N/A     N/A    2048  thrpt    8   5116.081 ±  8.946  ops/ms
> MinMaxVector.intLoopMin                    50      N/A     N/A    2048  thrpt    8   6174.544 ± 52.573  ops/ms
> MinMaxVector.intLoopMin                    80      N/A     N/A    2048  thrpt    8   6110.884 ± 54.447  ops/ms
> MinMaxVector.intLoopMin                   100      N/A     N/A    2048  thrpt    8   6178.661 ± 48.450  ops/ms
> MinMaxVector.intReductionMax               50      N/A     N/A    2048  thrpt    8   5109.270 ± 10.525  ops/ms
> MinMaxVector.intReductionMax               80      N/A     N/A    2048  thrpt    8   5123.426 ± 28.229  ops/ms
> MinMaxVector.intReductionMax              100      N/A     N/A    2048  thrpt    8   5133.799 ±  7.693  ops/ms
> MinMaxVector.intReductionMin               50      N/A     N/A    2048  thrpt    8   5130.209 ± 15.491  ops/ms
> MinMaxVector.intReductionMin               80      N/A     N/A    2048  thrpt    8   5127.823 ± 27.767  ops/ms
> MinMaxVector.intReductionMin              100      N/A     N/A    2048  thrpt    8   5118.217 ± 22.186  ops/ms
> MinMaxVector.longClippingRange            N/A       90       0    1000  thrpt    8   1831.026 ± 15.502  ops/ms
> MinMaxVector.longClippingRange            N/A      100       0    1000  thrpt    8   1827.194 ± 22.076  ops/ms
> MinMaxVector.longLoopMax                   50      N/A     N/A    2048  thrpt    8   2643.383 ±  9.830  ops/ms
> MinMaxVector.longLoopMax                   80      N/A     N/A    2048  thrpt    8   2640.417 ±  7.797  ops/ms
> MinMaxVector.longLoopMax                  100      N/A     N/A    2048  thrpt    8   1244.321 ±  1.001  ops/ms
> MinMaxVector.longLoopMin                   50      N/A     N/A    2048  thrpt    8   3239.234 ±  8.813  ops/ms
> MinMaxVector.longLoopMin                   80      N/A     N/A    2048  thrpt    8   3252.713 ±  3...

Thanks @eastig for the results on Graviton 3. I'm summarising them here:

Benchmark                       (probability)  (range)  (seed)  (size)   Mode  Cnt       Base      Patch   Units
MinMaxVector.longClippingRange            N/A       90       0    1000  thrpt    8   1831.026   5094.259  ops/ms (+178%)
MinMaxVector.longClippingRange            N/A      100       0    1000  thrpt    8   1827.194   5096.835  ops/ms (+179%)
MinMaxVector.longLoopMax                   50      N/A     N/A    2048  thrpt    8   2643.383   2636.438  ops/ms
MinMaxVector.longLoopMax                   80      N/A     N/A    2048  thrpt    8   2640.417   2644.069  ops/ms
MinMaxVector.longLoopMax                  100      N/A     N/A    2048  thrpt    8   1244.321   2646.250  ops/ms (+112%)
MinMaxVector.longLoopMin                   50      N/A     N/A    2048  thrpt    8   3239.234   2648.504  ops/ms (-18%)
MinMaxVector.longLoopMin                   80      N/A     N/A    2048  thrpt    8   3252.713   2658.082  ops/ms (-18%)
MinMaxVector.longLoopMin                  100      N/A     N/A    2048  thrpt    8   1204.370   2647.532  ops/ms (+119%)
MinMaxVector.longReductionMax              50      N/A     N/A    2048  thrpt    8   2536.322   2536.254  ops/ms
MinMaxVector.longReductionMax              80      N/A     N/A    2048  thrpt    8   2536.318   2536.209  ops/ms
MinMaxVector.longReductionMax             100      N/A     N/A    2048  thrpt    8   1395.273   2536.342  ops/ms (+81%)
MinMaxVector.longReductionMin              50      N/A     N/A    2048  thrpt    8   2536.325   2536.271  ops/ms
MinMaxVector.longReductionMin              80      N/A     N/A    2048  thrpt    8   2536.265   2536.250  ops/ms
MinMaxVector.longReductionMin             100      N/A     N/A    2048  thrpt    8   1389.982   2536.246  ops/ms (+82%)
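
For anyone who wants to recheck the percentages, here is a minimal sketch of how the relative change can be derived from the Base and Patch columns. The class and method names are made up for illustration and are not part of the benchmark or the PR; the two scores in `main` are copied from the rows above.

```java
// Hypothetical helper, not part of the JDK or the MinMaxVector benchmark.
class PercentChange {
    // Relative change of the patched score over the baseline score, in percent.
    // Positive means higher throughput with the patch, negative means lower.
    static double percentChange(double base, double patch) {
        return (patch - base) / base * 100.0;
    }

    public static void main(String[] args) {
        // longLoopMax, probability 100: Base 1244.321, Patch 2646.250
        System.out.printf("longLoopMax 100: %+.1f%%%n", percentChange(1244.321, 2646.250));
        // longLoopMin, probability 50: Base 3239.234, Patch 2648.504
        System.out.printf("longLoopMin 50:  %+.1f%%%n", percentChange(3239.234, 2648.504));
    }
}
```

Since the unit is throughput (ops/ms), a positive value is an improvement and a negative value is a regression.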

On Graviton 3 the vector registers are wide enough for vectorization to kick in, so we see improvements similar to those on x64 AVX-512 in https://github.com/openjdk/jdk/pull/20098#issuecomment-2642788364. The 50% and 80% probability cases show some variance. It was also visible to a lesser degree on x64, but it looks more pronounced on this aarch64 system. Interestingly, the drop shows up for `longLoopMin` (-18%) but not for `longLoopMax`, though that could simply be variance.
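
As a reminder of the loop shapes under discussion, here is a rough sketch of a `long` reduction and an element-wise min loop, together with one possible way of generating data for a given probability. This is not the MinMaxVector source: the class name, method names and the data generator are assumptions made for illustration, and the real benchmark may build its inputs differently.

```java
import java.util.Random;

// Hypothetical sketch, not the actual MinMaxVector benchmark code.
class LongMinMaxSketch {
    // Assumed data generator: each element becomes a new running maximum
    // with roughly the given probability (in percent), otherwise it is 0.
    static long[] increasingWithProbability(int size, int probability, long seed) {
        Random random = new Random(seed);
        long[] data = new long[size];
        long max = 0;
        for (int i = 0; i < size; i++) {
            if (random.nextInt(100) < probability) {
                max++;
                data[i] = max;   // strictly greater than everything before it
            } else {
                data[i] = 0;     // never greater than the running maximum
            }
        }
        return data;
    }

    // Reduction shape: a single accumulator updated with Math.max(long, long).
    static long reductionMax(long[] data) {
        long result = Long.MIN_VALUE;
        for (long value : data) {
            result = Math.max(result, value);
        }
        return result;
    }

    // Element-wise shape: out[i] = Math.min(a[i], b[i]).
    static void loopMin(long[] a, long[] b, long[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.min(a[i], b[i]);
        }
    }

    public static void main(String[] args) {
        long[] data = increasingWithProbability(2048, 80, 42L);
        System.out.println("max = " + reductionMax(data));
    }
}
```

With a generator like this, probability 100 makes every element a new maximum, so the comparison inside `Math.max` always goes the same way, while 50 and 80 mix both outcomes.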

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2670574593