RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v12]
Evgeny Astigeevich
eastigeevich at openjdk.org
Wed Feb 19 19:54:05 UTC 2025
On Wed, 19 Feb 2025 17:43:54 GMT, Galder Zamarreño <galder at openjdk.org> wrote:
>> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 44 additional commits since the last revision:
>>
>> - Merge branch 'master' into topic.intrinsify-max-min-long
>> - Fix typo
>> - Renaming methods and variables and add docu on algorithms
>> - Fix copyright years
>> - Make sure it runs with cpus with either avx512 or asimd
>> - Test can only run with 256 bit registers or bigger
>>
>> * Remove platform dependent check
>> and use platform independent configuration instead.
>> - Fix license header
>> - Tests should also run on aarch64 asimd=true envs
>> - Added comment around the assertions
>> - Adjust min/max identity IR test expectations after changes
>> - ... and 34 more: https://git.openjdk.org/jdk/compare/384bab03...a190ae68
>
> I will run a comparison next with the same batch of tests but looking at `int` and see if there are any differences compared with `long` or not.
Hi @galderz,
Here are the results from Graviton 3 (Neoverse-V1).
Without the patch:
Benchmark                       (probability)  (range)  (seed)  (size)   Mode  Cnt      Score    Error  Units
MinMaxVector.intClippingRange             N/A       90       0    1000  thrpt    8  12565.427 ± 37.538  ops/ms
MinMaxVector.intClippingRange             N/A      100       0    1000  thrpt    8  12462.072 ± 84.067  ops/ms
MinMaxVector.intLoopMax                    50      N/A     N/A    2048  thrpt    8   5113.090 ± 68.720  ops/ms
MinMaxVector.intLoopMax                    80      N/A     N/A    2048  thrpt    8   5129.857 ± 35.005  ops/ms
MinMaxVector.intLoopMax                   100      N/A     N/A    2048  thrpt    8   5116.081 ±  8.946  ops/ms
MinMaxVector.intLoopMin                    50      N/A     N/A    2048  thrpt    8   6174.544 ± 52.573  ops/ms
MinMaxVector.intLoopMin                    80      N/A     N/A    2048  thrpt    8   6110.884 ± 54.447  ops/ms
MinMaxVector.intLoopMin                   100      N/A     N/A    2048  thrpt    8   6178.661 ± 48.450  ops/ms
MinMaxVector.intReductionMax               50      N/A     N/A    2048  thrpt    8   5109.270 ± 10.525  ops/ms
MinMaxVector.intReductionMax               80      N/A     N/A    2048  thrpt    8   5123.426 ± 28.229  ops/ms
MinMaxVector.intReductionMax              100      N/A     N/A    2048  thrpt    8   5133.799 ±  7.693  ops/ms
MinMaxVector.intReductionMin               50      N/A     N/A    2048  thrpt    8   5130.209 ± 15.491  ops/ms
MinMaxVector.intReductionMin               80      N/A     N/A    2048  thrpt    8   5127.823 ± 27.767  ops/ms
MinMaxVector.intReductionMin              100      N/A     N/A    2048  thrpt    8   5118.217 ± 22.186  ops/ms
MinMaxVector.longClippingRange            N/A       90       0    1000  thrpt    8   1831.026 ± 15.502  ops/ms
MinMaxVector.longClippingRange            N/A      100       0    1000  thrpt    8   1827.194 ± 22.076  ops/ms
MinMaxVector.longLoopMax                   50      N/A     N/A    2048  thrpt    8   2643.383 ±  9.830  ops/ms
MinMaxVector.longLoopMax                   80      N/A     N/A    2048  thrpt    8   2640.417 ±  7.797  ops/ms
MinMaxVector.longLoopMax                  100      N/A     N/A    2048  thrpt    8   1244.321 ±  1.001  ops/ms
MinMaxVector.longLoopMin                   50      N/A     N/A    2048  thrpt    8   3239.234 ±  8.813  ops/ms
MinMaxVector.longLoopMin                   80      N/A     N/A    2048  thrpt    8   3252.713 ±  3.446  ops/ms
MinMaxVector.longLoopMin                  100      N/A     N/A    2048  thrpt    8   1204.370 ± 10.537  ops/ms
MinMaxVector.longReductionMax              50      N/A     N/A    2048  thrpt    8   2536.322 ±  0.127  ops/ms
MinMaxVector.longReductionMax              80      N/A     N/A    2048  thrpt    8   2536.318 ±  0.277  ops/ms
MinMaxVector.longReductionMax             100      N/A     N/A    2048  thrpt    8   1395.273 ± 13.862  ops/ms
MinMaxVector.longReductionMin              50      N/A     N/A    2048  thrpt    8   2536.325 ±  0.146  ops/ms
MinMaxVector.longReductionMin              80      N/A     N/A    2048  thrpt    8   2536.265 ±  0.272  ops/ms
MinMaxVector.longReductionMin             100      N/A     N/A    2048  thrpt    8   1389.982 ±  5.345  ops/ms
With the patch:
Benchmark                       (probability)  (range)  (seed)  (size)   Mode  Cnt      Score    Error  Units
MinMaxVector.intClippingRange             N/A       90       0    1000  thrpt    8  12598.201 ± 52.631  ops/ms
MinMaxVector.intClippingRange             N/A      100       0    1000  thrpt    8  12555.284 ± 62.472  ops/ms
MinMaxVector.intLoopMax                    50      N/A     N/A    2048  thrpt    8   5079.499 ± 16.392  ops/ms
MinMaxVector.intLoopMax                    80      N/A     N/A    2048  thrpt    8   5100.673 ± 30.376  ops/ms
MinMaxVector.intLoopMax                   100      N/A     N/A    2048  thrpt    8   5082.544 ± 23.540  ops/ms
MinMaxVector.intLoopMin                    50      N/A     N/A    2048  thrpt    8   6137.512 ± 30.198  ops/ms
MinMaxVector.intLoopMin                    80      N/A     N/A    2048  thrpt    8   6136.233 ±  7.726  ops/ms
MinMaxVector.intLoopMin                   100      N/A     N/A    2048  thrpt    8   6142.262 ± 96.510  ops/ms
MinMaxVector.intReductionMax               50      N/A     N/A    2048  thrpt    8   5116.055 ± 23.270  ops/ms
MinMaxVector.intReductionMax               80      N/A     N/A    2048  thrpt    8   5111.481 ± 12.236  ops/ms
MinMaxVector.intReductionMax              100      N/A     N/A    2048  thrpt    8   5106.367 ±  9.035  ops/ms
MinMaxVector.intReductionMin               50      N/A     N/A    2048  thrpt    8   5115.666 ± 15.539  ops/ms
MinMaxVector.intReductionMin               80      N/A     N/A    2048  thrpt    8   5133.127 ±  4.918  ops/ms
MinMaxVector.intReductionMin              100      N/A     N/A    2048  thrpt    8   5120.469 ± 24.355  ops/ms
MinMaxVector.longClippingRange            N/A       90       0    1000  thrpt    8   5094.259 ± 14.092  ops/ms
MinMaxVector.longClippingRange            N/A      100       0    1000  thrpt    8   5096.835 ± 16.517  ops/ms
MinMaxVector.longLoopMax                   50      N/A     N/A    2048  thrpt    8   2636.438 ± 18.760  ops/ms
MinMaxVector.longLoopMax                   80      N/A     N/A    2048  thrpt    8   2644.069 ±  3.933  ops/ms
MinMaxVector.longLoopMax                  100      N/A     N/A    2048  thrpt    8   2646.250 ±  2.007  ops/ms
MinMaxVector.longLoopMin                   50      N/A     N/A    2048  thrpt    8   2648.504 ± 18.294  ops/ms
MinMaxVector.longLoopMin                   80      N/A     N/A    2048  thrpt    8   2658.082 ±  3.362  ops/ms
MinMaxVector.longLoopMin                  100      N/A     N/A    2048  thrpt    8   2647.532 ±  5.600  ops/ms
MinMaxVector.longReductionMax              50      N/A     N/A    2048  thrpt    8   2536.254 ±  0.086  ops/ms
MinMaxVector.longReductionMax              80      N/A     N/A    2048  thrpt    8   2536.209 ±  0.129  ops/ms
MinMaxVector.longReductionMax             100      N/A     N/A    2048  thrpt    8   2536.342 ±  0.068  ops/ms
MinMaxVector.longReductionMin              50      N/A     N/A    2048  thrpt    8   2536.271 ±  0.203  ops/ms
MinMaxVector.longReductionMin              80      N/A     N/A    2048  thrpt    8   2536.250 ±  0.343  ops/ms
MinMaxVector.longReductionMin             100      N/A     N/A    2048  thrpt    8   2536.246 ±  0.179  ops/ms
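
To compare the two runs row by row, one option is to divide the patched throughput by the unpatched one. The following is a minimal sketch of that arithmetic, using a few `long` rows copied from the tables above. The class `SpeedupSketch` and the method `speedup` are hypothetical names chosen for illustration; they are not part of the patch or of the MinMaxVector benchmark.

```java
// Hypothetical helper for reading the tables; not part of the patch.
final class SpeedupSketch {

    // Ratio of patched to unpatched throughput (both in ops/ms).
    // Values above 1.0 mean the patched build was faster.
    static double speedup(double withoutPatch, double withPatch) {
        return withPatch / withoutPatch;
    }

    public static void main(String[] args) {
        // Row labels: benchmark name plus the varying parameter.
        String[] names = {
            "longClippingRange (range=90)",
            "longLoopMax (probability=100)",
            "longLoopMin (probability=50)",
            "longReductionMax (probability=100)",
        };
        // Scores copied from the tables: {without the patch, with the patch}.
        double[][] scores = {
            {1831.026, 5094.259},
            {1244.321, 2646.250},
            {3239.234, 2648.504},
            {1395.273, 2536.342},
        };
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-36s %.2fx%n",
                    names[i], speedup(scores[i][0], scores[i][1]));
        }
    }
}
```

On these numbers, longClippingRange comes out at roughly 2.8x, while longLoopMin at probability 50 drops to about 0.82x. The ratios ignore the reported error margins, so small differences (as in the `int` rows) should not be read as real changes.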
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2669613497