RFR: 8285040: PPC64 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v4]
Martin Doerr
mdoerr at openjdk.java.net
Thu Apr 21 16:03:29 UTC 2022
On Wed, 20 Apr 2022 15:30:34 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:
>> Add match rules for UDivI, UModI, UDivL, UModL as on x86 (JDK-8282221). PPC64 doesn't have DivMod instructions which can deliver both results at once.
>> Note: The x86 tests cannot currently be extended to this platform because https://bugs.openjdk.java.net/browse/JDK-8280120 is not yet implemented.
>>
>> (Removed UDivI and UModI again in the second commit because performance was worse. C2 can optimize better without intrinsification as long as we don't have the UseDivMod optimization. Added back later in the 4th commit.)
>>
>> IntegerDivMod without UDivI, UModI on Power9:
>>
>> Benchmark (BUFFER_SIZE) (divisorType) Mode Cnt Score Error Units
>> IntegerDivMod.testDivideRemainderUnsigned 1024 mixed avgt 25 2386.064 ± 2.746 ns/op
>> IntegerDivMod.testDivideRemainderUnsigned 1024 positive avgt 25 2385.697 ± 2.831 ns/op
>> IntegerDivMod.testDivideRemainderUnsigned 1024 negative avgt 25 2386.021 ± 2.756 ns/op
>> IntegerDivMod.testDivideUnsigned 1024 mixed avgt 25 1788.233 ± 5.612 ns/op
>> IntegerDivMod.testDivideUnsigned 1024 positive avgt 25 1785.991 ± 7.001 ns/op
>> IntegerDivMod.testDivideUnsigned 1024 negative avgt 25 1789.000 ± 6.258 ns/op
>> IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 25 2084.063 ± 2.618 ns/op
>> IntegerDivMod.testRemainderUnsigned 1024 positive avgt 25 2080.573 ± 5.779 ns/op
>> IntegerDivMod.testRemainderUnsigned 1024 negative avgt 25 2083.192 ± 2.111 ns/op
>>
>>
>> LongDivMod without UDivL, UModL on Power9:
>>
>> Benchmark (BUFFER_SIZE) (divisorType) Mode Cnt Score Error Units
>> LongDivMod.testDivideRemainderUnsigned 1024 mixed avgt 25 5482.364 ± 18.448 ns/op
>> LongDivMod.testDivideRemainderUnsigned 1024 positive avgt 25 4722.370 ± 2.314 ns/op
>> LongDivMod.testDivideRemainderUnsigned 1024 negative avgt 25 2024.052 ± 0.604 ns/op
>> LongDivMod.testDivideUnsigned 1024 mixed avgt 25 4772.528 ± 63.147 ns/op
>> LongDivMod.testDivideUnsigned 1024 positive avgt 25 3711.178 ± 1.178 ns/op
>> LongDivMod.testDivideUnsigned 1024 negative avgt 25 1195.149 ± 0.822 ns/op
>> LongDivMod.testRemainderUnsigned 1024 mixed avgt 25 4753.722 ± 115.171 ns/op
>> LongDivMod.testRemainderUnsigned 1024 positive avgt 25 3749.799 ± 5.935 ns/op
>> LongDivMod.testRemainderUnsigned 1024 negative avgt 25 1488.802 ± 0.628 ns/op
>>
>>
>> With UDivL, UModL:
>>
>> Benchmark (BUFFER_SIZE) (divisorType) Mode Cnt Score Error Units
>> LongDivMod.testDivideRemainderUnsigned 1024 mixed avgt 25 3253.162 ± 1.019 ns/op
>> LongDivMod.testDivideRemainderUnsigned 1024 positive avgt 25 3252.280 ± 1.608 ns/op
>> LongDivMod.testDivideRemainderUnsigned 1024 negative avgt 25 3252.933 ± 1.850 ns/op
>> LongDivMod.testDivideUnsigned 1024 mixed avgt 25 1648.233 ± 1.830 ns/op
>> LongDivMod.testDivideUnsigned 1024 positive avgt 25 1648.639 ± 0.816 ns/op
>> LongDivMod.testDivideUnsigned 1024 negative avgt 25 1646.247 ± 3.835 ns/op
>> LongDivMod.testRemainderUnsigned 1024 mixed avgt 25 1766.701 ± 1.897 ns/op
>> LongDivMod.testRemainderUnsigned 1024 positive avgt 25 1767.413 ± 1.450 ns/op
>> LongDivMod.testRemainderUnsigned 1024 negative avgt 25 1767.216 ± 1.800 ns/op
>>
>>
>> It turns out that the "UseDivMod" optimization is key for this benchmark. It is implemented in the 3rd commit.
>> With UDivL, UModL and UseDivMod optimization:
>>
>> Benchmark (BUFFER_SIZE) (divisorType) Mode Cnt Score Error Units
>> LongDivMod.testDivideRemainderUnsigned 1024 mixed avgt 25 1848.883 ± 3.550 ns/op
>> LongDivMod.testDivideRemainderUnsigned 1024 positive avgt 25 1849.743 ± 1.309 ns/op
>> LongDivMod.testDivideRemainderUnsigned 1024 negative avgt 25 1848.598 ± 2.436 ns/op
>> LongDivMod.testDivideUnsigned 1024 mixed avgt 25 1646.810 ± 4.024 ns/op
>> LongDivMod.testDivideUnsigned 1024 positive avgt 25 1648.605 ± 1.157 ns/op
>> LongDivMod.testDivideUnsigned 1024 negative avgt 25 1648.319 ± 1.285 ns/op
>> LongDivMod.testRemainderUnsigned 1024 mixed avgt 25 1766.375 ± 1.559 ns/op
>> LongDivMod.testRemainderUnsigned 1024 positive avgt 25 1765.909 ± 1.815 ns/op
>> LongDivMod.testRemainderUnsigned 1024 negative avgt 25 1766.459 ± 1.255 ns/op
>>
>>
>> The Integer version now shows basically the same performance:
>> IntegerDivMod with UDivI, UModI and UseDivMod optimization:
>>
>> Benchmark (BUFFER_SIZE) (divisorType) Mode Cnt Score Error Units
>> IntegerDivMod.testDivideRemainderUnsigned 1024 mixed avgt 25 1855.158 ± 2.161 ns/op
>> IntegerDivMod.testDivideRemainderUnsigned 1024 positive avgt 25 1857.348 ± 1.569 ns/op
>> IntegerDivMod.testDivideRemainderUnsigned 1024 negative avgt 25 1856.095 ± 2.129 ns/op
>> IntegerDivMod.testDivideUnsigned 1024 mixed avgt 25 1648.743 ± 0.819 ns/op
>> IntegerDivMod.testDivideUnsigned 1024 positive avgt 25 1647.971 ± 1.731 ns/op
>> IntegerDivMod.testDivideUnsigned 1024 negative avgt 25 1648.994 ± 0.861 ns/op
>> IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 25 1777.920 ± 3.967 ns/op
>> IntegerDivMod.testRemainderUnsigned 1024 positive avgt 25 1776.796 ± 5.479 ns/op
>> IntegerDivMod.testRemainderUnsigned 1024 negative avgt 25 1778.992 ± 3.611 ns/op
>
> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>
> Add back Integer nodes after enabling UseDivMod optimization. That makes the difference.
Thanks for the reviews!
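
For readers unfamiliar with why a DivMod-style optimization helps when the hardware has no combined instruction, here is a minimal sketch. It assumes the optimization amounts to reusing the quotient and deriving the remainder as a - (a / b) * b, so that only one divide is needed per pair; that reading is an assumption and is not taken from the patch itself. The class name UnsignedDivModSketch and the method remainderViaQuotient are hypothetical and exist only for illustration.

```java
final class UnsignedDivModSketch {
    // Hypothetical helper: derive the unsigned remainder from the unsigned
    // quotient. Wrap-around arithmetic makes a - q * b exact modulo 2^32.
    static int remainderViaQuotient(int a, int b) {
        int q = Integer.divideUnsigned(a, b);
        return a - q * b;
    }

    // Same identity for 64-bit values, exact modulo 2^64.
    static long remainderViaQuotient(long a, long b) {
        long q = Long.divideUnsigned(a, b);
        return a - q * b;
    }

    public static void main(String[] args) {
        int[] ints = {1, 7, -1, -7, Integer.MIN_VALUE, Integer.MAX_VALUE};
        long[] longs = {1L, 7L, -1L, -7L, Long.MIN_VALUE, Long.MAX_VALUE};

        // Compare the derived remainder against the library method.
        for (int a : ints) {
            for (int b : ints) {
                if (remainderViaQuotient(a, b) != Integer.remainderUnsigned(a, b)) {
                    throw new AssertionError("int mismatch: " + a + ", " + b);
                }
            }
        }
        for (long a : longs) {
            for (long b : longs) {
                if (remainderViaQuotient(a, b) != Long.remainderUnsigned(a, b)) {
                    throw new AssertionError("long mismatch: " + a + ", " + b);
                }
            }
        }
        System.out.println("ok");
    }
}
```

Negative inputs are included because, read as unsigned, they are the largest values, which matches the "negative" divisorType column in the benchmark results.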
-------------
PR: https://git.openjdk.java.net/jdk/pull/8304