[aarch64-port-dev ] [10] RFR(S): JDK-8184943: AARCH64: Intrinsify hasNegatives
Stuart Monteith
stuart.monteith at linaro.org
Thu Jul 20 14:31:10 UTC 2017
The additional platform I'm able to run on produces the following results:
bellsw:
Benchmark (length) Mode Cnt Score Error Units
HasNegatives.loopingFastMethod 4 avgt 50 3822.856 ± 0.266 ns/op
HasNegatives.loopingFastMethod 31 avgt 50 10497.204 ± 0.705 ns/op
HasNegatives.loopingFastMethod 65 avgt 50 11452.405 ± 2.005 ns/op
HasNegatives.loopingFastMethod 101 avgt 50 13462.799 ± 56.811 ns/op
HasNegatives.loopingFastMethod 256 avgt 50 20668.731 ± 156.668 ns/op
HasNegatives.steamFastMethod 4 avgt 50 6208.364 ± 0.429 ns/op
HasNegatives.steamFastMethod 31 avgt 50 23371.059 ± 1.922 ns/op
HasNegatives.steamFastMethod 65 avgt 50 52450.904 ± 4.051 ns/op
HasNegatives.steamFastMethod 101 avgt 50 61061.875 ± 17.735 ns/op
HasNegatives.steamFastMethod 256 avgt 50 164507.570 ± 16.935 ns/op
Linaro patch:
Benchmark (length) Mode Cnt Score Error Units
HasNegatives.loopingFastMethod 4 avgt 50 3823.895 ± 0.264 ns/op
HasNegatives.loopingFastMethod 31 avgt 50 7977.361 ± 141.724 ns/op
HasNegatives.loopingFastMethod 65 avgt 50 12303.588 ± 100.645 ns/op
HasNegatives.loopingFastMethod 101 avgt 50 14464.835 ± 126.982 ns/op
HasNegatives.loopingFastMethod 256 avgt 50 38142.723 ± 3.266 ns/op
HasNegatives.steamFastMethod 4 avgt 50 6208.206 ± 0.401 ns/op
HasNegatives.steamFastMethod 31 avgt 50 23370.868 ± 1.337 ns/op
HasNegatives.steamFastMethod 65 avgt 50 52450.499 ± 6.449 ns/op
HasNegatives.steamFastMethod 101 avgt 50 61013.218 ± 73.249 ns/op
HasNegatives.steamFastMethod 256 avgt 50 159738.530 ± 12.301 ns/op
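So there are obvious benefits to the larger 64-byte chunks being read: at
length 256, loopingFastMethod scores 20668.731 ns/op with the bellsw patch
against 38142.723 ns/op with the Linaro patch. At length 31 the Linaro patch
is still ahead (7977.361 vs 10497.204 ns/op).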
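
To illustrate what "reading larger chunks" means here, the following is a minimal Java sketch, not the code from either webrev. It assumes the check is simply "is the top bit set in any byte", and it ORs eight 64-bit words (64 bytes) together before testing the sign bits once. The class name HasNegativesSketch and both method names are hypothetical; the real intrinsics are AArch64 assembly stubs and may differ in every detail.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not taken from either patch under review.
public final class HasNegativesSketch {
    // Mask selecting the top (sign) bit of each of the 8 bytes in a long.
    private static final long SIGN_BITS = 0x8080808080808080L;

    // Reference version: one byte per iteration.
    static boolean hasNegativesScalar(byte[] ba, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (ba[i] < 0) {
                return true;
            }
        }
        return false;
    }

    // Chunked version: 64 bytes per iteration, then 8 bytes, then single bytes.
    static boolean hasNegativesWide(byte[] ba, int off, int len) {
        // Byte order does not matter for a sign-bit test; it is set explicitly
        // only to make the reads well defined.
        ByteBuffer buf = ByteBuffer.wrap(ba, off, len).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= 64) {
            long acc = 0;
            for (int k = 0; k < 8; k++) {
                acc |= buf.getLong();   // accumulate 8 words, test once
            }
            if ((acc & SIGN_BITS) != 0) {
                return true;
            }
        }
        while (buf.remaining() >= 8) {
            if ((buf.getLong() & SIGN_BITS) != 0) {
                return true;
            }
        }
        while (buf.hasRemaining()) {
            if (buf.get() < 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] data = new byte[256];
        java.util.Arrays.fill(data, (byte) 'a');
        System.out.println(hasNegativesWide(data, 0, data.length));
        data[200] = (byte) 0x80;
        System.out.println(hasNegativesWide(data, 0, data.length));
    }
}
```

The trade-off this sketch shows is the one visible in the tables: the 64-byte loop does less work per byte on long arrays, but short inputs such as length 31 never reach it and only pay for the extra dispatch.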
On 20 July 2017 at 14:26, Dmitry Chuyko <dmitry.chuyko at bell-sw.com> wrote:
> Andrew,
>
> Just a couple of quick questions on the micro-benchmark:
>
> - What's the purpose of using custom Sink class instead of JMH's Blackhole?
> Is that a check of mixing calculation with actual write of its result? I see
> Blackhole usage in loopingMethod() and testMethod() variants,
> loopingFastMethod below uses Sink.
> - What's the purpose of nested 1000 iteration loops? I guess that may test
> impact in case of loop unrolling. Again, I see testMethod() variant
> without the loop.
>
> -Dmitry Chuyko
>
>
> On 07/20/2017 03:55 PM, Andrew Haley wrote:
>>
>> Hi,
>>
>> On 20/07/17 11:03, Dmitrij Pochepko wrote:
>>
>>> Please review this small webrev [1] that implements an enhancement [2]
>>> which adds has_negatives intrinsic to AARCH64 OpenJDK port. This intrinsic
>>> performs better than c2-compiled code for every array size tried:
>>
>> Yay! We're off to the races!
>>
>> Yours:
>>
>> Benchmark (length) Mode Cnt Score Error Units
>> HasNegatives.loopingFastMethod 4 avgt 5 6680.619 ± 0.953 ns/op
>> HasNegatives.loopingFastMethod 31 avgt 5 12936.791 ± 1.599 ns/op
>> HasNegatives.loopingFastMethod 65 avgt 5 14604.253 ± 2.088 ns/op
>> HasNegatives.loopingFastMethod 101 avgt 5 19606.385 ± 7.751 ns/op
>> HasNegatives.loopingFastMethod 256 avgt 5 30858.498 ± 1.225 ns/op
>>
>>
>> Stuart's:
>>
>> Benchmark (length) Mode Cnt Score Error Units
>> HasNegatives.loopingFastMethod 4 avgt 5 5013.024 ± 0.572 ns/op
>> HasNegatives.loopingFastMethod 31 avgt 5 9186.044 ± 2.439 ns/op
>> HasNegatives.loopingFastMethod 65 avgt 5 13769.220 ± 1.879 ns/op
>> HasNegatives.loopingFastMethod 101 avgt 5 15854.385 ± 2.482 ns/op
>> HasNegatives.loopingFastMethod 256 avgt 5 26691.626 ± 3.523 ns/op
>>
>> I didn't expect a big difference. Note that the really important
>> measurement is on length ~31, which is very common.
>>
>> Benchmark at http://cr.openjdk.java.net/~aph/HasNegativesBench/. Test was
>> on APM.
>>
>