[aarch64-port-dev ] [10] RFR(S): JDK-8184943: AARCH64: Intrinsify hasNegatives

Thu Jul 20 14:31:10 UTC 2017

The additional platform I'm able to run on produces the following results:

bellsw:

Benchmark                       (length)  Mode  Cnt       Score     Error  Units
HasNegatives.loopingFastMethod         4  avgt   50    3822.856 ±   0.266  ns/op
HasNegatives.loopingFastMethod        31  avgt   50   10497.204 ±   0.705  ns/op
HasNegatives.loopingFastMethod        65  avgt   50   11452.405 ±   2.005  ns/op
HasNegatives.loopingFastMethod       101  avgt   50   13462.799 ±  56.811  ns/op
HasNegatives.loopingFastMethod       256  avgt   50   20668.731 ± 156.668  ns/op
HasNegatives.steamFastMethod           4  avgt   50    6208.364 ±   0.429  ns/op
HasNegatives.steamFastMethod          31  avgt   50   23371.059 ±   1.922  ns/op
HasNegatives.steamFastMethod          65  avgt   50   52450.904 ±   4.051  ns/op
HasNegatives.steamFastMethod         101  avgt   50   61061.875 ±  17.735  ns/op
HasNegatives.steamFastMethod         256  avgt   50  164507.570 ±  16.935  ns/op

Linaro patch:

Benchmark                       (length)  Mode  Cnt       Score     Error  Units
HasNegatives.loopingFastMethod         4  avgt   50    3823.895 ±   0.264  ns/op
HasNegatives.loopingFastMethod        31  avgt   50    7977.361 ± 141.724  ns/op
HasNegatives.loopingFastMethod        65  avgt   50   12303.588 ± 100.645  ns/op
HasNegatives.loopingFastMethod       101  avgt   50   14464.835 ± 126.982  ns/op
HasNegatives.loopingFastMethod       256  avgt   50   38142.723 ±   3.266  ns/op
HasNegatives.steamFastMethod           4  avgt   50    6208.206 ±   0.401  ns/op
HasNegatives.steamFastMethod          31  avgt   50   23370.868 ±   1.337  ns/op
HasNegatives.steamFastMethod          65  avgt   50   52450.499 ±   6.449  ns/op
HasNegatives.steamFastMethod         101  avgt   50   61013.218 ±  73.249  ns/op
HasNegatives.steamFastMethod         256  avgt   50  159738.530 ±  12.301  ns/op

So there are obvious benefits to reading the larger 64-byte chunks, most
visibly in loopingFastMethod at length 256.
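
As a rough illustration of why wider reads help, here is a minimal Java sketch. It is not the code from either webrev, and the class and method names are hypothetical. It compares a byte-at-a-time check with one that tests 8 bytes per iteration against a sign-bit mask; the intrinsics under review work with wider chunks in assembly, so this only shows the general idea.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not taken from either patch.
final class HasNegativesSketch {
    // The sign bit of each of the 8 bytes packed into a long.
    private static final long SIGN_BITS = 0x8080808080808080L;

    /** Baseline: test one byte per iteration. */
    static boolean hasNegativesScalar(byte[] ba, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (ba[i] < 0) {
                return true;
            }
        }
        return false;
    }

    /** Chunked: test 8 bytes per iteration, then finish the tail byte by byte. */
    static boolean hasNegativesChunked(byte[] ba, int off, int len) {
        // Byte order does not matter here: the mask covers every byte lane.
        ByteBuffer buf = ByteBuffer.wrap(ba).order(ByteOrder.nativeOrder());
        int i = off;
        int end = off + len;
        for (; i + 8 <= end; i += 8) {
            if ((buf.getLong(i) & SIGN_BITS) != 0) {
                return true;
            }
        }
        // Fewer than 8 bytes remain; fall back to the scalar check.
        return hasNegativesScalar(ba, i, end - i);
    }

    public static void main(String[] args) {
        byte[] ascii = new byte[31];   // all zero bytes, no negatives
        byte[] mixed = new byte[31];
        mixed[29] = (byte) 0x80;       // one negative byte in the tail
        System.out.println(hasNegativesChunked(ascii, 0, ascii.length)); // false
        System.out.println(hasNegativesChunked(mixed, 0, mixed.length)); // true
    }
}
```

With a length of 31, three 8-byte reads cover bytes 0..23 and the remaining 7 bytes go through the scalar tail, which is one reason short lengths can behave differently from long ones.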

On 20 July 2017 at 14:26, Dmitry Chuyko <dmitry.chuyko at bell-sw.com> wrote:
> Andrew,
>
> Just a couple of quick questions on the micro-benchmark:
>
> - What's the purpose of using a custom Sink class instead of JMH's Blackhole?
> Is that to check the effect of mixing the calculation with an actual write of
> its result? I see Blackhole usage in the loopingMethod() and testMethod()
> variants, while loopingFastMethod below uses Sink.
> - What's the purpose of the nested 1000-iteration loops? I guess that may test
> the impact of loop unrolling. Again, I see a testMethod() variant
> without the loop.
>
> -Dmitry Chuyko
>
>
> On 07/20/2017 03:55 PM, Andrew Haley wrote:
>>
>> Hi,
>>
>> On 20/07/17 11:03, Dmitrij Pochepko wrote:
>>
>>> Please review this small webrev [1] that implements an enhancement [2]
>>> which adds has_negatives intrinsic to AARCH64 OpenJDK port. This intrinsic
>>> performs better than c2-compiled code for every array size tried:
>>
>> Yay!  We're off to the races!
>>
>> Yours:
>>
>> Benchmark                       (length)  Mode  Cnt      Score   Error  Units
>> HasNegatives.loopingFastMethod         4  avgt    5   6680.619 ± 0.953  ns/op
>> HasNegatives.loopingFastMethod        31  avgt    5  12936.791 ± 1.599  ns/op
>> HasNegatives.loopingFastMethod        65  avgt    5  14604.253 ± 2.088  ns/op
>> HasNegatives.loopingFastMethod       101  avgt    5  19606.385 ± 7.751  ns/op
>> HasNegatives.loopingFastMethod       256  avgt    5  30858.498 ± 1.225  ns/op
>>
>>
>> Stuart's:
>>
>> Benchmark                       (length)  Mode  Cnt      Score   Error  Units
>> HasNegatives.loopingFastMethod         4  avgt    5   5013.024 ± 0.572  ns/op
>> HasNegatives.loopingFastMethod        31  avgt    5   9186.044 ± 2.439  ns/op
>> HasNegatives.loopingFastMethod        65  avgt    5  13769.220 ± 1.879  ns/op
>> HasNegatives.loopingFastMethod       101  avgt    5  15854.385 ± 2.482  ns/op
>> HasNegatives.loopingFastMethod       256  avgt    5  26691.626 ± 3.523  ns/op
>>
>> I didn't expect a big difference.  Note that the really important
>> measurement is on length ~31, which is very common.
>>
>> Benchmark at http://cr.openjdk.java.net/~aph/HasNegativesBench/.
>> Test was on APM.
>>
>