RFR(S): 8247307: C2: Loop array fill stub routines are not called

Pengfei Li Pengfei.Li at arm.com
Tue Jun 16 06:18:36 UTC 2020


Hi Vladimir,

Thanks for looking at this.

> I don't see referenced link [1] in e-mail.

Sorry, I forgot to paste my JMH URL.
[1] http://cr.openjdk.java.net/~pli/rfr/8247307/TestArrayFill.java

> Are these performance data for Aarch64?

Yes. I didn't paste the x86 results since there's no difference after my patch.
But if I turn OptimizeFill on manually, there's a regression on x86 (see below).

Before (x86)
  Benchmark                     Mode  Cnt     Score    Error  Units
  TestArrayFill.fillByteArray   avgt   25  1793.206 ± 15.337  ns/op
  TestArrayFill.fillIntArray    avgt   25  6679.491 ± 14.729  ns/op
  TestArrayFill.fillShortArray  avgt   25  3412.708 ± 12.005  ns/op
  TestArrayFill.zeroByteArray   avgt   25  1785.940 ± 15.174  ns/op
  TestArrayFill.zeroIntArray    avgt   25  6666.709 ± 11.735  ns/op
  TestArrayFill.zeroShortArray  avgt   25  3404.146 ± 23.045  ns/op

After (x86)
  Benchmark                     Mode  Cnt     Score     Error  Units
  TestArrayFill.fillByteArray   avgt   25  2281.374 ± 191.220  ns/op
  TestArrayFill.fillIntArray    avgt   25  9009.679 ± 901.541  ns/op
  TestArrayFill.fillShortArray  avgt   25  4828.686 ±  49.199  ns/op
  TestArrayFill.zeroByteArray   avgt   25  2463.745 ±  47.640  ns/op
  TestArrayFill.zeroIntArray    avgt   25  9062.682 ± 939.538  ns/op
  TestArrayFill.zeroShortArray  avgt   25  4837.231 ±  50.026  ns/op

> What x86 CPU you tested on? (avx512?)

The results above were produced on an Intel® Xeon® Gold 6152, but UseAVX=2 is the default in the latest JDK master.

> What size of arrays you tested.

It's 65536 in my test, see [1].
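
For readers who cannot open [1], the loops under test are presumably plain counted loops that store one value into every element. Below is a minimal sketch of that shape, not the actual benchmark code: the class name ArrayFillSketch and all method bodies are assumptions for illustration, and only the method names and the array size 65536 are taken from this mail.

```java
import java.util.Arrays;

// Hypothetical stand-in for the loops measured in [1]; NOT the real TestArrayFill.java.
class ArrayFillSketch {
    // Array length stated in this mail.
    static final int SIZE = 65536;

    // Counted loop storing a loop-invariant value into every element.
    static void fillByteArray(byte[] a, byte value) {
        for (int i = 0; i < a.length; i++) {
            a[i] = value;
        }
    }

    static void fillShortArray(short[] a, short value) {
        for (int i = 0; i < a.length; i++) {
            a[i] = value;
        }
    }

    static void fillIntArray(int[] a, int value) {
        for (int i = 0; i < a.length; i++) {
            a[i] = value;
        }
    }

    // The zero* cases are the same loops with a constant zero as the stored value.
    static void zeroByteArray(byte[] a) {
        for (int i = 0; i < a.length; i++) {
            a[i] = 0;
        }
    }

    static void zeroShortArray(short[] a) {
        for (int i = 0; i < a.length; i++) {
            a[i] = 0;
        }
    }

    static void zeroIntArray(int[] a) {
        for (int i = 0; i < a.length; i++) {
            a[i] = 0;
        }
    }

    public static void main(String[] args) {
        // Check one hand-written loop against the library fill.
        int[] ints = new int[SIZE];
        fillIntArray(ints, 42);
        int[] expected = new int[SIZE];
        Arrays.fill(expected, 42);
        System.out.println("fillIntArray matches Arrays.fill: " + Arrays.equals(ints, expected));
    }
}
```

Turning OptimizeFill on manually, as mentioned above, would mean running such loops with -XX:+OptimizeFill. This sketch only shows the loop shape and does no timing.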

> Few years ago OptimizedFill wins over vectorized loops but CPU and
> vectorization are improved since then. May be we can deprecate this code if
> it does not have performance benefits. Or we should revisit stub's code for
> modern CPUs.

I think it's still valuable, since it does have a performance benefit on AArch64 when the value to be filled is zero.
See the results of the TestArrayFill.zero* cases below.

Before (AArch64)
  Benchmark                     Mode  Cnt      Score     Error  Units
  TestArrayFill.zeroByteArray   avgt   25   2080.313 ±   7.516  ns/op
  TestArrayFill.zeroIntArray    avgt   25  10961.331 ± 527.750  ns/op
  TestArrayFill.zeroShortArray  avgt   25   4126.386 ±  20.997  ns/op

After (AArch64)
  Benchmark                     Mode  Cnt      Score     Error  Units
  TestArrayFill.zeroByteArray   avgt   25    903.434 ±  10.944  ns/op
  TestArrayFill.zeroIntArray    avgt   25   8141.533 ± 946.341  ns/op
  TestArrayFill.zeroShortArray  avgt   25   1784.124 ±  24.618  ns/op

--
Thanks,
Pengfei



More information about the hotspot-compiler-dev mailing list