RFR(S): 8247307: C2: Loop array fill stub routines are not called
Pengfei Li
Pengfei.Li at arm.com
Tue Jun 16 06:24:55 UTC 2020
Sorry, I forgot to paste the JMH link below in my last email.
[1] http://cr.openjdk.java.net/~pli/rfr/8247307/TestArrayFill.java
By the way, if I turn on OptimizeFill manually on x86, the patch causes the performance regression shown below. So my patch turns OptimizeFill off on x86 to keep the behavior there unchanged.

Before (x86 with -XX:+OptimizeFill)
Benchmark                     Mode  Cnt      Score     Error  Units
TestArrayFill.fillByteArray   avgt   25   1793.206 ±  15.337  ns/op
TestArrayFill.fillIntArray    avgt   25   6679.491 ±  14.729  ns/op
TestArrayFill.fillShortArray  avgt   25   3412.708 ±  12.005  ns/op
TestArrayFill.zeroByteArray   avgt   25   1785.940 ±  15.174  ns/op
TestArrayFill.zeroIntArray    avgt   25   6666.709 ±  11.735  ns/op
TestArrayFill.zeroShortArray  avgt   25   3404.146 ±  23.045  ns/op

After (x86 with -XX:+OptimizeFill)
Benchmark                     Mode  Cnt      Score     Error  Units
TestArrayFill.fillByteArray   avgt   25   2281.374 ± 191.220  ns/op
TestArrayFill.fillIntArray    avgt   25   9009.679 ± 901.541  ns/op
TestArrayFill.fillShortArray  avgt   25   4828.686 ±  49.199  ns/op
TestArrayFill.zeroByteArray   avgt   25   2463.745 ±  47.640  ns/op
TestArrayFill.zeroIntArray    avgt   25   9062.682 ± 939.538  ns/op
TestArrayFill.zeroShortArray  avgt   25   4837.231 ±  50.026  ns/op
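
To put the x86 numbers in relative terms, here is a minimal sketch that computes the percentage change of each "After" score against its "Before" score. The class name FillRegression and the helper percentChange are made up for illustration and are not part of the patch or the JMH test. The scores are copied from the two x86 tables above, and the error margins are ignored, so treat the output as a rough indication only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FillRegression {
    // Relative change of the "after" score versus the "before" score, in percent.
    // JMH avgt scores are times (ns/op), so a positive result means slower.
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // {before, after} scores in ns/op, copied from the x86 tables above.
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("fillByteArray", new double[] {1793.206, 2281.374});
        scores.put("fillIntArray", new double[] {6679.491, 9009.679});
        scores.put("fillShortArray", new double[] {3412.708, 4828.686});
        scores.put("zeroByteArray", new double[] {1785.940, 2463.745});
        scores.put("zeroIntArray", new double[] {6666.709, 9062.682});
        scores.put("zeroShortArray", new double[] {3404.146, 4837.231});

        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-15s %+6.1f%%%n", e.getKey(), percentChange(s[0], s[1]));
        }
    }
}
```

Every benchmark comes out with a positive change, which is why the stub path is left disabled on x86.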
> Hi,
>
> Can I have a review of this C2 loop optimization fix?
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8247307
> Webrev: http://cr.openjdk.java.net/~pli/rfr/8247307/webrev.00/
>
> C2 has a loop optimization phase called intrinsify_fill. It matches the pattern
> of a single array store of a loop-invariant value in a counted loop, like the
> one below, and replaces it with a call to a stub routine.
>
> for (int i = start; i < limit; i++) {
>   a[i] = value;
> }
>
> Unfortunately, this doesn't work in the current JDK after loop strip mining.
> Instead, the above loop is eventually unrolled and auto-vectorized by
> subsequent optimization phases. The root cause is that in strip-mined loops,
> the inner CountedLoopNode may be used by the polling address node (polladr)
> of the safepoint in the outer loop. But since safepoint polling is unrelated
> to any real operation in the loop, it should not hinder the pattern match.
> So in this patch, the polladr's use is ignored in the match check.
>
> We compared the performance of the array fill code between the
> auto-vectorized version and the stub routine version. The JMH case for
> the tests can be found at [1]. Results show that on x86, the stub code is even
> slower than the auto-vectorized code. To prevent any regression, the VM option
> OptimizeFill is turned off for x86 in this patch, so the patch does not
> change the generated code on x86. On AArch64, the two versions show almost
> the same performance in general cases. But if the value to be filled is zero,
> the stub code performs much better. This makes sense, as AArch64 uses cache
> maintenance instructions (DC ZVA) to zero large blocks in the hand-crafted
> assembly. Below are the JMH scores on AArch64.
>
> Before:
> Benchmark                     Mode  Cnt      Score     Error  Units
> TestArrayFill.fillByteArray   avgt   25   2078.700 ±   7.719  ns/op
> TestArrayFill.fillIntArray    avgt   25  12371.497 ± 566.773  ns/op
> TestArrayFill.fillShortArray  avgt   25   4132.439 ±  25.096  ns/op
> TestArrayFill.zeroByteArray   avgt   25   2080.313 ±   7.516  ns/op
> TestArrayFill.zeroIntArray    avgt   25  10961.331 ± 527.750  ns/op
> TestArrayFill.zeroShortArray  avgt   25   4126.386 ±  20.997  ns/op
>
> After:
> Benchmark                     Mode  Cnt      Score     Error  Units
> TestArrayFill.fillByteArray   avgt   25   2080.382 ±   2.103  ns/op
> TestArrayFill.fillIntArray    avgt   25  11997.621 ± 569.058  ns/op
> TestArrayFill.fillShortArray  avgt   25   4309.035 ± 285.456  ns/op
> TestArrayFill.zeroByteArray   avgt   25    903.434 ±  10.944  ns/op
> TestArrayFill.zeroIntArray    avgt   25   8141.533 ± 946.341  ns/op
> TestArrayFill.zeroShortArray  avgt   25   1784.124 ±  24.618  ns/op
>
> Another advantage of using the stub routine is that the generated code size is
> reduced.
>
> Jtreg hotspot::hotspot_all_no_apps, jdk::jdk_core and langtools::tier1 were
> tested, and no new failures were found.
Thanks,
Pengfei
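
P.S. For readers who want to try the loop shape from the quoted message, here is a minimal, self-contained sketch. The class and method names (FillLoopShapes, fillInvariant, fillWithIndex) are made up for illustration and do not come from the patch or from TestArrayFill.java. The first method has the shape described above: one array store of a loop-invariant value in a counted loop. The second stores a value that depends on the induction variable, so going by that description it is not a fill pattern. Whether C2 really intrinsifies a given loop depends on the VM build, flags and platform, so this is only an assumption to check against the generated code.

```java
import java.util.Arrays;

public class FillLoopShapes {
    // Same shape as the loop in the quoted message: a single array store of a
    // loop-invariant value inside a counted loop.
    static void fillInvariant(int[] a, int start, int limit, int value) {
        for (int i = start; i < limit; i++) {
            a[i] = value;
        }
    }

    // The stored value depends on i, so it is not loop invariant. By the
    // description in the quoted message this is not the fill pattern.
    static void fillWithIndex(int[] a, int start, int limit) {
        for (int i = start; i < limit; i++) {
            a[i] = i;
        }
    }

    public static void main(String[] args) {
        int[] a = new int[8];
        fillInvariant(a, 2, 6, 7);

        // Arrays.fill over the same range must produce the same contents.
        int[] expected = new int[8];
        Arrays.fill(expected, 2, 6, 7);
        System.out.println("matches Arrays.fill: " + Arrays.equals(a, expected));

        int[] b = new int[8];
        fillWithIndex(b, 2, 6);
        System.out.println(Arrays.toString(b));
    }
}
```

Running this once only checks the semantics of the two loops. It does not show which code C2 generates, since the methods would have to be hot enough to be compiled first.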