PING: RFR: 8245158: C2: Enable SLP for some manually unrolled loops
Pengfei Li
Pengfei.Li at arm.com
Tue May 26 09:50:57 UTC 2020
Ping - Any reviews of this?
--
Thanks,
Pengfei
> Can I have a review of this enhancement of C2 SLP?
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8245158
> Webrev: http://cr.openjdk.java.net/~pli/rfr/8245158/webrev.00/
>
> The Java loop below, with stride = 1, can be vectorized by C2.
> for (int i = start; i < limit; i++) {
>   c[i] = a[i] + b[i];
> }
>
> But if it's manually unrolled once, like in the code below, SLP would fail to
> vectorize it.
> for (int i = start; i < limit; i += 2) {
>   c[i] = a[i] + b[i];
>   c[i + 1] = a[i + 1] + b[i + 1];
> }
>
> Notably, if the induction variable's initial value "start" is replaced by a
> compile-time constant, the vectorization works.
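For anyone who wants to try the two loop shapes side by side, here is a minimal self-contained sketch. The class and method names (UnrolledLoopSketch, addStride1, addUnrolledBy2) are made up for illustration and are not taken from the webrev or the JMH test. Note that the unrolled form assumes (limit - start) is even; otherwise it touches one element past limit.

```java
import java.util.Arrays;

// Illustrative only: names are hypothetical, not from the patch or its tests.
class UnrolledLoopSketch {
    // Stride-1 form of the loop.
    static void addStride1(int[] a, int[] b, int[] c, int start, int limit) {
        for (int i = start; i < limit; i++) {
            c[i] = a[i] + b[i];
        }
    }

    // Manually unrolled once (stride 2). Assumes (limit - start) is even.
    static void addUnrolledBy2(int[] a, int[] b, int[] c, int start, int limit) {
        for (int i = start; i < limit; i += 2) {
            c[i] = a[i] + b[i];
            c[i + 1] = a[i + 1] + b[i + 1];
        }
    }

    public static void main(String[] args) {
        int n = 16;
        int[] a = new int[n];
        int[] b = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = 10 * i;
        }
        int[] c1 = new int[n];
        int[] c2 = new int[n];
        addStride1(a, b, c1, 3, 11);
        addUnrolledBy2(a, b, c2, 3, 11);
        // Both forms should fill the same range with the same sums.
        System.out.println(Arrays.equals(c1, c2));
    }
}
```

The two methods are semantically equivalent for an even trip count, so any difference in performance comes from how C2 compiles them, not from the work done.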
>
> The root cause is that in the current C2 SuperWord implementation,
> find_adjacent_refs() calls find_align_to_ref() to select a "best align to"
> memory reference to create packs, and particularly, the reference selected
> must be "pre-loop alignable". In other words, C2 must be able to adjust the
> pre-loop trip count such that the vectorized access of this reference is aligned.
> Hence, in find_align_to_ref(), unalignable memory references are discarded.
> [1] Then SLP packs creation is aborted if no memory reference is eligible to be
> the "best align to". [2]
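One way to picture "pre-loop alignable" is the toy model below. It is not the HotSpot logic: it assumes that element index 0 is vector-aligned, and the names (PreLoopAlignModel, preLoopIterations) and the 4-lane vector used in main are hypothetical. It asks how many pre-loop iterations are needed before the element index becomes a multiple of the lane count. With stride 1 an answer always exists. With stride 2 and an odd start no answer exists, which is consistent with the unrolled case failing when "start" is not a compile-time constant.

```java
// Toy model under stated assumptions; not the check in superword.cpp.
class PreLoopAlignModel {
    // Returns the number of pre-loop iterations after which the element
    // index is a multiple of `lanes`, or -1 if no iteration count works.
    static int preLoopIterations(int startIndex, int stride, int lanes) {
        int index = startIndex;
        // Residues modulo `lanes` repeat after at most `lanes` steps.
        for (int iters = 0; iters < lanes; iters++) {
            if (Math.floorMod(index, lanes) == 0) {
                return iters;
            }
            index += stride;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(preLoopIterations(1, 1, 4)); // stride 1, start 1
        System.out.println(preLoopIterations(1, 2, 4)); // stride 2, odd start
        System.out.println(preLoopIterations(2, 2, 4)); // stride 2, even start
    }
}
```

In this model a constant start lets the answer be decided at compile time, while an unknown start with stride 2 leaves open the case where no pre-loop trip count can reach an aligned index.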
>
> In the current C2 SLP code, the selected "best align to" reference has two uses.
> One is to compute alignment info in order to find adjacent memory
> references for pack creation. The other is to facilitate the pre-loop trip
> count adjustment that aligns vector memory accesses in the main-loop.
> But on some platforms, aligning vector accesses is not a mandatory
> requirement (after Roland's JDK-8215483 [3], this is usually checked by
> "!Matcher::misaligned_vectors_ok() || AlignVector"). So the "best align to"
> memory reference doesn't have to be "pre-loop alignable" on all platforms.
> In this patch, we only discard unalignable references when that platform-
> dependent check returns true.
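The filtering rule described above can be sketched as a small truth table. This is a Java model with hypothetical names (AlignToFilterSketch, alignmentRequired, keepAsCandidate), not the C++ in superword.cpp; only the condition "!Matcher::misaligned_vectors_ok() || AlignVector" is taken from the text above.

```java
// Hypothetical model of the rule; the real code lives in C2's SuperWord.
class AlignToFilterSketch {
    // Mirrors the platform-dependent check:
    // !Matcher::misaligned_vectors_ok() || AlignVector
    static boolean alignmentRequired(boolean misalignedVectorsOk, boolean alignVector) {
        return !misalignedVectorsOk || alignVector;
    }

    // A reference stays a "best align to" candidate if it is pre-loop
    // alignable, or if the platform does not require aligned accesses.
    static boolean keepAsCandidate(boolean preLoopAlignable,
                                   boolean misalignedVectorsOk,
                                   boolean alignVector) {
        return preLoopAlignable || !alignmentRequired(misalignedVectorsOk, alignVector);
    }

    public static void main(String[] args) {
        // Unalignable reference, misaligned vectors allowed, AlignVector off.
        System.out.println(keepAsCandidate(false, true, false));
        // Same reference, but the platform requires alignment.
        System.out.println(keepAsCandidate(false, false, false));
    }
}
```

Under this model, alignable references are kept everywhere, and only the unalignable ones depend on the platform check.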
>
> After this patch, some manually unrolled loops can be vectorized on
> platforms with no alignment requirement. As almost all modern x86 CPUs
> support unaligned vector move, I suspect this can benefit the majority of
> today's CPUs.
>
> Please note that this patch doesn't try to enable SLP for all manually unrolled
> loops. If the above case is unrolled more times, vectorization may still not work.
> The reason is that the current SLP applies only to main-loops produced by
> the iteration split. When the loop is manually unrolled many times, the node
> count may exceed LoopUnrollLimit, resulting in no iteration split at all.
> Although this can be worked around by relaxing the unrolling policy via
> slp_max_unroll_factor, we don't do it this way since splitting a big loop may
> increase code size too much. Anyone who wants to vectorize a super-manually-
> unrolled loop can use -XX:LoopUnrollLimit= with a greater value.
>
> [Tests]
>
> Jtreg hotspot::hotspot_all_no_apps, jdk::jdk_core and langtools::tier1 were run
> and no new failures were found.
>
> Below are the results of the JMH test [4] for the above case.
>
> Before:
> Benchmark              Mode  Cnt       Score       Error  Units
> TestUnrolledLoop.bar  thrpt   25   58097.290 ±   128.802  ops/s
>
> After:
> Benchmark              Mode  Cnt       Score       Error  Units
> TestUnrolledLoop.bar  thrpt   25  260110.139 ± 10902.284  ops/s
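To put the two scores in proportion, the ratio can be computed as in the short sketch below. The class and method names are hypothetical; the inputs are the two reported throughput scores, and the error bounds are ignored. The result is roughly 4.5 times the original throughput.

```java
// Hypothetical helper; inputs are the Before/After scores in ops/s.
class SpeedupSketch {
    static double speedup(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        double ratio = speedup(58097.290, 260110.139);
        System.out.println("speedup = " + ratio);
    }
}
```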
>
> [1] http://hg.openjdk.java.net/jdk/jdk/file/a0a21978f3b9/src/hotspot/share/opto/superword.cpp#l780
> [2] http://hg.openjdk.java.net/jdk/jdk/file/a0a21978f3b9/src/hotspot/share/opto/superword.cpp#l587
> [3] http://hg.openjdk.java.net/jdk/jdk/rev/da7dc9e92d91
> [4] http://cr.openjdk.java.net/~pli/rfr/8245158/TestUnrolledLoop.java
>
> --
> Thanks,
> Pengfei