RFR: JDK-8270147: Increase stride size allowing unrolling more loops
Vladimir Kozlov
kvn at openjdk.java.net
Fri Jul 9 20:10:55 UTC 2021
On Thu, 1 Jul 2021 22:32:06 GMT, Radoslaw Smogura <github.com+7535718+rsmogura at openjdk.org> wrote:
> # Description
>
> This increases the allowed stride size for loop unrolling to almost the maximum
> possible value, which is around `max_jint / 2 - 2`,
> that is, a value that prevents overflow when the stride is doubled in C2.
>
> The motivation for this change is a discussion and research about unrolling
> vector (SIMD) loops. In such loops the stride size depends on the element size
> and the machine vector size. For AVX256 and int the stride size is 8,
> and loop unrolling happens. However, loops over short vectors (stride 16) will not be unrolled.
>
>
>     for (int i = 0; i < SPECIES.loopBound(longSize); i += SPECIES.length() /* 8 for int, 16 for short */ ) {
>         var v = ShortVector.fromByteBuffer(SPECIES, srcBufferHeap, i << 1, ByteOrder.nativeOrder());
>         v.intoByteBuffer(dstBufferHeap, i << 1, ByteOrder.nativeOrder());
>     }
>
>
> # Notes
> Stride size was decreased some time ago https://github.com/openjdk/panama-foreign/commit/2683d5390bd58683ae13bdd8582127c308d8fd04
>
> The exact reasons for this are not known to me (over-unrolling of some loops?).
>
> Original thread https://mail.openjdk.java.net/pipermail/panama-dev/2021-June/014310.html
RFR for [7039652](https://bugs.openjdk.java.net/browse/JDK-7039652) changes
https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2011-April/005314.html
As the RFR said, I assumed that the stride check was used to limit unrolling to 16 (when the initial stride is '1'). That is how `MAX_UNROLL` was introduced.
The stride check

    // Check for initial stride being a small enough constant
    if (abs(cl->stride_con()) > (1<<2)*future_unroll_ct) return false;

is, as the comment says, equivalent to the following check regardless of the unrolling state:

    if (initial_stride_con > 8) return false;
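
To make the equivalence concrete, here is a small stand-alone Java model, not HotSpot code. It assumes that each unroll doubles both the stride and the unroll count, and that `future_unroll_ct` is twice the current unroll count. The class and method names are hypothetical. Under those assumptions the unroll count cancels on both sides, so the outcome depends only on the initial stride.

```java
// Hypothetical model of the quoted stride check; this is not HotSpot code.
final class StrideCheckModel {

    // Mirrors: if (abs(cl->stride_con()) > (1<<2)*future_unroll_ct) return false;
    static boolean passesQuotedCheck(int strideCon, int futureUnrollCt) {
        return Math.abs(strideCon) <= (1 << 2) * futureUnrollCt;
    }

    // Assumption: after k unrolls the loop has been unrolled 2^k times,
    // the stride is initialStride * 2^k, and future_unroll_ct is 2^(k+1).
    static boolean passesAfterUnrolls(int initialStride, int unrollsDone) {
        int unrolledCount = 1 << unrollsDone;
        int strideCon = initialStride * unrolledCount;
        int futureUnrollCt = unrolledCount * 2;
        return passesQuotedCheck(strideCon, futureUnrollCt);
    }

    public static void main(String[] args) {
        // For each initial stride the answer is the same at every unroll level.
        for (int stride : new int[] {1, 8, 9, 16}) {
            for (int k = 0; k <= 4; k++) {
                System.out.println("initial stride " + stride
                        + ", unrolls done " + k
                        + " -> passes: " + passesAfterUnrolls(stride, k));
            }
        }
    }
}
```

In this model an initial stride of 8 (int lanes on a 256-bit vector) passes at every unroll level, while 16 (short lanes) never does.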
A general note about unrolling. There are two main issues:
1. Over-unrolling: the main loop will be skipped and the unoptimized post-loop will execute all iterations.
2. The main loop body becomes so big that fetching the loop's instructions becomes expensive (my interpretation).
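
The first issue can be illustrated with a deliberately simplified Java model. It assumes an initial stride of 1, ignores the pre-loop and any alignment handling, and uses hypothetical names. The main loop only runs whole groups of `unrollFactor` iterations, and the post-loop picks up whatever is left.

```java
// Simplified, hypothetical model of main-loop / post-loop work splitting.
// Assumptions: initial stride 1, no pre-loop, no alignment handling.
final class OverUnrollModel {

    // Iterations covered by the unrolled main loop (whole groups only).
    static int mainLoopIterations(int tripCount, int unrollFactor) {
        return (tripCount / unrollFactor) * unrollFactor;
    }

    // Iterations left for the post-loop.
    static int postLoopIterations(int tripCount, int unrollFactor) {
        return tripCount - mainLoopIterations(tripCount, unrollFactor);
    }

    public static void main(String[] args) {
        int unrollFactor = 16;
        for (int tripCount : new int[] {10, 100}) {
            System.out.println("trip count " + tripCount
                    + ": main loop covers " + mainLoopIterations(tripCount, unrollFactor)
                    + ", post-loop covers " + postLoopIterations(tripCount, unrollFactor));
        }
    }
}
```

With a trip count of 10 and an unroll factor of 16, the main loop covers nothing and the post-loop executes all 10 iterations, which is the over-unrolled case.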
I don't think skipping the stride check (checking only for integer overflow) is the correct solution for the current issue.
Maybe use `Matcher::max_vector_size()` instead:

    // Check for initial stride being a small constant but not smaller than max vector size
    int stride_limit = Matcher::max_vector_size(T_BYTE);
    if (abs(cl->stride_con()) > (stride_limit * future_unroll_ct)/2) return false;

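
As a rough illustration of what this suggested check would allow, the following Java sketch models it outside of HotSpot. The parameter `maxVectorSizeBytes` stands in for `Matcher::max_vector_size(T_BYTE)` and is assumed here to be 32 for a 256-bit machine. That value, the helper names, and the lane computation are assumptions for illustration only.

```java
// Hypothetical model of the suggested check; this is not HotSpot code.
final class VectorStrideLimitModel {

    // Mirrors: if (abs(cl->stride_con()) > (stride_limit * future_unroll_ct)/2) return false;
    // maxVectorSizeBytes is an assumed stand-in for Matcher::max_vector_size(T_BYTE).
    static boolean passesProposedCheck(int strideCon, int futureUnrollCt, int maxVectorSizeBytes) {
        return Math.abs(strideCon) <= (maxVectorSizeBytes * futureUnrollCt) / 2;
    }

    // Number of lanes, i.e. the loop stride of a vector loop over this element type.
    static int lanes(int vectorBytes, int elementBytes) {
        return vectorBytes / elementBytes;
    }

    public static void main(String[] args) {
        int vectorBytes = 32;    // assumed 256-bit vectors
        int futureUnrollCt = 2;  // loop not unrolled yet, first unroll being considered

        int intStride = lanes(vectorBytes, Integer.BYTES);
        int shortStride = lanes(vectorBytes, Short.BYTES);
        int byteStride = lanes(vectorBytes, Byte.BYTES);

        System.out.println("int stride " + intStride + " passes: "
                + passesProposedCheck(intStride, futureUnrollCt, vectorBytes));
        System.out.println("short stride " + shortStride + " passes: "
                + passesProposedCheck(shortStride, futureUnrollCt, vectorBytes));
        System.out.println("byte stride " + byteStride + " passes: "
                + passesProposedCheck(byteStride, futureUnrollCt, vectorBytes));
    }
}
```

Under these assumptions the effective limit on the initial stride becomes the vector size in bytes instead of 8, so the short loop with stride 16 would qualify for unrolling while arbitrarily large strides would still be rejected.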
-------------
PR: https://git.openjdk.java.net/jdk/pull/4658