RFR: JDK-8270147: Increase stride size allowing unrolling more loops

Vladimir Kozlov kvn at openjdk.java.net
Fri Jul 9 20:10:55 UTC 2021


On Thu, 1 Jul 2021 22:32:06 GMT, Radoslaw Smogura <github.com+7535718+rsmogura at openjdk.org> wrote:

> # Description
> 
> This increases the allowed stride size for loop unrolling to almost the maximum possible
> value, which is around `max_jint / 2 - 2`,
> a value that prevents overflow when the stride is doubled in C2.
> 
> The motivation for this change is discussion and research about unrolling
> vector (SIMD) loops. In such loops the stride size depends on the element size
> and the machine vector size. For AVX256 and int the stride size is 8,
> and the loop is unrolled. However, loops over short vectors (stride size 16) are not unrolled.
> 
> 
>     for (int i = 0; i < SPECIES.loopBound(longSize); i += SPECIES.length() /* 8 for int, 16 for short */ ) {
>       var v = ShortVector.fromByteBuffer(SPECIES, srcBufferHeap, i << 1, ByteOrder.nativeOrder());
>       v.intoByteBuffer(dstBufferHeap, i << 1, ByteOrder.nativeOrder());
>     }
> 
> 
> # Notes
> Stride size was decreased some time ago in https://github.com/openjdk/panama-foreign/commit/2683d5390bd58683ae13bdd8582127c308d8fd04
> 
> The exact reasons for this are not known to me (over-unrolling of some loops?).
> 
> Original thread https://mail.openjdk.java.net/pipermail/panama-dev/2021-June/014310.html

The RFR for the [7039652](https://bugs.openjdk.java.net/browse/JDK-7039652) changes:
https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2011-April/005314.html

As the RFR said, I assumed that the stride check was used to limit unrolling to 16 (when the initial stride is '1'). That is how `MAX_UNROLL` was introduced.

The stride check

   // Check for initial stride being a small enough constant
   if (abs(cl->stride_con()) > (1<<2)*future_unroll_ct) return false;

is, as the comment says, equivalent to the following check regardless of the unrolling state:

  if (initial_stride_con > 8) return false;
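
A minimal sketch of why the two checks agree. It assumes that `future_unroll_ct` is twice the current unroll count and that each unrolling round doubles the stride; the function names are hypothetical stand-ins, not HotSpot code, and the simplified form takes the absolute value so that negative strides are covered too.

```cpp
#include <cstdlib>

// Hypothetical stand-in for the quoted check: true when the current
// stride is small enough to allow another round of unrolling.
bool stride_check_passes(int stride_con, int future_unroll_ct) {
  return !(std::abs(stride_con) > (1 << 2) * future_unroll_ct);
}

// Simplified form that looks at the initial stride only.
bool initial_stride_check_passes(int initial_stride_con) {
  return !(std::abs(initial_stride_con) > 8);
}

// Walks the unroll rounds 1 -> 2 -> 4 -> ... below max_unroll and reports
// whether both forms give the same answer in every round.
// Assumption: stride and unroll count double together in each round.
bool checks_agree(int initial_stride_con, int max_unroll) {
  int stride = initial_stride_con;
  for (int unrolled = 1; unrolled < max_unroll; unrolled *= 2) {
    int future_unroll_ct = unrolled * 2;
    if (stride_check_passes(stride, future_unroll_ct) !=
        initial_stride_check_passes(initial_stride_con)) {
      return false;
    }
    stride *= 2;
  }
  return true;
}
```

Because both sides of the comparison double in every round, their ratio stays fixed, so only the initial stride matters.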


General note about unrolling. There are two main issues:
1. Over-unrolling: the main loop is skipped and the unoptimized post-loop executes all iterations.
2. The main loop body becomes so big that fetching the loop's instructions becomes expensive (my interpretation).
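
The first issue can be illustrated with a deliberately simplified model. It ignores the pre-loop and any alignment work C2 does, and the struct and function names are hypothetical: the unrolled main loop only runs full passes of `unroll_ct` iterations, and whatever is left goes to the post-loop.

```cpp
// Counts of original-loop iterations handled by each part (simplified model).
struct IterationSplit {
  long main_iters;  // iterations executed inside the unrolled main loop
  long post_iters;  // iterations left for the post-loop
};

// Splits trip_count iterations between a main loop unrolled unroll_ct times
// and a post-loop that picks up the remainder. Assumes unroll_ct > 0.
IterationSplit split_iterations(long trip_count, int unroll_ct) {
  long main_passes = trip_count / unroll_ct;  // full passes of the unrolled body
  long main_iters = main_passes * unroll_ct;
  return {main_iters, trip_count - main_iters};
}
```

In this model a loop with 10 iterations unrolled 16 times never enters the main loop, so all 10 iterations run in the post-loop.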

I don't think skipping the stride check (checking only for integer overflow) is the correct solution for the current issue.
Maybe use `Matcher::max_vector_size()` instead:

   // Check for initial stride being a small constant but not smaller than max vector size
   int stride_limit = Matcher::max_vector_size(T_BYTE);
   if (abs(cl->stride_con()) > (stride_limit * future_unroll_ct)/2) return false;
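
A rough sketch of what this proposed check would accept. The vector size is passed in as a plain parameter in place of `Matcher::max_vector_size(T_BYTE)`, and the value 32 used in the usage note is an assumption for an AVX256 machine, not something stated in this thread; the function name is hypothetical.

```cpp
#include <cstdlib>

// Hypothetical stand-in for the proposed check. max_vector_size_bytes plays
// the role of Matcher::max_vector_size(T_BYTE).
bool proposed_stride_check_passes(int stride_con, int future_unroll_ct,
                                  int max_vector_size_bytes) {
  int stride_limit = max_vector_size_bytes;
  return !(std::abs(stride_con) > (stride_limit * future_unroll_ct) / 2);
}
```

With `future_unroll_ct` of 2 before the first unrolling and an assumed limit of 32, `proposed_stride_check_passes(16, 2, 32)` is true, so the short-vector loop with stride 16 would be allowed to unroll, while a stride of 64 would still be rejected.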

-------------

PR: https://git.openjdk.java.net/jdk/pull/4658

