Issues with loop unrolling: better pinned node

Radosław Smogura mail at smogura.eu
Thu Aug 12 20:37:40 UTC 2021


Vladimir, Paul,

I hope you have a good day.

I wonder what you think about something like this [1]. It's a small change, virtually confined to a single class.

This change tries to find a unique memory for a load node. I implemented it as a separate phase, as the optimization may not run in the Ideal method. I think it's lighter than a phi split-out.

The loops have been transformed, and RCE has started.

Kind regards,
Rado

[1] - https://github.com/rsmogura/panama-vector/commit/a44f515890d2c4df3fd0e0ced76545a7664926c3
[2] - https://github.com/rsmogura/panama-vector/tree/housekeeping-load-memory-optimiziation (full test case)

________________________________
From: Radosław Smogura on behalf of Radosław Smogura <mail at smogura.eu>
Sent: Friday, August 6, 2021 22:43
To: Radosław Smogura <mail at smogura.eu>; Paul Sandoz <paul.sandoz at oracle.com>; Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
Cc: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
Subject: Re: Issues with loop unrolling: better pinned node

Hi all,

Now that I have checked it again, it works as expected, and it's the same code.

In the draft code I check whether the buffer is direct, using a type check, in order to unswitch the loop, because unswitching over ByteBuffer.hb did not work (the graph was quite similar). However, I thought that this unswitch actually helped to build correct loops, and that any further improvement around it would only serve to make the code look better.

But it looks like sometimes (but only sometimes) the loop still cannot be built correctly, or maybe the full optimization kicks in very, very late.
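
As a source-level illustration of the unswitching idea above, here is a minimal sketch in plain Java. It assumes a scalar byte copy instead of the Vector API, the names `UnswitchSketch` and `copy` are hypothetical, and it is not the draft code from the pull request. It only shows a loop-invariant `isDirect()` check hoisted out of the loop, so that each loop body deals with a single kind of input buffer.

```java
import java.nio.ByteBuffer;

public class UnswitchSketch {
    // Hypothetical helper: copies the first n bytes of 'in' to 'out'
    // using absolute get/put, so buffer positions are left untouched.
    static void copy(ByteBuffer in, ByteBuffer out) {
        int n = Math.min(in.limit(), out.limit());
        // The check does not change inside the loop, so it is tested once
        // and the loop is duplicated per branch (manual unswitching).
        if (in.isDirect()) {
            // Off-heap input only.
            for (int i = 0; i < n; i++) {
                out.put(i, in.get(i));
            }
        } else {
            // On-heap input only.
            for (int i = 0; i < n; i++) {
                out.put(i, in.get(i));
            }
        }
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(64);
        ByteBuffer direct = ByteBuffer.allocateDirect(64);
        for (int i = 0; i < 64; i++) {
            heap.put(i, (byte) i);
        }
        copy(heap, direct);                 // on-heap -> off-heap
        ByteBuffer back = ByteBuffer.allocate(64);
        copy(direct, back);                 // off-heap -> on-heap
        System.out.println(back.equals(heap)); // prints true
    }
}
```

Calling `copy` with both heap and direct buffers from the same call site, as `main` does, is the kind of mixed usage the thread refers to as a polluted case.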

Kind regards,
Rado
________________________________
From: panama-dev <panama-dev-retn at openjdk.java.net> on behalf of Radosław Smogura <mail at smogura.eu>
Sent: Friday, August 6, 2021 20:22
To: Paul Sandoz <paul.sandoz at oracle.com>
Cc: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
Subject: Re: Issues with loop unrolling: better pinned node

Yes,

The normal case looks good. It's all about the polluted cases [1].

BR,
Rado

[1] https://github.com/openjdk/panama-vector/pull/109

Comparing openjdk:vectorIntrinsics...rsmogura:vectors-polluted-cases: https://github.com/openjdk/panama-vector/compare/vectorIntrinsics...rsmogura:vectors-polluted-cases?expand=1

________________________________
From: Paul Sandoz <paul.sandoz at oracle.com>
Sent: Friday, August 6, 2021 20:04
To: Radosław Smogura <mail at smogura.eu>
Cc: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
Subject: Re: Issues with loop unrolling: better pinned node

I am confused as to the case under test. In your initial email of this thread were you also referring implicitly to polluted cases?

Paul.

> On Aug 6, 2021, at 10:56 AM, Radosław Smogura <mail at smogura.eu> wrote:
>
> Hi Paul,
>
> There's a performance improvement, but I still can't unroll the polluted cases (I cherry-picked loop unrolling). The graph still has a few nodes taking the buffer limit from a phi, and in the IR I don't see vector nodes cascading.
>
> make test TEST='micro:ByteBufferVectorAccess.p' MICRO="OPTIONS=-f 1 -prof perfasm -jvmArgsPrepend=-Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0" JOBS=12
> Benchmark                                     (size)  Mode  Cnt   Score   Error  Units
> ByteBufferVectorAccess.pollutedBuffers2         1024  avgt   30  40.472 ± 1.055  ns/op
> ByteBufferVectorAccess.pollutedBuffers2:·asm    1024  avgt          NaN            ---
> ByteBufferVectorAccess.pollutedBuffers3         1024  avgt   30  79.251 ± 0.786  ns/op
> ByteBufferVectorAccess.pollutedBuffers3:·asm    1024  avgt          NaN            ---
> ByteBufferVectorAccess.pollutedBuffers4         1024  avgt   30  83.627 ± 2.140  ns/op
> ByteBufferVectorAccess.pollutedBuffers4:·asm    1024  avgt          NaN            ---
> ByteBufferVectorAccess.pollutedBuffers5         1024  avgt   30  85.561 ± 1.156  ns/op
> ByteBufferVectorAccess.pollutedBuffers5:·asm    1024  avgt          NaN
>
> make test TEST='micro:ByteBufferVectorAccess.p' MICRO="OPTIONS=-f 1 -prof perfasm"
> Benchmark                                     (size)  Mode  Cnt    Score   Error  Units
> ByteBufferVectorAccess.pollutedBuffers2         1024  avgt   10   49.326 ± 0.843  ns/op
> ByteBufferVectorAccess.pollutedBuffers2:·asm    1024  avgt           NaN            ---
> ByteBufferVectorAccess.pollutedBuffers3         1024  avgt   10  100.291 ± 1.271  ns/op
> ByteBufferVectorAccess.pollutedBuffers3:·asm    1024  avgt           NaN            ---
> ByteBufferVectorAccess.pollutedBuffers4         1024  avgt   10  101.494 ± 1.027  ns/op
> ByteBufferVectorAccess.pollutedBuffers4:·asm    1024  avgt           NaN            ---
> ByteBufferVectorAccess.pollutedBuffers5         1024  avgt   10   94.606 ± 1.522  ns/op
> ByteBufferVectorAccess.pollutedBuffers5:·asm    1024  avgt           NaN
>
>
> BR,
> Rado
> From: Paul Sandoz <paul.sandoz at oracle.com>
> Sent: Friday, August 6, 2021 18:04
> To: Radosław Smogura <mail at smogura.eu>
> Cc: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
> Subject: Re: Issues with loop unrolling: better pinned node
>
> Hi Rado,
>
> It’s good you are looking at the IR.
>
> Out of curiosity, what happens if you turn off bounds checking [*]?
>
> Paul.
>
> [*]
> -Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0
>
> > On Aug 6, 2021, at 8:39 AM, Radosław Smogura <mail at smogura.eu> wrote:
> >
> > Hi all,
> >
> > I've found that even if we get rid of the barriers, the loop can't be unrolled, and unneeded code remains inside it.
> >
> > I've found this graph, and I wonder whether it's optimal. In particular, the Load of the ByteBuffer index / hb comes from a phi; could it be attached to the initial memory instead?
> >
> > Here's a picture (bb_issues.png): https://drive.google.com/file/d/1G7ZN0xHOVIVHmZ_5TTIUdm3F30okAzvO/view?usp=sharing
> >
> >
> > And the sample code:
> >
> > protected void copyMemory(ByteBuffer in, ByteBuffer out) {
> >  var limit = SPECIES.loopBound(in.limit());
> >  for (int i=0; i < limit; i += SPECIES.vectorByteSize()) {
> >    final var v = ByteVector.fromByteBuffer(SPECIES, in, i, ByteOrder.nativeOrder());
> >    v.intoByteBuffer(out, i, ByteOrder.nativeOrder());
> >  }
> > }
> >
> > Kind regards,
> > Rado
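
The sample loop above runs up to `SPECIES.loopBound(in.limit())`, which is the largest multiple of the vector length that does not exceed the limit. The arithmetic can be sketched in plain Java under a few assumptions: a hypothetical fixed vector size of 32 bytes (`VECTOR_BYTES`) stands in for a real `VectorSpecies`, an inner scalar loop stands in for the vector load and store, and a scalar tail loop (not part of the sample) handles any remaining bytes. The names `LoopBoundSketch`, `loopBound` and `copyMemory` are illustrative only; this is not the Vector API implementation.

```java
import java.nio.ByteBuffer;

public class LoopBoundSketch {
    // Assumption: stand-in for SPECIES.vectorByteSize(); must be a power of two.
    static final int VECTOR_BYTES = 32;

    // Largest multiple of VECTOR_BYTES that is <= length (for length >= 0).
    static int loopBound(int length) {
        return length & ~(VECTOR_BYTES - 1);
    }

    static void copyMemory(ByteBuffer in, ByteBuffer out) {
        int limit = loopBound(in.limit());
        int i = 0;
        // Main loop: one "vector" worth of bytes per iteration.
        for (; i < limit; i += VECTOR_BYTES) {
            for (int j = 0; j < VECTOR_BYTES; j++) {
                out.put(i + j, in.get(i + j));
            }
        }
        // Scalar tail: bytes between the loop bound and the buffer limit.
        for (; i < in.limit(); i++) {
            out.put(i, in.get(i));
        }
    }

    public static void main(String[] args) {
        System.out.println(loopBound(1024)); // prints 1024
        System.out.println(loopBound(1000)); // prints 992

        ByteBuffer in = ByteBuffer.allocate(1000);
        ByteBuffer out = ByteBuffer.allocate(1000);
        for (int i = 0; i < 1000; i++) {
            in.put(i, (byte) i);
        }
        copyMemory(in, out);
        System.out.println(in.equals(out)); // prints true
    }
}
```

With the benchmark size of 1024 used in this thread, the loop bound equals the limit, so the tail loop does no work.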


