Issues with loop unrolling: better pinned node

Radosław Smogura mail at smogura.eu
Fri Aug 6 18:22:47 UTC 2021


Yes,

The normal case looks good. It's all about the polluted cases [1].
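
For context, here is a minimal sketch of what a "polluted" case could look like. It assumes the term refers to a single call site that sees both on-heap and off-heap (direct) buffers, so the JIT's type profile for that site is no longer monomorphic. The class name PollutedCopySketch and the scalar copy loop are hypothetical stand-ins and are not the ByteBufferVectorAccess benchmark; the loop uses only java.nio so it runs without the incubating Vector API.

```java
import java.nio.ByteBuffer;

public class PollutedCopySketch {
    // Scalar stand-in for the vectorized copy loop: same call shape,
    // but plain absolute get/put instead of ByteVector loads and stores.
    static int copyMemory(ByteBuffer in, ByteBuffer out) {
        int limit = Math.min(in.limit(), out.limit());
        for (int i = 0; i < limit; i++) {
            out.put(i, in.get(i)); // absolute access, buffer positions stay untouched
        }
        return limit;
    }

    public static void main(String[] args) {
        int size = 1024;
        ByteBuffer heapIn = ByteBuffer.allocate(size);
        ByteBuffer heapOut = ByteBuffer.allocate(size);
        ByteBuffer directIn = ByteBuffer.allocateDirect(size);
        ByteBuffer directOut = ByteBuffer.allocateDirect(size);
        for (int i = 0; i < size; i++) {
            heapIn.put(i, (byte) i);
            directIn.put(i, (byte) ~i);
        }

        // Alternate heap and direct buffers so the one call site inside
        // copyMemory observes both concrete buffer classes (assumed meaning
        // of "polluted" in this thread).
        for (int iter = 0; iter < 20_000; iter++) {
            copyMemory(heapIn, heapOut);
            copyMemory(directIn, directOut);
            copyMemory(heapIn, directOut);
            copyMemory(directIn, heapOut);
        }

        // The last call copied directIn into heapOut, so their contents match.
        System.out.println("contents match: " + heapOut.equals(directIn));
    }
}
```

A monomorphic variant would call copyMemory with heap buffers only; comparing the two is one way to see how much the mixed profile costs.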

BR,
Rado

[1] https://github.com/openjdk/panama-vector/pull/109

Comparing openjdk:vectorIntrinsics...rsmogura:vectors-polluted-cases: https://github.com/openjdk/panama-vector/compare/vectorIntrinsics...rsmogura:vectors-polluted-cases?expand=1

________________________________
From: Paul Sandoz <paul.sandoz at oracle.com>
Sent: Friday, August 6, 2021 20:04
To: Radosław Smogura <mail at smogura.eu>
Cc: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
Subject: Re: Issues with loop unrolling: better pinned node

I am confused as to the case under test. In your initial email of this thread were you also referring implicitly to polluted cases?

Paul.

> On Aug 6, 2021, at 10:56 AM, Radosław Smogura <mail at smogura.eu> wrote:
>
> Hi Paul,
>
> There's a performance improvement, but I still can't unroll the polluted cases (I cherry-picked the loop unrolling change). The graph still has a few nodes taking the buffer limit from a phi, and in the IR I don't see the vector nodes cascading.
>
> make test TEST='micro:ByteBufferVectorAccess.p' MICRO="OPTIONS=-f 1 -prof perfasm -jvmArgsPrepend=-Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0" JOBS=12
> Benchmark                                     (size)  Mode  Cnt   Score   Error  Units
> ByteBufferVectorAccess.pollutedBuffers2         1024  avgt   30  40.472 ± 1.055  ns/op
> ByteBufferVectorAccess.pollutedBuffers2:·asm    1024  avgt          NaN            ---
> ByteBufferVectorAccess.pollutedBuffers3         1024  avgt   30  79.251 ± 0.786  ns/op
> ByteBufferVectorAccess.pollutedBuffers3:·asm    1024  avgt          NaN            ---
> ByteBufferVectorAccess.pollutedBuffers4         1024  avgt   30  83.627 ± 2.140  ns/op
> ByteBufferVectorAccess.pollutedBuffers4:·asm    1024  avgt          NaN            ---
> ByteBufferVectorAccess.pollutedBuffers5         1024  avgt   30  85.561 ± 1.156  ns/op
> ByteBufferVectorAccess.pollutedBuffers5:·asm    1024  avgt          NaN
>
> make test TEST='micro:ByteBufferVectorAccess.p' MICRO="OPTIONS=-f 1 -prof perfasm"
> Benchmark                                     (size)  Mode  Cnt    Score   Error  Units
> ByteBufferVectorAccess.pollutedBuffers2         1024  avgt   10   49.326 ± 0.843  ns/op
> ByteBufferVectorAccess.pollutedBuffers2:·asm    1024  avgt           NaN            ---
> ByteBufferVectorAccess.pollutedBuffers3         1024  avgt   10  100.291 ± 1.271  ns/op
> ByteBufferVectorAccess.pollutedBuffers3:·asm    1024  avgt           NaN            ---
> ByteBufferVectorAccess.pollutedBuffers4         1024  avgt   10  101.494 ± 1.027  ns/op
> ByteBufferVectorAccess.pollutedBuffers4:·asm    1024  avgt           NaN            ---
> ByteBufferVectorAccess.pollutedBuffers5         1024  avgt   10   94.606 ± 1.522  ns/op
> ByteBufferVectorAccess.pollutedBuffers5:·asm    1024  avgt           NaN
>
>
> BR,
> Rado
> From: Paul Sandoz <paul.sandoz at oracle.com>
> Sent: Friday, August 6, 2021 18:04
> To: Radosław Smogura <mail at smogura.eu>
> Cc: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
> Subject: Re: Issues with loop unrolling: better pinned node
>
> Hi Rado,
>
> It’s good you are looking at the IR.
>
> Out of curiosity, what happens if you turn off bounds checking [*]?
>
> Paul.
>
> [*]
> -Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0
>
> > On Aug 6, 2021, at 8:39 AM, Radosław Smogura <mail at smogura.eu> wrote:
> >
> > Hi all,
> >
> > I've found that even if we get rid of the barriers, the loop can't be unrolled, and unneeded code remains inside it.
> >
> > I've found this graph, and I wonder whether it's optimal. In particular, the Load of the ByteBuffer index / hb is from a phi; could it be attached to the initial memory?
> >
> > Here's a picture https://drive.google.com/file/d/1G7ZN0xHOVIVHmZ_5TTIUdm3F30okAzvO/view?usp=sharing
> >
> >
> > And the sample code:
> >
> > protected void copyMemory(ByteBuffer in, ByteBuffer out) {
> >  var limit = SPECIES.loopBound(in.limit());
> >  for (int i=0; i < limit; i += SPECIES.vectorByteSize()) {
> >    final var v = ByteVector.fromByteBuffer(SPECIES, in, i, ByteOrder.nativeOrder());
> >    v.intoByteBuffer(out, i, ByteOrder.nativeOrder());
> >  }
> > }
> >
> > Kind regards,
> > Rado


