[External] : Re: Issues with loop unrolling: better pinned node

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Mon Aug 16 13:41:22 UTC 2021


> I wonder what you think about something like this [1]; it's 
> virtually a small, single-class change.

Very interesting experiment, Rado! It's encouraging to hear that loop 
opts immediately benefit from it.

From an architectural perspective, a separate pass to optimize the memory 
graph brings excessive complexity:

   (1) it is yet another pass over the graph, and so susceptible to 
pass-ordering issues;

   (2) it is separate from GVN: you either have to duplicate GVN-based 
memory optimizations or run the new pass with IGVN in a loop until it 
stabilizes.

IMO the problem you noticed illustrates a general weakness in the GVN 
implementation, and that's the place where it should be fixed (ideally).
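
To make point (2) concrete, here is a toy sketch of running two passes in a loop until the IR stops changing. Everything in it is an assumption made for illustration: the IR is just a string, and `MEMORY_PASS` and `GVN_PASS` are hypothetical regex rewrites, not HotSpot code. It only shows why each pass may need the other's output before it can make progress.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class PassFixedPoint {
    // Toy "memory pass": a phi whose two inputs have the same name collapses to that input.
    static final UnaryOperator<String> MEMORY_PASS =
            ir -> ir.replaceAll("phi\\((\\w+),\\1\\)", "$1");

    // Toy stand-in for a GVN identity rule: id(x) is replaced by x.
    static final UnaryOperator<String> GVN_PASS =
            ir -> ir.replaceAll("\\bid\\((\\w+)\\)", "$1");

    static final List<UnaryOperator<String>> PASSES = List.of(MEMORY_PASS, GVN_PASS);

    // Applies all passes in order, round after round, until a full round changes nothing.
    // Returns the IR before the first round followed by the IR after each productive round.
    static List<String> runToFixedPoint(String ir, List<UnaryOperator<String>> passes) {
        List<String> history = new ArrayList<>();
        history.add(ir);
        while (true) {
            String next = ir;
            for (UnaryOperator<String> pass : passes) {
                next = pass.apply(next);
            }
            if (next.equals(ir)) {
                return history; // stabilized: no pass made progress in this round
            }
            history.add(next);
            ir = next;
        }
    }

    public static void main(String[] args) {
        // The outer phi can only collapse after the identity rule has removed id(...).
        for (String step : runToFixedPoint("load(phi(m0,id(phi(m0,m0))))", PASSES)) {
            System.out.println(step);
        }
    }
}
```

With the sample input, the outer phi collapses only in the second round, after the identity rule has fired in the first. That is the kind of mutual dependency that forces either the loop above or a single combined pass.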

Best regards,
Vladimir Ivanov

> 
> This change tries to find a unique memory for a load node. I implemented it 
> as a separate phase, as the optimization may not run in the Ideal method. I 
> think it's lighter than phi split out.
> 
> Loops have been transformed. RCE started.
> 
> Kind regards,
> Rado
> 
> [1] - 
> https://github.com/rsmogura/panama-vector/commit/a44f515890d2c4df3fd0e0ced76545a7664926c3 
> [2] - 
> https://github.com/rsmogura/panama-vector/tree/housekeeping-load-memory-optimiziation 
> (full test case)
> 
> ------------------------------------------------------------------------
> *From:* Radosław Smogura on behalf of Radosław Smogura <mail at smogura.eu>
> *Sent:* Friday, August 6, 2021 22:43
> *To:* Radosław Smogura <mail at smogura.eu>; Paul Sandoz 
> <paul.sandoz at oracle.com>; Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
> *Cc:* panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
> *Subject:* Re: Issues with loop unrolling: better pinned node
> Hi all,
> 
> Now that I have checked it again, it works as expected, and it's the same code.
> 
> In the draft code I check whether the buffer is direct by using type checking 
> to unswitch the loop, as unswitching over ByteBuffer.hb did not work (the 
> graph was quite similar). However, I thought that this unswitch actually 
> helped to build correct loops, and any kind of improvement around it 
> would be rather for the purpose of better-looking code.
> 
> But it looks like sometimes (but only sometimes) the loop still cannot 
> be built correctly, or maybe the full optimization kicks in very, very late.
> 
> Kind regards,
> Rado
> ------------------------------------------------------------------------
> *From:* panama-dev <panama-dev-retn at openjdk.java.net> on behalf of 
> Radosław Smogura <mail at smogura.eu>
> *Sent:* Friday, August 6, 2021 20:22
> *To:* Paul Sandoz <paul.sandoz at oracle.com>
> *Cc:* panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
> *Subject:* Re: Issues with loop unrolling: better pinned node
> Yes,
> 
> The normal case looks good. It's all about polluted cases [1].
> 
> BR,
> Rado
> 
> [1] https://github.com/openjdk/panama-vector/pull/109 
> (Draft) Performance improvements for polluted cases by rsmogura · Pull Request #109 · openjdk/panama-vector
> Comparing openjdk:vectorIntrinsics...rsmogura:vectors-polluted-cases:
> https://github.com/openjdk/panama-vector/compare/vectorIntrinsics...rsmogura:vectors-polluted-cases?expand=1
> 
> ________________________________
> From: Paul Sandoz <paul.sandoz at oracle.com>
> Sent: Friday, August 6, 2021 20:04
> To: Radosław Smogura <mail at smogura.eu>
> Cc: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
> Subject: Re: Issues with loop unrolling: better pinned node
> 
> I am confused as to the case under test. In your initial email of this 
> thread were you also referring implicitly to polluted cases?
> 
> Paul.
> 
>> On Aug 6, 2021, at 10:56 AM, Radosław Smogura <mail at smogura.eu> wrote:
>>
>> Hi Paul,
>>
>> There's a performance improvement, but I still can't unroll polluted cases (I cherry-picked loop unrolling). The graph still has a few nodes taking the buffer limit from a phi, and in the IR I don't see vector nodes cascading.
>>
>> make test TEST='micro:ByteBufferVectorAccess.p' MICRO="OPTIONS=-f 1 -prof perfasm -jvmArgsPrepend=-Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0" JOBS=12
>> Benchmark                                     (size)  Mode  Cnt   Score   Error  Units
>> ByteBufferVectorAccess.pollutedBuffers2         1024  avgt   30  40.472 ± 1.055  ns/op
>> ByteBufferVectorAccess.pollutedBuffers2:·asm    1024  avgt          NaN            ---
>> ByteBufferVectorAccess.pollutedBuffers3         1024  avgt   30  79.251 ± 0.786  ns/op
>> ByteBufferVectorAccess.pollutedBuffers3:·asm    1024  avgt          NaN            ---
>> ByteBufferVectorAccess.pollutedBuffers4         1024  avgt   30  83.627 ± 2.140  ns/op
>> ByteBufferVectorAccess.pollutedBuffers4:·asm    1024  avgt          NaN            ---
>> ByteBufferVectorAccess.pollutedBuffers5         1024  avgt   30  85.561 ± 1.156  ns/op
>> ByteBufferVectorAccess.pollutedBuffers5:·asm    1024  avgt          NaN
>>
>> make test TEST='micro:ByteBufferVectorAccess.p' MICRO="OPTIONS=-f 1 -prof perfasm"
>> Benchmark                                     (size)  Mode  Cnt    Score   Error  Units
>> ByteBufferVectorAccess.pollutedBuffers2         1024  avgt   10   49.326 ± 0.843  ns/op
>> ByteBufferVectorAccess.pollutedBuffers2:·asm    1024  avgt           NaN            ---
>> ByteBufferVectorAccess.pollutedBuffers3         1024  avgt   10  100.291 ± 1.271  ns/op
>> ByteBufferVectorAccess.pollutedBuffers3:·asm    1024  avgt           NaN            ---
>> ByteBufferVectorAccess.pollutedBuffers4         1024  avgt   10  101.494 ± 1.027  ns/op
>> ByteBufferVectorAccess.pollutedBuffers4:·asm    1024  avgt           NaN            ---
>> ByteBufferVectorAccess.pollutedBuffers5         1024  avgt   10   94.606 ± 1.522  ns/op
>> ByteBufferVectorAccess.pollutedBuffers5:·asm    1024  avgt           NaN
>>
>>
>> BR,
>> Rado
>> From: Paul Sandoz <paul.sandoz at oracle.com>
>> Sent: Friday, August 6, 2021 18:04
>> To: Radosław Smogura <mail at smogura.eu>
>> Cc: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
>> Subject: Re: Issues with loop unrolling: better pinned node
>>
>> Hi Rado,
>>
>> It’s good you are looking at the IR.
>>
>> Out of curiosity, what happens if you turn off bounds checking [*]?
>>
>> Paul.
>>
>> [*]
>> -Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0
>>
>> > On Aug 6, 2021, at 8:39 AM, Radosław Smogura <mail at smogura.eu> wrote:
>> >
>> > Hi all,
>> >
>> > I've found that even if we get rid of the barriers, the loop can't be unrolled, and unneeded code remains inside it.
>> >
>> > I've found this graph, and I wonder whether it's optimal. In particular, the Load of ByteBuffer index / hb comes from a phi; could it be attached to the initial memory?
>> >
>> > Here's a picture (bb_issues.png): https://drive.google.com/file/d/1G7ZN0xHOVIVHmZ_5TTIUdm3F30okAzvO/view?usp=sharing
>> >
>> >
>> > And sample code
>> >
>> > protected void copyMemory(ByteBuffer in, ByteBuffer out) {
>> >   // Only the prefix that fills whole vectors is copied; there is no tail loop here.
>> >   var limit = SPECIES.loopBound(in.limit());
>> >   for (int i = 0; i < limit; i += SPECIES.vectorByteSize()) {
>> >     final var v = ByteVector.fromByteBuffer(SPECIES, in, i, ByteOrder.nativeOrder());
>> >     v.intoByteBuffer(out, i, ByteOrder.nativeOrder());
>> >   }
>> > }
>> >
>> > Kind regards,
>> > Rado
> 
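
The question at the bottom of the thread (can a Load whose memory input is a loop phi be attached to the initial memory instead?) can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not C2's data structures and not the code from the linked commit: `Mem`, `uniqueMemory` and the alias names are hypothetical. It walks up a memory graph, skips stores to unrelated alias classes, ignores loop back edges, and reports a memory state only if every path agrees on it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UniqueMemorySketch {
    /** Toy memory-state node (hypothetical; not a HotSpot class). */
    static final class Mem {
        final String name;
        final String storeAlias;                  // alias class written, or null if not a store
        final List<Mem> inputs = new ArrayList<>(); // store: one predecessor; phi: merged states

        Mem(String name, String storeAlias) { this.name = name; this.storeAlias = storeAlias; }
        boolean isStore() { return storeAlias != null; }
        boolean isPhi()   { return storeAlias == null && !inputs.isEmpty(); }
    }

    // Internal markers: a path that only loops back, and a merge of different states.
    private static final Mem CYCLE = new Mem("<cycle>", null);
    private static final Mem AMBIGUOUS = new Mem("<ambiguous>", null);

    /** Returns the single memory state a load of `alias` can observe from `start`, or null. */
    static Mem uniqueMemory(Mem start, String alias) {
        Mem r = walk(start, alias, new HashSet<>());
        return (r == CYCLE || r == AMBIGUOUS) ? null : r;
    }

    private static Mem walk(Mem m, String alias, Set<Mem> onPath) {
        if (m.isStore()) {
            // A store to the same alias class defines what the load sees; others are skipped.
            return m.storeAlias.equals(alias) ? m : walk(m.inputs.get(0), alias, onPath);
        }
        if (!m.isPhi()) return m;                 // initial memory
        if (!onPath.add(m)) return CYCLE;         // back edge into a phi already being resolved
        Mem unique = null;
        for (Mem in : m.inputs) {
            Mem r = walk(in, alias, onPath);
            if (r == CYCLE) continue;
            if (r == AMBIGUOUS || (unique != null && unique != r)) {
                onPath.remove(m);
                return AMBIGUOUS;
            }
            unique = r;
        }
        onPath.remove(m);
        return unique == null ? CYCLE : unique;
    }

    /** Builds init -> loopPhi(init, storeOut), where storeOut writes "byte[]" inside the loop. */
    static Mem[] buildLoop() {
        Mem init = new Mem("init", null);
        Mem phi = new Mem("loopPhi", null);
        Mem store = new Mem("storeOut", "byte[]");
        store.inputs.add(phi);
        phi.inputs.add(init);
        phi.inputs.add(store);
        return new Mem[] { init, phi, store };
    }

    public static void main(String[] args) {
        Mem[] g = buildLoop();
        Mem hb = uniqueMemory(g[1], "ByteBuffer.hb");
        Mem data = uniqueMemory(g[1], "byte[]");
        System.out.println("hb load memory: " + (hb == null ? "not unique" : hb.name));
        System.out.println("byte[] load memory: " + (data == null ? "not unique" : data.name));
    }
}
```

In this model the load of `ByteBuffer.hb` resolves to `init`, because the only store in the loop writes a different alias class, while a load of `byte[]` has no unique memory. Whether the real graph permits the same conclusion depends on how precisely the stores in the polluted case are aliased, which this sketch does not model.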

