[External] : Re: Issues with loop unrolling: better pinned node
Rado Smogura
mail at smogura.eu
Wed Aug 18 07:17:54 UTC 2021
Hi Vladimir,
Thank you for your answer.
In fact, it was an attempt to confirm that memory flow can be a reason
why loop opts do not work. That's a very fair point. I'll think about it
and maybe I'll be able to come up with an idea of how this can be generalized.
Kind regards,
Rado
On 16.08.2021 15:41, Vladimir Ivanov wrote:
>> I wonder what you think about something like this [1] - it's
>> virtually a small single-class change
>
> Very interesting experiment, Rado! It's encouraging to hear that loop
> opts immediately benefit from it.
>
> From an architectural perspective, a separate pass to optimize memory
> graph brings excessive complexity:
>
> (1) yet another pass over the graph and susceptible to pass ordering
> issues;
>
> (2) separate from GVN: you either have to duplicate GVN-based memory
> optimizations or run the new pass with IGVN in a loop until it stabilizes.
>
> IMO the problem you noticed illustrates a general weakness in GVN
> implementation and that's the place where it should be fixed (ideally).
>
> Best regards,
> Vladimir Ivanov
>
>>
>> This change tries to find a unique memory for a load node. I implemented
>> it as a separate phase, as the optimization may not run in the Ideal method. I
>> think it's lighter than phi split out.
>>
>> Loops have been transformed. RCE started.
>>
>> Kind regards,
>> Rado
>>
>> [1] -
>> https://github.com/rsmogura/panama-vector/commit/a44f515890d2c4df3fd0e0ced76545a7664926c3
>>
>> [2] -
>> https://github.com/rsmogura/panama-vector/tree/housekeeping-load-memory-optimiziation
>> (full test case)
>>
>> ------------------------------------------------------------------------
>> *From:* Radosław Smogura on behalf of Radosław Smogura <mail at smogura.eu>
>> *Sent:* Friday, August 6, 2021 22:43
>> *To:* Radosław Smogura <mail at smogura.eu>; Paul Sandoz
>> <paul.sandoz at oracle.com>; Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
>> *Cc:* panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
>> *Subject:* Re: Issues with loop unrolling: better pinned node
>> Hi all,
>>
>> Now that I have checked it again, it works as expected, and it's the same
>> code.
>>
>> In the draft code I check if the buffer is direct by using type checking
>> to unswitch the loop, as unswitching over ByteBuffer.hb did not work (the
>> graph was quite similar). However, I thought that this unswitch
>> actually helped to build correct loops, and any kind of improvement
>> around it would be rather for the purpose of better-looking code.
>>
>> But it looks like sometimes (but only sometimes) the loop still
>> cannot be built correctly, or maybe the full optimization kicks in very,
>> very late.
>>
>> Kind regards,
>> Rado
>> ------------------------------------------------------------------------
>> *From:* panama-dev <panama-dev-retn at openjdk.java.net> on behalf of
>> Radosław Smogura <mail at smogura.eu>
>> *Sent:* Friday, August 6, 2021 20:22
>> *To:* Paul Sandoz <paul.sandoz at oracle.com>
>> *Cc:* panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
>> *Subject:* Re: Issues with loop unrolling: better pinned node
>> Yes,
>>
>> The normal case looks good. It's all about polluted cases [1].
>>
>> BR,
>> Rado
>>
>> [1] https://github.com/openjdk/panama-vector/pull/109
>>
>> (Draft) Performance improvements for polluted cases by rsmogura,
>> Pull Request #109, openjdk/panama-vector
>>
>> Hi all, I would like to submit this piece of work, for byte buffers
>> and polluted cases. It resolves some performance issues related to
>> mem barriers when in scope are both on- and off-heap buffer. T...
>>
>> Comparing openjdk:vectorIntrinsics...rsmogura:vectors-polluted-cases:
>> https://github.com/openjdk/panama-vector/compare/vectorIntrinsics...rsmogura:vectors-polluted-cases?expand=1
>>
>> ________________________________
>> From: Paul Sandoz <paul.sandoz at oracle.com>
>> Sent: Friday, August 6, 2021 20:04
>> To: Radosław Smogura <mail at smogura.eu>
>> Cc: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
>> Subject: Re: Issues with loop unrolling: better pinned node
>>
>> I am confused as to the case under test. In your initial email of
>> this thread were you also referring implicitly to polluted cases?
>>
>> Paul.
>>
>>> On Aug 6, 2021, at 10:56 AM, Radosław Smogura <mail at smogura.eu> wrote:
>>>
>>> Hi Paul,
>>>
>>> There's a performance improvement, but I still can't unroll
>>> polluted cases (I cherry-picked loop unrolling). The graph still has
>>> a few nodes taking the buffer limit from a phi, and in the IR I don't see
>>> vector nodes cascading.
>>>
>>> make test TEST='micro:ByteBufferVectorAccess.p' MICRO="OPTIONS=-f 1
>>> -prof perfasm
>>> -jvmArgsPrepend=-Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0"
>>> JOBS=12
>>> Benchmark                                     (size)  Mode  Cnt    Score   Error  Units
>>> ByteBufferVectorAccess.pollutedBuffers2         1024  avgt   30   40.472 ± 1.055  ns/op
>>> ByteBufferVectorAccess.pollutedBuffers2:·asm    1024  avgt           NaN          ---
>>> ByteBufferVectorAccess.pollutedBuffers3         1024  avgt   30   79.251 ± 0.786  ns/op
>>> ByteBufferVectorAccess.pollutedBuffers3:·asm    1024  avgt           NaN          ---
>>> ByteBufferVectorAccess.pollutedBuffers4         1024  avgt   30   83.627 ± 2.140  ns/op
>>> ByteBufferVectorAccess.pollutedBuffers4:·asm    1024  avgt           NaN          ---
>>> ByteBufferVectorAccess.pollutedBuffers5         1024  avgt   30   85.561 ± 1.156  ns/op
>>> ByteBufferVectorAccess.pollutedBuffers5:·asm    1024  avgt           NaN
>>>
>>> make test TEST='micro:ByteBufferVectorAccess.p' MICRO="OPTIONS=-f 1
>>> -prof perfasm"
>>> Benchmark                                     (size)  Mode  Cnt    Score   Error  Units
>>> ByteBufferVectorAccess.pollutedBuffers2         1024  avgt   10   49.326 ± 0.843  ns/op
>>> ByteBufferVectorAccess.pollutedBuffers2:·asm    1024  avgt           NaN          ---
>>> ByteBufferVectorAccess.pollutedBuffers3         1024  avgt   10  100.291 ± 1.271  ns/op
>>> ByteBufferVectorAccess.pollutedBuffers3:·asm    1024  avgt           NaN          ---
>>> ByteBufferVectorAccess.pollutedBuffers4         1024  avgt   10  101.494 ± 1.027  ns/op
>>> ByteBufferVectorAccess.pollutedBuffers4:·asm    1024  avgt           NaN          ---
>>> ByteBufferVectorAccess.pollutedBuffers5         1024  avgt   10   94.606 ± 1.522  ns/op
>>> ByteBufferVectorAccess.pollutedBuffers5:·asm    1024  avgt           NaN
>>>
>>>
>>> BR,
>>> Rado
>>> From: Paul Sandoz <paul.sandoz at oracle.com>
>>> Sent: Friday, August 6, 2021 18:04
>>> To: Radosław Smogura <mail at smogura.eu>
>>> Cc: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
>>> Subject: Re: Issues with loop unrolling: better pinned node
>>>
>>> Hi Rado,
>>>
>>> It’s good you are looking at the IR.
>>>
>>> Out of curiosity, what happens if you turn off bounds checking [*]?
>>>
>>> Paul.
>>>
>>> [*]
>>> -Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0
>>>
>>> > On Aug 6, 2021, at 8:39 AM, Radosław Smogura <mail at smogura.eu> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > I've found that even if we get rid of the barriers, the loop can't get
>>> > unrolled, and unneeded code stays inside it.
>>> >
>>> > I've found this graph, and I wonder if it's optimal. In
>>> > particular, the Load of ByteBuffer index / hb comes from a phi; could it be
>>> > attached to the initial memory?
>>> >
>>> > Here's a picture (bb_issues.png):
>>> > https://drive.google.com/file/d/1G7ZN0xHOVIVHmZ_5TTIUdm3F30okAzvO/view?usp=sharing
>>> >
>>> >
>>> > And sample code
>>> >
>>> > protected void copyMemory(ByteBuffer in, ByteBuffer out) {
>>> >     var limit = SPECIES.loopBound(in.limit());
>>> >     for (int i = 0; i < limit; i += SPECIES.vectorByteSize()) {
>>> >         final var v = ByteVector.fromByteBuffer(SPECIES, in, i,
>>> >                 ByteOrder.nativeOrder());
>>> >         v.intoByteBuffer(out, i, ByteOrder.nativeOrder());
>>> >     }
>>> > }
>>> >
>>> > Kind regards,
>>> > Rado
>>
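
For readers unfamiliar with the loop unswitching mentioned in the thread, here is a minimal plain-Java sketch of doing by hand what the draft relies on the compiler to do: hoisting a loop-invariant test on the kind of buffer out of the copy loop, so that each loop body sees only one kind of input. This is not the code from the pull request and it does not use the Vector API. The class and method names are hypothetical, and hasArray() is only an assumed stand-in for the type check / ByteBuffer.hb test discussed above.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; names are not taken from the thread or the PR.
final class UnswitchSketch {

    // Baseline: one loop, the buffer kind is resolved inside the get/put calls.
    static void copyGeneric(ByteBuffer in, ByteBuffer out) {
        int limit = in.limit();
        for (int i = 0; i < limit; i++) {
            out.put(i, in.get(i));
        }
    }

    // Hand-unswitched: the loop-invariant hasArray() test is evaluated once,
    // and each branch gets its own specialized copy of the loop.
    static void copyUnswitched(ByteBuffer in, ByteBuffer out) {
        int limit = in.limit();
        if (in.hasArray()) {
            // Heap buffer with an accessible backing array: read it directly.
            byte[] hb = in.array();
            int offset = in.arrayOffset();
            for (int i = 0; i < limit; i++) {
                out.put(i, hb[offset + i]);
            }
        } else {
            // Direct (or read-only) buffer: go through the absolute accessor.
            for (int i = 0; i < limit; i++) {
                out.put(i, in.get(i));
            }
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        ByteBuffer heapIn = ByteBuffer.wrap(data);
        ByteBuffer a = ByteBuffer.allocate(data.length);
        ByteBuffer b = ByteBuffer.allocate(data.length);
        copyGeneric(heapIn, a);
        copyUnswitched(heapIn, b);
        // Both variants are expected to produce the same bytes.
        System.out.println(a.equals(b));
    }
}
```

Both methods copy the same bytes; the difference is only in where the buffer-kind decision is made. Whether the JIT performs the equivalent transformation on its own for the polluted cases is exactly what the thread is investigating.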