[External] : Re: Issues with loop unrolling: better pinned node
Rado Smogura
mail at smogura.eu
Thu Aug 19 16:39:24 UTC 2021
Hi all,
I hope you have a good day.
Since optimizing these loops would still be worthwhile, I thought about
handling mixed access with this approach:
1. When mixed access is detected, set a "raw / byte array" mixed-access
flag.
2. Bail out and restart compilation (this will happen during the first
phases, and only for a few methods).
3. Pass the flag to the compiler.
4. Modify find_alias_type / flatten_alias_type so that when a byte array
is queried for its alias, the raw pointer and raw alias are used instead.
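The four steps above can be sketched as a toy model in Java. This is not C2 code and not taken from any patch: the class, the `Kind` enum, `MixedAccessBailout`, and `aliasFor` are hypothetical names that only mirror the control flow (detect, bail out, restart with a flag, remap the alias query).

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class MixedAccessRestartSketch {
    // Hypothetical memory kinds; real C2 alias types are far richer.
    enum Kind { RAW, BYTE_ARRAY }

    // Step 2: signal used to abandon the current compilation attempt.
    static final class MixedAccessBailout extends RuntimeException {}

    // Step 4: with the flag set, a byte-array query answers with the raw alias.
    static Kind aliasFor(Kind queried, boolean mixedAccessFlag) {
        return (mixedAccessFlag && queried == Kind.BYTE_ARRAY) ? Kind.RAW : queried;
    }

    // One compilation attempt; returns the alias classes later phases would see.
    static Set<Kind> compileOnce(List<Kind> accesses, boolean mixedAccessFlag) {
        Set<Kind> aliasClasses = EnumSet.noneOf(Kind.class);
        for (Kind k : accesses) {
            aliasClasses.add(aliasFor(k, mixedAccessFlag));
            // Step 1: mixed access detected while the flag is still off.
            if (!mixedAccessFlag && aliasClasses.size() > 1) {
                throw new MixedAccessBailout();
            }
        }
        return aliasClasses;
    }

    // Steps 2-3: restart once, passing the flag to the second attempt.
    static Set<Kind> compile(List<Kind> accesses) {
        try {
            return compileOnce(accesses, false);
        } catch (MixedAccessBailout e) {
            return compileOnce(accesses, true);
        }
    }

    public static void main(String[] args) {
        System.out.println(compile(List.of(Kind.BYTE_ARRAY, Kind.RAW, Kind.BYTE_ARRAY)));
    }
}
```

In this model a method that touches only one kind of memory compiles on the first attempt, so the restart cost is paid only by methods with mixed access.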
Kind regards,
Rado
On 18.08.2021 09:17, Rado Smogura wrote:
> Hi Vladimir,
>
>
> Thank you for the answer.
>
>
> In fact, it was an attempt to confirm that memory flow can be a
> reason why loop opts do not work. That's a very fair point. I'll think
> about it, and maybe I'll be able to come up with an idea for how this
> can be generalized.
>
>
> Kind regards,
>
> Rado
>
> On 16.08.2021 15:41, Vladimir Ivanov wrote:
>>> I wonder what you think about something like this [1] - it's
>>> virtually a small, single-class change
>>
>> Very interesting experiment, Rado! It's encouraging to hear that loop
>> opts immediately benefit from it.
>>
>> From an architectural perspective, a separate pass to optimize memory
>> graph brings excessive complexity:
>>
>> (1) yet another pass over the graph and susceptible to pass
>> ordering issues;
>>
>> (2) separate from GVN: you either have to duplicate GVN-based
>> memory optimizations or run new pass with IGVN in a loop until it
>> stabilizes.
>>
>> IMO the problem you noticed illustrates a general weakness in GVN
>> implementation and that's the place where it should be fixed (ideally).
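Point (2) above, running two separate passes in a loop until the result stabilizes, can be illustrated with a toy fixed-point driver. This is only an illustration under stated assumptions: string rewriting stands in for graph transformations, and `PASS_A`, `PASS_B`, and `runToFixpoint` are hypothetical names, not HotSpot APIs.

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class PassFixpointSketch {
    // Two independent toy "passes"; each can expose work for the other.
    static final UnaryOperator<String> PASS_A = s -> s.replace("ab", "");
    static final UnaryOperator<String> PASS_B = s -> s.replace("cd", "");

    // Runs the passes in order, repeating until a full round changes nothing.
    // Every change shortens the string, so the loop terminates.
    static String runToFixpoint(String graph, List<UnaryOperator<String>> passes) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (UnaryOperator<String> pass : passes) {
                String next = pass.apply(graph);
                if (!next.equals(graph)) {
                    graph = next;
                    changed = true;
                }
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        String input = "acabdb";
        // A single round leaves work behind that only another round can finish.
        String oneRound = PASS_B.apply(PASS_A.apply(input));
        String stable = runToFixpoint(input, List.of(PASS_A, PASS_B));
        System.out.println("[" + oneRound + "] vs [" + stable + "]");
    }
}
```

The extra rounds are the cost of keeping the passes separate; folding the rewrite into one worklist-driven pass avoids the outer loop.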
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>>
>>> This change tries to find a unique memory for a load node. I implemented
>>> it as a separate phase, as the optimization may not run in the Ideal
>>> method. I think it's lighter than phi split out.
>>>
>>> Loops have been transformed. RCE started.
>>>
>>> Kind regards,
>>> Rado
>>>
>>> [1] -
>>> https://github.com/rsmogura/panama-vector/commit/a44f515890d2c4df3fd0e0ced76545a7664926c3
>>>
>>> [2] -
>>> https://github.com/rsmogura/panama-vector/tree/housekeeping-load-memory-optimiziation
>>> (full test case)
>>>
>>> ------------------------------------------------------------------------
>>>
>>> *From:* Radosław Smogura on behalf of Radosław Smogura
>>> <mail at smogura.eu>
>>> *Sent:* Friday, August 6, 2021 22:43
>>> *To:* Radosław Smogura <mail at smogura.eu>; Paul Sandoz
>>> <paul.sandoz at oracle.com>; Vladimir Ivanov
>>> <vladimir.x.ivanov at oracle.com>
>>> *Cc:* panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
>>> *Subject:* Re: Issues with loop unrolling: better pinned node
>>> Hi all,
>>>
>>> Now that I have checked it again, it works as expected, and it's the
>>> same code.
>>>
>>> In the draft code I check whether the buffer is direct by type
>>> checking, in order to unswitch the loop, as unswitching over
>>> ByteBuffer.hb did not work (the graph was quite similar). However, I
>>> thought that this unswitch actually helped to build correct loops, and
>>> that any improvement around it would mostly serve better-looking code.
>>>
>>> But it looks like sometimes (and only sometimes) the loop still cannot
>>> be built correctly, or maybe the full optimization kicks in very, very
>>> late.
>>>
>>> Kind regards,
>>> Rado
>>> ------------------------------------------------------------------------
>>>
>>> *From:* panama-dev <panama-dev-retn at openjdk.java.net> on behalf of
>>> Radosław Smogura <mail at smogura.eu>
>>> *Sent:* Friday, August 6, 2021 20:22
>>> *To:* Paul Sandoz <paul.sandoz at oracle.com>
>>> *Cc:* panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
>>> *Subject:* Re: Issues with loop unrolling: better pinned node
>>> Yes,
>>>
>>> The normal case looks good. It's all about polluted cases [1].
>>>
>>> BR,
>>> Rado
>>>
>>> [1] https://github.com/openjdk/panama-vector/pull/109
>>>
>>> (Draft) Perofrmance improvements for polluted cases by rsmogura ·
>>> Pull Request #109 · openjdk/panama-vector
>>>
>>> Hi all, I would like to submit this piece of work, for byte buffers
>>> and polluted cases. It resolves some performance issues related to
>>> mem barriers when in scope are both on- and off-heap buffer. T...
>>>
>>> Comparing openjdk:vectorIntrinsics...rsmogura:vectors-polluted-cases:
>>> https://github.com/openjdk/panama-vector/compare/vectorIntrinsics...rsmogura:vectors-polluted-cases?expand=1
>>>
>>> ________________________________
>>> From: Paul Sandoz <paul.sandoz at oracle.com>
>>> Sent: Friday, August 6, 2021 20:04
>>> To: Radosław Smogura <mail at smogura.eu>
>>> Cc: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
>>> Subject: Re: Issues with loop unrolling: better pinned node
>>>
>>> I am confused as to the case under test. In your initial email of
>>> this thread were you also referring implicitly to polluted cases?
>>>
>>> Paul.
>>>
>>>> On Aug 6, 2021, at 10:56 AM, Radosław Smogura <mail at smogura.eu> wrote:
>>>>
>>>> Hi Paul,
>>>>
>>>> There's a performance improvement, but I still can't unroll
>>>> polluted cases (I cherry-picked loop unrolling). The graph still
>>>> has a few nodes taking the buffer limit from a phi, and in the IR I
>>>> don't see vector nodes cascading.
>>>>
>>>> make test TEST='micro:ByteBufferVectorAccess.p' MICRO="OPTIONS=-f 1
>>>> -prof perfasm
>>>> -jvmArgsPrepend=-Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0"
>>>> JOBS=12
>>>> Benchmark                                     (size)  Mode  Cnt    Score    Error  Units
>>>> ByteBufferVectorAccess.pollutedBuffers2         1024  avgt   30   40.472 ± 1.055  ns/op
>>>> ByteBufferVectorAccess.pollutedBuffers2:·asm    1024  avgt           NaN          ---
>>>> ByteBufferVectorAccess.pollutedBuffers3         1024  avgt   30   79.251 ± 0.786  ns/op
>>>> ByteBufferVectorAccess.pollutedBuffers3:·asm    1024  avgt           NaN          ---
>>>> ByteBufferVectorAccess.pollutedBuffers4         1024  avgt   30   83.627 ± 2.140  ns/op
>>>> ByteBufferVectorAccess.pollutedBuffers4:·asm    1024  avgt           NaN          ---
>>>> ByteBufferVectorAccess.pollutedBuffers5         1024  avgt   30   85.561 ± 1.156  ns/op
>>>> ByteBufferVectorAccess.pollutedBuffers5:·asm    1024  avgt           NaN
>>>>
>>>> make test TEST='micro:ByteBufferVectorAccess.p' MICRO="OPTIONS=-f 1
>>>> -prof perfasm"
>>>> Benchmark                                     (size)  Mode  Cnt    Score    Error  Units
>>>> ByteBufferVectorAccess.pollutedBuffers2         1024  avgt   10   49.326 ± 0.843  ns/op
>>>> ByteBufferVectorAccess.pollutedBuffers2:·asm    1024  avgt           NaN          ---
>>>> ByteBufferVectorAccess.pollutedBuffers3         1024  avgt   10  100.291 ± 1.271  ns/op
>>>> ByteBufferVectorAccess.pollutedBuffers3:·asm    1024  avgt           NaN          ---
>>>> ByteBufferVectorAccess.pollutedBuffers4         1024  avgt   10  101.494 ± 1.027  ns/op
>>>> ByteBufferVectorAccess.pollutedBuffers4:·asm    1024  avgt           NaN          ---
>>>> ByteBufferVectorAccess.pollutedBuffers5         1024  avgt   10   94.606 ± 1.522  ns/op
>>>> ByteBufferVectorAccess.pollutedBuffers5:·asm    1024  avgt           NaN
>>>>
>>>>
>>>> BR,
>>>> Rado
>>>> From: Paul Sandoz <paul.sandoz at oracle.com>
>>>> Sent: Friday, August 6, 2021 18:04
>>>> To: Radosław Smogura <mail at smogura.eu>
>>>> Cc: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
>>>> Subject: Re: Issues with loop unrolling: better pinned node
>>>>
>>>> Hi Rado,
>>>>
>>>> It’s good you are looking at the IR
>>>>
>>>> Out of curiosity, what happens if you turn off bounds checking [*]?
>>>>
>>>> Paul.
>>>>
>>>> [*]
>>>> -Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0
>>>>
>>>> > On Aug 6, 2021, at 8:39 AM, Radosław Smogura <mail at smogura.eu>
>>>> wrote:
>>>> >
>>>> > Hi all,
>>>> >
>>>> > I've found that even if we get rid of the barriers, the loop can't
>>>> > get unrolled, and unneeded code stays inside it.
>>>> >
>>>> > I've found this graph, and I wonder if it's optimal. In particular,
>>>> > the Load of ByteBuffer index / hb comes from a phi; could it be
>>>> > attached to the initial memory?
>>>> >
>>>> > Here's a picture (bb_issues.png):
>>>> > https://drive.google.com/file/d/1G7ZN0xHOVIVHmZ_5TTIUdm3F30okAzvO/view?usp=sharing
>>>> >
>>>> >
>>>> > And sample code
>>>> >
>>>> > protected void copyMemory(ByteBuffer in, ByteBuffer out) {
>>>> >     var limit = SPECIES.loopBound(in.limit());
>>>> >     for (int i = 0; i < limit; i += SPECIES.vectorByteSize()) {
>>>> >         final var v = ByteVector.fromByteBuffer(SPECIES, in, i,
>>>> >                 ByteOrder.nativeOrder());
>>>> >         v.intoByteBuffer(out, i, ByteOrder.nativeOrder());
>>>> >     }
>>>> > }
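For readers without the incubator module at hand, here is a scalar stand-in for the loop above, driven the way a "polluted" case would be. It assumes, based on the PR description quoted in this thread, that "polluted" means the same call site sees both on-heap and off-heap buffers. The class name and the `filled` / `runPolluted` helpers are hypothetical, and the loop copies one byte per iteration instead of one vector.

```java
import java.nio.ByteBuffer;

public class PollutedCopySketch {
    // Scalar stand-in for the vector loop: same shape, one byte per iteration.
    static void copyMemory(ByteBuffer in, ByteBuffer out) {
        int limit = in.limit();
        for (int i = 0; i < limit; i++) {
            out.put(i, in.get(i)); // absolute get/put, positions stay untouched
        }
    }

    // Fills a buffer with a simple repeating pattern.
    static ByteBuffer filled(ByteBuffer b) {
        for (int i = 0; i < b.limit(); i++) {
            b.put(i, (byte) i);
        }
        return b;
    }

    // Feeds copyMemory heap and direct buffers in every combination, so its
    // type profile sees more than one ByteBuffer implementation.
    static boolean runPolluted(int size) {
        ByteBuffer[] ins = {
            filled(ByteBuffer.allocate(size)),
            filled(ByteBuffer.allocateDirect(size))
        };
        boolean ok = true;
        for (ByteBuffer in : ins) {
            ByteBuffer[] outs = {
                ByteBuffer.allocate(size),
                ByteBuffer.allocateDirect(size)
            };
            for (ByteBuffer out : outs) {
                copyMemory(in, out);
                ok &= in.equals(out); // compares remaining bytes, heap or direct
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        System.out.println("copies match: " + runPolluted(1024));
    }
}
```

This only reproduces the calling pattern; whether the JIT unrolls the loop still has to be checked in the IR or the generated assembly.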
>>>> >
>>>> > Kind regards,
>>>> > Rado