Re: Issues with loop unrolling: better pinned node

Tue Sep 7 18:01:31 UTC 2021

Thanks for giving it a try, Rado.

It feels like a lot of complexity comes from attempting to support 
multiple slices per memory operation.

What would it look like if you gave up on them and used 
TypePtr::BOTTOM/AliasIdxBot instead? Such memory operations won't be amenable 
to further memory-related optimizations (since they alias with every 
other memory operation), but it should significantly simplify their 
support, shouldn't it?
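
To make the trade-off concrete, here is a minimal toy model in Java. It is not HotSpot code: the Slice enum, the MemOp class and the mayAlias/antiDeps helpers are hypothetical names invented for this illustration, and BOT merely stands in for TypePtr::BOTTOM/AliasIdxBot. It only sketches the point that an operation on the bottom slice has to stay ordered against stores on every slice, while an operation on a narrow slice only conflicts with its own slice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AliasSketch {
    // Hypothetical alias classes; BOT stands in for TypePtr::BOTTOM / AliasIdxBot.
    enum Slice { BOT, RAW, BYTE_ARRAY, INT_ARRAY }

    // A toy memory operation: a name, the slice it touches, and whether it writes.
    static final class MemOp {
        final String name;
        final Slice slice;
        final boolean isStore;

        MemOp(String name, Slice slice, boolean isStore) {
            this.name = name;
            this.slice = slice;
            this.isStore = isStore;
        }
    }

    // Two operations may alias if they share a slice or either one is on BOT.
    static boolean mayAlias(Slice a, Slice b) {
        return a == Slice.BOT || b == Slice.BOT || a == b;
    }

    // Names of the stores that the given load must stay ordered against.
    static List<String> antiDeps(MemOp load, List<MemOp> stores) {
        List<String> result = new ArrayList<>();
        for (MemOp store : stores) {
            if (store.isStore && mayAlias(load.slice, store.slice)) {
                result.add(store.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<MemOp> stores = Arrays.asList(
            new MemOp("StoreVector(raw)", Slice.RAW, true),
            new MemOp("StoreB(byte[])", Slice.BYTE_ARRAY, true),
            new MemOp("StoreI(int[])", Slice.INT_ARRAY, true));

        MemOp wideLoad = new MemOp("LoadVector(BOT)", Slice.BOT, false);
        MemOp narrowLoad = new MemOp("LoadB(byte[])", Slice.BYTE_ARRAY, false);

        // The wide load conflicts with every store; the narrow one with a single store.
        System.out.println(wideLoad.name + " -> " + antiDeps(wideLoad, stores));
        System.out.println(narrowLoad.name + " -> " + antiDeps(narrowLoad, stores));
    }
}
```

In this model the simplification is that mayAlias needs no per-slice bookkeeping for the wide operation; the cost is that nothing can be moved past it.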

Best regards,
Vladimir Ivanov

On 02.09.2021 22:53, Rado Smogura wrote:
> Hi Vladimir,
> 
> 
> Thank you for feedback.
> 
> 
> There was one idea I had previously and I added it here (I'm surprised it 
> works):
> 
> * add an additional field, TypeTuple _multi_load_adr, to Node and set it in 
> mixed mode,
> 
> * in anti-deps, add an outer loop to do the analysis for every address from 
> this tuple
> 
> Minor changes:
> 
> * pass this field to the mach node;
> 
> * in anti-deps, the load node has to traverse the memory chain (normally this is 
> done in Ideal).
> 
> 
> I checked it with mixed "mode" operating on int and byte vectors and I 
> see storeV (raw / byte[]) gets an anti-dep to loadV (raw/int[]), and same 
> for storeV(raw/byte[]) - so that's good - as there's interference over raw.
> 
> 
> https://urldefense.com/v3/__https://github.com/openjdk/panama-vector/compare/vectorIntrinsics*mask...rsmogura:mixed-mode-use-bot-mem-opt-antideps?expand=1__;Kw!!ACWV5N9M2RV99hQ!ZnX5KqhoIDbUbqdEBiwN3v2aGgLQLfRteZuZKx0RmLzqhfMhcKrMedWCzfG8mBggvHhJ2R8$ 
> 
> 
> Kind regards,
> 
> Rado
> 
> On 01.09.2021 15:22, Vladimir Ivanov wrote:
>> Interesting idea, Rado! Representing memory effects of 
>> mixed/mismatched accesses with TypePtr::BOTTOM does look promising.
>>
>> Regarding the preferred IR shapes, I'd try to teach alias analysis 
>> (Compile::find_alias_type()) and PhaseCFG::insert_anti_dependences() 
>> about loads/stores on wide memory (TypePtr::BOTTOM) and see what kind 
>> of problems arise to decide how to proceed. I hope there's a way to 
>> avoid dummy nodes when representing desired effects.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 30.08.2021 18:12, Rado Smogura wrote:
>>> Hi all,
>>>
>>>
>>> I added one missing thing. I want to build something like this. Would 
>>> it make sense?
>>>
>>>
>>>     STORE
>>>
>>>
>>>                                 addr
>>>                                   │
>>>                                   │
>>>           reset_memory()          │
>>>              │    ┌───────────────┴────────┐
>>>              │    │ CheckCastPP (-> BOT)   │
>>>              │    └──────┬─────────────────┘
>>>              │           │
>>>              ├───────┐   │
>>>              │       │   │
>>>              │       │   │
>>>              │  ┌────┴───┴──────────────────────────┐
>>>              │  │            StoreVector            │
>>>              │  └───┬───────────────────────────┬───┘
>>>              │      │                           │
>>>              │      │                           │
>>>             ┌┴──────┴───────────────────────────┴────────────────────────────┐
>>>             │ BOT  RAW byte[]                                                │
>>>             │ MergeMem                                                       │
>>>             └────────────────────────────────────────────────────────────────┘
>>>
>>>
>>>
>>>      LOAD
>>>
>>>               │
>>>               │
>>>               ├─────────┐
>>>               │         │
>>>               │ ┌───────┴─────────────────────────────────────────────────────┐
>>>               │ │ LoadVector (BOT)                                            │
>>>               │ └───────────────────────┬─────────────────────────┬───────────┘
>>>               │                         │                         │
>>>               │        addr base -> raw │                         │ addr base -> byte[]
>>>               │                         │                         │
>>>               │           ┌─────────────┴─────────┐   ┌───────────┴───────────┐
>>>               │           │DummyStoreV (raw)      │   │DummyStoreV (byte[])   │ //No-op stores
>>>               │           └──────┬────────────────┘   └──┬────────────────────┘
>>>               │                  │                       │
>>>               │     ┌────────────┘             ┌─────────┘
>>>               │     │                          │
>>>             ┌─┴─────┴──────────────────────────┴──────────────────────────────┐
>>>             │ BOT  RAW byte[]                                                 │
>>>             │ MergeMem                                                        │
>>>             └─────────────────────────────────────────────────────────────────┘
>>>
>>>
>>> DummyStore is a "virtual" node inserted after the load, intended to emulate a 
>>> store and prevent writes / reads from going around the load vector 
>>> (in fact, it mostly prevents a store / load from seeing through the MergeMem).
>>>
>>> I tested it with the following code.
>>>
>>> public static void copyMemoryBytes3(ByteBuffer in, ByteBuffer out, ByteBuffer out2, byte[] arr) {
>>>      for (int i = 0; i < SPECIES_BYTE.loopBound(in.limit()); i += SPECIES_BYTE.vectorByteSize()) {
>>>          var v1 = ByteVector.fromByteBuffer(SPECIES_BYTE, in, i, ByteOrder.nativeOrder());
>>>          arr[i] = (byte) i;
>>>          var v2 = ByteVector.fromByteBuffer(SPECIES_BYTE, out, i, ByteOrder.nativeOrder());
>>>          v1.intoByteBuffer(out, i, ByteOrder.nativeOrder());
>>>      }
>>> }
>>>
>>> Kind regards,
>>>
>>> Rado
>>>
>>> On 27.08.2021 20:16, Rado Smogura wrote:
>>>> Hi all,
>>>>
>>>>
>>>> I experimented a little bit, and I wonder if this is reasonable. The 
>>>> outcome on the graphs is as expected, and the operations look properly 
>>>> ordered (but this is just my own opinion).
>>>>
>>>> https://urldefense.com/v3/__https://github.com/rsmogura/panama-vector/commit/755b62823aaed0cddf78e8ccfc60c063bb40779a__;!!ACWV5N9M2RV99hQ!ceve5Eoh01VSiAxgPOSMpL_oQpz6MJI6KeGEcvULButhjMZGdxMq2SB02arX5hxVvmWp1wY$ 
>>>>
>>>>
>>>> Kind regards,
>>>>
>>>> Rado
>>>>
>>>> On 19.08.2021 22:26, Rado Smogura wrote:
>>>>> I think I answered this question quite simply... it will not work.
>>>>>
>>>>> On 19.08.2021 18:39, Rado Smogura wrote:
>>>>>> Hi all,
>>>>>>
>>>>>>
>>>>>> I hope you have a good day.
>>>>>>
>>>>>>
>>>>>> Since optimizing loops would still be a good approach, I thought about 
>>>>>> optimizing mixed access like this:
>>>>>>
>>>>>>
>>>>>> 1. When mixed access is detected, set a "raw / byte array" mixed 
>>>>>> access flag.
>>>>>>
>>>>>> 2. Bail out and restart compilation (this will happen during the first 
>>>>>> phases, and only for a few methods).
>>>>>>
>>>>>> 3. Pass the flag to the compiler.
>>>>>>
>>>>>> 4. Modify find_alias_type / flatten_alias_type so that when a byte 
>>>>>> array is queried for its alias, the raw ptr and raw alias are used.
>>>>>>
>>>>>>
>>>>>> Kind regards,
>>>>>>
>>>>>> Rado
>>>>>>
>>>>>> On 18.08.2021 09:17, Rado Smogura wrote:
>>>>>>> Hi Vladimir,
>>>>>>>
>>>>>>>
>>>>>>> Thank you for answer.
>>>>>>>
>>>>>>>
>>>>>>> In fact, it was an attempt to confirm that memory flow can be 
>>>>>>> a reason why loop opts do not work. That's a very fair point. I'll 
>>>>>>> think about it and maybe I'll be able to come up with an idea of how this 
>>>>>>> can be generalized.
>>>>>>>
>>>>>>>
>>>>>>> Kind regards,
>>>>>>>
>>>>>>> Rado
>>>>>>>
>>>>>>> On 16.08.2021 15:41, Vladimir Ivanov wrote:
>>>>>>>>> I wonder what you think about something like this [1] - it's 
>>>>>>>>> virtually a small, single-class change
>>>>>>>>
>>>>>>>> Very interesting experiment, Rado! It's encouraging to hear that 
>>>>>>>> loop opts immediately benefit from it.
>>>>>>>>
>>>>>>>> From an architectural perspective, a separate pass to optimize 
>>>>>>>> memory graph brings excessive complexity:
>>>>>>>>
>>>>>>>>   (1) yet another pass over the graph and susceptible to pass 
>>>>>>>> ordering issues;
>>>>>>>>
>>>>>>>>   (2) separate from GVN: you either have to duplicate GVN-based 
>>>>>>>> memory optimizations or run new pass with IGVN in a loop until 
>>>>>>>> it stabilizes.
>>>>>>>>
>>>>>>>> IMO the problem you noticed illustrates a general weakness in 
>>>>>>>> GVN implementation and that's the place where it should be fixed 
>>>>>>>> (ideally).
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Vladimir Ivanov
>>>>>>>>
>>>>>>>>>
>>>>>>>>> This change tries to find a unique memory for the load node. I 
>>>>>>>>> implemented it as a separate phase, as the optimization may not run 
>>>>>>>>> in the Ideal method. I think it's lighter than phi split out.
>>>>>>>>>
>>>>>>>>> Loops have been transformed. RCE started.
>>>>>>>>>
>>>>>>>>> Kind regards,
>>>>>>>>> Rado
>>>>>>>>>
>>>>>>>>> [1] - 
>>>>>>>>> https://urldefense.com/v3/__https://github.com/rsmogura/panama-vector/commit/a44f515890d2c4df3fd0e0ced76545a7664926c3__;!!ACWV5N9M2RV99hQ!ceve5Eoh01VSiAxgPOSMpL_oQpz6MJI6KeGEcvULButhjMZGdxMq2SB02arX5hxVLT5AsEE$ 
>>>>>>>>>
>>>>>>>>> [2] - 
>>>>>>>>> https://urldefense.com/v3/__https://github.com/rsmogura/panama-vector/tree/housekeeping-load-memory-optimiziation__;!!ACWV5N9M2RV99hQ!ceve5Eoh01VSiAxgPOSMpL_oQpz6MJI6KeGEcvULButhjMZGdxMq2SB02arX5hxVcBkmVi0$ 
>>>>>>>>> (full test case)
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------ 
>>>>>>>>>
>>>>>>>>> *From:* Radosław Smogura on behalf of Radosław Smogura 
>>>>>>>>> <mail at smogura.eu>
>>>>>>>>> *Sent:* Friday, August 6, 2021 22:43
>>>>>>>>> *To:* Radosław Smogura <mail at smogura.eu>; Paul Sandoz 
>>>>>>>>> <paul.sandoz at oracle.com>; Vladimir Ivanov 
>>>>>>>>> <vladimir.x.ivanov at oracle.com>
>>>>>>>>> *Cc:* panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
>>>>>>>>> *Subject:* Re: Issues with loop unrolling: better pinned node
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Now that I've checked it again, it works as expected, and it's the 
>>>>>>>>> same code.
>>>>>>>>>
>>>>>>>>> In draft code I check if the buffer is direct by using type 
>>>>>>>>> checking to unswitch loop, as unswitching over ByteBuffer.hb 
>>>>>>>>> did not work (the graph was quite similar). However, I thought 
>>>>>>>>> that this unswitch actually helped to build correct loops, and 
>>>>>>>>> any kind of improvement around it would be rather for the 
>>>>>>>>> purpose of better-looking code.
>>>>>>>>>
>>>>>>>>> But it looks like sometimes (but only sometimes) the loop 
>>>>>>>>> still cannot be built correctly, or maybe the full 
>>>>>>>>> optimization kicks in very, very late.
>>>>>>>>>
>>>>>>>>> Kind regards,
>>>>>>>>> Rado
>>>>>>>>> ------------------------------------------------------------------------ 
>>>>>>>>>
>>>>>>>>> *From:* panama-dev <panama-dev-retn at openjdk.java.net> on behalf 
>>>>>>>>> of Radosław Smogura <mail at smogura.eu>
>>>>>>>>> *Sent:* Friday, August 6, 2021 20:22
>>>>>>>>> *To:* Paul Sandoz <paul.sandoz at oracle.com>
>>>>>>>>> *Cc:* panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
>>>>>>>>> *Subject:* Re: Issues with loop unrolling: better pinned node
>>>>>>>>> Yes,
>>>>>>>>>
>>>>>>>>> The normal case looks good. It's all about polluted cases [1]
>>>>>>>>>
>>>>>>>>> BR,
>>>>>>>>> Rado
>>>>>>>>>
>>>>>>>>> [1] 
>>>>>>>>> https://urldefense.com/v3/__https://github.com/openjdk/panama-vector/pull/109__;!!ACWV5N9M2RV99hQ!ceve5Eoh01VSiAxgPOSMpL_oQpz6MJI6KeGEcvULButhjMZGdxMq2SB02arX5hxVfxQRu38$ 
>>>>>>>>>
>>>>>>>>> ________________________________
>>>>>>>>> From: Paul Sandoz <paul.sandoz at oracle.com>
>>>>>>>>> Sent: Friday, August 6, 2021 20:04
>>>>>>>>> To: Radosław Smogura <mail at smogura.eu>
>>>>>>>>> Cc: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
>>>>>>>>> Subject: Re: Issues with loop unrolling: better pinned node
>>>>>>>>>
>>>>>>>>> I am confused as to the case under test. In your initial email 
>>>>>>>>> of this thread were you also referring implicitly to polluted 
>>>>>>>>> cases?
>>>>>>>>>
>>>>>>>>> Paul.
>>>>>>>>>
>>>>>>>>>> On Aug 6, 2021, at 10:56 AM, Radosław Smogura 
>>>>>>>>>> <mail at smogura.eu> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Paul,
>>>>>>>>>>
>>>>>>>>>> There's a performance improvement, but I still can't unroll 
>>>>>>>>>> polluted cases (I cherry-picked loop unrolling). The graph 
>>>>>>>>>> still has a few nodes taking the buffer limit from a phi, and in the IR I 
>>>>>>>>>> don't see vector nodes cascading.
>>>>>>>>>>
>>>>>>>>>> make test TEST='micro:ByteBufferVectorAccess.p' 
>>>>>>>>>> MICRO="OPTIONS=-f 1 -prof perfasm 
>>>>>>>>>> -jvmArgsPrepend=-Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0" 
>>>>>>>>>> JOBS=12
>>>>>>>>>> Benchmark                                     (size)  Mode  Cnt   Score   Error  Units
>>>>>>>>>> ByteBufferVectorAccess.pollutedBuffers2         1024  avgt   30  40.472 ± 1.055  ns/op
>>>>>>>>>> ByteBufferVectorAccess.pollutedBuffers2:·asm    1024  avgt          NaN            ---
>>>>>>>>>> ByteBufferVectorAccess.pollutedBuffers3         1024  avgt   30  79.251 ± 0.786  ns/op
>>>>>>>>>> ByteBufferVectorAccess.pollutedBuffers3:·asm    1024  avgt          NaN            ---
>>>>>>>>>> ByteBufferVectorAccess.pollutedBuffers4         1024  avgt   30  83.627 ± 2.140  ns/op
>>>>>>>>>> ByteBufferVectorAccess.pollutedBuffers4:·asm    1024  avgt          NaN            ---
>>>>>>>>>> ByteBufferVectorAccess.pollutedBuffers5         1024  avgt   30  85.561 ± 1.156  ns/op
>>>>>>>>>> ByteBufferVectorAccess.pollutedBuffers5:·asm    1024  avgt          NaN
>>>>>>>>>>
>>>>>>>>>> make test TEST='micro:ByteBufferVectorAccess.p' 
>>>>>>>>>> MICRO="OPTIONS=-f 1 -prof perfasm"
>>>>>>>>>> Benchmark                                     (size)  Mode  Cnt    Score   Error  Units
>>>>>>>>>> ByteBufferVectorAccess.pollutedBuffers2         1024  avgt   10   49.326 ± 0.843  ns/op
>>>>>>>>>> ByteBufferVectorAccess.pollutedBuffers2:·asm    1024  avgt           NaN            ---
>>>>>>>>>> ByteBufferVectorAccess.pollutedBuffers3         1024  avgt   10  100.291 ± 1.271  ns/op
>>>>>>>>>> ByteBufferVectorAccess.pollutedBuffers3:·asm    1024  avgt           NaN            ---
>>>>>>>>>> ByteBufferVectorAccess.pollutedBuffers4         1024  avgt   10  101.494 ± 1.027  ns/op
>>>>>>>>>> ByteBufferVectorAccess.pollutedBuffers4:·asm    1024  avgt           NaN            ---
>>>>>>>>>> ByteBufferVectorAccess.pollutedBuffers5         1024  avgt   10   94.606 ± 1.522  ns/op
>>>>>>>>>> ByteBufferVectorAccess.pollutedBuffers5:·asm    1024  avgt           NaN
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> BR,
>>>>>>>>>> Rado
>>>>>>>>>> From: Paul Sandoz <paul.sandoz at oracle.com>
>>>>>>>>>> Sent: Friday, August 6, 2021 18:04
>>>>>>>>>> To: Radosław Smogura <mail at smogura.eu>
>>>>>>>>>> Cc: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
>>>>>>>>>> Subject: Re: Issues with loop unrolling: better pinned node
>>>>>>>>>>
>>>>>>>>>> Hi Rado,
>>>>>>>>>>
>>>>>>>>>> It’s good you are looking at the IR
>>>>>>>>>>
>>>>>>>>>> Out of curiosity, what happens if you turn off bounds checking 
>>>>>>>>>> [*]?
>>>>>>>>>>
>>>>>>>>>> Paul.
>>>>>>>>>>
>>>>>>>>>> [*]
>>>>>>>>>> -Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0
>>>>>>>>>>
>>>>>>>>>> > On Aug 6, 2021, at 8:39 AM, Radosław Smogura 
>>>>>>>>>> <mail at smogura.eu> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Hi all,
>>>>>>>>>> >
>>>>>>>>>> > I've found that even if we get rid of barriers, the loop 
>>>>>>>>>> can't get unrolled, and unneeded code stays inside it.
>>>>>>>>>> >
>>>>>>>>>> > I've found this graph, and I wonder if it's optimal; in 
>>>>>>>>>> particular, the Load of ByteBuffer index / hb is from a phi. Could 
>>>>>>>>>> it be attached to the initial memory?
>>>>>>>>>> >
>>>>>>>>>> > Here's a picture 
>>>>>>>>>> https://urldefense.com/v3/__https://drive.google.com/file/d/1G7ZN0xHOVIVHmZ_5TTIUdm3F30okAzvO/view?usp=sharing__;!!ACWV5N9M2RV99hQ!ceve5Eoh01VSiAxgPOSMpL_oQpz6MJI6KeGEcvULButhjMZGdxMq2SB02arX5hxVkhhZ0w8$ 
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > And sample code
>>>>>>>>>> >
>>>>>>>>>> > protected void copyMemory(ByteBuffer in, ByteBuffer out) {
>>>>>>>>>> >  var limit = SPECIES.loopBound(in.limit());
>>>>>>>>>> >  for (int i=0; i < limit; i += SPECIES.vectorByteSize()) {
>>>>>>>>>> >    final var v = ByteVector.fromByteBuffer(SPECIES, in, i, ByteOrder.nativeOrder());
>>>>>>>>>> >    v.intoByteBuffer(out, i, ByteOrder.nativeOrder());
>>>>>>>>>> >  }
>>>>>>>>>> > }
>>>>>>>>>> >
>>>>>>>>>> > Kind regards,
>>>>>>>>>> > Rado
>>>>>>>>>