Re: Issues with loop unrolling: better pinned node

Wed Sep 1 13:22:15 UTC 2021

Interesting idea, Rado! Representing memory effects of mixed/mismatched 
accesses with TypePtr::BOTTOM does look promising.

Regarding the preferred IR shapes, I'd try to teach alias analysis 
(Compile::find_alias_type()) and PhaseCFG::insert_anti_dependences() 
about loads/stores on wide memory (TypePtr::BOTTOM) and see what kind of 
problems arise to decide how to proceed. I hope there's a way to avoid 
dummy nodes when representing desired effects.
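
To make the "wide memory" idea concrete, here is a toy Java model of how an access on a bottom slice relates to accesses on narrow slices. It is only a sketch under assumptions: the `Slice` enum and `mayInterfere` are hypothetical names for illustration and are not HotSpot code or the real behaviour of Compile::find_alias_type().

```java
/** Toy model of memory slices; all names here are illustrative, not HotSpot's. */
public class AliasSketch {
    /** BOTTOM stands for "all of memory"; the others are narrow slices. */
    enum Slice { BOTTOM, RAW, BYTE_ARRAY }

    /**
     * Two accesses have to be ordered against each other if either one is
     * wide (BOTTOM) or if both touch the same narrow slice.
     */
    static boolean mayInterfere(Slice a, Slice b) {
        return a == Slice.BOTTOM || b == Slice.BOTTOM || a == b;
    }

    public static void main(String[] args) {
        // Narrow slices are independent of each other.
        System.out.println(mayInterfere(Slice.RAW, Slice.BYTE_ARRAY));    // false
        // A wide access conflicts with every slice.
        System.out.println(mayInterfere(Slice.BOTTOM, Slice.BYTE_ARRAY)); // true
        System.out.println(mayInterfere(Slice.RAW, Slice.BOTTOM));        // true
    }
}
```

In this model a mismatched vector access would be tagged BOTTOM, so it is ordered against both raw and byte[] accesses without any dummy nodes.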

Best regards,
Vladimir Ivanov

On 30.08.2021 18:12, Rado Smogura wrote:
> Hi all,
> 
> 
> I added one missing thing. I want to build something like this. Would it 
> make sense?
> 
> 
>     STORE
> 
> 
>                                 addr
>                                   │
>                                   │
>           reset_memory()          │
>              │    ┌───────────────┴────────┐
>              │    │ CheckCastPP (-> BOT)   │
>              │    └──────┬─────────────────┘
>              │           │
>              ├───────┐   │
>              │       │   │
>              │       │   │
>              │  ┌────┴───┴──────────────────────────┐
>              │  │            StoreVector            │
>              │  └───┬───────────────────────────┬───┘
>              │      │                           │
>              │      │                           │
>             ┌┴──────┴───────────────────────────┴───────┐
>             │BOT    RAW                         byte[]  │
>             │                 MergeMem                  │
>             └───────────────────────────────────────────┘
> 
> 
> 
>      LOAD
> 
>              │
>              │
>              ├─────────┐
>              │         │
>              │ ┌───────┴───────────────────────────────────────┐
>              │ │               LoadVector (BOT)                │
>              │ └─────────┬───────────────────────┬─────────────┘
>              │           │                       │
>              │           │ addr base -> raw      │ addr base -> byte[]
>              │           │                       │
>              │ ┌─────────┴───────────┐ ┌─────────┴───────────┐
>              │ │ DummyStoreV (raw)   │ │ DummyStoreV (byte[])│ // No-op stores
>              │ └─────────┬───────────┘ └─────────┬───────────┘
>              │           │                       │
>              │           │                       │
>             ┌┴───────────┴───────────────────────┴───────┐
>             │BOT         RAW                     byte[]  │
>             │                  MergeMem                  │
>             └────────────────────────────────────────────┘
> 
> 
> DummyStore is a "virtual" node inserted after the load. It is intended to
> emulate a store and prevent writes / reads from bypassing the load vector
> (in fact, it rather prevents stores / loads from seeing through the
> MergeMem).
> 
> I tested it with the following code.
> 
> public static void copyMemoryBytes3(ByteBuffer in, ByteBuffer out,
>                                     ByteBuffer out2, byte[] arr) {
>     for (int i = 0; i < SPECIES_BYTE.loopBound(in.limit());
>          i += SPECIES_BYTE.vectorByteSize()) {
>         var v1 = ByteVector.fromByteBuffer(SPECIES_BYTE, in, i,
>                                            ByteOrder.nativeOrder());
>         arr[i] = (byte) i;
>         var v2 = ByteVector.fromByteBuffer(SPECIES_BYTE, out, i,
>                                            ByteOrder.nativeOrder());
>         v1.intoByteBuffer(out, i, ByteOrder.nativeOrder());
>     }
> }
> 
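
For context on why such a method sees "mixed" memory: a ByteBuffer argument may be backed by an on-heap byte[] or by off-heap (raw) memory, and the callee cannot tell which statically. The sketch below only classifies buffers using the standard java.nio API; the class name `BufferKinds` and the label strings are made up for illustration.

```java
import java.nio.ByteBuffer;

/** Classifies which kind of storage an access through a ByteBuffer would touch. */
public class BufferKinds {
    static String backing(ByteBuffer bb) {
        if (bb.isDirect()) {
            return "raw";      // off-heap memory, addressed by a raw pointer
        }
        if (bb.hasArray()) {
            return "byte[]";   // on-heap, accessible backing array
        }
        return "unknown";      // e.g. a read-only heap buffer hides its array
    }

    public static void main(String[] args) {
        System.out.println(backing(ByteBuffer.allocate(64)));                    // byte[]
        System.out.println(backing(ByteBuffer.allocateDirect(64)));              // raw
        System.out.println(backing(ByteBuffer.allocate(64).asReadOnlyBuffer())); // unknown
    }
}
```

Calling the same copy method with both heap and direct buffers is what produces the polluted cases discussed in this thread.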
> Kind regards,
> 
> Rado
> 
> On 27.08.2021 20:16, Rado Smogura wrote:
>> Hi all,
>>
>>
>> I experimented a little, and I wonder whether this is reasonable. The
>> resulting graphs are as expected, and the operations look properly
>> ordered (but this is only my own opinion).
>>
>> https://github.com/rsmogura/panama-vector/commit/755b62823aaed0cddf78e8ccfc60c063bb40779a
>>
>>
>> Kind regards,
>>
>> Rado
>>
>> On 19.08.2021 22:26, Rado Smogura wrote:
>>> I think I answered this question quite simply... it will not work.
>>>
>>> On 19.08.2021 18:39, Rado Smogura wrote:
>>>> Hi all,
>>>>
>>>>
>>>> I hope you have a good day.
>>>>
>>>>
>>>> As optimizing loops would still be a good approach, I thought about
>>>> handling mixed access like this:
>>>>
>>>>
>>>> 1. When mixed access is detected, set a "raw / byte array" mixed-access
>>>> flag.
>>>>
>>>> 2. Bail out and restart compilation (this will happen during the first
>>>> phases, and only for a few methods).
>>>>
>>>> 3. Pass the flag to the compiler.
>>>>
>>>> 4. Modify find_alias_type / flatten_alias_type so that when a byte
>>>> array is queried for its alias, the raw ptr and raw alias are used.
>>>>
>>>>
>>>> Kind regards,
>>>>
>>>> Rado
>>>>
>>>> On 18.08.2021 09:17, Rado Smogura wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>>
>>>>> Thank you for the answer.
>>>>>
>>>>>
>>>>> In fact, it was an attempt to confirm that memory flow can be a
>>>>> reason why loop opts do not work. That's a very fair point. I'll think
>>>>> about it, and maybe I'll be able to come up with an idea for how this
>>>>> can be generalized.
>>>>>
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> Rado
>>>>>
>>>>> On 16.08.2021 15:41, Vladimir Ivanov wrote:
>>>>>>> I wonder what you think about something like this [1] - it's 
>>>>>>> virtually a small, single-class change
>>>>>>
>>>>>> Very interesting experiment, Rado! It's encouraging to hear that 
>>>>>> loop opts immediately benefit from it.
>>>>>>
>>>>>> From an architectural perspective, a separate pass to optimize 
>>>>>> the memory graph brings excessive complexity:
>>>>>>
>>>>>>   (1) yet another pass over the graph and susceptible to pass 
>>>>>> ordering issues;
>>>>>>
>>>>>>   (2) separate from GVN: you either have to duplicate GVN-based 
>>>>>> memory optimizations or run new pass with IGVN in a loop until it 
>>>>>> stabilizes.
>>>>>>
>>>>>> IMO the problem you noticed illustrates a general weakness in GVN 
>>>>>> implementation and that's the place where it should be fixed 
>>>>>> (ideally).
>>>>>>
>>>>>> Best regards,
>>>>>> Vladimir Ivanov
>>>>>>
>>>>>>>
>>>>>>> This change tries to find unique memory for a load node. I 
>>>>>>> implemented it as a separate phase, as the optimization may not run in 
>>>>>>> the Ideal method. I think it's lighter than the phi split-out.
>>>>>>>
>>>>>>> Loops have been transformed. RCE started.
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Rado
>>>>>>>
>>>>>>> [1] - https://github.com/rsmogura/panama-vector/commit/a44f515890d2c4df3fd0e0ced76545a7664926c3
>>>>>>>
>>>>>>> [2] - https://github.com/rsmogura/panama-vector/tree/housekeeping-load-memory-optimiziation
>>>>>>> (full test case)
>>>>>>>
>>>>>>> ------------------------------------------------------------------------ 
>>>>>>>
>>>>>>> *From:* Radosław Smogura on behalf of Radosław Smogura 
>>>>>>> <mail at smogura.eu>
>>>>>>> *Sent:* Friday, August 6, 2021 22:43
>>>>>>> *To:* Radosław Smogura <mail at smogura.eu>; Paul Sandoz 
>>>>>>> <paul.sandoz at oracle.com>; Vladimir Ivanov 
>>>>>>> <vladimir.x.ivanov at oracle.com>
>>>>>>> *Cc:* panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
>>>>>>> *Subject:* Re: Issues with loop unrolling: better pinned node
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Now that I have checked it again, it works as expected, and it's the 
>>>>>>> same code.
>>>>>>>
>>>>>>> In the draft code I check whether the buffer is direct by using type 
>>>>>>> checking to unswitch the loop, as unswitching over ByteBuffer.hb did 
>>>>>>> not work (the graph was quite similar). However, I thought that 
>>>>>>> this unswitch actually helped to build correct loops, and that any 
>>>>>>> kind of improvement around it would be rather for the purpose of 
>>>>>>> better-looking code.
>>>>>>>
>>>>>>> But it looks like sometimes (but only sometimes) the loop still 
>>>>>>> cannot be built correctly, or maybe the full optimization kicks 
>>>>>>> in very, very late.
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Rado
>>>>>>> ------------------------------------------------------------------------ 
>>>>>>>
>>>>>>> *From:* panama-dev <panama-dev-retn at openjdk.java.net> on behalf 
>>>>>>> of Radosław Smogura <mail at smogura.eu>
>>>>>>> *Sent:* Friday, August 6, 2021 20:22
>>>>>>> *To:* Paul Sandoz <paul.sandoz at oracle.com>
>>>>>>> *Cc:* panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
>>>>>>> *Subject:* Re: Issues with loop unrolling: better pinned node
>>>>>>> Yes,
>>>>>>>
>>>>>>> The normal case looks good. It's all about polluted cases [1].
>>>>>>>
>>>>>>> BR,
>>>>>>> Rado
>>>>>>>
>>>>>>> [1] https://github.com/openjdk/panama-vector/pull/109
>>>>>>>
>>>>>>>
>>>>>>> ________________________________
>>>>>>> From: Paul Sandoz <paul.sandoz at oracle.com>
>>>>>>> Sent: Friday, August 6, 2021 20:04
>>>>>>> To: Radosław Smogura <mail at smogura.eu>
>>>>>>> Cc: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
>>>>>>> Subject: Re: Issues with loop unrolling: better pinned node
>>>>>>>
>>>>>>> I am confused as to the case under test. In your initial email of 
>>>>>>> this thread were you also referring implicitly to polluted cases?
>>>>>>>
>>>>>>> Paul.
>>>>>>>
>>>>>>>> On Aug 6, 2021, at 10:56 AM, Radosław Smogura <mail at smogura.eu> 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Paul,
>>>>>>>>
>>>>>>>> There's a performance improvement, but I still can't unroll 
>>>>>>>> polluted cases (I cherry-picked loop unrolling). The graph still 
>>>>>>>> has a few nodes taking the buffer limit from a phi, and in the IR I 
>>>>>>>> don't see vector nodes cascading.
>>>>>>>>
>>>>>>>> make test TEST='micro:ByteBufferVectorAccess.p' MICRO="OPTIONS=-f 1 -prof perfasm -jvmArgsPrepend=-Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0" JOBS=12
>>>>>>>>
>>>>>>>> Benchmark                                     (size)  Mode  Cnt    Score   Error  Units
>>>>>>>> ByteBufferVectorAccess.pollutedBuffers2         1024  avgt   30   40.472 ± 1.055  ns/op
>>>>>>>> ByteBufferVectorAccess.pollutedBuffers2:·asm    1024  avgt            NaN            ---
>>>>>>>> ByteBufferVectorAccess.pollutedBuffers3         1024  avgt   30   79.251 ± 0.786  ns/op
>>>>>>>> ByteBufferVectorAccess.pollutedBuffers3:·asm    1024  avgt            NaN            ---
>>>>>>>> ByteBufferVectorAccess.pollutedBuffers4         1024  avgt   30   83.627 ± 2.140  ns/op
>>>>>>>> ByteBufferVectorAccess.pollutedBuffers4:·asm    1024  avgt            NaN            ---
>>>>>>>> ByteBufferVectorAccess.pollutedBuffers5         1024  avgt   30   85.561 ± 1.156  ns/op
>>>>>>>> ByteBufferVectorAccess.pollutedBuffers5:·asm    1024  avgt            NaN
>>>>>>>>
>>>>>>>> make test TEST='micro:ByteBufferVectorAccess.p' MICRO="OPTIONS=-f 1 -prof perfasm"
>>>>>>>>
>>>>>>>> Benchmark                                     (size)  Mode  Cnt    Score   Error  Units
>>>>>>>> ByteBufferVectorAccess.pollutedBuffers2         1024  avgt   10   49.326 ± 0.843  ns/op
>>>>>>>> ByteBufferVectorAccess.pollutedBuffers2:·asm    1024  avgt            NaN            ---
>>>>>>>> ByteBufferVectorAccess.pollutedBuffers3         1024  avgt   10  100.291 ± 1.271  ns/op
>>>>>>>> ByteBufferVectorAccess.pollutedBuffers3:·asm    1024  avgt            NaN            ---
>>>>>>>> ByteBufferVectorAccess.pollutedBuffers4         1024  avgt   10  101.494 ± 1.027  ns/op
>>>>>>>> ByteBufferVectorAccess.pollutedBuffers4:·asm    1024  avgt            NaN            ---
>>>>>>>> ByteBufferVectorAccess.pollutedBuffers5         1024  avgt   10   94.606 ± 1.522  ns/op
>>>>>>>> ByteBufferVectorAccess.pollutedBuffers5:·asm    1024  avgt            NaN
>>>>>>>>
>>>>>>>>
>>>>>>>> BR,
>>>>>>>> Rado
>>>>>>>> From: Paul Sandoz <paul.sandoz at oracle.com>
>>>>>>>> Sent: Friday, August 6, 2021 18:04
>>>>>>>> To: Radosław Smogura <mail at smogura.eu>
>>>>>>>> Cc: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
>>>>>>>> Subject: Re: Issues with loop unrolling: better pinned node
>>>>>>>>
>>>>>>>> Hi Rado,
>>>>>>>>
>>>>>>>> It’s good you are looking at the IR
>>>>>>>>
>>>>>>>> Out of curiosity, what happens if you turn off bounds checking [*]?
>>>>>>>>
>>>>>>>> Paul.
>>>>>>>>
>>>>>>>> [*]
>>>>>>>> -Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0
>>>>>>>>
>>>>>>>> > On Aug 6, 2021, at 8:39 AM, Radosław Smogura <mail at smogura.eu> wrote:
>>>>>>>> >
>>>>>>>> > Hi all,
>>>>>>>> >
>>>>>>>> > I've found that even if we get rid of barriers, the loop can't
>>>>>>>> > get unrolled, and unneeded code remains inside it.
>>>>>>>> >
>>>>>>>> > I've found this graph, and I wonder whether it's optimal. In
>>>>>>>> > particular, the Load of ByteBuffer index / hb comes from a phi; could
>>>>>>>> > it be attached to the initial memory?
>>>>>>>> >
>>>>>>>> > Here's a picture: https://drive.google.com/file/d/1G7ZN0xHOVIVHmZ_5TTIUdm3F30okAzvO/view?usp=sharing
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > And sample code
>>>>>>>> >
>>>>>>>> > protected void copyMemory(ByteBuffer in, ByteBuffer out) {
>>>>>>>> >  var limit = SPECIES.loopBound(in.limit());
>>>>>>>> >  for (int i=0; i < limit; i += SPECIES.vectorByteSize()) {
>>>>>>>> >    final var v = ByteVector.fromByteBuffer(SPECIES, in, i,
>>>>>>>> >        ByteOrder.nativeOrder());
>>>>>>>> >    v.intoByteBuffer(out, i, ByteOrder.nativeOrder());
>>>>>>>> >  }
>>>>>>>> > }
>>>>>>>> >
>>>>>>>> > Kind regards,
>>>>>>>> > Rado
>>>>>>>
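
As a side note on the copyMemory loop quoted above: loopBound rounds the limit down to a multiple of the vector length, so the main loop only ever touches whole vectors. The sketch below reproduces that arithmetic in plain Java so it runs without the incubator module. The class `LoopBoundSketch`, the explicit `step` parameter and the scalar tail loop are assumptions added for illustration; the quoted method has no tail handling.

```java
/** Plain-Java stand-in for the loop shape used with VectorSpecies.loopBound. */
public class LoopBoundSketch {
    /** Largest multiple of step that is <= length (assumes length >= 0, step > 0). */
    static int loopBound(int length, int step) {
        return length - (length % step);
    }

    /** Copies in step-sized chunks up to the loop bound, then a scalar tail. */
    static int copyInChunks(byte[] in, byte[] out, int step) {
        int bound = loopBound(in.length, step);
        int chunks = 0;
        for (int i = 0; i < bound; i += step) {
            System.arraycopy(in, i, out, i, step); // stands in for one vector load/store
            chunks++;
        }
        for (int i = bound; i < in.length; i++) {
            out[i] = in[i];                        // tail elements past the bound
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(loopBound(1030, 32)); // 1024
        System.out.println(loopBound(31, 32));   // 0
        byte[] in = new byte[1030];
        in[1029] = 7;
        byte[] out = new byte[1030];
        System.out.println(copyInChunks(in, out, 32)); // 32
        System.out.println(out[1029]);                 // 7
    }
}
```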