Issues with loop unrolling: better pinned node
Radosław Smogura
mail at smogura.eu
Fri Aug 6 17:56:06 UTC 2021
Hi Paul,
There's a performance improvement, but I still can't get the polluted cases unrolled (I cherry-picked the loop unrolling change). The graph still has a few nodes that take the buffer limit from a Phi, and in the IR I don't see the vector nodes cascading.
make test TEST='micro:ByteBufferVectorAccess.p' MICRO="OPTIONS=-f 1 -prof perfasm -jvmArgsPrepend=-Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0" JOBS=12
Benchmark                                     (size)  Mode  Cnt    Score   Error  Units
ByteBufferVectorAccess.pollutedBuffers2         1024  avgt   30   40.472 ± 1.055  ns/op
ByteBufferVectorAccess.pollutedBuffers2:·asm    1024  avgt           NaN          ---
ByteBufferVectorAccess.pollutedBuffers3         1024  avgt   30   79.251 ± 0.786  ns/op
ByteBufferVectorAccess.pollutedBuffers3:·asm    1024  avgt           NaN          ---
ByteBufferVectorAccess.pollutedBuffers4         1024  avgt   30   83.627 ± 2.140  ns/op
ByteBufferVectorAccess.pollutedBuffers4:·asm    1024  avgt           NaN          ---
ByteBufferVectorAccess.pollutedBuffers5         1024  avgt   30   85.561 ± 1.156  ns/op
ByteBufferVectorAccess.pollutedBuffers5:·asm    1024  avgt           NaN          ---
make test TEST='micro:ByteBufferVectorAccess.p' MICRO="OPTIONS=-f 1 -prof perfasm"
Benchmark                                     (size)  Mode  Cnt    Score   Error  Units
ByteBufferVectorAccess.pollutedBuffers2         1024  avgt   10   49.326 ± 0.843  ns/op
ByteBufferVectorAccess.pollutedBuffers2:·asm    1024  avgt           NaN          ---
ByteBufferVectorAccess.pollutedBuffers3         1024  avgt   10  100.291 ± 1.271  ns/op
ByteBufferVectorAccess.pollutedBuffers3:·asm    1024  avgt           NaN          ---
ByteBufferVectorAccess.pollutedBuffers4         1024  avgt   10  101.494 ± 1.027  ns/op
ByteBufferVectorAccess.pollutedBuffers4:·asm    1024  avgt           NaN          ---
ByteBufferVectorAccess.pollutedBuffers5         1024  avgt   10   94.606 ± 1.522  ns/op
ByteBufferVectorAccess.pollutedBuffers5:·asm    1024  avgt           NaN          ---
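For a rough sense of what the bounds checks cost, the relative difference between the two runs can be computed from the scores above. This is only a back-of-the-envelope sketch: the first run used -Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0 with Cnt 30, the second used the default settings with Cnt 10, so the error margins are not directly comparable. The class name OobCheckComparison is purely illustrative.

```java
public class OobCheckComparison {
    // Scores in ns/op (lower is better), copied from the two runs above.
    static final String[] NAMES = {
        "pollutedBuffers2", "pollutedBuffers3", "pollutedBuffers4", "pollutedBuffers5"
    };
    static final double[] OOB_CHECK_OFF = {40.472, 79.251, 83.627, 85.561};
    static final double[] DEFAULT_CHECKS = {49.326, 100.291, 101.494, 94.606};

    // Percentage of time per operation saved relative to the default run.
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        for (int k = 0; k < NAMES.length; k++) {
            System.out.printf("%s: %.1f%% faster with OOB checks off%n",
                    NAMES[k], reductionPercent(DEFAULT_CHECKS[k], OOB_CHECK_OFF[k]));
        }
    }
}
```

With these numbers, turning the checks off reduces the time per operation by roughly 18% (pollutedBuffers2), 21% (pollutedBuffers3), 18% (pollutedBuffers4) and 10% (pollutedBuffers5).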
BR,
Rado
________________________________
From: Paul Sandoz <paul.sandoz at oracle.com>
Sent: Friday, August 6, 2021 18:04
To: Radosław Smogura <mail at smogura.eu>
Cc: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
Subject: Re: Issues with loop unrolling: better pinned node
Hi Rado,
It’s good you are looking at the IR.
Out of curiosity, what happens if you turn off bounds checking [*]?
Paul.
[*]
-Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0
> On Aug 6, 2021, at 8:39 AM, Radosław Smogura <mail at smogura.eu> wrote:
>
> Hi all,
>
> I've found that even if we get rid of the barriers, the loop still can't be unrolled, and unneeded code remains inside it.
>
> I've found this graph, and I wonder whether it's optimal. In particular, the Load of the ByteBuffer index / hb takes its memory from a Phi; could it be attached to the initial memory instead?
>
> Here's a picture https://drive.google.com/file/d/1G7ZN0xHOVIVHmZ_5TTIUdm3F30okAzvO/view?usp=sharing
>
>
> And here's the sample code:
>
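> protected void copyMemory(ByteBuffer in, ByteBuffer out) {
>     // Largest multiple of the species length <= in.limit(); any trailing bytes are not copied.
>     var limit = SPECIES.loopBound(in.limit());
>     // For a byte species, vectorByteSize() equals the lane count, so the stride matches loopBound.
>     for (int i = 0; i < limit; i += SPECIES.vectorByteSize()) {
>         final var v = ByteVector.fromByteBuffer(SPECIES, in, i, ByteOrder.nativeOrder());
>         v.intoByteBuffer(out, i, ByteOrder.nativeOrder());
>     }
> }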
>
> Kind regards,
> Rado
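The copyMemory sample quoted above only iterates up to SPECIES.loopBound(in.limit()), which the Vector API documents as the largest multiple of the species length that is less than or equal to its argument, so when in.limit() is not a multiple of the vector size the trailing bytes are never copied. The following is a plain-Java scalar model of that loop shape, not Vector API code, under these assumptions: VLEN is a hypothetical stand-in for SPECIES.length() (equal to vectorByteSize() for a byte species), the inner loop stands in for one fromByteBuffer/intoByteBuffer pair, and a scalar tail loop finishes the copy. It also reads the limit once into a local before the loop, the source-level counterpart of the hoisting asked about above; whether C2 can do the same for the ByteBuffer fields is the open question in this thread.

```java
import java.nio.ByteBuffer;

public class CopyLoopModel {
    // Hypothetical lane count standing in for SPECIES.length(); for a byte
    // species this equals SPECIES.vectorByteSize(). Assumed to be a power of two.
    static final int VLEN = 32;

    // Same contract as VectorSpecies.loopBound(n): the largest multiple of VLEN
    // that is <= n (for n >= 0). Clearing the low bits works because VLEN is a power of two.
    static int loopBound(int n) {
        return n & ~(VLEN - 1);
    }

    // Scalar model of the vectorized copy. Assumes out.capacity() >= in.limit().
    static void copyMemory(ByteBuffer in, ByteBuffer out) {
        final int limit = in.limit();       // loop-invariant, read once before the loop
        final int bound = loopBound(limit);
        int i = 0;
        // Main loop: each outer iteration models one vector load/store of VLEN bytes.
        for (; i < bound; i += VLEN) {
            for (int j = 0; j < VLEN; j++) {
                out.put(i + j, in.get(i + j));
            }
        }
        // Tail loop: the remaining (limit - bound) bytes that the original sample skips.
        for (; i < limit; i++) {
            out.put(i, in.get(i));
        }
    }

    public static void main(String[] args) {
        ByteBuffer in = ByteBuffer.allocate(100);   // 100 is not a multiple of VLEN
        ByteBuffer out = ByteBuffer.allocate(100);
        for (int k = 0; k < in.limit(); k++) {
            in.put(k, (byte) k);
        }
        copyMemory(in, out);
        System.out.println("bound=" + loopBound(in.limit()) + ", equal=" + in.equals(out));
    }
}
```

Running main prints the bound (96 for a 100-byte buffer with VLEN = 32) and whether the two buffers match; without the tail loop, the last 4 bytes would stay zero.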