Foreign memory access hot loop benchmark
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Mon Jan 4 18:27:54 UTC 2021
What happens with longs? Do you still see the slowdown?
Maurizio
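
For reference, a long-based variant of the unrolled loop could look roughly like the sketch below. It is only a stand-in under stated assumptions: it uses an on-heap long[] and the standard MethodHandles.arrayElementVarHandle instead of the MemorySegment-based handle from the benchmark, so that it runs on a stock JDK. The class name, method name and SIZE value are hypothetical, and it says nothing about how the segment-based version performs.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class UnrolledLongSketch {
    // Hypothetical element count; a multiple of 4 so the unrolled loop needs no tail handling.
    public static final int SIZE = 1024;

    // Standard array-element VarHandle with coordinates (long[] array, int index).
    // Assumption: this stands in for the segment-based handle used in the benchmark.
    static final VarHandle LONGS = MethodHandles.arrayElementVarHandle(long[].class);

    // Adds input[i] into output[i], four elements per loop iteration.
    public static void unrolledAdd(long[] input, long[] output) {
        for (int i = 0; i < SIZE; i += 4) {
            LONGS.set(output, i,     (long) LONGS.get(input, i)     + (long) LONGS.get(output, i));
            LONGS.set(output, i + 1, (long) LONGS.get(input, i + 1) + (long) LONGS.get(output, i + 1));
            LONGS.set(output, i + 2, (long) LONGS.get(input, i + 2) + (long) LONGS.get(output, i + 2));
            LONGS.set(output, i + 3, (long) LONGS.get(input, i + 3) + (long) LONGS.get(output, i + 3));
        }
    }

    public static void main(String[] args) {
        long[] input = new long[SIZE];
        long[] output = new long[SIZE];
        for (int i = 0; i < SIZE; i++) {
            input[i] = i;
            output[i] = 1;
        }
        unrolledAdd(input, output);
        System.out.println(output[10]); // input[10] + 1
    }
}
```

To turn this into a comparable measurement, the body of unrolledAdd would go into a JMH @Benchmark method next to the existing double-based ones.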
On 04/01/2021 17:31, Antoine Chambille wrote:
> /(using fixed width font ;)/
>
>
> Thank you Maurizio, for looking into this.
>
> This is a good find. I've just updated and rebuilt the Panama JDK, and I
> confirm that the big slowdown with manually unrolled loops and memory
> handles has disappeared for the AddBenchmark.unrolledMHI_v2 benchmark.
> But it is apparently still present in one last case:
> AddBenchmark.unrolledMHI
>
> Maybe another missing annotation?
>
> Benchmark                         Mode  Cnt        Score        Error  Units
> AddBenchmark.scalarArray         thrpt    5  5270072.806 ±  43618.821  ops/s
> AddBenchmark.scalarArrayHandle   thrpt    5  5155791.142 ± 122147.967  ops/s
> AddBenchmark.scalarMHI           thrpt    5  2215595.625 ±  27044.786  ops/s
> AddBenchmark.scalarMHI_v2        thrpt    5  2165838.557 ±  48477.364  ops/s
> AddBenchmark.scalarUnsafe        thrpt    5  2057853.572 ±  21064.385  ops/s
> AddBenchmark.unrolledArray       thrpt    5  6346056.064 ± 304425.251  ops/s
> AddBenchmark.unrolledArrayHandle thrpt    5  1991324.025 ±  39434.066  ops/s
> AddBenchmark.unrolledMHI         thrpt    5   206541.946 ±   4031.057  ops/s
> AddBenchmark.unrolledMHI_v2      thrpt    5  2240957.905 ±  24239.357  ops/s
> AddBenchmark.unrolledUnsafe      thrpt    5  2185038.207 ±  27611.150  ops/s
>
>
> benchmark source code:
> https://github.com/chamb/panama-benchmarks/blob/master/memory/src/main/java/com/activeviam/test/AddBenchmark.java
>
>
> // CODE OF THE REMAINING SLOW BENCHMARK
> static final VarHandle MHI = MemoryLayout.ofSequence(SIZE, MemoryLayouts.JAVA_DOUBLE)
>         .varHandle(double.class, MemoryLayout.PathElement.sequenceElement());
>
> @Benchmark
> public void unrolledMHI(Data state) {
>     final MemorySegment is = state.inputSegment;
>     final MemorySegment os = state.outputSegment;
>
>     for (int i = 0; i < SIZE; i += 4) {
>         MHI.set(os, (long) (i),   (double) MHI.get(is, (long) (i))   + (double) MHI.get(os, (long) (i)));
>         MHI.set(os, (long) (i+1), (double) MHI.get(is, (long) (i+1)) + (double) MHI.get(os, (long) (i+1)));
>         MHI.set(os, (long) (i+2), (double) MHI.get(is, (long) (i+2)) + (double) MHI.get(os, (long) (i+2)));
>         MHI.set(os, (long) (i+3), (double) MHI.get(is, (long) (i+3)) + (double) MHI.get(os, (long) (i+3)));
>     }
> }
>
>
>
> Best,
> -Antoine
>
> On Mon, Jan 4, 2021 at 6:29 PM Antoine Chambille <ach at activeviam.com
> <mailto:ach at activeviam.com>> wrote:
>
> On Wed, Nov 25, 2020 at 1:42 PM Maurizio Cimadamore
> <maurizio.cimadamore at oracle.com
> <mailto:maurizio.cimadamore at oracle.com>> wrote:
>
> I did some investigation and, during the problematic benchmark, we were
> hitting some inlining thresholds, as evidenced by `-XX:+PrintInlining`:
>
> @ 92 jdk.incubator.foreign.MemoryAccess::getLongAtIndex (12 bytes) NodeCountInliningCutoff
> @ 96 jdk.incubator.foreign.MemoryAccess::setLongAtIndex (13 bytes) NodeCountInliningCutoff
> @ 111 jdk.incubator.foreign.MemoryAccess::getLongAtIndex (12 bytes) NodeCountInliningCutoff
> @ 120 jdk.incubator.foreign.MemoryAccess::getLongAtIndex (12 bytes) NodeCountInliningCutoff
> @ 124 jdk.incubator.foreign.MemoryAccess::setLongAtIndex (13 bytes) NodeCountInliningCutoff
>
> The problem is that the static accessors in MemoryAccess are lacking a
> @ForceInline annotation. This is being addressed here:
>
> https://github.com/openjdk/panama-foreign/pull/401
>
> Thanks
> Maurizio
>
>
> On 25/11/2020 11:51, Maurizio Cimadamore wrote:
> >
> > On 24/11/2020 11:19, Antoine Chambille wrote:
> >> If I look at the slow benchmark in detail, I observe that the first
> >> two warmups run at the expected speed, but then it slows down 20x.
> >> Very strange, it's almost as if some JIT optimization is suddenly
> >> turned off:
> >
> > This is something I've observed in the past as well, in some cases,
> > when playing with VH.
> >
> > We'll take a look.
> >
> > Thanks
> > Maurizio
> >
>
>
>