Foreign memory access hot loop benchmark

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Mon Jan 4 18:27:54 UTC 2021


What happens with longs? Do you still see the slowdown?
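
For comparison, a long-based counterpart of the unrolled loop would swap JAVA_DOUBLE/double for the long equivalents. Below is a minimal sketch of that shape under stated assumptions. It uses the stable MethodHandles.arrayElementVarHandle over a plain long[] in place of the incubating jdk.incubator.foreign segment VarHandle, so the index coordinate is an int rather than a long. The class name UnrolledLongAdd and the SIZE value are hypothetical. It only illustrates the code shape and says nothing about how the MemorySegment version performs.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Sketch only: the unrolledMHI loop rewritten for long elements.
// Assumption: an array-element VarHandle over long[] stands in for the
// jdk.incubator.foreign sequence-layout VarHandle, so the index is an int.
final class UnrolledLongAdd {
    // Hypothetical size; must be a multiple of 4 for the unrolled loop below.
    static final int SIZE = 1024;

    // Coordinates: (long[] array, int index); value type: long.
    static final VarHandle LONGS = MethodHandles.arrayElementVarHandle(long[].class);

    // out[i] = in[i] + out[i], manually unrolled by four like unrolledMHI.
    static void unrolledAdd(long[] in, long[] out) {
        for (int i = 0; i < SIZE; i += 4) {
            LONGS.set(out, i,     (long) LONGS.get(in, i)     + (long) LONGS.get(out, i));
            LONGS.set(out, i + 1, (long) LONGS.get(in, i + 1) + (long) LONGS.get(out, i + 1));
            LONGS.set(out, i + 2, (long) LONGS.get(in, i + 2) + (long) LONGS.get(out, i + 2));
            LONGS.set(out, i + 3, (long) LONGS.get(in, i + 3) + (long) LONGS.get(out, i + 3));
        }
    }

    public static void main(String[] args) {
        long[] in = new long[SIZE];
        long[] out = new long[SIZE];
        for (int i = 0; i < SIZE; i++) {
            in[i] = i;
            out[i] = 2L * i;
        }
        unrolledAdd(in, out);
        System.out.println(out[SIZE - 1]); // prints 3069 (1023 + 2046)
    }
}
```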

Maurizio

On 04/01/2021 17:31, Antoine Chambille wrote:
> /(using fixed width font ;)/
>
>
> Thank you, Maurizio, for looking into this.
>
> This is a good find. I've just updated and rebuilt the Panama JDK, and I
> confirm that the big slowdown with the manually unrolled loop and memory
> handles has disappeared for the AddBenchmark.unrolledMHI_v2 benchmark.
> But it is apparently still present in one last case:
> AddBenchmark.unrolledMHI.
>
> Maybe another missing annotation?
>
> Benchmark                          Mode  Cnt        Score        Error  Units
> AddBenchmark.scalarArray          thrpt    5  5270072.806 ±  43618.821  ops/s
> AddBenchmark.scalarArrayHandle    thrpt    5  5155791.142 ± 122147.967  ops/s
> AddBenchmark.scalarMHI            thrpt    5  2215595.625 ±  27044.786  ops/s
> AddBenchmark.scalarMHI_v2         thrpt    5  2165838.557 ±  48477.364  ops/s
> AddBenchmark.scalarUnsafe         thrpt    5  2057853.572 ±  21064.385  ops/s
> AddBenchmark.unrolledArray        thrpt    5  6346056.064 ± 304425.251  ops/s
> AddBenchmark.unrolledArrayHandle  thrpt    5  1991324.025 ±  39434.066  ops/s
> AddBenchmark.unrolledMHI          thrpt    5   206541.946 ±   4031.057  ops/s
> AddBenchmark.unrolledMHI_v2       thrpt    5  2240957.905 ±  24239.357  ops/s
> AddBenchmark.unrolledUnsafe       thrpt    5  2185038.207 ±  27611.150  ops/s
>
>
> benchmark source code:
> https://github.com/chamb/panama-benchmarks/blob/master/memory/src/main/java/com/activeviam/test/AddBenchmark.java 
>
>
> // CODE OF THE REMAINING SLOW BENCHMARK
> // (excerpt: SIZE and the Data state class are not shown here)
> static final VarHandle MHI = MemoryLayout.ofSequence(SIZE, MemoryLayouts.JAVA_DOUBLE)
>             .varHandle(double.class, MemoryLayout.PathElement.sequenceElement());
>
> @Benchmark
> public void unrolledMHI(Data state) {
>     final MemorySegment is = state.inputSegment;
>     final MemorySegment os = state.outputSegment;
>
>     // os[i] = is[i] + os[i], manually unrolled by 4, every access going through MHI
>     for(int i = 0; i < SIZE; i+=4) {
>         MHI.set(os, (long) (i),   (double) MHI.get(is, (long) (i))   + (double) MHI.get(os, (long) (i)));
>         MHI.set(os, (long) (i+1), (double) MHI.get(is, (long) (i+1)) + (double) MHI.get(os, (long) (i+1)));
>         MHI.set(os, (long) (i+2), (double) MHI.get(is, (long) (i+2)) + (double) MHI.get(os, (long) (i+2)));
>         MHI.set(os, (long) (i+3), (double) MHI.get(is, (long) (i+3)) + (double) MHI.get(os, (long) (i+3)));
>     }
> }
>
>
>
> Best,
> -Antoine
>
> On Mon, Jan 4, 2021 at 6:29 PM Antoine Chambille <ach at activeviam.com> wrote:
>
>     On Wed, Nov 25, 2020 at 1:42 PM Maurizio Cimadamore
>     <maurizio.cimadamore at oracle.com> wrote:
>
>         I did some investigation, and, during the problematic benchmark,
>         we were hitting some inlining thresholds, as evidenced by
>         `-XX:+PrintInlining`:
>
>         @ 92 jdk.incubator.foreign.MemoryAccess::getLongAtIndex (12 bytes)   NodeCountInliningCutoff
>         @ 96 jdk.incubator.foreign.MemoryAccess::setLongAtIndex (13 bytes)   NodeCountInliningCutoff
>         @ 111 jdk.incubator.foreign.MemoryAccess::getLongAtIndex (12 bytes)   NodeCountInliningCutoff
>         @ 120 jdk.incubator.foreign.MemoryAccess::getLongAtIndex (12 bytes)   NodeCountInliningCutoff
>         @ 124 jdk.incubator.foreign.MemoryAccess::setLongAtIndex (13 bytes)   NodeCountInliningCutoff
>
>         The problem is that the static accessors in MemoryAccess are
>         lacking a
>         @ForceInline annotation. This is being addressed here:
>
>         https://github.com/openjdk/panama-foreign/pull/401
>
>         Thanks
>         Maurizio
>
>
>         On 25/11/2020 11:51, Maurizio Cimadamore wrote:
>         >
>         > On 24/11/2020 11:19, Antoine Chambille wrote:
>         >> If I look at the slow benchmark in detail, I observe that the first
>         >> two warmups run at the expected speed, but then it slows down 20x.
>         >> Very strange, it's almost as if some JIT optimization is suddenly
>         >> turned off:
>         >
>         > This is something I've observed in the past as well, in some
>         > cases, when playing with VarHandles (VH).
>         >
>         > We'll take a look.
>         >
>         > Thanks
>         > Maurizio
>         >
>
>
>
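
The unrolledMHI loop quoted above assumes SIZE is a multiple of 4 and only exists in a form that needs the incubating foreign-memory API. Below is a minimal sketch for reproducing the unrolled-versus-scalar comparison without that API. It assumes a plain double[] accessed through MethodHandles.arrayElementVarHandle instead of a MemorySegment and the MHI handle. It adds a scalar tail loop for lengths that are not a multiple of 4 and checks the result against a plain array loop. The class and method names are hypothetical, and the sketch checks correctness only, not speed.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.Arrays;

// Sketch only: a 4x-unrolled VarHandle add with a scalar tail, checked
// against a plain loop. Assumption: an array-element VarHandle over double[]
// stands in for the MemorySegment handle MHI used in AddBenchmark.
final class UnrolledDoubleAdd {
    static final VarHandle DOUBLES = MethodHandles.arrayElementVarHandle(double[].class);

    // Reference version: out[i] += in[i] with plain array accesses.
    static void scalarAdd(double[] in, double[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] += in[i];
        }
    }

    // Unrolled by four; assumes in.length >= out.length.
    static void unrolledAdd(double[] in, double[] out) {
        final int n = out.length;
        final int limit = n - (n % 4); // exclusive end of the unrolled part
        int i = 0;
        for (; i < limit; i += 4) {
            DOUBLES.set(out, i,     (double) DOUBLES.get(in, i)     + (double) DOUBLES.get(out, i));
            DOUBLES.set(out, i + 1, (double) DOUBLES.get(in, i + 1) + (double) DOUBLES.get(out, i + 1));
            DOUBLES.set(out, i + 2, (double) DOUBLES.get(in, i + 2) + (double) DOUBLES.get(out, i + 2));
            DOUBLES.set(out, i + 3, (double) DOUBLES.get(in, i + 3) + (double) DOUBLES.get(out, i + 3));
        }
        // Scalar tail for the remaining 0 to 3 elements.
        for (; i < n; i++) {
            DOUBLES.set(out, i, (double) DOUBLES.get(in, i) + (double) DOUBLES.get(out, i));
        }
    }

    public static void main(String[] args) {
        int n = 1027; // deliberately not a multiple of 4
        double[] in = new double[n];
        double[] expected = new double[n];
        double[] actual = new double[n];
        for (int i = 0; i < n; i++) {
            in[i] = i * 0.5;
            expected[i] = i;
            actual[i] = i;
        }
        scalarAdd(in, expected);
        unrolledAdd(in, actual);
        System.out.println(Arrays.equals(expected, actual)); // prints true
    }
}
```

Because IEEE 754 addition is commutative, `in[i] + out[i]` and `out[i] + in[i]` give bit-identical results, so an exact `Arrays.equals` check is appropriate here.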

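The accessors reported with NodeCountInliningCutoff in the November message are small static wrappers around a VarHandle (12 and 13 bytes of bytecode). Below is a minimal sketch of that shape. It assumes a byte[] viewed through MethodHandles.byteArrayViewVarHandle in place of a MemorySegment. The getLongAtIndex/setLongAtIndex methods are hypothetical stand-ins that only borrow the MemoryAccess names, and they take an int index. The JDK's own @ForceInline (jdk.internal.vm.annotation.ForceInline) is JDK-internal and not usable from ordinary application code, so the sketch leaves it out. Each wrapper call in the loop is therefore left to the JIT's ordinary inlining heuristics.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Sketch only: static accessors wrapping a VarHandle, in the style of
// MemoryAccess::getLongAtIndex / setLongAtIndex. Hypothetical stand-ins:
// they operate on a byte[] (not a MemorySegment) and take an int index.
// No @ForceInline here: that annotation is JDK-internal.
final class LongAccessors {
    // Views a byte[] as native-order longs; coordinates are (byte[], int byteOffset).
    private static final VarHandle LONG_VIEW =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());

    // Reads the long at element index `index` (byte offset index * 8).
    static long getLongAtIndex(byte[] base, int index) {
        return (long) LONG_VIEW.get(base, index * Long.BYTES);
    }

    // Writes `value` at element index `index` (byte offset index * 8).
    static void setLongAtIndex(byte[] base, int index, long value) {
        LONG_VIEW.set(base, index * Long.BYTES, value);
    }

    // out[i] = in[i] + out[i]: three wrapper calls per element.
    static void add(byte[] in, byte[] out, int count) {
        for (int i = 0; i < count; i++) {
            setLongAtIndex(out, i, getLongAtIndex(in, i) + getLongAtIndex(out, i));
        }
    }

    public static void main(String[] args) {
        int count = 16;
        byte[] in = new byte[count * Long.BYTES];
        byte[] out = new byte[count * Long.BYTES];
        for (int i = 0; i < count; i++) {
            setLongAtIndex(in, i, i);
            setLongAtIndex(out, i, 100);
        }
        add(in, out, count);
        System.out.println(getLongAtIndex(out, 15)); // prints 115
    }
}
```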

More information about the panama-dev mailing list