Foreign memory access hot loop benchmark
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Tue Jan 5 16:50:50 UTC 2021
Thanks,
I'll take a look - my gut tells me that the method is simply too big when
using the VarHandle directly (something I've seen in other cases). Note
that the fact that we have @ForceInline on the MemoryAccess accessors
helps, since that tells HotSpot to always inline those accesses, no
matter the size of the enclosing method. I'm afraid here we're in a
situation where the benchmark method gets too big and no further inlining
happens (even though, had inlining progressed, we'd have ended up with a
_smaller_ compiled method overall).
I'll try to test this hypothesis. Stay tuned.
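As a rough sketch (assuming the JMH setup from the linked AddBenchmark
repository; the runner class below is illustrative, not part of the
benchmark project), the inlining decisions can be inspected by running
the problematic benchmark in a forked JVM with the PrintInlining
diagnostic enabled:

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class InliningCheck {
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include("AddBenchmark.unrolledMHI")   // the slow case from this thread
                .forks(1)
                .jvmArgsAppend("-XX:+UnlockDiagnosticVMOptions",
                               "-XX:+PrintInlining")   // dump C2 inlining decisions
                .build();
        new Runner(opt).run();
    }
}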
Cheers
Maurizio
On 05/01/2021 16:45, Antoine Chambille wrote:
> Yes, I see the same slowdown with longs as with doubles.
>
> -Antoine
>
>
>
> On Mon, Jan 4, 2021 at 7:33 PM Maurizio Cimadamore
> <maurizio.cimadamore at oracle.com> wrote:
>
> What happens with longs? Do you still see the slowdown?
>
> Maurizio
>
> On 04/01/2021 17:31, Antoine Chambille wrote:
>> (using fixed width font ;)
>>
>>
>> Thank you Maurizio, for looking into this.
>>
>> This is a good find. I've just updated and rebuilt the Panama JDK, and
>> I confirm that the big slowdown with the manually unrolled loop and
>> memory handles has disappeared for the AddBenchmark.unrolledMHI_v2
>> benchmark. But it is apparently still present in one last case:
>> AddBenchmark.unrolledMHI.
>>
>> Maybe another missing annotation?
>>
>> Benchmark                          Mode  Cnt        Score        Error  Units
>> AddBenchmark.scalarArray          thrpt    5  5270072.806 ±  43618.821  ops/s
>> AddBenchmark.scalarArrayHandle    thrpt    5  5155791.142 ± 122147.967  ops/s
>> AddBenchmark.scalarMHI            thrpt    5  2215595.625 ±  27044.786  ops/s
>> AddBenchmark.scalarMHI_v2         thrpt    5  2165838.557 ±  48477.364  ops/s
>> AddBenchmark.scalarUnsafe         thrpt    5  2057853.572 ±  21064.385  ops/s
>> AddBenchmark.unrolledArray        thrpt    5  6346056.064 ± 304425.251  ops/s
>> AddBenchmark.unrolledArrayHandle  thrpt    5  1991324.025 ±  39434.066  ops/s
>> AddBenchmark.unrolledMHI          thrpt    5   206541.946 ±   4031.057  ops/s
>> AddBenchmark.unrolledMHI_v2       thrpt    5  2240957.905 ±  24239.357  ops/s
>> AddBenchmark.unrolledUnsafe       thrpt    5  2185038.207 ±  27611.150  ops/s
>>
>>
>> benchmark source code:
>> https://github.com/chamb/panama-benchmarks/blob/master/memory/src/main/java/com/activeviam/test/AddBenchmark.java
>>
>>
>> // CODE OF THE REMAINING SLOW BENCHMARK
>> static final VarHandle MHI =
>>     MemoryLayout.ofSequence(SIZE, MemoryLayouts.JAVA_DOUBLE)
>>                 .varHandle(double.class, MemoryLayout.PathElement.sequenceElement());
>>
>> @Benchmark
>> public void unrolledMHI(Data state) {
>>     final MemorySegment is = state.inputSegment;
>>     final MemorySegment os = state.outputSegment;
>>
>>     for (int i = 0; i < SIZE; i += 4) {
>>         MHI.set(os, (long) (i),     (double) MHI.get(is, (long) (i))     + (double) MHI.get(os, (long) (i)));
>>         MHI.set(os, (long) (i + 1), (double) MHI.get(is, (long) (i + 1)) + (double) MHI.get(os, (long) (i + 1)));
>>         MHI.set(os, (long) (i + 2), (double) MHI.get(is, (long) (i + 2)) + (double) MHI.get(os, (long) (i + 2)));
>>         MHI.set(os, (long) (i + 3), (double) MHI.get(is, (long) (i + 3)) + (double) MHI.get(os, (long) (i + 3)));
>>     }
>> }
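>>
>> For comparison, the unrolledMHI_v2 variant above goes through the static
>> MemoryAccess accessors instead of the raw VarHandle. A rough sketch of
>> that shape (the actual benchmark code lives in the linked repository;
>> this is only an illustration):
>>
>> // Sketch only: same unrolled loop, but via the static accessors,
>> // which carry @ForceInline after the fix discussed below.
>> @Benchmark
>> public void unrolledMHI_v2(Data state) {
>>     final MemorySegment is = state.inputSegment;
>>     final MemorySegment os = state.outputSegment;
>>
>>     for (int i = 0; i < SIZE; i += 4) {
>>         MemoryAccess.setDoubleAtIndex(os, i,
>>             MemoryAccess.getDoubleAtIndex(is, i)     + MemoryAccess.getDoubleAtIndex(os, i));
>>         MemoryAccess.setDoubleAtIndex(os, i + 1,
>>             MemoryAccess.getDoubleAtIndex(is, i + 1) + MemoryAccess.getDoubleAtIndex(os, i + 1));
>>         MemoryAccess.setDoubleAtIndex(os, i + 2,
>>             MemoryAccess.getDoubleAtIndex(is, i + 2) + MemoryAccess.getDoubleAtIndex(os, i + 2));
>>         MemoryAccess.setDoubleAtIndex(os, i + 3,
>>             MemoryAccess.getDoubleAtIndex(is, i + 3) + MemoryAccess.getDoubleAtIndex(os, i + 3));
>>     }
>> }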
>>
>>
>>
>> Best,
>> -Antoine
>>
>>
>>
>>
>>
>>
>>
>>
>> On Wed, Nov 25, 2020 at 1:42 PM Maurizio Cimadamore
>> <maurizio.cimadamore at oracle.com> wrote:
>>
>> I did some investigation, and during the problematic benchmark we were
>> hitting some inlining thresholds, as evidenced by
>> `-XX:+PrintInlining`:
>>
>> @ 92   jdk.incubator.foreign.MemoryAccess::getLongAtIndex (12 bytes)   NodeCountInliningCutoff
>> @ 96   jdk.incubator.foreign.MemoryAccess::setLongAtIndex (13 bytes)   NodeCountInliningCutoff
>> @ 111  jdk.incubator.foreign.MemoryAccess::getLongAtIndex (12 bytes)   NodeCountInliningCutoff
>> @ 120  jdk.incubator.foreign.MemoryAccess::getLongAtIndex (12 bytes)   NodeCountInliningCutoff
>> @ 124  jdk.incubator.foreign.MemoryAccess::setLongAtIndex (13 bytes)   NodeCountInliningCutoff
>>
>> The problem is that the static accessors in MemoryAccess
>> are lacking a
>> @ForceInline annotation. This is being addressed here:
>>
>> https://github.com/openjdk/panama-foreign/pull/401
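>>
>> To illustrate what the fix amounts to (a sketch only -- the real
>> accessors live in jdk.incubator.foreign.MemoryAccess, and the class and
>> handle names below are made up for illustration), the static accessor
>> just needs to carry the JDK-internal @ForceInline annotation so that C2
>> inlines it regardless of the caller's node count:
>>
>> import java.lang.invoke.VarHandle;
>> import java.nio.ByteOrder;
>> import jdk.incubator.foreign.MemoryHandles;
>> import jdk.incubator.foreign.MemorySegment;
>> import jdk.internal.vm.annotation.ForceInline;   // JDK-internal annotation
>>
>> // Sketch: illustrative accessor, not the actual JDK source.
>> class AccessorSketch {
>>     static final VarHandle LONG_HANDLE =
>>         MemoryHandles.varHandle(long.class, ByteOrder.nativeOrder());
>>
>>     @ForceInline
>>     static long getLongAtOffset(MemorySegment segment, long offset) {
>>         // force-inlined even when the calling method is already large
>>         return (long) LONG_HANDLE.get(segment, offset);
>>     }
>> }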
>>
>> Thanks
>> Maurizio
>>
>>
>> On 25/11/2020 11:51, Maurizio Cimadamore wrote:
>> >
>> > On 24/11/2020 11:19, Antoine Chambille wrote:
>> >> If I look at the slow benchmark in detail, I observe
>> that the first
>> >> two warmups run at the expected speed, but then it
>> slows down 20x.
>> >> Very strange, it's almost as if some JIT optimization
>> is suddenly
>> >> turned off:
>> >
>> > This is something I've observed in the past as well, in
>> some cases,
>> > when playing with VH.
>> >
>> > We'll take a look.
>> >
>> > Thanks
>> > Maurizio
>> >
>>
>>
>>
>