Foreign memory access hot loop benchmark
Antoine Chambille
ach at activeviam.com
Mon Jan 4 17:29:39 UTC 2021
Thank you, Maurizio, for looking into this.
This is a good find. I've just updated and rebuilt the panama JDK, and I
confirm that the big slowdown with the manually unrolled loop and memory
handles has disappeared for the AddBenchmark.unrolledMHI_v2 benchmark.
However, it is apparently still present in one last case,
AddBenchmark.unrolledMHI, which still scores about 11x lower than
unrolledMHI_v2. Maybe another missing annotation?
Benchmark                          Mode  Cnt        Score        Error  Units
AddBenchmark.scalarArray          thrpt    5  5270072.806 ±  43618.821  ops/s
AddBenchmark.scalarArrayHandle    thrpt    5  5155791.142 ± 122147.967  ops/s
AddBenchmark.scalarMHI            thrpt    5  2215595.625 ±  27044.786  ops/s
AddBenchmark.scalarMHI_v2         thrpt    5  2165838.557 ±  48477.364  ops/s
AddBenchmark.scalarUnsafe         thrpt    5  2057853.572 ±  21064.385  ops/s
AddBenchmark.unrolledArray        thrpt    5  6346056.064 ± 304425.251  ops/s
AddBenchmark.unrolledArrayHandle  thrpt    5  1991324.025 ±  39434.066  ops/s
AddBenchmark.unrolledMHI          thrpt    5   206541.946 ±   4031.057  ops/s
AddBenchmark.unrolledMHI_v2       thrpt    5  2240957.905 ±  24239.357  ops/s
AddBenchmark.unrolledUnsafe       thrpt    5  2185038.207 ±  27611.150  ops/s
benchmark source code:
https://github.com/chamb/panama-benchmarks/blob/master/memory/src/main/java/com/activeviam/test/AddBenchmark.java
// CODE OF THE REMAINING SLOW BENCHMARK
static final VarHandle MHI = MemoryLayout.ofSequence(SIZE, MemoryLayouts.JAVA_DOUBLE)
        .varHandle(double.class, MemoryLayout.PathElement.sequenceElement());

@Benchmark
public void unrolledMHI(Data state) {
    final MemorySegment is = state.inputSegment;
    final MemorySegment os = state.outputSegment;
    for (int i = 0; i < SIZE; i += 4) {
        MHI.set(os, (long) (i),
                (double) MHI.get(is, (long) (i)) + (double) MHI.get(os, (long) (i)));
        MHI.set(os, (long) (i+1),
                (double) MHI.get(is, (long) (i+1)) + (double) MHI.get(os, (long) (i+1)));
        MHI.set(os, (long) (i+2),
                (double) MHI.get(is, (long) (i+2)) + (double) MHI.get(os, (long) (i+2)));
        MHI.set(os, (long) (i+3),
                (double) MHI.get(is, (long) (i+3)) + (double) MHI.get(os, (long) (i+3)));
    }
}
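
For anyone who wants to try the same access pattern without a panama
build, here is a minimal sketch that uses a standard array VarHandle
(MethodHandles.arrayElementVarHandle) in place of the MemorySegment handle.
It is not the benchmark code: the class name UnrolledAddSketch, the method
names, and the tail loop for lengths that are not a multiple of 4 are
assumptions made for illustration. It only checks that the unrolled loop
computes the same result as the scalar one, and it says nothing about how
the JIT treats the memory segment case.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.Arrays;

// Hypothetical stand-alone class, not part of the benchmark repository.
public class UnrolledAddSketch {

    // Stand-in for the segment handle: a VarHandle over double[] elements.
    // Coordinates are (double[] array, int index).
    static final VarHandle AH = MethodHandles.arrayElementVarHandle(double[].class);

    // Reference version: out[i] += in[i], one element per iteration.
    static void addScalar(double[] in, double[] out) {
        for (int i = 0; i < out.length; i++) {
            AH.set(out, i, (double) AH.get(in, i) + (double) AH.get(out, i));
        }
    }

    // Manually unrolled by 4, same shape as unrolledMHI above.
    // The tail loop is an addition of this sketch; the benchmark assumes
    // that SIZE is a multiple of 4.
    static void addUnrolled(double[] in, double[] out) {
        int i = 0;
        final int limit = out.length - 3;
        for (; i < limit; i += 4) {
            AH.set(out, i,     (double) AH.get(in, i)     + (double) AH.get(out, i));
            AH.set(out, i + 1, (double) AH.get(in, i + 1) + (double) AH.get(out, i + 1));
            AH.set(out, i + 2, (double) AH.get(in, i + 2) + (double) AH.get(out, i + 2));
            AH.set(out, i + 3, (double) AH.get(in, i + 3) + (double) AH.get(out, i + 3));
        }
        for (; i < out.length; i++) {
            AH.set(out, i, (double) AH.get(in, i) + (double) AH.get(out, i));
        }
    }

    public static void main(String[] args) {
        double[] in = new double[1024];
        double[] a = new double[1024];
        double[] b = new double[1024];
        for (int i = 0; i < in.length; i++) {
            in[i] = i;
            a[i] = 2.0 * i;
            b[i] = 2.0 * i;
        }
        addScalar(in, a);
        addUnrolled(in, b);
        System.out.println("results match: " + Arrays.equals(a, b));
    }
}
```

To measure anything meaningful the two methods would still need to be
wrapped in JMH benchmarks, as in the linked AddBenchmark source.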
Best,
-Antoine
On Wed, Nov 25, 2020 at 1:42 PM Maurizio Cimadamore <
maurizio.cimadamore at oracle.com> wrote:
> I did some investigation, and, during the problematic benchmark, we were
> hitting some inlining thresholds, as evidenced by `-XX:+PrintInlining`:
>
> @ 92  jdk.incubator.foreign.MemoryAccess::getLongAtIndex (12 bytes)   NodeCountInliningCutoff
> @ 96  jdk.incubator.foreign.MemoryAccess::setLongAtIndex (13 bytes)   NodeCountInliningCutoff
> @ 111 jdk.incubator.foreign.MemoryAccess::getLongAtIndex (12 bytes)   NodeCountInliningCutoff
> @ 120 jdk.incubator.foreign.MemoryAccess::getLongAtIndex (12 bytes)   NodeCountInliningCutoff
> @ 124 jdk.incubator.foreign.MemoryAccess::setLongAtIndex (13 bytes)   NodeCountInliningCutoff
>
> The problem is that the static accessors in MemoryAccess are lacking a
> @ForceInline annotation. This is being addressed here:
>
> https://github.com/openjdk/panama-foreign/pull/401
>
> Thanks
> Maurizio
>
>
> On 25/11/2020 11:51, Maurizio Cimadamore wrote:
> >
> > On 24/11/2020 11:19, Antoine Chambille wrote:
> >> If I look at the slow benchmark in detail, I observe that the first
> >> two warmups run at the expected speed, but then it slows down 20x.
> >> Very strange, it's almost as if some JIT optimization is suddenly
> >> turned off:
> >
> > This is something I've observed in the past as well, in some cases,
> > when playing with VH.
> >
> > We'll take a look.
> >
> > Thanks
> > Maurizio
> >
>