Foreign memory access hot loop benchmark

Antoine Chambille ach at activeviam.com
Mon Jan 4 17:31:55 UTC 2021


*(using fixed width font ;)*


Thank you, Maurizio, for looking into this.

This is a good find. I've just updated and rebuilt the Panama JDK, and I
can confirm that the big slowdown with the manually unrolled loop and
memory handles has disappeared for the AddBenchmark.unrolledMHI_v2
benchmark. However, it apparently still occurs in one last case:
AddBenchmark.unrolledMHI

Maybe another missing annotation?

Benchmark                            Mode  Cnt        Score        Error  Units
AddBenchmark.scalarArray            thrpt    5  5270072.806 ±  43618.821  ops/s
AddBenchmark.scalarArrayHandle      thrpt    5  5155791.142 ± 122147.967  ops/s
AddBenchmark.scalarMHI              thrpt    5  2215595.625 ±  27044.786  ops/s
AddBenchmark.scalarMHI_v2           thrpt    5  2165838.557 ±  48477.364  ops/s
AddBenchmark.scalarUnsafe           thrpt    5  2057853.572 ±  21064.385  ops/s
AddBenchmark.unrolledArray          thrpt    5  6346056.064 ± 304425.251  ops/s
AddBenchmark.unrolledArrayHandle    thrpt    5  1991324.025 ±  39434.066  ops/s
AddBenchmark.unrolledMHI            thrpt    5   206541.946 ±   4031.057  ops/s
AddBenchmark.unrolledMHI_v2         thrpt    5  2240957.905 ±  24239.357  ops/s
AddBenchmark.unrolledUnsafe         thrpt    5  2185038.207 ±  27611.150  ops/s


benchmark source code:
https://github.com/chamb/panama-benchmarks/blob/master/memory/src/main/java/com/activeviam/test/AddBenchmark.java


// CODE OF THE REMAINING SLOW BENCHMARK
static final VarHandle MHI = MemoryLayout.ofSequence(SIZE, MemoryLayouts.JAVA_DOUBLE)
        .varHandle(double.class, MemoryLayout.PathElement.sequenceElement());

@Benchmark
public void unrolledMHI(Data state) {
    final MemorySegment is = state.inputSegment;
    final MemorySegment os = state.outputSegment;

    for (int i = 0; i < SIZE; i += 4) {
        MHI.set(os, (long) (i),   (double) MHI.get(is, (long) (i))   + (double) MHI.get(os, (long) (i)));
        MHI.set(os, (long) (i+1), (double) MHI.get(is, (long) (i+1)) + (double) MHI.get(os, (long) (i+1)));
        MHI.set(os, (long) (i+2), (double) MHI.get(is, (long) (i+2)) + (double) MHI.get(os, (long) (i+2)));
        MHI.set(os, (long) (i+3), (double) MHI.get(is, (long) (i+3)) + (double) MHI.get(os, (long) (i+3)));
    }
}
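
For comparison, here is a minimal sketch of the same 4x unrolling pattern
on plain double arrays, checked against a scalar loop. It is not copied
from the benchmark repository: the class name, the SIZE value and the test
data are assumptions made for illustration, and SIZE is assumed to be a
multiple of 4.

```java
import java.util.Arrays;

public class UnrolledAddSketch {
    // Hypothetical size (an assumption); must be a multiple of 4
    // for the unrolled loop below to stay in bounds.
    static final int SIZE = 1024;

    // Scalar reference: output[i] += input[i]
    static void scalarAdd(double[] input, double[] output) {
        for (int i = 0; i < SIZE; i++) {
            output[i] += input[i];
        }
    }

    // Same computation, manually unrolled four times per iteration
    static void unrolledAdd(double[] input, double[] output) {
        for (int i = 0; i < SIZE; i += 4) {
            output[i]     += input[i];
            output[i + 1] += input[i + 1];
            output[i + 2] += input[i + 2];
            output[i + 3] += input[i + 3];
        }
    }

    public static void main(String[] args) {
        double[] input = new double[SIZE];
        double[] a = new double[SIZE];
        double[] b = new double[SIZE];
        for (int i = 0; i < SIZE; i++) {
            input[i] = i;
            a[i] = 2.0 * i;
            b[i] = 2.0 * i;
        }
        scalarAdd(input, a);
        unrolledAdd(input, b);
        // Both variants perform the same additions in the same order
        // per element, so the results should be identical.
        System.out.println(Arrays.equals(a, b));
    }
}
```

Both loops do the same work per element, so any large throughput gap
between a scalar and an unrolled variant points at the compiler rather
than at the arithmetic.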



Best,
-Antoine







On Mon, Jan 4, 2021 at 6:29 PM Antoine Chambille <ach at activeviam.com> wrote:

>
> On Wed, Nov 25, 2020 at 1:42 PM Maurizio Cimadamore <
> maurizio.cimadamore at oracle.com> wrote:
>
>> I did some investigation and, during the problematic benchmark, we were
>> hitting some inlining thresholds, as evidenced by `-XX:+PrintInlining`:
>>
>> @ 92   jdk.incubator.foreign.MemoryAccess::getLongAtIndex (12 bytes)   NodeCountInliningCutoff
>> @ 96   jdk.incubator.foreign.MemoryAccess::setLongAtIndex (13 bytes)   NodeCountInliningCutoff
>> @ 111   jdk.incubator.foreign.MemoryAccess::getLongAtIndex (12 bytes)   NodeCountInliningCutoff
>> @ 120   jdk.incubator.foreign.MemoryAccess::getLongAtIndex (12 bytes)   NodeCountInliningCutoff
>> @ 124   jdk.incubator.foreign.MemoryAccess::setLongAtIndex (13 bytes)   NodeCountInliningCutoff
>>
>> The problem is that the static accessors in MemoryAccess are lacking a
>> @ForceInline annotation. This is being addressed here:
>>
>> https://github.com/openjdk/panama-foreign/pull/401
>>
>> Thanks
>> Maurizio
>>
>>
>> On 25/11/2020 11:51, Maurizio Cimadamore wrote:
>> >
>> > On 24/11/2020 11:19, Antoine Chambille wrote:
>> >> If I look at the slow benchmark in detail, I observe that the first
>> >> two warmups run at the expected speed, but then it slows down 20x.
>> >> Very strange, it's almost as if some JIT optimization is suddenly
>> >> turned off:
>> >
>> > This is something I've observed in the past as well, in some cases,
>> > when playing with VH.
>> >
>> > We'll take a look.
>> >
>> > Thanks
>> > Maurizio
>> >
>>
>
>
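
To check whether the remaining slow case hits the same cutoff, the inlining
log quoted above can be filtered mechanically. The following is only a
sketch: the class and method names are hypothetical, and it assumes each
log entry sits on a single line of the shape
"@ <bci>   <class>::<method> (<n> bytes)   <reason>", which is how the
quoted entries read once unwrapped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InliningCutoffFilter {
    // Assumed entry shape: "@ <bci>   <class>::<method> (<n> bytes)   <reason>"
    private static final Pattern ENTRY =
        Pattern.compile("@\\s+(\\d+)\\s+(\\S+)\\s+\\((\\d+) bytes\\)\\s+(.+)");

    // Returns "<method> @ bci <n>" for every entry whose reason matches exactly.
    static List<String> methodsWithReason(List<String> logLines, String reason) {
        List<String> result = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = ENTRY.matcher(line);
            // find() rather than matches(), so leading indentation
            // or quote markers in front of '@' are tolerated.
            if (m.find() && m.group(4).trim().equals(reason)) {
                result.add(m.group(2) + " @ bci " + m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two entries taken from the log quoted in this thread.
        List<String> log = List.of(
            "@ 92   jdk.incubator.foreign.MemoryAccess::getLongAtIndex (12 bytes)   NodeCountInliningCutoff",
            "@ 96   jdk.incubator.foreign.MemoryAccess::setLongAtIndex (13 bytes)   NodeCountInliningCutoff");
        methodsWithReason(log, "NodeCountInliningCutoff").forEach(System.out::println);
    }
}
```

Any method this reports for the unrolledMHI run would be a candidate for
the same kind of missing annotation.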

