Foreign memory access hot loop benchmark
Antoine Chambille
ach at activeviam.com
Fri Jan 8 13:28:01 UTC 2021
Hi Maurizio, thanks for investigating and taking the time to explain.
Using the 8-bit alignment "trick" I reproduce what you described and all
benchmarks are now in the same ballpark. Performance with memory
handles is now similar to Unsafe. Very nice.
Benchmark                          Mode  Cnt        Score       Error  Units
AddBenchmark.scalarArray          thrpt    5  3670444.769 ±  2530.711  ops/s
AddBenchmark.scalarArrayHandle    thrpt    5  3646632.374 ±  5986.572  ops/s
AddBenchmark.scalarUnsafe         thrpt    5  1478253.636 ±  7656.381  ops/s
AddBenchmark.scalarMHI            thrpt    5  1598951.179 ± 29155.440  ops/s
AddBenchmark.scalarMHI_v2         thrpt    5  1601492.148 ±  9539.575  ops/s
AddBenchmark.unrolledArray        thrpt    5  5388372.121 ± 12996.095  ops/s
AddBenchmark.unrolledArrayHandle  thrpt    5  1387713.576 ±  2794.709  ops/s
AddBenchmark.unrolledUnsafe       thrpt    5  1527765.861 ±  3157.150  ops/s
AddBenchmark.unrolledMHI          thrpt    5  1586333.615 ±  5012.909  ops/s
AddBenchmark.unrolledMHI_long     thrpt    5  1373930.743 ±  3170.683  ops/s
AddBenchmark.unrolledMHI_v2       thrpt    5  1588131.924 ±  6045.960  ops/s
AddBenchmark.unrolledMHI_v2_long  thrpt    5  1373246.663 ± 18637.544  ops/s
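
For anyone reading the archive later, here is a minimal sketch of why the
alignment "trick" takes work out of the loop. It is an illustration only,
not the JDK implementation: the class AlignmentCheckSketch, the helper
isAligned and the address 0x1004 are made up for this example. The
assumption is that an alignment check amounts to a mask test on the
address, so a layout declared with 8-bit (1-byte) alignment accepts every
address and leaves nothing to check.

```java
class AlignmentCheckSketch {

    // Hypothetical helper: true when 'address' is a multiple of the
    // alignment. The alignment is given in bits (as in withBitAlignment)
    // and is assumed to be a power of two that is at least 8.
    static boolean isAligned(long address, long bitAlignment) {
        long byteAlignment = bitAlignment / 8;   // 64 bits -> 8 bytes, 8 bits -> 1 byte
        return (address & (byteAlignment - 1)) == 0;
    }

    public static void main(String[] args) {
        long address = 0x1004L; // made-up address: 4-byte aligned, not 8-byte aligned

        // Natural alignment of a long (64 bits): the mask is 7, so the test can fail.
        System.out.println(isAligned(address, 64)); // prints false

        // 8-bit alignment: the mask is 0, so the test is true for every address.
        System.out.println(isAligned(address, 8));  // prints true
    }
}
```

With a mask of 0 the test is constant, which is consistent with the
unrolledMHI_long numbers falling in line with the MemoryAccess variants.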
The next significant improvements will probably come from vectorization,
automatic vectorization (that apparently only works with arrays for now)
and the Vector API when it supports segments. Can't wait!
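
To make the array case concrete, here is a rough sketch of the two loop
shapes being compared. The benchmark source is not part of this message,
so AddLoopSketch, scalarAdd and unrolledAdd are hypothetical stand-ins and
not the actual AddBenchmark kernels; the unroll factor of 4 is also an
assumption.

```java
class AddLoopSketch {

    // Plain counted loop over arrays: one element per iteration.
    static void scalarAdd(double[] a, double[] b, double[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] + b[i];
        }
    }

    // Same computation, manually unrolled by 4 with a scalar tail loop
    // for lengths that are not a multiple of 4.
    static void unrolledAdd(double[] a, double[] b, double[] out) {
        int n = out.length;
        int i = 0;
        for (; i + 3 < n; i += 4) {
            out[i]     = a[i]     + b[i];
            out[i + 1] = a[i + 1] + b[i + 1];
            out[i + 2] = a[i + 2] + b[i + 2];
            out[i + 3] = a[i + 3] + b[i + 3];
        }
        for (; i < n; i++) {
            out[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4, 5, 6, 7};
        double[] b = {10, 20, 30, 40, 50, 60, 70};
        double[] scalar = new double[a.length];
        double[] unrolled = new double[a.length];

        scalarAdd(a, b, scalar);
        unrolledAdd(a, b, unrolled);

        // Both variants must compute the same sums.
        System.out.println(java.util.Arrays.equals(scalar, unrolled)); // prints true
    }
}
```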
-Antoine
On Tue, Jan 5, 2021 at 7:52 PM Maurizio Cimadamore <
maurizio.cimadamore at oracle.com> wrote:
> Good news, it wasn't as nasty as anticipated.
>
> It seems like your benchmark was accidentally comparing pears with
> apples - in the sense that the VarHandle created in your benchmark was
> checking alignment, while the ones we have in MemoryAccess do not.
>
> This is what I get with your code:
>
> Benchmark                         Mode  Cnt  Score   Error  Units
> AddBenchmark.unrolledMHI_long     avgt   30  2.947 ± 0.029  us/op
> AddBenchmark.unrolledMHI_v2_long  avgt   30  0.341 ± 0.004  us/op
> AddBenchmark.unrolledUnsafe       avgt   30  0.251 ± 0.002  us/op
>
> But if the var handle is created as follows:
>
> static final VarHandle MHI_L = MemoryLayout.ofSequence(SIZE,
>         MemoryLayouts.JAVA_LONG.withBitAlignment(8))
>     .varHandle(long.class, MemoryLayout.PathElement.sequenceElement());
>
> Then the numbers I get are much better:
>
> Benchmark                         Mode  Cnt  Score   Error  Units
> AddBenchmark.unrolledMHI_long     avgt   30  0.339 ± 0.005  us/op
> AddBenchmark.unrolledMHI_v2_long  avgt   30  0.341 ± 0.004  us/op
> AddBenchmark.unrolledUnsafe       avgt   30  0.256 ± 0.002  us/op
>
> We know we have issues when it comes to hoisting the alignment check out
> of loops (Vlad, do you happen to have a JBS issue for this?) - we have
> some workarounds in place which work for simple loops, but fail to work
> in more complex code like yours.
>
> Eventually, the upcoming improvements for long loop optimizations will
> hopefully render many of these edge cases obsolete.
>
> On 05/01/2021 16:50, Maurizio Cimadamore wrote:
> > Thanks,
> > I'll take a look - my gut tells me that the method is just too big
> > when using VH directly (something I've seen in other cases). Note that
> > the fact that we have @ForceInline on the MemoryAccess accessors
> > helps, since that will tell hotspot to always inline those accesses, no
> > matter the size of the enclosing method. I'm afraid here we're in a
> > situation where the benchmark method gets too big and no further
> > inlining happens (even though, if we progressed with inlining we'd end
> > up with a _smaller_ compiled method overall).
> >
> > I'll try to test this hypothesis. Stay tuned.
> >
> > Cheers
> > Maurizio
> >
> >
> > On 05/01/2021 16:45, Antoine Chambille wrote:
> >> Yes, I see the same slowdown with longs as with doubles.
> >>
> >> -Antoine
> >>
> >>
> >>
> >> On Mon, Jan 4, 2021 at 7:33 PM Maurizio Cimadamore
> >> <maurizio.cimadamore at oracle.com> wrote:
> >>
> >> What happens with longs? Do you still see the slowdown?
> >>
> >> Maurizio
> >>
>