Foreign memory access hot loop benchmark

Fri Jan 8 13:28:01 UTC 2021

Hi Maurizio, thanks for investigating and taking the time to explain.

Using the 8-bit alignment "trick", I can reproduce what you described, and
all benchmarks are now in the same ballpark. Performance with memory
handles is also similar to Unsafe. Very nice.

Benchmark                              Mode  Cnt        Score       Error  Units
AddBenchmark.scalarArray              thrpt    5  3670444.769 ±  2530.711  ops/s
AddBenchmark.scalarArrayHandle        thrpt    5  3646632.374 ±  5986.572  ops/s
AddBenchmark.scalarUnsafe             thrpt    5  1478253.636 ±  7656.381  ops/s
AddBenchmark.scalarMHI                thrpt    5  1598951.179 ± 29155.440  ops/s
AddBenchmark.scalarMHI_v2             thrpt    5  1601492.148 ±  9539.575  ops/s
AddBenchmark.unrolledArray            thrpt    5  5388372.121 ± 12996.095  ops/s
AddBenchmark.unrolledArrayHandle      thrpt    5  1387713.576 ±  2794.709  ops/s
AddBenchmark.unrolledUnsafe           thrpt    5  1527765.861 ±  3157.150  ops/s
AddBenchmark.unrolledMHI              thrpt    5  1586333.615 ±  5012.909  ops/s
AddBenchmark.unrolledMHI_long         thrpt    5  1373930.743 ±  3170.683  ops/s
AddBenchmark.unrolledMHI_v2           thrpt    5  1588131.924 ±  6045.960  ops/s
AddBenchmark.unrolledMHI_v2_long      thrpt    5  1373246.663 ± 18637.544  ops/s
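
For readers following the thread, here is a rough sketch of why a per-access
alignment check can matter in a hot loop. It is plain Java over a ByteBuffer,
not the actual VarHandle or MemoryAccess implementation; the class and method
names (AlignmentCheckSketch, isAligned, sumChecked, sumHoisted, sumUnchecked)
are made up for illustration, and buffer offsets stand in for segment
addresses.

```java
import java.nio.ByteBuffer;

// Illustrative only: these names are hypothetical, not part of any JDK API.
class AlignmentCheckSketch {

    // True when offset is a multiple of alignmentBytes (assumed to be a power of two).
    static boolean isAligned(long offset, long alignmentBytes) {
        return (offset & (alignmentBytes - 1)) == 0;
    }

    // Alignment is tested on every access, inside the loop body.
    static long sumChecked(ByteBuffer buf, int base, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            int offset = base + i * Long.BYTES;
            if (!isAligned(offset, Long.BYTES)) {
                throw new IllegalStateException("Misaligned access at offset " + offset);
            }
            sum += buf.getLong(offset);
        }
        return sum;
    }

    // The check is hoisted by hand: the stride is a multiple of the alignment,
    // so testing the base offset once covers every iteration.
    static long sumHoisted(ByteBuffer buf, int base, int count) {
        if (!isAligned(base, Long.BYTES)) {
            throw new IllegalStateException("Misaligned base offset " + base);
        }
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += buf.getLong(base + i * Long.BYTES);
        }
        return sum;
    }

    // No alignment check at all, loosely analogous to a layout with 8-bit alignment.
    static long sumUnchecked(ByteBuffer buf, int base, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += buf.getLong(base + i * Long.BYTES);
        }
        return sum;
    }

    public static void main(String[] args) {
        // One leading byte so the longs start at the misaligned offset 1.
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 * Long.BYTES);
        for (int i = 0; i < 4; i++) {
            buf.putLong(1 + i * Long.BYTES, i + 1L);
        }
        System.out.println("unchecked sum = " + sumUnchecked(buf, 1, 4));
        try {
            sumChecked(buf, 1, 4);
        } catch (IllegalStateException e) {
            System.out.println("checked access rejected: " + e.getMessage());
        }
    }
}
```

In the benchmark the JIT is expected to do the hoisting on its own; the
hand-hoisted variant only shows what the optimized loop would look like.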

The next significant improvements will probably come from vectorization:
both automatic vectorization (which apparently only works with arrays for
now) and the Vector API once it supports segments. Can't wait!
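
As a rough illustration of the loop shapes involved, the sketch below shows a
simple element-wise add over arrays in a scalar and a hand-unrolled form. It
assumes the benchmark adds two double arrays element by element; the real
benchmark source is not reproduced here, and the class and method names
(AddLoopSketch, scalarAdd, unrolledAdd) are hypothetical. Whether the JIT
actually vectorizes either loop depends on the JVM version, flags, and
hardware.

```java
// Illustrative only: not the AddBenchmark source.
class AddLoopSketch {

    // Plain counted loop over arrays, the kind of shape that is a
    // candidate for automatic vectorization by the JIT.
    static void scalarAdd(double[] a, double[] b, double[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] + b[i];
        }
    }

    // Same computation, manually unrolled by 4 with a tail loop
    // for lengths that are not a multiple of 4.
    static void unrolledAdd(double[] a, double[] b, double[] out) {
        int n = out.length;
        int bound = n - (n % 4);
        int i = 0;
        for (; i < bound; i += 4) {
            out[i]     = a[i]     + b[i];
            out[i + 1] = a[i + 1] + b[i + 1];
            out[i + 2] = a[i + 2] + b[i + 2];
            out[i + 3] = a[i + 3] + b[i + 3];
        }
        for (; i < n; i++) {
            out[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        double[] a = new double[10];
        double[] b = new double[10];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
            b[i] = 2.0 * i;
        }
        double[] scalar = new double[10];
        double[] unrolled = new double[10];
        scalarAdd(a, b, scalar);
        unrolledAdd(a, b, unrolled);
        System.out.println(java.util.Arrays.equals(scalar, unrolled));
    }
}
```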

-Antoine

On Tue, Jan 5, 2021 at 7:52 PM Maurizio Cimadamore <
maurizio.cimadamore at oracle.com> wrote:

> Good news, it wasn't as nasty as anticipated.
>
> It seems like your benchmark was accidentally comparing pears with
> apples - in the sense that the VarHandle created in your benchmark was
> checking alignment, while the ones we have in MemoryAccess do not.
>
> This is what I get with your code:
>
> Benchmark                         Mode  Cnt  Score   Error  Units
> AddBenchmark.unrolledMHI_long     avgt   30  2.947 ± 0.029  us/op
> AddBenchmark.unrolledMHI_v2_long  avgt   30  0.341 ± 0.004  us/op
> AddBenchmark.unrolledUnsafe       avgt   30  0.251 ± 0.002  us/op
>
> But if the var handle is created as follows:
>
> static final VarHandle MHI_L =
>     MemoryLayout.ofSequence(SIZE, MemoryLayouts.JAVA_LONG.withBitAlignment(8))
>                 .varHandle(long.class, MemoryLayout.PathElement.sequenceElement());
>
> Then the numbers I get are much better:
>
> Benchmark                         Mode  Cnt  Score   Error  Units
> AddBenchmark.unrolledMHI_long     avgt   30  0.339 ± 0.005  us/op
> AddBenchmark.unrolledMHI_v2_long  avgt   30  0.341 ± 0.004  us/op
> AddBenchmark.unrolledUnsafe       avgt   30  0.256 ± 0.002  us/op
>
> We know we have issues when it comes to hoisting the alignment check out
> of loops (Vlad, do you happen to have a JBS issue for this?) - we have
> some workarounds in place which work for simple loops, but fail to work
> in more complex code like yours.
>
> Eventually, the upcoming improvements for long loop optimizations will
> hopefully render much of these edge cases obsolete.
>
> On 05/01/2021 16:50, Maurizio Cimadamore wrote:
> > Thanks,
> > I'll take a look - my gut tells me that the method is just too big
> > when using VH directly (something I've seen in other cases). Note that
> > the fact that we have @ForceInline on the MemoryAccess accessors
> > helps, since that will tell HotSpot to always inline those accesses, no
> > matter the size of the enclosing method. I'm afraid here we're in a
> > situation where the benchmark method gets too big and no further
> > inlining happens (even though, if we progressed with inlining we'd end
> > up with a _smaller_ compiled method overall).
> >
> > I'll try to test this hypothesis. Stay tuned.
> >
> > Cheers
> > Maurizio
> >
> >
> > On 05/01/2021 16:45, Antoine Chambille wrote:
> >> Yes, I see the same slowdown with longs as with doubles.
> >>
> >> -Antoine
> >>
> >>
> >>
> >> On Mon, Jan 4, 2021 at 7:33 PM Maurizio Cimadamore
> >> <maurizio.cimadamore at oracle.com
> >> <mailto:maurizio.cimadamore at oracle.com>> wrote:
> >>
> >>     What happens with longs? Do you still see the slowdown?
> >>
> >>     Maurizio
> >>
>