Foreign + Vectors - benchmarks for copying and swapping

Radosław Smogura mail at smogura.eu
Wed Jun 23 18:41:20 UTC 2021


Hi Paul,

Can you share the code? I could not get the loop to unroll. I can only eliminate the range checks, and that's all.

In fact it's a bit odd, as the code for loading int and byte vectors looks the same.

I have a few suspicions about why ByteBuffer vectors can be harder to optimize than arrays:

  *   array length is taken from constant memory
  *   array length is non-negative

Kind regards,
Rado

________________________________
From: Paul Sandoz <paul.sandoz at oracle.com>
Sent: Tuesday, 22 June 2021 22:29
To: Radosław Smogura <mail at smogura.eu>
Cc: Maurizio Cimadamore <maurizio.cimadamore at oracle.com>; Vladimir Ivanov <vladimir.x.ivanov at oracle.com>; panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
Subject: Re: Foreign + Vectors - benchmarks for copying and swapping

In general that should be ok. Try using IntVector instead and it will unroll (with your patch removing CPU barriers).

I wonder if this may be a limitation specific to bytes.

Paul.

> On Jun 21, 2021, at 4:28 PM, Radosław Smogura <mail at smogura.eu> wrote:
>
> Hi,
>
> I think the copy case may fail to unroll because:
>        • loop unroll takes the range check from intoByteBuffer as the loop exit condition
>        • the range check uses unsigned compare, which is not supported by loop unroll
>
> I think in this code
>         for (int i = 0; i < bound; i += lanes) {
>           final var srcVector = ByteVector
>               .fromByteBuffer(BYTE_VECTOR_SPECIES, src, i, ByteOrder.nativeOrder());
>
>           srcVector.intoByteBuffer(dst, i, ByteOrder.nativeOrder());
>         }
> the exit condition should be i < bound, not the range check from intoByteBuffer.
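
The unsigned range check mentioned in the bullets above can be sketched in plain Java. This is only an illustration of the general idiom, not the actual check inside intoByteBuffer: the class and method names are hypothetical, and the folded form assumes 0 <= size <= length. It shows how a two-sided signed check collapses into a single unsigned comparison, which is the kind of compare the loop optimizer then sees as an exit test.

```java
// Hypothetical illustration; not the JDK's internal range check.
final class UnsignedRangeCheck {

    // Conventional form: two signed comparisons.
    static boolean inRangeSigned(int offset, int size, int length) {
        return offset >= 0 && offset <= length - size;
    }

    // Folded form: one unsigned comparison.
    // Assumes 0 <= size <= length, so (length - size + 1) is positive.
    // A negative offset is a huge value when read as unsigned, so it fails the test.
    static boolean inRangeUnsigned(int offset, int size, int length) {
        return Integer.compareUnsigned(offset, length - size + 1) < 0;
    }

    // Copy loop with an explicit per-iteration range check, mirroring the
    // shape of the loop quoted above: two candidate exit conditions.
    static int copyChecked(byte[] src, byte[] dst, int bound, int lanes) {
        int copied = 0;
        for (int i = 0; i < bound; i += lanes) {
            if (!inRangeUnsigned(i, lanes, dst.length)) {
                throw new IndexOutOfBoundsException("offset " + i);
            }
            System.arraycopy(src, i, dst, i, lanes);
            copied += lanes;
        }
        return copied;
    }
}
```

Both forms agree whenever the stated assumption holds; the difference is only in how the comparison appears to the compiler.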
>
> Kind regards,
> Rado
>
> From: Paul Sandoz <paul.sandoz at oracle.com>
> Sent: Monday, 21 June 2021 23:25
> To: Maurizio Cimadamore <maurizio.cimadamore at oracle.com>
> Cc: Radosław Smogura <mail at smogura.eu>; Vladimir Ivanov <vladimir.x.ivanov at oracle.com>; panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
> Subject: Re: Foreign + Vectors - benchmarks for copying and swapping
>
> Replacing the upper bound in `segmentImplicitScalar` with a constant (1024 say) results in a similar time to `bufferNativeScalar` without a constant bound, both of which (alas) are still slower than scalar array access (which benefits greatly from auto-vectorization).
>
> I wonder if the segment subrange checking for int value ranges is having an impact on bounds checking?
>
> Paul.
>
> > On Jun 21, 2021, at 1:56 PM, Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
> >
> >
> > On 21/06/2021 20:33, Paul Sandoz wrote:
> >> - Segment scalar access is penalized compared to ByteBuffer (from allocate or allocateDirect) scalar access.
> >
> > Odd
> >
> > We have many benchmarks similar to this (see LoopOverNonConstant) and they seem to offer the same level of performance as ByteBuffers.
> >
> > I wonder if the loop limit being "SPECIES.loopBound(srcArray.length)" plays a role? Have you tried replacing that expression with a constant?
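
For context on the loop limit discussed above: as far as I understand the Vector API documentation, loopBound(length) yields the largest multiple of the species' lane count that does not exceed length. A plain-Java sketch of that computation, with hypothetical names and assuming the lane count is a power of two and length is non-negative:

```java
// Hypothetical stand-in for the loop-bound computation; not the JDK implementation.
final class LoopBoundSketch {

    // Largest multiple of `lanes` that is <= length.
    // Assumes `lanes` is a power of two and length >= 0.
    static int loopBound(int length, int lanes) {
        return length & -lanes;
    }

    // Chunked copy: the main loop runs up to the computed bound,
    // and a scalar tail handles the remaining elements.
    static void copy(byte[] src, byte[] dst, int lanes) {
        int bound = loopBound(src.length, lanes);
        int i = 0;
        for (; i < bound; i += lanes) {
            System.arraycopy(src, i, dst, i, lanes);
        }
        for (; i < src.length; i++) {
            dst[i] = src[i];
        }
    }
}
```

The bound is a value computed from the array length rather than a compile-time constant, which is the difference the suggestion above proposes to test.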
> >
> > Maurizio
> >


