Foreign + Vectors - benchmarks for copying and swapping

Fri Jun 18 21:37:11 UTC 2021

> On Jun 18, 2021, at 2:03 PM, Maurizio Cimadamore <Maurizio.Cimadamore at Oracle.COM> wrote:
> 
> 
> On 18/06/2021 20:55, Paul Sandoz wrote:
>> The order declared in the vector load/store overrides any order declared on the buffer (we should make the specification clearer in that respect). (In this case the source is bytes, so there is no swapping.)
> Doh - right!
>> 
>> —
>> 
>> There is something odd going on when tiered compilation is switched off: the result for copyWithVector is much worse for smaller sizes.
> 
> Is this what Uwe is seeing I wonder?
> 
> https://github.com/apache/lucene/pull/177#issuecomment-861265227
> 

Possibly.

>> 
>> For larger sizes, with and without tiered, similar results are observed with similar generated code (of lower quality than with tiered for smaller sizes, oddly enough).
>> 
>> Whether tiered is enabled or not, there is no loop unrolling.
>> 
>> I think something may have regressed, although we have previously focused more on array access than buffer access.
> Is the vector implementation performing a bulk copy into a byte array IIRC? If so, maybe there's an issue with bulk copy - which would be the same issue we're seeing on the memory access front?

No, the intrinsic byte vector access to a byte buffer works similarly to intrinsic byte vector access to a byte array, using the buffer’s base and offset (to calculate the address relative to the base).
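As a rough illustration of the base-plus-offset idea only (not the intrinsic itself), here is a minimal plain-Java sketch for the heap buffer case. The class name BufferBaseOffset and the helper readViaBase are hypothetical names made up for this example. Direct buffers have no backing array, so they are not covered here.

```java
import java.nio.ByteBuffer;

public class BufferBaseOffset {
    // Hypothetical helper: reads element `index` of a heap buffer by going
    // through its backing array (the "base") plus the buffer's array offset.
    static byte readViaBase(ByteBuffer bb, int index) {
        if (!bb.hasArray()) {
            throw new IllegalArgumentException("expected a heap buffer with an accessible array");
        }
        byte[] base = bb.array();      // the base object
        int offset = bb.arrayOffset(); // where buffer index 0 sits within the base
        return base[offset + index];
    }

    public static void main(String[] args) {
        byte[] data = {10, 11, 12, 13, 14, 15, 16, 17};
        // A slice starting at array index 4: buffer index 0 maps to data[4].
        ByteBuffer slice = ByteBuffer.wrap(data, 4, 4).slice();
        for (int i = 0; i < slice.remaining(); i++) {
            // The absolute get and the base+offset access should agree.
            System.out.println(slice.get(i) + " " + readViaBase(slice, i));
        }
    }
}
```

The point of the sketch is that, once the base and offset are known, an access to a heap buffer reduces to the same address calculation as an access to a byte array.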

Paul.