[vector] AVX2 ByteVector.shiftR performance and semantics

Wed Jul 24 21:34:38 UTC 2019

I just rebuilt the API from the vectorIntrinsics branch and rewrote the code posted earlier in this thread as follows:

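import static jdk.incubator.vector.VectorOperators.LSHR;

import java.util.concurrent.TimeUnit;

import jdk.incubator.vector.LongVector;
import jdk.incubator.vector.VectorSpecies;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.Throughput)
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Fork(value = 1, jvmArgsPrepend = {"--add-modules=jdk.incubator.vector",
        "-XX:TypeProfileLevel=111", "-XX:-TieredCompilation", "-Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0"})
public class RightLogicalShift {

  // Assumed definition: 256-bit species of long lanes (4 x 64 bits), one AVX2 register.
  private static final VectorSpecies<Long> L256 = LongVector.SPECIES_256;

  @Param({"1024"})
  private int size;

  long[] data;

  @Setup(Level.Trial)
  public void init() {
    // newLongBitmap is the helper from the code posted earlier in this thread (not shown here).
    data = newLongBitmap(size);
  }

  // Shift each of the 32 byte lanes right by 4, keeping the high nibble of every byte.
  @Benchmark
  public int shiftRightByte() {
    return LongVector.fromArray(L256, data, 0)
            .reinterpretAsBytes()
            .lanewise(LSHR, 4)
            .and((byte)0x0F)
            .lane(0);
  }

  // Same result via 8 int lanes: the mask drops the bits shifted in from each neighbouring byte.
  @Benchmark
  public int shiftRightInt() {
    return LongVector.fromArray(L256, data, 0)
            .reinterpretAsInts()
            .lanewise(LSHR, 4)
            .and(0x0F0F0F0F)
            .reinterpretAsBytes()
            .lane(0);
  }
}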
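The two benchmarks are meant to produce the same bytes. A minimal scalar sketch of why, in plain Java with no Vector API calls (the class and method names are hypothetical): shifting a whole 32-bit int right by 4 moves the low nibble of each byte's upper neighbour into that byte's high nibble, and masking with 0x0F0F0F0F discards those bits, leaving each byte's own high nibble, which is what a per-byte logical shift by 4 yields.

```java
import java.util.Random;

public class NibbleShiftEquivalence {

    // Per-byte logical right shift by 4, then mask: the scalar analogue of the byte-lane path.
    static byte byteLaneShift(byte b) {
        return (byte) (((b & 0xFF) >>> 4) & 0x0F);
    }

    // Whole-int logical shift by 4, then keep only the low nibble of every byte.
    static int intLaneShift(int x) {
        return (x >>> 4) & 0x0F0F0F0F;
    }

    // Byte k of an int, counting from the least significant byte.
    static byte byteOf(int x, int k) {
        return (byte) (x >>> (8 * k));
    }

    // True if the int-lane trick matches the per-byte shift for all four bytes of x.
    static boolean equivalentFor(int x) {
        int shifted = intLaneShift(x);
        for (int k = 0; k < 4; k++) {
            if (byteOf(shifted, k) != byteLaneShift(byteOf(x, k))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            int x = rnd.nextInt();
            if (!equivalentFor(x)) {
                throw new AssertionError("mismatch for 0x" + Integer.toHexString(x));
            }
        }
        System.out.println("equivalent");
    }
}
```

The check compares the same byte position on both sides, so it does not depend on how the Vector API orders byte lanes within an int.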

I saw a major performance regression in shiftRightInt (roughly 13x lower throughput than in January), and when I looked at the disassembly I noticed that vpsrld, the instruction I expected to see, was not emitted for it.

Latest build:

Benchmark                         (size)   Mode  Cnt   Score   Error   Units
RightLogicalShift.shiftRightByte    1024  thrpt    5  39.239 ± 0.471  ops/us
RightLogicalShift.shiftRightInt     1024  thrpt    5  18.968 ± 0.292  ops/us

The same logic, measured back in January:

Benchmark                         (size)   Mode  Cnt    Score    Error   Units
PopCount.shiftRByte                 1024  thrpt    5   29.310 ±  0.680  ops/us
PopCount.shiftRInt                  1024  thrpt    5  257.261 ± 17.210  ops/us

I appreciate this isn't a released API, so if this is an expected regression during some refactoring, please ignore this email; but if it's not something you were aware of, I hope the information helps.

________________________________
From: John Rose <john.r.rose at oracle.com>
Sent: 03 February 2019 20:06
To: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
Cc: Richard Startin <richard at openkappa.co.uk>; panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
Subject: Re: [vector] AVX2 ByteVector.shiftR performance and semantics

On Feb 1, 2019, at 3:55 PM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:

Yes, I find the names misleading as well and fully agree it's worth considering alternatives.

(Every time I use those methods I have to refresh my memory of the different terminology: ">>" vs ">>>", shiftR vs aShiftR, signed vs unsigned.)
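The terminology tangle is easy to reproduce in plain scalar Java on a single byte. This sketch uses hypothetical names and no Vector API methods; it only shows the Java operator semantics. Java's >>> on a byte is not a logical 8-bit shift: the byte is sign-extended to int first, so for a negative byte the low 8 bits come out the same as with >>. Only zero-extending first (b & 0xFF) gives a true logical shift within an 8-bit lane.

```java
public class ByteShiftSemantics {

    // Arithmetic shift (Java ">>"): copies of the sign bit shift in.
    static byte arithmeticShift(byte b, int n) {
        return (byte) (b >> n);
    }

    // Java ">>>" applied to a byte: b is sign-extended to int before shifting, so for
    // shift counts up to 24 the low 8 bits equal those of the arithmetic shift.
    static byte promotedUnsignedShift(byte b, int n) {
        return (byte) (b >>> n);
    }

    // Logical shift within an 8-bit lane: zero-extend first, so zeros shift in.
    static byte laneLogicalShift(byte b, int n) {
        return (byte) ((b & 0xFF) >>> n);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF0; // -16
        System.out.println(arithmeticShift(b, 4));       // -1 (0xFF)
        System.out.println(promotedUnsignedShift(b, 4)); // -1 (0xFF)
        System.out.println(laneLogicalShift(b, 4));      // 15 (0x0F)
    }
}
```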

Quick thoughts:

I agree on systematic name prefixing; that's how stuff gets discovered in IDEs.

If we are ever tempted to overload a term like "shift" to refer *both* to intra-lane
ops *and* cross-lane ops, we should check ourselves and back away slowly.
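To make the intra-lane versus cross-lane distinction concrete, here is a scalar sketch over an int array standing in for a four-lane vector (all names are hypothetical). Both operations could reasonably be described as "shifting right", which is the ambiguity to avoid in naming.

```java
import java.util.Arrays;

public class LaneShiftKinds {

    // Intra-lane: each lane shifts its own bits; no data moves between lanes.
    static int[] intraLaneShiftRight(int[] lanes, int bits) {
        int[] out = new int[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            out[i] = lanes[i] >>> bits;
        }
        return out;
    }

    // Cross-lane: whole lane values move toward lane 0; vacated top lanes become zero.
    static int[] crossLaneShiftDown(int[] lanes, int count) {
        int[] out = new int[lanes.length];
        for (int i = 0; i + count < lanes.length; i++) {
            out[i] = lanes[i + count];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] v = {16, 32, 48, 64};
        System.out.println(Arrays.toString(intraLaneShiftRight(v, 4))); // [1, 2, 3, 4]
        System.out.println(Arrays.toString(crossLaneShiftDown(v, 1)));  // [32, 48, 64, 0]
    }
}
```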

When we get lambda cracking we can use unambiguous and natural operators
like ">>>" for intra-lane ops.  Oh happy day…