[vector] AVX2 ByteVector.shiftR performance and semantics
Richard Startin
richard at openkappa.co.uk
Wed Jul 24 21:34:38 UTC 2019
I just built the API again from the vectorIntrinsics branch, and rewrote the code posted earlier in this chain as follows:
@BenchmarkMode(Mode.Throughput)
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Fork(value = 1, jvmArgsPrepend = {"--add-modules=jdk.incubator.vector",
        "-XX:TypeProfileLevel=111", "-XX:-TieredCompilation", "-Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0"})
public class RightLogicalShift {

    @Param({"1024"})
    private int size;

    long[] data;

    @Setup(Level.Trial)
    public void init() {
        data = newLongBitmap(size);
    }

    @Benchmark
    public int shiftRightByte() {
        return LongVector.fromArray(L256, data, 0)
                .reinterpretAsBytes()
                .lanewise(LSHR, 4)
                .and((byte)0x0F)
                .lane(0);
    }

    @Benchmark
    public int shiftRightInt() {
        return LongVector.fromArray(L256, data, 0)
                .reinterpretAsInts()
                .lanewise(LSHR, 4)
                .and(0x0F0F0F0F)
                .reinterpretAsBytes()
                .lane(0);
    }
}
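For readers following along, here is a minimal scalar sketch of why the two benchmark methods are meant to compute the same thing: shifting every byte right by 4 and masking with 0x0F should match shifting the containing 32-bit word right by 4 and masking with 0x0F0F0F0F. This is plain Java, not the Vector API, and the class and method names (NibbleShiftEquivalence, perByte, perInt) are made up for illustration.

```java
public class NibbleShiftEquivalence {

    // Per-byte form: logically shift each 8-bit lane right by 4, then mask with 0x0F.
    static int perByte(int word) {
        int result = 0;
        for (int i = 0; i < 4; i++) {
            int lane = (word >>> (8 * i)) & 0xFF;   // extract lane i as an unsigned byte
            int shifted = (lane >>> 4) & 0x0F;      // high nibble moves to the low nibble
            result |= shifted << (8 * i);           // put the lane back in place
        }
        return result;
    }

    // Per-int form: shift the whole word, then mask away the bits that
    // crossed over from the neighbouring byte.
    static int perInt(int word) {
        return (word >>> 4) & 0x0F0F0F0F;
    }

    public static void main(String[] args) {
        int[] samples = {0, -1, 0x12345678, 0x80FF7F01, 0xF0F0F0F0};
        for (int w : samples) {
            if (perByte(w) != perInt(w)) {
                throw new AssertionError("mismatch for 0x" + Integer.toHexString(w));
            }
        }
        System.out.println("per-byte and per-int forms agree on all samples");
    }
}
```

The mask in the per-int form is what makes the wider shift safe: without it, the low nibble of each byte would leak into the high nibble of the byte below it.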
I got quite a major performance degradation in shiftRightInt, and when I looked at the disassembly I noticed that vpsrld, the instruction I was expecting to see, was not used in shiftRightInt.
Latest:

Benchmark                         (size)   Mode  Cnt    Score    Error  Units
RightLogicalShift.shiftRightByte    1024  thrpt    5   39.239 ±  0.471  ops/us
RightLogicalShift.shiftRightInt     1024  thrpt    5   18.968 ±  0.292  ops/us
Same logic back in January:

Benchmark            (size)   Mode  Cnt    Score    Error  Units
PopCount.shiftRByte    1024  thrpt    5   29.310 ±  0.680  ops/us
PopCount.shiftRInt     1024  thrpt    5  257.261 ± 17.210  ops/us
Obviously I appreciate this isn't a released API and if this is an expected regression during some refactoring please ignore this email, but if it's not something you were aware of I hope the information helps.
________________________________
From: John Rose <john.r.rose at oracle.com>
Sent: 03 February 2019 20:06
To: Vladimir Ivanov <vladimir.x.ivanov at oracle.com>
Cc: Richard Startin <richard at openkappa.co.uk>; panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
Subject: Re: [vector] AVX2 ByteVector.shiftR performance and semantics
On Feb 1, 2019, at 3:55 PM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
Yes, I find the names misleading as well and fully agree it's worth considering alternatives.
(Every time I use those methods I have to refresh my memory about different terminology - ">>"/">>>", shiftR/aShiftR, signed/unsigned).
Quick thoughts:
I agree on systematic name prefixing; that's how stuff gets discovered in IDEs.
If we are ever tempted to overload a term like "shift" to refer *both* to intra-lane
ops *and* cross-lane ops, we should check ourselves and back away slowly.
When we get lambda cracking we can use unambiguous and natural operators
like ">>>" for intra-lane ops. Oh happy day…
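To make the ">>" / ">>>" terminology problem above concrete, here is a small scalar sketch of how the two operators behave on a Java byte. It is only an illustration of the language-level semantics, not of how the Vector API implements its lane operations, and the class and method names are hypothetical.

```java
public class ByteShiftSemantics {

    // ">>" on a byte: the operand is promoted to int with sign extension,
    // so the sign bit is replicated into the vacated positions.
    static byte arithmeticShift(byte b, int n) {
        return (byte) (b >> n);
    }

    // ">>>" applied naively: promotion still sign-extends to 32 bits first,
    // so for small n the narrowed result is the same as with ">>".
    static byte naiveLogicalShift(byte b, int n) {
        return (byte) (b >>> n);
    }

    // Logical shift within the 8-bit lane: mask to an unsigned value first,
    // then shift, so zeros enter from the top of the byte.
    static byte laneLogicalShift(byte b, int n) {
        return (byte) ((b & 0xFF) >>> n);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF0; // -16 as a signed byte
        System.out.println(arithmeticShift(b, 4));   // prints -1
        System.out.println(naiveLogicalShift(b, 4)); // prints -1
        System.out.println(laneLogicalShift(b, 4));  // prints 15
    }
}
```

The second and third results differ even though both use ">>>", which is one reason a per-lane "unsigned shift" on bytes needs a precise definition rather than an appeal to the scalar operator.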