Allocation caused by vector rebracketing twice
Viswanathan, Sandhya
sandhya.viswanathan at intel.com
Mon Jan 21 19:29:16 UTC 2019
Hi Richard,
Thanks a lot for your feedback. I analyzed this further, and the high allocation rate appears to come from the Long addAll reduction intrinsic not being in place for AVX < 3, so boxing happens in order to call the Java implementation. I will send out a patch shortly that should fix this.
Best Regards,
Sandhya
-----Original Message-----
From: panama-dev [mailto:panama-dev-bounces at openjdk.java.net] On Behalf Of Richard Startin
Sent: Saturday, January 19, 2019 11:29 AM
To: panama-dev at openjdk.java.net
Subject: Allocation caused by vector rebracketing twice
It's great to see that there is now a shiftR method because this was the missing link to make vector bit counts possible. With great excitement, I tried this at 54348:a8516a4be714 but it doesn't work very well yet.
    @Benchmark
    public int vectorBitCount() {
        int bitCount = 0;
        var lookupPos = YMM_BYTE.fromArray(LOOKUP_POS, 0);
        var lookupNeg = YMM_BYTE.fromArray(LOOKUP_NEG, 0);
        var lowMask = YMM_BYTE.broadcast((byte)0x0F);
        for (int i = 0; i < data.length; i += 4) {
            var bytes = (ByteVector)YMM_LONG.fromArray(data, i).rebracket(YMM_BYTE);
            bitCount += (int)((LongVector)lookupPos.rearrange(bytes.and(lowMask).toShuffle())
                    .add(lookupNeg.rearrange(bytes.shiftR(4).and(lowMask).toShuffle()))
                    .rebracket(YMM_LONG)).addAll();
        }
        return bitCount;
    }
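For readers unfamiliar with the technique, the nibble-lookup idea behind this benchmark can be sketched in scalar Java. This is only an illustration under stated assumptions, not the benchmark's code: the class name and the single table NIBBLE_COUNTS are hypothetical (the contents of LOOKUP_POS and LOOKUP_NEG are not shown here), and each byte is processed one at a time rather than 32 at once.

```java
public class NibbleBitCount {
    // Hypothetical table: NIBBLE_COUNTS[n] is the number of set bits in the 4-bit value n.
    static final byte[] NIBBLE_COUNTS = {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};

    // Counts set bits by looking up the low and high nibble of every byte of every long.
    static int bitCount(long[] data) {
        int bitCount = 0;
        for (long word : data) {
            for (int shift = 0; shift < Long.SIZE; shift += Byte.SIZE) {
                int b = (int) (word >>> shift) & 0xFF;           // extract one byte
                bitCount += NIBBLE_COUNTS[b & 0x0F]              // low nibble
                          + NIBBLE_COUNTS[b >>> 4];              // high nibble
            }
        }
        return bitCount;
    }
}
```

The vector version does the same two lookups per byte, with rearrange playing the role of the table index and shiftR(4) isolating the high nibble.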
JMH -prof gc shows high allocation rates:
Iteration 1: 0.008 ops/us
·gc.alloc.rate: 1112.997 MB/sec
·gc.alloc.rate.norm: 219264.051 B/op
·gc.churn.G1_Eden_Space: 1042.296 MB/sec
·gc.churn.G1_Eden_Space.norm: 205335.753 B/op
·gc.churn.G1_Old_Gen: 0.001 MB/sec
·gc.churn.G1_Old_Gen.norm: 0.258 B/op
·gc.count: 6.000 counts
·gc.time: 6.000 ms
I stripped the code down until only negligible allocation remained. The smallest reproducer that still allocates heavily is one where the vector is rebracketed and then rebracketed back:
    @Benchmark
    public int vectorBitCount() {
        int bitCount = 0;
        for (int i = 0; i < data.length; i += 4) {
            bitCount += (int)((LongVector)YMM_LONG.fromArray(data, i).rebracket(YMM_BYTE).rebracket(YMM_LONG)).addAll();
        }
        return bitCount;
    }
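As I understand it, rebracket keeps the vector's bits and only changes the lane type, so rebracketing to bytes and back should be a no-op on the data. A scalar analogue of that round trip, using ByteBuffer, might look like the sketch below. The class and method names are hypothetical, the little-endian byte order is an assumption, and the sketch allocates arrays itself, so it illustrates only the semantics and says nothing about where the benchmark's allocation comes from.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ReinterpretRoundTrip {
    // Views n longs as 8*n bytes, analogous to rebracketing a long vector as a byte vector.
    static byte[] longsToBytes(long[] longs) {
        ByteBuffer buffer = ByteBuffer.allocate(longs.length * Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN); // assumed lane byte order
        for (long value : longs) {
            buffer.putLong(value);
        }
        return buffer.array();
    }

    // Views the bytes as longs again, analogous to the reverse rebracket.
    static long[] bytesToLongs(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        long[] longs = new long[bytes.length / Long.BYTES];
        for (int i = 0; i < longs.length; i++) {
            longs[i] = buffer.getLong();
        }
        return longs;
    }
}
```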
-prof gc:
Iteration 1: 0.310 ops/us
·gc.alloc.rate: 3223.539 MB/sec
·gc.alloc.rate.norm: 16384.001 B/op
·gc.churn.G1_Eden_Space: 3084.887 MB/sec
·gc.churn.G1_Eden_Space.norm: 15679.287 B/op
·gc.churn.G1_Old_Gen: 0.005 MB/sec
·gc.churn.G1_Old_Gen.norm: 0.026 B/op
·gc.count: 8.000 counts
·gc.time: 9.000 ms
Without reversing the rebracket, I see negligible allocation:
    @Benchmark
    public int vectorBitCount() {
        int bitCount = 0;
        for (int i = 0; i < data.length; i += 4) {
            bitCount += (int)((ByteVector)YMM_LONG.fromArray(data, i).rebracket(YMM_BYTE)).addAll();
        }
        return bitCount;
    }
Iteration 1: 1.456 ops/us
·gc.alloc.rate: ≈ 10⁻⁴ MB/sec
·gc.alloc.rate.norm: ≈ 10⁻⁴ B/op
·gc.count: ≈ 0 counts
Thanks,
Richard