Array addition and array sum Panama benchmarks
Roland Westrelin
rwestrel at redhat.com
Wed Mar 20 16:26:53 UTC 2024
> I think a good place to start would be to explain the difference between
> scalarUnsafeArray and scalarUnsafeUnsafe. The other benchmark can be
> looked at later, as (after looking at the assembly) I don’t think this
> is an issue that is specific to FFM.
The reason there's no vectorization for scalarUnsafeUnsafe:
    public void scalarUnsafeUnsafe(Data state) {
        final long ia = state.inputAddress;
        final long oa = state.outputAddress;
        for (int i = 0; i < SIZE; i++) {
            U.putDouble(oa + 8*i, U.getDouble(ia + 8*i) + U.getDouble(oa + 8*i));
        }
    }
is that the compiler can't prove it's legal to vectorize. Doubles are
read from ia and oa, added, and written back to oa. There's no
way for the compiler to tell that the off-heap areas pointed to by ia
and oa don't overlap. So possibly, the value written to:
    oa + 8*i
is going to be read back at the next iteration with:
    ia + 8*i
(with i incremented by one, so ia could be oa-8)
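To make the hazard concrete, here is a small sketch that models the
off-heap memory as a plain double[] and the addresses as element indices
(both are assumptions made for illustration, and the class and method
names are hypothetical, not taken from the benchmark). The input is taken
to start one double below the output. The scalar loop then carries each
sum into the next iteration, while a loop that loads a block of four
values before storing them, as vector code would, does not:

```java
import java.util.Arrays;

public class OverlapHazard {
    // Same shape as scalarUnsafeUnsafe, with element indices instead of addresses.
    static void scalar(double[] mem, int ia, int oa, int n) {
        for (int i = 0; i < n; i++) {
            mem[oa + i] = mem[ia + i] + mem[oa + i];
        }
    }

    // Models a 4-lane vector loop: all loads of a block happen before its stores.
    // n is assumed to be a multiple of 4.
    static void blocked(double[] mem, int ia, int oa, int n) {
        double[] lanes = new double[4];
        for (int i = 0; i < n; i += 4) {
            for (int l = 0; l < 4; l++) {
                lanes[l] = mem[ia + i + l] + mem[oa + i + l];
            }
            for (int l = 0; l < 4; l++) {
                mem[oa + i + l] = lanes[l];
            }
        }
    }

    public static void main(String[] args) {
        double[] a = {1, 1, 1, 1, 1};
        double[] b = a.clone();
        scalar(a, 0, 1, 4);   // input starts one element (8 bytes) below output
        blocked(b, 0, 1, 4);
        System.out.println(Arrays.toString(a)); // [1.0, 2.0, 3.0, 4.0, 5.0]
        System.out.println(Arrays.toString(b)); // [1.0, 2.0, 2.0, 2.0, 2.0]
    }
}
```

The two results differ, which is why the transformation is only legal
when the compiler can rule this layout out.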
The autovectorizer would need to insert a runtime check that the two
areas don't overlap, but there's no support for that at this point. I
suppose the same issue exists with the MemorySegment API when memory is
off-heap.
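As a rough sketch of what such a runtime check amounts to, the helper
below tests whether two byte ranges are disjoint. When they are, a loop
over them could run vectorized; otherwise the scalar loop would be kept.
This is a hand-written illustration, not what C2 emits, and the names
(disjoint, ia, oa, bytes) are hypothetical:

```java
public class OverlapCheck {
    // True when [ia, ia + bytes) and [oa, oa + bytes) share no byte.
    // Assumes the address arithmetic does not overflow.
    static boolean disjoint(long ia, long oa, long bytes) {
        return ia + bytes <= oa || oa + bytes <= ia;
    }

    public static void main(String[] args) {
        long bytes = 8L * 1024; // 1024 doubles
        System.out.println(disjoint(0x1000, 0x1000 + bytes, bytes)); // true: adjacent ranges
        System.out.println(disjoint(0x1000, 0x1008, bytes));         // false: shifted by one double
    }
}
```

Full disjointness is a conservative test: with ia equal to oa, for
example, the loop above would still vectorize correctly.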
But then why does this one:
    public void scalarSegmentArray(Data state) {
        final MemorySegment input = state.inputSegment;
        final double[] output = state.outputArray;
        for (int i = 0; i < SIZE; i++) {
            output[i] += input.getAtIndex(JAVA_DOUBLE, i);
        }
    }
not vectorize? input and output can't overlap because one is off-heap
and the other is on-heap. It seems that, for doubles, the MemorySegment
API reads a double in two steps: it calls getLongUnaligned() and then
converts the result to double with Double.longBitsToDouble(). The
vectorizer doesn't support vectorization of that long-to-double move.
Whether it could (that is, whether vector instructions for that exist),
I don't know.
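Written out by hand, the loop shape described above looks roughly like
the sketch below. The long[] standing in for the segment and all the
names are assumptions made for illustration; the actual MemorySegment
access path goes through more layers than this. What matters is that
each element is loaded as a long and only then reinterpreted as a double
before the add:

```java
public class TwoStepRead {
    // Each input element is loaded as raw long bits, then reinterpreted as a double.
    static void addFromBits(long[] inputBits, double[] output) {
        for (int i = 0; i < output.length; i++) {
            output[i] += Double.longBitsToDouble(inputBits[i]);
        }
    }

    public static void main(String[] args) {
        long[] bits = {
            Double.doubleToRawLongBits(1.5),
            Double.doubleToRawLongBits(2.5)
        };
        double[] out = {1.0, 1.0};
        addFromBits(bits, out);
        System.out.println(out[0] + " " + out[1]); // 2.5 3.5
    }
}
```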
Roland.