Slow code due to many AbstractVector::check and AbstractShuffle::checkIndexes checks in C2

Kai Burjack kburjack at googlemail.com
Fri May 22 10:02:30 UTC 2020


I am very much looking forward to the Panama Vector API, which is currently
being developed in the vectorIntrinsics branch. I am currently playing with
it for a Java matrix/vector library in order to speed up simple 4x4 matrix
and vector multiplications, as has been done for many years in the .NET Core
library using its SIMD intrinsics.
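
For readers who want the computation spelled out, here is a minimal scalar
sketch of the 4x4 product, arranged the way a 128-bit (XMM) version would
process it: one column of four lanes at a time, with a broadcast scalar and a
fused multiply-add per step. The column-major float[16] layout and the names
Mat4MulSketch and mul are assumptions made for illustration; this is not the
code from the library mentioned above.

```java
final class Mat4MulSketch {
    /**
     * Computes dest = a * b for 4x4 matrices stored column-major in float[16],
     * i.e. element (row r, column c) lives at index c * 4 + r.
     * (Layout and method name are illustrative assumptions.)
     */
    static void mul(float[] a, float[] b, float[] dest) {
        float[] tmp = new float[16]; // scratch, so dest may alias a or b
        for (int c = 0; c < 4; c++) {          // one result column = one 4-lane register
            for (int k = 0; k < 4; k++) {
                float s = b[c * 4 + k];        // scalar b(k, c), "broadcast" to all lanes
                for (int r = 0; r < 4; r++) {  // the four lanes of column k of a
                    // lane r: acc(r, c) += a(r, k) * b(k, c)
                    tmp[c * 4 + r] = Math.fma(a[k * 4 + r], s, tmp[c * 4 + r]);
                }
            }
        }
        System.arraycopy(tmp, 0, dest, 0, 16);
    }
}
```

In a vectorized version, the innermost loop over r would collapse into a
single fused multiply-add on one register, which is why the per-call checks
described below weigh so heavily by comparison.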

After having implemented both an XMM and a YMM register-based algorithm for
4x4 matrix multiplication, the issue currently is that all potential speedups
are eliminated, even in C2-compiled code, by various index checks. In the
disassembly, in particular the inlined call chains

for FloatVector.rearrange:
; - jdk.internal.vm.vector.VectorSupport$VectorPayload::getPayload@-1 (line 98)
; - jdk.incubator.vector.AbstractShuffle::reorder@1 (line 75)
; - jdk.incubator.vector.AbstractShuffle::checkIndexes@1 (line 124)
; - jdk.incubator.vector.FloatVector::rearrangeTemplate@1 (line 1995)

and FloatVector.fma:
; - jdk.incubator.vector.AbstractVector::sameSpecies@8 (line 133)
; - jdk.incubator.vector.AbstractVector::check@2 (line 124)
; - jdk.incubator.vector.FloatVector::lanewiseTemplate@15 (line 814)
; - jdk.incubator.vector.Float256Vector::lanewise@4 (line 289)
; - jdk.incubator.vector.Float256Vector::lanewise@4 (line 41)
; - jdk.incubator.vector.FloatVector::fma@6 (line 2133)

generate costly checks in C2. As a result, the generated C2 code contains
many thousands of instructions and branches for what could be a simple
sequence of mostly vmulps, vaddps, vfmadd231ps, vpermd (or vshufps) and
vmovdqu instructions.
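
To make the cost concrete, here is a simplified scalar model of what a
checked lane rearrangement has to do on every call. It is only a sketch: the
class LaneCheckSketch and its methods are hypothetical stand-ins, not the
JDK's AbstractShuffle or AbstractVector code, and the length comparison
merely plays the role of a species check.

```java
final class LaneCheckSketch {
    /** Throws if any shuffle source index lies outside [0, lanes). */
    static void checkIndexes(int[] shuffle, int lanes) {
        for (int i = 0; i < shuffle.length; i++) {
            int idx = shuffle[i];
            if (idx < 0 || idx >= lanes) { // one compare-and-branch per lane
                throw new IndexOutOfBoundsException(
                        "shuffle index " + idx + " at lane " + i
                        + " is out of range for " + lanes + " lanes");
            }
        }
    }

    /** Checked scalar model of a rearrange: result[i] = v[shuffle[i]]. */
    static float[] rearrange(float[] v, int[] shuffle) {
        if (shuffle.length != v.length) { // stands in for a species check
            throw new IllegalArgumentException("lane count mismatch");
        }
        checkIndexes(shuffle, v.length);
        float[] result = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            result[i] = v[shuffle[i]];
        }
        return result;
    }
}
```

The rearrangement itself is what ideally becomes a single permute
instruction; everything before it in rearrange() is validation. Presumably
the checks only disappear when the JIT can prove them redundant, for example
when the shuffle is a known constant, which does not appear to be happening
in the case described here.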

If I patch both methods above to avoid the index checks (in particular the
very costly check in FloatVector.rearrange()), my code goes from ~53 ns/op
down to ~11 ns/op (JMH-benchmarked).

I know it's probably very early to ask about performance for what's probably
not even a primary use case of Java (accelerating numeric algorithms for
computer graphics applications), but I just want to let you know that there
are people who care about it. :)

Anyways, thanks for the fantastic work on Panama so far!

Kind regards,
Kai

