Slow code due to many AbstractVector::check and AbstractShuffle::checkIndexes checks in C2

Sat May 23 21:38:32 UTC 2020

On May 23, 2020, at 3:33 AM, Kai Burjack <kburjack at googlemail.com> wrote:
> 
> Hi Paul, hi John,
> 
> thanks for getting back to me about it!
> 
> I've prepared a standard Maven JMH benchmark under:
> https://github.com/JOML-CI/panama-vector-bench
> The README.md contains my current results, with as
> much optimization as I could squeeze out of the code for my
> test CPU.
> 
> I always test from the current tip of the vectorIntrinsics branch of:
> https://github.com/openjdk/panama-vector/tree/vectorIntrinsics
> as it can be nicely shallow-cloned in a few seconds.
> 
> The results I gave before were based on the
> "[vector] Undo workaround fix" commit.
> 
> It'd be nice if at some point in the future any vectorized algorithm
> were faster than the scalar one in those benchmarks.
> 
> Thanks for looking into it!

Thanks for the extra data.  (Replying to panama-dev to get
it logged.)

> Would it be possible to simply expose the vector species,
> like Float128Vector, statically to user code, so as not to have to
> call vspecies() and drag the actual species as runtime information
> through the template code in FloatVector? That way, the JIT
> would statically know that the user is really only working with
> a particular vector species and could optimize for it.

The JIT is smart and can do that already.  If it fails to do
so in a particular case, there may be a bug in the JIT,
but we expect that any code path which uses just one
kind of vector will “sniff out” the exact type of that vector
and do the right thing (DTRT) without the user knowing the name of that
exact type.
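To make that concrete, here is a minimal sketch of user code that never names a shape-specific class such as Float128Vector. It assumes the jdk.incubator.vector API as it later shipped in JDK 16 and up (compile and run with `--add-modules jdk.incubator.vector`), which may differ from the vectorIntrinsics branch discussed here; the class name `FmaKernel` is hypothetical. Because `SPECIES` is a `static final` constant, C2 should be able to treat it as a compile-time constant on this path and pick the concrete vector type by itself.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class; only the public FloatVector/VectorSpecies types are used.
final class FmaKernel {
    // A static final species is a constant the JIT can fold, so the loop below
    // can be specialized for one concrete shape without naming e.g. Float256Vector.
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // Computes r[i] = a[i] * b[i] + c[i]; assumes all arrays have the same length.
    static float[] fma(float[] a, float[] b, float[] c) {
        float[] r = new float[a.length];
        int i = 0;
        // Largest index that is a multiple of the lane count and fits in the array.
        int upper = SPECIES.loopBound(a.length);
        for (; i < upper; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            FloatVector vc = FloatVector.fromArray(SPECIES, c, i);
            va.fma(vb, vc).intoArray(r, i);
        }
        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < a.length; i++) {
            r[i] = Math.fma(a[i], b[i], c[i]);
        }
        return r;
    }
}
```

Calling `FmaKernel.fma(a, b, c)` returns the element-wise a*b+c; swapping `SPECIES_PREFERRED` for another species constant changes the shape without touching the loop.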

This expectation extends even to vector-species-polymorphic
algorithms, as long as either (a) they are inlined or (b) they
are used, dynamically, on only one species at a time.  We
are thinking about additional techniques which would lift
even those restrictions, in the setting of further optimizations
for streams, and eventually streams-over-vectors.
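The two conditions for species-polymorphic code can be sketched the same way (same JDK 16+ incubator API assumption; all names are hypothetical). `sum` accepts any float species, but each wrapper passes exactly one `static final` species. Once `sum` is inlined into a wrapper, case (a), the species argument becomes a constant; a program that only ever calls one wrapper uses `sum` with one species at a time, case (b). Whether C2 actually specializes a particular call still depends on its inlining decisions.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical example class illustrating a species-polymorphic algorithm.
final class SpeciesGenericSum {
    static final VectorSpecies<Float> S128 = FloatVector.SPECIES_128;
    static final VectorSpecies<Float> S256 = FloatVector.SPECIES_256;

    // Works for any float species; nothing here depends on a particular shape.
    static float sum(VectorSpecies<Float> species, float[] a) {
        FloatVector acc = FloatVector.zero(species);
        int i = 0;
        int upper = species.loopBound(a.length);
        for (; i < upper; i += species.length()) {
            acc = acc.add(FloatVector.fromArray(species, a, i));
        }
        // Horizontal add of the partial sums held in the accumulator lanes.
        float s = acc.reduceLanes(VectorOperators.ADD);
        // Scalar tail.
        for (; i < a.length; i++) {
            s += a[i];
        }
        return s;
    }

    // Each call site passes a single constant species, so after inlining
    // the JIT sees exactly one species flowing into sum().
    static float sum128(float[] a) { return sum(S128, a); }
    static float sum256(float[] a) { return sum(S256, a); }
}
```

Shapes the hardware does not support natively still compute correct results through the API's fallback path; they just may not be intrinsified.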

> I am very sure there is a reason for the current design.

Yep.  One reason is complexity:  We are willing to burn
in 1+N type names (to cover N lane types) but not 1+N*(1+M)
type names (to cover M shapes).  Another reason is to
encourage programmers to avoid static dependencies on
particular species; this will (we think) lead to more portable
code.  Yet another reason, building on that, is that we don’t
at this time know all of the shapes we will be supporting
over time.  The existing VectorShape enum reflects current
hardware and software assumptions, and is very very likely
to expand over time.
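The type-name arithmetic can be checked against the API itself. This sketch (same JDK 16+ incubator API assumption; `TypeNameCount` is a hypothetical name) takes N as the six primitive lane types and M as the number of `VectorShape` constants. In the JDK 16 incubator release the enum has five constants (S_64_BIT, S_128_BIT, S_256_BIT, S_512_BIT and S_Max_BIT), so 1+N gives 7 public names while 1+N*(1+M) would give 37. The JDK does implement one concrete class per lane type and shape internally (e.g. Float128Vector), but keeps them package-private.

```java
import java.util.Arrays;
import java.util.List;
import jdk.incubator.vector.VectorShape;

// Hypothetical helper that counts type names under the two naming schemes.
final class TypeNameCount {
    // The six lane (element) types covered by the API.
    static final List<Class<?>> LANE_TYPES = List.<Class<?>>of(
        byte.class, short.class, int.class, long.class, float.class, double.class);

    // 1 + N: Vector plus one public class per lane type (ByteVector ... DoubleVector).
    static int publicTypeNames() {
        return 1 + LANE_TYPES.size();
    }

    // 1 + N*(1+M): also exposing one class per (lane type, shape) pair.
    // M is read from the enum, so the count follows any new shapes.
    static int perShapeTypeNames() {
        int m = VectorShape.values().length;
        return 1 + LANE_TYPES.size() * (1 + m);
    }

    public static void main(String[] args) {
        System.out.println("shapes: " + Arrays.toString(VectorShape.values()));
        System.out.println("1+N       = " + publicTypeNames());
        System.out.println("1+N*(1+M) = " + perShapeTypeNames());
    }
}
```

Reading M from `VectorShape.values()` instead of hard-coding it reflects the expectation above that the set of shapes will grow.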

— John