[aarch64-port-dev ] Weird behaviour with tests for: JDK-8213134 AArch64: vector shift failed with MaxVectorSize=8

Mon Aug 12 15:23:35 UTC 2019

On 12/08/2019 11:30, Andrew Dinn wrote:
> I am currently trying to test a backport of the above patch for JDK11 on
> AArch64 and I noticed that tests Test{Short/Int/...}Vect etc all run
> much slower than TestByteVect. Waaaay slower. The difference is roughly
> 10-20 seconds vs 10-20 minutes.

I forgot to mention that the figures cited above were obtained running with a
fastdebug build. The problem is less severe with a release build,
although there is still quite a noticeable slowdown (from 3 seconds to
33 seconds).

It turns out that the problem when executing TestShortVect relates
to the sequence of loops at the end of the top-level method. Each of
them iterates over a call to a different sub-test method (there are 72
of these methods!).

When each loop is entered the top-level method first gets C1 OSR
compiled and then C2 OSR compiled.
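For readers who want to watch this happen, here is a minimal sketch of the same shape: a top-level method containing a sequence of hot loops, each calling a different sub-test method. It is NOT the real TestShortVect source. The class name OsrShapeDemo, the two sub-tests, ARRLEN and ITERS are all hypothetical stand-ins chosen for illustration (the real test has 72 sub-tests). Running it with `java -XX:+PrintCompilation OsrShapeDemo` should show OSR compiles of `OsrShapeDemo::run` flagged with `%`, typically at tier 3 (C1 with profiling) followed by tier 4 (C2), although the exact output depends on the JDK version and compile thresholds.

```java
// Hypothetical sketch: only the overall shape of TestShortVect, not its code.
class OsrShapeDemo {
    static final int ARRLEN = 997;   // assumed array length
    static final int ITERS = 11000;  // assumed iteration count, high enough to trigger OSR

    // Stand-ins for two of the many sub-test methods.
    static void testAdd(short[] a, short[] b, short[] r) {
        for (int i = 0; i < r.length; i++) {
            r[i] = (short) (a[i] + b[i]);
        }
    }

    static void testShl(short[] a, short[] r, int shift) {
        for (int i = 0; i < r.length; i++) {
            r[i] = (short) (a[i] << shift);
        }
    }

    // Top-level method: a sequence of hot loops, each calling a different
    // sub-test. Each loop's back-edge counter can trigger an OSR compile of
    // run() itself; code after a loop that has never executed may be left
    // out of the C2 OSR compile and reached only via an uncommon trap.
    static long run() {
        short[] a = new short[ARRLEN];
        short[] b = new short[ARRLEN];
        short[] r = new short[ARRLEN];
        for (int i = 0; i < ARRLEN; i++) {
            a[i] = (short) i;
            b[i] = (short) (2 * i);
        }
        long checksum = 0;
        for (int i = 0; i < ITERS; i++) {
            testAdd(a, b, r);
        }
        checksum += r[ARRLEN - 1];   // 996 + 1992 = 2988
        for (int i = 0; i < ITERS; i++) {
            testShl(a, r, 3);
        }
        checksum += r[ARRLEN - 1];   // 996 << 3 = 7968
        return checksum;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Adding more loops in the same style to run() makes each new loop entry a fresh OSR candidate for the whole method, which is the pattern described here.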

The C2 compile includes only the code for the loop. After the loop
completes, the compiled code exits very quickly via an unconditional
uncommon trap that reverts to interpreted execution.

The C1 code runs particularly slowly, as it includes lots of profiling
(and, for non-product code, various debug checks). The C1
compile also takes a long time, as it compiles the whole method every
time rather than just the loop code.

So the slowdown seems to result from a combination of factors. It takes
a long time to deliver not very well optimized C1 code to replace
interpreted execution, and there is very little gain when the C1 code
finally gets run because of the associated profiling (and debug verify)
costs. On top of that, any gain expected from doing that compilation is
foiled by an almost immediate reversion to interpreted execution once
the C2 code is delivered.

Anyway, the important thing is that this doesn't appear to be a problem
with the patch side-effecting the test, which is what I really had to check.

The puzzling thing is why this same problem does not cause a slowdown
for TestByteVect. There are a similar 72 loops at the end of its top-level
method, but they don't lead to OSR compiles of that method. Given that
the code is pretty much identical except for using byte[] in place of
short[], that's still something of a mystery.

regards,

Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd