[aarch64-port-dev ] Weird behaviour with tests for: JDK-8213134 AArch64: vector shift failed with MaxVectorSize=8

Mon Aug 12 10:30:17 UTC 2019

I am currently testing a backport of the above patch to JDK11 on
AArch64 and I noticed that the tests Test{Short/Int/...}Vect etc. all run
much slower than TestByteVect. Waaaay slower. The difference is roughly
10-20 seconds for TestByteVect vs 10-20 minutes for the others.

I also noticed, when I ran 'top' in thread view, that the C1 compiler
thread runs flat out while the main thread uses only a few percent of CPU.

This is weird because all the tests have essentially the same code
structure: call a whole slew of sub-test methods (there really are a lot
of them) in a loop to ensure they are C2 compiled, then call each
sub-test method once more and loop over the resulting array, verifying
that each entry is as expected.
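For readers who have not looked at the tests, the shared structure can be sketched roughly as below. This is a hypothetical, cut-down illustration, not the actual test source: the class name VectTestSketch, the two sub-tests test_add and test_shl, and the ARRLEN and ITERS values are all assumptions chosen for the example.

```java
// Hypothetical sketch of the structure shared by the Test*Vect tests:
// a warmup loop drives every sub-test so it gets C2 compiled, then each
// sub-test is called once more and its result array is checked inline,
// with all the check loops living inside the single test() method.
public class VectTestSketch {
    static final int ARRLEN = 997;   // assumed array length
    static final int ITERS = 11000;  // assumed warmup count

    // Two hypothetical sub-tests standing in for the many test_* methods.
    static void test_add(short[] a0, short[] a1, short b) {
        for (int i = 0; i < a0.length; i++) {
            a0[i] = (short) (a1[i] + b);
        }
    }

    static void test_shl(short[] a0, short[] a1, int b) {
        for (int i = 0; i < a0.length; i++) {
            a0[i] = (short) (a1[i] << b);
        }
    }

    static int test() {
        short[] a0 = new short[ARRLEN];
        short[] a1 = new short[ARRLEN];
        for (int i = 0; i < ARRLEN; i++) {
            a1[i] = (short) i;
        }
        // Warmup: call every sub-test often enough to get it compiled.
        for (int n = 0; n < ITERS; n++) {
            test_add(a0, a1, (short) 3);
            test_shl(a0, a1, 2);
        }
        // Verification: call each sub-test once more and check every entry.
        int errors = 0;
        test_add(a0, a1, (short) 3);
        for (int i = 0; i < ARRLEN; i++) {
            if (a0[i] != (short) (i + 3)) errors++;
        }
        test_shl(a0, a1, 2);
        for (int i = 0; i < ARRLEN; i++) {
            if (a0[i] != (short) (i << 2)) errors++;
        }
        return errors;
    }

    public static void main(String[] args) {
        int errors = test();
        System.out.println(errors == 0 ? "PASSED" : "FAILED: " + errors + " errors");
    }
}
```

In the real tests the verification block has one such check loop per sub-test, so with many sub-tests the single test() method ends up containing a correspondingly large number of loops.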

I reran with -XX:+PrintCompilation and observed a significant difference
in the compiler behaviour. In the Byte test, method TestByteVect::test
and the many sub-test methods it calls in its warmup loop get compiled
at level 3 and then level 4. They *stay* that way as the main routine
goes on to call each sub-test method in the block that follows the loop.

By contrast, in the Short test, method TestShortVect::test still gets
compiled at level 3 then level 4 in the warmup loop, as does each
sub-test method called in the loop, but then ... between successive
sub-test calls in the following block, the level 4 and level 3 versions
of TestShortVect::test are repeatedly made not entrant. The method is
deoptimized and then re-optimized, first to level 3 and then to level 4
(you can tell because a message is printed after each call).

Finally, towards the end of the Short test there is a whole series
of 'made zombie' notices for the many different level 3 and level 4
versions of TestShortVect::test, as well as a pair of 'made zombie'
notices for the level 3 versions of each of the sub-test methods.

So it seems that this weird oscillating behaviour constitutes repeated
OSR compilation and then deopt of the main test method. It is not clear
to me why the deopt is happening (nor why it does not happen in the Byte
case). The way the top level test routine is structured seems to pose a
severe challenge to some of the assumptions the OSR compiler is making.
Anyway, it is bizarre as well as extremely inconvenient that each test
takes such a very, very long time to run, especially given that each
test has to be run for each of the 4 available MaxVectorSize settings.

In order to verify that it is the top level test method causing the
problem, I reran the TestShortVect test with a compiler restriction as
follows:

  -XX:CompileCommand=compileonly,TestShortVect::test_*

i.e. only compile the sub-test methods called from the main loop (which
all start with the prefix test_). As expected, the run time came back
down into the 10-20 second range.

Clearly, the reason these tests are such a basket case for the compiler
needs further investigation. I will look into that, unless anyone
already knows why this is happening.

Meanwhile, there may well be some easy way to avoid the compiler issue
by refactoring the code (e.g. moving the verification loops into a
separate verify routine). If not, then a CompileCommand added to the
@run arguments for the test would make the tests much more useful (well,
it would make them useful).
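The refactoring idea of pulling the verification loops out into a separate verify routine might look roughly like the sketch below. Everything here is hypothetical: VectTestRefactored, verify, the two sub-tests and the constants are illustrative assumptions, and whether this shape actually avoids the repeated OSR compilation and deopt has not been tested; it only shows the intended structure, in which the verification part of test() becomes straight-line calls.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical refactoring: test() keeps the warmup loop, but every
// result check is delegated to one small verify() helper, so the
// per-sub-test check loops no longer live inside test() itself.
public class VectTestRefactored {
    static final int ARRLEN = 997;   // assumed array length
    static final int ITERS = 11000;  // assumed warmup count

    // Hypothetical sub-tests standing in for the real test_* methods.
    static void test_add(short[] a0, short[] a1, short b) {
        for (int i = 0; i < a0.length; i++) {
            a0[i] = (short) (a1[i] + b);
        }
    }

    static void test_shl(short[] a0, short[] a1, int b) {
        for (int i = 0; i < a0.length; i++) {
            a0[i] = (short) (a1[i] << b);
        }
    }

    // Compares each entry with the expected value for its index and
    // returns the number of mismatches, reporting each one on stderr.
    static int verify(String name, short[] actual, IntUnaryOperator expected) {
        int errors = 0;
        for (int i = 0; i < actual.length; i++) {
            short want = (short) expected.applyAsInt(i);
            if (actual[i] != want) {
                System.err.println(name + "[" + i + "] = " + actual[i] + ", expected " + want);
                errors++;
            }
        }
        return errors;
    }

    static int test() {
        short[] a0 = new short[ARRLEN];
        short[] a1 = new short[ARRLEN];
        for (int i = 0; i < ARRLEN; i++) {
            a1[i] = (short) i;
        }
        // Warmup loop, unchanged: get every sub-test compiled.
        for (int n = 0; n < ITERS; n++) {
            test_add(a0, a1, (short) 3);
            test_shl(a0, a1, 2);
        }
        // Verification: straight-line calls, no check loops in test().
        int errors = 0;
        test_add(a0, a1, (short) 3);
        errors += verify("test_add", a0, i -> i + 3);
        test_shl(a0, a1, 2);
        errors += verify("test_shl", a0, i -> i << 2);
        return errors;
    }

    public static void main(String[] args) {
        int errors = test();
        System.out.println(errors == 0 ? "PASSED" : "FAILED: " + errors + " errors");
    }
}
```

The design choice is simply to keep test() small and loop-light after warmup; the single shared verify() loop is then compiled as its own method rather than as part of test().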

regards,

Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd