[aarch64-port-dev ] RFR: 8204331: AArch64: fix CAS not embedded in normal graph error
Andrew Dinn
adinn at redhat.com
Thu Jun 21 14:20:21 UTC 2018
The following patch fixes a problem in the AArch64 port caused by the
introduction of the new GC API, which delegates responsibility for
barrier generation to GC-specific code (JDK-8202377).
webrev: http://cr.openjdk.java.net/~adinn/8204331/webrev.00/
JIRA: https://bugs.openjdk.java.net/browse/JDK-8204331
The new G1 barrier assembler, in combination with the C2 fence
generator, introduced a few small changes to the memory flow layout in
the ideal subgraphs arising from (normal and Unsafe) volatile stores and
(Unsafe) volatile CASes. This affects the results returned by the
AArch64 predicates employed by ad file instructions to suppress
generation of certain hardware memory barrier instructions and, instead,
generate acquiring loads and releasing stores. The change was caught by
asserts in the CAS code which detected the new, unexpected graph shape.
Reviews would be very welcome!
The Patch:
The fix involves several small changes to reflect each change introduced
by the GC code.
1) Re-ordering of trailing volatile/cpu order barriers
The C2 fence generator class reversed the order of the volatile/cpu
order membar pair generated at the end of an Unsafe volatile store
subgraph (from Volatile=>CPUOrder to CPUOrder=>Volatile). This is now
consistent with the order of the trailing cpu order/acquire barriers
(CPUOrder=>Acquire).
The relevant predicates have been updated to expect and check for
barriers in this order.
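For concreteness, here is a minimal sketch (not taken from the webrev)
of the kind of Unsafe volatile store whose ideal graph ends in the
trailing membar pair discussed above; the class, field and method names
are invented purely for illustration:

import java.lang.reflect.Field;

public class VolatileStoreShape {
    static final sun.misc.Unsafe U;
    static final long VAL_OFFSET;
    volatile Object val;

    static {
        try {
            Field f = sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            U = (sun.misc.Unsafe) f.get(null);
            VAL_OFFSET = U.objectFieldOffset(
                VolatileStoreShape.class.getDeclaredField("val"));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Unsafe volatile store: C2 generates a leading membar, the store and
    // then the trailing CPUOrder => Volatile membar pair described above.
    // On AArch64 the ad file predicates are meant to recognise this shape
    // and plant a releasing store (stlr) rather than explicit barriers.
    void unsafeVolatileStore(Object x) {
        U.putObjectVolatile(this, VAL_OFFSET, x);
    }

    // Normal volatile store for comparison; its graph shape also has to be
    // matched by the predicates.
    void volatileStore(Object x) {
        val = x;
    }

    // Simple driver so the methods get hot enough to be C2-compiled when
    // the class is run directly (used by the test sketch further down).
    public static void main(String[] args) {
        VolatileStoreShape s = new VolatileStoreShape();
        for (int i = 0; i < 100_000; i++) {
            s.volatileStore(new Object());
            s.unsafeVolatileStore(new Object());
        }
    }
}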
2) Extra memory flows in CAS graphs
Unsafe CASObject graphs for GCs which employ a conditional card mark
(G1GC or CMS+UseCondCardMark) may now include an extra
If=>IfTrue/IfFalse ===> Region+Phi(BotIdx) between the card mark memory
barrier and the trailing cpu order/acquire barriers. The new IfNode
models a test of the boolean result returned by the CAS operation. It
gets wrapped up in the barrier flow because of the different order of
generation of the GC post barrier and the (SCMemProj) memory projection
from the CAS itself.
Previously the CAS and SCMemProj simply fed directly into the trailing
barriers, so there was no need to check the GC post-barrier subgraph.
Now a CASObject looks more like a volatile StoreN/P, both of them
feeding their memory flow into the card mark membar. In consequence,
testing for a CASObject now requires two stages, as for a StoreN/P --
matching the pair of memory subgraphs from leading to card mark membar
and from card mark to trailing membar.
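Purely for illustration (this is not part of the patch), the following
is the sort of reference CAS that exercises the shape just described
when run with G1GC or with CMS plus +UseCondCardMark; as far as I can
see, AtomicReference.compareAndSet bottoms out in an Unsafe object CAS,
which C2 intrinsifies to the CASObject shape discussed above:

import java.util.concurrent.atomic.AtomicReference;

public class CasShape {
    private final AtomicReference<Object> ref = new AtomicReference<>();

    // Under a conditional card mark GC the boolean result of the CAS also
    // feeds the post barrier test, which is the If => IfTrue/IfFalse =>
    // Region+Phi subgraph that now sits between the card mark membar and
    // the trailing membars.
    boolean cas(Object expect, Object update) {
        return ref.compareAndSet(expect, update);
    }
}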
The predicates which traverse the memory flow between leading and
trailing/card mark membars have been updated so they now look for the
same pattern in either case, modulo the presence of a Store or a
CAS+SCMemProj pair. In the case where a card mark membar is present,
this new test is now combined with a check on the trailing graph in the
CAS case, just as with a StoreN/P.
The predicates which traverse the memory flow between card mark and
trailing membars have been updated to search through (potentially) one
more Phi(BotIdx) link. So, the maximum number of Phis to indirect through
has been increased to 4 for G1GC and 1 for ConcMarkSweep+UseCondCardMark.
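To illustrate the sort of bounded traversal involved, here is a toy,
self-contained sketch; the node model is invented and bears no relation
to the real C2 node classes, it merely shows the idea of accepting a
path from the card mark membar to a trailing membar through at most a
fixed number of bottom-index Phis:

import java.util.List;

// Invented node model: just enough structure to show the bounded search.
final class ToyNode {
    enum Kind { PHI_BOTTOM, TRAILING_MEMBAR, OTHER }
    final Kind kind;
    final List<ToyNode> memUsers;  // nodes consuming this node's memory output
    ToyNode(Kind kind, List<ToyNode> memUsers) {
        this.kind = kind;
        this.memUsers = memUsers;
    }
}

final class TrailingMembarSearch {
    // Starting from a memory user of the card mark membar, accept the graph
    // only if a trailing membar is reachable through bottom-index Phi nodes
    // alone, indirecting through at most maxPhis of them -- per the text
    // above, 4 for G1GC and 1 for ConcMarkSweep+UseCondCardMark.
    static boolean reachesTrailingMembar(ToyNode n, int maxPhis) {
        if (n.kind == ToyNode.Kind.TRAILING_MEMBAR) {
            return true;
        }
        if (n.kind == ToyNode.Kind.PHI_BOTTOM && maxPhis > 0) {
            for (ToyNode user : n.memUsers) {
                if (reachesTrailingMembar(user, maxPhis - 1)) {
                    return true;
                }
            }
        }
        return false;
    }
}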
Testing:
This has been tested by running tier1 tests. All the tests which were
previously failing are now working. There were 5 failures due to
timeouts which do not appear to relate to this issue.
The code has also been tested by printing generated code for some simple
usages of volatile load/store and CAS, checking the generated code by
eyeball. This was repeated for SerialGC, ParallelGC,
ConcMarkSweep-UseCondCardMark, ConcMarkSweep+UseCondCardMark and G1GC.
It would be good also to run further tests (especially jcstress).
However, since this breakage is blocking testing of other changes, and
since the patch seems to remedy the problem at least in the common
cases, I suggest committing this fix now and running those tests
afterwards.
Additional test specific to this issue?:
There are currently no proper tests for this code as it was difficult to
know how to identify what code gets generated. Detection of failures is
left to asserts in the predicates. That approach has worked in the
present case, but only after a breaking change had been pushed and only
by detecting one of several breaking changes. So the current regime is
rather fragile. It would be good to have some jtreg tests to help detect
breakages like this more quickly and effectively.
A slowdebug build with an available hsdis-aarch64.so could be used to
print out generated assembly. Usefully, in debug mode the C2 back end
will print out block comments indicating both where membars have been
placed and where they have been elided. If a jtreg test were run in
othervm mode with CompileCommand options to compile and print the target
methods, then the output could be scanned for the relevant
membar/membar-elided comments and for the presence or absence of
ldar/stlr and ldaxr/stlxr instructions. This would be enough to test
that the basic transformations are sound.
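To make that suggestion concrete, here is a very rough sketch of such a
test, written against plain ProcessBuilder rather than the jtreg test
library so that it stays self-contained. The flag set and the class and
method names (reusing the hypothetical VolatileStoreShape sketch above)
are illustrative assumptions, and matching on stlr assumes an
hsdis-aarch64.so is on the library path:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class TestVolatileBarriers {
    public static void main(String[] args) throws Exception {
        // Launch a child JVM which compiles and prints the target method.
        ProcessBuilder pb = new ProcessBuilder(
            System.getProperty("java.home") + "/bin/java",
            "-cp", System.getProperty("java.class.path"),
            "-XX:+UnlockDiagnosticVMOptions",
            "-XX:-TieredCompilation",
            "-Xbatch",
            "-XX:CompileCommand=print,VolatileStoreShape::volatileStore",
            "VolatileStoreShape");
        pb.redirectErrorStream(true);
        Process p = pb.start();

        boolean sawReleasingStore = false;
        try (BufferedReader r = new BufferedReader(
                 new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                // With hsdis present the instruction itself can be matched;
                // without it, the debug build's membar/membar-elided block
                // comments could be matched instead.
                if (line.contains("stlr")) {
                    sawReleasingStore = true;
                }
            }
        }
        p.waitFor();

        if (!sawReleasingStore) {
            throw new RuntimeException(
                "no releasing store (stlr) found for VolatileStoreShape::volatileStore");
        }
    }
}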
Is it possible to rely on an hsdis library being available in jtreg
tests? Alternatively, is it possible to make execution of the test
conditional on the library being present?
regards,
Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander