[aarch64-port-dev ] RFR: AArch64 -- patch to make volatile reads and stores use ldar/stlr
Andrew Dinn
adinn at redhat.com
Wed Apr 22 09:00:08 UTC 2015
Hi Vladimir,
Thanks for responding.
On 21/04/15 18:03, Vladimir Kozlov wrote:
> Would it help if we add a new flag to NodeFlags to indicate volatile
> memory node?
Yes, I think this would provide some benefit although I don't know if it
will justify cost of implementing and testing the necessary shared code
changes given that the benefit is only partial. Even with such a flag we
would still need to do some of the work that this patch does. Also, the
flag would not help with a the next optimization we are planning
(removing unnecessary barriers from CAS sequences).
Your suggestion will /simplify/ the predicates for LoadX/StoreX but will
not make much difference to execution time. The current predicates for
these nodes, needs_acquiring_load() and needs_releasing_store(), check
immediately whether a LoadX is acquiring or a StoreX is releasing. So,
we have a quick filter for the /common path/ case where we simply
generate an ldr<x> or stlr<x> (I assume volatile ops are relatively rare).
However, we cannot rely solely on these existing flags as they are
sometimes set in other (non-volatile) cases. Currently, if the acquiring
or releasing flag is set the predicates go on to investigate the nodes
around the LoadX/StoreX looking for a specific configuration of
MemBar_XXX nodes. Adding the volatile flag would remove the need to do
this extra work in the /uncommon/ case.
Unfortunately, the other half of the patch involves eliding generation
of dmb instructions for the MemBar_XXX nodes associated with the a LoadX
or StoreX. Three other predicates are used for this purpose,
unnecesary_membar_acquire, unnecesary_membar_release and
unnecesary_membar_volatile. They search the ideal graph around the
membar to see whether it belongs to a signature sequence containing an
acquiring load or releasing store.
The check involves i) finding a related LoadX or StoreX, ii) checking it
is acquiring or releasing and ii) ensuring it is either tailed or
bracketed by a node group with a 'signature' shape which includes the
MemBar we started from. A volatile flag would not really speed up this
check and would not do much to simplify it.
The current predicates test for very specific signature configurations.
So, in cases where the MemBar node is /not/ part of a signature volatile
sequence they detect this very rapidly. In the common case the predicate
immediately fails to find the required sort of parent/child node in the
tree surrounding the initial MemBar (i.e. one which is not a
LoadX/StoreX, Proj or MemBar). If we do arrive at a LoadX/StoreX in the
node tree we could, perhaps, check the volatile flag and short-circuit
any further graph traversal but the bulk of the work involves traversing
from the initial MemBar to the LoadX/StoreX.
n.b. I say perhaps because I am not yet certain that we just test the
volatile flag. I don't know for sure that we could not arrive at the
memory node from a MemBar which was not part of the signature sequence
(I /am/ very confident that the current checks correctly identify
volatile node sequences and only those sequences).
Could we punt this question of providing the volatile flag, considering
it as a potential follow-up improvement? I'd prefer to get this
AArch64-only change in as is -- or, at least, with some tweaking --
before going on to consider something which requires changes to shared code.
Also, I mentioned the CAS case above because that is what I am looking
at next and it extends the technique used in the current patch.
Currently CAS operations translate to something which looks like this
dmb
...
retry:
ldaxr
...
stlxr
cbz retry
...
dmb
The ldaxr/stlxr pair are generated in response to a CompareAndSwapX node
and the dmb instructions arise from MemBar nodes. For a CAS AArch64 can
safely elide the dmb instructions. So, the next improvement will be to
extend the unnecessary_membar_xxx predicates to elide the dmb generation
when a suitable configuration of MemBar_XXX and CompareAndSwapX nodes is
seen. Of course, this test will not be helped at all by provision of a
volatile flag.
regards,
Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in UK and Wales under Company Registration No. 3798903
Directors: Michael Cunningham (USA), Matt Parson (USA), Charlie Peters
(USA), Michael O'Neill (Ireland)
More information about the aarch64-port-dev
mailing list