[aarch64-port-dev ] [Roland Westrelin] Re: Aarch64 port for ZGC, so far

Stuart Monteith stuart.monteith at linaro.org
Thu Nov 29 18:34:06 UTC 2018

Thanks for looking at this, Roland and Andrew. I don't believe I
understand C2 well enough to know what the solution would be
here in the general case.

I'll add these as tests in a new revision. In the meantime I'm testing
with your patch, and I've come across another case. I've been running
SPECjbb-2015, as it generates sufficient garbage with enough
complex methods to give some confidence at this early stage. There are
a number of CompareAndSwap variants called here. The example I
included was for java.lang.Class, so it failed fairly early on.

The current issue I find is the same trailing_membar assert:

#  Internal Error
pid=30296, tid=30392
#  assert(ldst->trailing_membar() != __null) failed: expected trailing membar

With this in the replay:

compile java/util/concurrent/ConcurrentLinkedQueue offer
(Ljava/lang/Object;)Z -1 4 inline 16 0 -1
java/util/concurrent/ConcurrentLinkedQueue offer (Ljava/lang/Object;)Z
1 5 java/util/Objects requireNonNull
(Ljava/lang/Object;)Ljava/lang/Object; 1 8
java/util/concurrent/ConcurrentLinkedQueue$Node <init>
(Ljava/lang/Object;)V 2 1 java/lang/Object <init> ()V 2 9
java/lang/invoke/VarHandleGuards guard_LL_V
3 30 java/lang/invoke/VarForm getMemberName
(I)Ljava/lang/invoke/MemberName; 3 33
java/lang/invoke/VarHandleReferences$FieldInstanceReadWrite set
4 11 java/util/Objects requireNonNull
(Ljava/lang/Object;)Ljava/lang/Object; 1 39
java/lang/invoke/VarHandleGuards guard_LLL_Z
2 34 java/lang/invoke/VarForm getMemberName
(I)Ljava/lang/invoke/MemberName; 2 37
compareAndSet (Ljava/lang/invoke/VarHandleReferences$FieldInstanceReadWrite;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Z
3 11 java/util/Objects requireNonNull
(Ljava/lang/Object;)Ljava/lang/Object; 1 57
java/lang/invoke/VarHandleGuards guard_LLL_Z
2 34 java/lang/invoke/VarForm getMemberName
(I)Ljava/lang/invoke/MemberName; 2 37
3 11 java/util/Objects requireNonNull

ConcurrentLinkedQueue.offer differs from the earlier example in that it
calls VarHandle instead of Unsafe, and it calls weakCompareAndSet
conditionally, depending on the result of a compareAndSet. I suppose
there is a different graph for the matcher to handle, but I don't have
that right now.
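
For reference, the shape of the code in question can be sketched roughly
as below. This is only an illustration of a VarHandle compareAndSet whose
result gates a weakCompareAndSet; it is not the testcase itself and I
haven't confirmed that it reproduces the assert. The class name
CasChainSketch, the simplified traversal and the items() helper are all
made up for the sketch.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the ConcurrentLinkedQueue.offer pattern.
public class CasChainSketch {
    static final class Node {
        final Object item;
        volatile Node next;
        Node(Object item) { this.item = item; }
    }

    private static final VarHandle NEXT;
    private static final VarHandle TAIL;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            NEXT = l.findVarHandle(Node.class, "next", Node.class);
            TAIL = l.findVarHandle(CasChainSketch.class, "tail", Node.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final Node head = new Node(null);   // sentinel, never removed
    private volatile Node tail = head;          // may lag behind the last node

    boolean offer(Object item) {
        Node newNode = new Node(item);
        for (Node t = tail, p = t;;) {
            Node q = p.next;
            if (q == null) {
                // Strong CAS on the field address...
                if (NEXT.compareAndSet(p, null, newNode)) {
                    // ...and a weak CAS that only runs if the first one succeeded.
                    if (p != t) {
                        TAIL.weakCompareAndSet(this, t, newNode);
                    }
                    return true;
                }
            } else {
                p = q;  // simplified: just walk forward to the last node
            }
        }
    }

    // Helper for checking the result; walks the list from the sentinel.
    List<Object> items() {
        List<Object> out = new ArrayList<>();
        for (Node n = head.next; n != null; n = n.next) {
            out.add(n.item);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumption: enough iterations for offer() to reach C2.
        for (int i = 0; i < 20_000; i++) {
            CasChainSketch q = new CasChainSketch();
            q.offer("a");
            q.offer("b");
            q.offer("c");
            if (q.items().size() != 3) throw new AssertionError();
        }
    }
}
```

The tail pointer is deliberately allowed to lag, so the check only looks
at the linked nodes and not at tail itself.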

I'm writing a separate testcase that calls ConcurrentLinkedQueue, as
there are a few more conditions that need to be handled.
As well as doing that, I'll continue looking for interesting cases. I
can exclude failing cases as I encounter them.

Thanks again,

On Thu, 29 Nov 2018 at 14:58, Roland Westrelin <rwestrel at redhat.com> wrote:
> Hi Andrew,
> > I'll take your word for it that the matcher problem Stuart ran into can
> > be fixed by the tweak you applied to Matcher::clone_address_expressions.
> > I don't know what ZBarrierSetC2::matcher_find_shared_visit is meant to
> > do, never mind how it interacts with clone_address_expressions. However,
> > I don't see any problem having clone_address_expressions return false on
> > AArch64 when the address is being consumed by a LoadStore.
> A field load is:
> (LoadI (AddP base field_offset))
> and we want a single instruction that embeds the address calculation.
> Two loads of the same field would be:
> (LoadI (AddP base field_offset))
> (LoadI (AddP base field_offset))
> with a shared AddP. When C2 performs matching, if it sees a node that is
> shared such as this one, it matches it as a standalone mach node. So we
> would have an instruction to compute the field address and 2 loads that
> use the result of that instruction. Given how cheap it is to let the
> memory access instruction do the address calculation, that's not what we
> want. Instead we want the matcher to operate as if there are 2 different
> AddP nodes. That's what cloning in the matcher is about. It doesn't
> really clone anything but it makes sure the AddP above is not seen as
> shared and is matched once for every memory access instruction.
> Now with ZGC, I think we can have some field access at some (AddP ...)
> address followed by a slow path call to the runtime in the barrier code
> that passes that address. So the (AddP ...) is shared between a memory
> access node and a call node. That causes it to be matched
> separately. Given the call is in the slow path, that's not what we
> want. So the ZGC specific code "clones" the AddP in that scenario.
> On aarch64, in the case of a cas, the AddP address input is "cloned" but
> it's not matched as part of the cas mach node which confuses the matcher
> logic.
> Roland.
