code optimized away before a deopt

Tue Mar 18 14:57:53 UTC 2014

Tom --

It generates an HSAIL instruction atomic_add_global (similar to lock xadd in x86).  I don't know for sure what that becomes in the gpu Instructions.  In rough perf measurements I made it was more performant than doing a similar thing using an atomic_cas loop

-- Tom
> -----Original Message-----
> From: Tom Rodriguez [mailto:tom.rodriguez at oracle.com]
> Sent: Monday, March 17, 2014 6:07 PM
> To: Deneau, Tom
> Cc: Douglas Simon; Lukas Stadler; graal-dev at openjdk.java.net
> Subject: Re: code optimized away before a deopt
> 
> How is atomicGetAndAddTlabTop actually implemented?
> 
> tom
> 
> On Mar 17, 2014, at 2:28 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
> 
> > I'll take a look at edenAllocate logic but I was trying to avoid the
> while(true) loop which the compareAndSwap logic requires.
> >
> > -- Tom
> >
> >> -----Original Message-----
> >> From: Tom Rodriguez [mailto:tom.rodriguez at oracle.com]
> >> Sent: Monday, March 17, 2014 4:18 PM
> >> To: Deneau, Tom
> >> Cc: Douglas Simon; Lukas Stadler; graal-dev at openjdk.java.net
> >> Subject: Re: code optimized away before a deopt
> >>
> >> So you are using a TLAB from multiple threads so you need use
> >> atomic's to actually do the allocation?  Writing it using cleanup
> >> logic doesn't seem right. I think should be using something like the
> >> edenAllocate code in NewInstanceStub but against the TLAB.  Then you
> >> don't need cleanup logic.  Or you could just allocate directly from
> >> eden instead of using a TLAB.
> >>
> >> tom
> >>
> >> On Mar 17, 2014, at 1:40 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
> >>
> >>> I was just noticing that the condition mentioned below (first
> >> mentioned last November) still affects our snippet for
> >> newInstanceNode lowering.  When I brought this up originally our
> >> DeoptimizeNode.deopt was not really implemented and was just making
> >> the workitem return early.  Now, DeoptimizeNode.deopt is doing the
> >> necessary save state, etc but that doesn't change the fact that
> >> side-effecting nodes before the deopt will get thrown away.
> >>>
> >>> We have a workaround that seems to work for now (additional if logic
> >> around the deopt, which happens to be always true but the compiler
> >> hasn't figured that out) but I wonder if that is a robust workaround.
> >> Is there already a bug filed for this, or should I file one?
> >>>
> >>> -- Tom
> >>>
> >>> Here's the workaround we have now...
> >>>
> >>>       Object result;
> >>>       Word thread = thread();
> >>>       Word top = atomicGetAndAddTlabTop(thread, size);
> >>>       Word end = readTlabEnd(thread);
> >>>       Word newTop = top.add(size);
> >>>       if (useTLAB() && probability(FAST_PATH_PROBABILITY,
> >> newTop.belowOrEqual(end))) {
> >>>           // writeTlabTop(thread, newTop) was done by the
> >> atomicGetAndAdd
> >>>           result = formatObject(hub, size, top, prototypeMarkWord,
> >> fillContents);
> >>>       } else {
> >>>           atomicGetAndAddTlabTop(thread, -size);
> >>>           new_stub.inc();
> >>>           // the if statement below should not be necessary, but
> >>> when
> >> I remove it,
> >>>           // the atomicGetAndAddTlabTop(thread, -size) doesn't get
> >> generated.
> >>>           if (!newTop.belowOrEqual(end)) {
> >>>               DeoptimizeNode.deopt(DeoptimizationAction.None,
> >> DeoptimizationReason.RuntimeConstraint);
> >>>           }
> >>>      . . .
> >>>
> >>>
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Doug Simon [mailto:doug.simon at oracle.com]
> >>>> Sent: Saturday, November 23, 2013 10:24 AM
> >>>> To: Lukas Stadler
> >>>> Cc: Deneau, Tom; graal-dev at openjdk.java.net
> >>>> Subject: Re: code optimized away before a deopt
> >>>>
> >>>>
> >>>> On Nov 23, 2013, at 5:18 PM, Lukas Stadler <lukas.stadler at jku.at>
> >> wrote:
> >>>>
> >>>>> But it _is_ ok to remove side effecting nodes, because we know
> >>>>> they
> >>>> will be reelected.
> >>>>
> >>>> Yes, normally, but when you write this pattern in a snippet, then
> >>>> this won't be true right, since we don't resume execution in the
> >>>> bytecode of the snippet (yet!). That why I was thinking at least a
> >>>> warning would be a good idea.
> >>>>
> >>>> -Doug
> >>>>
> >>>>> Maybe the cleanup operations could be part of a special purpose
> >>>>> deopt
> >>>> node?
> >>>>>
> >>>>> - Lukas
> >>>>>
> >>>>> Von meinem iPhone gesendet
> >>>>>
> >>>>>> Am 23.11.2013 um 16:39 schrieb Doug Simon
> <doug.simon at oracle.com>:
> >>>>>>
> >>>>>> This is done by the ConvertDeoptimizeToGuardPhase which replaces
> >>>> conditionals where one branch ends in a deopt with a GuardNode.
> >>>> This does indeed have the side effect of (silently) deleting all
> >>>> other nodes on that deopt-terminated branch. We should add some
> >>>> code to either make the deletion not silent or better, throw an
> >>>> error if these are any side- effecting nodes that will be deleted.
> >>>>>>
> >>>>>> -Doug
> >>>>>>
> >>>>>>> On Nov 23, 2013, at 1:58 AM, Deneau, Tom <tom.deneau at amd.com>
> >> wrote:
> >>>>>>>
> >>>>>>> I've noticed that if I have a snippet that does a test and if
> >>>>>>> the
> >>>> test fails, branches to a block that does some cleanup operations
> >>>> and then calls DeoptimizeNode.deopt(xxx, yyy), the cleanup code
> >>>> gets "optimized away".  I guess this is related to what Gilles was
> >>>> talking about, maybe the cleanup operations were considered not
> >>>> state
> >> changing?
> >>>>>>>
> >>>>>>> In our current state of HSAIL backend, a deopt just returns
> >>>>>>> early
> >>>> from the kernel.   Is there something I can do to make the cleanup
> >> code
> >>>> get emitted before the deopt?
> >>>>>>>
> >>>>>>> -- Tom
> >>>>>>
> >>>>
> >>>
> >>>
> >>
> >
> >
>