code optimized away before a deopt

Deneau, Tom tom.deneau at amd.com
Wed Mar 19 17:04:04 UTC 2014


Tom --



I looked back and I did model our AtomicGetAndAddNode and LoweredAtomicGetAndAddNode after the corresponding CompareAndSwapNodes.  Here is the nodeIntrinsic



    @NodeIntrinsic

    public static long atomicGetAndAdd(long base, int offset, LocationIdentity locationIdentity, int delta) {

        // dummy filler code here

    }



and here is how it is getting called



    public static Word atomicGetAndAddTlabTop(Word thread, int size) {

        return Word.unsigned(AtomicGetAndAddNode.atomicGetAndAdd(thread.rawValue(), threadTlabTopOffset(), TLAB_TOP_LOCATION, size));

    }



In AtomicGetAndAddNode, we basically ignore the location parameter, I don't know if that would be causing this behavior.



The LoweredAtomicGetAndAddNode looks like LoweredCompareAndSwapNode and is shown here:



public class LoweredAtomicGetAndAddNode extends FixedAccessNode implements StateSplit, LIRLowerable, MemoryCheckpoint.Single {



    @Input private ValueNode delta;

    @Input(notDataflow = true) private FrameState stateAfter;



    public FrameState stateAfter() {

        return stateAfter;

    }



    public void setStateAfter(FrameState x) {

        assert x == null || x.isAlive() : "frame state must be in a graph";

        updateUsages(stateAfter, x);

        stateAfter = x;

    }



    public boolean hasSideEffect() {

        return true;

    }



    public ValueNode getDelta() {

        return delta;

    }



    public LoweredAtomicGetAndAddNode(ValueNode object, LocationNode location, ValueNode delta, BarrierType barrierType, boolean compressible) {

        super(object, location, StampFactory.forKind(Kind.Long.getStackKind()), barrierType, compressible);

        this.delta = delta;

    }



    @Override

    public LocationIdentity getLocationIdentity() {

        return location().getLocationIdentity();

    }





    @Override

    public void generate(LIRGeneratorTool gen) {

        HSAILHotSpotLIRGenerator hsailGen = (HSAILHotSpotLIRGenerator) gen;

        hsailGen.visitAtomicGetAndAdd(this, location().generateAddress(hsailGen, hsailGen.operand(object())));

    }



    @Override

    public MemoryCheckpoint asMemoryCheckpoint() {

        return this;

    }



    @Override

    public MemoryPhiNode asMemoryPhi() {

        return null;

    }



}







> -----Original Message-----

> From: Tom Rodriguez [mailto:tom.rodriguez at oracle.com]

> Sent: Tuesday, March 18, 2014 12:02 PM

> To: Deneau, Tom

> Cc: Douglas Simon; Lukas Stadler; graal-dev at openjdk.java.net

> Subject: Re: code optimized away before a deopt

>

> After looking at little bit more, I think CompareAndSwapNode would be a

> better model which behaves as a memory checkpoint which includes the

> state split stuff.  Chris Thalinger may be adding a node which

> represents lock xadd to support Unsafe.getAndSet which will probably

> have the semantics you want.  You could probably use that once it lands.

>

> tom

>

> On Mar 18, 2014, at 9:49 AM, Deneau, Tom <tom.deneau at amd.com<mailto:tom.deneau at amd.com>> wrote:

>

> > OK, thanks, that makes sense.

> >

> >> -----Original Message-----

> >> From: Tom Rodriguez [mailto:tom.rodriguez at oracle.com]

> >> Sent: Tuesday, March 18, 2014 11:42 AM

> >> To: Deneau, Tom

> >> Cc: Douglas Simon; Lukas Stadler; graal-dev at openjdk.java.net<mailto:graal-dev at openjdk.java.net>

> >> Subject: Re: code optimized away before a deopt

> >>

> >> So it's a custom node of some sort?  I think you probably need make

> >> sure it behaves like a WriteNode by implementing StateSplit and

> >> having hasSideEffect return true.  The state can be null.

> >>

> >> tom

> >>

> >> On Mar 18, 2014, at 7:57 AM, Deneau, Tom <tom.deneau at amd.com<mailto:tom.deneau at amd.com>> wrote:

> >>

> >>> Tom --

> >>>

> >>> It generates an HSAIL instruction atomic_add_global (similar to lock

> >>> xadd in x86).  I don't know for sure what that becomes in the gpu

> >>> Instructions.  In rough perf measurements I made it was more

> >>> performant than doing a similar thing using an atomic_cas loop

> >>>

> >>> -- Tom

> >>>> -----Original Message-----

> >>>> From: Tom Rodriguez [mailto:tom.rodriguez at oracle.com]

> >>>> Sent: Monday, March 17, 2014 6:07 PM

> >>>> To: Deneau, Tom

> >>>> Cc: Douglas Simon; Lukas Stadler; graal-dev at openjdk.java.net<mailto:graal-dev at openjdk.java.net>

> >>>> Subject: Re: code optimized away before a deopt

> >>>>

> >>>> How is atomicGetAndAddTlabTop actually implemented?

> >>>>

> >>>> tom

> >>>>

> >>>> On Mar 17, 2014, at 2:28 PM, Deneau, Tom <tom.deneau at amd.com<mailto:tom.deneau at amd.com>>

> wrote:

> >>>>

> >>>>> I'll take a look at edenAllocate logic but I was trying to avoid

> >>>>> the

> >>>> while(true) loop which the compareAndSwap logic requires.

> >>>>>

> >>>>> -- Tom

> >>>>>

> >>>>>> -----Original Message-----

> >>>>>> From: Tom Rodriguez [mailto:tom.rodriguez at oracle.com]

> >>>>>> Sent: Monday, March 17, 2014 4:18 PM

> >>>>>> To: Deneau, Tom

> >>>>>> Cc: Douglas Simon; Lukas Stadler; graal-dev at openjdk.java.net<mailto:graal-dev at openjdk.java.net>

> >>>>>> Subject: Re: code optimized away before a deopt

> >>>>>>

> >>>>>> So you are using a TLAB from multiple threads so you need use

> >>>>>> atomic's to actually do the allocation?  Writing it using cleanup

> >>>>>> logic doesn't seem right. I think should be using something like

> >>>>>> the edenAllocate code in NewInstanceStub but against the TLAB.

> >>>>>> Then you don't need cleanup logic.  Or you could just allocate

> >>>>>> directly from eden instead of using a TLAB.

> >>>>>>

> >>>>>> tom

> >>>>>>

> >>>>>> On Mar 17, 2014, at 1:40 PM, Deneau, Tom <tom.deneau at amd.com<mailto:tom.deneau at amd.com>>

> >> wrote:

> >>>>>>

> >>>>>>> I was just noticing that the condition mentioned below (first

> >>>>>> mentioned last November) still affects our snippet for

> >>>>>> newInstanceNode lowering.  When I brought this up originally our

> >>>>>> DeoptimizeNode.deopt was not really implemented and was just

> >>>>>> making the workitem return early.  Now, DeoptimizeNode.deopt is

> >>>>>> doing the necessary save state, etc but that doesn't change the

> >>>>>> fact that side-effecting nodes before the deopt will get thrown

> away.

> >>>>>>>

> >>>>>>> We have a workaround that seems to work for now (additional if

> >>>>>>> logic

> >>>>>> around the deopt, which happens to be always true but the

> >>>>>> compiler hasn't figured that out) but I wonder if that is a

> >>>>>> robust

> >> workaround.

> >>>>>> Is there already a bug filed for this, or should I file one?

> >>>>>>>

> >>>>>>> -- Tom

> >>>>>>>

> >>>>>>> Here's the workaround we have now...

> >>>>>>>

> >>>>>>>     Object result;

> >>>>>>>     Word thread = thread();

> >>>>>>>     Word top = atomicGetAndAddTlabTop(thread, size);

> >>>>>>>     Word end = readTlabEnd(thread);

> >>>>>>>     Word newTop = top.add(size);

> >>>>>>>     if (useTLAB() && probability(FAST_PATH_PROBABILITY,

> >>>>>> newTop.belowOrEqual(end))) {

> >>>>>>>         // writeTlabTop(thread, newTop) was done by the

> >>>>>> atomicGetAndAdd

> >>>>>>>         result = formatObject(hub, size, top, prototypeMarkWord,

> >>>>>> fillContents);

> >>>>>>>     } else {

> >>>>>>>         atomicGetAndAddTlabTop(thread, -size);

> >>>>>>>         new_stub.inc();

> >>>>>>>         // the if statement below should not be necessary, but

> >>>>>>> when

> >>>>>> I remove it,

> >>>>>>>         // the atomicGetAndAddTlabTop(thread, -size) doesn't get

> >>>>>> generated.

> >>>>>>>         if (!newTop.belowOrEqual(end)) {

> >>>>>>>             DeoptimizeNode.deopt(DeoptimizationAction.None,

> >>>>>> DeoptimizationReason.RuntimeConstraint);

> >>>>>>>         }

> >>>>>>>    . . .

> >>>>>>>

> >>>>>>>

> >>>>>>>

> >>>>>>>

> >>>>>>>> -----Original Message-----

> >>>>>>>> From: Doug Simon [mailto:doug.simon at oracle.com]

> >>>>>>>> Sent: Saturday, November 23, 2013 10:24 AM

> >>>>>>>> To: Lukas Stadler

> >>>>>>>> Cc: Deneau, Tom; graal-dev at openjdk.java.net<mailto:graal-dev at openjdk.java.net>

> >>>>>>>> Subject: Re: code optimized away before a deopt

> >>>>>>>>

> >>>>>>>>

> >>>>>>>> On Nov 23, 2013, at 5:18 PM, Lukas Stadler

> >>>>>>>> <lukas.stadler at jku.at<mailto:lukas.stadler at jku.at>>

> >>>>>> wrote:

> >>>>>>>>

> >>>>>>>>> But it _is_ ok to remove side effecting nodes, because we know

> >>>>>>>>> they

> >>>>>>>> will be reelected.

> >>>>>>>>

> >>>>>>>> Yes, normally, but when you write this pattern in a snippet,

> >>>>>>>> then this won't be true right, since we don't resume execution

> >>>>>>>> in the bytecode of the snippet (yet!). That why I was thinking

> >>>>>>>> at least a warning would be a good idea.

> >>>>>>>>

> >>>>>>>> -Doug

> >>>>>>>>

> >>>>>>>>> Maybe the cleanup operations could be part of a special

> >>>>>>>>> purpose deopt

> >>>>>>>> node?

> >>>>>>>>>

> >>>>>>>>> - Lukas

> >>>>>>>>>

> >>>>>>>>> Von meinem iPhone gesendet

> >>>>>>>>>

> >>>>>>>>>> Am 23.11.2013 um 16:39 schrieb Doug Simon

> >>>> <doug.simon at oracle.com<mailto:doug.simon at oracle.com>>:

> >>>>>>>>>>

> >>>>>>>>>> This is done by the ConvertDeoptimizeToGuardPhase which

> >>>>>>>>>> replaces

> >>>>>>>> conditionals where one branch ends in a deopt with a GuardNode.

> >>>>>>>> This does indeed have the side effect of (silently) deleting

> >>>>>>>> all other nodes on that deopt-terminated branch. We should add

> >>>>>>>> some code to either make the deletion not silent or better,

> >>>>>>>> throw an error if these are any side- effecting nodes that will

> >>>>>>>> be

> >> deleted.

> >>>>>>>>>>

> >>>>>>>>>> -Doug

> >>>>>>>>>>

> >>>>>>>>>>> On Nov 23, 2013, at 1:58 AM, Deneau, Tom

> >>>>>>>>>>> <tom.deneau at amd.com<mailto:tom.deneau at amd.com>>

> >>>>>> wrote:

> >>>>>>>>>>>

> >>>>>>>>>>> I've noticed that if I have a snippet that does a test and

> >>>>>>>>>>> if the

> >>>>>>>> test fails, branches to a block that does some cleanup

> >>>>>>>> operations and then calls DeoptimizeNode.deopt(xxx, yyy), the

> >>>>>>>> cleanup code gets "optimized away".  I guess this is related to

> >>>>>>>> what Gilles was talking about, maybe the cleanup operations

> >>>>>>>> were considered not state

> >>>>>> changing?

> >>>>>>>>>>>

> >>>>>>>>>>> In our current state of HSAIL backend, a deopt just returns

> >>>>>>>>>>> early

> >>>>>>>> from the kernel.   Is there something I can do to make the

> >> cleanup

> >>>>>> code

> >>>>>>>> get emitted before the deopt?

> >>>>>>>>>>>

> >>>>>>>>>>> -- Tom

> >>>>>>>>>>

> >>>>>>>>

> >>>>>>>

> >>>>>>>

> >>>>>>

> >>>>>

> >>>>>

> >>>>

> >>>

> >>>

> >>

> >

> >

>




More information about the graal-dev mailing list