code optimized away before a deopt
Deneau, Tom
tom.deneau at amd.com
Wed Mar 19 17:04:04 UTC 2014
Tom --
I looked back and I did model our AtomicGetAndAddNode and LoweredAtomicGetAndAddNode after the corresponding CompareAndSwapNodes. Here is the nodeIntrinsic
@NodeIntrinsic
public static long atomicGetAndAdd(long base, int offset, LocationIdentity locationIdentity, int delta) {
// dummy filler code here
}
and here is how it is getting called
public static Word atomicGetAndAddTlabTop(Word thread, int size) {
return Word.unsigned(AtomicGetAndAddNode.atomicGetAndAdd(thread.rawValue(), threadTlabTopOffset(), TLAB_TOP_LOCATION, size));
}
In AtomicGetAndAddNode, we basically ignore the location parameter, I don't know if that would be causing this behavior.
The LoweredAtomicGetAndAddNode looks like LoweredCompareAndSwapNode and is shown here:
public class LoweredAtomicGetAndAddNode extends FixedAccessNode implements StateSplit, LIRLowerable, MemoryCheckpoint.Single {
@Input private ValueNode delta;
@Input(notDataflow = true) private FrameState stateAfter;
public FrameState stateAfter() {
return stateAfter;
}
public void setStateAfter(FrameState x) {
assert x == null || x.isAlive() : "frame state must be in a graph";
updateUsages(stateAfter, x);
stateAfter = x;
}
public boolean hasSideEffect() {
return true;
}
public ValueNode getDelta() {
return delta;
}
public LoweredAtomicGetAndAddNode(ValueNode object, LocationNode location, ValueNode delta, BarrierType barrierType, boolean compressible) {
super(object, location, StampFactory.forKind(Kind.Long.getStackKind()), barrierType, compressible);
this.delta = delta;
}
@Override
public LocationIdentity getLocationIdentity() {
return location().getLocationIdentity();
}
@Override
public void generate(LIRGeneratorTool gen) {
HSAILHotSpotLIRGenerator hsailGen = (HSAILHotSpotLIRGenerator) gen;
hsailGen.visitAtomicGetAndAdd(this, location().generateAddress(hsailGen, hsailGen.operand(object())));
}
@Override
public MemoryCheckpoint asMemoryCheckpoint() {
return this;
}
@Override
public MemoryPhiNode asMemoryPhi() {
return null;
}
}
> -----Original Message-----
> From: Tom Rodriguez [mailto:tom.rodriguez at oracle.com]
> Sent: Tuesday, March 18, 2014 12:02 PM
> To: Deneau, Tom
> Cc: Douglas Simon; Lukas Stadler; graal-dev at openjdk.java.net
> Subject: Re: code optimized away before a deopt
>
> After looking at little bit more, I think CompareAndSwapNode would be a
> better model which behaves as a memory checkpoint which includes the
> state split stuff. Chris Thalinger may be adding a node which
> represents lock xadd to support Unsafe.getAndSet which will probably
> have the semantics you want. You could probably use that once it lands.
>
> tom
>
> On Mar 18, 2014, at 9:49 AM, Deneau, Tom <tom.deneau at amd.com<mailto:tom.deneau at amd.com>> wrote:
>
> > OK, thanks, that makes sense.
> >
> >> -----Original Message-----
> >> From: Tom Rodriguez [mailto:tom.rodriguez at oracle.com]
> >> Sent: Tuesday, March 18, 2014 11:42 AM
> >> To: Deneau, Tom
> >> Cc: Douglas Simon; Lukas Stadler; graal-dev at openjdk.java.net<mailto:graal-dev at openjdk.java.net>
> >> Subject: Re: code optimized away before a deopt
> >>
> >> So it's a custom node of some sort? I think you probably need make
> >> sure it behaves like a WriteNode by implementing StateSplit and
> >> having hasSideEffect return true. The state can be null.
> >>
> >> tom
> >>
> >> On Mar 18, 2014, at 7:57 AM, Deneau, Tom <tom.deneau at amd.com<mailto:tom.deneau at amd.com>> wrote:
> >>
> >>> Tom --
> >>>
> >>> It generates an HSAIL instruction atomic_add_global (similar to lock
> >>> xadd in x86). I don't know for sure what that becomes in the gpu
> >>> Instructions. In rough perf measurements I made it was more
> >>> performant than doing a similar thing using an atomic_cas loop
> >>>
> >>> -- Tom
> >>>> -----Original Message-----
> >>>> From: Tom Rodriguez [mailto:tom.rodriguez at oracle.com]
> >>>> Sent: Monday, March 17, 2014 6:07 PM
> >>>> To: Deneau, Tom
> >>>> Cc: Douglas Simon; Lukas Stadler; graal-dev at openjdk.java.net<mailto:graal-dev at openjdk.java.net>
> >>>> Subject: Re: code optimized away before a deopt
> >>>>
> >>>> How is atomicGetAndAddTlabTop actually implemented?
> >>>>
> >>>> tom
> >>>>
> >>>> On Mar 17, 2014, at 2:28 PM, Deneau, Tom <tom.deneau at amd.com<mailto:tom.deneau at amd.com>>
> wrote:
> >>>>
> >>>>> I'll take a look at edenAllocate logic but I was trying to avoid
> >>>>> the
> >>>> while(true) loop which the compareAndSwap logic requires.
> >>>>>
> >>>>> -- Tom
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Tom Rodriguez [mailto:tom.rodriguez at oracle.com]
> >>>>>> Sent: Monday, March 17, 2014 4:18 PM
> >>>>>> To: Deneau, Tom
> >>>>>> Cc: Douglas Simon; Lukas Stadler; graal-dev at openjdk.java.net<mailto:graal-dev at openjdk.java.net>
> >>>>>> Subject: Re: code optimized away before a deopt
> >>>>>>
> >>>>>> So you are using a TLAB from multiple threads so you need use
> >>>>>> atomic's to actually do the allocation? Writing it using cleanup
> >>>>>> logic doesn't seem right. I think should be using something like
> >>>>>> the edenAllocate code in NewInstanceStub but against the TLAB.
> >>>>>> Then you don't need cleanup logic. Or you could just allocate
> >>>>>> directly from eden instead of using a TLAB.
> >>>>>>
> >>>>>> tom
> >>>>>>
> >>>>>> On Mar 17, 2014, at 1:40 PM, Deneau, Tom <tom.deneau at amd.com<mailto:tom.deneau at amd.com>>
> >> wrote:
> >>>>>>
> >>>>>>> I was just noticing that the condition mentioned below (first
> >>>>>> mentioned last November) still affects our snippet for
> >>>>>> newInstanceNode lowering. When I brought this up originally our
> >>>>>> DeoptimizeNode.deopt was not really implemented and was just
> >>>>>> making the workitem return early. Now, DeoptimizeNode.deopt is
> >>>>>> doing the necessary save state, etc but that doesn't change the
> >>>>>> fact that side-effecting nodes before the deopt will get thrown
> away.
> >>>>>>>
> >>>>>>> We have a workaround that seems to work for now (additional if
> >>>>>>> logic
> >>>>>> around the deopt, which happens to be always true but the
> >>>>>> compiler hasn't figured that out) but I wonder if that is a
> >>>>>> robust
> >> workaround.
> >>>>>> Is there already a bug filed for this, or should I file one?
> >>>>>>>
> >>>>>>> -- Tom
> >>>>>>>
> >>>>>>> Here's the workaround we have now...
> >>>>>>>
> >>>>>>> Object result;
> >>>>>>> Word thread = thread();
> >>>>>>> Word top = atomicGetAndAddTlabTop(thread, size);
> >>>>>>> Word end = readTlabEnd(thread);
> >>>>>>> Word newTop = top.add(size);
> >>>>>>> if (useTLAB() && probability(FAST_PATH_PROBABILITY,
> >>>>>> newTop.belowOrEqual(end))) {
> >>>>>>> // writeTlabTop(thread, newTop) was done by the
> >>>>>> atomicGetAndAdd
> >>>>>>> result = formatObject(hub, size, top, prototypeMarkWord,
> >>>>>> fillContents);
> >>>>>>> } else {
> >>>>>>> atomicGetAndAddTlabTop(thread, -size);
> >>>>>>> new_stub.inc();
> >>>>>>> // the if statement below should not be necessary, but
> >>>>>>> when
> >>>>>> I remove it,
> >>>>>>> // the atomicGetAndAddTlabTop(thread, -size) doesn't get
> >>>>>> generated.
> >>>>>>> if (!newTop.belowOrEqual(end)) {
> >>>>>>> DeoptimizeNode.deopt(DeoptimizationAction.None,
> >>>>>> DeoptimizationReason.RuntimeConstraint);
> >>>>>>> }
> >>>>>>> . . .
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Doug Simon [mailto:doug.simon at oracle.com]
> >>>>>>>> Sent: Saturday, November 23, 2013 10:24 AM
> >>>>>>>> To: Lukas Stadler
> >>>>>>>> Cc: Deneau, Tom; graal-dev at openjdk.java.net<mailto:graal-dev at openjdk.java.net>
> >>>>>>>> Subject: Re: code optimized away before a deopt
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Nov 23, 2013, at 5:18 PM, Lukas Stadler
> >>>>>>>> <lukas.stadler at jku.at<mailto:lukas.stadler at jku.at>>
> >>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> But it _is_ ok to remove side effecting nodes, because we know
> >>>>>>>>> they
> >>>>>>>> will be reelected.
> >>>>>>>>
> >>>>>>>> Yes, normally, but when you write this pattern in a snippet,
> >>>>>>>> then this won't be true right, since we don't resume execution
> >>>>>>>> in the bytecode of the snippet (yet!). That why I was thinking
> >>>>>>>> at least a warning would be a good idea.
> >>>>>>>>
> >>>>>>>> -Doug
> >>>>>>>>
> >>>>>>>>> Maybe the cleanup operations could be part of a special
> >>>>>>>>> purpose deopt
> >>>>>>>> node?
> >>>>>>>>>
> >>>>>>>>> - Lukas
> >>>>>>>>>
> >>>>>>>>> Von meinem iPhone gesendet
> >>>>>>>>>
> >>>>>>>>>> Am 23.11.2013 um 16:39 schrieb Doug Simon
> >>>> <doug.simon at oracle.com<mailto:doug.simon at oracle.com>>:
> >>>>>>>>>>
> >>>>>>>>>> This is done by the ConvertDeoptimizeToGuardPhase which
> >>>>>>>>>> replaces
> >>>>>>>> conditionals where one branch ends in a deopt with a GuardNode.
> >>>>>>>> This does indeed have the side effect of (silently) deleting
> >>>>>>>> all other nodes on that deopt-terminated branch. We should add
> >>>>>>>> some code to either make the deletion not silent or better,
> >>>>>>>> throw an error if these are any side- effecting nodes that will
> >>>>>>>> be
> >> deleted.
> >>>>>>>>>>
> >>>>>>>>>> -Doug
> >>>>>>>>>>
> >>>>>>>>>>> On Nov 23, 2013, at 1:58 AM, Deneau, Tom
> >>>>>>>>>>> <tom.deneau at amd.com<mailto:tom.deneau at amd.com>>
> >>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> I've noticed that if I have a snippet that does a test and
> >>>>>>>>>>> if the
> >>>>>>>> test fails, branches to a block that does some cleanup
> >>>>>>>> operations and then calls DeoptimizeNode.deopt(xxx, yyy), the
> >>>>>>>> cleanup code gets "optimized away". I guess this is related to
> >>>>>>>> what Gilles was talking about, maybe the cleanup operations
> >>>>>>>> were considered not state
> >>>>>> changing?
> >>>>>>>>>>>
> >>>>>>>>>>> In our current state of HSAIL backend, a deopt just returns
> >>>>>>>>>>> early
> >>>>>>>> from the kernel. Is there something I can do to make the
> >> cleanup
> >>>>>> code
> >>>>>>>> get emitted before the deopt?
> >>>>>>>>>>>
> >>>>>>>>>>> -- Tom
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
> >
>
More information about the graal-dev
mailing list