code optimized away before a deopt

Tue Mar 18 17:01:33 UTC 2014

After looking at little bit more, I think CompareAndSwapNode would be a better model which behaves as a memory checkpoint which includes the state split stuff.  Chris Thalinger may be adding a node which represents lock xadd to support Unsafe.getAndSet which will probably have the semantics you want.  You could probably use that once it lands.

tom

On Mar 18, 2014, at 9:49 AM, Deneau, Tom <tom.deneau at amd.com> wrote:

> OK, thanks, that makes sense.
> 
>> -----Original Message-----
>> From: Tom Rodriguez [mailto:tom.rodriguez at oracle.com]
>> Sent: Tuesday, March 18, 2014 11:42 AM
>> To: Deneau, Tom
>> Cc: Douglas Simon; Lukas Stadler; graal-dev at openjdk.java.net
>> Subject: Re: code optimized away before a deopt
>> 
>> So it's a custom node of some sort?  I think you probably need make sure
>> it behaves like a WriteNode by implementing StateSplit and having
>> hasSideEffect return true.  The state can be null.
>> 
>> tom
>> 
>> On Mar 18, 2014, at 7:57 AM, Deneau, Tom <tom.deneau at amd.com> wrote:
>> 
>>> Tom --
>>> 
>>> It generates an HSAIL instruction atomic_add_global (similar to lock
>>> xadd in x86).  I don't know for sure what that becomes in the gpu
>>> Instructions.  In rough perf measurements I made it was more
>>> performant than doing a similar thing using an atomic_cas loop
>>> 
>>> -- Tom
>>>> -----Original Message-----
>>>> From: Tom Rodriguez [mailto:tom.rodriguez at oracle.com]
>>>> Sent: Monday, March 17, 2014 6:07 PM
>>>> To: Deneau, Tom
>>>> Cc: Douglas Simon; Lukas Stadler; graal-dev at openjdk.java.net
>>>> Subject: Re: code optimized away before a deopt
>>>> 
>>>> How is atomicGetAndAddTlabTop actually implemented?
>>>> 
>>>> tom
>>>> 
>>>> On Mar 17, 2014, at 2:28 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
>>>> 
>>>>> I'll take a look at edenAllocate logic but I was trying to avoid the
>>>> while(true) loop which the compareAndSwap logic requires.
>>>>> 
>>>>> -- Tom
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Tom Rodriguez [mailto:tom.rodriguez at oracle.com]
>>>>>> Sent: Monday, March 17, 2014 4:18 PM
>>>>>> To: Deneau, Tom
>>>>>> Cc: Douglas Simon; Lukas Stadler; graal-dev at openjdk.java.net
>>>>>> Subject: Re: code optimized away before a deopt
>>>>>> 
>>>>>> So you are using a TLAB from multiple threads so you need use
>>>>>> atomic's to actually do the allocation?  Writing it using cleanup
>>>>>> logic doesn't seem right. I think should be using something like
>>>>>> the edenAllocate code in NewInstanceStub but against the TLAB.
>>>>>> Then you don't need cleanup logic.  Or you could just allocate
>>>>>> directly from eden instead of using a TLAB.
>>>>>> 
>>>>>> tom
>>>>>> 
>>>>>> On Mar 17, 2014, at 1:40 PM, Deneau, Tom <tom.deneau at amd.com>
>> wrote:
>>>>>> 
>>>>>>> I was just noticing that the condition mentioned below (first
>>>>>> mentioned last November) still affects our snippet for
>>>>>> newInstanceNode lowering.  When I brought this up originally our
>>>>>> DeoptimizeNode.deopt was not really implemented and was just making
>>>>>> the workitem return early.  Now, DeoptimizeNode.deopt is doing the
>>>>>> necessary save state, etc but that doesn't change the fact that
>>>>>> side-effecting nodes before the deopt will get thrown away.
>>>>>>> 
>>>>>>> We have a workaround that seems to work for now (additional if
>>>>>>> logic
>>>>>> around the deopt, which happens to be always true but the compiler
>>>>>> hasn't figured that out) but I wonder if that is a robust
>> workaround.
>>>>>> Is there already a bug filed for this, or should I file one?
>>>>>>> 
>>>>>>> -- Tom
>>>>>>> 
>>>>>>> Here's the workaround we have now...
>>>>>>> 
>>>>>>>     Object result;
>>>>>>>     Word thread = thread();
>>>>>>>     Word top = atomicGetAndAddTlabTop(thread, size);
>>>>>>>     Word end = readTlabEnd(thread);
>>>>>>>     Word newTop = top.add(size);
>>>>>>>     if (useTLAB() && probability(FAST_PATH_PROBABILITY,
>>>>>> newTop.belowOrEqual(end))) {
>>>>>>>         // writeTlabTop(thread, newTop) was done by the
>>>>>> atomicGetAndAdd
>>>>>>>         result = formatObject(hub, size, top, prototypeMarkWord,
>>>>>> fillContents);
>>>>>>>     } else {
>>>>>>>         atomicGetAndAddTlabTop(thread, -size);
>>>>>>>         new_stub.inc();
>>>>>>>         // the if statement below should not be necessary, but
>>>>>>> when
>>>>>> I remove it,
>>>>>>>         // the atomicGetAndAddTlabTop(thread, -size) doesn't get
>>>>>> generated.
>>>>>>>         if (!newTop.belowOrEqual(end)) {
>>>>>>>             DeoptimizeNode.deopt(DeoptimizationAction.None,
>>>>>> DeoptimizationReason.RuntimeConstraint);
>>>>>>>         }
>>>>>>>    . . .
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: Doug Simon [mailto:doug.simon at oracle.com]
>>>>>>>> Sent: Saturday, November 23, 2013 10:24 AM
>>>>>>>> To: Lukas Stadler
>>>>>>>> Cc: Deneau, Tom; graal-dev at openjdk.java.net
>>>>>>>> Subject: Re: code optimized away before a deopt
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Nov 23, 2013, at 5:18 PM, Lukas Stadler <lukas.stadler at jku.at>
>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> But it _is_ ok to remove side effecting nodes, because we know
>>>>>>>>> they
>>>>>>>> will be reelected.
>>>>>>>> 
>>>>>>>> Yes, normally, but when you write this pattern in a snippet, then
>>>>>>>> this won't be true right, since we don't resume execution in the
>>>>>>>> bytecode of the snippet (yet!). That why I was thinking at least
>>>>>>>> a warning would be a good idea.
>>>>>>>> 
>>>>>>>> -Doug
>>>>>>>> 
>>>>>>>>> Maybe the cleanup operations could be part of a special purpose
>>>>>>>>> deopt
>>>>>>>> node?
>>>>>>>>> 
>>>>>>>>> - Lukas
>>>>>>>>> 
>>>>>>>>> Von meinem iPhone gesendet
>>>>>>>>> 
>>>>>>>>>> Am 23.11.2013 um 16:39 schrieb Doug Simon
>>>> <doug.simon at oracle.com>:
>>>>>>>>>> 
>>>>>>>>>> This is done by the ConvertDeoptimizeToGuardPhase which
>>>>>>>>>> replaces
>>>>>>>> conditionals where one branch ends in a deopt with a GuardNode.
>>>>>>>> This does indeed have the side effect of (silently) deleting all
>>>>>>>> other nodes on that deopt-terminated branch. We should add some
>>>>>>>> code to either make the deletion not silent or better, throw an
>>>>>>>> error if these are any side- effecting nodes that will be
>> deleted.
>>>>>>>>>> 
>>>>>>>>>> -Doug
>>>>>>>>>> 
>>>>>>>>>>> On Nov 23, 2013, at 1:58 AM, Deneau, Tom <tom.deneau at amd.com>
>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> I've noticed that if I have a snippet that does a test and if
>>>>>>>>>>> the
>>>>>>>> test fails, branches to a block that does some cleanup operations
>>>>>>>> and then calls DeoptimizeNode.deopt(xxx, yyy), the cleanup code
>>>>>>>> gets "optimized away".  I guess this is related to what Gilles
>>>>>>>> was talking about, maybe the cleanup operations were considered
>>>>>>>> not state
>>>>>> changing?
>>>>>>>>>>> 
>>>>>>>>>>> In our current state of HSAIL backend, a deopt just returns
>>>>>>>>>>> early
>>>>>>>> from the kernel.   Is there something I can do to make the
>> cleanup
>>>>>> code
>>>>>>>> get emitted before the deopt?
>>>>>>>>>>> 
>>>>>>>>>>> -- Tom
>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
> 
>