suspicions about GC and HSAIL Deopt

Doug Simon doug.simon at oracle.com
Mon Mar 10 17:03:12 UTC 2014


It’s based on my understanding of what the special deopting method does, which is something like:

void deoptFromHSAIL(int id, HSAILFrame frame) {
    if (id == 0) {
        // copy info out of frame into registers/stack slots for deopt point 0
        Deoptimize();
    } else if (id == 1) {
        // copy info out of frame into registers/stack slots for deopt point 1
        Deoptimize();
    } else if (...) {
        // ...and so on, one branch per possible deopt point in the kernel
    }
}

Gilles can confirm/correct.

-Doug

On Mar 10, 2014, at 5:53 PM, Deneau, Tom <tom.deneau at amd.com> wrote:

> Gilles, Doug --
> 
> I was wondering about this statement Doug made...
> 
> This javaCall is to the special deopting nmethod if I understand correctly. And the save state area is used solely as input to a deopt instruction in which case there is no possibility of a GC between entering the javaCall and hitting the deopt instruction by which time all oops have been copied from the save state area (i.e., the hsailFrame) to slots in the special deopting method’s frame.
> 
> Is it true there is no possibility of GC between entering the nmethod and hitting the deopt call/instruction?  How is that prevented?
> 
> -- Tom
> 
> 
>> -----Original Message-----
>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of Gilles Duboscq
>> Sent: Monday, March 10, 2014 10:14 AM
>> To: Deneau, Tom
>> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net
>> Subject: Re: suspicions about GC and HSAIL Deopt
>> 
>> Ok, sounds good
>> 
>> On Mon, Mar 10, 2014 at 2:28 PM, Tom Deneau <tom.deneau at amd.com> wrote:
> 
>>> Gilles --
>>> 
>>> Update on this...
>>> 
>>> Yes, I put in the code to save the oops maps, currently somewhat simplified in that only hsail $d registers can have oops and we are not saving stack slots yet.
>>> 
>>> Using that I implemented a quickie solution that copied the detected oops into a regular java Object array before the first deopt, then reloaded them into the particular frame before each deopt.  Logging code did show that there were times when the original value of the oop had changed to a new value, and we no longer hit our spurious failures.  I'm sure it's inefficient when compared to an oops_do approach, but it did seem to work.
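
A rough sketch of what this copy-out/copy-in could look like in HotSpot C++ terms; HSAILFrame, NUM_D_REGS and the register accessors are illustrative stand-ins, not the actual webrev code:

    // Sketch only: park the oops named by the (simplified) oop map in a Java
    // Object[].  The array is reachable via a Handle, so the GC updates its
    // elements when objects move; the raw slots in the saved hsail frames are
    // not scanned, so they must be rewritten before each deopt javaCall.
    void save_frame_oops(objArrayHandle saved, HSAILFrame* frame, int frame_index) {
      for (int i = 0; i < NUM_D_REGS; i++) {
        if (frame->d_reg_holds_oop(i)) {            // query the saved oop map
          saved->obj_at_put(frame_index * NUM_D_REGS + i, frame->get_d_reg_oop(i));
        }
      }
    }

    void restore_frame_oops(objArrayHandle saved, HSAILFrame* frame, int frame_index) {
      for (int i = 0; i < NUM_D_REGS; i++) {
        if (frame->d_reg_holds_oop(i)) {
          // write the possibly-relocated oop back before this frame's deopt
          frame->set_d_reg_oop(i, saved->obj_at(frame_index * NUM_D_REGS + i));
        }
      }
    }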
> 
>>> 
>>> I will probably submit the webrev with this quickie solution and we can discuss how to make it use oops_do.
>>> 
>>> -- Tom
>>> 
> 
>>>> -----Original Message-----
>>>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of Gilles Duboscq
>>>> Sent: Monday, March 10, 2014 7:58 AM
>>>> To: Deneau, Tom
>>>> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net
>>>> Subject: Re: suspicions about GC and HSAIL Deopt
> 
>>>> 
>>>> Using Handle and restoring the value should work. In the long term we may want to just have an oops_do on the save area and hook into JavaThread::oops_do.
>>>> 
>>>> However, even with the Handles version you need "oop maps" for the save areas. It shouldn't be very hard to extract them from the HSAIL compilation, but currently they are just thrown away.
> 
>>>> 
>>>> -Gilles
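
For the record, the longer-term oops_do hook Gilles mentions might look roughly like this; HSAILDeoptSaveArea and the frame accessors are made-up names, and the real change would also need a call from JavaThread::oops_do:

    // Sketch only: let the GC fix the saved oops in place by walking the
    // save areas during the thread's normal root scan.
    void HSAILDeoptSaveArea::oops_do(OopClosure* f) {
      for (int w = 0; w < _num_saved_frames; w++) {
        HSAILFrame* frame = frame_at(w);
        for (int i = 0; i < NUM_D_REGS; i++) {
          if (frame->d_reg_holds_oop(i)) {   // from the extracted oop map
            f->do_oop(frame->d_reg_addr(i)); // GC updates the slot in place
          }
        }
      }
    }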
> 
>>>> 
>>>> On Fri, Mar 7, 2014 at 2:21 PM, Doug Simon <doug.simon at oracle.com> wrote:
>>>>> 
>>>>> On Mar 7, 2014, at 1:52 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
>>>>> 
> 
>>>>>> Doug --
>>>>>> 
>>>>>> Regarding your handle-based solution...
>>>>>> 
>>>>>> Would it be sufficient to convert all the saved oops (in all the workitem saved state areas) to Handles before the first javaCall (while we are still in thread_in_vm mode), and then before each javaCall just convert back the one save area that is being used in that javaCall?
> 
>>>>> 
>>>>> This javaCall is to the special deopting nmethod if I understand correctly. And the save state area is used solely as input to a deopt instruction in which case there is no possibility of a GC between entering the javaCall and hitting the deopt instruction by which time all oops have been copied from the save state area (i.e., the hsailFrame) to slots in the special deopting method’s frame. At that point, the oops in the save state area are dead and standard GC root scanning knows where to find their copies. If this is all correct, then your suggestion should work.
>>>>> 
>>>>> -Doug
>>>>> 
> 
>>>>>>> -----Original Message-----
>>>>>>> From: Doug Simon [mailto:doug.simon at oracle.com]
>>>>>>> Sent: Friday, March 07, 2014 4:27 AM
>>>>>>> To: Deneau, Tom
>>>>>>> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net
>>>>>>> Subject: Re: suspicions about GC and HSAIL Deopt
>>>>>>> 
>>>>>>> On Mar 7, 2014, at 12:30 AM, Deneau, Tom <tom.deneau at amd.com> wrote:
>>>>>>> 
> 
>>>>>>>> While preparing this webrev for the hsail deoptimization work we've been doing, I noticed some spurious failures when we run on HSA hardware.  I have a theory of what's happening, let me know if this makes sense...
>>>>>>>> 
>>>>>>>> First the big overview:
>>>>>>>> 
>>>>>>>> When we run a kernel and it returns from the GPU, each workitem can be in one of 3 states:
>>>>>>>> 
>>>>>>>> a) finished normally
>>>>>>>> b) deopted and saved its state (and set the deopt-happened flag)
>>>>>>>> c) on entry, saw deopt-happened=true and so just exited early without running.
>>>>>>>> 
>>>>>>>> This last one exists because we don't want to have to allocate enough deopt save space so that each workitem has its own unique save space.  Instead we only allocate enough for the number of concurrent workitems possible.
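
In rough pseudocode, that per-workitem protocol would look something like the following. This is only a sketch: the names (kernel_wrapper, DeoptInfo, claim_save_slot) are illustrative, the real logic is emitted as HSAIL, and the save-slot claim has to be atomic.

    // Sketch of the per-workitem entry/exit protocol around the kernel body.
    void kernel_wrapper(int workitem_id, DeoptInfo* deopt) {
      if (deopt->deopt_happened) {
        mark_never_ran(deopt, workitem_id);        // state (c): exit early
        return;
      }
      // ... kernel body runs; if it hits a deopt condition:
      //   int slot = claim_save_slot(deopt);      // one slot per concurrent workitem
      //   save_hsail_frame(deopt, slot, workitem_id, ...);
      //   deopt->deopt_happened = true;           // state (b)
      // otherwise the workitem finishes normally  // state (a)
    }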
> 
>>>>>>>> 
>>>>>>>> When we return from the GPU, if one or more workitems deopted we:
>>>>>>>> 
>>>>>>>> a) for the workitems that finished normally, there is nothing to do
>>>>>>>> 
>>>>>>>> b) for each deopted workitem, we want to run it thru the interpreter going first thru the special host trampoline code infrastructure that Gilles created.  The trampoline host code takes a deoptId (sort of like a pc, telling where the deopt occurred in the hsail code) and a pointer to the saved hsail frame.  We currently do this sequentially although other policies are possible.
>>>>>>>> 
>>>>>>>> c) for each never-ran workitem, we can just run it from the beginning of the kernel "method", just making sure we pass the arguments and the appropriate workitem id for each one.  Again, we currently do this sequentially although other policies are possible.
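
In outline, the host side then does something like the sketch below; state_of, run_trampoline and run_kernel_from_start are illustrative stand-ins for Gilles' trampoline infrastructure, not actual function names.

    // Sketch of the sequential post-kernel dispatch on the host.
    for (int w = 0; w < num_workitems; w++) {
      switch (state_of(w)) {
        case FINISHED_NORMALLY:                     // (a) nothing to do
          break;
        case DEOPTED:                               // (b) javaCall the host trampoline
          run_trampoline(deopt_id_of(w),            //     with the deoptId and a pointer
                         saved_hsail_frame_of(w));  //     to the saved hsail frame
          break;
        case NEVER_RAN:                             // (c) javaCall the kernel "method"
          run_kernel_from_start(kernel_args, w);    //     from the start for this id
          break;
      }
    }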
> 
>>>>>>>> 
>>>>>>>> When we enter the JVM to run the kernel, we transition to thread_in_vm mode.  So while running on the GPU, no oops are moving (although of course GCs may be delayed).
>>>>>>>> 
>>>>>>>> When we start looking for workitems of type b or c above, we are still in thread_in_vm mode.  However, since both b and c above use the javaCall infrastructure, I believe they are transitioning to thread_in_java mode on each call, and oops can move.
>>>>>>>> 
>>>>>>>> So if for instance there are two deopting workitems, it is possible that after executing the first one, the saved deopt state for the second one is no longer valid.
>>>>>>>> 
>>>>>>>> The junit tests on which I have seen the spurious failures are ones where lots of workitems deopt.  When run in the hotspot debug build, we usually see SEGVs in interpreter code and the access is always to 0xbaadbabe.
>>>>>>>> 
>>>>>>>> Note that when Gilles was developing his infrastructure, the only test cases we had all had a single workitem deopting, so they would not show this.  Also, even with multi-deopting test cases, I believe the reason we don't see this on the simulator is that the concurrency is much lower there, so the number of workitems of type b) above will be much smaller.  On hardware, we can have thousands of workitems deopting.
>>>>>>>> 
>>>>>>>> I suppose the solution to this is to mark any oops in the deopt saved state in some way that GC can find them and fix them.  What is the best way to do this?
> 
>>>>>>> 
>>>>>>> I'm not sure it's the optimal solution, but around each javaCall you could convert each saved oop to a Handle and convert it back after the call. I'm not aware of other mechanisms in HotSpot for registering GC roots, but that doesn't mean they don't exist.
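
A minimal sketch of that Handle dance, assuming the same illustrative frame accessors as in the sketches above (Handle, HandleMark and GrowableArray are real HotSpot types; the rest is made up):

    // Sketch: wrap each saved oop in a Handle (a GC root) before the javaCall,
    // then copy the possibly-moved values back into the raw save area.
    HandleMark hm(thread);
    GrowableArray<Handle> handles;
    for (int i = 0; i < NUM_D_REGS; i++) {
      if (frame->d_reg_holds_oop(i)) {
        handles.append(Handle(thread, frame->get_d_reg_oop(i)));
      }
    }
    // ... javaCall to the deopting nmethod; a GC here may move the objects ...
    int h = 0;
    for (int i = 0; i < NUM_D_REGS; i++) {
      if (frame->d_reg_holds_oop(i)) {
        frame->set_d_reg_oop(i, handles.at(h++)());  // Handle::operator() yields the current oop
      }
    }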
> 
>>>>>>> 
>>>>>>>> Or is there any way to execute javaCalls from thread_in_vm mode without allowing GCs to happen?
>>>>>>> 
>>>>>>> You are calling arbitrary Java code, right? That means you cannot guarantee allocation won't be performed, which in turn means you cannot disable GC (even though there are mechanisms for doing so, like GC_locker::lock_critical/GC_locker::unlock_critical).
>>>>>>> 
>>>>>>> -Doug
> 


