Suspicions about GC and HSAIL Deopt

Deneau, Tom tom.deneau at amd.com
Mon Mar 10 17:10:12 UTC 2014


Ah, I was worried about the (admittedly small) window between entering the special deopting method and getting those values safely into register/stack slots, but now I realize there are no safepoints in that window (I hope), so no GC can happen.

-- Tom

> -----Original Message-----
> From: Doug Simon [mailto:doug.simon at oracle.com]
> Sent: Monday, March 10, 2014 12:03 PM
> To: Deneau, Tom
> Cc: Gilles Duboscq; sumatra-dev at openjdk.java.net; graal-dev at openjdk.java.net
> Subject: Re: suspicions about GC and HSAIL Deopt
> 
> It's based on my understanding of what the special deopting method
> does, which is something like:
> 
> void deoptFromHSAIL(int id, HSAILFrame frame) {
>    if (id == 0) {
>        // copy info out of frame into registers/stack slots
>        Deoptimize();
>    } else if (id == 1) {
>        // copy info out of frame into registers/stack slots
>        Deoptimize();
>    } else if ...
> }
> 
> Gilles can confirm/correct.
> 
> -Doug
> 
> On Mar 10, 2014, at 5:53 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
> 
> > Gilles, Doug --
> >
> > I was wondering about this statement Doug made...
> >
> > This javaCall is to the special deopting nmethod if I understand
> > correctly. And the save state area is used solely as input to a
> > deopt instruction, in which case there is no possibility of a GC
> > between entering the javaCall and hitting the deopt instruction,
> > by which time all oops have been copied from the save state area
> > (i.e., the hsailFrame) to slots in the special deopting method's
> > frame.
> >
> > Is it true there is no possibility of GC between entering the
> > nmethod and hitting the deopt call/instruction?  How is that
> > prevented?
> >
> > -- Tom
> >
> >> -----Original Message-----
> >> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of
> >> Gilles Duboscq
> >> Sent: Monday, March 10, 2014 10:14 AM
> >> To: Deneau, Tom
> >> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net
> >> Subject: Re: suspicions about GC and HSAIL Deopt
> >>
> >> Ok, sounds good
> >>
> >> On Mon, Mar 10, 2014 at 2:28 PM, Tom Deneau <tom.deneau at amd.com> wrote:
> >>> Gilles --
> >>>
> >>> Update on this...
> >>>
> >>> Yes, I put in the code to save the oops maps, currently somewhat
> >>> simplified in that only hsail $d registers can have oops and we are
> >>> not saving stack slots yet.
> >>>
> >>> Using that I implemented a quickie solution that copied the detected
> >>> oops into a regular Java Object array before the first deopt, then
> >>> reloaded them into the particular frame before each deopt.  Logging
> >>> code did show that there were times when the original value of the
> >>> oop had changed to a new value, and we no longer hit our spurious
> >>> failures.  I'm sure it's inefficient compared to an oops_do
> >>> approach, but it did seem to work.
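> >>>
> >>> In pseudo-form the quickie fix looks something like this (just a
> >>> sketch, not the actual webrev code; helper names like
> >>> save_area_oop_at are made up):
> >>>
> >>>   // While still in thread_in_vm mode, before the first javaCall:
> >>>   // park the detected oops in a Java Object[] reachable through a
> >>>   // Handle, so GC can find (and update) them across the javaCalls.
> >>>   JavaThread* thread = JavaThread::current();
> >>>   objArrayOop arr = oopFactory::new_objectArray(num_saved_oops, CHECK);
> >>>   objArrayHandle saved(thread, arr);
> >>>   for (int i = 0; i < num_saved_oops; i++) {
> >>>     saved->obj_at_put(i, save_area_oop_at(i));
> >>>   }
> >>>
> >>>   // Then, before each javaCall for a deopting workitem, write the
> >>>   // (possibly moved) oops back into that workitem's hsail frame.
> >>>   for (int i = 0; i < num_frame_oops(frame); i++) {
> >>>     frame_set_oop_at(frame, i, saved->obj_at(oop_index(frame, i)));
> >>>   }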
> >>>
> >>> I will probably submit the webrev with this quickie solution and we
> >>> can discuss how to make it use oops_do.
> >>>
> >>> -- Tom
> >>>
> >>>> -----Original Message-----
> >>>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of
> >>>> Gilles Duboscq
> >>>> Sent: Monday, March 10, 2014 7:58 AM
> >>>> To: Deneau, Tom
> >>>> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net
> >>>> Subject: Re: suspicions about GC and HSAIL Deopt
> >>>>
> >>>> Using Handle and restoring the value should work. In the long term
> >>>> we may want to just have an oops_do on the save area and hook into
> >>>> JavaThread::oops_do.
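> >>>>
> >>>> Roughly like this (a sketch only; OopClosure and
> >>>> JavaThread::oops_do are the real HotSpot pieces, the save-area
> >>>> types and names here are hypothetical):
> >>>>
> >>>>   // Let GC visit (and update in place) every oop slot in the
> >>>>   // deopt save areas.
> >>>>   void HSAILDeoptSaveStates::oops_do(OopClosure* f) {
> >>>>     for (int w = 0; w < _num_save_areas; w++) {
> >>>>       HSAILFrame* frame = save_area_at(w);
> >>>>       // which slots hold oops would come from the HSAIL oop maps
> >>>>       for (int i = 0; i < frame->num_oop_slots(); i++) {
> >>>>         f->do_oop(frame->oop_slot_addr(i));
> >>>>       }
> >>>>     }
> >>>>   }
> >>>>
> >>>>   // ... called from JavaThread::oops_do alongside the thread's
> >>>>   // other root scanning.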
> >>>>
> >>>> However, even with the Handles version you need "oop maps" for the
> >>>> save areas. It shouldn't be very hard to extract them from the
> >>>> HSAIL compilation, but currently they are just thrown away.
> >>>>
> >>>> -Gilles
> >>>>
> >>>> On Fri, Mar 7, 2014 at 2:21 PM, Doug Simon <doug.simon at oracle.com>
> >>>> wrote:
> >>>>>
> >>>>> On Mar 7, 2014, at 1:52 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
> >>>>>
> >>>>>> Doug --
> >>>>>>
> >>>>>> Regarding your handle-based solution...
> >>>>>>
> >>>>>> Would it be sufficient to convert all the saved oops (in all the
> >>>>>> workitem saved state areas) to Handles before the first javaCall
> >>>>>> (while we are still in thread_in_vm mode), and then before each
> >>>>>> javaCall just convert back the one save area that is being used
> >>>>>> in that javaCall?
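> >>>>>>
> >>>>>> i.e., something like this (a sketch only; Handle and
> >>>>>> GrowableArray are the real HotSpot types, the save-area helpers
> >>>>>> are made-up names):
> >>>>>>
> >>>>>>   // Still thread_in_vm: wrap every saved oop in a Handle so GC
> >>>>>>   // will keep our copy up to date if the object moves.
> >>>>>>   GrowableArray<Handle>* saved = new GrowableArray<Handle>();
> >>>>>>   for (int w = 0; w < num_save_areas; w++) {
> >>>>>>     for (int i = 0; i < num_oops_in(w); i++) {
> >>>>>>       saved->append(Handle(THREAD, save_area_oop_at(w, i)));
> >>>>>>     }
> >>>>>>   }
> >>>>>>
> >>>>>>   // Just before the javaCall that consumes save area w, write
> >>>>>>   // the (possibly updated) oops back into that one frame.
> >>>>>>   for (int i = 0; i < num_oops_in(w); i++) {
> >>>>>>     set_save_area_oop_at(w, i, saved->at(index_of(w, i))());
> >>>>>>   }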
> >>>>>
> >>>>> This javaCall is to the special deopting nmethod if I understand
> >>>>> correctly. And the save state area is used solely as input to a
> >>>>> deopt instruction, in which case there is no possibility of a GC
> >>>>> between entering the javaCall and hitting the deopt instruction,
> >>>>> by which time all oops have been copied from the save state area
> >>>>> (i.e., the hsailFrame) to slots in the special deopting method's
> >>>>> frame. At that point, the oops in the save state area are dead,
> >>>>> and standard GC root scanning knows where to find their copies.
> >>>>> If this is all correct, then your suggestion should work.
> >>>>>
> >>>>> -Doug
> >>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Doug Simon [mailto:doug.simon at oracle.com]
> >>>>>>> Sent: Friday, March 07, 2014 4:27 AM
> >>>>>>> To: Deneau, Tom
> >>>>>>> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net
> >>>>>>> Subject: Re: suspicions about GC and HSAIL Deopt
> >>>>>>>
> >>>>>>> On Mar 7, 2014, at 12:30 AM, Deneau, Tom <tom.deneau at amd.com> wrote:
> >>>>>>>
> >>>>>>>> While preparing this webrev for the hsail deoptimization work
> >>>>>>>> we've been doing, I noticed some spurious failures when we run
> >>>>>>>> on HSA hardware.  I have a theory of what's happening, let me
> >>>>>>>> know if this makes sense...
> >>>>>>>>
> >>>>>>>> First the big overview:
> >>>>>>>>
> >>>>>>>> When we run a kernel and it returns from the GPU, each workitem
> >>>>>>>> can be in one of 3 states:
> >>>>>>>>
> >>>>>>>> a) finished normally
> >>>>>>>>
> >>>>>>>> b) deopted and saved its state (and set the deopt-happened flag)
> >>>>>>>>
> >>>>>>>> c) on entry, saw deopt-happened=true and so just exited early
> >>>>>>>>    without running.
> >>>>>>>>
> >>>>>>>> This last one exists because we don't want to have to allocate
> >>>>>>>> enough deopt save space so that each workitem has its own
> >>>>>>>> unique save space.  Instead we only allocate enough for the
> >>>>>>>> number of concurrent workitems possible.
> >>>>>>>>
> >>>>>>>> When we return from the GPU, if one or more workitems deopted
> >>>>>>>> we:
> >>>>>>>>
> >>>>>>>> a) for the workitems that finished normally, there is nothing
> >>>>>>>>    to do
> >>>>>>>>
> >>>>>>>> b) for each deopted workitem, we want to run it thru the
> >>>>>>>>    interpreter, going first thru the special host trampoline
> >>>>>>>>    code infrastructure that Gilles created.  The trampoline
> >>>>>>>>    host code takes a deoptId (sort of like a pc, telling where
> >>>>>>>>    the deopt occurred in the hsail code) and a pointer to the
> >>>>>>>>    saved hsail frame.  We currently do this sequentially,
> >>>>>>>>    although other policies are possible.
> >>>>>>>>
> >>>>>>>> c) for each never-ran workitem, we can just run it from the
> >>>>>>>>    beginning of the kernel "method", just making sure we pass
> >>>>>>>>    the arguments and the appropriate workitem id for each one.
> >>>>>>>>    Again, we currently do this sequentially, although other
> >>>>>>>>    policies are possible.  (See the sketch below.)
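> >>>>>>>>
> >>>>>>>> In outline, the host-side loop is something like this (purely
> >>>>>>>> a sketch; the state names and helpers are made up):
> >>>>>>>>
> >>>>>>>>   for (int wi = 0; wi < num_workitems; wi++) {
> >>>>>>>>     switch (state_of(wi)) {
> >>>>>>>>       case FINISHED_NORMALLY:
> >>>>>>>>         break;                    // nothing to do
> >>>>>>>>       case DEOPTED:
> >>>>>>>>         // javaCall into the trampoline with the deoptId and
> >>>>>>>>         // a pointer to this workitem's saved hsail frame
> >>>>>>>>         call_deopt_trampoline(deopt_id(wi), saved_frame(wi));
> >>>>>>>>         break;
> >>>>>>>>       case NEVER_RAN:
> >>>>>>>>         // javaCall the kernel method from the beginning,
> >>>>>>>>         // passing the original args and this workitem's id
> >>>>>>>>         call_kernel_method(kernel_args, wi);
> >>>>>>>>         break;
> >>>>>>>>     }
> >>>>>>>>   }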
> >>>>>>>>
> >>>>>>>> When we enter the JVM to run the kernel, we transition to
> >>>>>>>> thread_in_vm mode.  So while running on the GPU, no oops are
> >>>>>>>> moving (although of course GCs may be delayed).
> >>>>>>>>
> >>>>>>>> When we start looking for workitems of type b or c above, we
> >>>>>>>> are still in thread_in_vm mode.  However, since both b and c
> >>>>>>>> above use the javaCall infrastructure, I believe they are
> >>>>>>>> transitioning to thread_in_java mode on each call, and oops
> >>>>>>>> can move.
> >>>>>>>>
> >>>>>>>> So if, for instance, there are two deopting workitems, it is
> >>>>>>>> possible that after executing the first one, the saved deopt
> >>>>>>>> state for the second one is no longer valid.
> >>>>>>>>
> >>>>>>>> The junit tests on which I have seen the spurious failures are
> >>>>>>>> ones where lots of workitems deopt.  When run in the hotspot
> >>>>>>>> debug build, we usually see SEGVs in interpreter code, and the
> >>>>>>>> access is always to 0xbaadbabe.
> >>>>>>>>
> >>>>>>>> Note that when Gilles was developing his infrastructure, the
> >>>>>>>> only test cases we had all had a single workitem deopting, so
> >>>>>>>> they would not show this.  Also, even with multi-deopting test
> >>>>>>>> cases, I believe the reason we don't see this on the simulator
> >>>>>>>> is that the concurrency is much less there, so the number of
> >>>>>>>> workitems of type b) above will be much smaller.  On hardware,
> >>>>>>>> we can have thousands of workitems deopting.
> >>>>>>>>
> >>>>>>>> I suppose the solution to this is to mark any oops in the
> >>>>>>>> deopt saved state in some way that GC can find them and fix
> >>>>>>>> them.  What is the best way to do this?
> >>>>>>>
> >>>>>>> I'm not sure it's the most optimal solution, but around each
> >>>>>>> javaCall you could convert each saved oop to a Handle and
> >>>>>>> convert it back after the call. I'm not aware of other
> >>>>>>> mechanisms in HotSpot for registering GC roots, but that
> >>>>>>> doesn't mean they don't exist.
> >>>>>>>
> >>>>>>>> Or is there any way to execute javaCalls from thread_in_vm
> >>>>>>>> mode without allowing GCs to happen?
> >>>>>>>
> >>>>>>> You are calling arbitrary Java code, right? That means you
> >>>>>>> cannot guarantee allocation won't be performed, which in turn
> >>>>>>> means you cannot disable GC (even though there are mechanisms
> >>>>>>> for doing so, like
> >>>>>>> GC_locker::lock_critical/GC_locker::unlock_critical).
> >>>>>>>
> >>>>>>> -Doug




More information about the sumatra-dev mailing list