suspicions about GC and HSAIL Deopt

Deneau, Tom tom.deneau at amd.com
Mon Mar 10 13:28:48 UTC 2014


Gilles --

Update on this...

Yes, I put in the code to save the oop maps; it is currently somewhat simplified in that only hsail $d registers can hold oops and we are not saving stack slots yet.

Using that, I implemented a quickie solution that copied the detected oops into a regular Java Object array before the first deopt, then reloaded them into the particular frame before each deopt.  Logging did show that there were times when the original value of an oop had changed to a new value, and we no longer hit our spurious failures.  I'm sure it's inefficient compared to an oops_do approach, but it did seem to work.
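
The shape of it is roughly the following -- a minimal sketch only, with
hypothetical names (HSAILFrameSaveArea and its accessors) standing in for
whatever the real save-area code uses:

  // Before the first javaCall: copy every oop named by the oop maps into a
  // Java Object[] held through an objArrayHandle, so GC updates the copies.
  objArrayHandle copy_out_saved_oops(GrowableArray<HSAILFrameSaveArea*>* frames, TRAPS) {
    int total = 0;
    for (int i = 0; i < frames->length(); i++) {
      total += frames->at(i)->num_saved_oops();              // hypothetical
    }
    objArrayOop arr = oopFactory::new_objectArray(total, CHECK_(objArrayHandle()));
    objArrayHandle arr_h(THREAD, arr);
    int idx = 0;
    for (int i = 0; i < frames->length(); i++) {
      HSAILFrameSaveArea* f = frames->at(i);
      for (int r = 0; r < f->num_d_registers(); r++) {       // hypothetical
        if (f->oop_map()->is_oop(r)) {                       // hypothetical
          arr_h->obj_at_put(idx++, f->d_register_value_as_oop(r));
        }
      }
    }
    return arr_h;
  }

  // Before each deopt javaCall: write the (possibly relocated) oops from the
  // array back into the one save area that call is about to consume.
  void copy_back_saved_oops(objArrayHandle arr_h, HSAILFrameSaveArea* f, int start_idx) {
    int idx = start_idx;
    for (int r = 0; r < f->num_d_registers(); r++) {
      if (f->oop_map()->is_oop(r)) {
        f->set_d_register_from_oop(r, arr_h->obj_at(idx++));
      }
    }
  }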

I will probably submit the webrev with this quickie solution and we can discuss how to make it use oops_do.

-- Tom





> -----Original Message-----
> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of
> Gilles Duboscq
> Sent: Monday, March 10, 2014 7:58 AM
> To: Deneau, Tom
> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net
> Subject: Re: suspicions about GC and HSAIL Deopt
> 
> Using Handles and restoring the values should work. In the long term we
> may want to just have an oops_do on the save area and hook it into
> JavaThread::oops_do.
> 
> However even with the Handles version you need "oop maps" for the save
> areas. It shouldn't be very hard to extract them from the HSAIL
> compilation but currently they are just thrown away.
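>
> A minimal sketch of that longer-term shape, with made-up names (an
> HSAILDeoptSaveStates object owning the per-workitem save areas, and
> accessors exposing the oop map and the address of each saved $d
> register):
>
>   void HSAILDeoptSaveStates::oops_do(OopClosure* f) {
>     for (int i = 0; i < num_save_areas(); i++) {
>       HSAILFrameSaveArea* sa = save_area_at(i);
>       for (int r = 0; r < sa->num_d_registers(); r++) {
>         if (sa->oop_map()->is_oop(r)) {
>           // hand GC the address of the saved oop so it gets fixed in place
>           f->do_oop(sa->d_register_oop_addr(r));
>         }
>       }
>     }
>   }
>
>   // ... called from JavaThread::oops_do, something like:
>   //   if (_hsail_deopt_save_states != NULL) {
>   //     _hsail_deopt_save_states->oops_do(f);
>   //   }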
> 
> -Gilles
> 
> On Fri, Mar 7, 2014 at 2:21 PM, Doug Simon <doug.simon at oracle.com>
> wrote:
> >
> > On Mar 7, 2014, at 1:52 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
> >
> >> Doug --
> >>
> >> Regarding your handle-based solution...
> >>
> >> would it be sufficient to convert all the saved oops (in all the
> workitem saved state areas) to Handles before the first javaCall (while
> we are still in thread_in_vm mode), and then before each javaCall just
> convert back the one save area that is being used in that javaCall?
> >
> > This javaCall is to the special deopting nmethod if I understand
> correctly. And the save state area is used solely as input to a deopt
> instruction, in which case there is no possibility of a GC between
> entering the javaCall and hitting the deopt instruction, by which time
> all oops have been copied from the save state area (i.e., the
> hsailFrame) to slots in the special deopting method’s frame. At that
> point, the oops in the save state area are dead and standard GC root
> scanning knows where to find their copies. If this is all correct, then
> your suggestion should work.
> >
> > -Doug
> >
> >>> -----Original Message-----
> >>> From: Doug Simon [mailto:doug.simon at oracle.com]
> >>> Sent: Friday, March 07, 2014 4:27 AM
> >>> To: Deneau, Tom
> >>> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net
> >>> Subject: Re: suspicions about GC and HSAIL Deopt
> >>>
> >>>
> >>> On Mar 7, 2014, at 12:30 AM, Deneau, Tom <tom.deneau at amd.com> wrote:
> >>>
> >>>> While preparing this webrev for the hsail deoptimization work we've
> >>> been doing, I noticed some spurious failures when we run on HSA
> >>> hardware.  I have a theory of what's happening, let me know if this
> >>> makes sense...
> >>>>
> >>>> First the big overview:
> >>>>
> >>>> When we run a kernel and it returns from the GPU, each workitem can
> >>>> be in one of 3 states:
> >>>>
> >>>>  a) finished normally
> >>>>  b) deopted and saved its state (and set the deopt-happened flag)
> >>>>  c) on entry, saw deopt-happened=true and so just exited early
> >>>>     without running.
> >>>>
> >>>> This last one exists because we don't want to have to allocate enough
> >>>> deopt save space so that each workitem has its own unique save space.
> >>>> Instead we only allocate enough for the number of concurrent workitems
> >>>> possible.
> >>>>
> >>>> When we return from the GPU, if one or more workitems deopted we:
> >>>>
> >>>>  a) for the workitems that finished normally, there is nothing to do
> >>>>
> >>>>  b) for each deopted workitem, we want to run it thru the
> >>>>     interpreter going first thru the special host trampoline code
> >>>>     infrastructure that Gilles created.  The trampoline host code
> >>>>     takes a deoptId (sort of like a pc, telling where the deopt
> >>>>     occurred in the hsail code) and a pointer to the saved hsail
> >>>>     frame.  We currently do this sequentially although other
> >>>>     policies are possible.
> >>>>
> >>>>  c) for each never ran workitem, we can just run it from the
> >>>>     beginning of the kernel "method", just making sure we pass the
> >>>>     arguments and the appropriate workitem id for each one.  Again,
> >>>>     we currently do this sequentially although other policies are
> >>>>     possible.
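> >>>>
> >>>> A rough sketch of this handling, with made-up names for the
> >>>> workitem-state query, the save-area lookup, and the two javaCall
> >>>> wrappers (and showing the current sequential policy):
> >>>>
> >>>>   for (int tid = 0; tid < num_workitems; tid++) {
> >>>>     switch (workitem_state(tid)) {                 // hypothetical
> >>>>     case FINISHED_NORMALLY:
> >>>>       break;                                       // (a) nothing to do
> >>>>     case DEOPTED: {
> >>>>       HSAILFrameSaveArea* f = save_area_for(tid);  // hypothetical
> >>>>       // (b) javaCall into the host trampoline with the deoptId and
> >>>>       // the pointer to the saved hsail frame
> >>>>       run_deopt_trampoline(thread, f->deopt_id(), f);
> >>>>       break;
> >>>>     }
> >>>>     case NEVER_RAN:
> >>>>       // (c) javaCall that re-runs the kernel "method" from the start,
> >>>>       // passing the original arguments and this workitem id
> >>>>       run_kernel_from_beginning(thread, kernel_args, tid);
> >>>>       break;
> >>>>     }
> >>>>   }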
> >>>>
> >>>> When we enter the JVM to run the kernel, we transition to thread_in_vm
> >>>> mode.  So while running on the GPU, no oops are moving (although of
> >>>> course GCs may be delayed).
> >>>>
> >>>> When we start looking for workitems of type b or c above, we are still
> >>>> in thread_in_vm mode.  However, since both b and c above use the
> >>>> javaCall infrastructure, I believe they are transitioning to
> >>>> thread_in_java mode on each call, and oops can move.
> >>>>
> >>>> So if for instance there are two deopting workitems, it is possible
> >>> that after executing the first one, the saved deopt state for the
> >>> second one is no longer valid.
> >>>>
> >>>> The junit tests on which I have seen the spurious failures are ones
> >>> where lots of workitems deopt.  When run in the hotspot debug build,
> >>> we usually see SEGVs in interpreter code and the access is always to
> >>> 0xbaadbabe.
> >>>>
> >>>> Note that when Gilles was developing his infrastructure, the only
> >>>> test cases we had each involved a single workitem deopting, so they
> >>>> would not show this.
> >>> Also, even with multi-deopting test cases, I believe the reason we
> >>> don't see this on the simulator is that the concurrency is much lower
> >>> there, so the number of workitems of type b) above will be much smaller.
> >>> On hardware, we can have thousands of workitems deopting.
> >>>>
> >>>> I suppose the solution to this is to mark any oops in the deopt saved
> >>> state in some way that GC can find them and fix them.  What is the
> >>> best way to do this?
> >>>
> >>> I'm not sure it's the optimal solution, but around each
> >>> javaCall, you could convert each saved oop to a Handle and convert
> >>> it back after the call. I'm not aware of other mechanisms in HotSpot
> >>> for registering GC roots but that doesn't mean they don't exist.
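> >>>
> >>> A minimal sketch of that idea, assuming hypothetical save-area
> >>> accessors: the Handles keep the saved oops registered as roots across
> >>> the javaCall, and the save area is refreshed from them afterwards.
> >>>
> >>>   void call_with_saved_oops_protected(JavaThread* thread, HSAILFrameSaveArea* f) {
> >>>     ResourceMark rm(thread);
> >>>     GrowableArray<Handle> saved;
> >>>     GrowableArray<int>    regs;
> >>>     for (int r = 0; r < f->num_d_registers(); r++) {   // hypothetical
> >>>       if (f->oop_map()->is_oop(r)) {                   // hypothetical
> >>>         saved.append(Handle(thread, f->d_register_value_as_oop(r)));
> >>>         regs.append(r);
> >>>       }
> >>>     }
> >>>
> >>>     // ... do the javaCall here; a GC during it may move the objects ...
> >>>
> >>>     for (int i = 0; i < regs.length(); i++) {
> >>>       // write the current (possibly moved) oop values back
> >>>       f->set_d_register_from_oop(regs.at(i), saved.at(i)());
> >>>     }
> >>>   }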
> >>>
> >>>> Or is there any way to execute javaCalls from thread_in_vm mode
> >>> without allowing GCs to happen?
> >>>
> >>> You are calling arbitrary Java code, right? That means you cannot
> >>> guarantee allocation won't be performed, which in turn means you
> >>> cannot disable GC (even though there are mechanisms for doing so
> >>> like GC_locker::lock_critical/GC_locker::unlock_critical).
> >>>
> >>> -Doug
> >>
> >>
> >


