suspicions about GC and HSAIL Deopt

Deneau, Tom tom.deneau at amd.com
Fri Mar 7 04:52:36 PST 2014


Doug --

Regarding your handle-based solution...

Would it be sufficient to convert all the saved oops (in all the workitem saved-state areas) to Handles before the first javaCall (while we are still in thread_in_vm mode), and then, before each javaCall, convert back only the one save area being used by that javaCall?

-- Tom


> -----Original Message-----
> From: Doug Simon [mailto:doug.simon at oracle.com]
> Sent: Friday, March 07, 2014 4:27 AM
> To: Deneau, Tom
> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net
> Subject: Re: suspicions about GC and HSAIL Deopt
> 
> 
> On Mar 7, 2014, at 12:30 AM, Deneau, Tom <tom.deneau at amd.com> wrote:
> 
> > While preparing this webrev for the hsail deoptimization work we've
> > been doing, I noticed some spurious failures when we run on HSA
> > hardware.  I have a theory of what's happening; let me know if this
> > makes sense...
> >
> > First the big overview:
> >
> > When we run a kernel and it returns from the GPU, each workitem can be
> > in one of 3 states:
> >
> >   a) finished normally
> >   b) deopted and saved its state (and set the deopt-happened flag)
> >   c) on entry, saw deopt-happened=true and so just exited early
> >      without running.
> >
> > This last one exists because we don't want to have to allocate enough
> > deopt save space for each workitem to have its own unique save area.
> > Instead we allocate only enough for the maximum number of concurrent
> > workitems possible.
> >
> > When we return from the GPU, if one or more workitems deopted we:
> >
> >   a) for the workitems that finished normally, there is nothing to do
> >
> >   b) for each deopted workitems, we want to run it thru the
> >      interpreter going first thru the special host trampoline code
> >      infrastructure that Gilles created.  The trampoline host code
> >      takes a deoptId (sort of like a pc, telling where the deopt
> >      occurred in the hsail code) and a pointer to the saved hsail
> >      frame.  We currently do this sequentially although other
> >      policies are possible.
> >
> >   c) for each never ran workitem, we can just run it from the
> >      beginning of the kernel "method", just making sure we pass the
> >      arguments and the appropriate workitem id for each one.  Again,
> >      we currently do this sequentially although other policies are
> >      possible.
> >
> > When we enter the JVM to run the kernel, we transition to thread_in_vm
> > mode.  So while running on the GPU, no oops are moving (although of
> > course GCs may be delayed).
> >
> > When we start looking for workitems of type b or c above, we are still
> > in thread_in_vm mode.  However, since both b and c above use the
> > javaCall infrastructure, I believe they are transitioning to
> > thread_in_java mode on each call, and oops can move.
> >
> > So if, for instance, there are two deopting workitems, it is possible
> > that after executing the first one the saved deopt state for the
> > second one is no longer valid.
> >
> > The junit tests on which I have seen the spurious failures are ones
> > where lots of workitems deopt.  When run in the hotspot debug build, we
> > usually see SEGVs in interpreter code and the access is always to
> > 0xbaadbabe.
> >
> > Note that when Gilles was developing his infrastructure, the only test
> > cases we had all had a single workitem deopting, so they would not show
> > this.  Also, even with multi-deopting test cases, I believe the reason
> > we don't see this on the simulator is that the concurrency is much
> > lower there, so the number of workitems of type b) above will be much
> > smaller.  On hardware, we can have thousands of workitems deopting.
> >
> > I suppose the solution to this is to mark any oops in the deopt saved
> > state in some way that GC can find them and fix them.  What is the best
> > way to do this?
> 
> I'm not sure it's the optimal solution, but around each javaCall,
> you could convert each saved oop to a Handle and convert it back after
> the call. I'm not aware of other mechanisms in HotSpot for registering
> GC roots, but that doesn't mean they don't exist.
> 
> > Or is there any way to execute javaCalls from thread_in_vm mode
> > without allowing GCs to happen?
> 
> You are calling arbitrary Java code right? That means you cannot
> guarantee allocation won't be performed which in turn means you cannot
> disable GC (even though there are mechanisms for doing so like
> GC_locker::lock_critical/GC_locker::unlock_critical).
> 
> -Doug



