Suspicions about GC and HSAIL Deopt
Deneau, Tom
tom.deneau at amd.com
Thu Mar 6 15:30:20 PST 2014
While preparing this webrev for the hsail deoptimization work we've been doing, I noticed some spurious failures when we run on HSA hardware. I have a theory of what's happening; let me know if this makes sense...
First the big overview:
When we run a kernel and it returns from the GPU, each workitem can be in one of 3 states:
   a) finished normally
   b) deopted and saved its state (and set the deopt-happened flag)
   c) on entry, saw deopt-happened=true and so just exited early
      without running
This last state exists because we don't want to have to allocate enough deopt save space for each workitem to have its own unique save slot. Instead we allocate only enough for the maximum number of workitems that can be executing concurrently.
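To make the layout concrete, here is a rough sketch (all of the names below are hypothetical, not the actual webrev code):

    // Hypothetical sketch of the deopt save area described above:
    // save slots are sized by max concurrent workitems, not by the
    // total number of workitems in the kernel launch.
    struct HSAILDeoptSlot {
      int workitem_id;   // which workitem deopted
      int deopt_id;      // like a pc: where in the hsail code the deopt occurred
      // ... the saved hsail frame (registers etc.) would follow ...
    };

    struct HSAILDeoptInfo {
      volatile int   deopt_happened; // workitems that start after this is set exit early (state c)
      int            num_deopts;     // how many save slots are in use
      int            num_slots;      // max concurrently executing workitems
      HSAILDeoptSlot slots[1];       // really num_slots entries
    };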
When we return from the GPU, if one or more workitems deopted we
do the following (a sketch of this dispatch loop appears after the
list):
   a) for the workitems that finished normally, there is nothing
      to do.
   b) for each deopted workitem, we want to run it thru the
      interpreter, going first thru the special host trampoline
      code infrastructure that Gilles created. The trampoline host
      code takes a deoptId (sort of like a pc, telling where the
      deopt occurred in the hsail code) and a pointer to the saved
      hsail frame. We currently do this sequentially, although
      other policies are possible.
   c) for each never-ran workitem, we can just run it from the
      beginning of the kernel "method", making sure we pass the
      arguments and the appropriate workitem id for each one.
      Again, we currently do this sequentially, although other
      policies are possible.
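Roughly, the host-side dispatch loop looks like the following sketch (call_trampoline, call_kernel_method and never_ran are hypothetical helpers, and HSAILDeoptInfo is the sketch from above):

    // Hypothetical post-kernel dispatch on the host, run sequentially.
    void run_remaining_workitems(HSAILDeoptInfo* info, int num_workitems) {
      // b) re-run each deopted workitem thru the interpreter, entering
      //    via the host trampoline with its deoptId and saved hsail frame
      for (int i = 0; i < info->num_deopts; i++) {
        HSAILDeoptSlot* slot = &info->slots[i];
        call_trampoline(slot->deopt_id, slot);   // a javaCall
      }
      // c) run each never-ran workitem from the top of the kernel
      //    "method", passing the kernel args and its workitem id
      for (int w = 0; w < num_workitems; w++) {
        if (never_ran(info, w)) {
          call_kernel_method(w);                 // also a javaCall
        }
      }
    }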
When we enter the JVM to run the kernel, we transition to thread_in_vm mode. So while running on the GPU, no oops are moving (although of course GCs may be delayed).
When we start looking for workitems of type b or c above, we are still in thread_in_vm mode. However, since both b and c above use the javaCall infrastructure, I believe each call transitions to thread_in_java mode, and oops can move.
So if, for instance, there are two deopting workitems, it is possible that after executing the first one, the saved deopt state for the second one is no longer valid.
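For reference, each per-workitem call has roughly this shape (trampoline_method and saved_frame are our names; JavaCalls::call itself and the thread-state transition it performs are the real HotSpot mechanics):

    // Inside code compiled with TRAPS; thread is currently thread_in_vm.
    JavaValue result(T_VOID);
    JavaCallArguments args;
    args.push_int(deopt_id);              // where the deopt occurred
    args.push_long((jlong) saved_frame);  // pointer to the saved hsail frame
    // JavaCalls::call transitions thread_in_vm -> thread_in_java and back,
    // so a safepoint (and hence a moving GC) can occur inside each call.
    JavaCalls::call(&result, trampoline_method, &args, CHECK);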
The junit tests on which I have seen the spurious failures are ones where lots of workitems deopt. When run in the hotspot debug build, we usually see SEGVs in interpreter code and the access is always to 0xbaadbabe (which looks like the debug build's heap zap pattern, consistent with following a stale oop).
Note that when Gilles was developing his infrastructure, the test cases we had each had only a single workitem deopting, so they would not show this. Also, even with multi-deopting test cases, I believe the reason we don't see this on the simulator is that the concurrency is much lower there, so the number of workitems of type b) above is much smaller. On hardware, we can have thousands of workitems deopting.
I suppose the solution to this is to mark any oops in the deopt saved state in some way that GC can find them and fix them. What is the best way to do this?
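One shape that comes to mind, following HotSpot's usual oops_do/OopClosure pattern (the class and field names below are hypothetical):

    // Hypothetical: hang the hsail deopt save state off the owning
    // JavaThread and walk it from the thread's oops_do(), so that GC
    // visits (and updates) any oops captured in the save area.
    class HSAILDeoptSaveArea {
      oop _saved_oops[64];   // oops captured at deopt time (size arbitrary here)
      int _num_saved_oops;
     public:
      void oops_do(OopClosure* f) {
        for (int i = 0; i < _num_saved_oops; i++) {
          f->do_oop(&_saved_oops[i]);   // GC can now find and fix each oop
        }
      }
    };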
Or is there any way to execute javaCalls from thread_in_vm mode without allowing GCs to happen?
-- Tom