suspicions about GC and HSAIL Deopt
Gilles Duboscq
duboscq at ssw.jku.at
Mon Mar 10 17:45:34 UTC 2014
On Mon, Mar 10, 2014 at 6:10 PM, Tom Deneau <tom.deneau at amd.com> wrote:
> Ah, I was worried about the (admittedly small) window between entering the special deopting method and getting those values safely into registers/stack slots, but now I realize there are no safepoints in that window (I hope), so no GC can happen.
Yes, exactly.
>
> -- Tom
>
>> -----Original Message-----
>> From: Doug Simon [mailto:doug.simon at oracle.com]
>> Sent: Monday, March 10, 2014 12:03 PM
>> To: Deneau, Tom
>> Cc: Gilles Duboscq; sumatra-dev at openjdk.java.net; graal-dev at openjdk.java.net
>> Subject: Re: suspicions about GC and HSAIL Deopt
>>
>> It's based on my understanding of what the special deopting method does, which is something like:
>>
>> void deoptFromHSAIL(int id, HSAILFrame frame) {
>>     if (id == 0) {
>>         // copy info out of frame into registers/stack slots
>>         Deoptimize();
>>     } else if (id == 1) {
>>         // copy info out of frame into registers/stack slots
>>         Deoptimize();
>>     } else if ...
>>
>> Gilles can confirm/correct.
>>
>> -Doug
>>
>> On Mar 10, 2014, at 5:53 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
>>
>>> Gilles, Doug --
>>>
>>> I was wondering about this statement Doug made...
>>>
>>> This javaCall is to the special deopting nmethod if I understand correctly. And the save state area is used solely as input to a deopt instruction in which case there is no possibility of a GC between entering the javaCall and hitting the deopt instruction by which time all oops have been copied from the save state area (i.e., the hsailFrame) to slots in the special deopting method's frame.
>>>
>>> Is it true there is no possibility of GC between entering the nmethod and hitting the deopt call/instruction? How is that prevented?
>>>
>>> -- Tom
>>>
>>>> -----Original Message-----
>>>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of Gilles Duboscq
>>>> Sent: Monday, March 10, 2014 10:14 AM
>>>> To: Deneau, Tom
>>>> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net
>>>> Subject: Re: suspicions about GC and HSAIL Deopt
>>>>
>>>> Ok, sounds good
>>>>
>>>> On Mon, Mar 10, 2014 at 2:28 PM, Tom Deneau <tom.deneau at amd.com> wrote:
>>>>> Gilles --
>>>>>
>>>>> Update on this...
>>>>>
>>>>> Yes, I put in the code to save the oop maps, currently somewhat simplified in that only HSAIL $d registers can have oops and we are not saving stack slots yet.
>>>>>
>>>>> Using that I implemented a quickie solution that copied the detected oops into a regular Java Object array before the first deopt, then reloaded them into the particular frame before each deopt. Logging code did show that there were times when the original value of the oop had changed to a new value, and we no longer hit our spurious failures. I'm sure it's inefficient when compared to an oops_do approach, but it did seem to work.
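>>>>>
>>>>> Roughly, the quickie solution looks like this (just a sketch; the frame
>>>>> accessors and counts are made-up names, not the real code):
>>>>>
>>>>>     // 'oops' is a plain Java Object[] allocated while thread_in_vm;
>>>>>     // since it is an ordinary heap object, normal GC root scanning
>>>>>     // keeps its elements current while the referenced objects move.
>>>>>     objArrayHandle oops = ...;
>>>>>     int k = 0;
>>>>>     for (int w = 0; w < num_deopted_workitems; w++) {
>>>>>         HSAILFrame* f = saved_frame(w);
>>>>>         for (int i = 0; i < num_d_regs; i++) {
>>>>>             if (frame_oop_map_is_oop(f, i)) {
>>>>>                 oops->obj_at_put(k++, f->d_reg_oop(i));  // stash before the first javaCall
>>>>>             }
>>>>>         }
>>>>>     }
>>>>>
>>>>>     // ... and just before the javaCall that replays workitem w, the
>>>>>     // reverse loop writes oops->obj_at(k++) back into that frame's
>>>>>     // $d registers, so the replay sees the possibly-moved values.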
>>>>>
>>>>> I will probably submit the webrev with this quickie solution and we can discuss how to make it use oops_do.
>>>>>
>>>>> -- Tom
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: gilwooden at gmail.com [mailto:gilwooden at gmail.com] On Behalf Of Gilles Duboscq
>>>>>> Sent: Monday, March 10, 2014 7:58 AM
>>>>>> To: Deneau, Tom
>>>>>> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net
>>>>>> Subject: Re: suspicions about GC and HSAIL Deopt
>>>>>>
>>>>>> Using Handle and restoring the value should work. In the long term we may want to just have an oops_do on the save area and hook into JavaThread::oops_do.
>>>>>>
>>>>>> However, even with the Handles version you need "oop maps" for the save areas. It shouldn't be very hard to extract them from the HSAIL compilation, but currently they are just thrown away.
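>>>>>>
>>>>>> Something like this, roughly (a sketch only; the save-area class and
>>>>>> field names are invented):
>>>>>>
>>>>>> void HSAILDeoptSaveArea::oops_do(OopClosure* f) {
>>>>>>   for (int i = 0; i < _num_d_regs; i++) {
>>>>>>     if (_oop_map.is_oop(i)) {
>>>>>>       f->do_oop((oop*) &_d_regs[i]);  // GC reads, and rewrites if the object moved
>>>>>>     }
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> JavaThread::oops_do would then also walk the save areas, so the saved
>>>>>> oops become strong roots at every GC.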
>>>>>>
>>>>>> -Gilles
>>>>>>
>>>>>> On Fri, Mar 7, 2014 at 2:21 PM, Doug Simon <doug.simon at oracle.com> wrote:
>>>>>>>
>>>>>>> On Mar 7, 2014, at 1:52 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
>>>>>>>
>>>>>>>> Doug --
>>>>>>>>
>>>>>>>> Regarding your handle-based solution...
>>>>>>>>
>>>>>>>> would it be sufficient to convert all the saved oops (in all the workitem saved state areas) to Handles before the first javaCall (while we are still in thread_in_vm mode), and then before each javaCall just convert back the one save area that is being used in that javaCall?
>>>>>>>
>>>>>>> This javaCall is to the special deopting nmethod if I understand correctly. And the save state area is used solely as input to a deopt instruction in which case there is no possibility of a GC between entering the javaCall and hitting the deopt instruction by which time all oops have been copied from the save state area (i.e., the hsailFrame) to slots in the special deopting method's frame. At that point, the oops in the save state area are dead and standard GC root scanning knows where to find their copies. If this is all correct, then your suggestion should work.
>>>>>>>
>>>>>>> -Doug
>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Doug Simon [mailto:doug.simon at oracle.com]
>>>>>>>>> Sent: Friday, March 07, 2014 4:27 AM
>>>>>>>>> To: Deneau, Tom
>>>>>>>>> Cc: graal-dev at openjdk.java.net; sumatra-dev at openjdk.java.net
>>>>>>>>> Subject: Re: suspicions about GC and HSAIL Deopt
>>>>>>>>>
>>>>>>>>> On Mar 7, 2014, at 12:30 AM, Deneau, Tom <tom.deneau at amd.com> wrote:
>>>>>>>>>
>>>>>>>>>> While preparing this webrev for the HSAIL deoptimization work we've been doing, I noticed some spurious failures when we run on HSA hardware. I have a theory of what's happening; let me know if this makes sense...
>>>>>>>>>>
>>>>>>>>>> First the big overview:
>>>>>>>>>>
>>>>>>>>>> When we run a kernel and it returns from the GPU, each workitem can be in one of 3 states:
>>>>>>>>>>
>>>>>>>>>> a) finished normally
>>>>>>>>>> b) deopted and saved its state (and set the deopt-happened flag)
>>>>>>>>>> c) on entry, saw deopt-happened=true and so just exited early without running
>>>>>>>>>>
>>>>>>>>>> This last one exists because we don't want to have to allocate enough deopt save space so that each workitem has its own unique save space. Instead we only allocate enough for the number of concurrent workitems possible.
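>>>>>>>>>>
>>>>>>>>>> On the kernel side this amounts to something like (pseudocode with
>>>>>>>>>> invented names, not the real implementation):
>>>>>>>>>>
>>>>>>>>>> // at workitem entry
>>>>>>>>>> if (*deopt_happened) return;  // state c: exit early, replay on the CPU later
>>>>>>>>>> ...
>>>>>>>>>> // on a deopt: claim one of the save slots (sized for the maximum
>>>>>>>>>> // number of concurrently active workitems), then flag the others
>>>>>>>>>> int slot = atomic_fetch_add(&next_save_slot, 1);
>>>>>>>>>> save_hsail_frame(&save_area[slot], deopt_id, live_registers);
>>>>>>>>>> *deopt_happened = 1;  // state b recorded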
>>>>>>>>>>
>>>>>>>>>> When we return from the GPU, if one or more workitems deopted we:
>>>>>>>>>>
>>>>>>>>>> a) for the workitems that finished normally, there is nothing to do
>>>>>>>>>>
>>>>>>>>>> b) for each deopted workitem, we want to run it through the interpreter, going first through the special host trampoline code infrastructure that Gilles created. The trampoline host code takes a deoptId (sort of like a pc, telling where the deopt occurred in the hsail code) and a pointer to the saved hsail frame. We currently do this sequentially although other policies are possible.
>>>>>>>>>>
>>>>>>>>>> c) for each never-ran workitem, we can just run it from the beginning of the kernel "method", just making sure we pass the arguments and the appropriate workitem id for each one. Again, we currently do this sequentially although other policies are possible.
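>>>>>>>>>>
>>>>>>>>>> In other words, the host-side dispatch is roughly (a sketch; the
>>>>>>>>>> helper names are invented):
>>>>>>>>>>
>>>>>>>>>> for (int tid = 0; tid < num_workitems; tid++) {
>>>>>>>>>>   switch (workitem_state(tid)) {
>>>>>>>>>>   case FINISHED:   // a) nothing to do
>>>>>>>>>>     break;
>>>>>>>>>>   case DEOPTED:    // b) replay through the deopt trampoline (a javaCall)
>>>>>>>>>>     run_trampoline(deopt_id(tid), saved_hsail_frame(tid));
>>>>>>>>>>     break;
>>>>>>>>>>   case NEVER_RAN:  // c) rerun the whole kernel method (also a javaCall)
>>>>>>>>>>     run_kernel_method(kernel_args, tid);
>>>>>>>>>>     break;
>>>>>>>>>>   }
>>>>>>>>>> }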
>>>>>>>>>>
>>>>>>>>>> When we enter the JVM to run the kernel, we transition to thread_in_vm mode. So while running on the GPU, no oops are moving (although of course GCs may be delayed).
>>>>>>>>>>
>>>>>>>>>> When we start looking for workitems of type b or c above, we are still in thread_in_vm mode. However, since both b and c above use the javaCall infrastructure, I believe they are transitioning to thread_in_java mode on each call, and oops can move.
>>>>>>>>>>
>>>>>>>>>> So if, for instance, there are two deopting workitems, it is possible that after executing the first one, the saved deopt state for the second one is no longer valid.
>>>>>>>>>>
>>>>>>>>>> The junit tests on which I have seen the spurious failures are ones where lots of workitems deopt. When run in the hotspot debug build, we usually see SEGVs in interpreter code, and the access is always to 0xbaadbabe.
>>>>>>>>>>
>>>>>>>>>> Note that when Gilles was developing his infrastructure, the only test cases we had all had a single workitem deopting, so they would not show this. Also, even with multi-deopting test cases, I believe the reason we don't see this on the simulator is that the concurrency is much lower there, so the number of workitems of type b) above will be much smaller. On hardware, we can have thousands of workitems deopting.
>>>>>>>>>>
>>>>>>>>>> I suppose the solution to this is to mark any oops in the deopt saved state in some way that GC can find them and fix them. What is the best way to do this?
>>>>>>>>>
>>>>>>>>> I'm not sure it's the most optimal solution, but around each javaCall, you could convert each saved oop to a Handle and convert it back after the call. I'm not aware of other mechanisms in HotSpot for registering GC roots but that doesn't mean they don't exist.
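>>>>>>>>>
>>>>>>>>> Roughly like this (a sketch; the frame accessors are invented, and
>>>>>>>>> the JavaCalls invocation is abbreviated):
>>>>>>>>>
>>>>>>>>> HandleMark hm(thread);
>>>>>>>>> GrowableArray<Handle> saved;
>>>>>>>>> for (int i = 0; i < frame->num_oop_slots(); i++) {
>>>>>>>>>   saved.append(Handle(thread, frame->oop_at(i)));  // handle contents are GC roots
>>>>>>>>> }
>>>>>>>>> JavaCalls::call(&result, method, &args, thread);   // GC may move objects here
>>>>>>>>> for (int i = 0; i < frame->num_oop_slots(); i++) {
>>>>>>>>>   frame->set_oop_at(i, saved.at(i)());             // write updated values back
>>>>>>>>> }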
>>>>>>>>>
>>>>>>>>>> Or is there any way to execute javaCalls from thread_in_vm mode without allowing GCs to happen?
>>>>>>>>>
>>>>>>>>> You are calling arbitrary Java code, right? That means you cannot guarantee allocation won't be performed, which in turn means you cannot disable GC (even though there are mechanisms for doing so, like GC_locker::lock_critical/GC_locker::unlock_critical).
>>>>>>>>>
>>>>>>>>> -Doug