webrev for hsail deoptimization

Deneau, Tom tom.deneau at amd.com
Tue Mar 11 20:08:33 UTC 2014


I have placed a webrev up at 
 http://cr.openjdk.java.net/~tdeneau/graal-webrevs/webrev-hsail-deopt 
which we would like to get checked into the graal trunk.

This consists of at least the start of support for deoptimization in
HSAIL kernels.  Although the list of files changed may look long, many
of the files have only a few lines changed.  Special thanks to Gilles
Duboscq and Doug Simon who provided some of the infrastructure that
this webrev uses.

Below I have described

   * an overview of the codepaths and the data structures
   * java and hotspot source changes


Deoptimization Data Structures and Overview
===========================================

At kernel dispatch time, we allocate space for any workitems that
might need to deopt.  To reduce the space requirements, space is only
reserved for the maximum number of workitems that could be running
concurrently on that HSA device.

A deopting workitem sets a "deopt happened" flag, and future workitems
that see "deopt happened" as true will just set a flag saying they
never ran and exit early.  Currently the never_ran array is one per
workitem.  We are looking at ways to make this smaller but HSA devices
have a lot of freedom in how they schedule workitems (current hardware
and the simulator are quite different).

Workitems that deopt atomically bump an index saying where they should
store their deopt data.  The deopt data consists of
   * workitemid
   * deopt actionAndReason
   * the first HSAILFrame
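
The bookkeeping described above can be sketched on the host side in
plain Java.  This is only an illustrative simulation, not the HSAIL
code that the webrev emits; the class, field and method names are all
hypothetical, and the flag and slot index are modelled with
java.util.concurrent atomics.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Host-side simulation of the per-workitem deopt bookkeeping (all names hypothetical). */
class DeoptBookkeepingSketch {
    final AtomicBoolean deoptHappened = new AtomicBoolean(false);
    final AtomicInteger nextDeoptSlot = new AtomicInteger(0);
    final boolean[] neverRan;          // one entry per workitem, as in the current scheme
    final int[] savedWorkitemIds;      // one entry per reserved deopt slot
    final int[] savedActionAndReason;  // one entry per reserved deopt slot

    DeoptBookkeepingSketch(int numWorkitems, int maxConcurrentWorkitems) {
        neverRan = new boolean[numWorkitems];
        savedWorkitemIds = new int[maxConcurrentWorkitems];
        savedActionAndReason = new int[maxConcurrentWorkitems];
    }

    /** Prologue check: returns true if the workitem may run the kernel body. */
    boolean enter(int workitemId) {
        if (deoptHappened.get()) {
            neverRan[workitemId] = true;   // mark as never ran and exit early
            return false;
        }
        return true;
    }

    /** Deopt path: raise the flag, atomically claim a slot, record the data. */
    int recordDeopt(int workitemId, int actionAndReason) {
        deoptHappened.set(true);
        int slot = nextDeoptSlot.getAndIncrement();  // atomic bump of the save index
        savedWorkitemIds[slot] = workitemId;
        savedActionAndReason[slot] = actionAndReason;
        return slot;
    }
}
```

In this sketch the number of slots equals the maximum number of
concurrent workitems, mirroring the space reservation described
earlier: once the flag is set, later workitems never reach the deopt
path.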

An HSAILFrame consists of
   * the deoptId or "pc" offset where the deopt occurred
   * number of s registers
   * number of d registers
   * a bitmap indicating which d registers are oops
   * space for saving the d and s registers

Currently we always set num_s_registers to 32 and num_d_registers to
16 but in the hsail code of the kernel we only save the union of the
actual registers that are live at any of the infopoints.

On return from the GPU, we check if there were any deopts.  If not, we
just return back to java.  If there was at least one deopt then

   a) for the workitems that finished normally, there is nothing to do

   b) if there are any deopted workitems, we want to run each deopting
      workitem thru the interpreter going first thru the special host
      trampoline code infrastructure that Gilles created.  The
      trampoline host code takes the deoptId and a pointer to the
      saved hsail frame.  We currently do this sequentially although
      other policies are possible.

   c) for each never-ran workitem, we can just run it from the
      beginning of the kernel "method", just making sure we pass the
      arguments and the appropriate workitem id for each one.  Again,
      we currently do this sequentially although other policies are
      possible.
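
The three cases above amount to the following sequential policy,
sketched here in Java.  The real logic lives in the hotspot C++ code
(gpu_hsail.cpp); this sketch uses hypothetical names and simply
returns the list of actions it would take, where "trampoline" stands
for running a deopted workitem through the host trampoline and
interpreter, and "javaCall" stands for rerunning a never-ran workitem
from the start of the kernel method.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the sequential post-dispatch policy (names are hypothetical). */
class PostDispatchSketch {
    /**
     * @param deoptCount         number of deopt slots that were claimed
     * @param deoptedWorkitemIds workitem id stored in each claimed slot
     * @param neverRan           per-workitem "never ran" flags
     * @return the actions the host would take, in order
     */
    static List<String> handleReturn(int deoptCount, int[] deoptedWorkitemIds,
                                     boolean[] neverRan) {
        List<String> actions = new ArrayList<>();
        if (deoptCount == 0) {
            return actions;            // no deopts: just return to java
        }
        // (a) workitems that finished normally need no action.
        // (b) each deopted workitem goes through the trampoline, one at a time.
        for (int slot = 0; slot < deoptCount; slot++) {
            actions.add("trampoline:" + deoptedWorkitemIds[slot]);
        }
        // (c) each never-ran workitem is rerun from the beginning.
        for (int id = 0; id < neverRan.length; id++) {
            if (neverRan[id]) {
                actions.add("javaCall:" + id);
            }
        }
        return actions;
    }
}
```

Processing all of type b before any of type c is just one possible
ordering chosen for the sketch.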

Because running either type b or c above can cause GCs, and because
some of our saved d registers are pointers into the java heap, we take
care in case any of these saved pointers are affected by GC.  The
current strategy of using an Object array supplied by the java side
will be replaced later with an oops_do type of strategy.



Description of source changes in this webrev
============================================

graal source changes
====================

Assembler, HSAILAssembler
   minor changes for new instructions needed for saving deopt information

GraalKernelTester.java
   force simulator to run single threaded.

KernelTester.java
   minor changes to handle exceptions which escape the kernel method

HSAILLIRGenerator.java
   support switches with keys of type long


HSAILHotSpotBackend.java

   * compileKernel uses some new optimisticOpts which help generate
     deopts when needed.  Also, we dump the infopoints if Log:CodeGen
     is on

   * HSAILNonNullParametersPhase stamps the appropriate parameters as
     nonNull

   * installKernel uses the new trampoline infrastructure added by
     Gilles to produce the host trampoline deopt method and install
     it.

   * emitCode adds a little bit of code to the prologue and a lot of
     code to the epilogue.  See the overview above for the data
     structures used by the never-ran path and the deopt path.

HSAILHotSpotLIRGenerator.java

   * code added by Gilles to build the host graph for the host
     trampoline deopt method.  I suppose some of this would be common
     to any gpu trampoline deopt and should be moved to some
     hsail-independent location.

   * code to handle the creation of a DeoptimizeOp for emitting HSAIL
     code for a deoptimization

HSAILHotSpotLoweringProvider.java

   * refactored to support different strategies for different nodes.
     UnwindNode strategy is to get replaced by a DeoptimizeNode.

HotSpotVMConfig.java

  * define offsets to fields in the deopt data structures

VMErrorNode.java

  * public access to constructor (used by building of host graph for
    trampoline code)

HSAIL.java
  * some new non-allocatable registers defined (used by deopt paths)

HSAILControlFlow.java
  * code to emit hsail for a deoptimizationNode

ComputeProbabilityClosure.java
  * just using a change that Gilles made in the patch he gave me.


mx/projects was affected by the move of ExternalCompilationResult to
com.oracle.graal.gpu.  In addition, several files had one line import
changes from the move of ExternalCompilationResult.


hotspot source changes
======================

gpu_hsail.cpp, hpp

   * the execute_kernel routine pushes an extra parameter where deopt
     info can be saved

   * while pushing kernel args, keeps track of whether any are null
     and, if so, sets some new gpu_exception fields in the thread
     structure, which get used when the thread returns to java mode

   * on return from kernel checks if any deopt occurred.  If so,

      * runs any deopting workitems thru the trampoline deopt code
        which ends up running the kernel method thru the interpreter
        for that workitem.

      * runs any never-ran workitems using a simple javaCall.

gpu_hsail_Frame.hpp
   * new structure that defines the layout of a physical HSAIL frame

hsailArgumentsBase.*, hsailKernelArguments.hpp, hsailJavaCallArguments.hpp
   * refactored to share code between kernel argument setup and
     javaCall argument setup

javaClasses.cpp

   * contains logic to check the new gpu_exception fields in thread
     structure and if detected, set as top frame on return

graalCompiler.cpp, hpp
   * logic added by Gilles for external_deopt_i2c

javaCalls.cpp, hpp
   * logic added by Gilles for external_deopt_i2c

sharedRuntime.cpp
   * maybe Gilles can explain why the assert was removed in the patch
     he gave me (it asserts if I put it back in)

thread.cpp, hpp
   * handle new gpu_exception fields

vmStructs.cpp
vmStructs_hsail.hpp
   * handle new hsail deopt structs




