Thoughts on triggering GPU redirection
Deneau, Tom
tom.deneau at amd.com
Mon Jan 14 15:04:42 PST 2013
Given a call to some library API, we'll call it ParallelFor, which
specifies a range of integers and a lambda expression to execute
across that range, we want to eventually redirect it to execute on a
GPU. The Java code for ParallelFor will likely spin off a bunch
of threads where the work is done.
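
For concreteness, here is a minimal sketch of what such a CPU-side
ParallelFor might look like. The class name, the parallelFor signature,
and the one-chunk-per-thread split are all assumptions made for
illustration; no actual library code is being quoted.

```java
import java.util.function.IntConsumer;

public class ParallelForSketch {
    // Hypothetical API: runs body.accept(i) for every i in [start, end),
    // splitting the range into contiguous chunks, one per worker thread.
    public static void parallelFor(int start, int end, IntConsumer body) {
        int nThreads = Math.max(1, Runtime.getRuntime().availableProcessors());
        int len = Math.max(0, end - start);
        int chunk = (len + nThreads - 1) / nThreads; // ceiling division
        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            final int lo = start + t * chunk;
            final int hi = Math.min(end, lo + chunk);
            // Each worker runs the lambda over its own sub-range; this loop
            // is where the lambda becomes hot.
            workers[t] = new Thread(() -> {
                for (int i = lo; i < hi; i++) {
                    body.accept(i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // wait for all workers before returning to the caller
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
    }
}
```

A caller would write something like parallelFor(0, n, i -> out[i] = 2 * in[i]),
and it is that call site which we would like to redirect.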
We've been experimenting with different ways of triggering that
redirection. To keep things simple in the discussion below, let's
assume TieredCompilation is off.
The first triggering strategy was simple: when ParallelFor itself gets
marked for compilation by C2, we could look up the stack to find our
caller, cause the caller to be compiled, and intrinsify the call to
ParallelFor so that it goes to some native method which executes the
equivalent code on the GPU.
But maybe you want to trigger sooner than that. ParallelFor may be
called with a large range, each element of the range executing a
lambda in the worker threads. So when the lambda gets hot enough to be
compiled (which will happen much sooner), that could also be a
trigger. The stack at the time of the lambda compilation will be in
one of the worker threads so we'd have to find the ParallelFor caller
back in the "main" thread that spun off the worker threads, get that
to compile, etc.
Both of the above strategies might work OK when there is only one
ParallelFor call site, but in general we might not be able to trace
the lambda executions or the ParallelFor execution itself back to any
particular call site.
An alternative would be to keep counters per call site where
ParallelFor is called in the interpreter. We could count for instance
not just # of calls, but total # of range elements or whatever other
statistics we wanted. When the statistics at that site overflow, we
trigger the compilation of the ParallelFor caller (up one level on the
stack) and mark that particular call site for intrinsic redirection.
Thus, we avoid having to guess which call sites contributed the
counts.
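
To make the bookkeeping concrete, here is a rough model of the
per-call-site counters, written in Java purely for illustration. It is
not interpreter code: the class, the string call-site key (think
"method@bci"), and both thresholds are invented for this sketch, and a
real implementation would presumably keep the counters in the method's
profiling data instead of a map.

```java
import java.util.HashMap;
import java.util.Map;

public class CallSiteCounters {
    // Hypothetical thresholds; real values would need tuning.
    public static final long CALL_LIMIT = 1000;
    public static final long ELEMENT_LIMIT = 100_000;

    private static final class Profile {
        long calls;          // number of ParallelFor calls seen at this site
        long elements;       // total range elements requested at this site
        boolean redirected;  // set once the site has been marked
    }

    private final Map<String, Profile> profiles = new HashMap<>();

    // Records one ParallelFor call over [start, end) at the given call site.
    // Returns true exactly once per site: the first time either counter
    // overflows, which is the point where the caller would be compiled and
    // the site marked for intrinsic redirection.
    public boolean recordCall(String callSite, int start, int end) {
        Profile p = profiles.computeIfAbsent(callSite, k -> new Profile());
        p.calls++;
        p.elements += Math.max(0L, (long) end - start);
        if (!p.redirected && (p.calls >= CALL_LIMIT || p.elements >= ELEMENT_LIMIT)) {
            p.redirected = true;
            return true;
        }
        return false;
    }
}
```

Because the counts are keyed by call site, a single call with a very
large range can trigger one site while other sites that share the same
ParallelFor code stay untouched.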
Is the above something the interpreter could feasibly be made to do?
-- Tom Deneau