reduction / graal / kaveri / heterogeneous queueing ?

Tue Jun 17 20:43:27 UTC 2014

Guys,

I have been playing with a system of gpu based reduction that is 
intended to work as follows:

e.g.

public void kernel(Object[] input, Object[] output, int i) {

   output[i] = foo(input[i*2], input[(i*2)+1]);

}

foo is the reducing function and is expected to return the reduction of 
two elements of the sequence being reduced.

kernel would be called with e.g.

  input=Object[2n], output=Object[n], once for each i in 0..n-1.

The idea is to go through a number of reduction steps, each one taking 
an input of size 2x and producing an output of size x, which can then be 
fed back into the same kernel as input for the following round. Any odd 
element left over in a round can be picked up by the cpu and folded in 
at a suitable juncture. This repeats until the output array is too small 
to be worth reducing further on the gpu, at which point the cpu finishes 
the reduction.
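
To make the rounds concrete, here is a minimal CPU-only sketch of the 
driver loop. It is not Graal/Okra code: a plain for loop stands in for 
the gpu dispatch, and the names PairwiseReduce, reduce and cpuThreshold 
are hypothetical. It assumes foo is associative, and it folds the odd 
elements back in so that the original ordering is preserved.

```java
import java.util.Arrays;
import java.util.function.BinaryOperator;

class PairwiseReduce {

    // One work item of a round, same shape as the kernel above.
    static <T> void kernel(T[] input, T[] output, int i, BinaryOperator<T> foo) {
        output[i] = foo.apply(input[i * 2], input[(i * 2) + 1]);
    }

    // Hypothetical driver: halve the array each round until it is shorter
    // than cpuThreshold, then finish sequentially. Assumes foo is associative.
    static <T> T reduce(T[] data, BinaryOperator<T> foo, int cpuThreshold) {
        if (data.length == 0) {
            throw new IllegalArgumentException("empty input");
        }
        T[] input = data;
        T leftover = null;          // odd elements folded together so far
        boolean hasLeftover = false;

        while (input.length >= Math.max(2, cpuThreshold)) {
            int n = input.length / 2;
            // copyOf is only used to get a T[] of length n; every slot is overwritten.
            T[] output = Arrays.copyOf(input, n);
            // Stand-in for the gpu dispatch: the n work items are independent.
            for (int i = 0; i < n; i++) {
                kernel(input, output, i, foo);
            }
            if (input.length % 2 == 1) {
                // The odd element sits before any leftover from an earlier round.
                T odd = input[input.length - 1];
                leftover = hasLeftover ? foo.apply(odd, leftover) : odd;
                hasLeftover = true;
            }
            input = output;
        }

        // Finish the small remainder on the cpu, then append the leftovers.
        T result = input[0];
        for (int j = 1; j < input.length; j++) {
            result = foo.apply(result, input[j]);
        }
        return hasLeftover ? foo.apply(result, leftover) : result;
    }

    public static void main(String[] args) {
        Integer[] nums = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
        System.out.println(reduce(nums, Integer::sum, 4));
    }
}
```

In this sketch each round has to finish before the next one starts, which 
is the sequencing step the queueing question below is about.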

Questions:

- does this sound sensible ?

- I've read about Kaveri h/w supporting heterogeneous (including 
gpu->gpu) queueing. Is this available, or are there plans to surface it, 
in clumatra/graal/okra ? I need this to sequence the steps of my 
reduction efficiently.

- anything else that anyone feels is relevant :-)

looking forward to hearing from you,

Jules