[code-reflection] RFR: [feature][HAT] 2D and 3D ranges for the NDRange API [v3]

Thu Jul 24 13:23:12 UTC 2025

On Thu, 24 Jul 2025 08:21:20 GMT, Juan Fumero <duke at openjdk.org> wrote:

>> This patch introduces the concept of 2D and 3D ranges. 
>> 
>> This allows developers to do the following:
>> - Accessing 1D, 2D and 3D thread indexing within the kernels. The `KernelContext` API has been extended to do this.
>> - Dispatching 1D, 2D and 3D kernels via the `ComputeContext`.
>> 
>> **IMPORTANT**: This patch has been tested on the OpenCL backend. The CUDA backend generates correct code, but I cannot dispatch it yet because of CUDA incompatibilities that appear to be orthogonal to this PR. I will investigate this in a separate issue.
>> 
>> An example of use is provided in the `matmul/Main.java` class.
>> 
>> Example of 1D kernel in HAT:
>> 
>> 
>>     @CodeReflection
>>     public static void matrixMultiplyKernel1D(@RO KernelContext kc, @RO F32Array matrixA, @RO F32Array matrixB, @RW F32Array matrixC, int size) {
>>         if (kc.x < kc.maxX) {
>>             for (int j = 0; j < size; j++) {
>>                 float acc = 0;
>>                 for (int k = 0; k < size; k++) {
>>                     acc += (matrixA.array(kc.x * size + k) * matrixB.array(k * size + j));
>>                 }
>>                 matrixC.array(kc.x * size + j, acc);
>>             }
>>         }
>>     }
>> 
>> 
>> Example of a 2D kernel:
>> 
>> 
>>     @CodeReflection
>>     public static void matrixMultiplyKernel2D(@RO KernelContext kc, @RO F32Array matrixA, @RO F32Array matrixB, @RW F32Array matrixC, int size) {
>>         if (kc.x < kc.maxX) {
>>             if (kc.y < kc.maxY) {
>>                 float acc = 0;
>>                 for (int k = 0; k < size; k++) {
>>                     acc += (matrixA.array(kc.x * size + k) * matrixB.array(k * size + kc.y));
>>                 }
>>                 matrixC.array(kc.x * size + kc.y, acc);
>>             }
>>         }
>>     }
>> 
>> 
>> How to dispatch?
>> 
>> The dispatcher now takes additional parameters to set up the maximum thread size for each dimension. For a 2D kernel:
>> 
>> 
>> computeContext.dispatchKernel(maxX, maxY,
>>     kernelContext -> matrixMultiplyKernel2D(kernelContext, matrixA, matrixB, matrixC, size)
>> );
>> 
>> 
>> ### How to test this patch?
>> 
>> I extended the test-case for the matrix multiplication. 
>> 
>> Run with 1D:
>> 
>> 
>> HAT=SHOW_CODE java @hat/run ffi-opencl matmul 1D
>> 
>> 
>> Run with 2D:
>> 
>> 
>> HAT=SHOW_CODE java @hat/run ffi-opencl matmul 2D
>
> Juan Fumero has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Merge branch 'code-reflection' into dev/ndrange
>  - [hat] launch exception if more than 3D configuration is set
>  - [feature] 2D and 3D ranges for the NDRange API
>    
>    This patch introduces the concept of 2D and 3D ranges.
>    This allows developers to:
>    - Access 1D, 2D and 3D thread indexing within the kernels.
>      The KernelContext API has been extended.
>    - Dispatch 1D, 2D and 3D kernels via the ComputeContext.
>    
>    An example of use is provided in the matmul/Main.java class.

I fixed the dimension based on Gary's feedback. This PR is ready.
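
For readers unfamiliar with 2D ranges, the sketch below emulates the 2D dispatch from the quoted description in plain Java, without HAT. It is only an illustration: `Ctx`, `dispatch2D` and `Range2DSketch` are hypothetical names and not part of the HAT API, plain `float[]` arrays stand in for `F32Array`, and the loop visits each (x, y) pair sequentially, whereas a real backend would run them as parallel threads on the accelerator.

```java
import java.util.Arrays;
import java.util.function.Consumer;

public class Range2DSketch {
    // Hypothetical stand-in for HAT's KernelContext: one instance per emulated thread.
    record Ctx(int x, int y, int maxX, int maxY) {}

    // Emulates a 2D dispatch by visiting every (x, y) pair sequentially.
    // A real backend would execute these invocations in parallel.
    static void dispatch2D(int maxX, int maxY, Consumer<Ctx> kernel) {
        for (int x = 0; x < maxX; x++) {
            for (int y = 0; y < maxY; y++) {
                kernel.accept(new Ctx(x, y, maxX, maxY));
            }
        }
    }

    // Mirrors the body of the quoted 2D kernel: each (x, y) computes one element of C.
    static void matmul2D(Ctx kc, float[] a, float[] b, float[] c, int size) {
        if (kc.x() < kc.maxX() && kc.y() < kc.maxY()) {
            float acc = 0;
            for (int k = 0; k < size; k++) {
                acc += a[kc.x() * size + k] * b[k * size + kc.y()];
            }
            c[kc.x() * size + kc.y()] = acc;
        }
    }

    // Multiplies two size x size row-major matrices using the emulated 2D range.
    static float[] multiply(float[] a, float[] b, int size) {
        float[] c = new float[size * size];
        dispatch2D(size, size, kc -> matmul2D(kc, a, b, c, size));
        return c;
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4};
        float[] b = {5, 6, 7, 8};
        // prints [19.0, 22.0, 43.0, 50.0]
        System.out.println(Arrays.toString(multiply(a, b, 2)));
    }
}
```

Compared with the 1D kernel, the loop over `j` disappears: the second range dimension supplies the column index, so each emulated thread produces exactly one output element.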

-------------

PR Comment: https://git.openjdk.org/babylon/pull/496#issuecomment-3113451887