[code-reflection] RFR: [hat][proposal] ComputeRange and ThreadMesh API for defining 1D, 2D and 3D Ranges [v4]

Thu Aug 14 08:49:30 UTC 2025

On Thu, 14 Aug 2025 08:38:41 GMT, Juan Fumero <duke at openjdk.org> wrote:

>> This PR proposes an extension of the HAT API to leverage 1D, 2D and 3D ranges for the compute context dispatch. 
>> A `ComputeRange` is an entity that holds a global and a local thread mesh. In the future, we can add offsets to it. 
>> 
>> Each `ThreadMesh` is a triplet representing the number of threads for the x, y, and z dimensions. 
>> 
>> How to dispatch 1D kernels?
>> 
>> 
>> ComputeRange range1D = new ComputeRange(new GlobalMesh1D(size));
>> cc.dispatchKernel(range1D,
>>                 kc -> myKernel(...));
>> 
>> 
>> How to dispatch 2D kernels?
>> 
>> 
>> ComputeRange range2D = new ComputeRange(new GlobalMesh2D(size, size));
>> cc.dispatchKernel(range2D,
>>                 kc -> my2DKernel(...));
>> 
>> 
>> How to enable a local mesh? 
>> 
>> We pass a second parameter to the `ComputeRange` constructor to define the local mesh. If it is not passed, the local mesh is `null` and the HAT runtime can select a default set of values. 
>> 
>> 
>> ComputeRange computeRange = new ComputeRange(
>>         new GlobalMesh2D(globalSize, globalSize), 
>>         new LocalMesh2D(16, 16));
>> cc.dispatchKernel(computeRange,
>>                 kc -> matrixMultiplyKernel2D(kc, matrixA, matrixB, matrixC, globalSize)
>>         );
>> 
>> 
>> In addition, this PR renames the internal `KernelContext` API, which maps the context's ndrange object to native memory, to `KernelBufferContext`.
>> 
>> 
>> #### How to check? 
>> 
>> 
>> java @hat/run ffi-opencl matmul 1D
>> java @hat/run ffi-opencl matmul 2D
>> 
>> java @hat/run ffi-cuda matmul 1D
>> java @hat/run ffi-cuda matmul 2D
>
> Juan Fumero has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits:
> 
>  - Merge branch 'code-reflection' into hat/api/computerange
>  - Merge branch 'code-reflection' into hat/api/computerange
>  - [hat] BuildCallGraph refactored
>  - [hat] Improve logging for thread-mesh in OpenCL
>  - [hat][cuda] 1D to 3D mesh refactored for the CUDA Backend
>  - [hat] ThreadMesh Buffer composed with iFace to set global and local thread-mesh
>  - [hat][api] ThreadMesh moved to records implementation
>  - [hat] Javadoc for the rest of the ComputeRange Class
>  - [hat] Add threadmesh subtyping to keep consistency accross dimensions between global and local
>  - [hat] ThreadBlock dispatcher enabled for the CUDA backend
>  - ... and 1 more: https://git.openjdk.org/babylon/compare/c5102d7e...31bee39b

@jjfumero 
Your change (at version 31bee39b5c4a355cfec341d479ec007871af69ea) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/babylon/pull/516#issuecomment-3187524303
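
For readers who want a concrete picture of the shape described in the quoted proposal (a `ThreadMesh` as an x/y/z triplet, records per dimension, and a `ComputeRange` whose local mesh may be `null`), here is a minimal, self-contained sketch. It is not the HAT source: the `Mesh1D`/`Mesh2D` helper interfaces, the `dims()` and `totalThreads()` methods, and the dimension check in the constructor are assumptions made for illustration only, and the real API in the PR may differ.

```java
import java.util.Objects;

// Illustrative sketch only; names other than ComputeRange, ThreadMesh,
// GlobalMesh1D/2D and LocalMesh2D are hypothetical and not taken from HAT.
final class ComputeRangeSketch {

    /** A triplet of thread counts; dimensions that are not used count as 1. */
    interface ThreadMesh {
        int x();
        int y();
        int z();
        int dims();

        default long totalThreads() {
            return (long) x() * y() * z();
        }
    }

    /** Hypothetical helper: a 1D mesh fixes y and z to 1. */
    interface Mesh1D extends ThreadMesh {
        default int y() { return 1; }
        default int z() { return 1; }
        default int dims() { return 1; }
    }

    /** Hypothetical helper: a 2D mesh fixes z to 1. */
    interface Mesh2D extends ThreadMesh {
        default int z() { return 1; }
        default int dims() { return 2; }
    }

    // The record accessors x() and y() satisfy the interface methods.
    record GlobalMesh1D(int x) implements Mesh1D {}
    record LocalMesh1D(int x) implements Mesh1D {}
    record GlobalMesh2D(int x, int y) implements Mesh2D {}
    record LocalMesh2D(int x, int y) implements Mesh2D {}

    /** Holds a global mesh and an optional (nullable) local mesh. */
    record ComputeRange(ThreadMesh global, ThreadMesh local) {
        ComputeRange {
            Objects.requireNonNull(global, "global");
            // Assumed check: global and local should agree on dimensionality.
            if (local != null && local.dims() != global.dims()) {
                throw new IllegalArgumentException("global and local mesh dimensions differ");
            }
        }

        /** No local mesh given: left as null so a runtime could pick defaults. */
        ComputeRange(ThreadMesh global) {
            this(global, null);
        }
    }

    public static void main(String[] args) {
        ComputeRange range1D = new ComputeRange(new GlobalMesh1D(1024));
        ComputeRange range2D = new ComputeRange(new GlobalMesh2D(64, 64), new LocalMesh2D(16, 16));
        System.out.println(range1D);
        System.out.println(range2D.global().totalThreads());
    }
}
```

Modelling each mesh as a record keeps the triplet immutable and gives value-based equality for free, which is one plausible reading of the "ThreadMesh moved to records implementation" commit; the subtyping used here to keep global and local meshes consistent is likewise only a guess at the intent of the corresponding commit.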