[code-reflection] RFR: [hat][proposal] ComputeRange and ThreadMesh API for defining 1D, 2D and 3D Ranges [v4]
duke
duke at openjdk.org
Thu Aug 14 08:49:30 UTC 2025
On Thu, 14 Aug 2025 08:38:41 GMT, Juan Fumero <duke at openjdk.org> wrote:
>> This PR proposes an extension of the HAT API to leverage 1D, 2D and 3D ranges for the compute context dispatch.
>> A `ComputeRange` is an entity that holds a global and a local thread mesh. In the future, we can add offsets to it.
>>
>> Each `ThreadMesh` is a triplet representing the number of threads for the x, y, and z dimensions.
>>
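The triplet idea can be illustrated with a minimal sketch. The names `MeshSketch`, `Mesh`, `of1D`, `of2D` and `totalThreads` are hypothetical and are not the HAT API; the sketch also assumes that unused dimensions default to 1, as is conventional in OpenCL and CUDA launches, which the PR description does not state.

```java
public class MeshSketch {
    // Hypothetical stand-in for a thread mesh: thread counts for x, y and z.
    record Mesh(int x, int y, int z) {
        Mesh {
            // Reject empty or negative dimensions up front.
            if (x < 1 || y < 1 || z < 1) {
                throw new IllegalArgumentException("each dimension must be >= 1");
            }
        }

        // Assumption: a 1D mesh is a triplet whose y and z extents are 1.
        static Mesh of1D(int x) {
            return new Mesh(x, 1, 1);
        }

        // Assumption: a 2D mesh is a triplet whose z extent is 1.
        static Mesh of2D(int x, int y) {
            return new Mesh(x, y, 1);
        }

        // Total number of threads described by the triplet.
        long totalThreads() {
            return (long) x * y * z;
        }
    }

    public static void main(String[] args) {
        Mesh mesh = Mesh.of2D(1024, 1024);
        System.out.println(mesh + " describes " + mesh.totalThreads() + " threads");
    }
}
```

Storing every mesh as a full triplet keeps 1D, 2D and 3D dispatches uniform on the backend side, at the cost of carrying unused extents.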
>> How to dispatch 1D kernels?
>>
>>
>>     ComputeRange range1D = new ComputeRange(new GlobalMesh1D(size));
>>     cc.dispatchKernel(range1D,
>>             kc -> myKernel(...));
>>
>>
>> How to dispatch 2D kernels?
>>
>>
>>     ComputeRange range2D = new ComputeRange(new GlobalMesh2D(size, size));
>>     cc.dispatchKernel(range2D,
>>             kc -> my2DKernel(...));
>>
>>
>> How to enable a local mesh?
>>
>> We pass a second parameter to the `ComputeRange` constructor to define the local mesh. If it is omitted, the local mesh is `null` and the HAT runtime can select a default set of values.
>>
>>
>>     ComputeRange computeRange = new ComputeRange(
>>             new GlobalMesh2D(globalSize, globalSize),
>>             new LocalMesh2D(16, 16));
>>     cc.dispatchKernel(computeRange,
>>             kc -> matrixMultiplyKernel2D(kc, matrixA, matrixB, matrixC, globalSize)
>>     );
>>
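As a rough illustration of how a global and a local mesh relate, the sketch below computes how many thread groups of a given local size are needed to cover a global size in one dimension. The helper names `groups` and `dividesEvenly` are hypothetical, and the PR description does not say how HAT itself maps these values onto OpenCL work-groups or CUDA blocks, so this is only the usual round-up arithmetic under that assumption.

```java
public class LocalMeshSketch {
    // Number of groups of `local` threads needed to cover `global` threads (rounds up).
    static int groups(int global, int local) {
        if (global < 1 || local < 1) {
            throw new IllegalArgumentException("sizes must be >= 1");
        }
        // Use long arithmetic so that global + local - 1 cannot overflow.
        return (int) (((long) global + local - 1) / local);
    }

    // True when the global size is an exact multiple of the local size.
    static boolean dividesEvenly(int global, int local) {
        return global % local == 0;
    }

    public static void main(String[] args) {
        int globalSize = 1024; // illustrative value, not taken from the PR
        int localSize = 16;    // matches the 16 used for each LocalMesh2D extent above
        int perDimension = groups(globalSize, localSize);
        System.out.println("groups per dimension: " + perDimension);
        System.out.println("groups in 2D: " + (long) perDimension * perDimension);
        System.out.println("even split: " + dividesEvenly(globalSize, localSize));
    }
}
```

When the global size is not a multiple of the local size, the last group is only partially used, so a kernel typically needs a bounds check on its global index.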
>>
>> In addition, this PR renames the internal `KernelContext` API, which maps the context ndrange object to native memory, to `KernelBufferContext`.
>>
>>
>> #### How to check?
>>
>>
>>     java @hat/run ffi-opencl matmul 1D
>>     java @hat/run ffi-opencl matmul 2D
>>
>>     java @hat/run ffi-cuda matmul 1D
>>     java @hat/run ffi-cuda matmul 2D
>
> Juan Fumero has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits:
>
> - Merge branch 'code-reflection' into hat/api/computerange
> - Merge branch 'code-reflection' into hat/api/computerange
> - [hat] BuildCallGraph refactored
> - [hat] Improve logging for thread-mesh in OpenCL
> - [hat][cuda] 1D to 3D mesh refactored for the CUDA Backend
> - [hat] ThreadMesh Buffer composed with iFace to set global and local thread-mesh
> - [hat][api] ThreadMesh moved to records implementation
> - [hat] Javadoc for the rest of the ComputeRange Class
> - [hat] Add threadmesh subtyping to keep consistency across dimensions between global and local
> - [hat] ThreadBlock dispatcher enabled for the CUDA backend
> - ... and 1 more: https://git.openjdk.org/babylon/compare/c5102d7e...31bee39b
@jjfumero
Your change (at version 31bee39b5c4a355cfec341d479ec007871af69ea) is now ready to be sponsored by a Committer.
-------------
PR Comment: https://git.openjdk.org/babylon/pull/516#issuecomment-3187524303