[code-reflection] RFR: [hat][proposal] ComputeRange and ThreadMesh API for defining 1D, 2D and 3D Ranges [v3]
Juan Fumero
duke at openjdk.org
Tue Aug 12 15:00:13 UTC 2025
> This PR proposes an extension of the HAT API to leverage 1D, 2D and 3D ranges for the compute context dispatch.
> A `ComputeRange` is an entity that holds a global and a local thread mesh. In the future, we can add offsets to it.
>
> Each `ThreadMesh` is a triplet representing the number of threads in the x, y, and z dimensions.
>
> How to dispatch 1D kernels?
>
>
>     ComputeRange range1D = new ComputeRange(new GlobalMesh1D(size));
>     cc.dispatchKernel(range1D,
>             kc -> myKernel(...));
>
>
> How to dispatch 2D kernels?
>
>
>     ComputeRange range2D = new ComputeRange(new GlobalMesh2D(size, size));
>     cc.dispatchKernel(range2D,
>             kc -> my2DKernel(...));
>
>
> How to enable a local mesh?
>
> We pass a second parameter to the `ComputeRange` constructor to define the local mesh. If it is not passed, the local mesh is `null` and the HAT runtime can select a default set of values.
>
>
>     ComputeRange computeRange = new ComputeRange(
>             new GlobalMesh2D(globalSize, globalSize),
>             new LocalMesh2D(16, 16));
>     cc.dispatchKernel(computeRange,
>             kc -> matrixMultiplyKernel2D(kc, matrixA, matrixB, matrixC, globalSize)
>     );
>
>
> In addition, this PR renames the internal `KernelContext` API, which maps the context ndrange object to native memory, to `KernelBufferContext`.
>
>
> #### How to check?
>
>
> java @hat/run ffi-opencl matmul 1D
> java @hat/run ffi-opencl matmul 2D
>
> java @hat/run ffi-cuda matmul 1D
> java @hat/run ffi-cuda matmul 2D
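
For readers new to the global/local split described above: in OpenCL and CUDA terms, the global mesh is the total number of threads per dimension, and the local mesh is the work-group (thread-block) size. The sketch below is not HAT code. `MeshMath` and `groupsPerDimension` are hypothetical names, and the value 1024 for `globalSize` is an assumption chosen for illustration. It only shows the arithmetic a backend would plausibly perform to tile a global mesh with a local one; how the HAT backends actually compute this is not stated in the PR description.

```java
// Hypothetical helper, not part of HAT: shows how a global mesh is tiled by a local mesh.
final class MeshMath {

    /** Number of groups (blocks) needed to cover `global` threads with groups of `local` threads. */
    static int groupsPerDimension(int global, int local) {
        if (global <= 0 || local <= 0) {
            throw new IllegalArgumentException("mesh sizes must be positive");
        }
        // Ceiling division, so a partial trailing group is still counted.
        return (global + local - 1) / local;
    }

    public static void main(String[] args) {
        int globalSize = 1024; // assumed problem size, for illustration only
        int groupsX = groupsPerDimension(globalSize, 16);
        int groupsY = groupsPerDimension(globalSize, 16);
        // prints "64 x 64 groups of 16 x 16 threads"
        System.out.println(groupsX + " x " + groupsY + " groups of 16 x 16 threads");
    }
}
```

When the global size is not a multiple of the local size, ceiling division launches more threads than elements, so a kernel would need a bounds check; some OpenCL versions instead require the global size to be an exact multiple of the local size.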
Juan Fumero has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits:
- Merge branch 'code-reflection' into hat/api/computerange
- [hat] BuildCallGraph refactored
- [hat] Improve logging for thread-mesh in OpenCL
- [hat][cuda] 1D to 3D mesh refactored for the CUDA Backend
- [hat] ThreadMesh Buffer composed with iFace to set global and local thread-mesh
- [hat][api] ThreadMesh moved to records implementation
- [hat] Javadoc for the rest of the ComputeRange Class
- [hat] Add threadmesh subtyping to keep consistency across dimensions between global and local
- [hat] ThreadBlock dispatcher enabled for the CUDA backend
- [hat][api] Proposal for ComputeRange and ThreadMesh
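
Two of the commits above mention moving `ThreadMesh` to records and using subtyping to keep the global and local meshes consistent in dimensionality. One way such a design could look is sketched below. This is an assumption, not the code from the PR: the marker interfaces `D1` and `D2`, the generic parameter on `ComputeRange`, and the wrapper class `MeshSketch` are all hypothetical, and the real HAT types may be organised quite differently.

```java
// A sketch only: this type layout is an assumption, not the implementation in this PR.
final class MeshSketch {

    // Hypothetical marker types that tag a mesh with its dimensionality.
    interface D1 {}
    interface D2 {}

    // A mesh is a triplet of thread counts; unused dimensions default to 1.
    interface ThreadMesh<D> {
        int x();
        default int y() { return 1; }
        default int z() { return 1; }
    }

    interface GlobalMesh<D> extends ThreadMesh<D> {}
    interface LocalMesh<D> extends ThreadMesh<D> {}

    record GlobalMesh1D(int x) implements GlobalMesh<D1> {}
    record GlobalMesh2D(int x, int y) implements GlobalMesh<D2> {}
    record LocalMesh2D(int x, int y) implements LocalMesh<D2> {}

    // Both meshes share the type parameter D, so mixing 1D and 2D is a compile error.
    record ComputeRange<D>(GlobalMesh<D> global, LocalMesh<D> local) {
        // No local mesh given: leave it null so a runtime could pick defaults.
        public ComputeRange(GlobalMesh<D> global) {
            this(global, null);
        }
    }

    public static void main(String[] args) {
        var range2D = new ComputeRange<>(new GlobalMesh2D(1024, 1024), new LocalMesh2D(16, 16));
        var range1D = new ComputeRange<>(new GlobalMesh1D(1024));
        // new ComputeRange<>(new GlobalMesh1D(1024), new LocalMesh2D(16, 16)); // rejected: D1 vs D2

        System.out.println(range2D.global().x() + " x " + range2D.global().y());
        System.out.println("1D local mesh set? " + (range1D.local() != null));
    }
}
```

The point of the shared type parameter is that a dimension mismatch between global and local meshes is caught by the compiler rather than at dispatch time.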
-------------
Changes: https://git.openjdk.org/babylon/pull/516/files
Webrev: https://webrevs.openjdk.org/?repo=babylon&pr=516&range=02
Stats: 851 lines in 23 files changed: 709 ins; 91 del; 51 mod
Patch: https://git.openjdk.org/babylon/pull/516.diff
Fetch: git fetch https://git.openjdk.org/babylon.git pull/516/head:pull/516
PR: https://git.openjdk.org/babylon/pull/516