[code-reflection] RFR: [hat][proposal] ComputeRange and ThreadMesh API for defining 1D, 2D and 3D Ranges [v3]

Tue Aug 12 15:00:13 UTC 2025

> This PR proposes an extension of the HAT API that lets compute-context dispatch use 1D, 2D and 3D ranges.
> A `ComputeRange` is an entity that holds a global and a local thread mesh. In the future, we can add offsets to it.
> 
> Each `ThreadMesh` is a triplet representing the number of threads in the x, y and z dimensions.
> 
> How to dispatch 1D kernels?
> 
> 
> ComputeRange range1D = new ComputeRange(new GlobalMesh1D(size));
> cc.dispatchKernel(range1D,
>                 kc -> myKernel(...));
> 
> 
> How to dispatch 2D kernels?
> 
> 
> ComputeRange range2D = new ComputeRange(new GlobalMesh2D(size, size));
> cc.dispatchKernel(range2D,
>                 kc -> my2DKernel(...));
> 
> 
> How to enable a local mesh?
> 
> We pass a second argument to the `ComputeRange` constructor to define the local mesh. If it is omitted, the local mesh is `null` and the HAT runtime can select a default set of values.
> 
> 
> ComputeRange computeRange = new ComputeRange(
>         new GlobalMesh2D(globalSize, globalSize), 
>         new LocalMesh2D(16, 16));
> cc.dispatchKernel(computeRange,
>                 kc -> matrixMultiplyKernel2D(kc, matrixA, matrixB, matrixC, globalSize));
> 
> 
> In addition, this PR renames the internal `KernelContext` API, which maps the context ndrange object to native memory, to `KernelBufferContext`.
> 
> 
> #### How to check? 
> 
> 
> java @hat/run ffi-opencl matmul 1D
> java @hat/run ffi-opencl matmul 2D
> 
> java @hat/run ffi-cuda matmul 1D
> java @hat/run ffi-cuda matmul 2D
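
To make the global/local relationship described in the quoted proposal concrete, here is a minimal, standalone sketch. It does not use the HAT classes: `ComputeRangeSketch`, `Mesh`, `Range`, `groups` and `ceilDiv` are hypothetical names invented for illustration. It assumes that unused dimensions default to 1, that a `null` local mesh means "let the runtime choose", and that the number of thread groups per dimension is the global size divided by the local size, rounded up. The real backends may instead require the global size to be a multiple of the local size.

```java
import java.util.Optional;

// Hypothetical model only; these are NOT the HAT API types.
final class ComputeRangeSketch {

    /** Thread counts for x, y and z; unused dimensions are assumed to be 1. */
    record Mesh(int x, int y, int z, int dims) {
        Mesh {
            if (dims < 1 || dims > 3) {
                throw new IllegalArgumentException("dims must be 1, 2 or 3");
            }
            if (x < 1 || y < 1 || z < 1) {
                throw new IllegalArgumentException("thread counts must be positive");
            }
        }
        static Mesh of1D(int x) { return new Mesh(x, 1, 1, 1); }
        static Mesh of2D(int x, int y) { return new Mesh(x, y, 1, 2); }
        static Mesh of3D(int x, int y, int z) { return new Mesh(x, y, z, 3); }
    }

    /** A required global mesh plus an optional local mesh (null = runtime decides). */
    record Range(Mesh global, Mesh local) {
        Range {
            if (global == null) {
                throw new IllegalArgumentException("global mesh is required");
            }
            // Keep global and local meshes consistent in dimensionality.
            if (local != null && local.dims() != global.dims()) {
                throw new IllegalArgumentException("local and global dims differ");
            }
        }

        /** Convenience constructor mirroring the one-argument form in the proposal. */
        Range(Mesh global) { this(global, null); }

        Optional<Mesh> localMesh() { return Optional.ofNullable(local); }

        /** Groups per dimension, using fallbackLocal when no local mesh was given. */
        Mesh groups(Mesh fallbackLocal) {
            Mesh l = localMesh().orElse(fallbackLocal);
            return new Mesh(
                    ceilDiv(global.x(), l.x()),
                    ceilDiv(global.y(), l.y()),
                    ceilDiv(global.z(), l.z()),
                    global.dims());
        }
    }

    /** Integer division rounded up, for positive operands. */
    static int ceilDiv(int a, int b) { return (a + b - 1) / b; }

    public static void main(String[] args) {
        Range range = new Range(Mesh.of2D(1024, 1024), Mesh.of2D(16, 16));
        System.out.println(range.groups(Mesh.of2D(1, 1)));
    }
}
```

With a 1024 x 1024 global mesh and a 16 x 16 local mesh, this model yields 64 x 64 groups. Which default local mesh the HAT runtime actually picks when none is supplied is not stated in the proposal, so the fallback is left as a caller-supplied argument here.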

Juan Fumero has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits:

 - Merge branch 'code-reflection' into hat/api/computerange
 - [hat] BuildCallGraph refactored
 - [hat] Improve logging for thread-mesh in OpenCL
 - [hat][cuda] 1D to 3D mesh refactored for the CUDA Backend
 - [hat] ThreadMesh Buffer composed with iFace to set global and local thread-mesh
 - [hat][api] ThreadMesh moved to records implementation
 - [hat] Javadoc for the rest of the ComputeRange Class
 - [hat] Add threadmesh subtyping to keep consistency across dimensions between global and local
 - [hat] ThreadBlock dispatcher enabled for the CUDA backend
 - [hat][api] Proposal for ComputeRange and ThreadMesh

-------------

Changes: https://git.openjdk.org/babylon/pull/516/files
  Webrev: https://webrevs.openjdk.org/?repo=babylon&pr=516&range=02
  Stats: 851 lines in 23 files changed: 709 ins; 91 del; 51 mod
  Patch: https://git.openjdk.org/babylon/pull/516.diff
  Fetch: git fetch https://git.openjdk.org/babylon.git pull/516/head:pull/516

PR: https://git.openjdk.org/babylon/pull/516