[code-reflection] RFR: HAT - New examples for optimizing matmul
Juan Fumero
jfumero at openjdk.org
Tue Sep 30 14:06:42 UTC 2025
This PR includes new examples and tests that show how HAT can optimize matrix multiplication (matmul). The examples use 2D cache + loop tiling and 2D register tiling.
Two implementations are provided: one that is specific to how CUDA handles threads, and another that is portable across CUDA and OpenCL. Both implementations can be tuned further, depending on the GPU card.
The goal is to show how matmul, or any other HAT kernel, can be tuned with the current building blocks of HAT. These examples make use of local/shared data structures, private data structures, and local/thread-block IDs to access data.
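For readers unfamiliar with the two techniques named above, here is a minimal CPU-only sketch in plain Java. It does not use the HAT API and is not the code in this PR. The class name, the method names, and the TILE and REG constants are hypothetical, chosen only for illustration. The tileA/tileB arrays stand in for local/shared memory, and the acc array stands in for the private (register) storage of one emulated thread.

```java
import java.util.Arrays;

// Illustrative sketch only: emulates 2D loop tiling plus 2D register tiling on the CPU.
// All names and sizes here are assumptions, not taken from the HAT sources.
final class TiledMatmulSketch {
    static final int TILE = 16; // edge of the tile staged in emulated shared memory
    static final int REG = 4;   // each emulated thread computes a REG x REG block of C

    // Reference implementation: C = A * B for square, row-major n x n matrices.
    static float[] naive(float[] a, float[] b, int n) {
        float[] c = new float[n * n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                float sum = 0f;
                for (int k = 0; k < n; k++) {
                    sum += a[i * n + k] * b[k * n + j];
                }
                c[i * n + j] = sum;
            }
        }
        return c;
    }

    // Tiled implementation; requires n to be a multiple of TILE (and TILE of REG).
    static float[] tiled(float[] a, float[] b, int n) {
        if (n % TILE != 0 || TILE % REG != 0) {
            throw new IllegalArgumentException("n must be a multiple of TILE, TILE of REG");
        }
        float[] c = new float[n * n];
        float[] tileA = new float[TILE * TILE]; // stands in for local/shared memory
        float[] tileB = new float[TILE * TILE];
        float[] acc = new float[REG * REG];     // stands in for private registers
        for (int bi = 0; bi < n; bi += TILE) {          // block row of C
            for (int bj = 0; bj < n; bj += TILE) {      // block column of C
                for (int bk = 0; bk < n; bk += TILE) {  // walk the shared dimension
                    // Stage one tile of A and one tile of B.
                    for (int i = 0; i < TILE; i++) {
                        for (int k = 0; k < TILE; k++) {
                            tileA[i * TILE + k] = a[(bi + i) * n + (bk + k)];
                            tileB[i * TILE + k] = b[(bk + i) * n + (bj + k)];
                        }
                    }
                    // Each (ti, tj) pair plays the role of one thread in the block.
                    for (int ti = 0; ti < TILE; ti += REG) {
                        for (int tj = 0; tj < TILE; tj += REG) {
                            Arrays.fill(acc, 0f);
                            for (int k = 0; k < TILE; k++) {
                                for (int r = 0; r < REG; r++) {
                                    float av = tileA[(ti + r) * TILE + k]; // reused REG times
                                    for (int s = 0; s < REG; s++) {
                                        acc[r * REG + s] += av * tileB[k * TILE + (tj + s)];
                                    }
                                }
                            }
                            // Write the partial results for this k-tile back to C.
                            for (int r = 0; r < REG; r++) {
                                for (int s = 0; s < REG; s++) {
                                    c[(bi + ti + r) * n + (bj + tj + s)] += acc[r * REG + s];
                                }
                            }
                        }
                    }
                }
            }
        }
        return c;
    }
}
```

On a GPU, the three outer loops map roughly to the thread-block grid and the loop over the shared dimension, while the (ti, tj) loops map to threads within a block. Suitable values for the tile and register-block sizes depend on the device, which is why both implementations in this PR can be tuned per GPU card.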
How to test?
HAT=SHOW_CODE java @hat/run ffi-opencl matmul 2DRTPORTABLE
HAT=SHOW_CODE java @hat/run ffi-cuda matmul 2DRTPORTABLE
-------------
Commit messages:
- [hat] Fix CUDA scheduler
- Merge branch 'code-reflection' into hat/mxm/opts
- [hat] Example of matmul with 2D register tiling
Changes: https://git.openjdk.org/babylon/pull/587/files
Webrev: https://webrevs.openjdk.org/?repo=babylon&pr=587&range=00
Stats: 732 lines in 9 files changed: 704 ins; 5 del; 23 mod
Patch: https://git.openjdk.org/babylon/pull/587.diff
Fetch: git fetch https://git.openjdk.org/babylon.git pull/587/head:pull/587
PR: https://git.openjdk.org/babylon/pull/587