[code-reflection] Integrated: HAT - New examples for optimizing matmul
Juan Fumero
jfumero at openjdk.org
Tue Sep 30 14:16:48 UTC 2025
On Tue, 30 Sep 2025 14:01:08 GMT, Juan Fumero <jfumero at openjdk.org> wrote:
> This PR includes new examples and tests that show how HAT could optimize matmuls. The examples use 2D cache + loop tiling and 2D register tiling.
>
> Two implementations are provided: one specific to how CUDA handles threads, and another that can be ported to both CUDA and OpenCL. Both implementations can be further tuned, depending on the GPU card.
>
> The goal is to show how matmul, or any other HAT kernel, can be tuned with the current building blocks of HAT. These examples make use of local/shared data structures, private data structures, and local/thread-block IDs to access data.
>
> How to test?
>
>
> HAT=SHOW_CODE java @hat/run ffi-opencl matmul 2DRTPORTABLE
>
> HAT=SHOW_CODE java @hat/run ffi-cuda matmul 2DRTPORTABLE
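
For readers unfamiliar with the two techniques named in the description above, here is a minimal plain-Java sketch of loop tiling combined with 2x2 register tiling. It is not the HAT code from this PR and uses no HAT API; the class name, method names, and the TILE value are assumptions made for illustration. On a GPU the TILE x TILE blocks would live in local/shared memory and the four accumulators in private memory; on the CPU the same loop structure only improves cache and register reuse.

```java
public class TiledMatmulSketch {
    // Edge length of a cache tile. Assumed value; real kernels tune this per device.
    static final int TILE = 16;

    // Reference implementation: c = a * b for row-major n x n matrices.
    static float[] naive(float[] a, float[] b, int n) {
        float[] c = new float[n * n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                float acc = 0f;
                for (int k = 0; k < n; k++) {
                    acc += a[i * n + k] * b[k * n + j];
                }
                c[i * n + j] = acc;
            }
        }
        return c;
    }

    // Loop tiling (outer three loops) plus 2x2 register tiling (inner loops).
    // Assumes n is a multiple of TILE, and TILE is even.
    static float[] tiled(float[] a, float[] b, int n) {
        if (n % TILE != 0) {
            throw new IllegalArgumentException("n must be a multiple of " + TILE);
        }
        float[] c = new float[n * n];
        for (int ti = 0; ti < n; ti += TILE) {
            for (int tj = 0; tj < n; tj += TILE) {
                for (int tk = 0; tk < n; tk += TILE) {
                    // Each (i, j) step computes a 2x2 block of C for this k-tile.
                    for (int i = ti; i < ti + TILE; i += 2) {
                        for (int j = tj; j < tj + TILE; j += 2) {
                            // Four accumulators held in locals ("registers").
                            float c00 = 0f, c01 = 0f, c10 = 0f, c11 = 0f;
                            for (int k = tk; k < tk + TILE; k++) {
                                // Each loaded value of A and B is reused twice.
                                float a0 = a[i * n + k];
                                float a1 = a[(i + 1) * n + k];
                                float b0 = b[k * n + j];
                                float b1 = b[k * n + j + 1];
                                c00 += a0 * b0;
                                c01 += a0 * b1;
                                c10 += a1 * b0;
                                c11 += a1 * b1;
                            }
                            // Add this k-tile's partial sums into C.
                            c[i * n + j] += c00;
                            c[i * n + j + 1] += c01;
                            c[(i + 1) * n + j] += c10;
                            c[(i + 1) * n + j + 1] += c11;
                        }
                    }
                }
            }
        }
        return c;
    }
}
```

The tiled version sums partial products in a different order than the naive one, so with arbitrary float inputs the two results can differ in the last bits; comparing them with a small tolerance is the safer check.
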
This pull request has now been integrated.
Changeset: 7d39deec
Author: Juan Fumero <jfumero at openjdk.org>
URL: https://git.openjdk.org/babylon/commit/7d39deec2e23a430e9d285345831da67cc5f81e6
Stats: 732 lines in 9 files changed: 704 ins; 5 del; 23 mod
HAT - New examples for optimizing matmul
-------------
PR: https://git.openjdk.org/babylon/pull/587