[code-reflection] Integrated: HAT - New examples for optimizing matmul

Tue Sep 30 14:16:48 UTC 2025

On Tue, 30 Sep 2025 14:01:08 GMT, Juan Fumero <jfumero at openjdk.org> wrote:

> This PR includes new examples and tests to show how HAT could optimize matmuls. The examples use 2D cache + loop tiling and 2D register tiling.
> 
> Two implementations are provided: one more specific to how CUDA handles threads, and another that can be ported to both CUDA and OpenCL. Both implementations can be further tuned, depending on the GPU card.
> 
> The goal is to show how matmul, or any other HAT kernel, can be tuned with the current building blocks of HAT. These examples make use of local/shared data structures, private data structures, and local/thread-block IDs to access data.
> 
> How to test? 
> 
> 
> HAT=SHOW_CODE java @hat/run ffi-opencl matmul 2DRTPORTABLE
> 
> HAT=SHOW_CODE java @hat/run ffi-cuda matmul 2DRTPORTABLE
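
For readers unfamiliar with the 2D register tiling mentioned in the description above, here is a minimal sketch of the idea in plain Java on the CPU. It is not the code from the PR and does not use the HAT API. The class name, method names, and the TILE constant are all hypothetical, and the small arrays only stand in for the private (per-thread) data a GPU kernel would use. Each iteration of the two outer loops plays the role of one GPU thread that computes a TILE x TILE block of the result.

```java
import java.util.Arrays;

// Illustrative CPU sketch only; names and TILE are assumptions, not HAT API.
final class RegisterTiledMatmul {
    // Hypothetical tile size: each "thread" computes a TILE x TILE block of C.
    static final int TILE = 4;

    // Naive reference: one output element per (i, j) pair.
    static float[] matmulNaive(float[] a, float[] b, int n) {
        float[] c = new float[n * n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                float sum = 0f;
                for (int k = 0; k < n; k++) {
                    sum += a[i * n + k] * b[k * n + j];
                }
                c[i * n + j] = sum;
            }
        }
        return c;
    }

    // 2D register tiling: keep a TILE x TILE accumulator block in small
    // private arrays, so each loaded value of A and B is reused TILE times.
    static float[] matmulRegisterTiled(float[] a, float[] b, int n) {
        if (n % TILE != 0) {
            throw new IllegalArgumentException("n must be a multiple of " + TILE);
        }
        float[] c = new float[n * n];
        for (int i0 = 0; i0 < n; i0 += TILE) {
            for (int j0 = 0; j0 < n; j0 += TILE) {
                float[] acc = new float[TILE * TILE]; // stand-in for private memory
                float[] regA = new float[TILE];
                float[] regB = new float[TILE];
                for (int k = 0; k < n; k++) {
                    // Load one column strip of A and one row strip of B.
                    for (int i = 0; i < TILE; i++) {
                        regA[i] = a[(i0 + i) * n + k];
                    }
                    for (int j = 0; j < TILE; j++) {
                        regB[j] = b[k * n + j0 + j];
                    }
                    // Outer product of the two strips updates the whole block.
                    for (int i = 0; i < TILE; i++) {
                        for (int j = 0; j < TILE; j++) {
                            acc[i * TILE + j] += regA[i] * regB[j];
                        }
                    }
                }
                // Write the finished block back to C.
                for (int i = 0; i < TILE; i++) {
                    for (int j = 0; j < TILE; j++) {
                        c[(i0 + i) * n + j0 + j] = acc[i * TILE + j];
                    }
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int n = 8;
        float[] a = new float[n * n];
        float[] b = new float[n * n];
        for (int i = 0; i < n * n; i++) {
            a[i] = i % 7;
            b[i] = (i % 5) - 2;
        }
        boolean same = Arrays.equals(matmulNaive(a, b, n), matmulRegisterTiled(a, b, n));
        System.out.println("tiled result matches naive: " + same);
    }
}
```

In a GPU version the two outer loops would presumably be replaced by thread and block indices, and a cache-tiled variant would additionally stage strips of A and B in local/shared memory before copying them into the private arrays. The best tile size depends on the device.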

This pull request has now been integrated.

Changeset: 7d39deec
Author:    Juan Fumero <jfumero at openjdk.org>
URL:       https://git.openjdk.org/babylon/commit/7d39deec2e23a430e9d285345831da67cc5f81e6
Stats:     732 lines in 9 files changed: 704 ins; 5 del; 23 mod

HAT - New examples for optimizing matmul

-------------

PR: https://git.openjdk.org/babylon/pull/587