[code-reflection] RFR: [hat] Flash-Attention example added [v3]

Juan Fumero jfumero at openjdk.org
Wed Jan 21 09:16:51 UTC 2026


> New example that includes self-attention in Java and HAT, and flash-attention in HAT using tiling and shared memory. It is a simplified version of the algorithm from the original paper, intended to showcase how to use these techniques within HAT.
> 
> This version uses a single head and FP32 arrays.
> 
> Running with the OpenCL backend:
> 
> 
> java -cp hat/job.jar hat.java run ffi-opencl flashattention
> 
> 
> For the CUDA backend:
> 
> 
> java -cp hat/job.jar hat.java run ffi-cuda flashattention

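For reference, the following is a minimal plain-Java sketch of what single-head FP32 self-attention computes (scaled dot-product scores, a row-wise softmax, and a weighted sum over V). It is not the HAT kernel from this patch; class, method, and parameter names are hypothetical and only illustrate the baseline the example starts from.

    // Minimal sketch, not part of the PR: single-head FP32 self-attention
    // over flat row-major float[] arrays of shape seqLen x headDim.
    public final class SelfAttentionSketch {

        // out[i] = sum_j softmax_j(q[i] . k[j] / sqrt(headDim)) * v[j]
        static void selfAttention(float[] q, float[] k, float[] v, float[] out,
                                  int seqLen, int headDim) {
            float scale = (float) (1.0 / Math.sqrt(headDim));
            float[] scores = new float[seqLen];
            for (int i = 0; i < seqLen; i++) {
                // Scores for query row i: dot(q[i], k[j]) scaled by 1/sqrt(d)
                float max = Float.NEGATIVE_INFINITY;
                for (int j = 0; j < seqLen; j++) {
                    float dot = 0f;
                    for (int d = 0; d < headDim; d++) {
                        dot += q[i * headDim + d] * k[j * headDim + d];
                    }
                    scores[j] = dot * scale;
                    max = Math.max(max, scores[j]);
                }
                // Numerically stable softmax over the row
                float sum = 0f;
                for (int j = 0; j < seqLen; j++) {
                    scores[j] = (float) Math.exp(scores[j] - max);
                    sum += scores[j];
                }
                // Weighted sum of V rows gives output row i
                for (int d = 0; d < headDim; d++) {
                    float acc = 0f;
                    for (int j = 0; j < seqLen; j++) {
                        acc += (scores[j] / sum) * v[j * headDim + d];
                    }
                    out[i * headDim + d] = acc;
                }
            }
        }
    }

The flash-attention variant in the PR reorganizes this same computation into tiles held in shared memory so the full seqLen x seqLen score matrix never needs to be materialized.
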
Juan Fumero has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:

 - Merge branch 'code-reflection' into hat/examples/flash-attention
 - [hat][example] code-formatted
 - minor change
 - merge with code-reflection branch
 - Merge branch 'code-reflection' into hat/examples/flash-attention
 - [hat][example] flash-attention improved. Speedups calculations fixed
 - [hat][example] flash-attention functions renamed to avoid collisions with CUDA functions
 - [hat][example] Simple flash-attention added

-------------

Changes: https://git.openjdk.org/babylon/pull/854/files
  Webrev: https://webrevs.openjdk.org/?repo=babylon&pr=854&range=02
  Stats: 624 lines in 8 files changed: 617 ins; 1 del; 6 mod
  Patch: https://git.openjdk.org/babylon/pull/854.diff
  Fetch: git fetch https://git.openjdk.org/babylon.git pull/854/head:pull/854

PR: https://git.openjdk.org/babylon/pull/854

