[babylon-docs:master] RFR: [article] Optimizing GPU Programs from Java using Babylon and HAT [v13]

Thu Jan 15 19:37:46 UTC 2026

On Thu, 15 Jan 2026 11:15:02 GMT, Juan Fumero <jfumero at openjdk.org> wrote:

>> [article] Optimizing GPU Programs from Java using Babylon and HAT
>
> Juan Fumero has updated the pull request incrementally with one additional commit since the last revision:
> 
>   [article][hat] clarification thread-indexing

site/articles/hat-matmul.md line 835:

> 833: contiguous elements of the the same row of the matrix (row-major).
> 834: HAT maps the parallel construct `kc.gix` to an equivalent parallel construct
> 835: of the underlying programming model, and our case, CUDA. 

Row-major order and column-major order refer to the _layout_ of a matrix's elements in memory. They do not describe how the elements are _accessed_ or _iterated_ over, such as by rows or by columns. How one translates a row and column into a memory index _depends_ on the layout, so the access pattern can affect performance if memory is not accessed linearly in regular patterns or, in the case of the GPU, linearly with the thread id.

It is important to state clearly upfront that the matrices' elements are represented in memory in row-major order, as many readers may not be aware of that. It then becomes much clearer why iterating over a row of a matrix, when done appropriately, is more efficient.
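
To illustrate the distinction between layout and access, here is a minimal plain-Java sketch. The class `LayoutDemo` and its methods are hypothetical names chosen for illustration. They are not part of HAT or of the article. The sketch assumes a dense 3x4 matrix stored in a flat array and shows the index each layout produces when walking along one row:

```java
// Illustrative sketch only: LayoutDemo and its methods are hypothetical,
// not part of HAT or the article under review.
final class LayoutDemo {
    // Row-major layout: elements of the same row are adjacent in memory.
    static int rowMajorIndex(int row, int col, int numCols) {
        return row * numCols + col;
    }

    // Column-major layout: elements of the same column are adjacent in memory.
    static int colMajorIndex(int row, int col, int numRows) {
        return col * numRows + row;
    }

    public static void main(String[] args) {
        int numRows = 3;
        int numCols = 4;
        // Walk along row 1, as consecutive loop iterations (or threads) would.
        for (int col = 0; col < numCols; col++) {
            System.out.printf("col=%d rowMajor=%d colMajor=%d%n",
                    col,
                    rowMajorIndex(1, col, numCols),
                    colMajorIndex(1, col, numRows));
        }
    }
}
```

The same row traversal touches consecutive indices (4, 5, 6, 7) under a row-major layout, but strided indices (1, 4, 7, 10) under a column-major layout. That is why the layout needs to be stated before making claims about which access pattern is efficient.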

site/articles/hat-matmul.md line 864:

> 862: row-major. Thus, for matrix A of our example, 
> 863: the following indexing applies for consecutive threads based on the 
> 864: previous kernel:

See prior comment.

-------------

PR Review Comment: https://git.openjdk.org/babylon-docs/pull/15#discussion_r2695686930
PR Review Comment: https://git.openjdk.org/babylon-docs/pull/15#discussion_r2695692606