[babylon-docs:master] RFR: [article] Optimizing GPU Programs from Java using Babylon and HAT [v13]
Juan Fumero
jfumero at openjdk.org
Fri Jan 16 10:09:40 UTC 2026
On Thu, 15 Jan 2026 19:32:05 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:
>> Juan Fumero has updated the pull request incrementally with one additional commit since the last revision:
>>
>> [article][hat] clarification thread-indexing
>
> site/articles/hat-matmul.md line 835:
>
>> 833: contiguous elements of the same row of the matrix (row-major).
>> 834: HAT maps the parallel construct `kc.gix` to an equivalent parallel construct
>> 835: of the underlying programming model, and in our case, CUDA.
>
> Row-major order and column-major order refer to the _layout_ of a matrix's elements in memory. They are not terms for _accessing_, or _iterating_ over, a matrix's elements, such as iterating over rows or columns. How one accesses the memory, translating from row and column to index, _depends_ on the order, and can therefore affect performance if memory is not accessed linearly in regular patterns or, in the case of the GPU, linearly with the thread id.
>
> It is important to state clearly up front that the matrix elements are represented in memory in row-major order, as many readers may not be aware of that. It then becomes much clearer why iterating over a row of a matrix, when done appropriately, is more efficient.
Thanks for the feedback. I just pushed a new version.
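
For anyone following the thread, the layout point quoted above can be illustrated with a minimal plain-Java sketch. This is not HAT code and is not taken from the article: the class and method names (`RowMajorSketch`, `rowMajorIndex`, `colMajorIndex`) and the 2x3 example matrix are assumptions made purely for illustration. It only shows how the translation from (row, col) to a flat index depends on the storage order.

```java
public class RowMajorSketch {
    // Hypothetical helper: flat index of (row, col) when the matrix is
    // stored in row-major order with numCols columns.
    static int rowMajorIndex(int row, int col, int numCols) {
        return row * numCols + col;
    }

    // Hypothetical helper: flat index of the same element if the matrix
    // were stored in column-major order with numRows rows.
    static int colMajorIndex(int row, int col, int numRows) {
        return col * numRows + row;
    }

    public static void main(String[] args) {
        int numRows = 2;
        int numCols = 3;
        // Row-major storage of the matrix [[1, 2, 3], [4, 5, 6]].
        float[] m = {1f, 2f, 3f, 4f, 5f, 6f};

        // Walking along row 1 touches flat indices 3, 4, 5 (stride 1).
        for (int col = 0; col < numCols; col++) {
            int i = rowMajorIndex(1, col, numCols);
            System.out.println("row 1, col " + col + " -> index " + i + ", value " + m[i]);
        }

        // Walking down column 1 touches flat indices 1, 4 (stride numCols).
        for (int row = 0; row < numRows; row++) {
            int i = rowMajorIndex(row, 1, numCols);
            System.out.println("row " + row + ", col 1 -> index " + i + ", value " + m[i]);
        }
    }
}
```

If the loop variable `col` is replaced by a thread index, neighbouring threads in the first loop read neighbouring memory locations, which is the linear-with-thread-id pattern the comment refers to. With a column-major layout the same row walk would have a stride of `numRows` instead.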
-------------
PR Review Comment: https://git.openjdk.org/babylon-docs/pull/15#discussion_r2697830840