[code-reflection] RFR: 1D Matrix Multiplication example for HAT [v2]
Gary Frost
gfrost at openjdk.org
Mon Apr 7 17:54:03 UTC 2025
On Sat, 18 Jan 2025 06:27:18 GMT, Juan Fumero <duke at openjdk.org> wrote:
>> Add new example for 1D Matrix Multiplication in HAT.
>>
>> ### How to test?
>>
>>
>> ## Compile
>> java --add-modules jdk.incubator.code --enable-preview --source 24 bld
>>
>> ## Run with the OpenCL Backend
>> java @bldr/hatrun ffi-opencl matmul
>>
>> ## Run with the CUDA Backend
>> java @bldr/hatrun ffi-ptx matmul
>>
>>
>> #### Note that the generated kernel for OpenCL contains a race condition:
>>
>>
>> __kernel void matrixMultiplyKernel(
>>     __global KernelContext_t *kc, __global F32Array_t* matrixA, __global F32Array_t* matrixB, __global F32Array_t* matrixC, int size
>> ){
>>     kc->x=get_global_id(0); // << Shared struct across all threads to store the thread-id
>>     if(kc->x<kc->maxX){
>>         for(int j = 0; j<size; j=j+1){
>>             float acc = (float)0;
>>             for(int k = 0; k<size; k=k+1){
>>                 acc=acc+matrixA->array[(long)(kc->x*size+k)]*matrixB->array[(long)(k*size+j)];
>>             }
>>             matrixC->array[(long)(kc->x*size+j)]=acc;
>>         }
>>     }
>>     return;
>> }
>>
>>
>> After applying a patch provided by Gary Frost to solve the race condition, it works.
>>
>> Patch:
>>
>>
>> diff --git a/hat/hat/src/main/java/hat/backend/c99codebuilders/C99HatKernelBuilder.java b/hat/hat/src/main/java/hat/backend/c99codebuilders/C99HatKernelBuilder.java
>> index ade90914d7e..2719fed31ed 100644
>> --- a/hat/hat/src/main/java/hat/backend/c99codebuilders/C99HatKernelBuilder.java
>> +++ b/hat/hat/src/main/java/hat/backend/c99codebuilders/C99HatKernelBuilder.java
>> @@ -26,7 +26,6 @@
>>
>>
>> import hat.buffer.Buffer;
>> -import hat.buffer.KernelContext;
>> import hat.callgraph.KernelCallGraph;
>> import hat.callgraph.KernelEntrypoint;
>> import hat.optools.FuncOpWrapper;
>> @@ -72,9 +71,13 @@ T typedefStructOrUnion(boolean isStruct, String name, Consumer<T> consumer) {
>>
>>
>> public final T scope() {
>> - return
>> - identifier("kc").rarrow().identifier("x").equals().globalId().semicolon().nl();
>> - //.identifier("kc").rarrow().identifier("maxX").equals().globalSize().semicolon().nl();
>> +
>> + identifier("KernelContext_t").space().identifier("mine").semicolon().nl();
>> + identifier("KernelContext_t").asterisk().space().identifier("kc").equals().ampersand().identifier("mine").semicolon().nl();
>> + identifier("kc").rarrow().identifier("x").equals().globalId().semicolon().nl();
>> + identifier("kc").rarrow().identifier("maxX").equals().identifie...
>
> Juan Fumero has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits:
>
> - Merge branch 'code-reflection' into dev/examples
> - Minor fix seq-comparison code
> - Merge with latest develop
> - Merge branch 'code-reflection' into dev/examples
> - Merge branch 'code-reflection' into dev/examples
> - Merge branch 'code-reflection' into dev/examples
> - MatrixMult example moved to matmul directory
> - Merge branch 'code-reflection' into dev/examples
> - Precision control error down to 1%
> - Matrix-Multiplication checks
> - ... and 1 more: https://git.openjdk.org/babylon/compare/ee3da036...cd3c7ce9
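
For readers following the quoted kernel: the race arises because every work-item writes its id into the same `kc->x` field in global memory, while the quoted patch gives each work-item its own `KernelContext_t`. The plain-Java sketch below is not HAT code and not part of the PR. It only mirrors the kernel's loop structure, with the row index `x` held as a per-iteration lambda parameter so that no index state is shared. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class MatMul1DSketch {
    // Multiplies two size x size matrices stored as flat row-major arrays.
    // Each parallel iteration stands in for one work-item and owns one row.
    static void multiply(float[] a, float[] b, float[] c, int size) {
        IntStream.range(0, size).parallel().forEach(x -> {
            // x is private to this iteration, unlike the shared kc->x
            // in the generated OpenCL kernel.
            for (int j = 0; j < size; j++) {
                float acc = 0f;
                for (int k = 0; k < size; k++) {
                    acc += a[x * size + k] * b[k * size + j];
                }
                c[x * size + j] = acc;
            }
        });
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4};
        float[] b = {5, 6, 7, 8};
        float[] c = new float[4];
        multiply(a, b, c, 2);
        System.out.println(Arrays.toString(c));
    }
}
```

Because each iteration writes only to its own row of `c`, the iterations can run in any order without interfering with one another.
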
Hi Sidney,
As Juan mentioned, I don't think we are there yet. But we have plans.
We need to add low-level primitives to HAT for matrix multiplication,
scans, etc., so that we can hand off to the vendor backends (GPU
drivers) without unnecessary copies.
Gary
On Mon, Apr 7, 2025 at 5:20 PM Juan Fumero <***@***.***> wrote:
> Hi @SidneyLann <https://github.com/SidneyLann>, I am not the core
> maintainer of Babylon. Probably Gary Frost can help you with your
> questions. From my view, I think you need to access shared memory and some
> synchronisation primitives to be able to perform reductions. I am not sure
> if this is implemented in HAT yet.
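
To illustrate why reductions call for shared memory and synchronisation, here is a minimal plain-Java sketch of a tree reduction. It is an assumption-laden illustration, not HAT code: the `local` array stands in for work-group shared memory, the inner loop stands in for the work-items of one group, and the end of each outer iteration marks where a GPU kernel would need a barrier. The class and method names are hypothetical.

```java
public class TreeReductionSketch {
    // Sums the input by repeatedly halving the active range.
    static float reduce(float[] input) {
        // Round the length up to a power of two; padding slots stay 0.
        int n = Integer.highestOneBit(Math.max(1, input.length));
        if (n < input.length) {
            n <<= 1;
        }
        float[] local = new float[n]; // stand-in for shared (local) memory
        System.arraycopy(input, 0, local, 0, input.length);

        for (int stride = n / 2; stride > 0; stride /= 2) {
            // Each id plays the role of one work-item in the group.
            for (int id = 0; id < stride; id++) {
                local[id] += local[id + stride];
            }
            // On a GPU a barrier would be required here, so that every
            // partial sum is visible before the next, smaller stride.
        }
        return local[0];
    }

    public static void main(String[] args) {
        System.out.println(reduce(new float[] {1, 2, 3, 4, 5}));
    }
}
```

Within one stride, each work-item reads a slot that no other work-item writes in that step, so the only ordering requirement is between strides. That requirement is what the barrier enforces.
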
-------------
PR Comment: https://git.openjdk.org/babylon/pull/276#issuecomment-2784126120