[code-reflection] RFR: HAT memory mallocs and copies improved [v2]
Gary Frost
gfrost at openjdk.org
Fri Dec 12 17:34:25 UTC 2025
On Fri, 12 Dec 2025 12:14:51 GMT, Juan Fumero <jfumero at openjdk.org> wrote:
>> Uses `MappableIface` annotations to tag the buffer accessor for OpenCL.
>> In addition, this PR reuses the buffers attached to each dispatch and copies only the data that is needed.
>>
>> For OpenCL and macOS, this PR improves end-to-end time in HAT (including data movement) by up to 30%.
>>
>> Buffer accessors are not available for CUDA, but the CUDA runtime in HAT minimizes transfers as well as CUDA allocations.
>>
>> How to test? All unit tests should pass:
>>
>>
>> java @hat/test-suite ffi-opencl
>>
>>
>> Run violajones:
>>
>>
>> HAT=MINIMIZE_COPIES java -cp hat/job.jar hat.java run ffi-opencl -Dheadless=true violajones
>>
>> Run GoL:
>>
>>
>> java @hat/run ffi-opencl life
>
> Juan Fumero has updated the pull request incrementally with one additional commit since the last revision:
>
> [hat] life-example revert control to RO
hat/backends/ffi/cuda/src/main/native/cpp/cuda_backend.cpp line 366:
> 364: CudaBackend::CudaBuffer *CudaBackend::getOrCreateBuffer(BufferState *bufferState, u8_t accessor) {
> 365: CudaBuffer *cudaBuffer = nullptr;
> 366: if (bufferState->vendorPtr == nullptr || bufferState->state == BufferState::NEW_STATE) {
So, as I understand it, we are embedding the access mode (RO|WO etc.) in the buffer:
we get this access from the arg slot info and then store it in the memory segment.
What if we pass this buffer to another kernel whose access is different?
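
To make the question concrete, here is a minimal sketch of one way the cached buffer could be reused while the access mode is refreshed on every dispatch. Everything in it is hypothetical: `CachedBuffer`, `getOrCreate`, the `RO`/`WO`/`RW` values and the copy rules (RO implies copy-in only, WO implies copy-out only) are assumptions for illustration, not the HAT code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; none of these names are taken from the HAT sources.
enum Access : std::uint8_t { RO = 1, WO = 2, RW = 3 };

struct CachedBuffer {
    void *devicePtr = nullptr;  // plays the role of bufferState->vendorPtr
    std::uint8_t access = 0;    // access mode of the most recent dispatch
    int allocations = 0;        // counts allocations, for illustration only
};

// Reuse the device allocation when one exists, but take the access mode from
// the current dispatch instead of freezing it when the buffer is created.
CachedBuffer *getOrCreate(CachedBuffer &cache, std::uint8_t accessor) {
    assert(accessor == RO || accessor == WO || accessor == RW);
    static char fakeDeviceMemory[64];  // pretend device memory
    if (cache.devicePtr == nullptr) {
        cache.devicePtr = fakeDeviceMemory;
        cache.allocations++;
    }
    cache.access = accessor;  // refreshed per dispatch
    return &cache;
}

// Assumed copy rules: the kernel reads the buffer -> copy host to device
// before launch; the kernel writes the buffer -> copy device to host after.
bool needsCopyIn(const CachedBuffer &b) { return (b.access & RO) != 0; }
bool needsCopyOut(const CachedBuffer &b) { return (b.access & WO) != 0; }
```

With this shape, a buffer first dispatched as RO and then handed to a kernel that declares it WO keeps the same allocation but switches from copy-in to copy-out. If the mode were stored only at creation time, the second kernel would inherit the stale RO mode.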
-------------
PR Review Comment: https://git.openjdk.org/babylon/pull/746#discussion_r2615045605