[code-reflection] RFR: HAT memory mallocs and copies improved [v2]

Fri Dec 12 17:34:25 UTC 2025

On Fri, 12 Dec 2025 12:14:51 GMT, Juan Fumero <jfumero at openjdk.org> wrote:

>> Use of `MappableIface` annotations to tag the buffer accessor for OpenCL. 
>> In addition, this PR reuses the buffers attached to each dispatch and copies only the data that is needed. 
>> 
>> For OpenCL and macOS, this PR improves HAT performance by up to 30% in end-to-end time (including data movement).
>> 
>> Buffer accessors are not available for CUDA, but the CUDA runtime in HAT minimizes the transfers as well as CUDA allocations. 
>> 
>> How to test? All unit-tests should be passing:
>> 
>> 
>> java @hat/test-suite ffi-opencl
>> 
>> 
>> Run violajones:
>> 
>> 
>> HAT=MINIMIZE_COPIES java -cp hat/job.jar hat.java run ffi-opencl -Dheadless=true violajones
>> 
>> Run GoL:
>> 
>> 
>> java @hat/run ffi-opencl life
>
> Juan Fumero has updated the pull request incrementally with one additional commit since the last revision:
> 
>   [hat] life-example revert control to RO

hat/backends/ffi/cuda/src/main/native/cpp/cuda_backend.cpp line 366:

> 364: CudaBackend::CudaBuffer *CudaBackend::getOrCreateBuffer(BufferState *bufferState, u8_t accessor) {
> 365:     CudaBuffer *cudaBuffer = nullptr;
> 366:     if (bufferState->vendorPtr == nullptr || bufferState->state == BufferState::NEW_STATE) {

So I understand we are embedding the access (RO|WO etc.) in the buffer...

We get this access from the arg slot info (RO|WO etc.) and then put it in the memory segment.

What if we pass this to another kernel whose access is different?
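
To make the concern concrete, here is a minimal sketch of one way the two pieces of information could be kept apart: the device buffer is cached per segment, while the access mode is read per dispatch. Everything in it (`Access`, `SegmentState`, `DeviceBuffer`, `DispatchPlan`, `prepare`) is a hypothetical stand-in and not the HAT or CUDA backend API, and the copy rules (RO copies in, WO copies out) are an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical access bits; the real HAT constants may differ.
enum Access : std::uint8_t { RO = 1, WO = 2, RW = RO | WO };

// Hypothetical stand-in for a device-side allocation.
struct DeviceBuffer {
    std::size_t bytes;
    std::uint8_t createdWith;  // access seen when the buffer was first created
};

// Hypothetical stand-in for the per-segment state shared across dispatches.
struct SegmentState {
    std::unique_ptr<DeviceBuffer> cached;  // reused by every later dispatch
};

// What a single dispatch should do with the buffer.
struct DispatchPlan {
    DeviceBuffer *buffer;
    bool copyToDevice;    // assumed: needed when the kernel reads the data
    bool copyFromDevice;  // assumed: needed when the kernel writes the data
};

// Reuse the cached buffer, but decide copies from the access of THIS dispatch,
// not from the access recorded when the buffer was created.
DispatchPlan prepare(SegmentState &state, std::size_t bytes, std::uint8_t access) {
    if (!state.cached) {
        state.cached = std::make_unique<DeviceBuffer>(DeviceBuffer{bytes, access});
    }
    assert(state.cached->bytes == bytes);  // sketch only: no resizing
    return DispatchPlan{state.cached.get(),
                        (access & RO) != 0,
                        (access & WO) != 0};
}
```

In this sketch, calling `prepare(state, n, RO)` for one kernel and `prepare(state, n, WO)` for a second kernel returns the same buffer both times, but with different copy decisions. If the copies were instead derived from `createdWith`, the second kernel would inherit the first kernel's access, which is the case the question above is asking about.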

-------------

PR Review Comment: https://git.openjdk.org/babylon/pull/746#discussion_r2615045605