Error running HAT with ComputeAorta on CPUs (Codeplay)

Juan Fumero juan.fumero at
Thu Jan 23 13:00:56 UTC 2025

Hi all,
   There seems to be an error when running with the Codeplay OCK (oneAPI Construction Kit) implementation:

$ java @bldr/hatrun ffi-opencl matmul
Note: /home/juan/babylon/babylon/hat/bldr/ uses preview features of Java SE 24.
Note: Recompile with -Xlint:preview for details.
   CL_PLATFORM_VENDOR.."Codeplay Software Ltd."
   CL_PLATFORM_VERSION."OpenCL 3.0 ComputeAorta 4.0.0 Linux x86_64 (Release, 5be5a8da)"
         CL_DEVICE_TYPE..................... (0x73d821d10650)
         CL_DEVICE_MAX_COMPUTE_UNITS........ 0
         CL_DEVICE_MAX_WORK_GROUP_SIZE...... 127372130793944
         CL_DEVICE_MAX_MEM_ALLOC_SIZE....... 127372117500240
         CL_DEVICE_GLOBAL_MEM_SIZE.......... 127372117477136
         CL_DEVICE_LOCAL_MEM_SIZE........... 984
         CL_DEVICE_VERSION.................. [!s
         CL_DEVICE_OPENCL_C_VERSION......... `K!s
         CL_DEVICE_NAME..................... 5
         CL_DEVICE_BUILT_IN_KERNELS......... c!s

The device values are not read correctly, and the kernel launch then fails. With the Intel Compute Runtime, the same program runs fine.
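One possible reading of the values above, not confirmed in this report: written in hexadecimal, the four large size values all begin with 0x73d82, the same prefix as the address-like CL_DEVICE_TYPE value (0x73d821d10650). They therefore look more like leftover memory than device properties. A defensive way to read device info is to check both the status code and the byte count (param_value_size_ret) that clGetDeviceInfo reports before decoding the buffer. Per the OpenCL specification, CL_DEVICE_MAX_COMPUTE_UNITS is a 4-byte cl_uint, CL_DEVICE_MAX_WORK_GROUP_SIZE is a size_t (8 bytes on x86_64), and the memory sizes are 8-byte cl_ulong values. The Java sketch below shows that check on simulated buffers only. It does not call OpenCL, and the class and method names (DeviceInfoDecode, decodeNumber, decodeString) are hypothetical, not part of HAT.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not HAT code: decodes simulated clGetDeviceInfo result buffers.
final class DeviceInfoDecode {

    // Result widths on a 64-bit host: cl_uint = 4, size_t = 8, cl_ulong = 8.
    static final int CL_UINT = 4;
    static final int SIZE_T = 8;
    static final int CL_ULONG = 8;

    // Decodes a numeric result only if the query succeeded (status 0 == CL_SUCCESS)
    // and the driver reported exactly the expected width in param_value_size_ret.
    static long decodeNumber(int status, byte[] raw, long sizeRet, int expectedWidth) {
        if (status != 0) {
            throw new IllegalStateException("clGetDeviceInfo failed with status " + status);
        }
        if (sizeRet != expectedWidth || raw.length < expectedWidth) {
            throw new IllegalStateException(
                    "expected " + expectedWidth + " bytes, driver reported " + sizeRet);
        }
        // x86_64 is little-endian, so the simulated buffers use that byte order.
        ByteBuffer buf = ByteBuffer.wrap(raw, 0, expectedWidth).order(ByteOrder.LITTLE_ENDIAN);
        return expectedWidth == CL_UINT ? Integer.toUnsignedLong(buf.getInt()) : buf.getLong();
    }

    // Decodes a char[] result; param_value_size_ret includes the terminating NUL byte.
    static String decodeString(int status, byte[] raw, long sizeRet) {
        if (status != 0) {
            throw new IllegalStateException("clGetDeviceInfo failed with status " + status);
        }
        if (sizeRet < 1 || sizeRet > raw.length) {
            throw new IllegalStateException("invalid string size " + sizeRet);
        }
        int len = (int) sizeRet - 1; // drop the trailing NUL
        return new String(raw, 0, len, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Simulated driver output mirroring the clinfo values above.
        byte[] computeUnits = ByteBuffer.allocate(CL_UINT)
                .order(ByteOrder.LITTLE_ENDIAN).putInt(32).array();
        byte[] workGroup = ByteBuffer.allocate(SIZE_T)
                .order(ByteOrder.LITTLE_ENDIAN).putLong(1024).array();
        byte[] name = "ComputeAorta x86_64\0".getBytes(StandardCharsets.US_ASCII);

        System.out.println("CL_DEVICE_MAX_COMPUTE_UNITS = "
                + decodeNumber(0, computeUnits, CL_UINT, CL_UINT));
        System.out.println("CL_DEVICE_MAX_WORK_GROUP_SIZE = "
                + decodeNumber(0, workGroup, SIZE_T, SIZE_T));
        System.out.println("CL_DEVICE_NAME = " + decodeString(0, name, name.length));

        // A failed query (CL_INVALID_VALUE is -30) is rejected instead of printing stale bytes.
        try {
            decodeNumber(-30, new byte[CL_ULONG], 0, CL_ULONG);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting a failed or mis-sized query outright, rather than printing whatever the buffer holds, would make the problem visible when the device is queried instead of later at kernel launch.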

Output from clinfo:

  Platform Name                                   ComputeAorta
Number of devices                                 1
  Device Name                                     ComputeAorta x86_64
  Device Vendor                                   Codeplay Software Ltd.
  Device Vendor ID                                0x10004
  Device Version                                  OpenCL 3.0 ComputeAorta 4.0.0 LLVM 18.1.8
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  4.0
  Device OpenCL C Version                         OpenCL C 1.2 Clang 18.1.8
  Device OpenCL C all versions                    OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_generic_address_space                                 0xc00000 (3.0.0)
                                                  __opencl_c_subgroups                                             0xc00000 (3.0.0)
                                                  __opencl_c_work_group_collective_functions                       0xc00000 (3.0.0)
                                                  __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                  __opencl_c_fp64                                                  0xc00000 (3.0.0)
  Latest comfornace test passed                   v2020-10-18-08
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               32
  Max clock frequency                             5260MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             1024
  Preferred work group size multiple (device)     1
  Preferred work group size multiple (kernel)     1024
  Max sub-groups per work group                   1024
  Sub-group sizes (Intel)                         8, 4, 16, 32, 1
  Preferred / native vector sizes
    char                                                16 / 16
    short                                                8 / 8
    int                                                  4 / 4
    long                                                 2 / 2
    half                                                 0 / 0        (n/a)
    float                                                4 / 4
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)

Kind regards,

