[External] : Re: Error running HAT with ComputeAorta on CPUs (Codeplay)

Juan Fumero juan.fumero at manchester.ac.uk
Fri Jan 24 13:54:16 UTC 2025


This is all I get from that server:

```
             dim[23674] = 0
             dim[23675] = 0
             dim[23676] = 0
             dim[23677] = 0
             dim[23678] = 0
             dim[23679] = 0
             dim[23680] = 0
             dim[23681] = 0
             dim[23682] = 0
             dim[23683] = 0
             dim[23684] = 0
             dim[23685] = 0
             dim[23686] = 0
             dim[23687] = 0
             dim[23688] = 0
             dim[23689] = 0
             dim[23690] = 0
             dim[23691] = 0
    var.store %24 %50 @loc="44:43";
    branch ^block_4;

  ^block_7:
    %51 : hat.buffer.F32Array = var.load %8 @loc="47:17";
    %52 : hat.KernelContext = var.load %5 @loc="47:31";
    %53 : int = field.load %52 @"hat.KernelContext::x()int" @loc="47:31";
    %54 : int = var.load %9 @loc="47:38";
    %55 : int = mul %53 %54 @loc="47:31";
    %56 : int = var.load %16 @loc="47:45";
    %57 : int = add %55 %56 @loc="47:31";
    %58 : long = conv %57 @loc="47:17";
    %59 : float = var.load %22 @loc="47:48";
    invoke %51 %58 %59 @"hat.buffer.F32Array::array(long, float)void" @loc="47:17";
    branch ^block_8;

  ^block_8:
    %60 : int = var.load %16 @loc="42:39";
    %61 : int = constant @"1" @loc="42:39";
    %62 : int = add %60 %61 @loc="42:39";
    var.store %16 %62 @loc="42:39";
    branch ^block_2;

  ^block_9:
    branch ^block_11;

  ^block_10:
    branch ^block_11;

  ^block_11:
    return @loc="39:5";
};
#define NDRANGE_OPENCL
#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
#pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics : enable
#ifndef NULL
#define NULL 0
#endif
#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
#pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics : enable
typedef char s8_t;
typedef char byte;
typedef char boolean;
typedef unsigned char u8_t;
typedef short s16_t;
typedef unsigned short u16_t;
typedef unsigned int u32_t;
typedef int s32_t;
typedef float f32_t;
typedef long s64_t;
typedef unsigned long u64_t;
typedef struct KernelContext_s{
    int x;
    int maxX;
}KernelContext_t;
typedef struct F32Array_s{
    int length;
    float array[1];
}F32Array_t;



__kernel void matrixMultiplyKernel(
    __global KernelContext_t *global_kc, __global F32Array_t* matrixA, __global F32Array_t* matrixB, __global F32Array_t* matrixC, int size
){
    KernelContext_t mine;
    KernelContext_t* kc=&mine;
    kc->x=get_global_id(0);
    kc->maxX=global_kc->maxX;
    if(kc->x<kc->maxX){
        for(int j = 0; j<size; j=j+1){
            float acc = (float)0;
            for(int k = 0; k<size; k=k+1){
                acc=acc+matrixA->array[(long)(kc->x*size+k)]*matrixB->array[(long)(k*size+j)];
            }
            matrixC->array[(long)(kc->x*size+j)]=acc;
        }
    }
    return;
}

clCreateProgramWithSource failed
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007adaa0f5c2f0, pid=710525, tid=710526
#
# JRE version: OpenJDK Runtime Environment (24.0) (build 24-internal-adhoc.juan.babylon)
# Java VM: OpenJDK 64-Bit Server VM (24-internal-adhoc.juan.babylon, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libopencl_backend.so+0x62f0]  programOK+0x1c
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/juan/babylon/babylon/hat/core.710525)
#
# An error report file with more information is saved as:
# /home/juan/babylon/babylon/hat/hs_err_pid710525.log
[1.332s][warning][os] Loading hsdis library failed
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
```

There are thousands of `dim[number] = 0` messages.

My take is that there is a problem in the runtime reading those values, and then, when the kernel is launched (which I guess tunes the thread scheduler based on the maximum number of threads per dimension), it fails. This only happens with OCK (oneAPI Construction Kit).
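A minimal sketch of the kind of launch-size computation being guessed at here, assuming a 1-D range as in `matrixMultiplyKernel`. The `kc->x < kc->maxX` guard in the generated kernel means the global size can safely be padded past `maxX`. The function names and the clamping policy are hypothetical, not HAT's actual scheduler; the point is only that the local size is derived from the device limits, so limits that were read back as zero or garbage leave no valid launch geometry.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not HAT's actual scheduler): pick a 1-D work-group
 * size that respects the device limits. Returns 0 when the limits are
 * unusable, e.g. when a garbled device query reported zero.
 */
static size_t choose_local_size(size_t preferred, size_t max_work_group_size,
                                size_t max_work_item_size0) {
    if (preferred == 0 || max_work_group_size == 0 || max_work_item_size0 == 0) {
        return 0;
    }
    size_t local = preferred;
    if (local > max_work_group_size) local = max_work_group_size;
    if (local > max_work_item_size0) local = max_work_item_size0;
    return local;
}

/*
 * Round the global size up to a multiple of the local size. The extra
 * work-items do nothing because the kernel guards with kc->x < kc->maxX.
 */
static size_t padded_global_size(size_t range, size_t local) {
    if (local == 0) return 0; /* no valid launch geometry */
    return ((range + local - 1) / local) * local;
}
```

For example, a range of 1000 with a preferred local size of 256 on a device reporting 1024 for both limits launches 1024 work-items in groups of 256, while a device reporting 0 for its first work-item size yields no geometry at all.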

Juan


________________________________
From: Gary Frost <gary.frost at oracle.com>
Sent: 24 January 2025 12:54
To: Juan Fumero <juan.fumero at manchester.ac.uk>; babylon-dev at openjdk.org <babylon-dev at openjdk.org>
Subject: Re: [External] : Re: Error running HAT with ComputeAorta on CPUs (Codeplay)


Juan,

Thanks for this, it does indeed help.

So I think the OpenCL compile failed (which is clearly bad 😉), and I did discover that there is a 'crash route' if the compile fails but OpenCL fails to return a build log.

After reporting the clCreateProgramWithSource failure to stderr, the backend should have dumped both the OpenCL source and the log, but still returned a pointer (cast to long) to an OpenCL program (a C++ object instance). It seems to have returned a bad pointer which, when recast to an OpenCL program instance (when we try to validate the program from the Java side), causes the crash.
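A minimal sketch of the contract described above: the native side always hands Java a valid pointer (cast to a 64-bit integer, like the `long` seen from Java), even when program creation fails, and `programOK` reads a status flag instead of trusting whatever came back. `program_wrapper_t`, `make_program_handle`, `program_ok` and `release_program_handle` are hypothetical names for illustration, not the real libopencl_backend code, and the OpenCL calls are left out so the sketch compiles without an OpenCL SDK. Note also that in the OpenCL API `clCreateProgramWithSource` only creates the program object; compilation happens in `clBuildProgram`, and a build log (`clGetProgramBuildInfo` with `CL_PROGRAM_BUILD_LOG`) only exists for a program object on which a build was attempted, so a failure at the create step has no build log to dump.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper; the real backend's C++ class will differ. */
typedef struct {
    int ok;        /* 1 if the program was created and built */
    char *log;     /* source/build log to report, may be NULL */
    void *program; /* would hold the cl_program handle; unused here */
} program_wrapper_t;

/*
 * Always return a valid handle (pointer cast to int64_t). A failed
 * create/build is recorded in `ok` instead of being signalled by an
 * uninitialised pointer.
 */
static int64_t make_program_handle(int created_ok, const char *log) {
    program_wrapper_t *w = calloc(1, sizeof *w);
    if (w == NULL) {
        return 0; /* out of memory: the caller must treat 0 as "no program" */
    }
    w->ok = created_ok;
    if (log != NULL) {
        w->log = malloc(strlen(log) + 1);
        if (w->log != NULL) strcpy(w->log, log);
    }
    return (int64_t)(intptr_t)w;
}

/* Validation as called from the Java side: never dereference a null handle. */
static int program_ok(int64_t handle) {
    if (handle == 0) return 0;
    const program_wrapper_t *w = (const program_wrapper_t *)(intptr_t)handle;
    return w->ok;
}

/* Free the wrapper and its copied log. */
static void release_program_handle(int64_t handle) {
    program_wrapper_t *w = (program_wrapper_t *)(intptr_t)handle;
    if (w == NULL) return;
    free(w->log);
    free(w);
}
```

With this shape, a failed create still produces a non-zero handle whose `program_ok` is 0, so the Java side can report the failure instead of crashing in `programOK`.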

Do you see a dump of the source and/or the log from the compile failure on stderr?

I would definitely like to see the OpenCL... at least.

Gary








________________________________
From: Juan Fumero <juan.fumero at manchester.ac.uk>
Sent: Friday, January 24, 2025 5:00 AM
To: Gary Frost <gary.frost at oracle.com>; babylon-dev at openjdk.org <babylon-dev at openjdk.org>
Subject: Re: [External] : Re: Error running HAT with ComputeAorta on CPUs (Codeplay)

Hi Gary,
   Yes, I do have the logs (see attached file).

Juan
________________________________
From: Gary Frost <gary.frost at oracle.com>
Sent: 23 January 2025 16:10
To: Juan Fumero <juan.fumero at manchester.ac.uk>; babylon-dev at openjdk.org <babylon-dev at openjdk.org>
Subject: Re: [External] : Re: Error running HAT with ComputeAorta on CPUs (Codeplay)

There should also be a log generated when the VM 'tumbled' 😉, which will help me map symbols.

Do you still have that?
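As a worked example of that mapping, using only the numbers in the two crash reports in this thread: subtracting the library offset (`libopencl_backend.so+0x62f0`) from the faulting pc gives the load address of the library in each run, and subtracting the `+0x1c` from the library offset gives where `programOK` starts. The helper names below are illustrative only.

```c
#include <stdint.h>

/*
 * Load address of a shared library in a given run:
 * faulting pc minus the offset reported as "libopencl_backend.so+0x...".
 */
static uint64_t library_load_base(uint64_t pc, uint64_t offset_in_library) {
    return pc - offset_in_library;
}

/*
 * Offset of the function's first instruction inside the library:
 * library offset minus the "+0x..." reported after the symbol name.
 */
static uint64_t function_start_offset(uint64_t offset_in_library, uint64_t offset_in_function) {
    return offset_in_library - offset_in_function;
}
```

The two runs give load addresses 0x7adaa0f56000 and 0x705e5af56000 (different, as expected with address-space randomisation) but the same offset 0x62f0, with `programOK` starting at offset 0x62d4, which suggests the same instruction faults each time. Given a build of libopencl_backend.so with debug info, `addr2line -f -e libopencl_backend.so 0x62f0` should resolve that offset to a function and source line.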

Gary






________________________________
From: Juan Fumero <juan.fumero at manchester.ac.uk>
Sent: Thursday, January 23, 2025 3:07 PM
To: Gary Frost <gary.frost at oracle.com>; babylon-dev at openjdk.org <babylon-dev at openjdk.org>
Subject: [External] : Re: Error running HAT with ComputeAorta on CPUs (Codeplay)

It crashed after clCreateProgramWithSource failed.

```
clCreateProgramWithSource failed
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000705e5af5c2f0, pid=405337, tid=405338
#
# JRE version: OpenJDK Runtime Environment (24.0) (build 24-internal-adhoc.juan.babylon)
# Java VM: OpenJDK 64-Bit Server VM (24-internal-adhoc.juan.babylon, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libopencl_backend.so+0x62f0]  programOK+0x1c
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/juan/babylon/babylon/hat/core.405337)
#
# An error report file with more information is saved as:
# /home/juan/babylon/babylon/hat/hs_err_pid405337.log
[1.306s][warning][os] Loading hsdis library failed
```

I guess the program continued, and when it tried to launch the kernel, it failed due to wrong thread scheduling?

Juan


________________________________
From: Gary Frost <gary.frost at oracle.com>
Sent: 23 January 2025 16:00
To: Juan Fumero <juan.fumero at manchester.ac.uk>; babylon-dev at openjdk.org <babylon-dev at openjdk.org>
Subject: Re: Error running HAT with ComputeAorta on CPUs (Codeplay)

Thanks for the heads-up, Juan.


What error did you get? Did it crash in the native wrapper?

Gary




________________________________
From: babylon-dev <babylon-dev-retn at openjdk.org> on behalf of Juan Fumero <juan.fumero at manchester.ac.uk>
Sent: Thursday, January 23, 2025 1:00 PM
To: babylon-dev at openjdk.org <babylon-dev at openjdk.org>
Subject: Error running HAT with ComputeAorta on CPUs (Codeplay)

Hi all,
   It seems there is an error when running with the Codeplay OCK implementation:

https://github.com/uxlfoundation/oneapi-construction-kit

```
$ java @bldr/hatrun ffi-opencl matmul
Note: /home/juan/babylon/babylon/hat/bldr/Bldr.java uses preview features of Java SE 24.
Note: Recompile with -Xlint:preview for details.
platform{
   CL_PLATFORM_VENDOR.."Codeplay Software Ltd."
   CL_PLATFORM_VERSION."OpenCL 3.0 ComputeAorta 4.0.0 Linux x86_64 (Release, 5be5a8da)"
   CL_PLATFORM_NAME...."ComputeAorta"
         CL_DEVICE_TYPE..................... (0x73d821d10650)
         CL_DEVICE_MAX_COMPUTE_UNITS........ 0
         CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS. 0
         CL_DEVICE_MAX_WORK_GROUP_SIZE...... 127372130793944
         CL_DEVICE_MAX_MEM_ALLOC_SIZE....... 127372117500240
         CL_DEVICE_GLOBAL_MEM_SIZE.......... 127372117477136
         CL_DEVICE_LOCAL_MEM_SIZE........... 984
         CL_DEVICE_PROFILE..................
         CL_DEVICE_VERSION.................. [!s
         CL_DRIVER_VERSION..................
         CL_DEVICE_OPENCL_C_VERSION......... `K!s
         CL_DEVICE_NAME..................... 5
         CL_DEVICE_EXTENSIONS...............
         CL_DEVICE_BUILT_IN_KERNELS......... c!s
      }
```

The values are not read correctly, and then the kernel launch fails. With the Intel Compute Runtime, it runs fine.
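One defensive option, sketched here under assumptions: validate the limits read back from `clGetDeviceInfo` before using them, so a failed or mis-sized query is reported instead of, say, driving a loop over thousands of `dim[...]` entries. The struct, the function name and the cap of 64 dimensions are hypothetical heuristics. The spec-derived rules are that `CL_DEVICE_MAX_COMPUTE_UNITS` and `CL_DEVICE_MAX_WORK_GROUP_SIZE` are at least 1, and that `CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS` is at least 3 and each `CL_DEVICE_MAX_WORK_ITEM_SIZES` entry at least 1 for devices that are not `CL_DEVICE_TYPE_CUSTOM`.

```c
#include <stddef.h>

/* Hypothetical snapshot of the limits HAT prints for each device. */
typedef struct {
    unsigned int max_compute_units;        /* CL_DEVICE_MAX_COMPUTE_UNITS */
    unsigned int max_work_item_dimensions; /* CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS */
    size_t max_work_group_size;            /* CL_DEVICE_MAX_WORK_GROUP_SIZE */
    size_t max_work_item_sizes[3];         /* first three CL_DEVICE_MAX_WORK_ITEM_SIZES entries */
} device_limits_t;

/* Heuristic cap (an assumption) so a garbage dimension count cannot drive a huge loop. */
#define PLAUSIBLE_MAX_DIMS 64u

/* Returns 1 if the limits look sane for a non-custom device, 0 otherwise. */
static int device_limits_plausible(const device_limits_t *d) {
    if (d->max_compute_units < 1) return 0;
    /* Spec minimum is 3 for devices that are not CL_DEVICE_TYPE_CUSTOM. */
    if (d->max_work_item_dimensions < 3 || d->max_work_item_dimensions > PLAUSIBLE_MAX_DIMS) return 0;
    if (d->max_work_group_size < 1) return 0;
    for (int i = 0; i < 3; i++) {
        if (d->max_work_item_sizes[i] < 1) return 0;
    }
    return 1;
}
```

With the values clinfo reports for this device (32 compute units, 3 dimensions, 1024x1024x1024, work-group size 1024) the check passes; with the values in the HAT platform dump (0 compute units, 0 dimensions) it fails, which would point at the query rather than at the kernel launch.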

Output from clinfo:

```
  Platform Name                                   ComputeAorta
Number of devices                                 1
  Device Name                                     ComputeAorta x86_64
  Device Vendor                                   Codeplay Software Ltd.
  Device Vendor ID                                0x10004
  Device Version                                  OpenCL 3.0 ComputeAorta 4.0.0 LLVM 18.1.8
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  4.0
  Device OpenCL C Version                         OpenCL C 1.2 Clang 18.1.8
  Device OpenCL C all versions                    OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_generic_address_space                                 0xc00000 (3.0.0)
                                                  __opencl_c_subgroups                                             0xc00000 (3.0.0)
                                                  __opencl_c_work_group_collective_functions                       0xc00000 (3.0.0)
                                                  __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                  __opencl_c_fp64                                                  0xc00000 (3.0.0)
  Latest comfornace test passed                   v2020-10-18-08
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               32
  Max clock frequency                             5260MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             1024
  Preferred work group size multiple (device)     1
  Preferred work group size multiple (kernel)     1024
  Max sub-groups per work group                   1024
  Sub-group sizes (Intel)                         8, 4, 16, 32, 1
  Preferred / native vector sizes
    char                                                16 / 16
    short                                                8 / 8
    int                                                  4 / 4
    long                                                 2 / 2
    half                                                 0 / 0        (n/a)
    float                                                4 / 4
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
```

Kind regards,
Juan
