[External] : Re: Error running HAT with ComputeAorta on CPUs (Codeplay)
Juan Fumero
juan.fumero at manchester.ac.uk
Fri Jan 24 13:54:16 UTC 2025
This is all I get from that server:
```bash
dim[23674] = 0
dim[23675] = 0
dim[23676] = 0
dim[23677] = 0
dim[23678] = 0
dim[23679] = 0
dim[23680] = 0
dim[23681] = 0
dim[23682] = 0
dim[23683] = 0
dim[23684] = 0
dim[23685] = 0
dim[23686] = 0
dim[23687] = 0
dim[23688] = 0
dim[23689] = 0
dim[23690] = 0
dim[23691] = 0
var.store %24 %50 @loc="44:43";
branch ^block_4;
^block_7:
%51 : hat.buffer.F32Array = var.load %8 @loc="47:17";
%52 : hat.KernelContext = var.load %5 @loc="47:31";
%53 : int = field.load %52 @"hat.KernelContext::x()int" @loc="47:31";
%54 : int = var.load %9 @loc="47:38";
%55 : int = mul %53 %54 @loc="47:31";
%56 : int = var.load %16 @loc="47:45";
%57 : int = add %55 %56 @loc="47:31";
%58 : long = conv %57 @loc="47:17";
%59 : float = var.load %22 @loc="47:48";
invoke %51 %58 %59 @"hat.buffer.F32Array::array(long, float)void" @loc="47:17";
branch ^block_8;
^block_8:
%60 : int = var.load %16 @loc="42:39";
%61 : int = constant @"1" @loc="42:39";
%62 : int = add %60 %61 @loc="42:39";
var.store %16 %62 @loc="42:39";
branch ^block_2;
^block_9:
branch ^block_11;
^block_10:
branch ^block_11;
^block_11:
return @loc="39:5";
};
#define NDRANGE_OPENCL
#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
#pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics : enable
#ifndef NULL
#define NULL 0
#endif
#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
#pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics : enable
typedef char s8_t;
typedef char byte;
typedef char boolean;
typedef unsigned char u8_t;
typedef short s16_t;
typedef unsigned short u16_t;
typedef unsigned int u32_t;
typedef int s32_t;
typedef float f32_t;
typedef long s64_t;
typedef unsigned long u64_t;
typedef struct KernelContext_s{
int x;
int maxX;
}KernelContext_t;
typedef struct F32Array_s{
int length;
float array[1];
}F32Array_t;
__kernel void matrixMultiplyKernel(
__global KernelContext_t *global_kc, __global F32Array_t* matrixA, __global F32Array_t* matrixB, __global F32Array_t* matrixC, int size
){
KernelContext_t mine;
KernelContext_t* kc=&mine;
kc->x=get_global_id(0);
kc->maxX=global_kc->maxX;
if(kc->x<kc->maxX){
for(int j = 0; j<size; j=j+1){
float acc = (float)0;
for(int k = 0; k<size; k=k+1){
acc=acc+matrixA->array[(long)(kc->x*size+k)]*matrixB->array[(long)(k*size+j)];
}
matrixC->array[(long)(kc->x*size+j)]=acc;
}
}
return;
}
clCreateProgramWithSource failed
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007adaa0f5c2f0, pid=710525, tid=710526
#
# JRE version: OpenJDK Runtime Environment (24.0) (build 24-internal-adhoc.juan.babylon)
# Java VM: OpenJDK 64-Bit Server VM (24-internal-adhoc.juan.babylon, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C [libopencl_backend.so+0x62f0] programOK+0x1c
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/juan/babylon/babylon/hat/core.710525)
#
# An error report file with more information is saved as:
# /home/juan/babylon/babylon/hat/hs_err_pid710525.log
[1.332s][warning][os] Loading hsdis library failed
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
```
There are thousands of lines of the form `dim[number] = 0`.
My take is that there is a problem in the runtime when reading those values. Then, when the kernel is launched, it fails; I guess the launch tunes the thread scheduler based on the maximum number of threads per dimension. This only happens with OCK (the oneAPI Construction Kit).
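If the dimension count read from the device is garbage, a loop that prints one `dim[i]` line per dimension will run far past the real number of dimensions. A minimal C sketch of validating the query result before iterating is shown below. It is not HAT's backend code: `print_work_item_sizes`, `STATUS_SUCCESS` and the `MAX_SANE_DIMS` cap are hypothetical, and the status, count and sizes stand in for what `clGetDeviceInfo` would return for `CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS` and `CL_DEVICE_MAX_WORK_ITEM_SIZES`.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for CL_SUCCESS (which is 0 in OpenCL). */
#define STATUS_SUCCESS 0

/* Assumed plausibility cap for this sketch; not a limit from the OpenCL spec. */
#define MAX_SANE_DIMS 16u

/* Hypothetical helper: prints one line per work-item dimension, but only if
 * the device query succeeded and the reported dimension count is plausible.
 * Returns the number of lines printed, or -1 if the data was rejected. */
static int print_work_item_sizes(int query_status, unsigned int dims,
                                 const size_t *sizes) {
    if (query_status != STATUS_SUCCESS) {
        fprintf(stderr, "device query failed (status %d)\n", query_status);
        return -1;
    }
    if (dims == 0 || dims > MAX_SANE_DIMS || sizes == NULL) {
        fprintf(stderr, "implausible work-item dimension count: %u\n", dims);
        return -1;
    }
    for (unsigned int i = 0; i < dims; i++) {
        printf("dim[%u] = %zu\n", i, sizes[i]);
    }
    return (int)dims;
}
```

With the `Max work item sizes` that clinfo reports for this device (1024x1024x1024), the helper prints three lines; with a count in the tens of thousands it prints a single diagnostic and returns -1 instead of flooding the output.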
Juan
________________________________
From: Gary Frost <gary.frost at oracle.com>
Sent: 24 January 2025 12:54
To: Juan Fumero <juan.fumero at manchester.ac.uk>; babylon-dev at openjdk.org <babylon-dev at openjdk.org>
Subject: Re: [External] : Re: Error running HAT with ComputeAorta on CPUs (Codeplay)
Juan,
Thanks for this, it does indeed help.
So I think the OpenCL compile failed (which is clearly bad 😉), and I did discover that there is a 'crash route' if the compile fails but OpenCL fails to return a build log.
After reporting the clCreateProgramWithSource failure to stderr, the backend should have dumped both the OpenCL source and the log, but still returned a PTR (cast to long) to an OpenCL program (C++ ptr instance). It seems to have returned a bad ptr instead, which causes the crash when it is recast to an OpenCL program instance (when we try to validate the program from the Java side).
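The behaviour described here (always hand the Java side a usable program handle, even when the build fails or no build log comes back, and check it before use) can be sketched in plain C. This is an illustration under stated assumptions, not the real `libopencl_backend.so` code: `ProgramWrapper`, `create_program_wrapper`, `program_ok` and `release_program_wrapper` are hypothetical names (the real crashing frame is `programOK`), and the `compile_ok` and `log` parameters stand in for the results of the actual OpenCL calls.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the backend's C++ program wrapper. */
typedef struct {
    int built;       /* nonzero only if compilation succeeded */
    char *build_log; /* NULL when the driver returned no log */
} ProgramWrapper;

/* Hypothetical: 'log' models whatever build log the driver returned (may be
 * NULL). A wrapper is returned even when compilation fails, so the caller
 * always gets a valid handle; 0 is returned only if allocation fails. */
static int64_t create_program_wrapper(int compile_ok, const char *log) {
    ProgramWrapper *w = calloc(1, sizeof *w);
    if (w == NULL) {
        return 0;
    }
    w->built = compile_ok != 0;
    if (log != NULL) {
        w->build_log = malloc(strlen(log) + 1);
        if (w->build_log != NULL) {
            strcpy(w->build_log, log);
        }
    }
    if (!w->built) {
        /* Report the failure without assuming a log exists. */
        fprintf(stderr, "build failed; log: %s\n",
                w->build_log != NULL ? w->build_log : "(none returned)");
    }
    return (int64_t)(intptr_t)w; /* pointer passed to Java as a long */
}

/* Hypothetical validation entry point taking the handle as a Java long. */
static int program_ok(int64_t handle) {
    if (handle == 0) {
        return 0; /* never dereference a null handle */
    }
    const ProgramWrapper *w = (const ProgramWrapper *)(intptr_t)handle;
    return w->built;
}

/* Frees the wrapper and its copied log; safe to call with 0. */
static void release_program_wrapper(int64_t handle) {
    ProgramWrapper *w = (ProgramWrapper *)(intptr_t)handle;
    if (w != NULL) {
        free(w->build_log);
        free(w);
    }
}
```

A failed build with no log still yields a non-zero handle whose `built` flag is 0, so the validation call can answer "not usable" without touching garbage; the explicit 0 check is an extra guard for the case where even the wrapper could not be allocated.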
Do you see a dump of the source and/or the log from the compile failure on stderr?
I would definitely like to see the OpenCL source, at least.
Gary
________________________________
From: Juan Fumero <juan.fumero at manchester.ac.uk>
Sent: Friday, January 24, 2025 5:00 AM
To: Gary Frost <gary.frost at oracle.com>; babylon-dev at openjdk.org <babylon-dev at openjdk.org>
Subject: Re: [External] : Re: Error running HAT with ComputeAorta on CPUs (Codeplay)
Hi Gary,
Yes, I do have the logs (see the attached file).
Juan
________________________________
From: Gary Frost <gary.frost at oracle.com>
Sent: 23 January 2025 16:10
To: Juan Fumero <juan.fumero at manchester.ac.uk>; babylon-dev at openjdk.org <babylon-dev at openjdk.org>
Subject: Re: [External] : Re: Error running HAT with ComputeAorta on CPUs (Codeplay)
There should also be a log generated when the VM 'tumbled' 😉, which will help me map symbols.
Do you still have that?
Gary
________________________________
From: Juan Fumero <juan.fumero at manchester.ac.uk>
Sent: Thursday, January 23, 2025 3:07 PM
To: Gary Frost <gary.frost at oracle.com>; babylon-dev at openjdk.org <babylon-dev at openjdk.org>
Subject: [External] : Re: Error running HAT with ComputeAorta on CPUs (Codeplay)
It crashed after clCreateProgramWithSource failed.
```
clCreateProgramWithSource failed
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0000705e5af5c2f0, pid=405337, tid=405338
#
# JRE version: OpenJDK Runtime Environment (24.0) (build 24-internal-adhoc.juan.babylon)
# Java VM: OpenJDK 64-Bit Server VM (24-internal-adhoc.juan.babylon, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C [libopencl_backend.so+0x62f0] programOK+0x1c
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/juan/babylon/babylon/hat/core.405337)
#
# An error report file with more information is saved as:
# /home/juan/babylon/babylon/hat/hs_err_pid405337.log
[1.306s][warning][os] Loading hsdis library failed
```
I guess the program continues, and when it tries to launch the kernel, it fails due to wrong thread scheduling?
Juan
________________________________
From: Gary Frost <gary.frost at oracle.com>
Sent: 23 January 2025 16:00
To: Juan Fumero <juan.fumero at manchester.ac.uk>; babylon-dev at openjdk.org <babylon-dev at openjdk.org>
Subject: Re: Error running HAT with ComputeAorta on CPUs (Codeplay)
Thanks for the heads-up, Juan.
What error did you get? Did it crash in the native wrapper?
Gary
________________________________
From: babylon-dev <babylon-dev-retn at openjdk.org> on behalf of Juan Fumero <juan.fumero at manchester.ac.uk>
Sent: Thursday, January 23, 2025 1:00 PM
To: babylon-dev at openjdk.org <babylon-dev at openjdk.org>
Subject: Error running HAT with ComputeAorta on CPUs (Codeplay)
Hi all,
It seems there is an error when running with the Codeplay OCK implementation:
https://github.com/uxlfoundation/oneapi-construction-kit
```
$ java @bldr/hatrun ffi-opencl matmul
Note: /home/juan/babylon/babylon/hat/bldr/Bldr.java uses preview features of Java SE 24.
Note: Recompile with -Xlint:preview for details.
platform{
CL_PLATFORM_VENDOR.."Codeplay Software Ltd."
CL_PLATFORM_VERSION."OpenCL 3.0 ComputeAorta 4.0.0 Linux x86_64 (Release, 5be5a8da)"
CL_PLATFORM_NAME...."ComputeAorta"
CL_DEVICE_TYPE..................... (0x73d821d10650)
CL_DEVICE_MAX_COMPUTE_UNITS........ 0
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS. 0
CL_DEVICE_MAX_WORK_GROUP_SIZE...... 127372130793944
CL_DEVICE_MAX_MEM_ALLOC_SIZE....... 127372117500240
CL_DEVICE_GLOBAL_MEM_SIZE.......... 127372117477136
CL_DEVICE_LOCAL_MEM_SIZE........... 984
CL_DEVICE_PROFILE..................
CL_DEVICE_VERSION.................. [!s
CL_DRIVER_VERSION..................
CL_DEVICE_OPENCL_C_VERSION......... `K!s
CL_DEVICE_NAME..................... 5
CL_DEVICE_EXTENSIONS...............
CL_DEVICE_BUILT_IN_KERNELS......... c!s
}
```
The values are not read correctly, and then the kernel launch fails. If I use the Intel Compute Runtime, it runs fine.
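When the device limits are wrong, anything derived from them, such as a local work size, is wrong too. A minimal C sketch of a defensive choice follows, assuming the launcher picks a 1-D local size from `CL_DEVICE_MAX_WORK_GROUP_SIZE` and the first entry of `CL_DEVICE_MAX_WORK_ITEM_SIZES`. `choose_local_size` and its plausibility bound are hypothetical; returning 0 is meant to signal "pass `NULL` as `local_work_size` to `clEnqueueNDRangeKernel`", which OpenCL allows and which lets the implementation pick the work-group size itself.

```c
#include <stddef.h>

/* Assumed plausibility bound for queried limits; real limits are device-specific. */
#define SANE_LIMIT_UPPER_BOUND ((size_t)1 << 16)

/* Hypothetical helper: returns a 1-D local work size for global_size, or 0 if
 * the queried limits look invalid (the caller would then pass NULL as
 * local_work_size and let the runtime choose). */
static size_t choose_local_size(size_t global_size, size_t max_work_group_size,
                                size_t max_item_size_dim0) {
    if (global_size == 0 ||
        max_work_group_size == 0 || max_work_group_size > SANE_LIMIT_UPPER_BOUND ||
        max_item_size_dim0 == 0 || max_item_size_dim0 > SANE_LIMIT_UPPER_BOUND) {
        return 0;
    }
    size_t limit = max_work_group_size < max_item_size_dim0
                       ? max_work_group_size
                       : max_item_size_dim0;
    /* OpenCL 1.2 requires global_work_size to be a multiple of local_work_size,
     * so pick the largest divisor of global_size that fits within the limit. */
    for (size_t local = limit; local > 1; local--) {
        if (global_size % local == 0) {
            return local;
        }
    }
    return 1;
}
```

With the limits clinfo reports for this device (max work group size 1024, max work item sizes 1024x1024x1024), a 1024-element global range gets a local size of 1024, whereas a garbage value like the `CL_DEVICE_MAX_WORK_GROUP_SIZE` printed by HAT is rejected and the choice is left to the runtime.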
Output from clinfo:
```
Platform Name ComputeAorta
Number of devices 1
Device Name ComputeAorta x86_64
Device Vendor Codeplay Software Ltd.
Device Vendor ID 0x10004
Device Version OpenCL 3.0 ComputeAorta 4.0.0 LLVM 18.1.8
Device Numeric Version 0xc00000 (3.0.0)
Driver Version 4.0
Device OpenCL C Version OpenCL C 1.2 Clang 18.1.8
Device OpenCL C all versions OpenCL C 0x402000 (1.2.0)
OpenCL C 0x401000 (1.1.0)
OpenCL C 0x400000 (1.0.0)
OpenCL C 0xc00000 (3.0.0)
Device OpenCL C features __opencl_c_generic_address_space 0xc00000 (3.0.0)
__opencl_c_subgroups 0xc00000 (3.0.0)
__opencl_c_work_group_collective_functions 0xc00000 (3.0.0)
__opencl_c_int64 0xc00000 (3.0.0)
__opencl_c_fp64 0xc00000 (3.0.0)
Latest comfornace test passed v2020-10-18-08
Device Type CPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 32
Max clock frequency 5260MHz
Device Partition (core)
Max number of sub-devices 0
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 1024
Preferred work group size multiple (device) 1
Preferred work group size multiple (kernel) 1024
Max sub-groups per work group 1024
Sub-group sizes (Intel) 8, 4, 16, 32, 1
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 2 / 2
half 0 / 0 (n/a)
float 4 / 4
double 2 / 2 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
```
Kind regards,
Juan