[aarch64-port-dev ] Sporadic crashes on aarch64 after switching from OpenJDK 9 to 10

Stephan Bergmann sbergman at redhat.com
Tue Jun 19 15:44:48 UTC 2018


Unfortunately my setup is a bit complex, but maybe somebody has an idea 
how to debug this further:

I am doing Flatpak builds of LibreOffice.  Those builds used to use 
OpenJDK 9.  When I tried to switch to OpenJDK 10, builds for aarch64 
started to fail, in one of LibreOffice's tests that instantiates a JVM 
in a (C++) process.  Builds for other platforms (arm 32-bit, x86 32- and 
64-bit) did not fail.

I tried without success to reproduce the failure on various aarch64 
machines (with both 4K and 64K PAGE_SIZE); the only kind of machine on 
which I could reproduce it (not fully reliably, but around 50% of the 
time) is a massively parallel 64-core machine (the kind routinely used 
for those Flatpak builds).
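
For anyone comparing candidate machines, the two properties mentioned 
above (page size and core count) can be read with a few lines of Python. 
This is only a sketch: describe_host is a name made up for this example, 
os.sysconf exists only on Unix-like systems, and os.cpu_count() reports 
the CPUs the OS knows about, which may be more than a sandboxed build is 
actually allowed to use.

```python
import os


def describe_host():
    """Return (page_size_in_bytes, logical_cpu_count) for the current machine.

    Hypothetical helper for illustration; not part of any build tooling.
    """
    # Kernel page size in bytes, e.g. 4096 (4K) or 65536 (64K).
    page_size = os.sysconf("SC_PAGE_SIZE")
    # Logical CPUs known to the OS; may be None if it cannot be determined.
    cpus = os.cpu_count()
    return page_size, cpus


if __name__ == "__main__":
    page_size, cpus = describe_host()
    print(f"PAGE_SIZE: {page_size // 1024}K, CPUs: {cpus}")
```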

The symptom is always a SIGSEGV in a thread whose gdb backtrace shows 
just a single frame of apparently JIT-generated code (i.e., outside any 
.so).  A typical such case is

> (gdb) disas 0x0000ffff8a1d6d80,+300
> Dump of assembler code from 0xffff8a1d6d80 to 0xffff8a1d6eac:
>    0x0000ffff8a1d6d80:	.inst	0xffffffff ; undefined
>    0x0000ffff8a1d6d84:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6d88:	adrp	x12, 0x100003add7000
>    0x0000ffff8a1d6d8c:	.inst	0x00386024 ; NYI
>    0x0000ffff8a1d6d90:	.inst	0x60806000 ; undefined
>    0x0000ffff8a1d6d94:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6d98:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6d9c:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6da0:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6da4:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6da8:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6dac:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6db0:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6db4:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6db8:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6dbc:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6dc0:	ldr	w8, [x1,#8]
>    0x0000ffff8a1d6dc4:	cmp	w9, w8
>    0x0000ffff8a1d6dc8:	b.eq	0xffff8a1d6e00
>    0x0000ffff8a1d6dcc:	adrp	x8, 0xffff8263c000
>    0x0000ffff8a1d6dd0:	add	x8, x8, #0x700
>    0x0000ffff8a1d6dd4:	br	x8
>    0x0000ffff8a1d6dd8:	nop
>    0x0000ffff8a1d6ddc:	nop
>    0x0000ffff8a1d6de0:	nop
>    0x0000ffff8a1d6de4:	nop
>    0x0000ffff8a1d6de8:	nop
>    0x0000ffff8a1d6dec:	nop
>    0x0000ffff8a1d6df0:	nop
>    0x0000ffff8a1d6df4:	nop
>    0x0000ffff8a1d6df8:	nop
>    0x0000ffff8a1d6dfc:	nop
>    0x0000ffff8a1d6e00:	nop
>    0x0000ffff8a1d6e04:	sub	x9, sp, #0x14, lsl #12
>    0x0000ffff8a1d6e08:	str	xzr, [x9]
>    0x0000ffff8a1d6e0c:	sub	sp, sp, #0x40
>    0x0000ffff8a1d6e10:	stp	x29, x30, [sp,#48]
>    0x0000ffff8a1d6e14:	ldr	w0, [x1,#28]
>    0x0000ffff8a1d6e18:	ldp	x29, x30, [sp,#48]
>    0x0000ffff8a1d6e1c:	add	sp, sp, #0x40
>    0x0000ffff8a1d6e20:	ldr	x8, [x28,#112]
> => 0x0000ffff8a1d6e24:	ldr	wzr, [x8]
>    0x0000ffff8a1d6e28:	ret
>    0x0000ffff8a1d6e2c:	nop
>    0x0000ffff8a1d6e30:	nop
>    0x0000ffff8a1d6e34:	ldr	x0, [x28,#736]
>    0x0000ffff8a1d6e38:	str	xzr, [x28,#736]
>    0x0000ffff8a1d6e3c:	str	xzr, [x28,#744]
>    0x0000ffff8a1d6e40:	ldp	x29, x30, [sp,#48]
>    0x0000ffff8a1d6e44:	add	sp, sp, #0x40
>    0x0000ffff8a1d6e48:	adrp	x8, 0xffff8266e000
>    0x0000ffff8a1d6e4c:	add	x8, x8, #0x200
>    0x0000ffff8a1d6e50:	br	x8
>    0x0000ffff8a1d6e54:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6e58:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6e5c:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6e60:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6e64:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6e68:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6e6c:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6e70:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6e74:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6e78:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6e7c:	.inst	0x00000000 ; undefined
>    0x0000ffff8a1d6e80:	adrp	x8, 0xffff82670000
>    0x0000ffff8a1d6e84:	add	x8, x8, #0x900
>    0x0000ffff8a1d6e88:	blr	x8
>    0x0000ffff8a1d6e8c:	stp	x0, x1, [sp,#-256]!
>    0x0000ffff8a1d6e90:	stp	x2, x3, [sp,#16]
>    0x0000ffff8a1d6e94:	stp	x4, x5, [sp,#32]
>    0x0000ffff8a1d6e98:	stp	x6, x7, [sp,#48]
>    0x0000ffff8a1d6e9c:	stp	x8, x9, [sp,#64]
>    0x0000ffff8a1d6ea0:	stp	x10, x11, [sp,#80]
>    0x0000ffff8a1d6ea4:	stp	x12, x13, [sp,#96]
>    0x0000ffff8a1d6ea8:	stp	x14, x15, [sp,#112]
> End of assembler dump.

where x8 points at unmapped memory (0xffff99d52008 in this case).  The 
details of the code differ across crashes, but it appears to always be a 
"ldr wzr, [x8]" that triggers the SIGSEGV.  There are more than 100 
threads, most of which appear to be JVM housekeeping threads 
(compilation, GC).  I have lengthy gdb "thread apply all backtrace full" 
output that I could provide.

I could no longer reproduce the failure when I either made LibreOffice 
instantiate the in-process JVM with -Xint to force interpreted mode, or 
built OpenJDK with --with-debug-level=fastdebug instead of 
--with-debug-level=release.  (I tried a handful of times each; but as 
the failure isn't reliably reproducible, that might of course also have 
just been luck.)
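
To put a rough number on "luck": if each run is assumed to fail 
independently with the roughly 50% rate observed above, the chance that 
a handful of runs all pass even though nothing was fixed can be 
estimated as follows.  This is a back-of-the-envelope sketch in Python; 
the independence assumption, the 95% confidence target, and both 
function names are assumptions made for this example, not measurements.

```python
import math


def prob_all_pass(failure_rate: float, runs: int) -> float:
    """Probability that `runs` independent runs all pass by chance alone."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return (1.0 - failure_rate) ** runs


def runs_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of consecutive passes so that pure luck is less
    likely than 1 - confidence (assuming independent runs)."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # Five clean runs at a 50% failure rate: 0.5 ** 5
    print(prob_all_pass(0.5, 5))   # 0.03125
    # Clean runs needed to reach 95% confidence at a 50% failure rate
    print(runs_needed(0.5))        # 5
```

Under those assumptions, about five consecutive clean runs per 
configuration would already make pure luck unlikely; a lower underlying 
failure rate would need correspondingly more runs.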

The OpenJDK in the Flatpak build environment is 
http://hg.openjdk.java.net/jdk-updates/jdk10u tag jdk-10.0.1+10 (at 
<https://github.com/flathub/org.freedesktop.Sdk.Extension.openjdk10>, 
which in turn uses the sources packaged by 
<https://src.fedoraproject.org/rpms/java-openjdk/branch/jdk-10>).  I 
also tried replacing that with the current tip of that branch, but that 
didn't make a difference (it felt like the failure happened less often, 
perhaps only 10% of the time, but again, that might just have been luck).

I have only restricted access to that 64-core machine, and the only 
viable way for me to test the issue is via the Flatpak build environment 
(e.g., I cannot easily download another OpenJDK 10 build to test 
against).  The failing LibreOffice test itself is also somewhat complex, 
and it would likely not be easy to strip it down to a small reproducer.

Thoughts, anyone?

