[aarch64-port-dev] Sporadic crashes on aarch64 after switching from OpenJDK 9 to 10

Andrew Haley aph at redhat.com
Tue Jun 19 16:19:35 UTC 2018


On 06/19/2018 04:44 PM, Stephan Bergmann wrote:
> Unfortunately, I have a somewhat complex setup, but maybe somebody has 
> an idea how to debug this further:
> 
> I am doing Flatpak builds of LibreOffice.  Those builds used to use 
> OpenJDK 9.  When I tried to switch to OpenJDK 10, builds for aarch64 
> started to fail, in one of LibreOffice's tests that instantiates a JVM 
> in a (C++) process.  Builds for other platforms (arm 32-bit, x86 32- and 
> 64-bit) did not fail.
> 
> I tried unsuccessfully to reproduce the failure on various aarch64 
> machines (with both 4K and 64K PAGE_SIZE); the only kind of machine I 
> could reproduce it on (not fully reliably, but around 50% of the time) 
> is a massively parallel 64-core machine (the kind routinely used for 
> those Flatpak builds).
> 
> The symptom is always a SIGSEGV in a thread whose gdb backtrace shows 
> just a single frame of apparently JIT-generated code (i.e., outside any 
> .so).  A typical such case is
> 
>> (gdb) disas 0x0000ffff8a1d6d80,+300
>> Dump of assembler code from 0xffff8a1d6d80 to 0xffff8a1d6eac:
>>    0x0000ffff8a1d6d80:	.inst	0xffffffff ; undefined
>>    0x0000ffff8a1d6d84:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6d88:	adrp	x12, 0x100003add7000
>>    0x0000ffff8a1d6d8c:	.inst	0x00386024 ; NYI
>>    0x0000ffff8a1d6d90:	.inst	0x60806000 ; undefined
>>    0x0000ffff8a1d6d94:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6d98:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6d9c:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6da0:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6da4:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6da8:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6dac:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6db0:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6db4:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6db8:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6dbc:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6dc0:	ldr	w8, [x1,#8]
>>    0x0000ffff8a1d6dc4:	cmp	w9, w8
>>    0x0000ffff8a1d6dc8:	b.eq	0xffff8a1d6e00
>>    0x0000ffff8a1d6dcc:	adrp	x8, 0xffff8263c000
>>    0x0000ffff8a1d6dd0:	add	x8, x8, #0x700
>>    0x0000ffff8a1d6dd4:	br	x8
>>    0x0000ffff8a1d6dd8:	nop
>>    0x0000ffff8a1d6ddc:	nop
>>    0x0000ffff8a1d6de0:	nop
>>    0x0000ffff8a1d6de4:	nop
>>    0x0000ffff8a1d6de8:	nop
>>    0x0000ffff8a1d6dec:	nop
>>    0x0000ffff8a1d6df0:	nop
>>    0x0000ffff8a1d6df4:	nop
>>    0x0000ffff8a1d6df8:	nop
>>    0x0000ffff8a1d6dfc:	nop
>>    0x0000ffff8a1d6e00:	nop
>>    0x0000ffff8a1d6e04:	sub	x9, sp, #0x14, lsl #12
>>    0x0000ffff8a1d6e08:	str	xzr, [x9]
>>    0x0000ffff8a1d6e0c:	sub	sp, sp, #0x40
>>    0x0000ffff8a1d6e10:	stp	x29, x30, [sp,#48]
>>    0x0000ffff8a1d6e14:	ldr	w0, [x1,#28]
>>    0x0000ffff8a1d6e18:	ldp	x29, x30, [sp,#48]
>>    0x0000ffff8a1d6e1c:	add	sp, sp, #0x40
>>    0x0000ffff8a1d6e20:	ldr	x8, [x28,#112]
>> => 0x0000ffff8a1d6e24:	ldr	wzr, [x8]
>>    0x0000ffff8a1d6e28:	ret
>>    0x0000ffff8a1d6e2c:	nop
>>    0x0000ffff8a1d6e30:	nop
>>    0x0000ffff8a1d6e34:	ldr	x0, [x28,#736]
>>    0x0000ffff8a1d6e38:	str	xzr, [x28,#736]
>>    0x0000ffff8a1d6e3c:	str	xzr, [x28,#744]
>>    0x0000ffff8a1d6e40:	ldp	x29, x30, [sp,#48]
>>    0x0000ffff8a1d6e44:	add	sp, sp, #0x40
>>    0x0000ffff8a1d6e48:	adrp	x8, 0xffff8266e000
>>    0x0000ffff8a1d6e4c:	add	x8, x8, #0x200
>>    0x0000ffff8a1d6e50:	br	x8
>>    0x0000ffff8a1d6e54:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6e58:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6e5c:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6e60:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6e64:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6e68:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6e6c:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6e70:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6e74:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6e78:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6e7c:	.inst	0x00000000 ; undefined
>>    0x0000ffff8a1d6e80:	adrp	x8, 0xffff82670000
>>    0x0000ffff8a1d6e84:	add	x8, x8, #0x900
>>    0x0000ffff8a1d6e88:	blr	x8
>>    0x0000ffff8a1d6e8c:	stp	x0, x1, [sp,#-256]!
>>    0x0000ffff8a1d6e90:	stp	x2, x3, [sp,#16]
>>    0x0000ffff8a1d6e94:	stp	x4, x5, [sp,#32]
>>    0x0000ffff8a1d6e98:	stp	x6, x7, [sp,#48]
>>    0x0000ffff8a1d6e9c:	stp	x8, x9, [sp,#64]
>>    0x0000ffff8a1d6ea0:	stp	x10, x11, [sp,#80]
>>    0x0000ffff8a1d6ea4:	stp	x12, x13, [sp,#96]
>>    0x0000ffff8a1d6ea8:	stp	x14, x15, [sp,#112]
>> End of assembler dump.
> 
> where x8 points at no memory (0xffff99d52008 in this case).  The details 
> of the code differ across crashes, but it appears to always be a "ldr 
> wzr, [x8]" that triggers the SIGSEGV.  There are more than 100 threads, 
> most of which appear to be JVM housekeeping ones (compilation, gc; I 
> have lengthy gdb "thread apply all backtrace full" output that I could 
> provide.)

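That's a safepoint SEGV, and it's deliberate: the "ldr wzr, [x8]" is
the safepoint poll, and it faults when the VM has protected the polling
page in order to stop the thread.  If you step at that point you'll
enter the safepoint code.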
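
A sketch of how one might keep gdb out of the way here, assuming a stock
gdb attached to the process (this is a general HotSpot debugging habit,
not something specific to this report): tell gdb to pass SIGSEGV
straight to the JVM's own handler instead of stopping on it.

    (gdb) handle SIGSEGV nostop noprint pass
    (gdb) continue

With that, expected faults such as safepoint polls and implicit null
checks no longer stop the debugger, while a genuine crash should still
end up in the VM's fatal error handler.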

> I could no longer reproduce the failure when I either made LibreOffice 
> instantiate the in-process JVM with -Xint to force interpreted mode, or 
> built OpenJDK with --with-debug-level=fastdebug instead of 
> --with-debug-level=release.  (I tried a handful of times each; but as 
> the failure isn't reliably reproducible, that might of course also have 
> just been luck.)
> 
> The OpenJDK in the Flatpak build environment is 
> http://hg.openjdk.java.net/jdk-updates/jdk10u tag jdk-10.0.1+10 (at 
> <https://github.com/flathub/org.freedesktop.Sdk.Extension.openjdk10>, 
> which in turn uses the sources packaged by 
> <https://src.fedoraproject.org/rpms/java-openjdk/branch/jdk-10>).  I 
> also tried replacing that with current tip of that branch, but that 
> didn't make a difference (it felt like the failure happened less often, 
> like only 10% of the time, but again, that might just have been luck).
> 
> I have only restricted access to that 64-core machine, and the only 
> viable way for me to test the issue is via the Flatpak build environment 
> (e.g., I cannot easily download another OpenJDK 10 build to test 
> against).  The failing LibreOffice test itself is also somewhat complex, 
> and it would likely not be easy to strip it down to a small reproducer.
> 
> Thoughts, anyone?

What exactly is the failure?

If the JVM really crashed, you should have an error message and an
error log file (hs_err_pid<pid>.log).

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

