[aarch64-port-dev ] Sporadic crashes on aarch64 after switching from OpenJDK 9 to 10

Stephan Bergmann sbergman at redhat.com
Wed Jun 20 06:16:51 UTC 2018


On 19/06/18 18:19, Andrew Haley wrote:
> On 06/19/2018 04:44 PM, Stephan Bergmann wrote:
>> I unfortunately have a somewhat complex setup, but maybe somebody has
>> an idea how to debug this further:
>>
>> I am doing Flatpak builds of LibreOffice.  Those builds used to use
>> OpenJDK 9.  When I tried to switch to OpenJDK 10, builds for aarch64
>> started to fail, in one of LibreOffice's tests that instantiates a JVM
>> in a (C++) process.  Builds for other platforms (arm 32-bit, x86 32- and
>> 64-bit) did not fail.
>>
>> I tried unsuccessfully to reproduce the failure on various aarch64
>> machines (some with 4K and some with 64K PAGE_SIZE); the only kind of
>> machine I could reproduce it on (not fully reliably, but around 50% of
>> the time) is the massively parallel 64-core kind that is routinely used
>> for those Flatpak builds.
>>
>> The symptom is always a SIGSEGV in a thread whose gdb backtrace shows
>> just a single frame of apparently JIT-generated code (i.e., outside any
>> .so).  A typical such case is
>>
>>> (gdb) disas 0x0000ffff8a1d6d80,+300
>>> Dump of assembler code from 0xffff8a1d6d80 to 0xffff8a1d6eac:
>>>     0x0000ffff8a1d6d80:	.inst	0xffffffff ; undefined
>>>     0x0000ffff8a1d6d84:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6d88:	adrp	x12, 0x100003add7000
>>>     0x0000ffff8a1d6d8c:	.inst	0x00386024 ; NYI
>>>     0x0000ffff8a1d6d90:	.inst	0x60806000 ; undefined
>>>     0x0000ffff8a1d6d94:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6d98:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6d9c:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6da0:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6da4:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6da8:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6dac:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6db0:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6db4:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6db8:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6dbc:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6dc0:	ldr	w8, [x1,#8]
>>>     0x0000ffff8a1d6dc4:	cmp	w9, w8
>>>     0x0000ffff8a1d6dc8:	b.eq	0xffff8a1d6e00
>>>     0x0000ffff8a1d6dcc:	adrp	x8, 0xffff8263c000
>>>     0x0000ffff8a1d6dd0:	add	x8, x8, #0x700
>>>     0x0000ffff8a1d6dd4:	br	x8
>>>     0x0000ffff8a1d6dd8:	nop
>>>     0x0000ffff8a1d6ddc:	nop
>>>     0x0000ffff8a1d6de0:	nop
>>>     0x0000ffff8a1d6de4:	nop
>>>     0x0000ffff8a1d6de8:	nop
>>>     0x0000ffff8a1d6dec:	nop
>>>     0x0000ffff8a1d6df0:	nop
>>>     0x0000ffff8a1d6df4:	nop
>>>     0x0000ffff8a1d6df8:	nop
>>>     0x0000ffff8a1d6dfc:	nop
>>>     0x0000ffff8a1d6e00:	nop
>>>     0x0000ffff8a1d6e04:	sub	x9, sp, #0x14, lsl #12
>>>     0x0000ffff8a1d6e08:	str	xzr, [x9]
>>>     0x0000ffff8a1d6e0c:	sub	sp, sp, #0x40
>>>     0x0000ffff8a1d6e10:	stp	x29, x30, [sp,#48]
>>>     0x0000ffff8a1d6e14:	ldr	w0, [x1,#28]
>>>     0x0000ffff8a1d6e18:	ldp	x29, x30, [sp,#48]
>>>     0x0000ffff8a1d6e1c:	add	sp, sp, #0x40
>>>     0x0000ffff8a1d6e20:	ldr	x8, [x28,#112]
>>> => 0x0000ffff8a1d6e24:	ldr	wzr, [x8]
>>>     0x0000ffff8a1d6e28:	ret
>>>     0x0000ffff8a1d6e2c:	nop
>>>     0x0000ffff8a1d6e30:	nop
>>>     0x0000ffff8a1d6e34:	ldr	x0, [x28,#736]
>>>     0x0000ffff8a1d6e38:	str	xzr, [x28,#736]
>>>     0x0000ffff8a1d6e3c:	str	xzr, [x28,#744]
>>>     0x0000ffff8a1d6e40:	ldp	x29, x30, [sp,#48]
>>>     0x0000ffff8a1d6e44:	add	sp, sp, #0x40
>>>     0x0000ffff8a1d6e48:	adrp	x8, 0xffff8266e000
>>>     0x0000ffff8a1d6e4c:	add	x8, x8, #0x200
>>>     0x0000ffff8a1d6e50:	br	x8
>>>     0x0000ffff8a1d6e54:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6e58:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6e5c:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6e60:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6e64:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6e68:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6e6c:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6e70:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6e74:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6e78:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6e7c:	.inst	0x00000000 ; undefined
>>>     0x0000ffff8a1d6e80:	adrp	x8, 0xffff82670000
>>>     0x0000ffff8a1d6e84:	add	x8, x8, #0x900
>>>     0x0000ffff8a1d6e88:	blr	x8
>>>     0x0000ffff8a1d6e8c:	stp	x0, x1, [sp,#-256]!
>>>     0x0000ffff8a1d6e90:	stp	x2, x3, [sp,#16]
>>>     0x0000ffff8a1d6e94:	stp	x4, x5, [sp,#32]
>>>     0x0000ffff8a1d6e98:	stp	x6, x7, [sp,#48]
>>>     0x0000ffff8a1d6e9c:	stp	x8, x9, [sp,#64]
>>>     0x0000ffff8a1d6ea0:	stp	x10, x11, [sp,#80]
>>>     0x0000ffff8a1d6ea4:	stp	x12, x13, [sp,#96]
>>>     0x0000ffff8a1d6ea8:	stp	x14, x15, [sp,#112]
>>> End of assembler dump.
>>
>> where x8 points to inaccessible memory (0xffff99d52008 in this case).
>> The details of the code differ across crashes, but it always appears to
>> be an "ldr wzr, [x8]" that triggers the SIGSEGV.  There are more than
>> 100 threads, most of which appear to be JVM housekeeping ones
>> (compilation, GC); I have lengthy gdb "thread apply all backtrace full"
>> output that I could provide.
> 
> That's a safepoint SEGV.  It's deliberate.  If you step at that
> point you'll enter the safepoint code.
> 
>> I could no longer reproduce the failure when I either made LibreOffice
>> instantiate the in-process JVM with -Xint to force interpreted mode, or
>> built OpenJDK with --with-debug-level=fastdebug instead of
>> --with-debug-level=release.  (I tried a handful of times each; but as
>> the failure isn't reliably reproducible, that might of course also have
>> just been luck.)
>>
>> The OpenJDK in the Flatpak build environment is
>> http://hg.openjdk.java.net/jdk-updates/jdk10u tag jdk-10.0.1+10 (at
>> <https://github.com/flathub/org.freedesktop.Sdk.Extension.openjdk10>,
>> which in turn uses the sources packaged by
>> <https://src.fedoraproject.org/rpms/java-openjdk/branch/jdk-10>).  I
>> also tried replacing that with the current tip of that branch, but that
>> didn't make a difference (it felt like the failure happened less often,
>> perhaps only 10% of the time, but again, that might just have been luck).
>>
>> I have only restricted access to that 64-core machine, and the only
>> viable way for me to test the issue is via the Flatpak build environment
>> (e.g., I cannot easily download another OpenJDK 10 build to test
>> against).  The failing LibreOffice test itself is also somewhat complex,
>> and it would likely not be easy to strip it down to a small reproducer.
>>
>> Thoughts, anyone?
> 
> What exactly is the failure?

The failure is that the process (a C++ cppunittester executable with an
in-process JVM) terminates due to a SIGSEGV.  When I inspect the
generated core file with gdb, it reports the above thread as the cause of
that "fatal" SIGSEGV.

> You should have an error message and an error log file.

There is no stdout/stderr output from the JVM (only some routine
cppunittester output).  And I cannot find any log file (you mean one of
those hs_err_pid*.log files, right?).  Where should I look for it ($HOME,
cwd, ...)?

