[aarch64-port-dev ] Sporadic crashes on aarch64 after switching from OpenJDK 9 to 10
Andrew Haley
aph at redhat.com
Tue Jun 19 16:19:35 UTC 2018
On 06/19/2018 04:44 PM, Stephan Bergmann wrote:
> I unfortunately have a bit of a complex setup, but maybe somebody has
> an idea how to debug this further:
>
> I am doing Flatpak builds of LibreOffice. Those builds used to use
> OpenJDK 9. When I tried to switch to OpenJDK 10, builds for aarch64
> started to fail, in one of LibreOffice's tests that instantiates a JVM
> in a (C++) process. Builds for other platforms (arm 32-bit, x86 32- and
> 64-bit) did not fail.
>
> I unsuccessfully tried to reproduce the failure on various aarch64
> machines (with varying 4K and 64K PAGE_SIZE); the only kind of machine I
> could reproduce it on (not fully reliably, but around 50% of the time)
> is massively parallel 64-core machines (which are routinely used for
> those Flatpak builds).
>
> The symptom is always a SIGSEGV in a thread whose gdb backtrace shows
> just a single frame of apparently JIT-generated code (i.e., outside any
> .so). A typical such case is
>
>> (gdb) disas 0x0000ffff8a1d6d80,+300
>> Dump of assembler code from 0xffff8a1d6d80 to 0xffff8a1d6eac:
>> 0x0000ffff8a1d6d80: .inst 0xffffffff ; undefined
>> 0x0000ffff8a1d6d84: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6d88: adrp x12, 0x100003add7000
>> 0x0000ffff8a1d6d8c: .inst 0x00386024 ; NYI
>> 0x0000ffff8a1d6d90: .inst 0x60806000 ; undefined
>> 0x0000ffff8a1d6d94: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6d98: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6d9c: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6da0: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6da4: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6da8: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6dac: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6db0: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6db4: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6db8: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6dbc: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6dc0: ldr w8, [x1,#8]
>> 0x0000ffff8a1d6dc4: cmp w9, w8
>> 0x0000ffff8a1d6dc8: b.eq 0xffff8a1d6e00
>> 0x0000ffff8a1d6dcc: adrp x8, 0xffff8263c000
>> 0x0000ffff8a1d6dd0: add x8, x8, #0x700
>> 0x0000ffff8a1d6dd4: br x8
>> 0x0000ffff8a1d6dd8: nop
>> 0x0000ffff8a1d6ddc: nop
>> 0x0000ffff8a1d6de0: nop
>> 0x0000ffff8a1d6de4: nop
>> 0x0000ffff8a1d6de8: nop
>> 0x0000ffff8a1d6dec: nop
>> 0x0000ffff8a1d6df0: nop
>> 0x0000ffff8a1d6df4: nop
>> 0x0000ffff8a1d6df8: nop
>> 0x0000ffff8a1d6dfc: nop
>> 0x0000ffff8a1d6e00: nop
>> 0x0000ffff8a1d6e04: sub x9, sp, #0x14, lsl #12
>> 0x0000ffff8a1d6e08: str xzr, [x9]
>> 0x0000ffff8a1d6e0c: sub sp, sp, #0x40
>> 0x0000ffff8a1d6e10: stp x29, x30, [sp,#48]
>> 0x0000ffff8a1d6e14: ldr w0, [x1,#28]
>> 0x0000ffff8a1d6e18: ldp x29, x30, [sp,#48]
>> 0x0000ffff8a1d6e1c: add sp, sp, #0x40
>> 0x0000ffff8a1d6e20: ldr x8, [x28,#112]
>> => 0x0000ffff8a1d6e24: ldr wzr, [x8]
>> 0x0000ffff8a1d6e28: ret
>> 0x0000ffff8a1d6e2c: nop
>> 0x0000ffff8a1d6e30: nop
>> 0x0000ffff8a1d6e34: ldr x0, [x28,#736]
>> 0x0000ffff8a1d6e38: str xzr, [x28,#736]
>> 0x0000ffff8a1d6e3c: str xzr, [x28,#744]
>> 0x0000ffff8a1d6e40: ldp x29, x30, [sp,#48]
>> 0x0000ffff8a1d6e44: add sp, sp, #0x40
>> 0x0000ffff8a1d6e48: adrp x8, 0xffff8266e000
>> 0x0000ffff8a1d6e4c: add x8, x8, #0x200
>> 0x0000ffff8a1d6e50: br x8
>> 0x0000ffff8a1d6e54: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6e58: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6e5c: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6e60: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6e64: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6e68: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6e6c: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6e70: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6e74: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6e78: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6e7c: .inst 0x00000000 ; undefined
>> 0x0000ffff8a1d6e80: adrp x8, 0xffff82670000
>> 0x0000ffff8a1d6e84: add x8, x8, #0x900
>> 0x0000ffff8a1d6e88: blr x8
>> 0x0000ffff8a1d6e8c: stp x0, x1, [sp,#-256]!
>> 0x0000ffff8a1d6e90: stp x2, x3, [sp,#16]
>> 0x0000ffff8a1d6e94: stp x4, x5, [sp,#32]
>> 0x0000ffff8a1d6e98: stp x6, x7, [sp,#48]
>> 0x0000ffff8a1d6e9c: stp x8, x9, [sp,#64]
>> 0x0000ffff8a1d6ea0: stp x10, x11, [sp,#80]
>> 0x0000ffff8a1d6ea4: stp x12, x13, [sp,#96]
>> 0x0000ffff8a1d6ea8: stp x14, x15, [sp,#112]
>> End of assembler dump.
>
> where x8 points at no memory (0xffff99d52008 in this case). The details
> of the code differ across crashes, but it appears to always be a "ldr
> wzr, [x8]" that triggers the SIGSEGV. There are more than 100 threads,
> most of which appear to be JVM housekeeping ones (compilation, gc; I
> have lengthy gdb "thread apply all backtrace full" output that I could
> provide.)
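That's a safepoint SEGV. It's deliberate: the "ldr wzr, [x8]" is the
safepoint poll, and the VM makes the polling page unreadable when it
wants threads to stop, so the load faults and the signal handler takes
the thread into the safepoint code. If you step at that point you'll
enter the safepoint code.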
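To illustrate why a load from an unreadable page can be intentional,
here is a minimal sketch of the page-protection polling technique on
Linux. It is not HotSpot's code: the names polling_page,
safepoint_poll, on_segv and run_poll_demo are made up for the example,
and a real VM would park the thread in the handler until the safepoint
operation finishes instead of simply re-enabling the page.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-ins for the VM's polling page and bookkeeping. */
static volatile char *polling_page;
static size_t page_size;
static volatile sig_atomic_t polls_trapped;

/* SIGSEGV handler: a fault inside the polling page is an expected poll
 * trap; any other fault address is treated as a genuine crash. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)polling_page;

    if (addr < base || addr >= base + page_size) {
        /* Not the poll: restore the default action and return, so the
         * faulting instruction re-executes and kills the process. */
        signal(sig, SIG_DFL);
        return;
    }
    polls_trapped++;
    /* Sketch only: disarm the page so the retried load succeeds.
     * mprotect is not on the POSIX async-signal-safe list, but it is a
     * plain system call on Linux and the fault here is synchronous. */
    mprotect((void *)polling_page, page_size, PROT_READ);
}

/* The poll itself: a load whose result is discarded, playing the role
 * of "ldr wzr, [x8]" in the disassembly. */
static void safepoint_poll(void)
{
    (void)*polling_page;
}

/* Runs three polls around one arming of the page and returns how many
 * of them trapped, or -1 if the setup failed. */
int run_poll_demo(void)
{
    struct sigaction sa = {0}, old;
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    page_size = (size_t)ps;

    void *p = mmap(NULL, page_size, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    polling_page = p;
    polls_trapped = 0;

    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(p, page_size);
        return -1;
    }

    safepoint_poll();                  /* page readable: no trap */
    mprotect(p, page_size, PROT_NONE); /* "arm" the safepoint */
    safepoint_poll();                  /* faults; handler disarms, load retries */
    safepoint_poll();                  /* page readable again: no trap */

    sigaction(SIGSEGV, &old, NULL);
    munmap(p, page_size);
    return (int)polls_trapped;
}
```

Calling run_poll_demo() from a main() returns 1: only the poll made
while the page was protected traps. Run under gdb, the same program
stops with SIGSEGV at the load, much like the backtrace above, unless
gdb is told "handle SIGSEGV nostop noprint pass".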
> I could no longer reproduce the failure when I either made LibreOffice
> instantiate the in-process JVM with -Xint to force interpreted mode, or
> built OpenJDK with --with-debug-level=fastdebug instead of
> --with-debug-level=release. (I tried a handful of times each; but as
> the failure isn't reliably reproducible, that might of course also have
> just been luck.)
>
> The OpenJDK in the Flatpak build environment is
> http://hg.openjdk.java.net/jdk-updates/jdk10u tag jdk-10.0.1+10 (at
> <https://github.com/flathub/org.freedesktop.Sdk.Extension.openjdk10>,
> which in turn uses the sources packaged by
> <https://src.fedoraproject.org/rpms/java-openjdk/branch/jdk-10>). I
> also tried replacing that with current tip of that branch, but that
> didn't make a difference (it felt like the failure happened less often,
> like only 10% of the time, but again, that might just have been luck).
>
> I have only restricted access to that 64-core machine, and the only
> viable way for me to test the issue is via the Flatpak build environment
> (e.g., I cannot easily download another OpenJDK 10 build to test
> against). The failing LibreOffice test itself is also somewhat complex,
> and it would likely not be easy to strip it down to a small reproducer.
>
> Thoughts, anyone?
What exactly is the failure?
If the JVM itself had crashed, you should have an error message and an
error log file (hs_err_pid<pid>.log).
--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671