[aarch64-port-dev ] Crashes while building aarch64 port using icedtea 2.5.1
Edward Nevill
edward.nevill at linaro.org
Thu Jul 24 09:41:24 UTC 2014
Hi Fridrich,
On Thu, 2014-07-24 at 07:46 +0200, Fridrich Strba wrote:
>#
># A fatal error has been detected by the Java Runtime Environment:
>#
># SIGSEGV (0xb) at pc=0x0000004001c074b4, pid=32573, tid=274904080880
OK, I have had a look at this and discussed it with our QEMU expert, and we think we may know what is going on.
Instructions: (pc=0x0000004001c074b4)
> 0x0000004001c07494: 20 f9 ff 58 42 4c 1d 12 5f 00 00 71 20 02 00 54
> 0x0000004001c074a4: 3f 0c 00 b9 fd 7b 44 a9 ff 43 01 91 e8 ec ff f0
> 0x0000004001c074b4: 1f 01 40 b9 c0 03 5f d6 e0 07 00 f9 08 00 80 92
> 0x0000004001c074c4: e8 03 00 f9 8e be fd 97 da ff ff 17 e0 07 00 f9
>
Disassembling the above hex gives us:-
0x411028 <tri>: ldr x0, 0x410f4c
0x41102c <tri+4>: and w2, w2, #0x7ffff8
0x411030 <tri+8>: cmp w2, #0x0
0x411034 <tri+12>: b.eq 0x411078
0x411038 <tri+16>: str wzr, [x1,#12]
0x41103c <tri+20>: ldp x29, x30, [sp,#64]
0x411040 <tri+24>: add sp, sp, #0x50
0x411044 <tri+28>: adrp x8, 0x1b0000
0x411048 <tri+32>: ldr wzr, [x8] ;;; <<< Fault address
0x41104c <tri+36>: ret
0x411050 <tri+40>: str x0, [sp,#8]
0x411054 <tri+44>: mov x8, #0xffffffffffffffff // #-1
0x411058 <tri+48>: str x8, [sp]
0x41105c <tri+52>: bl 0x380a94
0x411060 <tri+56>: b 0x410fc8
0x411064 <tri+60>: str x0, [sp,#8]
So it is faulting on the instruction "ldr wzr, [x8]". This is the test of the polling page. When OpenJDK wants to do a GC, it runs all threads to a safepoint, i.e. a point in the compiled code where the locations of all object references are known. It does this by read-protecting the polling page; the compiled code polls that page at well-known safepoints, such as (in the above case) the return from a method.
OpenJDK then traps the resulting SIGSEGV, and once all threads have reached a safepoint it is safe to do the GC.
However, in this case it seems that although the signal is being raised, it is not being caught correctly by OpenJDK.
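To make the mechanism concrete, here is a minimal, self-contained sketch of the polling-page idea in C. It is illustrative only and not HotSpot's actual code (the names polling_page and segv_handler are mine, and a real VM parks the thread at the safepoint rather than simply re-arming the page in the handler):

/* Sketch of the polling-page safepoint trick: the VM read-protects a
 * well-known page, compiled code does a dummy load from it at safepoints,
 * and the resulting SIGSEGV is recognised by its fault address. */
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void *polling_page;
static size_t page_size;

static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    char *addr = (char *)info->si_addr;
    if (addr >= (char *)polling_page &&
        addr < (char *)polling_page + page_size) {
        /* Safepoint poll trapped: a real VM would park this thread here.
         * For the sketch, just re-arm the page so the retried load succeeds.
         * (mprotect in a handler is fine on Linux for this demonstration.) */
        mprotect(polling_page, page_size, PROT_READ);
        return;
    }
    /* A genuine crash: restore default handling and re-raise. */
    signal(SIGSEGV, SIG_DFL);
    raise(SIGSEGV);
}

int main(void)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    polling_page = mmap(NULL, page_size, PROT_READ,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    /* "Request a safepoint": read-protect the polling page... */
    mprotect(polling_page, page_size, PROT_NONE);

    /* ...and this is the poll the JIT emits (the "ldr wzr, [x8]" above). */
    *(volatile int *)polling_page;

    puts("safepoint poll trapped and handled");
    return 0;
}

The point is that the poll itself is just a dummy load; everything depends on the SIGSEGV being delivered to, and handled on, the thread that executed the load, which is exactly what appears to go wrong under user-mode emulation.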
The first question is whether you are running this in user emulation or system emulation. My QEMU expert tells me that in user emulation mode signals may not be delivered to the correct thread.
He has pointed me at the following QEMU patch, which may help but is not a fix.
Otherwise I am afraid the answer is to run in system emulation mode, or run on real HW.
All the best,
Edward Nevill
--- CUT HERE ---
From: Alexander Graf <agraf at suse.de>
Date: Tue, 10 Jul 2012 20:40:55 +0200
Subject: linux-user: Run multi-threaded code on a single core
Running multi-threaded code can easily expose some of the fundamental
breakages in QEMU's design. It's just not a well supported scenario.
So if we pin the whole process to a single host CPU, we guarantee that
we will never have concurrent memory access actually happen. We can still
get scheduled away at any time, so it's no complete guarantee, but apparently
it reduces the odds well enough to get my test cases to pass.
This gets Java 1.7 working for me again on my test box.
Signed-off-by: Alexander Graf <agraf at suse.de>
---
linux-user/syscall.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index d62e9e6..5295afb 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -4400,6 +4400,15 @@ static int do_fork(CPUArchState *env, unsigned int flags, abi_ulong newsp,
if (nptl_flags & CLONE_SETTLS)
cpu_set_tls (new_env, newtls);
+ /* agraf: Pin ourselves to a single CPU when running multi-threaded.
+ This turned out to improve stability for me. */
+ {
+ cpu_set_t mask;
+ CPU_ZERO(&mask);
+ CPU_SET(0, &mask);
+ sched_setaffinity(0, sizeof(mask), &mask);
+ }
+
/* Grab a mutex so that thread setup appears atomic. */
pthread_mutex_lock(&clone_lock);
--- CUT HERE ---
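For reference, the affinity call the patch adds can be tried standalone. The program below is just my illustration of what the hunk does inside do_fork(), not part of the patch; outside syscall.c it needs _GNU_SOURCE and <sched.h> for the CPU_* macros and sched_setaffinity():

/* Standalone illustration of the pinning the patch adds: restrict the
 * whole process to host CPU 0 so its threads never run concurrently on
 * different host cores. Linux-specific. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);                 /* allow only host CPU 0 */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to CPU 0\n");
    return 0;
}

Pinning the whole process to one host CPU means the emulated threads are serialised by the host scheduler, which is why it reduces, but does not eliminate, the races the commit message describes.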