OpenJDK compile hangs on PPC64

Severin Gehwolf sgehwolf at redhat.com
Thu Nov 6 10:22:26 UTC 2014


Hi Volker,

Sorry it took so long to get back to you. The PPC system I use for
building isn't available to me all the time :(

On Fri, 2014-10-24 at 11:14 +0200, Volker Simonis wrote:
> Hi Severin,
> 
> as a quick workaround, you could download the IBM J9 from
> http://www.ibm.com/developerworks/java/jdk/beta/ to build a new
> version of OpenJDK 8 from scratch. That shouldn't have any problems.
> 
> If that still hangs, could you please provide some Java stack traces
> (i.e. by using jstack or by sending the Java process a SIGQUIT
> signal).
> 
> Regards,
> Volker
> 
> 
> On Thu, Oct 23, 2014 at 3:45 PM, Severin Gehwolf <sgehwolf at redhat.com> wrote:
> > Hi,
> >
> > I've compiled the OpenJDK Zero variant on PPC 64 (BE) a couple of times.
> > Out of a total of say 6 compiles it "hang" twice. I think it got stuck
> > somewhere compiling JDK classes.
> >
> > The boot JDK which I've used for building is OpenJDK 8 with the PPC64
> > JIT. The host OS is RHEL-7. I see the following showing up in dmesg
> > multiple times:
> >
> > [188362.213095] CPU: 16 PID: 10678 Comm: java Not tainted 3.10.0-123.el7.ppc64 #1
> > [188362.213100] Call Trace:
> > [188362.213105] [c00000006297ae40] [c0000000000166e8] .show_stack+0x78/0x320 (unreliable)
> > [188362.213113] [c00000006297af00] [c0000000008c6618] .dump_stack+0x28/0x3c
> > [188362.213119] [c00000006297af70] [c0000000001aa308] .rcu_check_callbacks+0x488/0x990
> > [188362.213126] [c00000006297b0b0] [c0000000000c9334] .update_process_times+0x54/0xa0
> > [188362.213132] [c00000006297b140] [c000000000144920] .tick_sched_timer+0x80/0x150
> > [188362.213138] [c00000006297b1e0] [c0000000000f177c] .__run_hrtimer+0xac/0x370
> > [188362.213144] [c00000006297b280] [c0000000000f28b8] .hrtimer_interrupt+0x138/0x320
> > [188362.213150] [c00000006297b390] [c00000000001f130] .timer_interrupt+0x120/0x2f0
> > [188362.213156] [c00000006297b440] [c000000000002614] decrementer_common+0x114/0x180
> > [188362.213162] --- Exception: 901 at .arch_local_irq_restore+0xf0/0x150
> > [188362.213162]     LR = .arch_local_irq_restore+0xf0/0x150
> > [188362.213169] [c00000006297b730] [c0000000001488a0] .get_futex_key+0x310/0x3d0 (unreliable)
> > [188362.213176] [c00000006297b7a0] [c000000000051e14] .get_user_pages_fast+0x2a4/0x440
> > [188362.213181] [c00000006297b8a0] [c0000000001486e4] .get_futex_key+0x154/0x3d0
> > [188362.213187] [c00000006297b950] [c000000000149460] .futex_wake+0x50/0x1c0
> > [188362.213192] [c00000006297ba10] [c00000000014ac84] .do_futex+0x234/0xec0
> > [188362.213197] [c00000006297bb50] [c00000000014ba2c] .SyS_futex+0x11c/0x1d0
> > [188362.213202] [c00000006297bc20] [c0000000000a8dc0] .mm_release+0x130/0x190
> > [188362.213208] [c00000006297bca0] [c0000000000b7000] .do_exit+0x190/0xb60
> > [188362.213213] [c00000006297bda0] [c0000000000b7b58] .SyS_exit_group+0x48/0xf0
> > [188362.213219] [c00000006297be30] [c000000000009e7c] syscall_exit+0x0/0x7c
> >
> > PID 10678 was the java (javac) process that got stuck.
> >

I'm still seeing this problem quite frequently.

Boot JDK (ppc64 BE)
$ java -version
openjdk version "1.8.0_20"
OpenJDK Runtime Environment (build 1.8.0_20-b26)
OpenJDK 64-Bit Server VM (build 25.20-b23, mixed mode)

I'm trying to build the Zero variant with the above boot JDK. The
configuration and tools are:
-------------------------------------------------
A new configuration has been successfully created in
 /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release
using configure arguments '--with-boot-jdk=/etc/alternatives/java_sdk_1.8.0 --with-debug-level=release --disable-zip-debug-info --enable-unlimited-crypto --with-zlib=system --with-giflib=system --with-stdc++lib=dynamic --with-num-cores=8 --with-jvm-variants=zero'.

Configuration summary:
* Debug level:    release
* HS debug level: product
* JDK variant:    normal
* JVM variants:   zero
* OpenJDK target: OS: linux, CPU architecture: ppc, address length: 64

Tools summary:
* Boot JDK:       openjdk version "1.8.0_20" OpenJDK Runtime Environment (build 1.8.0_20-b26) OpenJDK 64-Bit Server VM (build 25.20-b23, mixed mode)  (at /etc/alternatives/java_sdk_1.8.0)
* Toolchain:      gcc (GNU Compiler Collection)
* C Compiler:     Version 4.8.2 (at /bin/gcc)
* C++ Compiler:   Version 4.8.2 (at /bin/g++)
-------------------------------------------

It gets locked up with:
"kernel:BUG: soft lockup - CPU#13 stuck for 22s! [javac:30304]"

There seems to be only one defunct javac process (Zombie):
$ ps ax | grep java
30280 pts/1    S+     0:00 /bin/sh -c ( /etc/alternatives/java_sdk_1.8.0/bin/javac -XDignore.symbol.file=true -g -Xlint:all,-deprecation -Werror   -implicit:none -sourcepath "/home/openjdk-tester/openjdk9-hs-comp/jdk/make/src/classes" -d /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist  @/home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist/_the.BUILD_GENMODULESLIST_batch.tmp && /bin/mv /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist/_the.BUILD_GENMODULESLIST_batch.tmp /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist/_the.BUILD_GENMODULESLIST_batch)
30281 pts/1    S+     0:00 /bin/sh -c ( /etc/alternatives/java_sdk_1.8.0/bin/javac -XDignore.symbol.file=true -g -Xlint:all,-deprecation -Werror   -implicit:none -sourcepath "/home/openjdk-tester/openjdk9-hs-comp/jdk/make/src/classes" -d /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist  @/home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist/_the.BUILD_GENMODULESLIST_batch.tmp && /bin/mv /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist/_the.BUILD_GENMODULESLIST_batch.tmp /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist/_the.BUILD_GENMODULESLIST_batch)
30282 pts/1    Zl+    0:05 [javac] <defunct>

Not sure what happened with process 30304. Perhaps the system cleaned it
up before I could investigate anything further.
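
One thing that might be worth checking, stated as an assumption rather
than something verified on this machine: the "PID" printed in a kernel
trace is the ID of the task (thread), so 30304 could be a thread of
30282 rather than a separate process. The "l" in the "Zl+" state only
says the process is multi-threaded. A minimal sketch for listing the
tasks of a process follows; list_tasks and its optional PROC_ROOT
argument are names made up for illustration.

```shell
# list_tasks PID [PROC_ROOT]
# Print "TID STATE" for every task (thread) that belongs to PID.
# PROC_ROOT defaults to /proc; it is only a parameter so the function
# can also be pointed at a copied or mock directory tree.
list_tasks() {
    pid="$1"
    root="${2:-/proc}"
    for dir in "$root/$pid"/task/*; do
        # Skip entries that vanished or that we may not read.
        [ -r "$dir/status" ] || continue
        tid="${dir##*/}"
        # The "State:" line looks like "State:  Z (zombie)"; keep the letter.
        state="$(awk '/^State:/ { print $2; exit }' "$dir/status")"
        printf '%s %s\n' "$tid" "$state"
    done
}
```

Running "list_tasks 30282" while the build is hung would show whether
30304 appears among the tasks and what state it is in.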


$ tail -n40 /var/log/messages 
Nov  6 11:10:09 ibm-p730-06-lp1 kernel: [c0000003c94031e0] [c0000000000f177c] .__run_hrtimer+0xac/0x370
Nov  6 11:10:09 ppc-host kernel: [c0000003c9403280] [c0000000000f28b8] .hrtimer_interrupt+0x138/0x320
Nov  6 11:10:09 ppc-host kernel: [c0000003c9403390] [c00000000001f130] .timer_interrupt+0x120/0x2f0
Nov  6 11:10:09 ppc-host kernel: [c0000003c9403440] [c000000000002614] decrementer_common+0x114/0x180
Nov  6 11:10:09 ppc-host kernel: --- Exception: 901 at .arch_local_irq_restore+0xf0/0x150
    LR = .arch_local_irq_restore+0xf0/0x150
Nov  6 11:10:09 ppc-host kernel: [c0000003c9403730] [c0000000001486e4] .get_futex_key+0x154/0x3d0 (unreliable)
Nov  6 11:10:09 ppc-host kernel: [c0000003c94037a0] [c000000000051e14] .get_user_pages_fast+0x2a4/0x440
Nov  6 11:10:09 ppc-host kernel: [c0000003c94038a0] [c0000000001486e4] .get_futex_key+0x154/0x3d0
Nov  6 11:10:09 ppc-host kernel: [c0000003c9403950] [c000000000149460] .futex_wake+0x50/0x1c0
Nov  6 11:10:09 ppc-host kernel: [c0000003c9403a10] [c00000000014ac84] .do_futex+0x234/0xec0
Nov  6 11:10:09 ppc-host kernel: [c0000003c9403b50] [c00000000014ba2c] .SyS_futex+0x11c/0x1d0
Nov  6 11:10:09 ppc-host kernel: [c0000003c9403c20] [c0000000000a8dc0] .mm_release+0x130/0x190
Nov  6 11:10:09 ppc-host kernel: [c0000003c9403ca0] [c0000000000b7000] .do_exit+0x190/0xb60
Nov  6 11:10:09 ppc-host kernel: [c0000003c9403da0] [c0000000000b7b58] .SyS_exit_group+0x48/0xf0
Nov  6 11:10:09 ppc-host kernel: [c0000003c9403e30] [c000000000009e7c] syscall_exit+0x0/0x7c
Nov  6 11:13:09 ppc-host kernel: INFO: rcu_sched self-detected stall on CPU { 13}  (t=114018 jiffies g=38182 c=38181 q=0)
Nov  6 11:13:09 ppc-host kernel: CPU: 13 PID: 30304 Comm: javac Not tainted 3.10.0-123.el7.ppc64 #1
Nov  6 11:13:09 ppc-host kernel: Call Trace:
Nov  6 11:13:09 ppc-host kernel: [c0000003c9402eb0] [c0000000000166e8] .show_stack+0x78/0x320 (unreliable)
Nov  6 11:13:09 ppc-host kernel: [c0000003c9402f70] [c0000000008c6618] .dump_stack+0x28/0x3c
Nov  6 11:13:09 ppc-host kernel: [c0000003c9402fe0] [c0000000001aa308] .rcu_check_callbacks+0x488/0x990
Nov  6 11:13:09 ppc-host kernel: [c0000003c9403120] [c0000000000c9334] .update_process_times+0x54/0xa0
Nov  6 11:13:09 ppc-host kernel: [c0000003c94031b0] [c000000000144920] .tick_sched_timer+0x80/0x150
Nov  6 11:13:09 ppc-host kernel: [c0000003c9403250] [c0000000000f177c] .__run_hrtimer+0xac/0x370
Nov  6 11:13:09 ppc-host kernel: [c0000003c94032f0] [c0000000000f28b8] .hrtimer_interrupt+0x138/0x320
Nov  6 11:13:09 ppc-host kernel: [c0000003c9403400] [c00000000001f130] .timer_interrupt+0x120/0x2f0
Nov  6 11:13:09 ppc-host kernel: [c0000003c94034b0] [c000000000002614] decrementer_common+0x114/0x180
Nov  6 11:13:09 ppc-host kernel: --- Exception: 901 at .arch_local_irq_restore+0xf0/0x150
    LR = .arch_local_irq_restore+0xf0/0x150
Nov  6 11:13:09 ppc-host kernel: [c0000003c94037a0] [c000000000051f3c] .get_user_pages_fast+0x3cc/0x440 (unreliable)
Nov  6 11:13:09 ppc-host kernel: [c0000003c9403810] [c00000000021fa60] .put_compound_page+0x2b0/0x330
Nov  6 11:13:09 ppc-host kernel: [c0000003c94038a0] [c0000000001488a0] .get_futex_key+0x310/0x3d0
Nov  6 11:13:09 ppc-host kernel: [c0000003c9403950] [c000000000149460] .futex_wake+0x50/0x1c0
Nov  6 11:13:09 ppc-host kernel: [c0000003c9403a10] [c00000000014ac84] .do_futex+0x234/0xec0
Nov  6 11:13:09 ppc-host kernel: [c0000003c9403b50] [c00000000014ba2c] .SyS_futex+0x11c/0x1d0
Nov  6 11:13:09 ppc-host kernel: [c0000003c9403c20] [c0000000000a8dc0] .mm_release+0x130/0x190
Nov  6 11:13:09 ppc-host kernel: [c0000003c9403ca0] [c0000000000b7000] .do_exit+0x190/0xb60
Nov  6 11:13:09 ppc-host kernel: [c0000003c9403da0] [c0000000000b7b58] .SyS_exit_group+0x48/0xf0
Nov  6 11:13:09 ppc-host kernel: [c0000003c9403e30] [c000000000009e7c] syscall_exit+0x0/0x7c
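
To see at a glance which tasks the kernel keeps flagging, the trace
headers can be tallied. This is a rough sketch that assumes the log
lines have the "CPU: N PID: N Comm: NAME" shape shown above;
stalled_tasks is a hypothetical helper name, not an existing tool.

```shell
# stalled_tasks [LOGFILE]
# Count kernel trace headers per task and print "PID COMM COUNT".
# LOGFILE defaults to /var/log/messages; saved dmesg output works too.
stalled_tasks() {
    awk '/CPU: [0-9]+ PID: [0-9]+ Comm: / {
        # Pick the fields that follow the "PID:" and "Comm:" labels.
        for (i = 1; i < NF; i++) {
            if ($i == "PID:")  pid  = $(i + 1)
            if ($i == "Comm:") comm = $(i + 1)
        }
        count[pid " " comm]++
    }
    END { for (k in count) print k, count[k] }' "${1:-/var/log/messages}"
}
```

The output order is not defined, so piping the result through sort is
useful when more than one task shows up.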

Since the stuck javac/java/jar processes are zombies, I cannot run jstack on them.

$ ps 30282
  PID TTY      STAT   TIME COMMAND
30282 pts/1    Zl+    0:05 [javac] <defunct>
$ jstack 30282
30282: Unable to open socket file: target process not responding or HotSpot VM not loaded
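
With jstack out of the picture, the kernel side may still be
inspectable. A sketch under assumptions: it relies on the per-task
/proc/PID/task/TID/stack files, which need root and a kernel built
with stack tracing support, and the stack shown for a task that is
actively spinning may be unreliable. kernel_stacks and PROC_ROOT are
illustrative names only.

```shell
# kernel_stacks PID [PROC_ROOT]
# Print the kernel stack of every task (thread) of PID.
# Tasks whose stack file is missing or unreadable are skipped silently.
kernel_stacks() {
    pid="$1"
    root="${2:-/proc}"
    for dir in "$root/$pid"/task/*; do
        [ -r "$dir/stack" ] || continue
        printf '== task %s ==\n' "${dir##*/}"
        cat "$dir/stack"
    done
}
```

Run as root, "kernel_stacks 30282" would print one section per
remaining thread of the defunct javac process.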

Thoughts?

Cheers,
Severin


