OpenJDK compile hangs on PPC64

Volker Simonis volker.simonis at gmail.com
Thu Nov 6 14:32:13 UTC 2014


Hi Severin, Andrew,

I've cc'ed Andrew as he might be able to help with some of the questions below.

Unfortunately I'm not really sure how I can help here. It seems this
is a kernel problem. After all it states:

"kernel:BUG: soft lockup - CPU#13 stuck for 22s! [javac:30304]"

Googling for that gives a lot of results, but I cannot say which of
them are really related to your problem. Maybe you could ask some of
your kernel guys at Red Hat?
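
When you talk to them, a short summary of which CPUs and processes were reported may be useful. A minimal sketch, assuming the messages have exactly the format quoted above (other kernel versions may word it differently); the function name `summarize_lockups` is made up for illustration:

```shell
# Hypothetical helper: summarize soft-lockup reports read from stdin.
# Assumes lines of the form:
#   BUG: soft lockup - CPU#<n> stuck for <s>s! [<comm>:<pid>]
summarize_lockups() {
  # Print one "cpu=... secs=... comm=... pid=..." line per matching message;
  # lines that do not match are silently dropped.
  sed -n 's/.*BUG: soft lockup - CPU#\([0-9]*\) stuck for \([0-9]*\)s! \[\(.*\):\([0-9]*\)\].*/cpu=\1 secs=\2 comm=\3 pid=\4/p'
}

# Possible usage (reading the kernel log may require root on some systems):
#   dmesg | summarize_lockups
#   summarize_lockups < /var/log/messages
```

If the same CPU or the same command shows up every time, that is probably worth mentioning in a bug report.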

By the way, once a process is in the "<defunct>" state you won't be able
to get any information from it. It means that the process has already
exited and is only waiting for its parent to collect its exit status.
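
What you can still look at is the little that the kernel keeps under /proc. A rough sketch, assuming a Linux /proc filesystem; the function names are made up for illustration. The "l" in the "Zl+" state means the process is multi-threaded, so it may be worth checking whether PID 30304 from the lockup message is one of its threads. That is only a guess on my part.

```shell
# Hypothetical helper: show the state and parent PID of a process.
# A zombie reports "State: Z (zombie)"; PPid is the process that has to reap it.
proc_state() {
  grep -E '^(State|PPid):' "/proc/$1/status"
}

# Hypothetical helper: list the thread IDs that belong to a process.
proc_threads() {
  ls "/proc/$1/task"
}

# Possible usage with the PID from the ps output:
#   proc_state 30282
#   proc_threads 30282
```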

Here are some things you might check:

1. Where did you get the boot JDK from? I couldn't find OpenJDK 8
packages for RHEL 7. Is there a repository where I can get them?
2. I have no idea how your boot JDK (at
/etc/alternatives/java_sdk_1.8.0) was compiled. Maybe it was compiled
for Power N but you're running on Power N-1?
3. Did you build the boot JDK yourself? If so, which boot JDK did you
use for building it? If not, please build it yourself (with the IBM JDK
as boot JDK) and check whether the hang still occurs.
4. What hardware do you have? Are you running on real hardware or on
QEMU (or something similar)? Please post the output of
`cat /proc/cpuinfo` and `lscpu`.
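
To make the last point easier, something like the following could collect the output in one file. It is only a sketch; the function name and the default file name `sysinfo.txt` are made up for illustration:

```shell
# Hypothetical helper: write the requested system details to one file.
# The first argument is the output file (defaults to sysinfo.txt).
collect_sysinfo() {
  out=${1:-sysinfo.txt}
  {
    echo "== uname -a =="
    uname -a
    echo "== /proc/cpuinfo =="
    cat /proc/cpuinfo
    echo "== lscpu =="
    # lscpu may be missing on minimal installations
    lscpu 2>/dev/null || echo "lscpu not available"
  } > "$out"
}

# Possible usage:
#   collect_sysinfo ppc-host-sysinfo.txt
```

On virtualized systems lscpu usually prints a "Hypervisor vendor" line, which should help answer the question about real hardware versus QEMU.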

I've just built OpenJDK 8 and then used the resulting binaries to
bootstrap another OpenJDK 8 build on RHEL7/PPC64 without any problems.
Here's my system:

$ uname -a
Linux xxx 3.10.0-124.el7.bz1083296.ppc64 #1 SMP Wed Jun 4 06:50:40 EDT
2014 ppc64 ppc64 ppc64 GNU/Linux
$ cat /proc/cpuinfo
processor    : 0
cpu        : POWER8 (architected), altivec supported
clock        : 4116.000000MHz
revision    : 2.1 (pvr 004b 0201)


We also regularly build and test on SLES 9-11 and RHEL 5-7 on POWER6
to POWER8 CPUs, and I can't remember ever seeing the problems you
describe.

Regards,
Volker

On Thu, Nov 6, 2014 at 11:22 AM, Severin Gehwolf <sgehwolf at redhat.com> wrote:
> Hi Volker,
>
> Sorry it took so long to get back to you. The PPC system I use for
> building isn't available to me all the time :(
>
> On Fri, 2014-10-24 at 11:14 +0200, Volker Simonis wrote:
>> Hi Severin,
>>
>> as a quick workaround, you could download the IBM J9 from
>> http://www.ibm.com/developerworks/java/jdk/beta/ to build a new
>> version of OpenJDK 8 from scratch. That shouldn't have any problems.
>>
>> If that still hangs, could you please provide some Java stack traces
>> (i.e. by using jstack or by sending the Java process a SIGQUIT
>> signal).
>>
>> Regards,
>> Volker
>>
>>
>> On Thu, Oct 23, 2014 at 3:45 PM, Severin Gehwolf <sgehwolf at redhat.com> wrote:
>> > Hi,
>> >
>> > I've compiled the OpenJDK Zero variant on PPC 64 (BE) a couple of times.
>> > Out of a total of, say, 6 compiles it "hung" twice. I think it got stuck
>> > somewhere compiling JDK classes.
>> >
>> > The boot JDK which I've used for building is OpenJDK 8 with the PPC64
>> > JIT. The host OS is RHEL-7. I see the following showing up in dmesg
>> > multiple times:
>> >
>> > [188362.213095] CPU: 16 PID: 10678 Comm: java Not tainted 3.10.0-123.el7.ppc64 #1
>> > [188362.213100] Call Trace:
>> > [188362.213105] [c00000006297ae40] [c0000000000166e8] .show_stack+0x78/0x320 (unreliable)
>> > [188362.213113] [c00000006297af00] [c0000000008c6618] .dump_stack+0x28/0x3c
>> > [188362.213119] [c00000006297af70] [c0000000001aa308] .rcu_check_callbacks+0x488/0x990
>> > [188362.213126] [c00000006297b0b0] [c0000000000c9334] .update_process_times+0x54/0xa0
>> > [188362.213132] [c00000006297b140] [c000000000144920] .tick_sched_timer+0x80/0x150
>> > [188362.213138] [c00000006297b1e0] [c0000000000f177c] .__run_hrtimer+0xac/0x370
>> > [188362.213144] [c00000006297b280] [c0000000000f28b8] .hrtimer_interrupt+0x138/0x320
>> > [188362.213150] [c00000006297b390] [c00000000001f130] .timer_interrupt+0x120/0x2f0
>> > [188362.213156] [c00000006297b440] [c000000000002614] decrementer_common+0x114/0x180
>> > [188362.213162] --- Exception: 901 at .arch_local_irq_restore+0xf0/0x150
>> > [188362.213162]     LR = .arch_local_irq_restore+0xf0/0x150
>> > [188362.213169] [c00000006297b730] [c0000000001488a0] .get_futex_key+0x310/0x3d0 (unreliable)
>> > [188362.213176] [c00000006297b7a0] [c000000000051e14] .get_user_pages_fast+0x2a4/0x440
>> > [188362.213181] [c00000006297b8a0] [c0000000001486e4] .get_futex_key+0x154/0x3d0
>> > [188362.213187] [c00000006297b950] [c000000000149460] .futex_wake+0x50/0x1c0
>> > [188362.213192] [c00000006297ba10] [c00000000014ac84] .do_futex+0x234/0xec0
>> > [188362.213197] [c00000006297bb50] [c00000000014ba2c] .SyS_futex+0x11c/0x1d0
>> > [188362.213202] [c00000006297bc20] [c0000000000a8dc0] .mm_release+0x130/0x190
>> > [188362.213208] [c00000006297bca0] [c0000000000b7000] .do_exit+0x190/0xb60
>> > [188362.213213] [c00000006297bda0] [c0000000000b7b58] .SyS_exit_group+0x48/0xf0
>> > [188362.213219] [c00000006297be30] [c000000000009e7c] syscall_exit+0x0/0x7c
>> >
>> > PID 10678 was the java (javac) process that got stuck.
>> >
>
> I'm still seeing this problem quite frequently.
>
> Boot JDK (ppc64 BE)
> $ java -version
> openjdk version "1.8.0_20"
> OpenJDK Runtime Environment (build 1.8.0_20-b26)
> OpenJDK 64-Bit Server VM (build 25.20-b23, mixed mode)
>
> I'm trying to build the Zero variant with the above boot JDK. Config
> + tools are:
> -------------------------------------------------
> A new configuration has been successfully created in
>  /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release
> using configure arguments '--with-boot-jdk=/etc/alternatives/java_sdk_1.8.0 --with-debug-level=release --disable-zip-debug-info --enable-unlimited-crypto --with-zlib=system --with-giflib=system --with-stdc++lib=dynamic --with-num-cores=8 --with-jvm-variants=zero'.
>
> Configuration summary:
> * Debug level:    release
> * HS debug level: product
> * JDK variant:    normal
> * JVM variants:   zero
> * OpenJDK target: OS: linux, CPU architecture: ppc, address length: 64
>
> Tools summary:
> * Boot JDK:       openjdk version "1.8.0_20" OpenJDK Runtime Environment (build 1.8.0_20-b26) OpenJDK 64-Bit Server VM (build 25.20-b23, mixed mode)  (at /etc/alternatives/java_sdk_1.8.0)
> * Toolchain:      gcc (GNU Compiler Collection)
> * C Compiler:     Version 4.8.2 (at /bin/gcc)
> * C++ Compiler:   Version 4.8.2 (at /bin/g++)
> -------------------------------------------
>
> It gets locked up with:
> "kernel:BUG: soft lockup - CPU#13 stuck for 22s! [javac:30304]"
>
> There seems to be only one defunct javac process (Zombie):
> $ ps ax | grep java
> 30280 pts/1    S+     0:00 /bin/sh -c ( /etc/alternatives/java_sdk_1.8.0/bin/javac -XDignore.symbol.file=true -g -Xlint:all,-deprecation -Werror   -implicit:none -sourcepath "/home/openjdk-tester/openjdk9-hs-comp/jdk/make/src/classes" -d /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist  @/home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist/_the.BUILD_GENMODULESLIST_batch.tmp && /bin/mv /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist/_the.BUILD_GENMODULESLIST_batch.tmp /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist/_the.BUILD_GENMODULESLIST_batch)
> 30281 pts/1    S+     0:00 /bin/sh -c ( /etc/alternatives/java_sdk_1.8.0/bin/javac -XDignore.symbol.file=true -g -Xlint:all,-deprecation -Werror   -implicit:none -sourcepath "/home/openjdk-tester/openjdk9-hs-comp/jdk/make/src/classes" -d /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist  @/home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist/_the.BUILD_GENMODULESLIST_batch.tmp && /bin/mv /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist/_the.BUILD_GENMODULESLIST_batch.tmp /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist/_the.BUILD_GENMODULESLIST_batch)
> 30282 pts/1    Zl+    0:05 [javac] <defunct>
>
> Not sure what happened with process 30304. Perhaps the system cleaned it
> up before I could investigate anything further.
>
>
> $ tail -n40 /var/log/messages
> Nov  6 11:10:09 ibm-p730-06-lp1 kernel: [c0000003c94031e0] [c0000000000f177c] .__run_hrtimer+0xac/0x370
> Nov  6 11:10:09 ppc-host kernel: [c0000003c9403280] [c0000000000f28b8] .hrtimer_interrupt+0x138/0x320
> Nov  6 11:10:09 ppc-host kernel: [c0000003c9403390] [c00000000001f130] .timer_interrupt+0x120/0x2f0
> Nov  6 11:10:09 ppc-host kernel: [c0000003c9403440] [c000000000002614] decrementer_common+0x114/0x180
> Nov  6 11:10:09 ppc-host kernel: --- Exception: 901 at .arch_local_irq_restore+0xf0/0x150
>     LR = .arch_local_irq_restore+0xf0/0x150
> Nov  6 11:10:09 ppc-host kernel: [c0000003c9403730] [c0000000001486e4] .get_futex_key+0x154/0x3d0 (unreliable)
> Nov  6 11:10:09 ppc-host kernel: [c0000003c94037a0] [c000000000051e14] .get_user_pages_fast+0x2a4/0x440
> Nov  6 11:10:09 ppc-host kernel: [c0000003c94038a0] [c0000000001486e4] .get_futex_key+0x154/0x3d0
> Nov  6 11:10:09 ppc-host kernel: [c0000003c9403950] [c000000000149460] .futex_wake+0x50/0x1c0
> Nov  6 11:10:09 ppc-host kernel: [c0000003c9403a10] [c00000000014ac84] .do_futex+0x234/0xec0
> Nov  6 11:10:09 ppc-host kernel: [c0000003c9403b50] [c00000000014ba2c] .SyS_futex+0x11c/0x1d0
> Nov  6 11:10:09 ppc-host kernel: [c0000003c9403c20] [c0000000000a8dc0] .mm_release+0x130/0x190
> Nov  6 11:10:09 ppc-host kernel: [c0000003c9403ca0] [c0000000000b7000] .do_exit+0x190/0xb60
> Nov  6 11:10:09 ppc-host kernel: [c0000003c9403da0] [c0000000000b7b58] .SyS_exit_group+0x48/0xf0
> Nov  6 11:10:09 ppc-host kernel: [c0000003c9403e30] [c000000000009e7c] syscall_exit+0x0/0x7c
> Nov  6 11:13:09 ppc-host kernel: INFO: rcu_sched self-detected stall on CPU { 13}  (t=114018 jiffies g=38182 c=38181 q=0)
> Nov  6 11:13:09 ppc-host kernel: CPU: 13 PID: 30304 Comm: javac Not tainted 3.10.0-123.el7.ppc64 #1
> Nov  6 11:13:09 ppc-host kernel: Call Trace:
> Nov  6 11:13:09 ppc-host kernel: [c0000003c9402eb0] [c0000000000166e8] .show_stack+0x78/0x320 (unreliable)
> Nov  6 11:13:09 ppc-host kernel: [c0000003c9402f70] [c0000000008c6618] .dump_stack+0x28/0x3c
> Nov  6 11:13:09 ppc-host kernel: [c0000003c9402fe0] [c0000000001aa308] .rcu_check_callbacks+0x488/0x990
> Nov  6 11:13:09 ppc-host kernel: [c0000003c9403120] [c0000000000c9334] .update_process_times+0x54/0xa0
> Nov  6 11:13:09 ppc-host kernel: [c0000003c94031b0] [c000000000144920] .tick_sched_timer+0x80/0x150
> Nov  6 11:13:09 ppc-host kernel: [c0000003c9403250] [c0000000000f177c] .__run_hrtimer+0xac/0x370
> Nov  6 11:13:09 ppc-host kernel: [c0000003c94032f0] [c0000000000f28b8] .hrtimer_interrupt+0x138/0x320
> Nov  6 11:13:09 ppc-host kernel: [c0000003c9403400] [c00000000001f130] .timer_interrupt+0x120/0x2f0
> Nov  6 11:13:09 ppc-host kernel: [c0000003c94034b0] [c000000000002614] decrementer_common+0x114/0x180
> Nov  6 11:13:09 ppc-host kernel: --- Exception: 901 at .arch_local_irq_restore+0xf0/0x150
>     LR = .arch_local_irq_restore+0xf0/0x150
> Nov  6 11:13:09 ppc-host kernel: [c0000003c94037a0] [c000000000051f3c] .get_user_pages_fast+0x3cc/0x440 (unreliable)
> Nov  6 11:13:09 ppc-host kernel: [c0000003c9403810] [c00000000021fa60] .put_compound_page+0x2b0/0x330
> Nov  6 11:13:09 ppc-host kernel: [c0000003c94038a0] [c0000000001488a0] .get_futex_key+0x310/0x3d0
> Nov  6 11:13:09 ppc-host kernel: [c0000003c9403950] [c000000000149460] .futex_wake+0x50/0x1c0
> Nov  6 11:13:09 ppc-host kernel: [c0000003c9403a10] [c00000000014ac84] .do_futex+0x234/0xec0
> Nov  6 11:13:09 ppc-host kernel: [c0000003c9403b50] [c00000000014ba2c] .SyS_futex+0x11c/0x1d0
> Nov  6 11:13:09 ppc-host kernel: [c0000003c9403c20] [c0000000000a8dc0] .mm_release+0x130/0x190
> Nov  6 11:13:09 ppc-host kernel: [c0000003c9403ca0] [c0000000000b7000] .do_exit+0x190/0xb60
> Nov  6 11:13:09 ppc-host kernel: [c0000003c9403da0] [c0000000000b7b58] .SyS_exit_group+0x48/0xf0
> Nov  6 11:13:09 ppc-host kernel: [c0000003c9403e30] [c000000000009e7c] syscall_exit+0x0/0x7c
>
> Since the stuck javac/java/jar processes are Zombies I cannot run jstack on them.
>
> $ ps 30282
>   PID TTY      STAT   TIME COMMAND
> 30282 pts/1    Zl+    0:05 [javac] <defunct>
> $ jstack 30282
> 30282: Unable to open socket file: target process not responding or HotSpot VM not loaded
>
> Thoughts?
>
> Cheers,
> Severin
>


More information about the ppc-aix-port-dev mailing list