OpenJDK compile hangs on PPC64
Severin Gehwolf
sgehwolf at redhat.com
Thu Nov 6 15:29:09 UTC 2014
Hi,
On Thu, 2014-11-06 at 15:32 +0100, Volker Simonis wrote:
> Hi Severin, Andrew,
>
> I've cc'ed Andrew as he might be able to help with some of the questions below.
>
> Unfortunately I'm not really sure how I can help here. It seems this
> is a kernel problem. After all it states:
>
> "kernel:BUG: soft lockup - CPU#13 stuck for 22s! [javac:30304]"
>
> Googling for that gives a lot of results but I cannot say which of
> them are really related to your problem. Maybe you could ask some of
> your kernel guys at Red Hat?
OK.
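In case it helps when going through the logs, here is a minimal sketch for
pulling soft lockup reports out of kernel log text. It assumes nothing beyond
the message format quoted above; the function name summarize_lockups is made
up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: read kernel log lines on stdin and print one
# summary line per "soft lockup" report. The pattern is based solely on
# the message format quoted in this thread.
summarize_lockups() {
    sed -n 's/.*soft lockup - CPU#\([0-9][0-9]*\) stuck for \([0-9][0-9]*\)s! \[\(.*\):\([0-9][0-9]*\)\].*/cpu=\1 secs=\2 comm=\3 pid=\4/p'
}

# Example using the message from this thread:
echo 'kernel:BUG: soft lockup - CPU#13 stuck for 22s! [javac:30304]' | summarize_lockups
```

On a live system the input could come from dmesg or /var/log/messages, e.g.
"dmesg | summarize_lockups".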
> By the way, once a process is in "<defunct>" state you won't be able
> to get any information from it. It means that the process is actually
> dead and just waiting for its parent to reap it.
Sure.
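For reference, a rough sketch of how one might list zombies together with
their parent PIDs, since the parent is the process that has to reap them. The
function names are purely illustrative, and list_zombies assumes the procps
ps found on Linux.

```shell
#!/bin/bash
# Hypothetical helper: keep only zombie rows from "PID PPID STAT COMMAND"
# lines read on stdin. A STAT value starting with "Z" marks a defunct process.
filter_zombies() {
    awk '$3 ~ /^Z/ { print "zombie pid=" $1 " parent=" $2 " stat=" $3 }'
}

# List zombies on the running system (assumes the procps `ps` found on Linux).
list_zombies() {
    ps -e -o pid=,ppid=,stat=,comm= | filter_zombies
}

# Example on canned rows; the PPID values here are invented for illustration.
printf '%s\n' '30280 30270 S+ sh' '30282 30281 Zl+ javac' | filter_zombies
```

On a live system, calling list_zombies does the same against real ps output.
The "l" in a STAT of "Zl+" marks a multi-threaded process, so "ps -L -p <pid>"
would additionally list its threads in the LWP column. Whether the 30304 from
the kernel message is such a thread is only a guess on my part.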
> Here are some of the things you might check:
>
> 1. Where did you get the boot jdk from? I couldn't find OpenJDK 8
> packages for RHEL 7. Is there a repository where I can get it?
Unfortunately there is no repository where you can get it.
> 2. I've no idea how your boot jdk (at
> /etc/alternatives/java_sdk_1.8.0) was compiled. Maybe it was compiled
> for Power N but you're running on Power N-1?
This is interesting. Could you elaborate on what the problem is there? It
sounds to me like, if that's an issue, it would be good to check
compatibility at runtime somehow.
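As a rough sketch of what such a check could look like from the shell (outside
the JVM): read the "cpu" line from /proc/cpuinfo and compare the POWER
generation against a required level. The function names are hypothetical, and
the required level is an assumption that would have to come from whoever
built the boot JDK.

```shell
#!/bin/bash
# Hypothetical helper: print the POWER generation number found in
# /proc/cpuinfo-style text read from stdin (e.g. "8" for POWER8).
power_generation() {
    sed -n 's/^cpu[[:space:]]*:[[:space:]]*POWER\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeeds if the CPU described on stdin is at least generation $1.
# The required level is an assumption the caller has to supply.
cpu_at_least() {
    local required=$1 actual
    actual=$(power_generation)
    [ -n "$actual" ] && [ "$actual" -ge "$required" ]
}

# Example with a made-up cpuinfo line (not taken from the affected machine):
if printf 'cpu\t\t: POWER7 (architected), altivec supported\n' | cpu_at_least 8; then
    echo "CPU is new enough"
else
    echo "CPU is older than the level the binary was built for"
fi
```

On the real machine this would be "cpu_at_least 8 < /proc/cpuinfo". If no
POWER "cpu" line is found at all, the check fails rather than guessing.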
> 3. Did you build the boot jdk yourself? If you built it yourself,
> which boot-jdk did you use for building it?
Yes. We built it ourselves using Zero.
> If not, please build it yourself (with the IBM JDK as boot jdk) and check
> whether that still fails.
That's an avenue worth exploring, thanks.
> 4. What hardware do you have (i.e. `cat /proc/cpuinfo` , `lscpu` )?
The system is currently not available to me. I'll get back to you on
that once I've got access to a machine again.
> Do you use real hardware or are you running on QEMU (or something
> similar)?
Real hardware. Nothing virtualized.
> Please post the output of `cat /proc/cpuinfo` and `lscpu`.
See above. Will get back to you on that.
> I've just built and afterwards bootstrapped another OpenJDK 8 build
> with the created binaries on RHEL7/PPC64 without any problems. Here's
> my system:
>
> $ uname -a
> Linux xxx 3.10.0-124.el7.bz1083296.ppc64 #1 SMP Wed Jun 4 06:50:40 EDT
> 2014 ppc64 ppc64 ppc64 GNU/Linux
> $ cat /proc/cpuinfo
> processor : 0
> cpu : POWER8 (architected), altivec supported
> clock : 4116.000000MHz
> revision : 2.1 (pvr 004b 0201)
OK. I'd be interested to know if you're able to build the Zero variant
on this system with your boot JDK.
Here is the recipe I use. The first step is installing the build requirements
(BRs) for Zero.
# yum -y install libXtst-devel libXt-devel libXrender-devel cups-devel freetype-devel alsa-lib-devel autoconf automake binutils desktop-file-utils fontconfig giflib-devel gcc-c++ gtk2-devel lcms2-devel libjpeg-devel libpng-devel libxslt libX11-devel libXi-devel libXinerama-devel nss-devel pkgconfig xorg-x11-proto-devel zip libffi-devel openssl prelink
$ bash configure \
--with-boot-jdk="/etc/alternatives/java_sdk_1.8.0" \
--with-debug-level="release" \
--disable-zip-debug-info \
--enable-unlimited-crypto \
--with-zlib=system \
--with-giflib=system \
--with-stdc++lib=dynamic \
--with-num-cores=8 \
--with-jvm-variants=zero
$ make \
SCTP_WERROR= \
DEBUG_BINARIES=true \
ENABLE_FULL_DEBUG_SYMBOLS=0 \
POST_STRIP_CMD="" \
DISABLE_INTREE_EC=true \
LOG=trace \
images
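Since the hang is intermittent, one option for unattended builds is to run
the make step under a deadline so that a stuck build at least shows up as a
failure. This is only a sketch: run_with_deadline is a made-up name, it
relies on the coreutils timeout command, and I have not verified that it can
clean up after a process that is stuck in the kernel.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command under coreutils `timeout` and report
# when the deadline was hit. `timeout` exits with status 124 in that case.
run_with_deadline() {
    local limit=$1
    shift
    local rc=0
    timeout "$limit" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out after $limit: $*" >&2
    fi
    return "$rc"
}

# Intended use (the 3h limit is an arbitrary placeholder):
#   run_with_deadline 3h make LOG=trace images
```

Any other non-zero status is passed through unchanged, so ordinary build
failures can still be told apart from a timeout.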
> We also regularly build and test on SLES 9-11 and RHEL 5-7 on PowerPC
> 6-8 CPUs and I can't remember that we've ever seen the problems you
> describe.
That's good to know.
Thanks!
Severin
>
> On Thu, Nov 6, 2014 at 11:22 AM, Severin Gehwolf <sgehwolf at redhat.com> wrote:
> > Hi Volker,
> >
> > Sorry it took so long to get back to you. The PPC system I use for
> > building isn't available to me all the time :(
> >
> > On Fri, 2014-10-24 at 11:14 +0200, Volker Simonis wrote:
> >> Hi Severin,
> >>
> >> as a quick workaround, you could download the IBM J9 from
> >> http://www.ibm.com/developerworks/java/jdk/beta/ to build a new
> >> version of OpenJDK 8 from scratch. That shouldn't have any problems.
> >>
> >> If that still hangs, could you please provide some Java stack traces
> >> (i.e. by using jstack or by sending the Java process a SIGQUIT
> >> signal).
> >>
> >> Regards,
> >> Volker
> >>
> >>
> >> On Thu, Oct 23, 2014 at 3:45 PM, Severin Gehwolf <sgehwolf at redhat.com> wrote:
> >> > Hi,
> >> >
> >> > I've compiled the OpenJDK Zero variant on PPC 64 (BE) a couple of times.
> >> > Out of a total of say 6 compiles it "hung" twice. I think it got stuck
> >> > somewhere compiling JDK classes.
> >> >
> >> > The boot JDK which I've used for building is OpenJDK 8 with the PPC64
> >> > JIT. The host OS is RHEL-7. I see the following showing up in dmesg
> >> > multiple times:
> >> >
> >> > [188362.213095] CPU: 16 PID: 10678 Comm: java Not tainted 3.10.0-123.el7.ppc64 #1
> >> > [188362.213100] Call Trace:
> >> > [188362.213105] [c00000006297ae40] [c0000000000166e8] .show_stack+0x78/0x320 (unreliable)
> >> > [188362.213113] [c00000006297af00] [c0000000008c6618] .dump_stack+0x28/0x3c
> >> > [188362.213119] [c00000006297af70] [c0000000001aa308] .rcu_check_callbacks+0x488/0x990
> >> > [188362.213126] [c00000006297b0b0] [c0000000000c9334] .update_process_times+0x54/0xa0
> >> > [188362.213132] [c00000006297b140] [c000000000144920] .tick_sched_timer+0x80/0x150
> >> > [188362.213138] [c00000006297b1e0] [c0000000000f177c] .__run_hrtimer+0xac/0x370
> >> > [188362.213144] [c00000006297b280] [c0000000000f28b8] .hrtimer_interrupt+0x138/0x320
> >> > [188362.213150] [c00000006297b390] [c00000000001f130] .timer_interrupt+0x120/0x2f0
> >> > [188362.213156] [c00000006297b440] [c000000000002614] decrementer_common+0x114/0x180
> >> > [188362.213162] --- Exception: 901 at .arch_local_irq_restore+0xf0/0x150
> >> > [188362.213162] LR = .arch_local_irq_restore+0xf0/0x150
> >> > [188362.213169] [c00000006297b730] [c0000000001488a0] .get_futex_key+0x310/0x3d0 (unreliable)
> >> > [188362.213176] [c00000006297b7a0] [c000000000051e14] .get_user_pages_fast+0x2a4/0x440
> >> > [188362.213181] [c00000006297b8a0] [c0000000001486e4] .get_futex_key+0x154/0x3d0
> >> > [188362.213187] [c00000006297b950] [c000000000149460] .futex_wake+0x50/0x1c0
> >> > [188362.213192] [c00000006297ba10] [c00000000014ac84] .do_futex+0x234/0xec0
> >> > [188362.213197] [c00000006297bb50] [c00000000014ba2c] .SyS_futex+0x11c/0x1d0
> >> > [188362.213202] [c00000006297bc20] [c0000000000a8dc0] .mm_release+0x130/0x190
> >> > [188362.213208] [c00000006297bca0] [c0000000000b7000] .do_exit+0x190/0xb60
> >> > [188362.213213] [c00000006297bda0] [c0000000000b7b58] .SyS_exit_group+0x48/0xf0
> >> > [188362.213219] [c00000006297be30] [c000000000009e7c] syscall_exit+0x0/0x7c
> >> >
> >> > PID 10678 was the java (javac) process that got stuck.
> >> >
> >
> > I'm still seeing this problem quite frequently.
> >
> > Boot JDK (ppc64 BE)
> > $ java -version
> > openjdk version "1.8.0_20"
> > OpenJDK Runtime Environment (build 1.8.0_20-b26)
> > OpenJDK 64-Bit Server VM (build 25.20-b23, mixed mode)
> >
> > I'm trying to build the Zero variant with the above boot JDK. Config
> > + tools are:
> > -------------------------------------------------
> > A new configuration has been successfully created in
> > /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release
> > using configure arguments '--with-boot-jdk=/etc/alternatives/java_sdk_1.8.0 --with-debug-level=release --disable-zip-debug-info --enable-unlimited-crypto --with-zlib=system --with-giflib=system --with-stdc++lib=dynamic --with-num-cores=8 --with-jvm-variants=zero'.
> >
> > Configuration summary:
> > * Debug level: release
> > * HS debug level: product
> > * JDK variant: normal
> > * JVM variants: zero
> > * OpenJDK target: OS: linux, CPU architecture: ppc, address length: 64
> >
> > Tools summary:
> > * Boot JDK: openjdk version "1.8.0_20" OpenJDK Runtime Environment (build 1.8.0_20-b26) OpenJDK 64-Bit Server VM (build 25.20-b23, mixed mode) (at /etc/alternatives/java_sdk_1.8.0)
> > * Toolchain: gcc (GNU Compiler Collection)
> > * C Compiler: Version 4.8.2 (at /bin/gcc)
> > * C++ Compiler: Version 4.8.2 (at /bin/g++)
> > -------------------------------------------
> >
> > It gets locked up with:
> > "kernel:BUG: soft lockup - CPU#13 stuck for 22s! [javac:30304]"
> >
> > There seems to be only one defunct javac process (Zombie):
> > $ ps ax | grep java
> > 30280 pts/1 S+ 0:00 /bin/sh -c ( /etc/alternatives/java_sdk_1.8.0/bin/javac -XDignore.symbol.file=true -g -Xlint:all,-deprecation -Werror -implicit:none -sourcepath "/home/openjdk-tester/openjdk9-hs-comp/jdk/make/src/classes" -d /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist @/home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist/_the.BUILD_GENMODULESLIST_batch.tmp && /bin/mv /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist/_the.BUILD_GENMODULESLIST_batch.tmp /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist/_the.BUILD_GENMODULESLIST_batch)
> > 30281 pts/1 S+ 0:00 /bin/sh -c ( /etc/alternatives/java_sdk_1.8.0/bin/javac -XDignore.symbol.file=true -g -Xlint:all,-deprecation -Werror -implicit:none -sourcepath "/home/openjdk-tester/openjdk9-hs-comp/jdk/make/src/classes" -d /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist @/home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist/_the.BUILD_GENMODULESLIST_batch.tmp && /bin/mv /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist/_the.BUILD_GENMODULESLIST_batch.tmp /home/openjdk-tester/openjdk9-hs-comp/build/linux-ppc64-normal-zero-release/make-support/bt_classes_moduleslist/_the.BUILD_GENMODULESLIST_batch)
> > 30282 pts/1 Zl+ 0:05 [javac] <defunct>
> >
> > Not sure what happened with process 30304. Perhaps the system cleaned it
> > up before I could investigate anything further.
> >
> >
> > $ tail -n40 /var/log/messages
> > Nov 6 11:10:09 ibm-p730-06-lp1 kernel: [c0000003c94031e0] [c0000000000f177c] .__run_hrtimer+0xac/0x370
> > Nov 6 11:10:09 ppc-host kernel: [c0000003c9403280] [c0000000000f28b8] .hrtimer_interrupt+0x138/0x320
> > Nov 6 11:10:09 ppc-host kernel: [c0000003c9403390] [c00000000001f130] .timer_interrupt+0x120/0x2f0
> > Nov 6 11:10:09 ppc-host kernel: [c0000003c9403440] [c000000000002614] decrementer_common+0x114/0x180
> > Nov 6 11:10:09 ppc-host kernel: --- Exception: 901 at .arch_local_irq_restore+0xf0/0x150
> > LR = .arch_local_irq_restore+0xf0/0x150
> > Nov 6 11:10:09 ppc-host kernel: [c0000003c9403730] [c0000000001486e4] .get_futex_key+0x154/0x3d0 (unreliable)
> > Nov 6 11:10:09 ppc-host kernel: [c0000003c94037a0] [c000000000051e14] .get_user_pages_fast+0x2a4/0x440
> > Nov 6 11:10:09 ppc-host kernel: [c0000003c94038a0] [c0000000001486e4] .get_futex_key+0x154/0x3d0
> > Nov 6 11:10:09 ppc-host kernel: [c0000003c9403950] [c000000000149460] .futex_wake+0x50/0x1c0
> > Nov 6 11:10:09 ppc-host kernel: [c0000003c9403a10] [c00000000014ac84] .do_futex+0x234/0xec0
> > Nov 6 11:10:09 ppc-host kernel: [c0000003c9403b50] [c00000000014ba2c] .SyS_futex+0x11c/0x1d0
> > Nov 6 11:10:09 ppc-host kernel: [c0000003c9403c20] [c0000000000a8dc0] .mm_release+0x130/0x190
> > Nov 6 11:10:09 ppc-host kernel: [c0000003c9403ca0] [c0000000000b7000] .do_exit+0x190/0xb60
> > Nov 6 11:10:09 ppc-host kernel: [c0000003c9403da0] [c0000000000b7b58] .SyS_exit_group+0x48/0xf0
> > Nov 6 11:10:09 ppc-host kernel: [c0000003c9403e30] [c000000000009e7c] syscall_exit+0x0/0x7c
> > Nov 6 11:13:09 ppc-host kernel: INFO: rcu_sched self-detected stall on CPU { 13} (t=114018 jiffies g=38182 c=38181 q=0)
> > Nov 6 11:13:09 ppc-host kernel: CPU: 13 PID: 30304 Comm: javac Not tainted 3.10.0-123.el7.ppc64 #1
> > Nov 6 11:13:09 ppc-host kernel: Call Trace:
> > Nov 6 11:13:09 ppc-host kernel: [c0000003c9402eb0] [c0000000000166e8] .show_stack+0x78/0x320 (unreliable)
> > Nov 6 11:13:09 ppc-host kernel: [c0000003c9402f70] [c0000000008c6618] .dump_stack+0x28/0x3c
> > Nov 6 11:13:09 ppc-host kernel: [c0000003c9402fe0] [c0000000001aa308] .rcu_check_callbacks+0x488/0x990
> > Nov 6 11:13:09 ppc-host kernel: [c0000003c9403120] [c0000000000c9334] .update_process_times+0x54/0xa0
> > Nov 6 11:13:09 ppc-host kernel: [c0000003c94031b0] [c000000000144920] .tick_sched_timer+0x80/0x150
> > Nov 6 11:13:09 ppc-host kernel: [c0000003c9403250] [c0000000000f177c] .__run_hrtimer+0xac/0x370
> > Nov 6 11:13:09 ppc-host kernel: [c0000003c94032f0] [c0000000000f28b8] .hrtimer_interrupt+0x138/0x320
> > Nov 6 11:13:09 ppc-host kernel: [c0000003c9403400] [c00000000001f130] .timer_interrupt+0x120/0x2f0
> > Nov 6 11:13:09 ppc-host kernel: [c0000003c94034b0] [c000000000002614] decrementer_common+0x114/0x180
> > Nov 6 11:13:09 ppc-host kernel: --- Exception: 901 at .arch_local_irq_restore+0xf0/0x150
> > LR = .arch_local_irq_restore+0xf0/0x150
> > Nov 6 11:13:09 ppc-host kernel: [c0000003c94037a0] [c000000000051f3c] .get_user_pages_fast+0x3cc/0x440 (unreliable)
> > Nov 6 11:13:09 ppc-host kernel: [c0000003c9403810] [c00000000021fa60] .put_compound_page+0x2b0/0x330
> > Nov 6 11:13:09 ppc-host kernel: [c0000003c94038a0] [c0000000001488a0] .get_futex_key+0x310/0x3d0
> > Nov 6 11:13:09 ppc-host kernel: [c0000003c9403950] [c000000000149460] .futex_wake+0x50/0x1c0
> > Nov 6 11:13:09 ppc-host kernel: [c0000003c9403a10] [c00000000014ac84] .do_futex+0x234/0xec0
> > Nov 6 11:13:09 ppc-host kernel: [c0000003c9403b50] [c00000000014ba2c] .SyS_futex+0x11c/0x1d0
> > Nov 6 11:13:09 ppc-host kernel: [c0000003c9403c20] [c0000000000a8dc0] .mm_release+0x130/0x190
> > Nov 6 11:13:09 ppc-host kernel: [c0000003c9403ca0] [c0000000000b7000] .do_exit+0x190/0xb60
> > Nov 6 11:13:09 ppc-host kernel: [c0000003c9403da0] [c0000000000b7b58] .SyS_exit_group+0x48/0xf0
> > Nov 6 11:13:09 ppc-host kernel: [c0000003c9403e30] [c000000000009e7c] syscall_exit+0x0/0x7c
> >
> > Since the stuck javac/java/jar processes are Zombies I cannot run jstack on them.
> >
> > $ ps 30282
> > PID TTY STAT TIME COMMAND
> > 30282 pts/1 Zl+ 0:05 [javac] <defunct>
> > $ jstack 30282
> > 30282: Unable to open socket file: target process not responding or HotSpot VM not loaded
> >
> > Thoughts?
> >
> > Cheers,
> > Severin
> >