JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
David Holmes
david.holmes at oracle.com
Wed Mar 6 10:23:10 UTC 2013
On 6/03/2013 5:55 PM, Dawid Weiss wrote:
>
> Here you go:
> http://pastebin.com/raw.php?i=b2PHLm1e
Thanks. I would have to say this seems to be the suspicious part:
Thread 22 (Thread 0xf20ffb40 (LWP 22939)):
#0 0xf7743430 in __kernel_vsyscall ()
#1 0xf771e96b in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/i386-linux-gnu/libpthread.so.0
#2 0xf6ec849c in os::PlatformEvent::park() () from /var/lib/jenkins/tools/java/32bit/jdk1.8.0-ea-b79/jre/lib/i386/server/libjvm.so
#3 0xf6e98b82 in Monitor::IWait(Thread*, long long) () from /var/lib/jenkins/tools/java/32bit/jdk1.8.0-ea-b79/jre/lib/i386/server/libjvm.so
#4 0xf6e99370 in Monitor::wait(bool, long, bool) () from /var/lib/jenkins/tools/java/32bit/jdk1.8.0-ea-b79/jre/lib/i386/server/libjvm.so
#5 0xf6b5fb16 in SuspendibleThreadSet::join() () from /var/lib/jenkins/tools/java/32bit/jdk1.8.0-ea-b79/jre/lib/i386/server/libjvm.so
#6 0xf6b5ea41 in ConcurrentG1RefineThread::run_young_rs_sampling() () from /var/lib/jenkins/tools/java/32bit/jdk1.8.0-ea-b79/jre/lib/i386/server/libjvm.so
#7 0xf6b5ef91 in ConcurrentG1RefineThread::run() ()
The suspendible thread set logic looks "tricky". Time for the G1 experts
to take over. :)
David
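
For readers following the thread, here is a minimal sketch of the
pattern under discussion: a thread parked in a join() that waits on a
condition, and what happens when the wake-up never arrives. This is a
hypothetical illustration in plain Java, not HotSpot's
SuspendibleThreadSet; the class name ParkedJoinSketch and its methods
are made up for the example, and the timeout exists only so the demo
terminates instead of hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration only; NOT HotSpot's SuspendibleThreadSet. */
public class ParkedJoinSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition resumed = lock.newCondition();
    private boolean suspendRequested = false;

    /** A coordinator (think: a safepoint request) asks joiners to stay out. */
    public void requestSuspend() {
        lock.lock();
        try {
            suspendRequested = true;
        } finally {
            lock.unlock();
        }
    }

    /** Clears the request; the signal is what un-parks waiting joiners. */
    public void resume() {
        lock.lock();
        try {
            suspendRequested = false;
            resumed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /**
     * Waits until no suspend is requested. Returns false if the timeout
     * expires first, i.e. nobody cleared the flag and signalled us.
     */
    public boolean join(long timeoutMillis) throws InterruptedException {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (suspendRequested) {      // loop guards against spurious wake-ups
                if (nanos <= 0) {
                    return false;           // still parked when the timeout ran out
                }
                nanos = resumed.awaitNanos(nanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ParkedJoinSketch set = new ParkedJoinSketch();
        set.requestSuspend();
        System.out.println("joined without resume: " + set.join(200));
        set.resume();
        System.out.println("joined after resume: " + set.join(200));
    }
}
```

With an untimed wait, as in the backtrace above, the first join() would
simply stay parked until some other thread clears the flag and signals
it, so the question to answer is which thread was supposed to do that.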
> Dawid
>
> On Wed, Mar 6, 2013 at 8:52 AM, David Holmes <david.holmes at oracle.com>
> wrote:
>
> If the VM is completely unresponsive then it suggests we are at a
> safepoint.
>
> The GC threads are not "hung" in os::park, they are parked - waiting
> to be notified of something.
>
> The thing is to find out why they are not being woken up.
>
> Can the gdb log be posted somewhere? I don't know if the attachment
> made it to the original posting on hotspot-gc but it's no longer
> available on hotspot-dev.
>
> Thanks,
> David
>
>
> On 6/03/2013 4:07 PM, Krystal Mok wrote:
>
> Hi Uwe,
>
> If you can attach gdb to it, then jstack -m and jstack -F should
> also work; that'll get you the Java stack trace.
> (But it probably doesn't matter in this case, because the hang is
> probably a bug in the VM.)
>
> - Kris
>
> On Wed, Mar 6, 2013 at 5:48 AM, Uwe Schindler
> <uschindler at apache.org> wrote:
>
> Hi,
>
> For a few months we have been extensively testing various preview
> builds of JDK 8 for compatibility with Apache Lucene and
> Solr, so we can find any bugs early and prevent the problems
> we had with the release of Java 7 two years ago. Currently
> we have a Linux (Ubuntu 64bit) Jenkins machine that has
> various JDKs (JDK 6, JDK 7, JDK 8 snapshot, IBM J9, older
> JRockit) installed, choosing a different one with different
> hotspot and garbage collector settings on every run of the
> test suite (which takes approx. 30-45 minutes).
>
> JDK 8 b79 has worked very well on Linux so far; we found some
> strange behavior in earlier versions (maybe compiler errors),
> but none at the moment. There is one configuration that
> constantly and reproducibly hangs in one module that is
> tested: The configuration uses JDK 8 b79 (same for b78), 32
> bit, and G1GC (server or client does not matter). The JVM
> running the tests hangs and becomes unresponsive (jstack or kill -3
> have no effect/cannot connect, standard kill does not stop
> it, only kill -9 actually kills it). It can be reproduced in
> this Lucene module 100% (it hangs always).
>
> I was able to connect with GDB to the JVM and get a stack
> trace on all threads (see attachment, dump.txt). As you can see,
> all G1GC threads seem to hang in a syscall (os::park(), a
> condition wait in the pthread library). Unfortunately that’s
> all I can give you. A Java stack trace is not possible
> because the JVM responds to neither kill -3 nor jstack. With
> all other garbage collectors it passes the test without
> hangs in a few seconds, with 32 bit G1GC it can stand still
> for hours. The 64 bit JVM passes with G1GC, so only the 32
> bit variant is affected. Client or Server VM makes no
> difference.
>
> To reproduce:
> - Use a 32 bit JDK 8 b78 or b79 (tested on Linux 64 bit, but
> this should not matter)
> - Download Lucene Source code (e.g. the snapshot version we
> were testing with:
> https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/)
> - Change to directory lucene/analysis/uima and run:
> ant -Dargs="-server -XX:+UseG1GC"
> -Dtests.multiplier=3 -Dtests.jvms=1 test
> After a while the test framework prints "stalled" messages
> (because the child VM actually running the test no longer
> responds). The PID is also printed. Try to get a stack trace
> or kill it, no response. Only kill -9 helps. Choosing
> another garbage collector in the above command line makes
> the test finish after a few seconds, e.g. -Dargs="-server
> -XX:+UseConcMarkSweepGC"
>
> I posted this bug report directly to the mailing list,
> because with earlier bug reports, there seems to be a problem
> with bugs.sun.com - there is no
> response from any reviewer after several weeks and we were
> able to help to find and fix javadoc and javac-compiler bugs
> early. So I hope you can help with this bug, too.
>
> Uwe
>
> -----
> Uwe Schindler
> uschindler at apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
>
>
>
>