JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
Uwe Schindler
uschindler at apache.org
Wed Mar 6 12:49:07 UTC 2013
Hi Bengt,
That was fast! We are happy that you were able to analyze the bug and will fix it soon. To keep our Jenkins server from getting stuck in the tests, I will disable G1GC until an updated build is installed. Until then we will only test the other garbage collectors with Lucene.
Do you have any idea why this bug does not appear on 64 bit? It might be caused by different GC behavior because the word size is different (the Lucene tests use -Xmx512M, so the heap limit is fixed at the same value for 32 and 64 bit at the moment). I just want to understand this! I can run the test suite with the 64 bit JDK over and over and it never hangs, but with 32 bit it hangs every time.
Uwe
-----
Uwe Schindler
uschindler at apache.org
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/
> -----Original Message-----
> From: hotspot-gc-dev-bounces at openjdk.java.net [mailto:hotspot-gc-dev-
> bounces at openjdk.java.net] On Behalf Of Bengt Rutisson
> Sent: Wednesday, March 06, 2013 1:08 PM
> To: hotspot-gc-dev at openjdk.java.net; David Holmes; Dawid Weiss; hotspot-
> dev at openjdk.java.net
> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
>
>
> Hi all,
>
> I sent this email earlier, but I did "reply list" instead of "reply all". Sorry about
> that.
>
> The hang is due to the fact that we are using single threaded reference
> processing but end up in the multi threaded code path and get stuck in a loop
> that waits for the other processing threads to terminate.
>
> John Cuthbertson is working on a fix for this. I think we have all the
> information we need to solve this.
>
> Bengt
>
> On 3/6/13 9:04 AM, Bengt Rutisson wrote:
> >
> > David,
> >
> > I think this is a VM bug and the thread dumps that Uwe produced are
> > enough to start tracking down the root cause.
> >
> > On 3/6/13 8:52 AM, David Holmes wrote:
> >> If the VM is completely unresponsive then it suggests we are at a
> >> safepoint.
> > Yes, we are hanging during a stop-the-world GC, so we are at a safepoint.
> >
> >>
> >> The GC threads are not "hung" in os::park, they are parked - waiting
> >> to be notified of something.
> >
> > It looks like the reference processing thread is stuck in a loop where
> > it does wait(). So, the VM is hanging even if that stack trace also
> > ends up in os::park().
> >
> >>
> >> The thing is to find out why they are not being woken up.
> >
> > Actually, in this case we should probably not even be calling wait...
> >
> >>
> >> Can the gdb log be posted somewhere? I don't know if the attachment
> >> made it to the original posting on hotspot-gc but it's no longer
> >> available on hotspot-dev.
> >
> > I received the attachment with the original email. I've attached it to
> > the bug report that I created: 8009536. You can find it there if you
> > want to. But I think we have a fairly good idea of what change caused
> > the hang.
> >
> > Bengt
> >
> >>
> >> Thanks,
> >> David
> >>
> >> On 6/03/2013 4:07 PM, Krystal Mok wrote:
> >>> Hi Uwe,
> >>>
> >>> If you can attach gdb to it, then jstack -m and jstack -F should
> >>> also work; that'll get you the Java stack trace.
> >>> (But it probably doesn't matter in this case, because the hang is
> >>> probably a bug in the VM.)
> >>>
> >>> - Kris
> >>>
> >>> On Wed, Mar 6, 2013 at 5:48 AM, Uwe Schindler
> >>> <uschindler at apache.org> wrote:
> >>>> Hi,
> >>>>
> >>>> For a few months now we have been extensively testing various preview builds
> >>>> of JDK 8 for compatibility with Apache Lucene and Solr, so we can
> >>>> find any bugs early and prevent the problems we had with the
> >>>> release of Java 7 two years ago. Currently we have a Linux (Ubuntu
> >>>> 64bit) Jenkins machine that has various JDKs (JDK 6, JDK 7, JDK 8
> >>>> snapshot, IBM J9, older JRockit) installed, choosing a different
> >>>> one with different hotspot and garbage collector settings on every
> >>>> run of the test suite (which takes approx. 30-45 minutes).
> >>>>
> >>>> JDK 8 b79 works very well on Linux so far. We found some strange
> >>>> behavior in early versions (maybe compiler errors), but none at
> >>>> the moment. There is one configuration that consistently and
> >>>> reproducibly hangs in one of the tested modules: JDK 8 b79 (same
> >>>> for b78), 32 bit, with G1GC (server or client does not matter).
> >>>> The JVM running the tests becomes completely unresponsive (jstack
> >>>> and kill -3 have no effect or cannot connect, a standard kill does
> >>>> not stop it, only kill -9 actually kills it). In this Lucene
> >>>> module the hang is 100% reproducible.
> >>>>
> >>>> I was able to connect with GDB to the JVM and get a stack trace on
> >>>> all threads (see attachment, dump.txt). As you can see, all G1GC
> >>>> threads seem to hang in a syscall (os::park(), a condition wait in
> >>>> the pthread library). Unfortunately that's all I can give you. A
> >>>> Java stack trace is not possible because the JVM reacts to neither
> >>>> kill -3 nor jstack. With all other garbage collectors it passes the test
> >>>> without hangs in a few seconds, with 32 bit G1GC it can stand still
> >>>> for hours. The 64 bit JVM passes with G1GC, so only the 32 bit
> >>>> variant is affected. Client or Server VM makes no difference.
> >>>>
> >>>> To reproduce:
> >>>> - Use a 32 bit JDK 8 b78 or b79 (tested on Linux 64 bit, but this
> >>>> should not matter)
> >>>> - Download the Lucene source code (e.g. the snapshot version we were
> >>>> testing with:
> >>>> https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/)
> >>>> - Change to the directory lucene/analysis/uima and run:
> >>>> ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
> >>>> After a while the test framework prints "stalled" messages (because
> >>>> the child VM actually running the test no longer responds). The PID
> >>>> is also printed. Attempts to get a stack trace or to kill it get no response.
> >>>> Only kill -9 helps. Choosing another garbage collector in the above
> >>>> command line makes the test finish after a few seconds, e.g.
> >>>> -Dargs="-server -XX:+UseConcMarkSweepGC"
> >>>>
> >>>> I am posting this bug report directly to the mailing list because,
> >>>> with earlier bug reports, there seemed to be a problem with
> >>>> bugs.sun.com: there was no response from any reviewer after several
> >>>> weeks. In the past we were able to help find and fix javadoc and
> >>>> javac compiler bugs early, so I hope you can help with this bug, too.
> >>>>
> >>>> Uwe
> >>>>
> >>>> -----
> >>>> Uwe Schindler
> >>>> uschindler at apache.org
> >>>> Apache Lucene PMC Member / Committer
> >>>> Bremen, Germany
> >>>> http://lucene.apache.org/
> >>>>
> >>>>
> >
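
Editorial illustration (not part of the original messages): Bengt's explanation above, that single-threaded reference processing ended up in the multi-threaded code path and waited for other processing threads to terminate, can be pictured with a small sketch. This is not HotSpot code. The names TerminationSketch, Terminator, offerTermination and runWorkers are hypothetical, and a timeout is added only so that the example returns instead of hanging the way the real VM did.

```java
import java.util.concurrent.TimeUnit;

// Illustration only: a hypothetical termination protocol, not HotSpot code.
final class TerminationSketch {

    /** Workers that finish call offerTermination() and wait for their peers. */
    static final class Terminator {
        private final int expectedWorkers;
        private int offered = 0;

        Terminator(int expectedWorkers) {
            this.expectedWorkers = expectedWorkers;
        }

        /**
         * Returns true once all expected workers have arrived, or false if
         * the timeout expires first. Without the timeout, a worker that never
         * shows up would leave the caller waiting forever.
         */
        synchronized boolean offerTermination(long timeoutMillis) throws InterruptedException {
            offered++;
            notifyAll(); // wake peers that are already waiting
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (offered < expectedWorkers) {
                long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMillis <= 0) {
                    return false; // gave up: some expected worker never arrived
                }
                wait(remainingMillis);
            }
            return true;
        }
    }

    /** Runs actualWorkers threads against a protocol sized for expectedWorkers. */
    static boolean runWorkers(int expectedWorkers, int actualWorkers, long timeoutMillis) {
        Terminator terminator = new Terminator(expectedWorkers);
        Thread[] helpers = new Thread[actualWorkers - 1];
        try {
            for (int i = 0; i < helpers.length; i++) {
                helpers[i] = new Thread(() -> {
                    try {
                        terminator.offerTermination(timeoutMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                helpers[i].start();
            }
            // The calling thread is itself one of the workers.
            boolean terminated = terminator.offerTermination(timeoutMillis);
            for (Thread helper : helpers) {
                helper.join();
            }
            return terminated;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("1 expected, 1 running: " + runWorkers(1, 1, 200));
        System.out.println("4 expected, 4 running: " + runWorkers(4, 4, 2000));
        System.out.println("4 expected, 1 running: " + runWorkers(4, 1, 200));
    }
}
```

The last call models the mismatch: one thread runs a protocol that expects four participants, so the wait can only end through the timeout. In a loop without a timeout the thread would stay parked indefinitely, which matches the parked threads seen in the gdb dump.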