JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
Bengt Rutisson
bengt.rutisson at oracle.com
Wed Mar 6 08:04:24 UTC 2013
David,
I think this is a VM bug and the thread dumps that Uwe produced are
enough to start tracking down the root cause.
On 3/6/13 8:52 AM, David Holmes wrote:
> If the VM is completely unresponsive then it suggests we are at a
> safepoint.
Yes, we are hanging during a stop-the-world GC, so we are at a safepoint.
>
> The GC threads are not "hung" in os::park(); they are parked, waiting
> to be notified of something.
It looks like the reference processing thread is stuck in a loop where
it calls wait(). So the VM really is hanging, even though that stack trace
also ends up in os::park().
>
> The thing is to find out why they are not being woken up.
Actually, in this case we should probably not even be calling wait...
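The distinction between a "hung" thread and a parked one can be illustrated at the Java level. The sketch below uses a hypothetical `GuardedWaitDemo` class and is not HotSpot's C++ reference-processing code. A waiter sits in a guarded `wait()` loop and reports the state `WAITING` (parked, not spinning) until another thread makes the condition true and calls `notifyAll()`. If that notification never arrives, or the condition can never become true, the waiter stays parked indefinitely. That is the general shape of the hang discussed in this thread.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of a guarded wait loop; not HotSpot code.
public class GuardedWaitDemo {
    private final Object lock = new Object();
    private boolean done = false; // guarded by lock

    // Parks the calling thread until another thread sets done.
    void awaitDone() throws InterruptedException {
        synchronized (lock) {
            while (!done) {   // re-check the condition after every wakeup
                lock.wait();  // releases lock; thread state becomes WAITING
            }
        }
    }

    // Makes the condition true and wakes all waiters.
    void signalDone() {
        synchronized (lock) {
            done = true;
            lock.notifyAll();
        }
    }

    public static void main(String[] args) throws Exception {
        GuardedWaitDemo demo = new GuardedWaitDemo();
        Thread waiter = new Thread(() -> {
            try {
                demo.awaitDone();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "waiter");
        waiter.start();

        // Poll until the waiter has actually parked inside wait().
        while (waiter.getState() != Thread.State.WAITING) {
            Thread.sleep(10);
        }
        System.out.println("waiter state before signal: " + waiter.getState());

        // Without this call the waiter would stay parked forever.
        demo.signalDone();
        waiter.join(TimeUnit.SECONDS.toMillis(5));
        System.out.println("waiter alive after signal: " + waiter.isAlive());
    }
}
```

Running `main` prints `WAITING` for the parked waiter, and then `false` once `signalDone()` has released it. Removing the `signalDone()` call leaves the waiter parked indefinitely.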
>
> Can the gdb log be posted somewhere? I don't know if the attachment
> made it to the original posting on hotspot-gc but it's no longer
> available on hotspot-dev.
I received the attachment with the original email. I've attached it to
the bug report that I created: 8009536. You can find it there if you
want to. But I think we have a fairly good idea of what change caused
the hang.
Bengt
>
> Thanks,
> David
>
> On 6/03/2013 4:07 PM, Krystal Mok wrote:
>> Hi Uwe,
>>
>> If you can attach gdb to it, then jstack -m and jstack -F should also
>> work; that will get you the Java stack trace.
>> (But it probably doesn't matter in this case, because the hang is
>> probably a bug in the VM.)
>>
>> - Kris
>>
>> On Wed, Mar 6, 2013 at 5:48 AM, Uwe Schindler <uschindler at apache.org>
>> wrote:
>>> Hi,
>>>
>>> For a few months we have been extensively testing various preview
>>> builds of JDK 8 for compatibility with Apache Lucene and Solr, so we
>>> can find bugs early and prevent the problems we had with the release
>>> of Java 7 two years ago. Currently we have a Linux (Ubuntu 64 bit)
>>> Jenkins machine with various JDKs installed (JDK 6, JDK 7, JDK 8
>>> snapshot, IBM J9, older JRockit); on every run of the test suite
>>> (which takes approx. 30-45 minutes) it picks a different one with
>>> different HotSpot and garbage collector settings.
>>>
>>> JDK 8 b79 works very well on Linux so far; we found some strange
>>> behavior in early builds (maybe compiler errors), but none at the
>>> moment. There is one configuration that constantly and reproducibly
>>> hangs in one of the tested modules: JDK 8 b79 (b78 is the same),
>>> 32 bit, with G1GC (server or client VM does not matter). The JVM
>>> running the tests hangs and becomes completely unresponsive (jstack
>>> or kill -3 have no effect or cannot connect, a standard kill does
>>> not stop it, only kill -9 actually kills it). The hang reproduces
>>> 100% of the time in this Lucene module.
>>>
>>> I was able to connect to the JVM with GDB and get a stack trace of
>>> all threads (see attachment, dump.txt). As you can see, all G1GC
>>> threads seem to hang in a syscall (os::park(), a condition variable
>>> wait in the pthread library). Unfortunately that's all I can give
>>> you. A Java stack trace is not possible because the JVM responds to
>>> neither kill -3 nor jstack. With all other garbage collectors the
>>> test passes without hangs in a few seconds; with 32 bit G1GC it can
>>> stand still for hours. The 64 bit JVM passes with G1GC, so only the
>>> 32 bit variant is affected. Client or Server VM makes no difference.
>>>
>>> To reproduce:
>>> - Use a 32 bit JDK 8 b78 or b79 (tested on Linux 64 bit, but this
>>> should not matter)
>>> - Download the Lucene source code (e.g. the snapshot version we were
>>> testing with:
>>> https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/)
>>> - Change to the directory lucene/analysis/uima and run:
>>> ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
>>> After a while the test framework prints "stalled" messages (because
>>> the child VM actually running the test no longer responds). The PID
>>> is also printed. Trying to get a stack trace or to kill it gets no
>>> response; only kill -9 helps. Choosing another garbage collector in
>>> the above command line makes the test finish after a few seconds,
>>> e.g. -Dargs="-server -XX:+UseConcMarkSweepGC"
>>>
>>> I posted this bug report directly to the mailing list because there
>>> seems to be a problem with bugs.sun.com: earlier bug reports got no
>>> response from any reviewer for several weeks, whereas via the
>>> mailing lists we were able to help find and fix javadoc and javac
>>> compiler bugs early. So I hope you can help with this bug, too.
>>>
>>> Uwe
>>>
>>> -----
>>> Uwe Schindler
>>> uschindler at apache.org
>>> Apache Lucene PMC Member / Committer
>>> Bremen, Germany
>>> http://lucene.apache.org/
>>>
>>>