G1GC/ JIT compilation bug hunt.

Uwe Schindler uschindler at apache.org
Wed Aug 14 05:48:16 PDT 2013


Hi Dawid,

To find the bad optimization, we could also use the approach I showed in my talks about the bug hunting we did at the time of the famous Java 7 Porter Stemmer bug:

Most of us (Java coders, not HotSpot developers) don't have a debug build of the JDK with those special libraries installed. Instead, you can switch off optimization for the methods you suspect of causing the problem, using "-XX:CompileCommand=exclude,your/package/Class,method". Once the bug no longer happens, you have found the badly compiled code (see the PDF at http://berlinbuzzwords.de/sessions/testing-lucene-and-solr-various-jvms-bugs-bugs-bugs). In a second step, you can then request assembly output for just the broken method.
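
A minimal sketch of that exclusion step as a shell helper follows. It only prints one candidate JVM command line per suspect method and does not start anything. The function names exclude_flag and print_runs, the TEST_CMD variable and the "Class:method" pair notation are assumptions made for illustration; the -server -XX:+UseG1GC flags are the ones from the failing configuration quoted below.

```shell
#!/bin/sh
# Build the exclude flag for one method, in the form quoted above.
# $1 = class in slash notation, $2 = method name.
exclude_flag() {
    printf '%s' "-XX:CompileCommand=exclude,$1,$2"
}

# Print one JVM command line per suspect, each excluding a single method.
# Arguments are hypothetical "your/package/Class:method" pairs.
# TEST_CMD is a placeholder for whatever starts the failing test.
print_runs() {
    for suspect in "$@"; do
        class=${suspect%%:*}     # text before the first colon
        method=${suspect#*:}     # text after the first colon
        printf '%s\n' "java -server -XX:+UseG1GC $(exclude_flag "$class" "$method") ${TEST_CMD:-}"
    done
}
```

Since the failure does not show up on every run, each printed command line would probably have to be run several times before concluding that excluding a given method makes the bug go away.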

Uwe

-----
Uwe Schindler
uschindler at apache.org 
Apache Lucene PMC Chair / Committer
Bremen, Germany
http://lucene.apache.org/

> -----Original Message-----
> From: hotspot-dev-bounces at openjdk.java.net [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Dawid Weiss
> Sent: Wednesday, August 14, 2013 2:39 PM
> To: Mikael Gerdin
> Cc: hotspot-dev
> Subject: Re: G1GC/ JIT compilation bug hunt.
> 
> Thanks Mikael. Limiting the assembly output is a good hint, I'll try to
> reproduce it and see what happens.
> 
> Dawid
> 
> On Wed, Aug 14, 2013 at 8:58 AM, Mikael Gerdin
> <mikael.gerdin at oracle.com> wrote:
> > Hi Dawid,
> >
> >
> > On 2013-08-14 08:27, Dawid Weiss wrote:
> >>
> >> Hi everyone,
> >>
> >> I am a committer to the Lucene/Solr project. We've recently hit what
> >> we believe is a JIT/GC bug -- it manifests itself only when G1GC is
> >> used, on a 32-bit VM:
> >>
> >> Using Java: 32bit/jdk1.8.0-ea-b102 -server -XX:+UseG1GC
> >> Java: 32bit/jdk1.7.0_25 -server -XX:+UseG1GC
> >>
> >> Here is the Lucene issue where more information is available:
> >> https://issues.apache.org/jira/browse/LUCENE-5168
> >>
> >> In essence, the problem is that the code hits an assertion (in
> >> Java) which it should never reach. There used to be a problem with
> >> our implementation of readByte which tripped C2, but this was patched
> >> by an alternate implementation a while back -- see here, line 97
> >> (inside readVInt):
> >>
> >> http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/lucene/core/src/java/org/apache/lucene/store/DataInput.java
> >>
> >> This time it seems to be something else and is *not* easily
> >> reproducible on a smaller example (it's not even reproducible on that
> >> particular test all the time).
> >>
> >> Is there anything you can think of that we can do and which would
> >> help you in narrowing down what the problem might be? I initially
> >> thought to pass -XX:+PrintCompilation -XX:+PrintAssembly but this
> >> will result in a huge log as this happens some time in the middle of
> >> a test run (and not always). If there's a shorter route I'd be happy
> >> to use it.
> >
> >
> > If you have a guess about which method is mis-compiled you can try
> > with -XX:CompileCommand="print org/apache/foo::method"
> > This enables +PrintNMethods on a per-method basis.
> >
> > If you suspect several methods you can use CompileCommandFile and
> > create a text file with several "print" commands.
> >
> > You also need to compile the hsdis disassembler library and place it
> > in the jre/lib/i386 directory to get the actual output from
> > +Print{NMethods,Assembly}.
> >
> > /Mikael
> >
> >>
> >> Dawid
> >>
> >