G1GC/ JIT compilation bug hunt.

Dawid Weiss dawid.weiss at gmail.com
Tue Aug 20 02:32:24 PDT 2013


One follow-up with respect to those  encoded integers -- I swapped the
correct and incorrect sequence in my last e-mail, sorry. I then also
added a sysout here:

@@ -451,6 +453,7 @@
               termFreq = 1;
             } else {
               termFreq = freq.readVInt();
+              System.err.print("# (tf): " + termFreq + " ");
             }
           }

With this the assertion is no longer hit but it shows where the
difference in numbers comes from:

Correct sequence is:

# (eof)
# (code): 0
# (tf): 3
# (code): 4
# (tf): 3
# (code): 2
# (tf): 5
# (code): 2
# (tf): 2
# (code): 3
# (code): 5
# (code): 2

And the incorrect, assertion-hit run shows:

# (eof)
# (code): 0
# (code): 3
# (code): 4
# (code): 3
# (code): 2
# (code): 5
# (code): 2
# (code): 2
# (code): 3

So there's something odd happening after the first readvint which
returns 0 -- the next number that should be read by:

termFreq = freq.readVInt();

is re-read as the next code.

Dawid


On Tue, Aug 20, 2013 at 11:11 AM, Dawid Weiss <dawid.weiss at gmail.com> wrote:
> Thanks for comments guys.
>
>> It is great that you can build fastdebug VM. I assume that if I give you a
>> patch to try you can also build with it.
>
> Yes, absolutely.
>
>> First, you can run with next options and then send zipped output (related
>> part before the method compilation and optoassembler output, don't use hsdis
>> for that) and hs_err file when test fails:
>
> I did that. Used the options you requested -- a full log of all I did
> is included in the ZIP file here:
> http://www.cs.put.poznan.pl/dweiss/tmp/debug-files.zip
>
> Cat debug-files\debug-log.txt. There are several runs included in that
> ZIP file, in short:
>
> - jdk18-fastdebug, b102, full run (no abort on assertion, explained in
> debug-log.txt):
>   001-no-abort-on-assertion
>   002-no-abort-on-assertion
>
> - jdk18-fastdebug, b102, abort on assertion (includes mem dumps)
>   003-with-abort-on-assertion
>   004-with-abort-on-assertion
>
> - jdk18-fastdebug, hsx tip, abort on assertion (includes mem dumps)
>   005-hsx-tip
>   006-hsx-tip
>
> - jdk18-fastdebug, hsx tip, excluded readvint method inlining (passed
> build, so only compilation logs available).
>   007-excluded-readvint
>
>> Looking on java code in FreqProxTermsWriterPerField::flush() and based on
>> LUCENE-5168 report generated code somehow missed EOF check. Am I right? This
>> is why the Assert is thrown?
>
> It's not the only method it trips on. Depending on the seed the
> problem shows up in different places. For the dumps I included the
> issue seems to be here:
>
>>         } else {
>>           final int code = freq.readVInt();
>>           if (!readTermFreq) {
>>             docID += code;
>>             termFreq = -1;
>
> If I add sysouts as in:
>
> Index: core/src/java/org/apache/lucene/index/FreqProxTermsWriterPerField.java
> ===================================================================
> --- core/src/java/org/apache/lucene/index/FreqProxTermsWriterPerField.java
>      (revision 1512807)
> +++ core/src/java/org/apache/lucene/index/FreqProxTermsWriterPerField.java
>      (working copy)
> @@ -427,6 +427,7 @@
>          //System.out.println("  cycle");
>          final int termFreq;
>          if (freq.eof()) {
> +          System.err.print("# (eof)");
>            if (postings.lastDocCodes[termID] != -1) {
>              // Return last doc
>              docID = postings.lastDocIDs[termID];
> @@ -442,6 +443,7 @@
>            }
>          } else {
>            final int code = freq.readVInt();
> +          System.err.print("# (code): " + code + " ");
>            if (!readTermFreq) {
>              docID += code;
>              termFreq = -1;
>
> then for a constant seed you'll get an identical sequence of values.
> Once the assertion hits though, the sequence deviates. Comparing a
> passing run (excluding readvint) and a failing run I get a lot of
> initial aligned outputs and then a deviation:
>
> (correct run):
> # (eof)
> # (eof)
> # (eof)
> # (code): 0
> # (code): 3
> # (code): 4
> # (code): 3
>
> (incorrect run):
> # (eof)
> # (eof)
> # (eof)
> # (code): 0
> # (code): 4    <- wtf?
> # (code): 2
> # (code): 2
>
> How can a variable encoded integer be misinterpreted from 3 to 4 is
> beyond me, sorry. But it's not an EOF condition I think, at least the
> deviation happens where the original run had was entering (code)
> branch.
>
>> Could you also send latest version of FreqProxTermsWriterPerField.java you
>> are testing?
>
> I'm testing against a fixed branch (the info is included in debug-log.txt).
>
> If there's anything else I can do, let me know. I'm also curious how
> you approach debugging this thing. It may be time-based so I don't
> think it'll reproduce on your machine out of the box, but who knows.
> All I know is that, for example, if you add one more sysout before the
> while loop it suddenly stops reproducing so probably something gets
> compiled differently... Wild :)
>
> Dawid


More information about the hotspot-dev mailing list