G1GC/ JIT compilation bug hunt.

Mon Sep 2 02:17:20 PDT 2013

I've traced the assembly on a running instance. The problem is quite
interesting -- the code in question is purely computational, I still
don't see how G1GC can be of importance here. In short what's
happening is a method gets inlined in a larger loop more than once and
a field access is placed on the esp spill. The problem is both of
these inlines use a different spill slot and one of the updates is not
properly propagated -- it is saved to one spill slot, then the code
execution flow returns to the top of the loop and another inline reads
a stale value from another spill slot. When you disable escape
analysis everything works as expected.

The crucial block is:

7e25   B1020: # B405 B1021 <- B1017 B1019  Freq: 0.00811184   // XXX
jump from b1017
7e25   MOV    EBX,[ESP + #136]                  // ebx = 80e
7e2c   MOV    ECX,EBX
7e2e   INC    ECX                               // ecx = 80f, increment.
7e2f   MOV    [ESP + #144],ECX                  // this is the
incremented spill but it never gets propagated to [esp + #136] or
re-read from memory.
7e36   MOV    EBX,[ESP + #132]
7e3d   MOV    [EBX + #12],ECX ! Field:
org/apache/lucene/index/ByteSliceReader.upto
7e40   MOV    EBX,[ESP + #152]
7e47   TEST   EBX,EBX
7e49   Jne    B405  P=0.999999 C=-1.000000      // taken, null check

The incremented local value of upto gets saved to [ESP + #144] and to
memory but upon reiterating the loop it is re-read from [ESP + #136]
so it looks like a dependency is missed somewhere in the graph causing
escape analysis to fail.

Ehm, obviously I have no idea how or where to fix this in the JVM code :)

Full optoassembly dump and annotated opto assembly analysis is here:
http://www.cs.put.poznan.pl/dweiss/tmp/hotspot-bug-analysis.zip

Let me know if I can do anything else.

Dawid

On Tue, Aug 27, 2013 at 9:15 AM, Dawid Weiss <dawid.weiss at gmail.com> wrote:
> Thanks Vladimir. I had the impression I may be looking at the
> interpreter. Just for completeness, I also thought/tried the following
> ways to stop at the jitted code:
>
> 1) call Windows debugger API endpoints from kernel32 --
>
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms679297(v=vs.85).aspx
>
> via JNI/ JNA. This worked surprisingly well -- you can even wait for
> the debugger to attached to the code in a loop, works like a charm. :)
> Although once you break into the debugger I again couldn't make sense
> of the stack -- probably the infrequent call causes a deopt again and
> we're back in the interpreter.
>
> 2) use Unsafe to force a null pointer dereference :) This should send
> a debugger signal, right? I didn't try it though, didn't have the time
> yet. I suspect it may be moved to an infrequent trap too and we're
> back where we started, eh.
>
>> I still did not have time to look on your logs.
>
> No worries, it's been a very very educational experience for me.
> Besides, I used to write tons of assembly (not on x86) and it actually
> feels very good to be back at the assembly mnemonic level ;)
>
> Don't spoil my weekend pet project :) And seriously -- if you want to
> check out those logs let me know in advance and I'll prepare an update
> with hsx-tip if needed.
>
> Dawid
>
> On Tue, Aug 27, 2013 at 3:07 AM, Vladimir Kozlov
> <vladimir.kozlov at oracle.com> wrote:
>> Dawid,
>>
>> Let it crush with AbortVMOnException (without ShowMessageBoxOnError) and
>> look on call stack in hs_err_pidxxx.log file. Jited code usually have
>> uncommon trap for exceptions so the method most likely was deoptimized and
>> you will see interpeter frames on call stack instead of compiled/jited.
>> There should be deoptimization event listed down in hs_err_pidxxx.log which
>> may say in which address in jited code it hit uncommon trap. You can look on
>> it in optoassembly output. Then during method compilation
>> (-XX:+PrintCompilation -Xbatch -XX:CICompilerCount=1 should give you info
>> when the method is compiled, you can verify it in Compile() by
>> this->method()->print()) in debugger stop in ciEnv::register_method() after
>> call to nmethod::new_nmethod(). nm->insts_begin() shows where instructions
>> start. Add instruction offset from optoassembly and set breakepoint at that
>> address. This way you can stop in jited code.
>>
>> G1 affects inlining since its write barriers code size is larger then other
>> collectors (it has pre and post barriers for oop stores).
>>
>> I still did not have time to look on your logs.
>>
>> Vladimir
>>
>>
>> On 8/26/13 12:56 PM, Dawid Weiss wrote:
>>>>
>>>> An ideal scenario and what I want to achieve is I'd like to stop on an
>>>> assertion somewhere in the jitted code and go back to it to follow the
>>>> execution (so that I can see what code is responsible for the issing
>>>
>>> increment). I'll need to sync this up (manually or otherwise) with the
>>> optoassembly too [...]
>>>
>>> Would somebody be so kind and provide some light on whether the above
>>> is feasible? I'm getting up to speed with visual studio -- haven't
>>> used assembly-level debugging in a longer while -- but I'm having
>>> troubles connecting the dots with verbose optoassembly output (the
>>> addresses don't seem to match). Thanks and again, apologies if it's
>>> something trivial but I couldn't find any helpful resources on the
>>> net.
>>>
>>> Dawid
>>>
>>> On Sun, Aug 25, 2013 at 9:43 PM, Dawid Weiss <dawid.weiss at gmail.com>
>>> wrote:
>>>>
>>>> I know why the assertion is hit. I still don't know (and probably
>>>> won't figure it out) why it is so difficult to reproduce or which
>>>> optimization leads to it.
>>>>
>>>> In short, the following code in ByteSliceReader#readByte:
>>>>
>>>>    public byte readByte() {
>>>>      assert !eof();
>>>>      assert upto <= limit;
>>>>      if (upto == limit) {
>>>>        nextSlice();
>>>>      }
>>>>      return buffer[upto++];
>>>>    }
>>>>
>>>> is inlined in a number of places. If I include:
>>>>
>>>>    public byte readByte() {
>>>>      assert !eof();
>>>>      assert upto <= limit;
>>>>      if (upto == limit) {
>>>>        nextSlice();
>>>>        baz = upto;
>>>>      }
>>>>      return buffer[upto++];
>>>>    }
>>>>    static int baz;
>>>>
>>>> I see a number of assignments to 'baz'. This is because methods like
>>>> readVInt fork out many conditional branches, as in:
>>>>
>>>>    public int readVInt() throws IOException {
>>>>      byte b = readByte();
>>>>      if (b >= 0) return b;
>>>>      int i = b & 0x7F;
>>>>      b = readByte();
>>>>      i |= (b & 0x7F) << 7;
>>>>      if (b >= 0) return i;
>>>>      b = readByte();
>>>>      i |= (b & 0x7F) << 14;
>>>>      if (b >= 0) return i;
>>>>      b = readByte();
>>>>      i |= (b & 0x7F) << 21;
>>>>      if (b >= 0) return i;
>>>>      b = readByte();
>>>>      // Warning: the next ands use 0x0F / 0xF0 - beware copy/paste
>>>> errors:
>>>>      i |= (b & 0x0F) << 28;
>>>>      if ((b & 0xF0) == 0) return i;
>>>>      throw new IOException("Invalid vInt detected (too many bits)");
>>>>    }
>>>>
>>>> One of these executions (when upto == limit, and possibly with some
>>>> other upper condition) leads to missing increment on buffer[upto++] --
>>>> upto remains unmodified (but the value at buffer[upto] is returned
>>>> properly). Because upto is moved to a register in many of those
>>>> inlined methods I suspect either the register may be incremented and
>>>> not saved or the increment may be removed due to some local escape
>>>> analysis magic (that has to be incorrect, obviously). How G1GC
>>>> interacts with all this (remember, it only happens with G1GC) I have
>>>> no clue.
>>>>
>>>> Anyway, I've tried looking at -XX:+PrintEscapeAnalysis but I couldn't
>>>> make much sense of it. I also dug into some of the opto assembly
>>>> branches trying to map them back into the original bytecode, but it's
>>>> a very long process -- had to map PcDesc manually to the assembly
>>>> offset, then go to bytecode again. Then I fired visual studio and
>>>> halted it like Vladimir mentioned:
>>>> -XX:AbortVMOnException=java.lang.AssertionError
>>>> -XX:+ShowMessageBoxOnError
>>>>
>>>> Once I attach to the halted JVM I "break all" and I see the reported
>>>> thread's stack trace halted inside ntdll (with the closest JVM frame
>>>> stopped at get_deopt_original_pc). The frames below it don't seem to
>>>> make much sense to me -- they don't seem to point to any addresses
>>>> reported in the compilation log (PcDesc clauses). I'm a complete
>>>> newbie to this, so I apologize if it's a lame question, but how can I
>>>> find out which part of the original jitted code this assertion was
>>>> invoked from?
>>>>
>>>> An ideal scenario and what I want to achieve is I'd like to stop on an
>>>> assertion somewhere in the jitted code and go back to it to follow the
>>>> execution (so that I can see what code is responsible for the missing
>>>> increment). I'll need to sync this up (manually or otherwise) with the
>>>> optoassembly too, otherwise I'll probably lose any remaining hair
>>>> trying to figure out what's what :)
>>>>
>>>> A hint (or any other pointer; RTFM) would be very welcome.
>>>>
>>>> Dawid
>>>>
>>>>
>>>> On Tue, Aug 20, 2013 at 11:32 AM, Dawid Weiss <dawid.weiss at gmail.com>
>>>> wrote:
>>>>>
>>>>> One follow-up with respect to those  encoded integers -- I swapped the
>>>>> correct and incorrect sequence in my last e-mail, sorry. I then also
>>>>> added a sysout here:
>>>>>
>>>>> @@ -451,6 +453,7 @@
>>>>>                 termFreq = 1;
>>>>>               } else {
>>>>>                 termFreq = freq.readVInt();
>>>>> +              System.err.print("# (tf): " + termFreq + " ");
>>>>>               }
>>>>>             }
>>>>>
>>>>> With this the assertion is no longer hit but it shows where the
>>>>> difference in numbers comes from:
>>>>>
>>>>> Correct sequence is:
>>>>>
>>>>> # (eof)
>>>>> # (code): 0
>>>>> # (tf): 3
>>>>> # (code): 4
>>>>> # (tf): 3
>>>>> # (code): 2
>>>>> # (tf): 5
>>>>> # (code): 2
>>>>> # (tf): 2
>>>>> # (code): 3
>>>>> # (code): 5
>>>>> # (code): 2
>>>>>
>>>>> And the incorrect, assertion-hit run shows:
>>>>>
>>>>> # (eof)
>>>>> # (code): 0
>>>>> # (code): 3
>>>>> # (code): 4
>>>>> # (code): 3
>>>>> # (code): 2
>>>>> # (code): 5
>>>>> # (code): 2
>>>>> # (code): 2
>>>>> # (code): 3
>>>>>
>>>>> So there's something odd happening after the first readvint which
>>>>> returns 0 -- the next number that should be read by:
>>>>>
>>>>> termFreq = freq.readVInt();
>>>>>
>>>>> is re-read as the next code.
>>>>>
>>>>> Dawid
>>>>>
>>>>>
>>>>> On Tue, Aug 20, 2013 at 11:11 AM, Dawid Weiss <dawid.weiss at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Thanks for comments guys.
>>>>>>
>>>>>>> It is great that you can build fastdebug VM. I assume that if I give
>>>>>>> you a
>>>>>>> patch to try you can also build with it.
>>>>>>
>>>>>>
>>>>>> Yes, absolutely.
>>>>>>
>>>>>>> First, you can run with next options and then send zipped output
>>>>>>> (related
>>>>>>> part before the method compilation and optoassembler output, don't use
>>>>>>> hsdis
>>>>>>> for that) and hs_err file when test fails:
>>>>>>
>>>>>>
>>>>>> I did that. Used the options you requested -- a full log of all I did
>>>>>> is included in the ZIP file here:
>>>>>> http://www.cs.put.poznan.pl/dweiss/tmp/debug-files.zip
>>>>>>
>>>>>> Cat debug-files\debug-log.txt. There are several runs included in that
>>>>>> ZIP file, in short:
>>>>>>
>>>>>> - jdk18-fastdebug, b102, full run (no abort on assertion, explained in
>>>>>> debug-log.txt):
>>>>>>    001-no-abort-on-assertion
>>>>>>    002-no-abort-on-assertion
>>>>>>
>>>>>> - jdk18-fastdebug, b102, abort on assertion (includes mem dumps)
>>>>>>    003-with-abort-on-assertion
>>>>>>    004-with-abort-on-assertion
>>>>>>
>>>>>> - jdk18-fastdebug, hsx tip, abort on assertion (includes mem dumps)
>>>>>>    005-hsx-tip
>>>>>>    006-hsx-tip
>>>>>>
>>>>>> - jdk18-fastdebug, hsx tip, excluded readvint method inlining (passed
>>>>>> build, so only compilation logs available).
>>>>>>    007-excluded-readvint
>>>>>>
>>>>>>> Looking on java code in FreqProxTermsWriterPerField::flush() and based
>>>>>>> on
>>>>>>> LUCENE-5168 report generated code somehow missed EOF check. Am I
>>>>>>> right? This
>>>>>>> is why the Assert is thrown?
>>>>>>
>>>>>>
>>>>>> It's not the only method it trips on. Depending on the seed the
>>>>>> problem shows up in different places. For the dumps I included the
>>>>>> issue seems to be here:
>>>>>>
>>>>>>>          } else {
>>>>>>>            final int code = freq.readVInt();
>>>>>>>            if (!readTermFreq) {
>>>>>>>              docID += code;
>>>>>>>              termFreq = -1;
>>>>>>
>>>>>>
>>>>>> If I add sysouts as in:
>>>>>>
>>>>>> Index:
>>>>>> core/src/java/org/apache/lucene/index/FreqProxTermsWriterPerField.java
>>>>>> ===================================================================
>>>>>> ---
>>>>>> core/src/java/org/apache/lucene/index/FreqProxTermsWriterPerField.java
>>>>>>       (revision 1512807)
>>>>>> +++
>>>>>> core/src/java/org/apache/lucene/index/FreqProxTermsWriterPerField.java
>>>>>>       (working copy)
>>>>>> @@ -427,6 +427,7 @@
>>>>>>           //System.out.println("  cycle");
>>>>>>           final int termFreq;
>>>>>>           if (freq.eof()) {
>>>>>> +          System.err.print("# (eof)");
>>>>>>             if (postings.lastDocCodes[termID] != -1) {
>>>>>>               // Return last doc
>>>>>>               docID = postings.lastDocIDs[termID];
>>>>>> @@ -442,6 +443,7 @@
>>>>>>             }
>>>>>>           } else {
>>>>>>             final int code = freq.readVInt();
>>>>>> +          System.err.print("# (code): " + code + " ");
>>>>>>             if (!readTermFreq) {
>>>>>>               docID += code;
>>>>>>               termFreq = -1;
>>>>>>
>>>>>> then for a constant seed you'll get an identical sequence of values.
>>>>>> Once the assertion hits though, the sequence deviates. Comparing a
>>>>>> passing run (excluding readvint) and a failing run I get a lot of
>>>>>> initial aligned outputs and then a deviation:
>>>>>>
>>>>>> (correct run):
>>>>>> # (eof)
>>>>>> # (eof)
>>>>>> # (eof)
>>>>>> # (code): 0
>>>>>> # (code): 3
>>>>>> # (code): 4
>>>>>> # (code): 3
>>>>>>
>>>>>> (incorrect run):
>>>>>> # (eof)
>>>>>> # (eof)
>>>>>> # (eof)
>>>>>> # (code): 0
>>>>>> # (code): 4    <- wtf?
>>>>>> # (code): 2
>>>>>> # (code): 2
>>>>>>
>>>>>> How can a variable encoded integer be misinterpreted from 3 to 4 is
>>>>>> beyond me, sorry. But it's not an EOF condition I think, at least the
>>>>>> deviation happens where the original run had was entering (code)
>>>>>> branch.
>>>>>>
>>>>>>> Could you also send latest version of FreqProxTermsWriterPerField.java
>>>>>>> you
>>>>>>> are testing?
>>>>>>
>>>>>>
>>>>>> I'm testing against a fixed branch (the info is included in
>>>>>> debug-log.txt).
>>>>>>
>>>>>> If there's anything else I can do, let me know. I'm also curious how
>>>>>> you approach debugging this thing. It may be time-based so I don't
>>>>>> think it'll reproduce on your machine out of the box, but who knows.
>>>>>> All I know is that, for example, if you add one more sysout before the
>>>>>> while loop it suddenly stops reproducing so probably something gets
>>>>>> compiled differently... Wild :)
>>>>>>
>>>>>> Dawid