G1GC/ JIT compilation bug hunt.

Tue Aug 27 00:15:50 PDT 2013

Thanks Vladimir. I had the impression I may be looking at the
interpreter. Just for completeness, I also thought/tried the following
ways to stop at the jitted code:

1) call Windows debugger API endpoints from kernel32 --

http://msdn.microsoft.com/en-us/library/windows/desktop/ms679297(v=vs.85).aspx

via JNI/ JNA. This worked surprisingly well -- you can even wait for
the debugger to attached to the code in a loop, works like a charm. :)
Although once you break into the debugger I again couldn't make sense
of the stack -- probably the infrequent call causes a deopt again and
we're back in the interpreter.

2) use Unsafe to force a null pointer dereference :) This should send
a debugger signal, right? I didn't try it though, didn't have the time
yet. I suspect it may be moved to an infrequent trap too and we're
back where we started, eh.

> I still did not have time to look on your logs.

No worries, it's been a very very educational experience for me.
Besides, I used to write tons of assembly (not on x86) and it actually
feels very good to be back at the assembly mnemonic level ;)

Don't spoil my weekend pet project :) And seriously -- if you want to
check out those logs let me know in advance and I'll prepare an update
with hsx-tip if needed.

Dawid

On Tue, Aug 27, 2013 at 3:07 AM, Vladimir Kozlov
<vladimir.kozlov at oracle.com> wrote:
> Dawid,
>
> Let it crush with AbortVMOnException (without ShowMessageBoxOnError) and
> look on call stack in hs_err_pidxxx.log file. Jited code usually have
> uncommon trap for exceptions so the method most likely was deoptimized and
> you will see interpeter frames on call stack instead of compiled/jited.
> There should be deoptimization event listed down in hs_err_pidxxx.log which
> may say in which address in jited code it hit uncommon trap. You can look on
> it in optoassembly output. Then during method compilation
> (-XX:+PrintCompilation -Xbatch -XX:CICompilerCount=1 should give you info
> when the method is compiled, you can verify it in Compile() by
> this->method()->print()) in debugger stop in ciEnv::register_method() after
> call to nmethod::new_nmethod(). nm->insts_begin() shows where instructions
> start. Add instruction offset from optoassembly and set breakepoint at that
> address. This way you can stop in jited code.
>
> G1 affects inlining since its write barriers code size is larger then other
> collectors (it has pre and post barriers for oop stores).
>
> I still did not have time to look on your logs.
>
> Vladimir
>
>
> On 8/26/13 12:56 PM, Dawid Weiss wrote:
>>>
>>> An ideal scenario and what I want to achieve is I'd like to stop on an
>>> assertion somewhere in the jitted code and go back to it to follow the
>>> execution (so that I can see what code is responsible for the issing
>>
>> increment). I'll need to sync this up (manually or otherwise) with the
>> optoassembly too [...]
>>
>> Would somebody be so kind and provide some light on whether the above
>> is feasible? I'm getting up to speed with visual studio -- haven't
>> used assembly-level debugging in a longer while -- but I'm having
>> troubles connecting the dots with verbose optoassembly output (the
>> addresses don't seem to match). Thanks and again, apologies if it's
>> something trivial but I couldn't find any helpful resources on the
>> net.
>>
>> Dawid
>>
>> On Sun, Aug 25, 2013 at 9:43 PM, Dawid Weiss <dawid.weiss at gmail.com>
>> wrote:
>>>
>>> I know why the assertion is hit. I still don't know (and probably
>>> won't figure it out) why it is so difficult to reproduce or which
>>> optimization leads to it.
>>>
>>> In short, the following code in ByteSliceReader#readByte:
>>>
>>>    public byte readByte() {
>>>      assert !eof();
>>>      assert upto <= limit;
>>>      if (upto == limit) {
>>>        nextSlice();
>>>      }
>>>      return buffer[upto++];
>>>    }
>>>
>>> is inlined in a number of places. If I include:
>>>
>>>    public byte readByte() {
>>>      assert !eof();
>>>      assert upto <= limit;
>>>      if (upto == limit) {
>>>        nextSlice();
>>>        baz = upto;
>>>      }
>>>      return buffer[upto++];
>>>    }
>>>    static int baz;
>>>
>>> I see a number of assignments to 'baz'. This is because methods like
>>> readVInt fork out many conditional branches, as in:
>>>
>>>    public int readVInt() throws IOException {
>>>      byte b = readByte();
>>>      if (b >= 0) return b;
>>>      int i = b & 0x7F;
>>>      b = readByte();
>>>      i |= (b & 0x7F) << 7;
>>>      if (b >= 0) return i;
>>>      b = readByte();
>>>      i |= (b & 0x7F) << 14;
>>>      if (b >= 0) return i;
>>>      b = readByte();
>>>      i |= (b & 0x7F) << 21;
>>>      if (b >= 0) return i;
>>>      b = readByte();
>>>      // Warning: the next ands use 0x0F / 0xF0 - beware copy/paste
>>> errors:
>>>      i |= (b & 0x0F) << 28;
>>>      if ((b & 0xF0) == 0) return i;
>>>      throw new IOException("Invalid vInt detected (too many bits)");
>>>    }
>>>
>>> One of these executions (when upto == limit, and possibly with some
>>> other upper condition) leads to missing increment on buffer[upto++] --
>>> upto remains unmodified (but the value at buffer[upto] is returned
>>> properly). Because upto is moved to a register in many of those
>>> inlined methods I suspect either the register may be incremented and
>>> not saved or the increment may be removed due to some local escape
>>> analysis magic (that has to be incorrect, obviously). How G1GC
>>> interacts with all this (remember, it only happens with G1GC) I have
>>> no clue.
>>>
>>> Anyway, I've tried looking at -XX:+PrintEscapeAnalysis but I couldn't
>>> make much sense of it. I also dug into some of the opto assembly
>>> branches trying to map them back into the original bytecode, but it's
>>> a very long process -- had to map PcDesc manually to the assembly
>>> offset, then go to bytecode again. Then I fired visual studio and
>>> halted it like Vladimir mentioned:
>>> -XX:AbortVMOnException=java.lang.AssertionError
>>> -XX:+ShowMessageBoxOnError
>>>
>>> Once I attach to the halted JVM I "break all" and I see the reported
>>> thread's stack trace halted inside ntdll (with the closest JVM frame
>>> stopped at get_deopt_original_pc). The frames below it don't seem to
>>> make much sense to me -- they don't seem to point to any addresses
>>> reported in the compilation log (PcDesc clauses). I'm a complete
>>> newbie to this, so I apologize if it's a lame question, but how can I
>>> find out which part of the original jitted code this assertion was
>>> invoked from?
>>>
>>> An ideal scenario and what I want to achieve is I'd like to stop on an
>>> assertion somewhere in the jitted code and go back to it to follow the
>>> execution (so that I can see what code is responsible for the missing
>>> increment). I'll need to sync this up (manually or otherwise) with the
>>> optoassembly too, otherwise I'll probably lose any remaining hair
>>> trying to figure out what's what :)
>>>
>>> A hint (or any other pointer; RTFM) would be very welcome.
>>>
>>> Dawid
>>>
>>>
>>> On Tue, Aug 20, 2013 at 11:32 AM, Dawid Weiss <dawid.weiss at gmail.com>
>>> wrote:
>>>>
>>>> One follow-up with respect to those  encoded integers -- I swapped the
>>>> correct and incorrect sequence in my last e-mail, sorry. I then also
>>>> added a sysout here:
>>>>
>>>> @@ -451,6 +453,7 @@
>>>>                 termFreq = 1;
>>>>               } else {
>>>>                 termFreq = freq.readVInt();
>>>> +              System.err.print("# (tf): " + termFreq + " ");
>>>>               }
>>>>             }
>>>>
>>>> With this the assertion is no longer hit but it shows where the
>>>> difference in numbers comes from:
>>>>
>>>> Correct sequence is:
>>>>
>>>> # (eof)
>>>> # (code): 0
>>>> # (tf): 3
>>>> # (code): 4
>>>> # (tf): 3
>>>> # (code): 2
>>>> # (tf): 5
>>>> # (code): 2
>>>> # (tf): 2
>>>> # (code): 3
>>>> # (code): 5
>>>> # (code): 2
>>>>
>>>> And the incorrect, assertion-hit run shows:
>>>>
>>>> # (eof)
>>>> # (code): 0
>>>> # (code): 3
>>>> # (code): 4
>>>> # (code): 3
>>>> # (code): 2
>>>> # (code): 5
>>>> # (code): 2
>>>> # (code): 2
>>>> # (code): 3
>>>>
>>>> So there's something odd happening after the first readvint which
>>>> returns 0 -- the next number that should be read by:
>>>>
>>>> termFreq = freq.readVInt();
>>>>
>>>> is re-read as the next code.
>>>>
>>>> Dawid
>>>>
>>>>
>>>> On Tue, Aug 20, 2013 at 11:11 AM, Dawid Weiss <dawid.weiss at gmail.com>
>>>> wrote:
>>>>>
>>>>> Thanks for comments guys.
>>>>>
>>>>>> It is great that you can build fastdebug VM. I assume that if I give
>>>>>> you a
>>>>>> patch to try you can also build with it.
>>>>>
>>>>>
>>>>> Yes, absolutely.
>>>>>
>>>>>> First, you can run with next options and then send zipped output
>>>>>> (related
>>>>>> part before the method compilation and optoassembler output, don't use
>>>>>> hsdis
>>>>>> for that) and hs_err file when test fails:
>>>>>
>>>>>
>>>>> I did that. Used the options you requested -- a full log of all I did
>>>>> is included in the ZIP file here:
>>>>> http://www.cs.put.poznan.pl/dweiss/tmp/debug-files.zip
>>>>>
>>>>> Cat debug-files\debug-log.txt. There are several runs included in that
>>>>> ZIP file, in short:
>>>>>
>>>>> - jdk18-fastdebug, b102, full run (no abort on assertion, explained in
>>>>> debug-log.txt):
>>>>>    001-no-abort-on-assertion
>>>>>    002-no-abort-on-assertion
>>>>>
>>>>> - jdk18-fastdebug, b102, abort on assertion (includes mem dumps)
>>>>>    003-with-abort-on-assertion
>>>>>    004-with-abort-on-assertion
>>>>>
>>>>> - jdk18-fastdebug, hsx tip, abort on assertion (includes mem dumps)
>>>>>    005-hsx-tip
>>>>>    006-hsx-tip
>>>>>
>>>>> - jdk18-fastdebug, hsx tip, excluded readvint method inlining (passed
>>>>> build, so only compilation logs available).
>>>>>    007-excluded-readvint
>>>>>
>>>>>> Looking on java code in FreqProxTermsWriterPerField::flush() and based
>>>>>> on
>>>>>> LUCENE-5168 report generated code somehow missed EOF check. Am I
>>>>>> right? This
>>>>>> is why the Assert is thrown?
>>>>>
>>>>>
>>>>> It's not the only method it trips on. Depending on the seed the
>>>>> problem shows up in different places. For the dumps I included the
>>>>> issue seems to be here:
>>>>>
>>>>>>          } else {
>>>>>>            final int code = freq.readVInt();
>>>>>>            if (!readTermFreq) {
>>>>>>              docID += code;
>>>>>>              termFreq = -1;
>>>>>
>>>>>
>>>>> If I add sysouts as in:
>>>>>
>>>>> Index:
>>>>> core/src/java/org/apache/lucene/index/FreqProxTermsWriterPerField.java
>>>>> ===================================================================
>>>>> ---
>>>>> core/src/java/org/apache/lucene/index/FreqProxTermsWriterPerField.java
>>>>>       (revision 1512807)
>>>>> +++
>>>>> core/src/java/org/apache/lucene/index/FreqProxTermsWriterPerField.java
>>>>>       (working copy)
>>>>> @@ -427,6 +427,7 @@
>>>>>           //System.out.println("  cycle");
>>>>>           final int termFreq;
>>>>>           if (freq.eof()) {
>>>>> +          System.err.print("# (eof)");
>>>>>             if (postings.lastDocCodes[termID] != -1) {
>>>>>               // Return last doc
>>>>>               docID = postings.lastDocIDs[termID];
>>>>> @@ -442,6 +443,7 @@
>>>>>             }
>>>>>           } else {
>>>>>             final int code = freq.readVInt();
>>>>> +          System.err.print("# (code): " + code + " ");
>>>>>             if (!readTermFreq) {
>>>>>               docID += code;
>>>>>               termFreq = -1;
>>>>>
>>>>> then for a constant seed you'll get an identical sequence of values.
>>>>> Once the assertion hits though, the sequence deviates. Comparing a
>>>>> passing run (excluding readvint) and a failing run I get a lot of
>>>>> initial aligned outputs and then a deviation:
>>>>>
>>>>> (correct run):
>>>>> # (eof)
>>>>> # (eof)
>>>>> # (eof)
>>>>> # (code): 0
>>>>> # (code): 3
>>>>> # (code): 4
>>>>> # (code): 3
>>>>>
>>>>> (incorrect run):
>>>>> # (eof)
>>>>> # (eof)
>>>>> # (eof)
>>>>> # (code): 0
>>>>> # (code): 4    <- wtf?
>>>>> # (code): 2
>>>>> # (code): 2
>>>>>
>>>>> How can a variable encoded integer be misinterpreted from 3 to 4 is
>>>>> beyond me, sorry. But it's not an EOF condition I think, at least the
>>>>> deviation happens where the original run had was entering (code)
>>>>> branch.
>>>>>
>>>>>> Could you also send latest version of FreqProxTermsWriterPerField.java
>>>>>> you
>>>>>> are testing?
>>>>>
>>>>>
>>>>> I'm testing against a fixed branch (the info is included in
>>>>> debug-log.txt).
>>>>>
>>>>> If there's anything else I can do, let me know. I'm also curious how
>>>>> you approach debugging this thing. It may be time-based so I don't
>>>>> think it'll reproduce on your machine out of the box, but who knows.
>>>>> All I know is that, for example, if you add one more sysout before the
>>>>> while loop it suddenly stops reproducing so probably something gets
>>>>> compiled differently... Wild :)
>>>>>
>>>>> Dawid