G1GC/ JIT compilation bug hunt.
Dawid Weiss
dawid.weiss at gmail.com
Mon Sep 2 02:31:19 PDT 2013
Oh, I forgot to mention - that incremented value cannot be saved to
[esp + #136] immediately because there's a value read based on the
unincremented value (buffer[upto++]). This is all fine, but the
problem is that the incremented spill slot [esp + #144] is never
copied to that other spill slot that is read from the other inline.
Again -- I have no clue what makes this particular code path so
special but I can see it in the debugger clearly.
Dawid
On Mon, Sep 2, 2013 at 11:17 AM, Dawid Weiss <dawid.weiss at gmail.com> wrote:
> I've traced the assembly on a running instance. The problem is quite
> interesting -- the code in question is purely computational, I still
> don't see how G1GC can be of importance here. In short what's
> happening is a method gets inlined in a larger loop more than once and
> a field access is placed on the esp spill. The problem is both of
> these inlines use a different spill slot and one of the updates is not
> properly propagated -- it is saved to one spill slot, then the code
> execution flow returns to the top of the loop and another inline reads
> a stale value from another spill slot. When you disable escape
> analysis everything works as expected.
>
> The crucial block is:
>
> 7e25 B1020: # B405 B1021 <- B1017 B1019 Freq: 0.00811184 // XXX
> jump from b1017
> 7e25 MOV EBX,[ESP + #136] // ebx = 80e
> 7e2c MOV ECX,EBX
> 7e2e INC ECX // ecx = 80f, increment.
> 7e2f MOV [ESP + #144],ECX // this is the
> incremented spill but it never gets propagated to [esp + #136] or
> re-read from memory.
> 7e36 MOV EBX,[ESP + #132]
> 7e3d MOV [EBX + #12],ECX ! Field:
> org/apache/lucene/index/ByteSliceReader.upto
> 7e40 MOV EBX,[ESP + #152]
> 7e47 TEST EBX,EBX
> 7e49 Jne B405 P=0.999999 C=-1.000000 // taken, null check
>
>
> The incremented local value of upto gets saved to [ESP + #144] and to
> memory but upon reiterating the loop it is re-read from [ESP + #136]
> so it looks like a dependency is missed somewhere in the graph causing
> escape analysis to fail.
>
> Ehm, obviously I have no idea how or where to fix this in the JVM code :)
>
> Full optoassembly dump and annotated opto assembly analysis is here:
> http://www.cs.put.poznan.pl/dweiss/tmp/hotspot-bug-analysis.zip
>
> Let me know if I can do anything else.
>
> Dawid
>
> On Tue, Aug 27, 2013 at 9:15 AM, Dawid Weiss <dawid.weiss at gmail.com> wrote:
>> Thanks Vladimir. I had the impression I may be looking at the
>> interpreter. Just for completeness, I also thought/tried the following
>> ways to stop at the jitted code:
>>
>> 1) call Windows debugger API endpoints from kernel32 --
>>
>> http://msdn.microsoft.com/en-us/library/windows/desktop/ms679297(v=vs.85).aspx
>>
>> via JNI/ JNA. This worked surprisingly well -- you can even wait for
>> the debugger to attached to the code in a loop, works like a charm. :)
>> Although once you break into the debugger I again couldn't make sense
>> of the stack -- probably the infrequent call causes a deopt again and
>> we're back in the interpreter.
>>
>> 2) use Unsafe to force a null pointer dereference :) This should send
>> a debugger signal, right? I didn't try it though, didn't have the time
>> yet. I suspect it may be moved to an infrequent trap too and we're
>> back where we started, eh.
>>
>>> I still did not have time to look on your logs.
>>
>> No worries, it's been a very very educational experience for me.
>> Besides, I used to write tons of assembly (not on x86) and it actually
>> feels very good to be back at the assembly mnemonic level ;)
>>
>> Don't spoil my weekend pet project :) And seriously -- if you want to
>> check out those logs let me know in advance and I'll prepare an update
>> with hsx-tip if needed.
>>
>> Dawid
>>
>> On Tue, Aug 27, 2013 at 3:07 AM, Vladimir Kozlov
>> <vladimir.kozlov at oracle.com> wrote:
>>> Dawid,
>>>
>>> Let it crush with AbortVMOnException (without ShowMessageBoxOnError) and
>>> look on call stack in hs_err_pidxxx.log file. Jited code usually have
>>> uncommon trap for exceptions so the method most likely was deoptimized and
>>> you will see interpeter frames on call stack instead of compiled/jited.
>>> There should be deoptimization event listed down in hs_err_pidxxx.log which
>>> may say in which address in jited code it hit uncommon trap. You can look on
>>> it in optoassembly output. Then during method compilation
>>> (-XX:+PrintCompilation -Xbatch -XX:CICompilerCount=1 should give you info
>>> when the method is compiled, you can verify it in Compile() by
>>> this->method()->print()) in debugger stop in ciEnv::register_method() after
>>> call to nmethod::new_nmethod(). nm->insts_begin() shows where instructions
>>> start. Add instruction offset from optoassembly and set breakepoint at that
>>> address. This way you can stop in jited code.
>>>
>>> G1 affects inlining since its write barriers code size is larger then other
>>> collectors (it has pre and post barriers for oop stores).
>>>
>>> I still did not have time to look on your logs.
>>>
>>> Vladimir
>>>
>>>
>>> On 8/26/13 12:56 PM, Dawid Weiss wrote:
>>>>>
>>>>> An ideal scenario and what I want to achieve is I'd like to stop on an
>>>>> assertion somewhere in the jitted code and go back to it to follow the
>>>>> execution (so that I can see what code is responsible for the issing
>>>>
>>>> increment). I'll need to sync this up (manually or otherwise) with the
>>>> optoassembly too [...]
>>>>
>>>> Would somebody be so kind and provide some light on whether the above
>>>> is feasible? I'm getting up to speed with visual studio -- haven't
>>>> used assembly-level debugging in a longer while -- but I'm having
>>>> troubles connecting the dots with verbose optoassembly output (the
>>>> addresses don't seem to match). Thanks and again, apologies if it's
>>>> something trivial but I couldn't find any helpful resources on the
>>>> net.
>>>>
>>>> Dawid
>>>>
>>>> On Sun, Aug 25, 2013 at 9:43 PM, Dawid Weiss <dawid.weiss at gmail.com>
>>>> wrote:
>>>>>
>>>>> I know why the assertion is hit. I still don't know (and probably
>>>>> won't figure it out) why it is so difficult to reproduce or which
>>>>> optimization leads to it.
>>>>>
>>>>> In short, the following code in ByteSliceReader#readByte:
>>>>>
>>>>> public byte readByte() {
>>>>> assert !eof();
>>>>> assert upto <= limit;
>>>>> if (upto == limit) {
>>>>> nextSlice();
>>>>> }
>>>>> return buffer[upto++];
>>>>> }
>>>>>
>>>>> is inlined in a number of places. If I include:
>>>>>
>>>>> public byte readByte() {
>>>>> assert !eof();
>>>>> assert upto <= limit;
>>>>> if (upto == limit) {
>>>>> nextSlice();
>>>>> baz = upto;
>>>>> }
>>>>> return buffer[upto++];
>>>>> }
>>>>> static int baz;
>>>>>
>>>>> I see a number of assignments to 'baz'. This is because methods like
>>>>> readVInt fork out many conditional branches, as in:
>>>>>
>>>>> public int readVInt() throws IOException {
>>>>> byte b = readByte();
>>>>> if (b >= 0) return b;
>>>>> int i = b & 0x7F;
>>>>> b = readByte();
>>>>> i |= (b & 0x7F) << 7;
>>>>> if (b >= 0) return i;
>>>>> b = readByte();
>>>>> i |= (b & 0x7F) << 14;
>>>>> if (b >= 0) return i;
>>>>> b = readByte();
>>>>> i |= (b & 0x7F) << 21;
>>>>> if (b >= 0) return i;
>>>>> b = readByte();
>>>>> // Warning: the next ands use 0x0F / 0xF0 - beware copy/paste
>>>>> errors:
>>>>> i |= (b & 0x0F) << 28;
>>>>> if ((b & 0xF0) == 0) return i;
>>>>> throw new IOException("Invalid vInt detected (too many bits)");
>>>>> }
>>>>>
>>>>> One of these executions (when upto == limit, and possibly with some
>>>>> other upper condition) leads to missing increment on buffer[upto++] --
>>>>> upto remains unmodified (but the value at buffer[upto] is returned
>>>>> properly). Because upto is moved to a register in many of those
>>>>> inlined methods I suspect either the register may be incremented and
>>>>> not saved or the increment may be removed due to some local escape
>>>>> analysis magic (that has to be incorrect, obviously). How G1GC
>>>>> interacts with all this (remember, it only happens with G1GC) I have
>>>>> no clue.
>>>>>
>>>>> Anyway, I've tried looking at -XX:+PrintEscapeAnalysis but I couldn't
>>>>> make much sense of it. I also dug into some of the opto assembly
>>>>> branches trying to map them back into the original bytecode, but it's
>>>>> a very long process -- had to map PcDesc manually to the assembly
>>>>> offset, then go to bytecode again. Then I fired visual studio and
>>>>> halted it like Vladimir mentioned:
>>>>> -XX:AbortVMOnException=java.lang.AssertionError
>>>>> -XX:+ShowMessageBoxOnError
>>>>>
>>>>> Once I attach to the halted JVM I "break all" and I see the reported
>>>>> thread's stack trace halted inside ntdll (with the closest JVM frame
>>>>> stopped at get_deopt_original_pc). The frames below it don't seem to
>>>>> make much sense to me -- they don't seem to point to any addresses
>>>>> reported in the compilation log (PcDesc clauses). I'm a complete
>>>>> newbie to this, so I apologize if it's a lame question, but how can I
>>>>> find out which part of the original jitted code this assertion was
>>>>> invoked from?
>>>>>
>>>>> An ideal scenario and what I want to achieve is I'd like to stop on an
>>>>> assertion somewhere in the jitted code and go back to it to follow the
>>>>> execution (so that I can see what code is responsible for the missing
>>>>> increment). I'll need to sync this up (manually or otherwise) with the
>>>>> optoassembly too, otherwise I'll probably lose any remaining hair
>>>>> trying to figure out what's what :)
>>>>>
>>>>> A hint (or any other pointer; RTFM) would be very welcome.
>>>>>
>>>>> Dawid
>>>>>
>>>>>
>>>>> On Tue, Aug 20, 2013 at 11:32 AM, Dawid Weiss <dawid.weiss at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> One follow-up with respect to those encoded integers -- I swapped the
>>>>>> correct and incorrect sequence in my last e-mail, sorry. I then also
>>>>>> added a sysout here:
>>>>>>
>>>>>> @@ -451,6 +453,7 @@
>>>>>> termFreq = 1;
>>>>>> } else {
>>>>>> termFreq = freq.readVInt();
>>>>>> + System.err.print("# (tf): " + termFreq + " ");
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> With this the assertion is no longer hit but it shows where the
>>>>>> difference in numbers comes from:
>>>>>>
>>>>>> Correct sequence is:
>>>>>>
>>>>>> # (eof)
>>>>>> # (code): 0
>>>>>> # (tf): 3
>>>>>> # (code): 4
>>>>>> # (tf): 3
>>>>>> # (code): 2
>>>>>> # (tf): 5
>>>>>> # (code): 2
>>>>>> # (tf): 2
>>>>>> # (code): 3
>>>>>> # (code): 5
>>>>>> # (code): 2
>>>>>>
>>>>>> And the incorrect, assertion-hit run shows:
>>>>>>
>>>>>> # (eof)
>>>>>> # (code): 0
>>>>>> # (code): 3
>>>>>> # (code): 4
>>>>>> # (code): 3
>>>>>> # (code): 2
>>>>>> # (code): 5
>>>>>> # (code): 2
>>>>>> # (code): 2
>>>>>> # (code): 3
>>>>>>
>>>>>> So there's something odd happening after the first readvint which
>>>>>> returns 0 -- the next number that should be read by:
>>>>>>
>>>>>> termFreq = freq.readVInt();
>>>>>>
>>>>>> is re-read as the next code.
>>>>>>
>>>>>> Dawid
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 20, 2013 at 11:11 AM, Dawid Weiss <dawid.weiss at gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Thanks for comments guys.
>>>>>>>
>>>>>>>> It is great that you can build fastdebug VM. I assume that if I give
>>>>>>>> you a
>>>>>>>> patch to try you can also build with it.
>>>>>>>
>>>>>>>
>>>>>>> Yes, absolutely.
>>>>>>>
>>>>>>>> First, you can run with next options and then send zipped output
>>>>>>>> (related
>>>>>>>> part before the method compilation and optoassembler output, don't use
>>>>>>>> hsdis
>>>>>>>> for that) and hs_err file when test fails:
>>>>>>>
>>>>>>>
>>>>>>> I did that. Used the options you requested -- a full log of all I did
>>>>>>> is included in the ZIP file here:
>>>>>>> http://www.cs.put.poznan.pl/dweiss/tmp/debug-files.zip
>>>>>>>
>>>>>>> Cat debug-files\debug-log.txt. There are several runs included in that
>>>>>>> ZIP file, in short:
>>>>>>>
>>>>>>> - jdk18-fastdebug, b102, full run (no abort on assertion, explained in
>>>>>>> debug-log.txt):
>>>>>>> 001-no-abort-on-assertion
>>>>>>> 002-no-abort-on-assertion
>>>>>>>
>>>>>>> - jdk18-fastdebug, b102, abort on assertion (includes mem dumps)
>>>>>>> 003-with-abort-on-assertion
>>>>>>> 004-with-abort-on-assertion
>>>>>>>
>>>>>>> - jdk18-fastdebug, hsx tip, abort on assertion (includes mem dumps)
>>>>>>> 005-hsx-tip
>>>>>>> 006-hsx-tip
>>>>>>>
>>>>>>> - jdk18-fastdebug, hsx tip, excluded readvint method inlining (passed
>>>>>>> build, so only compilation logs available).
>>>>>>> 007-excluded-readvint
>>>>>>>
>>>>>>>> Looking on java code in FreqProxTermsWriterPerField::flush() and based
>>>>>>>> on
>>>>>>>> LUCENE-5168 report generated code somehow missed EOF check. Am I
>>>>>>>> right? This
>>>>>>>> is why the Assert is thrown?
>>>>>>>
>>>>>>>
>>>>>>> It's not the only method it trips on. Depending on the seed the
>>>>>>> problem shows up in different places. For the dumps I included the
>>>>>>> issue seems to be here:
>>>>>>>
>>>>>>>> } else {
>>>>>>>> final int code = freq.readVInt();
>>>>>>>> if (!readTermFreq) {
>>>>>>>> docID += code;
>>>>>>>> termFreq = -1;
>>>>>>>
>>>>>>>
>>>>>>> If I add sysouts as in:
>>>>>>>
>>>>>>> Index:
>>>>>>> core/src/java/org/apache/lucene/index/FreqProxTermsWriterPerField.java
>>>>>>> ===================================================================
>>>>>>> ---
>>>>>>> core/src/java/org/apache/lucene/index/FreqProxTermsWriterPerField.java
>>>>>>> (revision 1512807)
>>>>>>> +++
>>>>>>> core/src/java/org/apache/lucene/index/FreqProxTermsWriterPerField.java
>>>>>>> (working copy)
>>>>>>> @@ -427,6 +427,7 @@
>>>>>>> //System.out.println(" cycle");
>>>>>>> final int termFreq;
>>>>>>> if (freq.eof()) {
>>>>>>> + System.err.print("# (eof)");
>>>>>>> if (postings.lastDocCodes[termID] != -1) {
>>>>>>> // Return last doc
>>>>>>> docID = postings.lastDocIDs[termID];
>>>>>>> @@ -442,6 +443,7 @@
>>>>>>> }
>>>>>>> } else {
>>>>>>> final int code = freq.readVInt();
>>>>>>> + System.err.print("# (code): " + code + " ");
>>>>>>> if (!readTermFreq) {
>>>>>>> docID += code;
>>>>>>> termFreq = -1;
>>>>>>>
>>>>>>> then for a constant seed you'll get an identical sequence of values.
>>>>>>> Once the assertion hits though, the sequence deviates. Comparing a
>>>>>>> passing run (excluding readvint) and a failing run I get a lot of
>>>>>>> initial aligned outputs and then a deviation:
>>>>>>>
>>>>>>> (correct run):
>>>>>>> # (eof)
>>>>>>> # (eof)
>>>>>>> # (eof)
>>>>>>> # (code): 0
>>>>>>> # (code): 3
>>>>>>> # (code): 4
>>>>>>> # (code): 3
>>>>>>>
>>>>>>> (incorrect run):
>>>>>>> # (eof)
>>>>>>> # (eof)
>>>>>>> # (eof)
>>>>>>> # (code): 0
>>>>>>> # (code): 4 <- wtf?
>>>>>>> # (code): 2
>>>>>>> # (code): 2
>>>>>>>
>>>>>>> How can a variable encoded integer be misinterpreted from 3 to 4 is
>>>>>>> beyond me, sorry. But it's not an EOF condition I think, at least the
>>>>>>> deviation happens where the original run had was entering (code)
>>>>>>> branch.
>>>>>>>
>>>>>>>> Could you also send latest version of FreqProxTermsWriterPerField.java
>>>>>>>> you
>>>>>>>> are testing?
>>>>>>>
>>>>>>>
>>>>>>> I'm testing against a fixed branch (the info is included in
>>>>>>> debug-log.txt).
>>>>>>>
>>>>>>> If there's anything else I can do, let me know. I'm also curious how
>>>>>>> you approach debugging this thing. It may be time-based so I don't
>>>>>>> think it'll reproduce on your machine out of the box, but who knows.
>>>>>>> All I know is that, for example, if you add one more sysout before the
>>>>>>> while loop it suddenly stops reproducing so probably something gets
>>>>>>> compiled differently... Wild :)
>>>>>>>
>>>>>>> Dawid
More information about the hotspot-dev
mailing list