G1GC/ JIT compilation bug hunt.
Dawid Weiss
dawid.weiss at gmail.com
Mon Aug 19 13:29:40 PDT 2013
Thanks Vladimir, the tiered compilation hint was very useful. I
managed to reproduce this error on a 1.8 fastbuild and I can dump
pretty much anything I want. But I still cannot figure out what's
wrong -- it's beyond me. Here are the things I've tried per your
suggestions:
- passing -XX:-OptimizePtrCompare or -XX:-EliminateAutoBox
doesn't help (the error is still present),
- I am running with -Xbatch and a single compiler thread -- it is more
difficult to hit a failing seed this way (the tests are randomized) but
it's still possible.
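For reference, the setup described in the list above corresponds roughly to the following invocation. This is only a sketch: -XX:+UseG1GC is assumed because the failure only shows up with G1, and the class path and runner class are hypothetical placeholders, so the command is printed rather than run.

```shell
# JVM flags mentioned in this thread: G1, C2 only (no tiered compilation),
# blocking compilation requests, and a single compiler thread.
JVM_FLAGS="-XX:+UseG1GC -XX:-TieredCompilation -Xbatch -XX:CICompilerCount=1"

# The two switches that were tried without any effect on the failure.
EA_FLAGS="-XX:-OptimizePtrCompare -XX:-EliminateAutoBox"

# Print the command rather than run it; <test-classpath> and <TestRunner>
# are hypothetical placeholders, not real Lucene names.
echo "java $JVM_FLAGS $EA_FLAGS -cp <test-classpath> <TestRunner>"
```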
I know it would be best to provide a stand-alone reproduction package but
it's not trivial given how complex the Lucene tests and testing framework
are. Can I provide you with anything else that would be helpful besides
the above? :) Specifically I can:
1) dump the opto assembly from a failed and a non-failed run,
2) provide the opto assembly from a g1gc run vs. any other gc (which
doesn't seem to exhibit the problem),
3) provide the output of -XX:+PrintCompilation -XX:+PrintCompilation2 or a
verbose hotspot.log.
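The three options above map roughly onto the following HotSpot flags. This is a sketch under assumptions: as far as I know -XX:+PrintOptoAssembly is not available in product builds of 1.8, the -XX:LogFile name is an arbitrary choice, and the class path and runner class are hypothetical placeholders, so the command is printed rather than run.

```shell
# Opto assembly dump (options 1 and 2); assumed to need a non-product build.
ASM_FLAGS="-XX:+PrintOptoAssembly"

# One line per compilation event on stdout (option 3).
PRINT_FLAGS="-XX:+PrintCompilation"

# Verbose compilation log written to a file (option 3); LogCompilation is a
# diagnostic flag, so diagnostic options have to be unlocked first.
LOG_FLAGS="-XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=hotspot.log"

# <test-classpath> and <TestRunner> are hypothetical placeholders.
echo "java $ASM_FLAGS $PRINT_FLAGS $LOG_FLAGS -cp <test-classpath> <TestRunner>"
```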
Let me know if any of these (or anything else) would be useful. If
not, I'll try to extract a stand-alone package that would reproduce
the issue, although this is a real killer to pull off.
Dawid
On Fri, Aug 16, 2013 at 9:50 AM, Vladimir Kozlov
<vladimir.kozlov at oracle.com> wrote:
> On 8/15/13 11:45 PM, Dawid Weiss wrote:
>>>
>>> It is, with high probability, a compiler problem.
>>
>>
>> I believe so. I've re-run the tests with 1.8b102 and the problem is
>> still there, although it's more difficult to show -- I ran 100 full
>> builds yesterday, and five of them tripped on assertions that should be
>> unreachable.
>
>
> We switched on -XX:+TieredCompilation by default in b102. Switch it off to
> use only the C2 compiler, which has the problem.
>
>
>>
>>> G1 has larger write-barrier code than other GCs. It can affect inlining
>>> decisions. You can try to change the -XX:InlineSmallCode=1000 value. It
>>> controls inlining of methods which were already compiled.
>>>
>>> You can also try -Xbatch -XX:CICompilerCount=1 to get serial
>>> compilations.
>>
>>
>> Thanks for these tips, Vladimir -- very helpful. I hope you don't mind
>> me asking one more question - we had a discussion with another Lucene
>> developer yesterday -- is -Xbatch deterministic in the sense that if
>> you run a single-threaded, deterministic piece of code it will always
>> trigger compiles at the same time? What happens if there are two
>> uncoordinated threads that hit a set of the same methods (and thus
>> when the compiler kicks in the statistics will probably be different
>> for each independent run)?
>
>
> -Xbatch (equivalent to -XX:-BackgroundCompilation) will block only the thread
> which first put a compilation task on the compile queue. Other threads check
> that the task is in the queue and resume execution without waiting.
> You still can't get full determinism with several Java threads, as you
> noticed. But with -XX:CICompilerCount=1 it can reduce some variation in
> inlining decisions because compilation will be executed by one compiler
> thread (instead of 2 by default). So if compilation tasks are put on the
> queue in the same order in different runs you will most likely get the same
> code generation. Of course, the order is usually slightly different
> (especially during startup, when there are a lot of compilation requests),
> so you can still get different results.
>
>
>>
>> This question originated from a broader discussion where we were
>> wondering how you, the compiler-guru guys, approach debugging in
>> case something like this pops up -- a bug that is very hard to
>> reproduce, that manifests itself rarely and for which pretty much any
>> change at the Java level changes the compilation and thus generates
>> completely different code. This seems to be a tough nut to crack.
>
>
> We usually try to reproduce the problem with a debug version of the VM, which
> has a lot of asserts, and we may hit one which helps identify the problem. You
> are lucky if you can reproduce a problem in a debug VM in a debugger.
> We try to get the assembler output of the compiled method during the run in
> which it crashes. The hs_err file has the address and offset in the compiled
> code and a small code snippet which helps to find the code. After that we
> "look hard" at the assembler code and try to figure out what is wrong with it
> and which part of the compiler can generate such a code pattern.
> There is a debug flag, -XX:AbortVMOnException=java.lang.NullPointerException,
> which aborts the VM when that exception is thrown. And with the
> -XX:+ShowMessageBoxOnError flag you can attach a debugger to the VM when that
> happens.
> When we get only a core file it is tough. We try to use the Serviceability
> Agent to extract information, compiled code and other data from it.
>
> Another suggestion for you: since you can avoid the problem with EA switched
> off, you can try switching off only
>
> -XX:-OptimizePtrCompare "Use escape analysis to optimize pointers compare"
> -XX:-EliminateAutoBox "Control optimizations for autobox elimination"
>
> Vladimir
>
>>
>> Dawid
>>
>