Unexplained Jumps In Response Time "Fixed" By Disabling Tiered Compilation

James Lent jlent at tivo.com
Mon May 30 18:30:34 UTC 2016


One update.  In my original email I noted that we had tried turning off 
Tiered Compilation in production and had experienced 40 hours of good 
results.  I spoke 20 hours too early.  At about 60 hours the response 
time spiked to levels much higher than before (2 sec response times 
became 20 second response times) yet from a heap and CPU point of view 
the service appeared OK (heap usage actually slowed as more traffic 
switched to the other node, CPU usage went up but there was still plenty 
of capacity left).  Code Cache usage had levelled off at about 25mb out 
of the 48mb limit.  A restart fixed the issue (after turning Tiered 
Compilation back on).

I was able to reproduce the same symptoms in a dev environment after 
running light traffic for about 71 hours (slow replay of production 
queries).  The big spike in response time occurred at the same time I 
started to see HotSpot logs like the following:

made not compilable on levels 0 1 2 3 4 java.lang.String::equals (81 
bytes) decompile_count > PerMethodRecompilationCutoff
made not compilable on levels 0 1 2 3 4 
org.apache.lucene.util.OpenBitSetIterator::nextDoc (136 bytes) 
decompile_count > PerMethodRecompilationCutoff
made not compilable on levels 0 1 2 3 4 
org.apache.lucene.index.TermInfosReader$CloneableTerm::hashCode (8 
bytes) decompile_count > PerMethodRecompilationCutoff
made not compilable on levels 0 1 2 3 4 
org.apache.solr.search.SolrConstantScoreQuery$ConstantScorer::score (5 
bytes) decompile_count > PerMethodRecompilationCutoff
made not compilable on levels 0 1 2 3 4 
org.roaringbitmap.Util::fillArrayAND (92 bytes) decompile_count > 
PerMethodRecompilationCutoff

They continued to come out for several hours:

* Before 2PM - 0 logs
* 2PM to 3PM - 1 log
* 3PM to 4PM - 38 logs
* 4PM to 5PM - 30 logs
* 5PM to 6PM - 22 logs
* 6PM to 7PM - 64 logs

Many were related to low level (high usage) Java, Solr, Lucene, and 
RoaringBitMap methods.

I think this log means that the HotSpot compiler has recompiled these 
methods 400 times and has now given up and left them in byte code.  This 
is strange, but, would certainly explain the spike in response time.

Given all that my questions are:

1) Is my interpretation of these logs correct?
2) What would trigger the HotSpot compiler to keep recompiling these 
methods over and over?
3) Would increasing the Code Cache size help (it seems to revert to its 
old max of 48m when Tiered Compilation is disabled)?  I doubt it, but, I 
am giving it try in my dev environment.
4) Is it worth considering setting PerMethodRecompilationCutoff to '-1' 
(disable it)?

Thanks in advance for any help you can provide.

Jim



> On 25/05/16 15:33, James Lent wrote:
>    . . .
>> I then tried turning off Tiered Compilation (without changing the
>> default Code Cache size) in dev and the problem went away.  Of course it
>> is hard to prove a negative.  We are now testing in production with good
>> results (over 40 hours of good performance).
>>
>> Any thoughts?  I realize this is not much to go on, but, I thought I
>> would report my issue just in case you get other reports like this and
>> you start to see a pattern.
> Have you tried turning on listing compiles with -XX:+PrintCompilation?
>
> It would be very interesting to see what sort of compile activity is
> going on around the time that the jumps in response times occur. A
> comparison between +TieredCompilation and -ToetredCompilation might be
> very informative.
>
> regards,
>
>
> Andrew Dinn
> -----------
>


James Lent
Senior Software Engineer

*Digitalsmiths*
A TiVo Company

www.digitalsmiths.com <http://www.digitalsmiths.com/>
jlent at digitalsmiths.com <mailto:jlent at digitalsmiths.com>  | office 
919.460.4747


More information about the hotspot-dev mailing list