Unexplained Jumps In Response Time "Fixed" By Disabling Tiered Compilation

James Lent jlent at tivo.com
Tue May 31 00:22:12 UTC 2016


On more observation. On Flex Node 1 (the one with the smaller/default Code Cache max) the Code Cache usage graph is very erratic (lots up ups and downs).  On Flex Node 0 the graph is much smoother.

Speculation:

Perhaps there is some rule that tries to uncompile some methods when the Code Cache usage is high (close to max), but, the the problem is that all the compiled code is important and the compiler ends up thrashing.  Uncompile a method and then decide to compile it again.  This continues until the max decompiles is reached.  At that point the response time starts to spike.

Just a theory based on no knowledge of the actual code.

On 05/30/2016 02:30 PM, James Lent wrote:
One update.  In my original email I noted that we had tried turning off Tiered Compilation in production and had experienced 40 hours of good results.  I spoke 20 hours too early.  At about 60 hours the response time spiked to levels much higher than before (2 sec response times became 20 second response times) yet from a heap and CPU point of view the service appeared OK (heap usage actually slowed as more traffic switched to the other node, CPU usage went up but there was still plenty of capacity left).  Code Cache usage had levelled off at about 25mb out of the 48mb limit.  A restart fixed the issue (after turning Tiered Compilation back on).

I was able to reproduce the same symptoms in a dev environment after running light traffic for about 71 hours (slow replay of production queries).  The big spike in response time occurred at the same time I started to see HotSpot logs like the following:

made not compilable on levels 0 1 2 3 4 java.lang.String::equals (81 bytes) decompile_count > PerMethodRecompilationCutoff
made not compilable on levels 0 1 2 3 4 org.apache.lucene.util.OpenBitSetIterator::nextDoc (136 bytes) decompile_count > PerMethodRecompilationCutoff
made not compilable on levels 0 1 2 3 4 org.apache.lucene.index.TermInfosReader$CloneableTerm::hashCode (8 bytes) decompile_count > PerMethodRecompilationCutoff
made not compilable on levels 0 1 2 3 4 org.apache.solr.search.SolrConstantScoreQuery$ConstantScorer::score (5 bytes) decompile_count > PerMethodRecompilationCutoff
made not compilable on levels 0 1 2 3 4 org.roaringbitmap.Util::fillArrayAND (92 bytes) decompile_count > PerMethodRecompilationCutoff

They continued to come out for several hours:

* Before 2PM - 0 logs
* 2PM to 3PM - 1 log
* 3PM to 4PM - 38 logs
* 4PM to 5PM - 30 logs
* 5PM to 6PM - 22 logs
* 6PM to 7PM - 64 logs

Many were related to low level (high usage) Java, Solr, Lucene, and RoaringBitMap methods.

I think this log means that the HotSpot compiler has recompiled these methods 400 times and has now given up and left them in byte code.  This is strange, but, would certainly explain the spike in response time.

Given all that my questions are:

1) Is my interpretation of these logs correct?
2) What would trigger the HotSpot compiler to keep recompiling these methods over and over?
3) Would increasing the Code Cache size help (it seems to revert to its old max of 48m when Tiered Compilation is disabled)?  I doubt it, but, I am giving it try in my dev environment.
4) Is it worth considering setting PerMethodRecompilationCutoff to '-1' (disable it)?

Thanks in advance for any help you can provide.

Jim




On 25/05/16 15:33, James Lent wrote:
  . . .


I then tried turning off Tiered Compilation (without changing the
default Code Cache size) in dev and the problem went away.  Of course it
is hard to prove a negative.  We are now testing in production with good
results (over 40 hours of good performance).

Any thoughts?  I realize this is not much to go on, but, I thought I
would report my issue just in case you get other reports like this and
you start to see a pattern.


Have you tried turning on listing compiles with -XX:+PrintCompilation?

It would be very interesting to see what sort of compile activity is
going on around the time that the jumps in response times occur. A
comparison between +TieredCompilation and -ToetredCompilation might be
very informative.

regards,


Andrew Dinn
-----------




James Lent
Senior Software Engineer

Digitalsmiths
A TiVo Company

www.digitalsmiths.com<http://www.digitalsmiths.com/>
jlent at digitalsmiths.com<mailto:jlent at digitalsmiths.com>  | office 919.460.4747

James Lent
Senior Software Engineer

Digitalsmiths
A TiVo Company

www.digitalsmiths.com<http://www.digitalsmiths.com/>
jlent at digitalsmiths.com<mailto:jlent at digitalsmiths.com>  | office 919.460.4747

________________________________

This email and any attachments may contain confidential and privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments) by others is prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete this email and any attachments. No employee or agent of TiVo Inc. is authorized to conclude any binding agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo Inc. may only be made by a signed written agreement.


More information about the hotspot-dev mailing list