Java 8 Performance Issue With Tiered Compilation Disabled
Igor Veresov
igor.veresov at oracle.com
Thu Jul 7 17:47:36 UTC 2016
I think the problem could be that make_not_entrant() increases decompile count unless update_recompile_counts() == false. So when the sweeper flushes that cause the increments of the counter and we eventually give up recompiling it. Perhaps make_not_entrant() needs a flag to not increment the counts.
igor
> On Jul 7, 2016, at 9:57 AM, James Lent <jlent at tivo.com> wrote:
>
> I have not been able to collect much more information on this issue. We have been running in production using the following JVM options with good results for over 2 weeks on one node and over 1 week on the other node:
>
> -XX:-TieredCompilation
> -XX:ReservedCodeCacheSize=240m
>
> Here is some production data. I realize it can be problematic to compare results between nodes and across days of the week, but, all the results have been very consistent and we have gone back and forth between the settings several times:
>
> Node
> Tiered
> Date
> Average Response Time (24 Hr)
> Trans/Sec (24 Hr)
> Processor Load (24 Hr)
> Expensive Test Query
> Flex 0
> Disabled
> 6/24/16
> 387 ms
> 6.33
> 2.31
> 1899 ms
> Flex 1
> Enabled
> 6/24/16
> 521 ms
> 5.52
> 2.66
> 3741 ms
> Flex 0
> Disabled
> 7/7/16
> 401 ms
> 5.52
> 2.26
> 1907 ms
> Flex 1
> Disabled
> 7/7/16
> 406 ms
> 5.49
> 2.25
> 1906 ms
>
>
> The "expensive query" is one that mirrors a query we no longer run in production. The cost of that query was what triggered my initial investigation. Specifically that this query would get significantly more expensive about 4 hours after a restart of the service. That prompted my initial post: "Unexplained Jumps In Response Time 'Fixed' By Disabling Tiered Compilation". One of the first steps we took was to optimize that query. I still occasionally manually run the "unoptimized" version of the query to monitor system performance.
>
> My conclusions so far are:
> In this environment, the system runs significantly better with tiered compilation disabled.
> With the increased Code Cache size, the compilation work tails off quickly and I don't think we are in any danger of hitting the decompile limit.
> Just fyi. I realize this does not add anything to understanding the underlying issues.
>
> On 06/06/2016 08:52 AM, James Lent wrote:
>> Thanks for your interest. Answer in-line.
>>
>>
>> On 06/06/2016 04:00 AM, Tobias Hartmann wrote:
>>> Hi James,
>>>
>>> thanks for the report, please find some questions below.
>>>
>>> On 02.06.2016 17:56, James Lent wrote:
>>>> We are running a Solr/Lucene based server on Java 8:
>>>>
>>>> openjdk version "1.8.0_91"
>>>> OpenJDK Runtime Environment (build 1.8.0_91-b14)
>>>> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>>>>
>>>> Recently we tried to run with Tiered Compilation disabled:
>>>>
>>>> -XX:-TieredCompilation
>>>>
>>>> In production the system performed very well for about 70 hours and then
>>>> performance start to degrade very sharply. HotSpot logs indicated that
>>>> 100s of methods had reached their recompile limit. A sample of the logs
>>>> is below:
>>>>
>>>> made not compilable on levels 0 1 2 3 4 java.lang.String::equals (81
>>>> bytes) decompile_count > PerMethodRecompilationCutoff
>>>> made not compilable on levels 0 1 2 3 4
>>>> org.apache.lucene.util.OpenBitSetIterator::nextDoc (136 bytes)
>>>> decompile_count > PerMethodRecompilationCutoff
>>>> made not compilable on levels 0 1 2 3 4
>>>> org.apache.solr.search.SolrConstantScoreQuery$ConstantScorer::score (5
>>>> bytes) decompile_count > PerMethodRecompilationCutoff
>>>> made not compilable on levels 0 1 2 3 4
>>>> org.roaringbitmap.Util::fillArrayAND (92 bytes) decompile_count >
>>>> PerMethodRecompilationCutoff
>>>>
>>>> The methods that hit this limit include Java, Lucene, Solr, and
>>>> RoaringBitMap methods.
>>>>
>>>> I was able to reproduce this in a dev environment by just running very
>>>> light real world traffic. What I observed was that many methods were
>>>> being recompiled 3 or more times per hour.
>>>>
>>>> I was able to "fix" this issue by one of two methods:
>>>>
>>>> 1) Increase max code cache size:
>>>>
>>>> -XX:ReservedCodeCacheSize=240m
>>>>
>>>> 2) Disable code cache flushing:
>>>>
>>>> -XX:-UseCodeCacheFlushing
>>>>
>>>> After running for over 12 hours the maximum number of decompiles for a
>>>> method is 14 and that method hit 11 decompiles very quickly and then
>>>> leveled off (with the first fix it is still at 11). Only 7 methods have
>>>> 8 or more decompiles and this has held steady for hours now).
>>>>
>>>> I also tried the following based on a suggestion from Chris Newland,
>>>> but, it did not help:
>>>>
>>>> -XX:-UseBimorphicInlining
>>>>
>>>> My theory is that the Code Cache flushing algorithm coupled with the
>>>> Code Cache demands of my application are causing the HotSpot compiler to
>>>> "thrash". Keep cycling the methods through the Code Cache:
>>>>
>>>> 1) Compile method A
>>>> 2) Compile method B
>>>> 3) Kick out method A and compile method C
>>>> 4) Kick out method B and compile method A
>>>>
>>>> Observations:
>>>>
>>>> 1) When Tiered Compilation is disabled the Code Cache max drops from
>>>> 240mb to 48mb.
>>> Did the problem also show up with Tiered Compilation?
>>
>> This problem did not. Of course with tiered compilation enabled the code cache max seems to default to 240mb. I did not try running tiered with a smaller max code cache size. I did not that tiered seems to use a lot more code cache. The same application exceeds 70mb when it is enabled.
>>
>> However with tiered compilations enabled we were seeing a hard to understand drop in performance after about 4.5 hours in both production (2 second response times suddenly jump to 3 to 5 seconds). I was able to reproduce this issue in a dev environment with surprisingly little trouble. I investigated this issue from many angles to try to understand what was going on (heap, linux file cache, background network traffic, processor load, EC2 variability). The one thing that did seem to work in dev was turning off Tiered compilation (a desperation idea). I have not been able to identify any interesting compiler logs that correlate with that issue. When we tried it in production it worked very well for 70 hours and then this issue occurred which was actually much worse. I was not a hero.
>>
>> I sent an email on this first issue to the hotspot-dev mailing list (before the "turn off tiered compilation fix" failed in production), but, was not expecting much help because I could not provide much data to work with. Just a description of the symptoms.
>>>> 2) When not constrained (i.e. when one of my two "fixes" is applied) my
>>>> application under this traffic load levels out at just below 22mb in the
>>>> Code Cache.
>>>> 3) When neither of my fixes is applied then the Code Cache usage peaks
>>>> about 18mb, but, is highly variable minute to minute.
>>> You could try to use a lower value of -XX:NmethodSweepActivity to avoid aggressive sweeping when the code cache is getting full. But I doubt that this helps because there is still enough space in the code cache.
>> I will try to give it a try as time permits, but, I have torn down my dev environment for now. I was able to reproduce this issue right on my dev box, but doing so consumes most of its resources. I may try to set it up before I leave tonight.
>>>> We are planning to implement fix one above in production soon. Any
>>>> comments?
>>> I think increasing the code cache size does not hurt and prevents aggressive sweeping in this case because a larger code cache size leads to a larger per method hotness threshold for flushing.
>>>
>>> Did you try to reproduce the problem with a JDK 9 early access build? The symptoms you are describing sound quite similar to an issue we encountered with JDK 9. This was fixed by code aging [1] but the fix was never backported to JDK 8. You can enable/disable code aging with -XX:UseCodeAging in 9. If the problem turns out to be fixed, we could provide a prototype backport of code aging for testing in your environment.
>> Thanks for the info. I have not tried with Java 9. I will put that on the list to try.
>>
>>> Best regards,
>>> Tobias
>>>
>>> [1] http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/975d903e1de3 <http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/975d903e1de3>
>>>
>>>> I can provide HotSpot logs. Chris has provided me the required options
>>>> to get the more detailed information.
>>>>
>>>> ________________________________
>>>>
>>>> This email and any attachments may contain confidential and privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments) by others is prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete this email and any attachments. No employee or agent of TiVo Inc. is authorized to conclude any binding agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo Inc. may only be made by a signed written agreement.
>>
>> James Lent
>> Senior Software Engineer
>>
>> Digitalsmiths
>> A TiVo Company
>>
>> www.digitalsmiths.com <http://www.digitalsmiths.com/>
>> jlent at digitalsmiths.com <mailto:jlent at digitalsmiths.com> | office 919.460.4747
>
> James Lent
> Senior Software Engineer
>
> Digitalsmiths
> A TiVo Company
>
> www.digitalsmiths.com <http://www.digitalsmiths.com/>
> jlent at digitalsmiths.com <mailto:jlent at digitalsmiths.com> | office 919.460.4747
>
>
> This email and any attachments may contain confidential and privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments) by others is prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete this email and any attachments. No employee or agent of TiVo Inc. is authorized to conclude any binding agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo Inc. may only be made by a signed written agreement.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160707/82e9f6f9/attachment.html>
More information about the hotspot-compiler-dev
mailing list