JIT stops compiling after a while (java 8u45)

Tue Mar 1 19:18:28 UTC 2016

> For real world applications I hope that this is a much smaller issue but
> if you must load and execute loads and loads of short lived classes then it
> might be reasonable to disable concurrent class unloading (at the cost of
> getting serial Full gcs instead).

Unfortunately, this is not a theoretical issue for us. We see this problem
running Presto (http://prestodb.io), which generates bytecode for every
query it processes. For now, we're working around it with a background
thread that watches the size of the code cache and calls System.gc() when
it gets close to the max (
https://github.com/facebook/presto/commit/91e1b3bb6bbfffc62401025a24231cd388992d7c
).
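
A minimal sketch of that kind of watcher follows, for illustration only; it is
not the code from the commit linked above. The class name, the 0.7 threshold
and the one-second period are assumptions. The pool names "Code Cache" (JDK 8)
and "CodeCache" (JDK 9 with -XX:-SegmentedCodeCache) are the ones mentioned
further down in this thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Optional;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: polls the code cache memory pool and requests a GC
// when usage crosses a fraction of the maximum.
final class CodeCacheWatcher {
    private CodeCacheWatcher() {}

    // Looks up the unsegmented code cache pool by the names mentioned in this thread.
    static Optional<MemoryPoolMXBean> findCodeCachePool() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.equals("Code Cache") || name.equals("CodeCache")) {
                return Optional.of(pool);
            }
        }
        return Optional.empty();
    }

    // True when used >= threshold * max; a non-positive max means "undefined".
    static boolean nearLimit(long usedBytes, long maxBytes, double threshold) {
        return maxBytes > 0 && usedBytes >= maxBytes * threshold;
    }

    // Performs one check and calls System.gc() if the pool is close to its limit.
    static boolean checkOnce(MemoryPoolMXBean pool, double threshold) {
        MemoryUsage usage = pool.getUsage();
        if (usage != null && nearLimit(usage.getUsed(), usage.getMax(), threshold)) {
            System.gc(); // only a request; it has no effect with -XX:+DisableExplicitGC
            return true;
        }
        return false;
    }

    // Starts a daemon thread that repeats the check with a fixed delay.
    static ScheduledExecutorService start(MemoryPoolMXBean pool, double threshold, long periodMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "code-cache-watcher");
            t.setDaemon(true);
            return t;
        });
        executor.scheduleWithFixedDelay(
                () -> checkOnce(pool, threshold), periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return executor;
    }

    public static void main(String[] args) {
        Optional<MemoryPoolMXBean> pool = findCodeCachePool();
        if (!pool.isPresent()) {
            System.out.println("No unsegmented code cache pool found");
            return;
        }
        MemoryUsage usage = pool.get().getUsage();
        System.out.printf("code cache used=%d max=%d%n", usage.getUsed(), usage.getMax());
        // Assumed values: 70% threshold, 1 s period. A real application keeps
        // running here; the daemon thread ends when the JVM exits.
        start(pool.get(), 0.7, 1000);
    }
}
```

Whether the explicit GC actually unloads classes depends on the collector and
its flags, so the threshold should leave headroom before compilation is disabled.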

Martin

On Tue, Mar 1, 2016 at 5:17 AM, Mikael Gerdin <mikael.gerdin at oracle.com>
wrote:

> Hi,
>
> On 2016-03-01 13:35, Tobias Hartmann wrote:
>
>> Hi,
>>
>> I just had another look and it turned out that even with 8u40+ class
>> unloading is triggered. I missed that because it happens *much* later
>> (compared to 8u33) when the code cache already filled up and compilation is
>> disabled. At this point we don't recover because new classes are loaded and
>> new OSR nmethods are compiled rapidly.
>>
>> Summary:
>> The code cache fills up due to OSR nmethods that are not being flushed.
>> With 8u33 and earlier, G1 did more aggressive class unloading (probably due
>> to more allocations or different heuristics) and this allowed the sweeper
>> to flush enough OSR nmethods to continue compilation. With 8u40 and later,
>> class unloading happens long after the code cache is full.
>>
>
> Before 8u40 G1 could only unload classes at Full GCs.
> After 8u40 G1 can unload classes at the end of a concurrent GC cycle,
> avoiding Full GC.
>
> If you run the test with CMS with +CMSClassUnloadingEnabled you will
> probably see similar problematic results since the class unloading in G1 is
> very similar to the one in CMS.
> I haven't investigated in depth why the classes do not get unloaded in the
> G1 and CMS cases but there are several known quirks with how concurrent
> class unloading behaves which causes them to unload classes later than the
> serial Full GC.
>
> Running G1 with -XX:-ClassUnloadingWithConcurrentMark
> or CMS with -XX:-CMSClassUnloadingEnabled
> disables concurrent class unloading completely and works around the issue
> you are seeing.
>
> For real world applications I hope that this is a much smaller issue but
> if you must load and execute loads and loads of short lived classes then it
> might be reasonable to disable concurrent class unloading (at the cost of
> getting serial Full gcs instead).
>
>
>
>> I think we should fix this by flushing "cold" OSR nmethods as well
>> (JDK-8023191). Thomas Schatzl mentioned that we could also trigger a
>> concurrent mark if the code cache is full and hope that some classes are
>> unloaded but I'm afraid this is too invasive (and does not help much in the
>> general case).
>>
>
> If it is possible to flush OSR nmethods without doing a full class
> unloading cycle then I think that path is preferable.
>
> /Mikael
>
>
>
>> Opinions?
>>
>> Best regards,
>> Tobias
>>
>> On 01.03.2016 11:27, Tobias Hartmann wrote:
>>
>>> Hi Nileema,
>>>
>>> thanks for reporting this issue!
>>>
>>> CC'ing the GC team because this seems to be a GC issue (see evaluation
>>> below).
>>>
>>> On 29.02.2016 23:59, nileema wrote:
>>>
>>>> We are seeing an issue with the CodeCache becoming full which causes the
>>>> compiler to be disabled in jdk-8u45 to jdk-8u72.
>>>>
>>>> We had seen a similar issue in Java7 (old issue:
>>>>
>>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-August/011333.html
>>>> ).
>>>> This issue went away with earlier versions of Java 8.
>>>>
>>>
>>> Reading the old conversation, I'm wondering if this could again be a
>>> problem with OSR nmethods that are not flushed? The bug (JDK-8023191) is
>>> still open - now assigned to me.
>>>
>>> Doing a quick experiment, it looks like we mostly compile OSR methods:
>>>    22129 2137 %     3       Runnable_412::run @ 4 (31 bytes)
>>>    22130 2189 %     4       Runnable_371::run @ 4 (31 bytes)
>>>    22134 2129 %     3       Runnable_376::run @ 4 (31 bytes)
>>>    22136 2109 %     3       Runnable_410::run @ 4 (31 bytes)
>>>
>>> Currently, OSR nmethods are not flushed just because the code cache is
>>> full but only if the nmethod becomes invalid (class loading/unloading,
>>> uncommon trap, ..)
>>>
>>> With your test, class unloading should happen and therefore the OSR
>>> nmethods *should* be flushed.
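
In the -XX:+PrintCompilation lines quoted above, the "%" in the attribute
column marks an OSR compilation. A rough sketch for counting such lines in a
captured log is below. It assumes the tiered format shown in this thread
(timestamp, compile id, attributes, tier, method) with any mail quote markers
already stripped; the class name and the non-OSR sample line are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: counts OSR compilations in -XX:+PrintCompilation output.
final class OsrLineCounter {
    private OsrLineCounter() {}

    static boolean isNumber(String s) {
        return !s.isEmpty() && s.chars().allMatch(Character::isDigit);
    }

    // A line counts as an OSR compilation when it starts with two numeric
    // fields (timestamp, compile id) followed by an attribute token that
    // begins with '%'. State-change lines ("made zombie", "made not entrant")
    // are skipped so that each compilation is counted once.
    static boolean isOsrCompilation(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 4) {
            return false;
        }
        if (!isNumber(tokens[0]) || !isNumber(tokens[1])) {
            return false;
        }
        if (line.contains("made zombie") || line.contains("made not entrant")) {
            return false;
        }
        return tokens[2].startsWith("%");
    }

    static long countOsr(List<String> lines) {
        return lines.stream().filter(OsrLineCounter::isOsrCompilation).count();
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "   22129 2137 %     3       Runnable_412::run @ 4 (31 bytes)",
                "   22130 2189 %     4       Runnable_371::run @ 4 (31 bytes)",
                "   22134 2129 %     3       Runnable_376::run @ 4 (31 bytes)",
                "   22136 2109 %     3       Runnable_410::run @ 4 (31 bytes)",
                // state change of an existing nmethod, not a new compilation
                "    21670  970 %     4       Runnable_87::run @ -2 (31 bytes)   made zombie",
                // hypothetical non-OSR line, added only for contrast
                "   22140 2200       3       Example::method (10 bytes)");
        System.out.println("OSR compilations: " + countOsr(sample));
    }
}
```

Comparing this count with the total number of compilation lines gives a quick
indication of how much of the code cache growth comes from OSR nmethods.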
>>>
>>>> We used the test http://github.com/martint/jittest to compare the
>>>> behavior of jdk-8u25 and jdk-8u45. For this test, we did not see any
>>>> CodeCache full messages with jdk-8u25 but did see them with 8u45+
>>>> (8u60 and 8u74).
>>>> Test results comparing 8u25, 8u45 and 8u74:
>>>> https://gist.github.com/nileema/6fb667a215e95919242f
>>>>
>>>> In the results you can see that 8u25 starts collecting the code cache
>>>> much sooner than 8u45. 8u45 very quickly hits the limit for code cache.
>>>> If we force a full gc when it is about to hit the code cache limit, we
>>>> see the code cache size go down.
>>>>
>>>
>>> You can use the following flags to get additional information:
>>> -XX:CICompilerCount=1 -XX:+PrintCompilation -XX:+PrintMethodFlushing
>>> -XX:+TraceClassUnloading
>>>
>>> I did some more experiments with 8u45:
>>>
>>> java -mx20g -ms20g -XX:ReservedCodeCacheSize=20m
>>> -XX:+TraceClassUnloading -XX:+UseG1GC -jar
>>> jittest-1.0-SNAPSHOT-standalone.jar | grep "Unloading"
>>> -> We do *not* unload any classes. The code cache fills up with OSR
>>> nmethods that are not flushed.
>>>
>>> Removing the -XX:+UseG1GC flag solves the issue:
>>>
>>> java -mx20g -ms20g -XX:ReservedCodeCacheSize=20m
>>> -XX:+TraceClassUnloading -jar jittest-1.0-SNAPSHOT-standalone.jar | grep
>>> Unloading
>>> -> Prints plenty of [Unloading class Runnable_40 0x00000007c0086028]
>>> messages and the code cache does not fill up.
>>> -> OSR nmethods are flushed because the classes are unloaded:
>>>     21670  970 %     4       Runnable_87::run @ -2 (31 bytes)   made zombie
>>>
>>> The log files look good:
>>>
>>> 1456825330672   112939  10950016        10195496        111.28
>>> 1456825331675   118563  11432256        10467176        112.41
>>> 1456825332678   125935  11972928        10778432        115.72
>>> [Unloading class Runnable_2498 0x00000007c0566028]
>>> ...
>>> [Unloading class Runnable_34 0x00000007c0082028]
>>> 1456825333684   131493  10220608        5382976         117.46
>>> 1456825334688   137408  10359296        5636120         116.81
>>> 1456825335692   143593  7635136         5914624         114.21
>>>
>>> After the code cache fills up, we unload classes and therefore flush
>>> methods and start over again.
>>>
>>> I checked for several releases if classes are unloaded:
>>> - 8u27: success
>>> - 8u33: success
>>> - 8u40: fail
>>> - 8u45: fail
>>> - 8u76: fail
>>>
>>> The regression was introduced in 8u40.
>>>
>>> I also tried with the latest JDK 9 build and it fails as well (had to
>>> change the bean name from "Code Cache" to "CodeCache" and run with
>>> -XX:-SegmentedCodeCache). Again, -XX:-UseG1GC -XX:+UseParallelGC solves the
>>> problem.
>>>
>>> Can someone from the GC team have a look?
>>>
>>>> Is this a known issue?
>>>>
>>>
>>> I'm not aware of any related issue.
>>>
>>> Best regards,
>>> Tobias
>>>
>>>> Thanks!
>>>>
>>>> Nileema
>>>>
>>>>
>>>>