CompileBroker hanging

Wed Mar 25 18:13:28 UTC 2020

Hello,

Thank you for your input! I really appreciate hearing your high-level view
on this issue.
I will start by increasing the code cache size as suggested (I will skip
-XX:-UseCodeCacheFlushing, since disabling flushing may "hide" the issue by
turning it into interpreted execution instead), and time will tell if it
happens again.
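
To keep an eye on the code cache after the resize, something like the
following could be run inside the application (or adapted to a remote JMX
connection). It is only a minimal sketch: the class name CodeCacheUsage and
its helper methods are made up for illustration, and matching pools by a
"Code" name prefix is an assumption meant to cover the single "Code Cache"
pool on JDK 8 as well as the "CodeHeap ..." segments on JDK 9 and later.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of the JDK: prints current and peak usage
// of the code cache memory pool(s) of the JVM it runs in.
final class CodeCacheUsage {

    /** Percentage of max that is in use, or -1 if max is undefined. */
    static double percentUsed(long used, long max) {
        return max > 0 ? 100.0 * used / max : -1.0;
    }

    /** Assumption: code cache pools have names starting with "Code". */
    static boolean isCodeCachePool(String name) {
        return name.startsWith("Code");
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // The code cache is reported as a non-heap memory pool.
            if (pool.getType() != MemoryType.NON_HEAP
                    || !pool.isValid()
                    || !isCodeCachePool(pool.getName())) {
                continue;
            }
            MemoryUsage now = pool.getUsage();
            MemoryUsage peak = pool.getPeakUsage();
            System.out.printf("%s: used=%dK peak=%dK max=%dK (%.1f%% used)%n",
                    pool.getName(),
                    now.getUsed() / 1024,
                    peak.getUsed() / 1024,
                    now.getMax() / 1024,
                    percentUsed(now.getUsed(), now.getMax()));
        }
    }
}
```

The peak value corresponds to the high-water mark I mentioned, so logging it
periodically should show whether the larger cache still gets close to full.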

For reference, I uploaded the pictures (from a more or less idle
application) that were removed in my original mail:
https://ibb.co/xfnvdL6
https://ibb.co/gTPxQ4R

I wish I was allowed to provide the entire flamegraph but you know... the
fun part of proprietary software.

Best Regards,
Gustav Åkesson

On Wed, Mar 25, 2020 at 6:02 PM Vladimir Kozlov <vladimir.kozlov at oracle.com>
wrote:

> Which should be fine in this case. Most likely the application is in a
> stable state already and compilation happens only for "warm" (rarely
> executed but finally hitting the compilation threshold) methods.
>
> Another (worse) possibility is hitting some uncommon traps in compiled hot
> methods (and forcing recompilation) due to an application phase change
> (loading new classes). But I think that possibility is low, since Gustav
> said that the application ran for weeks.
>
> I agree with Nils's suggestion to use -XX:-UseCodeCacheFlushing and
> increase the CodeCache size. With the -XX:+TieredCompilation flag you
> should have a 240 MB CodeCache by default. Try that size if you see
> performance degrade with -XX:-UseCodeCacheFlushing.
>
> Regards,
> Vladimir K
>
> On 3/25/20 9:16 AM, Nils Eliasson wrote:
> > Hi Gustav,
> >
> > It sounds like you are running out of code cache. When the code cache is
> > getting full, a number of things might happen: (1) the code cache is
> > swept more often, which consumes CPU time; (2) the oldest compiled
> > methods that haven't been used recently will be evicted. Throwing out
> > the oldest compiled methods is a last effort to make room. If the
> > evicted compiled code is eventually used again, it will trigger new
> > compilations. If all the code is fairly hot, the VM will alternate
> > between compiling and throwing out code.
> >
> > You can turn off code cache flushing with -XX:-UseCodeCacheFlushing, but
> > then when the code cache is full no more compilations will happen, and
> > you might end up spending a lot of time in the interpreter.
> >
> > From your numbers, your code cache is pretty full; there is always some
> > amount of fragmentation that is reported as free. I suggest you increase
> > the code cache size. You write that the server has been running for a
> > couple of weeks, which suggests that you only need a modest increase.
> >
> > When you upgrade to a newer JDK, there are both jcmd commands and JFR
> > events that are useful for extracting code cache statistics. Since JDK 9
> > there is also the Segmented Code Cache, which helps mitigate
> > fragmentation in the code cache.
> >
> > Regards,
> >
> > Nils Eliasson
> >
> > On 2020-03-25 08:14, Gustav Åkesson wrote:
> >> Hello folks,
> >>
> >> We have a very peculiar issue that just started to happen in one of our
> >> application instances, in which the JIT compiler suddenly started to
> >> consume a lot of CPU (the JVM had been running for a couple of weeks),
> >> and the application was more sluggish as well. Even after removing the
> >> load towards the instance, the JIT compiler continued to consume CPU
> >> for 5 more minutes. Then it stopped. But when we applied load again,
> >> the JIT compiler also started to consume CPU again.
> >>
> >> Does anyone recognize this type of issue, or is it a known issue?
> >> I currently do not dare to restart the application, since I suspect the
> >> issue would then vanish and I would have no way of analyzing it
> >> further. That limits my options...
> >>
> >> Here are two flamegraph pictures from when the VM is (more or less) idle:
> >> [image: fg_idle_overfiew.png]
> >>
> >> [image: fg_idle_compilebroker.png]
> >>
> >> And in a flamegraph with load I noticed that more time was spent in
> >> interpreted mode as well.
> >> It might be worth pointing out that when this issue happens the code
> >> cache utilization is 115 MB out of 128 MB (our reserved size). But I
> >> also noticed that the high-water mark was 127.1 MB.
> >> Is this a clue to why the compiler is erratic? I'm thinking of bumping
> >> the reserved code cache to 256 MB as well.
> >>
> >> This is what we're running on:
> >> The VM is running on RHEL 7, using 8 CPUs and 12 GB of RAM.
> >>
> >> *java version "1.8.0_231"*
> >> Java(TM) SE Runtime Environment (build 1.8.0_231-b32)
> >> Java HotSpot(TM) 64-Bit Server VM (build 25.231-b32, mixed mode)
> >>
> >> *jcmd VM.flags:*
> >> -XX:+AlwaysPreTouch
> >> -XX:CICompilerCount=4
> >> -XX:+CMSEdenChunksRecordAlways
> >> -XX:CMSInitiatingOccupancyFraction=80
> >> -XX:+CMSParallelInitialMarkEnabled
> >> -XX:+CMSScavengeBeforeRemark
> >> -XX:CMSWaitDuration=60000
> >> -XX:CompressedClassSpaceSize=33554432
> >> -XX:+DebugNonSafepoints
> >> -XX:+DisableExplicitGC
> >> -XX:ErrorFile=<error-file>
> >> -XX:GCLogFileSize=31457280
> >> -XX:InitialHeapSize=6215958528
> >> -XX:MaxHeapSize=6215958528
> >> -XX:MaxMetaspaceSize=268435456
> >> -XX:MaxNewSize=1034944512
> >> -XX:MaxTenuringThreshold=6
> >> -XX:MetaspaceSize=268435456
> >> -XX:MinHeapDeltaBytes=196608
> >> -XX:NewSize=1034944512
> >> -XX:NumberOfGCLogFiles=3
> >> -XX:OldPLABSize=16
> >> -XX:OldSize=5181014016
> >> -XX:ParGCCardsPerStrideChunk=2048
> >> -XX:+PreserveFramePointer
> >> -XX:+PrintGC
> >> -XX:+PrintGCDateStamps
> >> -XX:+PrintGCDetails
> >> -XX:+PrintGCTimeStamps
> >> -XX:+PrintTenuringDistribution
> >> *-XX:ReservedCodeCacheSize=134217728 *
> >> *-XX:+TieredCompilation*
> >> -XX:+UnlockDiagnosticVMOptions
> >> -XX:+UseBiasedLocking
> >> -XX:+UseCMSInitiatingOccupancyOnly
> >> -XX:+UseCompressedClassPointers
> >> -XX:+UseCompressedOops
> >> -XX:+UseConcMarkSweepGC
> >> -XX:+UseFastUnorderedTimeStamps
> >> -XX:+UseGCLogFileRotation
> >> -XX:+UseParNewGC
> >>
> >>
> >> Best Regards,
> >> Gustav Åkesson
>