Deoptimization taking up most CPU cycles
Haozhun Jin
haozhun.jin at gmail.com
Mon Jun 6 16:58:40 UTC 2016
We are running a distributed system of about 30 hosts. We observe that a few
(~10%) of the hosts become unbearably slow after running for ~3 weeks
(10-50X
slower compared to hosts the are healthy). Restart of LXC container where
JVM
runs in on the unhealthy hosts would cure them. (Our system is configured
in a
way that restarting JVM alone is impossible.) Our system generates byte code
and load them on the fly. But we have caches that keep generated code
around if
code for the same work need to be generated again.
We believe this is a JVM problem because we see that a significant amount of
time is spent in VM deopt logic once the host becomes very slow. An excerpt
of
the perf result is at the end of this email.
Since we see de-opt, we wondered if JVM is repeatedly optimizing and
deoptimizing back and forth. We used `jstat -printcompilation <pid>` to
compare
the number of compilations before and after we run a new piece of work. (We
didn't generate new code for the work. Class comes from cache.) The number
doens't change.
If we slightly tweak the work to force a cache miss and generate new code,
the
compile count will increase. But its performance remains slow. (I made 10
tries. Each generate new classes. In one of them, 1 of the 4 unhealthy hosts
finishes its work in reasonable time. But all 4 unhealthy hosts remain slow
in
the other 9 tries. For that particular work that triggered one of the hosts
to finish in reasonable time, the same work will consistently be fast on
that
particular host.)
We didn't turn on de-opt logging when we started JVM. But we were able to
get
them through GDB. Below is the 2nd, 3rd, 5th, 6th, and 7th argument of
Events::log_deopt_message.
$34746 = 0x7f7a7cc67bf0 "Uncommon trap: reason=%s action=%s pc=0x%016lx
method=%s @ %d"
$34747 = 0x7f7a7cc67905 "unstable_if"
$34748 = 0x7f7a7cc4d0ba "none"
$34749 = 0x7f7a14f422c0
"com_facebook_presto_$gen_PageProcessor_440742.filter(Lcom/facebook/presto/spi/ConnectorSession;Lcom/facebook/presto/spi/block/Block;Lcom/facebook/presto/spi/block/Block;Lcom/facebook/presto/spi/block/"...
$34750 = 450
You can find the class referenced above at
https://gist.github.com/haozhun/920b28932ef253b55f3f497e2d187286
along with its text representation. To summarize, at bci 450, the branch
predicts ~5% one way, ~95% the rest. When one branch is taken, the same
branch
will most likely continue to be taken for the same branch for a while.
My questions:
* Is there anything else we could look at while the process is still
running?
(we have kept 3 of them around, for now)
* Are there any options we can turn on next time we start the process that
could help troubleshoot this when it happens again?
* Is the behavior we’re seeing, namely, repeated uncommon traps for the same
method for “unstable if” with no apparent deoptimization/further
re-compilations normal?
* For my understanding, what does action “none” mean for “unstable if”?
Java version:
java version "1.8.0_72"
Java(TM) SE Runtime Environment (build 1.8.0_72-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.72-b15, mixed mode)
Excerpt of JVM options:
-server
-Xmx120G
-Xms120G
-Xss2048k
-XX:+PreserveFramePointer
-XX:+UseG1GC
-XX:+UnlockExperimentalVMOptions
-XX:G1MaxNewSizePercent=20
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+PrintReferenceGC
-XX:+PrintGCCause
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintClassHistogramAfterFullGC
-XX:+PrintClassHistogramBeforeFullGC
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintSafepointStatistics
-XX:-OmitStackTraceInFastThrow
-XX:+UnlockDiagnosticVMOptions
-XX:+UnlockExperimentalVMOptions
-XX:-UseBiasedLocking
-XX:ReservedCodeCacheSize=512m
Excerpt of perf reuslt:
Samples: 97K of event 'cycles', Event count (approx.): 45248264720
Overhead Command Shared Object Symbol
+ 4.99% swapper [kernel.kallsyms] [k] intel_idle
- 4.40% presto-facebook libc-2.12.so [.] _IO_strn_overflow
_IO_strn_overflow
_IO_default_xsputn
vfprintf
_IO_vsnprintf
Events::log_deopt_message
Deoptimization::uncommon_trap_inner
Deoptimization::uncommon_trap
UncommonTrapBlob
- 4.30% presto-facebook libc-2.12.so [.] vfprintf
vfprintf
- _IO_vsnprintf
+ 60.98% Events::log_deopt_message
+ 39.02% Events::log
+ 3.93% presto-facebook perf-994763.map [.] Interpreter
- 3.55% presto-facebook libjvm.so [.] vframeArray::allocate
vframeArray::allocate
Deoptimization::fetch_unroll_info_helper
UncommonTrapBlob
- 3.40% presto-facebook libc-2.12.so [.] _IO_default_xsputn
- _IO_default_xsputn
- 93.51% vfprintf
- _IO_vsnprintf
+ 83.39% Events::log_deopt_message
+ 16.61% Events::log
+ 6.49% _IO_padn
- 3.25% presto-facebook libc-2.12.so [.] _int_malloc
_int_malloc
malloc
- os::malloc
+ 41.90% Deoptimization::fetch_unroll_info_helper
+ 24.60% CHeapObj<(MemoryType)6>::operator new
+ 19.49% vframeArray::allocate
+ 14.01% CHeapObj<(MemoryType)7>::operator new
- 2.85% presto-facebook libc-2.12.so [.] malloc
malloc
- os::malloc
+ 41.94% Deoptimization::fetch_unroll_info_helper
+ 21.75% vframeArray::allocate
+ 19.94% CHeapObj<(MemoryType)7>::operator new
+ 16.37% CHeapObj<(MemoryType)6>::operator new
- 2.34% presto-facebook libc-2.12.so [.] malloc_consolidate
malloc_consolidate
_int_malloc
malloc
- os::malloc
+ 64.23% vframeArray::allocate
+ 35.77% Deoptimization::fetch_unroll_info_helper
- 2.04% presto-facebook libjvm.so [.]
resource_allocate_bytes
- resource_allocate_bytes
+ 25.88% vframeArrayElement::fill_in
+ 20.02% StackValue::create_stack_value
+ 19.86% ScopeValue::read_from
+ 8.19% GenericGrowableArray::raw_allocate
+ 3.79% nmethod::scope_desc_at
+ 3.69% compiledVFrame::monitors
+ 3.55% OopMapSet::update_register_map
+ 3.30% ScopeDesc::decode_scope_values
+ 2.05% compiledVFrame::locals
+ 2.04% compiledVFrame::expressions
+ 1.87% vframe::new_vframe
+ 1.52% ScopeDesc::decode_body
+ 1.05% Symbol::as_C_string
+ 0.99% Deoptimization::fetch_unroll_info_helper
+ 0.73% Symbol::as_klass_external_name
+ 0.67% DataLayout::data_in
- 1.70% presto-facebook libc-2.12.so [.] _int_free
_int_free
- Deoptimization::cleanup_deopt_info
More information about the hotspot-dev
mailing list