Deoptimization taking up most CPU cycles
    Haozhun Jin 
    haozhun.jin at gmail.com
       
    Sat Jun  4 00:08:42 UTC 2016
    
    
  
We are running a distributed system of about 30 hosts. We observe that a few
(~10%) of the hosts become unbearably slow after running for ~3 weeks
(10-50X
slower compared to hosts the are healthy). Restart of LXC container where
JVM
runs in on the unhealthy hosts would cure them. (Our system is configured
in a
way that restarting JVM alone is impossible.) Our system generates byte code
and load them on the fly. But we have caches that keep generated code
around if
code for the same work need to be generated again.
We believe this is a JVM problem because we see that a significant amount of
time is spent in VM deopt logic once the host becomes very slow. An excerpt
of
the perf result is at the end of this email.
Since we see de-opt, we wondered if JVM is repeatedly optimizing and
deoptimizing back and forth.  We used `jstat -printcompilation <pid>` to
compare
the number of compilations before and after we run a new piece of work. (We
didn't generate new code for the work. Class comes from cache.) The number
doens't change.
If we slightly tweak the work to force a cache miss and generate new code,
the
compile count will increase. But its performance remains slow. (I made 10
tries. Each generate new classes. In one of them, 1 of the 4 unhealthy hosts
finishes its work in reasonable time. But all 4 unhealthy hosts remain slow
in
the other 9 tries. For that particular work that triggered one of the hosts
to finish in reasonable time, the same work will consistently be fast on
that
particular host.)
We didn't turn on de-opt logging when we started JVM. But we were able to
get
them through GDB.  Below is the 2nd, 3rd, 5th, 6th, and 7th argument of
Events::log_deopt_message.
$34746 = 0x7f7a7cc67bf0 "Uncommon trap: reason=%s action=%s pc=0x%016lx
method=%s @ %d"
$34747 = 0x7f7a7cc67905 "unstable_if"
$34748 = 0x7f7a7cc4d0ba "none"
$34749 = 0x7f7a14f422c0
"com_facebook_presto_$gen_PageProcessor_440742.filter(Lcom/facebook/presto/spi/ConnectorSession;Lcom/facebook/presto/spi/block/Block;Lcom/facebook/presto/spi/block/Block;Lcom/facebook/presto/spi/block/"...
$34750 = 450
You can find the class referenced above at
  https://gist.github.com/haozhun/920b28932ef253b55f3f497e2d187286
along with its text representation. To summarize, at bci 450, the branch
predicts ~5% one way, ~95% the rest. When one branch is taken, the same
branch
will most likely continue to be taken for the same branch for a while.
My questions:
* Is there anything else we could look at while the process is still
running?
  (we have kept 3 of them around, for now)
* Are there any options we can turn on next time we start the process that
  could help troubleshoot this when it happens again?
* Is the behavior we’re seeing, namely, repeated uncommon traps for the same
  method for “unstable if” with no apparent deoptimization/further
  re-compilations normal?
* For my understanding, what does action “none” mean for “unstable if”?
Java version:
  java version "1.8.0_72"
  Java(TM) SE Runtime Environment (build 1.8.0_72-b15)
  Java HotSpot(TM) 64-Bit Server VM (build 25.72-b15, mixed mode)
Excerpt of JVM options:
  -server
  -Xmx120G
  -Xms120G
  -Xss2048k
  -XX:+PreserveFramePointer
  -XX:+UseG1GC
  -XX:+UnlockExperimentalVMOptions
  -XX:G1MaxNewSizePercent=20
  -XX:G1HeapRegionSize=32M
  -XX:+ExplicitGCInvokesConcurrent
  -XX:+HeapDumpOnOutOfMemoryError
  -XX:+PrintReferenceGC
  -XX:+PrintGCCause
  -XX:+PrintGCDateStamps
  -XX:+PrintGCTimeStamps
  -XX:+PrintGCDetails
  -XX:+PrintClassHistogramAfterFullGC
  -XX:+PrintClassHistogramBeforeFullGC
  -XX:+PrintAdaptiveSizePolicy
  -XX:+PrintSafepointStatistics
  -XX:-OmitStackTraceInFastThrow
  -XX:+UnlockDiagnosticVMOptions
  -XX:+UnlockExperimentalVMOptions
  -XX:-UseBiasedLocking
  -XX:ReservedCodeCacheSize=512m
Excerpt of perf reuslt:
Samples: 97K of event 'cycles', Event count (approx.): 45248264720
  Overhead  Command          Shared Object         Symbol
+    4.99%  swapper          [kernel.kallsyms]     [k] intel_idle
-    4.40%  presto-facebook  libc-2.12.so          [.] _IO_strn_overflow
     _IO_strn_overflow
     _IO_default_xsputn
     vfprintf
     _IO_vsnprintf
     Events::log_deopt_message
     Deoptimization::uncommon_trap_inner
     Deoptimization::uncommon_trap
     UncommonTrapBlob
-    4.30%  presto-facebook  libc-2.12.so          [.] vfprintf
     vfprintf
   - _IO_vsnprintf
      + 60.98% Events::log_deopt_message
      + 39.02% Events::log
+    3.93%  presto-facebook  perf-994763.map       [.] Interpreter
-    3.55%  presto-facebook  libjvm.so             [.] vframeArray::allocate
     vframeArray::allocate
     Deoptimization::fetch_unroll_info_helper
     UncommonTrapBlob
-    3.40%  presto-facebook  libc-2.12.so          [.] _IO_default_xsputn
   - _IO_default_xsputn
      - 93.51% vfprintf
         - _IO_vsnprintf
            + 83.39% Events::log_deopt_message
            + 16.61% Events::log
      + 6.49% _IO_padn
-    3.25%  presto-facebook  libc-2.12.so          [.] _int_malloc
     _int_malloc
     malloc
   - os::malloc
      + 41.90% Deoptimization::fetch_unroll_info_helper
      + 24.60% CHeapObj<(MemoryType)6>::operator new
      + 19.49% vframeArray::allocate
      + 14.01% CHeapObj<(MemoryType)7>::operator new
-    2.85%  presto-facebook  libc-2.12.so          [.] malloc
     malloc
   - os::malloc
      + 41.94% Deoptimization::fetch_unroll_info_helper
      + 21.75% vframeArray::allocate
      + 19.94% CHeapObj<(MemoryType)7>::operator new
      + 16.37% CHeapObj<(MemoryType)6>::operator new
-    2.34%  presto-facebook  libc-2.12.so          [.] malloc_consolidate
     malloc_consolidate
     _int_malloc
     malloc
   - os::malloc
      + 64.23% vframeArray::allocate
      + 35.77% Deoptimization::fetch_unroll_info_helper
-    2.04%  presto-facebook  libjvm.so             [.]
resource_allocate_bytes
   - resource_allocate_bytes
      + 25.88% vframeArrayElement::fill_in
      + 20.02% StackValue::create_stack_value
      + 19.86% ScopeValue::read_from
      + 8.19% GenericGrowableArray::raw_allocate
      + 3.79% nmethod::scope_desc_at
      + 3.69% compiledVFrame::monitors
      + 3.55% OopMapSet::update_register_map
      + 3.30% ScopeDesc::decode_scope_values
      + 2.05% compiledVFrame::locals
      + 2.04% compiledVFrame::expressions
      + 1.87% vframe::new_vframe
      + 1.52% ScopeDesc::decode_body
      + 1.05% Symbol::as_C_string
      + 0.99% Deoptimization::fetch_unroll_info_helper
      + 0.73% Symbol::as_klass_external_name
      + 0.67% DataLayout::data_in
-    1.70%  presto-facebook  libc-2.12.so          [.] _int_free
     _int_free
   - Deoptimization::cleanup_deopt_info
    
    
More information about the hotspot-dev
mailing list