<div dir="ltr">Hi Vladimir --<div><br></div><div>I noticed the increase even with Initial and Reserved set to the default of 240 MB, but actual usage much lower (less than a quarter).<div><br></div><div>Look at this code path. Note that this is invoked at every safepoint (although it says "periodically" in the comment).</div><div>In the mark_active_nmethods() method, there's a thread iteration in both branches of the if. I haven't checked to</div><div>see which of the two was the culprit here, yet (if either).</div><div><br></div><div>
<p class="p1">// Various cleaning tasks that should be done periodically at safepoints</p>
<p class="p1">void SafepointSynchronize::do_cleanup_tasks() {</p>
<p class="p1">....</p>
<p class="p1">Â {</p>
<p class="p1">Â Â TraceTime t4("mark nmethods", TraceSafepointCleanupTime);</p>
<p class="p1">Â Â NMethodSweeper::mark_active_nmethods();</p>
<p class="p1">Â }</p><p class="p1">..</p><p class="p1">}</p><p class="p1"><br></p><p class="p1">void NMethodSweeper::mark_active_nmethods() {</p><p class="p1">Â ...</p><p class="p1">Â if (!sweep_in_progress()) {<br></p><p class="p1">Â Â _seen = 0;</p><p class="p1">Â Â _sweep_fractions_left = NmethodSweepFraction;</p><p class="p1">Â Â _current = CodeCache::first_nmethod();</p><p class="p1">Â Â _traversals += 1;</p><p class="p1">Â Â _total_time_this_sweep = Tickspan();</p><p class="p2"><br></p><p class="p1">Â Â if (PrintMethodFlushing) {</p><p class="p1">Â Â Â tty->print_cr("### Sweep: stack traversal %d", _traversals);</p><p class="p1">Â Â }</p><p class="p1">Â Â Threads::nmethods_do(&mark_activation_closure);</p><p class="p2"><br></p><p class="p1">Â } else {</p><p class="p1">Â Â // Only set hotness counter</p><p class="p1">Â Â Threads::nmethods_do(&set_hotness_closure);</p><p class="p1">Â }</p><p class="p2"><br></p><p class="p1">Â OrderAccess::storestore();</p><p class="p1">
</p><p class="p1">}</p></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jul 31, 2015 at 11:43 AM, Vladimir Kozlov <span dir="ltr"><<a href="mailto:vladimir.kozlov@oracle.com" target="_blank">vladimir.kozlov@oracle.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Ramki,<br>
<br>
Did you fill up CodeCache? It start scanning aggressive only with full CodeCache:<br>
<br>
 // Force stack scanning if there is only 10% free space in the code cache.<br>
 // We force stack scanning only non-profiled code heap gets full, since critical<br>
 // allocation go to the non-profiled heap and we must be make sure that there is<br>
 // enough space.<br>
 double free_percent = 1 / CodeCache::reverse_free_ratio(CodeBlobType::MethodNonProfiled) * 100;<br>
 if (free_percent <= StartAggressiveSweepingAt) {<br>
  do_stack_scanning();<br>
 }<span class="HOEnZb"><font color="#888888"><br>
<br>
Vladimir</font></span><span class=""><br>
<br>
On 7/31/15 11:33 AM, Srinivas Ramakrishna wrote:<br>
</span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">
<br>
Yes.<br>
<br>
<br>
On Fri, Jul 31, 2015 at 11:31 AM, Vitaly Davidovich <<a href="mailto:vitalyd@gmail.com" target="_blank">vitalyd@gmail.com</a><br></span><span class="">
<mailto:<a href="mailto:vitalyd@gmail.com" target="_blank">vitalyd@gmail.com</a>>> wrote:<br>
<br>
  Ramki, are you running tiered compilation?<br>
<br>
  sent from my phone<br>
<br>
  On Jul 31, 2015 2:19 PM, "Srinivas Ramakrishna" <<a href="mailto:ysr1729@gmail.com" target="_blank">ysr1729@gmail.com</a><br></span><div><div class="h5">
  <mailto:<a href="mailto:ysr1729@gmail.com" target="_blank">ysr1729@gmail.com</a>>> wrote:<br>
<br>
<br>
    Hello GC and Compiler teams!<br>
<br>
    One of our services that runs with several thousand threads<br>
    recently noticed an increase<br>
    in safepoint stop times, but not gc times, upon transitioning to<br>
    JDK 8.<br>
<br>
    Further investigation revealed that most of the delta was<br>
    related to the so-called<br>
    pre-gc/vmop "cleanup" phase when various book-keeping activities<br>
    are performed,<br>
    and more specifically in the portion that walks java thread<br>
    stacks single-threaded (!)<br>
    and updates the hotness counters for the active nmethods. This<br>
    code appears to<br>
    be new to JDK 8 (in jdk 7 one would walk the stacks only during<br>
    code cache sweeps).<br>
<br>
    I have two questions:<br>
    (1) has anyone else (typically, I'd expect applications with<br>
    many hundreds or thousands of threads)<br>
    noticed this regression?<br>
    (2) Can we do better, for example, by:<br>
        (a) doing these updates by walking thread stacks in<br>
    multiple worker threads in parallel, or best of all:<br>
        (b) doing these updates when we walk the thread stacks<br>
    during GC, and skipping this phase entirely<br>
           for non-GC safepoints (with attendant loss in<br>
    frequency of this update in low GC frequency<br>
           scenarios).<br>
<br>
    It seems kind of silly to do GC's with many multiple worker<br>
    threads, but do these thread stack<br>
    walks single-threaded when it is embarrasingly parallel (one<br>
    could predicate the parallelization<br>
    based on the measured stack sizes and thread population, if<br>
    there was concern on the ovrhead of<br>
    activating and deactivating the thread gangs for the work).<br>
<br>
    A followup question: Any guesses as to how code cache<br>
    sweep/eviction quality might be compromised if one<br>
    were to dispense with these hotness updates entirely (or at a<br>
    much reduced frequency), as a temporary<br>
    workaround to the performance problem?<br>
<br>
    Thoughts/Comments? In particular, has this issue been addressed<br>
    perhaps in newer JVMs?<br>
<br>
    Thanks for any comments, feedback, pointers!<br>
    -- ramki<br>
<br>
    PS: for comparison, here's data with +TraceSafepointCleanup from<br>
    JDK 7 (first, where this isn't done)<br>
    vs JDK 8 (where this is done) with a program that has a few<br>
    thousands of threads:<br>
<br>
<br>
<br>
    JDK 7:<br>
    ..<br>
    2827.308: [sweeping nmethods, 0.0000020 secs]<br>
    2828.679: [sweeping nmethods, 0.0000030 secs]<br>
    2829.984: [sweeping nmethods, 0.0000030 secs]<br>
    2830.956: [sweeping nmethods, 0.0000030 secs]<br>
    ..<br>
<br>
    JDK 8:<br>
    ..<br>
    7368.634: [mark nmethods, 0.0177030 secs]<br>
    7369.587: [mark nmethods, 0.0178305 secs]<br>
    7370.479: [mark nmethods, 0.0180260 secs]<br>
    7371.503: [mark nmethods, 0.0186494 secs]<br>
    ..<br>
<br>
<br>
</div></div></blockquote>
</blockquote></div><br></div>