<div dir="ltr">OK, will do and add you as watcher; thanks Vladimir! (don't yet know if with tiered and a necessarily bounded, if large, code cache whether flushing will in fact eventually become necessary, wrt yr suggested temporary workaround.)<div><br></div><div>Have a good weekend!</div><div>-- ramki</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jul 31, 2015 at 2:28 PM, Vladimir Kozlov <span dir="ltr"><<a href="mailto:vladimir.kozlov@oracle.com" target="_blank">vladimir.kozlov@oracle.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Got it. Yes, it is issue with thousands java threads.<br>
You are the first to point out this problem. Please file a bug against the compiler. We will look at what we can do. Most likely we need to parallelize this work.<br>
<br>
A method's hotness is used only for UseCodeCacheFlushing. You can try to guard Threads::nmethods_do(&set_hotness_closure); with this flag and switch the flag off.<br>
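<br>
Roughly, a sketch of that guard (untested; just the shape, replacing the else branch of mark_active_nmethods() quoted below):<br>
<br>
  } else if (UseCodeCacheFlushing) {<br>
    // Only set hotness counter. With -XX:-UseCodeCacheFlushing this branch is<br>
    // skipped, avoiding the single-threaded walk over all Java thread stacks.<br>
    Threads::nmethods_do(&set_hotness_closure);<br>
  }<br>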
<br>
We need mark_as_seen_on_stack, so leave that as it is.<br>
<br>
Thanks,<br>
Vladimir<div><div class="h5"><br>
<br>
On 7/31/15 11:48 AM, Srinivas Ramakrishna wrote:<br>
</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">
Hi Vladimir --<br>
<br>
I noticed the increase even with InitialCodeCacheSize and ReservedCodeCacheSize set to the<br>
default of 240 MB, but with actual usage much lower (less than a quarter of that).<br>
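<br>
(For concreteness, the settings in question; 240 MB is the JDK 8 default reserved<br>
size with tiered compilation, and -XX:+PrintCodeCache reports the cache bounds and<br>
usage at VM exit:)<br>
<br>
  java -XX:InitialCodeCacheSize=240m -XX:ReservedCodeCacheSize=240m -XX:+PrintCodeCache ...<br>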
<br>
Look at this code path. Note that it is invoked at every safepoint (although the<br>
comment says "periodically"). In mark_active_nmethods(), there's a thread iteration<br>
in both branches of the if. I haven't yet checked which of the two is the culprit<br>
here (if either).<br>
<br>
// Various cleaning tasks that should be done periodically at safepoints<br>
void SafepointSynchronize::do_cleanup_tasks() {<br>
  ....<br>
  {<br>
    TraceTime t4("mark nmethods", TraceSafepointCleanupTime);<br>
    NMethodSweeper::mark_active_nmethods();<br>
  }<br>
  ..<br>
}<br>
<br>
void NMethodSweeper::mark_active_nmethods() {<br>
  ...<br>
  if (!sweep_in_progress()) {<br>
    _seen = 0;<br>
    _sweep_fractions_left = NmethodSweepFraction;<br>
    _current = CodeCache::first_nmethod();<br>
    _traversals += 1;<br>
    _total_time_this_sweep = Tickspan();<br>
<br>
    if (PrintMethodFlushing) {<br>
      tty->print_cr("### Sweep: stack traversal %d", _traversals);<br>
    }<br>
    Threads::nmethods_do(&mark_activation_closure);<br>
  } else {<br>
    // Only set hotness counter<br>
    Threads::nmethods_do(&set_hotness_closure);<br>
  }<br>
<br>
  OrderAccess::storestore();<br>
}<br>
<br>
<br>
On Fri, Jul 31, 2015 at 11:43 AM, Vladimir Kozlov<br></div></div><span class="">
<<a href="mailto:vladimir.kozlov@oracle.com" target="_blank">vladimir.kozlov@oracle.com</a> <mailto:<a href="mailto:vladimir.kozlov@oracle.com" target="_blank">vladimir.kozlov@oracle.com</a>>> wrote:<br>
<br>
Hi Ramki,<br>
<br>
Did you fill up the CodeCache? It starts scanning aggressively only with a<br>
full CodeCache:<br>
<br>
// Force stack scanning if there is only 10% free space in the code cache.<br>
// We force stack scanning only if the non-profiled code heap gets full, since<br>
// critical allocations go to the non-profiled heap and we must make sure that<br>
// there is enough space.<br>
double free_percent = 1 / CodeCache::reverse_free_ratio(CodeBlobType::MethodNonProfiled) * 100;<br>
if (free_percent <= StartAggressiveSweepingAt) {<br>
  do_stack_scanning();<br>
}<br>
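<br>
To make the trigger concrete (assuming reverse_free_ratio() returns the heap's<br>
capacity divided by its unallocated space, so the reciprocal is the free fraction):<br>
<br>
  // Worked example with made-up numbers: a 240 MB non-profiled heap, 24 MB free:<br>
  //   reverse_free_ratio = 240 / 24 = 10<br>
  //   free_percent       = 1 / 10 * 100 = 10%<br>
  // With the default StartAggressiveSweepingAt=10, 10% <= 10% forces a stack scan.<br>
  // At the usage you report (well under a quarter of 240 MB), free_percent is far<br>
  // higher, so this path should not be firing.<br>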
<br>
Vladimir<br>
<br>
On 7/31/15 11:33 AM, Srinivas Ramakrishna wrote:<br>
<br>
<br>
Yes.<br>
<br>
<br>
On Fri, Jul 31, 2015 at 11:31 AM, Vitaly Davidovich<br>
<<a href="mailto:vitalyd@gmail.com" target="_blank">vitalyd@gmail.com</a> <mailto:<a href="mailto:vitalyd@gmail.com" target="_blank">vitalyd@gmail.com</a>><br></span><span class="">
<mailto:<a href="mailto:vitalyd@gmail.com" target="_blank">vitalyd@gmail.com</a> <mailto:<a href="mailto:vitalyd@gmail.com" target="_blank">vitalyd@gmail.com</a>>>> wrote:<br>
<br>
Ramki, are you running tiered compilation?<br>
<br>
sent from my phone<br>
<br>
On Jul 31, 2015 2:19 PM, "Srinivas Ramakrishna"<br>
<<a href="mailto:ysr1729@gmail.com" target="_blank">ysr1729@gmail.com</a> <mailto:<a href="mailto:ysr1729@gmail.com" target="_blank">ysr1729@gmail.com</a>><br></span><div><div class="h5">
<mailto:<a href="mailto:ysr1729@gmail.com" target="_blank">ysr1729@gmail.com</a> <mailto:<a href="mailto:ysr1729@gmail.com" target="_blank">ysr1729@gmail.com</a>>>> wrote:<br>
<br>
<br>
Hello GC and Compiler teams!<br>
<br>
One of our services that runs with several thousand threads recently noticed an<br>
increase in safepoint stop times, but not GC times, upon transitioning to JDK 8.<br>
<br>
Further investigation revealed that most of the delta was related to the so-called<br>
pre-GC/vmop "cleanup" phase, when various book-keeping activities are performed, and<br>
more specifically to the portion that walks Java thread stacks single-threaded (!)<br>
and updates the hotness counters for the active nmethods. This code appears to be<br>
new in JDK 8 (in JDK 7 one would walk the stacks only during code cache sweeps).<br>
<br>
I have two questions:<br>
(1) Has anyone else (typically, I'd expect applications with many hundreds or<br>
thousands of threads) noticed this regression?<br>
(2) Can we do better, for example, by:<br>
    (a) doing these updates by walking thread stacks in multiple worker threads<br>
        in parallel (sketched below), or, best of all:<br>
    (b) doing these updates when we walk the thread stacks during GC, and<br>
        skipping this phase entirely for non-GC safepoints (with an attendant<br>
        loss in update frequency in low-GC-frequency scenarios).<br>
<br>
It seems kind of silly to do GCs with many worker threads, but to do these thread-stack<br>
walks single-threaded when the work is embarrassingly parallel (one could predicate the<br>
parallelization on the measured stack sizes and thread population, if there were concerns<br>
about the overhead of activating and deactivating the thread gangs for the work).<br>
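<br>
Concretely, a hypothetical sketch of (2a) reusing HotSpot's GC work-gang machinery.<br>
AbstractGangTask, Atomic::add and JavaThread::nmethods_do exist; java_thread_at()<br>
and the claiming scheme are invented here purely for illustration:<br>
<br>
class MarkActiveNMethodsTask : public AbstractGangTask {<br>
  CodeBlobClosure* _cl;<br>
  volatile jint    _next;  // index of the next unclaimed thread<br>
 public:<br>
  MarkActiveNMethodsTask(CodeBlobClosure* cl)<br>
    : AbstractGangTask("parallel mark nmethods"), _cl(cl), _next(0) {}<br>
  virtual void work(uint worker_id) {<br>
    // Each gang worker claims one thread at a time until none remain;<br>
    // java_thread_at() is assumed to return NULL past the end of the list.<br>
    for (;;) {<br>
      jint i = Atomic::add(1, &_next) - 1;<br>
      JavaThread* t = Threads::java_thread_at(i);  // hypothetical accessor<br>
      if (t == NULL) return;<br>
      t->nmethods_do(_cl);  // the same per-thread stack walk, now in parallel<br>
    }<br>
  }<br>
};<br>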
<br>
A follow-up question: any guesses as to how code cache sweep/eviction quality might<br>
be compromised if one were to dispense with these hotness updates entirely (or<br>
perform them at a much reduced frequency), as a temporary workaround to the<br>
performance problem?<br>
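<br>
One hypothetical shape for the "much reduced frequency" variant, in the else branch<br>
of mark_active_nmethods(); the counter and HotnessUpdateInterval are invented names,<br>
not existing HotSpot flags:<br>
<br>
  } else {<br>
    // Update hotness only on every Nth cleanup safepoint instead of every one.<br>
    static uint cleanups = 0;<br>
    const uint HotnessUpdateInterval = 32;  // invented constant<br>
    if (++cleanups % HotnessUpdateInterval == 0) {<br>
      Threads::nmethods_do(&set_hotness_closure);<br>
    }<br>
  }<br>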
<br>
Thoughts/comments? In particular, has this issue perhaps been addressed in newer JVMs?<br>
<br>
Thanks for any comments, feedback, pointers!<br>
-- ramki<br>
<br>
PS: For comparison, here's data with -XX:+TraceSafepointCleanupTime from JDK 7<br>
(first, where this stack walk isn't done) vs. JDK 8 (where it is), for a program<br>
that has a few thousand threads:<br>
<br>
<br>
<br>
JDK 7:<br>
..<br>
2827.308: [sweeping nmethods, 0.0000020 secs]<br>
2828.679: [sweeping nmethods, 0.0000030 secs]<br>
2829.984: [sweeping nmethods, 0.0000030 secs]<br>
2830.956: [sweeping nmethods, 0.0000030 secs]<br>
..<br>
<br>
JDK 8:<br>
..<br>
7368.634: [mark nmethods, 0.0177030 secs]<br>
7369.587: [mark nmethods, 0.0178305 secs]<br>
7370.479: [mark nmethods, 0.0180260 secs]<br>
7371.503: [mark nmethods, 0.0186494 secs]<br>
..<br>
<br>
<br>
<br>
</div></div></blockquote>
</blockquote></div><br></div>