Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)

Srinivas Ramakrishna ysr1729 at gmail.com
Fri Jul 31 18:48:53 UTC 2015


Hi Vladimir --

I noticed the increase even with the initial and reserved code cache sizes at
the default of 240 MB, while actual usage was much lower (less than a quarter of that).
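
As a sanity check against the aggressive-sweeping trigger you quote below
(illustrative numbers only: this assumes the JDK 8 default StartAggressiveSweepingAt
of 10 and treats the whole 240 MB code cache as a single heap, whereas the real check
is per code heap via reverse_free_ratio):

#include <cstdio>

int main() {
  const double reserved_mb = 240.0;   // reserved code cache (default)
  const double used_mb     = 60.0;    // "less than a quarter" in use
  const double threshold   = 10.0;    // StartAggressiveSweepingAt, in percent free

  // free_percent is roughly free / reserved * 100
  double free_percent = (reserved_mb - used_mb) / reserved_mb * 100.0;  // ~75%
  std::printf("free_percent = %.0f%%, force stack scanning: %s\n",
              free_percent, free_percent <= threshold ? "yes" : "no");  // prints "no"
  return 0;
}

So that path shouldn't be getting forced here; the cost appears to come from the
unconditional per-safepoint walk shown below.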

Look at this code path. Note that it is invoked at every safepoint
(although the comment says "periodically").
In mark_active_nmethods(), there's a thread iteration in both
branches of the if. I haven't yet checked which of the two was the
culprit here (if either).

// Various cleaning tasks that should be done periodically at safepoints

void SafepointSynchronize::do_cleanup_tasks() {
  ....
  {
    TraceTime t4("mark nmethods", TraceSafepointCleanupTime);
    NMethodSweeper::mark_active_nmethods();
  }
  ..
}


void NMethodSweeper::mark_active_nmethods() {
  ...
  if (!sweep_in_progress()) {
    _seen = 0;
    _sweep_fractions_left = NmethodSweepFraction;
    _current = CodeCache::first_nmethod();
    _traversals += 1;
    _total_time_this_sweep = Tickspan();

    if (PrintMethodFlushing) {
      tty->print_cr("### Sweep: stack traversal %d", _traversals);
    }
    Threads::nmethods_do(&mark_activation_closure);

  } else {
    // Only set hotness counter
    Threads::nmethods_do(&set_hotness_closure);
  }

  OrderAccess::storestore();
}
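
For anyone not steeped in that code: Threads::nmethods_do is, in effect, a doubly
nested loop over all Java threads and their stack frames, executed on the single VM
thread. Roughly along these lines (a simplified sketch; the types and names are
illustrative stand-ins, not the real HotSpot declarations):

#include <vector>

struct Nmethod    { int hotness_counter; };
struct Frame      { Nmethod* nm; bool is_compiled() const { return nm != nullptr; } };
struct JavaThread { std::vector<Frame> stack; };

// Conceptually what Threads::nmethods_do(&closure) amounts to at the safepoint:
// the single VM thread visits every Java thread and every frame of its stack,
// applying the closure (mark activation / reset hotness) to compiled frames.
void nmethods_do_sketch(std::vector<JavaThread>& all_java_threads, int reset_val) {
  for (JavaThread& t : all_java_threads) {          // O(#threads), single-threaded
    for (Frame& f : t.stack) {                      // O(stack depth)
      if (f.is_compiled()) {
        f.nm->hotness_counter = reset_val;          // e.g. what the hotness closure does
      }
    }
  }
}

Total work is O(#threads * average stack depth), all on the VM thread while every
other thread is stopped, which lines up with the ~18 ms "mark nmethods" times in
the JDK 8 log quoted below.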

On Fri, Jul 31, 2015 at 11:43 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:

> Hi Ramki,
>
> Did you fill up the CodeCache? It starts scanning aggressively only with a full
> CodeCache:
>
>   // Force stack scanning if there is only 10% free space in the code cache.
>   // We force stack scanning only when the non-profiled code heap gets full,
>   // since critical allocations go to the non-profiled heap and we must make
>   // sure that there is enough space.
>   double free_percent = 1 / CodeCache::reverse_free_ratio(CodeBlobType::MethodNonProfiled) * 100;
>   if (free_percent <= StartAggressiveSweepingAt) {
>     do_stack_scanning();
>   }
>
> Vladimir
>
> On 7/31/15 11:33 AM, Srinivas Ramakrishna wrote:
>
>>
>> Yes.
>>
>>
>> On Fri, Jul 31, 2015 at 11:31 AM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>>
>>     Ramki, are you running tiered compilation?
>>
>>     sent from my phone
>>
>>     On Jul 31, 2015 2:19 PM, "Srinivas Ramakrishna" <ysr1729 at gmail.com> wrote:
>>
>>
>>         Hello GC and Compiler teams!
>>
>>         One of our services that runs with several thousand threads
>>         recently noticed an increase
>>         in safepoint stop times, but not gc times, upon transitioning to
>>         JDK 8.
>>
>>         Further investigation revealed that most of the delta was
>>         related to the so-called
>>         pre-gc/vmop "cleanup" phase when various book-keeping activities
>>         are performed,
>>         and more specifically in the portion that walks java thread
>>         stacks single-threaded (!)
>>         and updates the hotness counters for the active nmethods. This
>>         code appears to
>>         be new to JDK 8 (in jdk 7 one would walk the stacks only during
>>         code cache sweeps).
>>
>>         I have two questions:
>>         (1) has anyone else (typically, I'd expect applications with
>>         many hundreds or thousands of threads)
>>         noticed this regression?
>>         (2) Can we do better, for example, by:
>>                (a) doing these updates by walking thread stacks in
>>         multiple worker threads in parallel, or best of all:
>>                (b) doing these updates when we walk the thread stacks
>>         during GC, and skipping this phase entirely
>>                      for non-GC safepoints (with attendant loss in
>>         frequency of this update in low GC frequency
>>                      scenarios).
>>
>>         It seems kind of silly to do GCs with many parallel worker
>>         threads, but do these thread stack
>>         walks single-threaded when the work is embarrassingly parallel (one
>>         could predicate the parallelization
>>         based on the measured stack sizes and thread population, if
>>         there were concerns about the overhead of
>>         activating and deactivating the thread gangs for the work).
>>
>>         A followup question: Any guesses as to how code cache
>>         sweep/eviction quality might be compromised if one
>>         were to dispense with these hotness updates entirely (or perform
>>         them at a much reduced frequency), as a temporary
>>         workaround to the performance problem?
>>
>>         Thoughts/Comments? In particular, has this issue been addressed
>>         perhaps in newer JVMs?
>>
>>         Thanks for any comments, feedback, pointers!
>>         -- ramki
>>
>>         PS: for comparison, here's data with +TraceSafepointCleanup from
>>         JDK 7 (first, where this isn't done)
>>         vs JDK 8 (where this is done) with a program that has a few
>>         thousand threads:
>>
>>
>>
>>         JDK 7:
>>         ..
>>         2827.308: [sweeping nmethods, 0.0000020 secs]
>>         2828.679: [sweeping nmethods, 0.0000030 secs]
>>         2829.984: [sweeping nmethods, 0.0000030 secs]
>>         2830.956: [sweeping nmethods, 0.0000030 secs]
>>         ..
>>
>>         JDK 8:
>>         ..
>>         7368.634: [mark nmethods, 0.0177030 secs]
>>         7369.587: [mark nmethods, 0.0178305 secs]
>>         7370.479: [mark nmethods, 0.0180260 secs]
>>         7371.503: [mark nmethods, 0.0186494 secs]
>>         ..
>>
>>
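
To make suggestion (2a) above a little more concrete: the walk could be strip-mined
over a worker gang rather than run on the VM thread alone. A rough sketch, with the
caveat that none of these names are real HotSpot APIs, and that one would presumably
gate the parallelism on thread count and measured stack sizes as noted above:

#include <atomic>
#include <thread>
#include <vector>

struct JavaThreadStub { /* stack, etc. */ };

// Walk one Java thread's frames and update nmethod hotness (body elided).
void mark_one_thread(JavaThreadStub& t) { (void)t; }

void parallel_nmethods_do(std::vector<JavaThreadStub>& threads, unsigned n_workers) {
  std::atomic<size_t> next{0};                       // shared claim counter
  auto worker = [&]() {
    // Each worker repeatedly claims the next unprocessed Java thread.
    for (size_t i = next.fetch_add(1); i < threads.size(); i = next.fetch_add(1)) {
      mark_one_thread(threads[i]);                   // each thread is claimed exactly once
    }
  };
  std::vector<std::thread> gang;
  for (unsigned w = 0; w < n_workers; ++w) gang.emplace_back(worker);
  for (auto& t : gang) t.join();                     // all walks done before the VM op proceeds
}

A shared claim counter keeps the per-thread work units dynamically balanced, which
matters when a few threads have much deeper stacks than the rest.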
>>