Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)

Sun Aug 2 18:11:31 UTC 2015

I filed:  https://bugs.openjdk.java.net/browse/JDK-8132849

thanks!
-- ramki

On Fri, Jul 31, 2015 at 2:28 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com
> wrote:

> Got it. Yes, it is issue with thousands java threads.
> You are the first pointing this problem. File bug on compiler. We will
> look what we can do. Most likely we need parallelize this work.
>
> Method's hotness is used only for UseCodeCacheFlushing. You can try to
> guard Threads::nmethods_do(&set_hotness_closure); with this flag and switch
> it off.
>
> We need mark_as_seen_on_stack so leave it.
>
> Thanks,
> Vladimir
>
>
> On 7/31/15 11:48 AM, Srinivas Ramakrishna wrote:
>
>> Hi Vladimir --
>>
>> I noticed the increase even with Initial and Reserved set to the default
>> of 240 MB, but actual usage much lower (less than a quarter).
>>
>> Look at this code path. Note that this is invoked at every safepoint
>> (although it says "periodically" in the comment).
>> In the mark_active_nmethods() method, there's a thread iteration in both
>> branches of the if. I haven't checked to
>> see which of the two was the culprit here, yet (if either).
>>
>> // Various cleaning tasks that should be done periodically at safepoints
>>
>> void SafepointSynchronize::do_cleanup_tasks() {
>>
>> ....
>>
>>    {
>>
>>      TraceTime t4("mark nmethods", TraceSafepointCleanupTime);
>>
>>      NMethodSweeper::mark_active_nmethods();
>>
>>    }
>>
>> ..
>>
>> }
>>
>>
>> void NMethodSweeper::mark_active_nmethods() {
>>
>>   ...
>>
>>    if (!sweep_in_progress()) {
>>
>>      _seen = 0;
>>
>>      _sweep_fractions_left = NmethodSweepFraction;
>>
>>      _current = CodeCache::first_nmethod();
>>
>>      _traversals += 1;
>>
>>      _total_time_this_sweep = Tickspan();
>>
>>
>>      if (PrintMethodFlushing) {
>>
>>        tty->print_cr("### Sweep: stack traversal %d", _traversals);
>>
>>      }
>>
>>      Threads::nmethods_do(&mark_activation_closure);
>>
>>
>>    } else {
>>
>>      // Only set hotness counter
>>
>>      Threads::nmethods_do(&set_hotness_closure);
>>
>>    }
>>
>>
>>    OrderAccess::storestore();
>>
>> }
>>
>>
>> On Fri, Jul 31, 2015 at 11:43 AM, Vladimir Kozlov
>> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>> wrote:
>>
>>     Hi Ramki,
>>
>>     Did you fill up CodeCache? It start scanning aggressive only with
>>     full CodeCache:
>>
>>        // Force stack scanning if there is only 10% free space in the
>>     code cache.
>>        // We force stack scanning only non-profiled code heap gets full,
>>     since critical
>>        // allocation go to the non-profiled heap and we must be make
>>     sure that there is
>>        // enough space.
>>        double free_percent = 1 /
>>     CodeCache::reverse_free_ratio(CodeBlobType::MethodNonProfiled) * 100;
>>        if (free_percent <= StartAggressiveSweepingAt) {
>>          do_stack_scanning();
>>        }
>>
>>     Vladimir
>>
>>     On 7/31/15 11:33 AM, Srinivas Ramakrishna wrote:
>>
>>
>>         Yes.
>>
>>
>>         On Fri, Jul 31, 2015 at 11:31 AM, Vitaly Davidovich
>>         <vitalyd at gmail.com <mailto:vitalyd at gmail.com>
>>         <mailto:vitalyd at gmail.com <mailto:vitalyd at gmail.com>>> wrote:
>>
>>              Ramki, are you running tiered compilation?
>>
>>              sent from my phone
>>
>>              On Jul 31, 2015 2:19 PM, "Srinivas Ramakrishna"
>>         <ysr1729 at gmail.com <mailto:ysr1729 at gmail.com>
>>              <mailto:ysr1729 at gmail.com <mailto:ysr1729 at gmail.com>>>
>> wrote:
>>
>>
>>                  Hello GC and Compiler teams!
>>
>>                  One of our services that runs with several thousand
>> threads
>>                  recently noticed an increase
>>                  in safepoint stop times, but not gc times, upon
>>         transitioning to
>>                  JDK 8.
>>
>>                  Further investigation revealed that most of the delta was
>>                  related to the so-called
>>                  pre-gc/vmop "cleanup" phase when various book-keeping
>>         activities
>>                  are performed,
>>                  and more specifically in the portion that walks java
>> thread
>>                  stacks single-threaded (!)
>>                  and updates the hotness counters for the active
>>         nmethods. This
>>                  code appears to
>>                  be new to JDK 8 (in jdk 7 one would walk the stacks
>>         only during
>>                  code cache sweeps).
>>
>>                  I have two questions:
>>                  (1) has anyone else (typically, I'd expect applications
>>         with
>>                  many hundreds or thousands of threads)
>>                  noticed this regression?
>>                  (2) Can we do better, for example, by:
>>                         (a) doing these updates by walking thread stacks
>> in
>>                  multiple worker threads in parallel, or best of all:
>>                         (b) doing these updates when we walk the thread
>>         stacks
>>                  during GC, and skipping this phase entirely
>>                               for non-GC safepoints (with attendant loss
>> in
>>                  frequency of this update in low GC frequency
>>                               scenarios).
>>
>>                  It seems kind of silly to do GC's with many multiple
>> worker
>>                  threads, but do these thread stack
>>                  walks single-threaded when it is embarrasingly parallel
>>         (one
>>                  could predicate the parallelization
>>                  based on the measured stack sizes and thread population,
>> if
>>                  there was concern on the ovrhead of
>>                  activating and deactivating the thread gangs for the
>> work).
>>
>>                  A followup question: Any guesses as to how code cache
>>                  sweep/eviction quality might be compromised if one
>>                  were to dispense with these hotness updates entirely
>>         (or at a
>>                  much reduced frequency), as a temporary
>>                  workaround to the performance problem?
>>
>>                  Thoughts/Comments? In particular, has this issue been
>>         addressed
>>                  perhaps in newer JVMs?
>>
>>                  Thanks for any comments, feedback, pointers!
>>                  -- ramki
>>
>>                  PS: for comparison, here's data with
>>         +TraceSafepointCleanup from
>>                  JDK 7 (first, where this isn't done)
>>                  vs JDK 8 (where this is done) with a program that has a
>> few
>>                  thousands of threads:
>>
>>
>>
>>                  JDK 7:
>>                  ..
>>                  2827.308: [sweeping nmethods, 0.0000020 secs]
>>                  2828.679: [sweeping nmethods, 0.0000030 secs]
>>                  2829.984: [sweeping nmethods, 0.0000030 secs]
>>                  2830.956: [sweeping nmethods, 0.0000030 secs]
>>                  ..
>>
>>                  JDK 8:
>>                  ..
>>                  7368.634: [mark nmethods, 0.0177030 secs]
>>                  7369.587: [mark nmethods, 0.0178305 secs]
>>                  7370.479: [mark nmethods, 0.0180260 secs]
>>                  7371.503: [mark nmethods, 0.0186494 secs]
>>                  ..
>>
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20150802/419d5450/attachment.htm>