Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)

Vitaly Davidovich vitalyd at gmail.com
Fri Jul 31 22:08:14 UTC 2015


Ramki, are you actually seeing better peak perf with tiered than C2? I
experimented with it on a real workload and it was a net loss for peak perf
(anywhere from 8-20% worse than C2, but also quite unstable); this was with
a very large code cache to play it safe, but no other tuning.

sent from my phone
On Jul 31, 2015 6:02 PM, "Srinivas Ramakrishna" <ysr1729 at gmail.com> wrote:

> OK, will do and add you as watcher; thanks Vladimir! (don't yet know if
> with tiered and a necessarily bounded, if large, code cache whether
> flushing will in fact eventually become necessary, wrt yr suggested
> temporary workaround.)
>
> Have a good weekend!
> -- ramki
>
> On Fri, Jul 31, 2015 at 2:28 PM, Vladimir Kozlov <
> vladimir.kozlov at oracle.com> wrote:
>
>> Got it. Yes, it is issue with thousands java threads.
>> You are the first pointing this problem. File bug on compiler. We will
>> look what we can do. Most likely we need parallelize this work.
>>
>> Method's hotness is used only for UseCodeCacheFlushing. You can try to
>> guard Threads::nmethods_do(&set_hotness_closure); with this flag and switch
>> it off.
>>
>> We need mark_as_seen_on_stack so leave it.
>>
>> Thanks,
>> Vladimir
>>
>>
>> On 7/31/15 11:48 AM, Srinivas Ramakrishna wrote:
>>
>>> Hi Vladimir --
>>>
>>> I noticed the increase even with Initial and Reserved set to the default
>>> of 240 MB, but actual usage much lower (less than a quarter).
>>>
>>> Look at this code path. Note that this is invoked at every safepoint
>>> (although it says "periodically" in the comment).
>>> In the mark_active_nmethods() method, there's a thread iteration in both
>>> branches of the if. I haven't checked to
>>> see which of the two was the culprit here, yet (if either).
>>>
>>> // Various cleaning tasks that should be done periodically at safepoints
>>>
>>> void SafepointSynchronize::do_cleanup_tasks() {
>>>
>>> ....
>>>
>>>    {
>>>
>>>      TraceTime t4("mark nmethods", TraceSafepointCleanupTime);
>>>
>>>      NMethodSweeper::mark_active_nmethods();
>>>
>>>    }
>>>
>>> ..
>>>
>>> }
>>>
>>>
>>> void NMethodSweeper::mark_active_nmethods() {
>>>
>>>   ...
>>>
>>>    if (!sweep_in_progress()) {
>>>
>>>      _seen = 0;
>>>
>>>      _sweep_fractions_left = NmethodSweepFraction;
>>>
>>>      _current = CodeCache::first_nmethod();
>>>
>>>      _traversals += 1;
>>>
>>>      _total_time_this_sweep = Tickspan();
>>>
>>>
>>>      if (PrintMethodFlushing) {
>>>
>>>        tty->print_cr("### Sweep: stack traversal %d", _traversals);
>>>
>>>      }
>>>
>>>      Threads::nmethods_do(&mark_activation_closure);
>>>
>>>
>>>    } else {
>>>
>>>      // Only set hotness counter
>>>
>>>      Threads::nmethods_do(&set_hotness_closure);
>>>
>>>    }
>>>
>>>
>>>    OrderAccess::storestore();
>>>
>>> }
>>>
>>>
>>> On Fri, Jul 31, 2015 at 11:43 AM, Vladimir Kozlov
>>> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>> wrote:
>>>
>>>     Hi Ramki,
>>>
>>>     Did you fill up CodeCache? It start scanning aggressive only with
>>>     full CodeCache:
>>>
>>>        // Force stack scanning if there is only 10% free space in the
>>>     code cache.
>>>        // We force stack scanning only non-profiled code heap gets full,
>>>     since critical
>>>        // allocation go to the non-profiled heap and we must be make
>>>     sure that there is
>>>        // enough space.
>>>        double free_percent = 1 /
>>>     CodeCache::reverse_free_ratio(CodeBlobType::MethodNonProfiled) * 100;
>>>        if (free_percent <= StartAggressiveSweepingAt) {
>>>          do_stack_scanning();
>>>        }
>>>
>>>     Vladimir
>>>
>>>     On 7/31/15 11:33 AM, Srinivas Ramakrishna wrote:
>>>
>>>
>>>         Yes.
>>>
>>>
>>>         On Fri, Jul 31, 2015 at 11:31 AM, Vitaly Davidovich
>>>         <vitalyd at gmail.com <mailto:vitalyd at gmail.com>
>>>         <mailto:vitalyd at gmail.com <mailto:vitalyd at gmail.com>>> wrote:
>>>
>>>              Ramki, are you running tiered compilation?
>>>
>>>              sent from my phone
>>>
>>>              On Jul 31, 2015 2:19 PM, "Srinivas Ramakrishna"
>>>         <ysr1729 at gmail.com <mailto:ysr1729 at gmail.com>
>>>              <mailto:ysr1729 at gmail.com <mailto:ysr1729 at gmail.com>>>
>>> wrote:
>>>
>>>
>>>                  Hello GC and Compiler teams!
>>>
>>>                  One of our services that runs with several thousand
>>> threads
>>>                  recently noticed an increase
>>>                  in safepoint stop times, but not gc times, upon
>>>         transitioning to
>>>                  JDK 8.
>>>
>>>                  Further investigation revealed that most of the delta
>>> was
>>>                  related to the so-called
>>>                  pre-gc/vmop "cleanup" phase when various book-keeping
>>>         activities
>>>                  are performed,
>>>                  and more specifically in the portion that walks java
>>> thread
>>>                  stacks single-threaded (!)
>>>                  and updates the hotness counters for the active
>>>         nmethods. This
>>>                  code appears to
>>>                  be new to JDK 8 (in jdk 7 one would walk the stacks
>>>         only during
>>>                  code cache sweeps).
>>>
>>>                  I have two questions:
>>>                  (1) has anyone else (typically, I'd expect applications
>>>         with
>>>                  many hundreds or thousands of threads)
>>>                  noticed this regression?
>>>                  (2) Can we do better, for example, by:
>>>                         (a) doing these updates by walking thread stacks
>>> in
>>>                  multiple worker threads in parallel, or best of all:
>>>                         (b) doing these updates when we walk the thread
>>>         stacks
>>>                  during GC, and skipping this phase entirely
>>>                               for non-GC safepoints (with attendant loss
>>> in
>>>                  frequency of this update in low GC frequency
>>>                               scenarios).
>>>
>>>                  It seems kind of silly to do GC's with many multiple
>>> worker
>>>                  threads, but do these thread stack
>>>                  walks single-threaded when it is embarrasingly parallel
>>>         (one
>>>                  could predicate the parallelization
>>>                  based on the measured stack sizes and thread
>>> population, if
>>>                  there was concern on the ovrhead of
>>>                  activating and deactivating the thread gangs for the
>>> work).
>>>
>>>                  A followup question: Any guesses as to how code cache
>>>                  sweep/eviction quality might be compromised if one
>>>                  were to dispense with these hotness updates entirely
>>>         (or at a
>>>                  much reduced frequency), as a temporary
>>>                  workaround to the performance problem?
>>>
>>>                  Thoughts/Comments? In particular, has this issue been
>>>         addressed
>>>                  perhaps in newer JVMs?
>>>
>>>                  Thanks for any comments, feedback, pointers!
>>>                  -- ramki
>>>
>>>                  PS: for comparison, here's data with
>>>         +TraceSafepointCleanup from
>>>                  JDK 7 (first, where this isn't done)
>>>                  vs JDK 8 (where this is done) with a program that has a
>>> few
>>>                  thousands of threads:
>>>
>>>
>>>
>>>                  JDK 7:
>>>                  ..
>>>                  2827.308: [sweeping nmethods, 0.0000020 secs]
>>>                  2828.679: [sweeping nmethods, 0.0000030 secs]
>>>                  2829.984: [sweeping nmethods, 0.0000030 secs]
>>>                  2830.956: [sweeping nmethods, 0.0000030 secs]
>>>                  ..
>>>
>>>                  JDK 8:
>>>                  ..
>>>                  7368.634: [mark nmethods, 0.0177030 secs]
>>>                  7369.587: [mark nmethods, 0.0178305 secs]
>>>                  7370.479: [mark nmethods, 0.0180260 secs]
>>>                  7371.503: [mark nmethods, 0.0186494 secs]
>>>                  ..
>>>
>>>
>>>
>>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20150731/79dac5cc/attachment.htm>


More information about the hotspot-gc-dev mailing list