Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
ysr1729 at gmail.com
Sat Aug 1 00:39:03 UTC 2015
Hi Vitaly -- Which JDK 8 version were you testing? It's a bit of the proverbial curate's egg at the moment (albeit not in the original sense, I assure you!), but if I may be allowed to mix my metaphors, I would be inclined not to throw out the baby with the bath water yet. There are services that have seen benefits and some that haven't, and the picture overall is still a bit fuzzy. Maybe someone out there has done a more disciplined epidemiological study...
PS: A couple of services were running tiered compilation when it wasn't the default (in JDK 7)...
-- ramki
Sent from my iPhone
> On Jul 31, 2015, at 3:08 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>
> Ramki, are you actually seeing better peak perf with tiered than C2? I experimented with it on a real workload and it was a net loss for peak perf (anywhere from 8-20% worse than C2, but also quite unstable); this was with a very large code cache to play it safe, but no other tuning.
>
> sent from my phone
>
>> On Jul 31, 2015 6:02 PM, "Srinivas Ramakrishna" <ysr1729 at gmail.com> wrote:
>> OK, will do, and add you as a watcher; thanks Vladimir! (I don't yet know whether, with tiered compilation and a necessarily bounded, if large, code cache, flushing will in fact eventually become necessary, w.r.t. your suggested temporary workaround.)
>>
>> Have a good weekend!
>> -- ramki
>>
>>> On Fri, Jul 31, 2015 at 2:28 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>> Got it. Yes, it is an issue with thousands of Java threads.
>>> You are the first to point out this problem. Please file a bug against the compiler, and we will look at what we can do. Most likely we need to parallelize this work.
>>>
>>> A method's hotness is used only for UseCodeCacheFlushing. You can try to guard Threads::nmethods_do(&set_hotness_closure) with this flag and switch it off.
>>>
>>> We need mark_as_seen_on_stack, so leave that part in.
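>>>
>>> (A minimal sketch of what that guard might look like in the else-branch of the
>>> mark_active_nmethods() code quoted further down -- hypothetical and untested, just to make
>>> the suggestion concrete:)
>>>
>>>   } else if (UseCodeCacheFlushing) {
>>>     // Hotness counters only feed the code-cache flushing heuristic, so the
>>>     // whole-threads walk can be skipped when flushing is off.
>>>     Threads::nmethods_do(&set_hotness_closure);
>>>   }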
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>
>>>> On 7/31/15 11:48 AM, Srinivas Ramakrishna wrote:
>>>> Hi Vladimir --
>>>>
>>>> I noticed the increase even with Initial and Reserved (code cache sizes) set to the default
>>>> of 240 MB, but with actual usage much lower (less than a quarter).
>>>>
>>>> Look at this code path. Note that this is invoked at every safepoint (although it says
>>>> "periodically" in the comment). In the mark_active_nmethods() method, there's a thread
>>>> iteration in both branches of the if; I haven't yet checked which of the two (if either)
>>>> was the culprit here.
>>>>
>>>> // Various cleaning tasks that should be done periodically at safepoints
>>>> void SafepointSynchronize::do_cleanup_tasks() {
>>>>   ...
>>>>   {
>>>>     TraceTime t4("mark nmethods", TraceSafepointCleanupTime);
>>>>     NMethodSweeper::mark_active_nmethods();
>>>>   }
>>>>   ...
>>>> }
>>>>
>>>> void NMethodSweeper::mark_active_nmethods() {
>>>>   ...
>>>>   if (!sweep_in_progress()) {
>>>>     _seen = 0;
>>>>     _sweep_fractions_left = NmethodSweepFraction;
>>>>     _current = CodeCache::first_nmethod();
>>>>     _traversals += 1;
>>>>     _total_time_this_sweep = Tickspan();
>>>>
>>>>     if (PrintMethodFlushing) {
>>>>       tty->print_cr("### Sweep: stack traversal %d", _traversals);
>>>>     }
>>>>     Threads::nmethods_do(&mark_activation_closure);
>>>>   } else {
>>>>     // Only set hotness counter
>>>>     Threads::nmethods_do(&set_hotness_closure);
>>>>   }
>>>>
>>>>   OrderAccess::storestore();
>>>> }
>>>>
>>>>
>>>> On Fri, Jul 31, 2015 at 11:43 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>>>
>>>> Hi Ramki,
>>>>
>>>> Did you fill up the CodeCache? It starts scanning aggressively only when the CodeCache is full:
>>>>
>>>>     // Force stack scanning if there is only 10% free space in the code cache.
>>>>     // We force stack scanning only if the non-profiled code heap gets full, since critical
>>>>     // allocations go to the non-profiled heap and we must make sure that there is
>>>>     // enough space.
>>>>     double free_percent = 1 / CodeCache::reverse_free_ratio(CodeBlobType::MethodNonProfiled) * 100;
>>>>     if (free_percent <= StartAggressiveSweepingAt) {
>>>>       do_stack_scanning();
>>>>     }
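>>>>
>>>> For concreteness, a tiny self-contained illustration of that arithmetic (plain C++, not
>>>> HotSpot code; the 240 MB / 180 MB figures and the value of 10 for StartAggressiveSweepingAt
>>>> are assumptions for the example, with reverse_free_ratio treated as roughly
>>>> max_capacity / free_capacity):
>>>>
>>>>   #include <cstdio>
>>>>
>>>>   int main() {
>>>>     const double max_capacity_mb  = 240.0;  // assumed non-profiled heap capacity
>>>>     const double free_capacity_mb = 180.0;  // assumed free space
>>>>     const double start_aggressive_sweeping_at = 10.0;  // assumed flag value
>>>>
>>>>     // free_percent ends up as the percentage of the heap that is still free.
>>>>     double reverse_free_ratio = max_capacity_mb / free_capacity_mb;
>>>>     double free_percent = 1.0 / reverse_free_ratio * 100.0;  // 75% in this example
>>>>
>>>>     if (free_percent <= start_aggressive_sweeping_at) {
>>>>       printf("would force stack scanning (%.0f%% free)\n", free_percent);
>>>>     } else {
>>>>       printf("no forced stack scanning (%.0f%% free)\n", free_percent);
>>>>     }
>>>>     return 0;
>>>>   }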
>>>>
>>>> Vladimir
>>>>
>>>> On 7/31/15 11:33 AM, Srinivas Ramakrishna wrote:
>>>>
>>>>
>>>> Yes.
>>>>
>>>>
>>>> On Fri, Jul 31, 2015 at 11:31 AM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>>>>
>>>> Ramki, are you running tiered compilation?
>>>>
>>>> sent from my phone
>>>>
>>>> On Jul 31, 2015 2:19 PM, "Srinivas Ramakrishna" <ysr1729 at gmail.com> wrote:
>>>>
>>>>
>>>> Hello GC and Compiler teams!
>>>>
>>>> One of our services that runs with several thousand threads recently noticed an increase
>>>> in safepoint stop times, but not GC times, upon transitioning to JDK 8.
>>>>
>>>> Further investigation revealed that most of the delta was related to the so-called
>>>> pre-GC/vmop "cleanup" phase when various book-keeping activities are performed,
>>>> and more specifically in the portion that walks Java thread stacks single-threaded (!)
>>>> and updates the hotness counters for the active nmethods. This code appears to be new
>>>> to JDK 8 (in JDK 7 one would walk the stacks only during code cache sweeps).
>>>>
>>>> I have two questions:
>>>> (1) Has anyone else (typically, I'd expect applications with many hundreds or
>>>>     thousands of threads) noticed this regression?
>>>> (2) Can we do better, for example, by:
>>>>     (a) doing these updates by walking thread stacks in multiple worker threads
>>>>         in parallel, or best of all:
>>>>     (b) doing these updates when we walk the thread stacks during GC, and skipping
>>>>         this phase entirely for non-GC safepoints (with attendant loss in frequency
>>>>         of this update in low-GC-frequency scenarios).
>>>>
>>>> It seems kind of silly to do GCs with many worker threads, but to do these thread-stack
>>>> walks single-threaded when the work is embarrassingly parallel (one could predicate the
>>>> parallelization on the measured stack sizes and thread population, if there were concern
>>>> about the overhead of activating and deactivating the thread gangs for the work); a rough
>>>> sketch of the claiming scheme such a parallel walk might use follows below.
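>>>>
>>>> (A minimal, self-contained sketch of that claiming scheme -- plain C++ with std::thread
>>>> rather than HotSpot's worker gangs, and update_hotness_for() is just a stand-in for the
>>>> real per-thread stack walk; thread and worker counts are made-up numbers:)
>>>>
>>>>   #include <atomic>
>>>>   #include <cstdio>
>>>>   #include <thread>
>>>>   #include <vector>
>>>>
>>>>   // Stand-in for "walk one Java thread's stack and update nmethod hotness".
>>>>   static void update_hotness_for(int java_thread_id) {
>>>>     (void)java_thread_id;  // ... walk frames, bump hotness counters ...
>>>>   }
>>>>
>>>>   int main() {
>>>>     const int num_java_threads = 5000;  // assumed thread population
>>>>     const int num_workers      = 8;     // assumed worker gang size
>>>>
>>>>     std::atomic<int> next_claim{0};     // workers claim the next unprocessed thread
>>>>     std::vector<std::thread> workers;
>>>>     for (int w = 0; w < num_workers; ++w) {
>>>>       workers.emplace_back([&] {
>>>>         for (int i = next_claim.fetch_add(1); i < num_java_threads;
>>>>              i = next_claim.fetch_add(1)) {
>>>>           update_hotness_for(i);
>>>>         }
>>>>       });
>>>>     }
>>>>     for (auto& t : workers) t.join();
>>>>     printf("processed %d thread stacks with %d workers\n", num_java_threads, num_workers);
>>>>     return 0;
>>>>   }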
>>>>
>>>> A followup question: any guesses as to how code cache sweep/eviction quality might be
>>>> compromised if one were to dispense with these hotness updates entirely (or at a much
>>>> reduced frequency), as a temporary workaround to the performance problem?
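>>>>
>>>> (To make the "much reduced frequency" variant concrete, a hypothetical throttle in the
>>>> hotness-only branch of mark_active_nmethods(); the interval flag and counter below do not
>>>> exist and are made up purely for illustration:)
>>>>
>>>>   } else if (++_cleanups_since_hotness_update >= HotnessUpdateInterval) {  // hypothetical flag
>>>>     // Refresh hotness counters only every Nth cleanup safepoint.
>>>>     _cleanups_since_hotness_update = 0;
>>>>     Threads::nmethods_do(&set_hotness_closure);
>>>>   }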
>>>>
>>>> Thoughts/Comments? In particular, has this issue perhaps been addressed in newer JVMs?
>>>>
>>>> Thanks for any comments, feedback, pointers!
>>>> -- ramki
>>>>
>>>> PS: For comparison, here's data with +TraceSafepointCleanupTime from JDK 7 (first, where
>>>> this isn't done) vs JDK 8 (where this is done), with a program that has a few thousand
>>>> threads:
>>>>
>>>> JDK 7:
>>>> ..
>>>> 2827.308: [sweeping nmethods, 0.0000020 secs]
>>>> 2828.679: [sweeping nmethods, 0.0000030 secs]
>>>> 2829.984: [sweeping nmethods, 0.0000030 secs]
>>>> 2830.956: [sweeping nmethods, 0.0000030 secs]
>>>> ..
>>>>
>>>> JDK 8:
>>>> ..
>>>> 7368.634: [mark nmethods, 0.0177030 secs]
>>>> 7369.587: [mark nmethods, 0.0178305 secs]
>>>> 7370.479: [mark nmethods, 0.0180260 secs]
>>>> 7371.503: [mark nmethods, 0.0186494 secs]
>>>> ..