Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)

Srinivas Ramakrishna ysr1729 at gmail.com
Fri Jul 31 19:08:50 UTC 2015


BTW, I think some of those are counters that we added (a patch for which is
attached to the OpenJDK ticket I opened a few months ago, I think)...

-- ramki

On Fri, Jul 31, 2015 at 12:07 PM, Srinivas Ramakrishna <ysr1729 at gmail.com>
wrote:

> Hi Vladimir --
>
>
> Here's a snapshot of the counters:
>
> sun.ci.codeCacheCapacity=251658240
> sun.ci.codeCacheMaxCapacity=251658240
> sun.ci.codeCacheMethodsReclaimedNum=3450
> sun.ci.codeCacheSweepsTotalNum=58
> sun.ci.codeCacheSweepsTotalTimeMillis=1111
> sun.ci.codeCacheUsed=35888704
>
>
> Notice that the code cache usage is less than 35 MB of the 240 MB
> capacity, yet it seems we have had 58 sweeps already, and safepoint cleanup
> says:
>
> [mark nmethods, 0.0165062 secs]
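>
> (As a sanity check on those numbers, assuming the counters are in bytes
> and milliseconds: 251658240 bytes is exactly 240 MB, and 35888704 bytes is
> about 34.2 MB, i.e. only ~14% of the capacity is in use; the 58 sweeps
> took 1111 ms in total, roughly 19 ms per sweep on average.)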
>
> Even if the two closures do little or no work, the single-threaded walk
> over the deep stacks of a thousand threads will cost time for applications
> with many threads, and as far as I can tell this is now done at each
> safepoint irrespective of sweeper activity. It seems as if this work should
> somehow be rolled up (via a suitable injection) into the GC's thread walks,
> which are done in parallel, rather than being done in a pre-GC phase
> (unless I am missing some reason that the sequencing is necessary, which it
> doesn't seem to be here).
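>
> To illustrate the kind of thing I mean, here is a minimal, stand-alone
> sketch of chunking a per-thread-stack task over a pool of workers. This is
> not HotSpot code: ThreadStack, update_hotness and parallel_stack_walk are
> made-up names for the illustration, and real worker gangs would of course
> be reused rather than spawned per walk.
>
> #include <atomic>
> #include <cstddef>
> #include <thread>
> #include <vector>
>
> // Stand-in for a Java thread's stack of activations.
> struct ThreadStack { std::vector<int> frames; };
>
> // Stand-in for the per-activation work (e.g. resetting a hotness counter).
> static void update_hotness(ThreadStack& s) {
>     for (int& f : s.frames) f += 1;
> }
>
> // Walk all stacks with N workers claiming stacks via an atomic cursor,
> // instead of a single thread visiting every stack sequentially.
> static void parallel_stack_walk(std::vector<ThreadStack>& stacks, unsigned workers) {
>     std::atomic<std::size_t> next{0};
>     std::vector<std::thread> pool;
>     for (unsigned w = 0; w < workers; ++w) {
>         pool.emplace_back([&] {
>             for (std::size_t i = next.fetch_add(1); i < stacks.size(); i = next.fetch_add(1)) {
>                 update_hotness(stacks[i]);
>             }
>         });
>     }
>     for (std::thread& t : pool) t.join();
> }
>
> int main() {
>     // A few thousand "threads", each with a moderately deep "stack".
>     std::vector<ThreadStack> stacks(2000, ThreadStack{std::vector<int>(200, 0)});
>     unsigned workers = std::thread::hardware_concurrency();
>     if (workers == 0) workers = 4;
>     parallel_stack_walk(stacks, workers);
>     return 0;
> }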
>
> -- ramki
>
> On Fri, Jul 31, 2015 at 11:48 AM, Srinivas Ramakrishna <ysr1729 at gmail.com>
> wrote:
>
>> Hi Vladimir --
>>
>> I noticed the increase even with the initial and reserved code cache
>> sizes set to the default of 240 MB, but with actual usage much lower
>> (less than a quarter of that).
>>
>> Look at this code path. Note that this is invoked at every safepoint
>> (although the comment says "periodically"). In mark_active_nmethods(),
>> there's a thread iteration in both branches of the if. I haven't checked
>> yet to see which of the two was the culprit here (if either).
>>
>> // Various cleaning tasks that should be done periodically at safepoints
>> void SafepointSynchronize::do_cleanup_tasks() {
>>   ....
>>   {
>>     TraceTime t4("mark nmethods", TraceSafepointCleanupTime);
>>     NMethodSweeper::mark_active_nmethods();
>>   }
>>   ..
>> }
>>
>> void NMethodSweeper::mark_active_nmethods() {
>>   ...
>>   if (!sweep_in_progress()) {
>>     _seen = 0;
>>     _sweep_fractions_left = NmethodSweepFraction;
>>     _current = CodeCache::first_nmethod();
>>     _traversals += 1;
>>     _total_time_this_sweep = Tickspan();
>>
>>     if (PrintMethodFlushing) {
>>       tty->print_cr("### Sweep: stack traversal %d", _traversals);
>>     }
>>     Threads::nmethods_do(&mark_activation_closure);
>>
>>   } else {
>>     // Only set hotness counter
>>     Threads::nmethods_do(&set_hotness_closure);
>>   }
>>
>>   OrderAccess::storestore();
>> }
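>>
>> As an aside, my rough mental model of what those closures and the sweeper
>> do with the hotness counter, reduced to a toy stand-alone example (none of
>> these names, values, or policies are the real HotSpot ones; it is only
>> meant to show why skipping the updates could make actively-used methods
>> look cold to the sweeper):
>>
>> #include <cstdio>
>> #include <vector>
>>
>> struct Method {
>>     int  hotness = 0;     // reset when seen on a stack, decayed by the sweeper
>>     bool flushed = false;
>> };
>>
>> const int kResetValue    = 100;  // counter value given to methods seen on a stack
>> const int kColdThreshold = 0;    // below this, a method is a flushing candidate
>>
>> // Safepoint cleanup: every method found on some thread's stack gets its
>> // hotness counter reset, so actively-used code keeps looking "hot".
>> void mark_active(std::vector<Method*>& on_stack) {
>>     for (Method* m : on_stack) m->hotness = kResetValue;
>> }
>>
>> // Sweeper pass: decay every counter; methods not seen on a stack for long
>> // enough eventually go cold and become eligible for flushing.
>> void sweep(std::vector<Method>& code_cache) {
>>     for (Method& m : code_cache) {
>>         if (--m.hotness < kColdThreshold) m.flushed = true;
>>     }
>> }
>>
>> int main() {
>>     std::vector<Method> code_cache(4);
>>     std::vector<Method*> on_stack = { &code_cache[0], &code_cache[2] };
>>     for (int i = 0; i < 200; ++i) {
>>         mark_active(on_stack);   // happens at safepoint cleanup in the model
>>         sweep(code_cache);       // happens during a sweep in the model
>>     }
>>     for (const Method& m : code_cache) {
>>         std::printf("hotness=%d flushed=%d\n", m.hotness, m.flushed);
>>     }
>>     return 0;
>> }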
>>
>> On Fri, Jul 31, 2015 at 11:43 AM, Vladimir Kozlov <
>> vladimir.kozlov at oracle.com> wrote:
>>
>>> Hi Ramki,
>>>
>>> Did you fill up the CodeCache? It starts aggressive scanning only when
>>> the CodeCache is full:
>>>
>>>   // Force stack scanning if there is only 10% free space in the code cache.
>>>   // We force stack scanning only non-profiled code heap gets full, since critical
>>>   // allocation go to the non-profiled heap and we must be make sure that there is
>>>   // enough space.
>>>   double free_percent = 1 / CodeCache::reverse_free_ratio(CodeBlobType::MethodNonProfiled) * 100;
>>>   if (free_percent <= StartAggressiveSweepingAt) {
>>>     do_stack_scanning();
>>>   }
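>>>
>>> (Reading that formula: if reverse_free_ratio() is roughly max_capacity /
>>> free_space for that code heap -- my interpretation, not verified against
>>> the sources -- then free_percent is simply the percentage of the heap
>>> that is still free. For example, with 24 MB free out of a 240 MB heap,
>>> reverse_free_ratio() would be 10, free_percent would be 10, and with
>>> StartAggressiveSweepingAt at the 10% mentioned in the comment, the
>>> aggressive stack scanning would kick in.)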
>>>
>>> Vladimir
>>>
>>> On 7/31/15 11:33 AM, Srinivas Ramakrishna wrote:
>>>
>>>>
>>>> Yes.
>>>>
>>>>
>>>> On Fri, Jul 31, 2015 at 11:31 AM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>>>>
>>>>     Ramki, are you running tiered compilation?
>>>>
>>>>     sent from my phone
>>>>
>>>>     On Jul 31, 2015 2:19 PM, "Srinivas Ramakrishna" <ysr1729 at gmail.com> wrote:
>>>>
>>>>
>>>>         Hello GC and Compiler teams!
>>>>
>>>>         One of our services that runs with several thousand threads
>>>>         recently noticed an increase in safepoint stop times, but not GC
>>>>         times, upon transitioning to JDK 8.
>>>>
>>>>         Further investigation revealed that most of the delta was
>>>>         related to the so-called pre-GC/vmop "cleanup" phase, when
>>>>         various book-keeping activities are performed, and more
>>>>         specifically to the portion that walks Java thread stacks
>>>>         single-threaded (!) and updates the hotness counters for the
>>>>         active nmethods. This code appears to be new in JDK 8 (in JDK 7
>>>>         one would walk the stacks only during code cache sweeps).
>>>>
>>>>         I have two questions:
>>>>         (1) has anyone else (typically, I'd expect applications with
>>>>         many hundreds or thousands of threads)
>>>>         noticed this regression?
>>>>         (2) Can we do better, for example, by:
>>>>                (a) doing these updates by walking thread stacks in
>>>>         multiple worker threads in parallel, or best of all:
>>>>                (b) doing these updates when we walk the thread stacks
>>>>         during GC, and skipping this phase entirely
>>>>                      for non-GC safepoints (with attendant loss in
>>>>         frequency of this update in low GC frequency
>>>>                      scenarios).
>>>>
>>>>         It seems kind of silly to do GCs with many parallel worker
>>>>         threads, but do these thread stack walks single-threaded when
>>>>         the work is embarrassingly parallel (one could predicate the
>>>>         parallelization on the measured stack sizes and thread
>>>>         population, if there were concern about the overhead of
>>>>         activating and deactivating the thread gangs for the work).
>>>>
>>>>         A follow-up question: any guesses as to how code cache
>>>>         sweep/eviction quality might be compromised if one were to
>>>>         dispense with these hotness updates entirely (or do them at a
>>>>         much reduced frequency), as a temporary workaround to the
>>>>         performance problem?
>>>>
>>>>         Thoughts/Comments? In particular, has this issue been addressed
>>>>         perhaps in newer JVMs?
>>>>
>>>>         Thanks for any comments, feedback, pointers!
>>>>         -- ramki
>>>>
>>>>         PS: For comparison, here's data with +TraceSafepointCleanup from
>>>>         JDK 7 (first, where this isn't done) vs. JDK 8 (where it is) for
>>>>         a program that has a few thousand threads:
>>>>
>>>>
>>>>
>>>>         JDK 7:
>>>>         ..
>>>>         2827.308: [sweeping nmethods, 0.0000020 secs]
>>>>         2828.679: [sweeping nmethods, 0.0000030 secs]
>>>>         2829.984: [sweeping nmethods, 0.0000030 secs]
>>>>         2830.956: [sweeping nmethods, 0.0000030 secs]
>>>>         ..
>>>>
>>>>         JDK 8:
>>>>         ..
>>>>         7368.634: [mark nmethods, 0.0177030 secs]
>>>>         7369.587: [mark nmethods, 0.0178305 secs]
>>>>         7370.479: [mark nmethods, 0.0180260 secs]
>>>>         7371.503: [mark nmethods, 0.0186494 secs]
>>>>         ..
>>>>
>>>>
>>>>
>>
>

