<div dir="ltr">OK, will do and add you as watcher; thanks Vladimir! (don't yet know if with tiered and a necessarily bounded, if large, code cache whether flushing will in fact eventually become necessary, wrt yr suggested temporary workaround.)<div><br></div><div>Have a good weekend!</div><div>-- ramki</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jul 31, 2015 at 2:28 PM, Vladimir Kozlov <span dir="ltr"><<a href="mailto:vladimir.kozlov@oracle.com" target="_blank">vladimir.kozlov@oracle.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Got it. Yes, it is issue with thousands java threads.<br>
You are the first to point out this problem. Please file a bug against the compiler. We will look at what we can do. Most likely we need to parallelize this work.<br>
<br>
A method's hotness is used only for UseCodeCacheFlushing. You can try to guard Threads::nmethods_do(&set_hotness_closure); with this flag and switch the flag off.<br>
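<br>
Roughly, a sketch of that guard (untested; just the shape, replacing the else branch of mark_active_nmethods() quoted below):<br>
<br>
  } else if (UseCodeCacheFlushing) {<br>
    // Only set hotness counter. With -XX:-UseCodeCacheFlushing this branch is<br>
    // skipped, avoiding the single-threaded walk over all Java thread stacks.<br>
    Threads::nmethods_do(&set_hotness_closure);<br>
  }<br>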
<br>
We need mark_as_seen_on_stack, so leave that as it is.<br>
<br>
Thanks,<br>
Vladimir<div><div class="h5"><br>
<br>
On 7/31/15 11:48 AM, Srinivas Ramakrishna wrote:<br>
</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">
Hi Vladimir --<br>
<br>
I noticed the increase even with InitialCodeCacheSize and ReservedCodeCacheSize set to the<br>
default of 240 MB, but with actual usage much lower (less than a quarter of that).<br>
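<br>
(For concreteness, the settings in question; 240 MB is the JDK 8 default reserved<br>
size with tiered compilation, and -XX:+PrintCodeCache reports the cache bounds and<br>
usage at VM exit:)<br>
<br>
  java -XX:InitialCodeCacheSize=240m -XX:ReservedCodeCacheSize=240m -XX:+PrintCodeCache ...<br>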
<br>
Look at this code path. Note that it is invoked at every safepoint (although the<br>
comment says "periodically"). In mark_active_nmethods(), there's a thread iteration<br>
in both branches of the if. I haven't yet checked which of the two is the culprit<br>
here (if either).<br>
<br>
// Various cleaning tasks that should be done periodically at safepoints<br>
void SafepointSynchronize::do_cleanup_tasks() {<br>
  ....<br>
  {<br>
    TraceTime t4("mark nmethods", TraceSafepointCleanupTime);<br>
    NMethodSweeper::mark_active_nmethods();<br>
  }<br>
  ..<br>
}<br>
<br>
void NMethodSweeper::mark_active_nmethods() {<br>
  ...<br>
  if (!sweep_in_progress()) {<br>
    _seen = 0;<br>
    _sweep_fractions_left = NmethodSweepFraction;<br>
    _current = CodeCache::first_nmethod();<br>
    _traversals += 1;<br>
    _total_time_this_sweep = Tickspan();<br>
<br>
    if (PrintMethodFlushing) {<br>
      tty->print_cr("### Sweep: stack traversal %d", _traversals);<br>
    }<br>
    Threads::nmethods_do(&mark_activation_closure);<br>
  } else {<br>
    // Only set hotness counter<br>
    Threads::nmethods_do(&set_hotness_closure);<br>
  }<br>
<br>
  OrderAccess::storestore();<br>
}<br>
<br>
<br>
On Fri, Jul 31, 2015 at 11:43 AM, Vladimir Kozlov<br></div></div><span class="">
<<a href="mailto:vladimir.kozlov@oracle.com" target="_blank">vladimir.kozlov@oracle.com</a> <mailto:<a href="mailto:vladimir.kozlov@oracle.com" target="_blank">vladimir.kozlov@oracle.com</a>>> wrote:<br>
<br>
Hi Ramki,<br>
<br>
Did you fill up the CodeCache? It starts scanning aggressively only with a<br>
full CodeCache:<br>
<br>
// Force stack scanning if there is only 10% free space in the code cache.<br>
// We force stack scanning only if the non-profiled code heap gets full, since<br>
// critical allocations go to the non-profiled heap and we must make sure that<br>
// there is enough space.<br>
double free_percent = 1 / CodeCache::reverse_free_ratio(CodeBlobType::MethodNonProfiled) * 100;<br>
if (free_percent <= StartAggressiveSweepingAt) {<br>
  do_stack_scanning();<br>
}<br>
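<br>
To make the trigger concrete (assuming reverse_free_ratio() returns the heap's<br>
capacity divided by its unallocated space, so the reciprocal is the free fraction):<br>
<br>
  // Worked example with made-up numbers: a 240 MB non-profiled heap, 24 MB free:<br>
  //   reverse_free_ratio = 240 / 24 = 10<br>
  //   free_percent       = 1 / 10 * 100 = 10%<br>
  // With the default StartAggressiveSweepingAt=10, 10% <= 10% forces a stack scan.<br>
  // At the usage you report (well under a quarter of 240 MB), free_percent is far<br>
  // higher, so this path should not be firing.<br>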
<br>
Vladimir<br>
<br>
On 7/31/15 11:33 AM, Srinivas Ramakrishna wrote:<br>
<br>
<br>
Yes.<br>
<br>
<br>
On Fri, Jul 31, 2015 at 11:31 AM, Vitaly Davidovich<br>
<<a href="mailto:vitalyd@gmail.com" target="_blank">vitalyd@gmail.com</a> <mailto:<a href="mailto:vitalyd@gmail.com" target="_blank">vitalyd@gmail.com</a>><br></span><span class="">
<mailto:<a href="mailto:vitalyd@gmail.com" target="_blank">vitalyd@gmail.com</a> <mailto:<a href="mailto:vitalyd@gmail.com" target="_blank">vitalyd@gmail.com</a>>>> wrote:<br>
<br>
Ramki, are you running tiered compilation?<br>
<br>
sent from my phone<br>
<br>
On Jul 31, 2015 2:19 PM, "Srinivas Ramakrishna"<br>
<<a href="mailto:ysr1729@gmail.com" target="_blank">ysr1729@gmail.com</a> <mailto:<a href="mailto:ysr1729@gmail.com" target="_blank">ysr1729@gmail.com</a>><br></span><div><div class="h5">
<mailto:<a href="mailto:ysr1729@gmail.com" target="_blank">ysr1729@gmail.com</a> <mailto:<a href="mailto:ysr1729@gmail.com" target="_blank">ysr1729@gmail.com</a>>>> wrote:<br>
<br>
<br>
Hello GC and Compiler teams!<br>
<br>
One of our services that runs with several thousand threads recently noticed an<br>
increase in safepoint stop times, but not GC times, upon transitioning to JDK 8.<br>
<br>
Further investigation revealed that most of the delta was related to the so-called<br>
pre-GC/vmop "cleanup" phase, when various book-keeping activities are performed, and<br>
more specifically to the portion that walks Java thread stacks single-threaded (!)<br>
and updates the hotness counters for the active nmethods. This code appears to be<br>
new in JDK 8 (in JDK 7 one would walk the stacks only during code cache sweeps).<br>
<br>
I have two questions:<br>
(1) Has anyone else (typically, I'd expect applications with many hundreds or<br>
thousands of threads) noticed this regression?<br>
(2) Can we do better, for example, by:<br>
    (a) doing these updates by walking thread stacks in multiple worker threads<br>
        in parallel (sketched below), or, best of all:<br>
    (b) doing these updates when we walk the thread stacks during GC, and<br>
        skipping this phase entirely for non-GC safepoints (with an attendant<br>
        loss in update frequency in low-GC-frequency scenarios).<br>
<br>
It seems kind of silly to do GCs with many worker threads, but to do these thread-stack<br>
walks single-threaded when the work is embarrassingly parallel (one could predicate the<br>
parallelization on the measured stack sizes and thread population, if there were concerns<br>
about the overhead of activating and deactivating the thread gangs for the work).<br>
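<br>
Concretely, a hypothetical sketch of (2a) reusing HotSpot's GC work-gang machinery.<br>
AbstractGangTask, Atomic::add and JavaThread::nmethods_do exist; java_thread_at()<br>
and the claiming scheme are invented here purely for illustration:<br>
<br>
class MarkActiveNMethodsTask : public AbstractGangTask {<br>
  CodeBlobClosure* _cl;<br>
  volatile jint    _next;  // index of the next unclaimed thread<br>
 public:<br>
  MarkActiveNMethodsTask(CodeBlobClosure* cl)<br>
    : AbstractGangTask("parallel mark nmethods"), _cl(cl), _next(0) {}<br>
  virtual void work(uint worker_id) {<br>
    // Each gang worker claims one thread at a time until none remain;<br>
    // java_thread_at() is assumed to return NULL past the end of the list.<br>
    for (;;) {<br>
      jint i = Atomic::add(1, &_next) - 1;<br>
      JavaThread* t = Threads::java_thread_at(i);  // hypothetical accessor<br>
      if (t == NULL) return;<br>
      t->nmethods_do(_cl);  // the same per-thread stack walk, now in parallel<br>
    }<br>
  }<br>
};<br>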
<br>
A follow-up question: any guesses as to how code cache sweep/eviction quality might<br>
be compromised if one were to dispense with these hotness updates entirely (or<br>
perform them at a much reduced frequency), as a temporary workaround to the<br>
performance problem?<br>
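<br>
One hypothetical shape for the "much reduced frequency" variant, in the else branch<br>
of mark_active_nmethods(); the counter and HotnessUpdateInterval are invented names,<br>
not existing HotSpot flags:<br>
<br>
  } else {<br>
    // Update hotness only on every Nth cleanup safepoint instead of every one.<br>
    static uint cleanups = 0;<br>
    const uint HotnessUpdateInterval = 32;  // invented constant<br>
    if (++cleanups % HotnessUpdateInterval == 0) {<br>
      Threads::nmethods_do(&set_hotness_closure);<br>
    }<br>
  }<br>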
<br>
Thoughts/comments? In particular, has this issue perhaps been addressed in newer JVMs?<br>
<br>
Thanks for any comments, feedback, pointers!<br>
-- ramki<br>
<br>
PS: For comparison, here's data with -XX:+TraceSafepointCleanupTime from JDK 7<br>
(first, where this stack walk isn't done) vs. JDK 8 (where it is), for a program<br>
that has a few thousand threads:<br>
<br>
<br>
<br>
JDK 7:<br>
..<br>
2827.308: [sweeping nmethods, 0.0000020 secs]<br>
2828.679: [sweeping nmethods, 0.0000030 secs]<br>
2829.984: [sweeping nmethods, 0.0000030 secs]<br>
2830.956: [sweeping nmethods, 0.0000030 secs]<br>
..<br>
<br>
JDK 8:<br>
..<br>
7368.634: [mark nmethods, 0.0177030 secs]<br>
7369.587: [mark nmethods, 0.0178305 secs]<br>
7370.479: [mark nmethods, 0.0180260 secs]<br>
7371.503: [mark nmethods, 0.0186494 secs]<br>
..<br>
<br>
<br>
<br>
</div></div></blockquote>
</blockquote></div><br></div>