Cost of single-threaded nmethod hotness updates at each safepoint (in JDK 8)
Srinivas Ramakrishna
ysr1729 at gmail.com
Fri Jul 31 18:19:02 UTC 2015
Hello GC and Compiler teams!
One of our services that runs with several thousand threads recently
saw an increase in safepoint stop times, but not GC times, upon
transitioning to JDK 8.
Further investigation revealed that most of the delta was related to the
so-called pre-GC/vmop "cleanup" phase, when various book-keeping
activities are performed, and more specifically to the portion that
walks Java thread stacks single-threaded (!) and updates the hotness
counters for the active nmethods. This code appears to be new to JDK 8
(in JDK 7 one would walk the stacks only during code cache sweeps).
I have two questions:
(1) Has anyone else (typically, I'd expect applications with many
    hundreds or thousands of threads) noticed this regression?
(2) Can we do better, for example, by:
    (a) doing these updates by walking thread stacks in multiple worker
        threads in parallel, or best of all:
    (b) doing these updates when we walk the thread stacks during GC,
        and skipping this phase entirely for non-GC safepoints (with
        attendant loss in frequency of this update in low GC frequency
        scenarios).
It seems kind of silly to do GCs with multiple worker threads, but do
these thread stack walks single-threaded when the work is embarrassingly
parallel (one could predicate the parallelization on the measured stack
sizes and thread population, if there were concern about the overhead
of activating and deactivating the thread gangs for the work).
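To make option (a) and the size-based predicate concrete, here is a minimal Java sketch of the idea. It is a model only, not HotSpot code: thread stacks are plain int arrays of method ids, the hotness update is modelled as storing a fixed reset value (an assumption), and the names ParallelHotnessSketch, RESET_VALUE and PARALLEL_THRESHOLD_FRAMES are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicIntegerArray;

public class ParallelHotnessSketch {
    // Hypothetical value stored for every method found on some stack.
    static final int RESET_VALUE = 100;
    // Hypothetical cutoff: below this many frames in total, stay serial.
    static final long PARALLEL_THRESHOLD_FRAMES = 10_000;

    /** Serial walk: visit every frame of every stack on the calling thread. */
    static void markSerial(List<int[]> stacks, AtomicIntegerArray hotness) {
        for (int[] stack : stacks) {
            for (int methodId : stack) {
                hotness.set(methodId, RESET_VALUE);
            }
        }
    }

    /** Parallel walk: worker w takes stacks w, w + workers, w + 2 * workers, ... */
    static void markParallel(List<int[]> stacks, AtomicIntegerArray hotness, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int first = w;
                pending.add(pool.submit(() -> {
                    for (int i = first; i < stacks.size(); i += workers) {
                        for (int methodId : stacks.get(i)) {
                            // Storing the same value is idempotent, so workers
                            // touching the same method do not conflict.
                            hotness.set(methodId, RESET_VALUE);
                        }
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every worker before returning
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("stack walk failed", e);
        } finally {
            pool.shutdown();
        }
    }

    /** Picks serial or parallel from the measured amount of work; returns true if parallel. */
    static boolean mark(List<int[]> stacks, AtomicIntegerArray hotness, int workers) {
        long totalFrames = 0;
        for (int[] stack : stacks) {
            totalFrames += stack.length;
        }
        boolean parallel = workers > 1 && totalFrames >= PARALLEL_THRESHOLD_FRAMES;
        if (parallel) {
            markParallel(stacks, hotness, workers);
        } else {
            markSerial(stacks, hotness);
        }
        return parallel;
    }

    public static void main(String[] args) {
        int threads = 4000, depth = 50, methods = 256; // made-up population
        List<int[]> stacks = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            int[] stack = new int[depth];
            for (int f = 0; f < depth; f++) {
                stack[f] = (t * 31 + f) % methods;
            }
            stacks.add(stack);
        }
        AtomicIntegerArray hotness = new AtomicIntegerArray(methods);
        boolean parallel = mark(stacks, hotness, Runtime.getRuntime().availableProcessors());
        System.out.println("parallel walk used: " + parallel);
    }
}
```

The sketch creates a pool per call for brevity; a real implementation would presumably reuse the existing GC worker gang, which is exactly the activation overhead the predicate is meant to guard against.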
A followup question: Any guesses as to how code cache sweep/eviction
quality might be compromised if one were to dispense with these hotness
updates entirely (or do them at a much reduced frequency), as a
temporary workaround to the performance problem?
Thoughts/Comments? In particular, has this issue been addressed perhaps
in newer JVMs?
Thanks for any comments, feedback, pointers!
-- ramki
PS: for comparison, here's data with +TraceSafepointCleanupTime from
JDK 7 (first, where this isn't done) vs JDK 8 (where this is done) with
a program that has a few thousand threads:
JDK 7:
..
2827.308: [sweeping nmethods, 0.0000020 secs]
2828.679: [sweeping nmethods, 0.0000030 secs]
2829.984: [sweeping nmethods, 0.0000030 secs]
2830.956: [sweeping nmethods, 0.0000030 secs]
..
JDK 8:
..
7368.634: [mark nmethods, 0.0177030 secs]
7369.587: [mark nmethods, 0.0178305 secs]
7370.479: [mark nmethods, 0.0180260 secs]
7371.503: [mark nmethods, 0.0186494 secs]
..
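For a rough sense of scale, the four samples shown for each release can be averaged. The small Java sketch below does only that arithmetic on the numbers quoted above; the class and field names are hypothetical, and four samples per release are far too few to be more than indicative.

```java
public class CleanupCostSketch {
    // Durations in seconds, copied from the trace excerpts above.
    static final double[] JDK7_SWEEP = {0.0000020, 0.0000030, 0.0000030, 0.0000030};
    static final double[] JDK8_MARK = {0.0177030, 0.0178305, 0.0180260, 0.0186494};

    /** Arithmetic mean of the given samples. */
    static double mean(double[] samples) {
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    public static void main(String[] args) {
        double jdk7 = mean(JDK7_SWEEP);
        double jdk8 = mean(JDK8_MARK);
        System.out.printf("JDK 7 mean: %.2f us%n", jdk7 * 1e6);
        System.out.printf("JDK 8 mean: %.2f ms%n", jdk8 * 1e3);
        System.out.printf("ratio: %.0fx%n", jdk8 / jdk7);
    }
}
```

On these samples the JDK 8 phase averages about 18 ms per safepoint against roughly 3 microseconds for the JDK 7 phase, a difference of more than three orders of magnitude.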