RFR(S): 7143511: G1: Another instance of high GC Worker Other time (50ms)
Tony Printezis
tony.printezis at oracle.com
Tue Mar 20 15:08:44 UTC 2012
John,
g1CollectedHeap.cpp:

4801       if (predicate_cl.points_into_cs()) {
4802         // At least one of the reference fields or the oop relocations
4803         // in the nmethod points into the collection set. We have to
4804         // 'mark' this nmethod.
4805         CodeBlobToOopClosure::do_code_blob(cb);
4806       }
CodeBlobToOopClosure::do_code_blob() does:

    if (!_do_marking) {
      ...
    } else {
==>> we'll always get here, as _do_marking is true in your case. You
     could even drop the do_marking parameter from the
     G1FilteredCodeBlobToOopClosure ctor and just pass true to the
     CodeBlobToOopClosure ctor, as sketched below. Are we going to
     re-use the same closure somewhere else in the future?
      MarkingCodeBlobClosure::do_code_blob(cb);
    }
  }
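For instance (a minimal sketch only; I'm guessing at the parameter
order of the ctor, with the names taken from the webrev):

  G1FilteredCodeBlobToOopClosure(G1CollectedHeap* g1, OopClosure* cl) :
    // this closure always marks, so hard-wire do_marking to true
    CodeBlobToOopClosure(cl, /* do_marking */ true), _g1(g1) { }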
MarkingCodeBlobClosure::do_code_blob() does:
  void MarkingCodeBlobClosure::do_code_blob(CodeBlob* cb) {
    nmethod* nm = cb->as_nmethod_or_null();
==>> you've already checked that this is not NULL
    if (nm == NULL) return;
    if (!nm->test_set_oops_do_mark()) {
      do_newly_marked_nmethod(nm);
    }
  }
and do_newly_marked_nmethod() is a virtual method that does:

  void CodeBlobToOopClosure::do_newly_marked_nmethod(nmethod* nm) {
    nm->oops_do(_cl, /*do_strong_roots_only=*/ true);
  }
Does it make sense to replace the call to
MarkingCodeBlobClosure::do_code_blob() with the following (it will
save you a virtual call per nmethod)?

  if (!nm->test_set_oops_do_mark()) {
    nm->oops_do(...);
  }
Unless you want to keep relying on the implementation of
CodeBlobToOopClosure, in case it changes in the future?
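To make the suggestion concrete, the inlined version could look roughly
like this (just a sketch, stitched together from the snippets above; in
particular, I'm assuming the predicate is applied via nm->oops_do()):

  void G1FilteredCodeBlobToOopClosure::do_code_blob(CodeBlob* cb) {
    nmethod* nm = cb->as_nmethod_or_null();
    if (nm != NULL && !(nm->test_oops_do_mark())) {
      G1PointsIntoCSOopClosure predicate_cl(_g1);
      nm->oops_do(&predicate_cl);
      if (predicate_cl.points_into_cs()) {
        // At least one oop points into the collection set: mark the
        // nmethod and process its oops directly, skipping the virtual
        // do_newly_marked_nmethod() dispatch.
        if (!nm->test_set_oops_do_mark()) {
          nm->oops_do(_cl, /*do_strong_roots_only=*/ true);
        }
      }
    }
  }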
I was also trying to figure out how to save the second oop iteration
(i.e., combine the "any oops in the CSet?" check with the "process
CSet oops" operation), but it's a bit trickier to do without some more
refactoring in the CodeBlob closures. I think we'd need to split
test_set_oops_do_mark() into a "claim nmethod" step and an "add it to
the queue" step, as we'd need to claim the nmethod before potentially
processing its oops, but only add it to the queue if we actually found
oops to process. Let's leave this as a future project...
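Just to record the idea, something along these lines (purely
illustrative; claim_oops_do_mark(), found_cset_oops(), and
add_to_oops_do_mark_list() are made-up names for the pieces we'd need):

  if (nm->claim_oops_do_mark()) {
    // hypothetical closure that processes CSet oops and remembers
    // whether it saw any, all in a single pass
    G1CopyAndCountCSetOopClosure cl(_g1, _cl);
    nm->oops_do(&cl);
    if (cl.found_cset_oops()) {
      // only queue nmethods that will actually need fixing up
      nm->add_to_oops_do_mark_list();
    }
  }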
One related question: will all GC threads now visit all nmethods and
look for CSet oops (as we do that without claiming them)? Does that
increase the root scanning time?
g1CollectorPolicy.cpp:
1424 print_par_stats(2, "GC Worker End", _par_last_gc_worker_end_times_ms);
1425 print_par_stats(2, "GC Worker", _par_last_gc_worker_times_ms);
GC Worker End is the per-worker end timestamp and GC Worker is the
per-worker elapsed time, right? Could I maybe recommend printing the
end timestamp last, so that the parallel time output is bracketed by
the start and end timestamps? Also, since you're making changes here,
maybe rename "GC Worker" to something a bit more understandable, like
"GC Worker Total" or "GC Worker Time"?
Small nits:
g1CollectedHeap.cpp
4771 template<class T> void do_oop_nv(T* p) {
Could you split the line? We generally try to format these so that
template <class T> is on one line and void ... on the second.
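I.e.:

  template <class T>
  void do_oop_nv(T* p) {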
4791 CodeBlobToOopClosure(cl, do_marking),
4792 _g1(g1) { }
Could you maybe merge these two lines? There should be plenty of space.
4796   if (nm != NULL && !(nm->test_oops_do_mark())) {
4797
4798     G1PointsIntoCSOopClosure predicate_cl(_g1);
Why is line 4797 empty?
5446 double start_par = os::elapsedTime();
5447 double end_par;
5473 double par_time = (end_par - start_par) * 1000.0;
5476 double code_root_fixup_time = (os::elapsedTime() - end_par) * 1000.0;
Could I convince you to rename these to:
start_par_time_sec,
end_par_time_sec,
par_time_ms, and
code_root_fixup_time_ms?
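So the block would read something like this (sketch only, with the
intervening code elided):

  double start_par_time_sec = os::elapsedTime();
  double end_par_time_sec;
  ...
  double par_time_ms = (end_par_time_sec - start_par_time_sec) * 1000.0;
  ...
  double code_root_fixup_time_ms =
                        (os::elapsedTime() - end_par_time_sec) * 1000.0;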
On 03/19/2012 01:54 PM, John Cuthbertson wrote:
> Hi Everyone,
>
> Can I have a couple of volunteers review the changes for this CR? The
> webrev can be found at:
> http://cr.openjdk.java.net/~johnc/7143511/webrev.0/.
>
> I am cc'ing the compiler-dev list as I changed the nmethod class to
> make the routine nmethod::test_oops_do_mark() available in non-debug
> builds.
>
> Summary:
> While running SPECjbb2012, it was noticed that the GC Worker Other
> time (printed as part of the PrintGCDetails output) was quite high on
> certain platforms. An investigation discovered that this time was
> coming from the StrongRootsScope destructor (which is executed by the
> VM thread and was being incorrectly attributed as part of the parallel
> time). Most of the time for the StrongRootsScope destructor was coming
> from walking the marked nmethods list, where the nmethods discovered
> by the CodeBlobToOopClosure in SharedHeap::process_strong_roots()
> are walked to fix up any oops that may have moved during the GC.
> Further investigation revealed that the nmethods were being added to
> the marked list unconditionally. With TieredCompilation, the number of
> nmethods in the code cache (and the number of oops in each nmethod)
> significantly increased (~6000 vs. ~700 nmethods and ~115K vs. ~15K
> oops), increasing the time it took to walk the marked nmethod list. It
> was also observed that most of the nmethods on the marked list did not
> contain any pointers into the collection set and so the oops that they
> were pointing to did not move during the GC. The solution was to limit
> the number of nmethods placed on to the marked list by specializing
> the CodeBlobToOopClosure to filter out those nmethods that did not
> point into the collection set.
>
> The other changes include refactoring the code in G1ParTask::work() and
> where the G1ParTask is executed in evacuate_collection_set() so that
> the values printed for the GC Worker Other time and (the new) Code
> Root Fixup time are accurately calculated.
>
> Testing: SPECjbb2012 on several platforms; the GC test suite with heap
> verification after GC enabled.
>
> Thanks,
>
> JohnC