RFR(S): 7143511: G1: Another instance of high GC Worker Other time (50ms)
Tony Printezis
tony.printezis at oracle.com
Tue Mar 20 15:08:44 UTC 2012
John,
g1CollectedHeap.cpp:

4801       if (predicate_cl.points_into_cs()) {
4802         // At least one of the reference fields or the oop relocations
4803         // in the nmethod points into the collection set. We have to
4804         // 'mark' this nmethod.
4805         CodeBlobToOopClosure::do_code_blob(cb);
4806       }
CodeBlobToOopClosure::do_code_blob() does:

    if (!_do_marking) {
      ...
    } else {
==>> we'll always get here, as _do_marking is true in your case. You
     could even drop the do_marking parameter from the
     G1FilteredCodeBlobToOopClosure ctor and just pass true to the
     CodeBlobToOopClosure ctor, as sketched below. Are we going to
     re-use the same closure somewhere else in the future?
      MarkingCodeBlobClosure::do_code_blob(cb);
    }
  }
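For instance (a minimal sketch only; I'm guessing at the parameter
order of the ctor, with the names taken from the webrev):

  G1FilteredCodeBlobToOopClosure(G1CollectedHeap* g1, OopClosure* cl) :
    // this closure always marks, so hard-wire do_marking to true
    CodeBlobToOopClosure(cl, /* do_marking */ true), _g1(g1) { }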
MarkingCodeBlobClosure::do_code_blob() does:
  void MarkingCodeBlobClosure::do_code_blob(CodeBlob* cb) {
    nmethod* nm = cb->as_nmethod_or_null();
==>> you've already checked that this is not NULL
    if (nm == NULL) return;
    if (!nm->test_set_oops_do_mark()) {
      do_newly_marked_nmethod(nm);
    }
  }
and do_newly_marked_nmethod() is a virtual method that does:

  void CodeBlobToOopClosure::do_newly_marked_nmethod(nmethod* nm) {
    nm->oops_do(_cl, /*do_strong_roots_only=*/ true);
  }
Does it make sense to replace the call to
MarkingCodeBlobClosure::do_code_blob() with the following (it will
save you a virtual call per nmethod)?

  if (!nm->test_set_oops_do_mark()) {
    nm->oops_do(...);
  }
Unless you want to keep relying on the implementation of
CodeBlobToOopClosure, in case it changes in the future?
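To make the suggestion concrete, the inlined version could look roughly
like this (just a sketch, stitched together from the snippets above; in
particular, I'm assuming the predicate is applied via nm->oops_do()):

  void G1FilteredCodeBlobToOopClosure::do_code_blob(CodeBlob* cb) {
    nmethod* nm = cb->as_nmethod_or_null();
    if (nm != NULL && !(nm->test_oops_do_mark())) {
      G1PointsIntoCSOopClosure predicate_cl(_g1);
      nm->oops_do(&predicate_cl);
      if (predicate_cl.points_into_cs()) {
        // At least one oop points into the collection set: mark the
        // nmethod and process its oops directly, skipping the virtual
        // do_newly_marked_nmethod() dispatch.
        if (!nm->test_set_oops_do_mark()) {
          nm->oops_do(_cl, /*do_strong_roots_only=*/ true);
        }
      }
    }
  }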
I was also trying to figure out how to save the second oop iteration
(i.e., combine the "any oops in the CSet?" check with the "process
CSet oops" operation), but it's a bit trickier to do without some more
refactoring in the CodeBlob closures. I think we'd need to split
test_set_oops_do_mark() into a "claim nmethod" step and an "add it to
the queue" step, as we'd need to claim the nmethod before potentially
processing its oops, but only add it to the queue if we actually found
oops to process. Let's leave this as a future project...
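Just to record the idea, something along these lines (purely
illustrative; claim_oops_do_mark(), found_cset_oops(), and
add_to_oops_do_mark_list() are made-up names for the pieces we'd need):

  if (nm->claim_oops_do_mark()) {
    // hypothetical closure that processes CSet oops and remembers
    // whether it saw any, all in a single pass
    G1CopyAndCountCSetOopClosure cl(_g1, _cl);
    nm->oops_do(&cl);
    if (cl.found_cset_oops()) {
      // only queue nmethods that will actually need fixing up
      nm->add_to_oops_do_mark_list();
    }
  }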
One related question: will all GC threads now visit all nmethods and
look for CSet oops (as we do that without claiming them)? Does that
increase the root scanning time?
g1CollectorPolicy.cpp:
1424 print_par_stats(2, "GC Worker End", _par_last_gc_worker_end_times_ms);
1425 print_par_stats(2, "GC Worker", _par_last_gc_worker_times_ms);
GC Worker End is the per-worker end timestamp and GC Worker is the
per-worker elapsed time, right? Could I maybe recommend printing the
end timestamp last, so that the parallel time output is bracketed by
the start and end timestamps? Also, since you're making changes here,
maybe rename "GC Worker" to something a bit more understandable, like
"GC Worker Total" or "GC Worker Time"?
Small nits:
g1CollectedHeap.cpp
4771 template<class T> void do_oop_nv(T* p) {
Could you split the line? We generally try to format these so that
template <class T> is on one line and void ... on the second.
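I.e.:

  template <class T>
  void do_oop_nv(T* p) {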
4791 CodeBlobToOopClosure(cl, do_marking),
4792 _g1(g1) { }
Could you maybe merge these two lines? There should be plenty of space.
4796   if (nm != NULL && !(nm->test_oops_do_mark())) {
4797
4798     G1PointsIntoCSOopClosure predicate_cl(_g1);
Why is line 4797 empty?
5446 double start_par = os::elapsedTime();
5447 double end_par;
5473 double par_time = (end_par - start_par) * 1000.0;
5476 double code_root_fixup_time = (os::elapsedTime() - end_par) * 1000.0;
Could I convince you to rename these to:
start_par_time_sec,
end_par_time_sec,
par_time_ms, and
code_root_fixup_time_ms?
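So the block would read something like this (sketch only, with the
intervening code elided):

  double start_par_time_sec = os::elapsedTime();
  double end_par_time_sec;
  ...
  double par_time_ms = (end_par_time_sec - start_par_time_sec) * 1000.0;
  ...
  double code_root_fixup_time_ms =
                        (os::elapsedTime() - end_par_time_sec) * 1000.0;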
On 03/19/2012 01:54 PM, John Cuthbertson wrote:
> Hi Everyone,
>
> Can I have a couple of volunteers review the changes for this CR? The
> webrev can be found at:
> http://cr.openjdk.java.net/~johnc/7143511/webrev.0/.
>
> I am cc'ing the compiler-dev list as I changed the nmethod class to
> make the routine nmethod::test_oops_do_mark() available in non-debug
> builds.
>
> Summary:
> While running SPECjbb2012, it was noticed that the GC Worker Other
> time (printed as part of the PrintGCDetails output) was quite high on
> certain platforms. An investigation discovered that this time was
> coming from the StrongRootsScope destructor (which is executed by the
> VM thread and was being incorrectly attributed as part of the parallel
> time). Most of the time for the StrongRootsScope destructor was coming
> from walking the marked nmethods list, where the nmethods discovered
> by the CodeBlobToOopClosure in SharedHeap::process_strong_roots()
> are walked to fix up any oops that may have moved during the GC.
> Further investigation revealed that the nmethods were being added to
> the marked list unconditionally. With TieredCompilation, the number of
> nmethods in the code cache (and the number of oops in each nmethod)
> significantly increased (~6000 vs. ~700 nmethods and ~115K vs. ~15K
> oops), increasing the time it took to walk the marked nmethod list. It
> was also observed that most of the nmethods on the marked list did not
> contain any pointers into the collection set and so the oops that they
> were pointing to did not move during the GC. The solution was to limit
> the number of nmethods placed on to the marked list by specializing
> the CodeBlobToOopClosure to filter out those nmethods that did not
> point into the collection set.
>
> The other changes include refactoring the code in G1ParTask::work() and
> where the G1ParTask is executed in evacuate_collection_set() so that
> the values printed for the GC Worker Other time and (the new) Code
> Root Fixup time are accurately calculated.
>
> Testing: SPECjbb2012 on several platforms; the GC test suite with heap
> verification after GC enabled.
>
> Thanks,
>
> JohnC