RFR(M/L): 7145569: G1: optimize nmethods scanning

Fri Jun 14 12:27:23 UTC 2013

Hi John,

On 2013-06-14 00:30, John Cuthbertson wrote:
> Hi Everyone,
>
> Can I have a few volunteers look over the changes for this CR - ideally
> I would like at least one person from the compiler team and one from the
> GC team.
>
> Summary:
> For some workloads with G1, the scanning of the Code cache cache can
> take quite a long time. In fact there have been cases where the scanning
> of the code cache is the dominator of the pause and the other worker
> threads sit an spin in the termination code while the worker that
> claimed the code cache scanning task chugs along. Part of the reason is
> that the list of nmethods with scavengable oops is a single root task.
> Another part, specifically affecting G1, is that the presence of an
> nmethod in the list depends on the definition of is_scavengable. For the
> other collectors, when the oops in an nmethod are promoted to the old
> generation that nmethod ceases to be scavengable and is pruned from the
> scavengable list. For G1, because we can collect "old" data during a
> mixed evacuation pause we can't prune the scavengable nmethod list.
>
> The changes are my attempt at a solution outlined by Igor Veresov.
>
> I have added a list of nmethods to each HeapRegion. That list is
> populated by nmethods that contain roots that point into that heap
> region. The lists are populated when an nmethod is created (or in the
> case of C1 when it is patched).
>
> During an evacuation pause we skip scanning the entire code cache and
> instead scan the oops in each nmethod attached to a region when we scan
> that regions RSet. Near the end of the pause we have migrate the
> nmethods attached to regions in the collection to regions in to-space.
> During an initial mark pause, we walk the nmethods attached to all heap
> regions (other than those in the collection set - they're marked when
> their nmethod lists are scanned). During a full GC we empty the nmethod
> list attached to each region and then rebuild it - similar to what
> happens to the RSets.

Just to make sure I understand this change, for the G1 class unloading 
work I don't want the nmethods to act as strong roots. So I guess I'll 
need to remove the call to mark_strong_code_roots.
In the class unloading case only nmethods on the execution stack should 
be treated as strong roots and they are taken care of by 
Threads::possibly_parallel_oops_do (from process_strong_roots).

>
> To verify the integrity of these nmethod lists I've extended the heap
> verification. When scanning the code cache we check that an nmethod is
> attached to heap regions that it points into. When verifying the heap
> regions we ensure that the nmethods attached to a region contain at
> least one pointer into that region. This additional verification can add
> some time so I recently put it under control of a diagnostic flag
> (testing was performed with it enabled).

Did you write some sort of verification code to make sure we don't miss 
any nmethods which we used to scan with SO_CodeCache but don't right 
now? I'm not sure we need that in the code right now I'm just curious.

>
> As part of these changes I moved some code around (specifically the
> verification code) in g1CollectedHeap.[ch]pp and heapRegion.[ch]pp. The
> purely clean up changes can be found at:
> http://cr.openjdk.java.net/~johnc/7145569/webrev.0.code-movement/

I looked through this. I can't say that I understand your exact reasons 
for this change but I trust you on this.

>
> The webrev containing just the functionality can be found at:
> http://cr.openjdk.java.net/~johnc/7145569/webrev.1.optimize-nmethod-scanning/

g1CollectedHeap.cpp:
I'm curious as to why you make G1VerifyCodeRootBlobClosure inherit 
CodeBlobToOopClosure instead of just replacing
+      CodeBlobToOopClosure::do_code_blob(cb);
with "nm->oops_do(_oop_cl);"

CodeBlobToOopClosure has all sorts of complex code to handle "claiming" 
nmethods to only scan them once but you bypass that by setting 
do_marking = false.
The claiming is done in MarkingCodeBlobClosure::do_code_blob and it then 
calls back up to do_newly_marked_nmethod.
This complexity makes the code difficult to follow.

It would be nice if you could move some of the nmethod scanning 
functionality to a new file instead of growing g1CollectedHeap.cpp even 
more. Several of the functions you added to G1CollectedHeap (together 
with their closures) could be factored out into a a G1NMethodManager 
class or something. It would be nice to keep g1CollectedHeap.cpp from 
growing to 7000 lines...

heapRegion.cpp:
+void HeapRegion::migrate_strong_code_roots() {
(...)
+  GrowableArray<nmethod*> to_be_retained(10);
The backing array for this object is resource-allocated.
Are we fine with only having a ResourceMark all the way up in 
evacuate_collection_set()?

heapRegion.cpp (nits):
It seems strange to add _strong_code_root_list as a new line given that 
the other entries in the initializer list are not on their own lines.
      _rem_set(NULL), _recorded_rs_length(0), _predicted_elapsed_time_ms(0),
-    _predicted_bytes_to_copy(0)
+    _predicted_bytes_to_copy(0),
+    _strong_code_root_list(NULL)

It seems to be more common to have the asterisk next to the type in 
HotSpot, just as you did two lines up.
+    nmethod *nm = _strong_code_root_list->at(i);

>
>
> The whole shebang can be found at:
> http://cr.openjdk.java.net/~johnc/7145569/webrev.all
>
> Testing:
> GC test suite with C1, C2, and Tiered on x64 and sparc (Xmixed and
> Xcomp) - with and without verification (including the extra verification);
> a few jprt runs
> Kitchensink (4 hour runs) with C1, C2, Tiered on x64 (Xmixed and Xcomp)
> - with and without verification (including the extra verification).

Did you do any performance testing?

/Mikael

>
> Future enhancements:
> * Currently I'm using a growable array for the nmethod lists but in the
> long term I want to move to a more suitable data structure. Perhaps a
> specialized version of Stack.
> * Currently I migrate nmethods from regions in the collection set to
> regions in to-space in a separate phase. Ideally this should be done
> when we're finished scanning a region's RSet. When we do this, migration
> will be performed in parallel and two threads could be pushing to a
> to-space regions nmethod list at the same time - we will need an
> "atomic" push operation. When the compilers push an nmethod, the do it
> while holding the CodeCache_lock so such an operation is, currently,
> unnecessary.
>
> Thanks,
>
> JohnC
>
>