CRR: 7046558 G1: concurrent marking optimizations (S)

Tue Jun 7 20:35:48 UTC 2011

Hi all,

All set thanks to Stefan and a second code review from John Cuthbertson. 
In case anyone wants to have a look, here's the updated webrev(s):

Complete:
http://cr.openjdk.java.net/~tonyp/7046558/webrev.2/webrev.all/

Change 1 (closure specialization):
http://cr.openjdk.java.net/~tonyp/7046558/webrev.2/webrev.0.G1CMOptSpecial/

Change 2 (method inlining):
http://cr.openjdk.java.net/~tonyp/7046558/webrev.2/webrev.1.G1CMOptInline/

Change 3 (misc optimizations):
http://cr.openjdk.java.net/~tonyp/7046558/webrev.2/webrev.2.G1CMOptMisc/

The only one that's changed compared to the previous version is the last 
change. I introduced and use the is_in_g1_heap() method in 
ConcurrentMark to avoid calling into the G1 heap. I also added a comment 
near the start of the claim_region() method which addresses an issue 
that John Cuthbertson brought up.

Tony

On 06/02/2011 04:25 AM, Tony Printezis wrote:
> Stefan,
>
> Thanks for the code review! Good catch about the circular dependency. 
> I re-organized things a bit: I moved scan_object() from the 
> .inline.hpp to the .cpp file (it's only called from there so it should 
> be inlined anyway) so now I don't need to include 
> g1OopClosures.inline.hpp in concurrentMark.inline.hpp.
>
> Here's the updated webrev:
>
> All included:
> http://cr.openjdk.java.net/~tonyp/7046558/webrev.1/webrev.all/
>
> Change 1:
> http://cr.openjdk.java.net/~tonyp/7046558/webrev.1/webrev.G1CMOptSpecial/
>
> Change 2:
> http://cr.openjdk.java.net/~tonyp/7046558/webrev.1/webrev.G1CMOptInline/
>
> Change 3:
> http://cr.openjdk.java.net/~tonyp/7046558/webrev.1/webrev.G1CMOptMisc/
>
> Tony
>
> On 6/1/2011 9:28 AM, Stefan Karlsson wrote:
>> Note that you have introduced a circular dependency between the 
>> files: concurrentMark.inline.hpp and g1OopClosures.inline.hpp.
>>
>> Otherwise, this looks good.
>>
>> StefanK
>>
>> On 06/01/2011 02:19 PM, Tony Printezis wrote:
>>> Hi,
>>>
>>> Could I please have a couple of code reviews for the following change:
>>>
>>> http://cr.openjdk.java.net/~tonyp/7046558/webrev.0/webrev.all/
>>>
>>> It includes a few optimizations in the concurrent marking code. The 
>>> fact that webrev shows that almost 400 lines of code have been 
>>> changed is actually a bit misleading given that the changes are 
>>> actually relatively simple. Here's a breakdown to make your life a 
>>> bit easier:
>>>
>>> Change 1: Specialize the G1CMOopClosure to avoid the virtual call to 
>>> do_oop() every time we visit an oop during marking (the closure used 
>>> to be called CMOopClosure, I prefixed its name with G1 since it's 
>>> been moved out of the concurrentMark.cpp file). Note: I didn't 
>>> change much code here, just moved things around.
>>>
>>> http://cr.openjdk.java.net/~tonyp/7046558/webrev.0/webrev.G1CMOptSpecial/ 
>>>
>>>
>>> Change 2 (builds on Change 1): Move methods 
>>> CM::deal_with_reference() and CM::push() to the .inline.hpp file, 
>>> without changing them, to make sure they get inlined, as they are 
>>> called from the fast path. You can either check line by line that 
>>> they are unchanged. Or you can take my word for it. :-)
>>>
>>> http://cr.openjdk.java.net/~tonyp/7046558/webrev.0/webrev.G1CMOptInline/ 
>>>
>>>
>>> Change 3 (builds on Change 2): A few minor performance improvements 
>>> here and there (plus a couple of cosmetic changes to introduce some 
>>> missing curly brackets).
>>>
>>> http://cr.openjdk.java.net/~tonyp/7046558/webrev.0/webrev.G1CMOptMisc/
>>>
>>> I kept track of the separate patches to be able to evaluate them 
>>> separately so I thought I'd present the changes separately to make 
>>> the code review process a bit easier. I will of course push them as 
>>> a single changeset.
>>>
>>> I was not quite sure how to evaluate the performance impact of the 
>>> changes given that marking runs concurrently and not always in a 
>>> deterministic fashion (from a performance point-of-view!!!). So I 
>>> did the evaluation using collect and analyze. When looking at 
>>> time-based profiles, I see that the above changes decrease the time 
>>> we spend in the marking code (i.e., how long we spend in the 
>>> CMConcurrentMarkingTask::work(int i) method) by between 7%-10%. The 
>>> improvement is similar when looking at cycle counts. And I also see 
>>> a nice decrease in L2 cache misses. Each separate Change does 
>>> improve on the previous one, but the Change with the biggest impact 
>>> was 1.
>>>
>>> Tony
>>>
>>