CRR: 7046558 G1: concurrent marking optimizations (S)

Wed Jun 1 12:19:48 UTC 2011

Hi,

Could I please have a couple of code reviews for the following change:

http://cr.openjdk.java.net/~tonyp/7046558/webrev.0/webrev.all/

It includes a few optimizations in the concurrent marking code. The fact 
that webrev shows that almost 400 lines of code have been changed is 
actually a bit misleading given that the changes are actually relatively 
simple. Here's a breakdown to make your life a bit easier:

Change 1: Specialize the G1CMOopClosure to avoid the virtual call to 
do_oop() every time we visit an oop during marking (the closure used to 
be called CMOopClosure, I prefixed its name with G1 since it's been 
moved out of the concurrentMark.cpp file). Note: I didn't change much 
code here, just moved things around.

http://cr.openjdk.java.net/~tonyp/7046558/webrev.0/webrev.G1CMOptSpecial/

Change 2 (builds on Change 1): Move methods CM::deal_with_reference() 
and CM::push() to the .inline.hpp file, without changing them, to make 
sure they get inlined, as they are called from the fast path. You can 
either check line by line that they are unchanged. Or you can take my 
word for it. :-)

http://cr.openjdk.java.net/~tonyp/7046558/webrev.0/webrev.G1CMOptInline/

Change 3 (builds on Change 2): A few minor performance improvements here 
and there (plus a couple of cosmetic changes to introduce some missing 
curly brackets).

http://cr.openjdk.java.net/~tonyp/7046558/webrev.0/webrev.G1CMOptMisc/

I kept track of the separate patches to be able to evaluate them 
separately so I thought I'd present the changes separately to make the 
code review process a bit easier. I will of course push them as a single 
changeset.

I was not quite sure how to evaluate the performance impact of the 
changes given that marking runs concurrently and not always in a 
deterministic fashion (from a performance point-of-view!!!). So I did 
the evaluation using collect and analyze. When looking at time-based 
profiles, I see that the above changes decrease the time we spend in the 
marking code (i.e., how long we spend in the 
CMConcurrentMarkingTask::work(int i) method) by between 7%-10%. The 
improvement is similar when looking at cycle counts. And I also see a 
nice decrease in L2 cache misses. Each separate Change does improve on 
the previous one, but the Change with the biggest impact was 1.

Tony