RFR: 8151556: Use the PreservedMarks* classes for the G1 preserved mark stacks

Thu Mar 24 22:07:16 UTC 2016

Thomas,

Inline.

On March 22, 2016 at 7:01:32 AM, Thomas Schatzl (thomas.schatzl at oracle.com) wrote:

Hi Tony, 

some comments to your questions: 

On Thu, 2016-03-10 at 12:56 -0500, Tony Printezis wrote: 
> Change to use the newly-introduced PreservedMarks* classes for 
> preserving marks in G1: 
> 
> http://cr.openjdk.java.net/~tonyp/8151556/webrev.0/ 
> 
> G1 already had per-worker mark stacks so this change was mostly 
> straightforward. A few comments / questions: 
> 
> - The main change to the PreservedMarks* classes is to add the 
> par_restore() method so that G1 continues doing that phase in 
> parallel. I changed the PreservedMarks::restore() method to do what 
> G1 was doing (i.e., keep popping entries from the stack until the 
> stack is empty) instead of iterating over the stack, then clearing 
> it. The latter was, for some reason, faster in the serial case (what 
> we’ve had so far in PreservedMarks). However, it doesn’t seem to 
> parallelize well. 

Contention in free() when everyone is finished? 

I think so. Interleaving free with mark restoration seems to help a bit. FWIW, we also increased the stack segment size to 64K and that also improves things further...

Some experiments in this area (with G1) showed that the work 
distribution is far from optimal btw, particularly for smaller sets of 
marks to restore. 

Yes. BTW, given that the stacks are already chunked it might be relatively easy to take advantage of that to achieve better load balancing. Not sure it’s worth spending more time on this, though….

> So, I opted to copy what G1 was doing so that at least there’s no 
> regression in G1. I also changed the logic slightly: the parallel 
> workers now claim tasks to do (using the SequentialSubTasksDone 
> class) instead of only doing the one that corresponds to their worker 
> id (what G1 was doing before). This doesn’t seem to penalize this 
> phase (at least in the tests I ran) and it’s a bit safer (if one 
> worker wakes up late, maybe another one will do the work). 

If a worker is later, this is still a problem, because if it is late it 
is often very late. And at the end of that parallel phase they need to 
synchronize anyway. 

Of course. But, if it eventually starts up late at least all the work will be done and the phase can complete immediately (instead of waiting even longer). No, if the worker takes 1min to start up, this of course won’t really help. :-) But in the common case, it might.

Tony

> - Changed the max cache size of the stacks to 0 so that the stacks do 
> not cache any segments. Given that we don’t expect promotion / 
> evacuation failures to happen very frequently, caching segments on 
> these stacks is mostly wasted space. I also added asserts to ensure 
> that there are no cached segments when the stacks are supposed to be 
> empty. 

Okay. 

> - I wanted to have one method that has the logic to initialize an 
> object’s mark word after promotion / evacuation failure. Both ParNew 
> / ParallelGC use the RemoveForwardedPointerClosure, but G1 has its 
> own closure (as it has to do some extra work). I introduced the 
> init_forwarded_mark() method that’s called from both places. I could 
> also make the G1 closure a subclass of RemoveForwardedPointerClosure 
> and just call the do_object() on the parent. Or drop the idea of 
> using the same method. Feedback welcome. 

The current method looks fine to me. 

> 
> - Now that G1 doesn’t use the OopAndMarkOop and OopAndMarkOopStack 
> classes directly I made them private to PreservedMarks. 
> 
> - Small re-arranging of the #include declarations in 
> preservedMarks.inline.hpp as I had put them before the #ifndef guard 
> for some reason (oops! sorry…) 
> 
> - As it was straightforward, I also call par_restore() from ParNew 
> too. I opted to decide whether to call the serial or parallel cases 
> in DefNew based on whether workers() is NULL or not. I can also do 
> that with a virtual method on DefNew which is overwritten in ParNew. 
> Your call. BTW, I can do this change on a separate CR too if it’s 
> considered outside the scope of this one (but it was pretty trivial 
> so I think it’s OK piggy-backing it here). 

I think the suggestion in the review thread for the extracted parallel 
header restoration is best in terms of size of interface, lines of code 
and code reuse opportunities. I.e. there does not seem to be a need to 
use inheritance here imho. 

Thanks, 
Thomas 

-----

Tony Printezis | JVM/GC Engineer / VM Team | Twitter

@TonyPrintezis
tprintezis at twitter.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20160324/aa94ad61/attachment.htm>