RFR: 8151556: Use the PreservedMarks* classes for the G1 preserved mark stacks

Thomas Schatzl thomas.schatzl at oracle.com
Fri Mar 25 12:20:30 UTC 2016


Hi,

On Thu, 2016-03-24 at 18:07 -0400, Tony Printezis wrote:
> Thomas,
> 
> Inline.
> 
> On March 22, 2016 at 7:01:32 AM, Thomas Schatzl (
> thomas.schatzl at oracle.com) wrote:
> > Hi Tony, 
> > 
> > some comments to your questions: 
> > 
> > On Thu, 2016-03-10 at 12:56 -0500, Tony Printezis wrote: 
> > > Change to use the newly-introduced PreservedMarks* classes for 
> > > preserving marks in G1: 
> > > 
> > > http://cr.openjdk.java.net/~tonyp/8151556/webrev.0/
> > > 
> > > G1 already had per-worker mark stacks so this change was mostly 
> > > straightforward. A few comments / questions: 
> > > 
> > > - The main change to the PreservedMarks* classes is to add the 
> > > par_restore() method so that G1 continues doing that phase in 
> > > parallel. I changed the PreservedMarks::restore() method to do what 
> > > G1 was doing (i.e., keep popping entries from the stack until the 
> > > stack is empty) instead of iterating over the stack, then clearing 
> > > it. The latter was, for some reason, faster in the serial case 
> > > (what we’ve had so far in PreservedMarks). However, it doesn’t 
> > > seem to parallelize well. 
> > 
> > Contention in free() when everyone is finished? 
> I think so. Interleaving free with mark restoration seems to help a
> bit. FWIW, we also increased the stack segment size to 64K and that
> also improves things further...
> 

Actually, I am pretty sure this was the case when doing the work for
G1. A 64k stack segment size did not seem to help much, and given that
people tend to run with too many threads (tm), the extra memory seemed
like a waste.

Also, an evacuation failure should be a kind of last resort; it seemed
to me that it is better to work on minimizing the occurrences and
their "size".

However I am not opposed to changing the stack segment size.
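For illustration, the pop-until-empty restore() described above could be sketched roughly like this. This is a simplified stand-in, not the actual HotSpot code: PreservedMark here uses placeholder types instead of oop/markOop, and std::stack stands in for the real chunked stack (where popped segments can be freed as restoration proceeds, interleaving free() with the restore work):

```cpp
#include <cassert>
#include <cstddef>
#include <stack>

// Placeholder for an (object, saved mark word) pair; the real class
// stores oop/markOop entries in a chunked stack.
struct PreservedMark {
  int* obj;   // object whose header was overwritten
  int  mark;  // the original mark word to put back
};

struct PreservedMarks {
  std::stack<PreservedMark> _stack;

  void push(int* obj, int mark) { _stack.push(PreservedMark{obj, mark}); }

  // Restore by popping entries until the stack is empty, as the G1
  // code did, rather than iterating over the stack and then clearing
  // it in a separate pass. Returns the number of marks restored.
  size_t restore() {
    size_t restored = 0;
    while (!_stack.empty()) {
      PreservedMark p = _stack.top();
      _stack.pop();
      *p.obj = p.mark;  // write the saved mark word back into the header
      ++restored;
    }
    return restored;
  }
};
```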

> > Some experiments in this area (with G1) showed that the work 
> > distribution is far from optimal btw, particularly for smaller 
> > sets of marks to restore. 
> Yes. BTW, given that the stacks are already chunked it might be 
> relatively easy to take advantage of that to achieve better load 
> balancing. Not sure it’s worth spending more time on this, though.

I kind of agree, for the reasons stated above already.

> > > So, I opted to copy what G1 was doing so that at least there’s no
> > > regression in G1. I also changed the logic slightly: the parallel
> > > workers now claim tasks to do (using the SequentialSubTasksDone 
> > > class) instead of only doing the one that corresponds to their 
> > > worker id (what G1 was doing before). This doesn’t seem to 
> > > penalize this phase (at least in the tests I ran) and it’s a bit 
> > > safer (if one worker wakes up late, maybe another one will do the 
> > > work). 
> > 
> > If a worker is late, this is still a problem, because if it is 
> > late it is often very late. And at the end of that parallel phase 
> > they need to synchronize anyway. 
> Of course. But if it eventually starts up late, at least all the work
> will be done and the phase can complete immediately (instead of
> waiting even longer). If the worker takes 1 min to start up, this of
> course won’t really help. :-) But in the common case, it might.

Sure, no objections here. Just not to get hopes up too much.
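The claiming scheme being discussed can be sketched with an atomic ticket counter standing in for SequentialSubTasksDone (the names and structure here are illustrative, not the HotSpot API): each worker keeps claiming the next unprocessed per-worker stack until none remain, so a worker that starts late cannot leave work undone.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Illustrative stand-in for SequentialSubTasksDone: hands out task
// indices 0.._count-1, each to exactly one claimer.
struct SubTasks {
  std::atomic<unsigned> _next{0};
  const unsigned _count;
  explicit SubTasks(unsigned count) : _count(count) {}

  // Returns true and sets *task if an unclaimed task remains.
  bool try_claim(unsigned* task) {
    unsigned t = _next.fetch_add(1);
    if (t >= _count) return false;
    *task = t;
    return true;
  }
};

// Run 'nworkers' threads that each keep claiming stacks to "restore".
// Returns, per stack, how many times it was processed; try_claim hands
// each index to a single worker, so every count should be exactly 1.
std::vector<int> run_par_restore(unsigned nstacks, unsigned nworkers) {
  std::vector<int> processed(nstacks, 0);
  SubTasks tasks(nstacks);
  auto worker = [&]() {
    unsigned task;
    while (tasks.try_claim(&task)) {
      processed[task] += 1;  // stand-in for restoring stack 'task'
    }
  };
  std::vector<std::thread> threads;
  for (unsigned i = 0; i < nworkers; i++) threads.emplace_back(worker);
  for (std::thread& t : threads) t.join();
  return processed;
}
```

Contrast this with the old scheme, where worker i only ever touched stack i: there, one late worker stalls the whole phase on its own stack.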

Thanks,
  Thomas




More information about the hotspot-gc-dev mailing list