RFR (M): 8157952: Parallelize Memory Pretouch

Wed Sep 7 13:42:45 UTC 2016

Hi,

On Wed, 2016-09-07 at 09:01 -0400, Vitaly Davidovich wrote:
> Hi Thomas,
> 
> On Wed, Sep 7, 2016 at 8:50 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
> > Hi Vitaly,
> > 
> > On Wed, 2016-09-07 at 10:54 +0000, Vitaly Davidovich wrote:
> > > Hi Thomas,
> > >
> > > Why does the G1PretouchTask use an atomic ptr add to advance the
> > > touch address?
> > 
> > To make sure that a particular thread gets a unique page to
> > touch.
> > 
> > Otherwise multiple threads might get the same page. While this is
> > not a correctness issue, it's useless work. Also, the OS will
> > likely make all but the first thread wait until the page has been
> > allocated and cleared before they can proceed, i.e. the threads
> > would be serialized again.
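A minimal sketch of this claiming scheme, written in portable C++ with std::atomic and std::thread rather than HotSpot's own Atomic and work gang classes. The names pretouch_dynamic and touch_range and the chunk_size parameter are hypothetical; this is not the patch's G1PretouchTask code. Because fetch_add returns the cursor value from before the addition, each worker receives a range that no other worker can get.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: write one byte per page of [begin, begin + len).
// On a freshly reserved mapping the first write faults the page in.
// Returns the number of pages touched.
std::size_t touch_range(char* begin, std::size_t len, std::size_t page_size) {
  volatile char* base = begin;  // volatile: keep the compiler from dropping the stores
  std::size_t pages = 0;
  for (std::size_t off = 0; off < len; off += page_size) {
    base[off] = 0;
    ++pages;
  }
  return pages;
}

// Sketch of dynamic claiming: workers repeatedly claim the next chunk with
// an atomic fetch_add on a shared cursor until the range is exhausted.
std::size_t pretouch_dynamic(char* start, std::size_t size, std::size_t page_size,
                             std::size_t chunk_size, unsigned num_workers) {
  assert(page_size > 0 && chunk_size > 0 && chunk_size % page_size == 0);
  std::atomic<std::size_t> next_offset{0};  // shared claim cursor
  std::atomic<std::size_t> total_pages{0};  // for checking: no page touched twice
  std::vector<std::thread> workers;
  for (unsigned w = 0; w < num_workers; ++w) {
    workers.emplace_back([&] {
      std::size_t pages = 0;
      while (true) {
        // The old value is returned, so [offset, offset + chunk_size) is ours alone.
        std::size_t offset = next_offset.fetch_add(chunk_size);
        if (offset >= size) break;  // nothing left to claim
        std::size_t len = std::min(chunk_size, size - offset);
        pages += touch_range(start + offset, len, page_size);
      }
      total_pages += pages;
    });
  }
  for (auto& t : workers) t.join();
  return total_pages.load();
}
```

If two workers could ever receive the same chunk, the returned page count would exceed size / page_size; with fetch_add it matches exactly.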
> Oh, I thought each pretouch task is assigned a range of memory to
> pretouch when it's constructed, but apparently they all just "compete"
> to claim the next chunk. Not a big deal, but is there a reason you
> chose to do it that way instead of assigning disjoint ranges to the
> workers before starting the pretouch? Is it to avoid work imbalance
> or something?
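For comparison, a sketch of the alternative raised here: computing disjoint, page-aligned ranges for all workers before any of them starts. static_partition is a hypothetical helper, and byte offsets stand in for addresses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical static split of [0, size) into num_workers disjoint,
// page-aligned [begin, end) ranges. Integer division spreads any
// remainder pages across the workers.
std::vector<std::pair<std::size_t, std::size_t>>
static_partition(std::size_t size, std::size_t page_size, unsigned num_workers) {
  assert(page_size > 0 && num_workers > 0);
  std::size_t pages = (size + page_size - 1) / page_size;  // round up
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  for (unsigned w = 0; w < num_workers; ++w) {
    std::size_t first = pages * w / num_workers;
    std::size_t last = pages * (w + 1) / num_workers;
    // Clamp both ends so a partial last page never runs past size.
    ranges.emplace_back(std::min(first * page_size, size),
                        std::min(last * page_size, size));
  }
  return ranges;
}
```

Each worker would then touch only its own range without any shared counter, but the split is fixed: if one worker is descheduled or its page faults are slower, the others finish early and sit idle, whereas with dynamic claiming they would pick up the remaining chunks.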

There is no particular reason, but IMO it makes the code simpler by
following existing patterns and, as you suggested, automatically
avoids imbalance.

The synchronization required by the atomic add is (IMO) negligible
compared to pre-touching at least one GB of memory for every thread
(i.e. the thread pre-clearing that memory on behalf of the OS).
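A back-of-the-envelope check of that ratio. It assumes a 64 GiB heap, 1 GiB claims (matching "at least one GB" above), 8 workers and 4 KiB pages; all of these numbers are illustrative assumptions. Claims are counted as in a dynamic scheme where every worker exits after exactly one failed claim.

```cpp
#include <cstdint>

// Illustrative units (assumptions, not values from the patch).
constexpr std::uint64_t KiB = std::uint64_t{1} << 10;
constexpr std::uint64_t GiB = std::uint64_t{1} << 30;

// Successful claims needed to cover the range: one atomic add per chunk.
constexpr std::uint64_t successful_claims(std::uint64_t size, std::uint64_t chunk) {
  return (size + chunk - 1) / chunk;
}

// Every worker also performs one final claim that finds the range exhausted.
constexpr std::uint64_t total_atomic_adds(std::uint64_t size, std::uint64_t chunk,
                                          std::uint64_t workers) {
  return successful_claims(size, chunk) + workers;
}

// Pages a worker faults in per successful claim.
constexpr std::uint64_t pages_per_chunk(std::uint64_t chunk, std::uint64_t page) {
  return chunk / page;
}
```

Under these assumptions there are 72 atomic additions in total, against 64 × 262,144 = 16,777,216 page faults, each of which typically has the kernel allocate and zero a page.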

Thanks,
  Thomas