RFR: 8146989: Introduce per-worker preserved mark stacks in ParNew

Fri Feb 5 14:18:43 UTC 2016

Updated webrev with a small change suggested by John Coomes (CCed) to move the in_c_heap arg to the c’tor instead of the init() / reclaim() methods:

http://cr.openjdk.java.net/~tonyp/8146989/webrev.3/

Tony

On February 3, 2016 at 1:34:15 PM, Tony Printezis (tprintezis at twitter.com) wrote:

Thomas,

Is that what you were looking for?

http://cr.openjdk.java.net/~tonyp/8146989/webrev.2/

I haven’t included the PreservedMarks / PreservedMarksSet abstractions in ParallelOld yet, but I browed the code and it will be straightforward to do so (I implemented them with that in mind…).

Tony

On February 2, 2016 at 6:57:37 AM, Thomas Schatzl (thomas.schatzl at oracle.com) wrote:

Hi Tony,

On Fri, 2016-01-29 at 12:12 -0500, Tony Printezis wrote:
> Thomas,
>
> Apologies for the delay. Inline.
>
> > On January 25, 2016 at 11:24:01 AM, Thomas Schatzl (
> > thomas.schatzl at oracle.com) wrote:Hi Tony,
> >
> > On Fri, 2016-01-22 at 16:59 -0200, Tony Printezis wrote:
> > > Hi Thomas,

[...]
> > > * Different GCs have different assumptions on the lifetime of the
> > > data structures they allocate for their workers. (From memory)
> > > ParallelGC allocates them once upfront and re-uses them at every
> > > GC, ParNew allocates them before every GC in an arena and
> > > reclaims them at the end of the GC (in the ResourceMark d’tor
> > > IIRC), etc. So, IMHO, it will be easier to just embed a
> > > PreservedMarks field in the worker state (as I did in the webrev)
> > > so that its lifetime / reclamation is
> > > dealt with automatically and according to the GC’s assumptions.
> >
> > Not sure if this exactly corresponds to my understanding of the
> > lifetime of ParScanThreadState (PSS in the following): to me it is
> > a helper data structure that is for storing information that comes
> > up during the evacuation phase only.
> This should have been ‘arrays', not ‘array'...

> Sure. And both ParallelGC and G1 have equivalents. But the lifetime
> of those data structures for each GC is different. For example, the
> instances of ParScanThreadState are allocated in the resource arena
> at the start of each ParNew (see the c’tor of ParScanThreadStateSet
> in parNewGeneration.cpp) and reclaimed at the end of the ParNew, the
> instances of PSPromotionManager are allocated during JVM
> initialization (see PSPromotionManager::initialize() in
> psPromotionManager.cpp) and re-used at every GC, etc. Personally, I
> like the idea of adding a PreservedMarks field the GC worker state
> classes and piggy-back on whatever memory management they do. If you
> want to do something different, and manage the PreservedMarks
> separately, you’ll now have two data structures with different
> lifetimes interacting with each other.

> > Again to me, preserved mark handling/restoration, also because it
> > needs to be done in a separate parallel phase after main
> > evacuation, has a different lifetime and so its owner should not
> > be the PSS.
> Personally, I don’t think it does. There’s some clean up to be done
> at the end of the GC that needs to interact with the GC worker state
> instances (e.g., flush PLABs). Doing preserved mark restoration as
> part of that process is not unreasonable, IMHO.

Well, similar arguments could be said why other data structures that
are ultimately filled in with information from the PSS are not in the
PSS.

I can see your point, but I hope we can agree to disagree on this right
now. Let's just leave this change out now unless somebody else has a
strong opinion.

> > [At least for G1 we might look into that, i.e. using different
> > amount of threads for different parallel phases for various reasons
> > quite soon.
> (Out of curiosity) Do you have a CR for this?

Multiple (I am including some related to parallization here):

8034842: Parallelize the Free CSet phase in G1
7068229: G1: Dynamically enable MT reference processing for remark
pauses
8043575: Dynamically parallelize reference processing work
8143024: Make aggregate-data phase concurrent
8133051: Concurrent refinement threads may be activated and deactivated
at random
8076462: Preserving the referents of concurrent mark work distribution
method causes long pause times
8057000: Increase parallelism in String deduplication fixup
8040006: Merge "Other" time parallel phases into a single

Note that these are not complete, and just what I found quickly when
browsing through issues with the g1-gc label.

Also, apart from the general hint that we are aware of problems in a
particular area, there may be some changes in other components for jdk9
that might obsolete some of these CRs, or invalidate the approach
indicated.

So if you want to spend time on one or another problem, please give a
heads up. :)

> > - it is known where the mark restoration code is located. (exactly
> > in
> > PreservedMarksSet::restore_marks()), and not in more or less
> > randomly
> > named methods depending on collector.
> This is a non-requirement, IMHO. Most of the work is done in the
> restore() method of PreservedMarks. Where that’s called from I don’t
> think it matters, IMHO.

Again, just a matter of taste - it is much easier to understand stuff
if things work similarly and are used in a similar way in the different
collectors. Otherwise there is always a certain additional reluctance
to dig into e.g. CMS code for some particular purpose just because the
same things are already named quite differently, and the same things
are done very differently. (I am not advocating a try-to-share
-everything-everywhere approach here; but consistent naming and use of
the same concepts in the same way would decrease that hurdle :))

It is not a big issue. Let's leave this refactoring here then.

> > - makes the main evacuation method for all collectors a bit more
> > uniform. I.e. all do
> >
> > evacuate_collection_set();
> > if (evacuation_failed()) {
> > PreservedMarksSet::restore_marks()
> > }
> >
> > - puts mark restoration code in subclasses (searching for
> > subclasses of
> > PreservedMarksStack is much easier for me
> …and it makes the code more complicated to follow, as you first need
> to know which subclass is used in each GC. I generally prefer
> straight-line code, than having to understand a class hierarchy.

:)

> Anyway, I need to get this off my plate sooner than later. If you
> still want me to do the PreservedMarksSet implementation, I can. But
> tell me how to manage the memory for the PreservedMarks instances. Do
> I malloc them on the C-heap at the start of GC and reclaim them at
> the end?

I am fine with the current approach if you do not feel like spending
time on it.

Thanks,
Thomas

-----

Tony Printezis | JVM/GC Engineer / VM Team | Twitter

@TonyPrintezis
tprintezis at twitter.com

-----

Tony Printezis | JVM/GC Engineer / VM Team | Twitter

@TonyPrintezis
tprintezis at twitter.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20160205/bc60a014/attachment.htm>