RFR: 8146989: Introduce per-worker preserved mark stacks in ParNew

Fri Jan 29 17:12:14 UTC 2016

Thomas,

Apologies for the delay. Inline.

On January 25, 2016 at 11:24:01 AM, Thomas Schatzl (thomas.schatzl at oracle.com) wrote:

Hi Tony, 

On Fri, 2016-01-22 at 16:59 -0200, Tony Printezis wrote: 
> Hi Thomas, 
> 
> Thanks for looking at it. I’m traveling this week (at the JCP EC 
> meeting) so I’ll look at this in more detail early next week. 

No problem. 

> To make sure I understand: So, you’re suggesting that I introduce an 
> abstraction that keeps track of the PreservedMarks array (either as a 

This should have been ‘arrays', not ‘array'...

> separate class, PreservedMarksSet, or maybe as a static field in the 
> PreservedMarks class) so that the logic on how to restore the 
> preserved marks is shared instead of replicated. I think there are a 
> couple of issues with this: 
> 
> * Different GCs have different assumptions on the lifetime of the 
> data structures they allocate for their workers. (From memory) 
> ParallelGC allocates them once upfront and re-uses them at every GC, 
> ParNew allocates them before every GC in an arena and reclaims them 
> at the end of the GC (in the ResourceMark d’tor IIRC), etc. So, IMHO, 
> it will be easier to just embed a PreservedMarks field in the worker 
> state (as I did in the webrev) so that its lifetime / reclamation is 
> dealt with automatically and according to the GC’s assumptions. 

Not sure if this exactly corresponds to my understanding of the 
lifetime of ParScanThreadState (PSS in the following): to me it is a 
helper data structure that is for storing information that comes up 
during the evacuation phase only. 

Sure. And both ParallelGC and G1 have equivalents. But the lifetime of those data structures for each GC is different. For example, the instances of ParScanThreadState are allocated in the resource arena at the start of each ParNew (see the c’tor of ParScanThreadStateSet in parNewGeneration.cpp) and reclaimed at the end of the ParNew, the instances of PSPromotionManager are allocated during JVM initialization (see PSPromotionManager::initialize() in psPromotionManager.cpp) and re-used at every GC, etc. Personally, I like the idea of adding a PreservedMarks field the GC worker state classes and piggy-back on whatever memory management they do. If you want to do something different, and manage the PreservedMarks separately, you’ll know have two data structures with different lifetimes interacting with each other.

Again to me, preserved mark handling/restoration, also because it needs 
to be done in a separate parallel phase after main evacuation, has a 
different lifetime and so its owner should not be the PSS. 

Personally, I don’t think it does. There’s some clean up to be done at the end of the GC that needs to interact with the GC worker state instances (e.g., flush PLABs). Doing preserved mark restoration as part of that process is not unreasonable, IMHO.

The other thing is, the threads restoring the marks may be different 
ones than the ones pushing marks onto them. I.e. use a different amount 
of threads for mark restoration than evacuation to better balance the 
(available) workload. Then having the PSS the "owner" of the stacks 
just gets in the way. 

Each GC already has an array that keeps track of the GC worker state instances. So you can always grab it from there, you don’t need to have an 1-1 relationship between GC worker and GC worker state. (FWIW)

[At least for G1 we might look into that, i.e. using different amount 
of threads for different parallel phases for various reasons quite 
soon.

(Out of curiosity) Do you have a CR for this?

 Which means, please keep the preserved mark stacks separate from 
its PSS in at least G1 :] 

> * I do like the idea of having the code that restores the preserved 
> marks (maybe serially, maybe in parallel) abstracted away. But, each 
> GC maintains its own pool of worker threads and also decides how many 
> workers it uses at every GC (can this still be calculated 
> dynamically?). I think this will result in a slightly messy GC 
> -PreservedMarksSet interface. 

The main idea of this unified marks restoration method would be that it 

- it is known where the mark restoration code is located. (exactly in 
PreservedMarksSet::restore_marks()), and not in more or less randomly 
named methods depending on collector. 

This is a non-requirement, IMHO. Most of the work is done in the restore() method of PreservedMarks. Where that’s called from I don’t think it matters, IMHO. 

- makes the main evacuation method for all collectors a bit more 
uniform. I.e. all do 

evacuate_collection_set(); 
if (evacuation_failed()) { 
PreservedMarksSet::restore_marks() 
} 

- puts mark restoration code in subclasses (searching for subclasses of 
PreservedMarksStack is much easier for me

…and it makes the code more complicated to follow, as you first need to know which subclass is used in each GC. I generally prefer straight-line code, than having to understand a class hierarchy.

Anyway, I need to get this off my plate sooner than later. If you still want me to do the PreservedMarksSet implementation, I can. But tell me how to manage the memory for the PreservedMarks instances. Do I malloc them on the C-heap at the start of GC and reclaim them at the end?

Tony

 than guessing the name of the 
day for a particular collector) instead of putting it somewhere in 
CollectedHeap instances. You could then also remove some mark 
restoration helper types out of the global namespace. 

> * Customizing restore_all_marks() for each GC to address some of the 
> above issues will defeat the purpose of introducing the abstraction 
> in the first place. 

I am aware that the actual code for restoring the marks can not be 
shared. I see it as a way to improve readability of the code. 

> So, even though I understand the attraction of doing what you’re 
> suggesting, it will be more trouble that it’s worth IMHO. And note 
> that my change doesn’t add more duplicated code (and it will actually 
> decrease the amount of duplicated code when we use PreservedMarks in 
> ParallelGC). The code is already duplicated. 

I do not insist implementing this the way I would do that. That idea 
may be done at a different time, or altogether scrapped. 

> Re: long notes: Feel free to delete them. :-) Of course we can create 
> CRs for some of this follow-up. But when someone is reading the code 
> they will not go and look up CRs if they wonder about something, so 
> it’s helpful to have some explanation as a comment. 

Re-reading it again, it's fine with me to keep that note after all. I 
seem to have misunderstood the purpose of this wall of text :) 

> I’ll fix the copyright dates. 

Thanks, 
Thomas 

-----

Tony Printezis | JVM/GC Engineer / VM Team | Twitter

@TonyPrintezis
tprintezis at twitter.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20160129/bfd633d7/attachment.htm>