RFR: 8220465: Use shadow regions for faster ParallelGC full GCs

Wed Nov 27 13:22:23 UTC 2019

Hi Haoyu,

I've quickly looked through the changes and they look good in general, 
the renaming makes the code easier to follow.

One small knit, the paragraph referring your paper starts with "More 
more details", which I guess should be changed to "For more detials".

Here are updated webrevs:
Full: http://cr.openjdk.java.net/~sjohanss/8220465/04/
Inc: http://cr.openjdk.java.net/~sjohanss/8220465/03-04/

Cheers,
Stefan

On 2019-11-27 03:15, Haoyu Li wrote:
> Hi Thomas,
> 
> Thanks for your constructive suggestions!  I've addressed the issues you
> mentioned, and the updated patches are attached. Please find the details
> below.
> 
>> - static const ints can be initialized in the definition (UNUSED,
>> SHADOW, ...); also they should be CamelCased; they are very unspecific
>> too - I added some prefix to distinguish them a bit.
>>
>> Now these values of _shadow_state are initilized in the definition and
> CamelCased. I have also changed their names to ba more specific: a region
> is *UnusedRegion* when untouched, and will become *NormalRegion* if
> processed with the original parallel full GC algorithm; if an idle GC
> thread steals an unavailable region to process it with the help of a shadow
> region, the thread will mark it to *ShadowRegion*, *FilledShadow*, and
> *CopiedShadow* in sequence.
> 
> - the documentation about this change is imho lacking.
>>
>>    - It would be nice to explain the idea of shadow regions somewhere
>> assuming that you know how parallel works. Including the reference to
>> the paper. :)
>>
>>    - some of the comments just show what code (often a single statement)
>> does, not the what and why or the reason why a particular method or
>> member exists. Or explains one or the other.
>>
>> E.g.
>>
>> "The shadow region array, we use it in a LIFO fashion, so that we can
>> reuse shadow regions for better data locality and utilization"
>>
>>     - at this point we have no idea what a "shadow region" is and we
>> can't find out easily because it is called "shadow record" or "steal
>> record" elsewhere.
>>
>> Something better could be:
>>
>> "Contains currently free shadow regions (assuming we converge on that
>> name). We use it in a LIFO fashion for better data locality and
>> utilization."
>>
>> Thanks for pointing out the lack of illustration. I've added several
> paragraphs to demonstrate the main idea, the typical workflow, and the
> source paper of shadow region optimization in the comments of
> ParallelCompact::initialize_shadow_region(). Besides, I've also checked
> other comments in the patch and made them more precise.
> 
> 
>> - I think there is a missed optimization opportunity in (now)
>> PSParallelCompact::initialize_shadow_regions(). There, the code
>> initializes the "free" region ids to region_at_top+1 to end_region of a
>> particular space.
>>
>> If the top for a given space is at a region boundary (e.g. if a space is
>> empty, which is probably common for one of the survivor spaces), you
>> loose a single region per space.
>>
>> One reason might that the code uses region "0" as sentinel to indicate
>> "there is no shadow region available" in
>> ParCompactionManager::acquire_shadow_region().
>>
>> This could be fixed by improving the code in
>> PSParallelCompact::initialize_shadow_regions() and use a sentinel region
>> value of (size_t)~0 (as an explicit constant).
>>
>> Even if you do not change this, please introduce an explicit constant
>> for this sentinel value. This makes the code more self-explanatory.
>>
>> Sorry for the misleading +1 operation. The +1 can be safely removed. The
> sentinel value 0 does not cause this design because the first region (in
> old space) cannot be a shadow region.
> 
> 
>> - at least in ParallelCompactData::RegionData::try_steal I would add a
>> dirty read of the _shadow_state to avoid the overhead of obviously
>> unsuccessful steal attempts (I do not know about frequencies of those,
>> so ymmv, but probably it would be easiest to add it everywhere).
>>
>> Also all the cmpxchg can/should use memory_order_relaxed to avoid the
>> two full fences every time accessed as far as I can tell.
>>
>> Excellent suggestions! I didn't consider the performance factors in these
> atomic instructions before, and you're right that GC threads may suffer
> many failures the first time getting shadow regions. Changing the memory
> order to memory_order_relaxed is also helpful.
> 
> 
>> - not sure about whether "acquire_shadow_region()/release_shadow_region"
>> are good names for
>> "PSParallelCompact::try_pop_shadow_region/push_shadow_region" (or
>> something else).
>>
>> "Acquire"/"release" has a very specific semantic related to a completely
>> different area (memory ordering in MP systems), so we should probably
>> avoid using them. There are other well-used pairs of names to add and
>> remove elements to a container too.
>>
>> Naming functions with "Acuire" and "Release" is indeed misleading. I've
> changed these functions to push/pop_shadow_region,
> push/pop_shadow_region_mt_safe, and remove_all_shadow_regions in
> PSParallelCompact to fit the underlying LIFO _shadow_region_array. Do you
> think these names are appropriate now?
> 
> 
>> - the changes in PSParallelCompact sometimes use the terms
>> "steal_record", "shadow_record" and "shadow_region" (e.g.
>> _shadow_region_array) interchangeably.
>>
>> Can you give a reason for this? I am good with any (with a preference
>> for "shadow_region" since it gives an idea of the contents while
>> "record" is quite generic), but it makes reading the code harder than
>> necessary.
>>
>> Sorry for the inconsistency between variable names. I introduced
> steal_record to record the index of the next shadow region, so that a GC
> thread could seek shadow regions from the last point instead of the
> beginning. To make the code more specific, I've changed the variable
> _shadow_record to _next_shadow_region and moved the code in
> PSParallelCompact::initialize_shadow_record to
> PSParallelCompact::initialize_shadow_regions.
> 
> 
>> - the names of the new methods e.g. in PSParallelCompact::RegionData
>> should be more precise; e.g. please add what does "try_push" wants to
>> push? Or "try_steal" steal?
>> Not even the comments for these contain that information, and I believe
>> that by better naming of the methods, we can avoid the comments
>> completely in most cases.
>>
>> Sorry for the vague code. These five atomic interfaces intend to change
> the _shadow_state of the current region to reflect the collection process,
> not to push or steal anything.  I've changed try_push to mark_normal and
> try_steal to mark_shadow, respectively. The _shadow_state and the return
> value of these functions can help the collector to determine 1) whether a
> region should be collected by the shadow region optimization and 2) if the
> data in a shadow region are ready to be copied back to the corresponding
> heap region.
> 
> Thanks again for your valuable reviews. If there are any further problems,
> please contact me at any time. I was also wondering could you please CC the
> following mails to me? There seem some problems with my email, and I didn't
> receive your last mail until I searched the mail lists in OpenJDK website.
> Thanks very much!
> 
> Best Regards,
> Haoyu Li,
> Institute of Parallel and Distributed Systems(IPADS),
> School of Software,
> Shanghai Jiao Tong University
>