RFR (M): 8087198: G1 card refinement: batching, sorting

Fri Nov 15 09:49:35 UTC 2019

Hi,

On 14.11.19 02:59, Man Cao wrote:
> Thanks for the reviews and testing, very helpful!
> 
> I have addressed all comments, new webrevs:
> http://cr.openjdk.java.net/~manc/8087198/webrev.01/
> Incremental: https://cr.openjdk.java.net/~manc/8087198/webrev.00-01.inc/
> 
> Some tests involving G1AddMetaspaceDependency are crashing in
> debug/fastdebug builds, due to
> https://bugs.openjdk.java.net/browse/JDK-8234127 and G1UpdateBufferSize=1.
> I'm working on fixing this bug for the hashtable.
> I also see some Windows build failure on Submit repo, not sure if it is
> related to JDK-8234127, could someone find the logs for this run?
> Job: mach5-one-manc-JDK-8087198-1-20191113-2315-6689774
> 
> I also tested the performance for sorting order of the cards on
> BigRamTester, below are the CPU time for refinement threads, averaged
> across 34 trials with 95% confidence intervals in brackets:
> base (no batching): 437656.676 [425374.295, 449939.057]
> batching and decreasing order: 424714.529 [412841.728, 436587.330]
> batching and increasing order: 459918.294 [448483.304, 471353.284]

I suggest improving the comment in g1DirtyCardQueue.cpp that "tests 
showed that this order is preferable to not sorting or increasing 
address order" instead of the more unspecific "improves performance".

> Additional replies below.
> ------------------------------------------------------------------------------
>> collect_and_clean_cards could use two-finger compaction of the
>> _node_buffer in place.  (Similar to the SATB buffer filtering.)
> Great! I should have thought of this after removing the MemRegions.
> Now it's doing the two-finger compaction, and no more memcpy and
> ResourceMark.

:)

> ------------------------------------------------------------------------------
>> This change re-reads top() after the fence.
>> ...
>> I *think* re-read is okay for humongous regions too, but haven't fully
>> convinced myself of that yet.
> I thought about this further and convinced myself it is safe.
> I added some DEBUG_ONLY code to check the two reads of top()
> should return the same value, by using a KVHashtable.
> I'm not sure if such checking code is desirable. Please advise.
> Maybe I should use ResourceHashtable as Ioi suggested in JDK-8234127.

The assert looks safe (and the code valid) because in case of a 
safepoint (e.g. remark) the card is re-evaluated with the potentially 
new top anyway.

I.e. there can not be a safepoint with potential removal of the 
humongous regions between cleaning and actual refining. Other old gen 
region's top never change.

The worst that can happen is:

clean_cards(X) // X is a buffer; reads top1 = top

<safepoint-request>
re-enqueue(X)
<remark-safepoint>

clean_cards(X) // re-reads top2 = top
refine_cards(X) // uses top2

(The latter two steps can also occur in the safepoint during the card 
scanning if this is an evacuating safepoint)

So not sure if the KVHashTable check is useful here, but I'm not 
opposing it either.

> ------------------------------------------------------------------------------
>> I already split this CR into two.
>>   From a functional POV hs-tier1-5 pass with the change.
> Thanks for the work! Let's finalize what to do with the DEBUG_ONLY
> KVHashtable first, then do another round of testing.

Okay.

> ------------------------------------------------------------------------------
>> Maybe it is used in follow-up patches in these places?
> I haven't got to the follow-up patches yet.
> I will create a CR for the epoch synchronization protocol, as a subtask for
> JDK-8226731.
> Then I'll describe its implementation details and a few yet unresolved
> corner cases, so we can think about them together.

That is a good idea - it is often easier to think about the issues given 
specific code than some explanation.

One nit: in G1RefineBufferedCards::refine_cleaned_cards, use  "++" 
instead of "+=1".

Looks good otherwise.

Thanks,
   Thomas