RFR (M): 8228609: G1 copy cost prediction uses used vs. actual copied bytes

Thomas Schatzl thomas.schatzl at oracle.com
Tue Nov 12 09:06:36 UTC 2019


Hi Kim,

   sorry for the late reply - I have been working on updating this, but 
other things got in between.

On 06.11.19 01:57, Kim Barrett wrote:
>> On Oct 22, 2019, at 1:30 PM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
>>
>> Hi all,
>>
>>   can I have reviews for this change that makes G1 calculate and use the actual number of bytes copied for Object Copy phase estimation?
>>
>> The problem is that the "used" value that is currently used for this can differ a lot from the number of actually copied bytes during the parallel phases.
>>
>> Sources for differences are:
>> - TLAB sizing
>> - TLAB/region fragmentation
>> - all of that multiplied by the number of threads
>>
>> Particularly if the amount of copied data is small compared to the number of regions, all of this can add up and disturb the prediction quite a lot, although overall it's not that bad.
>>
>> It's just that this and other small inaccuracies add up.
>>
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8228609
>> Webrev:
>> http://cr.openjdk.java.net/~tschatzl/8228609/webrev/
>> Testing:
>> hs-tier1-5
>>
>> Thanks,
>>   Thomas
> 
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1ParScanThreadState.cpp
>   105 size_t G1ParScanThreadState::copied_words() {
>   106   size_t result = _surviving_words;
>   107   _surviving_words = 0;
>   108   return result;
>   109 }
> 
> The reset behavior seems unexpected, based on the name, which looks
> like an accessor.
> 
> I think the reset behavior is to avoid double-counting by the
> recording in evacuate_live_objects. That led me to consider suggesting
> a more appropriate place for the reset might be in G1PSTS::flush(),
> where the lab_waste and lab_undo_waste (that were recorded nearby)
> also get reset. But I don't think that flush() is happening in the
> right place to prevent double-counting of the waste values. Bug?

Bug. Fixed with the new version.

> 
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1ParScanThreadState.cpp
>   319     _surviving_words += word_sz;
> 
> Is it really worth having a separate accumulator for the total?  It
> seems like we could instead have copied_words() return the sum over
> the _surviving_young_words.
> 
> But that might not work because of the (lack of) reset in the right
> place, per above.
> 
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1Policy.cpp
>   782       double cost_per_byte_ms = (average_time_ms(G1GCPhaseTimes::ObjCopy) + average_time_ms(G1GCPhaseTimes::OptObjCopy)) / copied_bytes;
> 
> [pre-existing]
> 
> I think this is computing the rate at which active_workers worker
> threads copy bytes. What if active_workers changes?
> 

That is an existing issue, and I believe several other metrics also fail 
to take the number of threads into account properly. There is also the 
issue that this value does not necessarily scale linearly with the 
number of threads (it includes e.g. work stealing time), so simple 
linear interpolation will not work.

I filed https://bugs.openjdk.java.net/browse/JDK-8233985.


Given your above comments I changed the code to collect these values in 
the "Merge Per-Thread State" phase as we only need a total value anyway. 
At this point the necessary calculations are free, as we already iterate 
over all _surviving_young_words entries.

Webrev:
http://cr.openjdk.java.net/~tschatzl/8228609/webrev.0_to_1/ (diff)
http://cr.openjdk.java.net/~tschatzl/8228609/webrev.1/ (full)
Testing:
hs-tier1-5

Thanks,
   Thomas
