Increased ScanRS time when decreasing G1RSetUpdatingPauseTimePercent
Thomas Schatzl
thomas.schatzl at oracle.com
Mon Jan 20 14:17:16 UTC 2020
Hi Joakim,
On 19.01.20 12:02, Joakim Thun wrote:
> Hi all,
>
> I would really appreciate some help understanding a G1 behaviour I am
> seeing when decreasing the value of G1RSetUpdatingPauseTimePercent where
> the goal is to decrease the time spent in the UpdateRS phase by moving
> some of the work to be processed concurrently by the refinement threads.
>
> The behaviour I was expecting to see was a decrease in UpdateRS time
> which I am seeing but at the expense of more time being spent in the
> ScanRS phase so the end result i.e. the total pause time end up being
> very similar with and without the flag set. Decreasing
> G1RSetUpdatingPauseTimePercent to both 5 and 1 results in similar
> behaviour. I noticed that the number of scanned cards is much higher in
> the ScanRS phase when decreasing G1RSetUpdatingPauseTimePercent.
>
> Is this expected behaviour?
>
TLDR: yes.
Longer version:
The purpose of the refinement threads and the refinement queues (which
are processed during Update RS) is to update the remembered sets (whose
processing is attributed to the Scan RS time) after some filtering (is
that card already in a remembered set? Can we drop it for other reasons?).
If an entry/card in the refinement queues has not been processed before
GC, it must be during GC (not the entire filtering needs to be applied
there).
What is cheaper to do during GC, scanning remembered sets or refinement
queues? Depends on the contents of the card. If it contains references
to a lot of regions in the collection set, then it is probably cheaper
to let it stay in the refinement queue. If it does not contain a
reference to any region in the collection set, then putting it into the
remembered sets is a win because we moved otherwise unnecessary work
out of the pause.
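To make the trade-off above concrete, here is a toy cost model. It is only a sketch: the unit costs are made-up numbers chosen for illustration, not measurements and not how G1 accounts for pause time, and both function names are hypothetical.

```python
def pause_cost_if_left_in_queue(cset_regions_referenced,
                                queue_cost=1.0, scan_cost=5.0):
    # A card still in the refinement queue is always examined during the
    # pause, whether or not it turns out to reference the collection set,
    # so the argument does not change the result in this toy model.
    return queue_cost + scan_cost


def pause_cost_if_refined(cset_regions_referenced,
                          remset_cost=1.5, scan_cost=5.0):
    # A refined card only costs pause time if it ended up in the
    # remembered set of at least one collection-set region.
    if cset_regions_referenced == 0:
        return 0.0
    # Assumption: one remembered-set lookup per referencing region,
    # then a single scan of the card.
    return cset_regions_referenced * remset_cost + scan_cost


# With these hypothetical costs, refining wins for a card that does not
# reference the collection set, and the queue wins for a card that
# references many collection-set regions.
no_refs = (pause_cost_if_left_in_queue(0), pause_cost_if_refined(0))
many_refs = (pause_cost_if_left_in_queue(4), pause_cost_if_refined(4))
```

The crossover point depends entirely on the assumed costs; the model only illustrates why neither location is always the cheaper one.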
There are a lot of different arguments about what the optimal location
for a card should be; some of these decisions have impact outside of the
gc pause too.
E.g. a card in the refinement queue that has not been processed yet is
never re-enqueued - this saves enqueuing and processing work at mutator
time; however, given that such cards may not contain any references into
the collection set (which you only know once you process them), keeping
them would make the pause time slightly longer.
As long as the card in the refinement buffer contains a reference to the
collection set, G1 would scan it anyway (it would be in some remembered
set), and retrieving values from the refinement queue during gc is (very
slightly) faster than from the remembered sets.
Overall there is no rule that "Update RS" work is bad while "Scan RS" isn't.
In your case, since you are trading Update RS time for Scan RS time, I
would argue that it's better to have the cards in the refinement queue.
> Are there any other flags worth considering to improve the ScanRS time
> while moving more work to the refinement threads?
One could try to control refinement work manually by setting the various
thresholds. There is no guarantee that this improves your situation.
Logging "gc+ergo+refine=debug" may help with debugging the adaptive
refinement thresholds; gc+remset=trace gives some general information
about concurrent refinement.
Some rundown on the options:
G1UseAdaptiveConcRefinement: enable adaptive refinement, i.e. try to
observe G1RSetUpdatingPauseTimePercent.
G1UpdateBufferSize (default 256): size of a buffer in the refinement
queue, i.e. individual threads will cache that amount of cards to
process later until they are made available to the refinement threads.
G1ConcRefinementGreenZone, G1ConcRefinementYellowZone,
G1ConcRefinementRedZone: some thresholds that control refinement
threads. If the number of buffers (see above) is lower than the green
threshold, there is no concurrent refinement activity. From the green to
the yellow threshold, increasingly more concurrent refinement threads
will be used. If the number of buffers reaches the red threshold,
mutator threads will do the work.
If G1UseAdaptiveConcRefinement is enabled, the thresholds are changed
adaptively, and the ones you give on the command line are initial
values. Otherwise the thresholds are fixed.
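The zone behaviour described above can be sketched as a small function. This is a simplified model, not HotSpot code: the linear ramp between green and yellow is an assumption (the text only says "increasingly more" threads), and the function and parameter names are hypothetical.

```python
import math


def refinement_activity(pending_buffers, green, yellow, red, max_threads):
    """Return (active_refinement_threads, mutators_help) for a queue length."""
    if not 0 <= green <= yellow <= red:
        raise ValueError("expected 0 <= green <= yellow <= red")

    # At or beyond the red threshold, mutator threads do refinement work.
    mutators_help = pending_buffers >= red

    # Below green (or with no refinement threads at all) nothing runs
    # concurrently.
    if pending_buffers < green or max_threads == 0:
        return 0, mutators_help

    # At or beyond yellow, all refinement threads are in use.
    if pending_buffers >= yellow:
        return max_threads, mutators_help

    # Between green and yellow: assumed linear ramp, at least one thread.
    fraction = (pending_buffers - green) / (yellow - green)
    return max(1, math.ceil(fraction * max_threads)), mutators_help
```

For example, with green=10, yellow=20, red=40 and four refinement threads, `refinement_activity(15, 10, 20, 40, 4)` returns `(2, False)` in this model, and `refinement_activity(40, 10, 20, 40, 4)` returns `(4, True)`.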
G1ConcRefinementThreads: max number of refinement threads.
So you could completely disable concurrent refinement by disabling
G1UseAdaptiveConcRefinement and setting G1ConcRefinementThreads=0; if
you set the red threshold to 0 too, the mutators will do all the
refinement work themselves. If you also set G1UpdateBufferSize to 1, I
think the mutators will do that work immediately (this will likely have
a significant impact on mutator performance).
Otherwise, using the thresholds, you can select the amount of concurrent
refinement work in a very granular way.
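As an illustration of how the options above fit together, the following sketch assembles a JVM argument list with fixed thresholds and the two logging tags mentioned earlier. The helper name and its parameters are hypothetical. The thread-count option is spelled G1ConcRefinementThreads here, which is the HotSpot name as far as I know; availability of all these options depends on the JDK version, so check the output of `java -XX:+PrintFlagsFinal` before relying on any of them.

```python
def g1_refinement_flags(green, yellow, red,
                        threads=None, buffer_size=None, adaptive=False):
    """Build a list of java arguments for manual refinement tuning."""
    if not 0 <= green <= yellow <= red:
        raise ValueError("expected 0 <= green <= yellow <= red")

    flags = [
        "-XX:+UseG1GC",
        # With adaptive refinement on, the zones are only initial values.
        ("-XX:+" if adaptive else "-XX:-") + "G1UseAdaptiveConcRefinement",
        f"-XX:G1ConcRefinementGreenZone={green}",
        f"-XX:G1ConcRefinementYellowZone={yellow}",
        f"-XX:G1ConcRefinementRedZone={red}",
    ]
    if threads is not None:
        flags.append(f"-XX:G1ConcRefinementThreads={threads}")
    if buffer_size is not None:
        flags.append(f"-XX:G1UpdateBufferSize={buffer_size}")

    # Logging tags mentioned in the text, in unified logging syntax.
    flags.append("-Xlog:gc+ergo+refine=debug,gc+remset=trace")
    return flags


# The "mutators do everything" setup described above.
print(" ".join(g1_refinement_flags(0, 0, 0, threads=0, buffer_size=1)))
```

The resulting list can be passed to `java` ahead of the application arguments; comparing the gc+ergo+refine output across runs is probably the easiest way to see whether a given set of thresholds changes anything.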
Thanks,
Thomas