Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics
Liang Mao
maoliang.ml at alibaba-inc.com
Tue Jan 14 09:07:50 UTC 2020
Hi Thomas,
Thank you for the detailed comments!
Most of suggestions I will follow to do the modification.
And I still have some questions:
>> 1. Does the SoftMaxHeapSize limitation need to consider the GC time
>> ratio as in
>> expand_heap_after_young_collection? Now we haven't put the logic in yet.
> I am not completely clear what you are asking about, but the gc time
> ratio only affects the current "optimal" heap size which is bounded by
> SoftMaxHeapsize/max_capacity.
The decision to shrink to SoftMaxHeapSize in this patch is based
on the method "G1HeapSizingPolicy::can_shrink_heap_size_to" which
counts "used" + "reserve" + "young". We will change it to
_heap_sizing_policy->shrink_amount(); as you commented.
I'm not considering the GC time ratio as a factor to determine
whether the heap can be shinked to SoftMaxHeapSize.
> - changes in G1CollectedHeap::resize_heap_if_necessary: please make the
> method to always use the soft max heap size as I do not understand why
> you would not do that.
Do you think we need to apply the logic "can_shrink_heap_size_to"
inside resize_heap_if_necessary to determine whether to make soft
max size as limit?
> - there is a bug in the synchronization of the concurrent uncommit
> thread: it seems possible that the uncommit thread is still working
> (iterating over the list of regions to uncommit) while a completed
> (young) GC may add new regions to that list as young gcs do not wait for
> completion of the uncommit thread.
Uncommit thread could be working parallelly with VMThread but
VMThread will not add regions to the concurrent_resizing_list
if concurrent resizing thread is in "working" state.
> Related to that is the use of par_set/clear_bit in e.g. the available
> bitmap: since all par/clear actions are asserted to be in the vm thread
> at a safepoint, there does not seem to be a need for using the parallel
> variants of set/clear bit (if keeping the current mechanism).
For above reason that concurrent uncommit can run parallely with
VMThread, the bit set/clear in vm thread at safepint have
to be parallel.
> - I have a feeling that if the concurrent uncommit thread worked on
> pages, not regions, the code would be easier to understand. It would
> also solve the issue you asked about with the
> G1RegionsSmallerThanCommitSizeMapper. You may still need to pass region
> numbers anyway for logging, but otoh the logging could be done
> asynchroniusly.
I don't quite understand this part... For the G1RegionsSmallerThanCommitSizeMapper,
a page can be simultaneously requested to commit in VMThread to expand
heap and uncommit in concurrent thread to shrink heap. Looks like lowering
uncommit work to page level couldn't help this...
For the features you listed below,
1) moving the memory uncommit into the concurrent phase
2) uncommit at the end of (almost) every GC
3) SoftMaxHeapSize
Since most of code is for the concurrent framework,
do you think 2) and 3) can be together and implemented first?
(The uncommit will happen immediately)
Thanks,
Liang
------------------------------------------------------------------
From:Thomas Schatzl <thomas.schatzl at oracle.com>
Send Time:2020 Jan. 13 (Mon.) 23:15
To:"MAO, Liang" <maoliang.ml at alibaba-inc.com>; hotspot-gc-dev <hotspot-gc-dev at openjdk.java.net>
Subject:Re: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics
Hi Liang,
thanks for your contribution!
I looked through the change a bit and have a few comments.
What I noticed quickly after initial browsing through it is that this
change implements three different features:
1) moving the memory uncommit into the concurrent phase
2) uncommit at the end of (almost) every GC
3) SoftMaxHeapSize
These should be split across three separate changes (I already filed
JDK-8236926 last week). No particular order I think, but the concurrent
uncommit changes are probably hardest and will probably take most time.
Some additional initial comments:
- in G1Collectedheap::check_soft_max_heap_size_changed(), instead of the
check for "AllocateOldGenAt != NULL" we probably want to ask the
HeapRegionManager directly about whether it supports this. Also print
some log message that it failed. Even on success, print a log message :)
- in that same method, I recommend first doing the alignment adjustment
(which probably needs to be done for that suggested soft_max_capacity()
method below too) and then check if it changed. That saves the repeated
!= _prev_soft_max_heap_size check.
Actually, just using the suggested soft_max_heap_size() method should be
fine.
- changes in G1CollectedHeap::resize_heap_if_necessary: please make the
method to always use the soft max heap size as I do not understand why
you would not do that.
I recommend adding a "soft_max_capacity()" method in G1CollectedHeap,
and let that return MIN2(align_up(SoftMaxHeapSize, heap_alignment),
max_capacity()).
There are a few places that check SoftMaxHeapSize validity (e.g.
SoftMaxHeapSize <= capacity()), they could probably all be removed then.
- doing timing: you might have noticed, we are currently transitioning
to use Ticks/Tickspan for points in time and durations at least for the
calculation, so in any new code please avoid using os::elapsedTime().
Use:
Ticks start = Ticks::now();
// timed code
phase_times()->record....((Ticks::now() - start).seconds() * 1000.0);
instead.
- in G1CollectedHeap::shrink_heap_after_young_collection() I would
prefer a structure like in expand_heap_after_collection, i.e.:
size_t shrink_bytes = _heap_sizing_policy->shrink_amount();
if (shrink_bytes > 0) {
// do actual preparation for shrinking
}
and put all that logic determining the amount to shrink in that
shrink_amount() method.
- concurrent uncommit:
- as mentioned, please split the related changes out from the other
changes. This change is hard enough to get right as is.
- I would really prefer if we did not need to introduce another helper
thread for uncommitting memory. Did you try using the
G1YoungRemSetSamplingThread? I understand that uncommit might then delay
young gen sampling, but I do not expect these events to occur all the
time (but I have no reference here).
In the first implementation we could have another thread if others do
not object, but every additional thread takes some time to startup and
teardown, and memory for at least one stack page.
- please move the change in G1CollectedHeap::abort_concurrent_cycle()
into a separate method - waiting for completion of the concurrent
uncommit and the concurrent marking are completely different concerns.
- I admit I haven't looked at all cases in detail, but the split in
is_available() and is_unavailable_for_allocation() in HeapRegionManager
seems incomplete and unncessary. Particularly because of bad naming, as
the documentation for is_available() says it's actually
is_available[_for_allocation]. Disregarding the negation, these two look
equivalent with the problem that !is_available() != is_unavailable...,
which is really bad style.
I have not found a case where it is harmful to not consider the
_concurrent_resizing_map in is_available().
The split the state of a region between two bitmaps in HeapRegionManager
(the available_map and the _concurrent_resizing_map) may be susceptible
to tricky races. Please consider changing this to a real "state" as in
"Available -> Removing -> Unavailable". This would make the code easier
to read too. (And in the future "Adding" if required).
- it should be possible to disable concurrent uncommit/resize via an
experimental flag. Also there should be no concurrent resize thread if
the Heapregionmanager does not support it.
G1 couold immediately do the heap change in that case.
The reason for this flag is to allow users too disable this if they
experience problems.
- not sure about why some methods in HeapRegionManager have "resizing"
in their method name. As far as I can tell, the change only allows
concurrent uncommit. Maybe use the above "remove" for regions instead of
"uncommit" regions.
Background: The code and comments in HeapRegionManager are aware and
fairly consistent (I hope) to not use the wording commit/uncommit for
operations on HeapRegions. Only operations on memory pages should use
committed/uncommit. The naming in the added methods does not respect that.
- some of the methods (e.g. to find free regions) should inform the
caller that there are to-be-removed regions to maybe retry after waiting
for completion of that thread to avoid unexpected OOM.
- I have a feeling that if the concurrent uncommit thread worked on
pages, not regions, the code would be easier to understand. It would
also solve the issue you asked about with the
G1RegionsSmallerThanCommitSizeMapper. You may still need to pass region
numbers anyway for logging, but otoh the logging could be done
asynchroniusly.
- s/parallely/concurrently a few times
- there is a bug in the synchronization of the concurrent uncommit
thread: it seems possible that the uncommit thread is still working
(iterating over the list of regions to uncommit) while a completed
(young) GC may add new regions to that list as young gcs do not wait for
completion of the uncommit thread.
- the concurrently uncommitted regions only become available for
commit at the next gc, which seems very long. Why not make them
available for commit "immediately"?
Related to that is the use of par_set/clear_bit in e.g. the available
bitmap: since all par/clear actions are asserted to be in the vm thread
at a safepoint, there does not seem to be a need for using the parallel
variants of set/clear bit (if keeping the current mechanism).
- please document the supposed interactions and assumptions like in
the above two paragraphs between the "resize" thread and the other
threads and safepoints.
- please use the existing HeapRegionManager::shrink_by()
method+infrastructure for passing a shrink request to the HRM, either
immediately shrinking the heap or deferring for later shrinking
(probably controlled by a flag) instead of adding new methods for the
same purpose (with mostly the same contents).
E.g. there is a lot of code duplication in the new code in
HeapRegionManager, particularly the combination of
HeapRegionManager::concurrent_uncommit_regions_memory,
HRM::synchronize_uncommit_regions_memory and HRM::uncommit_regions could
probably be cut to almost 1/3rd.
On 13.01.20 12:45, Thomas Schatzl wrote:
> Hi Liang,
>
> On 07.01.20 17:33, Liang Mao wrote:
>> Hi Thomas,
>>
>> As we previously discussed, I use the concurrent heap uncommit/commit
>> mechanism to implement the SoftMaxHeapSize for G1. It is also for
>> thfurther implementation of
>> G1ElasticHeap for ergonomic
>> change of heap size. In the previous 8u implementation, we had some
>> limitations which are all
>> removed now in this patch. The concurrent uncommit/commit can also
>> work with some senarios for
>> immediate heap expansion.
>>
>> Here is the webrev link:
>> http://cr.openjdk.java.net/~luchsh/8236073.webrev/
>>
>> We still have some questions.
>> 1. Does the SoftMaxHeapSize limitation need to consider the GC time
>> ratio as in
>> expand_heap_after_young_collection? Now we haven't put the logic in yet.
I am not completely clear what you are asking about, but the gc time
ratio only affects the current "optimal" heap size which is bounded by
SoftMaxHeapsize/max_capacity.
>> 2. The concurrent uncommit/commit can only work for
>> G1RegionsLargerThanCommitSizeMapper but not
>> G1RegionsSmallerThanCommitSizeMapper which might need some locks to
>> ensure the multi-thread
>> synchronization issue( heap may expand immediately). I think bringing
>> the lock synchronization
>> may not be worthy for the little gain. Another idea is can we just not
>> uncommit the pages of
>> auxiliary data if in G1RegionsSmallerThanCommitSizeMapper? Heap
>> regions should not be
>> G1RegionsSmallerThanCommitSizeMapper most of time I guess...
>>
>> Looking forward to your advice:)
Thanks,
Thomas
More information about the hotspot-gc-dev
mailing list