RFR (S): 8240556: Abort concurrent mark after effective eager reclamation of humongous objects
Man Cao
manc at google.com
Thu Mar 5 19:24:13 UTC 2020
Hi Liang,
Overall, I think the approach would work well after fixing a few issues
below.
In g1CollectedHeap.cpp:
3111   if (gc_cause() == GCCause::_g1_humongous_allocation && collector_state()->in_initial_mark_gc()) {
3112     // Check if we still need to do concurrent mark after evacuation
3113     // Abort concurrent mark in case we cleaned humongous objects via eager reclaim
3114     should_start_conc_mark = policy()->need_to_start_conc_mark("end of GC");
Two issues:
(1) I think need_to_start_conc_mark() does not have the most up-to-date
information at this point.
For example, the later expand_heap_after_young_collection() could update
G1IHOPControl::_target_occupancy,
which is used by need_to_start_conc_mark().
One possible solution could be to move the
"if (should_start_conc_mark) { concurrent_mark()->post_initial_mark(); }"
block below to after expand_heap_after_young_collection(). I'd wait for the
G1 team members to confirm that this approach is safe.
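To make the suggestion concrete, here is a rough sketch of the reordering I
have in mind for the end of the pause. It is simplified pseudocode of the
ordering, not an exact diff, and uses only names already in the patch or the
existing code (need_to_start_conc_mark, post_initial_mark,
expand_heap_after_young_collection, initial_mark_abort):

  // ... evacuation and eager reclaim of humongous regions have finished ...

  // May update G1IHOPControl::_target_occupancy, which
  // need_to_start_conc_mark() reads.
  expand_heap_after_young_collection();

  if (gc_cause() == GCCause::_g1_humongous_allocation &&
      collector_state()->in_initial_mark_gc()) {
    // Re-evaluate only after the IHOP target occupancy is up to date.
    should_start_conc_mark = policy()->need_to_start_conc_mark("end of GC");
  }

  if (should_start_conc_mark) {
    concurrent_mark()->post_initial_mark();
  } else if (collector_state()->in_initial_mark_gc()) {
    // Cancel the concurrent cycle this initial mark pause would have started.
    concurrent_mark()->initial_mark_abort();
  }
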
(2) Does it need to call collector_state()->set_in_initial_mark_gc(false)
if need_to_start_conc_mark() returns false?
Specifically, the later G1Policy::record_collection_pause_end() would call
collector_state()->set_mark_or_rebuild_in_progress(true),
if collector_state()->in_initial_mark_gc() remains true. This is probably
wrong if the initial mark has been aborted.
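In other words, something along these lines (again only a sketch; whether
clearing the flag at this point interacts badly with other users of
collector_state() would need checking):

  if (!should_start_conc_mark && collector_state()->in_initial_mark_gc()) {
    // The initial mark has been cancelled, so make sure the later
    // G1Policy::record_collection_pause_end() does not set
    // mark_or_rebuild_in_progress based on a stale flag.
    collector_state()->set_in_initial_mark_gc(false);
  }
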
2059 void G1CollectedHeap::decrement_old_marking_cycles_started() {
2060   assert(_old_marking_cycles_started > 0, "must be");
Could it assert "_old_marking_cycles_started ==
_old_marking_cycles_completed + 1" instead?
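That is, something like the following (assuming both counters stay plain
unsigned ints as today):

  assert(_old_marking_cycles_started == _old_marking_cycles_completed + 1,
         "must have started exactly one more cycle than completed");
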
3125   } else if (collector_state()->in_initial_mark_gc()) {
3126     // Don't do concurrent mark any more
3127     concurrent_mark()->initial_mark_abort();
3128     log_info(gc)("Concurrent Aborted");
It's probably better to move the log_info inside the initial_mark_abort()
method.
Also, "Concurrent Start Cancelled" is probably a more precise and
unambiguous message.
It corresponds to the "Pause Young (Concurrent Start)" in
G1CollectedHeap::young_gc_name(),
and does not collide with "Concurrent Mark Abort" in
G1ConcurrentMark::concurrent_cycle_end().
Perhaps initial_mark_abort() could be renamed to cancel_initial_mark() also?
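Putting those points together, the call site and the (renamed) method could
look roughly like this; the method body only shows the logging and naming
part, with whatever cleanup initial_mark_abort() already does kept as-is:

  // g1CollectedHeap.cpp, end-of-pause path
  } else if (collector_state()->in_initial_mark_gc()) {
    // Don't do concurrent mark any more
    concurrent_mark()->cancel_initial_mark();
  }

  // g1ConcurrentMark.cpp
  void G1ConcurrentMark::cancel_initial_mark() {
    log_info(gc)("Concurrent Start Cancelled");
    // ... existing abort/cleanup work from initial_mark_abort() ...
  }
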
-Man
On Wed, Mar 4, 2020 at 11:13 PM Liang Mao <maoliang.ml at alibaba-inc.com>
wrote:
> Hi All,
>
> Now we have the bug id. I did more testing on the patch. There's
> one small concern with the patch: when we decide to cancel
> the concurrent cycle in the initial mark pause, we need to clear
> the next bitmap, which is supposed to be cleared concurrently.
> In my test with -Xmx20g -Xms20g -XX:ParallelGCThreads=10,
> the time spent on clearing the next bitmap was consistently less
> than 10ms. So I guess it could be acceptable.
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8240556
> Webrev:
>
> http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/
>
>
> Thanks,
> Liang
>
>
>
>
>
> ------------------------------------------------------------------
> From:MAO, Liang <maoliang.ml at alibaba-inc.com>
> Send Time:2020 Mar. 3 (Tue.) 19:14
> To:Thomas Schatzl <thomas.schatzl at oracle.com>; Man Cao <manc at google.com>;
> hotspot-gc-dev <hotspot-gc-dev at openjdk.java.net>
> Subject:G1: Abort concurrent at initial mark pause
>
> Hi All,
>
> As in the previous discussion, there are several ideas to improve the
> handling of humongous objects. We've run some experiments showing that
> canceling concurrent mark at the initial mark pause is effective in the
> scenario where frequent allocation of temporary humongous objects leads
> to frequent concurrent mark cycles and high CPU usage. The sub-test
> scimark.fft.large in SPECjvm2008 is also exactly this case, but it is
> not GC sensitive, so there's little difference in score.
>
> The patch is small. Shall we have a bug id for it?
> http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/
>
> Thanks,
> Liang
>
>
>
>
>
>