RFR (S) 8240556 and RFR (S) 8248783 for G1 humongous objects

Liang Mao maoliang.ml at alibaba-inc.com
Fri Jul 3 08:35:02 UTC 2020


Hi Man and G1 team,

Previously we were discussing aborting the concurrent mark (JDK-8240556). I
made a version that clears the bitmap concurrently instead of the original STW
one. But there are still some other problems in the real world. Recently I
revisited the implementation, made more optimizations, and the result seems
very good.

The most severe problems in our workloads are as below (a small illustrative
sketch follows the list):
1) Humongous object allocation will invade the space of the young generation
and lead to long STW pauses because of to-space exhaustion, and even Full GCs.
2) Frequent concurrent marking will cost significant CPU resources (which
could be resolved by aborting the concurrent mark, JDK-8240556).
3) Frequent GCs, because humongous allocation can reach the IHOP very quickly
whether or not the concurrent mark is aborted.
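
To make the allocation pattern behind 1) and 3) concrete, below is a minimal,
hypothetical Java sketch of such a workload (the class name, array size, and
the -XX:G1HeapRegionSize=2m setting are illustrative assumptions, not taken
from our actual applications). Any object larger than half a G1 region is
allocated directly as a humongous object outside the young generation, so
short-lived large arrays immediately count toward the IHOP and can keep
re-triggering concurrent marks:

// Illustrative sketch only. Run with e.g. -Xmx2g -XX:G1HeapRegionSize=2m -Xlog:gc*
// (flag values are assumptions, chosen just so the arrays below are humongous).
public class HumongousChurn {
    // With 2 MB regions, anything larger than 1 MB is a humongous object.
    static final int HUMONGOUS_SIZE = 2 * 1024 * 1024;

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {
            byte[] tmp = new byte[HUMONGOUS_SIZE]; // allocated directly in humongous regions
            tmp[i % tmp.length] = 1;               // touch it so it is not optimized away
            sum += tmp[0];                         // the array dies right after this iteration
        }
        System.out.println(sum);
    }
}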

Neither 1) nor 3) has a solution yet. I made another change to share the
young space between humongous allocation and eden allocation; an initial mark
will be triggered if the reserve space is invaded. My colleague helped to
create a new bug ID: https://bugs.openjdk.java.net/browse/JDK-8248783
With this change and the concurrent mark abort, our problematic workloads run
very smoothly without any GC issues. I have tested the two changes in several
different applications, such as web services and databases.

Our real testing was done with 8u, and I have created JDK 15 webrevs with the
same logic. The code passes the jtreg gc/g1 tests and jbb2015. Could you
please take a look and run some tests?

Bug: https://bugs.openjdk.java.net/browse/JDK-8240556
Webrev: http://cr.openjdk.java.net/~ddong/liangmao/8240556/webrev.00/
(This may be related to https://bugs.openjdk.java.net/browse/JDK-8247928)

Bug: https://bugs.openjdk.java.net/browse/JDK-8248783
Webrev: http://cr.openjdk.java.net/~ddong/liangmao/8248783/webrev.00/


Thanks,
Liang






------------------------------------------------------------------
From:MAO, Liang <maoliang.ml at alibaba-inc.com>
Send Time:2020 Mar. 6 (Fri.) 19:35
To:Stefan Johansson <stefan.johansson at oracle.com>; Thomas Schatzl <thomas.schatzl at oracle.com>; Man Cao <manc at google.com>; hotspot-gc-dev <hotspot-gc-dev at openjdk.java.net>
Subject:RFR (S): 8240556: Abort concurrent mark after effective eager reclamation of humongous objects

Hi,

Thanks for Man's accurate comments; I made the change:
http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev.1/

Stefan's concern is fairly reasonable, since I have noticed that if there are
not enough GC workers, the additional pause time caused by the clearing can be
considerable. concurrent_cycle_abort might not be easy to reuse because it
still clears the bitmap in the pause. I was thinking of letting the concurrent
mark thread continue and finish the last step, "_cm->cleanup_for_next_mark()",
although that has a chance of delaying the next initial mark. Anyway, I'm glad
to give it a try so you can compare the two approaches and provide comments.

Thanks,
Liang





------------------------------------------------------------------
From:Stefan Johansson <stefan.johansson at oracle.com>
Send Time:2020 Mar. 6 (Fri.) 18:59
To:"MAO, Liang" <maoliang.ml at alibaba-inc.com>; Thomas Schatzl <thomas.schatzl at oracle.com>; Man Cao <manc at google.com>; hotspot-gc-dev <hotspot-gc-dev at openjdk.java.net>
Subject:Re: RFR (S): 8240556: Abort concurrent mark after effective eager reclamation of humongous objects

Hi Liang,

Thanks for picking this up, really nice to see it progressing.

It would be nice if we could do the clearing concurrently to avoid
prolonging the pause. An alternative to aborting like you do now would be
to let the concurrent cycle start, but have it abort itself directly.
This should be done by calling:
G1ConcurrentMark::concurrent_cycle_abort()

This would also reuse the abort mechanism already in place, so if
aborting needs updating in the future there is only one place to change.
There might be some things that have to be altered to get this to work,
and I haven't explored this beyond theory. Would you consider
trying this out?

I'm thinking this should look something like this in the log:
GC(1) Pause Young (Concurrent Start) (G1 Evacuation Pause) 
261M->262M(502M) 50.153ms
GC(2) Concurrent Cycle
GC(2) Concurrent Mark Abort
GC(2) Concurrent Cycle 12.345ms

We might want to call it something other than "Abort" in the logs to
distinguish it from an abort caused by a Full GC, but we can discuss the
details later on.

Thanks,
Stefan

On 2020-03-05 08:13, Liang Mao wrote:
> Hi All,
> 
> Now we have the bug ID. I did more testing of the patch. One small
> concern with the patch is that when we decide to cancel the
> concurrent cycle in the initial mark pause, we need to clear the
> next bitmap, which is supposed to be cleared concurrently.
> In my test with -Xmx20g -Xms20g -XX:ParallelGCThreads=10,
> the time spent on clearing the next bitmap was consistently less
> than 10ms. So I guess it could be acceptable.
> 
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8240556
> Webrev:
> http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/
> 
> Thanks,
> Liang
> 
> 
> 
> 
> 
> ------------------------------------------------------------------
> From:MAO, Liang <maoliang.ml at alibaba-inc.com>
> Send Time:2020 Mar. 3 (Tue.) 19:14
> To:Thomas Schatzl <thomas.schatzl at oracle.com>; Man Cao <manc at google.com>; hotspot-gc-dev <hotspot-gc-dev at openjdk.java.net>
> Subject:G1: Abort concurrent at initial mark pause
> 
> Hi All,
> 
> As previously discussed, there are several ideas to improve the handling
> of humongous objects. Our experiments show that canceling the concurrent
> mark at the initial mark pause is effective in the scenario where frequent
> allocation of temporary humongous objects leads to frequent concurrent
> marks and high CPU usage. The sub-test scimark.fft.large in SPECjvm2008 is
> exactly this case, but it is not GC sensitive, so there is little
> difference in score.
> 
> The patch is small; shall we have a bug ID for it?
> http://cr.openjdk.java.net/~luchsh/g1hum/humongous.webrev/
> 
> Thanks,
> Liang
> 
> 
> 
> 
> 



