[PATCH] Exploit Empty Regions in Young Gen to Enhance PS Full GC Performance

Tue Sep 17 13:52:53 UTC 2019

Thanks,

I will try to find time in the coming weeks to do some evaluation, and 
I'll get back to you if I have any questions or comments.

Thanks,
Stefan

On 2019-09-16 15:54, Haoyu Li wrote:
> Hi Stefan,
> 
> Thanks for getting back to me! I have ported the optimization to JDK 14 
> and the new patch is attached in this mail.
> 
> As for the command lines in our evaluation, we run the benchmarks 
> with flags including *-Xmx<heap_size> 
> -XX:ParallelGCThreads=32 -XX:+UseParallelGC -XX:-ScavengeBeforeFullGC 
> -Xlog:gc.* We set the maximum heap size for each benchmark to 3x its 
> minimum heap size, and the number of GC threads to 32 because our 
> machine has 32 physical cores. Full command lines for all benchmarks can 
> be found in the attached file /evaluation.sh/.
> 
> I am more than happy to have any feedback. Thanks for reviewing this patch!
> 
> Best Regards,
> Haoyu Li,
> Institute of Parallel and Distributed Systems(IPADS),
> School of Software,
> Shanghai Jiao Tong University
> 
> 
> Stefan Johansson <stefan.johansson at oracle.com 
> <mailto:stefan.johansson at oracle.com>> wrote on Thu, Sep 12, 2019 at 5:34 AM:
> 
>     Hi Haoyu,
> 
>     I recently came across your patch and I would like to pick up on
>     some of the things Kim mentioned in his mails. I especially want to
>     evaluate and investigate whether this is a technique we can use to
>     improve the other GCs as well. To start that work I want to take the
>     patch for a spin in our internal performance testing. The patch
>     doesn’t apply cleanly to the latest JDK repository, so if you could
>     provide an updated patch that would be very helpful.
> 
>     It would also be great if you could share some more information
>     around the results presented in the paper. For example, it would be
>     good to get the full command lines for the different benchmarks so
>     we can run them locally and reproduce the results you’ve seen.
> 
>     Thanks,
>     Stefan
> 
>>     On 12 March 2019 at 03:21, Haoyu Li <leihouyju at gmail.com
>>     <mailto:leihouyju at gmail.com>> wrote:
>>
>>     Hi Kim,
>>
>>     Thanks for reviewing and testing the patch. If there are any
>>     failures or performance degradation relevant to the work, please
>>     let me know and I'll be very happy to keep improving it. Also, any
>>     suggestions about code improvements are well appreciated.
>>
>>     I'm not quite sure whether G1 and Shenandoah have a similar
>>     region dependency issue, since I haven't studied their GC
>>     behaviors before. If they do, I'm also willing to propose a more
>>     general optimization.
>>
>>     As to the memory overhead, I believe it will be low because this
>>     patch exploits empty regions in the young space rather than
>>     off-heap memory to allocate shadow regions, and also reuses the
>>     /_source_region/ field of each /RegionData/ to record the
>>     corresponding shadow region index. We only introduce a new
>>     integer field /_shadow/ in the RegionData class to indicate the
>>     status of a region, a global /GrowableArray _free_shadow/ to store
>>     the indices of shadow regions, and a global /Monitor/ to protect
>>     the array. This information might help if the memory overhead
>>     needs to be evaluated.
>>
>>     Looking forward to your insight.
>>
>>     Best Regards,
>>     Haoyu Li,
>>     Institute of Parallel and Distributed Systems(IPADS),
>>     School of Software,
>>     Shanghai Jiao Tong University
>>
>>
>>     Kim Barrett <kim.barrett at oracle.com
>>     <mailto:kim.barrett at oracle.com>> wrote on Tue, Mar 12, 2019 at 6:11 AM:
>>
>>         > On Mar 11, 2019, at 1:45 AM, Kim Barrett
>>         <kim.barrett at oracle.com <mailto:kim.barrett at oracle.com>> wrote:
>>         >
>>         >> On Jan 24, 2019, at 3:58 AM, Haoyu Li <leihouyju at gmail.com
>>         <mailto:leihouyju at gmail.com>> wrote:
>>         >>
>>         >> Hi Kim,
>>         >>
>>         >> I have ported my patch to OpenJDK 13 according to your
>>         instructions in your last mail, and the patch is attached in
>>         this mail. The patch does not change much since PSGC is indeed
>>         pretty stable.
>>         >>
>>         >> Also, I evaluated the correctness and performance of PS full
>>         GC with benchmarks from the DaCapo, SPECjvm2008, and JOlden suites
>>         on a machine with dual Intel Xeon E5-2618L v3 CPUs (16 physical
>>         cores), 64G DRAM, and Linux kernel 4.17. The evaluation result,
>>         indicating a 1.9X GC throughput improvement on average, is
>>         attached, too.
>>         >>
>>         >> However, I have no idea how to further test this patch for
>>         both correctness and performance. Can I please get any
>>         guidance from you or some sponsor?
>>         >
>>         > Sorry I missed that you had sent an updated version of the
>>         patch.
>>         >
>>         > I’ve run the full regression suite across Oracle-supported
>>         platforms.  There are some
>>         > failures, but there are almost always some failures in the
>>         later tiers right now.  I’ll start
>>         > looking at them tomorrow to figure out whether any of them
>>         are relevant.
>>         >
>>         > I’m also planning to run some of our performance benchmarks.
>>         >
>>         > I’ve lightly skimmed the proposed changes.  There might be
>>         some code improvements
>>         > to be made.
>>         >
>>         > I’m also wondering if this technique applies to other
>>         collectors.  It seems like both G1 and
>>         > Shenandoah full gc’s might have similar issues?  If so, a
>>         solution that is ParallelGC-specific
>>         > is less interesting than one that has broader
>>         applicability.  Though maybe this optimization
>>         > is less important for G1 and Shenandoah, since they actively
>>         try to avoid full gc’s.
>>         >
>>         > I’m also not clear on how much additional memory might be
>>         temporarily allocated by this
>>         > mechanism.
>>
>>         I’ve created a CR for this:
>>         https://bugs.openjdk.java.net/browse/JDK-8220465
>>
>