[PATCH] Exploit Empty Regions in Young Gen to Enhance PS Full GC Performance

Stefan Johansson stefan.johansson at oracle.com
Tue Oct 8 10:49:27 UTC 2019


Hi Haoyu,

I've done some more testing and I haven't seen any issues with the patch 
so far and the performance looks promising in most cases. For simple 
tests I've seen some regressions, but I'm not really sure why. Will do 
some more digging.

To move forward with this, the first thing we need to do is make sure 
that you being covered by the Oracle Contributor Agreement is enough. 
From what we can see, only you as an individual have signed the OCA, 
and in that case it is important that this statement from the OCA is 
fulfilled: "no other person or entity, including my employer, has or 
will have rights with respect to my contributions".

Is this the case for this contribution or should we have the university 
sign the OCA as well? For more information regarding the OCA please 
refer to:
https://www.oracle.com/technetwork/oca-faq-405384.pdf

Thanks,
Stefan

On 2019-09-16 16:02, Haoyu Li wrote:
> FYI, the evaluation results on OpenJDK 14 are plotted in the attachment. 
> I compute the full GC throughput by dividing the heap size before full 
> GC by the GC pause time, and the results are arithmetic means of ten 
> runs after a warm-up run. The evaluation was conducted on a machine 
> with dual Intel® Xeon™ E5-2618L v3 CPUs (2 sockets, 16 physical cores 
> with SMT enabled) and 64 GB of DRAM.
> 
> Best Regards,
> Haoyu Li,
> Institute of Parallel and Distributed Systems(IPADS),
> School of Software,
> Shanghai Jiao Tong University
> 
> 
> Stefan Johansson <stefan.johansson at oracle.com 
> <mailto:stefan.johansson at oracle.com>> wrote on Thu, Sep 12, 2019 at 5:34 AM:
> 
>     Hi Haoyu,
> 
>     I recently came across your patch and I would like to pick up on
>     some of the things Kim mentioned in his mails. I especially want to
>     evaluate and investigate whether this is a technique we can use to
>     improve the other GCs as well. To start that work I want to take the
>     patch for a spin in our internal performance testing. The patch
>     doesn’t apply cleanly to the latest JDK repository, so if you could
>     provide an updated patch that would be very helpful.
> 
>     It would also be great if you could share some more information
>     around the results presented in the paper. For example, it would be
>     good to get the full command lines for the different benchmarks so
>     we can run them locally and reproduce the results you’ve seen.
> 
>     Thanks,
>     Stefan
> 
>>     On 12 March 2019 at 03:21, Haoyu Li <leihouyju at gmail.com
>>     <mailto:leihouyju at gmail.com>> wrote:
>>
>>     Hi Kim,
>>
>>     Thanks for reviewing and testing the patch. If there are any
>>     failures or performance degradations related to this work, please
>>     let me know and I'll be very happy to keep improving it. Also, any
>>     suggestions for code improvements are much appreciated.
>>
>>     I'm not quite sure whether G1 and Shenandoah have a similar
>>     region dependency issue, since I haven't studied their GC
>>     behavior before. If they do, I'm also willing to propose a more
>>     general optimization.
>>
>>     As to the memory overhead, I believe it will be low because this
>>     patch exploits empty regions in the young space rather than
>>     off-heap memory to allocate shadow regions, and also reuses the
>>     /_source_region/ field of each /RegionData/ to record the
>>     corresponding shadow region index. We only introduce a new
>>     integer field /_shadow/ in the RegionData class to indicate the
>>     status of a region, a global /GrowableArray _free_shadow/ to store
>>     the indices of shadow regions, and a global /Monitor/ to protect
>>     the array. This information might help if the memory overhead
>>     needs to be evaluated.
>>
>>     Looking forward to your insight.
>>
>>     Best Regards,
>>     Haoyu Li,
>>     Institute of Parallel and Distributed Systems(IPADS),
>>     School of Software,
>>     Shanghai Jiao Tong University
>>
>>
>>     Kim Barrett <kim.barrett at oracle.com
>>     <mailto:kim.barrett at oracle.com>> wrote on Tue, Mar 12, 2019 at 6:11 AM:
>>
>>         > On Mar 11, 2019, at 1:45 AM, Kim Barrett
>>         <kim.barrett at oracle.com <mailto:kim.barrett at oracle.com>> wrote:
>>         >
>>         >> On Jan 24, 2019, at 3:58 AM, Haoyu Li <leihouyju at gmail.com
>>         <mailto:leihouyju at gmail.com>> wrote:
>>         >>
>>         >> Hi Kim,
>>         >>
>>         >> I have ported my patch to OpenJDK 13 following the
>>         instructions in your last mail, and the patch is attached to
>>         this mail. The patch does not change much since PSGC is indeed
>>         pretty stable.
>>         >>
>>         >> Also, I evaluated the correctness and performance of PS full
>>         GC with benchmarks from the DaCapo, SPECjvm2008, and JOlden suites
>>         on a machine with dual Intel Xeon E5-2618L v3 CPUs (16 physical
>>         cores), 64 GB of DRAM, and Linux kernel 4.17. The evaluation result,
>>         indicating a 1.9X GC throughput improvement on average, is
>>         attached, too.
>>         >>
>>         >> However, I have no idea how to further test this patch for
>>         both correctness and performance. Can I please get any
>>         guidance from you or some sponsor?
>>         >
>>         > Sorry I missed that you had sent an updated version of the
>>         patch.
>>         >
>>         > I’ve run the full regression suite across Oracle-supported
>>         platforms.  There are some
>>         > failures, but there are almost always some failures in the
>>         later tiers right now.  I’ll start
>>         > looking at them tomorrow to figure out whether any of them
>>         are relevant.
>>         >
>>         > I’m also planning to run some of our performance benchmarks.
>>         >
>>         > I’ve lightly skimmed the proposed changes.  There might be
>>         some code improvements
>>         > to be made.
>>         >
>>         > I’m also wondering if this technique applies to other
>>         collectors.  It seems like both G1 and
>>         > Shenandoah full gc’s might have similar issues?  If so, a
>>         solution that is ParallelGC-specific
>>         > is less interesting than one that has broader
>>         applicability.  Though maybe this optimization
>>         > is less important for G1 and Shenandoah, since they actively
>>         try to avoid full gc’s.
>>         >
>>         > I’m also not clear on how much additional memory might be
>>         temporarily allocated by this
>>         > mechanism.
>>
>>         I’ve created a CR for this:
>>         https://bugs.openjdk.java.net/browse/JDK-8220465
>>
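The bookkeeping described in Haoyu's March 2019 message (a per-region status field, the reused source-region slot, and a lock-protected array of free shadow region indices) can be modeled in a few lines of Python. This is only a sketch under assumptions: the actual patch is C++ inside HotSpot and is not reproduced here. The class names, the status constants, and the helper try_fill_via_shadow are hypothetical, and threading.Lock merely stands in for a HotSpot Monitor.

```python
import threading
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical status values: the message only says that _shadow
# "indicates the status of a region", not which values it takes.
NORMAL = 0
USING_SHADOW = 1


@dataclass
class RegionData:
    # Models the reuse of _source_region to hold a shadow region index.
    source_region: int = 0
    # Models the single new integer field _shadow.
    shadow: int = NORMAL


class ShadowPool:
    """Models the global _free_shadow array plus the lock guarding it."""

    def __init__(self, empty_young_regions: List[int]) -> None:
        self._free_shadow = list(empty_young_regions)
        self._lock = threading.Lock()  # stand-in for the global Monitor

    def acquire(self) -> Optional[int]:
        # Take one free shadow region index, or None if the pool is empty.
        with self._lock:
            return self._free_shadow.pop() if self._free_shadow else None

    def release(self, index: int) -> None:
        # Return a shadow region index to the pool.
        with self._lock:
            self._free_shadow.append(index)


def try_fill_via_shadow(region: RegionData, pool: ShadowPool) -> bool:
    """Record a shadow region for `region` if one is available."""
    index = pool.acquire()
    if index is None:
        return False
    region.source_region = index
    region.shadow = USING_SHADOW
    return True
```

In this model the extra memory is one integer per region plus one list entry per shadow region, which matches the low-overhead argument in the message; the shadow regions themselves are existing empty young-generation regions.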
> 
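The throughput metric described in Haoyu's 2019-09-16 message (heap size before full GC divided by pause time, averaged over ten runs after one warm-up run) can be written down as a short Python sketch. The function names and the (heap bytes, pause seconds) input format are assumptions made for illustration; the message does not say how the numbers were extracted from the GC logs.

```python
from statistics import mean
from typing import Sequence, Tuple


def full_gc_throughput(heap_before_bytes: int, pause_seconds: float) -> float:
    """Bytes of pre-GC heap processed per second of full GC pause."""
    if pause_seconds <= 0:
        raise ValueError("pause time must be positive")
    return heap_before_bytes / pause_seconds


def mean_throughput(runs: Sequence[Tuple[int, float]],
                    warmup_runs: int = 1) -> float:
    """Arithmetic mean of per-run throughput, skipping the warm-up runs.

    Each entry in `runs` is (heap size before full GC in bytes,
    full GC pause time in seconds), in execution order.
    """
    measured = runs[warmup_runs:]
    if not measured:
        raise ValueError("no measured runs left after warm-up")
    return mean(full_gc_throughput(h, p) for h, p in measured)
```

For the setup in the message, `runs` would hold eleven entries: one warm-up run followed by the ten measured runs.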


