[PATCH] Exploit Empty Regions in Young Gen to Enhance PS Full GC Performance

Thu Oct 10 13:10:52 UTC 2019

Hi Stefan,

Thanks for your quick response! Regarding your concern about the OCA: I am the
sole author of the patch, and the statement in the agreement holds for it.

Best Regards,
Haoyu Li,

Stefan Johansson <stefan.johansson at oracle.com> wrote on Thu, Oct 10, 2019 at 8:37 PM:

> Hi,
>
> On 2019-10-10 13:06, Haoyu Li wrote:
> > Hi Stefan,
> >
> > Thanks for your testing! One possible reason for the regressions in
> > simple tests is that the region dependencies may not be heavy enough.
> > Because shadow regions have worse locality than normal heap regions,
> > writing to a shadow region is slower than writing to a normal region,
> > which is part of the reason why I reuse shadow regions. Therefore, if
> > only a few shadow regions are created and they are not reused, the
> > overhead may not be amortized.
>
> I guess it is something like this. I thought that for "easy" heaps the
> shadow regions won't be used at all, and should therefore not really cost
> anything.
>
> >
> > Regarding the OCA, I'm the only person who signed the agreement.
> > Please let me know if you have any further questions. Thanks again!
>
> Ok, so you are the sole author of the patch. The important part, as the
> agreement states, is:
> "no other person or entity, including my employer, has or will have
> rights with respect my contributions"
>
> Is that the case?
>
> Thanks,
> Stefan
>
> >
> > Best Regards,
> > Haoyu Li
> >
> > Stefan Johansson <stefan.johansson at oracle.com> wrote on Tue, Oct 8, 2019 at 6:49 PM:
> >
> >     Hi Haoyu,
> >
> >     I've done some more testing and I haven't seen any issues with the
> >     patch so far, and the performance looks promising in most cases. For
> >     simple tests I've seen some regressions, but I'm not really sure why.
> >     I will do some more digging.
> >
> >     To move forward with this, the first thing we need to do is make
> >     sure that your being covered by the Oracle Contributor Agreement
> >     (OCA) is enough. From what we can see, only you as an individual
> >     have signed the OCA, and in that case it is important that this
> >     statement from the OCA is fulfilled: "no other person or entity,
> >     including my employer, has or will have rights with respect my
> >     contributions"
> >
> >     Is this the case for this contribution, or should we have the
> >     university sign the OCA as well? For more information regarding the
> >     OCA, please refer to:
> >     https://www.oracle.com/technetwork/oca-faq-405384.pdf
> >
> >     Thanks,
> >     Stefan
> >
> >     On 2019-09-16 16:02, Haoyu Li wrote:
> >      > FYI, the evaluation results on OpenJDK 14 are plotted in the
> >      > attachment. I compute the full GC throughput by dividing the heap
> >      > size before full GC by the GC pause time, and the results are
> >      > arithmetic means of ten runs after a warm-up run. The evaluation was
> >      > conducted on a machine with dual Intel® Xeon™ E5-2618L v3 CPUs
> >      > (2 sockets, 16 physical cores with SMT enabled) and 64 GB of DRAM.
> >      >
> >      > Best Regards,
> >      > Haoyu Li,
> >      > Institute of Parallel and Distributed Systems (IPADS),
> >      > School of Software,
> >      > Shanghai Jiao Tong University
> >      >
> >      >
> >      > Stefan Johansson <stefan.johansson at oracle.com> wrote on Thu, Sep 12, 2019 at 5:34 AM:
> >      >
> >      >     Hi Haoyu,
> >      >
> >      >     I recently came across your patch and I would like to pick up on
> >      >     some of the things Kim mentioned in his mails. I especially want
> >      >     to evaluate and investigate whether this is a technique we can use
> >      >     to improve the other GCs as well. To start that work, I want to
> >      >     take the patch for a spin in our internal performance testing.
> >      >     The patch doesn’t apply cleanly to the latest JDK repository, so
> >      >     if you could provide an updated patch, that would be very helpful.
> >      >
> >      >     It would also be great if you could share some more information
> >      >     about the results presented in the paper. For example, it would
> >      >     be good to get the full command lines for the different benchmarks
> >      >     so we can run them locally and reproduce the results you’ve seen.
> >      >
> >      >     Thanks,
> >      >     Stefan
> >      >
> >      >>     On 12 Mar 2019, at 03:21, Haoyu Li <leihouyju at gmail.com> wrote:
> >      >>
> >      >>     Hi Kim,
> >      >>
> >      >>     Thanks for reviewing and testing the patch. If there are any
> >      >>     failures or performance degradations related to this work, please
> >      >>     let me know and I'll be very happy to keep improving it. Also, any
> >      >>     suggestions for code improvements are much appreciated.
> >      >>
> >      >>     I'm not quite sure whether G1 and Shenandoah have a similar
> >      >>     region dependency issue, since I haven't studied their GC
> >      >>     behaviors before. If they do, I'm also willing to propose a more
> >      >>     general optimization.
> >      >>
> >      >>     As to the memory overhead, I believe it will be low, because
> >      >>     this patch allocates shadow regions from empty regions in the
> >      >>     young space rather than from off-heap memory, and it reuses the
> >      >>     _source_region field of each RegionData to record the
> >      >>     corresponding shadow region index. The only additions are a new
> >      >>     integer field _shadow in the RegionData class to indicate the
> >      >>     status of a region, a global GrowableArray _free_shadow to store
> >      >>     the indices of shadow regions, and a global Monitor to protect
> >      >>     the array. This information might help if the memory overhead
> >      >>     needs to be evaluated.
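
The bookkeeping described above (a per-region status field, the reused _source_region index, and a monitor-protected free list of shadow region indices) can be sketched roughly as follows. This is a simplified, hypothetical illustration rather than the patch itself: RegionDataSketch, ShadowRegionPool, begin_shadow, finish_shadow, and the state constants are invented names, and std::mutex and std::vector stand in for HotSpot's Monitor and GrowableArray.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <optional>
#include <vector>

// Hypothetical region states; the actual values in the patch may differ.
constexpr int kNormalRegion = 0;
constexpr int kShadowRegion = 1;

// Simplified stand-in for RegionData (illustrative only).
struct RegionDataSketch {
  int shadow = kNormalRegion;     // status flag, analogous to `_shadow`
  std::size_t source_region = 0;  // reused to hold the shadow region index
};

// Free list of shadow region indices, analogous to `_free_shadow` + Monitor.
class ShadowRegionPool {
 public:
  // Adds an index (e.g. an empty young-gen region) back to the free list.
  void release(std::size_t idx) {
    std::lock_guard<std::mutex> guard(lock_);
    free_.push_back(idx);
  }

  // Takes a free shadow region index, if any is left.
  std::optional<std::size_t> acquire() {
    std::lock_guard<std::mutex> guard(lock_);
    if (free_.empty()) return std::nullopt;
    std::size_t idx = free_.back();
    free_.pop_back();
    return idx;
  }

  std::size_t available() {
    std::lock_guard<std::mutex> guard(lock_);
    return free_.size();
  }

 private:
  std::mutex lock_;
  std::vector<std::size_t> free_;
};

// Redirects a destination region that is still blocked by dependencies to a shadow region.
bool begin_shadow(RegionDataSketch& region, ShadowRegionPool& pool) {
  std::optional<std::size_t> shadow = pool.acquire();
  if (!shadow) return false;  // no free shadow region: caller must wait instead
  region.shadow = kShadowRegion;
  region.source_region = *shadow;
  return true;
}

// Called once the shadow contents have been copied into the real region;
// returns the shadow region to the pool so it can be reused.
void finish_shadow(RegionDataSketch& region, ShadowRegionPool& pool) {
  assert(region.shadow == kShadowRegion);
  pool.release(region.source_region);
  region.shadow = kNormalRegion;
}
```

In this sketch the pool would be seeded by calling release() with the index of each empty young-gen region before compaction starts. Returning indices through finish_shadow() is what lets one shadow region serve several blocked regions, which is how its cost would be amortized.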
> >      >>
> >      >>     Looking forward to your insight.
> >      >>
> >      >>     Best Regards,
> >      >>     Haoyu Li,
> >      >>     Institute of Parallel and Distributed Systems (IPADS),
> >      >>     School of Software,
> >      >>     Shanghai Jiao Tong University
> >      >>
> >      >>
> >      >>     Kim Barrett <kim.barrett at oracle.com> wrote on Tue, Mar 12, 2019 at 6:11 AM:
> >      >>
> >      >>         > On Mar 11, 2019, at 1:45 AM, Kim Barrett <kim.barrett at oracle.com> wrote:
> >      >>         >
> >      >>         >> On Jan 24, 2019, at 3:58 AM, Haoyu Li <leihouyju at gmail.com> wrote:
> >      >>         >>
> >      >>         >> Hi Kim,
> >      >>         >>
> >      >>         >> I have ported my patch to OpenJDK 13 following the instructions
> >      >>         >> in your last mail, and the patch is attached to this mail. The
> >      >>         >> patch did not change much, since PSGC is indeed pretty stable.
> >      >>         >>
> >      >>         >> I also evaluated the correctness and performance of PS full GC
> >      >>         >> with benchmarks from the DaCapo, SPECjvm2008, and JOlden suites
> >      >>         >> on a machine with dual Intel Xeon E5-2618L v3 CPUs (16 physical
> >      >>         >> cores), 64 GB of DRAM, and Linux kernel 4.17. The evaluation
> >      >>         >> results, indicating a 1.9x GC throughput improvement on average,
> >      >>         >> are attached as well.
> >      >>         >>
> >      >>         >> However, I am not sure how to test this patch further for
> >      >>         >> correctness and performance. Could you or a sponsor give me
> >      >>         >> some guidance?
> >      >>         >
> >      >>         > Sorry I missed that you had sent an updated version of the patch.
> >      >>         >
> >      >>         > I’ve run the full regression suite across Oracle-supported
> >      >>         > platforms.  There are some failures, but there are almost always
> >      >>         > some failures in the later tiers right now.  I’ll start looking at
> >      >>         > them tomorrow to figure out whether any of them are relevant.
> >      >>         >
> >      >>         > I’m also planning to run some of our performance benchmarks.
> >      >>         >
> >      >>         > I’ve lightly skimmed the proposed changes.  There might be some
> >      >>         > code improvements to be made.
> >      >>         >
> >      >>         > I’m also wondering if this technique applies to other collectors.
> >      >>         > It seems like both G1 and Shenandoah full gc’s might have similar
> >      >>         > issues?  If so, a solution that is ParallelGC-specific is less
> >      >>         > interesting than one that has broader applicability.  Though maybe
> >      >>         > this optimization is less important for G1 and Shenandoah, since
> >      >>         > they actively try to avoid full gc’s.
> >      >>         >
> >      >>         > I’m also not clear on how much additional memory might be
> >      >>         > temporarily allocated by this mechanism.
> >      >>
> >      >>         I’ve created a CR for this:
> >      >> https://bugs.openjdk.java.net/browse/JDK-8220465
> >      >>
> >      >
> >
>