[PATCH] Exploit Empty Regions in Young Gen to Enhance PS Full GC Performance
Stefan Johansson
stefan.johansson at oracle.com
Thu Oct 10 13:50:56 UTC 2019
Thanks for the clarification =)
Moving on to the next part, the code in the patch. So this won't be a
full review of the patch but just an initial comment that I would like
to be addressed first.
The new function PSParallelCompact::fill_shadow_region() is more or less
a copy of PSParallelCompact::fill_region() and I understand that from a
proof of concept point of view it was the easy (and right) way to do it.
I would prefer if the code could be refactored so that fill_region() and
fill_shadow_region() share more code. There might be reasons that I've
missed that prevent it, but we should at least explore how much code
can be shared.
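
As a rough illustration of the kind of sharing meant here, below is a
minimal standalone sketch, not HotSpot code. It assumes the two functions
differ only in where the words are written and in what happens once the
region is full. All names in it (Word, Region, DirectDestination,
ShadowDestination, fill_region_common) are hypothetical and do not come
from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for heap words and regions (not HotSpot types).
using Word = int;

struct Region {
  std::vector<Word> words;
};

// Policy for the normal case: write straight into the target region.
struct DirectDestination {
  Region& target;
  Region& buffer() { return target; }
  void complete() {}  // nothing left to do
};

// Policy for the shadow case: write into a shadow region first, then
// copy the contents back to the real target when filling is done.
struct ShadowDestination {
  Region& target;
  Region& shadow;
  Region& buffer() { return shadow; }
  void complete() { target.words = shadow.words; }
};

// The shared filling loop. Only the destination policy differs.
template <typename Destination>
void fill_region_common(const std::vector<Word>& live, Destination dest) {
  Region& out = dest.buffer();
  assert(live.size() <= out.words.size());
  for (std::size_t i = 0; i < live.size(); ++i) {
    out.words[i] = live[i];
  }
  dest.complete();
}

// Thin wrappers keeping the two original entry points.
void fill_region(const std::vector<Word>& live, Region& target) {
  fill_region_common(live, DirectDestination{target});
}

void fill_shadow_region(const std::vector<Word>& live, Region& target,
                        Region& shadow) {
  fill_region_common(live, ShadowDestination{target, shadow});
}
```

Whether the real difference between the two functions is this small is
exactly what would need to be explored. A closure or virtual hook would
serve equally well if templates do not fit the surrounding code.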
Thanks,
Stefan
On 2019-10-10 15:10, Haoyu Li wrote:
> Hi Stefan,
>
> Thanks for your quick response! As to your concern about the OCA, I am
> the sole author of the patch, and what the agreement states is indeed
> the case.
> Best Regards,
> Haoyu Li,
>
>
> Stefan Johansson <stefan.johansson at oracle.com> wrote on Thu, Oct 10, 2019, 8:37 PM:
>
> Hi,
>
> On 2019-10-10 13:06, Haoyu Li wrote:
> > Hi Stefan,
> >
> > Thanks for your testing! One possible reason for the regressions in
> > simple tests is that the region dependencies may not be heavy enough.
> > Because the locality of shadow regions is lower than that of heap
> > regions, writing to shadow regions will be slower than to normal
> > regions, and this is part of the reason why I reuse shadow regions.
> > Therefore, if only a few shadow regions are created and not reused,
> > the overhead may not be amortized.
>
> I guess it is something like this. I thought that for "easy" heaps the
> shadow regions won't be used at all, and should therefore not really
> cost anything.
>
> >
> > As to the OCA, it is the case that I'm the only person signing the
> > agreement. Please let me know if you have any further questions.
> > Thanks again!
>
> Ok, so you are the sole author of the patch. The important part, as the
> agreement states, is:
> "no other person or entity, including my employer, has or will have
> rights with respect my contributions"
>
> Is that the case?
>
> Thanks,
> Stefan
>
> >
> > Best Regards,
> > Haoyu Li
> >
> > Stefan Johansson <stefan.johansson at oracle.com> wrote on Tue, Oct 8, 2019, 6:49 PM:
> >
> > Hi Haoyu,
> >
> > I've done some more testing and I haven't seen any issues with the
> > patch so far and the performance looks promising in most cases. For
> > simple tests I've seen some regressions, but I'm not really sure why.
> > Will do some more digging.
> >
> > To move forward with this, the first thing we need to do is to make
> > sure that you being covered by the Oracle Contributor Agreement is
> > enough. From what we can see it is only you as an individual that has
> > signed the OCA and in that case it is important that this statement
> > from the OCA is fulfilled: "no other person or entity, including my
> > employer, has or will have rights with respect my contributions"
> >
> > Is this the case for this contribution or should we have the
> > university sign the OCA as well? For more information regarding the
> > OCA please refer to:
> > https://www.oracle.com/technetwork/oca-faq-405384.pdf
> >
> > Thanks,
> > Stefan
> >
> > On 2019-09-16 16:02, Haoyu Li wrote:
> > > FYI, the evaluation results on OpenJDK 14 are plotted in the
> > > attachment. I compute the full GC throughput by dividing the heap
> > > size before full GC by the GC pause time, and the results are
> > > arithmetic mean values of ten runs after a warm-up run. The
> > > evaluation is conducted on a machine with dual Intel® Xeon™ E5-2618L
> > > v3 CPUs (2 sockets, 16 physical cores with SMT enabled) and 64G DRAM.
> > >
> > > Best Regards,
> > > Haoyu Li,
> > > Institute of Parallel and Distributed Systems(IPADS),
> > > School of Software,
> > > Shanghai Jiao Tong University
> > >
> > >
> > > Stefan Johansson <stefan.johansson at oracle.com> wrote on Thu, Sep 12, 2019, 5:34 AM:
> > >
> > > Hi Haoyu,
> > >
> > > I recently came across your patch and I would like to pick up on
> > > some of the things Kim mentioned in his mails. I especially want to
> > > evaluate and investigate if this is a technique we can use to
> > > improve the other GCs as well. To start that work I want to take the
> > > patch for a spin in our internal performance testing. The patch
> > > doesn’t apply cleanly to the latest JDK repository, so if you could
> > > provide an updated patch that would be very helpful.
> > >
> > > It would also be great if you could share some more information
> > > around the results presented in the paper. For example, it would be
> > > good to get the full command lines for the different benchmarks so
> > > we can run them locally and reproduce the results you’ve seen.
> > >
> > > Thanks,
> > > Stefan
> > >
> > >> On 12 March 2019 at 03:21, Haoyu Li <leihouyju at gmail.com> wrote:
> > >>
> > >> Hi Kim,
> > >>
> > >> Thanks for reviewing and testing the patch. If there are any
> > >> failures or performance degradation relevant to the work, please
> > >> let me know and I'll be very happy to keep improving it. Also, any
> > >> suggestions about code improvements are much appreciated.
> > >>
> > >> I'm not quite sure whether G1 and Shenandoah have a similar
> > >> region dependency issue, since I haven't studied their GC
> > >> behaviors before. If they do, I'm also willing to propose a more
> > >> general optimization.
> > >>
> > >> As to the memory overhead, I believe it will be low because this
> > >> patch exploits empty regions in the young space rather than
> > >> off-heap memory to allocate shadow regions, and also reuses the
> > >> /_source_region/ field of each /RegionData/ to record the
> > >> corresponding shadow region index. We only introduce a new
> > >> integer field /_shadow/ in the RegionData class to indicate the
> > >> status of a region, a global /GrowableArray _free_shadow/ to store
> > >> the indices of shadow regions, and a global /Monitor/ to protect
> > >> the array. This information might help if the memory overhead
> > >> needs to be evaluated.
> > >>
> > >> Looking forward to your insight.
> > >>
> > >> Best Regards,
> > >> Haoyu Li,
> > >> Institute of Parallel and Distributed Systems(IPADS),
> > >> School of Software,
> > >> Shanghai Jiao Tong University
> > >>
> > >>
> > >> Kim Barrett <kim.barrett at oracle.com> wrote on Tue, Mar 12, 2019, 6:11 AM:
> > >>
> > >> > On Mar 11, 2019, at 1:45 AM, Kim Barrett <kim.barrett at oracle.com> wrote:
> > >> >
> > >> >> On Jan 24, 2019, at 3:58 AM, Haoyu Li <leihouyju at gmail.com> wrote:
> > >> >>
> > >> >> Hi Kim,
> > >> >>
> > >> >> I have ported my patch to OpenJDK 13 according to your
> > >> >> instructions in your last mail, and the patch is attached to
> > >> >> this mail. The patch does not change much since PSGC is indeed
> > >> >> pretty stable.
> > >> >>
> > >> >> Also, I evaluated the correctness and performance of PS full
> > >> >> GC with benchmarks from the DaCapo, SPECjvm2008, and JOlden
> > >> >> suites on a machine with dual Intel Xeon E5-2618L v3 CPUs (16
> > >> >> physical cores), 64G DRAM and Linux kernel 4.17. The evaluation
> > >> >> result, indicating a 1.9X GC throughput improvement on average,
> > >> >> is attached, too.
> > >> >>
> > >> >> However, I have no idea how to further test this patch for
> > >> >> both correctness and performance. Can I please get any
> > >> >> guidance from you or some sponsor?
> > >> >
> > >> > Sorry I missed that you had sent an updated version of the
> > >> > patch.
> > >> >
> > >> > I’ve run the full regression suite across Oracle-supported
> > >> > platforms. There are some failures, but there are almost always
> > >> > some failures in the later tiers right now. I’ll start looking at
> > >> > them tomorrow to figure out whether any of them are relevant.
> > >> >
> > >> > I’m also planning to run some of our performance benchmarks.
> > >> >
> > >> > I’ve lightly skimmed the proposed changes. There might be some
> > >> > code improvements to be made.
> > >> >
> > >> > I’m also wondering if this technique applies to other collectors.
> > >> > It seems like both G1 and Shenandoah full gc’s might have similar
> > >> > issues? If so, a solution that is ParallelGC-specific is less
> > >> > interesting than one that has broader applicability. Though maybe
> > >> > this optimization is less important for G1 and Shenandoah, since
> > >> > they actively try to avoid full gc’s.
> > >> >
> > >> > I’m also not clear on how much additional memory might be
> > >> > temporarily allocated by this mechanism.
> > >>
> > >> I’ve created a CR for this:
> > >> https://bugs.openjdk.java.net/browse/JDK-8220465
> > >>
> > >
> >
>