[PATCH] Exploit Empty Regions in Young Gen to Enhance PS Full GC Performance
Stefan Johansson
stefan.johansson at oracle.com
Thu Oct 10 12:37:18 UTC 2019
Hi,
On 2019-10-10 13:06, Haoyu Li wrote:
> Hi Stefan,
>
> Thanks for your testing! One possible reason for the regressions in
> simple tests is that the region dependencies may not be heavy enough.
> Because the locality of shadow regions is lower than that of heap
> regions, writing to shadow regions will be slower than to normal
> regions, and this is a part of the reason why I reuse shadow regions.
> Therefore, if only a few shadow regions are created and not reused, the
> overhead may not be amortized.
I guess it is something like this. I thought that for "easy" heaps the
shadow regions wouldn't be used at all, and should therefore not really
cost anything.
>
> As to the OCA, it is the case that I'm the only person signing the
> agreement. Please let me know if you have any further questions. Thanks
> again!
Ok, so you are the sole author of the patch. The important part, as the
agreement states, is:
"no other person or entity, including my employer, has or will have
rights with respect my contributions"
Is that the case?
Thanks,
Stefan
>
> Best Regards,
> Haoyu Li
>
> Stefan Johansson <stefan.johansson at oracle.com
> <mailto:stefan.johansson at oracle.com>> wrote on Tue, Oct 8, 2019 at 6:49 PM:
>
> Hi Haoyu,
>
> I've done some more testing and I haven't seen any issues with the
> patch so far, and the performance looks promising in most cases. For
> simple tests I've seen some regressions, but I'm not really sure why.
> I will do some more digging.
>
> To move forward with this, the first thing we need to do is make sure
> that your being covered by the Oracle Contributor Agreement is enough.
> From what we can see, it is only you as an individual that has signed
> the OCA, and in that case it is important that this statement from the
> OCA is fulfilled: "no other person or entity, including my employer,
> has or will have rights with respect my contributions"
>
> Is this the case for this contribution or should we have the university
> sign the OCA as well? For more information regarding the OCA please
> refer to:
> https://www.oracle.com/technetwork/oca-faq-405384.pdf
>
> Thanks,
> Stefan
>
> On 2019-09-16 16:02, Haoyu Li wrote:
> > FYI, the evaluation results on OpenJDK 14 are plotted in the
> > attachment. I compute the full GC throughput by dividing the heap
> > size before full GC by the GC pause time, and the results are
> > arithmetic mean values of ten runs after a warm-up run. The
> > evaluation was conducted on a machine with dual Intel Xeon
> > E5-2618L v3 CPUs (2 sockets, 16 physical cores with SMT enabled)
> > and 64G DRAM.
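
The throughput metric described above can be sketched as follows. This is only an illustration under stated assumptions: `GcSample`, `throughput_mb_per_s`, and `mean_full_gc_throughput` are hypothetical names rather than anything from the patch or from HotSpot, the units (MB and seconds) are chosen for convenience, and the mean is assumed to be taken over per-run throughput values after discarding the warm-up run.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// One full-GC measurement (hypothetical record, not a HotSpot type).
struct GcSample {
    double heap_before_mb;  // heap size before the full GC, in MB
    double pause_s;         // full-GC pause time, in seconds
};

// Per-run throughput: heap size before full GC divided by the pause time.
double throughput_mb_per_s(const GcSample& s) {
    if (s.pause_s <= 0.0) {
        throw std::invalid_argument("pause time must be positive");
    }
    return s.heap_before_mb / s.pause_s;
}

// Arithmetic mean of per-run throughput, skipping the first `warmup` runs.
// Assumption: the mean is over per-run throughputs, not total heap / total pause.
double mean_full_gc_throughput(const std::vector<GcSample>& runs,
                               std::size_t warmup = 1) {
    if (runs.size() <= warmup) {
        throw std::invalid_argument("no measured runs after warm-up");
    }
    double sum = 0.0;
    for (std::size_t i = warmup; i < runs.size(); ++i) {
        sum += throughput_mb_per_s(runs[i]);
    }
    return sum / static_cast<double>(runs.size() - warmup);
}
```

With one warm-up run followed by ten measured runs, the input vector would hold eleven samples and the first would be ignored.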
> >
> > Best Regards,
> > Haoyu Li,
> > Institute of Parallel and Distributed Systems(IPADS),
> > School of Software,
> > Shanghai Jiao Tong University
> >
> >
> > Stefan Johansson <stefan.johansson at oracle.com
> > <mailto:stefan.johansson at oracle.com>> wrote on Thu, Sep 12, 2019 at 5:34 AM:
> >
> > Hi Haoyu,
> >
> > I recently came across your patch and I would like to pick up on
> > some of the things Kim mentioned in his mails. I especially want to
> > evaluate and investigate whether this is a technique we can use to
> > improve the other GCs as well. To start that work I want to take the
> > patch for a spin in our internal performance testing. The patch
> > doesn’t apply cleanly to the latest JDK repository, so if you could
> > provide an updated patch that would be very helpful.
> >
> > It would also be great if you could share some more information
> > around the results presented in the paper. For example, it would be
> > good to get the full command lines for the different benchmarks so
> > we can run them locally and reproduce the results you’ve seen.
> >
> > Thanks,
> > Stefan
> >
> >> On 12 March 2019 at 03:21, Haoyu Li <leihouyju at gmail.com> wrote:
> >>
> >> Hi Kim,
> >>
> >> Thanks for reviewing and testing the patch. If there are any
> >> failures or performance degradation relevant to the work, please
> >> let me know and I'll be very happy to keep improving it. Also, any
> >> suggestions about code improvements are much appreciated.
> >>
> >> I'm not quite sure whether G1 and Shenandoah have a similar
> >> region dependency issue, since I haven't studied their GC
> >> behaviors before. If they do, I'm also willing to propose a more
> >> general optimization.
> >>
> >> As to the memory overhead, I believe it will be low because this
> >> patch exploits empty regions in the young space rather than
> >> off-heap memory to allocate shadow regions, and also reuses the
> >> /_source_region/ field of each /RegionData/ to record the
> >> corresponding shadow region index. We only introduce a new
> >> integer field /_shadow/ in the RegionData class to indicate the
> >> status of a region, a global /GrowableArray _free_shadow/ to store
> >> the indices of shadow regions, and a global /Monitor/ to protect
> >> the array. This information might help if the memory overhead
> >> needs to be evaluated.
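
The bookkeeping listed above might be modelled roughly as below. This is a standalone sketch, not the patch's code: `std::vector` and `std::mutex` stand in for HotSpot's `GrowableArray` and `Monitor`, the `ShadowPool` class, its method names, and the `kNormal`/`kUsesShadow` status values are all hypothetical, and the real `RegionData` has many more fields. It only illustrates that the extra state is one integer per region plus a lock-protected list of shadow region indices.

```cpp
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical status values; the mail only says `_shadow` is an integer status.
enum ShadowState : int { kNormal = 0, kUsesShadow = 1 };

// Simplified stand-in for the RegionData class mentioned in the mail.
struct RegionData {
    std::size_t _source_region = 0;  // reused here to hold the shadow region index
    int _shadow = kNormal;           // the one new per-region field
};

// Stand-in for the global free list and the lock that protects it.
class ShadowPool {
public:
    explicit ShadowPool(std::vector<std::size_t> empty_young_regions)
        : _free_shadow(std::move(empty_young_regions)) {}

    // Give region `r` a free shadow region, if one is left.
    bool try_attach(RegionData& r) {
        std::lock_guard<std::mutex> guard(_lock);
        if (_free_shadow.empty()) return false;
        r._source_region = _free_shadow.back();
        _free_shadow.pop_back();
        r._shadow = kUsesShadow;
        return true;
    }

    // Hand the shadow region back so that another region can reuse it.
    void detach(RegionData& r) {
        std::lock_guard<std::mutex> guard(_lock);
        if (r._shadow != kUsesShadow) return;
        _free_shadow.push_back(r._source_region);
        r._shadow = kNormal;
    }

    std::size_t free_count() {
        std::lock_guard<std::mutex> guard(_lock);
        return _free_shadow.size();
    }

private:
    std::mutex _lock;                       // stands in for the global Monitor
    std::vector<std::size_t> _free_shadow;  // stands in for GrowableArray _free_shadow
};
```

In this model the pool is seeded with the indices of the empty young-generation regions, so no memory outside the heap is allocated for shadow regions.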
> >>
> >> Looking forward to your insight.
> >>
> >> Best Regards,
> >> Haoyu Li,
> >> Institute of Parallel and Distributed Systems(IPADS),
> >> School of Software,
> >> Shanghai Jiao Tong University
> >>
> >>
> >> Kim Barrett <kim.barrett at oracle.com> wrote on Tue, Mar 12, 2019 at 6:11 AM:
> >>
> >> > On Mar 11, 2019, at 1:45 AM, Kim Barrett <kim.barrett at oracle.com> wrote:
> >> >
> >> >> On Jan 24, 2019, at 3:58 AM, Haoyu Li <leihouyju at gmail.com> wrote:
> >> >>
> >> >> Hi Kim,
> >> >>
> >> >> I have ported my patch to OpenJDK 13 according to the
> >> >> instructions in your last mail, and the patch is attached to
> >> >> this mail. The patch does not change much since PSGC is indeed
> >> >> pretty stable.
> >> >>
> >> >> Also, I evaluated the correctness and performance of PS full
> >> >> GC with benchmarks from the DaCapo, SPECjvm2008, and JOlden
> >> >> suites on a machine with dual Intel Xeon E5-2618L v3 CPUs (16
> >> >> physical cores), 64G DRAM, and Linux kernel 4.17. The
> >> >> evaluation result, indicating a 1.9X GC throughput improvement
> >> >> on average, is attached, too.
> >> >>
> >> >> However, I have no idea how to further test this patch for
> >> >> both correctness and performance. Can I please get some
> >> >> guidance from you or a sponsor?
> >> >
> >> > Sorry I missed that you had sent an updated version of the
> >> > patch.
> >> >
> >> > I’ve run the full regression suite across Oracle-supported
> >> > platforms. There are some failures, but there are almost always
> >> > some failures in the later tiers right now. I’ll start looking
> >> > at them tomorrow to figure out whether any of them are relevant.
> >> >
> >> > I’m also planning to run some of our performance benchmarks.
> >> >
> >> > I’ve lightly skimmed the proposed changes. There might be some
> >> > code improvements to be made.
> >> >
> >> > I’m also wondering if this technique applies to other
> >> > collectors. It seems like both G1 and Shenandoah full gc’s might
> >> > have similar issues? If so, a solution that is
> >> > ParallelGC-specific is less interesting than one that has
> >> > broader applicability. Though maybe this optimization is less
> >> > important for G1 and Shenandoah, since they actively try to
> >> > avoid full gc’s.
> >> >
> >> > I’m also not clear on how much additional memory might be
> >> > temporarily allocated by this mechanism.
> >>
> >> I’ve created a CR for this:
> >> https://bugs.openjdk.java.net/browse/JDK-8220465
> >>
> >
>