[PATCH] Exploit Empty Regions in Young Gen to Enhance PS Full GC Performance
Stefan Johansson
stefan.johansson at oracle.com
Mon Oct 14 13:00:22 UTC 2019
Thanks for the quick update Haoyu,
This is a great improvement and I will try to find time to look into the
patch in more detail in the coming weeks.
Thanks,
Stefan
On 2019-10-11 14:49, Haoyu Li wrote:
> Hi Stefan,
>
> Thanks for your suggestion! PSParallelCompact::fill_shadow_region() did
> duplicate most of the code in PSParallelCompact::fill_region(), so I've
> refactored the two functions to share as much code as possible. The
> updated patch is attached.
>
> Specifically, the type of the closure that moves objects in
> PSParallelCompact::fill_region() is now a template parameter, which is
> either MoveAndUpdateClosure or ShadowClosure. So by controlling the
> type of closure when invoking the function, we can decide whether to
> fill a normal region or a shadow one. Thus, almost all code in
> PSParallelCompact::fill_region() can be reused.
>
> Besides, a virtual function named complete_region() is added in both
> closures to do some work after the filling, such as setting states and
> copying the shadow region back.
>
> Thanks again for reviewing the patch, looking forward to your insights
> and suggestions!
>
> Best Regards,
> Haoyu Li
>
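A minimal sketch of the refactoring described in the message above, not the HotSpot code itself. The closure names MoveAndUpdateClosure, ShadowClosure and the hook complete_region() come from the message; everything else (the Region alias, do_word(), completed(), the inheritance between the two closures, and the toy bodies) is a hypothetical stand-in chosen only to show how one templated fill routine can serve both cases.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a heap region: a growable buffer of words (assumption).
using Region = std::vector<int>;

// Writes objects (here: single words) straight into the destination.
class MoveAndUpdateClosure {
public:
  explicit MoveAndUpdateClosure(Region& dest) : _dest(dest) {}
  virtual ~MoveAndUpdateClosure() = default;

  virtual void do_word(int w) { _dest.push_back(w); }

  // Post-fill hook: the normal case only records that it is done.
  virtual void complete_region() { _completed = true; }

  bool completed() const { return _completed; }

protected:
  Region& _dest;
  bool _completed = false;
};

// Writes into a private shadow buffer and copies it back afterwards.
class ShadowClosure : public MoveAndUpdateClosure {
public:
  explicit ShadowClosure(Region& dest) : MoveAndUpdateClosure(dest) {}

  void do_word(int w) override { _shadow.push_back(w); }

  void complete_region() override {
    // Copy the shadow region back to its real destination.
    _dest.insert(_dest.end(), _shadow.begin(), _shadow.end());
    _completed = true;
  }

private:
  Region _shadow;
};

// One filling routine shared by both cases; the closure type decides
// whether a normal or a shadow region is filled.
template <typename ClosureT>
void fill_region(const Region& source, Region& dest) {
  ClosureT closure(dest);
  for (int w : source) {
    closure.do_word(w);
  }
  closure.complete_region();
  assert(closure.completed());
}
```

Calling fill_region<MoveAndUpdateClosure>(src, dst) and fill_region<ShadowClosure>(src, dst) runs the same loop; only the write target and the post-fill step differ.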
> 2019-10-10 21:50 GMT+08:00, Stefan Johansson <stefan.johansson at oracle.com>:
>> Thanks for the clarification =)
>>
>> Moving on to the next part, the code in the patch. So this won't be a
>> full review of the patch but just an initial comment that I would like
>> to be addressed first.
>>
>> The new function PSParallelCompact::fill_shadow_region() is more or less
>> a copy of PSParallelCompact::fill_region() and I understand that from a
>> proof of concept point of view it was the easy (and right) way to do it.
>> I would prefer if the code could be refactored so that fill_region() and
>> fill_shadow_region() share more code. There might be reasons that I've
>> missed that prevent it, but we should at least explore how much code
>> can be shared.
>>
>> Thanks,
>> Stefan
>>
>> On 2019-10-10 15:10, Haoyu Li wrote:
>>> Hi Stefan,
>>>
>>> Thanks for your quick response! As to your concern about the OCA, I am
>>> the sole author of the patch, and the situation is as the agreement
>>> states.
>>> Best Regards,
>>> Haoyu Li,
>>>
>>>
>>> Stefan Johansson <stefan.johansson at oracle.com> wrote on Thu, Oct 10, 2019 at 8:37 PM:
>>>
>>> Hi,
>>>
>>> On 2019-10-10 13:06, Haoyu Li wrote:
>>> > Hi Stefan,
>>> >
>>> > Thanks for your testing! One possible reason for the regressions in
>>> > simple tests is that the region dependencies may not be heavy enough.
>>> > Because the locality of shadow regions is lower than that of heap
>>> > regions, writing to shadow regions will be slower than writing to
>>> > normal regions, and this is part of the reason why I reuse shadow
>>> > regions. Therefore, if only a few shadow regions are created and not
>>> > reused, the overhead may not be amortized.
>>>
>>> I guess it is something like this. I thought that for "easy" heaps the
>>> shadow regions won't be used at all, and should therefore not really
>>> cost anything.
>>>
>>> >
>>> > As to the OCA, it is the case that I'm the only person signing the
>>> > agreement. Please let me know if you have any further questions.
>>> > Thanks again!
>>>
>>> Ok, so you are the sole author of the patch. The important part, as the
>>> agreement states, is:
>>> "no other person or entity, including my employer, has or will have
>>> rights with respect my contributions"
>>>
>>> Is that the case?
>>>
>>> Thanks,
>>> Stefan
>>>
>>> >
>>> > Best Regards,
>>> > Haoyu Li
>>> >
>>> > Stefan Johansson <stefan.johansson at oracle.com> wrote on Tue, Oct 8, 2019 at 6:49 PM:
>>> >
>>> > Hi Haoyu,
>>> >
>>> > I've done some more testing and I haven't seen any issues with the
>>> > patch so far, and the performance looks promising in most cases. For
>>> > simple tests I've seen some regressions, but I'm not really sure why.
>>> > Will do some more digging.
>>> >
>>> > To move forward with this, the first thing we need to do is make sure
>>> > that you being covered by the Oracle Contributor Agreement is enough.
>>> > From what we can see it is only you as an individual that has signed
>>> > the OCA, and in that case it is important that this statement from the
>>> > OCA is fulfilled: "no other person or entity, including my employer,
>>> > has or will have rights with respect my contributions"
>>> >
>>> > Is this the case for this contribution or should we have the
>>> > university sign the OCA as well? For more information regarding the
>>> > OCA please refer to:
>>> > https://www.oracle.com/technetwork/oca-faq-405384.pdf
>>> >
>>> > Thanks,
>>> > Stefan
>>> >
>>> > On 2019-09-16 16:02, Haoyu Li wrote:
>>> > > FYI, the evaluation results on OpenJDK 14 are plotted in the
>>> > > attachment. I compute the full GC throughput by dividing the heap
>>> > > size before full GC by the GC pause time, and the results are
>>> > > arithmetic mean values of ten runs after a warm-up run. The
>>> > > evaluation is conducted on a machine with dual Intel Xeon E5-2618L v3
>>> > > CPUs (2 sockets, 16 physical cores with SMT enabled) and 64G DRAM.
>>> > >
>>> > > Best Regards,
>>> > > Haoyu Li,
>>> > > Institute of Parallel and Distributed Systems(IPADS),
>>> > > School of Software,
>>> > > Shanghai Jiao Tong University
>>> > >
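The throughput metric described in the message above can be sketched as follows. This is one plausible reading, assuming the mean is taken over per-run throughput values and that the first recorded run is the discarded warm-up; the GcRun struct, its field names and units, and both function names are hypothetical and do not come from the patch or the evaluation scripts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One measured full GC (hypothetical record; the units are an assumption).
struct GcRun {
  double heap_before_mb;  // heap size before the full GC, in MB
  double pause_s;         // GC pause time, in seconds
};

// Full GC throughput of a single run: heap size before GC / pause time.
double full_gc_throughput(const GcRun& run) {
  assert(run.pause_s > 0.0);
  return run.heap_before_mb / run.pause_s;
}

// Arithmetic mean of the per-run throughput, skipping the first run,
// which is treated as the warm-up.
double mean_throughput(const std::vector<GcRun>& runs) {
  assert(runs.size() > 1);
  double sum = 0.0;
  for (std::size_t i = 1; i < runs.size(); ++i) {
    sum += full_gc_throughput(runs[i]);
  }
  return sum / static_cast<double>(runs.size() - 1);
}
```

Averaging per-run throughput is not the same as dividing total heap size by total pause time; the message does not say which was used, so the choice here is only illustrative.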
>>> > >
>>> > > Stefan Johansson <stefan.johansson at oracle.com> wrote on Thu, Sep 12, 2019 at 5:34 AM:
>>> > >
>>> > > Hi Haoyu,
>>> > >
>>> > > I recently came across your patch and I would like to pick up on
>>> > > some of the things Kim mentioned in his mails. I especially want to
>>> > > evaluate and investigate if this is a technique we can use to
>>> > > improve the other GCs as well. To start that work I want to take the
>>> > > patch for a spin in our internal performance testing. The patch
>>> > > doesn’t apply cleanly to the latest JDK repository, so if you could
>>> > > provide an updated patch that would be very helpful.
>>> > >
>>> > > It would also be great if you could share some more information
>>> > > around the results presented in the paper. For example, it would be
>>> > > good to get the full command lines for the different benchmarks so
>>> > > we can run them locally and reproduce the results you’ve seen.
>>> > >
>>> > > Thanks,
>>> > > Stefan
>>> > >
>>> > >> On 12 March 2019 at 03:21, Haoyu Li <leihouyju at gmail.com> wrote:
>>> > >>
>>> > >> Hi Kim,
>>> > >>
>>> > >> Thanks for reviewing and testing the patch. If there are any
>>> > >> failures or performance degradation relevant to the work, please
>>> > >> let me know and I'll be very happy to keep improving it. Also, any
>>> > >> suggestions about code improvements are much appreciated.
>>> > >>
>>> > >> I'm not quite sure if G1 and Shenandoah have a similar region
>>> > >> dependency issue, since I haven't studied their GC behaviors
>>> > >> before. If they do, I'm also willing to propose a more general
>>> > >> optimization.
>>> > >>
>>> > >> As to the memory overhead, I believe it will be low because this
>>> > >> patch exploits empty regions in the young space rather than
>>> > >> off-heap memory to allocate shadow regions, and also reuses the
>>> > >> /_source_region/ field of each /RegionData/ to record the
>>> > >> corresponding shadow region index. We only introduce a new
>>> > >> integer field /_shadow/ in the RegionData class to indicate the
>>> > >> status of a region, a global /GrowableArray _free_shadow/ to store
>>> > >> the indices of shadow regions, and a global /Monitor/ to protect
>>> > >> the array. This information might help if the memory overhead
>>> > >> needs to be evaluated.
>>> > >>
>>> > >> Looking forward to your insight.
>>> > >>
>>> > >> Best Regards,
>>> > >> Haoyu Li,
>>> > >> Institute of Parallel and Distributed Systems(IPADS),
>>> > >> School of Software,
>>> > >> Shanghai Jiao Tong University
>>> > >>
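The bookkeeping described in the message above (a global array of free shadow region indices guarded by a lock) can be sketched roughly as below. This is not the patch's code: std::vector and std::mutex stand in for HotSpot's GrowableArray and Monitor, and the class name ShadowRegionPool, its methods, and the kNoShadowRegion sentinel are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Sentinel returned when no shadow region is free (hypothetical).
constexpr std::size_t kNoShadowRegion = static_cast<std::size_t>(-1);

// Pool of indices of empty young-gen regions usable as shadow regions.
class ShadowRegionPool {
public:
  // Return a region index to the pool so it can be reused.
  void release(std::size_t region_idx) {
    std::lock_guard<std::mutex> guard(_lock);
    _free_shadow.push_back(region_idx);
  }

  // Take a free shadow region, or kNoShadowRegion if the pool is empty.
  std::size_t acquire() {
    std::lock_guard<std::mutex> guard(_lock);
    if (_free_shadow.empty()) {
      return kNoShadowRegion;
    }
    std::size_t idx = _free_shadow.back();
    _free_shadow.pop_back();
    return idx;
  }

  // Number of shadow regions currently free.
  std::size_t available() {
    std::lock_guard<std::mutex> guard(_lock);
    return _free_shadow.size();
  }

private:
  std::mutex _lock;                       // stands in for the global Monitor
  std::vector<std::size_t> _free_shadow;  // stands in for the GrowableArray
};
```

Under these assumptions the extra memory is one index per free shadow region plus the lock, which fits the message's claim that the overhead should be low.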
>>> > >>
>>> > >> Kim Barrett <kim.barrett at oracle.com> wrote on Tue, Mar 12, 2019 at 6:11 AM:
>>> > >>
>>> > >> > On Mar 11, 2019, at 1:45 AM, Kim Barrett <kim.barrett at oracle.com> wrote:
>>> > >> >
>>> > >> >> On Jan 24, 2019, at 3:58 AM, Haoyu Li <leihouyju at gmail.com> wrote:
>>> > >> >>
>>> > >> >> Hi Kim,
>>> > >> >>
>>> > >> >> I have ported my patch to OpenJDK 13 according to your
>>> > >> >> instructions in your last mail, and the patch is attached to
>>> > >> >> this mail. The patch does not change much since PSGC is indeed
>>> > >> >> pretty stable.
>>> > >> >>
>>> > >> >> Also, I evaluated the correctness and performance of PS full GC
>>> > >> >> with benchmarks from the DaCapo, SPECjvm2008, and JOlden suites
>>> > >> >> on a machine with dual Intel Xeon E5-2618L v3 CPUs (16 physical
>>> > >> >> cores), 64G DRAM and Linux kernel 4.17. The evaluation result,
>>> > >> >> indicating a 1.9X GC throughput improvement on average, is
>>> > >> >> attached, too.
>>> > >> >>
>>> > >> >> However, I have no idea how to further test this patch for
>>> > >> >> both correctness and performance. Can I please get any
>>> > >> >> guidance from you or some sponsor?
>>> > >> >
>>> > >> > Sorry I missed that you had sent an updated version of the
>>> > >> > patch.
>>> > >> >
>>> > >> > I’ve run the full regression suite across Oracle-supported
>>> > >> > platforms. There are some failures, but there are almost always
>>> > >> > some failures in the later tiers right now. I’ll start looking at
>>> > >> > them tomorrow to figure out whether any of them are relevant.
>>> > >> >
>>> > >> > I’m also planning to run some of our performance benchmarks.
>>> > >> >
>>> > >> > I’ve lightly skimmed the proposed changes. There might be some
>>> > >> > code improvements to be made.
>>> > >> >
>>> > >> > I’m also wondering if this technique applies to other collectors.
>>> > >> > It seems like both G1 and Shenandoah full gc’s might have similar
>>> > >> > issues? If so, a solution that is ParallelGC-specific is less
>>> > >> > interesting than one that has broader applicability. Though maybe
>>> > >> > this optimization is less important for G1 and Shenandoah, since
>>> > >> > they actively try to avoid full gc’s.
>>> > >> >
>>> > >> > I’m also not clear on how much additional memory might be
>>> > >> > temporarily allocated by this mechanism.
>>> > >>
>>> > >> I’ve created a CR for this:
>>> > >> https://bugs.openjdk.java.net/browse/JDK-8220465
>>> > >>
>>> > >
>>> >
>>>
>>
>
>