[PATCH] Exploit Empty Regions in Young Gen to Enhance PS Full GC Performance

Mon Jan 7 04:08:24 UTC 2019

Hi Paul,

Thanks for your reply; I hope you had a good holiday. I will wait for your
feedback.

Best,
Haoyu Li

Paul Su <paul.su at oracle.com> wrote on Mon, Jan 7, 2019 at 11:04 AM:

> Hi Haoyu,
>
> Thanks for your contribution. The past two weeks were holidays for most of
> the US and European regions. We are also in the middle of a critical phase
> of our release process. We are aware of your proposal and will consider it
> and provide feedback as soon as possible.
>
> Thanks,
> Paul
>
> On Jan 6, 2019, at 5:41 PM, Haoyu Li <leihouyju at gmail.com> wrote:
>
> Hi all,
>
> I submitted a patch about two weeks ago in my previous mail, but I have
> not received any response so far. Did I miss something? I followed the
> instructions on the *How to Contribute* webpage. Could someone sponsor
> this patch? Any reviews would be much appreciated!
>
> Best Regards,
> Haoyu Li,
> Institute of Parallel and Distributed Systems (IPADS),
> School of Software,
> Shanghai Jiao Tong University
>
>
> Haoyu Li <leihouyju at gmail.com> wrote on Mon, Dec 24, 2018 at 1:38 AM:
>
>> Hi all,
>> I have developed a patch to enhance the full GC performance of Parallel
>> Scavenge on OpenJDK 11. May I have some reviews? The patch is described
>> below and attached to this mail.
>>
>> *Problem*
>> Parallel Scavenge (PS) implements a compacting algorithm for full GC, and
>> we find that this algorithm leads to very poor GC thread utilization (for
>> example, only 8% on the Derby benchmark in the SPECjvm2008 suite) because
>> there are serious dependencies between heap regions: a region is available
>> to receive live objects from its source regions only after it has been
>> collected. Work stealing does not solve this problem, because idle GC
>> threads cannot steal anything while most regions are unavailable to
>> collect.
>>
>> *Optimization*
>> We propose *shadow regions* to solve the above problem. The basic idea is
>> to let GC threads collect unavailable regions in advance by copying their
>> live data into newly allocated empty regions, i.e., shadow regions, to
>> resolve the region dependencies. The contents of the shadow regions are
>> copied back to the corresponding regions later. With our approach, GC
>> threads can keep working most of the time without suffering work-stealing
>> failures (except those that happen at the end of a full GC). We also
>> notice that the to space in the young gen is always empty, so we use the
>> empty regions in to space as shadow regions (if the ScavengeBeforeFullGC
>> option is on, regions in eden space may be used, too) and avoid
>> allocating shadow regions from off-heap memory.
>>
>> *Evaluation*
>> We evaluated the full GC performance with our patch on the DaCapo,
>> SPECjvm2008, and JOlden benchmark suites, and the results show that the
>> shadow region optimization improves full GC throughput by 2.1X on
>> average and by up to 3.2X.
>>
>> The patch and evaluation results are attached.
>>
>> Best Regards,
>> Haoyu Li,
>> Institute of Parallel and Distributed Systems (IPADS),
>> School of Software,
>> Shanghai Jiao Tong University
>>
>
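
The region dependency and the shadow-region idea described in the original
proposal can be illustrated with a small toy model. The sketch below is not
HotSpot code and is not taken from the patch; the function `simulate`, the
chain-shaped dependency (region k receives the live data of region k+1), and
the one-task-per-thread-per-step cost model are all assumptions made for
illustration. It only shows why a strict dependency chain leaves threads idle
and how parking data in spare regions and copying it back later can keep more
threads busy.

```python
def simulate(num_regions, num_threads, num_shadow):
    """Toy compaction model (hypothetical, not the PS implementation).

    Region k must receive the live data of region k+1, but may only be
    overwritten after its own data has been moved out ("emptied").
    Returns (steps, busy_slots, final_heap).
    """
    # heap[k] is a tag for the live data in region k; region 0 starts empty.
    heap = [None] + [f"obj{k}" for k in range(1, num_regions + 1)]
    shadow = {}                                # region -> data parked in a shadow region
    state = ["pending"] * num_regions          # pending -> (shadow ->) done
    emptied = [True] + [False] * num_regions   # True once a region's data was moved out
    steps = busy = 0

    while any(s != "done" for s in state):
        tasks = []
        # Available regions first: fill them directly or copy a shadow back.
        for k in range(num_regions):
            if emptied[k] and state[k] == "pending":
                tasks.append(("fill", k))
            elif emptied[k] and state[k] == "shadow":
                tasks.append(("copy_back", k))
        # Otherwise idle threads pre-fill unavailable regions into shadows.
        free_shadow = num_shadow - len(shadow)
        for k in range(num_regions):
            if free_shadow <= 0:
                break
            if not emptied[k] and state[k] == "pending":
                tasks.append(("shadow_fill", k))
                free_shadow -= 1
        tasks = tasks[:num_threads]            # one task per GC thread per step

        for kind, k in tasks:
            if kind == "fill":                 # move data of k+1 straight into k
                heap[k] = heap[k + 1]
                state[k] = "done"
                emptied[k + 1] = True
            elif kind == "shadow_fill":        # park data of k+1 in a shadow region
                shadow[k] = heap[k + 1]
                state[k] = "shadow"
                emptied[k + 1] = True
            else:                              # copy_back: shadow -> real region k
                heap[k] = shadow.pop(k)
                state[k] = "done"
        steps += 1
        busy += len(tasks)
    return steps, busy, heap[:num_regions]


if __name__ == "__main__":
    for shadows in (0, 8):
        steps, busy, _ = simulate(num_regions=8, num_threads=4, num_shadow=shadows)
        print(f"shadow regions={shadows}: steps={steps}, "
              f"utilization={busy / (steps * 4):.0%}")
```

In this toy model, running without shadow regions degenerates into one region
per step regardless of the thread count, while allowing shadow regions adds
copy-back work but shortens the critical path. The numbers it prints say
nothing about the real speedups reported in the mail, which come from the
author's benchmarks.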