[PATCH] Exploit Empty Regions in Young Gen to Enhance PS Full GC Performance

Thu Oct 24 12:16:03 UTC 2019

Hi Haoyu,

On 2019-10-23 17:15, Haoyu Li wrote:
> Hi Stefan,
> 
> Thanks for your constructive feedback. I've addressed all the issues you 
> mentioned, and the updated patch is attached in this email.
Nice, I will look at the patch next week, but I'll shortly answer your 
questions right away.

> 
> During refining the patch, I have a couple of questions:
> 1) Now the MoveAndUpdateClosure and ShadowClosure assume the destination 
> address is the very beginning of a region, instead of an arbitrary 
> address like what it used to be. However, there is an unused function 
> named PSParallelCompact::move_and_update() uses the MoveAndUpdateClosure 
> to process a region from its middle, which conflicts with the 
> assumption. I notice that you removed this function in your patch, and 
> so did I in the updated patch. Does it matter?
Yes, I found this function during my code review and it should be 
removed, but I think that should be handled as a separate issue. We can 
do this removal before this patch goes in.

> 2) Using the same do_addr() in MoveAndUpdateClosure and ShadowClosure is 
> doable, but it does not reuse all the code neatly. Because storing the 
> address of the shadow region in _destination requires extra virtual 
> functions to handle allocating blocks in the start_array and setting 
> addresses of deferred objects. In particular, allocate_blocks() and 
> set_deferred_object_for() in both closures are added. Is it worth 
> avoiding to use _offset to calculate the shadow_destination?
Ok, sounds like it might be better to have specific do_addr() functions 
then. I'll think some more around this when reviewing the new patch in 
depth.

> 
> If there are any problems with this patch, please contact me anytime. 
> I'm more than happy to keep improving the code. Thanks again for reviewing.
>
Sound good, thanks,
Stefan

> Best,
> Haoyu Li
> 
> 
> Stefan Johansson <stefan.johansson at oracle.com 
> <mailto:stefan.johansson at oracle.com>> 于2019年10月22日周二 下午9:42写道：
> 
>     Hi Haoyu,
> 
>     I've reviewed the patch now and have some comments and questions.
> 
>     To simplify the review and have a common base to look at I've created a
>     webrev at:
>     http://cr.openjdk.java.net/~sjohanss/8220465/00/
> 
>     One general note first, most of the new code uses four space
>     indentation, in hotspot the standard is two spaces, please change this.
>     Below are some file by file comments.
> 
>     src/hotspot/share/gc/parallel/psCompactionManager.cpp
>     ---
>         53 GrowableArray<size_t >* ParCompactionManager::_free_shadow = new
>     (ResourceObj::C_HEAP, mtInternal) GrowableArray<size_t >(10, true);
>         54 Monitor*                ParCompactionManager::_monitor = NULL;
> 
>     Set _free_shadow to NULL here like the other statics and then create
>     the
>     GrowableArray in initialize(). I also think _shadow_region_array or
>     something like that would be a better name and the monitor should also
>     be named something that signals that it is used for this array.
>     ---
>         70   if (_monitor == NULL) {
>         71       _monitor = new Monitor(Mutex::barrier, "CompactionManager
>     monitor",
>         72                              Mutex::_allow_vm_block_flag,
>     Monitor::_safepoint_check_never);
>         73   }
> 
>     Instead of doing the monitor creation here having to check for NULL, do
>     it in initialize() below together with the array creation.
>     ---
> 
>     src/hotspot/share/gc/parallel/psParallelCompact.cpp
>     ---
>     2974       if (cur->push()) {
> 
>     Correct me if I'm wrong, if this call to push() returns true it means
>     that nobody else has "stolen" it (used a shadow region to prepare it)
>     and we mark it as pushed. But when pushed in this code path this is the
>     end state for this RegionData? If this is the case I think it would be
>     easier to understand the code if we added another function and state
>     for
>     when we "steal" it. Haven't thought very much about the names but I
>     think you understand what I want to achieve:
>     Normal path:
>     UNUSED -> push() -> NORMAL
>     Steal path:
>     UNUSED -> steal() -> STOLEN -> fill() -> FILLED -> copy() -> SHADOW
> 
>     We could then also assert in set_completed() that the state is either
>     NORMAL or SHADOW (or if they have a shared end state DONE). As I said
>     the names can be improved (both for the states and the functions) but I
>     think we should have names and not just numbers.
>     ---
> 
>     3060 template <class T>
>     3061 void PSParallelCompact::fill_region(ParCompactionManager* cm,
>     size_t region_idx, size_t shadow, size_t offset)
> 
>     As I told you this was a big improvement from the first patch, but I
>     think there is room for even more improvements around the way we
>     pass in
>     ignored parameters to MoveAndUpdateClosure. Explaining my idea in text
>     is harder than code, so I created a patch, what do you think about this?
>     http://cr.openjdk.java.net/~sjohanss/8220465/00-alt/
> 
>     This alternative is based on 00 and does not take my other comments
>     into
>     consideration. So it might have to be altered a bit if you address some
>     of my other comments/questions.
>     ---
> 
>     3196 void PSParallelCompact::copy_back(HeapWord *region_addr, HeapWord
>     *shadow_addr) {
> 
>     I think the paramenter should change place, so that it corresponds with
>     the copy below.
>     ---
> 
>     3200 bool PSParallelCompact::steal_shadow_region(ParCompactionManager*
>     cm, size_t &region_idx) {
>     3201     size_t& record = cm->shadow_record();
> 
>     Did you consider to just let shadow_record() be a simple getter instead
>     of getting a reference and then have a next_shadow_record() which
>     advances it by active_workers?
>     ---
> 
>     3236 void PSParallelCompact::initialize_steal_record(uint which) {
> 
>     I'm having a hard time understanding the details here, or I get that
>     all
>     threads should have a separate shadow record, but I'm not sure why
>     it is
>     not enough to just do:
>     size_t record = _summary_data.addr_to_region_idx(
>         _space_info[old_space_id].dense_prefix());
>     cm->set_shadow_record(record + which);
> 
>     As you can see I'm also suggesting adding a setter for shadow_record.
>     ---
> 
>     3434 ParMarkBitMapClosure::IterationStatus
>     3435 ShadowClosure::do_addr(HeapWord* addr, size_t words) {
>     3436     HeapWord* shadow_destination = destination() + _offset;
> 
>     Using an offset instead of a given address feels a bit backwards, did
>     you consider letting the closure keep and update a _shadow_destination
>     instead? Or would it even be possible to just set destination to be the
>     shadow region address? In that case it should be possible to just use
>     the do_addr and other functions from the MoveAndUpdateClosure.
> 
>     I see from looking at this particular function that there is one assert
>     that would have to change:
>     3408
>     assert(PSParallelCompact::summary_data().calc_new_pointer(source(),
>     compaction_manager()) ==
>     3409          destination(), "wrong destination");
> 
>     This should be easily fixed by adding a virtual function
>     check_destination, that has a special implementation for the
>     ShadowClosure.
>     ---
> 
>     src/hotspot/share/gc/parallel/psParallelCompact.hpp
>     ---
>        333     // Preempt the region to avoid double processes
>        334     inline bool push();
>        335     // Mark the region as filled and ready to be copied back
>        336     inline bool fill();
>        337     // Preempt the region to copy the shadow region content back
>        338     inline bool copy();
> 
>     As mentioned, I think there might be better names for those functions
>     and the comments. Maybe adding a prefix would make the code more self
>     explaining. try_push(), mark_filled(), try_copy() and the new
>     try_steal().
>     ---
> 
>     Thanks again for providing this patch, I look forward to see an updated
>     version.
> 
>     Cheers,
>     Stefan
> 
> 
>     On 2019-10-14 15:00, Stefan Johansson wrote:
>      > Thanks for the quick update Haoyu,
>      >
>      > This is a great improvement and I will try to find time to look
>     into the
>      > patch in more detail the coming weeks.
>      >
>      > Thanks,
>      > Stefan
>      >
>      > On 2019-10-11 14:49, Haoyu Li wrote:
>      >> Hi Stefan,
>      >>
>      >> Thanks for your suggestion! It is very redundant that
>      >> PSParallelCompact::fill_shadow_region() copies most code from
>      >> PSParallelCompact::fill_region(), and therefore I've refactored
>     these
>      >> two functions to share code as many as possible. And the
>     attachment is
>      >> the updated patch.
>      >>
>      >> Specifically, the closure, which moves objects, in
>      >> PSParallelCompact::fill_region() is now declared as a template of
>      >> either MoveAndUpdateClosure or ShadowClosure. So by controlling the
>      >> type of closure when invoking the function, we can decide whether to
>      >> fill a normal region or a shadow one. Thus, almost all code in
>      >> PSParallelCompact::fill_region() can be reused.
>      >>
>      >> Besides, a virtual function named complete_region() is added in both
>      >> closures to do some work after the filling, such setting states and
>      >> copying the shadow region back.
>      >>
>      >> Thanks again for reviewing the patch, looking forward to your
>     insights
>      >> and suggestions!
>      >>
>      >> Best Regards,
>      >> Haoyu Li
>      >>
>      >> 2019-10-10 21:50 GMT+08:00, Stefan Johansson
>      >> <stefan.johansson at oracle.com <mailto:stefan.johansson at oracle.com>>:
>      >>> Thanks for the clarification =)
>      >>>
>      >>> Moving on to the next part, the code in the patch. So this
>     won't be a
>      >>> full review of the patch but just an initial comment that I
>     would like
>      >>> to be addressed first.
>      >>>
>      >>> The new function PSParallelCompact::fill_shadow_region() is
>     more or less
>      >>> a copy of PSParallelCompact::fill_region() and I understand
>     that from a
>      >>> proof of concept point of view it was the easy (and right) way
>     to do it.
>      >>> I would prefer if the code could be refactored so that
>     fill_region() and
>      >>> fill_shadow_region() share more code. There might be reasons
>     that I've
>      >>> missed, that prevents it, but we should at least explore how
>     much code
>      >>> can be shared.
>      >>>
>      >>> Thanks,
>      >>> Stefan
>      >>>
>      >>> On 2019-10-10 15:10, Haoyu Li wrote:
>      >>>> Hi Stefan,
>      >>>>
>      >>>> Thanks for your quick response! As to your concern about the
>     OCA, I am
>      >>>> the sole author of the patch. And it is the case as what the
>     agreement
>      >>>> states.
>      >>>> Best Regrads,
>      >>>> Haoyu Li,
>      >>>>
>      >>>>
>      >>>> Stefan Johansson <stefan.johansson at oracle.com
>     <mailto:stefan.johansson at oracle.com>
>      >>>> <mailto:stefan.johansson at oracle.com
>     <mailto:stefan.johansson at oracle.com>>> 于2019年10月10日周四 下午8:37
>      >>>> 写道：
>      >>>>
>      >>>>      Hi,
>      >>>>
>      >>>>      On 2019-10-10 13:06, Haoyu Li wrote:
>      >>>>       > Hi Stefan,
>      >>>>       >
>      >>>>       > Thanks for your testing! One possible reason for the
>      >>>> regressions
>      >>>> in
>      >>>>       > simple tests is that the region dependencies maybe not
>     heavy
>      >>>> enough.
>      >>>>       > Because the locality of shadow regions is lower than
>     that of
>      >>>> heap
>      >>>>       > regions, writing to shadow regions will be slower than to
>      >>>> normal
>      >>>>       > regions, and this is a part of the reason why I reuse
>     shadow
>      >>>>      regions.
>      >>>>       > Therefore, if only a few shadow regions are created
>     and not
>      >>>>      reused, the
>      >>>>       > overhead may not be amortized.
>      >>>>
>      >>>>      I guess it is something like this. I thought that for
>     "easy" heaps
>      >>>> the
>      >>>>      shadow regions won't be used at all, and should therefor not
>      >>>> really
>      >>>>      cost
>      >>>>      anything.
>      >>>>
>      >>>>       >
>      >>>>       > As to the OCA, it is the case that I'm the only person
>      >>>> signing the
>      >>>>       > agreement. Please let me know if you have any further
>      >>>> questions.
>      >>>>      Thanks
>      >>>>       > again!
>      >>>>
>      >>>>      Ok, so you are the sole author of the patch. The important
>      >>>> part, as
>      >>>> the
>      >>>>      agreement states, is:
>      >>>>      "no other person or entity, including my employer, has or
>     will
>      >>>> have
>      >>>>      rights with respect my contributions"
>      >>>>
>      >>>>      Is that the case?
>      >>>>
>      >>>>      Thanks,
>      >>>>      Stefan
>      >>>>
>      >>>>       >
>      >>>>       > Best Regrads,
>      >>>>       > Haoyu Li
>      >>>>       >
>      >>>>       > Stefan Johansson <stefan.johansson at oracle.com
>     <mailto:stefan.johansson at oracle.com>
>      >>>>      <mailto:stefan.johansson at oracle.com
>     <mailto:stefan.johansson at oracle.com>>
>      >>>>       > <mailto:stefan.johansson at oracle.com
>     <mailto:stefan.johansson at oracle.com>
>      >>>>      <mailto:stefan.johansson at oracle.com
>     <mailto:stefan.johansson at oracle.com>>>> 于2019年10月8日周二 下午
>      >>>> 6:49
>      >>>>      写道：
>      >>>>       >
>      >>>>       >     Hi Haoyu,
>      >>>>       >
>      >>>>       >     I've done some more testing and I haven't seen any
>     issues
>      >>>>      with the
>      >>>>       >     patch
>      >>>>       >     so far and the performance looks promising in most
>      >>>> cases. For
>      >>>>      simple
>      >>>>       >     tests I've seen some regressions, but I'm not
>     really sure
>      >>>>      why. Will do
>      >>>>       >     some more digging.
>      >>>>       >
>      >>>>       >     To move forward with this the first thing we need
>     to do is
>      >>>>      making sure
>      >>>>       >     that you being covered by the Oracle Contributor
>      >>>> Agreement is
>      >>>>      enough.
>      >>>>       >       From what we can see it is only you as an
>     individual that
>      >>>>      has signed
>      >>>>       >     the OCA and in that case it is important that this
>      >>>> statement
>      >>>>      from the
>      >>>>       >     OCA is fulfilled: "no other person or entity,
>     including my
>      >>>>      employer,
>      >>>>       >     has
>      >>>>       >     or will have rights with respect my contributions"
>      >>>>       >
>      >>>>       >     Is this the case for this contribution or should
>     we have
>      >>>> the
>      >>>>      university
>      >>>>       >     sign the OCA as well? For more information
>     regarding the
>      >>>> OCA
>      >>>>      please
>      >>>>       >     refer to:
>      >>>>       > https://www.oracle.com/technetwork/oca-faq-405384.pdf
>      >>>>       >
>      >>>>       >     Thanks,
>      >>>>       >     Stefan
>      >>>>       >
>      >>>>       >     On 2019-09-16 16:02, Haoyu Li wrote:
>      >>>>       >      > FYI, the evaluation results on OpenJDK 14 are
>     plotted in
>      >>>> the
>      >>>>       >     attachment.
>      >>>>       >      > I compute the full GC throughput by dividing
>     the heap
>      >>>> size
>      >>>>      before
>      >>>>       >     full
>      >>>>       >      > GC by the GC pause time, and the results are
>     arithmetic
>      >>>> mean
>      >>>>       >     values of
>      >>>>       >      > ten runs after a warm-up run. The evaluation is
>      >>>> conducted on
>      >>>> a
>      >>>>       >     machine
>      >>>>       >      > with dual Intel ®XeonTM E5-2618L v3 CPUs (2
>     sockets, 16
>      >>>>      physical
>      >>>>       >     cores
>      >>>>       >      > with SMT enabled) and 64G DRAM.
>      >>>>       >      >
>      >>>>       >      > Best Regrads,
>      >>>>       >      > Haoyu Li,
>      >>>>       >      > Institute of Parallel and Distributed
>     Systems(IPADS),
>      >>>>       >      > School of Software,
>      >>>>       >      > Shanghai Jiao Tong University
>      >>>>       >      >
>      >>>>       >      >
>      >>>>       >      > Stefan Johansson <stefan.johansson at oracle.com
>     <mailto:stefan.johansson at oracle.com>
>      >>>>      <mailto:stefan.johansson at oracle.com
>     <mailto:stefan.johansson at oracle.com>>
>      >>>>       >     <mailto:stefan.johansson at oracle.com
>     <mailto:stefan.johansson at oracle.com>
>      >>>>      <mailto:stefan.johansson at oracle.com
>     <mailto:stefan.johansson at oracle.com>>>
>      >>>>       >      > <mailto:stefan.johansson at oracle.com
>     <mailto:stefan.johansson at oracle.com>
>      >>>>      <mailto:stefan.johansson at oracle.com
>     <mailto:stefan.johansson at oracle.com>>
>      >>>>       >     <mailto:stefan.johansson at oracle.com
>     <mailto:stefan.johansson at oracle.com>
>      >>>>      <mailto:stefan.johansson at oracle.com
>     <mailto:stefan.johansson at oracle.com>>>>> 于2019年9月12日周四 上
>      >>>> 午5:34
>      >>>>       >     写道：
>      >>>>       >      >
>      >>>>       >      >     Hi Haoyu,
>      >>>>       >      >
>      >>>>       >      >     I recently came across your patch and I would
>      >>>> like to
>      >>>>      pick up on
>      >>>>       >      >     some of the things Kim mentioned in his
>     mails. I
>      >>>>      especially want
>      >>>>       >      >     evaluate and investigate if this is a
>     technique
>      >>>> we can
>      >>>>      use to
>      >>>>       >      >     improve the other GCs as well. To start
>     that work I
>      >>>>      want to
>      >>>>       >     take the
>      >>>>       >      >     patch for a spin in our internal performance
>      >>>> testing.
>      >>>>      The patch
>      >>>>       >      >     doesn’t apply clean to the latest JDK
>     repository, so
>      >>>>      if you could
>      >>>>       >      >     provide an updated patch that would be very
>     helpful.
>      >>>>       >      >
>      >>>>       >      >     It would also be great if you could share
>     some more
>      >>>>      information
>      >>>>       >      >     around the results presented in the paper. For
>      >>>> example,
>      >>>> it
>      >>>>       >     would be
>      >>>>       >      >     good to get the full command lines for the
>     different
>      >>>>       >     benchmarks so
>      >>>>       >      >     we can run them locally and reproduce the
>      >>>>      results you’ve seen.
>      >>>>       >      >
>      >>>>       >      >     Thanks,
>      >>>>       >      >     Stefan
>      >>>>       >      >
>      >>>>       >      >>     12 mars 2019 kl. 03:21 skrev Haoyu Li
>      >>>>      <leihouyju at gmail.com <mailto:leihouyju at gmail.com>
>     <mailto:leihouyju at gmail.com <mailto:leihouyju at gmail.com>>
>      >>>>       >     <mailto:leihouyju at gmail.com
>     <mailto:leihouyju at gmail.com> <mailto:leihouyju at gmail.com
>     <mailto:leihouyju at gmail.com>>>
>      >>>>       >      >>     <mailto:leihouyju at gmail.com
>     <mailto:leihouyju at gmail.com>
>      >>>>      <mailto:leihouyju at gmail.com <mailto:leihouyju at gmail.com>>
>     <mailto:leihouyju at gmail.com <mailto:leihouyju at gmail.com>
>      >>>>      <mailto:leihouyju at gmail.com <mailto:leihouyju at gmail.com>>>>>:
>      >>>>       >      >>
>      >>>>       >      >>     Hi Kim,
>      >>>>       >      >>
>      >>>>       >      >>     Thanks for reviewing and testing the
>     patch. If
>      >>>> there
>      >>>>      are any
>      >>>>       >      >>     failures or performance degradation
>     relevant to the
>      >>>>      work, please
>      >>>>       >      >>     let me know and I'll be very happy to keep
>      >>>> improving
>      >>>> it.
>      >>>>       >     Also, any
>      >>>>       >      >>     suggestions about code improvements are well
>      >>>> appreciated.
>      >>>>       >      >>
>      >>>>       >      >>     I'm not quite sure if both G1 and Shenandoah
>      >>>> have the
>      >>>>      similar
>      >>>>       >      >>     region dependency issue, since I haven't
>     studied
>      >>>> their
>      >>>> GC
>      >>>>       >      >>     behaviors before. If they have, I'm also
>     willing to
>      >>>>      propose
>      >>>>       >     a more
>      >>>>       >      >>     general optimization.
>      >>>>       >      >>
>      >>>>       >      >>     As to the memory overhead, I believe it
>     will be low
>      >>>>      because this
>      >>>>       >      >>     patch exploits empty regions in the young
>     space
>      >>>>      rather than
>      >>>>       >      >>     off-heap memory to allocate shadow
>     regions, and
>      >>>> also
>      >>>>      reuses the
>      >>>>       >      >>     /_source_region/ field of each /RegionData
>     /to
>      >>>> record
>      >>>> the
>      >>>>       >      >>     correspongding shadow region index. We only
>      >>>> introduce
>      >>>>      a new
>      >>>>       >      >>     integer filed /_shadow /in the RegionData
>     class to
>      >>>>      indicate the
>      >>>>       >      >>     status of a region, a global /GrowableArray
>      >>>>      _free_shadow/ to
>      >>>>       >     store
>      >>>>       >      >>     the indices of shadow regions, and a global
>      >>>>      /Monitor/ to protect
>      >>>>       >      >>     the array. These information might help if
>     the
>      >>>> memory
>      >>>>      overhead
>      >>>>       >      >>     need to be evaluated.
>      >>>>       >      >>
>      >>>>       >      >>     Looking forward to your insight.
>      >>>>       >      >>
>      >>>>       >      >>     Best Regrads,
>      >>>>       >      >>     Haoyu Li,
>      >>>>       >      >>     Institute of Parallel and Distributed
>      >>>> Systems(IPADS),
>      >>>>       >      >>     School of Software,
>      >>>>       >      >>     Shanghai Jiao Tong University
>      >>>>       >      >>
>      >>>>       >      >>
>      >>>>       >      >>     Kim Barrett <kim.barrett at oracle.com
>     <mailto:kim.barrett at oracle.com>
>      >>>>      <mailto:kim.barrett at oracle.com
>     <mailto:kim.barrett at oracle.com>>
>      >>>>       >     <mailto:kim.barrett at oracle.com
>     <mailto:kim.barrett at oracle.com>
>      >>>> <mailto:kim.barrett at oracle.com <mailto:kim.barrett at oracle.com>>>
>      >>>>       >      >>     <mailto:kim.barrett at oracle.com
>     <mailto:kim.barrett at oracle.com>
>      >>>>      <mailto:kim.barrett at oracle.com
>     <mailto:kim.barrett at oracle.com>>
>      >>>>       >     <mailto:kim.barrett at oracle.com
>     <mailto:kim.barrett at oracle.com>
>      >>>>      <mailto:kim.barrett at oracle.com
>     <mailto:kim.barrett at oracle.com>>>>> 于2019年3月12日周二 上午6:11
>      >>>> 写道：
>      >>>>       >      >>
>      >>>>       >      >>         > On Mar 11, 2019, at 1:45 AM, Kim Barrett
>      >>>>       >      >>         <kim.barrett at oracle.com
>     <mailto:kim.barrett at oracle.com>
>      >>>>      <mailto:kim.barrett at oracle.com
>     <mailto:kim.barrett at oracle.com>> <mailto:kim.barrett at oracle.com
>     <mailto:kim.barrett at oracle.com>
>      >>>>      <mailto:kim.barrett at oracle.com
>     <mailto:kim.barrett at oracle.com>>>
>      >>>>       >     <mailto:kim.barrett at oracle.com
>     <mailto:kim.barrett at oracle.com>
>      >>>>      <mailto:kim.barrett at oracle.com
>     <mailto:kim.barrett at oracle.com>> <mailto:kim.barrett at oracle.com
>     <mailto:kim.barrett at oracle.com>
>      >>>>      <mailto:kim.barrett at oracle.com
>     <mailto:kim.barrett at oracle.com>>>>> wrote:
>      >>>>       >      >>         >
>      >>>>       >      >>         >> On Jan 24, 2019, at 3:58 AM, Haoyu Li
>      >>>>       >     <leihouyju at gmail.com <mailto:leihouyju at gmail.com>
>     <mailto:leihouyju at gmail.com <mailto:leihouyju at gmail.com>>
>      >>>>      <mailto:leihouyju at gmail.com <mailto:leihouyju at gmail.com>
>     <mailto:leihouyju at gmail.com <mailto:leihouyju at gmail.com>>>
>      >>>>       >      >>         <mailto:leihouyju at gmail.com
>     <mailto:leihouyju at gmail.com>
>      >>>>      <mailto:leihouyju at gmail.com <mailto:leihouyju at gmail.com>>
>      >>>>       >     <mailto:leihouyju at gmail.com
>     <mailto:leihouyju at gmail.com> <mailto:leihouyju at gmail.com
>     <mailto:leihouyju at gmail.com>>>>>
>      >>>>      wrote:
>      >>>>       >      >>         >>
>      >>>>       >      >>         >> Hi Kim,
>      >>>>       >      >>         >>
>      >>>>       >      >>         >> I have ported my patch to OpenJDK 13
>      >>>> according
>      >>>>      to your
>      >>>>       >      >>         instructions in your last mail, and the
>      >>>> patch is
>      >>>>      attached in
>      >>>>       >      >>         this mail. The patch does not change
>     much since
>      >>>>      PSGC is
>      >>>>       >     indeed
>      >>>>       >      >>         pretty stable.
>      >>>>       >      >>         >>
>      >>>>       >      >>         >> Also, I evaluate the correctness and
>      >>>>      performance of
>      >>>>       >     PS full
>      >>>>       >      >>         GC with benchmarks from DaCapo,
>     SPECjvm2008,
>      >>>> and
>      >>>>      JOlden
>      >>>>       >   �� suits
>      >>>>       >      >>         on a machine with dual Intel Xeon
>     E5-2618L v3
>      >>>> CPUs(16
>      >>>>       >     physical
>      >>>>       >      >>         cores), 64G DRAM and linux kernel
>     4.17. The
>      >>>>      evaluation
>      >>>>       >     result,
>      >>>>       >      >>         indicating 1.9X GC throughput
>     improvement on
>      >>>>      average, is
>      >>>>       >      >>         attached, too.
>      >>>>       >      >>         >>
>      >>>>       >      >>         >> However, I have no idea how to
>     further test
>      >>>> this
>      >>>>       >     patch for
>      >>>>       >      >>         both correctness and performance. Can
>     I please
>      >>>>      get any
>      >>>>       >      >>         guidance from you or some sponsor?
>      >>>>       >      >>         >
>      >>>>       >      >>         > Sorry I missed that you had sent an
>     updated
>      >>>>      version of the
>      >>>>       >      >>         patch.
>      >>>>       >      >>         >
>      >>>>       >      >>         > I’ve run the full regression suite
>     across
>      >>>>      Oracle-supported
>      >>>>       >      >>         platforms.  There are some
>      >>>>       >      >>         > failures, but there are almost
>     always some
>      >>>>      failures in the
>      >>>>       >      >>         later tiers right now.  I’ll start
>      >>>>       >      >>         > looking at them tomorrow to figure out
>      >>>> whether
>      >>>>      any of them
>      >>>>       >      >>         are relevant.
>      >>>>       >      >>         >
>      >>>>       >      >>         > I’m also planning to run some of our
>      >>>> performance
>      >>>>       >     benchmarks.
>      >>>>       >      >>         >
>      >>>>       >      >>         > I’ve lightly skimmed the proposed
>     changes.
>      >>>>      There might be
>      >>>>       >      >>         some code improvements
>      >>>>       >      >>         > to be made.
>      >>>>       >      >>         >
>      >>>>       >      >>         > I’m also wondering if this technique
>      >>>> applies to
>      >>>>      other
>      >>>>       >      >>         collectors.  It seems like both G1 and
>      >>>>       >      >>         > Shenandoah full gc’s might have similar
>      >>>>      issues?  If so, a
>      >>>>       >      >>         solution that is ParallelGC-specific
>      >>>>       >      >>         > is less interesting than one that
>     has broader
>      >>>>       >      >>         applicability.  Though maybe this
>     optimization
>      >>>>       >      >>         > is less important for G1 and
>     Shenandoah,
>      >>>> since
>      >>>> they
>      >>>>       >     actively
>      >>>>       >      >>         try to avoid full gc’s.
>      >>>>       >      >>         >
>      >>>>       >      >>         > I’m also not clear on how much
>     additional
>      >>>>      memory might be
>      >>>>       >      >>         temporarily allocated by this
>      >>>>       >      >>         > mechanism.
>      >>>>       >      >>
>      >>>>       >      >>         I’ve created a CR for this:
>      >>>>       >      >> https://bugs.openjdk.java.net/browse/JDK-8220465
>      >>>>       >      >>
>      >>>>       >      >
>      >>>>       >
>      >>>>
>      >>>
>      >>
>      >>
>