[PATCH] enhancement to ParallelScavenge Full GC

Mon Dec 14 15:57:43 UTC 2015

Thanks, Thomas!

I will refine my code carefully according to your suggestions as soon as
possible.

Since I'm new in contributing to OpenJDK, actually I'm not sure how to
upload when my new code is ready?
 Just upload an attach of diffs as I did, or some other ways?

Besides, I'm currently processing my signed OCA for the committer rights,
I'm not sure how long it will take, does it matter?

As for the VM Thread, it does a serial adjust_roots() within the current
thread (VM thread) during PSParallelCompact::invoke_no_policy().
 I think psCompactionManager.cpp:117 and the surrounding comments confirm
the use of VMThread in full GC.
Do you agree with my above understanding of the use of VMThread?

Sincerely,
  Tianyang

2015-12-14 22:48 GMT+08:00 Thomas Schatzl <thomas.schatzl at oracle.com>:

> Hi Tianyang,
>
> On Mon, 2015-12-14 at 15:55 +0800, ray alex wrote:
> > Hi all,
> >
> > Can I have reviews for the following enhancement?
> >
>
> Thanks for contributing :) Here is a quick review.
>
> Note that I am typically not working on Parallel GC, so I am not very
> proficient in this code and did not take the time to look into it in
> more detail yet. Also, I did not do any performance testing, so I am
> mostly criticizing coding style :)
>
> - please remove obvious debug code, like most of the code in
> psParallelCompact.cpp. Also, for whatever reason the TraceTime
> constructor gets passed a print_phases() method that does not exist.
>
> - I am not sure why the VM thread needs to have the cached
> live-words-in-range information. For that matter, I think that putting
> new members into the Thread class is a no-go because these members are
> only used by the GC threads, penalizing everyone else. I think the
> correct way of putting GC task thread specific data is adding these
> members to the GCTaskThread class.
>
> Since for worker threads Thread::current() is a GCTaskThread(), you can
> still use it to retrieve the values (and add a
> Thread::as_gctask_thread() method like as_worker_thread() to retrieve
> it).
>
> Also, please create appropriate getters and/or setters for the
> variables, taking into account below comments.
>
> - in the new ParMarkBitMap::live_words_in_range() method, please do not
> use l-value references. Hotspot code does not use them, and it does not
> seem worth to change this "do as the surrounding code does" rule.
>
> Following comments are for ParMarkBitMap::live_words_in_range():
>
> - I do not see a point in opening extra scopes for straight-line code
> (like in parMarkBitMap.cpp:161)
>
> - the second part of the new live_words_in_range() looks like a verbatim
> copy of (the now) live_words_in_range_helper(). Please refactor the code
> appropriately.
>
> E.g. I would prefer if the code looked like the following (actual naming
> up to debate, I think you will find better names):
>
> live_words_in_range() {
>   if (live_words_in_range_in_cache()) {
>     return cached-values();
>   }
>   return live_words_in_range_helper();
> }
>
> or something similar. That would be a lot better to read and maintain
> instead of the copy&paste.
>
> - use of uint64_t for casting addresses/comparisons is wrong and just
> unnecessary type conversion on 32 bit. The type the language provides
> for this purpose is (u)intptr_t.
>
> Also, you probably want to use the pointer_delta function to calculate
> pointer deltas without overflow.
>
> > Description:
> > The change addresses the inefficiency in the live_words_in_range()
> > routine in ParallelScavenge Full GC,
> > which generally costs most of the time during the compacting phase.
> >
> >Our idea records last query information of live_words_in_range as well
> >as the query result.
> > Our approach reuses last query result so that can reduce much
> > computation of bitmap searching.
>
> :)
>
> >
> > Patch:  as attached.
> >
> > Testing:
> > Manual testing of full GC intensive applications including JOlden,
> > Dacapo, SPECjvm2008 benchmarks.
> >
> > Results show that our enhancement can reduce much full GC time by
> > 20~50%, depending on application scenarios.
>
> I am not sure if JOlden (benchmark from '95), or Dacapo (from '09) is
> representative of today's workloads (or specjvm2008), but the idea
> sounds timeless :)
>
> I will run (a slightly modified version, see pointer_delta() comment
> above) through some basic testing, and then some benchmarks.
>
> > Lei Tianyang
> > Institute of Parallel and Distributed System (IPADS)
> > Shanghai Jiao Tong University
>
> Another procedural comment, you do not seem to have signed an Oracle's
> Contributor agreement. We need that to be able to take your patch,
> unless you are covered by some companies' OCA (but then we would need
> you to use an email address from that company I think).
>
> Please see http://www.oracle.com/technetwork/community/oca-486395.html#l
> for more information.
>
> Thanks,
>   Thomas
>
> >
>
>
>

-- 
Lei Tianyang
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20151214/c54d83a4/attachment.htm>