RFR (XXL): 8213108: Improve work distribution during remembered set scan

Thomas Schatzl thomas.schatzl at oracle.com
Mon Jun 17 12:23:43 UTC 2019


Hi Kim,

  thanks for digging through this code.

On Sat, 2019-06-15 at 17:01 -0400, Kim Barrett wrote:
> > On Jun 6, 2019, at 4:33 AM, Thomas Schatzl <
> > thomas.schatzl at oracle.com> wrote:
> > 
> > Webrevs:
> > http://cr.openjdk.java.net/~tschatzl/8213108/webrev.0_to_1/ (diff)
> > http://cr.openjdk.java.net/~tschatzl/8213108/webrev.1/ (complete)
> > Testing:
> > hs-tier1-5, rerun of perf benchmarks with no differences to before,
> > some additional applications that showed abnormal sampling thread
> > behavior.
> 
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1BarrierSet.cpp
> removed:
>   28 #include "gc/g1/g1CardTable.inline.hpp"
> 
[...]
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1CardTable.inline.hpp
>   44   assert(start_card_index % sizeof(size_t) == 0, "Start card
> index must be aligned.");
> 
[...]
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1CardTable.inline.hpp
>   53     if (value == WordAllClean) {
>   54       *cur_word = WordAllDirty;
>   55     } else if (value == WordAllDirty) {
>   56       // do nothing.
>   57     } else {
>   58       // There is a mix of cards in there. Tread slowly.
> 
> I think what you are trying to do here is (per byte)
>   clean => dirty
>   already_scanned => don't modify
>   dirty => remains dirty, don't modify
> with the assumption that clean is much more common than others.
> 
> It would be really helpful if there was some indication somewhere
> that already_scanned preservation was needed here, as it's not at all
> obvious from the code.

I added some comment elsewhere.

> The slow case can be made much faster with some bit twiddling,
> possibly fast enough to make some of the conditional "fast" cases not
> worth checking for.  See JDK-8224840.

Yes, we talked about that :)

> 
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1CardTable.hpp
>   75   static const size_t WordAllClean = (size_t)-1;

I used SIZE_MAX. I actually do not know why I always use (size_t)-1 -
bad habit.

> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1CardTable.hpp
>   78 #ifdef _LP64
>   79   static const size_t WordAlreadyScanned = ...
> ...
>   86 #endif
> 
> Consider the following:
> 
>   STATIC_ASSERT(BitsPerByte == 8);
>   static const size_t WordAlreadyScanned = (SIZE_MAX / 255) *
> g1_card_already_scanned;

Thanks!

> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1CollectedHeap.hpp
> 1181   // The variant with the HeapRegionClaimer guarantees that a
> particular region
> 1182   // will only handled by a single region.
> 
> "handled by a single region" ???  Maybe by a single worker?  Or
> processed only once?

Fixed.

> 
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1EvacFailure.cpp
>   52     _g1h(G1CollectedHeap::heap()), _dcq(dcq), _ct(_g1h-
> >card_table()), _last_enqueued_card((size_t)-1) {}
> 
> s/(size_t)-1/SIZE_MAX/
> 
> Similarly here:
> src/hotspot/share/gc/g1/g1ParScanThreadState.cpp
>   54     _last_enqueued_card((size_t)-1),
> 
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp
>   72 // After all evacuation is done, we clear the card table.
> 
> "clear" suggests setting to zero.  But that would be "all dirty",
> which is not what we want.  Maybe better to says something like
> "reset to clean" or something like that.
> 
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp
>   74 // Work distribution occurs on "chunk" basis, i.e. contiguous
> ranges of cards.
> ...
>   79 // Within these chunks, threads first claim "blocks" of cards,
> i.e. contiguous ranges
> 
> These seem a bit contradictory. The work distribution seems to occur
> at the block layer. We have regions which contain chunks, which
> contain blocks, and workers claim blocks.

I fixed the comment. Work distribution is performed on chunk layer. The
block layer is only there to decrease scanning setup cost.

> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp 
>   96   // Random (power of two) number of cards we want to claim per
> thread. This corresponds
> 
> I think the use of "Random" here is misleading. Suggest removing
> parens, and "Random", and perhaps change the final sentence about
> being the same as the chunk size for Parallel GC to suggest that no
> measurements went into choosing this number.

I actually tried both 128 and 256 entries with no perceptible
difference (CMS uses 256). For one reason or another I took 128 to
favor(?) smaller applications.

> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp 
>   82 class G1RemSetScanState : public CHeapObj<mtGC> {
> 
> I have a preference for non-trivial nested classes being declared in
> the owning class but defined outside it. Similarly for non-trivial
> functions. I find the style used here pretty hard to read, having
> trouble finding the APIs for classes and the boundaries between
> functions and between classes. (I've seen code editors that can help
> with that, but not everyone uses such.)
> 
> There are certainly lots of non-trivial classes in hotspot whose
> entire definition is nested in the class definition.  (Not that I'm a
> fan of such.)  But this seems kind of excessive.  The class
> definition is nearly 500 lines long, with 7(?) nested classes.  (I
> was almost surprised not to find any multiply nested classes; unless
> I missed them.)
> 
> (And I *do* like nested classes as a scoping mechanism, just not laid
> out like this.)
> 
> But PLEASE don't act on this complaint in the middle of this review!

I filed JDK-8226232, assigned to me.

> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp 
>  136       _buffer = NEW_C_HEAP_ARRAY(uint, max_regions, mtGC);
>  137       _contains = NEW_C_HEAP_ARRAY(bool, max_regions, mtGC);
> 
> Why initialize to NULL in the initialier list and then reset here,
> rather than initializing with allocations? (Even if we had exceptions
> and exception safety concerns we shouldn't need anything like that,
> as we'd probably be using smart pointers too, and that would
> similarly obviate such separation.)
> 
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp 
>  142     static size_t chunk_size() { return M; }
>  143     ~G1DirtyRegions() {
> 
> Missing line separator.
> 
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp 
>  185   bool filter_region(uint const region_idx) const {
> 
> "filter" can be ambiguious; sometimes it's used in the sense of
> removal, sometimes in the sense of collecting.  The name here also
> doesn't really tell me anything.  The comment suggests something like
> consider_for_card_dirtying or should_dirty, or something like that
> (which inverts the sense).
> 
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp 
>  198     bool filter_and_dirty(uint const region_idx) {
> 
> This seems really badly named.  It either filters out (when
> filter_region is true) OR it collects for dirtying.  Better might be
> something like collect_if_should_dirty, (which inverts the sense).
> 
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp 
>  540     if (_collection_set_iter_state != NULL) {
>  543     if (_card_table_scan_state != NULL) {
>  546     if (_region_scan_chunks != NULL) {
>  549     if (_scan_top != NULL) {
> 
> Unnecessary.
> 
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp 
>  798   size_t cur_pos() {
>  799     _cur_addr++;      
> 
> Confusingly named.  The name suggests an accessor, but it has side
> effects.  Maybe something like get_and_step_pos()?
> 

Fixed.

> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.hpp 
>   98   void merge_heap_roots(bool remembered_set_only,
> G1GCPhaseTimes::GCParPhases merge_phase);
> 
> I wonder if this "heap roots" protocol is misplaced in the G1RemSet
> class.  But that seems like something for a later cleanup pass at
> this point. 

I kept this at this point.

> 
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.hpp 
>  178   static PerRegionTable* volatile _free_list;
> 
> For future cleanup, free list management could use LockFreeStack.

PerRegionTable (all PRT implementations) will be removed completely in
the near(?) future with JDK-8017163. So I tried to do minimal changes
to this code at this time.

> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/heapRegionRemSet.hpp
>  189   void add_card_work(CardIdx_t from_card, bool par) {
>  192         if (_bm.par_at_put(from_card, 1)) {
>  196         _bm.at_put(from_card, 1);
> 
> Pre-existing: These should be using [par_]set_bit rather than
> [par_]at_put with a value of 1.

Fixed. This for some reason required moving a few methods into the
.inline.hpp which is the more appropriate place anyway.

> 
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/shared/cardTable.hpp
>  107     clean_card_mask             = clean_card - 15,
> 
> This change removed both claimed_card and deferred_card, but is only
> reducing the cleared bits in clean_card_mask by 1.  Was the old value
> incorrect (having not accounted for g1_young_gen)?  Or should this be
> removing two cleared bits from the mask?
> 
> Hm, clean_card_mask seems to have been G1-specific, and after this
> change it seems to be unused.
> 
> -------------------------------------------------------------------
> -----------

Removed.

Thanks again for your review.

New webrev:
http://cr.openjdk.java.net/~tschatzl/8213108/webrev.1_to_2/ (diff)
http://cr.openjdk.java.net/~tschatzl/8213108/webrev.2/ (full)

Testing: hs-tier1-3

Thanks,
  Thomas




More information about the hotspot-gc-dev mailing list