RFR (XXL): 8213108: Improve work distribution during remembered set scan
Thomas Schatzl
thomas.schatzl at oracle.com
Mon Jun 17 12:23:43 UTC 2019
Hi Kim,
thanks for digging through this code.
On Sat, 2019-06-15 at 17:01 -0400, Kim Barrett wrote:
> > On Jun 6, 2019, at 4:33 AM, Thomas Schatzl <
> > thomas.schatzl at oracle.com> wrote:
> >
> > Webrevs:
> > http://cr.openjdk.java.net/~tschatzl/8213108/webrev.0_to_1/ (diff)
> > http://cr.openjdk.java.net/~tschatzl/8213108/webrev.1/ (complete)
> > Testing:
> > hs-tier1-5, rerun of perf benchmarks with no differences to before,
> > some additional applications that showed abnormal sampling thread
> > behavior.
>
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1BarrierSet.cpp
> removed:
> 28 #include "gc/g1/g1CardTable.inline.hpp"
>
[...]
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1CardTable.inline.hpp
> 44 assert(start_card_index % sizeof(size_t) == 0, "Start card
> index must be aligned.");
>
[...]
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1CardTable.inline.hpp
> 53 if (value == WordAllClean) {
> 54 *cur_word = WordAllDirty;
> 55 } else if (value == WordAllDirty) {
> 56 // do nothing.
> 57 } else {
> 58 // There is a mix of cards in there. Tread slowly.
>
> I think what you are trying to do here is (per byte)
> clean => dirty
> already_scanned => don't modify
> dirty => remains dirty, don't modify
> with the assumption that clean is much more common than others.
>
> It would be really helpful if there was some indication somewhere
> that already_scanned preservation was needed here, as it's not at all
> obvious from the code.
I added some comment elsewhere.
> The slow case can be made much faster with some bit twiddling,
> possibly fast enough to make some of the conditional "fast" cases not
> worth checking for. See JDK-8224840.
Yes, we talked about that :)
>
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1CardTable.hpp
> 75 static const size_t WordAllClean = (size_t)-1;
I used SIZE_MAX. I actually do not know why I always use (size_t)-1 -
bad habit.
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1CardTable.hpp
> 78 #ifdef _LP64
> 79 static const size_t WordAlreadyScanned = ...
> ...
> 86 #endif
>
> Consider the following:
>
> STATIC_ASSERT(BitsPerByte == 8);
> static const size_t WordAlreadyScanned = (SIZE_MAX / 255) *
> g1_card_already_scanned;
Thanks!
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1CollectedHeap.hpp
> 1181 // The variant with the HeapRegionClaimer guarantees that a
> particular region
> 1182 // will only handled by a single region.
>
> "handled by a single region" ??? Maybe by a single worker? Or
> processed only once?
Fixed.
>
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1EvacFailure.cpp
> 52 _g1h(G1CollectedHeap::heap()), _dcq(dcq), _ct(_g1h-
> >card_table()), _last_enqueued_card((size_t)-1) {}
>
> s/(size_t)-1/SIZE_MAX/
>
> Similarly here:
> src/hotspot/share/gc/g1/g1ParScanThreadState.cpp
> 54 _last_enqueued_card((size_t)-1),
>
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp
> 72 // After all evacuation is done, we clear the card table.
>
> "clear" suggests setting to zero. But that would be "all dirty",
> which is not what we want. Maybe better to says something like
> "reset to clean" or something like that.
>
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp
> 74 // Work distribution occurs on "chunk" basis, i.e. contiguous
> ranges of cards.
> ...
> 79 // Within these chunks, threads first claim "blocks" of cards,
> i.e. contiguous ranges
>
> These seem a bit contradictory. The work distribution seems to occur
> at the block layer. We have regions which contain chunks, which
> contain blocks, and workers claim blocks.
I fixed the comment. Work distribution is performed on chunk layer. The
block layer is only there to decrease scanning setup cost.
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp
> 96 // Random (power of two) number of cards we want to claim per
> thread. This corresponds
>
> I think the use of "Random" here is misleading. Suggest removing
> parens, and "Random", and perhaps change the final sentence about
> being the same as the chunk size for Parallel GC to suggest that no
> measurements went into choosing this number.
I actually tried both 128 and 256 entries with no perceptible
difference (CMS uses 256). For one reason or another I took 128 to
favor(?) smaller applications.
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp
> 82 class G1RemSetScanState : public CHeapObj<mtGC> {
>
> I have a preference for non-trivial nested classes being declared in
> the owning class but defined outside it. Similarly for non-trivial
> functions. I find the style used here pretty hard to read, having
> trouble finding the APIs for classes and the boundaries between
> functions and between classes. (I've seen code editors that can help
> with that, but not everyone uses such.)
>
> There are certainly lots of non-trivial classes in hotspot whose
> entire definition is nested in the class definition. (Not that I'm a
> fan of such.) But this seems kind of excessive. The class
> definition is nearly 500 lines long, with 7(?) nested classes. (I
> was almost surprised not to find any multiply nested classes; unless
> I missed them.)
>
> (And I *do* like nested classes as a scoping mechanism, just not laid
> out like this.)
>
> But PLEASE don't act on this complaint in the middle of this review!
I filed JDK-8226232, assigned to me.
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp
> 136 _buffer = NEW_C_HEAP_ARRAY(uint, max_regions, mtGC);
> 137 _contains = NEW_C_HEAP_ARRAY(bool, max_regions, mtGC);
>
> Why initialize to NULL in the initialier list and then reset here,
> rather than initializing with allocations? (Even if we had exceptions
> and exception safety concerns we shouldn't need anything like that,
> as we'd probably be using smart pointers too, and that would
> similarly obviate such separation.)
>
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp
> 142 static size_t chunk_size() { return M; }
> 143 ~G1DirtyRegions() {
>
> Missing line separator.
>
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp
> 185 bool filter_region(uint const region_idx) const {
>
> "filter" can be ambiguious; sometimes it's used in the sense of
> removal, sometimes in the sense of collecting. The name here also
> doesn't really tell me anything. The comment suggests something like
> consider_for_card_dirtying or should_dirty, or something like that
> (which inverts the sense).
>
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp
> 198 bool filter_and_dirty(uint const region_idx) {
>
> This seems really badly named. It either filters out (when
> filter_region is true) OR it collects for dirtying. Better might be
> something like collect_if_should_dirty, (which inverts the sense).
>
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp
> 540 if (_collection_set_iter_state != NULL) {
> 543 if (_card_table_scan_state != NULL) {
> 546 if (_region_scan_chunks != NULL) {
> 549 if (_scan_top != NULL) {
>
> Unnecessary.
>
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.cpp
> 798 size_t cur_pos() {
> 799 _cur_addr++;
>
> Confusingly named. The name suggests an accessor, but it has side
> effects. Maybe something like get_and_step_pos()?
>
Fixed.
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.hpp
> 98 void merge_heap_roots(bool remembered_set_only,
> G1GCPhaseTimes::GCParPhases merge_phase);
>
> I wonder if this "heap roots" protocol is misplaced in the G1RemSet
> class. But that seems like something for a later cleanup pass at
> this point.
I kept this at this point.
>
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/g1RemSet.hpp
> 178 static PerRegionTable* volatile _free_list;
>
> For future cleanup, free list management could use LockFreeStack.
PerRegionTable (all PRT implementations) will be removed completely in
the near(?) future with JDK-8017163. So I tried to do minimal changes
to this code at this time.
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/g1/heapRegionRemSet.hpp
> 189 void add_card_work(CardIdx_t from_card, bool par) {
> 192 if (_bm.par_at_put(from_card, 1)) {
> 196 _bm.at_put(from_card, 1);
>
> Pre-existing: These should be using [par_]set_bit rather than
> [par_]at_put with a value of 1.
Fixed. This for some reason required moving a few methods into the
.inline.hpp which is the more appropriate place anyway.
>
> -------------------------------------------------------------------
> -----------
> src/hotspot/share/gc/shared/cardTable.hpp
> 107 clean_card_mask = clean_card - 15,
>
> This change removed both claimed_card and deferred_card, but is only
> reducing the cleared bits in clean_card_mask by 1. Was the old value
> incorrect (having not accounted for g1_young_gen)? Or should this be
> removing two cleared bits from the mask?
>
> Hm, clean_card_mask seems to have been G1-specific, and after this
> change it seems to be unused.
>
> -------------------------------------------------------------------
> -----------
Removed.
Thanks again for your review.
New webrev:
http://cr.openjdk.java.net/~tschatzl/8213108/webrev.1_to_2/ (diff)
http://cr.openjdk.java.net/~tschatzl/8213108/webrev.2/ (full)
Testing: hs-tier1-3
Thanks,
Thomas
More information about the hotspot-gc-dev
mailing list