RFR (XXL): 8213108: Improve work distribution during remembered set scan
Kim Barrett
kim.barrett at oracle.com
Sat Jun 15 21:01:50 UTC 2019
> On Jun 6, 2019, at 4:33 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
>
> Webrevs:
> http://cr.openjdk.java.net/~tschatzl/8213108/webrev.0_to_1/ (diff)
> http://cr.openjdk.java.net/~tschatzl/8213108/webrev.1/ (complete)
> Testing:
> hs-tier1-5, rerun of perf benchmarks with no differences to before,
> some additional applications that showed abnormal sampling thread
> behavior.
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1BarrierSet.cpp
removed:
28 #include "gc/g1/g1CardTable.inline.hpp"
Not sure why this is being removed. I see references to G1CardTable
stuff in this file, which suggests that with the removal we're getting
that class indirectly via some other #include. Maybe replace with
include of g1CardTable.hpp?
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1CardTable.inline.hpp
44 assert(start_card_index % sizeof(size_t) == 0, "Start card index must be aligned.");
Use is_aligned(start_card_index, sizeof(size_t)).
Similarly for following num_cards assertion?
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1CardTable.inline.hpp
53 if (value == WordAllClean) {
54 *cur_word = WordAllDirty;
55 } else if (value == WordAllDirty) {
56 // do nothing.
57 } else {
58 // There is a mix of cards in there. Tread slowly.
I think what you are trying to do here is (per byte)
clean => dirty
already_scanned => don't modify
dirty => remains dirty, don't modify
with the assumption that clean is much more common than others.
It would be really helpful if there was some indication somewhere that
already_scanned preservation was needed here, as it's not at all
obvious from the code.
The slow case can be made much faster with some bit twiddling,
possibly fast enough to make some of the conditional "fast" cases not
worth checking for. See JDK-8224840.
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1CardTable.hpp
75 static const size_t WordAllClean = (size_t)-1;
The cast shouldn't be needed. (Is it there to silence some compiler
warning?) Also, consider using SIZE_MAX instead.
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1CardTable.hpp
78 #ifdef _LP64
79 static const size_t WordAlreadyScanned = ...
...
86 #endif
Consider the following:
STATIC_ASSERT(BitsPerByte == 8);
static const size_t WordAlreadyScanned = (SIZE_MAX / 255) * g1_card_already_scanned;
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1CollectedHeap.hpp
1181 // The variant with the HeapRegionClaimer guarantees that a particular region
1182 // will only handled by a single region.
"handled by a single region" ??? Maybe by a single worker? Or
processed only once?
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1EvacFailure.cpp
52 _g1h(G1CollectedHeap::heap()), _dcq(dcq), _ct(_g1h->card_table()), _last_enqueued_card((size_t)-1) {}
s/(size_t)-1/SIZE_MAX/
Similarly here:
src/hotspot/share/gc/g1/g1ParScanThreadState.cpp
54 _last_enqueued_card((size_t)-1),
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1RemSet.cpp
72 // After all evacuation is done, we clear the card table.
"clear" suggests setting to zero. But that would be "all dirty",
which is not what we want. Maybe better to says something like "reset
to clean" or something like that.
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1RemSet.cpp
74 // Work distribution occurs on "chunk" basis, i.e. contiguous ranges of cards.
...
79 // Within these chunks, threads first claim "blocks" of cards, i.e. contiguous ranges
These seem a bit contradictory. The work distribution seems to occur
at the block layer. We have regions which contain chunks, which
contain blocks, and workers claim blocks.
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1RemSet.cpp
96 // Random (power of two) number of cards we want to claim per thread. This corresponds
I think the use of "Random" here is misleading. Suggest removing
parens, and "Random", and perhaps change the final sentence about
being the same as the chunk size for Parallel GC to suggest that no
measurements went into choosing this number.
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1RemSet.cpp
82 class G1RemSetScanState : public CHeapObj<mtGC> {
I have a preference for non-trivial nested classes being declared in
the owning class but defined outside it. Similarly for non-trivial
functions. I find the style used here pretty hard to read, having
trouble finding the APIs for classes and the boundaries between
functions and between classes. (I've seen code editors that can help
with that, but not everyone uses such.)
There are certainly lots of non-trivial classes in hotspot whose
entire definition is nested in the class definition. (Not that I'm a
fan of such.) But this seems kind of excessive. The class definition
is nearly 500 lines long, with 7(?) nested classes. (I was almost
surprised not to find any multiply nested classes; unless I missed them.)
(And I *do* like nested classes as a scoping mechanism, just not laid
out like this.)
But PLEASE don't act on this complaint in the middle of this review!
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1RemSet.cpp
136 _buffer = NEW_C_HEAP_ARRAY(uint, max_regions, mtGC);
137 _contains = NEW_C_HEAP_ARRAY(bool, max_regions, mtGC);
Why initialize to NULL in the initialier list and then reset here,
rather than initializing with allocations? (Even if we had exceptions
and exception safety concerns we shouldn't need anything like that, as
we'd probably be using smart pointers too, and that would similarly
obviate such separation.)
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1RemSet.cpp
142 static size_t chunk_size() { return M; }
143 ~G1DirtyRegions() {
Missing line separator.
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1RemSet.cpp
185 bool filter_region(uint const region_idx) const {
"filter" can be ambiguious; sometimes it's used in the sense of
removal, sometimes in the sense of collecting. The name here also
doesn't really tell me anything. The comment suggests something like
consider_for_card_dirtying or should_dirty, or something like that
(which inverts the sense).
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1RemSet.cpp
198 bool filter_and_dirty(uint const region_idx) {
This seems really badly named. It either filters out (when
filter_region is true) OR it collects for dirtying. Better might be
something like collect_if_should_dirty, (which inverts the sense).
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1RemSet.cpp
540 if (_collection_set_iter_state != NULL) {
543 if (_card_table_scan_state != NULL) {
546 if (_region_scan_chunks != NULL) {
549 if (_scan_top != NULL) {
Unnecessary.
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1RemSet.cpp
798 size_t cur_pos() {
799 _cur_addr++;
Confusingly named. The name suggests an accessor, but it has side
effects. Maybe something like get_and_step_pos()?
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1RemSet.hpp
98 void merge_heap_roots(bool remembered_set_only, G1GCPhaseTimes::GCParPhases merge_phase);
I wonder if this "heap roots" protocol is misplaced in the G1RemSet
class. But that seems like something for a later cleanup pass at this
point.
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1RemSet.hpp
178 static PerRegionTable* volatile _free_list;
For future cleanup, free list management could use LockFreeStack.
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/heapRegionRemSet.hpp
189 void add_card_work(CardIdx_t from_card, bool par) {
192 if (_bm.par_at_put(from_card, 1)) {
196 _bm.at_put(from_card, 1);
Pre-existing: These should be using [par_]set_bit rather than
[par_]at_put with a value of 1.
------------------------------------------------------------------------------
src/hotspot/share/gc/shared/cardTable.hpp
107 clean_card_mask = clean_card - 15,
This change removed both claimed_card and deferred_card, but is only
reducing the cleared bits in clean_card_mask by 1. Was the old value
incorrect (having not accounted for g1_young_gen)? Or should this be
removing two cleared bits from the mask?
Hm, clean_card_mask seems to have been G1-specific, and after this
change it seems to be unused.
------------------------------------------------------------------------------
More information about the hotspot-gc-dev
mailing list