RFR (M): 8077144: Concurrent mark initialization takes too long
Kim Barrett
kim.barrett at oracle.com
Sat Mar 26 01:38:16 UTC 2016
> On Mar 15, 2016, at 6:12 PM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
>
> Hi Mikael,
>
> updated webrev at
> http://cr.openjdk.java.net/~tschatzl/8077144/webrev.3/ (full)
> http://cr.openjdk.java.net/~tschatzl/8077144/webrev.2_to_3/ (diff)
>
> which implements the suggested changes.
I've run out of steam for looking at g1ConcurrentMark.cpp changes for
now. I may have more comments there later. I like the new approach
though.
------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp
51 #include "logging/logTag.hpp"
Why add this #include? log.hpp appears to be the "include all of
logging" header, and it's already included. And I didn't find
anything specific to log tags in the changes.
------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp
1142 inline void set_card_bitmap_range(BitMap* bm,
Is this *really* necessary? I suspect set_range with a small_range
hint would get much of the benefit (especially with the
_last_marked_bit_idx optimization in the caller), with a *lot* less
code. E.g.
bm.set_range(beg, end, BitMap::small_range);
And that's ignoring the question of where this should actually cross
over between looping and calling set_range; do you have measurements
to show that 8 is actually a good crossing point?
Just using set_range would also make this assert superfluous:
1148 assert(end_idx <= bm->size(), "sanity");
------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp
1151 // use par_at_put_range (if parallel). The range is made up of the
I see no parallelism here. Maybe this comment has some left over cut
and paste?
------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp
1199 BitMap::idx_t card_num = (BitMap::idx_t)(uintptr_t(addr) >> CardTableModRefBS::card_shift);
I would like the cast to BitMap::idx_t removed; I don't think it
serves any useful purpose because idx_t is size_t is big enough. And
if it did perform a narrowing, I'd rather not have that hidden if/when
we someday turn on -Wconversion or the like.
Personally, I'd prefer an explict reinterpret_cast for the uintptr_t
conversion, but I realize using C++ cast operators is still kind of an
unusual thing in this code base.
1286 (BitMap::idx_t)(uintptr_t(G1CollectedHeap::heap()->reserved_region().start()) >> CardTableModRefBS::card_shift);
Same thing here.
------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp
1260 int obj_sz = obj->size();
...
1268 marked_bytes += (size_t)obj_sz * HeapWordSize;
[Pre-existing, feel free to ignore.]
obj_sz could be declared size_t, since obj->size() is not going to
return a negative value, and then the cast could be removed.
------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp
1206 void set_bit_for_region(HeapRegion* hr) {
1207 BitMap::idx_t index = (BitMap::idx_t) hr->hrm_index();
1208 _region_bm->par_at_put(index, true);
1209 }
This is the only use of _region_bm in the G1LiveDataHelper class. Of
the three places that make one of these, two provide a region bitmap
and one passes in NULL, meaning for that one it can't actually call
this function, which suggests this function shouldn't really be part
of the helper class. (It could be a static member taking a bitmap
argument, or a similar file scoped helper function.)
[Pre-existing]
The explicit cast of hrm_index to BitMap::idx_t serves no useful
purpose, and if it were a narrowing conversion the cast would just
mask that if we ever turn on -Wconversion or the like.
------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp
1281 //assert(region_bm != NULL, "");
1282 assert(card_bm != NULL, "");
Left-over debugging code? Later: Oh, the commented out assert is
related to the above discussion about set_bit_for_region.
------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp
688 void G1ConcurrentMark::cleanup_for_next_mark() {
...
699 clear_bitmap(_nextMarkBitMap, _parallel_workers, true);
700
701 // Clear the live count data. If the marking has been aborted, the abort()
702 // call already did that.
703 if (!has_aborted()) {
704 clear_all_live_data(_parallel_workers);
705 DEBUG_ONLY(verify_all_live_data());
706 }
[Pre-existing]
abort() also clears the _nextMarkBitMap, but appears to do so in STW.
Seems like one of those is superfluous. And wouldn't it be better to
not do any of these in (STW) abort() but rather do all of them here?
------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp
2496 _bitmap->clear_range(start, end);
If it weren't for the risk of tripping the (IMO bogus) assert on
minimum size in BitMap::clear_large_range, I would suggest calling
clear_range with a large_range hint.
------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp
2475 class G1ClearAllLiveDataTask : public AbstractGangTask {
...
2494 BitMap::idx_t start = M * BitsPerByte * to_process;
2495 BitMap::idx_t end = MIN2(start + M * BitsPerByte, _bitmap->size());
...
2501 void G1ConcurrentMark::clear_all_live_data(WorkGang* workers) {
...
2506 size_t const num_chunks = align_size_up(_card_live_bm.size_in_words() * HeapWordSize, M) / M;
These two snippets implicitly share knowledge that the chunking size
for tasks is 1Mbyte. One uses "chunk" and the other uses "task"
terminology, which further obscures this relationship. Consistent
terminology and a static constant in G1ClearAllLiveDataTask that is
used in both places would help.
------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp
1130 class G1LiveDataHelper VALUE_OBJ_CLASS_SPEC {
...
1142 inline void set_card_bitmap_range(BitMap* bm,
This uses non-parallel set_bit and set_range. Parallel callers are
operating on distinct heap regions. If that isn't guaranteed, then two
parallel tasks could try to modify bits in the same BitMap word
simultaneously. We're OK so long as G1HeapRegionSize is a multiple of
BitsPerWord * (1 << CardTableModRefBs::card_shift)
(which is 2^15 with 512 byte cards on a 64bit machine), but
G1HeapRegionSize is a product option with a relatively unconstrained
value (only bounded between 1M and 32M, no alignment constraint).
Oh, but, it looks like HeapRegion::setup_heap_region_size ensures the
region size is a power of 2 within the min/max bounds, rounding up an
arbitrary command line value if necessary.
So we're OK, though it required some searching to convince myself of
that. An assertion somewhere in G1LiveDataHelper might be
appropriate, since it doesn't ever use par_xxx bitmap functions,
unlike the old code, which selected between them based on an argument
that no longer exists.
------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.hpp
307 // A set bit indicates that the given card contains a live object.
"... contains a live object." is perhaps imprecise. The card might
contain multiple objects, and might contain parts of one or two
objects that cross either card boundary. Something like "... contains
at least part of at least one live object." would be more precise.
But maybe I'm being overly pedantic.
------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.hpp
266 class G1ConcurrentMark: public CHeapObj<mtGC> {
...
280 protected:
[Pre-existing, feel free to ignore.]
Why is all this stuff protected rather than private?
------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.hpp
855 // Precondition: obj is below region's NTAMS.
There is no longer an argument called "region"; it's now the region
containing obj. (Actually, it always ways, but there used to also be a
precondition that obj was in region.)
------------------------------------------------------------------------------
src/share/vm/gc/g1/g1OopClosures.inline.hpp
134 _cm->grayRoot(obj, hr);
With the removal of _worker_i as an argument here, the _worker_i
member of G1RootRegionScanClosure appears to be unused.
------------------------------------------------------------------------------
src/share/vm/utilities/bitMap.cpp
72 os::pretouch_memory((char*)word_addr(0), (char*)word_addr(size()));
Casts to char* are no longer needed, since fix for JDK-8151414.
Also, copyright needs updating.
------------------------------------------------------------------------------
src/share/vm/utilities/bitMap.hpp
147 static idx_t size_in_words(size_t size_in_bits) {
I find the name of this function confusing in conjunction with the
no-arg ordinary member function; I'm not keen on overloads with with
very different semantics and usage. Clearer (to me) here might be
something like calc_size_in_words.
Also, copyright needs updating.
------------------------------------------------------------------------------
More information about the hotspot-gc-dev
mailing list