RFR (M): 8077144: Concurrent mark initialization takes too long

Mon Mar 7 22:21:01 UTC 2016

> On Mar 4, 2016, at 4:52 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
> 
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8077144
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8077144/webrev/
> Testing:
> jprt, vm.gc testlist

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.hpp
 381   void allocate_large_bitmap(BitMap* bitmap, BitMap::idx_t const size_in_bits);

const for the size_in_bits here is superfluous.

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp
 600   // First-time allocation of the bitmaps ensures that they are zeroed out. There
 601   // is no need to clear them and the liveness count data manually.

This comment seems misplaced, or even unnecessary.

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp  
2145   SizeTFlagSetting fs(ArrayAllocatorMallocLimit, new_array_allocator_malloc_limit);

--- TBD: Are we *sure* we're still single-threaded here?  Is there a
    risk of futzing with ArrayAllocatorMallocLimit affecting some
    other thread?  Don't bother figuring this out until the need for
    the modification is discussed.

Wouldn't it be better to give ArrayAllocator a new API that
specifically requests mmap or adjusts the use-malloc/mmap boundary,
rather than this highly roundabout mechanism with potentially
non-local effect?  Perhaps a constructor argument that changes the
behavior of should_use_malloc.

A separate issue is some of the uses of FlagSetting and friends have
nothing to do with "flags" in the globals.hpp sense, and seem
semantically suspect.  Maybe an RFE here?

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp  
2144   size_t const new_array_allocator_malloc_limit = MIN2((size_t)os::vm_allocation_granularity() * 100, ArrayAllocatorMallocLimit);

I wonder why vm_allocation_granularity returns int rather than an
unsigned type?  Doing anything about that is out of scope for this
change, but unfortunately leads toward the cast.  Maybe an RFE here?

------------------------------------------------------------------------------ 
src/share/vm/gc/g1/g1ConcurrentMark.cpp  
2147   ArrayAllocator<BitMap::bm_word_t, mtGC> allocator(false);

The use of ArrayAllocator with false free_in_destructor introduces
cleanup problems that are not being handled.  All the new (in this
change set) and existing uses of that feature appear to be discarding
the information needed to free the allocated memory when done with it,
because they aren't capturing the distinction between malloc'ed vs
mmap'ed.

And that feature is being used and then handing the memory off to a
BitMap, which could itself attempt to free the memory (if resize were
to be called, which hopefully never happens in the cases at hand) and
have no clue about how to do so.

A possible way to deal with this would be to have an ArrayAllocator
support swap (even better would be C++11 move, but...) and swap the
external allocator with the one in the BitMap, so that when the BitMap
is destroyed it will properly deal with the deallocation.

This is a pre-existing issue, with the current change introducing a
new occurrence of the problem.

------------------------------------------------------------------------------ 
src/share/vm/gc/g1/g1ConcurrentMark.cpp   
2166   allocate_large_bitmap(&_region_bm, (BitMap::idx_t)(_g1h->max_regions()));

Why cast the max_regions value? I think currently there is no problem
(looks like uint => size_t). We might someday turn on things like
-Wconversion, in which case this cast would simply mask a (potential)
problem if a problem exists then.

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp  
2167   allocate_large_bitmap(&_card_bm, align_size_up_(_g1h->reserved_region().byte_size(), CardTableModRefBS::card_size) / CardTableModRefBS::card_size);

Abuse of align_size_up_ macro, which is supposed to be for use in
places where a constant expression is required, with align_size_up
function being preferred.  Unfortunately, then one runs afoul of the
not very useful signature for align_size_up.  We could spend a lot of
time discussing shortcomings in that vicinity.  Maybe an RFE here?

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp  
2169   _count_card_bitmaps = NEW_C_HEAP_ARRAY(BitMap,  _max_worker_id, mtGC);

Extra space before _max_worker_id.

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp   
2176   _count_marked_bytes = Padded2DArray<size_t, mtGC>::create_unfreeable(_max_worker_id, _g1h->max_regions(), false);

The call to create_unfreeable appears to be incorrect.  The third
argument is a size_t* to the allocation_size.  This call should be
implicitly converting false to a null pointer, and leaving the new
force_pretouch argument with its default true value.

This should have been caught by gcc -Wconversion-null, which is
enabled by default and I don't see any disabling of it in our build
system.  I'm puzzled as to how this builds?

What am I missing here?

------------------------------------------------------------------------------  
src/share/vm/gc/g1/g1ConcurrentMark.cpp 
2193     os::pretouch_memory(count_marked, count_marked + (size_t)G1CollectedHeap::heap()->max_regions() * sizeof(size_t));

Another unnecessary cast of max_regions.

------------------------------------------------------------------------------ 
src/share/vm/memory/allocation.hpp
 748   // Indicates whether the last (re-)allocation used the C-heap malloc or virtual space.
 749   bool use_malloc() const { return _use_malloc; }

I think used_malloc would be a better name for a query; use_malloc
seems like a request/command.  

------------------------------------------------------------------------------ 
src/share/vm/utilities/bitMap.cpp
  66   os::pretouch_memory((char*)word_addr(0), (char*)word_addr(size()));

Regarding the casts, here and in new G1PretouchInternalBitmapsTask,
why does pretouch_memory want char* rather than void*?  Maybe an RFE
here?  There are a few existing calls that could be cleaned up too.

------------------------------------------------------------------------------ 
test/gc/g1/Test2GbHeap.java
  28  * Skip test on 32 bit Windows: it typically does not support the many and large virtual memory reservations needed.
  29  * @requires (vm.gc == "G1" | vm.gc == "null") & ((sun.arch.data.model != "32") | (os.family != "windows"))

jtreg supports multiple @requires options, implicitly and'ing them
together (*).  Splitting the @requires here would make it more
readable.

And could the second clause be written as the following?

  !((sun.arg.data.model = "32") & (os.family = "windows"))

(*) http://hg.openjdk.java.net/code-tools/jtreg/ is not up to
date. Documentation source was updated for multiple @requires in
September 2015 (http://hg.openjdk.java.net/code-tools/jtreg/).
Implementation support appears to have existed for some time before
that.

------------------------------------------------------------------------------