RFR (M): 8077144: Concurrent mark initialization takes too long

Mon Mar 7 23:33:05 UTC 2016

Hi Kim,

  thanks for your comments.

On Mon, 2016-03-07 at 17:21 -0500, Kim Barrett wrote:
> > On Mar 4, 2016, at 4:52 AM, Thomas Schatzl <
> > thomas.schatzl at oracle.com> wrote:
> > 
> > CR:
> > https://bugs.openjdk.java.net/browse/JDK-8077144
> > Webrev:
> > http://cr.openjdk.java.net/~tschatzl/8077144/webrev/
> > Testing:
> > jprt, vm.gc testlist
> 
> ---------------------------------------------------------------------
> ---------
> src/share/vm/gc/g1/g1ConcurrentMark.hpp
>  381   void allocate_large_bitmap(BitMap* bitmap, BitMap::idx_t const
> size_in_bits);
> 
> const for the size_in_bits here is superfluous.

Done.

> ---------------------------------------------------------------------
> ---------
> src/share/vm/gc/g1/g1ConcurrentMark.cpp
>  600   // First-time allocation of the bitmaps ensures that they are
> zeroed out. There
>  601   // is no need to clear them and the liveness count data
> manually.
> 
> This comment seems misplaced, or even unnecessary.

The intent of this comment has been to avoid questions why the data
structures are not initialized here. Removing it.

> ---------------------------------------------------------------------
> ---------
> src/share/vm/gc/g1/g1ConcurrentMark.cpp  
> 2145   SizeTFlagSetting fs(ArrayAllocatorMallocLimit,
> new_array_allocator_malloc_limit);
> 
> --- TBD: Are we *sure* we're still single-threaded here?  Is there a
>     risk of futzing with ArrayAllocatorMallocLimit affecting some
>     other thread?  Don't bother figuring this out until the need for
>     the modification is discussed.
> 
> Wouldn't it be better to give ArrayAllocator a new API that
> specifically requests mmap or adjusts the use-malloc/mmap boundary,
> rather than this highly roundabout mechanism with potentially
> non-local effect?  Perhaps a constructor argument that changes the
> behavior of should_use_malloc.
> 

Yes, I was thinking about this right today, particularly because I need
edthat functionality outside of G1ConcurrentMark today anyway.

My idea has been to add something like this to the alloc/realloc
methods, but I am fine with the constructor.

I am sure that at that time we are only single-threaded btw.

> A separate issue is some of the uses of FlagSetting and friends have
> nothing to do with "flags" in the globals.hpp sense, and seem
> semantically suspect.  Maybe an RFE here?

Filed JDK-8151412, I hope this is what you meant.

> ---------------------------------------------------------------------
> ---------
> src/share/vm/gc/g1/g1ConcurrentMark.cpp  
> 2144   size_t const new_array_allocator_malloc_limit =
> MIN2((size_t)os::vm_allocation_granularity() * 100,
> ArrayAllocatorMallocLimit);
> 
> I wonder why vm_allocation_granularity returns int rather than an
> unsigned type?  Doing anything about that is out of scope for this
> change, but unfortunately leads toward the cast.  Maybe an RFE here?

Filed JDK-8151413.

> ---------------------------------------------------------------------
> --------- 
> src/share/vm/gc/g1/g1ConcurrentMark.cpp  
> 2147   ArrayAllocator<BitMap::bm_word_t, mtGC> allocator(false);
> 
> The use of ArrayAllocator with false free_in_destructor introduces
> cleanup problems that are not being handled.  All the new (in this
> change set) and existing uses of that feature appear to be discarding
> the information needed to free the allocated memory when done with 
> it, because they aren't capturing the distinction between malloc'ed 
> vs. mmap'ed.
> 
> And that feature is being used and then handing the memory off to a
> BitMap, which could itself attempt to free the memory (if resize were
> to be called, which hopefully never happens in the cases at hand) 

It does not.

Originally I had an extension of the BitMap::resize() method that
automatically handled this. However this introduced more complexity
there, and potential regressions in behavior, so I created this helper
method in G1ConcurrentMark.

I guess I will see to extending BitMap after all.

> and have no clue about how to do so.
> 
> A possible way to deal with this would be to have an ArrayAllocator
> support swap (even better would be C++11 move, but...) and swap the
> external allocator with the one in the BitMap, so that when the 
> BitMap is destroyed it will properly deal with the deallocation.
> 
> This is a pre-existing issue, with the current change introducing a
> new occurrence of the problem.

This is a known issue.

> ---------------------------------------------------------------------
> --------- 
> src/share/vm/gc/g1/g1ConcurrentMark.cpp   
> 2166   allocate_large_bitmap(&_region_bm, (BitMap::idx_t)(_g1h
> ->max_regions()));
> 
> Why cast the max_regions value? I think currently there is no problem
> (looks like uint => size_t). We might someday turn on things like
> -Wconversion, in which case this cast would simply mask a (potential)
> problem if a problem exists then.

Fixed.

> ---------------------------------------------------------------------
> ---------
> src/share/vm/gc/g1/g1ConcurrentMark.cpp  
> 2167   allocate_large_bitmap(&_card_bm, align_size_up_(_g1h
> ->reserved_region().byte_size(), CardTableModRefBS::card_size) /
> CardTableModRefBS::card_size);
> 
> Abuse of align_size_up_ macro, which is supposed to be for use in
> places where a constant expression is required, with align_size_up
> function being preferred.  Unfortunately, then one runs afoul of the
> not very useful signature for align_size_up.  We could spend a lot of
> time discussing shortcomings in that vicinity.  Maybe an RFE here?

You mean, in the vicinity of align_size_up? I agree. What kind of RFE
do you suggest here? Do you mind filing an RFE here because apparently
you have lots of issues in mind...

> ---------------------------------------------------------------------
> ---------
> src/share/vm/gc/g1/g1ConcurrentMark.cpp  
> 2169   _count_card_bitmaps = NEW_C_HEAP_ARRAY(BitMap, 
>  _max_worker_id, mtGC);
> 
> Extra space before _max_worker_id.

Fixed.

> ---------------------------------------------------------------------
> ---------
> src/share/vm/gc/g1/g1ConcurrentMark.cpp   
> 2176   _count_marked_bytes = Padded2DArray<size_t,
> mtGC>::create_unfreeable(_max_worker_id, _g1h->max_regions(), false);
> 
> The call to create_unfreeable appears to be incorrect.  The third
> argument is a size_t* to the allocation_size.  This call should be
> implicitly converting false to a null pointer, and leaving the new
> force_pretouch argument with its default true value.
> 
> This should have been caught by gcc -Wconversion-null, which is
> enabled by default and I don't see any disabling of it in our build
> system.  I'm puzzled as to how this builds?
> 
> What am I missing here?

You were looking at an old version of the change where this has not
been fixed. Sorry. The code in the local version looks as follows:

  size_t dummy;
  _count_marked_bytes = Padded2DArray<size_t,
mtGC>::create_unfreeable(_max_worker_id, _g1h->max_regions(), &dummy,
false);

This is the only difference to the version in the webrev.

> ---------------------------------------------------------------------
> ---------  
> src/share/vm/gc/g1/g1ConcurrentMark.cpp 
> 2193     os::pretouch_memory(count_marked, count_marked +
> (size_t)G1CollectedHeap::heap()->max_regions() * sizeof(size_t));
> 
> Another unnecessary cast of max_regions.

Fixed.

> ---------------------------------------------------------------------
> --------- 
> src/share/vm/memory/allocation.hpp
>  748   // Indicates whether the last (re-)allocation used the C-heap
> malloc or virtual space.
>  749   bool use_malloc() const { return _use_malloc; }
> 
> I think used_malloc would be a better name for a query; use_malloc
> seems like a request/command.  

Fixed.

> ---------------------------------------------------------------------
> --------- 
> src/share/vm/utilities/bitMap.cpp
>   66   os::pretouch_memory((char*)word_addr(0),
> (char*)word_addr(size()));
> 
> Regarding the casts, here and in new G1PretouchInternalBitmapsTask,
> why does pretouch_memory want char* rather than void*?  Maybe an RFE
> here?  There are a few existing calls that could be cleaned up too.

Potentially because at the time the call had been introduced, char*
fit. Filed JDK-8151414.

> ---------------------------------------------------------------------
> --------- 
> test/gc/g1/Test2GbHeap.java
>   28  * Skip test on 32 bit Windows: it typically does not support
> the many and large virtual memory reservations needed.
>   29  * @requires (vm.gc == "G1" | vm.gc == "null") &
> ((sun.arch.data.model != "32") | (os.family != "windows"))
> 
> jtreg supports multiple @requires options, implicitly and'ing them
> together (*).  Splitting the @requires here would make it more
> readable.
> 
> And could the second clause be written as the following?
> 
>   !((sun.arg.data.model = "32") & (os.family = "windows"))

Fixed.

Thanks again for your review. I will prepare a new webrev later.

Thanks,
  Thomas