RFR (M): 8159422: Very high Concurrent Mark mark stack contention

Mon Sep 5 10:17:52 UTC 2016

Hi Mikael,

On Wed, 2016-08-03 at 18:05 +0200, Mikael Gerdin wrote:
> Hi Thomas,
> 
> On 2016-08-02 11:24, Thomas Schatzl wrote:
> > 
> > Hi everyone,
> > 
> >   could someone take a look at this change?
> > 
> > Its FC extension request has already been approved too...
> > 
> > Thanks,
> >   Thomas
> > 
> > On Tue, 2016-07-19 at 17:38 +0200, Thomas Schatzl wrote:
> > > 
> > > Hi all,
> > > 
> > >   can I have reviews for this change that removes the global
> > > (heavy-weight) lock when accessing the global mark stack?
> > > 
> > > 
> > > The change converts the lock and high-water mark based management
> > > of the global mark stack into a lock-free high-water mark and free
> > > list based one.
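To make "lock-free high-water mark and free list based" management concrete, here is a minimal C++ sketch. It is not the code in the webrev. ChunkedMarkStack, par_push_chunk, par_pop_chunk, the 1023-entry chunk size and the tagged-index encoding of the list heads are all assumptions made for illustration. Fresh chunks are carved out of a preallocated array by advancing a high-water mark with a CAS. Full chunks and released chunks sit on two lock-free (Treiber) stacks, so no thread ever holds a global lock while pushing or popping.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical sketch (not the webrev's code): a global mark stack made of
// fixed-size chunks. List heads pack a 32-bit chunk index with a 32-bit
// version tag, so a concurrent pop/re-push of the same chunk (ABA) makes a
// competing CAS fail instead of corrupting the list.
class ChunkedMarkStack {
 public:
  static constexpr size_t kEntriesPerChunk = 1023;  // assumed chunk layout
  static constexpr uint32_t kNil = UINT32_MAX;      // "no chunk"

  explicit ChunkedMarkStack(uint32_t capacity)
      : _chunks(new Chunk[capacity]), _capacity(capacity), _hwm(0),
        _chunk_list(pack(kNil, 0)), _free_list(pack(kNil, 0)) {}

  // Copies one full chunk of entries onto the stack; false means overflow.
  bool par_push_chunk(void* const* entries) {
    uint32_t idx = pop_list(_free_list);  // prefer recycling a released chunk
    if (idx == kNil) idx = allocate_new_chunk();
    if (idx == kNil) return false;
    std::copy(entries, entries + kEntriesPerChunk, _chunks[idx].data);
    push_list(_chunk_list, idx);  // release CAS publishes the copied entries
    return true;
  }

  // Copies one chunk of entries off the stack; false means the stack is empty.
  bool par_pop_chunk(void** out) {
    uint32_t idx = pop_list(_chunk_list);
    if (idx == kNil) return false;
    std::copy(_chunks[idx].data, _chunks[idx].data + kEntriesPerChunk, out);
    push_list(_free_list, idx);  // chunk may now be reused by any thread
    return true;
  }

 private:
  struct Chunk {
    std::atomic<uint32_t> next;    // index of the next chunk in its list
    void* data[kEntriesPerChunk];  // stand-ins for oops
  };

  static uint64_t pack(uint32_t idx, uint32_t tag) {
    return (static_cast<uint64_t>(tag) << 32) | idx;
  }
  static uint32_t index_of(uint64_t head) { return static_cast<uint32_t>(head); }
  static uint32_t tag_of(uint64_t head) { return static_cast<uint32_t>(head >> 32); }

  // Lock-free bump of the high-water mark; never moves past _capacity.
  uint32_t allocate_new_chunk() {
    uint32_t cur = _hwm.load(std::memory_order_relaxed);
    while (cur < _capacity) {
      if (_hwm.compare_exchange_weak(cur, cur + 1, std::memory_order_relaxed)) {
        return cur;
      }
    }
    return kNil;
  }

  void push_list(std::atomic<uint64_t>& head, uint32_t idx) {
    uint64_t old = head.load(std::memory_order_relaxed);
    do {
      _chunks[idx].next.store(index_of(old), std::memory_order_relaxed);
    } while (!head.compare_exchange_weak(old, pack(idx, tag_of(old) + 1),
                                         std::memory_order_release,
                                         std::memory_order_relaxed));
  }

  uint32_t pop_list(std::atomic<uint64_t>& head) {
    uint64_t old = head.load(std::memory_order_acquire);
    for (;;) {
      uint32_t idx = index_of(old);
      if (idx == kNil) return kNil;
      uint32_t next = _chunks[idx].next.load(std::memory_order_relaxed);
      if (head.compare_exchange_weak(old, pack(next, tag_of(old) + 1),
                                     std::memory_order_acquire,
                                     std::memory_order_acquire)) {
        return idx;
      }
    }
  }

  std::unique_ptr<Chunk[]> _chunks;
  const uint32_t _capacity;
  std::atomic<uint32_t> _hwm;
  std::atomic<uint64_t> _chunk_list;  // top of the list of full chunks
  std::atomic<uint64_t> _free_list;   // top of the list of released chunks
};
```

Version tags are one standard way to defuse ABA in a Treiber stack. A 32-bit tag can in principle wrap around, which this sketch ignores.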
> > > 
> > > In the previous review for JDK-8160897 I already mentioned that
> > > the global lock taken when pushing/popping elements from the global
> > > mark stack is very problematic, particularly when there are many
> > > marking threads in the system.
> > > 
> > > Overall, particularly at the end of marking (both in the
> > > concurrent phases and during remark), this behavior
> > > represents a significant bottleneck.
> > > 
> > > Particularly if there is a lot of traffic to and from the mark
> > > stack (to be addressed by JDK-8057003), this results in marking
> > > not completing quickly enough.
> > > 
> > > There is a customer application on a 1 TB heap (up to 80% full
> > > at times) where this results in full GC pauses the length of a
> > > lunch break when concurrent marking does not complete in time.
> > > 
> > > Overall, together with JDK-8057003, this change reduces marking
> > > times from >500 seconds to a manageable 10-30 seconds (at 100
> > > concurrent marking threads; more could be used). :) Microbenchmarks
> > > like the one from JDK-8057003 then also scale basically linearly
> > > with the number of threads.
> > > 
> > > This change will also significantly improve the time to
> > > safepoint, because if there is a safepoint request while the mark
> > > stacks are being drained, marking will now yield much earlier.
> > > 
> > > There is one drawback: internal management reduces the usable
> > > mark stack size by around 0.1 percent. Since the follow-up,
> > > JDK-8057003, reduces mark stack usage by quite a bit, this has
> > > been considered acceptable.
> > > This is an enhancement that is waiting for final approval.
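For a sense of scale, assume each chunk is 1024 words, with one word spent on the link to the next chunk. This layout is not stated in the mail. Under that assumption, the lost capacity works out to the following, which is consistent with the figure above:

$$
\text{overhead} = \frac{1}{1024} \approx 0.098\,\% \approx 0.1\,\%
$$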
> > > 
> > > CR:
> > > https://bugs.openjdk.java.net/browse/JDK-8159422
> > > Webrev:
> > > http://cr.openjdk.java.net/~tschatzl/8159422/webrev/
> I'm not too fond of having G1CMMarkStack::_base be a void** and
> performing all address arithmetic on it "by hand" vs. just using an
> OopChunk* and integer indices for hwm and capacity.
> 
> The increment of _hwm could just be an unconditional atomic increment:
> since allocate_new_chunk is its only user, all values of _hwm larger
> than _capacity could simply be ignored.
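The two suggestions above (typed chunk indexing and an unconditional increment of the high-water mark) can be sketched together in C++. MarkStackChunks and the OopChunk layout here are simplified, hypothetical stand-ins, not the actual G1CMMarkStack code.

```cpp
#include <atomic>
#include <cstddef>

// Simplified stand-in for a mark stack chunk (assumed layout).
struct OopChunk {
  OopChunk* next;
  void* data[1023];
};

// Hypothetical sketch: a typed chunk array indexed by integers, with the
// high-water mark advanced by a plain fetch_add instead of a CAS loop.
class MarkStackChunks {
 public:
  MarkStackChunks(OopChunk* base, size_t capacity)
      : _base(base), _capacity(capacity), _hwm(0) {}

  // Returns a never-used chunk, or nullptr once the backing array is exhausted.
  OopChunk* allocate_new_chunk() {
    // Every caller gets a distinct ticket. Tickets >= _capacity are simply
    // discarded, so _hwm may grow past _capacity; that is harmless because
    // only this method reads _hwm.
    size_t cur = _hwm.fetch_add(1, std::memory_order_relaxed);
    if (cur >= _capacity) {
      return nullptr;
    }
    return &_base[cur];  // typed indexing, no manual void** arithmetic
  }

 private:
  OopChunk* const _base;
  const size_t _capacity;
  std::atomic<size_t> _hwm;
};
```

Compared with a compare-and-swap loop, fetch_add never retries under contention. The price is that _hwm no longer equals the number of chunks handed out once the array is exhausted.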
> 
> Perhaps MmapArrayAllocator<G1CMMarkStack::OopChunk, mtGC>::allocate 
> could be used to allocate the marking stack chunks? You could have a 
> scoped typedef for it in G1CMMarkStack to make it slightly less
> verbose.

I created a new method, allocate_or_null(), that does not exit the VM
(well, almost) if allocation fails.
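Here is a POSIX-only sketch of what an allocate_or_null()-style helper could look like, combined with the scoped-typedef idea from the review. MmapArrayAllocatorSketch, G1CMMarkStackSketch and their signatures are hypothetical. They are not HotSpot's MmapArrayAllocator API. The sketch simply returns nullptr on any mmap failure and leaves the decision to the caller.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: maps zero-filled anonymous memory for `count`
// elements of T and returns nullptr on failure instead of exiting.
template <typename T>
class MmapArrayAllocatorSketch {
 public:
  static T* allocate_or_null(size_t count) {
    if (count == 0 || count > SIZE_MAX / sizeof(T)) {
      return nullptr;  // nothing to map, or the byte size would overflow
    }
    void* p = mmap(nullptr, count * sizeof(T), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
      return nullptr;  // the caller decides whether this is fatal
    }
    return static_cast<T*>(p);
  }

  static void free(T* addr, size_t count) {
    munmap(addr, count * sizeof(T));
  }
};

struct OopChunk {
  OopChunk* next;
  void* data[1023];  // assumed layout: 1024 words including the link
};

class G1CMMarkStackSketch {
  // Scoped typedef keeps the allocator spelling short inside the class.
  typedef MmapArrayAllocatorSketch<OopChunk> ChunkAllocator;

 public:
  G1CMMarkStackSketch() = default;
  G1CMMarkStackSketch(const G1CMMarkStackSketch&) = delete;
  G1CMMarkStackSketch& operator=(const G1CMMarkStackSketch&) = delete;
  ~G1CMMarkStackSketch() {
    if (_base != nullptr) ChunkAllocator::free(_base, _capacity);
  }

  // Call once. Returns false instead of terminating when the mapping fails,
  // so the caller can report the problem or try a smaller size.
  bool initialize(size_t num_chunks) {
    _base = ChunkAllocator::allocate_or_null(num_chunks);
    if (_base == nullptr) return false;
    _capacity = num_chunks;
    return true;
  }

  OopChunk* base() const { return _base; }

 private:
  OopChunk* _base = nullptr;
  size_t _capacity = 0;
};
```

Anonymous mappings are zero-filled, which is adequate for a trivially constructible chunk type like the one here.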

> 
> G1ConcurrentMark::note_{end,start}_of_gc are now empty; perhaps they
> should be removed?
> 
> Would you mind making the iterate function NOT_PRODUCT_RETURN, similar
> to its caller, to make it clearer that it is only used for
> verification purposes?
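The suggestion relies on the HotSpot idiom of appending a macro to a member declaration so that product builds get an empty inline body. The sketch below recreates that idiom with a made-up macro, SKETCH_NOT_PRODUCT_RETURN, and a toy MarkStackSketch class. It does not reproduce HotSpot's actual macro definitions.

```cpp
#include <cstddef>

// Made-up stand-in for a "verification only" declaration macro: in product
// builds the member gets an empty inline body; otherwise it is a plain
// declaration whose definition is compiled only into non-product builds.
#ifdef PRODUCT
#define SKETCH_NOT_PRODUCT_RETURN {}
#else
#define SKETCH_NOT_PRODUCT_RETURN /* next token must be ; */
#endif

class MarkStackSketch {
 public:
  typedef void (*EntryFn)(int entry, void* ctx);

  // Toy push: silently drops entries once the fixed buffer is full.
  void push(int v) {
    if (_size < kCapacity) _data[_size++] = v;
  }

  // Verification helper that visits every entry; a no-op in product builds.
  void iterate(EntryFn fn, void* ctx) const SKETCH_NOT_PRODUCT_RETURN;

 private:
  static constexpr size_t kCapacity = 16;
  int _data[kCapacity] = {};
  size_t _size = 0;
};

#ifndef PRODUCT
void MarkStackSketch::iterate(EntryFn fn, void* ctx) const {
  for (size_t i = 0; i < _size; i++) {
    fn(_data[i], ctx);
  }
}
#endif
```

Because the product variant has an empty body, callers need no #ifdefs of their own. The declaration itself documents that the function exists only for verification.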

I think I addressed all your concerns in
http://cr.openjdk.java.net/~tschatzl/8159422/webrev.1/ (full)
http://cr.openjdk.java.net/~tschatzl/8159422/webrev.0_to_1/ (diff)

Thanks,
  Thomas