RFR (M): 8159422: Very high Concurrent Mark mark stack contention

Wed Aug 3 16:05:23 UTC 2016

Hi Thomas,

On 2016-08-02 11:24, Thomas Schatzl wrote:
> Hi everyone,
>
>   could someone take a look at this change?
>
> Its FC extension request has already been approved too...
>
> Thanks,
>   Thomas
>
> On Tue, 2016-07-19 at 17:38 +0200, Thomas Schatzl wrote:
>> Hi all,
>>
>>   can I have reviews for this change that removes the global (heavy-
>> weight) lock when accessing the global mark stack?
>>
>>
>> The change converts the lock and high-water mark based management of
>> the global mark stack into a lock-free high-water mark and free list
>> based one.
>>
>> In the previous review for JDK-8160897 I already mentioned that the
>> global lock when pushing/popping elements from the global mark stack
>> is very problematic particularly when there are many marking threads
>> in the system.
>>
>> Overall, particularly at the end of marking (both in the concurrent
>> phases as well as during remark) this behavior represents a
>> significant bottleneck.
>>
>> Particularly if there is a lot of traffic from and to the mark stack
>> (to be addressed by JDK-8057003), this results in marking not
>> completing quickly enough.
>>
>> There is a customer application on a 1 TB heap (up to 80% full at
>> times) where this results in lunch-break-length full GC pauses when
>> concurrent marking does not complete in time.
>>
>> Overall, together with JDK-8057003, this change reduces marking times
>> from >500 seconds to a manageable 10-30s :) (at 100 concurrent marking
>> threads; more could be used). Microbenchmarks like the one from
>> JDK-8057003 then also scale basically linearly with the number of
>> threads.
>>
>> This change will also significantly improve the time to safepoint:
>> if there is a safepoint request while draining the mark stacks,
>> marking will now yield much earlier.
>>
>> There is one drawback: internal management reduces the usable mark
>> stack by around 0.1 percent. Since the follow-up, JDK-8057003, reduces
>> mark stack usage by quite a bit, this has been considered acceptable.
>>
>> This is an enhancement, which is waiting for final approval.
>>
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8159422
>> Webrev:
>> http://cr.openjdk.java.net/~tschatzl/8159422/webrev/

I'm not too fond of having G1CMMarkStack::_base be a void** and
performing all addressing arithmetic on it "by hand", as opposed to just
using an OopChunk* with integer indices for hwm and capacity.
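
To illustrate the index-based layout, here is a minimal standalone sketch.
It is not the HotSpot code: the OopChunk layout, the EntriesPerChunk value,
and the TypedMarkStackSketch class and its chunk_at() helper are all
hypothetical names made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for G1CMMarkStack::OopChunk; the field layout and
// the EntriesPerChunk value are assumptions for illustration only.
struct OopChunk {
  static const size_t EntriesPerChunk = 1023;
  OopChunk* next;
  void* data[EntriesPerChunk];
};

// Hypothetical class: the backing store is kept as a typed OopChunk*, so
// ordinary array indexing replaces manual void** pointer arithmetic.
class TypedMarkStackSketch {
  OopChunk* _base;      // start of the chunk array
  size_t    _capacity;  // number of chunks in the array

 public:
  TypedMarkStackSketch(OopChunk* base, size_t capacity)
    : _base(base), _capacity(capacity) {
    assert(base != nullptr && capacity > 0);
  }

  // Returns the chunk at the given index, or nullptr if out of range.
  // The compiler scales the index by sizeof(OopChunk) for us.
  OopChunk* chunk_at(size_t index) const {
    return index < _capacity ? &_base[index] : nullptr;
  }

  size_t capacity() const { return _capacity; }
};
```

With a typed base pointer, hwm and capacity become plain chunk counts and
no byte-offset computation appears in the class.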

The increment of _hwm could be just an unconditional atomic increment.
Since allocate_new_chunk is the only user of _hwm, all values of _hwm
larger than _capacity could simply be ignored.
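
Roughly, the unconditional increment could look like the standalone sketch
below. It uses std::atomic as a stand-in for HotSpot's own Atomic
primitives, and the Chunk and ChunkAllocatorSketch types are hypothetical;
the early capacity check is my own addition, not part of the webrev.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical chunk type; the payload size is arbitrary.
struct Chunk {
  void* data[8];
};

// Hypothetical allocator: hands out chunks from a fixed array by bumping
// an index-based high-water mark.
class ChunkAllocatorSketch {
  Chunk*              _base;      // backing array of chunks
  size_t              _capacity;  // number of chunks in the array
  std::atomic<size_t> _hwm;       // index of the next never-used chunk

 public:
  ChunkAllocatorSketch(Chunk* base, size_t capacity)
    : _base(base), _capacity(capacity), _hwm(0) {
    assert(base != nullptr && capacity > 0);
  }

  Chunk* allocate_new_chunk() {
    // Early exit once full. This is only an optimization that also bounds
    // how far _hwm can overshoot _capacity (at most once per racing thread).
    if (_hwm.load(std::memory_order_relaxed) >= _capacity) {
      return nullptr;
    }
    // Unconditional increment: no compare-and-swap retry loop. Each caller
    // gets a unique index; indices at or past _capacity are ignored.
    size_t cur = _hwm.fetch_add(1, std::memory_order_relaxed);
    if (cur >= _capacity) {
      return nullptr;
    }
    return &_base[cur];
  }
};
```

Because nothing else reads _hwm, an overshoot past _capacity is harmless as
long as the mark is reset together with the rest of the stack.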

Perhaps MmapArrayAllocator<G1CMMarkStack::OopChunk, mtGC>::allocate 
could be used to allocate the marking stack chunks? You could have a 
scoped typedef for it in G1CMMarkStack to make it slightly less verbose.

G1ConcurrentMark::note_{end,start}_of_gc are now empty, perhaps they 
should be removed?

Would you mind making the iterate function NOT_PRODUCT_RETURN similar to 
its caller to make it more clear that it's just used for verification 
purposes?

/Mikael

>> Testing:
>> jprt, nightly run, several vm.gc runs, internal benchmarks
>>
>> Thanks,
>>   Thomas
>>