RFR (M): 8159422: Very high Concurrent Mark mark stack contention

Wed Sep 7 11:34:31 UTC 2016

Hi Thomas,

My concerns regarding the ABA conflict is gone now. I think it looks good!

/Erik

On 2016-09-05 12:20, Thomas Schatzl wrote:
> Hi Erik, Kim,
>
>    sorry for the late reply...
>
> On Thu, 2016-08-04 at 16:41 -0400, Kim Barrett wrote:
>>> On Aug 4, 2016, at 5:43 AM, Thomas Schatzl <thomas.schatzl at oracle.c
>>> om> wrote:
>>>> I was also wondering about the non-atomic unit of
>>>>    { modify the chunk list, modify the chunk list count }
>>>> questioning whether users of the chunk list count are all
>>>> prepared for the racy imprecision of it.  The answer to that
>>>> *might* be "yes", but it's hard to convincingly review, let alone
>>>> prove.
>>> This is a pre-existing (imo benign) problem: while the _index is
>>> updated while holding the lock, it is read without any
>>> synchronization either.
>>>
>>> The problem seems to be drain_global_stack() in the case when we
>>> want to drain the global mark stack completely.
>>> I think this problem can be easily avoided by, in this case, just
>>> try to pop work from the global mark stack until the list itself is
>>> empty. I will fix that as suggested.
>>>
>>> (In the case of partial draining it does not really matter whether
>>> we drained to an exact number of elements).
>> Yes to all of that.
>>
>>>> I'm not convinced lock-free is even the way to go here.  The
>>>> problem
>>>> with the old code was the lock granularity was much too large,
>>>> leading to bad contention.  The new chunk approach, with locking
>>>> for
>>>> chunk allocation and free, would reduce the lock granularity to
>>>> something much more sensible, so should still dramatically reduce
>>>> contention.
>>>> Remaining contention can be further ameliorated by increasing the
>>>> chunk size, to give threads more work to do between
>>>> locking.  Trying
>>>> to make this lock-free adds a lot more complexity, for what seems
>>>> likely to me to be little if any gain.
>>> I disagree here: the "there is no problem here" approach is the one
>>> that has lead me in the last few months to find out that almost
>>> every single (semi-)global lock in g1 code is a point of
>>> significant contention (remembered set locks (tons of CRs now),
>>> dcqs enqueue lock (JDK-8162929), region allocation during gc lock
>>> (don't remember) and others) on large enough machines/problems.
>>>
>>> (Note that we are in some cases talking about only 20g heap with
>>> ~30 threads here in some cases).
>>>
>>> So, just make the problem large enough, and you will see actual
>>> wait times get significantly show up in performance tools (and
>>> these times typically do not account for the busy wait time due to
>>> inlining).
>>>
>>> In this case, just consider multiple threads basically copying
>>> large objArrays on the global mark stack... (that will be fixed
>>> soon too, JDK-8057003).
>>>
>>> So I would like to try to fix the ABA problem here, and only if it
>>> does not work out in reasonable time or is otherwise functionally
>>> unacceptable, revert back to the lock.
>> OK.
>>
>    due to time constraints I moved to the suggested use of a global lock
> to fix the ABA problem.
>
> I did some re-runs on the applications this change is targeted for, and
> did not see a noticable regression in conjunction with JDK-8057003.
>
> Please have a look at http://cr.openjdk.java.net/~tschatzl/8159422/webr
> ev.1/ (full) and http://cr.openjdk.java.net/~tschatzl/8159422/webrev.1/
>   (diff) which also address Mikael's concerns.
>
> Thanks,
>    Thomas
>