RFR (M): 8159422: Very high Concurrent Mark mark stack contention

Wed Aug 3 15:37:03 UTC 2016

Hi Thomas,

Just had a look at your code. Wondering how your lock-free stack handles 
the classic ABA problem? It's not obvious to me.

In detail:

When you pop something you:

1) Load the head
2) Load the next pointer of that head
3) CAS the head, expecting it to still be the node loaded in 1), and if 
it matches, install the next pointer loaded in 2) as the new head, 
assuming that pointer is still the node's current next

This is where the bad concurrency stuff can happen. Between 2) and 3), 
another thread could win the race and pop the node first, then logically 
free it by putting it back on the freelist. Arbitrary pushes and pops can 
then happen on the original stack. Eventually the same node is grabbed 
from the freelist and pushed onto the lock-free stack again, but this 
time with a completely different next pointer than the one loaded in 2). 
The CAS in 3) then succeeds on the invalid assumption that, because the 
head is the same node as before, its next pointer is also the same.
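
To make the interleaving concrete, here is a minimal single-threaded 
replay of it. This is only a sketch in standalone C++ using std::atomic, 
not the code from the webrev: Node, head, push, pop and aba_demo are 
hypothetical names, and the two "threads" are simulated by running their 
steps in the problematic order by hand.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical node type; not the chunk type used in the webrev.
struct Node {
  int   value;
  Node* next;
};

std::atomic<Node*> head{nullptr};

void push(Node* n) {
  Node* old = head.load();
  do {
    n->next = old;                       // link in front of the current head
  } while (!head.compare_exchange_weak(old, n));
}

// Naive pop: steps 1) to 3) with no protection against ABA.
Node* pop() {
  Node* old = head.load();               // 1) load the head
  while (old != nullptr &&
         // 2) load old->next, 3) CAS the head from old to old->next
         !head.compare_exchange_weak(old, old->next)) {
  }
  return old;
}

// Replays the race deterministically and reports whether the stack broke.
bool aba_demo() {
  static Node c{3, nullptr}, b{2, nullptr}, a{1, nullptr};
  head.store(nullptr);
  push(&c); push(&b); push(&a);          // stack: a -> b -> c

  // "Thread 1" performs steps 1) and 2), then stalls.
  Node* seen_head = head.load();         // a
  Node* seen_next = seen_head->next;     // b

  // "Thread 2" runs in the gap: pops a and b, keeps b, recycles a.
  Node* x = pop();
  Node* y = pop();
  assert(x == &a && y == &b);
  push(x);                               // stack: a -> c

  // "Thread 1" resumes with step 3). The head is a again, so the CAS
  // succeeds although a's next pointer is now c, not b.
  bool cas_ok = head.compare_exchange_strong(seen_head, seen_next);

  // The head now points at b, which thread 2 still owns; c is lost.
  return cas_ok && head.load() == &b;
}
```

Calling aba_demo() returns true: the stale CAS is accepted, the head ends 
up pointing at a node that is no longer on the stack, and the node 
holding 3 becomes unreachable.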

Is that what those counters are there for, or am I missing something?

Perhaps versioned pointers, hazard pointers or epoch based safe memory 
reclamation would be good tools here.
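
As an illustration of the first option only, a versioned head could look 
roughly like the sketch below. It assumes the nodes live in a fixed array 
so that a 32-bit index plus a 32-bit version fit into one 64-bit CAS; 
that layout, and the names pool, vhead, pack, push, pop and 
versioned_demo, are assumptions made for the example and are not taken 
from the webrev.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr uint32_t NIL = UINT32_MAX;     // "no node" index

// Hypothetical node type stored in a fixed pool and linked by index.
struct Node {
  int                   value;
  std::atomic<uint32_t> next;            // may be read by a racing pop
};

Node pool[8];

// Low 32 bits: index of the top node. High 32 bits: version counter.
constexpr uint64_t pack(uint32_t idx, uint32_t ver) {
  return (uint64_t(ver) << 32) | idx;
}
uint32_t index_of(uint64_t h)   { return uint32_t(h); }
uint32_t version_of(uint64_t h) { return uint32_t(h >> 32); }

std::atomic<uint64_t> vhead{pack(NIL, 0)};

void push(uint32_t idx) {
  uint64_t old = vhead.load();
  do {
    pool[idx].next = index_of(old);
    // Every successful update bumps the version.
  } while (!vhead.compare_exchange_weak(old, pack(idx, version_of(old) + 1)));
}

uint32_t pop() {
  uint64_t old = vhead.load();
  while (index_of(old) != NIL) {
    uint32_t next = pool[index_of(old)].next;
    if (vhead.compare_exchange_weak(old, pack(next, version_of(old) + 1))) {
      return index_of(old);
    }
  }
  return NIL;
}

// Same interleaving as in the ABA scenario; returns true if it is caught.
bool versioned_demo() {
  vhead.store(pack(NIL, 0));
  push(2); push(1); push(0);             // stack: 0 -> 1 -> 2

  // "Thread 1" loads head and next, then stalls.
  uint64_t seen_head = vhead.load();
  uint32_t seen_next = pool[index_of(seen_head)].next;

  // "Thread 2" pops 0 and 1, keeps 1, pushes 0 back.
  uint32_t x = pop();
  uint32_t y = pop();
  assert(x == 0 && y == 1);
  push(x);                               // stack: 0 -> 2

  // "Thread 1" resumes: same index on top, but the version has moved on.
  bool cas_ok = vhead.compare_exchange_strong(
      seen_head, pack(seen_next, version_of(seen_head) + 1));

  return !cas_ok && index_of(vhead.load()) == 0 && pool[0].next == 2;
}
```

Here the stale CAS fails because the version differs even though the 
same index is on top again. Two caveats: a 32-bit version can in 
principle wrap around, and the scheme relies on node memory staying 
valid after a pop, since a racing pop may still read the next field of a 
node that has already been removed.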

Thanks,
/Erik

On 2016-08-02 11:24, Thomas Schatzl wrote:
> Hi everyone,
>
>    could someone take a look at this change?
>
> Its FC extension request has already been approved too...
>
> Thanks,
>    Thomas
>
> On Tue, 2016-07-19 at 17:38 +0200, Thomas Schatzl wrote:
>> Hi all,
>>
>>    can I have reviews for this change that removes the global (heavy-
>> weight) lock when accessing the global mark stack?
>>
>>
>> The change converts the lock and high-water mark based management of
>> the global mark stack into a lock-free high-water mark and free list
>> based one.
>>
>> In the previous review for JDK-8160897 I already mentioned that the
>> global lock when pushing/popping elements from the global mark stack
>> is very problematic particularly when there are many marking threads
>> in the system.
>>
>> Overall, particularly at the end of marking (both in the concurrent
>> phases as well as during remark) this behavior represents a
>> significant bottleneck.
>>
>> Particularly if there is a lot of traffic from and to the mark stack
>> (to be addressed by JDK-8057003), this results in marking not
>> completing quickly enough.
>>
>> There is a customer application on a 1 TB heap (up to 80% full at
>> times) where this results in lunch-break-length full gc pauses when
>> concurrent marking does not complete in time.
>>
>> Overall, together with JDK-8057003, this change reduces marking times
>> from >500 seconds to a manageable 10-30 seconds :) (at 100 concurrent
>> marking threads; more could be used). Microbenchmarks like the one
>> from JDK-8057003 then also scale basically linearly with the number
>> of threads.
>>
>> This change will also help improve the time to safepoint
>> significantly: if there is a safepoint request while draining the
>> mark stacks, marking will now yield much earlier.
>>
>> There is one drawback: internal management reduces the usable mark
>> stack by around 0.1 percent. Since the follow-up, JDK-8057003, reduces
>> mark stack usage by quite a bit, this has been considered acceptable.
>>
>> This is an enhancement, which is waiting for final approval.
>>
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8159422
>> Webrev:
>> http://cr.openjdk.java.net/~tschatzl/8159422/webrev/
>> Testing:
>> jprt, nightly run, several vm.gc runs, internal benchmarks
>>
>> Thanks,
>>    Thomas
>>