RFR: 8079315: UseCondCardMark broken in conjunction with CMS precleaning
Aleksey Shipilev
aleksey.shipilev at oracle.com
Tue May 12 13:05:27 UTC 2015
On 11.05.2015 16:41, Andrew Haley wrote:
> On 05/11/2015 12:33 PM, Erik Österlund wrote:
>> Hi Andrew,
>>
>>> On 11 May 2015, at 11:58, Andrew Haley <aph at redhat.com> wrote:
>>>
>>> On 05/11/2015 11:40 AM, Erik Österlund wrote:
>>>
>>>> I have heard statements like this that such mechanism would not work
>>>> on RMO, but never got an explanation why it would work only on
>>>> TSO. Could you please elaborate? I studied some kernel sources for
>>>> a bunch of architectures and kernels, and it seems as far as I can
>>>> see all good for RMO too.
>>>
>>> Dave Dice himself told me that the algorithm is not in general safe
>>> for non-TSO. Perhaps, though, it is safe in this particular case. Of
>>> course, I may be misunderstanding him. I'm not sure of his reasoning
>>> but perhaps we should include him in this discussion.
>>
>> I see. It would be interesting to hear his reasoning, because it is
>> not clear to me.
>>
>>> From my point of view, I can't see a strong argument for doing this on
>>> AArch64. StoreLoad barriers are not fantastically expensive there so
>>> it may not be worth going to such extremes. The cost of a StoreLoad
>>> barrier doesn't seem to be so much more than the StoreStore that we
>>> have to have anyway.
>>
>> Yeah, about performance, I'm not sure when it's worth removing these
>> fences, and on what hardware.
>
> Your algorithm (as I understand it) trades a moderately expensive (but
> purely local) operation for a very expensive global operation, albeit
> with much lower frequency. It's not clear to me how much we value
> continuous operation versus faster operation with occasional global
> stalls. I suppose it must be application-dependent.
Okay, Dice's asymmetric trick is nice. In fact, that is arguably what
Parallel is using already: it serializes the mutator stores by stopping
the mutator at a safepoint. Using mprotect and TLB tricks as the
serialization action is cute and dandy.
However, I have doubts that employing a system-wide synchronization
mechanism for a concurrent collector is a good thing, when we cannot
predict and control its long-term performance. For example, we are
basically at the mercy of the underlying OS's mprotect performance.
There are industrial GCs that rely on OS performance (*cough*
*cough*); you can see what those require to guarantee performance.
Also, given that the problem is specific to CMS, which arguably goes
away in favor of G1, I would think introducing special-case-for-CMS
barriers in the mutator code is a sane interim solution.
Especially if we can backport the G1-like barrier "filtering" to the
CMS case? If I read this thread right, Erik and Thomas concluded there
is no clear benefit to introducing the mprotect-like mechanics with G1,
which probably means the overheads are bearable with appropriate
mutator-side changes.
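If that interim route were taken, the mutator-side barrier would look
roughly like this minimal C sketch: a conditional card mark (the
UseCondCardMark idea) with an explicit StoreLoad fence between the
reference store and the card-table load. The card size, table size,
and card values here are illustrative constants, not HotSpot's actual
ones, and the fence placement is a sketch of the fix being discussed,
not the committed code.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

enum { CARD_DIRTY = 0, CARD_CLEAN = 1 }; /* illustrative values */
#define CARD_SHIFT 9                     /* 512-byte cards */
#define CARD_COUNT 1024                  /* toy table size */

static _Atomic uint8_t card_table[CARD_COUNT];

/* Conditional card-mark barrier, run after a reference store into
 * the field at field_addr. The StoreLoad fence orders that store
 * before the card load below, so a concurrent precleaner cannot
 * clean the card between the mutator observing it as already dirty
 * and the reference store becoming visible. */
void post_write_barrier(uintptr_t field_addr) {
    atomic_thread_fence(memory_order_seq_cst); /* StoreLoad */
    size_t idx = (field_addr >> CARD_SHIFT) % CARD_COUNT;
    if (atomic_load_explicit(&card_table[idx],
                             memory_order_relaxed) != CARD_DIRTY)
        atomic_store_explicit(&card_table[idx], CARD_DIRTY,
                              memory_order_relaxed);
}
```

The filtering check is the conditional load: most stores hit an
already-dirty card and skip the write, at the price of the fence on
every barrier execution.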
Thanks,
-Aleksey