RFR: 8079315: UseCondCardMark broken in conjunction with CMS precleaning

Mon May 11 13:41:22 UTC 2015

On 05/11/2015 12:33 PM, Erik Österlund wrote:
> Hi Andrew,
> 
>> On 11 May 2015, at 11:58, Andrew Haley <aph at redhat.com> wrote:
>>
>> On 05/11/2015 11:40 AM, Erik Österlund wrote:
>>
>>> I have heard statements like this, that such a mechanism would not work
>>> on RMO, but never got an explanation of why it would work only on
>>> TSO. Could you please elaborate?  I studied some kernel sources for
>>> a bunch of architectures and kernels, and as far as I can see it all
>>> seems good for RMO too.
>>
>> Dave Dice himself told me that the algorithm is not in general safe
>> for non-TSO.  Perhaps, though, it is safe in this particular case.  Of
>> course, I may be misunderstanding him.  I'm not sure of his reasoning,
>> but perhaps we should include him in this discussion.
> 
> I see. It would be interesting to hear his reasoning, because it is
> not clear to me.
> 
>> From my point of view, I can't see a strong argument for doing this on
>> AArch64.  StoreLoad barriers are not fantastically expensive there, so
>> it may not be worth going to such extremes.  The cost of a StoreLoad
>> barrier doesn't seem to be so much more than the StoreStore that we
>> have to have anyway.
> 
> Yeah, about performance, I’m not sure when it’s worth removing these
> fences and on what hardware.

Your algorithm (as I understand it) trades a moderately expensive (but
purely local) operation for a very expensive global operation, albeit
with much lower frequency.  It's not clear to me how much we value
continuous operation versus faster operation with occasional global
stalls.  I suppose it must be application-dependent.
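For concreteness, here is a minimal C++11 sketch of the fenced conditional card mark under discussion, i.e. the "moderately expensive but purely local" operation. All names (`store_ref`, `preclean_card`, the card values and the 512-byte card size) are hypothetical and are not HotSpot's code; `std::atomic_thread_fence(std::memory_order_seq_cst)` stands in for the StoreLoad barrier, which on AArch64 compilers typically emit as `dmb ish`.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical layout for illustration only (not HotSpot's encoding):
// a tiny "heap" of reference slots and one card byte per 512 bytes.
constexpr std::uint8_t kClean = 0;
constexpr std::uint8_t kDirty = 1;
constexpr std::size_t kCardShift = 9;  // assumed 512-byte cards
constexpr std::size_t kSlots = 4096;
constexpr std::size_t kCards = (kSlots * sizeof(void*)) >> kCardShift;

std::atomic<void*> slots[kSlots];          // static storage: starts null
std::atomic<std::uint8_t> cards[kCards];   // static storage: starts kClean

inline std::size_t card_for(std::size_t slot) {
  return (slot * sizeof(void*)) >> kCardShift;
}

// Mutator post-write barrier with a conditional card mark.
void store_ref(std::size_t slot, void* value) {
  slots[slot].store(value, std::memory_order_relaxed);
  // StoreLoad: without this fence the card load below may be satisfied
  // before the reference store is visible, so the mutator can see a stale
  // "dirty" byte and skip the mark while the precleaner has already cleaned
  // the card and scanned the old value.
  std::atomic_thread_fence(std::memory_order_seq_cst);
  std::atomic<std::uint8_t>& card = cards[card_for(slot)];
  if (card.load(std::memory_order_relaxed) != kDirty) {
    card.store(kDirty, std::memory_order_relaxed);
  }
}

// Precleaning of one card: clean first, fence, then scan. The two
// seq_cst fences form a Dekker-style pair: either the mutator sees the
// clean card and re-dirties it, or this scan sees the new reference.
template <typename Visit>
void preclean_card(std::size_t c, Visit visit) {
  if (cards[c].load(std::memory_order_relaxed) != kDirty) return;
  cards[c].store(kClean, std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_seq_cst);
  const std::size_t first = (c << kCardShift) / sizeof(void*);
  const std::size_t last = ((c + 1) << kCardShift) / sizeof(void*);
  for (std::size_t s = first; s < last && s < kSlots; ++s) {
    visit(slots[s].load(std::memory_order_relaxed));
  }
}
```

Erik's approach would drop the mutator-side fence and make the precleaner pay instead, with a process-wide serialization step before it scans; that global step is the expensive but infrequent operation referred to above and is not modelled here.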

> In this case though, if it makes us any happier, I think we could
> probably get rid of the storestore barrier too:
> 
> The latent reference store is forced to serialize anyway after the
> dirty card value write is observable and about to be cleaned. So the
> potential consistency violation, where the card looks dirty and then
> the cleaning thread reads a stale reference value, could not happen with
> my approach even without storestore hardware protection. I haven’t
> given it too much thought, but off the top of my head I can’t see any
> problems. If we want to get rid of storestore too, I can give it some
> more thought.

That is very interesting.

> But you know much better than me if these fences are problematic or
> not. :)

Not really.  AArch64 is an architecture, not an implementation, and is
designed to be implemented using a wide range of techniques. Instead
of having very complex cores, some designers seem to have decided it
makes sense to have many of them on a die.  It may well be, though,
that some implementers will adopt an x86-like highly superscalar
architecture with a great deal of speculative execution.  I can only
predict the past...  My approach with this project has been to do
things in the most straightforward way rather than trying to optimize
for whatever implementations I happen to have available.

Andrew.