RFR: Remove prefetch during mark

Thu Mar 1 11:17:55 UTC 2018

On 03/01/2018 12:12 PM, Per Liden wrote:
> Hi,
> 
> On 03/01/2018 01:52 AM, Steve Blackburn wrote:
>> Hi all,
>>
>> I just stumbled upon this thread, and thought I ought to chime in.
>>
>> You may find our prefetch paper from 10 years ago useful. Or not! :-)
>>
>> http://users.cecs.anu.edu.au/~steveb/downloads/pdf/pf-ismm-2007.pdf
> 
> Thanks for the pointer. Link above doesn't seem to work for me, but I 
> found the paper through ACM.
> 
>>
>> The short version is that there were a number of efforts to get 
>> prefetching working well in the past, but none were effective.  We did 
>> a pretty detailed study and managed to get some very nice results, 
>> with two important changes:
>>
>>    *   FIFO front end to mark queue (without the FIFO the prefetch 
>> distance is unpredictable)
>>    *   Enqueue edges rather than nodes
>>
>> Obviously, the situation is different here (concurrent, big change in 
>> uarch, etc), but still there are some core ideas that you probably 
>> ought to know.
>>
>> The impatient may want to jump to sections 7.2 and 7.3. Note the
>> last paragraph of 7.3: just adding the FIFO, with no software prefetch,
>> may bring a win on some architectures.
> 
> We do enqueue edges in ZGC (to enable "striped marking"), so we're
> fairly well positioned for prefetching to work, one would think. I
> recently did some quick tests with a FIFO in front of the mark stack 
> (which would match "EdgeSide" in the paper) with varying prefetch 
> distance, but wasn't able to observe any real improvements. More 
> measurements and analysis would be needed to understand why.

Here's the FIFO prefetch patch I did, in case anyone is interested in 
doing more work/analysis in this area:

http://cr.openjdk.java.net/~pliden/zgc/mark_prefetch/webrev.0/
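For anyone who wants the general shape of the idea without opening the webrev, here is a rough C++ sketch of a small FIFO sitting between the mark stack and the marking loop. It is not the code in the webrev. Edge, PrefetchingMarkQueue and Depth are hypothetical names. It also uses the GCC/Clang __builtin_prefetch intrinsic rather than HotSpot's Prefetch class. Each edge's target is prefetched when the edge moves from the stack into the FIFO, so the prefetch distance is fixed at Depth pops. That fixed distance is the property the paper attributes to the FIFO.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical edge representation: the address of a reference field.
// Loading *edge yields the referenced object, which is what we want to be
// in cache by the time the edge is processed.
using Edge = void* const*;

// Sketch only: a LIFO mark stack with a fixed-size FIFO in front of it.
// Edges are prefetched on entry to the FIFO and handed to the marking loop
// Depth pops later, which gives a predictable prefetch distance.
template <std::size_t Depth>
class PrefetchingMarkQueue {
  static_assert(Depth > 0, "FIFO depth must be at least 1");

 public:
  // New edges go onto the plain LIFO stack, as without the FIFO.
  void push(Edge e) { stack_.push_back(e); }

  // Returns false when both the FIFO and the stack are empty.
  bool pop(Edge* out) {
    assert(out != nullptr);
    // Top the FIFO up from the stack, issuing a read prefetch for each
    // edge's target as it enters (rw = 0, locality = 3).
    while (count_ < Depth && !stack_.empty()) {
      Edge e = stack_.back();
      stack_.pop_back();
      __builtin_prefetch(*e, 0, 3);
      fifo_[(head_ + count_) % Depth] = e;
      ++count_;
    }
    if (count_ == 0) {
      return false;
    }
    // Hand out the oldest entry, i.e. the one prefetched longest ago.
    *out = fifo_[head_];
    head_ = (head_ + 1) % Depth;
    --count_;
    return true;
  }

 private:
  std::vector<Edge> stack_;
  Edge fifo_[Depth] = {};
  std::size_t head_ = 0;   // index of the oldest FIFO entry
  std::size_t count_ = 0;  // number of edges currently in the FIFO
};
```

If nothing is pushed between pops, edges come out in the same order as from the bare stack. Edges pushed while scanning are processed after the entries already in the FIFO, which is the price paid for the fixed distance. With Depth = 3 this roughly corresponds to the "3 popped entries ahead" distance Hugh reports below; whether any particular depth helps on a given machine would still need measuring.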

cheers,
Per

> 
> cheers,
> Per
> 
>>
>> Cheers,
>>
>> --Steve
>>
>> On 02/14/2018 05:23 PM, Wilkinson, Hugh wrote:
>>> I have been looking at this also.
>>>
>>> I find that if the prefetching occurs 3 popped entries ahead of the 
>>> processing, then there is a worthwhile benefit.
>>>
>>> A bit of restructuring is required to make this easy and efficient.
>>>
>>> I am prefetching 2 cache lines from the referenced object and also 
>>> doing a PREFETCHW of the mark bitmap.  (Prefetch::write() requires 
>>> modification for x86.)
>>>
>>> With the current code structure, removal of the Prefetch::read() 
>>> probably makes sense; however, I would like to highlight that marking 
>>> performance can be improved with sufficiently early software cache 
>>> prefetches.
>>>
>>> I expect to share more details later.
>>
>> Looking forward to that!
>>
>> cheers,
>> Per
>>