RFR: Remove prefetch during mark
Wilkinson, Hugh
hugh.wilkinson at intel.com
Thu Mar 1 16:45:56 UTC 2018
Hi Per,
I believe that Prefetch::read() and Prefetch::write() are defined for x86 by zgc/src/hotspot/os_cpu/linux_x86/prefetch_linux_x86.inline.hpp (jdk-10+43).
This file translates the calls to assembly code incorrectly. It creates a 3-operand effective address that the processor interprets as (base_address, index, size), i.e. base_address + index * size.
The size argument of read() and write() is passed as the index. The size in the tuple above is always 1 in the translation, but it is only the size of the indexed element (1, 2, 4, or 8), not the length of the region to prefetch.
So, specifying a size of a cacheline to read() and write() will prefetch the cacheline after the one that was intended.
In this instance, the easiest workaround is to specify a size of 1.
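To make the address arithmetic concrete, here is a small C++ sketch. It is not HotSpot code: the helper names and the 64-byte cacheline size are assumptions for illustration. It models the effective address as base + index * scale and reports how many cachelines past the intended one the prefetch lands.

```cpp
#include <cstdint>

// Assumption for illustration: 64-byte cachelines.
constexpr std::uintptr_t kCacheLine = 64;

// Models the x86 effective address (base, index, scale) = base + index * scale.
constexpr std::uintptr_t effective_address(std::uintptr_t base,
                                           std::uintptr_t index,
                                           std::uintptr_t scale) {
  return base + index * scale;
}

// Number of the cacheline that contains a given byte address.
constexpr std::uintptr_t line_of(std::uintptr_t addr) {
  return addr / kCacheLine;
}

// How many cachelines past the line of `loc` the prefetch targets when the
// "size" argument ends up in the index slot with a scale of 1.
constexpr std::uintptr_t line_offset(std::uintptr_t loc, std::uintptr_t size) {
  return line_of(effective_address(loc, size, 1)) - line_of(loc);
}
```

In this model, line_offset(0x1000, 64) is 1 (the next cacheline) and line_offset(0x1000, 1) is 0. A size of 1 still crosses into the next line when loc is the last byte of its cacheline, for example line_offset(0x103F, 1) is 1.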
The actual prefetch instructions do not take a size argument. They only take a byte address reference, and the cacheline containing the byte is prefetched.
A more robust fix is to instantiate the number of prefetch instructions necessary to span the size, but this is practical only for a maximum of perhaps 4 prefetch instructions, and only for a compile-time constant size argument.
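A minimal sketch of that idea, assuming 64-byte cachelines, a cacheline-aligned starting address, and the GCC/Clang __builtin_prefetch intrinsic. The names prefetch_read_span and lines_to_cover are hypothetical and do not exist in HotSpot.

```cpp
#include <cstddef>

// Assumption for illustration: 64-byte cachelines.
constexpr std::size_t kLineSize = 64;

// Prefetches needed to cover `size` bytes starting at a line-aligned address.
constexpr std::size_t lines_to_cover(std::size_t size) {
  return (size + kLineSize - 1) / kLineSize;
}

// Hypothetical helper: issues one read prefetch per cacheline in
// [loc, loc + Size). Size is a template parameter, so the loop bound is a
// compile-time constant and the compiler can unroll it fully.
template <std::size_t Size>
inline void prefetch_read_span(const void* loc) {
  static_assert(lines_to_cover(Size) <= 4, "span too large to unroll");
  const char* p = static_cast<const char*>(loc);
  for (std::size_t i = 0; i < lines_to_cover(Size); ++i) {
    __builtin_prefetch(p + i * kLineSize, 0 /* read */, 3 /* keep in cache */);
  }
}
```

A call such as prefetch_read_span<128>(obj) would issue two prefetches. If loc is not cacheline-aligned, the span can touch one more line than this sketch covers.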
Current Intel processors perform a prefetch for exclusive ownership for the PREFETCHW instruction. Prior to BDW (Broadwell), except for potentially some early Pentium 4s, PREFETCHW was executed as a NOP. The file could enable execution of a PREFETCHW for a Prefetch::write().
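As a rough illustration only, a write prefetch can be requested portably through the GCC/Clang __builtin_prefetch intrinsic with its second argument set to 1. The function name prefetch_for_write is hypothetical, and whether the compiler actually emits PREFETCHW depends on the target flags (for example -mprfchw with GCC); otherwise it may fall back to a read-style prefetch.

```cpp
// Hypothetical helper, not the HotSpot implementation: asks the compiler for
// a prefetch with write intent on the cacheline containing `loc`.
inline void prefetch_for_write(void* loc) {
  // Second argument 1 = prepare for write; third argument 3 = high locality.
  __builtin_prefetch(loc, 1, 3);
}
```

A prefetch is only a hint, so the call has no visible effect on program state.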
Hugh
-----Original Message-----
From: zgc-dev [mailto:zgc-dev-bounces at openjdk.java.net] On Behalf Of Per Liden
Sent: Thursday, March 1, 2018 6:18 AM
To: Steve Blackburn <steve.blackburn at anu.edu.au>; zgc-dev at openjdk.java.net
Subject: Re: RFR: Remove prefetch during mark
On 03/01/2018 12:12 PM, Per Liden wrote:
> Hi,
>
> On 03/01/2018 01:52 AM, Steve Blackburn wrote:
>> Hi all,
>>
>> I just stumbled upon this thread, and thought I ought to chime in.
>>
>> You may find our prefetch paper from 10 years ago useful. Or not! :-).
>>
>> http://users.cecs.anu.edu.au/~steveb/downloads/pdf/pf-ismm-2007.pdf
>
> Thanks for the pointer. Link above doesn't seem to work for me, but I
> found the paper through ACM.
>
>>
>> The short version is that there were a number of efforts to get
>> prefetching working well in the past, but none were effective. We
>> did a pretty detailed study and managed to get some very nice
>> results, with two important changes:
>>
>> * FIFO front end to mark queue (without the FIFO the prefetch
>> distance is unpredictable)
>> * Enqueue edges rather than nodes
>>
>> Obviously, the situation is different here (concurrent, big change
>> in uarch, etc), but still there are some core ideas that you
>> probably ought to know.
>>
>> The impatient may want to jump to section 7.2 and 7.3. Note the
>> last para of 7.3: just adding the FIFO, with no software prefetch may
>> bring a win on some architectures.
>
> We do enqueue edges in ZGC (to enable "striped marking"), so we're
> fairly well positioned for prefetching to work, one would think. I
> recently did some quick tests with a FIFO in front of the mark stack
> (which would match "EdgeSide" in the paper) with varying prefetch
> distance, but wasn't able to observe any real improvements. More
> measurements and analysis would be needed to understand why.
Here's the FIFO prefetch patch I did, in case anyone is interested in doing more work/analysis in this area:
http://cr.openjdk.java.net/~pliden/zgc/mark_prefetch/webrev.0/
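For readers unfamiliar with the idea, the following is a generic sketch of a FIFO in front of a mark stack. It is not the code in the webrev above: PrefetchFifo and drain_order are hypothetical names, and __builtin_prefetch is the GCC/Clang intrinsic. An entry is prefetched when it enters the FIFO and handed out for processing only when it leaves, Distance pops later, so the prefetch distance is fixed.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical sketch: fixed-size FIFO between the mark stack and the
// marking loop. Distance is the prefetch distance in entries.
template <typename T, std::size_t Distance>
class PrefetchFifo {
  static_assert(Distance > 0, "prefetch distance must be positive");

 public:
  // Prefetches `entry` and inserts it. If the FIFO was already full, the
  // oldest entry is written to `out` and true is returned.
  bool push(T* entry, T** out) {
    __builtin_prefetch(entry, 0, 3);
    bool evicted = false;
    if (count_ == Distance) {
      *out = slots_[head_];
      head_ = (head_ + 1) % Distance;
      --count_;
      evicted = true;
    }
    slots_[(head_ + count_) % Distance] = entry;
    ++count_;
    return evicted;
  }

  // Removes the oldest entry; returns false when empty (used to drain).
  bool pop(T** out) {
    if (count_ == 0) return false;
    *out = slots_[head_];
    head_ = (head_ + 1) % Distance;
    --count_;
    return true;
  }

 private:
  std::array<T*, Distance> slots_{};
  std::size_t head_ = 0;
  std::size_t count_ = 0;
};

// Pops a LIFO mark stack through the FIFO and returns the order in which
// entries would be processed. Real marking would also push newly discovered
// edges onto the stack while processing; that is omitted here.
template <std::size_t Distance>
std::vector<int*> drain_order(std::vector<int*> stack) {
  PrefetchFifo<int, Distance> fifo;
  std::vector<int*> order;
  int* ready = nullptr;
  while (!stack.empty()) {
    int* e = stack.back();
    stack.pop_back();
    if (fifo.push(e, &ready)) order.push_back(ready);
  }
  while (fifo.pop(&ready)) order.push_back(ready);
  return order;
}
```

With Distance set to 3, each entry is processed three pops after it was prefetched, except for the final entries drained when the stack runs empty.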
cheers,
Per
>
> cheers,
> Per
>
>>
>> Cheers,
>>
>> --Steve
>>
>> On 02/14/2018 05:23 PM, Wilkinson, Hugh wrote:
>>> I have been looking at this also.
>>>
>>> I find that if the prefetching occurs 3 popped entries ahead of the
>>> processing, then there is a worthwhile benefit.
>>>
>>> A bit of re-structuring is required to make this easy and efficient.
>>>
>>> I am prefetching 2 cache lines from the referenced object and also
>>> doing a PREFETCHW of the mark bitmap. (Prefetch::write() requires
>>> modification for x86.)
>>>
>>> With the current code structure, removal of the Prefetch::read()
>>> probably makes sense; however, I would like to highlight that
>>> marking performance can be improved with sufficiently early software
>>> cache prefetches.
>>>
>>> I expect to share more details later.
>>
>> Looking forward to that!
>>
>> cheers,
>> Per
>>