MFENCE vs. LOCK addl
Paul Hohensee
Paul.Hohensee at Sun.COM
Wed Feb 25 12:53:45 PST 2009
The name 'mfence()' is an artifact. It means to do the equivalent of
OrderAccess::fence().
Paul
Tom Rodriguez wrote:
> Assembler::mfence is used in places where optimizing it wouldn't seem
> to matter to me. As far as killing the condition flags goes, I don't
> think any piece of code which calls mfence cares. There are only
> about 5 calls which seems easy to audit. Avoiding push/pop at all
> seems much better to me.
>
> So if mfence is equivalent in power to any old locked instruction why
> is it so much more expensive? It seems like it either must be doing
> something more or it's a really crappy implementation. Are we sure
> the callers don't need the something more part?
>
> As an aside, it seems odd that something in the Assembler that's named
> after a real instruction would never actually emit that instruction.
> Shouldn't mfence emit mfence and all current callers call something
> else which might sometimes emit mfence?
>
> tom
>
> On Feb 25, 2009, at 12:31 PM, Jiva, Azeem wrote:
>
>> John, Paul --
>> Yeah I had tried that and was in the process of writing that up. It
>> is better than an MFENCE and has the added benefit of not needing a
>> system with SSE2+. I still don't have a good JVM case but the
>> assembler run shows that it's faster than MFENCE. My naïve change to
>> assembler_x86.cpp:
>>
>> void Assembler::mfence() {
>>   // Memory barriers are only needed on multiprocessors
>>   if (os::is_MP()) {
>>     // All usable chips support "locked" instructions which suffice
>>     // as barriers, and are much faster than the alternative of
>>     // using cpuid instruction. We use here a xchg which is
>>     // implicitly locked.
>>     // This is conveniently otherwise a no-op except for blowing
>>     // rax (which we save and restore.)
>>     push(rax);                    // Store RAX register
>>     xchgl(rax, Address(rsp, 0));
>>     pop(rax);                     // Restore RAX register
>>   }
>> }
>>
>> --
>> Azeem Jiva
>> AMD Java Labs
>> T 512.602.0907
>>
>>> -----Original Message-----
>>> From: Paul.Hohensee at Sun.COM [mailto:Paul.Hohensee at Sun.COM]
>>> Sent: Wednesday, February 25, 2009 2:20 PM
>>> To: John Rose
>>> Cc: Jiva, Azeem; hotspot compiler
>>> Subject: Re: MFENCE vs. LOCK addl
>>>
>>> Good idea. Can you try it, Azeem?
>>>
>>> Paul
>>>
>>> John Rose wrote:
>>>> What about XCHG? It doesn't set flags, and (as a bonus) it implies
>>>> the effect of a LOCK prefix:
>>>>   push rax
>>>>   xchg rax, [rsp]
>>>>   pop rax
>>>>
>>>> -- John
>>>>
>>>> On Feb 25, 2009, at 7:05 AM, Jiva, Azeem wrote:
>>>>
>>>>> Paul,
>>>>> Ahh right, I did some experiments with running MFENCE vs. LOCK ADDL
>>>>> and MFENCE vs. PUSH/LOCK ADDL/POPF and found that MFENCE is faster than
>>>>> PUSH/LOCK/POP but not faster than just using the LOCK instruction by
>>>>> itself. A nice optimization would be if the JVM could detect if the
>>>>> condition codes needed to be saved instead of saving them always. This
>>>>> is on AMD hardware, and other systems might have different performance
>>>>> issues.
>>>>
>>
>>
>
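
For anyone who wants to repeat the kind of comparison discussed in this thread outside the JVM, here is a rough sketch of a standalone micro-benchmark. It is not HotSpot code: the helper names (fence_mfence, fence_lock_addl, fence_xchg, ns_per_call) are made up for illustration, and it assumes GCC or Clang inline assembly on x86-64, falling back to std::atomic_thread_fence elsewhere. The XCHG variant swaps with a local stack slot rather than doing push/xchg/pop, since a push inside inline asm would not be safe with the x86-64 red zone.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// Illustrative helpers only; the names are hypothetical, not HotSpot APIs.

// Full barrier via the MFENCE instruction.
inline void fence_mfence() {
#if defined(__x86_64__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);  // portable fallback
#endif
}

// Full barrier via a locked add of zero to the top of the stack.
// The add leaves memory unchanged but does clobber the condition flags.
inline void fence_lock_addl() {
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0, 0(%%rsp)" ::: "memory", "cc");
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);  // portable fallback
#endif
}

// Full barrier via XCHG with a memory operand: implicitly locked,
// and the condition flags are left untouched.
inline void fence_xchg() {
#if defined(__x86_64__)
    int slot = 0;  // scratch stack slot
    int reg = 0;   // scratch register value
    __asm__ __volatile__("xchgl %0, %1" : "+r"(reg), "+m"(slot) : : "memory");
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);  // portable fallback
#endif
}

// Average wall-clock cost of one call to `fence`, in nanoseconds,
// measured over `iters` back-to-back calls.
template <typename Fence>
double ns_per_call(Fence fence, long iters) {
    assert(iters > 0);
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iters; ++i) {
        fence();
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iters);
}
```

Calling, say, ns_per_call(fence_mfence, 10000000) and the same for the other two gives a per-call figure for each barrier. A tight loop of fences with no surrounding stores is only a crude proxy for real JVM behavior, and as noted above the results can differ between AMD and other hardware.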