MFENCE vs. LOCK addl

Wed Feb 25 12:53:45 PST 2009

The name 'mfence()' is an artifact.  It means to do the equivalent of 
OrderAccess::fence().

Paul
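
A minimal sketch of the split Tom asks about in the quoted message: one method named after the instruction that always emits it, and a separately named barrier that callers use and that is free to pick a cheaper sequence. Everything here is hypothetical and for illustration only. `ToyAssembler`, `full_barrier()`, `lock_addl_zero_to_stack()` and the two constructor flags are made-up names, and the class records mnemonics as strings instead of encoding machine code. It is not HotSpot's Assembler.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an assembler. It records mnemonics as text
// instead of encoding instructions, so the choice made is easy to inspect.
class ToyAssembler {
 public:
  // is_mp: pretend result of a multiprocessor check (assumption).
  // prefer_locked_add: pretend tuning flag selecting the cheaper sequence.
  ToyAssembler(bool is_mp, bool prefer_locked_add)
      : is_mp_(is_mp), prefer_locked_add_(prefer_locked_add) {}

  // Named after the instruction, so it emits exactly that instruction.
  void mfence() { emit("mfence"); }

  // Locked add of zero to the top of the stack. It leaves memory
  // unchanged but clobbers the condition flags.
  void lock_addl_zero_to_stack() { emit("lock addl $0, (%rsp)"); }

  // What callers that just want "a full fence" would call.
  void full_barrier() {
    if (!is_mp_) return;  // no barrier needed on a uniprocessor
    if (prefer_locked_add_) {
      lock_addl_zero_to_stack();
    } else {
      mfence();
    }
  }

  const std::vector<std::string>& code() const { return code_; }

 private:
  void emit(const std::string& mnemonic) { code_.push_back(mnemonic); }

  bool is_mp_;
  bool prefer_locked_add_;
  std::vector<std::string> code_;
};
```

With this shape, the few existing call sites would switch to the barrier method, and any caller that does care about the condition flags could still ask for the real instruction by name.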

Tom Rodriguez wrote:
> Assembler::mfence is used in places where optimizing it wouldn't seem 
> to matter to me.  As far as killing the condition flags goes, I don't 
> think any piece of code which calls mfence cares.  There are only 
> about 5 calls which seems easy to audit.  Avoiding push/pop at all 
> seems much better to me.
>
> So if mfence is equivalent in power to any old locked instruction why 
> is it so much more expensive?  It seems like it either must be doing 
> something more or it's a really crappy implementation.  Are we sure 
> the callers don't need the something more part?
>
> As an aside, it seems odd that something in the Assembler that's named 
> after a real instruction would never actually emit that instruction.  
> Shouldn't mfence emit mfence and all current callers call something 
> else which might sometimes emit mfence?
>
> tom
>
> On Feb 25, 2009, at 12:31 PM, Jiva, Azeem wrote:
>
>> John, Paul --
>>  Yeah I had tried that and was in the process of writing that up.  It 
>> is better than an MFENCE and has the added benefit of not needing a 
>> system with SSE2+.  I still don't have a good JVM case but the 
>> assembler run shows that it's faster than MFENCE.  My naïve change to 
>> assembler_x86.cpp:
>>
>> void Assembler::mfence() {
>>   // Memory barriers are only needed on multiprocessors.
>>   if (os::is_MP()) {
>>     // All usable chips support "locked" instructions, which suffice
>>     // as barriers and are much faster than the alternative of
>>     // using the cpuid instruction.  We use xchg here, which is
>>     // implicitly locked when one operand is in memory.  It is
>>     // conveniently otherwise a no-op except for blowing rax
>>     // (which we save and restore).
>>     push(rax);                    // Save rax on the stack
>>     xchgl(rax, Address(rsp, 0));  // Locked exchange with the saved copy
>>     pop(rax);                     // Restore rax
>>   }
>> }
>>
>> -- 
>> Azeem Jiva
>> AMD Java Labs
>> T 512.602.0907
>>
>>> -----Original Message-----
>>> From: Paul.Hohensee at Sun.COM [mailto:Paul.Hohensee at Sun.COM]
>>> Sent: Wednesday, February 25, 2009 2:20 PM
>>> To: John Rose
>>> Cc: Jiva, Azeem; hotspot compiler
>>> Subject: Re: MFENCE vs. LOCK addl
>>>
>>> Good idea.  Can you try it, Azeem?
>>>
>>> Paul
>>>
>>> John Rose wrote:
>>>> What about XCHG?  It doesn't set flags, and (as a bonus) it implies
>>>> the effect of a LOCK prefix:
>>>>    push rax
>>>>    xchg rax, [rsp]
>>>>    pop rax
>>>>
>>>> -- John
>>>>
>>>> On Feb 25, 2009, at 7:05 AM, Jiva, Azeem wrote:
>>>>
>>>>> Paul,
>>>>> Ahh right, I did some experiments with running MFENCE vs. LOCK ADDL
>>>>> and MFENCE vs. PUSHF/LOCK ADDL/POPF and found that MFENCE is faster
>>>>> than PUSHF/LOCK ADDL/POPF but not faster than just using the LOCK
>>>>> ADDL instruction by itself.  A nice optimization would be if the JVM
>>>>> could detect whether the condition codes need to be saved instead of
>>>>> always saving them.  This is on AMD hardware, and other systems
>>>>> might have different performance issues.
>>>>
>>
>>
>
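
The comparison Azeem describes in the oldest message above can be approximated with a small microbenchmark. This is a sketch under stated assumptions, not the test he ran: the names `fence_mfence`, `fence_lock_add`, `ns_per_call` and `print_comparison` are made up for illustration, the inline assembly assumes GCC or Clang targeting x86, and any other target falls back to `std::atomic_thread_fence`, in which case both functions do the same thing. A single-threaded, uncontended loop like this only measures the cost of the instruction itself, so the numbers will vary by CPU and say little about a real JVM workload.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>

// Assumption: GNU-style inline assembly is available on x86 builds.
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define SKETCH_X86_INLINE_ASM 1
#else
#define SKETCH_X86_INLINE_ASM 0
#endif

// Full fence using the MFENCE instruction (requires SSE2 on the CPU).
static inline void fence_mfence() {
#if SKETCH_X86_INLINE_ASM
  __asm__ __volatile__("mfence" ::: "memory");
#else
  std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Full fence using a locked add of zero to the word at the stack pointer.
// The add leaves memory unchanged but clobbers the condition flags ("cc").
static inline void fence_lock_add() {
#if SKETCH_X86_INLINE_ASM && defined(__x86_64__)
  __asm__ __volatile__("lock addl $0, (%%rsp)" ::: "memory", "cc");
#elif SKETCH_X86_INLINE_ASM
  __asm__ __volatile__("lock addl $0, (%%esp)" ::: "memory", "cc");
#else
  std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Average wall-clock nanoseconds per call of `fence` over `iterations` calls.
template <typename Fence>
double ns_per_call(Fence fence, std::uint64_t iterations) {
  assert(iterations > 0);
  const auto start = std::chrono::steady_clock::now();
  for (std::uint64_t i = 0; i < iterations; ++i) {
    fence();
  }
  const auto stop = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(iterations);
}

// Prints both averages; no particular ordering of the results is assumed.
void print_comparison(std::uint64_t iterations) {
  std::printf("mfence:    %.2f ns/call\n",
              ns_per_call(fence_mfence, iterations));
  std::printf("lock addl: %.2f ns/call\n",
              ns_per_call(fence_lock_add, iterations));
}
```

Calling `print_comparison(100000000)` from `main` in an optimized build gives a rough per-call figure for each variant. Adding a flag save and restore around the locked add would approximate the third variant mentioned in the thread.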