MFENCE vs. LOCK addl

Wed Feb 25 12:45:00 PST 2009

Assembler::mfence is used in places where optimizing it doesn't seem  
to matter much to me.  As far as killing the condition flags goes, I  
don't think any piece of code which calls mfence cares.  There are  
only about 5 calls, which seems easy to audit.  Avoiding push/pop  
altogether seems much better to me.

So if mfence is equivalent in power to any old locked instruction, why  
is it so much more expensive?  Either it must be doing something more  
or it's a really crappy implementation.  Are we sure the callers don't  
need the "something more" part?
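
For anyone who wants to check the cost difference outside the JVM, here is a rough sketch of a timing loop.  It assumes GCC or Clang inline asm on x86-64 and falls back to a plain seq_cst fence elsewhere; the names fence_mfence, fence_lock_add and ns_per_fence are made up for illustration, and the numbers it produces will vary by chip, so treat it as a starting point rather than a real benchmark.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Assumption: GCC/Clang extended asm on x86-64. On any other target both
// helpers fall back to a sequentially consistent fence, so the comparison
// is only meaningful on x86-64.
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
inline void fence_mfence() {
  __asm__ __volatile__("mfence" ::: "memory");
}
inline void fence_lock_add() {
  // Locked add of zero to the top of the stack: leaves memory unchanged
  // but clobbers the condition flags ("cc").
  __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
}
#else
inline void fence_mfence()   { std::atomic_thread_fence(std::memory_order_seq_cst); }
inline void fence_lock_add() { std::atomic_thread_fence(std::memory_order_seq_cst); }
#endif

// Average wall-clock nanoseconds per call of `fence` over `iters` calls.
// `iters` must be non-zero.
template <typename Fence>
double ns_per_fence(Fence fence, std::uint64_t iters) {
  const auto start = std::chrono::steady_clock::now();
  for (std::uint64_t i = 0; i < iters; ++i) {
    fence();
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(stop - start).count() /
         static_cast<double>(iters);
}
```

Calling ns_per_fence(fence_mfence, N) and ns_per_fence(fence_lock_add, N) with a large N gives two numbers to compare.  It only measures an uncontended, single-threaded loop, so it says nothing about whether the two are equivalent as barriers.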

As an aside, it seems odd that something in the Assembler that's named  
after a real instruction would never actually emit that instruction.   
Shouldn't mfence emit mfence and all current callers call something  
else which might sometimes emit mfence?
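
To make that concrete, here is a minimal sketch of the split I have in mind.  MiniAssembler, membar, lock_addl_rsp and the two constructor flags are hypothetical names, not the real HotSpot Assembler; the class just appends raw bytes so the choice of encoding is visible.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical stand-in for an assembler: it only records emitted bytes.
class MiniAssembler {
 public:
  // is_mp: whether barriers are needed at all (multiprocessor).
  // prefer_mfence: assumed tuning knob choosing between the two encodings.
  MiniAssembler(bool is_mp, bool prefer_mfence)
      : is_mp_(is_mp), prefer_mfence_(prefer_mfence) {}

  // Named after the instruction, and emits exactly that: MFENCE (0F AE F0).
  void mfence() { emit({0x0F, 0xAE, 0xF0}); }

  // LOCK ADDL $0, (%rsp) (F0 83 04 24 00). Clobbers the condition flags.
  void lock_addl_rsp() { emit({0xF0, 0x83, 0x04, 0x24, 0x00}); }

  // What callers would use instead of mfence(): emits nothing on a
  // uniprocessor, otherwise whichever barrier encoding is preferred.
  void membar() {
    if (!is_mp_) return;
    if (prefer_mfence_) {
      mfence();
    } else {
      lock_addl_rsp();
    }
  }

  const std::vector<std::uint8_t>& code() const { return code_; }

 private:
  void emit(std::initializer_list<std::uint8_t> bytes) {
    code_.insert(code_.end(), bytes);
  }

  bool is_mp_;
  bool prefer_mfence_;
  std::vector<std::uint8_t> code_;
};
```

With this shape the handful of existing callers would switch to membar(), and mfence() stays available for any caller that really does need the instruction itself.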

tom

On Feb 25, 2009, at 12:31 PM, Jiva, Azeem wrote:

> John, Paul --
>  Yeah I had tried that and was in the process of writing that up.   
> It is better than an MFENCE and has the added benefit of not needing  
> a system with SSE2+.  I still don't have a good JVM case but the  
> assembler run shows that it's faster than MFENCE.  My naïve change  
> to assembler_x86.cpp:
>
> void Assembler::mfence() {
>   // Memory barriers are only needed on multiprocessors
>   if (os::is_MP()) {
>     // All usable chips support "locked" instructions which suffice
>     // as barriers, and are much faster than the alternative of
>     // using the cpuid instruction. Here we use an xchg, which is
>     // implicitly locked.
>     // This is conveniently otherwise a no-op except for blowing
>     // rax (which we save and restore.)
>     push(rax);                    // Store RAX register
>     xchgl(rax, Address(rsp, 0));  // Swap with the copy just pushed
>     pop(rax);                     // Restore RAX register
>   }
> }
>
> --
> Azeem Jiva
> AMD Java Labs
> T 512.602.0907
>
>> -----Original Message-----
>> From: Paul.Hohensee at Sun.COM [mailto:Paul.Hohensee at Sun.COM]
>> Sent: Wednesday, February 25, 2009 2:20 PM
>> To: John Rose
>> Cc: Jiva, Azeem; hotspot compiler
>> Subject: Re: MFENCE vs. LOCK addl
>>
>> Good idea.  Can you try it, Azeem?
>>
>> Paul
>>
>> John Rose wrote:
>>> What about XCHG?  It doesn't set flags, and (as a bonus) it implies
>>> the effect of a LOCK prefix:
>>>    push rax
>>>    xchg rax, [rsp]
>>>    pop rax
>>>
>>> -- John
>>>
>>> On Feb 25, 2009, at 7:05 AM, Jiva, Azeem wrote:
>>>
>>>> Paul,
>>>> Ahh right, I did some experiments with running MFENCE vs. LOCK ADDL
>>>> and MFENCE vs. PUSH/LOCK ADDL/POPF and found that MFENCE is faster
>>>> than PUSH/LOCK/POP but not faster than just using the LOCK
>>>> instruction by itself.  A nice optimization would be if the JVM
>>>> could detect if the condition codes needed to be saved instead of
>>>> saving them always.  This is on AMD hardware, and other systems
>>>> might have different performance issues.
>>>
>
>