MFENCE vs. LOCK addl

Wed Feb 25 06:11:53 PST 2009

What do pushf and popf cost?  Do locked operations actually have
two-way barrier semantics from an external point of view?
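
For anyone who wants to poke at the second question, here is a minimal
sketch of a store-buffer (Dekker-style) litmus test.  It is not HotSpot
code.  The names full_fence and count_both_zero are made up for
illustration, and the inline asm assumes GCC or Clang on x86-64; on
anything else it falls back to std::atomic_thread_fence.  Because the asm
declares a "cc" clobber, the compiler deals with the flags itself, so this
form needs no pushf/popf.  A run that never sees both threads read 0 is
consistent with the locked add acting as a StoreLoad barrier; it does not
prove it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: a full (StoreLoad) barrier.
// Assumption: GCC/Clang extended asm on x86-64; otherwise use the
// standard sequentially consistent fence.
inline void full_fence() {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    // Locked add of 0 to the word at the top of the stack: the value is
    // unchanged, only the flags are modified (declared via "cc").
    // "memory" keeps the compiler from reordering accesses across it.
    __asm__ __volatile__("lock; addl $0, (%%rsp)" ::: "memory", "cc");
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

std::atomic<int> x{0};
std::atomic<int> y{0};

// Runs the litmus test `iterations` times and returns how often both
// threads read 0, which a working StoreLoad barrier should never allow.
int count_both_zero(int iterations) {
    assert(iterations >= 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        x.store(0, std::memory_order_relaxed);
        y.store(0, std::memory_order_relaxed);
        int r1 = -1;
        int r2 = -1;
        std::thread a([&] {
            x.store(1, std::memory_order_relaxed);
            full_fence();  // order the store to x before the load of y
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread b([&] {
            y.store(1, std::memory_order_relaxed);
            full_fence();  // order the store to y before the load of x
            r2 = x.load(std::memory_order_relaxed);
        });
        a.join();
        b.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

Calling count_both_zero(2000) from main and checking that it returns 0 is
a reasonable smoke test (build with something like g++ -O2 -pthread).
Removing the full_fence() calls would be expected to let the count go
above 0 on a multiprocessor, though that depends on timing.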

Paul

Jiva, Azeem wrote:
> I was looking at memory barrier performance and noticed that HotSpot
> uses MFENCE as a memory barrier in 64-bit mode.  MFENCE is significantly
> slower than using a LOCKed instruction, since MFENCE is serializing
> (similar to CPUID).   I'd like to recommend the following change:
>
> assembler_x86.cpp
> // Serializes memory.
> void Assembler::mfence() {
>   // Memory barriers are only needed on multiprocessors
>   if (os::is_MP()) {
>     // All usable chips support "locked" instructions which suffice
>     // as barriers, and are much faster than the alternative of
>     // using cpuid or mfence instructions. We use here a locked add
>     // [esp],0.
>     // This is conveniently otherwise a no-op except for blowing
>     // flags (which we save and restore.)
>     pushf();                   // Save eflags register
>     lock();
>     addl(Address(rsp, 0), 0);  // Assert the lock# signal here
>     popf();                    // Restore eflags register
>   }
> }
>
>
> Sorry it's not a diff, but I'm not set up with Mercurial yet.  The only
> application I've run is SPECjbb2005, and it shows no regressions or
> gains, mostly because the generated code for SPECjbb2005 doesn't use
> MFENCE in any significant amount.
>
> --
> Azeem Jiva
> AMD Java Labs
> T 512.602.0907