MFENCE vs. LOCK addl

Wed Feb 25 12:31:21 PST 2009

John, Paul --
  Yeah, I had tried that and was in the process of writing it up.  It is faster than MFENCE and has the added benefit of not requiring a system with SSE2+.  I still don't have a good JVM test case, but the standalone assembler run shows that it's faster than MFENCE.  My naïve change to assembler_x86.cpp:

void Assembler::mfence() {
  // Memory barriers are only needed on multiprocessors.
  if (os::is_MP()) {
    // All usable chips support "locked" instructions, which suffice
    // as barriers and are much faster than the alternative of
    // using the cpuid instruction. Here we use an xchg with a memory
    // operand, which is implicitly locked. This is conveniently
    // otherwise a no-op except for blowing rax (which we save and restore).
    push(rax);                    // Save RAX on the stack
    xchgl(rax, Address(rsp, 0));  // Implicitly locked exchange with the saved copy
    pop(rax);                     // Restore RAX
  }
}
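
For anyone who wants to compare the three sequences outside the JVM, here is a minimal standalone sketch. It is not the HotSpot code: the names barrier_mfence, barrier_lock_add, barrier_xchg, barriers_preserve_memory and time_barrier are made up for illustration, and it assumes GCC or Clang inline assembly on x86-64 (on any other target it falls back to a plain std::atomic_thread_fence so that it still builds). The xchg variant exchanges with a local variable rather than using push/pop, because pushing inside inline assembly could clobber the red zone on x86-64.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
// Full fence via the SSE2 MFENCE instruction.
static inline void barrier_mfence() {
    __asm__ __volatile__("mfence" ::: "memory");
}

// Locked add of zero to the word at the top of the stack.
// The value is unchanged, but the condition codes are clobbered.
static inline void barrier_lock_add() {
    __asm__ __volatile__("lock; addl $0, (%%rsp)" ::: "memory", "cc");
}

// XCHG with a memory operand is implicitly locked and does not touch flags.
static inline void barrier_xchg() {
    int slot = 0;  // scratch memory operand (stands in for the pushed rax)
    int reg = 0;   // scratch register operand
    __asm__ __volatile__("xchgl %0, %1" : "+r"(reg), "+m"(slot) : : "memory");
}
#else
// Assumption: portable fallback for non-x86-64 builds; all three variants
// collapse to the same sequentially consistent fence.
static inline void barrier_mfence()   { std::atomic_thread_fence(std::memory_order_seq_cst); }
static inline void barrier_lock_add() { std::atomic_thread_fence(std::memory_order_seq_cst); }
static inline void barrier_xchg()     { std::atomic_thread_fence(std::memory_order_seq_cst); }
#endif

// Sanity check: none of the barrier sequences should alter ordinary memory.
bool barriers_preserve_memory() {
    volatile std::uint32_t sentinel = 0xC0FFEEu;
    barrier_mfence();
    barrier_lock_add();
    barrier_xchg();
    return sentinel == 0xC0FFEEu;
}

// Runs `barrier` back to back and returns the elapsed wall-clock time in
// nanoseconds. This measures only an isolated loop, not behavior inside a JVM.
template <typename Barrier>
long long time_barrier(Barrier barrier, long iterations) {
    assert(iterations >= 0);
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        barrier();
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling, say, time_barrier(barrier_xchg, 10000000) for each variant gives a rough per-sequence cost on a given chip; as with the numbers discussed in this thread, results will differ between AMD and other hardware.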

--
Azeem Jiva
AMD Java Labs
T 512.602.0907

> -----Original Message-----
> From: Paul.Hohensee at Sun.COM [mailto:Paul.Hohensee at Sun.COM]
> Sent: Wednesday, February 25, 2009 2:20 PM
> To: John Rose
> Cc: Jiva, Azeem; hotspot compiler
> Subject: Re: MFENCE vs. LOCK addl
> 
> Good idea.  Can you try it, Azeem?
> 
> Paul
> 
> John Rose wrote:
> > What about XCHG?  It doesn't set flags, and (as a bonus) it implies
> > the effect of a LOCK prefix:
> >     push rax
> >     xchg rax, [rsp]
> >     pop rax
> >
> > -- John
> >
> > On Feb 25, 2009, at 7:05 AM, Jiva, Azeem wrote:
> >
> >> Paul,
> >>  Ahh right, I did some experiments with running MFENCE vs. LOCK ADDL
> >> and MFENCE vs. PUSH/LOCK ADDL/POPF and found that MFENCE is faster than
> >> PUSH/LOCK/POP but not faster than just using the LOCK instruction by
> >> itself.  A nice optimization would be if the JVM could detect if the
> >> condition codes needed to be saved instead of saving them always.  This
> >> is on AMD hardware, and other systems might have different performance
> >> issues.
> >