MFENCE vs. LOCK addl
Jiva, Azeem
Azeem.Jiva at amd.com
Wed Feb 25 12:31:21 PST 2009
John, Paul --
Yeah, I had tried that and was in the process of writing it up. XCHG is better than an MFENCE and has the added benefit of not needing a system with SSE2+. I still don't have a good JVM case, but the assembler run shows that it's faster than MFENCE. My naïve change to assembler_x86.cpp:
void Assembler::mfence() {
  // Memory barriers are only needed on multiprocessors.
  if (os::is_MP()) {
    // All usable chips support "locked" instructions, which suffice
    // as barriers and are much faster than the alternative of
    // using the cpuid instruction. Here we use an xchg with a memory
    // operand, which is implicitly locked. It is conveniently a no-op
    // otherwise, except for blowing rax (which we save and restore).
    push(rax);                    // Save RAX; the pushed slot is the xchg operand
    xchgl(rax, Address(rsp, 0));  // Implicitly locked exchange with top of stack
    pop(rax);                     // Restore RAX
  }
}
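
A standalone comparison along these lines can be sketched as follows. This is not the assembler test mentioned above, only a minimal harness under stated assumptions: GCC or Clang inline assembly on x86-64, with a portable std::atomic_thread_fence fallback elsewhere. The names barrier_mfence, barrier_xchg and time_barrier are hypothetical and are not part of HotSpot.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Assumption: inline asm is only used with GCC/Clang on x86-64.
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#define SKETCH_X86_64_ASM 1
#else
#define SKETCH_X86_64_ASM 0
#endif

// Hypothetical helper: full barrier via MFENCE (needs SSE2).
inline void barrier_mfence() {
#if SKETCH_X86_64_ASM
  __asm__ __volatile__("mfence" ::: "memory");
#else
  std::atomic_thread_fence(std::memory_order_seq_cst);  // portable fallback
#endif
}

// Hypothetical helper: full barrier via XCHG with a memory operand,
// which is implicitly locked and does not modify the flags.
inline void barrier_xchg() {
#if SKETCH_X86_64_ASM
  std::uint64_t slot = 0;  // local stack slot standing in for [rsp]
  std::uint64_t reg = 0;   // scratch register value
  __asm__ __volatile__("xchgq %0, %1" : "+r"(reg), "+m"(slot) : : "memory");
#else
  std::atomic_thread_fence(std::memory_order_seq_cst);  // portable fallback
#endif
}

// Times `iters` back-to-back calls of `barrier`; returns elapsed nanoseconds.
template <typename F>
std::int64_t time_barrier(F barrier, std::uint64_t iters) {
  auto start = std::chrono::steady_clock::now();
  for (std::uint64_t i = 0; i < iters; ++i) {
    barrier();
  }
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
      .count();
}
```

Calling time_barrier(barrier_mfence, n) and time_barrier(barrier_xchg, n) with the same n gives two elapsed times to compare. The loop overhead is included in both, and results will vary by processor, so the numbers are only meaningful relative to each other on one machine.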
--
Azeem Jiva
AMD Java Labs
T 512.602.0907
> -----Original Message-----
> From: Paul.Hohensee at Sun.COM [mailto:Paul.Hohensee at Sun.COM]
> Sent: Wednesday, February 25, 2009 2:20 PM
> To: John Rose
> Cc: Jiva, Azeem; hotspot compiler
> Subject: Re: MFENCE vs. LOCK addl
>
> Good idea. Can you try it, Azeem?
>
> Paul
>
> John Rose wrote:
> > What about XCHG? It doesn't set flags, and (as a bonus) it implies
> > the effect of a LOCK prefix:
> >     push rax
> >     xchg rax, [rsp]
> >     pop rax
> >
> > -- John
> >
> > On Feb 25, 2009, at 7:05 AM, Jiva, Azeem wrote:
> >
> >> Paul,
> >> Ahh right, I did some experiments with running MFENCE vs. LOCK ADDL
> >> and MFENCE vs. PUSHF/LOCK ADDL/POPF and found that MFENCE is faster
> >> than PUSHF/LOCK ADDL/POPF but not faster than just using the LOCK
> >> instruction by itself. A nice optimization would be if the JVM could
> >> detect whether the condition codes need to be saved instead of always
> >> saving them. This is on AMD hardware, and other systems might have
> >> different performance issues.
> >