MFENCE vs. LOCK addl
Paul Hohensee
Paul.Hohensee at Sun.COM
Thu Feb 26 06:46:42 PST 2009
In that case, moving it to MacroAssembler is the right thing to do. And
rename it just "fence" or some such. Leave "mfence" in Assembler to
unconditionally emit a real mfence instruction.
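
A minimal sketch of that split, not the actual HotSpot code. It assumes
a toy byte-buffer assembler; lock_addl_zero_at_sp, the is_mp and
prefer_locked_add constructor flags, and code() are hypothetical names
introduced only for illustration. Assembler emits exactly the named
instruction, and MacroAssembler::fence() is where the choice of barrier
lives.

```cpp
#include <cstdint>
#include <initializer_list>
#include <vector>

// Toy stand-in for Assembler: each method appends exactly the bytes of
// the instruction it is named after, with no policy decisions.
class Assembler {
 public:
  const std::vector<uint8_t>& code() const { return code_; }

  // MFENCE is encoded as 0F AE F0 and requires SSE2.
  void mfence() { emit({0x0F, 0xAE, 0xF0}); }

  // lock addl $0, (%esp) is encoded as F0 83 04 24 00.
  // Hypothetical helper name. Note that ADD clobbers the condition flags.
  void lock_addl_zero_at_sp() { emit({0xF0, 0x83, 0x04, 0x24, 0x00}); }

 protected:
  void emit(std::initializer_list<uint8_t> bytes) {
    code_.insert(code_.end(), bytes.begin(), bytes.end());
  }

 private:
  std::vector<uint8_t> code_;
};

// Toy stand-in for MacroAssembler: picks a barrier sequence.
class MacroAssembler : public Assembler {
 public:
  // Both flags are assumptions standing in for runtime checks such as
  // os::is_MP() and a CPU-specific preference.
  MacroAssembler(bool is_mp, bool prefer_locked_add)
      : is_mp_(is_mp), prefer_locked_add_(prefer_locked_add) {}

  // Emit whatever gives full-fence semantics on this configuration.
  void fence() {
    if (!is_mp_) return;  // no barrier needed on a uniprocessor
    if (prefer_locked_add_) {
      lock_addl_zero_at_sp();
    } else {
      mfence();
    }
  }

 private:
  bool is_mp_;
  bool prefer_locked_add_;
};
```

With this shape, callers that want OrderAccess::fence()-like behavior
call fence(), and only code that truly needs the instruction itself
calls mfence().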
Paul
Tom Rodriguez wrote:
> Maybe at one point it was an artifact but there's actually a real mfence
> instruction. It seems like it needs to be renamed, particularly if it
> ever stops producing a real mfence. Things in Assembler are really
> just supposed to emit exactly what they say they are with no
> translation or optimization. MacroAssembler is where logic for
> choosing more optimal patterns should live.
>
> tom
>
> On Feb 25, 2009, at 12:53 PM, Paul Hohensee wrote:
>
>> The name 'mfence()' is an artifact. It means to do the equivalent of
>> OrderAccess::fence().
>>
>> Paul
>>
>> Tom Rodriguez wrote:
>>> Assembler::mfence is used in places where optimizing it wouldn't
>>> seem to matter to me. As far as killing the condition flags go I
>>> don't think any piece of code which calls mfence cares. There are
>>> only about 5 calls which seems easy to audit. Avoiding push/pop at
>>> all seems much better to me.
>>>
>>> So if mfence is equivalent in power to any old locked instruction
>>> why is it so much more expensive? It seems like it either must be
>>> doing something more or it's a really crappy implementation. Are we
>>> sure the callers don't need the something more part?
>>>
>>> As an aside, it seems odd that something in the Assembler that's
>>> named after a real instruction would never actually emit that
>>> instruction. Shouldn't mfence emit mfence and all current callers
>>> call something else which might sometimes emit mfence?
>>>
>>> tom
>>>
>>> On Feb 25, 2009, at 12:31 PM, Jiva, Azeem wrote:
>>>
>>>> John, Paul --
>>>> Yeah I had tried that and was in the process of writing that up.
>>>> It is better than an MFENCE and has the added benefit of not
>>>> needing a system with SSE2+. I still don't have a good JVM case
>>>> but the assembler run shows that it's faster than MFENCE. My naïve
>>>> change to assembler_x86.cpp:
>>>>
>>>> void Assembler::mfence() {
>>>>   // Memory barriers are only needed on multiprocessors
>>>>   if (os::is_MP()) {
>>>>     // All usable chips support "locked" instructions which suffice
>>>>     // as barriers, and are much faster than the alternative of
>>>>     // using cpuid instruction. We use here a xchg which is
>>>>     // implicitly locked
>>>>     // This is conveniently otherwise a no-op except for blowing
>>>>     // rax (which we save and restore.)
>>>>     push(rax); // Store RAX register
>>>>     xchgl(rax, Address(rsp, 0));
>>>>     pop(rax); // Restore RAX register
>>>>   }
>>>> }
>>>>
>>>> --
>>>> Azeem Jiva
>>>> AMD Java Labs
>>>> T 512.602.0907
>>>>
>>>>> -----Original Message-----
>>>>> From: Paul.Hohensee at Sun.COM [mailto:Paul.Hohensee at Sun.COM]
>>>>> Sent: Wednesday, February 25, 2009 2:20 PM
>>>>> To: John Rose
>>>>> Cc: Jiva, Azeem; hotspot compiler
>>>>> Subject: Re: MFENCE vs. LOCK addl
>>>>>
>>>>> Good idea. Can you try it, Azeem?
>>>>>
>>>>> Paul
>>>>>
>>>>> John Rose wrote:
>>>>>> What about XCHG? It doesn't set flags, and (as a bonus) it implies
>>>>>> the effect of a LOCK prefix:
>>>>>> push rax
>>>>>> xchg rax
>>>>>> pop rax
>>>>>>
>>>>>> -- John
>>>>>>
>>>>>> On Feb 25, 2009, at 7:05 AM, Jiva, Azeem wrote:
>>>>>>
>>>>>>> Paul,
>>>>>>> Ahh right, I did some experiments with running MFENCE vs. LOCK ADDL
>>>>>>> and MFENCE vs. PUSH/LOCK ADDL/POPF and found that MFENCE is faster
>>>>>>> than PUSH/LOCK/POP but not faster than just using the LOCK
>>>>>>> instruction by itself. A nice optimization would be if the JVM
>>>>>>> could detect if the condition codes needed to be saved instead of
>>>>>>> saving them always. This is on AMD hardware, and other systems
>>>>>>> might have different performance issues.
>>>>>>
>>>>
>>>>
>>>
>