MFENCE vs. LOCK addl

Thu Feb 26 06:46:42 PST 2009

In that case, moving it to MacroAssembler and renaming it to just "fence"
or some such is the right thing to do.  Leave "mfence" in Assembler to
unconditionally emit a real mfence instruction.

Paul
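
A minimal sketch of the naming split agreed on here, written as a toy C++ model rather than HotSpot code. `ToyAssembler`, `ToyMacroAssembler`, the `flags_live` parameter and the recorded mnemonic strings are all hypothetical. The assembler-level `mfence()` always records a real `mfence`. The macro-level `fence()` skips the barrier on a uniprocessor (mirroring the `os::is_MP()` check in the patch quoted below) and otherwise picks a locked instruction. As one possible policy combining the XCHG and "detect whether the condition codes need saving" ideas from later in the thread, it uses the flag-preserving XCHG form only when the caller says the flags are live.

```cpp
#include <cassert>  // used by the usage checks
#include <string>
#include <vector>

// Toy "assembler" (hypothetical, not HotSpot): records mnemonics instead of
// emitting machine code, so the naming contract is easy to inspect.
class ToyAssembler {
 public:
  explicit ToyAssembler(bool is_mp) : is_mp_(is_mp) {}

  // Assembler rule: a method named after an instruction emits exactly that
  // instruction, with no translation or optimization.
  void mfence() { emit("mfence"); }
  void lock_addl_0_rsp() { emit("lock addl $0, 0(%rsp)"); }
  void push_rax() { emit("push %rax"); }
  void xchgl_eax_top() { emit("xchgl %eax, 0(%rsp)"); }
  void pop_rax() { emit("pop %rax"); }

  const std::vector<std::string>& code() const { return code_; }
  bool is_mp() const { return is_mp_; }

 protected:
  void emit(const std::string& insn) { code_.push_back(insn); }

 private:
  bool is_mp_;
  std::vector<std::string> code_;
};

// MacroAssembler rule: fence() means "the equivalent of OrderAccess::fence()"
// and is free to choose whichever encoding is cheapest.
class ToyMacroAssembler : public ToyAssembler {
 public:
  using ToyAssembler::ToyAssembler;

  // flags_live (hypothetical): caller needs the condition codes preserved.
  void fence(bool flags_live) {
    if (!is_mp()) return;  // uniprocessor: no barrier instruction needed
    if (flags_live) {
      // XCHG with a memory operand is implicitly locked and writes no flags.
      push_rax();
      xchgl_eax_top();
      pop_rax();
    } else {
      lock_addl_0_rsp();  // writes the flags, which this caller does not need
    }
  }
};
```

Because callers of `fence()` no longer name an instruction, its encoding can change later without Assembler claiming to emit something it doesn't.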

Tom Rodriguez wrote:
> Maybe at one point it was an artifact, but there's actually a real mfence
> instruction.  It seems like it needs to be renamed, particularly if it 
> ever stops producing a real mfence.  Things in Assembler are really 
> just supposed to emit exactly what they say they are with no 
> translation or optimization.  MacroAssembler is where logic for 
> choosing more optimal patterns should live.
>
> tom
>
> On Feb 25, 2009, at 12:53 PM, Paul Hohensee wrote:
>
>> The name 'mfence()' is an artifact.  It means to do the equivalent of 
>> OrderAccess::fence().
>>
>> Paul
>>
>> Tom Rodriguez wrote:
>>> Assembler::mfence is used in places where optimizing it wouldn't 
>>> seem to matter to me.  As far as killing the condition flags goes, I
>>> don't think any piece of code that calls mfence cares.  There are
>>> only about 5 calls which seems easy to audit.  Avoiding push/pop at 
>>> all seems much better to me.
>>>
>>> So if mfence is equivalent in power to any old locked instruction 
>>> why is it so much more expensive?  It seems like it either must be 
>>> doing something more or it's a really crappy implementation.  Are we 
>>> sure the callers don't need the something more part?
>>>
>>> As an aside, it seems odd that something in the Assembler that's 
>>> named after a real instruction would never actually emit that 
>>> instruction.  Shouldn't mfence emit mfence and all current callers 
>>> call something else which might sometimes emit mfence?
>>>
>>> tom
>>>
>>> On Feb 25, 2009, at 12:31 PM, Jiva, Azeem wrote:
>>>
>>>> John, Paul --
>>>> Yeah, I had tried that and was in the process of writing it up.
>>>> It is better than an MFENCE and has the added benefit of not
>>>> needing a system with SSE2+.  I still don't have a good JVM case,
>>>> but the assembler run shows that it's faster than MFENCE.  My naïve
>>>> change to assembler_x86.cpp:
>>>>
>>>> void Assembler::mfence() {
>>>>   // Memory barriers are only needed on multiprocessors
>>>>   if (os::is_MP()) {
>>>>     // All usable chips support "locked" instructions which suffice
>>>>     // as barriers, and are much faster than the alternative of
>>>>     // using the cpuid instruction. Here we use an xchg, which is
>>>>     // implicitly locked.
>>>>     // This is conveniently otherwise a no-op except for blowing
>>>>     // rax (which we save and restore).
>>>>     push(rax);                    // Save RAX
>>>>     xchgl(rax, Address(rsp, 0));  // Implicitly locked exchange
>>>>     pop(rax);                     // Restore RAX
>>>>   }
>>>> }
>>>>
>>>> -- 
>>>> Azeem Jiva
>>>> AMD Java Labs
>>>> T 512.602.0907
>>>>
>>>>> -----Original Message-----
>>>>> From: Paul.Hohensee at Sun.COM [mailto:Paul.Hohensee at Sun.COM]
>>>>> Sent: Wednesday, February 25, 2009 2:20 PM
>>>>> To: John Rose
>>>>> Cc: Jiva, Azeem; hotspot compiler
>>>>> Subject: Re: MFENCE vs. LOCK addl
>>>>>
>>>>> Good idea.  Can you try it, Azeem?
>>>>>
>>>>> Paul
>>>>>
>>>>> John Rose wrote:
>>>>>> What about XCHG?  It doesn't set flags, and (as a bonus) it implies
>>>>>> the effect of a LOCK prefix:
>>>>>>   push rax
>>>>>>   xchg rax, [rsp]
>>>>>>   pop rax
>>>>>>
>>>>>> -- John
>>>>>>
>>>>>> On Feb 25, 2009, at 7:05 AM, Jiva, Azeem wrote:
>>>>>>
>>>>>>> Paul,
>>>>>> Ahh right, I did some experiments running MFENCE vs. LOCK ADDL
>>>>>> and MFENCE vs. PUSHF/LOCK ADDL/POPF and found that MFENCE is faster
>>>>>> than PUSHF/LOCK ADDL/POPF but not faster than just using the LOCK
>>>>>> ADDL instruction by itself.  A nice optimization would be if the JVM
>>>>>> could detect whether the condition codes need to be saved instead of
>>>>>> always saving them.  This is on AMD hardware; other systems might
>>>>>> have different performance characteristics.
>>>>>>
>>>>
>>>>
>>>
>
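
To try the three sequences from this thread outside the JVM, here is a hedged C++ sketch using GCC/Clang extended inline asm on x86-64. The function names `fence_mfence`, `fence_lock_addl` and `fence_xchg` are hypothetical, and on other targets all three fall back to `std::atomic_thread_fence(std::memory_order_seq_cst)` so the file still builds. It only shows the encodings; it does not reproduce the AMD timings reported above. Unlike the JIT-emitted push/xchg/pop, the XCHG variant lets the compiler supply a scratch register and stack slot, which is one way of avoiding push/pop entirely.

```cpp
#include <atomic>
#include <cassert>  // used by the usage checks

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))

inline void fence_mfence() {
  asm volatile("mfence" ::: "memory");  // SSE2 instruction (always on x86-64)
}

inline void fence_lock_addl() {
  // Locked add of 0 to the top-of-stack word: the memory value is unchanged,
  // but ADD rewrites the condition flags, hence the "cc" clobber.
  asm volatile("lock; addl $0, 0(%%rsp)" ::: "memory", "cc");
}

inline void fence_xchg() {
  // XCHG with a memory operand is implicitly locked and writes no flags, so
  // no "cc" clobber. Register and slot come from the compiler; pushing inside
  // inline asm could overwrite the x86-64 SysV red zone below %rsp.
  int reg = 0;
  int slot = 0;
  asm volatile("xchgl %0, %1" : "+r"(reg), "+m"(slot) : : "memory");
}

#else  // non-x86-64 fallback: a portable sequentially consistent fence

inline void fence_mfence() { std::atomic_thread_fence(std::memory_order_seq_cst); }
inline void fence_lock_addl() { std::atomic_thread_fence(std::memory_order_seq_cst); }
inline void fence_xchg() { std::atomic_thread_fence(std::memory_order_seq_cst); }

#endif
```

The "cc" clobber on the LOCK ADDL variant, and its absence on the XCHG variant, is how inline asm states the condition-flag difference between the two sequences.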