Re: Reply: please help me about understanding method "OrderAccess::acquire()and OrderAccess::acquire()"
David Holmes
david.holmes at oracle.com
Thu Oct 20 03:20:51 UTC 2016
On 20/10/2016 12:33 PM, 恶灵骑士 wrote:
>
> thank you so much, David!
> it's my fault not to clarify the problem;
> --------------------------------------------------
> the openjdk8 code should be:
> inline void OrderAccess::acquire() {
> volatile intptr_t local_dummy;
> #ifdef AMD64
> __asm__ volatile ("movq 0(%%rsp), %0" : "=r" (local_dummy) : : "memory");
> #else
> __asm__ volatile ("movl 0(%%esp),%0" : "=r" (local_dummy) : : "memory");
> #endif // AMD64
> }
>
> inline void OrderAccess::release() { ----------- copied wrong name before
Ah! I should have realized that. :)
> // Avoid hitting the same cache-line from
> // different threads.
> volatile jint local_dummy = 0;
> }
> --------------------------------------------------
> 1,
> just for openjdk version 8,
> you said that "The intent was to produce some code to force a "compiler
> barrier" so that the acquire() semantics needed on x86 would exist",
> but are the keywords '__asm__ volatile' and 'memory' not enough to force
> a "compiler barrier"?
> why is '"movq 0(%%rsp), %0" : "=r" (local_dummy)' still needed?
> and local_dummy is a volatile field;
> if local_dummy were declared without the volatile qualifier, could the
> code '__asm__ volatile ("movq 0(%%rsp), %0" : "=r" (local_dummy) : :
> "memory");' still force a "compiler barrier"?
>
> and which part of '__asm__ volatile ("movq 0(%%rsp), %0" : "=r"
> (local_dummy) : : "memory");' forces the 'compiler barrier':
> 'volatile', '"movq 0(%%rsp), %0" : "=r" (local_dummy)', 'memory',
> or the three together?
You have to remember that this code has very old origins and that
compilers have not been very clear when it comes to things like compiler
barriers and memory ordering, and for a long time we were stuck using
fairly old compilers. So what we have had in the past is a volatile load
using asm "memory" to obtain the necessary "compiler barrier" to effect
the acquire() semantics, and a volatile store to effect the "compiler
barrier" for the release() semantics.
For JDK 9, with our updated compilers, we have moved to the more direct
compiler_barrier code for use with gcc:
static inline void compiler_barrier() {
__asm__ volatile ("" : : : "memory");
}
and as you can see no actual store or load is needed any more.
> -----
> 2,
> on x86, for acquire(), is forcing a 'compiler barrier' enough?
> is a 'hardware barrier' not needed,
> because x86 already ensures loadload and loadstore ordering?
x86 has total-store-ordering so no hardware barriers are needed. As long
as the compiler has not reordered the program statements all stores will
happen in the written order, so nothing needs to happen to achieve
"acquire" semantics.
> ---
> 3,
> I am confused about OrderAccess::release();
> in windows_x86 version the comment is "// A volatile store has release
> semantics."
> in linux_x86 version the comment is "// Avoid hitting the same
> cache-line from
> // different threads."
> what's the difference about "volatile jint local_dummy = 0;" between
> windows and linux?
> for Windows I can find something in the C++ docs:
> from VC++ 2005 on, the VC compiler supports volatile acquire/release
> semantics,
> but for Linux I cannot find anything about gcc describing volatile
> acquire/release semantics,
>
> how does "volatile jint local_dummy = 0;" work on linux_x86?
As you note VS 2005 explicitly provides acquire/release semantics for
volatile read/write. So to tell the VS compiler to implement a release()
operation we simply do:
volatile int dummy = 0;
and it will generate whatever code is needed to achieve release
semantics (likely no code at all as it can elide the actual write to the
dummy variable).
For gcc there was no such explicit statement regarding release/acquire.
So it was based on the observed behaviour and informal commentary on the
gcc internals. It was found that a volatile write was sufficient to get
the desired effects.
The comment:
// Avoid hitting the same cache-line from
// different threads
is unrelated to the acquire/release. In the past these operations wrote
to a shared static variable. That turned out to be a performance issue
because of cache contention. So the code was changed to write to a local
variable instead - and the behaviour of the compiler was verified by
inspection/observation.
Hope that clarifies things.
David
>
> thank you so so ... much !
>
>
>
> ------------------ Original Message ------------------
> *From:* "David Holmes" <david.holmes at oracle.com>;
> *Sent:* Thursday, October 20, 2016, 6:39 AM
> *To:* "恶灵骑士" <1072213404 at qq.com>;
> "hotspot-dev" <hotspot-dev at openjdk.java.net>;
> *Subject:* Re: please help me about understanding method
> "OrderAccess::acquire()and OrderAccess::acquire()"
>
> On 19/10/2016 10:35 PM, 恶灵骑士 wrote:
>> src/os_cpu/linux_x86/vm/orderAccess_linux_x86.inline.hpp
>> inline void OrderAccess::acquire() {
>> volatile intptr_t local_dummy;
>> #ifdef AMD64
>> __asm__ volatile ("movq 0(%%rsp), %0" : "=r" (local_dummy) : :
> "memory");
>> #else
>> __asm__ volatile ("movl 0(%%esp),%0" : "=r" (local_dummy) : : "memory");
>> #endif // AMD64
>> }
>>
>>
>> inline void OrderAccess::acquire() { ---------- should be release()
>> // Avoid hitting the same cache-line from
>> // different threads.
>> volatile jint local_dummy = 0;
>> }
>
> As Kim stated these are old implementations. The intent was to produce
> some code to force a "compiler barrier" so that the acquire() semantics
> needed on x86 would exist - which is just a compiler barrier. The new
> code relies on a more direct gcc technique:
>
> // A compiler barrier, forcing the C++ compiler to invalidate all memory
> assumptions
> static inline void compiler_barrier() {
> __asm__ volatile ("" : : : "memory");
> }
> inline void OrderAccess::acquire() { compiler_barrier(); }
>
>>
>> I have a few questions:
>> 1, does gcc support acquire/release semantics for the C++ keyword 'volatile'?
>>
>>
>> if question 1's answer is 'yes', then I can understand the
>> implementation of
>> method 'OrderAccess::acquire()',
>
> Not sure exactly what you mean, but we do not rely on any C++ memory
> model operations in the hotspot code - acquire/release semantics - we
> just use volatile to flag variables that should not be optimized and use
> OrderAccess operations to explicitly enforce any memory ordering
> requirements.
>
>>
>> 2, about the part of '__asm__ volatile ("movq 0(%%rsp), %0" : "=r"
>> (local_dummy) : : "memory");'
>> the 'volatile' prevents the compiler from optimizing,
>> the 'memory' ensures the compiler does not reorder,
>
> Basically yes, that was the intent. The implementation has changed over time.
>
>>
>> then what about '"movq 0(%%rsp), %0" : "=r" (local_dummy)'? what is
>> this part's effect? and local_dummy was declared as 'volatile'; is it
>> necessary?
>
> That was to do the actual assignment to which the volatile and memory
> would apply. This part is no longer necessary.
>
> David
>
>>
>> thank you so so much!
>>