Re: Reply: please help me about understanding method "OrderAccess::acquire()and OrderAccess::acquire()"
David Holmes
david.holmes at oracle.com
Thu Oct 20 03:20:51 UTC 2016
On 20/10/2016 12:33 PM, 恶灵骑士 wrote:
>
> thank you so much, David!
> it's my fault not to clarify the problem;
> --------------------------------------------------
> the openjdk8 code should be:
> inline void OrderAccess::acquire() {
> volatile intptr_t local_dummy;
> #ifdef AMD64
> __asm__ volatile ("movq 0(%%rsp), %0" : "=r" (local_dummy) : : "memory");
> #else
> __asm__ volatile ("movl 0(%%esp),%0" : "=r" (local_dummy) : : "memory");
> #endif // AMD64
> }
>
> inline void OrderAccess::release() { ----------- copied wrong name before
Ah! I should have realized that. :)
> // Avoid hitting the same cache-line from
> // different threads.
> volatile jint local_dummy = 0;
> }
> --------------------------------------------------
> 1,
> just for openjdk version 8,
> you said that "The intent was to produce some code to force a "compiler
> barrier" so that the acquire() semantics needed on x86 would exist",
> but are the keywords '__asm__ volatile' and 'memory' not enough to force
> a "compiler barrier"?
> why is '"movq 0(%%rsp), %0" : "=r" (local_dummy)' still needed?
> and local_dummy is a volatile field;
> if local_dummy were declared without the volatile qualifier, could the
> code '__asm__ volatile ("movq 0(%%rsp), %0" : "=r" (local_dummy) : :
> "memory");' still force a "compiler barrier"?
>
> and which part of '__asm__ volatile ("movq 0(%%rsp), %0" : "=r"
> (local_dummy) : : "memory");' forces the 'compiler barrier':
> 'volatile', '"movq 0(%%rsp), %0" : "=r" (local_dummy)', 'memory',
> or the three together?
You have to remember that this code has very old origins and that
compilers have not been very clear when it comes to things like compiler
barriers and memory ordering, and for a long time we were stuck using
fairly old compilers. So what we have had in the past is a volatile load
using asm "memory" to obtain the necessary "compiler barrier" to effect
the acquire() semantics, and a volatile store to effect the "compiler
barrier" for the release() semantics.
For JDK 9, with our updated compilers, we have moved to the more direct
compiler_barrier code for use with gcc:
static inline void compiler_barrier() {
__asm__ volatile ("" : : : "memory");
}
and as you can see no actual store or load is needed any more.
> -----
> 2,
> on x86, for acquire(), is forcing a 'compiler barrier' enough?
> is a 'hardware barrier' not needed,
> because x86 already ensures loadload and loadstore ordering?
x86 has total-store-ordering so no hardware barriers are needed. As long
as the compiler has not reordered the program statements all stores will
happen in the written order, so nothing needs to happen to achieve
"acquire" semantics.
> ---
> 3,
> I am confused about OrderAccess::release();
> in windows_x86 version the comment is "// A volatile store has release
> semantics."
> in linux_x86 version the comment is "// Avoid hitting the same
> cache-line from
> // different threads."
> what's the difference about "volatile jint local_dummy = 0;" between
> windows and linux?
> for Windows I can find something in the C++ docs:
> from VC++ 2005 on, the VC compiler supports volatile acquire/release
> semantics,
> but for Linux I cannot find anything about gcc describing volatile
> acquire/release semantics,
>
> how does "volatile jint local_dummy = 0;" work on linux_x86?
As you note VS 2005 explicitly provides acquire/release semantics for
volatile read/write. So to tell the VS compiler to implement a release()
operation we simply do:
volatile int dummy = 0;
and it will generate whatever code is needed to achieve release
semantics (likely no code at all as it can elide the actual write to the
dummy variable).
For gcc there was no such explicit statement regarding release/acquire.
So it was based on the observed behaviour and informal commentary on the
gcc internals. It was found that a volatile write was sufficient to get
the desired effects.
The comment:
// Avoid hitting the same cache-line from
// different threads
is unrelated to the acquire/release. In the past these operations wrote
to a shared static variable. That turned out to be a performance issue
because of cache contention. So the code was changed to write to a local
variable instead - and the behaviour of the compiler was verified by
inspection/observation.
Hope that clarifies things.
David
>
> thank you so so ... much !
>
>
>
> ------------------ Original Message ------------------
> *From:* "David Holmes" <david.holmes at oracle.com>;
> *Sent:* Thursday, October 20, 2016, 6:39 AM
> *To:* "恶灵骑士" <1072213404 at qq.com>;
> "hotspot-dev" <hotspot-dev at openjdk.java.net>;
> *Subject:* Re: please help me about understanding method
> "OrderAccess::acquire()and OrderAccess::acquire()"
>
> On 19/10/2016 10:35 PM, 恶灵骑士 wrote:
>> src/os_cpu/linux_x86/vm/orderAccess_linux_x86.inline.hpp
>> inline void OrderAccess::acquire() {
>> volatile intptr_t local_dummy;
>> #ifdef AMD64
>> __asm__ volatile ("movq 0(%%rsp), %0" : "=r" (local_dummy) : :
> "memory");
>> #else
>> __asm__ volatile ("movl 0(%%esp),%0" : "=r" (local_dummy) : : "memory");
>> #endif // AMD64
>> }
>>
>>
>> inline void OrderAccess::acquire() { ---------- should be release()
>> // Avoid hitting the same cache-line from
>> // different threads.
>> volatile jint local_dummy = 0;
>> }
>
> As Kim stated these are old implementations. The intent was to produce
> some code to force a "compiler barrier" so that the acquire() semantics
> needed on x86 would exist - which is just a compiler barrier. The new
> code relies on a more direct gcc technique:
>
> // A compiler barrier, forcing the C++ compiler to invalidate all memory
> assumptions
> static inline void compiler_barrier() {
> __asm__ volatile ("" : : : "memory");
> }
> inline void OrderAccess::acquire() { compiler_barrier(); }
>
>>
>> I have a few questions:
>> 1, does gcc support acquire/release semantics for the C++ keyword 'volatile'?
>>
>>
>> if question 1's answer is 'yes', then I can understand the
>> implementation of
>> method 'OrderAccess::acquire()',
>
> Not sure exactly what you mean, but we do not rely on any C++ memory
> model operations in the hotspot code - acquire/release semantics - we
> just use volatile to flag variables that should not be optimized and use
> OrderAccess operations to explicitly enforce any memory ordering
> requirements.
>
>>
>> 2, about the part of '__asm__ volatile ("movq 0(%%rsp), %0" : "=r"
>> (local_dummy) : : "memory");'
>> the 'volatile' prevents the compiler from optimizing,
>> the 'memory' ensures the compiler does not reorder,
>
> Basically yes, that was the intent. The implementation has changed over time.
>
>>
>> then what about '"movq 0(%%rsp), %0" : "=r" (local_dummy)'? what is
>> this part's effect? and local_dummy was declared as 'volatile'; is it
>> necessary?
>
> That was to do the actual assignment to which the volatile and memory
> would apply. This part is no longer necessary.
>
> David
>
>>
>> thank you so so much!
>>