RFC (S) JDK-8050149: Experimental option to select the instruction sequence for x86 StoreLoad barrier

Mon Jul 14 22:30:00 UTC 2014

On 07/15/2014 02:26 AM, Vladimir Kozlov wrote:
> On 7/14/14 3:13 PM, Aleksey Shipilev wrote:
>> On 07/15/2014 02:00 AM, Vitaly Davidovich wrote:
>>> In case you're interested (and haven't checked yourself), both
>>> clang (3.4.1) and gcc (4.9) use mfence (under -O2 and above) for
>>> std::atomic_thread_fence(std::memory_order_seq_cst), which I think is
>>> comparable to the StoreLoad that's emitted by the x86 backend in HotSpot.
>>
>> See "case 0" in my patch. I think C++11 folks use the docs from Sewell
>> et al.: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html. (It is
>> interesting that since Azeem Jiva's / Dave Dice's research we switched
>> to "lock addl". Linux uses "lock addl" as well.)
>>
>> Any non-trivial ideas? In fact, I would be grateful to see x86
>> instruction sequences that provide the semantics (e.g. locked
>> instructions referencing memory) and that try not to waste registers
>> and/or try to avoid stores.
> 
> Maybe an instruction stream like this:
> 
>   address rip = pc();
>   lock(); addl(InternalAddress(rip), 0);
> 

Thanks, I might as well try that. I think messing with the instruction
pointer breaks some sort of CPU pipelining in the same way "lock
addl"-ing the stack pointer wrecks perf now, but it would be hilarious to
watch.
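
For readers following along outside HotSpot, the two sequences discussed in
this thread ("mfence" and "lock addl" on the stack pointer) can be sketched in
standalone C++ roughly as below. This is only an illustration, not the patch:
the FenceKind enum, storeload_fence() and store_fence_load() are hypothetical
names, the inline asm assumes a GCC-compatible compiler on x86-64, and any
other target falls back to std::atomic_thread_fence.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical selector. Only the "mfence" and "lock addl" sequences come
// from the thread; the enum and its names are made up for this sketch.
enum class FenceKind { Mfence, LockAddlStack, StdSeqCst };

// Emits a full (StoreLoad) barrier using the requested instruction sequence.
inline void storeload_fence(FenceKind kind) {
#if defined(__GNUC__) && defined(__x86_64__)
  switch (kind) {
    case FenceKind::Mfence:
      __asm__ volatile("mfence" ::: "memory");
      return;
    case FenceKind::LockAddlStack:
      // Locked read-modify-write on the top of the stack. Adding 0 leaves
      // the memory contents unchanged; the lock prefix provides the ordering.
      __asm__ volatile("lock; addl $0, 0(%%rsp)" ::: "memory", "cc");
      return;
    case FenceKind::StdSeqCst:
      break;  // handled by the portable fence below
  }
#else
  (void)kind;  // Assumption: non-x86-64 builds only use the portable fence.
#endif
  std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Store to one location, fence, then load from another: the Dekker-style
// pattern that needs StoreLoad ordering. Returns the loaded value.
inline int store_fence_load(FenceKind kind) {
  static std::atomic<int> flag{0};
  static std::atomic<int> other{7};
  flag.store(1, std::memory_order_relaxed);
  storeload_fence(kind);
  int seen = other.load(std::memory_order_relaxed);
  assert(flag.load(std::memory_order_relaxed) == 1);
  return seen;
}
```

A single-threaded call such as store_fence_load(FenceKind::LockAddlStack) only
checks that each sequence assembles and runs. Comparing their cost would need a
proper multi-threaded benchmark, which this sketch does not attempt.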

-Aleksey.