RFC (S) JDK-8050149: Experimental option to select the instruction sequence for x86 StoreLoad barrier
Aleksey Shipilev
aleksey.shipilev at oracle.com
Mon Jul 14 22:32:10 UTC 2014
On 07/15/2014 02:30 AM, Aleksey Shipilev wrote:
> On 07/15/2014 02:26 AM, Vladimir Kozlov wrote:
>> On 7/14/14 3:13 PM, Aleksey Shipilev wrote:
>>> On 07/15/2014 02:00 AM, Vitaly Davidovich wrote:
>>>> In case you're interested (and haven't checked yourself), both
>>>> clang(3.4.1) and gcc(4.9) use mfence (under O2+) for
>>>> std::atomic_thread_fence(std::memory_order_seq_cst), which I think is
>>>> comparable to StoreLoad that's emitted on x86 backend in hotspot.
>>>
>>> See "case 0" in my patch. I think C++11 folks use the docs from Sewell
>>> et al.: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html. (It is
>>> interesting that since Azeem Jiva's / Dave Dice's research we switched
>>> to "lock addl". Linux uses "lock addl" as well.)
>>>
>>> Any non-trivial ideas? In fact, I would be grateful to see x86
>>> instruction sequences that provide the StoreLoad semantics (e.g. locked
>>> instructions referencing memory) while trying not to waste registers
>>> and/or trying to avoid stores.
>>
>> Maybe the instruction stream:
>>
>> address rip = pc();
>> lock(); addl(InternalAddress(rip), 0);
>>
>
> Thanks, I might as well try that. I think messing with the instruction
> pointer breaks some sort of CPU pipelining, in the same way "lock
> addl"-ing the stack pointer wrecks perf now, but it would be hilarious to
> watch.
...in fact, "lock addl (%rip + OFFSET), 0" might act as an L1 instruction
cache prefetcher.
-Aleksey.
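
For readers following the thread, here is a minimal C++11 sketch of why a
StoreLoad barrier is needed at all: the classic store-buffering (Dekker)
litmus test, using the std::atomic_thread_fence(std::memory_order_seq_cst)
that Vitaly mentions. The helper name count_forbidden_outcomes and the trial
count are hypothetical, chosen only for illustration; which instruction the
fence becomes (mfence, "lock addl", or something else) depends on the
compiler and is not asserted here.

```cpp
#include <atomic>
#include <cassert>  // only needed for the usage line shown after the block
#include <thread>

// Hypothetical helper (not from the patch under discussion): runs the
// store-buffering litmus test `trials` times and counts how often both
// threads read 0. With a seq_cst fence between each store and the load
// that follows it, the C++11 memory model forbids that outcome.
int count_forbidden_outcomes(int trials) {
    int forbidden = 0;
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);
            // StoreLoad barrier: the store to x must become visible
            // before the load of y below is performed.
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });

        // join() synchronizes with thread completion, so reading the
        // plain ints r1 and r2 afterwards is race-free.
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) {
            ++forbidden;
        }
    }
    return forbidden;
}
```

A caller could check the guarantee with
assert(count_forbidden_outcomes(1000) == 0). Removing the two fences makes
r1 == 0 && r2 == 0 a permitted result, and on x86 it can actually show up
because stores may sit in the store buffer while later loads execute.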