RFC (S) JDK-8050149: Experimental option to select the instruction sequence for x86 StoreLoad barrier

Mon Jul 14 22:30:31 UTC 2014

Wouldn't the cost be dominated by the hardware fence, though? Even if you
carry a data dependency here, it seems like real-life performance would
degrade because of the stall while the store buffer drains, no? This seems
like trying to shed a few pounds off an elephant.

Also, presumably with out-of-order execution the register renamer should
allow speculation to proceed, assuming rsp is resolved in time, which it
should be, given that the memory is in cache.
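
For anyone following along, the reordering that the StoreLoad barrier (and
the store buffer drain) is about can be sketched as the classic
store-buffering litmus test. This is only an illustration in portable C++11,
not HotSpot code; the function name count_both_zero and its parameters are
made up for the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the store-buffering litmus test `rounds` times
// and returns how many rounds ended with BOTH loads observing 0.
// With a seq_cst fence between each thread's store and load, the C++ memory
// model forbids that outcome, so the fenced count must be 0.
int count_both_zero(int rounds, bool fenced) {
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);
            if (fenced) std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            if (fenced) std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });

        // join() synchronizes with thread completion, so reading the
        // plain ints r1 and r2 afterwards is race-free.
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Without the fence the both-zero outcome is allowed on x86, although with one
thread pair spawned per round the window is small and it may well never show
up in a short run.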

On Jul 14, 2014 6:14 PM, "Aleksey Shipilev" <aleksey.shipilev at oracle.com>
wrote:

> On 07/15/2014 02:00 AM, Vitaly Davidovich wrote:
> > In case you're interested (and haven't checked yourself), both
> > clang(3.4.1) and gcc(4.9) use mfence (under O2+) for
> > std::atomic_thread_fence(std::memory_order_seq_cst), which I think is
> > comparable to StoreLoad that's emitted on x86 backend in hotspot.
>
> See "case 0" in my patch. I think C++11 folks use the docs from Sewell
> et al.: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html. (It is
> interesting that since Azeem Jiva's / Dave Dice's research we switched
> to "lock addl". Linux uses "lock addl" as well).
>
> Any non-trivial ideas? In fact, I would be grateful to see x86
> instruction sequences which provide the semantics (e.g. locked
> instructions referencing memory) and which (try to) avoid wasting
> registers and/or (try to) avoid stores.
>
> -Aleksey.
>
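
To make the two sequences under discussion concrete, here is a rough sketch
of both as GCC/Clang inline assembly: an explicit mfence, and a locked add of
zero to the top of the stack in the style of "lock addl". The function names
are invented for the example, the exact operand form is an assumption rather
than a copy of what HotSpot emits, and on other targets or compilers the code
simply falls back to a seq_cst fence.

```cpp
#include <atomic>
#include <cassert>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define SKETCH_X86_64_INLINE_ASM 1
#else
#define SKETCH_X86_64_INLINE_ASM 0
#endif

// Variant A (hypothetical name): explicit MFENCE.
inline void storeload_mfence() {
#if SKETCH_X86_64_INLINE_ASM
    __asm__ __volatile__("mfence" ::: "memory");
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Variant B (hypothetical name): locked read-modify-write on the stack top.
// Adding 0 leaves the memory value unchanged but modifies EFLAGS, hence the
// "cc" clobber. It needs no scratch register, but it is still a store.
inline void storeload_lock_addl() {
#if SKETCH_X86_64_INLINE_ASM
    __asm__ __volatile__("lock; addl $0, 0(%%rsp)" ::: "memory", "cc");
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Tiny single-threaded smoke check: a store, either barrier, then a load.
// It only shows that both variants execute and leave memory intact; it says
// nothing about their relative cost.
inline int store_fence_load(std::atomic<int>& cell, int value, bool use_mfence) {
    cell.store(value, std::memory_order_relaxed);
    if (use_mfence) storeload_mfence(); else storeload_lock_addl();
    return cell.load(std::memory_order_relaxed);
}
```

Both variants act as a compiler barrier through the "memory" clobber as
well, so the compiler will not move the surrounding accesses across them.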