RFR: 8288904: Incorrect memory ordering in UL [v2]
Erik Österlund
eosterlund at openjdk.org
Wed Jun 22 22:22:58 UTC 2022
On Wed, 22 Jun 2022 21:28:12 GMT, Johan Sjölén <duke at openjdk.org> wrote:
>> Right but isn't there a control dependency that the hardware will still obey? Or can the hardware write to memory even if that path is never taken?
>> Regarding UB, I could paste the assembly of that instead (maybe I should have done that). My question was whether the cpu can execute that write instruction before even knowing if that branch will be taken.
>> Note: I found this interesting article about control dependencies (https://urldefense.com/v3/__https://lwn.net/Articles/860037/__;!!ACWV5N9M2RV99hQ!IOIrmfu0oArXPy3TLSB_VXl7RhgOxmdZAlAFkn9GvYKcpKIiJoSbFD8WRD8wf_6y5Mlimf7qQFsxAmxLQgTjIxb0GFB1epPOUA$ ). It mentions that the hardware will respect that dependency but there could be some aggressive compiler optimizations on some cases. I don't think that applies here though.
>> @fisk sorry, not sure I understood the example.
>
> @pchilano, it seems that it is true that a control dependency establishes that writes are not moved above the read of a control branch (as we do not know which, if any, branch is taken before the read is done). read-on-read allows for moving it up however.
>
> For example:
>
>
> // OK
> x = true;
> while(x == true) { }
> y = a[0];
> ~>
> x = true;
> y = a[0];
> while(x == true) {}
> // NOT OK
> x = true;
> while(x == true) { }
> a[0] = 5;
> ~>
> x = true;
> a[0] = 5;
> while(x == true) {}
>
>
> Source:
>
> https://urldefense.com/v3/__https://www.cl.cam.ac.uk/*pes20/ppc-supplemental/test7.pdf__;fg!!ACWV5N9M2RV99hQ!IOIrmfu0oArXPy3TLSB_VXl7RhgOxmdZAlAFkn9GvYKcpKIiJoSbFD8WRD8wf_6y5Mlimf7qQFsxAmxLQgTjIxb0GFCogvwZ5A$ section 4.2 and 4.4
>
> I believe that that means that this barrier is unnecessary, but it's good manners to do the `Atomic::load`.
>
> Nice catch :-).
Hold your horses guys. I think the logic in the cited paper is flawed. It essentially says that surely the hardware couldn't perform stores before knowing what control flow is taken, at which point the value of the load must be known. Therefore the hardware can't reorder the load and store.
Well, the hardware can speculate that one branch is taken, defer the load, and buffer the stores without committing them to caches. Once the load has executed and the speculation is proven right, the branch commits and the already buffered stores are published. The result is equivalent to the load and store appearing to have been reordered across the control dependency. It would never break a sequential program, but it would reorder a LoadStore pair separated by a control dependency, so barriers are required.
In fact, the official ARMv8 memory model document linked here https://urldefense.com/v3/__https://documentation-service.arm.com/static/6048f1aaee937942ba302655__;!!ACWV5N9M2RV99hQ!IOIrmfu0oArXPy3TLSB_VXl7RhgOxmdZAlAFkn9GvYKcpKIiJoSbFD8WRD8wf_6y5Mlimf7qQFsxAmxLQgTjIxb0GFAgf59piA$ (end of section 6.2) says explicitly that the control dependency is *not* enough to prevent reordering between a load and a store, and that you need explicit barriers if you don't want reordering.
So basically I would rather trust the official ARM documentation that explicitly says it may reorder, than an academic paper saying it isn't possible because building such hardware would be tricky. It's essentially proof by lack of imagination IMO.
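To make the pattern concrete, here is a minimal sketch in standard C++ of a load followed by a conditional store, once relying only on the control dependency and once with an acquire load. It uses `std::atomic` as a stand-in, not HotSpot's own Atomic API, and the names `ready`, `slot`, `store_if_ready_relaxed` and `store_if_ready_acquire` are made up for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared variables, for illustration only.
std::atomic<bool> ready{false};
std::atomic<int> slot{0};

// Load, branch, store. Only the control dependency sits between the
// load and the store, so this code asks for no LoadStore ordering.
bool store_if_ready_relaxed() {
    if (ready.load(std::memory_order_relaxed)) {
        slot.store(5, std::memory_order_relaxed);
        return true;
    }
    return false;
}

// Same shape, but the acquire load forbids later memory accesses in
// this thread from being reordered before it.
bool store_if_ready_acquire() {
    if (ready.load(std::memory_order_acquire)) {
        slot.store(5, std::memory_order_relaxed);
        return true;
    }
    return false;
}
```

Single-threaded, both functions behave identically. The difference only matters to another thread observing `ready` and `slot`, which is exactly why a sequential program never notices.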
It would be interesting, I guess, to run the "S" litmus test on the failing machine. That said, I wouldn't want to rely on control dependency tricks for ordering regardless of the outcome. I would follow the spec and assume reordering can happen, as it clearly states.
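For reference, the "S" shape can be sketched in standard C++ roughly as below. This is only an illustration under assumptions: `run_s_litmus` is a hypothetical helper, spawning two threads per iteration is far too coarse to provoke the reordering reliably, and the compiler is free to reorder relaxed accesses itself. A real experiment would use hand-written assembly or a dedicated litmus harness.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the S shape `iterations` times and returns how often the outcome
// (r1 == 1 && final x == 2) was observed.
//   Thread 0: x = 2; y = 1 (store_order)
//   Thread 1: r1 = y (load_order); if (r1 == 1) x = 1
int run_s_litmus(int iterations,
                 std::memory_order store_order,
                 std::memory_order load_order) {
    int hits = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = 0;
        std::thread t0([&] {
            x.store(2, std::memory_order_relaxed);
            y.store(1, store_order);
        });
        std::thread t1([&] {
            r1 = y.load(load_order);
            if (r1 == 1) {  // control dependency from the load to the store
                x.store(1, std::memory_order_relaxed);
            }
        });
        t0.join();
        t1.join();
        // If thread 1 saw y == 1 but x still ends up as 2, the store of 1
        // was ordered before the store of 2 in x's modification order.
        if (r1 == 1 && x.load(std::memory_order_relaxed) == 2) {
            ++hits;
        }
    }
    return hits;
}
```

Calling `run_s_litmus(n, std::memory_order_release, std::memory_order_acquire)` must return 0 under the C++ memory model, since the release/acquire pair orders the two stores to `x`. Passing `std::memory_order_relaxed` as the load order leaves only the control dependency on thread 1, which is the case under discussion; a zero count there proves nothing about what the architecture permits.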
-------------
PR: https://git.openjdk.org/jdk/pull/9225