From aph-open at littlepinkcloud.com Sat Aug 7 09:27:29 2021 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Sat, 7 Aug 2021 10:27:29 +0100 Subject: [jmm-dev] Is the JMM spec progressing? In-Reply-To: References: <27a1640c-830f-d0f8-eb2c-5c6753ce43c0@gmail.com> Message-ID: On 8/6/21 11:32 PM, Hans Boehm wrote: > Probably even more controversially, I think we've realized that > existing compiler technology can compile such racing code in ways > that some of us are not 100% sure should really be allowed. This implies, does it not, that the problem is not formalization as such, but that we don't really understand what the language is supposed to mean? That's always been my problem with OOTA: I'm unsure whether the problem is due to the inadequacy of formal models, in which case the formalists can fix their own problem, or something we all have to pay attention to. > Demonstrably unexecuted code can affect the semantics in ways that > strike me as scary. (See wg21.link/p1217 for a down-to-assembly C++ > version; IIUC, Lochbihler and others earlier came up with some > closely related observations for Java.) Looking again at p1217, it seems to me that enforcing load-store ordering would have severe effects on compilers, at least without new optimization techniques. We hoist loads before loops and sink stores after them. When it all works out, there are no memory accesses in the loop. A load-store barrier in a loop would have the effect of forcing succeeding stores out to memory, and forcing preceding loads to reload from memory. It's not hard to imagine that this would cause an order-of-magnitude performance reduction in common cases. I suppose one could argue that such optimizations would continue to be valid, so only those stores which would have been emitted anyway would be affected. But that's not how compilers work, as far as I know. 
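As an illustrative sketch only (source-level, invented names; not C2's actual IR or output), the hoist/sink transformation described above turns a loop full of memory traffic into register-only code:

```java
// Sketch of "hoist loads before loops, sink stores after them".
// Field and class names are made up for illustration.
class HoistSinkSketch {
    static int counter;        // memory location written in the loop
    static int limit = 1000;   // memory location read in the loop

    // As written: a load of limit and a load+store of counter every iteration.
    static void naive() {
        for (int i = 0; i < limit; i++) {
            counter = counter + 1;
        }
    }

    // What the optimizer effectively produces: loads hoisted above the loop,
    // register arithmetic inside, one store sunk below it. A load-store
    // barrier inside the loop would forbid exactly this shape.
    static void optimized() {
        int n = limit;      // load hoisted before the loop
        int c = counter;    // load hoisted before the loop
        for (int i = 0; i < n; i++) {
            c = c + 1;      // no memory access in the loop body
        }
        counter = c;        // store sunk after the loop
    }

    public static void main(String[] args) {
        counter = 0; naive();     int a = counter;
        counter = 0; optimized(); int b = counter;
        // Same single-threaded result either way; only the number of
        // memory accesses differs.
        System.out.println(a + " " + b);  // prints "1000 1000"
    }
}
```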
In our IR for C2, memory accesses are not pinned in any way, so the only way to make unrelated accesses execute in any particular order is to add a dependency between all loads and stores. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dl at cs.oswego.edu Sat Aug 7 11:43:52 2021 From: dl at cs.oswego.edu (Doug Lea) Date: Sat, 7 Aug 2021 07:43:52 -0400 Subject: [jmm-dev] Is the JMM spec progressing? In-Reply-To: References: <27a1640c-830f-d0f8-eb2c-5c6753ce43c0@gmail.com> Message-ID: (Henceforth, let's drop jdk-dev and move this to jmm-dev.) On 8/7/21 5:27 AM, Andrew Haley wrote: > On 8/6/21 11:32 PM, Hans Boehm wrote: > >> Probably even more controversially, I think we've realized that >> existing compiler technology can compile such racing code in ways >> that some of us are not 100% sure should really be allowed. > This implies, does it not, that the problem is not formalization as > such, but that we don't really understand what the language is > supposed to mean? That's always been my problem with OOTA: I'm unsure > whether the problem is due to the inadequacy of formal models, in > which case the formalists can fix their own problem, or something we > all have to pay attention to. Yes. My stance in the less formal account (http://gee.cs.oswego.edu/dl/html/j9mm.html) as well as Shuyang Liu et al's ongoing formalization (see links from http://compilers.cs.ucla.edu/people/) is that the most you want to say about racy Java programs is that they are typesafe. As in: you can't see a String when expecting an int. 
Even this looser constraint is challenging to specify, prove, and extend. But it is a path for Java that might not apply to languages like C that are not guaranteed typesafe anyway, and so enter Undefined Behavior territory (as opposed to possibly-unexpected but still typesafe behavior). -Doug > >> Demonstrably unexecuted code can affect the semantics in ways that >> strike me as scary. (See wg21.link/p1217 for a down-to-assembly C++ >> version; IIUC, Lochbihler and others earlier came up with some >> closely related observations for Java.) > Looking again at p1217, it seems to me that enforcing load-store > ordering would have severe effects on compilers, at least without new > optimization techniques. We hoist loads before loops and sink stores > after them. When it all works out, there are no memory accesses in the > loop. A load-store barrier in a loop would have the effect of forcing > succeeding stores out to memory, and forcing preceding loads to reload > from memory. It's not hard to imagine that this would cause an > order-of-magnitude performance reduction in common cases. > > I suppose one could argue that such optimizations would continue to be > valid, so only those stores which would have been emitted anyway would > be affected. But that's not how compilers work, as far as I know. In > our IR for C2, memory accesses are not pinned in any way, so the only > way to make unrelated accesses execute in any particular order is to > add a dependency between all loads and stores. > From hboehm at google.com Sat Aug 7 22:13:25 2021 From: hboehm at google.com (Hans Boehm) Date: Sat, 7 Aug 2021 15:13:25 -0700 Subject: [jmm-dev] Is the JMM spec progressing? 
In-Reply-To: References: <27a1640c-830f-d0f8-eb2c-5c6753ce43c0@gmail.com> Message-ID: On Sat, Aug 7, 2021 at 2:28 AM Andrew Haley wrote: > On 8/6/21 11:32 PM, Hans Boehm wrote: > > > Probably even more controversially, I think we've realized that > > existing compiler technology can compile such racing code in ways > > that some of us are not 100% sure should really be allowed. > > This implies, does it not, that the problem is not formalization as > such, but that we don't really understand what the language is > supposed to mean? That's always been my problem with OOTA: I'm unsure > whether the problem is due to the inadequacy of formal models, in > which case the formalists can fix their own problem, or something we > all have to pay attention to. > In some sense, I'm not sure either. The p1217 examples bother me in that they seem to violate some global programming rules ("if x is only ever null or refers to an object properly constructed by the same thread, then x should never appear to refer to an incompletely constructed object"). And there seems to be disagreement about whether the currently allowed behavior is "correct". On the other hand, in practice the weirdness doesn't seem to break things. If you ask people advocating the current behavior, the answer will be that it doesn't matter because nobody writes code that way. If you ask people trying to analyze or verify code, they'll probably be unhappy. And I haven't been able to convince myself that you cannot get yourself into these situations just by linking components together, each of which does something perfectly reasonable. And there are very common code patterns (like the standard implementation of reentrant locks used by all Java implementations) that break if you allow general OOTA behavior. Which at least means that you can't currently formally verify such code. The theorem you'd be trying to prove is false with respect to the part of the language spec we know how to formalize. It's a mess. 
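For reference, the canonical OOTA litmus test being discussed can be sketched in Java as follows (an illustrative sketch; class and field names are invented). Each thread only copies a value it read, so an outcome like x == y == 42 would appear "out of thin air"; the JMM's causality rules forbid it here, and no real hardware/compiler combination is known to produce it:

```java
// Classic out-of-thin-air litmus test. x and y are plain (non-volatile)
// fields, so the two threads race on them.
class OotaLitmus {
    static int x, y;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { int r1 = x; y = r1; });
        Thread t2 = new Thread(() -> { int r2 = y; x = r2; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Both fields start at 0 and each thread only forwards a value it
        // read, so 0 is the only value that can ever circulate.
        System.out.println(x + " " + y);  // prints "0 0"
    }
}
```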
> > Demonstrably unexecuted code can affect the semantics in ways that > > strike me as scary. (See wg21.link/p1217 for a down-to-assembly C++ > > version; IIUC, Lochbihler and others earlier came up with some > > closely related observations for Java.) > > Looking again at p1217, it seems to me that enforcing load-store > ordering would have severe effects on compilers, at least without new > optimization techniques. We hoist loads before loops and sink stores > after them. When it all works out, there are no memory accesses in the > loop. A load-store barrier in a loop would have the effect of forcing > succeeding stores out to memory, and forcing preceding loads to reload > from memory. It's not hard to imagine that this would cause an > order-of-magnitude performance reduction in common cases. > I think it would be a fairly pervasive change to optimizers. It has also become clear in WG21, the C++ committee, that there is not enough support for requiring this. In that case, Ou and Demsky have a paper saying that the overhead is likely to be on the order of 1% or less. For Java if it were applied everywhere, it would probably be appreciably higher. On the other hand, it's a bit harder than that to come up with examples where the generated x86 code has to be worse. Moving loads earlier in the code, or delaying stores, as you suggest, would still be fine. The only issue is with delaying loads past stores, which seems less common, though it can certainly be beneficial for reducing live ranges, probably some vectorization etc. But it seems unlikely that such a restriction will be applied even to C++ memory_order_relaxed, much less Java ordinary variables. Hans > > I suppose one could argue that such optimizations would continue to be > valid, so only those stores which would have been emitted anyway would > be affected. But that's not how compilers work, as far as I know. 
In > our IR for C2, memory accesses are not pinned in any way, so the only > way to make unrelated accesses execute in any particular order is to > add a dependency between all loads and stores. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > From hboehm at google.com Sat Aug 7 22:24:54 2021 From: hboehm at google.com (Hans Boehm) Date: Sat, 7 Aug 2021 15:24:54 -0700 Subject: [jmm-dev] Is the JMM spec progressing? In-Reply-To: References: <27a1640c-830f-d0f8-eb2c-5c6753ce43c0@gmail.com> Message-ID: On Sat, Aug 7, 2021 at 4:44 AM Doug Lea
wrote: > (Henceforth, let's drop jdk-dev and move this to jmm-dev.) > > On 8/7/21 5:27 AM, Andrew Haley wrote: > > On 8/6/21 11:32 PM, Hans Boehm wrote: > > > >> Probably even more controversially, I think we've realized that > >> existing compiler technology can compile such racing code in ways > >> that some of us are not 100% sure should really be allowed. > > This implies, does it not, that the problem is not formalization as > > such, but that we don't really understand what the language is > > supposed to mean? That's always been my problem with OOTA: I'm unsure > > whether the problem is due to the inadequacy of formal models, in > > which case the formalists can fix their own problem, or something we > > all have to pay attention to. > > Yes. My stance in the less formal account > (http://gee.cs.oswego.edu/dl/html/j9mm.html) as well as Shuyang Liu et > al's ongoing formalization (see links from > http://compilers.cs.ucla.edu/people/) is that the most you want to say > about racy Java programs is that they are typesafe. As in: you can't see > a String when expecting an int. Even this looser constraint is > challenging to specify, prove, and extend. But it is a path for Java > that might not apply to languages like C that are not guaranteed > typesafe anyway, and so enter Undefined Behavior territory (as opposed > to possibly-unexpected but still typesafe behavior). > > -Doug > But this now breaks some common idioms, right? In particular, I think a bunch of code assumes that racing assignments of equivalent primitive values or immutable objects to the same field are OK. If, in 2004, our view of language-based security had been the same as it is now, then I would completely agree that this would have been the right approach. But I think doing it now would require significant user code changes. Which might still be the best way forward ... Hans > > > > >> Demonstrably unexecuted code can affect the semantics in ways that > >> strike me as scary. 
(See wg21.link/p1217 for a down-to-assembly C++ > >> version; IIUC, Lochbihler and others earlier came up with some > >> closely related observations for Java.) > > Looking again at p1217, it seems to me that enforcing load-store > > ordering would have severe effects on compilers, at least without new > > optimization techniques. We hoist loads before loops and sink stores > > after them. When it all works out, there are no memory accesses in the > > loop. A load-store barrier in a loop would have the effect of forcing > > succeeding stores out to memory, and forcing preceding loads to reload > > from memory. It's not hard to imagine that this would cause an > > order-of-magnitude performance reduction in common cases. > > > > I suppose one could argue that such optimizations would continue to be > > valid, so only those stores which would have been emitted anyway would > > be affected. But that's not how compilers work, as far as I know. In > > our IR for C2, memory accesses are not pinned in any way, so the only > > way to make unrelated accesses execute in any particular order is to > > add a dependency between all loads and stores. > > > From jriely at gmail.com Tue Aug 10 23:00:18 2021 From: jriely at gmail.com (James Riely) Date: Tue, 10 Aug 2021 18:00:18 -0500 Subject: [jmm-dev] Is the JMM spec progressing? In-Reply-To: References: <27a1640c-830f-d0f8-eb2c-5c6753ce43c0@gmail.com> Message-ID: Alan Jeffrey and I have been working on how to distinguish genuine thin-air executions from those that are just thin-air-curious. A first draft of this analysis appeared in last year's OOPSLA [1]. We've been pursuing follow-up work with an expanded list of colleagues to support sequential composition and to develop some tooling and proof mechanization. The follow-up work has revealed lots of little bugs in that paper, but the larger points are still valid. The main point is that, as a concept, 'thin-air' is too ill-defined to be useful. 
Instead we should look for compositional reasoning principles. Applying this to the Lochbihler example, we can show that speculative operational models, like the JMM and Promising Semantics, fail to support compositional reasoning for temporal safety properties. In section 6, we provide a simplistic logic for reasoning through (a variant of) the Lochbihler example. In that section, we also analyze the first RFUB example [2]. In this case we use Hoare logic as a window on what it means to have a dependency. In particular, we observe that {true} (r=y; if (r!=42){r=42}; x=r) {x==42} VALID is a valid Hoare triple, arguing that the write to x is independent of the read from y. But that is not true when you remove the 'unexecuted' code: {true} (r=y; x=r) {x==42} NOT VALID Here the write to x clearly depends on the read. We were not aware of the RFUB New Constructor example when we wrote the paper: y=x || r=y; if(r==nullptr){r=new C}; x=r; r->f() With the goal of removing object creation, I believe it is fair to compare this to the following: y=x || r=y; if(r==0){r=randomNonZero()}; x=r; if(r) {z=1} In this case, we do have {true} (r=y; if(r==0){r=randomNonZero()}; x=r; if(r) {z=1}) {z==1} VALID But not {true} (r=y; if(r==0){r=randomNonZero()}; x=r; if(r) {z=1}) {x==42} NOT VALID (for any value you pick) Thus there is a real dependency from y to x in this case, but not from y to z. Best wishes, James [1] https://doi.org/10.1145/3428262 [2] https://wg21.link/p1217r0 [3] https://wg21.link/p1217r2 On Sun, Aug 8, 2021 at 5:59 AM Hans Boehm wrote: > On Sat, Aug 7, 2021 at 2:28 AM Andrew Haley > wrote: > > > On 8/6/21 11:32 PM, Hans Boehm wrote: > > > > > Probably even more controversially, I think we've realized that > > > existing compiler technology can compile such racing code in ways > > > that some of us are not 100% sure should really be allowed. 
> > > > This implies, does it not, that the problem is not formalization as > > such, but that we don't really understand what the language is > > supposed to mean? That's always been my problem with OOTA: I'm unsure > > whether the problem is due to the inadequacy of formal models, in > > which case the formalists can fix their own problem, or something we > > all have to pay attention to. > > > In some sense, I'm not sure either. The p1217 examples bother me in that > they seem to violate some global programming rules ("if x is only ever > null > or refers to an object properly constructed by the same thread, then x > should never appear to refer to an incompletely constructed object"). > And there seems to be disagreement about whether the currently allowed > behavior is "correct". > > On the other hand, in practice the weirdness doesn't seem to break things. > If you ask people advocating the current behavior, the answer will be > that it doesn't matter because nobody writes code that way. If you ask > people trying to analyze or verify code, they'll probably be unhappy. > And I haven't been able to convince myself that you cannot get yourself > into these situations just by linking components together, each of which > does something perfectly reasonable. > > And there are very common code > patterns (like the standard implementation of reentrant locks used > by all Java implementations) that break if you allow general OOTA > behavior. Which at least means that you can't currently formally verify > such > code. The theorem you'd be trying to prove is false with respect to the > part of the language spec we know how to formalize. > > It's a mess. > > > > > Demonstrably unexecuted code can affect the semantics in ways that > > > strike me as scary. (See wg21.link/p1217 for a down-to-assembly C++ > > > version; IIUC, Lochbihler and others earlier came up with some > > > closely related observations for Java.) 
> > > > Looking again at p1217, it seems to me that enforcing load-store > > ordering would have severe effects on compilers, at least without new > > optimization techniques. We hoist loads before loops and sink stores > > after them. When it all works out, there are no memory accesses in the > > loop. A load-store barrier in a loop would have the effect of forcing > > succeeding stores out to memory, and forcing preceding loads to reload > > from memory. It's not hard to imagine that this would cause an > > order-of-magnitude performance reduction in common cases. > > > I think it would be a fairly pervasive change to optimizers. It has also > become clear in WG21, the C++ committee, that there is not enough > support for requiring this. In that case, Ou and Demsky have a paper > saying that the overhead is likely to be on the order of 1% or less. > For Java if it were applied everywhere, it would probably be > appreciably higher. > > On the other hand, it's a bit harder than that to come up with examples > where > the generated x86 code has to be worse. Moving loads earlier in the > code, or delaying stores, as you suggest, would still be fine. The only > issue is with delaying loads past stores, which seems less common, > though it can certainly be beneficial for reducing live ranges, probably > some > vectorization etc. > > But it seems unlikely that such a restriction will be applied even to > C++ memory_order_relaxed, much less Java ordinary variables. > > Hans > > > > > I suppose one could argue that such optimizations would continue to be > > valid, so only those stores which would have been emitted anyway would > > be affected. But that's not how compilers work, as far as I know. In > > our IR for C2, memory accesses are not pinned in any way, so the only > > way to make unrelated accesses execute in any particular order is to > > add a dependency between all loads and stores. 
> > > > -- > > Andrew Haley (he/him) > > Java Platform Lead Engineer > > Red Hat UK Ltd. > > https://keybase.io/andrewhaley > > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > > > -- James Riely jriely at gmail.com jriely at cs.depaul.edu http://fpl.cs.depaul.edu/jriely/
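James's Hoare-triple dependency claims can be checked over a range of inputs with a small sketch (adapted to Java: `if(r)` becomes `r != 0`, and `randomNonZero` is the hypothetical helper from his example):

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch checking the dependency claims in James Riely's examples.
class DependencyCheck {
    // Hypothetical helper from the example: any nonzero int.
    static int randomNonZero() {
        int v;
        do { v = ThreadLocalRandom.current().nextInt(); } while (v == 0);
        return v;
    }

    public static void main(String[] args) {
        // First example: r=y; if (r!=42){r=42}; x=r
        // Postcondition x==42 holds for every input y, so the write to x
        // does not depend on the read of y.
        for (int y = -1000; y <= 1000; y++) {
            int r = y;
            if (r != 42) { r = 42; }
            int x = r;
            if (x != 42) throw new AssertionError("{x==42} should be VALID");
        }

        // Second example: r=y; if (r==0){r=randomNonZero()}; x=r; if (r!=0){z=1}
        // Postcondition z==1 always holds (no dependency from y to z),
        // but x==42 does not (x really depends on y).
        boolean sawXNot42 = false;
        for (int y = -1000; y <= 1000; y++) {
            int r = y;
            if (r == 0) { r = randomNonZero(); }
            int x = r;
            int z = 0;
            if (r != 0) { z = 1; }
            if (z != 1) throw new AssertionError("{z==1} should be VALID");
            if (x != 42) sawXNot42 = true;
        }
        System.out.println("z==1 always; x==42 not guaranteed: " + sawXNot42);
    }
}
```

This mirrors the argument in the message: the triple over z is valid for all inputs, while no fixed postcondition on x survives varying y.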