RFR: 8305819: LogConfigurationTest intermittently fails on AArch64

Johan Sjölen jsjolen at openjdk.org
Fri May 5 09:23:26 UTC 2023


On Fri, 5 May 2023 09:09:07 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> After @theRealAph's detailed explanation and reading some related materials, I am convinced that we should use an acquire operation on the read side, and I'd like to explain why in detail, based on my current understanding.
>> 
>> I suppose that we have agreed that a storestore barrier is needed on the write side.
>> 
>> We can simplify the logic of the problematic region as follows:
>> 
>>                    The Writer                        |      The Reader(s)
>>                                                      |
>> (1) create a new node                                |  (1) traverse all nodes from _level_start[level]
>> (2) init the node (set _value, _level, _next ...)    |  (2) call (node->_value)->write(...)
>> (3) update the _level_start array                    |
>> (4) link the node to the list                        |
>> 
>> The writer steps (3) and (4) add the new node. The reader step (1) may see the new node(s) added by the writer and then call (node->_value)->write(...).
>> 
>> According to the [std::memory_order page](https://en.cppreference.com/w/cpp/atomic/memory_order):
>> 
>> If one evaluation modifies a memory location, and the other reads or modifies the same memory location,
>> and if at least one of the evaluations is not an atomic operation, the behavior of the program is
>> undefined (the program has a data race) unless there exists a happens-before relationship between
>> these two evaluations.
>> 
>> There is a data race between the writer and the reader(s). The writer steps (3) and (4) modify the _level_start array and the _next field of an old node, while the reader step (1) reads _level_start and the _next field of a node. This is undefined behavior. In the current problem, the reader step (2) may then read an incorrect _value from a new node.
>> 
>> To fix this problem, we need to establish a happens-before relationship (inter-thread happens-before). That means that when step (1) on the read side sees a new node, we must make sure that steps (1) and (2) on the write side happen-before step (2) on the read side.
>> 
>> We could achieve this goal via `Dependency-ordered before`, since the reader step (1) carries a dependency into step (2). Unfortunately, HotSpot currently doesn't support the consume operation, as Aph mentioned, so we have to rely on `Synchronizes with`.
>> 
>> Combining the [std::memory_order page](https://en.cppreference.com/w/cpp/atomic/memory_order) and the [std::atomic_thread_fence page](https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence), we arrive at a solution: add a release fence between steps (2) and (3) on the write side and use an acquire operation for step (1) on the read side.
>> 
>> At first, I only added a storestore barrier on the write side. The reason is that reader steps (1) and (2) have a dependency, and the only CPU I know of that may reorder dependent loads is Alpha, which is gone.
>> 
>> Because this code is shared and we cannot make assumptions about the architectures we support, the acquire operation is necessary according to the language standard and the current implementation of HotSpot, and we also need to make _level_start and _next volatile.
>> 
>> The above is my current understanding. Please feel free to correct me if I'm wrong.
>> 
>> This is the only place I can find so far that has a data race. Should we consider integrating this patch first, if the current solution is acceptable, and then rewrite the logic using an RWL once RWL is supported by HotSpot?
>
> Hi,
> 
>> I have studied your comments and Boehm's. Actually, I found that a discussion on control-dependency ordering took place earlier in the PR for JDK-8288904. The knowledge is non-trivial, but I think I got your point.
> 
> Well, maybe.
> 
> I think the core problem here is that the C++ memory model is clear enough, but many people want to find ways to cheat it. "Hey, can't we rely on a data dependency?" The answer to that question is certainly no, you can't, not until we use `memory_order_consume`.
> 
>>     within a CPU, control and data dependencies are guaranteed.
> 
> No, it really isn't. This is shared code, not CPU-specific, and therefore we don't know anything about any CPU.
>  
>>     2. I believe that the error reported in JDK-8305819 is caused by the hardware rather than by compiler optimization.
> 
> In this case, yes, but the real error is between keyboard and chair: the programmer.
> 
> I believe you're thinking about this problem at entirely the wrong level. This is a C++ program, and the compiler interprets it according to the rules of C++. Therefore, the only rules to which anyone may appeal are the rules of C++.
> 
>> Evidence: I was told by @gaogao-mem that this intermittent crash disappeared after he integrated a storestore barrier. He has run the tests millions of times and the error has never occurred again. Can we agree that a store-only barrier can practically resolve this error?
> 
> I don't understand your reasoning. Why do you want to cheat? Why not simply follow the rules?
> 
>> We also provided reasoning for why we need a storestore barrier in prior comments; we will put comments in the PR. The process is the same as for JDK-8288904: reason, merge, and resolve the error practically.
> 
>>     3. I am not saying it's perfect. I agree the current solution is still "buggy" in the presence of compiler optimizations. We never make it conform to the C++ standard, so it's "UB".
> 
> So, if you know that it's still incorrect, what is your motive for pushing an incorrect change, when the correct one is simple? Why not push a correct change today instead? We know what it is; let's simply use proper synchronization and no-one will ever have to look at this again, regardless of compiler and CPU changes.
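
For concreteness, the release/acquire pairing discussed above would look roughly like the sketch below in HotSpot terms. This is only an illustration with simplified, made-up type and function names (the real code is LogOutputList, which publishes new nodes both through _level_start and through an existing node's _next pointer, so in practice each publishing store needs the release semantics), but Atomic::release_store and Atomic::load_acquire are the primitives that express the required ordering:

    #include "runtime/atomic.hpp"

    // Simplified stand-in for the UL output list node (illustrative only).
    struct Node {
      LogOutput*     _value;
      Node* volatile _next;
    };

    static Node* volatile _level_start[LogLevel::Count];

    // Writer: initialize the node first, then publish it with a release
    // store, so every reader that observes the new pointer also observes
    // the initialized fields (steps (1)-(4) from the table above).
    void add_output(LogOutput* output, LogLevelType level) {
      Node* node   = new Node();
      node->_value = output;                              // step (2)
      node->_next  = _level_start[level];
      Atomic::release_store(&_level_start[level], node);  // steps (3)/(4)
    }

    // Reader: traverse with acquire loads (step (1)), so that step (2),
    // the call through node->_value, sees a fully initialized node.
    void write_all(LogLevelType level, const LogDecorations& decorations, const char* msg) {
      for (Node* n = Atomic::load_acquire(&_level_start[level]);
           n != nullptr;
           n = Atomic::load_acquire(&n->_next)) {
        n->_value->write(decorations, msg);               // step (2)
      }
    }

Equivalently, a release fence (OrderAccess::release()) between the node initialization and the publishing stores would provide the ordering on the write side, but pairing release stores with acquire loads keeps the intent local to the fields involved.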

> The PR was never approved and was somewhat contentious. I would want to see how UL uses the proposed RWL before considering the integration of both changes. This is not a trivial amount of work, I know.
> 
> Buried in the other PR was a suggestion to switch the broken UL RCU mechanism for one of the other VM RCU-like mechanisms, but I'm not familiar enough with those other mechanisms to say if that is viable. But perhaps the UL RCU mechanism can be examined against those other ones to see if any more bugs are evident?
> 
> Or we just proceed with a suitably commented storeStore barrier and hope that this is the last RCU bug.

Yes, the VM has a (correct, AFAIK) RCU mechanism of its own in `GlobalCounter`. As the name implies, it's a global construct that's shared by all users of the mechanism. If a lot of logging occurs, then UL might put a strain on the mechanism that we haven't seen before, and this in turn might have performance implications.
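
For illustration only, and without having prototyped it, my understanding is that UL's reader/writer pattern mapped onto `GlobalCounter` would look roughly like this. `GlobalCounter::CriticalSection` and `GlobalCounter::write_synchronize()` come from utilities/globalCounter; the Node type and helper names here are hypothetical:

    #include "utilities/globalCounter.inline.hpp"

    // Reader side: stay inside a critical section for the whole traversal,
    // so a writer cannot free a node out from under us.
    void read_side_example(Thread* current) {
      GlobalCounter::CriticalSection cs(current);  // RAII enter/exit
      // ... traverse the output list and call write() on each node ...
    }

    // Writer side: after unlinking a node from the list (done elsewhere),
    // wait until every reader that might still reference it has left its
    // critical section, and only then free it.
    void writer_side_example(Node* unlinked_node) {
      GlobalCounter::write_synchronize();
      delete unlinked_node;
    }

The performance concern above would presumably show up mostly on the `write_synchronize()` side, which has to wait for every reader currently inside a critical section before the writer can proceed.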

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13421#discussion_r1185875590

