RFR: 8305819: LogConfigurationTest intermittently fails on AArch64

Fri May 5 03:38:19 UTC 2023

On Thu, 4 May 2023 09:23:53 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> The PR was never approved and was somewhat contentious. I would want to see how UL uses the proposed RWL before considering the integration of both changes. This is not a trivial amount of work I know.
>> 
>> Buried on the other PR was a suggestion to switch the broken UL RCU mechanism for one of the other VM RCU-like mechanisms, but I'm not familiar enough with those other mechanisms to say if that is viable. But perhaps the UL RCU mechanism can be examined against those other ones to see if any more bugs are evident?
>> 
>> Or we just proceed with a suitably commented storeStore barrier and hope that this is the last RCU bug.
>
>> Or we just proceed with a suitably commented storeStore barrier and hope that this is the last RCU bug.
> 
> We can't do only that, because it's still buggy. For it to be correct portable code we need at least a `memory_order::consume` load on the reading side. Given that `consume` is C++20 and we're still on C++14, something like the kernel's `READ_ONCE` would probably do.

After @theRealAph 's detailed explanation and reading some related materials, I am convinced that we should use acquire operation on the read side, and I'd like to explain why in detail based on my current understanding.

I suppose that we have agreed that a storestore barrier is needed on the write side.

We can simplify the logic of the problematic region as follows:

                   The Writer                        |      The Reader(s)
                                                     |
(1) create a new node                                |  (1) traverse all nodes from _level_start[level]
(2) init the node (set _value, _level, _next ...)    |  (2) call (node->_value)->write(...)
(3) update the _level_start array                    |
(4) link the node to the list                        |

The writer step (3) and (4) add the new node. The reader step (1) may get the new node(s) added by the writer and then call (node->value)->write(...).

According to the [std::memory_order page](https://en.cppreference.com/w/cpp/atomic/memory_order) :

If one evaluation modifies a memory location, and the other reads or modifies the same memory location,
and if at least one of the evaluations is not an atomic operation, the behavior of the program is
undefined (the program has a data race) unless there exists a happens-before relationship between
these two evaluations.

There is a data race between the writer and reader(s). The writer step (3) and (4) modify the _level_start array and _next of a old node, and the reader step (1) reads from _level_start and _next of a node. This causes a UB. And in current problem, the reader step (2) may read an incorrect _value from a new node.

To fix this problem, we need to establish a happens-before relationship (Inter-thread happens-before). That means when step (1) on read side get a new node, we must make sure that step (1) and (2) on the write side happens-before step (2) on read side.

We can achieve the goal by `Dependency-ordered before` since the reader step (1) carries a dependency into step (2). Unfortunately, Hotspot currently doesn't support consume operation just as Aph metioned, so we should depend on `Synchronizes with`.

Combining the [std::memory_order page](https://en.cppreference.com/w/cpp/atomic/memory_order) and [std::atomic_thread_fence page](https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence) together, we can get a solution: add a release fence between step (2) and (3) on write side and use acquire operation on read side step (1).

At first, I only added a storestore barrier on the write side. The reason is reader step (1) and (2) have a dependency and the only CPU I know that may reorder dependent loads is Alpha, and Alpha has gone.

Because the code here is shared and we cannot assume the architecture we support, the acquire operation is necessary according to the language standard and the current implementation of hotspot, and also we need to make _level_start and _next volatile.

The above is my current understanding. Please feel free to correct me if I'm wrong.

This is the only place I can find now that has data race. Should we consider integrating this patch first if the current solution is acceptable and then rewrite the logic by RWL after RWL is supported by hotspot?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13421#discussion_r1185671874