RFR: 8305819: LogConfigurationTest intermittently fails on AArch64

Xin Liu xliu at openjdk.org
Thu May 4 18:08:20 UTC 2023


On Thu, 4 May 2023 17:40:54 GMT, Xin Liu <xliu at openjdk.org> wrote:

>>> Or we just proceed with a suitably commented storeStore barrier and hope that this is the last RCU bug.
>> 
>> We can't do only that, because it's still buggy. For it to be correct portable code we need at least a `memory_order::consume` load on the reading side. Given that `consume` is C++20 and we're still on C++14, something like the kernel's `READ_ONCE` would probably do.
>
> hi, @theRealAph 
> 
> I learn your comments and Boehm's. Actually, I found that a discussion on control dependency ordering took place in the PR of JDK-8288904 before. The knowledge is non-trivial but I think I got your point.
> 
> IMHO, you insist the general rule to a niche case. you are still right, but now the bar is too high to resolve this immediate error. 
> 
> 1. compiler may flip orders regardless control dependency -- check! you also makes reference of kernel RCU. However, UL configuration is much simpler than C++ standard libraries and Linux kernel. In this case, I think we should take the advice of "Recipient only reads x" from Boehm.  there's a single writer. it's manipulating linked-list. within a CPU, control and data dependency is guaranteed. 
> 
> 2. I believe that the error reported in JDK-8305819 is caused by hardware instead of compiler optimization.  Evidence: I was told by @gaogao-mem this intermittent crash disappeared after he integrated a storestore barrier. He has ran the tests millions of times and the error never occurs again.  Can we settle that a store-only barrier can practically resolve this error? 
> 
> We also provided reasoning why we need a storestore barrier in prior comments. we will put comments in PR. The process is same as JDK-8288904. Reason, merge and resolve the error practically. 
> 
> 3. I am not saying it's perfect. I agree the current solution is still "buggy" in presence of compiler optimizations.  we never enforce it to the C++ standard so it's "UB". 
> 
> That's why we should work on replacing RCU with RWL. Like David pointed out, we need more time to review, test a solution. To achieve that, we need a better baseline. now we know that the concurrent test is broken on the weak memory model. If this bandaid works, I think we will be in the better position for the development.

> I assume that is the memory barrier that RCU is supposed to be implicitly implementing? If this is missing from this RCU-like thing can we just use release/acquire as suggested much earlier?

currently, it uses Atomic::add in LogOutputList::increase_readers/descrease_readers. The default order of Atomic::add is `memory_order_conservative` so it emits a full memory barrier.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13421#discussion_r1185345159


More information about the hotspot-runtime-dev mailing list