RFR: 8305819: LogConfigurationTest intermittently fails on AArch64
Xin Liu
xliu at openjdk.org
Thu May 4 18:08:20 UTC 2023
On Thu, 4 May 2023 17:40:54 GMT, Xin Liu <xliu at openjdk.org> wrote:
>>> Or we just proceed with a suitably commented storeStore barrier and hope that this is the last RCU bug.
>>
>> We can't do only that, because it's still buggy. For it to be correct portable code we need at least a `memory_order::consume` load on the reading side. Given that `consume` is C++20 and we're still on C++14, something like the kernel's `READ_ONCE` would probably do.
>
> hi, @theRealAph
>
> I learn your comments and Boehm's. Actually, I found that a discussion on control dependency ordering took place in the PR of JDK-8288904 before. The knowledge is non-trivial but I think I got your point.
>
> IMHO, you insist the general rule to a niche case. you are still right, but now the bar is too high to resolve this immediate error.
>
> 1. compiler may flip orders regardless control dependency -- check! you also makes reference of kernel RCU. However, UL configuration is much simpler than C++ standard libraries and Linux kernel. In this case, I think we should take the advice of "Recipient only reads x" from Boehm. there's a single writer. it's manipulating linked-list. within a CPU, control and data dependency is guaranteed.
>
> 2. I believe that the error reported in JDK-8305819 is caused by hardware instead of compiler optimization. Evidence: I was told by @gaogao-mem this intermittent crash disappeared after he integrated a storestore barrier. He has ran the tests millions of times and the error never occurs again. Can we settle that a store-only barrier can practically resolve this error?
>
> We also provided reasoning why we need a storestore barrier in prior comments. we will put comments in PR. The process is same as JDK-8288904. Reason, merge and resolve the error practically.
>
> 3. I am not saying it's perfect. I agree the current solution is still "buggy" in presence of compiler optimizations. we never enforce it to the C++ standard so it's "UB".
>
> That's why we should work on replacing RCU with RWL. Like David pointed out, we need more time to review, test a solution. To achieve that, we need a better baseline. now we know that the concurrent test is broken on the weak memory model. If this bandaid works, I think we will be in the better position for the development.
> I assume that is the memory barrier that RCU is supposed to be implicitly implementing? If this is missing from this RCU-like thing can we just use release/acquire as suggested much earlier?
currently, it uses Atomic::add in LogOutputList::increase_readers/descrease_readers. The default order of Atomic::add is `memory_order_conservative` so it emits a full memory barrier.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/13421#discussion_r1185345159
More information about the hotspot-runtime-dev
mailing list