RFR: 8305819: LogConfigurationTest intermittently fails on AArch64
David Holmes
dholmes at openjdk.org
Thu May 4 21:34:07 UTC 2023
On Thu, 4 May 2023 18:05:13 GMT, Xin Liu <xliu at openjdk.org> wrote:
>> hi, @theRealAph
>>
>> I learn your comments and Boehm's. Actually, I found that a discussion on control dependency ordering took place in the PR of JDK-8288904 before. The knowledge is non-trivial but I think I got your point.
>>
>> IMHO, you insist the general rule to a niche case. you are still right, but now the bar is too high to resolve this immediate error.
>>
>> 1. compiler may flip orders regardless control dependency -- check! you also makes reference of kernel RCU. However, UL configuration is much simpler than C++ standard libraries and Linux kernel. In this case, I think we should take the advice of "Recipient only reads x" from Boehm. there's a single writer. it's manipulating linked-list. within a CPU, control and data dependency is guaranteed.
>>
>> 2. I believe that the error reported in JDK-8305819 is caused by hardware instead of compiler optimization. Evidence: I was told by @gaogao-mem this intermittent crash disappeared after he integrated a storestore barrier. He has ran the tests millions of times and the error never occurs again. Can we settle that a store-only barrier can practically resolve this error?
>>
>> We also provided reasoning why we need a storestore barrier in prior comments. we will put comments in PR. The process is same as JDK-8288904. Reason, merge and resolve the error practically.
>>
>> 3. I am not saying it's perfect. I agree the current solution is still "buggy" in presence of compiler optimizations. we never enforce it to the C++ standard so it's "UB".
>>
>> That's why we should work on replacing RCU with RWL. Like David pointed out, we need more time to review, test a solution. To achieve that, we need a better baseline. now we know that the concurrent test is broken on the weak memory model. If this bandaid works, I think we will be in the better position for the development.
>
>> I assume that is the memory barrier that RCU is supposed to be implicitly implementing? If this is missing from this RCU-like thing can we just use release/acquire as suggested much earlier?
>
> currently, it uses Atomic::add in LogOutputList::increase_readers/descrease_readers. The default order of Atomic::add is `memory_order_conservative` so it emits a full memory barrier.
Right but presumably the missing read-side barrier relates to the missing store barrier that we want to insert.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/13421#discussion_r1185537858
More information about the hotspot-runtime-dev
mailing list