RFR: 8305819: LogConfigurationTest intermittently fails on AArch64

Wed May 3 07:05:17 UTC 2023

On Tue, 2 May 2023 08:46:53 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> @navyxliu  thank you for looking at this. As you note the RCU code is intended to ensure that a reader can either see the old node, or the new node, and work correctly. In that case  we would logically expect to use release semantics so that if a reader sees the new node then it also sees the correct values for all fields of the node. The release barrier is a `storeStore|loadStore` barrier. Release semantics then require matching acquire semantics when we read the node from the list. You seem to be suggesting that RCU obviates the need for the acquire side of this - is that correct? If so then perhaps the raw storeStore barrier will suffice - as we state in orderAccess.hpp:
>>> If not used in such a pair, it is advised to use a membar instead: acquire/release only make sense as pairs.
>> 
>> Personally I would find the acquire/release easier to reason about as I'm not that familiar with the subtleties of RCU. That said I'm also concerned that this RCU implementation is again found to be flawed due to missing barriers and have to wonder if this is the last of them? If RCU is too difficult to get right then perhaps we should stop trying to be overly clever and use a simpler, potentially slower, lock-based solution?
>
>> > By the way, I have tested millions of times and it hasn't failed when just adding memory barrier on the writing side.
>> 
>> So you have verified it, haven't you?
> 
> :-)
>  
>> I can help you fill in some backgrounds about why it's not mandatory to use release-acquire. this RCU hack starts from Linux kernel...
>> 
>> It doesn't use explicit synchronization for scalability. 
> 
> Kernel RCU depends on assumptions not guaranteed by any language standard.  In particular, it depends on read dependency ordering. This works on most processors unless the compiler does something you weren't expecting it to do. But, of course, it is undefined behaviour according to the language. The Linux kernel itself uses some compiler-specific black magic to do the needful: see `READ_ONCE` in include/linux/compiler.h. 
> 
> We might decide to do something similar for really hot code, but this isn't it.

@theRealAph I didn't mean that current code is like linux kernel RCU, which is well-documented synchronization. I mean current solution in UnifiedLog borrow the idea. 

LogOutputList::increase_readers() is similar to rcu_read_lock(). 

I found READ_ONCE in 'include/asm-generic/rwonce.h'. In hotspot, we could use it for case-2: 

(2) Ensuring that the compiler does not fold, spindle, or otherwise
 * mutilate accesses that either do not require ordering or that interact
 * with an explicit memory barrier or atomic instruction that provides the
 * required ordering.

Analysis so far leads us to believe this error manifests in weak memory model, right?  it's the processor reorders some instructions but not compiler optimizations. Do we really need to consider that MACRO?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13421#discussion_r1183306260