RFR: 8305819: LogConfigurationTest intermittently fails on AArch64

Xin Liu xliu at openjdk.org
Sat Apr 29 00:57:23 UTC 2023


On Tue, 11 Apr 2023 08:45:56 GMT, gaogao-mem <duke at openjdk.org> wrote:

> LogConfigurationTest*reconfigure*MT* crash intermittently on AArch64.
> According to the crash log and coredump, we found it crash as follows:
> 
> void LogTagSet::log(LogLevelType level, const char* msg) {
>   LogOutputList::Iterator it = _output_list.iterator(level);
>   LogDecorations decorations(level, *this, _decorators);
> 
>   for (; it != _output_list.end(); it++) {
>     (*it)->write(decorations, msg);//crash 
>   }
> }
> 
> In the test, two threads write into the log while another thread dynamically changes the decorators and tags. During this time, the  _output_list will be modified. Because of the relax memory model of aarch64, while adding LogOutputNode to LogOutputList, adding node to list and setting the value of node may be reordered, therefore the read thread may not read the correct value of the node's content. Consequently, storestore memory barrier is needed to ensure the order of writing. 
> By the way, applying this patch may affect performance.
> 
> How to reproduce on Linux aarch64:
> test case
> 
> /* @test
>  * @library /test/lib
>  * @modules java.base/jdk.internal.misc
>  *          java.xml
>  * @run main/native GTestWrapper --gtest_filter=LogConfigurationTest*reconfigure*MT*
>  */
> 
> Crash may occasionally occur after running continuously for 5000 times.

src/hotspot/share/logging/logOutputList.cpp line 132:

> 130:   // Prevent the reordering of adding node to list and setting the value of node.
> 131:   // Such a reordering would lead to reading incorrect values.
> 132:   OrderAccess::release();

I agree this is the spot-on. what we need is a storestore barrier.  

We don't want store 'node->_value = output;' floats below stores, such as store '_level_start[l] = node;'

I meant to ask whether 'storestore' barrier itself can resolve this intermittent error on your side?  I just realized that your had storestore barrier in the early revision and this: 

>  By the way, I have tested millions of times and it hasn't failed when just adding memory barrier on the writing side.

So you have verified it, haven't you?

I can help you fill in some backgrounds about why it's not mandatory to use release-acquire.  this RCU hack starts from Linux kernel...

It doesn't use explicit synchronization for scalability. UL chose to use RCU in this scenario because readers are more frequent and need to be scalable. writer who wants to change LogConfiguration is very low-frequent and singleton. RCU shifts almost all costs on the writer.  when writer is changing, it allows readers to uses either old or new nodes in the linked list.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13421#discussion_r1180896210


More information about the hotspot-runtime-dev mailing list