RFR: 8305819: LogConfigurationTest intermittently fails on AArch64
Dan Gao
duke at openjdk.org
Mon May 15 02:19:52 UTC 2023
On Fri, 28 Apr 2023 03:00:41 GMT, David Holmes <dholmes at openjdk.org> wrote:
>> LogConfigurationTest*reconfigure*MT* crash intermittently on AArch64.
>> According to the crash log and coredump, we found it crash as follows:
>>
>> void LogTagSet::log(LogLevelType level, const char* msg) {
>> LogOutputList::Iterator it = _output_list.iterator(level);
>> LogDecorations decorations(level, *this, _decorators);
>>
>> for (; it != _output_list.end(); it++) {
>> (*it)->write(decorations, msg);//crash
>> }
>> }
>>
>> In the test, two threads write into the log while another thread dynamically changes the decorators and tags. During this time, the _output_list will be modified. Because of the relax memory model of aarch64, while adding LogOutputNode to LogOutputList, adding node to list and setting the value of node may be reordered, therefore the read thread may not read the correct value of the node's content. Consequently, storestore memory barrier is needed to ensure the order of writing.
>> By the way, applying this patch may affect performance.
>>
>> How to reproduce on Linux aarch64:
>> test case
>>
>> /* @test
>> * @library /test/lib
>> * @modules java.base/jdk.internal.misc
>> * java.xml
>> * @run main/native GTestWrapper --gtest_filter=LogConfigurationTest*reconfigure*MT*
>> */
>>
>> Crash may occasionally occur after running continuously for 5000 times.
>
> From `logConfiguration.cpp`
>
> // MT-SAFETY
> //
> // The ConfigurationLock guarantees that only one thread is performing reconfiguration. This function still needs
> // to be MT-safe because logsites in other threads may be executing in parallel. Reconfiguration means unified
> // logging allows users to dynamically change tags and decorators of a log output via DCMD(logDiagnosticCommand.hpp).
> //
> // A RCU-style synchronization 'wait_until_no_readers()' is used inside of 'ts->set_output_level(output, level)'
> // if a setting has changed. It guarantees that all logs, either synchronous writes or enqueuing to the async buffer
> // see the new tags and decorators. It's worth noting that the synchronization occurs even if the level does not change.
> //
> // LogDecorator is a set of decorators represented in a uint. ts->update_decorators(decorators) is a union of the
> // current decorators and new_decorators. It's safe to do output->set_decorators(decorators) because new_decorators
> // is a subset of relevant tagsets decorators. After updating output's decorators, it is still safe to shrink all
> // decorators of tagsets.
> //
>
>
> So this code is very much expected to be MT-safe. If we have discovered a situation where it is not MT-safe then that needs to be fixed within the existing design involving the `ConfigurationLock` (a semaphore) and the "RCU-style synchronization". I would have to be most suspicious of the latter in terms of missing memory barriers (especially given [JDK-8288904](https://bugs.openjdk.org/browse/JDK-8288904)) , but I don't understand the complete code enough to identify where a problem may be creeping in.
>
> @jdksjolen can you take another look at this and see where the design-flaw, or implementation bug, may be?
>
> Perhaps @navyxliu can offer an opinion too?
>
> Thanks.
Hi, I have modified based on your suggestion. Is it okay now? @dholmes-ora
-------------
PR Comment: https://git.openjdk.org/jdk/pull/13421#issuecomment-1547101171
More information about the hotspot-runtime-dev
mailing list