RFR: 8305819: LogConfigurationTest intermittently fails on AArch64

gaogao-mem duke at openjdk.org
Thu Apr 27 07:13:54 UTC 2023


On Thu, 27 Apr 2023 02:45:27 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> LogConfigurationTest*reconfigure*MT* crashes intermittently on AArch64.
>> According to the crash log and core dump, the crash occurs in the following code:
>> 
>> void LogTagSet::log(LogLevelType level, const char* msg) {
>>   LogOutputList::Iterator it = _output_list.iterator(level);
>>   LogDecorations decorations(level, *this, _decorators);
>> 
>>   for (; it != _output_list.end(); it++) {
>>     (*it)->write(decorations, msg); // crash
>>   }
>> }
>> 
>> In the test, two threads write to the log while another thread dynamically changes the decorators and tags. During this time, the _output_list is modified. Because of the relaxed memory model of AArch64, when a LogOutputNode is added to the LogOutputList, the store that links the node into the list and the stores that set the node's fields may be reordered, so a reader thread may see the node before its contents are visible. Consequently, a storestore memory barrier is needed to order the writes.
>> By the way, applying this patch may affect performance.
>> 
>> How to reproduce on Linux aarch64:
>> test case
>> 
>> /* @test
>>  * @library /test/lib
>>  * @modules java.base/jdk.internal.misc
>>  *          java.xml
>>  * @run main/native GTestWrapper --gtest_filter=LogConfigurationTest*reconfigure*MT*
>>  */
>> 
>> A crash may occasionally occur when the test is run 5000 times in a row.
>
> Is the solution as simple as the test needing to use the ConfigurationLock?

@dholmes-ora Thank you for your comment. I agree that the API here needs a better design and implementation. If we do need a lock, I would prefer a reader-writer lock, although personally I lean towards a lock-free implementation. As a beginner, I am not yet fully familiar with the overall design and implementation of UL, so that may take more time. This patch resolves the current crash, so I think we could apply it first to fix the problem we are facing now. If possible, I am willing to propose a more thorough solution in a follow-up patch, but it may take some time.
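
To illustrate the ordering issue described in the quoted report, here is a minimal sketch in standard C++. It is not the HotSpot code or the patch itself: `Node`, `g_head`, `publish`, `read_first` and `publish_and_read` are hypothetical names, and a `std::memory_order_release` store paired with an acquire load is used as a portable stand-in for a storestore barrier placed between initialising the node and linking it into the list.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a list node; not HotSpot's LogOutputNode.
struct Node {
  int   value;  // payload the reader will use
  Node* next;   // next node in the list
};

// Hypothetical list head; HotSpot's LogOutputList is more elaborate.
static std::atomic<Node*> g_head{nullptr};

// Writer: fill in the node completely, then make it reachable.
void publish(Node* n, int value) {
  assert(n != nullptr);
  n->value = value;
  n->next  = g_head.load(std::memory_order_relaxed);
  // Release store: the writes to *n above cannot become visible after
  // this store. Without it, a weakly ordered CPU such as AArch64 may let
  // a reader observe the pointer before the node's fields.
  g_head.store(n, std::memory_order_release);
}

// Reader: wait until a node is linked in, then read its payload.
int read_first() {
  Node* n;
  // Acquire load pairs with the release store in publish().
  while ((n = g_head.load(std::memory_order_acquire)) == nullptr) {
    std::this_thread::yield();
  }
  return n->value;
}

// Runs one writer and one reader and returns what the reader saw.
int publish_and_read() {
  static Node node{0, nullptr};
  g_head.store(nullptr, std::memory_order_relaxed);
  int seen = -1;
  std::thread reader([&] { seen = read_first(); });
  std::thread writer([&] { publish(&node, 42); });
  writer.join();
  reader.join();
  g_head.store(nullptr, std::memory_order_relaxed);  // reset for reuse
  return seen;
}
```

Calling `publish_and_read()` from a `main` function exercises the sketch. With the release/acquire pair in place, the reader can only see the node after its fields have been written. This covers publication of a single new node only; it says nothing about safely removing or reconfiguring nodes while readers are iterating, which is the part that would need the more thorough solution.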

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13421#issuecomment-1524903995


More information about the hotspot-runtime-dev mailing list