RFR: 8274794: Print all owned locks in hs_err file [v2]

Sat Oct 16 07:48:52 UTC 2021

On Fri, 15 Oct 2021 16:47:30 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> See CR for details.  This saves the created mutex in a mutex_array during Mutex construction so that all Mutex, not just the ones in mutexLocker.cpp can be printed in the hs_err file.  ThreadCritical is used to protect the mutex_array, and to avoid UB should be used when printing the owned locks, but since that is done during error handling, this seemed better not to lock during error handling.  There's 0 probability that another thread will be creating a lock during this error handling anyway.
>> Tested with tier1-8.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Make mutex_array be a linked list.

Hi Coleen,

thank you for taking my suggestion.

I still think that this may be better suited for debug only, like the other analysis features of Mutexes, but I leave that up to you. There is a case for making this available in release builds too.

Apart from that, the rest are mostly unimportant nits. I'll leave it up to you what you take from them.

Cheers , Thomas

src/hotspot/share/runtime/mutex.cpp line 273:

> 271: // Array used to print owned locks on error.
> 272: static Mutex* _mutex_array = NULL;
> 273: static int _array_count = 0;

Do we need this counter? Could we not just count the mutexes while printing them?

src/hotspot/share/runtime/mutex.cpp line 280:

> 278:     ThreadCritical tc;
> 279:     Mutex* old_next = _next_mutex;
> 280:     assert(old_next != nullptr, "only static mutexes don't have a next");

This was a brainteaser till I realized that the end of the list is composed of the Mutexes created first, which we don't delete. Maybe add that as a comment?

src/hotspot/share/runtime/mutex.cpp line 322:

> 320:     }
> 321:     _array_count += 1;
> 322:   }

Proposal: maybe factor this block (and its counterpart in ~Mutex) to an own private method, "add_to_global_mutex_list" and "remove_from_global_mutex_list"? That would also make the difference between `_next` and `_next_mutex` a bit clearer to the casual reader.

src/hotspot/share/runtime/mutex.cpp line 335:

> 333: #ifdef ASSERT
> 334:   if (_allow_vm_block) {
> 335:     st->print("%s", " allow_vm_block");

nit: shorten to `print_raw(" alloc_vm_block")` ?

src/hotspot/share/runtime/mutex.cpp line 344:

> 342: // Print all mutexes/monitors that are currently owned by a thread; called
> 343: // by fatal error handler.
> 344: void Mutex::print_owned_locks_on_error(outputStream* st) {

Maybe a small comment that we explicitly avoid locking here, because we are in error handling and locking may block? That we rather risk a secondary crash instead of blocking in error handling?

src/hotspot/share/runtime/mutex.cpp line 345:

> 343: // by fatal error handler.
> 344: void Mutex::print_owned_locks_on_error(outputStream* st) {
> 345:   ResourceMark rm;

Why is this mark needed?

src/hotspot/share/runtime/mutex.cpp line 346:

> 344: void Mutex::print_owned_locks_on_error(outputStream* st) {
> 345:   ResourceMark rm;
> 346:   st->print("VM Mutex/Monitor currently owned by a thread: ");

nit: use plural? 
e.g. "All VM Mutexes/Monitors currently owned by any thread:"

src/hotspot/share/runtime/mutex.cpp line 361:

> 359:   }
> 360:   if (none) st->print_cr("None");
> 361:   st->print_cr("Total Mutex count %d", _array_count);

This prints the total number of mutexes, not those owned by a thread, is that intentional?

src/hotspot/share/utilities/vmError.cpp line 1920:

> 1918:       MutexLocker ml(ErrorTest_lock, Mutex::_no_safepoint_check_flag);
> 1919:       assert(how == 0, "test assert with lock");
> 1920:       break;

I like this test.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5958