RFR: 8346123: [REDO] NMT should not use ThreadCritical [v11]

Sat Dec 21 17:07:42 UTC 2024

On Fri, 20 Dec 2024 21:29:18 GMT, Robert Toyonaga <duke at openjdk.org> wrote:

>> This is a redo of [JDK-8304824](https://bugs.openjdk.org/browse/JDK-8304824) which was backed out by [JDK-8343726](https://bugs.openjdk.org/browse/JDK-8343726) due to problems documented in [JDK-8343244](https://bugs.openjdk.org/browse/JDK-8343244).
>> 
>> The problem was that `NmtVirtualMemoryLocker` was not locking when the current thread is not attached by checking `Thread::current_or_null_safe() != nullptr`. This is necessary during VM init, but should not be allowed afterward. NMT may be used in `attach_current_thread` before the current thread is set. The lock was not being acquired in that case, which intermittently caused NMT accounting to become corrupted, triggering various assertions when future NMT operations are done.  To fix this, I've adopted [Thomas' suggestion](https://github.com/openjdk/jdk/pull/21928#issuecomment-2460238057) to reverse the order of 
>> 
>> 
>> thread->register_thread_stack_with_NMT();
>> thread->initialize_thread_current();
>> 
>> 
>> in `attach_current_thread`.  This allows `NmtVirtualMemoryLocker` to be locked after current thread is set. 
>> 
>> To allow for `NmtVirtualMemoryLocker` to be used during VM init, I've replaced the `ConditionalMutexLocker` check `Thread::current_or_null_safe() != nullptr` with a new flag: `_done_bootstrap`. This flag prevents the lock from being used during VM init, at which point we are single threaded anyway. This avoids errors due to Hotspot mutexes and current thread not yet being ready. 
>> 
>> I also added new asserts in `virtualMemoryTracker.cpp` to catch future bugs like this where the lock is not held when it should be. I updated the appropriate VMT tests to also lock (there were a few cases where locking was being bypassed) so they can pass the new asserts.
>> 
>> I also removed the unused `_query_lock` variable in `MemTracker`.
>> 
>> Testing: 
>> 
>> - On Linux amd64, I was able to consistently reproduce the errors described in [JDK-8343244](https://bugs.openjdk.org/browse/JDK-8343244) by increasing the number of test threads in `java/lang/Thread/jni/AttachCurrentThread/AttachTest.java`. The test consistently passes with the new changes in this PR.
>> - hotspot_nmt , gtest:VirtualSpace, tier1
>
> Robert Toyonaga has updated the pull request incrementally with one additional commit since the last revision:
> 
>   move boostrapping_done flag into Mutex from MutexLocker

Changes requested by kbarrett (Reviewer).

src/hotspot/share/nmt/threadStackTracker.cpp line 55:

> 53:   align_thread_stack_boundaries_inward(base, size);
> 54: 
> 55:   MemTracker::MemTracker::NmtVirtualMemoryLocker nvml;

Too many "MemTracker"s here?

src/hotspot/share/nmt/virtualMemoryTracker.cpp line 632:

> 630:     if (Mutex::is_bootstrapping_done()) {
> 631:       assert_lock_strong(NmtVirtualMemory_lock);
> 632:     }

It seems like this can use `MemTracker::assert_locked()`, since that is conditioned on bootstrapping state.

src/hotspot/share/runtime/mutex.hpp line 222:

> 220:   }
> 221:  private:
> 222:   static bool _bootstrapping_done;

A description of what these mean would be nice. It seems to be mutex_init completed + threading
initialization completed?

src/hotspot/share/runtime/mutexLocker.cpp line 291:

> 289:   MUTEX_DEFN(NMTQuery_lock                   , PaddedMutex  , safepoint);
> 290:   MUTEX_DEFN(NMTCompilationCostHistory_lock  , PaddedMutex  , nosafepoint);
> 291:   MUTEX_DEFN(NmtVirtualMemory_lock           , PaddedMutex  , service-4); // Must be lower than G1Mapper_lock used from G1RegionsSmallerThanCommitSizeMapper::commit_regions

Is it really okay for nmt to use a VM mutex? That means relevant functions can
only be called from a VM thread. In particular, what happens if we get into
the error handler because of a signal on a non-VM thread, or something
similarly weird.

This is also replacing the PlatformMutex that was being used by
MemoryFileTracker. I don't know why that was using a PlatformMutex, but I
presume there was a reason. It was doing so in the initial version of
JDK-8312132. Looking back through the development commits, is see these two in
sequence:

3472573 Introduce a separate Mutex for MemoryFileTracker
46f10f6 Use own PlatformMutex instead

I found no discussion about that choice in the PR. Maybe it was because of
problems determining the needed lock rank? (The original VM mutex had
Mutex::service rank. That's apparently too high, as discussed in this change.)
Or maybe there was some other reason?

I'll ask @jdksjolen about this, though holidays...

-------------

PR Review: https://git.openjdk.org/jdk/pull/22745#pullrequestreview-2518602318
PR Review Comment: https://git.openjdk.org/jdk/pull/22745#discussion_r1894576359
PR Review Comment: https://git.openjdk.org/jdk/pull/22745#discussion_r1894576955
PR Review Comment: https://git.openjdk.org/jdk/pull/22745#discussion_r1894652359
PR Review Comment: https://git.openjdk.org/jdk/pull/22745#discussion_r1894657610