RFR: 8313202: MutexLocker should disallow null Mutexes

Thu Jul 27 08:36:52 UTC 2023

On Wed, 26 Jul 2023 17:06:02 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> As seen in [JDK-8313081](https://bugs.openjdk.org/browse/JDK-8313081), it is fairly easy to pass nullptr `Mutex` to `MutexLocker` by accident, which would just silently avoid the lock.
> 
> There are a few places in Hotspot where we pass `nullptr` to simulate re-entrancy and/or conditionally take the lock. Those places can be more explicit, and the default `MutexLocker` can disallow nullptrs for extra safety.
> 
> Open for some bikeshedding on the names of the new `MutexLockers`. Particularly `ReentrantMutexLocker` might lull readers into believing it does safepoint checks on re-entrant "lock", which it actually does not do.
> 
> More thorough testing with different GC/JIT combinations is running now, we might find more issues there. Meanwhile, please comment on the approach.
> 
> Additional testing:
>  - [x] `grep -R "MutexLocker " src/hotspot | grep -i null`, no hits
>  - [x] `grep -R "MutexLocker " src/hotspot | grep -i ?`, no hits
>  - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` (re-run in progress)

All right, the discussion is here then?

To recap: JDK-8313081 shows that it is easy to pass null Mutex by accident, which would skip synchronization. Reviewers (both you, me, and others) did not catch this. No internal asserts caught this. No testing caught this. We got lucky it only produced the accounting error. 

So there are two cases:
 a) MutexLocker accepts null Mutex during VM initialization, and that is arguably correct behavior like JDK-8313210 
 b) MutexLocker accepts null Mutex during normal operation, and that is an error leading to cases like JDK-8313081

What JDK-8313081 tells me is that by chasing (a) we are missing the safety in (b). I would argue that detecting _missing synchronization_ in product code due to (b) is more important than asserting-and-special-casing on (a). I don't think we should wait for a more serious breakage to occur to start protecting from these kinds of bugs. Note that the way current PR works, we still accept null Mutex in product bits for (a), we "just" assert it in debug.

If we don't do these checks in Mutex code itself, then the paranoid thing would be to require `assert(MyLock_lock != nullptr)` before every (all?) critical mutexes acquisitions that are expected to always lock.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15043#issuecomment-1653144754