RFR: 8323807: Async UL: Add a stalling mode to async UL [v2]

Fri Apr 5 00:38:01 UTC 2024

On Wed, 3 Apr 2024 09:59:17 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> Yeah, whether TBIVM is placed inside or outside the loop only affects performance/overhead.
>> 
>> `Monitor::wait` includes safepoint checks and so is only for JavaThreads. For NonJavaThreads, and for JavaThreads that don't want to perform safepoint checks, you use `Monitor::wait_without_safepoint_check`. This used to be handled by an argument passed to `wait`, but the API was reworked a few releases back. The caller has to check what kind of thread they have, so you will need to restructure the wait code you have.
>> 
>> Other safepoint issues would relate to potential deadlocks if the UL code is called whilst other locks are held and a safepoint is active. For non-stalling locking the short critical sections make that a non-issue, but with stalling it could well be an issue.
>
>> Other safepoint issues would relate to potential deadlocks if the UL code is called whilst other locks are held and a safepoint is active. For non-stalling locking the short critical sections make that a non-issue, but with stalling it could well be an issue.
> 
> You mean a non-issue in practice, but not in theory, right? If we do have a reachable deadlock state, then it doesn't matter whether UL stalls or not; stalling just increases the probability of that state being reached. Is this what you mean? I'm having some trouble constructing the state you're talking about in my head. Stalling only occurs at log sites; it does not affect the log writer. Therefore there is a progress guarantee for log sites, in the sense that at least one log site will always make progress. This would change if the consumer thread were unable to progress, for example due to a safepoint-induced deadlock.
> 
> 
> As an aside: there is a starvation risk, as the lock is not fair. There is also the issue of a large log message being unable to progress: it stalls waiting for memory while smaller messages keep progressing, which prevents it from ever getting enough memory. I don't actually think that these are problems in practice, as we typically do have moments of quiescence, throughput is in the upper MiB/s range, and the log buffer is several MiB in size.
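
To make the restructuring of the wait code concrete, here is a minimal, self-contained sketch. It assumes hypothetical stand-ins (`Thread`, `Monitor`, `WaitKind`, `stall_until`, `observe_wait_kind`) built on `std::mutex` and `std::condition_variable`; they mimic only the shape of the HotSpot API named above (`is_Java_thread()`, `Monitor::wait`, `Monitor::wait_without_safepoint_check`), not its real behavior or signatures.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical stand-ins that model only the *shape* of the HotSpot API
// discussed above; they are not the real Thread/Monitor classes.
struct Thread {
  bool java_thread;
  bool is_Java_thread() const { return java_thread; }
};

enum class WaitKind { None, SafepointCheck, NoSafepointCheck };

struct Monitor {
  std::mutex mu;
  std::condition_variable cv;
  WaitKind last_wait = WaitKind::None;  // records which variant ran

  // Models Monitor::wait: the real one performs safepoint checks,
  // so it may only be called by JavaThreads.
  void wait(std::unique_lock<std::mutex>& lk) {
    assert(lk.owns_lock());  // caller must hold the monitor
    last_wait = WaitKind::SafepointCheck;
    cv.wait(lk);
  }
  // Models Monitor::wait_without_safepoint_check, usable by any thread.
  void wait_without_safepoint_check(std::unique_lock<std::mutex>& lk) {
    assert(lk.owns_lock());  // caller must hold the monitor
    last_wait = WaitKind::NoSafepointCheck;
    cv.wait(lk);
  }
};

// Hypothetical stalling helper: the caller, not the monitor, picks the wait
// variant from the thread kind, and re-checks the condition after each
// wakeup because wakeups may be spurious.
void stall_until(Monitor& m, const Thread& self, const std::function<bool()>& ready) {
  std::unique_lock<std::mutex> lk(m.mu);
  while (!ready()) {
    if (self.is_Java_thread()) {
      m.wait(lk);
    } else {
      m.wait_without_safepoint_check(lk);
    }
  }
}

// Demo: stall one thread, then release it and report which variant it used.
WaitKind observe_wait_kind(bool java_thread) {
  Monitor m;
  bool space_available = false;  // guarded by m.mu
  std::thread log_site([&] {
    stall_until(m, Thread{java_thread}, [&] { return space_available; });
  });
  for (;;) {
    {
      std::lock_guard<std::mutex> g(m.mu);
      if (m.last_wait != WaitKind::None) {  // log_site is now blocked in a wait
        space_available = true;
        m.cv.notify_all();
        break;
      }
    }
    std::this_thread::yield();
  }
  log_site.join();
  return m.last_wait;
}
```

Here `observe_wait_kind(false)` returns `WaitKind::NoSafepointCheck` and `observe_wait_kind(true)` returns `WaitKind::SafepointCheck`: the thread-kind check sits at the call site, which is what the reworked API requires.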

As we discussed off-list, the starvation issue is a real problem that needs to be addressed.
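
One possible way to address the starvation, sketched under explicit assumptions: stalled log sites take a ticket and are admitted strictly in FIFO order, so a large message at the head of the queue cannot be overtaken indefinitely by smaller ones. `FairStallingBuffer`, its methods and `demo_no_overtaking` are hypothetical and track only byte counts; this is not how async UL manages its buffer.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical FIFO-fair stalling buffer (illustration only, not the async
// UL design). Stalled log sites are admitted strictly in ticket order.
class FairStallingBuffer {
 public:
  explicit FairStallingBuffer(std::size_t capacity) : capacity_(capacity) {}

  // Called by a log site: stalls until it is first in line and `bytes` fit.
  // Returns false, without stalling, for a message that could never fit.
  bool enqueue(std::size_t bytes) {
    if (bytes > capacity_) return false;
    std::unique_lock<std::mutex> lk(mu_);
    const std::uint64_t my_ticket = next_ticket_++;
    cv_.wait(lk, [&] { return my_ticket == serving_ && used_ + bytes <= capacity_; });
    used_ += bytes;
    ++serving_;        // admit the next ticket holder
    cv_.notify_all();  // it may fit in the remaining space
    return true;
  }

  // Called by the consumer (log writer) after it has flushed `bytes`.
  void release(std::size_t bytes) {
    std::lock_guard<std::mutex> lk(mu_);
    assert(bytes <= used_);
    used_ -= bytes;
    cv_.notify_all();
  }

  std::size_t used() const { std::lock_guard<std::mutex> lk(mu_); return used_; }
  std::uint64_t tickets_issued() const { std::lock_guard<std::mutex> lk(mu_); return next_ticket_; }

 private:
  const std::size_t capacity_;
  mutable std::mutex mu_;
  std::condition_variable cv_;
  std::size_t used_ = 0;
  std::uint64_t next_ticket_ = 0;
  std::uint64_t serving_ = 0;
};

// Demo: a 10-byte buffer holding 8 bytes, a stalled 6-byte message, then a
// 2-byte one that would fit but must queue behind it. Returns the fill level
// observed while both are stalled (8 means the small one did not overtake).
std::size_t demo_no_overtaking() {
  FairStallingBuffer buf(10);
  buf.enqueue(8);
  std::thread big_msg([&] { buf.enqueue(6); });
  while (buf.tickets_issued() < 2) std::this_thread::yield();
  std::thread small_msg([&] { buf.enqueue(2); });
  while (buf.tickets_issued() < 3) std::this_thread::yield();
  const std::size_t observed = buf.used();
  buf.release(8);  // consumer flushes; big_msg is admitted first, then small_msg
  big_msg.join();
  small_msg.join();
  return observed;
}
```

`demo_no_overtaking()` returns 8: the 2-byte message would fit in the remaining space, but it waits behind the stalled 6-byte one until the consumer releases memory. The trade-off is head-of-line blocking: while a large message waits, every log site queued behind it stalls too.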

Overall, the memory management for async UL seems somewhat problematic, and it adds to the complexity in the stalling case. But that is probably a topic for another day.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17757#discussion_r1552628470