RFR: 8295376: Improve debug agent virtual thread performance when no debugger is attached
Chris Plummer
cjplummer at openjdk.org
Fri Oct 28 01:36:27 UTC 2022
On Fri, 28 Oct 2022 00:52:31 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:
>> This PR disables VIRTUAL_THREAD_START/END events when no debugger is attached. Some customers like to launch (in production mode) the JVM with the debug agent enabled, but with no debugger attached. This usually has a very minimal performance impact as long as the debugger remains unattached. Launching the JVM in this manner allows a debugger to later be attached if there are issues.
>>
>> In the past the debug agent has always kept THREAD_START/END events enabled, even when no debugger is attached. It does so to keep track of all existing threads, and this generally has minimal performance impact. However, with virtual threads there can be a very large number of virtual threads constantly being created and destroyed. With SkyNet, this results in as much as a 250x performance regression when the debug agent is enabled. A large part of this is due to all the VIRTUAL_THREAD_START/END events the debug agent has to deal with. When these events are disabled, the performance slowdown is more like 10x. This remaining 10x comes from JVMTI and is not addressed in this CR.
>>
>> The fix for this performance issue is to disable VIRTUAL_THREAD_START/END events when the debugger is not attached. This also requires forgetting about all known virtual threads when the debugger detaches since there will be no VIRTUAL_THREAD_END event to notify the debug agent when the virtual thread has died.
>>
>> I decided it was appropriate to not introduce this behavior change if the debug agent is launched with `includevirtualthreads=y` (the default is 'n'). `includevirtualthreads=y` is meant to help legacy debuggers, and indicates that the debug agent should try to track (remember) all vthreads that are running, and return them when `VirtualMachine.GetAllThreads` is called. `includevirtualthreads=y` sets `gdata->includeVThreads` true, so initially I had these changes conditional on `gdata->includeVThreads`. However, most of our testing is done with `includevirtualthreads=y` (an unfortunate necessity because of how most of the test are written). This made it hard to test these changes. So I introduced `gdata->rememberVThreadsWhenDisconnected`, and made the changes conditional on it instead. It defaults to be the same as `gdata->includeVThreads`. However, I did some testing with it manually set it to always be false to get better testing coverage. This could be made a debug agent flag se
ttable on the command line, but I opted not to since it's not really of interest to the typical debug agent user. I suppose it could be an undocumented flag just to use in testing. I can make this change if others agree.
>>
>> A couple notes on code flow that might help with understanding the changes:
>> - When the connection is initiated, `debugLoop_run()` is called, and it calls `eventHandler_onConnect()` to enable the START/END callbacks.
>> - When the connection is dropped, `debugLoop_run()` calls `debugInit_reset()`, which calls `eventHandler_reset()` to disable the START/END callbacks. A bit later `debugInit_reset()` calls `threadControl_reset()` to free up the vthread ThreadNodes.
>
> src/jdk.jdwp.agent/share/native/libjdwp/eventHandler.c line 1684:
>
>> 1682: }
>> 1683: debugMonitorExit(callbackLock);
>> 1684: waiting_for_active_callbacks = JNI_FALSE;
>
> The set operations at lines 1678 and 1684 are not protected by any lock.
> Would it be more safe to put them inside critical section at lines 1679-1683?
> The variable `waiting_for_active_callbacks` is checked by the event callbacks.
Only the "JDWP Command Reader" thread can ever enter this code. So we have one writer but multiple readers. The readers all have to grab the callbackLock before reading `waiting_for_active_callbacks`, and also before touching `active_callbacks`. It is possible that the flag can be set and then `END_CALLBACK()` does the notification before the above code enters the lock and waits. But that's ok because it also means that `active_callbacks` is set to 0, so no wait will be done. If another event thread races and sets it back up to 1, then that is also ok because that means END_CALLBACK() will be invoked again and do the notification again.
However, there also seems to be no reason not to move the code withing the critical section, so I can do that if you'd like. The reason it ended up outside is because I used to have a different flag (I think it was called `gdata->resetting`) that was set in `debugInit_reset()` (a couple of frames above this one), but I decided it would be easier to understand if I limited the scope of when the flag was set, and also give it a name that implied a narrower scope of use. However, I maintained the same semantics of setting it outside the lock.
-------------
PR: https://git.openjdk.org/jdk/pull/10865
More information about the serviceability-dev
mailing list