RFR: 8157023: Integrate NMT with JFR

Fri Dec 2 15:36:16 UTC 2022

On Fri, 2 Dec 2022 14:08:02 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>>> I think we do, since summary info contains information about committed thread stack size. 
>> 
>> You are totally correct, somehow missed that part of `VirtualMemorySummary::snapshot()`. So yes, we should really look at only doing one snapshot per period. Any feeling for exactly how expensive this walk is?
>> 
>>> To serialize jcmds running in parallel. Since jcmd is done by the attach listener thread, the only way I can think of would be if someone in parallel executes the command via mbean? Not sure.
>> 
>> Ok, so if executed via the mbean it would be executed in another thread but still use `DCMD::execute()` and the lock to make sure those are synchronized. Anyhow, do you agree with my above understanding that what is protected by the lock is the "cached baseline"? So as long as JFR doesn't use it, the lock should not be needed?
>> 
>>> Note that there is no way to get consistent counter values across all categories anyway. All the counters are updated atomically without lock, and that is all we can afford in malloc. Counters are not synchronized among themselves.
>> 
>> Agreed, this is a snapshot at a given point of a moving target.
>
>> Looking a bit more at this. To me it seems like we are protecting the stored baseline with the lock is:
>> 
>> ```
>> class MemTracker : AllStatic {
>> ...
>>   // Stored baseline
>>   static MemBaseline      _baseline;
>> ...
>> };
>> ```
>> 
>> That the `NMTDCmd` code uses through `MemTracker::get_baseline()`. So if the JFR events keep its own baseline we should be fine without the lock. Or am I missing something here?
> 
> I think this should be safe. 
> 
> But I still think about thread stack probing. Your sampler will compete via ThreadCritical with thread creation and destruction (and a number of other unrelated things) because it needs to walk the virtual memory region chains. I think NMT reporting was never planned to be done in high volume. 
> 
> I experimented a lot with a large number of threads and NMT thread stack probing. Its results are sometimes surprising, e.g. I regularly get much higher numbers from NMT than for process RSS. E.g. NMT tells me we have 4g committed thread stack when RSS is just 630M. This is without swapping.
> 
> So I am not sure how reliable thread stack probing actually is in reality. I know that Zhengyu put a lot of work into it and it needed a number of iterations. Since all the rest together is just reading out counters and reporting them, thread stack probing is the only unknown time-wise. It is O(n) over both number of threads and thread stack size and will potentially lock. We can improve matters somewhat with smarter locking, but the O(n) part remains.
> 
> Maybe leave that stuff out and just report the normal counters? Or can we live with large jitters here? This is a JFR question and I'm not a JFR developer :)

Yes, maybe creating a special baseline-call that skips the thread-stacks would be ok. As you say, if these are expensive it might be problematic to do them in the periodic thread.

-------------

PR: https://git.openjdk.org/jdk/pull/11449