RFR: 8342818: Implement CPU Time Profiling for JFR

Tue Oct 29 11:00:24 UTC 2024

On Thu, 12 Sep 2024 12:20:51 GMT, Johannes Bechberger <jbechberger at openjdk.org> wrote:

>> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 135:
>> 
>>> 133:   JfrTicks _start_time;
>>> 134:   JfrTicks _end_time;
>>> 135:   JavaThread* _sampled_thread;
>> 
>> You can store the thread's traceid (u8) here instead of the Thread* itself.
>> 
>> Then, later, you can skip using both JFRRecordSampledThreadCallback and crash protection.
>
> No, we can't. This crashes the sampler. The problem is, that obtaining the traceid via `JfrThreadLocal::thread_id(jt)` acquires a lock (and more) when the traceid is not set already. So we can't use it in the signal handler.

Hmm. That's not good. We could eliminate the spinlock, but that will require a thread assigning its id early in thread_state_new to prevent our old sampler from hitting it (and also the new one from skipping new threads). However, this is complex because it involves reading the tid from a Java oop, so we need to investigate if that is a valid thing to do, in those states. 

I have to think some more about this. Too bad you need to do this complexity.

>> src/hotspot/share/jfr/recorder/stacktrace/jfrAsyncStackTrace.cpp line 35:
>> 
>>> 33: 
>>> 34: JfrAsyncStackFrame::JfrAsyncStackFrame(const Method* method, int bci, u1 type, int lineno, const InstanceKlass* ik) :
>>> 35:   _klass(ik), _method(method), _line(lineno), _bci(bci), _type(type) {}
>> 
>> Might be possible to delay processing of line numbers.
>
> As explained above: I think this would not be a good idea, as this would lead to additional load in the queue consuming thread, especially bad when the profiled systems has lots of cores.

I see your point better now. Makes sense to distribute the load.

>> src/hotspot/share/jfr/recorder/stacktrace/jfrAsyncStackTrace.cpp line 99:
>> 
>>> 97:       type = JfrStackFrame::FRAME_INLINE;
>>> 98:     }
>>> 99:     _frames[count] = JfrAsyncStackFrame(method, bci, type, method->line_number_from_bci(bci), method->method_holder());
>> 
>> Calculating line number information is expensive. There is no need to do it for the sampled thread; it can be done by the thread registering the trace into the stack trace repository (it ejects an attempt to store a JfrStackTrace with no line number information).
>
> Why wouldn't we then do this in parallel in the sampled threads? Especially on systems with lot's of cores, this could otherwise be an issue and drastically increase the amount of dropped samples.

Makes sense.

>> test/jdk/jdk/jfr/event/profiling/TestMultipleRecordings.java line 1:
>> 
>>> 1: /*
>> 
>> What is the purpose of this test?
>
> Tests that restarting the sampler works.

It should be named something related to the CPU sampler, then. We already have many tests for starting multiple recordings.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1769533459
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1769530925
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1769530945
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1773038807