RFR: 8342818: Implement CPU Time Profiling for JFR
Johannes Bechberger
jbechberger at openjdk.org
Tue Oct 29 11:00:23 UTC 2024
On Thu, 5 Sep 2024 13:16:22 GMT, Markus Grönlund <mgronlun at openjdk.org> wrote:
>> This is the code for the [JEP draft: CPU Time based profiling for JFR].
>
> Changes requested by mgronlun (Reviewer).
Regarding enqueue buffer handling (cc @mgronlun): we need to check regularly whether we need to renew the buffer. If we omit this, than it leads to many errors in the method resolve logic (catched by the ThreadCrashProtection).
> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 92:
>
>> 90: case _thread_blocked:
>> 91: case _thread_in_native:
>> 92: case _thread_in_vm: // walking in vm causes weird bugs (assertions in G1 fail), so don't
>
> Is this comment still relevant?
It isn't. Good catch.
> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 102:
>
>> 100: static bool is_excluded(JavaThread* thread) {
>> 101: return thread->is_hidden_from_external_view() ||
>> 102: (os::is_readable_pointer(thread->jfr_thread_local()) && thread->jfr_thread_local()->is_excluded());
>
> thread->jfr_thread_local() is always safe. Its a member struct of Thread.
removed
> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 135:
>
>> 133: JfrTicks _start_time;
>> 134: JfrTicks _end_time;
>> 135: JavaThread* _sampled_thread;
>
> You can store the thread's traceid (u8) here instead of the Thread* itself.
>
> Then, later, you can skip using both JFRRecordSampledThreadCallback and crash protection.
No, we can't. This crashes the sampler. The problem is, that obtaining the traceid via `JfrThreadLocal::thread_id(jt)` acquires a lock (and more) when the traceid is not set already. So we can't use it in the signal handler.
> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 154:
>
>> 152: void set_end_time(JfrTicks end_time) { _end_time = end_time; }
>> 153: JfrTicks end_time() const { return _end_time; }
>> 154: void set_sampled_thread(JavaThread* thread) { Atomic::store(&_sampled_thread, thread); }
>
> Why atomic store? Again, try to use the traceid of the thread (from the JfrThreadLocal (updated dynamically depending on vthread or regular thread).
Removed the atomic store
> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 197:
>
>> 195: } else {
>> 196: _error = ERROR_NO_TOPFRAME;
>> 197: return;
>
> Can remove.
removed
> src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdLoadBarrier.hpp line 75:
>
>> 73: friend class JfrStackTrace;
>> 74: friend class JfrThreadSampler;
>> 75: friend class JfrCPUTimeThreadSampler;
>
> Not needed.
removed
> src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdLoadBarrier.inline.hpp line 92:
>
>> 90: auto holder = method->method_holder();
>> 91: auto res = load(holder, method);
>> 92: return res;
>
> Why this change?
Good catch, I'll remove it
> src/hotspot/share/jfr/recorder/stacktrace/jfrAsyncStackTrace.cpp line 35:
>
>> 33:
>> 34: JfrAsyncStackFrame::JfrAsyncStackFrame(const Method* method, int bci, u1 type, int lineno, const InstanceKlass* ik) :
>> 35: _klass(ik), _method(method), _line(lineno), _bci(bci), _type(type) {}
>
> Might be possible to delay processing of line numbers.
As explained above: I think this would not be a good idea, as this would lead to additional load in the queue consuming thread, especially bad when the profiled systems has lots of cores.
> src/hotspot/share/jfr/recorder/stacktrace/jfrAsyncStackTrace.cpp line 49:
>
>> 47:
>> 48: static inline bool is_full(const JfrBuffer* enqueue_buffer) {
>> 49: return enqueue_buffer->free_size() < min_valid_free_size_bytes;
>
> The enqueue_buffer constructs can be removed from this code because the thread does not suspend other threads. Hence, it can rely on the regular JfrTraceId load barrier tagging mechanism (which renews and enqueues new buffers implicitly).
removed
> src/hotspot/share/jfr/recorder/stacktrace/jfrAsyncStackTrace.cpp line 99:
>
>> 97: type = JfrStackFrame::FRAME_INLINE;
>> 98: }
>> 99: _frames[count] = JfrAsyncStackFrame(method, bci, type, method->line_number_from_bci(bci), method->method_holder());
>
> Calculating line number information is expensive. There is no need to do it for the sampled thread; it can be done by the thread registering the trace into the stack trace repository (it ejects an attempt to store a JfrStackTrace with no line number information).
Why wouldn't we then do this in parallel in the sampled threads? Especially on systems with lot's of cores, this could otherwise be an issue and drastically increase the amount of dropped samples.
> src/hotspot/share/jfr/recorder/stacktrace/jfrAsyncStackTrace.cpp line 118:
>
>> 116: const JfrAsyncStackTrace* _asyncTrace;
>> 117: JfrStackTrace* _trace;
>> 118: const JfrBuffer* const _enqueue_buffer;
>
> Remove, not necessary.
removed
> src/hotspot/share/jfr/recorder/stacktrace/jfrAsyncStackTrace.cpp line 126:
>
>> 124: for (u4 i = 0; i < _nr_of_frames; i++) {
>> 125: const JfrAsyncStackFrame& frame = _frames[i];
>> 126: if (!Method::is_valid_method(frame._method) || is_full(enqueue_buffer)) {
>
> We should try to remove is_full(enqueue_buffer)).
This code is similar to the code in the old sampler and ensures that the buffer is large enough.
> src/hotspot/share/jfr/recorder/stacktrace/jfrAsyncStackTrace.hpp line 28:
>
>> 26: #define SHARE_JFR_RECORDER_STACKTRACE_JFRASYNCSTACKTRACE_HPP
>> 27:
>> 28: #include "jfr/recorder/stacktrace/jfrStackTrace.hpp"
>
> Including this header is not needed.
removed
> src/hotspot/share/runtime/javaThread.cpp line 27:
>
>> 25:
>> 26: #include "precompiled.hpp"
>> 27: #include "jfr/periodic/sampling/jfrCPUTimeThreadSampler.hpp"
>
> We try to channel all (most) VM code through the jfr.hpp as a single entry point to the JFR system. Can these hooks be incorporated in the JfrThreadLocal::on_start() and JfrThreadLocal::on_exit()?
>
> If so, you don't need to touch anything in javaThread.cpp.
done
> src/hotspot/share/runtime/javaThread.cpp line 759:
>
>> 757: }
>> 758:
>> 759: JFR_ONLY(JfrCPUTimeThreadSampling::on_javathread_create(this));
>
> See if you can move these into JfrThreadLocal::on_start() and JfrThreadLocal::on_exit().
Seems to work
> test/jdk/jdk/jfr/event/profiling/TestMultipleRecordings.java line 1:
>
>> 1: /*
>
> What is the purpose of this test?
Tests that restarting the sampler works.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20752#issuecomment-2348807804
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1804543106
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1756642656
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1756750417
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1756816348
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1756645107
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1756623496
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1804537662
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1756712764
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1756635808
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1752030136
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1756636093
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1752031872
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1756629453
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1756817283
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1756816729
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1756634818
More information about the hotspot-dev
mailing list