RFR: 8342818: Implement CPU Time Profiling for JFR
Markus Grönlund
mgronlun at openjdk.org
Tue Oct 29 11:00:22 UTC 2024
On Wed, 28 Aug 2024 16:47:21 GMT, Johannes Bechberger <jbechberger at openjdk.org> wrote:
> This is the code for the [JEP draft: CPU Time based profiling for JFR].
Changes requested by mgronlun (Reviewer).
Changes requested by mgronlun (Reviewer).
Changes requested by mgronlun (Reviewer).
src/hotspot/share/jfr/metadata/metadata.xml line 945:
> 943: </Event>
> 944:
> 945: <Event name="CPUTimeExecutionSampleThrottles" category="Java Virtual Machine, Profiling" label="CPU Time Method Profiling Dropped Samples" description="Records that the CPU time dropped samples because of internal throttling"
A better name for this event is "CPUTimeExecutionSampleLoss".
This is because JFR already has a general concept of "DataLoss," so this name would better align with it.
Please update references to "dropped" -> "lost"
src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 92:
> 90: case _thread_blocked:
> 91: case _thread_in_native:
> 92: case _thread_in_vm: // walking in vm causes weird bugs (assertions in G1 fail), so don't
Is this comment still relevant?
src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 102:
> 100: static bool is_excluded(JavaThread* thread) {
> 101: return thread->is_hidden_from_external_view() ||
> 102: (os::is_readable_pointer(thread->jfr_thread_local()) && thread->jfr_thread_local()->is_excluded());
thread->jfr_thread_local() is always safe. Its a member struct of Thread.
src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 135:
> 133: JfrTicks _start_time;
> 134: JfrTicks _end_time;
> 135: JavaThread* _sampled_thread;
You can store the thread's traceid (u8) here instead of the Thread* itself.
Then, later, you can skip using both JFRRecordSampledThreadCallback and crash protection.
src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 154:
> 152: void set_end_time(JfrTicks end_time) { _end_time = end_time; }
> 153: JfrTicks end_time() const { return _end_time; }
> 154: void set_sampled_thread(JavaThread* thread) { Atomic::store(&_sampled_thread, thread); }
Why atomic store? Again, try to use the traceid of the thread (from the JfrThreadLocal (updated dynamically depending on vthread or regular thread).
src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 197:
> 195: } else {
> 196: _error = ERROR_NO_TOPFRAME;
> 197: return;
Can remove.
src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.hpp line 29:
> 27:
> 28: #include "jfr/utilities/jfrAllocation.hpp"
> 29: #include "runtime/javaThread.hpp"
You don't need to include the "runtime/javaThread.hpp" header.
Instead, please hoist the forward declaration of "class JavaThread" to outside the "#if defined(LINUX)"
src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdLoadBarrier.cpp line 102:
> 100: }
> 101:
> 102: JfrBuffer* JfrTraceIdLoadBarrier::renew_sampler_enqueue_buffer(Thread* thread, size_t size) {
JfrCPUTimeThreadSampler should not need to take this specialized path because it does not suspend threads. It can be left as is.
src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdLoadBarrier.hpp line 75:
> 73: friend class JfrStackTrace;
> 74: friend class JfrThreadSampler;
> 75: friend class JfrCPUTimeThreadSampler;
Not needed.
src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdLoadBarrier.hpp line 83:
> 81: static void load_barrier(const Klass* klass);
> 82: static JfrBuffer* get_sampler_enqueue_buffer(Thread* thread);
> 83: static JfrBuffer* renew_sampler_enqueue_buffer(Thread* thread, size_t size = 0);
Leave as is.
src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceIdLoadBarrier.inline.hpp line 92:
> 90: auto holder = method->method_holder();
> 91: auto res = load(holder, method);
> 92: return res;
Why this change?
src/hotspot/share/jfr/recorder/stacktrace/jfrAsyncStackTrace.cpp line 2:
> 1: /*
> 2: * Copyright (c) 2011, 2023, Oracle and/or its affiliates. All rights reserved.
Copyrights?
src/hotspot/share/jfr/recorder/stacktrace/jfrAsyncStackTrace.cpp line 35:
> 33:
> 34: JfrAsyncStackFrame::JfrAsyncStackFrame(const Method* method, int bci, u1 type, int lineno, const InstanceKlass* ik) :
> 35: _klass(ik), _method(method), _line(lineno), _bci(bci), _type(type) {}
Might be possible to delay processing of line numbers.
src/hotspot/share/jfr/recorder/stacktrace/jfrAsyncStackTrace.cpp line 49:
> 47:
> 48: static inline bool is_full(const JfrBuffer* enqueue_buffer) {
> 49: return enqueue_buffer->free_size() < min_valid_free_size_bytes;
The enqueue_buffer constructs can be removed from this code because the thread does not suspend other threads. Hence, it can rely on the regular JfrTraceId load barrier tagging mechanism (which renews and enqueues new buffers implicitly).
src/hotspot/share/jfr/recorder/stacktrace/jfrAsyncStackTrace.cpp line 99:
> 97: type = JfrStackFrame::FRAME_INLINE;
> 98: }
> 99: _frames[count] = JfrAsyncStackFrame(method, bci, type, method->line_number_from_bci(bci), method->method_holder());
Calculating line number information is expensive. There is no need to do it for the sampled thread; it can be done by the thread registering the trace into the stack trace repository (it ejects an attempt to store a JfrStackTrace with no line number information).
src/hotspot/share/jfr/recorder/stacktrace/jfrAsyncStackTrace.cpp line 118:
> 116: const JfrAsyncStackTrace* _asyncTrace;
> 117: JfrStackTrace* _trace;
> 118: const JfrBuffer* const _enqueue_buffer;
Remove, not necessary.
src/hotspot/share/jfr/recorder/stacktrace/jfrAsyncStackTrace.cpp line 126:
> 124: for (u4 i = 0; i < _nr_of_frames; i++) {
> 125: const JfrAsyncStackFrame& frame = _frames[i];
> 126: if (!Method::is_valid_method(frame._method) || is_full(enqueue_buffer)) {
We should try to remove is_full(enqueue_buffer)).
src/hotspot/share/jfr/recorder/stacktrace/jfrAsyncStackTrace.hpp line 28:
> 26: #define SHARE_JFR_RECORDER_STACKTRACE_JFRASYNCSTACKTRACE_HPP
> 27:
> 28: #include "jfr/recorder/stacktrace/jfrStackTrace.hpp"
Including this header is not needed.
src/hotspot/share/jfr/recorder/stacktrace/jfrStackTrace.hpp line 30:
> 28: #include "jfr/utilities/jfrAllocation.hpp"
> 29: #include "jfr/utilities/jfrTypes.hpp"
> 30: #include "runtime/vframe.inline.hpp"
Do not include an .inline.hpp into a .hpp.
src/hotspot/share/jfr/recorder/stacktrace/jfrStackTrace.hpp line 41:
> 39: private:
> 40: bool _vthread;
> 41: const ContinuationEntry* _cont_entry;
You need a fwd declaration for ContinuationEntry here?
src/hotspot/share/jfr/support/jfrThreadLocal.hpp line 76:
> 74: bool _dead;
> 75: LINUX_ONLY(bool _has_cpu_timer = false);
> 76: LINUX_ONLY(timer_t _cpu_timer);
No include for timer_t?
src/hotspot/share/runtime/javaThread.cpp line 27:
> 25:
> 26: #include "precompiled.hpp"
> 27: #include "jfr/periodic/sampling/jfrCPUTimeThreadSampler.hpp"
We try to channel all (most) VM code through the jfr.hpp as a single entry point to the JFR system. Can these hooks be incorporated in the JfrThreadLocal::on_start() and JfrThreadLocal::on_exit()?
If so, you don't need to touch anything in javaThread.cpp.
src/hotspot/share/runtime/javaThread.cpp line 759:
> 757: }
> 758:
> 759: JFR_ONLY(JfrCPUTimeThreadSampling::on_javathread_create(this));
See if you can move these into JfrThreadLocal::on_start() and JfrThreadLocal::on_exit().
src/jdk.jfr/share/classes/jdk/jfr/internal/query/view.ini line 419:
> 417:
> 418: [application.cpu-time-overflowed-samples]
> 419: label = "CPU Time Samples not recorded because of internal throttling"
Align rename for CPUTimeSampleLoss
src/jdk.jfr/share/classes/jdk/jfr/internal/query/view.ini line 420:
> 418: [application.cpu-time-overflowed-samples]
> 419: label = "CPU Time Samples not recorded because of internal throttling"
> 420: table = "COLUMN 'Dropped samples'
Lost samples
test/jdk/jdk/jfr/event/profiling/TestMultipleRecordings.java line 1:
> 1: /*
What is the purpose of this test?
-------------
PR Review: https://git.openjdk.org/jdk/pull/20752#pullrequestreview-2279856307
PR Review: https://git.openjdk.org/jdk/pull/20752#pullrequestreview-2281421499
PR Review: https://git.openjdk.org/jdk/pull/20752#pullrequestreview-2282590777
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1744364821
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1804541089
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1744526731
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1744529196
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1744531604
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1745258406
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1743597903
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1744368940
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1744369131
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1744369305
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1804534948
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1744517980
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1745274936
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1744512910
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1745270064
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1744513289
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1745276007
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1744371764
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1744373520
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1744374340
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1744375984
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1744488057
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1745272051
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1744501220
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1744502178
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r1744506419
More information about the hotspot-dev
mailing list