RFR: 8337789: JEP 509: JFR CPU-Time Profiling (Experimental) [v47]

Patricio Chilano Mateo pchilanomate at openjdk.org
Mon May 12 18:22:07 UTC 2025


On Mon, 5 May 2025 14:22:07 GMT, Johannes Bechberger <jbechberger at openjdk.org> wrote:

>> This is the code for the [JEP draft: CPU Time based profiling for JFR].
>> 
>> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests).
>> 
>> A version based on the cooperative sampling JEP can be found [here](https://github.com/parttimenerd/jdk/tree/parttimenerd_cooperative_cpu_time_sampler).
>
> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Simplify local trace stack

Hi, I looked through the changes and added some comments below. Thanks.

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 107:

> 105:   JavaThread* jt;
> 106: 
> 107:   if (raw_thread == nullptr || !raw_thread->is_Java_thread() ||

Why would raw_thread be nullptr? I see we add the timer after the JavaThread has been initialized and we remove it before the JavaThread is deleted. Also we only create the timer for JavaThreads already, so these look like they should be asserts.

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 112:

> 110:   }
> 111: 
> 112:   if (jt->is_JfrSampler_thread()) {

Sampler thread is a non-JavaThread so we shouldn’t get here.

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 369:

> 367:         if (count < 500) {
> 368:           Atomic::release_store(&e->_state, state_empty(head));
> 369:           Atomic::dec(&_size);

Could you explain this scenario in detail? If we failed the previous branches then we are in a different generation but then I don’t see why doing this is correct.

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 630:

> 628:  // assert(!stack.is_empty(), "invariant");
> 629:   for (u4 i = 0; i < stack.size(); i++) {
> 630:     JfrCPUTimeThreadSampler::record_event(thread, stack.at(i), tmp_stackframes);

Shouldn’t we return the trace to the _frame_store as with on_safepoint()?

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 678:

> 676: void JfrCPUTimeThreadSampler::on_safepoint(JavaThread* thread) {
> 677:   JfrThreadLocal* jtl = thread->jfr_thread_local();
> 678:   if (jtl == nullptr) {

Shouldn’t this be an assert that `jtl != nullptr`?

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 738:

> 736:   long max_chunk_size = JfrOptionSet::max_chunk_size() == 0 ? 12 * 1024 * 1024 : JfrOptionSet::max_chunk_size() / 2;
> 737:   int max_size = max_chunk_size / 2 / wordSize / JfrOptionSet::stackdepth();
> 738:   if (trace_stack_size < 20 * 4) {

Based on the above initialization `trace_stack_size` will be at least 100 so this should never happen.

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 819:

> 817:   }
> 818:   JfrThreadLocal* jtl = jt->jfr_thread_local();
> 819:   if (jtl == nullptr) {

Shouldn’t this be an assert `jtl != nullptr`?

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 823:

> 821:   }
> 822:   if (jt->thread_state() != _thread_in_native) {
> 823:     jtl->set_wants_cpu_time_out_of_safepoint_recording(false);

Why do we clear the flag this early before we acquired the lock?

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 836:

> 834:     int64_t period = get_sampling_period() * (info->si_overrun + 1);
> 835:     trace->record_trace(jt, context, period);
> 836:     if (jtl->cpu_time_jfr_stack().enqueue(trace)) {

If this fails shouldn’t we return the trace to the `_frame_store`?

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 887:

> 885:   sev.sigev_signo = SIG;
> 886:   sev.sigev_value.sival_ptr = &t;
> 887:   ((int*)&sev.sigev_notify)[1] = thread->osthread()->thread_id();

Is there a reason why not use `sev._sigev_un._tid` directly if this is Linux only?

src/hotspot/share/jfr/periodic/sampling/jfrThreadSampler.cpp line 185:

> 183: * leave memory (in case of signal / exception) in an inconsistent state. */
> 184: void OSThreadSampler::protected_task(const SuspendedThreadTaskContext& context) {
> 185:   NoResourceMark nrm;

I run an execution-time profile on some of the vthread tests (e.g. test/jdk/java/lang/Thread/virtual/stress/Skynet.java) and got crashes because of this change. Here is the crash and stacktrace from hs_err file:


# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (src/hotspot/share/memory/resourceArea.cpp:57), pid=673181, tid=673216
#  fatal error: memory leak: allocating with NoResourceMark
#
# JRE version: OpenJDK Runtime Environment (25.0) (fastdebug build 25-internal-2025-04-25-2335278.pchilano...)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 25-internal-2025-04-25-2335278.pchilano..., mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x187e4f1]  ResourceArea::verify_has_resource_mark()+0x51
#

Stack: [0x00007f14fbbfd000,0x00007f14fbcfd000],  sp=0x00007f14fbcfb480,  free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x187e4f1]  ResourceArea::verify_has_resource_mark()+0x51  (resourceArea.cpp:57)
V  [libjvm.so+0x187e58a]  resource_allocate_bytes(unsigned long, AllocFailStrategy::AllocFailEnum)+0x4a  (resourceArea.inline.hpp:33)
V  [libjvm.so+0x1b1220e]  char* UNICODE::as_utf8<signed char>(signed char const*, unsigned long&)+0x3fe  (utf8.cpp:531)
V  [libjvm.so+0x10371ad]  java_lang_String::as_utf8_string(oop, unsigned long&)+0x14d  (javaClasses.cpp:658)
V  [libjvm.so+0x114fe2e]  JfrThreadName::name(Thread const*, int&, oop)+0x2ae  (jfrThreadState.cpp:124)
V  [libjvm.so+0x115c9af]  JfrThreadConstant::serialize(JfrCheckpointWriter&)+0x9f  (jfrType.cpp:292)
V  [libjvm.so+0x115e115]  JfrTypeManager::write_checkpoint(Thread*, unsigned long, oop)+0x205  (jfrTypeManager.cpp:132)
V  [libjvm.so+0x10884c2]  JfrCheckpointManager::write_checkpoint(Thread*, unsigned long, oop)+0x52  (jfrCheckpointManager.cpp:677)
V  [libjvm.so+0x11442e7]  JfrThreadLocal::set_vthread_epoch(JavaThread const*, unsigned long, unsigned short)+0x207  (jfrThreadLocal.cpp:381)
V  [libjvm.so+0x1144858]  JfrThreadLocal::thread_id(Thread const*)+0xf8  (jfrThreadLocal.cpp:428)
V  [libjvm.so+0x1148876]  OSThreadSampler::protected_task(SuspendedThreadTaskContext const&)+0x106  (jfrThreadSampler.cpp:203)
V  [libjvm.so+0x1904857]  SuspendedThreadTask::internal_do_task()+0x47  (signals_posix.cpp:1840)
V  [libjvm.so+0x1a257c1]  SuspendedThreadTask::run()+0x11  (suspendedThreadTask.cpp:29)
V  [libjvm.so+0x11489ea]  JfrThreadSampleClosure::sample_thread_in_java(JavaThread*, JfrStackFrame*, unsigned int)+0xba  (jfrThreadSampler.cpp:210)
V  [libjvm.so+0x1149813]  JfrThreadSampleClosure::do_sample_thread(JavaThread*, JfrStackFrame*, unsigned int, JfrSampleType)+0x313  (jfrThreadSampler.cpp:409)
V  [libjvm.so+0x114e704]  JfrThreadSampler::task_stacktrace(JfrSampleType, JavaThread**) [clone .constprop.1]+0x4c4  (jfrThreadSampler.cpp:629)
V  [libjvm.so+0x114f2f4]  JfrThreadSampler::run()+0x254  (jfrThreadSampler.cpp:563)
V  [libjvm.so+0x1aa3b76]  Thread::call_run()+0xb6  (thread.cpp:231)
V  [libjvm.so+0x17341b8]  thread_native_entry(Thread*)+0x128  (os_linux.cpp:872)

src/hotspot/share/jfr/support/jfrThreadLocal.hpp line 322:

> 320:   bool acquire_cpu_time_jfr_out_of_safepoint_lock();
> 321:   void acquire_cpu_time_jfr_dequeue_lock();
> 322:   void release_cpu_time_jfr_queue_lock();

Why not just define cpu_time_jfr_lock()/cpu_time_jfr_unlock()? It’s kind of confusing reading the methods where these are used since we need to remember they are the same lock.

src/hotspot/share/runtime/handshake.cpp line 593:

> 591:     Jfr::on_safepoint(_handshakee);
> 592:   }
> 593: #endif

Adding this here plus all the other changes in the handshake code seem like a hack since this is not an actual handshake operation. I would just move this to `SafepointMechanism::process()` and add an additional condition in `update_poll_values()` and `should_process()`.

-------------

Changes requested by pchilanomate (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20752#pullrequestreview-2833605447
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r2085178255
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r2085178880
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r2085181713
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r2085183137
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r2085185240
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r2085187897
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r2085188881
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r2085189425
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r2085189956
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r2085190940
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r2085193286
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r2085195330
PR Review Comment: https://git.openjdk.org/jdk/pull/20752#discussion_r2085201536


More information about the hotspot-dev mailing list