From jbechberger at openjdk.org Sun Jun 1 07:13:00 2025
From: jbechberger at openjdk.org (Johannes Bechberger)
Date: Sun, 1 Jun 2025 07:13:00 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v25]
In-Reply-To:
References:
Message-ID: <2nYqo0wpUrLLJV9iDRLwj5xjV06waCzu8Ma8YSAToIY=.1059ee96-77f8-47e6-8797-3f2b47783311@github.com>

On Sat, 31 May 2025 10:37:29 GMT, Markus Grönlund wrote:

>> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Remove debug printf
>
> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.hpp line 139:
>
>> 137:
>> 138: // Trigger sampling while a thread is not in a safepoint, from a seperate thread
>> 139: static void trigger_is_thread_in_native_stackwalking();
>
> Is it sampling that is triggered? Sampling refers to the asynchronous signal received from the operating system (OS).
>
> You are asking for the sampler thread to process already taken JFR Sample Requests in the queue, right?

Yes and I like your implied name better.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2118819169

From jbechberger at openjdk.org Sun Jun 1 07:17:02 2025
From: jbechberger at openjdk.org (Johannes Bechberger)
Date: Sun, 1 Jun 2025 07:17:02 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v25]
In-Reply-To:
References:
Message-ID:

On Sat, 31 May 2025 10:09:15 GMT, Markus Grönlund wrote:

>> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Remove debug printf
>
> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.hpp line 36:
>
>> 34: #if defined(LINUX)
>> 35:
>> 36: #include "memory/padded.hpp"
>
> What is padded? If not, this should go.

Good catch.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2118820425

From jbechberger at openjdk.org Sun Jun 1 07:22:58 2025
From: jbechberger at openjdk.org (Johannes Bechberger)
Date: Sun, 1 Jun 2025 07:22:58 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v24]
In-Reply-To: <-QiSWEqppeW60aedVbLA3WTmnba7Fry53Qr86wE2EPs=.7a6327ce-7ef0-4b1c-bc68-0421ba3fd46f@github.com>
References: <-QiSWEqppeW60aedVbLA3WTmnba7Fry53Qr86wE2EPs=.7a6327ce-7ef0-4b1c-bc68-0421ba3fd46f@github.com>
Message-ID:

On Fri, 30 May 2025 09:19:47 GMT, Johannes Bechberger wrote:

>> src/hotspot/share/jfr/metadata/metadata.xml line 975:
>>
>>> 973:
>>> 974:
>>> 975: >
>> I'm not a reviewer, but I just wanted to comment something I noticed.
>> The JEP document says CPUTimeSampleLos'**t**', but the implementation says CPUTimeSampleLos'**s**'. Which one is correct?
>> A sentence from the JEP document:
>>
>> Another new event, `jdk.CPUTimeSampleLost`, is emitted when samples are lost ...
>
> Thanks for catching this mistake. I'll fix it this afternoon.

I fixed it by changing the JEP.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2118825477

From jbechberger at openjdk.org Sun Jun 1 07:26:19 2025
From: jbechberger at openjdk.org (Johannes Bechberger)
Date: Sun, 1 Jun 2025 07:26:19 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26]
In-Reply-To:
References:
Message-ID:

> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509).
>
> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with
> - ... different heap sizes
> - ... different GCs
> - ... different samplers (the standard JFR and the new CPU Time Sampler and both)
> - ... different JFR recording durations
> - ...
different chunk-sizes

Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision:

- Refactoring
- Remove convoluted native trace logic

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/25302/files
- new: https://git.openjdk.org/jdk/pull/25302/files/3a10d552..439763a3

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=25
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=24-25

Stats: 56 lines in 5 files changed: 3 ins; 27 del; 26 mod
Patch: https://git.openjdk.org/jdk/pull/25302.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302

PR: https://git.openjdk.org/jdk/pull/25302

From mgronlun at openjdk.org Sun Jun 1 13:04:00 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Sun, 1 Jun 2025 13:04:00 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26]
In-Reply-To:
References:
Message-ID:

On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote:

>> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509).
>>
>> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with
>> - ... different heap sizes
>> - ... different GCs
>> - ... different samplers (the standard JFR and the new CPU Time Sampler and both)
>> - ... different JFR recording durations
>> - ...
different chunk-sizes
>
> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision:
>
> - Refactoring
> - Remove convoluted native trace logic

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 42:

> 40: #include "runtime/javaThread.hpp"
> 41: #include "runtime/osThread.hpp"
> 42: #include "runtime/safepointMechanism.hpp"

Not needed, since you have the .inline.hpp

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 102:

> 100:
> 101: u4 JfrCPUTimeTraceQueue::size() const {
> 102: return Atomic::load(&_head);

Is this read from multiple threads? In that case, load_acquire().

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 200:

> 198: void sample_thread(JfrSampleRequest& request, void* ucontext, JavaThread* jt, JfrThreadLocal* tl);
> 199:
> 200: // sample all threads that are in native state (and requested to be sampled)

We are not really "sampling", but processing their queues, no?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119128911
PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119129239
PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119129708

From mgronlun at openjdk.org Sun Jun 1 13:08:00 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Sun, 1 Jun 2025 13:08:00 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26]
In-Reply-To:
References:
Message-ID:

On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote:

>> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509).
>>
>> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with
>> - ... different heap sizes
>> - ... different GCs
>> - ...
different samplers (the standard JFR and the new CPU Time Sampler and both)
>> - ... different JFR recording durations
>> - ... different chunk-sizes
>
> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision:
>
> - Refactoring
> - Remove convoluted native trace logic

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 367:

> 365: JfrCPUTimeSampleRequest& request = queue.at(i);
> 366: JfrStackTrace stacktrace;
> 367: traceid tid = JfrThreadLocal::thread_id(thread);

Check the tid as a function of the JfrSampleRequest, like we do in JFR Cooperative Sampling.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119130991

From mgronlun at openjdk.org Sun Jun 1 13:12:01 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Sun, 1 Jun 2025 13:12:01 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26]
In-Reply-To:
References:
Message-ID:

On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote:

>> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509).
>>
>> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with
>> - ... different heap sizes
>> - ... different GCs
>> - ... different samplers (the standard JFR and the new CPU Time Sampler and both)
>> - ... different JFR recording durations
>> - ...
different chunk-sizes
>
> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision:
>
> - Refactoring
> - Remove convoluted native trace logic

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 413:

> 411: }
> 412: if (Atomic::load(&count) % 1000 == 0) {
> 413: log_info(jfr)("CPU thread sampler sent %zu events, lost %d, biased %zu\n", Atomic::load(&count), Atomic::load(&_lost_samples_sum), Atomic::load(&biased_count));

put this logging under jfr+debug or log+trace please

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119137014

From mgronlun at openjdk.org Sun Jun 1 13:23:00 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Sun, 1 Jun 2025 13:23:00 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26]
In-Reply-To:
References:
Message-ID:

On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote:

>> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509).
>>
>> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with
>> - ... different heap sizes
>> - ... different GCs
>> - ... different samplers (the standard JFR and the new CPU Time Sampler and both)
>> - ... different JFR recording durations
>> - ... different chunk-sizes
>
> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision:
>
> - Refactoring
> - Remove convoluted native trace logic

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 557:

> 555: if (!check_state(jt) ||
> 556: jt->is_JfrRecorder_thread()) {
> 557: queue.increment_lost_samples();

is_JfrRecorder_thread() will not appear here since it's excluded and would have returned nullptr from get_java_thread_if_valid().
src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 558:

> 556: jt->is_JfrRecorder_thread()) {
> 557: queue.increment_lost_samples();
> 558: tl->set_do_async_processing_of_cpu_time_jfr_requests(false);

Why is this restored here?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119142346
PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119142510

From jbechberger at openjdk.org Sun Jun 1 13:43:58 2025
From: jbechberger at openjdk.org (Johannes Bechberger)
Date: Sun, 1 Jun 2025 13:43:58 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26]
In-Reply-To:
References:
Message-ID:

On Sun, 1 Jun 2025 13:19:48 GMT, Markus Grönlund wrote:

>> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Refactoring
>> - Remove convoluted native trace logic
>
> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 558:
>
>> 556: jt->is_JfrRecorder_thread()) {
>> 557: queue.increment_lost_samples();
>> 558: tl->set_do_async_processing_of_cpu_time_jfr_requests(false);
>
> Why is this restored here?

Because I shouldn't sample if the thread isn't in native state anymore. The thread is probably sampled anyway on the outgoing safepoint.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119157906

From mgronlun at openjdk.org Sun Jun 1 15:07:00 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Sun, 1 Jun 2025 15:07:00 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26]
In-Reply-To:
References:
Message-ID:

On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote:

>> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509).
>>
>> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests).
This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with
>> - ... different heap sizes
>> - ... different GCs
>> - ... different samplers (the standard JFR and the new CPU Time Sampler and both)
>> - ... different JFR recording durations
>> - ... different chunk-sizes
>
> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision:
>
> - Refactoring
> - Remove convoluted native trace logic

src/hotspot/share/jfr/periodic/sampling/jfrThreadSampling.cpp line 362:

> 360: drain_enqueued_requests(now, tl, jt, current);
> 361: #ifdef LINUX
> 362: if (tl->has_cpu_time_jfr_requests()) {

You are having all threads traverse over this lock, even though the cpu time sampler is disabled by default. Can it be improved?

src/hotspot/share/jfr/support/jfrThreadLocal.cpp line 604:

> 602:
> 603: bool JfrThreadLocal::has_cpu_time_jfr_requests() {
> 604: return Atomic::load(&_has_cpu_time_jfr_requests);

Atomic::load_acquire()

src/hotspot/share/jfr/support/jfrThreadLocal.cpp line 620:

> 618:
> 619: bool JfrThreadLocal::wants_async_processing_of_cpu_time_jfr_requests() {
> 620: return Atomic::load(&_do_async_processing_of_cpu_time_jfr_requests);

Atomic::load_acquire()

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119242319
PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119243305
PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119243393

From jbechberger at openjdk.org Sun Jun 1 15:07:00 2025
From: jbechberger at openjdk.org (Johannes Bechberger)
Date: Sun, 1 Jun 2025 15:07:00 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26]
In-Reply-To:
References:
Message-ID: <62JxxY-xn3fwz0PnhcnIH6DOWBQUPIq_fhDD_7YrSmA=.bfbb317a-403e-4826-a3ed-c364882e821b@github.com>

On Sun, 1 Jun 2025 15:01:06 GMT, Markus Grönlund wrote:

>> Johannes Bechberger has updated the pull request incrementally with two additional commits
since the last revision:
>>
>> - Refactoring
>> - Remove convoluted native trace logic
>
> src/hotspot/share/jfr/periodic/sampling/jfrThreadSampling.cpp line 362:
>
>> 360: drain_enqueued_requests(now, tl, jt, current);
>> 361: #ifdef LINUX
>> 362: if (tl->has_cpu_time_jfr_requests()) {
>
> You are having all threads traverse over this lock, even though the cpu time sampler is disabled by default. Can it be improved?

Not without allocating in the signal handler

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119243238

From mgronlun at openjdk.org Sun Jun 1 15:27:06 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Sun, 1 Jun 2025 15:27:06 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26]
In-Reply-To:
References:
Message-ID:

On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote:

>> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509).
>>
>> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with
>> - ... different heap sizes
>> - ... different GCs
>> - ... different samplers (the standard JFR and the new CPU Time Sampler and both)
>> - ... different JFR recording durations
>> - ... different chunk-sizes
>
> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision:
>
> - Refactoring
> - Remove convoluted native trace logic

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 332:

> 330: JavaThread* jt = tlh.list()->thread_at(i);
> 331: JfrThreadLocal* tl = jt->jfr_thread_local();
> 332: if (tl != nullptr && tl->wants_async_processing_of_cpu_time_jfr_requests()) {

tl is never nullptr.
src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 344:

> 342:
> 343: // equals operator for JfrSampleRequest
> 344: inline bool operator==(const JfrSampleRequest& lhs, const JfrSampleRequest& rhs) {

Can be removed.

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 574:

> 572:
> 573: if (queue.enqueue(request)) {
> 574: tl->set_has_cpu_time_jfr_requests(true);

This should only need to be set when enqueuing the first entry.

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 581:

> 579:
> 580: if (jt->thread_state() == _thread_in_native &&
> 581: queue.size() > queue.capacity() * 2 / 3) {

Is this logic still valid? You are only asking for async processing depending on the load factor of the queue?

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 586:

> 584: JfrCPUTimeThreadSampling::trigger_async_processing_of_cpu_time_jfr_requests();
> 585: } else {
> 586: tl->set_do_async_processing_of_cpu_time_jfr_requests(false);

Was it true before and needed a reset?
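
The two queue remarks above (set the "has requests" flag only for the first entry, and ask for async processing only past a 2/3 load factor while in native) can be sketched in standalone C++. This is only an illustration under stated assumptions, not the code in the PR: it uses `std::atomic` rather than HotSpot's `Atomic` class, it assumes a single producer per queue, and `SampleRequest`, `BoundedRequestQueue`, `ThreadFlags` and `record_request` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a sample request; not the real JfrSampleRequest.
struct SampleRequest {
  std::uintptr_t sp;
  std::uintptr_t pc;
};

// Simplified bounded queue with preallocated storage, so enqueue() never
// allocates. Assumes a single producer (the owning thread's signal handler).
class BoundedRequestQueue {
 public:
  explicit BoundedRequestQueue(std::size_t capacity) : _data(capacity) {}

  // Returns false and counts a lost sample when the queue is full.
  bool enqueue(const SampleRequest& request) {
    const std::size_t head = _head.load(std::memory_order_relaxed);
    if (head >= _data.size()) {
      _lost.fetch_add(1, std::memory_order_relaxed);
      return false;
    }
    _data[head] = request;
    // Release: a consumer that acquires _head also sees the element above.
    _head.store(head + 1, std::memory_order_release);
    return true;
  }

  std::size_t size() const { return _head.load(std::memory_order_acquire); }
  std::size_t capacity() const { return _data.size(); }
  std::size_t lost() const { return _lost.load(std::memory_order_relaxed); }

 private:
  std::vector<SampleRequest> _data;
  std::atomic<std::size_t> _head{0};
  std::atomic<std::size_t> _lost{0};
};

// Hypothetical per-thread flags, loosely modelled on the names in the review.
struct ThreadFlags {
  std::atomic<bool> has_requests{false};
  std::atomic<bool> wants_async_processing{false};
};

// Raises has_requests only when the first entry is enqueued, and requests
// asynchronous draining only when the thread is in native code and the queue
// is more than 2/3 full.
inline bool record_request(BoundedRequestQueue& queue, ThreadFlags& flags,
                           const SampleRequest& request, bool in_native) {
  const bool was_empty = queue.size() == 0;
  if (!queue.enqueue(request)) {
    return false;
  }
  if (was_empty) {
    flags.has_requests.store(true, std::memory_order_release);
  }
  if (in_native && queue.size() > queue.capacity() * 2 / 3) {
    flags.wants_async_processing.store(true, std::memory_order_release);
  }
  return true;
}
```

With a capacity of 3 the threshold is 2, so in this sketch the async flag is raised only by the third request, and a fourth request is counted as lost.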
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119250661
PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119250887
PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119248176
PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119248824
PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119249381

From jbechberger at openjdk.org Sun Jun 1 15:27:06 2025
From: jbechberger at openjdk.org (Johannes Bechberger)
Date: Sun, 1 Jun 2025 15:27:06 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26]
In-Reply-To:
References:
Message-ID:

On Sun, 1 Jun 2025 15:18:52 GMT, Markus Grönlund wrote:

>> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Refactoring
>> - Remove convoluted native trace logic
>
> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 574:
>
>> 572:
>> 573: if (queue.enqueue(request)) {
>> 574: tl->set_has_cpu_time_jfr_requests(true);
>
> This should only need to be set when enqueuing the first entry.

You're right

> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 581:
>
>> 579:
>> 580: if (jt->thread_state() == _thread_in_native &&
>> 581: queue.size() > queue.capacity() * 2 / 3) {
>
> Is this logic still valid? You are only asking for async processing depending on the load factor of the queue?
Yes, so I only start the thread walking if necessary

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119248709
PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119250511

From mgronlun at openjdk.org Sun Jun 1 15:35:01 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Sun, 1 Jun 2025 15:35:01 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26]
In-Reply-To:
References:
Message-ID:

On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote:

>> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509).
>>
>> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with
>> - ... different heap sizes
>> - ... different GCs
>> - ... different samplers (the standard JFR and the new CPU Time Sampler and both)
>> - ... different JFR recording durations
>> - ... different chunk-sizes
>
> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision:
>
> - Refactoring
> - Remove convoluted native trace logic

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 536:

> 534: }
> 535:
> 536: volatile size_t count__ = 0;

unused?
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119258988

From jbechberger at openjdk.org Sun Jun 1 15:39:00 2025
From: jbechberger at openjdk.org (Johannes Bechberger)
Date: Sun, 1 Jun 2025 15:39:00 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26]
In-Reply-To:
References:
Message-ID: <6Idy8j9wbNr9udYMhsW0BQmhb8dQvc_p20vCYtg5kZc=.6380eee6-bd1b-45d0-bca8-c8068e59bd36@github.com>

On Sun, 1 Jun 2025 15:32:08 GMT, Markus Grönlund wrote:

>> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Refactoring
>> - Remove convoluted native trace logic
>
> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 536:
>
>> 534: }
>> 535:
>> 536: volatile size_t count__ = 0;
>
> unused?

Yes.

> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 586:
>
>> 584: JfrCPUTimeThreadSampling::trigger_async_processing_of_cpu_time_jfr_requests();
>> 585: } else {
>> 586: tl->set_do_async_processing_of_cpu_time_jfr_requests(false);
>
> Was it true before and needed a reset?

I could check this before setting

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119260755
PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119261558

From mgronlun at openjdk.org Sun Jun 1 15:43:06 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Sun, 1 Jun 2025 15:43:06 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26]
In-Reply-To:
References:
Message-ID: <66tRvhjE2LrwccsAYmRycS6QLF2KdRg-XHfk-scr-wg=.c7f269f0-301a-4da3-ae54-7f6bc7a440b1@github.com>

On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote:

>> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509).
>>
>> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests).
This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with
>> - ... different heap sizes
>> - ... different GCs
>> - ... different samplers (the standard JFR and the new CPU Time Sampler and both)
>> - ... different JFR recording durations
>> - ... different chunk-sizes
>
> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision:
>
> - Refactoring
> - Remove convoluted native trace logic

src/hotspot/share/jfr/support/jfrThreadLocal.cpp line 587:

> 585: }
> 586:
> 587: bool JfrThreadLocal::acquire_cpu_time_jfr_native_lock() {

It appears that the lock state 'NATIVE' is redundant; an asynchronous request for queue drainage only requires the dequeue lock state. NATIVE can be removed to simplify the lock protocol.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119268003

From mgronlun at openjdk.org Sun Jun 1 18:12:58 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Sun, 1 Jun 2025 18:12:58 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26]
In-Reply-To:
References:
Message-ID:

On Sun, 1 Jun 2025 15:24:17 GMT, Markus Grönlund wrote:

>> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Refactoring
>> - Remove convoluted native trace logic
>
> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 344:
>
>> 342:
>> 343: // equals operator for JfrSampleRequest
>> 344: inline bool operator==(const JfrSampleRequest& lhs, const JfrSampleRequest& rhs) {
>
> Can be removed.

Unless you still want to try the ljf JfrSampleRequest optimization for the native ljf, which I kind of like now that I understand it.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119386104

From mgronlun at openjdk.org Sun Jun 1 18:13:00 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Sun, 1 Jun 2025 18:13:00 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26]
In-Reply-To:
References:
Message-ID:

On Sun, 1 Jun 2025 15:23:06 GMT, Johannes Bechberger wrote:

>> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 581:
>>
>>> 579:
>>> 580: if (jt->thread_state() == _thread_in_native &&
>>> 581: queue.size() > queue.capacity() * 2 / 3) {
>>
>> Is this logic still valid? You are only asking for async processing assistance depending on the load factor of the queue?
>
> Yes, so I only start the thread walking if necessary

I see. With a bounded queue as used in this solution, it can work quite nicely, that is, if the thread is actually on CPU in native, and just not waiting - if waiting (which is most likely) then pending requests could take a long time to be sent to consumers.

I also understand better the optimization you tried as part of async walk in native and frames. Also quite nice, to walk from the last JfrSampleRequest and do equals to "batch" the top JFR sample requests that are the same (i.e., taken for the ljf). Maybe you can retry that again, but then you need to save the sid AND the tid to be reused for the top equal requests (you only need stacktrace.record_inner() for one request). It's a nice optimization.

>> src/hotspot/share/jfr/periodic/sampling/jfrThreadSampling.cpp line 362:
>>
>>> 360: drain_enqueued_requests(now, tl, jt, current);
>>> 361: #ifdef LINUX
>>> 362: if (tl->has_cpu_time_jfr_requests()) {
>>
>> You are having all threads traverse over this test, even though the cpu time sampler is disabled by default. Can it be improved?
>
> Not without allocating in the signal handler

How so?
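
The batching idea mentioned earlier in this message (walk from the last request, group the top requests that compare equal, and reuse one sid and tid for the group) could look roughly like the following. It is a sketch under assumptions, not the PR's implementation: `SampleRequest`, `Event`, `process_batched` and the `record_stack` callback are hypothetical, and `record_stack` merely stands in for the one expensive stack walk per group.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical request. While a thread stays in native code, its samples
// refer to the same last Java frame (ljf), so neighbouring requests are
// often equal.
struct SampleRequest {
  std::uintptr_t sp;
  std::uintptr_t pc;
};

inline bool operator==(const SampleRequest& lhs, const SampleRequest& rhs) {
  return lhs.sp == rhs.sp && lhs.pc == rhs.pc;
}

struct Event {
  std::uint64_t stack_id;   // "sid" in the discussion
  std::uint64_t thread_id;  // "tid" in the discussion
};

// Walks the queue from the newest request backwards. For each run of equal
// requests the stack is recorded once and the resulting sid, together with
// the tid, is reused for every request in that run.
inline std::vector<Event> process_batched(
    const std::vector<SampleRequest>& queue, std::uint64_t tid,
    const std::function<std::uint64_t(const SampleRequest&)>& record_stack) {
  std::vector<Event> events;
  events.reserve(queue.size());
  std::size_t i = queue.size();
  while (i > 0) {
    const SampleRequest& top = queue[i - 1];
    const std::uint64_t sid = record_stack(top);  // one walk per run
    while (i > 0 && queue[i - 1] == top) {
      events.push_back(Event{sid, tid});
      --i;
    }
  }
  return events;
}
```

For a queue holding one request followed by three identical ones, this sketch calls `record_stack` twice and still produces four events.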
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119385303
PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119389715

From mgronlun at openjdk.org Sun Jun 1 18:25:00 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Sun, 1 Jun 2025 18:25:00 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26]
In-Reply-To:
References:
Message-ID:

On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote:

>> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509).
>>
>> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with
>> - ... different heap sizes
>> - ... different GCs
>> - ... different samplers (the standard JFR and the new CPU Time Sampler and both)
>> - ... different JFR recording durations
>> - ... different chunk-sizes
>
> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision:
>
> - Refactoring
> - Remove convoluted native trace logic

src/hotspot/share/jfr/periodic/sampling/jfrThreadSampling.cpp line 250:

> 248: }
> 249:
> 250: biased = true;

Perhaps set on entry, and only keep the single biased = false below?
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119396997

From mgronlun at openjdk.org Sun Jun 1 18:31:58 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Sun, 1 Jun 2025 18:31:58 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26]
In-Reply-To:
References:
Message-ID:

On Sun, 1 Jun 2025 18:22:10 GMT, Markus Grönlund wrote:

>> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Refactoring
>> - Remove convoluted native trace logic
>
> src/hotspot/share/jfr/periodic/sampling/jfrThreadSampling.cpp line 250:
>
>> 248: }
>> 249:
>> 250: biased = true;
>
> Perhaps set on entry, and only keep the single biased = false below?

Also, note you have a direct hit in line 221--222 above - it's biased = false.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119404072

From dholmes at openjdk.org Mon Jun 2 04:35:02 2025
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 2 Jun 2025 04:35:02 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26]
In-Reply-To:
References:
Message-ID:

On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote:

>> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509).
>>
>> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with
>> - ... different heap sizes
>> - ... different GCs
>> - ... different samplers (the standard JFR and the new CPU Time Sampler and both)
>> - ... different JFR recording durations
>> - ... different chunk-sizes
>
> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision:
>
> - Refactoring
> - Remove convoluted native trace logic

Just some drive-by comments mainly on your acquire/release usage.
I'm not at all clear what memory accesses you are trying to coordinate with those.

src/hotspot/share/jfr/jni/jfrJniMethod.cpp line 176:

> 174: JfrEventSetting::set_enabled(JfrCPUTimeSampleEvent, rate > 0);
> 175: JfrCPUTimeThreadSampling::set_rate(rate, autoadapt == JNI_TRUE);
> 176: return JNI_TRUE;

What is the point of having a boolean return type if you always return true?

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 59:

> 57: Thread* raw_thread = Thread::current_or_null_safe();
> 58: JavaThread* jt;
> 59: if (raw_thread == nullptr || !raw_thread->is_Java_thread()) { // this can happen due to the high level of parralelism

Suggestion:

if (raw_thread == nullptr || !raw_thread->is_Java_thread()) { // this can happen due to the high level of parallelism

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 119:

> 117: _data = new_data;
> 118: _capacity = capacity;
> 119: }

I assume there is a lock protecting this so it happens atomically?

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 122:

> 120:
> 121: bool JfrCPUTimeTraceQueue::is_full() const {
> 122: return Atomic::load_acquire(&_head) >= _capacity;

I don't see why acquire semantics would be needed here. Also how can it be > capacity?

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 126:

> 124:
> 125: bool JfrCPUTimeTraceQueue::is_empty() const {
> 126: return Atomic::load_acquire(&_head) == 0;

Acquire semantics are definitely not needed here.

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 130:

> 128:
> 129: s4 JfrCPUTimeTraceQueue::lost_samples() const {
> 130: return Atomic::load_acquire(&_lost_samples);

Again acquire semantics seem highly dubious here - what loads are you synchronizing with?
src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 139:

> 137:
> 138: u4 JfrCPUTimeTraceQueue::get_and_reset_lost_samples() {
> 139: s4 lost_samples = Atomic::load_acquire(&_lost_samples);

Again acquire semantics seem highly dubious here - what loads are you synchronizing with?

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 151:

> 149: set_capacity(capacity);
> 150: }
> 151: }

Seems an odd definition - typically `ensure_capacity` will grow a data structure to ensure it has sufficient capacity, and if already larger than needed that is fine. Suggestion `change_capacity`, or more traditionally `resize`?

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 237:

> 235:
> 236: void JfrCPUTimeThreadSampler::trigger_async_processing_of_cpu_time_jfr_requests() {
> 237: Atomic::release_store(&_is_async_processing_of_cpu_time_jfr_requests_triggered, true);

What prior stores are you ensuring should be visible by using release semantics here?
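
As a generic illustration of the distinction behind these questions, here is a minimal sketch in standard C++ rather than HotSpot code. It assumes that `std::memory_order_release` and `std::memory_order_acquire` play roughly the role of `Atomic::release_store` and `Atomic::load_acquire`; `Mailbox`, `publish`, `try_consume`, `note_lost` and `lost` are hypothetical names. Release/acquire is used where a flag publishes other data, and relaxed ordering where a counter is only read for its own value.

```cpp
#include <atomic>
#include <cassert>

struct Mailbox {
  int payload = 0;                 // plain, non-atomic data
  std::atomic<bool> ready{false};  // flag that publishes payload
  std::atomic<int> lost{0};        // standalone counter, publishes nothing
};

// Writer: the release store orders the payload write before the flag store.
inline void publish(Mailbox& m, int value) {
  m.payload = value;
  m.ready.store(true, std::memory_order_release);
}

// Reader: the acquire load pairs with the release store above, so a reader
// that observes ready == true also observes the payload written before it.
inline bool try_consume(Mailbox& m, int& out) {
  if (!m.ready.load(std::memory_order_acquire)) {
    return false;
  }
  out = m.payload;
  return true;
}

// No other memory depends on this counter, so there is nothing for an
// acquire or release to order; relaxed atomic accesses are sufficient.
inline void note_lost(Mailbox& m) {
  m.lost.fetch_add(1, std::memory_order_relaxed);
}

inline int lost(const Mailbox& m) {
  return m.lost.load(std::memory_order_relaxed);
}
```

The point of the sketch is that an acquire or release only earns its cost when some other memory access is ordered by it, which is what the review questions ask the author to identify.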
------------- PR Review: https://git.openjdk.org/jdk/pull/25302#pullrequestreview-2886627655 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119983062 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119983911 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120016607 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120011705 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120012200 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120014449 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120014541 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120020174 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120021034 From dholmes at openjdk.org Mon Jun 2 04:35:02 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 2 Jun 2025 04:35:02 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v5] In-Reply-To: References: <6hGNW2D3_VuD-2WN0eTLYdEJoNu_9rPLu-dH-InGSK4=.64de8bc8-a98f-400f-a5e3-885dbd84d901@github.com> Message-ID: <7wOUvZZtjrX3TpgT9JQLm-8qTAax6PrXtfHwMJpNX4M=.13a7c6cc-e037-4108-b392-7ff30d279c05@github.com> On Mon, 26 May 2025 06:29:03 GMT, Johannes Bechberger wrote: >> Also, is raw_thread == nullptr even possible? For the same reasons. > > `!raw_thread->is_Java_thread()` I found it during testing. What thread was it, and how did it reach this code? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119984783 From mbaesken at openjdk.org Mon Jun 2 07:33:27 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 2 Jun 2025 07:33:27 GMT Subject: RFR: 8357826: Avoid running some jtreg tests when asan is configured Message-ID: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> There are a couple of jtreg tests, especially in the HS area, with very special assumptions about memory layout/sizes . Those fail when the address sanitizer is configured ( --enable-asan ). The change adds a way to tag those tests with 'requires' so that they can be avoided easily when running jtreg tests with ASAN enabled. Adjusting the tests for "pleasing" the sanitizer is not always desired (if possible for some tests it can be done later) . While at it, also same is also added for ubsan . ------------- Commit messages: - remove zgc change - JDK-8357826 Changes: https://git.openjdk.org/jdk/pull/25575/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25575&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357826 Stats: 56 lines in 12 files changed: 54 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25575/head:pull/25575 PR: https://git.openjdk.org/jdk/pull/25575 From mbaesken at openjdk.org Mon Jun 2 07:33:27 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 2 Jun 2025 07:33:27 GMT Subject: RFR: 8357826: Avoid running some jtreg tests when asan is configured In-Reply-To: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> References: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> Message-ID: On Mon, 2 Jun 2025 07:25:22 GMT, Matthias Baesken wrote: > There are a couple of jtreg tests, especially in the HS area, with very special assumptions about memory layout/sizes . 
> Those fail when the address sanitizer is configured ( --enable-asan ). > The change adds a way to tag those tests with 'requires' so that they can be avoided easily when running jtreg tests with ASAN enabled. > Adjusting the tests for "pleasing" the sanitizer is not always desired (if possible for some tests it can be done later) . > While at it, also same is also added for ubsan . The change to src/hotspot/cpu/x86/gc/z/zAddress_x86.cpp was added because of zgc issues with ASAN but we will address this in another change so I remove it from here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25575#issuecomment-2929201143 From rvansa at openjdk.org Mon Jun 2 07:36:51 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 2 Jun 2025 07:36:51 GMT Subject: RFR: 8352075: Perf regression accessing fields [v16] In-Reply-To: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> Message-ID: > This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 trying to reduce the performance regression in some scenarios introduced in https://bugs.openjdk.org/browse/JDK-8292818 . Based both on performance and memory consumption it is a (better) alternative to https://github.com/openjdk/jdk/pull/24713 . > > This PR optimizes local field lookup in classes with more than 16 fields; rather than sequentially iterating through all fields during lookup we sort the fields based on the field name. The stream includes extra table after the field information: for field at position 16, 32 ... we record the (variable-length-encoded) offset of the field info in this stream. On field lookup, rather than iterating through all fields, we iterate through this table, resolve names for given fields and continue field-by-field iteration only after the last record (hence at most 16 fields). 
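The lookup scheme described in the paragraph above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the HotSpot implementation: `FieldIndex`, `build_index` and `find_field` are hypothetical names, plain `std::string` comparison stands in for however field names are actually ordered, and the table holds vector positions where the real stream would hold variable-length-encoded byte offsets. Fields that share a name are ignored here.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

constexpr std::size_t kStride = 16;  // one table entry per 16 fields, as in the description

struct FieldIndex {
  std::vector<std::string> sorted_names;  // stand-in for the encoded field stream
  std::vector<std::size_t> table;         // positions 16, 32, ... within sorted_names
};

FieldIndex build_index(std::vector<std::string> names) {
  std::sort(names.begin(), names.end());
  FieldIndex idx;
  // Classes with at most 16 fields get an empty table.
  for (std::size_t i = kStride; i < names.size(); i += kStride) {
    idx.table.push_back(i);
  }
  idx.sorted_names = std::move(names);
  return idx;
}

// Returns the position of `name` in sorted order, or -1 if it is absent.
long find_field(const FieldIndex& idx, const std::string& name) {
  std::size_t start = 0;
  // Walk the sparse table to find the last recorded field whose name is <= name.
  for (std::size_t pos : idx.table) {
    if (idx.sorted_names[pos] > name) break;
    start = pos;
  }
  // Field-by-field scan from there covers at most kStride entries.
  const std::size_t end = std::min(start + kStride, idx.sorted_names.size());
  for (std::size_t i = start; i < end; ++i) {
    if (idx.sorted_names[i] == name) return static_cast<long>(i);
  }
  return -1;
}
```

With 40 fields the table has two entries (positions 16 and 32), so a lookup touches at most two table entries plus 16 fields instead of up to 40 fields.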
> > In classes with <= 16 fields this PR reduces the memory consumption by 1 byte that was left with value 0 at the end of stream. In classes with > 16 fields we add extra 4 bytes with offset of the table, and the table contains one varint for each 16 fields. The terminal byte is not used either. > > My measurements on the attached reproducer > > hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC > Time (mean ± σ): 51.3 ms ± 2.8 ms [User: 44.7 ms, System: 13.7 ms] > Range (min … max): 45.1 ms … 53.9 ms 100 runs > > hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC > Time (mean ± σ): 78.2 ms ± 1.0 ms [User: 74.6 ms, System: 17.3 ms] > Range (min … max): 73.8 ms … 79.7 ms 100 runs > > (the jdk25-master above already contains JDK-8353175) > > hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC > Time (mean ± σ): 38.5 ms ± 0.5 ms [User: 34.4 ms, System: 17.3 ms] > Range (min … max): 37.7 ms … 42.1 ms 100 runs > > While https://github.com/openjdk/jdk/pull/24713 returned the performance to previous levels, this PR improves it by 25% compared to JDK 17 (which does not contain the regression)!
This time, the undisclosed production-grade reproducer shows even higher improvement: > > JDK 17: 1.6 s > JDK 21 (no patches): 22 s > JDK25-master: 12.3 s > JDK25-this-pr: 0.5 s Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Add type cast ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24847/files - new: https://git.openjdk.org/jdk/pull/24847/files/70f62460..9cba2d4a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=14-15 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24847.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24847/head:pull/24847 PR: https://git.openjdk.org/jdk/pull/24847 From mbaesken at openjdk.org Mon Jun 2 08:07:38 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 2 Jun 2025 08:07:38 GMT Subject: RFR: 8357826: Avoid running some jtreg tests when asan is configured [v2] In-Reply-To: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> References: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> Message-ID: > There are a couple of jtreg tests, especially in the HS area, with very special assumptions about memory layout/sizes . > Those fail when the address sanitizer is configured ( --enable-asan ). > The change adds a way to tag those tests with 'requires' so that they can be avoided easily when running jtreg tests with ASAN enabled. > Adjusting the tests for "pleasing" the sanitizer is not always desired (if possible for some tests it can be done later) . > While at it, also same is also added for ubsan . 
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: TestBreakSignalThreadDump has issues with asan ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25575/files - new: https://git.openjdk.org/jdk/pull/25575/files/3ad0d93a..aa796c8a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25575&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25575&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25575/head:pull/25575 PR: https://git.openjdk.org/jdk/pull/25575 From mbaesken at openjdk.org Mon Jun 2 08:07:38 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 2 Jun 2025 08:07:38 GMT Subject: RFR: 8357826: Avoid running some jtreg tests when asan is configured In-Reply-To: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> References: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> Message-ID: <4CZpPTh4S1qjEkxVcHZ-J8bxpkI4iTsOtX4iCG5M2Cw=.8c1f2e8e-02c1-4691-8d6f-aa362dd54932@github.com> On Mon, 2 Jun 2025 07:25:22 GMT, Matthias Baesken wrote: > There are a couple of jtreg tests, especially in the HS area, with very special assumptions about memory layout/sizes . > Those fail when the address sanitizer is configured ( --enable-asan ). > The change adds a way to tag those tests with 'requires' so that they can be avoided easily when running jtreg tests with ASAN enabled. > Adjusting the tests for "pleasing" the sanitizer is not always desired (if possible for some tests it can be done later) . > While at it, also same is also added for ubsan . 
TestBreakSignalThreadDump shows this, so it does not work well with asan too stdout: []; stderr: [==12484==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25575#issuecomment-2929322761 From rvansa at openjdk.org Mon Jun 2 08:14:48 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 2 Jun 2025 08:14:48 GMT Subject: RFR: 8352075: Perf regression accessing fields [v17] In-Reply-To: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> Message-ID: > This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 trying to reduce the performance regression in some scenarios introduced in https://bugs.openjdk.org/browse/JDK-8292818 . Based both on performance and memory consumption it is a (better) alternative to https://github.com/openjdk/jdk/pull/24713 . > > This PR optimizes local field lookup in classes with more than 16 fields; rather than sequentially iterating through all fields during lookup we sort the fields based on the field name. The stream includes extra table after the field information: for field at position 16, 32 ... we record the (variable-length-encoded) offset of the field info in this stream. On field lookup, rather than iterating through all fields, we iterate through this table, resolve names for given fields and continue field-by-field iteration only after the last record (hence at most 16 fields). > > In classes with <= 16 fields this PR reduces the memory consumption by 1 byte that was left with value 0 at the end of stream. In classes with > 16 fields we add extra 4 bytes with offset of the table, and the table contains one varint for each 16 fields. The terminal byte is not used either. 
> > My measurements on the attached reproducer > > hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC > Time (mean ± σ): 51.3 ms ± 2.8 ms [User: 44.7 ms, System: 13.7 ms] > Range (min … max): 45.1 ms … 53.9 ms 100 runs > > hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC > Time (mean ± σ): 78.2 ms ± 1.0 ms [User: 74.6 ms, System: 17.3 ms] > Range (min … max): 73.8 ms … 79.7 ms 100 runs > > (the jdk25-master above already contains JDK-8353175) > > hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC > Time (mean ± σ): 38.5 ms ± 0.5 ms [User: 34.4 ms, System: 17.3 ms] > Range (min … max): 37.7 ms … 42.1 ms 100 runs > > While https://github.com/openjdk/jdk/pull/24713 returned the performance to previous levels, this PR improves it by 25% compared to JDK 17 (which does not contain the regression)! This time, the undisclosed production-grade reproducer shows even higher improvement: > > JDK 17: 1.6 s > JDK 21 (no patches): 22 s > JDK25-master: 12.3 s > JDK25-this-pr: 0.5 s Radim Vansa has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains two new commits since the last revision: - Add type cast - Fix static_assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24847/files - new: https://git.openjdk.org/jdk/pull/24847/files/9cba2d4a..c592ea59 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=15-16 Stats: 53 lines in 4 files changed: 0 ins; 47 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24847.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24847/head:pull/24847 PR: https://git.openjdk.org/jdk/pull/24847 From jbechberger at openjdk.org Mon Jun 2 08:44:01 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 08:44:01 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 13:01:23 GMT, Markus Grönlund wrote: >> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: >> >> - Refactoring >> - Remove convoluted native trace logic > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 200: > >> 198: void sample_thread(JfrSampleRequest& request, void* ucontext, JavaThread* jt, JfrThreadLocal* tl); >> 199: >> 200: // sample all threads that are in native state (and requested to be sampled) > > We are not really "sampling", but processing their queues, no? You're correct.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120450563 From jbechberger at openjdk.org Mon Jun 2 08:47:01 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 08:47:01 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: <3d549Fxkhzd6v0fAVFEBOcxZ7hBKI1ZAUafLClp7Npw=.70183618-7dbf-4e05-bcc8-fd1216741c66@github.com> On Sun, 1 Jun 2025 13:05:44 GMT, Markus Grönlund wrote: >> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: >> >> - Refactoring >> - Remove convoluted native trace logic > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 367: > >> 365: JfrCPUTimeSampleRequest& request = queue.at(i); >> 366: JfrStackTrace stacktrace; >> 367: traceid tid = JfrThreadLocal::thread_id(thread); > > Check the tid as a function of the JfrSampleRequest, like we do in JFR Cooperative Sampling. You mean ` const traceid tid = in_continuation ? tl->vthread_id_with_epoch_update(jt) : JfrThreadLocal::jvm_thread_id(jt);`?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120458307 From jbechberger at openjdk.org Mon Jun 2 08:53:02 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 08:53:02 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: <3d549Fxkhzd6v0fAVFEBOcxZ7hBKI1ZAUafLClp7Npw=.70183618-7dbf-4e05-bcc8-fd1216741c66@github.com> References: <3d549Fxkhzd6v0fAVFEBOcxZ7hBKI1ZAUafLClp7Npw=.70183618-7dbf-4e05-bcc8-fd1216741c66@github.com> Message-ID: On Mon, 2 Jun 2025 08:44:01 GMT, Johannes Bechberger wrote: >> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 367: >> >>> 365: JfrCPUTimeSampleRequest& request = queue.at(i); >>> 366: JfrStackTrace stacktrace; >>> 367: traceid tid = JfrThreadLocal::thread_id(thread); >> >> Check the tid as a function of the JfrSampleRequest, like we do in JFR Cooperative Sampling. > > You mean ` const traceid tid = in_continuation ? tl->vthread_id_with_epoch_update(jt) : JfrThreadLocal::jvm_thread_id(jt);`? I implemented this in this function now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120473792 From jbechberger at openjdk.org Mon Jun 2 08:57:04 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 08:57:04 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 13:41:44 GMT, Johannes Bechberger wrote: >> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 558: >> >>> 556: jt->is_JfrRecorder_thread()) { >>> 557: queue.increment_lost_samples(); >>> 558: tl->set_do_async_processing_of_cpu_time_jfr_requests(false); >> >> Why is this restored here? > > Because I shouldn't sample if the thread isn't in native state anymore. The thread is probably sampled anyway on the outgoing safepoint. But you might be right, I removed it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120481274 From jbechberger at openjdk.org Mon Jun 2 09:01:05 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 09:01:05 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 18:10:15 GMT, Markus Grönlund wrote: >> Not without allocating in the signal handler > > How so? Because we need to add the threads from the signal handler. So any kind of growing array or set would not work, especially if we want to remove the threads from within the signal handler again. This is certainly an area of future optimization, albeit this doesn't seem to have any measurable performance impact in my renaissance benchmark runs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120492743 From jbechberger at openjdk.org Mon Jun 2 09:05:02 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 09:05:02 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 18:00:55 GMT, Markus Grönlund wrote: >> Yes, so I only start the thread walking if necessary > > I see. With a bounded queue as used in this solution, it can work quite nicely, that is, if the thread is actually on CPU in native, and just not waiting - if waiting (which is most likely) then pending requests could take a long time to be sent to consumers. > > I also understand better the optimization you tried as part of async walk in native and frames. Also quite nice, to walk from the last JfrSampleRequest and do equals to "batch" the top JFR sample requests that are the same (i.e., taken for the ljf). Maybe you can retry that again, but then you need to save the sid AND the tid to be reused for the top equal requests (you only need stacktrace.record_inner() for one request). It's a nice optimization.
The problem is when in between queue processing a new JFR chunk is started. This caused problems before. I would leave these kinds of optimizations for later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120501728 From jbechberger at openjdk.org Mon Jun 2 09:09:04 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 09:09:04 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 18:03:15 GMT, Markus Grönlund wrote: >> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 344: >> >>> 342: >>> 343: // equals operator for JfrSampleRequest >>> 344: inline bool operator==(const JfrSampleRequest& lhs, const JfrSampleRequest& rhs) { >> >> Can be removed. > > Unless you still want to try the ljf JfrSampleRequest optimization for the native ljf, which I kind of like now that I understand it. As I said, it's a great optimization. But it needs some work. I therefore remove this method for now.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120511048 From kevinw at openjdk.org Mon Jun 2 09:16:56 2025 From: kevinw at openjdk.org (Kevin Walls) Date: Mon, 2 Jun 2025 09:16:56 GMT Subject: RFR: 8356870: HotSpotDiagnosticMXBean.dumpThreads and jcmd Thread.dump_to_file updates [v3] In-Reply-To: References: <3avXpsIbMYIQBAr6mO9K3MhewKnNRt6JthztMleZEGI=.f806b009-3c4f-43c2-8728-7cec95048ae0@github.com> <3EYLo1tSB8GfKr6pkZryIn67hGT-5m9Fcf98KCE3Jbw=.00529528-34bb-4b34-90e6-a5289ddaa477@github.com> Message-ID: <1HQxixfzldeYWk_FS2UqcgtLx_2XWJw5f2T_Ver5_Tk=.c403179d-5c0e-42a4-b03b-f00f7f4aec2c@github.com> On Fri, 30 May 2025 18:15:52 GMT, Alan Bateman wrote: >> src/java.base/share/classes/jdk/internal/vm/ThreadDumper.java line 180: >> >>> 178: } >>> 179: >>> 180: private static void dumpThread(Thread thread, TextWriter writer) { >> >> On the non-json text format for locks: here we're creating a new comment-like style: >> // parked on ..etc... >> >> In the regular Thread.print we always used a "-" prefix, and always printed the frame, then the relevant locks, like: >> >> at ThreadsMem$2.run(ThreadsMem.java:38) >> - waiting to lock <0x0000000630817da0> (a java.lang.Object) >> >> at java.lang.ref.ReferenceQueue.remove(java.base@25-internal/ReferenceQueue.java:215) >> - locked <0x0000000630802350> (a java.lang.ref.ReferenceQueue$Lock) >> >> Could we use the same? We have a lot of history reading the established style. 8-) >> Can we match the old-style "waiting to lock" rather than "waiting on" ? >> >> I realise I'm asking to move the printing of "waiting to lock" into the loop over the stackframes, and it affects various tests. > > When parked and there is a parkBlocker, blocked entering a monitor, or waiting in Object.wait, then it gets printed between the summary/state (first line) and the stack trace. I think this is a bit clearer than printing it after the top frame but okay to change.
Note that the output isn't going to look the same as the traditional thread dump as it prints the object's identity hashcode rather than the address. > > For the "locked" output then you may have a point, been back and forth on this. For synchronized methods then I think it's clearer when it is printed between the caller and synchronized callee. For synchronized blocks then it's clearer when to see which method has entered the monitor. Picking one means it's not clear in all cases but maybe people are just so used to "- locked" and don't notice. Can look at this again, it's trivial to swap between the two. Thanks yes I do like this update - I think it reads how we are used to seeing these things (without aiming to be exactly the same format as Thread.print). (oops my comment suggested waiting on, and waiting to lock, are the same, but they are not) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25429#discussion_r2120533850 From jbechberger at openjdk.org Mon Jun 2 09:28:01 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 09:28:01 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 04:28:02 GMT, David Holmes wrote: >> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: >> >> - Refactoring >> - Remove convoluted native trace logic > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 119: > >> 117: _data = new_data; >> 118: _capacity = capacity; >> 119: } > > I assume there is a lock protecting this so it happens atomically? This happens before the signal handler is attached to thread. So it does happen before any parallelism is introduced on thread creation. 
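The argument in the reply above (no lock is needed because the capacity is set before any concurrent access can begin) can be illustrated with a small standalone sketch. It uses `std::thread` as an analogy only: `TraceQueue` and `capacity_seen_by_worker` are hypothetical names, and whether attaching the signal handler gives the same ordering guarantee in the PR is the author's claim, not something this sketch demonstrates. In standard C++, completion of the `std::thread` constructor synchronizes with the start of the thread function, so writes made before the thread is created are visible inside it without further synchronization.

```cpp
#include <cstddef>
#include <memory>
#include <thread>

// Hypothetical queue for illustration; not JfrCPUTimeTraceQueue.
struct TraceQueue {
  std::unique_ptr<int[]> data;
  std::size_t capacity = 0;

  // Not thread-safe: only valid while no other thread can touch the queue.
  void set_capacity(std::size_t c) {
    data = std::make_unique<int[]>(c);
    capacity = c;
  }
};

// Assumes c > 0, since the worker writes to the first slot.
std::size_t capacity_seen_by_worker(std::size_t c) {
  TraceQueue q;
  q.set_capacity(c);  // happens while the program is still single-threaded
  std::size_t seen = 0;
  // Thread creation orders the writes above before the body of the lambda.
  std::thread worker([&q, &seen] {
    seen = q.capacity;
    q.data[0] = 1;  // safe: the buffer was fully set up before the thread started
  });
  worker.join();  // join orders the worker's writes before the read of `seen` below
  return seen;
}
```

The design point is that the unsynchronized resize stays correct only as long as it is never called once a concurrent accessor exists; a later resize from another thread would need a lock or a different protocol.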
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120557327 From kevinw at openjdk.org Mon Jun 2 09:32:56 2025 From: kevinw at openjdk.org (Kevin Walls) Date: Mon, 2 Jun 2025 09:32:56 GMT Subject: RFR: 8356870: HotSpotDiagnosticMXBean.dumpThreads and jcmd Thread.dump_to_file updates [v4] In-Reply-To: References: <3avXpsIbMYIQBAr6mO9K3MhewKnNRt6JthztMleZEGI=.f806b009-3c4f-43c2-8728-7cec95048ae0@github.com> Message-ID: On Sat, 31 May 2025 08:07:48 GMT, Alan Bateman wrote: >> Updates the thread dump generated by HotSpotDiagnosticMXBean.dumpThreads and jcmd Thread.dump_to_file to include thread state and lock information. Also update the HotSpotDiagnosticMXBean.dumpThreads API description to link to a description of the JSON format dump as that format is intended to be parseable/read by tools. >> >> This PR is dependent on [pull/25425](https://github.com/openjdk/jdk/pull/25425). As noted in that PR, the changes accumulated in the loom repo, and have been split up to make it easier to review. >> >> The changes include some re-implementation of ThreadDumper. This is because it used PrintStream and didn't fail if there was an I/O error, e.g. file system full. Furthermore, the indentation to pretty print the json was fragile and hard to maintain so this is changed to use a supporting writer class to do this. >> >> Test coverage is significantly expanded, including updating the test library that is used by several tests to parse the thread dump. >> >> Testing: tier1-6 > > Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: > > - Sync up from loom repo, includes review comments > - Merge branch 'pull/25425' into JDK-8356870 > - Temp fixed until fixed in pull/25425 > - Sync up from loom repo, includes review comments > - Merge branch 'pull/25425' into JDK-8356870 > - Merge branch 'pull/25425' into JDK-8356870 > - Initial commit Looks good! ------------- Marked as reviewed by kevinw (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25429#pullrequestreview-2887532349 From ayang at openjdk.org Mon Jun 2 10:51:06 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 2 Jun 2025 10:51:06 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v9] In-Reply-To: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: <-mRIrbyrBpxq1lZ2tfcxIuxRLh5lcoURlM-woAXM45k=.7c152a76-e34f-42ba-b9a7-323102b19371@github.com> > This patch refines Parallel's sizing strategy to improve overall memory management and performance. > > The young generation layout has been reconfigured from the previous `eden-from/to` arrangement to a new `from/to-eden` order. This new layout facilitates young generation resizing, since we perform resizing after a successful young GC when all live objects are located at the beginning of the young generation. Previously, resizing was often inhibited by live objects residing in the middle of the young generation (from-space). The new layout is illustrated in `parallelScavengeHeap.hpp`. > > `NumberSeq` is now used to track various runtime metrics, such as minor/major GC pause durations, promoted/survived bytes after a young GC, highest old generation usage, etc. This tracking primarily lives in `AdaptiveSizePolicy` and its subclass `PSAdaptiveSizePolicy`. 
> > GC overhead checking, which was previously entangled with adaptive resizing logic, has been extracted and is now largely encapsulated in `ParallelScavengeHeap::is_gc_overhead_limit_reached`. > > ## Performance evaluation > > - SPECjvm2008-Compress shows ~8% improvement on Linux/AArch64 and Linux/x64 (restoring the regression reported in [JDK-8332485](https://bugs.openjdk.org/browse/JDK-8332485) and [JDK-8338689](https://bugs.openjdk.org/browse/JDK-8338689)). > - Fixes the surprising behavior when using a non-default (smaller) value of `GCTimeRatio` with Heapothesys/Hyperalloc, as discussed in [this thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050146.html). > - Performance is mostly neutral across other tested benchmarks: **DaCapo**, **SPECjbb2005**, **SPECjbb2015**, **SPECjvm2008**, and **CacheStress**. The number of young-gc sometimes goes up a bit and the total heap-size decreases a bit, because promotion-size-to-old-gen goes down with the more effective eden/survivor-space resizing. > > PS: I have opportunistically set the obsolete/expired version to 25/26 for now. I will update them accordingly before merging. > > Test: tier1-8 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - merge - merge-fix - merge - Merge branch 'master' into pgc-size-policy - Merge branch 'master' into pgc-size-policy - review - Merge branch 'master' into pgc-size-policy - review - Merge branch 'master' into pgc-size-policy - review - ... 
and 2 more: https://git.openjdk.org/jdk/compare/83cb0c6d...08bc74e1 ------------- Changes: https://git.openjdk.org/jdk/pull/25000/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25000&range=08 Stats: 4375 lines in 31 files changed: 522 ins; 3454 del; 399 mod Patch: https://git.openjdk.org/jdk/pull/25000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25000/head:pull/25000 PR: https://git.openjdk.org/jdk/pull/25000 From mgronlun at openjdk.org Mon Jun 2 11:26:00 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Mon, 2 Jun 2025 11:26:00 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: <45mCuuxToelhOdhbJlap5NCUMfgDBrVGIUDGJHAk2Rg=.1dd9d5a6-f2b5-4214-8815-d0a9f0cbddbb@github.com> On Mon, 2 Jun 2025 08:58:28 GMT, Johannes Bechberger wrote: >> How so? > > Because we need to add the threads from the signal handler. So any kind of growing array or set would not work, especially if we want to remove the threads from within the signal handler again. > > This is certainly an area of future optimization, albeit this doesn't seem to have any measurable performance impact in my renaissance benchmark runs. I don't understand what allocation has to do with anything. I'm talking about code branch layout to avoid having to test "has_cpu_time_jfr_requests()" when we know it will be false by default. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120846868 From mgronlun at openjdk.org Mon Jun 2 11:28:59 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Mon, 2 Jun 2025 11:28:59 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: <7Cy88EZJj1ZgHXaAoCY9m1PnB6UAGDJxgK9PI3BVYBQ=.a4fbad7a-19fa-4e1e-999e-8773d2fd7fb1@github.com> On Mon, 2 Jun 2025 09:02:05 GMT, Johannes Bechberger wrote: >> I see.
With a bounded queue as used in this solution, it can work quite nicely, that is, if the thread is actually on CPU in native and not just waiting. If it is waiting (which is most likely), pending requests could take a long time to be sent to consumers.
>>
>> I also understand better the optimization you tried as part of async walk in native and frames. Also quite nice, to walk from the last JfrSampleRequest and do equals to "batch" the top JFR sample requests that are the same (i.e., taken for the ljf). Maybe you can retry that again, but then you need to save the sid AND the tid to be reused for the top equal requests (you only need stacktrace.record_inner() for one request). It's a nice optimization.
>
> The problem is when a new JFR chunk is started in between queue processing. This caused problems before.
>
> I would leave these kinds of optimizations for later.

Then I would recommend you drain immediately when the thread is in native, not waiting for the queue to fill up to 2/3. The reason is that the solution is based on CPU time samples and most threads that are _thread_in_native are waiting (i.e. they will not get their queues filled while in native). I would recommend dropping the second clause about testing the queue size altogether. That way you will not get threads stuck in native for a long time with a lot of undelivered events. Revive it later when you begin to attack the optimizations.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120855119

From jbechberger at openjdk.org  Mon Jun  2 11:32:27 2025
From: jbechberger at openjdk.org (Johannes Bechberger)
Date: Mon, 2 Jun 2025 11:32:27 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v27]
In-Reply-To: 
References: 
Message-ID: 

> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509).
> 
> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests).
This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... different chunk-sizes Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: - Tiny fixes - Minor changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/439763a3..6a83d759 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=25-26 Stats: 90 lines in 9 files changed: 24 ins; 29 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From jbechberger at openjdk.org Mon Jun 2 11:40:00 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 11:40:00 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: <45mCuuxToelhOdhbJlap5NCUMfgDBrVGIUDGJHAk2Rg=.1dd9d5a6-f2b5-4214-8815-d0a9f0cbddbb@github.com> References: <45mCuuxToelhOdhbJlap5NCUMfgDBrVGIUDGJHAk2Rg=.1dd9d5a6-f2b5-4214-8815-d0a9f0cbddbb@github.com> Message-ID: On Mon, 2 Jun 2025 11:22:45 GMT, Markus Gr?nlund wrote: >> Because we need to add the threads from the signal handler. So any kind of growing array or set would not work, especially if we want to remove the threads from within the signal handler again. >> >> This is certainly an area of future optimization, albeit this doesn't seem to have any measurable performance impact in my renaissance benchmark runs. > > I don't understand what allocation has to do with anything. I'm talking about code branch layout to avoid having to test "has_cpu_time_jfr_requests()" when we know it will be false by default. Ah. 
Sorry. Is it about reading the atomic boolean flag again? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120882396 From mgronlun at openjdk.org Mon Jun 2 11:40:02 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 2 Jun 2025 11:40:02 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v27] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 11:32:27 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Tiny fixes > - Minor changes src/hotspot/share/runtime/thread.hpp line 59: > 57: class SafeThreadsListPtr; > 58: class ThreadClosure; > 59: class ThreadCrashProtection; Should not be needed. src/jdk.jfr/share/classes/jdk/jfr/internal/JVM.java line 276: > 274: * Set the maximum event emission rate for the CPU time sampler > 275: * > 276: * Setting rate to 0 turns off the CPU time method sampler. 
"CPU time method sampler" -> "CPU time sampler" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120878701 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120882161 From jbechberger at openjdk.org Mon Jun 2 11:51:26 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 11:51:26 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v28] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... different chunk-sizes Johannes Bechberger has updated the pull request incrementally with three additional commits since the last revision: - Remove header includes - Always trigger async processing - Remove one atomic read ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/6a83d759..e482ad37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=26-27 Stats: 21 lines in 6 files changed: 3 ins; 6 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From mgronlun at openjdk.org Mon Jun 2 11:51:27 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 2 Jun 2025 11:51:27 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v27] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 11:32:27 GMT, Johannes Bechberger wrote: 
>> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Tiny fixes > - Minor changes src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 35: > 33: > 34: #include "jfr/recorder/jfrRecorder.hpp" > 35: #include "jfr/recorder/service/jfrRecorderService.hpp" The two includes above are not needed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120890097 From mgronlun at openjdk.org Mon Jun 2 11:51:27 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 2 Jun 2025 11:51:27 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: <45mCuuxToelhOdhbJlap5NCUMfgDBrVGIUDGJHAk2Rg=.1dd9d5a6-f2b5-4214-8815-d0a9f0cbddbb@github.com> Message-ID: On Mon, 2 Jun 2025 11:37:23 GMT, Johannes Bechberger wrote: >> I don't understand what allocation has to do with anything. I'm talking about code branch layout to avoid having to test "has_cpu_time_jfr_requests()" when we know it will be false by default. > > Ah. Sorry. Is it about reading the atomic boolean flag again? Right. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120897042 From jbechberger at openjdk.org Mon Jun 2 11:51:27 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 11:51:27 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: <45mCuuxToelhOdhbJlap5NCUMfgDBrVGIUDGJHAk2Rg=.1dd9d5a6-f2b5-4214-8815-d0a9f0cbddbb@github.com> Message-ID: On Mon, 2 Jun 2025 11:43:54 GMT, Markus Gr?nlund wrote: >> Ah. Sorry. Is it about reading the atomic boolean flag again? > > Right. I pass it through now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120906973 From kevinw at openjdk.org Mon Jun 2 11:54:54 2025 From: kevinw at openjdk.org (Kevin Walls) Date: Mon, 2 Jun 2025 11:54:54 GMT Subject: RFR: 8345745: Update mode of the Attach API communication pipe. In-Reply-To: <7JMhn1RvB76NFNOmznDRggA2zeygL3_4hySSm7BcNO8=.bc945f0a-67f9-4863-ae3c-49b39b50cfc0@github.com> References: <7JMhn1RvB76NFNOmznDRggA2zeygL3_4hySSm7BcNO8=.bc945f0a-67f9-4863-ae3c-49b39b50cfc0@github.com> Message-ID: <7RBeFDW1BUysN3VH38AYxu-JFsRDKZrZS8PwEdRgv0s=.4f129313-4a80-46d5-86b9-c36b80df0426@github.com> On Thu, 29 May 2025 19:21:42 GMT, Alex Menkov wrote: > Please review this small fix to update pipe mode for attach operation communication. > - `FILE_FLAG_FIRST_PIPE_INSTANCE`: there is "retry" logic if pipe creation failed [1], with this flag `CreateNamedPipe` fails when pipe with the same name already exists; > - `PIPE_REJECT_REMOTE_CLIENTS`: attach works only for local processes, the flag adds extra protection from remote connections. > > [1]: https://github.com/openjdk/jdk/blob/master/src/jdk.attach/windows/classes/sun/tools/attach/VirtualMachineImpl.java#L93 > > Testing: tier1..4, hs-tier5-svc Marked as reviewed by kevinw (Reviewer). 
-------------

PR Review: https://git.openjdk.org/jdk/pull/25530#pullrequestreview-2888050099

From rvansa at openjdk.org  Mon Jun  2 13:09:31 2025
From: rvansa at openjdk.org (Radim Vansa)
Date: Mon, 2 Jun 2025 13:09:31 GMT
Subject: RFR: 8352075: Perf regression accessing fields [v18]
In-Reply-To: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com>
References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com>
Message-ID: <5wG8n_0XjBYjFprdBfdLMIj17sBHnJEtPdBdbi-5yxg=.6896113b-ef76-4a5b-973c-3c286554205f@github.com>

> This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 trying to reduce the performance regression in some scenarios introduced in https://bugs.openjdk.org/browse/JDK-8292818 . Based both on performance and memory consumption it is a (better) alternative to https://github.com/openjdk/jdk/pull/24713 .
> 
> This PR optimizes local field lookup in classes with more than 16 fields; rather than sequentially iterating through all fields during lookup we sort the fields based on the field name. The stream includes extra table after the field information: for field at position 16, 32 ... we record the (variable-length-encoded) offset of the field info in this stream. On field lookup, rather than iterating through all fields, we iterate through this table, resolve names for given fields and continue field-by-field iteration only after the last record (hence at most 16 fields).
> 
> In classes with <= 16 fields this PR reduces the memory consumption by 1 byte that was left with value 0 at the end of stream. In classes with > 16 fields we add extra 4 bytes with offset of the table, and the table contains one varint for each 16 fields. The terminal byte is not used either.
> 
> My measurements on the attached reproducer
> 
> hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC'
> Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC
>   Time (mean ± σ):      51.3 ms ±   2.8 ms    [User: 44.7 ms, System: 13.7 ms]
>   Range (min … max):    45.1 ms …  53.9 ms    100 runs
> 
> hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC'
> Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC
>   Time (mean ± σ):      78.2 ms ±   1.0 ms    [User: 74.6 ms, System: 17.3 ms]
>   Range (min … max):    73.8 ms …  79.7 ms    100 runs
> 
> (the jdk25-master above already contains JDK-8353175)
> 
> hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC'
> Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC
>   Time (mean ± σ):      38.5 ms ±   0.5 ms    [User: 34.4 ms, System: 17.3 ms]
>   Range (min … max):    37.7 ms …  42.1 ms    100 runs
> 
> While https://github.com/openjdk/jdk/pull/24713 returned the performance to previous levels, this PR improves it by 25% compared to JDK 17 (which does not contain the regression)! This time, the undisclosed production-grade reproducer shows even higher improvement:
> 
> JDK 17: 1.6 s
> JDK 21 (no patches): 22 s
> JDK25-master: 12.3 s
> JDK25-this-pr: 0.5 s

Radim Vansa has updated the pull request incrementally with two additional commits since the last revision:

 - Rename pivot -> key, payload -> value, add comments
 - Add gtest

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24847/files
  - new: https://git.openjdk.org/jdk/pull/24847/files/c592ea59..456e1505

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=17
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=16-17

  Stats: 193 lines in 4 files changed: 131 ins; 5 del; 57 mod
  Patch: https://git.openjdk.org/jdk/pull/24847.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24847/head:pull/24847

PR: https://git.openjdk.org/jdk/pull/24847

From rvansa at openjdk.org  Mon Jun  2 13:19:50 2025
From: rvansa at openjdk.org (Radim Vansa)
Date: Mon, 2 Jun 2025 13:19:50 GMT
Subject: RFR: 8352075: Perf regression accessing fields [v19]
In-Reply-To: 
<0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> Message-ID: > This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 trying to reduce the performance regression in some scenarios introduced in https://bugs.openjdk.org/browse/JDK-8292818 . Based both on performance and memory consumption it is a (better) alternative to https://github.com/openjdk/jdk/pull/24713 . > > This PR optimizes local field lookup in classes with more than 16 fields; rather than sequentially iterating through all fields during lookup we sort the fields based on the field name. The stream includes extra table after the field information: for field at position 16, 32 ... we record the (variable-length-encoded) offset of the field info in this stream. On field lookup, rather than iterating through all fields, we iterate through this table, resolve names for given fields and continue field-by-field iteration only after the last record (hence at most 16 fields). > > In classes with <= 16 fields this PR reduces the memory consumption by 1 byte that was left with value 0 at the end of stream. In classes with > 16 fields we add extra 4 bytes with offset of the table, and the table contains one varint for each 16 fields. The terminal byte is not used either. > > My measurements on the attached reproducer > > hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC > Time (mean ? ?): 51.3 ms ? 2.8 ms [User: 44.7 ms, System: 13.7 ms] > Range (min ? max): 45.1 ms ? 53.9 ms 100 runs > > hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC > Time (mean ? ?): 78.2 ms ? 1.0 ms [User: 74.6 ms, System: 17.3 ms] > Range (min ? max): 73.8 ms ? 
79.7 ms 100 runs > > (the jdk25-master above already contains JDK-8353175) > > hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC > Time (mean ? ?): 38.5 ms ? 0.5 ms [User: 34.4 ms, System: 17.3 ms] > Range (min ? max): 37.7 ms ? 42.1 ms 100 runs > > While https://github.com/openjdk/jdk/pull/24713 returned the performance to previous levels, this PR improves it by 25% compared to JDK 17 (which does not contain the regression)! This time, the undisclosed production-grade reproducer shows even higher improvement: > > JDK 17: 1.6 s > JDK 21 (no patches): 22 s > JDK25-master: 12.3 s > JDK25-this-pr: 0.5 s Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Add gtests for number of bytes used ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24847/files - new: https://git.openjdk.org/jdk/pull/24847/files/456e1505..e214a8ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=17-18 Stats: 36 lines in 1 file changed: 35 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24847.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24847/head:pull/24847 PR: https://git.openjdk.org/jdk/pull/24847 From rvansa at openjdk.org Mon Jun 2 13:31:57 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 2 Jun 2025 13:31:57 GMT Subject: RFR: 8352075: Perf regression accessing fields [v19] In-Reply-To: References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> Message-ID: On Mon, 2 Jun 2025 13:19:50 GMT, Radim Vansa wrote: >> This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 trying to reduce the performance regression in some scenarios introduced in https://bugs.openjdk.org/browse/JDK-8292818 . 
Based both on performance and memory consumption it is a (better) alternative to https://github.com/openjdk/jdk/pull/24713 . >> >> This PR optimizes local field lookup in classes with more than 16 fields; rather than sequentially iterating through all fields during lookup we sort the fields based on the field name. The stream includes extra table after the field information: for field at position 16, 32 ... we record the (variable-length-encoded) offset of the field info in this stream. On field lookup, rather than iterating through all fields, we iterate through this table, resolve names for given fields and continue field-by-field iteration only after the last record (hence at most 16 fields). >> >> In classes with <= 16 fields this PR reduces the memory consumption by 1 byte that was left with value 0 at the end of stream. In classes with > 16 fields we add extra 4 bytes with offset of the table, and the table contains one varint for each 16 fields. The terminal byte is not used either. >> >> My measurements on the attached reproducer >> >> hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC' >> Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC >> Time (mean ? ?): 51.3 ms ? 2.8 ms [User: 44.7 ms, System: 13.7 ms] >> Range (min ? max): 45.1 ms ? 53.9 ms 100 runs >> >> hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC' >> Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC >> Time (mean ? ?): 78.2 ms ? 1.0 ms [User: 74.6 ms, System: 17.3 ms] >> Range (min ? max): 73.8 ms ? 79.7 ms 100 runs >> >> (the jdk25-master above already contains JDK-8353175) >> >> hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC' >> Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC >> Time (mean ? ?): 38.5 ms ? 0.5 ms [User: 34.4 ms, System: 17.3 ms] >> Range (min ? max): 37.7 ms ? 
42.1 ms 100 runs >> >> While https://github.com/openjdk/jdk/pull/24713 returned the performance to previous levels, this PR improves it by 25% compared to JDK 17 (which does not contain the regression)! This time, the undisclosed production-grade reproducer shows even higher improvement: >> >> JDK 17: 1.6 s >> JDK 21 (no patches): 22 s >> JDK25-master: 12.3 s >> JDK25-this-pr: 0.5 s > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Add gtests for number of bytes used Fixed the CI failure, and added a gtest for all allowed bit widths and sizes of table from 0 to 99 and 10000. For better testability and reusability (how do I allocate the Array without a classloader?) I've replaced this with pointer + length argument. @rose00 While your suggestion makes sense, when there's a working implementation I would leave it this way for now and leave reading with a different offset up for future improvement: we can have a microbenchmark that would justify this. I would guess that CPU caches would hide multiple memory accesses, and the loop would be unrolled (maybe to even form 4-byte access instead of 4 1-byte...). Also when not using `Array` we can no longer rely on having the 4-byte header. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24847#issuecomment-2930732550 From jbechberger at openjdk.org Mon Jun 2 13:50:49 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 13:50:49 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... 
different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... different chunk-sizes Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Fix bug related to async stack walking ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/e482ad37..09ca4fed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=27-28 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From mgronlun at openjdk.org Mon Jun 2 14:52:03 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 2 Jun 2025 14:52:03 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 13:50:49 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... 
different chunk-sizes
>
> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix bug related to async stack walking

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 349:

> 347:   const frame top_frame = thread->last_frame();
> 348:   bool in_continuation = is_in_continuation(top_frame, thread);
> 349:   for (u4 i = 0; i < queue.size(); i++) {

Realized this drainage is entirely wrong!

You are not using the sample requests in the queue to build individual stack traces for events; instead, you are using the same top frame (the last Java frame) for all of them.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121391177

From mgronlun at openjdk.org  Mon Jun  2 15:04:12 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Mon, 2 Jun 2025 15:04:12 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29]
In-Reply-To: 
References: 
Message-ID: 

On Mon, 2 Jun 2025 13:50:49 GMT, Johannes Bechberger  wrote:

>> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509).
>> 
>> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with
>> - ... different heap sizes
>> - ... different GCs
>> - ... different samplers (the standard JFR and the new CPU Time Sampler and both)
>> - ... different JFR recording durations
>> - ... different chunk-sizes
>
> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix bug related to async stack walking

src/hotspot/share/jfr/jfr.inline.hpp line 41:

> 39: inline void Jfr::check_and_process_sample_request(JavaThread* jt) {
> 40:   JfrThreadLocal* tl = jt->jfr_thread_local();
> 41:   bool has_cpu_time_sample_request = tl->has_cpu_time_jfr_requests();

Why this change?
src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 191:

> 189:   void sample_thread(JfrSampleRequest& request, void* ucontext, JavaThread* jt, JfrThreadLocal* tl);
> 190: 
> 191:   // process the queues for all threads that are in native state (and requested to be sampled)

"requested to be processed" I guess.

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 270:

> 268: void JfrCPUTimeThreadSampler::enroll() {
> 269:   if (Atomic::cmpxchg(&_disenrolled, true, false)) {
> 270:     log_info(jfr)("Enrolling CPU thread sampler");

log_trace, please.

src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 279:

> 277: void JfrCPUTimeThreadSampler::disenroll() {
> 278:   if (!Atomic::cmpxchg(&_disenrolled, false, true)) {
> 279:     log_info(jfr)("Disenrolling CPU thread sampler");

log_trace, please.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121414317
PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121416556
PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121426574
PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121428073

From jbechberger at openjdk.org  Mon Jun  2 15:04:13 2025
From: jbechberger at openjdk.org (Johannes Bechberger)
Date: Mon, 2 Jun 2025 15:04:13 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29]
In-Reply-To: 
References: 
Message-ID: 

On Mon, 2 Jun 2025 14:57:47 GMT, Markus Grönlund  wrote:

>> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix bug related to async stack walking
>
> src/hotspot/share/jfr/jfr.inline.hpp line 41:
> 
>> 39: inline void Jfr::check_and_process_sample_request(JavaThread* jt) {
>> 40:   JfrThreadLocal* tl = jt->jfr_thread_local();
>> 41:   bool has_cpu_time_sample_request = tl->has_cpu_time_jfr_requests();
> 
> Why this change?
So I don't read `tl->has_cpu_time_jfr_requests()` twice on the hot path.

> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 349:
> 
>> 347:   const frame top_frame = thread->last_frame();
>> 348:   bool in_continuation = is_in_continuation(top_frame, thread);
>> 349:   for (u4 i = 0; i < queue.size(); i++) {
> 
> Realized this drainage is entirely wrong!
> 
> You are not using the sample requests in the queue to build individual stack traces for events; instead, you are using the same top frame (the last Java frame) for all of them.

Can I export compute_top_frame and use it here?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121424752
PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121413413

From jbechberger at openjdk.org  Mon Jun  2 15:04:13 2025
From: jbechberger at openjdk.org (Johannes Bechberger)
Date: Mon, 2 Jun 2025 15:04:13 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29]
In-Reply-To: 
References: 
Message-ID: 

On Mon, 2 Jun 2025 14:57:22 GMT, Johannes Bechberger  wrote:

>> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 349:
>> 
>>> 347:   const frame top_frame = thread->last_frame();
>>> 348:   bool in_continuation = is_in_continuation(top_frame, thread);
>>> 349:   for (u4 i = 0; i < queue.size(); i++) {
>> 
>> Realized this drainage is entirely wrong!
>> 
>> You are not using the sample requests in the queue to build individual stack traces for events; instead, you are using the same top frame (the last Java frame) for all of them.
>
> Can I export compute_top_frame and use it here?

Or just create a `Jfr::drain_cpu_time_queue` method?
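
The drainage problem raised in the quoted review comment can be reduced to a toy sketch. This is not the HotSpot code: `SampleRequest`, `Event`, `drain_with_shared_top_frame` and `drain_per_request` are hypothetical names, and a "stack trace" is shrunk to a single program counter. It only illustrates that each queued request should supply the starting point of its own trace, instead of every event reusing the thread's last Java frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins; these are not the HotSpot types.
struct SampleRequest {
  std::uintptr_t sample_pc;  // program counter captured when the sample was requested
};

struct Event {
  std::uintptr_t top_frame_pc;  // where the recorded stack trace starts
};

// Pattern the review objects to: one top frame is reused for every queued request.
std::vector<Event> drain_with_shared_top_frame(const std::vector<SampleRequest>& queue,
                                               std::uintptr_t last_java_frame_pc) {
  std::vector<Event> events;
  for (std::size_t i = 0; i < queue.size(); i++) {
    events.push_back(Event{last_java_frame_pc});
  }
  return events;
}

// Intended pattern: each request supplies the starting point for its own trace.
std::vector<Event> drain_per_request(const std::vector<SampleRequest>& queue) {
  std::vector<Event> events;
  for (const SampleRequest& request : queue) {
    events.push_back(Event{request.sample_pc});
  }
  return events;
}
```

With three queued requests, the first function yields three identical events, while the second yields one distinct event per request.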
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121426469

From mgronlun at openjdk.org  Mon Jun  2 15:12:04 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Mon, 2 Jun 2025 15:12:04 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29]
In-Reply-To: 
References: 
Message-ID: <_CAWRT6nKdljf9SDRnD-SfdXP9L9S6Y9f6I1nGB-4q8=.eb524157-7c00-4f01-8d8a-9e9c60ef4dc7@github.com>

On Mon, 2 Jun 2025 15:01:39 GMT, Johannes Bechberger  wrote:

>> Can I export compute_top_frame and use it here?
>
> Or just create a `Jfr::drain_cpu_time_queue` method?

Try to move the entire:

void JfrCPUTimeThreadSampler::stackwalk_thread_in_native(JavaThread* thread) {
}

into JfrThreadSampling.hpp / jfrThreadSampling.cpp - you can send your JfrCPUTimeThreadSampler events from there.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121456081

From mgronlun at openjdk.org  Mon Jun  2 15:18:11 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Mon, 2 Jun 2025 15:18:11 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29]
In-Reply-To: 
References: 
Message-ID: 

On Mon, 2 Jun 2025 15:01:15 GMT, Johannes Bechberger  wrote:

>> src/hotspot/share/jfr/jfr.inline.hpp line 41:
>> 
>>> 39: inline void Jfr::check_and_process_sample_request(JavaThread* jt) {
>>> 40:   JfrThreadLocal* tl = jt->jfr_thread_local();
>>> 41:   bool has_cpu_time_sample_request = tl->has_cpu_time_jfr_requests();
>> 
>> Why this change?
>
> So I don't read `tl->has_cpu_time_jfr_requests()` twice on the hot path.

Ok, for now. We should try to come up with a better split.
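
The "read the flag once and pass it through" idea from this exchange can be shown in a small standalone sketch. It makes no claim about the actual HotSpot implementation: `ThreadLocalState`, `process_requests` and `check_and_process` are hypothetical names, and `std::atomic<bool>` is only assumed as a stand-in for whatever backs `has_cpu_time_jfr_requests()`. The flag is loaded a single time on the hot path and the value is handed to the slow path, so the common case (no pending requests) costs one load and one branch.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for per-thread JFR state; not the real JfrThreadLocal.
struct ThreadLocalState {
  std::atomic<bool> has_cpu_time_requests{false};
  int loads = 0;      // counts flag reads, for illustration only
  int processed = 0;  // counts how often the slow path did work

  bool load_flag() {
    ++loads;
    return has_cpu_time_requests.load(std::memory_order_acquire);
  }
};

// Slow path: receives the already-read flag value instead of loading it again.
inline void process_requests(ThreadLocalState& tl, bool has_cpu_time_request) {
  if (has_cpu_time_request) {
    ++tl.processed;
    tl.has_cpu_time_requests.store(false, std::memory_order_release);
  }
}

// Hot path: exactly one load of the flag; the value is passed through.
inline void check_and_process(ThreadLocalState& tl) {
  const bool has_cpu_time_request = tl.load_flag();
  if (!has_cpu_time_request) {
    return;  // common case: nothing pending
  }
  process_requests(tl, has_cpu_time_request);
}
```

The `loads` counter exists only so that the single read per call can be checked; a real implementation would not carry it.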
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121466027 From mgronlun at openjdk.org Mon Jun 2 15:18:12 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 2 Jun 2025 15:18:12 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: <_CAWRT6nKdljf9SDRnD-SfdXP9L9S6Y9f6I1nGB-4q8=.eb524157-7c00-4f01-8d8a-9e9c60ef4dc7@github.com> References: <_CAWRT6nKdljf9SDRnD-SfdXP9L9S6Y9f6I1nGB-4q8=.eb524157-7c00-4f01-8d8a-9e9c60ef4dc7@github.com> Message-ID: On Mon, 2 Jun 2025 15:09:30 GMT, Markus Gr?nlund wrote: >> Or just create a `Jfr::drain_cpu_time_queue` method? > > Try to move the entire: > > void JfrCPUTimeThreadSampler::stackwalk_thread_in_native(JavaThread* thread) { > } > > Into JfrThreadSampling.hpp / jfrThreadSampling.cpp - you can send your JfrCPUTimeThreadSampler events from there. Of course, rename the routine to something appropriate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121469433 From mgronlun at openjdk.org Mon Jun 2 15:22:03 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 2 Jun 2025 15:22:03 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 13:50:49 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... 
different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Fix bug related to async stack walking src/hotspot/share/jfr/periodic/sampling/jfrThreadSampling.cpp line 250: > 248: break; > 249: } else { > 250: biased = false; Not correct. There is a top_frame = *current - >biased = true below. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121482514 From rvansa at openjdk.org Mon Jun 2 15:31:46 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 2 Jun 2025 15:31:46 GMT Subject: RFR: 8352075: Perf regression accessing fields [v20] In-Reply-To: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> Message-ID: > This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 trying to reduce the performance regression in some scenarios introduced in https://bugs.openjdk.org/browse/JDK-8292818 . Based both on performance and memory consumption it is a (better) alternative to https://github.com/openjdk/jdk/pull/24713 . > > This PR optimizes local field lookup in classes with more than 16 fields; rather than sequentially iterating through all fields during lookup we sort the fields based on the field name. The stream includes extra table after the field information: for field at position 16, 32 ... we record the (variable-length-encoded) offset of the field info in this stream. On field lookup, rather than iterating through all fields, we iterate through this table, resolve names for given fields and continue field-by-field iteration only after the last record (hence at most 16 fields). > > In classes with <= 16 fields this PR reduces the memory consumption by 1 byte that was left with value 0 at the end of stream. 
In classes with > 16 fields we add extra 4 bytes with offset of the table, and the table contains one varint for each 16 fields. The terminal byte is not used either. > > My measurements on the attached reproducer > > hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC > Time (mean ± σ): 51.3 ms ± 2.8 ms [User: 44.7 ms, System: 13.7 ms] > Range (min … max): 45.1 ms … 53.9 ms 100 runs > > hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC > Time (mean ± σ): 78.2 ms ± 1.0 ms [User: 74.6 ms, System: 17.3 ms] > Range (min … max): 73.8 ms … 79.7 ms 100 runs > > (the jdk25-master above already contains JDK-8353175) > > hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC > Time (mean ± σ): 38.5 ms ± 0.5 ms [User: 34.4 ms, System: 17.3 ms] > Range (min … max): 37.7 ms … 42.1 ms 100 runs > > While https://github.com/openjdk/jdk/pull/24713 returned the performance to previous levels, this PR improves it by 25% compared to JDK 17 (which does not contain the regression)!
This time, the undisclosed production-grade reproducer shows even higher improvement: > > JDK 17: 1.6 s > JDK 21 (no patches): 22 s > JDK25-master: 12.3 s > JDK25-this-pr: 0.5 s Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Fix error on windows ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24847/files - new: https://git.openjdk.org/jdk/pull/24847/files/e214a8ec..7d8b4a19 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=18-19 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24847.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24847/head:pull/24847 PR: https://git.openjdk.org/jdk/pull/24847 From mgronlun at openjdk.org Mon Jun 2 17:29:04 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 2 Jun 2025 17:29:04 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 13:50:49 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Fix bug related to async stack walking src/hotspot/share/memory/resourceArea.hpp line 46: > 44: // A ResourceArea is an Arena that supports safe usage of ResourceMark. 
> 45: class ResourceArea: public Arena { > 46: Changes in this file are unrelated, so revert this entire file. src/hotspot/share/prims/forte.cpp line 575: > 573: extern "C" { > 574: JNIEXPORT > 575: void AsyncGetCallTrace(ASGCT_CallTrace *trace, jint depth, void* ucontext) { Unrelated changes, please revert file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121757461 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121757998 From amenkov at openjdk.org Mon Jun 2 18:15:55 2025 From: amenkov at openjdk.org (Alex Menkov) Date: Mon, 2 Jun 2025 18:15:55 GMT Subject: Integrated: 8345745: Update mode of the Attach API communication pipe. In-Reply-To: <7JMhn1RvB76NFNOmznDRggA2zeygL3_4hySSm7BcNO8=.bc945f0a-67f9-4863-ae3c-49b39b50cfc0@github.com> References: <7JMhn1RvB76NFNOmznDRggA2zeygL3_4hySSm7BcNO8=.bc945f0a-67f9-4863-ae3c-49b39b50cfc0@github.com> Message-ID: <1W-GAMXMqO3OuUw2_aqgIimebVyFYIKbMwwrgozRR3I=.d24002b4-017a-433b-a7c1-53c1e61f1013@github.com> On Thu, 29 May 2025 19:21:42 GMT, Alex Menkov wrote: > Please review this small fix to update pipe mode for attach operation communication. > - `FILE_FLAG_FIRST_PIPE_INSTANCE`: there is "retry" logic if pipe creation failed [1], with this flag `CreateNamedPipe` fails when pipe with the same name already exists; > - `PIPE_REJECT_REMOTE_CLIENTS`: attach works only for local processes, the flag adds extra protection from remote connections. > > [1]: https://github.com/openjdk/jdk/blob/master/src/jdk.attach/windows/classes/sun/tools/attach/VirtualMachineImpl.java#L93 > > Testing: tier1..4, hs-tier5-svc This pull request has now been integrated. Changeset: ec02a87a Author: Alex Menkov URL: https://git.openjdk.org/jdk/commit/ec02a87aeef008f6b2f94001fa33bac66bf24627 Stats: 5 lines in 1 file changed: 1 ins; 0 del; 4 mod 8345745: Update mode of the Attach API communication pipe. 
Reviewed-by: sspitsyn, kevinw ------------- PR: https://git.openjdk.org/jdk/pull/25530 From vyazici at openjdk.org Mon Jun 2 18:39:33 2025 From: vyazici at openjdk.org (Volkan Yazici) Date: Mon, 2 Jun 2025 18:39:33 GMT Subject: RFR: 8357993: Use "stdin.encoding" for reading System.in with InputStreamReader/Scanner [hotspot] [v2] In-Reply-To: References: Message-ID: > Passes the `Charset` read from the `stdin.encoding` system property while creating `InputStreamReader` or `Scanner` instances for `System.in`. > > `stdin.encoding` is a recently added property for Java 25 in [JDK-8350703](https://bugs.openjdk.org/browse/JDK-8350703). Employing it throughout the entire code base is addressed by the parent ticket [JDK-8356893](https://bugs.openjdk.org/browse/JDK-8356893). JDK-8357993 this PR is addressing is a sub-task of JDK-8356893 and is concerned with only areas related to Hotspot. Volkan Yazici has updated the pull request incrementally with one additional commit since the last revision: Provide fallback for `stdin.encoding` ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25542/files - new: https://git.openjdk.org/jdk/pull/25542/files/d7751294..45bdc4fd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25542&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25542&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25542.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25542/head:pull/25542 PR: https://git.openjdk.org/jdk/pull/25542 From shade at openjdk.org Mon Jun 2 18:47:28 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Jun 2025 18:47:28 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 13:50:49 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). 
>> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Fix bug related to async stack walking Scanned this briefly, would do another pass tomorrow. src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 117: > 115: > 116: bool JfrCPUTimeTraceQueue::is_empty() const { > 117: return Atomic::load(&_head) == 0; Not entirely clear what is the memory semantics for accessing `_head`. Does it need to be acq/rel? If so, this one should be `::load_acquire`? src/hotspot/share/memory/resourceArea.hpp line 46: > 44: // A ResourceArea is an Arena that supports safe usage of ResourceMark. > 45: class ResourceArea: public Arena { > 46: All the changes in this file are unnecessary, please revert. src/jdk.jfr/share/classes/jdk/jfr/internal/JVM.java line 281: > 279: * @param autoadapt true if the rate should be adapted automatically > 280: */ > 281: public static native void setCPUThrottle(double rate, boolean autoadapt); Suggestion: public static native void setCPUThrottle(double rate, boolean autoAdapt); test/jdk/jdk/jfr/event/profiling/TestSamplingLongPeriod.java line 42: > 40: public class TestSamplingLongPeriod { > 41: > 42: static String sampleEvent = EventNames.ExecutionSample; Does not look necessary to change? 
------------- PR Review: https://git.openjdk.org/jdk/pull/25302#pullrequestreview-2888004951 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121900364 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121610476 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121587105 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121584954 From shade at openjdk.org Mon Jun 2 18:47:28 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Jun 2025 18:47:28 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v24] In-Reply-To: References: <-QiSWEqppeW60aedVbLA3WTmnba7Fry53Qr86wE2EPs=.7a6327ce-7ef0-4b1c-bc68-0421ba3fd46f@github.com> Message-ID: On Sun, 1 Jun 2025 07:19:54 GMT, Johannes Bechberger wrote: >> Thanks for catching this mistake. I'll fix it this afternoon. > > I fixed it by changing the JEP. Hold on, shouldn't this really be "Lost"? @egahlin and @mgronlun need to chime in here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120893338 From shade at openjdk.org Mon Jun 2 18:47:30 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Jun 2025 18:47:30 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v27] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 11:32:27 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... 
different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Tiny fixes > - Minor changes src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 30: > 28: #include "runtime/orderAccess.hpp" > 29: #include "utilities/ticks.hpp" > 30: #include "jfr/periodic/sampling/jfrCPUTimeThreadSampler.hpp" Include order? src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 60: > 58: assert(raw_thread->is_Java_thread(), "invariant"); > 59: JavaThread* jt; > 60: if ((jt = JavaThread::cast(raw_thread))->is_exiting()) { I see no point to be extra-smart with inline assignments here: Suggestion: JavaThread* jt = JavaThread::cast(raw_thread); if (jt->is_exiting()) { src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 115: > 113: JfrCPUTimeSampleRequest* new_data = JfrCHeapObj::new_array(capacity); > 114: JfrCHeapObj::free(_data, _capacity * sizeof(JfrCPUTimeSampleRequest)); > 115: _data = new_data; A bit of peak memory consumption improvement: don't have two things live at once. Plus, give the native allocator a chance to reuse the same location. 
Suggestion: JfrCHeapObj::free(_data, _capacity * sizeof(JfrCPUTimeSampleRequest)); _data = JfrCHeapObj::new_array(capacity); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120895107 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120897472 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120909443 From cjplummer at openjdk.org Mon Jun 2 19:02:53 2025 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 2 Jun 2025 19:02:53 GMT Subject: RFR: 8357993: Use "stdin.encoding" for reading System.in with InputStreamReader/Scanner [hotspot] [v2] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 18:39:33 GMT, Volkan Yazici wrote: >> Passes the `Charset` read from the `stdin.encoding` system property while creating `InputStreamReader` or `Scanner` instances for `System.in`. >> >> `stdin.encoding` is a recently added property for Java 25 in [JDK-8350703](https://bugs.openjdk.org/browse/JDK-8350703). Employing it throughout the entire code base is addressed by the parent ticket [JDK-8356893](https://bugs.openjdk.org/browse/JDK-8356893). JDK-8357993 this PR is addressing is a sub-task of JDK-8356893 and is concerned with only areas related to Hotspot. > > Volkan Yazici has updated the pull request incrementally with one additional commit since the last revision: > > Provide fallback for `stdin.encoding` src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/CLHSDB.java line 111: > 109: > 110: Charset charset = Charset.forName(System.getProperty("stdin.encoding"), Charset.defaultCharset()); > 111: BufferedReader in = new BufferedReader(new InputStreamReader(System.in, charset)); Why in some cases are you special casing the default charset as you do here, but in other cases you are not? Also, the Charset.forName() API you are using here is new in 18, which means backports will need to address that.
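For readers following the backport point above, here is a minimal sketch of how the same fallback could be written without the two-argument `Charset.forName(String, Charset)` overload, which was added in JDK 18. The class and method names (`StdinCharset`, `resolve`) are hypothetical and not part of the PR. Treating a null name as a fallback case is also a choice made by this sketch, not something the PR specifies.

```java
import java.nio.charset.Charset;

public class StdinCharset {
    // Hypothetical helper (not from the PR): resolves a charset name using
    // only APIs that predate JDK 18, returning the fallback when the name is
    // null, malformed, or not supported by this runtime.
    static Charset resolve(String name, Charset fallback) {
        if (name == null) {
            // Assumption of this sketch: a missing property means "fall back".
            return fallback;
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException and UnsupportedCharsetException
            // are both subclasses of IllegalArgumentException.
            return fallback;
        }
    }

    public static void main(String[] args) {
        Charset cs = resolve(System.getProperty("stdin.encoding"),
                             Charset.defaultCharset());
        System.out.println("Reading System.in as " + cs);
    }
}
```

A backport could inline this logic at each call site or keep a small shared helper; which is preferable depends on how many call sites the older release has.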
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25542#discussion_r2121939764 From cjplummer at openjdk.org Mon Jun 2 19:07:54 2025 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 2 Jun 2025 19:07:54 GMT Subject: RFR: 8357826: Avoid running some jtreg tests when asan is configured [v2] In-Reply-To: References: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> Message-ID: <9JQNK3tYLfg04pRpUiGpPYWoSunSfqWB61lkLxSPxwk=.a781defd-ea0e-4ebf-aa7f-01fff2e63101@github.com> On Mon, 2 Jun 2025 08:07:38 GMT, Matthias Baesken wrote: >> There are a couple of jtreg tests, especially in the HS area, with very special assumptions about memory layout/sizes . >> Those fail when the address sanitizer is configured ( --enable-asan ). >> The change adds a way to tag those tests with 'requires' so that they can be avoided easily when running jtreg tests with ASAN enabled. >> Adjusting the tests for "pleasing" the sanitizer is not always desired (if possible for some tests it can be done later) . >> While at it, also same is also added for ubsan . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > TestBreakSignalThreadDump has issues with asan Can you document why each tests fails so we have it on record? Can be done in the PR or the CR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25575#issuecomment-2932080104 From vyazici at openjdk.org Mon Jun 2 19:27:08 2025 From: vyazici at openjdk.org (Volkan Yazici) Date: Mon, 2 Jun 2025 19:27:08 GMT Subject: RFR: 8357995: Use "stdin.encoding" for reading System.in with InputStreamReader/Scanner [core] [v2] In-Reply-To: References: Message-ID: > Passes the `Charset` read from the `stdin.encoding` system property while creating `InputStreamReader` or `Scanner` instances for `System.in`. 
> > `stdin.encoding` is a recently added property for Java 25 in [JDK-8350703](https://bugs.openjdk.org/browse/JDK-8350703). Employing it throughout the entire code base is addressed by the parent ticket [JDK-8356893](https://bugs.openjdk.org/browse/JDK-8356893). JDK-8357995 this PR is addressing is a sub-task of JDK-8356893 and is concerned with only areas related to core libraries. Volkan Yazici has updated the pull request incrementally with two additional commits since the last revision: - Provide fallback for `stdin.encoding` - Revert changes to `Application` and `JavaChild` There stdin is connected to the parent process rather than the console. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25544/files - new: https://git.openjdk.org/jdk/pull/25544/files/3da53cc9..ef30c050 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25544&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25544&range=00-01 Stats: 28 lines in 8 files changed: 14 ins; 3 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/25544.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25544/head:pull/25544 PR: https://git.openjdk.org/jdk/pull/25544 From vyazici at openjdk.org Mon Jun 2 19:27:09 2025 From: vyazici at openjdk.org (Volkan Yazici) Date: Mon, 2 Jun 2025 19:27:09 GMT Subject: RFR: 8357995: Use "stdin.encoding" for reading System.in with InputStreamReader/Scanner [core] [v2] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 14:23:24 GMT, Alan Bateman wrote: >> Volkan Yazici has updated the pull request incrementally with two additional commits since the last revision: >> >> - Provide fallback for `stdin.encoding` >> - Revert changes to `Application` and `JavaChild` >> >> There stdin is connected to the parent process rather than the console. 
> > test/jdk/com/sun/tools/attach/Application.java line 40: > >> 38: >> 39: try (BufferedReader br = new BufferedReader(new InputStreamReader( >> 40: System.in, System.getProperty("stdin.encoding")))) { > > This "application" is launched by the test so connected to the parent process rather than the console. Reverted in 2d52ba408. > test/jdk/java/lang/ProcessHandle/JavaChild.java line 315: > >> 313: // children and wait for each to exit >> 314: sendResult(action, "start"); >> 315: try (Reader reader = new InputStreamReader(System.in, System.getProperty("stdin.encoding")); > > I didn't study the test closely but I think this is another case where a child process is launched so System.in is connected to the parent rather than the console. Reverted in 2d52ba408.. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25544#discussion_r2121978546 PR Review Comment: https://git.openjdk.org/jdk/pull/25544#discussion_r2121978885 From cjplummer at openjdk.org Mon Jun 2 19:29:50 2025 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 2 Jun 2025 19:29:50 GMT Subject: RFR: 8357995: Use "stdin.encoding" for reading System.in with InputStreamReader/Scanner [core] [v2] In-Reply-To: References: Message-ID: <8V_c_ulLVTPp6ivKsU3vsslAV3l4b41mRhDKbWGv5Qk=.ea887db3-4975-4c23-b907-d02ba90f2b44@github.com> On Mon, 2 Jun 2025 19:27:08 GMT, Volkan Yazici wrote: >> Passes the `Charset` read from the `stdin.encoding` system property while creating `InputStreamReader` or `Scanner` instances for `System.in`. >> >> `stdin.encoding` is a recently added property for Java 25 in [JDK-8350703](https://bugs.openjdk.org/browse/JDK-8350703). Employing it throughout the entire code base is addressed by the parent ticket [JDK-8356893](https://bugs.openjdk.org/browse/JDK-8356893). JDK-8357995 this PR is addressing is a sub-task of JDK-8356893 and is concerned with only areas related to core libraries. 
> > Volkan Yazici has updated the pull request incrementally with two additional commits since the last revision: > > - Provide fallback for `stdin.encoding` > - Revert changes to `Application` and `JavaChild` > > There stdin is connected to the parent process rather than the console. test/jdk/com/sun/jdi/MultiBreakpointsTest.java line 141: > 139: Thread console(final int num, final int nhits) { > 140: final InputStreamReader isr = new InputStreamReader( > 141: System.in, Charset.forName(System.getProperty("stdin.encoding"))); `isr` is not really needed. It is used to create `br`, which is never used. It is also synchronized on, but since there is a unique `isr` for each thread, the synchronization does nothing. I suggest just deleting `isr`, `br`, and the `synchronized` below. Note there is a hint in a comment as to why it is like this: // This is a tendril from the original jdb test. // It could probably be deleted. I think this test once used jdb (and had to deal with the jdb console), but no longer does. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25544#discussion_r2121984300 From vyazici at openjdk.org Mon Jun 2 19:41:52 2025 From: vyazici at openjdk.org (Volkan Yazici) Date: Mon, 2 Jun 2025 19:41:52 GMT Subject: RFR: 8357993: Use "stdin.encoding" for reading System.in with InputStreamReader/Scanner [hotspot] [v2] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 19:00:23 GMT, Chris Plummer wrote: > Why in some cases are you special casing the the default charset as you do here, but in other cases you are not? @plummercj, as requested in the parent ticket (i.e., [JDK-8356893](https://bugs.openjdk.org/browse/JDK-8356893)), I added a fallback, except for tests. AFAICT, if `stdin.encoding` is either missing or contains an invalid value in a test, we ideally should fail the test. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25542#discussion_r2122007106 From vyazici at openjdk.org Mon Jun 2 19:50:10 2025 From: vyazici at openjdk.org (Volkan Yazici) Date: Mon, 2 Jun 2025 19:50:10 GMT Subject: RFR: 8357995: Use "stdin.encoding" for reading System.in with InputStreamReader/Scanner [core] [v3] In-Reply-To: References: Message-ID: > Passes the `Charset` read from the `stdin.encoding` system property while creating `InputStreamReader` or `Scanner` instances for `System.in`. > > `stdin.encoding` is a recently added property for Java 25 in [JDK-8350703](https://bugs.openjdk.org/browse/JDK-8350703). Employing it throughout the entire code base is addressed by the parent ticket [JDK-8356893](https://bugs.openjdk.org/browse/JDK-8356893). JDK-8357995 this PR is addressing is a sub-task of JDK-8356893 and is concerned with only areas related to core libraries. Volkan Yazici has updated the pull request incrementally with one additional commit since the last revision: Clean-up `MultiBreakpointsTarg` ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25544/files - new: https://git.openjdk.org/jdk/pull/25544/files/ef30c050..8f8a6575 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25544&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25544&range=01-02 Stats: 65 lines in 1 file changed: 5 ins; 21 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/25544.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25544/head:pull/25544 PR: https://git.openjdk.org/jdk/pull/25544 From vyazici at openjdk.org Mon Jun 2 19:50:11 2025 From: vyazici at openjdk.org (Volkan Yazici) Date: Mon, 2 Jun 2025 19:50:11 GMT Subject: RFR: 8357995: Use "stdin.encoding" for reading System.in with InputStreamReader/Scanner [core] [v3] In-Reply-To: <8V_c_ulLVTPp6ivKsU3vsslAV3l4b41mRhDKbWGv5Qk=.ea887db3-4975-4c23-b907-d02ba90f2b44@github.com> References: 
<8V_c_ulLVTPp6ivKsU3vsslAV3l4b41mRhDKbWGv5Qk=.ea887db3-4975-4c23-b907-d02ba90f2b44@github.com> Message-ID: On Mon, 2 Jun 2025 19:27:14 GMT, Chris Plummer wrote: >> Volkan Yazici has updated the pull request incrementally with one additional commit since the last revision: >> >> Clean-up `MultiBreakpointsTarg` > > test/jdk/com/sun/jdi/MultiBreakpointsTest.java line 141: > >> 139: Thread console(final int num, final int nhits) { >> 140: final InputStreamReader isr = new InputStreamReader( >> 141: System.in, Charset.forName(System.getProperty("stdin.encoding"))); > > `isr` is not really needed. It is used to create `br`, which is never used. It is also synchronized on, but since there is a unique `isr` for each thread, the synchronization does nothing. I suggest just deleting `isr`, `br`, and the `synchronized` below. > > Note there is a hint in a comment as to why it is like this: > > > // This is a tendril from the original jdb test. > // It could probably be deleted. > > > I think this test once used jdb (and had to deal with the jdb console), but no longer does. Implemented your suggestion in 8f8a65754; took the liberty to remove the unused `done` too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25544#discussion_r2122019203 From mgronlun at openjdk.org Mon Jun 2 20:07:02 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Mon, 2 Jun 2025 20:07:02 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 13:50:49 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ...
different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Fix bug related to async stack walking src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 49: > 47: > 48: static bool is_excluded(JavaThread* thread) { > 49: return thread->is_hidden_from_external_view() || thread->jfr_thread_local()->is_excluded(); I think I misled you saying that JfrRecorder_thread would be excluded by the above expression. That was true - but not anymore. Our exclusion test looks like: static inline bool is_excluded(JavaThread* jt) { assert(jt != nullptr, "invariant"); return jt->is_Compiler_thread() || jt->is_hidden_from_external_view() || jt->is_JfrRecorder_thread() || jt->jfr_thread_local()->is_excluded(); } I like you could fold jt->is_Compiler_thread() into jt->is_hidden_from_external_view() - good!. But can you please again list the condition jt->is_JfrRecorder_thread() ? Sorry, I forgot we had removed it from being considered excluded on the JfrThreadLocal level. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2122045043 From sspitsyn at openjdk.org Mon Jun 2 20:15:52 2025 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 2 Jun 2025 20:15:52 GMT Subject: RFR: 8356870: HotSpotDiagnosticMXBean.dumpThreads and jcmd Thread.dump_to_file updates [v4] In-Reply-To: References: <3avXpsIbMYIQBAr6mO9K3MhewKnNRt6JthztMleZEGI=.f806b009-3c4f-43c2-8728-7cec95048ae0@github.com> Message-ID: On Sat, 31 May 2025 08:07:48 GMT, Alan Bateman wrote: >> Updates the thread dump generated by HotSpotDiagnosticMXBean.dumpThreads and jcmd Thread.dump_to_file to include thread state and lock information. 
Also update the HotSpotDiagnosticMXBean.dumpThreads API description to link to a description of the JSON format dump as that format is intended to be parseable/read by tools. >> >> This PR is dependent on [pull/25425](https://github.com/openjdk/jdk/pull/25425). As noted in that PR, the changes accumulated in the loom repo, and have been split up to make it easier to review. >> >> The changes include some re-implementation of ThreadDumper. This is because it used PrintStream and didn't fail if there was an I/O error, e.g. file system full. Furthermore, the indentation to pretty print the json was fragile and hard to maintain so this is changed to use a supporting writer class to do this. >> >> Test coverage is significantly expanded, including updating the test library that is used by several tests to parse the thread dump. >> >> Testing: tier1-6 > > Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Sync up from loom repo, includes review comments > - Merge branch 'pull/25425' into JDK-8356870 > - Temp fixed until fixed in pull/25425 > - Sync up from loom repo, includes review comments > - Merge branch 'pull/25425' into JDK-8356870 > - Merge branch 'pull/25425' into JDK-8356870 > - Initial commit Looks good. ------------- Marked as reviewed by sspitsyn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25429#pullrequestreview-2889804529 From dholmes at openjdk.org Mon Jun 2 21:56:02 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 2 Jun 2025 21:56:02 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 18:37:14 GMT, Aleksey Shipilev wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bug related to async stack walking > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 117: > >> 115: >> 116: bool JfrCPUTimeTraceQueue::is_empty() const { >> 117: return Atomic::load(&_head) == 0; > > Not entirely clear what is the memory semantics for accessing `_head`. Does it need to be acq/rel? If so, this one should be `::load_acquire`? Many of the accesses to head do not appear to synchronize with anything and so do not need acquire semantics. But the overall concurrency properties of this code are very unclear to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2122228261 From cjplummer at openjdk.org Mon Jun 2 22:03:51 2025 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 2 Jun 2025 22:03:51 GMT Subject: RFR: 8357995: Use "stdin.encoding" for reading System.in with InputStreamReader/Scanner [core] [v3] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 19:50:10 GMT, Volkan Yazici wrote: >> Passes the `Charset` read from the `stdin.encoding` system property while creating `InputStreamReader` or `Scanner` instances for `System.in`. >> >> `stdin.encoding` is a recently added property for Java 25 in [JDK-8350703](https://bugs.openjdk.org/browse/JDK-8350703). Employing it throughout the entire code base is addressed by the parent ticket [JDK-8356893](https://bugs.openjdk.org/browse/JDK-8356893). 
JDK-8357995 this PR is addressing is a sub-task of JDK-8356893 and is concerned with only areas related to core libraries. > > Volkan Yazici has updated the pull request incrementally with one additional commit since the last revision: > > Clean-up `MultiBreakpointsTarg` The src/jdk.jdi and test/jdk/com/sun/jdi changes look good. ------------- Marked as reviewed by cjplummer (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25544#pullrequestreview-2890059517 From cjplummer at openjdk.org Mon Jun 2 22:07:52 2025 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 2 Jun 2025 22:07:52 GMT Subject: RFR: 8357993: Use "stdin.encoding" for reading System.in with InputStreamReader/Scanner [hotspot] [v2] In-Reply-To: References: Message-ID: <-rIQHptI3h_uZTew6KYZLXC--aq6_Sxoe5wg1Xe7xq8=.f41290f7-7c61-4849-80d6-e65cf47d8733@github.com> On Mon, 2 Jun 2025 18:39:33 GMT, Volkan Yazici wrote: >> Passes the `Charset` read from the `stdin.encoding` system property while creating `InputStreamReader` or `Scanner` instances for `System.in`. >> >> `stdin.encoding` is a recently added property for Java 25 in [JDK-8350703](https://bugs.openjdk.org/browse/JDK-8350703). Employing it throughout the entire code base is addressed by the parent ticket [JDK-8356893](https://bugs.openjdk.org/browse/JDK-8356893). JDK-8357993 this PR is addressing is a sub-task of JDK-8356893 and is concerned with only areas related to Hotspot. > > Volkan Yazici has updated the pull request incrementally with one additional commit since the last revision: > > Provide fallback for `stdin.encoding` Changes look good. Can you clarify your testing? ------------- Marked as reviewed by cjplummer (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25542#pullrequestreview-2890065248 From coleenp at openjdk.org Tue Jun 3 00:14:57 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 3 Jun 2025 00:14:57 GMT Subject: RFR: 8352075: Perf regression accessing fields [v20] In-Reply-To: References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> Message-ID: On Mon, 2 Jun 2025 15:31:46 GMT, Radim Vansa wrote: >> This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 trying to reduce the performance regression in some scenarios introduced in https://bugs.openjdk.org/browse/JDK-8292818 . Based both on performance and memory consumption it is a (better) alternative to https://github.com/openjdk/jdk/pull/24713 . >> >> This PR optimizes local field lookup in classes with more than 16 fields; rather than sequentially iterating through all fields during lookup we sort the fields based on the field name. The stream includes an extra table after the field information: for the fields at positions 16, 32, ... we record the (variable-length-encoded) offset of the field info in this stream. On field lookup, rather than iterating through all fields, we iterate through this table, resolve names for given fields and continue field-by-field iteration only after the last record (hence at most 16 fields). >> >> In classes with <= 16 fields this PR reduces the memory consumption by 1 byte that was left with value 0 at the end of the stream. In classes with > 16 fields we add an extra 4 bytes holding the offset of the table, and the table contains one varint for every 16 fields. The terminal byte is not used either. >> >> My measurements on the attached reproducer >> >> hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC' >> Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC >> Time (mean ± σ): 51.3 ms ± 2.8 ms [User: 44.7 ms, System: 13.7 ms] >> Range (min … max): 45.1 ms … 
53.9 ms 100 runs >> >> hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC' >> Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC >> Time (mean ± σ): 78.2 ms ± 1.0 ms [User: 74.6 ms, System: 17.3 ms] >> Range (min … max): 73.8 ms … 79.7 ms 100 runs >> >> (the jdk25-master above already contains JDK-8353175) >> >> hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC' >> Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC >> Time (mean ± σ): 38.5 ms ± 0.5 ms [User: 34.4 ms, System: 17.3 ms] >> Range (min … max): 37.7 ms … 42.1 ms 100 runs >> >> While https://github.com/openjdk/jdk/pull/24713 returned the performance to previous levels, this PR improves it by 25% compared to JDK 17 (which does not contain the regression)! This time, the undisclosed production-grade reproducer shows an even higher improvement: >> >> JDK 17: 1.6 s >> JDK 21 (no patches): 22 s >> JDK25-master: 12.3 s >> JDK25-this-pr: 0.5 s > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Fix error on windows It all seems reasonable until I got to the packing code and it'll take a long time to figure out how it works. Maybe some comments would help. I have 3 general comments though: 1. The coding style guide somewhere says that the * belongs with the type and not the name. This is inconsistent in this code. Can you fix it? 2. Block comments (except copyright) should use // not /* */ 3. The jtreg test directory name should not be the bug ID. I think this test can go in directory runtime/FieldLayout. src/hotspot/share/utilities/packedTable.hpp line 38: > 36: uint32_t _key_mask; > 37: unsigned int _value_shift; > 38: uint32_t _value_mask; Aren't all 4 of these types the same? can you make them all uint32_t or all unsigned int? (former preferred). 
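For readers trying to follow the lookup scheme from the quoted PR description (sorted fields, a search table with one entry per 16 fields, then a short linear scan), here is a minimal sketch. It is only an illustration under loud assumptions: `FieldIndex`, `kStride`, `names`, `table` and `find` are hypothetical names, fields are modelled as plain strings sorted lexicographically, and the table stores positions rather than varint-encoded stream offsets. It is not the `packedTable.hpp` or field stream code from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the field stream: only field names, kept sorted.
struct FieldIndex {
  static constexpr std::size_t kStride = 16;  // one table entry per 16 fields
  std::vector<std::string> names;             // field names in sorted order
  std::vector<std::size_t> table;             // positions 16, 32, ... into names

  explicit FieldIndex(std::vector<std::string> fields) : names(std::move(fields)) {
    std::sort(names.begin(), names.end());
    // Classes with <= 16 fields get an empty table and a plain linear scan.
    for (std::size_t pos = kStride; pos < names.size(); pos += kStride) {
      table.push_back(pos);
    }
  }

  // Returns the position of `name` in sorted order, or -1 if it is absent.
  long find(const std::string& name) const {
    std::size_t start = 0;
    // Step 1: walk the table to find the last recorded field that is <= name.
    for (std::size_t pos : table) {
      assert(pos < names.size());
      if (names[pos] > name) break;
      start = pos;
    }
    // Step 2: field-by-field scan from there, covering at most kStride fields.
    std::size_t end = std::min(start + kStride, names.size());
    for (std::size_t i = start; i < end; ++i) {
      if (names[i] == name) return static_cast<long>(i);
    }
    return -1;
  }
};
```

With 40 fields the table in this sketch has two entries (positions 16 and 32), so a lookup compares against at most 2 table entries plus at most 16 fields instead of up to 40 fields.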
------------- PR Review: https://git.openjdk.org/jdk/pull/24847#pullrequestreview-2890214085 PR Review Comment: https://git.openjdk.org/jdk/pull/24847#discussion_r2122347635 From coleenp at openjdk.org Tue Jun 3 00:14:58 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 3 Jun 2025 00:14:58 GMT Subject: RFR: 8352075: Perf regression accessing fields [v20] In-Reply-To: References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> Message-ID: On Mon, 2 Jun 2025 23:49:51 GMT, Coleen Phillimore wrote: >> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix error on windows > > src/hotspot/share/utilities/packedTable.hpp line 38: > >> 36: uint32_t _key_mask; >> 37: unsigned int _value_shift; >> 38: uint32_t _value_mask; > > Aren't all 4 of these types the same? can you make them all uint32_t or all unsigned int? (former preferred). Can you explain somewhere how fields are mapped to this? I assume they're sorted, for some reason I expected the packed table to be {name-cp-index, sig-cp-index, offset-in-fieldstream-for-direct-access}. Does every field get 4 ints ? So why is it packed into ```Array``` rather than just use ```Array```? So much packing code that I don't know how anyone could ever debug it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24847#discussion_r2122360613 From dholmes at openjdk.org Tue Jun 3 00:16:03 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 3 Jun 2025 00:16:03 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 09:24:53 GMT, Johannes Bechberger wrote: >> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 119: >> >>> 117: _data = new_data; >>> 118: _capacity = capacity; >>> 119: } >> >> I assume there is a lock protecting this so it happens atomically? 
> > This happens before the signal handler is attached to thread. So it does happen before any parallelism is introduced on thread creation. I'm missing the big picture here unfortunately. This looks like it can get called repeatedly as needed to change capacity. Are you saying it only gets called once before we create the sampler thread? Is the concurrency model described somewhere? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2122365626 From dholmes at openjdk.org Tue Jun 3 00:28:05 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 3 Jun 2025 00:28:05 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: <_Q0iW6TuzM0P1qeE2XsMZbTx3lfCgW9QDEsf3-FlRYE=.b6707a06-3d91-4764-a8d8-7eaa76680584@github.com> On Mon, 2 Jun 2025 21:53:38 GMT, David Holmes wrote: >> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 117: >> >>> 115: >>> 116: bool JfrCPUTimeTraceQueue::is_empty() const { >>> 117: return Atomic::load(&_head) == 0; >> >> Not entirely clear what is the memory semantics for accessing `_head`. Does it need to be acq/rel? If so, this one should be `::load_acquire`? > > Many of the accesses to head do not appear to synchronize with anything and so do not need acquire semantics. But the overall concurrency properties of this code are very unclear to me. To be clear, you only need acquire semantics here if after seeing the value 0 you need to access fields that were written before `_head` was set to 0. Similarly for most of the other access to `_head`. 
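To make the distinction above concrete, here is a minimal sketch of when an acquire load is needed and when it is not. It uses standard C++ `std::atomic` as a stand-in for HotSpot's `Atomic::load` and `::load_acquire`; the `Slot` type, its members and all function names are hypothetical and are not taken from `jfrCPUTimeThreadSampler.cpp`.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical single-slot structure; NOT the JfrCPUTimeTraceQueue code.
struct Slot {
  int payload = 0;            // plain, non-atomic data
  std::atomic<int> head{0};   // index/flag that publishes the payload
};

// Writer: fill in the payload first, then publish it with a release store.
void publish(Slot& s, int value) {
  s.payload = value;
  s.head.store(1, std::memory_order_release);
}

// This reader goes on to read data written before the store to head,
// so its load must be an acquire that pairs with the release above.
bool try_consume(Slot& s, int& out) {
  if (s.head.load(std::memory_order_acquire) == 0) return false;
  out = s.payload;
  return true;
}

// This reader only looks at the value of head itself and touches no other
// field afterwards, so it synchronizes with nothing and relaxed is enough.
bool is_empty(const Slot& s) {
  return s.head.load(std::memory_order_relaxed) == 0;
}

// Single-threaded walk through the calls; it shows the API shape only and
// does not by itself demonstrate the cross-thread ordering.
void self_check() {
  Slot s;
  int out = -1;
  assert(is_empty(s));
  assert(!try_consume(s, out));
  publish(s, 42);
  assert(!is_empty(s));
  assert(try_consume(s, out) && out == 42);
}
```

Whether a given access to `_head` in the PR falls into the `try_consume` or the `is_empty` category depends on the concurrency model of the queue, which is exactly what is being asked about in this thread.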
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2122374152 From dholmes at openjdk.org Tue Jun 3 00:53:57 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 3 Jun 2025 00:53:57 GMT Subject: RFR: 8357826: Avoid running some jtreg tests when asan is configured [v2] In-Reply-To: References: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> Message-ID: On Mon, 2 Jun 2025 08:07:38 GMT, Matthias Baesken wrote: >> There are a couple of jtreg tests, especially in the HS area, with very special assumptions about memory layout/sizes. >> Those fail when the address sanitizer is configured (--enable-asan). >> The change adds a way to tag those tests with 'requires' so that they can be avoided easily when running jtreg tests with ASAN enabled. >> Adjusting the tests to "please" the sanitizer is not always desired (where possible, it can be done later for some tests). >> While at it, the same is also added for ubsan. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > TestBreakSignalThreadDump has issues with asan Changes look fine but I agree with Chris that we need to document why these tests don't work with ASAN, though I think I'd prefer to see an `@comment` before the `@requires !vm.asan` in the actual test files - assuming the reason can be stated clearly and succinctly. ------------- PR Review: https://git.openjdk.org/jdk/pull/25575#pullrequestreview-2890276148