From kvn at openjdk.org Sun Jun 1 00:30:50 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 1 Jun 2025 00:30:50 GMT Subject: RFR: 8358231: Template interpreter generator crashes with ShouldNotReachHere on some platforms after 8353686 In-Reply-To: <8Xnq0jvMBRkxOk4-gheVgeDGuIPhXXlZ8Yt-NO3izhQ=.2ff06a32-52b1-4829-9c19-0106ef733399@github.com> References: <8Xnq0jvMBRkxOk4-gheVgeDGuIPhXXlZ8Yt-NO3izhQ=.2ff06a32-52b1-4829-9c19-0106ef733399@github.com> Message-ID: On Sat, 31 May 2025 22:18:33 GMT, Martin Doerr wrote: > Trivial build fix for PPC64 and s390. I haven't seen more affected platforms. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25568#pullrequestreview-2884826176 From jbechberger at openjdk.org Sun Jun 1 07:13:00 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Sun, 1 Jun 2025 07:13:00 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v25] In-Reply-To: References: Message-ID: <2nYqo0wpUrLLJV9iDRLwj5xjV06waCzu8Ma8YSAToIY=.1059ee96-77f8-47e6-8797-3f2b47783311@github.com> On Sat, 31 May 2025 10:37:29 GMT, Markus Grönlund wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove debug printf > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.hpp line 139: > >> 137: >> 138: // Trigger sampling while a thread is not in a safepoint, from a seperate thread >> 139: static void trigger_is_thread_in_native_stackwalking(); > > Is it sampling that is triggered? Sampling refers to the asynchronous signal received from the operating system (OS). > > You are asking for the sampler thread to process already taken JFR Sample Requests in the queue, right? Yes and I like your implied name better. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2118819169 From jbechberger at openjdk.org Sun Jun 1 07:17:02 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Sun, 1 Jun 2025 07:17:02 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v25] In-Reply-To: References: Message-ID: On Sat, 31 May 2025 10:09:15 GMT, Markus Grönlund wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove debug printf > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.hpp line 36: > >> 34: #if defined(LINUX) >> 35: >> 36: #include "memory/padded.hpp" > > What is padded? If not, this should go. Good catch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2118820425 From jbechberger at openjdk.org Sun Jun 1 07:22:58 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Sun, 1 Jun 2025 07:22:58 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v24] In-Reply-To: <-QiSWEqppeW60aedVbLA3WTmnba7Fry53Qr86wE2EPs=.7a6327ce-7ef0-4b1c-bc68-0421ba3fd46f@github.com> References: <-QiSWEqppeW60aedVbLA3WTmnba7Fry53Qr86wE2EPs=.7a6327ce-7ef0-4b1c-bc68-0421ba3fd46f@github.com> Message-ID: On Fri, 30 May 2025 09:19:47 GMT, Johannes Bechberger wrote: >> src/hotspot/share/jfr/metadata/metadata.xml line 975: >> >>> 973: >>> 974: >>> 975: > >> I'm not a reviewer, but I just wanted to comment something I noticed. >> The JEP document says CPUTimeSampleLos'**t**', but the implementation says CPUTimeSampleLos'**s**'. Which one is correct? >> A sentence from the JEP document: >> >> Another new event, `jdk.CPUTimeSampleLost`, is emitted when samples are lost ... > > Thanks for catching this mistake. I'll fix it this afternoon. I fixed it by changing the JEP. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2118825477 From jbechberger at openjdk.org Sun Jun 1 07:26:19 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Sun, 1 Jun 2025 07:26:19 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... different chunk-sizes Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: - Refactoring - Remove convoluted native trace logic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/3a10d552..439763a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=24-25 Stats: 56 lines in 5 files changed: 3 ins; 27 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From mgronlun at openjdk.org Sun Jun 1 13:04:00 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Sun, 1 Jun 2025 13:04:00 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). 
>> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Refactoring > - Remove convoluted native trace logic src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 42: > 40: #include "runtime/javaThread.hpp" > 41: #include "runtime/osThread.hpp" > 42: #include "runtime/safepointMechanism.hpp" Not needed, since you have the .inline.hpp src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 102: > 100: > 101: u4 JfrCPUTimeTraceQueue::size() const { > 102: return Atomic::load(&_head); Is this read from multiple threads? In that case, load_acquire(). src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 200: > 198: void sample_thread(JfrSampleRequest& request, void* ucontext, JavaThread* jt, JfrThreadLocal* tl); > 199: > 200: // sample all threads that are in native state (and requested to be sampled) We are not really "sampling", but processing their queues, no? 
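Editorial sketch of the load_acquire() remark above, not code from the PR: it uses standard C++ `std::atomic` rather than HotSpot's `Atomic` class, and `BoundedTraceQueue` with all of its members is a hypothetical stand-in for the queue under review. It assumes a single producer and only shows why a reader on another thread would want an acquire load of the head index that pairs with the producer's release store.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a bounded request queue (single producer assumed).
// This is NOT the JfrCPUTimeTraceQueue from the pull request.
template <typename T, std::size_t Capacity>
class BoundedTraceQueue {
 public:
  // Producer side: fill the slot first, then publish the new head with
  // release ordering so the slot contents become visible with it.
  bool enqueue(const T& value) {
    const std::uint32_t head = _head.load(std::memory_order_relaxed);
    if (head >= Capacity) {
      return false;  // full; a caller would count this as a lost sample
    }
    _slots[head] = value;
    _head.store(head + 1, std::memory_order_release);
    return true;
  }

  // Reader side: the acquire load pairs with the release store above, so
  // every slot below the returned size is safe to read from another thread.
  std::uint32_t size() const {
    return _head.load(std::memory_order_acquire);
  }

  const T& at(std::uint32_t index) const { return _slots[index]; }

 private:
  std::array<T, Capacity> _slots{};
  std::atomic<std::uint32_t> _head{0};
};
```

With a relaxed load in `size()` a reader could in principle observe the new head before the slot write; whether that matters for the reviewed code depends on which threads actually call `size()`, which is the question the comment raises.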
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119128911 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119129239 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119129708 From mgronlun at openjdk.org Sun Jun 1 13:08:00 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Sun, 1 Jun 2025 13:08:00 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Refactoring > - Remove convoluted native trace logic src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 367: > 365: JfrCPUTimeSampleRequest& request = queue.at(i); > 366: JfrStackTrace stacktrace; > 367: traceid tid = JfrThreadLocal::thread_id(thread); Check the tid as a function of the JfrSampleRequest, like we do in JFR Cooperative Sampling. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119130991 From mgronlun at openjdk.org Sun Jun 1 13:12:01 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Sun, 1 Jun 2025 13:12:01 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Refactoring > - Remove convoluted native trace logic src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 413: > 411: } > 412: if (Atomic::load(&count) % 1000 == 0) { > 413: log_info(jfr)("CPU thread sampler sent %zu events, lost %d, biased %zu\n", Atomic::load(&count), Atomic::load(&_lost_samples_sum), Atomic::load(&biased_count)); put this logging under jfr+debug or log+trace please ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119137014 From mgronlun at openjdk.org Sun Jun 1 13:23:00 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Sun, 1 Jun 2025 13:23:00 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). 
>> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Refactoring > - Remove convoluted native trace logic src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 557: > 555: if (!check_state(jt) || > 556: jt->is_JfrRecorder_thread()) { > 557: queue.increment_lost_samples(); is_JfrRecorder_thread() will not appear here since it's excluded and would have returned nullptr from get_java_thread_if_valid(). src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 558: > 556: jt->is_JfrRecorder_thread()) { > 557: queue.increment_lost_samples(); > 558: tl->set_do_async_processing_of_cpu_time_jfr_requests(false); Why is this restored here? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119142346 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119142510 From jbechberger at openjdk.org Sun Jun 1 13:43:58 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Sun, 1 Jun 2025 13:43:58 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 13:19:48 GMT, Markus Grönlund wrote: >> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: >> >> - Refactoring >> - Remove convoluted native trace logic > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 558: > >> 556: jt->is_JfrRecorder_thread()) { >> 557: queue.increment_lost_samples(); >> 558: tl->set_do_async_processing_of_cpu_time_jfr_requests(false); > > Why is this restored here? Because I shouldn't sample if the thread isn't in native state anymore. The thread is probably sampled anyway on the outgoing safepoint. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119157906 From mgronlun at openjdk.org Sun Jun 1 15:07:00 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Sun, 1 Jun 2025 15:07:00 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... 
different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Refactoring > - Remove convoluted native trace logic src/hotspot/share/jfr/periodic/sampling/jfrThreadSampling.cpp line 362: > 360: drain_enqueued_requests(now, tl, jt, current); > 361: #ifdef LINUX > 362: if (tl->has_cpu_time_jfr_requests()) { You are having all threads traverse over this lock, even though the cpu time sampler is disabled by default. Can it be improved? src/hotspot/share/jfr/support/jfrThreadLocal.cpp line 604: > 602: > 603: bool JfrThreadLocal::has_cpu_time_jfr_requests() { > 604: return Atomic::load(&_has_cpu_time_jfr_requests); Atomic::load_acquire() src/hotspot/share/jfr/support/jfrThreadLocal.cpp line 620: > 618: > 619: bool JfrThreadLocal::wants_async_processing_of_cpu_time_jfr_requests() { > 620: return Atomic::load(&_do_async_processing_of_cpu_time_jfr_requests); Atomic::load_acquire() ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119242319 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119243305 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119243393 From jbechberger at openjdk.org Sun Jun 1 15:07:00 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Sun, 1 Jun 2025 15:07:00 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: <62JxxY-xn3fwz0PnhcnIH6DOWBQUPIq_fhDD_7YrSmA=.bfbb317a-403e-4826-a3ed-c364882e821b@github.com> On Sun, 1 Jun 2025 15:01:06 GMT, Markus Grönlund wrote: >> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: >> >> - Refactoring >> - Remove convoluted native trace logic > > src/hotspot/share/jfr/periodic/sampling/jfrThreadSampling.cpp line 362: > >> 360: drain_enqueued_requests(now, tl, jt, current); >> 361: #ifdef LINUX >> 362: if 
(tl->has_cpu_time_jfr_requests()) { > > You are having all threads traverse over this lock, even though the cpu time sampler is disabled by default. Can it be improved? Not without allocating in the signal handler ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119243238 From mgronlun at openjdk.org Sun Jun 1 15:27:06 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Sun, 1 Jun 2025 15:27:06 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Refactoring > - Remove convoluted native trace logic src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 332: > 330: JavaThread* jt = tlh.list()->thread_at(i); > 331: JfrThreadLocal* tl = jt->jfr_thread_local(); > 332: if (tl != nullptr && tl->wants_async_processing_of_cpu_time_jfr_requests()) { tl is never nullptr. src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 344: > 342: > 343: // equals operator for JfrSampleRequest > 344: inline bool operator==(const JfrSampleRequest& lhs, const JfrSampleRequest& rhs) { Can be removed. 
src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 574: > 572: > 573: if (queue.enqueue(request)) { > 574: tl->set_has_cpu_time_jfr_requests(true); This should only need to be set when enqueuing the first entry. src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 581: > 579: > 580: if (jt->thread_state() == _thread_in_native && > 581: queue.size() > queue.capacity() * 2 / 3) { Is this logic still valid? You are only asking for a async processing depending on the load factor of the queue? src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 586: > 584: JfrCPUTimeThreadSampling::trigger_async_processing_of_cpu_time_jfr_requests(); > 585: } else { > 586: tl->set_do_async_processing_of_cpu_time_jfr_requests(false); Was it true before and needed a reset? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119250661 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119250887 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119248176 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119248824 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119249381 From jbechberger at openjdk.org Sun Jun 1 15:27:06 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Sun, 1 Jun 2025 15:27:06 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 15:18:52 GMT, Markus Grönlund wrote: >> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: >> >> - Refactoring >> - Remove convoluted native trace logic > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 574: > >> 572: >> 573: if (queue.enqueue(request)) { >> 574: tl->set_has_cpu_time_jfr_requests(true); > > This should only need to be set when enqueuing the first entry. 
You're right > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 581: > >> 579: >> 580: if (jt->thread_state() == _thread_in_native && >> 581: queue.size() > queue.capacity() * 2 / 3) { > > Is this logic still valid? You are only asking for a async processing depending on the load factor of the queue? Yes, so I only start the thread walking if necessary ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119248709 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119250511 From mgronlun at openjdk.org Sun Jun 1 15:35:01 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Sun, 1 Jun 2025 15:35:01 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Refactoring > - Remove convoluted native trace logic src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 536: > 534: } > 535: > 536: volatile size_t count__ = 0; unused? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119258988 From jbechberger at openjdk.org Sun Jun 1 15:39:00 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Sun, 1 Jun 2025 15:39:00 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: <6Idy8j9wbNr9udYMhsW0BQmhb8dQvc_p20vCYtg5kZc=.6380eee6-bd1b-45d0-bca8-c8068e59bd36@github.com> On Sun, 1 Jun 2025 15:32:08 GMT, Markus Grönlund wrote: >> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: >> >> - Refactoring >> - Remove convoluted native trace logic > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 536: > >> 534: } >> 535: >> 536: volatile size_t count__ = 0; > > unused? Yes. > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 586: > >> 584: JfrCPUTimeThreadSampling::trigger_async_processing_of_cpu_time_jfr_requests(); >> 585: } else { >> 586: tl->set_do_async_processing_of_cpu_time_jfr_requests(false); > > Was it true before and needed a reset? I could check this before setting ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119260755 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119261558 From mgronlun at openjdk.org Sun Jun 1 15:43:06 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Sun, 1 Jun 2025 15:43:06 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: <66tRvhjE2LrwccsAYmRycS6QLF2KdRg-XHfk-scr-wg=.c7f269f0-301a-4da3-ae54-7f6bc7a440b1@github.com> On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). 
This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Refactoring > - Remove convoluted native trace logic src/hotspot/share/jfr/support/jfrThreadLocal.cpp line 587: > 585: } > 586: > 587: bool JfrThreadLocal::acquire_cpu_time_jfr_native_lock() { It appears that the lock state 'NATIVE' is redundant; an asynchronous request for queue drainage only requires the dequeue lock state. NATIVE can be removed to simplify the lock protocol. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119268003 From shade at openjdk.org Sun Jun 1 16:14:50 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Sun, 1 Jun 2025 16:14:50 GMT Subject: RFR: 8358231: Template interpreter generator crashes with ShouldNotReachHere on some platforms after 8353686 In-Reply-To: <8Xnq0jvMBRkxOk4-gheVgeDGuIPhXXlZ8Yt-NO3izhQ=.2ff06a32-52b1-4829-9c19-0106ef733399@github.com> References: <8Xnq0jvMBRkxOk4-gheVgeDGuIPhXXlZ8Yt-NO3izhQ=.2ff06a32-52b1-4829-9c19-0106ef733399@github.com> Message-ID: <31NqA7K-ur9Y9SJ5jIHiPuG4KHm_GWMyYU79aCYbAsQ=.16bca797-7a68-41fd-88f9-c9afce90a247@github.com> On Sat, 31 May 2025 22:18:33 GMT, Martin Doerr wrote: > Trivial build fix for PPC64 and s390. I haven't seen more affected platforms. AFAICS with my builds that invoke CDS `-Xshare:dump` on cross-compiled binaries, ARM32 is failing the same way. 
I think we need to add a case here: https://github.com/openjdk/jdk/blob/c1b5f62a8c30038d3b1a14d184535ba0642d51c9/src/hotspot/cpu/arm/templateInterpreterGenerator_arm.cpp#L175-L179 ------------- PR Review: https://git.openjdk.org/jdk/pull/25568#pullrequestreview-2885791890 From mdoerr at openjdk.org Sun Jun 1 17:11:05 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Sun, 1 Jun 2025 17:11:05 GMT Subject: RFR: 8358231: Template interpreter generator crashes with ShouldNotReachHere on some platforms after 8353686 [v2] In-Reply-To: <8Xnq0jvMBRkxOk4-gheVgeDGuIPhXXlZ8Yt-NO3izhQ=.2ff06a32-52b1-4829-9c19-0106ef733399@github.com> References: <8Xnq0jvMBRkxOk4-gheVgeDGuIPhXXlZ8Yt-NO3izhQ=.2ff06a32-52b1-4829-9c19-0106ef733399@github.com> Message-ID: > Trivial build fix for PPC64 and s390. Added arm32. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Add arm32 fix. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25568/files - new: https://git.openjdk.org/jdk/pull/25568/files/f5df2535..25fb16bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25568&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25568&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25568.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25568/head:pull/25568 PR: https://git.openjdk.org/jdk/pull/25568 From mgronlun at openjdk.org Sun Jun 1 18:12:58 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Sun, 1 Jun 2025 18:12:58 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 15:24:17 GMT, Markus Grönlund wrote: >> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: >> >> - Refactoring >> - Remove convoluted native trace logic > > 
src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 344: > >> 342: >> 343: // equals operator for JfrSampleRequest >> 344: inline bool operator==(const JfrSampleRequest& lhs, const JfrSampleRequest& rhs) { > > Can be removed. Unless you still want to try the ljf JfrSampleRequest optimization for the native ljf, which I kind of like now that I understand it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119386104 From mgronlun at openjdk.org Sun Jun 1 18:13:00 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Sun, 1 Jun 2025 18:13:00 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 15:23:06 GMT, Johannes Bechberger wrote: >> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 581: >> >>> 579: >>> 580: if (jt->thread_state() == _thread_in_native && >>> 581: queue.size() > queue.capacity() * 2 / 3) { >> >> Is this logic still valid? You are only asking for async processing assistance depending on the load factor of the queue? > > Yes, so I only start the thread walking if necessary I see. With a bounded queue as used in this solution, it can work quite nicely, that is, if the thread is actually on CPU in native, and just not waiting - if waiting (which is most likely) then pending requests could take a long time to be sent to consumers. I also understand better the optimization you tried as part of async walk in native and frames. Also quite nice, to walk from the last JfrSampleRequest and do equals to "batch" the top JFR sample requests that are the same (i.e., taken for the ljf). Maybe you can retry that again, but then you need to save the sid AND the tid to be reused for the top equal requests (you only need stacktrace.record_inner() for one request). It's a nice optimization. 
>> src/hotspot/share/jfr/periodic/sampling/jfrThreadSampling.cpp line 362: >> >>> 360: drain_enqueued_requests(now, tl, jt, current); >>> 361: #ifdef LINUX >>> 362: if (tl->has_cpu_time_jfr_requests()) { >> >> You are having all threads traverse over this test, even though the cpu time sampler is disabled by default. Can it be improved? > > Not without allocating in the signal handler How so? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119385303 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119389715 From mgronlun at openjdk.org Sun Jun 1 18:25:00 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Sun, 1 Jun 2025 18:25:00 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Refactoring > - Remove convoluted native trace logic src/hotspot/share/jfr/periodic/sampling/jfrThreadSampling.cpp line 250: > 248: } > 249: > 250: biased = true; Perhaps set on entry, and only keep the single biased = false below? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119396997 From mgronlun at openjdk.org Sun Jun 1 18:31:58 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Sun, 1 Jun 2025 18:31:58 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 18:22:10 GMT, Markus Grönlund wrote: >> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: >> >> - Refactoring >> - Remove convoluted native trace logic > > src/hotspot/share/jfr/periodic/sampling/jfrThreadSampling.cpp line 250: > >> 248: } >> 249: >> 250: biased = true; > > Perhaps set on entry, and only keep the single biased = false below? Also, note you have a direct hit in line 221--222 above - it's biased = false. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119404072 From iveresov at openjdk.org Sun Jun 1 19:05:01 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Sun, 1 Jun 2025 19:05:01 GMT Subject: RFR: 8358236: [AOT] Graal crashes when trying to use persisted MDOs Message-ID: Forgot to null out MethodData::_failed_speculations before snapshotting. As a result it gets restored with a dangling pointer. Testing looks clean. 
------------- Commit messages: - Null out MethodData::_failed_speculations before snapshot Changes: https://git.openjdk.org/jdk/pull/25570/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25570&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358236 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25570/head:pull/25570 PR: https://git.openjdk.org/jdk/pull/25570 From mgronlun at openjdk.org Sun Jun 1 20:38:29 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Sun, 1 Jun 2025 20:38:29 GMT Subject: RFR: 8357962: JFR Cooperative Sampling reveals inconsistent interpreter frames as part of JVMTI PopFrame Message-ID: Greetings, Please see the JIRA issue for a detailed description. Fix only applies to platforms that issue a save_bcp() as part of InterpreterMacroAssembler::unlock_object(). Testing: jdk_jfr, JVMTI PopFrame tests Thanks Markus ------------- Commit messages: - 8357962 Changes: https://git.openjdk.org/jdk/pull/25571/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25571&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357962 Stats: 3 lines in 3 files changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25571.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25571/head:pull/25571 PR: https://git.openjdk.org/jdk/pull/25571 From kvn at openjdk.org Sun Jun 1 21:23:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 1 Jun 2025 21:23:53 GMT Subject: RFR: 8358236: [AOT] Graal crashes when trying to use persisted MDOs In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 19:01:27 GMT, Igor Veresov wrote: > Forgot to null out MethodData::_failed_speculations before snapshotting. As a result it gets restored with a dangling pointer. > Testing looks clean. Trivial. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25570#pullrequestreview-2886119546 From iveresov at openjdk.org Sun Jun 1 21:23:54 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Sun, 1 Jun 2025 21:23:54 GMT Subject: Integrated: 8358236: [AOT] Graal crashes when trying to use persisted MDOs In-Reply-To: References: Message-ID: <2VQGaTWxeSr29uU3Ih3S5kF9l70w3xwlkHNG_pVFr7U=.3279eb7c-5bf8-4df1-8405-61b1678552d5@github.com> On Sun, 1 Jun 2025 19:01:27 GMT, Igor Veresov wrote: > Forgot to null out MethodData::_failed_speculations before snapshotting. As a result it gets restored with a dangling pointer. > Testing looks clean. This pull request has now been integrated. Changeset: 85e36d79 Author: Igor Veresov URL: https://git.openjdk.org/jdk/commit/85e36d79246913abb8b85c2be719670655d619ab Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod 8358236: [AOT] Graal crashes when trying to use persisted MDOs Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/25570 From dholmes at openjdk.org Mon Jun 2 02:11:57 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 2 Jun 2025 02:11:57 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v6] In-Reply-To: References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: On Fri, 30 May 2025 19:34:16 GMT, Mohamed Issa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.cbrt() using libm. There is a new set of micro-benchmarks are included to check the performance of specific input value ranges to help prevent regressions in the future. >> >> The command to run all range specific micro-benchmarks is posted below. >> >> `make test TEST="micro:CbrtPerf.CbrtPerfRanges"` >> >> The results of all tests posted below were captured with an [Intel? 
Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b21](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B21) as the baseline version. >> >> For performance data collected with the new built in range micro-benchmark, see the table below. Each result is the mean of 8 individual runs, and the input ranges used match those from the original Java implementation. Overall, the intrinsic provides a major uplift of 169% when very small inputs are used and a more modest uplift of 45% for all other inputs. >> >> | Input range(s) | Baseline throughput (ops/ms) | Intrinsic throughput (ops/ms) | Speedup | >> | :-------------------------------------: | :-------------------------------: | :-------------------------------: | :---------: | >> | [-2^(-1022), 2^(-1022)] | 6568 | 17678 | 2.69x | >> | (-INF, -2^(-1022)], [2^(-1022), INF) | 138932 | 200897 | 1.45x | >> >> Finally, the `jtreg:test/jdk/java/lang/Math/CubeRootTests.java` test passed with the changes. > > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Set address attributes in movapd assembly instruction function definition This change also broke most of the non-x86 platforms, due to the new intrinsic not being implemented on those platforms. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24470#issuecomment-2928415483 From amitkumar at openjdk.org Mon Jun 2 03:26:58 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 2 Jun 2025 03:26:58 GMT Subject: RFR: 8358231: Template interpreter generator crashes with ShouldNotReachHere on some platforms after 8353686 [v2] In-Reply-To: References: <8Xnq0jvMBRkxOk4-gheVgeDGuIPhXXlZ8Yt-NO3izhQ=.2ff06a32-52b1-4829-9c19-0106ef733399@github.com> Message-ID: On Sun, 1 Jun 2025 17:11:05 GMT, Martin Doerr wrote: >> Trivial build fix for PPC64 and s390. Added arm32. 
> > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add arm32 fix. Thanks Martin, for fixing it. ------------- Marked as reviewed by amitkumar (Committer). PR Review: https://git.openjdk.org/jdk/pull/25568#pullrequestreview-2886565212 From duke at openjdk.org Mon Jun 2 03:52:07 2025 From: duke at openjdk.org (Mohamed Issa) Date: Mon, 2 Jun 2025 03:52:07 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v6] In-Reply-To: References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: On Mon, 2 Jun 2025 02:08:55 GMT, David Holmes wrote: > This change also broke most of the non-x86 platforms, due to the new intrinsic not being implemented on those platforms. When you say "most of the non-x86 platforms", are you referring to the ones with processor types listed below? 1. jdk/src/hotspot/cpu/**arm** 2. jdk/src/hotspot/cpu/**ppc** 3. jdk/src/hotspot/cpu/**s390** I don't see a cbrt intrinsic implementation in the non-x86 platforms. However, the ones listed above appear to get to the _ShouldNotReachHere_ error state if a particular intrinsic isn't found in `TemplateInterpreterGenerator::generate_math_entry` (`templateInterpreterGenerator_*.cpp`). It looks like aarch64 and riscv don't take that route and would fall back to the default cbrt implementation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24470#issuecomment-2928618217 From dholmes at openjdk.org Mon Jun 2 04:35:02 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 2 Jun 2025 04:35:02 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 07:26:19 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). 
This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Refactoring > - Remove convoluted native trace logic Just some drive-by comments mainly on your acquire/release usage. I'm not at all clear what memory accesses you are trying to coordinate with those. src/hotspot/share/jfr/jni/jfrJniMethod.cpp line 176: > 174: JfrEventSetting::set_enabled(JfrCPUTimeSampleEvent, rate > 0); > 175: JfrCPUTimeThreadSampling::set_rate(rate, autoadapt == JNI_TRUE); > 176: return JNI_TRUE; What is the point of having a boolean return type if you always return true? src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 59: > 57: Thread* raw_thread = Thread::current_or_null_safe(); > 58: JavaThread* jt; > 59: if (raw_thread == nullptr || !raw_thread->is_Java_thread()) { // this can happen due to the high level of parralelism Suggestion: if (raw_thread == nullptr || !raw_thread->is_Java_thread()) { // this can happen due to the high level of parallelism src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 119: > 117: _data = new_data; > 118: _capacity = capacity; > 119: } I assume there is a lock protecting this so it happens atomically? src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 122: > 120: > 121: bool JfrCPUTimeTraceQueue::is_full() const { > 122: return Atomic::load_acquire(&_head) >= _capacity; I don't see why acquire semantics would be needed here. Also how can it be > capacity? 
src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 126: > 124: > 125: bool JfrCPUTimeTraceQueue::is_empty() const { > 126: return Atomic::load_acquire(&_head) == 0; Acquire semantics are definitely not needed here. src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 130: > 128: > 129: s4 JfrCPUTimeTraceQueue::lost_samples() const { > 130: return Atomic::load_acquire(&_lost_samples); Again acquire semantics seem highly dubious here - what loads are you synchronizing with? src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 139: > 137: > 138: u4 JfrCPUTimeTraceQueue::get_and_reset_lost_samples() { > 139: s4 lost_samples = Atomic::load_acquire(&_lost_samples); Again acquire semantics seem highly dubious here - what loads are you synchronizing with? src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 151: > 149: set_capacity(capacity); > 150: } > 151: } Seems an odd definition - typically `ensure_capacity` will grow a data structure to ensure it has sufficient capacity, and if already larger than needed that is fine. Suggestion `change_capacity`, or more traditionally `resize`? src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 237: > 235: > 236: void JfrCPUTimeThreadSampler::trigger_async_processing_of_cpu_time_jfr_requests() { > 237: Atomic::release_store(&_is_async_processing_of_cpu_time_jfr_requests_triggered, true); What prior stores are you ensuring should be visible by using release semantics here? 
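To make the acquire/release questions above concrete, here is a minimal sketch in standard C++ using `std::atomic` rather than HotSpot's `Atomic` API. The names `payload`, `ready` and `lost` are hypothetical and are not taken from the PR; the sketch only illustrates the general rule that a release store is useful when it publishes earlier plain stores to a reader that performs a matching acquire load, while a standalone counter that publishes nothing can usually be read with relaxed ordering.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical names for illustration only; this is not the JFR sampler code.
static int payload = 0;                 // plain data, published through `ready`
static std::atomic<bool> ready{false};  // flag whose store publishes `payload`
static std::atomic<int> lost{0};        // standalone counter, publishes nothing

// Writer side: the release store makes the earlier write to `payload`
// visible to any thread that later observes `ready == true` with acquire.
void producer() {
  payload = 42;
  ready.store(true, std::memory_order_release);
}

// Reader side: the acquire load pairs with the release store above.
// Returns false if nothing has been published yet.
bool try_consume(int& out) {
  if (!ready.load(std::memory_order_acquire)) {
    return false;
  }
  out = payload;  // safe: ordered after the acquire load
  return true;
}

// No other memory is synchronized through `lost`, so a relaxed load
// is sufficient to read the counter value itself.
int read_lost() {
  return lost.load(std::memory_order_relaxed);
}
```

In a real program `producer` and `try_consume` would run on different threads; whether any particular load in the patch needs acquire semantics depends on which other memory accesses it is meant to order, which is exactly what the review comments ask.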
------------- PR Review: https://git.openjdk.org/jdk/pull/25302#pullrequestreview-2886627655 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119983062 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119983911 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120016607 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120011705 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120012200 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120014449 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120014541 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120020174 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120021034 From dholmes at openjdk.org Mon Jun 2 04:35:02 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 2 Jun 2025 04:35:02 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v5] In-Reply-To: References: <6hGNW2D3_VuD-2WN0eTLYdEJoNu_9rPLu-dH-InGSK4=.64de8bc8-a98f-400f-a5e3-885dbd84d901@github.com> Message-ID: <7wOUvZZtjrX3TpgT9JQLm-8qTAax6PrXtfHwMJpNX4M=.13a7c6cc-e037-4108-b392-7ff30d279c05@github.com> On Mon, 26 May 2025 06:29:03 GMT, Johannes Bechberger wrote: >> Also, is raw_thread == nullptr even possible? For the same reasons. > > `!raw_thread->is_Java_thread()` I found it during testing. What thread was it, and how did it reach this code? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2119984783 From dholmes at openjdk.org Mon Jun 2 04:44:57 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 2 Jun 2025 04:44:57 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v6] In-Reply-To: References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: On Mon, 2 Jun 2025 03:49:42 GMT, Mohamed Issa wrote: > When you say "most of the non-x86 platforms", are you referring to the ones with processor types listed below? Yes - 3 of the 5 non-x86 platforms. > It looks like aarch64 and riscv don't take that route and would fall back to the default cbrt implementation. I was wondering why Aarch64 didn't fail. I guess the other platforms may use this to detect new intrinsics being added. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24470#issuecomment-2928722575 From dholmes at openjdk.org Mon Jun 2 04:50:57 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 2 Jun 2025 04:50:57 GMT Subject: RFR: 8357576: FieldInfo::_index is not initialized by the constructor In-Reply-To: <_9Nvx68w_0Ly5NgPGzGci6Uf9Si0AM1N3eQ_e-5hBR8=.1f055ae3-8cd7-4ae2-ae17-3722dc4b7427@github.com> References: <_9Nvx68w_0Ly5NgPGzGci6Uf9Si0AM1N3eQ_e-5hBR8=.1f055ae3-8cd7-4ae2-ae17-3722dc4b7427@github.com> Message-ID: On Fri, 30 May 2025 19:07:24 GMT, Matias Saavedra Silva wrote: > FieldInfo::_index is not initialized in either of the FieldInfo constructors so this patch adds initialization to both constructors. Verified with tier 1-5 tests Good and trivial, but does need copyright year update. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25554#pullrequestreview-2886701153 From kbarrett at openjdk.org Mon Jun 2 05:33:41 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 2 Jun 2025 05:33:41 GMT Subject: RFR: 8255082: HotSpot Style Guide should permit noexcept Message-ID: <-uPcWRhBsfKiRl5wRkLQ7YaAH4OCOlT0_ettXJQnUyY=.aa5c72c3-6767-41dd-8dae-45ff9a9e4884@github.com> Please review this change to permit the use of `noexcept` under certain circumstances in HotSpot code. http://wg21.link/n3050 Testing: JDK-8316930 (HotSpot should use noexcept instead of throw()) showed what the conversion would look like. It will need to be brought up to current mainline, possibly with modifications. This is a modification of the Style Guide, so rough consensus among the HotSpot Group members is required to make this change. Only Group members should vote for approval (via the github PR), though reasoned objections or comments from anyone will be considered. A decision on this proposal will not be made before Friday 16-June-2025 at 12h00 UTC. Since we're piggybacking on github PRs here, please use the PR review process to approve (click on Review Changes > Approve), rather than sending a "vote: yes" email reply that would be normal for a CFV. 
------------- Commit messages: - add noexcept Changes: https://git.openjdk.org/jdk/pull/25574/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25574&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8255082 Stats: 104 lines in 2 files changed: 104 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25574.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25574/head:pull/25574 PR: https://git.openjdk.org/jdk/pull/25574 From kbarrett at openjdk.org Mon Jun 2 05:48:59 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 2 Jun 2025 05:48:59 GMT Subject: RFR: 8358205: Remove unused JFR array allocation code In-Reply-To: <4iPujAp0lL_pVhcjlfMX42dIqE7Aw5X8FZr2k5cSFGo=.139bdd20-c798-4335-9ebd-cf0748e7d339@github.com> References: <4iPujAp0lL_pVhcjlfMX42dIqE7Aw5X8FZr2k5cSFGo=.139bdd20-c798-4335-9ebd-cf0748e7d339@github.com> Message-ID: On Fri, 30 May 2025 18:10:07 GMT, Coleen Phillimore wrote: > The JFR code is using ObjArray->allocate() directly rather than going through oopFactory. In Valhalla, the oopFactory code is being changed to account for new array shapes and attributes, so all code should call that instead. Turns out this function is unused, so this change removes it. Tested with tier1-7 with a ShouldNotReachHere(), then jdk/jfr tests with the removal. Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25553#pullrequestreview-2886834584 From dbriemann at openjdk.org Mon Jun 2 05:53:50 2025 From: dbriemann at openjdk.org (David Briemann) Date: Mon, 2 Jun 2025 05:53:50 GMT Subject: RFR: 8357981: [PPC64] Remove old instructions from VM_Version::determine_features() In-Reply-To: References: Message-ID: On Wed, 28 May 2025 14:31:40 GMT, Martin Doerr wrote: > Simple cleanup after [JDK-8331859](https://bugs.openjdk.org/browse/JDK-8331859). The old instructions are always available and don't need to be tried in `VM_Version::determine_features()`. 
> > On Power10: > > -------------------------------------------------------------------------------- > Decoding cpu-feature detection stub at 0x000079b9203c0380 after execution: > -------------------------------------------------------------------------------- > 0x000079b9203c0380: darn r7,1 > 0x000079b9203c0384: brw r5,r6 > 0x000079b9203c0388: blr bo=0b10100,bh=0b00[subroutine_return] > 0x000079b9203c038c: dcbz 0,r3 > 0x000079b9203c0390: blr bo=0b10100,bh=0b00[subroutine_return] > > > Also tested on older processors: On Power9, `brw` gets zeroed out. On Power8, `darn` also gets zeroed out. LGTM, Thank you! ------------- Marked as reviewed by dbriemann (Author). PR Review: https://git.openjdk.org/jdk/pull/25495#pullrequestreview-2886849573 From eosterlund at openjdk.org Mon Jun 2 06:27:50 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 2 Jun 2025 06:27:50 GMT Subject: RFR: 8357962: JFR Cooperative Sampling reveals inconsistent interpreter frames as part of JVMTI PopFrame In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 20:33:50 GMT, Markus Grönlund wrote: > Greetings, > > Please see the JIRA issue for a detailed description. > > Fix only applies to platforms that issue a save_bcp() as part of InterpreterMacroAssembler::unlock_object(). > > Testing: jdk_jfr, JVMTI PopFrame tests > > Thanks > Markus Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25571#pullrequestreview-2886927096 From fyang at openjdk.org Mon Jun 2 06:41:55 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 2 Jun 2025 06:41:55 GMT Subject: RFR: 8357962: JFR Cooperative Sampling reveals inconsistent interpreter frames as part of JVMTI PopFrame In-Reply-To: References: Message-ID: <1Y5-9j2Z4EIDS0Ftrkr8S-KT1MlrtB9jYwjzX72adrs=.d4f6f733-13cf-4473-b63a-c42c46beffd3@github.com> On Sun, 1 Jun 2025 20:33:50 GMT, Markus Grönlund wrote: > Greetings, > > Please see the JIRA issue for a detailed description. > > Fix only applies to platforms that issue a save_bcp() as part of InterpreterMacroAssembler::unlock_object(). > > Testing: jdk_jfr, JVMTI PopFrame tests > > Thanks > Markus FYI: `hotspot_serviceability` and `jdk_svc` test good on linux-riscv64 platform. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25571#issuecomment-2929043313 From dholmes at openjdk.org Mon Jun 2 07:02:52 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 2 Jun 2025 07:02:52 GMT Subject: RFR: 8255082: HotSpot Style Guide should permit noexcept In-Reply-To: <-uPcWRhBsfKiRl5wRkLQ7YaAH4OCOlT0_ettXJQnUyY=.aa5c72c3-6767-41dd-8dae-45ff9a9e4884@github.com> References: <-uPcWRhBsfKiRl5wRkLQ7YaAH4OCOlT0_ettXJQnUyY=.aa5c72c3-6767-41dd-8dae-45ff9a9e4884@github.com> Message-ID: <8ueGNCZGkc0fbJHYg8l2XPSG0w2DAxKf4e59ClyXhGw=.5497fc78-f598-4af4-b745-d05f7115e953@github.com> On Mon, 2 Jun 2025 05:28:17 GMT, Kim Barrett wrote: > Please review this change to permit the use of `noexcept` under certain > circumstances in HotSpot code. > > http://wg21.link/n3050 > > Testing: > > JDK-8316930 (HotSpot should use noexcept instead of throw()) showed what the > conversion would look like. It will need to be brought up to current mainline, > possibly with modifications. > > This is a modification of the Style Guide, so rough consensus among the > HotSpot Group members is required to make this change.
Only Group members > should vote for approval (via the github PR), though reasoned objections or > comments from anyone will be considered. A decision on this proposal will not > be made before Friday 16-June-2025 at 12h00 UTC. > > Since we're piggybacking on github PRs here, please use the PR review process > to approve (click on Review Changes > Approve), rather than sending a "vote: > yes" email reply that would be normal for a CFV. I approve of this change. A couple of minor tweaks to the text suggested. Thanks doc/hotspot-style.md line 1114: > 1112: > 1113: * Only the abbreviated form of `noexcept` exception specifications are > 1114: permitted. `noexcept` exception specifications with arguments are forbidden. Suggestion: * Only the argument-less form of `noexcept` exception specifications is permitted. doc/hotspot-style.md line 1131: > 1129: > 1130: The second is to allow the compiler and library code to choose different > 1131: algorithms, depending on whether a some function may throw exceptions. This is Suggestion: algorithms, depending on whether some function may throw exceptions. This is doc/hotspot-style.md line 1139: > 1137: such a function `noexcept` informs the compiler that `nullptr` is a possible > 1138: result. If an allocation function is not declared `noexcept` then the compiler > 1139: may elide that checking and handling for a using `new` expression. Suggestion: may elide that checking and handling for a `new` expression. ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25574#pullrequestreview-2887010579 PR Review Comment: https://git.openjdk.org/jdk/pull/25574#discussion_r2120226615 PR Review Comment: https://git.openjdk.org/jdk/pull/25574#discussion_r2120229061 PR Review Comment: https://git.openjdk.org/jdk/pull/25574#discussion_r2120234324 From eosterlund at openjdk.org Mon Jun 2 07:31:54 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 2 Jun 2025 07:31:54 GMT Subject: RFR: 8351997: AArch64: Interpreter volatile reference stores with G1 are not sequentially consistent In-Reply-To: References: Message-ID: On Fri, 30 May 2025 09:40:21 GMT, Andrew Haley wrote: > > > It would surely be better if this evil were expunged from JDK 21 as well, lest it also confuse a backporter. > > > > > > Maybe a "here be dragons" warning would suffice. > > If you add the following comment above every call to `do_oop_store()` I'll approve this patch: > > `// Clobbers: r10, r11, r3` Hmm yes that feels like a good compromise. I added the comment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25483#issuecomment-2929209038 From mbaesken at openjdk.org Mon Jun 2 07:33:27 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 2 Jun 2025 07:33:27 GMT Subject: RFR: 8357826: Avoid running some jtreg tests when asan is configured Message-ID: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> There are a couple of jtreg tests, especially in the HS area, with very special assumptions about memory layout/sizes . Those fail when the address sanitizer is configured ( --enable-asan ). The change adds a way to tag those tests with 'requires' so that they can be avoided easily when running jtreg tests with ASAN enabled. Adjusting the tests for "pleasing" the sanitizer is not always desired (if possible for some tests it can be done later) . While at it, also same is also added for ubsan . 
------------- Commit messages: - remove zgc change - JDK-8357826 Changes: https://git.openjdk.org/jdk/pull/25575/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25575&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357826 Stats: 56 lines in 12 files changed: 54 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25575/head:pull/25575 PR: https://git.openjdk.org/jdk/pull/25575 From mbaesken at openjdk.org Mon Jun 2 07:33:27 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 2 Jun 2025 07:33:27 GMT Subject: RFR: 8357826: Avoid running some jtreg tests when asan is configured In-Reply-To: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> References: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> Message-ID: On Mon, 2 Jun 2025 07:25:22 GMT, Matthias Baesken wrote: > There are a couple of jtreg tests, especially in the HS area, with very special assumptions about memory layout/sizes . > Those fail when the address sanitizer is configured ( --enable-asan ). > The change adds a way to tag those tests with 'requires' so that they can be avoided easily when running jtreg tests with ASAN enabled. > Adjusting the tests for "pleasing" the sanitizer is not always desired (if possible for some tests it can be done later) . > While at it, also same is also added for ubsan . The change to src/hotspot/cpu/x86/gc/z/zAddress_x86.cpp was added because of zgc issues with ASAN but we will address this in another change so I remove it from here. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25575#issuecomment-2929201143 From rvansa at openjdk.org Mon Jun 2 07:36:51 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 2 Jun 2025 07:36:51 GMT Subject: RFR: 8352075: Perf regression accessing fields [v16] In-Reply-To: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> Message-ID: > This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 trying to reduce the performance regression in some scenarios introduced in https://bugs.openjdk.org/browse/JDK-8292818 . Based both on performance and memory consumption it is a (better) alternative to https://github.com/openjdk/jdk/pull/24713 . > > This PR optimizes local field lookup in classes with more than 16 fields; rather than sequentially iterating through all fields during lookup we sort the fields based on the field name. The stream includes extra table after the field information: for field at position 16, 32 ... we record the (variable-length-encoded) offset of the field info in this stream. On field lookup, rather than iterating through all fields, we iterate through this table, resolve names for given fields and continue field-by-field iteration only after the last record (hence at most 16 fields). > > In classes with <= 16 fields this PR reduces the memory consumption by 1 byte that was left with value 0 at the end of stream. In classes with > 16 fields we add extra 4 bytes with offset of the table, and the table contains one varint for each 16 fields. The terminal byte is not used either. 
> > My measurements on the attached reproducer > > hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC > Time (mean ± σ): 51.3 ms ± 2.8 ms [User: 44.7 ms, System: 13.7 ms] > Range (min … max): 45.1 ms …
53.9 ms 100 runs > > hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC > Time (mean ± σ): 78.2 ms ± 1.0 ms [User: 74.6 ms, System: 17.3 ms] > Range (min … max): 73.8 ms … 79.7 ms 100 runs > > (the jdk25-master above already contains JDK-8353175) > > hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC > Time (mean ± σ): 38.5 ms ± 0.5 ms [User: 34.4 ms, System: 17.3 ms] > Range (min … max): 37.7 ms … 42.1 ms 100 runs > > While https://github.com/openjdk/jdk/pull/24713 returned the performance to previous levels, this PR improves it by 25% compared to JDK 17 (which does not contain the regression)! This time, the undisclosed production-grade reproducer shows even higher improvement: > > JDK 17: 1.6 s > JDK 21 (no patches): 22 s > JDK25-master: 12.3 s > JDK25-this-pr: 0.5 s Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Add type cast ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24847/files - new: https://git.openjdk.org/jdk/pull/24847/files/70f62460..9cba2d4a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=14-15 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24847.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24847/head:pull/24847 PR: https://git.openjdk.org/jdk/pull/24847 From jbhateja at openjdk.org Mon Jun 2 07:44:58 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 2 Jun 2025 07:44:58 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API In-Reply-To: References: Message-ID: On Fri, 9 May 2025 07:35:41 GMT, Xiaohong Gong wrote: > JDK-8318650 introduced hotspot intrinsification of subword gather load APIs for X86 platforms [1].
However, the current implementation is not optimal for AArch64 SVE platform, which natively supports vector instructions for subword gather load operations using an int vector for indices (see [2][3]). > > Two key areas require improvement: > 1. At the Java level, vector indices generated for range validation could be reused for the subsequent gather load operation on architectures with native vector instructions like AArch64 SVE. However, the current implementation prevents compiler reuse of these index vectors due to divergent control flow, potentially impacting performance. > 2. At the compiler IR level, the additional `offset` input for `LoadVectorGather`/`LoadVectorGatherMasked` with subword types increases IR complexity and complicates backend implementation. Furthermore, generating `add` instructions before each memory access negatively impacts performance. > > This patch refactors the implementation at both the Java level and compiler mid-end to improve efficiency and maintainability across different architectures. > > Main changes: > 1. Java-side API refactoring: > - Explicitly passes generated index vectors to hotspot, eliminating duplicate index vectors for gather load instructions on > architectures like AArch64. > 2. C2 compiler IR refactoring: > - Refactors `LoadVectorGather`/`LoadVectorGatherMasked` IR for subword types by removing the memory offset input and incorporating it into the memory base `addr` at the IR level. This simplifies backend implementation, reduces add operations, and unifies the IR across all types. > 3. Backend changes: > - Streamlines X86 implementation of subword gather operations following the removal of the offset input from the IR level. > > Performance: > The performance of the relative JMH improves up to 27% on a X86 AVX512 system. 
Please see the data below: > > Benchmark Mode Cnt Unit SIZE Before After Gain > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 64 53682.012 52650.325 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 256 14484.252 14255.156 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 1024 3664.900 3595.615 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 4096 908.312 935.269 1.02 > GatherOperationsBenchmark.micr... Hi @XiaohongGong , Looks good to me, thanks again for this re-factor !! Best Regards, Jatin ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25138#pullrequestreview-2887157235 From eosterlund at openjdk.org Mon Jun 2 07:48:39 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 2 Jun 2025 07:48:39 GMT Subject: RFR: 8351997: AArch64: Interpreter volatile reference stores with G1 are not sequentially consistent [v2] In-Reply-To: References: Message-ID: > The optimized fast_aputfield bytecode on AArch64 stores the field flags in r3, and performs the leading and trailing fencing depending on its volatile bit being set or not. However, r3 is also the last temp register passed in to the barrier set for reference stores, and G1 clobbers it in a way that may clear the volatile bit. Then the trailing fence won't get executed, and sequential consistency is broken. > > My fix puts the flags in r5 instead, which is the register that was used by normal aputfield bytecodes. This way, barriers don't clobber the volatile bits. > > This bug has been observed to mess up a classic Dekker duality in the java.util.concurrent.Exchanger class, leading to a hang in the test/jdk/java/util/concurrent/Exchanger/ExchangeLoops.java test that exercises it. Using G1 and -Xint a reproducer hangs 30/100 times in mach5. With the fix, the same reproducer hangs 0/100 times. 
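As a rough analogy for the Dekker duality mentioned in the description above, the following sketch uses standard C++ `std::atomic` with sequentially consistent ordering. It is not the interpreter code and not the `Exchanger` implementation; the variables `x` and `y` and the two functions are hypothetical. Each side stores to its own flag and then loads the other side's flag. Under sequential consistency at least one side must observe the other's store, so both returning 0 is impossible. If the fence after the store is skipped, as happens when the volatile bit is clobbered, that guarantee no longer holds.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical flags for illustration only.
static std::atomic<int> x{0};
static std::atomic<int> y{0};

// Side A: announce itself via `x`, then check whether side B has announced.
int side_a() {
  x.store(1, std::memory_order_seq_cst);
  return y.load(std::memory_order_seq_cst);
}

// Side B: announce itself via `y`, then check whether side A has announced.
int side_b() {
  y.store(1, std::memory_order_seq_cst);
  return x.load(std::memory_order_seq_cst);
}
```

When `side_a` and `side_b` run concurrently on two threads, the outcome where both return 0 is forbidden by the sequentially consistent ordering. With weaker ordering on the stores and loads, that outcome becomes possible on hardware that reorders a store with a later load, and a protocol that relies on it can hang.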
Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: Add comment about clobbered registers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25483/files - new: https://git.openjdk.org/jdk/pull/25483/files/44f7e092..c9440f68 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25483&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25483&range=00-01 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25483.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25483/head:pull/25483 PR: https://git.openjdk.org/jdk/pull/25483 From mbaesken at openjdk.org Mon Jun 2 08:07:38 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 2 Jun 2025 08:07:38 GMT Subject: RFR: 8357826: Avoid running some jtreg tests when asan is configured [v2] In-Reply-To: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> References: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> Message-ID: > There are a couple of jtreg tests, especially in the HS area, with very special assumptions about memory layout/sizes . > Those fail when the address sanitizer is configured ( --enable-asan ). > The change adds a way to tag those tests with 'requires' so that they can be avoided easily when running jtreg tests with ASAN enabled. > Adjusting the tests for "pleasing" the sanitizer is not always desired (if possible for some tests it can be done later) . > While at it, also same is also added for ubsan .
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: TestBreakSignalThreadDump has issues with asan ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25575/files - new: https://git.openjdk.org/jdk/pull/25575/files/3ad0d93a..aa796c8a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25575&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25575&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25575/head:pull/25575 PR: https://git.openjdk.org/jdk/pull/25575 From mbaesken at openjdk.org Mon Jun 2 08:07:38 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 2 Jun 2025 08:07:38 GMT Subject: RFR: 8357826: Avoid running some jtreg tests when asan is configured In-Reply-To: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> References: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> Message-ID: <4CZpPTh4S1qjEkxVcHZ-J8bxpkI4iTsOtX4iCG5M2Cw=.8c1f2e8e-02c1-4691-8d6f-aa362dd54932@github.com> On Mon, 2 Jun 2025 07:25:22 GMT, Matthias Baesken wrote: > There are a couple of jtreg tests, especially in the HS area, with very special assumptions about memory layout/sizes . > Those fail when the address sanitizer is configured ( --enable-asan ). > The change adds a way to tag those tests with 'requires' so that they can be avoided easily when running jtreg tests with ASAN enabled. > Adjusting the tests for "pleasing" the sanitizer is not always desired (if possible for some tests it can be done later) . > While at it, also same is also added for ubsan . 
TestBreakSignalThreadDump shows this, so it does not work well with asan too stdout: []; stderr: [==12484==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25575#issuecomment-2929322761 From rvansa at openjdk.org Mon Jun 2 08:14:48 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 2 Jun 2025 08:14:48 GMT Subject: RFR: 8352075: Perf regression accessing fields [v17] In-Reply-To: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> Message-ID: > This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 trying to reduce the performance regression in some scenarios introduced in https://bugs.openjdk.org/browse/JDK-8292818 . Based both on performance and memory consumption it is a (better) alternative to https://github.com/openjdk/jdk/pull/24713 . > > This PR optimizes local field lookup in classes with more than 16 fields; rather than sequentially iterating through all fields during lookup we sort the fields based on the field name. The stream includes extra table after the field information: for field at position 16, 32 ... we record the (variable-length-encoded) offset of the field info in this stream. On field lookup, rather than iterating through all fields, we iterate through this table, resolve names for given fields and continue field-by-field iteration only after the last record (hence at most 16 fields). > > In classes with <= 16 fields this PR reduces the memory consumption by 1 byte that was left with value 0 at the end of stream. In classes with > 16 fields we add extra 4 bytes with offset of the table, and the table contains one varint for each 16 fields. The terminal byte is not used either. 
> > My measurements on the attached reproducer > > hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC > Time (mean ± σ): 51.3 ms ± 2.8 ms [User: 44.7 ms, System: 13.7 ms] > Range (min … max): 45.1 ms … 53.9 ms 100 runs > > hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC > Time (mean ± σ): 78.2 ms ± 1.0 ms [User: 74.6 ms, System: 17.3 ms] > Range (min … max): 73.8 ms … 79.7 ms 100 runs > > (the jdk25-master above already contains JDK-8353175) > > hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC > Time (mean ± σ): 38.5 ms ± 0.5 ms [User: 34.4 ms, System: 17.3 ms] > Range (min … max): 37.7 ms … 42.1 ms 100 runs > > While https://github.com/openjdk/jdk/pull/24713 returned the performance to previous levels, this PR improves it by 25% compared to JDK 17 (which does not contain the regression)! This time, the undisclosed production-grade reproducer shows even higher improvement: > > JDK 17: 1.6 s > JDK 21 (no patches): 22 s > JDK25-master: 12.3 s > JDK25-this-pr: 0.5 s Radim Vansa has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains two new commits since the last revision: - Add type cast - Fix static_assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24847/files - new: https://git.openjdk.org/jdk/pull/24847/files/9cba2d4a..c592ea59 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=15-16 Stats: 53 lines in 4 files changed: 0 ins; 47 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24847.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24847/head:pull/24847 PR: https://git.openjdk.org/jdk/pull/24847 From shade at openjdk.org Mon Jun 2 08:16:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Jun 2025 08:16:54 GMT Subject: RFR: 8358231: Template interpreter generator crashes with ShouldNotReachHere on some platforms after 8353686 [v2] In-Reply-To: References: <8Xnq0jvMBRkxOk4-gheVgeDGuIPhXXlZ8Yt-NO3izhQ=.2ff06a32-52b1-4829-9c19-0106ef733399@github.com> Message-ID: On Sun, 1 Jun 2025 17:11:05 GMT, Martin Doerr wrote: >> Trivial build fix for PPC64 and s390. Added arm32. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add arm32 fix. Looks good, thanks! ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25568#pullrequestreview-2887258170 From mbaesken at openjdk.org Mon Jun 2 08:20:53 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 2 Jun 2025 08:20:53 GMT Subject: RFR: 8358231: Template interpreter generator crashes with ShouldNotReachHere on some platforms after 8353686 [v2] In-Reply-To: References: <8Xnq0jvMBRkxOk4-gheVgeDGuIPhXXlZ8Yt-NO3izhQ=.2ff06a32-52b1-4829-9c19-0106ef733399@github.com> Message-ID: On Sun, 1 Jun 2025 17:11:05 GMT, Martin Doerr wrote: >> Trivial build fix for PPC64 and s390. Added arm32. 
> > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add arm32 fix. Marked as reviewed by mbaesken (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25568#pullrequestreview-2887272244 From kbarrett at openjdk.org Mon Jun 2 08:21:34 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 2 Jun 2025 08:21:34 GMT Subject: RFR: 8255082: HotSpot Style Guide should permit noexcept [v2] In-Reply-To: <-uPcWRhBsfKiRl5wRkLQ7YaAH4OCOlT0_ettXJQnUyY=.aa5c72c3-6767-41dd-8dae-45ff9a9e4884@github.com> References: <-uPcWRhBsfKiRl5wRkLQ7YaAH4OCOlT0_ettXJQnUyY=.aa5c72c3-6767-41dd-8dae-45ff9a9e4884@github.com> Message-ID: > Please review this change to permit the use of `noexcept` under certain > circumstances in HotSpot code. > > http://wg21.link/n3050 > > Testing: > > JDK-8316930 (HotSpot should use noexcept instead of throw()) showed what the > conversion would look like. It will need to be brought up to current mainline, > possibly with modifications. > > This is a modification of the Style Guide, so rough consensus among the > HotSpot Group members is required to make this change. Only Group members > should vote for approval (via the github PR), though reasoned objections or > comments from anyone will be considered. A decision on this proposal will not > be made before Friday 16-June-2025 at 12h00 UTC. > > Since we're piggybacking on github PRs here, please use the PR review process > to approve (click on Review Changes > Approve), rather than sending a "vote: > yes" email reply that would be normal for a CFV. 
Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: dholmes review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25574/files - new: https://git.openjdk.org/jdk/pull/25574/files/6364b3d4..e6decd1f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25574&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25574&range=00-01 Stats: 8 lines in 2 files changed: 1 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/25574.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25574/head:pull/25574 PR: https://git.openjdk.org/jdk/pull/25574 From kbarrett at openjdk.org Mon Jun 2 08:21:34 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 2 Jun 2025 08:21:34 GMT Subject: RFR: 8255082: HotSpot Style Guide should permit noexcept [v2] In-Reply-To: <8ueGNCZGkc0fbJHYg8l2XPSG0w2DAxKf4e59ClyXhGw=.5497fc78-f598-4af4-b745-d05f7115e953@github.com> References: <-uPcWRhBsfKiRl5wRkLQ7YaAH4OCOlT0_ettXJQnUyY=.aa5c72c3-6767-41dd-8dae-45ff9a9e4884@github.com> <8ueGNCZGkc0fbJHYg8l2XPSG0w2DAxKf4e59ClyXhGw=.5497fc78-f598-4af4-b745-d05f7115e953@github.com> Message-ID: On Mon, 2 Jun 2025 06:58:39 GMT, David Holmes wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> dholmes review > > doc/hotspot-style.md line 1139: > >> 1137: such a function `noexcept` informs the compiler that `nullptr` is a possible >> 1138: result. If an allocation function is not declared `noexcept` then the compiler >> 1139: may elide that checking and handling for a using `new` expression. > > Suggestion: > > may elide that checking and handling for a `new` expression. Instead changed to "may elide that checking and handling for a `new` expression calling that function." It's not _any_ `new` expression that might have stuff elided, only one that calls the not-nothrow allocation function. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25574#discussion_r2120385617 From kbarrett at openjdk.org Mon Jun 2 08:24:01 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 2 Jun 2025 08:24:01 GMT Subject: RFR: 8255082: HotSpot Style Guide should permit noexcept In-Reply-To: <-uPcWRhBsfKiRl5wRkLQ7YaAH4OCOlT0_ettXJQnUyY=.aa5c72c3-6767-41dd-8dae-45ff9a9e4884@github.com> References: <-uPcWRhBsfKiRl5wRkLQ7YaAH4OCOlT0_ettXJQnUyY=.aa5c72c3-6767-41dd-8dae-45ff9a9e4884@github.com> Message-ID: On Mon, 2 Jun 2025 05:28:17 GMT, Kim Barrett wrote: > Please review this change to permit the use of `noexcept` under certain > circumstances in HotSpot code. > > http://wg21.link/n3050 > > Testing: > > JDK-8316930 (HotSpot should use noexcept instead of throw()) showed what the > conversion would look like. It will need to be brought up to current mainline, > possibly with modifications. > > This is a modification of the Style Guide, so rough consensus among the > HotSpot Group members is required to make this change. Only Group members > should vote for approval (via the github PR), though reasoned objections or > comments from anyone will be considered. A decision on this proposal will not > be made before Friday 16-June-2025 at 12h00 UTC. > > Since we're piggybacking on github PRs here, please use the PR review process > to approve (click on Review Changes > Approve), rather than sending a "vote: > yes" email reply that would be normal for a CFV. I forgot to mention that of course the current code is out of conformance with this, since we're currently using `throw()` to declare allocation functions as being nothrow. Once this style guide is approved, we (probably meaning I) will need to update the code accordingly. Probably not as a big query-replace either, as I've already found one mistake. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25574#issuecomment-2929385129 From shade at openjdk.org Mon Jun 2 08:25:00 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Jun 2025 08:25:00 GMT Subject: RFR: 8351997: AArch64: Interpreter volatile reference stores with G1 are not sequentially consistent [v2] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 07:48:39 GMT, Erik Österlund wrote: >> The optimized fast_aputfield bytecode on AArch64 stores the field flags in r3, and performs the leading and trailing fencing depending on its volatile bit being set or not. However, r3 is also the last temp register passed in to the barrier set for reference stores, and G1 clobbers it in a way that may clear the volatile bit. Then the trailing fence won't get executed, and sequential consistency is broken. >> >> My fix puts the flags in r5 instead, which is the register that was used by normal aputfield bytecodes. This way, barriers don't clobber the volatile bits. >> >> This bug has been observed to mess up a classic Dekker duality in the java.util.concurrent.Exchanger class, leading to a hang in the test/jdk/java/util/concurrent/Exchanger/ExchangeLoops.java test that exercises it. Using G1 and -Xint a reproducer hangs 30/100 times in mach5. With the fix, the same reproducer hangs 0/100 times. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Add comment about clobbered registers Well, since we are introducing the hunks near `do_oop_store`-s, and thus extending the scope of the patch. At this point, we can just inline `do_oop_store` (and maybe `do_oop_load`?), like Andrew initially suggested.
This will also match what RISC-V already did: https://github.com/openjdk/jdk/commit/c5a1543ee3e68775f09ca29fb07efd9aebfdb33e ------------- PR Review: https://git.openjdk.org/jdk/pull/25483#pullrequestreview-2887283595 From mdoerr at openjdk.org Mon Jun 2 08:31:57 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 2 Jun 2025 08:31:57 GMT Subject: RFR: 8358231: Template interpreter generator crashes with ShouldNotReachHere on some platforms after 8353686 [v2] In-Reply-To: References: <8Xnq0jvMBRkxOk4-gheVgeDGuIPhXXlZ8Yt-NO3izhQ=.2ff06a32-52b1-4829-9c19-0106ef733399@github.com> Message-ID: On Sun, 1 Jun 2025 17:11:05 GMT, Martin Doerr wrote: >> Trivial build fix for PPC64 and s390. Added arm32. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add arm32 fix. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25568#issuecomment-2929404503 From mdoerr at openjdk.org Mon Jun 2 08:31:57 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 2 Jun 2025 08:31:57 GMT Subject: Integrated: 8358231: Template interpreter generator crashes with ShouldNotReachHere on some platforms after 8353686 In-Reply-To: <8Xnq0jvMBRkxOk4-gheVgeDGuIPhXXlZ8Yt-NO3izhQ=.2ff06a32-52b1-4829-9c19-0106ef733399@github.com> References: <8Xnq0jvMBRkxOk4-gheVgeDGuIPhXXlZ8Yt-NO3izhQ=.2ff06a32-52b1-4829-9c19-0106ef733399@github.com> Message-ID: <4qHafyELt_8KULAwgyl9NSO8VGsIlEAxQp7XCFCFVb8=.f57fa1e6-8b54-4f88-b052-0cfd1b0114d9@github.com> On Sat, 31 May 2025 22:18:33 GMT, Martin Doerr wrote: > Trivial build fix for PPC64 and s390. Added arm32. This pull request has now been integrated. 
Changeset: 40ce05d4 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/40ce05d4080a9a2b4876c21f83a184f9b8a580a2 Stats: 3 lines in 3 files changed: 3 ins; 0 del; 0 mod 8358231: Template interpreter generator crashes with ShouldNotReachHere on some platforms after 8353686 Reviewed-by: shade, amitkumar, mbaesken, kvn ------------- PR: https://git.openjdk.org/jdk/pull/25568 From ayang at openjdk.org Mon Jun 2 08:42:02 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 2 Jun 2025 08:42:02 GMT Subject: RFR: 8358294: Remove unnecessary GenAlignment Message-ID: Simple replacement of `GenAlignment` with `SpaceAlignment`, because they always have the same value. Removing the former to reduce complexity. Test: tier1-3 ------------- Commit messages: - remove-gen-alignment Changes: https://git.openjdk.org/jdk/pull/25577/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25577&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358294 Stats: 105 lines in 16 files changed: 0 ins; 46 del; 59 mod Patch: https://git.openjdk.org/jdk/pull/25577.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25577/head:pull/25577 PR: https://git.openjdk.org/jdk/pull/25577 From jbechberger at openjdk.org Mon Jun 2 08:44:01 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 08:44:01 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 13:01:23 GMT, Markus Grönlund wrote: >> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: >> >> - Refactoring >> - Remove convoluted native trace logic > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 200: > >> 198: void sample_thread(JfrSampleRequest& request, void* ucontext, JavaThread* jt, JfrThreadLocal* tl); >> 199: >> 200: // sample all threads that are in native state (and requested to be sampled) > > We are not really
"sampling", but processing their queues, no? You're correct. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120450563 From jwaters at openjdk.org Mon Jun 2 08:46:52 2025 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 2 Jun 2025 08:46:52 GMT Subject: RFR: 8255082: HotSpot Style Guide should permit noexcept In-Reply-To: References: <-uPcWRhBsfKiRl5wRkLQ7YaAH4OCOlT0_ettXJQnUyY=.aa5c72c3-6767-41dd-8dae-45ff9a9e4884@github.com> Message-ID: On Mon, 2 Jun 2025 08:20:57 GMT, Kim Barrett wrote: > I forgot to mention that of course the current code is out of conformance with this, since we're currently using `throw()` to declare allocation functions as being nothrow. Once this style guide is approved, we (probably meaning I) will need to update the code accordingly. Probably not as a big query-replace either, as I've already found one mistake. If it's easier I can bring the original change to noexcept Pull Request back from the dead and remove the merge mistakes that leaked in from my other branch, which shouldn't really be that difficult to do. Not sure which code is potentially marked throw() wrongly though. Alternatively, we could just keep throw() alongside noexcept for code that already uses it, to avoid code churn. 
They do mean the same thing in C++17, after all (I was going to mention that there are papers for static exception specifications that propose reintroducing throw() back into C++ last I remembered, but realized that this likely doesn't mean much for us now, so this point can be ignored) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25574#issuecomment-2929473632 From jbechberger at openjdk.org Mon Jun 2 08:47:01 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 08:47:01 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: <3d549Fxkhzd6v0fAVFEBOcxZ7hBKI1ZAUafLClp7Npw=.70183618-7dbf-4e05-bcc8-fd1216741c66@github.com> On Sun, 1 Jun 2025 13:05:44 GMT, Markus Grönlund wrote: >> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: >> >> - Refactoring >> - Remove convoluted native trace logic > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 367: > >> 365: JfrCPUTimeSampleRequest& request = queue.at(i); >> 366: JfrStackTrace stacktrace; >> 367: traceid tid = JfrThreadLocal::thread_id(thread); > > Check the tid as a function of the JfrSampleRequest, like we do in JFR Cooperative Sampling. You mean ` const traceid tid = in_continuation ? tl->vthread_id_with_epoch_update(jt) : JfrThreadLocal::jvm_thread_id(jt);`?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120458307 From jbechberger at openjdk.org Mon Jun 2 08:53:02 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 08:53:02 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: <3d549Fxkhzd6v0fAVFEBOcxZ7hBKI1ZAUafLClp7Npw=.70183618-7dbf-4e05-bcc8-fd1216741c66@github.com> References: <3d549Fxkhzd6v0fAVFEBOcxZ7hBKI1ZAUafLClp7Npw=.70183618-7dbf-4e05-bcc8-fd1216741c66@github.com> Message-ID: On Mon, 2 Jun 2025 08:44:01 GMT, Johannes Bechberger wrote: >> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 367: >> >>> 365: JfrCPUTimeSampleRequest& request = queue.at(i); >>> 366: JfrStackTrace stacktrace; >>> 367: traceid tid = JfrThreadLocal::thread_id(thread); >> >> Check the tid as a function of the JfrSampleRequest, like we do in JFR Cooperative Sampling. > > You mean ` const traceid tid = in_continuation ? tl->vthread_id_with_epoch_update(jt) : JfrThreadLocal::jvm_thread_id(jt);`? I implemented this in this function now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120473792 From eosterlund at openjdk.org Mon Jun 2 08:56:56 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 2 Jun 2025 08:56:56 GMT Subject: RFR: 8351997: AArch64: Interpreter volatile reference stores with G1 are not sequentially consistent [v2] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 07:48:39 GMT, Erik Österlund wrote: >> The optimized fast_aputfield bytecode on AArch64 stores the field flags in r3, and performs the leading and trailing fencing depending on its volatile bit being set or not. However, r3 is also the last temp register passed in to the barrier set for reference stores, and G1 clobbers it in a way that may clear the volatile bit. Then the trailing fence won't get executed, and sequential consistency is broken.
>> >> My fix puts the flags in r5 instead, which is the register that was used by normal aputfield bytecodes. This way, barriers don't clobber the volatile bits. >> >> This bug has been observed to mess up a classic Dekker duality in the java.util.concurrent.Exchanger class, leading to a hang in the test/jdk/java/util/concurrent/Exchanger/ExchangeLoops.java test that exercises it. Using G1 and -Xint a reproducer hangs 30/100 times in mach5. With the fix, the same reproducer hangs 0/100 times. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Add comment about clobbered registers > Well, since we are introducing the hunks near `do_oop_store`-s, and thus extending the scope of the patch. At this point, we can just inline `do_oop_store` (and maybe `do_oop_load`?), like Andrew initially suggested. This will also match what RISC-V already did: [c5a1543](https://github.com/openjdk/jdk/commit/c5a1543ee3e68775f09ca29fb07efd9aebfdb33e) RISC-V doesn't really have the backporting until JDK 8 problem. I'd really like to make that cosmetic change in the next follow-up PR instead, as previously discussed. The comments hold true all the way back to JDK 8 and don't change the logic, so I can go along with that. And I'd rather take the risk of getting some comment wrong on the way back to JDK 8, than fiddling with the guts of all this unrelated code, that has changed substantially since back then. Does that sound okay?
------------- PR Comment: https://git.openjdk.org/jdk/pull/25483#issuecomment-2929515436 From jbechberger at openjdk.org Mon Jun 2 08:57:04 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 08:57:04 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 13:41:44 GMT, Johannes Bechberger wrote: >> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 558: >> >>> 556: jt->is_JfrRecorder_thread()) { >>> 557: queue.increment_lost_samples(); >>> 558: tl->set_do_async_processing_of_cpu_time_jfr_requests(false); >> >> Why is this restored here? > > Because I shouldn't sample if the thread isn't in native state anymore. The thread is probably sampled anyway on the outgoing safepoint. But you might be right, I removed it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120481274 From aboldtch at openjdk.org Mon Jun 2 08:59:29 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 2 Jun 2025 08:59:29 GMT Subject: RFR: 8358310: ZGC: riscv, ppc ZPlatformAddressOffsetBits may return a too large value Message-ID: <6j_zozeh-Vwu3tRHRlJ5h_mhcMFsNm_OMUinAosz8fU=.d51c8c95-aad1-4566-a23b-8da5b521aa90@github.com> The way that ZPlatformAddressOffsetBits is implemented on riscv and ppc may result in a return value of 45. This is larger than the max supported value of 44 (because of other internal data structures). This was fixed in [JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275) for aarch64. Before [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the issue only manifested if one tried to select a heap larger than 16 TB (not supported), but after [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we try to double the heap address space when running on a NUMA machine. So we may now encounter this bug for heaps larger than 8TB (which is supported).
While ZPlatformAddressOffsetBits needs an overhaul (it was written for non-generational ZGC, where we had the three color bits inside the address), the proposal is that we solve this for ppc and riscv by doing the same thing we did for aarch64 in [JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275) ------------- Commit messages: - 8358310: ZGC: riscv, ppc ZPlatformAddressOffsetBits may return a too large value Changes: https://git.openjdk.org/jdk/pull/25578/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25578&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358310 Stats: 10 lines in 2 files changed: 4 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25578.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25578/head:pull/25578 PR: https://git.openjdk.org/jdk/pull/25578 From jbechberger at openjdk.org Mon Jun 2 09:01:05 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 09:01:05 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 18:10:15 GMT, Markus Grönlund wrote: >> Not without allocating in the signal handler > > How so? Because we need to add the threads from the signal handler. So any kind of growing array or set would not work, especially if we want to remove the threads from within the signal handler again. This is certainly an area of future optimization, albeit this doesn't seem to have any measurable performance impact in my renaissance benchmark runs.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120492743 From mbaesken at openjdk.org Mon Jun 2 09:03:55 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 2 Jun 2025 09:03:55 GMT Subject: RFR: 8357155: [asan] ZGC does not work In-Reply-To: References: Message-ID: <_4nt7X3dG4RfwD7R_no-3YCTNIUkWh0s6o4-eFQjHJw=.98f7be0d-b7ae-4a14-b4b8-459b6ed2c615@github.com> On Fri, 30 May 2025 15:00:53 GMT, Axel Boldt-Christmas wrote: > I was hoping this could work for Linux with 47/48 bit aarch64 VMA. But it is unclear how ASAN selects its mappings on such platforms. > > On 39/42 bit VMA returning `MIN2(valid_max_address_offset_bits, 44)` as I suggested in the PPC function may be a better best effort, as we are using addresses where we actually probed that reservations could be possible). Or even `MIN2(valid_max_address_offset_bits - 1, 44)`. Feel free to try it out, but I think this is otherwise an alright approach until we implement a better heap base selection strategy where we can test multiple base candidates. Thanks for the aarch64 related suggestions, unfortunately both do not work. So I change only the files for x86_64 and ppc64 . ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25549#discussion_r2120500717 From jbechberger at openjdk.org Mon Jun 2 09:05:02 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 09:05:02 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 18:00:55 GMT, Markus Grönlund wrote: >> Yes, so I only start the thread walking if necessary > > I see. With a bounded queue as used in this solution, it can work quite nicely, that is, if the thread is actually on CPU in native, and just not waiting - if waiting (which is most likely) then pending requests could take a long time to be sent to consumers.
> > I also understand better the optimization you tried as part of async walk in native and frames. Also quite nice, to walk from the last JfrSampleRequest and do equals to "batch" the top JFR sample requests that are the same (i.e., taken for the ljf). Maybe you can retry that again, but then you need to save the sid AND the tid to be reused for the top equal requests (you only need stacktrace.record_inner() for one request). It's a nice optimization. The problem is when in between queue processing a new JFR chunk is started. This caused problems before. I would leave these kinds of optimizations for later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120501728 From aph at openjdk.org Mon Jun 2 09:06:59 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 2 Jun 2025 09:06:59 GMT Subject: RFR: 8351997: AArch64: Interpreter volatile reference stores with G1 are not sequentially consistent [v2] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 07:48:39 GMT, Erik Österlund wrote: >> The optimized fast_aputfield bytecode on AArch64 stores the field flags in r3, and performs the leading and trailing fencing depending on its volatile bit being set or not. However, r3 is also the last temp register passed in to the barrier set for reference stores, and G1 clobbers it in a way that may clear the volatile bit. Then the trailing fence won't get executed, and sequential consistency is broken. >> >> My fix puts the flags in r5 instead, which is the register that was used by normal aputfield bytecodes. This way, barriers don't clobber the volatile bits. >> >> This bug has been observed to mess up a classic Dekker duality in the java.util.concurrent.Exchanger class, leading to a hang in the test/jdk/java/util/concurrent/Exchanger/ExchangeLoops.java test that exercises it. Using G1 and -Xint a reproducer hangs 30/100 times in mach5. With the fix, the same reproducer hangs 0/100 times.
> > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Add comment about clobbered registers Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25483#pullrequestreview-2887436174 From jbechberger at openjdk.org Mon Jun 2 09:09:04 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 09:09:04 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Sun, 1 Jun 2025 18:03:15 GMT, Markus Grönlund wrote: >> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 344: >> >>> 342: >>> 343: // equals operator for JfrSampleRequest >>> 344: inline bool operator==(const JfrSampleRequest& lhs, const JfrSampleRequest& rhs) { >> >> Can be removed. > > Unless you still want to try the ljf JfrSampleRequest optimization for the native ljf, which I kind of like now that I understand it. As I said, it's a great optimization. But it needs some work. I therefore remove this method for now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120511048 From mbaesken at openjdk.org Mon Jun 2 09:11:05 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 2 Jun 2025 09:11:05 GMT Subject: RFR: 8357155: [asan] ZGC does not work [v2] In-Reply-To: References: Message-ID: > Many (all?) ZGC related jtreg tests do not work when the JDK is built with address sanitizer asan enabled (configure flag --enable-asan). > This can be seen on SUSE Linux x86_64 and also on ppc64le , opt binaries were used. > It has been suggested to do a workaround - 'But I think that simply adapting the zAddress_[...].cpp implementations to always select the largest heap base would go a long way for providing ASAN compatibility.'
> This seems to work nicely on x86_64 and ppc64le, however the zgc related tests still fail on Linux aarch64 (should I exclude this platform from my patch?) . Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: remove aarch64 from the change, adjust ppc64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25549/files - new: https://git.openjdk.org/jdk/pull/25549/files/ed2885ff..82a11f9b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25549&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25549&range=00-01 Stats: 5 lines in 2 files changed: 0 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25549.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25549/head:pull/25549 PR: https://git.openjdk.org/jdk/pull/25549 From mbaesken at openjdk.org Mon Jun 2 09:11:05 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 2 Jun 2025 09:11:05 GMT Subject: RFR: 8357155: [asan] ZGC does not work In-Reply-To: References: Message-ID: On Fri, 30 May 2025 12:18:46 GMT, Matthias Baesken wrote: > Many (all?) ZGC related jtreg tests do not work when the JDK is built with address sanitizer asan enabled (configure flag --enable-asan). > This can be seen on SUSE Linux x86_64 and also on ppc64le , opt binaries were used. > It has been suggested to do a workaround - 'But I think that simply adapting the zAddress_[...].cpp implementations to always select the largest heap base would go a long way for providing ASAN compatibility.' > This seems to work nicely on x86_64 and ppc64le, however the zgc related tests still fail on Linux aarch64 (should I exclude this platform from my patch?) . I think we handle just x86_64 and ppc64 in this change. Should I adjust the subject ? Btw Axel, should I add you as contributor, makes probably sense ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25549#issuecomment-2929574262 From shade at openjdk.org Mon Jun 2 09:11:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Jun 2025 09:11:55 GMT Subject: RFR: 8351997: AArch64: Interpreter volatile reference stores with G1 are not sequentially consistent [v2] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 07:48:39 GMT, Erik Österlund wrote: >> The optimized fast_aputfield bytecode on AArch64 stores the field flags in r3, and performs the leading and trailing fencing depending on its volatile bit being set or not. However, r3 is also the last temp register passed in to the barrier set for reference stores, and G1 clobbers it in a way that may clear the volatile bit. Then the trailing fence won't get executed, and sequential consistency is broken. >> >> My fix puts the flags in r5 instead, which is the register that was used by normal aputfield bytecodes. This way, barriers don't clobber the volatile bits. >> >> This bug has been observed to mess up a classic Dekker duality in the java.util.concurrent.Exchanger class, leading to a hang in the test/jdk/java/util/concurrent/Exchanger/ExchangeLoops.java test that exercises it. Using G1 and -Xint a reproducer hangs 30/100 times in mach5. With the fix, the same reproducer hangs 0/100 times. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Add comment about clobbered registers For me, straight-up inlining: __ store_heap_oop(dst, val, r10, r11, r3, decorators); ...conveys a similar message to `// Clobbers: r10, r11, r3`. But I shall not quibble. ------------- Marked as reviewed by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25483#pullrequestreview-2887454204 From aboldtch at openjdk.org Mon Jun 2 09:20:51 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 2 Jun 2025 09:20:51 GMT Subject: RFR: 8357155: [asan] ZGC does not work In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 09:08:32 GMT, Matthias Baesken wrote: > I think we handle just x86_64 and ppc64 in this change. Should I adjust the subject ? Sounds good. We should probably make this explicit in the title. > Btw Axel, should I add you as contributor, makes probably sense ? Yeah, you can add me as a contributor. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25549#issuecomment-2929615698 From mbaesken at openjdk.org Mon Jun 2 09:23:56 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 2 Jun 2025 09:23:56 GMT Subject: RFR: 8357981: [PPC64] Remove old instructions from VM_Version::determine_features() In-Reply-To: References: Message-ID: On Wed, 28 May 2025 14:31:40 GMT, Martin Doerr wrote: > Simple cleanup after [JDK-8331859](https://bugs.openjdk.org/browse/JDK-8331859). The old instructions are always available and don't need to be tried in `VM_Version::determine_features()`. > > On Power10: > > -------------------------------------------------------------------------------- > Decoding cpu-feature detection stub at 0x000079b9203c0380 after execution: > -------------------------------------------------------------------------------- > 0x000079b9203c0380: darn r7,1 > 0x000079b9203c0384: brw r5,r6 > 0x000079b9203c0388: blr bo=0b10100,bh=0b00[subroutine_return] > 0x000079b9203c038c: dcbz 0,r3 > 0x000079b9203c0390: blr bo=0b10100,bh=0b00[subroutine_return] > > > Also tested on older processors: On Power9, `brw` gets zeroed out. On Power8, `darn` also gets zeroed out. Marked as reviewed by mbaesken (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25495#pullrequestreview-2887491611 From mdoerr at openjdk.org Mon Jun 2 09:23:56 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 2 Jun 2025 09:23:56 GMT Subject: RFR: 8357981: [PPC64] Remove old instructions from VM_Version::determine_features() In-Reply-To: References: Message-ID: On Wed, 28 May 2025 14:31:40 GMT, Martin Doerr wrote: > Simple cleanup after [JDK-8331859](https://bugs.openjdk.org/browse/JDK-8331859). The old instructions are always available and don't need to be tried in `VM_Version::determine_features()`. > > On Power10: > > -------------------------------------------------------------------------------- > Decoding cpu-feature detection stub at 0x000079b9203c0380 after execution: > -------------------------------------------------------------------------------- > 0x000079b9203c0380: darn r7,1 > 0x000079b9203c0384: brw r5,r6 > 0x000079b9203c0388: blr bo=0b10100,bh=0b00[subroutine_return] > 0x000079b9203c038c: dcbz 0,r3 > 0x000079b9203c0390: blr bo=0b10100,bh=0b00[subroutine_return] > > > Also tested on older processors: On Power9, `brw` gets zeroed out. On Power8, `darn` also gets zeroed out. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25495#issuecomment-2929629790 From mdoerr at openjdk.org Mon Jun 2 09:23:57 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 2 Jun 2025 09:23:57 GMT Subject: Integrated: 8357981: [PPC64] Remove old instructions from VM_Version::determine_features() In-Reply-To: References: Message-ID: On Wed, 28 May 2025 14:31:40 GMT, Martin Doerr wrote: > Simple cleanup after [JDK-8331859](https://bugs.openjdk.org/browse/JDK-8331859). The old instructions are always available and don't need to be tried in `VM_Version::determine_features()`. 
> > On Power10: > > -------------------------------------------------------------------------------- > Decoding cpu-feature detection stub at 0x000079b9203c0380 after execution: > -------------------------------------------------------------------------------- > 0x000079b9203c0380: darn r7,1 > 0x000079b9203c0384: brw r5,r6 > 0x000079b9203c0388: blr bo=0b10100,bh=0b00[subroutine_return] > 0x000079b9203c038c: dcbz 0,r3 > 0x000079b9203c0390: blr bo=0b10100,bh=0b00[subroutine_return] > > > Also tested on older processors: On Power9, `brw` gets zeroed out. On Power8, `darn` also gets zeroed out. This pull request has now been integrated. Changeset: 612f2c0c Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/612f2c0c0b75466c60d4b54dab6aa793a810c846 Stats: 75 lines in 2 files changed: 0 ins; 71 del; 4 mod 8357981: [PPC64] Remove old instructions from VM_Version::determine_features() Reviewed-by: dbriemann, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/25495 From aboldtch at openjdk.org Mon Jun 2 09:24:52 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 2 Jun 2025 09:24:52 GMT Subject: RFR: 8357155: [asan] ZGC does not work (x86_64 and ppc64) [v2] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 09:11:05 GMT, Matthias Baesken wrote: >> Many (all?) ZGC related jtreg tests do not work when the JDK is built with address sanitizer asan enabled (configure flag --enable-asan). >> This can be seen on SUSE Linux x86_64 and also on ppc64le , opt binaries were used. >> It has been suggested to do a workaround - 'But I think that simply adapting the zAddress_[...].cpp implementations to always select the largest heap base would go a long way for providing ASAN compatibility.' >> This seems to work nicely on x86_64 and ppc64le, however the zgc related tests still fail on Linux aarch64 (should I exclude this platform from my patch?) . 
> > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > remove aarch64 from the change, adjust ppc64 Marked as reviewed by aboldtch (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25549#pullrequestreview-2887498234 From jbechberger at openjdk.org Mon Jun 2 09:28:01 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 09:28:01 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 04:28:02 GMT, David Holmes wrote: >> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: >> >> - Refactoring >> - Remove convoluted native trace logic > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 119: > >> 117: _data = new_data; >> 118: _capacity = capacity; >> 119: } > > I assume there is a lock protecting this so it happens atomically? This happens before the signal handler is attached to thread. So it does happen before any parallelism is introduced on thread creation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120557327 From mbaesken at openjdk.org Mon Jun 2 09:32:56 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 2 Jun 2025 09:32:56 GMT Subject: RFR: 8357155: [asan] ZGC does not work (x86_64 and ppc64) [v2] In-Reply-To: References: Message-ID: <32poF3-6QghOwLYJ6GBMsAmGx8xcFOE9g5vqmoqzNJ0=.11438af8-f402-45e9-b74b-fcc963b2d169@github.com> On Mon, 2 Jun 2025 09:11:05 GMT, Matthias Baesken wrote: >> Many (all?) ZGC related jtreg tests do not work when the JDK is built with address sanitizer asan enabled (configure flag --enable-asan). >> This can be seen on SUSE Linux x86_64 and also on ppc64le , opt binaries were used. 
>> It has been suggested to do a workaround - 'But I think that simply adapting the zAddress_[...].cpp implementations to always select the largest heap base would go a long way for providing ASAN compatibility.' >> This seems to work nicely on x86_64 and ppc64le, however the zgc related tests still fail on Linux aarch64 (should I exclude this platform from my patch?) . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > remove aarch64 from the change, adjust ppc64 contributor add xmas92 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25549#issuecomment-2929667209 From mbaesken at openjdk.org Mon Jun 2 09:35:51 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 2 Jun 2025 09:35:51 GMT Subject: RFR: 8357155: [asan] ZGC does not work (x86_64 and ppc64) [v2] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 09:11:05 GMT, Matthias Baesken wrote: >> Many (all?) ZGC related jtreg tests do not work when the JDK is built with address sanitizer asan enabled (configure flag --enable-asan). >> This can be seen on SUSE Linux x86_64 and also on ppc64le , opt binaries were used. >> It has been suggested to do a workaround - 'But I think that simply adapting the zAddress_[...].cpp implementations to always select the largest heap base would go a long way for providing ASAN compatibility.' >> This seems to work nicely on x86_64 and ppc64le, however the zgc related tests still fail on Linux aarch64 (should I exclude this platform from my patch?) . 
> > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > remove aarch64 from the change, adjust ppc64 contributor /add xmas92 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25549#issuecomment-2929681609 From duke at openjdk.org Mon Jun 2 09:43:08 2025 From: duke at openjdk.org (Anton Artemov) Date: Mon, 2 Jun 2025 09:43:08 GMT Subject: RFR: 8284017: Improve handshake filtering mechanism Message-ID: Hi, please consider the following enhancement: In this PR a new way of supplying multiple arguments to filter out / skip operations in handshake/safepoint poll is given. Multiple boolean arguments are combined in a hash table, where keys are taken from a new enum `HandshakeOperationProperty`, which is to be modified when there is a need for a new argument. Tested in GHA and tiers 1 - 3. ------------- Commit messages: - 8284017: Changed variable name to operation_filter. - 8284017: Added typedef. - Merge remote-tracking branch 'origin/master' into JDK-8284017-handshake-filtering - 8284017: Added missed include statement. - 8284017: Changed to enum class for filter operation value. 
- 8284017: Added resource mark.s - 8284017: Combined bool params into resourceHashTable for filtering Changes: https://git.openjdk.org/jdk/pull/25497/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25497&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8284017 Stats: 66 lines in 9 files changed: 38 ins; 2 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/25497.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25497/head:pull/25497 PR: https://git.openjdk.org/jdk/pull/25497 From mdoerr at openjdk.org Mon Jun 2 09:47:28 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 2 Jun 2025 09:47:28 GMT Subject: RFR: 8358013: [PPC64] VSX has poor performance on Power8 [v3] In-Reply-To: <6lRLaDtZkFd5zdOobo1RnSODoZk3r7T-sgjfpcnUVwU=.ad525055-0f15-4866-a295-20e2183eaf7b@github.com> References: <6lRLaDtZkFd5zdOobo1RnSODoZk3r7T-sgjfpcnUVwU=.ad525055-0f15-4866-a295-20e2183eaf7b@github.com> Message-ID: > Power8 only has limited VSX instructions for the superword optimization and the Vector API and the performance is bad. Let's only use it on Power9 and newer by default. This change excludes the VSX registers from C2 register allocation for Power8. VSX instruction usage gets limited to a few places like intrinsics. > > Note: Power8 is an old processor and performance optimizations for it are no longer planned. Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge remote-tracking branch 'origin' into PPC64_disable_SuperwordUseVSX_Power8 - Improve description of 8358013: [PPC64] VSXSuperwordUseVSX. 
- 8358013: [PPC64] VSX has poor performance on Power8 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25514/files - new: https://git.openjdk.org/jdk/pull/25514/files/1f8b0e91..599a4f36 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25514&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25514&range=01-02 Stats: 32865 lines in 385 files changed: 12812 ins; 12713 del; 7340 mod Patch: https://git.openjdk.org/jdk/pull/25514.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25514/head:pull/25514 PR: https://git.openjdk.org/jdk/pull/25514 From eosterlund at openjdk.org Mon Jun 2 10:11:53 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 2 Jun 2025 10:11:53 GMT Subject: RFR: 8351997: AArch64: Interpreter volatile reference stores with G1 are not sequentially consistent [v2] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 07:48:39 GMT, Erik Österlund wrote: >> The optimized fast_aputfield bytecode on AArch64 stores the field flags in r3, and performs the leading and trailing fencing depending on its volatile bit being set or not. However, r3 is also the last temp register passed in to the barrier set for reference stores, and G1 clobbers it in a way that may clear the volatile bit. Then the trailing fence won't get executed, and sequential consistency is broken. >> >> My fix puts the flags in r5 instead, which is the register that was used by normal aputfield bytecodes. This way, barriers don't clobber the volatile bits. >> >> This bug has been observed to mess up a classic Dekker duality in the java.util.concurrent.Exchanger class, leading to a hang in the test/jdk/java/util/concurrent/Exchanger/ExchangeLoops.java test that exercises it. Using G1 and -Xint a reproducer hangs 30/100 times in mach5. With the fix, the same reproducer hangs 0/100 times.
> > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Add comment about clobbered registers Thanks for the reviews everyone! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25483#issuecomment-2929825222 From mgronlun at openjdk.org Mon Jun 2 10:21:50 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Mon, 2 Jun 2025 10:21:50 GMT Subject: RFR: 8358205: Remove unused JFR array allocation code In-Reply-To: <4iPujAp0lL_pVhcjlfMX42dIqE7Aw5X8FZr2k5cSFGo=.139bdd20-c798-4335-9ebd-cf0748e7d339@github.com> References: <4iPujAp0lL_pVhcjlfMX42dIqE7Aw5X8FZr2k5cSFGo=.139bdd20-c798-4335-9ebd-cf0748e7d339@github.com> Message-ID: On Fri, 30 May 2025 18:10:07 GMT, Coleen Phillimore wrote: > The JFR code is using ObjArray->allocate() directly rather than going through oopFactory. In Valhalla, the oopFactory code is being changed to account for new array shapes and attributes, so all code should call that instead. Turns out this function is unused, so this change removes it. Tested with tier1-7 with a ShouldNotReachHere(), then jdk/jfr tests with the removal. Marked as reviewed by mgronlun (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/25553#pullrequestreview-2887708018 From mdoerr at openjdk.org Mon Jun 2 10:40:54 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 2 Jun 2025 10:40:54 GMT Subject: RFR: 8357155: [asan] ZGC does not work (x86_64 and ppc64) [v2] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 14:37:56 GMT, Axel Boldt-Christmas wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> remove aarch64 from the change, adjust ppc64 > > src/hotspot/cpu/ppc/gc/z/zAddress_ppc.cpp line 95: > >> 93: const size_t max_address_offset_bits = valid_max_address_offset_bits - 3; >> 94: #ifdef ADDRESS_SANITIZER >> 95: return max_address_offset_bits; > > I think this actually has to be > ```c++ > return MIN2(valid_max_address_offset_bits, 44); > > > Because the way we probe we may otherwise return 45 here. Which could result in more than 44 bits in a ZOffset which our internal data structures cannot handle. Hopefully this still works for ASAN on PPC. (The `-3` is a left over from non-generational ZGC). Aarch64 could do the same, but it does not have this issue as it starts its probing at bit 46, not bit 47. > > _Side note: This makes me realise that there probably is a bug here on PPC and RISCV if running on a NUMA machine with more than 8 TB heap. As after ZGlobalsPointers::min_address_offset_request() was introduced we can return 45 from this function._ @xmas92: Thanks for looking into this! Should we set `DEFAULT_MAX_ADDRESS_BIT = 44` and use the constant? Or maybe file a separate issue for fixing that on aarch64, PPC64 and riscv? 
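
For readers following the thread, a minimal standalone sketch of the clamp discussed in the review comment above. This is not the HotSpot code: `kMaxSupportedOffsetBits`, `clamp_address_offset_bits` and `offset_range` are hypothetical names, `std::min` stands in for HotSpot's `MIN2`, and the value 44 is taken from the comment.

```c++
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical constant: the largest number of offset bits the internal
// data structures are said to handle in the review comment (44).
constexpr std::size_t kMaxSupportedOffsetBits = 44;

// Clamp the probed number of valid address offset bits, so that a probe
// result of 45 or more is never passed on unchanged.
constexpr std::size_t clamp_address_offset_bits(std::size_t probed_bits) {
  return std::min(probed_bits, kMaxSupportedOffsetBits);
}

// Number of distinct offsets representable with the given number of bits.
inline std::uint64_t offset_range(std::size_t bits) {
  assert(bits < 64 && "shift would overflow a 64-bit value");
  return std::uint64_t{1} << bits;
}
```

With this shape, a probe that reports 45 bits yields 44, while smaller results pass through unchanged, which is the behaviour the `MIN2(valid_max_address_offset_bits, 44)` suggestion asks for. Whether a shared constant such as the proposed `DEFAULT_MAX_ADDRESS_BIT` is the right home for the value is left open in the thread.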
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25549#discussion_r2120738138 From epeter at openjdk.org Mon Jun 2 10:50:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Jun 2025 10:50:54 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API In-Reply-To: References: Message-ID: On Fri, 30 May 2025 08:15:22 GMT, Xiaohong Gong wrote: >>> @XiaohongGong Thanks for splitting this one out, and for investigating the regressions here. >>> >>> Putting the permalink here, fixed to the current change (the link you pasted will always refer to the newest, which may later on point to the wrong line when lines above are inserted / deleted): >>> >>> https://github.com/openjdk/jdk/blob/7077535c0b0a6ea0a2a167f9135b1504a3d71fb3/src/hotspot/share/opto/loopnode.cpp#L1659-L1661 >>> >>> I wonder if we should just use `Node::uncast` there? But I'm quite unsure about that. >> >> Sounds good to me. I will have a deep investigation for it. Thanks! >> >> >> >>> > Yes, I also observed such regression. >>> > It would be nice if you proactively mentioned regressions, so it does not have to be pointed out by reviewers. >>> >>> For me, it could be ok to fix it in a follow-up patch. I think we are too close to RDP1 for JDK25 now anyway, and so we could push this patch here into JDK26, and then we have enough time in JDK26 to investigate the regression. Even better would be if we could do the other patch first, so we never even encounter a regression. >> >> Sounds good to me. Thanks! > >> > @XiaohongGong Thanks for splitting this one out, and for investigating the regressions here. 
>> > Putting the permalink here, fixed to the current change (the link you pasted will always refer to the newest, which may later on point to the wrong line when lines above are inserted / deleted): >> > https://github.com/openjdk/jdk/blob/7077535c0b0a6ea0a2a167f9135b1504a3d71fb3/src/hotspot/share/opto/loopnode.cpp#L1659-L1661 >> > >> > I wonder if we should just use `Node::uncast` there? But I'm quite unsure about that. >> >> Sounds good to me. I will have a deep investigation for it. Thanks! > > Hi @eme64 @jatin-bhateja, I've created a PR https://github.com/openjdk/jdk/pull/25539 to fix this issue. With this change, the performance regression can be fixed as well. Could you please take a look at that change and help to run the test on different X86 machines? Thanks a lot! @XiaohongGong I reviewed https://github.com/openjdk/jdk/pull/25539. Since it is a relatively simple patch, I suggest that we integrate that one first, and come back to this here later. Is that ok for you? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2930007655 From ayang at openjdk.org Mon Jun 2 10:51:06 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 2 Jun 2025 10:51:06 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v9] In-Reply-To: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: <-mRIrbyrBpxq1lZ2tfcxIuxRLh5lcoURlM-woAXM45k=.7c152a76-e34f-42ba-b9a7-323102b19371@github.com> > This patch refines Parallel's sizing strategy to improve overall memory management and performance. > > The young generation layout has been reconfigured from the previous `eden-from/to` arrangement to a new `from/to-eden` order.
This new layout facilitates young generation resizing, since we perform resizing after a successful young GC when all live objects are located at the beginning of the young generation. Previously, resizing was often inhibited by live objects residing in the middle of the young generation (from-space). The new layout is illustrated in `parallelScavengeHeap.hpp`. > > `NumberSeq` is now used to track various runtime metrics, such as minor/major GC pause durations, promoted/survived bytes after a young GC, highest old generation usage, etc. This tracking primarily lives in `AdaptiveSizePolicy` and its subclass `PSAdaptiveSizePolicy`. > > GC overhead checking, which was previously entangled with adaptive resizing logic, has been extracted and is now largely encapsulated in `ParallelScavengeHeap::is_gc_overhead_limit_reached`. > > ## Performance evaluation > > - SPECjvm2008-Compress shows ~8% improvement on Linux/AArch64 and Linux/x64 (restoring the regression reported in [JDK-8332485](https://bugs.openjdk.org/browse/JDK-8332485) and [JDK-8338689](https://bugs.openjdk.org/browse/JDK-8338689)). > - Fixes the surprising behavior when using a non-default (smaller) value of `GCTimeRatio` with Heapothesys/Hyperalloc, as discussed in [this thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050146.html). > - Performance is mostly neutral across other tested benchmarks: **DaCapo**, **SPECjbb2005**, **SPECjbb2015**, **SPECjvm2008**, and **CacheStress**. The number of young-gc sometimes goes up a bit and the total heap-size decreases a bit, because promotion-size-to-old-gen goes down with the more effective eden/survivor-space resizing. > > PS: I have opportunistically set the obsolete/expired version to 25/26 for now. I will update them accordingly before merging. > > Test: tier1-8 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 12 commits: - merge - merge-fix - merge - Merge branch 'master' into pgc-size-policy - Merge branch 'master' into pgc-size-policy - review - Merge branch 'master' into pgc-size-policy - review - Merge branch 'master' into pgc-size-policy - review - ... and 2 more: https://git.openjdk.org/jdk/compare/83cb0c6d...08bc74e1 ------------- Changes: https://git.openjdk.org/jdk/pull/25000/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25000&range=08 Stats: 4375 lines in 31 files changed: 522 ins; 3454 del; 399 mod Patch: https://git.openjdk.org/jdk/pull/25000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25000/head:pull/25000 PR: https://git.openjdk.org/jdk/pull/25000 From mgronlun at openjdk.org Mon Jun 2 11:06:31 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 2 Jun 2025 11:06:31 GMT Subject: RFR: 8357962: JFR Cooperative Sampling reveals inconsistent interpreter frames as part of JVMTI PopFrame [v2] In-Reply-To: References: Message-ID: > Greetings, > > Please see the JIRA issue for a detailed description. > > Fix only applies to platforms that issue a save_bcp() as part of InterpreterMacroAssembler::unlock_object(). 
> > Testing: jdk_jfr, JVMTI PopFrame tests > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: more precise comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25571/files - new: https://git.openjdk.org/jdk/pull/25571/files/b48c0635..70f75414 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25571&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25571&range=00-01 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25571.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25571/head:pull/25571 PR: https://git.openjdk.org/jdk/pull/25571 From mgronlun at openjdk.org Mon Jun 2 11:26:00 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Mon, 2 Jun 2025 11:26:00 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: <45mCuuxToelhOdhbJlap5NCUMfgDBrVGIUDGJHAk2Rg=.1dd9d5a6-f2b5-4214-8815-d0a9f0cbddbb@github.com> On Mon, 2 Jun 2025 08:58:28 GMT, Johannes Bechberger wrote: >> How so? > > Because we need to add the threads from the signal handler. So any kind of growing array or set would not work, especially if we want to remove the threads from within the signal handler again. > > This is certainly an area of future optimization, albeit this doesn't seem to have any measurable performance impact in my renaissance benchmark runs. I don't understand what allocation has to do with anything. I'm talking about code branch layout to avoid having to test "has_cpu_time_jfr_requests()" when we know it will be false by default.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120846868 From mgronlun at openjdk.org Mon Jun 2 11:28:59 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Mon, 2 Jun 2025 11:28:59 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: <7Cy88EZJj1ZgHXaAoCY9m1PnB6UAGDJxgK9PI3BVYBQ=.a4fbad7a-19fa-4e1e-999e-8773d2fd7fb1@github.com> On Mon, 2 Jun 2025 09:02:05 GMT, Johannes Bechberger wrote: >> I see. With a bounded queue as used in this solution, it can work quite nicely, that is, if the thread is actually on CPU in native, and just not waiting - if waiting (which is most likely) then pending requests could take a long time to be sent to consumers. >> >> I also understand better the optimization you tried as part of async walk in native and frames. Also quite nice, to walk from the last JfrSampleRequest and do equals to "batch" the top JFR sample requests that are the same (i.e., taken for the ljf). Maybe you can retry that again, but then you need to save the sid AND the tid to be reused for the top equal requests (you only need stacktrace.record_inner() for one request). It's a nice optimization. > > The problem is when in between queue processing a new JFR chunk is started. This caused problems before. > > I would leave these kinds of optimizations for later. Then I would recommend you drain immediately when the thread is in native, not waiting for the queue to fill up to 2/3. The reason is that the solution is based on CPU time samples and most threads that are _thread_in_native are waiting (i.e. they will not get their queues filled while in native). I would recommend dropping the second clause about testing the queue size altogether. That way you will not get threads stuck in native for a long time with a lot of undelivered events. Revive it later when you begin to attack the optimizations.
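
To make the suggestion concrete, here is a minimal sketch of the two drain policies being compared. Everything in it is hypothetical: `ThreadState`, `SampleQueue` and both predicates are illustrative stand-ins and not the JFR sources, and the 2/3 fill threshold is simply the figure mentioned in the comment above.

```c++
#include <cstddef>

// Hypothetical, simplified view of where a sampled thread currently is.
enum class ThreadState { in_java, in_native };

// Hypothetical bounded queue of pending sample requests for one thread.
struct SampleQueue {
  std::size_t size;      // requests currently enqueued
  std::size_t capacity;  // fixed upper bound
};

// Policy with the queue-size clause: only drain a thread in native
// once its queue is at least 2/3 full.
inline bool should_drain_with_threshold(ThreadState state, const SampleQueue& q) {
  return state == ThreadState::in_native && q.size * 3 >= q.capacity * 2;
}

// Policy without the queue-size clause: drain a thread in native as
// soon as it has any pending request.
inline bool should_drain_immediately(ThreadState state, const SampleQueue& q) {
  return state == ThreadState::in_native && q.size > 0;
}
```

A thread that blocks in native with a single pending request never reaches the 2/3 mark, because it accumulates no CPU time while waiting, so under the first predicate its request stays undelivered; the second predicate picks it up on the next pass.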
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120855119 From jbechberger at openjdk.org Mon Jun 2 11:32:27 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 11:32:27 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v27] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... different chunk-sizes Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: - Tiny fixes - Minor changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/439763a3..6a83d759 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=25-26 Stats: 90 lines in 9 files changed: 24 ins; 29 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From jbechberger at openjdk.org Mon Jun 2 11:40:00 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 11:40:00 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: <45mCuuxToelhOdhbJlap5NCUMfgDBrVGIUDGJHAk2Rg=.1dd9d5a6-f2b5-4214-8815-d0a9f0cbddbb@github.com> References: <45mCuuxToelhOdhbJlap5NCUMfgDBrVGIUDGJHAk2Rg=.1dd9d5a6-f2b5-4214-8815-d0a9f0cbddbb@github.com> Message-ID: On Mon, 2 Jun 2025 11:22:45 GMT, Markus Grönlund wrote: >>
Because we need to add the threads from the signal handler. So any kind of growing array or set would not work, especially if we want to remove the threads from within the signal handler again. >> >> This is certainly an area of future optimization, albeit this doesn't seem to have any measurable performance impact in my renaissance benchmark runs. > > I don't understand what allocation has to do with anything. I'm talking about code branch layout to avoid having to test "has_cpu_time_jfr_requests()" when we know it will be false by default. Ah. Sorry. Is it about reading the atomic boolean flag again? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120882396 From mgronlun at openjdk.org Mon Jun 2 11:40:02 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 2 Jun 2025 11:40:02 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v27] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 11:32:27 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Tiny fixes > - Minor changes src/hotspot/share/runtime/thread.hpp line 59: > 57: class SafeThreadsListPtr; > 58: class ThreadClosure; > 59: class ThreadCrashProtection; Should not be needed. 
src/jdk.jfr/share/classes/jdk/jfr/internal/JVM.java line 276: > 274: * Set the maximum event emission rate for the CPU time sampler > 275: * > 276: * Setting rate to 0 turns off the CPU time method sampler. "CPU time method sampler" -> "CPU time sampler" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120878701 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120882161 From jbechberger at openjdk.org Mon Jun 2 11:51:26 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 11:51:26 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v28] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... 
different chunk-sizes Johannes Bechberger has updated the pull request incrementally with three additional commits since the last revision: - Remove header includes - Always trigger async processing - Remove one atomic read ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/6a83d759..e482ad37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=26-27 Stats: 21 lines in 6 files changed: 3 ins; 6 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From mgronlun at openjdk.org Mon Jun 2 11:51:27 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 2 Jun 2025 11:51:27 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v27] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 11:32:27 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Tiny fixes > - Minor changes src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 35: > 33: > 34: #include "jfr/recorder/jfrRecorder.hpp" > 35: #include "jfr/recorder/service/jfrRecorderService.hpp" The two includes above are not needed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120890097 From mgronlun at openjdk.org Mon Jun 2 11:51:27 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 2 Jun 2025 11:51:27 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: <45mCuuxToelhOdhbJlap5NCUMfgDBrVGIUDGJHAk2Rg=.1dd9d5a6-f2b5-4214-8815-d0a9f0cbddbb@github.com> Message-ID: On Mon, 2 Jun 2025 11:37:23 GMT, Johannes Bechberger wrote: >> I don't understand what allocation has to do with anything. I'm talking about code branch layout to avoid having to test "has_cpu_time_jfr_requests()" when we know it will be false by default. > > Ah. Sorry. Is it about reading the atomic boolean flag again? Right. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120897042 From jbechberger at openjdk.org Mon Jun 2 11:51:27 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 11:51:27 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: <45mCuuxToelhOdhbJlap5NCUMfgDBrVGIUDGJHAk2Rg=.1dd9d5a6-f2b5-4214-8815-d0a9f0cbddbb@github.com> Message-ID: On Mon, 2 Jun 2025 11:43:54 GMT, Markus Grönlund wrote: >> Ah. Sorry. Is it about reading the atomic boolean flag again? > > Right. I pass it through now. 
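A minimal sketch of what "passing it through" could look like, for readers following the exchange above: the hot path loads the atomic flag once, exits early in the common case, and hands the already-loaded value to the slow path instead of loading it again. The names `ThreadLocalState`, `check_and_process` and `process_requests` are hypothetical stand-ins for illustration and are not the code in the PR.

```cpp
#include <atomic>

// Hypothetical stand-in for the per-thread JFR state; not HotSpot's JfrThreadLocal.
class ThreadLocalState {
 public:
  // Each call performs one atomic load. The counter exists only so the
  // sketch can show how often the flag is read.
  bool has_cpu_time_requests() {
    ++_flag_reads;
    return _cpu_time_requests.load(std::memory_order_acquire);
  }
  void set_cpu_time_requests(bool value) {
    _cpu_time_requests.store(value, std::memory_order_release);
  }
  int flag_reads() const { return _flag_reads; }

 private:
  std::atomic<bool> _cpu_time_requests{false};
  int _flag_reads = 0;
};

// Slow path: receives the value the caller already loaded instead of
// loading the flag a second time. Returns 1 if there was work to do.
inline int process_requests(bool has_cpu_time_requests) {
  return has_cpu_time_requests ? 1 : 0;
}

// Hot path: one atomic load, early exit for the common "no requests" case.
inline int check_and_process(ThreadLocalState& tl) {
  const bool has_cpu_time_requests = tl.has_cpu_time_requests();
  if (!has_cpu_time_requests) {
    return 0;
  }
  return process_requests(has_cpu_time_requests);
}
```

In this sketch every call to `check_and_process` costs exactly one flag read, whether or not requests are pending.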
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120906973 From coleenp at openjdk.org Mon Jun 2 11:54:00 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 2 Jun 2025 11:54:00 GMT Subject: RFR: 8358205: Remove unused JFR array allocation code In-Reply-To: <4iPujAp0lL_pVhcjlfMX42dIqE7Aw5X8FZr2k5cSFGo=.139bdd20-c798-4335-9ebd-cf0748e7d339@github.com> References: <4iPujAp0lL_pVhcjlfMX42dIqE7Aw5X8FZr2k5cSFGo=.139bdd20-c798-4335-9ebd-cf0748e7d339@github.com> Message-ID: On Fri, 30 May 2025 18:10:07 GMT, Coleen Phillimore wrote: > The JFR code is using ObjArray->allocate() directly rather than going through oopFactory. In Valhalla, the oopFactory code is being changed to account for new array shapes and attributes, so all code should call that instead. Turns out this function is unused, so this change removes it. Tested with tier1-7 with a ShouldNotReachHere(), then jdk/jfr tests with the removal. Thank you for reviewing, Kim and Markus. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25553#issuecomment-2930287718 From coleenp at openjdk.org Mon Jun 2 11:54:00 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 2 Jun 2025 11:54:00 GMT Subject: Integrated: 8358205: Remove unused JFR array allocation code In-Reply-To: <4iPujAp0lL_pVhcjlfMX42dIqE7Aw5X8FZr2k5cSFGo=.139bdd20-c798-4335-9ebd-cf0748e7d339@github.com> References: <4iPujAp0lL_pVhcjlfMX42dIqE7Aw5X8FZr2k5cSFGo=.139bdd20-c798-4335-9ebd-cf0748e7d339@github.com> Message-ID: On Fri, 30 May 2025 18:10:07 GMT, Coleen Phillimore wrote: > The JFR code is using ObjArray->allocate() directly rather than going through oopFactory. In Valhalla, the oopFactory code is being changed to account for new array shapes and attributes, so all code should call that instead. Turns out this function is unused, so this change removes it. Tested with tier1-7 with a ShouldNotReachHere(), then jdk/jfr tests with the removal. This pull request has now been integrated. 
Changeset: c22af0c2 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/c22af0c29ea89857c5cf57dd127b5c739130b2f1 Stats: 50 lines in 5 files changed: 0 ins; 45 del; 5 mod 8358205: Remove unused JFR array allocation code Reviewed-by: kbarrett, mgronlun ------------- PR: https://git.openjdk.org/jdk/pull/25553 From eosterlund at openjdk.org Mon Jun 2 12:23:51 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 2 Jun 2025 12:23:51 GMT Subject: RFR: 8358310: ZGC: riscv, ppc ZPlatformAddressOffsetBits may return a too large value In-Reply-To: <6j_zozeh-Vwu3tRHRlJ5h_mhcMFsNm_OMUinAosz8fU=.d51c8c95-aad1-4566-a23b-8da5b521aa90@github.com> References: <6j_zozeh-Vwu3tRHRlJ5h_mhcMFsNm_OMUinAosz8fU=.d51c8c95-aad1-4566-a23b-8da5b521aa90@github.com> Message-ID: On Mon, 2 Jun 2025 08:55:02 GMT, Axel Boldt-Christmas wrote: > The way that ZPlatformAddressOffsetBits is implemented on riscv and ppc may result in a return value of 45. This is larger than the max supported value of 44 (because of other internal data structures). This was fixed in [JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275) for aarch64. > > Before [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the issue only manifested if one tried to select a heap larger than 16 TB (not supported), but after [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we try to double the heap address space when running on a NUMA machine. So we may now encounter this bug for heaps larger than 8 TB (which is supported). > > While ZPlatformAddressOffsetBits needs an overhaul (it was written for non-generational ZGC, where we had the three color bits inside the address), the proposal is that we solve this for ppc and riscv by doing the same thing we did for aarch64 in [JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275). Looks reasonable. ------------- Marked as reviewed by eosterlund (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25578#pullrequestreview-2888145463 From eosterlund at openjdk.org Mon Jun 2 12:28:58 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 2 Jun 2025 12:28:58 GMT Subject: Integrated: 8351997: AArch64: Interpreter volatile reference stores with G1 are not sequentially consistent In-Reply-To: References: Message-ID: On Wed, 28 May 2025 08:49:17 GMT, Erik Österlund wrote: > The optimized fast_aputfield bytecode on AArch64 stores the field flags in r3, and performs the leading and trailing fencing depending on its volatile bit being set or not. However, r3 is also the last temp register passed in to the barrier set for reference stores, and G1 clobbers it in a way that may clear the volatile bit. Then the trailing fence won't get executed, and sequential consistency is broken. > > My fix puts the flags in r5 instead, which is the register that was used by normal aputfield bytecodes. This way, barriers don't clobber the volatile bits. > > This bug has been observed to mess up a classic Dekker duality in the java.util.concurrent.Exchanger class, leading to a hang in the test/jdk/java/util/concurrent/Exchanger/ExchangeLoops.java test that exercises it. Using G1 and -Xint a reproducer hangs 30/100 times in mach5. With the fix, the same reproducer hangs 0/100 times. This pull request has now been integrated. 
Changeset: 83b15da2 Author: Erik Österlund URL: https://git.openjdk.org/jdk/commit/83b15da2eb3cb6c8937f517c9b75eaa9eeece314 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod 8351997: AArch64: Interpreter volatile reference stores with G1 are not sequentially consistent Reviewed-by: shade, aph, fbredberg ------------- PR: https://git.openjdk.org/jdk/pull/25483 From rvansa at openjdk.org Mon Jun 2 13:09:31 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 2 Jun 2025 13:09:31 GMT Subject: RFR: 8352075: Perf regression accessing fields [v18] In-Reply-To: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> Message-ID: <5wG8n_0XjBYjFprdBfdLMIj17sBHnJEtPdBdbi-5yxg=.6896113b-ef76-4a5b-973c-3c286554205f@github.com> > This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 trying to reduce the performance regression in some scenarios introduced in https://bugs.openjdk.org/browse/JDK-8292818 . Based both on performance and memory consumption it is a (better) alternative to https://github.com/openjdk/jdk/pull/24713 . > > This PR optimizes local field lookup in classes with more than 16 fields; rather than sequentially iterating through all fields during lookup we sort the fields based on the field name. The stream includes extra table after the field information: for field at position 16, 32 ... we record the (variable-length-encoded) offset of the field info in this stream. On field lookup, rather than iterating through all fields, we iterate through this table, resolve names for given fields and continue field-by-field iteration only after the last record (hence at most 16 fields). > > In classes with <= 16 fields this PR reduces the memory consumption by 1 byte that was left with value 0 at the end of stream. 
In classes with > 16 fields we add extra 4 bytes with offset of the table, and the table contains one varint for each 16 fields. The terminal byte is not used either. > > My measurements on the attached reproducer > > hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC > Time (mean ± σ): 51.3 ms ± 2.8 ms [User: 44.7 ms, System: 13.7 ms] > Range (min … max): 45.1 ms … 53.9 ms 100 runs > > hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC > Time (mean ± σ): 78.2 ms ± 1.0 ms [User: 74.6 ms, System: 17.3 ms] > Range (min … max): 73.8 ms … 79.7 ms 100 runs > > (the jdk25-master above already contains JDK-8353175) > > hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC > Time (mean ± σ): 38.5 ms ± 0.5 ms [User: 34.4 ms, System: 17.3 ms] > Range (min … max): 37.7 ms … 42.1 ms 100 runs > > While https://github.com/openjdk/jdk/pull/24713 returned the performance to previous levels, this PR improves it by 25% compared to JDK 17 (which does not contain the regression)! 
This time, the undisclosed production-grade reproducer shows even higher improvement: > > JDK 17: 1.6 s > JDK 21 (no patches): 22 s > JDK25-master: 12.3 s > JDK25-this-pr: 0.5 s Radim Vansa has updated the pull request incrementally with two additional commits since the last revision: - Rename pivot -> key, payload -> value, add comments - Add gtest ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24847/files - new: https://git.openjdk.org/jdk/pull/24847/files/c592ea59..456e1505 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=16-17 Stats: 193 lines in 4 files changed: 131 ins; 5 del; 57 mod Patch: https://git.openjdk.org/jdk/pull/24847.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24847/head:pull/24847 PR: https://git.openjdk.org/jdk/pull/24847 From shade at openjdk.org Mon Jun 2 13:10:58 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Jun 2025 13:10:58 GMT Subject: Integrated: 8357481: Excessive CompileTask wait/notify monitor creation In-Reply-To: References: Message-ID: On Wed, 21 May 2025 18:40:24 GMT, Aleksey Shipilev wrote: > See bug for rationale. > > This PR implements the 2nd solution from the bug: lift the lock to be global. As described in the bug, excess locking work would realistically affect Xcomp, and only in a minor way. But we will reap a minor footprint/latency benefit by not constructing the lock for every `CompileTask`. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler` > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` This pull request has now been integrated. 
Changeset: b3594c9e Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/b3594c9e5508101a39d10099830f04b0c09ad41f Stats: 26 lines in 5 files changed: 5 ins; 10 del; 11 mod 8357481: Excessive CompileTask wait/notify monitor creation Reviewed-by: vlivanov, kvn ------------- PR: https://git.openjdk.org/jdk/pull/25364 From shade at openjdk.org Mon Jun 2 13:10:57 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Jun 2025 13:10:57 GMT Subject: RFR: 8357481: Excessive CompileTask wait/notify monitor creation [v3] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 19:27:46 GMT, Aleksey Shipilev wrote: >> See bug for rationale. >> >> This PR implements the 2nd solution from the bug: lift the lock to be global. As described in the bug, excess locking work would realistically affect Xcomp, and only in a minor way. But we will reap a minor footprint/latency benefit by not constructing the lock for every `CompileTask`. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler` >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8357481-compile-task-lock > - Merge branch 'master' into JDK-8357481-compile-task-lock > - Fix Thanks for testing! I remerged locally with current master, ran `tier1` and `compiler` tests, and there are no troubles. So I am integrating. 
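A rough sketch of the trade-off described in the PR text above, assuming standard C++ primitives rather than HotSpot's `Monitor`: a single process-wide lock and condition variable replace one monitor per task, so no synchronization object is constructed per task, at the cost of waking every waiter on each completion. The names `g_task_lock`, `g_task_cv`, `Task`, `mark_done` and `wait_for` are hypothetical and are not taken from the changeset.

```cpp
#include <condition_variable>
#include <mutex>

// One global monitor shared by all tasks (hypothetical names). The
// alternative would be a mutex/condition_variable pair inside every Task.
std::mutex g_task_lock;
std::condition_variable g_task_cv;

struct Task {
  bool done = false;  // guarded by g_task_lock
};

// Called by the worker once the task has finished.
void mark_done(Task& task) {
  {
    std::lock_guard<std::mutex> guard(g_task_lock);
    task.done = true;
  }
  // All waiters share one condition variable, so every waiter wakes up and
  // re-checks its own task; this is the extra work a global lock implies.
  g_task_cv.notify_all();
}

// Called by a thread that needs the result; blocks until the task is done.
void wait_for(Task& task) {
  std::unique_lock<std::mutex> lock(g_task_lock);
  g_task_cv.wait(lock, [&task] { return task.done; });
}
```

In this sketch a waiter for task A may be woken when task B completes and then goes back to waiting, which only matters when many threads block on tasks at once.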
------------- PR Comment: https://git.openjdk.org/jdk/pull/25364#issuecomment-2930637724 From rvansa at openjdk.org Mon Jun 2 13:19:50 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 2 Jun 2025 13:19:50 GMT Subject: RFR: 8352075: Perf regression accessing fields [v19] In-Reply-To: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> Message-ID: > This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 trying to reduce the performance regression in some scenarios introduced in https://bugs.openjdk.org/browse/JDK-8292818 . Based both on performance and memory consumption it is a (better) alternative to https://github.com/openjdk/jdk/pull/24713 . > > This PR optimizes local field lookup in classes with more than 16 fields; rather than sequentially iterating through all fields during lookup we sort the fields based on the field name. The stream includes extra table after the field information: for field at position 16, 32 ... we record the (variable-length-encoded) offset of the field info in this stream. On field lookup, rather than iterating through all fields, we iterate through this table, resolve names for given fields and continue field-by-field iteration only after the last record (hence at most 16 fields). > > In classes with <= 16 fields this PR reduces the memory consumption by 1 byte that was left with value 0 at the end of stream. In classes with > 16 fields we add extra 4 bytes with offset of the table, and the table contains one varint for each 16 fields. The terminal byte is not used either. 
> > My measurements on the attached reproducer > > hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC > Time (mean ± σ): 51.3 ms ± 2.8 ms [User: 44.7 ms, System: 13.7 ms] > Range (min … max): 45.1 ms … 
53.9 ms 100 runs > > hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC > Time (mean ± σ): 78.2 ms ± 1.0 ms [User: 74.6 ms, System: 17.3 ms] > Range (min … max): 73.8 ms … 79.7 ms 100 runs > > (the jdk25-master above already contains JDK-8353175) > > hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC > Time (mean ± σ): 38.5 ms ± 0.5 ms [User: 34.4 ms, System: 17.3 ms] > Range (min … max): 37.7 ms … 42.1 ms 100 runs > > While https://github.com/openjdk/jdk/pull/24713 returned the performance to previous levels, this PR improves it by 25% compared to JDK 17 (which does not contain the regression)! This time, the undisclosed production-grade reproducer shows even higher improvement: > > JDK 17: 1.6 s > JDK 21 (no patches): 22 s > JDK25-master: 12.3 s > JDK25-this-pr: 0.5 s Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Add gtests for number of bytes used ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24847/files - new: https://git.openjdk.org/jdk/pull/24847/files/456e1505..e214a8ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=17-18 Stats: 36 lines in 1 file changed: 35 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24847.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24847/head:pull/24847 PR: https://git.openjdk.org/jdk/pull/24847 From rvansa at openjdk.org Mon Jun 2 13:31:57 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 2 Jun 2025 13:31:57 GMT Subject: RFR: 8352075: Perf regression accessing fields [v19] In-Reply-To: References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> Message-ID: On Mon, 2 Jun 2025 13:19:50 GMT, Radim Vansa wrote: >> This optimization is a
followup to https://github.com/openjdk/jdk/pull/24290 trying to reduce the performance regression in some scenarios introduced in https://bugs.openjdk.org/browse/JDK-8292818 . Based both on performance and memory consumption it is a (better) alternative to https://github.com/openjdk/jdk/pull/24713 . >> >> This PR optimizes local field lookup in classes with more than 16 fields; rather than sequentially iterating through all fields during lookup we sort the fields based on the field name. The stream includes extra table after the field information: for field at position 16, 32 ... we record the (variable-length-encoded) offset of the field info in this stream. On field lookup, rather than iterating through all fields, we iterate through this table, resolve names for given fields and continue field-by-field iteration only after the last record (hence at most 16 fields). >> >> In classes with <= 16 fields this PR reduces the memory consumption by 1 byte that was left with value 0 at the end of stream. In classes with > 16 fields we add extra 4 bytes with offset of the table, and the table contains one varint for each 16 fields. The terminal byte is not used either. >> >> My measurements on the attached reproducer >> >> hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC' >> Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC >> Time (mean ± σ): 51.3 ms ± 2.8 ms [User: 44.7 ms, System: 13.7 ms] >> Range (min … max): 45.1 ms … 53.9 ms 100 runs >> >> hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC' >> Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC >> Time (mean ± σ): 78.2 ms ± 1.0 ms [User: 74.6 ms, System: 17.3 ms] >> Range (min … max): 73.8 ms … 79.7 ms 100 runs >> >> (the jdk25-master above already contains JDK-8353175) >> >> hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC' >> Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC >> Time (mean ± σ): 38.5 ms ± 
0.5 ms [User: 34.4 ms, System: 17.3 ms] >> Range (min … max): 37.7 ms … 42.1 ms 100 runs >> >> While https://github.com/openjdk/jdk/pull/24713 returned the performance to previous levels, this PR improves it by 25% compared to JDK 17 (which does not contain the regression)! This time, the undisclosed production-grade reproducer shows even higher improvement: >> >> JDK 17: 1.6 s >> JDK 21 (no patches): 22 s >> JDK25-master: 12.3 s >> JDK25-this-pr: 0.5 s > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Add gtests for number of bytes used Fixed the CI failure, and added a gtest for all allowed bit widths and sizes of table from 0 to 99 and 10000. For better testability and reusability (how do I allocate the Array without a classloader?) I've replaced this with pointer + length argument. @rose00 While your suggestion makes sense, when there's a working implementation I would leave it this way for now and leave reading with a different offset up for future improvement: we can have a microbenchmark that would justify this. I would guess that CPU caches would hide multiple memory accesses, and the loop would be unrolled (maybe to even form 4-byte access instead of 4 1-byte...). Also when not using `Array` we can no longer rely on having the 4-byte header. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24847#issuecomment-2930732550 From jbechberger at openjdk.org Mon Jun 2 13:50:49 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 13:50:49 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... 
different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... different chunk-sizes Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Fix bug related to async stack walking ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/e482ad37..09ca4fed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=27-28 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From mdoerr at openjdk.org Mon Jun 2 13:56:31 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 2 Jun 2025 13:56:31 GMT Subject: RFR: 8358013: [PPC64] VSX has poor performance on Power8 [v4] In-Reply-To: <6lRLaDtZkFd5zdOobo1RnSODoZk3r7T-sgjfpcnUVwU=.ad525055-0f15-4866-a295-20e2183eaf7b@github.com> References: <6lRLaDtZkFd5zdOobo1RnSODoZk3r7T-sgjfpcnUVwU=.ad525055-0f15-4866-a295-20e2183eaf7b@github.com> Message-ID: > Power8 only has limited VSX instructions for the superword optimization and the Vector API and the performance is bad. Let's only use it on Power9 and newer by default. This change excludes the VSX registers from C2 register allocation for Power8. VSX instruction usage gets limited to a few places like intrinsics. > > Note: Power8 is an old processor and performance optimizations for it are no longer planned. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Disable some IR rules for Power8. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/25514/files - new: https://git.openjdk.org/jdk/pull/25514/files/599a4f36..2014cb21 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25514&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25514&range=02-03 Stats: 4 lines in 2 files changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25514.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25514/head:pull/25514 PR: https://git.openjdk.org/jdk/pull/25514 From mdoerr at openjdk.org Mon Jun 2 14:14:29 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 2 Jun 2025 14:14:29 GMT Subject: RFR: 8358013: [PPC64] VSX has poor performance on Power8 [v5] In-Reply-To: <6lRLaDtZkFd5zdOobo1RnSODoZk3r7T-sgjfpcnUVwU=.ad525055-0f15-4866-a295-20e2183eaf7b@github.com> References: <6lRLaDtZkFd5zdOobo1RnSODoZk3r7T-sgjfpcnUVwU=.ad525055-0f15-4866-a295-20e2183eaf7b@github.com> Message-ID: > Power8 only has limited VSX instructions for the superword optimization and the Vector API and the performance is bad. Let's only use it on Power9 and newer by default. This change excludes the VSX registers from C2 register allocation for Power8. VSX instruction usage gets limited to a few places like intrinsics. > > Note: Power8 is an old processor and performance optimizations for it are no longer planned. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Beautify @requires statement. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/25514/files - new: https://git.openjdk.org/jdk/pull/25514/files/2014cb21..77da0573 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25514&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25514&range=03-04 Stats: 4 lines in 1 file changed: 1 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25514.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25514/head:pull/25514 PR: https://git.openjdk.org/jdk/pull/25514 From mgronlun at openjdk.org Mon Jun 2 14:52:03 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 2 Jun 2025 14:52:03 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 13:50:49 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Fix bug related to async stack walking src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 349: > 347: const frame top_frame = thread->last_frame(); > 348: bool in_continuation = is_in_continuation(top_frame, thread); > 349: for (u4 i = 0; i < queue.size(); i++) { Realized this drainage is entirely wrong! You are not using the sample requests in the queue to build individual stack traces for events; instead, you are using the same top frame (the last Java frame) for all of them. 
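To make the objection above concrete, here is a small sketch under simplifying assumptions: a queued sample request is reduced to a captured `pc`/`sp` pair and a "stack trace" to the pc it starts from. `SampleRequest`, `StackTrace` and both drain functions are hypothetical illustrations, not the types or functions in jfrCPUTimeThreadSampler.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; not the HotSpot types.
struct SampleRequest {
  std::uintptr_t pc;  // program counter captured when the signal arrived
  std::uintptr_t sp;  // stack pointer captured when the signal arrived
};

struct StackTrace {
  std::uintptr_t top_pc;  // pc of the frame the stack walk starts from
};

// The pattern flagged in the review: the top frame is taken once from the
// thread's current state and reused for every queued request.
inline std::vector<StackTrace> drain_with_shared_top_frame(
    const std::vector<SampleRequest>& queue, std::uintptr_t current_top_pc) {
  std::vector<StackTrace> traces;
  for (std::size_t i = 0; i < queue.size(); i++) {
    traces.push_back(StackTrace{current_top_pc});
  }
  return traces;
}

// What the review asks for: each trace starts from the state recorded in
// its own request.
inline std::vector<StackTrace> drain_per_request(
    const std::vector<SampleRequest>& queue) {
  std::vector<StackTrace> traces;
  for (const SampleRequest& request : queue) {
    traces.push_back(StackTrace{request.pc});
  }
  return traces;
}
```

With two queued requests captured at different pcs, the first function yields two identical traces while the second yields one distinct trace per request.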
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121391177 From jbechberger at openjdk.org Mon Jun 2 15:04:13 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 15:04:13 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 14:57:22 GMT, Johannes Bechberger wrote: >> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 349: >> >>> 347: const frame top_frame = thread->last_frame(); >>> 348: bool in_continuation = is_in_continuation(top_frame, thread); >>> 349: for (u4 i = 0; i < queue.size(); i++) { >> >> Realized this drainage is entirely wrong! >> >> You are not using the sample requests in the queue to build individual stack traces for events; instead, you are using the same top frame (the last Java frame) for all of them. > > Can I export compute_top_frame and use it here? Or just create a `Jfr::drain_cpu_time_queue` method? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121426469 From jbechberger at openjdk.org Mon Jun 2 15:04:13 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Mon, 2 Jun 2025 15:04:13 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 14:57:47 GMT, Markus Grönlund wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bug related to async stack walking > > src/hotspot/share/jfr/jfr.inline.hpp line 41: > >> 39: inline void Jfr::check_and_process_sample_request(JavaThread* jt) { >> 40: JfrThreadLocal* tl = jt->jfr_thread_local(); >> 41: bool has_cpu_time_sample_request = tl->has_cpu_time_jfr_requests(); > > Why this change? 
So I don't read the ` tl->has_cpu_time_jfr_requests()` twice on the hot-path > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 349: > >> 347: const frame top_frame = thread->last_frame(); >> 348: bool in_continuation = is_in_continuation(top_frame, thread); >> 349: for (u4 i = 0; i < queue.size(); i++) { > > Realized this drainage is entirely wrong! > > You are not using the sample requests in the queue to build individual stack traces for events; instead, you are using the same top frame (the last Java frame) for all of them. Can I export compute_top_frame and use it here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121424752 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121413413 From mgronlun at openjdk.org Mon Jun 2 15:04:12 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 2 Jun 2025 15:04:12 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 13:50:49 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... 
different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Fix bug related to async stack walking src/hotspot/share/jfr/jfr.inline.hpp line 41: > 39: inline void Jfr::check_and_process_sample_request(JavaThread* jt) { > 40: JfrThreadLocal* tl = jt->jfr_thread_local(); > 41: bool has_cpu_time_sample_request = tl->has_cpu_time_jfr_requests(); Why this change? src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 191: > 189: void sample_thread(JfrSampleRequest& request, void* ucontext, JavaThread* jt, JfrThreadLocal* tl); > 190: > 191: // process the queues for all threads that are in native state (and requested to be sampled) "requested to be processed" I guess. src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 270: > 268: void JfrCPUTimeThreadSampler::enroll() { > 269: if (Atomic::cmpxchg(&_disenrolled, true, false)) { > 270: log_info(jfr)("Enrolling CPU thread sampler"); log_trace, please. src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 279: > 277: void JfrCPUTimeThreadSampler::disenroll() { > 278: if (!Atomic::cmpxchg(&_disenrolled, false, true)) { > 279: log_info(jfr)("Disenrolling CPU thread sampler"); log_trace, please. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121414317 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121416556 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121426574 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121428073 From mgronlun at openjdk.org Mon Jun 2 15:12:04 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 2 Jun 2025 15:12:04 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: <_CAWRT6nKdljf9SDRnD-SfdXP9L9S6Y9f6I1nGB-4q8=.eb524157-7c00-4f01-8d8a-9e9c60ef4dc7@github.com> On Mon, 2 Jun 2025 15:01:39 GMT, Johannes Bechberger wrote: >> Can I export compute_top_frame and use it here? > > Or just create a `Jfr::drain_cpu_time_queue` method? Try to move the entire: void JfrCPUTimeThreadSampler::stackwalk_thread_in_native(JavaThread* thread) { } Into JfrThreadSampling.hpp / jfrThreadSampling.cpp - you can send your JfrCPUTimeThreadSampler events from there. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121456081 From matsaave at openjdk.org Mon Jun 2 15:14:32 2025 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 2 Jun 2025 15:14:32 GMT Subject: RFR: 8357576: FieldInfo::_index is not initialized by the constructor [v2] In-Reply-To: <_9Nvx68w_0Ly5NgPGzGci6Uf9Si0AM1N3eQ_e-5hBR8=.1f055ae3-8cd7-4ae2-ae17-3722dc4b7427@github.com> References: <_9Nvx68w_0Ly5NgPGzGci6Uf9Si0AM1N3eQ_e-5hBR8=.1f055ae3-8cd7-4ae2-ae17-3722dc4b7427@github.com> Message-ID: > FieldInfo::_index is not initialized in either of the FieldInfo constructors so this patch adds initialization to both constructors. 
Verified with tier 1-5 tests Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Updated copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25554/files - new: https://git.openjdk.org/jdk/pull/25554/files/e059e29a..c40a1222 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25554&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25554&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25554.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25554/head:pull/25554 PR: https://git.openjdk.org/jdk/pull/25554 From mgronlun at openjdk.org Mon Jun 2 15:18:11 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 2 Jun 2025 15:18:11 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 15:01:15 GMT, Johannes Bechberger wrote: >> src/hotspot/share/jfr/jfr.inline.hpp line 41: >> >>> 39: inline void Jfr::check_and_process_sample_request(JavaThread* jt) { >>> 40: JfrThreadLocal* tl = jt->jfr_thread_local(); >>> 41: bool has_cpu_time_sample_request = tl->has_cpu_time_jfr_requests(); >> >> Why this change? > > So I don't read the ` tl->has_cpu_time_jfr_requests()` twice on the hot-path Ok, for now. We should try to come up with a better split. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121466027 From mgronlun at openjdk.org Mon Jun 2 15:18:12 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 2 Jun 2025 15:18:12 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: <_CAWRT6nKdljf9SDRnD-SfdXP9L9S6Y9f6I1nGB-4q8=.eb524157-7c00-4f01-8d8a-9e9c60ef4dc7@github.com> References: <_CAWRT6nKdljf9SDRnD-SfdXP9L9S6Y9f6I1nGB-4q8=.eb524157-7c00-4f01-8d8a-9e9c60ef4dc7@github.com> Message-ID: On Mon, 2 Jun 2025 15:09:30 GMT, Markus Grönlund wrote: >> Or just create a `Jfr::drain_cpu_time_queue` method? > > Try to move the entire: > > void JfrCPUTimeThreadSampler::stackwalk_thread_in_native(JavaThread* thread) { > } > > Into JfrThreadSampling.hpp / jfrThreadSampling.cpp - you can send your JfrCPUTimeThreadSampler events from there. Of course, rename the routine to something appropriate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121469433 From coleenp at openjdk.org Mon Jun 2 15:19:56 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 2 Jun 2025 15:19:56 GMT Subject: RFR: 8357576: FieldInfo::_index is not initialized by the constructor [v2] In-Reply-To: References: <_9Nvx68w_0Ly5NgPGzGci6Uf9Si0AM1N3eQ_e-5hBR8=.1f055ae3-8cd7-4ae2-ae17-3722dc4b7427@github.com> Message-ID: On Mon, 2 Jun 2025 15:14:32 GMT, Matias Saavedra Silva wrote: >> FieldInfo::_index is not initialized in either of the FieldInfo constructors so this patch adds initialization to both constructors. Verified with tier 1-5 tests > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Updated copyright Looks good! ------------- Marked as reviewed by coleenp (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25554#pullrequestreview-2888908353 From mgronlun at openjdk.org Mon Jun 2 15:22:03 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 2 Jun 2025 15:22:03 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 13:50:49 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Fix bug related to async stack walking src/hotspot/share/jfr/periodic/sampling/jfrThreadSampling.cpp line 250: > 248: break; > 249: } else { > 250: biased = false; Not correct. There is a top_frame = *current - >biased = true below. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121482514 From rvansa at openjdk.org Mon Jun 2 15:31:46 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Mon, 2 Jun 2025 15:31:46 GMT Subject: RFR: 8352075: Perf regression accessing fields [v20] In-Reply-To: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> Message-ID: > This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 trying to reduce the performance regression in some scenarios introduced in https://bugs.openjdk.org/browse/JDK-8292818 . 
Based both on performance and memory consumption it is a (better) alternative to https://github.com/openjdk/jdk/pull/24713 . > > This PR optimizes local field lookup in classes with more than 16 fields; rather than sequentially iterating through all fields during lookup we sort the fields based on the field name. The stream includes extra table after the field information: for field at position 16, 32 ... we record the (variable-length-encoded) offset of the field info in this stream. On field lookup, rather than iterating through all fields, we iterate through this table, resolve names for given fields and continue field-by-field iteration only after the last record (hence at most 16 fields). > > In classes with <= 16 fields this PR reduces the memory consumption by 1 byte that was left with value 0 at the end of stream. In classes with > 16 fields we add extra 4 bytes with offset of the table, and the table contains one varint for each 16 fields. The terminal byte is not used either. > > My measurements on the attached reproducer > > hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC > Time (mean ± σ): 51.3 ms ± 2.8 ms [User: 44.7 ms, System: 13.7 ms] > Range (min … max): 45.1 ms … 53.9 ms 100 runs > > hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC > Time (mean ± σ): 78.2 ms ± 1.0 ms [User: 74.6 ms, System: 17.3 ms] > Range (min … max): 73.8 ms … 79.7 ms 100 runs > > (the jdk25-master above already contains JDK-8353175) > > hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC > Time (mean ± σ): 38.5 ms ± 0.5 ms [User: 34.4 ms, System: 17.3 ms] > Range (min … max): 37.7 ms …
42.1 ms 100 runs > > While https://github.com/openjdk/jdk/pull/24713 returned the performance to previous levels, this PR improves it by 25% compared to JDK 17 (which does not contain the regression)! This time, the undisclosed production-grade reproducer shows even higher improvement: > > JDK 17: 1.6 s > JDK 21 (no patches): 22 s > JDK25-master: 12.3 s > JDK25-this-pr: 0.5 s Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: Fix error on windows ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24847/files - new: https://git.openjdk.org/jdk/pull/24847/files/e214a8ec..7d8b4a19 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=18-19 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24847.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24847/head:pull/24847 PR: https://git.openjdk.org/jdk/pull/24847 From matsaave at openjdk.org Mon Jun 2 15:31:58 2025 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 2 Jun 2025 15:31:58 GMT Subject: RFR: 8357576: FieldInfo::_index is not initialized by the constructor [v2] In-Reply-To: References: <_9Nvx68w_0Ly5NgPGzGci6Uf9Si0AM1N3eQ_e-5hBR8=.1f055ae3-8cd7-4ae2-ae17-3722dc4b7427@github.com> Message-ID: On Sat, 31 May 2025 03:19:19 GMT, SendaoYan wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated copyright > > Should we update the copyright year to 2025 Thank you @sendaoYan @coleenp and @dholmes-ora for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25554#issuecomment-2931262041 From matsaave at openjdk.org Mon Jun 2 15:31:59 2025 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 2 Jun 2025 15:31:59 GMT Subject: Integrated: 8357576: FieldInfo::_index is not initialized by the constructor In-Reply-To: <_9Nvx68w_0Ly5NgPGzGci6Uf9Si0AM1N3eQ_e-5hBR8=.1f055ae3-8cd7-4ae2-ae17-3722dc4b7427@github.com> References: <_9Nvx68w_0Ly5NgPGzGci6Uf9Si0AM1N3eQ_e-5hBR8=.1f055ae3-8cd7-4ae2-ae17-3722dc4b7427@github.com> Message-ID: On Fri, 30 May 2025 19:07:24 GMT, Matias Saavedra Silva wrote: > FieldInfo::_index is not initialized in either of the FieldInfo constructors so this patch adds initialization to both constructors. Verified with tier 1-5 tests This pull request has now been integrated. Changeset: 1b6ae205 Author: Matias Saavedra Silva URL: https://git.openjdk.org/jdk/commit/1b6ae2059b0475ec78559d2d6612f3b6ec68309f Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod 8357576: FieldInfo::_index is not initialized by the constructor Reviewed-by: coleenp, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/25554 From dnsimon at openjdk.org Mon Jun 2 15:53:02 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 2 Jun 2025 15:53:02 GMT Subject: RFR: 8358254: [AOT] runtime/cds/appcds/applications/JavacBench.java#aot crashes with SEGV in ClassLoaderData::holder Message-ID: JVMCI needs to be aware of unloaded classes in type profiles just like [CI does](https://github.com/openjdk/jdk/pull/24886/files#diff-cda53c3ed39c4e59f73f3298933ebed1912daeaf854f0b31f40332be109f6c30R317). 
------------- Commit messages: - support unloaded classes in type profiles in AOT mode - convert RawItemProfile to a record Changes: https://git.openjdk.org/jdk/pull/25592/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25592&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358254 Stats: 33 lines in 4 files changed: 13 ins; 16 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25592.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25592/head:pull/25592 PR: https://git.openjdk.org/jdk/pull/25592 From duke at openjdk.org Mon Jun 2 16:36:56 2025 From: duke at openjdk.org (Mohamed Issa) Date: Mon, 2 Jun 2025 16:36:56 GMT Subject: RFR: 8358231: Template interpreter generator crashes with ShouldNotReachHere on some platforms after 8353686 [v2] In-Reply-To: References: <8Xnq0jvMBRkxOk4-gheVgeDGuIPhXXlZ8Yt-NO3izhQ=.2ff06a32-52b1-4829-9c19-0106ef733399@github.com> Message-ID: On Sun, 1 Jun 2025 17:11:05 GMT, Martin Doerr wrote: >> Trivial build fix for PPC64 and s390. Added arm32. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add arm32 fix. @eme64 @dholmes-ora This resolves the crash discussed in #24470. @TheRealMDoerr Thank you for this. I was about to create a PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25568#issuecomment-2931518930 From duke at openjdk.org Mon Jun 2 17:04:07 2025 From: duke at openjdk.org (Mohamed Issa) Date: Mon, 2 Jun 2025 17:04:07 GMT Subject: RFR: 8353686: Optimize Math.cbrt for x86 64 bit platforms [v6] In-Reply-To: References: <1NsI0OGP9RcnbEwlJwDj1dZ3w7zCP4DxJhEmO1quSgo=.3b1e3da9-9aa9-4221-a73a-e2f3ec5f456b@github.com> Message-ID: On Mon, 2 Jun 2025 04:42:03 GMT, David Holmes wrote: > > When you say "most of the non-x86 platforms", are you referring to the ones with processor types listed below? > > Yes - 3 of the 5 non-x86 platforms. 
> > > It looks like aarch64 and riscv don't take that route and would fall back to the default cbrt implementation. > > I was wondering why Aarch64 didn't fail. I guess the other platforms may use this to detect new intrinsics being added. The arm, ppc, and s390 breaks are resolved by #25568. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24470#issuecomment-2931613343 From mgronlun at openjdk.org Mon Jun 2 17:29:04 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 2 Jun 2025 17:29:04 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 13:50:49 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Fix bug related to async stack walking src/hotspot/share/memory/resourceArea.hpp line 46: > 44: // A ResourceArea is an Arena that supports safe usage of ResourceMark. > 45: class ResourceArea: public Arena { > 46: Changes in this file are unrelated, so revert this entire file. src/hotspot/share/prims/forte.cpp line 575: > 573: extern "C" { > 574: JNIEXPORT > 575: void AsyncGetCallTrace(ASGCT_CallTrace *trace, jint depth, void* ucontext) { Unrelated changes, please revert file. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121757461 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121757998 From dnsimon at openjdk.org Mon Jun 2 18:21:53 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 2 Jun 2025 18:21:53 GMT Subject: RFR: 8358254: [AOT] runtime/cds/appcds/applications/JavacBench.java#aot crashes with SEGV in ClassLoaderData::holder In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 15:47:29 GMT, Doug Simon wrote: > JVMCI needs to be aware of unloaded classes in type profiles just like [CI does](https://github.com/openjdk/jdk/pull/24886/files#diff-cda53c3ed39c4e59f73f3298933ebed1912daeaf854f0b31f40332be109f6c30R317). src/hotspot/share/oops/trainingData.hpp line 286: > 284: static bool assembling_data() { return have_data() && CDSConfig::is_dumping_final_static_archive() && CDSConfig::is_dumping_aot_linked_classes(); } > 285: > 286: static bool is_klass_loaded(Klass* k) { This code was moved unmodified from ciMethodData.cpp. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25592#discussion_r2121856022 From shade at openjdk.org Mon Jun 2 18:46:02 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Jun 2025 18:46:02 GMT Subject: RFR: 8358339: Handle MethodCounters::_method backlinks after JDK-8355003 Message-ID: <0FmJVTYdAq7vmsOi4pi9NRKHm5MfmIrotPucldzsZj4=.b1335bca-f439-4c18-aa1c-6c69548d095d@github.com> Found this when reading mainline-vs-premain webrev. [JDK-8355003](https://bugs.openjdk.org/browse/JDK-8355003) introduced a backlink to `Method*` in `MethodCounters`. I believe we need to handle that backlink at least in `CodeBuffer::finalize_oop_references()`. premain does this, while mainline does not. Also, amusingly, we have `MethodCounters::is_methodCounters`, but not the super-class `Metadata::is_methodCounters`. I pulled in the hunks that use `is_methodCounters()` and `MethodCounters::method()` from premain into this PR. 
Additional testing: - [x] Linux x86_64 server fastdebug, `runtime/cds` - [ ] Linux x86_64 server fastdebug, `tier1` - [ ] Linux x86_64 server fastdebug, `all` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/25599/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25599&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358339 Stats: 10 lines in 3 files changed: 10 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25599.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25599/head:pull/25599 PR: https://git.openjdk.org/jdk/pull/25599 From shade at openjdk.org Mon Jun 2 18:47:28 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Jun 2025 18:47:28 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 13:50:49 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Fix bug related to async stack walking Scanned this briefly, would do another pass tomorrow. src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 117: > 115: > 116: bool JfrCPUTimeTraceQueue::is_empty() const { > 117: return Atomic::load(&_head) == 0; Not entirely clear what is the memory semantics for accessing `_head`. Does it need to be acq/rel? If so, this one should be `::load_acquire`? 
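For readers following the acq/rel question above, here is a minimal sketch of what an acquire load on `_head` would pair with. It is not the JFR queue: `TraceQueue` is a hypothetical single-producer structure, and `std::atomic` stands in for HotSpot's `Atomic` API. Whether the real `_head` needs this ordering depends on which threads write and read it, which is exactly the open question.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical fixed-capacity, single-producer queue for illustration only.
class TraceQueue {
 public:
  static constexpr std::uint32_t kCapacity = 8;

  // Producer: write the element first, then publish it by storing the new
  // head with release semantics, so the element write cannot be reordered
  // after the head update.
  bool enqueue(int value) {
    const std::uint32_t head = _head.load(std::memory_order_relaxed);
    if (head == kCapacity) {
      return false;  // full
    }
    _data[head] = value;
    _head.store(head + 1, std::memory_order_release);
    return true;
  }

  // Consumer: an acquire load pairs with the release store above, so a
  // reader that observes head == n also observes elements [0, n).
  bool is_empty() const {
    return _head.load(std::memory_order_acquire) == 0;
  }

  std::uint32_t size() const {
    return _head.load(std::memory_order_acquire);
  }

  // Only valid for i < size() as observed by the caller.
  int at(std::uint32_t i) const {
    return _data[i];
  }

 private:
  int _data[kCapacity] = {};
  std::atomic<std::uint32_t> _head{0};
};
```

If the reader only tests emptiness and never dereferences elements based on the result, a relaxed load may be enough; the acquire matters when the observed head is used to decide which slots are safe to read.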
src/hotspot/share/memory/resourceArea.hpp line 46: > 44: // A ResourceArea is an Arena that supports safe usage of ResourceMark. > 45: class ResourceArea: public Arena { > 46: All the changes in this file are unnecessary, please revert. src/jdk.jfr/share/classes/jdk/jfr/internal/JVM.java line 281: > 279: * @param autoadapt true if the rate should be adapted automatically > 280: */ > 281: public static native void setCPUThrottle(double rate, boolean autoadapt); Suggestion: public static native void setCPUThrottle(double rate, boolean autoAdapt); test/jdk/jdk/jfr/event/profiling/TestSamplingLongPeriod.java line 42: > 40: public class TestSamplingLongPeriod { > 41: > 42: static String sampleEvent = EventNames.ExecutionSample; Does not look necessary to change? ------------- PR Review: https://git.openjdk.org/jdk/pull/25302#pullrequestreview-2888004951 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121900364 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121610476 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121587105 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2121584954 From shade at openjdk.org Mon Jun 2 18:47:28 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Jun 2025 18:47:28 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v24] In-Reply-To: References: <-QiSWEqppeW60aedVbLA3WTmnba7Fry53Qr86wE2EPs=.7a6327ce-7ef0-4b1c-bc68-0421ba3fd46f@github.com> Message-ID: On Sun, 1 Jun 2025 07:19:54 GMT, Johannes Bechberger wrote: >> Thanks for catching this mistake. I'll fix it this afternoon. > > I fixed it by changing the JEP. Hold on, shouldn't this really be "Lost"? @egahlin and @mgronlun need to chime in here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120893338 From shade at openjdk.org Mon Jun 2 18:47:30 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Jun 2025 18:47:30 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v27] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 11:32:27 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Tiny fixes > - Minor changes src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 30: > 28: #include "runtime/orderAccess.hpp" > 29: #include "utilities/ticks.hpp" > 30: #include "jfr/periodic/sampling/jfrCPUTimeThreadSampler.hpp" Include order? src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 60: > 58: assert(raw_thread->is_Java_thread(), "invariant"); > 59: JavaThread* jt; > 60: if ((jt = JavaThread::cast(raw_thread))->is_exiting()) { I see no point to be extra-smart with inline assignments here: Suggestion: JavaThread* jt = JavaThread::cast(raw_thread); if (jt->is_exiting()) { src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 115: > 113: JfrCPUTimeSampleRequest* new_data = JfrCHeapObj::new_array(capacity); > 114: JfrCHeapObj::free(_data, _capacity * sizeof(JfrCPUTimeSampleRequest)); > 115: _data = new_data; A bit of peak memory consumption improvement: don't have two things live at once. 
Plus, give the native allocator a chance to reuse the same location. Suggestion: JfrCHeapObj::free(_data, _capacity * sizeof(JfrCPUTimeSampleRequest)); _data = JfrCHeapObj::new_array(capacity); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120895107 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120897472 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2120909443 From shade at openjdk.org Mon Jun 2 18:51:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Jun 2025 18:51:51 GMT Subject: RFR: 8358339: Handle MethodCounters::_method backlinks after JDK-8355003 In-Reply-To: <0FmJVTYdAq7vmsOi4pi9NRKHm5MfmIrotPucldzsZj4=.b1335bca-f439-4c18-aa1c-6c69548d095d@github.com> References: <0FmJVTYdAq7vmsOi4pi9NRKHm5MfmIrotPucldzsZj4=.b1335bca-f439-4c18-aa1c-6c69548d095d@github.com> Message-ID: On Mon, 2 Jun 2025 18:41:42 GMT, Aleksey Shipilev wrote: > Found this when reading mainline-vs-premain webrev. [JDK-8355003](https://bugs.openjdk.org/browse/JDK-8355003) introduced a backlink to `Method*` in `MethodCounters`. I believe we need to handle that backlink at least in `CodeBuffer::finalize_oop_references()`. premain does this, while mainline does not. Also, amusingly, we have `MethodCounters::is_methodCounters`, but not the super-class `Metadata::is_methodCounters`. > > I pulled in the hunks that use `is_methodCounters()` and `MethodCounters::method()` from premain into this PR. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `runtime/cds` > - [ ] Linux x86_64 server fastdebug, `tier1` > - [ ] Linux x86_64 server fastdebug, `all` Actually, I am not sure if it is even a bug, because mainline is using `MethodCounters::method()` any reasonably only in `MethodCounters::metaspace_pointers_do()`. But I guess it would be good to make sure we handle this backlink consistently. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25599#issuecomment-2932037261 From cjplummer at openjdk.org Mon Jun 2 19:07:54 2025 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 2 Jun 2025 19:07:54 GMT Subject: RFR: 8357826: Avoid running some jtreg tests when asan is configured [v2] In-Reply-To: References: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> Message-ID: <9JQNK3tYLfg04pRpUiGpPYWoSunSfqWB61lkLxSPxwk=.a781defd-ea0e-4ebf-aa7f-01fff2e63101@github.com> On Mon, 2 Jun 2025 08:07:38 GMT, Matthias Baesken wrote: >> There are a couple of jtreg tests, especially in the HS area, with very special assumptions about memory layout/sizes . >> Those fail when the address sanitizer is configured ( --enable-asan ). >> The change adds a way to tag those tests with 'requires' so that they can be avoided easily when running jtreg tests with ASAN enabled. >> Adjusting the tests for "pleasing" the sanitizer is not always desired (if possible for some tests it can be done later) . >> While at it, also same is also added for ubsan . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > TestBreakSignalThreadDump has issues with asan Can you document why each tests fails so we have it on record? Can be done in the PR or the CR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25575#issuecomment-2932080104 From dcubed at openjdk.org Mon Jun 2 19:41:54 2025 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 2 Jun 2025 19:41:54 GMT Subject: RFR: 8255082: HotSpot Style Guide should permit noexcept [v2] In-Reply-To: References: <-uPcWRhBsfKiRl5wRkLQ7YaAH4OCOlT0_ettXJQnUyY=.aa5c72c3-6767-41dd-8dae-45ff9a9e4884@github.com> Message-ID: On Mon, 2 Jun 2025 08:21:34 GMT, Kim Barrett wrote: >> Please review this change to permit the use of `noexcept` under certain >> circumstances in HotSpot code. 
>> >> http://wg21.link/n3050 >> >> Testing: >> >> JDK-8316930 (HotSpot should use noexcept instead of throw()) showed what the >> conversion would look like. It will need to be brought up to current mainline, >> possibly with modifications. >> >> This is a modification of the Style Guide, so rough consensus among the >> HotSpot Group members is required to make this change. Only Group members >> should vote for approval (via the github PR), though reasoned objections or >> comments from anyone will be considered. A decision on this proposal will not >> be made before Friday 16-June-2025 at 12h00 UTC. >> >> Since we're piggybacking on github PRs here, please use the PR review process >> to approve (click on Review Changes > Approve), rather than sending a "vote: >> yes" email reply that would be normal for a CFV. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > dholmes review Thumbs up. I do have a query about whether the mention of `nothrow` should be `noexcept`. doc/hotspot-style.html line 1153: > 1151: different guarantees for some operations (and may choose different > 1152: algorithms to implement those operations), depending on whether certain > 1153: functions (constructors, copy/move operations, swap) are nothrow or not. `nothrow` here or `noexcept`? doc/hotspot-style.md line 1145: > 1143: guarantees for some operations (and may choose different algorithms to > 1144: implement those operations), depending on whether certain functions > 1145: (constructors, copy/move operations, swap) are nothrow or not. They detect `nothrow` here or `noexcept`? ------------- Marked as reviewed by dcubed (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25574#pullrequestreview-2889700427 PR Review Comment: https://git.openjdk.org/jdk/pull/25574#discussion_r2122001750 PR Review Comment: https://git.openjdk.org/jdk/pull/25574#discussion_r2122004882 From never at openjdk.org Mon Jun 2 20:05:51 2025 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 2 Jun 2025 20:05:51 GMT Subject: RFR: 8358254: [AOT] runtime/cds/appcds/applications/JavacBench.java#aot crashes with SEGV in ClassLoaderData::holder In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 15:47:29 GMT, Doug Simon wrote: > JVMCI needs to be aware of unloaded classes in type profiles just like [CI does](https://github.com/openjdk/jdk/pull/24886/files#diff-cda53c3ed39c4e59f73f3298933ebed1912daeaf854f0b31f40332be109f6c30R317). Marked as reviewed by never (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25592#pullrequestreview-2889770600 From mgronlun at openjdk.org Mon Jun 2 20:07:02 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 2 Jun 2025 20:07:02 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 13:50:49 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... 
different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Fix bug related to async stack walking src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 49: > 47: > 48: static bool is_excluded(JavaThread* thread) { > 49: return thread->is_hidden_from_external_view() || thread->jfr_thread_local()->is_excluded(); I think I misled you saying that JfrRecorder_thread would be excluded by the above expression. That was true - but not anymore. Our exclusion test looks like: static inline bool is_excluded(JavaThread* jt) { assert(jt != nullptr, "invariant"); return jt->is_Compiler_thread() || jt->is_hidden_from_external_view() || jt->is_JfrRecorder_thread() || jt->jfr_thread_local()->is_excluded(); } I like that you could fold jt->is_Compiler_thread() into jt->is_hidden_from_external_view() - good! But can you please list the condition jt->is_JfrRecorder_thread() again? Sorry, I forgot we had removed it from being considered excluded on the JfrThreadLocal level. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2122045043 From dholmes at openjdk.org Mon Jun 2 21:10:51 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 2 Jun 2025 21:10:51 GMT Subject: RFR: 8255082: HotSpot Style Guide should permit noexcept [v2] In-Reply-To: References: <-uPcWRhBsfKiRl5wRkLQ7YaAH4OCOlT0_ettXJQnUyY=.aa5c72c3-6767-41dd-8dae-45ff9a9e4884@github.com> Message-ID: On Mon, 2 Jun 2025 08:21:34 GMT, Kim Barrett wrote: >> Please review this change to permit the use of `noexcept` under certain >> circumstances in HotSpot code. >> >> http://wg21.link/n3050 >> >> Testing: >> >> JDK-8316930 (HotSpot should use noexcept instead of throw()) showed what the >> conversion would look like. It will need to be brought up to current mainline, >> possibly with modifications. 
>> >> This is a modification of the Style Guide, so rough consensus among the >> HotSpot Group members is required to make this change. Only Group members >> should vote for approval (via the github PR), though reasoned objections or >> comments from anyone will be considered. A decision on this proposal will not >> be made before Friday 16-June-2025 at 12h00 UTC. >> >> Since we're piggybacking on github PRs here, please use the PR review process >> to approve (click on Review Changes > Approve), rather than sending a "vote: >> yes" email reply that would be normal for a CFV. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > dholmes review Marked as reviewed by dholmes (Reviewer). doc/hotspot-style.html line 1121: > 1119:
Only the argument-less form of noexcept exception > 1120: specifications are permitted. noexcept exception > 1121: specifications with arguments are forbidden.
I was suggesting dropping the second sentence as it is implied by the first. ------------- PR Review: https://git.openjdk.org/jdk/pull/25574#pullrequestreview-2889941827 PR Review Comment: https://git.openjdk.org/jdk/pull/25574#discussion_r2122157846 From dholmes at openjdk.org Mon Jun 2 21:56:02 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 2 Jun 2025 21:56:02 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 18:37:14 GMT, Aleksey Shipilev wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bug related to async stack walking > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 117: > >> 115: >> 116: bool JfrCPUTimeTraceQueue::is_empty() const { >> 117: return Atomic::load(&_head) == 0; > > Not entirely clear what is the memory semantics for accessing `_head`. Does it need to be acq/rel? If so, this one should be `::load_acquire`? Many of the accesses to head do not appear to synchronize with anything and so do not need acquire semantics. But the overall concurrency properties of this code are very unclear to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2122228261 From coleenp at openjdk.org Tue Jun 3 00:14:57 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 3 Jun 2025 00:14:57 GMT Subject: RFR: 8352075: Perf regression accessing fields [v20] In-Reply-To: References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> Message-ID: On Mon, 2 Jun 2025 15:31:46 GMT, Radim Vansa wrote: >> This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 trying to reduce the performance regression in some scenarios introduced in https://bugs.openjdk.org/browse/JDK-8292818 . 
Based both on performance and memory consumption it is a (better) alternative to https://github.com/openjdk/jdk/pull/24713 . >> >> This PR optimizes local field lookup in classes with more than 16 fields; rather than sequentially iterating through all fields during lookup we sort the fields based on the field name. The stream includes extra table after the field information: for field at position 16, 32 ... we record the (variable-length-encoded) offset of the field info in this stream. On field lookup, rather than iterating through all fields, we iterate through this table, resolve names for given fields and continue field-by-field iteration only after the last record (hence at most 16 fields). >> >> In classes with <= 16 fields this PR reduces the memory consumption by 1 byte that was left with value 0 at the end of stream. In classes with > 16 fields we add extra 4 bytes with offset of the table, and the table contains one varint for each 16 fields. The terminal byte is not used either. >> >> My measurements on the attached reproducer >> >> hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC' >> Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC >> Time (mean ± σ): 51.3 ms ± 2.8 ms [User: 44.7 ms, System: 13.7 ms] >> Range (min … max): 45.1 ms … 53.9 ms 100 runs >> >> hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC' >> Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC >> Time (mean ± σ): 78.2 ms ± 1.0 ms [User: 74.6 ms, System: 17.3 ms] >> Range (min … max): 73.8 ms … 79.7 ms 100 runs >> >> (the jdk25-master above already contains JDK-8353175) >> >> hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC' >> Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC >> Time (mean ± σ): 38.5 ms ± 0.5 ms [User: 34.4 ms, System: 17.3 ms] >> Range (min … max): 37.7 ms … 
42.1 ms 100 runs >> >> While https://github.com/openjdk/jdk/pull/24713 returned the performance to previous levels, this PR improves it by 25% compared to JDK 17 (which does not contain the regression)! This time, the undisclosed production-grade reproducer shows even higher improvement: >> >> JDK 17: 1.6 s >> JDK 21 (no patches): 22 s >> JDK25-master: 12.3 s >> JDK25-this-pr: 0.5 s > > Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: > > Fix error on windows It all seems reasonable until I got to the packing code and it'll take a long time to figure out how it works. Maybe some comments would help. I have 3 general comments though: 1. The coding style guide somewhere says that the * belongs with the type and not the name. This is inconsistent in this code. Can you fix it? 2. Block comments (except copyright) should use // not /* */ 3. The jtreg test directory name should be not the bugid. I think this test can go in directory runtime/FieldLayout. src/hotspot/share/utilities/packedTable.hpp line 38: > 36: uint32_t _key_mask; > 37: unsigned int _value_shift; > 38: uint32_t _value_mask; Aren't all 4 of these types the same? can you make them all uint32_t or all unsigned int? (former preferred). 
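For readers trying to follow the lookup scheme as the quoted PR description states it (fields sorted by name, plus a table entry for every 16th field), here is a minimal sketch in standard C++. It is only an illustration under stated assumptions: `FieldTable`, `build_table` and `find_field` are hypothetical names, plain strings stand in for the variable-length-encoded field stream, field names are assumed unique, and the packed-table code actually under review may work differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Assumed stride, taken from the "position 16, 32 ..." wording in the description.
constexpr std::size_t kStride = 16;

// Hypothetical stand-in for the encoded field stream plus its jump table.
struct FieldTable {
  std::vector<std::string> sorted_names;  // fields sorted by name
  std::vector<std::size_t> jump;          // positions 16, 32, ... into sorted_names
};

// Sorts the names and records every 16th position in the jump table.
inline FieldTable build_table(std::vector<std::string> names) {
  std::sort(names.begin(), names.end());
  FieldTable t;
  t.sorted_names = std::move(names);
  for (std::size_t i = kStride; i < t.sorted_names.size(); i += kStride) {
    t.jump.push_back(i);
  }
  return t;
}

// Returns the position of `name` in sorted order, or -1 if it is absent.
inline long find_field(const FieldTable& t, const std::string& name) {
  // Walk the jump table to find the last recorded position whose name is
  // not greater than the target.
  std::size_t start = 0;
  for (std::size_t pos : t.jump) {
    if (t.sorted_names[pos] <= name) {
      start = pos;
    } else {
      break;
    }
  }
  // Field-by-field scan over at most kStride entries from that position.
  const std::size_t end = std::min(start + kStride, t.sorted_names.size());
  for (std::size_t i = start; i < end; ++i) {
    if (t.sorted_names[i] == name) {
      return static_cast<long>(i);
    }
  }
  return -1;
}
```

With 40 fields the jump table holds two entries (positions 16 and 32), so a lookup compares against at most two table entries and then scans at most 16 fields.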
------------- PR Review: https://git.openjdk.org/jdk/pull/24847#pullrequestreview-2890214085 PR Review Comment: https://git.openjdk.org/jdk/pull/24847#discussion_r2122347635 From coleenp at openjdk.org Tue Jun 3 00:14:58 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 3 Jun 2025 00:14:58 GMT Subject: RFR: 8352075: Perf regression accessing fields [v20] In-Reply-To: References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> Message-ID: On Mon, 2 Jun 2025 23:49:51 GMT, Coleen Phillimore wrote: >> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix error on windows > > src/hotspot/share/utilities/packedTable.hpp line 38: > >> 36: uint32_t _key_mask; >> 37: unsigned int _value_shift; >> 38: uint32_t _value_mask; > > Aren't all 4 of these types the same? can you make them all uint32_t or all unsigned int? (former preferred). Can you explain somewhere how fields are mapped to this? I assume they're sorted, for some reason I expected the packed table to be {name-cp-index, sig-cp-index, offset-in-fieldstream-for-direct-access}. Does every field get 4 ints ? So why is it packed into ```Array``` rather than just use ```Array```? So much packing code that I don't know how anyone could ever debug it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24847#discussion_r2122360613 From dholmes at openjdk.org Tue Jun 3 00:16:03 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 3 Jun 2025 00:16:03 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v26] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 09:24:53 GMT, Johannes Bechberger wrote: >> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 119: >> >>> 117: _data = new_data; >>> 118: _capacity = capacity; >>> 119: } >> >> I assume there is a lock protecting this so it happens atomically? 
> > This happens before the signal handler is attached to thread. So it does happen before any parallelism is introduced on thread creation. I'm missing the big picture here unfortunately. This looks like it can get called repeatedly as needed to change capacity. Are you saying it only gets called once before we create the sampler thread? Is the concurrency model described somewhere? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2122365626 From dholmes at openjdk.org Tue Jun 3 00:28:05 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 3 Jun 2025 00:28:05 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: <_Q0iW6TuzM0P1qeE2XsMZbTx3lfCgW9QDEsf3-FlRYE=.b6707a06-3d91-4764-a8d8-7eaa76680584@github.com> On Mon, 2 Jun 2025 21:53:38 GMT, David Holmes wrote: >> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 117: >> >>> 115: >>> 116: bool JfrCPUTimeTraceQueue::is_empty() const { >>> 117: return Atomic::load(&_head) == 0; >> >> Not entirely clear what is the memory semantics for accessing `_head`. Does it need to be acq/rel? If so, this one should be `::load_acquire`? > > Many of the accesses to head do not appear to synchronize with anything and so do not need acquire semantics. But the overall concurrency properties of this code are very unclear to me. To be clear, you only need acquire semantics here if after seeing the value 0 you need to access fields that were written before `_head` was set to 0. Similarly for most of the other access to `_head`. 
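The distinction drawn above can be shown with a small sketch. It uses standard C++ `std::atomic` as a stand-in for HotSpot's `Atomic` API, and `Slot`, `publish`, `try_consume` and `is_published` are hypothetical names that do not come from the JFR queue code.

```cpp
#include <atomic>

// Hypothetical slot: plain data guarded by an atomic publication flag.
struct Slot {
  int payload = 0;            // ordinary, non-atomic field
  std::atomic<int> ready{0};  // set to 1 once payload is valid
};

// Writer: store the payload first, then publish it with a release store.
inline void publish(Slot& s, int value) {
  s.payload = value;
  s.ready.store(1, std::memory_order_release);
}

// Reader that goes on to read the payload: the acquire load pairs with the
// release store, so the payload write is visible once the flag is seen.
inline bool try_consume(const Slot& s, int& out) {
  if (s.ready.load(std::memory_order_acquire) == 0) {
    return false;
  }
  out = s.payload;
  return true;
}

// Reader that only cares about the flag value itself: nothing written
// before the flag is read afterwards, so a relaxed load is sufficient.
inline bool is_published(const Slot& s) {
  return s.ready.load(std::memory_order_relaxed) != 0;
}
```

Whether a given access to `_head` resembles `try_consume` or `is_published` depends on what the caller reads afterwards, which is the concurrency model the review is asking to have documented.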
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2122374152 From dholmes at openjdk.org Tue Jun 3 00:53:57 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 3 Jun 2025 00:53:57 GMT Subject: RFR: 8357826: Avoid running some jtreg tests when asan is configured [v2] In-Reply-To: References: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> Message-ID: On Mon, 2 Jun 2025 08:07:38 GMT, Matthias Baesken wrote: >> There are a couple of jtreg tests, especially in the HS area, with very special assumptions about memory layout/sizes . >> Those fail when the address sanitizer is configured ( --enable-asan ). >> The change adds a way to tag those tests with 'requires' so that they can be avoided easily when running jtreg tests with ASAN enabled. >> Adjusting the tests for "pleasing" the sanitizer is not always desired (if possible for some tests it can be done later) . >> While at it, also same is also added for ubsan . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > TestBreakSignalThreadDump has issues with asan Changes look fine but I agree with Chris that we need to document why these tests don't work with ASAN, though I think I'd prefer to see an `@comment` before the `@requires !vm.asan` in the actual test files - assuming the reason can be stated clearly and succinctly. ------------- PR Review: https://git.openjdk.org/jdk/pull/25575#pullrequestreview-2890276148 From xgong at openjdk.org Tue Jun 3 01:49:07 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 3 Jun 2025 01:49:07 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API In-Reply-To: References: Message-ID: On Fri, 30 May 2025 08:15:22 GMT, Xiaohong Gong wrote: >>> @XiaohongGong Thanks for splitting this one out, and for investigating the regressions here. 
>>> >>> Putting the permalink here, fixed to the current change (the link you pasted will always refer to the newest, which may later on point to the wrong line when lines above are inserted / deleted): >>> >>> https://github.com/openjdk/jdk/blob/7077535c0b0a6ea0a2a167f9135b1504a3d71fb3/src/hotspot/share/opto/loopnode.cpp#L1659-L1661 >>> >>> I wonder if we should just use `Node::uncast` there? But I'm quite unsure about that. >> >> Sounds good to me. I will have a deep investigation for it. Thanks! >> >> >> >>> > Yes, I also observed such regression. >>> > It would be nice if you proactively mentioned regressions, so it does not have to be pointed out by reviewers. >>> >>> For me, it could be ok to fix it in a follow-up patch. I think we are too close to RDP1 for JDK25 now anyway, and so we could push this patch here into JDK26, and then we have enough time in JDK26 to investigate the regression. Even better would be if we could do the other patch first, so we never even encounter a regression. >> >> Sounds good to me. Thanks! > >> > @XiaohongGong Thanks for splitting this one out, and for investigating the regressions here. >> > Putting the permalink here, fixed to the current change (the link you pasted will always refer to the newest, which may later on point to the wrong line when lines above are inserted / deleted): >> > https://github.com/openjdk/jdk/blob/7077535c0b0a6ea0a2a167f9135b1504a3d71fb3/src/hotspot/share/opto/loopnode.cpp#L1659-L1661 >> > >> > I wonder if we should just use `Node::uncast` there? But I'm quite unsure about that. >> >> Sounds good to me. I will have a deep investigation for it. Thanks! > > Hi @eme64 @jatin-bhateja, I'v created a PR https://github.com/openjdk/jdk/pull/25539 to fix this issue. With this change, the performance regression can be fixed as well. Could you please take a look at that change and help to run the test on different X86 machines? Thanks a lot! > @XiaohongGong I reviewed #25539. 
Since it is a relatively simple patch, I suggest that we integrate that one first, and come back to this here later. Is that ok for you? That's fine to me. Thanks for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2933082670 From xgong at openjdk.org Tue Jun 3 01:49:07 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 3 Jun 2025 01:49:07 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 01:45:57 GMT, Xiaohong Gong wrote: >>> > @XiaohongGong Thanks for splitting this one out, and for investigating the regressions here. >>> > Putting the permalink here, fixed to the current change (the link you pasted will always refer to the newest, which may later on point to the wrong line when lines above are inserted / deleted): >>> > https://github.com/openjdk/jdk/blob/7077535c0b0a6ea0a2a167f9135b1504a3d71fb3/src/hotspot/share/opto/loopnode.cpp#L1659-L1661 >>> > >>> > I wonder if we should just use `Node::uncast` there? But I'm quite unsure about that. >>> >>> Sounds good to me. I will have a deep investigation for it. Thanks! >> >> Hi @eme64 @jatin-bhateja, I'v created a PR https://github.com/openjdk/jdk/pull/25539 to fix this issue. With this change, the performance regression can be fixed as well. Could you please take a look at that change and help to run the test on different X86 machines? Thanks a lot! > >> @XiaohongGong I reviewed #25539. Since it is a relatively simple patch, I suggest that we integrate that one first, and come back to this here later. Is that ok for you? > > That's fine to me. Thanks for your review! > Hi @XiaohongGong , Looks good to me, thanks again for this re-factor !! > > Best Regards, Jatin Thanks so much for your review @jatin-bhateja ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2933083694 From kvn at openjdk.org Tue Jun 3 02:06:07 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Jun 2025 02:06:07 GMT Subject: RFR: 8358289: [asan] runtime/cds/appcds/aotCode/AOTCodeFlags.java reports heap-buffer-overflow in ArchiveBuilder Message-ID: There is difference between AdapterFingerPrint allocation size [compute_size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2227) which may not be aligned to HeapWord size and [size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2421) used for copying during AOT cache build which is aligned and can be bigger than allocation size. I added asserts to `AdapterFingerPrint` and `AdapterHandlerEntry` to make sure sizes are correct. Both are used in AOT cache build. I also moved `FreeHeap()` from `~AdapterFingerPrint()` to enforce the comment and simplify executed code. Thanks to @MBaesken for finding the issue and @iklam for pointing the cause. Testing tier1-3, xcomp, stress. Higher tiers are still running. 
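The size mismatch described above can be illustrated with a few lines of arithmetic. This is a sketch under assumptions, not the `AdapterFingerPrint` layout: the 8-byte word size, the header size and the 4-byte element type are made up for the example, and `raw_size`, `copy_size` and `overread_bytes` are hypothetical helpers.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed word size for the example (a HeapWord on a 64-bit platform).
constexpr std::size_t kWordSize = 8;

// Rounds n up to the next multiple of alignment (a power of two).
constexpr std::size_t align_up(std::size_t n, std::size_t alignment) {
  return (n + alignment - 1) & ~(alignment - 1);
}

// Bytes actually allocated: a header followed by `len` 4-byte entries.
constexpr std::size_t raw_size(std::size_t header_bytes, std::size_t len) {
  return header_bytes + len * sizeof(std::uint32_t);
}

// Bytes a word-aligned copy would touch.
constexpr std::size_t copy_size(std::size_t header_bytes, std::size_t len) {
  return align_up(raw_size(header_bytes, len), kWordSize);
}

// Bytes read past the end of the allocation if the aligned size is used
// to copy an object that was allocated with the raw size.
constexpr std::size_t overread_bytes(std::size_t header_bytes, std::size_t len) {
  return copy_size(header_bytes, len) - raw_size(header_bytes, len);
}

static_assert(overread_bytes(8, 3) == 4, "20 allocated bytes, 24 copied");
static_assert(overread_bytes(8, 2) == 0, "16 bytes is already word aligned");
```

An odd number of 4-byte entries leaves the allocation 4 bytes short of a word boundary, which is the kind of gap a sanitizer reports as a heap-buffer-overflow when the aligned size drives the copy.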
------------- Commit messages: - 8358289: [asan] runtime/cds/appcds/aotCode/AOTCodeFlags.java reports heap-buffer-overflow in ArchiveBuilder Changes: https://git.openjdk.org/jdk/pull/25604/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25604&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358289 Stats: 7 lines in 2 files changed: 3 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25604.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25604/head:pull/25604 PR: https://git.openjdk.org/jdk/pull/25604 From kvn at openjdk.org Tue Jun 3 02:12:50 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Jun 2025 02:12:50 GMT Subject: RFR: 8358289: [asan] runtime/cds/appcds/aotCode/AOTCodeFlags.java reports heap-buffer-overflow in ArchiveBuilder In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 02:01:02 GMT, Vladimir Kozlov wrote: > There is difference between AdapterFingerPrint allocation size [compute_size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2227) which may not be aligned to HeapWord size and [size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2421) used for copying during AOT cache build which is aligned and can be bigger than allocation size. > > I added asserts to `AdapterFingerPrint` and `AdapterHandlerEntry` to make sure sizes are correct. Both are used in AOT cache build. > > I also moved `FreeHeap()` from `~AdapterFingerPrint()` to enforce the comment and simplify executed code. > > Thanks to @MBaesken for finding the issue and @iklam for pointing the cause. > > Testing tier1-3, xcomp, stress. Higher tiers are still running. @MBaesken please test this. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25604#issuecomment-2933119129 From asmehra at openjdk.org Tue Jun 3 03:48:50 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 3 Jun 2025 03:48:50 GMT Subject: RFR: 8358289: [asan] runtime/cds/appcds/aotCode/AOTCodeFlags.java reports heap-buffer-overflow in ArchiveBuilder In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 02:01:02 GMT, Vladimir Kozlov wrote: > There is difference between AdapterFingerPrint allocation size [compute_size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2227) which may not be aligned to HeapWord size and [size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2421) used for copying during AOT cache build which is aligned and can be bigger than allocation size. > > I added asserts to `AdapterFingerPrint` and `AdapterHandlerEntry` to make sure sizes are correct. Both are used in AOT cache build. > > I also moved `FreeHeap()` from `~AdapterFingerPrint()` to enforce the comment and simplify executed code. > > Thanks to @MBaesken for finding the issue and @iklam for pointing the cause. > > Testing tier1-3, xcomp, stress. Higher tiers are still running. @iklam @MBaesken Nice catch. @vnkozlov thanks for fixing it. I realized `compute_size()` does not use `sig_bt` parameter. Since you are touching this code, can you please remove it as well. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25604#issuecomment-2933306923 From xpeng at openjdk.org Tue Jun 3 05:36:23 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 3 Jun 2025 05:36:23 GMT Subject: RFR: 8354555: Add generic JFR events for TaskTerminator [v6] In-Reply-To: <_7FP2wNe8p3N8SxKdmCN1x4zKO8TT5JWRcWEt51i35c=.4fbac292-3cb7-48b9-922e-1114f74e0549@github.com> References: <_7FP2wNe8p3N8SxKdmCN1x4zKO8TT5JWRcWEt51i35c=.4fbac292-3cb7-48b9-922e-1114f74e0549@github.com> Message-ID: > The purpose of the PR is to add generic JFR events for TaskTerminator to track the attempts and timings that GC threads have tried to terminate GC tasks. > > Today only G1 emits JFR event with name `Termination` from [G1ParEvacuateFollowersClosure](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/g1/g1YoungCollector.cpp#L555-L563), all other garbage collectors don't emit any JFR event for the termination attempt at all. > > By adding this, it gives performance engineers the visibility to the termination attempts and termination time when GC threads trying to finish GC tasks, we could build tool to analyze the jfr events to determine if there is potential data structure issue in application code, e.g. very large LinkedList or LinkedBlockingQueue. 
> > For the test, I have manually tested different GCs with Flight Recording enabled and verified the events: > G1: > > jdk.GCPhaseParallel { > startTime = 23:09:34.124 (2025-05-22) > duration = 0.0108 ms > gcId = 0 > gcWorkerId = 8 > name = "Termination" > eventThread = "GC Thread#4" (osThreadId = 20483) > } > > jdk.GCPhaseParallel { > startTime = 23:09:34.124 (2025-05-22) > duration = 0.0467 ms > gcId = 0 > gcWorkerId = 2 > name = "Termination" > eventThread = "GC Thread#2" (osThreadId = 21251) > } > > jdk.GCPhaseParallel { > startTime = 23:09:34.124 (2025-05-22) > duration = 0.0474 ms > gcId = 0 > gcWorkerId = 1 > name = "Termination" > eventThread = "GC Thread#8" (osThreadId = 36359) > } > jdk.GCPhaseParallel { > startTime = 23:09:41.925 (2025-05-22) > duration = 0.000834 ms > gcId = 14 > gcWorkerId = 7 > name = "Termination: Parallel Marking" > eventThread = "GC Thread#1" (osThreadId = 21507) > } > > jdk.GCPhaseParallel { > startTime = 23:09:41.925 (2025-05-22) > duration = 0.000166 ms > gcId = 14 > gcWorkerId = 7 > name = "Termination: Parallel Marking" > eventThread = "GC Thread#1" (osThreadId = 21507) > } > > > Shenandoah: > > jdk.GCPhaseParallel { > startTime = 23:39:58.890 (2025-05-22) > duration = 0.0202 ms > gcId = 0 > gcWorkerId = 0 > name = "Termination: Concurrent Mark" > eventThread = "Shenandoah GC Threads#3" (osThreadId = 13827) > } > > jdk.GCPhaseParallel { > startTime = 23:39:58.890 (2025-05-22) > duration = 0.0205 ms > gcId = 0 > gcWorkerId = 1 > name = "Termination: Concurrent Mark" > eventThread = "Shenandoah GC Threads#1" (osThreadId = 14339) > } > > jdk.GCPhaseParallel { > startTime = 23:39:58.890 (2025-05-22) > duration = 0.0127 ms > gcId = 0 > gcWorkerId = 5 > name = "Termination: Final Mark" > eventThread = "Shenandoah G... Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 30 commits: - Merge branch 'openjdk:master' into JDK-8354555 - Merge branch 'openjdk:master' into JDK-8354555 - Fix jft test failure - Merge branch 'master' into JDK-8354555 - Patch to fix the PR concerns - Emit exact same events for G1 as G1 is emitting today from G1EvacuateRegionsBaseTask and G1STWRefProcProxyTask - Add include "workerThread.hpp" - Touch up - Move TERMINATION_EVENT_NAME_PREFIX_ASSERT to taskTerminator.cpp - Fix ident - ... and 20 more: https://git.openjdk.org/jdk/compare/832c5b06...8fb9a402 ------------- Changes: https://git.openjdk.org/jdk/pull/24676/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24676&range=05 Stats: 90 lines in 10 files changed: 68 ins; 7 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/24676.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24676/head:pull/24676 PR: https://git.openjdk.org/jdk/pull/24676 From cslucas at openjdk.org Tue Jun 3 05:40:09 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 3 Jun 2025 05:40:09 GMT Subject: RFR: 8357396: Refactor nmethod::make_not_entrant to use Enum instead of "const char*" [v2] In-Reply-To: References: Message-ID: > Please review this refactor to transform the reasons for making an nmethod not entrant from `const char*` into enum values. > > Tested on Linux x64 with JTREG tier1-3 in fastdebug and release mode. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Address PR feedback: modify emum to be scoped. 
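As a rough illustration of the kind of refactor described here, the sketch below replaces free-form `const char*` reasons with a scoped enum and keeps a single place that maps values back to text for logging. The names `NotEntrantReason`, `reason_to_string` and `is_same_reason`, and the enumerators, are hypothetical and are not taken from the PR.

```cpp
#include <cstring>

// Hypothetical scoped enum standing in for the reasons an nmethod can be
// made not entrant. Being scoped, the values do not convert to int
// implicitly and do not leak into the enclosing namespace.
enum class NotEntrantReason {
  kDeoptimized,
  kClassUnloaded,
  kReplacedByNewVersion
};

// Single mapping from enum value to log text, instead of string literals
// scattered across call sites.
inline const char* reason_to_string(NotEntrantReason reason) {
  switch (reason) {
    case NotEntrantReason::kDeoptimized:          return "deoptimized";
    case NotEntrantReason::kClassUnloaded:        return "class unloaded";
    case NotEntrantReason::kReplacedByNewVersion: return "replaced by new version";
  }
  return "unknown";
}

// Call sites compare enum values rather than string contents.
inline bool is_same_reason(NotEntrantReason a, NotEntrantReason b) {
  return a == b;
}
```

A misspelled reason then becomes a compile error rather than a silently different log string.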
------------- Changes: - all: https://git.openjdk.org/jdk/pull/25338/files - new: https://git.openjdk.org/jdk/pull/25338/files/933b958d..b3bb4365 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25338&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25338&range=00-01 Stats: 83 lines in 13 files changed: 55 ins; 4 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/25338.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25338/head:pull/25338 PR: https://git.openjdk.org/jdk/pull/25338 From rvansa at openjdk.org Tue Jun 3 05:53:55 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Tue, 3 Jun 2025 05:53:55 GMT Subject: RFR: 8352075: Perf regression accessing fields [v20] In-Reply-To: References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> Message-ID: On Tue, 3 Jun 2025 00:05:35 GMT, Coleen Phillimore wrote: >> src/hotspot/share/utilities/packedTable.hpp line 38: >> >>> 36: uint32_t _key_mask; >>> 37: unsigned int _value_shift; >>> 38: uint32_t _value_mask; >> >> Aren't all 4 of these types the same? can you make them all uint32_t or all unsigned int? (former preferred). > > Can you explain somewhere how fields are mapped to this? I assume they're sorted, for some reason I expected the packed table to be {name-cp-index, sig-cp-index, offset-in-fieldstream-for-direct-access}. Does every field get 4 ints ? So why is it packed into ```Array``` rather than just use ```Array```? So much packing code that I don't know how anyone could ever debug it. Yes, in practice these all are of the same size, but in case of the masks (as well as in case of arguments in API) I want to stress out that these are 32 bit numbers. The `unsigned int`s are just 'some not too big number'. Is there any general guidance on deciding between `unsigned int` (I suppose just `unsigned` is not recommended), `uint32_t` and `u4`? 
I was hoping that the comment on line 68 explains the intended use, but I can be more verbose and document each method. When the packed table is used for fieldinfo, it's { offset-in-fieldstream, index-in-fieldstream }. The Comparator implementation can translate offset-in-fieldstream -> { name, signature } and then do the comparison. The `index-in-fieldstream` is kind of second-class citizen; we need to fill it into `FieldInfo` and it is not encoded in the stream, therefore we need to encode it in the packed table. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24847#discussion_r2122780819 From aboldtch at openjdk.org Tue Jun 3 06:00:57 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 3 Jun 2025 06:00:57 GMT Subject: RFR: 8357155: [asan] ZGC does not work (x86_64 and ppc64) [v2] In-Reply-To: References: Message-ID: <_p5h0MfOc1LQ2g30xDHYJf9v_B2QbJmJ0El0vc_u6zM=.6461af5c-c4cd-442c-a16e-c9578484f10c@github.com> On Mon, 2 Jun 2025 10:38:18 GMT, Martin Doerr wrote: >> src/hotspot/cpu/ppc/gc/z/zAddress_ppc.cpp line 95: >> >>> 93: const size_t max_address_offset_bits = valid_max_address_offset_bits - 3; >>> 94: #ifdef ADDRESS_SANITIZER >>> 95: return max_address_offset_bits; >> >> I think this actually has to be >> ```c++ >> return MIN2(valid_max_address_offset_bits, 44); >> >> >> Because the way we probe we may otherwise return 45 here. Which could result in more than 44 bits in a ZOffset which our internal data structures cannot handle. Hopefully this still works for ASAN on PPC. (The `-3` is a left over from non-generational ZGC). Aarch64 could do the same, but it does not have this issue as it starts its probing at bit 46, not bit 47. >> >> _Side note: This makes me realise that there probably is a bug here on PPC and RISCV if running on a NUMA machine with more than 8 TB heap. As after ZGlobalsPointers::min_address_offset_request() was introduced we can return 45 from this function._ > > @xmas92: Thanks for looking into this! 
Should we set `DEFAULT_MAX_ADDRESS_BIT = 44` and use the constant? > Or maybe file a separate issue for fixing that on aarch64, PPC64 and riscv (and also remove the -3 from the `max_address_offset_bits computation`)? [JDK-8358310](https://bugs.openjdk.org/browse/JDK-8358310) / #25578 is open right now as a quick fix for returning a too large value without cleaning up the implementation. (As a fix for 25) This was noted back in https://github.com/openjdk/jdk/pull/18941#issuecomment-2079316745 ([JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275)), but I think fixing this fell through the cracks. I currently have a rewrite in the works which overhauls the heap base selection, which I plan to get into 26. In that patch all the non-generational legacy is removed. So we no longer probe based on the assumption that we need 3 extra high order bits. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25549#discussion_r2122787593 From aboldtch at openjdk.org Tue Jun 3 06:00:58 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 3 Jun 2025 06:00:58 GMT Subject: RFR: 8357155: [asan] ZGC does not work (x86_64 and ppc64) [v2] In-Reply-To: <_p5h0MfOc1LQ2g30xDHYJf9v_B2QbJmJ0El0vc_u6zM=.6461af5c-c4cd-442c-a16e-c9578484f10c@github.com> References: <_p5h0MfOc1LQ2g30xDHYJf9v_B2QbJmJ0El0vc_u6zM=.6461af5c-c4cd-442c-a16e-c9578484f10c@github.com> Message-ID: On Tue, 3 Jun 2025 05:56:46 GMT, Axel Boldt-Christmas wrote: >> @xmas92: Thanks for looking into this! Should we set `DEFAULT_MAX_ADDRESS_BIT = 44` and use the constant? >> Or maybe file a separate issue for fixing that on aarch64, PPC64 and riscv (and also remove the -3 from the `max_address_offset_bits computation`)? > > [JDK-8358310](https://bugs.openjdk.org/browse/JDK-8358310) / #25578 is open right now as a quick fix for returning a too large value without cleaning up the implementation. 
(As a fix for 25) > > This was noted back in https://github.com/openjdk/jdk/pull/18941#issuecomment-2079316745 ([JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275)), but I think fixing this fell through the cracks. > > I currently have a rewrite in the works which overhauls the heap base selection, which I plan to get into 26. In that patch all the non-generational legacy is removed. So we no longer probe based on the assumption that we need 3 extra high order bits. But I will make sure to create an issue for this overhaul, so it does not get lost. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25549#discussion_r2122788828 From dnsimon at openjdk.org Tue Jun 3 06:21:55 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 3 Jun 2025 06:21:55 GMT Subject: RFR: 8358254: [AOT] runtime/cds/appcds/applications/JavacBench.java#aot crashes with SEGV in ClassLoaderData::holder In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 15:47:29 GMT, Doug Simon wrote: > JVMCI needs to be aware of unloaded classes in type profiles just like [CI does](https://github.com/openjdk/jdk/pull/24886/files#diff-cda53c3ed39c4e59f73f3298933ebed1912daeaf854f0b31f40332be109f6c30R317). Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25592#issuecomment-2933645017 From dnsimon at openjdk.org Tue Jun 3 06:21:56 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 3 Jun 2025 06:21:56 GMT Subject: Integrated: 8358254: [AOT] runtime/cds/appcds/applications/JavacBench.java#aot crashes with SEGV in ClassLoaderData::holder In-Reply-To: References: Message-ID: <7NMm1SDt9UaKrkgEPeFaSbkz97Lwqof1TVjyAKEyGY4=.d4792765-e2e2-4e0c-8a28-b5583cfed394@github.com> On Mon, 2 Jun 2025 15:47:29 GMT, Doug Simon wrote: > JVMCI needs to be aware of unloaded classes in type profiles just like [CI does](https://github.com/openjdk/jdk/pull/24886/files#diff-cda53c3ed39c4e59f73f3298933ebed1912daeaf854f0b31f40332be109f6c30R317). 
This pull request has now been integrated. Changeset: 497a1822 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/497a1822cabcc0475ce0495d56430f1e99b1fb13 Stats: 33 lines in 4 files changed: 13 ins; 16 del; 4 mod 8358254: [AOT] runtime/cds/appcds/applications/JavacBench.java#aot crashes with SEGV in ClassLoaderData::holder Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/25592 From jbechberger at openjdk.org Tue Jun 3 06:58:03 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 3 Jun 2025 06:58:03 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 13:50:49 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Fix bug related to async stack walking Regarding https://github.com/openjdk/jdk/pull/25302#discussion_r2119984783 raw_thread == nullptr This seems to happen rarely on (abrupt) shutdowns. 
I attached an hs_err file: [hs_err_pid1688961.log](https://github.com/user-attachments/files/20563594/hs_err_pid1688961.log) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25302#issuecomment-2933774229 From jbechberger at openjdk.org Tue Jun 3 07:05:47 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 3 Jun 2025 07:05:47 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v30] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... different chunk-sizes Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: - Check for raw_thread == nullptr - Move async stackwalking to JFR ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/09ca4fed..bef52132 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=28-29 Stats: 89 lines in 3 files changed: 37 ins; 49 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From jbechberger at openjdk.org Tue Jun 3 07:12:46 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 3 Jun 2025 07:12:46 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v31] In-Reply-To: References: Message-ID: 
<_4vRA_P9_dLG022vs8ZinaZmqC48drRAwdOSiDG9Wjk=.25880197-6c87-4faf-8259-12d6c0f10f2e@github.com> > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... different chunk-sizes Johannes Bechberger has updated the pull request incrementally with three additional commits since the last revision: - Update src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp Co-authored-by: Aleksey Shipilëv - Small fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/bef52132..c3dedefb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=29-30 Stats: 17 lines in 4 files changed: 2 ins; 5 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From rvansa at openjdk.org Tue Jun 3 07:16:47 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Tue, 3 Jun 2025 07:16:47 GMT Subject: RFR: 8352075: Perf regression accessing fields [v21] In-Reply-To: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> Message-ID: > This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 trying to reduce the performance regression in some
scenarios introduced in https://bugs.openjdk.org/browse/JDK-8292818 . Based both on performance and memory consumption it is a (better) alternative to https://github.com/openjdk/jdk/pull/24713 . > > This PR optimizes local field lookup in classes with more than 16 fields; rather than sequentially iterating through all fields during lookup we sort the fields based on the field name. The stream includes extra table after the field information: for field at position 16, 32 ... we record the (variable-length-encoded) offset of the field info in this stream. On field lookup, rather than iterating through all fields, we iterate through this table, resolve names for given fields and continue field-by-field iteration only after the last record (hence at most 16 fields). > > In classes with <= 16 fields this PR reduces the memory consumption by 1 byte that was left with value 0 at the end of stream. In classes with > 16 fields we add extra 4 bytes with offset of the table, and the table contains one varint for each 16 fields. The terminal byte is not used either. > > My measurements on the attached reproducer > > hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC > Time (mean ± σ): 51.3 ms ± 2.8 ms [User: 44.7 ms, System: 13.7 ms] > Range (min … max): 45.1 ms … 53.9 ms 100 runs > > hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC > Time (mean ± σ): 78.2 ms ± 1.0 ms [User: 74.6 ms, System: 17.3 ms] > Range (min … max): 73.8 ms … 79.7 ms 100 runs > > (the jdk25-master above already contains JDK-8353175) > > hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC' > Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC > Time (mean ± σ): 38.5 ms ± 0.5 ms [User: 34.4 ms, System: 17.3 ms] > Range (min … max): 37.7 ms …
42.1 ms 100 runs > > While https://github.com/openjdk/jdk/pull/24713 returned the performance to previous levels, this PR improves it by 25% compared to JDK 17 (which does not contain the regression)! This time, the undisclosed production-grade reproducer shows even higher improvement: > > JDK 17: 1.6 s > JDK 21 (no patches): 22 s > JDK25-master: 12.3 s > JDK25-this-pr: 0.5 s Radim Vansa has updated the pull request incrementally with three additional commits since the last revision: - Moved jtreg test - Improved documentation - Fix coding style (asterisk placement) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24847/files - new: https://git.openjdk.org/jdk/pull/24847/files/7d8b4a19..862b264b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24847&range=19-20 Stats: 99 lines in 11 files changed: 31 ins; 2 del; 66 mod Patch: https://git.openjdk.org/jdk/pull/24847.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24847/head:pull/24847 PR: https://git.openjdk.org/jdk/pull/24847 From jbechberger at openjdk.org Tue Jun 3 07:17:09 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 3 Jun 2025 07:17:09 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v29] In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 20:02:15 GMT, Markus Grönlund wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bug related to async stack walking > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 49: > >> 47: >> 48: static bool is_excluded(JavaThread* thread) { >> 49: return thread->is_hidden_from_external_view() || thread->jfr_thread_local()->is_excluded(); > > I think I misled you saying that JfrRecorder_thread would be excluded by the above expression. That was true - but not anymore.
> > Our exclusion test looks like: > > static inline bool is_excluded(JavaThread* jt) { > assert(jt != nullptr, "invariant"); > return jt->is_Compiler_thread() || jt->is_hidden_from_external_view() || jt->is_JfrRecorder_thread() || jt->jfr_thread_local()->is_excluded(); > } > > I like you could fold jt->is_Compiler_thread() into jt->is_hidden_from_external_view() - good!. > > But can you please again list the condition jt->is_JfrRecorder_thread()? Sorry, I forgot we had removed it from being considered excluded on a per JfrThreadLocal level. Thanks. No problem. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2122934873 From rvansa at openjdk.org Tue Jun 3 07:18:58 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Tue, 3 Jun 2025 07:18:58 GMT Subject: RFR: 8352075: Perf regression accessing fields [v15] In-Reply-To: References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> Message-ID: On Sat, 31 May 2025 14:49:48 GMT, Coleen Phillimore wrote: >> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision: >> >> More debug logs > > I can reproduce the crash when building slowdebug on linux-x64. @coleenp I fixed the coding style (I wish OpenJDK had a linter, or at least a checker... the asterisk placement is hard to get used to), improved docs and moved the jtreg test to runtime/FieldStream (I think that FieldLayout checks how are fields placed within an instance). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24847#issuecomment-2933845743 From lliu at openjdk.org Tue Jun 3 07:19:11 2025 From: lliu at openjdk.org (Liming Liu) Date: Tue, 3 Jun 2025 07:19:11 GMT Subject: RFR: 8358032: Use crypto pmull for CRC32/CRC32C intrinsics on Ampere CPU Message-ID: This PR is to enable the use of crypto pmull for CRC32/CRC32C intrinsics on Ampere CPU. 
There is an option UseCryptoPmullForCRC32 that can enable crypto pmull, but directly enabling it on Ampere CPU will cause the following problems. 1. There will be regressions (-14% ~ -8%) on Ampere1 when the length is 64. When the length is <= 128, both kernel_crc32_using_crc32 and kernel_crc32_using_crypto_pmull use the loop labeled CRC_by32_loop, but their implementations differ slightly, and the loop in kernel_crc32_using_crc32 is better at hiding latency on Ampere1. So this PR copies the loop from kernel_crc32_using_crc32 into kernel_crc32_using_crypto_pmull, and does the same for the CRC32C intrinsic. 2. The intrinsics only use crypto pmull when the length is greater than 383, while the loop in kernel_crc32_common_fold_using_crypto_pmull looks able to handle 256, and if it handles 256 on Ampere1, the improvements can be as high as 110% compared with kernel_crc32_using_crc32/kernel_crc32c_using_crc32c. However, there are regressions (~-6%) on Neoverse V1 when the length is 256. So this PR introduces a new option named CryptoPmullForCRC32LowLimit. It defaults to 256 since the code can handle 256, while it is set to 384 for V1/V2 to keep the old behavior on these platforms.
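To illustrate point 2, here is a rough sketch of how a length threshold like CryptoPmullForCRC32LowLimit could steer kernel selection. It is not the code in the PR: `Crc32Kernel` and `select_crc32_kernel` are hypothetical names, and plain function arguments stand in for the VM flags. The only values taken from the description above are the default of 256 and the 384 kept for V1/V2.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration only; the real intrinsics are
// generated assembly stubs, not C++ functions selected at run time.
enum class Crc32Kernel { ScalarCrc32, CryptoPmull };

// use_crypto_pmull stands in for UseCryptoPmullForCRC32 and
// low_limit for CryptoPmullForCRC32LowLimit (256 by default, 384 on V1/V2).
inline Crc32Kernel select_crc32_kernel(bool use_crypto_pmull,
                                       std::size_t low_limit,
                                       std::size_t len) {
  // Inputs shorter than the limit stay on the crc32-instruction loop.
  if (!use_crypto_pmull || len < low_limit) {
    return Crc32Kernel::ScalarCrc32;
  }
  return Crc32Kernel::CryptoPmull;
}
```

With a limit of 256 a 256-byte input takes the pmull path, while with 384 it stays on the scalar loop, which matches the old "greater than 383" behavior.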
The performance regressions and improvements were measured with the following microbenchmarks: org.openjdk.bench.java.util.TestCRC32.testCRC32Update org.openjdk.bench.java.util.TestCRC32C.testCRC32CUpdate Ran the following JTReg tests on Ampere1 and did not find problems: test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java ------------- Commit messages: - Use the utility functions - Introduce CryptoPmullForCRC32LowLimit and use pmull for crc32 on Ampere CPU Changes: https://git.openjdk.org/jdk/pull/25609/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25609&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358032 Stats: 28 lines in 3 files changed: 17 ins; 3 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25609.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25609/head:pull/25609 PR: https://git.openjdk.org/jdk/pull/25609 From dholmes at openjdk.org Tue Jun 3 07:39:02 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 3 Jun 2025 07:39:02 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v31] In-Reply-To: <_4vRA_P9_dLG022vs8ZinaZmqC48drRAwdOSiDG9Wjk=.25880197-6c87-4faf-8259-12d6c0f10f2e@github.com> References: <_4vRA_P9_dLG022vs8ZinaZmqC48drRAwdOSiDG9Wjk=.25880197-6c87-4faf-8259-12d6c0f10f2e@github.com> Message-ID: On Tue, 3 Jun 2025 07:12:46 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... 
different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with three additional commits since the last revision: > > - Update src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp > > Co-authored-by: Aleksey Shipilëv > - Update src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp > > Co-authored-by: Aleksey Shipilëv > - Small fixes > Regarding [#25302 (comment)](https://github.com/openjdk/jdk/pull/25302#discussion_r2119984783) > > ``` > raw_thread == nullptr > ``` > > This seems to happen rarely on (abrupt) shutdowns. I attached an hs_err file: [hs_err_pid1688961.log](https://github.com/user-attachments/files/20563594/hs_err_pid1688961.log) That is interesting. The signal appears to be being handled on an unattached thread during shutdown, and there is no stack left to show any VM involvement. Possibly we need to block the signal as part of thread termination, before we clear the current thread. ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25302#issuecomment-2933916367 From jbechberger at openjdk.org Tue Jun 3 07:44:33 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 3 Jun 2025 07:44:33 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v32] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ...
different chunk-sizes Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: - Remove includes and other lines - Fix is_excluded ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/c3dedefb..ab47f680 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=30-31 Stats: 25 lines in 17 files changed: 4 ins; 5 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From jbechberger at openjdk.org Tue Jun 3 07:44:34 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 3 Jun 2025 07:44:34 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v31] In-Reply-To: <_4vRA_P9_dLG022vs8ZinaZmqC48drRAwdOSiDG9Wjk=.25880197-6c87-4faf-8259-12d6c0f10f2e@github.com> References: <_4vRA_P9_dLG022vs8ZinaZmqC48drRAwdOSiDG9Wjk=.25880197-6c87-4faf-8259-12d6c0f10f2e@github.com> Message-ID: On Tue, 3 Jun 2025 07:12:46 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... 
different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with three additional commits since the last revision: > > - Update src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp > > Co-authored-by: Aleksey Shipilëv > - Update src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp > > Co-authored-by: Aleksey Shipilëv > - Small fixes We already do: void JfrCPUTimeThreadSampler::on_javathread_terminate(JavaThread* thread) { JfrThreadLocal* tl = thread->jfr_thread_local(); assert(tl != nullptr, "invariant"); timer_t* timer = tl->cpu_timer(); if (timer == nullptr) { return; // no timer was created for this thread } timer_delete(*timer); ------------- PR Comment: https://git.openjdk.org/jdk/pull/25302#issuecomment-2933945034 From mdoerr at openjdk.org Tue Jun 3 07:54:52 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Jun 2025 07:54:52 GMT Subject: RFR: 8358310: ZGC: riscv, ppc ZPlatformAddressOffsetBits may return a too large value In-Reply-To: <6j_zozeh-Vwu3tRHRlJ5h_mhcMFsNm_OMUinAosz8fU=.d51c8c95-aad1-4566-a23b-8da5b521aa90@github.com> References: <6j_zozeh-Vwu3tRHRlJ5h_mhcMFsNm_OMUinAosz8fU=.d51c8c95-aad1-4566-a23b-8da5b521aa90@github.com> Message-ID: On Mon, 2 Jun 2025 08:55:02 GMT, Axel Boldt-Christmas wrote: > The way that ZPlatformAddressOffsetBits is implemented on riscv and ppc may result in a return value of 45. This is larger than the max supported value of 44 (because of other internal data structures). This was fixed in [JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275) for aarch64. > > Before [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the issue only manifested if one tried to select a heap larger than 16 TB (not supported), but after [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we try to double the heap address space when running on a NUMA machine. So we may now encounter this bug for heaps larger than 8TB (which is supported).
> > While ZPlatformAddressOffsetBits needs an overhaul. (It was written for non-generational ZGC where we had the three color bits inside the address.) The proposal is that we solve this for ppc and riscv by doing the same thing we did for aarch64 in [JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275) Thanks for fixing it! ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25578#pullrequestreview-2891147685 From mdoerr at openjdk.org Tue Jun 3 08:02:00 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Jun 2025 08:02:00 GMT Subject: RFR: 8357155: [asan] ZGC does not work (x86_64 and ppc64) [v2] In-Reply-To: References: Message-ID: <0808aEXLDKNUY6rsNCbjRjs_O0BaPLrCsX7q2zjpzus=.8ea987cf-fb41-47c5-9df3-840bc939f99a@github.com> On Mon, 2 Jun 2025 09:11:05 GMT, Matthias Baesken wrote: >> Many (all?) ZGC related jtreg tests do not work when the JDK is built with address sanitizer asan enabled (configure flag --enable-asan). >> This can be seen on SUSE Linux x86_64 and also on ppc64le , opt binaries were used. >> It has been suggested to do a workaround - 'But I think that simply adapting the zAddress_[...].cpp implementations to always select the largest heap base would go a long way for providing ASAN compatibility.' >> This seems to work nicely on x86_64 and ppc64le, however the zgc related tests still fail on Linux aarch64 (should I exclude this platform from my patch?) . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > remove aarch64 from the change, adjust ppc64 src/hotspot/cpu/ppc/gc/z/zAddress_ppc.cpp line 95: > 93: const size_t max_address_offset_bits = valid_max_address_offset_bits - 3; > 94: #ifdef ADDRESS_SANITIZER > 95: return MIN2(valid_max_address_offset_bits, (size_t)44); I think this PR is ok, but please add a comment like "The max supported value is 44 because of other internal data structures.". 
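For readers following along, the clamp under discussion can be sketched outside HotSpot roughly as below. This is a minimal stand-alone illustration, not the patch itself: `kMaxSupportedOffsetBits` and `clamp_address_offset_bits` are hypothetical names, and `std::min` stands in for HotSpot's `MIN2`. The value 44 and the wording of the comment come from the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical constant name; the value 44 is the limit quoted in the thread.
constexpr std::size_t kMaxSupportedOffsetBits = 44;

// 44 offset bits address 16 TB, the largest heap mentioned as the boundary above.
static_assert((std::uint64_t{1} << kMaxSupportedOffsetBits) ==
                  std::uint64_t{16} * 1024 * 1024 * 1024 * 1024,
              "44 bits should cover exactly 16 TB");

// Hypothetical helper: probing may report 45 valid bits, so clamp the result.
inline std::size_t clamp_address_offset_bits(std::size_t probed_bits) {
  // The max supported value is 44 because of other internal data structures.
  return std::min(probed_bits, kMaxSupportedOffsetBits);
}
```

A probed value of 45 is reduced to 44, while smaller values pass through unchanged.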
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25549#discussion_r2123048021 From aph at openjdk.org Tue Jun 3 08:37:51 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 3 Jun 2025 08:37:51 GMT Subject: RFR: 8358032: Use crypto pmull for CRC32/CRC32C intrinsics on Ampere CPU In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 07:14:03 GMT, Liming Liu wrote: > This PR is to enable the use of crypto pmull for CRC32/CRC32C intrinsics on Ampere CPU. There is an option UseCryptoPmullForCRC32 that can enable crypto pmull, but directly enabling it on Ampere CPU will cause the following problems. > > 1. There will be regressions (-14% ~ -8%) on Ampere1 when the length is 64. When <= 128, both kernel_crc32_using_crc32 and kernel_crc32_using_crypto_pmull use the loop labeled as CRC_by32_loop, but their implements are a little different, and the loop in kernel_crc32_using_crc32 is better at hiding latency on Ampere1. So this PR takes the loop in kernel_crc32_using_crc32 to kernel_crc32_using_crypto_pmull, and does the same for CRC32C intrinsic. > > 2. The intrinsics only use crypto pmull when the length is higher than 383, while the loop in kernel_crc32_common_fold_using_crypto_pmull looks able to handle 256, and if it handles 256 on Ampere1, the improvements can be as high as 110% compared with kernel_crc32_using_crc32/kernel_crc32c_using_crc32c. However, there are regressions (~-6%) on Neoverse V1 when the length is 256. So this PR introduces a new option named CryptoPmullForCRC32LowLimit. It defaults to 256 since the code could handle 256, while it is set to 384 for V1/V2 to keep the old behavior on these platforms. 
> > The performance regressions and improvements were measured with the following microbenchmarks: > org.openjdk.bench.java.util.TestCRC32.testCRC32Update > org.openjdk.bench.java.util.TestCRC32C.testCRC32CUpdate > > Ran the following JTReg tests on Ampere1 and did not find problems: > test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java > test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java src/hotspot/cpu/aarch64/globals_aarch64.hpp line 95: > 93: "Minimum size in bytes when Crypto PMULL will be used." \ > 94: "Value must be a multiple of 128.") \ > 95: range(256, max_jint) \ This shouldn't be a general product flag. Suggestion: product(intx, CryptoPmullForCRC32LowLimit, 256, DIAGNOSTIC, \ "Minimum size in bytes when Crypto PMULL will be used." \ "Value must be a multiple of 128.") \ range(256, max_jint) \ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25609#discussion_r2123135028 From jbechberger at openjdk.org Tue Jun 3 08:42:56 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 3 Jun 2025 08:42:56 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v33] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... 
different chunk-sizes Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Fix non Linux builds ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/ab47f680..ef9f9cd1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=31-32 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From shade at openjdk.org Tue Jun 3 08:45:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jun 2025 08:45:53 GMT Subject: RFR: 8357396: Refactor nmethod::make_not_entrant to use Enum instead of "const char*" [v2] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 08:36:58 GMT, Aleksey Shipilev wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Address PR feedback: modify emum to be scoped. > > src/hotspot/share/code/nmethod.hpp line 498: > >> 496: >> 497: >> 498: static const char* NMethodChangeReason_to_string(NMethodChangeReason reason) { > > Uh, use a switch: > > > switch(reason) { > case C1_deoptimize: return "C1 deoptimized"; > case C1_codepatch: return "C1 code patch"; > ... > default: > assert(false, "Unhandled reason"); > return "Unknown"; > } Also, names: `change_reason_to_string(ChangeReason reason)`. Now that enum is scoped to `nmethod`, there is no need for `NMethod` prefix. 
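Putting the two suggestions together (a switch, plus names without the `NMethod` prefix), the result might look roughly like the sketch below. It is only an illustration: `nmethod_sketch` is a stand-in for the real `nmethod` class, standard `assert` replaces HotSpot's two-argument `assert`, and only the two enumerators spelled out in the review are shown because the rest were elided there.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for HotSpot's nmethod class, for illustration only.
class nmethod_sketch {
 public:
  // Nested in the class, so the enum needs no NMethod prefix.
  // Only the two reasons named in the review are listed here.
  enum ChangeReason {
    C1_deoptimize,
    C1_codepatch
  };

  static const char* change_reason_to_string(ChangeReason reason) {
    switch (reason) {
      case C1_deoptimize: return "C1 deoptimized";
      case C1_codepatch:  return "C1 code patch";
      default:
        // Standard assert used in place of HotSpot's assert(cond, msg).
        assert(false && "Unhandled reason");
        return "Unknown";
    }
  }
};
```

Callers would then write `nmethod_sketch::change_reason_to_string(nmethod_sketch::C1_codepatch)`, with the class name providing the scoping that the prefix used to.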
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25338#discussion_r2123148319 From shade at openjdk.org Tue Jun 3 08:45:52 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jun 2025 08:45:52 GMT Subject: RFR: 8357396: Refactor nmethod::make_not_entrant to use Enum instead of "const char*" [v2] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 05:40:09 GMT, Cesar Soares Lucas wrote: >> Please review this refactor to transform the reasons for making an nmethod not entrant from `const char*` into enum values. >> >> Tested on Linux x64 with JTREG tier1-3 in fastdebug and release mode. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Address PR feedback: modify emum to be scoped. Looks conceptually fine. Cosmetics: src/hotspot/share/code/nmethod.hpp line 498: > 496: > 497: > 498: static const char* NMethodChangeReason_to_string(NMethodChangeReason reason) { Uh, use a switch: switch(reason) { case C1_deoptimize: return "C1 deoptimized"; case C1_codepatch: return "C1 code patch"; ... default: assert(false, "Unhandled reason"); return "Unknown"; } src/hotspot/share/jvmci/jvmciEnv.cpp line 1755: > 1753: > 1754: > 1755: void JVMCIEnv::invalidate_nmethod_mirror(JVMCIObject mirror, bool deoptimize, nmethod::NMethodChangeReason statusReason, JVMCI_TRAPS) { Suggestion: void JVMCIEnv::invalidate_nmethod_mirror(JVMCIObject mirror, bool deoptimize, nmethod::NMethodChangeReason change_reason, JVMCI_TRAPS) { src/hotspot/share/jvmci/jvmciEnv.hpp line 465: > 463: // If `deoptimize` is true, the nmethod is immediately deoptimized. > 464: // The HotSpotNmethod.address field is zero upon returning. 
> 465: void invalidate_nmethod_mirror(JVMCIObject mirror, bool deoptimze, nmethod::NMethodChangeReason statusReason, JVMCI_TRAPS); Suggestion: void invalidate_nmethod_mirror(JVMCIObject mirror, bool deoptimize, nmethod::NMethodChangeReason change_reason, JVMCI_TRAPS); ------------- PR Review: https://git.openjdk.org/jdk/pull/25338#pullrequestreview-2891309126 PR Review Comment: https://git.openjdk.org/jdk/pull/25338#discussion_r2123138594 PR Review Comment: https://git.openjdk.org/jdk/pull/25338#discussion_r2123153641 PR Review Comment: https://git.openjdk.org/jdk/pull/25338#discussion_r2123152309 From dbriemann at openjdk.org Tue Jun 3 08:54:53 2025 From: dbriemann at openjdk.org (David Briemann) Date: Tue, 3 Jun 2025 08:54:53 GMT Subject: RFR: 8358013: [PPC64] VSX has poor performance on Power8 [v5] In-Reply-To: References: <6lRLaDtZkFd5zdOobo1RnSODoZk3r7T-sgjfpcnUVwU=.ad525055-0f15-4866-a295-20e2183eaf7b@github.com> Message-ID: On Mon, 2 Jun 2025 14:14:29 GMT, Martin Doerr wrote: >> Power8 only has limited VSX instructions for the superword optimization and the Vector API and the performance is bad. Let's only use it on Power9 and newer by default. This change excludes the VSX registers from C2 register allocation for Power8. VSX instruction usage gets limited to a few places like intrinsics. >> >> Note: Power8 is an old processor and performance optimizations for it are no longer planned. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Beautify @requires statement. Marked as reviewed by dbriemann (Author). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25514#pullrequestreview-2891361980 From fyang at openjdk.org Tue Jun 3 08:56:51 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 3 Jun 2025 08:56:51 GMT Subject: RFR: 8358310: ZGC: riscv, ppc ZPlatformAddressOffsetBits may return a too large value In-Reply-To: <6j_zozeh-Vwu3tRHRlJ5h_mhcMFsNm_OMUinAosz8fU=.d51c8c95-aad1-4566-a23b-8da5b521aa90@github.com> References: <6j_zozeh-Vwu3tRHRlJ5h_mhcMFsNm_OMUinAosz8fU=.d51c8c95-aad1-4566-a23b-8da5b521aa90@github.com> Message-ID: On Mon, 2 Jun 2025 08:55:02 GMT, Axel Boldt-Christmas wrote: > The way that ZPlatformAddressOffsetBits is implemented on riscv and ppc may result in a return value of 45. This is larger than the max supported value of 44 (because of other internal data structures). This was fixed in [JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275) for aarch64. > > Before [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the issue on manifested if one tried to select a heap larger than 16 TB (not supported), but after [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we try to double the heap address space when running on a NUMA machine. So we may now encounter this bug for heaps larger than 8TB (which is supported). > > While ZPlatformAddressOffsetBits needs an overhaul. (It was written for non-generational ZGC where we had the three color bits inside the address.) The proposal is that we solve this for ppc and riscv by doing the same thing we did for aarch64 in [JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275) Thanks. I tried some non-trivial benchmark workloads with ZGC on linux-riscv64, seems to work. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25578#pullrequestreview-2891368869 From fjiang at openjdk.org Tue Jun 3 09:04:55 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 3 Jun 2025 09:04:55 GMT Subject: RFR: 8356159: RISC-V: Add Zabha [v11] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 09:22:37 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> This adds the byte and halfword atomic memory operations (Zabha) - https://github.com/riscv/riscv-zabha. >> All amo-instructions, except load-reserve and store-conditional, can also be performed on natural aligned half-words and bytes. (i.e. the extension do not add lr.h/b or sc.h/b) This includes amocas if zacas extension is present. >> >> The majority of this patch is to support amocas.h/b. We are now starting to really feel the pain of all these extensions, as CAS:ing 16/8-bits can now be done in three different ways: >> - lr.w/sc.w 'narrow' CAS (no extension) >> - amocas.w 'narrow' CAS (Zacas) >> - amocas.h/b (Zacas + Zabha) >> >> There is no hwprobe support yet. >> >> Ran t1-3 with Zacas+Zabha and t1 without Zabha in qemu. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision: > > - Set ins cost to 2xVOLA for cmpxchg > - Merge branch 'master' into 8356159 > - Merge branch 'master' into 8356159 > - ins cost fixes, print fixes > - Merge branch 'master' into 8356159 > - Reg limits fixed > - Merge branch 'master' into 8356159 > - Fixed reg selection > - More indention > - Indention > - ... and 10 more: https://git.openjdk.org/jdk/compare/f7e126de...b496c299 src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4151: > 4149: zext(prev, prev, 32); > 4150: break; > 4151: case int16: The call site of `atomic_cas` is only guaranteed by `UseZacas`. 
Do we need extra checking for `UseZabha` if the operand size is `int16` or `int8`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25252#discussion_r2123175989 From clanger at openjdk.org Tue Jun 3 09:27:06 2025 From: clanger at openjdk.org (Christoph Langer) Date: Tue, 3 Jun 2025 09:27:06 GMT Subject: RFR: 8358013: [PPC64] VSX has poor performance on Power8 [v5] In-Reply-To: References: <6lRLaDtZkFd5zdOobo1RnSODoZk3r7T-sgjfpcnUVwU=.ad525055-0f15-4866-a295-20e2183eaf7b@github.com> Message-ID: On Mon, 2 Jun 2025 14:14:29 GMT, Martin Doerr wrote: >> Power8 only has limited VSX instructions for the superword optimization and the Vector API and the performance is bad. Let's only use it on Power9 and newer by default. This change excludes the VSX registers from C2 register allocation for Power8. VSX instruction usage gets limited to a few places like intrinsics. >> >> Note: Power8 is an old processor and performance optimizations for it are no longer planned. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Beautify @requires statement. Marked as reviewed by clanger (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25514#pullrequestreview-2891469201 From mdoerr at openjdk.org Tue Jun 3 09:27:07 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Jun 2025 09:27:07 GMT Subject: RFR: 8358013: [PPC64] VSX has poor performance on Power8 [v5] In-Reply-To: References: <6lRLaDtZkFd5zdOobo1RnSODoZk3r7T-sgjfpcnUVwU=.ad525055-0f15-4866-a295-20e2183eaf7b@github.com> Message-ID: On Mon, 2 Jun 2025 14:14:29 GMT, Martin Doerr wrote: >> Power8 only has limited VSX instructions for the superword optimization and the Vector API and the performance is bad. Let's only use it on Power9 and newer by default. This change excludes the VSX registers from C2 register allocation for Power8. VSX instruction usage gets limited to a few places like intrinsics. 
>> >> Note: Power8 is an old processor and performance optimizations for it are no longer planned. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Beautify @requires statement. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25514#issuecomment-2934323304 From mdoerr at openjdk.org Tue Jun 3 09:27:07 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Jun 2025 09:27:07 GMT Subject: Integrated: 8358013: [PPC64] VSX has poor performance on Power8 In-Reply-To: <6lRLaDtZkFd5zdOobo1RnSODoZk3r7T-sgjfpcnUVwU=.ad525055-0f15-4866-a295-20e2183eaf7b@github.com> References: <6lRLaDtZkFd5zdOobo1RnSODoZk3r7T-sgjfpcnUVwU=.ad525055-0f15-4866-a295-20e2183eaf7b@github.com> Message-ID: On Wed, 28 May 2025 22:23:57 GMT, Martin Doerr wrote: > Power8 only has limited VSX instructions for the superword optimization and the Vector API and the performance is bad. Let's only use it on Power9 and newer by default. This change excludes the VSX registers from C2 register allocation for Power8. VSX instruction usage gets limited to a few places like intrinsics. > > Note: Power8 is an old processor and performance optimizations for it are no longer planned. This pull request has now been integrated. Changeset: 457d9de8 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/457d9de81d0f65455e3292fafea03f0e83184029 Stats: 14 lines in 4 files changed: 10 ins; 0 del; 4 mod 8358013: [PPC64] VSX has poor performance on Power8 Reviewed-by: dbriemann, clanger ------------- PR: https://git.openjdk.org/jdk/pull/25514 From rehn at openjdk.org Tue Jun 3 09:55:01 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 3 Jun 2025 09:55:01 GMT Subject: RFR: 8356159: RISC-V: Add Zabha [v11] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 08:53:17 GMT, Feilong Jiang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision: >> >> - Set ins cost to 2xVOLA for cmpxchg >> - Merge branch 'master' into 8356159 >> - Merge branch 'master' into 8356159 >> - ins cost fixes, print fixes >> - Merge branch 'master' into 8356159 >> - Reg limits fixed >> - Merge branch 'master' into 8356159 >> - Fixed reg selection >> - More indention >> - Indention >> - ... and 10 more: https://git.openjdk.org/jdk/compare/ab62c13b...b496c299 > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4151: > >> 4149: zext(prev, prev, 32); >> 4150: break; >> 4151: case int16: > > The call site of `atomic_cas` is only guaranteed by `UseZacas`. Do we need extra checking for `UseZabha` if the operand size is `int16` or `int8`? If nothing else the assembler always checks: void amo_base(Register Rd, Register Rs1, uint8_t Rs2, Aqrl memory_order = aqrl) { assert(width > AMO_WIDTH_HALFWORD || UseZabha, "Must be"); assert(funct5 != AMO_CAS || UseZacas, "Must be"); I'll have a look! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25252#discussion_r2123319547 From kbarrett at openjdk.org Tue Jun 3 10:03:51 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Jun 2025 10:03:51 GMT Subject: RFR: 8255082: HotSpot Style Guide should permit noexcept [v2] In-Reply-To: References: <-uPcWRhBsfKiRl5wRkLQ7YaAH4OCOlT0_ettXJQnUyY=.aa5c72c3-6767-41dd-8dae-45ff9a9e4884@github.com> Message-ID: <8iAAoW0GY4ICp4SYUr_vP7hlHdsfL9mYxVg2Tk3eeP4=.930f8b97-6e0e-4420-a268-96729b425ac6@github.com> On Mon, 2 Jun 2025 19:38:52 GMT, Daniel D. Daugherty wrote: > I do have a query about whether the mention of `nothrow` should be `noexcept`. "nothrow" (not an identifier, not code font) is a commonly used informal term, and is reflected in the names of type traits like `is_nothrow_constructible<>`. 
The Standard seems to consistently use "non-throwing exception specification" in text. Terminology might have been different if C++ had `noexcept` to start with. I don't have a strong preference here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25574#issuecomment-2934461969 From kvn at openjdk.org Tue Jun 3 10:47:25 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Jun 2025 10:47:25 GMT Subject: RFR: 8358289: [asan] runtime/cds/appcds/aotCode/AOTCodeFlags.java reports heap-buffer-overflow in ArchiveBuilder [v2] In-Reply-To: References: Message-ID: > There is difference between AdapterFingerPrint allocation size [compute_size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2227) which may not be aligned to HeapWord size and [size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2421) used for copying during AOT cache build which is aligned and can be bigger than allocation size. > > I added asserts to `AdapterFingerPrint` and `AdapterHandlerEntry` to make sure sizes are correct. Both are used in AOT cache build. > > I also moved `FreeHeap()` from `~AdapterFingerPrint()` to enforce the comment and simplify executed code. > > Thanks to @MBaesken for finding the issue and @iklam for pointing the cause. > > Testing tier1-3, xcomp, stress. Higher tiers are still running. 
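The size mismatch described above can be illustrated with a minimal, self-contained sketch. All names below (`kHeapWordSize`, `kEntrySize`, `payload_bytes`, `aligned_copy_bytes`, `overread_bytes`, `check_copy_fits`) are hypothetical and are not taken from `sharedRuntime.cpp`; the sketch only assumes an 8-byte heap word and 4-byte entries, and shows how rounding a copy size up to word alignment can exceed a byte-sized allocation.

```cpp
#include <cassert>
#include <cstddef>

// Assumption for illustration: one heap word is 8 bytes (typical for 64-bit builds).
constexpr std::size_t kHeapWordSize = 8;
// Assumption for illustration: each variable-length entry occupies 4 bytes.
constexpr std::size_t kEntrySize = 4;

// Round `bytes` up to the next multiple of `alignment` (must be a power of two).
constexpr std::size_t align_up(std::size_t bytes, std::size_t alignment) {
  return (bytes + alignment - 1) & ~(alignment - 1);
}

// Hypothetical allocation size: a header plus a variable number of entries.
// This value is not necessarily a multiple of the heap word size.
constexpr std::size_t payload_bytes(std::size_t header_bytes, std::size_t entries) {
  return header_bytes + entries * kEntrySize;
}

// Hypothetical copy size: the allocation size rounded up to whole heap words.
constexpr std::size_t aligned_copy_bytes(std::size_t header_bytes, std::size_t entries) {
  return align_up(payload_bytes(header_bytes, entries), kHeapWordSize);
}

// Number of bytes a word-aligned copy would touch beyond the allocation.
constexpr std::size_t overread_bytes(std::size_t header_bytes, std::size_t entries) {
  return aligned_copy_bytes(header_bytes, entries) - payload_bytes(header_bytes, entries);
}

// Debug-style guard, similar in spirit to a size-consistency assert:
// the number of bytes copied must never exceed the number allocated.
inline void check_copy_fits(std::size_t allocated, std::size_t copied) {
  assert(copied <= allocated && "copy size exceeds allocation size");
}
```

With an 8-byte header and three entries, the allocation is 20 bytes while the word-aligned copy is 24 bytes, which is the kind of gap a sanitizer would flag as a heap-buffer-overflow.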
Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Remove unused argument ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25604/files - new: https://git.openjdk.org/jdk/pull/25604/files/b03c5070..9b67ceab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25604&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25604&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25604.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25604/head:pull/25604 PR: https://git.openjdk.org/jdk/pull/25604 From kbarrett at openjdk.org Tue Jun 3 11:01:06 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Jun 2025 11:01:06 GMT Subject: RFR: 8255082: HotSpot Style Guide should permit noexcept [v3] In-Reply-To: <-uPcWRhBsfKiRl5wRkLQ7YaAH4OCOlT0_ettXJQnUyY=.aa5c72c3-6767-41dd-8dae-45ff9a9e4884@github.com> References: <-uPcWRhBsfKiRl5wRkLQ7YaAH4OCOlT0_ettXJQnUyY=.aa5c72c3-6767-41dd-8dae-45ff9a9e4884@github.com> Message-ID: > Please review this change to permit the use of `noexcept` under certain > circumstances in HotSpot code. > > http://wg21.link/n3050 > > Testing: > > JDK-8316930 (HotSpot should use noexcept instead of throw()) showed what the > conversion would look like. It will need to be brought up to current mainline, > possibly with modifications. > > This is a modification of the Style Guide, so rough consensus among the > HotSpot Group members is required to make this change. Only Group members > should vote for approval (via the github PR), though reasoned objections or > comments from anyone will be considered. A decision on this proposal will not > be made before Friday 16-June-2025 at 12h00 UTC. > > Since we're piggybacking on github PRs here, please use the PR review process > to approve (click on Review Changes > Approve), rather than sending a "vote: > yes" email reply that would be normal for a CFV. 
Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: more dholmes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25574/files - new: https://git.openjdk.org/jdk/pull/25574/files/e6decd1f..2bbfbeee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25574&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25574&range=01-02 Stats: 3 lines in 2 files changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25574.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25574/head:pull/25574 PR: https://git.openjdk.org/jdk/pull/25574 From kbarrett at openjdk.org Tue Jun 3 11:01:06 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 3 Jun 2025 11:01:06 GMT Subject: RFR: 8255082: HotSpot Style Guide should permit noexcept [v2] In-Reply-To: References: <-uPcWRhBsfKiRl5wRkLQ7YaAH4OCOlT0_ettXJQnUyY=.aa5c72c3-6767-41dd-8dae-45ff9a9e4884@github.com> Message-ID: On Mon, 2 Jun 2025 21:07:34 GMT, David Holmes wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> dholmes review > > doc/hotspot-style.html line 1121: > >> 1119:
  • Only the argument-less form of noexcept exception >> 1120: specifications are permitted. noexcept exception >> 1121: specifications with arguments are forbidden.
> > I was suggesting dropping the second sentence as it is implied by the first. Oh, I see. Sure. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25574#discussion_r2123457546 From jbechberger at openjdk.org Tue Jun 3 11:32:18 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 3 Jun 2025 11:32:18 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v31] In-Reply-To: References: <_4vRA_P9_dLG022vs8ZinaZmqC48drRAwdOSiDG9Wjk=.25880197-6c87-4faf-8259-12d6c0f10f2e@github.com> Message-ID: On Tue, 3 Jun 2025 07:36:06 GMT, David Holmes wrote: >> Johannes Bechberger has updated the pull request incrementally with three additional commits since the last revision: >> >> - Update src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp >> >> Co-authored-by: Aleksey Shipilëv >> - Update src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp >> >> Co-authored-by: Aleksey Shipilëv >> - Small fixes > >> Regarding [#25302 (comment)](https://github.com/openjdk/jdk/pull/25302#discussion_r2119984783) >> >> ``` >> raw_thread == nullptr >> ``` >> >> This seems to happen rarely on (abrupt) shutdowns. I attached an hs_err file: [hs_err_pid1688961.log](https://github.com/user-attachments/files/20563594/hs_err_pid1688961.log) > > That is interesting. The signal appears to be being handled on an unattached thread during shutdown, and there is no stack left to show any VM involvement. Possibly we need to block the signal as part of thread termination, before we clear the current thread. Regarding the acquire-release semantics (cc @dholmes-ora): I currently use them because they work and are fast enough. Using weaker semantics would be a good optimization, but I would abstain from it for now due to time constraints. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25302#issuecomment-2934793239 From jbechberger at openjdk.org Tue Jun 3 11:42:27 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 3 Jun 2025 11:42:27 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v34] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... different chunk-sizes Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: - Use store-release semantics - Log error when timer_create fails ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/ef9f9cd1..93b5a189 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=33 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=32-33 Stats: 10 lines in 3 files changed: 1 ins; 1 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From rehn at openjdk.org Tue Jun 3 11:52:43 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 3 Jun 2025 11:52:43 GMT Subject: RFR: 8356159: RISC-V: Add Zabha [v12] In-Reply-To: References: Message-ID: > Hi, please consider. > > This adds the byte and halfword atomic memory operations (Zabha) - https://github.com/riscv/riscv-zabha. > All amo-instructions, except load-reserve and store-conditional, can also be performed on natural aligned half-words and bytes. (i.e. 
the extension do not add lr.h/b or sc.h/b) This includes amocas if zacas extension is present. > > The majority of this patch is to support amocas.h/b. We are now starting to really feel the pain of all these extensions, as CAS:ing 16/8-bits can now be done in three different ways: > - lr.w/sc.w 'narrow' CAS (no extension) > - amocas.w 'narrow' CAS (Zacas) > - amocas.h/b (Zacas + Zabha) > > There is no hwprobe support yet. > > Ran t1-3 with Zacas+Zabha and t1 without Zabha in qemu. > > Thanks, Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 21 additional commits since the last revision: - Merge branch 'master' into 8356159 - Set ins cost to 2xVOLA for cmpxchg - Merge branch 'master' into 8356159 - Merge branch 'master' into 8356159 - ins cost fixes, print fixes - Merge branch 'master' into 8356159 - Reg limits fixed - Merge branch 'master' into 8356159 - Fixed reg selection - More indention - ... 
and 11 more: https://git.openjdk.org/jdk/compare/2fdd35a2...cc3b8ff7 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25252/files - new: https://git.openjdk.org/jdk/pull/25252/files/b496c299..cc3b8ff7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25252&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25252&range=10-11 Stats: 27302 lines in 350 files changed: 7274 ins; 12599 del; 7429 mod Patch: https://git.openjdk.org/jdk/pull/25252.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25252/head:pull/25252 PR: https://git.openjdk.org/jdk/pull/25252 From rehn at openjdk.org Tue Jun 3 11:52:44 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 3 Jun 2025 11:52:44 GMT Subject: RFR: 8356159: RISC-V: Add Zabha [v11] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 09:51:46 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4151: >> >>> 4149: zext(prev, prev, 32); >>> 4150: break; >>> 4151: case int16: >> >> The call site of `atomic_cas` is only guaranteed by `UseZacas`. Do we need extra checking for `UseZabha` if the operand size is `int16` or `int8`? > > If nothing else the assembler always checks: > > void amo_base(Register Rd, Register Rs1, uint8_t Rs2, Aqrl memory_order = aqrl) { > assert(width > AMO_WIDTH_HALFWORD || UseZabha, "Must be"); > assert(funct5 != AMO_CAS || UseZacas, "Must be"); > > > I'll have a look! When we call amocas with unknown size (i.e. not hardcoded int64/int32) we have this assert in the caller: `assert((UseZacas && UseZabha) || (size != int8 && size != int16), "unsupported operand size");` So it seems like we should be fine, no? 
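The three ways of doing a 16/8-bit CAS listed in the PR description, together with the caller-side assert quoted above, can be sketched as a small selection function. This is a hypothetical illustration and not code from `macroAssembler_riscv.cpp`: the enums, `select_narrow_cas`, `amocas_size_supported`, and the plain `bool` parameters standing in for the `UseZacas`/`UseZabha` flags are all made up here, and the selection order is an assumption.

```cpp
#include <cassert>

// Operand sizes, mirroring the int8/int16/int32/int64 names used in the thread.
enum class OperandSize { int8, int16, int32, int64 };

// The three strategies for a byte/halfword CAS named in the PR description.
enum class NarrowCasKind {
  LrScWord,     // lr.w/sc.w 'narrow' CAS (no extension)
  AmocasWord,   // amocas.w 'narrow' CAS (Zacas)
  AmocasNative  // amocas.h/b (Zacas + Zabha)
};

// Pick a strategy from the two (hypothetical) feature flags.
// Zabha alone is not enough for amocas.h/b: amocas itself comes from Zacas.
inline NarrowCasKind select_narrow_cas(bool use_zacas, bool use_zabha) {
  if (use_zacas && use_zabha) {
    return NarrowCasKind::AmocasNative;
  }
  if (use_zacas) {
    return NarrowCasKind::AmocasWord;
  }
  return NarrowCasKind::LrScWord;
}

// Mirrors the condition of the caller-side assert quoted in the thread:
// byte and halfword amocas are only valid when both flags are set.
inline bool amocas_size_supported(bool use_zacas, bool use_zabha, OperandSize size) {
  return (use_zacas && use_zabha) ||
         (size != OperandSize::int8 && size != OperandSize::int16);
}
```

Under these assumptions, a call with only the Zacas flag set falls back to the word-sized `amocas.w` path for narrow operands, which matches the condition the assert is checking.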
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25252#discussion_r2123563891 From coleenp at openjdk.org Tue Jun 3 12:02:53 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 3 Jun 2025 12:02:53 GMT Subject: RFR: 8358339: Handle MethodCounters::_method backlinks after JDK-8355003 In-Reply-To: <0FmJVTYdAq7vmsOi4pi9NRKHm5MfmIrotPucldzsZj4=.b1335bca-f439-4c18-aa1c-6c69548d095d@github.com> References: <0FmJVTYdAq7vmsOi4pi9NRKHm5MfmIrotPucldzsZj4=.b1335bca-f439-4c18-aa1c-6c69548d095d@github.com> Message-ID: On Mon, 2 Jun 2025 18:41:42 GMT, Aleksey Shipilev wrote: > Found this when reading mainline-vs-premain webrev. [JDK-8355003](https://bugs.openjdk.org/browse/JDK-8355003) introduced a backlink to `Method*` in `MethodCounters`. I believe we need to handle that backlink at least in `CodeBuffer::finalize_oop_references()`. premain does this, while mainline does not. Also, amusingly, we have `MethodCounters::is_methodCounters`, but not the super-class `Metadata::is_methodCounters`. > > I pulled in the hunks that use `is_methodCounters()` and `MethodCounters::method()` from premain into this PR. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `runtime/cds` > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` I don't think this is the right thing to do, since the Method* is already handled in finalize_oop_references since it's a backpointer. And MethodCounters shouldn't be inherited from Metadata, they're inherited from MetaspaceObj in mainline. We want to avoid virtual function pointers in this type. 
------------- PR Review: https://git.openjdk.org/jdk/pull/25599#pullrequestreview-2892022311 PR Review: https://git.openjdk.org/jdk/pull/25599#pullrequestreview-2892026544 From jbechberger at openjdk.org Tue Jun 3 12:12:53 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 3 Jun 2025 12:12:53 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v35] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... different chunk-sizes Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: - Fix include order - Tiny refactoring ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/93b5a189..71611f1e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=34 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=33-34 Stats: 19 lines in 4 files changed: 8 ins; 10 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From fjiang at openjdk.org Tue Jun 3 12:14:55 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 3 Jun 2025 12:14:55 GMT Subject: RFR: 8356159: RISC-V: Add Zabha [v11] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 11:46:27 GMT, Robbin Ehn wrote: >> If nothing else the assembler always checks: >> >> void amo_base(Register Rd, Register Rs1, uint8_t Rs2, Aqrl memory_order = aqrl) { >> assert(width > 
AMO_WIDTH_HALFWORD || UseZabha, "Must be"); >> assert(funct5 != AMO_CAS || UseZacas, "Must be"); >> >> >> I'll have a look! > > When we call amocas with unknown size (i.e. not hardcoded int64/int32) we have this assert in the caller: > `assert((UseZacas && UseZabha) || (size != int8 && size != int16), "unsupported operand size");` > > So it seems like we should be fine, no? Looks like `UseZabha` relies on `UseZacas`? The following code will call `atomic_cas` only when `UseZacas` is true even if the size is `int8` or `int16`. If that is true (maybe `(UseZacas && UseZabha)` already explained), then it makes sense. https://github.com/openjdk/jdk/blob/78a392aa3b0cda52cfacfa15250fa61010519424/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L4019-L4031 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25252#discussion_r2123625099 From aboldtch at openjdk.org Tue Jun 3 12:17:57 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 3 Jun 2025 12:17:57 GMT Subject: RFR: 8358310: ZGC: riscv, ppc ZPlatformAddressOffsetBits may return a too large value In-Reply-To: <6j_zozeh-Vwu3tRHRlJ5h_mhcMFsNm_OMUinAosz8fU=.d51c8c95-aad1-4566-a23b-8da5b521aa90@github.com> References: <6j_zozeh-Vwu3tRHRlJ5h_mhcMFsNm_OMUinAosz8fU=.d51c8c95-aad1-4566-a23b-8da5b521aa90@github.com> Message-ID: On Mon, 2 Jun 2025 08:55:02 GMT, Axel Boldt-Christmas wrote: > The way that ZPlatformAddressOffsetBits is implemented on riscv and ppc may result in a return value of 45. This is larger than the max supported value of 44 (because of other internal data structures). This was fixed in [JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275) for aarch64. > > Before [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the issue only manifested if one tried to select a heap larger than 16 TB (not supported), but after [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we try to double the heap address space when running on a NUMA machine. 
So we may now encounter this bug for heaps larger than 8 TB (which is supported). > > While ZPlatformAddressOffsetBits needs an overhaul. (It was written for non-generational ZGC where we had the three color bits inside the address.) The proposal is that we solve this for ppc and riscv by doing the same thing we did for aarch64 in [JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275) Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25578#issuecomment-2934956974 From aboldtch at openjdk.org Tue Jun 3 12:17:58 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 3 Jun 2025 12:17:58 GMT Subject: Integrated: 8358310: ZGC: riscv, ppc ZPlatformAddressOffsetBits may return a too large value In-Reply-To: <6j_zozeh-Vwu3tRHRlJ5h_mhcMFsNm_OMUinAosz8fU=.d51c8c95-aad1-4566-a23b-8da5b521aa90@github.com> References: <6j_zozeh-Vwu3tRHRlJ5h_mhcMFsNm_OMUinAosz8fU=.d51c8c95-aad1-4566-a23b-8da5b521aa90@github.com> Message-ID: On Mon, 2 Jun 2025 08:55:02 GMT, Axel Boldt-Christmas wrote: > The way that ZPlatformAddressOffsetBits is implemented on riscv and ppc may result in a return value of 45. This is larger than the max supported value of 44 (because of other internal data structures). This was fixed in [JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275) for aarch64. > > Before [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) the issue only manifested if one tried to select a heap larger than 16 TB (not supported), but after [JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441) we try to double the heap address space when running on a NUMA machine. So we may now encounter this bug for heaps larger than 8 TB (which is supported). > > While ZPlatformAddressOffsetBits needs an overhaul. (It was written for non-generational ZGC where we had the three color bits inside the address.) 
The proposal is that we solve this for ppc and riscv by doing the same thing we did for aarch64 in [JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275) This pull request has now been integrated. Changeset: 46183742 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/4618374269e8636c772d921ad0c2c2d9e5e3e643 Stats: 10 lines in 2 files changed: 4 ins; 0 del; 6 mod 8358310: ZGC: riscv, ppc ZPlatformAddressOffsetBits may return a too large value Reviewed-by: eosterlund, mdoerr, fyang ------------- PR: https://git.openjdk.org/jdk/pull/25578 From egahlin at openjdk.org Tue Jun 3 12:19:04 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Tue, 3 Jun 2025 12:19:04 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v10] In-Reply-To: References: Message-ID: <8ESOaNI_qHLzLquiZT7RZR43lit-o8_5rTky1nJFjH4=.a81b8882-1470-4f76-8c9a-cdc2a7b50070@github.com> On Mon, 26 May 2025 15:42:57 GMT, Erik Gahlin wrote: >>> This is added automatically. If I add "(Experimental)" to the title, then I get "X (Experimental) (Experimental)" >> >> Sweet. >> >>> I'm unsure how to implement this using the SQL version that is used for the views >> >> I will see if I can create an example with some other events that show the syntax, and then you can fill in the CPU-Time events. > >> I will see if I can create an example with some other events that show the syntax, and then you can fill in the CPU-Time events. 
> > I have a Mac, so I could not try it with an actual recording, but something like this: > > [application.cpu-time-statistics] > label = "CPU Time Samples Statistics" > form = "COLUMN 'Successful Samples', 'Failed Samples', 'Total Samples', 'Lost Samples' > SELECT COUNT(S.startTime), COUNT(F.startTime), Count(A.startTime), SUM(L.lostSamples) > FROM > CPUTimeSample AS S, > CPUTimeSample AS F, > CPUTimeSample AS A, > CPUTimeSampleLoss AS L > WHERE > S.failed = 'false' AND > F.failed = 'true'" > > > I removed biased, because I wonder If we should have such a field? There can be many types of biases, and the implementation may change in the future. > Hold on, shouldn't this really be "Lost"? @egahlin and @mgronlun need to chime in here. Lost might be better. I wonder if `` is needed, instead of thread = true? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25302#issuecomment-2934959447 From jbechberger at openjdk.org Tue Jun 3 12:19:05 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 3 Jun 2025 12:19:05 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v10] In-Reply-To: <8ESOaNI_qHLzLquiZT7RZR43lit-o8_5rTky1nJFjH4=.a81b8882-1470-4f76-8c9a-cdc2a7b50070@github.com> References: <8ESOaNI_qHLzLquiZT7RZR43lit-o8_5rTky1nJFjH4=.a81b8882-1470-4f76-8c9a-cdc2a7b50070@github.com> Message-ID: <1JqKzjCGoZ9N_ez_gMKOlR1lbWPte0LkQS3bSb81ua0=.3c4c006b-18c0-4444-a867-8c774899b5b9@github.com> On Tue, 3 Jun 2025 12:15:06 GMT, Erik Gahlin wrote: > I wonder if is needed, instead of thread = true? 
We had these discussions before on the old PR and ended up with eventThread (as the other events do, too). ------------- PR Comment: https://git.openjdk.org/jdk/pull/25302#issuecomment-2934963523 From mgronlun at openjdk.org Tue Jun 3 12:19:06 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 3 Jun 2025 12:19:06 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v35] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 12:12:53 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Fix include order > - Tiny refactoring src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 67: > 65: } > 66: > 67: if (is_excluded(jt)) { I think this should move before the jt->is_exiting() check; excluded is much, much more common than exiting... 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2123633391 From shade at openjdk.org Tue Jun 3 12:19:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jun 2025 12:19:53 GMT Subject: RFR: 8358339: Handle MethodCounters::_method backlinks after JDK-8355003 In-Reply-To: References: <0FmJVTYdAq7vmsOi4pi9NRKHm5MfmIrotPucldzsZj4=.b1335bca-f439-4c18-aa1c-6c69548d095d@github.com> Message-ID: On Tue, 3 Jun 2025 11:58:29 GMT, Coleen Phillimore wrote: > I don't think this is the right thing to do, since the Method* is already handled in finalize_oop_references since it's a backpointer. Sorry, I don't understand this comment. I think there is a symmetry between `MethodCounters` and `MethodData`. Now that `MethodCounters` has the backpointer to `Method*`, like `MethodData`, it should be handled like `MethodData` everywhere? > And MethodCounters shouldn't be inherited from Metadata, they're inherited from MetaspaceObj in mainline. We want to avoid virtual function pointers in this type. Are you, perhaps, looking at an older mainline? Because in current mainline `MethodCounters` is inherited from `Metadata`: https://github.com/openjdk/jdk/blob/78a392aa3b0cda52cfacfa15250fa61010519424/src/hotspot/share/oops/methodCounters.hpp#L35 -- this was also part of [JDK-8355003](https://bugs.openjdk.org/browse/JDK-8355003). ------------- PR Comment: https://git.openjdk.org/jdk/pull/25599#issuecomment-2934965605 From mgronlun at openjdk.org Tue Jun 3 12:25:05 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 3 Jun 2025 12:25:05 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v35] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 12:12:53 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). 
This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Fix include order > - Tiny refactoring src/hotspot/share/jfr/support/jfrThreadLocal.hpp line 371: > 369: timer_t* cpu_timer() const; > 370: > 371: // The CPU time JFR lock has four different states: Only three different states now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2123651646 From jbechberger at openjdk.org Tue Jun 3 12:29:48 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 3 Jun 2025 12:29:48 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v36] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... 
different chunk-sizes Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: - CPUTimeSampleLoss -> CPUTimeSamplesLost - Move is_excluded forward ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/71611f1e..a419daba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=35 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=34-35 Stats: 13 lines in 7 files changed: 0 ins; 1 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From azafari at openjdk.org Tue Jun 3 12:34:55 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 3 Jun 2025 12:34:55 GMT Subject: RFR: 8357826: Avoid running some jtreg tests when asan is configured [v2] In-Reply-To: References: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> Message-ID: <6HSruHtZNPOZJp4vNFnwMns6-_rP_MEHtnnvAP7S5QU=.e91023a2-089c-4541-86a5-ae8d4adeb99d@github.com> On Mon, 2 Jun 2025 08:07:38 GMT, Matthias Baesken wrote: >> There are a couple of jtreg tests, especially in the HS area, with very special assumptions about memory layout/sizes . >> Those fail when the address sanitizer is configured ( --enable-asan ). >> The change adds a way to tag those tests with 'requires' so that they can be avoided easily when running jtreg tests with ASAN enabled. >> Adjusting the tests for "pleasing" the sanitizer is not always desired (if possible for some tests it can be done later) . >> While at it, also same is also added for ubsan . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > TestBreakSignalThreadDump has issues with asan In ASAN built JDK, some gtests and some other JTREG tests in runtime/ErrorHandling also fail. 
Do we exclude these in another PR? or should they also be handled/excluded here? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25575#issuecomment-2935018245 From jbechberger at openjdk.org Tue Jun 3 12:35:54 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 3 Jun 2025 12:35:54 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v37] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... different chunk-sizes Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Fix tiny mistake ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/a419daba..44c37d17 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=36 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=35-36 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From mgronlun at openjdk.org Tue Jun 3 12:35:55 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 3 Jun 2025 12:35:55 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v35] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 12:12:53 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). 
>> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: > > - Fix include order > - Tiny refactoring src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 235: > 233: > 234: void JfrCPUTimeThreadSampler::on_javathread_create(JavaThread* thread) { > 235: if (thread->is_Compiler_thread()) { is_hidden_from_external_view() + is_JfrRecorderThread() instead? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2123668189 From jbechberger at openjdk.org Tue Jun 3 12:39:47 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 3 Jun 2025 12:39:47 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v38] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... 
different chunk-sizes Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Restrict threads for which timers are created ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/44c37d17..83b55f58 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=37 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=36-37 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From mgronlun at openjdk.org Tue Jun 3 12:39:49 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 3 Jun 2025 12:39:49 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v35] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 12:29:39 GMT, Markus Grönlund wrote: >> Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fix include order >> - Tiny refactoring > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 235: > >> 233: >> 234: void JfrCPUTimeThreadSampler::on_javathread_create(JavaThread* thread) { >> 235: if (thread->is_Compiler_thread()) { > > is_hidden_from_external_view() + is_JfrRecorderThread() instead? tl->is_excluded() is volatile and can change during runtime, so it's better to add a timer unconditionally there.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2123669984 From jbechberger at openjdk.org Tue Jun 3 12:39:49 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 3 Jun 2025 12:39:49 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v35] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 12:30:18 GMT, Markus Grönlund wrote: >> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 235: >> >>> 233: >>> 234: void JfrCPUTimeThreadSampler::on_javathread_create(JavaThread* thread) { >>> 235: if (thread->is_Compiler_thread()) { >> >> is_hidden_from_external_view() + is_JfrRecorderThread() instead? > > tl->is_excluded() is volatile and can change during runtime, so it's better to add a timer unconditionally there. why not just `is_excluded`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2123673323 From mgronlun at openjdk.org Tue Jun 3 12:44:09 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 3 Jun 2025 12:44:09 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v38] In-Reply-To: References: Message-ID: <23UojxlSZeRcCB38B1d2hDIcyXbgRnmbi8Vu3cfUmM4=.ba379d71-7bc7-4721-bda0-6d5f469a45f7@github.com> On Tue, 3 Jun 2025 12:39:47 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ...
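A minimal sketch of the reasoning in this comment, under stated assumptions: because the exclusion state can flip while the thread is alive, a decision taken once at thread creation can go stale, so the timer is armed unconditionally and the exclusion state is consulted on every tick instead. `ThreadModel`, `on_thread_create` and `on_timer_tick` are hypothetical names made up for this illustration and do not correspond to the HotSpot implementation.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of a thread's sampling state; not HotSpot code.
struct ThreadModel {
  std::atomic<bool> excluded{false};  // may flip at any time during runtime
  bool has_timer = false;
  int samples = 0;
};

// At thread creation: always arm the timer, whatever the current
// exclusion state happens to be.
inline void on_thread_create(ThreadModel& t) {
  t.has_timer = true;
}

// On each timer tick: read the *current* exclusion state, so a thread
// that was excluded at creation still gets sampled once it is included.
inline void on_timer_tick(ThreadModel& t) {
  if (!t.has_timer) return;
  if (t.excluded.load(std::memory_order_relaxed)) return;
  ++t.samples;
}
```

Had `on_thread_create` skipped the timer for a thread that was excluded at that moment, the thread would never be sampled after a later inclusion; that is the failure mode the unconditional timer avoids in this model.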
different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Restrict threads for which timers are created src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 63: > 61: assert(raw_thread->is_Java_thread(), "invariant"); > 62: JavaThread* jt = JavaThread::cast(raw_thread); > 63: if (is_excluded(jt)) { and now: if (is_excluded(jt) || jt->is_exiting()) { return nullptr; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2123692657 From jbechberger at openjdk.org Tue Jun 3 12:47:48 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 3 Jun 2025 12:47:48 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v39] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... 
different chunk-sizes Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Tiny refactoring ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/83b55f58..2b8c6db4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=38 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=37-38 Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From mgronlun at openjdk.org Tue Jun 3 12:51:13 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 3 Jun 2025 12:51:13 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v38] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 12:39:47 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... 
different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Restrict threads for which timers are created src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 327: > 325: JfrThreadLocal* tl = jt->jfr_thread_local(); > 326: if (tl->wants_async_processing_of_cpu_time_jfr_requests()) { > 327: if (!jt->has_last_Java_frame() || jt->thread_state() != _thread_in_native || !tl->try_acquire_cpu_time_jfr_dequeue_lock()) { I recommend this order for higher probability: 1. jt->thread_state() != _thread_in_native 2. !tl->try_acquire_cpu_time_jfr_dequeue_lock() 3. !jt->has_last_Java_frame() You need to restructure of course, to get the unlocking correct. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2123711980 From coleenp at openjdk.org Tue Jun 3 12:55:05 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 3 Jun 2025 12:55:05 GMT Subject: RFR: 8352075: Perf regression accessing fields [v21] In-Reply-To: References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> Message-ID: On Tue, 3 Jun 2025 07:16:47 GMT, Radim Vansa wrote: >> This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 trying to reduce the performance regression in some scenarios introduced in https://bugs.openjdk.org/browse/JDK-8292818 . Based both on performance and memory consumption it is a (better) alternative to https://github.com/openjdk/jdk/pull/24713 . >> >> This PR optimizes local field lookup in classes with more than 16 fields; rather than sequentially iterating through all fields during lookup we sort the fields based on the field name. The stream includes extra table after the field information: for field at position 16, 32 ... we record the (variable-length-encoded) offset of the field info in this stream. 
On field lookup, rather than iterating through all fields, we iterate through this table, resolve names for given fields and continue field-by-field iteration only after the last record (hence at most 16 fields). >> >> In classes with <= 16 fields this PR reduces the memory consumption by 1 byte that was left with value 0 at the end of stream. In classes with > 16 fields we add extra 4 bytes with offset of the table, and the table contains one varint for each 16 fields. The terminal byte is not used either. >> >> My measurements on the attached reproducer >> >> hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC' >> Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC >> Time (mean ± σ): 51.3 ms ± 2.8 ms [User: 44.7 ms, System: 13.7 ms] >> Range (min … max): 45.1 ms … 53.9 ms 100 runs >> >> hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC' >> Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC >> Time (mean ± σ): 78.2 ms ± 1.0 ms [User: 74.6 ms, System: 17.3 ms] >> Range (min … max): 73.8 ms … 79.7 ms 100 runs >> >> (the jdk25-master above already contains JDK-8353175) >> >> hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC' >> Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC >> Time (mean ± σ): 38.5 ms ± 0.5 ms [User: 34.4 ms, System: 17.3 ms] >> Range (min … max): 37.7 ms … 42.1 ms 100 runs >> >> While https://github.com/openjdk/jdk/pull/24713 returned the performance to previous levels, this PR improves it by 25% compared to JDK 17 (which does not contain the regression)! This time, the undisclosed production-grade reproducer shows even higher improvement: >> >> JDK 17: 1.6 s >> JDK 21 (no patches): 22 s >> JDK25-master: 12.3 s >> JDK25-this-pr: 0.5 s > > Radim Vansa has updated the pull request incrementally with three additional commits since the last revision: > > - Moved jtreg test > - Improved documentation > - Fix coding style (asterisk placement) Thanks for these coding style fixes.
Some IDEs choose the other placement for the asterisk which makes it annoying. Thanks for moving the test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24847#issuecomment-2935098090 From jbechberger at openjdk.org Tue Jun 3 13:00:06 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 3 Jun 2025 13:00:06 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v40] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... different chunk-sizes Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Reorder condition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/2b8c6db4..56ce2b05 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=39 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=38-39 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From mgronlun at openjdk.org Tue Jun 3 13:00:06 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 3 Jun 2025 13:00:06 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v38] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 12:39:47 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for 
JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Restrict threads for which timers are created src/hotspot/share/jfr/periodic/sampling/jfrThreadSampling.cpp line 479: > 477: > 478: // Entry point for a thread that has been sampled in native code and has a pending JFR CPU time request. > 479: void JfrThreadSampling::process_cpu_time_request(JavaThread* jt, JfrThreadLocal* tl, Thread* current, bool lock) { Can you move this up to be co-located with "drain_enqueued_cpu_time_requests"? Thanks. src/hotspot/share/jfr/periodic/sampling/jfrThreadSampling.hpp line 40: > 38: public: > 39: static void process_sample_request(JavaThread* jt, bool has_cpu_time_sample_request); > 40: static void process_cpu_time_request(JavaThread* jt, JfrThreadLocal* tl, Thread* current, bool lock); Put this under private and add JfrCPUTimeThreadSampler as a friend (like above with JfrSamplerThread and "process_native_sample_requests"). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2123728400 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2123726144 From rehn at openjdk.org Tue Jun 3 13:17:56 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 3 Jun 2025 13:17:56 GMT Subject: RFR: 8356159: RISC-V: Add Zabha [v11] In-Reply-To: References: Message-ID: <_RXza4TCeXVWANzCikEnlud-YBc00BEBvCCH_OiHimg=.f5ebd65e-097c-44fd-b952-fa6d36a1e47f@github.com> On Tue, 3 Jun 2025 12:10:07 GMT, Feilong Jiang wrote: >> When we call amocas with unknown size (i.e.
not hardcoded int64/int32) we have this assert in the caller: >> `assert((UseZacas && UseZabha) || (size != int8 && size != int16), "unsupported operand size");` >> >> So it seems like we should be fine, no? > > Looks like `UseZabha` relies on `UseZacas`? The following code will call `atomic_cas` only when `UseZacas` is true even if the size is `int8` or `int16`. If that is true (maybe `(UseZacas && UseZabha)` already explained), then it makes sense. > > https://github.com/openjdk/jdk/blob/78a392aa3b0cda52cfacfa15250fa61010519424/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L4019-L4031 You can't use that method for int8/int16 if you don't have UseZabha (and Zacas). You must call the cmpxchg_narrow(). The reason why we have two different method is the number of registers needed, i.e. cmpxchg_narrow requires scratch registers. So when you don't have UseZabha you should not be calling this method at all for int8/int16. There is an assert above that checks: `assert((UseZacas && UseZabha) || (size != int8 && size != int16), "unsupported operand size");` The relationship is: UseZacas=false and UseZabha=false => LR/SC and narrow LR/SC for sub word size. UseZacas=true and UseZabha=false => amocas and narrow amocas for sub word size. UseZacas=false and UseZabha=true => LR/SC and narrow LR/SC for sub word size. UseZacas=true and UseZabha=true => amocas and amocas for sub word size. There is no LR/SC for sub word sizes, int8/int16, thus without Zacas they always use narrow LR/SC. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25252#discussion_r2123776682 From fjiang at openjdk.org Tue Jun 3 13:24:55 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 3 Jun 2025 13:24:55 GMT Subject: RFR: 8356159: RISC-V: Add Zabha [v12] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 11:52:43 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> This adds the byte and halfword atomic memory operations (Zabha) - https://github.com/riscv/riscv-zabha. 
>> All amo-instructions, except load-reserve and store-conditional, can also be performed on natural aligned half-words and bytes. (i.e. the extension do not add lr.h/b or sc.h/b) This includes amocas if zacas extension is present. >> >> The majority of this patch is to support amocas.h/b. We are now starting to really feel the pain of all these extensions, as CAS:ing 16/8-bits can now be done in three different ways: >> - lr.w/sc.w 'narrow' CAS (no extension) >> - amocas.w 'narrow' CAS (Zacas) >> - amocas.h/b (Zacas + Zabha) >> >> There is no hwprobe support yet. >> >> Ran t1-3 with Zacas+Zabha and t1 without Zabha in qemu. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 21 additional commits since the last revision: > > - Merge branch 'master' into 8356159 > - Set ins cost to 2xVOLA for cmpxchg > - Merge branch 'master' into 8356159 > - Merge branch 'master' into 8356159 > - ins cost fixes, print fixes > - Merge branch 'master' into 8356159 > - Reg limits fixed > - Merge branch 'master' into 8356159 > - Fixed reg selection > - More indention > - ... and 11 more: https://git.openjdk.org/jdk/compare/1d76abc0...cc3b8ff7 Looks good! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/25252#pullrequestreview-2892342660 From fjiang at openjdk.org Tue Jun 3 13:24:56 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 3 Jun 2025 13:24:56 GMT Subject: RFR: 8356159: RISC-V: Add Zabha [v11] In-Reply-To: <_RXza4TCeXVWANzCikEnlud-YBc00BEBvCCH_OiHimg=.f5ebd65e-097c-44fd-b952-fa6d36a1e47f@github.com> References: <_RXza4TCeXVWANzCikEnlud-YBc00BEBvCCH_OiHimg=.f5ebd65e-097c-44fd-b952-fa6d36a1e47f@github.com> Message-ID: On Tue, 3 Jun 2025 13:14:26 GMT, Robbin Ehn wrote: >> Looks like `UseZabha` relies on `UseZacas`? 
The following code will call `atomic_cas` only when `UseZacas` is true even if the size is `int8` or `int16`. If that is true (maybe `(UseZacas && UseZabha)` already explained), then it makes sense. >> >> https://github.com/openjdk/jdk/blob/78a392aa3b0cda52cfacfa15250fa61010519424/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L4019-L4031 > > You can't use that method for int8/int16 if you don't have UseZabha (and Zacas). > You must call the cmpxchg_narrow(). > > The reason why we have two different method is the number of registers needed, i.e. cmpxchg_narrow requires scratch registers. > > So when you don't have UseZabha you should not be calling this method at all for int8/int16. > > There is an assert above that checks: > `assert((UseZacas && UseZabha) || (size != int8 && size != int16), "unsupported operand size");` > > The relationship is: > > UseZacas=false and UseZabha=false => LR/SC and narrow LR/SC for sub word size. > UseZacas=true and UseZabha=false => amocas and narrow amocas for sub word size. > UseZacas=false and UseZabha=true => LR/SC and narrow LR/SC for sub word size. > UseZacas=true and UseZabha=true => amocas and amocas for sub word size. > > > There is no LR/SC for sub word sizes, int8/int16, thus without Zacas they always use narrow LR/SC. Ah, I missed `(size != int8 && size != int16)` assertion. So it should be fine, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25252#discussion_r2123799385 From jbechberger at openjdk.org Tue Jun 3 13:33:29 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 3 Jun 2025 13:33:29 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v41] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). 
This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... different chunk-sizes Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Make process_cpu_time_request private and move up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/56ce2b05..7561d512 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=40 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=39-40 Stats: 17 lines in 2 files changed: 9 ins; 8 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From asmehra at openjdk.org Tue Jun 3 13:57:51 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 3 Jun 2025 13:57:51 GMT Subject: RFR: 8358289: [asan] runtime/cds/appcds/aotCode/AOTCodeFlags.java reports heap-buffer-overflow in ArchiveBuilder [v2] In-Reply-To: References: Message-ID: <2eSFIbD9m61pBTA64R6x5UyMn5eIjRJQZh48l3sh7yo=.f2b10311-77df-4007-b939-9eae802264b8@github.com> On Tue, 3 Jun 2025 10:47:25 GMT, Vladimir Kozlov wrote: >> There is difference between AdapterFingerPrint allocation size [compute_size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2227) which may not be aligned to HeapWord size and [size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2421) used for copying during AOT cache build which is aligned and can be bigger than allocation size. >> >> I added asserts to `AdapterFingerPrint` and `AdapterHandlerEntry` to make sure sizes are correct. Both are used in AOT cache build. 
>> >> I also moved `FreeHeap()` from `~AdapterFingerPrint()` to enforce the comment and simplify executed code. >> >> Thanks to @MBaesken for finding the issue and @iklam for pointing the cause. >> >> Testing tier1-3, xcomp, stress. Higher tiers are still running. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused argument Marked as reviewed by asmehra (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25604#pullrequestreview-2892508069 From jbechberger at openjdk.org Tue Jun 3 14:09:29 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 3 Jun 2025 14:09:29 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v42] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... 
different chunk-sizes Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Rename autoadapt ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/7561d512..ae55610c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=41 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=40-41 Stats: 41 lines in 8 files changed: 0 ins; 0 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From dchuyko at openjdk.org Tue Jun 3 14:09:35 2025 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Tue, 3 Jun 2025 14:09:35 GMT Subject: RFR: 8337666: AArch64: SHA3 GPR intrinsic [v6] In-Reply-To: References: Message-ID: > This is an implementation of SHA3 intrinsics for AArch64 that operates GPRs. It follows the Java implementation algorithm but eagerly uses available registers. For example, FP+R18 are used when it's allowed. On simpler cores like RPi3 or Surface Pro it is 23-53% faster than C2 compiled version; on Graviton 3 it is 8-14% faster than C2 compiled version (which is faster than the current intrinsic); on Apple Silicon it is faster than C2 compiled version but slower than the ARMv8.2-SHA intrinsic. Improvements on a particular CPU depend on the input length. 
For instance, for Graviton 2: > > > Benchmark (ops/ms) (digesterName) (length) G2 > MessageDigests.digest SHA3-256 64 28.28% > MessageDigests.digest SHA3-256 16384 53.58% > MessageDigests.digest SHA3-512 64 27.97% > MessageDigests.digest SHA3-512 16384 43.90% > MessageDigests.getAndDigest SHA3-256 64 26.18% > MessageDigests.getAndDigest SHA3-256 16384 52.82% > MessageDigests.getAndDigest SHA3-512 64 24.73% > MessageDigests.getAndDigest SHA3-512 16384 44.31% > > > (results for intermediate input lengths look like steps) > > On Graviton 4 there is still a noticeable difference between the proposed implementation and C2 generated code: > > > Benchmark (digesterName) (length) Pct > MessageDigests.digest SHA3-256 64 8.3% > MessageDigests.digest SHA3-256 16384 11% > MessageDigests.digest SHA3-512 64 8.4% > MessageDigests.digest SHA3-512 16384 11.5% > MessageDigests.getAndDigest SHA3-256 64 7.2% > MessageDigests.getAndDigest SHA3-256 16384 11% > MessageDigests.getAndDigest SHA3-512 64 7.3% > MessageDigests.getAndDigest SHA3-512 16384 11.6% > > > and the version that uses the extension is ~1.8x slower than C2 > > Existing intrinsic implementation is put under a flag `UseSIMDForSHA3Intrinsic` which is on by default where the intrinsic is enabled currently. > > Sanity tests were modified to cover new intrinsic variants (`-XX:-UseSIMDForSHA3Intrinsic -XX:+-PreserveFramePointer`) on aarch64 hw. Existing test cases where intrinsic is enabled are executed with `-XX:+IgnoreUnrecognizedVMOptions -XX:+UseSIMDForSHA3Intrinsic`, on platforms where the sha3 extension is missing they still are cut off by isSHA3IntrinsicAvailable() predicate. > > The original PR https://github.com/openjdk/jdk/pull/20422 has been auto-closed and the branch has been re-created on top of the new master. Dmitry Chuyko has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: - Merge branch 'openjdk:master' into JDK-8337666 - Update src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp Co-authored-by: Andrew Haley - Update src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp Co-authored-by: Andrew Haley - Merge branch 'openjdk:master' into JDK-8337666 - Assert message - Copyright year - Review suggestions - Merge master - Delete empty line - SHA3 GPR intrinsic & tests ------------- Changes: https://git.openjdk.org/jdk/pull/24260/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24260&range=05 Stats: 749 lines in 6 files changed: 743 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24260.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24260/head:pull/24260 PR: https://git.openjdk.org/jdk/pull/24260 From aph at openjdk.org Tue Jun 3 14:24:59 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 3 Jun 2025 14:24:59 GMT Subject: RFR: 8337666: AArch64: SHA3 GPR intrinsic [v4] In-Reply-To: References: <47P15HTCeTU93mVEKekG-smYjt5ebvSMJ8bgbG28vEI=.5f49753a-7ff5-4154-80e2-cd4fc996119f@github.com> Message-ID: On Sat, 31 May 2025 08:39:36 GMT, Andrew Haley wrote: >> Dmitry Chuyko has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: >> >> - Merge branch 'openjdk:master' into JDK-8337666 >> - Assert message >> - Copyright year >> - Review suggestions >> - Merge master >> - Delete empty line >> - SHA3 GPR intrinsic & tests > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 331: > >> 329: >> 330: inline void rol(Register Rd, Register Rn, unsigned imm) { >> 331: extr(Rd, Rn, Rn, ((64 - imm) & 63)); > > Suggestion: > > extr(Rd, Rn, Rn, (64 - imm)); > > It's better to catch an out-of-range immediate value. `rolw` too. 
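
The rotate-left-via-`extr` idiom under review can be modeled in portable C++ to show why dropping the `& 63` mask is useful. This is only a sketch under stated assumptions: `extr64` and `rol64` are hypothetical stand-ins, not the HotSpot assembler routines, and the `assert` plays the role of an immediate range check. Without the mask, an out-of-range rotate amount reaches the check instead of being silently wrapped.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of AArch64 "EXTR Xd, Xn, Xm, #lsb": extract 64 bits
// starting at bit `lsb` from the 128-bit concatenation hi:lo.
static uint64_t extr64(uint64_t hi, uint64_t lo, unsigned lsb) {
  assert(lsb < 64 && "lsb out of range");  // stands in for the encoder's check
  if (lsb == 0) {
    return lo;  // avoid an undefined 64-bit shift below
  }
  return (lo >> lsb) | (hi << (64 - lsb));
}

// Rotate left by `imm`, with no masking of the computed immediate.
// In this model only 1..63 is accepted: imm == 0 maps to lsb == 64 and
// imm > 64 wraps to a huge unsigned value, so both trip the assert.
static uint64_t rol64(uint64_t x, unsigned imm) {
  return extr64(x, x, 64 - imm);
}
```

With a masked immediate, `rol64(x, 64)` or `rol64(x, 200)` would quietly rotate by some other amount; in the unmasked model the mistake is caught at the call that produced it.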
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24260#discussion_r2124008640

From mgronlun at openjdk.org Tue Jun 3 16:25:07 2025
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Tue, 3 Jun 2025 16:25:07 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v35]
In-Reply-To:
References:
Message-ID:

On Tue, 3 Jun 2025 12:31:53 GMT, Johannes Bechberger wrote:

>> tl->is_excluded() is volatile and can change during runtime, so it's better to add a timer unconditionally there.
>
> why not just `is_excluded`?

because tl->is_excluded() can get included and excluded many times during runtime. It's not a static property.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2124366551

From dchuyko at openjdk.org Tue Jun 3 16:31:08 2025
From: dchuyko at openjdk.org (Dmitry Chuyko)
Date: Tue, 3 Jun 2025 16:31:08 GMT
Subject: RFR: 8337666: AArch64: SHA3 GPR intrinsic [v7]
In-Reply-To:
References:
Message-ID: <3hGFnUsyrN809lwWuqr7dyxfoCm0F2ILSB-yJV5Hfvo=.1a2048d4-d131-4354-a629-d75f206dda42@github.com>

> This is an implementation of SHA3 intrinsics for AArch64 that operates GPRs. It follows the Java implementation algorithm but eagerly uses available registers. For example, FP+R18 are used when it's allowed. On simpler cores like RPi3 or Surface Pro it is 23-53% faster than C2 compiled version; on Graviton 3 it is 8-14% faster than C2 compiled version (which is faster than the current intrinsic); on Apple Silicon it is faster than C2 compiled version but slower than the ARMv8.2-SHA intrinsic. Improvements on a particular CPU depend on the input length.
For instance, for Graviton 2: > > > Benchmark (ops/ms) (digesterName) (length) G2 > MessageDigests.digest SHA3-256 64 28.28% > MessageDigests.digest SHA3-256 16384 53.58% > MessageDigests.digest SHA3-512 64 27.97% > MessageDigests.digest SHA3-512 16384 43.90% > MessageDigests.getAndDigest SHA3-256 64 26.18% > MessageDigests.getAndDigest SHA3-256 16384 52.82% > MessageDigests.getAndDigest SHA3-512 64 24.73% > MessageDigests.getAndDigest SHA3-512 16384 44.31% > > > (results for intermediate input lengths look like steps) > > On Graviton 4 there is still a noticeable difference between the proposed implementation and C2 generated code: > > > Benchmark (digesterName) (length) Pct > MessageDigests.digest SHA3-256 64 8.3% > MessageDigests.digest SHA3-256 16384 11% > MessageDigests.digest SHA3-512 64 8.4% > MessageDigests.digest SHA3-512 16384 11.5% > MessageDigests.getAndDigest SHA3-256 64 7.2% > MessageDigests.getAndDigest SHA3-256 16384 11% > MessageDigests.getAndDigest SHA3-512 64 7.3% > MessageDigests.getAndDigest SHA3-512 16384 11.6% > > > and the version that uses the extension is ~1.8x slower than C2 > > Existing intrinsic implementation is put under a flag `UseSIMDForSHA3Intrinsic` which is on by default where the intrinsic is enabled currently. > > Sanity tests were modified to cover new intrinsic variants (`-XX:-UseSIMDForSHA3Intrinsic -XX:+-PreserveFramePointer`) on aarch64 hw. Existing test cases where intrinsic is enabled are executed with `-XX:+IgnoreUnrecognizedVMOptions -XX:+UseSIMDForSHA3Intrinsic`, on platforms where the sha3 extension is missing they still are cut off by isSHA3IntrinsicAvailable() predicate. > > The original PR https://github.com/openjdk/jdk/pull/20422 has been auto-closed and the branch has been re-created on top of the new master. 
Dmitry Chuyko has updated the pull request incrementally with one additional commit since the last revision: No imm masking in rolw ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24260/files - new: https://git.openjdk.org/jdk/pull/24260/files/cd24df67..d9cf5135 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24260&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24260&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24260.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24260/head:pull/24260 PR: https://git.openjdk.org/jdk/pull/24260 From lmesnik at openjdk.org Tue Jun 3 16:46:55 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 3 Jun 2025 16:46:55 GMT Subject: RFR: 8357826: Avoid running some jtreg tests when asan is configured [v2] In-Reply-To: References: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> Message-ID: On Mon, 2 Jun 2025 08:07:38 GMT, Matthias Baesken wrote: >> There are a couple of jtreg tests, especially in the HS area, with very special assumptions about memory layout/sizes . >> Those fail when the address sanitizer is configured ( --enable-asan ). >> The change adds a way to tag those tests with 'requires' so that they can be avoided easily when running jtreg tests with ASAN enabled. >> Adjusting the tests for "pleasing" the sanitizer is not always desired (if possible for some tests it can be done later) . >> While at it, also same is also added for ubsan . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > TestBreakSignalThreadDump has issues with asan Thank you for implementing exclusion this way. I'll approve PR once you address feedback about commenting. 
------------- PR Review: https://git.openjdk.org/jdk/pull/25575#pullrequestreview-2893319737 From cslucas at openjdk.org Tue Jun 3 17:06:56 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 3 Jun 2025 17:06:56 GMT Subject: RFR: 8357396: Refactor nmethod::make_not_entrant to use Enum instead of "const char*" [v2] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 08:40:41 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/code/nmethod.hpp line 498: >> >>> 496: >>> 497: >>> 498: static const char* NMethodChangeReason_to_string(NMethodChangeReason reason) { >> >> Uh, use a switch: >> >> >> switch(reason) { >> case C1_deoptimize: return "C1 deoptimized"; >> case C1_codepatch: return "C1 code patch"; >> ... >> default: >> assert(false, "Unhandled reason"); >> return "Unknown"; >> } > > Also, names: `change_reason_to_string(ChangeReason reason)`. Now that enum is scoped to `nmethod`, there is no need for `NMethod` prefix. Makes sense, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25338#discussion_r2124446618 From dchuyko at openjdk.org Tue Jun 3 17:09:57 2025 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Tue, 3 Jun 2025 17:09:57 GMT Subject: RFR: 8337666: AArch64: SHA3 GPR intrinsic [v4] In-Reply-To: References: <47P15HTCeTU93mVEKekG-smYjt5ebvSMJ8bgbG28vEI=.5f49753a-7ff5-4154-80e2-cd4fc996119f@github.com> Message-ID: On Tue, 3 Jun 2025 14:22:10 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 331: >> >>> 329: >>> 330: inline void rol(Register Rd, Register Rn, unsigned imm) { >>> 331: extr(Rd, Rn, Rn, ((64 - imm) & 63)); >> >> Suggestion: >> >> extr(Rd, Rn, Rn, (64 - imm)); >> >> It's better to catch an out-of-range immediate value. > > `rolw` too. Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24260#discussion_r2124451203 From kvn at openjdk.org Tue Jun 3 17:18:00 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Jun 2025 17:18:00 GMT Subject: RFR: 8358339: Handle MethodCounters::_method backlinks after JDK-8355003 In-Reply-To: <0FmJVTYdAq7vmsOi4pi9NRKHm5MfmIrotPucldzsZj4=.b1335bca-f439-4c18-aa1c-6c69548d095d@github.com> References: <0FmJVTYdAq7vmsOi4pi9NRKHm5MfmIrotPucldzsZj4=.b1335bca-f439-4c18-aa1c-6c69548d095d@github.com> Message-ID: On Mon, 2 Jun 2025 18:41:42 GMT, Aleksey Shipilev wrote: > Found this when reading mainline-vs-premain webrev. [JDK-8355003](https://bugs.openjdk.org/browse/JDK-8355003) introduced a backlink to `Method*` in `MethodCounters`. I believe we need to handle that backlink at least in `CodeBuffer::finalize_oop_references()`. premain does this, while mainline does not. Also, amusingly, we have `MethodCounters::is_methodCounters`, but not the super-class `Metadata::is_methodCounters`. > > I pulled in the hunks that use `is_methodCounters()` and `MethodCounters::method()` from premain into this PR. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `runtime/cds` > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` Good. This will be needed for AOT caching Level2 C1 compiled nmethods which have profiling: https://github.com/vnkozlov/jdk/commit/46595236a88a90908a7a54e4c6bb872d634be441 ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25599#pullrequestreview-2893406165 From shade at openjdk.org Tue Jun 3 17:49:41 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jun 2025 17:49:41 GMT Subject: RFR: 8358289: [asan] runtime/cds/appcds/aotCode/AOTCodeFlags.java reports heap-buffer-overflow in ArchiveBuilder [v2] In-Reply-To: References: Message-ID: <-wJp4hkfj1YBQ4C_UjhsqBE2UkrbOafMr_bs_-v7S-A=.9801fb39-6372-4803-bbfe-fa5c3fe9ad3f@github.com> On Tue, 3 Jun 2025 10:47:25 GMT, Vladimir Kozlov wrote: >> There is difference between AdapterFingerPrint allocation size [compute_size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2227) which may not be aligned to HeapWord size and [size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2421) used for copying during AOT cache build which is aligned and can be bigger than allocation size. >> >> I added asserts to `AdapterFingerPrint` and `AdapterHandlerEntry` to make sure sizes are correct. Both are used in AOT cache build. >> >> I also moved `FreeHeap()` from `~AdapterFingerPrint()` to enforce the comment and simplify executed code. >> >> Thanks to @MBaesken for finding the issue and @iklam for pointing the cause. >> >> Testing tier1-3, xcomp, stress. Higher tiers are still running. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Remove unused argument src/hotspot/share/runtime/sharedRuntime.cpp line 2227: > 2225: } > 2226: > 2227: static int compute_size(int total_args_passed) { OK, but if the source of discrepancy is between two places computing stuff separately (inconsistently), do you want to make the computations mechanically the same? 
Something like:


static int compute_size_in_words(int total_args_passed) {
  return (int)heap_word_size(sizeof(AdapterFingerPrint) + (length(total_args_passed) * sizeof(int)));
}

static int compute_size_in_bytes(int total_args_passed) {
  return compute_size_in_words(total_args_passed) * BytesPerWord;
}


Then use `compute_size_in_words()` in the other place: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2421

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25604#discussion_r2124505187

From cslucas at openjdk.org Tue Jun 3 17:52:04 2025
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Tue, 3 Jun 2025 17:52:04 GMT
Subject: RFR: 8357396: Refactor nmethod::make_not_entrant to use Enum instead of "const char*" [v3]
In-Reply-To:
References:
Message-ID:

> Please review this refactor to transform the reasons for making an nmethod not entrant from `const char*` into enum values.
>
> Tested on Linux x64 with JTREG tier1-3 in fastdebug and release mode.
Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Address PR feedback: more refactoring / renamings ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25338/files - new: https://git.openjdk.org/jdk/pull/25338/files/b3bb4365..fa77be5c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25338&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25338&range=01-02 Stats: 102 lines in 15 files changed: 11 ins; 31 del; 60 mod Patch: https://git.openjdk.org/jdk/pull/25338.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25338/head:pull/25338 PR: https://git.openjdk.org/jdk/pull/25338 From shade at openjdk.org Tue Jun 3 17:52:05 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jun 2025 17:52:05 GMT Subject: RFR: 8357396: Refactor nmethod::make_not_entrant to use Enum instead of "const char*" [v3] In-Reply-To: References: Message-ID: <195pScH-Kh1H1JvhkwC9xLM_joDJPccyMve95BwIlzk=.22181903-b4af-4ec5-8d57-688a6ee51832@github.com> On Tue, 3 Jun 2025 17:44:30 GMT, Cesar Soares Lucas wrote: >> Please review this refactor to transform the reasons for making an nmethod not entrant from `const char*` into enum values. >> >> Tested on Linux x64 with JTREG tier1-3 in fastdebug and release mode. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Address PR feedback: more refactoring / renamings Almost there, more cosmetics. src/hotspot/share/code/nmethod.hpp line 500: > 498: static const char* change_reason_to_string(ChangeReason change_reason) { > 499: switch (change_reason) { > 500: case ChangeReason::C1_codepatch: return "C1 code patch"; Indenting: should be two spaces everywhere. Also, I think this kind of indenting forces us to re-align the switch for the largest enum label. Let's just break them. Plus, any multi-line blocks should be braced. 
So, in total:


switch (change_reason) {
  case ChangeReason::C1_codepatch:
    return "C1 code patch";
  ...
  default: {
    assert(false, "Unhandled reason");
    return "Unknown";
  }
}

src/hotspot/share/code/nmethod.hpp line 691:

> 689: // another thread performed the transition.
> 690: bool make_not_entrant(ChangeReason change_reason);
> 691: bool make_not_used() { return make_not_entrant(ChangeReason::not_used); }

Suggestion:

bool make_not_used() { return make_not_entrant(ChangeReason::not_used); }

-------------

PR Review: https://git.openjdk.org/jdk/pull/25338#pullrequestreview-2893478002
PR Review Comment: https://git.openjdk.org/jdk/pull/25338#discussion_r2124511959
PR Review Comment: https://git.openjdk.org/jdk/pull/25338#discussion_r2124514702

From kvn at openjdk.org Tue Jun 3 17:58:48 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 3 Jun 2025 17:58:48 GMT
Subject: RFR: 8358289: [asan] runtime/cds/appcds/aotCode/AOTCodeFlags.java reports heap-buffer-overflow in ArchiveBuilder [v2]
In-Reply-To: <-wJp4hkfj1YBQ4C_UjhsqBE2UkrbOafMr_bs_-v7S-A=.9801fb39-6372-4803-bbfe-fa5c3fe9ad3f@github.com>
References: <-wJp4hkfj1YBQ4C_UjhsqBE2UkrbOafMr_bs_-v7S-A=.9801fb39-6372-4803-bbfe-fa5c3fe9ad3f@github.com>
Message-ID: <-yXLmyKeGt7ajGu_p3QgKPD2fD-2uSd7c8Hz-MHINMA=.504c9dc9-23ae-447d-8f26-74fc5693f30a@github.com>

On Tue, 3 Jun 2025 17:37:49 GMT, Aleksey Shipilev wrote:

>> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Remove unused argument
>
> src/hotspot/share/runtime/sharedRuntime.cpp line 2227:
>
>> 2225: }
>> 2226:
>> 2227: static int compute_size(int total_args_passed) {
>
> OK, but if the source of discrepancy is between two places computing stuff separately (inconsistently), do you want to make the computations mechanically the same?
> > Something like: > > > static int compute_size_in_words(int total_args_passed) { > return (int)heap_word_size(sizeof(AdapterFingerPrint) + (length(total_args_passed) * sizeof(int))); > } > > static int compute_size_in_bytes(int total_args_passed) { > return compute_size_in_words(total_args_passed) * BytesPerWord; > } > > > Then use `compute_size_in_words()` in the other place: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2421 Yes, I can do that. But I will pass _length which is different from total_args_passed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25604#discussion_r2124538592 From iveresov at openjdk.org Tue Jun 3 18:07:25 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 3 Jun 2025 18:07:25 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder Message-ID: Simplify KlassTrainingData constructor. The lines in question come from the old pre-CDS world. They are not needed anymore. ------------- Commit messages: - Cleanup with KlassTrainingData constructor Changes: https://git.openjdk.org/jdk/pull/25623/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25623&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358003 Stats: 17 lines in 1 file changed: 0 ins; 13 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25623.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25623/head:pull/25623 PR: https://git.openjdk.org/jdk/pull/25623 From shade at openjdk.org Tue Jun 3 18:07:26 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jun 2025 18:07:26 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder In-Reply-To: References: Message-ID: <0BmSTgFVR4bDzT_UBDHac675eWlfGA6XmIIs_QO-pUY=.a0f73638-cc74-4441-a03a-6db66bd12ea0@github.com> On Tue, 3 Jun 2025 17:36:13 GMT, Igor Veresov wrote: > Simplify KlassTrainingData constructor. The lines in question come from the old pre-CDS world. They are not needed anymore. 
Makes sense. I was dumbfounded what was "previous handle", when we are in constructor. I suspected it was something about placement-new code somewhere. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25623#pullrequestreview-2893523002 From iveresov at openjdk.org Tue Jun 3 18:07:26 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 3 Jun 2025 18:07:26 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 17:36:13 GMT, Igor Veresov wrote: > Simplify KlassTrainingData constructor. The lines in question come from the old pre-CDS world. They are not needed anymore. mach5 testing in progress, will report back once it's done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25623#issuecomment-2936442642 From kvn at openjdk.org Tue Jun 3 18:17:29 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Jun 2025 18:17:29 GMT Subject: RFR: 8358289: [asan] runtime/cds/appcds/aotCode/AOTCodeFlags.java reports heap-buffer-overflow in ArchiveBuilder [v3] In-Reply-To: References: Message-ID: > There is difference between AdapterFingerPrint allocation size [compute_size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2227) which may not be aligned to HeapWord size and [size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2421) used for copying during AOT cache build which is aligned and can be bigger than allocation size. > > I added asserts to `AdapterFingerPrint` and `AdapterHandlerEntry` to make sure sizes are correct. Both are used in AOT cache build. > > I also moved `FreeHeap()` from `~AdapterFingerPrint()` to enforce the comment and simplify executed code. > > Thanks to @MBaesken for finding the issue and @iklam for pointing the cause. > > Testing tier1-3, xcomp, stress. Higher tiers are still running. 
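
The size mismatch described in the quoted PR text (an allocation sized in raw bytes, a copy sized in whole heap words) can be illustrated with a small model. This is a sketch under assumptions, not the `AdapterFingerPrint` code: `kBytesPerWord`, `words_for`, `unaligned_bytes`, `size_in_words` and `size_in_bytes` are hypothetical names, and an 8-byte word is assumed.

```cpp
#include <cstddef>

// Assumption for illustration: one heap word is 8 bytes.
constexpr std::size_t kBytesPerWord = 8;

// Round a byte count up to whole words.
constexpr std::size_t words_for(std::size_t bytes) {
  return (bytes + kBytesPerWord - 1) / kBytesPerWord;
}

// Size computed directly in bytes: may not be a multiple of the word size.
constexpr std::size_t unaligned_bytes(std::size_t header_bytes, std::size_t length) {
  return header_bytes + length * sizeof(int);
}

// Single source of truth: the size in words...
constexpr std::size_t size_in_words(std::size_t header_bytes, std::size_t length) {
  return words_for(unaligned_bytes(header_bytes, length));
}

// ...and the byte size derived from it, so allocation and copy always agree.
constexpr std::size_t size_in_bytes(std::size_t header_bytes, std::size_t length) {
  return size_in_words(header_bytes, length) * kBytesPerWord;
}
```

If the allocation used `unaligned_bytes` while the copy used `size_in_bytes`, the copy could read past the end of the allocation by up to `kBytesPerWord - 1` bytes, which is the kind of access ASAN reports as a heap-buffer-overflow.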
Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Use one compute_size_in_words() method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25604/files - new: https://git.openjdk.org/jdk/pull/25604/files/9b67ceab..862e7826 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25604&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25604&range=01-02 Stats: 10 lines in 1 file changed: 2 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/25604.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25604/head:pull/25604 PR: https://git.openjdk.org/jdk/pull/25604 From kvn at openjdk.org Tue Jun 3 18:17:29 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Jun 2025 18:17:29 GMT Subject: RFR: 8358289: [asan] runtime/cds/appcds/aotCode/AOTCodeFlags.java reports heap-buffer-overflow in ArchiveBuilder [v2] In-Reply-To: <-yXLmyKeGt7ajGu_p3QgKPD2fD-2uSd7c8Hz-MHINMA=.504c9dc9-23ae-447d-8f26-74fc5693f30a@github.com> References: <-wJp4hkfj1YBQ4C_UjhsqBE2UkrbOafMr_bs_-v7S-A=.9801fb39-6372-4803-bbfe-fa5c3fe9ad3f@github.com> <-yXLmyKeGt7ajGu_p3QgKPD2fD-2uSd7c8Hz-MHINMA=.504c9dc9-23ae-447d-8f26-74fc5693f30a@github.com> Message-ID: <6LAxvKdv19aANIrZN-_6AP4NEeTxtxLfgCe7HEWHmc8=.505510c7-f16e-471a-a0fe-e71c76e6b77b@github.com> On Tue, 3 Jun 2025 17:56:29 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/runtime/sharedRuntime.cpp line 2227: >> >>> 2225: } >>> 2226: >>> 2227: static int compute_size(int total_args_passed) { >> >> OK, but if the source of discrepancy is between two places computing stuff separately (inconsistently), do you want to make the computations mechanically the same? 
>> >> Something like: >> >> >> static int compute_size_in_words(int total_args_passed) { >> return (int)heap_word_size(sizeof(AdapterFingerPrint) + (length(total_args_passed) * sizeof(int))); >> } >> >> static int compute_size_in_bytes(int total_args_passed) { >> return compute_size_in_words(total_args_passed) * BytesPerWord; >> } >> >> >> Then use `compute_size_in_words()` in the other place: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2421 > > Yes, I can do that. But I will pass _length which is different from total_args_passed. Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25604#discussion_r2124572304 From shade at openjdk.org Tue Jun 3 18:29:18 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jun 2025 18:29:18 GMT Subject: RFR: 8358289: [asan] runtime/cds/appcds/aotCode/AOTCodeFlags.java reports heap-buffer-overflow in ArchiveBuilder [v3] In-Reply-To: References: Message-ID: <6X52iTWdyHqJ9izTAPWWudbuW8Qo3LjkPTkJVa20eeY=.e0af77f5-edac-4601-bc38-97d05a8cf96b@github.com> On Tue, 3 Jun 2025 18:17:29 GMT, Vladimir Kozlov wrote: >> There is difference between AdapterFingerPrint allocation size [compute_size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2227) which may not be aligned to HeapWord size and [size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2421) used for copying during AOT cache build which is aligned and can be bigger than allocation size. >> >> I added asserts to `AdapterFingerPrint` and `AdapterHandlerEntry` to make sure sizes are correct. Both are used in AOT cache build. >> >> I also moved `FreeHeap()` from `~AdapterFingerPrint()` to enforce the comment and simplify executed code. >> >> Thanks to @MBaesken for finding the issue and @iklam for pointing the cause. >> >> Testing tier1-3, xcomp, stress. Higher tiers are still running. 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Use one compute_size_in_words() method Looks okay. Asserts get a bit tautological, but it is pleasantly paranoid for my taste. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25604#pullrequestreview-2893631876 From amenkov at openjdk.org Tue Jun 3 18:49:28 2025 From: amenkov at openjdk.org (Alex Menkov) Date: Tue, 3 Jun 2025 18:49:28 GMT Subject: Integrated: 8357650: ThreadSnapshot to take snapshot of thread for thread dumps In-Reply-To: References: Message-ID: On Sat, 24 May 2025 00:17:26 GMT, Alex Menkov wrote: > This is first (hotspot) part of the update for `HotSpotDiagnosticMXBean.dumpThreads` and `jcmd Thread.dump_to_file` to include lock information in thread dumps (JDK-8356870). > The update has been split into parts to simplify reviewing. > The fix contains an implementation of `jdk.internal.vm.ThreadSnapshot` class to gather required information about a thread. > Second (dependent) part includes changes in `HotSpotDiagnosticMXBean.dumpThreads`/`jcmd Thread.dump_to_file`, spec updates and tests for the functionality. > > Testing: new `HotSpotDiagnosticMXBean.dumpThreads`/`jcmd Thread.dump_to_file` functionality was tested in loom repo; > sanity tier1 (this fix only) This pull request has now been integrated. 
Changeset: 406f1bc5 Author: Alex Menkov URL: https://git.openjdk.org/jdk/commit/406f1bc5b94408778063b885cdac807fd1501e44 Stats: 716 lines in 10 files changed: 712 ins; 0 del; 4 mod 8357650: ThreadSnapshot to take snapshot of thread for thread dumps Co-authored-by: Alan Bateman Co-authored-by: Alex Menkov Reviewed-by: sspitsyn, kevinw ------------- PR: https://git.openjdk.org/jdk/pull/25425 From cslucas at openjdk.org Tue Jun 3 18:52:34 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 3 Jun 2025 18:52:34 GMT Subject: RFR: 8357396: Refactor nmethod::make_not_entrant to use Enum instead of "const char*" [v4] In-Reply-To: References: Message-ID: > Please review this refactor to transform the reasons for making an nmethod not entrant from `const char*` into enum values. > > Tested on Linux x64 with JTREG tier1-3 in fastdebug and release mode. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Fix spacing, fix build. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/25338/files - new: https://git.openjdk.org/jdk/pull/25338/files/fa77be5c..dc3aa2c1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25338&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25338&range=02-03 Stats: 50 lines in 2 files changed: 23 ins; 0 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/25338.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25338/head:pull/25338 PR: https://git.openjdk.org/jdk/pull/25338 From cslucas at openjdk.org Tue Jun 3 18:52:35 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 3 Jun 2025 18:52:35 GMT Subject: RFR: 8357396: Refactor nmethod::make_not_entrant to use Enum instead of "const char*" [v3] In-Reply-To: <195pScH-Kh1H1JvhkwC9xLM_joDJPccyMve95BwIlzk=.22181903-b4af-4ec5-8d57-688a6ee51832@github.com> References: <195pScH-Kh1H1JvhkwC9xLM_joDJPccyMve95BwIlzk=.22181903-b4af-4ec5-8d57-688a6ee51832@github.com> Message-ID: On Tue, 3 Jun 2025 17:41:35 GMT, Aleksey Shipilev wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Address PR feedback: more refactoring / renamings > > src/hotspot/share/code/nmethod.hpp line 500: > >> 498: static const char* change_reason_to_string(ChangeReason change_reason) { >> 499: switch (change_reason) { >> 500: case ChangeReason::C1_codepatch: return "C1 code patch"; > > Indenting: should be two spaces everywhere. Also, I think this kind of indenting forces us to re-align the switch for the largest enum label. Let's just break them. Plus, any multi-line blocks should be braced. So, in total: > > > switch (change_reason) { > case ChangeReason::C1_codepatch: > return "C1 code patch"; > ... > default: { > assert(false, "Unhandled reason"); > return "Unknown"; > } > } Done, thanks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25338#discussion_r2124655548 From shade at openjdk.org Tue Jun 3 18:55:19 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jun 2025 18:55:19 GMT Subject: RFR: 8357396: Refactor nmethod::make_not_entrant to use Enum instead of "const char*" [v4] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 18:52:34 GMT, Cesar Soares Lucas wrote: >> Please review this refactor to transform the reasons for making an nmethod not entrant from `const char*` into enum values. >> >> Tested on Linux x64 with JTREG tier1-3 in fastdebug and release mode. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix spacing, fix build. src/hotspot/share/code/nmethod.cpp line 1971: > 1969: if (xtty != nullptr) { > 1970: ttyLocker ttyl; // keep the following output all in one block > 1971: xtty->begin_elem("make_not_entrant thread='%zu' change_reason='%s'", Wait, let's not change the actual key here. This is part of XML logging, AFAICS, so this might break some tools. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25338#discussion_r2124657484 From kvn at openjdk.org Tue Jun 3 18:59:16 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Jun 2025 18:59:16 GMT Subject: RFR: 8358289: [asan] runtime/cds/appcds/aotCode/AOTCodeFlags.java reports heap-buffer-overflow in ArchiveBuilder [v3] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 18:17:29 GMT, Vladimir Kozlov wrote: >> There is difference between AdapterFingerPrint allocation size [compute_size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2227) which may not be aligned to HeapWord size and [size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2421) used for copying during AOT cache build which is aligned and can be bigger than allocation size. 
>> >> I added asserts to `AdapterFingerPrint` and `AdapterHandlerEntry` to make sure sizes are correct. Both are used in AOT cache build. >> >> I also moved `FreeHeap()` from `~AdapterFingerPrint()` to enforce the comment and simplify executed code. >> >> Thanks to @MBaesken for finding the issue and @iklam for pointing the cause. >> >> Testing tier1-3, xcomp, stress. Higher tiers are still running. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Use one compute_size_in_words() method Thank you, Aleksey ------------- PR Comment: https://git.openjdk.org/jdk/pull/25604#issuecomment-2936747055 From kvn at openjdk.org Tue Jun 3 19:33:17 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Jun 2025 19:33:17 GMT Subject: RFR: 8358289: [asan] runtime/cds/appcds/aotCode/AOTCodeFlags.java reports heap-buffer-overflow in ArchiveBuilder [v3] In-Reply-To: References: Message-ID: <0lJeZV-WWYPigLaDj2bmwub-s9WzPwyEKgm2PfDatXA=.5267dfd2-08b9-40c1-8602-f153bd18b6b8@github.com> On Tue, 3 Jun 2025 18:17:29 GMT, Vladimir Kozlov wrote: >> There is difference between AdapterFingerPrint allocation size [compute_size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2227) which may not be aligned to HeapWord size and [size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2421) used for copying during AOT cache build which is aligned and can be bigger than allocation size. >> >> I added asserts to `AdapterFingerPrint` and `AdapterHandlerEntry` to make sure sizes are correct. Both are used in AOT cache build. >> >> I also moved `FreeHeap()` from `~AdapterFingerPrint()` to enforce the comment and simplify executed code. >> >> Thanks to @MBaesken for finding the issue and @iklam for pointing the cause. >> >> Testing tier1-3, xcomp, stress. Higher tiers are still running. 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Use one compute_size_in_words() method Waiting confirmation from @MBaesken . ------------- PR Comment: https://git.openjdk.org/jdk/pull/25604#issuecomment-2936868129 From cslucas at openjdk.org Tue Jun 3 19:33:57 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 3 Jun 2025 19:33:57 GMT Subject: RFR: 8357396: Refactor nmethod::make_not_entrant to use Enum instead of "const char*" [v5] In-Reply-To: References: Message-ID: > Please review this refactor to transform the reasons for making an nmethod not entrant from `const char*` into enum values. > > Tested on Linux x64 with JTREG tier1-3 in fastdebug and release mode. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Revert change to attribute of make_not_entrant element ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25338/files - new: https://git.openjdk.org/jdk/pull/25338/files/dc3aa2c1..6af59591 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25338&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25338&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25338.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25338/head:pull/25338 PR: https://git.openjdk.org/jdk/pull/25338 From cslucas at openjdk.org Tue Jun 3 19:33:57 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 3 Jun 2025 19:33:57 GMT Subject: RFR: 8357396: Refactor nmethod::make_not_entrant to use Enum instead of "const char*" [v4] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 18:47:43 GMT, Aleksey Shipilev wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix spacing, fix build. 
> > src/hotspot/share/code/nmethod.cpp line 1971: > >> 1969: if (xtty != nullptr) { >> 1970: ttyLocker ttyl; // keep the following output all in one block >> 1971: xtty->begin_elem("make_not_entrant thread='%zu' change_reason='%s'", > > Wait, let's not change the actual key here. This is part of XML logging, AFAICS, so this might break some tools. Sure, I'll revert that. I thought it would be "fine" to change the key here since it was added not "long ago.." ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25338#discussion_r2124736640 From vlivanov at openjdk.org Tue Jun 3 19:57:15 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 3 Jun 2025 19:57:15 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 17:36:13 GMT, Igor Veresov wrote: > Simplify KlassTrainingData constructor. The lines in question come from the old pre-CDS world. They are not needed anymore. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25623#pullrequestreview-2893913507 From shade at openjdk.org Tue Jun 3 20:01:29 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Jun 2025 20:01:29 GMT Subject: RFR: 8357396: Refactor nmethod::make_not_entrant to use Enum instead of "const char*" [v5] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 19:33:57 GMT, Cesar Soares Lucas wrote: >> Please review this refactor to transform the reasons for making an nmethod not entrant from `const char*` into enum values. >> >> Tested on Linux x64 with JTREG tier1-3 in fastdebug and release mode. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Revert change to attribute of make_not_entrant element Looks good to me. Compiler folks might want to ack as well. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25338#pullrequestreview-2893925727 From mgronlun at openjdk.org Tue Jun 3 21:00:36 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 3 Jun 2025 21:00:36 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v42] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 14:09:29 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Rename autoadapt src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.hpp line 31: > 29: > 30: class JavaThread; > 31: class NonJavaThread; The NonJavaThread forward declaration is not needed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2124913821 From iveresov at openjdk.org Tue Jun 3 21:11:26 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 3 Jun 2025 21:11:26 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 17:36:13 GMT, Igor Veresov wrote: > Simplify KlassTrainingData constructor. The lines in question come from the old pre-CDS world. They are not needed anymore. It seems like we don't need these release_stores either, since the constructor is always run under a lock. I'll run some testing.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25623#issuecomment-2937211273 From mgronlun at openjdk.org Tue Jun 3 21:11:32 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 3 Jun 2025 21:11:32 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v42] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 14:09:29 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Rename autoadapt src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 79: > 77: > 78: bool JfrCPUTimeTraceQueue::enqueue(JfrCPUTimeSampleRequest& request) { > 79: assert(JavaThread::current()->jfr_thread_local()->is_cpu_time_jfr_enqueue_locked(), "invariant"); What is preventing another thread from enqueuing a request here? We only know it holds a thread-local lock? Let's make it explicit at this site that the current queue corresponds to the thread-local queue for the current thread.
+ assert(&JavaThread::current()->jfr_thread_local()->cpu_time_jfr_queue() == this, "invariant"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2124934207 From mgronlun at openjdk.org Tue Jun 3 21:18:34 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 3 Jun 2025 21:18:34 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v42] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 14:09:29 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Rename autoadapt src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 96: > 94: } > 95: > 96: volatile u4 _lost_samples_sum = 0; This should be `static volatile`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2124941941 From mgronlun at openjdk.org Tue Jun 3 21:45:29 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 3 Jun 2025 21:45:29 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v42] In-Reply-To: References: Message-ID: <1RLKF0E-I7CjQRNUqb7k0mEIvoSCO010FUaKnmLVPSI=.4c6876fb-ac9c-47d2-8379-ccafdbdbaabe@github.com> On Tue, 3 Jun 2025 14:09:29 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509).
>> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Rename autoadapt src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 520: > 518: // the sampling period might be too low for the current Linux configuration > 519: // so samples might be skipped and we have to compute the actual period > 520: int64_t period = get_sampling_period() * (info->si_overrun + 1); Does this calculation have to be done on every signal, by every thread? It seems like something that could be precalculated when the period is set? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2124980054 From duke at openjdk.org Tue Jun 3 21:52:22 2025 From: duke at openjdk.org (duke) Date: Tue, 3 Jun 2025 21:52:22 GMT Subject: Withdrawn: 8344116: C2: remove slice parameter from LoadNode::make In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 15:18:25 GMT, Zihao Lin wrote: > This patch removes the slice parameter from LoadNode::make. > > Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, and I'd appreciate any guidance. Thanks a lot! This pull request has been closed without being integrated.
------------- PR: https://git.openjdk.org/jdk/pull/24258 From mgronlun at openjdk.org Tue Jun 3 22:01:34 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 3 Jun 2025 22:01:34 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v42] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 14:09:29 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Rename autoadapt src/hotspot/share/jfr/periodic/sampling/jfrThreadSampling.cpp line 360: > 358: tl->set_do_async_processing_of_cpu_time_jfr_requests(false); > 359: if (lock) { > 360: tl->acquire_cpu_time_jfr_dequeue_lock(); This is your synchronization point on return from native code, which is effectively a spinlock. This can cause problems when a large number of threads are being processed by the "do_async_processing" request call. We should fix this as a bug after integration (use a proper Monitor as a synchronization point).
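The difference between spin-waiting and blocking on a monitor, which the comment above asks to address, can be sketched in standalone C++. This is only an illustration under stated assumptions: `std::mutex` and `std::condition_variable` stand in for HotSpot's `Monitor`, and `DequeueGate` and `run_workers` are hypothetical names that do not appear in the patch.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical gate guarding a per-thread queue; not HotSpot code.
class DequeueGate {
  std::mutex _m;
  std::condition_variable _cv;
  bool _busy = false;
 public:
  // Sleeps until the gate is free instead of spinning on a flag.
  void acquire() {
    std::unique_lock<std::mutex> l(_m);
    _cv.wait(l, [this] { return !_busy; });
    _busy = true;
  }
  // Frees the gate and wakes one waiter.
  void release() {
    {
      std::lock_guard<std::mutex> l(_m);
      _busy = false;
    }
    _cv.notify_one();
  }
};

// Runs `threads` workers that each pass through the gate `iters` times
// and returns the final value of a counter protected only by the gate.
int run_workers(int threads, int iters) {
  DequeueGate gate;
  int counter = 0;
  std::vector<std::thread> ts;
  for (int i = 0; i < threads; i++) {
    ts.emplace_back([&gate, &counter, iters] {
      for (int j = 0; j < iters; j++) {
        gate.acquire();
        counter++;  // critical section
        gate.release();
      }
    });
  }
  for (auto& t : ts) t.join();
  return counter;
}
```

With many contending threads, waiters in this sketch are descheduled rather than burning CPU, which is the property the review comment is after. Whether a `Monitor` is the right fit for the actual return-from-native path is left to the follow-up bug.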
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2124997140 From mgronlun at openjdk.org Tue Jun 3 22:04:27 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 3 Jun 2025 22:04:27 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v42] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 14:09:29 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Rename autoadapt src/hotspot/share/jfr/support/jfrThreadLocal.cpp line 143: > 141: JavaThread *const jt = JavaThread::cast(t); > 142: send_java_thread_start_event(jt); > 143: JfrCPUTimeThreadSampling::on_javathread_create(jt); Move before send_java_thread...to have that captured by the timer? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125004892 From iveresov at openjdk.org Tue Jun 3 22:13:01 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 3 Jun 2025 22:13:01 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder [v2] In-Reply-To: References: Message-ID: > Simplify KlassTrainingData constructor. The lines in question come from the old pre-CDS world. They are not needed anymore.
Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: No need for release_store() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25623/files - new: https://git.openjdk.org/jdk/pull/25623/files/f9b133fe..85a71619 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25623&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25623&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25623.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25623/head:pull/25623 PR: https://git.openjdk.org/jdk/pull/25623 From coleenp at openjdk.org Tue Jun 3 22:40:17 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 3 Jun 2025 22:40:17 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder [v2] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 22:13:01 GMT, Igor Veresov wrote: >> Simplify KlassTrainingData constructor. The lines in question come from the old pre-CDS world. They are not needed anymore. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > No need for release_store() src/hotspot/share/oops/trainingData.cpp line 437: > 435: assert(klass != nullptr, ""); > 436: Handle hm(JavaThread::current(), klass->java_mirror()); > 437: jobject hmj = JNIHandles::make_global(hm); Why don't you use OopStorage for this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25623#discussion_r2125056459 From iveresov at openjdk.org Tue Jun 3 22:48:14 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 3 Jun 2025 22:48:14 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder [v2] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 22:13:01 GMT, Igor Veresov wrote: >> Simplify KlassTrainingData constructor. The lines in question come from the old pre-CDS world. They are not needed anymore. 
> > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > No need for release_store() Testing is ok ------------- PR Comment: https://git.openjdk.org/jdk/pull/25623#issuecomment-2937546623 From iveresov at openjdk.org Tue Jun 3 22:55:16 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 3 Jun 2025 22:55:16 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder [v2] In-Reply-To: References: Message-ID: <4ne8DsOBEMC2jSdOBI4l_33Jrs0CXHEKpdrLlBB-2uM=.52428bbb-6abc-4c33-85e7-6aa424c8b4f7@github.com> On Tue, 3 Jun 2025 22:37:54 GMT, Coleen Phillimore wrote: >> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: >> >> No need for release_store() > > src/hotspot/share/oops/trainingData.cpp line 437: > >> 435: assert(klass != nullptr, ""); >> 436: Handle hm(JavaThread::current(), klass->java_mirror()); >> 437: jobject hmj = JNIHandles::make_global(hm); > > Why don't you use OopStorage for this? Are there any advantages? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25623#discussion_r2125071202 From iklam at openjdk.org Tue Jun 3 23:35:21 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 3 Jun 2025 23:35:21 GMT Subject: RFR: 8358289: [asan] runtime/cds/appcds/aotCode/AOTCodeFlags.java reports heap-buffer-overflow in ArchiveBuilder [v3] In-Reply-To: References: Message-ID: <1tLFEk_8m434FPnlObCMhoxgxuYc12pSkfZOivpE--0=.b80134c8-aa3f-4c02-827e-f24ed208d08c@github.com> On Tue, 3 Jun 2025 18:17:29 GMT, Vladimir Kozlov wrote: >> There is difference between AdapterFingerPrint allocation size [compute_size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2227) which may not be aligned to HeapWord size and [size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2421) used for copying during AOT cache build which is aligned and can be bigger than allocation size. >> >> I added asserts to `AdapterFingerPrint` and `AdapterHandlerEntry` to make sure sizes are correct. Both are used in AOT cache build. >> >> I also moved `FreeHeap()` from `~AdapterFingerPrint()` to enforce the comment and simplify executed code. >> >> Thanks to @MBaesken for finding the issue and @iklam for pointing the cause. >> >> Testing tier1-3, xcomp, stress. Higher tiers are still running. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Use one compute_size_in_words() method Marked as reviewed by iklam (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25604#pullrequestreview-2894383374 From kvn at openjdk.org Wed Jun 4 00:03:16 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 4 Jun 2025 00:03:16 GMT Subject: RFR: 8358289: [asan] runtime/cds/appcds/aotCode/AOTCodeFlags.java reports heap-buffer-overflow in ArchiveBuilder [v3] In-Reply-To: <1tLFEk_8m434FPnlObCMhoxgxuYc12pSkfZOivpE--0=.b80134c8-aa3f-4c02-827e-f24ed208d08c@github.com> References: <1tLFEk_8m434FPnlObCMhoxgxuYc12pSkfZOivpE--0=.b80134c8-aa3f-4c02-827e-f24ed208d08c@github.com> Message-ID: On Tue, 3 Jun 2025 23:32:58 GMT, Ioi Lam wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Use one compute_size_in_words() method > > Marked as reviewed by iklam (Reviewer). Thank you, @iklam ------------- PR Comment: https://git.openjdk.org/jdk/pull/25604#issuecomment-2937775069 From iveresov at openjdk.org Wed Jun 4 00:53:21 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 4 Jun 2025 00:53:21 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder [v3] In-Reply-To: References: Message-ID: > Simplify KlassTrainingData constructor. The lines in question come from the old pre-CDS world. They are not needed anymore. 
Igor Veresov has updated the pull request incrementally with two additional commits since the last revision: - More changes - Use dedicated OopStorage ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25623/files - new: https://git.openjdk.org/jdk/pull/25623/files/85a71619..f8a9b4a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25623&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25623&range=01-02 Stats: 32 lines in 4 files changed: 20 ins; 8 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25623.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25623/head:pull/25623 PR: https://git.openjdk.org/jdk/pull/25623 From iveresov at openjdk.org Wed Jun 4 00:56:17 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 4 Jun 2025 00:56:17 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder [v2] In-Reply-To: <4ne8DsOBEMC2jSdOBI4l_33Jrs0CXHEKpdrLlBB-2uM=.52428bbb-6abc-4c33-85e7-6aa424c8b4f7@github.com> References: <4ne8DsOBEMC2jSdOBI4l_33Jrs0CXHEKpdrLlBB-2uM=.52428bbb-6abc-4c33-85e7-6aa424c8b4f7@github.com> Message-ID: On Tue, 3 Jun 2025 22:52:26 GMT, Igor Veresov wrote: >> src/hotspot/share/oops/trainingData.cpp line 437: >> >>> 435: assert(klass != nullptr, ""); >>> 436: Handle hm(JavaThread::current(), klass->java_mirror()); >>> 437: jobject hmj = JNIHandles::make_global(hm); >> >> Why don't you use OopStorage for this? > > Are there any advantages? Ok, transitioned to OopStorage. Please check that it is done correctly. I'll be back when the testing is done.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25623#discussion_r2125208972 From kvn at openjdk.org Wed Jun 4 02:17:29 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 4 Jun 2025 02:17:29 GMT Subject: Integrated: 8358289: [asan] runtime/cds/appcds/aotCode/AOTCodeFlags.java reports heap-buffer-overflow in ArchiveBuilder In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 02:01:02 GMT, Vladimir Kozlov wrote: > There is difference between AdapterFingerPrint allocation size [compute_size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2227) which may not be aligned to HeapWord size and [size](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2421) used for copying during AOT cache build which is aligned and can be bigger than allocation size. > > I added asserts to `AdapterFingerPrint` and `AdapterHandlerEntry` to make sure sizes are correct. Both are used in AOT cache build. > > I also moved `FreeHeap()` from `~AdapterFingerPrint()` to enforce the comment and simplify executed code. > > Thanks to @MBaesken for finding the issue and @iklam for pointing the cause. > > Testing tier1-3, xcomp, stress. Higher tiers are still running. This pull request has now been integrated. 
Changeset: ebd85288 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/ebd85288ce309b7dc7ff8b36558dd9f2a2300209 Stats: 15 lines in 2 files changed: 5 ins; 1 del; 9 mod 8358289: [asan] runtime/cds/appcds/aotCode/AOTCodeFlags.java reports heap-buffer-overflow in ArchiveBuilder Reviewed-by: shade, iklam, asmehra ------------- PR: https://git.openjdk.org/jdk/pull/25604 From apangin at openjdk.org Wed Jun 4 03:15:36 2025 From: apangin at openjdk.org (Andrei Pangin) Date: Wed, 4 Jun 2025 03:15:36 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v42] In-Reply-To: References: Message-ID: <3ALqkSc9a0HKOJrA6CW61v725SxX8FcLmasC8Wm4y24=.9d40a91c-a77a-442b-926a-e5785032c415@github.com> On Tue, 3 Jun 2025 14:09:29 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Rename autoadapt src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 51: > 49: static bool is_excluded(JavaThread* jt) { > 50: return jt->is_hidden_from_external_view() || > 51: jt->jfr_thread_local()->is_excluded() || These restrictions cause a large blind spot in observability. There is no technical limitation that prevents recording CPU samples for internal threads too, even without a Java stack trace. Consider removing this restriction, although not in this PR.
src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 135: > 133: while ((new_lost_samples = Atomic::cmpxchg(&_lost_samples, lost_samples, 0)) != lost_samples) { > 134: lost_samples = new_lost_samples; > 135: } Why not `Atomic::xchg`? src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 161: > 159: return 0; > 160: } > 161: return os::active_processor_count() * 1000000000.0 / rate; If sampling period is configured as an absolute number in milliseconds, this value must be passed as is. Double conversion via `Runtime.availableProcessors()` / `active_processor_count()` is unobvious and error-prone. First, because of asymmetry: e.g. `Runtime.availableProcessors()` may be redefined by an agent so that its value is not aligned with `active_processor_count()`. Second, because number of available processors may change at runtime, e.g., by adjusting cgroup quotas. src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 198: > 196: virtual void post_run(); > 197: public: > 198: virtual const char* name() const { return "JFR CPU Time Thread Sampler"; } Thread name is too long and does not sound right. Logically, it is not "Thread Sampler", but rather "Sampler Thread", which also aligns with the existing "JFR Sampler Thread". But I'd simplify it to `JFR CPU Time Sampler` or maybe `JFR CPU Sampler Thread`. src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 202: > 200: void run(); > 201: void on_javathread_create(JavaThread* thread); > 202: bool create_timer_for_thread(JavaThread* thread, timer_t &timerid); Should it be `private`? src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 252: > 250: timer_delete(*timer); > 251: tl->unset_cpu_timer(); > 252: tl->deallocate_cpu_time_jfr_queue(); Either this line is not needed or there is a possible resource leak: if `create_timer_for_thread` fails, queue is allocated but not deallocated. 
src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 281: > 279: stop_timer(); > 280: Atomic::store(&_stop_signals, true); > 281: while (Atomic::load_acquire(&_active_signal_handlers) > 0) { There can be a race when `handle_timer_signal` has already passed `_stop_signals` check but has not yet incremented `_active_signal_handlers`. src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 308: > 306: > 307: if (Atomic::load_acquire(&_is_async_processing_of_cpu_time_jfr_requests_triggered)) { > 308: Atomic::release_store(&_is_async_processing_of_cpu_time_jfr_requests_triggered, false); acquire/release seem to be used for no good reason. Also, this could be a single `cmpxchg`. src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 326: > 324: if (jt->thread_state() != _thread_in_native || !tl->try_acquire_cpu_time_jfr_dequeue_lock()) { > 325: tl->set_do_async_processing_of_cpu_time_jfr_requests(false); > 326: continue; // thread doesn't have a last Java frame or queue is already being processed This comment may sound confusing, since `has_last_Java_frame` is checked separately below. src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 472: > 470: > 471: void handle_timer_signal(int signo, siginfo_t* info, void* context) { > 472: assert(_instance != nullptr, "invariant"); There can be an arbitrary delay in async signal delivery. It's unlikely, but not impossible for `_instance` to be deleted by the time signal handler is called. There should be a better way to synchronize with JFR shutdown. src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 477: > 475: > 476: > 477: void JfrCPUTimeThreadSampling::handle_timer_signal(siginfo_t* info, void* context) { It may be a good idea to validate `info->si_code` in order to protect from things like `kill -SIGPROF` after profiling has stopped. 
For a similar reason, `_sampler->_stop_signals` should default to `true` whenever profiler is not running. src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 509: > 507: JfrCPUTimeTraceQueue& queue = tl->cpu_time_jfr_queue(); > 508: if (!check_state(jt)) { > 509: queue.increment_lost_samples(); nit: wrong indent src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 576: > 574: } > 575: if (timer_create(clock, &sev, &t) < 0) { > 576: log_error(jfr)("Failed to register the signal handler for thread sampling: %s", os::strerror(os::get_last_error())); If an application has many threads and current RLIMIT_SIGPENDING is low, logs will be flooded with this error message. src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 606: > 604: void JfrCPUTimeThreadSampler::init_timers() { > 605: // install sig handler for sig > 606: PosixSignals::install_generic_signal_handler(SIG, (void*)::handle_timer_signal); SIGPROF is also used by external profilers. Need to check if SIGPROF handler is already installed and warn user. src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.hpp line 58: > 56: volatile u4 _head; > 57: > 58: volatile s4 _lost_samples; Why signed int? Can it be negative? src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.hpp line 128: > 126: static void send_lost_event(const JfrTicks& time, traceid tid, s4 lost_samples); > 127: > 128: // Trigger sampling while a thread is not in a safepoint, from a seperate thread typo: separate src/hotspot/share/jfr/support/jfrThreadLocal.cpp line 558: > 556: void JfrThreadLocal::set_cpu_timer(timer_t* timer) { > 557: if (_cpu_timer == nullptr) { > 558: _cpu_timer = JfrCHeapObj::new_array(1); `timer_t` is a primitive type, at most one machine word. Why extra indirection and allocation? 
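Two of the comments in this review concern the lost-sample counter: the question of why a `cmpxchg` loop is used instead of `Atomic::xchg`, and the question of why the counter is signed. A minimal standalone sketch of the two read-and-reset variants follows. It assumes `std::atomic` as a stand-in for HotSpot's `Atomic` class, and the names `lost_samples`, `take_lost_samples_cas` and `take_lost_samples_xchg` are hypothetical, not taken from the patch.

```cpp
#include <atomic>
#include <cstdint>

// Stand-in for the queue's lost-sample counter; not HotSpot code.
// Unsigned, since a count of lost samples cannot be negative.
static std::atomic<uint32_t> lost_samples{0};

// CAS-loop variant: retry until the value we observed is the one replaced by 0.
uint32_t take_lost_samples_cas() {
  uint32_t seen = lost_samples.load();
  while (!lost_samples.compare_exchange_weak(seen, 0)) {
    // On failure compare_exchange_weak refreshes `seen`; just retry.
  }
  return seen;
}

// Exchange variant: a single atomic read-and-reset, no retry loop needed.
uint32_t take_lost_samples_xchg() {
  return lost_samples.exchange(0);
}
```

Both functions return the previous count and leave the counter at zero. The exchange form does so in one operation, which is presumably what the reviewer has in mind.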
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2124528320 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2124503100 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125157311 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125128723 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125130332 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125190998 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125203700 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125342289 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125249171 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125230422 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125241099 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125320255 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125411074 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125430231 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2124507884 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125333563 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125183931 From lliu at openjdk.org Wed Jun 4 03:30:11 2025 From: lliu at openjdk.org (Liming Liu) Date: Wed, 4 Jun 2025 03:30:11 GMT Subject: RFR: 8358032: Use crypto pmull for CRC32/CRC32C intrinsics on Ampere CPU [v2] In-Reply-To: References: Message-ID: > This PR is to enable the use of crypto pmull for CRC32/CRC32C intrinsics on Ampere CPU. There is an option UseCryptoPmullForCRC32 that can enable crypto pmull, but directly enabling it on Ampere CPU will cause the following problems. > > 1. There will be regressions (-14% ~ -8%) on Ampere1 when the length is 64. 
When <= 128, both kernel_crc32_using_crc32 and kernel_crc32_using_crypto_pmull use the loop labeled as CRC_by32_loop, but their implements are a little different, and the loop in kernel_crc32_using_crc32 is better at hiding latency on Ampere1. So this PR takes the loop in kernel_crc32_using_crc32 to kernel_crc32_using_crypto_pmull, and does the same for CRC32C intrinsic. > > 2. The intrinsics only use crypto pmull when the length is higher than 383, while the loop in kernel_crc32_common_fold_using_crypto_pmull looks able to handle 256, and if it handles 256 on Ampere1, the improvements can be as high as 110% compared with kernel_crc32_using_crc32/kernel_crc32c_using_crc32c. However, there are regressions (~-6%) on Neoverse V1 when the length is 256. So this PR introduces a new option named CryptoPmullForCRC32LowLimit. It defaults to 256 since the code could handle 256, while it is set to 384 for V1/V2 to keep the old behavior on these platforms. > > The performance regressions and improvements were measured with the following microbenchmarks: > org.openjdk.bench.java.util.TestCRC32.testCRC32Update > org.openjdk.bench.java.util.TestCRC32C.testCRC32CUpdate > > Ran the following JTReg tests on Ampere1 and did not find problems: > test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java > test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java Liming Liu has updated the pull request incrementally with one additional commit since the last revision: Make it be a diagnostic flag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25609/files - new: https://git.openjdk.org/jdk/pull/25609/files/8aa96578..db926eb0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25609&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25609&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25609.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25609/head:pull/25609 PR: 
https://git.openjdk.org/jdk/pull/25609 From dholmes at openjdk.org Wed Jun 4 04:53:27 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 4 Jun 2025 04:53:27 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v10] In-Reply-To: <1JqKzjCGoZ9N_ez_gMKOlR1lbWPte0LkQS3bSb81ua0=.3c4c006b-18c0-4444-a867-8c774899b5b9@github.com> References: <8ESOaNI_qHLzLquiZT7RZR43lit-o8_5rTky1nJFjH4=.a81b8882-1470-4f76-8c9a-cdc2a7b50070@github.com> <1JqKzjCGoZ9N_ez_gMKOlR1lbWPte0LkQS3bSb81ua0=.3c4c006b-18c0-4444-a867-8c774899b5b9@github.com> Message-ID: On Tue, 3 Jun 2025 12:16:32 GMT, Johannes Bechberger wrote: >>> Hold on, shouldn't this really be "Lost"? @egahlin and @mgronlun need to chime in here. >> >> Lost might be better. >> >> I wonder if `` is needed, instead of thread = true? > >> I wonder if is needed, instead of thread = true? > > We had these discussions before on the old PR and then decided to end up with eventThread (as the other events do to), @parttimenerd I would really like to see some kind of design description for this which explains what the threading model is, how the signals are used, and how all the pieces interact. Thanks ------------- PR Comment: https://git.openjdk.org/jdk/pull/25302#issuecomment-2938512151 From mbaesken at openjdk.org Wed Jun 4 05:21:17 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 4 Jun 2025 05:21:17 GMT Subject: RFR: 8357155: [asan] ZGC does not work (x86_64 and ppc64) [v2] In-Reply-To: References: <_p5h0MfOc1LQ2g30xDHYJf9v_B2QbJmJ0El0vc_u6zM=.6461af5c-c4cd-442c-a16e-c9578484f10c@github.com> Message-ID: On Tue, 3 Jun 2025 05:57:46 GMT, Axel Boldt-Christmas wrote: >> [JDK-8358310](https://bugs.openjdk.org/browse/JDK-8358310) / #25578 is open right now as a quick fix for returning a too large value without cleaning up the implementation. 
(As a fix for 25) >> >> This was noted back in https://github.com/openjdk/jdk/pull/18941#issuecomment-2079316745 ([JDK-8330275](https://bugs.openjdk.org/browse/JDK-8330275)), but I think fixing this fell through the cracks. >> >> I currently have a rewrite in the works which overhauls the heap base selection, which I plan to get into 26. In that patch all the non-generational legacy is removed. So we no longer probe based on the assumption that we need 3 extra high order bits. > > But I will make sure to create an issue for this overhaul, so it does not get lost. Thanks for this ! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25549#discussion_r2125665378 From mbaesken at openjdk.org Wed Jun 4 05:27:06 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 4 Jun 2025 05:27:06 GMT Subject: RFR: 8357155: [asan] ZGC does not work (x86_64 and ppc64) [v3] In-Reply-To: References: Message-ID: > Many (all?) ZGC related jtreg tests do not work when the JDK is built with address sanitizer asan enabled (configure flag --enable-asan). > This can be seen on SUSE Linux x86_64 and also on ppc64le , opt binaries were used. > It has been suggested to do a workaround - 'But I think that simply adapting the zAddress_[...].cpp implementations to always select the largest heap base would go a long way for providing ASAN compatibility.' > This seems to work nicely on x86_64 and ppc64le, however the zgc related tests still fail on Linux aarch64 (should I exclude this platform from my patch?) . 
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: Add comment requested by mdoerr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25549/files - new: https://git.openjdk.org/jdk/pull/25549/files/82a11f9b..85da86e1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25549&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25549&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25549.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25549/head:pull/25549 PR: https://git.openjdk.org/jdk/pull/25549 From mbaesken at openjdk.org Wed Jun 4 05:27:06 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 4 Jun 2025 05:27:06 GMT Subject: RFR: 8357155: [asan] ZGC does not work (x86_64 and ppc64) [v2] In-Reply-To: <0808aEXLDKNUY6rsNCbjRjs_O0BaPLrCsX7q2zjpzus=.8ea987cf-fb41-47c5-9df3-840bc939f99a@github.com> References: <0808aEXLDKNUY6rsNCbjRjs_O0BaPLrCsX7q2zjpzus=.8ea987cf-fb41-47c5-9df3-840bc939f99a@github.com> Message-ID: On Tue, 3 Jun 2025 07:58:26 GMT, Martin Doerr wrote: > I think this PR is ok, but please add a comment like "The max supported value is 44 because of other internal data structures.". Sure, I added the comment. Are you fine with the PR now ? 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25549#discussion_r2125673231

From jbechberger at openjdk.org Wed Jun 4 05:32:29 2025
From: jbechberger at openjdk.org (Johannes Bechberger)
Date: Wed, 4 Jun 2025 05:32:29 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v42]
In-Reply-To: <3ALqkSc9a0HKOJrA6CW61v725SxX8FcLmasC8Wm4y24=.9d40a91c-a77a-442b-926a-e5785032c415@github.com>
References: <3ALqkSc9a0HKOJrA6CW61v725SxX8FcLmasC8Wm4y24=.9d40a91c-a77a-442b-926a-e5785032c415@github.com>
Message-ID:

On Wed, 4 Jun 2025 00:13:07 GMT, Andrei Pangin wrote:

>> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Rename autoadapt
>
> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 161:
>
>> 159: return 0;
>> 160: }
>> 161: return os::active_processor_count() * 1000000000.0 / rate;
>
> If sampling period is configured as an absolute number in milliseconds, this value must be passed as is.
> Double conversion via `Runtime.availableProcessors()` / `active_processor_count()` is unobvious and error-prone. First, because of asymmetry: e.g. `Runtime.availableProcessors()` may be redefined by an agent so that its value is not aligned with `active_processor_count()`. Second, because number of available processors may change at runtime, e.g., by adjusting cgroup quotas.

Is this something for a later PR?

> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 281:
>
>> 279: stop_timer();
>> 280: Atomic::store(&_stop_signals, true);
>> 281: while (Atomic::load_acquire(&_active_signal_handlers) > 0) {
>
> There can be a race when `handle_timer_signal` has already passed `_stop_signals` check but has not yet incremented `_active_signal_handlers`.

Any idea on how to fix it?
> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 472: > >> 470: >> 471: void handle_timer_signal(int signo, siginfo_t* info, void* context) { >> 472: assert(_instance != nullptr, "invariant"); > > There can be an arbitrary delay in async signal delivery. > It's unlikely, but not impossible for `_instance` to be deleted by the time signal handler is called. There should be a better way to synchronize with JFR shutdown. Any ideas? Or is it something for a later PR? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125678084 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125680345 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125681876 From jbechberger at openjdk.org Wed Jun 4 06:02:32 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 06:02:32 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v10] In-Reply-To: References: <8ESOaNI_qHLzLquiZT7RZR43lit-o8_5rTky1nJFjH4=.a81b8882-1470-4f76-8c9a-cdc2a7b50070@github.com> <1JqKzjCGoZ9N_ez_gMKOlR1lbWPte0LkQS3bSb81ua0=.3c4c006b-18c0-4444-a867-8c774899b5b9@github.com> Message-ID: <81dLp_39MhU-TuDD3EDt7iTX8HyEDZfj6nvCPwE5Ol4=.7d564cda-b7ae-4623-9705-704770b2b118@github.com> On Wed, 4 Jun 2025 04:50:56 GMT, David Holmes wrote: >>> I wonder if is needed, instead of thread = true? >> >> We had these discussions before on the old PR and then decided to end up with eventThread (as the other events do to), > > @parttimenerd I would really like to see some kind of design description for this which explains what the threading model is, how the signals are used, and how all the pieces interact. 
Thanks

@dholmes-ora I attempt a first version here:

The design consists of four main parts:

- setup code: This sets up the signal handlers for every new thread and deletes them afterwards.
- the per-thread signal handlers: They first check that the current thread is valid, increment the count of currently active handlers, and check that they shouldn't stop (because the profiler is disabled). They then acquire the thread-local enqueue lock for the current thread's request queue and push the sampling request (see https://openjdk.org/jeps/518, plus the current period). This triggers/arms a safepoint. If the current thread is in native, they trigger (set a flag for) the asynchronous stack walking. This prevents long periods in native code from overflowing the request queue. Finally, the enqueue lock is released.
- the safepoint handler: In the safepoint handler, we check if the thread-local queue is not empty. If so, we acquire a dequeue lock and process all entries of the queue, thereby creating JFR events. We also untrigger the async-stack-walking request for the thread. We then release the lock.
- the sampler thread: Its task is to regularly update the timers if needed (configuration changes) and to walk the thread list to find any thread that wants to be asynchronously stack-walked. For each of these threads, the dequeue lock is acquired (skipping if already set to enqueue) and the queue is processed as at the safepoint. Then the lock is released.

On shutdown: Whenever the sampler is shut down, we first set the `_stop_signals` flag to prevent new signal handlers from entering the request creation code (and thereby accessing data structures that we already deallocated), we disable the timers for all threads, and then we wait until no signal handler is engaged anymore.

It is important to note that there is only one thread-local lock used, but it has three states:

- enqueue
- dequeue
- unlocked

This prevents these phases from overlapping.
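The three-state lock described above could be sketched roughly as follows. This is only an illustration under assumptions: it uses a non-blocking try-lock on a single `std::atomic`, and `TriStateLock`, `QueueLockState`, and the method names are hypothetical, not the names or the implementation used in the PR.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical names; a standalone illustration, not HotSpot code.
enum class QueueLockState : int { Unlocked, Enqueue, Dequeue };

class TriStateLock {
  std::atomic<QueueLockState> _state{QueueLockState::Unlocked};

  // Moves Unlocked -> target; fails without blocking if the lock is held
  // in either state, so it never spins inside a signal handler.
  bool try_acquire(QueueLockState target) {
    QueueLockState expected = QueueLockState::Unlocked;
    return _state.compare_exchange_strong(expected, target,
                                          std::memory_order_acquire,
                                          std::memory_order_relaxed);
  }

 public:
  // Signal-handler side: taken while pushing a sample request.
  bool try_lock_enqueue() { return try_acquire(QueueLockState::Enqueue); }

  // Safepoint-handler / sampler-thread side: taken while draining the queue.
  bool try_lock_dequeue() { return try_acquire(QueueLockState::Dequeue); }

  void unlock() {
    assert(_state.load(std::memory_order_relaxed) != QueueLockState::Unlocked);
    _state.store(QueueLockState::Unlocked, std::memory_order_release);
  }

  QueueLockState state() const { return _state.load(std::memory_order_acquire); }
};
```

With this shape, a failed `try_lock_dequeue()` in the sampler thread corresponds to the "skipping if already set to enqueue" case, and a failed `try_lock_enqueue()` in a signal handler would presumably be counted as a lost sample rather than waited on.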
------------- PR Comment: https://git.openjdk.org/jdk/pull/25302#issuecomment-2938677600 From jbechberger at openjdk.org Wed Jun 4 06:10:30 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 06:10:30 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v42] In-Reply-To: <1RLKF0E-I7CjQRNUqb7k0mEIvoSCO010FUaKnmLVPSI=.4c6876fb-ac9c-47d2-8379-ccafdbdbaabe@github.com> References: <1RLKF0E-I7CjQRNUqb7k0mEIvoSCO010FUaKnmLVPSI=.4c6876fb-ac9c-47d2-8379-ccafdbdbaabe@github.com> Message-ID: On Tue, 3 Jun 2025 21:42:48 GMT, Markus Gr?nlund wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename autoadapt > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 520: > >> 518: // the sampling period might be too low for the current Linux configuration >> 519: // so samples might be skipped and we have to compute the actual period >> 520: int64_t period = get_sampling_period() * (info->si_overrun + 1); > > Does this calculation have to be done on every signal, by every thread? It seems like something that could be precalculated when the period is set? This might change dynamically, so probably no. Only caching would work, but this is a small optimization for later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125725230 From mbaesken at openjdk.org Wed Jun 4 06:23:17 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 4 Jun 2025 06:23:17 GMT Subject: RFR: 8357826: Avoid running some jtreg tests when asan is configured [v2] In-Reply-To: References: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> Message-ID: On Mon, 2 Jun 2025 08:07:38 GMT, Matthias Baesken wrote: >> There are a couple of jtreg tests, especially in the HS area, with very special assumptions about memory layout/sizes . 
>> Those fail when the address sanitizer is configured ( --enable-asan ). >> The change adds a way to tag those tests with 'requires' so that they can be avoided easily when running jtreg tests with ASAN enabled. >> Adjusting the tests for "pleasing" the sanitizer is not always desired (if possible for some tests it can be done later) . >> While at it, also same is also added for ubsan . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > TestBreakSignalThreadDump has issues with asan gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java --------------------------------------------------------------- stderr: [==46460==ERROR: AddressSanitizer failed to allocate 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: 12) ==46460==ReserveShadowMemoryRange failed while trying to map 0xdfff0001000 bytes. Perhaps you're using ulimit -v ulimit clashes with the memory requirements of ASAN runtime/Thread/TestBreakSignalThreadDump.java --------------------------------------------------------------- stderr: [==18432==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD. loading of the jsig lib does currently not work well with ASAN lib runtime/XCheckJniJsig/XCheckJSig.java --------------------------------------------------------------- stderr: [==71228==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD. loading of the jsig lib does currently not work well with ASAN lib runtime/cds/appcds/aotCode/AOTCodeCompressedOopsTest.java --------------------------------------------------------------- reports ==35621==ERROR: AddressSanitizer: heap-buffer-overflow on address ... 
this will be fixed hopefully so we could maybe remove the !asan tagging serviceability/dcmd/vm/SystemDumpMapTest.java --------------------------------------------------------------- Missing patterns in dump: 0x\\p{XDigit}+-0x\\p{XDigit}+ +\\d+ +[rwsxp-]+ +\\d+ +\\d+ +(4K|8K|16K|64K|2M|16M|64M) +com.*\[heap\] test SystemDumpMapTest.jmx(): failure [410ms] ASAN changes the memory map dump slightly, but the test has rather strict requirements serviceability/dcmd/vm/SystemMapTest.java --------------------------------------------------------------- test SystemMapTest.jmx(): failure [381ms] java.lang.RuntimeException: '0x\\p{XDigit}+-0x\\p{XDigit}+ +\\d+ +[rwsxp-]+ +\\d+ +\\d+ +(4K|8K|16K|64K|2M|16M|64M) +com.*\[heap\]' missing from stdout/stderr ASAN changes the memory map dump slightly, but the test has rather strict requirements serviceability/sa/ClhsdbCDSCore.java --------------------------------------------------------------- Output and diagnostic info for process 45808 was saved into 'pid-45808-output.log' crashOutputString = [[0.028s][error][cds] An error has occurred while processing the shared archive file. Run with -Xlog:aot,cds for details. [0.029s][error][cds] Mismatched values for property jdk.module.addexports: java.base/jdk.internal.misc=ALL-UNNAMED specified during runtime but not during dump time [0.029s][error][cds] Disabling optimized module handling # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x000014d4d60ef8d2, pid=45808, tid=46654 # # JRE version: OpenJDK Runtime Environment (25.0.0.1) (build 25.0.0.1-internal-adhoc.myuser.jdk) # Java VM: OpenJDK 64-Bit Server VM (25.0.0.1-internal-adhoc.myuser.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x3d6b8d2] Unsafe_PutInt+0x592 # No core dump will be written. Core dumps have been disabled. 
To enable core dumping, try "ulimit -c unlimited" before starting Java again java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: Output doesn't contain the location of core file.: expected true, was false at ClhsdbCDSCore.main(ClhsdbCDSCore.java:171) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:565) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) at java.base/java.lang.Thread.run(Thread.java:1474) Caused by: java.lang.RuntimeException: Output doesn't contain the location of core file.: expected true, was false Seems no core was written, maybe ASAN is to blame or my test environment ? serviceability/sa/ClhsdbFindPC.java --------------------------------------------------------------- java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: Output doesn't contain the location of core file.: expected true, was false at ClhsdbFindPC.testFindPC(ClhsdbFindPC.java:317) at ClhsdbFindPC.main(ClhsdbFindPC.java:339) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:565) at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138) at java.base/java.lang.Thread.run(Thread.java:1474) Caused by: java.lang.RuntimeException: Output doesn't contain the location of core file.: expected true, was false Looks similar to ClhsdbCDSCore issue Turns out cds/appcds/aotCode/AOTCodeCompressedOopsTest.java was a real bug, so I guess we should remove it from this exclusion. Are you fine with the short explanations given, if yes I would add them as comment to the tests . 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25575#issuecomment-2938728030 PR Comment: https://git.openjdk.org/jdk/pull/25575#issuecomment-2938732660 From mbaesken at openjdk.org Wed Jun 4 06:28:17 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 4 Jun 2025 06:28:17 GMT Subject: RFR: 8357826: Avoid running some jtreg tests when asan is configured [v2] In-Reply-To: <6HSruHtZNPOZJp4vNFnwMns6-_rP_MEHtnnvAP7S5QU=.e91023a2-089c-4541-86a5-ae8d4adeb99d@github.com> References: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> <6HSruHtZNPOZJp4vNFnwMns6-_rP_MEHtnnvAP7S5QU=.e91023a2-089c-4541-86a5-ae8d4adeb99d@github.com> Message-ID: On Tue, 3 Jun 2025 12:31:51 GMT, Afshin Zafari wrote: > In ASAN built JDK, some gtests and some other JTREG tests in runtime/ErrorHandling also fail. Do we exclude these in another PR? or should they also be handled/excluded here? The 'some' word in the PR's title is not strict, IMO. Yes it is not strict ; I did mostly tests with ASAN on Linux x86_64 and Linux ppc64le so far . On x86_64 I saw a few more tests have issues with ASAN, but the intention of this PR was to just include the ones where it was clear to me what happens and where a 'fix' is not likely (and mostly also the ones I saw failing across the 2 OS/CPU platforms I mentioned). Maybe that's why we should better remove the exclusion of AOTCodeCompressedOopsTest , because this is not some kind of incompatibility of ASAN with special test requirements, but a real memory issue. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/25575#issuecomment-2938748695

From jbechberger at openjdk.org Wed Jun 4 06:34:29 2025
From: jbechberger at openjdk.org (Johannes Bechberger)
Date: Wed, 4 Jun 2025 06:34:29 GMT
Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v42]
In-Reply-To:
References: <3ALqkSc9a0HKOJrA6CW61v725SxX8FcLmasC8Wm4y24=.9d40a91c-a77a-442b-926a-e5785032c415@github.com>
Message-ID:

On Wed, 4 Jun 2025 05:28:21 GMT, Johannes Bechberger wrote:

>> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 281:
>>
>>> 279: stop_timer();
>>> 280: Atomic::store(&_stop_signals, true);
>>> 281: while (Atomic::load_acquire(&_active_signal_handlers) > 0) {
>>
>> There can be a race when `handle_timer_signal` has already passed `_stop_signals` check but has not yet incremented `_active_signal_handlers`.
>
> Any idea on how to fix it?

I added another `_static_stop_signals` field which should prevent this.

>> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 472:
>>
>>> 470:
>>> 471: void handle_timer_signal(int signo, siginfo_t* info, void* context) {
>>> 472: assert(_instance != nullptr, "invariant");
>>
>> There can be an arbitrary delay in async signal delivery.
>> It's unlikely, but not impossible for `_instance` to be deleted by the time signal handler is called. There should be a better way to synchronize with JFR shutdown.
>
> Any ideas? Or is it something for a later PR?

I added another `_static_stop_signals` field which should prevent this.
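For readers following the race discussed above (flag checked, counter not yet incremented), one common way to close that window is to announce the handler first and check the stop flag second. The sketch below illustrates that ordering with `std::atomic`; it is an assumption-laden illustration and may differ from what the `_static_stop_signals` change in the PR actually does. `SamplerGate` and its members are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical names; a standalone illustration, not HotSpot code.
struct SamplerGate {
  std::atomic<bool> stop_signals{false};
  std::atomic<int>  active_handlers{0};

  // Top of the signal handler: returns true if the handler may proceed.
  // The counter is incremented BEFORE the flag is read (both seq_cst), so a
  // stopper that has set the flag and then observes a zero counter knows
  // that any later handler will see the flag and back off.
  bool try_enter() {
    active_handlers.fetch_add(1);
    if (stop_signals.load()) {
      active_handlers.fetch_sub(1);
      return false;
    }
    return true;
  }

  // End of the signal handler, only after a successful try_enter().
  void leave() {
    int previous = active_handlers.fetch_sub(1);
    assert(previous > 0);
    (void)previous;
  }

  // Shutdown side: set the flag first, then poll quiescent() until true.
  void begin_stop() { stop_signals.store(true); }
  bool quiescent() const { return active_handlers.load() == 0; }
};
```

The shutdown path would call `begin_stop()`, disable the timers, and then loop until `quiescent()` returns true before freeing anything the handlers touch. Operations on lock-free atomics are safe to use from a signal handler, which is why the sketch avoids mutexes.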
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125756115 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125755428 From jbechberger at openjdk.org Wed Jun 4 06:34:30 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 06:34:30 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v42] In-Reply-To: <3ALqkSc9a0HKOJrA6CW61v725SxX8FcLmasC8Wm4y24=.9d40a91c-a77a-442b-926a-e5785032c415@github.com> References: <3ALqkSc9a0HKOJrA6CW61v725SxX8FcLmasC8Wm4y24=.9d40a91c-a77a-442b-926a-e5785032c415@github.com> Message-ID: On Wed, 4 Jun 2025 00:28:46 GMT, Andrei Pangin wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename autoadapt > > src/hotspot/share/jfr/support/jfrThreadLocal.cpp line 558: > >> 556: void JfrThreadLocal::set_cpu_timer(timer_t* timer) { >> 557: if (_cpu_timer == nullptr) { >> 558: _cpu_timer = JfrCHeapObj::new_array(1); > > `timer_t` is a primitive type, at most one machine word. Why extra indirection and allocation? @mgronlun wanted this indirection to move it abstract from implementation details ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125758074 From jbechberger at openjdk.org Wed Jun 4 07:00:51 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 07:00:51 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v43] In-Reply-To: References: Message-ID: <-BGoOClpsfsd4Q8Wq-H57L3tIvoaLGauYtRBEDPO-_w=.97e25e4f-879d-45f6-bd00-ad53e2463a8d@github.com> > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... 
different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... different chunk-sizes Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Improve ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/ae55610c..55c30aef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=42 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=41-42 Stats: 87 lines in 6 files changed: 26 ins; 10 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From kbarrett at openjdk.org Wed Jun 4 07:12:42 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 4 Jun 2025 07:12:42 GMT Subject: RFR: 8352565: Add native method implementation of Reference.get() [v8] In-Reply-To: References: Message-ID: > Please review this change which adds a native method providing the > implementation of Reference::get. Referece::get is an intrinsic candidate, so > this native method implementation is only used when the intrinsic is not. > > Currently there is intrinsic support by the interpreter, C1, C2, and graal, > which are always used. With this change we can later remove all the > per-platform interpreter intrinsic implementations, and might also remove the > C1 intrinsic implementation. > > Testing: > (1) mach5 tier1-6 normal (so using all the existing intrinsics). > (2) mach5 tier1-6 with interpreter and C1 Reference::get intrinsics disabled. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 13 additional commits since the last revision: - Merge branch 'master' into native-reference-get - make private native Reference.get0 the intrinsic - Merge branch 'master' into native-reference-get - Merge branch 'master' into native-reference-get - use new waitForRefProc, some tidying - Merge branch 'master' into native-reference-get - remove timeout by using waitForReferenceProcessing - make ill-timed gc in non-concurrent case less likely - fix test package use - add package decl to test - ... and 3 more: https://git.openjdk.org/jdk/compare/9578d341...98056a8b ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24315/files - new: https://git.openjdk.org/jdk/pull/24315/files/4387e2fe..98056a8b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24315&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24315&range=06-07 Stats: 49978 lines in 811 files changed: 26005 ins; 15101 del; 8872 mod Patch: https://git.openjdk.org/jdk/pull/24315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24315/head:pull/24315 PR: https://git.openjdk.org/jdk/pull/24315 From kbarrett at openjdk.org Wed Jun 4 07:12:42 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 4 Jun 2025 07:12:42 GMT Subject: RFR: 8352565: Add native method implementation of Reference.get() [v6] In-Reply-To: References: <5D6vakt8Q41_YF90LaGoxI0tECxo3hm_fiMCuXrpf-w=.363ecf9a-9421-482d-a101-a7ec1efd8b8e@github.com> <_99Geoayi09Ey7YT7qWw4pjMqbVUNxfKpFBwwI_EbHg=.e81158ae-813c-4015-94d6-4404eb756394@github.com> Message-ID: On Fri, 30 May 2025 19:30:50 GMT, Vladimir Ivanov wrote: >> Much of the point of this change is to let us later remove the interpreter/c1 >> intrinsics for this function. I think what you are saying is that might be >> tricky if `get()` is the intrinsic. So maybe I should just go ahead now with >> making the native `get0()` be the intrinsic. I'll take a look at it and see >> how widespread the renaming changes are. 
>> >> If `get0()` is the intrinsic, then I think that referenced snippet from the >> Compile ctor can go away? Rather than being changed to refer to the get0 >> intrinsic. > >> If get0() is the intrinsic, then I think that referenced snippet from the > Compile ctor can go away? > > Yes. OK, I've moved the intrinsification to get0. It adds a fair number of files, but the changes are mostly trivial renaming of "get" to "get0". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24315#discussion_r2125849565 From mbaesken at openjdk.org Wed Jun 4 07:23:21 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 4 Jun 2025 07:23:21 GMT Subject: RFR: 8358289: [asan] runtime/cds/appcds/aotCode/AOTCodeFlags.java reports heap-buffer-overflow in ArchiveBuilder [v3] In-Reply-To: <0lJeZV-WWYPigLaDj2bmwub-s9WzPwyEKgm2PfDatXA=.5267dfd2-08b9-40c1-8602-f153bd18b6b8@github.com> References: <0lJeZV-WWYPigLaDj2bmwub-s9WzPwyEKgm2PfDatXA=.5267dfd2-08b9-40c1-8602-f153bd18b6b8@github.com> Message-ID: On Tue, 3 Jun 2025 19:30:45 GMT, Vladimir Kozlov wrote: > Waiting confirmation from @MBaesken . The issue is fixed now! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25604#issuecomment-2938902641 From mgronlun at openjdk.org Wed Jun 4 08:17:31 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 4 Jun 2025 08:17:31 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v42] In-Reply-To: References: <3ALqkSc9a0HKOJrA6CW61v725SxX8FcLmasC8Wm4y24=.9d40a91c-a77a-442b-926a-e5785032c415@github.com> Message-ID: On Wed, 4 Jun 2025 06:29:59 GMT, Johannes Bechberger wrote: >> Any ideas? Or is it something for a later PR? > > I added another `_static_stop_signals` field which should prevent this. The _instance is only ever deleted in case a JFR startup attempt fails as part of JfrRecorder::create(). The sampler must have a rate and become enrolled to serve clients (by installing timers). 
The rate is set post JfrRecorder::create() using the setting system, which implies that _instance != nullptr should be invariant.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125976370

From epeter at openjdk.org Wed Jun 4 08:24:23 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 4 Jun 2025 08:24:23 GMT
Subject: RFR: 8337666: AArch64: SHA3 GPR intrinsic [v2]
In-Reply-To:
References: <4gjCTX5GeYnhLOggsT2koqaeM1DdlJnwcQdSiR-3cZk=.beb2eccc-ac6d-48bb-a828-e58383799ea5@github.com>
Message-ID:

On Fri, 30 May 2025 18:24:22 GMT, Dmitry Chuyko wrote:

>> Dmitry Chuyko has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>>
>> - Copyright year
>> - Review suggestions
>> - Merge master
>> - Delete empty line
>> - SHA3 GPR intrinsic & tests
>
> GPR rol, rax and rax1 pseudo instructions were added in MacroAssembler.
>
> Main loop and "bcax"/Chi parts were extracted as functions.
>
> Main loop counter was put in fp register with fp decrement and fcmp (this variant does have a positive impact).
>
> Updated results from Graviton machines (Linux, intrinsic vs C2):
>
> Benchmark              (digesterName)  (length)  Pct
> G2
> MessageDigests.digest  SHA3-256        64        +20.8%
> MessageDigests.digest  SHA3-256        16384     +27.2%
> G3
> MessageDigests.digest  SHA3-256        64        +12.8%
> MessageDigests.digest  SHA3-256        16384     +15.7%
> G4
> MessageDigests.digest  SHA3-256        64        +9.7%
> MessageDigests.digest  SHA3-256        16384     +13.2%

@dchuyko Thanks for working on this! I have quickly scanned the code, and it looks reasonable, though I am not an intrinsics specialist. I'll now run some internal testing, feel free to ping me again in 24h.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24260#issuecomment-2939079640 From epeter at openjdk.org Wed Jun 4 08:24:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 08:24:24 GMT Subject: RFR: 8337666: AArch64: SHA3 GPR intrinsic [v7] In-Reply-To: <3hGFnUsyrN809lwWuqr7dyxfoCm0F2ILSB-yJV5Hfvo=.1a2048d4-d131-4354-a629-d75f206dda42@github.com> References: <3hGFnUsyrN809lwWuqr7dyxfoCm0F2ILSB-yJV5Hfvo=.1a2048d4-d131-4354-a629-d75f206dda42@github.com> Message-ID: On Tue, 3 Jun 2025 16:31:08 GMT, Dmitry Chuyko wrote: >> This is an implementation of SHA3 intrinsics for AArch64 that operates GPRs. It follows the Java implementation algorithm but eagerly uses available registers. For example, FP+R18 are used when it's allowed. On simpler cores like RPi3 or Surface Pro it is 23-53% faster than C2 compiled version; on Graviton 3 it is 8-14% faster than C2 compiled version (which is faster than the current intrinsic); on Apple Silicon it is faster than C2 compiled version but slower than the ARMv8.2-SHA intrinsic. Improvements on a particular CPU depend on the input length. 
For instance, for Graviton 2: >> >> >> Benchmark (ops/ms) (digesterName) (length) G2 >> MessageDigests.digest SHA3-256 64 28.28% >> MessageDigests.digest SHA3-256 16384 53.58% >> MessageDigests.digest SHA3-512 64 27.97% >> MessageDigests.digest SHA3-512 16384 43.90% >> MessageDigests.getAndDigest SHA3-256 64 26.18% >> MessageDigests.getAndDigest SHA3-256 16384 52.82% >> MessageDigests.getAndDigest SHA3-512 64 24.73% >> MessageDigests.getAndDigest SHA3-512 16384 44.31% >> >> >> (results for intermediate input lengths look like steps) >> >> On Graviton 4 there is still a noticeable difference between the proposed implementation and C2 generated code: >> >> >> Benchmark (digesterName) (length) Pct >> MessageDigests.digest SHA3-256 64 8.3% >> MessageDigests.digest SHA3-256 16384 11% >> MessageDigests.digest SHA3-512 64 8.4% >> MessageDigests.digest SHA3-512 16384 11.5% >> MessageDigests.getAndDigest SHA3-256 64 7.2% >> MessageDigests.getAndDigest SHA3-256 16384 11% >> MessageDigests.getAndDigest SHA3-512 64 7.3% >> MessageDigests.getAndDigest SHA3-512 16384 11.6% >> >> >> and the version that uses the extension is ~1.8x slower than C2 >> >> Existing intrinsic implementation is put under a flag `UseSIMDForSHA3Intrinsic` which is on by default where the intrinsic is enabled currently. >> >> Sanity tests were modified to cover new intrinsic variants (`-XX:-UseSIMDForSHA3Intrinsic -XX:+-PreserveFramePointer`) on aarch64 hw. Existing test cases where intrinsic is enabled are executed with `-XX:+IgnoreUnrecognizedVMOptions -XX:+UseSIMDForSHA3Intrinsic`, on platforms where the sha3 extension is missing they still are cut off by isSHA3IntrinsicAvailable() predicate. >> >> The original PR https://github.com/openjdk/jdk/pull/20422 has been auto-closed and the branch has... 
> > Dmitry Chuyko has updated the pull request incrementally with one additional commit since the last revision: > > No imm masking in rolw A nit: can you please fix the alignment issue in the PR description's benchmark results? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24260#issuecomment-2939083282 From jbechberger at openjdk.org Wed Jun 4 08:21:33 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 08:21:33 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v42] In-Reply-To: References: <3ALqkSc9a0HKOJrA6CW61v725SxX8FcLmasC8Wm4y24=.9d40a91c-a77a-442b-926a-e5785032c415@github.com> Message-ID: On Wed, 4 Jun 2025 08:14:11 GMT, Markus Gr?nlund wrote: >> I added another `_static_stop_signals` field which should prevent this. > > The _instance is only ever deleted in case a JFR startup attempt fails as part of JfrRecorder::create(). The sampler must have a rate and become enrolled to serve clients (by installing timers). The rate is set post JfrRecorder::create() using the setting system, which implies that _instance != nullptr should be invariant. Yes, you're right. I'll update the code and combine `_active_signal_handlers` and `_stop_signals` in one, so that a CAS loop prevents `_active_signal_handlers` from being incremented when `_stop_signals` is true. This should solve the other data race. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2125985412 From epeter at openjdk.org Wed Jun 4 08:33:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Jun 2025 08:33:20 GMT Subject: RFR: 8358032: Use crypto pmull for CRC32/CRC32C intrinsics on Ampere CPU [v2] In-Reply-To: References: Message-ID: <5BGf1eIVeMQIaLXIoOvcuQlBiaPeWojv8HAnfuOiW_E=.8c39a6ac-c0f2-40fe-bef3-be0a6bd71c07@github.com> On Wed, 4 Jun 2025 03:30:11 GMT, Liming Liu wrote: >> This PR is to enable the use of crypto pmull for CRC32/CRC32C intrinsics on Ampere CPU. 
There is an option UseCryptoPmullForCRC32 that can enable crypto pmull, but directly enabling it on Ampere CPU will cause the following problems. >> >> 1. There will be regressions (-14% ~ -8%) on Ampere1 when the length is 64. When <= 128, both kernel_crc32_using_crc32 and kernel_crc32_using_crypto_pmull use the loop labeled as CRC_by32_loop, but their implements are a little different, and the loop in kernel_crc32_using_crc32 is better at hiding latency on Ampere1. So this PR takes the loop in kernel_crc32_using_crc32 to kernel_crc32_using_crypto_pmull, and does the same for CRC32C intrinsic. >> >> 2. The intrinsics only use crypto pmull when the length is higher than 383, while the loop in kernel_crc32_common_fold_using_crypto_pmull looks able to handle 256, and if it handles 256 on Ampere1, the improvements can be as high as 110% compared with kernel_crc32_using_crc32/kernel_crc32c_using_crc32c. However, there are regressions (~-6%) on Neoverse V1 when the length is 256. So this PR introduces a new option named CryptoPmullForCRC32LowLimit. It defaults to 256 since the code could handle 256, while it is set to 384 for V1/V2 to keep the old behavior on these platforms. >> >> The performance regressions and improvements were measured with the following microbenchmarks: >> org.openjdk.bench.java.util.TestCRC32.testCRC32Update >> org.openjdk.bench.java.util.TestCRC32C.testCRC32CUpdate >> >> Ran the following JTReg tests on Ampere1 and did not find problems: >> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java >> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java > > Liming Liu has updated the pull request incrementally with one additional commit since the last revision: > > Make it be a diagnostic flag @limingliu-ampere Thanks for working on this! Generally looks reasonable to me as a non-expert in crypto intrinsics. But we definitely need an expert to approve this in the end. I have a few comments below.
Also: it would be nice to have a sanity test where you use that new flag. It could also be an additional run in an existing test (that's probably even better). You may want to run it with a few different values, including non-multiple of `128` just to sanity check the alignment correction as well. I don't know how much runtime that would add, so that should be checked before going too crazy. Having different values for the flag helps us to simulate the behavior of other hardware for example, and that can be quite useful in general. What do you think? src/hotspot/cpu/aarch64/globals_aarch64.hpp line 95: > 93: "Minimum size in bytes when Crypto PMULL will be used." \ > 94: "Value must be a multiple of 128.") \ > 95: range(256, max_jint) \ Is it sane to have negative values? If not, use `uintx`... or maybe even just `uint`? src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4335: > 4333: assert_different_registers(crc, buf, len, tmp0, tmp1, tmp2); > 4334: > 4335: subs(tmp0, len, CryptoPmullForCRC32LowLimit); Would it make sense to have another alignment sanity check here? It would be both helpful to make sure nobody later breaks your assumption, and could also be helpful for the reader to see the `128` alignment immediately. ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25609#pullrequestreview-2895780805 PR Review Comment: https://git.openjdk.org/jdk/pull/25609#discussion_r2125999298 PR Review Comment: https://git.openjdk.org/jdk/pull/25609#discussion_r2126003055 From mgronlun at openjdk.org Wed Jun 4 08:43:37 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Wed, 4 Jun 2025 08:43:37 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v42] In-Reply-To: <3ALqkSc9a0HKOJrA6CW61v725SxX8FcLmasC8Wm4y24=.9d40a91c-a77a-442b-926a-e5785032c415@github.com> References: <3ALqkSc9a0HKOJrA6CW61v725SxX8FcLmasC8Wm4y24=.9d40a91c-a77a-442b-926a-e5785032c415@github.com> Message-ID: On Wed, 4 Jun 2025 03:07:52 GMT, Andrei Pangin wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename autoadapt > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 606: > >> 604: void JfrCPUTimeThreadSampler::init_timers() { >> 605: // install sig handler for sig >> 606: PosixSignals::install_generic_signal_handler(SIG, (void*)::handle_timer_signal); > > SIGPROF is also used by external profilers. Need to check if SIGPROF handler is already installed and warn user. This is *very* important to have a robust failure mechanism when existing handlers are already installed. Why? JFR can be turned on dynamically from the outside, at any time, during runtime. A lot of agents could have installed their handlers by then. Please describe how you intend to handle the case where someone starts JFR late during runtime and the signal handler cannot be installed. > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.hpp line 128: > >> 126: static void send_lost_event(const JfrTicks& time, traceid tid, s4 lost_samples); >> 127: >> 128: // Trigger sampling while a thread is not in a safepoint, from a seperate thread > > typo: separate And again, it's not "sampling" that is triggered.
It is async processing of the queue holding existing samples. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2126029865 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2126025355 From aph at openjdk.org Wed Jun 4 08:45:21 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 4 Jun 2025 08:45:21 GMT Subject: RFR: 8358032: Use crypto pmull for CRC32/CRC32C intrinsics on Ampere CPU [v2] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 03:30:11 GMT, Liming Liu wrote: >> This PR is to enable the use of crypto pmull for CRC32/CRC32C intrinsics on Ampere CPU. There is an option UseCryptoPmullForCRC32 that can enable crypto pmull, but directly enabling it on Ampere CPU will cause the following problems. >> >> 1. There will be regressions (-14% ~ -8%) on Ampere1 when the length is 64. When <= 128, both kernel_crc32_using_crc32 and kernel_crc32_using_crypto_pmull use the loop labeled as CRC_by32_loop, but their implements are a little different, and the loop in kernel_crc32_using_crc32 is better at hiding latency on Ampere1. So this PR takes the loop in kernel_crc32_using_crc32 to kernel_crc32_using_crypto_pmull, and does the same for CRC32C intrinsic. >> >> 2. The intrinsics only use crypto pmull when the length is higher than 383, while the loop in kernel_crc32_common_fold_using_crypto_pmull looks able to handle 256, and if it handles 256 on Ampere1, the improvements can be as high as 110% compared with kernel_crc32_using_crc32/kernel_crc32c_using_crc32c. However, there are regressions (~-6%) on Neoverse V1 when the length is 256. So this PR introduces a new option named CryptoPmullForCRC32LowLimit. It defaults to 256 since the code could handle 256, while it is set to 384 for V1/V2 to keep the old behavior on these platforms. 
>> >> The performance regressions and improvements were measured with the following microbenchmarks: >> org.openjdk.bench.java.util.TestCRC32.testCRC32Update >> org.openjdk.bench.java.util.TestCRC32C.testCRC32CUpdate >> >> Ran the following JTReg tests on Ampere1 and did not find problems: >> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java >> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java > > Liming Liu has updated the pull request incrementally with one additional commit since the last revision: > > Make it be a diagnostic flag src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4355: > 4353: add(buf, buf, 32); > 4354: crc32x(crc, crc, tmp2); > 4355: subs(len, len, 32); What is the point of these changes? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25609#discussion_r2126035063 From iveresov at openjdk.org Wed Jun 4 08:46:11 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 4 Jun 2025 08:46:11 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder [v4] In-Reply-To: References: Message-ID: > Simplify KlassTrainingData constructor. The lines in question come from the old pre-CDS world. They are not needed anymore. 
Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: More changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25623/files - new: https://git.openjdk.org/jdk/pull/25623/files/f8a9b4a3..5a7b128f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25623&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25623&range=02-03 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25623.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25623/head:pull/25623 PR: https://git.openjdk.org/jdk/pull/25623 From iveresov at openjdk.org Wed Jun 4 08:46:13 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 4 Jun 2025 08:46:13 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder [v3] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 00:53:21 GMT, Igor Veresov wrote: >> Simplify KlassTrainingData constructor. The lines in question come from the old pre-CDS world. They are not needed anymore. > > Igor Veresov has updated the pull request incrementally with two additional commits since the last revision: > > - More changes > - Use dedicated OopStorage Ok, testing was clean. Please take another look. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25623#issuecomment-2939150820 From aph at openjdk.org Wed Jun 4 08:50:17 2025 From: aph at openjdk.org (Andrew Haley) Date: Wed, 4 Jun 2025 08:50:17 GMT Subject: RFR: 8358032: Use crypto pmull for CRC32/CRC32C intrinsics on Ampere CPU [v2] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 08:42:58 GMT, Andrew Haley wrote: >> Liming Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> Make it be a diagnostic flag > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4355: > >> 4353: add(buf, buf, 32); >> 4354: crc32x(crc, crc, tmp2); >> 4355: subs(len, len, 32); > > What is the point of these changes? To be more precise: converting these adjustments to post-increment operations isn't obviously an improvement on AArch64 generally. How does it help? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25609#discussion_r2126044000 From mdoerr at openjdk.org Wed Jun 4 08:53:17 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 4 Jun 2025 08:53:17 GMT Subject: RFR: 8357155: [asan] ZGC does not work (x86_64 and ppc64) [v3] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 05:27:06 GMT, Matthias Baesken wrote: >> Many (all?) ZGC related jtreg tests do not work when the JDK is built with address sanitizer asan enabled (configure flag --enable-asan). >> This can be seen on SUSE Linux x86_64 and also on ppc64le , opt binaries were used. >> It has been suggested to do a workaround - 'But I think that simply adapting the zAddress_[...].cpp implementations to always select the largest heap base would go a long way for providing ASAN compatibility.' >> This seems to work nicely on x86_64 and ppc64le, however the zgc related tests still fail on Linux aarch64 (should I exclude this platform from my patch?) . 
> > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Add comment requested by mdoerr Thanks! LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25549#pullrequestreview-2895870539 From shade at openjdk.org Wed Jun 4 09:01:24 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Jun 2025 09:01:24 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder [v4] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 08:46:11 GMT, Igor Veresov wrote: >> Simplify KlassTrainingData constructor. The lines in question come from the old pre-CDS world. They are not needed anymore. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > More changes Right off the bat, before I look at the rest of it: I don't think there is a need to introduce another OopStorage class just for these handles. We already see it would probably require touchups in other code that enumerates OopStorages. So instead, use `VM Global` one? I.e. do: handle = OopHandle(Universe::vm_global(), obj); Also I cannot spot where we clean these. Note that for `OopHandle`-s, you have to explicitly call `.release`, likely in `KlassTrainingData` destructor. ------------- PR Review: https://git.openjdk.org/jdk/pull/25623#pullrequestreview-2895896625 From shade at openjdk.org Wed Jun 4 09:05:16 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Jun 2025 09:05:16 GMT Subject: RFR: 8358339: Handle MethodCounters::_method backlinks after JDK-8355003 In-Reply-To: References: <0FmJVTYdAq7vmsOi4pi9NRKHm5MfmIrotPucldzsZj4=.b1335bca-f439-4c18-aa1c-6c69548d095d@github.com> Message-ID: On Tue, 3 Jun 2025 12:00:03 GMT, Coleen Phillimore wrote: >> Found this when reading mainline-vs-premain webrev. 
[JDK-8355003](https://bugs.openjdk.org/browse/JDK-8355003) introduced a backlink to `Method*` in `MethodCounters`. I believe we need to handle that backlink at least in `CodeBuffer::finalize_oop_references()`. premain does this, while mainline does not. Also, amusingly, we have `MethodCounters::is_methodCounters`, but not the super-class `Metadata::is_methodCounters`. >> >> I pulled in the hunks that use `is_methodCounters()` and `MethodCounters::method()` from premain into this PR. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `runtime/cds` >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > And MethodCounters shouldn't be inherited from Metadata, they're inherited from MetaspaceObj in mainline. We want to avoid virtual function pointers in this type. Before I proceed anywhere with this, I need to understand what @coleenp saw in all this :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25599#issuecomment-2939215148 From mbaesken at openjdk.org Wed Jun 4 09:09:22 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 4 Jun 2025 09:09:22 GMT Subject: Integrated: 8357155: [asan] ZGC does not work (x86_64 and ppc64) In-Reply-To: References: Message-ID: On Fri, 30 May 2025 12:18:46 GMT, Matthias Baesken wrote: > Many (all?) ZGC related jtreg tests do not work when the JDK is built with address sanitizer asan enabled (configure flag --enable-asan). > This can be seen on SUSE Linux x86_64 and also on ppc64le , opt binaries were used. > It has been suggested to do a workaround - 'But I think that simply adapting the zAddress_[...].cpp implementations to always select the largest heap base would go a long way for providing ASAN compatibility.' > This seems to work nicely on x86_64 and ppc64le, however the zgc related tests still fail on Linux aarch64 (should I exclude this platform from my patch?) . This pull request has now been integrated.
Changeset: cd16b689 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/cd16b6896222a623dc99b9e63bb917a9d2980e88 Stats: 9 lines in 2 files changed: 9 ins; 0 del; 0 mod 8357155: [asan] ZGC does not work (x86_64 and ppc64) Co-authored-by: Axel Boldt-Christmas Reviewed-by: mdoerr, aboldtch ------------- PR: https://git.openjdk.org/jdk/pull/25549 From mbaesken at openjdk.org Wed Jun 4 09:09:21 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 4 Jun 2025 09:09:21 GMT Subject: RFR: 8357155: [asan] ZGC does not work (x86_64 and ppc64) [v3] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 05:27:06 GMT, Matthias Baesken wrote: >> Many (all?) ZGC related jtreg tests do not work when the JDK is built with address sanitizer asan enabled (configure flag --enable-asan). >> This can be seen on SUSE Linux x86_64 and also on ppc64le , opt binaries were used. >> It has been suggested to do a workaround - 'But I think that simply adapting the zAddress_[...].cpp implementations to always select the largest heap base would go a long way for providing ASAN compatibility.' >> This seems to work nicely on x86_64 and ppc64le, however the zgc related tests still fail on Linux aarch64 (should I exclude this platform from my patch?) . > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Add comment requested by mdoerr Hi Axel and Martin, thanks for the reviews ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25549#issuecomment-2939228221 From ayang at openjdk.org Wed Jun 4 09:11:04 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 4 Jun 2025 09:11:04 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v10] In-Reply-To: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: > This patch refines Parallel's sizing strategy to improve overall memory management and performance. > > The young generation layout has been reconfigured from the previous `eden-from/to` arrangement to a new `from/to-eden` order. This new layout facilitates young generation resizing, since we perform resizing after a successful young GC when all live objects are located at the beginning of the young generation. Previously, resizing was often inhibited by live objects residing in the middle of the young generation (from-space). The new layout is illustrated in `parallelScavengeHeap.hpp`. > > `NumberSeq` is now used to track various runtime metrics, such as minor/major GC pause durations, promoted/survived bytes after a young GC, highest old generation usage, etc. This tracking primarily lives in `AdaptiveSizePolicy` and its subclass `PSAdaptiveSizePolicy`. > > GC overhead checking, which was previously entangled with adaptive resizing logic, has been extracted and is now largely encapsulated in `ParallelScavengeHeap::is_gc_overhead_limit_reached`. > > ## Performance evaluation > > - SPECjvm2008-Compress shows ~8% improvement on Linux/AArch64 and Linux/x64 (restoring the regression reported in [JDK-8332485](https://bugs.openjdk.org/browse/JDK-8332485) and [JDK-8338689](https://bugs.openjdk.org/browse/JDK-8338689)). 
> - Fixes the surprising behavior when using a non-default (smaller) value of `GCTimeRatio` with Heapothesys/Hyperalloc, as discussed in [this thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050146.html). > - Performance is mostly neutral across other tested benchmarks: **DaCapo**, **SPECjbb2005**, **SPECjbb2015**, **SPECjvm2008**, and **CacheStress**. The number of young-gc sometimes goes up a bit and the total heap-size decreases a bit, because promotion-size-to-old-gen goes down with the more effective eden/survivor-space resizing. > > PS: I have opportunistically set the obsolete/expired version to 25/26 for now. I will update them accordingly before merging. > > Test: tier1-8 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: - revert-aliases - Merge branch 'master' into pgc-size-policy - merge - merge-fix - merge - Merge branch 'master' into pgc-size-policy - Merge branch 'master' into pgc-size-policy - review - Merge branch 'master' into pgc-size-policy - review - ... 
and 4 more: https://git.openjdk.org/jdk/compare/ab235000...72645267 ------------- Changes: https://git.openjdk.org/jdk/pull/25000/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25000&range=09 Stats: 4373 lines in 31 files changed: 522 ins; 3452 del; 399 mod Patch: https://git.openjdk.org/jdk/pull/25000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25000/head:pull/25000 PR: https://git.openjdk.org/jdk/pull/25000 From kevinw at openjdk.org Wed Jun 4 09:31:19 2025 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 4 Jun 2025 09:31:19 GMT Subject: RFR: 8338977: Parallel: Improve heap resizing heuristics [v10] In-Reply-To: References: <9-QvRzQoMkyGxgiTAFpkizJOG8unI4JYBLYu7gigMMQ=.7257790b-1a27-4925-b88a-87c03b3ea536@github.com> Message-ID: On Wed, 4 Jun 2025 09:11:04 GMT, Albert Mingkun Yang wrote: >> This patch refines Parallel's sizing strategy to improve overall memory management and performance. >> >> The young generation layout has been reconfigured from the previous `eden-from/to` arrangement to a new `from/to-eden` order. This new layout facilitates young generation resizing, since we perform resizing after a successful young GC when all live objects are located at the beginning of the young generation. Previously, resizing was often inhibited by live objects residing in the middle of the young generation (from-space). The new layout is illustrated in `parallelScavengeHeap.hpp`. >> >> `NumberSeq` is now used to track various runtime metrics, such as minor/major GC pause durations, promoted/survived bytes after a young GC, highest old generation usage, etc. This tracking primarily lives in `AdaptiveSizePolicy` and its subclass `PSAdaptiveSizePolicy`. >> >> GC overhead checking, which was previously entangled with adaptive resizing logic, has been extracted and is now largely encapsulated in `ParallelScavengeHeap::is_gc_overhead_limit_reached`. 
>> >> ## Performance evaluation >> >> - SPECjvm2008-Compress shows ~8% improvement on Linux/AArch64 and Linux/x64 (restoring the regression reported in [JDK-8332485](https://bugs.openjdk.org/browse/JDK-8332485) and [JDK-8338689](https://bugs.openjdk.org/browse/JDK-8338689)). >> - Fixes the surprising behavior when using a non-default (smaller) value of `GCTimeRatio` with Heapothesys/Hyperalloc, as discussed in [this thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050146.html). >> - Performance is mostly neutral across other tested benchmarks: **DaCapo**, **SPECjbb2005**, **SPECjbb2015**, **SPECjvm2008**, and **CacheStress**. The number of young-gc sometimes goes up a bit and the total heap-size decreases a bit, because promotion-size-to-old-gen goes down with the more effective eden/survivor-space resizing. >> >> PS: I have opportunistically set the obsolete/expired version to 25/26 for now. I will update them accordingly before merging. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - revert-aliases > - Merge branch 'master' into pgc-size-policy > - merge > - merge-fix > - merge > - Merge branch 'master' into pgc-size-policy > - Merge branch 'master' into pgc-size-policy > - review > - Merge branch 'master' into pgc-size-policy > - review > - ... and 4 more: https://git.openjdk.org/jdk/compare/ab235000...72645267 Thanks for the aliasmap update, looks good. I think alias sun.gc.policy.boundaryMoved is removed here as it's already redundant, the rest all match with the counter being removed in the change. There is a case for removing those old e.g. 1.4.1 aliases separately, in a future change. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25000#issuecomment-2939297086 From iveresov at openjdk.org Wed Jun 4 09:52:17 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 4 Jun 2025 09:52:17 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder [v4] In-Reply-To: References: Message-ID: <3yChSmr2Gswo4p31Bms_biStRj97VCLkCPLtqIGFdb4=.ddb84aff-523b-4836-b15c-1a16f3bee733@github.com> On Wed, 4 Jun 2025 08:46:11 GMT, Igor Veresov wrote: >> Simplify KlassTrainingData constructor. The lines in question come from the old pre-CDS world. They are not needed anymore. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > More changes We don't need to release them. KTDs are never destroyed. They just die with the process. As for OopStorage, @coleenp wants it. It gives a bit of an advantage that we can remove the handle field from KTD (since again, we don't ever need to free them). ------------- PR Comment: https://git.openjdk.org/jdk/pull/25623#issuecomment-2939363346 From iveresov at openjdk.org Wed Jun 4 09:59:15 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 4 Jun 2025 09:59:15 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder [v4] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 08:46:11 GMT, Igor Veresov wrote: >> Simplify KlassTrainingData constructor. The lines in question come from the old pre-CDS world. They are not needed anymore. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > More changes I kind of need to push it today before the fork. Let's try making changes to this minimal. I'm also fine reverting back to before @coleenp suggested OopStorage. And we can address the remaining concerns later.
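The lifetime question in this thread, explicit release versus data that simply lives until the process exits, can be illustrated with a toy model. The sketch below is a stand-in built on assumptions: `ToyStorage`, `ToyHandle` and `ShortLivedOwner` are hypothetical names, and none of this is HotSpot's OopStorage or OopHandle API. It only mirrors the point made in the review that a handle does not give its slot back unless release is called.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a global handle table (hypothetical, not OopStorage).
class ToyStorage {
 public:
  // Hand out a slot for a non-null object pointer and return its index.
  size_t allocate(const void* obj) {
    assert(obj != nullptr);
    _slots.push_back(obj);
    ++_live;
    return _slots.size() - 1;
  }
  // Give a slot back; nothing calls this automatically.
  void release(size_t index) {
    assert(index < _slots.size() && _slots[index] != nullptr);
    _slots[index] = nullptr;
    --_live;
  }
  size_t live() const { return _live; }

 private:
  std::vector<const void*> _slots;
  size_t _live = 0;
};

// Toy handle (hypothetical): deliberately has no destructor that releases.
class ToyHandle {
 public:
  ToyHandle(ToyStorage* storage, const void* obj)
      : _storage(storage), _index(storage->allocate(obj)) {}
  // Explicit release; safe to call more than once.
  void release() {
    if (_storage != nullptr) {
      _storage->release(_index);
      _storage = nullptr;
    }
  }

 private:
  ToyStorage* _storage;
  size_t _index;
};

// An owner that can be destroyed during the run has to release explicitly.
struct ShortLivedOwner {
  ToyHandle handle;
  ShortLivedOwner(ToyStorage* s, const void* obj) : handle(s, obj) {}
  ~ShortLivedOwner() { handle.release(); }
};
```

An owner that is never destroyed can skip the destructor, at the cost of its slot staying occupied until the process exits.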
------------- PR Comment: https://git.openjdk.org/jdk/pull/25623#issuecomment-2939386000 From shade at openjdk.org Wed Jun 4 09:59:16 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Jun 2025 09:59:16 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder [v4] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 09:56:20 GMT, Igor Veresov wrote: > I kind of need to push it today before the fork. Let's try making changes to this minimal. I'm also fine reverting back to before @coleenp suggested OopStorage. And we can address the remaining concerns later. Yeah, let's do OopStorage rewrite as the followup. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25623#issuecomment-2939387077 From iveresov at openjdk.org Wed Jun 4 10:14:31 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 4 Jun 2025 10:14:31 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder [v5] In-Reply-To: References: Message-ID: <1QNHiJnC7fE8K_KX5gK9VP-OsKNPAqyJfxFjOyfJyP4=.40f70cdb-11d9-4eec-9b25-ec2714fad601@github.com> > Simplify KlassTrainingData constructor. The lines in question come from the old pre-CDS world. They are not needed anymore. 
Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Undo OopStorage changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25623/files - new: https://git.openjdk.org/jdk/pull/25623/files/5a7b128f..a5693d69 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25623&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25623&range=03-04 Stats: 33 lines in 5 files changed: 8 ins; 21 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25623.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25623/head:pull/25623 PR: https://git.openjdk.org/jdk/pull/25623 From iveresov at openjdk.org Wed Jun 4 10:14:32 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 4 Jun 2025 10:14:32 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder [v4] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 08:46:11 GMT, Igor Veresov wrote: >> Simplify KlassTrainingData constructor. The lines in question come from the old pre-CDS world. They are not needed anymore. > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > More changes Ok, I reverted this to before the OopStorage changes. And filed https://bugs.openjdk.org/browse/JDK-8358580 to rethink it later. @coleenp are you ok with that? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25623#issuecomment-2939433502 From jbechberger at openjdk.org Wed Jun 4 11:13:16 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 11:13:16 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v44] In-Reply-To: References: Message-ID: <35LXUV5UP0dcnU2ImfP7ny2SyPmJBTYhRT6JbADqWA4=.22d4360e-639c-4e65-86a3-62aad45a2606@github.com> > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509).
> > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... different chunk-sizes Johannes Bechberger has updated the pull request incrementally with two additional commits since the last revision: - Add error message on signal handler install failure - Fix signal handler synchronization ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/55c30aef..4a258e96 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=43 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=42-43 Stats: 71 lines in 2 files changed: 44 ins; 19 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From jbechberger at openjdk.org Wed Jun 4 11:13:17 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 11:13:17 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v42] In-Reply-To: References: <3ALqkSc9a0HKOJrA6CW61v725SxX8FcLmasC8Wm4y24=.9d40a91c-a77a-442b-926a-e5785032c415@github.com> Message-ID: On Wed, 4 Jun 2025 08:40:34 GMT, Markus Grönlund wrote: >> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 606: >> >>> 604: void JfrCPUTimeThreadSampler::init_timers() { >>> 605: // install sig handler for sig >>> 606: PosixSignals::install_generic_signal_handler(SIG, (void*)::handle_timer_signal); >> >> SIGPROF is also used by external profilers. Need to check if SIGPROF handler is already installed and warn user.
> > It is *very* important to have a robust failure mechanism when existing handlers are already installed. Why? JFR can be turned on dynamically from the outside, at any time, during runtime. A lot of agents could have installed their handlers by then. > > Please describe how you intend to handle the case where someone starts JFR late during runtime and the signal handler cannot be installed. I added a log_error to tell the user. >> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.hpp line 128: >> >>> 126: static void send_lost_event(const JfrTicks& time, traceid tid, s4 lost_samples); >>> 127: >>> 128: // Trigger sampling while a thread is not in a safepoint, from a seperate thread >> >> typo: separate > > And again, it's not "sampling" that is triggered. It is async processing of the queue holding existing samples. I removed the comment, as the method name itself is pretty self-explanatory. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2126330036 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2126332382 From jbechberger at openjdk.org Wed Jun 4 11:18:52 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 11:18:52 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v45] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... different chunk-sizes Johannes Bechberger has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 127 commits: - Merge branch 'master' into parttimenerd_cooperative_cpu_time_sampler - Add error message on signal handler install failure - Fix signal handler synchronization - Improve - Rename autoadapt - Make process_cpu_time_request private and move up - Reorder condition - Tiny refactoring - Restrict threads for which timers are created - Fix tiny mistake - ... and 117 more: https://git.openjdk.org/jdk/compare/7838321b...4fd4b673 ------------- Changes: https://git.openjdk.org/jdk/pull/25302/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=44 Stats: 2308 lines in 39 files changed: 2164 ins; 128 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From jbechberger at openjdk.org Wed Jun 4 11:28:51 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 11:28:51 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v46] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... 
different chunk-sizes Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Improve error message ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/4fd4b673..8fe07614 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=45 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=44-45 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From apangin at openjdk.org Wed Jun 4 11:28:51 2025 From: apangin at openjdk.org (Andrei Pangin) Date: Wed, 4 Jun 2025 11:28:51 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v42] In-Reply-To: References: <3ALqkSc9a0HKOJrA6CW61v725SxX8FcLmasC8Wm4y24=.9d40a91c-a77a-442b-926a-e5785032c415@github.com> Message-ID: On Wed, 4 Jun 2025 05:26:42 GMT, Johannes Bechberger wrote: >> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 161: >> >>> 159: return 0; >>> 160: } >>> 161: return os::active_processor_count() * 1000000000.0 / rate; >> >> If sampling period is configured as an absolute number in milliseconds, this value must be passed as is. >> Double conversion via `Runtime.availableProcessors()` / `active_processor_count()` is unobvious and error-prone. First, because of asymmetry: e.g. `Runtime.availableProcessors()` may be redefined by an agent so that its value is not aligned with `active_processor_count()`. Second, because number of available processors may change at runtime, e.g., by adjusting cgroup quotas. > > Is this something for a later PR? I'm OK with fixing this separately. 
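(Illustration, not code from the PR.) The conversion discussed above can be sketched as follows. This is a minimal sketch that assumes the quoted expression turns a requested process-wide sample rate into a per-thread timer period in nanoseconds; the function names, the guard for a non-positive rate, and the idea of a separate pass-through path for periods given in milliseconds are hypothetical and only mirror the review suggestion.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the quoted expression
//   os::active_processor_count() * 1000000000.0 / rate
// 'rate' is assumed to be samples per second; 'processors' stands in for
// os::active_processor_count(). Returning 0 for a non-positive rate is an assumption.
inline int64_t period_ns_from_rate(double rate, int processors) {
  if (rate <= 0.0) {
    return 0;
  }
  return static_cast<int64_t>(processors * 1000000000.0 / rate);
}

// Hypothetical pass-through for a period configured as an absolute number of
// milliseconds: no processor count is involved, so the value cannot drift when
// the number of available processors changes at runtime.
inline int64_t period_ns_from_millis(int64_t period_ms) {
  return period_ms * 1000000;
}
```

With 4 processors and a rate of 1000 samples per second, the first helper yields a 4 ms period per thread, while a configured 20 ms period maps to 20,000,000 ns regardless of the processor count.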
>> src/hotspot/share/jfr/support/jfrThreadLocal.cpp line 558: >> >>> 556: void JfrThreadLocal::set_cpu_timer(timer_t* timer) { >>> 557: if (_cpu_timer == nullptr) { >>> 558: _cpu_timer = JfrCHeapObj::new_array(1); >> >> `timer_t` is a primitive type, at most one machine word. Why extra indirection and allocation? > > @mgronlun wanted this indirection to move it abstract from implementation details I don't see how it is an abstraction when the pointer still has concrete `timer_t` type. All POSIX timer functions accept `timer_t` rather than `timer_t*`. This is not a big issue, though, just a minor inefficiency. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2126360330 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2126304082 From apangin at openjdk.org Wed Jun 4 11:28:52 2025 From: apangin at openjdk.org (Andrei Pangin) Date: Wed, 4 Jun 2025 11:28:52 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v43] In-Reply-To: <-BGoOClpsfsd4Q8Wq-H57L3tIvoaLGauYtRBEDPO-_w=.97e25e4f-879d-45f6-bd00-ad53e2463a8d@github.com> References: <-BGoOClpsfsd4Q8Wq-H57L3tIvoaLGauYtRBEDPO-_w=.97e25e4f-879d-45f6-bd00-ad53e2463a8d@github.com> Message-ID: On Wed, 4 Jun 2025 07:00:51 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... 
different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Improve src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 238: > 236: tl->cpu_time_jfr_queue().resize_for_period(_current_sampling_period_ns / 1000000); > 237: timer_t timerid; > 238: if (create_timer_for_thread(thread, timerid)) { Timer creation failure is not an impossible situation, we should somehow let user know that not all threads are being profiled but without flooding in logs. One warning per profiling session may be a good compromise. You can verify failure condition by setting low `ulimit -i`. src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 492: > 490: assert(_sampler != nullptr, "invariant"); > 491: if (info->si_signo != SIGPROF) { > 492: // not the signal we are interested in No, I meant checking `si_code`. `si_signo` will always be the right one. And this check should come first, before any assertions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2126338737 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2126316533 From apangin at openjdk.org Wed Jun 4 11:28:54 2025 From: apangin at openjdk.org (Andrei Pangin) Date: Wed, 4 Jun 2025 11:28:54 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v45] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 11:18:52 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... 
different chunk-sizes > > Johannes Bechberger has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 127 commits: > > - Merge branch 'master' into parttimenerd_cooperative_cpu_time_sampler > - Add error message on signal handler install failure > - Fix signal handler synchronization > - Improve > - Rename autoadapt > - Make process_cpu_time_request private and move up > - Reorder condition > - Tiny refactoring > - Restrict threads for which timers are created > - Fix tiny mistake > - ... and 117 more: https://git.openjdk.org/jdk/compare/7838321b...4fd4b673 src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 611: > 609: // increment the count of active signal handlers > 610: u4 old_value = Atomic::fetch_then_add(&_active_signal_handlers, (u4)1, memory_order_acq_rel); > 611: if ((old_value & STOP_SIGNAL_BIT) != 0) { Combining stop signal with a counter is nice, you can then use `Atomic::cmpxchg` to avoid incrementing counter when the stop bit is set. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2126354062 From mgronlun at openjdk.org Wed Jun 4 11:28:54 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Wed, 4 Jun 2025 11:28:54 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v45] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 11:18:52 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... 
different chunk-sizes > > Johannes Bechberger has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 127 commits: > > - Merge branch 'master' into parttimenerd_cooperative_cpu_time_sampler > - Add error message on signal handler install failure > - Fix signal handler synchronization > - Improve > - Rename autoadapt > - Make process_cpu_time_request private and move up > - Reorder condition > - Tiny refactoring > - Restrict threads for which timers are created > - Fix tiny mistake > - ... and 117 more: https://git.openjdk.org/jdk/compare/7838321b...4fd4b673 src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 647: > 645: // install sig handler for sig > 646: if ((s8)PosixSignals::install_generic_signal_handler(SIG, (void*)::handle_timer_signal) == -1) { > 647: log_error(jfr)("Failed to install signal handler for CPU thread sampling, possibly because another profiler is active: %s", os::strerror(os::get_last_error())); That we are using a signal handler to provide the user with CPU time information is an implementation detail. It's good to provide an error message, but I think it should reflect back on something the user is expecting. Perhaps add a line that says something along the lines of "CPUTimeSample events will not be recorded." ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2126357345 From jbechberger at openjdk.org Wed Jun 4 11:28:54 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 11:28:54 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v45] In-Reply-To: References: Message-ID: <1vlVzhpxsamhKyaKX4ixcG-JZj4Qxgc0Au3mEnjs_So=.91d8a5a1-7b1b-46c6-991c-a5c61c77e39e@github.com> On Wed, 4 Jun 2025 11:23:57 GMT, Markus Grönlund wrote: >> Johannes Bechberger has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 127 commits: >> >> - Merge branch 'master' into parttimenerd_cooperative_cpu_time_sampler >> - Add error message on signal handler install failure >> - Fix signal handler synchronization >> - Improve >> - Rename autoadapt >> - Make process_cpu_time_request private and move up >> - Reorder condition >> - Tiny refactoring >> - Restrict threads for which timers are created >> - Fix tiny mistake >> - ... and 117 more: https://git.openjdk.org/jdk/compare/7838321b...4fd4b673 > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 647: > >> 645: // install sig handler for sig >> 646: if ((s8)PosixSignals::install_generic_signal_handler(SIG, (void*)::handle_timer_signal) == -1) { >> 647: log_error(jfr)("Failed to install signal handler for CPU thread sampling, possibly because another profiler is active: %s", os::strerror(os::get_last_error())); > > That we are using a signal handler to provide the user with CPU time information is an implementation detail. It's good to provide an error message, but I think it should reflect back on something the user is expecting. > > Perhaps add a line that says something along the lines of "CPUTimeSample events will not be recorded." Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2126360801 From coleenp at openjdk.org Wed Jun 4 11:36:19 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 4 Jun 2025 11:36:19 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder [v5] In-Reply-To: <1QNHiJnC7fE8K_KX5gK9VP-OsKNPAqyJfxFjOyfJyP4=.40f70cdb-11d9-4eec-9b25-ec2714fad601@github.com> References: <1QNHiJnC7fE8K_KX5gK9VP-OsKNPAqyJfxFjOyfJyP4=.40f70cdb-11d9-4eec-9b25-ec2714fad601@github.com> Message-ID: On Wed, 4 Jun 2025 10:14:31 GMT, Igor Veresov wrote: >> Simplify KlassTrainingData constructor. The lines in question come from the old pre-CDS world. They are not needed anymore.
> > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Undo OopStorage changes I'm fine with this and the follow-up issue. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25623#pullrequestreview-2896371220 From coleenp at openjdk.org Wed Jun 4 11:36:21 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 4 Jun 2025 11:36:21 GMT Subject: RFR: 8358003: KlassTrainingData initializer reads garbage holder [v2] In-Reply-To: References: <4ne8DsOBEMC2jSdOBI4l_33Jrs0CXHEKpdrLlBB-2uM=.52428bbb-6abc-4c33-85e7-6aa424c8b4f7@github.com> Message-ID: On Wed, 4 Jun 2025 00:53:56 GMT, Igor Veresov wrote: >> Are there any advantages? > > Ok, transitioned to OopStorage. Please take a look to see if it is correct. I'll be back when the testing is done. The advantage of OopStorage is that JNI handles aren't trusted because they come from outside JNI calls, so they have some safefetch code, but OopStorage entries are trusted and so presumably faster.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25623#discussion_r2126372510 From jbechberger at openjdk.org Wed Jun 4 11:37:34 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 11:37:34 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v43] In-Reply-To: References: <-BGoOClpsfsd4Q8Wq-H57L3tIvoaLGauYtRBEDPO-_w=.97e25e4f-879d-45f6-bd00-ad53e2463a8d@github.com> Message-ID: <3tt6-HjldmHpRIKHFqWHaLcC7FRZLwiCZl__j7Ht7Gw=.a14d3aa8-4780-49e2-b9ec-d24a828a1948@github.com> On Wed, 4 Jun 2025 11:13:30 GMT, Andrei Pangin wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 238: > >> 236: tl->cpu_time_jfr_queue().resize_for_period(_current_sampling_period_ns / 1000000); >> 237: timer_t timerid; >> 238: if (create_timer_for_thread(thread, timerid)) { > > Timer creation failure is not an impossible situation, we should somehow let user know that not all threads are being profiled but without flooding in logs. One warning per profiling session may be a good compromise. > You can verify failure condition by setting low `ulimit -i`. I added a "Failed to create timer for a thread" warning ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2126373721 From jbechberger at openjdk.org Wed Jun 4 11:37:37 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 11:37:37 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v45] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 11:21:57 GMT, Andrei Pangin wrote: >> Johannes Bechberger has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 127 commits: >> >> - Merge branch 'master' into parttimenerd_cooperative_cpu_time_sampler >> - Add error message on signal handler install failure >> - Fix signal handler synchronization >> - Improve >> - Rename autoadapt >> - Make process_cpu_time_request private and move up >> - Reorder condition >> - Tiny refactoring >> - Restrict threads for which timers are created >> - Fix tiny mistake >> - ... and 117 more: https://git.openjdk.org/jdk/compare/7838321b...4fd4b673 > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 611: > >> 609: // increment the count of active signal handlers >> 610: u4 old_value = Atomic::fetch_then_add(&_active_signal_handlers, (u4)1, memory_order_acq_rel); >> 611: if ((old_value & STOP_SIGNAL_BIT) != 0) { > > Combining stop signal with a counter is nice, you can then use `Atomic::cmpxchg` to avoid incrementing counter when the stop bit is set. I don't see how `Atomic::cmpxchg` would make the code easier. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2126375482 From dtabata at openjdk.org Wed Jun 4 11:43:31 2025 From: dtabata at openjdk.org (Daishi Tabata) Date: Wed, 4 Jun 2025 11:43:31 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v46] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 11:28:51 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... 
different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Improve error message https://github.com/openjdk/jdk/pull/25302/commits/a419dabab213e78a2ff7f3c62cd4af72a0fdabed Since the implementation has changed from Loss to Lost, the JEP document needs to be changed back to the original, Lost. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25302#issuecomment-2939698219 From jbechberger at openjdk.org Wed Jun 4 11:43:32 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 11:43:32 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v45] In-Reply-To: References: Message-ID: <6e2kCVaBWwL4UY_zXxuwRDYQKksbEo_uaRH7P8gBDJU=.f52500af-d413-4b2e-bc19-32d2248aa48e@github.com> On Wed, 4 Jun 2025 11:34:44 GMT, Johannes Bechberger wrote: >> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 611: >> >>> 609: // increment the count of active signal handlers >>> 610: u4 old_value = Atomic::fetch_then_add(&_active_signal_handlers, (u4)1, memory_order_acq_rel); >>> 611: if ((old_value & STOP_SIGNAL_BIT) != 0) { >> >> Combining stop signal with a counter is nice, you can then use `Atomic::cmpxchg` to avoid incrementing counter when the stop bit is set. > > I don't see how `Atomic::cmpxchg` would make the code easier. With my current code, I avoid having a loop, and in the fast path, I only have one atomic instruction. 
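(Illustration, not code from the PR.) The trade-off discussed above can be sketched with `std::atomic` as a stand-in for HotSpot's `Atomic` class. Everything here is an assumption made for the example: the value of `STOP_SIGNAL_BIT`, the function names, and in particular the rollback step in the first variant, which the quoted snippet does not show.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical value; the PR's actual definition is not shown in this thread.
constexpr uint32_t STOP_SIGNAL_BIT = 0x80000000u;

// Variant A: unconditional fetch-and-add, then inspect the previous value.
// The fast path is a single atomic read-modify-write and needs no loop.
// Undoing the increment when the stop bit is set is an assumption of this sketch.
inline bool try_enter_fetch_add(std::atomic<uint32_t>& state) {
  uint32_t old_value = state.fetch_add(1, std::memory_order_acq_rel);
  if ((old_value & STOP_SIGNAL_BIT) != 0) {
    state.fetch_sub(1, std::memory_order_acq_rel);  // roll back the increment
    return false;
  }
  return true;
}

// Variant B: compare-and-exchange loop. The counter is never incremented once
// the stop bit is set, at the cost of a load plus a retry loop under contention.
inline bool try_enter_cas(std::atomic<uint32_t>& state) {
  uint32_t expected = state.load(std::memory_order_acquire);
  while ((expected & STOP_SIGNAL_BIT) == 0) {
    if (state.compare_exchange_weak(expected, expected + 1,
                                    std::memory_order_acq_rel,
                                    std::memory_order_acquire)) {
      return true;
    }
    // On failure, 'expected' now holds the current value; re-check the stop bit.
  }
  return false;
}

// Leaving a handler decrements the active count in both variants.
inline void leave(std::atomic<uint32_t>& state) {
  state.fetch_sub(1, std::memory_order_release);
}
```

Both variants admit a handler only while the stop bit is clear. They differ in whether the counter can transiently exceed the number of admitted handlers (variant A) or whether the entry path may loop (variant B).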
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2126383576 From coleenp at openjdk.org Wed Jun 4 11:46:15 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 4 Jun 2025 11:46:15 GMT Subject: RFR: 8358339: Handle MethodCounters::_method backlinks after JDK-8355003 In-Reply-To: <0FmJVTYdAq7vmsOi4pi9NRKHm5MfmIrotPucldzsZj4=.b1335bca-f439-4c18-aa1c-6c69548d095d@github.com> References: <0FmJVTYdAq7vmsOi4pi9NRKHm5MfmIrotPucldzsZj4=.b1335bca-f439-4c18-aa1c-6c69548d095d@github.com> Message-ID: <3vcIHlpqLmTiXCOB3vLT-zxygE07PKsh-ni9dcPMENM=.73ee4382-b2a4-450c-920f-a1ce3d4ff87b@github.com> On Mon, 2 Jun 2025 18:41:42 GMT, Aleksey Shipilev wrote: > Found this when reading mainline-vs-premain webrev. [JDK-8355003](https://bugs.openjdk.org/browse/JDK-8355003) introduced a backlink to `Method*` in `MethodCounters`. I believe we need to handle that backlink at least in `CodeBuffer::finalize_oop_references()`. premain does this, while mainline does not. Also, amusingly, we have `MethodCounters::is_methodCounters`, but not the super-class `Metadata::is_methodCounters`. > > I pulled in the hunks that use `is_methodCounters()` and `MethodCounters::method()` from premain into this PR. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `runtime/cds` > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` My repo was two weeks old so I didn't see this change to give MethodCounters a vptr, and don't know why. At worst the backpointer to Method* in MethodCounters is redundant with the Method* that you're creating the oop_references for, but it shouldn't create two oops. ie, md == ((MethodCounter*)m)->method(); But maybe that's not the case here. ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25599#pullrequestreview-2896407247 From jbechberger at openjdk.org Wed Jun 4 11:49:33 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 11:49:33 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v46] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 11:28:51 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Improve error message > [a419dab](https://github.com/openjdk/jdk/commit/a419dabab213e78a2ff7f3c62cd4af72a0fdabed) > Since the implementation has changed from Loss to Lost, the JEP document needs to be changed back to the original, Lost. Good catch, I updated the JEP. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25302#issuecomment-2939723399 From jbechberger at openjdk.org Wed Jun 4 12:05:50 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 12:05:50 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v47] In-Reply-To: References: Message-ID: <7XHQamQvo__d4VCHVNQQqwNEmPLoKh8wtpES1a3ZRDg=.2bc3d95c-c00d-4487-90e2-2341a8da9173@github.com> > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). 
This profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... different chunk-sizes Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Improve ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/8fe07614..fe53990d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=46 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=45-46 Stats: 11 lines in 1 file changed: 8 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From mgronlun at openjdk.org Wed Jun 4 12:05:50 2025 From: mgronlun at openjdk.org (Markus Grönlund) Date: Wed, 4 Jun 2025 12:05:50 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v46] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 11:28:51 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Improve error message I am approving this PR for the following reasons: 1.
We have reached a state that is "good enough" - I no longer see any fundamental design issues that cannot be handled by follow-up bug fixes. 2. There are still many vague aspects included with this PR, as many have already pointed out, mostly related to the memory model and thread interactions - all those can, and should, be clarified, explained and exacted post-integration. 3. The feature as a whole is experimental and turned off by default. 4. Today is the penultimate day before JDK 25 cutoff. To give the feature a fair chance of making JDK 25, it needs approval now. Thanks a lot, Johannes and all involved, for your hard work getting this feature ready. Many thanks Markus ------------- Marked as reviewed by mgronlun (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25302#pullrequestreview-2896467191 From mbaesken at openjdk.org Wed Jun 4 12:08:18 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 4 Jun 2025 12:08:18 GMT Subject: RFR: 8357826: Avoid running some jtreg tests when asan is configured [v2] In-Reply-To: References: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> <6HSruHtZNPOZJp4vNFnwMns6-_rP_MEHtnnvAP7S5QU=.e91023a2-089c-4541-86a5-ae8d4adeb99d@github.com> Message-ID: On Wed, 4 Jun 2025 06:26:02 GMT, Matthias Baesken wrote: > In ASAN built JDK, some gtests and some other JTREG tests in runtime/ErrorHandling also fail. By the way, I did not check ALL gtests, but the HS `:tier1` gtests work for me now on Linux x86_64. But make sure the very recent change https://bugs.openjdk.org/browse/JDK-8357155 8357155: [asan] ZGC does not work is included.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25575#issuecomment-2939781272 From rehn at openjdk.org Wed Jun 4 12:08:25 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 4 Jun 2025 12:08:25 GMT Subject: RFR: 8356159: RISC-V: Add Zabha [v12] In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 13:22:04 GMT, Feilong Jiang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 21 additional commits since the last revision: >> >> - Merge branch 'master' into 8356159 >> - Set ins cost to 2xVOLA for cmpxchg >> - Merge branch 'master' into 8356159 >> - Merge branch 'master' into 8356159 >> - ins cost fixes, print fixes >> - Merge branch 'master' into 8356159 >> - Reg limits fixed >> - Merge branch 'master' into 8356159 >> - Fixed reg selection >> - More indention >> - ... and 11 more: https://git.openjdk.org/jdk/compare/66a7f51f...cc3b8ff7 > > Looks good! Thanks @feilongjiang @RealFYang ------------- PR Comment: https://git.openjdk.org/jdk/pull/25252#issuecomment-2939782354 From jbechberger at openjdk.org Wed Jun 4 12:10:17 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 12:10:17 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v48] In-Reply-To: References: Message-ID: > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... 
different chunk-sizes Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Fix timer creation warning ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/fe53990d..8d545e74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=47 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=46-47 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From mbaesken at openjdk.org Wed Jun 4 12:18:20 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 4 Jun 2025 12:18:20 GMT Subject: RFR: 8357826: Avoid running some jtreg tests when asan is configured [v2] In-Reply-To: References: <2VOsPdnaamydEfe2I-79af90nn9xlaRXULKEzrDHkGk=.7b237cd6-0a12-4ec2-8467-4177084b4468@github.com> Message-ID: On Mon, 2 Jun 2025 08:07:38 GMT, Matthias Baesken wrote: >> There are a couple of jtreg tests, especially in the HS area, with very special assumptions about memory layout/sizes . >> Those fail when the address sanitizer is configured ( --enable-asan ). >> The change adds a way to tag those tests with 'requires' so that they can be avoided easily when running jtreg tests with ASAN enabled. >> Adjusting the tests for "pleasing" the sanitizer is not always desired (if possible for some tests it can be done later) . >> While at it, also same is also added for ubsan . 
> > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > TestBreakSignalThreadDump has issues with asan The test AOTCodeCompressedOopsTest.java has the memory error mentioned above fixed now with recent changes , but shows another issue runtime/cds/appcds/aotCode/AOTCodeCompressedOopsTest.java --------------------------------------------------------------- java.lang.RuntimeException: Pattern "narrow_oop_base = 0x(\\d+), narrow_oop_shift = (\\d)" not found in the output at AOTCodeCompressedOopsTest$Tester.checkExecution(AOTCodeCompressedOopsTest.java:184) at jdk.test.lib.cds.CDSAppTester.executeAndCheck(CDSAppTester.java:221) at jdk.test.lib.cds.CDSAppTester.productionRun(CDSAppTester.java:427) at jdk.test.lib.cds.CDSAppTester.productionRun(CDSAppTester.java:392) at AOTCodeCompressedOopsTest.main(AOTCodeCompressedOopsTest.java:58) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:565) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) at java.base/java.lang.Thread.run(Thread.java:1474) Maybe we should ask an AOT expert about this, not sure what that means. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25575#issuecomment-2939805054 From jbechberger at openjdk.org Wed Jun 4 12:23:13 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 12:23:13 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v49] In-Reply-To: References: Message-ID: <4aLbfK7e6pncU0QwXORueBxt8WEOz5KYO1pKnpjFOC0=.cf78fb29-a30f-4084-bee7-76c1e6e81f31@github.com> > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). 
This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... different chunk-sizes Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Fix build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/8d545e74..fbaf1da6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=48 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=47-48 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From apangin at openjdk.org Wed Jun 4 12:38:21 2025 From: apangin at openjdk.org (Andrei Pangin) Date: Wed, 4 Jun 2025 12:38:21 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v48] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 12:10:17 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... 
different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Fix timer creation warning src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 491: > 489: > 490: void JfrCPUTimeThreadSampling::handle_timer_signal(siginfo_t* info, void* context) { > 491: if (info->si_code != SIGPROF) { The correct check is `if (info->si_code != SI_TIMER)` src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 652: > 650: bool JfrCPUSamplerThread::init_timers() { > 651: // install sig handler for sig > 652: if ((s8)PosixSignals::install_generic_signal_handler(SIG, (void*)::handle_timer_signal) == -1) { Comparing return value to `(void*)-1` would be cleaner. But the main problem is that it only checks for `sigaction` failure (which normally never happens), however, we should also check if there was a custom signal handler set _before_ installing our own handler, i.e. old handler is not SIG_IGN or SIG_DFL or `handle_timer_signal`. 
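A minimal sketch of the first point, assuming Linux or another POSIX system: `si_code` records why a signal was delivered (`SI_TIMER` for an expired POSIX timer), while `SIGPROF` is a signal number and would be found in `si_signo`. The helper name `is_posix_timer_signal` is hypothetical and not taken from the PR.

```cpp
#include <signal.h>  // siginfo_t, SI_TIMER, SI_USER, SIGPROF

// Hypothetical helper, not part of the PR: returns true only when the
// signal was generated by the expiration of a POSIX timer (timer_create).
static bool is_posix_timer_signal(const siginfo_t* info) {
  // si_code describes the origin of the signal. SIGPROF is a signal
  // number (stored in si_signo), so it is not a meaningful value to
  // compare si_code against.
  return info != nullptr && info->si_code == SI_TIMER;
}
```

A handler written this way would presumably return early for signals sent via `kill()` or `pthread_kill()` (where `si_code` is `SI_USER` or `SI_TKILL` on Linux) and only process timer expirations.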
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2126447823 PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2126488937 From jbechberger at openjdk.org Wed Jun 4 12:47:36 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 12:47:36 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v48] In-Reply-To: References: Message-ID: On Wed, 4 Jun 2025 12:33:45 GMT, Andrei Pangin wrote: >> Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix timer creation warning > > src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 652: > >> 650: bool JfrCPUSamplerThread::init_timers() { >> 651: // install sig handler for sig >> 652: if ((s8)PosixSignals::install_generic_signal_handler(SIG, (void*)::handle_timer_signal) == -1) { > > Comparing return value to `(void*)-1` would be cleaner. > But the main problem is that it only checks for `sigaction` failure (which normally never happens), however, we should also check if there was a custom signal handler set _before_ installing our own handler, i.e. old handler is not SIG_IGN or SIG_DFL or `handle_timer_signal`. Using `sigaction(SIG, NULL, &sa)` ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2126510947 From rehn at openjdk.org Wed Jun 4 12:50:15 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 4 Jun 2025 12:50:15 GMT Subject: Integrated: 8356159: RISC-V: Add Zabha In-Reply-To: References: Message-ID: On Thu, 15 May 2025 14:08:48 GMT, Robbin Ehn wrote: > Hi, please consider. > > This adds the byte and halfword atomic memory operations (Zabha) - https://github.com/riscv/riscv-zabha. > All amo-instructions, except load-reserve and store-conditional, can also be performed on natural aligned half-words and bytes. (i.e. 
the extension does not add lr.h/b or sc.h/b) This includes amocas if the Zacas extension is present. > > The majority of this patch is to support amocas.h/b. We are now starting to really feel the pain of all these extensions, as CAS:ing 16/8-bits can now be done in three different ways: > - lr.w/sc.w 'narrow' CAS (no extension) > - amocas.w 'narrow' CAS (Zacas) > - amocas.h/b (Zacas + Zabha) > > There is no hwprobe support yet. > > Ran t1-3 with Zacas+Zabha and t1 without Zabha in QEMU. > > Thanks, Robbin This pull request has now been integrated. Changeset: dc961609 Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/dc961609f84a38164d10852cb92c005c3eb077e4 Stats: 824 lines in 6 files changed: 563 ins; 64 del; 197 mod 8356159: RISC-V: Add Zabha Reviewed-by: fyang, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/25252 From rvansa at openjdk.org Wed Jun 4 12:50:35 2025 From: rvansa at openjdk.org (Radim Vansa) Date: Wed, 4 Jun 2025 12:50:35 GMT Subject: RFR: 8352075: Perf regression accessing fields [v15] In-Reply-To: <5gclUhzEQCai7QGUBDA16OcIrQcmesMGR1pJd2Hbgbw=.79a0d71a-a246-4a84-9794-43f7ef738b09@github.com> References: <0FXlc_4Zi2WDj-f3MVkUT4farzZJqvCP1CIgRVjbkK8=.3acf7aab-8cd8-494d-962a-340447efe39a@github.com> <5gclUhzEQCai7QGUBDA16OcIrQcmesMGR1pJd2Hbgbw=.79a0d71a-a246-4a84-9794-43f7ef738b09@github.com> Message-ID: On Fri, 30 May 2025 21:14:53 GMT, John R Rose wrote: >> Radim Vansa has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. > >> I like the idea of mapping each element in the table as raw bits, though handling of access to the end of the array would be a bit inconvenient (or we would have to allocate a few extra bytes). > > The code snippet I shared above shows a better way: You load a full 8 (or 4) bytes where the END (not the START) of the word lines up with the LAST (not FIRST) byte. Then you will never run past the end of the array!
So, fine, but what about the start of the array? Well, it's inside an `Array` object, which has a length header, which is guaranteed to be safe to load (under a cast or bytewise or whatever). Problem solved. The only thing to avoid is to load an 8-byte word when the packed word size is 1..5 bytes; then you load a 4-byte word. You can load both components at once, and then use a configurable shift (from one machine word) to separate them. This is why I say it saves a half-byte on average. > > These tweaky ideas have three effects: They probably make the code a little simpler (or at least no worse), they reduce the number of memory operations to query a packed array, and they probably use fewer ALU instructions overall. They are certainly worth considering for the general-purpose "searchable packed array" I am envisioning; they are optional for this particular bug, viewed in isolation. > >> I've changed the algorithm to use unsigned integers; in fact I find a bit annoying that most of the indices used throughout the related code are signed. > > Yes, it annoys me also. It's playing with fire (or walking the firepit). > >> I've also added a test generating class with a different number of fields, though running it through the full range of fields (0-65535, though in practice the upper bound is rather 26k) would be excessive; even now it takes more than a minute on my machine. Also, I realize that varying the number of fields does not result in full coverage of possible stream sizes; per-field records have probably rather uniform lengths. > > Yeah, a gtest on the binary search would cover most of those issues, faster and cleaner. Then loading many gigantic classfiles will be unnecessary. Just a few classfiles at several scales, probably, and thorough gtest-level unit testing, gets a better result in less time. 
As I said above, I'm willing to put off some of the refactoring, given that it should cover other, prior occurrences of binary search (so it's got a larger scope than this bug). > >> @rose00 OK, so I have refactored out the PackedTable that now h... @rose00 Hi, would you be OK with the current implementation? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24847#issuecomment-2939915008 From jbechberger at openjdk.org Wed Jun 4 12:56:22 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 12:56:22 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v50] In-Reply-To: References: Message-ID: <4_cKaFGWs_Wf0mcRY-lbaEn5i_DJfUoqpaNPhF8E_pw=.b82280fe-ed5b-42f0-85af-6dd15d297ba0@github.com> > This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). > > Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with > - ... different heap sizes > - ... different GCs > - ... different samplers (the standard JFR and the new CPU Time Sampler and both) > - ... different JFR recording durations > - ... 
different chunk-sizes Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: Fix build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25302/files - new: https://git.openjdk.org/jdk/pull/25302/files/fbaf1da6..e4558a6e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=49 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25302&range=48-49 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25302/head:pull/25302 PR: https://git.openjdk.org/jdk/pull/25302 From jbechberger at openjdk.org Wed Jun 4 13:07:15 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 4 Jun 2025 13:07:15 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v48] In-Reply-To: References: Message-ID: <-JmnqMbD8vGy_dVeVUv59WrjuCavWV3F3w9HMTxhAQM=.2c079574-0729-4c70-af86-946a3204f7b6@github.com> On Wed, 4 Jun 2025 12:44:53 GMT, Johannes Bechberger wrote: >> src/hotspot/share/jfr/periodic/sampling/jfrCPUTimeThreadSampler.cpp line 652: >> >>> 650: bool JfrCPUSamplerThread::init_timers() { >>> 651: // install sig handler for sig >>> 652: if ((s8)PosixSignals::install_generic_signal_handler(SIG, (void*)::handle_timer_signal) == -1) { >> >> Comparing return value to `(void*)-1` would be cleaner. >> But the main problem is that it only checks for `sigaction` failure (which normally never happens), however, we should also check if there was a custom signal handler set _before_ installing our own handler, i.e. old handler is not SIG_IGN or SIG_DFL or `handle_timer_signal`. > > Using `sigaction(SIG, NULL, &sa)` ? I'm currently implementing the check against SIG_IGN and SIG_DFL, as `handle_timer_signal` should never occur. 
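For illustration only, a sketch of how such a pre-install check could look with plain POSIX calls. Passing a null pointer as the new action makes `sigaction` report the current disposition without changing it. The helper name `has_custom_signal_handler` is hypothetical and not from the PR, and the sketch ignores the window between the query and the later install.

```cpp
#include <signal.h>  // sigaction, SA_SIGINFO, SIG_DFL, SIG_IGN

// Hypothetical helper, not part of the PR: reports whether a handler
// other than SIG_DFL or SIG_IGN is currently installed for `sig`.
static bool has_custom_signal_handler(int sig) {
  struct sigaction current;
  // A null new action turns sigaction() into a pure query.
  if (sigaction(sig, nullptr, &current) != 0) {
    return true;  // query failed: conservatively treat the signal as taken
  }
  // The handler lives in sa_sigaction when SA_SIGINFO is set,
  // otherwise in sa_handler.
  void* handler = (current.sa_flags & SA_SIGINFO) != 0
      ? reinterpret_cast<void*>(current.sa_sigaction)
      : reinterpret_cast<void*>(current.sa_handler);
  return handler != reinterpret_cast<void*>(SIG_DFL) &&
         handler != reinterpret_cast<void*>(SIG_IGN);
}
```

A caller could refuse to start the sampler, or log a warning, when this returns true for the chosen signal, instead of silently replacing a handler installed by the application.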
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25302#discussion_r2126551278 From iwalulya at openjdk.org Wed Jun 4 13:51:57 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 4 Jun 2025 13:51:57 GMT Subject: RFR: 8358294: Remove unnecessary GenAlignment In-Reply-To: References: Message-ID: On Mon, 2 Jun 2025 08:36:08 GMT, Albert Mingkun Yang wrote: > Simple replacement of `GenAlignment` with `SpaceAlignment`, because they always have the same value. Removing the former to reduce complexity. > > Test: tier1-3 LGTM! ------------- Marked as reviewed by iwalulya (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25577#pullrequestreview-2896822028 From mdoerr at openjdk.org Wed Jun 4 14:01:16 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 4 Jun 2025 14:01:16 GMT Subject: RFR: 8342818: Implement JEP 509: JFR CPU-Time Profiling [v50] In-Reply-To: <4_cKaFGWs_Wf0mcRY-lbaEn5i_DJfUoqpaNPhF8E_pw=.b82280fe-ed5b-42f0-85af-6dd15d297ba0@github.com> References: <4_cKaFGWs_Wf0mcRY-lbaEn5i_DJfUoqpaNPhF8E_pw=.b82280fe-ed5b-42f0-85af-6dd15d297ba0@github.com> Message-ID: On Wed, 4 Jun 2025 12:56:22 GMT, Johannes Bechberger wrote: >> This is the code for the [JEP 509: CPU Time based profiling for JFR](https://openjdk.org/jeps/509). >> >> Currently tested using [this test suite](https://github.com/parttimenerd/basic-profiler-tests). This runs profiles the [Renaissance](https://renaissance.dev/) benchmark with >> - ... different heap sizes >> - ... different GCs >> - ... different samplers (the standard JFR and the new CPU Time Sampler and both) >> - ... different JFR recording durations >> - ... different chunk-sizes > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Fix build I've looked over it and couldn't spot any critical issue. I think it's good enough for an experimental feature if we do further cleanups and improvements later. 
What I'd like to see as a follow-up is a review of the usage of `Atomic` functions. I've never seen so many of them in such a density. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25302#pullrequestreview-2896853345 From iveresov at openjdk.org Wed Jun 4 14:11:00 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 4 Jun 2025 14:11:00 GMT Subject: Integrated: 8358003: KlassTrainingData initializer reads garbage holder In-Reply-To: References: Message-ID: On Tue, 3 Jun 2025 17:36:13 GMT, Igor Veresov wrote: > Simplify KlassTrainingData constructor. The lines in question come from the old pre-CDS world. They are not needed anymore. This pull request has now been integrated. Changeset: ae1892fb Author: Igor Veresov URL: https://git.openjdk.org/jdk/commit/ae1892fb0fb6b7646f9ca60067d6945ccea7f888 Stats: 18 lines in 1 file changed: 0 ins; 13 del; 5 mod 8358003: KlassTrainingData initializer reads garbage holder Reviewed-by: coleenp, shade, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/25623 From shade at openjdk.org Wed Jun 4 14:12:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Jun 2025 14:12:51 GMT Subject: RFR: 8358339: Handle MethodCounters::_method backlinks after JDK-8355003 In-Reply-To: <0FmJVTYdAq7vmsOi4pi9NRKHm5MfmIrotPucldzsZj4=.b1335bca-f439-4c18-aa1c-6c69548d095d@github.com> References: <0FmJVTYdAq7vmsOi4pi9NRKHm5MfmIrotPucldzsZj4=.b1335bca-f439-4c18-aa1c-6c69548d095d@github.com> Message-ID: On Mon, 2 Jun 2025 18:41:42 GMT, Aleksey Shipilev wrote: > Found this when reading mainline-vs-premain webrev. [JDK-8355003](https://bugs.openjdk.org/browse/JDK-8355003) introduced a backlink to `Method*` in `MethodCounters`. I believe we need to handle that backlink at least in `CodeBuffer::finalize_oop_references()`. premain does this, while mainline does not. Also, amusingly, we have `MethodCounters::is_methodCounters`, but not the super-class `Metadata::is_methodCounters`. 
> > I pulled in the hunks that use `is_methodCounters()` and `MethodCounters::method()` from premain into this PR. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `runtime/cds` > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` OK, phew. I thought I was not seeing some huge gap here. Thanks! I think we are ready to integrate this. Just checking if @veresov is also okay with it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25599#issuecomment-2940192413