From zgu at openjdk.org Sat Feb 1 16:42:50 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Sat, 1 Feb 2025 16:42:50 GMT Subject: RFR: 8348171: Refactor GenerationCounters and its subclasses [v4] In-Reply-To: References: Message-ID: On Fri, 31 Jan 2025 12:30:30 GMT, Albert Mingkun Yang wrote: >> Simple refactoring of removing the use of `virtual` method and use concrete subclasses when needed. >> >> Test: tier1-5 > > Albert Mingkun Yang has updated the pull request incrementally with two additional commits since the last revision: > > - review > - * some more refactoring src/hotspot/share/gc/shenandoah/shenandoahMonitoringSupport.cpp line 39: > 37: GenerationCounters("Young", 0, 0, 0, (size_t)0, (size_t)0) {}; > 38: > 39: void update_all() { Shenandoah looks a bit odd now. @kdnilsen @wkemper and @ysramakrishna may want to take a look? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23209#discussion_r1938303987 From zgu at openjdk.org Sat Feb 1 16:47:52 2025 From: zgu at openjdk.org (Zhengyu Gu) Date: Sat, 1 Feb 2025 16:47:52 GMT Subject: RFR: 8348171: Refactor GenerationCounters and its subclasses [v4] In-Reply-To: References: Message-ID: On Fri, 31 Jan 2025 12:30:30 GMT, Albert Mingkun Yang wrote: >> Simple refactoring of removing the use of `virtual` method and use concrete subclasses when needed. >> >> Test: tier1-5 > > Albert Mingkun Yang has updated the pull request incrementally with two additional commits since the last revision: > > - review > - * some more refactoring `Shenandoah` code no longer aligns to others. Other than that, LGTM. ------------- Marked as reviewed by zgu (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23209#pullrequestreview-2588364961 From tschatzl at openjdk.org Mon Feb 3 09:25:50 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Feb 2025 09:25:50 GMT Subject: RFR: 8348171: Refactor GenerationCounters and its subclasses [v4] In-Reply-To: References: Message-ID: <7k73_VjUmBq7-G2reVDOlB7-vSazUekr8Q3Ez3houa0=.61d2baf5-72c6-41cd-aa74-c49a5b5e9ce1@github.com> On Fri, 31 Jan 2025 12:30:30 GMT, Albert Mingkun Yang wrote: >> Simple refactoring of removing the use of `virtual` method and use concrete subclasses when needed. >> >> Test: tier1-5 > > Albert Mingkun Yang has updated the pull request incrementally with two additional commits since the last revision: > > - review > - * some more refactoring Lgtm. Related to @zhengyu123 's comment, not sure right now what is meant with "looking odd" here as the previous code did not update the counters either, but it might be useful to wait on Shenandoah team's input anyway. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23209#pullrequestreview-2589340090 From tschatzl at openjdk.org Mon Feb 3 15:20:19 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Feb 2025 15:20:19 GMT Subject: RFR: 8349213: G1: Clearing bitmaps during collection set merging not claimed by region Message-ID: Hi all, please review this change that makes (optional) bitmap clearing during merging remembered sets claim regions. Otherwise every thread will do the (currently little) work themselves over and over again. 
Testing: tier1-3 Thanks, Thomas ------------- Commit messages: - * move commenty - 8349213: G1: Clearing bitmaps during collection set merging not claimed by region Changes: https://git.openjdk.org/jdk/pull/23419/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23419&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349213 Stats: 30 lines in 1 file changed: 8 ins; 20 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23419.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23419/head:pull/23419 PR: https://git.openjdk.org/jdk/pull/23419 From mdoerr at openjdk.org Mon Feb 3 18:03:26 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 3 Feb 2025 18:03:26 GMT Subject: [jdk24] RFR: 8348562: ZGC: segmentation fault due to missing node type check in barrier elision analysis Message-ID: Clean backport of [JDK-8348562](https://bugs.openjdk.org/browse/JDK-8348562). It only adds a null check + bailout where the current implementation crashes with SIGSEGV. ------------- Commit messages: - Backport afcc2b03afc77f730300e1d92471466d56ed75fb Changes: https://git.openjdk.org/jdk/pull/23422/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23422&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8348562 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23422.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23422/head:pull/23422 PR: https://git.openjdk.org/jdk/pull/23422 From mdoerr at openjdk.org Mon Feb 3 18:18:48 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 3 Feb 2025 18:18:48 GMT Subject: [jdk24] RFR: 8348562: ZGC: segmentation fault due to missing node type check in barrier elision analysis In-Reply-To: References: Message-ID: <4iwi6PbCt7B6GE731aOGDZsEl9KiT2ZERf-r7JUjiq8=.6fc8933b-4718-4a48-9064-ca205bc25630@github.com> On Mon, 3 Feb 2025 17:56:23 GMT, Martin Doerr wrote: > Clean backport of [JDK-8348562](https://bugs.openjdk.org/browse/JDK-8348562). 
It only adds a null check + bailout where the current implementation crashes with SIGSEGV. @TobiHartmann This is the jdk24 backport. Please take a look. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23422#issuecomment-2631726161 From kvn at openjdk.org Mon Feb 3 18:18:47 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 3 Feb 2025 18:18:47 GMT Subject: [jdk24] RFR: 8348562: ZGC: segmentation fault due to missing node type check in barrier elision analysis In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 17:56:23 GMT, Martin Doerr wrote: > Clean backport of [JDK-8348562](https://bugs.openjdk.org/browse/JDK-8348562). It only adds a null check + bailout where the current implementation crashes with SIGSEGV. The fix was requested for JDK 24 update (jdk24u repository) not for JDK 24 branch which this change is based on (if I see this correctly). ------------- Changes requested by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23422#pullrequestreview-2590672470 From mdoerr at openjdk.org Mon Feb 3 18:22:46 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 3 Feb 2025 18:22:46 GMT Subject: [jdk24] RFR: 8348562: ZGC: segmentation fault due to missing node type check in barrier elision analysis In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 18:15:53 GMT, Vladimir Kozlov wrote: >> Clean backport of [JDK-8348562](https://bugs.openjdk.org/browse/JDK-8348562). It only adds a null check + bailout where the current implementation crashes with SIGSEGV. > > The fix was requested for JDK 24 update (jdk24u repository) not for JDK 24 branch which this change is based on (if I see this correctly). @vnkozlov I had originally targeted 24u, but Tobias has reclassified it as P2, so this is the new PR. I will close the other one if this one gets approved. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23422#issuecomment-2631736130 From kvn at openjdk.org Mon Feb 3 18:22:46 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 3 Feb 2025 18:22:46 GMT Subject: [jdk24] RFR: 8348562: ZGC: segmentation fault due to missing node type check in barrier elision analysis In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 18:19:25 GMT, Martin Doerr wrote: >> The fix was requested for JDK 24 update (jdk24u repository) not for JDK 24 branch which this change is based on (if I see this correctly). > > @vnkozlov I had originally targeted 24u, but Tobias has reclassified it as P2, so this is the new PR. I will close the other one if this one gets approved. @TheRealMDoerr you need new Fix request and approval for JDK 24 in bug report. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23422#issuecomment-2631739024 From mdoerr at openjdk.org Mon Feb 3 18:37:46 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 3 Feb 2025 18:37:46 GMT Subject: [jdk24] RFR: 8348562: ZGC: segmentation fault due to missing node type check in barrier elision analysis In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 18:19:25 GMT, Martin Doerr wrote: >> The fix was requested for JDK 24 update (jdk24u repository) not for JDK 24 branch which this change is based on (if I see this correctly). > > @vnkozlov I had originally targeted 24u, but Tobias has reclassified it as P2, so this is the new PR. I will close the other one if this one gets approved. > @TheRealMDoerr you need new Fix request and approval for JDK 24 in bug report. JDK24 requires a review instead of a maintainer approval. See Skara messages above. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23422#issuecomment-2631767998 From kvn at openjdk.org Mon Feb 3 18:59:50 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 3 Feb 2025 18:59:50 GMT Subject: [jdk24] RFR: 8348562: ZGC: segmentation fault due to missing node type check in barrier elision analysis In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 18:35:22 GMT, Martin Doerr wrote: > > @TheRealMDoerr you need new Fix request and approval for JDK 24 in bug report. > > JDK24 requires a review instead of a maintainer approval. See Skara messages above. We are in RDP 2 phase - you need approval for fixes there: https://openjdk.org/jeps/3#rdp-2 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23422#issuecomment-2631811256 From kvn at openjdk.org Mon Feb 3 19:07:08 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 3 Feb 2025 19:07:08 GMT Subject: [jdk24] RFR: 8348562: ZGC: segmentation fault due to missing node type check in barrier elision analysis In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 17:56:23 GMT, Martin Doerr wrote: > Clean backport of [JDK-8348562](https://bugs.openjdk.org/browse/JDK-8348562). It only adds a null check + bailout where the current implementation crashes with SIGSEGV. To clarify. It is different approval from approval for update release. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23422#issuecomment-2631817357 From kvn at openjdk.org Mon Feb 3 19:07:08 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 3 Feb 2025 19:07:08 GMT Subject: [jdk24] RFR: 8348562: ZGC: segmentation fault due to missing node type check in barrier elision analysis In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 19:02:25 GMT, Martin Doerr wrote: > Ok. Thanks! I've created the approval request manually. Skara doesn't support it. Yes, it is manual process - you need to add label and comment to main bug report. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23422#issuecomment-2631830611 From mdoerr at openjdk.org Mon Feb 3 19:07:08 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 3 Feb 2025 19:07:08 GMT Subject: [jdk24] RFR: 8348562: ZGC: segmentation fault due to missing node type check in barrier elision analysis In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 17:56:23 GMT, Martin Doerr wrote: > Clean backport of [JDK-8348562](https://bugs.openjdk.org/browse/JDK-8348562). It only adds a null check + bailout where the current implementation crashes with SIGSEGV. Ok. Thanks! I've created the approval request manually. Skara doesn't support it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23422#issuecomment-2631827041 From kvn at openjdk.org Mon Feb 3 19:09:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 3 Feb 2025 19:09:51 GMT Subject: [jdk24] RFR: 8348562: ZGC: segmentation fault due to missing node type check in barrier elision analysis In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 19:02:25 GMT, Martin Doerr wrote: >> Clean backport of [JDK-8348562](https://bugs.openjdk.org/browse/JDK-8348562). It only adds a null check + bailout where the current implementation crashes with SIGSEGV. > > Ok. Thanks! I've created the approval request manually. Skara doesn't support it. @TheRealMDoerr Please, add fix request comment too. You can copy jdk24u request. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23422#issuecomment-2631837256 From kvn at openjdk.org Mon Feb 3 19:22:45 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 3 Feb 2025 19:22:45 GMT Subject: [jdk24] RFR: 8348562: ZGC: segmentation fault due to missing node type check in barrier elision analysis In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 17:56:23 GMT, Martin Doerr wrote: > Clean backport of [JDK-8348562](https://bugs.openjdk.org/browse/JDK-8348562). 
It only adds a null check + bailout where the current implementation crashes with SIGSEGV. Looks good. Good. I approved request as Area Lead. Formalities are done ;^) Now we can review and integrate this into JDK 24. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23422#pullrequestreview-2590805301 PR Comment: https://git.openjdk.org/jdk/pull/23422#issuecomment-2631867578 From wkemper at openjdk.org Mon Feb 3 20:36:09 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 3 Feb 2025 20:36:09 GMT Subject: RFR: 8348268: Test gc/shenandoah/TestResizeTLAB.java#compact: fatal error: Before Updating References: Thread C2 CompilerThread1: expected gc-state 9, actual 21 Message-ID: Non-java threads were not having their gc-state configured when they attach. If they were created before the verifier's safepoint, but after the iteration over non-java threads, they would not have the correct state. ------------- Commit messages: - Set gc state for all attached threads (not just java threads). Changes: https://git.openjdk.org/jdk/pull/23428/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23428&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8348268 Stats: 4 lines in 1 file changed: 3 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23428.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23428/head:pull/23428 PR: https://git.openjdk.org/jdk/pull/23428 From wkemper at openjdk.org Mon Feb 3 22:34:13 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 3 Feb 2025 22:34:13 GMT Subject: [jdk24] RFR: 8349002: GenShen: Deadlock during shutdown Message-ID: <6a_6G1g93RUACyYcHG5B9HtqFBaqdETRRdhvFWwrfi8=.e88c6252-f6c4-4e3f-972a-2c4495d27127@github.com> Clean backport. Fixes bug introduced by [JDK-8345970](https://bugs.openjdk.org/browse/JDK-8345970). 
------------- Commit messages: - Backport 06ebb170bac3879dc1e378b48b1c7ef006070c86 Changes: https://git.openjdk.org/jdk/pull/23429/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23429&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349002 Stats: 5 lines in 2 files changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23429.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23429/head:pull/23429 PR: https://git.openjdk.org/jdk/pull/23429 From kvn at openjdk.org Mon Feb 3 23:51:17 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 3 Feb 2025 23:51:17 GMT Subject: [jdk24] RFR: 8349002: GenShen: Deadlock during shutdown In-Reply-To: <6a_6G1g93RUACyYcHG5B9HtqFBaqdETRRdhvFWwrfi8=.e88c6252-f6c4-4e3f-972a-2c4495d27127@github.com> References: <6a_6G1g93RUACyYcHG5B9HtqFBaqdETRRdhvFWwrfi8=.e88c6252-f6c4-4e3f-972a-2c4495d27127@github.com> Message-ID: On Mon, 3 Feb 2025 22:27:44 GMT, William Kemper wrote: > Clean backport. Fixes bug introduced by [JDK-8345970](https://bugs.openjdk.org/browse/JDK-8345970). We are in RDP2 phase of JDK 24 release. Only P1 and P2 are allowed to be pushed with approval: https://openjdk.org/jeps/3#rdp-2 Consider backporting the fix into JDK 24 Update release. ------------- Changes requested by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23429#pullrequestreview-2591384756 From ayang at openjdk.org Tue Feb 4 09:22:13 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 4 Feb 2025 09:22:13 GMT Subject: RFR: 8349213: G1: Clearing bitmaps during collection set merging not claimed by region In-Reply-To: References: Message-ID: <5YSfBhp40MOFgK-EbKrg1vY-X6ZuKHXmcnFi40hQp54=.2021c2c8-c22a-485c-b987-681e2e032f86@github.com> On Mon, 3 Feb 2025 14:11:20 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that makes (optional) bitmap clearing during merging remembered sets claim regions. Otherwise every thread will do the (currently little) work themselves over and over again. 
> > Testing: tier1-3 > > Thanks, > Thomas src/hotspot/share/gc/g1/g1RemSet.cpp line 1390: > 1388: g1h->collection_set_iterate_increment_from(&merge, worker_id); > 1389: for (uint i = 0; i < G1GCPhaseTimes::MergeRSContainersSentinel; i++) { > 1390: p->record_or_add_thread_work_item(merge_remset_phase, worker_id, merge.stats().merged(i), i); `stats()` has side-effect; should be invoked only once. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23419#discussion_r1940791948 From tschatzl at openjdk.org Tue Feb 4 09:50:16 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Feb 2025 09:50:16 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME In-Reply-To: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Thu, 30 Jan 2025 12:12:29 GMT, Albert Mingkun Yang wrote: > Here is an attempt to simplify GCLocker implementation for Serial and Parallel. > > GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. > > The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. 
The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. > > Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. > > Test: tier1-8 * Idk if GCLocker JFR events need to be available in metadata.xml if the VM does not actually ever send it. I think it does not. Maybe it is used to decode from old recordings, may be worth asking e.g. @egahlin . * the bot shows a failure that this PR's CR number shows up in the problemlist, that line needs to be deleted as well. Further it would be interesting to see how many retries there are in the allocation loop with these jnilock* stress test. * another issue, probably todo is that while Parallel GC has the emergency bailout via GC Overhead limit after excessive retries, Serial does not. Which means that it might retry for a long time, which isn't good (while it did earlier if the number of retries due to gclocker exceed that threshold) src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 323: > 321: } > 322: > 323: if (result == nullptr) { pre-existing: is it actually possible that `result` is not `nullptr` here? The code above always returns with a non-null result. Maybe assert this instead. 
src/hotspot/share/gc/shared/gcLocker.cpp line 86: > 84: void GCLocker::block() { > 85: assert(_lock->is_locked(), "precondition"); > 86: assert(Atomic::load(&_is_gc_request_pending) == false, "precondition"); Suggestion: assert(!Atomic::load(&_is_gc_request_pending), "precondition"); src/hotspot/share/gc/shared/gcLocker.cpp line 106: > 104: > 105: #ifdef ASSERT > 106: // Matching the storestore in GCLocker::exit Suggestion: // Matching the storestore in GCLocker::exit. src/hotspot/share/gc/shared/gcLocker.cpp line 114: > 112: void GCLocker::unblock() { > 113: assert(_lock->is_locked(), "precondition"); > 114: assert(Atomic::load(&_is_gc_request_pending) == true, "precondition"); Suggestion: assert(Atomic::load(&_is_gc_request_pending), "precondition"); src/hotspot/share/gc/shared/gcLocker.hpp line 31: > 29: #include "memory/allStatic.hpp" > 30: #include "runtime/mutex.hpp" > 31: Documentation how GCLocker works/is supposed to work is missing here. It's not exactly trivial. src/hotspot/share/gc/shared/gcLocker.hpp line 33: > 31: > 32: class GCLocker: public AllStatic { > 33: static Monitor* _lock; Not sure if having this copy/reference to `Heap_lock` makes the code more clear than referencing `Heap_lock` directly. It needs to be `Heap_lock` anyway. src/hotspot/share/gc/shared/gcLocker.hpp line 37: > 35: > 36: #ifdef ASSERT > 37: static uint64_t _debug_count; Maybe the variable could be named something less generic, indicating what it is counting. Or add a comment. src/hotspot/share/gc/shared/gcLocker.inline.hpp line 40: > 38: if (Atomic::load(&_is_gc_request_pending)) { > 39: thread->exit_critical(); > 40: // slow-path Suggestion: Not sure what this `slow-path` comment helps with. Maybe it is describing the next method (but it is named very similarly), or this is an attempt to describe the true-block of the if. 
In the latter case, it would maybe be better to put this comment at the start of the true-block of the if, and say something more descriptive like `// Another thread is requesting gc, enter slow path.` Not sure, feel free to ignore, it's just that to me the comment should either be removed or put upwards a line. src/hotspot/share/gc/shared/gcLocker.inline.hpp line 56: > 54: if (thread->in_last_critical()) { > 55: Atomic::add(&_debug_count, (uint64_t)-1); > 56: // Matching the loadload in GCLocker::block Suggestion: // Matching the loadload in GCLocker::block. src/hotspot/share/gc/shared/gcTraceSend.cpp line 364: > 362: #if INCLUDE_JFR > 363: > 364: #endif Please remove this empty `#if/#endif` block. src/hotspot/share/gc/shared/gc_globals.hpp line 162: > 160: "blocked by the GC locker") \ > 161: range(0, max_uintx) \ > 162: \ This removal should warrant a release note; while it's a diagnostic option and we can remove at a whim, it is in use to workaround issues. src/hotspot/share/prims/whitebox.cpp line 48: > 46: #include "gc/shared/concurrentGCBreakpoints.hpp" > 47: #include "gc/shared/gcConfig.hpp" > 48: #include "gc/shared/gcLocker.hpp" Suggestion: The file does not seem to use the `GCLocker` class anymore, please remove this line as well. ------------- Changes requested by tschatzl (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23367#pullrequestreview-2592106484 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940732531 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940775211 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940813063 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940779840 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940770235 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940769765 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940796501 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940793704 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940812598 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940746077 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940748992 PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1940752118 From tschatzl at openjdk.org Tue Feb 4 10:35:44 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Feb 2025 10:35:44 GMT Subject: RFR: 8349213: G1: Clearing bitmaps during collection set merging not claimed by region [v2] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that makes (optional) bitmap clearing during merging remembered sets claim regions. Otherwise every thread will do the (currently little) work themselves over and over again. 
> > Testing: tier1-3 > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * ayang review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23419/files - new: https://git.openjdk.org/jdk/pull/23419/files/ba3a9ec7..a73b3b34 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23419&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23419&range=00-01 Stats: 4 lines in 1 file changed: 2 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23419.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23419/head:pull/23419 PR: https://git.openjdk.org/jdk/pull/23419 From tschatzl at openjdk.org Tue Feb 4 10:35:44 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Feb 2025 10:35:44 GMT Subject: RFR: 8349213: G1: Clearing bitmaps during collection set merging not claimed by region [v2] In-Reply-To: <5YSfBhp40MOFgK-EbKrg1vY-X6ZuKHXmcnFi40hQp54=.2021c2c8-c22a-485c-b987-681e2e032f86@github.com> References: <5YSfBhp40MOFgK-EbKrg1vY-X6ZuKHXmcnFi40hQp54=.2021c2c8-c22a-485c-b987-681e2e032f86@github.com> Message-ID: On Tue, 4 Feb 2025 09:19:04 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * ayang review > > src/hotspot/share/gc/g1/g1RemSet.cpp line 1390: > >> 1388: g1h->collection_set_iterate_increment_from(&merge, worker_id); >> 1389: for (uint i = 0; i < G1GCPhaseTimes::MergeRSContainersSentinel; i++) { >> 1390: p->record_or_add_thread_work_item(merge_remset_phase, worker_id, merge.stats().merged(i), i); > > `stats()` has side-effect; should be invoked only once. Nice catch! Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23419#discussion_r1940911005 From ayang at openjdk.org Tue Feb 4 10:58:09 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 4 Feb 2025 10:58:09 GMT Subject: RFR: 8349213: G1: Clearing bitmaps during collection set merging not claimed by region [v2] In-Reply-To: References: Message-ID: On Tue, 4 Feb 2025 10:35:44 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that makes (optional) bitmap clearing during merging remembered sets claim regions. Otherwise every thread will do the (currently little) work themselves over and over again. >> >> Testing: tier1-3 >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * ayang review Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23419#pullrequestreview-2592457802 From thartmann at openjdk.org Tue Feb 4 12:09:10 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 4 Feb 2025 12:09:10 GMT Subject: [jdk24] RFR: 8348562: ZGC: segmentation fault due to missing node type check in barrier elision analysis In-Reply-To: References: Message-ID: <5LRr8GRO6lq8Y2c8cYr91PMm9XoJ5sq3m_1NJhGeOWE=.384cd75d-810f-47a7-a80f-c09e8f04619b@github.com> On Mon, 3 Feb 2025 17:56:23 GMT, Martin Doerr wrote: > Clean backport of [JDK-8348562](https://bugs.openjdk.org/browse/JDK-8348562). It only adds a null check + bailout where the current implementation crashes with SIGSEGV. Marked as reviewed by thartmann (Reviewer). Looks good to me too. 
------------- PR Review: https://git.openjdk.org/jdk/pull/23422#pullrequestreview-2592626045 PR Comment: https://git.openjdk.org/jdk/pull/23422#issuecomment-2633708939 From mdoerr at openjdk.org Tue Feb 4 13:13:19 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 4 Feb 2025 13:13:19 GMT Subject: [jdk24] RFR: 8348562: ZGC: segmentation fault due to missing node type check in barrier elision analysis In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 17:56:23 GMT, Martin Doerr wrote: > Clean backport of [JDK-8348562](https://bugs.openjdk.org/browse/JDK-8348562). It only adds a null check + bailout where the current implementation crashes with SIGSEGV. Thanks for the reviews and for the assistance! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23422#issuecomment-2633855619 From mdoerr at openjdk.org Tue Feb 4 13:13:19 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 4 Feb 2025 13:13:19 GMT Subject: [jdk24] Integrated: 8348562: ZGC: segmentation fault due to missing node type check in barrier elision analysis In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 17:56:23 GMT, Martin Doerr wrote: > Clean backport of [JDK-8348562](https://bugs.openjdk.org/browse/JDK-8348562). It only adds a null check + bailout where the current implementation crashes with SIGSEGV. This pull request has now been integrated. 
Changeset: b1659e34 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/b1659e345afa7d660e832f0d8ce48707ac99e824 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8348562: ZGC: segmentation fault due to missing node type check in barrier elision analysis Reviewed-by: kvn, thartmann Backport-of: afcc2b03afc77f730300e1d92471466d56ed75fb ------------- PR: https://git.openjdk.org/jdk/pull/23422 From phh at openjdk.org Tue Feb 4 15:53:13 2025 From: phh at openjdk.org (Paul Hohensee) Date: Tue, 4 Feb 2025 15:53:13 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check In-Reply-To: References: Message-ID: On Fri, 24 Jan 2025 18:30:02 GMT, Kelvin Nilsen wrote: > At the end of a degenerated GC, we check whether sufficient progress has been made in replenishing the memory available to the mutator. The test for good progress is implemented as a ratio of free memory against the total heap size. > > For generational Shenandoah, the ratio should be computed against the size of the young generation. Note that the size of the generational collection set is based on young generation size rather than total heap size. > > This issue first identified in GenShen GC logs, where a large number of degenerated cycles were upgrading to full GC because the free-set progress was short of desired by 10-25%. src/hotspot/share/gc/shenandoah/shenandoahMetrics.cpp line 52: > 50: size_t free_actual = free_set->available(); > 51: // The sum of free_set->capacity() and ->reserved represents capacity of young in generational, heap in non-generational. > 52: size_t free_expected = ((free_set->capacity() + free_set->reserved()) / 100) * ShenandoahCriticalFreeThreshold; As an outsider, the units involved and what exactly is being calculated is pretty opaque. Why would we divide by 100 to compute free_expected and not do the same for free_actual? Do we care about integer division truncation? 
The default value of ShenandoahCriticalFreeThreshold is 1, so multiplying by it is a nop by default, which seems strange. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23306#discussion_r1941436272 From shade at openjdk.org Tue Feb 4 16:05:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 4 Feb 2025 16:05:10 GMT Subject: RFR: 8348268: Test gc/shenandoah/TestResizeTLAB.java#compact: fatal error: Before Updating References: Thread C2 CompilerThread1: expected gc-state 9, actual 21 In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 20:28:58 GMT, William Kemper wrote: > Non-java threads were not having their gc-state configured when they attach. If they were created before the verifier's safepoint, but after the iteration over non-java threads, they would not have the correct state. It looks generally okay, but I am confused how this fixes a bad state in `C2 CompilerThread1`, since compiler threads are Java threads? https://github.com/openjdk/jdk/blob/beb43e2633900bb9ab3c975376fe5860b6d054e0/src/hotspot/share/compiler/compilerThread.hpp#L42 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23428#issuecomment-2634411610 From phh at openjdk.org Tue Feb 4 16:19:20 2025 From: phh at openjdk.org (Paul Hohensee) Date: Tue, 4 Feb 2025 16:19:20 GMT Subject: RFR: 8348594: Shenandoah: Do not penalize for degeneration when not the fault of triggering heuristic [v2] In-Reply-To: References: Message-ID: On Mon, 27 Jan 2025 02:05:02 GMT, Kelvin Nilsen wrote: >> Shenandoah heuristics use a penalty mechanism to cause earlier GC triggers when recent concurrent GC cycles degenerate. Degeneration is a stop-the-world remediation that allows GC to catch up when mutator allocations fail during concurrent GC. The fact that we needed to degenerate indicates that we were overly optimistic in delaying the trigger that starts concurrent GC. >> >> We have observed that it is common for degenerated GC cycles to cascade upon each other. 
The condition that caused an initial degenerated cycle is often not fully resolved by the end of that degenerated cycle. For example, the application may be experiencing a phase change and the GC heuristics are not yet attuned to the new behavior. Furthermore, a degenerated GC may exacerbate the problem condition. During the stop-the-world pause imposed by the first degenerated GC, work continues to accumulate in the form of new client requests that are buffered in network sockets until the end of that degenerated GC. >> >> As originally implemented, each degeneration would "pile on" additional penalties. These penalties cause the GC frequency to continue to increase. And the expanding CPU load of GC makes it increasingly difficult for mutator threads to catchup. The large penalties accumulated while we are trying to resolve the problem linger long after the problem condition has been resolved. >> >> This change does not add further to the degeneration penalties if a new degenerated cycle occurs through no fault of the triggering mechanism. We only add the degeneration penalty if the reason we are now degenerating can be attributed to a consciously late trigger by the heuristic. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Respond to reviewer feedback src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 318: > 316: > 317: if (ShenandoahHeuristics::should_start_gc()) { > 318: _start_gc_is_pending = true; I assume there's no race here, i.e., only one thread reads/writes _start_gc_is_pending. If there's a race, make sure it's benign. In either case, _start_gc_is_pending is made "sticky" by this code. 
src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 261: > 259: > 260: void ShenandoahHeuristics::record_success_concurrent() { > 261: _start_gc_is_pending = false; The name _start_gc_is_pending implies that it should be set false as soon as a gc cycle starts, not when it finishes. Maybe _gc_pending? Or maybe setting it false at the end of a gc cycle is a bug? :) src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.hpp line 87: > 85: size_t _declined_trigger_count; // This counts how many times since previous GC finished that this > 86: // heuristic has answered false to should_start_gc(). > 87: size_t _previous_trigger_declinations; // This represents the value of _declined_trigger_count as captured at the Maybe the name should be _most_recent_declined_trigger_count, which relates it directly to _declined_trigger_count. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23305#discussion_r1941486248 PR Review Comment: https://git.openjdk.org/jdk/pull/23305#discussion_r1941462312 PR Review Comment: https://git.openjdk.org/jdk/pull/23305#discussion_r1941468695 From egahlin at openjdk.org Tue Feb 4 16:56:17 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Tue, 4 Feb 2025 16:56:17 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Tue, 4 Feb 2025 09:47:20 GMT, Thomas Schatzl wrote: > * Idk if GCLocker JFR events need to be available in metadata.xml if the VM does not actually ever send it. I think it does not. > Maybe it is used to decode from old recordings, may be worth asking e.g. @egahlin . If the event is not used and the metric is not interesting to have anymore, remove it from metadata.xml, default.jfc, profile.jfc, EventNames.java and delete the TestGCLockerEvent.java file. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2634538626 From wkemper at openjdk.org Tue Feb 4 17:25:20 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 4 Feb 2025 17:25:20 GMT Subject: RFR: 8348268: Test gc/shenandoah/TestResizeTLAB.java#compact: fatal error: Before Updating References: Thread C2 CompilerThread1: expected gc-state 9, actual 21 In-Reply-To: References: Message-ID: On Tue, 4 Feb 2025 16:03:02 GMT, Aleksey Shipilev wrote: >> Non-java threads were not having their gc-state configured when they attach. If they were created before the verifier's safepoint, but after the iteration over non-java threads, they would not have the correct state. > > It looks generally okay, but I am confused how this fixes a bad state in `C2 CompilerThread1`, since compiler threads are Java threads? https://github.com/openjdk/jdk/blob/beb43e2633900bb9ab3c975376fe5860b6d054e0/src/hotspot/share/compiler/compilerThread.hpp#L42 @shipilev , that is a good point. Will take a closer look. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23428#issuecomment-2634609075 From rcastanedalo at openjdk.org Wed Feb 5 12:38:06 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 5 Feb 2025 12:38:06 GMT Subject: RFR: 8346280: C2: implement late barrier elision for G1 [v4] In-Reply-To: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> References: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> Message-ID: <-aHCYC9iVc4eMZ3pMfiDpqaW-wGM_s3zRMiVBWoadCM=.910336cd-3be2-45b5-9874-63b71abf38f8@github.com> > G1 barriers can be safely elided from writes to newly allocated objects as long as no safepoint is taken between the allocation and the write. 
This changeset complements early G1 barrier elision (performed by the platform-independent phases of C2, and limited to writes immediately following allocations) with a more general elision pass done at a late stage. > > The late elision pass exploits that it runs at a stage where the relative order of memory accesses and safepoints cannot change anymore to elide barriers from initialization writes that do not immediately follow the corresponding allocation, e.g. in conditional initialization writes: > > > o = new MyObject(); > if (...) { > o.myField = ...; // barrier elided only after this changeset > // (assuming no safepoint in the if condition) > } > > > or in initialization writes placed after exception-throwing checks: > > > o = new MyObject(); > if (...) { > throw new Exception(""); > } > o.myField = ...; // barrier elided only after this changeset > // (assuming no safepoint in the above if condition) > > > These patterns are commonly found in Java code, e.g. in the core libraries: > > - [conditional initialization](https://github.com/openjdk/jdk/blob/25fecaaf87400af535c242fe50296f1f89ceeb16/src/java.base/share/classes/java/lang/String.java#L4850), or > > - [initialization after exception-throwing checks (in the superclass constructor)](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/nio/X-Buffer.java.template#L324). 
> > The optimization also enhances barrier elision for array initialization writes, for example eliding barriers from small array initialization loops (for which safepoints are not inserted): > > > Object[] a = new Object[...]; > for (int i = 0; i < a.length; i++) { > a[i] = ...; // barrier elided only after this changeset > } > > > or eliding barriers from array initialization writes with unknown array index: > > > Object[] a = new Object[...]; > a[index] = ...; // barrier elided only after this changeset > > > The logic used to perform this additional barrier elision is a subset of a pre-existing ZGC-specific optimization. This changeset simply reuses the relevant subset (barrier elision for writes to newly-allocated objects) by extracting the core of the optimization logic from `zBarrierSetC2.cpp` into the GC-shared file `barrierSetC2.cpp`. The functions `block_has_safepoint`, `block_index`, `look_through_node`, `is_{undefined|unknown|concrete}`, `get_base_and_offset`, `is_array... Roberto Casta?eda Lozano has updated the pull request incrementally with two additional commits since the last revision: - Add some more tests to exercise barrier elision for atomic operations - Elide barriers from atomic operations on newly allocated objects as well ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23235/files - new: https://git.openjdk.org/jdk/pull/23235/files/3d154fa8..621a61cf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23235&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23235&range=02-03 Stats: 174 lines in 2 files changed: 167 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23235.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23235/head:pull/23235 PR: https://git.openjdk.org/jdk/pull/23235 From rcastanedalo at openjdk.org Wed Feb 5 12:42:15 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 5 Feb 2025 12:42:15 GMT Subject: RFR: 8346280: C2: implement late 
barrier elision for G1 [v2] In-Reply-To: References: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> Message-ID: On Fri, 31 Jan 2025 14:06:16 GMT, Roberto Casta?eda Lozano wrote: > > One question about elision for atomics. > > Otherwise it seems good afaict, although a large part was checking that the code movement is/was correct. > > Thanks for reviewing Thomas! Please let me know whether you want me to extend this changeset to elide barriers on atomic operations (happy to do so). @tschatzl I did extend the changeset now to also elide barriers on atomic operations, as discussed offline. Please have a look again. @offamitkumar @TheRealMDoerr @RealFYang @snazarkin you might want to re-test the changeset on your respective platforms. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23235#issuecomment-2636669816 From iwalulya at openjdk.org Wed Feb 5 13:37:52 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 5 Feb 2025 13:37:52 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v11] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. > > In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. 
> > The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. > > In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. > > Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. > > We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. > > Testing: Mach5 Tier1-6 > > ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) > ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) > ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) > ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 31 commits: - Revise Print Rememberedset info - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 - Albert review - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 - Merge remote-tracking branch 'upstream/master' into OldGenRemsetGroupsV1 - fix type - fix space issues - cleanup - assert - ... and 21 more: https://git.openjdk.org/jdk/compare/beae8843...d50457e3 ------------- Changes: https://git.openjdk.org/jdk/pull/22015/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=10 Stats: 1441 lines in 32 files changed: 679 ins; 369 del; 393 mod Patch: https://git.openjdk.org/jdk/pull/22015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22015/head:pull/22015 PR: https://git.openjdk.org/jdk/pull/22015 From iwalulya at openjdk.org Wed Feb 5 13:40:59 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 5 Feb 2025 13:40:59 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v12] In-Reply-To: References: Message-ID: <3ru13KcIWif1mzPnCckRryxaW6g3AkrIJvTBIaaCRNQ=.6c12262e-7b05-40df-8341-ae8141983237@github.com> > Hi all, > > Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. > > In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. 
This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. > > The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. > > In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. > > Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. > > We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. 
> > Testing: Mach5 Tier1-6 > > ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) > ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) > ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) > ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: space ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22015/files - new: https://git.openjdk.org/jdk/pull/22015/files/d50457e3..5b43fdb7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22015&range=10-11 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22015/head:pull/22015 PR: https://git.openjdk.org/jdk/pull/22015 From iwalulya at openjdk.org Wed Feb 5 13:48:20 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 5 Feb 2025 13:48:20 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v10] In-Reply-To: References: <5IANDiv_ZPk3dAPem7OekMx6d1cUDiFGtOVWlcWt52Y=.f5e7ad67-3181-4757-8f61-1bbcc9e62280@github.com> Message-ID: On Mon, 23 Dec 2024 21:03:16 GMT, Albert Mingkun Yang wrote: >> Yes, retained regions are in "single region" groups, so all details should be added to the log when we call `do_heap_region` > > I see; however, this would print the same gc_eff twice if young-gen contains a single region, right? Since this method is about cset-groups, I think it's more natural to visit all groups (regardless their size) here. With this PR, there is no gc_eff associated with individual region, `do_heap_region` can just skip gc_eff. fixed, creating another issue; now we don't print details on humongous regions. 
I ask we fix that in a follow up. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22015#discussion_r1942972818 From ayang at openjdk.org Wed Feb 5 14:41:39 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 5 Feb 2025 14:41:39 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: > Here is an attempt to simplify GCLocker implementation for Serial and Parallel. > > GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. > > The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. 
> > Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. > > Test: tier1-8 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into gclocker - review - Merge branch 'master' into gclocker - gclocker ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23367/files - new: https://git.openjdk.org/jdk/pull/23367/files/6283a19c..1b6f908b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23367&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23367&range=00-01 Stats: 20456 lines in 569 files changed: 9369 ins; 6708 del; 4379 mod Patch: https://git.openjdk.org/jdk/pull/23367.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23367/head:pull/23367 PR: https://git.openjdk.org/jdk/pull/23367 From ayang at openjdk.org Wed Feb 5 14:41:39 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 5 Feb 2025 14:41:39 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Tue, 4 Feb 2025 09:05:35 GMT, Thomas Schatzl wrote: >> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into gclocker >> - review >> - Merge branch 'master' into gclocker >> - gclocker > > src/hotspot/share/gc/shared/gcLocker.hpp line 33: > >> 31: >> 32: class GCLocker: public AllStatic { >> 33: static Monitor* _lock; > > Not sure if having this copy/reference to `Heap_lock` makes the code more clear than referencing `Heap_lock` directly. It needs to be `Heap_lock` anyway. `GCLocker` itself doesn't mandate that the lock must be `Heap_lock`; it's the interaction with the rest of the VM that shows that `Heap_lock` is a good candidate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1943040719 From ayang at openjdk.org Wed Feb 5 14:41:39 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 5 Feb 2025 14:41:39 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: <82w1_VjrsxtrpA7921QmHsA0kh9_J0kBtOCxp6sL7F4=.0b0d0698-b3d2-43a0-85b4-6b7e530e3a7a@github.com> On Wed, 5 Feb 2025 14:38:45 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker.
>> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker > Further it would be interesting to see how many retries there are in the allocation loop with these jnilock* stress test. I added `QueuedAllocationWarningCount=1` to `test/hotspot/jtreg/vmTestbase/nsk/stress/jni/gclocker/gcl001.java` and saw retry never exceeds 10 for Serial/Parallel. > Which means that it might retry for a long time That occurs only when another java thread successfully triggers a gc, advancing the gc-counter, i.e. there is some system-wide progress. Per-thread progress is hard to guarantee, IMO. 
------------- PR Review: https://git.openjdk.org/jdk/pull/23367#pullrequestreview-2595944041 From ayang at openjdk.org Wed Feb 5 15:08:13 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 5 Feb 2025 15:08:13 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v12] In-Reply-To: <3ru13KcIWif1mzPnCckRryxaW6g3AkrIJvTBIaaCRNQ=.6c12262e-7b05-40df-8341-ae8141983237@github.com> References: <3ru13KcIWif1mzPnCckRryxaW6g3AkrIJvTBIaaCRNQ=.6c12262e-7b05-40df-8341-ae8141983237@github.com> Message-ID: On Wed, 5 Feb 2025 13:40:59 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. >> >> In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. >> >> The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. 
This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. >> >> In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. >> >> Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. >> >> We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. >> >> Testing: Mach5 Tier1-6 >> >> ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) >> ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) >> ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) >> ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > space Marked as reviewed by ayang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/22015#pullrequestreview-2596072480 From mdoerr at openjdk.org Wed Feb 5 15:09:17 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 5 Feb 2025 15:09:17 GMT Subject: RFR: 8346280: C2: implement late barrier elision for G1 [v4] In-Reply-To: <-aHCYC9iVc4eMZ3pMfiDpqaW-wGM_s3zRMiVBWoadCM=.910336cd-3be2-45b5-9874-63b71abf38f8@github.com> References: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> <-aHCYC9iVc4eMZ3pMfiDpqaW-wGM_s3zRMiVBWoadCM=.910336cd-3be2-45b5-9874-63b71abf38f8@github.com> Message-ID: On Wed, 5 Feb 2025 12:38:06 GMT, Roberto Casta?eda Lozano wrote: >> G1 barriers can be safely elided from writes to newly allocated objects as long as no safepoint is taken between the allocation and the write. This changeset complements early G1 barrier elision (performed by the platform-independent phases of C2, and limited to writes immediately following allocations) with a more general elision pass done at a late stage. >> >> The late elision pass exploits that it runs at a stage where the relative order of memory accesses and safepoints cannot change anymore to elide barriers from initialization writes that do not immediately follow the corresponding allocation, e.g. in conditional initialization writes: >> >> >> o = new MyObject(); >> if (...) { >> o.myField = ...; // barrier elided only after this changeset >> // (assuming no safepoint in the if condition) >> } >> >> >> or in initialization writes placed after exception-throwing checks: >> >> >> o = new MyObject(); >> if (...) { >> throw new Exception(""); >> } >> o.myField = ...; // barrier elided only after this changeset >> // (assuming no safepoint in the above if condition) >> >> >> These patterns are commonly found in Java code, e.g. 
in the core libraries: >> >> - [conditional initialization](https://github.com/openjdk/jdk/blob/25fecaaf87400af535c242fe50296f1f89ceeb16/src/java.base/share/classes/java/lang/String.java#L4850), or >> >> - [initialization after exception-throwing checks (in the superclass constructor)](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/nio/X-Buffer.java.template#L324). >> >> The optimization also enhances barrier elision for array initialization writes, for example eliding barriers from small array initialization loops (for which safepoints are not inserted): >> >> >> Object[] a = new Object[...]; >> for (int i = 0; i < a.length; i++) { >> a[i] = ...; // barrier elided only after this changeset >> } >> >> >> or eliding barriers from array initialization writes with unknown array index: >> >> >> Object[] a = new Object[...]; >> a[index] = ...; // barrier elided only after this changeset >> >> >> The logic used to perform this additional barrier elision is a subset of a pre-existing ZGC-specific optimization. This changeset simply reuses the relevant subset (barrier elision for writes to newly-allocated objects) by extracting the core of the optimization logic from `zBarrierSetC2.cpp` into the GC-shared file `barrierSetC2.cpp`. The functions `block_has_safepoint`, `block_inde... > > Roberto Casta?eda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Add some more tests to exercise barrier elision for atomic operations > - Elide barriers from atomic operations on newly allocated objects as well LGTM. TestG1BarrierGeneration.java has passed on ppc64le. I'll run more tests. Please remember updating the Copyright headers. 
------------- PR Review: https://git.openjdk.org/jdk/pull/23235#pullrequestreview-2596076238 From amitkumar at openjdk.org Wed Feb 5 17:57:19 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 5 Feb 2025 17:57:19 GMT Subject: RFR: 8346280: C2: implement late barrier elision for G1 [v4] In-Reply-To: <-aHCYC9iVc4eMZ3pMfiDpqaW-wGM_s3zRMiVBWoadCM=.910336cd-3be2-45b5-9874-63b71abf38f8@github.com> References: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> <-aHCYC9iVc4eMZ3pMfiDpqaW-wGM_s3zRMiVBWoadCM=.910336cd-3be2-45b5-9874-63b71abf38f8@github.com> Message-ID: On Wed, 5 Feb 2025 12:38:06 GMT, Roberto Casta?eda Lozano wrote: >> G1 barriers can be safely elided from writes to newly allocated objects as long as no safepoint is taken between the allocation and the write. This changeset complements early G1 barrier elision (performed by the platform-independent phases of C2, and limited to writes immediately following allocations) with a more general elision pass done at a late stage. >> >> The late elision pass exploits that it runs at a stage where the relative order of memory accesses and safepoints cannot change anymore to elide barriers from initialization writes that do not immediately follow the corresponding allocation, e.g. in conditional initialization writes: >> >> >> o = new MyObject(); >> if (...) { >> o.myField = ...; // barrier elided only after this changeset >> // (assuming no safepoint in the if condition) >> } >> >> >> or in initialization writes placed after exception-throwing checks: >> >> >> o = new MyObject(); >> if (...) { >> throw new Exception(""); >> } >> o.myField = ...; // barrier elided only after this changeset >> // (assuming no safepoint in the above if condition) >> >> >> These patterns are commonly found in Java code, e.g. 
in the core libraries: >> >> - [conditional initialization](https://github.com/openjdk/jdk/blob/25fecaaf87400af535c242fe50296f1f89ceeb16/src/java.base/share/classes/java/lang/String.java#L4850), or >> >> - [initialization after exception-throwing checks (in the superclass constructor)](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/nio/X-Buffer.java.template#L324). >> >> The optimization also enhances barrier elision for array initialization writes, for example eliding barriers from small array initialization loops (for which safepoints are not inserted): >> >> >> Object[] a = new Object[...]; >> for (int i = 0; i < a.length; i++) { >> a[i] = ...; // barrier elided only after this changeset >> } >> >> >> or eliding barriers from array initialization writes with unknown array index: >> >> >> Object[] a = new Object[...]; >> a[index] = ...; // barrier elided only after this changeset >> >> >> The logic used to perform this additional barrier elision is a subset of a pre-existing ZGC-specific optimization. This changeset simply reuses the relevant subset (barrier elision for writes to newly-allocated objects) by extracting the core of the optimization logic from `zBarrierSetC2.cpp` into the GC-shared file `barrierSetC2.cpp`. The functions `block_has_safepoint`, `block_inde... 
> > Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Add some more tests to exercise barrier elision for atomic operations > - Elide barriers from atomic operations on newly allocated objects as well I see TestG1BarrierGeneration.java failure :( [TestG1BarrierGeneration_jtr.log](https://github.com/user-attachments/files/18676532/TestG1BarrierGeneration_jtr.log) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23235#issuecomment-2637624720 From wkemper at openjdk.org Wed Feb 5 19:06:20 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 5 Feb 2025 19:06:20 GMT Subject: [jdk24] Withdrawn: 8349002: GenShen: Deadlock during shutdown In-Reply-To: <6a_6G1g93RUACyYcHG5B9HtqFBaqdETRRdhvFWwrfi8=.e88c6252-f6c4-4e3f-972a-2c4495d27127@github.com> References: <6a_6G1g93RUACyYcHG5B9HtqFBaqdETRRdhvFWwrfi8=.e88c6252-f6c4-4e3f-972a-2c4495d27127@github.com> Message-ID: On Mon, 3 Feb 2025 22:27:44 GMT, William Kemper wrote: > Clean backport. Fixes bug introduced by [JDK-8345970](https://bugs.openjdk.org/browse/JDK-8345970). This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/23429 From wkemper at openjdk.org Wed Feb 5 19:06:20 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 5 Feb 2025 19:06:20 GMT Subject: [jdk24] RFR: 8349002: GenShen: Deadlock during shutdown In-Reply-To: <6a_6G1g93RUACyYcHG5B9HtqFBaqdETRRdhvFWwrfi8=.e88c6252-f6c4-4e3f-972a-2c4495d27127@github.com> References: <6a_6G1g93RUACyYcHG5B9HtqFBaqdETRRdhvFWwrfi8=.e88c6252-f6c4-4e3f-972a-2c4495d27127@github.com> Message-ID: On Mon, 3 Feb 2025 22:27:44 GMT, William Kemper wrote: > Clean backport. Fixes bug introduced by [JDK-8345970](https://bugs.openjdk.org/browse/JDK-8345970). Understood. Will target JDK24 update release.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23429#issuecomment-2637789977 From dlong at openjdk.org Wed Feb 5 19:51:02 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 5 Feb 2025 19:51:02 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Wed, 5 Feb 2025 14:41:39 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. 
>> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker I like the direction this is taking us, but I think we could go even further and eventually fold the JNI critical region into the existing safepoint mechanism. To me, the safepoint mechanism already implements a readers-writer lock, with thread states like _thread_in_Java/_thread_in_vm already being "critical regions". With this change, we have two nested readers-writer locks that a GC needs to acquire. I think if we made entering and exiting a JNI critical region change the thread state (probably by introducing a new thread state), then we don't need a separate readers-writer lock for the JNI critical region. However, maybe we don't want to go that far, as the current solution allows GC-specific implementations and allows each different GC VMOp to decide if it needs to block for JNI critical regions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2637865869 From egahlin at openjdk.org Wed Feb 5 19:59:13 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 5 Feb 2025 19:59:13 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Wed, 5 Feb 2025 14:41:39 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs).
JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker JFR changes look good. ------------- Marked as reviewed by egahlin (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23367#pullrequestreview-2596847997 From wkemper at openjdk.org Wed Feb 5 22:35:21 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 5 Feb 2025 22:35:21 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions Message-ID: There are several changes to the operation of Shenandoah's control threads here. * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. * The cancellation handling is driven entirely by the cancellation cause * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed * The shutdown sequence is simpler * The generational control thread uses a lock to coordinate updates to the requested cause and generation * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles * The control thread doesn't loop on its own (unless the pacer is enabled). ------------- Commit messages: - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads - Simplify shControlThread - Revert unnecessary changes - Fix interrupted old cycle handling - Restore reporting allocations to pacer - Better names, better comments - WIP: Simplify shutdown protocol - WIP: Don't need request.mode anymore - WIP: Simplify degenerated cycle handling - WIP: Passes tier1, mostly passes tier2 - ... 
and 4 more: https://git.openjdk.org/jdk/compare/b499c827...f97f257b Changes: https://git.openjdk.org/jdk/pull/23475/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349094 Stats: 817 lines in 14 files changed: 241 ins; 286 del; 290 mod Patch: https://git.openjdk.org/jdk/pull/23475.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23475/head:pull/23475 PR: https://git.openjdk.org/jdk/pull/23475 From fyang at openjdk.org Thu Feb 6 02:43:12 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 6 Feb 2025 02:43:12 GMT Subject: RFR: 8346280: C2: implement late barrier elision for G1 [v4] In-Reply-To: <-aHCYC9iVc4eMZ3pMfiDpqaW-wGM_s3zRMiVBWoadCM=.910336cd-3be2-45b5-9874-63b71abf38f8@github.com> References: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> <-aHCYC9iVc4eMZ3pMfiDpqaW-wGM_s3zRMiVBWoadCM=.910336cd-3be2-45b5-9874-63b71abf38f8@github.com> Message-ID: On Wed, 5 Feb 2025 12:38:06 GMT, Roberto Casta?eda Lozano wrote: >> G1 barriers can be safely elided from writes to newly allocated objects as long as no safepoint is taken between the allocation and the write. This changeset complements early G1 barrier elision (performed by the platform-independent phases of C2, and limited to writes immediately following allocations) with a more general elision pass done at a late stage. >> >> The late elision pass exploits that it runs at a stage where the relative order of memory accesses and safepoints cannot change anymore to elide barriers from initialization writes that do not immediately follow the corresponding allocation, e.g. in conditional initialization writes: >> >> >> o = new MyObject(); >> if (...) { >> o.myField = ...; // barrier elided only after this changeset >> // (assuming no safepoint in the if condition) >> } >> >> >> or in initialization writes placed after exception-throwing checks: >> >> >> o = new MyObject(); >> if (...) 
{ >> throw new Exception(""); >> } >> o.myField = ...; // barrier elided only after this changeset >> // (assuming no safepoint in the above if condition) >> >> >> These patterns are commonly found in Java code, e.g. in the core libraries: >> >> - [conditional initialization](https://github.com/openjdk/jdk/blob/25fecaaf87400af535c242fe50296f1f89ceeb16/src/java.base/share/classes/java/lang/String.java#L4850), or >> >> - [initialization after exception-throwing checks (in the superclass constructor)](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/nio/X-Buffer.java.template#L324). >> >> The optimization also enhances barrier elision for array initialization writes, for example eliding barriers from small array initialization loops (for which safepoints are not inserted): >> >> >> Object[] a = new Object[...]; >> for (int i = 0; i < a.length; i++) { >> a[i] = ...; // barrier elided only after this changeset >> } >> >> >> or eliding barriers from array initialization writes with unknown array index: >> >> >> Object[] a = new Object[...]; >> a[index] = ...; // barrier elided only after this changeset >> >> >> The logic used to perform this additional barrier elision is a subset of a pre-existing ZGC-specific optimization. This changeset simply reuses the relevant subset (barrier elision for writes to newly-allocated objects) by extracting the core of the optimization logic from `zBarrierSetC2.cpp` into the GC-shared file `barrierSetC2.cpp`. The functions `block_has_safepoint`, `block_inde... > > Roberto Casta?eda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Add some more tests to exercise barrier elision for atomic operations > - Elide barriers from atomic operations on newly allocated objects as well FYI: hs-tier1 still test good on linux-riscv64 with fastdebug build. 
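The two commits quoted above extend elision to atomic operations on newly allocated objects. A hypothetical source-level shape of such an operation (illustrative only; the class and field names are invented, and whether the barrier is actually elided is decided by C2, not by anything visible here):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class AtomicOnNewObject {
    static class Node {
        volatile Object value;
        static final VarHandle VALUE;
        static {
            try {
                VALUE = MethodHandles.lookup()
                        .findVarHandle(Node.class, "value", Object.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }
    }

    static Node create(Object v) {
        Node n = new Node();                  // allocation
        Node.VALUE.compareAndSet(n, null, v); // atomic write to the new object,
        return n;                             // no safepoint in between: candidate
    }                                         // for barrier elision

    public static void main(String[] args) {
        System.out.println(create("hello").value);
    }
}
```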
------------- PR Comment: https://git.openjdk.org/jdk/pull/23235#issuecomment-2638686602 From dholmes at openjdk.org Thu Feb 6 06:28:13 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 6 Feb 2025 06:28:13 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Wed, 5 Feb 2025 14:41:39 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. 
>> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker src/hotspot/share/runtime/javaThread.hpp line 938: > 936: } > 937: > 938: bool in_critical_atomic() { return Atomic::load(&_jni_active_critical) > 0; } If you think you need an atomic load here, then it would be needed for `in_critical()` so just add it there. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1944166423 From dholmes at openjdk.org Thu Feb 6 06:38:13 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 6 Feb 2025 06:38:13 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Wed, 5 Feb 2025 14:41:39 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. 
This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker > this PR uses an existing thread-local variable with a store-load barrier for synchronization. @albertnetymk can you explain how this protocol is intended to work please. I must be missing some higher-level context that provides additional synchronization because use of the per-thread counters is inherently racy. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2638958442 From dholmes at openjdk.org Thu Feb 6 06:41:21 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 6 Feb 2025 06:41:21 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Wed, 5 Feb 2025 19:39:30 GMT, Dean Long wrote: > I think we could go even further and eventually fold the JNI critical region into the existing safepoint mechanism. 
@dean-long you seem to be forgetting why it was folded out in the first place. :) This was performance critical JNI code where the thread-state transitions were too heavyweight and expensive to use. So we keep the thread safepoint-safe (`_thread_in_native`) and have a way to tell the GC to pause whilst we are in these critical regions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2638962365 From rcastanedalo at openjdk.org Thu Feb 6 08:58:40 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 6 Feb 2025 08:58:40 GMT Subject: RFR: 8346280: C2: implement late barrier elision for G1 [v4] In-Reply-To: References: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> <-aHCYC9iVc4eMZ3pMfiDpqaW-wGM_s3zRMiVBWoadCM=.910336cd-3be2-45b5-9874-63b71abf38f8@github.com> Message-ID: On Wed, 5 Feb 2025 15:06:39 GMT, Martin Doerr wrote: > LGTM. TestG1BarrierGeneration.java has passed on ppc64le. I'll run more tests. Please remember updating the Copyright headers. Thanks for the reminder, updated in commit 3671f474. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23235#issuecomment-2639197888 From rcastanedalo at openjdk.org Thu Feb 6 08:49:28 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 6 Feb 2025 08:49:28 GMT Subject: RFR: 8346280: C2: implement late barrier elision for G1 [v5] In-Reply-To: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> References: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> Message-ID: > G1 barriers can be safely elided from writes to newly allocated objects as long as no safepoint is taken between the allocation and the write. 
This changeset complements early G1 barrier elision (performed by the platform-independent phases of C2, and limited to writes immediately following allocations) with a more general elision pass done at a late stage. > > The late elision pass exploits that it runs at a stage where the relative order of memory accesses and safepoints cannot change anymore to elide barriers from initialization writes that do not immediately follow the corresponding allocation, e.g. in conditional initialization writes: > > > o = new MyObject(); > if (...) { > o.myField = ...; // barrier elided only after this changeset > // (assuming no safepoint in the if condition) > } > > > or in initialization writes placed after exception-throwing checks: > > > o = new MyObject(); > if (...) { > throw new Exception(""); > } > o.myField = ...; // barrier elided only after this changeset > // (assuming no safepoint in the above if condition) > > > These patterns are commonly found in Java code, e.g. in the core libraries: > > - [conditional initialization](https://github.com/openjdk/jdk/blob/25fecaaf87400af535c242fe50296f1f89ceeb16/src/java.base/share/classes/java/lang/String.java#L4850), or > > - [initialization after exception-throwing checks (in the superclass constructor)](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/nio/X-Buffer.java.template#L324). 
> > The optimization also enhances barrier elision for array initialization writes, for example eliding barriers from small array initialization loops (for which safepoints are not inserted): > > > Object[] a = new Object[...]; > for (int i = 0; i < a.length; i++) { > a[i] = ...; // barrier elided only after this changeset > } > > > or eliding barriers from array initialization writes with unknown array index: > > > Object[] a = new Object[...]; > a[index] = ...; // barrier elided only after this changeset > > > The logic used to perform this additional barrier elision is a subset of a pre-existing ZGC-specific optimization. This changeset simply reuses the relevant subset (barrier elision for writes to newly-allocated objects) by extracting the core of the optimization logic from `zBarrierSetC2.cpp` into the GC-shared file `barrierSetC2.cpp`. The functions `block_has_safepoint`, `block_index`, `look_through_node`, `is_{undefined|unknown|concrete}`, `get_base_and_offset`, `is_array... Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: Update copyright headers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23235/files - new: https://git.openjdk.org/jdk/pull/23235/files/621a61cf..3671f474 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23235&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23235&range=03-04 Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23235.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23235/head:pull/23235 PR: https://git.openjdk.org/jdk/pull/23235 From mdoerr at openjdk.org Thu Feb 6 10:13:28 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 6 Feb 2025 10:13:28 GMT Subject: RFR: 8346280: C2: implement late barrier elision for G1 [v5] In-Reply-To: References: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> Message-ID: On Thu, 6 
Feb 2025 08:49:28 GMT, Roberto Castañeda Lozano wrote: >> G1 barriers can be safely elided from writes to newly allocated objects as long as no safepoint is taken between the allocation and the write. This changeset complements early G1 barrier elision (performed by the platform-independent phases of C2, and limited to writes immediately following allocations) with a more general elision pass done at a late stage. >> >> The late elision pass exploits that it runs at a stage where the relative order of memory accesses and safepoints cannot change anymore to elide barriers from initialization writes that do not immediately follow the corresponding allocation, e.g. in conditional initialization writes: >> >> >> o = new MyObject(); >> if (...) { >> o.myField = ...; // barrier elided only after this changeset >> // (assuming no safepoint in the if condition) >> } >> >> >> or in initialization writes placed after exception-throwing checks: >> >> >> o = new MyObject(); >> if (...) { >> throw new Exception(""); >> } >> o.myField = ...; // barrier elided only after this changeset >> // (assuming no safepoint in the above if condition) >> >> >> These patterns are commonly found in Java code, e.g. in the core libraries: >> >> - [conditional initialization](https://github.com/openjdk/jdk/blob/25fecaaf87400af535c242fe50296f1f89ceeb16/src/java.base/share/classes/java/lang/String.java#L4850), or >> >> - [initialization after exception-throwing checks (in the superclass constructor)](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/nio/X-Buffer.java.template#L324).
>> >> The optimization also enhances barrier elision for array initialization writes, for example eliding barriers from small array initialization loops (for which safepoints are not inserted): >> >> >> Object[] a = new Object[...]; >> for (int i = 0; i < a.length; i++) { >> a[i] = ...; // barrier elided only after this changeset >> } >> >> >> or eliding barriers from array initialization writes with unknown array index: >> >> >> Object[] a = new Object[...]; >> a[index] = ...; // barrier elided only after this changeset >> >> >> The logic used to perform this additional barrier elision is a subset of a pre-existing ZGC-specific optimization. This changeset simply reuses the relevant subset (barrier elision for writes to newly-allocated objects) by extracting the core of the optimization logic from `zBarrierSetC2.cpp` into the GC-shared file `barrierSetC2.cpp`. The functions `block_has_safepoint`, `block_inde... > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright headers Code and test results look good. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23235#pullrequestreview-2598222122 From wkemper at openjdk.org Thu Feb 6 17:20:52 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 6 Feb 2025 17:20:52 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v2] In-Reply-To: References: Message-ID: <9vqH905wEy_k3MoOq-wmpzFWuniRKpiDAu6en7bOSr4=.a8fee870-a8fc-4532-acc7-c37975e8a948@github.com> > There are several changes to the operation of Shenandoah's control threads here. > * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. 
> * The cancellation handling is driven entirely by the cancellation cause > * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed > * The shutdown sequence is simpler > * The generational control thread uses a lock to coordinate updates to the requested cause and generation > * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance > * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles > * The control thread doesn't loop on its own (unless the pacer is enabled). William Kemper has updated the pull request incrementally with one additional commit since the last revision: Remove invalid assert, alloc waiters wait until allocation failure is clear ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23475/files - new: https://git.openjdk.org/jdk/pull/23475/files/f97f257b..a7a6eea1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=00-01 Stats: 5 lines in 2 files changed: 2 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23475.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23475/head:pull/23475 PR: https://git.openjdk.org/jdk/pull/23475 From ayang at openjdk.org Thu Feb 6 21:45:13 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 6 Feb 2025 21:45:13 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Thu, 6 Feb 2025 06:35:46 GMT, David Holmes wrote: > can you explain how this protocol is intended to work please. When a GC is requested, the `block()` function sets `_is_gc_request_pending` to `true` and then waits until all threads have exited their critical regions. 
Any thread attempting to enter a critical region during this time will detect the pending GC flag in `enter()` and follow the slow path, effectively waiting until the GC completes. The storeload barrier is critical to ensure that these two variables -- `_is_gc_request_pending` and the thread-local `_jni_active_critical` -- are accessed in the proper order. > If you think you need an atomic load here, then it would be needed for in_critical() so just add it there. `in_critical()` is used only by the owning thread, which has exclusive write access. Therefore, its access does not need to be atomic. However, the reads performed by other threads must be atomic, I believe. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2641116616 From wkemper at openjdk.org Fri Feb 7 02:04:29 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 7 Feb 2025 02:04:29 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v3] In-Reply-To: References: Message-ID: > There are several changes to the operation of Shenandoah's control threads here. > * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. > * The cancellation handling is driven entirely by the cancellation cause > * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed > * The shutdown sequence is simpler > * The generational control thread uses a lock to coordinate updates to the requested cause and generation > * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance > * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles > * The control thread doesn't loop on its own (unless the pacer is enabled). 
William Kemper has updated the pull request incrementally with three additional commits since the last revision: - Resuming an old cycle should not preempt a young cycle - Use logging tag 'thread' to help control debug volume - Do not stomp on pending requests when running a degenerated cycle ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23475/files - new: https://git.openjdk.org/jdk/pull/23475/files/a7a6eea1..ae207480 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=01-02 Stats: 64 lines in 8 files changed: 35 ins; 12 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/23475.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23475/head:pull/23475 PR: https://git.openjdk.org/jdk/pull/23475 From dholmes at openjdk.org Fri Feb 7 06:46:10 2025 From: dholmes at openjdk.org (David Holmes) Date: Fri, 7 Feb 2025 06:46:10 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Thu, 6 Feb 2025 21:42:50 GMT, Albert Mingkun Yang wrote: > in_critical() is used only by the owning thread, I see code using `thr->in_critical()` which is not obviously being executed by the current thread on itself. But in any case adding the atomic load to `in_critical()` is basically a no-op (loads are atomic) so no need to add a new API just to do that. 
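The enter/block protocol debated in this thread can be condensed into a toy model. The sketch below is illustrative only, not the HotSpot implementation: the names mirror the variables under discussion (`_is_gc_request_pending`, `_jni_active_critical`), a single global counter replaces the per-thread critical-region state, and sequentially consistent atomics stand in for the explicit StoreLoad barrier (whether that substitution is sufficient is exactly the open question above).

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Toy model of the GCLocker fast path discussed in JDK-8192647.
std::atomic<bool> is_gc_request_pending{false}; // set by the GC requester
std::atomic<int>  active_critical{0};           // threads inside a critical region

// Mutator side: try the fast path into a JNI critical region.
// Returns false when a GC is pending, meaning "take the slow path and
// wait for the GC to finish" (the slow path itself is elided here).
bool enter_critical() {
  active_critical.fetch_add(1);        // publish "I am critical" first...
  if (is_gc_request_pending.load()) {  // ...then read the pending flag
    active_critical.fetch_sub(1);      // back out and take the slow path
    return false;
  }
  return true;
}

void exit_critical() {
  active_critical.fetch_sub(1);
}

// GC side: announce the request, then wait for critical regions to drain.
void block_for_gc() {
  is_gc_request_pending.store(true);
  while (active_critical.load() != 0) {
    std::this_thread::yield();
  }
}
```

The ordering is the whole point: the mutator must make its critical-region entry visible before it reads the flag, and the requester must make the flag visible before it reads the counter; that store-then-load on each side is why the messages above argue about a StoreLoad barrier versus full fences.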
------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2642070840 From dholmes at openjdk.org Fri Feb 7 07:01:39 2025 From: dholmes at openjdk.org (David Holmes) Date: Fri, 7 Feb 2025 07:01:39 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Thu, 6 Feb 2025 21:42:50 GMT, Albert Mingkun Yang wrote: > The storeload barrier is critical ... I'm not sure it is sufficient. I would have expected some full fences to be needed here as this is very similar to the interaction of thread state with safepoints. I will look closer on Monday (sorry). ------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2642089369 From tschatzl at openjdk.org Fri Feb 7 08:42:16 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 7 Feb 2025 08:42:16 GMT Subject: RFR: 8346280: C2: implement late barrier elision for G1 [v5] In-Reply-To: References: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> Message-ID: On Thu, 6 Feb 2025 08:49:28 GMT, Roberto Castañeda Lozano wrote: >> G1 barriers can be safely elided from writes to newly allocated objects as long as no safepoint is taken between the allocation and the write. This changeset complements early G1 barrier elision (performed by the platform-independent phases of C2, and limited to writes immediately following allocations) with a more general elision pass done at a late stage. >> >> The late elision pass exploits that it runs at a stage where the relative order of memory accesses and safepoints cannot change anymore to elide barriers from initialization writes that do not immediately follow the corresponding allocation, e.g. in conditional initialization writes: >> >> >> o = new MyObject(); >> if (...)
{ >> o.myField = ...; // barrier elided only after this changeset >> // (assuming no safepoint in the if condition) >> } >> >> >> or in initialization writes placed after exception-throwing checks: >> >> >> o = new MyObject(); >> if (...) { >> throw new Exception(""); >> } >> o.myField = ...; // barrier elided only after this changeset >> // (assuming no safepoint in the above if condition) >> >> >> These patterns are commonly found in Java code, e.g. in the core libraries: >> >> - [conditional initialization](https://github.com/openjdk/jdk/blob/25fecaaf87400af535c242fe50296f1f89ceeb16/src/java.base/share/classes/java/lang/String.java#L4850), or >> >> - [initialization after exception-throwing checks (in the superclass constructor)](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/nio/X-Buffer.java.template#L324). >> >> The optimization also enhances barrier elision for array initialization writes, for example eliding barriers from small array initialization loops (for which safepoints are not inserted): >> >> >> Object[] a = new Object[...]; >> for (int i = 0; i < a.length; i++) { >> a[i] = ...; // barrier elided only after this changeset >> } >> >> >> or eliding barriers from array initialization writes with unknown array index: >> >> >> Object[] a = new Object[...]; >> a[index] = ...; // barrier elided only after this changeset >> >> >> The logic used to perform this additional barrier elision is a subset of a pre-existing ZGC-specific optimization. This changeset simply reuses the relevant subset (barrier elision for writes to newly-allocated objects) by extracting the core of the optimization logic from `zBarrierSetC2.cpp` into the GC-shared file `barrierSetC2.cpp`. The functions `block_has_safepoint`, `block_inde... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright headers Afaict this is good.
------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23235#pullrequestreview-2601121948 From rcastanedalo at openjdk.org Fri Feb 7 09:19:13 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 7 Feb 2025 09:19:13 GMT Subject: RFR: 8346280: C2: implement late barrier elision for G1 [v5] In-Reply-To: References: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> Message-ID: <6WMofkASYawj1iolPRb1_3GIgpjJ_5ggK-nnnMXdYII=.58aff13a-49b3-430f-a37e-c2dea123bd97@github.com> On Fri, 7 Feb 2025 08:40:03 GMT, Thomas Schatzl wrote: > Afaict this is good. Thanks for reviewing, Thomas! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23235#issuecomment-2642378742 From rcastanedalo at openjdk.org Fri Feb 7 09:24:13 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 7 Feb 2025 09:24:13 GMT Subject: RFR: 8346280: C2: implement late barrier elision for G1 [v4] In-Reply-To: References: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> <-aHCYC9iVc4eMZ3pMfiDpqaW-wGM_s3zRMiVBWoadCM=.910336cd-3be2-45b5-9874-63b71abf38f8@github.com> Message-ID: On Wed, 5 Feb 2025 17:51:36 GMT, Amit Kumar wrote: > I see TestG1BarrierGeneration.java failure :( > > [TestG1BarrierGeneration_jtr.log](https://github.com/user-attachments/files/18676532/TestG1BarrierGeneration_jtr.log) @offamitkumar thanks for the report! Most likely the test failures are only due to missing optimizations (because of limitations in the barrier elision pattern matching analysis), but if you want me to confirm please send the entire jtreg log, without truncation.
You can disable output truncation running the test like this: `make run-test TEST="compiler/gcbarriers/TestG1BarrierGeneration.java" JTREG="MAX_OUTPUT=999999999"` Please double-check that the output log file does not contain any `Output overflow` message. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23235#issuecomment-2642388571 From iwalulya at openjdk.org Fri Feb 7 10:40:23 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 7 Feb 2025 10:40:23 GMT Subject: RFR: 8343782: G1: Use one G1CardSet instance for multiple old gen regions [v12] In-Reply-To: References: <3ru13KcIWif1mzPnCckRryxaW6g3AkrIJvTBIaaCRNQ=.6c12262e-7b05-40df-8341-ae8141983237@github.com> Message-ID: On Wed, 5 Feb 2025 15:05:21 GMT, Albert Mingkun Yang wrote: >> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: >> >> space > > Marked as reviewed by ayang (Reviewer). Thanks @albertnetymk and @tschatzl for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22015#issuecomment-2642514964 From iwalulya at openjdk.org Fri Feb 7 10:40:24 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 7 Feb 2025 10:40:24 GMT Subject: Integrated: 8343782: G1: Use one G1CardSet instance for multiple old gen regions In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 13:58:36 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to assign multiple collection candidate regions to a single instance of a G1CardSet. Currently, we maintain a 1:1 mapping of old-gen regions and G1CardSet instances, assuming these regions are collected independently. However, regions are collected in batches for performance reasons to meet the G1MixedGCCountTarget. > > In this change, at the end of the Remark phase, we batch regions that we anticipate will be collected together into a collection group while selecting remembered set rebuild candidates. 
Regions in a collection group should be evacuated at the same time because they are assigned to the same G1CardSet instances. This implies that we do not need to maintain cross-region remembered set entries for regions within the same collection group. > > The benefit is a reduction in the memory overhead of the remembered set and the remembered set merge time during the collection pause. One disadvantage is that this approach decreases the flexibility during evacuation: you can only evacuate all regions that share a particular G1CardSet at the same time. Another downside is that pinned regions that are part of a collection group have to be partially evacuated when the collection group is selected for evacuation. This removes the optimization in the mainline implementation where the pinned regions are skipped to allow for potential unpinning before evacuation. > > In this change, we make significant changes to the collection set implementation as we switch to group selection instead of region selection. Consequently, many of the changes in the PR are about switching from region-centered collection set selection to a group-centered approach. > > Note: The batching is based on the sort order by reclaimable bytes which may change the evacuation order in which regions would have been evacuated when sorted by gc efficiency. > > We have not observed any regressions on internal performance testing platforms. Memory comparisons for the Cachestress benchmark for different heap sizes are attached below. > > Testing: Mach5 Tier1-6 > > ![16GB](https://github.com/user-attachments/assets/3224c2f1-172d-4d76-ba28-bf483b1b1c95) > ![32G](https://github.com/user-attachments/assets/abd10537-41a9-4cf9-b668-362af12fe949) > ![64GB](https://github.com/user-attachments/assets/fa87eefc-cf8a-4fb5-9fc4-e7151498bf73) > ![128GB](https://github.com/user-attachments/assets/c3a59e32-6bd7-43e3-a3e4-c472f71aa544) This pull request has now been integrated. 
Changeset: 86cec4ea Author: Ivan Walulya URL: https://git.openjdk.org/jdk/commit/86cec4ea2c2c56f03b23be44caade49b922cd3c6 Stats: 1440 lines in 32 files changed: 678 ins; 369 del; 393 mod 8343782: G1: Use one G1CardSet instance for multiple old gen regions Reviewed-by: ayang, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/22015 From amitkumar at openjdk.org Fri Feb 7 12:02:22 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 7 Feb 2025 12:02:22 GMT Subject: RFR: 8346280: C2: implement late barrier elision for G1 [v4] In-Reply-To: References: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> <-aHCYC9iVc4eMZ3pMfiDpqaW-wGM_s3zRMiVBWoadCM=.910336cd-3be2-45b5-9874-63b71abf38f8@github.com> Message-ID: On Fri, 7 Feb 2025 09:21:39 GMT, Roberto Castañeda Lozano wrote: >> I see TestG1BarrierGeneration.java failure :( >> >> [TestG1BarrierGeneration_jtr.log](https://github.com/user-attachments/files/18676532/TestG1BarrierGeneration_jtr.log) > >> I see TestG1BarrierGeneration.java failure :( >> >> [TestG1BarrierGeneration_jtr.log](https://github.com/user-attachments/files/18676532/TestG1BarrierGeneration_jtr.log) > > @offamitkumar thanks for the report! Most likely the test failures are only due to missing optimizations (because of limitations in the barrier elision pattern matching analysis), but if you want me to confirm please send the entire jtreg log, without truncation. You can disable output truncation running the test like this: > `make run-test TEST="compiler/gcbarriers/TestG1BarrierGeneration.java" JTREG="MAX_OUTPUT=999999999"` > Please double-check that the output log file does not contain any `Output overflow` message. @robcasloz Sure: I can spend time on it, maybe on weekend, for now I am overloaded with some other tasks.
[TestG1BarrierGeneration_jtr_no_overflow.log](https://github.com/user-attachments/files/18706090/TestG1BarrierGeneration_jtr_no_overflow.log) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23235#issuecomment-2642733177 From rcastanedalo at openjdk.org Fri Feb 7 14:48:51 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 7 Feb 2025 14:48:51 GMT Subject: RFR: 8346280: C2: implement late barrier elision for G1 [v6] In-Reply-To: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> References: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> Message-ID: <2jrzusvVl-XI8K734YlChq4ObRX75yovTq7mWTf8ZlA=.0e75781a-5d52-4919-ad28-c5e91ec3a47f@github.com> > G1 barriers can be safely elided from writes to newly allocated objects as long as no safepoint is taken between the allocation and the write. This changeset complements early G1 barrier elision (performed by the platform-independent phases of C2, and limited to writes immediately following allocations) with a more general elision pass done at a late stage. > > The late elision pass exploits that it runs at a stage where the relative order of memory accesses and safepoints cannot change anymore to elide barriers from initialization writes that do not immediately follow the corresponding allocation, e.g. in conditional initialization writes: > > > o = new MyObject(); > if (...) { > o.myField = ...; // barrier elided only after this changeset > // (assuming no safepoint in the if condition) > } > > > or in initialization writes placed after exception-throwing checks: > > > o = new MyObject(); > if (...) { > throw new Exception(""); > } > o.myField = ...; // barrier elided only after this changeset > // (assuming no safepoint in the above if condition) > > > These patterns are commonly found in Java code, e.g.
in the core libraries: > > - [conditional initialization](https://github.com/openjdk/jdk/blob/25fecaaf87400af535c242fe50296f1f89ceeb16/src/java.base/share/classes/java/lang/String.java#L4850), or > > - [initialization after exception-throwing checks (in the superclass constructor)](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/nio/X-Buffer.java.template#L324). > > The optimization also enhances barrier elision for array initialization writes, for example eliding barriers from small array initialization loops (for which safepoints are not inserted): > > > Object[] a = new Object[...]; > for (int i = 0; i < a.length; i++) { > a[i] = ...; // barrier elided only after this changeset > } > > > or eliding barriers from array initialization writes with unknown array index: > > > Object[] a = new Object[...]; > a[index] = ...; // barrier elided only after this changeset > > > The logic used to perform this additional barrier elision is a subset of a pre-existing ZGC-specific optimization. This changeset simply reuses the relevant subset (barrier elision for writes to newly-allocated objects) by extracting the core of the optimization logic from `zBarrierSetC2.cpp` into the GC-shared file `barrierSetC2.cpp`. The functions `block_has_safepoint`, `block_index`, `look_through_node`, `is_{undefined|unknown|concrete}`, `get_base_and_offset`, `is_array... 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Disable test IR checks for cases where barrier elision analysis fails to elide on s390 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23235/files - new: https://git.openjdk.org/jdk/pull/23235/files/3671f474..956e0ac5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23235&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23235&range=04-05 Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23235.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23235/head:pull/23235 PR: https://git.openjdk.org/jdk/pull/23235 From rcastanedalo at openjdk.org Fri Feb 7 14:55:13 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 7 Feb 2025 14:55:13 GMT Subject: RFR: 8346280: C2: implement late barrier elision for G1 [v4] In-Reply-To: References: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> <-aHCYC9iVc4eMZ3pMfiDpqaW-wGM_s3zRMiVBWoadCM=.910336cd-3be2-45b5-9874-63b71abf38f8@github.com> Message-ID: On Fri, 7 Feb 2025 09:21:39 GMT, Roberto Castañeda Lozano wrote: >> I see TestG1BarrierGeneration.java failure :( >> >> [TestG1BarrierGeneration_jtr.log](https://github.com/user-attachments/files/18676532/TestG1BarrierGeneration_jtr.log) > >> I see TestG1BarrierGeneration.java failure :( >> >> [TestG1BarrierGeneration_jtr.log](https://github.com/user-attachments/files/18676532/TestG1BarrierGeneration_jtr.log) > > @offamitkumar thanks for the report! Most likely the test failures are only due to missing optimizations (because of limitations in the barrier elision pattern matching analysis), but if you want me to confirm please send the entire jtreg log, without truncation.
You can disable output truncation running the test like this: > `make run-test TEST="compiler/gcbarriers/TestG1BarrierGeneration.java" JTREG="MAX_OUTPUT=999999999"` > Please double-check that the output log file does not contain any `Output overflow` message. > @robcasloz Sure: > > I can spend time on it, maybe on weekend, for now I am overloaded with some other tasks. > > [TestG1BarrierGeneration_jtr_no_overflow.log](https://github.com/user-attachments/files/18706090/TestG1BarrierGeneration_jtr_no_overflow.log) Thanks Amit, I had a look and the failures are indeed due to missing barrier elisions for atomic operations on newly created objects, which is suboptimal but safe (and in practice unlikely to make a noticeable performance difference). I just disabled IR checks for the two affected tests on s390 by now (commit 956e0ac5). The issue is likely due to limitations in the pattern matching logic of barrier elision, but I do not have the proper means to debug it on s390. If you find a solution before this changeset is fully reviewed, feel free to propose a patch and I will merge it into the changeset. Otherwise, it can always be done as follow-up work. Hope this works for you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23235#issuecomment-2643162531 From tschatzl at openjdk.org Fri Feb 7 16:58:22 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 7 Feb 2025 16:58:22 GMT Subject: RFR: 8349213: G1: Clearing bitmaps during collection set merging not claimed by region [v3] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that makes (optional) bitmap clearing during merging remembered sets claim regions. Otherwise every thread will do the (currently little) work themselves over and over again. > > Testing: tier1-3 > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: - Merge branch 'master' into 8349213-bitmapclear-merging-not-claiming-regions - * ayang review - * move commenty - 8349213: G1: Clearing bitmaps during collection set merging not claimed by region Hi all, please review this change that makes (optional) bitmap clearing during merging remembered sets claim regions. Otherwise every thread will do the (currently little) work themselves over and over again. Testing: tier1-3 Thanks, Thomas ------------- Changes: https://git.openjdk.org/jdk/pull/23419/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23419&range=02 Stats: 12 lines in 1 file changed: 9 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23419.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23419/head:pull/23419 PR: https://git.openjdk.org/jdk/pull/23419 From wkemper at openjdk.org Fri Feb 7 22:21:45 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 7 Feb 2025 22:21:45 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v4] In-Reply-To: References: Message-ID: > There are several changes to the operation of Shenandoah's control threads here. > * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. > * The cancellation handling is driven entirely by the cancellation cause > * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed > * The shutdown sequence is simpler > * The generational control thread uses a lock to coordinate updates to the requested cause and generation > * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance > * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles > * The control thread doesn't loop on its own (unless the pacer is enabled). 
William Kemper has updated the pull request incrementally with two additional commits since the last revision: - Simplify locking protocol - Make shutdown more robust, make better use of request lock ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23475/files - new: https://git.openjdk.org/jdk/pull/23475/files/ae207480..a6513bcb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=02-03 Stats: 133 lines in 5 files changed: 54 ins; 39 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/23475.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23475/head:pull/23475 PR: https://git.openjdk.org/jdk/pull/23475 From wkemper at openjdk.org Fri Feb 7 22:28:25 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 7 Feb 2025 22:28:25 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v5] In-Reply-To: References: Message-ID: > There are several changes to the operation of Shenandoah's control threads here. > * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. > * The cancellation handling is driven entirely by the cancellation cause > * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed > * The shutdown sequence is simpler > * The generational control thread uses a lock to coordinate updates to the requested cause and generation > * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance > * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles > * The control thread doesn't loop on its own (unless the pacer is enabled). 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Fix includes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23475/files - new: https://git.openjdk.org/jdk/pull/23475/files/a6513bcb..d16f6fd0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=03-04 Stats: 3 lines in 1 file changed: 1 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23475.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23475/head:pull/23475 PR: https://git.openjdk.org/jdk/pull/23475 From kdnilsen at openjdk.org Fri Feb 7 23:59:52 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 7 Feb 2025 23:59:52 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check [v2] In-Reply-To: References: Message-ID: > At the end of a degenerated GC, we check whether sufficient progress has been made in replenishing the memory available to the mutator. The test for good progress is implemented as a ratio of free memory against the total heap size. > > For generational Shenandoah, the ratio should be computed against the size of the young generation. Note that the size of the generational collection set is based on young generation size rather than total heap size. > > This issue first identified in GenShen GC logs, where a large number of degenerated cycles were upgrading to full GC because the free-set progress was short of desired by 10-25%. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Respond to reviewer feedback In testing suggested refinements, I discovered a bug in original implementation. ShenandoahFreeSet::capacity() does not represent the size of young generation. It represents the total size of the young regions that had available memory at the time we most recently rebuilt the ShenandoahFreeSet. 
I am rerunning the performance tests following this suggested change. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23306/files - new: https://git.openjdk.org/jdk/pull/23306/files/a850e484..7969515d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23306&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23306&range=00-01 Stats: 13 lines in 5 files changed: 4 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23306.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23306/head:pull/23306 PR: https://git.openjdk.org/jdk/pull/23306 From kdnilsen at openjdk.org Fri Feb 7 23:59:52 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 7 Feb 2025 23:59:52 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check [v2] In-Reply-To: References: Message-ID: On Fri, 31 Jan 2025 01:15:01 GMT, Xiaolong Peng wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Respond to reviewer feedback >> >> In testing suggested refinements, I discovered a bug in original >> implementation. ShenandoahFreeSet::capacity() does not represent the >> size of young generation. It represents the total size of the young >> regions that had available memory at the time we most recently rebuilt >> the ShenandoahFreeSet. >> >> I am rerunning the performance tests following this suggested change. > > src/hotspot/share/gc/shenandoah/shenandoahMetrics.cpp line 52: > >> 50: size_t free_actual = free_set->available(); >> 51: // The sum of free_set->capacity() and ->reserved represents capacity of young in generational, heap in non-generational. 
>> 52: size_t free_expected = ((free_set->capacity() + free_set->reserved()) / 100) * ShenandoahCriticalFreeThreshold; > > We may pass ShenandoahGeneration as parameter to `is_good_progress` to simplify the calculation of free_expected, it should be like: > ` > generation->max_capacity() / 100 * ShenandoahCriticalFreeThreshold > ` > Good part is, free_expected might be more accurate in Full GC/Degen for global cycle, e.g. Full GC collects memory for global, `free_expected` should be calculated using the metrics from global generation. But either way, `free_expected` is not clearly defined in generational mode now, current code also works. Thanks for this suggestion. I've made change. It turns out there was actually a bug in the original implementation, so I am retesting the performance results. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23306#discussion_r1947334711 From kdnilsen at openjdk.org Fri Feb 7 23:59:52 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 7 Feb 2025 23:59:52 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check [v2] In-Reply-To: References: Message-ID: On Tue, 4 Feb 2025 15:50:59 GMT, Paul Hohensee wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Respond to reviewer feedback >> >> In testing suggested refinements, I discovered a bug in original >> implementation. ShenandoahFreeSet::capacity() does not represent the >> size of young generation. It represents the total size of the young >> regions that had available memory at the time we most recently rebuilt >> the ShenandoahFreeSet. >> >> I am rerunning the performance tests following this suggested change. > > src/hotspot/share/gc/shenandoah/shenandoahMetrics.cpp line 52: > >> 50: size_t free_actual = free_set->available(); >> 51: // The sum of free_set->capacity() and ->reserved represents capacity of young in generational, heap in non-generational. 
>> 52: size_t free_expected = ((free_set->capacity() + free_set->reserved()) / 100) * ShenandoahCriticalFreeThreshold; > > As an outsider, the units involved and what exactly is being calculated is pretty opaque. Why would we divide by 100 to compute free_expected and not do the same for free_actual? Do we care about integer division truncation? The default value of ShenandoahCriticalFreeThreshold is 1, so multiplying by it is a nop by default, which seems strange. ShenandoahCriticalFreeThreshold represents a percentage of the "total size". To calculate N% of the young generation size, we divide the generation size by 100 and then multiply by ShenandoahCriticalFreeThreshold. This code is a bit different in the most recent revision. Do you think it needs a comment? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23306#discussion_r1947335561 From xpeng at openjdk.org Sat Feb 8 02:11:21 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Sat, 8 Feb 2025 02:11:21 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check [v2] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 23:54:46 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahMetrics.cpp line 52: >> >>> 50: size_t free_actual = free_set->available(); >>> 51: // The sum of free_set->capacity() and ->reserved represents capacity of young in generational, heap in non-generational. >>> 52: size_t free_expected = ((free_set->capacity() + free_set->reserved()) / 100) * ShenandoahCriticalFreeThreshold; >> >> We may pass ShenandoahGeneration as parameter to `is_good_progress` to simplify the calculation of free_expected, it should be like: >> ` >> generation->max_capacity() / 100 * ShenandoahCriticalFreeThreshold >> ` >> Good part is, free_expected might be more accurate in Full GC/Degen for global cycle, e.g. Full GC collects memory for global, `free_expected` should be calculated using the metrics from global generation. 
But either way, `free_expected` is not clearly defined in generational mode now, current code also works. > Thanks for this suggestion. I've made change. It turns out there was actually a bug in the original implementation, so I am retesting the performance results. Thanks. Honestly, I didn't understand why `(free_set->capacity() + free_set->reserved())` represents the capacity of young in generational mode; is that the bug you found? `free_set->capacity()` excludes the regions that don't have enough capacity (it is calculated when the free set is rebuilt). Thinking about it a bit more, it makes more sense to calculate free_expected in `snap_before`: max_capacity of generations may change after collection, so free_expected should be calculated before the cycle. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23306#discussion_r1947405475 From tschatzl at openjdk.org Sat Feb 8 10:35:06 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Sat, 8 Feb 2025 10:35:06 GMT Subject: RFR: 8349213: G1: Clearing bitmaps during collection set merging not claimed by region [v4] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that makes (optional) bitmap clearing during merging remembered sets claim regions. Otherwise every thread will do the (currently little) work themselves over and over again. > > Testing: tier1-3 > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - * fix botched merge - Merge branch 'master' into 8349213-bitmapclear-merging-not-claiming-regions - Merge branch 'master' into 8349213-bitmapclear-merging-not-claiming-regions - * ayang review - * move commenty - 8349213: G1: Clearing bitmaps during collection set merging not claimed by region Hi all, please review this change that makes (optional) bitmap clearing during merging remembered sets claim regions.
Otherwise every thread will do the (currently little) work themselves over and over again. Testing: tier1-3 Thanks, Thomas ------------- Changes: https://git.openjdk.org/jdk/pull/23419/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23419&range=03 Stats: 10 lines in 1 file changed: 8 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23419.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23419/head:pull/23419 PR: https://git.openjdk.org/jdk/pull/23419 From jarek.odzga at gmail.com Sun Feb 9 19:54:20 2025 From: jarek.odzga at gmail.com (Jaroslaw Odzga) Date: Sun, 9 Feb 2025 11:54:20 -0800 Subject: Configurable G1 heap expansion aggressiveness Message-ID:

Context and Motivation

In multi-tenant environments, e.g. Kubernetes clusters in cloud environments, there is a strong incentive to use as little memory as possible. Lower memory usage means more processes can be packed on a single VM, which directly translates to lower cloud cost. Configuring G1 heap size in this setup is currently challenging. On the one hand, we would like to set the max heap size to a high value so that the application doesn't fail with heap OOME when faced with unexpectedly high load or organic growth. On the other hand, we need to set the max heap size to as small a value as possible because G1 is very eager to expand the heap even when tuned to collect garbage aggressively. Ideally, we would like to:
- Set the initial heap size to a small value.
- Set the max heap size to a value larger than expected usage so that the application can handle unexpected load and organic growth.
- Configure G1 GC to not expand the heap aggressively.
This is currently not possible. We propose two new JVM G1 flags that would give us more control over G1 heap expansion aggressiveness and realize significant cost savings in multi-tenant environments. At the same time, we don't want to change existing G1 behavior: with default values of the new flags, current G1 behavior would be maintained.
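As a concrete illustration of the setup described above, such a deployment might launch the JVM roughly as follows. The values and `app.jar` are placeholders for illustration, not recommendations, and all flags shown are existing HotSpot options:

```shell
# Small initial heap, generous ceiling for unexpected load or organic growth.
# Illustrative values only; tune per workload.
java -XX:+UseG1GC \
     -Xms256m \
     -Xmx4g \
     -XX:MinHeapFreeRatio=20 \
     -XX:MaxHeapFreeRatio=60 \
     -jar app.jar
```

The third item in the list above, making G1 less aggressive about heap expansion, is the part that existing flags cannot express and that this proposal addresses.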
Analysis

Currently, even with a very aggressive G1 configuration such as: -XX:-G1UseAdaptiveIHOP -XX:InitiatingHeapOccupancyPercent=20 -XX:GCTimeRatio=4 -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=60 the heap is fairly eagerly expanded. We found two culprits responsible for this in the G1HeapSizingPolicy::young_collection_expansion_amount() function. First, the scale_with_heap() function makes pause_time_threshold small in cases where the current heap size is smaller than 1/2 of the max heap size. While it is likely a desired behavior in many situations, it also causes memory usage spikes in situations where the max heap size is much larger than the current heap size. Second, the MinOverThresholdForGrowth constant equal to 4 is an arbitrary value which hardcodes the heap expansion aggressiveness. We observed that short_term_pause_time_ratio can exceed pause_time_threshold and trigger heap expansion too eagerly in many situations, especially when the allocation rate is spiky.

Proposal

We would like to introduce two new experimental flags:
- G1ScaleWithHeapPauseTimeThreshold: a binary flag that would allow disabling scale_with_heap()
- G1MinPausesOverThresholdForGrowth: a value between 1 and 10, a configurable replacement for the MinOverThresholdForGrowth constant.
We don't want to change the default behavior of G1. Default values for these flags (G1ScaleWithHeapPauseTimeThreshold=true, G1MinPausesOverThresholdForGrowth=4) would maintain the existing behavior.

Alternatives

There is currently no good alternative. Potentially we could configure G1 aggressively to trigger GC very frequently, e.g.: -XX:-G1UseAdaptiveIHOP -XX:InitiatingHeapOccupancyPercent=20 -XX:GCTimeRatio=4 -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=60 Even with this configuration we see occasional large memory spikes where the heap is quickly expanded.
Even though the expanded heap contracts eventually, this poses a significant problem because in practice we don't know if such a spike could have been avoided, so it is not obvious how much memory the application really needs. Of course, such a configuration would also consume more CPU.

Experimental results

We tested this change on patched jdk17. With the new flags we can use a far less aggressive -XX:GCTimeRatio=9 together with -XX:-G1ScaleWithHeapPauseTimeThreshold and -XX:G1MinPausesOverThresholdForGrowth=10 (this effectively disables heap expansion based on the short-term pause ratio and only depends on the long-term pause ratio). Compared to the more aggressive G1 configuration mentioned above, we see lower CPU usage and 30%-60% lower max memory usage.

Implementation

https://github.com/openjdk/jdk/pull/23534 From phh at openjdk.org Mon Feb 10 18:44:13 2025 From: phh at openjdk.org (Paul Hohensee) Date: Mon, 10 Feb 2025 18:44:13 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check [v2] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 23:56:56 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahMetrics.cpp line 52: >> >>> 50: size_t free_actual = free_set->available(); >>> 51: // The sum of free_set->capacity() and ->reserved represents capacity of young in generational, heap in non-generational. >>> 52: size_t free_expected = ((free_set->capacity() + free_set->reserved()) / 100) * ShenandoahCriticalFreeThreshold; >> >> As an outsider, the units involved and what exactly is being calculated is pretty opaque. Why would we divide by 100 to compute free_expected and not do the same for free_actual? Do we care about integer division truncation? The default value of ShenandoahCriticalFreeThreshold is 1, so multiplying by it is a nop by default, which seems strange. > > ShenandoahCriticalFreeThreshold represents a percentage of the "total size".
To calculate N% of the young generation size, we divide the generation size by 100 and then multiply by ShenandoahCriticalFreeThreshold. This code is a bit different in the most recent revision. Do you think it needs a comment? Yes :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23306#discussion_r1949689186 From kdnilsen at openjdk.org Mon Feb 10 19:55:12 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Feb 2025 19:55:12 GMT Subject: RFR: 8348594: Shenandoah: Do not penalize for degeneration when not the fault of triggering heuristic [v2] In-Reply-To: References: Message-ID: <68DeNcSBaX3EJo0OuQI7800ywqaQjhcCMpIjFqwdoao=.0da72a64-afa1-43bc-83bb-d4caf0d62514@github.com> On Tue, 4 Feb 2025 16:08:02 GMT, Paul Hohensee wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Respond to reviewer feedback > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.hpp line 87: > >> 85: size_t _declined_trigger_count; // This counts how many times since previous GC finished that this >> 86: // heuristic has answered false to should_start_gc(). >> 87: size_t _previous_trigger_declinations; // This represents the value of _declined_trigger_count as captured at the > > Maybe the name should be _most_recent_declined_trigger_count, which relates it directly to _declined_trigger_count. Thanks for the suggestion. I'm making this change.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23305#discussion_r1949788716 From kdnilsen at openjdk.org Mon Feb 10 20:02:14 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Feb 2025 20:02:14 GMT Subject: RFR: 8348594: Shenandoah: Do not penalize for degeneration when not the fault of triggering heuristic [v2] In-Reply-To: References: Message-ID: On Tue, 4 Feb 2025 16:04:34 GMT, Paul Hohensee wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Respond to reviewer feedback > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 261: > >> 259: >> 260: void ShenandoahHeuristics::record_success_concurrent() { >> 261: _start_gc_is_pending = false; > > The name _start_gc_is_pending implies that it should be set false as soon as a gc cycle starts, not when it finishes. Maybe _gc_pending? Or maybe setting it false at the end of a gc cycle is a bug? :) You make a good point. I'll change the control flow to cancel the trigger as soon as we start up the GC. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23305#discussion_r1949798178 From kdnilsen at openjdk.org Mon Feb 10 20:28:54 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Feb 2025 20:28:54 GMT Subject: RFR: 8348594: Shenandoah: Do not penalize for degeneration when not the fault of triggering heuristic [v2] In-Reply-To: References: Message-ID: On Tue, 4 Feb 2025 16:14:49 GMT, Paul Hohensee wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Respond to reviewer feedback > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 318: > >> 316: >> 317: if (ShenandoahHeuristics::should_start_gc()) { >> 318: _start_gc_is_pending = true; > > I assume there's no race here, i.e., only one thread reads/writes _start_gc_is_pending. 
If there's a race, make sure it's benign. In either case, _start_gc_is_pending is made "sticky" by this code. There is no race. A single control thread queries should_start_gc() and that is the same thread that initiates the GC. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23305#discussion_r1949828557 From kdnilsen at openjdk.org Mon Feb 10 20:28:54 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Feb 2025 20:28:54 GMT Subject: RFR: 8348594: Shenandoah: Do not penalize for degeneration when not the fault of triggering heuristic [v3] In-Reply-To: References: Message-ID: <2v0axonBAvZDKo779TX8POWEXGeMCA5xaKV3KQBQo14=.fbd1e6bc-0e12-4a0c-a9f7-ba1d3c5f728d@github.com> > Shenandoah heuristics use a penalty mechanism to cause earlier GC triggers when recent concurrent GC cycles degenerate. Degeneration is a stop-the-world remediation that allows GC to catch up when mutator allocations fail during concurrent GC. The fact that we needed to degenerate indicates that we were overly optimistic in delaying the trigger that starts concurrent GC. > > We have observed that it is common for degenerated GC cycles to cascade upon each other. The condition that caused an initial degenerated cycle is often not fully resolved by the end of that degenerated cycle. For example, the application may be experiencing a phase change and the GC heuristics are not yet attuned to the new behavior. Furthermore, a degenerated GC may exacerbate the problem condition. During the stop-the-world pause imposed by the first degenerated GC, work continues to accumulate in the form of new client requests that are buffered in network sockets until the end of that degenerated GC. > > As originally implemented, each degeneration would "pile on" additional penalties. These penalties cause the GC frequency to continue to increase. And the expanding CPU load of GC makes it increasingly difficult for mutator threads to catch up.
The large penalties accumulated while we are trying to resolve the problem linger long after the problem condition has been resolved. > > This change does not add further to the degeneration penalties if a new degenerated cycle occurs through no fault of the triggering mechanism. We only add the degeneration penalty if the reason we are now degenerating can be attributed to a consciously late trigger by the heuristic. Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - Respond to reviewer feedback - Use generation size to determine expected free ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23305/files - new: https://git.openjdk.org/jdk/pull/23305/files/ee3cdacc..8a9e4c5e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23305&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23305&range=01-02 Stats: 27 lines in 8 files changed: 13 ins; 3 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/23305.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23305/head:pull/23305 PR: https://git.openjdk.org/jdk/pull/23305 From kdnilsen at openjdk.org Mon Feb 10 20:41:10 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Feb 2025 20:41:10 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check [v2] In-Reply-To: References: Message-ID: On Sat, 8 Feb 2025 02:06:13 GMT, Xiaolong Peng wrote: >> Thanks for this suggestion. I've made the change. It turns out there was actually a bug in the original implementation, so I am retesting the performance results. > > Thanks; honestly, I didn't understand why `(free_set->capacity() + free_set->reserved())` represents the capacity of young in generational mode. Is it the bug you found?
`free_set->capacity()` is the capacity of all mutator regions, which also excludes the regions that don't have capacity for new object allocation (it is calculated when the free set is rebuilt). > > I thought about it a bit more; it makes more sense to calculate free_expected in `snap_before`. The max_capacity of generations may change after collection, so free_expected should be calculated before the cycle. Interesting thoughts. So young-generation size will change under these circumstances: 1. There's a lot of young-gen memory to be promoted, or we choose to promote some young-gen regions in place (by relabeling the regions as OLD without evacuating their data). In both of these cases, we may shrink young in order to expand old. 2. The GC cycle is mixed, so it has the side effect of reclaiming some old regions. These reclaimed old regions will typically be granted back to young, until such time as we need to expand old in order to hold results of promotion. While it makes sense for expected to be computed based on "original size" of young generation, the question of how much free remaining in young represents "good progress" should probably be based on the current size of young. Ultimately, we are trying to figure out if there's enough memory in young to make it worthwhile to attempt another concurrent GC cycle. I realize this thinking is a bit "fuzzy". The heuristic was originally designed for non-generational use. I'm inclined to keep it as currently implemented, but should probably add a comment to explain why. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23306#discussion_r1949847579 From wkemper at openjdk.org Mon Feb 10 21:26:59 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 10 Feb 2025 21:26:59 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v6] In-Reply-To: References: Message-ID: > There are several changes to the operation of Shenandoah's control threads here.
> * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. > * The cancellation handling is driven entirely by the cancellation cause > * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed > * The shutdown sequence is simpler > * The generational control thread uses a lock to coordinate updates to the requested cause and generation > * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance > * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles > * The control thread doesn't loop on its own (unless the pacer is enabled). William Kemper has updated the pull request incrementally with two additional commits since the last revision: - Add event for control thread state changes - Fix shutdown livelock error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23475/files - new: https://git.openjdk.org/jdk/pull/23475/files/d16f6fd0..f11584d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=04-05 Stats: 13 lines in 1 file changed: 6 ins; 3 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23475.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23475/head:pull/23475 PR: https://git.openjdk.org/jdk/pull/23475 From wkemper at openjdk.org Mon Feb 10 21:54:51 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 10 Feb 2025 21:54:51 GMT Subject: RFR: 8348268: Test gc/shenandoah/TestResizeTLAB.java#compact: fatal error: Before Updating References: Thread C2 CompilerThread1: expected gc-state 9, actual 21 [v2] In-Reply-To: References: Message-ID: <73lkaeIWH7aBWahNyU_czTYSnSmCMOURWYDv55-zc4Y=.39398a24-6904-465c-8e47-7bfe32efc9db@github.com> > Non-java threads were not having their gc-state configured 
when they attach. If they were created before the verifier's safepoint, but after the iteration over non-java threads, they would not have the correct state. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Hold the thread lock when concurrently changing gc state ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23428/files - new: https://git.openjdk.org/jdk/pull/23428/files/f402628e..1a4e3bb1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23428&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23428&range=00-01 Stats: 13 lines in 1 file changed: 8 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23428.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23428/head:pull/23428 PR: https://git.openjdk.org/jdk/pull/23428 From xpeng at openjdk.org Mon Feb 10 22:27:12 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 10 Feb 2025 22:27:12 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check [v2] In-Reply-To: References: Message-ID: <7B6iyKusHAIUGeaVJVEiCWQTIe64ZPJKImH-YYUB3K0=.8d7612e4-9e08-4b90-9c60-0f68d3e7c4ad@github.com> On Mon, 10 Feb 2025 20:38:35 GMT, Kelvin Nilsen wrote: >> Thanks, honest I didn't understand that why `(free_set->capacity() + free_set->reserved()` represents capacity of young in generational, is it the bug you found? `free_set->capacity()` is the capacity of all mutator regions which also excludes the regions doesn't have capacity for new object alloc(it is calculated when rebuild free set) >> >> I thought a bit more, it makes more sense to calculate free_expected in `snap_before`, max_capacity of generations may change after collection, the free_expected should be calculated before the cycle. > > Interesting thoughts. So young-generation size will change under these circumstances: > > 1. 
There's a lot of young-gen memory to be promoted, or we choose to promote some young-gen regions in place (by relabeling the regions as OLD without evacuating their data). In both of these cases, we may shrink young in order to expand old. > 2. The GC cycle is mixed, so it has the side effect of reclaiming some old regions. These reclaimed old regions will typically be granted back to young, until such time as we need to expand old in order to hold results of promotion. > > While it makes sense for expected to be computed based on "original size" of young generation, the question of how much free remaining in young represents "good progress" should probably be based on the current size of young. Ultimately, we are trying to figure out if there's enough memory in young to make it worthwhile to attempt another concurrent GC cycle. > > I realize this thinking is a bit "fuzzy". The heuristic was originally designed for non-generational use. > > I'm inclined to keep it as currently implemented, but should probably add a comment to explain why. What do you think? Thanks for the explanation; I agree it is a bit "fuzzy". I'm not sure whether we should consider the following case: a degen cycle doesn't reclaim any memory, but promotes some young regions, causing young capacity to shrink; in this case we may treat it as "good progress" when actually it is not. "Good progress" could instead be `free_actual_after > free_actual_before && free_actual_after > free_expected`; what do you think? I am not sure of all the cases that trigger a degen cycle; this might be a case that never happens.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23306#discussion_r1949985042 From kdnilsen at openjdk.org Mon Feb 10 23:19:27 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Feb 2025 23:19:27 GMT Subject: RFR: 8348594: Shenandoah: Do not penalize for degeneration when not the fault of triggering heuristic [v4] In-Reply-To: References: Message-ID: > Shenandoah heuristics use a penalty mechanism to cause earlier GC triggers when recent concurrent GC cycles degenerate. Degeneration is a stop-the-world remediation that allows GC to catch up when mutator allocations fail during concurrent GC. The fact that we needed to degenerate indicates that we were overly optimistic in delaying the trigger that starts concurrent GC. > > We have observed that it is common for degenerated GC cycles to cascade upon each other. The condition that caused an initial degenerated cycle is often not fully resolved by the end of that degenerated cycle. For example, the application may be experiencing a phase change and the GC heuristics are not yet attuned to the new behavior. Furthermore, a degenerated GC may exacerbate the problem condition. During the stop-the-world pause imposed by the first degenerated GC, work continues to accumulate in the form of new client requests that are buffered in network sockets until the end of that degenerated GC. > > As originally implemented, each degeneration would "pile on" additional penalties. These penalties cause the GC frequency to continue to increase. And the expanding CPU load of GC makes it increasingly difficult for mutator threads to catch up. The large penalties accumulated while we are trying to resolve the problem linger long after the problem condition has been resolved. > > This change does not add further to the degeneration penalties if a new degenerated cycle occurs through no fault of the triggering mechanism.
We only add the degeneration penalty if the reason we are now degenerating can be attributed to a consciously late trigger by the heuristic. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Revert "Use generation size to determine expected free" This reverts commit 94a32ebfe5fefcc0e899e09e6fbfc0585c62b4e0. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23305/files - new: https://git.openjdk.org/jdk/pull/23305/files/8a9e4c5e..ee7fe689 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23305&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23305&range=02-03 Stats: 5 lines in 4 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23305.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23305/head:pull/23305 PR: https://git.openjdk.org/jdk/pull/23305 From kdnilsen at openjdk.org Mon Feb 10 23:32:09 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Feb 2025 23:32:09 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check [v2] In-Reply-To: <7B6iyKusHAIUGeaVJVEiCWQTIe64ZPJKImH-YYUB3K0=.8d7612e4-9e08-4b90-9c60-0f68d3e7c4ad@github.com> References: <7B6iyKusHAIUGeaVJVEiCWQTIe64ZPJKImH-YYUB3K0=.8d7612e4-9e08-4b90-9c60-0f68d3e7c4ad@github.com> Message-ID: <16WXn9LEVXGdRSeJ98OxomG66UfnruLxo9nnfY52ZJo=.f4acdbb1-c99b-4be8-807b-bdbf9504af81@github.com> On Mon, 10 Feb 2025 22:24:35 GMT, Xiaolong Peng wrote: >> Interesting thoughts. So young-generation size will change under these circumstances: >> >> 1. There's a lot of young-gen memory to be promoted, or we choose to promote some young-gen regions in place (by relabeling the regions as OLD without evacuating their data). In both of these cases, we may shrink young in order to expand old. >> 2. The GC cycle is mixed, so it has the side effect of reclaiming some old regions. 
These reclaimed old regions will typically be granted back to young, until such time as we need to expand old in order to hold results of promotion. >> >> While it makes sense for expected to be computed based on "original size" of young generation, the question of how much free remaining in young represents "good progress" should probably be based on the current size of young. Ultimately, we are trying to figure out if there's enough memory in young to make it worthwhile to attempt another concurrent GC cycle. >> >> I realize this thinking is a bit "fuzzy". The heuristic was originally designed for non-generational use. >> >> I'm inclined to keep it as currently implemented, but should probably add a comment to explain why. What do you think? > > Thanks for the explanation; I agree it is a bit "fuzzy". > I'm not sure whether we should consider the following case: > > A degen cycle doesn't reclaim any memory, but promotes some young regions, causing young capacity to shrink; in this case we may treat it as "good progress" when actually it is not. > > "Good progress" could instead be `free_actual_after > free_actual_before && free_actual_after > free_expected`; what do you think? I am not sure of all the cases that trigger a degen cycle; this might be a case that never happens. If we manage to pass the test "free_actual_after > free_expected" following the degen, even if young has shrunk, I think it is reasonable to pursue concurrent GC. Passing this exact test at the end of the next GC (assuming no further adjustments to generation sizes) would qualify us to continue with concurrent GC on the next cycle. In general, it is very rare that "full gc" is the right thing to do. We're in the process of deprecating it entirely. I will add a comment to clarify the thinking here.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23306#discussion_r1950040394 From kdnilsen at openjdk.org Mon Feb 10 23:43:11 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Feb 2025 23:43:11 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check [v2] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 23:59:52 GMT, Kelvin Nilsen wrote: >> At the end of a degenerated GC, we check whether sufficient progress has been made in replenishing the memory available to the mutator. The test for good progress is implemented as a ratio of free memory against the total heap size. >> >> For generational Shenandoah, the ratio should be computed against the size of the young generation. Note that the size of the generational collection set is based on young generation size rather than total heap size. >> >> This issue first identified in GenShen GC logs, where a large number of degenerated cycles were upgrading to full GC because the free-set progress was short of desired by 10-25%. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Respond to reviewer feedback > > In testing suggested refinements, I discovered a bug in original > implementation. ShenandoahFreeSet::capacity() does not represent the > size of young generation. It represents the total size of the young > regions that had available memory at the time we most recently rebuilt > the ShenandoahFreeSet. > > I am rerunning the performance tests following this suggested change. These are updated performance results after making the change that uses generation size to determine expected. 
This change computes a larger expected size, increasing the likelihood that a particular degenerated cycle will be considered "bad progress": ![Screenshot 2025-02-10 at 3 38 18 PM](https://github.com/user-attachments/assets/d0826502-aec1-4e30-88e7-03a4d25e5661) This represents overall improvement compared to the previously reported number. It would appear that the difference in performance might be the result of "random noise". ------------- PR Comment: https://git.openjdk.org/jdk/pull/23306#issuecomment-2649499090 From kdnilsen at openjdk.org Mon Feb 10 23:48:11 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 10 Feb 2025 23:48:11 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check [v2] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 23:59:52 GMT, Kelvin Nilsen wrote: >> At the end of a degenerated GC, we check whether sufficient progress has been made in replenishing the memory available to the mutator. The test for good progress is implemented as a ratio of free memory against the total heap size. >> >> For generational Shenandoah, the ratio should be computed against the size of the young generation. Note that the size of the generational collection set is based on young generation size rather than total heap size. >> >> This issue was first identified in GenShen GC logs, where a large number of degenerated cycles were upgrading to full GC because the free-set progress was short of desired by 10-25%. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Respond to reviewer feedback > > In testing suggested refinements, I discovered a bug in the original > implementation. ShenandoahFreeSet::capacity() does not represent the > size of the young generation. It represents the total size of the young > regions that had available memory at the time we most recently rebuilt > the ShenandoahFreeSet. > > I am rerunning the performance tests following this suggested change.
These are the results of combining both proposed PRs into a single execution test: ![Screenshot 2025-02-10 at 3 41 31 PM](https://github.com/user-attachments/assets/c3db758c-41ea-4c0d-a91e-0a44aaefc390) This result is not as good as what was reported above. In my judgment, it still represents improvement over tip. The difference between the two runs may also be signal noise, as there is no clear correlation between the number of Full GCs and percentile latencies. The two full GCs reported in the "better both (redo)" run both result from alloc failure during evacuation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23306#issuecomment-2649506896 From xpeng at openjdk.org Mon Feb 10 23:51:13 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 10 Feb 2025 23:51:13 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check [v2] In-Reply-To: References: Message-ID: On Fri, 7 Feb 2025 23:59:52 GMT, Kelvin Nilsen wrote: >> At the end of a degenerated GC, we check whether sufficient progress has been made in replenishing the memory available to the mutator. The test for good progress is implemented as a ratio of free memory against the total heap size. >> >> For generational Shenandoah, the ratio should be computed against the size of the young generation. Note that the size of the generational collection set is based on young generation size rather than total heap size. >> >> This issue was first identified in GenShen GC logs, where a large number of degenerated cycles were upgrading to full GC because the free-set progress was short of desired by 10-25%. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Respond to reviewer feedback > > In testing suggested refinements, I discovered a bug in the original > implementation. ShenandoahFreeSet::capacity() does not represent the > size of the young generation.
It represents the total size of the young > regions that had available memory at the time we most recently rebuilt > the ShenandoahFreeSet. > > I am rerunning the performance tests following this suggested change. Thanks for the comprehensive tests and explanations; my approval doesn't count though :) ------------- Marked as reviewed by xpeng (Author). PR Review: https://git.openjdk.org/jdk/pull/23306#pullrequestreview-2607419434 From wkemper at openjdk.org Tue Feb 11 00:54:36 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Feb 2025 00:54:36 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v7] In-Reply-To: References: Message-ID: <7vsmPKQNSOx9PxGp2C1yjC5IeEtB2ZWPRybQQ-s4YNE=.1b8ffa7e-cc6d-4885-a9c4-16a503d9d8d9@github.com> > There are several changes to the operation of Shenandoah's control threads here. > * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. > * The cancellation handling is driven entirely by the cancellation cause > * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed > * The shutdown sequence is simpler > * The generational control thread uses a lock to coordinate updates to the requested cause and generation > * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance > * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles > * The control thread doesn't loop on its own (unless the pacer is enabled).
William Kemper has updated the pull request incrementally with two additional commits since the last revision: - Do not accept requests if control thread is terminating - Notify waiters when control thread terminates ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23475/files - new: https://git.openjdk.org/jdk/pull/23475/files/f11584d5..861ed699 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=05-06 Stats: 26 lines in 3 files changed: 24 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23475.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23475/head:pull/23475 PR: https://git.openjdk.org/jdk/pull/23475 From kdnilsen at openjdk.org Tue Feb 11 03:39:41 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 11 Feb 2025 03:39:41 GMT Subject: RFR: 8349766: GenShen: Bad progress after degen does not always need full gc Message-ID: In generational mode, only upgrade to full GC from degenerated GC if we've done two degenerated cycles in a row and both indicated bad progress. Otherwise, start another concurrent GC, which will most likely degenerate also. But this degenerated cycle will reclaim floating garbage within the young generation much more quickly than a full GC would have done. 
------------- Commit messages: - Be less eager to upgrade degen to full gc Changes: https://git.openjdk.org/jdk/pull/23552/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23552&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349766 Stats: 20 lines in 2 files changed: 17 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23552.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23552/head:pull/23552 PR: https://git.openjdk.org/jdk/pull/23552 From kdnilsen at openjdk.org Tue Feb 11 03:39:41 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 11 Feb 2025 03:39:41 GMT Subject: RFR: 8349766: GenShen: Bad progress after degen does not always need full gc In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 03:31:51 GMT, Kelvin Nilsen wrote: > In generational mode, only upgrade to full GC from degenerated GC if we've done two degenerated cycles in a row and both indicated bad progress. Otherwise, start another concurrent GC, which will most likely degenerate also. But this degenerated cycle will reclaim floating garbage within the young generation much more quickly than a full GC would have done. 
Some detailed results running the workload mentioned in the JBS ticket on tip: ![Screenshot 2025-02-10 at 7 10 18 PM](https://github.com/user-attachments/assets/c06606a6-ec21-4e40-b117-915ddfc0d1f6) These are results running the same workload with the changes of this PR: ![Screenshot 2025-02-10 at 7 35 47 PM](https://github.com/user-attachments/assets/432c227e-9bf4-4f21-8099-1b39b5af364a) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23552#issuecomment-2649732684 PR Comment: https://git.openjdk.org/jdk/pull/23552#issuecomment-2649733471 From kdnilsen at openjdk.org Tue Feb 11 03:53:10 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 11 Feb 2025 03:53:10 GMT Subject: RFR: 8349766: GenShen: Bad progress after degen does not always need full gc In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 03:31:51 GMT, Kelvin Nilsen wrote: > In generational mode, only upgrade to full GC from degenerated GC if we've done two degenerated cycles in a row and both indicated bad progress. Otherwise, start another concurrent GC, which will most likely degenerate also. But this degenerated cycle will reclaim floating garbage within the young generation much more quickly than a full GC would have done.
While there is reason to be concerned about trial two results on the PR code, I expect that unlucky scenario, whatever it was, will be much less likely in the context of in-flight PRs to advance triggering of GC when allocation rates are accelerating and to surge GC workers whenever there is increased risk of degenerated cycles. Perhaps, we should wait until those other PRs are integrated and then retest. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23552#issuecomment-2649744182 From kdnilsen at openjdk.org Tue Feb 11 04:08:48 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 11 Feb 2025 04:08:48 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check [v3] In-Reply-To: References: Message-ID: <0bbrstoX8nDMn2Ku_WwSYn_NYSSLi3yXkWdg28imCHo=.ab1661a4-1ea5-4c57-9fde-0ee63ebac027@github.com> > At the end of a degenerated GC, we check whether sufficient progress has been made in replenishing the memory available to the mutator. The test for good progress is implemented as a ratio of free memory against the total heap size. > > For generational Shenandoah, the ratio should be computed against the size of the young generation. Note that the size of the generational collection set is based on young generation size rather than total heap size. > > This issue first identified in GenShen GC logs, where a large number of degenerated cycles were upgrading to full GC because the free-set progress was short of desired by 10-25%. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Add comments suggested by reviewers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23306/files - new: https://git.openjdk.org/jdk/pull/23306/files/7969515d..8f644cdb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23306&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23306&range=01-02 Stats: 15 lines in 1 file changed: 14 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23306.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23306/head:pull/23306 PR: https://git.openjdk.org/jdk/pull/23306 From xpeng at openjdk.org Tue Feb 11 05:54:11 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 11 Feb 2025 05:54:11 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check [v3] In-Reply-To: <0bbrstoX8nDMn2Ku_WwSYn_NYSSLi3yXkWdg28imCHo=.ab1661a4-1ea5-4c57-9fde-0ee63ebac027@github.com> References: <0bbrstoX8nDMn2Ku_WwSYn_NYSSLi3yXkWdg28imCHo=.ab1661a4-1ea5-4c57-9fde-0ee63ebac027@github.com> Message-ID: On Tue, 11 Feb 2025 04:08:48 GMT, Kelvin Nilsen wrote: >> At the end of a degenerated GC, we check whether sufficient progress has been made in replenishing the memory available to the mutator. The test for good progress is implemented as a ratio of free memory against the total heap size. >> >> For generational Shenandoah, the ratio should be computed against the size of the young generation. Note that the size of the generational collection set is based on young generation size rather than total heap size. >> >> This issue first identified in GenShen GC logs, where a large number of degenerated cycles were upgrading to full GC because the free-set progress was short of desired by 10-25%. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Add comments suggested by reviewers Marked as reviewed by xpeng (Author). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23306#pullrequestreview-2607738788 From shade at openjdk.org Tue Feb 11 08:50:11 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 11 Feb 2025 08:50:11 GMT Subject: RFR: 8348268: Test gc/shenandoah/TestResizeTLAB.java#compact: fatal error: Before Updating References: Thread C2 CompilerThread1: expected gc-state 9, actual 21 [v2] In-Reply-To: <73lkaeIWH7aBWahNyU_czTYSnSmCMOURWYDv55-zc4Y=.39398a24-6904-465c-8e47-7bfe32efc9db@github.com> References: <73lkaeIWH7aBWahNyU_czTYSnSmCMOURWYDv55-zc4Y=.39398a24-6904-465c-8e47-7bfe32efc9db@github.com> Message-ID: On Mon, 10 Feb 2025 21:54:51 GMT, William Kemper wrote: >> Non-java threads were not having their gc-state configured when they attach. If they were created before the verifier's safepoint, but after the iteration over non-java threads, they would not have the correct state. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Hold the thread lock when concurrently changing gc state Great find. So that means we cannot safely do `ShenandoahHeap::set_gc_state_concurrent`, unless we hold `Threads_lock` and do a handshake afterwards? I think a part of comment that you have near `MutexLocker` can go to `ShenandoahHeap::set_gc_state_concurrent` with the `assert(Threads_lock->is_locked(), ...`. ------------- PR Review: https://git.openjdk.org/jdk/pull/23428#pullrequestreview-2608045755 From iwalulya at openjdk.org Tue Feb 11 09:23:11 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 11 Feb 2025 09:23:11 GMT Subject: RFR: 8349213: G1: Clearing bitmaps during collection set merging not claimed by region [v4] In-Reply-To: References: Message-ID: On Sat, 8 Feb 2025 10:35:06 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that makes (optional) bitmap clearing during merging remembered sets claim regions. 
Otherwise every thread will do the (currently little) work themselves over and over again. >> >> Testing: tier1-3 >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - * fix botched merge > - Merge branch 'master' into 8349213-bitmapclear-merging-not-claiming-regions > - Merge branch 'master' into 8349213-bitmapclear-merging-not-claiming-regions > - * ayang review > - * move commenty > - 8349213: G1: Clearing bitmaps during collection set merging not claimed by region > > Hi all, > > please review this change that makes (optional) bitmap clearing during merging remembered sets claim regions. Otherwise every thread will do the (currently little) work themselves over and over again. > > Testing: tier1-3 > > Thanks, > Thomas LGTM! ------------- Marked as reviewed by iwalulya (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23419#pullrequestreview-2608121652 From dholmes at openjdk.org Tue Feb 11 09:32:13 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 11 Feb 2025 09:32:13 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: <_CnY-j8qQhI5hEydYYH1gfQQP909-QrWTboS79F6UHA=.cf2527c7-5a81-4e4d-8433-ce18f9d63982@github.com> On Wed, 5 Feb 2025 14:41:39 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. 
However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker Sorry still on my to-do list. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2650249785 From tschatzl at openjdk.org Tue Feb 11 09:55:15 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 11 Feb 2025 09:55:15 GMT Subject: RFR: 8349213: G1: Clearing bitmaps during collection set merging not claimed by region [v2] In-Reply-To: References: Message-ID: On Tue, 4 Feb 2025 10:55:35 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * ayang review > > Marked as reviewed by ayang (Reviewer). Thanks @albertnetymk @walulyai for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/23419#issuecomment-2650301620 From tschatzl at openjdk.org Tue Feb 11 09:55:16 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 11 Feb 2025 09:55:16 GMT Subject: Integrated: 8349213: G1: Clearing bitmaps during collection set merging not claimed by region In-Reply-To: References: Message-ID: On Mon, 3 Feb 2025 14:11:20 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that makes (optional) bitmap clearing during merging remembered sets claim regions. Otherwise every thread will do the (currently little) work themselves over and over again. > > Testing: tier1-3 > > Thanks, > Thomas This pull request has now been integrated. 
Changeset: 8e858294 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/8e8582949669d5f3dcb68886ccb6a719393d1a9e Stats: 10 lines in 1 file changed: 8 ins; 2 del; 0 mod 8349213: G1: Clearing bitmaps during collection set merging not claimed by region Reviewed-by: iwalulya, ayang ------------- PR: https://git.openjdk.org/jdk/pull/23419 From phh at openjdk.org Tue Feb 11 14:20:12 2025 From: phh at openjdk.org (Paul Hohensee) Date: Tue, 11 Feb 2025 14:20:12 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check [v3] In-Reply-To: <0bbrstoX8nDMn2Ku_WwSYn_NYSSLi3yXkWdg28imCHo=.ab1661a4-1ea5-4c57-9fde-0ee63ebac027@github.com> References: <0bbrstoX8nDMn2Ku_WwSYn_NYSSLi3yXkWdg28imCHo=.ab1661a4-1ea5-4c57-9fde-0ee63ebac027@github.com> Message-ID: <2XLAHIk0VEr8Xae-jNqjMZjBtPTrHqm8nl7tn_rigS8=.155e8a5a-193a-49b8-a773-b8e60b4dc3f5@github.com> On Tue, 11 Feb 2025 04:08:48 GMT, Kelvin Nilsen wrote: >> At the end of a degenerated GC, we check whether sufficient progress has been made in replenishing the memory available to the mutator. The test for good progress is implemented as a ratio of free memory against the total heap size. >> >> For generational Shenandoah, the ratio should be computed against the size of the young generation. Note that the size of the generational collection set is based on young generation size rather than total heap size. >> >> This issue first identified in GenShen GC logs, where a large number of degenerated cycles were upgrading to full GC because the free-set progress was short of desired by 10-25%. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Add comments suggested by reviewers Looks good. ------------- Marked as reviewed by phh (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23306#pullrequestreview-2608901033 From kdnilsen at openjdk.org Tue Feb 11 14:20:13 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 11 Feb 2025 14:20:13 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check [v3] In-Reply-To: References: Message-ID: On Mon, 10 Feb 2025 18:41:27 GMT, Paul Hohensee wrote: >> ShenandoahCriticalFreeThreshold represents a percentage of the "total size". To calculate N% of the young generation size, we divide the generation size by 100 and then multiply by ShenandoahCriticalFreeThreshold. This code is a bit different in the most recent revision. Do you think it needs a comment? > > Yes :) I've added a comment here. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23306#discussion_r1950933308 From ayang at openjdk.org Tue Feb 11 15:28:25 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 11 Feb 2025 15:28:25 GMT Subject: RFR: 8348171: Refactor GenerationCounters and its subclasses [v5] In-Reply-To: References: Message-ID: <7otkT63ENoyKzZ29CbYpycLLwL89ARajYg36Mstz4tQ=.fd3c7dcf-5a8b-44be-9205-09e3d160d54d@github.com> > Simple refactoring of removing the use of `virtual` method and use concrete subclasses when needed. > > Test: tier1-5 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: - Merge branch 'master' into gen-counter - review - * some more refactoring - review - Merge branch 'master' into gen-counter - merge - gen-counter ------------- Changes: https://git.openjdk.org/jdk/pull/23209/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23209&range=04 Stats: 202 lines in 17 files changed: 6 ins; 160 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/23209.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23209/head:pull/23209 PR: https://git.openjdk.org/jdk/pull/23209 From tschatzl at openjdk.org Tue Feb 11 16:19:54 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 11 Feb 2025 16:19:54 GMT Subject: RFR: 8349836: G1: Improve group prediction log message Message-ID: Hi all, please review this minor change to the group prediction log message printed with gc+ergo+cset=trace: * add group id to be able to refer to something concrete when discussing results * add total time * fix typo in `bytes_to_cop` Testing: gha, local verification Thanks, Thomas ------------- Commit messages: - 8349836 Changes: https://git.openjdk.org/jdk/pull/23562/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23562&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349836 Stats: 12 lines in 1 file changed: 7 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23562.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23562/head:pull/23562 PR: https://git.openjdk.org/jdk/pull/23562 From kdnilsen at openjdk.org Tue Feb 11 18:15:58 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 11 Feb 2025 18:15:58 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check [v4] In-Reply-To: References: Message-ID: > At the end of a degenerated GC, we check whether sufficient progress has been made in replenishing the memory available to the mutator. The test for good progress is implemented as a ratio of free memory against the total heap size. 
> > For generational Shenandoah, the ratio should be computed against the size of the young generation. Note that the size of the generational collection set is based on young generation size rather than total heap size. > > This issue was first identified in GenShen GC logs, where a large number of degenerated cycles were upgrading to full GC because the free-set progress was short of desired by 10-25%. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge tag 'jdk-25+9' into fix-generational-no-progress-check Added tag jdk-25+9 for changeset 30f71622 - Add comments suggested by reviewers - Respond to reviewer feedback In testing suggested refinements, I discovered a bug in the original implementation. ShenandoahFreeSet::capacity() does not represent the size of young generation. It represents the total size of the young regions that had available memory at the time we most recently rebuilt the ShenandoahFreeSet. I am rerunning the performance tests following this suggested change. - Use freeset to determine goodness of progress As previously implemented, we used the heap size to measure goodness of progress. However, heap size is only appropriate for non-generational Shenandoah. Freeset abstraction works for both. - Use size-of young generation to assess progress Previously, we were using size of heap to assess progress of a generational degenerated cycle. But that is not appropriate, because the collection set is chosen based on the size of young generation.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23306/files - new: https://git.openjdk.org/jdk/pull/23306/files/8f644cdb..8c610136 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23306&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23306&range=02-03 Stats: 43531 lines in 2988 files changed: 18658 ins; 14204 del; 10669 mod Patch: https://git.openjdk.org/jdk/pull/23306.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23306/head:pull/23306 PR: https://git.openjdk.org/jdk/pull/23306 From kdnilsen at openjdk.org Tue Feb 11 18:21:09 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 11 Feb 2025 18:21:09 GMT Subject: RFR: 8348594: Shenandoah: Do not penalize for degeneration when not the fault of triggering heuristic [v5] In-Reply-To: References: Message-ID: > Shenandoah heuristics use a penalty mechanism to cause earlier GC triggers when recent concurrent GC cycles degenerate. Degeneration is a stop-the-world remediation that allows GC to catch up when mutator allocations fail during concurrent GC. The fact that we needed to degenerate indicates that we were overly optimistic in delaying the trigger that starts concurrent GC. > > We have observed that it is common for degenerated GC cycles to cascade upon each other. The condition that caused an initial degenerated cycle is often not fully resolved by the end of that degenerated cycle. For example, the application may be experiencing a phase change and the GC heuristics are not yet attuned to the new behavior. Furthermore, a degenerated GC may exacerbate the problem condition. During the stop-the-world pause imposed by the first degenerated GC, work continues to accumulate in the form of new client requests that are buffered in network sockets until the end of that degenerated GC. > > As originally implemented, each degeneration would "pile on" additional penalties. These penalties cause the GC frequency to continue to increase. 
And the expanding CPU load of GC makes it increasingly difficult for mutator threads to catch up. The large penalties accumulated while we are trying to resolve the problem linger long after the problem condition has been resolved. > > This change does not add further to the degeneration penalties if a new degenerated cycle occurs through no fault of the triggering mechanism. We only add the degeneration penalty if the reason we are now degenerating can be attributed to a consciously late trigger by the heuristic. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: - Merge tag 'jdk-25+9' into eliminate-no-fault-degen-penalties Added tag jdk-25+9 for changeset 30f71622 - Revert "Use generation size to determine expected free" This reverts commit 94a32ebfe5fefcc0e899e09e6fbfc0585c62b4e0. - Respond to reviewer feedback - Use generation size to determine expected free - Respond to reviewer feedback - Fix white space - Remove debug instrumentation - Only penalize heuristic if heuristic responsible If we degenerate through no fault of "late triggering", then do not penalize the heuristic. - Eliminate no-fault degen penalties As originally implemented, we apply penalties to the triggering heuristic every time we experience a degenerated cycle. This has the effect of forcing GC triggers to spiral out of control. This commit changes the penalty mechanism. When a degen happens through no fault of the heuristic triggering mechanism, we do not pile on additional penalties. Specifically, we consider that heuristic triggering is not responsible for a degenerated cycle that is associated with a GC that began immediately following the end of the previous GC cycle.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23305/files - new: https://git.openjdk.org/jdk/pull/23305/files/ee7fe689..3aabd4db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23305&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23305&range=03-04 Stats: 43531 lines in 2988 files changed: 18658 ins; 14204 del; 10669 mod Patch: https://git.openjdk.org/jdk/pull/23305.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23305/head:pull/23305 PR: https://git.openjdk.org/jdk/pull/23305 From iwalulya at openjdk.org Tue Feb 11 18:29:23 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 11 Feb 2025 18:29:23 GMT Subject: RFR: 8349836: G1: Improve group prediction log message In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 15:29:03 GMT, Thomas Schatzl wrote: > Hi all, > > please review this minor change to the group prediction log message > printed with gc+ergo+cset=trace: > > * add group id to be able to refer to something concrete when discussing results > * add total time > * fix typo in `bytes_to_cop` > > Testing: gha, local verification > > `Group 10: 5 regions prediction total_time 20.0ms card_rs_length 123456 merge_scan_time 10.2ms code_root_scan_time_ms 5.5ms evac_time_ms 3.7ms other_time 0.3ms bytes_to_copy 1234567` > > instead of > > `Prediction for group with 76 regions, card_rs_length 320, merge_scan_time 0.02ms, code_root_scan_time_ms 0.00ms, evac_time_ms 9.92ms, other_time 45.60ms, bytes_to_cop 61408560` > > Thanks, > Thomas Looks good! ------------- Marked as reviewed by iwalulya (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23562#pullrequestreview-2609634280 From phh at openjdk.org Tue Feb 11 18:51:16 2025 From: phh at openjdk.org (Paul Hohensee) Date: Tue, 11 Feb 2025 18:51:16 GMT Subject: RFR: 8348594: Shenandoah: Do not penalize for degeneration when not the fault of triggering heuristic [v5] In-Reply-To: References: Message-ID: <8Gt2wkVtRhYtPwLWfkuH8fWrboud7gjBRpCfzT2GeLw=.9e580aa0-34b7-4b7e-9ab7-f49cec2d3a6a@github.com> On Tue, 11 Feb 2025 18:21:09 GMT, Kelvin Nilsen wrote: >> Shenandoah heuristics use a penalty mechanism to cause earlier GC triggers when recent concurrent GC cycles degenerate. Degeneration is a stop-the-world remediation that allows GC to catch up when mutator allocations fail during concurrent GC. The fact that we needed to degenerate indicates that we were overly optimistic in delaying the trigger that starts concurrent GC. >> >> We have observed that it is common for degenerated GC cycles to cascade upon each other. The condition that caused an initial degenerated cycle is often not fully resolved by the end of that degenerated cycle. For example, the application may be experiencing a phase change and the GC heuristics are not yet attuned to the new behavior. Furthermore, a degenerated GC may exacerbate the problem condition. During the stop-the-world pause imposed by the first degenerated GC, work continues to accumulate in the form of new client requests that are buffered in network sockets until the end of that degenerated GC. >> >> As originally implemented, each degeneration would "pile on" additional penalties. These penalties cause the GC frequency to continue to increase. And the expanding CPU load of GC makes it increasingly difficult for mutator threads to catchup. The large penalties accumulated while we are trying to resolve the problem linger long after the problem condition has been resolved. 
>> >> This change does not add further to the degeneration penalties if a new degenerated cycle occurs through no fault of the triggering mechanism. We only add the degeneration penalty if the reason we are now degenerating can be attributed to a consciously late trigger by the heuristic. > > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - Merge tag 'jdk-25+9' into eliminate-no-fault-degen-penalties > > Added tag jdk-25+9 for changeset 30f71622 > - Revert "Use generation size to determine expected free" > > This reverts commit 94a32ebfe5fefcc0e899e09e6fbfc0585c62b4e0. > - Respond to reviewer feedback > - Use generation size to determine expected free > - Respond to reviewer feedback > - Fix white space > - Remove debug instrumentation > - Only penalize heuristic if heuristic responsible > > If we degenerate through no fault of "late triggering", then do not > penalize the heuristic. > - Eliminate no-fault degen penalties > > As originally implemented, we apply penalties to the triggering > heuristic every time we experience a degenerated cycle. This has the > effect of forcing GC triggers to spiral out of control. This commit > changes the penalty mechanism. When a degen happens through no fault of > the heuristic triggering mechanism, we do not pile on additional > penalties. Specifically, we consider that heuristic triggering is not > responsible for a degenerated cycle that is associated with a GC that > began immediately following the end of the previous GC cycle. Marked as reviewed by phh (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23305#pullrequestreview-2609684439 From iwalulya at openjdk.org Tue Feb 11 19:14:32 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 11 Feb 2025 19:14:32 GMT Subject: RFR: 8349688: Crash assert(!_g1h->heap_region_containing(p)->is_survivor()) failed: Should have filtered out from-newly allocated survivor references already Message-ID: <8-jj1o3jeNZuavFIbg4VCh_oMXEJxP8vxqnz01cOVgg=.d7ea8a07-d5e6-46cc-9bdf-0f2e5c22632a@github.com> Hi, Please review this fix to the bug in setting a region index to the optional cset. The crash happens because the incorrect `_index_in_opt_cset` refers to a region that has already been evacuated. Testing: tier5-common-apps ------------- Commit messages: - set_index_in_opt_cset correctly Changes: https://git.openjdk.org/jdk/pull/23568/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23568&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349688 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23568.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23568/head:pull/23568 PR: https://git.openjdk.org/jdk/pull/23568 From wkemper at openjdk.org Tue Feb 11 19:31:09 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Feb 2025 19:31:09 GMT Subject: RFR: 8348268: Test gc/shenandoah/TestResizeTLAB.java#compact: fatal error: Before Updating References: Thread C2 CompilerThread1: expected gc-state 9, actual 21 [v2] In-Reply-To: <73lkaeIWH7aBWahNyU_czTYSnSmCMOURWYDv55-zc4Y=.39398a24-6904-465c-8e47-7bfe32efc9db@github.com> References: <73lkaeIWH7aBWahNyU_czTYSnSmCMOURWYDv55-zc4Y=.39398a24-6904-465c-8e47-7bfe32efc9db@github.com> Message-ID: <91rj68CdahMzjrRCIMEH0mR6CxmDQayALIYHXBykJ5c=.a4164dc3-79db-43e8-9a8a-c6216e826f5b@github.com> On Mon, 10 Feb 2025 21:54:51 GMT, William Kemper wrote: >> Non-java threads were not having their gc-state configured when they attach. 
If they were created before the verifier's safepoint, but after the iteration over non-java threads, they would not have the correct state. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Hold the thread lock when concurrently changing gc state That's right. The `on_thread_attach` callback and the thread being added to the list of threads _does_ happen under the `Thread_lock`, by the handshake mechanism (and the java thread iterator) do _not_ take the thread lock. In this particular assertion violation, the thread received a stale `gc_state` when it attached (before the control thread even entered `concurrent_prepare_for_update_refs`), however, the control thread executed the handshake _before_ the recently attached thread was actually added to the java thread list. I will update the comment and add an assert in `set_gc_state_concurrent`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23428#issuecomment-2651866466 From wkemper at openjdk.org Tue Feb 11 19:39:25 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Feb 2025 19:39:25 GMT Subject: RFR: 8348268: Test gc/shenandoah/TestResizeTLAB.java#compact: fatal error: Before Updating References: Thread C2 CompilerThread1: expected gc-state 9, actual 21 [v3] In-Reply-To: References: Message-ID: <-v8uprH0cQK06apB7HGbrHNO31cCmzOXxMiZB8ipWx4=.7ce5e5da-bcc8-4101-9bf8-23fb899d06c2@github.com> > Non-java threads were not having their gc-state configured when they attach. If they were created before the verifier's safepoint, but after the iteration over non-java threads, they would not have the correct state. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Update comments, add an assertion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23428/files - new: https://git.openjdk.org/jdk/pull/23428/files/1a4e3bb1..c57bf8a0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23428&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23428&range=01-02 Stats: 11 lines in 1 file changed: 6 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23428.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23428/head:pull/23428 PR: https://git.openjdk.org/jdk/pull/23428 From shade at openjdk.org Tue Feb 11 20:11:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 11 Feb 2025 20:11:10 GMT Subject: RFR: 8348268: Test gc/shenandoah/TestResizeTLAB.java#compact: fatal error: Before Updating References: Thread C2 CompilerThread1: expected gc-state 9, actual 21 [v3] In-Reply-To: <-v8uprH0cQK06apB7HGbrHNO31cCmzOXxMiZB8ipWx4=.7ce5e5da-bcc8-4101-9bf8-23fb899d06c2@github.com> References: <-v8uprH0cQK06apB7HGbrHNO31cCmzOXxMiZB8ipWx4=.7ce5e5da-bcc8-4101-9bf8-23fb899d06c2@github.com> Message-ID: On Tue, 11 Feb 2025 19:39:25 GMT, William Kemper wrote: >> Non-java threads were not having their gc-state configured when they attach. If they were created before the verifier's safepoint, but after the iteration over non-java threads, they would not have the correct state. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Update comments, add an assertion Marked as reviewed by shade (Reviewer). 
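The race fixed in this thread can be modeled in a few lines. This is a hypothetical model with stand-in names, not HotSpot code: attaching threads snapshot a global gc-state under one lock, and concurrent state changes take the same lock and re-publish the state to every attached thread (standing in for the handshake), so a thread can never be left with a stale snapshot.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical model of the fix: holding one lock (cf. Threads_lock)
// around both thread attach and concurrent gc-state changes means a
// thread either snapshots the new state on attach, or is already on
// the list when the state is propagated.
struct ModelThread { int gc_state = 0; };

static std::mutex threads_lock;
static std::vector<ModelThread*> thread_list;
static int global_gc_state = 0;

void on_thread_attach(ModelThread* t) {
  std::lock_guard<std::mutex> g(threads_lock);
  t->gc_state = global_gc_state;   // snapshot under the lock
  thread_list.push_back(t);
}

void set_gc_state_concurrent(int state) {
  // cf. the assert added in this PR: the lock must be held here.
  std::lock_guard<std::mutex> g(threads_lock);
  global_gc_state = state;
  for (ModelThread* t : thread_list) {
    t->gc_state = state;           // stand-in for the handshake
  }
}
```

Without the shared lock, a thread could snapshot the old state and miss the list iteration, which is exactly the "expected gc-state 9, actual 21" violation.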
------------- PR Review: https://git.openjdk.org/jdk/pull/23428#pullrequestreview-2609916144 From wkemper at openjdk.org Tue Feb 11 20:23:14 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Feb 2025 20:23:14 GMT Subject: Integrated: 8348268: Test gc/shenandoah/TestResizeTLAB.java#compact: fatal error: Before Updating References: Thread C2 CompilerThread1: expected gc-state 9, actual 21 In-Reply-To: References: Message-ID: <9AZBJik8xf6tZdYSesYFvrlDs6Z8tDbEZkXvQz7Cm6s=.cb15a767-4988-40df-b87e-2e1868a15752@github.com> On Mon, 3 Feb 2025 20:28:58 GMT, William Kemper wrote: > Non-java threads were not having their gc-state configured when they attach. Additionally, we need to hold the `Threads_lock` when concurrently changing the gc state to make sure that any stale gc state observed when the thread `attaches` is fixed by the handshake when the thread list is iterated. This pull request has now been integrated. Changeset: 8c09d40d Author: William Kemper URL: https://git.openjdk.org/jdk/commit/8c09d40d6c345fda9fc7b358a53cae3b5965580b Stats: 22 lines in 2 files changed: 16 ins; 1 del; 5 mod 8348268: Test gc/shenandoah/TestResizeTLAB.java#compact: fatal error: Before Updating References: Thread C2 CompilerThread1: expected gc-state 9, actual 21 Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/23428 From wkemper at openjdk.org Tue Feb 11 23:01:58 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Feb 2025 23:01:58 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v8] In-Reply-To: References: Message-ID: > There are several changes to the operation of Shenandoah's control threads here. > * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. 
> * The cancellation handling is driven entirely by the cancellation cause > * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed > * The shutdown sequence is simpler > * The generational control thread uses a lock to coordinate updates to the requested cause and generation > * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance > * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles > * The control thread doesn't loop on its own (unless the pacer is enabled). William Kemper has updated the pull request incrementally with one additional commit since the last revision: Make shutdown safer for threads requesting (or expecting) gc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23475/files - new: https://git.openjdk.org/jdk/pull/23475/files/861ed699..047d6ffa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=06-07 Stats: 35 lines in 5 files changed: 9 ins; 18 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23475.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23475/head:pull/23475 PR: https://git.openjdk.org/jdk/pull/23475 From dholmes at openjdk.org Wed Feb 12 02:51:12 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 12 Feb 2025 02:51:12 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Wed, 5 Feb 2025 14:41:39 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). 
JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker @albertnetymk I think that to get the correct "dekker duality" in this code you do need to have full fences between the stores and loads, not just a `storeload` barrier. ------------- Changes requested by dholmes (Reviewer). 
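The "Dekker duality" David refers to can be illustrated outside HotSpot with C++ atomics. This is a sketch of the pattern, not the GCLocker patch itself: the mutator publishes "in critical region" and then checks for a pending safepoint, while the VM side publishes "safepoint pending" and then checks for critical-region threads. With sequential consistency (a full fence between each store and the following load) it is impossible for both sides to miss the other's store; a plain release/acquire pairing gives no such guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store-buffering litmus test, illustrative only. seq_cst imposes one
// total order over the four accesses, so at least one of the two
// loads must observe the other thread's store.
std::atomic<int> in_critical{0};
std::atomic<int> safepoint_pending{0};

// Returns how many trials ended with BOTH sides reading 0 -- an
// outcome seq_cst forbids.
int run_trials(int n) {
  int violations = 0;
  for (int i = 0; i < n; i++) {
    in_critical.store(0);
    safepoint_pending.store(0);
    int seen_pending = -1, seen_critical = -1;
    std::thread mutator([&] {
      in_critical.store(1, std::memory_order_seq_cst);
      seen_pending = safepoint_pending.load(std::memory_order_seq_cst);
    });
    std::thread vm([&] {
      safepoint_pending.store(1, std::memory_order_seq_cst);
      seen_critical = in_critical.load(std::memory_order_seq_cst);
    });
    mutator.join();
    vm.join();
    if (seen_pending == 0 && seen_critical == 0) {
      violations++;   // both missed the other's store
    }
  }
  return violations;
}
```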
PR Review: https://git.openjdk.org/jdk/pull/23367#pullrequestreview-2610698148 From amitkumar at openjdk.org Wed Feb 12 03:08:21 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 12 Feb 2025 03:08:21 GMT Subject: RFR: 8346280: C2: implement late barrier elision for G1 [v4] In-Reply-To: References: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> <-aHCYC9iVc4eMZ3pMfiDpqaW-wGM_s3zRMiVBWoadCM=.910336cd-3be2-45b5-9874-63b71abf38f8@github.com> Message-ID: On Fri, 7 Feb 2025 14:52:42 GMT, Roberto Castañeda Lozano wrote: >>> I see TestG1BarrierGeneration.java failure :( >>> >>> [TestG1BarrierGeneration_jtr.log](https://github.com/user-attachments/files/18676532/TestG1BarrierGeneration_jtr.log) >> >> @offamitkumar thanks for the report! Most likely the test failures are only due to missing optimizations (because of limitations in the barrier elision pattern matching analysis), but if you want me to confirm please send the entire jtreg log, without truncation. You can disable output truncation running the test like this: >> `make run-test TEST="compiler/gcbarriers/TestG1BarrierGeneration.java" JTREG="MAX_OUTPUT=999999999"` >> Please double-check that the output log file does not contain any `Output overflow` message. > >> @robcasloz Sure: >> >> I can spend time on it, maybe on weekend, for now I am overloaded with some other tasks. >> >> [TestG1BarrierGeneration_jtr_no_overflow.log](https://github.com/user-attachments/files/18706090/TestG1BarrierGeneration_jtr_no_overflow.log) > > Thanks Amit, I had a look and the failures are indeed due to missing barrier elisions for atomic operations on newly created objects, which is suboptimal but safe (and in practice unlikely to make a noticeable performance difference). I just disabled IR checks for the two affected tests on s390 by now (commit 956e0ac5).
The issue is likely due to limitations in the pattern matching logic of barrier elision, but I do not have the proper means to debug it on s390. If you find a solution before this changeset is fully reviewed, feel free to propose a patch and I will merge it into the changeset. Otherwise, it can always be done as follow-up work. Hope this works for you! > > @robcasloz Sure: > > I can spend time on it, maybe on weekend, for now I am overloaded with some other tasks. > > [TestG1BarrierGeneration_jtr_no_overflow.log](https://github.com/user-attachments/files/18706090/TestG1BarrierGeneration_jtr_no_overflow.log) > > Thanks Amit, I had a look and the failures are indeed due to missing barrier elisions for atomic operations on newly created objects, which is suboptimal but safe (and in practice unlikely to make a noticeable performance difference). I just disabled IR checks for the two affected tests on s390 by now (commit [956e0ac](https://github.com/openjdk/jdk/commit/956e0ac5a7d580ad0e8850cfc4497da77cdb525c)). The issue is likely due to limitations in the pattern matching logic of barrier elision, but I do not have the proper means to debug it on s390. If you find a solution before this changeset is fully reviewed, feel free to propose a patch and I will merge it into the changeset. Otherwise, it can always be done as follow-up work. Hope this works for you! Thanks @robcasloz. Yes sure, that works totally for us. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23235#issuecomment-2652551761 From tschatzl at openjdk.org Wed Feb 12 11:52:22 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Feb 2025 11:52:22 GMT Subject: RFR: 8349906: G1: Improve initial survivor rate for newly used young regions Message-ID: Hi all, please review this change that tries to improve the survivor rate initial values for newly expanded regions.
Currently G1 uses `InitialSurvivorRate` as survivor rate for such regions, but it is typically a pretty bad choice because * it's rather conservative, estimating that 40% of region contents will survive * such a conservative value is kind of bad particularly in cases for regions that are expanded late in the mutator phase because they are not frequently updated (and with our running weights changes get propagated over a very long time), i.e. this 40% sticks for a long time * it is a random value, i.e. not particularly specific to the application. The suggestion is to use the survivor rate for the last region we know the survivor rate already. Testing: gha, tier1-7 (with other changes) Hth, Thomas ------------- Commit messages: - * remove whitespace - 8349906 Changes: https://git.openjdk.org/jdk/pull/23584/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23584&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349906 Stats: 12 lines in 1 file changed: 8 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23584.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23584/head:pull/23584 PR: https://git.openjdk.org/jdk/pull/23584 From tschatzl at openjdk.org Wed Feb 12 11:57:29 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Feb 2025 11:57:29 GMT Subject: RFR: 8349476: G1: Regularly print CPU usage by thread type Message-ID: Hi all, please review this change to print total cpu usage per worker thread group every gc (with `gc+cpu=debug`) to have a better overview about which threads are taking how much CPU. I considered merging with the gc worker perfcounter update close by, but the opportunity to share code is very little, and the resulting code would be riddled with checking whether the perf counters should be updated or not. I.e. the only shared code would be the closure with a one-liner calling `os::thread_cpu_time()`; most other code would be different, e.g. 
determining whether to update the perf counters or not, the actual log messages etc. Tell me if you think I should try harder to do so. Testing: gha Thanks, Thomas ------------- Commit messages: - 8349476 Changes: https://git.openjdk.org/jdk/pull/23585/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23585&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349476 Stats: 43 lines in 2 files changed: 43 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23585.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23585/head:pull/23585 PR: https://git.openjdk.org/jdk/pull/23585 From ayang at openjdk.org Wed Feb 12 13:06:09 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 12 Feb 2025 13:06:09 GMT Subject: RFR: 8349836: G1: Improve group prediction log message In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 15:29:03 GMT, Thomas Schatzl wrote: > Hi all, > > please review this minor change to the group prediction log message > printed with gc+ergo+cset=trace: > > * add group id to be able to refer to something concrete when discussing results > * add total time > * fix typo in `bytes_to_cop` > > Testing: gha, local verification > > `Group 10: 5 regions prediction total_time 20.0ms card_rs_length 123456 merge_scan_time 10.2ms code_root_scan_time_ms 5.5ms evac_time_ms 3.7ms other_time 0.3ms bytes_to_copy 1234567` > > instead of > > `Prediction for group with 76 regions, card_rs_length 320, merge_scan_time 0.02ms, code_root_scan_time_ms 0.00ms, evac_time_ms 9.92ms, other_time 45.60ms, bytes_to_cop 61408560` > > Thanks, > Thomas I feel sth like `Prediction for Group X (Y regions): total_time ...` reads slightly better. YMMV. ------------- Marked as reviewed by ayang (Reviewer). 
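Picking up the CPU-usage RFR (8349476) above: the shared closure Thomas mentions — essentially a one-liner around `os::thread_cpu_time()` summed per worker thread group — could be sketched like this. The types and names here are hypothetical stand-ins, not the HotSpot code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: sum per-thread CPU time over one worker group.
// thread_cpu_time() stands in for os::thread_cpu_time(); in the real
// patch this ran as a thread closure over each GC worker group and
// the total was logged under gc+cpu=debug.
struct Worker { uint64_t cpu_time_ns; };

uint64_t thread_cpu_time(const Worker& w) { return w.cpu_time_ns; }

uint64_t group_cpu_time(const std::vector<Worker>& group) {
  uint64_t total = 0;
  for (const Worker& w : group) {
    total += thread_cpu_time(w);   // the "one-liner" shared with the perf counters
  }
  return total;
}
```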
PR Review: https://git.openjdk.org/jdk/pull/23562#pullrequestreview-2611850553 From iwalulya at openjdk.org Wed Feb 12 13:53:22 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 12 Feb 2025 13:53:22 GMT Subject: RFR: 8349783: g1RemSetSummary.cpp:344:68: runtime error: member call on null pointer of type 'struct G1HeapRegion' Message-ID: Hi, Please review this cleanup to remove dead code. Testing: local testing with --enable-ubsan ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/23587/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23587&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8349783 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23587.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23587/head:pull/23587 PR: https://git.openjdk.org/jdk/pull/23587 From tschatzl at openjdk.org Wed Feb 12 15:35:13 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Feb 2025 15:35:13 GMT Subject: RFR: 8349783: g1RemSetSummary.cpp:344:68: runtime error: member call on null pointer of type 'struct G1HeapRegion' In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 13:35:23 GMT, Ivan Walulya wrote: > Hi, > > Please review this cleanup to remove dead code. > > Testing: local testing with --enable-ubsan lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23587#pullrequestreview-2612346685 From tschatzl at openjdk.org Wed Feb 12 15:42:49 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Feb 2025 15:42:49 GMT Subject: RFR: 8349836: G1: Improve group prediction log message [v2] In-Reply-To: References: Message-ID: > Hi all, > > please review this minor change to the group prediction log message > printed with gc+ergo+cset=trace: > > * add group id to be able to refer to something concrete when discussing results > * add total time > * fix typo in `bytes_to_cop` > > Testing: gha, local verification > > `Group 10: 5 regions prediction total_time 20.0ms card_rs_length 123456 merge_scan_time 10.2ms code_root_scan_time_ms 5.5ms evac_time_ms 3.7ms other_time 0.3ms bytes_to_copy 1234567` > > instead of > > `Prediction for group with 76 regions, card_rs_length 320, merge_scan_time 0.02ms, code_root_scan_time_ms 0.00ms, evac_time_ms 9.92ms, other_time 45.60ms, bytes_to_cop 61408560` > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23562/files - new: https://git.openjdk.org/jdk/pull/23562/files/3c9cd94f..21139f21 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23562&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23562&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23562.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23562/head:pull/23562 PR: https://git.openjdk.org/jdk/pull/23562 From ayang at openjdk.org Wed Feb 12 16:02:12 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 12 Feb 2025 16:02:12 GMT Subject: RFR: 8349836: G1: Improve group prediction log message [v2] In-Reply-To: References: Message-ID: <12JEDfdeTFH0pC_2-b284HXb5Wd417w2AvCWmXycO2I=.736a9a50-818e-43f7-ae13-39b657e3a606@github.com> On Wed, 12 Feb 
2025 15:42:49 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this minor change to the group prediction log message >> printed with gc+ergo+cset=trace: >> >> * add group id to be able to refer to something concrete when discussing results >> * add total time >> * fix typo in `bytes_to_cop` >> >> Testing: gha, local verification >> >> `Group 10: 5 regions prediction total_time 20.0ms card_rs_length 123456 merge_scan_time 10.2ms code_root_scan_time_ms 5.5ms evac_time_ms 3.7ms other_time 0.3ms bytes_to_copy 1234567` >> >> instead of >> >> `Prediction for group with 76 regions, card_rs_length 320, merge_scan_time 0.02ms, code_root_scan_time_ms 0.00ms, evac_time_ms 9.92ms, other_time 45.60ms, bytes_to_cop 61408560` >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > ayang review Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23562#pullrequestreview-2612435747 From ayang at openjdk.org Wed Feb 12 16:05:13 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 12 Feb 2025 16:05:13 GMT Subject: RFR: 8349783: g1RemSetSummary.cpp:344:68: runtime error: member call on null pointer of type 'struct G1HeapRegion' In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 13:35:23 GMT, Ivan Walulya wrote: > Hi, > > Please review this cleanup to remove dead code. > > Testing: local testing with --enable-ubsan Marked as reviewed by ayang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23587#pullrequestreview-2612445688 From tschatzl at openjdk.org Wed Feb 12 16:14:18 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Feb 2025 16:14:18 GMT Subject: RFR: 8349836: G1: Improve group prediction log message [v2] In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 18:26:15 GMT, Ivan Walulya wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> ayang review > > Looks good! Thanks @walulyai @albertnetymk for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/23562#issuecomment-2654184213 From tschatzl at openjdk.org Wed Feb 12 16:14:19 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Feb 2025 16:14:19 GMT Subject: Integrated: 8349836: G1: Improve group prediction log message In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 15:29:03 GMT, Thomas Schatzl wrote: > Hi all, > > please review this minor change to the group prediction log message > printed with gc+ergo+cset=trace: > > * add group id to be able to refer to something concrete when discussing results > * add total time > * fix typo in `bytes_to_cop` > > Testing: gha, local verification > > `Group 10: 5 regions prediction total_time 20.0ms card_rs_length 123456 merge_scan_time 10.2ms code_root_scan_time_ms 5.5ms evac_time_ms 3.7ms other_time 0.3ms bytes_to_copy 1234567` > > instead of > > `Prediction for group with 76 regions, card_rs_length 320, merge_scan_time 0.02ms, code_root_scan_time_ms 0.00ms, evac_time_ms 9.92ms, other_time 45.60ms, bytes_to_cop 61408560` > > Thanks, > Thomas This pull request has now been integrated. 
Changeset: 73e1780a Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/73e1780ad0aba92ce60bb35fc66a395abccbf57e Stats: 12 lines in 1 file changed: 7 ins; 3 del; 2 mod 8349836: G1: Improve group prediction log message Reviewed-by: ayang, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/23562 From tschatzl at openjdk.org Wed Feb 12 17:46:14 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Feb 2025 17:46:14 GMT Subject: RFR: 8349476: G1: Regularly print CPU usage by thread type In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 11:52:50 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change to print total cpu usage per worker thread group every gc (with `gc+cpu=debug`) to have a better overview about which threads are taking how much CPU. > > I considered merging with the gc worker perfcounter update close by, but the opportunity to share code is very little, and the resulting code would be riddled with checking whether the perf counters should be updated or not. > > I.e. the only shared code would be the closure with a one-liner calling `os::thread_cpu_time()`; most other code would be different, e.g. determining whether to update the perf counters or not, the actual log messages etc. > > Tell me if you think I should try harder to do so. > > Testing: gha > > Thanks, > Thomas Another alternative is just not doing this change: this information can be retrieved using the VM performance counters (and sufficient for the purposes I need it) too, although only available if `-XX:+UsePerfData` is enabled. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23585#issuecomment-2654432360 From tschatzl at openjdk.org Wed Feb 12 17:53:16 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Feb 2025 17:53:16 GMT Subject: Withdrawn: 8349476: G1: Regularly print CPU usage by thread type In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 11:52:50 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change to print total cpu usage per worker thread group every gc (with `gc+cpu=debug`) to have a better overview about which threads are taking how much CPU. > > I considered merging with the gc worker perfcounter update close by, but the opportunity to share code is very little, and the resulting code would be riddled with checking whether the perf counters should be updated or not. > > I.e. the only shared code would be the closure with a one-liner calling `os::thread_cpu_time()`; most other code would be different, e.g. determining whether to update the perf counters or not, the actual log messages etc. > > Tell me if you think I should try harder to do so. > > Testing: gha > > Thanks, > Thomas This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/23585 From tschatzl at openjdk.org Wed Feb 12 17:53:16 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Feb 2025 17:53:16 GMT Subject: RFR: 8349476: G1: Regularly print CPU usage by thread type In-Reply-To: References: Message-ID: <7hh_XTkI-xs4dJe3s5cxb254-F_24CybdUD3YV2kYxA=.bbaa63f2-0898-4162-95c8-e85f20258418@github.com> On Wed, 12 Feb 2025 11:52:50 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change to print total cpu usage per worker thread group every gc (with `gc+cpu=debug`) to have a better overview about which threads are taking how much CPU. 
> > I considered merging with the gc worker perfcounter update close by, but the opportunity to share code is very little, and the resulting code would be riddled with checking whether the perf counters should be updated or not. > > I.e. the only shared code would be the closure with a one-liner calling `os::thread_cpu_time()`; most other code would be different, e.g. determining whether to update the perf counters or not, the actual log messages etc. > > Tell me if you think I should try harder to do so. > > Testing: gha > > Thanks, > Thomas Retracting, I think there is too little gain here, and the change isn't that nice either. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23585#issuecomment-2654449922 From wkemper at openjdk.org Wed Feb 12 21:12:39 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 12 Feb 2025 21:12:39 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v9] In-Reply-To: References: Message-ID: > There are several changes to the operation of Shenandoah's control threads here. > * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. > * The cancellation handling is driven entirely by the cancellation cause > * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed > * The shutdown sequence is simpler > * The generational control thread uses a lock to coordinate updates to the requested cause and generation > * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance > * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles > * The control thread doesn't loop on its own (unless the pacer is enabled). 
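The lock-coordinated request handling in the bullet list above might look roughly like the following. This is illustrative C++, not the actual ShenandoahGenerationalControlThread code; the names and the sticky-shutdown rule are assumptions drawn from the description ("the reason for cancellation is recorded as a cause" and "the shutdown sequence is simpler").

```cpp
#include <cassert>
#include <mutex>

// Hypothetical sketch: requests to the control thread are recorded as
// a single cause under a lock. Shutdown is sticky -- once requested it
// cannot be overwritten or consumed -- which makes shutdown safe for
// threads still requesting (or expecting) a gc.
enum class Cause { None, AllocFailure, ExplicitGC, Shutdown };

class ControlRequests {
  std::mutex _lock;
  Cause _requested = Cause::None;

public:
  void request(Cause c) {
    std::lock_guard<std::mutex> g(_lock);
    if (_requested != Cause::Shutdown) {  // shutdown wins over everything
      _requested = c;
    }
  }

  Cause take() {
    std::lock_guard<std::mutex> g(_lock);
    Cause c = _requested;
    if (c != Cause::Shutdown) {
      _requested = Cause::None;           // consumed; shutdown stays set
    }
    return c;
  }
};
```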
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Improve message for assertion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23475/files - new: https://git.openjdk.org/jdk/pull/23475/files/047d6ffa..779492c6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23475.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23475/head:pull/23475 PR: https://git.openjdk.org/jdk/pull/23475 From kirk at kodewerk.com Wed Feb 12 22:08:09 2025 From: kirk at kodewerk.com (Kirk Pepperdine) Date: Wed, 12 Feb 2025 14:08:09 -0800 Subject: Configurable G1 heap expansion aggressiveness In-Reply-To: References: Message-ID: Hi Jaroslaw, This work is in line with work that we are doing with the Serial collector and Oracle has done with ZGC. Also, Google has started on this work with G1. Very briefly, our thinking is that we should be able to set max heap size to the amount of available memory, be that constrained by the CGroup or by the physical machine. Second to this is an awareness and ability to react to global memory pressure. IOWs, GC ergonomics will be aware of what is happening on the machine in addition to what is happening in the JVM and use that information to guide heap sizing. Properly resourced, JVMs should cooperate. If a deployment is under-resourced, GC overheads will likely be higher than desired but JVMs should still cooperate to ensure some form of fair share (at the expense of tail latencies) to avoid OOM kills. gen-ZGC has a JEP, we will propose one shortly. Kind regards, Kirk Pepperdine > On Feb 9, 2025, at 11:54 AM, Jaroslaw Odzga wrote: > > Context and Motivation > In multi-tenant environments e.g.
Kubernetes clusters in cloud > environments there is a strong incentive to use as little memory as > possible. Lower memory usage means more processes can be packed on a > single VM which directly translates to lower cloud cost. > Configuring G1 heap size in this setup is currently challenging. On > the one hand we would like to set the max heap size to a high value so > that application doesn't fail with heap OOME when faced with > unexpectedly high load or organic growth. On the other hand we need to > set max heap size to as small a value as possible because G1 is very > eager to expand heap even when tuned to collect garbage aggressively. > > Ideally, we would like to: > - Set the initial heap size to a small value. > - Set the max heap size to a value larger than expected usage so that > application can handle unexpected load and organic growth. > - Configure G1 GC to not expand heap aggressively. This is currently > not possible. > > We propose two new JVM G1 flags that would give us more control over > G1 heap expansion aggressiveness and realize significant cost savings > in multi-tenant environments. > At the same time we don't want to change existing G1 behavior - with > default values of the new flags current G1 behavior would be > maintained. > > Analysis > Currently even with very aggressive G1 configuration such as: > -XX:-G1UseAdaptiveIHOP -XX:InitiatingHeapOccupancyPercent=20 > -XX:GCTimeRatio=4 -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=60 > the heap is fairly eagerly expanded. > > We found two culprits responsible for this in > G1HeapSizingPolicy::young_collection_expansion_amount() function. > First, the scale_with_heap() function makes pause_time_threshold small > in cases where current heap size is smaller than 1/2 of max heap size. > While it is likely a desired behavior in many situations, it also > causes memory usage spikes in situations where max heap size is much > larger than current heap size.
> Second, the MinOverThresholdForGrowth constant equal to 4 is an > arbitrary value which hardcodes the heap expansion aggressiveness. We > observed that short_term_pause_time_ratio can exceed > pause_time_threshold and trigger heap expansion too eagerly in many > situations, especially when allocation rate is spiky. > > Proposal > We would like to introduce two new experimental flags: > - G1ScaleWithHeapPauseTimeThreshold: a binary flag that would allow > disabling scale_with_heap() > - G1MinPausesOverThresholdForGrowth: a value between 1 and 10, a > configurable replacement for the MinOverThresholdForGrowth constant. > > We don't want to change the default behavior of G1. Default values for > these flags (G1ScaleWithHeapPauseTimeThreshold=true, > G1MinPausesOverThresholdForGrowth=4) would maintain the existing > behavior. > > Alternatives > There is currently no good alternative. Potentially we could configure > G1 aggressively to trigger GC very frequently e.g.: > -XX:-G1UseAdaptiveIHOP -XX:InitiatingHeapOccupancyPercent=20 > -XX:GCTimeRatio=4 -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=60 > Even with this configuration we see occasional large memory spikes > where heap is quickly expanded. Even though the expanded heap > contracts eventually, this poses a significant problem because in > practice we don't know if such a spike could have been avoided so it > is not obvious how much memory the application really needs. Of course > such configuration would also consume more CPU. > > Experimental results > We tested this change on patched jdk17. > With new flags we can use far less aggressive -XX:GCTimeRatio=9 > together with -XX:-G1ScaleWithHeapPauseTimeThreshold and > -XX:G1MinPausesOverThresholdForGrowth=10 (this effectively disables > heap expansion based on short time pause ratio and only depends on > long time pause ratio). > Compared to more aggressive G1 configuration mentioned above we see > lower CPU usage, and 30%-60% lower max memory usage.
> > Implementation > https://github.com/openjdk/jdk/pull/23534 From wkemper at openjdk.org Thu Feb 13 00:20:40 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 13 Feb 2025 00:20:40 GMT Subject: RFR: 8348092: Shenandoah: assert(nk >= _lowest_valid_narrow_klass_id && nk <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710) Message-ID: Restore weak roots rendezvous handshake. This is necessary to have mutators complete the LRB before the concurrent GC invalidates any oop handles that may exist in native stacks. ------------- Commit messages: - Restore weak roots rendezvous handshake Changes: https://git.openjdk.org/jdk/pull/23604/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23604&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8348092 Stats: 19 lines in 1 file changed: 14 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23604.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23604/head:pull/23604 PR: https://git.openjdk.org/jdk/pull/23604 From shade at openjdk.org Thu Feb 13 08:34:14 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 13 Feb 2025 08:34:14 GMT Subject: RFR: 8348092: Shenandoah: assert(nk >= _lowest_valid_narrow_klass_id && nk <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710) In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 00:15:43 GMT, William Kemper wrote: > Restore weak roots rendezvous handshake. This is necessary to have mutators complete the LRB before the concurrent GC invalidates any oop handles that may exist in native stacks. Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23604#pullrequestreview-2614219213 From ayang at openjdk.org Thu Feb 13 09:23:27 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 13 Feb 2025 09:23:27 GMT Subject: RFR: 8348171: Refactor GenerationCounters and its subclasses [v5] In-Reply-To: <7otkT63ENoyKzZ29CbYpycLLwL89ARajYg36Mstz4tQ=.fd3c7dcf-5a8b-44be-9205-09e3d160d54d@github.com> References: <7otkT63ENoyKzZ29CbYpycLLwL89ARajYg36Mstz4tQ=.fd3c7dcf-5a8b-44be-9205-09e3d160d54d@github.com> Message-ID: On Tue, 11 Feb 2025 15:28:25 GMT, Albert Mingkun Yang wrote: >> Simple refactoring of removing the use of `virtual` method and use concrete subclasses when needed. >> >> Test: tier1-5 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge branch 'master' into gen-counter > - review > - * some more refactoring > - review > - Merge branch 'master' into gen-counter > - merge > - gen-counter Any suggestions/comments/objections from Shenandoah team? I'd like to merge this patch, if none. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23209#issuecomment-2655989267 From iwalulya at openjdk.org Thu Feb 13 09:49:27 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 13 Feb 2025 09:49:27 GMT Subject: RFR: 8349783: g1RemSetSummary.cpp:344:68: runtime error: member call on null pointer of type 'struct G1HeapRegion' In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 16:03:03 GMT, Albert Mingkun Yang wrote: >> Hi, >> >> Please review this cleanup to remove dead code. >> >> Testing: local testing with --enable-ubsan > > Marked as reviewed by ayang (Reviewer). Thanks @albertnetymk and @tschatzl for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23587#issuecomment-2656050875 From iwalulya at openjdk.org Thu Feb 13 09:49:28 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 13 Feb 2025 09:49:28 GMT Subject: Integrated: 8349783: g1RemSetSummary.cpp:344:68: runtime error: member call on null pointer of type 'struct G1HeapRegion' In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 13:35:23 GMT, Ivan Walulya wrote: > Hi, > > Please review this cleanup to remove dead code. > > Testing: local testing with --enable-ubsan This pull request has now been integrated. Changeset: 24b7f815 Author: Ivan Walulya URL: https://git.openjdk.org/jdk/commit/24b7f815ae4ca2a228dff2694993b5ebc2192382 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod 8349783: g1RemSetSummary.cpp:344:68: runtime error: member call on null pointer of type 'struct G1HeapRegion' Reviewed-by: tschatzl, ayang ------------- PR: https://git.openjdk.org/jdk/pull/23587 From iwalulya at openjdk.org Thu Feb 13 10:12:14 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 13 Feb 2025 10:12:14 GMT Subject: RFR: 8349906: G1: Improve initial survivor rate for newly used young regions In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 10:55:46 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that tries to improve the survivor rate initial values for newly expanded regions. > > Currently G1 uses `InitialSurvivorRate` as survivor rate for such regions, but it is typically a pretty bad choice because > > * it's rather conservative, estimating that 40% of region contents will survive > * such a conservative value is kind of bad particularly in cases for regions that are expanded late in the mutator phase because they are not frequently updated (and with our running weights changes get propagated over a very long time), i.e. this 40% sticks for a long time > * it is a random value, i.e. not particularly specific to the application. 
> > The suggestion is to use the survivor rate for the last region we know the survivor rate already. > > Testing: gha, tier1-7 (with other changes) > > Hth, > Thomas Marked as reviewed by iwalulya (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23584#pullrequestreview-2614502592 From thomas.schatzl at oracle.com Thu Feb 13 10:48:51 2025 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 13 Feb 2025 11:48:51 +0100 Subject: Configurable G1 heap expansion aggressiveness In-Reply-To: References: Message-ID: <553f4d95-14a4-4736-b10d-02b8bb3af686@oracle.com> Hi Jaroslaw, thank you for contributing and speaking up with an itch of yours! The motivation, and analysis are spot on: we agree that the aggressiveness of G1 heap expansion paired with reluctance to give back memory can make it hard to configure G1 as you would want in this situation. However we do not think that the proposed solution (adding even more customizability) is where we want to go. More background below, inline: On 09.02.25 20:54, Jaroslaw Odzga wrote: > Context and Motivation > In multi-tenant environments e.g. Kubernetes clusters in cloud > environments there is a strong incentive to use as little memory as > possible. Lower memory usage means more processes can be packed on a > single VM which directly translates to lower cloud cost. > Configuring G1 heap size in this setup is currently challenging. On > the one hand we would like to set the max heap size to a high value so > that application doesn't fail with heap OOME when faced with > unexpectedly high load or organic growth. On the other hand we need to > set max heap size to as small a value as possible because G1 is very > eager to expand heap even when tuned to collect garbage aggressively. > > Ideally, we would like to: > - Set the initial heap size to a small value. > - Set the max heap size to a value larger than expected usage so that > application can handle unexpected load and organic growth.
> - Configure G1 GC to not expand heap aggressively. This is currently > not possible. > > We propose two new JVM G1 flags that would give us more control over > G1 heap expansion aggressiveness and realize significant cost savings > in multi-tenant environments. Understood. We are generally very reluctant in exposing more flags in basically any collector due to maintenance overhead. We understand that these are experimental flags that can be removed at a whim, but still doing that if/when they are in use is awkward. > At the same time we don't want to change existing G1 behavior - with > default values of the new flags current G1 behavior would be > maintained. > > Analysis > Currently even with very aggressive G1 configuration such as: > -XX:-G1UseAdaptiveIHOP -XX:InitiatingHeapOccupancyPercent=20 > -XX:GCTimeRatio=4 -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=60 > the heap is fairly eagerly expanded. > > We found two culprits responsible for this in > G1HeapSizingPolicy::young_collection_expansion_amount() function. > First, the scale_with_heap() function makes pause_time_threshold small > in cases where current heap size is smaller than 1/2 of max heap size. > While it is likely a desired behavior in many situations, it also > causes memory usage spikes in situations where max heap size is much > larger than current heap size. > Second, the MinOverThresholdForGrowth constant equal to 4 is an > arbitrary value which hardcodes the heap expansion aggressiveness. We > observed that short_term_pause_time_ratio can exceed > pause_time_threshold and trigger heap expansion too eagerly in many > situations, especially when allocation rate is spiky. > > Proposal > We would like to introduce two new experimental flags: > - G1ScaleWithHeapPauseTimeThreshold: a binary flag that would allow > disabling scale_with_heap() > - G1MinPausesOverThresholdForGrowth: a value between 1 and 10, a > configurable replacement for the MinOverThresholdForGrowth constant.
> > We don't want to change the default behavior of G1. Default values for > these flags (G1ScaleWithHeapPauseTimeThreshold=true, > G1MinPausesOverThresholdForGrowth=4) would maintain the existing > behavior. > > Alternatives > There is currently no good alternative. Potentially we could configure > G1 aggressively to trigger GC very frequently e.g.: > -XX:-G1UseAdaptiveIHOP -XX:InitiatingHeapOccupancyPercent=20 > -XX:GCTimeRatio=4 -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=60 > Even with this configuration we see occasional large memory spikes > where heap is quickly expanded. Even though the expanded heap > contracts eventually, this poses a significant problem because in > practice we don't know if such a spike could have been avoided so it > is not obvious how much memory the application really needs. Of course > such configuration would also consume more CPU. The suggestion changes a) the aggressiveness of expansion if it has been decided that G1 should expand (G1ScaleWithHeapPauseTimeThreshold); looking at this particular piece of code, this behavior actually seems strange and unexpected. I.e. given that the user sets a GCTimeRatio, for some reason allow G1 to basically override it to a large extent. The reason is mostly historical: I collected thoughts in https://bugs.openjdk.org/browse/JDK-8349978. Note that just removing this behavior has quite a few unintended consequences as heap sizing is very much interconnected with general performance behavior. b) makes G1 more lazy about determining whether it needs to expand (G1MinPausesOverThresholdForGrowth) by increasing the number of consecutive GCs that GCTimeRatio needs to be over the threshold to cause expansion. (That's just exposing an internal constant :)) These changes cover expansion behavior, but not shrinking again.
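[Editor's note: the arithmetic behind points a) and b) in the discussion above can be made concrete with a small sketch. This is not the actual HotSpot code — it is an illustrative model under stated assumptions: GCTimeRatio=N budgets roughly 1/(1+N) of time for GC, scale_with_heap() lowers that threshold while the heap is below half of max (a linear scaling is assumed here for simplicity), and MinOverThresholdForGrowth — the constant the proposal would turn into G1MinPausesOverThresholdForGrowth — gates how many recent over-threshold pauses are needed before expansion.]

```cpp
#include <cassert>
#include <cstddef>

// Illustrative model only -- not the HotSpot implementation.
struct SizingKnobs {
  bool scale_with_heap;        // models G1ScaleWithHeapPauseTimeThreshold
  int  min_pauses_for_growth;  // models MinOverThresholdForGrowth / proposed flag
  int  gc_time_ratio;          // models -XX:GCTimeRatio
};

// GCTimeRatio=N budgets roughly 1/(1+N) of total time for GC.
double pause_time_threshold(const SizingKnobs& k,
                            std::size_t committed, std::size_t max_heap) {
  double threshold = 1.0 / (1.0 + k.gc_time_ratio);
  // Point a): while the heap is below half of max, the threshold is scaled
  // down, which makes expansion trigger sooner (linear scaling assumed here).
  if (k.scale_with_heap && committed < max_heap / 2) {
    threshold *= static_cast<double>(committed) / (max_heap / 2);
  }
  return threshold;
}

// Point b): expansion is only considered once the short-term pause time
// ratio has been over the threshold in enough recent GCs.
bool should_expand(const SizingKnobs& k, double short_term_pause_ratio,
                   double threshold, int recent_pauses_over_threshold) {
  return short_term_pause_ratio > threshold &&
         recent_pauses_over_threshold >= k.min_pauses_for_growth;
}
```

Under this toy model, the lazier settings from the proposal (scaling disabled, min pauses at 10) leave a pause history untouched that would trigger expansion under the defaults (scaling enabled, min pauses at 4).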
I believe that still the other slew of options mentioned above (-XX:-G1UseAdaptiveIHOP -XX:InitiatingHeapOccupancyPercent=20 -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=60) is needed to keep the heap stable and shrinking again over time (it may work with just changing GCTimeRatio in your particular case). That seems awfully complicated for an end user, and indicative of papering over the problem. We would like to avoid this. As Kirk in his other email in the thread indicates, there is work underway to make the VM (and G1) aware of other memory consumers in the VM. Not sure if that would also fix your problem in a more user friendly (and hopefully generic) way. Wouldn't the option to make G1 to keep GCTimeRatio better (e.g. https://bugs.openjdk.org/browse/JDK-8238687), and/or some configurable soft heap size goal (https://bugs.openjdk.org/browse/JDK-8236073) that the collector will keep also solve your issue while being easier to configure? (There're a lot of connected problems in the bug tracker, so make sure to follow related issues). Maybe you are interested and can find something to work on in that area; there has actually already been a lot of investigation (and some resulting, unfinished patches) in that area, so feel free to ask. Thanks, Thomas Fwiw, we tried to label issues related to this area, see https://bugs.openjdk.org/issues/?jql=labels%20%3D%20gc-g1-heap-resizing . From jarek.odzga at gmail.com Thu Feb 13 13:24:17 2025 From: jarek.odzga at gmail.com (Jaroslaw Odzga) Date: Thu, 13 Feb 2025 05:24:17 -0800 Subject: Configurable G1 heap expansion aggressiveness In-Reply-To: <553f4d95-14a4-4736-b10d-02b8bb3af686@oracle.com> References: <553f4d95-14a4-4736-b10d-02b8bb3af686@oracle.com> Message-ID: Thank you Kirk and Thomas for your answers! What Kirk describes sounds great, is the right long term approach and I can't wait for it to be shipped. It also sounds like a feature we might need to wait for a while (please correct me if I am wrong). 
My proposal is just a tiny stopgap that might help alleviate some of the problems but does not attempt to be a holistic solution and, as you pointed out, has downsides. I totally agree with your assessment: it is just exposing internal constants but the fact that these are constants is part of the problem because they bake in an eager heap expansion behavior which is not necessarily desired. I share your reluctance to add more obscure tuning flags: it has maintenance cost and a risk of misuse. I would not recommend anyone tuning these flags without reading the source code and understanding the tradeoffs. These are not silver bullets and, as you pointed out, probably would have to be used together with other tuning parameters to achieve reasonable results. To clarify, the way we plan to use these flags is to establish a constant set of tuning parameters that achieve a good tradeoff between latency, throughput and footprint and apply it to a large number of services. We want to avoid tuning each service individually because it is hard to scale. Example configuration (used with jdk17): -XX:+UnlockExperimentalVMOptions -XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=60000 -XX:G1PeriodicGCSystemLoadThreshold=0 -XX:GCTimeRatio=9 -XX:G1MixedGCLiveThresholdPercent=85 -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=60 -XX:MaxGCPauseMillis=200 -XX:GCPauseIntervalMillis=1000 -XX:-G1UsePreventiveGC -XX:-G1ScaleWithHeapPauseTimeThreshold -XX:G1MinPausesOverThresholdForGrowth=10 From experiments so far it seems that we can leave the adaptive IHOP on because even if it mispredicted, e.g. due to allocation spikes, the heap is not aggressively expanded. On the plus side, the change itself is tiny, very localized and could be trivially backported e.g. all the way to jdk17.
At Databricks we run hundreds of JVM services and initial results are very promising. Or should I treat this proposal as officially rejected? > Wouldn't the option to make G1 to keep GCTimeRatio better (e.g. > https://bugs.openjdk.org/browse/JDK-8238687), and/or some configurable > soft heap size goal (https://bugs.openjdk.org/browse/JDK-8236073) that > the collector will keep also solve your issue while being easier to > configure? Thanks for sharing these. The JDK-8238687 focuses on uncommit while the heap expansion hurts the most. The SoftMaxHeapSize could be used as a building block towards a solution. I think there still would have to be some controller that adjusts the value of SoftMaxHeapSize based on GC behavior e.g. increase it when GC pressure is too high. Best regards, Jaroslaw On Thu, Feb 13, 2025 at 2:49?AM Thomas Schatzl wrote: > > Hi Jaroslaw, > > thank you for contributing and speaking up with an itch of yours! > > The motivation, and analysis are spot on: we agree that the > aggressiveness of G1 heap expansion paired with reluctance to give back > memory can make it hard to configure G1 as you would want in this situation. > > However we do not think that the proposed solution (adding even more > customizability) is where we want to go. > > More background below, inline: > > On 09.02.25 20:54, Jaroslaw Odzga wrote: > > Context and Motivation > > In multi-tenant environments e.g. Kubernetes clusters in cloud > > environments there is a strong incentive to use as little memory as > > possible. Lower memory usage means more processes can be packed on a > > single VM which directly translates to lower cloud cost. > > Configuring G1 heap size in this setup is currently challenging. On > > the one hand we would like to set the max heap size to a high value so > > that application doesn?t fail with heap OOME when faced with > > unexpectedly high load or organic growth. 
On the other hand we need to > > set max heap size to as small a value as possible because G1 is very > > eager to expand heap even when tuned to collect garbage aggressively. > > > > Ideally, we would like to: > > - Set the initial heap size to a small value. > > - Set the max heap size to a value larger than expected usage so that > > application can handle unexpected load and organic growth. > > - Configure G1 GC to not expand heap aggressively. This is currently > > not possible. > > > > We propose two new JVM G1 flags that would give us more control over > > G1 heap expansion aggressiveness and realize significant cost savings > > in multi-tenant environments. > > Understood. > > We are generally very reluctant in exposing more flags in basically any > collector due to maintenance overhead. We understand that these are > experimental flags that can be removed at a whim, but still doing that > if/when they are in use is awkward. > > > > At the same time we don't want to change existing G1 behavior - with > > default values of the new flags current G1 behavior would be > > maintained. > > > > Analysis > > Currently even with very aggressive G1 configuration such as: > > -XX:-G1UseAdaptiveIHOP -XX:InitiatingHeapOccupancyPercent=20 > > -XX:GCTimeRatio=4 -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=60 > > the heap is fairly eagerly expanded. > > > > We found two culprits responsible for this in > > G1HeapSizingPolicy::young_collection_expansion_amount() function. > > First, the scale_with_heap() function makes pause_time_threshold small > > in cases where current heap size is smaller than 1/2 of max heap size. > > While it is likely a desired behavior in many situations, it also > > causes memory usage spikes in situations where max heap size is much > > larger than current heap size. > > Second, the MinOverThresholdForGrowth constant equal to 4 is an > > arbitrary value which hardcodes the heap expansion aggressiveness.
We > > observed that short_term_pause_time_ratio can exceed > > pause_time_threshold and trigger heap expansion too eagerly in many > > situations, especially when allocation rate is spiky. > > > > Proposal > > We would like to introduce two new experimental flags: > > - G1ScaleWithHeapPauseTimeThreshold: a binary flag that would allow > > disabling scale_with_heap() > > - G1MinPausesOverThresholdForGrowth: a value between 1 and 10, a > > configurable replacement for the MinOverThresholdForGrowth constant. > > > > We don't want to change the default behavior of G1. Default values for > > these flags (G1ScaleWithHeapPauseTimeThreshold=true, > > G1MinPausesOverThresholdForGrowth=4) would maintain the existing > > behavior. > > > > Alternatives > > There is currently no good alternative. Potentially we could configure > > G1 aggressively to trigger GC very frequently e.g.: > > -XX:-G1UseAdaptiveIHOP -XX:InitiatingHeapOccupancyPercent=20 > > -XX:GCTimeRatio=4 -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=60 > > Even with this configuration we see occasional large memory spikes > > where heap is quickly expanded. Even though the expanded heap > > contracts eventually, this poses a significant problem because in > > practice we don't know if such a spike could have been avoided so it > > is not obvious how much memory the application really needs. Of course > > such configuration would also consume more CPU. > > The suggestion changes > > a) the aggressiveness of expansion if it has been decided that G1 should > expand (G1ScaleWithHeapPauseTimeThreshold); looking at this particular > piece of code, this behavior actually seems strange and unexpected. I.e. > given that the user sets a GCTimeRatio, for some reason allow G1 to > basically override it to a large extent. > > The reason is mostly historical: I collected thoughts in > https://bugs.openjdk.org/browse/JDK-8349978.
> > Note that just removing this behavior has quite a few unintended > consequences as heap sizing is very much interconnected with general > performance behavior. > > b) makes G1 more lazy about determining whether it needs to expand > (G1MinPausesOverThresholdForGrowth) by increasing the number of > consecutive GCs that GCTimeRatio needs to be over the threshold to cause > expansion. > (That's just exposing an internal constant :)) > > > These changes cover expansion behavior, but not shrinking again. I > believe that still the other slew of options mentioned above > > (-XX:-G1UseAdaptiveIHOP -XX:InitiatingHeapOccupancyPercent=20 > -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=60) > > is needed to keep the heap stable and shrinking again over time (it may > work with just changing GCTimeRatio in your particular case). > > That seems awfully complicated for an end user, and indicative of > papering over the problem. We would like to avoid this. > > > As Kirk in his other email in the thread indicates, there is work > underway to make the VM (and G1) aware of other memory consumers in the > VM. Not sure if that would also fix your problem in a more user friendly > (and hopefully generic) way. > > > > Wouldn't the option to make G1 to keep GCTimeRatio better (e.g. > https://bugs.openjdk.org/browse/JDK-8238687), and/or some configurable > soft heap size goal (https://bugs.openjdk.org/browse/JDK-8236073) that > the collector will keep also solve your issue while being easier to > configure? > > (There're a lot of connected problems in the bug tracker, so make sure > to follow related issues). > > Maybe you are interested and can find something to work on in that area; > there has actually already been a lot of investigation (and some > resulting, unfinished patches) in that area, so feel free to ask. > > Thanks, > Thomas > > Fwiw, we tried to label issues related to this area, see > https://bugs.openjdk.org/issues/?jql=labels%20%3D%20gc-g1-heap-resizing . 
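[Editor's note: the controller idea floated above — some component that adjusts a SoftMaxHeapSize-style goal based on observed GC behavior, raising it when GC pressure is too high — might look roughly like the following. This is a speculative sketch: the thresholds, step sizes, and the controller itself are assumptions for illustration, not anything currently shipped in G1.]

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Speculative sketch of a feedback controller for a soft heap size goal.
// Thresholds and step sizes are illustrative assumptions only.
class SoftMaxController {
  std::size_t _soft_max;  // current soft heap goal, bytes
  std::size_t _hard_max;  // -Xmx equivalent; never exceeded
  std::size_t _floor;     // never shrink below this
public:
  SoftMaxController(std::size_t initial, std::size_t hard_max, std::size_t floor)
      : _soft_max(initial), _hard_max(hard_max), _floor(floor) {}

  std::size_t soft_max() const { return _soft_max; }

  // gc_cpu_fraction: recent share of CPU time spent in GC, in [0, 1].
  // Raise the goal when GC pressure is high, lower it when there is headroom.
  void update(double gc_cpu_fraction) {
    const double too_high = 0.10;  // assumed acceptable GC overhead ceiling
    const double low      = 0.02;  // assumed "plenty of headroom" level
    if (gc_cpu_fraction > too_high) {
      _soft_max = std::min(_hard_max, _soft_max + _soft_max / 5);  // grow 20%
    } else if (gc_cpu_fraction < low) {
      _soft_max = std::max(_floor, _soft_max - _soft_max / 10);    // shrink 10%
    }
  }
};
```

The point of such a design is that the user states a GC overhead goal rather than heap expansion internals, and the goal value drifts only in response to sustained pressure.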
From kirk at kodewerk.com Thu Feb 13 14:35:36 2025 From: kirk at kodewerk.com (Kirk Pepperdine) Date: Thu, 13 Feb 2025 06:35:36 -0800 Subject: Configurable G1 heap expansion aggressiveness In-Reply-To: References: <553f4d95-14a4-4736-b10d-02b8bb3af686@oracle.com> Message-ID: <60612080-69DD-46DC-AA5B-ED078C3A3793@kodewerk.com> Hi Jaroslaw, > On Feb 13, 2025, at 5:24 AM, Jaroslaw Odzga wrote: > > Thank you Kirk and Thomas for your answers! > > What Kirk describes sounds great, is the right long term approach and > I can't wait for it to be shipped. It also sounds like a feature we > might need to wait for a while (please correct me if I am wrong). If you look at the ZGC code as a model I believe you'll find that it's something that can be achieved by making the appropriate adjustments to the ergonomics. So while the knowledge needed to make the changes is non-trivial, the actual coding effort isn't something that makes this a 'long term approach'. Our decision to focus on Serial was twofold. First, work on G1 is already taking place and given the progress there we thought best to focus on the Serial collector. This is because the Serial collector is default for small deployments which are fairly common. I personally see AHS to be a stepping stone to being able to 'hibernate' idle JVMs, something that isn't really possible at the moment. Being able to wake up a hibernated JVM should be far cheaper than spinning up a new one taking into account all of the container costs. The data that I've collected suggests that starting a JVM is only a small fraction of the total costs of spinning up a new container. And that doesn't include warmup. The complication with the Serial collector is in how heap is structured and consequently, where data resides in memory after a collection cycle.
We have rearranged where the generations reside so that ergonomics has the freedom to resize individual generational spaces without having to take on the cost of copying data about to accommodate that resizing. This work will land as soon as I address Thomas's concerns in the JBS. This work sets us up for the next steps which I believe should come more quickly now that we've set the foundation for it. What we're looking to do is safely resize each generation according to its current needs while taking into account global memory pressure. In my experience, a lot more memory than is needed gets committed to Java heap simply to accommodate the current sizing policies. Resizing generational spaces individually allows us to end up with heap configurations that are currently unsafe. For example, it is common that GC log data tells me that Eden should be 2 or 3x the size of tenured. Currently, configuring Java heap to accommodate this need risks OOME being thrown or unnecessarily enlarging heap (Tenured) to safely allow for a much larger Eden. Getting this internal tuning right reduces both GC overhead and memory footprint. This also allows us to easily completely collapse heap should a JVM become idle. While there are significant differences between G1 and the Serial collector, there are also similarities with the tuning strategies. In my opinion, the work needed for G1 is easier than it is for the Serial collector simply because of how Java heap is structured. That said, a tuning strategy for G1 is more complicated because the costs of transients is quite different in G1 than it is with the Serial/Parallel collectors. But I believe it is achievable using existing flags/structures and the addition of the SoftMaxHeapSize. If I might add, in large homogenous deployments, you'd think you'd see a one-size-fits-all optimal GC configuration. Unfortunately my look into this has shown that there are often multiple optimal configurations.
The only way to combat this is with smarter ergonomics in the runtime. > > My proposal is just a tiny stopgap that might help alleviate some of > the problems but does not attempt to be a holistic solution and, as > you pointed out, has downsides. > I totally agree with your assessment: it is just exposing internal > constants but the fact that these are constants is part of the problem > because they bake in an eager heap expansion behavior which is not > necessarily desired. > I share your reluctance to adding more obscure tuning flags: it has > maintenance cost and a risk of misuse. I would not recommend anyone > tuning these flags without reading the source code and understanding > the tradeoffs. > These are not silver bullets and, as you pointed out, probably would > have to be used together with other tuning parameters to achieve > reasonable results. > To clarify, the way we plan to use these flags is to establish a > constant set of tuning parameters that achieve a good tradeoff between > latency, throughput and footprint and apply it to a large number of > services. > We want to avoid tuning each service individually because it is hard > to scale. Example configuration (used with jdk17): > -XX:+UnlockExperimentalVMOptions -XX:+G1PeriodicGCInvokesConcurrent > -XX:G1PeriodicGCInterval=60000 -XX:G1PeriodicGCSystemLoadThreshold=0 > -XX:GCTimeRatio=9 -XX:G1MixedGCLiveThresholdPercent=85 > -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=60 > -XX:MaxGCPauseMillis=200 -XX:GCPauseIntervalMillis=1000 > -XX:-G1UsePreventiveGC -XX:-G1ScaleWithHeapPauseTimeThreshold > -XX:G1MinPausesOverThresholdForGrowth=10 A nightmare that can be avoided with smarter ergonomics. > From experiments so far it seems that we can leave the adaptive IHOP > on because even if it mispredicted, e.g. due to allocation spikes, the > heap is not aggressively expanded. > > On the plus side, the change itself is tiny, very localized and could > be trivially backported e.g. all the way to jdk17. 
Most importantly, > it seems to enable significant cost savings. > > At the end of the day it is a tradeoff. Would it help if I provided > examples of the impact this change had on real life applications? At > Databricks we run hundreds of JVM services and initial results are > very promising. Or should I treat this proposal as officially > rejected? > >> Wouldn't the option to make G1 to keep GCTimeRatio better (e.g. >> https://bugs.openjdk.org/browse/JDK-8238687), and/or some configurable >> soft heap size goal (https://bugs.openjdk.org/browse/JDK-8236073) that >> the collector will keep also solve your issue while being easier to >> configure? > Thanks for sharing these. The JDK-8238687 focuses on uncommit while > the heap expansion hurts the most. > The SoftMaxHeapSize could be used as a building block towards a > solution. I think there still would have to be some controller that > adjusts the value of SoftMaxHeapSize based on GC behavior e.g. > increase it when GC pressure is too high. Having more data is always a good thing so I would welcome anything you can share. I pub'ed a table that suggests that GC CPU utilization, and not allocation rates, is a key metric to drive heap sizing. The other key metric is availability of RAM. Again, ZGC has this worked out so we're integrating that work into ours. Kind regards, Kirk -------------- next part -------------- An HTML attachment was scrubbed... URL: From wkemper at openjdk.org Thu Feb 13 16:37:21 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 13 Feb 2025 16:37:21 GMT Subject: Integrated: 8348092: Shenandoah: assert(nk >= _lowest_valid_narrow_klass_id && nk <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710) In-Reply-To: References: Message-ID: On Thu, 13 Feb 2025 00:15:43 GMT, William Kemper wrote: > Restore weak roots rendezvous handshake.
This is necessary to have mutators complete the LRB before the concurrent GC invalidates any oop handles that may exist in native stacks. This pull request has now been integrated. Changeset: 28e744dc Author: William Kemper URL: https://git.openjdk.org/jdk/commit/28e744dc642db8ebe376403f28630438a5ee3f44 Stats: 19 lines in 1 file changed: 14 ins; 0 del; 5 mod 8348092: Shenandoah: assert(nk >= _lowest_valid_narrow_klass_id && nk <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710) Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/23604 From wkemper at openjdk.org Thu Feb 13 16:59:08 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 13 Feb 2025 16:59:08 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v10] In-Reply-To: References: Message-ID: > There are several changes to the operation of Shenandoah's control threads here. > * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. > * The cancellation handling is driven entirely by the cancellation cause > * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed > * The shutdown sequence is simpler > * The generational control thread uses a lock to coordinate updates to the requested cause and generation > * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance > * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles > * The control thread doesn't loop on its own (unless the pacer is enabled). William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 28 additional commits since the last revision: - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads - Improve message for assertion - Make shutdown safer for threads requesting (or expecting) gc - Do not accept requests if control thread is terminating - Notify waiters when control thread terminates - Add event for control thread state changes - Fix shutdown livelock error - Fix includes - Simplify locking protocol - Make shutdown more robust, make better use of request lock - ... and 18 more: https://git.openjdk.org/jdk/compare/06ea83a4...51d09207 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23475/files - new: https://git.openjdk.org/jdk/pull/23475/files/779492c6..51d09207 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=08-09 Stats: 12604 lines in 600 files changed: 8568 ins; 1551 del; 2485 mod Patch: https://git.openjdk.org/jdk/pull/23475.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23475/head:pull/23475 PR: https://git.openjdk.org/jdk/pull/23475 From jarek.odzga at gmail.com Thu Feb 13 20:36:52 2025 From: jarek.odzga at gmail.com (Jaroslaw Odzga) Date: Thu, 13 Feb 2025 12:36:52 -0800 Subject: Configurable G1 heap expansion aggressiveness In-Reply-To: <60612080-69DD-46DC-AA5B-ED078C3A3793@kodewerk.com> References: <553f4d95-14a4-4736-b10d-02b8bb3af686@oracle.com> <60612080-69DD-46DC-AA5B-ED078C3A3793@kodewerk.com> Message-ID: Hi Kirk, Thanks for the detailed answer, I appreciate your time on this. Would you mind sharing more details on the work you are referring to (Serial collector changes, the ongoing G1 work) so I could learn more about it? It sounds like the "hibernation" feature you are talking about is different from Azul's CRaC that uses CRIU? Can you elaborate? 
> That said, a tuning strategy for G1 is more complicated because the costs of transients is quite different in G1 than it is with the Serial/Parallel collectors. But I believe it is achievable using existing flags/structures and the addition of the SoftMaxHeapSize. Can you share more on how SoftMaxHeapSize fits into this strategy? Doesn't it require some "controller" that would dynamically adjust SoftMaxHeapSize at runtime based on signals like GC CPU usage, VM memory pressure etc? > If I might add, in large homogenous deployments, you'd think you'd see a 1 size fits all optimal GC configuration. Unfortunately my look into this has shown that there are often multiple optimal configurations. The only way to combat this is with smarter ergonomics in the runtime. Thanks for the insight. I believe this to be true. My claim is that for the majority of applications in certain domains (e.g. backend services in multi-tenant environments running in the cloud) the existing default G1 configuration and ergonomics work well only if the max heap size is correctly sized (because of the greedy heap expansion). Sizing heaps is challenging at scale and the most common result is setting max heap too high. This leads to a lot of resource waste that is hard to detect and realize for many because "the heap is used". I guess the question boils down to: is it worth exposing two more internal G1 parameters in the "short term" as experimental tunables to allow some high leverage optimizations. I think we agree on the cost of doing it (although it might be hard to quantify): maintenance, potential misuse, additional complexity.
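The controller being discussed, one that nudges SoftMaxHeapSize up when GC pressure is too high and down when it is low, might compute its next value along these lines. This is only a rough sketch: every name, the 10% step, and the threshold logic are invented for illustration and are not existing HotSpot code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical proportional step for a SoftMaxHeapSize controller:
// raise the soft heap goal when GC CPU overhead exceeds a target,
// lower it when GC is cheap, clamped to [min_bytes, max_bytes].
std::size_t next_soft_max(std::size_t current, double gc_cpu_pct,
                          double target_pct, std::size_t min_bytes,
                          std::size_t max_bytes) {
  std::size_t step = current / 10;  // 10% step size, an arbitrary choice
  std::size_t next = (gc_cpu_pct > target_pct)
                         ? current + step                      // GC too busy: give it room
                         : current - std::min(step, current);  // GC cheap: reclaim memory
  return std::clamp(next, min_bytes, max_bytes);
}
```

For collectors where SoftMaxHeapSize already exists as a manageable flag (ZGC), such a value could be applied at runtime, e.g. via jcmd VM.set_flag; for G1 the flag itself is still prospective, as discussed above.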
Taking into account that the additional flags do not change existing default behavior, so they would be completely transparent unless someone decides to go down the rabbit hole of tuning experimental GC flags, maybe it is something worth considering? Best regards, Jaroslaw On Thu, Feb 13, 2025 at 6:35?AM Kirk Pepperdine wrote: > > Hi Jaroslaw, > > > > On Feb 13, 2025, at 5:24?AM, Jaroslaw Odzga wrote: > > Thank you Kirk and Thomas for your answers! > > What Kirk describes sounds great, is the right long term approach and > I can't wait for it to be shipped. It also sounds like a feature we > might need to wait for a while (please correct me if I am wrong). > > > If you look at the ZGC code as a model I believe you?ll find that it?s something that can be achieved by making the appropriate adjustments to the ergonomics. So while the knowledge needed to make the changes is non-trivial, the actual coding effort isn?t something that makes this a ?long term approach?. > > Our decision to focus on Serial was two fold. First, work on G1 is already taking place and given the progress there we thought best to focus on the Serial collector. This is because the Serial collector is default for small deployments which are fairly common. I personally see AHS to be a stepping stone to being able to ?hibernate? idle JVMs, something that isn?t really possible at the moment. Being able to wake up a hibernated JVM should be far cheaper than spinning up a new one taking into account all of the container costs. The data that I?ve collected suggests that starting a JVM is only a small fraction of the total costs of spinning up a new container. And that doesn?t include warmup. > > The complication with the Serial collector is in how heap is structured and consequently, where data resides in memory after a collection cycle. 
We have rearranged where the generations reside so that ergonomics has the freedom to resize individual generational spaces without having to take on the cost of copying data about to accommodate that resizing. This work will land as soon as I address Thomas's concerns in the JBS. > > This work sets us up for the next steps which I believe should come more quickly now that we've set the foundation for it. What we're looking to do is safely resize each generation according to its current needs while taking into account global memory pressure. In my experience, a lot more memory than is needed gets committed to Java heap simply to accommodate the current sizing policies. Resizing generational spaces individually allows us to end up with heap configurations that are currently unsafe. For example, it is common that GC log data tells me that Eden should be 2 or 3x the size of tenured. Currently, configuring Java heap to accommodate this need risks OOME being thrown or unnecessarily enlarging heap (Tenured) to safely allow for a much larger Eden. Getting this internal tuning right reduces both GC overhead and memory footprint. This also allows us to easily completely collapse heap should a JVM become idle. > > While there are significant differences between G1 and the Serial collector, there are also similarities with the tuning strategies. In my opinion, the work needed for G1 is easier than it is for the Serial collector simply because of how Java heap is structured. That said, a tuning strategy for G1 is more complicated because the costs of transients is quite different in G1 than it is with the Serial/Parallel collectors. But I believe it is achievable using existing flags/structures and the addition of the SoftMaxHeapSize. > > If I might add, in large homogenous deployments, you'd think you'd see a 1 size fits all optimal GC configuration. Unfortunately my look into this has shown that there are often multiple optimal configurations.
The only way to combat this is with smarter ergonomics in the runtime. > > > My proposal is just a tiny stopgap that might help alleviate some of > the problems but does not attempt to be a holistic solution and, as > you pointed out, has downsides. > I totally agree with your assessment: it is just exposing internal > constants but the fact that these are constants is part of the problem > because they bake in an eager heap expansion behavior which is not > necessarily desired. > I share your reluctance to adding more obscure tuning flags: it has > maintenance cost and a risk of misuse. I would not recommend anyone > tuning these flags without reading the source code and understanding > the tradeoffs. > These are not silver bullets and, as you pointed out, probably would > have to be used together with other tuning parameters to achieve > reasonable results. > To clarify, the way we plan to use these flags is to establish a > constant set of tuning parameters that achieve a good tradeoff between > latency, throughput and footprint and apply it to a large number of > services. > We want to avoid tuning each service individually because it is hard > to scale. Example configuration (used with jdk17): > -XX:+UnlockExperimentalVMOptions -XX:+G1PeriodicGCInvokesConcurrent > -XX:G1PeriodicGCInterval=60000 -XX:G1PeriodicGCSystemLoadThreshold=0 > -XX:GCTimeRatio=9 -XX:G1MixedGCLiveThresholdPercent=85 > -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=60 > -XX:MaxGCPauseMillis=200 -XX:GCPauseIntervalMillis=1000 > -XX:-G1UsePreventiveGC -XX:-G1ScaleWithHeapPauseTimeThreshold > -XX:G1MinPausesOverThresholdForGrowth=10 > > > A nightmare that can be avoided with smarter ergonomics. > > > From experiments so far it seems that we can leave the adaptive IHOP > on because even if it mispredicted, e.g. due to allocation spikes, the > heap is not aggressively expanded. > > On the plus side, the change itself is tiny, very localized and could > be trivially backported e.g. 
all the way to jdk17. Most importantly, > it seems to enable significant cost savings. > > At the end of the day it is a tradeoff. Would it help if I provided > examples of the impact this change had on real life applications? At > Databricks we run hundreds of JVM services and initial results are > very promising. Or should I treat this proposal as officially > rejected? > > Wouldn't the option to make G1 to keep GCTimeRatio better (e.g. > https://bugs.openjdk.org/browse/JDK-8238687), and/or some configurable > soft heap size goal (https://bugs.openjdk.org/browse/JDK-8236073) that > the collector will keep also solve your issue while being easier to > configure? > > Thanks for sharing these. The JDK-8238687 focuses on uncommit while > the heap expansion hurts the most. > The SoftMaxHeapSize could be used as a building block towards a > solution. I think there still would have to be some controller that > adjusts the value of SoftMaxHeapSize based on GC behavior e.g. > increase it when GC pressure is too high. > > > Having more data is always a good thing so I would welcome anything you can share. > > I pub'ed a table that suggests that GC CPU utilization, and not allocation rates, is a key metric to drive heap sizing. The other key metric is availability of RAM. Again, ZGC has this worked out so we're integrating that work into ours. > > Kind regards, > Kirk > From wkemper at openjdk.org Thu Feb 13 22:39:08 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 13 Feb 2025 22:39:08 GMT Subject: RFR: 8349766: GenShen: Bad progress after degen does not always need full gc In-Reply-To: References: Message-ID: On Tue, 11 Feb 2025 03:31:51 GMT, Kelvin Nilsen wrote: > In generational mode, only upgrade to full GC from degenerated GC if we've done two degenerated cycles in a row and both indicated bad progress. Otherwise, start another concurrent GC, which will most likely degenerate also.
But this degenerated cycle will reclaim floating garbage within the young generation much more quickly than a full GC would have done. Looks good to me. ------------- Marked as reviewed by wkemper (Committer). PR Review: https://git.openjdk.org/jdk/pull/23552#pullrequestreview-2616371490 From kdnilsen at openjdk.org Thu Feb 13 23:27:48 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 13 Feb 2025 23:27:48 GMT Subject: RFR: 8349766: GenShen: Bad progress after degen does not always need full gc [v2] In-Reply-To: References: Message-ID: > In generational mode, only upgrade to full GC from degenerated GC if we've done two degenerated cycles in a row and both indicated bad progress. Otherwise, start another concurrent GC, which will most likely degenerate also. But this degenerated cycle will reclaim floating garbage within the young generation much more quickly than a full GC would have done. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
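The policy quoted above (only escalate to a full GC after two consecutive degenerated cycles that both made bad progress) can be sketched as follows. The class and method names are invented for illustration; this is not the actual Shenandoah code.

```cpp
#include <cassert>

// Track consecutive degenerated cycles with bad progress; upgrade to a
// full GC only once two in a row have failed to make good progress.
class DegenUpgradePolicySketch {
  int _consecutive_bad_degens = 0;
public:
  // Called at the end of each degenerated cycle.
  bool should_upgrade_to_full(bool made_good_progress) {
    if (made_good_progress) {
      _consecutive_bad_degens = 0;  // any good progress resets the streak
      return false;
    }
    // After a single bad degen, retry another concurrent cycle (which will
    // likely degenerate again but reclaims young-gen floating garbage
    // faster than a full GC would); only a second bad degen escalates.
    return ++_consecutive_bad_degens >= 2;
  }
};
```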
The pull request contains three additional commits since the last revision: - 8348092: Shenandoah: assert(nk >= _lowest_valid_narrow_klass_id && nk <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710) Reviewed-by: shade - Merge tag 'jdk-25+10' into defer-generational-full-gc Added tag jdk-25+10 for changeset a637ccf2 - Be less eager to upgrade degen to full gc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23552/files - new: https://git.openjdk.org/jdk/pull/23552/files/17e5e919..8d662e10 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23552&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23552&range=00-01 Stats: 9730 lines in 593 files changed: 6779 ins; 1405 del; 1546 mod Patch: https://git.openjdk.org/jdk/pull/23552.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23552/head:pull/23552 PR: https://git.openjdk.org/jdk/pull/23552 From kdnilsen at openjdk.org Fri Feb 14 01:18:01 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 14 Feb 2025 01:18:01 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check [v5] In-Reply-To: References: Message-ID: > At the end of a degenerated GC, we check whether sufficient progress has been made in replenishing the memory available to the mutator. The test for good progress is implemented as a ratio of free memory against the total heap size. > > For generational Shenandoah, the ratio should be computed against the size of the young generation. Note that the size of the generational collection set is based on young generation size rather than total heap size. > > This issue was first identified in GenShen GC logs, where a large number of degenerated cycles were upgrading to full GC because the free-set progress fell short of desired by 10-25%. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
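The fix described above changes the denominator of the progress ratio. In sketch form (the function and parameter names are invented here, not the actual code):

```cpp
#include <cassert>
#include <cstddef>

// Judge degenerated-GC progress against the capacity the collection set
// was actually chosen from: young-generation capacity in generational
// mode, whole-heap capacity otherwise.
bool degen_progress_is_good(std::size_t free_after,
                            std::size_t reference_capacity,
                            double required_ratio) {
  return static_cast<double>(free_after) >=
         required_ratio * static_cast<double>(reference_capacity);
}
```

With the whole heap as the reference, the same amount of reclaimed young-gen memory can look like "no progress" even though the young collection set did fine, which is exactly the spurious upgrade-to-full-GC pattern described in the logs.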
The pull request contains seven additional commits since the last revision: - 8348092: Shenandoah: assert(nk >= _lowest_valid_narrow_klass_id && nk <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710) Reviewed-by: shade - Merge tag 'jdk-25+10' into fix-generational-no-progress-check Added tag jdk-25+10 for changeset a637ccf2 - Merge tag 'jdk-25+9' into fix-generational-no-progress-check Added tag jdk-25+9 for changeset 30f71622 - Add comments suggested by reviewers - Respond to reviewer feedback In testing suggested refinements, I discovered a bug in original implementation. ShenandoahFreeSet::capacity() does not represent the size of young generation. It represents the total size of the young regions that had available memory at the time we most recently rebuilt the ShenandoahFreeSet. I am rerunning the performance tests following this suggested change. - Use freeset to determine goodness of progress As previously implemented, we used the heap size to measure goodness of progress. However, heap size is only appropriate for non-generational Shenandoah. Freeset abstraction works for both. - Use size-of young generation to assess progress Previously, we were using size of heap to asses progress of generational degenerated cycle. But that is not appropriate, because the collection set is chosen based on the size of young generation. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23306/files - new: https://git.openjdk.org/jdk/pull/23306/files/8c610136..0e86c5bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23306&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23306&range=03-04 Stats: 12378 lines in 689 files changed: 8313 ins; 1890 del; 2175 mod Patch: https://git.openjdk.org/jdk/pull/23306.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23306/head:pull/23306 PR: https://git.openjdk.org/jdk/pull/23306 From kdnilsen at openjdk.org Fri Feb 14 01:35:51 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 14 Feb 2025 01:35:51 GMT Subject: RFR: 8348594: Shenandoah: Do not penalize for degeneration when not the fault of triggering heuristic [v6] In-Reply-To: References: Message-ID: > Shenandoah heuristics use a penalty mechanism to cause earlier GC triggers when recent concurrent GC cycles degenerate. Degeneration is a stop-the-world remediation that allows GC to catch up when mutator allocations fail during concurrent GC. The fact that we needed to degenerate indicates that we were overly optimistic in delaying the trigger that starts concurrent GC. > > We have observed that it is common for degenerated GC cycles to cascade upon each other. The condition that caused an initial degenerated cycle is often not fully resolved by the end of that degenerated cycle. For example, the application may be experiencing a phase change and the GC heuristics are not yet attuned to the new behavior. Furthermore, a degenerated GC may exacerbate the problem condition. During the stop-the-world pause imposed by the first degenerated GC, work continues to accumulate in the form of new client requests that are buffered in network sockets until the end of that degenerated GC. > > As originally implemented, each degeneration would "pile on" additional penalties. These penalties cause the GC frequency to continue to increase. 
And the expanding CPU load of GC makes it increasingly difficult for mutator threads to catch up. The large penalties accumulated while we are trying to resolve the problem linger long after the problem condition has been resolved. > > This change does not add further to the degeneration penalties if a new degenerated cycle occurs through no fault of the triggering mechanism. We only add the degeneration penalty if the reason we are now degenerating can be attributed to a consciously late trigger by the heuristic. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - 8348092: Shenandoah: assert(nk >= _lowest_valid_narrow_klass_id && nk <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710) Reviewed-by: shade - Merge tag 'jdk-25+10' into eliminate-no-fault-degen-penalties Added tag jdk-25+10 for changeset a637ccf2 - Merge tag 'jdk-25+9' into eliminate-no-fault-degen-penalties Added tag jdk-25+9 for changeset 30f71622 - Revert "Use generation size to determine expected free" This reverts commit 94a32ebfe5fefcc0e899e09e6fbfc0585c62b4e0. - Respond to reviewer feedback - Use generation size to determine expected free - Respond to reviewer feedback - Fix white space - Remove debug instrumentation - Only penalize heuristic if heuristic responsible If we degenerate through no fault of "late triggering", then do not penalize the heuristic. - ...
and 1 more: https://git.openjdk.org/jdk/compare/961a87d9...0d85e341 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23305/files - new: https://git.openjdk.org/jdk/pull/23305/files/3aabd4db..0d85e341 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23305&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23305&range=04-05 Stats: 12378 lines in 689 files changed: 8313 ins; 1890 del; 2175 mod Patch: https://git.openjdk.org/jdk/pull/23305.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23305/head:pull/23305 PR: https://git.openjdk.org/jdk/pull/23305 From wkemper at openjdk.org Fri Feb 14 01:48:58 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 14 Feb 2025 01:48:58 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v11] In-Reply-To: References: Message-ID: > There are several changes to the operation of Shenandoah's control threads here. > * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. > * The cancellation handling is driven entirely by the cancellation cause > * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed > * The shutdown sequence is simpler > * The generational control thread uses a lock to coordinate updates to the requested cause and generation > * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance > * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles > * The control thread doesn't loop on its own (unless the pacer is enabled). 
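The first two bullets above (a single recorded GCCause driving all cancellation handling, replacing several independent flags) might look roughly like this. The names are invented, and the sketch uses std::atomic rather than HotSpot's Atomic:: wrappers:

```cpp
#include <atomic>
#include <cassert>

enum class GCCause { _no_gc, _allocation_failure,
                     _humongous_allocation_failure, _service_shutdown };

// One atomic cause stands in for the removed "graceful shutdown",
// "alloc failure", "humongous alloc failure" and "preemption" flags.
class CancellationStateSketch {
  std::atomic<GCCause> _cancelled_gc{GCCause::_no_gc};
public:
  // First cancellation wins; a later request must not overwrite the
  // original reason while it is still being handled.
  bool try_cancel(GCCause cause) {
    GCCause expected = GCCause::_no_gc;
    return _cancelled_gc.compare_exchange_strong(expected, cause);
  }
  bool is_cancelled() const { return _cancelled_gc.load() != GCCause::_no_gc; }
  GCCause cause()     const { return _cancelled_gc.load(); }
  void clear()              { _cancelled_gc.store(GCCause::_no_gc); }
};
```

Folding the flags into one value is what lets the handling be "driven entirely by the cancellation cause": there is a single source of truth to branch on, and the compare-exchange keeps concurrent requesters from racing each other.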
> > ## Testing > * jtreg hotspot_gc_shenandoah > * dacapo, extremem, diluvian, specjbb2015, specjvm2018, heapothesys William Kemper has updated the pull request incrementally with one additional commit since the last revision: Old gen bootstrap cycle must make it to init mark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23475/files - new: https://git.openjdk.org/jdk/pull/23475/files/51d09207..82f96090 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=09-10 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23475.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23475/head:pull/23475 PR: https://git.openjdk.org/jdk/pull/23475 From xpeng at openjdk.org Fri Feb 14 06:58:37 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 14 Feb 2025 06:58:37 GMT Subject: RFR: 8338737: Shenandoah: Reset marking bitmaps after the cycle [v8] In-Reply-To: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com> References: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com> Message-ID: > Reset marking bitmaps after the collection cycle; for GenShen, only do this for the young generation. Also choose not to do this for Degen and full GC, since both run at a safepoint and we should leave the safepoint ASAP. > > I have run the same workload for 30s with Shenandoah in generational mode and classic mode; the average time of concurrent reset dropped significantly, since in most cases the bitmap for young gen should have been reset after the previous concurrent cycle finishes if there is no need to preserve bitmap states.
> > GenShen: > Before: > > [33.342s][info][gc,stats ] Concurrent Reset = 0.023 s (a = 1921 us) (n = 12) (lvls, us = 133, 385, 1191, 1836, 8878) > > > After: > > [33.597s][info][gc,stats ] Concurrent Reset = 0.004 s (a = 317 us) (n = 13) (lvls, us = 58, 119, 217, 410, 670) > [33.597s][info][gc,stats ] Concurrent Reset After Collect = 0.018 s (a = 1365 us) (n = 13) (lvls, us = 91, 186, 818, 1836, 3872) > > > Shenandoah: > Before: > > [33.144s][info][gc,stats ] Concurrent Reset = 0.014 s (a = 1067 us) (n = 13) (lvls, us = 139, 277, 898, 1328, 2118) > > After: > > [33.128s][info][gc,stats ] Concurrent Reset = 0.003 s (a = 225 us) (n = 13) (lvls, us = 32, 92, 137, 295, 542) > [33.128s][info][gc,stats ] Concurrent Reset After Collect = 0.009 s (a = 661 us) (n = 13) (lvls, us = 92, 160, 594, 896, 1661) > > > Additional changes: > * Remove `ShenandoahResetBitmapClosure` and `ShenandoahPrepareForMarkClosure`, merge the code with `ShenandoahResetBitmapClosure`, saving one iteration over all the regions. > * Use API `ShenandoahGeneration::parallel_heap_region_iterate_free` to iterate the regions, two benefits from this: > - Underneath it calls `ShenandoahHeap::parallel_heap_region_iterate`, which is faster for very light tasks, see https://bugs.openjdk.org/browse/JDK-8337154 > - `ShenandoahGeneration::parallel_heap_region_iterate_free` decorate the closure with `ShenandoahExcludeRegionClosure`, which simplifies the code in closure. > * When `_do_old_gc_bootstrap is true`, instead of reset mark bitmap for old gen separately, simply reset the global generations, so we don't need walk the all regions twice. > * Clean up FullGC code, remove duplicate code. > > Additional tests: > - [x] CONF=macosx-aarch64-server-fastdebug make test T... Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 24 additional commits since the last revision: - Merge branch 'openjdk:master' into reset-bitmap - Merge branch 'openjdk:master' into reset-bitmap - Merge branch 'openjdk:master' into reset-bitmap - Merge branch 'openjdk:master' into reset-bitmap - Adding condition "!_do_old_gc_bootstrap && !heap->is_concurrent_old_mark_in_progress()" back and address some PR comments - Remove entry_reset_after_collect from ShenandoahOldGC - Remove condition check !_do_old_gc_bootstrap && !heap->is_concurrent_old_mark_in_progress() from op_reset_after_collect - Merge branch 'openjdk:master' into reset-bitmap - Address review comments - Merge branch 'openjdk:master' into reset-bitmap - ... and 14 more: https://git.openjdk.org/jdk/compare/a90afca6...c7e9bff3 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22778/files - new: https://git.openjdk.org/jdk/pull/22778/files/92c63159..c7e9bff3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22778&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22778&range=06-07 Stats: 66728 lines in 3845 files changed: 34690 ins; 16712 del; 15326 mod Patch: https://git.openjdk.org/jdk/pull/22778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22778/head:pull/22778 PR: https://git.openjdk.org/jdk/pull/22778 From thomas.schatzl at oracle.com Fri Feb 14 09:21:27 2025 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 14 Feb 2025 10:21:27 +0100 Subject: G1 AHS [Was: Re: Configurable G1 heap expansion aggressiveness] In-Reply-To: References: <553f4d95-14a4-4736-b10d-02b8bb3af686@oracle.com> <60612080-69DD-46DC-AA5B-ED078C3A3793@kodewerk.com> Message-ID: Hi, On 13.02.25 21:36, Jaroslaw Odzga wrote: > Hi Kirk, > > Thanks for the detailed answer, I appreciate your time on this. > [...] > >> That said, a tuning strategy for G1 is more complicated because the costs of transients is quite different in G1 than it is with the Serial/Parallel collectors. 
But I believe it is achievable using existing flags/structures and the addition of the SoftMaxHeapSize. > Can you share more on how SoftMaxHeapSize fits into this strategy? > Doesn't it require some "controller" that would dynamically adjust > SoftMaxHeapSize at runtime based on signals like GC CPU usage, VM > memory pressure etc?

Giving a rough summary about the system envisioned for G1 (warning: fixed size font needed ahead):

Inputs:

  Min/Max/Initial-
  HeapSize (1)

  CPU based heap
  sizing (2)                                  Current committed heap size

  Min/MaxHeapFree-    ----> Controller ---->
  Ratio (3)
                                              Current target heap size
  CurrentMaxHeap-
  Size (4)

  SoftMaxHeapSize (5)

  "AHS" (6)

(1) Existing, kept.

(2) Improve current CPU based heap sizing; partially done. Largest improvement is to size down the heap based on GCTimeRatio. To some degree this is JDK-8238687; there is a prototype patch for that.

(3) The current function of the flags will be removed, if not the flags themselves. They are just in the way all the time.

(4) New functionality (JDK-8204088; may not be a JEP). Patch from Google available. Allows the user to control max heap size given external direction, using information not known (or impossible to know) by the VM. (I.e. the VM will OOME if going over that, but smaller than the real MaxHeapSize.) Optional.

(5) New functionality (JDK-8236073). A guide for G1 to try "hard" to keep that amount of memory. Will not make the VM go OOME if exceeding that. Prototype patch attached to the CR.

(6) "AHS": Similar to ZGC's efforts to be a good citizen within a given environment, probing currently available memory and adjusting committed size (https://openjdk.org/jeps/8329758).

There are some other, relatively minor issues that should/could be fixed, also collected using the `gc-g1-heap-resizing` label in JIRA (https://bugs.openjdk.org/issues/?jql=labels%20%3D%20gc-g1-heap-resizing). Some of them deal with the unnecessarily aggressive boosting of heap sizing.
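One way to read the diagram is that the controller is a function from the inputs on the left to the sizes on the right. A deliberately naive sketch follows; all names, the struct layout, and the priority order of the clamps are invented here for illustration, not a design commitment:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

struct HeapSizingInputs {
  std::size_t min_heap;       // (1) MinHeapSize
  std::size_t gc_cpu_target;  // (2) size suggested by GC CPU / GCTimeRatio
  std::size_t current_max;    // (4) CurrentMaxHeapSize: hard cap, OOME beyond
  std::size_t soft_max;       // (5) SoftMaxHeapSize: soft goal, 0 = unset
  std::size_t env_available;  // (6) what "AHS" thinks the environment allows
};

// Combine the inputs into a single target heap size for the policy.
std::size_t target_heap_size(const HeapSizingInputs& in) {
  std::size_t target = in.gc_cpu_target;        // start from the GC signal
  if (in.soft_max != 0) {
    target = std::min(target, in.soft_max);     // soft goal only pulls down
  }
  target = std::min(target, in.env_available);  // be a good citizen
  target = std::min(target, in.current_max);    // never past the hard cap
  return std::max(target, in.min_heap);         // never below the minimum
}
```

The real design differs in at least one important way: per (5), SoftMaxHeapSize must still allow allocation above the goal instead of capping it outright, so the actual controller would treat it as a pressure signal rather than the hard min shown here.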
I'm not sure whether there should be user control for the response time. Even if so, I would rather have a more abstract "inertia" knob than directly allowing control of the sample length. Maybe "ZGCPressure" as in the AHS JEP may also control that. (Don't let it bother you that most of the CRs are currently assigned to me. I should unassign myself for the time being because I'm working on something else right now.) Given all these inputs about what the heap size should be, some component, let's call it the "controller", will decide current committed/target/max heap size. Hth, Thomas From thomas.schatzl at oracle.com Fri Feb 14 10:04:13 2025 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 14 Feb 2025 11:04:13 +0100 Subject: G1 AHS [Was: Re: Configurable G1 heap expansion aggressiveness] In-Reply-To: References: <553f4d95-14a4-4736-b10d-02b8bb3af686@oracle.com> <60612080-69DD-46DC-AA5B-ED078C3A3793@kodewerk.com> Message-ID: Hi, On 13.02.25 21:36, Jaroslaw Odzga wrote: > Hi Kirk, > > Thanks for the detailed answer, I appreciate your time on this. > > Would you mind sharing more details on the work you are referring to > (Serial collector changes, the ongoing G1 work) so I could learn more > about it? > It sounds like the "hibernation" feature you are talking about is > different from Azul's CRaC that uses CRIU? Can you elaborate? > >> That said, a tuning strategy for G1 is more complicated because the costs of transients is quite different in G1 than it is with the Serial/Parallel collectors. But I believe it is achievable using existing flags/structures and the addition of the SoftMaxHeapSize. > Can you share more on how SoftMaxHeapSize fits into this strategy? > Doesn't it require some "controller" that would dynamically adjust > SoftMaxHeapSize at runtime based on signals like GC CPU usage, VM > memory pressure etc?
>

Some rough outline of the structure of heap sizing and what is
envisioned for the G1 work; it's fairly straightforward: given some
input, a "controller" mashes all of it together and produces output.
Some ascii-art with a bit more detail:

  Min/MaxInitialHeap-
  size (1)
                                             Current max heap size
  GC CPU based heap
  sizing (2)                                 Current committed size

  Min/MaxHeapFree-   ---->  Controller  ---->
  Ratio (3)
                                             Current target heap size
  CurrentMaxHeapSize (4)

  SoftMaxHeapSize (5)

  AHS (6)

Comments:

(1) will stay
(2) Partially implemented; the main part that is missing is somehow
    giving feedback on shrinking the heap. JDK-8238687 mostly covers
    that, there is a (fairly old now) prototype.
(3) Their functionality will be fairly reduced if not completely
    scrapped. They get in the way all the time, at least given their
    current definition.
(4) Set a current MaxHeapSize, i.e. a maximum heap size after which the
    VM will OOME. Allows setting the max heap size according to input
    that the VM can't detect. Prototype from Google available;
    JDK-8204088 contains a link to some mail thread containing another
    prototype.
(5) End user guiding a current target heap size, i.e. a heap size that
    G1 tries "hard" to follow, but will allow allocation above that.
    JDK-8236073 contains an old prototype.
(6) "AHS": be a nice citizen to other memory consumers in the same
    environment, similar to ZGC's JEP (JDK-8329758):
    - tiny initial heap size, large max heap size
    - startup boost for heap sizing
    - taking into account available memory in the environment
    - "ZGCPressure" - some knob to tune weights between all these
      inputs (i.e. memory/performance, response time/inertia).

More (minor) items in this area are collected using the
"gc-g1-heap-resizing" label
(https://bugs.openjdk.org/issues/?jql=labels%20%3D%20gc-g1-heap-resizing).

(That most of these CRs are assigned is misleading.
That's historical, I'm working on something else for the foreseeable
future.)

The ZGCPressure equivalent should cover all the capabilities covered by
your changes, both the extent of the response as well as the response
time itself.

Hth,
  Thomas

P.S.: I'll respond to the other part of your email(s) in a bit.

From thomas.schatzl at oracle.com Fri Feb 14 10:06:30 2025
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Fri, 14 Feb 2025 11:06:30 +0100
Subject: G1 AHS [Was: Re: Configurable G1 heap expansion aggressiveness]
In-Reply-To:
References: <553f4d95-14a4-4736-b10d-02b8bb3af686@oracle.com>
 <60612080-69DD-46DC-AA5B-ED078C3A3793@kodewerk.com>
Message-ID:

Sorry for effectively sending this email twice; due to some email
client error I thought it had not been sent (and not even been saved
anywhere, hence the rewrite with minor differences).

Apologies,
  Thomas

From thomas.schatzl at oracle.com Fri Feb 14 10:32:24 2025
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Fri, 14 Feb 2025 11:32:24 +0100
Subject: Configurable G1 heap expansion aggressiveness
In-Reply-To:
References: <553f4d95-14a4-4736-b10d-02b8bb3af686@oracle.com>
 <60612080-69DD-46DC-AA5B-ED078C3A3793@kodewerk.com>
Message-ID:

Hi,

On 13.02.25 21:36, Jaroslaw Odzga wrote:
> Hi Kirk,
[...]
>> If I might add, in large homogeneous deployments, you'd think you'd
>> see a one-size-fits-all optimal GC configuration. Unfortunately my
>> look into this has shown that there are often multiple optimal
>> configurations. The only way to combat this is with smarter
>> ergonomics in the runtime.
> Thanks for the insight. I believe this to be true. My claim is that
> for the majority of applications in certain domains (e.g. backend
> services in multi-tenant environments running in the cloud) the
> existing default G1 configuration and ergonomics work well only if the
> max heap size is correctly sized (because of the greedy heap
> expansion).
> Sizing heaps is challenging at scale and the most common result is
> setting max heap too high. This leads to a lot of resource waste that
> is hard to detect and realize for many because "the heap is used".
>
> I guess the question boils down to: is it worth exposing two more
> internal G1 parameters in the "short term" as experimental tunables to
> allow some high-leverage optimizations.
> I think we agree on the cost of doing it (although it might be hard to
> quantify): maintenance, potential misuse, additional complexity.

We (in the Oracle GC team) have had really, really bad experience with
haphazardly exposing functionality using flags. They tend to live
longer than expected, and due to a responsibility for backwards
compatibility they just hinder progress (e.g. in this area:
Min/MaxHeapFreeRatio).

Further complicating matters is that you seem to expect this to be
backported all the way back to JDK 17 ("On the plus side, the change
itself is tiny, very localized and could be trivially backported e.g.
all the way to jdk17."), which is an even greater ask.

> On a benefit side, the initial results suggest we could significantly
> increase bin-packing of JVMs per VM because, when bin-packing, we have
> to account for memory usage spikes due to temporary aggressive heap
> expansions. Rough estimates suggest 30%-60% smaller memory spikes. At
> large scale this could lead to big cost savings with a little effort.

At this time it is only you asking for this, although the general
problem has been known for a long time. Given that, and that a lot of
the issues related to that were filed around 2020 by me
(https://bugs.openjdk.org/issues/?jql=labels%20%3D%20gc-g1-heap-resizing),
after first concerns had been voiced in 2018, it tells me that it has
not been that great of an itch to invest effort into this until now.
Thanks go to Google and MS who are looking into this in some capacity at this time :) > Taking into account that the additional flags do not change existing > default behavior, so they would be completely transparent unless > someone decides to go down the rabbit hole of tuning experimental GC > flags, maybe it is something worth considering? We at Oracle currently think it is better to invest the time and effort, as little as may be, in a proper solution than band-aiding heap sizing another time. So yeah, barring other compelling supportive input/opinions/data, we at Oracle would reject this proposal for jdk 25 (and even more backporting this to jdk17). Hth, Thomas From phh at openjdk.org Fri Feb 14 10:41:13 2025 From: phh at openjdk.org (Paul Hohensee) Date: Fri, 14 Feb 2025 10:41:13 GMT Subject: RFR: 8348594: Shenandoah: Do not penalize for degeneration when not the fault of triggering heuristic [v6] In-Reply-To: References: Message-ID: On Fri, 14 Feb 2025 01:35:51 GMT, Kelvin Nilsen wrote: >> Shenandoah heuristics use a penalty mechanism to cause earlier GC triggers when recent concurrent GC cycles degenerate. Degeneration is a stop-the-world remediation that allows GC to catch up when mutator allocations fail during concurrent GC. The fact that we needed to degenerate indicates that we were overly optimistic in delaying the trigger that starts concurrent GC. >> >> We have observed that it is common for degenerated GC cycles to cascade upon each other. The condition that caused an initial degenerated cycle is often not fully resolved by the end of that degenerated cycle. For example, the application may be experiencing a phase change and the GC heuristics are not yet attuned to the new behavior. Furthermore, a degenerated GC may exacerbate the problem condition. 
During the stop-the-world pause imposed by the first degenerated GC, work continues to accumulate in the form of new client requests that are buffered in network sockets until the end of that degenerated GC. >> >> As originally implemented, each degeneration would "pile on" additional penalties. These penalties cause the GC frequency to continue to increase. And the expanding CPU load of GC makes it increasingly difficult for mutator threads to catch up. The large penalties accumulated while we are trying to resolve the problem linger long after the problem condition has been resolved. >> >> This change does not add further to the degeneration penalties if a new degenerated cycle occurs through no fault of the triggering mechanism. We only add the degeneration penalty if the reason we are now degenerating can be attributed to a consciously late trigger by the heuristic. > > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - 8348092: Shenandoah: assert(nk >= _lowest_valid_narrow_klass_id && nk <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710) > > Reviewed-by: shade > - Merge tag 'jdk-25+10' into eliminate-no-fault-degen-penalties > > Added tag jdk-25+10 for changeset a637ccf2 > - Merge tag 'jdk-25+9' into eliminate-no-fault-degen-penalties > > Added tag jdk-25+9 for changeset 30f71622 > - Revert "Use generation size to determine expected free" > > This reverts commit 94a32ebfe5fefcc0e899e09e6fbfc0585c62b4e0. > - Respond to reviewer feedback > - Use generation size to determine expected free > - Respond to reviewer feedback > - Fix white space > - Remove debug instrumentation > - Only penalize heuristic if heuristic responsible > > If we degenerate through no fault of "late triggering", then do not > penalize the heuristic.
> - ... and 1 more: https://git.openjdk.org/jdk/compare/0a5fcdaf...0d85e341 Marked as reviewed by phh (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23305#pullrequestreview-2617399579 From phh at openjdk.org Fri Feb 14 10:46:15 2025 From: phh at openjdk.org (Paul Hohensee) Date: Fri, 14 Feb 2025 10:46:15 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check [v5] In-Reply-To: References: Message-ID: On Fri, 14 Feb 2025 01:18:01 GMT, Kelvin Nilsen wrote: >> At the end of a degenerated GC, we check whether sufficient progress has been made in replenishing the memory available to the mutator. The test for good progress is implemented as a ratio of free memory against the total heap size. >> >> For generational Shenandoah, the ratio should be computed against the size of the young generation. Note that the size of the generational collection set is based on young generation size rather than total heap size. >> >> This issue first identified in GenShen GC logs, where a large number of degenerated cycles were upgrading to full GC because the free-set progress was short of desired by 10-25%. > > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - 8348092: Shenandoah: assert(nk >= _lowest_valid_narrow_klass_id && nk <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710) > > Reviewed-by: shade > - Merge tag 'jdk-25+10' into fix-generational-no-progress-check > > Added tag jdk-25+10 for changeset a637ccf2 > - Merge tag 'jdk-25+9' into fix-generational-no-progress-check > > Added tag jdk-25+9 for changeset 30f71622 > - Add comments suggested by reviewers > - Respond to reviewer feedback > > In testing suggested refinements, I discovered a bug in original > implementation. 
ShenandoahFreeSet::capacity() does not represent the > size of young generation. It represents the total size of the young > regions that had available memory at the time we most recently rebuilt > the ShenandoahFreeSet. > > I am rerunning the performance tests following this suggested change. > - Use freeset to determine goodness of progress > > As previously implemented, we used the heap size to measure goodness of > progress. However, heap size is only appropriate for non-generational > Shenandoah. Freeset abstraction works for both. > - Use size-of young generation to assess progress > > Previously, we were using size of heap to assess progress of generational > degenerated cycle. But that is not appropriate, because the collection > set is chosen based on the size of young generation. Marked as reviewed by phh (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23306#pullrequestreview-2617414395 From duke at openjdk.org Fri Feb 14 15:14:15 2025 From: duke at openjdk.org (duke) Date: Fri, 14 Feb 2025 15:14:15 GMT Subject: RFR: 8348595: GenShen: Fix generational free-memory no-progress check [v5] In-Reply-To: References: Message-ID: On Fri, 14 Feb 2025 01:18:01 GMT, Kelvin Nilsen wrote: >> At the end of a degenerated GC, we check whether sufficient progress has been made in replenishing the memory available to the mutator. The test for good progress is implemented as a ratio of free memory against the total heap size. >> >> For generational Shenandoah, the ratio should be computed against the size of the young generation. Note that the size of the generational collection set is based on young generation size rather than total heap size. >> >> This issue was first identified in GenShen GC logs, where a large number of degenerated cycles were upgrading to full GC because the free-set progress was short of desired by 10-25%. > > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - 8348092: Shenandoah: assert(nk >= _lowest_valid_narrow_klass_id && nk <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710) > > Reviewed-by: shade > - Merge tag 'jdk-25+10' into fix-generational-no-progress-check > > Added tag jdk-25+10 for changeset a637ccf2 > - Merge tag 'jdk-25+9' into fix-generational-no-progress-check > > Added tag jdk-25+9 for changeset 30f71622 > - Add comments suggested by reviewers > - Respond to reviewer feedback > > In testing suggested refinements, I discovered a bug in original > implementation. ShenandoahFreeSet::capacity() does not represent the > size of young generation. It represents the total size of the young > regions that had available memory at the time we most recently rebuilt > the ShenandoahFreeSet. > > I am rerunning the performance tests following this suggested change. > - Use freeset to determine goodness of progress > > As previously implemented, we used the heap size to measure goodness of > progress. However, heap size is only appropriate for non-generational > Shenandoah. Freeset abstraction works for both. > - Use size-of young generation to assess progress > > Previously, we were using size of heap to assess progress of generational > degenerated cycle. But that is not appropriate, because the collection > set is chosen based on the size of young generation. @kdnilsen Your change (at version 0e86c5bd1ae330522daa9652f7843342fef9f83e) is now ready to be sponsored by a Committer.
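The shape of the progress check being discussed can be sketched as
follows (the divisor and all names are invented for illustration; the
real heuristic lives in the Shenandoah sources and uses its own
threshold):

```cpp
#include <cstddef>

// Fraction of capacity that must be free after a degenerated GC for it
// to count as "good progress". The divisor is invented for this sketch.
static const size_t kProgressDivisor = 10;

static bool good_progress(size_t free_bytes, size_t reference_capacity) {
  return free_bytes >= reference_capacity / kProgressDivisor;
}

// Non-generational mode: the whole heap is the reference capacity,
// which is what the code did before this fix.
static bool good_progress_global(size_t free_bytes, size_t heap_bytes) {
  return good_progress(free_bytes, heap_bytes);
}

// Generational mode (the fix): the collection set is chosen from the
// young generation, so progress is judged against young capacity only.
static bool good_progress_generational(size_t free_bytes, size_t young_bytes) {
  return good_progress(free_bytes, young_bytes);
}
```

With a large heap and a comparatively small young generation, the same
amount of reclaimed memory fails the whole-heap check but passes the
young-generation check, which is exactly the mismatch the PR describes.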
------------- PR Comment: https://git.openjdk.org/jdk/pull/23306#issuecomment-2659585115 From duke at openjdk.org Fri Feb 14 15:16:25 2025 From: duke at openjdk.org (duke) Date: Fri, 14 Feb 2025 15:16:25 GMT Subject: RFR: 8348594: Shenandoah: Do not penalize for degeneration when not the fault of triggering heuristic [v6] In-Reply-To: References: Message-ID: On Fri, 14 Feb 2025 01:35:51 GMT, Kelvin Nilsen wrote: >> Shenandoah heuristics use a penalty mechanism to cause earlier GC triggers when recent concurrent GC cycles degenerate. Degeneration is a stop-the-world remediation that allows GC to catch up when mutator allocations fail during concurrent GC. The fact that we needed to degenerate indicates that we were overly optimistic in delaying the trigger that starts concurrent GC. >> >> We have observed that it is common for degenerated GC cycles to cascade upon each other. The condition that caused an initial degenerated cycle is often not fully resolved by the end of that degenerated cycle. For example, the application may be experiencing a phase change and the GC heuristics are not yet attuned to the new behavior. Furthermore, a degenerated GC may exacerbate the problem condition. During the stop-the-world pause imposed by the first degenerated GC, work continues to accumulate in the form of new client requests that are buffered in network sockets until the end of that degenerated GC. >> >> As originally implemented, each degeneration would "pile on" additional penalties. These penalties cause the GC frequency to continue to increase. And the expanding CPU load of GC makes it increasingly difficult for mutator threads to catchup. The large penalties accumulated while we are trying to resolve the problem linger long after the problem condition has been resolved. >> >> This change does not add further to the degeneration penalties if a new degenerated cycle occurs through no fault of the triggering mechanism. 
We only add the degeneration penalty if the reason we are now degenerating can be attributed to a consciously late trigger by the heuristic. > > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - 8348092: Shenandoah: assert(nk >= _lowest_valid_narrow_klass_id && nk <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710) > > Reviewed-by: shade > - Merge tag 'jdk-25+10' into eliminate-no-fault-degen-penalties > > Added tag jdk-25+10 for changeset a637ccf2 > - Merge tag 'jdk-25+9' into eliminate-no-fault-degen-penalties > > Added tag jdk-25+9 for changeset 30f71622 > - Revert "Use generation size to determine expected free" > > This reverts commit 94a32ebfe5fefcc0e899e09e6fbfc0585c62b4e0. > - Respond to reviewer feedback > - Use generation size to determine expected free > - Respond to reviewer feedback > - Fix white space > - Remove debug instrumentation > - Only penalize heuristic if heuristic responsible > > If we degenerate through no fault of "late triggering", then do not > penalize the heuristic. > - ... and 1 more: https://git.openjdk.org/jdk/compare/a1bdb2da...0d85e341 @kdnilsen Your change (at version 0d85e34107d74e471a791e0523cabc403e02178c) is now ready to be sponsored by a Committer. 
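The penalty policy described in this PR summary, as a rough sketch (the
type, method names, and increment are all invented; this is not the
real ShenandoahHeuristics code):

```cpp
#include <cstddef>

// Sketch of the triggering-penalty policy. The accumulated penalty
// makes future GC triggers fire earlier.
struct DegenPenalty {
  size_t value = 0;

  // Charge a penalty only when the degeneration can be attributed to a
  // consciously late trigger by this heuristic. A degenerated cycle
  // that cascades from an earlier one is not the trigger's fault, and
  // piling on penalties would only drive GC frequency higher.
  void on_degenerated_cycle(bool trigger_was_late) {
    if (trigger_was_late) {
      value += 10;  // invented increment
    }
  }
};
```

The design point is that the penalty now tracks fault, not mere
occurrence, so cascading degenerations stop compounding it.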
------------- PR Comment: https://git.openjdk.org/jdk/pull/23305#issuecomment-2659590224 From kdnilsen at openjdk.org Fri Feb 14 16:43:17 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 14 Feb 2025 16:43:17 GMT Subject: Integrated: 8348594: Shenandoah: Do not penalize for degeneration when not the fault of triggering heuristic In-Reply-To: References: Message-ID: On Fri, 24 Jan 2025 18:18:25 GMT, Kelvin Nilsen wrote: > Shenandoah heuristics use a penalty mechanism to cause earlier GC triggers when recent concurrent GC cycles degenerate. Degeneration is a stop-the-world remediation that allows GC to catch up when mutator allocations fail during concurrent GC. The fact that we needed to degenerate indicates that we were overly optimistic in delaying the trigger that starts concurrent GC. > > We have observed that it is common for degenerated GC cycles to cascade upon each other. The condition that caused an initial degenerated cycle is often not fully resolved by the end of that degenerated cycle. For example, the application may be experiencing a phase change and the GC heuristics are not yet attuned to the new behavior. Furthermore, a degenerated GC may exacerbate the problem condition. During the stop-the-world pause imposed by the first degenerated GC, work continues to accumulate in the form of new client requests that are buffered in network sockets until the end of that degenerated GC. > > As originally implemented, each degeneration would "pile on" additional penalties. These penalties cause the GC frequency to continue to increase. And the expanding CPU load of GC makes it increasingly difficult for mutator threads to catchup. The large penalties accumulated while we are trying to resolve the problem linger long after the problem condition has been resolved. > > This change does not add further to the degeneration penalties if a new degenerated cycle occurs through no fault of the triggering mechanism. 
We only add the degeneration penalty if the reason we are now degenerating can be attributed to a consciously late trigger by the heuristic. This pull request has now been integrated. Changeset: 38322407 Author: Kelvin Nilsen URL: https://git.openjdk.org/jdk/commit/38322407cd1664115e975c7fd9cb61e40d9557b5 Stats: 82 lines in 12 files changed: 78 ins; 0 del; 4 mod 8348594: Shenandoah: Do not penalize for degeneration when not the fault of triggering heuristic Reviewed-by: phh, wkemper ------------- PR: https://git.openjdk.org/jdk/pull/23305 From kdnilsen at openjdk.org Fri Feb 14 16:44:16 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 14 Feb 2025 16:44:16 GMT Subject: Integrated: 8348595: GenShen: Fix generational free-memory no-progress check In-Reply-To: References: Message-ID: On Fri, 24 Jan 2025 18:30:02 GMT, Kelvin Nilsen wrote: > At the end of a degenerated GC, we check whether sufficient progress has been made in replenishing the memory available to the mutator. The test for good progress is implemented as a ratio of free memory against the total heap size. > > For generational Shenandoah, the ratio should be computed against the size of the young generation. Note that the size of the generational collection set is based on young generation size rather than total heap size. > > This issue was first identified in GenShen GC logs, where a large number of degenerated cycles were upgrading to full GC because the free-set progress was short of desired by 10-25%. This pull request has now been integrated.
Changeset: ba6c9659 Author: Kelvin Nilsen URL: https://git.openjdk.org/jdk/commit/ba6c96599aac1a6c08cb66c611474f83bbc9b260 Stats: 27 lines in 5 files changed: 21 ins; 0 del; 6 mod 8348595: GenShen: Fix generational free-memory no-progress check Reviewed-by: phh, xpeng ------------- PR: https://git.openjdk.org/jdk/pull/23306 From wkemper at openjdk.org Fri Feb 14 17:43:48 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 14 Feb 2025 17:43:48 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v12] In-Reply-To: References: Message-ID: > There are several changes to the operation of Shenandoah's control threads here. > * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. > * The cancellation handling is driven entirely by the cancellation cause > * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed > * The shutdown sequence is simpler > * The generational control thread uses a lock to coordinate updates to the requested cause and generation > * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance > * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles > * The control thread doesn't loop on its own (unless the pacer is enabled). > > ## Testing > * jtreg hotspot_gc_shenandoah > * dacapo, extremem, diluvian, specjbb2015, specjvm2018, heapothesys William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 30 commits: - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads - Old gen bootstrap cycle must make it to init mark - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads - Improve message for assertion - Make shutdown safer for threads requesting (or expecting) gc - Do not accept requests if control thread is terminating - Notify waiters when control thread terminates - Add event for control thread state changes - Fix shutdown livelock error - Fix includes - ... and 20 more: https://git.openjdk.org/jdk/compare/ba6c9659...915ffbda ------------- Changes: https://git.openjdk.org/jdk/pull/23475/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=11 Stats: 892 lines in 18 files changed: 285 ins; 281 del; 326 mod Patch: https://git.openjdk.org/jdk/pull/23475.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23475/head:pull/23475 PR: https://git.openjdk.org/jdk/pull/23475 From kdnilsen at openjdk.org Fri Feb 14 18:37:54 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 14 Feb 2025 18:37:54 GMT Subject: RFR: 8349766: GenShen: Bad progress after degen does not always need full gc [v3] In-Reply-To: References: Message-ID: > In generational mode, only upgrade to full GC from degenerated GC if we've done two degenerated cycles in a row and both indicated bad progress. Otherwise, start another concurrent GC, which will most likely degenerate also. But this degenerated cycle will reclaim floating garbage within the young generation much more quickly than a full GC would have done. 
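The upgrade rule described in this PR summary could be sketched like so
(illustrative only; the function and parameter names are invented, not
the actual Shenandoah sources):

```cpp
// Upgrade a degenerated GC to full GC in generational mode only when
// it is the second consecutive degenerated cycle with bad progress;
// otherwise retry a concurrent cycle, which reclaims floating garbage
// in the young generation faster than a full GC would.
static bool should_upgrade_to_full_gc(bool generational,
                                      bool bad_progress_now,
                                      bool bad_progress_before) {
  if (!generational) {
    return bad_progress_now;  // single-generation behavior (sketch)
  }
  return bad_progress_now && bad_progress_before;
}
```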
Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - Fix typo in merge conflict resolution - 8348595: GenShen: Fix generational free-memory no-progress check Reviewed-by: phh, xpeng ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23552/files - new: https://git.openjdk.org/jdk/pull/23552/files/8d662e10..0f5051a1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23552&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23552&range=01-02 Stats: 27 lines in 5 files changed: 21 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23552.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23552/head:pull/23552 PR: https://git.openjdk.org/jdk/pull/23552 From kdnilsen at openjdk.org Fri Feb 14 18:51:31 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 14 Feb 2025 18:51:31 GMT Subject: RFR: 8349766: GenShen: Bad progress after degen does not always need full gc [v4] In-Reply-To: References: Message-ID: > In generational mode, only upgrade to full GC from degenerated GC if we've done two degenerated cycles in a row and both indicated bad progress. Otherwise, start another concurrent GC, which will most likely degenerate also. But this degenerated cycle will reclaim floating garbage within the young generation much more quickly than a full GC would have done. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: - Merge master - Fix typo in merge conflict resolution - 8348595: GenShen: Fix generational free-memory no-progress check Reviewed-by: phh, xpeng - 8348092: Shenandoah: assert(nk >= _lowest_valid_narrow_klass_id && nk <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710) Reviewed-by: shade - Merge tag 'jdk-25+10' into defer-generational-full-gc Added tag jdk-25+10 for changeset a637ccf2 - Be less eager to upgrade degen to full gc ------------- Changes: https://git.openjdk.org/jdk/pull/23552/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23552&range=03 Stats: 20 lines in 2 files changed: 17 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23552.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23552/head:pull/23552 PR: https://git.openjdk.org/jdk/pull/23552 From kdnilsen at openjdk.org Fri Feb 14 19:41:16 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 14 Feb 2025 19:41:16 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v12] In-Reply-To: References: Message-ID: On Fri, 14 Feb 2025 17:43:48 GMT, William Kemper wrote: >> There are several changes to the operation of Shenandoah's control threads here. >> * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. 
>> * The cancellation handling is driven entirely by the cancellation cause >> * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed >> * The shutdown sequence is simpler >> * The generational control thread uses a lock to coordinate updates to the requested cause and generation >> * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance >> * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles >> * The control thread doesn't loop on its own (unless the pacer is enabled). >> >> ## Testing >> * jtreg hotspot_gc_shenandoah >> * dacapo, extremem, diluvian, specjbb2015, specjvm2018, heapothesys > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 30 commits: > > - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads > - Old gen bootstrap cycle must make it to init mark > - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads > - Improve message for assertion > - Make shutdown safer for threads requesting (or expecting) gc > - Do not accept requests if control thread is terminating > - Notify waiters when control thread terminates > - Add event for control thread state changes > - Fix shutdown livelock error > - Fix includes > - ... and 20 more: https://git.openjdk.org/jdk/compare/ba6c9659...915ffbda Thank you. This looks very clean to me. src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 98: > 96: } > 97: > 98: // In case any threads are waiting for a cycle to happen, let them know it isn't. maybe "it isn't happening", or "it won't happen". ------------- Marked as reviewed by kdnilsen (Author). 
PR Review: https://git.openjdk.org/jdk/pull/23475#pullrequestreview-2618641262 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956626274 From dlong at openjdk.org Fri Feb 14 23:47:12 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 14 Feb 2025 23:47:12 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: <8JUfZWRWpAhYCG9qO7Jxfj5k6d1iUNpRdawRn-veiBQ=.4b70e450-14e5-429a-aa95-08599673afba@github.com> On Wed, 5 Feb 2025 14:41:39 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. 
>> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 385: > 383: > 384: HeapWord* ParallelScavengeHeap::mem_allocate_old_gen(size_t size) { > 385: if (!should_alloc_in_eden(size) || GCLocker::is_active()) { I don't understand why we are checking is_active() here. The value is not reliable if we aren't at a safepoint, and iterating over all threads seems expensive. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1956881801 From ysr at openjdk.org Sat Feb 15 01:55:20 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Sat, 15 Feb 2025 01:55:20 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v12] In-Reply-To: References: Message-ID: On Fri, 14 Feb 2025 17:43:48 GMT, William Kemper wrote: >> There are several changes to the operation of Shenandoah's control threads here. >> * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. 
>> * The cancellation handling is driven entirely by the cancellation cause >> * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed >> * The shutdown sequence is simpler >> * The generational control thread uses a lock to coordinate updates to the requested cause and generation >> * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance >> * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles >> * The control thread doesn't loop on its own (unless the pacer is enabled). >> >> ## Testing >> * jtreg hotspot_gc_shenandoah >> * dacapo, extremem, diluvian, specjbb2015, specjvm2018, heapothesys > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 30 commits: > > - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads > - Old gen bootstrap cycle must make it to init mark > - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads > - Improve message for assertion > - Make shutdown safer for threads requesting (or expecting) gc > - Do not accept requests if control thread is terminating > - Notify waiters when control thread terminates > - Add event for control thread state changes > - Fix shutdown livelock error > - Fix includes > - ... and 20 more: https://git.openjdk.org/jdk/compare/ba6c9659...915ffbda Flushing the comments at EOD; will complete review later. src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.hpp line 188: > 186: > 187: bool should_start_gc() override; > 188: bool resume_old_cycle(); Documentation comment please, especially explaining the return value. For things that may return `false` and not do anything, it's better to use `try_` prefix. 
In fact, the method doesn't actually resume the cycle, but checks if we are in a state such that we should resume it. So, I'd name it `should_resume_old_cycle()`, consistent with the name `should_start_gc()` for the previous method. src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.hpp line 101: > 99: || cause == GCCause::_shenandoah_allocation_failure_evac > 100: || cause == GCCause::_shenandoah_humongous_allocation_failure; > 101: } Would it make sense to move this implementation also to the .cpp file like the other static `is_...` methods below? src/hotspot/share/gc/shenandoah/shenandoahController.hpp line 42: > 40: volatile size_t _allocs_seen; > 41: shenandoah_padding(1); > 42: volatile size_t _gc_id; // A monotonically increasing GC count. src/hotspot/share/gc/shenandoah/shenandoahController.hpp line 66: > 64: > 65: // This cancels the collection cycle and has an option to block > 66: // until another cycle runs and clears the alloc failure gc flag. But "the alloc failure gc flag" is gone above. The comment should be updated as well. A public API's description should avoid talking about its internal implementation details here. It's OK to talk about implementation details in the implementation of the method, not in the header spec here. src/hotspot/share/gc/shenandoah/shenandoahController.hpp line 87: > 85: // Returns the internal gc count used by the control thread. Probably > 86: // doesn't need to be exposed. > 87: size_t get_gc_id(); As far as I can tell, there's a single non-internal (public) use of this, and it's from `ShenandoahOldGeneration::handle_failed_promotion()` where it's being used for reducing logging data. If we do need to expose this through a public API, I'd elide the "Probably doesn't need to be exposed", and update the comment to: // Return the value of a monotonically increasing GC count, maintained by the control thread.
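The contract suggested in this comment (a count that only the control thread advances and that any thread may read) can be sketched with standard C++ atomics. This is illustrative only; the class and method names are hypothetical stand-ins, not Shenandoah code:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical sketch of the suggested get_gc_id() contract: the counter is
// monotonically increasing, advanced only by the control thread, and safely
// readable from any thread. Not HotSpot code.
class ControlThreadSketch {
public:
    // Control thread only: called at the start of each GC cycle.
    void on_cycle_start() { _gc_id.fetch_add(1, std::memory_order_relaxed); }

    // Any thread: returns a value that never decreases over time.
    std::size_t get_gc_id() const { return _gc_id.load(std::memory_order_relaxed); }

private:
    std::atomic<std::size_t> _gc_id{0};
};
```

Relaxed ordering suffices in this sketch because readers (like the logging-reduction use mentioned above) only need a monotone value, not synchronization with the cycle's side effects.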
src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 64: > 62: void ShenandoahGenerationalControlThread::run_service() { > 63: > 64: const int64_t wait_ms = ShenandoahPacing ? ShenandoahControlIntervalMin : 0; So we are supporting ShenandoahPacing with GenShen (at least till we pull it in the future), but don't want to enable it by default, correct? src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 64: > 62: private: > 63: // This lock is used to coordinate setting the _requested_gc_cause, _requested generation > 64: // and _gc_mode. It is important that these be changed together and have a consistent view. In that case, for ease of maintenance, I'd move the declaration of all of the 3 data members that this lock protects next to this lock, either immediately preceding or immediately succeeding its declaration in the body of this class. Are these data members always both read and written under this lock? If so, then `_gc_mode` below doesn't need to be defined `volatile`. src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 88: > 86: uint _age_period; > 87: > 88: // The mode is read frequently by requesting threads and only ever written by the control thread. Do requesting threads lock the mutex when reading? I am trying to square your comment that it's protected by the mutex with the field being declared `volatile`. src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 450: > 448: > 449: void cancel_concurrent_mark(); > 450: bool cancel_gc(GCCause::Cause cause); // Returns true if and only if cancellation request was successfully communicated.
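A "first cancellation wins" protocol matching the comment quoted above can be sketched as follows. This is an illustrative stand-in, not the Shenandoah implementation; the enum values and class name are hypothetical:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sketch of cancel_gc() semantics: the cancellation cause is
// installed with a compare-and-swap, so the call returns true if and only if
// this caller's request is the one actually communicated. Not HotSpot code.
enum class Cause { None, AllocFailure, HumongousAllocFailure, Shutdown };

class CancellationState {
public:
    // Returns true if and only if this call installed `cause`; if a
    // cancellation is already pending, the new request is not communicated.
    bool cancel_gc(Cause cause) {
        Cause expected = Cause::None;
        return _cancelled.compare_exchange_strong(expected, cause);
    }

    bool is_cancelled() const { return _cancelled.load() != Cause::None; }
    Cause cause() const { return _cancelled.load(); }

    // Reset by the cycle that serviced the request.
    void clear() { _cancelled.store(Cause::None); }

private:
    std::atomic<Cause> _cancelled{Cause::None};
};
```

Recording the cause atomically this way also gives every thread a consistent answer to "why was this cycle cancelled" without extra flags, which is the simplification the PR description claims.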
------------- PR Review: https://git.openjdk.org/jdk/pull/23475#pullrequestreview-2618968208 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956962731 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956965579 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956944585 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956918529 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956929734 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956981955 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956816268 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956824150 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956952381 From ysr at openjdk.org Sat Feb 15 01:55:21 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Sat, 15 Feb 2025 01:55:21 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v12] In-Reply-To: References: Message-ID: On Sat, 15 Feb 2025 01:10:51 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 30 commits: >> >> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads >> - Old gen bootstrap cycle must make it to init mark >> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads >> - Improve message for assertion >> - Make shutdown safer for threads requesting (or expecting) gc >> - Do not accept requests if control thread is terminating >> - Notify waiters when control thread terminates >> - Add event for control thread state changes >> - Fix shutdown livelock error >> - Fix includes >> - ... 
and 20 more: https://git.openjdk.org/jdk/compare/ba6c9659...915ffbda > > src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.hpp line 101: > >> 99: || cause == GCCause::_shenandoah_allocation_failure_evac >> 100: || cause == GCCause::_shenandoah_humongous_allocation_failure; >> 101: } > > Would it make sense to move this implementation also to the .cpp file like the other static `is_...` methods below? Or is this guaranteeing inlining into the caller's body, which you might prefer for the callers? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956968182 From ysr at openjdk.org Sat Feb 15 01:55:22 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Sat, 15 Feb 2025 01:55:22 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v12] In-Reply-To: References: Message-ID: On Fri, 14 Feb 2025 19:28:01 GMT, Kelvin Nilsen wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 30 commits: >> >> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads >> - Old gen bootstrap cycle must make it to init mark >> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads >> - Improve message for assertion >> - Make shutdown safer for threads requesting (or expecting) gc >> - Do not accept requests if control thread is terminating >> - Notify waiters when control thread terminates >> - Add event for control thread state changes >> - Fix shutdown livelock error >> - Fix includes >> - ... and 20 more: https://git.openjdk.org/jdk/compare/ba6c9659...915ffbda > > src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 98: > >> 96: } >> 97: >> 98: // In case any threads are waiting for a cycle to happen, let them know it isn't. > > maybe "it isn't happening", or "it won't happen". This is interesting. 
If GC is stopping prior to shutting down the VM, is there any point in notifying these waiting threads. Why not let them wait, and quietly shut things down? Are there JCK or other tests that would fail in that case? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956979069 From ayang at openjdk.org Sat Feb 15 11:44:44 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Sat, 15 Feb 2025 11:44:44 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v3] In-Reply-To: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: > Here is an attempt to simplify GCLocker implementation for Serial and Parallel. > > GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. > > The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). 
Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. > > Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. > > Test: tier1-8 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge branch 'master' into gclocker - review - Merge branch 'master' into gclocker - review - Merge branch 'master' into gclocker - gclocker ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23367/files - new: https://git.openjdk.org/jdk/pull/23367/files/1b6f908b..005087e3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23367&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23367&range=01-02 Stats: 18668 lines in 693 files changed: 10993 ins; 4307 del; 3368 mod Patch: https://git.openjdk.org/jdk/pull/23367.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23367/head:pull/23367 PR: https://git.openjdk.org/jdk/pull/23367 From ayang at openjdk.org Sat Feb 15 11:49:13 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Sat, 15 Feb 2025 11:49:13 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: <8JUfZWRWpAhYCG9qO7Jxfj5k6d1iUNpRdawRn-veiBQ=.4b70e450-14e5-429a-aa95-08599673afba@github.com> References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> <8JUfZWRWpAhYCG9qO7Jxfj5k6d1iUNpRdawRn-veiBQ=.4b70e450-14e5-429a-aa95-08599673afba@github.com> Message-ID: On Fri, 14 Feb 2025 23:44:25 GMT, Dean Long wrote: >> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a 
rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into gclocker >> - review >> - Merge branch 'master' into gclocker >> - gclocker > > src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 385: > >> 383: >> 384: HeapWord* ParallelScavengeHeap::mem_allocate_old_gen(size_t size) { >> 385: if (!should_alloc_in_eden(size) || GCLocker::is_active()) { > > I don't understand why we are checking is_active() here. The value is not reliable if we aren't at a safepoint, and iterating over all threads seems expensive. The intention is to avoid blocking Java threads if possible, but there is no fundamental reason why it has to be this way. I have removed it for simpler (or less magical) code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23367#discussion_r1957098815 From ayang at openjdk.org Sat Feb 15 11:52:14 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Sat, 15 Feb 2025 11:52:14 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Fri, 7 Feb 2025 06:43:25 GMT, David Holmes wrote: > But in any case adding the atomic load to in_critical() is basically a no-op (loads are atomic) so no need to add a new API just to do that. I have removed the new API, and switched to use the original `in_critical()`. > I think that to get the correct "dekker duality" in this code you do need to have full fences between the stores and loads, not just a storeload barrier. I have changed to `fence` for simpler reasoning. (In our codebase, the two have the same implementation, so perf should be the same.)
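For readers following the fence discussion above, the Dekker-style protocol can be sketched with standard C++ atomics. This is a simplified illustration, not the proposed HotSpot code: a per-mutator flag and seq_cst accesses stand in for the thread-local state and full fences discussed in the PR.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Simplified sketch of the store-fence-load ("Dekker duality") protocol
// discussed above; not the actual HotSpot implementation. Each side stores
// its own flag and then, with full-fence semantics, loads the other side's
// flag. At most one side can miss the other's store: either the mutator
// sees the pending safepoint and backs off, or the GC side sees the mutator
// inside the critical region and waits.
struct Mutator {
    std::atomic<bool> in_critical{false};
};

std::atomic<bool> safepoint_pending{false};

// Mutator side: returns true if the critical region was entered, or false
// if a pending safepoint was observed and the caller must take a slow path.
bool try_enter_critical(Mutator& m) {
    m.in_critical.store(true, std::memory_order_seq_cst);     // store, then fence ...
    if (safepoint_pending.load(std::memory_order_seq_cst)) {  // ... then load
        m.in_critical.store(false, std::memory_order_seq_cst); // back off
        return false;
    }
    return true;
}

void exit_critical(Mutator& m) {
    m.in_critical.store(false, std::memory_order_seq_cst);
}

// GC side: announce the safepoint, then spin until every mutator has left
// its critical region. The atomic load keeps the compiler from hoisting the
// check out of the loop (the concern raised about in_critical() above).
void wait_for_critical_regions(std::vector<Mutator>& mutators) {
    safepoint_pending.store(true, std::memory_order_seq_cst);
    for (Mutator& m : mutators) {
        while (m.in_critical.load(std::memory_order_seq_cst)) {
            std::this_thread::yield();
        }
    }
}
```

With plain, unfenced accesses either side could observe stale values and both could proceed at once; the full fence between each store and the following load is what rules that out.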
------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2660886740 From jarek.odzga at gmail.com Mon Feb 17 20:51:43 2025 From: jarek.odzga at gmail.com (Jaroslaw Odzga) Date: Mon, 17 Feb 2025 12:51:43 -0800 Subject: G1 AHS [Was: Re: Configurable G1 heap expansion aggressiveness] In-Reply-To: References: <553f4d95-14a4-4736-b10d-02b8bb3af686@oracle.com> <60612080-69DD-46DC-AA5B-ED078C3A3793@kodewerk.com> Message-ID: Thank you Thomas and Kirk for your time discussing this and sharing the resources! I also noticed that Kirk dropped this JEP over the weekend ;-) https://openjdk.org/jeps/8350152 The work that is currently ongoing is very exciting. It sounds like the right decision to focus on a zero-configuration solution that just works; exposing existing tunables as flags would certainly not help with that. Best regards, Jaroslaw On Fri, Feb 14, 2025 at 2:07 AM Thomas Schatzl wrote: > > Sorry for effectively sending this email twice, due to some email client > error I thought it had not been sent (and not even been saved anywhere, > so the rewrite with minor differences). > > Apologies, > Thomas > From dholmes at openjdk.org Tue Feb 18 01:28:20 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 18 Feb 2025 01:28:20 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Sat, 15 Feb 2025 11:49:53 GMT, Albert Mingkun Yang wrote: > I have removed the new API, and switched to use the original in_critical().
You still need it to be an atomic load together with whatever compiler barriers that implies, otherwise it can be hoisted out of the spin-loop: while (cur->in_critical()) { spin_yield.wait(); } ------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2664351707 From dholmes at openjdk.org Tue Feb 18 01:40:22 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 18 Feb 2025 01:40:22 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v3] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Sat, 15 Feb 2025 11:44:44 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). 
Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker The GCLocker behaviour would be easier to discern if all of the `thread` parameters/variables that have to be the current thread were actually called `current` (with a few suitably placed assertions). ------------- PR Review: https://git.openjdk.org/jdk/pull/23367#pullrequestreview-2622203973 From aboldtch at openjdk.org Tue Feb 18 06:26:12 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 18 Feb 2025 06:26:12 GMT Subject: RFR: 8346280: C2: implement late barrier elision for G1 [v6] In-Reply-To: <2jrzusvVl-XI8K734YlChq4ObRX75yovTq7mWTf8ZlA=.0e75781a-5d52-4919-ad28-c5e91ec3a47f@github.com> References: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> <2jrzusvVl-XI8K734YlChq4ObRX75yovTq7mWTf8ZlA=.0e75781a-5d52-4919-ad28-c5e91ec3a47f@github.com> Message-ID: On Fri, 7 Feb 2025 14:48:51 GMT, Roberto Castañeda Lozano wrote: >> G1 barriers can be safely elided from writes to newly allocated objects as long as no safepoint is taken between the allocation and the write.
This changeset complements early G1 barrier elision (performed by the platform-independent phases of C2, and limited to writes immediately following allocations) with a more general elision pass done at a late stage. >> >> The late elision pass exploits that it runs at a stage where the relative order of memory accesses and safepoints cannot change anymore to elide barriers from initialization writes that do not immediately follow the corresponding allocation, e.g. in conditional initialization writes: >> >> >> o = new MyObject(); >> if (...) { >> o.myField = ...; // barrier elided only after this changeset >> // (assuming no safepoint in the if condition) >> } >> >> >> or in initialization writes placed after exception-throwing checks: >> >> >> o = new MyObject(); >> if (...) { >> throw new Exception(""); >> } >> o.myField = ...; // barrier elided only after this changeset >> // (assuming no safepoint in the above if condition) >> >> >> These patterns are commonly found in Java code, e.g. in the core libraries: >> >> - [conditional initialization](https://github.com/openjdk/jdk/blob/25fecaaf87400af535c242fe50296f1f89ceeb16/src/java.base/share/classes/java/lang/String.java#L4850), or >> >> - [initialization after exception-throwing checks (in the superclass constructor)](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/nio/X-Buffer.java.template#L324). 
>> >> The optimization also enhances barrier elision for array initialization writes, for example eliding barriers from small array initialization loops (for which safepoints are not inserted): >> >> >> Object[] a = new Object[...]; >> for (int i = 0; i < a.length; i++) { >> a[i] = ...; // barrier elided only after this changeset >> } >> >> >> or eliding barriers from array initialization writes with unknown array index: >> >> >> Object[] a = new Object[...]; >> a[index] = ...; // barrier elided only after this changeset >> >> >> The logic used to perform this additional barrier elision is a subset of a pre-existing ZGC-specific optimization. This changeset simply reuses the relevant subset (barrier elision for writes to newly-allocated objects) by extracting the core of the optimization logic from `zBarrierSetC2.cpp` into the GC-shared file `barrierSetC2.cpp`. The functions `block_has_safepoint`, `block_inde... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Disable test IR checks for cases where barrier elision analysis fails to elide on s390 The ZGC refactoring lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23235#pullrequestreview-2622554105 From rcastanedalo at openjdk.org Tue Feb 18 08:29:12 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 18 Feb 2025 08:29:12 GMT Subject: RFR: 8346280: C2: implement late barrier elision for G1 [v6] In-Reply-To: References: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> <2jrzusvVl-XI8K734YlChq4ObRX75yovTq7mWTf8ZlA=.0e75781a-5d52-4919-ad28-c5e91ec3a47f@github.com> Message-ID: On Tue, 18 Feb 2025 06:23:20 GMT, Axel Boldt-Christmas wrote: > The ZGC refactoring lgtm. Thanks, Axel!
------------- PR Comment: https://git.openjdk.org/jdk/pull/23235#issuecomment-2664921534 From iwalulya at openjdk.org Tue Feb 18 09:12:50 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 18 Feb 2025 09:12:50 GMT Subject: RFR: 8349688: Crash assert(!_g1h->heap_region_containing(p)->is_survivor()) failed: Should have filtered out from-newly allocated survivor references already [v2] In-Reply-To: <8-jj1o3jeNZuavFIbg4VCh_oMXEJxP8vxqnz01cOVgg=.d7ea8a07-d5e6-46cc-9bdf-0f2e5c22632a@github.com> References: <8-jj1o3jeNZuavFIbg4VCh_oMXEJxP8vxqnz01cOVgg=.d7ea8a07-d5e6-46cc-9bdf-0f2e5c22632a@github.com> Message-ID: > Hi, > > Please review this fix to the bug in setting a region index to the optional cset. The crash happens because the incorrect `_index_in_opt_cset` refers to a region that has already been evacuated. > > Testing: tier5-common-apps Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into JDK-8349688-OptionalRegions - set_index_in_opt_cset correctly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23568/files - new: https://git.openjdk.org/jdk/pull/23568/files/e66a43e1..7c5147a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23568&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23568&range=00-01 Stats: 11779 lines in 500 files changed: 8619 ins; 1331 del; 1829 mod Patch: https://git.openjdk.org/jdk/pull/23568.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23568/head:pull/23568 PR: https://git.openjdk.org/jdk/pull/23568 From ayang at openjdk.org Tue Feb 18 09:20:57 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 18 Feb 2025 09:20:57 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v4] In-Reply-To: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: > Here is an attempt to simplify GCLocker implementation for Serial and Parallel. > > GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. > > The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. 
This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. > > Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. > > Test: tier1-8 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Merge branch 'master' into gclocker - review - Merge branch 'master' into gclocker - review - Merge branch 'master' into gclocker - review - Merge branch 'master' into gclocker - gclocker ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23367/files - new: https://git.openjdk.org/jdk/pull/23367/files/005087e3..78f91d4f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23367&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23367&range=02-03 Stats: 1461 lines in 94 files changed: 848 ins; 266 del; 347 mod Patch: https://git.openjdk.org/jdk/pull/23367.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23367/head:pull/23367 PR: https://git.openjdk.org/jdk/pull/23367 From ayang at openjdk.org Tue Feb 18 09:24:15 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 18 Feb 2025 09:24:15 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Tue, 18 
Feb 2025 01:25:12 GMT, David Holmes wrote: > You still need it to be an atomic load Then, maybe the logic is easier to read if the "atomic" access is visible directly from that context, instead of hiding it inside `in_critical`. Therefore, it probably makes more sense to introduce a new API. WDYT? > The GCLocker behaviour would be easier to discern ... Renamed to `current_thread` in `enter`, `exit`, and `enter_slow`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2665044825 From tschatzl at openjdk.org Tue Feb 18 10:18:20 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 18 Feb 2025 10:18:20 GMT Subject: RFR: 8346280: C2: implement late barrier elision for G1 [v6] In-Reply-To: <2jrzusvVl-XI8K734YlChq4ObRX75yovTq7mWTf8ZlA=.0e75781a-5d52-4919-ad28-c5e91ec3a47f@github.com> References: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> <2jrzusvVl-XI8K734YlChq4ObRX75yovTq7mWTf8ZlA=.0e75781a-5d52-4919-ad28-c5e91ec3a47f@github.com> Message-ID: On Fri, 7 Feb 2025 14:48:51 GMT, Roberto Castañeda Lozano wrote: >> G1 barriers can be safely elided from writes to newly allocated objects as long as no safepoint is taken between the allocation and the write. This changeset complements early G1 barrier elision (performed by the platform-independent phases of C2, and limited to writes immediately following allocations) with a more general elision pass done at a late stage. >> >> The late elision pass exploits that it runs at a stage where the relative order of memory accesses and safepoints cannot change anymore to elide barriers from initialization writes that do not immediately follow the corresponding allocation, e.g. in conditional initialization writes: >> >> >> o = new MyObject(); >> if (...)
{ >> o.myField = ...; // barrier elided only after this changeset >> // (assuming no safepoint in the if condition) >> } >> >> >> or in initialization writes placed after exception-throwing checks: >> >> >> o = new MyObject(); >> if (...) { >> throw new Exception(""); >> } >> o.myField = ...; // barrier elided only after this changeset >> // (assuming no safepoint in the above if condition) >> >> >> These patterns are commonly found in Java code, e.g. in the core libraries: >> >> - [conditional initialization](https://github.com/openjdk/jdk/blob/25fecaaf87400af535c242fe50296f1f89ceeb16/src/java.base/share/classes/java/lang/String.java#L4850), or >> >> - [initialization after exception-throwing checks (in the superclass constructor)](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/nio/X-Buffer.java.template#L324). >> >> The optimization also enhances barrier elision for array initialization writes, for example eliding barriers from small array initialization loops (for which safepoints are not inserted): >> >> >> Object[] a = new Object[...]; >> for (int i = 0; i < a.length; i++) { >> a[i] = ...; // barrier elided only after this changeset >> } >> >> >> or eliding barriers from array initialization writes with unknown array index: >> >> >> Object[] a = new Object[...]; >> a[index] = ...; // barrier elided only after this changeset >> >> >> The logic used to perform this additional barrier elision is a subset of a pre-existing ZGC-specific optimization. This changeset simply reuses the relevant subset (barrier elision for writes to newly-allocated objects) by extracting the core of the optimization logic from `zBarrierSetC2.cpp` into the GC-shared file `barrierSetC2.cpp`. The functions `block_has_safepoint`, `block_inde... 
> > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Disable test IR checks for cases where barrier elision analysis fails to elide on s390 Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23235#pullrequestreview-2623060636 From rcastanedalo at openjdk.org Tue Feb 18 10:26:20 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 18 Feb 2025 10:26:20 GMT Subject: Integrated: 8346280: C2: implement late barrier elision for G1 In-Reply-To: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> References: <3eOK-nFYQbKn1w81CWHUY14wk0gyWMT5ULHgZ-ih5-w=.8be51ad0-f412-4aad-b73a-436ccdb8181a@github.com> Message-ID: On Wed, 22 Jan 2025 15:20:19 GMT, Roberto Casta?eda Lozano wrote: > G1 barriers can be safely elided from writes to newly allocated objects as long as no safepoint is taken between the allocation and the write. This changeset complements early G1 barrier elision (performed by the platform-independent phases of C2, and limited to writes immediately following allocations) with a more general elision pass done at a late stage. > > The late elision pass exploits that it runs at a stage where the relative order of memory accesses and safepoints cannot change anymore to elide barriers from initialization writes that do not immediately follow the corresponding allocation, e.g. in conditional initialization writes: > > > o = new MyObject(); > if (...) { > o.myField = ...; // barrier elided only after this changeset > // (assuming no safepoint in the if condition) > } > > > or in initialization writes placed after exception-throwing checks: > > > o = new MyObject(); > if (...) 
{ > throw new Exception(""); > } > o.myField = ...; // barrier elided only after this changeset > // (assuming no safepoint in the above if condition) > > > These patterns are commonly found in Java code, e.g. in the core libraries: > > - [conditional initialization](https://github.com/openjdk/jdk/blob/25fecaaf87400af535c242fe50296f1f89ceeb16/src/java.base/share/classes/java/lang/String.java#L4850), or > > - [initialization after exception-throwing checks (in the superclass constructor)](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/nio/X-Buffer.java.template#L324). > > The optimization also enhances barrier elision for array initialization writes, for example eliding barriers from small array initialization loops (for which safepoints are not inserted): > > > Object[] a = new Object[...]; > for (int i = 0; i < a.length; i++) { > a[i] = ...; // barrier elided only after this changeset > } > > > or eliding barriers from array initialization writes with unknown array index: > > > Object[] a = new Object[...]; > a[index] = ...; // barrier elided only after this changeset > > > The logic used to perform this additional barrier elision is a subset of a pre-existing ZGC-specific optimization. This changeset simply reuses the relevant subset (barrier elision for writes to newly-allocated objects) by extracting the core of the optimization logic from `zBarrierSetC2.cpp` into the GC-shared file `barrierSetC2.cpp`. The functions `block_has_safepoint`, `block_index`, `look_through_node`, `is_{undefined|unknown|concrete}`, `get_base_and_offset`, `is_array... This pull request has now been integrated. 
Changeset: 8193e0d5 Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/8193e0d53ac806d6974e2aacc7b7476aeb52a5fd Stats: 957 lines in 9 files changed: 669 ins; 264 del; 24 mod 8346280: C2: implement late barrier elision for G1 Reviewed-by: tschatzl, aboldtch, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/23235 From tschatzl at openjdk.org Tue Feb 18 10:51:11 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 18 Feb 2025 10:51:11 GMT Subject: RFR: 8349688: Crash assert(!_g1h->heap_region_containing(p)->is_survivor()) failed: Should have filtered out from-newly allocated survivor references already [v2] In-Reply-To: References: <8-jj1o3jeNZuavFIbg4VCh_oMXEJxP8vxqnz01cOVgg=.d7ea8a07-d5e6-46cc-9bdf-0f2e5c22632a@github.com> Message-ID: On Tue, 18 Feb 2025 09:12:50 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this fix to the bug in setting a region index to the optional cset. The crash happens because the incorrect `_index_in_opt_cset` refers to a region that has already been evacuated. >> >> Testing: tier5-common-apps > > Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into JDK-8349688-OptionalRegions > - set_index_in_opt_cset correctly Seems good, thanks for catching this. However, it would be nice to rename the PR and CR to something understandable, like "G1: Wrong initial optional region index when selecting candidates from retained regions" or so.
src/hotspot/share/gc/g1/g1CollectionSet.cpp line 486: > 484: void G1CollectionSet::select_candidates_from_retained(double time_remaining_ms) { > 485: uint num_initial_regions = 0; > 486: uint prev_num_optional_regions = _optional_groups.num_regions(); It would be great if the code in `G1CollectionSet::select_candidates_from_marking` would check that the number of optional regions is zero as well when initializing its `num_optional_regions`. ------------- PR Review: https://git.openjdk.org/jdk/pull/23568#pullrequestreview-2623141503 PR Review Comment: https://git.openjdk.org/jdk/pull/23568#discussion_r1959497709 From iwalulya at openjdk.org Tue Feb 18 19:13:57 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 18 Feb 2025 19:13:57 GMT Subject: RFR: 8349688: G1: Wrong initial optional region index when selecting candidates from retained regions [v3] In-Reply-To: <8-jj1o3jeNZuavFIbg4VCh_oMXEJxP8vxqnz01cOVgg=.d7ea8a07-d5e6-46cc-9bdf-0f2e5c22632a@github.com> References: <8-jj1o3jeNZuavFIbg4VCh_oMXEJxP8vxqnz01cOVgg=.d7ea8a07-d5e6-46cc-9bdf-0f2e5c22632a@github.com> Message-ID: > Hi, > > Please review this fix to the bug in setting a region index to the optional cset. The crash happens because the incorrect `_index_in_opt_cset` refers to a region that has already been evacuated. 
> > Testing: tier5-common-apps Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Thomas suggestion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23568/files - new: https://git.openjdk.org/jdk/pull/23568/files/7c5147a2..7fdf01fc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23568&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23568&range=01-02 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23568.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23568/head:pull/23568 PR: https://git.openjdk.org/jdk/pull/23568 From iwalulya at openjdk.org Tue Feb 18 19:13:58 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 18 Feb 2025 19:13:58 GMT Subject: RFR: 8349688: G1: Wrong initial optional region index when selecting candidates from retained regions [v2] In-Reply-To: References: <8-jj1o3jeNZuavFIbg4VCh_oMXEJxP8vxqnz01cOVgg=.d7ea8a07-d5e6-46cc-9bdf-0f2e5c22632a@github.com> Message-ID: On Tue, 18 Feb 2025 10:47:05 GMT, Thomas Schatzl wrote: >> Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge remote-tracking branch 'upstream/master' into JDK-8349688-OptionalRegions >> - set_index_in_opt_cset correctly > > src/hotspot/share/gc/g1/g1CollectionSet.cpp line 486: > >> 484: void G1CollectionSet::select_candidates_from_retained(double time_remaining_ms) { >> 485: uint num_initial_regions = 0; >> 486: uint prev_num_optional_regions = _optional_groups.num_regions(); > > It would be great if the code in `G1CollectionSet::select_candidates_from_marking` would check that the number of optional regions is zero as well when initializing its `num_optional_regions`. Added, running it through testing. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23568#discussion_r1960061849 From kdnilsen at openjdk.org Tue Feb 18 19:28:28 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 18 Feb 2025 19:28:28 GMT Subject: RFR: 8349766: GenShen: Bad progress after degen does not always need full gc [v5] In-Reply-To: References: Message-ID: > In generational mode, only upgrade to full GC from degenerated GC if we've done two degenerated cycles in a row and both indicated bad progress. Otherwise, start another concurrent GC, which will most likely degenerate also. But this degenerated cycle will reclaim floating garbage within the young generation much more quickly than a full GC would have done. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - Merge branch 'master' of https://git.openjdk.org/jdk into defer-generational-full-gc - Merge master - Fix typo in merge conflict resolution - 8348595: GenShen: Fix generational free-memory no-progress check Reviewed-by: phh, xpeng - 8348092: Shenandoah: assert(nk >= _lowest_valid_narrow_klass_id && nk <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710) Reviewed-by: shade - Merge tag 'jdk-25+10' into defer-generational-full-gc Added tag jdk-25+10 for changeset a637ccf2 - Be less eager to upgrade degen to full gc ------------- Changes: https://git.openjdk.org/jdk/pull/23552/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23552&range=04 Stats: 20 lines in 2 files changed: 17 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23552.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23552/head:pull/23552 PR: https://git.openjdk.org/jdk/pull/23552 From kdnilsen at openjdk.org Tue Feb 18 19:49:54 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 18 Feb 2025 19:49:54 GMT Subject: RFR: 8347804: GenShen: Crash with small GCCardSizeInBytes and small Java heap 
In-Reply-To: References: Message-ID: On Thu, 30 Jan 2025 18:55:53 GMT, Kelvin Nilsen wrote: > Original implementation was not robust to overriding of CardSizeInBytes, especially to smaller values. This fixes that issue. Internal pipelines reveal a regression. Changing to draft mode while I chase this down. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23373#issuecomment-2625332964 From kdnilsen at openjdk.org Tue Feb 18 19:49:54 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 18 Feb 2025 19:49:54 GMT Subject: RFR: 8347804: GenShen: Crash with small GCCardSizeInBytes and small Java heap Message-ID: Original implementation was not robust to overriding of CardSizeInBytes, especially to smaller values. This fixes that issue. ------------- Commit messages: - 8348092: Shenandoah: assert(nk >= _lowest_valid_narrow_klass_id && nk <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710) - Merge tag 'jdk-25+10' into fix-small-card-size - Remove SIZE_FORMAT usage - Merge tag 'jdk-25+9' into fix-small-card-size - Remove debug instrumentation - Fix several bookkeeping errors - Revert "Remove debug instrumentation" - Remove debug instrumentation - Use snprintf instead of sprintf - Add a jtreg test for small card size - ... 
and 1 more: https://git.openjdk.org/jdk/compare/a637ccf2...7120cdf3 Changes: https://git.openjdk.org/jdk/pull/23373/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23373&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347804 Stats: 107 lines in 6 files changed: 79 ins; 4 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/23373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23373/head:pull/23373 PR: https://git.openjdk.org/jdk/pull/23373 From duke at openjdk.org Wed Feb 19 06:21:33 2025 From: duke at openjdk.org (sli-x) Date: Wed, 19 Feb 2025 06:21:33 GMT Subject: RFR: 8340434: Excessive Young GCs Triggered by CodeCache GC Threshold Message-ID: The trigger of _codecache_GC_threshold in CodeCache::gc_on_allocation is the key to this problem.

if (used_ratio > threshold) {
  // After threshold is reached, scale it by free_ratio so that more aggressive
  // GC is triggered as we approach code cache exhaustion
  threshold *= free_ratio;
}
// If code cache has been allocated without any GC at all, let's make sure
// it is eventually invoked to avoid trouble.
if (allocated_since_last_ratio > threshold) {
  // In case the GC is concurrent, we make sure only one thread requests the GC.
  if (Atomic::cmpxchg(&_unloading_threshold_gc_requested, false, true) == false) {
    log_info(codecache)("Triggering threshold (%.3f%%) GC due to allocating %.3f%% since last unloading (%.3f%% used -> %.3f%% used)",
                        threshold * 100.0, allocated_since_last_ratio * 100.0, last_used_ratio * 100.0, used_ratio * 100.0);
    Universe::heap()->collect(GCCause::_codecache_GC_threshold);
  }
}

Here, with a limited codecache size, the free_ratio will get lower and lower (and so will the threshold) if no methods can be swept, which leads to more and more frequent collections. Since each collection happens in a stop-the-world pause, the overall performance of the GC is also degraded.

So a simple solution is to delete the scaling logic here. However, I think there lie some problems worth further exploring.
There're two options to control the code cache sweeper, StartAggressiveSweepingAt and SweeperThreshold. StartAggressiveSweepingAt is a sweep triggered when little space is left in the codeCache and does little harm. However, SweeperThreshold, first introduced by [JDK-8244660](https://bugs.openjdk.org/browse/JDK-8244660), was designed for a regular sweep of the codecache, back when the codeCache sweeper and heap collection were still independent. After [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) and some related patches, the old codeCache sweeper mechanism was merged into concurrent heap collection. So the code cache sweeper heuristics and the unloading behavior are now guaranteed by the concurrent collection. There are no longer any "zombie" methods to be counted. Considering it will introduce lots of useless collection jobs, I think SweeperThreshold should be deleted now. ------------- Commit messages: - remove SweeperThreshold and set it to Obselete Changes: https://git.openjdk.org/jdk/pull/21084/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21084&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340434 Stats: 55 lines in 14 files changed: 1 ins; 53 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21084.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21084/head:pull/21084 PR: https://git.openjdk.org/jdk/pull/21084 From tschatzl at openjdk.org Wed Feb 19 06:21:33 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 19 Feb 2025 06:21:33 GMT Subject: RFR: 8340434: Excessive Young GCs Triggered by CodeCache GC Threshold In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 08:43:50 GMT, sli-x wrote: > Here with the limited codecache size, the free_ratio will get lower and lower (so as the threshold) if no methods can be swept and thus leads to a more and more frequent collection behavior. Since the collection happens in stw, the whole performance of gc will also be degraded. > >So a simple solution is to delete the scaling logic here.
However, I think here lies some problems worth further exploring. > >There're two options to control a code cache sweeper, StartAggressiveSweepingAt and SweeperThreshold. StartAggressiveSweepingAt is a sweeper triggered for little space in codeCache and does little harm. However, SweeperThreshold, first introduced by [JDK-8244660](https://bugs.openjdk.org/browse/JDK-8244660), was designed for a regular sweep for codecache, when codeCache sweeper and heap collection are actually individual. After [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) and some patches related, the old mechanism of codeCache sweeper is merged into a concurrent heap collection. So the Code cache sweeper heuristics and the unloading behavior will be promised by the concurrent collection. There's no longer any "zombie" methods to be counted. Considering it will introduce lots of useless collection jobs, I think SweeperThreshold should be deleted now.

I think the general concern pointed out by the code

> // After threshold is reached, scale it by free_ratio so that more aggressive
> // GC is triggered as we approach code cache exhaustion

is still valid. How this is implemented also makes some sense: occupancy changes are the trigger for collections, and the emptier the code cache is, the larger the change allowed before trying to clean it out. It tries to limit code cache memory usage by doing increasingly frequent collections the more occupied the code cache becomes, i.e. some kind of backpressure on code cache usage. Your use case of limiting the code cache size (and setting initial == max) seems to be a relatively unusual one to me, and apparently does not fit that model, as it seems that you set the code cache size close to actual max usage. Removing `SweeperThreshold` would affect the regular case as well in a significant way (allocate until bumping into the `StartAggressiveSweepingAt` threshold). I do not think removing this part of the heuristic is good (or desired at all).
Maybe an alternative could be to only not do this heuristic part in your case; and even then I am not sure that waiting until hitting the `StartAggressiveSweepingAt` threshold is a good idea; it may be too late to avoid disabling the compiler at least temporarily. And even then, as long as the memory usage keeps being larger than the threshold, this will result in continuous code cache sweeps (_every time_ _any_ memory is allocated in the code cache). From the [JDK-8244660](https://bugs.openjdk.org/browse/JDK-8244660) CR: > This is because users with different sized code caches might want different thresholds. (Otherwise there would be no way to control the sweepers intensity). Which means that one could just take that suggestion literally and not only change the initial/max code cache size but also that threshold in your use case. Stepping back a little, this situation very much resembles issues with G1's `InitiatingHeapOccupancyPercent` pre [JDK-8136677](https://bugs.openjdk.org/browse/JDK-8136677) where a one-size-fits-all value also did not work, and many, many people tuned `InitiatingHeapOccupancyPercent` manually in the past. Maybe a similar mechanism at least taking the actual code cache allocation rate into account ("when will the current watermark be hit?") would be preferable to replace both options (note that since I'm not an expert in code cache, there may be other reasons to clean out the code cache than just the occupancy threshold)? Thomas ------------- PR Comment: https://git.openjdk.org/jdk/pull/21084#issuecomment-2383475220 From robilad at openjdk.org Wed Feb 19 06:21:33 2025 From: robilad at openjdk.org (Dalibor Topic) Date: Wed, 19 Feb 2025 06:21:33 GMT Subject: RFR: 8340434: Excessive Young GCs Triggered by CodeCache GC Threshold In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 08:43:50 GMT, sli-x wrote: > The trigger of _codecache_GC_threshold in CodeCache::gc_on_allocation is the key to this problem.
>
> if (used_ratio > threshold) {
>   // After threshold is reached, scale it by free_ratio so that more aggressive
>   // GC is triggered as we approach code cache exhaustion
>   threshold *= free_ratio;
> }
> // If code cache has been allocated without any GC at all, let's make sure
> // it is eventually invoked to avoid trouble.
> if (allocated_since_last_ratio > threshold) {
>   // In case the GC is concurrent, we make sure only one thread requests the GC.
>   if (Atomic::cmpxchg(&_unloading_threshold_gc_requested, false, true) == false) {
>     log_info(codecache)("Triggering threshold (%.3f%%) GC due to allocating %.3f%% since last unloading (%.3f%% used -> %.3f%% used)",
>                         threshold * 100.0, allocated_since_last_ratio * 100.0, last_used_ratio * 100.0, used_ratio * 100.0);
>     Universe::heap()->collect(GCCause::_codecache_GC_threshold);
>   }
> }
>
> Here with the limited codecache size, the free_ratio will get lower and lower (so as the threshold) if no methods can be swept and thus leads to a more and more frequent collection behavior. Since the collection happens in stw, the whole performance of gc will also be degraded.
>
> So a simple solution is to delete the scaling logic here. However, I think here lies some problems worth further exploring.
>
> There're two options to control a code cache sweeper, StartAggressiveSweepingAt and SweeperThreshold. StartAggressiveSweepingAt is a sweeper triggered for little space in codeCache and does little harm. However, SweeperThreshold, first introduced by [JDK-8244660](https://bugs.openjdk.org/browse/JDK-8244660), was designed for a regular sweep for codecache, when codeCache sweeper and heap collection are actually individual. After [JDK-8290025](https://bugs.openjdk.org/browse/JDK-8290025) and some patches related, the old mechanism of codeCache sweeper is merged into a concurrent heap collection. So the Code cache sweeper heuristics and the unloading behavior will be promised by the concurrent collection.
There's no longer any "zombie" methods to be counted. Considering it will introduce lots of useless collection jobs, I think SweeperThreshold should be deleted now. Hi, please send an e-mail to dalibor.topic at oracle.com so that I can verify your account in Skara. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21084#issuecomment-2427338142 From dholmes at openjdk.org Wed Feb 19 06:31:53 2025 From: dholmes at openjdk.org (David Holmes) Date: Wed, 19 Feb 2025 06:31:53 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v2] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Tue, 18 Feb 2025 09:21:18 GMT, Albert Mingkun Yang wrote: > Then, maybe the logic is easier to read if the "atomic" access is visible directly from that context, instead of hiding it inside in_critical. Therefore, it probably makes more sense to introduce a new API. WDYT? Okay. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2667619218 From ayang at openjdk.org Wed Feb 19 13:01:55 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 19 Feb 2025 13:01:55 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v4] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Tue, 18 Feb 2025 09:20:57 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. 
However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker All suggestions/comments are addressed. Tier1-8 pass. It's ready for another round of review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2668584881 From thomas.schatzl at oracle.com Wed Feb 19 13:16:38 2025 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 19 Feb 2025 14:16:38 +0100 Subject: RFC: G1 as default collector (for real this time) Message-ID: <74d05686-9c57-4262-881d-31c269f34bc5@oracle.com> Hi all,

there have been some recent discussions around making G1 the default for all use-cases, both internally at Oracle and at the OpenJDK Committers Workshop. With this e-mail I want to bring this subject to a wider audience to gather feedback around potential problems with such a move.

As you all may know, G1 is the default collector in the Hotspot VM. However, in some situations, some say in (too) many situations, the VM selects Serial GC "by default" anyway. :) From what I understand, there are the following reasons to keep Serial GC _as default option_ in the context of "small" environments:

* throughput: G1's large write barrier supports the argument that its throughput is noticeably worse. Ongoing efforts ([1], which we plan for JDK 25) show that the difference is going to be much smaller, if it ever was that large. Further, as soon as Serial GC runs for longer, this advantage diminishes a lot due to full collections and can result in G1 actually performing better.

* (native) memory footprint: G1 has made great strides in native memory usage. In the past, particularly remembered sets were of concern, but their memory usage has been reduced significantly over the past few years. E.g. with the above change, the entire young gen remembered set is also managed on the card table, exactly like in Serial GC. [I would also like to state that I would be surprised if remembered sets, with a recent JDK and G1, are ever an issue with heaps Serial GC targets] Heap management tends to be worse with Serial GC, mostly due to its strict generational boundaries. G1's region based layout avoids wasting memory.

* latency: if this has ever been a disadvantage, Serial GC's full collections are worse compared to G1's incremental collections.

* startup: the time to start up the VM is not that different between these two collectors. Other components are much more relevant here.

* historical inertia: at the time there was a need to select a default, there was nothing but Serial and Parallel GC. JDK 9 simply replaced Parallel GC as default for "server class" machines, probably as the path of least resistance and because of shortcomings known at the time in some of the above areas. Some initial testing showed that Serial GC performs much better than G1 when constraining it to the same environment (single thread, heaps < 1.7g).

At the same time, looking at the current situation from an end user's point of view, it is very confusing to get a different garbage collector depending on the environment, based on some somewhat arguable criteria. This change would also make the expectations ("g1 is default since jdk 9") match the actual behavior.

I am looking forward to hearing your opinion about making G1 unconditionally default.

Thanks,
Thomas

[1] https://bugs.openjdk.org/browse/JDK-8340827

From tschatzl at openjdk.org Wed Feb 19 14:17:53 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 19 Feb 2025 14:17:53 GMT Subject: RFR: 8349688: G1: Wrong initial optional region index when selecting candidates from retained regions [v3] In-Reply-To: References: <8-jj1o3jeNZuavFIbg4VCh_oMXEJxP8vxqnz01cOVgg=.d7ea8a07-d5e6-46cc-9bdf-0f2e5c22632a@github.com> Message-ID: On Tue, 18 Feb 2025 19:13:57 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this fix to the bug in setting a region index to the optional cset. The crash happens because the incorrect `_index_in_opt_cset` refers to a region that has already been evacuated.
>> >> Testing: tier5-common-apps > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Thomas suggestion lgtm ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23568#pullrequestreview-2626954045 From ayang at openjdk.org Wed Feb 19 14:18:05 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 19 Feb 2025 14:18:05 GMT Subject: RFR: 8348171: Refactor GenerationCounters and its subclasses [v5] In-Reply-To: <7otkT63ENoyKzZ29CbYpycLLwL89ARajYg36Mstz4tQ=.fd3c7dcf-5a8b-44be-9205-09e3d160d54d@github.com> References: <7otkT63ENoyKzZ29CbYpycLLwL89ARajYg36Mstz4tQ=.fd3c7dcf-5a8b-44be-9205-09e3d160d54d@github.com> Message-ID: On Tue, 11 Feb 2025 15:28:25 GMT, Albert Mingkun Yang wrote: >> Simple refactoring of removing the use of `virtual` method and use concrete subclasses when needed. >> >> Test: tier1-5 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge branch 'master' into gen-counter > - review > - * some more refactoring > - review > - Merge branch 'master' into gen-counter > - merge > - gen-counter Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23209#issuecomment-2668780554 From ayang at openjdk.org Wed Feb 19 14:18:05 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 19 Feb 2025 14:18:05 GMT Subject: Integrated: 8348171: Refactor GenerationCounters and its subclasses In-Reply-To: References: Message-ID: On Tue, 21 Jan 2025 09:53:07 GMT, Albert Mingkun Yang wrote: > Simple refactoring of removing the use of `virtual` method and use concrete subclasses when needed. > > Test: tier1-5 This pull request has now been integrated. 
Changeset: c6e47fd5 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/c6e47fd5812997e3428249be1c77c60e7b05a5df Stats: 202 lines in 17 files changed: 6 ins; 160 del; 36 mod 8348171: Refactor GenerationCounters and its subclasses Co-authored-by: Thomas Schatzl Reviewed-by: gli, tschatzl, zgu ------------- PR: https://git.openjdk.org/jdk/pull/23209 From ayang at openjdk.org Wed Feb 19 14:22:58 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 19 Feb 2025 14:22:58 GMT Subject: RFR: 8349688: G1: Wrong initial optional region index when selecting candidates from retained regions [v3] In-Reply-To: References: <8-jj1o3jeNZuavFIbg4VCh_oMXEJxP8vxqnz01cOVgg=.d7ea8a07-d5e6-46cc-9bdf-0f2e5c22632a@github.com> Message-ID: On Tue, 18 Feb 2025 19:13:57 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this fix to the bug in setting a region index to the optional cset. The crash happens because the incorrect `_index_in_opt_cset` refers to a region that has already been evacuated. >> >> Testing: tier5-common-apps > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Thomas suggestion Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23568#pullrequestreview-2626971229 From iwalulya at openjdk.org Wed Feb 19 14:29:59 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 19 Feb 2025 14:29:59 GMT Subject: RFR: 8349688: G1: Wrong initial optional region index when selecting candidates from retained regions [v3] In-Reply-To: References: <8-jj1o3jeNZuavFIbg4VCh_oMXEJxP8vxqnz01cOVgg=.d7ea8a07-d5e6-46cc-9bdf-0f2e5c22632a@github.com> Message-ID: On Wed, 19 Feb 2025 14:20:21 GMT, Albert Mingkun Yang wrote: >> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: >> >> Thomas suggestion > > Marked as reviewed by ayang (Reviewer). Thanks @albertnetymk and @tschatzl for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23568#issuecomment-2668813998 From iwalulya at openjdk.org Wed Feb 19 14:30:00 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 19 Feb 2025 14:30:00 GMT Subject: Integrated: 8349688: G1: Wrong initial optional region index when selecting candidates from retained regions In-Reply-To: <8-jj1o3jeNZuavFIbg4VCh_oMXEJxP8vxqnz01cOVgg=.d7ea8a07-d5e6-46cc-9bdf-0f2e5c22632a@github.com> References: <8-jj1o3jeNZuavFIbg4VCh_oMXEJxP8vxqnz01cOVgg=.d7ea8a07-d5e6-46cc-9bdf-0f2e5c22632a@github.com> Message-ID: On Tue, 11 Feb 2025 18:33:42 GMT, Ivan Walulya wrote: > Hi, > > Please review this fix to the bug in setting a region index to the optional cset. The crash happens because the incorrect `_index_in_opt_cset` refers to a region that has already been evacuated. > > Testing: tier5-common-apps This pull request has now been integrated. Changeset: efbad00c Author: Ivan Walulya URL: https://git.openjdk.org/jdk/commit/efbad00c4d7931177ccc5e9bce3b30dfbac94010 Stats: 8 lines in 1 file changed: 6 ins; 0 del; 2 mod 8349688: G1: Wrong initial optional region index when selecting candidates from retained regions Reviewed-by: tschatzl, ayang ------------- PR: https://git.openjdk.org/jdk/pull/23568 From tschatzl at openjdk.org Wed Feb 19 15:06:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 19 Feb 2025 15:06:56 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v4] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Tue, 18 Feb 2025 09:20:57 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). 
JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker Marked as reviewed by tschatzl (Reviewer). 
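The readers-writer scheme summarized in the quoted RFR can be sketched with standard C++ primitives; `std::shared_mutex` here stands in for the actual VM synchronization, and all names are illustrative rather than the real GCLocker/ZJNICritical API:

```cpp
#include <atomic>
#include <shared_mutex>

// Java threads hold the lock in shared mode while inside a JNI critical
// region; the VM thread takes it exclusively before a GC safepoint. By the
// time the exclusive lock is granted, no thread is still inside a critical
// region, so the GC cycle at the safepoint cannot be blocked by GCLocker.
static std::shared_mutex critical_lock;
static std::atomic<int> in_critical{0};

void enter_jni_critical() {          // Java thread side
  critical_lock.lock_shared();
  in_critical.fetch_add(1);
}

void leave_jni_critical() {
  in_critical.fetch_sub(1);
  critical_lock.unlock_shared();
}

bool gc_can_start() {                // VM thread side, before the safepoint
  critical_lock.lock();              // blocks until all readers have left
  bool no_criticals = (in_critical.load() == 0);
  critical_lock.unlock();
  return no_criticals;
}
```

Note that, per the quoted summary, the real patch avoids an atomic on a shared variable in the enter/leave fast path (it uses an existing thread-local variable plus a store-load barrier); the counter above exists only to make the invariant visible in this sketch.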
------------- PR Review: https://git.openjdk.org/jdk/pull/23367#pullrequestreview-2627110375 From shade at openjdk.org Wed Feb 19 20:58:05 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Feb 2025 20:58:05 GMT Subject: RFR: 8350285: Regression caused by ShenandoahLock under extreme contention In-Reply-To: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> References: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> Message-ID: <8ggezlgzCLo_ux2lTJ1UJzVD5VgwSlSo35ze0tQ8XcI=.b4d84743-5e40-4478-82ab-b0de3505e6ab@github.com> On Wed, 19 Feb 2025 15:58:01 GMT, Xiaolong Peng wrote: > We have noticed there is significant regression in at-safepoint time with recent changes made to ShenandoahLock, more specifically this [PR](https://github.com/openjdk/jdk/pull/19570), a local reproducer was written to reproduce the issue, here is the top N at-safepoint time in `ns` comparison: > > Tip: > > 94069776 > 50993550 > 49321667 > 33903446 > 32291313 > 30587810 > 27759958 > 25080997 > 24657404 > 23874338 > > Tip + reverting [PR](https://github.com/openjdk/jdk/pull/19570) > > 58428998 > 44410618 > 30788370 > 20636942 > 15986465 > 15307468 > 9686426 > 9432094 > 7473938 > 6854014 > > Note: command line for the test: > > java -Xms256m -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-ShenandoahPacing -XX:-UseTLAB -Xlog:gc -Xlog:safepoint ~/Alloc.java | grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr > > > With further digging, we found the real problem is more runnable threads after the [PR](https://github.com/openjdk/jdk/pull/19570) causes longer time for VM_Thread to call disarm wait barrier when leaving safepoint. Fixing in the issue in VM_Thread benefits other GCs as well but it is more complicated(see the details here https://bugs.openjdk.org/browse/JDK-8350324). 
> With some tweaks in ShenandoahLock, we could mitigate the regression caused by [PR](https://github.com/openjdk/jdk/pull/19570), also improve the long tails of at-saftpoint time by more than 10x, here is the result from the same test with this changes of this PR: > > > 1890706 > 1222180 > 1042758 > 853157 > 792057 > 785697 > 780627 > 757817 > 740607 > 736646 > 725727 > 725596 > 724106 > > > ### Other test > - [x] `make test TEST=hotspot_gc_shenandoah` > - [x] Tier 2 All right, assuming performance results are good. Consider a minor nit: src/hotspot/share/gc/shenandoah/shenandoahLock.cpp line 71: > 69: while (SafepointSynchronize::is_synchronizing() && > 70: !SafepointMechanism::local_poll_armed(java_thread)) { > 71: short_sleep(); Why not `yield_or_sleep` here? src/hotspot/share/gc/shenandoah/shenandoahLock.hpp line 48: > 46: void yield_or_sleep(int &yields) { > 47: if (yields < 5) { > 48: os::naked_yield(); Need `#include "runtime/os.hpp"` for this. There is likely a transitive dependency now, but it is cleaner to depend explicitly. Or, maybe move this definition to `shenandoahLock.cpp`, it would be even cleaner then, I think. src/hotspot/share/gc/shenandoah/shenandoahLock.hpp line 61: > 59: #else > 60: os::naked_short_nanosleep(100000); > 61: #endif Any context where this is coming from? This looks like from `SpinYield`? If so, should we target `SpinYield::default_sleep_ns=1000`? ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23701#pullrequestreview-2627707591 PR Review Comment: https://git.openjdk.org/jdk/pull/23701#discussion_r1962048735 PR Review Comment: https://git.openjdk.org/jdk/pull/23701#discussion_r1962211656 PR Review Comment: https://git.openjdk.org/jdk/pull/23701#discussion_r1962050750 From xpeng at openjdk.org Wed Feb 19 20:58:04 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 19 Feb 2025 20:58:04 GMT Subject: RFR: 8350285: Regression caused by ShenandoahLock under extreme contention Message-ID: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> We have noticed there is a significant regression in at-safepoint time with recent changes made to ShenandoahLock, more specifically this [PR](https://github.com/openjdk/jdk/pull/19570). A local reproducer was written to reproduce the issue; here are the top N at-safepoint times in `ns` for comparison: Tip: 94069776 50993550 49321667 33903446 32291313 30587810 27759958 25080997 24657404 23874338 Tip + reverting [PR](https://github.com/openjdk/jdk/pull/19570) 58428998 44410618 30788370 20636942 15986465 15307468 9686426 9432094 7473938 6854014 Note: command line for the test: java -Xms256m -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-ShenandoahPacing -XX:-UseTLAB -Xlog:gc -Xlog:safepoint ~/Alloc.java | grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr With further digging, we found the real problem is that the larger number of runnable threads after the [PR](https://github.com/openjdk/jdk/pull/19570) causes a longer time for the VM_Thread to disarm the wait barrier when leaving the safepoint. Fixing the issue in the VM_Thread benefits other GCs as well, but it is more complicated (see the details here https://bugs.openjdk.org/browse/JDK-8350324).
With some tweaks in ShenandoahLock, we could mitigate the regression caused by the [PR](https://github.com/openjdk/jdk/pull/19570) and also improve the long tails of at-safepoint time by more than 10x; here is the result from the same test with the changes of this PR: 1890706 1222180 1042758 853157 792057 785697 780627 757817 740607 736646 725727 725596 724106 ### Other test - [x] `make test TEST=hotspot_gc_shenandoah` - [x] Tier 2 ------------- Commit messages: - Move impl of yield_or_sleep to cpp file - Address review comments - Reset yields count to 0 after short sleep - Merge branch 'openjdk:master' into shenandoah-lock-fix - Put thread to sleep after it yield up to 5 times to contend for ShenandoahLock w/o luck Changes: https://git.openjdk.org/jdk/pull/23701/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23701&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350285 Stats: 16 lines in 2 files changed: 13 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23701.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23701/head:pull/23701 PR: https://git.openjdk.org/jdk/pull/23701 From xpeng at openjdk.org Wed Feb 19 20:58:05 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 19 Feb 2025 20:58:05 GMT Subject: RFR: 8350285: Regression caused by ShenandoahLock under extreme contention In-Reply-To: <8ggezlgzCLo_ux2lTJ1UJzVD5VgwSlSo35ze0tQ8XcI=.b4d84743-5e40-4478-82ab-b0de3505e6ab@github.com> References: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> <8ggezlgzCLo_ux2lTJ1UJzVD5VgwSlSo35ze0tQ8XcI=.b4d84743-5e40-4478-82ab-b0de3505e6ab@github.com> Message-ID: <5au_4m8XLah7rypwO90JKB5C41b7meh_IVRXYuOYveY=.daf14a11-252e-4dbf-846f-acccae09af18@github.com> On Wed, 19 Feb 2025 16:56:14 GMT, Aleksey Shipilev wrote: >> We have noticed there is significant regression in at-safepoint time with recent changes made to ShenandoahLock, more specifically this [PR](https://github.com/openjdk/jdk/pull/19570), a
local reproducer was written to reproduce the issue, here is the top N at-safepoint time in `ns` comparison: >> >> Tip: >> >> 94069776 >> 50993550 >> 49321667 >> 33903446 >> 32291313 >> 30587810 >> 27759958 >> 25080997 >> 24657404 >> 23874338 >> >> Tip + reverting [PR](https://github.com/openjdk/jdk/pull/19570) >> >> 58428998 >> 44410618 >> 30788370 >> 20636942 >> 15986465 >> 15307468 >> 9686426 >> 9432094 >> 7473938 >> 6854014 >> >> Note: command line for the test: >> >> java -Xms256m -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-ShenandoahPacing -XX:-UseTLAB -Xlog:gc -Xlog:safepoint ~/Alloc.java | grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr >> >> >> With further digging, we found the real problem is more runnable threads after the [PR](https://github.com/openjdk/jdk/pull/19570) causes longer time for VM_Thread to call disarm wait barrier when leaving safepoint. Fixing in the issue in VM_Thread benefits other GCs as well but it is more complicated(see the details here https://bugs.openjdk.org/browse/JDK-8350324). >> With some tweaks in ShenandoahLock, we could mitigate the regression caused by [PR](https://github.com/openjdk/jdk/pull/19570), also improve the long tails of at-saftpoint time by more than 10x, here is the result from the same test with this changes of this PR: >> >> >> 1890706 >> 1222180 >> 1042758 >> 853157 >> 792057 >> 785697 >> 780627 >> 757817 >> 740607 >> 736646 >> 725727 >> 725596 >> 724106 >> >> >> ### Other test >> - [x] `make test TEST=hotspot_gc_shenandoah` >> - [x] Tier 2 > > src/hotspot/share/gc/shenandoah/shenandoahLock.cpp line 71: > >> 69: while (SafepointSynchronize::is_synchronizing() && >> 70: !SafepointMechanism::local_poll_armed(java_thread)) { >> 71: short_sleep(); > > Why not `yield_or_sleep` here? It can be `yield_or_sleep` here now, I'll rerun a test to verify, I have updated `yield_or_sleep` to reset the counter after `short_sleep`. 
In our test last year, we noticed a Java thread may run this loop over 20k times in the worst case, and in the older version the `yields` counter won't be reset, so it is possible that after a safepoint some Java thread will only do `short_sleep`, which may increase allocation latency; I don't want waiting on the safepoint poll to impact the allocation path after the safepoint. > src/hotspot/share/gc/shenandoah/shenandoahLock.hpp line 48: > >> 46: void yield_or_sleep(int &yields) { >> 47: if (yields < 5) { >> 48: os::naked_yield(); > > Need `#include "runtime/os.hpp"` for this. There is likely a transitive dependency now, but it is cleaner to depend explicitly. Or, maybe move this definition to `shenandoahLock.cpp`, it would be even cleaner then, I think. Thanks, just moved the definition to `shenandoahLock.cpp`, it can be static as well. > src/hotspot/share/gc/shenandoah/shenandoahLock.hpp line 61: > >> 59: #else >> 60: os::naked_short_nanosleep(100000); >> 61: #endif > > Any context where this is coming from? This looks like from `SpinYield`? If so, should we target `SpinYield::default_sleep_ns=1000`? It is just a magic number we tested, not from `SpinYield`. I chose 100us because a Shenandoah GC pause is usually less than 1ms; I also tested 10us, but 100us was better in the test.
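For reference, the yield-then-sleep backoff being discussed can be sketched in portable C++; the 5-yield bound and the ~100us sleep are the values quoted in this thread, while `std::this_thread` stands in for `os::naked_yield()` / `os::naked_short_nanosleep()`, which this sketch does not use:

```cpp
#include <chrono>
#include <thread>

// Bounded backoff: yield the CPU a few times first (cheap when another
// runnable thread exists), then fall back to a short sleep. Resetting the
// counter after each sleep means a thread that contends again later starts
// over with cheap yields instead of going straight to sleeping.
static void yield_or_sleep(int& yields) {
  if (yields < 5) {
    std::this_thread::yield();
    yields++;
  } else {
    std::this_thread::sleep_for(std::chrono::nanoseconds(100000)); // ~100us
    yields = 0;  // next round of contention begins with yields again
  }
}
```

The reset is the behavioral change discussed above: without it, a thread that exhausted its yields during a long safepoint would keep sleeping on every later contention, adding latency to the allocation path.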
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23701#discussion_r1962088642 PR Review Comment: https://git.openjdk.org/jdk/pull/23701#discussion_r1962243559 PR Review Comment: https://git.openjdk.org/jdk/pull/23701#discussion_r1962094983 From xpeng at openjdk.org Wed Feb 19 20:58:06 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 19 Feb 2025 20:58:06 GMT Subject: RFR: 8350285: Regression caused by ShenandoahLock under extreme contention In-Reply-To: <5au_4m8XLah7rypwO90JKB5C41b7meh_IVRXYuOYveY=.daf14a11-252e-4dbf-846f-acccae09af18@github.com> References: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> <8ggezlgzCLo_ux2lTJ1UJzVD5VgwSlSo35ze0tQ8XcI=.b4d84743-5e40-4478-82ab-b0de3505e6ab@github.com> <5au_4m8XLah7rypwO90JKB5C41b7meh_IVRXYuOYveY=.daf14a11-252e-4dbf-846f-acccae09af18@github.com> Message-ID: On Wed, 19 Feb 2025 17:22:59 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahLock.cpp line 71: >> >>> 69: while (SafepointSynchronize::is_synchronizing() && >>> 70: !SafepointMechanism::local_poll_armed(java_thread)) { >>> 71: short_sleep(); >> >> Why not `yield_or_sleep` here? > > It can be `yield_or_sleep` here now, I'll rerun a test to verify, I have updated `yield_or_sleep` to reset the counter after `short_sleep`. > > In our test last year, we noticed Java tread may run this loop over 20k times in worse case, the older version `yields` counter won't be reset, so it is possible after safepoint some Java thread will only do `short_sleep`, which may increase allocation latency, I don't want waiting on safepoint poll impact the allocation path after safepoint. I have update code to use yield_or_sleep here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23701#discussion_r1962201794 From shade at openjdk.org Wed Feb 19 20:58:06 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Feb 2025 20:58:06 GMT Subject: RFR: 8350285: Regression caused by ShenandoahLock under extreme contention In-Reply-To: <5au_4m8XLah7rypwO90JKB5C41b7meh_IVRXYuOYveY=.daf14a11-252e-4dbf-846f-acccae09af18@github.com> References: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> <8ggezlgzCLo_ux2lTJ1UJzVD5VgwSlSo35ze0tQ8XcI=.b4d84743-5e40-4478-82ab-b0de3505e6ab@github.com> <5au_4m8XLah7rypwO90JKB5C41b7meh_IVRXYuOYveY=.daf14a11-252e-4dbf-846f-acccae09af18@github.com> Message-ID: On Wed, 19 Feb 2025 17:27:15 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahLock.hpp line 61: >> >>> 59: #else >>> 60: os::naked_short_nanosleep(100000); >>> 61: #endif >> >> Any context where this is coming from? This looks like from `SpinYield`? If so, should we target `SpinYield::default_sleep_ns=1000`? > > It is just a magic number we tested, not from `SpinYield`. I chose 100us because Shenandoah GC pause is usually less then 1ms, I also tested 10us but 100us was better in the test. I looked around HS sources, and I think the closest primitive we have is `HandshakeSpinYield`, which does 10us sleeps. How much worse is 10us in comparison to 100us in your tests? I would prefer to do 10us for all platforms, if performance tests allow us. This would also allow you to inline `short_sleep()`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23701#discussion_r1962150403 From xpeng at openjdk.org Wed Feb 19 20:58:06 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 19 Feb 2025 20:58:06 GMT Subject: RFR: 8350285: Regression caused by ShenandoahLock under extreme contention In-Reply-To: References: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> <8ggezlgzCLo_ux2lTJ1UJzVD5VgwSlSo35ze0tQ8XcI=.b4d84743-5e40-4478-82ab-b0de3505e6ab@github.com> <5au_4m8XLah7rypwO90JKB5C41b7meh_IVRXYuOYveY=.daf14a11-252e-4dbf-846f-acccae09af18@github.com> Message-ID: On Wed, 19 Feb 2025 18:05:50 GMT, Aleksey Shipilev wrote: >> It is just a magic number we tested, not from `SpinYield`. I chose 100us because a Shenandoah GC pause is usually less than 1ms; I also tested 10us, but 100us was better in the test. > > I looked around HS sources, and I think the closest primitive we have is `HandshakeSpinYield`, which does 10us sleeps. How much worse is 10us in comparison to 100us in your tests? I would prefer to do 10us for all platforms, if performance tests allow us. This would also allow you to inline `short_sleep()`. It is more than 2x worse, but still much better (10x) than tip and the revert of JDK-8331411; here are the top 10 at-safepoint times for comparison: 10 us: 7982953 5043319 5008139 4597156 4580556 4429245 4403175 4047891 3677389 3582308 100 us: 4553716 1703093 1046248 1038148 780447 778786 778436 778276 728716 721856 I can change it to the same sleep time for all platforms, but Windows doesn't really support nanosecond-resolution sleep; the JVM uses a combination of yield and spin pause to approximate nanosecond-resolution sleep on Windows. It should still be fine since it won't be worse than only `yield` as before. In short, I am not expecting any improvement for Windows.
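The 10us-vs-100us comparison above is dominated by effective sleep granularity: the OS typically rounds a short sleep request up to its timer resolution, so the measured interval, not the literal argument, drives the backoff behavior. A small portable check (illustrative, not the HotSpot `os::` API):

```cpp
#include <chrono>
#include <thread>

// Returns how long a requested sleep actually took, in nanoseconds.
// Most platforms round short sleeps up to their scheduler/timer granularity,
// so the measured value is at least the request and often much larger.
long long measured_sleep_ns(long long request_ns) {
  auto t0 = std::chrono::steady_clock::now();
  std::this_thread::sleep_for(std::chrono::nanoseconds(request_ns));
  auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}
```

Running this with 10000 vs 100000 ns requests on a given machine shows how close the two actually are once granularity is accounted for, which is one way to sanity-check the choice of sleep constant per platform.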
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23701#discussion_r1962180175 From xpeng at openjdk.org Wed Feb 19 20:58:06 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 19 Feb 2025 20:58:06 GMT Subject: RFR: 8350285: Regression caused by ShenandoahLock under extreme contention In-Reply-To: References: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> <8ggezlgzCLo_ux2lTJ1UJzVD5VgwSlSo35ze0tQ8XcI=.b4d84743-5e40-4478-82ab-b0de3505e6ab@github.com> <5au_4m8XLah7rypwO90JKB5C41b7meh_IVRXYuOYveY=.daf14a11-252e-4dbf-846f-acccae09af18@github.com> Message-ID: On Wed, 19 Feb 2025 18:29:04 GMT, Xiaolong Peng wrote: >> I looked around HS sources, and I think the closest primitive we have is `HandshakeSpinYield`, which does 10us sleeps. How much worse is 10us in comparison to 100us in your tests? I would prefer to do 10us for all platforms, if performance tests allow us. This would also allow you to inline `short_sleep()`. > > It is like more than 2x worse, but still much better(10x) than tip and revert of JDK-8331411, here are the top 10 at-safepoint time for comparison: > 10 us: > > 7982953 > 5043319 > 5008139 > 4597156 > 4580556 > 4429245 > 4403175 > 4047891 > 3677389 > 3582308 > > > 100 us: > > 4553716 > 1703093 > 1046248 > 1038148 > 780447 > 778786 > 778436 > 778276 > 728716 > 721856 > > > I can change it to same seep time for all platforms, but Windows doesn't really support nanosecond resolution sleep, JVM use combination of yield and spin pause to approximate nanosecond resolution sleep for Windows, it should be still fine since it won't be worse then only `yield` as before. In short, I am not expecting any improvement for Windows. I have changed to be same for all platforms, but still keep 100us sleep duration given that it did perform better. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23701#discussion_r1962203617 From kdnilsen at openjdk.org Wed Feb 19 21:23:54 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 19 Feb 2025 21:23:54 GMT Subject: RFR: 8350285: Regression caused by ShenandoahLock under extreme contention In-Reply-To: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> References: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> Message-ID: <_qWdKYXFMjPhwL_I2udsXUTyRYKUzOpKMZ3Bx3x-hWQ=.e5b6d1fc-cfda-46af-9c6e-2aedca880353@github.com> On Wed, 19 Feb 2025 15:58:01 GMT, Xiaolong Peng wrote: > We have noticed there is significant regression in at-safepoint time with recent changes made to ShenandoahLock, more specifically this [PR](https://github.com/openjdk/jdk/pull/19570), a local reproducer was written to reproduce the issue, here is the top N at-safepoint time in `ns` comparison: > > Tip: > > 94069776 > 50993550 > 49321667 > 33903446 > 32291313 > 30587810 > 27759958 > 25080997 > 24657404 > 23874338 > > Tip + reverting [PR](https://github.com/openjdk/jdk/pull/19570) > > 58428998 > 44410618 > 30788370 > 20636942 > 15986465 > 15307468 > 9686426 > 9432094 > 7473938 > 6854014 > > Note: command line for the test: > > java -Xms256m -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-ShenandoahPacing -XX:-UseTLAB -Xlog:gc -Xlog:safepoint ~/Alloc.java | grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr > > > With further digging, we found the real problem is more runnable threads after the [PR](https://github.com/openjdk/jdk/pull/19570) causes longer time for VM_Thread to call `futex(FUTEX_WAKE_PRIVATE)` to disarm wait barrier when leaving safepoint. Fixing in the issue in VM_Thread benefits other GCs as well but it is more complicated(see the details here https://bugs.openjdk.org/browse/JDK-8350324). 
> With some tweaks in ShenandoahLock, we could mitigate the regression caused by [PR](https://github.com/openjdk/jdk/pull/19570), also improve the long tails of at-saftpoint time by more than 10x, here is the result from the same test with this changes of this PR: > > > 1890706 > 1222180 > 1042758 > 853157 > 792057 > 785697 > 780627 > 757817 > 740607 > 736646 > 725727 > 725596 > 724106 > > > ### Other test > - [x] `make test TEST=hotspot_gc_shenandoah` > - [x] Tier 2 Yielding 5x for every 1 nanosleep seems a bit "arbitrary". I assume you found that the number 5 delivered the "best performance" compared to other numbers you might have chosen. I wonder if different architectures with different numbers of cores, different operating systems, and/or different test applications that have different numbers of runnable threads would also perform best with this same magic number 5. Could we at least add a comment explaining how/why we chose 5 here? ------------- PR Review: https://git.openjdk.org/jdk/pull/23701#pullrequestreview-2628009211 From xpeng at openjdk.org Wed Feb 19 21:30:52 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 19 Feb 2025 21:30:52 GMT Subject: RFR: 8350285: Regression caused by ShenandoahLock under extreme contention In-Reply-To: <_qWdKYXFMjPhwL_I2udsXUTyRYKUzOpKMZ3Bx3x-hWQ=.e5b6d1fc-cfda-46af-9c6e-2aedca880353@github.com> References: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> <_qWdKYXFMjPhwL_I2udsXUTyRYKUzOpKMZ3Bx3x-hWQ=.e5b6d1fc-cfda-46af-9c6e-2aedca880353@github.com> Message-ID: On Wed, 19 Feb 2025 21:21:42 GMT, Kelvin Nilsen wrote: > Yielding 5x for every 1 nanosleep seems a bit "arbitrary". I assume you found that the number 5 delivered the "best performance" compared to other numbers you might have chosen. 
I wonder if different architectures with different numbers of cores, different operating systems, and/or different test applications that have different numbers of runnable threads would also perform best with this same magic number 5. > > Could we at least add a comment explaining how/why we chose 5 here? 5 is from the old implementation, the old implementation was copied from https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/thread.cpp#L577 I can do a bit more test on this, and add some comments to explain why we choose the magic number 5. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23701#issuecomment-2669800369 From xpeng at openjdk.org Wed Feb 19 22:38:53 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 19 Feb 2025 22:38:53 GMT Subject: RFR: 8350285: Regression caused by ShenandoahLock under extreme contention In-Reply-To: References: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> <_qWdKYXFMjPhwL_I2udsXUTyRYKUzOpKMZ3Bx3x-hWQ=.e5b6d1fc-cfda-46af-9c6e-2aedca880353@github.com> Message-ID: On Wed, 19 Feb 2025 21:28:40 GMT, Xiaolong Peng wrote: > > Yielding 5x for every 1 nanosleep seems a bit "arbitrary". I assume you found that the number 5 delivered the "best performance" compared to other numbers you might have chosen. I wonder if different architectures with different numbers of cores, different operating systems, and/or different test applications that have different numbers of runnable threads would also perform best with this same magic number 5. > > Could we at least add a comment explaining how/why we chose 5 here? > > 5 is from the old implementation, the old implementation was copied from https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/thread.cpp#L577 > > I can do a bit more test on this, and add some comments to explain why we choose the magic number 5. 
I have tested 3/5/7, safepoint time is very close in the tests with 3 or 5 yields(3 is slightly better), but it is worse with 7, I can change it to 3. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23701#issuecomment-2669916924 From xpeng at openjdk.org Wed Feb 19 22:49:22 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 19 Feb 2025 22:49:22 GMT Subject: RFR: 8350285: Regression caused by ShenandoahLock under extreme contention [v2] In-Reply-To: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> References: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> Message-ID: > We have noticed there is significant regression in at-safepoint time with recent changes made to ShenandoahLock, more specifically this [PR](https://github.com/openjdk/jdk/pull/19570), a local reproducer was written to reproduce the issue, here is the top N at-safepoint time in `ns` comparison: > > Tip: > > 94069776 > 50993550 > 49321667 > 33903446 > 32291313 > 30587810 > 27759958 > 25080997 > 24657404 > 23874338 > > Tip + reverting [PR](https://github.com/openjdk/jdk/pull/19570) > > 58428998 > 44410618 > 30788370 > 20636942 > 15986465 > 15307468 > 9686426 > 9432094 > 7473938 > 6854014 > > Note: command line for the test: > > java -Xms256m -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-ShenandoahPacing -XX:-UseTLAB -Xlog:gc -Xlog:safepoint ~/Alloc.java | grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr > > > With further digging, we found the real problem is more runnable threads after the [PR](https://github.com/openjdk/jdk/pull/19570) causes longer time for VM_Thread to call `futex(FUTEX_WAKE_PRIVATE)` to disarm wait barrier when leaving safepoint. Fixing in the issue in VM_Thread benefits other GCs as well but it is more complicated(see the details here https://bugs.openjdk.org/browse/JDK-8350324). 
> With some tweaks in ShenandoahLock, we could mitigate the regression caused by [PR](https://github.com/openjdk/jdk/pull/19570), also improve the long tails of at-saftpoint time by more than 10x, here is the result from the same test with this changes of this PR: > > > 1890706 > 1222180 > 1042758 > 853157 > 792057 > 785697 > 780627 > 757817 > 740607 > 736646 > 725727 > 725596 > 724106 > > > ### Other test > - [x] `make test TEST=hotspot_gc_shenandoah` > - [x] Tier 2 Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23701/files - new: https://git.openjdk.org/jdk/pull/23701/files/756f7820..68e1b985 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23701&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23701&range=00-01 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23701.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23701/head:pull/23701 PR: https://git.openjdk.org/jdk/pull/23701 From kdnilsen at openjdk.org Thu Feb 20 14:22:56 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 20 Feb 2025 14:22:56 GMT Subject: RFR: 8350285: Shenandoah: Regression caused by ShenandoahLock under extreme contention [v2] In-Reply-To: References: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> Message-ID: <3X_Qwm8n-a9jDEWsqcXPQF2piMQ8Pbf7OtjXCYeZB6A=.9ec22dd1-e75f-4696-b3da-48eb4f75bd9a@github.com> On Wed, 19 Feb 2025 22:49:22 GMT, Xiaolong Peng wrote: >> We have noticed there is significant regression in at-safepoint time with recent changes made to ShenandoahLock, more specifically this [PR](https://github.com/openjdk/jdk/pull/19570), a local reproducer was written to reproduce the issue, here is the top N at-safepoint time in `ns` comparison: >> >> Tip: >> >> 94069776 >> 50993550 >> 49321667 >> 33903446 >> 
32291313 >> 30587810 >> 27759958 >> 25080997 >> 24657404 >> 23874338 >> >> Tip + reverting [PR](https://github.com/openjdk/jdk/pull/19570) >> >> 58428998 >> 44410618 >> 30788370 >> 20636942 >> 15986465 >> 15307468 >> 9686426 >> 9432094 >> 7473938 >> 6854014 >> >> Note: command line for the test: >> >> java -Xms256m -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-ShenandoahPacing -XX:-UseTLAB -Xlog:gc -Xlog:safepoint ~/Alloc.java | grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr >> >> >> With further digging, we found the real problem is more runnable threads after the [PR](https://github.com/openjdk/jdk/pull/19570) causes longer time for VM_Thread to call `futex(FUTEX_WAKE_PRIVATE)` to disarm wait barrier when leaving safepoint. Fixing in the issue in VM_Thread benefits other GCs as well but it is more complicated(see the details here https://bugs.openjdk.org/browse/JDK-8350324). >> With some tweaks in ShenandoahLock, we could mitigate the regression caused by [PR](https://github.com/openjdk/jdk/pull/19570), also improve the long tails of at-saftpoint time by more than 10x, here is the result from the same test with this changes of this PR: >> >> >> 1890706 >> 1222180 >> 1042758 >> 853157 >> 792057 >> 785697 >> 780627 >> 757817 >> 740607 >> 736646 >> 725727 >> 725596 >> 724106 >> >> >> ### Other test >> - [x] `make test TEST=hotspot_gc_shenandoah` >> - [x] Tier 2 > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Thanks for followup. ------------- Marked as reviewed by kdnilsen (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/23701#pullrequestreview-2630011438 From xpeng at openjdk.org Fri Feb 21 06:50:53 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 21 Feb 2025 06:50:53 GMT Subject: RFR: 8350285: Shenandoah: Regression caused by ShenandoahLock under extreme contention [v2] In-Reply-To: References: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> Message-ID: <3TP8q0-9_fuI5p0CjFlHYzr7UPeLhLsV7hSv0Np3s7I=.f506318a-3243-4efd-b838-f46f9573492e@github.com> On Wed, 19 Feb 2025 22:49:22 GMT, Xiaolong Peng wrote: >> We have noticed a significant regression in at-safepoint time with recent changes made to ShenandoahLock, more specifically this [PR](https://github.com/openjdk/jdk/pull/19570). A local reproducer was written to reproduce the issue; here is the top N at-safepoint time in `ns` comparison: >> >> Tip: >> >> 94069776 >> 50993550 >> 49321667 >> 33903446 >> 32291313 >> 30587810 >> 27759958 >> 25080997 >> 24657404 >> 23874338 >> >> Tip + reverting [PR](https://github.com/openjdk/jdk/pull/19570) >> >> 58428998 >> 44410618 >> 30788370 >> 20636942 >> 15986465 >> 15307468 >> 9686426 >> 9432094 >> 7473938 >> 6854014 >> >> Note: command line for the test: >> >> java -Xms256m -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-ShenandoahPacing -XX:-UseTLAB -Xlog:gc -Xlog:safepoint ~/Alloc.java | grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr >> >> >> With further digging, we found the real problem is that more runnable threads after the [PR](https://github.com/openjdk/jdk/pull/19570) cause a longer time for the VM_Thread to call `futex(FUTEX_WAKE_PRIVATE)` to disarm the wait barrier when leaving a safepoint. Fixing the issue in VM_Thread benefits other GCs as well, but it is more complicated (see the details here https://bugs.openjdk.org/browse/JDK-8350324).
>> With some tweaks in ShenandoahLock, we could mitigate the regression caused by the [PR](https://github.com/openjdk/jdk/pull/19570) and also improve the long tails of at-safepoint time by more than 10x; here is the result from the same test with the changes of this PR: >> >> >> 1890706 >> 1222180 >> 1042758 >> 853157 >> 792057 >> 785697 >> 780627 >> 757817 >> 740607 >> 736646 >> 725727 >> 725596 >> 724106 >> >> >> ### Other test >> - [x] `make test TEST=hotspot_gc_shenandoah` >> - [x] Tier 2 > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23701#issuecomment-2673624414 From xpeng at openjdk.org Fri Feb 21 06:54:53 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 21 Feb 2025 06:54:53 GMT Subject: RFR: 8350285: Shenandoah: Regression caused by ShenandoahLock under extreme contention [v2] In-Reply-To: References: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> Message-ID: On Wed, 19 Feb 2025 22:49:22 GMT, Xiaolong Peng wrote: >> We have noticed a significant regression in at-safepoint time with recent changes made to ShenandoahLock, more specifically this [PR](https://github.com/openjdk/jdk/pull/19570). A local reproducer was written to reproduce the issue; here is the top N at-safepoint time in `ns` comparison: >> >> Tip: >> >> 94069776 >> 50993550 >> 49321667 >> 33903446 >> 32291313 >> 30587810 >> 27759958 >> 25080997 >> 24657404 >> 23874338 >> >> Tip + reverting [PR](https://github.com/openjdk/jdk/pull/19570) >> >> 58428998 >> 44410618 >> 30788370 >> 20636942 >> 15986465 >> 15307468 >> 9686426 >> 9432094 >> 7473938 >> 6854014 >> >> Note: command line for the test: >> >> java -Xms256m -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-ShenandoahPacing -XX:-UseTLAB -Xlog:gc -Xlog:safepoint ~/Alloc.java | grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr
>> >> >> With further digging, we found the real problem is that more runnable threads after the [PR](https://github.com/openjdk/jdk/pull/19570) cause a longer time for the VM_Thread to call `futex(FUTEX_WAKE_PRIVATE)` to disarm the wait barrier when leaving a safepoint. Fixing the issue in VM_Thread benefits other GCs as well, but it is more complicated (see the details here https://bugs.openjdk.org/browse/JDK-8350324). >> With some tweaks in ShenandoahLock, we could mitigate the regression caused by the [PR](https://github.com/openjdk/jdk/pull/19570) and also improve the long tails of at-safepoint time by more than 10x; here is the result from the same test with the changes of this PR: >> >> >> 1890706 >> 1222180 >> 1042758 >> 853157 >> 792057 >> 785697 >> 780627 >> 757817 >> 740607 >> 736646 >> 725727 >> 725596 >> 724106 >> >> >> ### Other test >> - [x] `make test TEST=hotspot_gc_shenandoah` >> - [x] Tier 2 > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments > > @pengxiaolong This pull request has not yet been marked as ready for integration.
> Sorry I misread it ------------- PR Comment: https://git.openjdk.org/jdk/pull/23701#issuecomment-2673660240 From shade at openjdk.org Fri Feb 21 08:49:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 21 Feb 2025 08:49:55 GMT Subject: RFR: 8350285: Shenandoah: Regression caused by ShenandoahLock under extreme contention [v2] In-Reply-To: References: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> Message-ID: On Wed, 19 Feb 2025 22:49:22 GMT, Xiaolong Peng wrote: >> We have noticed a significant regression in at-safepoint time with recent changes made to ShenandoahLock, more specifically this [PR](https://github.com/openjdk/jdk/pull/19570). A local reproducer was written to reproduce the issue; here is the top N at-safepoint time in `ns` comparison: >> >> Tip: >> >> 94069776 >> 50993550 >> 49321667 >> 33903446 >> 32291313 >> 30587810 >> 27759958 >> 25080997 >> 24657404 >> 23874338 >> >> Tip + reverting [PR](https://github.com/openjdk/jdk/pull/19570) >> >> 58428998 >> 44410618 >> 30788370 >> 20636942 >> 15986465 >> 15307468 >> 9686426 >> 9432094 >> 7473938 >> 6854014 >> >> Note: command line for the test: >> >> java -Xms256m -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-ShenandoahPacing -XX:-UseTLAB -Xlog:gc -Xlog:safepoint ~/Alloc.java | grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr >> >> >> With further digging, we found the real problem is that more runnable threads after the [PR](https://github.com/openjdk/jdk/pull/19570) cause a longer time for the VM_Thread to call `futex(FUTEX_WAKE_PRIVATE)` to disarm the wait barrier when leaving a safepoint. Fixing the issue in VM_Thread benefits other GCs as well, but it is more complicated (see the details here https://bugs.openjdk.org/browse/JDK-8350324).
>> With some tweaks in ShenandoahLock, we could mitigate the regression caused by the [PR](https://github.com/openjdk/jdk/pull/19570) and also improve the long tails of at-safepoint time by more than 10x; here is the result from the same test with the changes of this PR: >> >> >> 1890706 >> 1222180 >> 1042758 >> 853157 >> 792057 >> 785697 >> 780627 >> 757817 >> 740607 >> 736646 >> 725727 >> 725596 >> 724106 >> >> >> ### Other test >> - [x] `make test TEST=hotspot_gc_shenandoah` >> - [x] Tier 2 > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23701#pullrequestreview-2632396522 From erik.osterlund at oracle.com Fri Feb 21 14:02:39 2025 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Fri, 21 Feb 2025 14:02:39 +0000 Subject: RFC: G1 as default collector (for real this time) In-Reply-To: <74d05686-9c57-4262-881d-31c269f34bc5@oracle.com> References: <74d05686-9c57-4262-881d-31c269f34bc5@oracle.com> Message-ID: <61CEE33A-6718-479D-A498-697C1063B5AA@oracle.com> Hi Thomas, When G1 came, the main focus was making it scale better to larger workloads. It did so with a more well rounded balance between latency, throughput and memory footprint than other collectors, and became a good fit for the default GC. Given its initial focus as a region based collector that could scale better, it made sense that Serial was still better in smaller environments where that scaling wasn't really needed. When G1 was made default, it made sense to stay away from the small machine realm. Since then, the gap between Serial and G1 has diminished over time, and it still keeps on diminishing. So it certainly makes sense to consider the option of changing Serial -> G1 by default in the small realm.
The result of doing that is, as you point out, that there is a single default GC invariant of the environment scale, instead of having an exception for small environments. That's nice. There is however a flip side for that argument on the other side of the scaling spectrum, where ZGC is probably a better fit on the even larger scale. So while it's true that the effect of a Serial -> G1 default change is a static default GC, I just think we should mind the fact that there is more uncertainty on the larger end of the scale. I'm not proposing any changes, just saying that maybe we should be careful about stressing the importance of having a static default GC, if we don't know if that is the better strategy on the larger end of the scale or not, going forward. /Erik > On 19 Feb 2025, at 14:16, Thomas Schatzl wrote: > > Hi all, > > > there have been some recent discussions around making G1 the default for all use-cases, both internally at Oracle and at the OpenJDK Committers Workshop. With this e-mail I want to bring this subject to a wider audience to gather feedback around potential problems with such a move. > > > As you all may know, G1 is the default collector in the Hotspot VM. However in some situations, some say in (too) many situations, the VM selects Serial GC "by default" anyway. :) > > From what I understand there are the following reasons to keep Serial GC _as default option_ in the context of "small" environments: > > * throughput: G1's large write barrier makes an argument about throughput being too far off and noticeable. Ongoing efforts ([1] which we plan for JDK 25) show that the difference is going to be much smaller if it ever was. > > Further, as soon as Serial GC is running for longer this advantage diminishes a lot due to full collections and can result in G1 actually performing better. > > * (native) memory footprint: G1 has made great strides in native memory usage.
> > In the past particularly remembered sets were of concern, but their memory usage has been reduced significantly over the past few years. > E.g. with the above change the entire young gen remembered set is also managed on the card table exactly like in Serial GC. > > [I would also like to state that I would be surprised if remembered sets, with a recent JDK and G1, are ever an issue with heaps Serial GC targets] > > Heap management tends to be worse with Serial GC, mostly due to its strict generational boundaries. G1's region based layout avoids wasting memory. > > * latency: if this has ever been a disadvantage, Serial GC's full collections are worse compared to G1's incremental collections. > > * startup: the time to start up the VM is not that different between these two collectors. Other components are much more relevant here. > > * historical inertia: at the time there was a need to select a default, there has been nothing but Serial and Parallel GC. JDK 9 simply replaced Parallel GC as default for "server class" machines, probably as the path of lesser resistance and because of the shortcomings known at the time in some of the above areas. > > Some initial testing showed that Serial GC performs much better when constraining it to the same environment (single thread, heaps < 1.7g) than G1. > > At the same time, looking at the current situation from an end user's point of view, it is very much confusing for them, getting a different garbage collector depending on environment, based on some somewhat arguable criteria. > > This change would also make the expectations ("g1 is default since jdk 9") match the actual behavior. > > I am looking forward to hearing your opinion about making G1 unconditionally default.
> > Thanks, > Thomas > > [1] https://bugs.openjdk.org/browse/JDK-8340827 From duke at openjdk.org Fri Feb 21 16:34:54 2025 From: duke at openjdk.org (duke) Date: Fri, 21 Feb 2025 16:34:54 GMT Subject: RFR: 8350285: Shenandoah: Regression caused by ShenandoahLock under extreme contention [v2] In-Reply-To: References: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> Message-ID: <8pPDE34XMhzcayRu3NZhW4JUmANZblcUq9-QZZOFe2g=.95047351-a914-44bf-ac66-b35985b73b18@github.com> On Wed, 19 Feb 2025 22:49:22 GMT, Xiaolong Peng wrote: >> We have noticed a significant regression in at-safepoint time with recent changes made to ShenandoahLock, more specifically this [PR](https://github.com/openjdk/jdk/pull/19570). A local reproducer was written to reproduce the issue; here is the top N at-safepoint time in `ns` comparison: >> >> Tip: >> >> 94069776 >> 50993550 >> 49321667 >> 33903446 >> 32291313 >> 30587810 >> 27759958 >> 25080997 >> 24657404 >> 23874338 >> >> Tip + reverting [PR](https://github.com/openjdk/jdk/pull/19570) >> >> 58428998 >> 44410618 >> 30788370 >> 20636942 >> 15986465 >> 15307468 >> 9686426 >> 9432094 >> 7473938 >> 6854014 >> >> Note: command line for the test: >> >> java -Xms256m -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-ShenandoahPacing -XX:-UseTLAB -Xlog:gc -Xlog:safepoint ~/Alloc.java | grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr >> >> >> With further digging, we found the real problem is that more runnable threads after the [PR](https://github.com/openjdk/jdk/pull/19570) cause a longer time for the VM_Thread to call `futex(FUTEX_WAKE_PRIVATE)` to disarm the wait barrier when leaving a safepoint. Fixing the issue in VM_Thread benefits other GCs as well, but it is more complicated (see the details here https://bugs.openjdk.org/browse/JDK-8350324).
>> With some tweaks in ShenandoahLock, we could mitigate the regression caused by the [PR](https://github.com/openjdk/jdk/pull/19570) and also improve the long tails of at-safepoint time by more than 10x; here is the result from the same test with the changes of this PR: >> >> >> 1890706 >> 1222180 >> 1042758 >> 853157 >> 792057 >> 785697 >> 780627 >> 757817 >> 740607 >> 736646 >> 725727 >> 725596 >> 724106 >> >> >> ### Other test >> - [x] `make test TEST=hotspot_gc_shenandoah` >> - [x] Tier 2 > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments @pengxiaolong Your change (at version 68e1b985b939dba0f4dc12a71901bb063769c1f1) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23701#issuecomment-2675015842 From xpeng at openjdk.org Fri Feb 21 16:41:58 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 21 Feb 2025 16:41:58 GMT Subject: Integrated: 8350285: Shenandoah: Regression caused by ShenandoahLock under extreme contention In-Reply-To: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> References: <2ZCZUKE71ToGyKRHVR2TNpmgoubol7j2MVENy3p4kdo=.3e005c72-39ca-4fbe-852a-ac90bbfeb63a@github.com> Message-ID: On Wed, 19 Feb 2025 15:58:01 GMT, Xiaolong Peng wrote: > We have noticed a significant regression in at-safepoint time with recent changes made to ShenandoahLock, more specifically this [PR](https://github.com/openjdk/jdk/pull/19570). A local reproducer was written to reproduce the issue; here is the top N at-safepoint time in `ns` comparison: > > Tip: > > 94069776 > 50993550 > 49321667 > 33903446 > 32291313 > 30587810 > 27759958 > 25080997 > 24657404 > 23874338 > > Tip + reverting [PR](https://github.com/openjdk/jdk/pull/19570) > > 58428998 > 44410618 > 30788370 > 20636942 > 15986465 > 15307468 > 9686426 > 9432094 > 7473938 > 6854014 > > Note: command line for the test: > >
java -Xms256m -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:-ShenandoahPacing -XX:-UseTLAB -Xlog:gc -Xlog:safepoint ~/Alloc.java | grep -Po "At safepoint: \d+ ns" | grep -Po "\d+" | sort -nr > > > With further digging, we found the real problem is that more runnable threads after the [PR](https://github.com/openjdk/jdk/pull/19570) cause a longer time for the VM_Thread to call `futex(FUTEX_WAKE_PRIVATE)` to disarm the wait barrier when leaving a safepoint. Fixing the issue in VM_Thread benefits other GCs as well, but it is more complicated (see the details here https://bugs.openjdk.org/browse/JDK-8350324). > With some tweaks in ShenandoahLock, we could mitigate the regression caused by the [PR](https://github.com/openjdk/jdk/pull/19570) and also improve the long tails of at-safepoint time by more than 10x; here is the result from the same test with the changes of this PR: > > > 1890706 > 1222180 > 1042758 > 853157 > 792057 > 785697 > 780627 > 757817 > 740607 > 736646 > 725727 > 725596 > 724106 > > > ### Other test > - [x] `make test TEST=hotspot_gc_shenandoah` > - [x] Tier 2 This pull request has now been integrated.
Changeset: bd8ad309 Author: Xiaolong Peng Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/bd8ad309b59bceb3073a8d6411cca74e73508885 Stats: 18 lines in 2 files changed: 15 ins; 0 del; 3 mod 8350285: Shenandoah: Regression caused by ShenandoahLock under extreme contention Reviewed-by: shade, kdnilsen ------------- PR: https://git.openjdk.org/jdk/pull/23701 From thomas.schatzl at oracle.com Mon Feb 24 08:33:32 2025 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 24 Feb 2025 09:33:32 +0100 Subject: RFC: G1 as default collector (for real this time) In-Reply-To: <61CEE33A-6718-479D-A498-697C1063B5AA@oracle.com> References: <74d05686-9c57-4262-881d-31c269f34bc5@oracle.com> <61CEE33A-6718-479D-A498-697C1063B5AA@oracle.com> Message-ID: Hi, On 21.02.25 15:02, Erik Osterlund wrote: > Hi Thomas, > [...]> There is however a flip side for that argument on the other side of the scaling spectrum, where ZGC is probably a better fit on the even larger scale. So while it's true that the effect of a Serial -> G1 default change is a static default GC, I just think we should mind the fact that there is more uncertainty on the larger end of the scale. I'm not proposing any changes, just saying that maybe we should be careful about stressing the importance of having a static default GC, if we don't know if that is the better strategy on the larger end of the scale or not, going forward. +1 Thomas From iwalulya at openjdk.org Mon Feb 24 09:53:55 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 24 Feb 2025 09:53:55 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v4] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Tue, 18 Feb 2025 09:20:57 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel.
>> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker Marked as reviewed by iwalulya (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23367#pullrequestreview-2636497189 From dholmes at openjdk.org Mon Feb 24 11:28:01 2025 From: dholmes at openjdk.org (David Holmes) Date: Mon, 24 Feb 2025 11:28:01 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v4] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Tue, 18 Feb 2025 09:20:57 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. 
>> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker Nothing further from me. Thanks ------------- PR Review: https://git.openjdk.org/jdk/pull/23367#pullrequestreview-2636773637 From kbarrett at openjdk.org Mon Feb 24 13:02:54 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 24 Feb 2025 13:02:54 GMT Subject: RFR: 8349906: G1: Improve initial survivor rate for newly used young regions In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 10:55:46 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that tries to improve the survivor rate initial values for newly expanded regions. > > Currently G1 uses `InitialSurvivorRate` as survivor rate for such regions, but it is typically a pretty bad choice because > > * it's rather conservative, estimating that 40% of region contents will survive > * such a conservative value is kind of bad particularly in cases for regions that are expanded late in the mutator phase because they are not frequently updated (and with our running weights changes get propagated over a very long time), i.e. this 40% sticks for a long time (*) > * it is a random value, i.e. not particularly specific to the application. > > The suggestion is to use the survivor rate for the last region we know the survivor rate already. > > (*) to clarify this a little: G1 keeps track of `[0...m]` survivor rate predictors. For a given garbage collection, `[0...n]` of those are updated (`n` is the number of eden/survivor regions depending on the rate group). 
However, those for `]n...m]` are not; in particular, for the seldom-allocated regions in that range, the predictors are updated very infrequently. Now the young gen sizing uses these predictions "at the end" of the predictor anyway, and since they are infrequently updated and their values are very conservative, G1 won't expand young gen as much as it could/should. > > Testing: gha, tier1-7 (with other changes) > > Hth, > Thomas Changes requested by kbarrett (Reviewer). src/hotspot/share/gc/g1/g1SurvRateGroup.cpp line 79: > 77: : InitialSurvivorRate; > 78: > 79: for (size_t i = _stats_arrays_length; i < _num_added_regions; ++i) { Shouldn't this iteration variable be similarly updated as was done in fill_in_last_surv_rates, to avoid some implicit narrowing? Similarly for the loop on line 49 in the destructor? Basically, I'm suggesting doing all or none of these in this PR (with maybe a preference for none, and do a separate sweep). ------------- PR Review: https://git.openjdk.org/jdk/pull/23584#pullrequestreview-2637013120 PR Review Comment: https://git.openjdk.org/jdk/pull/23584#discussion_r1967598618 From rsunderbabu at openjdk.org Mon Feb 24 15:33:11 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Mon, 24 Feb 2025 15:33:11 GMT Subject: RFR: 8314840: 3 gc/epsilon tests ignore external vm options Message-ID: These tests do not pass Java/JVM test command line options (flags) to the child process. More details in JBS. Tiers 1 to 3 tested, along with various flag combinations.
------------- Commit messages: - prepend test java opts Changes: https://git.openjdk.org/jdk/pull/23751/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23751&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314840 Stats: 9 lines in 3 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23751.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23751/head:pull/23751 PR: https://git.openjdk.org/jdk/pull/23751 From wkemper at openjdk.org Mon Feb 24 17:39:54 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 24 Feb 2025 17:39:54 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v12] In-Reply-To: References: Message-ID: On Sat, 15 Feb 2025 01:41:48 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 98: >> >>> 96: } >>> 97: >>> 98: // In case any threads are waiting for a cycle to happen, let them know it isn't. >> >> maybe "it isn't happening", or "it won't happen". > > This is interesting. If GC is stopping prior to shutting down the VM, is there any point in notifying these waiting threads. Why not let them wait, and quietly shut things down? Are there JCK or other tests that would fail in that case? The waiting threads will remain waiting and prevent the JVM from shutting down. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1968113830 From wkemper at openjdk.org Mon Feb 24 17:53:01 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 24 Feb 2025 17:53:01 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v12] In-Reply-To: References: Message-ID: On Fri, 14 Feb 2025 22:56:20 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 30 commits: >> >> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads >> - Old gen bootstrap cycle must make it to init mark >> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads >> - Improve message for assertion >> - Make shutdown safer for threads requesting (or expecting) gc >> - Do not accept requests if control thread is terminating >> - Notify waiters when control thread terminates >> - Add event for control thread state changes >> - Fix shutdown livelock error >> - Fix includes >> - ... and 20 more: https://git.openjdk.org/jdk/compare/ba6c9659...915ffbda > > src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 64: > >> 62: private: >> 63: // This lock is used to coordinate setting the _requested_gc_cause, _requested generation >> 64: // and _gc_mode. It is important that these be changed together and have a consistent view. > > In that case, for ease of maintenance, I'd move the declaration of all of the 3 data members that this lock protects next to this lock, either immediately preceding or immediately succeeding its declaration in the body of this class. > > Are these data members always both read and written under this lock? If so, then `_gc_mode` below doesn't need to be defined `volatile`. The `_gc_mode` is read without the lock by the regulator thread. However, the regulator thread does take the lock and reads `_gc_mode` again under the lock before making any state changes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1968132168 From wkemper at openjdk.org Mon Feb 24 17:55:57 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 24 Feb 2025 17:55:57 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v12] In-Reply-To: References: Message-ID: On Sat, 15 Feb 2025 00:05:46 GMT, Y. 
Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 30 commits: >> >> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads >> - Old gen bootstrap cycle must make it to init mark >> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads >> - Improve message for assertion >> - Make shutdown safer for threads requesting (or expecting) gc >> - Do not accept requests if control thread is terminating >> - Notify waiters when control thread terminates >> - Add event for control thread state changes >> - Fix shutdown livelock error >> - Fix includes >> - ... and 20 more: https://git.openjdk.org/jdk/compare/ba6c9659...915ffbda > > src/hotspot/share/gc/shenandoah/shenandoahController.hpp line 66: > >> 64: >> 65: // This cancels the collection cycle and has an option to block >> 66: // until another cycle runs and clears the alloc failure gc flag. > > But "the alloc failure gc flag" is gone above. The comment should be updated as well. A public API's description should avoid talking about its internal implementation details here. It's OK to talk about implementation details in the implementation of the method, not in the header spec here. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1968135382 From wkemper at openjdk.org Mon Feb 24 18:02:55 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 24 Feb 2025 18:02:55 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v12] In-Reply-To: References: Message-ID: On Sat, 15 Feb 2025 00:59:37 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 30 commits: >> >> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads >> - Old gen bootstrap cycle must make it to init mark >> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads >> - Improve message for assertion >> - Make shutdown safer for threads requesting (or expecting) gc >> - Do not accept requests if control thread is terminating >> - Notify waiters when control thread terminates >> - Add event for control thread state changes >> - Fix shutdown livelock error >> - Fix includes >> - ... and 20 more: https://git.openjdk.org/jdk/compare/ba6c9659...915ffbda > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.hpp line 188: > >> 186: >> 187: bool should_start_gc() override; >> 188: bool resume_old_cycle(); > > Documentation comment please, especially explaining the return value. > For things that may return `false` and not do anything, it's better to use `try_` prefix. In fact, the method doesn't actually resume the cycle, but checks if we are in a state such that we should resume it. So, I'd name it `should_resume_old_cycle()`, consistent with the name `should_start_gc()` for the previous method. That makes sense. Will do. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1968146250 From wkemper at openjdk.org Mon Feb 24 18:23:56 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 24 Feb 2025 18:23:56 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v12] In-Reply-To: References: Message-ID: On Sat, 15 Feb 2025 01:14:03 GMT, Y. 
Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.hpp line 101: >> >>> 99: || cause == GCCause::_shenandoah_allocation_failure_evac >>> 100: || cause == GCCause::_shenandoah_humongous_allocation_failure; >>> 101: } >> >> Would it make sense to move this implementation also to the .cpp file like the other static `is_...` methods below? > > Or is this guaranteeing inlining into the caller's body, which you might prefer for the callers? I moved the implementation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1968193533 From wkemper at openjdk.org Mon Feb 24 18:38:58 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 24 Feb 2025 18:38:58 GMT Subject: RFR: 8349766: GenShen: Bad progress after degen does not always need full gc [v5] In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 19:28:28 GMT, Kelvin Nilsen wrote: >> In generational mode, only upgrade to full GC from degenerated GC if we've done two degenerated cycles in a row and both indicated bad progress. Otherwise, start another concurrent GC, which will most likely degenerate also. But this degenerated cycle will reclaim floating garbage within the young generation much more quickly than a full GC would have done. > > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: > > - Merge branch 'master' of https://git.openjdk.org/jdk into defer-generational-full-gc > - Merge master > - Fix typo in merge conflict resolution > - 8348595: GenShen: Fix generational free-memory no-progress check > > Reviewed-by: phh, xpeng > - 8348092: Shenandoah: assert(nk >= _lowest_valid_narrow_klass_id && nk <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710) > > Reviewed-by: shade > - Merge tag 'jdk-25+10' into defer-generational-full-gc > > Added tag jdk-25+10 for changeset a637ccf2 > - Be less eager to upgrade degen to full gc Marked as reviewed by wkemper (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23552#pullrequestreview-2638092681 From wkemper at openjdk.org Mon Feb 24 20:54:37 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 24 Feb 2025 20:54:37 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v13] In-Reply-To: References: Message-ID: <7e3ebblQE8huN6LKQWeMxBHqJwaN-H6PangQDk57k4g=.1c9d19d0-22e4-4748-9484-87f52055491a@github.com> > There are several changes to the operation of Shenandoah's control threads here. > * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. 
> * The cancellation handling is driven entirely by the cancellation cause > * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed > * The shutdown sequence is simpler > * The generational control thread uses a lock to coordinate updates to the requested cause and generation > * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance > * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles > * The control thread doesn't loop on its own (unless the pacer is enabled). > > ## Testing > * jtreg hotspot_gc_shenandoah > * dacapo, extremem, diluvian, specjbb2015, specjvm2018, heapothesys William Kemper has updated the pull request incrementally with one additional commit since the last revision: Address review feedback (better comments, better names) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23475/files - new: https://git.openjdk.org/jdk/pull/23475/files/915ffbda..1d887fcb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=11-12 Stats: 56 lines in 9 files changed: 24 ins; 17 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/23475.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23475/head:pull/23475 PR: https://git.openjdk.org/jdk/pull/23475 From wkemper at openjdk.org Tue Feb 25 01:43:15 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 25 Feb 2025 01:43:15 GMT Subject: RFR: 8350605: assert(!heap->is_uncommit_in_progress()) failed: Cannot uncommit bitmaps while resetting them Message-ID: The protocol which is meant to prevent regions from being uncommitted while their bitmaps are being reset may fail. This happens when the control thread attempts to wait for the uncommit thread to finish, but the uncommit thread has not yet indicated that it has started. 
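The race described here is the classic one where a waiter checks an in-progress flag the worker has not yet published. A minimal sketch of the lock-based scheme the fix describes — illustrative names only, with `std::mutex`/`std::condition_variable` standing in for HotSpot's Monitor API:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of the fixed protocol: the in-progress flag is only
// ever read or written under the lock, so the control thread can neither
// miss the uncommit thread's start nor its finish.
class UncommitGate {
  std::mutex lock_;
  std::condition_variable cv_;
  bool in_progress_ = false;

public:
  void begin() {                      // uncommit thread: announce start
    std::lock_guard<std::mutex> g(lock_);
    in_progress_ = true;
  }
  void end() {                        // uncommit thread: announce finish
    {
      std::lock_guard<std::mutex> g(lock_);
      in_progress_ = false;
    }
    cv_.notify_all();
  }
  void await_idle() {                 // control thread: block until no uncommit runs
    std::unique_lock<std::mutex> g(lock_);
    cv_.wait(g, [this] { return !in_progress_; });
  }
  bool in_progress() {
    std::lock_guard<std::mutex> g(lock_);
    return in_progress_;
  }
};
```

Because the predicate is re-checked under the lock inside `wait`, it does not matter whether `end()` runs before or after the control thread starts waiting.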
------------- Commit messages: - Use lock to protect in progress flag, remove unnecessary stop lock and flag Changes: https://git.openjdk.org/jdk/pull/23760/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23760&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350605 Stats: 74 lines in 2 files changed: 36 ins; 24 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/23760.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23760/head:pull/23760 PR: https://git.openjdk.org/jdk/pull/23760 From ayang at openjdk.org Tue Feb 25 11:17:06 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 25 Feb 2025 11:17:06 GMT Subject: RFR: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME [v4] In-Reply-To: References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Tue, 18 Feb 2025 09:20:57 GMT, Albert Mingkun Yang wrote: >> Here is an attempt to simplify GCLocker implementation for Serial and Parallel. >> >> GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. >> >> The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. 
The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. >> >> Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. >> >> Test: tier1-8 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - review > - Merge branch 'master' into gclocker > - gclocker Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23367#issuecomment-2681608369 From ayang at openjdk.org Tue Feb 25 11:17:07 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 25 Feb 2025 11:17:07 GMT Subject: Integrated: 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME In-Reply-To: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> References: <8Vqsu8qf5wAN8pZF-8zu8zNhryQa42EZux3nMRChX5k=.63c53ac1-ca69-4a45-a924-9a454e24ea3f@github.com> Message-ID: On Thu, 30 Jan 2025 12:12:29 GMT, Albert Mingkun Yang wrote: > Here is an attempt to simplify GCLocker implementation for Serial and Parallel. > > GCLocker prevents GC when Java threads are in a critical region (i.e., calling JNI critical APIs). 
JDK-7129164 introduces an optimization that updates a shared variable (used to track the number of threads in the critical region) only if there is a pending GC request. However, this also means that after reaching a GC safepoint, we may discover that GCLocker is active, preventing a GC cycle from being invoked. The inability to perform GC at a safepoint adds complexity -- for example, a caller must retry allocation if the request fails due to GC being inhibited by GCLocker. > > The proposed patch uses a readers-writer lock to ensure that all Java threads exit the critical region before reaching a GC safepoint. This guarantees that once inside the safepoint, we can successfully invoke a GC cycle. The approach takes inspiration from `ZJNICritical`, but some regressions were observed in j2dbench (on Windows) and the micro-benchmark in [JDK-8232575](https://bugs.openjdk.org/browse/JDK-8232575). Therefore, instead of relying on atomic operations on a global variable when entering or leaving the critical region, this PR uses an existing thread-local variable with a store-load barrier for synchronization. > > Performance is neutral for all benchmarks tested: DaCapo, SPECjbb2005, SPECjbb2015, SPECjvm2008, j2dbench, and CacheStress. > > Test: tier1-8 This pull request has now been integrated. 
Changeset: a9c9f7f0 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/a9c9f7f0cbb2f2395fef08348bf867ffa8875d73 Stats: 985 lines in 41 files changed: 50 ins; 822 del; 113 mod 8192647: GClocker induced GCs can starve threads requiring memory leading to OOME Reviewed-by: tschatzl, iwalulya, egahlin ------------- PR: https://git.openjdk.org/jdk/pull/23367 From tschatzl at openjdk.org Tue Feb 25 12:24:21 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 25 Feb 2025 12:24:21 GMT Subject: RFR: 8350643: G1: Make loop iteration variable type correspond to limit in G1SurvRateGroup Message-ID: Hi all, in the G1SurvRateGroup class there are three loops that use a `size_t` loop iteration variable that is compared to `uint`s in the condition, causing some implicit narrowing. Change the loop variables to also use `uints`. Pointed out by @kimbarrett during review of JDK-8349906. Testing: local compilation, GHA, checking other loop variables to match type Thanks, Thomas ------------- Commit messages: - 8350643 Changes: https://git.openjdk.org/jdk/pull/23773/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23773&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350643 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23773.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23773/head:pull/23773 PR: https://git.openjdk.org/jdk/pull/23773 From tschatzl at openjdk.org Tue Feb 25 15:04:28 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 25 Feb 2025 15:04:28 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier Message-ID: Hi all, please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. 
The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight, but we would like to have this ready by JDK 25.

### Current situation

With this change, G1 will reduce the post write barrier to much more closely resemble Parallel GC's, as described in the JEP. The reason is that G1 lags behind Parallel/Serial GC in throughput due to its larger barrier.

The main reason for the current barrier is how G1 implements concurrent refinement:
* G1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations.
* For correctness, dirty card updates require fine-grained synchronization between mutator and refinement threads.
* Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible.

These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code:

// Filtering
if (region(@x.a) == region(y)) goto done; // same region check
if (y == null) goto done; // null value check
if (card(@x.a) == young_card) goto done; // write to young gen check
StoreLoad; // synchronize
if (card(@x.a) == dirty_card) goto done;

*card(@x.a) = dirty

// Card tracking
enqueue(card-address(@x.a)) into thread-local-dcq;
if (thread-local-dcq is not full) goto done;

call runtime to move thread-local-dcq into dcqs

done:

Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for Parallel and Serial GC.

The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining.
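For illustration, the filtering portion of the pseudo code above can be modeled as a small runnable sketch. Constants and names are made up for the example (512-byte cards and 1 MiB regions are assumptions, not HotSpot's actual configuration), and the real barrier's enqueue/runtime-call tail is reduced to a comment:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative constants -- not HotSpot's actual values.
constexpr int kLogRegionBytes = 20;   // assume 1 MiB regions
constexpr int kCardShift      = 9;    // assume 512-byte cards
constexpr uint8_t kCleanCard = 0xff, kDirtyCard = 0, kYoungCard = 2;

// Returns true iff the write of 'value' to 'field' must dirty a card,
// i.e. none of the filters fires and the card is not yet dirty.
bool old_barrier_dirties_card(std::vector<uint8_t>& cards,
                              uintptr_t field, uintptr_t value) {
  if ((field >> kLogRegionBytes) == (value >> kLogRegionBytes))
    return false;                           // same region check
  if (value == 0) return false;             // null value check
  uint8_t& card = cards[field >> kCardShift];
  if (card == kYoungCard) return false;     // write to young gen check
  // <-- the real barrier issues a StoreLoad fence here
  if (card == kDirtyCard) return false;     // already dirty
  card = kDirtyCard;                        // dirty the card; real G1 also
  return true;                              // enqueues its address into the dcq
}
```

Note how the second write to the same card is filtered out — this is the property the dirty card queues rely on to bound the number of enqueued card addresses.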
There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links).

The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse-grained synchronization based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a second card table ("refinement table"). The second card table also replaces the dirty card queue. In that scheme the fine-grained synchronization is unnecessary because mutator and refinement threads always write to different memory areas (and no concurrent write where an update can be lost can occur). This removes the necessity for synchronization on every reference write. Also, no card enqueuing is required any more. Only the filters and the card mark remain.

### How this works

In the beginning both the card table and the refinement table are completely unmarked (contain "clean" cards). The mutator dirties the card table until G1 heuristics think that enough cards have been dirtied, based on how much time is allocated for scanning them during the garbage collection. At that point, the card table and the refinement table are exchanged "atomically" using handshakes. The mutator keeps dirtying the card table (the previous, clean refinement table), while the refinement threads look for and refine dirty cards on the refinement table as before.

Refinement of cards is very similar to before: if an interesting reference in a dirty card has been found, G1 records it in the appropriate remembered sets. In this implementation there is an exception for references to the current collection set (typically young gen) - the refinement threads redirty that card on the card table with a special `to-collection-set` value.
This is valid because races with the mutator for that write do not matter - the entire card will eventually be rescanned anyway, regardless of whether it ends up as dirty or to-collection-set. The advantage of marking to-collection-set cards specially is that the next time the card tables are swapped, the refinement threads will not re-refine them, on the assumption that the reference to the collection set will not change. This decreases refinement work substantially. If refinement gets interrupted by GC, the refinement table will be merged with the card table before card scanning, which works as before.

New barrier pseudo-code for an assignment `x.a = y`:

// Filtering
if (region(@x.a) == region(y)) goto done; // same region check
if (y == null) goto done; // null value check
if (card(@x.a) != clean_card) goto done; // skip already non-clean cards

*card(@x.a) = dirty

This is basically the Serial/Parallel GC barrier with additional filters to keep the number of dirty cards as small as possible.

A few more comments about the barrier:
* the barrier now loads the card table base offset from a thread local instead of inlining it. This is necessary for this mechanism to work, as the card table to dirty changes over time; it may even be faster on some architectures (code size), and some architectures already do this.
* all existing pre-filters were kept. Benchmarks showed some significant regressions with respect to pause times and even throughput compared to G1 in master. Using the Parallel GC barrier (just the dirty card write) would be possible, and further investigation on stripping parts will be made as a follow-up.
* the final check tests for non-clean cards to avoid overwriting existing cards, in particular the "to-collection-set" cards described above.

Current G1 marks the cards corresponding to young gen regions as all "young" so that the original barrier could potentially avoid the `StoreLoad`.
This implementation removes this facility (which might be re-introduced later), but measurements showed that pre-dirtying the young generation regions' cards as "dirty" (G1 does not need to use an extra "young" value) did not yield any measurable performance difference.

### Refinement process

The goal of the refinement (threads) is to make sure that the number of cards to scan in the garbage collection stays below a particular threshold. The prototype changes the refinement threads into a single control thread and a set of (refinement) worker threads. Unlike in the previous implementation, the control thread does not do any refinement, but only executes the heuristics to start a calculated number of worker threads and to track refinement progress.

The refinement trigger is based on the currently known number of pending (i.e. dirty) cards on the card table and a pending card generation rate, fairly similar to the previous algorithm. After the refinement control thread determines that it is time to do refinement, it starts the following sequence:

1) **Swap the card table**. This consists of several steps:
    1) **Swap the global card table** - the global card table pointer is swapped; newly created threads and runtime calls will eventually use the new values, at the latest after the next two steps.
    2) **Update the pointers in all JavaThreads'** TLS storage to the new card table pointer using a handshake operation.
    3) **Update the pointers in the GC threads'** TLS storage to the new card table pointer using the SuspendibleThreadSet mechanism.
2) **Snapshot the heap** - determine the extent of work needed for all regions where the refinement threads need to do some work on the refinement table (the previous card table). The snapshot stores the work progress for each region so that work can be interrupted and continued at any time.
This work either consists of refinement of the particular card (old generation regions) or clearing the cards (next collection set/young generation regions).
3) **Sweep the refinement table** by activating the refinement worker threads. The threads refine dirty cards using the heap snapshot, where worker threads claim parts of regions to process.
    * Cards with references to the young generation are not added to the young generation's card-based remembered set. Instead these cards are marked as to-collection-set in the card table and any remaining refinement of that card is skipped.
    * If refinement encounters a card that is already marked as to-collection-set, it is not refined and re-marked as to-collection-set on the card table.
    * During refinement, the refinement table is also cleared (in bulk for collection set regions as they do not need any refinement, and in other regions as they are refined for the non-clean cards).
    * Dirty cards within unparsable heap areas are forwarded to/redirtied on the card table as-is.
4) **Completion work**, mostly statistics.

If the work is interrupted by a non-garbage collection synchronization point, work is suspended temporarily and resumed later using the heap snapshot. After the refinement process the refinement table is all-clean again and ready to be swapped again.

### Garbage collection pause changes

Since a garbage collection (young or full gc) pause may occur at any point during the refinement process, the garbage collection needs some compensating work for the not yet swept parts of the refinement table. Note that this situation is very rare, as the heuristics try to avoid it, so in most cases nothing needs to be done as the refinement table is all clean. If this happens, young collections add a new phase called `Merge Refinement Table` in the garbage collection pause right before the `Merge Heap Roots` phase.
This compensating phase does the following:

0) (Optional) Snapshot the heap if not done yet (if the process has been interrupted between steps 1 and 3 of the refinement process)
1) Merge the refinement table into the card table - in this step the dirty cards of interesting regions are
2) Completion work (statistics)

If a full collection interrupts concurrent refinement, the refinement table is simply cleared and all dirty cards thrown away.

A garbage collection generates new cards (e.g. references from promoted objects into the young generation) on the refinement table. This acts similarly to the extra DCQS used in the previous implementation to record these interesting references/cards and redirty the card table using them. G1 swaps the card tables at the end of the collection to keep the post-condition of the refinement table being all clean (and any to-be-refined cards on the card table) at the end of garbage collection.

### Performance metrics

Following is an overview of the changes in behavior. Some numbers are provided in the CR in the first comment.

#### Native memory usage

The refinement table takes an additional 0.2% of the Java heap size of native memory compared to JDK 21 and above (in JDK 21 we removed one card table sized data structure, so this is a non-issue when updating from before). Some of that additional memory usage is automatically reclaimed by removing the dirty card queues. Additional memory is reclaimed by managing the cards containing to-collection-set references on the card table, by dropping the explicit remembered sets for the young generation completely as well as any remembered set entries which would otherwise be duplicated into the other regions' remembered sets. In some applications/benchmarks these gains completely offset the additional card table; however, most of the time this is not the case, particularly for throughput applications currently.
It is possible to allocate the refinement table lazily, which means that since these applications often do not need any concurrent refinement, there is no overhead at all but actually a net reduction of native memory usage. This is not implemented in this prototype.

#### Latency ("Pause times")

Not affected or slightly better. Pause times decrease due to a shorter "Merge remembered sets" phase, since no work is required for the remembered sets of the young generation - they are always already on the card table! However, merging of the refinement table into the card table is extremely fast and in my measurements always faster than merging remembered sets for the young gen. Since this work is linearly scanning some memory, it is embarrassingly parallel too.

The cards created during garbage collection do not need to be redirtied, so that phase has also been removed.

The card table swap is based on predictions for mutator card dirtying rate and refinement rate as before, and the policy is actually fairly similar to before. It is still rather aggressive, but in most cases takes less cpu resources than the previous one, mostly because refining takes less cpu time. Many applications do not do any refinement at all, as before. More investigation could be done to improve this in the future.

#### Throughput

This change always increases throughput in my measurements; depending on benchmark/application it may not actually show up in scores, though. Due to the pre-barrier and the additional filters in the barrier, G1 is still slower than Parallel on raw throughput benchmarks, but is typically somewhere half-way to Parallel GC or closer.

### Platform support

Since the post write barrier changed, additional work for some platforms is required to allow this change to proceed. At this time all work for all platforms is done, but needs testing:
- GraalVM (contributed by the GraalVM team)
- S390 (contributed by A. Kumar from IBM)
- PPC (contributed by M. Doerr, from SAP)
- ARM (should work, HelloWorld compiles and runs)
- RISCV (should work, HelloWorld compiles and runs)
- x86 (should work, build/HelloWorld compiles and runs)

None of the above mentioned platforms implement the barrier method to write cards for a reference array (aarch64 and x64 are fully implemented); they call the runtime as before. I believe it is doable fairly easily now with this simplified barrier for some extra performance, but not necessary.

### Alternatives

The JEP text extensively discusses alternatives.

### Reviewing

The change can be roughly divided into these fairly isolated parts:
* platform specific changes to the barrier
* refinement and refinement control thread changes; this is best reviewed starting from the `G1ConcurrentRefineThread::run_service` method
* changes to garbage collection: `merge_refinement_table()` in `g1RemSet.cpp`
* policy modifications, typically related to code around the calls to `G1Policy::record_dirtying_stats`

Further information is available in the [JEP draft](https://bugs.openjdk.org/browse/JDK-8340827); there is also a bit more extensive discussion of the change on my [blog](https://tschatzl.github.io/2025/02/21/new-write-barriers.html).

Some additional comments:
* the pre-marking of young generation cards has been removed. Benchmarks did not show any significant difference either way. To me this makes some sense, because the entire young gen will quickly get marked anyway. I.e. one only saves a single additional card table write (for every card). With the old barrier the cost of a card table mark was much higher.
* G1 sets `UseCondCardMark` to true by default. The conditional card mark corresponds to the third filter in the write barrier now, and since I decided to keep all filters for this change, it makes sense to directly use this mechanism.

If there are any questions, feel free to ask.
Testing: tier1-7 (multiple tier1-7, tier1-8 with slightly older versions) Thanks, Thomas ------------- Commit messages: - * only provide byte map base for JavaThreads - * mdoerr review: fix comments in ppc code - * fix crash when writing dirty cards for memory regions during card table switching - * remove mention of "enqueue" or "enqueuing" for actions related to post barrier - * remove some commented out debug code - Card table as DCQ Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342382 Stats: 6543 lines in 103 files changed: 2162 ins; 3461 del; 920 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From mdoerr at openjdk.org Tue Feb 25 15:04:29 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 25 Feb 2025 15:04:29 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier In-Reply-To: References: Message-ID: On Sun, 23 Feb 2025 18:53:33 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. 
> > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... PPC64 code looks great! 
Thanks for doing this! Only some comments are no longer correct. src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp line 244: > 242: > 243: __ xorr(R0, store_addr, new_val); // tmp1 := store address ^ new value > 244: __ srdi_(R0, R0, G1HeapRegion::LogOfHRGrainBytes); // tmp1 := ((store address ^ new value) >> LogOfHRGrainBytes) Comment: R0 is used instead of tmp1 src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp line 259: > 257: > 258: __ ld(tmp1, G1ThreadLocalData::card_table_base_offset(), thread); > 259: __ srdi(tmp2, store_addr, CardTable::card_shift()); // tmp1 := card address relative to card table base Comment: tmp2 is used, here src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp line 261: > 259: __ srdi(tmp2, store_addr, CardTable::card_shift()); // tmp1 := card address relative to card table base > 260: if (UseCondCardMark) { > 261: __ lbzx(R0, tmp1, tmp2); // tmp1 := card address Can you remove the comment, please? It's wrong. ------------- PR Review: https://git.openjdk.org/jdk/pull/23739#pullrequestreview-2637143540 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1967669777 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1967670850 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1967671593 From duke at openjdk.org Tue Feb 25 15:04:29 2025 From: duke at openjdk.org (Piotr Tarsa) Date: Tue, 25 Feb 2025 15:04:29 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier In-Reply-To: References: Message-ID: On Sun, 23 Feb 2025 18:53:33 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. 
> > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). 
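A minimal, compilable model of the filtering sequence in the pseudocode quoted above; the region size, card values and flat card table here are assumptions for illustration, not the actual HotSpot implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Model of the G1 post-write barrier filters described in the quoted PR text.
// All constants and names are illustrative assumptions; HotSpot emits the real
// barrier as compiled code, not as a C++ function.
constexpr uintptr_t kLogRegionBytes = 21;  // assume 2 MiB heap regions
constexpr uintptr_t kCardShift      = 9;   // 512-byte cards
constexpr uint8_t kDirtyCard = 0, kYoungCard = 1, kCleanCard = 0xff;

static uint8_t card_table[1u << 16];       // fake flat card table

inline uint8_t* card_for(uintptr_t addr) {
  return &card_table[(addr >> kCardShift) % sizeof(card_table)];
}

// Returns true iff the card was dirtied, i.e. no filter applied.
bool post_write_barrier(uintptr_t field_addr, uintptr_t new_val) {
  if (new_val == 0) return false;                                      // null value check
  if (((field_addr ^ new_val) >> kLogRegionBytes) == 0) return false;  // same region check
  uint8_t* card = card_for(field_addr);
  if (*card == kYoungCard) return false;                               // write to young gen
  // The real barrier issues a StoreLoad fence here, then:
  if (*card == kDirtyCard) return false;                               // already dirty
  *card = kDirtyCard;
  // The current barrier would now enqueue the card address into a DCQ;
  // the proposed barrier stops here, much like Parallel GC's.
  return true;
}
```

Counting how often each filter fires on a workload is a cheap way to see why the same-region and already-dirty checks, and the removal of the enqueue, pay off.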
> > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... in this PR you've written: if (region(@x.a) != region(y)) goto done; // same region check but on https://tschatzl.github.io/2025/02/21/new-write-barriers.html you wrote: (1) if (region(x.a) == region(y)) goto done; // Ignore references within the same region/area I guess the second one is correct. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2677075290 From stuefe at openjdk.org Tue Feb 25 15:04:29 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 25 Feb 2025 15:04:29 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier In-Reply-To: References: Message-ID: On Sun, 23 Feb 2025 18:53:33 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations.
> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... @tschatzl I did not contribute the ppc port. Did you mean @TheRealMDoerr or @reinrich ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2677512780 From tschatzl at openjdk.org Tue Feb 25 15:13:43 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 25 Feb 2025 15:13:43 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v2] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... 
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * remove unnecessarily added logging ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/0100d8e2..9ef9c5f4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=00-01 Stats: 4 lines in 4 files changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Tue Feb 25 16:46:33 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 25 Feb 2025 16:46:33 GMT Subject: RFR: 8349906: G1: Improve initial survivor rate for newly used young regions [v2] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that tries to improve the survivor rate initial values for newly expanded regions. > > Currently G1 uses `InitialSurvivorRate` as survivor rate for such regions, but it is typically a pretty bad choice because > > * it's rather conservative, estimating that 40% of region contents will survive > * such a conservative value is kind of bad particularly in cases for regions that are expanded late in the mutator phase because they are not frequently updated (and with our running weights changes get propagated over a very long time), i.e. this 40% sticks for a long time (*) > * it is a random value, i.e. not particularly specific to the application. > > The suggestion is to use the survivor rate for the last region we know the survivor rate already. > > (*) to clarify this a little: G1 keeps track of `[0...m]` survivor rate predictors. For a given garbage collection, `[0...n]` of those are updated (`n` is the number of eden/survivor regions depending on the rate group). 
However those for `]n...m]` are not, particularly those in that range that are seldom allocated, the predictors are not updated very frequently. Now the young gen sizing uses these predictions "at the end" of the predictor anyway, and since they are infrequently updated and their values are very conservative, G1 won't expand young gen as much as it could/should. > > Testing: gha, tier1-7 (with other changes) > > Hth, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * kbarrett review: do not change the type of loop variable * ayang review: use actual last value instead of prediction for newly allocated survivor rate groups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23584/files - new: https://git.openjdk.org/jdk/pull/23584/files/5c4ded01..a09bc25e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23584&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23584&range=00-01 Stats: 10 lines in 1 file changed: 7 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23584.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23584/head:pull/23584 PR: https://git.openjdk.org/jdk/pull/23584 From shade at openjdk.org Tue Feb 25 20:13:01 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Feb 2025 20:13:01 GMT Subject: RFR: 8348278: Trim InitialRAMPercentage to improve startup in default modes In-Reply-To: References: Message-ID: On Thu, 23 Jan 2025 11:27:46 GMT, Aleksey Shipilev wrote: > See bug for discussion. This is the code change, which is simple. What is not simple is deciding what the new value should be. The change would probably require CSR, which I can file after we agree on the value. 
> > I think cutting to 0.2% of RAM size gets us into good sweet spot: > - On huge 1024G machine, this yields 2G initial heap > - On reasonably sized 128G machine, this gives 256M initial heap > - On smaller 1G container, this gives 2M initial heap > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` Not now, bot. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23262#issuecomment-2683175870 From wkemper at openjdk.org Tue Feb 25 22:06:19 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 25 Feb 2025 22:06:19 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v14] In-Reply-To: References: Message-ID: > There are several changes to the operation of Shenandoah's control threads here. > * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. > * The cancellation handling is driven entirely by the cancellation cause > * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed > * The shutdown sequence is simpler > * The generational control thread uses a lock to coordinate updates to the requested cause and generation > * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance > * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles > * The control thread doesn't loop on its own (unless the pacer is enabled). > > ## Testing > * jtreg hotspot_gc_shenandoah > * dacapo, extremem, diluvian, specjbb2015, specjvm2018, heapothesys William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
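The three 0.2%-of-RAM sizing examples quoted just above are all consistent with taking RAM/512 (just under 0.2%); a quick arithmetic check, not the JVM's actual sizing code:

```cpp
#include <cassert>
#include <cstdint>

// Sanity-check the quoted InitialRAMPercentage examples. RAM/512 (~0.195%)
// reproduces all three figures exactly; this is plain arithmetic only.
constexpr uint64_t M = 1024ull * 1024;
constexpr uint64_t G = 1024 * M;

constexpr uint64_t approx_initial_heap(uint64_t ram_bytes) {
  return ram_bytes / 512;
}
```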
The pull request now contains 32 commits: - Merge tag 'jdk-25+11' into fix-control-regulator-threads Added tag jdk-25+11 for changeset 0131c1bf - Address review feedback (better comments, better names) - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads - Old gen bootstrap cycle must make it to init mark - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads - Improve message for assertion - Make shutdown safer for threads requesting (or expecting) gc - Do not accept requests if control thread is terminating - Notify waiters when control thread terminates - Add event for control thread state changes - ... and 22 more: https://git.openjdk.org/jdk/compare/0131c1bf...d7858deb ------------- Changes: https://git.openjdk.org/jdk/pull/23475/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=13 Stats: 927 lines in 18 files changed: 302 ins; 291 del; 334 mod Patch: https://git.openjdk.org/jdk/pull/23475.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23475/head:pull/23475 PR: https://git.openjdk.org/jdk/pull/23475 From xpeng at openjdk.org Tue Feb 25 22:47:16 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 25 Feb 2025 22:47:16 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings Message-ID: The change is to improve the observability of Shenandoah GC, basically there are two changes for Shenandoah GC timings in this PR: 1. Net GC pause timings include the time to propagate GC state to Java threads 2. 
Add new timing "Propagate GC state" in Shenandoah GC timing logs With the change, the new GC timing log will be like: [11.056s][info][gc,stats ] Concurrent Reset 89 us [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us [11.056s][info][gc,stats ] Update Region States 3 us [11.056s][info][gc,stats ] Propagate GC state 1 us [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x [11.056s][info][gc,stats ] CMR: 456 us [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x [11.057s][info][gc,stats ] CM: 3043 us [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, [11.057s][info][gc,stats ] Flush SATB 204 us [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x [11.057s][info][gc,stats ] Propagate GC state 2 us [11.057s][info][gc,stats ] Update Region States 12 us [11.057s][info][gc,stats ] Choose Collection Set 25 us [11.057s][info][gc,stats ] Rebuild Free Set 29 us [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x [11.057s][info][gc,stats ] CWRF: 17 us [11.057s][info][gc,stats ] CWRF: Weak References 17 us, workers (us): 15, 1, 0, ---, ---, ---, [11.057s][info][gc,stats ] Concurrent Weak Roots 413 us [11.057s][info][gc,stats ] Roots 203 us, parallelism: 1.95x [11.057s][info][gc,stats ] CWR: 396 us [11.057s][info][gc,stats ] CWR: Code Cache Roots 295 us, workers (us): 90, 96, 109, ---, ---, ---, [11.057s][info][gc,stats ] CWR: VM Weak Roots 100 us, workers (us): 48, 37, 14, ---, ---, ---, 
[11.057s][info][gc,stats ] CWR: CLDG Roots 2 us, workers (us): ---, ---, 2, ---, ---, ---, [11.058s][info][gc,stats ] Rendezvous 197 us [11.058s][info][gc,stats ] Concurrent Cleanup 35 us [11.058s][info][gc,stats ] Concurrent Class Unloading 486 us [11.058s][info][gc,stats ] Unlink Stale 398 us [11.058s][info][gc,stats ] System Dictionary 5 us [11.058s][info][gc,stats ] Weak Class Links 0 us [11.058s][info][gc,stats ] Code Roots 391 us [11.058s][info][gc,stats ] Rendezvous 69 us [11.058s][info][gc,stats ] Purge Unlinked 4 us [11.058s][info][gc,stats ] Code Roots 0 us [11.058s][info][gc,stats ] CLDG 3 us [11.058s][info][gc,stats ] Pause Final Roots (G) 272 us [11.058s][info][gc,stats ] Pause Final Roots (N) 18 us [11.058s][info][gc,stats ] Propagate GC state 3 us ### Test - [x] make test TEST=hotspot_gc_shenandoah ------------- Commit messages: - Log time to propagate GC state - 8350314: Shenandoah: Capture thread state sync times in GC timings Changes: https://git.openjdk.org/jdk/pull/23759/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23759&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350314 Stats: 47 lines in 5 files changed: 40 ins; 7 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23759.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23759/head:pull/23759 PR: https://git.openjdk.org/jdk/pull/23759 From shade at openjdk.org Tue Feb 25 22:47:16 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Feb 2025 22:47:16 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 01:20:35 GMT, Xiaolong Peng wrote: > The change is to improve the observability of Shenandoah GC, basically there are two changes for Shenandoah GC timings in this PR: > > 1. Net GC pause timings include the time to propagate GC state to Java threads > 2. 
Add new timing "Propagate GC state" in Shenandoah GC timing logs > > With the change, the new GC timing log will be like: > > [11.056s][info][gc,stats ] Concurrent Reset 89 us > [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us > [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us > [11.056s][info][gc,stats ] Update Region States 3 us > [11.056s][info][gc,stats ] Propagate GC state 1 us > [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x > [11.056s][info][gc,stats ] CMR: 456 us > [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, > [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, > [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, > [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x > [11.057s][info][gc,stats ] CM: 3043 us > [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, > [11.057s][info][gc,stats ] Flush SATB 204 us > [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us > [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us > [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x > [11.057s][info][gc,stats ] Propagate GC state 2 us > [11.057s][info][gc,stats ] Update Region States 12 us > [11.057s][info][gc,stats ] Choose Collection Set 25 us > [11.057s][info][gc,stats ] Rebuild Free Set 29 us > [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x > [11.057s][info][gc,stats ] CWRF: 17 us > [11.057s][info][gc,stats ] CWRF: Weak References 17 us, workers (us): 15, 1, 0, ---, ---, ---, > [11.057s][info][gc,stats ] Concurrent Weak Roots 413 us > [11.057s][info][gc,stats ... 
src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 308: > 306: > 307: op_init_mark(); > 308: ShenandoahHeap::heap()->propagate_gc_state_to_all_threads(); I would say move it downwards into `op_init_mark` and wrap with its own subtimer, e.g. with `ShenandoahGCPhase phase(ShenandoahPhaseTimings::init_propagate_gcstate);`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23759#discussion_r1969328009 From wkemper at openjdk.org Tue Feb 25 23:03:53 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 25 Feb 2025 23:03:53 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 01:20:35 GMT, Xiaolong Peng wrote: > The change is to improve the observability of Shenandoah GC, basically there are two changes for Shenandoah GC timings in this PR: > > 1. Net GC pause timings include the time to propagate GC state to Java threads > 2. Add new timing "Propagate GC state" in Shenandoah GC timing logs > > With the change, the new GC timing log will be like: > > [11.056s][info][gc,stats ] Concurrent Reset 89 us > [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us > [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us > [11.056s][info][gc,stats ] Update Region States 3 us > [11.056s][info][gc,stats ] Propagate GC state 1 us > [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x > [11.056s][info][gc,stats ] CMR: 456 us > [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, > [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, > [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, > [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x > [11.057s][info][gc,stats ] CM: 3043 us > [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, > 
[11.057s][info][gc,stats ] Flush SATB 204 us > [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us > [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us > [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x > [11.057s][info][gc,stats ] Propagate GC state 2 us > [11.057s][info][gc,stats ] Update Region States 12 us > [11.057s][info][gc,stats ] Choose Collection Set 25 us > [11.057s][info][gc,stats ] Rebuild Free Set 29 us > [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x > [11.057s][info][gc,stats ] CWRF: 17 us > [11.057s][info][gc,stats ] CWRF: Weak References 17 us, workers (us): 15, 1, 0, ---, ---, ---, > [11.057s][info][gc,stats ] Concurrent Weak Roots 413 us > [11.057s][info][gc,stats ... Did we really see `propagate_gc_state_to_all_threads` taking a long time? Or was it exiting the safepoint (i.e., after the state had been propagated) that took a long time? src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 694: > 692: { > 693: ShenandoahGCPhase phase(ShenandoahPhaseTimings::init_propagate_gc_state); > 694: ShenandoahHeap::heap()->propagate_gc_state_to_all_threads(); A nit, but can we use the `heap` variable that is already in scope for these `propagate_gc_state_to_all_threads` calls? ------------- Changes requested by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23759#pullrequestreview-2642654296 PR Review Comment: https://git.openjdk.org/jdk/pull/23759#discussion_r1970660730 From xpeng at openjdk.org Tue Feb 25 23:22:56 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 25 Feb 2025 23:22:56 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 23:01:24 GMT, William Kemper wrote: > Did we really see `propagate_gc_state_to_all_threads` taking a long time? Or was it exiting the safepoint (i.e., after the state had been propagated) that took a long time? 
No, we discussed it last week in the Slack channel: `propagate_gc_state_to_all_threads` usually takes less than 10 ns for ~1k threads in our tests, so it is not a problem. It was the `futex` call when exiting the safepoint that took a long time. We have a [fix](https://bugs.openjdk.org/browse/JDK-8350285) in ShenandoahLock as a mitigation, since the change we made last year affects the scheduler; meanwhile Aleksey is working on https://bugs.openjdk.org/browse/JDK-8350324, which should improve the time to leave a safepoint and has much broader impact. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23759#issuecomment-2683510642 From xpeng at openjdk.org Tue Feb 25 23:27:22 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 25 Feb 2025 23:27:22 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v2] In-Reply-To: References: Message-ID: > The change is to improve the observability of Shenandoah GC, basically there are two changes for Shenandoah GC timings in this PR: > > 1. Net GC pause timings include the time to propagate GC state to Java threads > 2.
Add new timing "Propagate GC state" in Shenandoah GC timing logs > > With the change, the new GC timing log will be like: > > [11.056s][info][gc,stats ] Concurrent Reset 89 us > [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us > [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us > [11.056s][info][gc,stats ] Update Region States 3 us > [11.056s][info][gc,stats ] Propagate GC state 1 us > [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x > [11.056s][info][gc,stats ] CMR: 456 us > [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, > [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, > [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, > [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x > [11.057s][info][gc,stats ] CM: 3043 us > [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, > [11.057s][info][gc,stats ] Flush SATB 204 us > [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us > [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us > [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x > [11.057s][info][gc,stats ] Propagate GC state 2 us > [11.057s][info][gc,stats ] Update Region States 12 us > [11.057s][info][gc,stats ] Choose Collection Set 25 us > [11.057s][info][gc,stats ] Rebuild Free Set 29 us > [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x > [11.057s][info][gc,stats ] CWRF: 17 us > [11.057s][info][gc,stats ] CWRF: Weak References 17 us, workers (us): 15, 1, 0, ---, ---, ---, > [11.057s][info][gc,stats ] Concurrent Weak Roots 413 us > [11.057s][info][gc,stats ... 
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23759/files - new: https://git.openjdk.org/jdk/pull/23759/files/4940e451..cad4a434 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23759&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23759&range=00-01 Stats: 7 lines in 3 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23759.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23759/head:pull/23759 PR: https://git.openjdk.org/jdk/pull/23759 From xpeng at openjdk.org Tue Feb 25 23:27:22 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 25 Feb 2025 23:27:22 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v2] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 23:00:28 GMT, William Kemper wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 694: > >> 692: { >> 693: ShenandoahGCPhase phase(ShenandoahPhaseTimings::init_propagate_gc_state); >> 694: ShenandoahHeap::heap()->propagate_gc_state_to_all_threads(); > > A nit, but can we use the `heap` variable that is already in scope for these `propagate_gc_state_to_all_threads` calls? Thanks, I have updated the PR to use `heap` variable in scope. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23759#discussion_r1970679549 From wkemper at openjdk.org Tue Feb 25 23:49:53 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 25 Feb 2025 23:49:53 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v2] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 23:27:22 GMT, Xiaolong Peng wrote: >> The change is to improve the observability of Shenandoah GC, basically there are two changes for Shenandoah GC timings in this PR: >> >> 1. Net GC pause timings include the time to propagate GC state to Java threads >> 2. Add new timing "Propagate GC state" in Shenandoah GC timing logs >> >> With the change, the new GC timing log will be like: >> >> [11.056s][info][gc,stats ] Concurrent Reset 89 us >> [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us >> [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us >> [11.056s][info][gc,stats ] Update Region States 3 us >> [11.056s][info][gc,stats ] Propagate GC state 1 us >> [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x >> [11.056s][info][gc,stats ] CMR: 456 us >> [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, >> [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x >> [11.057s][info][gc,stats ] CM: 3043 us >> [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, >> [11.057s][info][gc,stats ] Flush SATB 204 us >> [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us >> [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us >> [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x >> [11.057s][info][gc,stats ] Propagate GC state 2 us >> [11.057s][info][gc,stats ] Update 
Region States 12 us >> [11.057s][info][gc,stats ] Choose Collection Set 25 us >> [11.057s][info][gc,stats ] Rebuild Free Set 29 us >> [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x >> [11.057s][info][gc,stats ] CWRF: 17 us >> [11.057s][info][gc,stats ] CWRF: Weak References 17 us, workers (us): 15, 1, 0, ---, ---, ---, >> [11.057s][info][gc,stats ] Concu... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Propagating gc state for init update refs is vestigial. src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1081: > 1079: heap->pacer()->setup_for_update_refs(); > 1080: } > 1081: { This one isn't necessary. This safepoint only happens when the pacer or the verifier is enabled. We moved init update ref gc state propagation into a handshake (see `ShenandoahHeap::concurrent_prepare_for_update_refs`). Sorry, I should have caught this in my first review. ------------- Changes requested by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23759#pullrequestreview-2642707574 PR Review Comment: https://git.openjdk.org/jdk/pull/23759#discussion_r1970695230 From wkemper at openjdk.org Tue Feb 25 23:49:54 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 25 Feb 2025 23:49:54 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v2] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 23:45:21 GMT, William Kemper wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1081: > >> 1079: heap->pacer()->setup_for_update_refs(); >> 1080: } >> 1081: { > > This one isn't necessary. This safepoint only happens when the pacer or the verifier is enabled. 
We moved init update ref gc state propagation into a handshake (see `ShenandoahHeap::concurrent_prepare_for_update_refs`). Sorry, I should have caught this in my first review. I'm also currently working on removing the `final_roots` safepoint and fixing up the phase timings for the newish concurrent init update refs phase. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23759#discussion_r1970696214 From xpeng at openjdk.org Wed Feb 26 00:05:03 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 26 Feb 2025 00:05:03 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v2] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 23:46:49 GMT, William Kemper wrote: >> src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1081: >> >>> 1079: heap->pacer()->setup_for_update_refs(); >>> 1080: } >>> 1081: { >> >> This one isn't necessary. This safepoint only happens when the pacer or the verifier is enabled. We moved init update ref gc state propagation into a handshake (see `ShenandoahHeap::concurrent_prepare_for_update_refs`). Sorry, I should have caught this in my first review. > > I'm also currently working on removing the `final_roots` safepoint and fixing up the phase timings for the newish concurrent init update refs phase. Sorry, I should have read the code in `op_init_update_refs`; it doesn't change GC state, so we can remove the `propagate_gc_state_to_all_threads` call to clean up the code a little bit after you replaced the init_update_refs pause with a handshake.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23759#discussion_r1970706937 From xpeng at openjdk.org Wed Feb 26 00:08:57 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 26 Feb 2025 00:08:57 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v2] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 00:02:45 GMT, Xiaolong Peng wrote: >> I'm also currently working on removing the `final_roots` safepoint and fixing up the phase timings for the newish concurrent init update refs phase. > > Sorry, I should have read the code in `op_init_update_refs`; it doesn't change GC state, so we can remove the call of `propagate_gc_state_to_all_threads` to clean up the code a little bit. It was missed when you replaced the init_update_refs pause with a handshake. For `final_roots`, I think I should leave it as it is in this PR; later you will remove the timings and gc state propagation into a handshake anyway.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23759#discussion_r1970726624 From wkemper at openjdk.org Wed Feb 26 00:34:51 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 26 Feb 2025 00:34:51 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v3] In-Reply-To: References: Message-ID: <0kZCj6U-om8Gz_iMeXiQf1jSitVjhTvy2PYwLjllvGM=.5321e4f5-7e39-4d5f-adb4-4b41cf23dce8@github.com> On Wed, 26 Feb 2025 00:01:12 GMT, Xiaolong Peng wrote: >> The change is to improve the observability of Shenandoah GC, basically there are two changes for Shenandoah GC timings in this PR: >> >> 1. Net GC pause timings include the time to propagate GC state to Java threads >> 2. Add new timing "Propagate GC state" in Shenandoah GC timing logs >> >> With the change, the new GC timing log will be like: >> >> [11.056s][info][gc,stats ] Concurrent Reset 89 us >> [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us >> [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us >> [11.056s][info][gc,stats ] Update Region States 3 us >> [11.056s][info][gc,stats ] Propagate GC state 1 us >> [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x >> [11.056s][info][gc,stats ] CMR: 456 us >> [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, >> [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x >> [11.057s][info][gc,stats ] CM: 3043 us >> [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, >> [11.057s][info][gc,stats ] Flush SATB 204 us >> [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us >> [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us >> [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x 
>> [11.057s][info][gc,stats ] Propagate GC state 2 us >> [11.057s][info][gc,stats ] Update Region States 12 us >> [11.057s][info][gc,stats ] Choose Collection Set 25 us >> [11.057s][info][gc,stats ] Rebuild Free Set 29 us >> [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x >> [11.057s][info][gc,stats ] CWRF: 17 us >> [11.057s][info][gc,stats ] CWRF: Weak References 17 us, workers (us): 15, 1, 0, ---, ---, ---, >> [11.057s][info][gc,stats ] Concu... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Remove propagate_gc_state_to_all_threads call from op_init_update_refs Thank you for the changes. ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23759#pullrequestreview-2642754796 From ysr at openjdk.org Wed Feb 26 01:32:03 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 26 Feb 2025 01:32:03 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v14] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 22:06:19 GMT, William Kemper wrote: >> There are several changes to the operation of Shenandoah's control threads here. >> * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. 
>> * The cancellation handling is driven entirely by the cancellation cause >> * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed >> * The shutdown sequence is simpler >> * The generational control thread uses a lock to coordinate updates to the requested cause and generation >> * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance >> * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles >> * The control thread doesn't loop on its own (unless the pacer is enabled). >> >> ## Testing >> * jtreg hotspot_gc_shenandoah >> * dacapo, extremem, diluvian, specjbb2015, specjvm2018, heapothesys > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: > > - Merge tag 'jdk-25+11' into fix-control-regulator-threads > > Added tag jdk-25+11 for changeset 0131c1bf > - Address review feedback (better comments, better names) > - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads > - Old gen bootstrap cycle must make it to init mark > - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads > - Improve message for assertion > - Make shutdown safer for threads requesting (or expecting) gc > - Do not accept requests if control thread is terminating > - Notify waiters when control thread terminates > - Add event for control thread state changes > - ... and 22 more: https://git.openjdk.org/jdk/compare/0131c1bf...d7858deb A few random comments, mostly on documentation, and a few assertion suggestions. Rest looks good. Thanks for tightening up the code via your changes in this PR. I don't think I should need to re-review any changes you make stemming from my (really quite minor) comments. Reviewed.
src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 114: > 112: } > 113: > 114: void ShenandoahGenerationalControlThread::check_for_request(ShenandoahGCRequest& request) { Comment: // Fill in the cause, generation requested, and set gc_mode, all under the lock. // Make any relevant changes to the control state of heuristics or policy objects. Just as an aside, these internal work methods have cascading effects on a bunch of states, all covered by the control lock. It would be interesting to see which threads contend for this lock and whether there are circumstances under which the lock might become hot or contended. src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 125: > 123: if (request.cause == GCCause::_shenandoah_concurrent_gc) { > 124: request.generation = _heap->young_generation(); > 125: _heap->clear_cancelled_gc(false); label name of parameter to the call to help the reader. `/* clear_oom_handler */` src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 717: > 715: > 716: void ShenandoahGenerationalControlThread::notify_control_thread(MonitorLocker& ml, GCCause::Cause cause, ShenandoahGeneration* generation) { > 717: assert(_control_lock.is_locked(), "Request lock must be held here"); Somewhat paranoid suggestion: I'd use the stronger, if slightly more expensive, `owned_by_self()` rather than the weaker `is_locked()`. You can alternatively use `assert_lock_strong(...)`. src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 720: > 718: log_debug(gc, thread)("Notify control (%s): %s, %s", gc_mode_name(gc_mode()), GCCause::to_string(cause), generation->name()); > 719: _requested_gc_cause = cause; > 720: _requested_generation = generation; This _may_ be a good spot to add any potential invariant (sanity) checks for the consistency of `cause` and `generation` to catch any potential issues.
For example, I might check I am not overwriting legit previous requests (?), and that any new request I am creating is sensible. Such assertion/invariant checks may be relegated into a work method to avoid cluttering this method. src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 730: > 728: > 729: void ShenandoahGenerationalControlThread::notify_cancellation(MonitorLocker& ml, GCCause::Cause cause) { > 730: assert(_heap->cancelled_gc(), "GC should already be cancelled"); Not sure about this, but if cancellations can happen only when `request*` fields are clear (or maybe not?), then this would be a good place to do such invariant checks (much as the suggestion above in `notify_control_thread()`). src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 795: > 793: if (_mode != new_mode) { > 794: log_debug(gc, thread)("Transition from: %s to: %s", gc_mode_name(_mode), gc_mode_name(new_mode)); > 795: EventMark event("Control thread transition from: %s, to %s", gc_mode_name(_mode), gc_mode_name(new_mode)); As in my suggestions above for `notify_control_thread` and `notify_cancellation`, this might be an opportune time to do any sanity/consistency checks for `gc_mode` updates (wrt the other two `_request*` fields). src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 75: > 73: > 74: // The mode is read frequently by requesting threads and only ever written by the control thread. > 75: volatile GCMode _mode; A bit of a nit: Any reason not to just call it `_gc_mode` which seems now to be its _de facto_ name due to its accessor method name and how it's referred to in comments? src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 132: > 130: > 131: void set_gc_mode(GCMode new_mode); > 132: void set_gc_mode(MonitorLocker& ml, GCMode new_mode); // Set gc mode under lock and post a notification. The second variant is called from // contexts where the lock is already held.
src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 135: > 133: static const char* gc_mode_name(GCMode mode); > 134: > 135: // Takes the request lock and updates the requested cause and generation, then notifies the control thread. // The second variant is used from contexts where the lock is already held. src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 139: > 137: void notify_control_thread(MonitorLocker& ml, GCCause::Cause cause, ShenandoahGeneration* generation); > 138: > 139: // Notifies the control thread, but does not update the requested cause or generation. ```// The second variant ...``` src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 143: > 141: void notify_cancellation(MonitorLocker& ml, GCCause::Cause cause); > 142: > 143: void maybe_set_aging_cycle(); 1 line documentation comment for the private APIs. src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 150: > 148: GCMode prepare_for_explicit_gc_request(ShenandoahGCRequest &request); > 149: > 150: GCMode prepare_for_concurrent_gc_request(ShenandoahGCRequest &request); Documentation of private APIs. Bit of a nit: These all seem to take a request type, fill in some yet unfilled fields, and return a GC mode. So they look to me like `prepare_request_for_` rather than `prepare_for__request`. src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 443: > 441: > 442: public: > 443: // Returns true if and only if cancellation request was successfully communicated.
This comment should probably go at line 455, and here you could say ```// Has GC been cancelled?``` src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 687: > 685: HeapWord* allocate_from_gclab_slow(Thread* thread, size_t size); > 686: HeapWord* allocate_new_gclab(size_t min_size, size_t word_size, size_t* actual_size); > 687: bool retry_allocation(size_t original_full_gc_count) const; It feels like this should be called `should_retry_allocation()`. Also a 1-line documentation comment: ``` // We want to retry an unsuccessful attempt at allocation until at least a full gc. ``` ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23475#pullrequestreview-2642645029 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1970692572 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1970689728 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1970730527 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1970747110 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1970750529 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1970752575 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1970669392 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1970673227 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1970754002 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1970754196 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1970674398 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1970678568 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1970758140 PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1970761571 From ysr at openjdk.org Wed Feb 26 01:32:04 2025 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Wed, 26 Feb 2025 01:32:04 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v12] In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 17:50:38 GMT, William Kemper wrote: >> src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 64: >> >>> 62: private: >>> 63: // This lock is used to coordinate setting the _requested_gc_cause, _requested generation >>> 64: // and _gc_mode. It is important that these be changed together and have a consistent view. >> >> In that case, for ease of maintenance, I'd move the declaration of all of the 3 data members that this lock protects next to this lock, either immediately preceding or immediately succeeding its declaration in the body of this class. >> >> Are these data members always both read and written under this lock? If so, then `_gc_mode` below doesn't need to be defined `volatile`. > > The `_gc_mode` is read without the lock by the regulator thread. However, the regulator thread does take the lock and reads `_gc_mode` again under the lock before making any state changes. Makes sense -- so the regulator uses a double-checked locking idiom. Maybe note that somewhere where you talk about this lock and the volatile _gc_mode. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1970654476 From ysr at openjdk.org Wed Feb 26 01:32:05 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 26 Feb 2025 01:32:05 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v12] In-Reply-To: References: Message-ID: On Sat, 15 Feb 2025 00:34:59 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 30 commits: >> >> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads >> - Old gen bootstrap cycle must make it to init mark >> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads >> - Improve message for assertion >> - Make shutdown safer for threads requesting (or expecting) gc >> - Do not accept requests if control thread is terminating >> - Notify waiters when control thread terminates >> - Add event for control thread state changes >> - Fix shutdown livelock error >> - Fix includes >> - ... and 20 more: https://git.openjdk.org/jdk/compare/ba6c9659...915ffbda > > src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 450: > >> 448: >> 449: void cancel_concurrent_mark(); >> 450: bool cancel_gc(GCCause::Cause cause); > > // Returns true if and only if cancellation request was successfully communicated. See new comment at line 443 above. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1970759318 From ysr at openjdk.org Wed Feb 26 01:32:06 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 26 Feb 2025 01:32:06 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v14] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 01:22:05 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 32 commits: >> >> - Merge tag 'jdk-25+11' into fix-control-regulator-threads >> >> Added tag jdk-25+11 for changeset 0131c1bf >> - Address review feedback (better comments, better names) >> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads >> - Old gen bootstrap cycle must make it to init mark >> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads >> - Improve message for assertion >> - Make shutdown safer for threads requesting (or expecting) gc >> - Do not accept requests if control thread is terminating >> - Notify waiters when control thread terminates >> - Add event for control thread state changes >> - ... and 22 more: https://git.openjdk.org/jdk/compare/0131c1bf...d7858deb > > src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 687: > >> 685: HeapWord* allocate_from_gclab_slow(Thread* thread, size_t size); >> 686: HeapWord* allocate_new_gclab(size_t min_size, size_t word_size, size_t* actual_size); >> 687: bool retry_allocation(size_t original_full_gc_count) const; > > It feels like this should be called `should_retry_allocation()`. Also a 1-line documentation comment: ``` // We want to retry an unsuccessful attempt at allocation until at least a full gc. ``` PS: do we know that a full gc will reclaim all soft refs? I suppose that is Shenandoah policy? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1970762980 From xpeng at openjdk.org Wed Feb 26 02:17:36 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 26 Feb 2025 02:17:36 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v4] In-Reply-To: References: Message-ID: > The change is to improve the observability of Shenandoah GC, basically there are three changes for Shenandoah GC timings in this PR: > > 1. Net GC pause timings include the time to propagate GC state to Java threads > 2. 
Add new timing "Propagate GC state" in Shenandoah GC timing logs > 3. Removal of the call of `propagate_gc_state_to_all_threads` from "init_update_refs", which handles gc state in handshake already. > > With the change, the new GC timing log will be like: > > [11.056s][info][gc,stats ] Concurrent Reset 89 us > [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us > [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us > [11.056s][info][gc,stats ] Update Region States 3 us > [11.056s][info][gc,stats ] Propagate GC state 1 us > [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x > [11.056s][info][gc,stats ] CMR: 456 us > [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, > [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, > [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, > [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x > [11.057s][info][gc,stats ] CM: 3043 us > [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, > [11.057s][info][gc,stats ] Flush SATB 204 us > [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us > [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us > [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x > [11.057s][info][gc,stats ] Propagate GC state 2 us > [11.057s][info][gc,stats ] Update Region States 12 us > [11.057s][info][gc,stats ] Choose Collection Set 25 us > [11.057s][info][gc,stats ] Rebuild Free Set 29 us > [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x > [11.057s][info][gc,stats ] CWRF: 17 us > [11.057s][info][gc,stats ] CWRF: Weak References 17 us, workers (... 
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Set _gc_state_changed to false at the end of concurrent_prepare_for_update_refs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23759/files - new: https://git.openjdk.org/jdk/pull/23759/files/2e7b2cea..54ae1c0e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23759&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23759&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23759.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23759/head:pull/23759 PR: https://git.openjdk.org/jdk/pull/23759 From ysr at openjdk.org Wed Feb 26 02:17:36 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 26 Feb 2025 02:17:36 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v4] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 02:14:22 GMT, Xiaolong Peng wrote: >> The change is to improve the observability of Shenandoah GC, basically there are three changes for Shenandoah GC timings in this PR: >> >> 1. Net GC pause timings include the time to propagate GC state to Java threads >> 2. Add new timing "Propagate GC state" in Shenandoah GC timing logs >> 3. Removal of the call of `propagate_gc_state_to_all_threads` from "init_update_refs", which handles gc state in handshake already. 
>> >> With the change, the new GC timing log will be like: >> >> [11.056s][info][gc,stats ] Concurrent Reset 89 us >> [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us >> [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us >> [11.056s][info][gc,stats ] Update Region States 3 us >> [11.056s][info][gc,stats ] Propagate GC state 1 us >> [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x >> [11.056s][info][gc,stats ] CMR: 456 us >> [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, >> [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x >> [11.057s][info][gc,stats ] CM: 3043 us >> [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, >> [11.057s][info][gc,stats ] Flush SATB 204 us >> [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us >> [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us >> [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x >> [11.057s][info][gc,stats ] Propagate GC state 2 us >> [11.057s][info][gc,stats ] Update Region States 12 us >> [11.057s][info][gc,stats ] Choose Collection Set 25 us >> [11.057s][info][gc,stats ] Rebuild Free Set 29 us >> [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x >> [11.057s][info][gc,stats ] CWRF: 17 us >> [11.057s][info][gc,... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Set _gc_state_changed to false at the end of concurrent_prepare_for_update_refs LGTM ------------- Marked as reviewed by ysr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23759#pullrequestreview-2642858719 From xpeng at openjdk.org Wed Feb 26 02:17:36 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 26 Feb 2025 02:17:36 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v2] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 00:31:03 GMT, William Kemper wrote: >> For `final_roots`, I think I should leave it as it is in this PR, later you will remove the timings and gc state propagation into a handshake anyway. > > Yep, I agree. Thank you! To remove propagate_gc_state_to_all_threads, `_gc_state_changed` has to be set to false in `ShenandoahHeap::concurrent_prepare_for_update_refs`; I have updated the PR to fix it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23759#discussion_r1970795124 From ysr at openjdk.org Wed Feb 26 02:17:36 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 26 Feb 2025 02:17:36 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v3] In-Reply-To: References: Message-ID: <7HdJx1s4PkFH9L1AdYjiQMjR4nfl0RM4FDHtdlLDge4=.cf536d80-1679-44f8-b36c-9aab738f7cd8@github.com> On Wed, 26 Feb 2025 00:01:12 GMT, Xiaolong Peng wrote: >> The change is to improve the observability of Shenandoah GC, basically there are three changes for Shenandoah GC timings in this PR: >> >> 1. Net GC pause timings include the time to propagate GC state to Java threads >> 2. Add new timing "Propagate GC state" in Shenandoah GC timing logs >> 3. Removal of the call of `propagate_gc_state_to_all_threads` from "init_update_refs", which handles gc state in handshake already.
>> >> With the change, the new GC timing log will be like: >> >> [11.056s][info][gc,stats ] Concurrent Reset 89 us >> [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us >> [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us >> [11.056s][info][gc,stats ] Update Region States 3 us >> [11.056s][info][gc,stats ] Propagate GC state 1 us >> [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x >> [11.056s][info][gc,stats ] CMR: 456 us >> [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, >> [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x >> [11.057s][info][gc,stats ] CM: 3043 us >> [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, >> [11.057s][info][gc,stats ] Flush SATB 204 us >> [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us >> [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us >> [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x >> [11.057s][info][gc,stats ] Propagate GC state 2 us >> [11.057s][info][gc,stats ] Update Region States 12 us >> [11.057s][info][gc,stats ] Choose Collection Set 25 us >> [11.057s][info][gc,stats ] Rebuild Free Set 29 us >> [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x >> [11.057s][info][gc,stats ] CWRF: 17 us >> [11.057s][info][gc,... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Remove propagate_gc_state_to_all_threads call from op_init_update_refs Changes are fine. This jumped out in yr sample output: ... [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x ... 
[11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x which seemed kinda interesting. I assume this is just a consequence of the very little work (and extremely brief time in this phase) here, and can be ignored in this sample output from likely a toy GC, or one where you may have artificially boosted the number of worker threads. Still I thought I'd ask in case you've seen this with bigger timings or more work in any of these phases with low fractional speed-ups. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23759#issuecomment-2683721058 From ysr at openjdk.org Wed Feb 26 02:25:52 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 26 Feb 2025 02:25:52 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v4] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 02:17:36 GMT, Xiaolong Peng wrote: >> The change is to improve the observability of Shenandoah GC, basically there are three changes for Shenandoah GC timings in this PR: >> >> 1. Net GC pause timings include the time to propagate GC state to Java threads >> 2. Add new timing "Propagate GC state" in Shenandoah GC timing logs >> 3. Removal of the call of `propagate_gc_state_to_all_threads` from "init_update_refs", which handles gc state in handshake already. 
>> >> With the change, the new GC timing log will be like: >> >> [11.056s][info][gc,stats ] Concurrent Reset 89 us >> [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us >> [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us >> [11.056s][info][gc,stats ] Update Region States 3 us >> [11.056s][info][gc,stats ] Propagate GC state 1 us >> [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x >> [11.056s][info][gc,stats ] CMR: 456 us >> [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, >> [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x >> [11.057s][info][gc,stats ] CM: 3043 us >> [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, >> [11.057s][info][gc,stats ] Flush SATB 204 us >> [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us >> [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us >> [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x >> [11.057s][info][gc,stats ] Propagate GC state 2 us >> [11.057s][info][gc,stats ] Update Region States 12 us >> [11.057s][info][gc,stats ] Choose Collection Set 25 us >> [11.057s][info][gc,stats ] Rebuild Free Set 29 us >> [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x >> [11.057s][info][gc,stats ] CWRF: 17 us >> [11.057s][info][gc,... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Set _gc_state_changed to false at the end of concurrent_prepare_for_update_refs src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1288: > 1286: > 1287: _update_refs_iterator.reset(); > 1288: _gc_state_changed = false; Sorry, what is this and why do we need it. When was it set? 
In `set_update_refs_in_progress()`? Maybe we need a better abstraction to toggle the `state_changed` boolean. This seems error-prone. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23759#discussion_r1970802802 From kbarrett at openjdk.org Wed Feb 26 06:37:56 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 26 Feb 2025 06:37:56 GMT Subject: RFR: 8350643: G1: Make loop iteration variable type correspond to limit in G1SurvRateGroup In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 10:42:32 GMT, Thomas Schatzl wrote: > Hi all, > > in the G1SurvRateGroup class there are three loops that use a `size_t` loop iteration variable that is compared to `uint`s in the condition, causing some implicit narrowing. > > Change the loop variables to also use `uints`. > > Pointed out by @kimbarrett during review of JDK-8349906. > > Testing: local compilation, GHA, checking other loop variables to match type > > Thanks, > Thomas Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23773#pullrequestreview-2643242784 From duke at openjdk.org Wed Feb 26 06:59:09 2025 From: duke at openjdk.org (Saint Wesonga) Date: Wed, 26 Feb 2025 06:59:09 GMT Subject: RFR: 8350722: Remove duplicate SerialGC logic for detecting pointers in young gen Message-ID: Checking whether a pointer is in the young generation is currently done by comparing the pointer to the end of the young generation reserved space. The duplication of these checks in various places complicates any changes to the layout of the young generation since all these locations need to be updated. This PR replaces the duplicated logic with the DefNewGeneration::is_in_reserved method.
------------- Commit messages: - Remove duplicate logic for detecting pointers in young gen Changes: https://git.openjdk.org/jdk/pull/23792/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23792&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350722 Stats: 18 lines in 3 files changed: 4 ins; 7 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23792.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23792/head:pull/23792 PR: https://git.openjdk.org/jdk/pull/23792 From xpeng at openjdk.org Wed Feb 26 07:53:09 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 26 Feb 2025 07:53:09 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v4] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 02:23:15 GMT, Y. Srinivas Ramakrishna wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Set _gc_state_changed to false at the end of concurrent_prepare_for_update_refs > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1288: > >> 1286: >> 1287: _update_refs_iterator.reset(); >> 1288: _gc_state_changed = false; > > Sorry, what is this and why do we need it. When was it set? > > In `set_update_refs_in_progress()` ? > > Maybe we need a better abstraction to toggle the `state_changed` boolean. This seems error-prone. You are right, we don't need to set _gc_state_changed to false here: `concurrent_prepare_for_update_refs` sets the gc state via `ShenandoahHeap::set_gc_state_concurrent`, which doesn't change `_gc_state_changed` to true, so there is no need to reset it to false here.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23759#discussion_r1971094948 From xpeng at openjdk.org Wed Feb 26 07:53:08 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 26 Feb 2025 07:53:08 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v5] In-Reply-To: References: Message-ID: > The change is to improve the observability of Shenandoah GC, basically there are three changes for Shenandoah GC timings in this PR: > > 1. Net GC pause timings include the time to propagate GC state to Java threads > 2. Add new timing "Propagate GC state" in Shenandoah GC timing logs > 3. Removal of the call of `propagate_gc_state_to_all_threads` from "init_update_refs", which handles gc state in handshake already. > > With the change, the new GC timing log will be like: > > [11.056s][info][gc,stats ] Concurrent Reset 89 us > [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us > [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us > [11.056s][info][gc,stats ] Update Region States 3 us > [11.056s][info][gc,stats ] Propagate GC state 1 us > [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x > [11.056s][info][gc,stats ] CMR: 456 us > [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, > [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, > [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, > [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x > [11.057s][info][gc,stats ] CM: 3043 us > [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, > [11.057s][info][gc,stats ] Flush SATB 204 us > [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us > [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us > [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x > [11.057s][info][gc,stats ] Propagate GC 
state 2 us > [11.057s][info][gc,stats ] Update Region States 12 us > [11.057s][info][gc,stats ] Choose Collection Set 25 us > [11.057s][info][gc,stats ] Rebuild Free Set 29 us > [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x > [11.057s][info][gc,stats ] CWRF: 17 us > [11.057s][info][gc,stats ] CWRF: Weak References 17 us, workers (... Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Fix test failure ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23759/files - new: https://git.openjdk.org/jdk/pull/23759/files/54ae1c0e..30647a9c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23759&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23759&range=03-04 Stats: 6 lines in 2 files changed: 5 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23759.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23759/head:pull/23759 PR: https://git.openjdk.org/jdk/pull/23759 From xpeng at openjdk.org Wed Feb 26 08:12:31 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 26 Feb 2025 08:12:31 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v6] In-Reply-To: References: Message-ID: > The change is to improve the observability of Shenandoah GC, basically there are three changes for Shenandoah GC timings in this PR: > > 1. Net GC pause timings include the time to propagate GC state to Java threads > 2. Add new timing "Propagate GC state" in Shenandoah GC timing logs > 3. Removal of the call of `propagate_gc_state_to_all_threads` from "init_update_refs", which handles gc state in handshake already. 
> > With the change, the new GC timing log will be like: > > [11.056s][info][gc,stats ] Concurrent Reset 89 us > [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us > [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us > [11.056s][info][gc,stats ] Update Region States 3 us > [11.056s][info][gc,stats ] Propagate GC state 1 us > [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x > [11.056s][info][gc,stats ] CMR: 456 us > [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, > [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, > [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, > [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x > [11.057s][info][gc,stats ] CM: 3043 us > [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, > [11.057s][info][gc,stats ] Flush SATB 204 us > [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us > [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us > [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x > [11.057s][info][gc,stats ] Propagate GC state 2 us > [11.057s][info][gc,stats ] Update Region States 12 us > [11.057s][info][gc,stats ] Choose Collection Set 25 us > [11.057s][info][gc,stats ] Rebuild Free Set 29 us > [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x > [11.057s][info][gc,stats ] CWRF: 17 us > [11.057s][info][gc,stats ] CWRF: Weak References 17 us, workers (... 
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Remove trailing whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23759/files - new: https://git.openjdk.org/jdk/pull/23759/files/30647a9c..891dd11d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23759&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23759&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23759.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23759/head:pull/23759 PR: https://git.openjdk.org/jdk/pull/23759 From tschatzl at openjdk.org Wed Feb 26 08:31:33 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 26 Feb 2025 08:31:33 GMT Subject: RFR: 8349906: G1: Improve initial survivor rate for newly used young regions [v3] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that tries to improve the survivor rate initial values for newly expanded regions. > > Currently G1 uses `InitialSurvivorRate` as survivor rate for such regions, but it is typically a pretty bad choice because > > * it's rather conservative, estimating that 40% of region contents will survive > * such a conservative value is kind of bad particularly in cases for regions that are expanded late in the mutator phase because they are not frequently updated (and with our running weights changes get propagated over a very long time), i.e. this 40% sticks for a long time (*) > * it is a random value, i.e. not particularly specific to the application. > > The suggestion is to use the survivor rate for the last region we know the survivor rate already. > > (*) to clarify this a little: G1 keeps track of `[0...m]` survivor rate predictors. For a given garbage collection, `[0...n]` of those are updated (`n` is the number of eden/survivor regions depending on the rate group). 
However those for `]n...m]` are not, particularly those in that range that are seldom allocated, the predictors are not updated very frequently. Now the young gen sizing uses these predictions "at the end" of the predictor anyway, and since they are infrequently updated and their values are very conservative, G1 won't expand young gen as much as it could/should. > > Testing: gha, tier1-7 (with other changes) > > Hth, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * kbarrett review - remove include previously used for debugging ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23584/files - new: https://git.openjdk.org/jdk/pull/23584/files/a09bc25e..fc2dde0c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23584&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23584&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23584.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23584/head:pull/23584 PR: https://git.openjdk.org/jdk/pull/23584 From tschatzl at openjdk.org Wed Feb 26 08:31:33 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 26 Feb 2025 08:31:33 GMT Subject: RFR: 8349906: G1: Improve initial survivor rate for newly used young regions [v2] In-Reply-To: <05XFWM5YjizGeiZm1Jb5OCYmMR5QabJrfLV5E7IFzxY=.6dbc460d-9ea5-4590-aaa9-dc2d6e337f18@github.com> References: <05XFWM5YjizGeiZm1Jb5OCYmMR5QabJrfLV5E7IFzxY=.6dbc460d-9ea5-4590-aaa9-dc2d6e337f18@github.com> Message-ID: On Wed, 26 Feb 2025 06:58:18 GMT, Kim Barrett wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * kbarrett review: do not change the type of loop variable >> * ayang review: use actual last value instead of prediction for newly allocated survivor rate groups > > src/hotspot/share/gc/g1/g1SurvRateGroup.cpp line 31: > >> 29: #include 
"memory/allocation.hpp" >> 30: >> 31: #include "logging/logStream.hpp" > Left-over debugging include? I don't see any uses, but if I missed some then this needs to be put into proper > sort order. Leftover debug code. Removed. :( ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23584#discussion_r1971145004 From tschatzl at openjdk.org Wed Feb 26 08:48:58 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 26 Feb 2025 08:48:58 GMT Subject: RFR: 8350722: Remove duplicate SerialGC logic for detecting pointers in young gen In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 06:54:19 GMT, Saint Wesonga wrote: > Checking whether a pointer is in the young generation is currently done by comparing the pointer to the end of the young generation reserved space. The duplication of these checks in various places complicates any changes to the layout of the young generation since all these locations need to be updated. This PR replaces the duplicated logic with the DefNewGeneration::is_in_reserved method. src/hotspot/share/gc/serial/defNewGeneration.cpp line 150: > 148: bool do_object_b(oop p) { > 149: HeapWord* heap_word_ptr = cast_from_oop(p); > 150: bool is_in_young_gen = _young_gen->is_in_reserved((void*)heap_word_ptr); The check for only the boundary is intentional, and guided by performance. The single comparison/memory load in the original code is faster than the two memory loads/comparisons (both bounds of the reserved area) plus the eventual indirect load via `_young_gen`. That is also why most of the places it is used store a local copy of the generation boundary. This adds up for hundreds of thousands of checks/references to evacuate; at least it did last time, years ago. If the goal is just to factor out the check so that all locations change together whenever the heap layout changes, I would prefer adding a helper method in `SerialHeap`, for example, that is easily inlinable for the compiler.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23792#discussion_r1971169800 From tschatzl at openjdk.org Wed Feb 26 09:07:52 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 26 Feb 2025 09:07:52 GMT Subject: RFR: 8314840: 3 gc/epsilon tests ignore external vm options In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 15:27:46 GMT, Ramkumar Sunderbabu wrote: > These tests do not pass Java/JVM test command line options (flags) to the child process. More details in JBS. > Tiers 1 to 3 tested. Along with various flag combinations. lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23751#pullrequestreview-2643679216 From kbarrett at openjdk.org Wed Feb 26 10:03:00 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 26 Feb 2025 10:03:00 GMT Subject: RFR: 8349906: G1: Improve initial survivor rate for newly used young regions [v3] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 08:31:33 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that tries to improve the survivor rate initial values for newly expanded regions. >> >> Currently G1 uses `InitialSurvivorRate` as survivor rate for such regions, but it is typically a pretty bad choice because >> >> * it's rather conservative, estimating that 40% of region contents will survive >> * such a conservative value is kind of bad particularly in cases for regions that are expanded late in the mutator phase because they are not frequently updated (and with our running weights changes get propagated over a very long time), i.e. this 40% sticks for a long time (*) >> * it is a random value, i.e. not particularly specific to the application. >> >> The suggestion is to use the survivor rate for the last region we know the survivor rate already. >> >> (*) to clarify this a little: G1 keeps track of `[0...m]` survivor rate predictors. 
For a given garbage collection, `[0...n]` of those are updated (`n` is the number of eden/survivor regions depending on the rate group). However those for `]n...m]` are not, particularly those in that range that are seldom allocated, the predictors are not updated very frequently. Now the young gen sizing uses these predictions "at the end" of the predictor anyway, and since they are infrequently updated and their values are very conservative, G1 won't expand young gen as much as it could/should. >> >> Testing: gha, tier1-7 (with other changes) >> >> Hth, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * kbarrett review - remove include previously used for debugging Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23584#pullrequestreview-2643873048 From ayang at openjdk.org Wed Feb 26 10:27:59 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 26 Feb 2025 10:27:59 GMT Subject: RFR: 8350643: G1: Make loop iteration variable type correspond to limit in G1SurvRateGroup In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 10:42:32 GMT, Thomas Schatzl wrote: > Hi all, > > in the G1SurvRateGroup class there are three loops that use a `size_t` loop iteration variable that is compared to `uint`s in the condition, causing some implicit narrowing. > > Change the loop variables to also use `uints`. > > Pointed out by @kimbarrett during review of JDK-8349906. > > Testing: local compilation, GHA, checking other loop variables to match type > > Thanks, > Thomas Marked as reviewed by ayang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23773#pullrequestreview-2643952250 From tschatzl at openjdk.org Wed Feb 26 10:33:10 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 26 Feb 2025 10:33:10 GMT Subject: RFR: 8349906: G1: Improve initial survivor rate for newly used young regions [v3] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 10:00:04 GMT, Kim Barrett wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * kbarrett review - remove include previously used for debugging > > Looks good. Thanks @kimbarrett @walulyai for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/23584#issuecomment-2684556124 From tschatzl at openjdk.org Wed Feb 26 10:33:23 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 26 Feb 2025 10:33:23 GMT Subject: RFR: 8350643: G1: Make loop iteration variable type correspond to limit in G1SurvRateGroup In-Reply-To: References: Message-ID: <6JHMc8SLpKwwopjHfgh32_d4B3CLmwhyKgCg_2OlOmM=.621c5e92-8064-4a76-97e8-c895fa482b81@github.com> On Wed, 26 Feb 2025 10:25:20 GMT, Albert Mingkun Yang wrote: >> Hi all, >> >> in the G1SurvRateGroup class there are three loops that use a `size_t` loop iteration variable that is compared to `uint`s in the condition, causing some implicit narrowing. >> >> Change the loop variables to also use `uints`. >> >> Pointed out by @kimbarrett during review of JDK-8349906. >> >> Testing: local compilation, GHA, checking other loop variables to match type >> >> Thanks, >> Thomas > > Marked as reviewed by ayang (Reviewer). 
Thanks @albertnetymk @kimbarrett for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/23773#issuecomment-2684558378 From tschatzl at openjdk.org Wed Feb 26 10:33:10 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 26 Feb 2025 10:33:10 GMT Subject: Integrated: 8349906: G1: Improve initial survivor rate for newly used young regions In-Reply-To: References: Message-ID: On Wed, 12 Feb 2025 10:55:46 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that tries to improve the survivor rate initial values for newly expanded regions. > > Currently G1 uses `InitialSurvivorRate` as survivor rate for such regions, but it is typically a pretty bad choice because > > * it's rather conservative, estimating that 40% of region contents will survive > * such a conservative value is kind of bad particularly in cases for regions that are expanded late in the mutator phase because they are not frequently updated (and with our running weights changes get propagated over a very long time), i.e. this 40% sticks for a long time (*) > * it is a random value, i.e. not particularly specific to the application. > > The suggestion is to use the survivor rate for the last region we know the survivor rate already. > > (*) to clarify this a little: G1 keeps track of `[0...m]` survivor rate predictors. For a given garbage collection, `[0...n]` of those are updated (`n` is the number of eden/survivor regions depending on the rate group). However those for `]n...m]` are not, particularly those in that range that are seldom allocated, the predictors are not updated very frequently. Now the young gen sizing uses these predictions "at the end" of the predictor anyway, and since they are infrequently updated and their values are very conservative, G1 won't expand young gen as much as it could/should. > > Testing: gha, tier1-7 (with other changes) > > Hth, > Thomas This pull request has now been integrated. 
Changeset: aac9cb45 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/aac9cb4537b13a4af123ae76f29359e851dc4c82 Stats: 16 lines in 1 file changed: 13 ins; 0 del; 3 mod 8349906: G1: Improve initial survivor rate for newly used young regions Reviewed-by: kbarrett, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/23584 From tschatzl at openjdk.org Wed Feb 26 11:09:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 26 Feb 2025 11:09:56 GMT Subject: Withdrawn: 8350643: G1: Make loop iteration variable type correspond to limit in G1SurvRateGroup In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 10:42:32 GMT, Thomas Schatzl wrote: > Hi all, > > in the G1SurvRateGroup class there are three loops that use a `size_t` loop iteration variable that is compared to `uint`s in the condition, causing some implicit narrowing. > > Change the loop variables to also use `uints`. > > Pointed out by @kimbarrett during review of JDK-8349906. > > Testing: local compilation, GHA, checking other loop variables to match type > > Thanks, > Thomas This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/23773 From tschatzl at openjdk.org Wed Feb 26 11:20:15 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 26 Feb 2025 11:20:15 GMT Subject: RFR: 8350643: G1: Make loop iteration variable type correspond to limit in G1SurvRateGroup [v2] In-Reply-To: References: Message-ID: > Hi all, > > in the G1SurvRateGroup class there are three loops that use a `size_t` loop iteration variable that is compared to `uint`s in the condition, causing some implicit narrowing. > > Change the loop variables to also use `uints`. > > Pointed out by @kimbarrett during review of JDK-8349906. > > Testing: local compilation, GHA, checking other loop variables to match type > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: - Merge branch 'master' into 8350643-loop-counter-variable-type - 8350643 Hi all, in the G1SurvRateGroup class there are three loops that use a `size_t` loop iteration variable that is compared to `uint`s in the condition, causing some implicit narrowing. Change the loop variables to also use `uints`. Pointed out by @kimbarrett during review of JDK-8349906. Testing: local compilation, github testing, checking other loop variables to match type Thanks, Thomas ------------- Changes: https://git.openjdk.org/jdk/pull/23773/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23773&range=01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23773.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23773/head:pull/23773 PR: https://git.openjdk.org/jdk/pull/23773 From tschatzl at openjdk.org Wed Feb 26 11:20:15 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 26 Feb 2025 11:20:15 GMT Subject: RFR: 8350643: G1: Make loop iteration variable type correspond to limit in G1SurvRateGroup In-Reply-To: References: Message-ID: <8WXY_blbp0N6EQ-iqT1i8V_Zv6BwZ2kP1hW5Z8Pwsac=.c39ac0e7-877e-47dd-9410-30edc6dd3d08@github.com> On Tue, 25 Feb 2025 10:42:32 GMT, Thomas Schatzl wrote: > Hi all, > > in the G1SurvRateGroup class there are three loops that use a `size_t` loop iteration variable that is compared to `uint`s in the condition, causing some implicit narrowing. > > Change the loop variables to also use `uints`. > > Pointed out by @kimbarrett during review of JDK-8349906. > > Testing: local compilation, GHA, checking other loop variables to match type > > Thanks, > Thomas This branch needed merging before integration, so needs a re-review. Nothing changed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23773#issuecomment-2684667705 From ayang at openjdk.org Wed Feb 26 11:26:54 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 26 Feb 2025 11:26:54 GMT Subject: RFR: 8350643: G1: Make loop iteration variable type correspond to limit in G1SurvRateGroup [v2] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 11:20:15 GMT, Thomas Schatzl wrote: >> Hi all, >> >> in the G1SurvRateGroup class there are three loops that use a `size_t` loop iteration variable that is compared to `uint`s in the condition, causing some implicit narrowing. >> >> Change the loop variables to also use `uints`. >> >> Pointed out by @kimbarrett during review of JDK-8349906. >> >> Testing: local compilation, GHA, checking other loop variables to match type >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' into 8350643-loop-counter-variable-type > - 8350643 > > Hi all, > > in the G1SurvRateGroup class there are three loops that use a `size_t` loop iteration variable that is compared to `uint`s in the condition, causing some implicit narrowing. > > Change the loop variables to also use `uints`. > > Pointed out by @kimbarrett during review of JDK-8349906. > > Testing: local compilation, github testing, checking other loop variables to match type > > Thanks, > Thomas Marked as reviewed by ayang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23773#pullrequestreview-2644127650 From tschatzl at openjdk.org Wed Feb 26 11:34:58 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 26 Feb 2025 11:34:58 GMT Subject: RFR: 8350643: G1: Make loop iteration variable type correspond to limit in G1SurvRateGroup [v2] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 11:24:03 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Merge branch 'master' into 8350643-loop-counter-variable-type >> - 8350643 >> >> Hi all, >> >> in the G1SurvRateGroup class there are three loops that use a `size_t` loop iteration variable that is compared to `uint`s in the condition, causing some implicit narrowing. >> >> Change the loop variables to also use `uints`. >> >> Pointed out by @kimbarrett during review of JDK-8349906. >> >> Testing: local compilation, github testing, checking other loop variables to match type >> >> Thanks, >> Thomas > > Marked as reviewed by ayang (Reviewer). Thanks @albertnetymk @kimbarrett for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/23773#issuecomment-2684696957 From tschatzl at openjdk.org Wed Feb 26 11:34:59 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 26 Feb 2025 11:34:59 GMT Subject: Integrated: 8350643: G1: Make loop iteration variable type correspond to limit in G1SurvRateGroup In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 10:42:32 GMT, Thomas Schatzl wrote: > Hi all, > > in the G1SurvRateGroup class there are three loops that use a `size_t` loop iteration variable that is compared to `uint`s in the condition, causing some implicit narrowing. > > Change the loop variables to also use `uints`. > > Pointed out by @kimbarrett during review of JDK-8349906. 
> > Testing: local compilation, GHA, checking other loop variables to match type > > Thanks, > Thomas This pull request has now been integrated. Changeset: a0dd5654 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/a0dd56543219343306aea99b684b5e2cb04c7d76 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod 8350643: G1: Make loop iteration variable type correspond to limit in G1SurvRateGroup Reviewed-by: ayang, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/23773 From xpeng at openjdk.org Wed Feb 26 18:58:54 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 26 Feb 2025 18:58:54 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v3] In-Reply-To: <7HdJx1s4PkFH9L1AdYjiQMjR4nfl0RM4FDHtdlLDge4=.cf536d80-1679-44f8-b36c-9aab738f7cd8@github.com> References: <7HdJx1s4PkFH9L1AdYjiQMjR4nfl0RM4FDHtdlLDge4=.cf536d80-1679-44f8-b36c-9aab738f7cd8@github.com> Message-ID: On Wed, 26 Feb 2025 02:13:07 GMT, Y. Srinivas Ramakrishna wrote: > Changes are fine. > > This jumped out in yr sample output: > > ``` > ... > [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x > ... > [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x > ``` > > which seemed kinda interesting. I assume this is just a consequence of the very little work (and extremely brief time in this phase) here, and can be ignored in this sample output from likely a toy GC, or one where you may have artificially boosted the number of worker threads. Still I thought I'd ask in case you've seen this with bigger timings or more work in any of these phases with low fractional speed-ups. I'm not sure how parallelism is calculated, but I think it is caused by the test I was running, the test is very simple and there are only small number of live objects after mark. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23759#issuecomment-2685925735 From shade at openjdk.org Wed Feb 26 19:44:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Feb 2025 19:44:55 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v6] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 08:12:31 GMT, Xiaolong Peng wrote: >> The change is to improve the observability of Shenandoah GC, basically there are three changes for Shenandoah GC timings in this PR: >> >> 1. Net GC pause timings include the time to propagate GC state to Java threads >> 2. Add new timing "Propagate GC state" in Shenandoah GC timing logs >> 3. Removal of the call of `propagate_gc_state_to_all_threads` from "init_update_refs", which handles gc state in handshake already. >> >> With the change, the new GC timing log will be like: >> >> [11.056s][info][gc,stats ] Concurrent Reset 89 us >> [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us >> [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us >> [11.056s][info][gc,stats ] Update Region States 3 us >> [11.056s][info][gc,stats ] Propagate GC state 1 us >> [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x >> [11.056s][info][gc,stats ] CMR: 456 us >> [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, >> [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x >> [11.057s][info][gc,stats ] CM: 3043 us >> [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, >> [11.057s][info][gc,stats ] Flush SATB 204 us >> [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us >> [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us >> [11.057s][info][gc,stats ] 
Finish Mark 129 us, parallelism: 0.01x >> [11.057s][info][gc,stats ] Propagate GC state 2 us >> [11.057s][info][gc,stats ] Update Region States 12 us >> [11.057s][info][gc,stats ] Choose Collection Set 25 us >> [11.057s][info][gc,stats ] Rebuild Free Set 29 us >> [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x >> [11.057s][info][gc,stats ] CWRF: 17 us >> [11.057s][info][gc,... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Remove trailing whitespace src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 693: > 691: > 692: { > 693: ShenandoahGCPhase phase(ShenandoahPhaseTimings::init_propagate_gc_state); Here and later, I messed up my suggestion. I think these should be `ShenandoahTimingsTracker`, not `ShenandoahGCPhase`. `ShenandoahGCPhase` does more stuff we don't need. `ShenandoahTimingsTracker` does only the basic stuff. src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.cpp line 86: > 84: op_degenerated(); > 85: heap->set_degenerated_gc_in_progress(false); > 86: { A bit sad we need to do this due to `op_degenerated` early returns, but fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23759#discussion_r1972273776 PR Review Comment: https://git.openjdk.org/jdk/pull/23759#discussion_r1972269107 From wkemper at openjdk.org Wed Feb 26 19:50:59 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 26 Feb 2025 19:50:59 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v14] In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 23:10:45 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 32 commits: >> >> - Merge tag 'jdk-25+11' into fix-control-regulator-threads >> >> Added tag jdk-25+11 for changeset 0131c1bf >> - Address review feedback (better comments, better names) >> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads >> - Old gen bootstrap cycle must make it to init mark >> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads >> - Improve message for assertion >> - Make shutdown safer for threads requesting (or expecting) gc >> - Do not accept requests if control thread is terminating >> - Notify waiters when control thread terminates >> - Add event for control thread state changes >> - ... and 22 more: https://git.openjdk.org/jdk/compare/0131c1bf...d7858deb > > src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 75: > >> 73: >> 74: // The mode is read frequently by requesting threads and only ever written by the control thread. >> 75: volatile GCMode _mode; > > A bit of a nit: Any reason not to just call it `_gc_mode` which seems now to be its _de facto_ name due to its accessor method name and how it's referred to in comments? Renamed it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1972289732 From wkemper at openjdk.org Wed Feb 26 20:03:56 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 26 Feb 2025 20:03:56 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v14] In-Reply-To: References: Message-ID: <-UXwakwdCebRDEfV0p5VLeRJqByHrZENp_NwYt30Af8=.4bc68e66-0de3-4b43-867a-9b6e6ba7a045@github.com> On Tue, 25 Feb 2025 23:22:48 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 32 commits: >> >> - Merge tag 'jdk-25+11' into fix-control-regulator-threads >> >> Added tag jdk-25+11 for changeset 0131c1bf >> - Address review feedback (better comments, better names) >> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads >> - Old gen bootstrap cycle must make it to init mark >> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads >> - Improve message for assertion >> - Make shutdown safer for threads requesting (or expecting) gc >> - Do not accept requests if control thread is terminating >> - Notify waiters when control thread terminates >> - Add event for control thread state changes >> - ... and 22 more: https://git.openjdk.org/jdk/compare/0131c1bf...d7858deb > > src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 150: > >> 148: GCMode prepare_for_explicit_gc_request(ShenandoahGCRequest &request); >> 149: >> 150: GCMode prepare_for_concurrent_gc_request(ShenandoahGCRequest &request); > > Documentation of private APIs. > > Bit of a nit: These all seem to take a request type, fill in some yet unfilled fields, and return a GC mode. So they look to me like `prepare_request_for_` rather than `prepare_for__request`. Hmm, they do modify the request, so I see your point. But they also touch `ShenandoahHeap`, `ShenandoahPolicy` and `ShenandoahHeuristics`. How about we just drop `_request` from their name? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1972319562 From wkemper at openjdk.org Wed Feb 26 20:19:59 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 26 Feb 2025 20:19:59 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v14] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 01:24:18 GMT, Y.
Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 687: >> >>> 685: HeapWord* allocate_from_gclab_slow(Thread* thread, size_t size); >>> 686: HeapWord* allocate_new_gclab(size_t min_size, size_t word_size, size_t* actual_size); >>> 687: bool retry_allocation(size_t original_full_gc_count) const; >> >> It feels like this should be called `should_retry_allocation()`. Also a 1-line documentation comment: ``` // We want to retry an unsuccessful attempt at allocation until at least a full gc. ``` > > PS: do we know that a full gc will reclaim all soft refs? I suppose that is Shenandoah policy? Yes, I like that name better. And yes, full gc will reclaim all soft refs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1972342529 From xpeng at openjdk.org Wed Feb 26 20:24:06 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 26 Feb 2025 20:24:06 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v6] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 19:42:10 GMT, Aleksey Shipilev wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove trailing whitespace > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 693: > >> 691: >> 692: { >> 693: ShenandoahGCPhase phase(ShenandoahPhaseTimings::init_propagate_gc_state); > > Here and later, I messed up my suggestion. I think these should be `ShenandoahTimingsTracker`, not `ShenandoahGCPhase`. `ShenandoahGCPhase` does more stuff we don't need. `ShenandoahTimingsTracker` does only the basic stuff. Thanks for pointing out, I'll update it and move them to `ShenandoahTimingsTracker` and test it again. 
> src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.cpp line 86: > >> 84: op_degenerated(); >> 85: heap->set_degenerated_gc_in_progress(false); >> 86: { > > A bit sad we need to do this due to `op_degenerated` early returns, but fine. Yeah, this part is not very clean due to op_degenerated early returns. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23759#discussion_r1972349321 PR Review Comment: https://git.openjdk.org/jdk/pull/23759#discussion_r1972347797 From wkemper at openjdk.org Wed Feb 26 21:20:20 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 26 Feb 2025 21:20:20 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v15] In-Reply-To: References: Message-ID: > There are several changes to the operation of Shenandoah's control threads here. > * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. > * The cancellation handling is driven entirely by the cancellation cause > * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed > * The shutdown sequence is simpler > * The generational control thread uses a lock to coordinate updates to the requested cause and generation > * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance > * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles > * The control thread doesn't loop on its own (unless the pacer is enabled). 
> > ## Testing > * jtreg hotspot_gc_shenandoah > * dacapo, extremem, diluvian, specjbb2015, specjvm2018, heapothesys William Kemper has updated the pull request incrementally with one additional commit since the last revision: Improve names and comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23475/files - new: https://git.openjdk.org/jdk/pull/23475/files/d7858deb..fb7819d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=13-14 Stats: 57 lines in 4 files changed: 24 ins; 2 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/23475.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23475/head:pull/23475 PR: https://git.openjdk.org/jdk/pull/23475 From xpeng at openjdk.org Wed Feb 26 21:54:32 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 26 Feb 2025 21:54:32 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v7] In-Reply-To: References: Message-ID: > The change is to improve the observability of Shenandoah GC, basically there are three changes for Shenandoah GC timings in this PR: > > 1. Net GC pause timings include the time to propagate GC state to Java threads > 2. Add new timing "Propagate GC state" in Shenandoah GC timing logs > 3. Removal of the call of `propagate_gc_state_to_all_threads` from "init_update_refs", which handles gc state in handshake already. 
> > With the change, the new GC timing log will be like: > > [11.056s][info][gc,stats ] Concurrent Reset 89 us > [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us > [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us > [11.056s][info][gc,stats ] Update Region States 3 us > [11.056s][info][gc,stats ] Propagate GC state 1 us > [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x > [11.056s][info][gc,stats ] CMR: 456 us > [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, > [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, > [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, > [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x > [11.057s][info][gc,stats ] CM: 3043 us > [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, > [11.057s][info][gc,stats ] Flush SATB 204 us > [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us > [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us > [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x > [11.057s][info][gc,stats ] Propagate GC state 2 us > [11.057s][info][gc,stats ] Update Region States 12 us > [11.057s][info][gc,stats ] Choose Collection Set 25 us > [11.057s][info][gc,stats ] Rebuild Free Set 29 us > [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x > [11.057s][info][gc,stats ] CWRF: 17 us > [11.057s][info][gc,stats ] CWRF: Weak References 17 us, workers (... 
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Use ShenandoahTimingsTracker instead of ShenandoahGCPhase to track the timings of gc state propagation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23759/files - new: https://git.openjdk.org/jdk/pull/23759/files/891dd11d..858ebc39 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23759&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23759&range=05-06 Stats: 7 lines in 4 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23759.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23759/head:pull/23759 PR: https://git.openjdk.org/jdk/pull/23759 From shade at openjdk.org Wed Feb 26 22:17:59 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Feb 2025 22:17:59 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v7] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 21:54:32 GMT, Xiaolong Peng wrote: >> The change is to improve the observability of Shenandoah GC, basically there are three changes for Shenandoah GC timings in this PR: >> >> 1. Net GC pause timings include the time to propagate GC state to Java threads >> 2. Add new timing "Propagate GC state" in Shenandoah GC timing logs >> 3. Removal of the call of `propagate_gc_state_to_all_threads` from "init_update_refs", which handles gc state in handshake already. 
>> >> With the change, the new GC timing log will be like: >> >> [11.056s][info][gc,stats ] Concurrent Reset 89 us >> [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us >> [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us >> [11.056s][info][gc,stats ] Update Region States 3 us >> [11.056s][info][gc,stats ] Propagate GC state 1 us >> [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x >> [11.056s][info][gc,stats ] CMR: 456 us >> [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, >> [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x >> [11.057s][info][gc,stats ] CM: 3043 us >> [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, >> [11.057s][info][gc,stats ] Flush SATB 204 us >> [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us >> [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us >> [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x >> [11.057s][info][gc,stats ] Propagate GC state 2 us >> [11.057s][info][gc,stats ] Update Region States 12 us >> [11.057s][info][gc,stats ] Choose Collection Set 25 us >> [11.057s][info][gc,stats ] Rebuild Free Set 29 us >> [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x >> [11.057s][info][gc,stats ] CWRF: 17 us >> [11.057s][info][gc,... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Use ShenandoahTimingsTracker instead of ShenandoahGCPhase to track the timings of gc state propagation I am good with this, thanks. Only one minor nit remains. 
src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp line 60: > 58: f(init_transfer_satb, " Transfer Old From SATB") \ > 59: f(init_update_region_states, " Update Region States") \ > 60: f(init_propagate_gc_state, " Propagate GC state") \ Sorry, one last thing. Note how all our counters are Capitalized, so all these should be `Propagate GC State`. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23759#pullrequestreview-2646073039 PR Review Comment: https://git.openjdk.org/jdk/pull/23759#discussion_r1972497261 From xpeng at openjdk.org Wed Feb 26 22:23:24 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 26 Feb 2025 22:23:24 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v7] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 22:14:44 GMT, Aleksey Shipilev wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Use ShenandoahTimingsTracker instead of ShenandoahGCPhase to track the timings of gc state propagation > > src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp line 60: > >> 58: f(init_transfer_satb, " Transfer Old From SATB") \ >> 59: f(init_update_region_states, " Update Region States") \ >> 60: f(init_propagate_gc_state, " Propagate GC state") \ > > Sorry, one last thing. Note how all our counters are Capitalized, so all these should be `Propagate GC State`. Fixed, thanks! Sorry, I should have noticed the naming convention here.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23759#discussion_r1972502610 From xpeng at openjdk.org Wed Feb 26 22:23:24 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 26 Feb 2025 22:23:24 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v8] In-Reply-To: References: Message-ID: > The change is to improve the observability of Shenandoah GC, basically there are three changes for Shenandoah GC timings in this PR: > > 1. Net GC pause timings include the time to propagate GC state to Java threads > 2. Add new timing "Propagate GC state" in Shenandoah GC timing logs > 3. Removal of the call of `propagate_gc_state_to_all_threads` from "init_update_refs", which handles gc state in handshake already. > > With the change, the new GC timing log will be like: > > [11.056s][info][gc,stats ] Concurrent Reset 89 us > [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us > [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us > [11.056s][info][gc,stats ] Update Region States 3 us > [11.056s][info][gc,stats ] Propagate GC state 1 us > [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x > [11.056s][info][gc,stats ] CMR: 456 us > [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, > [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, > [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, > [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x > [11.057s][info][gc,stats ] CM: 3043 us > [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, > [11.057s][info][gc,stats ] Flush SATB 204 us > [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us > [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us > [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x > [11.057s][info][gc,stats ] Propagate GC 
state 2 us > [11.057s][info][gc,stats ] Update Region States 12 us > [11.057s][info][gc,stats ] Choose Collection Set 25 us > [11.057s][info][gc,stats ] Rebuild Free Set 29 us > [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x > [11.057s][info][gc,stats ] CWRF: 17 us > [11.057s][info][gc,stats ] CWRF: Weak References 17 us, workers (... Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Fix name of counter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23759/files - new: https://git.openjdk.org/jdk/pull/23759/files/858ebc39..caf9d4c3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23759&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23759&range=06-07 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23759.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23759/head:pull/23759 PR: https://git.openjdk.org/jdk/pull/23759 From shade at openjdk.org Wed Feb 26 22:25:59 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Feb 2025 22:25:59 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v8] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 22:23:24 GMT, Xiaolong Peng wrote: >> The change is to improve the observability of Shenandoah GC, basically there are three changes for Shenandoah GC timings in this PR: >> >> 1. Net GC pause timings include the time to propagate GC state to Java threads >> 2. Add new timing "Propagate GC state" in Shenandoah GC timing logs >> 3. Removal of the call of `propagate_gc_state_to_all_threads` from "init_update_refs", which handles gc state in handshake already. 
>> >> With the change, the new GC timing log will be like: >> >> [11.056s][info][gc,stats ] Concurrent Reset 89 us >> [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us >> [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us >> [11.056s][info][gc,stats ] Update Region States 3 us >> [11.056s][info][gc,stats ] Propagate GC state 1 us >> [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x >> [11.056s][info][gc,stats ] CMR: 456 us >> [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, >> [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x >> [11.057s][info][gc,stats ] CM: 3043 us >> [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, >> [11.057s][info][gc,stats ] Flush SATB 204 us >> [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us >> [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us >> [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x >> [11.057s][info][gc,stats ] Propagate GC state 2 us >> [11.057s][info][gc,stats ] Update Region States 12 us >> [11.057s][info][gc,stats ] Choose Collection Set 25 us >> [11.057s][info][gc,stats ] Rebuild Free Set 29 us >> [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x >> [11.057s][info][gc,stats ] CWRF: 17 us >> [11.057s][info][gc,... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Fix name of counter Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23759#pullrequestreview-2646089074 From ysr at openjdk.org Thu Feb 27 00:03:54 2025 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Thu, 27 Feb 2025 00:03:54 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v8] In-Reply-To: References: Message-ID: <6KQyDTODPGMJk75Fyygvo36xQp1Z_WdnRuHUx1Wt9Uw=.84f67082-639c-45a0-898d-496b54aa2bae@github.com> On Wed, 26 Feb 2025 22:23:24 GMT, Xiaolong Peng wrote: >> The change is to improve the observability of Shenandoah GC, basically there are three changes for Shenandoah GC timings in this PR: >> >> 1. Net GC pause timings include the time to propagate GC state to Java threads >> 2. Add new timing "Propagate GC state" in Shenandoah GC timing logs >> 3. Removal of the call of `propagate_gc_state_to_all_threads` from "init_update_refs", which handles gc state in handshake already. >> >> With the change, the new GC timing log will be like: >> >> [11.056s][info][gc,stats ] Concurrent Reset 89 us >> [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us >> [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us >> [11.056s][info][gc,stats ] Update Region States 3 us >> [11.056s][info][gc,stats ] Propagate GC state 1 us >> [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x >> [11.056s][info][gc,stats ] CMR: 456 us >> [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, >> [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x >> [11.057s][info][gc,stats ] CM: 3043 us >> [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, >> [11.057s][info][gc,stats ] Flush SATB 204 us >> [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us >> [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us >> [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x >> [11.057s][info][gc,stats ] 
Propagate GC state 2 us >> [11.057s][info][gc,stats ] Update Region States 12 us >> [11.057s][info][gc,stats ] Choose Collection Set 25 us >> [11.057s][info][gc,stats ] Rebuild Free Set 29 us >> [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x >> [11.057s][info][gc,stats ] CWRF: 17 us >> [11.057s][info][gc,... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Fix name of counter Marked as reviewed by ysr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23759#pullrequestreview-2646224512 From xpeng at openjdk.org Thu Feb 27 00:34:02 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 27 Feb 2025 00:34:02 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v8] In-Reply-To: References: Message-ID: <3HzYErJUBCxonwTuiqe8GJRNv_U1PVaOGedzk1Wsn8s=.430da6d6-1054-4495-8877-0c343347c394@github.com> On Wed, 26 Feb 2025 22:23:24 GMT, Xiaolong Peng wrote: >> The change is to improve the observability of Shenandoah GC, basically there are three changes for Shenandoah GC timings in this PR: >> >> 1. Net GC pause timings include the time to propagate GC state to Java threads >> 2. Add new timing "Propagate GC state" in Shenandoah GC timing logs >> 3. Removal of the call of `propagate_gc_state_to_all_threads` from "init_update_refs", which handles gc state in handshake already. 
>> >> With the change, the new GC timing log will be like: >> >> [11.056s][info][gc,stats ] Concurrent Reset 89 us >> [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us >> [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us >> [11.056s][info][gc,stats ] Update Region States 3 us >> [11.056s][info][gc,stats ] Propagate GC state 1 us >> [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x >> [11.056s][info][gc,stats ] CMR: 456 us >> [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, >> [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x >> [11.057s][info][gc,stats ] CM: 3043 us >> [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, >> [11.057s][info][gc,stats ] Flush SATB 204 us >> [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us >> [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us >> [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x >> [11.057s][info][gc,stats ] Propagate GC state 2 us >> [11.057s][info][gc,stats ] Update Region States 12 us >> [11.057s][info][gc,stats ] Choose Collection Set 25 us >> [11.057s][info][gc,stats ] Rebuild Free Set 29 us >> [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x >> [11.057s][info][gc,stats ] CWRF: 17 us >> [11.057s][info][gc,... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Fix name of counter Thanks all for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23759#issuecomment-2686513145 From duke at openjdk.org Thu Feb 27 00:34:03 2025 From: duke at openjdk.org (duke) Date: Thu, 27 Feb 2025 00:34:03 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v8] In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 22:23:24 GMT, Xiaolong Peng wrote: >> The change is to improve the observability of Shenandoah GC, basically there are three changes for Shenandoah GC timings in this PR: >> >> 1. Net GC pause timings include the time to propagate GC state to Java threads >> 2. Add new timing "Propagate GC state" in Shenandoah GC timing logs >> 3. Removal of the call of `propagate_gc_state_to_all_threads` from "init_update_refs", which handles gc state in handshake already. >> >> With the change, the new GC timing log will be like: >> >> [11.056s][info][gc,stats ] Concurrent Reset 89 us >> [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us >> [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us >> [11.056s][info][gc,stats ] Update Region States 3 us >> [11.056s][info][gc,stats ] Propagate GC state 1 us >> [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x >> [11.056s][info][gc,stats ] CMR: 456 us >> [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, >> [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, >> [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x >> [11.057s][info][gc,stats ] CM: 3043 us >> [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, >> [11.057s][info][gc,stats ] Flush SATB 204 us >> [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us >> [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us >> [11.057s][info][gc,stats ] Finish Mark 129 
us, parallelism: 0.01x >> [11.057s][info][gc,stats ] Propagate GC state 2 us >> [11.057s][info][gc,stats ] Update Region States 12 us >> [11.057s][info][gc,stats ] Choose Collection Set 25 us >> [11.057s][info][gc,stats ] Rebuild Free Set 29 us >> [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x >> [11.057s][info][gc,stats ] CWRF: 17 us >> [11.057s][info][gc,... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Fix name of counter @pengxiaolong Your change (at version caf9d4c35a5712b4fa4c234aedb092abbdcc826e) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23759#issuecomment-2686513700 From wkemper at openjdk.org Thu Feb 27 01:14:29 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 27 Feb 2025 01:14:29 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v16] In-Reply-To: References: Message-ID: > There are several changes to the operation of Shenandoah's control threads here. > * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. > * The cancellation handling is driven entirely by the cancellation cause > * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed > * The shutdown sequence is simpler > * The generational control thread uses a lock to coordinate updates to the requested cause and generation > * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance > * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles > * The control thread doesn't loop on its own (unless the pacer is enabled). 
> > ## Testing > * jtreg hotspot_gc_shenandoah > * dacapo, extremem, diluvian, specjbb2015, specjvm2018, heapothesys William Kemper has updated the pull request incrementally with two additional commits since the last revision: - Add assertions about old gen state when resuming old cycles - Remove duplicated field pointer for old generation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23475/files - new: https://git.openjdk.org/jdk/pull/23475/files/fb7819d0..d2e90dde Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=14-15 Stats: 9 lines in 2 files changed: 2 ins; 2 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23475.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23475/head:pull/23475 PR: https://git.openjdk.org/jdk/pull/23475 From duke at openjdk.org Thu Feb 27 01:40:58 2025 From: duke at openjdk.org (duke) Date: Thu, 27 Feb 2025 01:40:58 GMT Subject: RFR: 8314840: 3 gc/epsilon tests ignore external vm options In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 15:27:46 GMT, Ramkumar Sunderbabu wrote: > These tests do not pass Java/JVM test command line options (flags) to the child process. More details in JBS. > Tiers 1 to 3 tested. Along with various flag combinations. @rsunderbabu Your change (at version a83a03bc0fbb4895726d1b1316f4486a69ff475b) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23751#issuecomment-2686588196 From rsunderbabu at openjdk.org Thu Feb 27 01:40:58 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Thu, 27 Feb 2025 01:40:58 GMT Subject: RFR: 8314840: 3 gc/epsilon tests ignore external vm options In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 15:27:46 GMT, Ramkumar Sunderbabu wrote: > These tests do not pass Java/JVM test command line options (flags) to the child process. More details in JBS. > Tiers 1 to 3 tested. 
Along with various flag combinations. Please sponsor. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23751#issuecomment-2686588474 From ysr at openjdk.org Thu Feb 27 02:42:09 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 27 Feb 2025 02:42:09 GMT Subject: RFR: 8350314: Shenandoah: Capture thread state sync times in GC timings [v3] In-Reply-To: References: <7HdJx1s4PkFH9L1AdYjiQMjR4nfl0RM4FDHtdlLDge4=.cf536d80-1679-44f8-b36c-9aab738f7cd8@github.com> Message-ID: On Wed, 26 Feb 2025 18:56:06 GMT, Xiaolong Peng wrote: > I'm not sure how parallelism is calculated, but I think it is caused by the test I was running, the test is very simple and there are only a small number of live objects after mark. Yes, that explains it, thanks. (PS: Parallelism (or really parallel speed-up) is calculated as the ratio of total thread virtual time to the wall-clock (elapsed) time, IIRC. As you noted, work in this specific workload is too small for the parallel task overhead, as indicated by the micro-seconds worth of time.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23759#issuecomment-2686707985 From tschatzl at openjdk.org Thu Feb 27 08:47:30 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 27 Feb 2025 08:47:30 GMT Subject: RFR: 8350758: G1: Use actual last prediction in accumulated survivor rate prediction too Message-ID: Hi all, please review this fix to recent JDK-8349906 to use the current survivor rate in the accumulated survivor rate for initializing newly allocated entries as well.
Testing: gha Thanks, Thomas ------------- Commit messages: - 8350758 Changes: https://git.openjdk.org/jdk/pull/23795/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23795&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350758 Stats: 16 lines in 1 file changed: 5 ins; 8 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23795.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23795/head:pull/23795 PR: https://git.openjdk.org/jdk/pull/23795 From xpeng at openjdk.org Thu Feb 27 09:52:22 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 27 Feb 2025 09:52:22 GMT Subject: Integrated: 8350314: Shenandoah: Capture thread state sync times in GC timings In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 01:20:35 GMT, Xiaolong Peng wrote: > The change is to improve the observability of Shenandoah GC, basically there are three changes for Shenandoah GC timings in this PR: > > 1. Net GC pause timings include the time to propagate GC state to Java threads > 2. Add new timing "Propagate GC state" in Shenandoah GC timing logs > 3. Removal of the call of `propagate_gc_state_to_all_threads` from "init_update_refs", which handles gc state in handshake already. 
> > With the change, the new GC timing log will be like: > > [11.056s][info][gc,stats ] Concurrent Reset 89 us > [11.056s][info][gc,stats ] Pause Init Mark (G) 257 us > [11.056s][info][gc,stats ] Pause Init Mark (N) 17 us > [11.056s][info][gc,stats ] Update Region States 3 us > [11.056s][info][gc,stats ] Propagate GC state 1 us > [11.056s][info][gc,stats ] Concurrent Mark Roots 232 us, parallelism: 1.96x > [11.056s][info][gc,stats ] CMR: 456 us > [11.056s][info][gc,stats ] CMR: Thread Roots 429 us, workers (us): 139, 148, 142, ---, ---, ---, > [11.057s][info][gc,stats ] CMR: VM Strong Roots 11 us, workers (us): 8, 3, 0, ---, ---, ---, > [11.057s][info][gc,stats ] CMR: CLDG Roots 16 us, workers (us): 16, ---, ---, ---, ---, ---, > [11.057s][info][gc,stats ] Concurrent Marking 1304 us, parallelism: 2.33x > [11.057s][info][gc,stats ] CM: 3043 us > [11.057s][info][gc,stats ] CM: Parallel Mark 3043 us, workers (us): 1023, 1017, 1003, ---, ---, ---, > [11.057s][info][gc,stats ] Flush SATB 204 us > [11.057s][info][gc,stats ] Pause Final Mark (G) 865 us > [11.057s][info][gc,stats ] Pause Final Mark (N) 234 us > [11.057s][info][gc,stats ] Finish Mark 129 us, parallelism: 0.01x > [11.057s][info][gc,stats ] Propagate GC state 2 us > [11.057s][info][gc,stats ] Update Region States 12 us > [11.057s][info][gc,stats ] Choose Collection Set 25 us > [11.057s][info][gc,stats ] Rebuild Free Set 29 us > [11.057s][info][gc,stats ] Concurrent Weak References 67 us, parallelism: 0.25x > [11.057s][info][gc,stats ] CWRF: 17 us > [11.057s][info][gc,stats ] CWRF: Weak References 17 us, workers (... This pull request has now been integrated. 
Changeset: 01bd7e41 Author: Xiaolong Peng URL: https://git.openjdk.org/jdk/commit/01bd7e417ee3d39067370e616660b7f5c723dc26 Stats: 47 lines in 6 files changed: 40 ins; 7 del; 0 mod 8350314: Shenandoah: Capture thread state sync times in GC timings Reviewed-by: ysr, shade, wkemper ------------- PR: https://git.openjdk.org/jdk/pull/23759 From ayang at openjdk.org Thu Feb 27 10:59:52 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 27 Feb 2025 10:59:52 GMT Subject: RFR: 8350758: G1: Use actual last prediction in accumulated survivor rate prediction too In-Reply-To: References: Message-ID: <_8V7e9yAXzH5KceaLFtRmBe-_HGePHWuM0m5WmXSFKI=.2cda4958-d53c-4713-ad00-155a0cbb4b9e@github.com> On Wed, 26 Feb 2025 11:51:31 GMT, Thomas Schatzl wrote: > Hi all, > > please review this fix to recent JDK-8349906 to use the current > survivor rate in the accumulated survivor rate for initializing newly allocated entries as well. > > Testing: gha > > Thanks, > Thomas src/hotspot/share/gc/g1/g1SurvRateGroup.cpp line 76: > 74: if (i == 0) { > 75: _surv_rate_predictors[i]->add(InitialSurvivorRate); > 76: _accum_surv_rate_pred[i] = 0.0; Should this be updated as well? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23795#discussion_r1973350393 From aboldtch at openjdk.org Thu Feb 27 11:20:11 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 27 Feb 2025 11:20:11 GMT Subject: RFR: 8350572: ZGC: Enhance z_verify_safepoints_are_blocked interactions with VMError Message-ID: If VMError reporting is triggered from a disallowed thread state `z_verify_safepoints_are_blocked` will cause reentrant assertions to be triggered, for example when loading the thread oop as part of thread printing. This extends the verification to be ignored if triggered from the thread doing the error reporting. In most cases performing the load barriers from disallowed thread states during error reporting will work.
Testing: - tier 1 Oracle supported platforms - GHA ------------- Commit messages: - 8350572: ZGC: Enhance z_verify_safepoints_are_blocked interactions with VMError Changes: https://git.openjdk.org/jdk/pull/23820/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23820&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350572 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23820.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23820/head:pull/23820 PR: https://git.openjdk.org/jdk/pull/23820 From tschatzl at openjdk.org Thu Feb 27 12:22:33 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 27 Feb 2025 12:22:33 GMT Subject: RFR: 8350758: G1: Use actual last prediction in accumulated survivor rate prediction too [v2] In-Reply-To: References: Message-ID: <6NAHP_4D-gWM_Sx1wmWg_nxDvoIPGpyN6E2xJR8Deag=.0f8d8872-cb8f-47a4-8bbf-fa6e62101db6@github.com> > Hi all, > > please review this fix to recent JDK-8349906 to use the current > survivor rate in the accumulated survivor rate for initializing newly allocated entries as well. 
> > Testing: gha > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * ayang review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23795/files - new: https://git.openjdk.org/jdk/pull/23795/files/5ab28451..8bfe673a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23795&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23795&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23795.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23795/head:pull/23795 PR: https://git.openjdk.org/jdk/pull/23795 From ayang at openjdk.org Thu Feb 27 12:22:33 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 27 Feb 2025 12:22:33 GMT Subject: RFR: 8350758: G1: Use actual last prediction in accumulated survivor rate prediction too [v2] In-Reply-To: <6NAHP_4D-gWM_Sx1wmWg_nxDvoIPGpyN6E2xJR8Deag=.0f8d8872-cb8f-47a4-8bbf-fa6e62101db6@github.com> References: <6NAHP_4D-gWM_Sx1wmWg_nxDvoIPGpyN6E2xJR8Deag=.0f8d8872-cb8f-47a4-8bbf-fa6e62101db6@github.com> Message-ID: <5Egj5zzjGbKW9VO_-PDpnhgdsVhO4PFh_9szRmndrnA=.e6a3102f-2aec-4a90-a980-1314b0fc0816@github.com> On Thu, 27 Feb 2025 12:19:16 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this fix to recent JDK-8349906 to use the current >> survivor rate in the accumulated survivor rate for initializing newly allocated entries as well. >> >> Testing: gha >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * ayang review Marked as reviewed by ayang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23795#pullrequestreview-2647588439 From eosterlund at openjdk.org Thu Feb 27 12:48:57 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 27 Feb 2025 12:48:57 GMT Subject: RFR: 8350572: ZGC: Enhance z_verify_safepoints_are_blocked interactions with VMError In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 11:15:52 GMT, Axel Boldt-Christmas wrote: > If VMError reporting is triggered from a disallowed thread state `z_verify_safepoints_are_blocked` will cause reentrant assertions to be triggered, for example when loading the thread oop as part of thread printing. This extends the verification to be ignored if triggered from the thread doing the error reporting. In most cases performing the load barriers from disallowed thread states during error reporting will work. > > Testing: > - tier 1 Oracle supported platforms > - GHA Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23820#pullrequestreview-2647669760 From aboldtch at openjdk.org Thu Feb 27 12:50:30 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 27 Feb 2025 12:50:30 GMT Subject: RFR: 8350851: ZGC: Reduce size of ZAddressOffsetMax scaling data structures Message-ID: ZAddressOffsetMax is used to scale a few of our BitMap and GranuleMap data structures. ZAddressOffsetMax is initialised to an upper limit, prior to reserving the virtual address space for the heap. After the reservation, the largest address offset that can be encountered may be much lower. I propose we scale ZAddressOffsetMax down after our heap reservation is complete, to the actual max value a zoffset_end is allowed to be. Doing this gives us two benefits. Firstly the assertions and type checks will be stricter, and will exercise code paths that otherwise only occur when using a 16TB heap.
Secondly we can reduce the size of the data structures which scale with ZAddressOffsetMax. (For most OSs the extra memory of these data structures does not really matter, as they are not paged in. But they are accounted for both on the OS, allocator and NMT layers). The page table uses ZIndexDistributor to iterate and distribute indices. The different strategies have different requirements on the alignment of the size of the range it distributes across. My proposed implementation simply aligns up the page table size to this alignment requirement, as it is the least intrusive change, at the cost of a somewhat larger data structure than strictly required. The alternative would be to extend ZIndexDistributor with support for any alignment on the range, or condition the use of the distributed indices based on whether they are less than the size. The data structures can also be larger than required if we fail to reserve the heap starting at our heap base. However this is a very rare occurrence, and while it would be nice to extend our asserts to check for a "ZAddressOffsetMin", I'll leave that for a future enhancement.
Testing: * ZGC specific tasks, tier 1 through tier 8 on Oracle Supported platforms * with `ZIndexDistributorStrategy=0`, and * with `ZIndexDistributorStrategy=1` * GHA ------------- Commit messages: - 8350851: ZGC: Reduce size of ZAddressOffsetMax scaling data structures Changes: https://git.openjdk.org/jdk/pull/23822/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23822&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350851 Stats: 82 lines in 10 files changed: 73 ins; 1 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23822.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23822/head:pull/23822 PR: https://git.openjdk.org/jdk/pull/23822 From rsunderbabu at openjdk.org Thu Feb 27 13:01:11 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Thu, 27 Feb 2025 13:01:11 GMT Subject: Integrated: 8314840: 3 gc/epsilon tests ignore external vm options In-Reply-To: References: Message-ID: On Mon, 24 Feb 2025 15:27:46 GMT, Ramkumar Sunderbabu wrote: > These tests do not pass Java/JVM test command line options (flags) to the child process. More details in JBS. > Tiers 1 to 3 tested. Along with various flag combinations. This pull request has now been integrated. 
Changeset: 799ac528 Author: Ramkumar Sunderbabu Committer: SendaoYan URL: https://git.openjdk.org/jdk/commit/799ac5288efbbb89e21319cd45657c8f817ad680 Stats: 9 lines in 3 files changed: 0 ins; 0 del; 9 mod 8314840: 3 gc/epsilon tests ignore external vm options Reviewed-by: tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/23751 From iwalulya at openjdk.org Thu Feb 27 13:02:06 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 27 Feb 2025 13:02:06 GMT Subject: RFR: 8350758: G1: Use actual last prediction in accumulated survivor rate prediction too [v2] In-Reply-To: <6NAHP_4D-gWM_Sx1wmWg_nxDvoIPGpyN6E2xJR8Deag=.0f8d8872-cb8f-47a4-8bbf-fa6e62101db6@github.com> References: <6NAHP_4D-gWM_Sx1wmWg_nxDvoIPGpyN6E2xJR8Deag=.0f8d8872-cb8f-47a4-8bbf-fa6e62101db6@github.com> Message-ID: <19YTnvDZIxpkuA9etPTn0Hb5bGcgJkKJDGHiR3ER6j8=.45ba6b3a-be0e-4cf2-bd24-c0b5633fcee2@github.com> On Thu, 27 Feb 2025 12:22:33 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this fix to recent JDK-8349906 to use the current >> survivor rate in the accumulated survivor rate for initializing newly allocated entries as well. >> >> Testing: gha >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * ayang review Marked as reviewed by iwalulya (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23795#pullrequestreview-2647702251 From rsunderbabu at openjdk.org Thu Feb 27 13:43:57 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Thu, 27 Feb 2025 13:43:57 GMT Subject: RFR: 8314840: 3 gc/epsilon tests ignore external vm options In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 09:05:04 GMT, Thomas Schatzl wrote: >> These tests do not pass Java/JVM test command line options (flags) to the child process. More details in JBS. >> Tiers 1 to 3 tested. Along with various flag combinations. > > lgtm. 
Thanks @tschatzl and @sendaoYan ------------- PR Comment: https://git.openjdk.org/jdk/pull/23751#issuecomment-2688000537 From eosterlund at openjdk.org Thu Feb 27 16:03:04 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 27 Feb 2025 16:03:04 GMT Subject: RFR: 8350851: ZGC: Reduce size of ZAddressOffsetMax scaling data structures In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 12:45:36 GMT, Axel Boldt-Christmas wrote: > ZAddressOffsetMax is used to scale a few of our BitMap and GranuleMap data structures. ZAddressOffsetMax is initialised to an upper limit, prior to reserving the virtual address space for the heap. After the reservation, the largest address offset that can be encountered may be much lower. > > I propose we scale ZAddressOffsetMax down after our heap reservation is complete, to the actual max value a zoffset_end is allowed to be. > > Doing this gives us two benefits. Firstly the assertions and type checks will be stricter, and will exercise code paths that otherwise only occur when using a 16TB heap. Secondly we can reduce the size of the data structures which scale with ZAddressOffsetMax. (For most OSs the extra memory of these data structures does not really matter, as they are not paged in. But they are accounted for both on the OS, allocator and NMT layers). > > The page table uses ZIndexDistributor to iterate and distribute indices. The different strategies have different requirements on the alignment of the size of the range it distributes across. My proposed implementation simply aligns up the page table size to this alignment requirement, as it is the least intrusive change, at the cost of a somewhat larger data structure than strictly required. The alternative would be to extend ZIndexDistributor with support for any alignment on the range, or condition the use of the distributed indices based on whether they are less than the size.
> > The data structures can also be larger than required if we fail to reserve the heap starting at our heap base. However this is a very rare occurrence, and while it would be nice to extend our asserts to check for a "ZAddressOffsetMin", I'll leave that for a future enhancement. > > Testing: > * ZGC specific tasks, tier 1 through tier 8 on Oracle Supported platforms > * with `ZIndexDistributorStrategy=0`, and > * with `ZIndexDistributorStrategy=1` > * GHA Nice. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23822#pullrequestreview-2648286555 From kdnilsen at openjdk.org Thu Feb 27 17:10:43 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 27 Feb 2025 17:10:43 GMT Subject: RFR: 8350889: GenShen: Break out of infinite loop of old GC cycles Message-ID: Add log message when heuristic triggers because previous trigger is pending. Cancel the pending-trigger condition for old-generation at the start of old-generation GC. ------------- Commit messages: - Fix white space - Cancel pending GC trigger at start of old gc - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - ...
and 20 more: https://git.openjdk.org/jdk/compare/3c9d64eb...85319914 Changes: https://git.openjdk.org/jdk/pull/23827/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23827&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350889 Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23827.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23827/head:pull/23827 PR: https://git.openjdk.org/jdk/pull/23827 From ysr at openjdk.org Thu Feb 27 17:35:02 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 27 Feb 2025 17:35:02 GMT Subject: RFR: 8350889: GenShen: Break out of infinite loop of old GC cycles In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 16:58:49 GMT, Kelvin Nilsen wrote: > Add log message when heuristic triggers because previous trigger is pending. Cancel the pending-trigger condition for old-generation at the start of old-generation GC. Changes look good. A somewhat tangential comment (possibly a nit) on old (existing) code. src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 246: > 244: log_trigger("GC start is already pending"); > 245: return true; > 246: } Should this happen lower down? Otherwise you miss the sampling of allocation rate at this time, down at line 249. Not sure if that can be an issue. Perhaps not and you just interpolate an average over the duration since the last sample albeit missing the occasional spike or dip, with perhaps the resulting low-pass filtering here ok? It would be OK if there is not an expectation of roughly synchronous sampling. Depends on how the code at line 286 works, I guess. ------------- Marked as reviewed by ysr (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23827#pullrequestreview-2648522862 PR Review Comment: https://git.openjdk.org/jdk/pull/23827#discussion_r1974023918 From wkemper at openjdk.org Thu Feb 27 17:35:01 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 27 Feb 2025 17:35:01 GMT Subject: RFR: 8350889: GenShen: Break out of infinite loop of old GC cycles In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 16:58:49 GMT, Kelvin Nilsen wrote: > Add log message when heuristic triggers because previous trigger is pending. Cancel the pending-trigger condition for old-generation at the start of old-generation GC. LGTM ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23827#pullrequestreview-2648558352 From ysr at openjdk.org Thu Feb 27 17:35:03 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 27 Feb 2025 17:35:03 GMT Subject: RFR: 8350889: GenShen: Break out of infinite loop of old GC cycles In-Reply-To: References: Message-ID: <7YIfJj5yEf_pjdRuHh2IXyiF88Un6h1MQ-TPbQzrkIY=.5ac56e7f-faa7-479f-b16d-4ec3ff552bbc@github.com> On Thu, 27 Feb 2025 17:18:28 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 246: >> >>> 244: log_trigger("GC start is already pending"); >>> 245: return true; >>> 246: } >> >> Should this happen lower down? Otherwise you miss the sampling of allocation rate at this time, down at line 249. Not sure if that can be an issue. Perhaps not and you just interpolate an average over the duration since the last sample albeit missing the occasional spike or dip, with perhaps the resulting low-pass filtering here ok? It would be OK if there is not an expectation of roughly synchronous sampling. Depends on how the code at line 286 works, I guess.
PS: the lvalue rate at line 249 isn't used anywhere. Does the compiler complain if you `sample` without assigning the value? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23827#discussion_r1974045814 From ysr at openjdk.org Thu Feb 27 17:35:03 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 27 Feb 2025 17:35:03 GMT Subject: RFR: 8350889: GenShen: Break out of infinite loop of old GC cycles In-Reply-To: <7YIfJj5yEf_pjdRuHh2IXyiF88Un6h1MQ-TPbQzrkIY=.5ac56e7f-faa7-479f-b16d-4ec3ff552bbc@github.com> Message-ID: On Thu, 27 Feb 2025 17:28:07 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 246: >> >>> 244: log_trigger("GC start is already pending"); >>> 245: return true; >>> 246: } >> >> Should this happen lower down? Otherwise you miss the sampling of allocation rate at this time, down at line 249. Not sure if that can be an issue. Perhaps not and you just interpolate an average over the duration since the last sample albeit missing the occasional spike or dip, with perhaps the resulting low-pass filtering here ok? It would be OK if there is not an expectation of roughly synchronous sampling. Depends on how the code at line 286 works, I guess. > > PS: the lvalue rate at line 249 isn't used anywhere. Does the compiler complain if you `sample` without assigning the value? Looked at the code for `ShenandoahAllocationRate`, I guess this may have the effect of making the sampling a bit more asynchronous, but the resulting smoothing over the interval could underestimate the volatility of the rate which is used in an `upper_bound` calculation below. Maybe that is harmless?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23827#discussion_r1974048690 From wkemper at openjdk.org Thu Feb 27 18:27:00 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 27 Feb 2025 18:27:00 GMT Subject: RFR: 8347804: GenShen: Crash with small GCCardSizeInBytes and small Java heap In-Reply-To: References: Message-ID: On Thu, 30 Jan 2025 18:55:53 GMT, Kelvin Nilsen wrote: > Original implementation was not robust to overriding of CardSizeInBytes, especially to smaller values. This fixes that issue. Minor nit. src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp line 192: > 190: > 191: if (GCCardSizeInBytes < ShenandoahMinCardSizeInBytes) { > 192: char buf[512]; It looks like using `err_msg` here is more idiomatic than `os::snprintf` for errors during initialization. src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 950: > 948: ShenandoahHeap* const heap = ShenandoahHeap::heap(); > 949: assert(heap->is_concurrent_weak_root_in_progress(), "Only during this phase"); > 950: { This looks like it came from https://github.com/openjdk/jdk/pull/23604. Did you cherry-pick that into this branch? Not sure why it shows in the diff here. ------------- Changes requested by wkemper (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23373#pullrequestreview-2648607661 PR Review Comment: https://git.openjdk.org/jdk/pull/23373#discussion_r1974075963 PR Review Comment: https://git.openjdk.org/jdk/pull/23373#discussion_r1974082900 From ayang at openjdk.org Thu Feb 27 18:34:14 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 27 Feb 2025 18:34:14 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v2] In-Reply-To: References: Message-ID: <3zmj-DeeRyPMHc32YnvfqACN0xJxLQ6jZZ7sd-Baa3w=.672912f6-e4a3-4679-b8a3-b7f6ad51589d@github.com> On Tue, 25 Feb 2025 15:13:43 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
>> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * remove unnecessarily added logging src/hotspot/share/gc/g1/g1BarrierSet.hpp line 54: > 52: // them, keeping the write barrier simple. > 53: // > 54: // The refinement threads mark cards in the the current collection set specially on the "the the" typo. src/hotspot/share/gc/g1/g1CardTable.inline.hpp line 47: > 45: > 46: // Returns bits from a where mask is 0, and bits from b where mask is 1. > 47: inline size_t blend(size_t a, size_t b, size_t mask) { Can you provide some input/output examples in the doc? 
src/hotspot/share/gc/g1/g1CardTableClaimTable.cpp line 45: > 43: } > 44: > 45: void G1CardTableClaimTable::initialize(size_t max_reserved_regions) { Should the arg be `uint`? src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 280: > 278: assert_state(State::SweepRT); > 279: > 280: set_state_start_time(); This method is called in a loop; would that skew the state-starting time? src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 344: > 342: size_t _num_clean; > 343: size_t _num_dirty; > 344: size_t _num_to_cset; Seem never read. src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 349: > 347: > 348: bool do_heap_region(G1HeapRegion* r) override { > 349: if (!r->is_free()) { I am a bit lost on this closure; the intention seems to set unclaimed to all non-free regions, why can't this be done in one go, instead of first setting all regions to claimed (`reset_all_claims_to_claimed`), then set non-free ones unclaimed? src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 116: > 114: > 115: // Current heap snapshot. > 116: G1CardTableClaimTable* _sweep_state; Since this is a table, I wonder if we can name it "x_table" instead of "x_state". src/hotspot/share/gc/g1/g1RemSet.cpp line 147: > 145: if (_contains[region]) { > 146: return; > 147: } Indentation seems broken. src/hotspot/share/gc/g1/g1RemSet.cpp line 830: > 828: size_t const start_idx = region_card_base_idx + claim.value(); > 829: > 830: size_t* card_cur_card = (size_t*)card_table->byte_for_index(start_idx); This var name should end with "_word", instead of "_card". src/hotspot/share/gc/g1/g1RemSet.cpp line 1252: > 1250: G1ConcurrentRefineWorkState::snapshot_heap_into(&constructed); > 1251: claim = &constructed; > 1252: } It's not super obvious to me why the "has_sweep_claims" checking needs to be on this level. Can `G1ConcurrentRefineWorkState` return a valid `G1CardTableClaimTable*` directly? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1974124792 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1971426039 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1973435950 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1974083760 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1973447654 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1973452168 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1974056492 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1973423400 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1974108760 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1974134441 From kdnilsen at openjdk.org Thu Feb 27 18:35:57 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 27 Feb 2025 18:35:57 GMT Subject: RFR: 8350889: GenShen: Break out of infinite loop of old GC cycles In-Reply-To: References: <7YIfJj5yEf_pjdRuHh2IXyiF88Un6h1MQ-TPbQzrkIY=.5ac56e7f-faa7-479f-b16d-4ec3ff552bbc@github.com> Message-ID: <4lsZ9HWnjBflQG5tz0R5gaQf3eHsn21jzvbVTBUGv2c=.a412baa0-2c8d-486e-8cda-880e2c6cc52b@github.com> On Thu, 27 Feb 2025 17:30:08 GMT, Y. Srinivas Ramakrishna wrote: >> PS: the lvalue rate at line 249 isn't used anywhere. Does the compiler complain if you `sample` without assigning the value? > > Looked at the code for `ShenandoahAllocationRate`, I guess this may have the effect of making the sampling a bit more asynchronous, but the resulting smoothing over the interval could underestimate the volatility of the rate which is used in an `upper_bound` calculation below. May be that is harmless? Yes. If we've already decided to trigger, there is no urgency to re-evaluate the allocation rate. We won't be checking for should_start_gc() again until the end of the current gc cycle. 
At that time, we will update the allocation rate based on how many allocations were realized during the GC cycle. The initial allocation rate estimate will be refined as should_start_gc() is again sampled every ms or so. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23827#discussion_r1974136474 From jsikstro at openjdk.org Thu Feb 27 18:39:01 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Thu, 27 Feb 2025 18:39:01 GMT Subject: RFR: 8350851: ZGC: Reduce size of ZAddressOffsetMax scaling data structures In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 12:45:36 GMT, Axel Boldt-Christmas wrote: > ZAddressOffsetMax is used to scale a few of our BitMap and GranuleMap data structures. ZAddressOffsetMax is initialised to an upper limit, prior to reserving the virtual address space for the heap. After the reservation, the largest address offset that can be encountered may be much lower. > > I propose we scale ZAddressOffsetMax down after our heap reservation is complete, to the actual max value a zoffset_end is allowed to be. > > Doing this gives us two benefits. Firstly the assertions and type checks will be stricter, and will exercise code paths that otherwise only occur when using a 16TB heap. Secondly we can reduce the size of the data structures which scale with ZAddressOffsetMax. (For most OSs the extra memory of these data structures does not really matter as they are not paged in. But they are accounted for on the OS, allocator and NMT layers). > > The page table uses ZIndexDistributor to iterate and distribute indices. The different strategies have different requirements on the alignment of the size of the range it distributes across. My proposed implementation simply aligns up the page table size to this alignment requirement. As it is the least intrusive change, at the cost of some larger data structure than strictly required.
The alternative would be to extend ZIndexDistributor with support for any alignment on the range, or condition the use of the distributed indices based on if they are less than the size. > > The data structures can also be larger than required if we fail to reserve the heap starting at our heap base. However this is a very rare occurrence, and while it would be nice to extend our asserts to check for a "ZAddressOffsetMin", I'll leave that for a future enhancement. > > Testing: > * ZGC specific tasks, tier 1 through tier 8 on Oracle Supported platforms > * with `ZIndexDistributorStrategy=0`, and > * with `ZIndexDistributorStrategy=1` > * GHA Looks good! ------------- Marked as reviewed by jsikstro (Committer). PR Review: https://git.openjdk.org/jdk/pull/23822#pullrequestreview-2648726308 From wkemper at openjdk.org Thu Feb 27 18:39:59 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 27 Feb 2025 18:39:59 GMT Subject: RFR: 8349766: GenShen: Bad progress after degen does not always need full gc [v5] In-Reply-To: References: Message-ID: On Tue, 18 Feb 2025 19:28:28 GMT, Kelvin Nilsen wrote: >> In generational mode, only upgrade to full GC from degenerated GC if we've done two degenerated cycles in a row and both indicated bad progress. Otherwise, start another concurrent GC, which will most likely degenerate also. But this degenerated cycle will reclaim floating garbage within the young generation much more quickly than a full GC would have done. > > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: > > - Merge branch 'master' of https://git.openjdk.org/jdk into defer-generational-full-gc > - Merge master > - Fix typo in merge conflict resolution > - 8348595: GenShen: Fix generational free-memory no-progress check > > Reviewed-by: phh, xpeng > - 8348092: Shenandoah: assert(nk >= _lowest_valid_narrow_klass_id && nk <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710) > > Reviewed-by: shade > - Merge tag 'jdk-25+10' into defer-generational-full-gc > > Added tag jdk-25+10 for changeset a637ccf2 > - Be less eager to upgrade degen to full gc Marked as reviewed by wkemper (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23552#pullrequestreview-2648733128 From kdnilsen at openjdk.org Thu Feb 27 18:43:02 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 27 Feb 2025 18:43:02 GMT Subject: Integrated: 8349766: GenShen: Bad progress after degen does not always need full gc In-Reply-To: References: Message-ID: <0sDKADu6y5hHS3qy_R2vjlKRi0ttb2Q4XAKqdtnm23k=.872c061e-7409-424d-acd1-9436ed187866@github.com> On Tue, 11 Feb 2025 03:31:51 GMT, Kelvin Nilsen wrote: > In generational mode, only upgrade to full GC from degenerated GC if we've done two degenerated cycles in a row and both indicated bad progress. Otherwise, start another concurrent GC, which will most likely degenerate also. But this degenerated cycle will reclaim floating garbage within the young generation much more quickly than a full GC would have done. This pull request has now been integrated. 
Changeset: 3ae80bfb Author: Kelvin Nilsen URL: https://git.openjdk.org/jdk/commit/3ae80bfb6085e1a6bcb551c7b0be8f27b6f9fde9 Stats: 20 lines in 2 files changed: 17 ins; 0 del; 3 mod 8349766: GenShen: Bad progress after degen does not always need full gc Reviewed-by: wkemper ------------- PR: https://git.openjdk.org/jdk/pull/23552 From wkemper at openjdk.org Thu Feb 27 18:47:24 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 27 Feb 2025 18:47:24 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v17] In-Reply-To: References: Message-ID: > There are several changes to the operation of Shenandoah's control threads here. > * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. > * The cancellation handling is driven entirely by the cancellation cause > * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed > * The shutdown sequence is simpler > * The generational control thread uses a lock to coordinate updates to the requested cause and generation > * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance > * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles > * The control thread doesn't loop on its own (unless the pacer is enabled). 
> > ## Testing > * jtreg hotspot_gc_shenandoah > * dacapo, extremem, diluvian, specjbb2015, specjvm2018, heapothesys William Kemper has updated the pull request incrementally with one additional commit since the last revision: Don't check for shutdown in control thread loop condition It may cause the thread to exit before it is requested to stop ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23475/files - new: https://git.openjdk.org/jdk/pull/23475/files/d2e90dde..150cb798 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=15-16 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23475.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23475/head:pull/23475 PR: https://git.openjdk.org/jdk/pull/23475 From kdnilsen at openjdk.org Thu Feb 27 18:56:06 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 27 Feb 2025 18:56:06 GMT Subject: RFR: 8350889: GenShen: Break out of infinite loop of old GC cycles In-Reply-To: <4lsZ9HWnjBflQG5tz0R5gaQf3eHsn21jzvbVTBUGv2c=.a412baa0-2c8d-486e-8cda-880e2c6cc52b@github.com> References: <7YIfJj5yEf_pjdRuHh2IXyiF88Un6h1MQ-TPbQzrkIY=.5ac56e7f-faa7-479f-b16d-4ec3ff552bbc@github.com> <4lsZ9HWnjBflQG5tz0R5gaQf3eHsn21jzvbVTBUGv2c=.a412baa0-2c8d-486e-8cda-880e2c6cc52b@github.com> Message-ID: On Thu, 27 Feb 2025 18:32:55 GMT, Kelvin Nilsen wrote: >> Looked at the code for `ShenandoahAllocationRate`, I guess this may have the effect of making the sampling a bit more asynchronous, but the resulting smoothing over the interval could underestimate the volatility of the rate which is used in an `upper_bound` calculation below. May be that is harmless? > > Yes. If we've already decided to trigger, there is no urgency to re-evaluate the allocation rate. We won't be checking for should_start_gc() again until the end of the current gc cycle. 
At that time, we will update the allocation rate based on how many allocation were realized during the GC cycle. The initial allocation rate estimate will be refined as should_start_gc() is again sampled every ms or so. I think rate is used below as an argument to a log_trigger() invocation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23827#discussion_r1974166806 From kdnilsen at openjdk.org Thu Feb 27 19:21:32 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 27 Feb 2025 19:21:32 GMT Subject: RFR: 8347804: GenShen: Crash with small GCCardSizeInBytes and small Java heap [v2] In-Reply-To: References: Message-ID: > Original implementation was not robust to overriding of CardSizeInBytes, especially to smaller values. This fixes that issue. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Respond to reviewer feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23373/files - new: https://git.openjdk.org/jdk/pull/23373/files/7120cdf3..cf06bf0d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23373&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23373&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23373/head:pull/23373 PR: https://git.openjdk.org/jdk/pull/23373 From kdnilsen at openjdk.org Thu Feb 27 19:26:56 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 27 Feb 2025 19:26:56 GMT Subject: RFR: 8347804: GenShen: Crash with small GCCardSizeInBytes and small Java heap [v2] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 17:49:16 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Respond to reviewer feedback > > src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp line 192: > >> 190: >> 191: if 
(GCCardSizeInBytes < ShenandoahMinCardSizeInBytes) { >> 192: char buf[512]; > It looks like using `err_msg` here is more idiomatic than `os::snprintf` for errors during initialization. Thanks. I've made this change. > src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 950: > >> 948: ShenandoahHeap* const heap = ShenandoahHeap::heap(); >> 949: assert(heap->is_concurrent_weak_root_in_progress(), "Only during this phase"); >> 950: { > > This looks like it came from https://github.com/openjdk/jdk/pull/23604. Did you cherry-pick that into this branch? Not sure why it shows in the diff here. Looks like I accidentally did cherry-pick into my gitfarm variant of this branch, and then merged into the github version of the branch. The merge history must have gotten confused.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23373#discussion_r1974209324 From wkemper at openjdk.org Thu Feb 27 19:30:52 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 27 Feb 2025 19:30:52 GMT Subject: RFR: 8347804: GenShen: Crash with small GCCardSizeInBytes and small Java heap [v2] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 19:23:49 GMT, Kelvin Nilsen wrote: >> Looks like I accidentally did cherry-pick into my gitfarm variant of this branch, and then merged into the github version of the branch. The merge history must have gotten confused. > > would it help if I cherry-pick the same commit directly into github? I think you could try this on your `fix-small-card-size` branch for this PR: % git revert 7120cdf36 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23373#discussion_r1974216887 From kdnilsen at openjdk.org Thu Feb 27 19:43:47 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 27 Feb 2025 19:43:47 GMT Subject: RFR: 8347804: GenShen: Crash with small GCCardSizeInBytes and small Java heap [v3] In-Reply-To: References: Message-ID: > Original implementation was not robust to overriding of CardSizeInBytes, especially to smaller values. This fixes that issue. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Revert "8348092: Shenandoah: assert(nk >= _lowest_valid_narrow_klass_id && nk <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710)" This reverts commit 7120cdf36c1657a250fd3e60136e7b615fc7b538. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23373/files - new: https://git.openjdk.org/jdk/pull/23373/files/cf06bf0d..1fcdc869 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23373&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23373&range=01-02 Stats: 19 lines in 1 file changed: 0 ins; 14 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23373/head:pull/23373 PR: https://git.openjdk.org/jdk/pull/23373 From kdnilsen at openjdk.org Thu Feb 27 19:43:48 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 27 Feb 2025 19:43:48 GMT Subject: RFR: 8347804: GenShen: Crash with small GCCardSizeInBytes and small Java heap [v3] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 19:28:05 GMT, William Kemper wrote: >> would it help if I cherry-pick the same commit directly into github? > > I think you could try this on your `fix-small-card-size` branch for this PR: > > % git revert 7120cdf36 Thanks. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23373#discussion_r1974233880 From wkemper at openjdk.org Thu Feb 27 19:56:17 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 27 Feb 2025 19:56:17 GMT Subject: RFR: 8350898: Shenandoah: Eliminate final roots safepoint Message-ID: This PR converts the final roots safepoint operation into a handshake. The safepoint operation still exists, but is only executed when `ShenandoahVerify` is enabled. In addition to this change, this PR also improves the logging for the concurrent preparation for update references from [PR 22688](https://github.com/openjdk/jdk/pull/22688). 
------------- Commit messages: - Fix comments - Add whitespace at end of file - More detail for init update refs event message - Use timing tracker for timing verification - Merge remote-tracking branch 'jdk/master' into eliminate-final-roots - WIP: Fix up phase timings for newly concurrent final roots and init update refs - WIP: Combine satb transfer with state propagation, restore phase timing data - WIP: Transfer pointers out of SATB with a handshake - WIP: Clear weak roots flag concurrently Changes: https://git.openjdk.org/jdk/pull/23830/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23830&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350898 Stats: 291 lines in 14 files changed: 194 ins; 47 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/23830.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23830/head:pull/23830 PR: https://git.openjdk.org/jdk/pull/23830 From wkemper at openjdk.org Thu Feb 27 21:07:00 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 27 Feb 2025 21:07:00 GMT Subject: RFR: 8347804: GenShen: Crash with small GCCardSizeInBytes and small Java heap [v3] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 19:43:47 GMT, Kelvin Nilsen wrote: >> Original implementation was not robust to overriding of CardSizeInBytes, especially to smaller values. This fixes that issue. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Revert "8348092: Shenandoah: assert(nk >= _lowest_valid_narrow_klass_id && nk <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710)" > > This reverts commit 7120cdf36c1657a250fd3e60136e7b615fc7b538. Marked as reviewed by wkemper (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23373#pullrequestreview-2649079470 From kdnilsen at openjdk.org Thu Feb 27 23:13:07 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 27 Feb 2025 23:13:07 GMT Subject: Integrated: 8347804: GenShen: Crash with small GCCardSizeInBytes and small Java heap In-Reply-To: References: Message-ID: <0JEkZIRntf6CSn4JnUeSOLPvbp4GaY_Ci624kY5wRms=.d33ea15d-3e2c-4af8-ba92-a782a5c8ef37@github.com> On Thu, 30 Jan 2025 18:55:53 GMT, Kelvin Nilsen wrote: > Original implementation was not robust to overriding of CardSizeInBytes, especially to smaller values. This fixes that issue. This pull request has now been integrated. Changeset: 0a4c5a8a Author: Kelvin Nilsen URL: https://git.openjdk.org/jdk/commit/0a4c5a8a483b23ec8c534054187c44f986d137bb Stats: 86 lines in 5 files changed: 63 ins; 4 del; 19 mod 8347804: GenShen: Crash with small GCCardSizeInBytes and small Java heap Reviewed-by: wkemper ------------- PR: https://git.openjdk.org/jdk/pull/23373 From kdnilsen at openjdk.org Thu Feb 27 23:26:57 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 27 Feb 2025 23:26:57 GMT Subject: Integrated: 8350889: GenShen: Break out of infinite loop of old GC cycles In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 16:58:49 GMT, Kelvin Nilsen wrote: > Add log message when heuristic trigger because previous trigger is pending. Cancel the pending-trigger condition for old-generation at the start of old-generation GC. This pull request has now been integrated. 
Changeset: ab4b0ef9 Author: Kelvin Nilsen URL: https://git.openjdk.org/jdk/commit/ab4b0ef9242a4cd964fbcf2d1f3d370234c09408 Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod 8350889: GenShen: Break out of infinite loop of old GC cycles Reviewed-by: wkemper, ysr ------------- PR: https://git.openjdk.org/jdk/pull/23827 From xpeng at openjdk.org Fri Feb 28 00:08:22 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 28 Feb 2025 00:08:22 GMT Subject: RFR: 8338737: Shenandoah: Reset marking bitmaps after the cycle [v9] In-Reply-To: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com> References: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com> Message-ID: > Reset marking bitmaps after the collection cycle; for GenShen, only do this for the young generation. Also choose not to do this for Degen and Full GC, since both run at a safepoint and we should leave the safepoint ASAP. > > I have run the same workload for 30s with Shenandoah in generational mode and classic mode; the average time of concurrent reset dropped significantly, since in most cases the bitmap for the young gen should already have been reset after the previous concurrent cycle finishes if there is no need to preserve bitmap states.
> > GenShen: > Before: > > [33.342s][info][gc,stats ] Concurrent Reset = 0.023 s (a = 1921 us) (n = 12) (lvls, us = 133, 385, 1191, 1836, 8878) > > > After: > > [33.597s][info][gc,stats ] Concurrent Reset = 0.004 s (a = 317 us) (n = 13) (lvls, us = 58, 119, 217, 410, 670) > [33.597s][info][gc,stats ] Concurrent Reset After Collect = 0.018 s (a = 1365 us) (n = 13) (lvls, us = 91, 186, 818, 1836, 3872) > > > Shenandoah: > Before: > > [33.144s][info][gc,stats ] Concurrent Reset = 0.014 s (a = 1067 us) (n = 13) (lvls, us = 139, 277, 898, 1328, 2118) > > After: > > [33.128s][info][gc,stats ] Concurrent Reset = 0.003 s (a = 225 us) (n = 13) (lvls, us = 32, 92, 137, 295, 542) > [33.128s][info][gc,stats ] Concurrent Reset After Collect = 0.009 s (a = 661 us) (n = 13) (lvls, us = 92, 160, 594, 896, 1661) > > > Additional changes: > * Remove `ShenandoahResetBitmapClosure` and `ShenandoahPrepareForMarkClosure`, merge the code with `ShenandoahResetBitmapClosure`, saving one iteration over all the regions. > * Use API `ShenandoahGeneration::parallel_heap_region_iterate_free` to iterate the regions, two benefits from this: > - Underneath it calls `ShenandoahHeap::parallel_heap_region_iterate`, which is faster for very light tasks, see https://bugs.openjdk.org/browse/JDK-8337154 > - `ShenandoahGeneration::parallel_heap_region_iterate_free` decorate the closure with `ShenandoahExcludeRegionClosure`, which simplifies the code in closure. > * When `_do_old_gc_bootstrap is true`, instead of reset mark bitmap for old gen separately, simply reset the global generations, so we don't need walk the all regions twice. > * Clean up FullGC code, remove duplicate code. > > Additional tests: > - [x] CONF=macosx-aarch64-server-fastdebug make test T... Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 25 additional commits since the last revision: - Merge branch 'openjdk:master' into reset-bitmap - Merge branch 'openjdk:master' into reset-bitmap - Merge branch 'openjdk:master' into reset-bitmap - Merge branch 'openjdk:master' into reset-bitmap - Merge branch 'openjdk:master' into reset-bitmap - Adding condition "!_do_old_gc_bootstrap && !heap->is_concurrent_old_mark_in_progress()" back and address some PR comments - Remove entry_reset_after_collect from ShenandoahOldGC - Remove condition check !_do_old_gc_bootstrap && !heap->is_concurrent_old_mark_in_progress() from op_reset_after_collect - Merge branch 'openjdk:master' into reset-bitmap - Address review comments - ... and 15 more: https://git.openjdk.org/jdk/compare/c65af1fb...7eea9556 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22778/files - new: https://git.openjdk.org/jdk/pull/22778/files/c7e9bff3..7eea9556 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22778&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22778&range=07-08 Stats: 20235 lines in 598 files changed: 11195 ins; 6911 del; 2129 mod Patch: https://git.openjdk.org/jdk/pull/22778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22778/head:pull/22778 PR: https://git.openjdk.org/jdk/pull/22778 From tschatzl at openjdk.org Fri Feb 28 10:35:03 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 28 Feb 2025 10:35:03 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v2] In-Reply-To: <3zmj-DeeRyPMHc32YnvfqACN0xJxLQ6jZZ7sd-Baa3w=.672912f6-e4a3-4679-b8a3-b7f6ad51589d@github.com> References: <3zmj-DeeRyPMHc32YnvfqACN0xJxLQ6jZZ7sd-Baa3w=.672912f6-e4a3-4679-b8a3-b7f6ad51589d@github.com> Message-ID: On Thu, 27 Feb 2025 18:24:15 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * remove unnecessarily added 
logging > > src/hotspot/share/gc/g1/g1BarrierSet.hpp line 54: > >> 52: // them, keeping the write barrier simple. >> 53: // >> 54: // The refinement threads mark cards in the the current collection set specially on the > > "the the" typo. I fixed one more occurrence in files changed in this CR. There are like 10 more of these duplications in our code, I will fix separately. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1975186407 From tschatzl at openjdk.org Fri Feb 28 11:25:53 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 28 Feb 2025 11:25:53 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v2] In-Reply-To: <3zmj-DeeRyPMHc32YnvfqACN0xJxLQ6jZZ7sd-Baa3w=.672912f6-e4a3-4679-b8a3-b7f6ad51589d@github.com> References: <3zmj-DeeRyPMHc32YnvfqACN0xJxLQ6jZZ7sd-Baa3w=.672912f6-e4a3-4679-b8a3-b7f6ad51589d@github.com> Message-ID: <9tS5E1tteGutSNX7rZh5WYLdZoF7Vgl_4_pjuAdT4WU=.c8c73c45-7abb-48a9-b623-769d3c1679ca@github.com> On Thu, 27 Feb 2025 12:07:29 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * remove unnecessarily added logging > > src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 349: > >> 347: >> 348: bool do_heap_region(G1HeapRegion* r) override { >> 349: if (!r->is_free()) { > > I am a bit lost on this closure; the intention seems to set unclaimed to all non-free regions, why can't this be done in one go, instead of first setting all regions to claimed (`reset_all_claims_to_claimed`), then set non-free ones unclaimed? `do_heap_region()` only visits committed regions in this case. I wanted to avoid the additional check in the iteration code. If you still think it is more clear to filter those out later, please tell me. I'll add a comment for now. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1975250646 From tschatzl at openjdk.org Fri Feb 28 12:14:01 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 28 Feb 2025 12:14:01 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v2] In-Reply-To: <3zmj-DeeRyPMHc32YnvfqACN0xJxLQ6jZZ7sd-Baa3w=.672912f6-e4a3-4679-b8a3-b7f6ad51589d@github.com> References: <3zmj-DeeRyPMHc32YnvfqACN0xJxLQ6jZZ7sd-Baa3w=.672912f6-e4a3-4679-b8a3-b7f6ad51589d@github.com> Message-ID: <87L5pcyGAgyDsXTwlSdAFLyIAOcUl1ZdYXK-nwzLrUQ=.c3db7522-b3e6-46e0-b268-e457c3d2bdc2@github.com> On Thu, 27 Feb 2025 18:31:16 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * remove unnecessarily added logging > > src/hotspot/share/gc/g1/g1RemSet.cpp line 1252: > >> 1250: G1ConcurrentRefineWorkState::snapshot_heap_into(&constructed); >> 1251: claim = &constructed; >> 1252: } > > It's not super obvious to me why the "has_sweep_claims" checking needs to be on this level. Can `G1ConcurrentRefineWorkState` return a valid `G1CardTableClaimTable*` directly? I agree. I remember having similar thoughts as well, but then did not do anything about this. Will fix. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1975311607 From tschatzl at openjdk.org Fri Feb 28 13:43:24 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 28 Feb 2025 13:43:24 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v3] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. 
> > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. 
> > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: - * ayang review 1 (ctd) * split up sweep-rt state into "start" (to be called once) and "step" (to be called repeatedly) phases * move building the snapshot our of g1remset - * ayang review 1 * use uint for number of reserved regions consistently * rename *sweep_state to *sweep_table * improved comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/9ef9c5f4..7d361fc1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=01-02 Stats: 108 lines in 8 files changed: 40 ins; 24 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Fri Feb 28 14:20:19 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 28 Feb 2025 14:20:19 GMT Subject: Integrated: 8350758: G1: Use actual last prediction in accumulated survivor rate prediction too In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 11:51:31 GMT, Thomas Schatzl wrote: > Hi all, > > please review this fix to recent JDK-8349906 to use the current > survivor rate in the accumulated survivor rate for initializing newly allocated entries as well. > > Testing: gha > > Thanks, > Thomas This pull request has now been integrated. 
Changeset: d6c4be67 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/d6c4be672f6348f8ed985416ed90d0447f5d5bb3 Stats: 17 lines in 1 file changed: 5 ins; 8 del; 4 mod 8350758: G1: Use actual last prediction in accumulated survivor rate prediction too Reviewed-by: ayang, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/23795 From tschatzl at openjdk.org Fri Feb 28 14:20:18 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 28 Feb 2025 14:20:18 GMT Subject: RFR: 8350758: G1: Use actual last prediction in accumulated survivor rate prediction too [v2] In-Reply-To: <19YTnvDZIxpkuA9etPTn0Hb5bGcgJkKJDGHiR3ER6j8=.45ba6b3a-be0e-4cf2-bd24-c0b5633fcee2@github.com> References: <6NAHP_4D-gWM_Sx1wmWg_nxDvoIPGpyN6E2xJR8Deag=.0f8d8872-cb8f-47a4-8bbf-fa6e62101db6@github.com> <19YTnvDZIxpkuA9etPTn0Hb5bGcgJkKJDGHiR3ER6j8=.45ba6b3a-be0e-4cf2-bd24-c0b5633fcee2@github.com> Message-ID: On Thu, 27 Feb 2025 12:59:14 GMT, Ivan Walulya wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * ayang review > > Marked as reviewed by iwalulya (Reviewer). Thanks @walulyai @albertnetymk for your reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23795#issuecomment-2690754693 From kdnilsen at openjdk.org Fri Feb 28 15:58:00 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 28 Feb 2025 15:58:00 GMT Subject: RFR: 8350605: assert(!heap->is_uncommit_in_progress()) failed: Cannot uncommit bitmaps while resetting them In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 01:38:14 GMT, William Kemper wrote: > The protocol which is meant to prevent regions from being uncommitted while their bitmaps are being reset may fail. This happens when the control thread attempts to wait for the uncommit thread to finish, but the uncommit thread has not yet indicated that it has started. 
> > ## Testing > GHA, Dacapo, Extremem, Heapothesys, Diluvian, SpecJBB2015, SpecJVM2008 (with and without stress flags, asserts). Also have run the JTREG test that failed this assertion over 10K times (and counting). Looks good. Most of our pipeline tests are running with a fixed-size heap and pretouch. Do we have good test coverage of scattered uncommitted regions? src/hotspot/share/gc/shenandoah/shenandoahUncommitThread.cpp line 146: > 144: log_info(gc, start)("%s", msg); > 145: > 146: const size_t uncommitted_region_count = do_uncommit_work(shrink_before, shrink_until); A comment here might also be helpful. I think this is what was newly uncommitted by the current call; there may be other preexisting uncommitted regions? src/hotspot/share/gc/shenandoah/shenandoahUncommitThread.hpp line 63: > 61: bool is_uncommit_allowed() const; > 62: > 63: size_t do_uncommit_work(double shrink_before, size_t shrink_until) const; A comment describing arguments and return value would be nice. Mention premature termination if a GC is requested. ------------- Marked as reviewed by kdnilsen (Committer). PR Review: https://git.openjdk.org/jdk/pull/23760#pullrequestreview-2651140350 PR Review Comment: https://git.openjdk.org/jdk/pull/23760#discussion_r1975639431 PR Review Comment: https://git.openjdk.org/jdk/pull/23760#discussion_r1975635358 From wkemper at openjdk.org Fri Feb 28 17:17:17 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 28 Feb 2025 17:17:17 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v18] In-Reply-To: References: Message-ID: > There are several changes to the operation of Shenandoah's control threads here. > * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. 
> * The cancellation handling is driven entirely by the cancellation cause > * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed > * The shutdown sequence is simpler > * The generational control thread uses a lock to coordinate updates to the requested cause and generation > * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance > * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles > * The control thread doesn't loop on its own (unless the pacer is enabled). > > ## Testing > * jtreg hotspot_gc_shenandoah > * dacapo, extremem, diluvian, specjbb2015, specjvm2018, heapothesys William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads - Don't check for shutdown in control thread loop condition It may cause the thread to exit before it is requested to stop - Add assertions about old gen state when resuming old cycles - Remove duplicated field pointer for old generation - Improve names and comments - Merge tag 'jdk-25+11' into fix-control-regulator-threads Added tag jdk-25+11 for changeset 0131c1bf - Address review feedback (better comments, better names) - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads - Old gen bootstrap cycle must make it to init mark - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads - ... 
and 27 more: https://git.openjdk.org/jdk/compare/e98df71d...37e445d6 ------------- Changes: https://git.openjdk.org/jdk/pull/23475/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23475&range=17 Stats: 963 lines in 18 files changed: 327 ins; 294 del; 342 mod Patch: https://git.openjdk.org/jdk/pull/23475.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23475/head:pull/23475 PR: https://git.openjdk.org/jdk/pull/23475 From wkemper at openjdk.org Fri Feb 28 17:37:16 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 28 Feb 2025 17:37:16 GMT Subject: RFR: 8350605: assert(!heap->is_uncommit_in_progress()) failed: Cannot uncommit bitmaps while resetting them [v2] In-Reply-To: References: Message-ID: > The protocol which is meant to prevent regions from being uncommitted while their bitmaps are being reset may fail. This happens when the control thread attempts to wait for the uncommit thread to finish, but the uncommit thread has not yet indicated that it has started. > > ## Testing > GHA, Dacapo, Extremem, Heapothesys, Diluvian, SpecJBB2015, SpecJVM2008 (with and without stress flags, asserts). Also have run the JTREG test that failed this assertion over 10K times (and counting). 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Improve comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23760/files - new: https://git.openjdk.org/jdk/pull/23760/files/ec13274c..b194db8f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23760&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23760&range=00-01 Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23760.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23760/head:pull/23760 PR: https://git.openjdk.org/jdk/pull/23760 From wkemper at openjdk.org Fri Feb 28 17:44:36 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 28 Feb 2025 17:44:36 GMT Subject: RFR: 8350605: assert(!heap->is_uncommit_in_progress()) failed: Cannot uncommit bitmaps while resetting them [v3] In-Reply-To: References: Message-ID: > The protocol which is meant to prevent regions from being uncommitted while their bitmaps are being reset may fail. This happens when the control thread attempts to wait for the uncommit thread to finish, but the uncommit thread has not yet indicated that it has started. > > ## Testing > GHA, Dacapo, Extremem, Heapothesys, Diluvian, SpecJBB2015, SpecJVM2008 (with and without stress flags, asserts). Also have run the JTREG test that failed this assertion over 10K times (and counting). 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Comment tweak ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23760/files - new: https://git.openjdk.org/jdk/pull/23760/files/b194db8f..1c32c0e3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23760&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23760&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23760.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23760/head:pull/23760 PR: https://git.openjdk.org/jdk/pull/23760 From wkemper at openjdk.org Fri Feb 28 17:47:53 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 28 Feb 2025 17:47:53 GMT Subject: RFR: 8350605: assert(!heap->is_uncommit_in_progress()) failed: Cannot uncommit bitmaps while resetting them [v3] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 17:44:36 GMT, William Kemper wrote: >> The protocol which is meant to prevent regions from being uncommitted while their bitmaps are being reset may fail. This happens when the control thread attempts to wait for the uncommit thread to finish, but the uncommit thread has not yet indicated that it has started. >> >> ## Testing >> GHA, Dacapo, Extremem, Heapothesys, Diluvian, SpecJBB2015, SpecJVM2008 (with and without stress flags, asserts). Also have run the JTREG test that failed this assertion over 10K times (and counting). > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Comment tweak That's a good point. I created a branch that enables uncommit for the test pipelines when I made this original change. I'll resurrect that branch and run that configuration again. Thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23760#issuecomment-2691218679 From tschatzl at openjdk.org Fri Feb 28 17:52:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 28 Feb 2025 17:52:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v4] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements the (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight, but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post-write barrier to much more closely resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to its larger barrier. > > The main reason for the current barrier is how G1 implements concurrent refinement: > * G1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness, dirty card updates require fine-grained synchronization between mutator and refinement threads. > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudocode: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for Parallel and Serial GC. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse-grained synchronization based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... 
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * fix assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/7d361fc1..d87935a0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From duke at openjdk.org Fri Feb 28 18:11:59 2025 From: duke at openjdk.org (Abdelhak Zaaim) Date: Fri, 28 Feb 2025 18:11:59 GMT Subject: RFR: 8350851: ZGC: Reduce size of ZAddressOffsetMax scaling data structures In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 12:45:36 GMT, Axel Boldt-Christmas wrote: > ZAddressOffsetMax is used to scale a few of our BitMap and GranuleMap data structures. ZAddressOffsetMax is initialised to an upper limit, prior to reserving the virtual address space for the heap. After the reservation, the largest address offset that can be encountered may be much lower. > > I propose we scale ZAddressOffsetMax down after our heap reservation is complete, to the actual max value a zoffset_end is allowed to be. > > Doing this gives us two benefits. Firstly the assertions and type checks will be stricter, and will exercise code paths that otherwise only occur when using a 16TB heap. Secondly we can reduce the size of the data structures which scale with ZAddressOffsetMax. (For most OSs the extra memory of these data structures does not really matter as they are not paged in. But they are accounted for both on the OS, allocator and NMT layers). > > The page table uses ZIndexDistributor to iterate and distribute indices. 
The different strategies have different requirements on the alignment of the size of the range they distribute across. My proposed implementation simply aligns up the page table size to this alignment requirement, as it is the least intrusive change, at the cost of a somewhat larger data structure than strictly required. The alternative would be to extend ZIndexDistributor with support for any alignment on the range, or condition the use of the distributed indices based on whether they are less than the size. > > The data structures can also be larger than required if we fail to reserve the heap starting at our heap base. However this is a very rare occurrence, and while it would be nice to extend our asserts to check for a "ZAddressOffsetMin", I'll leave that for a future enhancement. > > Testing: > * ZGC-specific tasks, tier 1 through tier 8 on Oracle Supported platforms > * with `ZIndexDistributorStrategy=0`, and > * with `ZIndexDistributorStrategy=1` > * GHA Marked as reviewed by abdelhak-zaaim at github.com (no known OpenJDK username). ------------- PR Review: https://git.openjdk.org/jdk/pull/23822#pullrequestreview-2651480088 From wkemper at openjdk.org Fri Feb 28 23:59:11 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 28 Feb 2025 23:59:11 GMT Subject: RFR: 8338737: Shenandoah: Reset marking bitmaps after the cycle [v9] In-Reply-To: References: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com> Message-ID: On Fri, 28 Feb 2025 00:08:22 GMT, Xiaolong Peng wrote: >> Reset marking bitmaps after the collection cycle; for GenShen only do this for the young generation, also choose not to do this for Degen and full GC since both are running at a safepoint, and we should leave the safepoint ASAP. 
>> >> I have run the same workload for 30s with Shenandoah in generational mode and classic mode, and the average time of concurrent reset dropped significantly, since in most cases the bitmap for young gen should have been reset after the previous concurrent cycle finishes if there is no need to preserve bitmap states. >> >> GenShen: >> Before: >> >> [33.342s][info][gc,stats ] Concurrent Reset = 0.023 s (a = 1921 us) (n = 12) (lvls, us = 133, 385, 1191, 1836, 8878) >> >> >> After: >> >> [33.597s][info][gc,stats ] Concurrent Reset = 0.004 s (a = 317 us) (n = 13) (lvls, us = 58, 119, 217, 410, 670) >> [33.597s][info][gc,stats ] Concurrent Reset After Collect = 0.018 s (a = 1365 us) (n = 13) (lvls, us = 91, 186, 818, 1836, 3872) >> >> >> Shenandoah: >> Before: >> >> [33.144s][info][gc,stats ] Concurrent Reset = 0.014 s (a = 1067 us) (n = 13) (lvls, us = 139, 277, 898, 1328, 2118) >> >> After: >> >> [33.128s][info][gc,stats ] Concurrent Reset = 0.003 s (a = 225 us) (n = 13) (lvls, us = 32, 92, 137, 295, 542) >> [33.128s][info][gc,stats ] Concurrent Reset After Collect = 0.009 s (a = 661 us) (n = 13) (lvls, us = 92, 160, 594, 896, 1661) >> >> >> Additional changes: >> * Remove `ShenandoahResetBitmapClosure` and `ShenandoahPrepareForMarkClosure`, merge the code with `ShenandoahResetBitmapClosure`, saving one iteration over all the regions. >> * Use API `ShenandoahGeneration::parallel_heap_region_iterate_free` to iterate the regions, two benefits from this: >> - Underneath it calls `ShenandoahHeap::parallel_heap_region_iterate`, which is faster for very light tasks, see https://bugs.openjdk.org/browse/JDK-8337154 >> - `ShenandoahGeneration::parallel_heap_region_iterate_free` decorates the closure with `ShenandoahExcludeRegionClosure`, which simplifies the code in the closure. >> * When `_do_old_gc_bootstrap` is true, instead of resetting the mark bitmap for old gen separately, simply reset the global generations, so we don't need to walk all the regions twice. 
>> * Clean up FullGC code, remove duplicate code. >> >> ... > > Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 25 additional commits since the last revision: > > - Merge branch 'openjdk:master' into reset-bitmap > - Merge branch 'openjdk:master' into reset-bitmap > - Merge branch 'openjdk:master' into reset-bitmap > - Merge branch 'openjdk:master' into reset-bitmap > - Merge branch 'openjdk:master' into reset-bitmap > - Adding condition "!_do_old_gc_bootstrap && !heap->is_concurrent_old_mark_in_progress()" back and address some PR comments > - Remove entry_reset_after_collect from ShenandoahOldGC > - Remove condition check !_do_old_gc_bootstrap && !heap->is_concurrent_old_mark_in_progress() from op_reset_after_collect > - Merge branch 'openjdk:master' into reset-bitmap > - Address review comments > - ... and 15 more: https://git.openjdk.org/jdk/compare/ad10cf10...7eea9556 Okay, these changes look good. We don't yet understand why we cannot reset young bitmaps during an old cycle, but we will follow up that investigation separately. ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22778#pullrequestreview-2651973439