From xpeng at openjdk.org Sat Mar 1 06:06:01 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Sat, 1 Mar 2025 06:06:01 GMT Subject: RFR: 8338737: Shenandoah: Reset marking bitmaps after the cycle [v9] In-Reply-To: References: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com> Message-ID: On Fri, 28 Feb 2025 00:08:22 GMT, Xiaolong Peng wrote: >> Reset marking bitmaps after collection cycle; for GenShen only do this for young generation, also choose not do this for Degen and full GC since both are running at safepoint, we should leave safepoint as ASAP. >> >> I have run same workload for 30s with Shenandoah in generational mode and classic mode, average average time of concurrent reset dropped significantly since in most case bitmap for young gen should have been reset after pervious concurrent cycle finishes if there is no need to preserve bitmap states. >> >> GenShen: >> Before: >> >> [33.342s][info][gc,stats ] Concurrent Reset = 0.023 s (a = 1921 us) (n = 12) (lvls, us = 133, 385, 1191, 1836, 8878) >> >> >> After: >> >> [33.597s][info][gc,stats ] Concurrent Reset = 0.004 s (a = 317 us) (n = 13) (lvls, us = 58, 119, 217, 410, 670) >> [33.597s][info][gc,stats ] Concurrent Reset After Collect = 0.018 s (a = 1365 us) (n = 13) (lvls, us = 91, 186, 818, 1836, 3872) >> >> >> Shenandoah: >> Before: >> >> [33.144s][info][gc,stats ] Concurrent Reset = 0.014 s (a = 1067 us) (n = 13) (lvls, us = 139, 277, 898, 1328, 2118) >> >> After: >> >> [33.128s][info][gc,stats ] Concurrent Reset = 0.003 s (a = 225 us) (n = 13) (lvls, us = 32, 92, 137, 295, 542) >> [33.128s][info][gc,stats ] Concurrent Reset After Collect = 0.009 s (a = 661 us) (n = 13) (lvls, us = 92, 160, 594, 896, 1661) >> >> >> Additional changes: >> * Remove `ShenandoahResetBitmapClosure` and `ShenandoahPrepareForMarkClosure`, merge the code with `ShenandoahResetBitmapClosure`, saving one iteration over all the regions. >> * Use API `ShenandoahGeneration::parallel_heap_region_iterate_free` to iterate the regions, two benefits from this: >> - Underneath it calls `ShenandoahHeap::parallel_heap_region_iterate`, which is faster for very light tasks, see https://bugs.openjdk.org/browse/JDK-8337154 >> - `ShenandoahGeneration::parallel_heap_region_iterate_free` decorate the closure with `ShenandoahExcludeRegionClosure`, which simplifies the code in closure. >> * When `_do_old_gc_bootstrap is true`, instead of reset mark bitmap for old gen separately, simply reset the global generations, so we don't need walk the all regions twice. >> * Clean up FullGC code, remove duplicate code. >> >> ... > > Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 25 additional commits since the last revision: > > - Merge branch 'openjdk:master' into reset-bitmap > - Merge branch 'openjdk:master' into reset-bitmap > - Merge branch 'openjdk:master' into reset-bitmap > - Merge branch 'openjdk:master' into reset-bitmap > - Merge branch 'openjdk:master' into reset-bitmap > - Adding condition "!_do_old_gc_bootstrap && !heap->is_concurrent_old_mark_in_progress()" back and address some PR comments > - Remove entry_reset_after_collect from ShenandoahOldGC > - Remove condition check !_do_old_gc_bootstrap && !heap->is_concurrent_old_mark_in_progress() from op_reset_after_collect > - Merge branch 'openjdk:master' into reset-bitmap > - Address review comments > - ... 
and 15 more: https://git.openjdk.org/jdk/compare/8e164a93...7eea9556 Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22778#issuecomment-2691984558 From duke at openjdk.org Sat Mar 1 06:06:01 2025 From: duke at openjdk.org (duke) Date: Sat, 1 Mar 2025 06:06:01 GMT Subject: RFR: 8338737: Shenandoah: Reset marking bitmaps after the cycle [v9] In-Reply-To: References: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com> Message-ID: On Fri, 28 Feb 2025 00:08:22 GMT, Xiaolong Peng wrote: >> Reset marking bitmaps after collection cycle; for GenShen only do this for young generation, also choose not do this for Degen and full GC since both are running at safepoint, we should leave safepoint as ASAP. >> >> I have run same workload for 30s with Shenandoah in generational mode and classic mode, average average time of concurrent reset dropped significantly since in most case bitmap for young gen should have been reset after pervious concurrent cycle finishes if there is no need to preserve bitmap states. >> >> GenShen: >> Before: >> >> [33.342s][info][gc,stats ] Concurrent Reset = 0.023 s (a = 1921 us) (n = 12) (lvls, us = 133, 385, 1191, 1836, 8878) >> >> >> After: >> >> [33.597s][info][gc,stats ] Concurrent Reset = 0.004 s (a = 317 us) (n = 13) (lvls, us = 58, 119, 217, 410, 670) >> [33.597s][info][gc,stats ] Concurrent Reset After Collect = 0.018 s (a = 1365 us) (n = 13) (lvls, us = 91, 186, 818, 1836, 3872) >> >> >> Shenandoah: >> Before: >> >> [33.144s][info][gc,stats ] Concurrent Reset = 0.014 s (a = 1067 us) (n = 13) (lvls, us = 139, 277, 898, 1328, 2118) >> >> After: >> >> [33.128s][info][gc,stats ] Concurrent Reset = 0.003 s (a = 225 us) (n = 13) (lvls, us = 32, 92, 137, 295, 542) >> [33.128s][info][gc,stats ] Concurrent Reset After Collect = 0.009 s (a = 661 us) (n = 13) (lvls, us = 92, 160, 594, 896, 1661) >> >> >> Additional changes: >> * Remove `ShenandoahResetBitmapClosure` and `ShenandoahPrepareForMarkClosure`, merge the code with `ShenandoahResetBitmapClosure`, saving one iteration over all the regions. >> * Use API `ShenandoahGeneration::parallel_heap_region_iterate_free` to iterate the regions, two benefits from this: >> - Underneath it calls `ShenandoahHeap::parallel_heap_region_iterate`, which is faster for very light tasks, see https://bugs.openjdk.org/browse/JDK-8337154 >> - `ShenandoahGeneration::parallel_heap_region_iterate_free` decorate the closure with `ShenandoahExcludeRegionClosure`, which simplifies the code in closure. >> * When `_do_old_gc_bootstrap is true`, instead of reset mark bitmap for old gen separately, simply reset the global generations, so we don't need walk the all regions twice. >> * Clean up FullGC code, remove duplicate code. >> >> ... > > Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 25 additional commits since the last revision: > > - Merge branch 'openjdk:master' into reset-bitmap > - Merge branch 'openjdk:master' into reset-bitmap > - Merge branch 'openjdk:master' into reset-bitmap > - Merge branch 'openjdk:master' into reset-bitmap > - Merge branch 'openjdk:master' into reset-bitmap > - Adding condition "!_do_old_gc_bootstrap && !heap->is_concurrent_old_mark_in_progress()" back and address some PR comments > - Remove entry_reset_after_collect from ShenandoahOldGC > - Remove condition check !_do_old_gc_bootstrap && !heap->is_concurrent_old_mark_in_progress() from op_reset_after_collect > - Merge branch 'openjdk:master' into reset-bitmap > - Address review comments > - ... and 15 more: https://git.openjdk.org/jdk/compare/8e164a93...7eea9556 @pengxiaolong Your change (at version 7eea95568115c3ceb976bf83559b4df1d2b490d4) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22778#issuecomment-2691985569 From tschatzl at openjdk.org Mon Mar 3 08:42:05 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Mar 2025 08:42:05 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. 
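For contrast with the pseudo code above, a rough sketch of the Parallel/Serial-style post-write barrier this change moves toward is shown below. This is an illustration only, not the patch's actual code: the card size, card values and table layout are assumptions, and the real barrier may keep some of the filters.

#include <cstdint>
#include <cstddef>

// Illustrative stand-ins; not HotSpot's real card table.
static const int     kCardShift = 9;         // 512-byte cards, as in HotSpot
static const uint8_t kDirtyCard = 0;         // assumed "dirty" encoding
static uint8_t       card_table[1u << 20];   // toy table covering a toy "heap"

// Reduced post-write barrier for "x.a = y": unconditionally dirty the card
// covering the written field. The filters, StoreLoad fence and dirty card
// queue bookkeeping from the pseudo code above are gone; refinement instead
// works against a card table that is swapped out wholesale.
static inline void post_write_barrier(const void* field_addr) {
  size_t card_index = reinterpret_cast<uintptr_t>(field_addr) >> kCardShift;
  card_table[card_index & ((1u << 20) - 1)] = kDirtyCard;  // mask keeps the toy index in range
}
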
> > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * fix comment (trailing whitespace) * another assert when snapshotting at a safepoint. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/d87935a0..810bf2d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=03-04 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Mon Mar 3 12:11:20 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Mar 2025 12:11:20 GMT Subject: RFR: 8350954: Fix repetitions of the word "the" in gc component comments Message-ID: Hi all, please review this trivial change that fixes "the the" repetitions (and in this case grammar/wording) in the gc related sources. If you think it's not worth fixing, I am okay with that and just retract the change. Testing: gha Thanks, Thomas ------------- Commit messages: - 8350954 Changes: https://git.openjdk.org/jdk/pull/23859/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23859&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350954 Stats: 6 lines in 4 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23859.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23859/head:pull/23859 PR: https://git.openjdk.org/jdk/pull/23859 From iwalulya at openjdk.org Mon Mar 3 12:46:56 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 3 Mar 2025 12:46:56 GMT Subject: RFR: 8350954: Fix repetitions of the word "the" in gc component comments In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 11:07:50 GMT, Thomas Schatzl wrote: > Hi all, > > please review this trivial change that fixes "the the" repetitions (and in this case grammar/wording) in the gc related sources. > > If you think it's not worth fixing, I am okay with that and just retract the change. > > Testing: gha > > Thanks, > Thomas Trivial! ------------- Marked as reviewed by iwalulya (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23859#pullrequestreview-2654034410 From ayang at openjdk.org Mon Mar 3 13:40:54 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 3 Mar 2025 13:40:54 GMT Subject: RFR: 8350954: Fix repetitions of the word "the" in gc component comments In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 11:07:50 GMT, Thomas Schatzl wrote: > Hi all, > > please review this trivial change that fixes "the the" repetitions (and in this case grammar/wording) in the gc related sources. > > If you think it's not worth fixing, I am okay with that and just retract the change. > > Testing: gha > > Thanks, > Thomas Marked as reviewed by ayang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/23859#pullrequestreview-2654154109 From tschatzl at openjdk.org Mon Mar 3 14:06:06 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Mar 2025 14:06:06 GMT Subject: Integrated: 8350954: Fix repetitions of the word "the" in gc component comments In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 11:07:50 GMT, Thomas Schatzl wrote: > Hi all, > > please review this trivial change that fixes "the the" repetitions (and in this case grammar/wording) in the gc related sources. > > If you think it's not worth fixing, I am okay with that and just retract the change. > > Testing: gha > > Thanks, > Thomas This pull request has now been integrated. Changeset: f47232ad Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/f47232ad7129e40bdc433525a66de2ca6657f211 Stats: 6 lines in 4 files changed: 0 ins; 0 del; 6 mod 8350954: Fix repetitions of the word "the" in gc component comments Reviewed-by: iwalulya, ayang ------------- PR: https://git.openjdk.org/jdk/pull/23859 From tschatzl at openjdk.org Mon Mar 3 14:06:05 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Mar 2025 14:06:05 GMT Subject: RFR: 8350954: Fix repetitions of the word "the" in gc component comments In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 13:38:14 GMT, Albert Mingkun Yang wrote: >> Hi all, >> >> please review this trivial change that fixes "the the" repetitions (and in this case grammar/wording) in the gc related sources. >> >> If you think it's not worth fixing, I am okay with that and just retract the change. >> >> Testing: gha >> >> Thanks, >> Thomas > > Marked as reviewed by ayang (Reviewer). Thanks @albertnetymk @walulyai for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/23859#issuecomment-2694491062 From amitkumar at openjdk.org Mon Mar 3 14:25:54 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 3 Mar 2025 14:25:54 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 08:42:05 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
>> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * fix comment (trailing whitespace) > * another assert when snapshotting at a safepoint. I don't see any failure on s390x. Tier1 test looks good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2694563382 From ayang at openjdk.org Mon Mar 3 15:22:10 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 3 Mar 2025 15:22:10 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 08:42:05 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
>> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * fix comment (trailing whitespace) > * another assert when snapshotting at a safepoint. src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 106: > 104: > 105: __ testptr(count, count); > 106: __ jcc(Assembler::equal, done); I wonder if we can use "zero" instead of "equal" here; they have the same underlying value, but the semantic is to checking for "zero". src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 133: > 131: Label is_clean_card; > 132: __ cmpb(Address(addr, 0), G1CardTable::clean_card_val()); > 133: __ jcc(Assembler::equal, is_clean_card); Should this checking be guarded by `if (UseCondCardMark)`? I see that aarch64 does that. src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 143: > 141: > 142: __ bind(is_clean_card); > 143: // Card was not clean. Dirty card and go to next.. Why "not clean"? I thought this path is for dirtying clean card? src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 323: > 321: assert(thread == r15_thread, "must be"); > 322: #endif // _LP64 > 323: assert_different_registers(store_addr, new_val, thread, tmp1 /*, tmp2 unused */, noreg); Seems that `tmp2` is unused in this method. It is used in aarch64, but it's not obvious to me whether that is indeed necessary. If so, can you add a comment saying sth like "this unused var is needed for other archs..."? src/hotspot/share/gc/g1/g1CardTable.inline.hpp line 54: > 52: // result = 0xBBAABBAA > 53: inline size_t blend(size_t a, size_t b, size_t mask) { > 54: return a ^ ((a ^ b) & mask); The example makes it much clearer; I wonder if `return (a & ~mask) | (b & mask);` is more readable. src/hotspot/share/gc/g1/g1CardTableClaimTable.cpp line 59: > 57: > 58: void G1CardTableClaimTable::reset_all_claims_to_claimed() { > 59: for (size_t i = 0; i < _max_reserved_regions; i++) { `uint` for `i`? src/hotspot/share/gc/g1/g1CardTableClaimTable.hpp line 64: > 62: void reset_all_claims_to_unclaimed(); > 63: void reset_all_claims_to_claimed(); > 64: I wonder if these two APIs can be renamed to "reset_all_to_x", which is more aligned with its single-region counterpart, `reset_to_unclaimed`, IMO. 
src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 348: > 346: void G1ConcurrentRefineWorkState::snapshot_heap_into(G1CardTableClaimTable* sweep_table) { > 347: // G1CollectedHeap::heap_region_iterate() below will only visit committed regions. Initialize > 348: // all entries in the state table here to not require special handling when iterating over it. Can you elaborate on what the "special handling" would be, if we don's set "claimed" for non-committed regions? src/hotspot/share/gc/g1/g1RemSet.cpp line 837: > 835: for (; refinement_cur_card < refinement_end_card; ++refinement_cur_card, ++card_cur_word) { > 836: size_t value = *refinement_cur_card; > 837: *refinement_cur_card = G1CardTable::WordAllClean; Similarly, this is a "word", not "card", also. src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 857: > 855: // We do not expect too many non-Java threads compared to Java threads, so just > 856: // let one worker claim that work. > 857: if (!_non_java_threads_claim && !Atomic::cmpxchg(&_non_java_threads_claim, false, true, memory_order_relaxed)) { Do non-java threads have card-table-base? src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 862: > 860: > 861: class ResizeAndSwapCardTableClosure : public ThreadClosure { > 862: SwapCardTableClosure _cl; Field indentation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977586579 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977594184 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977583002 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977601907 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977645576 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977571306 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977573354 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977704351 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977575441 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977701293 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977679688 From tschatzl at openjdk.org Mon Mar 3 15:40:04 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Mar 2025 15:40:04 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 14:11:09 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * fix comment (trailing whitespace) >> * another assert when snapshotting at a safepoint. > > src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 143: > >> 141: >> 142: __ bind(is_clean_card); >> 143: // Card was not clean. Dirty card and go to next.. > > Why "not clean"? I thought this path is for dirtying clean card? My interpretation is: in this path the card has been found clean ("is clean") earlier. So dirty it. 
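In C-like terms, the path being discussed boils down to the following sketch (illustrative card values, not the real G1CardTable constants):

#include <cstdint>

static const uint8_t kCleanCard = 0xFF;   // assumed "clean" encoding
static const uint8_t kDirtyCard = 0x00;   // assumed "dirty" encoding

// The card is tested once; if it is found clean we branch to the
// "is_clean_card" label equivalent and dirty it there, otherwise there is
// nothing to do for this card and we move on to the next one.
static inline void dirty_card_if_clean(volatile uint8_t* card) {
  if (*card == kCleanCard) {
    *card = kDirtyCard;   // card was clean until now -> dirty it
  }
}

Under this reading, the comment at the label describes the state on entry (the card was found clean), which is what the reply above clarifies.
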
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977733993 From tschatzl at openjdk.org Mon Mar 3 15:42:57 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Mar 2025 15:42:57 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 14:47:00 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * fix comment (trailing whitespace) >> * another assert when snapshotting at a safepoint. > > src/hotspot/share/gc/g1/g1CardTable.inline.hpp line 54: > >> 52: // result = 0xBBAABBAA >> 53: inline size_t blend(size_t a, size_t b, size_t mask) { >> 54: return a ^ ((a ^ b) & mask); > > The example makes it much clearer; I wonder if `return (a & ~mask) | (b & mask);` is more readable. ... and hope that the optimizer knows this pattern? If you insist I can do that, brief examination of that code snippet by itself (not within this code) showed that it does. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977739888 From tschatzl at openjdk.org Mon Mar 3 16:55:55 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Mar 2025 16:55:55 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 15:17:27 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * fix comment (trailing whitespace) >> * another assert when snapshotting at a safepoint. > > src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 857: > >> 855: // We do not expect too many non-Java threads compared to Java threads, so just >> 856: // let one worker claim that work. >> 857: if (!_non_java_threads_claim && !Atomic::cmpxchg(&_non_java_threads_claim, false, true, memory_order_relaxed)) { > > Do non-java threads have card-table-base? This code should not be necessary (any more). Will remove. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977853483 From xpeng at openjdk.org Mon Mar 3 17:24:02 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 3 Mar 2025 17:24:02 GMT Subject: Integrated: 8338737: Shenandoah: Reset marking bitmaps after the cycle In-Reply-To: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com> References: <6duTgo8vKHyCUnasOsrHp341B2krxcK8jNogKjX09gs=.af63669e-9c8d-4f17-b055-bf3a03a9618e@github.com> Message-ID: On Tue, 17 Dec 2024 00:09:25 GMT, Xiaolong Peng wrote: > Reset marking bitmaps after collection cycle; for GenShen only do this for young generation, also choose not do this for Degen and full GC since both are running at safepoint, we should leave safepoint as ASAP. > > I have run same workload for 30s with Shenandoah in generational mode and classic mode, average average time of concurrent reset dropped significantly since in most case bitmap for young gen should have been reset after pervious concurrent cycle finishes if there is no need to preserve bitmap states. 
> > GenShen: > Before: > > [33.342s][info][gc,stats ] Concurrent Reset = 0.023 s (a = 1921 us) (n = 12) (lvls, us = 133, 385, 1191, 1836, 8878) > > > After: > > [33.597s][info][gc,stats ] Concurrent Reset = 0.004 s (a = 317 us) (n = 13) (lvls, us = 58, 119, 217, 410, 670) > [33.597s][info][gc,stats ] Concurrent Reset After Collect = 0.018 s (a = 1365 us) (n = 13) (lvls, us = 91, 186, 818, 1836, 3872) > > > Shenandoah: > Before: > > [33.144s][info][gc,stats ] Concurrent Reset = 0.014 s (a = 1067 us) (n = 13) (lvls, us = 139, 277, 898, 1328, 2118) > > After: > > [33.128s][info][gc,stats ] Concurrent Reset = 0.003 s (a = 225 us) (n = 13) (lvls, us = 32, 92, 137, 295, 542) > [33.128s][info][gc,stats ] Concurrent Reset After Collect = 0.009 s (a = 661 us) (n = 13) (lvls, us = 92, 160, 594, 896, 1661) > > > Additional changes: > * Remove `ShenandoahResetBitmapClosure` and `ShenandoahPrepareForMarkClosure`, merge the code with `ShenandoahResetBitmapClosure`, saving one iteration over all the regions. > * Use API `ShenandoahGeneration::parallel_heap_region_iterate_free` to iterate the regions, two benefits from this: > - Underneath it calls `ShenandoahHeap::parallel_heap_region_iterate`, which is faster for very light tasks, see https://bugs.openjdk.org/browse/JDK-8337154 > - `ShenandoahGeneration::parallel_heap_region_iterate_free` decorate the closure with `ShenandoahExcludeRegionClosure`, which simplifies the code in closure. > * When `_do_old_gc_bootstrap is true`, instead of reset mark bitmap for old gen separately, simply reset the global generations, so we don't need walk the all regions twice. > * Clean up FullGC code, remove duplicate code. > > Additional tests: > - [x] CONF=macosx-aarch64-server-fastdebug make test T... This pull request has now been integrated. Changeset: 7c187b5d Author: Xiaolong Peng Committer: Paul Hohensee URL: https://git.openjdk.org/jdk/commit/7c187b5d81a653b87fc498101ad9e2d99b72efc6 Stats: 180 lines in 8 files changed: 95 ins; 62 del; 23 mod 8338737: Shenandoah: Reset marking bitmaps after the cycle Reviewed-by: wkemper ------------- PR: https://git.openjdk.org/jdk/pull/22778 From tschatzl at openjdk.org Mon Mar 3 18:22:24 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 3 Mar 2025 18:22:24 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v6] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. 
> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang review 2 * removal of useless code * renamings ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/810bf2d3..b3dd0084 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=04-05 Stats: 51 lines in 7 files changed: 16 ins; 10 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From wkemper at openjdk.org Mon Mar 3 18:24:52 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 3 Mar 2025 18:24:52 GMT Subject: RFR: 8350605: assert(!heap->is_uncommit_in_progress()) failed: Cannot uncommit bitmaps while resetting them [v3] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 17:44:36 GMT, William Kemper wrote: >> The protocol which is meant to prevent regions from being uncommitted while their bitmaps are being reset may fail. This happens when the control thread attempts to wait for the uncommit thread to finish, but the uncommit thread has not yet indicated that it has started. >> >> ## Testing >> GHA, Dacapo, Extremem, Heapothesys, Diluvian, SpecJBB2015, SpecJVM2008 (with and without stress flags, asserts). Also have run the JTREG test that failed this assertion over 10K times (and counting). > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Comment tweak Tests with uncommit behavior enabled look good. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23760#issuecomment-2695210222 From wkemper at openjdk.org Mon Mar 3 18:30:33 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 3 Mar 2025 18:30:33 GMT Subject: RFR: 8350898: Shenandoah: Eliminate final roots safepoint [v2] In-Reply-To: References: Message-ID: <5Lr95p3Uwv5w0n3YzDmALQc6KESs9xLnWdGm7p1IwGA=.3df358c6-f5d5-4f10-822d-5905429c050e@github.com> > This PR converts the final roots safepoint operation into a handshake. The safepoint operation still exists, but is only executed when `ShenandoahVerify` is enabled. In addition to this change, this PR also improves the logging for the concurrent preparation for update references from [PR 22688](https://github.com/openjdk/jdk/pull/22688). William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: - Merge remote-tracking branch 'jdk/master' into eliminate-final-roots - Fix comments - Add whitespace at end of file - More detail for init update refs event message - Use timing tracker for timing verification - Merge remote-tracking branch 'jdk/master' into eliminate-final-roots - WIP: Fix up phase timings for newly concurrent final roots and init update refs - WIP: Combine satb transfer with state propagation, restore phase timing data - WIP: Transfer pointers out of SATB with a handshake - WIP: Clear weak roots flag concurrently ------------- Changes: https://git.openjdk.org/jdk/pull/23830/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23830&range=01 Stats: 291 lines in 14 files changed: 194 ins; 47 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/23830.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23830/head:pull/23830 PR: https://git.openjdk.org/jdk/pull/23830 From xpeng at openjdk.org Mon Mar 3 20:16:32 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 3 Mar 2025 20:16:32 GMT Subject: RFR: 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect Message-ID: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> This is a trivial PR to update the code comments in ShenandoahConcurrentGC::op_reset_after_collect. After doing more testing and analysis, we have a better understanding of why resetting the bitmap of the young gen after a concurrent cycle may cause a crash if there is a pending old GC cycle to execute: when there is a soft reference in the old gen but the referent is in the young gen, resetting the bitmap of the young gen will leave the soft reference in a wrong state, which may lead to unexpected crashes.
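A minimal, self-contained sketch of the guard this describes is below; the type and flag names are hypothetical stand-ins, not Shenandoah's real classes or API.

#include <cstdio>

// Hypothetical stand-in state; not Shenandoah's real heap or generation API.
struct HeapStateSketch {
  bool old_mark_in_progress;    // an old-gen concurrent mark is running
  bool old_cycle_pending;       // an old/bootstrap cycle has been requested
  bool young_bitmap_valid;      // models "young mark bitmap still intact"
};

// Only reset the young mark bitmap eagerly when no old cycle can still need
// it: old-gen reference processing may consult the young bitmap to judge
// reachability of a young referent of an old-gen Reference object.
void maybe_reset_young_bitmap(HeapStateSketch& h) {
  if (h.old_mark_in_progress || h.old_cycle_pending) {
    return;                     // keep the young bitmap for the old cycle
  }
  h.young_bitmap_valid = false; // models resetting the bitmap after the cycle
}

int main() {
  HeapStateSketch h{false, true, true};
  maybe_reset_young_bitmap(h);
  std::printf("young bitmap kept: %s\n", h.young_bitmap_valid ? "yes" : "no"); // prints "yes"
  return 0;
}
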
------------- Commit messages: - 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect Changes: https://git.openjdk.org/jdk/pull/23872/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23872&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351077 Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23872/head:pull/23872 PR: https://git.openjdk.org/jdk/pull/23872 From iwalulya at openjdk.org Mon Mar 3 20:18:58 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 3 Mar 2025 20:18:58 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 08:42:05 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... 
> > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * fix comment (trailing whitespace) > * another assert when snapshotting at a safepoint. src/hotspot/share/gc/g1/g1CardTable.cpp line 44: > 42: if (!failures) { > 43: G1CollectedHeap* g1h = G1CollectedHeap::heap(); > 44: G1HeapRegion* r = g1h->heap_region_containing(mr.start()); Probably we can move this outside the loop, and assert that `mr` does not cross region boundaries src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 916: > 914: void safepoint_synchronize_end() override; > 915: > 916: jlong synchronized_duration() const { return _safepoint_duration; } safepoint_duration() seems easier to comprehend. src/hotspot/share/gc/g1/g1CollectionSet.cpp line 310: > 308: verify_young_cset_indices(); > 309: > 310: size_t card_rs_length = _policy->analytics()->predict_card_rs_length(in_young_only_phase); Why are we using a prediction here? Additionally, won't this prediction also include cards from the old gen regions in case of mixed gcs? How do we reconcile that when we are adding old gen regions to c-set? src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 42: > 40: class G1HeapRegion; > 41: class G1Policy; > 42: class G1CardTableClaimTable; Nit: ordering of the declarations src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 84: > 82: // Tracks the current refinement state from idle to completion (and reset back > 83: // to idle). > 84: class G1ConcurrentRefineWorkState { G1ConcurrentRefinementState? I am not convinced the "Work" adds any clarity src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 113: > 111: // Current epoch the work has been started; used to determine if there has been > 112: // a forced card table swap due to a garbage collection while doing work. > 113: size_t _refine_work_epoch; same as previous comment, why `refine_work` instead of `refinement`? src/hotspot/share/gc/g1/g1ConcurrentRefineStats.hpp line 43: > 41: size_t _cards_clean; // Number of cards found clean. > 42: size_t _cards_not_parsable; // Number of cards we could not parse and left unrefined. > 43: size_t _cards_still_refer_to_cset; // Number of cards marked still young. `_cards_still_refer_to_cset` from the naming it is not clear what the difference is with `_cards_refer_to_cset`, the comment is not helping with that ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977688778 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977969470 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977982999 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1977991124 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1978017843 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1978019093 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1978119476 From wkemper at openjdk.org Mon Mar 3 20:20:05 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 3 Mar 2025 20:20:05 GMT Subject: RFR: 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect In-Reply-To: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> References: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> Message-ID: On Mon, 3 Mar 2025 20:12:34 GMT, Xiaolong Peng wrote: > This is a trivial PR to update the code comments in ShenandoahConcurrentGC::op_reset_after_collect. 
> > After doing more test and analysis, we have a better understanding why reset bitmap of young gen after concurrent cycle may cause crash if there is pending old GC cycle to execute: When there is soft reference in old gen, but the referent is in young, reseting bitmap of young will cause wrong state of the soft reference, which may lead to expected cashes. Thanks for getting to the bottom of this. ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23872#pullrequestreview-2655199892 From xpeng at openjdk.org Mon Mar 3 20:30:59 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 3 Mar 2025 20:30:59 GMT Subject: RFR: 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect In-Reply-To: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> References: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> Message-ID: On Mon, 3 Mar 2025 20:12:34 GMT, Xiaolong Peng wrote: > This is a trivial PR to update the code comments in ShenandoahConcurrentGC::op_reset_after_collect. > > After doing more test and analysis, we have a better understanding why reset bitmap of young gen after concurrent cycle may cause crash if there is pending old GC cycle to execute: When there is soft reference in old gen, but the referent is in young, reseting bitmap of young will cause wrong state of the soft reference, which may lead to expected cashes. Thanks for the review, I'll integrate it since it is really a trivial only for code comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23872#issuecomment-2695462169 From duke at openjdk.org Mon Mar 3 20:31:00 2025 From: duke at openjdk.org (duke) Date: Mon, 3 Mar 2025 20:31:00 GMT Subject: RFR: 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect In-Reply-To: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> References: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> Message-ID: On Mon, 3 Mar 2025 20:12:34 GMT, Xiaolong Peng wrote: > This is a trivial PR to update the code comments in ShenandoahConcurrentGC::op_reset_after_collect. > > After doing more test and analysis, we have a better understanding why reset bitmap of young gen after concurrent cycle may cause crash if there is pending old GC cycle to execute: When there is soft reference in old gen, but the referent is in young, reseting bitmap of young will cause wrong state of the soft reference, which may lead to expected cashes. @pengxiaolong Your change (at version 3764bf7d41619a2b51bb860e7ae4005e7f8c0e37) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23872#issuecomment-2695464781 From ysr at openjdk.org Mon Mar 3 21:19:02 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 3 Mar 2025 21:19:02 GMT Subject: RFR: 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect In-Reply-To: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> References: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> Message-ID: On Mon, 3 Mar 2025 20:12:34 GMT, Xiaolong Peng wrote: > This is a trivial PR to update the code comments in ShenandoahConcurrentGC::op_reset_after_collect. 
> > After doing more test and analysis, we have a better understanding why reset bitmap of young gen after concurrent cycle may cause crash if there is pending old GC cycle to execute: When there is soft reference in old gen, but the referent is in young, reseting bitmap of young will cause wrong state of the soft reference, which may lead to expected cashes. src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1235: > 1233: // Valid bitmap of young generation is needed by concurrent weak references phase of old GC cycle, > 1234: // because it is possible that there is soft reference in old generation with the referent in young generation; > 1235: // therefore mark bitmap of young generation can't be reset if there will be old GC after the concurrent GC cycle. I don't understand the comment. If the soft reference in old gen points to its referent in the young gen, then the latter should be either reachable, or should have been cleared (depending on who discovered the soft reference & the soft reference clearing policy). If the former, the old gen card should be dirty. May be I am confused about the change in comment, but this may be pointing to a bug in the reference processing code or the associated card-marking code. Or I am not clearly understanding your comment in context. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23872#discussion_r1978221380 From ysr at openjdk.org Mon Mar 3 23:01:07 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 3 Mar 2025 23:01:07 GMT Subject: RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v18] In-Reply-To: References: Message-ID: <9rfQ1rnji3vwQIPlRGqVmh_PwZxLdvcYv-JuukdP7G0=.b4583678-800b-416a-a154-b878535189e4@github.com> On Fri, 28 Feb 2025 17:17:17 GMT, William Kemper wrote: >> There are several changes to the operation of Shenandoah's control threads here. >> * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. >> * The cancellation handling is driven entirely by the cancellation cause >> * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed >> * The shutdown sequence is simpler >> * The generational control thread uses a lock to coordinate updates to the requested cause and generation >> * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance >> * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles >> * The control thread doesn't loop on its own (unless the pacer is enabled). >> >> ## Testing >> * jtreg hotspot_gc_shenandoah >> * dacapo, extremem, diluvian, specjbb2015, specjvm2018, heapothesys > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 37 commits: > > - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads > - Don't check for shutdown in control thread loop condition > > It may cause the thread to exit before it is requested to stop > - Add assertions about old gen state when resuming old cycles > - Remove duplicated field pointer for old generation > - Improve names and comments > - Merge tag 'jdk-25+11' into fix-control-regulator-threads > > Added tag jdk-25+11 for changeset 0131c1bf > - Address review feedback (better comments, better names) > - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads > - Old gen bootstrap cycle must make it to init mark > - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads > - ... and 27 more: https://git.openjdk.org/jdk/compare/e98df71d...37e445d6 ? ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23475#pullrequestreview-2655535655 From ysr at openjdk.org Mon Mar 3 23:08:06 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 3 Mar 2025 23:08:06 GMT Subject: RFR: 8350605: assert(!heap->is_uncommit_in_progress()) failed: Cannot uncommit bitmaps while resetting them [v3] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 17:44:57 GMT, William Kemper wrote: > That's a good point. I created a branch that enables uncommit for the test pipelines when I made this original change. I'll resurrect that branch and run that configuration again. Thanks. Any reason not to have (a subset or all) non-performance testing in pipeline run with the default of uncommit enabled? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23760#issuecomment-2695768379 From ysr at openjdk.org Mon Mar 3 23:52:53 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 3 Mar 2025 23:52:53 GMT Subject: RFR: 8350605: assert(!heap->is_uncommit_in_progress()) failed: Cannot uncommit bitmaps while resetting them [v3] In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 17:44:36 GMT, William Kemper wrote: >> The protocol which is meant to prevent regions from being uncommitted while their bitmaps are being reset may fail. This happens when the control thread attempts to wait for the uncommit thread to finish, but the uncommit thread has not yet indicated that it has started. >> >> ## Testing >> GHA, Dacapo, Extremem, Heapothesys, Diluvian, SpecJBB2015, SpecJVM2008 (with and without stress flags, asserts). Also have run the JTREG test that failed this assertion over 10K times (and counting). > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Comment tweak ? Small documentation suggestion. No re-review needed. If available, please add to the ticket or to the PR the failing test name(s), and a suitable exemplar stack retrace of assertion violation. src/hotspot/share/gc/shenandoah/shenandoahUncommitThread.hpp line 65: > 63: // Iterate and uncommit eligible regions. Return the number of regions uncommitted. > 64: // This operation may be interrupted if the GC calls `forbid_uncommit`. > 65: size_t do_uncommit_work(double shrink_before, size_t shrink_until) const; I'd document the semantics of the parameters too: // Iterate over and uncommit eligible regions unless committed heap would fall below `shrink_until` . // A region is eligible if it's been empty for at least `shrink_before` // Returns the number of regions uncommitted. May be interrupted by `forbid_uncommit`. 
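Applied to the declaration, the suggested documentation might read as in the sketch below; the surrounding class is only a stand-in for context, not the real ShenandoahUncommitThread.

#include <cstddef>

class UncommitThreadSketch {
public:
  // Iterate over and uncommit eligible regions, unless committed heap would
  // fall below `shrink_until`. A region is eligible if it has been empty for
  // at least `shrink_before`. Returns the number of regions uncommitted.
  // May be interrupted by `forbid_uncommit`.
  size_t do_uncommit_work(double shrink_before, size_t shrink_until) const;
};
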
------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23760#pullrequestreview-2655587421 PR Review Comment: https://git.openjdk.org/jdk/pull/23760#discussion_r1978390429 From xpeng at openjdk.org Tue Mar 4 00:12:53 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 4 Mar 2025 00:12:53 GMT Subject: RFR: 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect In-Reply-To: References: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> Message-ID: On Tue, 4 Mar 2025 00:02:29 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1235: >> >>> 1233: // Valid bitmap of young generation is needed by concurrent weak references phase of old GC cycle, >>> 1234: // because it is possible that there is soft reference in old generation with the referent in young generation; >>> 1235: // therefore mark bitmap of young generation can't be reset if there will be old GC after the concurrent GC cycle. >> >> I don't understand the comment. If the soft reference in old gen points to its referent in the young gen, then the latter should be either reachable, or should have been cleared (depending on who discovered the soft reference & the soft reference clearing policy). If the former, the old gen card should be dirty. >> >> May be I am confused about the change in comment, but this may be pointing to a bug in the reference processing code or the associated card-marking code. >> >> Or I am not clearly understanding your comment in context. > > Thanks @earthling-amzn for explaining the issue to me offline. Based on my current understanding of the issue from that explanation, I'd suggest rewording the comment as follows: > > // If we are in the midst of an old gc bootstrap or an old marking, we want to leave the mark bit map of > // the young generation intact. In particular, reference processing in the old generation may potentially > // need the reachability of a young generation referent of a Reference object in the old generation. Thank you Ramki, I'll update the comments and refresh the PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23872#discussion_r1978411180 From ysr at openjdk.org Tue Mar 4 00:12:52 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 4 Mar 2025 00:12:52 GMT Subject: RFR: 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect In-Reply-To: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> References: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> Message-ID: On Mon, 3 Mar 2025 20:12:34 GMT, Xiaolong Peng wrote: > This is a trivial PR to update the code comments in ShenandoahConcurrentGC::op_reset_after_collect. > > After doing more test and analysis, we have a better understanding why reset bitmap of young gen after concurrent cycle may cause crash if there is pending old GC cycle to execute: When there is soft reference in old gen, but the referent is in young, reseting bitmap of young will cause wrong state of the soft reference, which may lead to expected cashes. ? small suggested rewording, although what you have also works. (I'll think some more about this to fully understand the context. Thanks.) ------------- Marked as reviewed by ysr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23872#pullrequestreview-2655609678 From ysr at openjdk.org Tue Mar 4 00:12:53 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 4 Mar 2025 00:12:53 GMT Subject: RFR: 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect In-Reply-To: References: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> Message-ID: On Mon, 3 Mar 2025 21:16:32 GMT, Y. Srinivas Ramakrishna wrote: >> This is a trivial PR to update the code comments in ShenandoahConcurrentGC::op_reset_after_collect. >> >> After doing more test and analysis, we have a better understanding why reset bitmap of young gen after concurrent cycle may cause crash if there is pending old GC cycle to execute: When there is soft reference in old gen, but the referent is in young, reseting bitmap of young will cause wrong state of the soft reference, which may lead to expected cashes. > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 1235: > >> 1233: // Valid bitmap of young generation is needed by concurrent weak references phase of old GC cycle, >> 1234: // because it is possible that there is soft reference in old generation with the referent in young generation; >> 1235: // therefore mark bitmap of young generation can't be reset if there will be old GC after the concurrent GC cycle. > > I don't understand the comment. If the soft reference in old gen points to its referent in the young gen, then the latter should be either reachable, or should have been cleared (depending on who discovered the soft reference & the soft reference clearing policy). If the former, the old gen card should be dirty. > > May be I am confused about the change in comment, but this may be pointing to a bug in the reference processing code or the associated card-marking code. > > Or I am not clearly understanding your comment in context. Thanks @earthling-amzn for explaining the issue to me offline. Based on my current understanding of the issue from that explanation, I'd suggest rewording the comment as follows: // If we are in the midst of an old gc bootstrap or an old marking, we want to leave the mark bit map of // the young generation intact. In particular, reference processing in the old generation may potentially // need the reachability of a young generation referent of a Reference object in the old generation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23872#discussion_r1978405645 From wkemper at openjdk.org Tue Mar 4 00:44:58 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 4 Mar 2025 00:44:58 GMT Subject: Integrated: 8349094: GenShen: Race between control and regulator threads may violate assertions In-Reply-To: References: Message-ID: On Wed, 5 Feb 2025 22:30:35 GMT, William Kemper wrote: > There are several changes to the operation of Shenandoah's control threads here. > * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread. 
> * The cancellation handling is driven entirely by the cancellation cause > * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed > * The shutdown sequence is simpler > * The generational control thread uses a lock to coordinate updates to the requested cause and generation > * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance > * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles > * The control thread doesn't loop on its own (unless the pacer is enabled). > > ## Testing > * jtreg hotspot_gc_shenandoah > * dacapo, extremem, diluvian, specjbb2015, specjvm2018, heapothesys This pull request has now been integrated. Changeset: 3a8a432c Author: William Kemper URL: https://git.openjdk.org/jdk/commit/3a8a432c05999fe478b94de75b416404b5a515d2 Stats: 963 lines in 18 files changed: 327 ins; 294 del; 342 mod 8349094: GenShen: Race between control and regulator threads may violate assertions Reviewed-by: ysr, kdnilsen ------------- PR: https://git.openjdk.org/jdk/pull/23475 From wkemper at openjdk.org Tue Mar 4 00:57:06 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 4 Mar 2025 00:57:06 GMT Subject: RFR: 8350605: assert(!heap->is_uncommit_in_progress()) failed: Cannot uncommit bitmaps while resetting them [v4] In-Reply-To: References: Message-ID: > The protocol which is meant to prevent regions from being uncommitted while their bitmaps are being reset may fail. This happens when the control thread attempts to wait for the uncommit thread to finish, but the uncommit thread has not yet indicated that it has started. > > ## Testing > GHA, Dacapo, Extremem, Heapothesys, Diluvian, SpecJBB2015, SpecJVM2008 (with and without stress flags, asserts). Also have run the JTREG test that failed this assertion over 10K times (and counting). William Kemper has updated the pull request incrementally with one additional commit since the last revision: Document parameters for do_uncommit_work ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23760/files - new: https://git.openjdk.org/jdk/pull/23760/files/1c32c0e3..e25e6276 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23760&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23760&range=02-03 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23760.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23760/head:pull/23760 PR: https://git.openjdk.org/jdk/pull/23760 From wkemper at openjdk.org Tue Mar 4 00:57:06 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 4 Mar 2025 00:57:06 GMT Subject: RFR: 8350605: assert(!heap->is_uncommit_in_progress()) failed: Cannot uncommit bitmaps while resetting them [v3] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 23:05:45 GMT, Y. Srinivas Ramakrishna wrote: >> That's a good point. I created a branch that enables uncommit for the test pipelines when I made this original change. I'll resurrect that branch and run that configuration again. Thanks. > >> That's a good point. I created a branch that enables uncommit for the test pipelines when I made this original change. I'll resurrect that branch and run that configuration again. Thanks. > > Any reason not to have (a subset or all) non-performance testing in pipeline run with the default of uncommit enabled? @ysramakrishna , I will enable uncommit for the stress tests. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23760#issuecomment-2695908894 From wkemper at openjdk.org Tue Mar 4 00:57:06 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 4 Mar 2025 00:57:06 GMT Subject: RFR: 8350605: assert(!heap->is_uncommit_in_progress()) failed: Cannot uncommit bitmaps while resetting them [v3] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 23:40:25 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Comment tweak > > src/hotspot/share/gc/shenandoah/shenandoahUncommitThread.hpp line 65: > >> 63: // Iterate and uncommit eligible regions. Return the number of regions uncommitted. >> 64: // This operation may be interrupted if the GC calls `forbid_uncommit`. >> 65: size_t do_uncommit_work(double shrink_before, size_t shrink_until) const; > > I'd document the semantics of the parameters too: > > // Iterate over and uncommit eligible regions unless committed heap would fall below `shrink_until` . > // A region is eligible if it's been empty for at least `shrink_before` > // Returns the number of regions uncommitted. May be interrupted by `forbid_uncommit`. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23760#discussion_r1978440214 From xpeng at openjdk.org Tue Mar 4 00:58:27 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 4 Mar 2025 00:58:27 GMT Subject: RFR: 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect [v2] In-Reply-To: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> References: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> Message-ID: <6GKNvjF02TlZU_UZMNtWnzbs_BIRVf2x1UeiDIFg4hU=.160089d2-5601-4fc4-9d77-2fb6aa09d18b@github.com> > This is a trivial PR to update the code comments in ShenandoahConcurrentGC::op_reset_after_collect. > > After doing more test and analysis, we have a better understanding why reset bitmap of young gen after concurrent cycle may cause crash if there is pending old GC cycle to execute: When there is soft reference in old gen, but the referent is in young, reseting bitmap of young will cause wrong state of the soft reference, which may lead to expected cashes. Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Update code comments as suggested in PR ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23872/files - new: https://git.openjdk.org/jdk/pull/23872/files/3764bf7d..d760471e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23872&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23872&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23872/head:pull/23872 PR: https://git.openjdk.org/jdk/pull/23872 From xpeng at openjdk.org Tue Mar 4 00:58:27 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 4 Mar 2025 00:58:27 GMT Subject: RFR: 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect [v2] In-Reply-To: References: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> Message-ID: On Tue, 4 Mar 2025 00:08:27 GMT, Y. 
Srinivas Ramakrishna wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Update code comments as suggested in PR > > ? > > small suggested rewording, although what you have also works. > > (I'll think some more about this to fully understand the context. Thanks.) Thank you @ysramakrishna and @earthling-amzn! I have updated the comments as you have suggested in the PR review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23872#issuecomment-2695910096 From wkemper at openjdk.org Tue Mar 4 01:08:54 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 4 Mar 2025 01:08:54 GMT Subject: RFR: 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect [v2] In-Reply-To: <6GKNvjF02TlZU_UZMNtWnzbs_BIRVf2x1UeiDIFg4hU=.160089d2-5601-4fc4-9d77-2fb6aa09d18b@github.com> References: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> <6GKNvjF02TlZU_UZMNtWnzbs_BIRVf2x1UeiDIFg4hU=.160089d2-5601-4fc4-9d77-2fb6aa09d18b@github.com> Message-ID: On Tue, 4 Mar 2025 00:58:27 GMT, Xiaolong Peng wrote: >> This is a trivial PR to update the code comments in ShenandoahConcurrentGC::op_reset_after_collect. >> >> After doing more test and analysis, we have a better understanding why reset bitmap of young gen after concurrent cycle may cause crash if there is pending old GC cycle to execute: When there is soft reference in old gen, but the referent is in young, reseting bitmap of young will cause wrong state of the soft reference, which may lead to expected cashes. > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Update code comments as suggested in PR Marked as reviewed by wkemper (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23872#pullrequestreview-2655679787 From duke at openjdk.org Tue Mar 4 01:19:52 2025 From: duke at openjdk.org (duke) Date: Tue, 4 Mar 2025 01:19:52 GMT Subject: RFR: 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect [v2] In-Reply-To: <6GKNvjF02TlZU_UZMNtWnzbs_BIRVf2x1UeiDIFg4hU=.160089d2-5601-4fc4-9d77-2fb6aa09d18b@github.com> References: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> <6GKNvjF02TlZU_UZMNtWnzbs_BIRVf2x1UeiDIFg4hU=.160089d2-5601-4fc4-9d77-2fb6aa09d18b@github.com> Message-ID: On Tue, 4 Mar 2025 00:58:27 GMT, Xiaolong Peng wrote: >> This is a trivial PR to update the code comments in ShenandoahConcurrentGC::op_reset_after_collect. >> >> After doing more test and analysis, we have a better understanding why reset bitmap of young gen after concurrent cycle may cause crash if there is pending old GC cycle to execute: When there is soft reference in old gen, but the referent is in young, reseting bitmap of young will cause wrong state of the soft reference, which may lead to expected cashes. > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Update code comments as suggested in PR @pengxiaolong Your change (at version d760471e5a84bc45466ba2d676f97a0efcb477db) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23872#issuecomment-2695934719 From ysr at openjdk.org Tue Mar 4 02:13:54 2025 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Tue, 4 Mar 2025 02:13:54 GMT Subject: RFR: 8350605: assert(!heap->is_uncommit_in_progress()) failed: Cannot uncommit bitmaps while resetting them [v4] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 00:57:06 GMT, William Kemper wrote: >> The protocol which is meant to prevent regions from being uncommitted while their bitmaps are being reset may fail. This happens when the control thread attempts to wait for the uncommit thread to finish, but the uncommit thread has not yet indicated that it has started. >> >> ## Testing >> GHA, Dacapo, Extremem, Heapothesys, Diluvian, SpecJBB2015, SpecJVM2008 (with and without stress flags, asserts). Also have run the JTREG test that failed this assertion over 10K times (and counting). > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Document parameters for do_uncommit_work Marked as reviewed by ysr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23760#pullrequestreview-2655747950 From xpeng at openjdk.org Tue Mar 4 03:58:56 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 4 Mar 2025 03:58:56 GMT Subject: Integrated: 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect In-Reply-To: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> References: <1drXUZ5QM7_IPvLi3eRBKVx14M0ofow8KF0XlnzaJzY=.b37d216f-4c68-4427-ab2d-f591bf00d18f@github.com> Message-ID: On Mon, 3 Mar 2025 20:12:34 GMT, Xiaolong Peng wrote: > This is a trivial PR to update the code comments in ShenandoahConcurrentGC::op_reset_after_collect. > > After doing more test and analysis, we have a better understanding why reset bitmap of young gen after concurrent cycle may cause crash if there is pending old GC cycle to execute: When there is soft reference in old gen, but the referent is in young, reseting bitmap of young will cause wrong state of the soft reference, which may lead to expected cashes. This pull request has now been integrated. Changeset: 7c173fde Author: Xiaolong Peng URL: https://git.openjdk.org/jdk/commit/7c173fde4274a798f299876492a2cd833eee9fdd Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod 8351077: Shenandoah: Update comments in ShenandoahConcurrentGC::op_reset_after_collect Reviewed-by: wkemper, ysr ------------- PR: https://git.openjdk.org/jdk/pull/23872 From cslucas at openjdk.org Tue Mar 4 04:10:25 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 4 Mar 2025 04:10:25 GMT Subject: RFR: 8351081: Off-by-one error in ShenandoahCardCluster Message-ID: Given certain values for the variables in [this expression](https://github.com/openjdk/jdk/blob/a87dd1a75f78cf872df49bea83ba48af8acfa2fd/src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp#L173) the result of the computation can be equal to `_ rs->total_cards()` which will lead to segmentation fault, for instance in [starts_object(card_at_end)](https://github.com/openjdk/jdk/blob/a87dd1a75f78cf872df49bea83ba48af8acfa2fd/src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp#L393). The problem happens, though, because the `_object_starts` array doesn't have a [guarding entry](https://github.com/openjdk/jdk/blob/a87dd1a75f78cf872df49bea83ba48af8acfa2fd/src/hotspot/share/gc/shenandoah/shenandoahCardTable.cpp#L37) at the end. This pull request adjusts the allocation of `_object_starts` to include an additional entry at the end to account for this situation. 
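For readers not familiar with the card table layout, the pattern described above (an index computation that may legitimately land one past the last card, made safe by a trailing guard entry) can be shown with a minimal standalone sketch; this is not the actual ShenandoahCardCluster code:

#include <cstddef>
#include <vector>

struct ObjectStartsSketch {
  std::vector<unsigned char> object_starts;

  // Allocate one extra guard entry so that an index equal to total_cards,
  // which boundary computations may produce, stays within bounds.
  explicit ObjectStartsSketch(size_t total_cards)
    : object_starts(total_cards + 1, 0) {}

  bool starts_object(size_t card_index) const {
    return object_starts[card_index] != 0;   // safe even for card_index == total_cards
  }
};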
Tested with JTREG tier 1-4, x86_64 & AArch64 on Linux. ------------- Commit messages: - Adjust allocation of object_starts Changes: https://git.openjdk.org/jdk/pull/23882/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23882&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351081 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23882.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23882/head:pull/23882 PR: https://git.openjdk.org/jdk/pull/23882 From aboldtch at openjdk.org Tue Mar 4 07:33:03 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 4 Mar 2025 07:33:03 GMT Subject: RFR: 8350851: ZGC: Reduce size of ZAddressOffsetMax scaling data structures In-Reply-To: References: Message-ID: <583w99DOSzrZgnaBNEERDKgdqOes5xa-oQYWtfEIFaA=.a7ce3ef1-486d-4150-932e-fd5d2743a634@github.com> On Thu, 27 Feb 2025 12:45:36 GMT, Axel Boldt-Christmas wrote: > ZAddressOffsetMax is used to scale a few of our BitMap and GranuleMap data structures. ZAddressOffsetMax is initialised to an upper limit, prior to reserving the virtual address space for the heap. After the reservation, the largest address offset that can be encountered may be much lower. > > I propose we scale ZAddressOffsetMax down after our heap reservation is complete, to the actual max value an zoffset_end is allowed to be. > > Doing this gives us two benefits. Firstly the assertions and type checks will be stricter, and will exercise code paths that otherwise only occur when using a 16TB heap. Secondly we can reduce the size of the data structures which scale with ZAddressOffsetMax. (For most OSs the extra memory of these data structures do not really matter as they are not page'd in. But they are accounted for both on the OS, allocator and NMT layers). > > The page table, uses ZIndexDistributor to iterate and distribute indices. The different strategies have different requirements on the alignment of the size of the range it distribute across. My proposed implementation simply aligns up the page table size to this alignment requirement. As it is the least intrusive change, at the cost of some larger data structure than strictly required. The alternative would be to extend ZIndexDistributor with support for any alignment on the range, or condition the use of the distributed indices based on if they are less than the size. > > The data structures can also be larger than required if we fail to reserve the heap starting at our heap base. However this is a very rare occurrence, and while it would be nice to extend our asserts to check for a "ZAddressOffsetMin", I'll leave that for a future enhancement. > > Testing: > * ZGC specific tasks, tier 1 through tier 8 on Oracle Supported platforms > * with `ZIndexDistributorStrategy=0`, and > * with `ZIndexDistributorStrategy=1` > * GHA Thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/23822#issuecomment-2696468239 From aboldtch at openjdk.org Tue Mar 4 07:33:03 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 4 Mar 2025 07:33:03 GMT Subject: Integrated: 8350851: ZGC: Reduce size of ZAddressOffsetMax scaling data structures In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 12:45:36 GMT, Axel Boldt-Christmas wrote: > ZAddressOffsetMax is used to scale a few of our BitMap and GranuleMap data structures. ZAddressOffsetMax is initialised to an upper limit, prior to reserving the virtual address space for the heap. 
After the reservation, the largest address offset that can be encountered may be much lower. > > I propose we scale ZAddressOffsetMax down after our heap reservation is complete, to the actual max value an zoffset_end is allowed to be. > > Doing this gives us two benefits. Firstly the assertions and type checks will be stricter, and will exercise code paths that otherwise only occur when using a 16TB heap. Secondly we can reduce the size of the data structures which scale with ZAddressOffsetMax. (For most OSs the extra memory of these data structures do not really matter as they are not page'd in. But they are accounted for both on the OS, allocator and NMT layers). > > The page table, uses ZIndexDistributor to iterate and distribute indices. The different strategies have different requirements on the alignment of the size of the range it distribute across. My proposed implementation simply aligns up the page table size to this alignment requirement. As it is the least intrusive change, at the cost of some larger data structure than strictly required. The alternative would be to extend ZIndexDistributor with support for any alignment on the range, or condition the use of the distributed indices based on if they are less than the size. > > The data structures can also be larger than required if we fail to reserve the heap starting at our heap base. However this is a very rare occurrence, and while it would be nice to extend our asserts to check for a "ZAddressOffsetMin", I'll leave that for a future enhancement. > > Testing: > * ZGC specific tasks, tier 1 through tier 8 on Oracle Supported platforms > * with `ZIndexDistributorStrategy=0`, and > * with `ZIndexDistributorStrategy=1` > * GHA This pull request has now been integrated. Changeset: 1f10ffba Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/1f10ffba88119caab169b1fc43ccfd143e3b85a6 Stats: 82 lines in 10 files changed: 73 ins; 1 del; 8 mod 8350851: ZGC: Reduce size of ZAddressOffsetMax scaling data structures Reviewed-by: eosterlund, jsikstro ------------- PR: https://git.openjdk.org/jdk/pull/23822 From aboldtch at openjdk.org Tue Mar 4 07:54:05 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 4 Mar 2025 07:54:05 GMT Subject: RFR: 8333578: Fix uses of overaligned types induced by ZCACHE_ALIGNED Message-ID: The only directly heap allocated, constructed object of types that are overaligned because of ZCACHE_ALIGNED is ZCollectedHeap. The other are either in static storage or contained in (and constructed as part of) ZCollectedHeap. So we only need to fix ZCollectedHeap allocation. As the CollectedHeap is only ever created once and is never destroyed, we can simply align the allocation and create an unfreeable pointer. This implementation imposes that `ZCacheLineSize` is a power of two, but we already have this requirement elsewhere (e.g. `ZContendedStorage`). 
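For context, the general technique of heap-allocating a single over-aligned, never-destroyed object looks roughly like the sketch below (plain C++, hypothetical names, not the actual ZCollectedHeap change); it relies on the alignment being a power of two, which is the requirement mentioned above:

#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Return storage of `size` bytes aligned to `alignment` (a power of two). The block is
// intentionally never freed, which is acceptable for a once-per-process singleton.
void* alloc_aligned_unfreeable(size_t size, size_t alignment) {
  void* raw = ::malloc(size + alignment - 1);          // over-allocate so an aligned address fits
  if (raw == nullptr) {
    return nullptr;
  }
  uintptr_t bits    = reinterpret_cast<uintptr_t>(raw);
  uintptr_t aligned = (bits + alignment - 1) & ~(uintptr_t)(alignment - 1);  // align up
  return reinterpret_cast<void*>(aligned);
}

struct alignas(128) OveralignedSingleton {             // 128 stands in for a cache-line multiple
  int data;
};

OveralignedSingleton* create_singleton() {
  void* mem = alloc_aligned_unfreeable(sizeof(OveralignedSingleton), alignof(OveralignedSingleton));
  return mem == nullptr ? nullptr : new (mem) OveralignedSingleton{0};   // placement new
}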
Testing: * tier 1 through tier 5 Oracle supported platforms * GHA ------------- Commit messages: - 8333578: Fix uses of overaligned types induced by ZCACHE_ALIGNED Changes: https://git.openjdk.org/jdk/pull/23885/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23885&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333578 Stats: 16 lines in 1 file changed: 14 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23885.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23885/head:pull/23885 PR: https://git.openjdk.org/jdk/pull/23885 From tschatzl at openjdk.org Tue Mar 4 08:24:54 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 08:24:54 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 20:02:16 GMT, Ivan Walulya wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * fix comment (trailing whitespace) >> * another assert when snapshotting at a safepoint. > > src/hotspot/share/gc/g1/g1ConcurrentRefineStats.hpp line 43: > >> 41: size_t _cards_clean; // Number of cards found clean. >> 42: size_t _cards_not_parsable; // Number of cards we could not parse and left unrefined. >> 43: size_t _cards_still_refer_to_cset; // Number of cards marked still young. > > `_cards_still_refer_to_cset` from the naming it is not clear what the difference is with `_cards_refer_to_cset`, the comment is not helping with that `cards_still_refer_to_cset` refers to cards that were found to have already been marked as `to-collection-set`. Renamed to `_cards_already_refer_to_cset`, would that be okay? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1978868225 From tschatzl at openjdk.org Tue Mar 4 08:28:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 08:28:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 18:28:48 GMT, Ivan Walulya wrote: > Why are we using a prediction here? Quickly checking again, do we have the actual count here from somewhere? > Additionally, won't this prediction also include cards from the old gen regions in case of mixed gcs? How do we reconcile that when we are adding old gen regions to c-set? The predictor contents changed to (supposedly) only contain cards containing young gen references. See g1Policy.cpp:934: _analytics->report_card_rs_length(total_cards_scanned - total_non_young_rs_cards, is_young_only_pause); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1978876199 From tschatzl at openjdk.org Tue Mar 4 08:36:55 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 08:36:55 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 15:19:20 GMT, Albert Mingkun Yang wrote: > Can you elaborate on what the "special handling" would be, if we don's set "claimed" for non-committed regions? the iteration code, would for every region check whether the region is actually committed or not. The `heap_region_iterate()` API of `G1CollectedHeap` only iterates over committed regions. So only committed regions will be updated in the state table. 
Later when iterating over the state table, the code uses the array directly, i.e. the claim state of uncommitted regions would be read as uninitialized. Further, it would be hard to exclude regions committed after the snapshot otherwise (we do not need to iterate over them. Their card table can't contain card marks) as we do not track newly committed regions in the snapshot. We could do, but would be a headache due to memory synchronization because regions can be committed any time. Imho it is much simpler to reset all the card claims to "already processed" and then make the regions we want to work on claimable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1978893134 From aboldtch at openjdk.org Tue Mar 4 08:39:03 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 4 Mar 2025 08:39:03 GMT Subject: RFR: 8351137: ZGC: Improve ZValueStorage alignment support Message-ID: ZValueStorage only align the allocations to the alignment defined by the storage but ignores the alignment of the types. Right now all usages of our different storages all have types which have an alignment less than or equal to the alignment set by its storage. I wish to improve this so that types with greater alignment than the storage alignment can be used. The UB caused by using a type larger than the storage alignment is something I have seen materialise as returning bad address (and crashing) on Windows. As we use `utilities/align.hpp` for our alignment utilities we only support power of two alignment, I added extra asserts here because we use the fact that `lcm(x, y) = max(x, y)` if both are powers of two. Testing: * tier 1 through tier 5 Oracle supported platforms * GHA ------------- Commit messages: - 8351137: ZGC: Improve ZValueStorage alignment support Changes: https://git.openjdk.org/jdk/pull/23887/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23887&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351137 Stats: 21 lines in 2 files changed: 14 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23887.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23887/head:pull/23887 PR: https://git.openjdk.org/jdk/pull/23887 From tschatzl at openjdk.org Tue Mar 4 08:39:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 08:39:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 08:22:03 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1ConcurrentRefineStats.hpp line 43: >> >>> 41: size_t _cards_clean; // Number of cards found clean. >>> 42: size_t _cards_not_parsable; // Number of cards we could not parse and left unrefined. >>> 43: size_t _cards_still_refer_to_cset; // Number of cards marked still young. >> >> `_cards_still_refer_to_cset` from the naming it is not clear what the difference is with `_cards_refer_to_cset`, the comment is not helping with that > > `cards_still_refer_to_cset` refers to cards that were found to have already been marked as `to-collection-set`. Renamed to `_cards_already_refer_to_cset`, would that be okay? Fwiw, this is just for statistics, so if you want I can remove these. I did some experiments with re-examining these cards too to see whether we could clear them later. For determining if/when to do that a rate of increase for the young cards has been interesting. As mentioned, if you want I can remove them. 
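As a quick aside on the `lcm(x, y) = max(x, y)` fact used in the ZValueStorage RFR (8351137) quoted earlier: for powers of two 2^a and 2^b the least common multiple is 2^max(a,b), which is simply the larger of the two. A standalone compile-time check (not ZGC code):

#include <numeric>   // std::lcm, C++17

static_assert(std::lcm(8u, 32u) == 32u,  "lcm of two powers of two is their max");
static_assert(std::lcm(64u, 64u) == 64u, "equal power-of-two alignments are unchanged");
static_assert(std::lcm(6u, 8u) == 24u,   "for non-powers of two, lcm can exceed the max");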
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1978896272 From tschatzl at openjdk.org Tue Mar 4 08:53:46 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 08:53:46 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v7] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... 
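For comparison with the barrier pseudo code quoted above, a Parallel-GC-style post-write barrier essentially reduces to a single unconditional card mark. The sketch below is illustrative only; the exact set of filters the new G1 barrier keeps is not spelled out in this excerpt, and the biased card table base and 512-byte card size are the usual HotSpot conventions:

#include <cstdint>

// Post-write barrier for "x.a = y" in the Parallel/Serial style: dirty the card that
// covers the updated field, with no queueing and no StoreLoad fence.
// `card_table_biased_base` is assumed to already be biased by the heap start so that
// indexing with (address >> card_shift) yields the right card.
inline void post_write_barrier(uint8_t* card_table_biased_base, void* field_addr) {
  const int card_shift = 9;                                   // 512-byte cards by default
  const uint8_t dirty_card = 0;                               // HotSpot uses 0 for "dirty"
  card_table_biased_base[reinterpret_cast<uintptr_t>(field_addr) >> card_shift] = dirty_card;
}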
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * iwalulya initial comments * renaming * made blend() helper function more clear; at least gcc will optimize it to the same code as before ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/b3dd0084..8f46dc9a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=05-06 Stats: 27 lines in 9 files changed: 7 ins; 3 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Tue Mar 4 09:15:24 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 09:15:24 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v8] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. 
Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * do not change card table base for gc threads during swapping * not necessary because they do not use it * (recent assert that verifies that non-java threads do not have a card table found this) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/8f46dc9a..9e2ee543 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=06-07 Stats: 25 lines in 1 file changed: 9 ins; 14 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From iwalulya at openjdk.org Tue Mar 4 09:38:58 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 4 Mar 2025 09:38:58 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 08:36:58 GMT, Thomas Schatzl wrote: >> `cards_still_refer_to_cset` refers to cards that were found to have already been marked as `to-collection-set`. Renamed to `_cards_already_refer_to_cset`, would that be okay? > > Fwiw, this particular counter is just for statistics, so if you want I can remove these. I did some experiments with re-examining these cards too to see whether we could clear them later. For determining if/when to do that a rate of increase for the young cards has been interesting. > > As mentioned, if you want I can remove them. `_cards_already_refer_to_cset` is fine by me, i don't like the option of removing them ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979009507 From iwalulya at openjdk.org Tue Mar 4 09:43:54 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 4 Mar 2025 09:43:54 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 08:26:10 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1CollectionSet.cpp line 310: >> >>> 308: verify_young_cset_indices(); >>> 309: >>> 310: size_t card_rs_length = _policy->analytics()->predict_card_rs_length(in_young_only_phase); >> >> Why are we using a prediction here? Additionally, won't this prediction also include cards from the old gen regions in case of mixed gcs? How do we reconcile that when we are adding old gen regions to c-set? > >> Why are we using a prediction here? > > Quickly checking again, do we have the actual count here from somewhere? > >> Additionally, won't this prediction also include cards from the old gen regions in case of mixed gcs? How do we reconcile that when we are adding old gen regions to c-set? > > The predictor contents changed to (supposedly) only contain cards containing young gen references. See g1Policy.cpp:934: > > _analytics->report_card_rs_length(total_cards_scanned - total_non_young_rs_cards, is_young_only_pause); Fair, I missed that details on young RS have been removed. 
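For anyone following the prediction discussion above: each young-only pause reports the number of scanned cards that referred to young regions, and an average over those samples is what `predict_card_rs_length()` hands back when the next collection set is built. A generic sketch of such a predictor (hypothetical, not the actual G1Analytics code):

// Exponentially decaying average: recent pauses carry more weight than old ones.
class DecayingAverage {
  double _avg = 0.0;
  bool   _has_sample = false;
  double _alpha;                     // weight given to the newest sample
public:
  explicit DecayingAverage(double alpha) : _alpha(alpha) {}
  void add(double sample) {
    _avg = _has_sample ? _alpha * sample + (1.0 - _alpha) * _avg : sample;
    _has_sample = true;
  }
  double predict() const { return _avg; }
};

// Usage after a young-only pause, mirroring the call quoted above:
//   card_rs_length_seq.add(total_cards_scanned - total_non_young_rs_cards);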
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979022900 From tschatzl at openjdk.org Tue Mar 4 09:57:58 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 09:57:58 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: On Mon, 3 Mar 2025 18:50:37 GMT, Ivan Walulya wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * fix comment (trailing whitespace) >> * another assert when snapshotting at a safepoint. > > src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 84: > >> 82: // Tracks the current refinement state from idle to completion (and reset back >> 83: // to idle). >> 84: class G1ConcurrentRefineWorkState { > > G1ConcurrentRefinementState? I am not convinced the "Work" adds any clarity We agreed on `G1ConcurrentRefineSweepState` for now, better suggestions welcome. Use `Refine` instead of `Refinement` since all pre-existing classes also use `Refine`. This could be renamed in an extra change. > src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 113: > >> 111: // Current epoch the work has been started; used to determine if there has been >> 112: // a forced card table swap due to a garbage collection while doing work. >> 113: size_t _refine_work_epoch; > > same as previous comment, why `refine_work` instead of `refinement`? Already renamed, same as previous comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979050867 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979051649 From tschatzl at openjdk.org Tue Mar 4 09:57:58 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 09:57:58 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v5] In-Reply-To: References: Message-ID: <3BAl6ELdTMEhWoovthkw7lq86mwuoUnyKxzCANFnwNc=.41077bf4-8073-4810-9d0d-078d7ad06240@github.com> On Tue, 4 Mar 2025 09:52:40 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 84: >> >>> 82: // Tracks the current refinement state from idle to completion (and reset back >>> 83: // to idle). >>> 84: class G1ConcurrentRefineWorkState { >> >> G1ConcurrentRefinementState? I am not convinced the "Work" adds any clarity > > We agreed on `G1ConcurrentRefineSweepState` for now, better suggestions welcome. > > Use `Refine` instead of `Refinement` since all pre-existing classes also use `Refine`. This could be renamed in an extra change. Add the `Sweep` in the name because this is not the state for entire refinement (which also includes information about when to start refinement/sweeping). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979053344 From tschatzl at openjdk.org Tue Mar 4 09:57:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 09:57:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v9] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. 
> > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * iwalulya review 2 * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState * some additional documentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/9e2ee543..442d9eae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=07-08 Stats: 93 lines in 7 files changed: 27 ins; 3 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From mdoerr at openjdk.org Tue Mar 4 10:40:55 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 4 Mar 2025 10:40:55 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v9] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 09:57:56 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. 
>> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * iwalulya review 2 > * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState > * some additional documentation I got an error while testing java/foreign/TestUpcallStress.java on linuxaarch64 with this PR: # Internal Error (/openjdk-jdk-linux_aarch64-dbg/jdk/src/hotspot/share/gc/g1/g1CardTable.cpp:56), pid=19044, tid=19159 # guarantee(!failures) failed: there should not have been any failures ... 
V [libjvm.so+0xb6e988] G1CardTable::verify_region(MemRegion, unsigned char, bool)+0x3b8 (g1CardTable.cpp:56) V [libjvm.so+0xc3a10c] G1MergeHeapRootsTask::G1ClearBitmapClosure::do_heap_region(G1HeapRegion*)+0x13c (g1RemSet.cpp:1048) V [libjvm.so+0xb7a80c] G1CollectedHeap::par_iterate_regions_array(G1HeapRegionClosure*, G1HeapRegionClaimer*, unsigned int const*, unsigned long, unsigned int) const+0x9c (g1CollectedHeap.cpp:2059) V [libjvm.so+0xc49fe8] G1MergeHeapRootsTask::work(unsigned int)+0x708 (g1RemSet.cpp:1225) V [libjvm.so+0x19597bc] WorkerThread::run()+0x98 (workerThread.cpp:69) V [libjvm.so+0x1824510] Thread::call_run()+0xac (thread.cpp:231) V [libjvm.so+0x13b3994] thread_native_entry(Thread*)+0x130 (os_linux.cpp:877) C [libpthread.so.0+0x875c] start_thread+0x18c ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2697024679 From tschatzl at openjdk.org Tue Mar 4 10:48:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 10:48:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v9] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 10:37:47 GMT, Martin Doerr wrote: > I got an error while testing java/foreign/TestUpcallStress.java on linuxaarch64 with this PR: > > ``` > # Internal Error (/openjdk-jdk-linux_aarch64-dbg/jdk/src/hotspot/share/gc/g1/g1CardTable.cpp:56), pid=19044, tid=19159 > # guarantee(!failures) failed: there should not have been any failures > ... > V [libjvm.so+0xb6e988] G1CardTable::verify_region(MemRegion, unsigned char, bool)+0x3b8 (g1CardTable.cpp:56) > V [libjvm.so+0xc3a10c] G1MergeHeapRootsTask::G1ClearBitmapClosure::do_heap_region(G1HeapRegion*)+0x13c (g1RemSet.cpp:1048) > V [libjvm.so+0xb7a80c] G1CollectedHeap::par_iterate_regions_array(G1HeapRegionClosure*, G1HeapRegionClaimer*, unsigned int const*, unsigned long, unsigned int) const+0x9c (g1CollectedHeap.cpp:2059) > V [libjvm.so+0xc49fe8] G1MergeHeapRootsTask::work(unsigned int)+0x708 (g1RemSet.cpp:1225) > V [libjvm.so+0x19597bc] WorkerThread::run()+0x98 (workerThread.cpp:69) > V [libjvm.so+0x1824510] Thread::call_run()+0xac (thread.cpp:231) > V [libjvm.so+0x13b3994] thread_native_entry(Thread*)+0x130 (os_linux.cpp:877) > C [libpthread.so.0+0x875c] start_thread+0x18c > ``` I will try to reproduce. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2697052899 From tschatzl at openjdk.org Tue Mar 4 10:53:46 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 10:53:46 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v10] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. 
Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * ayang review - fix comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/442d9eae..fc674f02 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From iwalulya at openjdk.org Tue Mar 4 11:19:59 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 4 Mar 2025 11:19:59 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v9] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 09:57:56 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. 
>> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * iwalulya review 2 > * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState > * some additional documentation src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 108: > 106: > 107: void G1ConcurrentRefineThreadControl::control_thread_do(ThreadClosure* tc) { > 108: if (_control_thread != nullptr) { maybe maintain using `if (max_num_threads() > 0)` as used in `G1ConcurrentRefineThreadControl::initialize`, so that it is clear that setting `G1ConcRefinementThreads=0` effectively turns off concurrent refinement. src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 354: > 352: if (!r->is_free()) { > 353: // Need to scan all parts of non-free regions, so reset the claim. > 354: // No need for synchronization: we are only interested about regions s/about/in src/hotspot/share/gc/g1/g1OopClosures.hpp line 205: > 203: G1CollectedHeap* _g1h; > 204: uint _worker_id; > 205: bool _has_to_cset_ref; Similar to `_cards_refer_to_cset` , do you mind renaming `_has_to_cset_ref` and `_has_to_old_ref` to `_has_ref_to_cset` and `_has_ref_to_old` src/hotspot/share/gc/g1/g1Policy.hpp line 105: > 103: uint _free_regions_at_end_of_collection; > 104: > 105: size_t _pending_cards_from_gc; A comment on the variable would be nice, especially on how it is set/reset both at end of GC and by refinement. 
Also the `_to_collection_set_cards` below could use a comment ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979077904 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979102189 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979212854 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979155941 From tschatzl at openjdk.org Tue Mar 4 11:39:55 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 11:39:55 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v9] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 10:06:37 GMT, Ivan Walulya wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * iwalulya review 2 >> * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState >> * some additional documentation > > src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 108: > >> 106: >> 107: void G1ConcurrentRefineThreadControl::control_thread_do(ThreadClosure* tc) { >> 108: if (_control_thread != nullptr) { > > maybe maintain using `if (max_num_threads() > 0)` as used in `G1ConcurrentRefineThreadControl::initialize`, so that it is clear that setting `G1ConcRefinementThreads=0` effectively turns off concurrent refinement. I added a new `is_refinement_enabled()` predicate instead (that uses `max_num_threads()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979252156 From tschatzl at openjdk.org Tue Mar 4 11:56:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 11:56:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v11] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: iwalulya review * comments for variables tracking to-collection-set and just dirtied cards after GC/refinement * predicate for determining whether the refinement has been disabled * some other typos/comment improvements * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with naming ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/fc674f02..b4d19d9b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=09-10 Stats: 40 lines in 8 files changed: 14 ins; 0 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From kdnilsen at openjdk.org Tue Mar 4 15:02:04 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 4 Mar 2025 15:02:04 GMT Subject: RFR: 8350898: Shenandoah: Eliminate final roots safepoint [v2] In-Reply-To: <5Lr95p3Uwv5w0n3YzDmALQc6KESs9xLnWdGm7p1IwGA=.3df358c6-f5d5-4f10-822d-5905429c050e@github.com> References: <5Lr95p3Uwv5w0n3YzDmALQc6KESs9xLnWdGm7p1IwGA=.3df358c6-f5d5-4f10-822d-5905429c050e@github.com> Message-ID: On Mon, 3 Mar 2025 18:30:33 GMT, William Kemper wrote: >> This PR converts the final roots safepoint operation into a handshake. The safepoint operation still exists, but is only executed when `ShenandoahVerify` is enabled. In addition to this change, this PR also improves the logging for the concurrent preparation for update references from [PR 22688](https://github.com/openjdk/jdk/pull/22688). > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: > > - Merge remote-tracking branch 'jdk/master' into eliminate-final-roots > - Fix comments > - Add whitespace at end of file > - More detail for init update refs event message > - Use timing tracker for timing verification > - Merge remote-tracking branch 'jdk/master' into eliminate-final-roots > - WIP: Fix up phase timings for newly concurrent final roots and init update refs > - WIP: Combine satb transfer with state propagation, restore phase timing data > - WIP: Transfer pointers out of SATB with a handshake > - WIP: Clear weak roots flag concurrently Thanks. Great improvement. src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.cpp line 458: > 456: > 457: // Step 1. All threads need to 'complete' partially filled, thread local buffers. This > 458: // is accomplished in ShenandoahConcurrentGC::complete_abbreviated_cycle using a Handshake I think we're talking about "complete processing" of thread-local satb buffers. To avoid confusion with tlab, maybe add satb to this comment. ------------- Marked as reviewed by kdnilsen (Committer). PR Review: https://git.openjdk.org/jdk/pull/23830#pullrequestreview-2657883998 PR Review Comment: https://git.openjdk.org/jdk/pull/23830#discussion_r1979620964 From kdnilsen at openjdk.org Tue Mar 4 15:04:59 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 4 Mar 2025 15:04:59 GMT Subject: RFR: 8350605: assert(!heap->is_uncommit_in_progress()) failed: Cannot uncommit bitmaps while resetting them [v4] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 00:57:06 GMT, William Kemper wrote: >> The protocol which is meant to prevent regions from being uncommitted while their bitmaps are being reset may fail. This happens when the control thread attempts to wait for the uncommit thread to finish, but the uncommit thread has not yet indicated that it has started. >> >> ## Testing >> GHA, Dacapo, Extremem, Heapothesys, Diluvian, SpecJBB2015, SpecJVM2008 (with and without stress flags, asserts). Also have run the JTREG test that failed this assertion over 10K times (and counting). > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Document parameters for do_uncommit_work Repeat approval. ------------- Marked as reviewed by kdnilsen (Committer). PR Review: https://git.openjdk.org/jdk/pull/23760#pullrequestreview-2657921882 From kbarrett at openjdk.org Tue Mar 4 15:33:59 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 4 Mar 2025 15:33:59 GMT Subject: RFR: 8333578: Fix uses of overaligned types induced by ZCACHE_ALIGNED In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 07:49:23 GMT, Axel Boldt-Christmas wrote: > The only directly heap allocated, constructed object of types that are overaligned because of ZCACHE_ALIGNED is ZCollectedHeap. The other are either in static storage or contained in (and constructed as part of) ZCollectedHeap. So we only need to fix ZCollectedHeap allocation. > > As the CollectedHeap is only ever created once and is never destroyed, we can simply align the allocation and create an unfreeable pointer. > > This implementation imposes that `ZCacheLineSize` is a power of two, but we already have this requirement elsewhere (e.g. `ZContendedStorage`). > > Testing: > * tier 1 through tier 5 Oracle supported platforms > * GHA Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23885#pullrequestreview-2658015606 From ayang at openjdk.org Tue Mar 4 15:47:00 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 4 Mar 2025 15:47:00 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v11] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 11:56:56 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... 
> > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > iwalulya review > * comments for variables tracking to-collection-set and just dirtied cards after GC/refinement > * predicate for determining whether the refinement has been disabled > * some other typos/comment improvements > * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with naming src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 356: > 354: bool do_heap_region(G1HeapRegion* r) override { > 355: if (!r->is_free()) { > 356: // Need to scan all parts of non-free regions, so reset the claim. Why is the condition "is_free"? I thought we scan only old-or-humongous regions? src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 116: > 114: SwapGlobalCT, // Swap global card table. > 115: SwapJavaThreadsCT, // Swap java thread's card tables. > 116: SwapGCThreadsCT, // Swap GC thread's card tables. Do GC threads have card-table? src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp line 219: > 217: // The young gen revising mechanism reads the predictor and the values set > 218: // here. Avoid inconsistencies by locking. > 219: MutexLocker x(G1RareEvent_lock, Mutex::_no_safepoint_check_flag); Who else can be in this critical-section? I don't get what this lock is protecting us from. src/hotspot/share/gc/g1/g1ConcurrentRefineThread.hpp line 83: > 81: > 82: public: > 83: static G1ConcurrentRefineThread* create(G1ConcurrentRefine* cr); I wonder if the comment for this class "One or more G1 Concurrent Refinement Threads..." has become obsolete. (AFAICS, this class is a singleton.) src/hotspot/share/gc/g1/g1ConcurrentRefineWorkTask.cpp line 69: > 67: } else if (res == G1RemSet::NoInteresting) { > 68: _refine_stats.inc_cards_clean_again(); > 69: } A `switch` is probably cleaner. src/hotspot/share/gc/g1/g1ConcurrentRefineWorkTask.cpp line 78: > 76: do_dirty_card(source, dest_card); > 77: } > 78: return pointer_delta(dirty_r, dirty_l, sizeof(CardValue)); I feel the `pointer_delta` line belongs to the caller. After that, even the entire method can be inlined to the caller. YMMV. 
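For illustration only, a minimal standalone sketch of the `switch` shape suggested above; the enum and counter names are hypothetical stand-ins, not the actual G1 types or fields:

#include <cstddef>

// Hypothetical stand-ins for the refinement outcome and per-worker statistics.
enum class RefineOutcome { ToCollectionSet, Clean, NotInteresting };

struct RefineStats {
  size_t cards_to_cset = 0;
  size_t cards_cleaned = 0;
  size_t cards_already_clean = 0;
};

// Dispatching on the outcome with a switch keeps the handling exhaustive;
// most compilers warn if a new enumerator is added but not handled here.
void record_outcome(RefineOutcome outcome, RefineStats& stats) {
  switch (outcome) {
    case RefineOutcome::ToCollectionSet: ++stats.cards_to_cset;       break;
    case RefineOutcome::Clean:           ++stats.cards_cleaned;       break;
    case RefineOutcome::NotInteresting:  ++stats.cards_already_clean; break;
  }
}
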
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979666477 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979678325 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979699376 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979695999 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979705019 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979709682 From tschatzl at openjdk.org Tue Mar 4 16:03:55 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 16:03:55 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v11] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 15:16:17 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> iwalulya review >> * comments for variables tracking to-collection-set and just dirtied cards after GC/refinement >> * predicate for determining whether the refinement has been disabled >> * some other typos/comment improvements >> * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with naming > > src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 356: > >> 354: bool do_heap_region(G1HeapRegion* r) override { >> 355: if (!r->is_free()) { >> 356: // Need to scan all parts of non-free regions, so reset the claim. > > Why is the condition "is_free"? I thought we scan only old-or-humongous regions? We also need to clear young gen region marks because we want them to be all clean in the card table for the garbage collection (evacuation failure handling, use in next cycle). This is maybe a bit of a waste if there are multiple refinement rounds between two gcs, but it's less expensive than in the pause wrt to latency. It's fast anyway. > src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 116: > >> 114: SwapGlobalCT, // Swap global card table. >> 115: SwapJavaThreadsCT, // Swap java thread's card tables. >> 116: SwapGCThreadsCT, // Swap GC thread's card tables. > > Do GC threads have card-table? Hmm, I thought I changed tat already just recently with Ivan's latest requests. Will fix. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979742662 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979752692 From tschatzl at openjdk.org Tue Mar 4 16:07:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 16:07:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v11] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 16:00:46 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1ConcurrentRefine.hpp line 116: >> >>> 114: SwapGlobalCT, // Swap global card table. >>> 115: SwapJavaThreadsCT, // Swap java thread's card tables. >>> 116: SwapGCThreadsCT, // Swap GC thread's card tables. >> >> Do GC threads have card-table? > > Hmm, I thought I changed tat already just recently with Ivan's latest requests. Will fix. Oh, I only fixed the string. Apologies. 
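As background for the Swap* phase names discussed above, a toy model of the card-table-switching idea, assuming invented Toy* names; the real change also has to update each Java thread's cached card-table base (hence the separate swap phases), which a single atomic global below does not capture:

#include <atomic>
#include <cstdint>
#include <vector>

struct ToyCardTable {
  static constexpr uint8_t clean_card = 0;
  static constexpr uint8_t dirty_card = 1;
  std::vector<uint8_t> cards;
  explicit ToyCardTable(size_t n) : cards(n, clean_card) {}
};

class ToyRefinement {
  ToyCardTable _table_a;
  ToyCardTable _table_b;
  std::atomic<ToyCardTable*> _primary; // the table the write barrier dirties
public:
  explicit ToyRefinement(size_t n) : _table_a(n), _table_b(n), _primary(&_table_a) {}

  // Post-write barrier reduced to a single unconditional card store.
  void post_write_barrier(size_t card_index) {
    _primary.load(std::memory_order_relaxed)->cards[card_index] = ToyCardTable::dirty_card;
  }

  // Refinement sweep: retire the current primary table, point mutators at the
  // other one, then scan and clean the retired table without racing against
  // concurrent mutator stores.
  size_t sweep() {
    ToyCardTable* retired = _primary.load();
    ToyCardTable* fresh = (retired == &_table_a) ? &_table_b : &_table_a;
    _primary.store(fresh);
    size_t dirty = 0;
    for (uint8_t& c : retired->cards) {
      if (c == ToyCardTable::dirty_card) { ++dirty; c = ToyCardTable::clean_card; }
    }
    return dirty;
  }
};
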
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979761737 From tschatzl at openjdk.org Tue Mar 4 16:07:58 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 16:07:58 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v11] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 15:33:29 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> iwalulya review >> * comments for variables tracking to-collection-set and just dirtied cards after GC/refinement >> * predicate for determining whether the refinement has been disabled >> * some other typos/comment improvements >> * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with naming > > src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp line 219: > >> 217: // The young gen revising mechanism reads the predictor and the values set >> 218: // here. Avoid inconsistencies by locking. >> 219: MutexLocker x(G1RareEvent_lock, Mutex::_no_safepoint_check_flag); > > Who else can be in this critical-section? I don't get what this lock is protecting us from. The concurrent refine control thread in `G1ConcurrentRefineThread::do_refinement`, when calling `G1Policy::record_dirtying_stats`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979759329 From tschatzl at openjdk.org Tue Mar 4 16:20:58 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 16:20:58 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v11] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 15:56:05 GMT, Thomas Schatzl wrote: > It's fast anyway. To clarify: If you have multiple refinement rounds between two garbage collections, the time to clear the young gen cards is almost noise compared to the actual refinement effort. Like two magnitudes faster. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979785011 From tschatzl at openjdk.org Tue Mar 4 16:34:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 16:34:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v11] In-Reply-To: References: Message-ID: <3LR5VKMhSuXWmMlphpe8SLHm8vQQt6j343qaO61S_mQ=.dc1d2e4a-c858-44bd-9da0-f3f98340d939@github.com> On Tue, 4 Mar 2025 16:04:00 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp line 219: >> >>> 217: // The young gen revising mechanism reads the predictor and the values set >>> 218: // here. Avoid inconsistencies by locking. >>> 219: MutexLocker x(G1RareEvent_lock, Mutex::_no_safepoint_check_flag); >> >> Who else can be in this critical-section? I don't get what this lock is protecting us from. > > The concurrent refine control thread in `G1ConcurrentRefineThread::do_refinement`, when calling `G1Policy::record_dirtying_stats`. I could create an extra mutex for that if you want to make it clear which two parties access the same data. 
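To sketch the "extra mutex" idea above in isolation, here is a plain C++ toy using std::mutex rather than HotSpot's Mutex/MutexLocker machinery; the class and field names are invented for illustration:

#include <cstddef>
#include <mutex>

// Two writers (the GC pause and the refinement control thread) publish
// card-dirtying rates that the young-gen sizing policy later reads; one
// dedicated lock keeps the pair of fields consistent for readers.
class ToyDirtyingStats {
  mutable std::mutex _lock;
  double _cards_per_ms = 0.0;
  size_t _samples = 0;
public:
  void record(double cards_per_ms) {
    std::lock_guard<std::mutex> guard(_lock);
    _cards_per_ms = cards_per_ms;
    _samples++;
  }
  double read(size_t& samples_out) const {
    std::lock_guard<std::mutex> guard(_lock);
    samples_out = _samples;
    return _cards_per_ms;
  }
};

A dedicated lock like this makes it obvious which two parties share the data, at the cost of one more lock to rank and document.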
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1979810144 From wkemper at openjdk.org Tue Mar 4 17:14:54 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 4 Mar 2025 17:14:54 GMT Subject: RFR: 8350898: Shenandoah: Eliminate final roots safepoint [v2] In-Reply-To: References: <5Lr95p3Uwv5w0n3YzDmALQc6KESs9xLnWdGm7p1IwGA=.3df358c6-f5d5-4f10-822d-5905429c050e@github.com> Message-ID: On Tue, 4 Mar 2025 14:52:23 GMT, Kelvin Nilsen wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: >> >> - Merge remote-tracking branch 'jdk/master' into eliminate-final-roots >> - Fix comments >> - Add whitespace at end of file >> - More detail for init update refs event message >> - Use timing tracker for timing verification >> - Merge remote-tracking branch 'jdk/master' into eliminate-final-roots >> - WIP: Fix up phase timings for newly concurrent final roots and init update refs >> - WIP: Combine satb transfer with state propagation, restore phase timing data >> - WIP: Transfer pointers out of SATB with a handshake >> - WIP: Clear weak roots flag concurrently > > src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.cpp line 458: > >> 456: >> 457: // Step 1. All threads need to 'complete' partially filled, thread local buffers. This >> 458: // is accomplished in ShenandoahConcurrentGC::complete_abbreviated_cycle using a Handshake > > I think we're talking about "complete processing" of thread-local satb buffers. To avoid confusion with tlab, maybe add satb to this comment. Yes, good point. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23830#discussion_r1979884800 From wkemper at openjdk.org Tue Mar 4 17:14:58 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 4 Mar 2025 17:14:58 GMT Subject: Integrated: 8350605: assert(!heap->is_uncommit_in_progress()) failed: Cannot uncommit bitmaps while resetting them In-Reply-To: References: Message-ID: On Tue, 25 Feb 2025 01:38:14 GMT, William Kemper wrote: > The protocol which is meant to prevent regions from being uncommitted while their bitmaps are being reset may fail. This happens when the control thread attempts to wait for the uncommit thread to finish, but the uncommit thread has not yet indicated that it has started. > > ## Testing > GHA, Dacapo, Extremem, Heapothesys, Diluvian, SpecJBB2015, SpecJVM2008 (with and without stress flags, asserts). Also have run the JTREG test that failed this assertion over 10K times (and counting). This pull request has now been integrated. Changeset: fe806caa Author: William Kemper URL: https://git.openjdk.org/jdk/commit/fe806caa160b2d550db273af17dc08270f143819 Stats: 79 lines in 2 files changed: 41 ins; 24 del; 14 mod 8350605: assert(!heap->is_uncommit_in_progress()) failed: Cannot uncommit bitmaps while resetting them Reviewed-by: kdnilsen, ysr ------------- PR: https://git.openjdk.org/jdk/pull/23760 From wkemper at openjdk.org Tue Mar 4 17:18:37 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 4 Mar 2025 17:18:37 GMT Subject: RFR: 8350898: Shenandoah: Eliminate final roots safepoint [v3] In-Reply-To: References: Message-ID: > This PR converts the final roots safepoint operation into a handshake. The safepoint operation still exists, but is only executed when `ShenandoahVerify` is enabled. 
In addition to this change, this PR also improves the logging for the concurrent preparation for update references from [PR 22688](https://github.com/openjdk/jdk/pull/22688). William Kemper has updated the pull request incrementally with one additional commit since the last revision: Clarify which thread local buffers in comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23830/files - new: https://git.openjdk.org/jdk/pull/23830/files/0b2675af..390de7f9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23830&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23830&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23830.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23830/head:pull/23830 PR: https://git.openjdk.org/jdk/pull/23830 From tschatzl at openjdk.org Tue Mar 4 17:20:28 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 4 Mar 2025 17:20:28 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v12] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). 
> > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang review * renamings * refactorings ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/b4d19d9b..4a978118 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=10-11 Stats: 34 lines in 4 files changed: 13 ins; 1 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From wkemper at openjdk.org Tue Mar 4 18:40:55 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 4 Mar 2025 18:40:55 GMT Subject: RFR: 8351081: Off-by-one error in ShenandoahCardCluster In-Reply-To: References: Message-ID: <6todYj98wTBywpKJ8GkvakvJGoPiAvF2Gurs01Pq6t0=.8cfb3200-86a3-4289-91c4-5fdfdb7d82bb@github.com> On Tue, 4 Mar 2025 04:06:00 GMT, Cesar Soares Lucas wrote: > Given certain values for the variables in [this expression](https://github.com/openjdk/jdk/blob/a87dd1a75f78cf872df49bea83ba48af8acfa2fd/src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp#L173) the result of the computation can be equal to `_ rs->total_cards()` which will lead to segmentation fault, for instance in [starts_object(card_at_end)](https://github.com/openjdk/jdk/blob/a87dd1a75f78cf872df49bea83ba48af8acfa2fd/src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp#L393). The problem happens, though, because the `_object_starts` array doesn't have a [guarding entry](https://github.com/openjdk/jdk/blob/a87dd1a75f78cf872df49bea83ba48af8acfa2fd/src/hotspot/share/gc/shenandoah/shenandoahCardTable.cpp#L37) at the end. This pull request adjusts the allocation of `_object_starts` to include an additional entry at the end to account for this situation. > > Tested with JTREG tier 1-4, x86_64 & AArch64 on Linux. LGTM. ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23882#pullrequestreview-2658615578 From duke at openjdk.org Tue Mar 4 19:18:59 2025 From: duke at openjdk.org (duke) Date: Tue, 4 Mar 2025 19:18:59 GMT Subject: RFR: 8351081: Off-by-one error in ShenandoahCardCluster In-Reply-To: References: Message-ID: <65Nau_mejcjgMsRM1Qli2hkyeEJlXGZxDExGV6vmWcQ=.84f05fff-f04c-4708-bb40-b974a99aff5e@github.com> On Tue, 4 Mar 2025 04:06:00 GMT, Cesar Soares Lucas wrote: > Given certain values for the variables in [this expression](https://github.com/openjdk/jdk/blob/a87dd1a75f78cf872df49bea83ba48af8acfa2fd/src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp#L173) the result of the computation can be equal to `_ rs->total_cards()` which will lead to segmentation fault, for instance in [starts_object(card_at_end)](https://github.com/openjdk/jdk/blob/a87dd1a75f78cf872df49bea83ba48af8acfa2fd/src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp#L393). The problem happens, though, because the `_object_starts` array doesn't have a [guarding entry](https://github.com/openjdk/jdk/blob/a87dd1a75f78cf872df49bea83ba48af8acfa2fd/src/hotspot/share/gc/shenandoah/shenandoahCardTable.cpp#L37) at the end. 
This pull request adjusts the allocation of `_object_starts` to include an additional entry at the end to account for this situation. > > Tested with JTREG tier 1-4, x86_64 & AArch64 on Linux. @JohnTortugo Your change (at version 9a4ac53343aaa62b055241f90bd6d610a483ed66) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23882#issuecomment-2698667853 From jsikstro at openjdk.org Tue Mar 4 20:17:28 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Tue, 4 Mar 2025 20:17:28 GMT Subject: RFR: 8351167: ZGC: Lazily initialize livemap [v2] In-Reply-To: <05cL5-IAaVEDpyTUxQ61JqzRzgM6myzEsIWt-xBLJRM=.ad94b7e8-50e0-4447-952d-995e143b5218@github.com> References: <05cL5-IAaVEDpyTUxQ61JqzRzgM6myzEsIWt-xBLJRM=.ad94b7e8-50e0-4447-952d-995e143b5218@github.com> Message-ID: > Memory for the bitmap inside the livemap of a ZPage is currently allocated upon calling its constructor, which adds a latency overhead when allocating pages. As preparation for the Mapped Cache ([JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441)), but also as a standalone improvement, we want to instead lazily initialize the livemap's bitmap. > > This patch holds off with allocating memory for the bitmap that the livemap uses until the livemap is written to the first time (i.e. by calling ZLiveMap::set). The effect of this is that the latency impact of allocating the bitmap will only be taken by GC threads and not by mutator threads, since only GC threads mark objects before pushing them onto the mark stack. This improvement will reduce page allocation latencies somewhat. > > In addition to lazily allocating the bitmap, I've converted the static C-style cast to a checked cast for `ZPage::object_max_count()`, which is passed as the size to the bitmaps. This is because a value not contained in 32 bits will overflow with the C-style cast and give a too small bitmap when passed to the livemap. This is not an observed issue, just more of a sanity check. > > Testing: > * Tiers 1-5 > * GHA Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: Copyright years ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23907/files - new: https://git.openjdk.org/jdk/pull/23907/files/969cb3cb..c70b4bda Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23907&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23907&range=00-01 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23907/head:pull/23907 PR: https://git.openjdk.org/jdk/pull/23907 From xpeng at openjdk.org Tue Mar 4 21:26:03 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 4 Mar 2025 21:26:03 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained Message-ID: With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. 
This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. ### Test - [x] hotspot_gc_shenandoah - [x] Tier 1 - [ ] Tier 2 ------------- Commit messages: - Revert unnecessary changes in ShenandoahReferenceProcessor - Revert the change in ShenandoahHeap::generation_for - touch up - If GC generation is young and referent is in old, make should_drop return false if old gen marking is not complete - Remove ShenandoahHeap::complete_marking_context() - Fix improper use of heap->complete_marking_context() - promotion in place and reference processor should be aware of heap generation when use complete marking context - JDK-8351091: initial works Changes: https://git.openjdk.org/jdk/pull/23886/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23886&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351091 Stats: 61 lines in 17 files changed: 9 ins; 23 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/23886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23886/head:pull/23886 PR: https://git.openjdk.org/jdk/pull/23886 From wkemper at openjdk.org Tue Mar 4 21:26:04 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 4 Mar 2025 21:26:04 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 08:34:16 GMT, Xiaolong Peng wrote: > With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. > > This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] Tier 1 > - [ ] Tier 2 Changes requested by wkemper (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2837: > 2835: } else if (affiliation == OLD_GENERATION) { > 2836: return old_generation(); > 2837: } else if (affiliation == FREE) { I don't think it makes sense to connect `FREE` regions to the global generation in this way. Free regions are _not_ affiliated with any generation. I think in some of these cases where you want to find the mark context, it would be possible to take it from a `_generation` member variable. 
src/hotspot/share/gc/shenandoah/shenandoahReferenceProcessor.cpp line 337: > 335: // If generation is young and referent is in old, marking context of the old > 336: // may or may not be complete, we can safely drop the reference when old gen mark is complete. > 337: if (_generation->is_young() && referent_region->is_old()) { Have you seen this happen? The reference processor for each generation is only supposed to discover references for which the referent is in the collected generation. See `ShenandoahReferenceProcessor::should_discover`: if (!heap->is_in_active_generation(referent)) { log_trace(gc,ref)("Referent outside of active generation: " PTR_FORMAT, p2i(referent)); return false; } ------------- PR Review: https://git.openjdk.org/jdk/pull/23886#pullrequestreview-2658463721 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1979938123 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1979932540 From xpeng at openjdk.org Tue Mar 4 21:26:04 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 4 Mar 2025 21:26:04 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained In-Reply-To: References: Message-ID: <5EhmY89ZN6u3AyeugsAf1wAVw7AxHU5HD0pfEmPZXZE=.a69c2802-cae9-479d-ab51-47cc69f85c4d@github.com> On Tue, 4 Mar 2025 17:48:58 GMT, William Kemper wrote: >> With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. >> >> This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] Tier 1 >> - [ ] Tier 2 > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2837: > >> 2835: } else if (affiliation == OLD_GENERATION) { >> 2836: return old_generation(); >> 2837: } else if (affiliation == FREE) { > > I don't think it makes sense to connect `FREE` regions to the global generation in this way. Free regions are _not_ affiliated with any generation. I think in some of these cases where you want to find the mark context, it would be possible to take it from a `_generation` member variable. Yeah, I don't think it is necessary to change the behavior here either, I'll remove it in later update. > src/hotspot/share/gc/shenandoah/shenandoahReferenceProcessor.cpp line 337: > >> 335: // If generation is young and referent is in old, marking context of the old >> 336: // may or may not be complete, we can safely drop the reference when old gen mark is complete. >> 337: if (_generation->is_young() && referent_region->is_old()) { > > Have you seen this happen? The reference processor for each generation is only supposed to discover references for which the referent is in the collected generation. 
See `ShenandoahReferenceProcessor::should_discover`: > > if (!heap->is_in_active_generation(referent)) { > log_trace(gc,ref)("Referent outside of active generation: " PTR_FORMAT, p2i(referent)); > return false; > } Ok, I didn't see happen in any of the jtreg tests yet. Just base on the the behavior we saw in old gc, I assumed this could happen. Now I am more curious about the real cause of the crash caused by reference from old to young, since we always check if the referent is in the active generation, that shouldn't have happened if it works as described, my feeling is there might be something fishy in the place where we use `_active_generation`(the comments it should be update only in the STW phases), maybe should we should get rid of it, currently we directly use _gc_generation in many places as well, not sure it if is possible to cause inconsistency. I'll revert this part, I'll follow up on the questions in separate work. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1979976173 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1980145388 From xpeng at openjdk.org Tue Mar 4 21:26:04 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 4 Mar 2025 21:26:04 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained In-Reply-To: <5EhmY89ZN6u3AyeugsAf1wAVw7AxHU5HD0pfEmPZXZE=.a69c2802-cae9-479d-ab51-47cc69f85c4d@github.com> References: <5EhmY89ZN6u3AyeugsAf1wAVw7AxHU5HD0pfEmPZXZE=.a69c2802-cae9-479d-ab51-47cc69f85c4d@github.com> Message-ID: <4_6n2QkucG-4itVGY9thZovsVDHqZFD_FbgFdBo5Fyg=.03fa388d-d65f-4ab2-b891-109de430fd2c@github.com> On Tue, 4 Mar 2025 18:14:58 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2837: >> >>> 2835: } else if (affiliation == OLD_GENERATION) { >>> 2836: return old_generation(); >>> 2837: } else if (affiliation == FREE) { >> >> I don't think it makes sense to connect `FREE` regions to the global generation in this way. Free regions are _not_ affiliated with any generation. I think in some of these cases where you want to find the mark context, it would be possible to take it from a `_generation` member variable. > > Yeah, I don't think it is necessary to change the behavior here either, I'll remove it in later update. I have removed the change. >> src/hotspot/share/gc/shenandoah/shenandoahReferenceProcessor.cpp line 337: >> >>> 335: // If generation is young and referent is in old, marking context of the old >>> 336: // may or may not be complete, we can safely drop the reference when old gen mark is complete. >>> 337: if (_generation->is_young() && referent_region->is_old()) { >> >> Have you seen this happen? The reference processor for each generation is only supposed to discover references for which the referent is in the collected generation. See `ShenandoahReferenceProcessor::should_discover`: >> >> if (!heap->is_in_active_generation(referent)) { >> log_trace(gc,ref)("Referent outside of active generation: " PTR_FORMAT, p2i(referent)); >> return false; >> } > > Ok, I didn't see happen in any of the jtreg tests yet. > > Just base on the the behavior we saw in old gc, I assumed this could happen. 
Now I am more curious about the real cause of the crash caused by reference from old to young, since we always check if the referent is in the active generation, that shouldn't have happened if it works as described, my feeling is there might be something fishy in the place where we use `_active_generation`(the comments it should be update only in the STW phases), maybe should we should get rid of it, currently we directly use _gc_generation in many places as well, not sure it if is possible to cause inconsistency. > > I'll revert this part, I'll follow up on the questions in separate work. Reverted, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1980219344 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1980239633 From cslucas at openjdk.org Tue Mar 4 21:47:57 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 4 Mar 2025 21:47:57 GMT Subject: Integrated: 8351081: Off-by-one error in ShenandoahCardCluster In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 04:06:00 GMT, Cesar Soares Lucas wrote: > Given certain values for the variables in [this expression](https://github.com/openjdk/jdk/blob/a87dd1a75f78cf872df49bea83ba48af8acfa2fd/src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp#L173) the result of the computation can be equal to `_ rs->total_cards()` which will lead to segmentation fault, for instance in [starts_object(card_at_end)](https://github.com/openjdk/jdk/blob/a87dd1a75f78cf872df49bea83ba48af8acfa2fd/src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp#L393). The problem happens, though, because the `_object_starts` array doesn't have a [guarding entry](https://github.com/openjdk/jdk/blob/a87dd1a75f78cf872df49bea83ba48af8acfa2fd/src/hotspot/share/gc/shenandoah/shenandoahCardTable.cpp#L37) at the end. This pull request adjusts the allocation of `_object_starts` to include an additional entry at the end to account for this situation. > > Tested with JTREG tier 1-4, x86_64 & AArch64 on Linux. This pull request has now been integrated. Changeset: 38b4d46c Author: Cesar Soares Lucas Committer: William Kemper URL: https://git.openjdk.org/jdk/commit/38b4d46c1ff3701d75ff8347e5edbb01acd9b512 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8351081: Off-by-one error in ShenandoahCardCluster Reviewed-by: wkemper ------------- PR: https://git.openjdk.org/jdk/pull/23882 From tschatzl at openjdk.org Wed Mar 5 09:45:00 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 5 Mar 2025 09:45:00 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v13] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. 
Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * fix whitespace * additional whitespace between log tags * rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the other similar rename ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/4a978118..a457e6e7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=11-12 Stats: 116 lines in 6 files changed: 50 ins; 50 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From iwalulya at openjdk.org Wed Mar 5 11:12:58 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 5 Mar 2025 11:12:58 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v12] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 17:20:28 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. 
>> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > ayang review > * renamings > * refactorings src/hotspot/share/gc/g1/g1HeapRegion.hpp line 475: > 473: void hr_clear(bool clear_space); > 474: // Clear the card table corresponding to this region. > 475: void clear_cardtable(); in some places `cardtable()` has been refactored to `card_table` e.g. in G1HeapRegionManager. src/hotspot/share/gc/g1/g1ParScanThreadState.hpp line 67: > 65: > 66: size_t _num_marked_as_dirty_cards; > 67: size_t _num_marked_as_into_cset_cards; Suggestion: size_t _num_cards_marked_dirty; size_t _num_cards_marked_to_cset; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1980117641 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1980145229 From iwalulya at openjdk.org Wed Mar 5 11:12:56 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 5 Mar 2025 11:12:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v13] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 09:45:00 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. 
>> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * fix whitespace > * additional whitespace between log tags > * rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the other similar rename src/hotspot/share/gc/g1/c1/g1BarrierSetC1.cpp line 32: > 30: #include "gc/g1/g1HeapRegion.hpp" > 31: #include "gc/g1/g1ThreadLocalData.hpp" > 32: #include "utilities/macros.hpp" Suggestion: #include "utilities/formatBuffer.hpp" #include "utilities/macros.hpp" to use `err_msg` src/hotspot/share/gc/g1/g1RemSet.cpp line 90: > 88: // contiguous ranges of dirty cards to be scanned. These blocks are converted to actual > 89: // memory ranges and then passed on to actual scanning. > 90: class G1RemSetScanState : public CHeapObj { Need to update the comment above to remove reference to "log buffers" (L:67). 
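The quoted barrier pseudo code is easier to compare against the reduced barrier the JEP aims for if both are written out as ordinary C++ over a byte-per-card table. The sketch below is purely illustrative: the constants, the card_for helper and the queue type are invented for the example, the same-region filter is omitted, and none of this is the actual HotSpot barrier implementation (which is emitted by the compilers).

    #include <atomic>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Illustrative constants; the real G1 card table uses different encodings.
    const size_t  kCardShift = 9;     // 512-byte cards
    const uint8_t kCleanCard = 0xff;
    const uint8_t kDirtyCard = 0x00;
    const uint8_t kYoungCard = 0x01;

    struct CardTable {
      std::vector<uint8_t> bytes;     // one byte per card
      uintptr_t heap_base;

      uint8_t* card_for(void* field) {
        return &bytes[(reinterpret_cast<uintptr_t>(field) - heap_base) >> kCardShift];
      }
    };

    // Rough shape of the *current* post-write barrier: filter, fence, enqueue.
    void post_write_barrier_current(CardTable& ct, void* field, void* new_value,
                                    std::vector<uint8_t*>& thread_local_dcq) {
      if (new_value == nullptr) return;                     // null-value filter
      uint8_t* card = ct.card_for(field);                   // (same-region filter omitted)
      if (*card == kYoungCard) return;                      // write into young gen
      std::atomic_thread_fence(std::memory_order_seq_cst);  // the StoreLoad
      if (*card == kDirtyCard) return;                      // already dirty
      *card = kDirtyCard;
      thread_local_dcq.push_back(card);                     // card tracking for refinement
      // a full queue would be handed to the refinement threads via a runtime call
    }

    // Rough shape of the reduced barrier: just dirty the card; refinement later
    // sweeps the card table itself, so no per-store queueing remains.
    void post_write_barrier_reduced(CardTable& ct, void* field) {
      *ct.card_for(field) = kDirtyCard;
    }

    int main() {
      alignas(8) static uint8_t fake_heap[4096];
      CardTable ct{std::vector<uint8_t>(8, kCleanCard),
                   reinterpret_cast<uintptr_t>(fake_heap)};
      std::vector<uint8_t*> dcq;
      post_write_barrier_current(ct, fake_heap + 640, fake_heap, dcq);
      post_write_barrier_reduced(ct, fake_heap + 1200);
      return 0;
    }

The point of the comparison is visible directly in the control flow: the current barrier has four branches, a fence and a queue operation per cross-region store, while the reduced one is a single store into the card table.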
src/hotspot/share/gc/g1/g1RemSet.hpp line 44: > 42: class CardTableBarrierSet; > 43: class G1AbstractSubTask; > 44: class G1RemSetScanState; Already declared on line 48 below src/hotspot/share/gc/g1/g1ThreadLocalData.hpp line 29: > 27: #include "gc/g1/g1BarrierSet.hpp" > 28: #include "gc/g1/g1CardTable.hpp" > 29: #include "gc/g1/g1CollectedHeap.hpp" probably does not need to be included ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1981138746 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1981162792 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1981118865 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1981142943 From ysr at openjdk.org Wed Mar 5 18:02:56 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 5 Mar 2025 18:02:56 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 08:34:16 GMT, Xiaolong Peng wrote: > With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. > > This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] Tier 1 > - [x] Tier 2 Had a few questions and comments inline. I'll take a closer look once you have responded to those. Thank you for finding this probably long-standing incorrectness/fuzziness and fixing it properly! src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1028: > 1026: > 1027: #ifdef ASSERT > 1028: ShenandoahMarkingContext* const ctx = _heap->marking_context(); Why not this instead? ShenandoahMarkingContext* const ctx = _heap->marking_context(r); src/hotspot/share/gc/shenandoah/shenandoahGeneration.hpp line 206: > 204: bool is_mark_complete() { return _is_marking_complete.is_set(); } > 205: virtual void set_mark_complete(); > 206: virtual void set_mark_incomplete(); Why are these declared virtual? src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 737: > 735: public: > 736: inline ShenandoahMarkingContext* complete_marking_context(ShenandoahHeapRegion* region) const; > 737: inline ShenandoahMarkingContext* marking_context() const; Should document semantics of both methods, please! src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 868: > 866: #ifdef ASSERT > 867: { > 868: // During full gc, heap->complete_marking_context() is not valid, may equal nullptr. Looks like this comment is obsolete? src/hotspot/share/gc/shenandoah/shenandoahMarkingContext.cpp line 103: > 101: > 102: bool ShenandoahMarkingContext::is_complete() { > 103: return ShenandoahHeap::heap()->global_generation()->is_mark_complete(); Do we need this? 
It seems wrong to me that even though each generation has its own marking context, we ask any marking context to report if that of the Global Generation is complete. I'd explicitly let generations maintain the state of completeness of their marking contexts, and for clients to query the generations for that state rather than having the individual marking contexts respond to that question. Where is this used after your changes? src/hotspot/share/gc/shenandoah/shenandoahMarkingContext.hpp line 88: > 86: bool is_bitmap_range_within_region_clear(const HeapWord* start, const HeapWord* end) const; > 87: > 88: bool is_complete(); Add a 1-line documentation comment for this method. src/hotspot/share/gc/shenandoah/shenandoahReferenceProcessor.cpp line 337: > 335: // drop the reference. > 336: if (type == REF_PHANTOM) { > 337: return heap->complete_marking_context(referent_region)->is_marked(raw_referent); Doesn't the assert down at line 350 also need `complete_marking_context` ? Same at line 441. May be comb through all of these to determine which we need for proper assertion checking? I'd start by documenting the semantics of the APIs clearly. I am not completely clear on that yet (pun not intended :-) ------------- PR Review: https://git.openjdk.org/jdk/pull/23886#pullrequestreview-2659389355 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1980523168 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1980420417 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1980401312 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1980403403 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1980437298 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1980406186 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1981905245 From ysr at openjdk.org Wed Mar 5 18:02:57 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 5 Mar 2025 18:02:57 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 23:29:18 GMT, Y. Srinivas Ramakrishna wrote: >> With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. >> >> This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] Tier 1 >> - [x] Tier 2 > > src/hotspot/share/gc/shenandoah/shenandoahGeneration.hpp line 206: > >> 204: bool is_mark_complete() { return _is_marking_complete.is_set(); } >> 205: virtual void set_mark_complete(); >> 206: virtual void set_mark_incomplete(); > > Why are these declared virtual? 
OK, I see that `ShenandoahGlobalGeneration` forces the state of `ShenandoahOdGeneration` and `ShenandoahYoungGeneration`, but is that our intention? I am seeing (see comment elsewhere) that we are always either using global generation's marking context explicitly, or using a region to index into the appropriate containing generation's marking context. If so, can we dispense with the forcing of global context's state into the contexts for the two generations? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1980429065 From shade at openjdk.org Wed Mar 5 18:32:50 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 5 Mar 2025 18:32:50 GMT Subject: RFR: 8348278: Trim InitialRAMPercentage to improve startup in default modes [v2] In-Reply-To: References: Message-ID: <3IYkBn19J4N0BZbxvsn6GVIQyRLBn1320bnb_Kh45fA=.f8a97dd3-9b2a-442b-b943-fcdfac3ddc75@github.com> > See bug for discussion. This is the code change, which is simple. What is not simple is deciding what the new value should be. The change would probably require CSR, which I can file after we agree on the value. > > I think cutting to 0.2% of RAM size gets us into good sweet spot: > - On huge 1024G machine, this yields 2G initial heap > - On reasonably sized 128G machine, this gives 256M initial heap > - On smaller 1G container, this gives 2M initial heap > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Also man page - Merge branch 'master' into JDK-8348278-trim-iramp - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23262/files - new: https://git.openjdk.org/jdk/pull/23262/files/b05d2747..d3a327ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23262&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23262&range=00-01 Stats: 94859 lines in 4485 files changed: 45607 ins; 31232 del; 18020 mod Patch: https://git.openjdk.org/jdk/pull/23262.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23262/head:pull/23262 PR: https://git.openjdk.org/jdk/pull/23262 From shade at openjdk.org Wed Mar 5 18:42:01 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 5 Mar 2025 18:42:01 GMT Subject: RFR: 8348278: Trim InitialRAMPercentage to improve startup in default modes [v2] In-Reply-To: <3IYkBn19J4N0BZbxvsn6GVIQyRLBn1320bnb_Kh45fA=.f8a97dd3-9b2a-442b-b943-fcdfac3ddc75@github.com> References: <3IYkBn19J4N0BZbxvsn6GVIQyRLBn1320bnb_Kh45fA=.f8a97dd3-9b2a-442b-b943-fcdfac3ddc75@github.com> Message-ID: <3vTaEwAtqS8e99AdE9HVZZrcEvZzi-YR5lt3ORfWPL0=.37aaa7cf-9231-485a-aa29-2597555ca3d5@github.com> On Wed, 5 Mar 2025 18:32:50 GMT, Aleksey Shipilev wrote: >> See bug for discussion. This is the code change, which is simple. What is not simple is deciding what the new value should be. The change would probably require CSR, which I can file after we agree on the value. >> >> I think cutting to 0.2% of RAM size gets us into good sweet spot: >> - On huge 1024G machine, this yields 2G initial heap >> - On reasonably sized 128G machine, this gives 256M initial heap >> - On smaller 1G container, this gives 2M initial heap >> >> Additional testing: >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Also man page > - Merge branch 'master' into JDK-8348278-trim-iramp > - Fix I played with caps on reasonable initial heap size: diff --git a/src/hotspot/share/gc/shared/gc_globals.hpp b/src/hotspot/share/gc/shared/gc_globals.hpp index 924c45f137d..fd4d5fab5d7 100644 --- a/src/hotspot/share/gc/shared/gc_globals.hpp +++ b/src/hotspot/share/gc/shared/gc_globals.hpp @@ -292,6 +292,12 @@ "Percentage of real memory used for initial heap size") \ range(0.0, 100.0) \ \ + product(size_t, InitialRAMMin, 8*M, \ + "Min initial heap size chosen using InitialRAMPercentage") \ + \ + product(size_t, InitialRAMMax, 1024*M, \ + "Max initial heap size chosen using InitialRAMPercentage") \ + \ product(int, ActiveProcessorCount, -1, \ "Specify the CPU count the VM should use and report as active") \ \ diff --git a/src/hotspot/share/runtime/arguments.cpp b/src/hotspot/share/runtime/arguments.cpp index c6651c46e02..35d096f03d2 100644 --- a/src/hotspot/share/runtime/arguments.cpp +++ b/src/hotspot/share/runtime/arguments.cpp @@ -1592,6 +1592,8 @@ void Arguments::set_heap_size() { if (InitialHeapSize == 0) { julong reasonable_initial = (julong)(((double)phys_mem * InitialRAMPercentage) / 100); + reasonable_initial = MAX2(reasonable_initial, InitialRAMMin); + reasonable_initial = MIN2(reasonable_initial, InitialRAMMax); reasonable_initial = limit_heap_by_allocatable_memory(reasonable_initial); reasonable_initial = MAX3(reasonable_initial, reasonable_minimum, (julong)MinHeapSize); ...and the more I think about it, the less great it feels. To step back a bit: We initially (pun intended) wanted to trim IRAMP to cater for out of the box use cases. Most production environments I saw set up `-Xms`, explicitly setting the initial heap size. So whatever we do here is covering effectively a corner case, and so the solution should likely balance the complexity against that goal. Adding two more flags maybe passes the bar, but it still increases maintenance burden: we would need to document these tunables, do sanity checks for them, write tests, etc. The heap sizing heuristics is already quite complicated (it other places), so there is a cognitive complexity to consider here as well. Additional twist comes from obvious follow-up question: do we only apply these limits when user did _not_ override IRAMP, i.e. only in fully out-of-the-box mode? Or do we do this unconditionally (like in the patch above), so one can still override the auto-adjustable set-point within the limits? As a prospective user, I can reasonably argue both ways. This tells me this is another one of those snowballing complexity issues. Current IRAMP is not great because machines got larger over the years. Choosing a new heuristics in a future-proof manner is not easy. Even if we introduce two more tunables, we would still need to figure out what are the future-proof values for them. It is a guess without knowing the characteristics of "reasonable" hardware that does not really exist yet. So I would prefer to just trim a single tunable default to match current reality better, and move on. We can select a new value in ten years. 
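For readers who want to see the arithmetic behind the numbers above, here is a small standalone C++ sketch contrasting plain InitialRAMPercentage scaling with the clamped variant from the quoted diff. The helper functions and the example bounds are made up for illustration; the real logic lives in Arguments::set_heap_size and also folds in MinHeapSize and the allocatable-memory limit.

    #include <algorithm>
    #include <cstdint>
    #include <cstdio>

    const uint64_t M = 1024 * 1024;
    const uint64_t G = 1024 * M;

    // Plain scaling: initial heap = physical memory * InitialRAMPercentage / 100.
    uint64_t initial_heap_plain(uint64_t phys_mem, double pct) {
      return static_cast<uint64_t>(static_cast<double>(phys_mem) * pct / 100.0);
    }

    // Clamped variant modelled on the quoted patch: same scaling, then bounded
    // by hypothetical InitialRAMMin / InitialRAMMax values.
    uint64_t initial_heap_clamped(uint64_t phys_mem, double pct,
                                  uint64_t min_bytes, uint64_t max_bytes) {
      uint64_t v = initial_heap_plain(phys_mem, pct);
      return std::min(std::max(v, min_bytes), max_bytes);
    }

    int main() {
      const uint64_t machines[] = {1 * G, 128 * G, 1024 * G};
      for (uint64_t mem : machines) {
        std::printf("%5llu G RAM: 0.2%% -> %5llu M plain, %5llu M clamped to [8M, 1G]\n",
                    static_cast<unsigned long long>(mem / G),
                    static_cast<unsigned long long>(initial_heap_plain(mem, 0.2) / M),
                    static_cast<unsigned long long>(
                        initial_heap_clamped(mem, 0.2, 8 * M, 1 * G) / M));
      }
      return 0;
    }

Running this shows why the clamping question matters mostly at the extremes: the 0.2% set-point alone already gives roughly 2 MB, 262 MB and 2 GB for the 1 GB, 128 GB and 1 TB cases, and only the smallest and largest machines would be affected by the min/max bounds.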
------------- PR Comment: https://git.openjdk.org/jdk/pull/23262#issuecomment-2701763566 From xpeng at openjdk.org Wed Mar 5 19:11:30 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 5 Mar 2025 19:11:30 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v2] In-Reply-To: References: Message-ID: > With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. > > This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] Tier 1 > - [x] Tier 2 Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Always use active_generation()->complete_marking_context() during reference processing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23886/files - new: https://git.openjdk.org/jdk/pull/23886/files/01c6ea66..465deaec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23886&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23886&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/23886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23886/head:pull/23886 PR: https://git.openjdk.org/jdk/pull/23886 From jsikstro at openjdk.org Wed Mar 5 19:51:58 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 5 Mar 2025 19:51:58 GMT Subject: RFR: 8351137: ZGC: Improve ZValueStorage alignment support In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 08:34:36 GMT, Axel Boldt-Christmas wrote: > ZValueStorage only align the allocations to the alignment defined by the storage but ignores the alignment of the types. Right now all usages of our different storages all have types which have an alignment less than or equal to the alignment set by its storage. > > I wish to improve this so that types with greater alignment than the storage alignment can be used. > > The UB caused by using a type larger than the storage alignment is something I have seen materialise as returning bad address (and crashing) on Windows. > > As we use `utilities/align.hpp` for our alignment utilities we only support power of two alignment, I added extra asserts here because we use the fact that `lcm(x, y) = max(x, y)` if both are powers of two. > > Testing: > * tier 1 through tier 5 Oracle supported platforms > * GHA Marked as reviewed by jsikstro (Committer). Looks very good! 
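The power-of-two restriction called out in the quoted description is what makes combining alignments cheap: for powers of two, lcm(x, y) is simply max(x, y), so aligning to the larger of the storage alignment and the type alignment satisfies both. The snippet below is only a standalone sanity check of that identity and of a typical align-up helper; it is not the utilities/align.hpp code the change builds on.

    #include <algorithm>
    #include <cassert>
    #include <cstdint>
    #include <numeric>   // std::lcm (C++17)

    bool is_power_of_2(uint64_t x) {
      return x != 0 && (x & (x - 1)) == 0;
    }

    // Round value up to a multiple of alignment (alignment must be a power of two).
    uint64_t align_up(uint64_t value, uint64_t alignment) {
      assert(is_power_of_2(alignment));
      return (value + alignment - 1) & ~(alignment - 1);
    }

    int main() {
      // For powers of two the least common multiple is the maximum, so aligning
      // to max(storage_alignment, type_alignment) satisfies both requirements.
      for (uint64_t a = 1; a <= (1u << 20); a <<= 1) {
        for (uint64_t b = 1; b <= (1u << 20); b <<= 1) {
          assert(std::lcm(a, b) == std::max(a, b));
        }
      }
      assert(align_up(100, 64) == 128);
      assert(align_up(128, 64) == 128);
      return 0;
    }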
------------- PR Review: https://git.openjdk.org/jdk/pull/23887#pullrequestreview-2662333450 PR Comment: https://git.openjdk.org/jdk/pull/23887#issuecomment-2701914039 From xpeng at openjdk.org Wed Mar 5 21:13:53 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 5 Mar 2025 21:13:53 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v2] In-Reply-To: References: Message-ID: <18_o9YLk3Ri0MTJscSSdp1Mg1C8c_cLUjoRfnxGL2e4=.ab8937c4-7ba2-4a67-8ecf-248f1c6f5545@github.com> On Wed, 5 Mar 2025 17:59:36 GMT, Y. Srinivas Ramakrishna wrote: > Had a few questions and comments inline. I'll take a closer look once you have responded to those. > > Thank you for finding this probably long-standing incorrectness/fuzziness and fixing it properly! Thanks, I'll update PR to address your comments. > src/hotspot/share/gc/shenandoah/shenandoahMarkingContext.cpp line 103: > >> 101: >> 102: bool ShenandoahMarkingContext::is_complete() { >> 103: return ShenandoahHeap::heap()->global_generation()->is_mark_complete(); > > Do we need this? It seems wrong to me that even though each generation has its own marking context, we ask any marking context to report if that of the Global Generation is complete. I'd explicitly let generations maintain the state of completeness of their marking contexts, and for clients to query the generations for that state rather than having the individual marking contexts respond to that question. > > Where is this used after your changes? It may not be needed anymore, I will double check the usage and remove it is not used at all. > src/hotspot/share/gc/shenandoah/shenandoahReferenceProcessor.cpp line 337: > >> 335: // drop the reference. >> 336: if (type == REF_PHANTOM) { >> 337: return heap->complete_marking_context(referent_region)->is_marked(raw_referent); > > Doesn't the assert down at line 350 also need `complete_marking_context` ? Same at line 441. May be comb through all of these to determine which we need for proper assertion checking? > > I'd start by documenting the semantics of the APIs clearly. I am not completely clear on that yet (pun not intended :-) Yes, the assert at line 350 should use complete_marking_context, I have update it in the fix of the issue we found in stress test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23886#issuecomment-2702079928 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1982190515 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1982188076 From xpeng at openjdk.org Wed Mar 5 21:50:09 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 5 Mar 2025 21:50:09 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v2] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 23:11:20 GMT, Y. Srinivas Ramakrishna wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Always use active_generation()->complete_marking_context() during reference processing > > src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 737: > >> 735: public: >> 736: inline ShenandoahMarkingContext* complete_marking_context(ShenandoahHeapRegion* region) const; >> 737: inline ShenandoahMarkingContext* marking_context() const; > > Should document semantics of both methods, please! I'll add some comments for both. 
Also I'm feel the assert is not enough, I feel we should change the `assert` in complete_marking_context to `guarantee`, should be something like: guarantee(is_mark_complete(), "Marking must be completed."); return ShenandoahHeap::heap()->marking_context(); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1982236158 From xpeng at openjdk.org Wed Mar 5 21:54:08 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 5 Mar 2025 21:54:08 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v3] In-Reply-To: References: Message-ID: <1fKMcwPJFREZry2kJf0Vv3DoY5G4xzbdVJcK4It9hyo=.9a38f089-86c6-4fc9-abeb-a807284be822@github.com> > With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. > > This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] Tier 1 > - [x] Tier 2 Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: - Remove obsolete code comments - Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23886/files - new: https://git.openjdk.org/jdk/pull/23886/files/465deaec..c78f66ee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23886&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23886&range=01-02 Stats: 9 lines in 4 files changed: 2 ins; 7 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23886/head:pull/23886 PR: https://git.openjdk.org/jdk/pull/23886 From xpeng at openjdk.org Wed Mar 5 22:02:02 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 5 Mar 2025 22:02:02 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v3] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 01:33:26 GMT, Y. Srinivas Ramakrishna wrote: >> Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove obsolete code comments >> - Address review comments > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1028: > >> 1026: >> 1027: #ifdef ASSERT >> 1028: ShenandoahMarkingContext* const ctx = _heap->marking_context(); > > Why not this instead? > > ShenandoahMarkingContext* const ctx = _heap->marking_context(r); Technically there is only one global marking context for Shenandoah, even in generational mode, passing the region to marking_context doesn't make any difference. But in the method `complete_marking_context(r)`, it checks if the affiliated generation has complete marking, it is a more convenient version of `complete_marking_context(affiliation)`. 
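The distinction being discussed here, one shared mark bitmap but a per-generation notion of "marking is complete", can be sketched with a couple of toy types. This is only a model to make the API contract explicit; the real ShenandoahMarkingContext and ShenandoahGeneration classes carry far more state, and the names below are invented for the sketch.

    #include <cassert>
    #include <vector>

    // Toy stand-ins for the real Shenandoah types.
    struct MarkingContext {
      std::vector<bool> bits;   // one bit per heap word in the real thing
    };

    struct Generation {
      MarkingContext* ctx;      // all generations share one context/bitmap
      bool mark_complete = false;

      void set_mark_complete()   { mark_complete = true; }
      void set_mark_incomplete() { mark_complete = false; }

      // Always legal: callers must know marking may still be in progress.
      MarkingContext* marking_context() const { return ctx; }

      // Only legal once marking for *this* generation has finished, so that
      // "unmarked" can safely be read as "dead".
      MarkingContext* complete_marking_context() const {
        assert(mark_complete && "marking must be complete for this generation");
        return ctx;
      }
    };

    int main() {
      MarkingContext shared;
      Generation young{&shared}, old_gen{&shared};

      young.set_mark_complete();
      (void)young.complete_marking_context();   // fine: young marking finished
      (void)old_gen.marking_context();          // fine: weaker guarantee
      // old_gen.complete_marking_context();    // would assert: old marking not done
      return 0;
    }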
> src/hotspot/share/gc/shenandoah/shenandoahMarkingContext.hpp line 88: > >> 86: bool is_bitmap_range_within_region_clear(const HeapWord* start, const HeapWord* end) const; >> 87: >> 88: bool is_complete(); > > Add a 1-line documentation comment for this method. is_complete is not used in any place, I removed it in the new version. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1982247904 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1982248805 From stefank at openjdk.org Thu Mar 6 15:01:59 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 6 Mar 2025 15:01:59 GMT Subject: RFR: 8350572: ZGC: Enhance z_verify_safepoints_are_blocked interactions with VMError In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 11:15:52 GMT, Axel Boldt-Christmas wrote: > If VMError reporting is triggered from a disallowed thread state `z_verify_safepoints_are_blocked` will cause reentrant assertions to be triggered, when for example when loading the thread oop as part of thread printing. This extends the verification to be ignored if triggered from the thread doing the error reporting. In most cases performing the load barriers from disallowed thread states during error reporting will work. > > Testing: > - tier 1 Oracle supported platforms > - GHA Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23820#pullrequestreview-2664717938 From stefank at openjdk.org Thu Mar 6 15:12:57 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 6 Mar 2025 15:12:57 GMT Subject: RFR: 8350851: ZGC: Reduce size of ZAddressOffsetMax scaling data structures In-Reply-To: References: Message-ID: On Fri, 28 Feb 2025 18:08:57 GMT, Abdelhak Zaaim wrote: >> ZAddressOffsetMax is used to scale a few of our BitMap and GranuleMap data structures. ZAddressOffsetMax is initialised to an upper limit, prior to reserving the virtual address space for the heap. After the reservation, the largest address offset that can be encountered may be much lower. >> >> I propose we scale ZAddressOffsetMax down after our heap reservation is complete, to the actual max value an zoffset_end is allowed to be. >> >> Doing this gives us two benefits. Firstly the assertions and type checks will be stricter, and will exercise code paths that otherwise only occur when using a 16TB heap. Secondly we can reduce the size of the data structures which scale with ZAddressOffsetMax. (For most OSs the extra memory of these data structures do not really matter as they are not page'd in. But they are accounted for both on the OS, allocator and NMT layers). >> >> The page table, uses ZIndexDistributor to iterate and distribute indices. The different strategies have different requirements on the alignment of the size of the range it distribute across. My proposed implementation simply aligns up the page table size to this alignment requirement. As it is the least intrusive change, at the cost of some larger data structure than strictly required. The alternative would be to extend ZIndexDistributor with support for any alignment on the range, or condition the use of the distributed indices based on if they are less than the size. >> >> The data structures can also be larger than required if we fail to reserve the heap starting at our heap base. However this is a very rare occurrence, and while it would be nice to extend our asserts to check for a "ZAddressOffsetMin", I'll leave that for a future enhancement. 
>> >> Testing: >> * ZGC specific tasks, tier 1 through tier 8 on Oracle Supported platforms >> * with `ZIndexDistributorStrategy=0`, and >> * with `ZIndexDistributorStrategy=1` >> * GHA > > Marked as reviewed by abdelhak-zaaim at github.com (no known OpenJDK username). I keep seeing reviews from @abdelhak-zaaim in various non-trivial changes in various areas of OpenJDK. Could you introduce yourself so that we know more about who you are? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23822#issuecomment-2704126820 From stefank at openjdk.org Thu Mar 6 15:33:59 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 6 Mar 2025 15:33:59 GMT Subject: RFR: 8333578: Fix uses of overaligned types induced by ZCACHE_ALIGNED In-Reply-To: References: Message-ID: <6d8LMxL6w_3iajfsCf5c7dU9P2knBCs8DEMIsLInoro=.da23a426-963b-4834-943b-c909fab51754@github.com> On Tue, 4 Mar 2025 07:49:23 GMT, Axel Boldt-Christmas wrote: > The only directly heap allocated, constructed object of types that are overaligned because of ZCACHE_ALIGNED is ZCollectedHeap. The other are either in static storage or contained in (and constructed as part of) ZCollectedHeap. So we only need to fix ZCollectedHeap allocation. > > As the CollectedHeap is only ever created once and is never destroyed, we can simply align the allocation and create an unfreeable pointer. > > This implementation imposes that `ZCacheLineSize` is a power of two, but we already have this requirement elsewhere (e.g. `ZContendedStorage`). > > Testing: > * tier 1 through tier 5 Oracle supported platforms > * GHA Marked as reviewed by stefank (Reviewer). src/hotspot/share/gc/z/zArguments.cpp line 223: > 221: > 222: CollectedHeap* ZArguments::create_heap() { > 223: // ZCollectedHeap has an alignment >= ZCacheLineSize, which may be larger than The `ZCollectedHeap has an alignment >= ZCacheLineSize,` had me reading the sentence a couple of times. I think it would be easier to read if this used words instead of >=. src/hotspot/share/gc/z/zArguments.cpp line 225: > 223: // ZCollectedHeap has an alignment >= ZCacheLineSize, which may be larger than > 224: // std::max_align_t. Instead of using operator new, align the storage manually > 225: // and construct the ZCollectedHeap using operator placement new Suggestion: // and construct the ZCollectedHeap using operator placement new. 
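The zArguments.cpp comment discussed in the last two review items describes a general C++ pattern: when a type's alignment exceeds what plain operator new guarantees, over-allocate raw storage, align the pointer by hand, and construct the object there with placement new. A generic, hedged sketch of that pattern, using an invented CacheAligned type instead of ZCollectedHeap, could look like this:

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <cstdlib>
    #include <new>

    constexpr size_t kCacheLineSize = 64;   // stand-in for a cache-line-sized alignment

    struct alignas(kCacheLineSize) CacheAligned {
      int counter = 0;
    };

    // Allocate enough raw storage to find an aligned slot inside it, then
    // construct the object there with placement new. The object is created
    // once and never destroyed in this sketch (mirroring a heap that lives for
    // the whole VM lifetime), so the unaligned pointer is not kept around.
    CacheAligned* create_cache_aligned() {
      void* raw = std::malloc(sizeof(CacheAligned) + kCacheLineSize - 1);
      assert(raw != nullptr);
      uintptr_t aligned = (reinterpret_cast<uintptr_t>(raw) + kCacheLineSize - 1)
                          & ~uintptr_t(kCacheLineSize - 1);
      return ::new (reinterpret_cast<void*>(aligned)) CacheAligned();
    }

    int main() {
      CacheAligned* obj = create_cache_aligned();
      assert(reinterpret_cast<uintptr_t>(obj) % kCacheLineSize == 0);
      obj->counter++;
      return 0;
    }

The manual alignment step is what avoids relying on operator new honoring alignments above std::max_align_t, which is exactly the situation the quoted comment is warning about.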
------------- PR Review: https://git.openjdk.org/jdk/pull/23885#pullrequestreview-2664809137 PR Review Comment: https://git.openjdk.org/jdk/pull/23885#discussion_r1983571395 PR Review Comment: https://git.openjdk.org/jdk/pull/23885#discussion_r1983568901 From tschatzl at openjdk.org Thu Mar 6 15:39:57 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 6 Mar 2025 15:39:57 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v11] In-Reply-To: References: Message-ID: <4um7PHAs89PIoa3QgbkPx-8Jx9vHiYr7afFQGOtFTY8=.f1ca8bad-0827-4f8c-852d-0fc82ffd546a@github.com> On Tue, 4 Mar 2025 15:33:29 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> iwalulya review >> * comments for variables tracking to-collection-set and just dirtied cards after GC/refinement >> * predicate for determining whether the refinement has been disabled >> * some other typos/comment improvements >> * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with naming > > src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp line 219: > >> 217: // The young gen revising mechanism reads the predictor and the values set >> 218: // here. Avoid inconsistencies by locking. >> 219: MutexLocker x(G1RareEvent_lock, Mutex::_no_safepoint_check_flag); > > Who else can be in this critical-section? I don't get what this lock is protecting us from. Actually further discussion with @albertnetymk showed that this change introduces an unintended behaviorial change where since the refinement control thread is also responsible for updating the current young gen length. It means that the mutex isn't required. However this means that while the refinement is running this is not done any more, because refinement can take seconds, I need to move this work to another thread (probably the `G1ServiceThread?). I will add a separate mutex then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1983587293 From tschatzl at openjdk.org Thu Mar 6 16:13:02 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 6 Mar 2025 16:13:02 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v13] In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 10:41:02 GMT, Ivan Walulya wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * fix whitespace >> * additional whitespace between log tags >> * rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the other similar rename > > src/hotspot/share/gc/g1/g1ThreadLocalData.hpp line 29: > >> 27: #include "gc/g1/g1BarrierSet.hpp" >> 28: #include "gc/g1/g1CardTable.hpp" >> 29: #include "gc/g1/g1CollectedHeap.hpp" > > probably does not need to be included `g1CardTable.hpp` needed because of `G1CardTable::CardValue` I think. I removed the 'G1CollectedHeap` include though. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1983655594 From tschatzl at openjdk.org Thu Mar 6 16:26:31 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 6 Mar 2025 16:26:31 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v14] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... 
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * iwalulya review * renaming * fix some includes, forward declaration ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/a457e6e7..350a4fa3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=12-13 Stats: 31 lines in 13 files changed: 1 ins; 2 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From wkemper at openjdk.org Thu Mar 6 17:59:00 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 6 Mar 2025 17:59:00 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v3] In-Reply-To: <1fKMcwPJFREZry2kJf0Vv3DoY5G4xzbdVJcK4It9hyo=.9a38f089-86c6-4fc9-abeb-a807284be822@github.com> References: <1fKMcwPJFREZry2kJf0Vv3DoY5G4xzbdVJcK4It9hyo=.9a38f089-86c6-4fc9-abeb-a807284be822@github.com> Message-ID: On Wed, 5 Mar 2025 21:54:08 GMT, Xiaolong Peng wrote: >> With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. >> >> This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] Tier 1 >> - [x] Tier 2 > > Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: > > - Remove obsolete code comments > - Address review comments If we always get the complete marking context directly through the generation, we can delete `ShenandoahHeap::complete_marking_context`. src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 123: > 121: #ifdef ASSERT > 122: bool reg_live = region->has_live(); > 123: bool bm_live = heap->complete_marking_context(region)->is_marked(cast_to_oop(region->bottom())); Could also use `heap->active_generation()->complete_marking_context()` here. src/hotspot/share/gc/shenandoah/shenandoahGenerationalEvacuationTask.cpp line 172: > 170: // contained herein. > 171: void ShenandoahGenerationalEvacuationTask::promote_in_place(ShenandoahHeapRegion* region) { > 172: ShenandoahMarkingContext* const marking_context = _heap->complete_marking_context(region); We shouldn't need to look up the generation for this region. It's being promoted so it must be young (in fact, this asserted a few lines down). Perhaps: assert(_heap->young_generation()->is_mark_completed(), "Cannot promote without complete marking for young"); ShenandoahMarkingContext* const marking_context = _heap->marking_context(); ------------- Changes requested by wkemper (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23886#pullrequestreview-2665222915 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1983818301 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1983812569 From wkemper at openjdk.org Thu Mar 6 17:59:01 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 6 Mar 2025 17:59:01 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v3] In-Reply-To: References: <1fKMcwPJFREZry2kJf0Vv3DoY5G4xzbdVJcK4It9hyo=.9a38f089-86c6-4fc9-abeb-a807284be822@github.com> Message-ID: On Thu, 6 Mar 2025 17:49:35 GMT, William Kemper wrote: >> Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove obsolete code comments >> - Address review comments > > src/hotspot/share/gc/shenandoah/shenandoahGenerationalEvacuationTask.cpp line 172: > >> 170: // contained herein. >> 171: void ShenandoahGenerationalEvacuationTask::promote_in_place(ShenandoahHeapRegion* region) { >> 172: ShenandoahMarkingContext* const marking_context = _heap->complete_marking_context(region); > > We shouldn't need to look up the generation for this region. It's being promoted so it must be young (in fact, this asserted a few lines down). Perhaps: > > assert(_heap->young_generation()->is_mark_completed(), "Cannot promote without complete marking for young"); > ShenandoahMarkingContext* const marking_context = _heap->marking_context(); or `_heap->young_generation()->complete_marking_context()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1983821706 From xpeng at openjdk.org Thu Mar 6 18:29:59 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 6 Mar 2025 18:29:59 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v3] In-Reply-To: References: <1fKMcwPJFREZry2kJf0Vv3DoY5G4xzbdVJcK4It9hyo=.9a38f089-86c6-4fc9-abeb-a807284be822@github.com> Message-ID: <4q52xc9nKJWFe63AT5i4InyJuRu6pTPahZYmmWTJia4=.f7be6d2d-2082-4644-b6e9-dff343b20cdf@github.com> On Thu, 6 Mar 2025 17:55:53 GMT, William Kemper wrote: > If we always get the complete marking context directly through the generation, we can delete `ShenandoahHeap::complete_marking_context`. True, we don't really need it anymore, I'll update the PR and remove it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23886#issuecomment-2704629400 From xpeng at openjdk.org Thu Mar 6 18:30:00 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 6 Mar 2025 18:30:00 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v3] In-Reply-To: References: <1fKMcwPJFREZry2kJf0Vv3DoY5G4xzbdVJcK4It9hyo=.9a38f089-86c6-4fc9-abeb-a807284be822@github.com> Message-ID: On Thu, 6 Mar 2025 17:56:31 GMT, William Kemper wrote: >> src/hotspot/share/gc/shenandoah/shenandoahGenerationalEvacuationTask.cpp line 172: >> >>> 170: // contained herein. >>> 171: void ShenandoahGenerationalEvacuationTask::promote_in_place(ShenandoahHeapRegion* region) { >>> 172: ShenandoahMarkingContext* const marking_context = _heap->complete_marking_context(region); >> >> We shouldn't need to look up the generation for this region. It's being promoted so it must be young (in fact, this asserted a few lines down). 
Perhaps: >> >> assert(_heap->young_generation()->is_mark_completed(), "Cannot promote without complete marking for young"); >> ShenandoahMarkingContext* const marking_context = _heap->marking_context(); > > or `_heap->young_generation()->complete_marking_context()`. I think `_heap->young_generation()->complete_marking_context()` is better here, I'll update it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1983859657 From xpeng at openjdk.org Thu Mar 6 18:34:43 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 6 Mar 2025 18:34:43 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: References: Message-ID: > With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. > > This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. > > ### Test > - [x] hotspot_gc_shenandoah > - [x] Tier 1 > - [x] Tier 2 Xiaolong Peng has updated the pull request incrementally with three additional commits since the last revision: - Remove ShenandoahHeap::complete_marking_context(ShenandoahHeapRegion* region) - Revert "complete_marking_context should guarantee mark is complete" This reverts commit 2004973965ea0e617cf9e5fc45be24f0e06e90a1. - complete_marking_context should guarantee mark is complete ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23886/files - new: https://git.openjdk.org/jdk/pull/23886/files/c78f66ee..952f7ea5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23886&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23886&range=02-03 Stats: 9 lines in 5 files changed: 0 ins; 6 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23886/head:pull/23886 PR: https://git.openjdk.org/jdk/pull/23886 From wkemper at openjdk.org Thu Mar 6 18:47:58 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 6 Mar 2025 18:47:58 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 18:34:43 GMT, Xiaolong Peng wrote: >> With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. 
This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. >> >> This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. >> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] Tier 1 >> - [x] Tier 2 > > Xiaolong Peng has updated the pull request incrementally with three additional commits since the last revision: > > - Remove ShenandoahHeap::complete_marking_context(ShenandoahHeapRegion* region) > - Revert "complete_marking_context should guarantee mark is complete" > > This reverts commit 2004973965ea0e617cf9e5fc45be24f0e06e90a1. > - complete_marking_context should guarantee mark is complete Thanks for cleaning this up. ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23886#pullrequestreview-2665341497 From xpeng at openjdk.org Thu Mar 6 20:23:00 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 6 Mar 2025 20:23:00 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 23:38:47 GMT, Y. Srinivas Ramakrishna wrote: > If so, can we dispense with the forcing of global context's state into the contexts for the two generations? I think we can do that if we deprecated the classical mode and only support generational Shenandoah, in classical mode, there is only global generation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1983998709 From ysr at openjdk.org Thu Mar 6 22:57:07 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 6 Mar 2025 22:57:07 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 18:34:43 GMT, Xiaolong Peng wrote: >> With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed. >> >> This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API. 
>> >> ### Test >> - [x] hotspot_gc_shenandoah >> - [x] Tier 1 >> - [x] Tier 2 > > Xiaolong Peng has updated the pull request incrementally with three additional commits since the last revision: > > - Remove ShenandoahHeap::complete_marking_context(ShenandoahHeapRegion* region) > - Revert "complete_marking_context should guarantee mark is complete" > > This reverts commit 2004973965ea0e617cf9e5fc45be24f0e06e90a1. > - complete_marking_context should guarantee mark is complete A few more comments, mostly pertaining to global gen's "complete" marking context semantics and usage, as well as `SH::[*_]marking_context` delegating to its `active_generation()`'s method. This should be my last round of comments. Thank you for your patience... src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 123: > 121: #ifdef ASSERT > 122: bool reg_live = region->has_live(); > 123: bool bm_live = heap->active_generation()->complete_marking_context()->is_marked(cast_to_oop(region->bottom())); Apropos of another comment, if we really want to keep a delegating method in `ShenandoahHeap`, why not use `heap->complete_marking_context()` as a synonym for `heap->active_generation()->complete_marking_context()` ? src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 1: > 1: /* This all looks good. One thing to think about in general about assertions in closures is whether instead of making use of knowledge of the context in which these closures are used, whether it may produce more mantianable code to embed the "active_generation" to which the closure is being applied in the closure itself and have the assertions (or other uses of context) use that instead. Nothing to be done now, but something to think about in making more maintainable code. src/hotspot/share/gc/shenandoah/shenandoahGenerationalEvacuationTask.cpp line 172: > 170: // contained herein. > 171: void ShenandoahGenerationalEvacuationTask::promote_in_place(ShenandoahHeapRegion* region) { > 172: ShenandoahMarkingContext* const marking_context = _heap->young_generation()->complete_marking_context(); For clarity, you might assert the following before line 172: assert(gc_generation() == _heap->young_generation(), "Sanity check"); Even though it might seem somewhat tautological. src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1: > 1: /* See previous suggestion/comments on `SH::[complete_]marking_context()` as delegating to that method of its `gc_generation()`. What you have here sounds fine too, but a uniform usage of either keeping `SH::[complete_]marking_context()` or not at all makes more sense to me, and seems cleaner to me. src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1283: > 1281: if (_heap->gc_generation()->is_global()) { > 1282: return _heap->marking_context(); > 1283: } Not sure I understand the point of this change in behavior. What purpose does a partial marking context serve? Why not just leave the behavior as was before and return a non-null marking context only when marking is complete and null otherwise. When the client uses the context, it does so to skip over unmarked objects (which are dead if marking is complete), which might end up being too weak if we are still in the midst of marking. I realize that you may not be maintaining a global mark completion so you are returning the marking context irrespective of the state of completion of the marking, but I wonder if that is really the bahavior you want. 
I would rather, as necessary, we maintain a flag for completion of global marking for the case where we are doing a global gc? ------------- PR Review: https://git.openjdk.org/jdk/pull/23886#pullrequestreview-2665674435 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984134275 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984075136 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984115881 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984140131 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984108547 From ysr at openjdk.org Thu Mar 6 22:57:09 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 6 Mar 2025 22:57:09 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: References: Message-ID: <7yfWKXewUM1XqWtlnyuPV3nu9bGr5VNJXuXi1aNQGvQ=.4c53d85b-13f3-4bfc-87c3-634d547bb440@github.com> On Wed, 5 Mar 2025 21:58:03 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1028: >> >>> 1026: >>> 1027: #ifdef ASSERT >>> 1028: ShenandoahMarkingContext* const ctx = _heap->marking_context(); >> >> Why not this instead? >> >> ShenandoahMarkingContext* const ctx = _heap->marking_context(r); > > Technically there is only one global marking context for Shenandoah, even in generational mode, passing the region to marking_context doesn't make any difference. > > But in the method `complete_marking_context(r)`, it checks if the affiliated generation has complete marking, it is a more convenient version of `complete_marking_context(affiliation)`. OK, yes, that makes sense. Why not then use both `ShenandoahHeap::[complete_]marking_context()` as synonyms for `ShehandoahHeap::active_generation()->[complete_]marking_context()`. See other related comments in this review round. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984136546 From ysr at openjdk.org Thu Mar 6 22:57:09 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 6 Mar 2025 22:57:09 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: References: Message-ID: <8w22oUPhZEx0iEIeNQ-GUUjx8jNkjXrTHjfjN_sX4HE=.2c391dd5-227e-4755-ba4d-528a7dcefca3@github.com> On Thu, 6 Mar 2025 20:20:17 GMT, Xiaolong Peng wrote: >> OK, I see that `ShenandoahGlobalGeneration` forces the state of `ShenandoahOdGeneration` and `ShenandoahYoungGeneration`, but is that our intention? I am seeing (see comment elsewhere) that we are always either using global generation's marking context explicitly, or using a region to index into the appropriate containing generation's marking context. If so, can we dispense with the forcing of global context's state into the contexts for the two generations? > >> If so, can we dispense with the forcing of global context's state into the contexts for the two generations? > > I think we can do that if we deprecated the classical mode and only support generational Shenandoah, in classical mode, there is only global generation. I am not sure I follow. In the legacy (non-generational mode) we shouldn't care about the marking state of the old and young generations, just that of the GlobalGeneration. In the generational case, we explicitly track the marking state of the old and young generations explicitly. 
It sounds to me as if forcing the Old and Young marking states to the state of that of the GlobalGeneration must be exactly for the case where we are using Generational Shenandoah, and we are doing a Global collection? Indeed: void ShenandoahGlobalGeneration::set_mark_complete() { ShenandoahGeneration::set_mark_complete(); if (ShenandoahHeap::heap()->mode()->is_generational()) { ShenandoahGenerationalHeap* heap = ShenandoahGenerationalHeap::heap(); heap->young_generation()->set_mark_complete(); heap->old_generation()->set_mark_complete(); } } I am saying that each of Old, Young, and Global generations maintain their own mark completion state and use that to determine what they pass back in response to `complete_marking_context()`. This completely localizes all state rather than unnecessarily and confusingly coupling the states of these generations. So, you remove the part in the `if` branch in the code above, which reduces to the default (or rather only) implementation in the base class, not requiring the over-ride of the Global generation's method for the generational case. void ShenandoahGeneration::set_mark_complete() { _is_marking_complete.set(); } It is possible that I am still missing the actual structure here that requires the override for GlobalGeneration for the generational case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984121357 From xpeng at openjdk.org Thu Mar 6 23:12:53 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 6 Mar 2025 23:12:53 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: <7yfWKXewUM1XqWtlnyuPV3nu9bGr5VNJXuXi1aNQGvQ=.4c53d85b-13f3-4bfc-87c3-634d547bb440@github.com> References: <7yfWKXewUM1XqWtlnyuPV3nu9bGr5VNJXuXi1aNQGvQ=.4c53d85b-13f3-4bfc-87c3-634d547bb440@github.com> Message-ID: On Thu, 6 Mar 2025 22:27:59 GMT, Y. Srinivas Ramakrishna wrote: >> Technically there is only one global marking context for Shenandoah, even in generational mode, passing the region to marking_context doesn't make any difference. >> >> But in the method `complete_marking_context(r)`, it checks if the affiliated generation has complete marking, it is a more convenient version of `complete_marking_context(affiliation)`. > > OK, yes, that makes sense. Why not then use both `ShenandoahHeap::[complete_]marking_context()` as synonyms for `ShehandoahHeap::active_generation()->[complete_]marking_context()`. See other related comments in this review round. I feel using `henandoahHeap::complete_marking_context()` as synonyms for `ShehandoahHeap::active_generation()->[complete_]marking_context()` may cause more confusion, just read from the name it seems that it indicates the marking is complete for the whole heap, not just the active generation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984170738 From xpeng at openjdk.org Thu Mar 6 23:29:55 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 6 Mar 2025 23:29:55 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: <8w22oUPhZEx0iEIeNQ-GUUjx8jNkjXrTHjfjN_sX4HE=.2c391dd5-227e-4755-ba4d-528a7dcefca3@github.com> References: <8w22oUPhZEx0iEIeNQ-GUUjx8jNkjXrTHjfjN_sX4HE=.2c391dd5-227e-4755-ba4d-528a7dcefca3@github.com> Message-ID: On Thu, 6 Mar 2025 22:11:26 GMT, Y. Srinivas Ramakrishna wrote: >>> If so, can we dispense with the forcing of global context's state into the contexts for the two generations? 
>> >> I think we can do that if we deprecated the classical mode and only support generational Shenandoah, in classical mode, there is only global generation. > > I am not sure I follow. In the legacy (non-generational mode) we shouldn't care about the marking state of the old and young generations, just that of the GlobalGeneration. In the generational case, we explicitly track the marking state of the old and young generations explicitly. It sounds to me as if forcing the Old and Young marking states to the state of that of the GlobalGeneration must be exactly for the case where we are using Generational Shenandoah, and we are doing a Global collection? Indeed: > > > void ShenandoahGlobalGeneration::set_mark_complete() { > ShenandoahGeneration::set_mark_complete(); > if (ShenandoahHeap::heap()->mode()->is_generational()) { > ShenandoahGenerationalHeap* heap = ShenandoahGenerationalHeap::heap(); > heap->young_generation()->set_mark_complete(); > heap->old_generation()->set_mark_complete(); > } > } > > > I am saying that each of Old, Young, and Global generations maintain their own mark completion state and use that to determine what they pass back in response to `complete_marking_context()`. This completely localizes all state rather than unnecessarily and confusingly coupling the states of these generations. > > So, you remove the part in the `if` branch in the code above, which reduces to the default (or rather only) implementation in the base class, not requiring the over-ride of the Global generation's method for the generational case. > > > void ShenandoahGeneration::set_mark_complete() { > _is_marking_complete.set(); > } > > > It is possible that I am still missing the actual structure here that requires the override for GlobalGeneration for the generational case. Sorry I misunderstood your original proposal, I thought you meant to suggest to remove the flag from ShenandoahGlobalGeneration, instead the set_mark_complete/is_mark_complete will more like view/delegation layer like: void ShenandoahGlobalGeneration::set_mark_complete() { ShenandoahGenerationalHeap* heap = ShenandoahGenerationalHeap::heap(); heap->young_generation()->set_mark_complete(); heap->old_generation()->set_mark_complete(); } bool ShenandoahGlobalGeneration::is_mark_complete() { ShenandoahGenerationalHeap* heap = ShenandoahGenerationalHeap::heap(); return heap->young_generation()->is_mark_complete() && heap->old_generation()->is_mark_complete(); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984182699 From xpeng at openjdk.org Thu Mar 6 23:36:55 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 6 Mar 2025 23:36:55 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: References: Message-ID: <9nhUQ5sIaBFGlhEh-w5J-TAQMAbp3dWUiSRfMRoK2rY=.9fd2e8bc-6a53-4385-9e7b-1b0d36a91a8d@github.com> On Thu, 6 Mar 2025 22:05:31 GMT, Y. Srinivas Ramakrishna wrote: >> Xiaolong Peng has updated the pull request incrementally with three additional commits since the last revision: >> >> - Remove ShenandoahHeap::complete_marking_context(ShenandoahHeapRegion* region) >> - Revert "complete_marking_context should guarantee mark is complete" >> >> This reverts commit 2004973965ea0e617cf9e5fc45be24f0e06e90a1. >> - complete_marking_context should guarantee mark is complete > > src/hotspot/share/gc/shenandoah/shenandoahGenerationalEvacuationTask.cpp line 172: > >> 170: // contained herein. 
>> 171: void ShenandoahGenerationalEvacuationTask::promote_in_place(ShenandoahHeapRegion* region) { >> 172: ShenandoahMarkingContext* const marking_context = _heap->young_generation()->complete_marking_context(); > > For clarity, you might assert the following before line 172: > > assert(gc_generation() == _heap->young_generation(), "Sanity check"); > > > Even though it might seem somewhat tautological. Thanks, I'll add it. > src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1283: > >> 1281: if (_heap->gc_generation()->is_global()) { >> 1282: return _heap->marking_context(); >> 1283: } > > Not sure I understand the point of this change in behavior. What purpose does a partial marking context serve? Why not just leave the behavior as was before and return a non-null marking context only when marking is complete and null otherwise. When the client uses the context, it does so to skip over unmarked objects (which are dead if marking is complete), which might end up being too weak if we are still in the midst of marking. > > I realize that you may not be maintaining a global mark completion so you are returning the marking context irrespective of the state of completion of the marking, but I wonder if that is really the bahavior you want. I would rather, as necessary, we maintain a flag for completion of global marking for the case where we are doing a global gc? It is confusing that it looks like a behavior change, but actually there is no behavior change in this method, all the change here is to make the behavior of this method to be exactly same a before. The old impl always return the the marking context, regardless the completeness status of marking, because the old `_heap->complete_marking_context()` always return w/o assert error due to inaccurate completeness marking status in the marking context, we are fixing the issue in this PR which breaks the old impl of this method. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984188328 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984188121 From xpeng at openjdk.org Thu Mar 6 23:48:53 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 6 Mar 2025 23:48:53 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: References: <8w22oUPhZEx0iEIeNQ-GUUjx8jNkjXrTHjfjN_sX4HE=.2c391dd5-227e-4755-ba4d-528a7dcefca3@github.com> Message-ID: On Thu, 6 Mar 2025 23:26:16 GMT, Xiaolong Peng wrote: >> I am not sure I follow. In the legacy (non-generational mode) we shouldn't care about the marking state of the old and young generations, just that of the GlobalGeneration. In the generational case, we explicitly track the marking state of the old and young generations explicitly. It sounds to me as if forcing the Old and Young marking states to the state of that of the GlobalGeneration must be exactly for the case where we are using Generational Shenandoah, and we are doing a Global collection? 
Indeed: >> >> >> void ShenandoahGlobalGeneration::set_mark_complete() { >> ShenandoahGeneration::set_mark_complete(); >> if (ShenandoahHeap::heap()->mode()->is_generational()) { >> ShenandoahGenerationalHeap* heap = ShenandoahGenerationalHeap::heap(); >> heap->young_generation()->set_mark_complete(); >> heap->old_generation()->set_mark_complete(); >> } >> } >> >> >> I am saying that each of Old, Young, and Global generations maintain their own mark completion state and use that to determine what they pass back in response to `complete_marking_context()`. This completely localizes all state rather than unnecessarily and confusingly coupling the states of these generations. >> >> So, you remove the part in the `if` branch in the code above, which reduces to the default (or rather only) implementation in the base class, not requiring the over-ride of the Global generation's method for the generational case. >> >> >> void ShenandoahGeneration::set_mark_complete() { >> _is_marking_complete.set(); >> } >> >> >> It is possible that I am still missing the actual structure here that requires the override for GlobalGeneration for the generational case. > > Sorry I misunderstood your original proposal, I thought you meant to suggest to remove the flag from ShenandoahGlobalGeneration, instead the set_mark_complete/is_mark_complete will more like view/delegation layer like: > > void ShenandoahGlobalGeneration::set_mark_complete() { > ShenandoahGenerationalHeap* heap = ShenandoahGenerationalHeap::heap(); > heap->young_generation()->set_mark_complete(); > heap->old_generation()->set_mark_complete(); > } > > bool ShenandoahGlobalGeneration::is_mark_complete() { > ShenandoahGenerationalHeap* heap = ShenandoahGenerationalHeap::heap(); > return heap->young_generation()->is_mark_complete() && heap->old_generation()->is_mark_complete(); > } You proposal will make the impl of the set_mark_complete/is_mark_complete of ShenandoahGeneration cleaner, but the thing is it will change current design and behavior, we may have to update the code where there methods is called, e.g. when we call `set_mark_complete` of gc_generation/active_generation, if it is global generation, we may have to explicitly call the same methods of ShenandoahYoungGeneration and ShenandoahOldGeneration to fan out the status. How about I follow up it in a separate task and update the implementation if necessary? I want to limit the changes involved in this PR, and only fix the bug. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984196615 From xpeng at openjdk.org Thu Mar 6 23:54:54 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 6 Mar 2025 23:54:54 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 21:26:22 GMT, Y. Srinivas Ramakrishna wrote: >> Xiaolong Peng has updated the pull request incrementally with three additional commits since the last revision: >> >> - Remove ShenandoahHeap::complete_marking_context(ShenandoahHeapRegion* region) >> - Revert "complete_marking_context should guarantee mark is complete" >> >> This reverts commit 2004973965ea0e617cf9e5fc45be24f0e06e90a1. >> - complete_marking_context should guarantee mark is complete > > src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 1: > >> 1: /* > > This all looks good. 
> One thing to think about in general about assertions in closures is whether instead of making use of knowledge of the context in which these closures are used, whether it may produce more maintainable code to embed the "active_generation" to which the closure is being applied in the closure itself and have the assertions (or other uses of context) use that instead. Nothing to be done now, but something to think about in making more maintainable code. Right, active_generation should be used instead of global_generation to get the complete marking context. In the context of full GC, even though we know the active_generation is the global generation, it is better not to use global_generation directly, for more maintainable code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984201726 From xpeng at openjdk.org Thu Mar 6 23:57:54 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 6 Mar 2025 23:57:54 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: <9nhUQ5sIaBFGlhEh-w5J-TAQMAbp3dWUiSRfMRoK2rY=.9fd2e8bc-6a53-4385-9e7b-1b0d36a91a8d@github.com> References: <9nhUQ5sIaBFGlhEh-w5J-TAQMAbp3dWUiSRfMRoK2rY=.9fd2e8bc-6a53-4385-9e7b-1b0d36a91a8d@github.com> Message-ID: On Thu, 6 Mar 2025 23:34:21 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahGenerationalEvacuationTask.cpp line 172: >> >>> 170: // contained herein. >>> 171: void ShenandoahGenerationalEvacuationTask::promote_in_place(ShenandoahHeapRegion* region) { >>> 172: ShenandoahMarkingContext* const marking_context = _heap->young_generation()->complete_marking_context(); >> >> For clarity, you might assert the following before line 172: >> >> assert(gc_generation() == _heap->young_generation(), "Sanity check"); >> >> >> Even though it might seem somewhat tautological. > > Thanks, I'll add it. Question: Does Shenandoah promote regions in global cycles? The gc_generation might be global if so. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984203547 From xpeng at openjdk.org Fri Mar 7 00:14:10 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 7 Mar 2025 00:14:10 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v5] In-Reply-To: References: Message-ID: > With the JEP 404: Generational Shenandoah implementation, generation-specific marking completeness flags were introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause unexpected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw an assert error if the global marking context completeness flag is false, but now it always returns the marking context even if marking is not complete, which may hide bugs where we expect the global/generational marking to be complete. > > This PR fixes the bug in the global marking context completeness flag, and updates all the places using `ShenandoahHeap::complete_marking_context()` to use the proper API.
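As an illustration of the invariant the quoted summary describes, here is a small standalone model; this is not the HotSpot code, and all type, field, and message names below are placeholders:

#include <cassert>

struct ShenandoahMarkingContextModel {};   // stand-in for the real marking context

struct GenerationModel {
  ShenandoahMarkingContextModel* _ctx = nullptr;
  bool _is_marking_complete = false;

  // The invariant being restored: asking for a "complete" marking context while
  // marking has not finished should fail fast in debug builds instead of silently
  // handing back a context that cannot be used as a liveness oracle.
  ShenandoahMarkingContextModel* complete_marking_context() {
    assert(_is_marking_complete && "marking must be complete");
    return _ctx;
  }
};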
> > ### Test > - [x] hotspot_gc_shenandoah > - [x] Tier 1 > - [x] Tier 2 Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23886/files - new: https://git.openjdk.org/jdk/pull/23886/files/952f7ea5..17bcb358 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23886&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23886&range=03-04 Stats: 6 lines in 2 files changed: 1 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23886/head:pull/23886 PR: https://git.openjdk.org/jdk/pull/23886 From xpeng at openjdk.org Fri Mar 7 00:36:53 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 7 Mar 2025 00:36:53 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 23:52:32 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 1: >> >>> 1: /* >> >> This all looks good. >> One thing to think about in general about assertions in closures is whether instead of making use of knowledge of the context in which these closures are used, whether it may produce more mantianable code to embed the "active_generation" to which the closure is being applied in the closure itself and have the assertions (or other uses of context) use that instead. Nothing to be done now, but something to think about in making more maintainable code. > > Right, active_generation should be used instead of global_generation to get the complete marking context, with the context of full GC, even we know it active_generation is the global gen, but it's better not to use global_generation directly for better maintainable code. Updated it to use active_generation. >> src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1283: >> >>> 1281: if (_heap->gc_generation()->is_global()) { >>> 1282: return _heap->marking_context(); >>> 1283: } >> >> Not sure I understand the point of this change in behavior. What purpose does a partial marking context serve? Why not just leave the behavior as was before and return a non-null marking context only when marking is complete and null otherwise. When the client uses the context, it does so to skip over unmarked objects (which are dead if marking is complete), which might end up being too weak if we are still in the midst of marking. >> >> I realize that you may not be maintaining a global mark completion so you are returning the marking context irrespective of the state of completion of the marking, but I wonder if that is really the bahavior you want. I would rather, as necessary, we maintain a flag for completion of global marking for the case where we are doing a global gc? > > It is confusing that it looks like a behavior change, but actually there is no behavior change in this method, all the change here is to make the behavior of this method to be exactly same a before. > > The old impl always return the the marking context, regardless the completeness status of marking, because the old `_heap->complete_marking_context()` always return w/o assert error due to inaccurate completeness marking status in the marking context, we are fixing the issue in this PR which breaks the old impl of this method. 
The method get_marking_context_for_old is called at line 1363 in method `verify_rem_set_before_mark`; as the name indicates, it could be called before marking. If the current gc generation is global, the old marking flag should have been set to false when the global flag is set, which is a bit weird. I'm not sure if we should change it now, but I think we will have to correct/update the impl of this method later when we update the design of the completeness flags of the global/young/old generations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984234928 PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984233610 From xpeng at openjdk.org Fri Mar 7 00:48:56 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 7 Mar 2025 00:48:56 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 22:54:14 GMT, Y. Srinivas Ramakrishna wrote: > A few more comments, mostly pertaining to global gen's "complete" marking context semantics and usage, as well as `SH::[*_]marking_context` delegating to its `active_generation()`'s method. > > This should be my last round of comments. Thank you for your patience... Thanks very much for the reviews. I'll probably not add a method like `SH::complete_marking_context()` delegating to its `SH::active_generation()->complete_marking_context()`, given the other confusion it may cause. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23886#issuecomment-2705261964 From xpeng at openjdk.org Fri Mar 7 01:04:52 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 7 Mar 2025 01:04:52 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: References: <9nhUQ5sIaBFGlhEh-w5J-TAQMAbp3dWUiSRfMRoK2rY=.9fd2e8bc-6a53-4385-9e7b-1b0d36a91a8d@github.com> Message-ID: On Fri, 7 Mar 2025 00:58:15 GMT, Y. Srinivas Ramakrishna wrote: >> Question: Does Shenandoah promote regions in global cycles? The gc_generation might be global if so. > > Good point. I don't see any reason promotions should be verboten in global cycles. cc @earthling-amzn ? > > If that is indeed the case, a clean separation and maintenance of completeness of marking for global generation, and use of `_heap->gc_generation()` would make sense to me. Thanks for the confirmation. I added the assert below, since gc_generation could be global: assert(!_heap->gc_generation()->is_old(), "Sanity check"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984253086 From ysr at openjdk.org Fri Mar 7 01:09:52 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 7 Mar 2025 01:09:52 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: References: Message-ID: On Fri, 7 Mar 2025 00:31:53 GMT, Xiaolong Peng wrote: >> It is confusing that it looks like a behavior change, but actually there is no behavior change in this method, all the change here is to make the behavior of this method to be exactly same a before. >> >> The old impl always return the the marking context, regardless the completeness status of marking, because the old `_heap->complete_marking_context()` always return w/o assert error due to inaccurate completeness marking status in the marking context, we are fixing the issue in this PR which breaks the old impl of this method.
> > The method get_marking_context_for_old is called at line 1363 in method `verify_rem_set_before_mark`, as the name indicates it could be called before mark. > > If current gc generation is global gc, the old marking flag should have set to false when the global flag is set, it is a bit wired I'm not sure if we should change it now, but I think we will have to correct/update the impl of this method later when update the design of completeness flags of global/young/old generations. Here's my thinking: The clients of this method do not want to use an incomplete marking context. We either want to look at all the objects (when marking information is incomplete) or we want complete marking context in which case we will skip over dead objects. Hence my reservation about this change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984255948 From ysr at openjdk.org Fri Mar 7 01:13:02 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 7 Mar 2025 01:13:02 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: References: <9nhUQ5sIaBFGlhEh-w5J-TAQMAbp3dWUiSRfMRoK2rY=.9fd2e8bc-6a53-4385-9e7b-1b0d36a91a8d@github.com> Message-ID: On Fri, 7 Mar 2025 01:02:29 GMT, Xiaolong Peng wrote: >> Good point. I don't see any reason promotions should be verboten in global cycles. cc @earthling-amzn ? >> >> If that is indeed the case, a clean separation and maintenance of completeness of marking for global generation, and use of `_heap->gc_generation()` would make sense to me. > > Thanks for the confirmation, I added assert as below since it gc_generation could be global : > > > assert(!_heap->gc_generation()->is_old(), "Sanity check"); The assert may be fine, but the treatment of completeness of the marking context seems very brittle to me and apt to cause problems in the future. I would prefer a cleaner separation of these. May be we can sync up separately to discuss this along with @earthling-amzn . ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984257756 From ayang at openjdk.org Fri Mar 7 13:16:59 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 7 Mar 2025 13:16:59 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v14] In-Reply-To: References: Message-ID: <5w6qUwzDQadxseocRl6rRF0AllyeukWTpYl2XjAfiTE=.fb62a50e-e308-4d08-8057-67e70e13ccbb@github.com> On Thu, 6 Mar 2025 16:26:31 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. 
>> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * iwalulya review > * renaming > * fix some includes, forward declaration src/hotspot/share/gc/g1/g1CardTable.hpp line 76: > 74: g1_card_already_scanned = 0x1, > 75: g1_to_cset_card = 0x2, > 76: g1_from_remset_card = 0x4 Could you outline the motivation for this more precise info? Is it for optimization or essentially for correctness? src/hotspot/share/gc/g1/g1ConcurrentRefineSweepTask.cpp line 54: > 52: assert(refinement_r == card_r, "not same region source %u (%zu) dest %u (%zu) ", refinement_r->hrm_index(), refinement_i, card_r->hrm_index(), card_i); > 53: assert(refinement_i == card_i, "indexes are not same %zu %zu", refinement_i, card_i); > 54: #endif I feel this assert logic can be extracted to a method, sth like `verify_card_pair`. src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp line 64: > 62: report_inactive("Paused"); > 63: sts_join.yield(); > 64: // Reset after yield rather than accumulating across yields, else a The comment seems obsolete after the removal of stats. src/hotspot/share/gc/g1/g1OopClosures.inline.hpp line 158: > 156: if (_has_ref_to_cset) { > 157: return; > 158: } Is it really necessary to write `false` to `_has_ref_to_cset`? 
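For contrast with the 40-50 instruction barrier quoted in the description above, a minimal sketch of the Parallel/Serial-style card mark the change aims to approach is shown below. This is illustrative only: the names and constants (card_table_base, 512-byte cards, the dirty value) are assumptions for the sketch, not the G1 code in this PR.

#include <cstdint>

// Illustrative post-write barrier: dirty the card covering the updated field and
// do nothing else -- no same-region/null/young filters, no StoreLoad, no enqueue.
static uint8_t* card_table_base;         // assumed to be initialized by the GC at startup
static const int card_shift = 9;         // 512-byte cards, assumed
static const uint8_t dirty_card = 0;     // dirty value, assumed

inline void post_write_barrier(void* field_addr) {
  uint8_t* card = card_table_base + (reinterpret_cast<uintptr_t>(field_addr) >> card_shift);
  *card = dirty_card;
}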
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1985041202 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1983846649 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1983842440 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1983857348 From wkemper at openjdk.org Fri Mar 7 18:30:54 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 7 Mar 2025 18:30:54 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: References: <9nhUQ5sIaBFGlhEh-w5J-TAQMAbp3dWUiSRfMRoK2rY=.9fd2e8bc-6a53-4385-9e7b-1b0d36a91a8d@github.com> Message-ID: On Fri, 7 Mar 2025 01:10:26 GMT, Y. Srinivas Ramakrishna wrote: >> Thanks for the confirmation, I added assert as below since it gc_generation could be global : >> >> >> assert(!_heap->gc_generation()->is_old(), "Sanity check"); > > The assert may be fine, but the treatment of completeness of the marking context seems very brittle to me and apt to cause problems in the future. I would prefer a cleaner separation of these. May be we can sync up separately to discuss this along with @earthling-amzn . Yes, regions may be promoted during a global cycle. Completing the mark for a global cycle also completes the mark for the young and old generations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1985512202 From wkemper at openjdk.org Fri Mar 7 19:19:53 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 7 Mar 2025 19:19:53 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4] In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 22:25:29 GMT, Y. Srinivas Ramakrishna wrote: >> Xiaolong Peng has updated the pull request incrementally with three additional commits since the last revision: >> >> - Remove ShenandoahHeap::complete_marking_context(ShenandoahHeapRegion* region) >> - Revert "complete_marking_context should guarantee mark is complete" >> >> This reverts commit 2004973965ea0e617cf9e5fc45be24f0e06e90a1. >> - complete_marking_context should guarantee mark is complete > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 123: > >> 121: #ifdef ASSERT >> 122: bool reg_live = region->has_live(); >> 123: bool bm_live = heap->active_generation()->complete_marking_context()->is_marked(cast_to_oop(region->bottom())); > > Apropos of another comment, if we really want to keep a delegating method in `ShenandoahHeap`, why not use `heap->complete_marking_context()` as a synonym for `heap->active_generation()->complete_marking_context()` ? This makes sense to me. 
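For readers following along, the delegation being discussed here would look roughly like the sketch below; the surrounding class declarations are elided, so treat this as an illustration of the proposal rather than a patch.

// Sketch only: ShenandoahHeap::[complete_]marking_context() as thin synonyms for the
// active generation's accessors, i.e. the uniform usage suggested in the review.
inline ShenandoahMarkingContext* ShenandoahHeap::marking_context() const {
  return active_generation()->marking_context();
}

inline ShenandoahMarkingContext* ShenandoahHeap::complete_marking_context() const {
  return active_generation()->complete_marking_context();
}

Whether such a synonym clarifies or obscures things (it can read as "the whole heap is completely marked") is exactly the point of disagreement earlier in this thread.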
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1985570114 From wkemper at openjdk.org Fri Mar 7 19:27:54 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 7 Mar 2025 19:27:54 GMT Subject: RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v5] In-Reply-To: References: <8w22oUPhZEx0iEIeNQ-GUUjx8jNkjXrTHjfjN_sX4HE=.2c391dd5-227e-4755-ba4d-528a7dcefca3@github.com> Message-ID: On Thu, 6 Mar 2025 23:46:02 GMT, Xiaolong Peng wrote: >> Sorry I misunderstood your original proposal, I thought you meant to suggest to remove the flag from ShenandoahGlobalGeneration, instead the set_mark_complete/is_mark_complete will more like view/delegation layer like: >> >> void ShenandoahGlobalGeneration::set_mark_complete() { >> ShenandoahGenerationalHeap* heap = ShenandoahGenerationalHeap::heap(); >> heap->young_generation()->set_mark_complete(); >> heap->old_generation()->set_mark_complete(); >> } >> >> bool ShenandoahGlobalGeneration::is_mark_complete() { >> ShenandoahGenerationalHeap* heap = ShenandoahGenerationalHeap::heap(); >> return heap->young_generation()->is_mark_complete() && heap->old_generation()->is_mark_complete(); >> } > > You proposal will make the impl of the set_mark_complete/is_mark_complete of ShenandoahGeneration cleaner, but the thing is it will change current design and behavior, we may have to update the code where there methods is called, e.g. when we call `set_mark_complete` of gc_generation/active_generation, if it is global generation, we may have to explicitly call the same methods of ShenandoahYoungGeneration and ShenandoahOldGeneration to fan out the status. > > How about I follow up it in a separate task and update the implementation if necessary? I want to limit the changes involved in this PR, and only fix the bug. The young and old generations are only instantiated in the generational mode, so using them without checking the mode will result in SEGV in non-generational modes. Global collections have a lot of overlap with old collections. I think what Ramki is saying, is that if we change all the code that makes assertions about the completion status of young/old marking to use the `active_generation` field instead, then we wouldn't need to update the completion status of young/old during a global collection. The difficulty here is that we need assurances that the old generation mark bitmap is valid in collections subsequent to a global collection. So, I don't think we can rely on completion status of `active_generation` when it was global, in following collections where it may now refer to young or old. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1985578948 From wkemper at openjdk.org Fri Mar 7 21:52:09 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 7 Mar 2025 21:52:09 GMT Subject: RFR: 8351444: Shenandoah: Class Unloading may encounter recycled oops Message-ID: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com> Unloading classes may require a walk of unreachable oops. For this reason, it is not safe to recycle memory before class unloading is complete. This complements existing code to prevent mutators from recycling trash regions while weak roots is in progress. ------------- Commit messages: - Make concurrent class unloading a little safer - Can't recycle during weak roots, but does the LRB really need to return doomed from space objects? 
- What happens if we allow trash regions to be recycled during concurrent weak roots? - Trying to find a test that fails because the LRB won't return a doomed from space object Changes: https://git.openjdk.org/jdk/pull/23951/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23951&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351444 Stats: 24 lines in 3 files changed: 10 ins; 7 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23951.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23951/head:pull/23951 PR: https://git.openjdk.org/jdk/pull/23951 From tschatzl at openjdk.org Sat Mar 8 19:32:54 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Sat, 8 Mar 2025 19:32:54 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v15] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... 
Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: - * fix card table verification crashes: in the first refinement phase, when switching the global card tables, we need to re-check whether we are still in the same sweep epoch or not. It might have changed due to a GC interrupting acquiring the Heap_lock. Otherwise new threads will scribble on the refinement table. Cause are last-minute changes before making the PR ready to review. Testing: without the patch, occurs fairly frequently when continuously (1 in 20) starting refinement. Does not afterward. - * ayang review 3 * comments * minor refactorings ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/350a4fa3..93b884f1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=13-14 Stats: 35 lines in 5 files changed: 30 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Sat Mar 8 19:32:54 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Sat, 8 Mar 2025 19:32:54 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v9] In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 10:46:13 GMT, Thomas Schatzl wrote: > I got an error while testing java/foreign/TestUpcallStress.java on linuxaarch64 with this PR: Fixed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2708458459 From aboldtch at openjdk.org Mon Mar 10 06:05:30 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 10 Mar 2025 06:05:30 GMT Subject: RFR: 8333578: Fix uses of overaligned types induced by ZCACHE_ALIGNED [v2] In-Reply-To: References: Message-ID: > The only directly heap allocated, constructed object of types that are overaligned because of ZCACHE_ALIGNED is ZCollectedHeap. The other are either in static storage or contained in (and constructed as part of) ZCollectedHeap. So we only need to fix ZCollectedHeap allocation. > > As the CollectedHeap is only ever created once and is never destroyed, we can simply align the allocation and create an unfreeable pointer. > > This implementation imposes that `ZCacheLineSize` is a power of two, but we already have this requirement elsewhere (e.g. `ZContendedStorage`). 
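The "align the allocation and create an unfreeable pointer" idea in the quoted description can be illustrated with a small standalone sketch. This is not the JDK code; it only assumes, as the PR states, that the alignment is a power of two.

#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Over-allocate by (alignment - 1) bytes and round the result up to the requested
// alignment. The raw pointer is intentionally never freed: the object is created
// once and lives for the lifetime of the process.
void* alloc_aligned_unfreeable(std::size_t size, std::size_t alignment) {
  void* raw = std::malloc(size + alignment - 1);
  if (raw == nullptr) {
    return nullptr;
  }
  std::uintptr_t aligned = (reinterpret_cast<std::uintptr_t>(raw) + alignment - 1)
                           & ~(static_cast<std::uintptr_t>(alignment) - 1);
  return reinterpret_cast<void*>(aligned);
}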
> > Testing: > * tier 1 through tier 5 Oracle supported platforms > * GHA Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Updated comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23885/files - new: https://git.openjdk.org/jdk/pull/23885/files/2c13edcd..0b918f00 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23885&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23885&range=00-01 Stats: 4 lines in 1 file changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23885.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23885/head:pull/23885 PR: https://git.openjdk.org/jdk/pull/23885 From stefank at openjdk.org Mon Mar 10 07:02:56 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 10 Mar 2025 07:02:56 GMT Subject: RFR: 8333578: Fix uses of overaligned types induced by ZCACHE_ALIGNED [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 06:05:30 GMT, Axel Boldt-Christmas wrote: >> The only directly heap allocated, constructed object of types that are overaligned because of ZCACHE_ALIGNED is ZCollectedHeap. The other are either in static storage or contained in (and constructed as part of) ZCollectedHeap. So we only need to fix ZCollectedHeap allocation. >> >> As the CollectedHeap is only ever created once and is never destroyed, we can simply align the allocation and create an unfreeable pointer. >> >> This implementation imposes that `ZCacheLineSize` is a power of two, but we already have this requirement elsewhere (e.g. `ZContendedStorage`). >> >> Testing: >> * tier 1 through tier 5 Oracle supported platforms >> * GHA > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Updated comment Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23885#pullrequestreview-2669959118 From stefank at openjdk.org Mon Mar 10 07:28:51 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 10 Mar 2025 07:28:51 GMT Subject: RFR: 8350905: Releasing a WeakHandle's referent may extend its lifetime In-Reply-To: References: Message-ID: <2rdNxa8sWH0qCHRDCtZgM27hh839433UT_KXGhjK7s4=.856e615e-e9a9-469a-b3d5-fb8b5d6181a2@github.com> On Thu, 6 Mar 2025 18:57:18 GMT, William Kemper wrote: > When weak handles are cleared, the `nullptr` is stored with the `ON_PHANTOM_OOP_REF` decorator. For concurrent collectors using a SATB barrier, this may cause the referent to be enqueued and marked when it would be otherwise unreachable. The problem is especially acute for Shenandoah's generational mode, in which a young region holding the otherwise unreachable referent, may become trash after the referent is enqueued for old marking. We are proposing that native weak references are cleared with an additional `AS_NO_KEEPALIVE` decorator. This is similar to what was done for j.l.r.WeakReference in [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696). > > # Testing > > GHA, `hotspot_gc_shenandoah`. Additionally, for G1, ZGC, and Shenandoah we've run Extremem, Dacapo, SpecJVM2008, SpecJBB2015, Heapothesys and Diluvian. All executions completed without errors. @fisk gave an offline comment that he would prefer if this could be handled by the GC Barrier backend instead of having to change the runtime code to understand how SATB and weak handles work. 
Take a look at how ZGC deals with this: template inline void ZBarrierSet::AccessBarrier::oop_store_not_in_heap(zpointer* p, oop value) { verify_decorators_absent(); if (!is_store_barrier_no_keep_alive()) { store_barrier_native_without_healing(p); } Raw::store(p, store_good(value)); } and then how `is_store_barrier_no_keep_alive` ensures that ON_PHANTOM_OOP_REF stores are treated as no-keepalive stores. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23935#issuecomment-2709653170 From aboldtch at openjdk.org Mon Mar 10 11:57:04 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 10 Mar 2025 11:57:04 GMT Subject: RFR: 8333578: Fix uses of overaligned types induced by ZCACHE_ALIGNED [v2] In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 06:05:30 GMT, Axel Boldt-Christmas wrote: >> The only directly heap allocated, constructed object of types that are overaligned because of ZCACHE_ALIGNED is ZCollectedHeap. The other are either in static storage or contained in (and constructed as part of) ZCollectedHeap. So we only need to fix ZCollectedHeap allocation. >> >> As the CollectedHeap is only ever created once and is never destroyed, we can simply align the allocation and create an unfreeable pointer. >> >> This implementation imposes that `ZCacheLineSize` is a power of two, but we already have this requirement elsewhere (e.g. `ZContendedStorage`). >> >> Testing: >> * tier 1 through tier 5 Oracle supported platforms >> * GHA > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Updated comment Thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/23885#issuecomment-2710328082 From aboldtch at openjdk.org Mon Mar 10 11:57:03 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 10 Mar 2025 11:57:03 GMT Subject: RFR: 8350572: ZGC: Enhance z_verify_safepoints_are_blocked interactions with VMError In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 11:15:52 GMT, Axel Boldt-Christmas wrote: > If VMError reporting is triggered from a disallowed thread state `z_verify_safepoints_are_blocked` will cause reentrant assertions to be triggered, when for example when loading the thread oop as part of thread printing. This extends the verification to be ignored if triggered from the thread doing the error reporting. In most cases performing the load barriers from disallowed thread states during error reporting will work. > > Testing: > - tier 1 Oracle supported platforms > - GHA Thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/23820#issuecomment-2710327472 From aboldtch at openjdk.org Mon Mar 10 11:57:03 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 10 Mar 2025 11:57:03 GMT Subject: Integrated: 8350572: ZGC: Enhance z_verify_safepoints_are_blocked interactions with VMError In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 11:15:52 GMT, Axel Boldt-Christmas wrote: > If VMError reporting is triggered from a disallowed thread state `z_verify_safepoints_are_blocked` will cause reentrant assertions to be triggered, when for example when loading the thread oop as part of thread printing. This extends the verification to be ignored if triggered from the thread doing the error reporting. In most cases performing the load barriers from disallowed thread states during error reporting will work. > > Testing: > - tier 1 Oracle supported platforms > - GHA This pull request has now been integrated. 
Changeset: 64caf085 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/64caf085344dcd5fc5185ed5882439249e239d50 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod 8350572: ZGC: Enhance z_verify_safepoints_are_blocked interactions with VMError Reviewed-by: eosterlund, stefank ------------- PR: https://git.openjdk.org/jdk/pull/23820 From aboldtch at openjdk.org Mon Mar 10 11:57:05 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 10 Mar 2025 11:57:05 GMT Subject: Integrated: 8333578: Fix uses of overaligned types induced by ZCACHE_ALIGNED In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 07:49:23 GMT, Axel Boldt-Christmas wrote: > The only directly heap allocated, constructed object of types that are overaligned because of ZCACHE_ALIGNED is ZCollectedHeap. The other are either in static storage or contained in (and constructed as part of) ZCollectedHeap. So we only need to fix ZCollectedHeap allocation. > > As the CollectedHeap is only ever created once and is never destroyed, we can simply align the allocation and create an unfreeable pointer. > > This implementation imposes that `ZCacheLineSize` is a power of two, but we already have this requirement elsewhere (e.g. `ZContendedStorage`). > > Testing: > * tier 1 through tier 5 Oracle supported platforms > * GHA This pull request has now been integrated. Changeset: fb0efbe8 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/fb0efbe87403fa2f1263c9b916db1a3a3b037eeb Stats: 17 lines in 1 file changed: 15 ins; 0 del; 2 mod 8333578: Fix uses of overaligned types induced by ZCACHE_ALIGNED Reviewed-by: stefank, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/23885 From shade at openjdk.org Mon Mar 10 11:59:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 10 Mar 2025 11:59:53 GMT Subject: RFR: 8351444: Shenandoah: Class Unloading may encounter recycled oops In-Reply-To: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com> References: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com> Message-ID: <3AO8SncuFl0-pj94X6S1GHNXi01EoOTZU1lnhrmtsKo=.85990912-b326-40f1-9dda-594b05b1f694@github.com> On Fri, 7 Mar 2025 21:47:31 GMT, William Kemper wrote: > Unloading classes may require a walk of unreachable oops. For this reason, it is not safe to recycle memory before class unloading is complete. This complements existing code to prevent mutators from recycling trash regions while weak roots is in progress. This looks fine as the first step to this sequencing problem. I do think we still have a conceptual problem of accessing the the oops in _trash_ regions. This patch blocks `trash` -> `empty` transition by delaying cleanup. This likely works well in release builds. I would expect debug builds to still complain we are touching the oop in `trash` region. At class unloading, we can only have trash regions from the immediate trashing during region selection. So, in addition to this, I think we really need to move immediate trashing somewhere after class unloading as well. This would likely require more fiddling with heurstics: we do immediate trash there to see if we can take a shortcut cycle. Changes requested by shade (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 124: > 122: } > 123: > 124: // Allow resurrection of unreachable objects that are visited during concurrent class-unloading. 
Let's not call it "Allow resurrection", which somewhat implies the object has full privileges to exist, i.e. can be inserted into the object graph back. But it really can't. We only do this because we were asked with `AS_NO_KEEPALIVE`. Something like: "Allow runtime to see unreachable objects that are visited during concurrent class unloading" src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 153: > 151: } > 152: > 153: assert(heap->is_concurrent_weak_root_in_progress(), "Must be doing weak roots now"); This does not ring true when final mark is cancelled, or am I missing something? src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 163: > 161: // We cannot recycle regions because weak roots need to know what is marked in trashed regions. > 162: entry_weak_refs(); > 163: entry_weak_roots(); Same as above: should probably be still protected by GC state checks to cover cancelled cases. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23951#pullrequestreview-2670666176 PR Review: https://git.openjdk.org/jdk/pull/23951#pullrequestreview-2670700570 PR Review Comment: https://git.openjdk.org/jdk/pull/23951#discussion_r1987125159 PR Review Comment: https://git.openjdk.org/jdk/pull/23951#discussion_r1987139867 PR Review Comment: https://git.openjdk.org/jdk/pull/23951#discussion_r1987143228 From stefank at openjdk.org Mon Mar 10 15:24:36 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 10 Mar 2025 15:24:36 GMT Subject: RFR: 8347337: ZGC: String dedups short-lived strings Message-ID: When -XX:+UseStringDeduplication is turned on, ZGC requests that every single String it encounters is deduplicated. The Strings of these requests are saved in weak handles, and then processed by a separate thread. One problematic part with this is that ZGC treats these handles as strong roots for young collections. So, even if the Strings are short-lived they will be artificially kept alive until they get promoted up to the old generation. This creates an extreme amount of Strings and weak handles to be processed by the old collection, which can result in long major collections and allocation stalls. Other GCs have a filter to check for how old the Strings are before they get deduplicated. The proposed fix is to move the string deduplication requests to when the Strings are promoted to the old generation. Testing: * I've tested this with a small micro that showed how ZGC got extremely long major collections with string deduplication turned on. * SPECjbb2015 with a JVMTI agent that induces load and adds deduplicatable strings. * Tier1-7 Note: I'm currently not aware of any non-artificial workload where string deduplication is an important optimization when running with Generational ZGC. If anyone knows of a workload that greatly benefits from it *AND* uses ZGC as a low-latency collector, then that would be highly interesting to look at. Note 2: the branch contains two changesets. In the first changeset I added a flag to be able to test and compare the old implementation with the new implementation. For the final PR I've removed that flag and the associated code as a second changeset. If we really want we could keep that flag, but given how poorly that implementation worked for Generational ZGC, I think we should just go with this new implementation. 
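Conceptually, the proposed fix replaces "request dedup for every String seen during marking" with something like the sketch below, run when an object is promoted into the old generation. The hook name is made up and the helper calls are only meant to suggest the shared string-deduplication interface; do not read this as ZGC's actual code.

// Illustrative only: request deduplication when a String survives into the old
// generation, so short-lived Strings never generate dedup requests at all.
void on_promote_to_old(oop obj, StringDedup::Requests& requests) {
  if (StringDedup::is_enabled() && java_lang_String::is_instance(obj)) {
    requests.add(obj);   // only Strings that lived long enough to be promoted get here
  }
}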
------------- Commit messages: - Remove string dedup from marking - 8347337: ZGC: String dedups short-lived strings Changes: https://git.openjdk.org/jdk/pull/23965/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23965&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347337 Stats: 151 lines in 6 files changed: 109 ins; 34 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23965.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23965/head:pull/23965 PR: https://git.openjdk.org/jdk/pull/23965 From wkemper at openjdk.org Mon Mar 10 16:37:01 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 10 Mar 2025 16:37:01 GMT Subject: RFR: 8351444: Shenandoah: Class Unloading may encounter recycled oops In-Reply-To: <3AO8SncuFl0-pj94X6S1GHNXi01EoOTZU1lnhrmtsKo=.85990912-b326-40f1-9dda-594b05b1f694@github.com> References: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com> <3AO8SncuFl0-pj94X6S1GHNXi01EoOTZU1lnhrmtsKo=.85990912-b326-40f1-9dda-594b05b1f694@github.com> Message-ID: On Mon, 10 Mar 2025 11:54:50 GMT, Aleksey Shipilev wrote: >> Unloading classes may require a walk of unreachable oops. For this reason, it is not safe to recycle memory before class unloading is complete. This complements existing code to prevent mutators from recycling trash regions while weak roots is in progress. > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 153: > >> 151: } >> 152: >> 153: assert(heap->is_concurrent_weak_root_in_progress(), "Must be doing weak roots now"); > > This does not ring true when final mark is cancelled, or am I missing something? There is a cancellation check just above this line. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23951#discussion_r1987650487 From wkemper at openjdk.org Mon Mar 10 17:25:57 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 10 Mar 2025 17:25:57 GMT Subject: RFR: 8350905: Releasing a WeakHandle's referent may extend its lifetime In-Reply-To: <2rdNxa8sWH0qCHRDCtZgM27hh839433UT_KXGhjK7s4=.856e615e-e9a9-469a-b3d5-fb8b5d6181a2@github.com> References: <2rdNxa8sWH0qCHRDCtZgM27hh839433UT_KXGhjK7s4=.856e615e-e9a9-469a-b3d5-fb8b5d6181a2@github.com> Message-ID: On Mon, 10 Mar 2025 07:26:02 GMT, Stefan Karlsson wrote: >> When weak handles are cleared, the `nullptr` is stored with the `ON_PHANTOM_OOP_REF` decorator. For concurrent collectors using a SATB barrier, this may cause the referent to be enqueued and marked when it would be otherwise unreachable. The problem is especially acute for Shenandoah's generational mode, in which a young region holding the otherwise unreachable referent, may become trash after the referent is enqueued for old marking. We are proposing that native weak references are cleared with an additional `AS_NO_KEEPALIVE` decorator. This is similar to what was done for j.l.r.WeakReference in [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696). >> >> # Testing >> >> GHA, `hotspot_gc_shenandoah`. Additionally, for G1, ZGC, and Shenandoah we've run Extremem, Dacapo, SpecJVM2008, SpecJBB2015, Heapothesys and Diluvian. All executions completed without errors. > > @fisk gave an offline comment that he would prefer if this could be handled by the GC Barrier backend instead of having to change the runtime code to understand how SATB and weak handles work. 
>
> Take a look at how ZGC deals with this:
>
> template <DecoratorSet decorators, typename BarrierSetT>
> inline void ZBarrierSet::AccessBarrier<decorators, BarrierSetT>::oop_store_not_in_heap(zpointer* p, oop value) {
>   verify_decorators_absent();
>
>   if (!is_store_barrier_no_keep_alive()) {
>     store_barrier_native_without_healing(p);
>   }
>
>   Raw::store(p, store_good(value));
> }
>
> and then how `is_store_barrier_no_keep_alive` ensures that ON_PHANTOM_OOP_REF stores are treated as no-keepalive stores.

Thank you @stefank , will take a look at this today.

------------- PR Comment: https://git.openjdk.org/jdk/pull/23935#issuecomment-2711314050

From wkemper at openjdk.org Mon Mar 10 17:31:53 2025
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 10 Mar 2025 17:31:53 GMT
Subject: RFR: 8351444: Shenandoah: Class Unloading may encounter recycled oops
In-Reply-To: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com>
References: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com>
Message-ID:

On Fri, 7 Mar 2025 21:47:31 GMT, William Kemper wrote:

> Unloading classes may require a walk of unreachable oops. For this reason, it is not safe to recycle memory before class unloading is complete. This complements existing code to prevent mutators from recycling trash regions while weak roots is in progress.

It isn't just about trash regions. This path through the barrier will also allow access to objects in the collection set (without evacuating them). We also choose the collection set during the final mark safepoint. Alternatively, we could soften the constraint for `ShenandoahHeap::is_in` to allow access to trash regions if concurrent weak roots is in progress.

------------- PR Comment: https://git.openjdk.org/jdk/pull/23951#issuecomment-2711330793

From wkemper at openjdk.org Mon Mar 10 18:55:51 2025
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 10 Mar 2025 18:55:51 GMT
Subject: RFR: 8351444: Shenandoah: Class Unloading may encounter recycled oops [v2]
In-Reply-To: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com>
References: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com>
Message-ID:

> Unloading classes may require a walk of unreachable oops. For this reason, it is not safe to recycle memory before class unloading is complete. This complements existing code to prevent mutators from recycling trash regions while weak roots is in progress.
William Kemper has updated the pull request incrementally with two additional commits since the last revision: - Consider trash regions to be in the heap during concurrent weak roots - Better comment for LRB when accessing unreachable oops ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23951/files - new: https://git.openjdk.org/jdk/pull/23951/files/b231d4fe..a5db7360 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23951&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23951&range=00-01 Stats: 27 lines in 2 files changed: 16 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/23951.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23951/head:pull/23951 PR: https://git.openjdk.org/jdk/pull/23951 From shade at openjdk.org Mon Mar 10 19:47:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 10 Mar 2025 19:47:53 GMT Subject: RFR: 8351444: Shenandoah: Class Unloading may encounter recycled oops In-Reply-To: References: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com> Message-ID: On Mon, 10 Mar 2025 17:28:59 GMT, William Kemper wrote: > It isn't just about trash regions. This path through the barrier will also allow access to objects in the collection set (without evacuating them). We also choose the collection set during the final mark safepoint. Alternatively, we could soften the constraint for `ShenandoahHeap::is_in` to allow access to trash regions if concurrent weak roots is in progress. Yes, also `cset`. My point is that conceptually, `trash` means trash, and we should not be accessing it. So if something is not trash yet (including references from weak roots), it should not be labeled `trash` then. With this patch, we are kinda stretching the definition. Which is fine for a local patch, but this over-stretching should be resolved, so it does not haunt us in future. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23951#issuecomment-2711652598 From shade at openjdk.org Mon Mar 10 19:47:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 10 Mar 2025 19:47:55 GMT Subject: RFR: 8351444: Shenandoah: Class Unloading may encounter recycled oops [v2] In-Reply-To: References: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com> Message-ID: On Mon, 10 Mar 2025 18:55:51 GMT, William Kemper wrote: >> Unloading classes may require a walk of unreachable oops. For this reason, it is not safe to recycle memory before class unloading is complete. This complements existing code to prevent mutators from recycling trash regions while weak roots is in progress. > > William Kemper has updated the pull request incrementally with two additional commits since the last revision: > > - Consider trash regions to be in the heap during concurrent weak roots > - Better comment for LRB when accessing unreachable oops src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 127: > 125: // Note that this may also interfere with the DeadCounterClosure when visiting weak oop storage, > 126: // but it does not seem to be a problem in practice because the dead count callbacks do not care > 127: // about the precise number of dead objects (only that there are dead objects). The last 3 lines feel too specific, really :) I think that paragraph should be in the related bug report. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23951#discussion_r1987915941 From shade at openjdk.org Mon Mar 10 19:47:56 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 10 Mar 2025 19:47:56 GMT Subject: RFR: 8351444: Shenandoah: Class Unloading may encounter recycled oops [v2] In-Reply-To: References: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com> <3AO8SncuFl0-pj94X6S1GHNXi01EoOTZU1lnhrmtsKo=.85990912-b326-40f1-9dda-594b05b1f694@github.com> Message-ID: On Mon, 10 Mar 2025 16:34:22 GMT, William Kemper wrote: >> src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 153: >> >>> 151: } >>> 152: >>> 153: assert(heap->is_concurrent_weak_root_in_progress(), "Must be doing weak roots now"); >> >> This does not ring true when final mark is cancelled, or am I missing something? > > There is a cancellation check just above this line. Ah. Back when we wrote this code originally, we used GC state flags to figure out whether we really need to go into particular phases. But I guess cancellation check is OK too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23951#discussion_r1987909613 From wkemper at openjdk.org Mon Mar 10 21:12:57 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 10 Mar 2025 21:12:57 GMT Subject: RFR: 8351444: Shenandoah: Class Unloading may encounter recycled oops [v2] In-Reply-To: References: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com> Message-ID: On Mon, 10 Mar 2025 19:42:30 GMT, Aleksey Shipilev wrote: >> William Kemper has updated the pull request incrementally with two additional commits since the last revision: >> >> - Consider trash regions to be in the heap during concurrent weak roots >> - Better comment for LRB when accessing unreachable oops > > src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 127: > >> 125: // Note that this may also interfere with the DeadCounterClosure when visiting weak oop storage, >> 126: // but it does not seem to be a problem in practice because the dead count callbacks do not care >> 127: // about the precise number of dead objects (only that there are dead objects). > > The last 3 lines feel too specific, really :) I think that paragraph should be in the related bug report. Okay. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23951#discussion_r1988028969 From wkemper at openjdk.org Mon Mar 10 21:25:06 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 10 Mar 2025 21:25:06 GMT Subject: RFR: 8351444: Shenandoah: Class Unloading may encounter recycled oops [v3] In-Reply-To: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com> References: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com> Message-ID: > Unloading classes may require a walk of unreachable oops. For this reason, it is not safe to recycle memory before class unloading is complete. This complements existing code to prevent mutators from recycling trash regions while weak roots is in progress. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Trim extraneous comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23951/files - new: https://git.openjdk.org/jdk/pull/23951/files/a5db7360..1c73a85a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23951&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23951&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23951.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23951/head:pull/23951 PR: https://git.openjdk.org/jdk/pull/23951 From wkemper at openjdk.org Mon Mar 10 21:25:07 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 10 Mar 2025 21:25:07 GMT Subject: RFR: 8351444: Shenandoah: Class Unloading may encounter recycled oops [v2] In-Reply-To: References: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com> Message-ID: <6WGwFtG_zwWr9SJw1zebfJPlro0LdRcOzOlE2TWB2N0=.471c1dab-65f7-4fd9-9d5e-d0dcf0142cdb@github.com> On Mon, 10 Mar 2025 18:55:51 GMT, William Kemper wrote: >> Unloading classes may require a walk of unreachable oops. For this reason, it is not safe to recycle memory before class unloading is complete. This complements existing code to prevent mutators from recycling trash regions while weak roots is in progress. > > William Kemper has updated the pull request incrementally with two additional commits since the last revision: > > - Consider trash regions to be in the heap during concurrent weak roots > - Better comment for LRB when accessing unreachable oops If it is just the name of the region state here, we could call it `pending_recycle` or something that communicates our intent to recycle it, but that we still need to use it. Moving `cset ` and `immediate trash` selection out of `final mark` would probably require a new safepoint. I think we would still need a means to express the 'region cannot be used for allocations' concept between final mark and class unloading. I also modified `ShenandoahHeap::is_in` to match the same constraints we impose on the freeset for allocations during concurrent weak roots (https://github.com/openjdk/jdk/pull/23951/commits/a5db7360610691833ecb4204af2861c77c8b7858). ------------- PR Comment: https://git.openjdk.org/jdk/pull/23951#issuecomment-2711869412 From kbarrett at openjdk.org Mon Mar 10 21:47:52 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 10 Mar 2025 21:47:52 GMT Subject: RFR: 8347337: ZGC: String dedups short-lived strings In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 15:08:16 GMT, Stefan Karlsson wrote: > When -XX:+UseStringDeduplication is turned on, ZGC requests that every single String it encounters is deduplicated. The Strings of these requests are saved in weak handles, and then processed by a separate thread. One problematic part with this is that ZGC treats these handles as strong roots for young collections. So, even if the Strings are short-lived they will be artificially kept alive until they get promoted up to the old generation. > > This creates an extreme amount of Strings and weak handles to be processed by the old collection, which can result in long major collections and allocation stalls. > > Other GCs have a filter to check for how old the Strings are before they get deduplicated. > > The proposed fix is to move the string deduplication requests to when the Strings are promoted to the old generation. 
> > Testing: > > * I've tested this with a small micro that showed how ZGC got extremely long major collections with string deduplication turned on. > > * SPECjbb2015 with a JVMTI agent that induces load and adds deduplicatable strings. > > * Tier1-7 > > Note: I'm currently not aware of any non-artificial workload where string deduplication is an important optimization when running with Generational ZGC. If anyone knows of a workload that greatly benefits from it *AND* uses ZGC as a low-latency collector, then that would be highly interesting to look at. > > Note 2: the branch contains two changesets. In the first changeset I added a flag to be able to test and compare the old implementation with the new implementation. For the final PR I've removed that flag and the associated code as a second changeset. If we really want we could keep that flag, but given how poorly that implementation worked for Generational ZGC, I think we should just go with this new implementation. Changes requested by kbarrett (Reviewer). src/hotspot/share/gc/z/zRelocate.cpp line 802: > 800: > 801: void maybe_string_dedup(zaddress to_addr) { > 802: const bool is_promotion = _forwarding->to_age() == ZPageAge::old && _forwarding->from_age() != ZPageAge::old; It seems like this could be computed once at `ZRelocateWork` construction time? Unless `_forwarding` can change. Also, isn't this the same as `_forwarding->is_promotion()`? src/hotspot/share/gc/z/zStringDedup.inline.hpp line 41: > 39: // Not a String object > 40: return; > 41: } Consider using `StringDedup::is_enabled_string(obj)`, which combines `is_enabled` and `is_instance` into a single test. ------------- PR Review: https://git.openjdk.org/jdk/pull/23965#pullrequestreview-2672261317 PR Review Comment: https://git.openjdk.org/jdk/pull/23965#discussion_r1988061552 PR Review Comment: https://git.openjdk.org/jdk/pull/23965#discussion_r1988066547 From wkemper at openjdk.org Mon Mar 10 22:58:00 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 10 Mar 2025 22:58:00 GMT Subject: Withdrawn: 8348400: GenShen: assert(ShenandoahHeap::heap()->is_full_gc_in_progress() || (used_regions_size() <= _max_capacity)) failed: Cannot use more than capacity # In-Reply-To: References: Message-ID: On Thu, 23 Jan 2025 21:36:37 GMT, William Kemper wrote: > When the capacity of a trashed region is transferred from the young to old generation, we must first recycle the region to break its affiliation with the young generation. Failing to do this may violate the constraint that the capacity of a generation is always equal to or greater than the capacity of its affiliated regions. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/23282 From fyang at openjdk.org Tue Mar 11 03:25:55 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 11 Mar 2025 03:25:55 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v15] In-Reply-To: References: Message-ID: On Sat, 8 Mar 2025 19:32:54 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. 
>> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: > > - * fix card table verification crashes: in the first refinement phase, when switching the global card tables, we need to re-check whether we are still in the same sweep epoch or not. It might have changed due to a GC interrupting acquiring the Heap_lock. Otherwise new threads will scribble on the refinement table. > Cause are last-minute changes before making the PR ready to review. > > Testing: without the patch, occurs fairly frequently when continuously > (1 in 20) starting refinement. Does not afterward. > - * ayang review 3 > * comments > * minor refactorings Tier1-3 test good on linux-riscv64 platform. And I have prepared an add-on change which implements the barrier method to write cards for a reference array for this platform. Do you want to have it in this PR? Thanks. 
[23739-riscv-addon.txt](https://github.com/user-attachments/files/19174898/23739-riscv-addon.txt)

------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2712469306

From sjohanss at openjdk.org Tue Mar 11 07:06:04 2025
From: sjohanss at openjdk.org (Stefan Johansson)
Date: Tue, 11 Mar 2025 07:06:04 GMT
Subject: RFR: 8351216: ZGC: Store NUMA node count
In-Reply-To: References: Message-ID:

On Wed, 5 Mar 2025 20:06:08 GMT, Joel Sikström wrote:

> To avoid calling into `os::Linux::max_numa_node()` and in turn libnuma on every count lookup, I propose we instead store the count statically inside ZNUMA. This is perfectly fine since the value that we get from libnuma is configured once during initialization and never changes during runtime.
>
> The count is set during platform-dependent initialization and the getter is now defined in the common code in ZNUMA.cpp. On operating systems that ZGC does not support NUMA for (BSD and Windows) we keep the current behavior by setting the count to 1.
>
> This is also preparation work for the Mapped Cache ([JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441)).
>
> Testing:
> * Tiers 1-3
> * GHA
> * Verify that the count is set on a Linux system with NUMA hardware

Looks good.

------------- Marked as reviewed by sjohanss (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23922#pullrequestreview-2673090112

From stefank at openjdk.org Tue Mar 11 08:39:52 2025
From: stefank at openjdk.org (Stefan Karlsson)
Date: Tue, 11 Mar 2025 08:39:52 GMT
Subject: RFR: 8347337: ZGC: String dedups short-lived strings
In-Reply-To: References: Message-ID:

On Mon, 10 Mar 2025 21:35:35 GMT, Kim Barrett wrote:
> > src/hotspot/share/gc/z/zRelocate.cpp line 802: > >> 800: >> 801: void maybe_string_dedup(zaddress to_addr) { >> 802: const bool is_promotion = _forwarding->to_age() == ZPageAge::old && _forwarding->from_age() != ZPageAge::old; > > It seems like this could be computed once at `ZRelocateWork` construction time? Unless `_forwarding` > can change. Also, isn't this the same as `_forwarding->is_promotion()`? `_forwarding` changes whenever we move to the next ZPage to relocate. We could probably add an `_is_promotion` variable/constant to ZForwarding, but we're already reading `_to_age` and `_from_age` in `update_remset_for_fields` so I don't expect a noticeable performance difference if we would do that. Great point about `_forwarding->is_promotion()`. I'm updating the code to use that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23965#discussion_r1988690267 From stefank at openjdk.org Tue Mar 11 08:58:53 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 11 Mar 2025 08:58:53 GMT Subject: RFR: 8347337: ZGC: String dedups short-lived strings In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 21:40:33 GMT, Kim Barrett wrote: >> When -XX:+UseStringDeduplication is turned on, ZGC requests that every single String it encounters is deduplicated. The Strings of these requests are saved in weak handles, and then processed by a separate thread. One problematic part with this is that ZGC treats these handles as strong roots for young collections. So, even if the Strings are short-lived they will be artificially kept alive until they get promoted up to the old generation. >> >> This creates an extreme amount of Strings and weak handles to be processed by the old collection, which can result in long major collections and allocation stalls. >> >> Other GCs have a filter to check for how old the Strings are before they get deduplicated. >> >> The proposed fix is to move the string deduplication requests to when the Strings are promoted to the old generation. >> >> Testing: >> >> * I've tested this with a small micro that showed how ZGC got extremely long major collections with string deduplication turned on. >> >> * SPECjbb2015 with a JVMTI agent that induces load and adds deduplicatable strings. >> >> * Tier1-7 >> >> Note: I'm currently not aware of any non-artificial workload where string deduplication is an important optimization when running with Generational ZGC. If anyone knows of a workload that greatly benefits from it *AND* uses ZGC as a low-latency collector, then that would be highly interesting to look at. >> >> Note 2: the branch contains two changesets. In the first changeset I added a flag to be able to test and compare the old implementation with the new implementation. For the final PR I've removed that flag and the associated code as a second changeset. If we really want we could keep that flag, but given how poorly that implementation worked for Generational ZGC, I think we should just go with this new implementation. > > src/hotspot/share/gc/z/zStringDedup.inline.hpp line 41: > >> 39: // Not a String object >> 40: return; >> 41: } > > Consider using `StringDedup::is_enabled_string(obj)`, which combines `is_enabled` and `is_instance` > into a single test. >From what I can see, the `is_enable_string` takes a 'Klass*' and not an `oop`. So, that would require us to fetch the Klass bits, read the global `_klass_mode`, and decode the Klass*. 
I think I prefer the version we currently have because it optimize for the default (and most common run mode) that users are not using string dedup with ZGC. Also, I don't think it is that important to optimize these checks now that the code is only run for promoted objects, which should be a significant lower number of objects compared to the number of objects that are visited by the marking code path. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23965#discussion_r1988727987 From eosterlund at openjdk.org Tue Mar 11 09:09:06 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 11 Mar 2025 09:09:06 GMT Subject: RFR: 8347337: ZGC: String dedups short-lived strings In-Reply-To: References: Message-ID: On Mon, 10 Mar 2025 15:08:16 GMT, Stefan Karlsson wrote: > When -XX:+UseStringDeduplication is turned on, ZGC requests that every single String it encounters is deduplicated. The Strings of these requests are saved in weak handles, and then processed by a separate thread. One problematic part with this is that ZGC treats these handles as strong roots for young collections. So, even if the Strings are short-lived they will be artificially kept alive until they get promoted up to the old generation. > > This creates an extreme amount of Strings and weak handles to be processed by the old collection, which can result in long major collections and allocation stalls. > > Other GCs have a filter to check for how old the Strings are before they get deduplicated. > > The proposed fix is to move the string deduplication requests to when the Strings are promoted to the old generation. > > Testing: > > * I've tested this with a small micro that showed how ZGC got extremely long major collections with string deduplication turned on. > > * SPECjbb2015 with a JVMTI agent that induces load and adds deduplicatable strings. > > * Tier1-7 > > Note: I'm currently not aware of any non-artificial workload where string deduplication is an important optimization when running with Generational ZGC. If anyone knows of a workload that greatly benefits from it *AND* uses ZGC as a low-latency collector, then that would be highly interesting to look at. > > Note 2: the branch contains two changesets. In the first changeset I added a flag to be able to test and compare the old implementation with the new implementation. For the final PR I've removed that flag and the associated code as a second changeset. If we really want we could keep that flag, but given how poorly that implementation worked for Generational ZGC, I think we should just go with this new implementation. With the change to use _forwarding->is_promotion() this looks good. Great fix! ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23965#pullrequestreview-2673496559 From stefank at openjdk.org Tue Mar 11 09:33:20 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 11 Mar 2025 09:33:20 GMT Subject: RFR: 8347337: ZGC: String dedups short-lived strings [v2] In-Reply-To: References: Message-ID: <-d_fQfoAWB7fuNa7M46BFbuvHqdScs1ZTW3bFULjOnY=.b2b279cd-a9a9-46ad-8933-e5d2fd3f26c3@github.com> > When -XX:+UseStringDeduplication is turned on, ZGC requests that every single String it encounters is deduplicated. The Strings of these requests are saved in weak handles, and then processed by a separate thread. One problematic part with this is that ZGC treats these handles as strong roots for young collections. 
So, even if the Strings are short-lived they will be artificially kept alive until they get promoted up to the old generation. > > This creates an extreme amount of Strings and weak handles to be processed by the old collection, which can result in long major collections and allocation stalls. > > Other GCs have a filter to check for how old the Strings are before they get deduplicated. > > The proposed fix is to move the string deduplication requests to when the Strings are promoted to the old generation. > > Testing: > > * I've tested this with a small micro that showed how ZGC got extremely long major collections with string deduplication turned on. > > * SPECjbb2015 with a JVMTI agent that induces load and adds deduplicatable strings. > > * Tier1-7 > > Note: I'm currently not aware of any non-artificial workload where string deduplication is an important optimization when running with Generational ZGC. If anyone knows of a workload that greatly benefits from it *AND* uses ZGC as a low-latency collector, then that would be highly interesting to look at. > > Note 2: the branch contains two changesets. In the first changeset I added a flag to be able to test and compare the old implementation with the new implementation. For the final PR I've removed that flag and the associated code as a second changeset. If we really want we could keep that flag, but given how poorly that implementation worked for Generational ZGC, I think we should just go with this new implementation. Stefan Karlsson has updated the pull request incrementally with two additional commits since the last revision: - Make ZPageAge ZForwarding member fileds constant - Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23965/files - new: https://git.openjdk.org/jdk/pull/23965/files/d7579b95..12ad843f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23965&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23965&range=00-01 Stats: 4 lines in 2 files changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23965.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23965/head:pull/23965 PR: https://git.openjdk.org/jdk/pull/23965 From tschatzl at openjdk.org Tue Mar 11 09:51:53 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 11 Mar 2025 09:51:53 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v16] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. 
> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * optimized RISCV gen_write_ref_array_post_barrier() implementation contributed by @RealFYang ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/93b884f1..758fac01 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=14-15 Stats: 36 lines in 1 file changed: 28 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Tue Mar 11 09:54:05 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 11 Mar 2025 09:54:05 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v15] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 03:22:52 GMT, Fei Yang wrote: > Tier1-3 test good on linux-riscv64 platform. And I have prepared an add-on change which implements the barrier method to write cards for a reference array for this platform. Do you want to have it in this PR? Thanks. I added your changes, thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2713415911 From stuefe at openjdk.org Tue Mar 11 15:29:07 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 11 Mar 2025 15:29:07 GMT Subject: RFR: 8351500: Random JVM crashes after task being moved to different NUMA node Message-ID: For details, please see JBS issue. _Please note that this bug only shows symptoms in JDK 21 and JDK 17! 
Due to code shuffling done as part of G1 region-local pinning work, the error does not show in JDKs 22 and later._ I originally planned to fix this just for JDK 21 and 17 (see https://github.com/openjdk/jdk21u-dev/pull/1460). However, I would rather have it fixed in the mainline, even though it is symptom-free. It is a lingering issue that may surface later if the code is ever changed. Plus, this prevents the fix from being accidentally overwritten in JDK 21 if we backport. ---- The fix is simple: we fix (hah) the NUMA association for the full duration of a heap allocation in G1. That way, regardless of the OS scheduler moving the thread to a different NUMA node, we always use the same `G1AllocRegion` object, and changes in the control flow that rely on that won't break on NUMA. This has the disadvantage of allocating memory from a node we are potentially moving away from. However, I argue that this is exceedingly rare, and if it happens, the OS will cope by eventually migrating the memory to the correct node. --- Testing: Testing is difficult. See remark in JBS issue. I tested a modified version of this patch on JDK 21, where the error does cause crashes. I tested with an additional patch mimicking tons of NUMA node migrations. As I wrote in JBS, I plan to contribute that "FakeNUMA" mode eventually, but lack the time to polish it up for now. I hope the fix is simple and uncontested enough to go in quickly, since I would like to fix JDK 21 soon via backporting this patch. ------------- Commit messages: - start Changes: https://git.openjdk.org/jdk/pull/23984/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23984&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351500 Stats: 28 lines in 4 files changed: 10 ins; 6 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/23984.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23984/head:pull/23984 PR: https://git.openjdk.org/jdk/pull/23984 From jsikstro at openjdk.org Tue Mar 11 15:40:55 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Tue, 11 Mar 2025 15:40:55 GMT Subject: RFR: 8351500: Random JVM crashes after task being moved to different NUMA node In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 14:01:11 GMT, Thomas Stuefe wrote: > For details, please see JBS issue. > > _Please note that this bug only shows symptoms in JDK 21 and JDK 17! Due to code shuffling done as part of G1 region-local pinning work, the error does not show in JDKs 22 and later._ > > I originally planned to fix this just for JDK 21 and 17 (see https://github.com/openjdk/jdk21u-dev/pull/1460). However, I would rather have it fixed in the mainline, even though it is symptom-free. It is a lingering issue that may surface later if the code is ever changed. Plus, this prevents the fix from being accidentally overwritten in JDK 21 if we backport. > > ---- > > The fix is simple: we fix (hah) the NUMA association for the full duration of a heap allocation in G1. That way, regardless of the OS scheduler moving the thread to a different NUMA node, we always use the same `G1AllocRegion` object, and changes in the control flow that rely on that won't break on NUMA. > > This has the disadvantage of allocating memory from a node we are potentially moving away from. However, I argue that this is exceedingly rare, and if it happens, the OS will cope by eventually migrating the memory to the correct node. > > --- > > Testing: > > Testing is difficult. See remark in JBS issue. 
> > I tested a modified version of this patch on JDK 21, where the error does cause crashes. I tested with an additional patch mimicking tons of NUMA node migrations. As I wrote in JBS, I plan to contribute that "FakeNUMA" mode eventually, but lack the time to polish it up for now. I hope the fix is simple and uncontested enough to go in quickly, since I would like to fix JDK 21 soon via backporting this patch. Just a comment: In the memory allocation layer "rewrite" in ZGC that we're working on (the [Mapped Cache](https://bugs.openjdk.org/browse/JDK-8350441)), we have the same policy that is proposed in this PR. We read the thread's affinity once and use a stored result for the remainder of the allocation work. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23984#issuecomment-2714803212 From shade at openjdk.org Tue Mar 11 16:03:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 11 Mar 2025 16:03:07 GMT Subject: RFR: 8351444: Shenandoah: Class Unloading may encounter recycled oops [v3] In-Reply-To: References: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com> Message-ID: On Mon, 10 Mar 2025 21:25:06 GMT, William Kemper wrote: >> Unloading classes may require a walk of unreachable oops. For this reason, it is not safe to recycle memory before class unloading is complete. This complements existing code to prevent mutators from recycling trash regions while weak roots is in progress. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Trim extraneous comment Looks okay to me. @rkennke, @zhengyu123 might have an opinion here as well. src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 844: > 842: // during weak roots. Concurrent class unloading may access unmarked oops > 843: // in trash regions. > 844: return r->is_trash() && is_concurrent_weak_root_in_progress(); Pity to do this, but I understand the reason for it. We should investigate if this window is unnecessarily large. I see currently we drop `WEAK_ROOTS` gc state in `ShenandoahHeap::concurrent_prepare_for_update_refs`. Should we drop the flag sooner, somewhere after concurrent class unloading? Can be done separately, if it snowballs into something more complicated. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23951#pullrequestreview-2675209101 PR Review Comment: https://git.openjdk.org/jdk/pull/23951#discussion_r1989630728 From kbarrett at openjdk.org Tue Mar 11 17:22:54 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 11 Mar 2025 17:22:54 GMT Subject: RFR: 8347337: ZGC: String dedups short-lived strings [v2] In-Reply-To: <-d_fQfoAWB7fuNa7M46BFbuvHqdScs1ZTW3bFULjOnY=.b2b279cd-a9a9-46ad-8933-e5d2fd3f26c3@github.com> References: <-d_fQfoAWB7fuNa7M46BFbuvHqdScs1ZTW3bFULjOnY=.b2b279cd-a9a9-46ad-8933-e5d2fd3f26c3@github.com> Message-ID: On Tue, 11 Mar 2025 09:33:20 GMT, Stefan Karlsson wrote: >> When -XX:+UseStringDeduplication is turned on, ZGC requests that every single String it encounters is deduplicated. The Strings of these requests are saved in weak handles, and then processed by a separate thread. One problematic part with this is that ZGC treats these handles as strong roots for young collections. So, even if the Strings are short-lived they will be artificially kept alive until they get promoted up to the old generation. 
>> >> This creates an extreme amount of Strings and weak handles to be processed by the old collection, which can result in long major collections and allocation stalls. >> >> Other GCs have a filter to check for how old the Strings are before they get deduplicated. >> >> The proposed fix is to move the string deduplication requests to when the Strings are promoted to the old generation. >> >> Testing: >> >> * I've tested this with a small micro that showed how ZGC got extremely long major collections with string deduplication turned on. >> >> * SPECjbb2015 with a JVMTI agent that induces load and adds deduplicatable strings. >> >> * Tier1-7 >> >> Note: I'm currently not aware of any non-artificial workload where string deduplication is an important optimization when running with Generational ZGC. If anyone knows of a workload that greatly benefits from it *AND* uses ZGC as a low-latency collector, then that would be highly interesting to look at. >> >> Note 2: the branch contains two changesets. In the first changeset I added a flag to be able to test and compare the old implementation with the new implementation. For the final PR I've removed that flag and the associated code as a second changeset. If we really want we could keep that flag, but given how poorly that implementation worked for Generational ZGC, I think we should just go with this new implementation. > > Stefan Karlsson has updated the pull request incrementally with two additional commits since the last revision: > > - Make ZPageAge ZForwarding member fileds constant > - Review comments Changes requested by kbarrett (Reviewer). src/hotspot/share/gc/z/zStringDedup.hpp line 31: > 29: #include "oops/oopsHierarchy.hpp" > 30: > 31: class ZStringDedupContext { The file name and the class that constitutes all of the content of the file don't match. That seems kind of strange. Also contrary to the style guide: https://github.com/openjdk/jdk/blame/da2b4f0749dffc99fa42c7311fbc74231af273bd/doc/hotspot-style.md#L85-L87 ------------- PR Review: https://git.openjdk.org/jdk/pull/23965#pullrequestreview-2675462550 PR Review Comment: https://git.openjdk.org/jdk/pull/23965#discussion_r1989775357 From kbarrett at openjdk.org Tue Mar 11 17:22:55 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 11 Mar 2025 17:22:55 GMT Subject: RFR: 8347337: ZGC: String dedups short-lived strings [v2] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 08:56:19 GMT, Stefan Karlsson wrote: >> src/hotspot/share/gc/z/zStringDedup.inline.hpp line 41: >> >>> 39: // Not a String object >>> 40: return; >>> 41: } >> >> Consider using `StringDedup::is_enabled_string(obj)`, which combines `is_enabled` and `is_instance` >> into a single test. > > From what I can see, the `is_enable_string` takes a 'Klass*' and not an `oop`. So, that would require us to fetch the Klass bits, read the global `_klass_mode`, and decode the Klass*. I think I prefer the version we currently have because it optimize for the default (and most common run mode) that users are not using string dedup with ZGC. > > Also, I don't think it is that important to optimize these checks now that the code is only run for promoted objects, which should be a significant lower number of objects compared to the number of objects that are visited by the marking code path. Good points. Places where `is_enabled_string` are used already have the `Klass*` in hand for other reasons. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23965#discussion_r1989783345 From stuefe at openjdk.org Tue Mar 11 18:21:57 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 11 Mar 2025 18:21:57 GMT Subject: RFR: 8351500: Random JVM crashes after task being moved to different NUMA node In-Reply-To: References: Message-ID: <216qpFBFbW7p1L-FFXMczMnaYU0JyESeBn4uHVxWVi4=.6657b3cc-5550-4cec-8f21-ab522a75dfe1@github.com> On Tue, 11 Mar 2025 15:38:22 GMT, Joel Sikstr?m wrote: > Just a comment: In the memory allocation layer "rewrite" in ZGC that we're working on (the [Mapped Cache](https://bugs.openjdk.org/browse/JDK-8350441)), we have the same policy that is proposed in this PR. We read the thread's affinity once and use a stored result for the remainder of the allocation work. Thank you @jsikstro . I wondered what ZGC did about this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23984#issuecomment-2715303882 From stefank at openjdk.org Tue Mar 11 18:46:53 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 11 Mar 2025 18:46:53 GMT Subject: RFR: 8347337: ZGC: String dedups short-lived strings [v2] In-Reply-To: References: <-d_fQfoAWB7fuNa7M46BFbuvHqdScs1ZTW3bFULjOnY=.b2b279cd-a9a9-46ad-8933-e5d2fd3f26c3@github.com> Message-ID: <7ZsJhxZDYDIsXtDjhDHAnO0I4cxAFSlf7Oi9FVIh5_I=.2168a8f6-c7c6-4d07-aee6-753680dd11d3@github.com> On Tue, 11 Mar 2025 17:10:10 GMT, Kim Barrett wrote: >> Stefan Karlsson has updated the pull request incrementally with two additional commits since the last revision: >> >> - Make ZPageAge ZForwarding member fileds constant >> - Review comments > > src/hotspot/share/gc/z/zStringDedup.hpp line 31: > >> 29: #include "oops/oopsHierarchy.hpp" >> 30: >> 31: class ZStringDedupContext { > > The file name and the class that constitutes all of the content of the file don't match. That seems > kind of strange. Also contrary to the style guide: > https://github.com/openjdk/jdk/blame/da2b4f0749dffc99fa42c7311fbc74231af273bd/doc/hotspot-style.md#L85-L87 This was intentional. I named them zStringDedup.* to show that this is the file that contains ZGC's support to interface with StringDedup. We do that for other sub-systems that ZGC interfaces with. The crux is that the string dedup API requires us to maintain the lifecycle of a `StringDedup::Requests` instance, so we can't simply have a function like `ZStringDedup::request(obj)`. Instead we need to add a ZStringDedupContext class, just so that we maintain the Requests object. (I choose suffix `Context` instead of `Requests`, because that naming fits better with the rest of the ZGC code). About the style _guide_. I see that section more as a helpful guide, but not as a complete mandate to how to name files in HotSpot. Note, that even the StringDedup::Request class is placed in a file named stringDedup.hpp and not stringDedupRequest.hpp! I could easily also create a structure that simulates the structure used in stringDedup.hpp: class ZStringDedup { public: class Context { ... }; }; However, I don't find that particularly appealing and it somewhat goes against the informal style guide that Per and I used when we first started to write ZGC. 
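A rough sketch of the shape being described, with `add` and `flush` as the assumed `StringDedup::Requests` operations (this is not the actual zStringDedup.hpp contents):

```C++
// Sketch only: a thin context whose sole job is to own the StringDedup::Requests
// instance, so its lifecycle (including the final flush) is handled in one place.
class ZStringDedupContext {
private:
  StringDedup::Requests _requests;

public:
  // Called for Strings that are promoted to the old generation.
  void request(oop obj) {
    _requests.add(obj);
  }

  // Publish any batched requests to the deduplication thread.
  void flush() {
    _requests.flush();
  }
};
```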
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23965#discussion_r1989934945 From wkemper at openjdk.org Tue Mar 11 19:03:59 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Mar 2025 19:03:59 GMT Subject: RFR: 8351444: Shenandoah: Class Unloading may encounter recycled oops [v3] In-Reply-To: References: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com> Message-ID: On Tue, 11 Mar 2025 15:58:23 GMT, Aleksey Shipilev wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Trim extraneous comment > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 844: > >> 842: // during weak roots. Concurrent class unloading may access unmarked oops >> 843: // in trash regions. >> 844: return r->is_trash() && is_concurrent_weak_root_in_progress(); > > Pity to do this, but I understand the reason for it. > > We should investigate if this window is unnecessarily large. I see currently we drop `WEAK_ROOTS` gc state in `ShenandoahHeap::concurrent_prepare_for_update_refs`. Should we drop the flag sooner, somewhere after concurrent class unloading? Can be done separately, if it snowballs into something more complicated. Class unloading is the last thing we do before recycling trash regions. A region will be usable for allocation as soon as it is recycled, so, in a sense, this has the same effect as turning off the weak roots flag immediately after class unloading. Also, the weak roots phase itself cannot have regions recycled because it relies on accurate mark information (recycling clears live data and resets the TAMS). We _could_ work around this by preserving the mark data (perhaps decoupling TAMS reset from region recycling). But changing the `gc_state` currently requires either a safepoint or a handshake (while holding the `Thread_lock`). I haven't thought all the way through this, but something like this (psuedo-code) might be possible: ```C++ vmop_entry_final_mark(); // Complete class unloading, since it actually _needs_ the oops (still need to forbid trash recycling here). entry_class_unloading(); // Recycle trash, but do not reset TAMS (weak roots needs TAMS to decide reachability of referents). entry_cleanup_early(); // Complete weak roots. There are no more trash regions and we don't have to change gc_state entry_weak_refs(); entry_weak_roots(); What do you think? This would be a separate PR of course, but do you see any reason something like this wouldn't work? I'd expect some asserts to break if we allocate into a new region with TAMS > bottom. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23951#discussion_r1989959925 From wkemper at openjdk.org Tue Mar 11 19:37:15 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Mar 2025 19:37:15 GMT Subject: RFR: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational Message-ID: Failed allocations may race to cancel the GC with the collector who is working to clear the cancelled GC. When the GC wins this race, it will fail to notify threads that are waiting for the failed GC cycle to complete. This change directly tracks the number of threads waiting due to an allocation failure, rather than indirectly tracking them through the cancelled gc state. # Testing Ran TestAllocHumongousFragment#generational 6,500 times without failures. 
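As an illustration of the approach (hypothetical member names, not the actual ShenandoahController code), the waiters are counted explicitly under the monitor, so the decision to notify no longer depends on the shared cancelled-gc state:

```C++
// Sketch only: explicit accounting of threads stalled on allocation failure.
class ShenandoahControllerSketch {
  Monitor* _alloc_failure_waiters_lock;   // assumed to be initialized elsewhere
  size_t   _alloc_failure_waiters = 0;    // guarded by the monitor

public:
  void handle_alloc_failure() {
    MonitorLocker ml(_alloc_failure_waiters_lock);
    _alloc_failure_waiters++;             // register before the cycle can complete
    // ... cancel the current cycle / request a new one here ...
    ml.wait();                            // woken when the control thread finishes a cycle
    _alloc_failure_waiters--;
  }

  void notify_alloc_failure_waiters() {
    MonitorLocker ml(_alloc_failure_waiters_lock);
    if (_alloc_failure_waiters > 0) {     // decided by the count, not by cancelled-gc state
      ml.notify_all();
    }
  }
};
```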
------------- Commit messages: - Track number of threads waiting on allocation failures for notification Changes: https://git.openjdk.org/jdk/pull/23997/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23997&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351464 Stats: 12 lines in 3 files changed: 9 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23997.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23997/head:pull/23997 PR: https://git.openjdk.org/jdk/pull/23997 From wkemper at openjdk.org Tue Mar 11 19:41:04 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Mar 2025 19:41:04 GMT Subject: Withdrawn: 8350905: Releasing a WeakHandle's referent may extend its lifetime In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 18:57:18 GMT, William Kemper wrote: > When weak handles are cleared, the `nullptr` is stored with the `ON_PHANTOM_OOP_REF` decorator. For concurrent collectors using a SATB barrier, this may cause the referent to be enqueued and marked when it would be otherwise unreachable. The problem is especially acute for Shenandoah's generational mode, in which a young region holding the otherwise unreachable referent, may become trash after the referent is enqueued for old marking. We are proposing that native weak references are cleared with an additional `AS_NO_KEEPALIVE` decorator. This is similar to what was done for j.l.r.WeakReference in [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696). > > # Testing > > GHA, `hotspot_gc_shenandoah`. Additionally, for G1, ZGC, and Shenandoah we've run Extremem, Dacapo, SpecJVM2008, SpecJBB2015, Heapothesys and Diluvian. All executions completed without errors. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/23935 From wkemper at openjdk.org Tue Mar 11 19:41:04 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Mar 2025 19:41:04 GMT Subject: RFR: 8350905: Releasing a WeakHandle's referent may extend its lifetime In-Reply-To: References: Message-ID: On Thu, 6 Mar 2025 18:57:18 GMT, William Kemper wrote: > When weak handles are cleared, the `nullptr` is stored with the `ON_PHANTOM_OOP_REF` decorator. For concurrent collectors using a SATB barrier, this may cause the referent to be enqueued and marked when it would be otherwise unreachable. The problem is especially acute for Shenandoah's generational mode, in which a young region holding the otherwise unreachable referent, may become trash after the referent is enqueued for old marking. We are proposing that native weak references are cleared with an additional `AS_NO_KEEPALIVE` decorator. This is similar to what was done for j.l.r.WeakReference in [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696). > > # Testing > > GHA, `hotspot_gc_shenandoah`. Additionally, for G1, ZGC, and Shenandoah we've run Extremem, Dacapo, SpecJVM2008, SpecJBB2015, Heapothesys and Diluvian. All executions completed without errors. Withdrawing this PR. We'll do this in the Shenandoah barrier. 
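For reference, the change that was being proposed (and is now deferred to the Shenandoah barrier) amounts to a one-line decorator change when clearing the handle; the helper below is a sketch, not the actual WeakHandle code:

```C++
// Sketch only: clear a native weak reference without keeping the dying referent alive.
// ON_PHANTOM_OOP_REF alone lets a SATB collector enqueue the old referent; adding
// AS_NO_KEEPALIVE tells the barrier not to mark it (mirroring JDK-8240696 for
// j.l.r.WeakReference).
static void clear_native_weak(oop* addr) {
  NativeAccess<ON_PHANTOM_OOP_REF | AS_NO_KEEPALIVE>::oop_store(addr, oop(nullptr));
}
```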
------------- PR Comment: https://git.openjdk.org/jdk/pull/23935#issuecomment-2715505017 From wkemper at openjdk.org Tue Mar 11 19:59:29 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Mar 2025 19:59:29 GMT Subject: RFR: 8348400: GenShen: assert(ShenandoahHeap::heap()->is_full_gc_in_progress() || (used_regions_size() <= _max_capacity)) failed: Cannot use more than capacity # Message-ID: Shenandoah cannot recycle immediate trash regions during the concurrent weak roots phase, however some of these regions may be assigned to the old generation collector's reserve. When an evacuation/promotion tries to allocate in such a region, it will fail (as expected) and try to 'steal' a region from the mutator's partition of the free set. There are cases when this cannot be allowed due to capacity constraints. However, in some of these cases it will be possible to 'swap' a region between the old reserve and the mutator's partition. This change covers this case. ------------- Commit messages: - Do not enforce size constraints on generations - Don't allocate in regions that cannot be flipped to old gc - Do not allocate from mutator if young gen cannot spare the region Changes: https://git.openjdk.org/jdk/pull/23998/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23998&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8348400 Stats: 66 lines in 3 files changed: 42 ins; 13 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/23998.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23998/head:pull/23998 PR: https://git.openjdk.org/jdk/pull/23998 From kdnilsen at openjdk.org Tue Mar 11 21:01:53 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 11 Mar 2025 21:01:53 GMT Subject: RFR: 8348400: GenShen: assert(ShenandoahHeap::heap()->is_full_gc_in_progress() || (used_regions_size() <= _max_capacity)) failed: Cannot use more than capacity # In-Reply-To: References: Message-ID: <1TI7zry8_JLLMVwxDq0Yd65TrgkSYafDOEn8zOFS7z0=.0517105a-520a-4686-83eb-a2446ee72a8a@github.com> On Tue, 11 Mar 2025 19:54:20 GMT, William Kemper wrote: > Shenandoah cannot recycle immediate trash regions during the concurrent weak roots phase, however some of these regions may be assigned to the old generation collector's reserve. When an evacuation/promotion tries to allocate in such a region, it will fail (as expected) and try to 'steal' a region from the mutator's partition of the free set. There are cases when this cannot be allowed due to capacity constraints. However, in some of these cases it will be possible to 'swap' a region between the old reserve and the mutator's partition. This change covers this case. src/hotspot/share/gc/shenandoah/shenandoahGenerationSizer.cpp line 127: > 125: } > 126: > 127: if (dst->max_capacity() + bytes_to_transfer > max_size_for(dst)) { Do we need to edit the descriptions of ShenandoahMinYoungPercentage and ShenandoahMaxYoungPercentage? Do we need to remove these options entirely from shenandoah_globals? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23998#discussion_r1990119862 From wkemper at openjdk.org Tue Mar 11 21:49:06 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Mar 2025 21:49:06 GMT Subject: RFR: 8350898: Shenandoah: Eliminate final roots safepoint [v4] In-Reply-To: References: Message-ID: > This PR converts the final roots safepoint operation into a handshake. The safepoint operation still exists, but is only executed when `ShenandoahVerify` is enabled. 
In addition to this change, this PR also improves the logging for the concurrent preparation for update references from [PR 22688](https://github.com/openjdk/jdk/pull/22688). William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - Merge remote-tracking branch 'jdk/master' into eliminate-final-roots - Clarify which thread local buffers in comment - Merge remote-tracking branch 'jdk/master' into eliminate-final-roots - Fix comments - Add whitespace at end of file - More detail for init update refs event message - Use timing tracker for timing verification - Merge remote-tracking branch 'jdk/master' into eliminate-final-roots - WIP: Fix up phase timings for newly concurrent final roots and init update refs - WIP: Combine satb transfer with state propagation, restore phase timing data - ... and 2 more: https://git.openjdk.org/jdk/compare/1dd9cf10...a3575f1e ------------- Changes: https://git.openjdk.org/jdk/pull/23830/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23830&range=03 Stats: 291 lines in 14 files changed: 194 ins; 47 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/23830.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23830/head:pull/23830 PR: https://git.openjdk.org/jdk/pull/23830 From xpeng at openjdk.org Tue Mar 11 21:54:56 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 11 Mar 2025 21:54:56 GMT Subject: RFR: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 19:31:47 GMT, William Kemper wrote: > Failed allocations may race to cancel the GC with the collector who is working to clear the cancelled GC. When the GC wins this race, it will fail to notify threads that are waiting for the failed GC cycle to complete. This change directly tracks the number of threads waiting due to an allocation failure, rather than indirectly tracking them through the cancelled gc state. > > # Testing > Ran TestAllocHumongousFragment#generational 6,500 times without failures. The bug should also exist in classic Shenandoah without generations; I think ShenandoahControlThread also needs to be updated to fix the bug, even though it does not seem to happen in ShenandoahControlThread in the jtreg test. src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 281: > 279: > 280: { > 281: MonitorLocker ml(&_alloc_failure_waiters_lock); Should the notification code be encapsulated in method `notify_alloc_failure_waiters()`? ------------- PR Review: https://git.openjdk.org/jdk/pull/23997#pullrequestreview-2676193059 PR Review Comment: https://git.openjdk.org/jdk/pull/23997#discussion_r1990174134 From kdnilsen at openjdk.org Tue Mar 11 22:27:53 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 11 Mar 2025 22:27:53 GMT Subject: RFR: 8350898: Shenandoah: Eliminate final roots safepoint [v4] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 21:49:06 GMT, William Kemper wrote: >> This PR converts the final roots safepoint operation into a handshake. The safepoint operation still exists, but is only executed when `ShenandoahVerify` is enabled. In addition to this change, this PR also improves the logging for the concurrent preparation for update references from [PR 22688](https://github.com/openjdk/jdk/pull/22688). > > William Kemper has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 12 commits: > > - Merge remote-tracking branch 'jdk/master' into eliminate-final-roots > - Clarify which thread local buffers in comment > - Merge remote-tracking branch 'jdk/master' into eliminate-final-roots > - Fix comments > - Add whitespace at end of file > - More detail for init update refs event message > - Use timing tracker for timing verification > - Merge remote-tracking branch 'jdk/master' into eliminate-final-roots > - WIP: Fix up phase timings for newly concurrent final roots and init update refs > - WIP: Combine satb transfer with state propagation, restore phase timing data > - ... and 2 more: https://git.openjdk.org/jdk/compare/1dd9cf10...a3575f1e Marked as reviewed by kdnilsen (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23830#pullrequestreview-2676245948 From wkemper at openjdk.org Tue Mar 11 22:32:52 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Mar 2025 22:32:52 GMT Subject: RFR: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 21:52:00 GMT, Xiaolong Peng wrote: >> Failed allocations may race to cancel the GC with the collector who is working to clear the cancelled GC. When the GC wins this race, it will fail to notify threads that are waiting for the failed GC cycle to complete. This change directly tracks the number of threads waiting due to an allocation failure, rather than indirectly tracking them through the cancelled gc state. >> >> # Testing >> Ran TestAllocHumongousFragment#generational 6,500 times without failures. > > src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 281: > >> 279: >> 280: { >> 281: MonitorLocker ml(&_alloc_failure_waiters_lock); > > Should the notification code be encapsulated in method `notify_alloc_failure_waiters()`? Yes, will do this when I take out the waiter counts. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23997#discussion_r1990208348 From wkemper at openjdk.org Tue Mar 11 22:35:58 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Mar 2025 22:35:58 GMT Subject: RFR: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 19:31:47 GMT, William Kemper wrote: > Failed allocations may race to cancel the GC with the collector who is working to clear the cancelled GC. When the GC wins this race, it will fail to notify threads that are waiting for the failed GC cycle to complete. This change directly tracks the number of threads waiting due to an allocation failure, rather than indirectly tracking them through the cancelled gc state. > > # Testing > Ran TestAllocHumongousFragment#generational 6,500 times without failures. Not sure I want to change `ShenandoahControlThread.` It uses a different mechanism to track whether or not to notify. It only notifies when it services the alloc failure request (it doesn't depend on the shared `cancelled_gc` state the same way the generational mode does). In the scenario that leads to this live lock for the generational mode, the default mode would _not_ notify the waiters upon successful completion of the concurrent cycle. It would notify them after the subsequent degenerated cycle. 
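To make the difference concrete, the direct tracking described in the PR can be sketched as below. This is a hypothetical illustration with made-up names, not the actual ShenandoahController/ShenandoahGenerationalControlThread code:

```cpp
// Waiters register themselves under the monitor; the control thread then
// notifies based on the waiter count instead of inferring waiters from the
// shared cancelled-gc state, so a won race cannot strand them.
#include "runtime/mutexLocker.hpp"

class AllocFailureWaiters {
  Monitor* _lock;
  size_t   _waiting;   // protected by _lock
public:
  explicit AllocFailureWaiters(Monitor* lock) : _lock(lock), _waiting(0) {}

  // Called by a mutator after it cancels GC due to an allocation failure.
  void wait_for_cycle() {
    MonitorLocker ml(_lock);
    _waiting++;
    ml.wait();           // woken when a cycle that may satisfy the allocation ends
    _waiting--;
  }

  // Called by the control thread at the end of every cycle.
  void notify_waiters() {
    MonitorLocker ml(_lock);
    if (_waiting > 0) {
      ml.notify_all();
    }
  }
};
```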
------------- PR Comment: https://git.openjdk.org/jdk/pull/23997#issuecomment-2715851851 From wkemper at openjdk.org Tue Mar 11 22:59:45 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Mar 2025 22:59:45 GMT Subject: RFR: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational [v2] In-Reply-To: References: Message-ID: > Failed allocations may race to cancel the GC with the collector who is working to clear the cancelled GC. When the GC wins this race, it will fail to notify threads that are waiting for the failed GC cycle to complete. This change directly tracks the number of threads waiting due to an allocation failure, rather than indirectly tracking them through the cancelled gc state. > > # Testing > Ran TestAllocHumongousFragment#generational 6,500 times without failures. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Notify alloc waiters when GC completes without cancellation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23997/files - new: https://git.openjdk.org/jdk/pull/23997/files/d0168ca9..cb9cd72f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23997&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23997&range=00-01 Stats: 12 lines in 3 files changed: 0 ins; 9 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23997.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23997/head:pull/23997 PR: https://git.openjdk.org/jdk/pull/23997 From xpeng at openjdk.org Tue Mar 11 23:30:56 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 11 Mar 2025 23:30:56 GMT Subject: RFR: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 22:33:44 GMT, William Kemper wrote: > Not sure I want to change `ShenandoahControlThread.` It uses a different mechanism to track whether or not to notify. It only notifies when it services the alloc failure request (it doesn't depend on the shared `cancelled_gc` state the same way the generational mode does). In the scenario that leads to this live lock for the generational mode, the default mode would _not_ notify the waiters upon successful completion of the concurrent cycle. It would notify them after the subsequent degenerated cycle. It does use the shared cancelled_cause, see the code here https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp#L68 and at [line 171](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp#L171), `ShenandoahControlThread` does have same problem, alloc_failure_pending is evaluated using shared cancelled_cause before starting a cycle. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23997#issuecomment-2715950210 From wkemper at openjdk.org Tue Mar 11 23:40:24 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Mar 2025 23:40:24 GMT Subject: RFR: 8350905: Shenandoah: Releasing a WeakHandle's referent may extend its lifetime Message-ID: When weak handles are cleared, the `nullptr` is stored with the `ON_PHANTOM_OOP_REF` decorator. For concurrent collectors using a SATB barrier like Shenandoah, this may cause the referent to be enqueued and marked when it would be otherwise unreachable. 
The problem is especially acute for Shenandoah's generational mode, in which a young region holding the otherwise unreachable referent, may become trash after the referent is enqueued for old marking. Shenandoah's store barrier should be modified to not enqueue WEAK or PHANTOM stores in the SATB buffer. ------------- Commit messages: - Can only make assertions about reference strength for stores outside of the heap - Merge remote-tracking branch 'jdk/master' into satb-ignore-weak-store - Do not enqueue weak stores Changes: https://git.openjdk.org/jdk/pull/24001/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24001&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350905 Stats: 7 lines in 1 file changed: 4 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24001.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24001/head:pull/24001 PR: https://git.openjdk.org/jdk/pull/24001 From wkemper at openjdk.org Tue Mar 11 23:51:35 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Mar 2025 23:51:35 GMT Subject: RFR: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational [v3] In-Reply-To: References: Message-ID: > Failed allocations may race to cancel the GC with the collector who is working to clear the cancelled GC. When the GC wins this race, it will fail to notify threads that are waiting for the failed GC cycle to complete. This change directly tracks the number of threads waiting due to an allocation failure, rather than indirectly tracking them through the cancelled gc state. > > # Testing > Ran TestAllocHumongousFragment#generational 6,500 times without failures. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Unproblem list test that found this issue - Merge remote-tracking branch 'jdk/master' into fix-alloc-waiters-missed-notify - Notify alloc waiters when GC completes without cancellation - Track number of threads waiting on allocation failures for notification ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23997/files - new: https://git.openjdk.org/jdk/pull/23997/files/cb9cd72f..3a16131b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23997&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23997&range=01-02 Stats: 17915 lines in 138 files changed: 6689 ins; 10288 del; 938 mod Patch: https://git.openjdk.org/jdk/pull/23997.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23997/head:pull/23997 PR: https://git.openjdk.org/jdk/pull/23997 From wkemper at openjdk.org Tue Mar 11 23:53:52 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 11 Mar 2025 23:53:52 GMT Subject: RFR: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational In-Reply-To: References: Message-ID: <9LfJ2F0nrNM7VPPnLGwsu_jPi5UPuZwy8hsbjkcmPII=.15ea5668-6541-43a4-b012-723dd28d9efb@github.com> On Tue, 11 Mar 2025 23:28:32 GMT, Xiaolong Peng wrote: >> Not sure I want to change `ShenandoahControlThread.` It uses a different mechanism to track whether or not to notify. It only notifies when it services the alloc failure request (it doesn't depend on the shared `cancelled_gc` state the same way the generational mode does). 
In the scenario that leads to this live lock for the generational mode, the default mode would _not_ notify the waiters upon successful completion of the concurrent cycle. It would notify them after the subsequent degenerated cycle. > >> Not sure I want to change `ShenandoahControlThread.` It uses a different mechanism to track whether or not to notify. It only notifies when it services the alloc failure request (it doesn't depend on the shared `cancelled_gc` state the same way the generational mode does). In the scenario that leads to this live lock for the generational mode, the default mode would _not_ notify the waiters upon successful completion of the concurrent cycle. It would notify them after the subsequent degenerated cycle. > > It does use the shared cancelled_cause, see the code here https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp#L68 and at [line 171](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp#L171), `ShenandoahControlThread` does have same problem, alloc_failure_pending is evaluated using shared cancelled_cause before starting a cycle. @pengxiaolong , yes - I agree. Good catch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23997#issuecomment-2715984419 From wkemper at openjdk.org Wed Mar 12 00:05:05 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 12 Mar 2025 00:05:05 GMT Subject: RFR: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational [v4] In-Reply-To: References: Message-ID: > Failed allocations may race to cancel the GC with the collector who is working to clear the cancelled GC. When the GC wins this race, it will fail to notify threads that are waiting for the failed GC cycle to complete. This change directly tracks the number of threads waiting due to an allocation failure, rather than indirectly tracking them through the cancelled gc state. > > # Testing > Ran TestAllocHumongousFragment#generational 6,500 times without failures. William Kemper has updated the pull request incrementally with one additional commit since the last revision: The non-generational modes may also fail to notify waiters ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23997/files - new: https://git.openjdk.org/jdk/pull/23997/files/3a16131b..f72e71c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23997&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23997&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23997.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23997/head:pull/23997 PR: https://git.openjdk.org/jdk/pull/23997 From xpeng at openjdk.org Wed Mar 12 00:29:53 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 12 Mar 2025 00:29:53 GMT Subject: RFR: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational [v4] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 00:05:05 GMT, William Kemper wrote: >> Failed allocations may race to cancel the GC with the collector who is working to clear the cancelled GC. When the GC wins this race, it will fail to notify threads that are waiting for the failed GC cycle to complete. This change directly tracks the number of threads waiting due to an allocation failure, rather than indirectly tracking them through the cancelled gc state. 
>> >> # Testing >> Ran TestAllocHumongousFragment#generational 6,500 times without failures. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > The non-generational modes may also fail to notify waiters src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 171: > 169: > 170: // If this cycle completed without being cancelled, notify waiters about it > 171: if (!heap->cancelled_gc()) { I feel we should remove the test `!heap->cancelled_gc()` here. It is fine if there is a single mutator thread, but in most cases there are multiple mutator threads, and then the following case could happen: 1. **Mutator A** tries to cancel GC and notify the control thread; it will wait on `_alloc_failure_waiters_lock`, and `_cancelled_cause` is set to `_allocation_failure`. 2. The concurrent GC clears `_cancelled_cause` and sets it to `_no_gc` in op_final_update_refs. 3. **Mutator B** tries to cancel GC and successfully sets `_cancelled_cause` to `_allocation_failure` again. 4. The concurrent GC finishes. 5. The control thread checks `!heap->cancelled_gc()`, which is false, and won't wake up the mutators. In this case, the wake-up for mutators A & B is delayed to the next cycle. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23997#discussion_r1990344257 From stuefe at openjdk.org Wed Mar 12 06:32:52 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 12 Mar 2025 06:32:52 GMT Subject: RFR: 8351500: Random JVM crashes after task being moved to different NUMA node In-Reply-To: References: Message-ID: <8iglk9C-DVHYhScZcKoHy9zHwX2m1wQgloRVQqFb8bw=.d020fed8-2eb7-4a61-b0b2-769b1adc22e0@github.com> On Tue, 11 Mar 2025 14:01:11 GMT, Thomas Stuefe wrote: > For details, please see JBS issue. > > _Please note that this bug only shows symptoms in JDK 21 and JDK 17! Due to code shuffling done as part of G1 region-local pinning work, the error does not show in JDKs 22 and later._ > > I originally planned to fix this just for JDK 21 and 17 (see https://github.com/openjdk/jdk21u-dev/pull/1460). However, I would rather have it fixed in the mainline, even though it is symptom-free. It is a lingering issue that may surface later if the code is ever changed. Plus, this prevents the fix from being accidentally overwritten in JDK 21 if we backport. > > ---- > > The fix is simple: we fix (hah) the NUMA association for the full duration of a heap allocation in G1. That way, regardless of the OS scheduler moving the thread to a different NUMA node, we always use the same `G1AllocRegion` object, and changes in the control flow that rely on that won't break on NUMA. > > This has the disadvantage of allocating memory from a node we are potentially moving away from. However, I argue that this is exceedingly rare, and if it happens, the OS will cope by eventually migrating the memory to the correct node. > > --- > > Testing: > > Testing is difficult. See remark in JBS issue. > > I tested a modified version of this patch on JDK 21, where the error does cause crashes. I tested with an additional patch mimicking tons of NUMA node migrations. As I wrote in JBS, I plan to contribute that "FakeNUMA" mode eventually, but lack the time to polish it up for now. I hope the fix is simple and uncontested enough to go in quickly, since I would like to fix JDK 21 soon via backporting this patch. @kstefanj , could you take a look?
------------- PR Comment: https://git.openjdk.org/jdk/pull/23984#issuecomment-2716686647 From sjohanss at openjdk.org Wed Mar 12 07:18:08 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 12 Mar 2025 07:18:08 GMT Subject: RFR: 8351167: ZGC: Lazily initialize livemap [v2] In-Reply-To: References: <05cL5-IAaVEDpyTUxQ61JqzRzgM6myzEsIWt-xBLJRM=.ad94b7e8-50e0-4447-952d-995e143b5218@github.com> Message-ID: On Tue, 4 Mar 2025 20:17:28 GMT, Joel Sikstr?m wrote: >> Memory for the bitmap inside the livemap of a ZPage is currently allocated upon calling its constructor, which adds a latency overhead when allocating pages. As preparation for the Mapped Cache ([JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441)), but also as a standalone improvement, we want to instead lazily initialize the livemap's bitmap. >> >> This patch holds off with allocating memory for the bitmap that the livemap uses until the livemap is written to the first time (i.e. by calling ZLiveMap::set). The effect of this is that the latency impact of allocating the bitmap will only be taken by GC threads and not by mutator threads, since only GC threads mark objects before pushing them onto the mark stack. This improvement will reduce page allocation latencies somewhat. >> >> In addition to lazily allocating the bitmap, I've converted the static C-style cast to a checked cast for `ZPage::object_max_count()`, which is passed as the size to the bitmaps. This is because a value not contained in 32 bits will overflow with the C-style cast and give a too small bitmap when passed to the livemap. This is not an observed issue, just more of a sanity check. >> >> Testing: >> * Tiers 1-5 >> * GHA > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Copyright years Looks good. ------------- Marked as reviewed by sjohanss (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23907#pullrequestreview-2677230303 From sjohanss at openjdk.org Wed Mar 12 07:38:56 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 12 Mar 2025 07:38:56 GMT Subject: RFR: 8351500: Random JVM crashes after task being moved to different NUMA node In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 14:01:11 GMT, Thomas Stuefe wrote: > For details, please see JBS issue. > > _Please note that this bug only shows symptoms in JDK 21 and JDK 17! Due to code shuffling done as part of G1 region-local pinning work, the error does not show in JDKs 22 and later._ > > I originally planned to fix this just for JDK 21 and 17 (see https://github.com/openjdk/jdk21u-dev/pull/1460). However, I would rather have it fixed in the mainline, even though it is symptom-free. It is a lingering issue that may surface later if the code is ever changed. Plus, this prevents the fix from being accidentally overwritten in JDK 21 if we backport. > > ---- > > The fix is simple: we fix (hah) the NUMA association for the full duration of a heap allocation in G1. That way, regardless of the OS scheduler moving the thread to a different NUMA node, we always use the same `G1AllocRegion` object, and changes in the control flow that rely on that won't break on NUMA. > > This has the disadvantage of allocating memory from a node we are potentially moving away from. However, I argue that this is exceedingly rare, and if it happens, the OS will cope by eventually migrating the memory to the correct node. > > --- > > Testing: > > Testing is difficult. See remark in JBS issue. 
> > I tested a modified version of this patch on JDK 21, where the error does cause crashes. I tested with an additional patch mimicking tons of NUMA node migrations. As I wrote in JBS, I plan to contribute that "FakeNUMA" mode eventually, but lack the time to polish it up for now. I hope the fix is simple and uncontested enough to go in quickly, since I would like to fix JDK 21 soon via backporting this patch. I think this looks like a reasonable fix, I'll talk to the rest of the team later today. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23984#issuecomment-2716903846 From stuefe at openjdk.org Wed Mar 12 08:51:57 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 12 Mar 2025 08:51:57 GMT Subject: RFR: 8351500: Random JVM crashes after task being moved to different NUMA node In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 07:36:02 GMT, Stefan Johansson wrote: > I think this looks like a reasonable fix, I'll talk to the rest of the team later today. Thanks Stefan! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23984#issuecomment-2717092363 From eosterlund at openjdk.org Wed Mar 12 09:41:57 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 12 Mar 2025 09:41:57 GMT Subject: RFR: 8351167: ZGC: Lazily initialize livemap [v2] In-Reply-To: References: <05cL5-IAaVEDpyTUxQ61JqzRzgM6myzEsIWt-xBLJRM=.ad94b7e8-50e0-4447-952d-995e143b5218@github.com> Message-ID: <9FDmrlyfL9DHJImGay7SUyMriVrNbv6HhC46DNgZ--A=.91775fa6-eda2-473d-87da-6515d62f0b86@github.com> On Tue, 4 Mar 2025 20:17:28 GMT, Joel Sikstr?m wrote: >> Memory for the bitmap inside the livemap of a ZPage is currently allocated upon calling its constructor, which adds a latency overhead when allocating pages. As preparation for the Mapped Cache ([JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441)), but also as a standalone improvement, we want to instead lazily initialize the livemap's bitmap. >> >> This patch holds off with allocating memory for the bitmap that the livemap uses until the livemap is written to the first time (i.e. by calling ZLiveMap::set). The effect of this is that the latency impact of allocating the bitmap will only be taken by GC threads and not by mutator threads, since only GC threads mark objects before pushing them onto the mark stack. This improvement will reduce page allocation latencies somewhat. >> >> In addition to lazily allocating the bitmap, I've converted the static C-style cast to a checked cast for `ZPage::object_max_count()`, which is passed as the size to the bitmaps. This is because a value not contained in 32 bits will overflow with the C-style cast and give a too small bitmap when passed to the livemap. This is not an observed issue, just more of a sanity check. >> >> Testing: >> * Tiers 1-5 >> * GHA > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Copyright years Seems reasonable. ------------- Marked as reviewed by eosterlund (Reviewer). 
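Expressed as code, the core of the fix is simply that the node index is sampled once per allocation attempt and threaded through. A rough sketch with illustrative names (the surrounding signatures are not the exact G1 ones):

```cpp
// Read the NUMA node index exactly once, so an OS migration in the middle of
// the attempt cannot make successive steps use different G1AllocRegion objects.
HeapWord* attempt_allocation_on_fixed_node(G1NUMA* numa,
                                           size_t min_word_size,
                                           size_t desired_word_size,
                                           size_t* actual_word_size) {
  const uint node_index = numa->index_of_current_thread();  // sampled once
  // All retries and fallbacks below must reuse node_index rather than
  // re-querying the OS for the current node.
  return attempt_allocation(node_index, min_word_size, desired_word_size,
                            actual_word_size);
}
```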
PR Review: https://git.openjdk.org/jdk/pull/23907#pullrequestreview-2677735385 From eosterlund at openjdk.org Wed Mar 12 09:45:03 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 12 Mar 2025 09:45:03 GMT Subject: RFR: 8351216: ZGC: Store NUMA node count In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 20:06:08 GMT, Joel Sikstr?m wrote: > To avoid calling into `os::Linux::max_numa_node()` and in turn libnuma on every count lookup, I propose we instead store the count statically inside ZNUMA. This is perfectly fine since the value that we get from libnuma is configured once during initialization and never change during runtime. > > The count is set during platform dependent initialization and the getter is now defined in the common code in ZNUMA.cpp. On operating systems that ZGC does not support NUMA for (BSD and Windows) we keep the current behavior by setting the count to 1. > > This is also preparation work for the Mapped Cache ([JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441)). > > Testing: > * Tiers 1-3 > * GHA > * Verify that the count is set on a Linux system with NUMA hardware Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23922#pullrequestreview-2677744474 From shade at openjdk.org Wed Mar 12 10:38:02 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Mar 2025 10:38:02 GMT Subject: RFR: 8350905: Shenandoah: Releasing a WeakHandle's referent may extend its lifetime In-Reply-To: References: Message-ID: <2Hcn5dvmiq7DKbSwecZyrIeTxwbGVU2ISKgyoFdS-sk=.be164c1b-b4b9-47cc-941b-ecd9d25d5fb1@github.com> On Tue, 11 Mar 2025 23:35:24 GMT, William Kemper wrote: > When weak handles are cleared, the `nullptr` is stored with the `ON_PHANTOM_OOP_REF` decorator. For concurrent collectors using a SATB barrier like Shenandoah, this may cause the referent to be enqueued and marked when it would be otherwise unreachable. The problem is especially acute for Shenandoah's generational mode, in which a young region holding the otherwise unreachable referent, may become trash after the referent is enqueued for old marking. Shenandoah's store barrier should be modified to not enqueue WEAK or PHANTOM stores in the SATB buffer. Some nits: src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 159: > 157: HasDecorator::value || > 158: HasDecorator::value || > 159: HasDecorator::value) { Suggest to split it into two things, with comments: // Uninitialized and no-keepalive stores do not need barrier. if (HasDecorator::value || HasDecorator::value) { return; } // Stores to weak/phantom require no barrier. The old references would // have been resurrected by load barrier if they were needed. if (HasDecorator::value || HasDecorator::value) { return; } (I think I caught the reason why we are safe to skip SATB here, maybe comment can be expanded) src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 279: > 277: oop_store_common(addr, value); > 278: if (ShenandoahCardBarrier) { > 279: barrier_set()->write_ref_field_post(addr); Unnecessary change? ------------- Marked as reviewed by shade (Reviewer). 
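As a generic illustration of the lazy-initialization pattern described in the PR (not the actual ZLiveMap code, and ignoring the concurrency protocol the real implementation needs):

```cpp
// Defer allocating the bitmap backing memory until the first mark, so page
// allocation stays cheap and only marking threads pay the allocation cost.
#include <cstdint>
#include <cstdlib>

class LazyLiveMap {
  uint64_t* _bits;
  size_t    _size_in_words;
public:
  explicit LazyLiveMap(size_t nbits)
    : _bits(nullptr), _size_in_words((nbits + 63) / 64) {}
  ~LazyLiveMap() { free(_bits); }

  void set(size_t index) {
    if (_bits == nullptr) {
      // First write triggers the (zeroed) allocation.
      _bits = static_cast<uint64_t*>(calloc(_size_in_words, sizeof(uint64_t)));
    }
    _bits[index / 64] |= uint64_t(1) << (index % 64);
  }

  bool get(size_t index) const {
    return _bits != nullptr && ((_bits[index / 64] >> (index % 64)) & 1) != 0;
  }
};
```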
PR Review: https://git.openjdk.org/jdk/pull/24001#pullrequestreview-2677892517 PR Review Comment: https://git.openjdk.org/jdk/pull/24001#discussion_r1991165939 PR Review Comment: https://git.openjdk.org/jdk/pull/24001#discussion_r1991163625 From tschatzl at openjdk.org Wed Mar 12 11:27:52 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Mar 2025 11:27:52 GMT Subject: RFR: 8351500: G1: NUMA migrations cause crashes in region allocation In-Reply-To: References: Message-ID: <0Zu7ErYNI_4FhaOxcHecxPgXMdOeL1QT-P2VmWm0eyQ=.6ab2a279-b0d9-4a7f-8f8e-f873c9ad77e7@github.com> On Tue, 11 Mar 2025 14:01:11 GMT, Thomas Stuefe wrote: > For details, please see JBS issue. > > _Please note that this bug only shows symptoms in JDK 21 and JDK 17! Due to code shuffling done as part of G1 region-local pinning work, the error does not show in JDKs 22 and later._ > > I originally planned to fix this just for JDK 21 and 17 (see https://github.com/openjdk/jdk21u-dev/pull/1460). However, I would rather have it fixed in the mainline, even though it is symptom-free. It is a lingering issue that may surface later if the code is ever changed. Plus, this prevents the fix from being accidentally overwritten in JDK 21 if we backport. > > ---- > > The fix is simple: we fix (hah) the NUMA association for the full duration of a heap allocation in G1. That way, regardless of the OS scheduler moving the thread to a different NUMA node, we always use the same `G1AllocRegion` object, and changes in the control flow that rely on that won't break on NUMA. > > This has the disadvantage of allocating memory from a node we are potentially moving away from. However, I argue that this is exceedingly rare, and if it happens, the OS will cope by eventually migrating the memory to the correct node. > > --- > > Testing: > > Testing is difficult. See remark in JBS issue. > > I tested a modified version of this patch on JDK 21, where the error does cause crashes. I tested with an additional patch mimicking tons of NUMA node migrations. As I wrote in JBS, I plan to contribute that "FakeNUMA" mode eventually, but lack the time to polish it up for now. I hope the fix is simple and uncontested enough to go in quickly, since I would like to fix JDK 21 soon via backporting this patch. src/hotspot/share/gc/g1/g1Allocator.inline.hpp line 55: > 53: size_t desired_word_size, > 54: size_t* actual_word_size, > 55: uint node_index) { (I errorneously commented this on the JDK 21 change, but it is the same here; since this is a pre-existing issue we might want to fix all this in a separate CR though) I would strongly prefer if node_index were not the last argument - it is an input argument, and should be next to the other input arguments. Ie. right now the types of arguments are "input, input, output, input" (min_word_size, desired_word_size, actual_word_size, node_index) which is imo not good style. Maybe this should also be fixed in survivor_attempt_allocation and other similar places (this is a pre-existing issue; I do remember we just tacked the parameter on when we added NUMA support). Also the comment should be updated to something like Attempt allocation in the current alloc region on the given node or so. Another option would be to explicitly group the allocation type parameters into a struct, e.g. struct AllocParams { size_t desired_word_size; uint node_index; } to make it explicit that we want to enforce this closeness. not required though, as for apart from this method the number of arguments isn't that bad. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23984#discussion_r1991270482 From tschatzl at openjdk.org Wed Mar 12 11:42:52 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Mar 2025 11:42:52 GMT Subject: RFR: 8351167: ZGC: Lazily initialize livemap [v2] In-Reply-To: References: <05cL5-IAaVEDpyTUxQ61JqzRzgM6myzEsIWt-xBLJRM=.ad94b7e8-50e0-4447-952d-995e143b5218@github.com> Message-ID: On Tue, 4 Mar 2025 20:17:28 GMT, Joel Sikstr?m wrote: >> Memory for the bitmap inside the livemap of a ZPage is currently allocated upon calling its constructor, which adds a latency overhead when allocating pages. As preparation for the Mapped Cache ([JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441)), but also as a standalone improvement, we want to instead lazily initialize the livemap's bitmap. >> >> This patch holds off with allocating memory for the bitmap that the livemap uses until the livemap is written to the first time (i.e. by calling ZLiveMap::set). The effect of this is that the latency impact of allocating the bitmap will only be taken by GC threads and not by mutator threads, since only GC threads mark objects before pushing them onto the mark stack. This improvement will reduce page allocation latencies somewhat. >> >> In addition to lazily allocating the bitmap, I've converted the static C-style cast to a checked cast for `ZPage::object_max_count()`, which is passed as the size to the bitmaps. This is because a value not contained in 32 bits will overflow with the C-style cast and give a too small bitmap when passed to the livemap. This is not an observed issue, just more of a sanity check. >> >> Testing: >> * Tiers 1-5 >> * GHA > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Copyright years Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23907#pullrequestreview-2678106656 From tschatzl at openjdk.org Wed Mar 12 11:48:54 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Mar 2025 11:48:54 GMT Subject: RFR: 8351216: ZGC: Store NUMA node count In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 20:06:08 GMT, Joel Sikstr?m wrote: > To avoid calling into `os::Linux::max_numa_node()` and in turn libnuma on every count lookup, I propose we instead store the count statically inside ZNUMA. This is perfectly fine since the value that we get from libnuma is configured once during initialization and never change during runtime. > > The count is set during platform dependent initialization and the getter is now defined in the common code in ZNUMA.cpp. On operating systems that ZGC does not support NUMA for (BSD and Windows) we keep the current behavior by setting the count to 1. > > This is also preparation work for the Mapped Cache ([JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441)). > > Testing: > * Tiers 1-3 > * GHA > * Verify that the count is set on a Linux system with NUMA hardware Marked as reviewed by tschatzl (Reviewer). 
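For illustration, taking the grouping idea one step further might look like this at the declaration and call site (hypothetical, not part of this patch):

```cpp
// Group the input parameters so node_index cannot drift away from the sizes
// in long argument lists; outputs remain explicit out-pointers.
struct G1AllocRequest {
  size_t min_word_size;
  size_t desired_word_size;
  uint   node_index;
};

HeapWord* attempt_allocation(const G1AllocRequest& req, size_t* actual_word_size);

// Call site:
//   G1AllocRequest req = { min_word_size, desired_word_size, node_index };
//   HeapWord* result = attempt_allocation(req, &actual_word_size);
```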
------------- PR Review: https://git.openjdk.org/jdk/pull/23922#pullrequestreview-2678126620 From tschatzl at openjdk.org Wed Mar 12 11:58:45 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Mar 2025 11:58:45 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v17] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision: - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * optimized RISCV gen_write_ref_array_post_barrier() implementation contributed by @RealFYang - * fix card table verification crashes: in the first refinement phase, when switching the global card tables, we need to re-check whether we are still in the same sweep epoch or not. 
It might have changed due to a GC interrupting acquiring the Heap_lock. Otherwise new threads will scribble on the refinement table. Cause are last-minute changes before making the PR ready to review. Testing: without the patch, occurs fairly frequently when continuously (1 in 20) starting refinement. Does not afterward. - * ayang review 3 * comments * minor refactorings - * iwalulya review * renaming * fix some includes, forward declaration - * fix whitespace * additional whitespace between log tags * rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the other similar rename - ayang review * renamings * refactorings - iwalulya review * comments for variables tracking to-collection-set and just dirtied cards after GC/refinement * predicate for determining whether the refinement has been disabled * some other typos/comment improvements * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with naming - * ayang review - fix comment - * iwalulya review 2 * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState * some additional documentation - ... and 14 more: https://git.openjdk.org/jdk/compare/f77fa17b...aec95051 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/758fac01..aec95051 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=15-16 Stats: 78123 lines in 1539 files changed: 36243 ins; 29177 del; 12703 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From stuefe at openjdk.org Wed Mar 12 13:27:17 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 12 Mar 2025 13:27:17 GMT Subject: RFR: 8351500: G1: NUMA migrations cause crashes in region allocation [v2] In-Reply-To: References: Message-ID: > For details, please see JBS issue. > > _Please note that this bug only shows symptoms in JDK 21 and JDK 17! Due to code shuffling done as part of G1 region-local pinning work, the error does not show in JDKs 22 and later._ > > I originally planned to fix this just for JDK 21 and 17 (see https://github.com/openjdk/jdk21u-dev/pull/1460). However, I would rather have it fixed in the mainline, even though it is symptom-free. It is a lingering issue that may surface later if the code is ever changed. Plus, this prevents the fix from being accidentally overwritten in JDK 21 if we backport. > > ---- > > The fix is simple: we fix (hah) the NUMA association for the full duration of a heap allocation in G1. That way, regardless of the OS scheduler moving the thread to a different NUMA node, we always use the same `G1AllocRegion` object, and changes in the control flow that rely on that won't break on NUMA. > > This has the disadvantage of allocating memory from a node we are potentially moving away from. However, I argue that this is exceedingly rare, and if it happens, the OS will cope by eventually migrating the memory to the correct node. > > --- > > Testing: > > Testing is difficult. See remark in JBS issue. > > I tested a modified version of this patch on JDK 21, where the error does cause crashes. I tested with an additional patch mimicking tons of NUMA node migrations. As I wrote in JBS, I plan to contribute that "FakeNUMA" mode eventually, but lack the time to polish it up for now. 
I hope the fix is simple and uncontested enough to go in quickly, since I would like to fix JDK 21 soon via backporting this patch. Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: node_index parameter should precede output parameters ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23984/files - new: https://git.openjdk.org/jdk/pull/23984/files/0c6f547b..c8870820 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23984&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23984&range=00-01 Stats: 18 lines in 4 files changed: 0 ins; 0 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/23984.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23984/head:pull/23984 PR: https://git.openjdk.org/jdk/pull/23984 From ayang at openjdk.org Wed Mar 12 13:34:04 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 12 Mar 2025 13:34:04 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v17] In-Reply-To: References: Message-ID: <0w7seS1tIFhUxnmStxQySISWVfpBBsRmUtx7EoTy9a4=.509a3d5e-56d0-4fd8-8896-51835b14302b@github.com> On Wed, 12 Mar 2025 11:58:45 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. 
>> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision: > > - Merge branch 'master' into 8342382-card-table-instead-of-dcq > - * optimized RISCV gen_write_ref_array_post_barrier() implementation contributed by @RealFYang > - * fix card table verification crashes: in the first refinement phase, when switching the global card tables, we need to re-check whether we are still in the same sweep epoch or not. It might have changed due to a GC interrupting acquiring the Heap_lock. Otherwise new threads will scribble on the refinement table. > Cause are last-minute changes before making the PR ready to review. > > Testing: without the patch, occurs fairly frequently when continuously > (1 in 20) starting refinement. Does not afterward. > - * ayang review 3 > * comments > * minor refactorings > - * iwalulya review > * renaming > * fix some includes, forward declaration > - * fix whitespace > * additional whitespace between log tags > * rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the other similar rename > - ayang review > * renamings > * refactorings > - iwalulya review > * comments for variables tracking to-collection-set and just dirtied cards after GC/refinement > * predicate for determining whether the refinement has been disabled > * some other typos/comment improvements > * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with naming > - * ayang review - fix comment > - * iwalulya review 2 > * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState > * some additional documentation > - ... and 14 more: https://git.openjdk.org/jdk/compare/53a66058...aec95051 src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 217: > 215: > 216: { > 217: SuspendibleThreadSetLeaver sts_leave; Can you add some comment on why leaving the set is required? It's not obvious to me why. I'd expect handshake to work out of the box... src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 263: > 261: > 262: SuspendibleThreadSetLeaver sts_leave; > 263: VMThread::execute(&op); Can you elaborate what synchronization this VM op is trying to achieve? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1991489399 PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1991382024 From stuefe at openjdk.org Wed Mar 12 13:33:55 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 12 Mar 2025 13:33:55 GMT Subject: RFR: 8351500: G1: NUMA migrations cause crashes in region allocation [v2] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 13:27:17 GMT, Thomas Stuefe wrote: >> For details, please see JBS issue. >> >> _Please note that this bug only shows symptoms in JDK 21 and JDK 17! Due to code shuffling done as part of G1 region-local pinning work, the error does not show in JDKs 22 and later._ >> >> I originally planned to fix this just for JDK 21 and 17 (see https://github.com/openjdk/jdk21u-dev/pull/1460). 
However, I would rather have it fixed in the mainline, even though it is symptom-free. It is a lingering issue that may surface later if the code is ever changed. Plus, this prevents the fix from being accidentally overwritten in JDK 21 if we backport. >> >> ---- >> >> The fix is simple: we fix (hah) the NUMA association for the full duration of a heap allocation in G1. That way, regardless of the OS scheduler moving the thread to a different NUMA node, we always use the same `G1AllocRegion` object, and changes in the control flow that rely on that won't break on NUMA. >> >> This has the disadvantage of allocating memory from a node we are potentially moving away from. However, I argue that this is exceedingly rare, and if it happens, the OS will cope by eventually migrating the memory to the correct node. >> >> --- >> >> Testing: >> >> Testing is difficult. See remark in JBS issue. >> >> I tested a modified version of this patch on JDK 21, where the error does cause crashes. I tested with an additional patch mimicking tons of NUMA node migrations. As I wrote in JBS, I plan to contribute that "FakeNUMA" mode eventually, but lack the time to polish it up for now. I hope the fix is simple and uncontested enough to go in quickly, since I would like to fix JDK 21 soon via backporting this patch. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > node_index parameter should precede output parameters @tschatzl Thanks for looking at this. I tried to find a minimally invasive way to do the struct solution, but its not so easy. I want to keep the patch somewhat minimal and focused on the problem to solve, also to be able to downport it to 21 easily. So I did a compromise, moving node_index to a unified position in G1Allocator and, where necessary, G1CollectedHeap, while not touching other uses of this parameter. While working on this, I found that the ambiguity between size_t and uint is annoying when dealing with parameters that are a mixture of those; there is not much type safety when using these raw types. We have three different parameter groupings: {size, node_index}, {desired, min, node_index, output size} and {desired, min, output size}. Having a structure using all three does not do much for clarity (some members make no sense in some cases etc). What I would do if I were to rework this: - add a type for node index, ideally something that cannot be converted automatically to a size_t, but something that can be handed around via value. E.g. a `struct NodeIndex { uint _v; };`. It could be homed in g1NUMA.hpp. - then add a struct for the typical min+desired+actual parameter group, three size_t. I played around with this but the patch got far too big for this simple crash fix. Would be something for a cleanup RFE. 
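A small sketch of that cleanup idea, following the `struct NodeIndex { uint _v; };` shape above (illustration only):

```cpp
// A node index that neither converts implicitly from uint nor to size_t,
// so it can no longer be confused with the size parameters next to it.
class G1NodeIndex {
  uint _v;
public:
  explicit G1NodeIndex(uint v) : _v(v) {}   // explicit: no silent uint -> index
  uint value() const { return _v; }         // access is always spelled out
  bool operator==(G1NodeIndex other) const { return _v == other._v; }
};

// A signature like the ones discussed above would then read:
//   HeapWord* attempt_allocation(size_t min_word_size, size_t desired_word_size,
//                                G1NodeIndex node_index, size_t* actual_word_size);
```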
------------- PR Comment: https://git.openjdk.org/jdk/pull/23984#issuecomment-2717898769 From ayang at openjdk.org Wed Mar 12 13:33:59 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 12 Mar 2025 13:33:59 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v14] In-Reply-To: <5w6qUwzDQadxseocRl6rRF0AllyeukWTpYl2XjAfiTE=.fb62a50e-e308-4d08-8057-67e70e13ccbb@github.com> References: <5w6qUwzDQadxseocRl6rRF0AllyeukWTpYl2XjAfiTE=.fb62a50e-e308-4d08-8057-67e70e13ccbb@github.com> Message-ID: On Fri, 7 Mar 2025 13:14:02 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * iwalulya review >> * renaming >> * fix some includes, forward declaration > > src/hotspot/share/gc/g1/g1CardTable.hpp line 76: > >> 74: g1_card_already_scanned = 0x1, >> 75: g1_to_cset_card = 0x2, >> 76: g1_from_remset_card = 0x4 > > Could you outline the motivation for this more precise info? Is it for optimization or essentially for correctness? OK, it's for better performance, not correctness. How much is the improvement? As I understand it, this more precise info is largely independent of the new barrier logic. I wonder if it makes sense to extract this out to its own ticket to better assess its impact on perf and impl complexity. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1991375754 From shade at openjdk.org Wed Mar 12 13:39:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Mar 2025 13:39:53 GMT Subject: RFR: 8351444: Shenandoah: Class Unloading may encounter recycled oops [v3] In-Reply-To: References: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com> Message-ID: On Tue, 11 Mar 2025 19:01:31 GMT, William Kemper wrote: > A region will be usable for allocation as soon as it is recycled, so, in a sense, this has the same effect as turning off the weak roots flag immediately after class unloading. Right. This answers my original question. > What do you think? This would be a separate PR of course, but do you see any reason something like this wouldn't work? It looks to me as stretching the definition of "trash" even further? I think it would be conceptually cleaner to never turn regular regions into trash until after weak roots are done. So accesses to "dead" weak roots are still possible like a regular access to "regular" region. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23951#discussion_r1991522731 From tschatzl at openjdk.org Wed Mar 12 14:00:15 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Mar 2025 14:00:15 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v17] In-Reply-To: <0w7seS1tIFhUxnmStxQySISWVfpBBsRmUtx7EoTy9a4=.509a3d5e-56d0-4fd8-8896-51835b14302b@github.com> References: <0w7seS1tIFhUxnmStxQySISWVfpBBsRmUtx7EoTy9a4=.509a3d5e-56d0-4fd8-8896-51835b14302b@github.com> Message-ID: On Wed, 12 Mar 2025 12:23:50 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 24 additional commits since the last revision: >> >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - * optimized RISCV gen_write_ref_array_post_barrier() implementation contributed by @RealFYang >> - * fix card table verification crashes: in the first refinement phase, when switching the global card tables, we need to re-check whether we are still in the same sweep epoch or not. It might have changed due to a GC interrupting acquiring the Heap_lock. Otherwise new threads will scribble on the refinement table. >> Cause are last-minute changes before making the PR ready to review. >> >> Testing: without the patch, occurs fairly frequently when continuously >> (1 in 20) starting refinement. Does not afterward. >> - * ayang review 3 >> * comments >> * minor refactorings >> - * iwalulya review >> * renaming >> * fix some includes, forward declaration >> - * fix whitespace >> * additional whitespace between log tags >> * rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the other similar rename >> - ayang review >> * renamings >> * refactorings >> - iwalulya review >> * comments for variables tracking to-collection-set and just dirtied cards after GC/refinement >> * predicate for determining whether the refinement has been disabled >> * some other typos/comment improvements >> * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with naming >> - * ayang review - fix comment >> - * iwalulya review 2 >> * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState >> * some additional documentation >> - ... and 14 more: https://git.openjdk.org/jdk/compare/5727f166...aec95051 > > src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 263: > >> 261: >> 262: SuspendibleThreadSetLeaver sts_leave; >> 263: VMThread::execute(&op); > > Can you elaborate what synchronization this VM op is trying to achieve? Memory visibility for refinement threads for the references written to the heap. Without them, they may not have received the most recent values. This is the same as the `StoreLoad` barriers synchronization between mutator and refinement threads imo. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1991561707 From ecaspole at openjdk.org Wed Mar 12 15:54:28 2025 From: ecaspole at openjdk.org (Eric Caspole) Date: Wed, 12 Mar 2025 15:54:28 GMT Subject: RFR: 8346470: Improve WriteBarrier JMH to have old-to-young refs Message-ID: Adds new cases based on the existing ones using an extra iteration setup to allocate into young gen, allowing to test old-to-young stores etc, where the existing code is all old-to-old. Here is a run on a standard OCI A1.160 with JDK 25: Benchmark Mode Cnt Score Error Units WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3448.826 ? 1.620 ns/op WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.740 ? 0.151 ns/op WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.353 ? 0.860 ns/op WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 12 63.565 ? 0.132 ns/op WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungLarge avgt 12 4476.390 ? 0.930 ns/op WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungSmall avgt 12 73.517 ? 0.082 ns/op WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealLarge avgt 12 3103.911 ? 
2.079 ns/op WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealSmall avgt 12 57.549 ? 0.512 ns/op WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldLarge avgt 12 9587.762 ? 2.044 ns/op WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldSmall avgt 12 157.244 ? 0.169 ns/op WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungLarge avgt 12 3103.191 ? 1.100 ns/op WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungSmall avgt 12 57.392 ? 0.624 ns/op WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPath avgt 12 2.668 ? 0.001 ns/op WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPathYoungRef avgt 12 9.337 ? 0.001 ns/op WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3449.234 ? 1.840 ns/op WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.720 ? 0.079 ns/op WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.055 ? 0.665 ns/op WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 12 63.555 ? 0.116 ns/op WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathOldToYoungLarge avgt 12 4476.646 ? 1.560 ns/op WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathOldToYoungSmall avgt 12 73.525 ? 0.070 ns/op WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathRealLarge avgt 12 3104.395 ? 1.537 ns/op WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathRealSmall avgt 12 57.766 ? 0.136 ns/op WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathYoungToOldLarge avgt 12 9586.585 ? 1.086 ns/op WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathYoungToOldSmall avgt 12 157.147 ? 0.128 ns/op WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathYoungToYoungLarge avgt 12 3103.531 ? 0.883 ns/op WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathYoungToYoungSmall avgt 12 57.669 ? 0.620 ns/op WriteBarrier.WithoutUnrolling.testFieldWriteBarrierFastPath avgt 12 2.668 ? 0.001 ns/op WriteBarrier.WithoutUnrolling.testFieldWriteBarrierFastPathYoungRef avgt 12 9.337 ? 0.001 ns/op ------------- Commit messages: - cleanup and copyright year - 8346470: Improve WriteBarrier JMH to have old-to-young refs Changes: https://git.openjdk.org/jdk/pull/24010/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24010&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346470 Stats: 94 lines in 1 file changed: 92 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24010.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24010/head:pull/24010 PR: https://git.openjdk.org/jdk/pull/24010 From jsikstro at openjdk.org Wed Mar 12 16:05:15 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 12 Mar 2025 16:05:15 GMT Subject: RFR: 8351216: ZGC: Store NUMA node count [v2] In-Reply-To: References: Message-ID: > To avoid calling into `os::Linux::max_numa_node()` and in turn libnuma on every count lookup, I propose we instead store the count statically inside ZNUMA. This is perfectly fine since the value that we get from libnuma is configured once during initialization and never change during runtime. > > The count is set during platform dependent initialization and the getter is now defined in the common code in ZNUMA.cpp. On operating systems that ZGC does not support NUMA for (BSD and Windows) we keep the current behavior by setting the count to 1. 
> > This is also preparation work for the Mapped Cache ([JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441)). > > Testing: > * Tiers 1-3 > * GHA > * Verify that the count is set on a Linux system with NUMA hardware Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: Style fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23922/files - new: https://git.openjdk.org/jdk/pull/23922/files/a2acf21a..45dc106a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23922&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23922&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23922.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23922/head:pull/23922 PR: https://git.openjdk.org/jdk/pull/23922 From wkemper at openjdk.org Wed Mar 12 16:42:59 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 12 Mar 2025 16:42:59 GMT Subject: RFR: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational [v4] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 00:24:46 GMT, Xiaolong Peng wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> The non-generational modes may also fail to notify waiters > > src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 171: > >> 169: >> 170: // If this cycle completed without being cancelled, notify waiters about it >> 171: if (!heap->cancelled_gc()) { > > I feel we should remove the test `!heap->cancelled_gc()` here, if is fine if there is single mutator thread, but in most cases there are mutator threads, then the following case could happen: > 1. **Mutator A** try to cancel GC and notify control thread, it will wait with `_alloc_failure_waiters_lock`, `_cancelled_cause` is set to `_allocation_failure` > 2. Concurrent GC clear `_cancelled_cause` and set it to `_no_gc` in op_final_update_refs > 3. **Mutator B** try to cancel GC and successfully set `_cancelled_cause` to `_allocation_failure` again. > 4. Concurrent GC finishes. > 5. Control thread check `!heap->cancelled_gc()` which is false, and won't wake up mutators. > > In this case, it will delay the wake up for mutator A & B to next cycle. I believe that is the correct behavior. The mutators are waiting until there is memory available. If mutator B cannot allocate, there is no reason to believe mutator A would be able to allocate. In this case, it is fine for both mutators to wait (even if it means A has to wait an extra cycle). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23997#discussion_r1991893985 From wkemper at openjdk.org Wed Mar 12 16:46:55 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 12 Mar 2025 16:46:55 GMT Subject: RFR: 8350905: Shenandoah: Releasing a WeakHandle's referent may extend its lifetime In-Reply-To: <2Hcn5dvmiq7DKbSwecZyrIeTxwbGVU2ISKgyoFdS-sk=.be164c1b-b4b9-47cc-941b-ecd9d25d5fb1@github.com> References: <2Hcn5dvmiq7DKbSwecZyrIeTxwbGVU2ISKgyoFdS-sk=.be164c1b-b4b9-47cc-941b-ecd9d25d5fb1@github.com> Message-ID: On Wed, 12 Mar 2025 10:30:58 GMT, Aleksey Shipilev wrote: >> When weak handles are cleared, the `nullptr` is stored with the `ON_PHANTOM_OOP_REF` decorator. For concurrent collectors using a SATB barrier like Shenandoah, this may cause the referent to be enqueued and marked when it would be otherwise unreachable. 
The problem is especially acute for Shenandoah's generational mode, in which a young region holding the otherwise unreachable referent, may become trash after the referent is enqueued for old marking. Shenandoah's store barrier should be modified to not enqueue WEAK or PHANTOM stores in the SATB buffer. > > src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 159: > >> 157: HasDecorator::value || >> 158: HasDecorator::value || >> 159: HasDecorator::value) { > > Suggest to split it into two things, with comments: > > > // Uninitialized and no-keepalive stores do not need barrier. > if (HasDecorator::value || > HasDecorator::value) { > return; > } > > // Stores to weak/phantom require no barrier. The old references would > // have been resurrected by load barrier if they were needed. > if (HasDecorator::value || > HasDecorator::value) { > return; > } > > > (I think I caught the reason why we are safe to skip SATB here, maybe comment can be expanded) Ha. I had it that way originally - I'll put it back. > src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 279: > >> 277: oop_store_common(addr, value); >> 278: if (ShenandoahCardBarrier) { >> 279: barrier_set()->write_ref_field_post(addr); > > Unnecessary change? Yes, just idly fixing warnings in my editor. I'll revert it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24001#discussion_r1991898007 PR Review Comment: https://git.openjdk.org/jdk/pull/24001#discussion_r1991899265 From wkemper at openjdk.org Wed Mar 12 16:55:52 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 12 Mar 2025 16:55:52 GMT Subject: RFR: 8351444: Shenandoah: Class Unloading may encounter recycled oops [v3] In-Reply-To: References: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com> Message-ID: <9WVKbKiBylo3hsIAazsKHpj13TC9q9yzSj-YujSDoWY=.2b50746e-cc0b-4eaa-9976-1ed58d959c83@github.com> On Wed, 12 Mar 2025 13:37:15 GMT, Aleksey Shipilev wrote: >> Class unloading is the last thing we do before recycling trash regions. A region will be usable for allocation as soon as it is recycled, so, in a sense, this has the same effect as turning off the weak roots flag immediately after class unloading. >> >> Also, the weak roots phase itself cannot have regions recycled because it relies on accurate mark information (recycling clears live data and resets the TAMS). We _could_ work around this by preserving the mark data (perhaps decoupling TAMS reset from region recycling). But changing the `gc_state` currently requires either a safepoint or a handshake (while holding the `Thread_lock`). I haven't thought all the way through this, but something like this (psuedo-code) might be possible: >> >> ```C++ >> vmop_entry_final_mark(); >> >> // Complete class unloading, since it actually _needs_ the oops (still need to forbid trash recycling here). >> entry_class_unloading(); >> >> // Recycle trash, but do not reset TAMS (weak roots needs TAMS to decide reachability of referents). >> entry_cleanup_early(); >> >> // Complete weak roots. There are no more trash regions and we don't have to change gc_state >> entry_weak_refs(); >> entry_weak_roots(); >> >> What do you think? This would be a separate PR of course, but do you see any reason something like this wouldn't work? I'd expect some asserts to break if we allocate into a new region with TAMS > bottom. 
> >> A region will be usable for allocation as soon as it is recycled, so, in a sense, this has the same effect as turning off the weak roots flag immediately after class unloading. > > Right. This answers my original question. > >> What do you think? This would be a separate PR of course, but do you see any reason something like this wouldn't work? > > It looks to me as stretching the definition of "trash" even further? I think it would be conceptually cleaner to never turn regular regions into trash until after weak roots are done. So accesses to "dead" weak roots are still possible like a regular access to "regular" region. The advantage with the scheme I proposed is that it makes immediate trash regions available for allocations earlier in the cycle. I don't think it changes the way "trash" is treated during concurrent class unloading, but it would mean that weak roots/refs wouldn't see "trash" regions any more. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23951#discussion_r1991914955 From xpeng at openjdk.org Wed Mar 12 17:27:55 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 12 Mar 2025 17:27:55 GMT Subject: RFR: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational [v4] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 00:05:05 GMT, William Kemper wrote: >> Failed allocations may race to cancel the GC with the collector who is working to clear the cancelled GC. When the GC wins this race, it will fail to notify threads that are waiting for the failed GC cycle to complete. This change directly tracks the number of threads waiting due to an allocation failure, rather than indirectly tracking them through the cancelled gc state. >> >> # Testing >> Ran TestAllocHumongousFragment#generational 6,500 times without failures. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > The non-generational modes may also fail to notify waiters Looks good to me. ------------- Marked as reviewed by xpeng (Author). PR Review: https://git.openjdk.org/jdk/pull/23997#pullrequestreview-2679311781 From xpeng at openjdk.org Wed Mar 12 17:27:55 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 12 Mar 2025 17:27:55 GMT Subject: RFR: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational [v4] In-Reply-To: References: Message-ID: <8HGl6b056y3lTi7An0UsJ896JOy-7Ij8SMcc2MULj0I=.26ca6193-f829-449e-afbe-4d068b8533ab@github.com> On Wed, 12 Mar 2025 16:40:40 GMT, William Kemper wrote: >> src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 171: >> >>> 169: >>> 170: // If this cycle completed without being cancelled, notify waiters about it >>> 171: if (!heap->cancelled_gc()) { >> >> I feel we should remove the test `!heap->cancelled_gc()` here, if is fine if there is single mutator thread, but in most cases there are mutator threads, then the following case could happen: >> 1. **Mutator A** try to cancel GC and notify control thread, it will wait with `_alloc_failure_waiters_lock`, `_cancelled_cause` is set to `_allocation_failure` >> 2. Concurrent GC clear `_cancelled_cause` and set it to `_no_gc` in op_final_update_refs >> 3. **Mutator B** try to cancel GC and successfully set `_cancelled_cause` to `_allocation_failure` again. >> 4. Concurrent GC finishes. >> 5. Control thread check `!heap->cancelled_gc()` which is false, and won't wake up mutators. 
>> >> In this case, it will delay the wake up for mutator A & B to next cycle. > > I believe that is the correct behavior. The mutators are waiting until there is memory available. If mutator B cannot allocate, there is no reason to believe mutator A would be able to allocate. In this case, it is fine for both mutators to wait (even if it means A has to wait an extra cycle). Thanks for the explanation, re-read the relevant codes I think it make sense, when Mutator B fails to allocate when Concurrent GC is at `op_final_update_refs`, very unlikely there is enough space for Mutator A. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23997#discussion_r1991968117 From tschatzl at openjdk.org Wed Mar 12 17:35:59 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Mar 2025 17:35:59 GMT Subject: RFR: 8351500: G1: NUMA migrations cause crashes in region allocation [v2] In-Reply-To: References: Message-ID: <8DfF7m33WP9nq2ZIToR-32Anx1chIG1Ek01cY3pIyZU=.bee4182f-b802-4bbf-9b06-2d8eb858b889@github.com> On Wed, 12 Mar 2025 13:27:17 GMT, Thomas Stuefe wrote: >> For details, please see JBS issue. >> >> _Please note that this bug only shows symptoms in JDK 21 and JDK 17! Due to code shuffling done as part of G1 region-local pinning work, the error does not show in JDKs 22 and later._ >> >> I originally planned to fix this just for JDK 21 and 17 (see https://github.com/openjdk/jdk21u-dev/pull/1460). However, I would rather have it fixed in the mainline, even though it is symptom-free. It is a lingering issue that may surface later if the code is ever changed. Plus, this prevents the fix from being accidentally overwritten in JDK 21 if we backport. >> >> ---- >> >> The fix is simple: we fix (hah) the NUMA association for the full duration of a heap allocation in G1. That way, regardless of the OS scheduler moving the thread to a different NUMA node, we always use the same `G1AllocRegion` object, and changes in the control flow that rely on that won't break on NUMA. >> >> This has the disadvantage of allocating memory from a node we are potentially moving away from. However, I argue that this is exceedingly rare, and if it happens, the OS will cope by eventually migrating the memory to the correct node. >> >> --- >> >> Testing: >> >> Testing is difficult. See remark in JBS issue. >> >> I tested a modified version of this patch on JDK 21, where the error does cause crashes. I tested with an additional patch mimicking tons of NUMA node migrations. As I wrote in JBS, I plan to contribute that "FakeNUMA" mode eventually, but lack the time to polish it up for now. I hope the fix is simple and uncontested enough to go in quickly, since I would like to fix JDK 21 soon via backporting this patch. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > node_index parameter should precede output parameters Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23984#pullrequestreview-2679335407 From tschatzl at openjdk.org Wed Mar 12 17:36:00 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Mar 2025 17:36:00 GMT Subject: RFR: 8351500: G1: NUMA migrations cause crashes in region allocation [v2] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 13:31:32 GMT, Thomas Stuefe wrote: > I played around with this but the patch got far too big for this simple crash fix. Would be something for a cleanup RFE. That's understandable and fine with me. 
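To make the shape of the fix discussed in this thread concrete, here is a heavily simplified sketch, with invented helper names, of sampling the NUMA node once per allocation and threading it through every step; the real G1 allocation path is of course considerably more involved:

```c++
// Hypothetical illustration (not the real G1 code): the node index is queried
// exactly once per allocation attempt and reused for every retry, so an OS
// thread migration in the middle of the allocation cannot make consecutive
// steps pick different per-node allocation regions.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

static uint32_t current_numa_node() {
  return 0;  // stand-in for an OS query such as numa_node_of_cpu(sched_getcpu())
}

static void* attempt_fast_alloc(uint32_t node, std::size_t bytes) {
  (void)node;
  return std::malloc(bytes);  // placeholder for the per-node fast path
}

static void* attempt_slow_alloc(uint32_t node, std::size_t bytes) {
  (void)node;
  return std::malloc(bytes);  // placeholder for the per-node slow path
}

static void* allocate(std::size_t bytes) {
  const uint32_t node = current_numa_node();        // sampled once...
  if (void* p = attempt_fast_alloc(node, bytes)) {  // ...reused here...
    return p;
  }
  return attempt_slow_alloc(node, bytes);           // ...and here, even if the thread migrated
}

int main() {
  void* p = allocate(64);
  std::printf("allocated at %p\n", p);
  std::free(p);
  return 0;
}
```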
------------- PR Comment: https://git.openjdk.org/jdk/pull/23984#issuecomment-2718625919 From tschatzl at openjdk.org Wed Mar 12 17:44:01 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Mar 2025 17:44:01 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v17] In-Reply-To: <0w7seS1tIFhUxnmStxQySISWVfpBBsRmUtx7EoTy9a4=.509a3d5e-56d0-4fd8-8896-51835b14302b@github.com> References: <0w7seS1tIFhUxnmStxQySISWVfpBBsRmUtx7EoTy9a4=.509a3d5e-56d0-4fd8-8896-51835b14302b@github.com> Message-ID: On Wed, 12 Mar 2025 13:20:25 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision: >> >> - Merge branch 'master' into 8342382-card-table-instead-of-dcq >> - * optimized RISCV gen_write_ref_array_post_barrier() implementation contributed by @RealFYang >> - * fix card table verification crashes: in the first refinement phase, when switching the global card tables, we need to re-check whether we are still in the same sweep epoch or not. It might have changed due to a GC interrupting acquiring the Heap_lock. Otherwise new threads will scribble on the refinement table. >> Cause are last-minute changes before making the PR ready to review. >> >> Testing: without the patch, occurs fairly frequently when continuously >> (1 in 20) starting refinement. Does not afterward. >> - * ayang review 3 >> * comments >> * minor refactorings >> - * iwalulya review >> * renaming >> * fix some includes, forward declaration >> - * fix whitespace >> * additional whitespace between log tags >> * rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the other similar rename >> - ayang review >> * renamings >> * refactorings >> - iwalulya review >> * comments for variables tracking to-collection-set and just dirtied cards after GC/refinement >> * predicate for determining whether the refinement has been disabled >> * some other typos/comment improvements >> * renamed _has_xxx_ref to _has_ref_to_xxx to be more consistent with naming >> - * ayang review - fix comment >> - * iwalulya review 2 >> * G1ConcurrentRefineWorkState -> G1ConcurrentRefineSweepState >> * some additional documentation >> - ... and 14 more: https://git.openjdk.org/jdk/compare/0c7b5abb...aec95051 > > src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 217: > >> 215: >> 216: { >> 217: SuspendibleThreadSetLeaver sts_leave; > > Can you add some comment on why leaving the set is required? It's not obvious to me why. I'd expect handshake to work out of the box... It isn't apparently. Removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1991999476 From tschatzl at openjdk.org Wed Mar 12 17:59:51 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 12 Mar 2025 17:59:51 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v18] In-Reply-To: References: Message-ID: <3KOwgdzYn_vXQVWisVUEY-0i1gtZEfZhcD1-id3epYE=.17aa84bc-a7ec-4dda-b596-7a1016d710fc@github.com> > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. 
> > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * ayang review * remove unnecessary STSleaver * some more documentation around to_collection_card card color ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/aec95051..3766b76c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=16-17 Stats: 18 lines in 2 files changed: 5 ins; 4 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From wkemper at openjdk.org Wed Mar 12 18:55:08 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 12 Mar 2025 18:55:08 GMT Subject: RFR: 8350905: Shenandoah: Releasing a WeakHandle's referent may extend its lifetime [v2] In-Reply-To: References: Message-ID: > When weak handles are cleared, the `nullptr` is stored with the `ON_PHANTOM_OOP_REF` decorator. 
For concurrent collectors using a SATB barrier like Shenandoah, this may cause the referent to be enqueued and marked when it would be otherwise unreachable. The problem is especially acute for Shenandoah's generational mode, in which a young region holding the otherwise unreachable referent, may become trash after the referent is enqueued for old marking. Shenandoah's store barrier should be modified to not enqueue WEAK or PHANTOM stores in the SATB buffer. William Kemper has updated the pull request incrementally with two additional commits since the last revision: - Split out and comment on weak/phantom stores separately - Revert unnecessary change ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24001/files - new: https://git.openjdk.org/jdk/pull/24001/files/a742874e..929bf043 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24001&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24001&range=00-01 Stats: 10 lines in 1 file changed: 7 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24001.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24001/head:pull/24001 PR: https://git.openjdk.org/jdk/pull/24001 From shade at openjdk.org Wed Mar 12 19:24:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Mar 2025 19:24:54 GMT Subject: RFR: 8350905: Shenandoah: Releasing a WeakHandle's referent may extend its lifetime [v2] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 18:55:08 GMT, William Kemper wrote: >> When weak handles are cleared, the `nullptr` is stored with the `ON_PHANTOM_OOP_REF` decorator. For concurrent collectors using a SATB barrier like Shenandoah, this may cause the referent to be enqueued and marked when it would be otherwise unreachable. The problem is especially acute for Shenandoah's generational mode, in which a young region holding the otherwise unreachable referent, may become trash after the referent is enqueued for old marking. Shenandoah's store barrier should be modified to not enqueue WEAK or PHANTOM stores in the SATB buffer. > > William Kemper has updated the pull request incrementally with two additional commits since the last revision: > > - Split out and comment on weak/phantom stores separately > - Revert unnecessary change Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24001#pullrequestreview-2679647613 From rkennke at openjdk.org Wed Mar 12 19:41:54 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 12 Mar 2025 19:41:54 GMT Subject: RFR: 8351444: Shenandoah: Class Unloading may encounter recycled oops [v3] In-Reply-To: References: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com> Message-ID: On Mon, 10 Mar 2025 21:25:06 GMT, William Kemper wrote: >> Unloading classes may require a walk of unreachable oops. For this reason, it is not safe to recycle memory before class unloading is complete. This complements existing code to prevent mutators from recycling trash regions while weak roots is in progress. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Trim extraneous comment Looks ok to me. Thank you! ------------- Marked as reviewed by rkennke (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23951#pullrequestreview-2679682909 From wkemper at openjdk.org Wed Mar 12 20:16:07 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 12 Mar 2025 20:16:07 GMT Subject: Integrated: 8351444: Shenandoah: Class Unloading may encounter recycled oops In-Reply-To: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com> References: <78jaUyUnMnfncp8I4k6yvHqFaxxJ1BrvqkIelqK6aDc=.a1e2c417-3df2-45cf-befa-d60ff514533f@github.com> Message-ID: On Fri, 7 Mar 2025 21:47:31 GMT, William Kemper wrote: > Unloading classes may require a walk of unreachable oops. For this reason, it is not safe to recycle memory before class unloading is complete. This complements existing code to prevent mutators from recycling trash regions while weak roots is in progress. This pull request has now been integrated. Changeset: cdf7632f Author: William Kemper URL: https://git.openjdk.org/jdk/commit/cdf7632f8a85611077a27c91ad928ed8ea116f95 Stats: 47 lines in 4 files changed: 23 ins; 7 del; 17 mod 8351444: Shenandoah: Class Unloading may encounter recycled oops Reviewed-by: shade, rkennke ------------- PR: https://git.openjdk.org/jdk/pull/23951 From ysr at openjdk.org Wed Mar 12 20:28:55 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 12 Mar 2025 20:28:55 GMT Subject: RFR: 8350905: Shenandoah: Releasing a WeakHandle's referent may extend its lifetime [v2] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 18:55:08 GMT, William Kemper wrote: >> When weak handles are cleared, the `nullptr` is stored with the `ON_PHANTOM_OOP_REF` decorator. For concurrent collectors using a SATB barrier like Shenandoah, this may cause the referent to be enqueued and marked when it would be otherwise unreachable. The problem is especially acute for Shenandoah's generational mode, in which a young region holding the otherwise unreachable referent, may become trash after the referent is enqueued for old marking. Shenandoah's store barrier should be modified to not enqueue WEAK or PHANTOM stores in the SATB buffer. > > William Kemper has updated the pull request incrementally with two additional commits since the last revision: > > - Split out and comment on weak/phantom stores separately > - Revert unnecessary change Looks good to me. I'm curious if this made any difference to SPECjbb performance w/GenShen. ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24001#pullrequestreview-2679782378 From wkemper at openjdk.org Wed Mar 12 20:45:08 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 12 Mar 2025 20:45:08 GMT Subject: RFR: 8350905: Shenandoah: Releasing a WeakHandle's referent may extend its lifetime [v2] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 18:55:08 GMT, William Kemper wrote: >> When weak handles are cleared, the `nullptr` is stored with the `ON_PHANTOM_OOP_REF` decorator. For concurrent collectors using a SATB barrier like Shenandoah, this may cause the referent to be enqueued and marked when it would be otherwise unreachable. The problem is especially acute for Shenandoah's generational mode, in which a young region holding the otherwise unreachable referent, may become trash after the referent is enqueued for old marking. Shenandoah's store barrier should be modified to not enqueue WEAK or PHANTOM stores in the SATB buffer. 
> > William Kemper has updated the pull request incrementally with two additional commits since the last revision: > > - Split out and comment on weak/phantom stores separately > - Revert unnecessary change I don't see any performance difference on Specjbb. The issue there is with getting cleared weak references processed by the Java thread that queues the weak references. References processing also uses `RawAccess` to null out the referent, so it doesn't go through this barrier. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24001#issuecomment-2719079655 From wkemper at openjdk.org Wed Mar 12 20:45:09 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 12 Mar 2025 20:45:09 GMT Subject: Integrated: 8350905: Shenandoah: Releasing a WeakHandle's referent may extend its lifetime In-Reply-To: References: Message-ID: <1FWeSQKJ3IFVHgk7roq8VUgXut5mXckVdhDMrjAP6bk=.28958377-21bc-4869-a9ca-327b95810eb6@github.com> On Tue, 11 Mar 2025 23:35:24 GMT, William Kemper wrote: > When weak handles are cleared, the `nullptr` is stored with the `ON_PHANTOM_OOP_REF` decorator. For concurrent collectors using a SATB barrier like Shenandoah, this may cause the referent to be enqueued and marked when it would be otherwise unreachable. The problem is especially acute for Shenandoah's generational mode, in which a young region holding the otherwise unreachable referent, may become trash after the referent is enqueued for old marking. Shenandoah's store barrier should be modified to not enqueue WEAK or PHANTOM stores in the SATB buffer. This pull request has now been integrated. Changeset: a347ecde Author: William Kemper URL: https://git.openjdk.org/jdk/commit/a347ecdedc098bd23598ba6acf28d77db01be066 Stats: 10 lines in 1 file changed: 10 ins; 0 del; 0 mod 8350905: Shenandoah: Releasing a WeakHandle's referent may extend its lifetime Reviewed-by: shade, ysr ------------- PR: https://git.openjdk.org/jdk/pull/24001 From wkemper at openjdk.org Wed Mar 12 20:55:07 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 12 Mar 2025 20:55:07 GMT Subject: RFR: 8350905: Shenandoah: Releasing a WeakHandle's referent may extend its lifetime [v2] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 20:26:37 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request incrementally with two additional commits since the last revision: >> >> - Split out and comment on weak/phantom stores separately >> - Revert unnecessary change > > Looks good to me. > > I'm curious if this made any difference to SPECjbb performance w/GenShen. @ysramakrishna , The issue description could make it more clear, but, in addition to the issue described in the title, this PR fixes the specific problem of the old generation SATB asserting when it tries to decode a narrow oop. The sequence of events for the assertion failure is: 1. SATB is on for old marking 2. Young collection transitions some regions to `trash` 3. Young weak root processing nulls out a referent that points into a `trash` region 4. 
Old gen SATB barrier tries to decode the narrow oop, but asserts out because the oop is not in the heap (because it is in a trash region) This also relates to: https://github.com/openjdk/jdk/pull/23951 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24001#issuecomment-2719098857 From wkemper at openjdk.org Wed Mar 12 22:41:53 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 12 Mar 2025 22:41:53 GMT Subject: RFR: 8348400: GenShen: assert(ShenandoahHeap::heap()->is_full_gc_in_progress() || (used_regions_size() <= _max_capacity)) failed: Cannot use more than capacity # In-Reply-To: <1TI7zry8_JLLMVwxDq0Yd65TrgkSYafDOEn8zOFS7z0=.0517105a-520a-4686-83eb-a2446ee72a8a@github.com> References: <1TI7zry8_JLLMVwxDq0Yd65TrgkSYafDOEn8zOFS7z0=.0517105a-520a-4686-83eb-a2446ee72a8a@github.com> Message-ID: On Tue, 11 Mar 2025 20:59:10 GMT, Kelvin Nilsen wrote: >> Shenandoah cannot recycle immediate trash regions during the concurrent weak roots phase, however some of these regions may be assigned to the old generation collector's reserve. When an evacuation/promotion tries to allocate in such a region, it will fail (as expected) and try to 'steal' a region from the mutator's partition of the free set. There are cases when this cannot be allowed due to capacity constraints. However, in some of these cases it will be possible to 'swap' a region between the old reserve and the mutator's partition. This change covers this case. > > src/hotspot/share/gc/shenandoah/shenandoahGenerationSizer.cpp line 127: > >> 125: } >> 126: >> 127: if (dst->max_capacity() + bytes_to_transfer > max_size_for(dst)) { > > Do we need to edit the descriptions of ShenandoahMinYoungPercentage and ShenandoahMaxYoungPercentage? Do we need to remove these options entirely from shenandoah_globals? Yes, we should probably remove one of them and name the other `ShenandoahInitYoungPercentage`. I think I will back out this change to `shGenerationSizer` in this PR, and open a different PR for removing this constraint and renaming the options. It's a bit outside the scope of this bug fix. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23998#discussion_r1992390432 From wkemper at openjdk.org Wed Mar 12 23:17:44 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 12 Mar 2025 23:17:44 GMT Subject: RFR: 8348400: GenShen: assert(ShenandoahHeap::heap()->is_full_gc_in_progress() || (used_regions_size() <= _max_capacity)) failed: Cannot use more than capacity # [v2] In-Reply-To: References: Message-ID: <-73CoqTBA5dJPEwr7bxSvDmMFC9g_LZpW-q7XSjjtrE=.4966fa3b-e98f-4a50-9492-22bf99eecf1f@github.com> > Shenandoah cannot recycle immediate trash regions during the concurrent weak roots phase, however some of these regions may be assigned to the old generation collector's reserve. When an evacuation/promotion tries to allocate in such a region, it will fail (as expected) and try to 'steal' a region from the mutator's partition of the free set. There are cases when this cannot be allowed due to capacity constraints. However, in some of these cases it will be possible to 'swap' a region between the old reserve and the mutator's partition. This change covers this case. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Revert "Do not enforce size constraints on generations" This reverts commit 11ff0677449fa6749df8830f4a03f1c7861ba314. 
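As a rough illustration of the "swap" idea described in the PR summary above, here is a toy sketch; the types and the partition model are invented for illustration only and do not reflect the actual ShenandoahFreeSet implementation:

```c++
// Toy model (hypothetical, not Shenandoah's real free-set code): when the old
// reserve may not grow by taking a mutator region outright, exchange one of
// its currently unusable regions for a mutator region, so both partitions keep
// their sizes and only region ownership changes.
#include <cstddef>
#include <vector>

struct Region {
  int  id;
  bool usable;  // false for regions that cannot be allocated into right now
};

struct Partition {
  std::vector<Region> regions;
  std::size_t         max_regions;  // capacity constraint on this partition
};

// Returns the index of a region the old reserve can allocate into, or -1.
static int acquire_for_old_reserve(Partition& old_reserve, Partition& mutator) {
  if (mutator.regions.empty()) {
    return -1;
  }
  if (old_reserve.regions.size() < old_reserve.max_regions) {
    // "Steal": capacity allows the old reserve to grow by one region.
    old_reserve.regions.push_back(mutator.regions.back());
    mutator.regions.pop_back();
    return (int)old_reserve.regions.size() - 1;
  }
  // Capacity exhausted: try to "swap" an unusable old-reserve region for a
  // mutator region, leaving both partition sizes unchanged.
  for (std::size_t i = 0; i < old_reserve.regions.size(); i++) {
    if (!old_reserve.regions[i].usable) {
      Region taken = mutator.regions.back();
      mutator.regions.pop_back();
      mutator.regions.push_back(old_reserve.regions[i]);  // hand the unusable region back
      old_reserve.regions[i] = taken;                     // use the mutator's region instead
      return (int)i;
    }
  }
  return -1;  // no swap candidate; the caller must handle the failed allocation
}
```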
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23998/files - new: https://git.openjdk.org/jdk/pull/23998/files/11ff0677..a42efe5a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23998&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23998&range=00-01 Stats: 10 lines in 1 file changed: 10 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23998.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23998/head:pull/23998 PR: https://git.openjdk.org/jdk/pull/23998 From wkemper at openjdk.org Wed Mar 12 23:17:44 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 12 Mar 2025 23:17:44 GMT Subject: RFR: 8348400: GenShen: assert(ShenandoahHeap::heap()->is_full_gc_in_progress() || (used_regions_size() <= _max_capacity)) failed: Cannot use more than capacity # [v2] In-Reply-To: References: <1TI7zry8_JLLMVwxDq0Yd65TrgkSYafDOEn8zOFS7z0=.0517105a-520a-4686-83eb-a2446ee72a8a@github.com> Message-ID: On Wed, 12 Mar 2025 22:39:45 GMT, William Kemper wrote: >> src/hotspot/share/gc/shenandoah/shenandoahGenerationSizer.cpp line 127: >> >>> 125: } >>> 126: >>> 127: if (dst->max_capacity() + bytes_to_transfer > max_size_for(dst)) { >> >> Do we need to edit the descriptions of ShenandoahMinYoungPercentage and ShenandoahMaxYoungPercentage? Do we need to remove these options entirely from shenandoah_globals? > > Yes, we should probably remove one of them and name the other `ShenandoahInitYoungPercentage`. I think I will back out this change to `shGenerationSizer` in this PR, and open a different PR for removing this constraint and renaming the options. It's a bit outside the scope of this bug fix. Filed: https://bugs.openjdk.org/browse/JDK-8351892 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23998#discussion_r1992416245 From sjohanss at openjdk.org Thu Mar 13 08:52:55 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 13 Mar 2025 08:52:55 GMT Subject: RFR: 8351500: G1: NUMA migrations cause crashes in region allocation [v2] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 13:27:17 GMT, Thomas Stuefe wrote: >> For details, please see JBS issue. >> >> _Please note that this bug only shows symptoms in JDK 21 and JDK 17! Due to code shuffling done as part of G1 region-local pinning work, the error does not show in JDKs 22 and later._ >> >> I originally planned to fix this just for JDK 21 and 17 (see https://github.com/openjdk/jdk21u-dev/pull/1460). However, I would rather have it fixed in the mainline, even though it is symptom-free. It is a lingering issue that may surface later if the code is ever changed. Plus, this prevents the fix from being accidentally overwritten in JDK 21 if we backport. >> >> ---- >> >> The fix is simple: we fix (hah) the NUMA association for the full duration of a heap allocation in G1. That way, regardless of the OS scheduler moving the thread to a different NUMA node, we always use the same `G1AllocRegion` object, and changes in the control flow that rely on that won't break on NUMA. >> >> This has the disadvantage of allocating memory from a node we are potentially moving away from. However, I argue that this is exceedingly rare, and if it happens, the OS will cope by eventually migrating the memory to the correct node. >> >> --- >> >> Testing: >> >> Testing is difficult. See remark in JBS issue. >> >> I tested a modified version of this patch on JDK 21, where the error does cause crashes. I tested with an additional patch mimicking tons of NUMA node migrations. 
As I wrote in JBS, I plan to contribute that "FakeNUMA" mode eventually, but lack the time to polish it up for now. I hope the fix is simple and uncontested enough to go in quickly, since I would like to fix JDK 21 soon via backporting this patch. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > node_index parameter should precede output parameters I agree with keeping this patch small and then at some point doing a larger refactor in this area. I also agree that we want the out-parameter last, but when reading the parameters now you get sizing parameters split by the node index parameter, i.e: HeapWord* result = _allocator->attempt_allocation(min_word_size, desired_word_size, node_index, actual_word_size); I would prefer the node index first here, like: HeapWord* result = _allocator->attempt_allocation(node_index, min_word_size, desired_word_size, actual_word_size); For this one: HeapWord* par_allocate_during_gc(G1HeapRegionAttr dest, size_t min_word_size, size_t desired_word_size, uint node_index, size_t* actual_word_size); I would prefer to put the node_index after dest but before the sizes. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23984#issuecomment-2720447995 From jsikstro at openjdk.org Thu Mar 13 09:02:39 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 13 Mar 2025 09:02:39 GMT Subject: RFR: 8351216: ZGC: Store NUMA node count [v3] In-Reply-To: References: Message-ID: > To avoid calling into `os::Linux::max_numa_node()` and in turn libnuma on every count lookup, I propose we instead store the count statically inside ZNUMA. This is perfectly fine since the value that we get from libnuma is configured once during initialization and never change during runtime. > > The count is set during platform dependent initialization and the getter is now defined in the common code in ZNUMA.cpp. On operating systems that ZGC does not support NUMA for (BSD and Windows) we keep the current behavior by setting the count to 1. > > This is also preparation work for the Mapped Cache ([JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441)). > > Testing: > * Tiers 1-3 > * GHA > * Verify that the count is set on a Linux system with NUMA hardware Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: Copyright years ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23922/files - new: https://git.openjdk.org/jdk/pull/23922/files/45dc106a..bdfcc781 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23922&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23922&range=01-02 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23922.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23922/head:pull/23922 PR: https://git.openjdk.org/jdk/pull/23922 From stuefe at openjdk.org Thu Mar 13 09:14:52 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 13 Mar 2025 09:14:52 GMT Subject: RFR: 8351500: G1: NUMA migrations cause crashes in region allocation [v2] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 08:50:40 GMT, Stefan Johansson wrote: > I agree with keeping this patch small and then at some point doing a larger refactor in this area. 
I also agree that we want the out-parameter last, but when reading the parameters now you get sizing parameters split by the node index parameter, i.e: > > ``` > HeapWord* result = _allocator->attempt_allocation(min_word_size, desired_word_size, node_index, actual_word_size); > ``` > > I would prefer the node index first here, like: > > ``` > HeapWord* result = _allocator->attempt_allocation(node_index, min_word_size, desired_word_size, actual_word_size); > ``` > > For this one: > > ``` > HeapWord* par_allocate_during_gc(G1HeapRegionAttr dest, > size_t min_word_size, > size_t desired_word_size, > uint node_index, > size_t* actual_word_size); > ``` > > I would prefer to put the node_index after dest but before the sizes. > > What do you think? This was actually the first attempt (node_index first) I did, but I found that affected more places that had nothing to do with the patch. OTOH if we plan on reworking this anyway, its better to do it the right way the first time instead of reshuffling the parameters later. Okay, I change this to node_index first. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23984#issuecomment-2720522811 From stuefe at openjdk.org Thu Mar 13 11:07:21 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 13 Mar 2025 11:07:21 GMT Subject: RFR: 8351500: G1: NUMA migrations cause crashes in region allocation [v3] In-Reply-To: References: Message-ID: > For details, please see JBS issue. > > _Please note that this bug only shows symptoms in JDK 21 and JDK 17! Due to code shuffling done as part of G1 region-local pinning work, the error does not show in JDKs 22 and later._ > > I originally planned to fix this just for JDK 21 and 17 (see https://github.com/openjdk/jdk21u-dev/pull/1460). However, I would rather have it fixed in the mainline, even though it is symptom-free. It is a lingering issue that may surface later if the code is ever changed. Plus, this prevents the fix from being accidentally overwritten in JDK 21 if we backport. > > ---- > > The fix is simple: we fix (hah) the NUMA association for the full duration of a heap allocation in G1. That way, regardless of the OS scheduler moving the thread to a different NUMA node, we always use the same `G1AllocRegion` object, and changes in the control flow that rely on that won't break on NUMA. > > This has the disadvantage of allocating memory from a node we are potentially moving away from. However, I argue that this is exceedingly rare, and if it happens, the OS will cope by eventually migrating the memory to the correct node. > > --- > > Testing: > > Testing is difficult. See remark in JBS issue. > > I tested a modified version of this patch on JDK 21, where the error does cause crashes. I tested with an additional patch mimicking tons of NUMA node migrations. As I wrote in JBS, I plan to contribute that "FakeNUMA" mode eventually, but lack the time to polish it up for now. I hope the fix is simple and uncontested enough to go in quickly, since I would like to fix JDK 21 soon via backporting this patch. Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - revert blank line change - node_index preceding sizes - Merge branch 'master' into JDK-8351630-Fix-NUMA-association-for-the-duration-of-a-single-G1-Heap-allocation - node_index parameter should precede output parameters - start ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23984/files - new: https://git.openjdk.org/jdk/pull/23984/files/c8870820..578002e0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23984&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23984&range=01-02 Stats: 18976 lines in 265 files changed: 7389 ins; 10345 del; 1242 mod Patch: https://git.openjdk.org/jdk/pull/23984.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23984/head:pull/23984 PR: https://git.openjdk.org/jdk/pull/23984 From stuefe at openjdk.org Thu Mar 13 11:07:21 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 13 Mar 2025 11:07:21 GMT Subject: RFR: 8351500: G1: NUMA migrations cause crashes in region allocation [v2] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 08:50:40 GMT, Stefan Johansson wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> node_index parameter should precede output parameters > > I agree with keeping this patch small and then at some point doing a larger refactor in this area. I also agree that we want the out-parameter last, but when reading the parameters now you get sizing parameters split by the node index parameter, i.e: > > HeapWord* result = _allocator->attempt_allocation(min_word_size, desired_word_size, node_index, actual_word_size); > > > I would prefer the node index first here, like: > > HeapWord* result = _allocator->attempt_allocation(node_index, min_word_size, desired_word_size, actual_word_size); > > For this one: > > HeapWord* par_allocate_during_gc(G1HeapRegionAttr dest, > size_t min_word_size, > size_t desired_word_size, > uint node_index, > size_t* actual_word_size); > > > I would prefer to put the node_index after dest but before the sizes. > > What do you think? @kstefanj I rewrote the patch to put node_index at the front. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23984#issuecomment-2720854399 From rkennke at openjdk.org Thu Mar 13 11:20:59 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 13 Mar 2025 11:20:59 GMT Subject: RFR: 8351500: G1: NUMA migrations cause crashes in region allocation [v3] In-Reply-To: References: Message-ID: <62_SyQLvCB0fClGX8q1qj2PP5fsYwXTbQLjZMNEFETc=.01f6f352-5181-4ed6-b240-b16574824289@github.com> On Thu, 13 Mar 2025 11:07:21 GMT, Thomas Stuefe wrote: >> For details, please see JBS issue. >> >> _Please note that this bug only shows symptoms in JDK 21 and JDK 17! Due to code shuffling done as part of G1 region-local pinning work, the error does not show in JDKs 22 and later._ >> >> I originally planned to fix this just for JDK 21 and 17 (see https://github.com/openjdk/jdk21u-dev/pull/1460). However, I would rather have it fixed in the mainline, even though it is symptom-free. It is a lingering issue that may surface later if the code is ever changed. Plus, this prevents the fix from being accidentally overwritten in JDK 21 if we backport. >> >> ---- >> >> The fix is simple: we fix (hah) the NUMA association for the full duration of a heap allocation in G1. 
That way, regardless of the OS scheduler moving the thread to a different NUMA node, we always use the same `G1AllocRegion` object, and changes in the control flow that rely on that won't break on NUMA. >> >> This has the disadvantage of allocating memory from a node we are potentially moving away from. However, I argue that this is exceedingly rare, and if it happens, the OS will cope by eventually migrating the memory to the correct node. >> >> --- >> >> Testing: >> >> Testing is difficult. See remark in JBS issue. >> >> I tested a modified version of this patch on JDK 21, where the error does cause crashes. I tested with an additional patch mimicking tons of NUMA node migrations. As I wrote in JBS, I plan to contribute that "FakeNUMA" mode eventually, but lack the time to polish it up for now. I hope the fix is simple and uncontested enough to go in quickly, since I would like to fix JDK 21 soon via backporting this patch. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - revert blank line change > - node_index preceding sizes > - Merge branch 'master' into JDK-8351630-Fix-NUMA-association-for-the-duration-of-a-single-G1-Heap-allocation > - node_index parameter should precede output parameters > - start The fix looks good to me. ------------- Marked as reviewed by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23984#pullrequestreview-2681503873 From tschatzl at openjdk.org Thu Mar 13 13:07:29 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 13 Mar 2025 13:07:29 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v19] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * when aborting refinement during full collection, the global card table and the per-thread card table might not be in sync. Roll forward during abort of the refinement in these situations. * additional verification * added some missing ResourceMarks in asserts * added variant of ArrayJuggle2 that crashes fairly quickly without these changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/3766b76c..78611173 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=17-18 Stats: 111 lines in 11 files changed: 82 ins; 13 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Thu Mar 13 13:14:54 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 13 Mar 2025 13:14:54 GMT Subject: RFR: 8351216: ZGC: Store NUMA node count [v3] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 09:02:39 GMT, Joel Sikstr?m wrote: >> To avoid calling into `os::Linux::max_numa_node()` and in turn libnuma on every count lookup, I propose we instead store the count statically inside ZNUMA. This is perfectly fine since the value that we get from libnuma is configured once during initialization and never change during runtime. >> >> The count is set during platform dependent initialization and the getter is now defined in the common code in ZNUMA.cpp. On operating systems that ZGC does not support NUMA for (BSD and Windows) we keep the current behavior by setting the count to 1. >> >> This is also preparation work for the Mapped Cache ([JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441)). 
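A minimal sketch of what caching the count this way can look like (field and function names here are illustrative assumptions, not the actual patch): the value is queried from libnuma once, during platform-dependent initialization, and every later lookup is a plain read of a static field.

    // zNUMA.hpp (sketch)
    class ZNUMA : public AllStatic {
    private:
      static uint32_t _count;      // written once during initialization

    public:
      static void pd_initialize(); // platform-dependent, sets _count
      static uint32_t count();     // cheap getter, no libnuma call
    };

    // zNUMA.cpp (common code, sketch)
    uint32_t ZNUMA::_count;

    uint32_t ZNUMA::count() {
      return _count;
    }

    // zNUMA_linux.cpp (sketch)
    void ZNUMA::pd_initialize() {
      // Query libnuma once, via the os::Linux::max_numa_node() wrapper
      // mentioned above; fall back to a single node when NUMA is off.
      _count = UseNUMA ? os::Linux::max_numa_node() + 1 : 1;
    }

    // zNUMA_bsd.cpp / zNUMA_windows.cpp (sketch)
    void ZNUMA::pd_initialize() {
      _count = 1; // ZGC does not support NUMA on these platforms
    }

Keeping only the initialization per platform and the getter in common code is what removes the repeated libnuma calls from the lookup path.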
>> >> Testing: >> * Tiers 1-3 >> * GHA >> * Verify that the count is set on a Linux system with NUMA hardware > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Copyright years Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23922#pullrequestreview-2681856590 From jsikstro at openjdk.org Thu Mar 13 13:25:02 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 13 Mar 2025 13:25:02 GMT Subject: RFR: 8351216: ZGC: Store NUMA node count [v3] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 09:02:39 GMT, Joel Sikstr?m wrote: >> To avoid calling into `os::Linux::max_numa_node()` and in turn libnuma on every count lookup, I propose we instead store the count statically inside ZNUMA. This is perfectly fine since the value that we get from libnuma is configured once during initialization and never change during runtime. >> >> The count is set during platform dependent initialization and the getter is now defined in the common code in ZNUMA.cpp. On operating systems that ZGC does not support NUMA for (BSD and Windows) we keep the current behavior by setting the count to 1. >> >> This is also preparation work for the Mapped Cache ([JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441)). >> >> Testing: >> * Tiers 1-3 >> * GHA >> * Verify that the count is set on a Linux system with NUMA hardware > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Copyright years Thank you for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23922#issuecomment-2721239847 From jsikstro at openjdk.org Thu Mar 13 13:25:03 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 13 Mar 2025 13:25:03 GMT Subject: Integrated: 8351216: ZGC: Store NUMA node count In-Reply-To: References: Message-ID: On Wed, 5 Mar 2025 20:06:08 GMT, Joel Sikstr?m wrote: > To avoid calling into `os::Linux::max_numa_node()` and in turn libnuma on every count lookup, I propose we instead store the count statically inside ZNUMA. This is perfectly fine since the value that we get from libnuma is configured once during initialization and never change during runtime. > > The count is set during platform dependent initialization and the getter is now defined in the common code in ZNUMA.cpp. On operating systems that ZGC does not support NUMA for (BSD and Windows) we keep the current behavior by setting the count to 1. > > This is also preparation work for the Mapped Cache ([JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441)). > > Testing: > * Tiers 1-3 > * GHA > * Verify that the count is set on a Linux system with NUMA hardware This pull request has now been integrated. Changeset: 7e3bc81e Author: Joel Sikstr?m URL: https://git.openjdk.org/jdk/commit/7e3bc81e885071352fceab01015d7deef067a27a Stats: 30 lines in 7 files changed: 6 ins; 12 del; 12 mod 8351216: ZGC: Store NUMA node count Reviewed-by: tschatzl, sjohanss, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/23922 From sjohanss at openjdk.org Thu Mar 13 13:26:54 2025 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 13 Mar 2025 13:26:54 GMT Subject: RFR: 8351500: G1: NUMA migrations cause crashes in region allocation [v3] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 11:07:21 GMT, Thomas Stuefe wrote: >> For details, please see JBS issue. 
>> >> _Please note that this bug only shows symptoms in JDK 21 and JDK 17! Due to code shuffling done as part of G1 region-local pinning work, the error does not show in JDKs 22 and later._ >> >> I originally planned to fix this just for JDK 21 and 17 (see https://github.com/openjdk/jdk21u-dev/pull/1460). However, I would rather have it fixed in the mainline, even though it is symptom-free. It is a lingering issue that may surface later if the code is ever changed. Plus, this prevents the fix from being accidentally overwritten in JDK 21 if we backport. >> >> ---- >> >> The fix is simple: we fix (hah) the NUMA association for the full duration of a heap allocation in G1. That way, regardless of the OS scheduler moving the thread to a different NUMA node, we always use the same `G1AllocRegion` object, and changes in the control flow that rely on that won't break on NUMA. >> >> This has the disadvantage of allocating memory from a node we are potentially moving away from. However, I argue that this is exceedingly rare, and if it happens, the OS will cope by eventually migrating the memory to the correct node. >> >> --- >> >> Testing: >> >> Testing is difficult. See remark in JBS issue. >> >> I tested a modified version of this patch on JDK 21, where the error does cause crashes. I tested with an additional patch mimicking tons of NUMA node migrations. As I wrote in JBS, I plan to contribute that "FakeNUMA" mode eventually, but lack the time to polish it up for now. I hope the fix is simple and uncontested enough to go in quickly, since I would like to fix JDK 21 soon via backporting this patch. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - revert blank line change > - node_index preceding sizes > - Merge branch 'master' into JDK-8351630-Fix-NUMA-association-for-the-duration-of-a-single-G1-Heap-allocation > - node_index parameter should precede output parameters > - start I like this =) Thanks for addressing my comments. ------------- Marked as reviewed by sjohanss (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23984#pullrequestreview-2681900915 From stuefe at openjdk.org Thu Mar 13 13:29:53 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 13 Mar 2025 13:29:53 GMT Subject: RFR: 8351500: G1: NUMA migrations cause crashes in region allocation [v3] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 11:07:21 GMT, Thomas Stuefe wrote: >> For details, please see JBS issue. >> >> _Please note that this bug only shows symptoms in JDK 21 and JDK 17! Due to code shuffling done as part of G1 region-local pinning work, the error does not show in JDKs 22 and later._ >> >> I originally planned to fix this just for JDK 21 and 17 (see https://github.com/openjdk/jdk21u-dev/pull/1460). However, I would rather have it fixed in the mainline, even though it is symptom-free. It is a lingering issue that may surface later if the code is ever changed. Plus, this prevents the fix from being accidentally overwritten in JDK 21 if we backport. >> >> ---- >> >> The fix is simple: we fix (hah) the NUMA association for the full duration of a heap allocation in G1. That way, regardless of the OS scheduler moving the thread to a different NUMA node, we always use the same `G1AllocRegion` object, and changes in the control flow that rely on that won't break on NUMA. 
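A rough sketch of what pinning the node for one allocation amounts to (the parameter order follows the suggestion earlier in this thread; the names and exact call chain are illustrative, not the real diff): the NUMA index is looked up once at the top of the allocation path and passed down, instead of being re-queried in every nested call.

    // Sketch: look up the NUMA node once per allocation attempt and thread it
    // through, so a migration in the middle of the allocation cannot switch
    // the G1AllocRegion being used half-way.
    HeapWord* G1CollectedHeap::attempt_allocation(size_t min_word_size,
                                                  size_t desired_word_size,
                                                  size_t* actual_word_size) {
      const uint node_index = _numa->index_of_current_thread(); // pinned here

      HeapWord* result = _allocator->attempt_allocation(node_index,
                                                        min_word_size,
                                                        desired_word_size,
                                                        actual_word_size);
      if (result == nullptr) {
        // The slow path receives the same pinned node_index (illustrative
        // signature) instead of querying the OS again.
        result = attempt_allocation_slow(node_index, desired_word_size, actual_word_size);
      }
      return result;
    }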
>> >> This has the disadvantage of allocating memory from a node we are potentially moving away from. However, I argue that this is exceedingly rare, and if it happens, the OS will cope by eventually migrating the memory to the correct node. >> >> --- >> >> Testing: >> >> Testing is difficult. See remark in JBS issue. >> >> I tested a modified version of this patch on JDK 21, where the error does cause crashes. I tested with an additional patch mimicking tons of NUMA node migrations. As I wrote in JBS, I plan to contribute that "FakeNUMA" mode eventually, but lack the time to polish it up for now. I hope the fix is simple and uncontested enough to go in quickly, since I would like to fix JDK 21 soon via backporting this patch. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - revert blank line change > - node_index preceding sizes > - Merge branch 'master' into JDK-8351630-Fix-NUMA-association-for-the-duration-of-a-single-G1-Heap-allocation > - node_index parameter should precede output parameters > - start Shenandoah problem in GHAs obviously unrelated, since this affects G1 only; but I am investigating atm an issue that can prevent OOMEs from being thrown when we run out of class space, so it may be related to that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23984#issuecomment-2721258563 From stuefe at openjdk.org Thu Mar 13 13:34:10 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 13 Mar 2025 13:34:10 GMT Subject: RFR: 8351500: G1: NUMA migrations cause crashes in region allocation [v4] In-Reply-To: References: Message-ID: <8HuwfXROj9o6jssyGF3gIyIKrcezPJJlUA0qdvPfu8M=.e6fd67ea-370e-4a0c-8231-89e0916f9d27@github.com> > For details, please see JBS issue. > > _Please note that this bug only shows symptoms in JDK 21 and JDK 17! Due to code shuffling done as part of G1 region-local pinning work, the error does not show in JDKs 22 and later._ > > I originally planned to fix this just for JDK 21 and 17 (see https://github.com/openjdk/jdk21u-dev/pull/1460). However, I would rather have it fixed in the mainline, even though it is symptom-free. It is a lingering issue that may surface later if the code is ever changed. Plus, this prevents the fix from being accidentally overwritten in JDK 21 if we backport. > > ---- > > The fix is simple: we fix (hah) the NUMA association for the full duration of a heap allocation in G1. That way, regardless of the OS scheduler moving the thread to a different NUMA node, we always use the same `G1AllocRegion` object, and changes in the control flow that rely on that won't break on NUMA. > > This has the disadvantage of allocating memory from a node we are potentially moving away from. However, I argue that this is exceedingly rare, and if it happens, the OS will cope by eventually migrating the memory to the correct node. > > --- > > Testing: > > Testing is difficult. See remark in JBS issue. > > I tested a modified version of this patch on JDK 21, where the error does cause crashes. I tested with an additional patch mimicking tons of NUMA node migrations. As I wrote in JBS, I plan to contribute that "FakeNUMA" mode eventually, but lack the time to polish it up for now. I hope the fix is simple and uncontested enough to go in quickly, since I would like to fix JDK 21 soon via backporting this patch. 
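The "FakeNUMA" mode mentioned above is not included here. Purely as an illustration of what such a testing aid could look like (the flag name and the hook are hypothetical, not the actual patch), a debug-only mode could make the node lookup itself unstable, so that apparent migrations happen far more often than the OS scheduler would ever produce:

    // Hypothetical develop flag and placement, for illustration only: force
    // frequent "migrations" by returning a pseudo-random node index, so code
    // that assumes a stable node per thread gets exercised under migration.
    uint G1NUMA::index_of_current_thread() const {
      if (FakeNUMAMigrations) {
        return (uint)os::random() % num_active_nodes();
      }
      return index_of_node_id(os::numa_get_group_id());
    }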
Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8351630-Fix-NUMA-association-for-the-duration-of-a-single-G1-Heap-allocation - revert blank line change - node_index preceding sizes - Merge branch 'master' into JDK-8351630-Fix-NUMA-association-for-the-duration-of-a-single-G1-Heap-allocation - node_index parameter should precede output parameters - start ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23984/files - new: https://git.openjdk.org/jdk/pull/23984/files/578002e0..e8b0a71d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23984&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23984&range=02-03 Stats: 286 lines in 20 files changed: 162 ins; 68 del; 56 mod Patch: https://git.openjdk.org/jdk/pull/23984.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23984/head:pull/23984 PR: https://git.openjdk.org/jdk/pull/23984 From stuefe at openjdk.org Thu Mar 13 13:34:11 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 13 Mar 2025 13:34:11 GMT Subject: RFR: 8351500: G1: NUMA migrations cause crashes in region allocation [v3] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 11:07:21 GMT, Thomas Stuefe wrote: >> For details, please see JBS issue. >> >> _Please note that this bug only shows symptoms in JDK 21 and JDK 17! Due to code shuffling done as part of G1 region-local pinning work, the error does not show in JDKs 22 and later._ >> >> I originally planned to fix this just for JDK 21 and 17 (see https://github.com/openjdk/jdk21u-dev/pull/1460). However, I would rather have it fixed in the mainline, even though it is symptom-free. It is a lingering issue that may surface later if the code is ever changed. Plus, this prevents the fix from being accidentally overwritten in JDK 21 if we backport. >> >> ---- >> >> The fix is simple: we fix (hah) the NUMA association for the full duration of a heap allocation in G1. That way, regardless of the OS scheduler moving the thread to a different NUMA node, we always use the same `G1AllocRegion` object, and changes in the control flow that rely on that won't break on NUMA. >> >> This has the disadvantage of allocating memory from a node we are potentially moving away from. However, I argue that this is exceedingly rare, and if it happens, the OS will cope by eventually migrating the memory to the correct node. >> >> --- >> >> Testing: >> >> Testing is difficult. See remark in JBS issue. >> >> I tested a modified version of this patch on JDK 21, where the error does cause crashes. I tested with an additional patch mimicking tons of NUMA node migrations. As I wrote in JBS, I plan to contribute that "FakeNUMA" mode eventually, but lack the time to polish it up for now. I hope the fix is simple and uncontested enough to go in quickly, since I would like to fix JDK 21 soon via backporting this patch. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - revert blank line change > - node_index preceding sizes > - Merge branch 'master' into JDK-8351630-Fix-NUMA-association-for-the-duration-of-a-single-G1-Heap-allocation > - node_index parameter should precede output parameters > - start Thanks everyone for reviewing. I am letting the GHAs run through one last time before push. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23984#issuecomment-2721270812 From tschatzl at openjdk.org Thu Mar 13 14:16:07 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 13 Mar 2025 14:16:07 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v19] In-Reply-To: References: Message-ID: <-ys7CbBNU4hCmEgYQyZpmBQ_rso4i2_KoFHLPNv73sI=.bd715b1d-b9fd-48b7-bb06-d6673ab2dbfc@github.com> On Thu, 13 Mar 2025 13:07:29 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... 
> > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * when aborting refinement during full collection, the global card table and the per-thread card table might not be in sync. Roll forward during abort of the refinement in these situations. > * additional verification > * added some missing ResourceMarks in asserts > * added variant of ArrayJuggle2 that crashes fairly quickly without these changes Commit https://github.com/openjdk/jdk/pull/23739/commits/786111735c306583af5bc75f7653f0da67d52adb fixes an issue with full gc interrupting refinement while the global card table and the JavaThread's card table changes. Testing: tier1-7 with changes, tier1-5 with changes stressing refinement similar to the ones added to the new test. The new variant of `ArrayJuggle2` fails >50% of all times in our CI without the patch (verified 700 or so executions of that not failing with patch). ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2721413659 From tschatzl at openjdk.org Thu Mar 13 15:46:00 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 13 Mar 2025 15:46:00 GMT Subject: RFR: 8346470: Improve WriteBarrier JMH to have old-to-young refs In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 15:48:20 GMT, Eric Caspole wrote: > Adds new cases based on the existing ones using an extra iteration setup to allocate into young gen, allowing to test old-to-young stores etc, where the existing code is all old-to-old. > > Here is a run on a standard OCI A1.160 with JDK 25: > > Benchmark Mode Cnt Score Error Units > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3448.826 ? 1.620 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.740 ? 0.151 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.353 ? 0.860 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 12 63.565 ? 0.132 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungLarge avgt 12 4476.390 ? 0.930 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungSmall avgt 12 73.517 ? 0.082 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealLarge avgt 12 3103.911 ? 2.079 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealSmall avgt 12 57.549 ? 0.512 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldLarge avgt 12 9587.762 ? 2.044 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldSmall avgt 12 157.244 ? 0.169 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungLarge avgt 12 3103.191 ? 1.100 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungSmall avgt 12 57.392 ? 0.624 ns/op > WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPath avgt 12 2.668 ? 0.001 ns/op > WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPathYoungRef avgt 12 9.337 ? 0.001 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3449.234 ? 1.840 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.720 ? 0.079 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.055 ? 0.665 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 12 63.555 ? 
0.116 ns/op > WriteBarrier.Witho... Some minor formatting issues... test/micro/org/openjdk/bench/vm/compiler/WriteBarrier.java line 127: > 125: @Setup(Level.Iteration) > 126: public void setupIteration() { > 127: // Reallocate each iteration to ensure they are in young gen Suggestion: // Reallocate target objects each iteration to ensure they are in young gen. test/micro/org/openjdk/bench/vm/compiler/WriteBarrier.java line 185: > 183: } > 184: } > 185: Suggestion: For consistent spacing between benchmarks. ------------- Changes requested by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24010#pullrequestreview-2682458764 PR Review Comment: https://git.openjdk.org/jdk/pull/24010#discussion_r1993806737 PR Review Comment: https://git.openjdk.org/jdk/pull/24010#discussion_r1993805710 From ecaspole at openjdk.org Thu Mar 13 15:53:43 2025 From: ecaspole at openjdk.org (Eric Caspole) Date: Thu, 13 Mar 2025 15:53:43 GMT Subject: RFR: 8346470: Improve WriteBarrier JMH to have old-to-young refs [v2] In-Reply-To: References: Message-ID: > Adds new cases based on the existing ones using an extra iteration setup to allocate into young gen, allowing to test old-to-young stores etc, where the existing code is all old-to-old. > > Here is a run on a standard OCI A1.160 with JDK 25: > > Benchmark Mode Cnt Score Error Units > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3448.826 ? 1.620 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.740 ? 0.151 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.353 ? 0.860 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 12 63.565 ? 0.132 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungLarge avgt 12 4476.390 ? 0.930 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungSmall avgt 12 73.517 ? 0.082 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealLarge avgt 12 3103.911 ? 2.079 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealSmall avgt 12 57.549 ? 0.512 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldLarge avgt 12 9587.762 ? 2.044 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldSmall avgt 12 157.244 ? 0.169 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungLarge avgt 12 3103.191 ? 1.100 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungSmall avgt 12 57.392 ? 0.624 ns/op > WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPath avgt 12 2.668 ? 0.001 ns/op > WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPathYoungRef avgt 12 9.337 ? 0.001 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3449.234 ? 1.840 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.720 ? 0.079 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.055 ? 0.665 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 12 63.555 ? 0.116 ns/op > WriteBarrier.Witho... 
Eric Caspole has updated the pull request incrementally with two additional commits since the last revision: - Update test/micro/org/openjdk/bench/vm/compiler/WriteBarrier.java Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> - Update test/micro/org/openjdk/bench/vm/compiler/WriteBarrier.java Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24010/files - new: https://git.openjdk.org/jdk/pull/24010/files/7075365a..46da7dbc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24010&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24010&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24010.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24010/head:pull/24010 PR: https://git.openjdk.org/jdk/pull/24010 From ecaspole at openjdk.org Thu Mar 13 15:55:58 2025 From: ecaspole at openjdk.org (Eric Caspole) Date: Thu, 13 Mar 2025 15:55:58 GMT Subject: RFR: 8346470: Improve WriteBarrier JMH to have old-to-young refs [v2] In-Reply-To: References: Message-ID: <2J5tv5CdcNPngtTD7dTc0N_BU7hR_9oh4naM34JWIqg=.15d13edb-c2a6-4f7c-85ef-cffa5d5eef7c@github.com> On Thu, 13 Mar 2025 15:53:43 GMT, Eric Caspole wrote: >> Adds new cases based on the existing ones using an extra iteration setup to allocate into young gen, allowing to test old-to-young stores etc, where the existing code is all old-to-old. >> >> Here is a run on a standard OCI A1.160 with JDK 25: >> >> Benchmark Mode Cnt Score Error Units >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3448.826 ? 1.620 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.740 ? 0.151 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.353 ? 0.860 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 12 63.565 ? 0.132 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungLarge avgt 12 4476.390 ? 0.930 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungSmall avgt 12 73.517 ? 0.082 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealLarge avgt 12 3103.911 ? 2.079 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealSmall avgt 12 57.549 ? 0.512 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldLarge avgt 12 9587.762 ? 2.044 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldSmall avgt 12 157.244 ? 0.169 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungLarge avgt 12 3103.191 ? 1.100 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungSmall avgt 12 57.392 ? 0.624 ns/op >> WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPath avgt 12 2.668 ? 0.001 ns/op >> WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPathYoungRef avgt 12 9.337 ? 0.001 ns/op >> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3449.234 ? 1.840 ns/op >> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.720 ? 0.079 ns/op >> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.055 ? 0.665 ns/op >> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 1... 
> > Eric Caspole has updated the pull request incrementally with two additional commits since the last revision: > > - Update test/micro/org/openjdk/bench/vm/compiler/WriteBarrier.java > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update test/micro/org/openjdk/bench/vm/compiler/WriteBarrier.java > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> Updated the code with Thomas' suggestions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24010#issuecomment-2721751933 From tschatzl at openjdk.org Thu Mar 13 15:58:58 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 13 Mar 2025 15:58:58 GMT Subject: RFR: 8346470: Improve WriteBarrier JMH to have old-to-young refs [v2] In-Reply-To: References: Message-ID: <5WMDwKLS_c68QKjiBL16cjMUR6b_MQKJ28m36XqIdE8=.78ef226b-bf64-42c7-89be-2eb6a655c7f2@github.com> On Thu, 13 Mar 2025 15:53:43 GMT, Eric Caspole wrote: >> Adds new cases based on the existing ones using an extra iteration setup to allocate into young gen, allowing to test old-to-young stores etc, where the existing code is all old-to-old. >> >> Here is a run on a standard OCI A1.160 with JDK 25: >> >> Benchmark Mode Cnt Score Error Units >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3448.826 ? 1.620 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.740 ? 0.151 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.353 ? 0.860 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 12 63.565 ? 0.132 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungLarge avgt 12 4476.390 ? 0.930 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungSmall avgt 12 73.517 ? 0.082 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealLarge avgt 12 3103.911 ? 2.079 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealSmall avgt 12 57.549 ? 0.512 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldLarge avgt 12 9587.762 ? 2.044 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldSmall avgt 12 157.244 ? 0.169 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungLarge avgt 12 3103.191 ? 1.100 ns/op >> WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungSmall avgt 12 57.392 ? 0.624 ns/op >> WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPath avgt 12 2.668 ? 0.001 ns/op >> WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPathYoungRef avgt 12 9.337 ? 0.001 ns/op >> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3449.234 ? 1.840 ns/op >> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.720 ? 0.079 ns/op >> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.055 ? 0.665 ns/op >> WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 1... 
> > Eric Caspole has updated the pull request incrementally with two additional commits since the last revision: > > - Update test/micro/org/openjdk/bench/vm/compiler/WriteBarrier.java > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> > - Update test/micro/org/openjdk/bench/vm/compiler/WriteBarrier.java > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24010#pullrequestreview-2682518985 From stuefe at openjdk.org Thu Mar 13 16:11:09 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 13 Mar 2025 16:11:09 GMT Subject: Integrated: 8351500: G1: NUMA migrations cause crashes in region allocation In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 14:01:11 GMT, Thomas Stuefe wrote: > For details, please see JBS issue. > > _Please note that this bug only shows symptoms in JDK 21 and JDK 17! Due to code shuffling done as part of G1 region-local pinning work, the error does not show in JDKs 22 and later._ > > I originally planned to fix this just for JDK 21 and 17 (see https://github.com/openjdk/jdk21u-dev/pull/1460). However, I would rather have it fixed in the mainline, even though it is symptom-free. It is a lingering issue that may surface later if the code is ever changed. Plus, this prevents the fix from being accidentally overwritten in JDK 21 if we backport. > > ---- > > The fix is simple: we fix (hah) the NUMA association for the full duration of a heap allocation in G1. That way, regardless of the OS scheduler moving the thread to a different NUMA node, we always use the same `G1AllocRegion` object, and changes in the control flow that rely on that won't break on NUMA. > > This has the disadvantage of allocating memory from a node we are potentially moving away from. However, I argue that this is exceedingly rare, and if it happens, the OS will cope by eventually migrating the memory to the correct node. > > --- > > Testing: > > Testing is difficult. See remark in JBS issue. > > I tested a modified version of this patch on JDK 21, where the error does cause crashes. I tested with an additional patch mimicking tons of NUMA node migrations. As I wrote in JBS, I plan to contribute that "FakeNUMA" mode eventually, but lack the time to polish it up for now. I hope the fix is simple and uncontested enough to go in quickly, since I would like to fix JDK 21 soon via backporting this patch. This pull request has now been integrated. Changeset: 37ec7962 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/37ec796255ae857588a5c7e0d572407dd81cbec9 Stats: 53 lines in 5 files changed: 17 ins; 11 del; 25 mod 8351500: G1: NUMA migrations cause crashes in region allocation Reviewed-by: rkennke, sjohanss, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/23984 From cslucas at openjdk.org Thu Mar 13 18:12:00 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 13 Mar 2025 18:12:00 GMT Subject: RFR: 8350898: Shenandoah: Eliminate final roots safepoint [v4] In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 21:49:06 GMT, William Kemper wrote: >> This PR converts the final roots safepoint operation into a handshake. The safepoint operation still exists, but is only executed when `ShenandoahVerify` is enabled. In addition to this change, this PR also improves the logging for the concurrent preparation for update references from [PR 22688](https://github.com/openjdk/jdk/pull/22688). 
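A minimal sketch of the shape such a conversion can take (class and function names here are illustrative, not the actual change): the per-thread work runs in a handshake, and the safepoint form is kept only for the verification case.

    // Sketch: do the final-roots work via a handshake unless verification
    // still needs a global stop-the-world pause.
    class ShenandoahFinalRootsClosure : public HandshakeClosure {
    public:
      ShenandoahFinalRootsClosure() : HandshakeClosure("Shenandoah Final Roots") {}
      void do_thread(Thread* thread) override {
        // Flush/transfer per-thread state (e.g. SATB buffers) for this thread.
      }
    };

    void final_roots() {
      if (ShenandoahVerify) {
        VM_ShenandoahFinalRoots op;   // keep the safepoint so the verifier
        VMThread::execute(&op);       // sees a quiesced heap
      } else {
        ShenandoahFinalRootsClosure cl;
        Handshake::execute(&cl);      // per-thread handshake, no safepoint
      }
    }

The win is that mutator threads are only paused one at a time to run the closure, rather than all being stopped together.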
> > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Merge remote-tracking branch 'jdk/master' into eliminate-final-roots > - Clarify which thread local buffers in comment > - Merge remote-tracking branch 'jdk/master' into eliminate-final-roots > - Fix comments > - Add whitespace at end of file > - More detail for init update refs event message > - Use timing tracker for timing verification > - Merge remote-tracking branch 'jdk/master' into eliminate-final-roots > - WIP: Fix up phase timings for newly concurrent final roots and init update refs > - WIP: Combine satb transfer with state propagation, restore phase timing data > - ... and 2 more: https://git.openjdk.org/jdk/compare/1dd9cf10...a3575f1e Marked as reviewed by cslucas (Author). src/hotspot/share/gc/shenandoah/shenandoahOldGeneration.cpp line 129: > 127: ShenandoahSATBMarkQueueSet& _satb_queues; > 128: ShenandoahObjToScanQueueSet* _mark_queues; > 129: volatile size_t _trashed_oops; NIT: perhaps a comment about why this needs to be volatile? ------------- PR Review: https://git.openjdk.org/jdk/pull/23830#pullrequestreview-2682970505 PR Review Comment: https://git.openjdk.org/jdk/pull/23830#discussion_r1994072609 From ecaspole at openjdk.org Thu Mar 13 18:34:57 2025 From: ecaspole at openjdk.org (Eric Caspole) Date: Thu, 13 Mar 2025 18:34:57 GMT Subject: Integrated: 8346470: Improve WriteBarrier JMH to have old-to-young refs In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 15:48:20 GMT, Eric Caspole wrote: > Adds new cases based on the existing ones using an extra iteration setup to allocate into young gen, allowing to test old-to-young stores etc, where the existing code is all old-to-old. > > Here is a run on a standard OCI A1.160 with JDK 25: > > Benchmark Mode Cnt Score Error Units > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3448.826 ? 1.620 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.740 ? 0.151 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.353 ? 0.860 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 12 63.565 ? 0.132 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungLarge avgt 12 4476.390 ? 0.930 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathOldToYoungSmall avgt 12 73.517 ? 0.082 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealLarge avgt 12 3103.911 ? 2.079 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathRealSmall avgt 12 57.549 ? 0.512 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldLarge avgt 12 9587.762 ? 2.044 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToOldSmall avgt 12 157.244 ? 0.169 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungLarge avgt 12 3103.191 ? 1.100 ns/op > WriteBarrier.WithDefaultUnrolling.testArrayWriteBarrierFastPathYoungToYoungSmall avgt 12 57.392 ? 0.624 ns/op > WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPath avgt 12 2.668 ? 0.001 ns/op > WriteBarrier.WithDefaultUnrolling.testFieldWriteBarrierFastPathYoungRef avgt 12 9.337 ? 0.001 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullLarge avgt 12 3449.234 ? 
1.840 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullSmall avgt 12 63.720 ? 0.079 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungLarge avgt 12 3448.055 ? 0.665 ns/op > WriteBarrier.WithoutUnrolling.testArrayWriteBarrierFastPathNullYoungSmall avgt 12 63.555 ? 0.116 ns/op > WriteBarrier.Witho... This pull request has now been integrated. Changeset: 03ef79cf Author: Eric Caspole URL: https://git.openjdk.org/jdk/commit/03ef79cf05bdcfc3bb126d004f8f039fb2f4ba9f Stats: 93 lines in 1 file changed: 91 ins; 0 del; 2 mod 8346470: Improve WriteBarrier JMH to have old-to-young refs Reviewed-by: tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/24010 From wkemper at openjdk.org Thu Mar 13 20:43:05 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 13 Mar 2025 20:43:05 GMT Subject: RFR: 8350898: Shenandoah: Eliminate final roots safepoint [v5] In-Reply-To: References: Message-ID: > This PR converts the final roots safepoint operation into a handshake. The safepoint operation still exists, but is only executed when `ShenandoahVerify` is enabled. In addition to this change, this PR also improves the logging for the concurrent preparation for update references from [PR 22688](https://github.com/openjdk/jdk/pull/22688). William Kemper has updated the pull request incrementally with one additional commit since the last revision: Add comment explaining use of _trashed_oops ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23830/files - new: https://git.openjdk.org/jdk/pull/23830/files/a3575f1e..cd6c6e44 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23830&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23830&range=03-04 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23830.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23830/head:pull/23830 PR: https://git.openjdk.org/jdk/pull/23830 From kbarrett at openjdk.org Thu Mar 13 21:34:56 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 13 Mar 2025 21:34:56 GMT Subject: RFR: 8347337: ZGC: String dedups short-lived strings [v2] In-Reply-To: <-d_fQfoAWB7fuNa7M46BFbuvHqdScs1ZTW3bFULjOnY=.b2b279cd-a9a9-46ad-8933-e5d2fd3f26c3@github.com> References: <-d_fQfoAWB7fuNa7M46BFbuvHqdScs1ZTW3bFULjOnY=.b2b279cd-a9a9-46ad-8933-e5d2fd3f26c3@github.com> Message-ID: On Tue, 11 Mar 2025 09:33:20 GMT, Stefan Karlsson wrote: >> When -XX:+UseStringDeduplication is turned on, ZGC requests that every single String it encounters is deduplicated. The Strings of these requests are saved in weak handles, and then processed by a separate thread. One problematic part with this is that ZGC treats these handles as strong roots for young collections. So, even if the Strings are short-lived they will be artificially kept alive until they get promoted up to the old generation. >> >> This creates an extreme amount of Strings and weak handles to be processed by the old collection, which can result in long major collections and allocation stalls. >> >> Other GCs have a filter to check for how old the Strings are before they get deduplicated. >> >> The proposed fix is to move the string deduplication requests to when the Strings are promoted to the old generation. >> >> Testing: >> >> * I've tested this with a small micro that showed how ZGC got extremely long major collections with string deduplication turned on. >> >> * SPECjbb2015 with a JVMTI agent that induces load and adds deduplicatable strings. 
>> >> * Tier1-7 >> >> Note: I'm currently not aware of any non-artificial workload where string deduplication is an important optimization when running with Generational ZGC. If anyone knows of a workload that greatly benefits from it *AND* uses ZGC as a low-latency collector, then that would be highly interesting to look at. >> >> Note 2: the branch contains two changesets. In the first changeset I added a flag to be able to test and compare the old implementation with the new implementation. For the final PR I've removed that flag and the associated code as a second changeset. If we really want we could keep that flag, but given how poorly that implementation worked for Generational ZGC, I think we should just go with this new implementation. > > Stefan Karlsson has updated the pull request incrementally with two additional commits since the last revision: > > - Make ZPageAge ZForwarding member fileds constant > - Review comments Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23965#pullrequestreview-2683413060 From kbarrett at openjdk.org Thu Mar 13 21:34:57 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 13 Mar 2025 21:34:57 GMT Subject: RFR: 8347337: ZGC: String dedups short-lived strings [v2] In-Reply-To: <7ZsJhxZDYDIsXtDjhDHAnO0I4cxAFSlf7Oi9FVIh5_I=.2168a8f6-c7c6-4d07-aee6-753680dd11d3@github.com> References: <-d_fQfoAWB7fuNa7M46BFbuvHqdScs1ZTW3bFULjOnY=.b2b279cd-a9a9-46ad-8933-e5d2fd3f26c3@github.com> <7ZsJhxZDYDIsXtDjhDHAnO0I4cxAFSlf7Oi9FVIh5_I=.2168a8f6-c7c6-4d07-aee6-753680dd11d3@github.com> Message-ID: <9Qjdz5cKLBdj8Axo80Usgw5eDC6S7woId2cK-rCL4xA=.861dd4d9-2c01-444c-8472-32ac69dead50@github.com> On Tue, 11 Mar 2025 18:43:49 GMT, Stefan Karlsson wrote: > About the style _guide_. I see that section more as a helpful guide, but not > as a complete mandate to how to name files in HotSpot. That's true for a lot of the style guide. But there generally ought to be a (good) reason to be different. I don't see a particularly compelling reason in this case, though zgc has a number of stylistic peculiarities of its own, and maybe that's sufficient. > Note, that even the StringDedup::Request class is placed in a file named > stringDedup.hpp and not stringDedupRequest.hpp! The StringDedup class (and associated header file) is the external interface to the facility, so having the Request class be a member is natural. As such it's natural to treat it as a "buddy" class in the sense used in the style guide. So I don't see that as a compelling argument for the proposed zgc structure. > I could easily also create a structure that simulates the structure used in > stringDedup.hpp: Given that the Context class would be the only thing in such a notional ZStringDedup class, I agree that doesn't seem like an improvement. > This was intentional. I named them zStringDedup.* to show that this is the > file that contains ZGC's support to interface with StringDedup. We do that for > other sub-systems that ZGC interfaces with. If I was looking for StringDedup-related stuff in zgc and only found files named zStringDedupContext.*, I would certainly look there. Though I admit that if I was looking for the class ZStringDedupContext I would look in zStringDedup.* in the absence of any other zStringDedup*.*. I don't feel super strongly about this, and it seems you do, so I'm not going to block over it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23965#discussion_r1994331071 From tschatzl at openjdk.org Fri Mar 14 14:28:57 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 14 Mar 2025 14:28:57 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v17] In-Reply-To: References: <0w7seS1tIFhUxnmStxQySISWVfpBBsRmUtx7EoTy9a4=.509a3d5e-56d0-4fd8-8896-51835b14302b@github.com> Message-ID: <58jXaIS3TNN9Y9xWGSKWM7B4C0dbZ6YxRWjPMmBeFnY=.506b75a0-12a4-424c-869c-8358195947d9@github.com> On Wed, 12 Mar 2025 13:56:57 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 263: >> >>> 261: >>> 262: SuspendibleThreadSetLeaver sts_leave; >>> 263: VMThread::execute(&op); >> >> Can you elaborate what synchronization this VM op is trying to achieve? > > Memory visibility for refinement threads for the references written to the heap. Without them, they may not have received the most recent values. > This is the same as the `StoreLoad` barriers synchronization between mutator and refinement threads imo. There has been a discussion about whether this is actually needed. Initially we thought that this could be removed because it's only the refinement worker threads that would need memory synchronization, and the memory synchronization is handled by just starting up the refinement threads. However the rebuild remsets process (marking threads) also access the global card table reference to mark the to-collection-set cards and its value must be synchronized. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r1995683088 From tschatzl at openjdk.org Fri Mar 14 14:37:27 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 14 Mar 2025 14:37:27 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v20] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * ayang review * re-add STS leaver for java thread handshake ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/78611173..51a9eed8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=18-19 Stats: 15 lines in 1 file changed: 5 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Fri Mar 14 16:35:38 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 14 Mar 2025 16:35:38 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v21] In-Reply-To: References: Message-ID: <1bH6bLmIYx6eVtZ4IPlFtdYpdCAwSaNB6w0uNljTSJE=.8a4a88c7-2f66-493c-91dd-6fc6c744c08f@github.com> > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. 
> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: - Merge branch 'master' into 8342381-card-table-instead-of-dcq - * ayang review * re-add STS leaver for java thread handshake - * when aborting refinement during full collection, the global card table and the per-thread card table might not be in sync. Roll forward during abort of the refinement in these situations. * additional verification * added some missing ResourceMarks in asserts * added variant of ArrayJuggle2 that crashes fairly quickly without these changes - * ayang review * remove unnecessary STSleaver * some more documentation around to_collection_card card color - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * optimized RISCV gen_write_ref_array_post_barrier() implementation contributed by @RealFYang - * fix card table verification crashes: in the first refinement phase, when switching the global card tables, we need to re-check whether we are still in the same sweep epoch or not. It might have changed due to a GC interrupting acquiring the Heap_lock. Otherwise new threads will scribble on the refinement table. Cause are last-minute changes before making the PR ready to review. Testing: without the patch, occurs fairly frequently when continuously (1 in 20) starting refinement. Does not afterward. - * ayang review 3 * comments * minor refactorings - * iwalulya review * renaming * fix some includes, forward declaration - * fix whitespace * additional whitespace between log tags * rename G1ConcurrentRefineWorkTask -> ...SweepTask to conform to the other similar rename - ... 
and 18 more: https://git.openjdk.org/jdk/compare/7f428041...b0730176 ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=20 Stats: 6761 lines in 99 files changed: 2368 ins; 3464 del; 929 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From wkemper at openjdk.org Fri Mar 14 23:50:25 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 14 Mar 2025 23:50:25 GMT Subject: RFR: 8352091: GenShen: assert(!(request.generation->is_old() && _heap->old_generation()->is_doing_mixed_evacuations())) failed: Old heuristic should not request cycles while it waits for mixed evacuation Message-ID: Consider the following: 1. Regulator thread sees that control thread is `idle` and requests an old cycle 2. Regulator thread waits until control thread is not `idle` 3. Control thread starts old cycle and notifies the Regulator thread (as expected) 4. Regulator thread stays off CPU for a _long_ time 5. Control thread _completes_ old marking and returns to `idle` state 6. Regulator thread finally wakes up and sees that Control thread is _still_ idle 7. In fact, the control thread has completed old marking and the regulator thread should not request another cycle ------------- Commit messages: - Fix ABA issue that could have regulator thread request unexpected old cycles Changes: https://git.openjdk.org/jdk/pull/24069/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24069&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352091 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24069.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24069/head:pull/24069 PR: https://git.openjdk.org/jdk/pull/24069 From tschatzl at openjdk.org Sat Mar 15 13:12:39 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Sat, 15 Mar 2025 13:12:39 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v22] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * more documentation on why we need to rendezvous the gc threads ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/b0730176..447fe39b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=20-21 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From duke at openjdk.org Sun Mar 16 17:12:53 2025 From: duke at openjdk.org (Saint Wesonga) Date: Sun, 16 Mar 2025 17:12:53 GMT Subject: RFR: 8350722: Serial GC: Remove duplicate logic for detecting pointers in young gen In-Reply-To: References: Message-ID: On Wed, 26 Feb 2025 08:45:57 GMT, Thomas Schatzl wrote: > If the goal is just factoring out the check to make it change for all locations whenever heap layout is changed, I would prefer adding a helper method in `SerialHeap` for example that's easily inlinable for the compiler. Thanks for the explanation. I'll look into a different approach like an inlinable helper method. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23792#discussion_r1997663963 From stefank at openjdk.org Mon Mar 17 07:24:57 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 17 Mar 2025 07:24:57 GMT Subject: RFR: 8347337: ZGC: String dedups short-lived strings [v2] In-Reply-To: <9Qjdz5cKLBdj8Axo80Usgw5eDC6S7woId2cK-rCL4xA=.861dd4d9-2c01-444c-8472-32ac69dead50@github.com> References: <-d_fQfoAWB7fuNa7M46BFbuvHqdScs1ZTW3bFULjOnY=.b2b279cd-a9a9-46ad-8933-e5d2fd3f26c3@github.com> <7ZsJhxZDYDIsXtDjhDHAnO0I4cxAFSlf7Oi9FVIh5_I=.2168a8f6-c7c6-4d07-aee6-753680dd11d3@github.com> <9Qjdz5cKLBdj8Axo80Usgw5eDC6S7woId2cK-rCL4xA=.861dd4d9-2c01-444c-8472-32ac69dead50@github.com> Message-ID: On Thu, 13 Mar 2025 21:32:09 GMT, Kim Barrett wrote: >> This was intentional. 
I named them zStringDedup.* to show that this is the file that contains ZGC's support to interface with StringDedup. We do that for other sub-systems that ZGC interfaces with. >> >> The crux is that the string dedup API requires us to maintain the lifecycle of a `StringDedup::Requests` instance, so we can't simply have a function like `ZStringDedup::request(obj)`. Instead we need to add a ZStringDedupContext class, just so that we maintain the Requests object. (I choose suffix `Context` instead of `Requests`, because that naming fits better with the rest of the ZGC code). >> >> About the style _guide_. I see that section more as a helpful guide, but not as a complete mandate to how to name files in HotSpot. >> >> Note, that even the StringDedup::Request class is placed in a file named stringDedup.hpp and not stringDedupRequest.hpp! I could easily also create a structure that simulates the structure used in stringDedup.hpp: >> >> >> class ZStringDedup { >> public: >> class Context { >> ... >> }; >> }; >> >> >> However, I don't find that particularly appealing and it somewhat goes against the informal style guide that Per and I used when we first started to write ZGC. > >> About the style _guide_. I see that section more as a helpful guide, but not >> as a complete mandate to how to name files in HotSpot. > > That's true for a lot of the style guide. But there generally ought to be a > (good) reason to be different. I don't see a particularly compelling reason > in this case, though zgc has a number of stylistic peculiarities of its own, > and maybe that's sufficient. > >> Note, that even the StringDedup::Request class is placed in a file named >> stringDedup.hpp and not stringDedupRequest.hpp! > > The StringDedup class (and associated header file) is the external interface > to the facility, so having the Request class be a member is natural. As such > it's natural to treat it as a "buddy" class in the sense used in the style > guide. So I don't see that as a compelling argument for the proposed zgc > structure. > >> I could easily also create a structure that simulates the structure used in >> stringDedup.hpp: > > Given that the Context class would be the only thing in such a notional > ZStringDedup class, I agree that doesn't seem like an improvement. > >> This was intentional. I named them zStringDedup.* to show that this is the >> file that contains ZGC's support to interface with StringDedup. We do that for >> other sub-systems that ZGC interfaces with. > > If I was looking for StringDedup-related stuff in zgc and only found files > named `zStringDedupContext.*`, I would certainly look there. Though I admit that > if I was looking for the class ZStringDedupContext I would look in > `zStringDedup.*` in the absence of any other `zStringDedup*.*`. > > I don't feel super strongly about this, and it seems you do, so I'm not going > to block over it. Right, where you state that you don't see a compelling reason I do see them, so thanks for not blocking this. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23965#discussion_r1998085535 From jsikstro at openjdk.org Mon Mar 17 07:58:06 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Mon, 17 Mar 2025 07:58:06 GMT Subject: Integrated: 8351167: ZGC: Lazily initialize livemap In-Reply-To: <05cL5-IAaVEDpyTUxQ61JqzRzgM6myzEsIWt-xBLJRM=.ad94b7e8-50e0-4447-952d-995e143b5218@github.com> References: <05cL5-IAaVEDpyTUxQ61JqzRzgM6myzEsIWt-xBLJRM=.ad94b7e8-50e0-4447-952d-995e143b5218@github.com> Message-ID: <6fstOLt4mCdsm0nHjHJRiULyNsJ4igMOyCQq64pRXy4=.3b02312c-f89d-46da-985a-5a635908e9d7@github.com> On Tue, 4 Mar 2025 19:55:38 GMT, Joel Sikstr?m wrote: > Memory for the bitmap inside the livemap of a ZPage is currently allocated upon calling its constructor, which adds a latency overhead when allocating pages. As preparation for the Mapped Cache ([JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441)), but also as a standalone improvement, we want to instead lazily initialize the livemap's bitmap. > > This patch holds off with allocating memory for the bitmap that the livemap uses until the livemap is written to the first time (i.e. by calling ZLiveMap::set). The effect of this is that the latency impact of allocating the bitmap will only be taken by GC threads and not by mutator threads, since only GC threads mark objects before pushing them onto the mark stack. This improvement will reduce page allocation latencies somewhat. > > In addition to lazily allocating the bitmap, I've converted the static C-style cast to a checked cast for `ZPage::object_max_count()`, which is passed as the size to the bitmaps. This is because a value not contained in 32 bits will overflow with the C-style cast and give a too small bitmap when passed to the livemap. This is not an observed issue, just more of a sanity check. > > Testing: > * Tiers 1-5 > * GHA This pull request has now been integrated. Changeset: 2672c40b Author: Joel Sikstr?m URL: https://git.openjdk.org/jdk/commit/2672c40bf10a6597ae861e2183e7558ffed43dba Stats: 25 lines in 4 files changed: 17 ins; 1 del; 7 mod 8351167: ZGC: Lazily initialize livemap Reviewed-by: sjohanss, eosterlund, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/23907 From jsikstro at openjdk.org Mon Mar 17 07:58:05 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Mon, 17 Mar 2025 07:58:05 GMT Subject: RFR: 8351167: ZGC: Lazily initialize livemap [v2] In-Reply-To: References: <05cL5-IAaVEDpyTUxQ61JqzRzgM6myzEsIWt-xBLJRM=.ad94b7e8-50e0-4447-952d-995e143b5218@github.com> Message-ID: On Tue, 4 Mar 2025 20:17:28 GMT, Joel Sikstr?m wrote: >> Memory for the bitmap inside the livemap of a ZPage is currently allocated upon calling its constructor, which adds a latency overhead when allocating pages. As preparation for the Mapped Cache ([JDK-8350441](https://bugs.openjdk.org/browse/JDK-8350441)), but also as a standalone improvement, we want to instead lazily initialize the livemap's bitmap. >> >> This patch holds off with allocating memory for the bitmap that the livemap uses until the livemap is written to the first time (i.e. by calling ZLiveMap::set). The effect of this is that the latency impact of allocating the bitmap will only be taken by GC threads and not by mutator threads, since only GC threads mark objects before pushing them onto the mark stack. This improvement will reduce page allocation latencies somewhat. 
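To make the idea concrete, here is a small standalone C++ sketch of a live bitmap whose backing memory is only allocated on the first mark. The class name, layout and the simple allocated flag are assumptions for this illustration and not the actual ZLiveMap implementation, which has its own sequence-number based reset protocol and must handle concurrent marking threads.

#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Illustrative lazily-initialized live map: a page that is never marked never
// pays for its bitmap, and only the (GC) thread that marks first allocates it.
class LiveMap {
  std::atomic<bool> _allocated{false};
  std::vector<uint64_t> _bits;   // backing storage, allocated on demand
  size_t _size_in_bits;

  void allocate_if_needed() {
    if (!_allocated.load(std::memory_order_acquire)) {
      // Single-threaded sketch; the real code must resolve races between
      // multiple marking threads reaching this point at the same time.
      _bits.assign((_size_in_bits + 63) / 64, 0);
      _allocated.store(true, std::memory_order_release);
    }
  }

public:
  explicit LiveMap(size_t size_in_bits) : _size_in_bits(size_in_bits) {}

  void set(size_t index) {            // called when an object is marked live
    allocate_if_needed();
    _bits[index / 64] |= uint64_t(1) << (index % 64);
  }

  bool get(size_t index) const {
    if (!_allocated.load(std::memory_order_acquire)) {
      return false;                   // nothing marked yet, so nothing is live
    }
    return (_bits[index / 64] >> (index % 64)) & 1;
  }
};

int main() {
  LiveMap map(4096);                  // no bitmap memory committed yet
  printf("bit 100 before mark: %d\n", int(map.get(100)));
  map.set(100);                       // first mark triggers the allocation
  printf("bit 100 after mark:  %d\n", int(map.get(100)));
  return 0;
}

The point of the sketch is only the placement of the allocation: it happens inside set(), so mutator threads that merely allocate pages never pay for it.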
>> >> In addition to lazily allocating the bitmap, I've converted the static C-style cast to a checked cast for `ZPage::object_max_count()`, which is passed as the size to the bitmaps. This is because a value not contained in 32 bits will overflow with the C-style cast and give a too small bitmap when passed to the livemap. This is not an observed issue, just more of a sanity check. >> >> Testing: >> * Tiers 1-5 >> * GHA > > Joel Sikstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Copyright years Thank you for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23907#issuecomment-2728492564 From tschatzl at openjdk.org Mon Mar 17 09:29:51 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 17 Mar 2025 09:29:51 GMT Subject: RFR: 8352138: G1: Remove G1AddMetaspaceDependency.java test Message-ID: Hi all, please review trivial removal of this test because the reason for its existence, the `DirtyCardQ_CBL_mon` lock has been removed in JDK-8237143 long time ago. Testing: gha Thanks, Thomas ------------- Commit messages: - 8352138 Changes: https://git.openjdk.org/jdk/pull/24075/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24075&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352138 Stats: 125 lines in 1 file changed: 0 ins; 125 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24075.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24075/head:pull/24075 PR: https://git.openjdk.org/jdk/pull/24075 From ayang at openjdk.org Mon Mar 17 09:58:53 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 17 Mar 2025 09:58:53 GMT Subject: RFR: 8352138: G1: Remove G1AddMetaspaceDependency.java test In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 09:23:09 GMT, Thomas Schatzl wrote: > Hi all, > > please review trivial removal of this test because the reason for its > existence, the `DirtyCardQ_CBL_mon` lock has been removed in JDK-8237143 long time ago. > > Testing: gha > > Thanks, > Thomas Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24075#pullrequestreview-2689813344 From shade at openjdk.org Mon Mar 17 10:10:58 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 17 Mar 2025 10:10:58 GMT Subject: RFR: 8352138: G1: Remove G1AddMetaspaceDependency.java test In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 09:23:09 GMT, Thomas Schatzl wrote: > Hi all, > > please review trivial removal of this test because the reason for its > existence, the `DirtyCardQ_CBL_mon` lock has been removed in JDK-8237143 long time ago. > > Testing: gha > > Thanks, > Thomas Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24075#pullrequestreview-2689860366 From tschatzl at openjdk.org Mon Mar 17 10:32:33 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 17 Mar 2025 10:32:33 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v23] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. 
> > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * obsolete G1UpdateBufferSize G1UpdateBufferSize has previously been used to size the refinement buffers and impose a minimum limit on the number of cards per thread that need to be pending before refinement starts. The former function is now obsolete with the removal of the dirty card queues, the latter functionality has been taken over by the new diagnostic option `G1PerThreadPendingCardThreshold`. I prefer to make this a diagnostic option is better than a product option because it is something that is only necessary for some test cases to produce some otherwise unwanted behavior (continuous refinement). CSR is pending. 
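For illustration only, the following is a minimal, self-contained C++ model of the reduced post-write barrier sketched in the quoted description above. It is a sketch under stated assumptions, not the actual HotSpot code: the real barrier is emitted by the JIT compilers against the real G1 card table, and the heap layout, card and region sizes, filter order and helper names below are all invented for this example.

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Model parameters, illustrative only. Real G1 uses 512-byte cards; the
// region size here is made up for the example.
static const size_t kHeapBytes   = size_t(1) << 24;  // 16M fake "heap"
static const size_t kCardShift   = 9;                // 512-byte cards
static const size_t kRegionShift = 20;               // 1M "regions"

static std::vector<uint8_t> heap(kHeapBytes);
static std::vector<uint8_t> card_table(kHeapBytes >> kCardShift, 0xff); // 0xff == clean

static size_t card_index(const void* p) {
  return size_t(static_cast<const uint8_t*>(p) - heap.data()) >> kCardShift;
}

static size_t region_index(const void* p) {
  return size_t(static_cast<const uint8_t*>(p) - heap.data()) >> kRegionShift;
}

// Simplified post-write barrier for "x.a = y": keep the cheap filters, drop
// the StoreLoad, the dirty-card re-check and the dirty card queue enqueue,
// and simply dirty the card. Refinement then works off the card table(s)
// instead of consuming per-thread queues.
static void post_write_barrier(const void* field_addr, const void* new_value) {
  if (new_value == nullptr) return;                                 // null value check
  if (region_index(field_addr) == region_index(new_value)) return;  // same region check
  card_table[card_index(field_addr)] = 0;                           // dirty the card
}

int main() {
  const void* field = heap.data() + (size_t(3) << kRegionShift) + 128;
  const void* value = heap.data() + (size_t(7) << kRegionShift) + 64;
  post_write_barrier(field, value);
  printf("card %zu is now %s\n", card_index(field),
         card_table[card_index(field)] == 0 ? "dirty" : "clean");
  return 0;
}

Compared to the pseudo code quoted above, the write-to-young-gen filter could be kept as well; the essential difference is that no StoreLoad fence, no dirty-card re-check and no per-thread dirty card queue management remain in the fast path, which is what brings the barrier close to the Parallel/Serial GC barrier size.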
------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/447fe39b..4d0afd57 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=21-22 Stats: 16 lines in 7 files changed: 2 ins; 9 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Mon Mar 17 11:55:12 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 17 Mar 2025 11:55:12 GMT Subject: RFR: 8352147: G1: TestEagerReclaimHumongousRegionsClearMarkBits test takes very long Message-ID: Hi all, please review this refactor of the TestEagerReclaimHumongousRegionsClearMarkBits test that runs way too long and sometimes even causes timeouts in GHA. So the problem is that TestEagerReclaimHumongousRegionsClearMarkBits checks whether after eager reclaim during marking the mark on the humongous object is cleared correctly. It does so with a trial-and-error approach, allocating humongous objects and hoping that the faulty state somehow occurs. This can take a long time, and although the test limits itself to 50s runtime, for some reason there can still be sporadic timeouts in some setups. However, for a long time it has been possible to halt concurrent mark just before completion, inducing the exact state needed for this test. So the test is rewritten to be more targeted. Testing: gha, running test locally Thanks, Thomas ------------- Commit messages: - 8352147 Changes: https://git.openjdk.org/jdk/pull/24077/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24077&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352147 Stats: 93 lines in 1 file changed: 9 ins; 55 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/24077.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24077/head:pull/24077 PR: https://git.openjdk.org/jdk/pull/24077 From iwalulya at openjdk.org Mon Mar 17 12:40:48 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 17 Mar 2025 12:40:48 GMT Subject: RFR: 8351405: G1: Collection set early pruning causes suboptimal region selection Message-ID: Hi all, Please review this change which aims to reduce spikes in mixed GCs, especially the last mixed-gc in a mixed cycle. Currently, G1 sorts regions identified for collection by reclaimable bytes, then prunes the list, removing the regions with the lowest amount of reclaimable bytes. The pruned list is then split into collection groups which are later sorted on gc-efficiency. In the cachestress benchmark, we run into a case where some regions contain only a few live objects but have many incoming references from other regions. These regions are very expensive to collect (low gc-efficiency). This patch improves the pruning process by tracking incoming references to regions during marking. Instead of pruning based on reclaimable bytes alone, we estimate GC efficiency beforehand and prune regions with the worst GC efficiency. This reduces the spikes in gc pause time as shown for the cachestress benchmark in the image below. ![mixed-gc](https://github.com/user-attachments/assets/740fb51d-eb20-4946-bf90-4eef23afe2e4) Testing: Tier 1-3.
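To make the selection change above concrete, here is a small standalone C++ sketch that prunes candidate regions by an estimated GC efficiency (reclaimable bytes divided by a predicted collection time that grows with the number of incoming references) instead of by reclaimable bytes alone. The cost model, constants and names are made up for the example and do not reflect the actual G1 predictors.

#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

// Per-candidate-region data as it might look after marking; illustrative only.
struct Candidate {
  unsigned region;
  size_t reclaimable_bytes;   // garbage freed by collecting this region
  size_t incoming_refs;       // approximate incoming references seen during marking
};

// Toy cost model: remembered set merging/scanning dominates for regions with
// many incoming references. The constants are invented.
static double predicted_time_ms(const Candidate& c) {
  const double cost_per_ref_ms = 0.0005;
  const double base_copy_ms    = 0.2;
  return base_copy_ms + cost_per_ref_ms * double(c.incoming_refs);
}

static double gc_efficiency(const Candidate& c) {
  return double(c.reclaimable_bytes) / predicted_time_ms(c);
}

// Keep the most efficient candidates; prune the ones with the worst estimated
// efficiency rather than the ones with the least reclaimable bytes.
static std::vector<Candidate> prune_worst(std::vector<Candidate> cs, size_t keep) {
  std::sort(cs.begin(), cs.end(), [](const Candidate& a, const Candidate& b) {
    return gc_efficiency(a) > gc_efficiency(b);   // best efficiency first
  });
  if (cs.size() > keep) cs.resize(keep);
  return cs;
}

int main() {
  std::vector<Candidate> cs = {
    {1, size_t(6) << 20, 200},      // much garbage, few incoming references
    {2, size_t(7) << 20, 500000},   // slightly more garbage, but very expensive
    {3, size_t(2) << 20, 100},
  };
  for (const Candidate& c : prune_worst(cs, 2)) {
    printf("keep region %u (efficiency %.0f bytes/ms)\n", c.region, gc_efficiency(c));
  }
  return 0;
}

With these invented numbers the region holding the most reclaimable bytes but half a million incoming references is the one that gets pruned, which is exactly the kind of region the message above describes as spiking the last mixed GC.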
------------- Commit messages: - remove double prune - save - revise region selection - save - init Changes: https://git.openjdk.org/jdk/pull/24076/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24076&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351405 Stats: 79 lines in 11 files changed: 66 ins; 2 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/24076.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24076/head:pull/24076 PR: https://git.openjdk.org/jdk/pull/24076 From tschatzl at openjdk.org Mon Mar 17 13:27:13 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 17 Mar 2025 13:27:13 GMT Subject: RFR: 8352147: G1: TestEagerReclaimHumongousRegionsClearMarkBits test takes very long [v2] In-Reply-To: References: Message-ID: > Hi all, > > please review this refactor of the TestEagerReclaimHumongousRegionsClearMarkBits test that runs way too long, and sometimes even causing timeouts in GHA. > > So the problem is that TestEagerReclaimHumongousRegionsClearMarkBits checks whether after eager reclaim during marking the mark on the humongous object is cleared correctly. It does so with a trial-and-error approach allocating humongous objects and hoping that the faulty state somehow occurs. > > This can take a long time, and although the test limits itself to 50s runtime, for some reason there can still be sporadic timeouts in some setups. > > However for a long time it is possible to halt concurrent mark just before completion, inducing the exact state needed for this test. So rewrite the test to be more targeted. > > Testing: gha, running test locally > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * also check for actual region reclamation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24077/files - new: https://git.openjdk.org/jdk/pull/24077/files/8e834051..240f01f4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24077&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24077&range=00-01 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24077.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24077/head:pull/24077 PR: https://git.openjdk.org/jdk/pull/24077 From tschatzl at openjdk.org Mon Mar 17 13:51:33 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 17 Mar 2025 13:51:33 GMT Subject: RFR: 8352147: G1: TestEagerReclaimHumongousRegionsClearMarkBits test takes very long [v3] In-Reply-To: References: Message-ID: <7vkgivvmXeRKTBOYP33bS2LgBJMgcz-WvZQsC9oB9zU=.f1eb1fcd-f2cd-4455-be98-198226b13e02@github.com> > Hi all, > > please review this refactor of the TestEagerReclaimHumongousRegionsClearMarkBits test that runs way too long, and sometimes even causing timeouts in GHA. > > So the problem is that TestEagerReclaimHumongousRegionsClearMarkBits checks whether after eager reclaim during marking the mark on the humongous object is cleared correctly. It does so with a trial-and-error approach allocating humongous objects and hoping that the faulty state somehow occurs. > > This can take a long time, and although the test limits itself to 50s runtime, for some reason there can still be sporadic timeouts in some setups. > > However for a long time it is possible to halt concurrent mark just before completion, inducing the exact state needed for this test. So rewrite the test to be more targeted. 
> > Testing: gha, running test locally > > Thanks, > Thomas Thomas Schatzl has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8352147 Hi all, please review this refactor of the TestEagerReclaimHumongousRegionsClearMarkBits test that runs way too long, and sometimes even causing timeouts in GHA. So the problem is that TestEagerReclaimHumongousRegionsClearMarkBits checks whether after eager reclaim during marking the mark on the humongous object is cleared correctly. It does so with a trial-and-error approach allocating humongous objects and hoping that the faulty state somehow occurs. This can take a long time, and although the test limits itself to 50s runtime, for some reason there can still be sporadic timeouts in some setups. However for a long time it is possible to halt concurrent mark just before completion, inducing the exact state needed for this test. So rewrite the test to be more targeted. Testing: gha, running test locally Thanks, Thomas * also check for actual region reclamation * last minute typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24077/files - new: https://git.openjdk.org/jdk/pull/24077/files/240f01f4..86aa8495 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24077&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24077&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24077.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24077/head:pull/24077 PR: https://git.openjdk.org/jdk/pull/24077 From kirk at kodewerk.com Mon Mar 17 15:26:18 2025 From: kirk at kodewerk.com (Kirk Pepperdine) Date: Mon, 17 Mar 2025 08:26:18 -0700 Subject: RFR: 8351405: G1: Collection set early pruning causes suboptimal region selection In-Reply-To: References: Message-ID: Hi Ivan, Not a review but this change looks good. I?ve also noted that the last mixed collection in the set always takes longer than the previous mixed. Does this patch address that spike? Kind regards, Kirk > On Mar 17, 2025, at 5:40?AM, Ivan Walulya wrote: > > Hi all, > > Please review this change which aims to reduce spikes in mixed GCs, especially the last mixed-gc in a mixed cycle. Currently, G1 sorts regions identified for collection by reclaimable bytes, then prunes the list removing regions that with the lowest amount of reclaimable bytes. The pruned list is then split into collection groups which are later sorted on gc-efficiency. > > In the cachestress benchmark, we run into a case where some regions contain onlya few live objects but having many incoming references from other regions. These regions very expensive collect (low gc-efficiency). > > This patch improves the pruning process by tracking incoming references to regions during marking. Instead of pruning based on reclaimable bytes alone, we estimate GC efficiency beforehand and prune regions with the worst GC efficiency. > > This reduces the spikes in gc pause time as shown for cachestress benchmark in the image below. > > ![mixed-gc](https://github.com/user-attachments/assets/740fb51d-eb20-4946-bf90-4eef23afe2e4) > > > Testing: Tier 1-3. 
> > ------------- > > Commit messages: > - remove double prune > - save > - revise region selection > - save > - init > > Changes: https://git.openjdk.org/jdk/pull/24076/files > Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24076&range=00 > Issue: https://bugs.openjdk.org/browse/JDK-8351405 > Stats: 79 lines in 11 files changed: 66 ins; 2 del; 11 mod > Patch: https://git.openjdk.org/jdk/pull/24076.diff > Fetch: git fetch https://git.openjdk.org/jdk.git pull/24076/head:pull/24076 > > PR: https://git.openjdk.org/jdk/pull/24076 From iwalulya at openjdk.org Mon Mar 17 15:34:53 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 17 Mar 2025 15:34:53 GMT Subject: RFR: 8351405: G1: Collection set early pruning causes suboptimal region selection In-Reply-To: References: Message-ID: <99NA624UECmZMmIUlKTtL42R6gq5tFRMA9JnKeMMBns=.3a508c27-79bc-4a23-8ca7-da47e3341530@github.com> On Mon, 17 Mar 2025 11:19:02 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change which aims to reduce spikes in mixed GCs, especially the last mixed-gc in a mixed cycle. Currently, G1 sorts regions identified for collection by reclaimable bytes, then prunes the list removing regions that with the lowest amount of reclaimable bytes. The pruned list is then split into collection groups which are later sorted on gc-efficiency. > > In the cachestress benchmark, we run into a case where some regions contain onlya few live objects but having many incoming references from other regions. These regions very expensive collect (low gc-efficiency). > > This patch improves the pruning process by tracking incoming references to regions during marking. Instead of pruning based on reclaimable bytes alone, we estimate GC efficiency beforehand and prune regions with the worst GC efficiency. > > This reduces the spikes in gc pause time as shown for cachestress benchmark in the image below. > > ![mixed-gc](https://github.com/user-attachments/assets/740fb51d-eb20-4946-bf90-4eef23afe2e4) > > > Testing: Tier 1-3. > _Mailing list message from [Kirk Pepperdine](mailto:kirk at kodewerk.com) on [hotspot-gc-dev](mailto:hotspot-gc-dev at mail.openjdk.org):_ > > Hi Ivan, > > Not a review but this change looks good. I?ve also noted that the last mixed collection in the set always takes longer than the previous mixed. Does this patch address that spike? > > Kind regards, Kirk Thanks for taking a look. Yes this patch is meant to address the spikey "last mixed collection", at least reduce the spikes if not eliminating them. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24076#issuecomment-2729967241 From tschatzl at openjdk.org Mon Mar 17 15:47:34 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 17 Mar 2025 15:47:34 GMT Subject: RFR: 8351921: G1: Pinned regions with pinned objects only reachable by native code crash VM Message-ID: Hi all, please review this fix for a crash in G1 where it tries to reclaim a pinned region that does not have any reference visible to the VM any more and there are no other reachable pinnable objects in the same region. This can happen e.g. when JNI code is the only holder for a reference. This has been reported this in a real application, but the attached test case (that uses WhiteBox to simulate a lone pinnable object in a region where the GC does not have a reference any more) shows the principle as well. 
The solution involves unconditionally adding pinned regions in the collection set to the set of evacuation failed regions, instead of only doing that when G1 first encounters a reachable pinnable object in that pinned region. Testing: gha, tier1-5 Thanks, Thomas ------------- Commit messages: - * fix copyright date - * remove debug code - * keep regular evacuation failure working... - * fix - * crashing test case Changes: https://git.openjdk.org/jdk/pull/24060/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24060&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351921 Stats: 124 lines in 4 files changed: 109 ins; 7 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24060.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24060/head:pull/24060 PR: https://git.openjdk.org/jdk/pull/24060 From tschatzl at openjdk.org Mon Mar 17 15:48:14 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 17 Mar 2025 15:48:14 GMT Subject: RFR: 8352147: G1: TestEagerReclaimHumongousRegionsClearMarkBits test takes very long [v4] In-Reply-To: References: Message-ID: > Hi all, > > please review this refactor of the TestEagerReclaimHumongousRegionsClearMarkBits test that runs way too long, and sometimes even causing timeouts in GHA. > > So the problem is that TestEagerReclaimHumongousRegionsClearMarkBits checks whether after eager reclaim during marking the mark on the humongous object is cleared correctly. It does so with a trial-and-error approach allocating humongous objects and hoping that the faulty state somehow occurs. > > This can take a long time, and although the test limits itself to 50s runtime, for some reason there can still be sporadic timeouts in some setups. > > However for a long time it is possible to halt concurrent mark just before completion, inducing the exact state needed for this test. So rewrite the test to be more targeted. > > Testing: gha, running test locally > > Thanks, > Thomas Thomas Schatzl has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8352147 Hi all, please review this refactor of the TestEagerReclaimHumongousRegionsClearMarkBits test that runs way too long, and sometimes even causing timeouts in GHA. So the problem is that TestEagerReclaimHumongousRegionsClearMarkBits checks whether after eager reclaim during marking the mark on the humongous object is cleared correctly. It does so with a trial-and-error approach allocating humongous objects and hoping that the faulty state somehow occurs. This can take a long time, and although the test limits itself to 50s runtime, for some reason there can still be sporadic timeouts in some setups. However for a long time it is possible to halt concurrent mark just before completion, inducing the exact state needed for this test. So rewrite the test to be more targeted. 
Testing: gha, running test locally Thanks, Thomas * also check for actual region reclamation * last minute typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24077/files - new: https://git.openjdk.org/jdk/pull/24077/files/86aa8495..8f3f5ae1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24077&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24077&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24077.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24077/head:pull/24077 PR: https://git.openjdk.org/jdk/pull/24077 From tschatzl at openjdk.org Mon Mar 17 16:33:39 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 17 Mar 2025 16:33:39 GMT Subject: RFR: 8351921: G1: Pinned regions with pinned objects only reachable by native code crash VM [v2] In-Reply-To: References: Message-ID: > Hi all, > > please review this fix for a crash in G1 where it tries to reclaim a pinned region that does not have any reference visible to the VM any more and there are no other reachable pinnable objects in the same region. This can happen e.g. when JNI code is the only holder for a reference. > > This has been reported this in a real application, but the attached test case (that uses WhiteBox to simulate a lone pinnable object in a region where the GC does not have a reference any more) shows the principle as well. > > The solution involves unconditionally adding pinned regions in the collection set to the set of evacuation failed regions, instead of only doing that when G1 first encounters a reachable pinnable object in that pinned region. > > Testing: gha, tier1-5 > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: - * iwalulya review * typos - Update test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedEvacEmpty.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24060/files - new: https://git.openjdk.org/jdk/pull/24060/files/d0848bad..bfd96710 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24060&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24060&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24060.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24060/head:pull/24060 PR: https://git.openjdk.org/jdk/pull/24060 From iwalulya at openjdk.org Mon Mar 17 16:33:40 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 17 Mar 2025 16:33:40 GMT Subject: RFR: 8351921: G1: Pinned regions with pinned objects only reachable by native code crash VM [v2] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 16:28:57 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this fix for a crash in G1 where it tries to reclaim a pinned region that does not have any reference visible to the VM any more and there are no other reachable pinnable objects in the same region. This can happen e.g. when JNI code is the only holder for a reference. >> >> This has been reported this in a real application, but the attached test case (that uses WhiteBox to simulate a lone pinnable object in a region where the GC does not have a reference any more) shows the principle as well. >> >> The solution involves unconditionally adding pinned regions in the collection set to the set of evacuation failed regions, instead of only doing that when G1 first encounters a reachable pinnable object in that pinned region. 
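As a toy illustration of the behavioral change described above (made-up types, not the actual G1 code), the difference is whether a pinned region in the collection set is treated as evacuation-failed unconditionally, or only once a live object in it happens to be visited:

#include <cstdio>
#include <vector>

// Made-up region descriptor for illustration only.
struct Region {
  unsigned index;
  bool pinned;           // contains objects pinned e.g. through a JNI critical section
  bool has_visible_live; // whether evacuation will actually visit a live object in it
};

int main() {
  std::vector<Region> collection_set = {
    {1, false, true},
    {2, true,  true},
    {3, true,  false},   // pinned, but the only reference is held by native code
  };

  std::vector<unsigned> evac_failed;
  for (const Region& r : collection_set) {
    // Old behavior: a pinned region was only retained once a reachable pinnable
    // object in it was encountered (roughly "r.pinned && r.has_visible_live"),
    // so region 3 could be reclaimed while native code still pointed into it.
    // New behavior sketched here: every pinned region in the collection set is
    // treated as evacuation-failed up front and therefore kept in place.
    if (r.pinned) {
      evac_failed.push_back(r.index);
    }
  }

  for (unsigned idx : evac_failed) {
    printf("region %u is kept in place (pinned, evacuation failed)\n", idx);
  }
  return 0;
}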
>> >> Testing: gha, tier1-5 >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: > > - * iwalulya review > * typos > - Update test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedEvacEmpty.java Looks good! Nits: test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedEvacEmpty.java line 25: > 23: > 24: /* @test > 25: * @summary Test that pinned objects we lost all Java references to keep "to keep do not make" possibily a typo, probably the "to keep" should be removed. test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedEvacEmpty.java line 29: > 27: * This test simulates this behavior using Whitebox/Unsafe methods > 28: * to pin a Java object in a region with no other pinnable objects and > 29: * loose the reference to it before the garbage collection. s/loose/lose ------------- Marked as reviewed by iwalulya (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24060#pullrequestreview-2691207022 PR Review Comment: https://git.openjdk.org/jdk/pull/24060#discussion_r1999136762 PR Review Comment: https://git.openjdk.org/jdk/pull/24060#discussion_r1999133889 From tschatzl at openjdk.org Mon Mar 17 16:33:41 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 17 Mar 2025 16:33:41 GMT Subject: RFR: 8351921: G1: Pinned regions with pinned objects only reachable by native code crash VM [v2] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 16:28:57 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this fix for a crash in G1 where it tries to reclaim a pinned region that does not have any reference visible to the VM any more and there are no other reachable pinnable objects in the same region. This can happen e.g. when JNI code is the only holder for a reference. >> >> This has been reported this in a real application, but the attached test case (that uses WhiteBox to simulate a lone pinnable object in a region where the GC does not have a reference any more) shows the principle as well. >> >> The solution involves unconditionally adding pinned regions in the collection set to the set of evacuation failed regions, instead of only doing that when G1 first encounters a reachable pinnable object in that pinned region. >> >> Testing: gha, tier1-5 >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: > > - * iwalulya review > * typos > - Update test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedEvacEmpty.java test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedEvacEmpty.java line 61: > 59: > 60: public static void main(String[] args) throws Exception { > 61: System.out.println("foobar"); Suggestion: Unnecessary debug code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24060#discussion_r1999136043 From iwalulya at openjdk.org Mon Mar 17 16:36:35 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 17 Mar 2025 16:36:35 GMT Subject: RFR: 8352147: G1: TestEagerReclaimHumongousRegionsClearMarkBits test takes very long [v4] In-Reply-To: References: Message-ID: <4E86maQ-caPooTGiXJRzXxhZy5AKANQKqNZYhqCuP8Y=.4c2014bb-e230-4d80-b574-d6ac03152f7c@github.com> On Mon, 17 Mar 2025 15:48:14 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this refactor of the TestEagerReclaimHumongousRegionsClearMarkBits test that runs way too long, and sometimes even causing timeouts in GHA. 
>> >> So the problem is that TestEagerReclaimHumongousRegionsClearMarkBits checks whether after eager reclaim during marking the mark on the humongous object is cleared correctly. It does so with a trial-and-error approach allocating humongous objects and hoping that the faulty state somehow occurs. >> >> This can take a long time, and although the test limits itself to 50s runtime, for some reason there can still be sporadic timeouts in some setups. >> >> However for a long time it is possible to halt concurrent mark just before completion, inducing the exact state needed for this test. So rewrite the test to be more targeted. >> >> Testing: gha, running test locally >> >> Thanks, >> Thomas > > Thomas Schatzl has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8352147 > > Hi all, > > please review this refactor of the TestEagerReclaimHumongousRegionsClearMarkBits test that runs way too long, and sometimes even causing timeouts in GHA. > > So the problem is that TestEagerReclaimHumongousRegionsClearMarkBits checks whether after eager reclaim during marking the mark on the humongous object is cleared correctly. It does so with a trial-and-error approach allocating humongous objects and hoping that the faulty state somehow occurs. > > This can take a long time, and although the test limits itself to 50s runtime, for some reason there can still be sporadic timeouts in some setups. > > However for a long time it is possible to halt concurrent mark just before completion, inducing the exact state needed for this test. So rewrite the test to be more targeted. > > Testing: gha, running test locally > > Thanks, > Thomas > > * also check for actual region reclamation > * last minute typo LGTM! ------------- Marked as reviewed by iwalulya (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24077#pullrequestreview-2691281215 From ayang at openjdk.org Mon Mar 17 19:53:08 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 17 Mar 2025 19:53:08 GMT Subject: RFR: 8351921: G1: Pinned regions with pinned objects only reachable by native code crash VM [v2] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 16:33:39 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this fix for a crash in G1 where it tries to reclaim a pinned region that does not have any reference visible to the VM any more and there are no other reachable pinnable objects in the same region. This can happen e.g. when JNI code is the only holder for a reference. >> >> This has been reported this in a real application, but the attached test case (that uses WhiteBox to simulate a lone pinnable object in a region where the GC does not have a reference any more) shows the principle as well. >> >> The solution involves unconditionally adding pinned regions in the collection set to the set of evacuation failed regions, instead of only doing that when G1 first encounters a reachable pinnable object in that pinned region. >> >> Testing: gha, tier1-5 >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: > > - * iwalulya review > * typos > - Update test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedEvacEmpty.java Marked as reviewed by ayang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24060#pullrequestreview-2691818928 From wkemper at openjdk.org Mon Mar 17 21:43:22 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 17 Mar 2025 21:43:22 GMT Subject: RFR: 8352181: Shenandoah: Evacuate thread roots after early cleanup Message-ID: <99wc8_4LoODnc8E0fwS3VV3NTfdPJ3soau-_jaiLrGU=.ef48e18a-03f2-4863-b610-513b52e539a5@github.com> Moving the evacuation of thread roots after early cleanup allows Shenandoah to recycle immediate garbage a bit sooner in the cycle. ------------- Commit messages: - Merge remote-tracking branch 'jdk/master' into investigate-root-evacuation - What happens if we evacuate roots after weak roots and class unloading? Changes: https://git.openjdk.org/jdk/pull/24090/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24090&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352181 Stats: 8 lines in 1 file changed: 3 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24090.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24090/head:pull/24090 PR: https://git.openjdk.org/jdk/pull/24090 From wkemper at openjdk.org Mon Mar 17 21:43:22 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 17 Mar 2025 21:43:22 GMT Subject: RFR: 8352181: Shenandoah: Evacuate thread roots after early cleanup In-Reply-To: <99wc8_4LoODnc8E0fwS3VV3NTfdPJ3soau-_jaiLrGU=.ef48e18a-03f2-4863-b610-513b52e539a5@github.com> References: <99wc8_4LoODnc8E0fwS3VV3NTfdPJ3soau-_jaiLrGU=.ef48e18a-03f2-4863-b610-513b52e539a5@github.com> Message-ID: On Mon, 17 Mar 2025 21:37:14 GMT, William Kemper wrote: > Moving the evacuation of thread roots after early cleanup allows Shenandoah to recycle immediate garbage a bit sooner in the cycle. @rkennke , this is a small change that allows immediate garbage to be recycled sooner. Wasn't sure if there was a specific reason to have thread roots evacuated before weak refs/roots and class unloading. Testing didn't show any problems. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24090#issuecomment-2730994869 From Monica.Beckwith at microsoft.com Mon Mar 17 23:59:43 2025 From: Monica.Beckwith at microsoft.com (Monica Beckwith) Date: Mon, 17 Mar 2025 23:59:43 +0000 Subject: [EXTERNAL] Re: RFC: G1 as default collector (for real this time) In-Reply-To: References: <74d05686-9c57-4262-881d-31c269f34bc5@oracle.com> <61CEE33A-6718-479D-A498-697C1063B5AA@oracle.com> Message-ID: Hi Thomas, Erik, and all, This is an important and timely discussion, and I appreciate the insights on how the gap between SerialGC and G1GC has diminished over time. Based on recent comparative tests of out-of-the-box GC configurations (-Xmx only), I wanted to share some data-backed observations that might help validate this shift. I tested G1GC and SerialGC under 1-core/2GB and 2-core/2GB containerized environments (512MB < -Xmx <1.5GB), running SPECJBB2015 with and without stress tests. The key findings: Throughput (max_jOPS & critical_jOPS): * G1GC consistently outperforms SerialGC. * 1 core: G1GC shows a 1.78? increase in max_jOPS. * 2 cores: G1GC shows a 2.84? improvement over SerialGC. Latency and Stop-the-World (STW) Impact: * SerialGC struggles under stress, with frequent full GCs leading to long pauses. * G1GC?s incremental collections keep pause times lower, especially under stress load. * critical_jOPS, a key SLA metric, is 4.5? higher for G1GC on 2 cores. Memory Behavior & Stability: * In 512MB heap configurations, SerialGC encountered OOM failures due to heap exhaustion. 
Given these results, it seems reasonable to reconsider why SerialGC remains the default in small environments when G1GC offers clear performance and stability advantages. Looking forward to thoughts on this. Best, Monica P.S.: I haven?t tested for <512MB heaps yet, as that requires a different test config I?m still working on. I?d also love to hear from anyone running single-threaded, CPU-bound workloads if they have observations to share. ________________________________ From: hotspot-gc-dev on behalf of Thomas Schatzl Sent: Monday, February 24, 2025 2:33 AM To: Erik Osterlund Cc: hotspot-gc-dev at openjdk.org Subject: [EXTERNAL] Re: RFC: G1 as default collector (for real this time) Hi, On 21.02.25 15:02, Erik Osterlund wrote: > Hi Thomas, > [...]> There is however a flip side for that argument on the other side of the scaling spectrum, where ZGC is probably a better fit on the even larger scale. So while it?s true that the effect of a Serial -> G1 default change is a static default GC, I just think we should mind the fact that there is more uncertainty on the larger end of the scale. I?m not proposing any changes, just saying that maybe we should be careful about stressing the importance of having a static default GC, if we don?t know if that is the better strategy on the larger end of the scale or not, going forward. +1 Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From kirk at kodewerk.com Tue Mar 18 01:03:52 2025 From: kirk at kodewerk.com (Kirk Pepperdine) Date: Mon, 17 Mar 2025 18:03:52 -0700 Subject: [EXTERNAL] RFC: G1 as default collector (for real this time) In-Reply-To: References: <74d05686-9c57-4262-881d-31c269f34bc5@oracle.com> <61CEE33A-6718-479D-A498-697C1063B5AA@oracle.com> Message-ID: <3140F71F-6ACD-41F5-8B5E-20E4E3DD956C@kodewerk.com> Hi Monica, Interesting results. Can you share GC logs for those runs? I?m asking because these results don?t correlate with what I?ve been seeing. That said, I don?t have enough data as of yet to draw any conclusions. Kind regards, Kirk > On Mar 17, 2025, at 4:59?PM, Monica Beckwith wrote: > > Hi Thomas, Erik, and all, > > This is an important and timely discussion, and I appreciate the insights on how the gap between SerialGC and G1GC has diminished over time. Based on recent comparative tests of out-of-the-box GC configurations (-Xmx only), I wanted to share some data-backed observations that might help validate this shift. > > I tested G1GC and SerialGC under 1-core/2GB and 2-core/2GB containerized environments (512MB < -Xmx <1.5GB), running SPECJBB2015 with and without stress tests. The key findings: > > Throughput (max_jOPS & critical_jOPS): > > G1GC consistently outperforms SerialGC. > 1 core: G1GC shows a 1.78? increase in max_jOPS. > 2 cores: G1GC shows a 2.84? improvement over SerialGC. > > Latency and Stop-the-World (STW) Impact: > > SerialGC struggles under stress, with frequent full GCs leading to long pauses. > G1GC?s incremental collections keep pause times lower, especially under stress load. > critical_jOPS, a key SLA metric, is 4.5? higher for G1GC on 2 cores. > > Memory Behavior & Stability: > > In 512MB heap configurations, SerialGC encountered OOM failures due to heap exhaustion. > > Given these results, it seems reasonable to reconsider why SerialGC remains the default in small environments when G1GC offers clear performance and stability advantages. > > Looking forward to thoughts on this. 
> > Best, > Monica > > P.S.: I haven?t tested for <512MB heaps yet, as that requires a different test config I?m still working on. I?d also love to hear from anyone running single-threaded, CPU-bound workloads if they have observations to share. > > > From: hotspot-gc-dev on behalf of Thomas Schatzl > Sent: Monday, February 24, 2025 2:33 AM > To: Erik Osterlund > Cc: hotspot-gc-dev at openjdk.org > Subject: [EXTERNAL] Re: RFC: G1 as default collector (for real this time) > > Hi, > > On 21.02.25 15:02, Erik Osterlund wrote: > > Hi Thomas, > > > [...]> There is however a flip side for that argument on the other side > of the scaling spectrum, where ZGC is probably a better fit on the even > larger scale. So while it?s true that the effect of a Serial -> G1 > default change is a static default GC, I just think we should mind the > fact that there is more uncertainty on the larger end of the scale. I?m > not proposing any changes, just saying that maybe we should be careful > about stressing the importance of having a static default GC, if we > don?t know if that is the better strategy on the larger end of the scale > or not, going forward. > > +1 > > Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From shade at openjdk.org Tue Mar 18 09:06:14 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Mar 2025 09:06:14 GMT Subject: RFR: 8352181: Shenandoah: Evacuate thread roots after early cleanup In-Reply-To: <99wc8_4LoODnc8E0fwS3VV3NTfdPJ3soau-_jaiLrGU=.ef48e18a-03f2-4863-b610-513b52e539a5@github.com> References: <99wc8_4LoODnc8E0fwS3VV3NTfdPJ3soau-_jaiLrGU=.ef48e18a-03f2-4863-b610-513b52e539a5@github.com> Message-ID: On Mon, 17 Mar 2025 21:37:14 GMT, William Kemper wrote: > Moving the evacuation of thread roots after early cleanup allows Shenandoah to recycle immediate garbage a bit sooner in the cycle. I believe the reason we do thread roots earlier is to do the bulk of the stack processing before mutator sees it. If mutator does it by itself, it will go through armed nmethod barriers, which might be introducing extra latency. So we need to think if the benefit of doing the immediate cleanup earlier is worth accepting more active nmethod barriers in mutator. ------------- PR Review: https://git.openjdk.org/jdk/pull/24090#pullrequestreview-2693567054 From tschatzl at openjdk.org Tue Mar 18 09:31:22 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 18 Mar 2025 09:31:22 GMT Subject: RFR: 8352138: G1: Remove G1AddMetaspaceDependency.java test In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 10:07:54 GMT, Aleksey Shipilev wrote: >> Hi all, >> >> please review trivial removal of this test because the reason for its >> existence, the `DirtyCardQ_CBL_mon` lock has been removed in JDK-8237143 long time ago. >> >> Testing: gha >> >> Thanks, >> Thomas > > Marked as reviewed by shade (Reviewer). Thanks @shipilev @albertnetymk for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/24075#issuecomment-2732280170 From tschatzl at openjdk.org Tue Mar 18 09:31:23 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 18 Mar 2025 09:31:23 GMT Subject: Integrated: 8352138: G1: Remove G1AddMetaspaceDependency.java test In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 09:23:09 GMT, Thomas Schatzl wrote: > Hi all, > > please review trivial removal of this test because the reason for its > existence, the `DirtyCardQ_CBL_mon` lock has been removed in JDK-8237143 long time ago. 
> > Testing: gha > > Thanks, > Thomas This pull request has now been integrated. Changeset: f8c2122b Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/f8c2122b306df72883673f4af9b297b553db247f Stats: 125 lines in 1 file changed: 0 ins; 125 del; 0 mod 8352138: G1: Remove G1AddMetaspaceDependency.java test Reviewed-by: ayang, shade ------------- PR: https://git.openjdk.org/jdk/pull/24075 From tschatzl at openjdk.org Tue Mar 18 09:32:19 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 18 Mar 2025 09:32:19 GMT Subject: RFR: 8351921: G1: Pinned regions with pinned objects only reachable by native code crash VM [v2] In-Reply-To: References: Message-ID: <1a4FMNFxCVL4QJ1y4so11m_g02IEAJ39-YcJlzEW_Is=.770549c1-b9d2-4b21-af15-07e1736e6546@github.com> On Mon, 17 Mar 2025 19:50:42 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with two additional commits since the last revision: >> >> - * iwalulya review >> * typos >> - Update test/hotspot/jtreg/gc/g1/pinnedobjs/TestPinnedEvacEmpty.java > > Marked as reviewed by ayang (Reviewer). Thanks @albertnetymk @walulyai for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/24060#issuecomment-2732283693 From tschatzl at openjdk.org Tue Mar 18 09:32:20 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 18 Mar 2025 09:32:20 GMT Subject: Integrated: 8351921: G1: Pinned regions with pinned objects only reachable by native code crash VM In-Reply-To: References: Message-ID: <0gU4U1_rih8tuuzWLxRaWTS-kMSYqWawzrZgzu31LAI=.644453ac-af07-4679-928c-6439a878c4df@github.com> On Fri, 14 Mar 2025 13:56:06 GMT, Thomas Schatzl wrote: > Hi all, > > please review this fix for a crash in G1 where it tries to reclaim a pinned region that does not have any reference visible to the VM any more and there are no other reachable pinnable objects in the same region. This can happen e.g. when JNI code is the only holder for a reference. > > This has been reported this in a real application, but the attached test case (that uses WhiteBox to simulate a lone pinnable object in a region where the GC does not have a reference any more) shows the principle as well. > > The solution involves unconditionally adding pinned regions in the collection set to the set of evacuation failed regions, instead of only doing that when G1 first encounters a reachable pinnable object in that pinned region. > > Testing: gha, tier1-5 > > Thanks, > Thomas This pull request has now been integrated. Changeset: 558c015c Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/558c015c995dbe65d876c1c5761030588773271c Stats: 123 lines in 4 files changed: 108 ins; 7 del; 8 mod 8351921: G1: Pinned regions with pinned objects only reachable by native code crash VM Reviewed-by: ayang, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/24060 From tschatzl at openjdk.org Tue Mar 18 11:23:10 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 18 Mar 2025 11:23:10 GMT Subject: RFR: 8351405: G1: Collection set early pruning causes suboptimal region selection In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 11:19:02 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change which aims to reduce spikes in mixed GCs, especially the last mixed-gc in a mixed cycle. Currently, G1 sorts regions identified for collection by reclaimable bytes, then prunes the list removing regions that with the lowest amount of reclaimable bytes. 
The pruned list is then split into collection groups which are later sorted on gc-efficiency. > > In the cachestress benchmark, we run into a case where some regions contain only a few live objects but have many incoming references from other regions. These regions are very expensive to collect (low gc-efficiency). > > This patch improves the pruning process by tracking incoming references to regions during marking. Instead of pruning based on reclaimable bytes alone, we estimate GC efficiency beforehand and prune regions with the worst GC efficiency. > > This reduces the spikes in gc pause time as shown for the cachestress benchmark in the image below. > > ![mixed-gc](https://github.com/user-attachments/assets/740fb51d-eb20-4946-bf90-4eef23afe2e4) > > > Testing: Tier 1-3. Looks good to me, thanks for removing the double-pruning from the initial prototype! Did you ever try to get statistics about differences in marking length and changes to the cache hits in the mark stats cache? (Just curious) src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 137: > 135: } > 136: > 137: int G1CollectionSetCandidateInfo::compare_gc_efficiency(G1CollectionSetCandidateInfo* ci1, G1CollectionSetCandidateInfo* ci2) { Maybe make this name a bit more distinct than above `compare_gc_efficiency`, i.e. `compare_region_gc_efficiency` vs. `group_efficiency`. Maybe however the different parameters are fine to distinguish them. src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 1291: > 1289: } > 1290: } else if (hr->is_old()) { > 1291: hr->note_end_of_marking(_cm->top_at_mark_start(hr), _cm->live_bytes(hr->hrm_index()), _cm->incoming_refs(hr->hrm_index())); Potentially put `hr->hrm_index()` into a local so that the line is not that long any more; or just change the parameters to take a `HeapRegion*` for both - afaict they both are always called with `hr->hrm_index()`... src/hotspot/share/gc/g1/g1ConcurrentMark.hpp line 566: > 564: void set_live_bytes(uint region, size_t live_bytes) { _region_mark_stats[region]._live_words = live_bytes / HeapWordSize; } > 565: > 566: size_t incoming_refs(uint region) const { return _region_mark_stats[region]._incoming_refs; } Suggestion: // Approximate number of incoming references found during marking. size_t incoming_refs(uint region) const { return _region_mark_stats[region]._incoming_refs; } src/hotspot/share/gc/g1/g1HeapRegion.cpp line 346: > 344: G1Policy* p = G1CollectedHeap::heap()->policy(); > 345: > 346: double merge_scan_time_ms = p->predict_merge_scan_time(_incoming_refs); // We use _incoming_refs as an estimate for remset cards Suggestion: double merge_scan_time_ms = p->predict_merge_scan_time(_incoming_refs); // We use the number of incoming references as an estimate for remset cards. src/hotspot/share/gc/g1/g1HeapRegion.hpp line 352: > 350: // GC Efficiency for collecting this region based on the time estimate in > 351: // total_based_on_incoming_refs_ms. > 352: double gc_efficiency(); Suggestion: // GC efficiency for collecting this region based on the time estimate in // total_based_on_incoming_refs_ms. double gc_efficiency(); src/hotspot/share/gc/g1/g1HeapRegion.hpp line 369: > 367: > 368: // Notify the region that concurrent marking has finished. Passes TAMS and the number of > 369: // bytes marked between bottom and TAMS. Suggestion: // Notify the region that concurrent marking has finished. Passes TAMS, the number of // bytes marked between bottom and TAMS and the estimate for incoming references. 
src/hotspot/share/gc/g1/g1RegionMarkStatsCache.hpp line 42: > 40: struct G1RegionMarkStats { > 41: size_t _live_words; > 42: size_t _incoming_refs; The comment above needs update. ------------- Changes requested by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24076#pullrequestreview-2694081010 PR Review Comment: https://git.openjdk.org/jdk/pull/24076#discussion_r2000803063 PR Review Comment: https://git.openjdk.org/jdk/pull/24076#discussion_r2000808520 PR Review Comment: https://git.openjdk.org/jdk/pull/24076#discussion_r2000810079 PR Review Comment: https://git.openjdk.org/jdk/pull/24076#discussion_r2000811646 PR Review Comment: https://git.openjdk.org/jdk/pull/24076#discussion_r2000812718 PR Review Comment: https://git.openjdk.org/jdk/pull/24076#discussion_r2000814177 PR Review Comment: https://git.openjdk.org/jdk/pull/24076#discussion_r2000815004 From tschatzl at openjdk.org Tue Mar 18 16:24:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 18 Mar 2025 16:24:56 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v24] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). 
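For contrast with the barrier quoted above, the post-write barrier that Parallel and Serial GC emit amounts to a single unconditional card store. The following is a standalone C++ sketch of that shape only, not the actual HotSpot code; the card size, the dirty encoding, and the unbiased table base are assumptions made for illustration.

#include <cstdint>

// Assumed geometry: 512-byte cards, dirty encoded as 0, and a card table base
// pointer set up once at heap initialization (the real base is biased by the
// heap start address).
static const int     kCardShift = 9;
static const uint8_t kDirtyCard = 0;
static uint8_t*      card_table_base;

// The whole Parallel/Serial-style post-write barrier for a store x.a = y:
// compute the card covering the updated field and mark it dirty. There is no
// filtering, no StoreLoad fence, and no queueing of card addresses.
inline void post_write_barrier(const void* field_addr) {
  card_table_base[reinterpret_cast<uintptr_t>(field_addr) >> kCardShift] = kDirtyCard;
}

Compiled down, this is the handful of instructions that the instruction-count comparison above refers to, and it is the shape the proposed G1 barrier aims to come closer to.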
> > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: - * factor out card table and refinement table merging into a single method - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 - * obsolete G1UpdateBufferSize G1UpdateBufferSize has previously been used to size the refinement buffers and impose a minimum limit on the number of cards per thread that need to be pending before refinement starts. The former function is now obsolete with the removal of the dirty card queues, the latter functionality has been taken over by the new diagnostic option `G1PerThreadPendingCardThreshold`. I prefer to make this a diagnostic option is better than a product option because it is something that is only necessary for some test cases to produce some otherwise unwanted behavior (continuous refinement). CSR is pending. - * more documentation on why we need to rendezvous the gc threads - Merge branch 'master' into 8342381-card-table-instead-of-dcq - * ayang review * re-add STS leaver for java thread handshake - * when aborting refinement during full collection, the global card table and the per-thread card table might not be in sync. Roll forward during abort of the refinement in these situations. * additional verification * added some missing ResourceMarks in asserts * added variant of ArrayJuggle2 that crashes fairly quickly without these changes - * ayang review * remove unnecessary STSleaver * some more documentation around to_collection_card card color - Merge branch 'master' into 8342382-card-table-instead-of-dcq - * optimized RISCV gen_write_ref_array_post_barrier() implementation contributed by @RealFYang - ... and 22 more: https://git.openjdk.org/jdk/compare/b025d8c2...c833bc83 ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=23 Stats: 6788 lines in 104 files changed: 2382 ins; 3476 del; 930 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From rkennke at openjdk.org Tue Mar 18 18:18:20 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 18 Mar 2025 18:18:20 GMT Subject: RFR: 8350898: Shenandoah: Eliminate final roots safepoint [v5] In-Reply-To: References: Message-ID: <01CgWS5bjZ6prTge9OW7tOkS8g4w1FZ1zIJG1A9_798=.6afb2edc-d075-4f2d-b560-c75c195613d4@github.com> On Thu, 13 Mar 2025 20:43:05 GMT, William Kemper wrote: >> This PR converts the final roots safepoint operation into a handshake. The safepoint operation still exists, but is only executed when `ShenandoahVerify` is enabled. In addition to this change, this PR also improves the logging for the concurrent preparation for update references from [PR 22688](https://github.com/openjdk/jdk/pull/22688). > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Add comment explaining use of _trashed_oops It looks good to me. I only have a small nit, up to you if you want to change that or not. Thank you! 
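As background on the handshake mechanism this change relies on, the general shape of a per-thread handshake operation is sketched below. HandshakeClosure and Handshake::execute are existing HotSpot runtime APIs, but the closure name and the flush helper are placeholders for illustration, not the actual Shenandoah code, and the fragment only makes sense inside the HotSpot source tree.

// Sketch: run a small piece of per-thread work (here, flushing a thread-local
// SATB buffer) via a handshake instead of a global safepoint.
class FlushSATBBuffersClosure : public HandshakeClosure {
public:
  FlushSATBBuffersClosure() : HandshakeClosure("Flush SATB buffers") {}

  void do_thread(Thread* thread) override {
    // Executed once per handshaked thread; only that thread is paused while
    // the rest of the application keeps running.
    flush_satb_buffer_for(thread);   // placeholder helper, not a real API
  }
};

// Usage from a GC thread:
//   FlushSATBBuffersClosure cl;
//   Handshake::execute(&cl);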
src/hotspot/share/gc/shenandoah/shenandoahClosures.hpp line 59: > 57: // > 58: > 59: class ShenandoahFlushSATBHandshakeClosure : public HandshakeClosure { Maybe place the closure somewhere in shenandoahConcurrentGC.cpp, where it is used? Or is there a need to expose it on shenandoahClosures.hpp? ------------- Marked as reviewed by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23830#pullrequestreview-2695514909 PR Review Comment: https://git.openjdk.org/jdk/pull/23830#discussion_r2001568319 From wkemper at openjdk.org Tue Mar 18 20:34:26 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 18 Mar 2025 20:34:26 GMT Subject: RFR: 8350898: Shenandoah: Eliminate final roots safepoint [v5] In-Reply-To: <01CgWS5bjZ6prTge9OW7tOkS8g4w1FZ1zIJG1A9_798=.6afb2edc-d075-4f2d-b560-c75c195613d4@github.com> References: <01CgWS5bjZ6prTge9OW7tOkS8g4w1FZ1zIJG1A9_798=.6afb2edc-d075-4f2d-b560-c75c195613d4@github.com> Message-ID: On Tue, 18 Mar 2025 17:19:56 GMT, Roman Kennke wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comment explaining use of _trashed_oops > > src/hotspot/share/gc/shenandoah/shenandoahClosures.hpp line 59: > >> 57: // >> 58: >> 59: class ShenandoahFlushSATBHandshakeClosure : public HandshakeClosure { > > Maybe place the closure somewhere in shenandoahConcurrentGC.cpp, where it is used? Or is there a need to expose it on shenandoahClosures.hpp? Ah, it is also used in `shenandoahConcurrentMark.cpp`: https://github.com/openjdk/jdk/pull/23830/files#diff-d5228ec0709dbd663da93db4cf13eac3b28015d90d0c4ef206a68b008dc1d429L215 (in fact, this is where I took it from ?). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23830#discussion_r2001949346 From wkemper at openjdk.org Tue Mar 18 21:55:32 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 18 Mar 2025 21:55:32 GMT Subject: RFR: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled Message-ID: The sequence of events that creates this state: 1. An old collection is trying to finish marking by flushing SATB buffers with a Handshake 2. The regulator thread cancels old marking to start a young collection 3. A mutator thread shortly follows and attempts to cancel the nascent young collection 4. Step `3` fails (because of this bug) and cancellation reason does _not_ become `allocation failure` 5. The mutator thread enters a tight loop in which it retries allocations without `waiting` 6. The mutator thread remains in the `thread_in_vm` state and prevents the VM thread from completing step `1`. ------------- Commit messages: - Allow young cycles that interrupt old cycles to be cancelled Changes: https://git.openjdk.org/jdk/pull/24105/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24105&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352299 Stats: 9 lines in 2 files changed: 7 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24105.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24105/head:pull/24105 PR: https://git.openjdk.org/jdk/pull/24105 From kdnilsen at openjdk.org Tue Mar 18 22:18:44 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 18 Mar 2025 22:18:44 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 00:19:35 GMT, Xiaolong Peng wrote: > There are some scenarios in which GenShen may have improper remembered set verification logic: > > 1. 
Concurrent young cycles following a Full GC: > > In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification > > > ShenandoahVerifier > ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { > shenandoah_assert_generations_reconciled(); > if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { > return _heap->complete_marking_context(); > } > return nullptr; > } > > > For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. > > 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. > > 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. > > > ### Test > - [x] `make test TEST=hotspot_gc_shenandoah` src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 648: > 646: assert(!heap->has_forwarded_objects(), "No forwarded objects on this path"); > 647: > 648: if (heap->mode()->is_generational()) { I think we do not want to change this code. We only swap remembered set for young-gen because gen will scan the remset and reconstruct it with more updated information. For a global GC, we do not scan the remset and do not reconstruct it. If we swap here, we will lose the information that is currently within the remset. src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 953: > 951: // pinned regions. > 952: if (!r->is_pinned()) { > 953: _heap->marking_context()->reset_top_at_mark_start(r); Here, and below, I think we want to keep complete_marking_context() rather than changing to marking_context() src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1281: > 1279: ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { > 1280: shenandoah_assert_generations_reconciled(); > 1281: if (_heap->old_generation()->is_mark_complete()) { In the case that this is an global GC, we know that old-generation()->is_mark_complete() by virtue of the current program counter, I assume. (We would only ask for the old marking context if global marking were already finished.) In the case that we are doing a global GC cycle, I'm guessing that we do not set is-mark-complete for the old generation. So that's why I believe you need to keep the condition as originally written. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r1999966128 PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r1999982949 PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r1999975507 From xpeng at openjdk.org Tue Mar 18 22:18:44 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 18 Mar 2025 22:18:44 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification Message-ID: There are some scenarios in which GenShen may have improper remembered set verification logic: 1. 
Concurrent young cycles following a Full GC: In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification ShenandoahVerifier ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { shenandoah_assert_generations_reconciled(); if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { return _heap->complete_marking_context(); } return nullptr; } For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. ### Test - [x] `make test TEST=hotspot_gc_shenandoah` ------------- Commit messages: - Clean and rebuild rem-set in global gc - Set mark incomplete after ShenandoahMCResetCompleteBitmapTask - Only clean rem-set read table in young gc; not verify rem-set in concurrent global GC in generational mode - Always swap card table in generational mode so the table can be properly rebuilt through marking. - Initial works Changes: https://git.openjdk.org/jdk/pull/24092/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352185 Stats: 42 lines in 4 files changed: 17 ins; 16 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24092.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24092/head:pull/24092 PR: https://git.openjdk.org/jdk/pull/24092 From wkemper at openjdk.org Tue Mar 18 22:18:44 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 18 Mar 2025 22:18:44 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 00:19:35 GMT, Xiaolong Peng wrote: > There are some scenarios in which GenShen may have improper remembered set verification logic: > > 1. Concurrent young cycles following a Full GC: > > In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification > > > ShenandoahVerifier > ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { > shenandoah_assert_generations_reconciled(); > if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { > return _heap->complete_marking_context(); > } > return nullptr; > } > > > For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. > > 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. > > 3. 
ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. > > > ### Test > - [x] `make test TEST=hotspot_gc_shenandoah` src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1389: > 1387: shenandoah_assert_safepoint(); > 1388: shenandoah_assert_generational(); > 1389: ShenandoahMarkingContext* ctx = get_marking_context_for_old(); This should always be `nullptr` after a full GC, right? The marking context is no longer valid after compaction. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2001580562 From xpeng at openjdk.org Tue Mar 18 22:18:44 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 18 Mar 2025 22:18:44 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 17:24:54 GMT, William Kemper wrote: >> There are some scenarios in which GenShen may have improper remembered set verification logic: >> >> 1. Concurrent young cycles following a Full GC: >> >> In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification >> >> >> ShenandoahVerifier >> ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { >> shenandoah_assert_generations_reconciled(); >> if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { >> return _heap->complete_marking_context(); >> } >> return nullptr; >> } >> >> >> For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. >> >> 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. >> >> 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. >> >> >> ### Test >> - [x] `make test TEST=hotspot_gc_shenandoah` > > src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1389: > >> 1387: shenandoah_assert_safepoint(); >> 1388: shenandoah_assert_generational(); >> 1389: ShenandoahMarkingContext* ctx = get_marking_context_for_old(); > > This should always be `nullptr` after a full GC, right? The marking context is no longer valid after compaction. Yes, get_marking_context_for_old always return `nullptr` after a full GC, the marking completeness has been set to false when we reset marking bitmaps after full GC. I think the method get_marking_context_for_old and the ctx arg of the helper function can be removed, I'll do that in next update. 
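Pulling this exchange together, the guard in the hunk quoted earlier in the thread reduces to roughly the following shape (shown in isolation for readability; the body beyond the quoted lines is a reconstruction from the original function in the PR description, not new code):

ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() {
  shenandoah_assert_generations_reconciled();
  if (_heap->old_generation()->is_mark_complete()) {
    // Only a completed old mark yields a context that remembered set
    // verification can trust.
    return _heap->complete_marking_context();
  }
  // After a full GC the bitmaps are reset and old marking is flagged
  // incomplete, so callers get nullptr and skip the verification.
  return nullptr;
}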
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2001815702 From xpeng at openjdk.org Tue Mar 18 22:18:44 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 18 Mar 2025 22:18:44 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification In-Reply-To: References: Message-ID: <_GG5htdXFZ2Jv3qTAyG6djSrvXDtGx-jTLGoA2JbEXU=.b8588ac1-e51f-4ddf-afda-c64e6a789440@github.com> On Tue, 18 Mar 2025 01:25:11 GMT, Kelvin Nilsen wrote: >> There are some scenarios in which GenShen may have improper remembered set verification logic: >> >> 1. Concurrent young cycles following a Full GC: >> >> In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification >> >> >> ShenandoahVerifier >> ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { >> shenandoah_assert_generations_reconciled(); >> if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { >> return _heap->complete_marking_context(); >> } >> return nullptr; >> } >> >> >> For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. >> >> 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. >> >> 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. >> >> >> ### Test >> - [x] `make test TEST=hotspot_gc_shenandoah` > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 648: > >> 646: assert(!heap->has_forwarded_objects(), "No forwarded objects on this path"); >> 647: >> 648: if (heap->mode()->is_generational()) { > > I think we do not want to change this code. We only swap remembered set for young-gen because gen will scan the remset and reconstruct it with more updated information. For a global GC, we do not scan the remset and do not reconstruct it. If we swap here, we will lose the information that is currently within the remset. Thanks for for explanation, I have been reading and trying the understand how the remembered set works in GenShen. I wasn't sure whether this is actually right. In generational mode, if the GC cycle is global, the read table is already cleaned during reset phase, so remembered set verification from `verify_before_concmark` and `verify_before_update_refs` shouldn't work properly, I think the remembered set verification before mark and update references should be disabled, what do you think? Meanwhile, there is no need to clean read table during global cycle in generational mode. > src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 953: > >> 951: // pinned regions. 
>> 952: if (!r->is_pinned()) { >> 953: _heap->marking_context()->reset_top_at_mark_start(r); > > Here, and below, I think we want to keep complete_marking_context() rather than changing to marking_context() The marking context is not complete anymore after ShenandoahMCResetCompleteBitmapTask, but ShenandoahMCResetCompleteBitmapTask only reset bitmaps for the regions w/o pinned objects, the place calling `set_mark_incomplete()` need to moved to some place after ShenandoahPostCompactClosure being executed if use complete_marking_context here. > src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1281: > >> 1279: ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { >> 1280: shenandoah_assert_generations_reconciled(); >> 1281: if (_heap->old_generation()->is_mark_complete()) { > > In the case that this is an global GC, we know that old-generation()->is_mark_complete() by virtue of the current program counter, I assume. (We would only ask for the old marking context if global marking were already finished.) In the case that we are doing a global GC cycle, I'm guessing that we do not set is-mark-complete for the old generation. So that's why I believe you need to keep the condition as originally written. If it is global GC in generational mode, old-generation()->is_mark_complete() is always false after reset and before mark because bitmaps of the entire heap including old gen has been reset during concurrent reset phase, so old mark is not finished in when verify_before_concmark is called. The marking context return by this method is only used for remembered set verification, but as I pointed out in the first comments, we shouldn't do remembered set verification in such case because the rem-set read table is already cleaned/stale. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2000354419 PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2000361095 PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2000371950 From shade at openjdk.org Tue Mar 18 22:26:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 18 Mar 2025 22:26:07 GMT Subject: RFR: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 21:51:34 GMT, William Kemper wrote: > The sequence of events that creates this state: > 1. An old collection is trying to finish marking by flushing SATB buffers with a Handshake > 2. The regulator thread cancels old marking to start a young collection > 3. A mutator thread shortly follows and attempts to cancel the nascent young collection > 4. Step `3` fails (because of this bug) and cancellation reason does _not_ become `allocation failure` > 5. The mutator thread enters a tight loop in which it retries allocations without `waiting` > 6. The mutator thread remains in the `thread_in_vm` state and prevents the VM thread from completing step `1`. (too tired to do a full review, just mentioning a thing, so we look at it tomorrow) src/hotspot/share/gc/shenandoah/shenandoahSharedVariables.hpp line 243: > 241: assert (new_value < (sizeof(ShenandoahSharedValue) * CHAR_MAX), "sanity"); > 242: // Hmm, no platform template specialization defined for exchanging one byte... (up cast to intptr is workaround). > 243: return (T)Atomic::xchg((intptr_t*)&value, (intptr_t)new_value); That... likely gets awkward on different endianness. See the complicated dance `Atomic::CmpxchgByteUsingInt` has to do to handle it. 
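For reference, the "dance" being referred to, emulating a one-byte atomic exchange with only a word-wide compare-and-swap, has roughly the following shape. This is a standalone C++ sketch, not HotSpot's Atomic implementation; in the real code the byte lane must additionally be derived from the byte's address, which is exactly where endianness enters.

#include <atomic>
#include <cstdint>

// Exchange the byte at lane `index` (0..3) of a 32-bit atomic word and return
// the previous value of that byte, using only a word-wide CAS.
uint8_t xchg_byte_in_word(std::atomic<uint32_t>& word, unsigned index, uint8_t new_val) {
  const uint32_t shift = index * 8;
  const uint32_t mask  = 0xffu << shift;
  uint32_t old_word = word.load(std::memory_order_relaxed);
  for (;;) {
    const uint32_t new_word = (old_word & ~mask) | (uint32_t(new_val) << shift);
    // On failure, compare_exchange_weak reloads old_word with the current
    // contents (possibly changed by a writer of an adjacent byte) and we retry.
    if (word.compare_exchange_weak(old_word, new_word,
                                   std::memory_order_acq_rel,
                                   std::memory_order_relaxed)) {
      return uint8_t((old_word & mask) >> shift);
    }
  }
}

Note that a concurrent store to a neighboring byte of the same word forces a retry, which is part of the cost of taking this route.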
Not to mention we are likely writing to adjacent memory location. Which is _currently_ innocuous, since we hit padding, but it is not very reliable. ------------- Changes requested by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24105#pullrequestreview-2696449916 PR Review Comment: https://git.openjdk.org/jdk/pull/24105#discussion_r2002110190 From ysr at openjdk.org Tue Mar 18 22:38:08 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 18 Mar 2025 22:38:08 GMT Subject: RFR: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational [v4] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 00:05:05 GMT, William Kemper wrote: >> Failed allocations may race to cancel the GC with the collector who is working to clear the cancelled GC. When the GC wins this race, it will fail to notify threads that are waiting for the failed GC cycle to complete. This change directly tracks the number of threads waiting due to an allocation failure, rather than indirectly tracking them through the cancelled gc state. >> >> # Testing >> Ran TestAllocHumongousFragment#generational 6,500 times without failures. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > The non-generational modes may also fail to notify waiters Is the description in the PR still valid? > This change directly tracks the number of threads waiting due to an allocation failure, rather than indirectly tracking them through the cancelled gc state. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23997#issuecomment-2734885815 From kdnilsen at openjdk.org Tue Mar 18 22:40:41 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 18 Mar 2025 22:40:41 GMT Subject: RFR: 8350889: GenShen: Break out of infinite loop of old GC cycles part2 Message-ID: A recent commit failed to address all paths by which an infinite loop of old GC cycles might occur. This new PR handles one other case related to the original problem. ------------- Commit messages: - Cancel old GC triggers when old GC starts/resumes - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - ... and 20 more: https://git.openjdk.org/jdk/compare/4a02de82...4ebb3aaf Changes: https://git.openjdk.org/jdk/pull/24106/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24106&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350889 Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24106.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24106/head:pull/24106 PR: https://git.openjdk.org/jdk/pull/24106 From wkemper at openjdk.org Tue Mar 18 23:01:08 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 18 Mar 2025 23:01:08 GMT Subject: RFR: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled In-Reply-To: References: Message-ID: <7PFHErLXXCsFeCjx55B_u8JisUcDGX9VFLa5azzsCso=.92f7d81d-8989-4aff-b57e-d2128403e01f@github.com> On Tue, 18 Mar 2025 22:23:23 GMT, Aleksey Shipilev wrote: >> The sequence of events that creates this state: >> 1. 
An old collection is trying to finish marking by flushing SATB buffers with a Handshake >> 2. The regulator thread cancels old marking to start a young collection >> 3. A mutator thread shortly follows and attempts to cancel the nascent young collection >> 4. Step `3` fails (because of this bug) and cancellation reason does _not_ become `allocation failure` >> 5. The mutator thread enters a tight loop in which it retries allocations without `waiting` >> 6. The mutator thread remains in the `thread_in_vm` state and prevents the VM thread from completing step `1`. > > src/hotspot/share/gc/shenandoah/shenandoahSharedVariables.hpp line 243: > >> 241: assert (new_value < (sizeof(ShenandoahSharedValue) * CHAR_MAX), "sanity"); >> 242: // Hmm, no platform template specialization defined for exchanging one byte... (up cast to intptr is workaround). >> 243: return (T)Atomic::xchg((intptr_t*)&value, (intptr_t)new_value); > > That... likely gets awkward on different endianness. See the complicated dance `Atomic::CmpxchgByteUsingInt` has to do to handle it. > > Not to mention we are likely writing to adjacent memory location. Which is _currently_ innocuous, since we hit padding, but it is not very reliable. `PlatformCmpxchg` has specializations on aarch64 and x86 for `sizeof(T) == 1`. Should we also add platform specializations for `PlatformXchg` for `sizeof(T) == 1`? (It has them for `4` and `8`). Could also do what `XchgUsingCmpxchg` does... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24105#discussion_r2002137199 From wkemper at openjdk.org Tue Mar 18 23:04:07 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 18 Mar 2025 23:04:07 GMT Subject: RFR: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational [v4] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 00:05:05 GMT, William Kemper wrote: >> Failed allocations may race to cancel the GC with the collector who is working to clear the cancelled GC. When the GC wins this race, it will fail to notify threads that are waiting for the failed GC cycle to complete. This change sees allocation waiters notified any time a GC completes without being cancelled. >> >> # Testing >> Ran TestAllocHumongousFragment#generational 6,500 times without failures. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > The non-generational modes may also fail to notify waiters Fixed the description. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23997#issuecomment-2734917437 From wkemper at openjdk.org Tue Mar 18 23:08:09 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 18 Mar 2025 23:08:09 GMT Subject: RFR: 8350889: GenShen: Break out of infinite loop of old GC cycles part2 In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 22:36:49 GMT, Kelvin Nilsen wrote: > A recent commit failed to address all paths by which an infinite loop of old GC cycles might occur. This new PR handles one other case related to the original problem. LGTM ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24106#pullrequestreview-2696496570 From ysr at openjdk.org Tue Mar 18 23:47:08 2025 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Tue, 18 Mar 2025 23:47:08 GMT Subject: RFR: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational [v4] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 00:05:05 GMT, William Kemper wrote: >> Failed allocations may race to cancel the GC with the collector who is working to clear the cancelled GC. When the GC wins this race, it will fail to notify threads that are waiting for the failed GC cycle to complete. This change sees allocation waiters notified any time a GC completes without being cancelled. >> >> # Testing >> Ran TestAllocHumongousFragment#generational 6,500 times without failures. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > The non-generational modes may also fail to notify waiters LGTM! ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23997#pullrequestreview-2696532375 From kdnilsen at openjdk.org Wed Mar 19 00:15:10 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 19 Mar 2025 00:15:10 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 00:19:35 GMT, Xiaolong Peng wrote: > There are some scenarios in which GenShen may have improper remembered set verification logic: > > 1. Concurrent young cycles following a Full GC: > > In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification > > > ShenandoahVerifier > ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { > shenandoah_assert_generations_reconciled(); > if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { > return _heap->complete_marking_context(); > } > return nullptr; > } > > > For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. > > 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. > > 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. > > > ### Test > - [x] `make test TEST=hotspot_gc_shenandoah` Can we confirm that this addresses JBS issue with further testing before integration? src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 660: > 658: > 659: // Verify before mark is done before swapping card tables, > 660: // therefore the write card table will be verified before being taken snapshot. Not a big deal, but this is two sentences. "... swapping card tables. Therefore, the write card table is verified before we swap read and write card tables." ------------- Marked as reviewed by kdnilsen (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/24092#pullrequestreview-2696536061 PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2002170767 From kdnilsen at openjdk.org Wed Mar 19 00:15:11 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 19 Mar 2025 00:15:11 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification In-Reply-To: <_GG5htdXFZ2Jv3qTAyG6djSrvXDtGx-jTLGoA2JbEXU=.b8588ac1-e51f-4ddf-afda-c64e6a789440@github.com> References: <_GG5htdXFZ2Jv3qTAyG6djSrvXDtGx-jTLGoA2JbEXU=.b8588ac1-e51f-4ddf-afda-c64e6a789440@github.com> Message-ID: On Tue, 18 Mar 2025 07:14:23 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 648: >> >>> 646: assert(!heap->has_forwarded_objects(), "No forwarded objects on this path"); >>> 647: >>> 648: if (heap->mode()->is_generational()) { >> >> I think we do not want to change this code. We only swap remembered set for young-gen because gen will scan the remset and reconstruct it with more updated information. For a global GC, we do not scan the remset and do not reconstruct it. If we swap here, we will lose the information that is currently within the remset. > > Thanks for for explanation, I have been reading and trying the understand how the remembered set works in GenShen. I wasn't sure whether this is actually right. > > In generational mode, if the GC cycle is global, the read table is already cleaned during reset phase, so remembered set verification from `verify_before_concmark` and `verify_before_update_refs` shouldn't work properly, I think the remembered set verification before mark and update references should be disabled, what do you think? Meanwhile, there is no need to clean read table during global cycle in generational mode. Ok. So we will always swap card tables, but we'll do it after verify-before-mark. To clarify the intention, after we swap card table, the write-table is all clean, and the read table holds whatever had been gathered prior to the start of GC. Young and bootstrap collection will update the write card table as a side effect of remembered set scanning. Global collection will update the card table as a side effect of global marking of old objects. >> src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 953: >> >>> 951: // pinned regions. >>> 952: if (!r->is_pinned()) { >>> 953: _heap->marking_context()->reset_top_at_mark_start(r); >> >> Here, and below, I think we want to keep complete_marking_context() rather than changing to marking_context() > > The marking context is not complete anymore after ShenandoahMCResetCompleteBitmapTask, but ShenandoahMCResetCompleteBitmapTask only reset bitmaps for the regions w/o pinned objects, the place calling `set_mark_incomplete()` need to moved to some place after ShenandoahPostCompactClosure being executed if use complete_marking_context here. Can we move heap_region_iterate(&post_compact) and post_compact.update_generation_usage() before heap->workers()->run_task(ShenandoahMCResetCompletedBitmaptask) so that we can use complete_marking_context here? I'm a bit uncomfortable using an incomplete marking context as if it is complete. (I understand "why it works" in this case, but this looks like an "accident waiting to happen" when someone comes back to modify this code in the future. 
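Stepping back to the read/write table roles clarified a little earlier in this exchange, here is a conceptual standalone sketch of that protocol; it is not the actual Shenandoah remembered set code, and the card encoding is a placeholder.

#include <cstdint>
#include <cstring>
#include <utility>

// Two card tables: the mutator write barrier dirties one while the collector
// scans (and rebuilds) the other. At the start of a cycle the roles swap and
// the new write table starts out clean.
struct RememberedSetTables {
  static constexpr uint8_t kClean = 0;  // placeholder value, not HotSpot's encoding
  uint8_t* read_table;                  // snapshot the collector scans/rebuilds
  uint8_t* write_table;                 // table the write barrier marks
  size_t   num_cards;

  void swap_for_new_cycle() {
    std::swap(read_table, write_table);           // old write table becomes the readable snapshot
    std::memset(write_table, kClean, num_cards);  // mutators continue against a clean table
  }
};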
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2002169842 PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2002179864 From xpeng at openjdk.org Wed Mar 19 00:19:09 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 19 Mar 2025 00:19:09 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification In-Reply-To: References: Message-ID: <-t-I-pGhv43DJxTgXO3bMPX0G5eMYqsO3LPjLCq9XNg=.682e2b60-8ca6-413f-8b8d-86a44e25a37a@github.com> On Tue, 18 Mar 2025 23:50:24 GMT, Kelvin Nilsen wrote: >> There are some scenarios in which GenShen may have improper remembered set verification logic: >> >> 1. Concurrent young cycles following a Full GC: >> >> In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification >> >> >> ShenandoahVerifier >> ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { >> shenandoah_assert_generations_reconciled(); >> if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { >> return _heap->complete_marking_context(); >> } >> return nullptr; >> } >> >> >> For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. >> >> 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. >> >> 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. >> >> >> ### Test >> - [x] `make test TEST=hotspot_gc_shenandoah` > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 660: > >> 658: >> 659: // Verify before mark is done before swapping card tables, >> 660: // therefore the write card table will be verified before being taken snapshot. > > Not a big deal, but this is two sentences. "... swapping card tables. Therefore, the write card table is verified before we swap read and write card tables." Thanks, I'll fix it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2002187325 From ysr at openjdk.org Wed Mar 19 00:25:07 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 19 Mar 2025 00:25:07 GMT Subject: RFR: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational [v4] In-Reply-To: <8HGl6b056y3lTi7An0UsJ896JOy-7Ij8SMcc2MULj0I=.26ca6193-f829-449e-afbe-4d068b8533ab@github.com> References: <8HGl6b056y3lTi7An0UsJ896JOy-7Ij8SMcc2MULj0I=.26ca6193-f829-449e-afbe-4d068b8533ab@github.com> Message-ID: On Wed, 12 Mar 2025 17:25:20 GMT, Xiaolong Peng wrote: >> I believe that is the correct behavior. The mutators are waiting until there is memory available. If mutator B cannot allocate, there is no reason to believe mutator A would be able to allocate. In this case, it is fine for both mutators to wait (even if it means A has to wait an extra cycle). 
> Thanks for the explanation, having re-read the relevant code I think it makes sense: when Mutator B fails to allocate while Concurrent GC is at `op_final_update_refs`, it is very unlikely there is enough space for Mutator A. In the case of the stop world collectors, the waiters would form cohorts behind GC count epochs, the idea being that if your failure to allocate happened during a specific epoch, you didn't have sufficient head room to allocate which would then require at least a new GC. Depending on how we think of the allocation failures interacting with potential freeing of memory by a concurrent collector and the sizes of the allocations being attempted, I could see this going either way. I do realize that a large number of notifications when space is exhausted might exact a cost, but if we are allocating and collecting concurrently, I can imagine that some notion of a monotonically increasing count and notifying all of the early waiters might yield some benefit. I assume we would need to collect the distribution of the allocation failure sizes and the space available to really tell if it makes a difference. A benchmark such as SPECjbb might be able to tell the difference but I am not sure. Intuition can sometimes mislead in these scenarios, so empirical data might help. Can probably be tackled/investigated in the fullness of time, but I thought it was worth leaving my thoughts here anyway. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23997#discussion_r2002191328 From wkemper at openjdk.org Wed Mar 19 00:33:14 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 19 Mar 2025 00:33:14 GMT Subject: Integrated: 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational In-Reply-To: References: Message-ID: On Tue, 11 Mar 2025 19:31:47 GMT, William Kemper wrote: > Failed allocations may race to cancel the GC with the collector who is working to clear the cancelled GC. When the GC wins this race, it will fail to notify threads that are waiting for the failed GC cycle to complete. This change sees allocation waiters notified any time a GC completes without being cancelled. > > # Testing > Ran TestAllocHumongousFragment#generational 6,500 times without failures. This pull request has now been integrated. Changeset: 20d4fe3a Author: William Kemper URL: https://git.openjdk.org/jdk/commit/20d4fe3a574a33784dc02e7cc653cdb248b697a2 Stats: 5 lines in 3 files changed: 0 ins; 1 del; 4 mod 8351464: Shenandoah: Hang on ShenandoahController::handle_alloc_failure when run test TestAllocHumongousFragment#generational Reviewed-by: xpeng, ysr ------------- PR: https://git.openjdk.org/jdk/pull/23997 From xpeng at openjdk.org Wed Mar 19 00:35:29 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 19 Mar 2025 00:35:29 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v2] In-Reply-To: References: Message-ID: <0tjufPvihcze6ELUIAybBhxFDp3tZk2qgaD0XPHFUjw=.9148d06f-95ba-449b-af11-2ba86ee40c7a@github.com> > There are some scenarios in which GenShen may have improper remembered set verification logic: > > 1. 
Concurrent young cycles following a Full GC: > > In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification > > > ShenandoahVerifier > ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { > shenandoah_assert_generations_reconciled(); > if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { > return _heap->complete_marking_context(); > } > return nullptr; > } > > > For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. > > 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. > > 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. > > > ### Test > - [x] `make test TEST=hotspot_gc_shenandoah` Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24092/files - new: https://git.openjdk.org/jdk/pull/24092/files/5a94b141..021f2fef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=00-01 Stats: 8 lines in 2 files changed: 3 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24092.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24092/head:pull/24092 PR: https://git.openjdk.org/jdk/pull/24092 From xpeng at openjdk.org Wed Mar 19 00:35:29 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 19 Mar 2025 00:35:29 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v2] In-Reply-To: References: <_GG5htdXFZ2Jv3qTAyG6djSrvXDtGx-jTLGoA2JbEXU=.b8588ac1-e51f-4ddf-afda-c64e6a789440@github.com> Message-ID: <2U6kJoymX-uKOSUoR54QvEQtJ54DyKgttgItf96SYzI=.69334cab-f9ca-4609-93ba-23197c76f430@github.com> On Wed, 19 Mar 2025 00:04:07 GMT, Kelvin Nilsen wrote: >> The marking context is not complete anymore after ShenandoahMCResetCompleteBitmapTask, but ShenandoahMCResetCompleteBitmapTask only reset bitmaps for the regions w/o pinned objects, the place calling `set_mark_incomplete()` need to moved to some place after ShenandoahPostCompactClosure being executed if use complete_marking_context here. > > Can we move heap_region_iterate(&post_compact) and post_compact.update_generation_usage() before heap->workers()->run_task(ShenandoahMCResetCompletedBitmaptask) so that we can use complete_marking_context here? I'm a bit uncomfortable using an incomplete marking context as if it is complete. (I understand "why it works" in this case, but this looks like an "accident waiting to happen" when someone comes back to modify this code in the future. I'm not uncomfortable changing the orders here since I am not sure if there is dependency on the execution order(even it should be working), but I can move set_mark_incomplete() to the a place close to the end of phase5_epilog. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2002196180 From ysr at openjdk.org Wed Mar 19 00:38:08 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 19 Mar 2025 00:38:08 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v2] In-Reply-To: <0tjufPvihcze6ELUIAybBhxFDp3tZk2qgaD0XPHFUjw=.9148d06f-95ba-449b-af11-2ba86ee40c7a@github.com> References: <0tjufPvihcze6ELUIAybBhxFDp3tZk2qgaD0XPHFUjw=.9148d06f-95ba-449b-af11-2ba86ee40c7a@github.com> Message-ID: On Wed, 19 Mar 2025 00:35:29 GMT, Xiaolong Peng wrote: >> There are some scenarios in which GenShen may have improper remembered set verification logic: >> >> 1. Concurrent young cycles following a Full GC: >> >> In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification >> >> >> ShenandoahVerifier >> ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { >> shenandoah_assert_generations_reconciled(); >> if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { >> return _heap->complete_marking_context(); >> } >> return nullptr; >> } >> >> >> For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. >> >> 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. >> >> 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. >> >> >> ### Test >> - [x] `make test TEST=hotspot_gc_shenandoah` > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Can you sync w/master so GHA (& problem lists) is more uptodate. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24092#issuecomment-2735021061 From xpeng at openjdk.org Wed Mar 19 00:48:31 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 19 Mar 2025 00:48:31 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v2] In-Reply-To: References: <0tjufPvihcze6ELUIAybBhxFDp3tZk2qgaD0XPHFUjw=.9148d06f-95ba-449b-af11-2ba86ee40c7a@github.com> Message-ID: On Wed, 19 Mar 2025 00:35:39 GMT, Y. Srinivas Ramakrishna wrote: > Can you sync w/master so GHA (& problem lists) is more uptodate. Thanks! Done, now waiting for GHS to rerun, thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24092#issuecomment-2735030715 From xpeng at openjdk.org Wed Mar 19 00:48:31 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 19 Mar 2025 00:48:31 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v3] In-Reply-To: References: Message-ID: > There are some scenarios in which GenShen may have improper remembered set verification logic: > > 1. 
Concurrent young cycles following a Full GC: > > At the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting the marking context to incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification > > > ShenandoahVerifier > ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { > shenandoah_assert_generations_reconciled(); > if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { > return _heap->complete_marking_context(); > } > return nullptr; > } > > > For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected results. > > 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always returns a marking context for a global GC, but the marking bitmaps are already reset before init-mark, so `ShenandoahVerifier::help_verify_region_rem_set` always skips verification in this case. > > 3. ShenandoahConcurrentGC always cleans the remembered set read table, but only swaps the read/write tables when the gc generation is young; this causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. > > > ### Test > - [x] `make test TEST=hotspot_gc_shenandoah` Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8345399-v3 - Address review comments - Clean and rebuild rem-set in global gc - Set mark incomplete after ShenandoahMCResetCompleteBitmapTask - Only clean rem-set read table in young gc; not verify rem-set in concurrent global GC in generational mode - Always swap card table in generational mode so the table can be properly rebuilt through marking. - Initial works ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24092/files - new: https://git.openjdk.org/jdk/pull/24092/files/021f2fef..4726b876 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=01-02 Stats: 54221 lines in 805 files changed: 27210 ins; 17485 del; 9526 mod Patch: https://git.openjdk.org/jdk/pull/24092.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24092/head:pull/24092 PR: https://git.openjdk.org/jdk/pull/24092 From iwalulya at openjdk.org Wed Mar 19 08:32:49 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 19 Mar 2025 08:32:49 GMT Subject: RFR: 8351405: G1: Collection set early pruning causes suboptimal region selection [v2] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change which aims to reduce spikes in mixed GCs, especially the last mixed-gc in a mixed cycle. Currently, G1 sorts regions identified for collection by reclaimable bytes, then prunes the list, removing the regions with the lowest amount of reclaimable bytes. The pruned list is then split into collection groups which are later sorted on gc-efficiency. > > In the cachestress benchmark, we run into a case where some regions contain only a few live objects but have many incoming references from other regions. These regions are very expensive to collect (low gc-efficiency). > > This patch improves the pruning process by tracking incoming references to regions during marking.
Instead of pruning based on reclaimable bytes alone, we estimate GC efficiency beforehand and prune regions with the worst GC efficiency. > > This reduces the spikes in gc pause time as shown for cachestress benchmark in the image below. > > ![mixed-gc](https://github.com/user-attachments/assets/740fb51d-eb20-4946-bf90-4eef23afe2e4) > > > Testing: Tier 1-3. Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Thomas Review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24076/files - new: https://git.openjdk.org/jdk/pull/24076/files/d55f382b..f5fa92f0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24076&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24076&range=00-01 Stats: 15 lines in 8 files changed: 3 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/24076.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24076/head:pull/24076 PR: https://git.openjdk.org/jdk/pull/24076 From iwalulya at openjdk.org Wed Mar 19 08:35:13 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 19 Mar 2025 08:35:13 GMT Subject: RFR: 8351405: G1: Collection set early pruning causes suboptimal region selection [v2] In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 11:20:10 GMT, Thomas Schatzl wrote: > Looks good to me, thanks for removing the double-pruning from the initial protoype! > > Did you ever try to get statistics about differences in marking length and changes to the cache hits in the mark stats cache? (Just curious) For the Small Heap Sizes (24-26G) used for testing, it is a slight drop of 100 to 99.98** hit rate in a few marking cycles. Will try with larger heaps ------------- PR Comment: https://git.openjdk.org/jdk/pull/24076#issuecomment-2735728622 From shade at openjdk.org Wed Mar 19 09:31:09 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Mar 2025 09:31:09 GMT Subject: RFR: 8350889: GenShen: Break out of infinite loop of old GC cycles In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 22:36:49 GMT, Kelvin Nilsen wrote: > A recent commit failed to address all paths by which an infinite loop of old GC cycles might occur. This new PR handles one other case related to the original problem. This is part2 of the fix. I don't understand the bug mapping here. JDK-8350889 is already resolved. I think you need to file a follow-up bug and reference that new bug in this PR. ------------- Changes requested by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24106#pullrequestreview-2697641876 From tschatzl at openjdk.org Wed Mar 19 09:55:09 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 19 Mar 2025 09:55:09 GMT Subject: RFR: 8351405: G1: Collection set early pruning causes suboptimal region selection [v2] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 08:32:49 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change which aims to reduce spikes in mixed GCs, especially the last mixed-gc in a mixed cycle. Currently, G1 sorts regions identified for collection by reclaimable bytes, then prunes the list removing regions that with the lowest amount of reclaimable bytes. The pruned list is then split into collection groups which are later sorted on gc-efficiency. >> >> In the cachestress benchmark, we run into a case where some regions contain onlya few live objects but having many incoming references from other regions. These regions very expensive collect (low gc-efficiency). 
>> >> This patch improves the pruning process by tracking incoming references to regions during marking. Instead of pruning based on reclaimable bytes alone, we estimate GC efficiency beforehand and prune regions with the worst GC efficiency. >> >> This reduces the spikes in gc pause time as shown for cachestress benchmark in the image below. >> >> ![mixed-gc](https://github.com/user-attachments/assets/740fb51d-eb20-4946-bf90-4eef23afe2e4) >> >> >> Testing: Tier 1-3. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Thomas Review Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24076#pullrequestreview-2697733082 From tschatzl at openjdk.org Wed Mar 19 10:39:08 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 19 Mar 2025 10:39:08 GMT Subject: RFR: 8351405: G1: Collection set early pruning causes suboptimal region selection [v2] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 08:32:52 GMT, Ivan Walulya wrote: > > Looks good to me, thanks for removing the double-pruning from the initial protoype! > > Did you ever try to get statistics about differences in marking length and changes to the cache hits in the mark stats cache? (Just curious) > > For the Small Heap Sizes (24-26G) used for testing, it is a slight drop of 100 to 99.98** hit rate in a few marking cycles. Will try with larger heaps I would actually expect a larger decrease with smaller heaps (or region sizes) - with larger heaps regions are larger, so the amount of cross-references/jumping around in the object graph potentially smaller, and so the amount of cache evictions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24076#issuecomment-2736113234 From tschatzl at openjdk.org Wed Mar 19 13:17:19 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 19 Mar 2025 13:17:19 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v25] In-Reply-To: References: Message-ID: <5Q9-MERAD4KIP-fzgw7JVAtC9u4L1fEFGcNkdHBvkg4=.1917bd58-a5f8-4c5c-b1f9-27b7457c6262@github.com> > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * fix IR code generation tests that change due to barrier cost changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/c833bc83..f419556e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=23-24 Stats: 5 lines in 2 files changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From tschatzl at openjdk.org Wed Mar 19 13:27:17 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 19 Mar 2025 13:27:17 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v25] In-Reply-To: <5Q9-MERAD4KIP-fzgw7JVAtC9u4L1fEFGcNkdHBvkg4=.1917bd58-a5f8-4c5c-b1f9-27b7457c6262@github.com> References: <5Q9-MERAD4KIP-fzgw7JVAtC9u4L1fEFGcNkdHBvkg4=.1917bd58-a5f8-4c5c-b1f9-27b7457c6262@github.com> Message-ID: On Wed, 19 Mar 2025 13:17:19 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. 
>> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). >> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * fix IR code generation tests that change due to barrier cost changes Commit https://github.com/openjdk/jdk/pull/23739/commits/f419556e9177ecf9fbf22e606dd6c1b850f4330f fixes the failing compiler tests that check whether the compiler emits the correct object graph. Occurs after merging with mainline that significantly reduces total barrier cost calculation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2736639357 From wkemper at openjdk.org Wed Mar 19 16:59:16 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 19 Mar 2025 16:59:16 GMT Subject: Integrated: 8350898: Shenandoah: Eliminate final roots safepoint In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 19:51:24 GMT, William Kemper wrote: > This PR converts the final roots safepoint operation into a handshake. The safepoint operation still exists, but is only executed when `ShenandoahVerify` is enabled. In addition to this change, this PR also improves the logging for the concurrent preparation for update references from [PR 22688](https://github.com/openjdk/jdk/pull/22688). This pull request has now been integrated. 
Changeset: 8a1c85ea Author: William Kemper URL: https://git.openjdk.org/jdk/commit/8a1c85eaa902500d49ca82c67b6838d39cb5b24f Stats: 295 lines in 14 files changed: 198 ins; 47 del; 50 mod 8350898: Shenandoah: Eliminate final roots safepoint Reviewed-by: rkennke, kdnilsen, cslucas ------------- PR: https://git.openjdk.org/jdk/pull/23830 From kdnilsen at openjdk.org Wed Mar 19 17:38:17 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 19 Mar 2025 17:38:17 GMT Subject: RFR: 8352428: GenShen: Old-gen cycles are still looping In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 09:28:53 GMT, Aleksey Shipilev wrote: >> A recent commit failed to address all paths by which an infinite loop of old GC cycles might occur. This new PR handles one other case related to the original problem. This is part 2 of the fix. > > I don't understand the bug mapping here. JDK-8350889 is already resolved. I think you need to file a follow-up bug and reference that new bug in this PR. Thanks @shipilev for your suggestion. I've opened another JBS issue and linked this new PR to that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24106#issuecomment-2737503085 From shade at openjdk.org Wed Mar 19 18:00:09 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 19 Mar 2025 18:00:09 GMT Subject: RFR: 8352428: GenShen: Old-gen cycles are still looping In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 22:36:49 GMT, Kelvin Nilsen wrote: > A recent commit failed to address all paths by which an infinite loop of old GC cycles might occur. This new PR handles one other case related to the original problem. This is part 2 of the fix. Looks fine to me. Note there are useful "Caused by" and "Related" links in JBS, which you should really use to track the dependencies between the tickets. I added some; see how it looks. Also, these are either "Enhancement" or "Bug". "Task" is usually about something that is not code-related: https://openjdk.org/guide/#types-of-issues ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24106#pullrequestreview-2699539104 From wkemper at openjdk.org Wed Mar 19 18:33:50 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 19 Mar 2025 18:33:50 GMT Subject: RFR: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled [v2] In-Reply-To: References: Message-ID: > The sequence of events that creates this state: > 1. An old collection is trying to finish marking by flushing SATB buffers with a Handshake > 2. The regulator thread cancels old marking to start a young collection > 3. A mutator thread shortly follows and attempts to cancel the nascent young collection > 4. Step `3` fails (because of this bug) and the cancellation reason does _not_ become `allocation failure` > 5. The mutator thread enters a tight loop in which it retries allocations without `waiting` > 6. The mutator thread remains in the `thread_in_vm` state and prevents the VM thread from completing step `1`. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
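An aside for readers following the allocation-failure threads in this digest (the waiter-notification fix for 8351464 above, and the six-step race described for 8352299 just above): below is a minimal, self-contained sketch of the "wait behind a monotonically increasing GC count epoch" idea floated earlier. It is purely illustrative — the names are invented and this is not the ShenandoahController implementation:

// Toy sketch: mutators that fail to allocate remember the GC epoch they failed
// in and sleep until at least one collection completes after that epoch,
// instead of spinning and retrying immediately.
#include <condition_variable>
#include <cstdint>
#include <mutex>

class ToyGCEpoch {
  std::mutex _lock;
  std::condition_variable _cv;
  uint64_t _completed = 0;   // count of collections that finished without being cancelled

public:
  uint64_t current() {
    std::lock_guard<std::mutex> g(_lock);
    return _completed;
  }

  // Mutator side: called after an allocation failure observed at `epoch`.
  void wait_for_collection_after(uint64_t epoch) {
    std::unique_lock<std::mutex> g(_lock);
    _cv.wait(g, [&] { return _completed > epoch; });
  }

  // Collector side: called whenever a cycle completes without being cancelled;
  // waking all waiters here mirrors the "notify any time a GC completes" fix.
  void collection_completed() {
    { std::lock_guard<std::mutex> g(_lock); ++_completed; }
    _cv.notify_all();
  }
};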
The pull request contains three additional commits since the last revision: - Emulate single byte xchg with cmpxchg - Merge remote-tracking branch 'jdk/master' into fix-uncancellable-young-gc - Allow young cycles that interrupt old cycles to be cancelled ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24105/files - new: https://git.openjdk.org/jdk/pull/24105/files/9b0faf0c..adcb999b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24105&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24105&range=00-01 Stats: 2244 lines in 56 files changed: 964 ins; 745 del; 535 mod Patch: https://git.openjdk.org/jdk/pull/24105.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24105/head:pull/24105 PR: https://git.openjdk.org/jdk/pull/24105 From kdnilsen at openjdk.org Thu Mar 20 00:56:19 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 20 Mar 2025 00:56:19 GMT Subject: Integrated: 8352428: GenShen: Old-gen cycles are still looping In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 22:36:49 GMT, Kelvin Nilsen wrote: > A recent commit failed to address all paths by which an infinite loop of old GC cycles might occur. This new PR handles one other case related to the original problem. This is part2 of the fix. This pull request has now been integrated. Changeset: 74df384a Author: Kelvin Nilsen URL: https://git.openjdk.org/jdk/commit/74df384a9870431efb184158bba032c79c35356e Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod 8352428: GenShen: Old-gen cycles are still looping Reviewed-by: wkemper, shade ------------- PR: https://git.openjdk.org/jdk/pull/24106 From xpeng at openjdk.org Thu Mar 20 02:36:56 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 20 Mar 2025 02:36:56 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v4] In-Reply-To: References: Message-ID: > There are some scenarios in which GenShen may have improper remembered set verification logic: > > 1. Concurrent young cycles following a Full GC: > > In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification > > > ShenandoahVerifier > ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { > shenandoah_assert_generations_reconciled(); > if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { > return _heap->complete_marking_context(); > } > return nullptr; > } > > > For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. > > 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. > > 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. > > > ### Test > - [x] `make test TEST=hotspot_gc_shenandoah` Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: remembered set can't be verified w/o complete old marking or parsable old generation. 
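A brief note on the "Emulate single byte xchg with cmpxchg" commit listed for pull/24105 above: the usual technique is to implement the one-byte exchange as a compare-and-swap loop over a wider word. The sketch below only illustrates that idea in portable C++ (it swaps one byte lane of a 32-bit atomic; it is not the actual patch):

#include <atomic>
#include <cstdint>
#include <cstdio>

// Exchange the byte in lane `idx` (0..3) of a 32-bit atomic word with `desired`
// and return the previous byte, using only a word-sized compare-and-swap.
static uint8_t xchg_byte(std::atomic<uint32_t>& word, unsigned idx, uint8_t desired) {
  const unsigned shift = idx * 8;
  const uint32_t mask  = 0xFFu << shift;
  uint32_t old_word = word.load(std::memory_order_relaxed);
  uint32_t new_word;
  do {
    new_word = (old_word & ~mask) | (uint32_t(desired) << shift);
    // On failure compare_exchange_weak refreshes old_word, so the loop
    // recomputes new_word from current contents until the CAS succeeds.
  } while (!word.compare_exchange_weak(old_word, new_word,
                                       std::memory_order_acq_rel,
                                       std::memory_order_relaxed));
  return uint8_t((old_word & mask) >> shift);
}

int main() {
  std::atomic<uint32_t> w{0x11223344u};
  uint8_t prev = xchg_byte(w, 0, 0xAA);   // swap the lowest byte
  std::printf("prev=0x%02x now=0x%08x\n", unsigned(prev), unsigned(w.load()));
  return 0;
}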
------------- Changes: - all: https://git.openjdk.org/jdk/pull/24092/files - new: https://git.openjdk.org/jdk/pull/24092/files/4726b876..f16dd729 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=02-03 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24092.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24092/head:pull/24092 PR: https://git.openjdk.org/jdk/pull/24092 From xpeng at openjdk.org Thu Mar 20 02:45:34 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 20 Mar 2025 02:45:34 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v5] In-Reply-To: References: Message-ID: > There are some scenarios in which GenShen may have improper remembered set verification logic: > > 1. Concurrent young cycles following a Full GC: > > In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification > > > ShenandoahVerifier > ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { > shenandoah_assert_generations_reconciled(); > if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { > return _heap->complete_marking_context(); > } > return nullptr; > } > > > For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. > > 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. > > 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. > > > ### Test > - [x] `make test TEST=hotspot_gc_shenandoah` Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: - Merge branch 'master' of https://github.com/openjdk/jdk into JDK-8345399-v3 - remembered set can't be verified w/o complete old marking or parsable old generation. - Merge branch 'openjdk:master' into JDK-8345399-v3 - Address review comments - Clean and rebuild rem-set in global gc - Set mark incomplete after ShenandoahMCResetCompleteBitmapTask - Only clean rem-set read table in young gc; not verify rem-set in concurrent global GC in generational mode - Always swap card table in generational mode so the table can be properly rebuilt through marking. 
- Initial works ------------- Changes: https://git.openjdk.org/jdk/pull/24092/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=04 Stats: 49 lines in 4 files changed: 24 ins; 16 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24092.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24092/head:pull/24092 PR: https://git.openjdk.org/jdk/pull/24092 From xpeng at openjdk.org Thu Mar 20 03:48:49 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 20 Mar 2025 03:48:49 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v6] In-Reply-To: References: Message-ID: > There are some scenarios in which GenShen may have improper remembered set verification logic: > > 1. Concurrent young cycles following a Full GC: > > In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification > > > ShenandoahVerifier > ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { > shenandoah_assert_generations_reconciled(); > if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { > return _heap->complete_marking_context(); > } > return nullptr; > } > > > For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. > > 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. > > 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. > > > ### Test > - [x] `make test TEST=hotspot_gc_shenandoah` Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Fix test failure ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24092/files - new: https://git.openjdk.org/jdk/pull/24092/files/6c420c4a..3947f36d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=04-05 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24092.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24092/head:pull/24092 PR: https://git.openjdk.org/jdk/pull/24092 From xpeng at openjdk.org Thu Mar 20 07:24:42 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 20 Mar 2025 07:24:42 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v7] In-Reply-To: References: Message-ID: <_DhsSyxboYzJQHLs_pzwb-IPixh2jdXkxxO6p36Z-n8=.66db88ee-f15f-47cc-9eae-3579e81af6b6@github.com> > There are some scenarios in which GenShen may have improper remembered set verification logic: > > 1. 
Concurrent young cycles following a Full GC: > > In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification > > > ShenandoahVerifier > ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { > shenandoah_assert_generations_reconciled(); > if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { > return _heap->complete_marking_context(); > } > return nullptr; > } > > > For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. > > 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. > > 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. > > > ### Test > - [x] `make test TEST=hotspot_gc_shenandoah` Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: set old gen parsable to false when complete mixed evacuations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24092/files - new: https://git.openjdk.org/jdk/pull/24092/files/3947f36d..f0e8d694 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=05-06 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24092.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24092/head:pull/24092 PR: https://git.openjdk.org/jdk/pull/24092 From xpeng at openjdk.org Thu Mar 20 08:47:46 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 20 Mar 2025 08:47:46 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v8] In-Reply-To: References: Message-ID: > There are some scenarios in which GenShen may have improper remembered set verification logic: > > 1. Concurrent young cycles following a Full GC: > > In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification > > > ShenandoahVerifier > ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { > shenandoah_assert_generations_reconciled(); > if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { > return _heap->complete_marking_context(); > } > return nullptr; > } > > > For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. > > 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. > > 3. 
ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. > > > ### Test > - [x] `make test TEST=hotspot_gc_shenandoah` Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: - Revert "set old gen parsable to false when complete mixed evacuations" This reverts commit f0e8d694f58b0b2a513b3ff3206a9eea1c998868. - Not verify rem-set before init-mark in global gc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24092/files - new: https://git.openjdk.org/jdk/pull/24092/files/f0e8d694..e3327245 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=06-07 Stats: 12 lines in 3 files changed: 0 ins; 8 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24092.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24092/head:pull/24092 PR: https://git.openjdk.org/jdk/pull/24092 From xpeng at openjdk.org Thu Mar 20 09:20:31 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 20 Mar 2025 09:20:31 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v9] In-Reply-To: References: Message-ID: > There are some scenarios in which GenShen may have improper remembered set verification logic: > > 1. Concurrent young cycles following a Full GC: > > In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification > > > ShenandoahVerifier > ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { > shenandoah_assert_generations_reconciled(); > if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { > return _heap->complete_marking_context(); > } > return nullptr; > } > > > For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. > > 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. > > 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. 
> > > ### Test > - [x] `make test TEST=hotspot_gc_shenandoah` Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Use read table for verify_rem_set_before_mark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24092/files - new: https://git.openjdk.org/jdk/pull/24092/files/e3327245..2b538632 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=07-08 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24092.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24092/head:pull/24092 PR: https://git.openjdk.org/jdk/pull/24092 From tschatzl at openjdk.org Thu Mar 20 09:49:13 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 20 Mar 2025 09:49:13 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v26] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 09:44:07 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. >> >> The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. >> >> ### Current situation >> >> With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. >> >> The main reason for the current barrier is how g1 implements concurrent refinement: >> * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. >> * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, >> * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. >> >> These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: >> >> >> // Filtering >> if (region(@x.a) == region(y)) goto done; // same region check >> if (y == null) goto done; // null value check >> if (card(@x.a) == young_card) goto done; // write to young gen check >> StoreLoad; // synchronize >> if (card(@x.a) == dirty_card) goto done; >> >> *card(@x.a) = dirty >> >> // Card tracking >> enqueue(card-address(@x.a)) into thread-local-dcq; >> if (thread-local-dcq is not full) goto done; >> >> call runtime to move thread-local-dcq into dcqs >> >> done: >> >> >> Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. >> >> The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. >> >> There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). 
>> >> The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching c... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * make young gen length revising independent of refinement thread > * use a service task > * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update Commit https://github.com/openjdk/jdk/pull/23739/commits/5e76a516c848e75f56e966a1ffe4115b1dce786c implements the change to make young gen length revising independent of the refinement control thread. Infrastructure to determine currently available number of bytes for allocation and determining the next time the particular task should be redone is shared. It may be distributed across a bit more methods than I would prefer, but particularly the refinement control thread wants to reuse and keep some intermediate results (to not be required to get the `Heap_lock` again basically). I did not have a good reason to make the heuristic to determine the time to the next action different for both, so they are basically the same. There is some pre-existing problem that the minimum time for re-doing the work is ~50ms. That might be too short in some cases, but then again, if you have that short of a GC interval it may not be very useful to e.g. revise young gen length anyway. I think with this change all current concerns are addressed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23739#issuecomment-2739766880 From tschatzl at openjdk.org Thu Mar 20 09:44:07 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 20 Mar 2025 09:44:07 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v26] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. 
> > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * make young gen length revising independent of refinement thread * use a service task * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23739/files - new: https://git.openjdk.org/jdk/pull/23739/files/f419556e..5e76a516 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=24-25 Stats: 337 lines in 12 files changed: 237 ins; 90 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From xpeng at openjdk.org Thu Mar 20 20:02:52 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 20 Mar 2025 20:02:52 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v10] In-Reply-To: References: Message-ID: <21_ZHL-2drP3fW6JDDSIk0dEiZ0VzXxJht0ayo_0Vco=.f8205d1e-ebdb-4ec5-a9cc-e9b6982499b8@github.com> > There are some scenarios in which GenShen may have improper remembered set verification logic: > > 1. Concurrent young cycles following a Full GC: > > In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification > > > ShenandoahVerifier > ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { > shenandoah_assert_generations_reconciled(); > if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { > return _heap->complete_marking_context(); > } > return nullptr; > } > > > For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. > > 2. 
For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. > > 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. > > > ### Test > - [x] `make test TEST=hotspot_gc_shenandoah` Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: - Not verify remembered set w/o parseable old gen when old mark is incomplete - Use decode_raw instead of decode_not_null ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24092/files - new: https://git.openjdk.org/jdk/pull/24092/files/2b538632..feceef3e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=08-09 Stats: 25 lines in 2 files changed: 18 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24092.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24092/head:pull/24092 PR: https://git.openjdk.org/jdk/pull/24092 From xpeng at openjdk.org Thu Mar 20 20:06:58 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 20 Mar 2025 20:06:58 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v11] In-Reply-To: References: Message-ID: > There are some scenarios in which GenShen may have improper remembered set verification logic: > > 1. Concurrent young cycles following a Full GC: > > In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification > > > ShenandoahVerifier > ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { > shenandoah_assert_generations_reconciled(); > if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { > return _heap->complete_marking_context(); > } > return nullptr; > } > > > For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. > > 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. > > 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. 
> > > ### Test > - [x] `make test TEST=hotspot_gc_shenandoah` Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Fix wrong comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24092/files - new: https://git.openjdk.org/jdk/pull/24092/files/feceef3e..39707d15 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=09-10 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24092.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24092/head:pull/24092 PR: https://git.openjdk.org/jdk/pull/24092 From wkemper at openjdk.org Thu Mar 20 20:20:10 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 20 Mar 2025 20:20:10 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v11] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 20:06:58 GMT, Xiaolong Peng wrote: >> There are some scenarios in which GenShen may have improper remembered set verification logic: >> >> 1. Concurrent young cycles following a Full GC: >> >> In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification >> >> >> ShenandoahVerifier >> ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { >> shenandoah_assert_generations_reconciled(); >> if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { >> return _heap->complete_marking_context(); >> } >> return nullptr; >> } >> >> >> For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. >> >> 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. >> >> 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. >> >> >> ### Test >> - [x] `make test TEST=hotspot_gc_shenandoah` > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Fix wrong comments Couple of nits. src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 711: > 709: } > 710: > 711: if (ShenandoahVerify && heap->mode()->is_generational()) { Are we calling `verify_before_concmark` twice in generational mode? Should we delete this second call here? src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1377: > 1375: log_debug(gc)("Verifying remembered set at %s mark", old_generation->is_doing_mixed_evacuations() ? "mixed" : "young"); > 1376: > 1377: ShenandoahWriteTableScanner scanner(ShenandoahGenerationalHeap::heap()->old_generation()->card_scan()); Can use existing local variable: `old_generation`? ------------- Changes requested by wkemper (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24092#pullrequestreview-2703995429 PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2006389044 PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2006383550 From ysr at openjdk.org Thu Mar 20 22:21:10 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 20 Mar 2025 22:21:10 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v11] In-Reply-To: References: <_GG5htdXFZ2Jv3qTAyG6djSrvXDtGx-jTLGoA2JbEXU=.b8588ac1-e51f-4ddf-afda-c64e6a789440@github.com> Message-ID: <7pNU1UWNVucen0QwESfFkOiKIP59gBVZNF5gCHveOQ0=.144c0152-20a4-41f4-8929-b906a284be7f@github.com> On Tue, 18 Mar 2025 23:48:53 GMT, Kelvin Nilsen wrote: >> Thanks for for explanation, I have been reading and trying the understand how the remembered set works in GenShen. I wasn't sure whether this is actually right. >> >> In generational mode, if the GC cycle is global, the read table is already cleaned during reset phase, so remembered set verification from `verify_before_concmark` and `verify_before_update_refs` shouldn't work properly, I think the remembered set verification before mark and update references should be disabled, what do you think? Meanwhile, there is no need to clean read table during global cycle in generational mode. > > Ok. So we will always swap card tables, but we'll do it after verify-before-mark. To clarify the intention, after we swap card table, the write-table is all clean, and the read table holds whatever had been gathered prior to the start of GC. Young and bootstrap collection will update the write card table as a side effect of remembered set scanning. Global collection will update the card table as a side effect of global marking of old objects. I'd leave a comment to this effect (along the lines of Kelvin's last comment) here. Did we measure the impact of this change on performance? In particular it would seem that the number of dirty old cards might now reduce after a global gc compared to before this change. Ideally, this would be a change that would go in on its own. (There is no impact on correctness, since in the absence of this change, the dirty card set is an over-approximation.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2006539947 From xpeng at openjdk.org Thu Mar 20 22:39:24 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 20 Mar 2025 22:39:24 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v12] In-Reply-To: References: Message-ID: <6tFCTl-s2bUS0Tu3oqK0kMPx45J1JEru-tf0Ec0WMZc=.35a8b864-df0e-4adb-ab7c-f511ffa07e0b@github.com> > There are some scenarios in which GenShen may have improper remembered set verification logic: > > 1. Concurrent young cycles following a Full GC: > > In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification > > > ShenandoahVerifier > ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { > shenandoah_assert_generations_reconciled(); > if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { > return _heap->complete_marking_context(); > } > return nullptr; > } > > > For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. > > 2. 
For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. > > 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. > > > ### Test > - [x] `make test TEST=hotspot_gc_shenandoah` Xiaolong Peng has updated the pull request incrementally with two additional commits since the last revision: - Not validate remembered set w/o complete old marking - Use decode_raw_not_null instead of decode_raw ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24092/files - new: https://git.openjdk.org/jdk/pull/24092/files/39707d15..73e95e8a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=10-11 Stats: 25 lines in 2 files changed: 1 ins; 14 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/24092.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24092/head:pull/24092 PR: https://git.openjdk.org/jdk/pull/24092 From xpeng at openjdk.org Thu Mar 20 22:48:24 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 20 Mar 2025 22:48:24 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v13] In-Reply-To: References: Message-ID: > There are some scenarios in which GenShen may have improper remembered set verification logic: > > 1. Concurrent young cycles following a Full GC: > > In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification > > > ShenandoahVerifier > ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { > shenandoah_assert_generations_reconciled(); > if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { > return _heap->complete_marking_context(); > } > return nullptr; > } > > > For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. > > 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. > > 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. 
> > > ### Test > - [x] `make test TEST=hotspot_gc_shenandoah` Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: tide up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24092/files - new: https://git.openjdk.org/jdk/pull/24092/files/73e95e8a..16494d48 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=11-12 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24092.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24092/head:pull/24092 PR: https://git.openjdk.org/jdk/pull/24092 From xpeng at openjdk.org Thu Mar 20 23:20:09 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 20 Mar 2025 23:20:09 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v11] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 20:16:14 GMT, William Kemper wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix wrong comments > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 711: > >> 709: } >> 710: >> 711: if (ShenandoahVerify && heap->mode()->is_generational()) { > > Are we calling `verify_before_concmark` twice in generational mode? Should we delete this second call here? Thanks, the condition here is wrong. I have updated code only verify after swap read/write table. > src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1377: > >> 1375: log_debug(gc)("Verifying remembered set at %s mark", old_generation->is_doing_mixed_evacuations() ? "mixed" : "young"); >> 1376: >> 1377: ShenandoahWriteTableScanner scanner(ShenandoahGenerationalHeap::heap()->old_generation()->card_scan()); > > Can use existing local variable: `old_generation`? Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2006592728 PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2006590066 From xpeng at openjdk.org Thu Mar 20 23:24:10 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 20 Mar 2025 23:24:10 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v13] In-Reply-To: <7pNU1UWNVucen0QwESfFkOiKIP59gBVZNF5gCHveOQ0=.144c0152-20a4-41f4-8929-b906a284be7f@github.com> References: <_GG5htdXFZ2Jv3qTAyG6djSrvXDtGx-jTLGoA2JbEXU=.b8588ac1-e51f-4ddf-afda-c64e6a789440@github.com> <7pNU1UWNVucen0QwESfFkOiKIP59gBVZNF5gCHveOQ0=.144c0152-20a4-41f4-8929-b906a284be7f@github.com> Message-ID: On Thu, 20 Mar 2025 22:18:43 GMT, Y. Srinivas Ramakrishna wrote: >> Ok. So we will always swap card tables, but we'll do it after verify-before-mark. To clarify the intention, after we swap card table, the write-table is all clean, and the read table holds whatever had been gathered prior to the start of GC. Young and bootstrap collection will update the write card table as a side effect of remembered set scanning. Global collection will update the card table as a side effect of global marking of old objects. > > I'd leave a comment to this effect (along the lines of Kelvin's last comment) here. Did we measure the impact of this change on performance? In particular it would seem that the number of dirty old cards might now reduce after a global gc compared to before this change. > > Ideally, this would be a change that would go in on its own. 
(There is no impact on correctness, since in the absence of this change, the dirty card set is an over-approximation.) It is a bit hard to measure the impact on performance I think, but given the rem-set is more accurate, there shouldn't be any performance regression. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2006595141 From Monica.Beckwith at microsoft.com Fri Mar 21 00:19:29 2025 From: Monica.Beckwith at microsoft.com (Monica Beckwith) Date: Fri, 21 Mar 2025 00:19:29 +0000 Subject: Moving Forward with AHS for G1 Message-ID: Hi all, Following up on the previous discussions around Automatic Heap Sizing (AHS) for G1, I wanted to summarize the key takeaways and outline the next steps. In my November message [1], I described how AHS could dynamically manage heap sizing based on multiple inputs, including global memory pressure, GCTimeRatio policy, and user-defined heap tunables. This aligns with Thomas's summary [2], which outlines how AHS integrates with G1's existing mechanisms, including CPU-based heap resizing (JDK-8238687) [3], external constraints like CurrentMaxHeapSize (JDK-8204088) [4], and SoftMaxHeapSize (JDK-8236073) [5] as a key influence on heap adjustments. AHS will operate as a broader mechanism, where SoftMaxHeapSize serves as a heuristic to guide memory management but does not impose strict limits. It will work in conjunction with CPU-based heuristics to manage heap growth and contraction efficiently. Google's previous work on ProposedHeapSize for G1 contributed valuable insights into adaptive heap management, but as discussions evolved, the consensus has shifted toward a model centered on SoftMaxHeapSize as a guiding input within AHS. Given this consensus, I will proceed with the implementation of JDK-8236073 [5] to ensure that AHS integrates effectively with G1's dynamic heap sizing policies. I will share updates as the work progresses. If there are any additional concerns or areas where further clarification is needed, please let me know. Thanks again for the valuable discussions. Best, Monica ________________________________ References [1] Monica Beckwith, "Clarifications on AHS behavior and its role in G1," OpenJDK hotspot-gc-dev mailing list, November 2024. [https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050191.html] [2] Thomas Schatzl, "Giving a rough summary about the system envisioned for G1," OpenJDK hotspot-gc-dev mailing list, February 2025. [https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-February/051069.html] [3] OpenJDK Issue, "JDK-8238687: Improve CPU-based heap sizing in G1," OpenJDK Bug Database. [https://bugs.openjdk.org/browse/JDK-8238687] [4] OpenJDK Issue, "JDK-8204088: Introduce CurrentMaxHeapSize to allow external heap constraints," OpenJDK Bug Database. [https://bugs.openjdk.org/browse/JDK-8204088] [5] OpenJDK Issue, "JDK-8236073: Introduce SoftMaxHeapSize as a guide for G1 AHS," OpenJDK Bug Database. [https://bugs.openjdk.org/browse/JDK-8236073]
From xpeng at openjdk.org Fri Mar 21 00:25:14 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 21 Mar 2025 00:25:14 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v13] In-Reply-To: References: <_GG5htdXFZ2Jv3qTAyG6djSrvXDtGx-jTLGoA2JbEXU=.b8588ac1-e51f-4ddf-afda-c64e6a789440@github.com> <7pNU1UWNVucen0QwESfFkOiKIP59gBVZNF5gCHveOQ0=.144c0152-20a4-41f4-8929-b906a284be7f@github.com> Message-ID: <2jgKqoBrD8WfxKs9cLqfzWa5AMS__muV4O0IxTiWFbA=.b41179e5-5252-476f-8bc9-c42e2a6a507b@github.com> On Thu, 20 Mar 2025 23:21:37 GMT, Xiaolong Peng wrote: >> I'd leave a comment to this effect (along the lines of Kelvin's last comment) here. Did we measure the impact of this change on performance? In particular it would seem that the number of dirty old cards might now reduce after a global gc compared to before this change. >> >> Ideally, this would be a change that would go in on its own. (There is no impact on correctness, since in the absence of this change, the dirty card set is an over-approximation.) > > It is a bit hard to measure the impact on performance I think, but given the rem-set is more accurate, there shouldn't be any performance regression. I'll add comment here as you are suggesting. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2006650222 From ysr at openjdk.org Fri Mar 21 04:03:17 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 21 Mar 2025 04:03:17 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v13] In-Reply-To: <2jgKqoBrD8WfxKs9cLqfzWa5AMS__muV4O0IxTiWFbA=.b41179e5-5252-476f-8bc9-c42e2a6a507b@github.com> References: <_GG5htdXFZ2Jv3qTAyG6djSrvXDtGx-jTLGoA2JbEXU=.b8588ac1-e51f-4ddf-afda-c64e6a789440@github.com> <7pNU1UWNVucen0QwESfFkOiKIP59gBVZNF5gCHveOQ0=.144c0152-20a4-41f4-8929-b906a284be7f@github.com> <2jgKqoBrD8WfxKs9cLqfzWa5AMS__muV4O0IxTiWFbA=.b41179e5-5252-476f-8bc9-c42e2a6a507b@github.com> Message-ID: On Fri, 21 Mar 2025 00:22:46 GMT, Xiaolong Peng wrote: >> It is a bit hard to measure the impact on performance I think, but given the rem-set is more accurate, there shouldn't be any performance regression. > > I'll add comment here as you are suggesting. I was suggesting looking to see if normal perf measures showed any improvements. E.g. if you ran say SPECjbb and compared the remset scan times for the minor GC's that followed global collections. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2006806542 From tschatzl at openjdk.org Fri Mar 21 07:56:44 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 21 Mar 2025 07:56:44 GMT Subject: RFR: 8351921: G1: Pinned regions with pinned objects only reachable by native code crash VM Message-ID: Hi all, please review this clean backout of "8351921: G1: Pinned regions with pinned objects only reachable by native code crash VM". The original patch is incomplete, and the new patch is basically a rewrite apart from the regression test. This reverts commit 558c015c995dbe65d876c1c5761030588773271c.
Thanks, Thomas ------------- Commit messages: - 8351921 Changes: https://git.openjdk.org/jdk/pull/24146/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24146&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351921 Stats: 123 lines in 4 files changed: 7 ins; 108 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24146.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24146/head:pull/24146 PR: https://git.openjdk.org/jdk/pull/24146 From thomas.schatzl at oracle.com Fri Mar 21 08:30:11 2025 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 21 Mar 2025 09:30:11 +0100 Subject: [EXTERNAL] Re: RFC: G1 as default collector (for real this time) In-Reply-To: References: <74d05686-9c57-4262-881d-31c269f34bc5@oracle.com> <61CEE33A-6718-479D-A498-697C1063B5AA@oracle.com> Message-ID: <6cee0300-89d7-4363-80a8-87bc0f5dcd1e@oracle.com> Hi Monica, thanks for posting about your observation! On 18.03.25 00:59, Monica Beckwith wrote: > Hi Thomas, Erik, and all, > > This is an important and timely discussion, and I appreciate the > insights on how the gap between SerialGC and G1GC has diminished over > time. Based on recent comparative tests of out-of-the-box GC > configurations (-Xmx only), I wanted to share some data-backed > observations that might help validate this shift. > > I tested G1GC and SerialGC under 1-core/2GB and 2-core/2GB containerized > environments (512MB < -Xmx <1.5GB), running SPECJBB2015 with and without > stress tests. The key findings: > > *Throughput (max_jOPS & critical_jOPS):* > > * > G1GC consistently outperforms SerialGC. > * > 1 core: G1GC shows a 1.78? increase in max_jOPS. > * > 2 cores: G1GC shows a 2.84? improvement over SerialGC. > > > *Latency and Stop-the-World (STW) Impact:* > > * > SerialGC struggles under stress, with frequent full GCs leading to > long pauses. > * > G1GC?s incremental?collections keep pause times lower, especially > under stress load. > * > critical_jOPS, a key SLA metric, is 4.5? higher for G1GC on 2 cores. > > > *Memory Behavior & Stability:* > > * > In 512MB heap configurations, SerialGC encountered OOM failures due > to heap exhaustion. > > > Given these results, it seems reasonable to reconsider why SerialGC > remains the default in small environments when G1GC offers clear > performance and stability advantages. > > Looking forward to thoughts on this. this is somewhat what we have noticed too: Serial (and maybe to some more extent Parallel) are good as long as there are no full gcs; if there are, G1 fairly quickly becomes competitive, even more so with increasing full gc frequency, at some point overtaking it. Serial could certainly be better with some manual generation size tuning, but the suggestion is about the default collector choice. Imo that one should largely be about default settings. The situation is becoming even more in favor of G1 with the new "throughput" barriers (JDK-8342382). Thomas From shade at openjdk.org Fri Mar 21 09:07:08 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 21 Mar 2025 09:07:08 GMT Subject: RFR: 8352584: [Backout] G1: Pinned regions with pinned objects only reachable by native code crash VM In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 07:51:43 GMT, Thomas Schatzl wrote: > Hi all, > > please review this clean backout of "8351921: G1: Pinned regions with pinned objects only reachable by native code crash VM". > > The original patch is incomplete, and the new patch is basically a rewrite apart from the regression test. 
> > This reverts commit 558c015c995dbe65d876c1c5761030588773271c. > > Thanks, > Thomas Trivial. Go! ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24146#pullrequestreview-2705203412 From tschatzl at openjdk.org Fri Mar 21 09:52:17 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 21 Mar 2025 09:52:17 GMT Subject: RFR: 8352584: [Backout] G1: Pinned regions with pinned objects only reachable by native code crash VM In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 09:04:44 GMT, Aleksey Shipilev wrote: >> Hi all, >> >> please review this clean backout of "8351921: G1: Pinned regions with pinned objects only reachable by native code crash VM". >> >> The original patch is incomplete, and the new patch is basically a rewrite apart from the regression test. >> >> This reverts commit 558c015c995dbe65d876c1c5761030588773271c. >> >> Thanks, >> Thomas > > Trivial. Go! Thanks @shipilev for your review ------------- PR Comment: https://git.openjdk.org/jdk/pull/24146#issuecomment-2742856120 From tschatzl at openjdk.org Fri Mar 21 10:04:20 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 21 Mar 2025 10:04:20 GMT Subject: Integrated: 8352584: [Backout] G1: Pinned regions with pinned objects only reachable by native code crash VM In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 07:51:43 GMT, Thomas Schatzl wrote: > Hi all, > > please review this clean backout of "8351921: G1: Pinned regions with pinned objects only reachable by native code crash VM". > > The original patch is incomplete, and the new patch is basically a rewrite apart from the regression test. > > This reverts commit 558c015c995dbe65d876c1c5761030588773271c. > > Thanks, > Thomas This pull request has now been integrated. Changeset: b545b9e7 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/b545b9e79ea6a1e616c35e584f333b47bd7bd6d6 Stats: 123 lines in 4 files changed: 7 ins; 108 del; 8 mod 8352584: [Backout] G1: Pinned regions with pinned objects only reachable by native code crash VM Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/24146 From tschatzl at openjdk.org Fri Mar 21 11:16:57 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 21 Mar 2025 11:16:57 GMT Subject: RFR: 8352508: [Redo] G1: Pinned regions with pinned objects only reachable by native code crash VM In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 08:07:35 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that re-implements the fix for [JDK-8351921](https://bugs.openjdk.org/browse/JDK-8351921); in that fix we (think we) forgot to consider the same situation with optional regions. > > I.e. the previous fix only fixed the situation occurring during initial evacuation, however as we add regions due to optional evacuation, the same situation can still happen. > > So this change adds some work to every evacuation phase that marks all pinned regions in the current collection set as evacuation failed/pinned instead of only doing this work once in the pre evacuation phase. > > As for testing, it is extremely hard to induce a situation where there is a pinned region with no apparent live objects in an optional collection set, so I gave up and just added the original test again. 
> > Testing: gha, test > > Thanks, > Thomas [This test](https://github.com/openjdk/jdk/compare/master...tschatzl:jdk:8351921a-induced-test-failure?expand=1) using some VM hacking (also in that patch) shows the additional failure in optional evacuation. I.e. with the old fix, the `TestPinnedEvacEmpty.java` test succeeds while the new test `TestPinnedEvacEmptyOptional.java` fails. This change, with the appropriate hack, passes both tests. Fwiw, I asked the reporter to try this fix on their private failing test, going to wait for their result too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24147#issuecomment-2743050999 PR Comment: https://git.openjdk.org/jdk/pull/24147#issuecomment-2743053802 From tschatzl at openjdk.org Fri Mar 21 11:16:56 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 21 Mar 2025 11:16:56 GMT Subject: RFR: 8352508: [Redo] G1: Pinned regions with pinned objects only reachable by native code crash VM Message-ID: Hi all, please review this change that re-implements the fix for [JDK-8351921](https://bugs.openjdk.org/browse/JDK-8351921); in that fix we (think we) forgot to consider the same situation with optional regions. I.e. the previous fix only fixed the situation occurring during initial evacuation, however as we add regions due to optional evacuation, the same situation can still happen. So this change adds some work to every evacuation phase that marks all pinned regions in the current collection set as evacuation failed/pinned instead of only doing this work once in the pre evacuation phase. As for testing, it is extremely hard to induce a situation where there is a pinned region with no apparent live objects in an optional collection set, so I gave up and just added the original test again. Testing: gha, test Thanks, Thomas ------------- Commit messages: - * latest merge from master added live bytes assert again - * re-added test after merge - Merge branch 'master' into 8352508-pinned-regions-crash-optional-regions - * fix merge error after factoring out backout fix - 8351921 - 8351921 Changes: https://git.openjdk.org/jdk/pull/24147/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24147&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352508 Stats: 139 lines in 5 files changed: 126 ins; 11 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24147/head:pull/24147 PR: https://git.openjdk.org/jdk/pull/24147 From tschatzl at openjdk.org Fri Mar 21 14:20:34 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 21 Mar 2025 14:20:34 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v27] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more resemble Parallel GC's as described in the JEP. The reason is that G1 lacks in throughput compared to Parallel/Serial GC due to larger barrier. 
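For comparison, a minimal standalone sketch of the card-marking post-write barrier that Serial and Parallel use, and that this change moves G1 towards; this is an illustration rather than HotSpot code, and the 512-byte card size and the biased table base are assumptions:

    #include <cstdint>

    static const int kCardShift = 9;       // assumes 512-byte cards
    static std::uint8_t* card_table_base;  // assumes a base biased so that (addr >> kCardShift) indexes it

    inline void post_write_barrier(void* field_addr) {
      // Dirty the card covering the written field: no filtering, no StoreLoad, no enqueueing.
      card_table_base[reinterpret_cast<std::uintptr_t>(field_addr) >> kCardShift] = 0;
    }

Everything the quoted G1 barrier below does beyond this single store (filtering, synchronization, card tracking) is work this change aims to move out of the mutator.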
> > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards. Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits: - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq - * make young gen length revising independent of refinement thread * use a service task * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update - * fix IR code generation tests that change due to barrier cost changes - * factor out card table and refinement table merging into a single method - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 - * obsolete G1UpdateBufferSize G1UpdateBufferSize has previously been used to size the refinement buffers and impose a minimum limit on the number of cards per thread that need to be pending before refinement starts. The former function is now obsolete with the removal of the dirty card queues, the latter functionality has been taken over by the new diagnostic option `G1PerThreadPendingCardThreshold`. I prefer to make this a diagnostic option is better than a product option because it is something that is only necessary for some test cases to produce some otherwise unwanted behavior (continuous refinement). CSR is pending. 
- * more documentation on why we need to rendezvous the gc threads - Merge branch 'master' into 8342381-card-table-instead-of-dcq - * ayang review * re-add STS leaver for java thread handshake - * when aborting refinement during full collection, the global card table and the per-thread card table might not be in sync. Roll forward during abort of the refinement in these situations. * additional verification * added some missing ResourceMarks in asserts * added variant of ArrayJuggle2 that crashes fairly quickly without these changes - ... and 25 more: https://git.openjdk.org/jdk/compare/0cb110eb...d9311047 ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=26 Stats: 7089 lines in 110 files changed: 2610 ins; 3555 del; 924 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From xpeng at openjdk.org Fri Mar 21 15:11:19 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 21 Mar 2025 15:11:19 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v13] In-Reply-To: References: <_GG5htdXFZ2Jv3qTAyG6djSrvXDtGx-jTLGoA2JbEXU=.b8588ac1-e51f-4ddf-afda-c64e6a789440@github.com> <7pNU1UWNVucen0QwESfFkOiKIP59gBVZNF5gCHveOQ0=.144c0152-20a4-41f4-8929-b906a284be7f@github.com> <2jgKqoBrD8WfxKs9cLqfzWa5AMS__muV4O0IxTiWFbA=.b41179e5-5252-476f-8bc9-c42e2a6a507b@github.com> Message-ID: On Fri, 21 Mar 2025 04:00:17 GMT, Y. Srinivas Ramakrishna wrote: >> I'll add comment here as you are suggesting. > > I was suggesting looking to see if normal perf measures showed any improvements. E.g. if you ran say SPECjbb and compared the remset scan times for the minor GC's that followed global collections. 
I have run h2 benchmark, here is the remembered set scan times after a global GC, it does seem to improve remembered set scan time in this case: PR version: [2025-03-21T07:35:41.801+0000][10.292s][19715][info ][gc ] GC(6) Concurrent remembered set scanning 13.069ms [2025-03-21T07:35:48.088+0000][16.579s][19715][info ][gc ] GC(9) Concurrent remembered set scanning 5.537ms [2025-03-21T07:35:56.610+0000][25.101s][19715][info ][gc ] GC(14) Concurrent remembered set scanning 6.186ms [2025-03-21T07:36:03.967+0000][32.459s][19715][info ][gc ] GC(18) Concurrent remembered set scanning 9.562ms [2025-03-21T07:36:11.234+0000][39.725s][19715][info ][gc ] GC(22) Concurrent remembered set scanning 2.591ms [2025-03-21T07:36:17.303+0000][45.794s][19715][info ][gc ] GC(25) Concurrent remembered set scanning 0.999ms [2025-03-21T07:36:25.647+0000][54.139s][19715][info ][gc ] GC(30) Concurrent remembered set scanning 1.665ms [2025-03-21T07:36:32.790+0000][61.281s][19715][info ][gc ] GC(33) Concurrent remembered set scanning 2.851ms [2025-03-21T07:36:40.241+0000][68.732s][19715][info ][gc ] GC(36) Concurrent remembered set scanning 0.716ms [2025-03-21T07:36:47.440+0000][75.931s][19715][info ][gc ] GC(39) Concurrent remembered set scanning 1.932ms master: [2025-03-21T07:34:04.978+0000][10.765s][17923][info ][gc ] GC(6) Concurrent remembered set scanning 22.813ms [2025-03-21T07:34:11.250+0000][17.038s][17923][info ][gc ] GC(9) Concurrent remembered set scanning 14.457ms [2025-03-21T07:34:18.692+0000][24.480s][17923][info ][gc ] GC(14) Concurrent remembered set scanning 4.972ms [2025-03-21T07:34:26.033+0000][31.820s][17923][info ][gc ] GC(18) Concurrent remembered set scanning 9.134ms [2025-03-21T07:34:34.416+0000][40.203s][17923][info ][gc ] GC(22) Concurrent remembered set scanning 3.655ms [2025-03-21T07:34:42.180+0000][47.967s][17923][info ][gc ] GC(26) Concurrent remembered set scanning 3.253ms [2025-03-21T07:34:49.371+0000][55.168s][17923][info ][gc ] GC(29) Concurrent remembered set scanning 1.615ms [2025-03-21T07:34:56.592+0000][62.396s][17923][info ][gc ] GC(32) Concurrent remembered set scanning 1.570ms [2025-03-21T07:35:03.766+0000][69.575s][17923][info ][gc ] GC(35) Concurrent remembered set scanning 1.040ms [2025-03-21T07:35:10.941+0000][76.753s][17923][info ][gc ] GC(38) Concurrent remembered set scanning 1.947ms ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2007788818 From rkennke at openjdk.org Fri Mar 21 15:56:21 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 21 Mar 2025 15:56:21 GMT Subject: RFR: 8352091: GenShen: assert(!(request.generation->is_old() && _heap->old_generation()->is_doing_mixed_evacuations())) failed: Old heuristic should not request cycles while it waits for mixed evacuation In-Reply-To: References: Message-ID: On Fri, 14 Mar 2025 23:45:28 GMT, William Kemper wrote: > Consider the following: > 1. Regulator thread sees that control thread is `idle` and requests an old cycle > 2. Regulator thread waits until control thread is not `idle` > 3. Control thread starts old cycle and notifies the Regulator thread (as expected) > 4. Regulator thread stays off CPU for a _long_ time > 5. Control thread _completes_ old marking and returns to `idle` state > 6. Regulator thread finally wakes up and sees that Control thread is _still_ idle > 7. In fact, the control thread has completed old marking and the regulator thread should not request another cycle Looks reasonable. ------------- Marked as reviewed by rkennke (Reviewer). 
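To make the race described above concrete, a small self-contained sketch of the kind of check that avoids it: compare a completed-cycle count captured when the request was made instead of relying only on the control thread's idle state. The names and structure here are hypothetical and are not the actual two-line fix:

    #include <atomic>
    #include <cstddef>

    struct ControlThreadState {
      std::atomic<bool> idle{true};
      std::atomic<std::size_t> completed_cycles{0};
    };

    // Called by the regulator thread after it wakes up again.
    bool should_request_old_cycle(const ControlThreadState& ctl, std::size_t cycles_seen_at_request) {
      // If the requested cycle started and finished while the regulator was off CPU,
      // finding the control thread idle again is not a reason to request another cycle.
      if (ctl.completed_cycles.load() > cycles_seen_at_request) {
        return false;
      }
      return ctl.idle.load();
    }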
PR Review: https://git.openjdk.org/jdk/pull/24069#pullrequestreview-2706448918 From wkemper at openjdk.org Fri Mar 21 16:07:20 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 21 Mar 2025 16:07:20 GMT Subject: Integrated: 8352091: GenShen: assert(!(request.generation->is_old() && _heap->old_generation()->is_doing_mixed_evacuations())) failed: Old heuristic should not request cycles while it waits for mixed evacuation In-Reply-To: References: Message-ID: <1XBrk_Rxi-fGi6YyFJ2xvw9Gaq-5y3pcVidNHeeTDgE=.927d9df1-8788-40fe-b92d-6457da3e3ca1@github.com> On Fri, 14 Mar 2025 23:45:28 GMT, William Kemper wrote: > Consider the following: > 1. Regulator thread sees that control thread is `idle` and requests an old cycle > 2. Regulator thread waits until control thread is not `idle` > 3. Control thread starts old cycle and notifies the Regulator thread (as expected) > 4. Regulator thread stays off CPU for a _long_ time > 5. Control thread _completes_ old marking and returns to `idle` state > 6. Regulator thread finally wakes up and sees that Control thread is _still_ idle > 7. In fact, the control thread has completed old marking and the regulator thread should not request another cycle This pull request has now been integrated. Changeset: 52c6ce6c Author: William Kemper URL: https://git.openjdk.org/jdk/commit/52c6ce6c73194762970fd9521121333713495fa3 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8352091: GenShen: assert(!(request.generation->is_old() && _heap->old_generation()->is_doing_mixed_evacuations())) failed: Old heuristic should not request cycles while it waits for mixed evacuation Reviewed-by: rkennke ------------- PR: https://git.openjdk.org/jdk/pull/24069 From xpeng at openjdk.org Fri Mar 21 22:09:32 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 21 Mar 2025 22:09:32 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId Message-ID: ### Root cause Shenandoah has its own way to generate gc id([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L234), [link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahController.hpp#L43)), but when it runs a specific GC cycle, it still use the default GCIdMark([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L389)) to generate a gc id and set it to NamedThread::_gc_id. Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is `undefined`, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in JFR. ### Solution it is confusing that Shenandoah generates its own gc id but not use it for GC logging and JFR, the solution is fairly simple, the control thread just need inject gc id with GCIdMark(gc_id) it generates in `ShenandoahControlThread::run_service` and `ShenandoahGenerationalControlThread::run_gc_cycle` In the test, I also noticed the value of gc_id generated by Shenandoah control thread starts from 1, which is different from the default behavior of GCIdMark which generates id starting from 0, this PR will also fix it. 
### Test - [x] TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording" - [ ] TEST=hotspot_gc_shenandoah ------------- Commit messages: - Add static cast from size_t to uint - Rename _gc_id of ShenandoahController to _gc_count, and gc id will be derived from _gc_count - tide up - gc_id should start from 0 - GenShen: Enabling JFR asserts when getting GCId Changes: https://git.openjdk.org/jdk/pull/24166/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24166&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352588 Stats: 37 lines in 4 files changed: 11 ins; 7 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/24166.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24166/head:pull/24166 PR: https://git.openjdk.org/jdk/pull/24166 From xpeng at openjdk.org Fri Mar 21 22:09:32 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 21 Mar 2025 22:09:32 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId In-Reply-To: References: Message-ID: <8wQiOKS3dt30v5KKmBI-YFk0KRsFSdakjaJfpvHU8ow=.9702993f-d73f-4c49-ba7c-57c9dcae87d0@github.com> On Fri, 21 Mar 2025 19:09:46 GMT, Xiaolong Peng wrote: > ### Root cause > Shenandoah has its own way to generate gc id([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L234), [link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahController.hpp#L43)), but when it runs a specific GC cycle, it still use the default GCIdMark([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L389)) to generate a gc id and set it to NamedThread::_gc_id. Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is `undefined`, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in JFR. > > ### Solution > it is confusing that Shenandoah generates its own gc id but not use it for GC logging and JFR, the solution is fairly simple, the control thread just need inject gc id with GCIdMark(gc_id) it generates in `ShenandoahControlThread::run_service` and `ShenandoahGenerationalControlThread::run_gc_cycle` > > In the test, I also noticed the value of gc_id generated by Shenandoah control thread starts from 1, which is different from the default behavior of GCIdMark which generates id starting from 0, this PR will also fix it. > > ### Test > - [x] TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording" > - [ ] TEST=hotspot_gc_shenandoah src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 137: > 135: // GC is starting, bump the internal gc count and set GCIdMark > 136: update_gc_count(); > 137: GCIdMark gc_id_mark(static_cast(get_gc_id())); static cast from size_t to uint here since GCIdMark use uint. If needed, I can change the data type of gc id and count in ShenandoahController to uint, but need to update more files. 
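For reference, the difference between the two casts can be shown with a small standalone helper in the spirit of HotSpot's `checked_cast` (an illustration, not the real helper): a plain `static_cast` silently truncates on overflow, while a checked narrowing cast asserts that the value still fits.

    #include <cassert>
    #include <cstddef>
    #include <limits>

    // Narrow a size_t to uint, asserting (in debug builds) that nothing is lost.
    inline unsigned int checked_narrow(std::size_t v) {
      assert(v <= std::numeric_limits<unsigned int>::max() && "gc count does not fit in uint");
      return static_cast<unsigned int>(v);
    }

    // Hypothetical usage mirroring the line quoted above:
    //   GCIdMark gc_id_mark(checked_narrow(get_gc_count()));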
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24166#discussion_r2008389233 From wkemper at openjdk.org Fri Mar 21 22:31:07 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 21 Mar 2025 22:31:07 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 19:09:46 GMT, Xiaolong Peng wrote: > ### Root cause > Shenandoah has its own way to generate gc id([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L234), [link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahController.hpp#L43)), but when it runs a specific GC cycle, it still use the default GCIdMark([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L389)) to generate a gc id and set it to NamedThread::_gc_id. Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is `undefined`, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events. > > ### Solution > it is confusing that Shenandoah generates its own gc id but not use it for GC logging and JFR, the solution is fairly simple, the control thread just need inject gc id with GCIdMark(gc_id) it generates in `ShenandoahControlThread::run_service` and `ShenandoahGenerationalControlThread::run_gc_cycle` > > In the test, I also noticed the value of gc_id generated by Shenandoah control thread starts from 1, which is different from the default behavior of GCIdMark which generates id starting from 0, this PR will also fix it. > > ### Test > - [x] TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording" > - [x] TEST=hotspot_gc_shenandoah Changes requested by wkemper (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahController.cpp line 50: > 48: } > 49: > 50: size_t ShenandoahController::get_gc_id() { Do we need to keep this method? Can't everything just use `get_gc_count` now? ------------- PR Review: https://git.openjdk.org/jdk/pull/24166#pullrequestreview-2707440082 PR Review Comment: https://git.openjdk.org/jdk/pull/24166#discussion_r2008418183 From wkemper at openjdk.org Fri Mar 21 22:31:08 2025 From: wkemper at openjdk.org (William Kemper) Date: Fri, 21 Mar 2025 22:31:08 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId In-Reply-To: <8wQiOKS3dt30v5KKmBI-YFk0KRsFSdakjaJfpvHU8ow=.9702993f-d73f-4c49-ba7c-57c9dcae87d0@github.com> References: <8wQiOKS3dt30v5KKmBI-YFk0KRsFSdakjaJfpvHU8ow=.9702993f-d73f-4c49-ba7c-57c9dcae87d0@github.com> Message-ID: On Fri, 21 Mar 2025 22:01:40 GMT, Xiaolong Peng wrote: >> ### Root cause >> Shenandoah has its own way to generate gc id([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L234), [link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahController.hpp#L43)), but when it runs a specific GC cycle, it still use the default GCIdMark([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L389)) to generate a gc id and set it to NamedThread::_gc_id. 
Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is `undefined`, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events. >> >> ### Solution >> it is confusing that Shenandoah generates its own gc id but not use it for GC logging and JFR, the solution is fairly simple, the control thread just need inject gc id with GCIdMark(gc_id) it generates in `ShenandoahControlThread::run_service` and `ShenandoahGenerationalControlThread::run_gc_cycle` >> >> In the test, I also noticed the value of gc_id generated by Shenandoah control thread starts from 1, which is different from the default behavior of GCIdMark which generates id starting from 0, this PR will also fix it. >> >> ### Test >> - [x] TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording" >> - [x] TEST=hotspot_gc_shenandoah > > src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 137: > >> 135: // GC is starting, bump the internal gc count and set GCIdMark >> 136: update_gc_count(); >> 137: GCIdMark gc_id_mark(static_cast(get_gc_id())); > > static cast from size_t to uint here since GCIdMark use uint. > If needed, I can change the data type of gc id and count in ShenandoahController to uint, but need to update more files. `static_cast` is fine, but `checked_cast` would be better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24166#discussion_r2008416303 From manc at google.com Fri Mar 21 22:54:19 2025 From: manc at google.com (Man Cao) Date: Fri, 21 Mar 2025 15:54:19 -0700 Subject: Moving Forward with AHS for G1 In-Reply-To: References: Message-ID: Thank you for the summary and volunteering on this work! Apology for the lack of response from our side, due to other tasks and priorities. I have been experimenting with implementing SoftMaxHeapSize for G1 (JDK-8236073), and using this knob instead of ProposedHeapSize for Google's AHS project. I could probably send out a Github PR next week. >From our side, we would really like to make sure the implementation for SoftMaxHeapSize (JDK-8236073) and CurrentMaxHeapSize (JDK-8204088) work with Google's AHS project. It would be more effective if we could test with our internal workload and benchmarks during development. Is it OK if I pick up the work for SoftMaxHeapSize (JDK-8236073) and CurrentMaxHeapSize (JDK-8204088)? -Man -------------- next part -------------- An HTML attachment was scrubbed... URL: From xpeng at openjdk.org Fri Mar 21 23:13:07 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 21 Mar 2025 23:13:07 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 22:27:52 GMT, William Kemper wrote: >> ### Root cause >> Shenandoah has its own way to generate gc id([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L234), [link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahController.hpp#L43)), but when it runs a specific GC cycle, it still use the default GCIdMark([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L389)) to generate a gc id and set it to NamedThread::_gc_id. 
Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is `undefined`, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events. >> >> ### Solution >> it is confusing that Shenandoah generates its own gc id but not use it for GC logging and JFR, the solution is fairly simple, the control thread just need inject gc id with GCIdMark(gc_id) it generates in `ShenandoahControlThread::run_service` and `ShenandoahGenerationalControlThread::run_gc_cycle` >> >> In the test, I also noticed the value of gc_id generated by Shenandoah control thread starts from 1, which is different from the default behavior of GCIdMark which generates id starting from 0, this PR will also fix it. >> >> ### Test >> - [x] TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording" >> - [x] TEST=hotspot_gc_shenandoah > > src/hotspot/share/gc/shenandoah/shenandoahController.cpp line 50: > >> 48: } >> 49: >> 50: size_t ShenandoahController::get_gc_id() { > > Do we need to keep this method? Can't everything just use `get_gc_count` now? We don't have to keep it, it needs a bit more changes to touch up. I think it better to remove it to avoid the confusion with the gc_id() method, ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24166#discussion_r2008461020 From xpeng at openjdk.org Fri Mar 21 23:26:46 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 21 Mar 2025 23:26:46 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v2] In-Reply-To: References: Message-ID: > ### Root cause > Shenandoah has its own way to generate gc id([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L234), [link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahController.hpp#L43)), but when it runs a specific GC cycle, it still use the default GCIdMark([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L389)) to generate a gc id and set it to NamedThread::_gc_id. Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is `undefined`, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events. > > ### Solution > it is confusing that Shenandoah generates its own gc id but not use it for GC logging and JFR, the solution is fairly simple, the control thread just need inject gc id with GCIdMark(gc_id) it generates in `ShenandoahControlThread::run_service` and `ShenandoahGenerationalControlThread::run_gc_cycle` > > In the test, I also noticed the value of gc_id generated by Shenandoah control thread starts from 1, which is different from the default behavior of GCIdMark which generates id starting from 0, this PR will also fix it. 
> > ### Test > - [x] TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording" > - [x] TEST=hotspot_gc_shenandoah Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Remove get_gc_id() method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24166/files - new: https://git.openjdk.org/jdk/pull/24166/files/13cea142..b091159e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24166&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24166&range=00-01 Stats: 32 lines in 7 files changed: 0 ins; 8 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/24166.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24166/head:pull/24166 PR: https://git.openjdk.org/jdk/pull/24166 From xpeng at openjdk.org Fri Mar 21 23:26:46 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 21 Mar 2025 23:26:46 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v2] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 23:10:48 GMT, Xiaolong Peng wrote: >> src/hotspot/share/gc/shenandoah/shenandoahController.cpp line 50: >> >>> 48: } >>> 49: >>> 50: size_t ShenandoahController::get_gc_id() { >> >> Do we need to keep this method? Can't everything just use `get_gc_count` now? > > We don't have to keep it, it needs a bit more changes to touch up. I think it better to remove it to avoid the confusion with the gc_id() method, I have removed method ShenandoahController::get_gc_id() in the update. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24166#discussion_r2008466559 From xpeng at openjdk.org Fri Mar 21 23:29:22 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 21 Mar 2025 23:29:22 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v3] In-Reply-To: References: Message-ID: > ### Root cause > Shenandoah has its own way to generate gc id([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L234), [link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahController.hpp#L43)), but when it runs a specific GC cycle, it still use the default GCIdMark([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L389)) to generate a gc id and set it to NamedThread::_gc_id. Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is `undefined`, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events. > > ### Solution > it is confusing that Shenandoah generates its own gc id but not use it for GC logging and JFR, the solution is fairly simple, the control thread just need inject gc id with GCIdMark(gc_id) it generates in `ShenandoahControlThread::run_service` and `ShenandoahGenerationalControlThread::run_gc_cycle` > > In the test, I also noticed the value of gc_id generated by Shenandoah control thread starts from 1, which is different from the default behavior of GCIdMark which generates id starting from 0, this PR will also fix it. 
> > ### Test > - [x] TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording" > - [x] TEST=hotspot_gc_shenandoah Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: touch up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24166/files - new: https://git.openjdk.org/jdk/pull/24166/files/b091159e..1920cf09 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24166&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24166&range=01-02 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24166.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24166/head:pull/24166 PR: https://git.openjdk.org/jdk/pull/24166 From tschatzl at openjdk.org Mon Mar 24 10:14:51 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 24 Mar 2025 10:14:51 GMT Subject: RFR: 8352508: [Redo] G1: Pinned regions with pinned objects only reachable by native code crash VM [v3] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that re-implements the fix for [JDK-8351921](https://bugs.openjdk.org/browse/JDK-8351921); in that fix we (think we) forgot to consider the same situation with optional regions. > > I.e. the previous fix only fixed the situation occurring during initial evacuation, however as we add regions due to optional evacuation, the same situation can still happen. > > So this change adds some work to every evacuation phase that marks all pinned regions in the current collection set as evacuation failed/pinned instead of only doing this work once in the pre evacuation phase. > > As for testing, it is extremely hard to induce a situation where there is a pinned region with no apparent live objects in an optional collection set, so I gave up and just added the original test again. > > Testing: gha, test > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * improved test case, covering wrong reclamation of "empty" pinned regions for * full gc * young gc (in young gen), initial evacuation * remark pause ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24147/files - new: https://git.openjdk.org/jdk/pull/24147/files/aa5d8256..10ca8700 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24147&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24147&range=01-02 Stats: 16 lines in 1 file changed: 8 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24147/head:pull/24147 PR: https://git.openjdk.org/jdk/pull/24147 From ayang at openjdk.org Mon Mar 24 11:22:11 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 24 Mar 2025 11:22:11 GMT Subject: RFR: 8352147: G1: TestEagerReclaimHumongousRegionsClearMarkBits test takes very long [v4] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 15:48:14 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this refactor of the TestEagerReclaimHumongousRegionsClearMarkBits test that runs way too long, and sometimes even causing timeouts in GHA. >> >> So the problem is that TestEagerReclaimHumongousRegionsClearMarkBits checks whether after eager reclaim during marking the mark on the humongous object is cleared correctly. It does so with a trial-and-error approach allocating humongous objects and hoping that the faulty state somehow occurs. 
>> This can take a long time, and although the test limits itself to 50s runtime, for some reason there can still be sporadic timeouts in some setups. >> >> However for a long time it has been possible to halt concurrent mark just before completion, inducing the exact state needed for this test. So rewrite the test to be more targeted. >> >> Testing: gha, running test locally >> >> Thanks, >> Thomas > > Thomas Schatzl has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8352147 > > Hi all, > > please review this refactor of the TestEagerReclaimHumongousRegionsClearMarkBits test that runs way too long, and sometimes even causing timeouts in GHA. > > So the problem is that TestEagerReclaimHumongousRegionsClearMarkBits checks whether after eager reclaim during marking the mark on the humongous object is cleared correctly. It does so with a trial-and-error approach allocating humongous objects and hoping that the faulty state somehow occurs. > > This can take a long time, and although the test limits itself to 50s runtime, for some reason there can still be sporadic timeouts in some setups. > > However for a long time it is possible to halt concurrent mark just before completion, inducing the exact state needed for this test. So rewrite the test to be more targeted. > > Testing: gha, running test locally > > Thanks, > Thomas > > * also check for actual region reclamation > * last minute typo Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24077#pullrequestreview-2710026242 From ayang at openjdk.org Mon Mar 24 11:29:08 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 24 Mar 2025 11:29:08 GMT Subject: RFR: 8351405: G1: Collection set early pruning causes suboptimal region selection [v2] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 08:32:49 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change which aims to reduce spikes in mixed GCs, especially the last mixed-gc in a mixed cycle. Currently, G1 sorts regions identified for collection by reclaimable bytes, then prunes the list, removing regions with the lowest amount of reclaimable bytes. The pruned list is then split into collection groups which are later sorted on gc-efficiency. >> >> In the cachestress benchmark, we run into a case where some regions contain only a few live objects but have many incoming references from other regions. These regions are very expensive to collect (low gc-efficiency). >> >> This patch improves the pruning process by tracking incoming references to regions during marking. Instead of pruning based on reclaimable bytes alone, we estimate GC efficiency beforehand and prune regions with the worst GC efficiency. >> >> This reduces the spikes in gc pause time as shown for cachestress benchmark in the image below. >> >> ![mixed-gc](https://github.com/user-attachments/assets/740fb51d-eb20-4946-bf90-4eef23afe2e4) >> >> >> Testing: Tier 1-3. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Thomas Review src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 150: > 148: > 149: size_t reclaimable1 = ci1->_r->gc_efficiency(); > 150: size_t reclaimable2 = ci2->_r->gc_efficiency(); Do the var names need to be updated?
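For illustration, a small standalone sketch of what the comment hints at: after the change the compared values are gc efficiencies, which per the excerpt below are doubles (reclaimable bytes per predicted millisecond), so the locals read more naturally as doubles named for efficiency. The types and names here are hypothetical:

    #include <cstddef>

    struct Candidate {
      std::size_t reclaimable_bytes;
      double predicted_time_ms;
      double gc_efficiency() const { return reclaimable_bytes / predicted_time_ms; }
    };

    // Order candidates so that the most efficient-to-collect regions come first.
    static int compare_by_gc_efficiency(const Candidate* c1, const Candidate* c2) {
      double efficiency1 = c1->gc_efficiency();
      double efficiency2 = c2->gc_efficiency();
      if (efficiency1 > efficiency2) return -1;
      if (efficiency1 < efficiency2) return 1;
      return 0;
    }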
src/hotspot/share/gc/g1/g1HeapRegion.cpp line 357: > 355: double G1HeapRegion::gc_efficiency() { > 356: return reclaimable_bytes() / total_based_on_incoming_refs_ms(); > 357: } I wonder if `total_based_on_incoming_refs_ms` can be inlined to its single caller. Also, these logic doesn't belong to a heap-region -- maybe `G1Policy`, since most of code use `p->`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24076#discussion_r2009987868 PR Review Comment: https://git.openjdk.org/jdk/pull/24076#discussion_r2009986535 From tschatzl at openjdk.org Mon Mar 24 11:37:57 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 24 Mar 2025 11:37:57 GMT Subject: RFR: 8352508: [Redo] G1: Pinned regions with pinned objects only reachable by native code crash VM [v4] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that re-implements the fix for [JDK-8351921](https://bugs.openjdk.org/browse/JDK-8351921); in that fix we (think we) forgot to consider the same situation with optional regions. > > I.e. the previous fix only fixed the situation occurring during initial evacuation, however as we add regions due to optional evacuation, the same situation can still happen. > > So this change adds some work to every evacuation phase that marks all pinned regions in the current collection set as evacuation failed/pinned instead of only doing this work once in the pre evacuation phase. > > As for testing, it is extremely hard to induce a situation where there is a pinned region with no apparent live objects in an optional collection set, so I gave up and just added the original test again. > > Testing: gha, test > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: * improve test to also test empty pinned humongous regions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24147/files - new: https://git.openjdk.org/jdk/pull/24147/files/10ca8700..c4c04b98 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24147&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24147&range=02-03 Stats: 26 lines in 1 file changed: 18 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24147/head:pull/24147 PR: https://git.openjdk.org/jdk/pull/24147 From ivan.walulya at oracle.com Mon Mar 24 11:43:49 2025 From: ivan.walulya at oracle.com (Ivan Walulya) Date: Mon, 24 Mar 2025 11:43:49 +0000 Subject: Moving Forward with AHS for G1 In-Reply-To: References: Message-ID: Hi, Thanks for the summary. At Oracle, we are refining the use of GCTimeRatio and enhancing the memory uncommit mechanism [3]. Specifically, we are exploring uncommitting memory during any GC event, rather than restricting it to Remark or Full GCs, as in the current implementation. Additionally, we are investigating ways to improve on the current use of GC events as a time-base. // Ivan On 21 Mar 2025, at 01:19, Monica Beckwith wrote: Hi all, Following up on the previous discussions around Automatic Heap Sizing (AHS) for G1, I wanted to summarize the key takeaways and outline the next steps. In my November message [1], I described how AHS could dynamically manage heap sizing based on multiple inputs, including global memory pressure, GCTimeRatio policy, and user-defined heap tunables. 
This aligns with Thomas's summary [2], which outlines how AHS integrates with G1's existing mechanisms, including CPU-based heap resizing (JDK-8238687) [3], external constraints like CurrentMaxHeapSize (JDK-8204088) [4], and SoftMaxHeapSize (JDK-8236073) [5] as a key influence on heap adjustments. AHS will operate as a broader mechanism, where SoftMaxHeapSize serves as a heuristic to guide memory management but does not impose strict limits. It will work in conjunction with CPU-based heuristics to manage heap growth and contraction efficiently. Google's previous work on ProposedHeapSize for G1 contributed valuable insights into adaptive heap management, but as discussions evolved, the consensus has shifted toward a model centered on SoftMaxHeapSize as a guiding input within AHS. Given this consensus, I will proceed with the implementation of JDK-8236073 [5] to ensure that AHS integrates effectively with G1's dynamic heap sizing policies. I will share updates as the work progresses. If there are any additional concerns or areas where further clarification is needed, please let me know. Thanks again for the valuable discussions. Best, Monica ________________________________ References [1] Monica Beckwith, "Clarifications on AHS behavior and its role in G1," OpenJDK hotspot-gc-dev mailing list, November 2024. [https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050191.html] [2] Thomas Schatzl, "Giving a rough summary about the system envisioned for G1," OpenJDK hotspot-gc-dev mailing list, February 2025. [https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-February/051069.html] [3] OpenJDK Issue, "JDK-8238687: Improve CPU-based heap sizing in G1," OpenJDK Bug Database. [https://bugs.openjdk.org/browse/JDK-8238687] [4] OpenJDK Issue, "JDK-8204088: Introduce CurrentMaxHeapSize to allow external heap constraints," OpenJDK Bug Database. [https://bugs.openjdk.org/browse/JDK-8204088] [5] OpenJDK Issue, "JDK-8236073: Introduce SoftMaxHeapSize as a guide for G1 AHS," OpenJDK Bug Database. [https://bugs.openjdk.org/browse/JDK-8236073] From tschatzl at openjdk.org Mon Mar 24 12:21:20 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 24 Mar 2025 12:21:20 GMT Subject: RFR: 8352147: G1: TestEagerReclaimHumongousRegionsClearMarkBits test takes very long [v4] In-Reply-To: <4E86maQ-caPooTGiXJRzXxhZy5AKANQKqNZYhqCuP8Y=.4c2014bb-e230-4d80-b574-d6ac03152f7c@github.com> References: <4E86maQ-caPooTGiXJRzXxhZy5AKANQKqNZYhqCuP8Y=.4c2014bb-e230-4d80-b574-d6ac03152f7c@github.com> Message-ID: <_27kUJJuU0wgj02RiqY6nveNiifiEOQ8okGDqGL04ac=.78a195d1-df94-477e-911c-69347d8102f8@github.com> On Mon, 17 Mar 2025 16:34:12 GMT, Ivan Walulya wrote: >> Thomas Schatzl has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> 8352147 >> >> Hi all, >> >> please review this refactor of the TestEagerReclaimHumongousRegionsClearMarkBits test that runs way too long, and sometimes even causing timeouts in GHA. >> >> So the problem is that TestEagerReclaimHumongousRegionsClearMarkBits checks whether after eager reclaim during marking the mark on the humongous object is cleared correctly. It does so with a trial-and-error approach allocating humongous objects and hoping that the faulty state somehow occurs.
>> >> This can take a long time, and although the test limits itself to 50s runtime, for some reason there can still be sporadic timeouts in some setups. >> >> However for a long time it is possible to halt concurrent mark just before completion, inducing the exact state needed for this test. So rewrite the test to be more targeted. >> >> Testing: gha, running test locally >> >> Thanks, >> Thomas >> >> * also check for actual region reclamation >> * last minute typo > > LGTM! Thanks @walulyai @albertnetymk for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/24077#issuecomment-2747932555 From tschatzl at openjdk.org Mon Mar 24 12:21:21 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 24 Mar 2025 12:21:21 GMT Subject: Integrated: 8352147: G1: TestEagerReclaimHumongousRegionsClearMarkBits test takes very long In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 11:49:42 GMT, Thomas Schatzl wrote: > Hi all, > > please review this refactor of the TestEagerReclaimHumongousRegionsClearMarkBits test that runs way too long, and sometimes even causing timeouts in GHA. > > So the problem is that TestEagerReclaimHumongousRegionsClearMarkBits checks whether after eager reclaim during marking the mark on the humongous object is cleared correctly. It does so with a trial-and-error approach allocating humongous objects and hoping that the faulty state somehow occurs. > > This can take a long time, and although the test limits itself to 50s runtime, for some reason there can still be sporadic timeouts in some setups. > > However for a long time it has been possible to halt concurrent mark just before completion, inducing the exact state needed for this test. So rewrite the test to be more targeted. > > Testing: gha, running test locally > > Thanks, > Thomas This pull request has now been integrated. Changeset: 02a4ce23 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/02a4ce23f8353a9dd6400f2dd44f9cc1649626d3 Stats: 112 lines in 1 file changed: 25 ins; 67 del; 20 mod 8352147: G1: TestEagerReclaimHumongousRegionsClearMarkBits test takes very long Reviewed-by: iwalulya, ayang ------------- PR: https://git.openjdk.org/jdk/pull/24077 From iwalulya at openjdk.org Mon Mar 24 12:56:50 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 24 Mar 2025 12:56:50 GMT Subject: RFR: 8351405: G1: Collection set early pruning causes suboptimal region selection [v3] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change which aims to reduce spikes in mixed GCs, especially the last mixed-gc in a mixed cycle. Currently, G1 sorts regions identified for collection by reclaimable bytes, then prunes the list removing regions that with the lowest amount of reclaimable bytes. The pruned list is then split into collection groups which are later sorted on gc-efficiency. > > In the cachestress benchmark, we run into a case where some regions contain onlya few live objects but having many incoming references from other regions. These regions very expensive collect (low gc-efficiency). > > This patch improves the pruning process by tracking incoming references to regions during marking. Instead of pruning based on reclaimable bytes alone, we estimate GC efficiency beforehand and prune regions with the worst GC efficiency. > > This reduces the spikes in gc pause time as shown for cachestress benchmark in the image below. > > ![mixed-gc](https://github.com/user-attachments/assets/740fb51d-eb20-4946-bf90-4eef23afe2e4) > > > Testing: Tier 1-3. 
Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Albert Review - Merge remote-tracking branch 'upstream/master' into ReviseRegionSelection - Thomas Review - remove double prune - save - revise region selection - save - init ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24076/files - new: https://git.openjdk.org/jdk/pull/24076/files/f5fa92f0..649651a8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24076&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24076&range=01-02 Stats: 64054 lines in 1111 files changed: 31939 ins; 20494 del; 11621 mod Patch: https://git.openjdk.org/jdk/pull/24076.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24076/head:pull/24076 PR: https://git.openjdk.org/jdk/pull/24076 From tschatzl at openjdk.org Mon Mar 24 14:38:12 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 24 Mar 2025 14:38:12 GMT Subject: RFR: 8351405: G1: Collection set early pruning causes suboptimal region selection [v3] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 12:56:50 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change which aims to reduce spikes in mixed GCs, especially the last mixed-gc in a mixed cycle. Currently, G1 sorts regions identified for collection by reclaimable bytes, then prunes the list removing regions that with the lowest amount of reclaimable bytes. The pruned list is then split into collection groups which are later sorted on gc-efficiency. >> >> In the cachestress benchmark, we run into a case where some regions contain onlya few live objects but having many incoming references from other regions. These regions very expensive collect (low gc-efficiency). >> >> This patch improves the pruning process by tracking incoming references to regions during marking. Instead of pruning based on reclaimable bytes alone, we estimate GC efficiency beforehand and prune regions with the worst GC efficiency. >> >> This reduces the spikes in gc pause time as shown for cachestress benchmark in the image below. >> >> ![mixed-gc](https://github.com/user-attachments/assets/740fb51d-eb20-4946-bf90-4eef23afe2e4) >> >> >> Testing: Tier 1-3. > > Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Albert Review > - Merge remote-tracking branch 'upstream/master' into ReviseRegionSelection > - Thomas Review > - remove double prune > - save > - revise region selection > - save > - init Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24076#pullrequestreview-2710620645 From ayang at openjdk.org Mon Mar 24 14:47:19 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 24 Mar 2025 14:47:19 GMT Subject: RFR: 8351405: G1: Collection set early pruning causes suboptimal region selection [v3] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 12:56:50 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change which aims to reduce spikes in mixed GCs, especially the last mixed-gc in a mixed cycle. 
Currently, G1 sorts regions identified for collection by reclaimable bytes, then prunes the list, removing regions with the lowest amount of reclaimable bytes. The pruned list is then split into collection groups which are later sorted on gc-efficiency. >> >> In the cachestress benchmark, we run into a case where some regions contain only a few live objects but have many incoming references from other regions. These regions are very expensive to collect (low gc-efficiency). >> >> This patch improves the pruning process by tracking incoming references to regions during marking. Instead of pruning based on reclaimable bytes alone, we estimate GC efficiency beforehand and prune regions with the worst GC efficiency. >> >> This reduces the spikes in gc pause time as shown for the cachestress benchmark in the image below. >> >> ![mixed-gc](https://github.com/user-attachments/assets/740fb51d-eb20-4946-bf90-4eef23afe2e4) >> >> >> Testing: Tier 1-3. > > Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Albert Review > - Merge remote-tracking branch 'upstream/master' into ReviseRegionSelection > - Thomas Review > - remove double prune > - save > - revise region selection > - save > - init Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24076#pullrequestreview-2710620645 From ayang at openjdk.org Mon Mar 24 14:47:19 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 24 Mar 2025 14:47:19 GMT Subject: RFR: 8351405: G1: Collection set early pruning causes suboptimal region selection [v3] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 12:56:50 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change which aims to reduce spikes in mixed GCs, especially the last mixed-gc in a mixed cycle. Currently, G1 sorts regions identified for collection by reclaimable bytes, then prunes the list, removing regions with the lowest amount of reclaimable bytes. The pruned list is then split into collection groups which are later sorted on gc-efficiency. >> >> In the cachestress benchmark, we run into a case where some regions contain only a few live objects but have many incoming references from other regions. These regions are very expensive to collect (low gc-efficiency). >> >> This patch improves the pruning process by tracking incoming references to regions during marking. Instead of pruning based on reclaimable bytes alone, we estimate GC efficiency beforehand and prune regions with the worst GC efficiency. >> >> This reduces the spikes in gc pause time as shown for the cachestress benchmark in the image below. >> >> ![mixed-gc](https://github.com/user-attachments/assets/740fb51d-eb20-4946-bf90-4eef23afe2e4) >> >> >> Testing: Tier 1-3. > > Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Albert Review > - Merge remote-tracking branch 'upstream/master' into ReviseRegionSelection > - Thomas Review > - remove double prune > - save > - revise region selection > - save > - init src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 151: > 149: G1Policy* p = G1CollectedHeap::heap()->policy(); > 150: size_t gc_efficiency1 = p->predict_gc_efficiency(ci1->_r); > 151: size_t gc_efficiency2 = p->predict_gc_efficiency(ci2->_r); Why converting `double` to `size_t`? Is it intentional that all `0.x` becomes 0 and are considered equal? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24076#discussion_r2010327576 From iwalulya at openjdk.org Mon Mar 24 14:50:12 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 24 Mar 2025 14:50:12 GMT Subject: RFR: 8351405: G1: Collection set early pruning causes suboptimal region selection [v3] In-Reply-To: References: Message-ID: <5KLcu31JxL4GAzCgmnyS6BvCW6RBv1vSxhJvYrjjJQY=.5a59a5dc-58c5-4a66-9272-19e834215b99@github.com> On Mon, 24 Mar 2025 14:44:28 GMT, Albert Mingkun Yang wrote: >> Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - Albert Review >> - Merge remote-tracking branch 'upstream/master' into ReviseRegionSelection >> - Thomas Review >> - remove double prune >> - save >> - revise region selection >> - save >> - init > > src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 151: > >> 149: G1Policy* p = G1CollectedHeap::heap()->policy(); >> 150: size_t gc_efficiency1 = p->predict_gc_efficiency(ci1->_r); >> 151: size_t gc_efficiency2 = p->predict_gc_efficiency(ci2->_r); > > Why converting `double` to `size_t`? Is it intentional that all `0.x` becomes 0 and are considered equal? Missed the conversion. Let me fix it.
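A minimal sketch of the fix being agreed on here (not the committed patch): keep the predicted efficiencies as double so that values below 1.0 are not all truncated to 0 and treated as equal. The names G1CollectionSetCandidateInfo, `_r` and `predict_gc_efficiency` are taken from the quoted diff; the sort direction shown is an assumption.

```c++
#include "gc/g1/g1CollectedHeap.hpp"
#include "gc/g1/g1Policy.hpp"

// Sketch only: compare predicted GC efficiency as double so fractional values
// are not collapsed to 0 by an integer conversion before the comparison.
static int order_by_predicted_gc_efficiency(G1CollectionSetCandidateInfo* ci1,
                                            G1CollectionSetCandidateInfo* ci2) {
  G1Policy* p = G1CollectedHeap::heap()->policy();
  double gc_efficiency1 = p->predict_gc_efficiency(ci1->_r);
  double gc_efficiency2 = p->predict_gc_efficiency(ci2->_r);
  if (gc_efficiency1 == gc_efficiency2) {
    return 0;
  }
  // Assumption: regions with higher predicted efficiency sort towards the front.
  return (gc_efficiency1 > gc_efficiency2) ? -1 : 1;
}
```

With size_t, every region whose predicted efficiency fell below 1.0 would compare equal, which would defeat the efficiency-based pruning this patch introduces.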
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24076#discussion_r2010333249 From xpeng at openjdk.org Mon Mar 24 15:18:25 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 24 Mar 2025 15:18:25 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v4] In-Reply-To: References: Message-ID: > ### Root cause > Shenandoah has its own way to generate gc id([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L234), [link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahController.hpp#L43)), but when it runs a specific GC cycle, it still use the default GCIdMark([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L389)) to generate a gc id and set it to NamedThread::_gc_id. Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is `undefined`, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events. > > ### Solution > it is confusing that Shenandoah generates its own gc id but not use it for GC logging and JFR, the solution is fairly simple, the control thread just need inject gc id with GCIdMark(gc_id) it generates in `ShenandoahControlThread::run_service` and `ShenandoahGenerationalControlThread::run_gc_cycle` > > In the test, I also noticed the value of gc_id generated by Shenandoah control thread starts from 1, which is different from the default behavior of GCIdMark which generates id starting from 0, this PR will also fix it. > > ### Test > - [x] TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording" > - [x] TEST=hotspot_gc_shenandoah > - [x] GHA Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: No need to calculate gc_id using gc_count ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24166/files - new: https://git.openjdk.org/jdk/pull/24166/files/1920cf09..4c8c8136 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24166&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24166&range=02-03 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24166.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24166/head:pull/24166 PR: https://git.openjdk.org/jdk/pull/24166 From iwalulya at openjdk.org Mon Mar 24 15:25:35 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 24 Mar 2025 15:25:35 GMT Subject: RFR: 8351405: G1: Collection set early pruning causes suboptimal region selection [v4] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change which aims to reduce spikes in mixed GCs, especially the last mixed-gc in a mixed cycle. Currently, G1 sorts regions identified for collection by reclaimable bytes, then prunes the list removing regions that with the lowest amount of reclaimable bytes. The pruned list is then split into collection groups which are later sorted on gc-efficiency. > > In the cachestress benchmark, we run into a case where some regions contain onlya few live objects but having many incoming references from other regions. These regions very expensive collect (low gc-efficiency). > > This patch improves the pruning process by tracking incoming references to regions during marking. 
Instead of pruning based on reclaimable bytes alone, we estimate GC efficiency beforehand and prune regions with the worst GC efficiency. > > This reduces the spikes in gc pause time as shown for cachestress benchmark in the image below. > > ![mixed-gc](https://github.com/user-attachments/assets/740fb51d-eb20-4946-bf90-4eef23afe2e4) > > > Testing: Tier 1-3. Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: type conversion error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24076/files - new: https://git.openjdk.org/jdk/pull/24076/files/649651a8..8c123fa0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24076&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24076&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24076.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24076/head:pull/24076 PR: https://git.openjdk.org/jdk/pull/24076 From ayang at openjdk.org Mon Mar 24 17:32:08 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 24 Mar 2025 17:32:08 GMT Subject: RFR: 8351405: G1: Collection set early pruning causes suboptimal region selection [v4] In-Reply-To: References: Message-ID: <7zuBWIFnEhUd_BVzKD1agmNqOeGH9rD4VcN2iVr8nEA=.b7ac40bd-d72e-47d4-bff2-06575dac3787@github.com> On Mon, 24 Mar 2025 15:25:35 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change which aims to reduce spikes in mixed GCs, especially the last mixed-gc in a mixed cycle. Currently, G1 sorts regions identified for collection by reclaimable bytes, then prunes the list removing regions that with the lowest amount of reclaimable bytes. The pruned list is then split into collection groups which are later sorted on gc-efficiency. >> >> In the cachestress benchmark, we run into a case where some regions contain onlya few live objects but having many incoming references from other regions. These regions very expensive collect (low gc-efficiency). >> >> This patch improves the pruning process by tracking incoming references to regions during marking. Instead of pruning based on reclaimable bytes alone, we estimate GC efficiency beforehand and prune regions with the worst GC efficiency. >> >> This reduces the spikes in gc pause time as shown for cachestress benchmark in the image below. >> >> ![mixed-gc](https://github.com/user-attachments/assets/740fb51d-eb20-4946-bf90-4eef23afe2e4) >> >> >> Testing: Tier 1-3. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > type conversion error Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24076#pullrequestreview-2711181243 From kdnilsen at openjdk.org Mon Mar 24 18:19:09 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 24 Mar 2025 18:19:09 GMT Subject: RFR: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled [v2] In-Reply-To: <7PFHErLXXCsFeCjx55B_u8JisUcDGX9VFLa5azzsCso=.92f7d81d-8989-4aff-b57e-d2128403e01f@github.com> References: <7PFHErLXXCsFeCjx55B_u8JisUcDGX9VFLa5azzsCso=.92f7d81d-8989-4aff-b57e-d2128403e01f@github.com> Message-ID: On Tue, 18 Mar 2025 22:58:12 GMT, William Kemper wrote: >> src/hotspot/share/gc/shenandoah/shenandoahSharedVariables.hpp line 243: >> >>> 241: assert (new_value < (sizeof(ShenandoahSharedValue) * CHAR_MAX), "sanity"); >>> 242: // Hmm, no platform template specialization defined for exchanging one byte... 
(up cast to intptr is workaround). >>> 243: return (T)Atomic::xchg((intptr_t*)&value, (intptr_t)new_value); >> >> That... likely gets awkward on different endianness. See the complicated dance `Atomic::CmpxchgByteUsingInt` has to do to handle it. >> >> Not to mention we are likely writing to adjacent memory location. Which is _currently_ innocuous, since we hit padding, but it is not very reliable. > > `PlatformCmpxchg` has specializations on aarch64 and x86 for `sizeof(T) == 1`. Should we also add platform specializations for `PlatformXchg` for `sizeof(T) == 1`? (It has them for `4` and `8`). Could also do what `XchgUsingCmpxchg` does... Maybe it is easiest/safest to change declaration of value to intptr_t. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24105#discussion_r2010707057 From kdnilsen at openjdk.org Mon Mar 24 18:24:08 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 24 Mar 2025 18:24:08 GMT Subject: RFR: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled [v2] In-Reply-To: References: Message-ID: On Wed, 19 Mar 2025 18:33:50 GMT, William Kemper wrote: >> The sequence of events that creates this state: >> 1. An old collection is trying to finish marking by flushing SATB buffers with a Handshake >> 2. The regulator thread cancels old marking to start a young collection >> 3. A mutator thread shortly follows and attempts to cancel the nascent young collection >> 4. Step `3` fails (because of this bug) and cancellation reason does _not_ become `allocation failure` >> 5. The mutator thread enters a tight loop in which it retries allocations without `waiting` >> 6. The mutator thread remains in the `thread_in_vm` state and prevents the VM thread from completing step `1`. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Emulate single byte xchg with cmpxchg > - Merge remote-tracking branch 'jdk/master' into fix-uncancellable-young-gc > - Allow young cycles that interrupt old cycles to be cancelled src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2143: > 2141: > 2142: bool ShenandoahHeap::try_cancel_gc(GCCause::Cause cause) { > 2143: const jbyte prev = _cancelled_gc.xchg(cause); I guess maybe we want cause and prev to be integer type. Then the template will expand into a type that is known to that Atomic::xchg operation. 
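A minimal sketch of the "emulate single byte xchg with cmpxchg" route mentioned in this thread, assuming only that Atomic::cmpxchg accepts one-byte operands (HotSpot falls back to CmpxchgByteUsingInt where no native byte CAS exists). The helper name is made up, and widening the field to a type Atomic::xchg already supports is the other route discussed.

```c++
#include "gc/shenandoah/shenandoahSharedVariables.hpp"
#include "runtime/atomic.hpp"

// Sketch only: emulate a one-byte atomic exchange with a cmpxchg retry loop.
static ShenandoahSharedValue xchg_byte(volatile ShenandoahSharedValue* addr,
                                       ShenandoahSharedValue new_value) {
  ShenandoahSharedValue old = Atomic::load(addr);
  while (true) {
    ShenandoahSharedValue prev = Atomic::cmpxchg(addr, old, new_value);
    if (prev == old) {
      return prev;   // swap succeeded; return the value observed before it
    }
    old = prev;      // lost a race; retry against the freshly observed value
  }
}
```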
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24105#discussion_r2010713586 From wkemper at openjdk.org Mon Mar 24 18:33:08 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 24 Mar 2025 18:33:08 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v4] In-Reply-To: References: Message-ID: <__h_W-Ubi-14v0aDUciY2v5VuQnFHJOlabA7ZWIQcQM=.367913e0-84c3-46c5-86de-981509367951@github.com> On Mon, 24 Mar 2025 15:18:25 GMT, Xiaolong Peng wrote: >> ### Root cause >> Shenandoah has its own way to generate gc id([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L234), [link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahController.hpp#L43)), but when it runs a specific GC cycle, it still use the default GCIdMark([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L389)) to generate a gc id and set it to NamedThread::_gc_id. Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is `undefined`, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events. >> >> ### Solution >> it is confusing that Shenandoah generates its own gc id but not use it for GC logging and JFR, the solution is fairly simple, the control thread just need inject gc id with GCIdMark(gc_id) it generates in `ShenandoahControlThread::run_service` and `ShenandoahGenerationalControlThread::run_gc_cycle` >> >> In the test, I also noticed the value of gc_id generated by Shenandoah control thread starts from 1, which is different from the default behavior of GCIdMark which generates id starting from 0, this PR will also fix it. >> >> ### Test >> - [x] TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording" >> - [x] TEST=hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > No need to calculate gc_id using gc_count Changes requested by wkemper (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 137: > 135: // GC is starting, bump the internal gc count and set GCIdMark > 136: update_gc_count(); > 137: GCIdMark gc_id_mark; Can we still set the `GCIdMark` with our internal counter? I'd prefer they stay in sync explicitly. src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 576: > 574: "At end of Concurrent Young GC"; > 575: if (_heap->collection_set()->has_old_regions()) { > 576: mmu_tracker->record_mixed(gc_id()); Should these be `get_gc_count` now? 
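A minimal sketch of the "keep them in sync" idea raised in the first comment above, assuming a hypothetical set_internal_gc_count() helper for Shenandoah's own counter; GCIdMark and GCId::current() are the existing shared GC id API.

```c++
#include "gc/shared/gcId.hpp"

// Sketch only: let GCIdMark allocate the global GC id for the cycle's scope,
// then mirror that value into the collector's internal counter so they cannot drift.
void run_one_gc_cycle() {
  GCIdMark gc_id_mark;                     // sets the GC id used by logging and JFR in this scope
  set_internal_gc_count(GCId::current());  // hypothetical helper: sync Shenandoah's counter
  // ... perform the cycle; log lines and JFR events emitted here carry the same id ...
}
```

Constructing the mark before any logging or JFR event in the cycle keeps the thread's GC id valid for the whole scope, which is the invariant the review comment is asking about.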
------------- PR Review: https://git.openjdk.org/jdk/pull/24166#pullrequestreview-2711348336 PR Review Comment: https://git.openjdk.org/jdk/pull/24166#discussion_r2010726624 PR Review Comment: https://git.openjdk.org/jdk/pull/24166#discussion_r2010725090 From xpeng at openjdk.org Mon Mar 24 18:48:15 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 24 Mar 2025 18:48:15 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v4] In-Reply-To: <__h_W-Ubi-14v0aDUciY2v5VuQnFHJOlabA7ZWIQcQM=.367913e0-84c3-46c5-86de-981509367951@github.com> References: <__h_W-Ubi-14v0aDUciY2v5VuQnFHJOlabA7ZWIQcQM=.367913e0-84c3-46c5-86de-981509367951@github.com> Message-ID: On Mon, 24 Mar 2025 18:30:23 GMT, William Kemper wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> No need to calculate gc_id using gc_count > > src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 137: > >> 135: // GC is starting, bump the internal gc count and set GCIdMark >> 136: update_gc_count(); >> 137: GCIdMark gc_id_mark; > > Can we still set the `GCIdMark` with our internal counter? I'd prefer they stay in sync explicitly. GCIdMark use [GCId::_next_id](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shared/gcId.cpp#L31) here to generate GC id, if we do that the GCId::_next_id will remain 0, which is a the behavior change I had concern about. We can do it in another approach to keep both counters in sync explicitly: GCIdMark gc_id_mark; update_gc_count(gc_id() + 1) What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24166#discussion_r2010743091 From xpeng at openjdk.org Mon Mar 24 18:53:07 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 24 Mar 2025 18:53:07 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v4] In-Reply-To: <__h_W-Ubi-14v0aDUciY2v5VuQnFHJOlabA7ZWIQcQM=.367913e0-84c3-46c5-86de-981509367951@github.com> References: <__h_W-Ubi-14v0aDUciY2v5VuQnFHJOlabA7ZWIQcQM=.367913e0-84c3-46c5-86de-981509367951@github.com> Message-ID: On Mon, 24 Mar 2025 18:29:27 GMT, William Kemper wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> No need to calculate gc_id using gc_count > > src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 576: > >> 574: "At end of Concurrent Young GC"; >> 575: if (_heap->collection_set()->has_old_regions()) { >> 576: mmu_tracker->record_mixed(gc_id()); > > Should these be `get_gc_count` now? Shouldn't we always use gc id for MMUTracker? Although the internal gc counter of Shenandoah is also fine here. 
I'm ok to change it back to get_gc_count, but will also update the declaration of the relevant methods like below to make them consistent: void record_global(size_t gc_count) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24166#discussion_r2010754634 From xpeng at openjdk.org Mon Mar 24 19:21:23 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 24 Mar 2025 19:21:23 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v5] In-Reply-To: References: Message-ID: > ### Root cause > Shenandoah has its own way to generate gc id([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L234), [link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahController.hpp#L43)), but when it runs a specific GC cycle, it still use the default GCIdMark([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L389)) to generate a gc id and set it to NamedThread::_gc_id. Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is `undefined`, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events. > > ### Solution > it is confusing that Shenandoah generates its own gc id but not use it for GC logging and JFR, the solution is fairly simple, the control thread just need inject gc id with GCIdMark(gc_id) it generates in `ShenandoahControlThread::run_service` and `ShenandoahGenerationalControlThread::run_gc_cycle` > > In the test, I also noticed the value of gc_id generated by Shenandoah control thread starts from 1, which is different from the default behavior of GCIdMark which generates id starting from 0, this PR will also fix it. > > ### Test > - [x] TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording" > - [x] TEST=hotspot_gc_shenandoah > - [x] GHA Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Keep gc id and shenandoah internal gc count in sync ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24166/files - new: https://git.openjdk.org/jdk/pull/24166/files/4c8c8136..f4844848 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24166&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24166&range=03-04 Stats: 9 lines in 4 files changed: 3 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24166.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24166/head:pull/24166 PR: https://git.openjdk.org/jdk/pull/24166 From wkemper at openjdk.org Mon Mar 24 21:45:21 2025 From: wkemper at openjdk.org (William Kemper) Date: Mon, 24 Mar 2025 21:45:21 GMT Subject: RFR: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled [v3] In-Reply-To: References: Message-ID: > The sequence of events that creates this state: > 1. An old collection is trying to finish marking by flushing SATB buffers with a Handshake > 2. The regulator thread cancels old marking to start a young collection > 3. A mutator thread shortly follows and attempts to cancel the nascent young collection > 4. Step `3` fails (because of this bug) and cancellation reason does _not_ become `allocation failure` > 5. The mutator thread enters a tight loop in which it retries allocations without `waiting` > 6. 
The mutator thread remains in the `thread_in_vm` state and prevents the VM thread from completing step `1`. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Widen type of shared enum value to unlock platform support of atomic xchg ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24105/files - new: https://git.openjdk.org/jdk/pull/24105/files/adcb999b..ca45ff02 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24105&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24105&range=01-02 Stats: 9 lines in 1 file changed: 1 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24105.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24105/head:pull/24105 PR: https://git.openjdk.org/jdk/pull/24105 From xpeng at openjdk.org Mon Mar 24 22:27:45 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 24 Mar 2025 22:27:45 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v6] In-Reply-To: References: Message-ID: > ### Root cause > Shenandoah has its own way to generate gc id([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L234), [link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahController.hpp#L43)), but when it runs a specific GC cycle, it still use the default GCIdMark([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L389)) to generate a gc id and set it to NamedThread::_gc_id. Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is `undefined`, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events. > > ### Solution > it is confusing that Shenandoah generates its own gc id but not use it for GC logging and JFR, the solution is fairly simple, the control thread just need inject gc id with GCIdMark(gc_id) it generates in `ShenandoahControlThread::run_service` and `ShenandoahGenerationalControlThread::run_gc_cycle` > > In the test, I also noticed the value of gc_id generated by Shenandoah control thread starts from 1, which is different from the default behavior of GCIdMark which generates id starting from 0, this PR will also fix it. 
> > ### Test > - [x] TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording" > - [x] TEST=hotspot_gc_shenandoah > - [x] GHA Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Revert ShenandoahController::_gc_count related refactor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24166/files - new: https://git.openjdk.org/jdk/pull/24166/files/f4844848..57c43ef3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24166&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24166&range=04-05 Stats: 50 lines in 7 files changed: 3 ins; 3 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/24166.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24166/head:pull/24166 PR: https://git.openjdk.org/jdk/pull/24166 From xpeng at openjdk.org Mon Mar 24 22:27:45 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 24 Mar 2025 22:27:45 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v5] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 19:21:23 GMT, Xiaolong Peng wrote: >> ### Root cause >> Shenandoah has its own way to generate gc id([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L234), [link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahController.hpp#L43)), but when it runs a specific GC cycle, it still use the default GCIdMark([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L389)) to generate a gc id and set it to NamedThread::_gc_id. Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is `undefined`, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events. >> >> ### Solution >> it is confusing that Shenandoah generates its own gc id but not use it for GC logging and JFR, the solution is fairly simple, the control thread just need inject gc id with GCIdMark(gc_id) it generates in `ShenandoahControlThread::run_service` and `ShenandoahGenerationalControlThread::run_gc_cycle` >> >> In the test, I also noticed the value of gc_id generated by Shenandoah control thread starts from 1, which is different from the default behavior of GCIdMark which generates id starting from 0, this PR will also fix it. >> >> ### Test >> - [x] TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording" >> - [x] TEST=hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Keep gc id and shenandoah internal gc count in sync Removed all code related to the refactor of henandoahController::_gc_id, now the change should be a pure fix for the bug. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24166#issuecomment-2749534672 From mgronlun at openjdk.org Mon Mar 24 22:33:23 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 24 Mar 2025 22:33:23 GMT Subject: RFR: 8348907: Stress times out when is executed with ZGC Message-ID: Greetings, Here is a suggested solution for solving the intricate deadlock issues involving virtual threads, ZGC load barriers, and JFR. A JFR event can be allocated and committed in specific sensitive contexts, such as inside mutex-protected load barriers. 
If the thread is a virtual thread, JFR determines its thread name by loading the oop from the thread (jt->vthread()) as part of the event commit. This operation again triggers the load barrier, which contains a non-reentrant lock, effectively deadlocking the thread with itself. So, for specific sensitive event sites, JFR mustn't recurse or reenter into the same event site as part of the event commit. After a few iterations and prototypes, which failed because they eventually ended up touching some oop, I came up with the following. >From a user perspective, an event (site) can now be marked as "non-reentrant" by wrapping it in a helper class. This instruction now guarantees JFR will not reenter this site again as part of the event.commit(). The tradeoff is that we cannot write the virtual thread name for these sensitive event sites; we will instead report "" as the virtual thread name, which is the default virtual thread name in Java. All other information about the thread, such as the thread ID, virtual thread, etc., will still be reported. I believe it is a reasonable tradeoff and a general solution for sensitive JFR event sites, which are rare in practice, with minimal impact on event programming. Testing: jdk_jfr, stress testing Let me know what you think. Thanks Markus ------------- Commit messages: - 8348907 Changes: https://git.openjdk.org/jdk/pull/24209/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24209&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8348907 Stats: 160 lines in 10 files changed: 139 ins; 12 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24209.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24209/head:pull/24209 PR: https://git.openjdk.org/jdk/pull/24209 From ysr at openjdk.org Mon Mar 24 23:00:09 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 24 Mar 2025 23:00:09 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v6] In-Reply-To: References: Message-ID: <7g5yci-7XKxmgaKSWma2-EQraeVr7cjABnKH9ifMZU4=.2b2ff9d2-2735-488c-bfe4-4d15f2990577@github.com> On Mon, 24 Mar 2025 22:27:45 GMT, Xiaolong Peng wrote: >> ### Root cause >> Shenandoah has its own way to generate gc id([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L234), [link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahController.hpp#L43)), but when it runs a specific GC cycle, it still use the default GCIdMark([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L389)) to generate a gc id and set it to NamedThread::_gc_id. Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is `undefined`, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events. >> >> ### Solution >> it is confusing that Shenandoah generates its own gc id but not use it for GC logging and JFR, the solution is fairly simple, the control thread just need inject gc id with GCIdMark(gc_id) it generates in `ShenandoahControlThread::run_service` and `ShenandoahGenerationalControlThread::run_gc_cycle` >> >> In the test, I also noticed the value of gc_id generated by Shenandoah control thread starts from 1, which is different from the default behavior of GCIdMark which generates id starting from 0, this PR will also fix it. 
>> >> ### Test >> - [x] TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording" >> - [x] TEST=hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Revert ShenandoahController::_gc_count related refactor I haven't started reviewing, but in cases where we have a "mark" (a thread local stack scoped constant variable, such as used for logging etc.) and an underlying "true value", the expectation is that the "mark" is a snapshot of the "true", and represents a label for the work being done in that specific scope. Once you keep this model/idiom in mind, the code should become clean, and the same 0-based conventions should cleanly apply. I hope to review the code soon'ish. Sorry for the delay. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24166#issuecomment-2749579634 From ysr at openjdk.org Mon Mar 24 23:11:09 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 24 Mar 2025 23:11:09 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v4] In-Reply-To: <__h_W-Ubi-14v0aDUciY2v5VuQnFHJOlabA7ZWIQcQM=.367913e0-84c3-46c5-86de-981509367951@github.com> References: <__h_W-Ubi-14v0aDUciY2v5VuQnFHJOlabA7ZWIQcQM=.367913e0-84c3-46c5-86de-981509367951@github.com> Message-ID: On Mon, 24 Mar 2025 18:30:23 GMT, William Kemper wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> No need to calculate gc_id using gc_count > > src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 137: > >> 135: // GC is starting, bump the internal gc count and set GCIdMark >> 136: update_gc_count(); >> 137: GCIdMark gc_id_mark; > > Can we still set the `GCIdMark` with our internal counter? I'd prefer they stay in sync explicitly. @earthling-amzn : Is your concern that GC count is incremented concurrently by two different callers? If so, I'd have the atomic increment return the pre- or post-increment value as the case may be and have the caller use that in their mark label. (Question: do we have different Id's for young and a concurrent/interrupted old? -- I would imagine so, with the old carrying an older id, and each subsequent young getting a newer id). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24166#discussion_r2011041651 From manc at openjdk.org Mon Mar 24 23:12:16 2025 From: manc at openjdk.org (Man Cao) Date: Mon, 24 Mar 2025 23:12:16 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics Message-ID: Hi all, I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: - does not respect `MinHeapSize`; - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; - does not affect heuristics to trigger a concurrent cycle; [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context.
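A minimal sketch of how a soft limit like this is typically consumed, assuming a hypothetical helper and not reflecting the attached patch: SoftMaxHeapSize is a manageable flag, so it is re-read and clamped against the hard maximum whenever the soft target is needed, rather than cached at startup.

```c++
#include "gc/shared/gc_globals.hpp"
#include "utilities/globalDefinitions.hpp"

// Sketch only: clamp the current SoftMaxHeapSize against the hard maximum on
// every use, since the flag can change at runtime (for example via
// jcmd <pid> VM.set_flag SoftMaxHeapSize <bytes>).
static size_t soft_max_capacity(size_t hard_max_capacity) {
  size_t soft_max = SoftMaxHeapSize;   // manageable flag, re-read on each call
  return MIN2(soft_max, hard_max_capacity);
}
```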
------------- Commit messages: - G1: Use SoftMaxHeapSize to guide GC heuristics Changes: https://git.openjdk.org/jdk/pull/24211/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8236073 Stats: 35 lines in 5 files changed: 20 ins; 3 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/24211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24211/head:pull/24211 PR: https://git.openjdk.org/jdk/pull/24211 From manc at openjdk.org Mon Mar 24 23:15:08 2025 From: manc at openjdk.org (Man Cao) Date: Mon, 24 Mar 2025 23:15:08 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 23:07:08 GMT, Man Cao wrote: > Hi all, > > I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: > > - does not respect `MinHeapSize`; > - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; > - does not affect heuristcs to trigger a concurrent cycle; > > [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. @mo-beck Here is my implementation for `SoftMaxHeapSize` for G1. Let me know if you have any feedback or concerns. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2749599999 From ysr at openjdk.org Mon Mar 24 23:17:06 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 24 Mar 2025 23:17:06 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v6] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 22:27:45 GMT, Xiaolong Peng wrote: >> ### Root cause >> Shenandoah has its own way to generate gc id([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L234), [link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahController.hpp#L43)), but when it runs a specific GC cycle, it still use the default GCIdMark([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L389)) to generate a gc id and set it to NamedThread::_gc_id. Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is `undefined`, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events. >> >> ### Solution >> it is confusing that Shenandoah generates its own gc id but not use it for GC logging and JFR, the solution is fairly simple, the control thread just need inject gc id with GCIdMark(gc_id) it generates in `ShenandoahControlThread::run_service` and `ShenandoahGenerationalControlThread::run_gc_cycle` >> >> In the test, I also noticed the value of gc_id generated by Shenandoah control thread starts from 1, which is different from the default behavior of GCIdMark which generates id starting from 0, this PR will also fix it. 
>> >> ### Test >> - [x] TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording" >> - [x] TEST=hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Revert ShenandoahController::_gc_count related refactor > Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is undefined, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events. This would be by design and, as you discovered, was because a suitable GCIdMark scope was missing which would have supplied the correct ID. It is important that the JFR event issues from the intended scope for the corresponding ID for which the metrics/event are being generated. In particular, if there are multiple concurrent GC ID's in progress, with a common pool of worker threads that multiplex this work, any appropriate event metrics should be correctly attributed to the right ID in question. I am making general comments here without knowledge of the specific details, sorry! :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24166#issuecomment-2749602926 From xpeng at openjdk.org Mon Mar 24 23:41:11 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 24 Mar 2025 23:41:11 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v6] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 23:14:36 GMT, Y. Srinivas Ramakrishna wrote: > > Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is undefined, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events. > > This would be by design and, as you discovered, was because a suitable GCIdMark scope was missing which would have supplied the correct ID. It is important that the JFR event issues from the intended scope for the corresponding ID for which the metrics/event are being generated. In particular, if there are multiple concurrent GC ID's in progress, with a common pool of worker threads that multiplex this work, any appropriate event metrics should be correctly attributed to the right ID in question. > > I am making general comments here without knowledge of the specific details, sorry! :-) Thank you @ysramakrishna for reviewing the PR, appreciate it! Yes, it is a simple bug related to the GCIdMark scope, so the fix is to make sure GCIdMark scope is correct. For common pool of worker threads, each thread should copy the gc_id to local with the constructor GCIdMark(gc_id), there some existing examples doing this in hotspot, e.g. https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shared/workerThread.cpp#L68 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24166#issuecomment-2749637430 From manc at openjdk.org Mon Mar 24 23:58:11 2025 From: manc at openjdk.org (Man Cao) Date: Mon, 24 Mar 2025 23:58:11 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 23:07:08 GMT, Man Cao wrote: > Hi all, > > I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. 
I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: > > - does not respect `MinHeapSize`; > - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; > - does not affect heuristcs to trigger a concurrent cycle; > > [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. This probably requires fixing https://bugs.openjdk.org/browse/JDK-8352765 before users try to use `SoftMaxHeapSize`. Otherwise, setting a small `SoftMaxHeapSize` could trigger premature OutOfMemoryError. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24211#issuecomment-2749664966 From iwalulya at openjdk.org Tue Mar 25 09:27:23 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 25 Mar 2025 09:27:23 GMT Subject: RFR: 8351405: G1: Collection set early pruning causes suboptimal region selection [v4] In-Reply-To: <7zuBWIFnEhUd_BVzKD1agmNqOeGH9rD4VcN2iVr8nEA=.b7ac40bd-d72e-47d4-bff2-06575dac3787@github.com> References: <7zuBWIFnEhUd_BVzKD1agmNqOeGH9rD4VcN2iVr8nEA=.b7ac40bd-d72e-47d4-bff2-06575dac3787@github.com> Message-ID: On Mon, 24 Mar 2025 17:29:29 GMT, Albert Mingkun Yang wrote: >> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: >> >> type conversion error > > Marked as reviewed by ayang (Reviewer). Thanks @albertnetymk and @tschatzl for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24076#issuecomment-2750616897 From iwalulya at openjdk.org Tue Mar 25 09:27:23 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 25 Mar 2025 09:27:23 GMT Subject: Integrated: 8351405: G1: Collection set early pruning causes suboptimal region selection In-Reply-To: References: Message-ID: <5H32-J3_krD33bc5TY0X-cXYWFljRMhuPtrUoIpYuHk=.cfb80a42-b0ad-4f43-95ce-d80372e8c98e@github.com> On Mon, 17 Mar 2025 11:19:02 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change which aims to reduce spikes in mixed GCs, especially the last mixed-gc in a mixed cycle. Currently, G1 sorts regions identified for collection by reclaimable bytes, then prunes the list removing regions that with the lowest amount of reclaimable bytes. The pruned list is then split into collection groups which are later sorted on gc-efficiency. > > In the cachestress benchmark, we run into a case where some regions contain onlya few live objects but having many incoming references from other regions. These regions very expensive collect (low gc-efficiency). > > This patch improves the pruning process by tracking incoming references to regions during marking. Instead of pruning based on reclaimable bytes alone, we estimate GC efficiency beforehand and prune regions with the worst GC efficiency. > > This reduces the spikes in gc pause time as shown for cachestress benchmark in the image below. > > ![mixed-gc](https://github.com/user-attachments/assets/740fb51d-eb20-4946-bf90-4eef23afe2e4) > > > Testing: Tier 1-3. This pull request has now been integrated. 
Changeset: 6879c446 Author: Ivan Walulya URL: https://git.openjdk.org/jdk/commit/6879c446c6e7734c162c85bd2bd3d7f3b012cca4 Stats: 77 lines in 13 files changed: 57 ins; 2 del; 18 mod 8351405: G1: Collection set early pruning causes suboptimal region selection Reviewed-by: ayang, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/24076 From shade at openjdk.org Tue Mar 25 11:13:24 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Mar 2025 11:13:24 GMT Subject: RFR: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled [v3] In-Reply-To: References: Message-ID: <7WjwCHk4uVXhc0eAyxzIrplCMu0DLQm1U_thb56D0as=.d24099dc-498c-45c5-9cc6-1bffa34a5c05@github.com> On Mon, 24 Mar 2025 18:21:14 GMT, Kelvin Nilsen wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Widen type of shared enum value to unlock platform support of atomic xchg > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2143: > >> 2141: >> 2142: bool ShenandoahHeap::try_cancel_gc(GCCause::Cause cause) { >> 2143: const jbyte prev = _cancelled_gc.xchg(cause); > > I guess maybe we want cause and prev to be integer type. Then the template will expand into a type that is known to that Atomic::xchg operation. So this thing is no longer `jbyte`, so implicit cast to `jbyte` is no longer safe. I think we should really be casting to `GCCause::Cause`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24105#discussion_r2011861121 From mgronlun at openjdk.org Tue Mar 25 13:18:41 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 25 Mar 2025 13:18:41 GMT Subject: RFR: 8348907: Stress times out when is executed with ZGC [v2] In-Reply-To: References: Message-ID: <-Lq8r8IewtCDh2Y7b548AHXu-k-3q9maqk0s9W5H5ac=.7fc9e5c4-d94c-46f7-af4a-5bcdc93a6b28@github.com> > Greetings, > > Here is a suggested solution for solving the intricate deadlock issues involving virtual threads, ZGC load barriers, and JFR. > > A JFR event can be allocated and committed in specific sensitive contexts, such as inside mutex-protected load barriers. If the thread is a virtual thread, JFR determines its thread name by loading the oop from the thread (jt->vthread()) as part of the event commit. > > This operation again triggers the load barrier, which contains a non-reentrant lock, effectively deadlocking the thread with itself. > > So, for specific sensitive event sites, JFR mustn't recurse or reenter into the same event site as part of the event commit. > > After a few iterations and prototypes, which failed because they eventually ended up touching some oop, I came up with the following. > > From a user perspective, an event (site) can now be marked as "non-reentrant" by wrapping it in a helper class. > > This instruction now guarantees JFR will not reenter this site again as part of the event.commit(). > > The tradeoff is that we cannot write the virtual thread name for these sensitive event sites; we will instead report "" as the virtual thread name, which is the default virtual thread name in Java. All other information about the thread, such as the thread ID, virtual thread, etc., will still be reported. > > I believe it is a reasonable tradeoff and a general solution for sensitive JFR event sites, which are rare in practice, with minimal impact on event programming. > > Testing: jdk_jfr, stress testing > > Let me know what you think. 
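The helper class mentioned above is not shown in this thread, so as a rough illustration of the underlying idea only, here is a minimal, self-contained sketch of guarding a sensitive event site with a per-thread, per-site flag so that a commit path that re-enters the same site bails out instead of deadlocking on itself. All names here are hypothetical; this is not the actual JFR helper class from the patch.

```c++
// Minimal sketch of a "non-reentrant event site" guard; all names are
// hypothetical and this is not the actual JFR implementation.
#include <cstdio>

template <typename EventType>
class NonReentrantSiteGuard {
  // One flag per (thread, event type): true while this thread is inside the site.
  static thread_local bool _inside;
  bool _outermost;
 public:
  NonReentrantSiteGuard() : _outermost(!_inside) {
    if (_outermost) {
      _inside = true;
    }
  }
  ~NonReentrantSiteGuard() {
    if (_outermost) {
      _inside = false;
    }
  }
  // False when the thread has re-entered the same site, e.g. because committing
  // the event touched code (such as a lock-protected load barrier) that emits it again.
  bool should_commit() const { return _outermost; }
};

template <typename EventType>
thread_local bool NonReentrantSiteGuard<EventType>::_inside = false;

struct ExampleSensitiveEvent {};  // stand-in for a generated event type

void sensitive_event_site() {
  NonReentrantSiteGuard<ExampleSensitiveEvent> guard;
  if (!guard.should_commit()) {
    return;  // nested entry from the commit path: skip instead of self-deadlocking
  }
  // ... do the sensitive work and commit the event here ...
  std::printf("committed once per outermost entry\n");
}
```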
> > Thanks > Markus Markus Gr?nlund has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - autogenerate helper classes for non-reentrancy - Merge branch '8348907' of github.com:mgronlun/jdk into 8348907 - 8348907 - 8348907 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24209/files - new: https://git.openjdk.org/jdk/pull/24209/files/8aa59c8b..29064afa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24209&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24209&range=00-01 Stats: 10002 lines in 187 files changed: 6686 ins; 888 del; 2428 mod Patch: https://git.openjdk.org/jdk/pull/24209.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24209/head:pull/24209 PR: https://git.openjdk.org/jdk/pull/24209 From mgronlun at openjdk.org Tue Mar 25 13:23:41 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 25 Mar 2025 13:23:41 GMT Subject: RFR: 8348907: Stress times out when is executed with ZGC [v3] In-Reply-To: References: Message-ID: > Greetings, > > Here is a suggested solution for solving the intricate deadlock issues involving virtual threads, ZGC load barriers, and JFR. > > A JFR event can be allocated and committed in specific sensitive contexts, such as inside mutex-protected load barriers. If the thread is a virtual thread, JFR determines its thread name by loading the oop from the thread (jt->vthread()) as part of the event commit. > > This operation again triggers the load barrier, which contains a non-reentrant lock, effectively deadlocking the thread with itself. > > So, for specific sensitive event sites, JFR mustn't recurse or reenter into the same event site as part of the event commit. > > After a few iterations and prototypes, which failed because they eventually ended up touching some oop, I came up with the following. > > From a user perspective, an event (site) can now be marked as "non-reentrant" by wrapping it in a helper class. > > This instruction now guarantees JFR will not reenter this site again as part of the event.commit(). > > The tradeoff is that we cannot write the virtual thread name for these sensitive event sites; we will instead report "" as the virtual thread name, which is the default virtual thread name in Java. All other information about the thread, such as the thread ID, virtual thread, etc., will still be reported. > > I believe it is a reasonable tradeoff and a general solution for sensitive JFR event sites, which are rare in practice, with minimal impact on event programming. > > Testing: jdk_jfr, stress testing > > Let me know what you think. 
> > Thanks > Markus Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: fake commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24209/files - new: https://git.openjdk.org/jdk/pull/24209/files/29064afa..5ca42d01 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24209&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24209&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24209.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24209/head:pull/24209 PR: https://git.openjdk.org/jdk/pull/24209 From mgronlun at openjdk.org Tue Mar 25 13:38:56 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 25 Mar 2025 13:38:56 GMT Subject: RFR: 8348907: Stress times out when is executed with ZGC [v4] In-Reply-To: References: Message-ID: <3DfXgxoWFc07J6yfhjREbhiABvhwPSclqG0RvAjVtP8=.8ad57991-14b9-4c9d-90ef-35daa81f7d9d@github.com> > Greetings, > > Here is a suggested solution for solving the intricate deadlock issues involving virtual threads, ZGC load barriers, and JFR. > > A JFR event can be allocated and committed in specific sensitive contexts, such as inside mutex-protected load barriers. If the thread is a virtual thread, JFR determines its thread name by loading the oop from the thread (jt->vthread()) as part of the event commit. > > This operation again triggers the load barrier, which contains a non-reentrant lock, effectively deadlocking the thread with itself. > > So, for specific sensitive event sites, JFR mustn't recurse or reenter into the same event site as part of the event commit. > > After a few iterations and prototypes, which failed because they eventually ended up touching some oop, I came up with the following. > > From a user perspective, an event (site) can now be marked as "non-reentrant" by wrapping it in a helper class. > > This instruction now guarantees JFR will not reenter this site again as part of the event.commit(). > > The tradeoff is that we cannot write the virtual thread name for these sensitive event sites; we will instead report "" as the virtual thread name, which is the default virtual thread name in Java. All other information about the thread, such as the thread ID, virtual thread, etc., will still be reported. > > I believe it is a reasonable tradeoff and a general solution for sensitive JFR event sites, which are rare in practice, with minimal impact on event programming. > > Testing: jdk_jfr, stress testing > > Let me know what you think. > > Thanks > Markus Markus Gr?nlund has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. 
The pull request now contains one commit: 8348907 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24209/files - new: https://git.openjdk.org/jdk/pull/24209/files/5ca42d01..ee5be9da Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24209&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24209&range=02-03 Stats: 91 lines in 3 files changed: 57 ins; 34 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24209.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24209/head:pull/24209 PR: https://git.openjdk.org/jdk/pull/24209 From mgronlun at openjdk.org Tue Mar 25 13:42:03 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 25 Mar 2025 13:42:03 GMT Subject: RFR: 8348907: Stress times out when is executed with ZGC [v5] In-Reply-To: References: Message-ID: <-nNSX0SdDXDwR7kJGugqsJs36XLLKjH_UpvZEAvyt_c=.e32dbc05-2ef1-4765-a3fb-5c7bb83c2c15@github.com> > Greetings, > > Here is a suggested solution for solving the intricate deadlock issues involving virtual threads, ZGC load barriers, and JFR. > > A JFR event can be allocated and committed in specific sensitive contexts, such as inside mutex-protected load barriers. If the thread is a virtual thread, JFR determines its thread name by loading the oop from the thread (jt->vthread()) as part of the event commit. > > This operation again triggers the load barrier, which contains a non-reentrant lock, effectively deadlocking the thread with itself. > > So, for specific sensitive event sites, JFR mustn't recurse or reenter into the same event site as part of the event commit. > > After a few iterations and prototypes, which failed because they eventually ended up touching some oop, I came up with the following. > > From a user perspective, an event (site) can now be marked as "non-reentrant" by wrapping it in a helper class. > > This instruction now guarantees JFR will not reenter this site again as part of the event.commit(). > > The tradeoff is that we cannot write the virtual thread name for these sensitive event sites; we will instead report "" as the virtual thread name, which is the default virtual thread name in Java. All other information about the thread, such as the thread ID, virtual thread, etc., will still be reported. > > I believe it is a reasonable tradeoff and a general solution for sensitive JFR event sites, which are rare in practice, with minimal impact on event programming. > > Testing: jdk_jfr, stress testing > > Let me know what you think. 
> > Thanks > Markus Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: autogenerate helper classes for non-reentrancy ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24209/files - new: https://git.openjdk.org/jdk/pull/24209/files/ee5be9da..72116f34 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24209&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24209&range=03-04 Stats: 91 lines in 3 files changed: 34 ins; 57 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24209.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24209/head:pull/24209 PR: https://git.openjdk.org/jdk/pull/24209 From jsikstro at openjdk.org Tue Mar 25 14:06:46 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Tue, 25 Mar 2025 14:06:46 GMT Subject: RFR: 8352762: Use EXACTFMT instead of expanded version where applicable Message-ID: [JDK-8310233](https://bugs.openjdk.org/browse/JDK-8310233) introduced the EXACTFMT macro, which is a shorthand for printing exact values using methods defined in globalDefinitions.hpp. There are currently 20 places in HotSpot which uses the expanded version of the macro, along with the "trace_page_size_params" macro that is defined and used in os.cpp. I have replaced places that use the expanded macro(s) with EXACTFMT + EXACTFMTARGS, and also removed trace_page_size_params from os.cpp, which was essentially a redefnition of EXACTFMTARGS. Testing: GHA, tiers 1-4 ------------- Commit messages: - 8352762: Use EXACTFMT instead of expanded version where applicable Changes: https://git.openjdk.org/jdk/pull/24228/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24228&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352762 Stats: 70 lines in 8 files changed: 0 ins; 20 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/24228.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24228/head:pull/24228 PR: https://git.openjdk.org/jdk/pull/24228 From mgronlun at openjdk.org Tue Mar 25 17:03:24 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 25 Mar 2025 17:03:24 GMT Subject: RFR: 8348907: Stress times out when is executed with ZGC [v6] In-Reply-To: References: Message-ID: > Greetings, > > Here is a suggested solution for solving the intricate deadlock issues involving virtual threads, ZGC load barriers, and JFR. > > A JFR event can be allocated and committed in specific sensitive contexts, such as inside mutex-protected load barriers. If the thread is a virtual thread, JFR determines its thread name by loading the oop from the thread (jt->vthread()) as part of the event commit. > > This operation again triggers the load barrier, which contains a non-reentrant lock, effectively deadlocking the thread with itself. > > So, for specific sensitive event sites, JFR mustn't recurse or reenter into the same event site as part of the event commit. > > After a few iterations and prototypes, which failed because they eventually ended up touching some oop, I came up with the following. > > From a user perspective, an event (site) can now be marked as "non-reentrant" by wrapping it in a helper class. > > This instruction now guarantees JFR will not reenter this site again as part of the event.commit(). > > The tradeoff is that we cannot write the virtual thread name for these sensitive event sites; we will instead report "" as the virtual thread name, which is the default virtual thread name in Java. 
All other information about the thread, such as the thread ID, virtual thread, etc., will still be reported. > > I believe it is a reasonable tradeoff and a general solution for sensitive JFR event sites, which are rare in practice, with minimal impact on event programming. > > Testing: jdk_jfr, stress testing > > Let me know what you think. > > Thanks > Markus Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: renames ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24209/files - new: https://git.openjdk.org/jdk/pull/24209/files/72116f34..f11f33bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24209&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24209&range=04-05 Stats: 18 lines in 8 files changed: 3 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/24209.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24209/head:pull/24209 PR: https://git.openjdk.org/jdk/pull/24209 From mgronlun at openjdk.org Tue Mar 25 17:13:55 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 25 Mar 2025 17:13:55 GMT Subject: RFR: 8348907: Stress times out when is executed with ZGC [v7] In-Reply-To: References: Message-ID: > Greetings, > > Here is a suggested solution for solving the intricate deadlock issues involving virtual threads, ZGC load barriers, and JFR. > > A JFR event can be allocated and committed in specific sensitive contexts, such as inside mutex-protected load barriers. If the thread is a virtual thread, JFR determines its thread name by loading the oop from the thread (jt->vthread()) as part of the event commit. > > This operation again triggers the load barrier, which contains a non-reentrant lock, effectively deadlocking the thread with itself. > > So, for specific sensitive event sites, JFR mustn't recurse or reenter into the same event site as part of the event commit. > > After a few iterations and prototypes, which failed because they eventually ended up touching some oop, I came up with the following. > > From a user perspective, an event (site) can now be marked as "non-reentrant" by wrapping it in a helper class. > > This instruction now guarantees JFR will not reenter this site again as part of the event.commit(). > > The tradeoff is that we cannot write the virtual thread name for these sensitive event sites; we will instead report "" as the virtual thread name, which is the default virtual thread name in Java. All other information about the thread, such as the thread ID, virtual thread, etc., will still be reported. > > I believe it is a reasonable tradeoff and a general solution for sensitive JFR event sites, which are rare in practice, with minimal impact on event programming. > > Testing: jdk_jfr, stress testing > > Let me know what you think. 
> > Thanks > Markus Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: delegate thread assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24209/files - new: https://git.openjdk.org/jdk/pull/24209/files/f11f33bc..8d9e14bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24209&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24209&range=05-06 Stats: 5 lines in 1 file changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24209.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24209/head:pull/24209 PR: https://git.openjdk.org/jdk/pull/24209 From egahlin at openjdk.org Tue Mar 25 17:50:27 2025 From: egahlin at openjdk.org (Erik Gahlin) Date: Tue, 25 Mar 2025 17:50:27 GMT Subject: RFR: 8348907: Stress times out when is executed with ZGC [v7] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 17:13:55 GMT, Markus Gr?nlund wrote: >> Greetings, >> >> Here is a suggested solution for solving the intricate deadlock issues involving virtual threads, ZGC load barriers, and JFR. >> >> A JFR event can be allocated and committed in specific sensitive contexts, such as inside mutex-protected load barriers. If the thread is a virtual thread, JFR determines its thread name by loading the oop from the thread (jt->vthread()) as part of the event commit. >> >> This operation again triggers the load barrier, which contains a non-reentrant lock, effectively deadlocking the thread with itself. >> >> So, for specific sensitive event sites, JFR mustn't recurse or reenter into the same event site as part of the event commit. >> >> After a few iterations and prototypes, which failed because they eventually ended up touching some oop, I came up with the following. >> >> From a user perspective, an event (site) can now be marked as "non-reentrant" by wrapping it in a helper class. >> >> This instruction now guarantees JFR will not reenter this site again as part of the event.commit(). >> >> The tradeoff is that we cannot write the virtual thread name for these sensitive event sites; we will instead report "" as the virtual thread name, which is the default virtual thread name in Java. All other information about the thread, such as the thread ID, virtual thread, etc., will still be reported. >> >> I believe it is a reasonable tradeoff and a general solution for sensitive JFR event sites, which are rare in practice, with minimal impact on event programming. >> >> Testing: jdk_jfr, stress testing >> >> Let me know what you think. >> >> Thanks >> Markus > > Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: > > delegate thread assert Marked as reviewed by egahlin (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24209#pullrequestreview-2714656423 From wkemper at openjdk.org Tue Mar 25 17:51:41 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 25 Mar 2025 17:51:41 GMT Subject: RFR: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled [v4] In-Reply-To: References: Message-ID: > The sequence of events that creates this state: > 1. An old collection is trying to finish marking by flushing SATB buffers with a Handshake > 2. The regulator thread cancels old marking to start a young collection > 3. A mutator thread shortly follows and attempts to cancel the nascent young collection > 4. Step `3` fails (because of this bug) and cancellation reason does _not_ become `allocation failure` > 5. 
The mutator thread enters a tight loop in which it retries allocations without `waiting` > 6. The mutator thread remains in the `thread_in_vm` state and prevents the VM thread from completing step `1`. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Stop casting GCCause to jbyte ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24105/files - new: https://git.openjdk.org/jdk/pull/24105/files/ca45ff02..abce0381 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24105&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24105&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24105.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24105/head:pull/24105 PR: https://git.openjdk.org/jdk/pull/24105 From wkemper at openjdk.org Tue Mar 25 17:51:41 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 25 Mar 2025 17:51:41 GMT Subject: RFR: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled [v4] In-Reply-To: <7WjwCHk4uVXhc0eAyxzIrplCMu0DLQm1U_thb56D0as=.d24099dc-498c-45c5-9cc6-1bffa34a5c05@github.com> References: <7WjwCHk4uVXhc0eAyxzIrplCMu0DLQm1U_thb56D0as=.d24099dc-498c-45c5-9cc6-1bffa34a5c05@github.com> Message-ID: On Tue, 25 Mar 2025 11:08:58 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2143: >> >>> 2141: >>> 2142: bool ShenandoahHeap::try_cancel_gc(GCCause::Cause cause) { >>> 2143: const jbyte prev = _cancelled_gc.xchg(cause); >> >> I guess maybe we want cause and prev to be integer type. Then the template will expand into a type that is known to that Atomic::xchg operation. > > So this thing is no longer `jbyte`, so implicit cast to `jbyte` is no longer safe. I think we should really be casting to `GCCause::Cause`. Yes! Good catch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24105#discussion_r2012624868 From shade at openjdk.org Tue Mar 25 18:11:12 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Mar 2025 18:11:12 GMT Subject: RFR: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled [v4] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 17:51:41 GMT, William Kemper wrote: >> The sequence of events that creates this state: >> 1. An old collection is trying to finish marking by flushing SATB buffers with a Handshake >> 2. The regulator thread cancels old marking to start a young collection >> 3. A mutator thread shortly follows and attempts to cancel the nascent young collection >> 4. Step `3` fails (because of this bug) and cancellation reason does _not_ become `allocation failure` >> 5. The mutator thread enters a tight loop in which it retries allocations without `waiting` >> 6. The mutator thread remains in the `thread_in_vm` state and prevents the VM thread from completing step `1`. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Stop casting GCCause to jbyte I think we want to remove `addr_of` and related methods for `ShenandoahSharedEnumFlag` to avoid accidents. `ShenandoahSharedValue` was defined specifically to stick to `jbyte` for the sake of generated code. If we are not expected to have accesses to generated code to this flag, we should remove the APIs that allow it. Going forward, I think we should consider redefining `ShenandoahSharedValue` to `uint32_t` to begin with. 
This would require fiddling with barrier sets that might read them. ------------- PR Review: https://git.openjdk.org/jdk/pull/24105#pullrequestreview-2714735454 From wkemper at openjdk.org Tue Mar 25 19:03:07 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 25 Mar 2025 19:03:07 GMT Subject: RFR: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled [v4] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 17:51:41 GMT, William Kemper wrote: >> The sequence of events that creates this state: >> 1. An old collection is trying to finish marking by flushing SATB buffers with a Handshake >> 2. The regulator thread cancels old marking to start a young collection >> 3. A mutator thread shortly follows and attempts to cancel the nascent young collection >> 4. Step `3` fails (because of this bug) and cancellation reason does _not_ become `allocation failure` >> 5. The mutator thread enters a tight loop in which it retries allocations without `waiting` >> 6. The mutator thread remains in the `thread_in_vm` state and prevents the VM thread from completing step `1`. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Stop casting GCCause to jbyte `ShenandoahSharedEnumFlag` is only used for this one variable in `ShenandoahHeap`, do you want to remove it entirely and just have a plain `volatile GCCause::Cause _gc_cancelled` member? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24105#issuecomment-2752251839 From shade at openjdk.org Tue Mar 25 19:31:11 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Mar 2025 19:31:11 GMT Subject: RFR: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled [v4] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 19:00:53 GMT, William Kemper wrote: > `ShenandoahSharedEnumFlag` is only used for this one variable in `ShenandoahHeap`, do you want to remove it entirely and just have a plain `volatile GCCause::Cause _gc_cancelled` member? Maybe? I was suspecting we want to have padding around the field to make sure we do not accidentally false-share it with anything. But that might not be a real issue. I think we should keep wrapping shared variables in `ShenandoahShared*` to clearly capture which fields are normally accessed by multiple threads, as to encapsulate all the atomic ops. Actually, leave `addr_of` alone, but file a RFE to redefine `ShenandoahSharedValue` to `uint32_t`, which would eliminate the deviation for the underlying `ShenandoahSharedEnumFlag`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24105#issuecomment-2752314412 From shade at openjdk.org Tue Mar 25 19:37:17 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 25 Mar 2025 19:37:17 GMT Subject: RFR: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled [v4] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 17:51:41 GMT, William Kemper wrote: >> The sequence of events that creates this state: >> 1. An old collection is trying to finish marking by flushing SATB buffers with a Handshake >> 2. The regulator thread cancels old marking to start a young collection >> 3. A mutator thread shortly follows and attempts to cancel the nascent young collection >> 4. Step `3` fails (because of this bug) and cancellation reason does _not_ become `allocation failure` >> 5. The mutator thread enters a tight loop in which it retries allocations without `waiting` >> 6. 
The mutator thread remains in the `thread_in_vm` state and prevents the VM thread from completing step `1`. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Stop casting GCCause to jbyte Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24105#pullrequestreview-2714976353 From wkemper at openjdk.org Tue Mar 25 19:52:18 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 25 Mar 2025 19:52:18 GMT Subject: Integrated: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 21:51:34 GMT, William Kemper wrote: > The sequence of events that creates this state: > 1. An old collection is trying to finish marking by flushing SATB buffers with a Handshake > 2. The regulator thread cancels old marking to start a young collection > 3. A mutator thread shortly follows and attempts to cancel the nascent young collection > 4. Step `3` fails (because of this bug) and cancellation reason does _not_ become `allocation failure` > 5. The mutator thread enters a tight loop in which it retries allocations without `waiting` > 6. The mutator thread remains in the `thread_in_vm` state and prevents the VM thread from completing step `1`. This pull request has now been integrated. Changeset: dbc620fb Author: William Kemper URL: https://git.openjdk.org/jdk/commit/dbc620fb1f754ca84f2a07abfdfbd4c5fcb55087 Stats: 15 lines in 2 files changed: 7 ins; 0 del; 8 mod 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/24105 From wkemper at openjdk.org Tue Mar 25 19:52:18 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 25 Mar 2025 19:52:18 GMT Subject: RFR: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled [v4] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 17:51:41 GMT, William Kemper wrote: >> The sequence of events that creates this state: >> 1. An old collection is trying to finish marking by flushing SATB buffers with a Handshake >> 2. The regulator thread cancels old marking to start a young collection >> 3. A mutator thread shortly follows and attempts to cancel the nascent young collection >> 4. Step `3` fails (because of this bug) and cancellation reason does _not_ become `allocation failure` >> 5. The mutator thread enters a tight loop in which it retries allocations without `waiting` >> 6. The mutator thread remains in the `thread_in_vm` state and prevents the VM thread from completing step `1`. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Stop casting GCCause to jbyte Okay, filed: https://bugs.openjdk.org/browse/JDK-8352914. Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24105#issuecomment-2752357291 From ysr at openjdk.org Tue Mar 25 19:57:15 2025 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Tue, 25 Mar 2025 19:57:15 GMT Subject: RFR: 8348400: GenShen: assert(ShenandoahHeap::heap()->is_full_gc_in_progress() || (used_regions_size() <= _max_capacity)) failed: Cannot use more than capacity # [v2] In-Reply-To: <-73CoqTBA5dJPEwr7bxSvDmMFC9g_LZpW-q7XSjjtrE=.4966fa3b-e98f-4a50-9492-22bf99eecf1f@github.com> References: <-73CoqTBA5dJPEwr7bxSvDmMFC9g_LZpW-q7XSjjtrE=.4966fa3b-e98f-4a50-9492-22bf99eecf1f@github.com> Message-ID: On Wed, 12 Mar 2025 23:17:44 GMT, William Kemper wrote: >> Shenandoah cannot recycle immediate trash regions during the concurrent weak roots phase, however some of these regions may be assigned to the old generation collector's reserve. When an evacuation/promotion tries to allocate in such a region, it will fail (as expected) and try to 'steal' a region from the mutator's partition of the free set. There are cases when this cannot be allowed due to capacity constraints. However, in some of these cases it will be possible to 'swap' a region between the old reserve and the mutator's partition. This change covers this case. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Do not enforce size constraints on generations" > > This reverts commit 11ff0677449fa6749df8830f4a03f1c7861ba314. Generally looks right, but a few comments for your consideration. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1293: > 1291: > 1292: ShenandoahGenerationalHeap* gen_heap = ShenandoahGenerationalHeap::heap(); > 1293: const size_t region_capacity = alloc_capacity(r); A general note on terminology. We have generally used "capacity" to mean the total space, including that which has been allocated, and "used" for the space that has been allocated and isn't available to allocate. I'd use "free" here and avoid the extra arithmetic. I notice that the method actually uses "used", rather than "free". I think the interface for _partitions `move_from_...` is unnecessarily fat. Since we send the region idx to the `move_from_...` method, why not let that method get the amount free, rather than passing it as an additional parameter? I see that we essentially use this value only at line 1300 to correct the evacuation reserve figure. (Side question: Why don't we do that when we do the swap after line 1327?) src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1321: > 1319: > 1320: if (unusable_trash != -1) { > 1321: // 2. Move it to the mutator partition // Move the unusable trash region we found to the mutator partition. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1324: > 1322: _partitions.move_from_partition_to_partition(unusable_trash, > 1323: ShenandoahFreeSetPartitionId::OldCollector, > 1324: ShenandoahFreeSetPartitionId::Mutator, region_capacity); Shouldn't `region_capacity` argument be the free space in the unusable trash region? Wouldn't that be 0 (else why "unusable"?) src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 327: > 325: // hold evacuated objects. If this occurs and memory is still available in the Mutator's free set, we will flip a region from > 326: // the Mutator free set into the Collector or OldCollector free set. > 327: void flip_to_gc(ShenandoahHeapRegion* r); It seems as if (the current implementation of) `flip_to_gc()` always succeeds in flipping. I'd add that to its spec comment. 
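Regarding the "unnecessarily fat" interface remark above, a minimal sketch of what a thinner overload could look like, with the partition code deriving the region's free space itself instead of requiring every call site to compute and forward it. The types and method names below are simplified stand-ins, not the actual Shenandoah classes.

```c++
// Simplified stand-ins for illustration only; not the actual Shenandoah types.
#include <cstddef>

enum class PartitionId { Mutator, Collector, OldCollector };

struct Region {
  size_t region_size;
  size_t used;
  size_t free() const { return region_size - used; }
};

class Partitions {
 public:
  // Existing-style interface: every caller computes and forwards the available bytes.
  void move_from_partition_to_partition(Region& r, PartitionId from, PartitionId to,
                                        size_t available) {
    // ... adjust per-partition membership and capacity accounting by `available` ...
    (void)r; (void)from; (void)to; (void)available;
  }

  // Thinner overload: derive the available bytes from the region itself, so call
  // sites that do not otherwise need the value are not forced to compute it.
  void move_from_partition_to_partition(Region& r, PartitionId from, PartitionId to) {
    move_from_partition_to_partition(r, from, to, r.free());
  }
};
```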
src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 329: > 327: void flip_to_gc(ShenandoahHeapRegion* r); > 328: > 329: bool flip_to_old_gc(ShenandoahHeapRegion* r); // Return true if and only if successfully flipped to old partition. ------------- PR Review: https://git.openjdk.org/jdk/pull/23998#pullrequestreview-2714934231 PR Review Comment: https://git.openjdk.org/jdk/pull/23998#discussion_r2012841573 PR Review Comment: https://git.openjdk.org/jdk/pull/23998#discussion_r2012830401 PR Review Comment: https://git.openjdk.org/jdk/pull/23998#discussion_r2012835985 PR Review Comment: https://git.openjdk.org/jdk/pull/23998#discussion_r2012789624 PR Review Comment: https://git.openjdk.org/jdk/pull/23998#discussion_r2012790543 From wkemper at openjdk.org Tue Mar 25 20:50:14 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 25 Mar 2025 20:50:14 GMT Subject: RFR: 8348400: GenShen: assert(ShenandoahHeap::heap()->is_full_gc_in_progress() || (used_regions_size() <= _max_capacity)) failed: Cannot use more than capacity # [v2] In-Reply-To: References: <-73CoqTBA5dJPEwr7bxSvDmMFC9g_LZpW-q7XSjjtrE=.4966fa3b-e98f-4a50-9492-22bf99eecf1f@github.com> Message-ID: On Tue, 25 Mar 2025 19:50:38 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert "Do not enforce size constraints on generations" >> >> This reverts commit 11ff0677449fa6749df8830f4a03f1c7861ba314. > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1324: > >> 1322: _partitions.move_from_partition_to_partition(unusable_trash, >> 1323: ShenandoahFreeSetPartitionId::OldCollector, >> 1324: ShenandoahFreeSetPartitionId::Mutator, region_capacity); > > Shouldn't `region_capacity` argument be the free space in the unusable trash region? Wouldn't that be 0 (else why "unusable"?) Yes, good catch. However, it won't be `0` because this region is only _temporarily_ unusable while concurrent weak roots is in progress. Elsewhere, when the freeset is rebuilt, the `alloc_capacity` of trash regions is considered equal to the region size (regardless if weak roots is in progress). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23998#discussion_r2012916069 From wkemper at openjdk.org Tue Mar 25 21:03:11 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 25 Mar 2025 21:03:11 GMT Subject: RFR: 8348400: GenShen: assert(ShenandoahHeap::heap()->is_full_gc_in_progress() || (used_regions_size() <= _max_capacity)) failed: Cannot use more than capacity # [v2] In-Reply-To: References: <-73CoqTBA5dJPEwr7bxSvDmMFC9g_LZpW-q7XSjjtrE=.4966fa3b-e98f-4a50-9492-22bf99eecf1f@github.com> Message-ID: <4koDTG-c84SvK4641HlEpHJ-ICUze2za6BnZkchYdIA=.a6b39699-53da-4514-b48f-82f990d85b59@github.com> On Tue, 25 Mar 2025 19:53:35 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert "Do not enforce size constraints on generations" >> >> This reverts commit 11ff0677449fa6749df8830f4a03f1c7861ba314. > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1293: > >> 1291: >> 1292: ShenandoahGenerationalHeap* gen_heap = ShenandoahGenerationalHeap::heap(); >> 1293: const size_t region_capacity = alloc_capacity(r); > > A general note on terminology. 
We have generally used "capacity" to mean the total space, including that which has been allocated, and "used" for the space that has been allocated and isn't available to allocate. I'd use "free" here and avoid the extra arithmetic. > > I notice that the method actually uses "used", rather than "free". > > I think the interface for _partitions `move_from_...` is unnecessarily fat. Since we send the region idx to the `move_from_...` method, why not let that method get the amount free, rather than passing it as an additional parameter? > > I see that we essentially use this value only at line 1300 to correct the evacuation reserve figure. (Side question: Why don't we do that when we do the swap after line 1327?) I see your point, but there are cases where the business logic depends on the allocation capacity before adding the region to the freeset. In those cases, we'd compute allocation capacity twice. We could have an overload to compute and forward allocation capacity for the other cases? That's a good question. We should probably subtract the capacity of the unusable trash region from the reserve, and add the capacity of the usable region back in. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23998#discussion_r2012933965 From wkemper at openjdk.org Tue Mar 25 21:16:06 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 25 Mar 2025 21:16:06 GMT Subject: RFR: 8348400: GenShen: assert(ShenandoahHeap::heap()->is_full_gc_in_progress() || (used_regions_size() <= _max_capacity)) failed: Cannot use more than capacity # [v3] In-Reply-To: References: Message-ID: > Shenandoah cannot recycle immediate trash regions during the concurrent weak roots phase, however some of these regions may be assigned to the old generation collector's reserve. When an evacuation/promotion tries to allocate in such a region, it will fail (as expected) and try to 'steal' a region from the mutator's partition of the free set. There are cases when this cannot be allowed due to capacity constraints. However, in some of these cases it will be possible to 'swap' a region between the old reserve and the mutator's partition. This change covers this case. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Update evac reserve when swapping trash region for non-trash region - Use capacity of transferred region - Improve comments - Merge remote-tracking branch 'jdk/master' into fix-flip-to-old-reserve - Revert "Do not enforce size constraints on generations" This reverts commit 11ff0677449fa6749df8830f4a03f1c7861ba314. 
- Do not enforce size constraints on generations This will make it easier for the old generation collector to take regions from the mutator when necessary - Don't allocate in regions that cannot be flipped to old gc - Do not allocate from mutator if young gen cannot spare the region ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23998/files - new: https://git.openjdk.org/jdk/pull/23998/files/a42efe5a..1807b2ac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23998&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23998&range=01-02 Stats: 121111 lines in 3547 files changed: 60844 ins; 38996 del; 21271 mod Patch: https://git.openjdk.org/jdk/pull/23998.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23998/head:pull/23998 PR: https://git.openjdk.org/jdk/pull/23998 From ysr at openjdk.org Tue Mar 25 21:28:10 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 25 Mar 2025 21:28:10 GMT Subject: RFR: 8348400: GenShen: assert(ShenandoahHeap::heap()->is_full_gc_in_progress() || (used_regions_size() <= _max_capacity)) failed: Cannot use more than capacity # [v3] In-Reply-To: References: Message-ID: <6ibXMioOPZ80OpozxJbnv9WsWQO2aKxiRIxWrteaDxs=.685aaf7e-9e56-4ab5-8737-b10e9c0d784a@github.com> On Tue, 25 Mar 2025 21:16:06 GMT, William Kemper wrote: >> Shenandoah cannot recycle immediate trash regions during the concurrent weak roots phase, however some of these regions may be assigned to the old generation collector's reserve. When an evacuation/promotion tries to allocate in such a region, it will fail (as expected) and try to 'steal' a region from the mutator's partition of the free set. There are cases when this cannot be allowed due to capacity constraints. However, in some of these cases it will be possible to 'swap' a region between the old reserve and the mutator's partition. This change covers this case. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Update evac reserve when swapping trash region for non-trash region > - Use capacity of transferred region > - Improve comments > - Merge remote-tracking branch 'jdk/master' into fix-flip-to-old-reserve > - Revert "Do not enforce size constraints on generations" > > This reverts commit 11ff0677449fa6749df8830f4a03f1c7861ba314. > - Do not enforce size constraints on generations > > This will make it easier for the old generation collector to take regions from the mutator when necessary > - Don't allocate in regions that cannot be flipped to old gc > - Do not allocate from mutator if young gen cannot spare the region Thanks for the more careful arithmetic in the adjustments. Let's rerun GHA and testing to make sure this doesn't have any knock-on effects that trigger other checks elsewhere. Looks good otherwise. ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23998#pullrequestreview-2715240382 From wkemper at openjdk.org Tue Mar 25 23:20:39 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 25 Mar 2025 23:20:39 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v13] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 22:48:24 GMT, Xiaolong Peng wrote: >> There are some scenarios in which GenShen may have improper remembered set verification logic: >> >> 1. 
Concurrent young cycles following a Full GC: >> >> In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification >> >> >> ShenandoahVerifier >> ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { >> shenandoah_assert_generations_reconciled(); >> if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { >> return _heap->complete_marking_context(); >> } >> return nullptr; >> } >> >> >> For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. >> >> 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. >> >> 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. >> >> 4. After concurrent young cycle evacuates objects from a young region, it update refs using marking bitmaps from marking context, therefore it won't update references of dead old objects(is_marked(obj) is false: obj is not marking strong/weak and it is below tams). In this case, if the next cycle if global concurrent GC, remembered set can't be verified before init-mark because of the dead pointers. >> >> ### Solution >> * After a full GC, always set marking completeness flag to false after reseting the marking bitmaps. >> * Because there could be dead pointers in old gen were not updated to point to new address after evacuation and refs update, we should disable rem-set validation before init-mark&update-refs if old marking context is incomplete. >> >> ### Test >> - [x] `make test TEST=hotspot_gc_shenandoah` >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > tide up Do we think this will fix https://bugs.openjdk.org/browse/JDK-8345399, should we add it as an issue to this PR? ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24092#pullrequestreview-2715405105 From kdnilsen at openjdk.org Tue Mar 25 23:20:39 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 25 Mar 2025 23:20:39 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v13] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 22:48:24 GMT, Xiaolong Peng wrote: >> There are some scenarios in which GenShen may have improper remembered set verification logic: >> >> 1. 
Concurrent young cycles following a Full GC: >> >> In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification >> >> >> ShenandoahVerifier >> ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { >> shenandoah_assert_generations_reconciled(); >> if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { >> return _heap->complete_marking_context(); >> } >> return nullptr; >> } >> >> >> For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. >> >> 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. >> >> 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. >> >> 4. After concurrent young cycle evacuates objects from a young region, it update refs using marking bitmaps from marking context, therefore it won't update references of dead old objects(is_marked(obj) is false: obj is not marking strong/weak and it is below tams). In this case, if the next cycle if global concurrent GC, remembered set can't be verified before init-mark because of the dead pointers. >> >> ### Solution >> * After a full GC, always set marking completeness flag to false after reseting the marking bitmaps. >> * Because there could be dead pointers in old gen were not updated to point to new address after evacuation and refs update, we should disable rem-set validation before init-mark&update-refs if old marking context is incomplete. >> >> ### Test >> - [x] `make test TEST=hotspot_gc_shenandoah` >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > tide up Thanks for the refinements. LGTM. ------------- Marked as reviewed by kdnilsen (Committer). PR Review: https://git.openjdk.org/jdk/pull/24092#pullrequestreview-2715403856 From wkemper at openjdk.org Tue Mar 25 23:23:13 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 25 Mar 2025 23:23:13 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v4] In-Reply-To: References: <__h_W-Ubi-14v0aDUciY2v5VuQnFHJOlabA7ZWIQcQM=.367913e0-84c3-46c5-86de-981509367951@github.com> Message-ID: On Mon, 24 Mar 2025 23:08:18 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 137: >> >>> 135: // GC is starting, bump the internal gc count and set GCIdMark >>> 136: update_gc_count(); >>> 137: GCIdMark gc_id_mark; >> >> Can we still set the `GCIdMark` with our internal counter? I'd prefer they stay in sync explicitly. > > @earthling-amzn : Is your concern that GC count is incremented concurrently by two different callers? If so, I'd have the atomic increment return the pre- or post-increment value as the case may be and have the caller use that in their mark label. (Question: do we have different Id's for young and a concurrent/interrupted old? 
-- I would imagine so, with the old carrying an older id, and each subsequent young getting a newer id). My concern was more that `ShenandoahController::_gc_id` hides a field in its base class `NamedThread::_gc_id`, but `ShenandoahController::_gc_id` starts from `1`, while `NamedThread::_gc_id` starts from `0`. I think this will be addressed in a separate PR. This PR has been simplified to only fix the root cause of the assertion failure. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24166#discussion_r2013061689 From wkemper at openjdk.org Tue Mar 25 23:23:12 2025 From: wkemper at openjdk.org (William Kemper) Date: Tue, 25 Mar 2025 23:23:12 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v6] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 22:27:45 GMT, Xiaolong Peng wrote: >> ### Root cause >> Shenandoah has its own way to generate gc id([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L234), [link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahController.hpp#L43)), but when it runs a specific GC cycle, it still use the default GCIdMark([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L389)) to generate a gc id and set it to NamedThread::_gc_id. Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is `undefined`, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events. >> >> ### Solution >> it is confusing that Shenandoah generates its own gc id but not use it for GC logging and JFR, the solution is fairly simple, the control thread just need inject gc id with GCIdMark(gc_id) it generates in `ShenandoahControlThread::run_service` and `ShenandoahGenerationalControlThread::run_gc_cycle` >> >> In the test, I also noticed the value of gc_id generated by Shenandoah control thread starts from 1, which is different from the default behavior of GCIdMark which generates id starting from 0, this PR will also fix it. >> >> ### Test >> - [x] TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording" >> - [x] TEST=hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Revert ShenandoahController::_gc_count related refactor LGTM ------------- Marked as reviewed by wkemper (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24166#pullrequestreview-2715408360 From ysr at openjdk.org Tue Mar 25 23:32:09 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 25 Mar 2025 23:32:09 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v6] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 22:27:45 GMT, Xiaolong Peng wrote: >> ### Root cause >> Shenandoah has its own way to generate gc id([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L234), [link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahController.hpp#L43)), but when it runs a specific GC cycle, it still use the default GCIdMark([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L389)) to generate a gc id and set it to NamedThread::_gc_id. 
Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is `undefined`, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events. >> >> ### Solution >> it is confusing that Shenandoah generates its own gc id but not use it for GC logging and JFR, the solution is fairly simple, the control thread just need inject gc id with GCIdMark(gc_id) it generates in `ShenandoahControlThread::run_service` and `ShenandoahGenerationalControlThread::run_gc_cycle` >> >> In the test, I also noticed the value of gc_id generated by Shenandoah control thread starts from 1, which is different from the default behavior of GCIdMark which generates id starting from 0, this PR will also fix it. >> >> ### Test >> - [x] TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording" >> - [x] TEST=hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Revert ShenandoahController::_gc_count related refactor ? ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24166#pullrequestreview-2715416184 From ysr at openjdk.org Tue Mar 25 23:32:10 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 25 Mar 2025 23:32:10 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v4] In-Reply-To: References: <__h_W-Ubi-14v0aDUciY2v5VuQnFHJOlabA7ZWIQcQM=.367913e0-84c3-46c5-86de-981509367951@github.com> Message-ID: On Tue, 25 Mar 2025 23:20:28 GMT, William Kemper wrote: >> @earthling-amzn : Is your concern that GC count is incremented concurrently by two different callers? If so, I'd have the atomic increment return the pre- or post-increment value as the case may be and have the caller use that in their mark label. (Question: do we have different Id's for young and a concurrent/interrupted old? -- I would imagine so, with the old carrying an older id, and each subsequent young getting a newer id). > > My concern was more that `ShenandoahController::_gc_id` hides a field in its base class `NamedThread::_gc_id`, but `ShenandoahController::_gc_id` starts from `1`, while `NamedThread::_gc_id` starts from `0`. I think this will be addressed in a separate PR. This PR has been simplified to only fix the root cause of the assertion failure. ah, i see. Good point. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24166#discussion_r2013067152 From xpeng at openjdk.org Tue Mar 25 23:32:21 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 25 Mar 2025 23:32:21 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v13] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 23:17:06 GMT, William Kemper wrote: > Do we think this will fix https://bugs.openjdk.org/browse/JDK-8345399, should we add it as an issue to this PR? It will likely fix the JDK-8345399, I mentioned it in JBS. will see if I can get a ppc64le hardware to verify this week. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24092#issuecomment-2752762251 From xpeng at openjdk.org Tue Mar 25 23:53:07 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 25 Mar 2025 23:53:07 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v6] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 22:27:45 GMT, Xiaolong Peng wrote: >> ### Root cause >> Shenandoah has its own way to generate gc id([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L234), [link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahController.hpp#L43)), but when it runs a specific GC cycle, it still use the default GCIdMark([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L389)) to generate a gc id and set it to NamedThread::_gc_id. Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is `undefined`, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events. >> >> ### Solution >> it is confusing that Shenandoah generates its own gc id but not use it for GC logging and JFR, the solution is fairly simple, the control thread just need inject gc id with GCIdMark(gc_id) it generates in `ShenandoahControlThread::run_service` and `ShenandoahGenerationalControlThread::run_gc_cycle` >> >> In the test, I also noticed the value of gc_id generated by Shenandoah control thread starts from 1, which is different from the default behavior of GCIdMark which generates id starting from 0, this PR will also fix it. >> >> ### Test >> - [x] TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording" >> - [x] TEST=hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Revert ShenandoahController::_gc_count related refactor Thanks for the reviews and suggestions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24166#issuecomment-2752786931 From duke at openjdk.org Tue Mar 25 23:53:08 2025 From: duke at openjdk.org (duke) Date: Tue, 25 Mar 2025 23:53:08 GMT Subject: RFR: 8352588: GenShen: Enabling JFR asserts when getting GCId [v6] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 22:27:45 GMT, Xiaolong Peng wrote: >> ### Root cause >> Shenandoah has its own way to generate gc id([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L234), [link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahController.hpp#L43)), but when it runs a specific GC cycle, it still use the default GCIdMark([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L389)) to generate a gc id and set it to NamedThread::_gc_id. Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is `undefined`, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events. 
>> >> ### Solution >> it is confusing that Shenandoah generates its own gc id but not use it for GC logging and JFR, the solution is fairly simple, the control thread just need inject gc id with GCIdMark(gc_id) it generates in `ShenandoahControlThread::run_service` and `ShenandoahGenerationalControlThread::run_gc_cycle` >> >> In the test, I also noticed the value of gc_id generated by Shenandoah control thread starts from 1, which is different from the default behavior of GCIdMark which generates id starting from 0, this PR will also fix it. >> >> ### Test >> - [x] TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording" >> - [x] TEST=hotspot_gc_shenandoah >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Revert ShenandoahController::_gc_count related refactor @pengxiaolong Your change (at version 57c43ef3417dec14a80e3f4dcab7d3666e93b033) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24166#issuecomment-2752787229 From ysr at openjdk.org Wed Mar 26 00:20:11 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 26 Mar 2025 00:20:11 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v13] In-Reply-To: References: <_GG5htdXFZ2Jv3qTAyG6djSrvXDtGx-jTLGoA2JbEXU=.b8588ac1-e51f-4ddf-afda-c64e6a789440@github.com> <7pNU1UWNVucen0QwESfFkOiKIP59gBVZNF5gCHveOQ0=.144c0152-20a4-41f4-8929-b906a284be7f@github.com> <2jgKqoBrD8WfxKs9cLqfzWa5AMS__muV4O0IxTiWFbA=.b41179e5-5252-476f-8bc9-c42e2a6a507b@github.com> Message-ID: On Fri, 21 Mar 2025 15:08:22 GMT, Xiaolong Peng wrote: >> I was suggesting looking to see if normal perf measures showed any improvements. E.g. if you ran say SPECjbb and compared the remset scan times for the minor GC's that followed global collections. 
> > I have run h2 benchmark, here is the remembered set scan times after a global GC, it does seem to improve remembered set scan time in this case: > > PR version: > > [2025-03-21T07:35:41.801+0000][10.292s][19715][info ][gc ] GC(6) Concurrent remembered set scanning 13.069ms > [2025-03-21T07:35:48.088+0000][16.579s][19715][info ][gc ] GC(9) Concurrent remembered set scanning 5.537ms > [2025-03-21T07:35:56.610+0000][25.101s][19715][info ][gc ] GC(14) Concurrent remembered set scanning 6.186ms > [2025-03-21T07:36:03.967+0000][32.459s][19715][info ][gc ] GC(18) Concurrent remembered set scanning 9.562ms > [2025-03-21T07:36:11.234+0000][39.725s][19715][info ][gc ] GC(22) Concurrent remembered set scanning 2.591ms > [2025-03-21T07:36:17.303+0000][45.794s][19715][info ][gc ] GC(25) Concurrent remembered set scanning 0.999ms > [2025-03-21T07:36:25.647+0000][54.139s][19715][info ][gc ] GC(30) Concurrent remembered set scanning 1.665ms > [2025-03-21T07:36:32.790+0000][61.281s][19715][info ][gc ] GC(33) Concurrent remembered set scanning 2.851ms > [2025-03-21T07:36:40.241+0000][68.732s][19715][info ][gc ] GC(36) Concurrent remembered set scanning 0.716ms > [2025-03-21T07:36:47.440+0000][75.931s][19715][info ][gc ] GC(39) Concurrent remembered set scanning 1.932ms > > > master: > > [2025-03-21T07:34:04.978+0000][10.765s][17923][info ][gc ] GC(6) Concurrent remembered set scanning 22.813ms > [2025-03-21T07:34:11.250+0000][17.038s][17923][info ][gc ] GC(9) Concurrent remembered set scanning 14.457ms > [2025-03-21T07:34:18.692+0000][24.480s][17923][info ][gc ] GC(14) Concurrent remembered set scanning 4.972ms > [2025-03-21T07:34:26.033+0000][31.820s][17923][info ][gc ] GC(18) Concurrent remembered set scanning 9.134ms > [2025-03-21T07:34:34.416+0000][40.203s][17923][info ][gc ] GC(22) Concurrent remembered set scanning 3.655ms > [2025-03-21T07:34:42.180+0000][47.967s][17923][info ][gc ] GC(26) Concurrent remembered set scanning 3.253ms > [2025-03-21T07:34:49.371+0000][55.168s][17923][info ][gc ] GC(29) Concurrent remembered set scanning 1.615ms > [2025-03-21T07:34:56.592+0000][62.396s][17923][info ][gc ] GC(32) Concurrent remembered set scanning 1.570ms > [2025-03-21T07:35:03.766+0000][69.575s][17923][info ][gc ] GC(35) Concurrent remembered set scanning 1.040ms > [2025-03-21T07:35:10.941+0000][... very cool! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2013099621 From Monica.Beckwith at microsoft.com Wed Mar 26 00:48:39 2025 From: Monica.Beckwith at microsoft.com (Monica Beckwith) Date: Wed, 26 Mar 2025 00:48:39 +0000 Subject: [EXTERNAL] Re: Moving Forward with AHS for G1 In-Reply-To: References: Message-ID: Hi Man, Thanks for the update ? great to see your patch moving forward. Your changes look well-aligned with the direction Thomas outlined and what I summarized earlier: SoftMaxHeapSize guiding committed memory, integrated into resizing and IHOP logic, without overriding other tunables or introducing hard caps. I?ve updated JDK-8236073 to reflect both your and Thomas?s contributions and reassigned it to you. This change forms a solid foundation as we move toward a more responsive, multi-signal controller (AHS) for G1. Appreciate you picking this up and looking forward to seeing the PR evolve. 
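For anyone following along, here is a minimal usage sketch of how the knob is meant to be exercised once the G1 support lands (assuming the flag stays manageable for G1, as it already is for ZGC and Shenandoah today; the sizes and the pid are just placeholders):

    # Hard ceiling plus a softer target for committed heap
    java -XX:+UseG1GC -Xmx8g -XX:SoftMaxHeapSize=4g ...

    # Lower the soft target at runtime, e.g. when the host comes under memory pressure
    # (VM.set_flag takes the value in bytes; this is 2 GB)
    jcmd <pid> VM.set_flag SoftMaxHeapSize 2147483648

The collector treats the soft target as a goal for committed memory rather than a hard cap, so the heap can still grow toward -Xmx when the application genuinely needs it.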
Best, Monica ________________________________ From: Man Cao Sent: Friday, March 21, 2025 5:54 PM To: Monica Beckwith Cc: Thomas Schatzl ; hotspot-gc-dev at openjdk.org Subject: [EXTERNAL] Re: Moving Forward with AHS for G1 Thank you for the summary and volunteering on this work! Apologies for the lack of response from our side, due to other tasks and priorities. I have been experimenting with implementing SoftMaxHeapSize for G1 (JDK-8236073), and using this knob instead of ProposedHeapSize for Google's AHS project. I could probably send out a GitHub PR next week. From our side, we would really like to make sure the implementations for SoftMaxHeapSize (JDK-8236073) and CurrentMaxHeapSize (JDK-8204088) work with Google's AHS project. It would be more effective if we could test with our internal workload and benchmarks during development. Is it OK if I pick up the work for SoftMaxHeapSize (JDK-8236073) and CurrentMaxHeapSize (JDK-8204088)? -Man From ysr at openjdk.org Wed Mar 26 00:56:20 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 26 Mar 2025 00:56:20 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v13] In-Reply-To: References: Message-ID: On Thu, 20 Mar 2025 22:48:24 GMT, Xiaolong Peng wrote: >> There are some scenarios in which GenShen may have improper remembered set verification logic: >> >> 1. Concurrent young cycles following a Full GC: >> >> At the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting the marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification >> >> >> ShenandoahVerifier >> ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { >> shenandoah_assert_generations_reconciled(); >> if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { >> return _heap->complete_marking_context(); >> } >> return nullptr; >> } >> >> >> For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected results. >> >> 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always returns a marking context for global GC, but the marking bitmaps are already reset before init-mark, so `ShenandoahVerifier::help_verify_region_rem_set` always skips verification in this case. >> >> 3. ShenandoahConcurrentGC always cleans the remembered set read table, but only swaps the read/write tables when the gc generation is young; this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. >> >> 4. After a concurrent young cycle evacuates objects from a young region, it updates refs using marking bitmaps from the marking context, therefore it won't update references of dead old objects (is_marked(obj) is false: obj is not marked strong/weak and it is below tams). In this case, if the next cycle is a global concurrent GC, the remembered set can't be verified before init-mark because of the dead pointers. >> >> ### Solution >> * After a full GC, always set the marking completeness flag to false after resetting the marking bitmaps.
>> * Because there could be dead pointers in old gen were not updated to point to new address after evacuation and refs update, we should disable rem-set validation before init-mark&update-refs if old marking context is incomplete. >> >> ### Test >> - [x] `make test TEST=hotspot_gc_shenandoah` >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > tide up I think the change can be pushed as is, but I am not convinced that the verification can't be tightened when old marking information is missing as long as we have a valid TAMS and there are no unparsable objects (which should only happen when coalease-&-fill has been interrupted, leaving dead objects with x-gen pointers that would cause false positives or upon class unloading when dead objects may end up being unparsable). The current condition of skipping verification when old bit maps are cleared seems to miss verification opportunities that would be valid after a completed C&F. Left some related comments, but I won't hold back this PR further. The tightening can be done subsequently (and I am happy to pick that up afterwards as needed). Thanks for your patience with my tardy and long-winded reviews! :-) src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1060: > 1058: VerifyRememberedSet verify_remembered_set = _verify_remembered_before_marking; > 1059: if (_heap->mode()->is_generational() && > 1060: !_heap->old_generation()->is_mark_complete()) { Why not the following stronger condition to skip verification? My sense is that the only case we cannot verify is if we do not have marking info _and_ old gen has been left "unparsable" (because of an incomplete/interrupted C&F which may have us look at dead objects -- that are either unparsable because of class unloading, or are parsable but hold cross-gen pointers). In all other cases, we can do a safe and complete verification. is_generational() && !old_gen->is_mark_complete() && !old_gen->is_parsable() src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1125: > 1123: VerifyRememberedSet verify_remembered_set = _verify_remembered_before_updating_references; > 1124: if (_heap->mode()->is_generational() && > 1125: !_heap->old_generation()->is_mark_complete()) { Same comment re stronger condition as previous one above. ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24092#pullrequestreview-2715476210 PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2013133891 PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2013142910 From ysr at openjdk.org Wed Mar 26 00:56:21 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 26 Mar 2025 00:56:21 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v13] In-Reply-To: References: <_GG5htdXFZ2Jv3qTAyG6djSrvXDtGx-jTLGoA2JbEXU=.b8588ac1-e51f-4ddf-afda-c64e6a789440@github.com> <7pNU1UWNVucen0QwESfFkOiKIP59gBVZNF5gCHveOQ0=.144c0152-20a4-41f4-8929-b906a284be7f@github.com> <2jgKqoBrD8WfxKs9cLqfzWa5AMS__muV4O0IxTiWFbA=.b41179e5-5252-476f-8bc9-c42e2a6a507b@github.com> Message-ID: On Wed, 26 Mar 2025 00:17:44 GMT, Y. 
Srinivas Ramakrishna wrote: >> I have run h2 benchmark, here is the remembered set scan times after a global GC, it does seem to improve remembered set scan time in this case: >> >> PR version: >> >> [2025-03-21T07:35:41.801+0000][10.292s][19715][info ][gc ] GC(6) Concurrent remembered set scanning 13.069ms >> [2025-03-21T07:35:48.088+0000][16.579s][19715][info ][gc ] GC(9) Concurrent remembered set scanning 5.537ms >> [2025-03-21T07:35:56.610+0000][25.101s][19715][info ][gc ] GC(14) Concurrent remembered set scanning 6.186ms >> [2025-03-21T07:36:03.967+0000][32.459s][19715][info ][gc ] GC(18) Concurrent remembered set scanning 9.562ms >> [2025-03-21T07:36:11.234+0000][39.725s][19715][info ][gc ] GC(22) Concurrent remembered set scanning 2.591ms >> [2025-03-21T07:36:17.303+0000][45.794s][19715][info ][gc ] GC(25) Concurrent remembered set scanning 0.999ms >> [2025-03-21T07:36:25.647+0000][54.139s][19715][info ][gc ] GC(30) Concurrent remembered set scanning 1.665ms >> [2025-03-21T07:36:32.790+0000][61.281s][19715][info ][gc ] GC(33) Concurrent remembered set scanning 2.851ms >> [2025-03-21T07:36:40.241+0000][68.732s][19715][info ][gc ] GC(36) Concurrent remembered set scanning 0.716ms >> [2025-03-21T07:36:47.440+0000][75.931s][19715][info ][gc ] GC(39) Concurrent remembered set scanning 1.932ms >> >> >> master: >> >> [2025-03-21T07:34:04.978+0000][10.765s][17923][info ][gc ] GC(6) Concurrent remembered set scanning 22.813ms >> [2025-03-21T07:34:11.250+0000][17.038s][17923][info ][gc ] GC(9) Concurrent remembered set scanning 14.457ms >> [2025-03-21T07:34:18.692+0000][24.480s][17923][info ][gc ] GC(14) Concurrent remembered set scanning 4.972ms >> [2025-03-21T07:34:26.033+0000][31.820s][17923][info ][gc ] GC(18) Concurrent remembered set scanning 9.134ms >> [2025-03-21T07:34:34.416+0000][40.203s][17923][info ][gc ] GC(22) Concurrent remembered set scanning 3.655ms >> [2025-03-21T07:34:42.180+0000][47.967s][17923][info ][gc ] GC(26) Concurrent remembered set scanning 3.253ms >> [2025-03-21T07:34:49.371+0000][55.168s][17923][info ][gc ] GC(29) Concurrent remembered set scanning 1.615ms >> [2025-03-21T07:34:56.592+0000][62.396s][17923][info ][gc ] GC(32) Concurrent remembered set scanning 1.570ms >> [2025-03-21T07:35:03.766+0000][69.575s][17923][info ][gc ] GC(35) Concurrent remembere... > > very cool! May be as intended earlier, leave a documentation comment between lines 697 & 698 along the lines of Kelvin's comment: // After we swap card table below, the write-table is all clean, and the read table holds // cards dirty prior to the start of GC. Young and bootstrap collection will update // the write card table as a side effect of remembered set scanning. Global collection will // update the card table as a side effect of global marking of old objects. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2013107472 From Monica.Beckwith at microsoft.com Wed Mar 26 02:33:48 2025 From: Monica.Beckwith at microsoft.com (Monica Beckwith) Date: Wed, 26 Mar 2025 02:33:48 +0000 Subject: Moving Forward with AHS for G1 In-Reply-To: References: Message-ID: Hi Ivan, Thanks for the note ? and nice to meet you! The refinements you're working on around GCTimeRatio and memory uncommit are valuable contributions to the broader AHS direction we've been shaping. They align closely with the multi-input heap sizing model Thomas and I outlined ? especially the emphasis on GC cost (via GCTimeRatio) and memory responsiveness as primary drivers. 
These kinds of enhancements are central to making G1's heap sizing more adaptive and responsive, particularly in environments with shifting workload patterns. I'm especially interested in your work around improving the GC time-base - it seems like a crucial piece for coordinating GC-triggered adjustments more precisely. Given the growing collaboration across contributors, I've been thinking of opening an umbrella issue to track these efforts and possibly drafting a JEP to help clarify and unify the overall scope. With Oracle, Google, and others actively contributing, it's exciting to see a shared vision taking shape, and your work is clearly part of it. I'm genuinely excited to see this come together. Looking forward to continuing the discussion and shaping the future of G1 ergonomics together. Best, Monica ________________________________ From: Ivan Walulya Sent: Monday, March 24, 2025 6:43 AM To: Monica Beckwith Cc: Thomas Schatzl ; hotspot-gc-dev at openjdk.org Subject: [EXTERNAL] Re: Moving Forward with AHS for G1 Hi, Thanks for the summary. At Oracle, we are refining the use of GCTimeRatio and enhancing the memory uncommit mechanism [3]. Specifically, we are exploring uncommitting memory during any GC event, rather than restricting it to Remark or Full GCs, as in the current implementation. Additionally, we are investigating ways to improve on the current use of GC events as a time-base. // Ivan On 21 Mar 2025, at 01:19, Monica Beckwith wrote: Hi all, Following up on the previous discussions around Automatic Heap Sizing (AHS) for G1, I wanted to summarize the key takeaways and outline the next steps. In my November message [1], I described how AHS could dynamically manage heap sizing based on multiple inputs, including global memory pressure, GCTimeRatio policy, and user-defined heap tunables. This aligns with Thomas's summary [2], which outlines how AHS integrates with G1's existing mechanisms, including CPU-based heap resizing (JDK-8238687) [3], external constraints like CurrentMaxHeapSize (JDK-8204088) [4], and SoftMaxHeapSize (JDK-8236073) [5] as a key influence on heap adjustments. AHS will operate as a broader mechanism, where SoftMaxHeapSize serves as a heuristic to guide memory management but does not impose strict limits. It will work in conjunction with CPU-based heuristics to manage heap growth and contraction efficiently. Google's previous work on ProposedHeapSize for G1 contributed valuable insights into adaptive heap management, but as discussions evolved, the consensus has shifted toward a model centered on SoftMaxHeapSize as a guiding input within AHS. Given this consensus, I will proceed with the implementation of JDK-8236073 [5] to ensure that AHS integrates effectively with G1's dynamic heap sizing policies. I will share updates as the work progresses. If there are any additional concerns or areas where further clarification is needed, please let me know. Thanks again for the valuable discussions. Best, Monica ________________________________ References [1] Monica Beckwith, "Clarifications on AHS behavior and its role in G1," OpenJDK hotspot-gc-dev mailing list, November 2024. [https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050191.html] [2] Thomas Schatzl, "Giving a rough summary about the system envisioned for G1," OpenJDK hotspot-gc-dev mailing list, February 2025.
[https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-February/051069.html] [3] OpenJDK Issue, "JDK-8238687: Improve CPU-based heap sizing in G1," OpenJDK Bug Database. [https://bugs.openjdk.org/browse/JDK-8238687] [4] OpenJDK Issue, "JDK-8204088: Introduce CurrentMaxHeapSize to allow external heap constraints," OpenJDK Bug Database. [https://bugs.openjdk.org/browse/JDK-8204088] [5] OpenJDK Issue, "JDK-8236073: Introduce SoftMaxHeapSize as a guide for G1 AHS," OpenJDK Bug Database. [https://bugs.openjdk.org/browse/JDK-8236073] From manc at openjdk.org Wed Mar 26 06:40:46 2025 From: manc at openjdk.org (Man Cao) Date: Wed, 26 Mar 2025 06:40:46 GMT Subject: RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics [v2] In-Reply-To: References: Message-ID: > Hi all, > > I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to [previous PR](https://github.com/openjdk/jdk/pull/20783), and excludes code for `CurrentMaxHeapSize`. I believe I have addressed all direct concerns from [previous email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2024-November/050214.html), such as: > > - does not respect `MinHeapSize`; > - being too "blunt" and does not respect other G1 heuristics and flags for resizing, such as `MinHeapFreeRatio`, `MaxHeapFreeRatio`; > - does not affect heuristics to trigger a concurrent cycle; > > [This recent thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-March/051619.html) also has some context. Man Cao has updated the pull request incrementally with one additional commit since the last revision: Update copyright year. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24211/files - new: https://git.openjdk.org/jdk/pull/24211/files/b4293e4e..59d1ac69 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24211&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24211/head:pull/24211 PR: https://git.openjdk.org/jdk/pull/24211 From tschatzl at openjdk.org Wed Mar 26 10:37:50 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 26 Mar 2025 10:37:50 GMT Subject: RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v28] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that implements (currently Draft) JEP: G1: Improve Application Throughput with a More Efficient Write-Barrier. > > The reason for posting this early is that this is a large change, and the JEP process is already taking very long with no end in sight, but we would like to have this ready by JDK 25. > > ### Current situation > > With this change, G1 will reduce the post write barrier to much more closely resemble Parallel GC's, as described in the JEP. The reason is that G1 lags behind Parallel/Serial GC in throughput due to the larger barrier. > > The main reason for the current barrier is how g1 implements concurrent refinement: > * g1 tracks dirtied cards using sets (dirty card queue set - dcqs) of buffers (dirty card queues - dcq) containing the location of dirtied cards.
Refinement threads pick up their contents to re-refine. The barrier needs to enqueue card locations. > * For correctness dirty card updates requires fine-grained synchronization between mutator and refinement threads, > * Finally there is generic code to avoid dirtying cards altogether (filters), to avoid executing the synchronization and the enqueuing as much as possible. > > These tasks require the current barrier to look as follows for an assignment `x.a = y` in pseudo code: > > > // Filtering > if (region(@x.a) == region(y)) goto done; // same region check > if (y == null) goto done; // null value check > if (card(@x.a) == young_card) goto done; // write to young gen check > StoreLoad; // synchronize > if (card(@x.a) == dirty_card) goto done; > > *card(@x.a) = dirty > > // Card tracking > enqueue(card-address(@x.a)) into thread-local-dcq; > if (thread-local-dcq is not full) goto done; > > call runtime to move thread-local-dcq into dcqs > > done: > > > Overall this post-write barrier alone is in the range of 40-50 total instructions, compared to three or four(!) for parallel and serial gc. > > The large size of the inlined barrier not only has a large code footprint, but also prevents some compiler optimizations like loop unrolling or inlining. > > There are several papers showing that this barrier alone can decrease throughput by 10-20% ([Yang12](https://dl.acm.org/doi/10.1145/2426642.2259004)), which is corroborated by some benchmarks (see links). > > The main idea for this change is to not use fine-grained synchronization between refinement and mutator threads, but coarse grained based on atomically switching card tables. Mutators only work on the "primary" card table, refinement threads on a se... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 36 commits: - Merge branch 'master' into 8342382-card-table-instead-of-dcq - Merge branch 'master' into submit/8342382-card-table-instead-of-dcq - * make young gen length revising independent of refinement thread * use a service task * both refinement control thread and young gen length revising use the same infrastructure to get the number of available bytes and determine the time to the next update - * fix IR code generation tests that change due to barrier cost changes - * factor out card table and refinement table merging into a single method - Merge branch 'master' into 8342382-card-table-instead-of-dcq3 - * obsolete G1UpdateBufferSize G1UpdateBufferSize has previously been used to size the refinement buffers and impose a minimum limit on the number of cards per thread that need to be pending before refinement starts. The former function is now obsolete with the removal of the dirty card queues, the latter functionality has been taken over by the new diagnostic option `G1PerThreadPendingCardThreshold`. I prefer to make this a diagnostic option is better than a product option because it is something that is only necessary for some test cases to produce some otherwise unwanted behavior (continuous refinement). CSR is pending. - * more documentation on why we need to rendezvous the gc threads - Merge branch 'master' into 8342381-card-table-instead-of-dcq - * ayang review * re-add STS leaver for java thread handshake - ... 
and 26 more: https://git.openjdk.org/jdk/compare/059f190f...6d574da0 ------------- Changes: https://git.openjdk.org/jdk/pull/23739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23739&range=27 Stats: 7089 lines in 110 files changed: 2610 ins; 3555 del; 924 mod Patch: https://git.openjdk.org/jdk/pull/23739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23739/head:pull/23739 PR: https://git.openjdk.org/jdk/pull/23739 From aboldtch at openjdk.org Wed Mar 26 13:59:17 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 26 Mar 2025 13:59:17 GMT Subject: RFR: 8348907: Stress times out when is executed with ZGC [v7] In-Reply-To: References: Message-ID: <9fC_0ZRvvRL_WvqU-OVI4HvQzJ3nOVGBkVN1891k4Uk=.a3f4d0ee-6cba-4db0-85b6-f5544790438e@github.com> On Tue, 25 Mar 2025 17:13:55 GMT, Markus Gr?nlund wrote: >> Greetings, >> >> Here is a suggested solution for solving the intricate deadlock issues involving virtual threads, ZGC load barriers, and JFR. >> >> A JFR event can be allocated and committed in specific sensitive contexts, such as inside mutex-protected load barriers. If the thread is a virtual thread, JFR determines its thread name by loading the oop from the thread (jt->vthread()) as part of the event commit. >> >> This operation again triggers the load barrier, which contains a non-reentrant lock, effectively deadlocking the thread with itself. >> >> So, for specific sensitive event sites, JFR mustn't recurse or reenter into the same event site as part of the event commit. >> >> After a few iterations and prototypes, which failed because they eventually ended up touching some oop, I came up with the following. >> >> From a user perspective, an event (site) can now be marked as "non-reentrant" by wrapping it in a helper class. >> >> This instruction now guarantees JFR will not reenter this site again as part of the event.commit(). >> >> The tradeoff is that we cannot write the virtual thread name for these sensitive event sites; we will instead report "" as the virtual thread name, which is the default virtual thread name in Java. All other information about the thread, such as the thread ID, virtual thread, etc., will still be reported. >> >> I believe it is a reasonable tradeoff and a general solution for sensitive JFR event sites, which are rare in practice, with minimal impact on event programming. >> >> Testing: jdk_jfr, stress testing >> >> Let me know what you think. >> >> Thanks >> Markus > > Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: > > delegate thread assert Looks like a pragmatic solution. I am not reviewing the the implications of not setting the epoch, as my understanding here is a bit lacking. The name `JfrNonReentrant` seems a little general for how tightly coupled the property is to running on a virtual thread and loading an oop. At the same time this is currently the only interaction which exhibits problems with reentry, and I am not sure if there is a better name. ------------- Marked as reviewed by aboldtch (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24209#pullrequestreview-2717293129 From eosterlund at openjdk.org Wed Mar 26 15:11:14 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 26 Mar 2025 15:11:14 GMT Subject: RFR: 8348907: Stress times out when is executed with ZGC [v7] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 17:13:55 GMT, Markus Gr?nlund wrote: >> Greetings, >> >> Here is a suggested solution for solving the intricate deadlock issues involving virtual threads, ZGC load barriers, and JFR. >> >> A JFR event can be allocated and committed in specific sensitive contexts, such as inside mutex-protected load barriers. If the thread is a virtual thread, JFR determines its thread name by loading the oop from the thread (jt->vthread()) as part of the event commit. >> >> This operation again triggers the load barrier, which contains a non-reentrant lock, effectively deadlocking the thread with itself. >> >> So, for specific sensitive event sites, JFR mustn't recurse or reenter into the same event site as part of the event commit. >> >> After a few iterations and prototypes, which failed because they eventually ended up touching some oop, I came up with the following. >> >> From a user perspective, an event (site) can now be marked as "non-reentrant" by wrapping it in a helper class. >> >> This instruction now guarantees JFR will not reenter this site again as part of the event.commit(). >> >> The tradeoff is that we cannot write the virtual thread name for these sensitive event sites; we will instead report "" as the virtual thread name, which is the default virtual thread name in Java. All other information about the thread, such as the thread ID, virtual thread, etc., will still be reported. >> >> I believe it is a reasonable tradeoff and a general solution for sensitive JFR event sites, which are rare in practice, with minimal impact on event programming. >> >> Testing: jdk_jfr, stress testing >> >> Let me know what you think. >> >> Thanks >> Markus > > Markus Gr?nlund has updated the pull request incrementally with one additional commit since the last revision: > > delegate thread assert This seems like a pragmatic workaround for the problem. We might want to revisit this at some point to make it easier to use JFR in the GC code, but I think this is an appropriate fix for the bug right now. Thanks for fixing this Markus. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24209#pullrequestreview-2717616819 From iwalulya at openjdk.org Wed Mar 26 15:31:46 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 26 Mar 2025 15:31:46 GMT Subject: RFR: 8352765: G1CollectedHeap::expand_and_allocate() may fail to allocate even after heap expansion succeeds Message-ID: Hi, Please review this change to ensure that G1 provisions for at least one Eden region after a GC when computing the young length target. The issue reported in the CR occurs at the end of a GC, after successfully expanding the heap, an allocation fails because `policy()->should_allocate_mutator_region()` returns false. This happens because the computation did not properly account for young regions already allocated as survivor regions, leading to an Eden region target of zero. With this change, we factor in the young regions that have already been allocated as survivor regions and ensure that at least one region is targeted for Eden. Testing: Tier 1-3 Reproducer in the CR. 
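To make the intended arithmetic concrete, here is a rough sketch of the post-GC computation (the names are illustrative, not the actual G1Policy code):

    // Compute the young length target after a GC, provisioning for at least one eden
    // region on top of the young regions already handed out as survivor regions.
    uint survivor_len = survivor_region_count();          // young regions already allocated (assumed accessor)
    uint young_target = heuristic_young_target_length();  // existing sizing heuristic (assumed name)
    uint eden_target  = (young_target > survivor_len) ? (young_target - survivor_len) : 0u;
    eden_target = MAX2(eden_target, 1u);                  // never end up with an eden target of zero
    return survivor_len + eden_target;

With a non-zero eden target, `policy()->should_allocate_mutator_region()` can return true after the expansion, so the allocation no longer fails even though the heap just grew.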
------------- Commit messages: - init - init Changes: https://git.openjdk.org/jdk/pull/24257/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24257&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352765 Stats: 19 lines in 1 file changed: 6 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/24257.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24257/head:pull/24257 PR: https://git.openjdk.org/jdk/pull/24257 From xpeng at openjdk.org Wed Mar 26 15:41:25 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 26 Mar 2025 15:41:25 GMT Subject: Integrated: 8352588: GenShen: Enabling JFR asserts when getting GCId In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 19:09:46 GMT, Xiaolong Peng wrote: > ### Root cause > Shenandoah has its own way to generate gc id([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L234), [link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahController.hpp#L43)), but when it runs a specific GC cycle, it still use the default GCIdMark([link](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp#L389)) to generate a gc id and set it to NamedThread::_gc_id. Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is `undefined`, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events. > > ### Solution > it is confusing that Shenandoah generates its own gc id but not use it for GC logging and JFR, the solution is fairly simple, the control thread just need inject gc id with GCIdMark(gc_id) it generates in `ShenandoahControlThread::run_service` and `ShenandoahGenerationalControlThread::run_gc_cycle` > > In the test, I also noticed the value of gc_id generated by Shenandoah control thread starts from 1, which is different from the default behavior of GCIdMark which generates id starting from 0, this PR will also fix it. > > ### Test > - [x] TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording" > - [x] TEST=hotspot_gc_shenandoah > - [x] GHA This pull request has now been integrated. Changeset: a2a64dac Author: Xiaolong Peng Committer: Paul Hohensee URL: https://git.openjdk.org/jdk/commit/a2a64dac1680e97dd9eb511ead951bf1be8121c6 Stats: 11 lines in 2 files changed: 4 ins; 7 del; 0 mod 8352588: GenShen: Enabling JFR asserts when getting GCId Reviewed-by: wkemper, ysr ------------- PR: https://git.openjdk.org/jdk/pull/24166 From mgronlun at openjdk.org Wed Mar 26 17:33:22 2025 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 26 Mar 2025 17:33:22 GMT Subject: Integrated: 8348907: Stress times out when is executed with ZGC In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 22:26:57 GMT, Markus Gr?nlund wrote: > Greetings, > > Here is a suggested solution for solving the intricate deadlock issues involving virtual threads, ZGC load barriers, and JFR. > > A JFR event can be allocated and committed in specific sensitive contexts, such as inside mutex-protected load barriers. If the thread is a virtual thread, JFR determines its thread name by loading the oop from the thread (jt->vthread()) as part of the event commit. > > This operation again triggers the load barrier, which contains a non-reentrant lock, effectively deadlocking the thread with itself. 
> > So, for specific sensitive event sites, JFR mustn't recurse or reenter into the same event site as part of the event commit. > > After a few iterations and prototypes, which failed because they eventually ended up touching some oop, I came up with the following. > > From a user perspective, an event (site) can now be marked as "non-reentrant" by wrapping it in a helper class. > > This instruction now guarantees JFR will not reenter this site again as part of the event.commit(). > > The tradeoff is that we cannot write the virtual thread name for these sensitive event sites; we will instead report "" as the virtual thread name, which is the default virtual thread name in Java. All other information about the thread, such as the thread ID, virtual thread, etc., will still be reported. > > I believe it is a reasonable tradeoff and a general solution for sensitive JFR event sites, which are rare in practice, with minimal impact on event programming. > > Testing: jdk_jfr, stress testing > > Let me know what you think. > > Thanks > Markus This pull request has now been integrated. Changeset: c2a4fed9 Author: Markus Gr?nlund URL: https://git.openjdk.org/jdk/commit/c2a4fed98c4e17880dd40c19cb73072efea8c583 Stats: 142 lines in 10 files changed: 119 ins; 13 del; 10 mod 8348907: Stress times out when is executed with ZGC Reviewed-by: egahlin, aboldtch, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/24209 From wkemper at openjdk.org Wed Mar 26 17:36:21 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 26 Mar 2025 17:36:21 GMT Subject: RFR: 8348400: GenShen: assert(ShenandoahHeap::heap()->is_full_gc_in_progress() || (used_regions_size() <= _max_capacity)) failed: Cannot use more than capacity # [v3] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 21:16:06 GMT, William Kemper wrote: >> Shenandoah cannot recycle immediate trash regions during the concurrent weak roots phase, however some of these regions may be assigned to the old generation collector's reserve. When an evacuation/promotion tries to allocate in such a region, it will fail (as expected) and try to 'steal' a region from the mutator's partition of the free set. There are cases when this cannot be allowed due to capacity constraints. However, in some of these cases it will be possible to 'swap' a region between the old reserve and the mutator's partition. This change covers this case. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Update evac reserve when swapping trash region for non-trash region > - Use capacity of transferred region > - Improve comments > - Merge remote-tracking branch 'jdk/master' into fix-flip-to-old-reserve > - Revert "Do not enforce size constraints on generations" > > This reverts commit 11ff0677449fa6749df8830f4a03f1c7861ba314. > - Do not enforce size constraints on generations > > This will make it easier for the old generation collector to take regions from the mutator when necessary > - Don't allocate in regions that cannot be flipped to old gc > - Do not allocate from mutator if young gen cannot spare the region No assertions after running `TestPauseNotifications` 40,000 times, no failures in performance/stress test pipelines. 
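For reviewers skimming the change, the swap path is roughly the following (illustrative pseudocode; the helpers are assumed names, not the actual free-set API):

    // An old-gen evacuation/promotion tried to allocate in an old-reserve region that is
    // still immediate trash (it cannot be recycled during concurrent weak roots), and
    // capacity constraints forbid simply stealing a region from the mutator partition.
    ShenandoahHeapRegion* trash = failed_old_reserve_region;     // assumed variable
    ShenandoahHeapRegion* empty = find_empty_mutator_region();   // assumed helper
    if (empty != nullptr) {
      // Swap the two regions between partitions: the empty region joins the old
      // collector reserve, the trash region goes back to the mutator partition where
      // it will be recycled once the weak roots phase is done.
      move_to_old_collector_partition(empty);
      move_to_mutator_partition(trash);
      adjust_evac_reserve_for_swap(empty, trash);                // keep reserve accounting consistent
    }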
------------- PR Comment: https://git.openjdk.org/jdk/pull/23998#issuecomment-2755193068 From wkemper at openjdk.org Wed Mar 26 19:21:17 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 26 Mar 2025 19:21:17 GMT Subject: RFR: 8352918: Shenandoah: Verifier does not deactivate barriers as intended Message-ID: When verifying reachable objects, Shenandoah's verifier clears the `_gc_state` with the intention of deactivating barriers. However, the mechanism for this is a `friend` of the heap and does not toggle the flag to cause threads to use the value set on the verifier's safepoint. The net effect here is that the barriers are _not_ deactivated during verification. Leaving the barriers on while the verifier traverses the heap may have unintended consequences (cards marked, objects evacuated, etc.) ------------- Commit messages: - Fix verifier's gc_state resettter Changes: https://git.openjdk.org/jdk/pull/24264/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24264&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352918 Stats: 8 lines in 2 files changed: 7 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24264.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24264/head:pull/24264 PR: https://git.openjdk.org/jdk/pull/24264 From kdnilsen at openjdk.org Wed Mar 26 20:04:13 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 26 Mar 2025 20:04:13 GMT Subject: RFR: 8352918: Shenandoah: Verifier does not deactivate barriers as intended In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 19:17:33 GMT, William Kemper wrote: > When verifying reachable objects, Shenandoah's verifier clears the `_gc_state` with the intention of deactivating barriers. However, the mechanism for this is a `friend` of the heap and does not toggle the flag to cause threads to use the value set on the verifier's safepoint. The net effect here is that the barriers are _not_ deactivated during verification. Leaving the barriers on while the verifier traverses the heap may have unintended consequences (cards marked, objects evacuated, etc.) Marked as reviewed by kdnilsen (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24264#pullrequestreview-2718492011 From shade at openjdk.org Wed Mar 26 20:09:14 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Mar 2025 20:09:14 GMT Subject: RFR: 8352918: Shenandoah: Verifier does not deactivate barriers as intended In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 19:17:33 GMT, William Kemper wrote: > When verifying reachable objects, Shenandoah's verifier clears the `_gc_state` with the intention of deactivating barriers. However, the mechanism for this is a `friend` of the heap and does not toggle the flag to cause threads to use the value set on the verifier's safepoint. The net effect here is that the barriers are _not_ deactivated during verification. Leaving the barriers on while the verifier traverses the heap may have unintended consequences (cards marked, objects evacuated, etc.) Ouch. Took me a while to understand which field is updated, since fields are named the same in `ShenandoahHeap` and here. I suggest renaming fields in `ShenandoahGCStateResetter` to `_saved_gc_state` and `_saved_gc_state_changed`. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24264#pullrequestreview-2718502261 From xpeng at openjdk.org Wed Mar 26 20:37:59 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 26 Mar 2025 20:37:59 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v14] In-Reply-To: References: Message-ID: > There are some scenarios in which GenShen may have improper remembered set verification logic: > > 1. Concurrent young cycles following a Full GC: > > In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification > > > ShenandoahVerifier > ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { > shenandoah_assert_generations_reconciled(); > if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { > return _heap->complete_marking_context(); > } > return nullptr; > } > > > For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. > > 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. > > 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. > > 4. After concurrent young cycle evacuates objects from a young region, it update refs using marking bitmaps from marking context, therefore it won't update references of dead old objects(is_marked(obj) is false: obj is not marking strong/weak and it is below tams). In this case, if the next cycle if global concurrent GC, remembered set can't be verified before init-mark because of the dead pointers. > > ### Solution > * After a full GC, always set marking completeness flag to false after reseting the marking bitmaps. > * Because there could be dead pointers in old gen were not updated to point to new address after evacuation and refs update, we should disable rem-set validation before init-mark&update-refs if old marking context is incomplete. 
> > ### Test > - [x] `make test TEST=hotspot_gc_shenandoah` > - [x] GHA Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Add comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24092/files - new: https://git.openjdk.org/jdk/pull/24092/files/16494d48..e11c6fc3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24092&range=12-13 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24092.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24092/head:pull/24092 PR: https://git.openjdk.org/jdk/pull/24092 From xpeng at openjdk.org Wed Mar 26 20:37:59 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 26 Mar 2025 20:37:59 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v14] In-Reply-To: References: <_GG5htdXFZ2Jv3qTAyG6djSrvXDtGx-jTLGoA2JbEXU=.b8588ac1-e51f-4ddf-afda-c64e6a789440@github.com> <7pNU1UWNVucen0QwESfFkOiKIP59gBVZNF5gCHveOQ0=.144c0152-20a4-41f4-8929-b906a284be7f@github.com> <2jgKqoBrD8WfxKs9cLqfzWa5AMS__muV4O0IxTiWFbA=.b41179e5-5252-476f-8bc9-c42e2a6a507b@github.com> Message-ID: <0lv1sy_XF3zy1Xr2JTfNOZFk5lQDf7uifcfY8HF2mYw=.2590df3d-49a2-4fca-afa0-859ebc1cf44e@github.com> On Wed, 26 Mar 2025 00:30:55 GMT, Y. Srinivas Ramakrishna wrote: >> very cool! > > May be as intended earlier, leave a documentation comment between lines 697 & 698 along the lines of Kelvin's comment: > > > // After we swap card table below, the write-table is all clean, and the read table holds > // cards dirty prior to the start of GC. Young and bootstrap collection will update > // the write card table as a side effect of remembered set scanning. Global collection will > // update the card table as a side effect of global marking of old objects. Sorry, I added comments as you suggested but forgot to push.Now the comment has been added. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2014959494 From wkemper at openjdk.org Wed Mar 26 20:46:32 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 26 Mar 2025 20:46:32 GMT Subject: RFR: 8352918: Shenandoah: Verifier does not deactivate barriers as intended [v2] In-Reply-To: References: Message-ID: <8Df_0rWlNraMfHLDM9nkevd6bgJE6yTN65MuMXiC458=.4a00283d-b6a4-408f-b143-8c9057b3fd21@github.com> > When verifying reachable objects, Shenandoah's verifier clears the `_gc_state` with the intention of deactivating barriers. However, the mechanism for this is a `friend` of the heap and does not toggle the flag to cause threads to use the value set on the verifier's safepoint. The net effect here is that the barriers are _not_ deactivated during verification. Leaving the barriers on while the verifier traverses the heap may have unintended consequences (cards marked, objects evacuated, etc.) 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Rename saved fields so they are easier to distinguish from the heap's fields ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24264/files - new: https://git.openjdk.org/jdk/pull/24264/files/64403639..2c86db47 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24264&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24264&range=00-01 Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24264.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24264/head:pull/24264 PR: https://git.openjdk.org/jdk/pull/24264 From xpeng at openjdk.org Wed Mar 26 20:49:17 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 26 Mar 2025 20:49:17 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v13] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 00:44:18 GMT, Y. Srinivas Ramakrishna wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> tide up > > src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1060: > >> 1058: VerifyRememberedSet verify_remembered_set = _verify_remembered_before_marking; >> 1059: if (_heap->mode()->is_generational() && >> 1060: !_heap->old_generation()->is_mark_complete()) { > > Why not the following stronger condition to skip verification? My sense is that the only case we cannot verify is if we do not have marking info _and_ old gen has been left "unparsable" (because of an incomplete/interrupted C&F which may have us look at dead objects -- that are either unparsable because of class unloading, or are parsable but hold cross-gen pointers). In all other cases, we can do a safe and complete verification. > > > is_generational() && !old_gen->is_mark_complete() && !old_gen->is_parsable() We may not need to worry about it, old_gen becomes not parsable in class unloading phase of a global concurrent GC, marking is already done for the global including old gen, there should be always complete marking for old when old gen is not parsable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24092#discussion_r2014974376 From kdnilsen at openjdk.org Wed Mar 26 21:19:14 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 26 Mar 2025 21:19:14 GMT Subject: RFR: 8352918: Shenandoah: Verifier does not deactivate barriers as intended [v2] In-Reply-To: <8Df_0rWlNraMfHLDM9nkevd6bgJE6yTN65MuMXiC458=.4a00283d-b6a4-408f-b143-8c9057b3fd21@github.com> References: <8Df_0rWlNraMfHLDM9nkevd6bgJE6yTN65MuMXiC458=.4a00283d-b6a4-408f-b143-8c9057b3fd21@github.com> Message-ID: On Wed, 26 Mar 2025 20:46:32 GMT, William Kemper wrote: >> When verifying reachable objects, Shenandoah's verifier clears the `_gc_state` with the intention of deactivating barriers. However, the mechanism for this is a `friend` of the heap and does not toggle the flag to cause threads to use the value set on the verifier's safepoint. The net effect here is that the barriers are _not_ deactivated during verification. Leaving the barriers on while the verifier traverses the heap may have unintended consequences (cards marked, objects evacuated, etc.) > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Rename saved fields so they are easier to distinguish from the heap's fields Marked as reviewed by kdnilsen (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24264#pullrequestreview-2718638195 From ysr at openjdk.org Wed Mar 26 22:07:21 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 26 Mar 2025 22:07:21 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v14] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 20:37:59 GMT, Xiaolong Peng wrote: >> There are some scenarios in which GenShen may have improper remembered set verification logic: >> >> 1. Concurrent young cycles following a Full GC: >> >> In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification >> >> >> ShenandoahVerifier >> ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { >> shenandoah_assert_generations_reconciled(); >> if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { >> return _heap->complete_marking_context(); >> } >> return nullptr; >> } >> >> >> For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. >> >> 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. >> >> 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. >> >> 4. After concurrent young cycle evacuates objects from a young region, it update refs using marking bitmaps from marking context, therefore it won't update references of dead old objects(is_marked(obj) is false: obj is not marking strong/weak and it is below tams). In this case, if the next cycle if global concurrent GC, remembered set can't be verified before init-mark because of the dead pointers. >> >> ### Solution >> * After a full GC, always set marking completeness flag to false after reseting the marking bitmaps. >> * Because there could be dead pointers in old gen were not updated to point to new address after evacuation and refs update, we should disable rem-set validation before init-mark&update-refs if old marking context is incomplete. >> >> ### Test >> - [x] `make test TEST=hotspot_gc_shenandoah` >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Add comments ? ------------- Marked as reviewed by ysr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24092#pullrequestreview-2718713521 From shade at openjdk.org Wed Mar 26 22:17:13 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 26 Mar 2025 22:17:13 GMT Subject: RFR: 8352918: Shenandoah: Verifier does not deactivate barriers as intended [v2] In-Reply-To: <8Df_0rWlNraMfHLDM9nkevd6bgJE6yTN65MuMXiC458=.4a00283d-b6a4-408f-b143-8c9057b3fd21@github.com> References: <8Df_0rWlNraMfHLDM9nkevd6bgJE6yTN65MuMXiC458=.4a00283d-b6a4-408f-b143-8c9057b3fd21@github.com> Message-ID: On Wed, 26 Mar 2025 20:46:32 GMT, William Kemper wrote: >> When verifying reachable objects, Shenandoah's verifier clears the `_gc_state` with the intention of deactivating barriers. However, the mechanism for this is a `friend` of the heap and does not toggle the flag to cause threads to use the value set on the verifier's safepoint. The net effect here is that the barriers are _not_ deactivated during verification. Leaving the barriers on while the verifier traverses the heap may have unintended consequences (cards marked, objects evacuated, etc.) > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Rename saved fields so they are easier to distinguish from the heap's fields Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24264#pullrequestreview-2718728122 From wkemper at openjdk.org Wed Mar 26 23:55:44 2025 From: wkemper at openjdk.org (William Kemper) Date: Wed, 26 Mar 2025 23:55:44 GMT Subject: RFR: 8351892: GenShen: Remove enforcement of generation sizes Message-ID: * The option to configure minimum and maximum sizes for the young generation have been combined into `ShenandoahInitYoungPercentage`. * The remaining functionality in `shGenerationSizer` wasn't enough to warrant being its own class, so the functionality was rolled into `shGenerationalHeap`. ------------- Commit messages: - Stop enforcing young/old generation sizes. Changes: https://git.openjdk.org/jdk/pull/24268/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24268&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351892 Stats: 395 lines in 11 files changed: 57 ins; 315 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/24268.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24268/head:pull/24268 PR: https://git.openjdk.org/jdk/pull/24268 From ysr at openjdk.org Thu Mar 27 01:01:20 2025 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 27 Mar 2025 01:01:20 GMT Subject: RFR: 8352918: Shenandoah: Verifier does not deactivate barriers as intended [v2] In-Reply-To: <8Df_0rWlNraMfHLDM9nkevd6bgJE6yTN65MuMXiC458=.4a00283d-b6a4-408f-b143-8c9057b3fd21@github.com> References: <8Df_0rWlNraMfHLDM9nkevd6bgJE6yTN65MuMXiC458=.4a00283d-b6a4-408f-b143-8c9057b3fd21@github.com> Message-ID: On Wed, 26 Mar 2025 20:46:32 GMT, William Kemper wrote: >> When verifying reachable objects, Shenandoah's verifier clears the `_gc_state` with the intention of deactivating barriers. However, the mechanism for this is a `friend` of the heap and does not toggle the flag to cause threads to use the value set on the verifier's safepoint. The net effect here is that the barriers are _not_ deactivated during verification. Leaving the barriers on while the verifier traverses the heap may have unintended consequences (cards marked, objects evacuated, etc.) 
> > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Rename saved fields so they are easier to distinguish from the heap's fields ? ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24264#pullrequestreview-2718945983 From dholmes at openjdk.org Thu Mar 27 04:55:06 2025 From: dholmes at openjdk.org (David Holmes) Date: Thu, 27 Mar 2025 04:55:06 GMT Subject: RFR: 8352762: Use EXACTFMT instead of expanded version where applicable In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 13:59:14 GMT, Joel Sikström wrote: > [JDK-8310233](https://bugs.openjdk.org/browse/JDK-8310233) introduced the EXACTFMT macro, which is a shorthand for printing exact values using methods defined in globalDefinitions.hpp. There are currently 20 places in HotSpot which use the expanded version of the macro, along with the "trace_page_size_params" macro that is defined and used in os.cpp. > > I have replaced places that use the expanded macro(s) with EXACTFMT + EXACTFMTARGS, and also removed trace_page_size_params from os.cpp, which was essentially a redefinition of EXACTFMTARGS. > > Testing: GHA, tiers 1-4 Paging @tstuefe ! Thomas added EXACTFMT in [JDK-8310233](https://github.com/openjdk/jdk/pull/14739/files#top) and did not use it for some of the places where you are now using it. Despite being a reviewer of Thomas's change, I'm not at all sure when EXACTFMT should be used. But this looks good.
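For readers unfamiliar with the macro pair under discussion, the sketch below reimplements the idea in isolation: print a byte count in the largest unit that divides it exactly, so the logged value loses no precision. The helper names are local to this sketch and only approximate what globalDefinitions.hpp provides.

```
#include <cstdio>
#include <cstddef>

// Standalone sketch of the idea behind EXACTFMT/EXACTFMTARGS: print a byte
// count in the largest unit (B, K, M, G) that represents it exactly.
static const char* exact_unit_for(size_t s) {
  const size_t K = 1024, M = K * K, G = M * K;
  if (s >= G && s % G == 0) return "G";
  if (s >= M && s % M == 0) return "M";
  if (s >= K && s % K == 0) return "K";
  return "B";
}

static size_t size_in_exact_unit(size_t s) {
  const size_t K = 1024, M = K * K, G = M * K;
  if (s >= G && s % G == 0) return s / G;
  if (s >= M && s % M == 0) return s / M;
  if (s >= K && s % K == 0) return s / K;
  return s;
}

// The macro pair keeps call sites short: one token for the format string,
// one for the matching argument list.
#define EXACTFMT_SKETCH "%zu%s"
#define EXACTFMTARGS_SKETCH(s) size_in_exact_unit(s), exact_unit_for(s)

int main() {
  size_t page = 2 * 1024 * 1024;   // e.g. a 2M large page
  size_t odd  = 2 * 1024 * 1024 + 512;
  std::printf("page size: " EXACTFMT_SKETCH "\n", EXACTFMTARGS_SKETCH(page)); // prints "2M"
  std::printf("odd size:  " EXACTFMT_SKETCH "\n", EXACTFMTARGS_SKETCH(odd));  // falls back to exact bytes... unit "B"? no: 512 divides it, prints in K
  return 0;
}
```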
------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24228#pullrequestreview-2720217267 From mdoerr at openjdk.org Thu Mar 27 10:12:12 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 27 Mar 2025 10:12:12 GMT Subject: RFR: 8352508: [Redo] G1: Pinned regions with pinned objects only reachable by native code crash VM In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 11:13:27 GMT, Thomas Schatzl wrote: > Fwiw, I asked the reporter to try this fix on their private failing test, going to wait for their result too. Our tests have passed several times. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24147#issuecomment-2757477440 From tschatzl at openjdk.org Thu Mar 27 11:54:07 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 27 Mar 2025 11:54:07 GMT Subject: RFR: 8352765: G1CollectedHeap::expand_and_allocate() may fail to allocate even after heap expansion succeeds In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 15:05:46 GMT, Ivan Walulya wrote: > Hi, > > Please review this change to ensure that G1 provisions for at least one Eden region after a GC when computing the young length target. > > The issue reported in the CR occurs at the end of a GC, after successfully expanding the heap, an allocation fails because `policy()->should_allocate_mutator_region()` returns false. This happens because the computation did not properly account for young regions already allocated as survivor regions, leading to an Eden region target of zero. > > With this change, we factor in the young regions that have already been allocated as survivor regions and ensure that at least one region is targeted for Eden. > > Testing: Tier 1-3 > Reproducer in the CR. Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/g1/g1Policy.cpp line 360: > 358: log_trace(gc, ergo, heap)("Young target length: Fully eat into reserve " > 359: "receiving young %u receiving additional eden %u", > 360: receiving_eden, the log message reads "receiving young" still. src/hotspot/share/gc/g1/g1Policy.cpp line 383: > 381: "receiving additional eden %u", > 382: free_outside_reserve, receiving_within_reserve, > 383: receiving_eden, receiving_additional_eden); Log message needs update. ------------- PR Review: https://git.openjdk.org/jdk/pull/24257#pullrequestreview-2721033969 PR Review Comment: https://git.openjdk.org/jdk/pull/24257#discussion_r2016346138 PR Review Comment: https://git.openjdk.org/jdk/pull/24257#discussion_r2016347443 From ayang at openjdk.org Thu Mar 27 12:01:07 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 27 Mar 2025 12:01:07 GMT Subject: RFR: 8352508: [Redo] G1: Pinned regions with pinned objects only reachable by native code crash VM [v4] In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 11:37:57 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that re-implements the fix for [JDK-8351921](https://bugs.openjdk.org/browse/JDK-8351921); in that fix we (think we) forgot to consider the same situation with optional regions. >> >> I.e. the previous fix only fixed the situation occurring during initial evacuation, however as we add regions due to optional evacuation, the same situation can still happen. >> >> So this change adds some work to every evacuation phase that marks all pinned regions in the current collection set as evacuation failed/pinned instead of only doing this work once in the pre evacuation phase. 
>> >> As for testing, it is extremely hard to induce a situation where there is a pinned region with no apparent live objects in an optional collection set, so I gave up and just added the original test again. >> >> Testing: gha, test >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * improve test to also test empty pinned humongous regions Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24147#pullrequestreview-2721084350 From iwalulya at openjdk.org Thu Mar 27 13:48:13 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 27 Mar 2025 13:48:13 GMT Subject: RFR: 8352508: [Redo] G1: Pinned regions with pinned objects only reachable by native code crash VM [v4] In-Reply-To: References: Message-ID: <3n_Z5rbTdFUF8IqCtU2Y-KsoQFGUS9Ykp7fS1SKO4XI=.128e77c9-186f-4636-b703-8c284ca30a1c@github.com> On Mon, 24 Mar 2025 11:37:57 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this change that re-implements the fix for [JDK-8351921](https://bugs.openjdk.org/browse/JDK-8351921); in that fix we (think we) forgot to consider the same situation with optional regions. >> >> I.e. the previous fix only fixed the situation occurring during initial evacuation, however as we add regions due to optional evacuation, the same situation can still happen. >> >> So this change adds some work to every evacuation phase that marks all pinned regions in the current collection set as evacuation failed/pinned instead of only doing this work once in the pre evacuation phase. >> >> As for testing, it is extremely hard to induce a situation where there is a pinned region with no apparent live objects in an optional collection set, so I gave up and just added the original test again. >> >> Testing: gha, test >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > * improve test to also test empty pinned humongous regions Marked as reviewed by iwalulya (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24147#pullrequestreview-2721578600 From tschatzl at openjdk.org Thu Mar 27 14:33:26 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 27 Mar 2025 14:33:26 GMT Subject: RFR: 8352508: [Redo] G1: Pinned regions with pinned objects only reachable by native code crash VM [v4] In-Reply-To: <3n_Z5rbTdFUF8IqCtU2Y-KsoQFGUS9Ykp7fS1SKO4XI=.128e77c9-186f-4636-b703-8c284ca30a1c@github.com> References: <3n_Z5rbTdFUF8IqCtU2Y-KsoQFGUS9Ykp7fS1SKO4XI=.128e77c9-186f-4636-b703-8c284ca30a1c@github.com> Message-ID: On Thu, 27 Mar 2025 13:45:35 GMT, Ivan Walulya wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> * improve test to also test empty pinned humongous regions > > Marked as reviewed by iwalulya (Reviewer). Thanks @walulyai @albertnetymk for your reviews. Thanks @TheRealMDoerr for your thorough testing (and reporting). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24147#issuecomment-2758267318 From iwalulya at openjdk.org Thu Mar 27 14:33:34 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 27 Mar 2025 14:33:34 GMT Subject: RFR: 8352765: G1CollectedHeap::expand_and_allocate() may fail to allocate even after heap expansion succeeds [v2] In-Reply-To: References: Message-ID: > Hi, > > Please review this change to ensure that G1 provisions for at least one Eden region after a GC when computing the young length target. > > The issue reported in the CR occurs at the end of a GC, after successfully expanding the heap, an allocation fails because `policy()->should_allocate_mutator_region()` returns false. This happens because the computation did not properly account for young regions already allocated as survivor regions, leading to an Eden region target of zero. > > With this change, we factor in the young regions that have already been allocated as survivor regions and ensure that at least one region is targeted for Eden. > > Testing: Tier 1-3 > Reproducer in the CR. Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Thomas Review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24257/files - new: https://git.openjdk.org/jdk/pull/24257/files/7eea1ee3..041b478c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24257&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24257&range=00-01 Stats: 7 lines in 1 file changed: 1 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24257.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24257/head:pull/24257 PR: https://git.openjdk.org/jdk/pull/24257 From tschatzl at openjdk.org Thu Mar 27 14:33:27 2025 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 27 Mar 2025 14:33:27 GMT Subject: Integrated: 8352508: [Redo] G1: Pinned regions with pinned objects only reachable by native code crash VM In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 08:07:35 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that re-implements the fix for [JDK-8351921](https://bugs.openjdk.org/browse/JDK-8351921); in that fix we (think we) forgot to consider the same situation with optional regions. > > I.e. the previous fix only fixed the situation occurring during initial evacuation, however as we add regions due to optional evacuation, the same situation can still happen. > > So this change adds some work to every evacuation phase that marks all pinned regions in the current collection set as evacuation failed/pinned instead of only doing this work once in the pre evacuation phase. > > As for testing, it is extremely hard to induce a situation where there is a pinned region with no apparent live objects in an optional collection set, so I gave up and just added the original test again. > > Testing: gha, test > > Thanks, > Thomas This pull request has now been integrated. 
Changeset: c50a0a1f Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/c50a0a1fc126a67528448b282bcfc375abfac142 Stats: 167 lines in 6 files changed: 152 ins; 11 del; 4 mod 8352508: [Redo] G1: Pinned regions with pinned objects only reachable by native code crash VM Reviewed-by: ayang, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/24147 From jsikstro at openjdk.org Thu Mar 27 15:52:19 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 27 Mar 2025 15:52:19 GMT Subject: RFR: 8352762: Use EXACTFMT instead of expanded version where applicable In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 04:52:17 GMT, David Holmes wrote: >> [JDK-8310233](https://bugs.openjdk.org/browse/JDK-8310233) introduced the EXACTFMT macro, which is a shorthand for printing exact values using methods defined in globalDefinitions.hpp. There are currently 20 places in HotSpot which uses the expanded version of the macro, along with the "trace_page_size_params" macro that is defined and used in os.cpp. >> >> I have replaced places that use the expanded macro(s) with EXACTFMT + EXACTFMTARGS, and also removed trace_page_size_params from os.cpp, which was essentially a redefnition of EXACTFMTARGS. >> >> Testing: GHA, tiers 1-4 > > Paging @tstuefe ! Thomas added EXACTFMT in [JDK-8310233](https://github.com/openjdk/jdk/pull/14739/files#top) and did not use it for some of the places where you are now using it. Despite being a reviewer of Thomas's change, I'm not at all sure when EXACTFMT should be used. But this looks good. Thank you for the reviews! @dholmes-ora @tstuefe ------------- PR Comment: https://git.openjdk.org/jdk/pull/24228#issuecomment-2758536909 From jsikstro at openjdk.org Thu Mar 27 15:52:19 2025 From: jsikstro at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 27 Mar 2025 15:52:19 GMT Subject: Integrated: 8352762: Use EXACTFMT instead of expanded version where applicable In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 13:59:14 GMT, Joel Sikstr?m wrote: > [JDK-8310233](https://bugs.openjdk.org/browse/JDK-8310233) introduced the EXACTFMT macro, which is a shorthand for printing exact values using methods defined in globalDefinitions.hpp. There are currently 20 places in HotSpot which uses the expanded version of the macro, along with the "trace_page_size_params" macro that is defined and used in os.cpp. > > I have replaced places that use the expanded macro(s) with EXACTFMT + EXACTFMTARGS, and also removed trace_page_size_params from os.cpp, which was essentially a redefnition of EXACTFMTARGS. > > Testing: GHA, tiers 1-4 This pull request has now been integrated. Changeset: dc5c4148 Author: Joel Sikstr?m URL: https://git.openjdk.org/jdk/commit/dc5c4148c70ca43d0a69c326e14898adca2f0bae Stats: 70 lines in 8 files changed: 0 ins; 20 del; 50 mod 8352762: Use EXACTFMT instead of expanded version where applicable Reviewed-by: dholmes, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/24228 From wkemper at openjdk.org Thu Mar 27 16:37:20 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 27 Mar 2025 16:37:20 GMT Subject: Integrated: 8352918: Shenandoah: Verifier does not deactivate barriers as intended In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 19:17:33 GMT, William Kemper wrote: > When verifying reachable objects, Shenandoah's verifier clears the `_gc_state` with the intention of deactivating barriers. 
However, the mechanism for this is a `friend` of the heap and does not toggle the flag to cause threads to use the value set on the verifier's safepoint. The net effect here is that the barriers are _not_ deactivated during verification. Leaving the barriers on while the verifier traverses the heap may have unintended consequences (cards marked, objects evacuated, etc.) This pull request has now been integrated. Changeset: 1bd0ce1f Author: William Kemper URL: https://git.openjdk.org/jdk/commit/1bd0ce1f51760d2e57e94b19b83d3ee0fa4aebcd Stats: 11 lines in 2 files changed: 7 ins; 0 del; 4 mod 8352918: Shenandoah: Verifier does not deactivate barriers as intended Reviewed-by: kdnilsen, shade, ysr ------------- PR: https://git.openjdk.org/jdk/pull/24264 From wkemper at openjdk.org Thu Mar 27 22:09:05 2025 From: wkemper at openjdk.org (William Kemper) Date: Thu, 27 Mar 2025 22:09:05 GMT Subject: RFR: 8351892: GenShen: Remove enforcement of generation sizes [v2] In-Reply-To: References: Message-ID: <-BEi4FpPLjKx07-J7ix9fHkKVhkcYylA0ojI-a1zrJs=.a3c073d3-7e52-46fd-8e2a-1ea601bd2074@github.com> > * The option to configure minimum and maximum sizes for the young generation have been combined into `ShenandoahInitYoungPercentage`. > * The remaining functionality in `shGenerationSizer` wasn't enough to warrant being its own class, so the functionality was rolled into `shGenerationalHeap`. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Don't let old have the entire heap ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24268/files - new: https://git.openjdk.org/jdk/pull/24268/files/e32ed37c..bc171089 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24268&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24268&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24268.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24268/head:pull/24268 PR: https://git.openjdk.org/jdk/pull/24268 From xpeng at openjdk.org Fri Mar 28 00:54:25 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 28 Mar 2025 00:54:25 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v14] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 20:37:59 GMT, Xiaolong Peng wrote: >> There are some scenarios in which GenShen may have improper remembered set verification logic: >> >> 1. Concurrent young cycles following a Full GC: >> >> In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification >> >> >> ShenandoahVerifier >> ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { >> shenandoah_assert_generations_reconciled(); >> if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { >> return _heap->complete_marking_context(); >> } >> return nullptr; >> } >> >> >> For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. >> >> 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. >> >> 3. 
ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. >> >> 4. After concurrent young cycle evacuates objects from a young region, it update refs using marking bitmaps from marking context, therefore it won't update references of dead old objects(is_marked(obj) is false: obj is not marking strong/weak and it is below tams). In this case, if the next cycle if global concurrent GC, remembered set can't be verified before init-mark because of the dead pointers. >> >> ### Solution >> * After a full GC, always set marking completeness flag to false after reseting the marking bitmaps. >> * Because there could be dead pointers in old gen were not updated to point to new address after evacuation and refs update, we should disable rem-set validation before init-mark&update-refs if old marking context is incomplete. >> >> ### Test >> - [x] `make test TEST=hotspot_gc_shenandoah` >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Add comments I have reproduced the bug https://bugs.openjdk.org/browse/JDK-8345399 on ppc64le hardware with tip, crash happens in a young cycle after a full GC, which is one of the problems I'm trying to fix in this PR: [13.990s][info][gc,start ] GC(101) Pause Full [13.990s][info][gc,task ] GC(101) Using 4 of 4 workers for full gc [13.990s][info][gc,start ] GC(101) Verify Before Full GC, Level 4 [13.998s][info][gc ] GC(101) Verify Before Full GC, Level 4 (22772 reachable, 0 marked) [13.998s][info][gc,phases,start] GC(101) Phase 1: Mark live objects [14.003s][info][gc,ref ] GC(101) Clearing All SoftReferences [14.003s][info][gc,ref ] GC(101) Clearing All SoftReferences [14.009s][info][gc,ref ] GC(101) Encountered references: Soft: 49, Weak: 101, Final: 0, Phantom: 8 [14.009s][info][gc,ref ] GC(101) Discovered references: Soft: 31, Weak: 39, Final: 0, Phantom: 8 [14.009s][info][gc,ref ] GC(101) Enqueued references: Soft: 0, Weak: 0, Final: 0, Phantom: 0 [14.012s][info][gc,phases ] GC(101) Phase 1: Mark live objects 13.674ms [14.012s][info][gc,phases,start] GC(101) Phase 2: Compute new object addresses [14.026s][info][gc,phases ] GC(101) Phase 2: Compute new object addresses 14.166ms [14.026s][info][gc,phases,start] GC(101) Phase 3: Adjust pointers [14.030s][info][gc,phases ] GC(101) Phase 3: Adjust pointers 3.626ms [14.030s][info][gc,phases,start] GC(101) Phase 4: Move objects [14.128s][info][gc,phases ] GC(101) Phase 4: Move objects 98.264ms [14.128s][info][gc,phases,start] GC(101) Phase 5: Full GC epilog [14.146s][info][gc,ergo ] GC(101) Transfer 234 region(s) from Old to Young, yielding increased size: 790M [14.146s][info][gc,ergo ] GC(101) FullGC done: young usage: 450M, old usage: 231M [14.146s][info][gc,free ] Free: 296M, Max: 512K regular, 296M humongous, Frag: 0% external, 0% internal; Used: 0B, Mutator Free: 592 Collector Reserve: 40959K, Max: 512K; Used: 16B Old Collector Reserve: 1307K, Max: 511K; Used: 740K [14.146s][info][gc,ergo ] GC(101) After Full GC, successfully transferred 0 regions to none to prepare for next gc, old available: 1307K, young_available: 296M [14.146s][info][gc,barrier ] GC(101) Cleaned read_table from 0x0000754a50290000 to 0x0000754a5048ffff [14.146s][info][gc,barrier ] GC(101) Current write_card_table: 0x0000754a4fc90000 [14.148s][info][gc,phases ] GC(101) Phase 5: 
Full GC epilog 20.265ms [14.148s][info][gc,start ] GC(101) Verify After Full GC, Level 4 [14.182s][info][gc ] GC(101) Verify After Full GC, Level 4 (22664 reachable, 125 marked) [14.182s][info][gc,ergo ] GC(101) At end of Full GC: GCU: 6.9%, MU: 9.9% during period of 0.261s [14.182s][info][gc,ergo ] GC(101) At end of Full GC: Young generation used: 450M, used regions: 454M, humongous waste: 3532K, soft capacity: 1024M, max capacity: 790M, available: 296M [14.182s][info][gc,ergo ] GC(101) At end of Full GC: Old generation used: 231M, used regions: 234M, humongous waste: 1654K, soft capacity: 0B, max capacity: 234M, available: 1307K [14.182s][info][gc,ergo ] GC(101) Good progress for free space: 296M, need 10485K [14.182s][info][gc,ergo ] GC(101) Good progress for used space: 148M, need 512K [14.182s][info][gc ] GC(101) Pause Full 829M->681M(1024M) 192.311ms ... [14.196s][info][gc ] Trigger (Young): Free (65536K) is below minimum threshold (80895K) [14.196s][info][gc,free ] Free: 65536K, Max: 512K regular, 65536K humongous, Frag: 0% external, 0% internal; Used: 0B, Mutator Free: 128 Collector Reserve: 40959K, Max: 512K; Used: 16B Old Collector Reserve: 1307K, Max: 511K; Used: 740K [14.196s][info][gc,ergo ] GC(102) Start GC cycle (Young) [14.196s][info][gc,start ] GC(102) Concurrent reset (Young) [14.196s][info][gc,task ] GC(102) Using 2 of 4 workers for Concurrent reset (Young) [14.196s][info][gc,ergo ] GC(102) Pacer for Reset. Non-Taxable: 1024M Allocated: 732 Mb Allocated: 699 Mb Allocated: 715 Mb [14.200s][info][gc,thread ] Cancelling GC: unknown GCCause [14.200s][info][gc ] Failed to allocate Shared, 61709K [14.202s][info][gc ] GC(102) Concurrent reset (Young) 6.371ms [14.203s][info][gc,barrier ] GC(102) Cleaned read_table from 0x0000754a50080000 to 0x0000754a5027ffff [14.203s][info][gc,start ] GC(102) Pause Init Mark (Young) [14.203s][info][gc,task ] GC(102) Using 4 of 4 workers for init marking [14.205s][info][gc,barrier ] GC(102) Current write_card_table: 0x0000754a4fa80000 [14.205s][info][gc,start ] GC(102) Verify Before Mark, Level 4 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/xlpeng/repos/jdk/src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp:1270), pid=2167519, tid=2167538 # Error: Verify init-mark remembered set violation; clean card, it should be dirty. Referenced from: interior location: 0x00000000c00c2bfc inside Java heap not in collection set region: | 1|R |O|BTE c0080000, c00c2c78, c0100000|TAMS c0080000|UWM c00c2c78|U 267K|T 0B|G 0B|P 0B|S 267K|L 267K|CP 0 Object: 0x00000000e8c00000 - klass 0x000001df001abfa0 [I not allocated after mark start not after update watermark not marked strong not marked weak not in collection set age: 0 mark: mark(is_unlocked no_hash age=0) region: | 1304|H |Y|BTE e8c00000, e8c80000, e8c80000|TAMS e8c80000|UWM e8c80000|U 512K|T 0B|G 0B|P 0B|S 512K|L 0B|CP 0 Forwardee: (the object itself) I'll run the same test to confirm whether this PR fix the bug. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24092#issuecomment-2759904598 From manc at openjdk.org Fri Mar 28 06:51:23 2025 From: manc at openjdk.org (Man Cao) Date: Fri, 28 Mar 2025 06:51:23 GMT Subject: RFR: 8352765: G1CollectedHeap::expand_and_allocate() may fail to allocate even after heap expansion succeeds [v2] In-Reply-To: References: Message-ID: On Thu, 27 Mar 2025 14:33:34 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this change to ensure that G1 provisions for at least one Eden region after a GC when computing the young length target. >> >> The issue reported in the CR occurs at the end of a GC, after successfully expanding the heap, an allocation fails because `policy()->should_allocate_mutator_region()` returns false. This happens because the computation did not properly account for young regions already allocated as survivor regions, leading to an Eden region target of zero. >> >> With this change, we factor in the young regions that have already been allocated as survivor regions and ensure that at least one region is targeted for Eden. >> >> Testing: Tier 1-3 >> Reproducer in the CR. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Thomas Review Thank you for the quick fix! I just have a two questions, which probably does not need to be addressed in this PR. 1. Is it possible for `expand_and_allocate()` to run into the [`receiving_additional_eden = 0;`](https://github.com/openjdk/jdk/blob/2ea1557a0fdaf551d75365d1351bfbd73319dcfb/src/hotspot/share/gc/g1/g1Policy.cpp#L320) situation under the `if (allocated_young_length >= desired_young_length)` branch? 2. It feels like G1 should recalculate young list length parameters after a successful `expand()` or `shrink()`. Currently `G1Policy::record_new_heap_size()` only recalculates the min/max young lengths. Is it reasonable to run `calculate_young_desired_length()` and `calculate_young_target_length()` inside `G1Policy::record_new_heap_size()`? ------------- Marked as reviewed by manc (Committer). PR Review: https://git.openjdk.org/jdk/pull/24257#pullrequestreview-2724441893 From iwalulya at openjdk.org Fri Mar 28 13:09:09 2025 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 28 Mar 2025 13:09:09 GMT Subject: RFR: 8352765: G1CollectedHeap::expand_and_allocate() may fail to allocate even after heap expansion succeeds [v2] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 06:49:04 GMT, Man Cao wrote: > I just have a two questions, which probably does not need to be addressed in this PR. > > 1. Is it possible for `expand_and_allocate()` to run into the [`receiving_additional_eden = 0;`](https://github.com/openjdk/jdk/blob/2ea1557a0fdaf551d75365d1351bfbd73319dcfb/src/hotspot/share/gc/g1/g1Policy.cpp#L320) situation under the `if (allocated_young_length >= desired_young_length)` branch? > > 2. It feels like G1 should recalculate young list length parameters after a successful `expand()` or `shrink()`. Currently `G1Policy::record_new_heap_size()` only recalculates the min/max young lengths. Is it reasonable to run `calculate_young_desired_length()` and `calculate_young_target_length()` inside `G1Policy::record_new_heap_size()`? Yes, I think we need to do the recalculation after expand as the expansion changes the number of free regions available and so should affect the young_[desired,target]_lengths. Otherwise, it is possible to expand and fail allocation. This can be addressed in a separate PR though. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24257#issuecomment-2761316156 From vlivanov at openjdk.org Fri Mar 28 22:04:20 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 28 Mar 2025 22:04:20 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks In-Reply-To: References: Message-ID: <4ZbIg2yTtJjQUwkCjO_Klnv0e4_DLNaRzxxpJa4g9RU=.9f32f9f2-b50c-495d-8188-3207a061e7b3@github.com> On Wed, 12 Mar 2025 19:45:41 GMT, Aleksey Shipilev wrote: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Good catch, Aleksey! What do you think about making 1 step further and encapsulating weak/strong reference handling into a helper class? Also, as an optimization idea: seems like weak + strong handles form a union (none -> weak -> strong). So, once a strong reference is captured, corresponding weak handle can be cleared straight away. ------------- PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2726922896 From kdnilsen at openjdk.org Sat Mar 29 00:15:08 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 29 Mar 2025 00:15:08 GMT Subject: RFR: 8351892: GenShen: Remove enforcement of generation sizes [v2] In-Reply-To: <-BEi4FpPLjKx07-J7ix9fHkKVhkcYylA0ojI-a1zrJs=.a3c073d3-7e52-46fd-8e2a-1ea601bd2074@github.com> References: <-BEi4FpPLjKx07-J7ix9fHkKVhkcYylA0ojI-a1zrJs=.a3c073d3-7e52-46fd-8e2a-1ea601bd2074@github.com> Message-ID: On Thu, 27 Mar 2025 22:09:05 GMT, William Kemper wrote: >> * The option to configure minimum and maximum sizes for the young generation have been combined into `ShenandoahInitYoungPercentage`. >> * The remaining functionality in `shGenerationSizer` wasn't enough to warrant being its own class, so the functionality was rolled into `shGenerationalHeap`. 
> > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Don't let old have the entire heap Looks good. Thanks for this simplification and improved consistency. src/hotspot/share/gc/shenandoah/shenandoahGenerationalFullGC.cpp line 120: > 118: if (old_capacity > old_usage) { > 119: size_t excess_old_regions = (old_capacity - old_usage) / ShenandoahHeapRegion::region_size_bytes(); > 120: gen_heap->transfer_to_young(excess_old_regions); should we assert result is successful? Or replace with force_transfer? (just seems bad practice to ignore a status result) src/hotspot/share/gc/shenandoah/shenandoahGenerationalHeap.cpp line 134: > 132: ShenandoahHeap::initialize_heuristics(); > 133: > 134: // Max capacity is the maximum _allowed_ capacity. This means the sum of the maximum I don't understand the relevance of this comment. Is there still a mximum allowed for old and a maximum allowed for young? ------------- Marked as reviewed by kdnilsen (Committer). PR Review: https://git.openjdk.org/jdk/pull/24268#pullrequestreview-2727093354 PR Review Comment: https://git.openjdk.org/jdk/pull/24268#discussion_r2019606575 PR Review Comment: https://git.openjdk.org/jdk/pull/24268#discussion_r2019607905 From ecki at zusammenkunft.net Sat Mar 29 00:23:16 2025 From: ecki at zusammenkunft.net (ecki) Date: Sat, 29 Mar 2025 01:23:16 +0100 Subject: Zero based Coops now 30GB Message-ID: <2B9B2EA2-3621-1140-A7D2-345254292797@hxcore.ol> An HTML attachment was scrubbed... URL: From duke at openjdk.org Sat Mar 29 00:35:12 2025 From: duke at openjdk.org (duke) Date: Sat, 29 Mar 2025 00:35:12 GMT Subject: RFR: 8352185: Shenandoah: Invalid logic for remembered set verification [v14] In-Reply-To: References: Message-ID: <0SZ62G1n4JFHTQ0XnfQMmWTp5Wkhi9SFS0f22p5cgA8=.7b47c928-2606-4bbc-a35d-fbbc3366633c@github.com> On Wed, 26 Mar 2025 20:37:59 GMT, Xiaolong Peng wrote: >> There are some scenarios in which GenShen may have improper remembered set verification logic: >> >> 1. Concurrent young cycles following a Full GC: >> >> In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification >> >> >> ShenandoahVerifier >> ShenandoahMarkingContext* ShenandoahVerifier::get_marking_context_for_old() { >> shenandoah_assert_generations_reconciled(); >> if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { >> return _heap->complete_marking_context(); >> } >> return nullptr; >> } >> >> >> For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result. >> >> 2. For the impl of `ShenandoahVerifier::get_marking_context_for_old` mentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark, `ShenandoahVerifier::help_verify_region_rem_set` always skip verification in this case. >> >> 3. ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2. >> >> 4. 
After concurrent young cycle evacuates objects from a young region, it update refs using marking bitmaps from marking context, therefore it won't update references of dead old objects(is_marked(obj) is false: obj is not marking strong/weak and it is below tams). In this case, if the next cycle if global concurrent GC, remembered set can't be verified before init-mark because of the dead pointers. >> >> ### Solution >> * After a full GC, always set marking completeness flag to false after reseting the marking bitmaps. >> * Because there could be dead pointers in old gen were not updated to point to new address after evacuation and refs update, we should disable rem-set validation before init-mark&update-refs if old marking context is incomplete. >> >> ### Test >> - [x] `make test TEST=hotspot_gc_shenandoah` >> - [x] GHA > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Add comments @pengxiaolong Your change (at version e11c6fc3f8ccc25064be26c87273d10125540222) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24092#issuecomment-2762939177 From thomas.stuefe at gmail.com Sat Mar 29 08:49:55 2025 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Sat, 29 Mar 2025 09:49:55 +0100 Subject: Zero based Coops now 30GB In-Reply-To: <2B9B2EA2-3621-1140-A7D2-345254292797@hxcore.ol> References: <2B9B2EA2-3621-1140-A7D2-345254292797@hxcore.ol> Message-ID: Hi Ecki, On Sat, Mar 29, 2025 at 1:23?AM ecki wrote: > Hello, > > I just noticed (from -Xlog:gc+init=info) that in later Java versions the > limit for Zero based coops mode seems to be increased from 26GB to 30GB (in > my case Windows x64 with Zulu21), > Did I miss discussion and announcement about it, how different are > platforms/versions/distributions/configurations in this? (32bit mode seems > to have also increased to 2020MB limit), > > There is no hard-coded limit for zero-based compression. Instead, it is derived from object alignment (LogMinObjAlignmentInBytes) since we use the oop alignment to increase the zero-based range. At startup, we try to fit the heap into the zero-based range; that may work or not, depending on address space population. We did some changes, however, to how CDS and class space are placed. We earmarked some space in lower address regions in older JDKs for that. We don't do that anymore, at least not by default; CDS/class space is now always relocated for unrelated reasons. A side effect is that we have more space available in lower address regions, and therefore, we can fit larger heaps in there. That side effect is also beneficial, since oop compression is more important than class pointer compression. Also with zero based being now 30GB and compressed limit at 31, there is a > very narrow range for non zero based compression, was it discussed to > remove it or extend the limit to larger sizes automatically with more > overhead modes? > > No! That is not desirable. The chance of low-address heap placement is high, but not guaranteed, since it depends on address space population (factors like ASLR etc). One example, the JVM may be embedded into a custom launcher and may be loaded and initialized late. It could therefore face an already very fragmented address space. The JVM must still be able to cope with these scenarios. In these cases, it is still much better to use non-zero-based compressed oops instead of switching off oop compression. 
Switching off oop compression would make memory footprint of a JVM a lot more unpredictable: if the heap happens to fit the heap into low address regions, JVM would use X amount of GB, but if not (semi-random), it would use X * 1.5 (or whatever the factor for that app is resulting from uncompressed oops). Cheers, Thomas > Gru?, > Bernd > -- > https://bernd.eckenfels.net > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.stuefe at gmail.com Sat Mar 29 08:52:30 2025 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Sat, 29 Mar 2025 09:52:30 +0100 Subject: Zero based Coops now 30GB In-Reply-To: References: <2B9B2EA2-3621-1140-A7D2-345254292797@hxcore.ol> Message-ID: p.s. > was it discussed to ... extend the limit to larger sizes automatically with more overhead modes? The only way to do this is to increase object alignment (which in fact you can do manually with -XX:ObjectAlignmentInBytes). But you would use more heap space. On Sat, Mar 29, 2025 at 9:49?AM Thomas St?fe wrote: > Hi Ecki, > > On Sat, Mar 29, 2025 at 1:23?AM ecki wrote: > >> Hello, >> >> I just noticed (from -Xlog:gc+init=info) that in later Java versions the >> limit for Zero based coops mode seems to be increased from 26GB to 30GB (in >> my case Windows x64 with Zulu21), >> > Did I miss discussion and announcement about it, how different are >> platforms/versions/distributions/configurations in this? (32bit mode seems >> to have also increased to 2020MB limit), >> >> > There is no hard-coded limit for zero-based compression. Instead, it is > derived from object alignment (LogMinObjAlignmentInBytes) since we use the > oop alignment to increase the zero-based range. > > At startup, we try to fit the heap into the zero-based range; that may > work or not, depending on address space population. > > We did some changes, however, to how CDS and class space are placed. We > earmarked some space in lower address regions in older JDKs for that. We > don't do that anymore, at least not by default; CDS/class space is now > always relocated for unrelated reasons. A side effect is that we have more > space available in lower address regions, and therefore, we can fit larger > heaps in there. That side effect is also beneficial, since oop compression > is more important than class pointer compression. > > Also with zero based being now 30GB and compressed limit at 31, there is a >> very narrow range for non zero based compression, was it discussed to >> remove it or extend the limit to larger sizes automatically with more >> overhead modes? >> >> > No! That is not desirable. The chance of low-address heap placement is > high, but not guaranteed, since it depends on address space population > (factors like ASLR etc). One example, the JVM may be embedded into a custom > launcher and may be loaded and initialized late. It could therefore face an > already very fragmented address space. The JVM must still be able to cope > with these scenarios. > > In these cases, it is still much better to use non-zero-based compressed > oops instead of switching off oop compression. > > Switching off oop compression would make memory footprint of a JVM a lot > more unpredictable: if the heap happens to fit the heap into low address > regions, JVM would use X amount of GB, but if not (semi-random), it would > use X * 1.5 (or whatever the factor for that app is resulting from > uncompressed oops). 
> > Cheers, Thomas > > > >> Gru?, >> Bernd >> -- >> https://bernd.eckenfels.net >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kdnilsen at openjdk.org Mon Mar 31 03:22:57 2025 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 31 Mar 2025 03:22:57 GMT Subject: RFR: 8353115: GenShen: mixed evacuation candidate regions need accurate live_data Message-ID: The existing implementation of get_live_data_bytes() and git_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions that are going to be added or not to the collection set during final-mark safepoint. However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended. This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions. ------------- Commit messages: - Track live and garbage for mixed-evac regions - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - ... and 22 more: https://git.openjdk.org/jdk/compare/58ef4015...70613882 Changes: https://git.openjdk.org/jdk/pull/24319/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24319&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353115 Stats: 54 lines in 5 files changed: 54 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24319.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24319/head:pull/24319 PR: https://git.openjdk.org/jdk/pull/24319