From rkennke at openjdk.org Thu Dec 1 09:46:14 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 1 Dec 2022 09:46:14 GMT Subject: RFR: 8297285: Shenandoah pacing causes assertion failure during VM initialization [v2] In-Reply-To: References: Message-ID: <8akQ83Qu0QVUqLJWmXmIjK8q3rnkiPtj9C_H754etQU=.53980eeb-8c04-4b56-baf5-d588903206a8@github.com> On Thu, 24 Nov 2022 23:53:44 GMT, Ashutosh Mehra wrote: >> Please review the fix for the assertion failure seen during VM init due to pacing in shenandoah gc. >> The fix is to avoid pacing during VM initialization as the main thread is not yet an active java thread. >> >> Signed-off-by: Ashutosh Mehra > > Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: > > Include runtime/javaThread.inline.hpp for JavaThread::is_terminated() to > fix compile failure > > Signed-off-by: Ashutosh Mehra Looks good to me! Thank you, Ashu! ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.org/jdk/pull/11360 From duke at openjdk.org Thu Dec 1 19:39:21 2022 From: duke at openjdk.org (Ashutosh Mehra) Date: Thu, 1 Dec 2022 19:39:21 GMT Subject: RFR: 8297285: Shenandoah pacing causes assertion failure during VM initialization [v2] In-Reply-To: <8akQ83Qu0QVUqLJWmXmIjK8q3rnkiPtj9C_H754etQU=.53980eeb-8c04-4b56-baf5-d588903206a8@github.com> References: <8akQ83Qu0QVUqLJWmXmIjK8q3rnkiPtj9C_H754etQU=.53980eeb-8c04-4b56-baf5-d588903206a8@github.com> Message-ID: On Thu, 1 Dec 2022 09:42:23 GMT, Roman Kennke wrote: >> Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: >> >> Include runtime/javaThread.inline.hpp for JavaThread::is_terminated() to >> fix compile failure >> >> Signed-off-by: Ashutosh Mehra > > Looks good to me! Thank you, Ashu! @rkennke thanks for suggesting and reviewing the fix. ------------- PR: https://git.openjdk.org/jdk/pull/11360 From phh at openjdk.org Fri Dec 2 00:19:14 2022 From: phh at openjdk.org (Paul Hohensee) Date: Fri, 2 Dec 2022 00:19:14 GMT Subject: RFR: 8297285: Shenandoah pacing causes assertion failure during VM initialization [v2] In-Reply-To: References: Message-ID: <_3PHeiuhtRZjH1XTXF6in0fwUQFjAssUB8AkDUY8jcM=.6a309505-1fda-493b-9fb5-0c3aa2d74566@github.com> On Thu, 24 Nov 2022 23:53:44 GMT, Ashutosh Mehra wrote: >> Please review the fix for the assertion failure seen during VM init due to pacing in shenandoah gc. >> The fix is to avoid pacing during VM initialization as the main thread is not yet an active java thread. >> >> Signed-off-by: Ashutosh Mehra > > Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: > > Include runtime/javaThread.inline.hpp for JavaThread::is_terminated() to > fix compile failure > > Signed-off-by: Ashutosh Mehra I'm not familiar with this code, so please bear with me. :) The comment on line 246 says "Thread which is not an active Java thread should also not block.", but the check at line 251 will return (i.e., looks like not block) if the current thread is an active Java thread. Should the check be !current->is_active_Java_thread() instead? 
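For concreteness, a minimal sketch of the guard as I read the intent — the enclosing method and surrounding pacer code are assumed from the discussion, not quoted from the patch:

    // Hypothetical shape of the early return in the pacer's blocking path
    // (e.g. ShenandoahPacer::pace_for_alloc); the context is assumed:
    JavaThread* current = JavaThread::current();
    if (!current->is_active_Java_thread()) {
      // A thread that is not yet (or no longer) an active Java thread,
      // e.g. the main thread during VM initialization, must not block here.
      return;
    }

Note that is_active_Java_thread() is implemented in terms of is_terminated(), which is presumably why the earlier revision needed to include runtime/javaThread.inline.hpp.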
------------- PR: https://git.openjdk.org/jdk/pull/11360 From duke at openjdk.org Fri Dec 2 03:46:25 2022 From: duke at openjdk.org (Ashutosh Mehra) Date: Fri, 2 Dec 2022 03:46:25 GMT Subject: RFR: 8297285: Shenandoah pacing causes assertion failure during VM initialization [v2] In-Reply-To: <_3PHeiuhtRZjH1XTXF6in0fwUQFjAssUB8AkDUY8jcM=.6a309505-1fda-493b-9fb5-0c3aa2d74566@github.com> References: <_3PHeiuhtRZjH1XTXF6in0fwUQFjAssUB8AkDUY8jcM=.6a309505-1fda-493b-9fb5-0c3aa2d74566@github.com> Message-ID: On Fri, 2 Dec 2022 00:17:00 GMT, Paul Hohensee wrote: >> Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: >> >> Include runtime/javaThread.inline.hpp for JavaThread::is_terminated() to >> fix compile failure >> >> Signed-off-by: Ashutosh Mehra > > I'm not familiar with this code, so please bear with me. :) The comment on line 246 says "Thread which is not an active Java thread should also not block.", but the check at line 251 will return (i.e., looks like not block) if the current thread is an active Java thread. Should the check be !current->is_active_Java_thread() instead? @phohensee you are right. It should be `!current->is_active_Java_thread()`, how did I miss that `!`! Thanks for catching it in time. ------------- PR: https://git.openjdk.org/jdk/pull/11360 From duke at openjdk.org Fri Dec 2 04:02:33 2022 From: duke at openjdk.org (Ashutosh Mehra) Date: Fri, 2 Dec 2022 04:02:33 GMT Subject: RFR: 8297285: Shenandoah pacing causes assertion failure during VM initialization [v3] In-Reply-To: References: Message-ID: > Please review the fix for the assertion failure seen during VM init due to pacing in shenandoah gc. > The fix is to avoid pacing during VM initialization as the main thread is not yet an active java thread. > > Signed-off-by: Ashutosh Mehra Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: Fix the condition that the current thread is not an active java thread Signed-off-by: Ashutosh Mehra ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11360/files - new: https://git.openjdk.org/jdk/pull/11360/files/60f174fc..17a7b3bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11360&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11360&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11360.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11360/head:pull/11360 PR: https://git.openjdk.org/jdk/pull/11360 From phh at openjdk.org Fri Dec 2 13:32:09 2022 From: phh at openjdk.org (Paul Hohensee) Date: Fri, 2 Dec 2022 13:32:09 GMT Subject: RFR: 8297285: Shenandoah pacing causes assertion failure during VM initialization [v3] In-Reply-To: References: Message-ID: On Fri, 2 Dec 2022 04:02:33 GMT, Ashutosh Mehra wrote: >> Please review the fix for the assertion failure seen during VM init due to pacing in shenandoah gc. >> The fix is to avoid pacing during VM initialization as the main thread is not yet an active java thread. >> >> Signed-off-by: Ashutosh Mehra > > Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: > > Fix the condition that the current thread is not an active java thread > > Signed-off-by: Ashutosh Mehra Looks good now. ------------- Marked as reviewed by phh (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11360 From rkennke at openjdk.org Fri Dec 2 14:06:20 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 2 Dec 2022 14:06:20 GMT Subject: RFR: 8297285: Shenandoah pacing causes assertion failure during VM initialization [v3] In-Reply-To: References: Message-ID: On Fri, 2 Dec 2022 04:02:33 GMT, Ashutosh Mehra wrote: >> Please review the fix for the assertion failure seen during VM init due to pacing in shenandoah gc. >> The fix is to avoid pacing during VM initialization as the main thread is not yet an active java thread. >> >> Signed-off-by: Ashutosh Mehra > > Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: > > Fix the condition that the current thread is not an active java thread > > Signed-off-by: Ashutosh Mehra Marked as reviewed by rkennke (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/11360 From duke at openjdk.org Fri Dec 2 14:25:32 2022 From: duke at openjdk.org (Ashutosh Mehra) Date: Fri, 2 Dec 2022 14:25:32 GMT Subject: Integrated: 8297285: Shenandoah pacing causes assertion failure during VM initialization In-Reply-To: References: Message-ID: <0oaTeJUG3fC-F1V499dwXovnPUGcgR6gILCgnKJd_mY=.6a5dcd3d-4b97-47ed-a029-67122be03ef8@github.com> On Thu, 24 Nov 2022 21:57:06 GMT, Ashutosh Mehra wrote: > Please review the fix for the assertion failure seen during VM init due to pacing in shenandoah gc. > The fix is to avoid pacing during VM initialization as the main thread is not yet an active java thread. > > Signed-off-by: Ashutosh Mehra This pull request has now been integrated. Changeset: 415cfd2e Author: Ashutosh Mehra Committer: Paul Hohensee URL: https://git.openjdk.org/jdk/commit/415cfd2e28e6b7613712ab63a1ab66522e9bf0f2 Stats: 8 lines in 1 file changed: 7 ins; 0 del; 1 mod 8297285: Shenandoah pacing causes assertion failure during VM initialization Reviewed-by: rkennke, phh ------------- PR: https://git.openjdk.org/jdk/pull/11360 From wkemper at openjdk.org Sat Dec 3 01:16:20 2022 From: wkemper at openjdk.org (William Kemper) Date: Sat, 3 Dec 2022 01:16:20 GMT Subject: RFR: Generation resizing Message-ID: These changes have the generational mode track the minimum mutator utilization (percentage of process time used by mutators). When it falls below a configuration percentage (GCTimeRatio), a heuristic will transfer memory capacity to whatever generation has been using more CPU time. The assumption here is that by increasing capacity, we will decrease the collection frequency and improve the MMU. ------------- Commit messages: - Document the class responsible for adjusting generation sizes - Revert unnecessary change - Remove unused time between cycle tracking - Remove vestigial mmu tracker instance - Clamp adjustments to min/max when increment is too large - Adjust generation sizes from safepoint - Fix crash in SATB mode, always log average MMU on scheduled interval - Limits on generation size adjustments, log young/old heap occupancy in generational mode - WIP: Transfer up to 10% capacity to undersized generation - WIP: Track idle gc time and mmu averages, rename confusing method name - ... 
and 3 more: https://git.openjdk.org/shenandoah/compare/998f68b2...b916a909 Changes: https://git.openjdk.org/shenandoah/pull/177/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=177&range=00 Stats: 449 lines in 22 files changed: 419 ins; 18 del; 12 mod Patch: https://git.openjdk.org/shenandoah/pull/177.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/177/head:pull/177 PR: https://git.openjdk.org/shenandoah/pull/177 From mcimadamore at openjdk.org Mon Dec 5 10:31:52 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 5 Dec 2022 10:31:52 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v39] In-Reply-To: References: Message-ID: <-V_N0Cvh4J0vKNbBYdFcow9E8yFHRIjya8n69MpDSuY=.9626ee4d-95b6-41e4-b21e-395e79840388@github.com> > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Fix Preview annotation for JEP 434 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10872/files - new: https://git.openjdk.org/jdk/pull/10872/files/8b5dc0f0..33b834ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=38 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10872&range=37-38 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10872.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10872/head:pull/10872 PR: https://git.openjdk.org/jdk/pull/10872 From sundar at openjdk.org Mon Dec 5 11:03:15 2022 From: sundar at openjdk.org (Athijegannathan Sundararajan) Date: Mon, 5 Dec 2022 11:03:15 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v39] In-Reply-To: <-V_N0Cvh4J0vKNbBYdFcow9E8yFHRIjya8n69MpDSuY=.9626ee4d-95b6-41e4-b21e-395e79840388@github.com> References: <-V_N0Cvh4J0vKNbBYdFcow9E8yFHRIjya8n69MpDSuY=.9626ee4d-95b6-41e4-b21e-395e79840388@github.com> Message-ID: On Mon, 5 Dec 2022 10:31:52 GMT, Maurizio Cimadamore wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. >> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Fix Preview annotation for JEP 434 LGTM ------------- Marked as reviewed by sundar (Reviewer). PR: https://git.openjdk.org/jdk/pull/10872 From rkennke at openjdk.org Mon Dec 5 11:07:01 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 5 Dec 2022 11:07:01 GMT Subject: RFR: Generation resizing In-Reply-To: References: Message-ID: On Sat, 3 Dec 2022 01:09:59 GMT, William Kemper wrote: > These changes have the generational mode track the minimum mutator utilization (percentage of process time used by mutators). When it falls below a configuration percentage (GCTimeRatio), a heuristic will transfer memory capacity to whatever generation has been using more CPU time. The assumption here is that by increasing capacity, we will decrease the collection frequency and improve the MMU. Thanks, William! The PR has merge conflicts, can you resolve them? 
Thanks, Roman ------------- PR: https://git.openjdk.org/shenandoah/pull/177 From rkennke at openjdk.org Mon Dec 5 11:16:15 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 5 Dec 2022 11:16:15 GMT Subject: RFR: Generation resizing In-Reply-To: References: Message-ID: On Sat, 3 Dec 2022 01:09:59 GMT, William Kemper wrote: > These changes have the generational mode track the minimum mutator utilization (percentage of process time used by mutators). When it falls below a configuration percentage (GCTimeRatio), a heuristic will transfer memory capacity to whatever generation has been using more CPU time. The assumption here is that by increasing capacity, we will decrease the collection frequency and improve the MMU. Thank you for implementing this useful change! I have a few questions and comments. src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 451: > 449: void ShenandoahControlThread::service_concurrent_normal_cycle( > 450: const ShenandoahHeap* heap, const GenerationMode generation, GCCause::Cause cause) { > 451: GCIdMark gc_id_mark; Why does the GCIdMark need to move around? src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1791: > 1789: > 1790: void ShenandoahHeap::on_cycle_start(GCCause::Cause cause, ShenandoahGeneration* generation) { > 1791: log_info(gc)("on_cycle_start: %s", generation->name()); What is that logging for/ what does the log message mean? I'd either improve the log message or remove the logging (or make it dev+trace) if it was only for dev purposes. src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1804: > 1802: > 1803: void ShenandoahHeap::on_cycle_end(ShenandoahGeneration* generation) { > 1804: log_info(gc)("on_cycle_end: %s", generation->name()); Same here. src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 25: > 23: */ > 24: > 25: #include "gc/shenandoah/shenandoahMmuTracker.hpp" You need to include precompiled.hpp here. src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 28: > 26: #define SHARE_GC_SHENANDOAH_SHENANDOAHMMUTRACKER_HPP > 27: > 28: #include "memory/iterator.hpp" What do we need the iterator.hpp for? ------------- PR: https://git.openjdk.org/shenandoah/pull/177 From mcimadamore at openjdk.org Mon Dec 5 13:49:46 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 5 Dec 2022 13:49:46 GMT Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v39] In-Reply-To: <-V_N0Cvh4J0vKNbBYdFcow9E8yFHRIjya8n69MpDSuY=.9626ee4d-95b6-41e4-b21e-395e79840388@github.com> References: <-V_N0Cvh4J0vKNbBYdFcow9E8yFHRIjya8n69MpDSuY=.9626ee4d-95b6-41e4-b21e-395e79840388@github.com> Message-ID: On Mon, 5 Dec 2022 10:31:52 GMT, Maurizio Cimadamore wrote: >> This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. 
>> >> [1] - https://openjdk.org/jeps/434 > > Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: > > Fix Preview annotation for JEP 434 Note: there are 4 tests failing in x86: * MemoryLayoutPrincipalTotalityTest * MemoryLayoutTypeRetentionTest * TestLargeSegmentCopy * TestLinker These failures are addressed in the dependent PR: https://git.openjdk.org/jdk/pull/11019, which will be integrated immediately after these changes ------------- PR: https://git.openjdk.org/jdk/pull/10872 From mcimadamore at openjdk.org Mon Dec 5 13:55:22 2022 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 5 Dec 2022 13:55:22 GMT Subject: Integrated: 8295044: Implementation of Foreign Function and Memory API (Second Preview) In-Reply-To: References: Message-ID: <7Ara-NxY9rdQzABZPYR9T-N7b1XLY99_6J-dG3cr2NY=.4151c690-0138-4ffd-a763-ff2456754189@github.com> On Wed, 26 Oct 2022 13:11:50 GMT, Maurizio Cimadamore wrote: > This PR contains the API and implementation changes for JEP-434 [1]. A more detailed description of such changes, to avoid repetitions during the review process, is included as a separate comment. > > [1] - https://openjdk.org/jeps/434 This pull request has now been integrated. Changeset: 73baadce Author: Maurizio Cimadamore URL: https://git.openjdk.org/jdk/commit/73baadceb60029f6340c1327118aeb59971c2434 Stats: 13808 lines in 255 files changed: 5780 ins; 4448 del; 3580 mod 8295044: Implementation of Foreign Function and Memory API (Second Preview) Co-authored-by: Jorn Vernee Co-authored-by: Per Minborg Co-authored-by: Maurizio Cimadamore Reviewed-by: jvernee, pminborg, psandoz, alanb, sundar ------------- PR: https://git.openjdk.org/jdk/pull/10872 From wkemper at openjdk.org Mon Dec 5 17:04:12 2022 From: wkemper at openjdk.org (William Kemper) Date: Mon, 5 Dec 2022 17:04:12 GMT Subject: RFR: Generation resizing In-Reply-To: References: Message-ID: On Mon, 5 Dec 2022 11:07:23 GMT, Roman Kennke wrote: >> These changes have the generational mode track the minimum mutator utilization (percentage of process time used by mutators). When it falls below a configuration percentage (GCTimeRatio), a heuristic will transfer memory capacity to whatever generation has been using more CPU time. The assumption here is that by increasing capacity, we will decrease the collection frequency and improve the MMU. > > src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 451: > >> 449: void ShenandoahControlThread::service_concurrent_normal_cycle( >> 450: const ShenandoahHeap* heap, const GenerationMode generation, GCCause::Cause cause) { >> 451: GCIdMark gc_id_mark; > > Why does the GCIdMark need to move around? I pulled up the gcid mark because every old generation collection is preceded by a "bootstrap" young collection. Logically, this bootstrap phase belongs to the old collection so it should have the same GC id as the subsequent old marking phase. ------------- PR: https://git.openjdk.org/shenandoah/pull/177 From wkemper at openjdk.org Mon Dec 5 17:17:20 2022 From: wkemper at openjdk.org (William Kemper) Date: Mon, 5 Dec 2022 17:17:20 GMT Subject: RFR: Generation resizing In-Reply-To: References: Message-ID: On Mon, 5 Dec 2022 11:10:05 GMT, Roman Kennke wrote: >> These changes have the generational mode track the minimum mutator utilization (percentage of process time used by mutators). 
When it falls below a configuration percentage (GCTimeRatio), a heuristic will transfer memory capacity to whatever generation has been using more CPU time. The assumption here is that by increasing capacity, we will decrease the collection frequency and improve the MMU. > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1791: > >> 1789: >> 1790: void ShenandoahHeap::on_cycle_start(GCCause::Cause cause, ShenandoahGeneration* generation) { >> 1791: log_info(gc)("on_cycle_start: %s", generation->name()); > > What is that logging for/ what does the log message mean? I'd either improve the log message or remove the logging (or make it dev+trace) if it was only for dev purposes. Will remove this. > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1804: > >> 1802: >> 1803: void ShenandoahHeap::on_cycle_end(ShenandoahGeneration* generation) { >> 1804: log_info(gc)("on_cycle_end: %s", generation->name()); > > Same here. Will remove this. ------------- PR: https://git.openjdk.org/shenandoah/pull/177 From wkemper at openjdk.org Mon Dec 5 17:29:47 2022 From: wkemper at openjdk.org (William Kemper) Date: Mon, 5 Dec 2022 17:29:47 GMT Subject: RFR: Generation resizing In-Reply-To: References: Message-ID: On Mon, 5 Dec 2022 11:10:54 GMT, Roman Kennke wrote: >> These changes have the generational mode track the minimum mutator utilization (percentage of process time used by mutators). When it falls below a configuration percentage (GCTimeRatio), a heuristic will transfer memory capacity to whatever generation has been using more CPU time. The assumption here is that by increasing capacity, we will decrease the collection frequency and improve the MMU. > > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 25: > >> 23: */ >> 24: >> 25: #include "gc/shenandoah/shenandoahMmuTracker.hpp" > > You need to include precompiled.hpp here. Done (added this to my new file template as well). ------------- PR: https://git.openjdk.org/shenandoah/pull/177 From wkemper at openjdk.org Mon Dec 5 19:52:35 2022 From: wkemper at openjdk.org (William Kemper) Date: Mon, 5 Dec 2022 19:52:35 GMT Subject: RFR: Generation resizing [v2] In-Reply-To: References: Message-ID: > These changes have the generational mode track the minimum mutator utilization (percentage of process time used by mutators). When it falls below a configuration percentage (GCTimeRatio), a heuristic will transfer memory capacity to whatever generation has been using more CPU time. The assumption here is that by increasing capacity, we will decrease the collection frequency and improve the MMU. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - Remove unnecessary logging, clean up imports - Merge from shenandoah/master - Document the class responsible for adjusting generation sizes - Revert unnecessary change - Remove unused time between cycle tracking - Remove vestigial mmu tracker instance - Clamp adjustments to min/max when increment is too large - Adjust generation sizes from safepoint - Fix crash in SATB mode, always log average MMU on scheduled interval - Limits on generation size adjustments, log young/old heap occupancy in generational mode - ... 
and 5 more: https://git.openjdk.org/shenandoah/compare/f90a7701...41f057fa ------------- Changes: https://git.openjdk.org/shenandoah/pull/177/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=177&range=01 Stats: 447 lines in 22 files changed: 417 ins; 18 del; 12 mod Patch: https://git.openjdk.org/shenandoah/pull/177.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/177/head:pull/177 PR: https://git.openjdk.org/shenandoah/pull/177 From wkemper at openjdk.org Mon Dec 5 23:25:08 2022 From: wkemper at openjdk.org (William Kemper) Date: Mon, 5 Dec 2022 23:25:08 GMT Subject: RFR: Merge openjdk/jdk:master Message-ID: <-xErRk5G6IkHidX1x7cFmyAHd5mU7DRXN0zqH8IT07Q=.40fd9a98-5f4c-48df-9d5e-55a5adcf7c65@github.com> Weekly merge from upstream. Looks fine in testing. ------------- Commit messages: - Merge tag 'jdk-20+26' into merge-jdk-20-26 - 8297731: Remove redundant check in MutableBigInteger.divide - 8287400: Make BitMap range parameter names consistent - 8297584: G1 parallel phase event for scan heap roots is sent too often - 8294924: JvmtiExport::post_exception_throw() doesn't deal well with concurrent stack processing - 8296875: Generational ZGC: Refactor loom code - 8297284: ResolutionErrorTable's key is wrong - 8297740: runtime/ClassUnload/UnloadTest.java failed with "Test failed: should still be live" - 8297644: RISC-V: Compilation error when shenandoah is disabled - 8297523: Various GetPrimitiveArrayCritical miss result - NULL check - ... and 86 more: https://git.openjdk.org/shenandoah/compare/f90a7701...bfd2f109 The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.org/?repo=shenandoah&pr=178&range=00.0 - openjdk/jdk:master: https://webrevs.openjdk.org/?repo=shenandoah&pr=178&range=00.1 Changes: https://git.openjdk.org/shenandoah/pull/178/files Stats: 15750 lines in 611 files changed: 10080 ins; 3270 del; 2400 mod Patch: https://git.openjdk.org/shenandoah/pull/178.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/178/head:pull/178 PR: https://git.openjdk.org/shenandoah/pull/178 From ysr at openjdk.org Tue Dec 6 03:57:11 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 6 Dec 2022 03:57:11 GMT Subject: RFR: JDK-8298138: Shenandoah: HdrSeq asserts "sub-bucket index (512) overflow for value ( 1.00)" Message-ID: JBS link: https://bugs.openjdk.org/browse/JDK-8298138 - Fixed a boundary condition that was triggering an assert. - Added a simple-minded gtest for HdrSeq, which allows one to exercise the asserting code in a debug build. 
- Tested with: `CONF=slowdebug make run-test TEST="gtest:BasicShenandoahNumberSeqTest"` ------------- Commit messages: - Merge branch 'master' into shen_numberseq - A simple-minded test of HdrSeq which also exercises the problematic - Fix a boundary condition issue w/HdrSeq Changes: https://git.openjdk.org/jdk/pull/11524/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11524&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8298138 Stats: 77 lines in 3 files changed: 74 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/11524.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11524/head:pull/11524 PR: https://git.openjdk.org/jdk/pull/11524 From rkennke at openjdk.org Tue Dec 6 11:37:49 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 6 Dec 2022 11:37:49 GMT Subject: RFR: JDK-8298138: Shenandoah: HdrSeq asserts "sub-bucket index (512) overflow for value ( 1.00)" In-Reply-To: References: Message-ID: On Tue, 6 Dec 2022 03:46:12 GMT, Y. Srinivas Ramakrishna wrote: > JBS link: https://bugs.openjdk.org/browse/JDK-8298138 > - Fixed a boundary condition that was triggering an assert. > - Added a simple-minded gtest for HdrSeq, which allows one to exercise the asserting code in a debug build. > - Tested with: `CONF=slowdebug make run-test TEST="gtest:BasicShenandoahNumberSeqTest"` Hi Ramki, The change looks good. I have a few minor comments. src/hotspot/share/gc/shenandoah/shenandoahNumberSeq.hpp line 58: > 56: // It has very low memory requirements, and is thread-safe. When accuracy > 57: // is not needed, it is preferred over HdrSeq. > 58: class BinaryMagnitudeSeq : public CHeapObj { What is the relevance of this change? Also, if it *is* necessary, then it should be mtGC. test/hotspot/gtest/gc/shenandoah/test_shenandoahNumberSeq.cpp line 2: > 1: /* > 2: * Copyright (c) 2016, 2017, Oracle and/or its affiliates. All rights reserved. The copyright should be 2022. ------------- Changes requested by rkennke (Reviewer). PR: https://git.openjdk.org/jdk/pull/11524 From kdnilsen at openjdk.org Tue Dec 6 15:48:18 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 6 Dec 2022 15:48:18 GMT Subject: RFR: Generation resizing [v2] In-Reply-To: References: Message-ID: On Mon, 5 Dec 2022 19:52:35 GMT, William Kemper wrote: >> These changes have the generational mode track the minimum mutator utilization (percentage of process time used by mutators). When it falls below a configuration percentage (GCTimeRatio), a heuristic will transfer memory capacity to whatever generation has been using more CPU time. The assumption here is that by increasing capacity, we will decrease the collection frequency and improve the MMU. > > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Remove unnecessary logging, clean up imports > - Merge from shenandoah/master > - Document the class responsible for adjusting generation sizes > - Revert unnecessary change > - Remove unused time between cycle tracking > - Remove vestigial mmu tracker instance > - Clamp adjustments to min/max when increment is too large > - Adjust generation sizes from safepoint > - Fix crash in SATB mode, always log average MMU on scheduled interval > - Limits on generation size adjustments, log young/old heap occupancy in generational mode > - ... and 5 more: https://git.openjdk.org/shenandoah/compare/f90a7701...41f057fa LGTM. I'm not yet convinced that this is the right heuristic, or the only heuristic for resizing generations. 
But this is a huge step making it possible to adjust generation sizes on the fly. I expect further refinement will be driven by additional experiments. ------------- Marked as reviewed by kdnilsen (Committer). PR: https://git.openjdk.org/shenandoah/pull/177 From wkemper at openjdk.org Tue Dec 6 16:48:37 2022 From: wkemper at openjdk.org (William Kemper) Date: Tue, 6 Dec 2022 16:48:37 GMT Subject: Integrated: Merge openjdk/jdk:master In-Reply-To: <-xErRk5G6IkHidX1x7cFmyAHd5mU7DRXN0zqH8IT07Q=.40fd9a98-5f4c-48df-9d5e-55a5adcf7c65@github.com> References: <-xErRk5G6IkHidX1x7cFmyAHd5mU7DRXN0zqH8IT07Q=.40fd9a98-5f4c-48df-9d5e-55a5adcf7c65@github.com> Message-ID: On Mon, 5 Dec 2022 23:16:57 GMT, William Kemper wrote: > Weekly merge from upstream. Looks fine in testing. This pull request has now been integrated. Changeset: 6ce5f226 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/6ce5f226c9110a7c0262c02d86d2ac7539a4a81d Stats: 15750 lines in 611 files changed: 10080 ins; 3270 del; 2400 mod Merge openjdk/jdk:master ------------- PR: https://git.openjdk.org/shenandoah/pull/178 From wkemper at openjdk.org Tue Dec 6 17:26:08 2022 From: wkemper at openjdk.org (William Kemper) Date: Tue, 6 Dec 2022 17:26:08 GMT Subject: RFR: Generation resizing [v3] In-Reply-To: References: Message-ID: <8Fse7IxO14Uc0eJJoLMmGXSo8XYD9Qb144mCyrMX3-g=.1d5d7258-f06d-4a03-bf78-8102f22ada7d@github.com> > These changes have the generational mode track the minimum mutator utilization (percentage of process time used by mutators). When it falls below a configuration percentage (GCTimeRatio), a heuristic will transfer memory capacity to whatever generation has been using more CPU time. The assumption here is that by increasing capacity, we will decrease the collection frequency and improve the MMU. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Remove vestigial lock, do not enroll periodic task while holding threads_lock ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/177/files - new: https://git.openjdk.org/shenandoah/pull/177/files/41f057fa..d7a01946 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=177&range=02 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=177&range=01-02 Stats: 8 lines in 3 files changed: 2 ins; 3 del; 3 mod Patch: https://git.openjdk.org/shenandoah/pull/177.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/177/head:pull/177 PR: https://git.openjdk.org/shenandoah/pull/177 From ysr at openjdk.org Tue Dec 6 18:23:16 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 6 Dec 2022 18:23:16 GMT Subject: RFR: JDK-8298138: Shenandoah: HdrSeq asserts "sub-bucket index (512) overflow for value ( 1.00)" [v2] In-Reply-To: References: Message-ID: > JBS link: https://bugs.openjdk.org/browse/JDK-8298138 > - Fixed a boundary condition that was triggering an assert. > - Added a simple-minded gtest for HdrSeq, which allows one to exercise the asserting code in a debug build. > - Tested with: `CONF=slowdebug make run-test TEST="gtest:BasicShenandoahNumberSeqTest"` Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: - Copyright dates etc. - include reorder to alphabetic; don't use/include std:: namespace. 
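To make the shape of such a test concrete, here is a minimal sketch of a boundary-value check for HdrSeq — the suite name is taken from the command quoted above, but the actual assertions in the PR may differ:

    #include "gc/shenandoah/shenandoahNumberSeq.hpp"
    #include "unittest.hpp"

    // Hypothetical regression test: a value that lands exactly on a
    // bucket boundary (1.0, per the assertion message in the bug title)
    // used to trip the sub-bucket index assert in debug builds.
    TEST(BasicShenandoahNumberSeqTest, add_boundary_value) {
      HdrSeq seq;
      seq.add(1.0);
      EXPECT_EQ(seq.num(), 1);
    }

Run under a debug build (e.g. via the slowdebug command above) so that asserts are compiled in.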
------------- Changes: - all: https://git.openjdk.org/jdk/pull/11524/files - new: https://git.openjdk.org/jdk/pull/11524/files/fb5cd5d0..a714630c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11524&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11524&range=00-01 Stats: 14 lines in 2 files changed: 2 ins; 2 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/11524.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11524/head:pull/11524 PR: https://git.openjdk.org/jdk/pull/11524 From ysr at openjdk.org Tue Dec 6 18:23:17 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 6 Dec 2022 18:23:17 GMT Subject: RFR: JDK-8298138: Shenandoah: HdrSeq asserts "sub-bucket index (512) overflow for value ( 1.00)" [v2] In-Reply-To: References: Message-ID: On Tue, 6 Dec 2022 11:34:16 GMT, Roman Kennke wrote: >> Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: >> >> - Copyright dates etc. >> - include reorder to alphabetic; don't use/include std:: namespace. > > src/hotspot/share/gc/shenandoah/shenandoahNumberSeq.hpp line 58: > >> 56: // It has very low memory requirements, and is thread-safe. When accuracy >> 57: // is not needed, it is preferred over HdrSeq. >> 58: class BinaryMagnitudeSeq : public CHeapObj { > > What is the relevance of this change? Also, if it *is* necessary, then it should be mtGC. Wanted a spec for allocation for correct accounting. Changed to mtGC; thanks! > test/hotspot/gtest/gc/shenandoah/test_shenandoahNumberSeq.cpp line 2: > >> 1: /* >> 2: * Copyright (c) 2016, 2017, Oracle and/or its affiliates. All rights reserved. > > The copyright should be 2022. Fixed; thanks for the catch! ------------- PR: https://git.openjdk.org/jdk/pull/11524 From kdnilsen at openjdk.org Tue Dec 6 18:42:58 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 6 Dec 2022 18:42:58 GMT Subject: RFR: Enforce max regions Message-ID: This commit enforces upper bounds on the number of ShenandoahHeapRegions affiliated with each generation. Prior to this change, enforcement of generation sizes was by usage alone. This allowed situations in which so many sparsely populated regions were affiliated with old-gen that there were insufficient FREE regions available to satisfy legitimate young-gen allocation requests. This was resulting in excessive TLAB allocation failures and degenerated collections. ------------- Commit messages: - Fix whitespace - Merge remote-tracking branch 'GitFarmBranch/enforce-max-old-regions' into enforce-max-regions - Remove instrumentation - Fixup region budgeting errors - Fix spelling error in assertion symbol - Enforce bounds on regions per generation Changes: https://git.openjdk.org/shenandoah/pull/179/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=179&range=00 Stats: 170 lines in 10 files changed: 103 ins; 7 del; 60 mod Patch: https://git.openjdk.org/shenandoah/pull/179.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/179/head:pull/179 PR: https://git.openjdk.org/shenandoah/pull/179 From wkemper at openjdk.org Tue Dec 6 21:35:11 2022 From: wkemper at openjdk.org (William Kemper) Date: Tue, 6 Dec 2022 21:35:11 GMT Subject: RFR: Enforce max regions In-Reply-To: References: Message-ID: <9me-r1yDWDfDp2MTKgO0QkdXwjJsMWcXAWg95oqHlS0=.6182ff46-8a88-413c-a455-05474923760d@github.com> On Tue, 6 Dec 2022 17:57:18 GMT, Kelvin Nilsen wrote: > This commit enforces upper bounds on the number of ShenandoahHeapRegions affiliated with each generation. 
Prior to this change, enforcement of generation sizes was by usage alone. This allowed situations in which so many sparsely populated regions were affiliated with old-gen that there were insufficient FREE regions available to satisfy legitimate young-gen allocation requests. This was resulting in excessive TLAB allocation failures and degenerated collections. I saw a pattern like this in a couple of places for young and old generations: size_t avail_young_regions = ((_heap->young_generation()->adjusted_capacity() - _heap->young_generation()->used_regions_size()) / ShenandoahHeapRegion::region_size_bytes()); We also have this method in `ShenandoahGeneration` called `free_unaffiliated_regions` which is similar, except that it uses soft max capacity, instead of adjusted capacity. Could these calculations be consolidated? ------------- PR: https://git.openjdk.org/shenandoah/pull/179 From wkemper at openjdk.org Tue Dec 6 22:03:18 2022 From: wkemper at openjdk.org (William Kemper) Date: Tue, 6 Dec 2022 22:03:18 GMT Subject: RFR: Enforce max regions In-Reply-To: References: Message-ID: On Tue, 6 Dec 2022 17:57:18 GMT, Kelvin Nilsen wrote: > This commit enforces upper bounds on the number of ShenandoahHeapRegions affiliated with each generation. Prior to this change, enforcement of generation sizes was by usage alone. This allowed situations in which so many sparsely populated regions were affiliated with old-gen that there were insufficient FREE regions available to satisfy legitimate young-gen allocation requests. This was resulting in excessive TLAB allocation failures and degenerated collections. src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 1031: > 1029: // affiliation to OLD_GENERATION and adjust the generation-use tallies. The remnant of memory > 1030: // in the last humongous region that is not spanned by obj is currently not used. > 1031: for (size_t i = index(); i < index_limit; i++) { Do we need to worry about races here? Could we have a separate evacuating thread take a new, old region to use for PLABs _after_ we checked old available regions? ------------- PR: https://git.openjdk.org/shenandoah/pull/179 From kdnilsen at openjdk.org Tue Dec 6 22:41:39 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 6 Dec 2022 22:41:39 GMT Subject: RFR: Enforce max regions In-Reply-To: References: Message-ID: On Tue, 6 Dec 2022 17:57:18 GMT, Kelvin Nilsen wrote: > This commit enforces upper bounds on the number of ShenandoahHeapRegions affiliated with each generation. Prior to this change, enforcement of generation sizes was by usage alone. This allowed situations in which so many sparsely populated regions were affiliated with old-gen that there were insufficient FREE regions available to satisfy legitimate young-gen allocation requests. This was resulting in excessive TLAB allocation failures and degenerated collections. I'll also introduce a new method adjusted_unaffiliated_regions() to consolidate the code for the "common pattern" you identified. ------------- PR: https://git.openjdk.org/shenandoah/pull/179 From kdnilsen at openjdk.org Tue Dec 6 22:41:42 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 6 Dec 2022 22:41:42 GMT Subject: RFR: Enforce max regions In-Reply-To: <_JDV1PL-12IKTMIyES68372b6-tj8Ps1ZzWPVohlkI0=.1d3d6b5b-c94c-4840-a8d4-1ceaae4dd60b@github.com> References: <_JDV1PL-12IKTMIyES68372b6-tj8Ps1ZzWPVohlkI0=.1d3d6b5b-c94c-4840-a8d4-1ceaae4dd60b@github.com> Message-ID: On Tue, 6 Dec 2022 21:43:17 GMT, William Kemper wrote: >> This commit enforces upper bounds on the number of ShenandoahHeapRegions affiliated with each generation.
Prior to this change, enforcement of generation sizes was by usage alone. This allowed situations in which so many sparsely populated regions were affiliated with old-gen that there were insufficient FREE regions available to satisfy legitimate young-gen allocation requests. This was resulting in excessive TLAB allocation failures and degenerated collections. > > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 1031: > >> 1029: // affiliation to OLD_GENERATION and adjust the generation-use tallies. The remnant of memory >> 1030: // in the last humongous region that is not spanned by obj is currently not used. >> 1031: for (size_t i = index(); i < index_limit; i++) { > > Do we need to worry about races here? Could we have a separate evacuating thread take a new, old region to use for PLABs _after_ we checked old available regions? Good catch. I need to grab the heap lock for part of this code. ------------- PR: https://git.openjdk.org/shenandoah/pull/179 From kdnilsen at openjdk.org Tue Dec 6 22:41:43 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 6 Dec 2022 22:41:43 GMT Subject: RFR: Enforce max regions In-Reply-To: <_JDV1PL-12IKTMIyES68372b6-tj8Ps1ZzWPVohlkI0=.1d3d6b5b-c94c-4840-a8d4-1ceaae4dd60b@github.com> References: <_JDV1PL-12IKTMIyES68372b6-tj8Ps1ZzWPVohlkI0=.1d3d6b5b-c94c-4840-a8d4-1ceaae4dd60b@github.com> Message-ID: On Tue, 6 Dec 2022 22:18:01 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 1031: >> >>> 1029: // affiliation to OLD_GENERATION and adjust the generation-use tallies. The remnant of memory >>> 1030: // in the last humongous region that is not spanned by obj is currently not used. >>> 1031: for (size_t i = index(); i < index_limit; i++) { >> >> Do we need to worry about races here? Could we have a separate evacuating thread take a new, old region to use for PLABs _after_ we checked old available regions? > > Good catch. I need to grab the heap lock for part of this code. Good catch. I need to grab the heap lock for part of this work. ------------- PR: https://git.openjdk.org/shenandoah/pull/179 From kdnilsen at openjdk.org Wed Dec 7 15:39:34 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Dec 2022 15:39:34 GMT Subject: RFR: Enforce max regions [v2] In-Reply-To: References: Message-ID: > This commit enforces upper bounds on the number of ShenandoahHeapRegions affiliated with each generation. Prior to this change, enforcement of generation sizes was by usage alone. This allowed situations in which so many sparsely populated regions were affiliated with old-gen that there were insufficient FREE regions available to satisfy legitimate young-gen allocation requests. This was resulting in excessive TLAB allocation failures and degenerated collections. 
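For reference, the consolidated helper mentioned in the exchange above might look roughly like the following sketch — the expression is derived from the avail_young_regions pattern William quoted; the exact signature and assertion in the patch may differ:

    // Sketch of the consolidation discussed above (not the final patch):
    // how many still-FREE regions this generation may claim before its
    // region count reaches the generation's (adjusted) capacity budget.
    size_t ShenandoahGeneration::adjusted_unaffiliated_regions() const {
      assert(adjusted_capacity() >= used_regions_size(), "should not underflow");
      return (adjusted_capacity() - used_regions_size()) / ShenandoahHeapRegion::region_size_bytes();
    }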
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Respond to reviewer feedback ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/179/files - new: https://git.openjdk.org/shenandoah/pull/179/files/f5b3e0db..28a53a86 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=179&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=179&range=00-01 Stats: 60 lines in 4 files changed: 22 ins; 10 del; 28 mod Patch: https://git.openjdk.org/shenandoah/pull/179.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/179/head:pull/179 PR: https://git.openjdk.org/shenandoah/pull/179 From kdnilsen at openjdk.org Wed Dec 7 16:14:18 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Dec 2022 16:14:18 GMT Subject: RFR: Enforce max regions [v3] In-Reply-To: References: Message-ID: > This commit enforces upper bounds on the number of ShenandoahHeapRegions affiliated with each generation. Prior to this change, enforcement of generation sizes was by usage alone. This allowed situations in which so many sparsely populated regions were affiliated with old-gen that there were insufficient FREE regions available to satisfy legitimate young-gen allocation requests. This was resulting in excessive TLAB allocation failures and degenerated collections. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix white space and add an assertion ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/179/files - new: https://git.openjdk.org/shenandoah/pull/179/files/28a53a86..4617913f Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=179&range=02 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=179&range=01-02 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/shenandoah/pull/179.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/179/head:pull/179 PR: https://git.openjdk.org/shenandoah/pull/179 From wkemper at openjdk.org Wed Dec 7 17:51:09 2022 From: wkemper at openjdk.org (William Kemper) Date: Wed, 7 Dec 2022 17:51:09 GMT Subject: RFR: Enforce max regions [v3] In-Reply-To: References: Message-ID: On Wed, 7 Dec 2022 16:14:18 GMT, Kelvin Nilsen wrote: >> This commit enforces upper bounds on the number of ShenandoahHeapRegions affiliated with each generation. Prior to this change, enforcement of generation sizes was by usage alone. This allowed situations in which so many sparsely populated regions were affiliated with old-gen that there were insufficient FREE regions available to satisfy legitimate young-gen allocation requests. This was resulting in excessive TLAB allocation failures and degenerated collections. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Fix white space and add an assertion Looks good - thank you! ------------- Marked as reviewed by wkemper (Committer). PR: https://git.openjdk.org/shenandoah/pull/179 From ysr at openjdk.org Wed Dec 7 18:59:08 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 7 Dec 2022 18:59:08 GMT Subject: RFR: JDK-8298138: Shenandoah: HdrSeq asserts "sub-bucket index (512) overflow for value ( 1.00)" [v2] In-Reply-To: References: Message-ID: On Tue, 6 Dec 2022 11:35:23 GMT, Roman Kennke wrote: >> Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: >> >> - Copyright dates etc. 
>> - include reorder to alphabetic; don't use/include std:: namespace. > Hi Ramki, > The change looks good. I have a few minor comments. @rkennke & @shipilev : could you folks please review and approve? I made the changes requested by Roman. Thanks! -- Ramki ------------- PR: https://git.openjdk.org/jdk/pull/11524 From ysr at openjdk.org Wed Dec 7 21:15:06 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 7 Dec 2022 21:15:06 GMT Subject: RFR: Enforce max regions [v3] In-Reply-To: References: Message-ID: On Wed, 7 Dec 2022 16:14:18 GMT, Kelvin Nilsen wrote: >> This commit enforces upper bounds on the number of ShenandoahHeapRegions affiliated with each generation. Prior to this change, enforcement of generation sizes was by usage alone. This allowed situations in which so many sparsely populated regions were affiliated with old-gen that there were insufficient FREE regions available to satisfy legitimate young-gen allocation requests. This was resulting in excessive TLAB allocation failures and degenerated collections. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Fix white space and add an assertion Overall looks great to the extent that I understood it. I left a few questions/comments in a few places, typically because I may lack the complete picture of the design and its rationale. Are there performance numbers to share with these changes? Those could be added either here in the pull request, or in the associated JBS ticket, which can be linked to the PR. Thanks! src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 103: > 101: switch (req.affiliation()) { > 102: case ShenandoahRegionAffiliation::OLD_GENERATION: > 103: if (_heap->old_generation()->adjusted_unaffiliated_regions() <= 0) { Re "<=" : I am guessing this is because adjusted unaffiliated_regions can go negative for periods of time while GC is in progress in a tight heap situation? Unfortunately, the signature of this is a size_t (unsigned), so a "<=" comparison with "0" should have been flagged by the compiler? Or does the compiler silently treat it as "==", without issuing a warning about the comparison? In any case, worth thinking about a related question in the definition of adjusted_unaffiliated_count(), and adjusting accordingly. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 126: > 124: for (size_t idx = _mutator_leftmost; idx <= _mutator_rightmost; idx++) { > 125: ShenandoahHeapRegion* r = _heap->get_region(idx); > 126: if (is_mutator_free(idx) && (allow_new_region || r->affiliation() != ShenandoahRegionAffiliation::FREE)) { Aside: Does Shenandoah have a concept of an allocation cursor per mutator in shared space independent of its TLAB? This is because firstly it might make first-fit searches more efficient, and secondly we might end up with spatial locality of allocations that are temporally in close proximity from the same mutator, which might help reduce fragmentation and potentially evacuation costs.
I suppose that's because regions freed by GC as a result of evacuation will be available to mutators, so the flipping to GC may be considered temporary in that sense. However, I suspect futile flipping may strand space in GC territory for no good reason. In any case, take my comments here with the right grain of salt because I am lacking the philosophical foundations of the need for this mutator & collector view dichotomy here. It would be good if in the `.hpp` file we expended a few sentences listing the rationale for that design choice; e.g. the allocate from left and allocate from right could still hold without necessarily having strict collector/mutator affiliations (as indicated by the `flip` above)? src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 198: > 196: > 197: if (heap->mode()->is_generational()) { > 198: // Since we probably have not yet reclaimed the most recently selected collection set, we have to defer I'd make the comment less tentative, and state: // Since the most recently selected collection set may not have been reclaimed at this stage, // we'll defer unadjust_avaliable() until after the full gc is completed. Question: is the adjusted available value (modulo the loaned size) used by full gc for any purpose, or is it to satisfy assertion checks / verification in some of the methods invoked during full gc work below? src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 924: > 922: size_t ShenandoahGeneration::decrement_affiliated_region_count() { > 923: _affiliated_region_count--; > 924: return _affiliated_region_count; Both these seem fine and probably more readable, but you'd save a line by returning the pre-{in,de}cremented result, e.g.: `return --_affiliated_region_count;` Would it be useful to assert that the region count is always non-zero? src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 986: > 984: } > 985: > 986: size_t ShenandoahGeneration::adjusted_unaffiliated_regions() { You can const this method too. src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 988: > 986: size_t ShenandoahGeneration::adjusted_unaffiliated_regions() { > 987: assert(adjusted_capacity() > used_regions_size(), "adjusted_unaffiliated_regions() cannot return negative"); > 988: return (adjusted_capacity() - used_regions_size()) / ShenandoahHeapRegion::region_size_bytes(); So, just to be clear, this is the number of unaffiliated regions that can _potentially_ be affiliated with this region. I assume it isn't the case that that number of unaffiliated free regions actually exist? If the answer is "no, that number of unaffiliated free regions do exist" would it be worth asserting that invariant here (or may be because this is all concurrent with allocations, no such guarantees will ever hold anyway, so it's futile to assert such invariants?). Indeed this question ties in with my comment further up where you do a "<=" comparison with 0 on the return value from here. src/hotspot/share/gc/shenandoah/shenandoahGeneration.hpp line 169: > 167: void scan_remembered_set(bool is_concurrent); > 168: > 169: size_t increment_affiliated_region_count(); Add a single line comment in the header file describing what a method returns: // Returns the affiliated region count following the operation. src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1491: > 1489: // doing this work during a safepoint. We cannot put humongous regions into the collection set because that > 1490: // triggers the load-reference barrier (LRB) to copy on reference fetch. 
> 1491: if (r->promote_humongous() == 0) { See my comment in ::promote_humongous(). I think that method could directly call the requisite expansion code under those circumstances, so this code can move there, with (as I noted there) promotion always succeeding for humongous object arrays at least, but in general for all humongous objects that are deemed eligible for promotion by other criteria (see my note in ::promote_humongous() on potentially treating humongous primitive type arrays differently from humongous object arrays). src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 1040: > 1038: // Then fall through to finish the promotion after releasing the heap lock. > 1039: } else { > 1040: return 0; This is interesting. Doing some thinking out loud here. I realize we want to very strictly enforce the generation sizes (indicated by the affiliation of regions to generations in a formal sense of generation sizes), but I do wonder if humongous regions should not enter into that calculus at all? In this case, the reason we would typically want to designate a humongous object as old (via promotion via this method) is because we don't want to have to spend effort scanning its contents. After all we never spend any time copying it when it survives a minor collection. Under the circumstances, it appears as if we would always want humongous objects that are primitive type arrays to stay in young (never be promoted, although I admit that it might make sense to not pay even the cost of marking it if it's been around forever per generational hypothesis), and if a humongous object has references (i.e. ages into the old generation) then it's affiliated with old and is "promoted" even if there aren't any available regions in old. In other words, humongous objects, because they are never copied, have affiliations that do not affect the promotion calculus in a strict manner. For these reasons, I'd think that humongous object promotions should be treated specially and old generation size should not be a criterion for determining generational affiliation of humongous regions. src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 343: > 341: }; > 342: > 343: class ShenandoahCalculateRegionStatsClosure : public ShenandoahHeapRegionClosure { A one-line documentation spec here would be useful: // A closure used to accumulate the net used, committed, and garbage bytes, and number of regions; // typically associated with a generation in generational mode. ------------- Marked as reviewed by ysr (Author). PR: https://git.openjdk.org/shenandoah/pull/179 From kdnilsen at openjdk.org Wed Dec 7 21:20:00 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Dec 2022 21:20:00 GMT Subject: RFR: Enforce max regions [v3] In-Reply-To: References: Message-ID: On Wed, 7 Dec 2022 20:25:23 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix white space and add an assertion > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 103: > >> 101: switch (req.affiliation()) { >> 102: case ShenandoahRegionAffiliation::OLD_GENERATION: >> 103: if (_heap->old_generation()->adjusted_unaffiliated_regions() <= 0) { > > Re "<=" : I am guessing this is because adjusted unaffiliated_regions can go negative for periods of time while GC is in progress in a tight heap situation?
> > Unfortunately, the signature of this is a size_t (unsigned), so a "<=" comparison with "0" should have been flagged by the compiler? Or does the compiler silently treat it as "==", without issuing a warning about the comparison? In any case, worth thinking about a related question in the definition of adjusted_unaffiliated_count(), and adjusting accordingly. This is really a test for ==, and the compiler doesn't complain because the test is meaningful as written (though perhaps confusing as written). OTOH, writing it this way makes the code more "future proof" in case someone changes the return type to signed. ------------- PR: https://git.openjdk.org/shenandoah/pull/179 From kdnilsen at openjdk.org Wed Dec 7 21:23:06 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Dec 2022 21:23:06 GMT Subject: RFR: Enforce max regions [v3] In-Reply-To: References: Message-ID: On Wed, 7 Dec 2022 18:36:39 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix white space and add an assertion > src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 986: > >> 984: } >> 985: >> 986: size_t ShenandoahGeneration::adjusted_unaffiliated_regions() { > > You can const this method too. Thanks. I'll change this. > src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 988: > >> 986: size_t ShenandoahGeneration::adjusted_unaffiliated_regions() { >> 987: assert(adjusted_capacity() > used_regions_size(), "adjusted_unaffiliated_regions() cannot return negative"); >> 988: return (adjusted_capacity() - used_regions_size()) / ShenandoahHeapRegion::region_size_bytes(); > > So, just to be clear, this is the number of unaffiliated regions that can _potentially_ be affiliated with this generation. I assume it isn't the case that that number of unaffiliated free regions actually exist? > > If the answer is "no, that number of unaffiliated free regions do exist" would it be worth asserting that invariant here (or maybe, because this is all concurrent with allocations, no such guarantees will ever hold anyway, so it's futile to assert such invariants?). > > Indeed this question ties in with my comment further up where you do a "<=" comparison with 0 on the return value from here. Yes. That is accurate. This is the number of regions that are currently affiliated with FREE, which are eligible to be affiliated as part of this generation if we have reason to do so. If this value is zero, then the entire adjusted_capacity is consumed by the regions already affiliated with this generation, and we are not allowed to move any more FREE regions into this generation. ------------- PR: https://git.openjdk.org/shenandoah/pull/179 From kdnilsen at openjdk.org Wed Dec 7 21:30:13 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Dec 2022 21:30:13 GMT Subject: RFR: Enforce max regions [v3] In-Reply-To: References: Message-ID: On Wed, 7 Dec 2022 18:19:50 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix white space and add an assertion > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 1040: > >> 1038: // Then fall through to finish the promotion after releasing the heap lock. >> 1039: } else { >> 1040: return 0; > This is interesting. Doing some thinking out loud here.
> > I realize we want to very strictly enforce the generation sizes (indicated by the affiliation of regions to generations in a formal sense of generation sizes), but I do wonder if humongous regions should not enter into that calculus at all? In this case, the reason we would typically want to designate a humongous object as old (via promotion via this method) is because we don't want to have to spend effort scanning its contents. After all we never spend any time copying it when it survives a minor collection. Under the circumstances, it appears as if we would always want humongous objects that are primitive type arrays to stay in young (never be promoted, although I admit that it might make sense to not pay even the cost of marking it if it's been around forever per generational hypothesis), and if a humongous object that has references (i.e. ages into the old generation) then it's affiliated with old and is "promoted" even if there aren't any available regions in old. In other wor ds, humongous objects, because they are never copied, have affiliations that do not affect the promotion calculus in a strict manner. > > For these reasons, I'd think that humongous object promotions should be treated specially and old generation size should not be a criterion for determining generational affiliation of humongous regions. I'm going to add a TODO comment here, so that we can think about changing this behavior. I totally agree with your rationale. Problem is that we have "assumptions" and "invariants" scattered throughout the existing implementation that need to be carefully reconsidered if we allow the rules to bend. (For example: there are lots of size_t subtractions that may overflow to huge unmeaningful numbers, and if we run with ShenandoahVerify enabled, it will complain if the size of the generation exceeds it capacity. ------------- PR: https://git.openjdk.org/shenandoah/pull/179 From kdnilsen at openjdk.org Wed Dec 7 21:45:01 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Dec 2022 21:45:01 GMT Subject: RFR: Enforce max regions [v3] In-Reply-To: References: Message-ID: On Wed, 7 Dec 2022 20:50:45 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix white space and add an assertion > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 126: > >> 124: for (size_t idx = _mutator_leftmost; idx <= _mutator_rightmost; idx++) { >> 125: ShenandoahHeapRegion* r = _heap->get_region(idx); >> 126: if (is_mutator_free(idx) && (allow_new_region || r->affiliation() != ShenandoahRegionAffiliation::FREE)) { > > Aside: Does Shanandoah have a concept of an allocation cursor per mutator in shared space independent of its TLAB? This is because firstly it might make first fit searches more efficient, and secondly we might end up with spatial locality of allocations that are temporally in close proximity from the same mutator, which might help reduce fragmentation and potentially evacuation costs. > > One might consider resetting the cursors following each minor gc. There is no concept of an allocation cursor per mutator. In tracing some "anomalous" behaviors, I observed that the search for a heap region with memory available to be allocated can be very cumbersome. 
As young memory becomes more scarce, the effort consumed by each thread trying to allocate (under lock by the way) becomes more and more costly, having to sequentially examine large numbers of regions (possibly more than a thousand regions) to find the first region with sufficient space to satisfy the allocation request. We could definitely make some improvements here, especially because the allocating threads holds the lock throughout this traversal. Another possible improvement is to not require the global heap lock while searching for a region to serve tlab allocation request. > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 171: > >> 169: ShenandoahHeapRegion* r = _heap->get_region(idx); >> 170: if (can_allocate_from(r)) { >> 171: flip_to_gc(r); > > Does the flipping have to strictly precede the allocation attempt? Otherwise the flip is futile and we steal space from mutators but to no advantage. > > I also notice the asymmetry in the existence of `flip_to_gc()` but no corresponding `flip_to_mutator()`. I suppose that's because regions freed by GC as a result of evacuation will be available to mutators, so the flipping to GC may be considered temporary in that sense. However, I suspect futile flipping may strand space in GC territory for no good reason. > > In any case, take my comments here with the right grain of salt because I am lacking the philosophical foundations of the need for this mutator & collector view dichotomy here. It would be good if in the `.hpp` file we expended a few sentences listing the rationale for that design choice; e.g. the allocate from left and allocate from right could still hold without necessarily having strict collector/mutator affiliations (as indicated by the `flip` above)? There's no flip to mutator because we do not allow a mutator allocation request to take memory that had been set aside for use by the collector (for evacuations). If a mutator alloc "fails", we can stall that single mutating thread. If a GC evacuation fails, we have to force all threads into a safepoint so that we can perform a FULL GC. This is the reason we don't allow mutators to flip_to_mutator(). ------------- PR: https://git.openjdk.org/shenandoah/pull/179 From kdnilsen at openjdk.org Wed Dec 7 22:00:06 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Dec 2022 22:00:06 GMT Subject: RFR: Enforce max regions [v3] In-Reply-To: References: Message-ID: On Wed, 7 Dec 2022 21:27:10 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 1040: >> >>> 1038: // Then fall through to finish the promotion after releasing the heap lock. >>> 1039: } else { >>> 1040: return 0; >> >> This is interesting. Doing some thinking out loud here. >> >> I realize we want to very strictly enforce the generation sizes (indicated by the affiliation of regions to generations in a formal sense of generation sizes), but I do wonder if humongous regions should not enter into that calculus at all? In this case, the reason we would typically want to designate a humongous object as old (via promotion via this method) is because we don't want to have to spend effort scanning its contents. After all we never spend any time copying it when it survives a minor collection. 
Under the circumstances, it appears as if we would always want humongous objects that are primitive type arrays to stay in young (never be promoted, although I admit that it might make sense to not pay even the cost of marking it if it's been around forever per generational hypothesis), and if a humongous object that has references (i.e. ages into the old generation) then it's affiliated with old and is "promoted" even if there aren't any available regions in old. In other wo rds, humongous objects, because they are never copied, have affiliations that do not affect the promotion calculus in a strict manner. >> >> For these reasons, I'd think that humongous object promotions should be treated specially and old generation size should not be a criterion for determining generational affiliation of humongous regions. > > I'm going to add a TODO comment here, so that we can think about changing this behavior. I totally agree with your rationale. Problem is that we have "assumptions" and "invariants" scattered throughout the existing implementation that need to be carefully reconsidered if we allow the rules to bend. (For example: there are lots of size_t subtractions that may overflow to huge unmeaningful numbers, and if we run with ShenandoahVerify enabled, it will complain if the size of the generation exceeds it capacity. I also like your idea about just keeping primitive humongous objects in YOUNG. That would allow their memory to be reclaimed much more quickly if and when they do become garbage. OTOH, it may create an "unexpected surprise" to anyone who is carefully specifying the sizes of young-gen and old-gen. Once we have auto-sizing of old- and young- fully working, this would be a good tradeoff to make. ------------- PR: https://git.openjdk.org/shenandoah/pull/179 From kdnilsen at openjdk.org Wed Dec 7 22:18:38 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Dec 2022 22:18:38 GMT Subject: RFR: Enforce max regions [v3] In-Reply-To: References: Message-ID: On Wed, 7 Dec 2022 20:17:07 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix white space and add an assertion > > src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 198: > >> 196: >> 197: if (heap->mode()->is_generational()) { >> 198: // Since we probably have not yet reclaimed the most recently selected collection set, we have to defer > > I'd make the comment less tentative, and state: > > // Since the most recently selected collection set may not have been reclaimed at this stage, > // we'll defer unadjust_avaliable() until after the full gc is completed. > > Question: is the adjusted available value (modulo the loaned size) used by full gc for any purpose, or is it to satisfy assertion checks / verification in some of the methods invoked during full gc work below? It's not used by full GC, but the Shenandoah verifier which runs at the start of full gc and then again at the end of full gc enforces compliance with the adjusted budgets. The verifier didn't used to care. But I made it care with this commit, and then I had to change where we do the unadjusting... 
> src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 924: > >> 922: size_t ShenandoahGeneration::decrement_affiliated_region_count() { >> 923: _affiliated_region_count--; >> 924: return _affiliated_region_count; > > Both these seem fine and probably more readable, but you'd save a line by returning the pre-{in,de}cremented result, e.g.: > > `return --_affiliated_region_count;` > > Would it be useful to assert that the region count is always non-zero? Actually, affiliated region count can be zero. Often starts out that way for old-gen. > src/hotspot/share/gc/shenandoah/shenandoahGeneration.hpp line 169: > >> 167: void scan_remembered_set(bool is_concurrent); >> 168: >> 169: size_t increment_affiliated_region_count(); > > Add a single line comment in the header file describing what a method returns: > > // Returns the affiliated region count following the operation. Thanks. > src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 343: > >> 341: }; >> 342: >> 343: class ShenandoahCalculateRegionStatsClosure : public ShenandoahHeapRegionClosure { > > A one-line documentation spec here would be useful: > > // A closure used to accumulate the net used, committed, and garbage bytes, and number of regions; > // typically associated with a generation in generational mode. Thanks. ------------- PR: https://git.openjdk.org/shenandoah/pull/179 From kdnilsen at openjdk.org Wed Dec 7 22:18:39 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Dec 2022 22:18:39 GMT Subject: RFR: Enforce max regions [v3] In-Reply-To: References: Message-ID: On Wed, 7 Dec 2022 22:11:42 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 198: >> >>> 196: >>> 197: if (heap->mode()->is_generational()) { >>> 198: // Since we probably have not yet reclaimed the most recently selected collection set, we have to defer >> >> I'd make the comment less tentative, and state: >> >> // Since the most recently selected collection set may not have been reclaimed at this stage, >> // we'll defer unadjust_avaliable() until after the full gc is completed. >> >> Question: is the adjusted available value (modulo the loaned size) used by full gc for any purpose, or is it to satisfy assertion checks / verification in some of the methods invoked during full gc work below? > > It's not used by full GC, but the Shenandoah verifier which runs at the start of full gc and then again at the end of full gc enforces compliance with the adjusted budgets. > > The verifier didn't used to care. But I made it care with this commit, and then I had to change where we do the unadjusting... I'll fix the comment. ------------- PR: https://git.openjdk.org/shenandoah/pull/179 From kdnilsen at openjdk.org Wed Dec 7 22:29:39 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Dec 2022 22:29:39 GMT Subject: RFR: Enforce max regions [v4] In-Reply-To: References: Message-ID: > This commit enforces upper bounds on the number of ShenandoahHeapRegions affiliated with each generation. Prior to this change, enforcement of generation sizes was by usage alone. This allowed situations in which so many sparsely populated regions were affiliated with old-gen that there were insufficient FREE regions available to satisfy legitimate young-gen allocation requests. This was resulting in excessive TLAB allocation failures and degenerated collections. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Comments in response to reviewer feedback ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/179/files - new: https://git.openjdk.org/shenandoah/pull/179/files/4617913f..8e23c321 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=179&range=03 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=179&range=02-03 Stats: 28 lines in 6 files changed: 24 ins; 0 del; 4 mod Patch: https://git.openjdk.org/shenandoah/pull/179.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/179/head:pull/179 PR: https://git.openjdk.org/shenandoah/pull/179 From kdnilsen at openjdk.org Wed Dec 7 22:37:55 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Dec 2022 22:37:55 GMT Subject: RFR: Enforce max regions [v3] In-Reply-To: References: Message-ID: On Wed, 7 Dec 2022 18:22:40 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix white space and add an assertion > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1491: > >> 1489: // doing this work during a safepoint. We cannot put humongous regions into the collection set because that >> 1490: // triggers the load-reference barrier (LRB) to copy on reference fetch. >> 1491: if (r->promote_humongous() == 0) { > > See my comment in ::promote_humongous(). > > I think that method could directly call the requisite expansion code under those circumstances, so this code can move there, with (as I noted there) promotion always succeeding for humongous object arrays at least, but in general for all humongous objects that are deemed eligible for promotion by other criteria (see my note in ::promote_humongous() on potentially treating humongous primitive type arrays differently from humongous object arrays). I like your ideas, but I'll suggest we tackle this in a future distinct pr. ------------- PR: https://git.openjdk.org/shenandoah/pull/179 From kdnilsen at openjdk.org Wed Dec 7 22:55:51 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 7 Dec 2022 22:55:51 GMT Subject: Integrated: Enforce max regions In-Reply-To: References: Message-ID: <2pK2RGabJYCsXrLQnb7eTQT-NZ0itfK9QydBix_lY0M=.cf35f265-6df3-4c9e-ba20-7c4f1c0ff112@github.com> On Tue, 6 Dec 2022 17:57:18 GMT, Kelvin Nilsen wrote: > This commit enforces upper bounds on the number of ShenandoahHeapRegions affiliated with each generation. Prior to this change, enforcement of generation sizes was by usage alone. This allowed situations in which so many sparsely populated regions were affiliated with old-gen that there were insufficient FREE regions available to satisfy legitimate young-gen allocation requests. This was resulting in excessive TLAB allocation failures and degenerated collections. This pull request has now been integrated. Changeset: 25469283 Author: Kelvin Nilsen URL: https://git.openjdk.org/shenandoah/commit/25469283fbe14e85adeaf0e3a21d40faea5f7288 Stats: 202 lines in 10 files changed: 150 ins; 17 del; 35 mod Enforce max regions Reviewed-by: wkemper, ysr ------------- PR: https://git.openjdk.org/shenandoah/pull/179 From ysr at openjdk.org Thu Dec 8 00:59:31 2022 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Thu, 8 Dec 2022 00:59:31 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan Message-ID: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> **Note:** This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.) (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extermem mentioned in the ticket. (2) Make available the instrumentation only in non-product (optimized) mode until better performance is achieved. (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above. The fix to ShenandoahNumberSeq will be separated out and made into a separate pull request on mainline. **Summary:** The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect as well as how frequently we log the data once we have gathered some experience on how we use this. **Details of files changed:** 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled. 2. shenandoahHeap.cpp: minor retsructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code. 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above). 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumuative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments. 7. shenandoahScanRemembered.inline.hpp: As in 6, diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. 
A couple of variables were renamed for clarity, as well as ti update local variables rather than method arguments. The large diffs at (old) line 589 onwards is the git-diff'ism to do with indentation change. Delete some unused methods. 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default. **Format of stats produced and how to interpret them: (sample)** [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo: [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ] [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ] [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo: [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ] [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ] [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] ... The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. The metrics are: - dirty_run: the length of an uninterrupted run of dirty cards, interpretedas a percentage of a chunk of work assignment (cluster) processed by a thread - clean_run: as above, but the length of an uninterrupted run of clean cards - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk - max_dirty_run & max_clean_run: Similarly for the maximum of each. - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned - dirty_scans, clean_scans: numbers of objects scanned by the closure - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk The data above indicates that at least 75% of the chunks have no alternations at all, and cards are almost always mostly clean for this specific benchmark config (extremem). 
Comparing worker stats from worker 0 and worker 9 indicates very little difference between their statistics, as one might typically expect for well-balanced RS scans. **Questions:** 1. Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics as well, in addition to the quartiles? 2. The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information? 3. Any suggestions for a more easily consumable format? 4. I welcome any other feedback on the pull request. ------------- Commit messages: - Merge branch 'master' into JVM-1264 - Card stats only in non-product mode (until impact of stats collection is - Merge branch 'master' into JVM-1264 - Merge branch 'master' into JVM-1264 - jcheck whitespace fixes. - Fix card_stats() so it doesn't crash when card stats aren't enabled. - Fix comment. - Don't allocate stats arrays if not enabled. Should we decide we want - Disable card stats printing when disabled - Remove compile time preprocesor option. - ... and 25 more: https://git.openjdk.org/shenandoah/compare/25469283...f5669577 Changes: https://git.openjdk.org/shenandoah/pull/176/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8297796 Stats: 738 lines in 8 files changed: 369 ins; 220 del; 149 mod Patch: https://git.openjdk.org/shenandoah/pull/176.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/176/head:pull/176 PR: https://git.openjdk.org/shenandoah/pull/176 From ysr at openjdk.org Thu Dec 8 00:59:33 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 8 Dec 2022 00:59:33 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan In-Reply-To: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: On Thu, 1 Dec 2022 19:55:45 GMT, Y. Srinivas Ramakrishna wrote: > **Note:** > This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.) > > (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extermem mentioned in the ticket. > > (2) Make available the instrumentation only in non-product (optimized) mode until better performance is achieved. > > (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above. > > The fix to ShenandoahNumberSeq will be separated out and made into a separate pull request on mainline. > > **Summary:** > The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. 
Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect as well as how frequently we log the data once we have gathered some experience on how we use this. > > **Details of files changed:** > > 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled. > 2. shenandoahHeap.cpp: minor retsructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats > 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code. > 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq > 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above). > 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumuative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments. > 7. shenandoahScanRemembered.inline.hpp: As in 6, diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as ti update local variables rather than method arguments. The large diffs at (old) line 589 onwards is the git-diff'ism to do with indentation change. Delete some unused methods. > 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default. 
> > **Format of stats produced and how to interpret them: (sample)** > > > [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning > [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo: > [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] > [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] > [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo: > [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] > [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] > ... > > > The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. The metrics are: > > - dirty_run: the length of an uninterrupted run of dirty cards, interpretedas a percentage of a chunk of work assignment (cluster) processed by a thread > - clean_run: as above, but the length of an uninterrupted run of clean cards > - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk > - max_dirty_run & max_clean_run: Similarly for the maximum of each. > - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned > - dirty_scans, clean_scans: numbers of objects scanned by the closure > - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk > > The data above indicates that at least 75% of the chunks have no alternations at all, > and cards are almost always mostly clean for this specific benchmark config (extremem). > > Comparing worker stats from worker 0 and worker 9 indicates very little difference between > their statistics, as one might typically expect for well-balanced RS scans. > > **Questions:** > > 1. Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics as well, in addition to the quartiles? > 2. 
The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information? > 3. Any suggestions for a more easily consumable format? > 4. I welcome any other feedback on the pull request. Pulled code into non-product mode. Will verify that changes are performance-neutral in product mode. Built & tested slowdebug, fastdebug, optimized, and product builds, and verified that flag & code could be enabled only in non-product builds, and was off by default in all non-debug modes (including optimized where it was available, but disabled by default). Please see the draft pull request message above for further details. The PR is now open for review; thanks for your reviews/comments/feedback! src/hotspot/share/gc/shenandoah/shenandoahNumberSeq.cpp line 59: > 57: if (v > 0) { > 58: mag = 0; > 59: while (v >= 1) { You can safely ignore the changes in this file and the next. They are part of a separate PR to tip, and will eventually get reconciled when tip is merged into the project repo. ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From ysr at openjdk.org Thu Dec 8 01:11:43 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 8 Dec 2022 01:11:43 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v2] In-Reply-To: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: <9E6NmFY5877JXtI7RKpqa1r2nXDaEJ7xxLG9q0hEP6U=.03c76ffe-bac9-4401-9091-4ee19d6a394e@github.com> > **Note:** > This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.) > > (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extermem mentioned in the ticket. > > (2) Make available the instrumentation only in non-product (optimized) mode until better performance is achieved. > > (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above. > > The fix to ShenandoahNumberSeq will be separated out and made into a separate pull request on mainline. > > **Summary:** > The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect as well as how frequently we log the data once we have gathered some experience on how we use this. > > **Details of files changed:** > > 1. 
shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled. > 2. shenandoahHeap.cpp: minor retsructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats > 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code. > 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq > 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above). > 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumuative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments. > 7. shenandoahScanRemembered.inline.hpp: As in 6, diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as ti update local variables rather than method arguments. The large diffs at (old) line 589 onwards is the git-diff'ism to do with indentation change. Delete some unused methods. > 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default. 
> > **Format of stats produced and how to interpret them: (sample)** > > > [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning > [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo: > [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] > [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] > [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo: > [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] > [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] > ... > > > The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. The metrics are: > > - dirty_run: the length of an uninterrupted run of dirty cards, interpretedas a percentage of a chunk of work assignment (cluster) processed by a thread > - clean_run: as above, but the length of an uninterrupted run of clean cards > - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk > - max_dirty_run & max_clean_run: Similarly for the maximum of each. > - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned > - dirty_scans, clean_scans: numbers of objects scanned by the closure > - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk > > The data above indicates that at least 75% of the chunks have no alternations at all, > and cards are almost always mostly clean for this specific benchmark config (extremem). > > Comparing worker stats from worker 0 and worker 9 indicates very little difference between > their statistics, as one might typically expect for well-balanced RS scans. > > **Questions:** > > 1. Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics as well, in addition to the quartiles? > 2. 
The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information? > 3. Any suggestions for a more easily consumable format? > 4. I welcome any other feedback on the pull request. Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: Moved some more methods into non-product mode. ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/176/files - new: https://git.openjdk.org/shenandoah/pull/176/files/f5669577..c0a4a9d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=00-01 Stats: 4 lines in 2 files changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/shenandoah/pull/176.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/176/head:pull/176 PR: https://git.openjdk.org/shenandoah/pull/176 From ysr at openjdk.org Thu Dec 8 09:13:40 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 8 Dec 2022 09:13:40 GMT Subject: RFR: JDK-8298138: Shenandoah: HdrSeq asserts "sub-bucket index (512) overflow for value ( 1.00)" [v3] In-Reply-To: References: Message-ID: > JBS link: https://bugs.openjdk.org/browse/JDK-8298138 > - Fixed a boundary condition that was triggering an assert. > - Added a simple-minded gtest for HdrSeq, which allows one to exercise the asserting code in a debug build. > - Tested with: `CONF=slowdebug make run-test TEST="gtest:BasicShenandoahNumberSeqTest"` Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into shen_numberseq - - Copyright dates etc. - include reorder to alphabetic; don't use/include std:: namespace. - Merge branch 'master' into shen_numberseq - A simple-minded test of HdrSeq which also exercises the problematic code. - Fix a boundary condition issue w/HdrSeq ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11524/files - new: https://git.openjdk.org/jdk/pull/11524/files/a714630c..a0edcbda Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11524&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11524&range=01-02 Stats: 14012 lines in 340 files changed: 9690 ins; 3152 del; 1170 mod Patch: https://git.openjdk.org/jdk/pull/11524.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11524/head:pull/11524 PR: https://git.openjdk.org/jdk/pull/11524 From shade at openjdk.org Thu Dec 8 10:07:26 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 Dec 2022 10:07:26 GMT Subject: RFR: JDK-8298138: Shenandoah: HdrSeq asserts "sub-bucket index (512) overflow for value ( 1.00)" [v3] In-Reply-To: References: Message-ID: On Thu, 8 Dec 2022 09:13:40 GMT, Y. Srinivas Ramakrishna wrote: >> JBS link: https://bugs.openjdk.org/browse/JDK-8298138 >> - Fixed a boundary condition that was triggering an assert. >> - Added a simple-minded gtest for HdrSeq, which allows one to exercise the asserting code in a debug build. >> - Tested with: `CONF=slowdebug make run-test TEST="gtest:BasicShenandoahNumberSeqTest"` > > Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into shen_numberseq > - - Copyright dates etc. > - include reorder to alphabetic; don't use/include std:: namespace. > - Merge branch 'master' into shen_numberseq > - A simple-minded test of HdrSeq which also exercises the problematic > code. > - Fix a boundary condition issue w/HdrSeq So the failure is: we want the bucket to cover `[a; a+n)`, but current code makes it cover `[a; a+n]`, which means the right-most value would overflow its assignment for sub-bucket? If so, the fix looks good. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.org/jdk/pull/11524 From rkennke at openjdk.org Thu Dec 8 14:15:58 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 8 Dec 2022 14:15:58 GMT Subject: RFR: JDK-8298138: Shenandoah: HdrSeq asserts "sub-bucket index (512) overflow for value ( 1.00)" [v3] In-Reply-To: References: Message-ID: <49r00-alghwN1jXs4qCkuePRKAl2ZKEov5G0bWH7aQQ=.ec3b80c4-4b70-4251-a3f5-7492578c111a@github.com> On Thu, 8 Dec 2022 09:13:40 GMT, Y. Srinivas Ramakrishna wrote: >> JBS link: https://bugs.openjdk.org/browse/JDK-8298138 >> - Fixed a boundary condition that was triggering an assert. >> - Added a simple-minded gtest for HdrSeq, which allows one to exercise the asserting code in a debug build. >> - Tested with: `CONF=slowdebug make run-test TEST="gtest:BasicShenandoahNumberSeqTest"` > > Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into shen_numberseq > - - Copyright dates etc. > - include reorder to alphabetic; don't use/include std:: namespace. > - Merge branch 'master' into shen_numberseq > - A simple-minded test of HdrSeq which also exercises the problematic > code. > - Fix a boundary condition issue w/HdrSeq Looks good, thank you! ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.org/jdk/pull/11524 From rkennke at openjdk.org Thu Dec 8 14:22:55 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 8 Dec 2022 14:22:55 GMT Subject: RFR: Generation resizing [v3] In-Reply-To: <8Fse7IxO14Uc0eJJoLMmGXSo8XYD9Qb144mCyrMX3-g=.1d5d7258-f06d-4a03-bf78-8102f22ada7d@github.com> References: <8Fse7IxO14Uc0eJJoLMmGXSo8XYD9Qb144mCyrMX3-g=.1d5d7258-f06d-4a03-bf78-8102f22ada7d@github.com> Message-ID: On Tue, 6 Dec 2022 17:26:08 GMT, William Kemper wrote: >> These changes have the generational mode track the minimum mutator utilization (percentage of process time used by mutators). When it falls below a configuration percentage (GCTimeRatio), a heuristic will transfer memory capacity to whatever generation has been using more CPU time. The assumption here is that by increasing capacity, we will decrease the collection frequency and improve the MMU. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Remove vestigial lock, do not enroll periodic task while holding threads_lock Looks good to me. Thank you! EDIT: well actually, it is still complaining about conflict, and you should also make it ready for review ;-) ------------- Marked as reviewed by rkennke (Lead). 
PR: https://git.openjdk.org/shenandoah/pull/177 From kdnilsen at openjdk.org Thu Dec 8 14:45:17 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 8 Dec 2022 14:45:17 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v2] In-Reply-To: <9E6NmFY5877JXtI7RKpqa1r2nXDaEJ7xxLG9q0hEP6U=.03c76ffe-bac9-4401-9091-4ee19d6a394e@github.com> References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> <9E6NmFY5877JXtI7RKpqa1r2nXDaEJ7xxLG9q0hEP6U=.03c76ffe-bac9-4401-9091-4ee19d6a394e@github.com> Message-ID: On Thu, 8 Dec 2022 01:11:43 GMT, Y. Srinivas Ramakrishna wrote: >> **Note:** >> This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.) >> >> (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extermem mentioned in the ticket. >> >> (2) Make available the instrumentation only in non-product (optimized) mode until better performance is achieved. >> >> (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above. >> >> The fix to ShenandoahNumberSeq will be separated out and made into a separate pull request on mainline. >> >> **Summary:** >> The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect as well as how frequently we log the data once we have gathered some experience on how we use this. >> >> **Details of files changed:** >> >> 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled. >> 2. shenandoahHeap.cpp: minor retsructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats >> 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code. >> 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq >> 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above). >> 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. 
New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumuative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments. >> 7. shenandoahScanRemembered.inline.hpp: As in 6, diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as ti update local variables rather than method arguments. The large diffs at (old) line 589 onwards is the git-diff'ism to do with indentation change. Delete some unused methods. >> 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default. >> >> **Format of stats produced and how to interpret them: (sample)** >> >> >> [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning >> [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo: >> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] >> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] >> [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo: >> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] >> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] >> ... >> >> >> The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. 
The metrics are: >> >> - dirty_run: the length of an uninterrupted run of dirty cards, interpretedas a percentage of a chunk of work assignment (cluster) processed by a thread >> - clean_run: as above, but the length of an uninterrupted run of clean cards >> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk >> - max_dirty_run & max_clean_run: Similarly for the maximum of each. >> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned >> - dirty_scans, clean_scans: numbers of objects scanned by the closure >> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk >> >> The data above indicates that at least 75% of the chunks have no alternations at all, >> and cards are almost always mostly clean for this specific benchmark config (extremem). >> >> Comparing worker stats from worker 0 and worker 9 indicates very little difference between >> their statistics, as one might typically expect for well-balanced RS scans. >> >> **Questions:** >> >> 1. Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics as well, in addition to the quartiles? >> 2. The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information? >> 3. Any suggestions for a more easily consumable format? >> 4. I welcome any other feedback on the pull request. > > Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: > > Moved some more methods into non-product mode. Thanks for sharing this code. A few overview comments: 1. Yes, I think it would be useful to see the data collected for each mark scan and each update-reference scan independently. Sometimes, abnormal behavior of the application causes spikes in performance, and it would be nice to understand the degree to which remembered set scanning is part of this spike. 2. It is also useful to have a cumulative summary of all costs at the end of a run, probably still separating out the mark scans from the update-refs scans. 3. Is it possible to eliminate the overhead entirely of this instrumentation by compiling it out for release builds? ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From wkemper at openjdk.org Thu Dec 8 21:46:54 2022 From: wkemper at openjdk.org (William Kemper) Date: Thu, 8 Dec 2022 21:46:54 GMT Subject: RFR: Generation resizing [v4] In-Reply-To: References: Message-ID: > These changes have the generational mode track the minimum mutator utilization (percentage of process time used by mutators). When it falls below a configuration percentage (GCTimeRatio), a heuristic will transfer memory capacity to whatever generation has been using more CPU time. The assumption here is that by increasing capacity, we will decrease the collection frequency and improve the MMU. William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 17 commits:

 - Merge branch 'shenandoah-master' into mmu-instrumentation
 - Remove vestigial lock, do not enroll periodic task while holding threads_lock
 - Remove unnecessary logging, clean up imports
 - Merge from shenandoah/master
 - Document the class responsible for adjusting generation sizes
 - Revert unnecessary change
 - Remove unused time between cycle tracking
 - Remove vestigial mmu tracker instance
 - Clamp adjustments to min/max when increment is too large
 - Adjust generation sizes from safepoint
 - ... and 7 more: https://git.openjdk.org/shenandoah/compare/25469283...50896e31

-------------

Changes: https://git.openjdk.org/shenandoah/pull/177/files
Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=177&range=03
Stats: 448 lines in 22 files changed: 418 ins; 18 del; 12 mod
Patch: https://git.openjdk.org/shenandoah/pull/177.diff
Fetch: git fetch https://git.openjdk.org/shenandoah pull/177/head:pull/177

PR: https://git.openjdk.org/shenandoah/pull/177

From ysr at openjdk.org Thu Dec 8 21:47:38 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Thu, 8 Dec 2022 21:47:38 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v2]
In-Reply-To: 
References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
 <9E6NmFY5877JXtI7RKpqa1r2nXDaEJ7xxLG9q0hEP6U=.03c76ffe-bac9-4401-9091-4ee19d6a394e@github.com>
Message-ID: 

On Thu, 8 Dec 2022 14:42:53 GMT, Kelvin Nilsen wrote:

> Thanks for sharing this code. A few overview comments:
>
> 1. Yes, I think it would be useful to see the data collected for each mark scan and each update-reference scan independently. Sometimes, abnormal behavior of the application causes spikes in performance, and it would be nice to understand the degree to which remembered set scanning is part of this spike.
> 2. It is also useful to have a cumulative summary of all costs at the end of a run, probably still separating out the mark scans from the update-refs scans.

I'll make those changes.

> 3. Is it possible to eliminate the overhead entirely of this instrumentation by compiling it out for release builds?

It essentially is compiled out of product/release builds, and compiled only into optimized and *debug builds. I'll gather numbers to support that, as well as include the `.s` listing for process_clusters, which should be unaffected by the presence of the stat calls since they would be inlined and constant-folded out.

I'll make the changes for 1. and 2., and add the supporting data for 3.

-------------

PR: https://git.openjdk.org/shenandoah/pull/176

From wkemper at openjdk.org Thu Dec 8 21:48:57 2022
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 8 Dec 2022 21:48:57 GMT
Subject: Integrated: Generation resizing
In-Reply-To: 
References: 
Message-ID: 

On Sat, 3 Dec 2022 01:09:59 GMT, William Kemper wrote:

> These changes have the generational mode track the minimum mutator utilization (percentage of process time used by mutators). When it falls below a configuration percentage (GCTimeRatio), a heuristic will transfer memory capacity to whatever generation has been using more CPU time. The assumption here is that by increasing capacity, we will decrease the collection frequency and improve the MMU.

This pull request has now been integrated.

Changeset: ee49a488
Author: William Kemper
URL: https://git.openjdk.org/shenandoah/commit/ee49a4888452196877911f10b9b40fb08b2ae293
Stats: 448 lines in 22 files changed: 418 ins; 18 del; 12 mod

Generation resizing

Reviewed-by: rkennke, kdnilsen

-------------

PR: https://git.openjdk.org/shenandoah/pull/177

From ysr at openjdk.org Thu Dec 8 21:55:17 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Thu, 8 Dec 2022 21:55:17 GMT
Subject: RFR: JDK-8298138: Shenandoah: HdrSeq asserts "sub-bucket index (512) overflow for value ( 1.00)" [v3]
In-Reply-To: 
References: 
Message-ID: <0zjCcXrkBBY9nRmwqX1UJBdbrZu2A9-PLxOOoRE2Q90=.4a83bba1-4d70-4c23-a2f5-6ea3183a9966@github.com>

On Thu, 8 Dec 2022 09:13:40 GMT, Y. Srinivas Ramakrishna wrote:

>> JBS link: https://bugs.openjdk.org/browse/JDK-8298138
>> - Fixed a boundary condition that was triggering an assert.
>> - Added a simple-minded gtest for HdrSeq, which allows one to exercise the asserting code in a debug build.
>> - Tested with: `CONF=slowdebug make run-test TEST="gtest:BasicShenandoahNumberSeqTest"`
>
> Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
> - Merge branch 'master' into shen_numberseq
> - - Copyright dates etc.
>   - include reorder to alphabetic; don't use/include std:: namespace.
> - Merge branch 'master' into shen_numberseq
> - A simple-minded test of HdrSeq which also exercises the problematic
>   code.
> - Fix a boundary condition issue w/HdrSeq

Thanks for the reviews, Roman and Alexey!

-------------

PR: https://git.openjdk.org/jdk/pull/11524

From ysr at openjdk.org Thu Dec 8 21:57:22 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Thu, 8 Dec 2022 21:57:22 GMT
Subject: Integrated: JDK-8298138: Shenandoah: HdrSeq asserts "sub-bucket index (512) overflow for value ( 1.00)"
In-Reply-To: 
References: 
Message-ID: <3JQOWP5xYJOM9dBqfHQtQTCh2hNFAHCkJXxRAzxCUgA=.0f90495d-ef94-4258-bcef-34c0afd01e3b@github.com>

On Tue, 6 Dec 2022 03:46:12 GMT, Y. Srinivas Ramakrishna wrote:

> JBS link: https://bugs.openjdk.org/browse/JDK-8298138
> - Fixed a boundary condition that was triggering an assert.
> - Added a simple-minded gtest for HdrSeq, which allows one to exercise the asserting code in a debug build.
> - Tested with: `CONF=slowdebug make run-test TEST="gtest:BasicShenandoahNumberSeqTest"`

This pull request has now been integrated.

Changeset: c16eb89c
Author: Y. Srinivas Ramakrishna
URL: https://git.openjdk.org/jdk/commit/c16eb89ce0d59f2ff83b6db0bee3e384ec8d5efe
Stats: 77 lines in 3 files changed: 74 ins; 0 del; 3 mod

8298138: Shenandoah: HdrSeq asserts "sub-bucket index (512) overflow for value ( 1.00)"

Reviewed-by: rkennke, shade

-------------

PR: https://git.openjdk.org/jdk/pull/11524

From ysr at openjdk.org Thu Dec 8 23:25:27 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Thu, 8 Dec 2022 23:25:27 GMT
Subject: RFR: Generation resizing [v4]
In-Reply-To: 
References: 
Message-ID: 

On Thu, 8 Dec 2022 21:46:54 GMT, William Kemper wrote:

>> These changes have the generational mode track the minimum mutator utilization (percentage of process time used by mutators). When it falls below a configuration percentage (GCTimeRatio), a heuristic will transfer memory capacity to whatever generation has been using more CPU time.
>> The assumption here is that by increasing capacity, we will decrease the collection frequency and improve the MMU.
>
> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
>
>  - Merge branch 'shenandoah-master' into mmu-instrumentation
>  - Remove vestigial lock, do not enroll periodic task while holding threads_lock
>  - Remove unnecessary logging, clean up imports
>  - Merge from shenandoah/master
>  - Document the class responsible for adjusting generation sizes
>  - Revert unnecessary change
>  - Remove unused time between cycle tracking
>  - Remove vestigial mmu tracker instance
>  - Clamp adjustments to min/max when increment is too large
>  - Adjust generation sizes from safepoint
>  - ... and 7 more: https://git.openjdk.org/shenandoah/compare/25469283...50896e31

@earthling-amzn : I have a partially completed review from last night, which I'll complete and post here in the next hour or so. Sorry for the delay.

-------------

PR: https://git.openjdk.org/shenandoah/pull/177

From ysr at openjdk.org Fri Dec 9 01:11:03 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Fri, 9 Dec 2022 01:11:03 GMT
Subject: RFR: Generation resizing [v4]
In-Reply-To: 
References: 
Message-ID: 

On Thu, 8 Dec 2022 21:46:54 GMT, William Kemper wrote:

>> These changes have the generational mode track the minimum mutator utilization (percentage of process time used by mutators). When it falls below a configuration percentage (GCTimeRatio), a heuristic will transfer memory capacity to whatever generation has been using more CPU time. The assumption here is that by increasing capacity, we will decrease the collection frequency and improve the MMU.
>
> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
>
>  - Merge branch 'shenandoah-master' into mmu-instrumentation
>  - Remove vestigial lock, do not enroll periodic task while holding threads_lock
>  - Remove unnecessary logging, clean up imports
>  - Merge from shenandoah/master
>  - Document the class responsible for adjusting generation sizes
>  - Revert unnecessary change
>  - Remove unused time between cycle tracking
>  - Remove vestigial mmu tracker instance
>  - Clamp adjustments to min/max when increment is too large
>  - Adjust generation sizes from safepoint
>  - ... and 7 more: https://git.openjdk.org/shenandoah/compare/25469283...50896e31

Sorry again for not getting this review back to you in time. It looks good overall, but here are some comments you can use to perhaps improve a few things.

Reviewed & approved, modulo the above comments.

src/hotspot/share/gc/shenandoah/mode/shenandoahGenerationalMode.cpp line 39:

> 37: }
> 38:
> 39: SHENANDOAH_ERGO_OVERRIDE_DEFAULT(GCTimeRatio, 70);

Does this translate to a GC overhead of 1/71*100% = 1.4%?

src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 59:

> 57: ShenandoahHeap::heap()->gc_threads_do(&cl);
> 58: // Include VM thread? Compiler threads? or no - because there
> 59: // is nothing the collector can do about those threads.

Correct, we should not measure in the control signal that which we do not affect. I'd either delete this comment, or just state something like:

// We do not include non-GC vm threads, such as compiler threads, etc. in our measurement
// since we are using the tracker only to control (affect) the time spent in GC.
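(As a standalone sanity check of the 1/71 arithmetic above: a hypothetical snippet, not part of the patch, assuming the usual HotSpot convention that GC's share of total time is 1/(1 + GCTimeRatio).)

    #include <cstdio>

    int main() {
      const double gc_time_ratio = 70.0;                    // desired mutator : GC time ratio
      const double gc_share = 1.0 / (1.0 + gc_time_ratio);  // GC's fraction of total time
      std::printf("GC overhead target: %.1f%%\n", gc_share * 100.0);  // prints 1.4%
      return 0;
    }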
src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 88:

> 86: // This is only called by the control thread.
> 87: double collector_time_s = gc_thread_time_seconds();
> 88: double elapsed_gc_time_s = collector_time_s - _initial_collector_time_s;

Since "elapsed" has a different connotation, it would be less confusing for this variable to be called something like `delta_gc_time_s`, it being the delta between what was previously recorded and what has now been recorded.

src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 96:

> 94: // This is only called by the periodic thread.
> 95: double process_time_s = process_time_seconds();
> 96: double elapsed_process_time_s = process_time_s - _initial_process_time_s;

elapsed -> delta

src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 99:

> 97: _initial_process_time_s = process_time_s;
> 98: double verify_time_s = gc_thread_time_seconds();
> 99: double verify_elapsed = verify_time_s - _initial_verify_collector_time_s;

elapsed -> delta

src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 124:

> 122: if (old_time_s > young_time_s) {
> 123: return transfer_capacity(young, old);
> 124: } else {

In another place I had asked if this method was idempotent. It would be nice if it were. This is almost idempotent, but not quite. You can make it idempotent by changing the `else` to `else if (young_time_s > old_time_s)`, thus sidestepping the case where the two have just been reset and will be 0 (at least until the next gc).

src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 146:

> 144: size_t regions_to_transfer = MAX2(1UL, size_t(double(available_regions) * _resize_increment));
> 145: size_t bytes_to_transfer = regions_to_transfer * ShenandoahHeapRegion::region_size_bytes();
> 146: if (from->generation_mode() == YOUNG) {

I'd consider extracting the work in the `if` and `else` arms into a suitable smaller work method (or two, if one won't suffice for both arms) instead of doing it in line here. It might improve readability and maintainability of the code.

If you tried that and it didn't help, you can ignore this comment. The similarity in shape of the two arms and the "duplication" just seemed to be worth refactoring into a worker method.

src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 54:

> 52: double _initial_collector_time_s;
> 53: double _initial_process_time_s;
> 54: double _initial_verify_collector_time_s;

What does this field with `verify` in its name track?

For each of the data fields, I'd suggest adding a short comment; e.g.:

double _initial_collector_time_s; // tracks cumulative collector threads virtual cpu-time at last recording

etc.

-------------

PR: https://git.openjdk.org/shenandoah/pull/177

From ysr at openjdk.org Fri Dec 9 01:11:16 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Fri, 9 Dec 2022 01:11:16 GMT
Subject: RFR: Generation resizing [v3]
In-Reply-To: <8Fse7IxO14Uc0eJJoLMmGXSo8XYD9Qb144mCyrMX3-g=.1d5d7258-f06d-4a03-bf78-8102f22ada7d@github.com>
References: <8Fse7IxO14Uc0eJJoLMmGXSo8XYD9Qb144mCyrMX3-g=.1d5d7258-f06d-4a03-bf78-8102f22ada7d@github.com>
Message-ID: 

On Tue, 6 Dec 2022 17:26:08 GMT, William Kemper wrote:

>> These changes have the generational mode track the minimum mutator utilization (percentage of process time used by mutators). When it falls below a configuration percentage (GCTimeRatio), a heuristic will transfer memory capacity to whatever generation has been using more CPU time. The assumption here is that by increasing capacity, we will decrease the collection frequency and improve the MMU.
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove vestigial lock, do not enroll periodic task while holding threads_lock

src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 980:

> 978: }
> 979:
> 980: void ShenandoahGeneration::increase_capacity(size_t increment) {

Is there some sanity check done on this elsewhere to make sure the increase/decrease makes sense? Perhaps I'll see it in the caller(s) when I get to it.

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1095:

> 1093: }
> 1094:
> 1095: bool ShenandoahHeap::adjust_generation_sizes() {

Is this method idempotent? I guess it depends on the method of the same name in the MMU Tracker. I guess my question will be answered when I get to it.

src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 37:

> 35: * This class is responsible for tracking and adjusting the minimum mutator
> 36: * utilization (MMU). MMU is defined as the percentage of CPU time available
> 37: * to mutator threads over an arbitrary, fixed interval of time. MMU is measured

Where do we specify the fixed interval used as the basis for the MMU?

src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 38:

> 36: * utilization (MMU). MMU is defined as the percentage of CPU time available
> 37: * to mutator threads over an arbitrary, fixed interval of time. MMU is measured
> 38: * by summing all of the time given to the GC threads and comparing this too

too -> to

src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 44:

> 42: * The time spent by GC threads is attributed to the young or old generation.
> 43: * The time given to the controller and regulator threads is attributed to the
> 44: * global generation. At the end of every collection, the average MMU is inspected.

Average over ...? Average MMU over the most recently ended collection cycle? Or over the cumulative history of the run? Or over all of the collection cycles since the last adjustment of generation sizes? Etc.

src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 61:

> 59: TruncatedSeq _mmu_average;
> 60:
> 61: bool transfer_capacity(ShenandoahGeneration* from, ShenandoahGeneration* to);

Nit: shouldn't the adjustment be in a sizer object rather than in a tracker object? Maybe we should think of this class as an MmuBasedGenerationSizeController, which both tracks MMU and controls the size of the generations.

src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 90:

> 88: // allocators by taking the heap lock). The amount of capacity to move
> 89: // from one generation to another is controlled by YoungGenerationSizeIncrement
> 90: // and defaults to 20% of the heap. The minimum and maximum sizes of the

Is the transfer delta always 20%? Wouldn't that cause oscillations about an equilibrium point at steady load? But I should read on to see how this works.

src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 93:

> 91: // young generation are controlled by ShenandoahMinYoungPercentage and
> 92: // ShenandoahMaxYoungPercentage, respectively. The method returns true
> 93: // when and adjustment is made, false otherwise.

and -> an

src/hotspot/share/gc/shenandoah/shenandoahYoungGeneration.cpp line 95:

> 93:
> 94: void ShenandoahYoungGeneration::add_collection_time(double time_seconds) {
> 95: if (_old_gen_task_queues != NULL) {

This seems a bit subtle. Isn't there a better/official status flag to check, or a default second parm to leverage from caller?

src/hotspot/share/gc/shenandoah/shenandoahYoungGeneration.hpp line 59:

> 57: virtual ShenandoahHeuristics* initialize_heuristics(ShenandoahMode* gc_mode) override;
> 58:
> 59: virtual void add_collection_time(double time_seconds) override;

A 1-line documentation comment/spec for the method would be nice here.

-------------

PR: https://git.openjdk.org/shenandoah/pull/177

From ysr at openjdk.org Fri Dec 9 01:11:18 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Fri, 9 Dec 2022 01:11:18 GMT
Subject: RFR: Generation resizing [v3]
In-Reply-To: 
References: <8Fse7IxO14Uc0eJJoLMmGXSo8XYD9Qb144mCyrMX3-g=.1d5d7258-f06d-4a03-bf78-8102f22ada7d@github.com>
Message-ID: 

On Thu, 8 Dec 2022 09:17:51 GMT, Y. Srinivas Ramakrishna wrote:

>> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Remove vestigial lock, do not enroll periodic task while holding threads_lock
>
> src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 980:
>
>> 978: }
>> 979:
>> 980: void ShenandoahGeneration::increase_capacity(size_t increment) {
>
> Is there some sanity check done on this elsewhere to make sure the increase/decrease makes sense? Perhaps I'll see it in the caller(s) when I get to it.

I see now that you do. Would it still be worthwhile asserting here as well that bounds are respected? It might make the code more maintainable in the face of changes.

> src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1095:
>
>> 1093: }
>> 1094:
>> 1095: bool ShenandoahHeap::adjust_generation_sizes() {
>
> Is this method idempotent? I guess it depends on the method of the same name in the MMU Tracker. I guess my question will be answered when I get to it.

Left a related comment in `MmuTracker::adjust_generational_size()`.

> src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 37:
>
>> 35: * This class is responsible for tracking and adjusting the minimum mutator
>> 36: * utilization (MMU). MMU is defined as the percentage of CPU time available
>> 37: * to mutator threads over an arbitrary, fixed interval of time. MMU is measured
>
> Where do we specify the fixed interval used as the basis for the MMU?

I'd mention the interval `GCPauseIntervalMillis` here for clarity. (I'd say it's a curious naming of the interval, but it's already used in that sense, so we leave it as is.)

> src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 90:
>
>> 88: // allocators by taking the heap lock). The amount of capacity to move
>> 89: // from one generation to another is controlled by YoungGenerationSizeIncrement
>> 90: // and defaults to 20% of the heap. The minimum and maximum sizes of the
>
> Is the transfer delta always 20%? Wouldn't that cause oscillations about an equilibrium point at steady load? But I should read on to see how this works.

I think the way you use it, it's not 20% of the heap but rather 20% of the free space in the generation that will provide the transfer delta. Maybe reword for clarity?

-------------

PR: https://git.openjdk.org/shenandoah/pull/177

From ysr at openjdk.org Fri Dec 9 01:11:19 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Fri, 9 Dec 2022 01:11:19 GMT
Subject: RFR: Generation resizing [v4]
In-Reply-To: 
References: 
Message-ID: 

On Thu, 8 Dec 2022 23:57:57 GMT, Y. Srinivas Ramakrishna wrote:

>> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
>>
>>  - Merge branch 'shenandoah-master' into mmu-instrumentation
>>  - Remove vestigial lock, do not enroll periodic task while holding threads_lock
>>  - Remove unnecessary logging, clean up imports
>>  - Merge from shenandoah/master
>>  - Document the class responsible for adjusting generation sizes
>>  - Revert unnecessary change
>>  - Remove unused time between cycle tracking
>>  - Remove vestigial mmu tracker instance
>>  - Clamp adjustments to min/max when increment is too large
>>  - Adjust generation sizes from safepoint
>>  - ... and 7 more: https://git.openjdk.org/shenandoah/compare/25469283...50896e31
>
> src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 99:
>
>> 97: _initial_process_time_s = process_time_s;
>> 98: double verify_time_s = gc_thread_time_seconds();
>> 99: double verify_elapsed = verify_time_s - _initial_verify_collector_time_s;
>
> elapsed -> delta

Why do you use the `verify_` prefix here? I'm sure I am missing something here...

-------------

PR: https://git.openjdk.org/shenandoah/pull/177

From ysr at openjdk.org Fri Dec 9 01:17:41 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Fri, 9 Dec 2022 01:17:41 GMT
Subject: RFR: Generation resizing [v3]
In-Reply-To: 
References: <8Fse7IxO14Uc0eJJoLMmGXSo8XYD9Qb144mCyrMX3-g=.1d5d7258-f06d-4a03-bf78-8102f22ada7d@github.com>
Message-ID: 

On Thu, 8 Dec 2022 08:03:10 GMT, Y. Srinivas Ramakrishna wrote:

>> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Remove vestigial lock, do not enroll periodic task while holding threads_lock
>
> src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 44:
>
>> 42: * The time spent by GC threads is attributed to the young or old generation.
>> 43: * The time given to the controller and regulator threads is attributed to the
>> 44: * global generation. At the end of every collection, the average MMU is inspected.
>
> Average over ...? Average MMU over the most recently ended collection cycle? Or over the cumulative history of the run? Or over all of the collection cycles since the last adjustment of generation sizes? Etc.

I see now that it's a decaying average over samples at every 5 second interval. Maybe elaborate the comment accordingly.

-------------

PR: https://git.openjdk.org/shenandoah/pull/177

From ysr at openjdk.org Fri Dec 9 01:29:32 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Fri, 9 Dec 2022 01:29:32 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v3]
In-Reply-To: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
Message-ID: <7SIel7MTWCWMMqbkWEUVe2DvNyrmENS0RkxT6MhU-b0=.14e88010-d2fb-41cb-abba-debefde07292@github.com>

> **Note:**
> This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.)
>
> (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extremem mentioned in the ticket.
>
> (2) Make available the instrumentation only in non-product (optimized) mode until better performance is achieved.
>
> (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above.
>
> The fix to ShenandoahNumberSeq will be separated out and made into a separate pull request on mainline.
>
> **Summary:**
> The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect and how frequently we log the data, once we have gathered some experience on how we use this.
>
> **Details of files changed:**
>
> 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled.
> 2. shenandoahHeap.cpp: minor restructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats.
> 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code.
> 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq.
> 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above).
> 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumulative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments.
> 7. shenandoahScanRemembered.inline.hpp: As in 6, diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as to update local variables rather than method arguments. The large diffs at (old) line 589 onwards are the git-diff'ism to do with indentation change. Delete some unused methods.
> 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default.
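(A minimal standalone sketch of the gating pattern that item 8 describes; the flag name is from this PR, but everything else here is a placeholder invented for illustration, not the actual patch.)

    #include <cstdio>

    // Stand-in for the HotSpot diagnostic flag; off by default.
    static bool ShenandoahEnableCardStats = false;

    // Hypothetical stats probe.
    static void record_dirty_run(int run_len) {
      std::printf("dirty run: %d cards\n", run_len);
    }

    int main() {
      // When the flag is off (the default), the probe reduces to a
      // test-and-branch and the stats code is never executed.
      if (ShenandoahEnableCardStats) {
        record_dirty_run(8);
      }
      return 0;
    }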
>
> **Format of stats produced and how to interpret them: (sample)**
>
> [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning
> [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo:
> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ]
> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ]
> [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo:
> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ]
> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ]
> ...
>
> The rows represent the metric that's being tracked, and the columns are, respectively, the minimum, the 3 quartiles (25%, 50%, 75%), and the maximum. The metrics are:
>
> - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread
> - clean_run: as above, but the length of an uninterrupted run of clean cards
> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk
> - max_dirty_run & max_clean_run: similarly, for the maximum of each
> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned
> - dirty_scans, clean_scans: numbers of objects scanned by the closure
> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk
>
> The data above indicates that at least 75% of the chunks have no alternations at all, and cards are almost always mostly clean for this specific benchmark config (extremem).
>
> Comparing worker stats from worker 0 and worker 9 indicates very little difference between their statistics, as one might typically expect for well-balanced RS scans.
>
> **Questions:**
>
> 1. Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics as well, in addition to the quartiles?
> 2. The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information?
> 3. Any suggestions for a more easily consumable format?
> 4. I welcome any other feedback on the pull request.

Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 38 commits:

 - Merge branch 'master' into JVM-1264
 - NOT_PRODUCT() for the stats allocation. Although the stats collection calls were inlined and constant-folded away by the compiler, the allocation was not removed, go figure. Thus it made sense to remove them all via NOT_PRODUCT(). I might revisit this in later iterations as I work on the card-scan loop itself, but for now this is sufficient.
 - Moved some more methods into non-product mode.
 - Merge branch 'master' into JVM-1264
 - Card stats only in non-product mode (until the impact of stats collection is reduced).
 - Merge branch 'master' into JVM-1264
 - Merge branch 'master' into JVM-1264
 - jcheck whitespace fixes.
 - Fix card_stats() so it doesn't crash when card stats aren't enabled.
 - Fix comment.
 - ... and 28 more: https://git.openjdk.org/shenandoah/compare/ee49a488...488c9399

-------------

Changes: https://git.openjdk.org/shenandoah/pull/176/files
Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=02
Stats: 740 lines in 8 files changed: 371 ins; 220 del; 149 mod
Patch: https://git.openjdk.org/shenandoah/pull/176.diff
Fetch: git fetch https://git.openjdk.org/shenandoah pull/176/head:pull/176

PR: https://git.openjdk.org/shenandoah/pull/176

From wkemper at openjdk.org Fri Dec 9 16:34:45 2022
From: wkemper at openjdk.org (William Kemper)
Date: Fri, 9 Dec 2022 16:34:45 GMT
Subject: RFR: Generation resizing [v4]
In-Reply-To: 
References: 
Message-ID: 

On Fri, 9 Dec 2022 00:59:24 GMT, Y. Srinivas Ramakrishna wrote:
> src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 146: > >> 144: size_t regions_to_transfer = MAX2(1UL, size_t(double(available_regions) * _resize_increment)); >> 145: size_t bytes_to_transfer = regions_to_transfer * ShenandoahHeapRegion::region_size_bytes(); >> 146: if (from->generation_mode() == YOUNG) { > > I'd consider extracting the work in the `if` and `else` arms into a suitable smaller work method (or two, if one won't suffice for both arms) instead of doing it in line here. It might improve readability and maintainability of the code. > > If you tried that and it didn't help, you can ignore this comment. The similarity in shape of the two arms and the "duplication" just seemed to be worth refactoring into a worker method. Yes, this method is a bit long. The similarity breaks a bit because it needs to enforce the min or max constraint depending on the direction of the transfer. I'll split it up in a subsequent PR. > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 54: > >> 52: double _initial_collector_time_s; >> 53: double _initial_process_time_s; >> 54: double _initial_verify_collector_time_s; > > What does this field with `verify` in its name track? > > For each of the data fields, I'd suggest adding a short comment; e.g.: > > > double _initial_collector_time_s; // tracks cumulative collector threads virtual cpu-time at last recording > > > etc. It's vestigial - I was using it originally to "verify" the result of the per-generation based MMU. I'll rename it. ------------- PR: https://git.openjdk.org/shenandoah/pull/177 From wkemper at openjdk.org Fri Dec 9 16:34:48 2022 From: wkemper at openjdk.org (William Kemper) Date: Fri, 9 Dec 2022 16:34:48 GMT Subject: RFR: Generation resizing [v3] In-Reply-To: References: <8Fse7IxO14Uc0eJJoLMmGXSo8XYD9Qb144mCyrMX3-g=.1d5d7258-f06d-4a03-bf78-8102f22ada7d@github.com> Message-ID: On Thu, 8 Dec 2022 07:57:51 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove vestigial lock, do not enroll periodic task while holding threads_lock > > src/hotspot/share/gc/shenandoah/shenandoahYoungGeneration.cpp line 95: > >> 93: >> 94: void ShenandoahYoungGeneration::add_collection_time(double time_seconds) { >> 95: if (_old_gen_task_queues != NULL) { > > This seems a bit subtle. Isn't there a better/official status flag to check, or a default second parm to leverage from caller? I'll turn this into an `is_bootstrapping()` method. ------------- PR: https://git.openjdk.org/shenandoah/pull/177 From kdnilsen at openjdk.org Fri Dec 9 23:41:39 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 9 Dec 2022 23:41:39 GMT Subject: RFR: Shrink tlab to capacity Message-ID: When a TLAB request exceeds the currently available memory within young-gen, the existing behavior is to reject the TLAB request outright. This is recognized as a failed allocation request, which triggers degenerated GC. This change introduces code to reduce the likelihood that too-large TLAB requests will be issued, and when they are issued, it makes an effort to shrink the TLAB request in order to reduce the need for degenerated GC. The impact is difficult to measure because this situation is fairly rare. On one Extremem workload, the TLAB-shrinking code is exercised only once during a 16-minute run involving 500 concurrent GCs, a 45 GiB heap, and a 28 GiB young-gen size. The change reduces the degenerated GCs from 6 to 5. 
One reason that the remaining 5 degenerated GCs are not addressed by this change is that further work is required to handle a situation in which a requested TLAB is smaller than the available young-gen memory, but the available memory is set aside in the evacuation reserve and so cannot be provided to a mutator. Future work will address this condition.

-------------

Commit messages:
 - Fix whitespace
 - Merge branch 'master' into shrink-tlab-to-capacity
 - Experiments to confirm proper operation
 - Merge remote-tracking branch 'GitFarmBranch/shrink-tlab-to-capacity' into shrink-tlab-to-capacity
 - Remove some debug instrumentation
 - Fix log message to avoid assertion failure
 - Change <= to < in test for shrinking tlab request size
 - Fix spelling error in assertion
 - Restructure control to avoid goto statement
 - Resize tlab request if larger than adjusted available

Changes: https://git.openjdk.org/shenandoah/pull/180/files
Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=180&range=00
Stats: 184 lines in 2 files changed: 71 ins; 39 del; 74 mod
Patch: https://git.openjdk.org/shenandoah/pull/180.diff
Fetch: git fetch https://git.openjdk.org/shenandoah pull/180/head:pull/180

PR: https://git.openjdk.org/shenandoah/pull/180

From ysr at openjdk.org Mon Dec 12 03:04:47 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Mon, 12 Dec 2022 03:04:47 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v4]
In-Reply-To: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
Message-ID: 

> **Note:**
> This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.)
>
> (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extremem mentioned in the ticket.
>
> (2) Make available the instrumentation only in non-product (optimized) mode until better performance is achieved.
>
> (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above.
>
> The fix to ShenandoahNumberSeq will be separated out and made into a separate pull request on mainline.
>
> **Summary:**
> The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect and how frequently we log the data, once we have gathered some experience on how we use this.
>
> **Details of files changed:**
>
> 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled.
> 2. shenandoahHeap.cpp: minor restructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats.
> 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code.
> 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq.
> 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above).
> 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumulative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments.
> 7. shenandoahScanRemembered.inline.hpp: As in 6, diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as to update local variables rather than method arguments. The large diffs at (old) line 589 onwards are the git-diff'ism to do with indentation change. Delete some unused methods.
> 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default.
>
> **Format of stats produced and how to interpret them: (sample)**
>
> [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning
> [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo:
> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ]
> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ]
> [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo:
> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ]
> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ]
> ...
>
> The rows represent the metric that's being tracked, and the columns are, respectively, the minimum, the 3 quartiles (25%, 50%, 75%), and the maximum. The metrics are:
>
> - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread
> - clean_run: as above, but the length of an uninterrupted run of clean cards
> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk
> - max_dirty_run & max_clean_run: similarly, for the maximum of each
> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned
> - dirty_scans, clean_scans: numbers of objects scanned by the closure
> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk
>
> The data above indicates that at least 75% of the chunks have no alternations at all, and cards are almost always mostly clean for this specific benchmark config (extremem).
>
> Comparing worker stats from worker 0 and worker 9 indicates very little difference between their statistics, as one might typically expect for well-balanced RS scans.
>
> **Questions:**
>
> 1. Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics as well, in addition to the quartiles?
> 2. The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information?
> 3. Any suggestions for a more easily consumable format?
> 4. I welcome any other feedback on the pull request.

Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision:

  Extract ShenandoahCardStats into its own {.h,.c}pp files.

-------------

Changes:
 - all: https://git.openjdk.org/shenandoah/pull/176/files
 - new: https://git.openjdk.org/shenandoah/pull/176/files/488c9399..388a03da

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=03
 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=02-03

Stats: 523 lines in 5 files changed: 293 ins; 227 del; 3 mod
Patch: https://git.openjdk.org/shenandoah/pull/176.diff
Fetch: git fetch https://git.openjdk.org/shenandoah pull/176/head:pull/176

PR: https://git.openjdk.org/shenandoah/pull/176

From ysr at openjdk.org Mon Dec 12 09:20:49 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Mon, 12 Dec 2022 09:20:49 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v5]
In-Reply-To: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
Message-ID: 

> **Note:**
> This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.)
>
> (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extremem mentioned in the ticket.
>
> (2) Make available the instrumentation only in non-product (optimized) mode until better performance is achieved.
>
> (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above.
>
> The fix to ShenandoahNumberSeq will be separated out and made into a separate pull request on mainline.
>
> **Summary:**
> The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect and how frequently we log the data, once we have gathered some experience on how we use this.
>
> **Details of files changed:**
>
> 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled.
> 2. shenandoahHeap.cpp: minor restructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats.
> 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code.
> 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq.
> 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above).
> 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumulative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments.
> 7. shenandoahScanRemembered.inline.hpp: As in 6, diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as to update local variables rather than method arguments. The large diffs at (old) line 589 onwards are the git-diff'ism to do with indentation change. Delete some unused methods.
> 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default.
>
> **Format of stats produced and how to interpret them: (sample)**
>
> [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning
> [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo:
> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ]
> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ]
> [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo:
> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ]
> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ]
> ...
>
> The rows represent the metric that's being tracked, and the columns are, respectively, the minimum, the 3 quartiles (25%, 50%, 75%), and the maximum. The metrics are:
>
> - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread
> - clean_run: as above, but the length of an uninterrupted run of clean cards
> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk
> - max_dirty_run & max_clean_run: similarly, for the maximum of each
> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned
> - dirty_scans, clean_scans: numbers of objects scanned by the closure
> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk
>
> The data above indicates that at least 75% of the chunks have no alternations at all, and cards are almost always mostly clean for this specific benchmark config (extremem).
>
> Comparing worker stats from worker 0 and worker 9 indicates very little difference between their statistics, as one might typically expect for well-balanced RS scans.
>
> **Questions:**
>
> 1. Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics as well, in addition to the quartiles?
> 2. The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information?
> 3. Any suggestions for a more easily consumable format?
> 4. I welcome any other feedback on the pull request.

Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision:

  Separated out stats for scan_rs and update_refs. Still need to carry cumulative stats, and merge stats from each round into the cumulative stats; the latter needs a "merge" method in NumberSeq, which will be a separate PR.

-------------

Changes:
 - all: https://git.openjdk.org/shenandoah/pull/176/files
 - new: https://git.openjdk.org/shenandoah/pull/176/files/388a03da..0d65158c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=04
 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=03-04

Stats: 43 lines in 5 files changed: 19 ins; 0 del; 24 mod
Patch: https://git.openjdk.org/shenandoah/pull/176.diff
Fetch: git fetch https://git.openjdk.org/shenandoah pull/176/head:pull/176

PR: https://git.openjdk.org/shenandoah/pull/176

From fyang at openjdk.org Mon Dec 12 12:45:54 2022
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 12 Dec 2022 12:45:54 GMT
Subject: RFR: 8298568: Fastdebug build fails after JDK-8296389
Message-ID: 

This is a trivial change fixing a typo introduced by JDK-8296389.

The correct version is "is_NeverBranch()" instead of "isNeverBranch()".

Testing: Fastdebug builds fine with this fix on linux-aarch64 platform.
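(In diff form, the fix amounts to something like the following; the surrounding expression is assumed for illustration, and only the renamed call comes from the change itself.)

    -  if (n->isNeverBranch()) {   // old spelling: fails to compile in fastdebug
    +  if (n->is_NeverBranch()) {  // correct accessor name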
------------- Commit messages: - 8298568: Fastdebug build fails after JDK-8296389 Changes: https://git.openjdk.org/jdk/pull/11631/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11631&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8298568 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11631.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11631/head:pull/11631 PR: https://git.openjdk.org/jdk/pull/11631 From rkennke at openjdk.org Mon Dec 12 13:18:53 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 12 Dec 2022 13:18:53 GMT Subject: RFR: 8298568: Fastdebug build fails after JDK-8296389 In-Reply-To: References: Message-ID: On Mon, 12 Dec 2022 12:37:51 GMT, Fei Yang wrote: > This is a trivial change fixing an typo introduced by JDK-8296389. > > The correct version is "is_NeverBranch()" instead of "isNeverBranch()". > > Testing: Fastdebug builds fine with this fix on linux-aarch64 platform. Looks good to me! Thanks! ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.org/jdk/pull/11631 From ysr at openjdk.org Mon Dec 12 20:43:48 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 12 Dec 2022 20:43:48 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v6] In-Reply-To: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: > **Note:** > This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.) > > (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extermem mentioned in the ticket. > > (2) Make available the instrumentation only in non-product (optimized) mode until better performance is achieved. > > (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above. > > The fix to ShenandoahNumberSeq will be separated out and made into a separate pull request on mainline. > > **Summary:** > The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect as well as how frequently we log the data once we have gathered some experience on how we use this. > > **Details of files changed:** > > 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled. > 2. 
> 2. shenandoahHeap.cpp: minor restructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats
> 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code.
> 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq
> 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above).
> 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumulative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments.
> 7. shenandoahScanRemembered.inline.hpp: As in 6, the diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as to update local variables rather than method arguments. The large diffs at (old) line 589 onwards are the git-diff'ism to do with indentation change. Delete some unused methods.
> 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default.
>
> **Format of stats produced and how to interpret them: (sample)**
>
>
> [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning
> [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo:
> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ]
> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ]
> [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo:
> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ]
> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ]
> ...
>
>
> The rows represent the metric that's being tracked, and the columns are, respectively, the minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. The metrics are:
>
> - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread
> - clean_run: as above, but the length of an uninterrupted run of clean cards
> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk
> - max_dirty_run & max_clean_run: similarly, but for the maximum of each
> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned
> - dirty_scans, clean_scans: numbers of objects scanned by the closure
> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk
>
> The data above indicates that at least 75% of the chunks have no alternations at all,
> and cards are almost always mostly clean for this specific benchmark config (extremem).
>
> Comparing worker stats from worker 0 and worker 9 indicates very little difference between
> their statistics, as one might typically expect for well-balanced RS scans.
>
> **Questions:**
>
> 1. Would it make sense to also print, for example, the 1, 10, 90 and 99 percentiles for these metrics, in addition to the quartiles?
> 2. The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information?
> 3. Any suggestions for a more easily consumable format?
> 4. I welcome any other feedback on the pull request.

Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision:

  Cumulative card stats separated out for scan_rs and update_refs phases;
  merge of per-worker stats into phase-specific cumulative stats stubbed
  out for now until HdrSeq::merge() is done.

-------------

Changes:
  - all: https://git.openjdk.org/shenandoah/pull/176/files
  - new: https://git.openjdk.org/shenandoah/pull/176/files/0d65158c..a6b1a236

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=05
 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=04-05

Stats: 33 lines in 3 files changed: 29 ins; 0 del; 4 mod
Patch: https://git.openjdk.org/shenandoah/pull/176.diff
Fetch: git fetch https://git.openjdk.org/shenandoah pull/176/head:pull/176

PR: https://git.openjdk.org/shenandoah/pull/176

From ysr at openjdk.org  Mon Dec 12 21:36:36 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Mon, 12 Dec 2022 21:36:36 GMT
Subject: RFR: Shrink tlab to capacity
In-Reply-To:
References:
Message-ID:

On Fri, 9 Dec 2022 23:23:43 GMT, Kelvin Nilsen wrote:

> When a TLAB request exceeds the currently available memory within young-gen, the existing behavior is to reject the TLAB request outright. This is recognized as a failed allocation request, which triggers degenerated GC.
>
> This change introduces code to reduce the likelihood that too-large TLAB requests will be issued, and when they are issued, it makes an effort to shrink the TLAB request in order to reduce the need for degenerated GC.
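As an editorial aside, the shrinking idea quoted above can be illustrated with a minimal, self-contained sketch. The names and the clamping policy below are assumptions for illustration only; the actual change lives in Shenandoah's allocation path (see the discussion of ShenandoahHeap::allocate_memory_under_lock() later in this thread):

    #include <algorithm>
    #include <cstddef>

    // Hedged sketch: a TLAB request larger than what young-gen can currently
    // supply is shrunk toward the available space instead of being rejected
    // outright (which would be treated as a failed allocation).
    struct TlabRequest {
      size_t desired_words;  // what the mutator asked for
      size_t min_words;      // smallest TLAB the mutator can accept
    };

    size_t satisfy_tlab_request(const TlabRequest& req, size_t available_words) {
      if (req.desired_words <= available_words) {
        return req.desired_words;          // common case: full-size TLAB
      }
      // Shrink toward available space, but never below the minimum.
      size_t shrunk = std::max(req.min_words,
                               std::min(req.desired_words, available_words));
      if (shrunk <= available_words) {
        return shrunk;                     // smaller TLAB, no degenerated GC
      }
      return 0;                            // caller must fall back to GC
    }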
> The impact is difficult to measure because this situation is fairly rare. On one Extremem workload, the TLAB-shrinking code is exercised only once during a 16-minute run involving 500 concurrent GCs, a 45 GiB heap, and a 28 GiB young-gen size. The change reduces the degenerated GCs from 6 to 5.
>
> One reason that the remaining 5 degenerated GCs are not addressed by this change is that further work is required to handle a situation in which a requested TLAB is smaller than the available young-gen memory, but available memory is set aside in the evacuation reserve so cannot be provided to a mutator. Future work will address this condition.

Looks good, modulo a comment I left inline in the ShenandoahHeap::allocate_memory_under_lock() method.

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1368:

> 1366: // satisfy the allocation request. The reality is the actual TLAB size is likely to be even smaller, because it will
> 1367: // depend on how much memory is available within mutator regions that are not yet fully used.
> 1368: HeapWord* result = allocate_memory_under_lock(smaller_req, in_new_region, is_promotion);

Can you help me understand the structure here? Would it not have been simpler to keep sufficient state at the point where the attempt to allocate the larger size failed, and we decided we would shrink the size of the request, to just make the smaller allocation request which would be guaranteed to succeed because we held the heap lock at that point already? Is there a reason to give up and reattempt the smaller allocation request afresh?

I realize you explicitly added a scope to make this re-attempt outside the scope of the locker and make the recursive call, but am trying to understand the rationale for doing so. Perhaps it's because I am missing the big picture of the work being done here from various callers to this method, but maybe you can help clarify that a bit.

-------------

Marked as reviewed by ysr (Author).

PR: https://git.openjdk.org/shenandoah/pull/180

From kvn at openjdk.org  Mon Dec 12 21:37:56 2022
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 12 Dec 2022 21:37:56 GMT
Subject: RFR: 8298568: Fastdebug build fails after JDK-8296389
In-Reply-To:
References:
Message-ID:

On Mon, 12 Dec 2022 12:37:51 GMT, Fei Yang wrote:

> This is a trivial change fixing a typo introduced by JDK-8296389.
>
> The correct version is "is_NeverBranch()" instead of "isNeverBranch()".
>
> Testing: Fastdebug builds fine with this fix on linux-aarch64 platform.

Good and trivial.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.org/jdk/pull/11631

From ysr at openjdk.org  Mon Dec 12 21:46:04 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Mon, 12 Dec 2022 21:46:04 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v7]
In-Reply-To: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
Message-ID:

> **Note:**
> This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.)
>
> (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extremem mentioned in the ticket.
> (2) Make the instrumentation available only in non-product (optimized) mode until better performance is achieved.
>
> (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above.
>
> The fix to ShenandoahNumberSeq will be separated out and made into a separate pull request on mainline.
>
> **Summary:**
> The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect and how frequently we log the data once we have gathered some experience on how we use this.
>
> **Details of files changed:**
>
> 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled.
> 2. shenandoahHeap.cpp: minor restructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats
> 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code.
> 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq
> 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above).
> 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumulative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments.
> 7. shenandoahScanRemembered.inline.hpp: As in 6, the diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as to update local variables rather than method arguments. The large diffs at (old) line 589 onwards are the git-diff'ism to do with indentation change. Delete some unused methods.
> 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default.
>
> **Format of stats produced and how to interpret them: (sample)**
>
>
> [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning
> [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo:
> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ]
> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ]
> [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo:
> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ]
> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ]
> ...
>
>
> The rows represent the metric that's being tracked, and the columns are, respectively, the minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. The metrics are:
>
> - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread
> - clean_run: as above, but the length of an uninterrupted run of clean cards
> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk
> - max_dirty_run & max_clean_run: similarly, but for the maximum of each
> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned
> - dirty_scans, clean_scans: numbers of objects scanned by the closure
> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk
>
> The data above indicates that at least 75% of the chunks have no alternations at all,
> and cards are almost always mostly clean for this specific benchmark config (extremem).
>
> Comparing worker stats from worker 0 and worker 9 indicates very little difference between
> their statistics, as one might typically expect for well-balanced RS scans.
>
> **Questions:**
>
> 1. Would it make sense to also print, for example, the 1, 10, 90 and 99 percentiles for these metrics, in addition to the quartiles?
> 2. The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information?
> 3. Any suggestions for a more easily consumable format?
> 4. I welcome any other feedback on the pull request.

Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision:

  jcheck clean

-------------

Changes:
  - all: https://git.openjdk.org/shenandoah/pull/176/files
  - new: https://git.openjdk.org/shenandoah/pull/176/files/a6b1a236..d5b337bc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=06
 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=05-06

Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/shenandoah/pull/176.diff
Fetch: git fetch https://git.openjdk.org/shenandoah pull/176/head:pull/176

PR: https://git.openjdk.org/shenandoah/pull/176

From kdnilsen at openjdk.org  Mon Dec 12 23:17:11 2022
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Mon, 12 Dec 2022 23:17:11 GMT
Subject: RFR: Shrink tlab to capacity [v2]
In-Reply-To:
References:
Message-ID:

> When a TLAB request exceeds the currently available memory within young-gen, the existing behavior is to reject the TLAB request outright. This is recognized as a failed allocation request, which triggers degenerated GC.
>
> This change introduces code to reduce the likelihood that too-large TLAB requests will be issued, and when they are issued, it makes an effort to shrink the TLAB request in order to reduce the need for degenerated GC.
>
> The impact is difficult to measure because this situation is fairly rare. On one Extremem workload, the TLAB-shrinking code is exercised only once during a 16-minute run involving 500 concurrent GCs, a 45 GiB heap, and a 28 GiB young-gen size. The change reduces the degenerated GCs from 6 to 5.
>
> One reason that the remaining 5 degenerated GCs are not addressed by this change is that further work is required to handle a situation in which a requested TLAB is smaller than the available young-gen memory, but available memory is set aside in the evacuation reserve so cannot be provided to a mutator. Future work will address this condition.

Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:

  Clarify recursive implementation of allocate_memory_under_lock

  (with a comment)

-------------

Changes:
  - all: https://git.openjdk.org/shenandoah/pull/180/files
  - new: https://git.openjdk.org/shenandoah/pull/180/files/774e07a1..2d5da073

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=180&range=01
 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=180&range=00-01

Stats: 20 lines in 1 file changed: 17 ins; 0 del; 3 mod
Patch: https://git.openjdk.org/shenandoah/pull/180.diff
Fetch: git fetch https://git.openjdk.org/shenandoah pull/180/head:pull/180

PR: https://git.openjdk.org/shenandoah/pull/180

From kdnilsen at openjdk.org  Mon Dec 12 23:17:11 2022
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Mon, 12 Dec 2022 23:17:11 GMT
Subject: RFR: Shrink tlab to capacity [v2]
In-Reply-To:
References:
Message-ID:
On Mon, 12 Dec 2022 21:31:22 GMT, Y. Srinivas Ramakrishna wrote:

>> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Clarify recursive implementation of allocate_memory_under_lock
>>
>>   (with a comment)
>
> src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1368:
>
>> 1366: // satisfy the allocation request. The reality is the actual TLAB size is likely to be even smaller, because it will
>> 1367: // depend on how much memory is available within mutator regions that are not yet fully used.
>> 1368: HeapWord* result = allocate_memory_under_lock(smaller_req, in_new_region, is_promotion);
>
> Can you help me understand the structure here? Would it not have been simpler to keep sufficient state at the point where the attempt to allocate the larger size failed, and we decided we would shrink the size of the request, to just make the smaller allocation request which would be guaranteed to succeed because we held the heap lock at that point already? Is there a reason to give up and reattempt the smaller allocation request afresh?
>
> I realize you explicitly added a scope to make this re-attempt outside the scope of the locker and make the recursive call, but am trying to understand the rationale for doing so. Perhaps it's because I am missing the big picture of the work being done here from various callers to this method, but maybe you can help clarify that a bit.

Thanks for your review. I'm adding a comment to clarify the recursive algorithm and the use of the secondary ShenandoahAllocationRequest argument.

-------------

PR: https://git.openjdk.org/shenandoah/pull/180

From haosun at openjdk.org  Tue Dec 13 00:16:49 2022
From: haosun at openjdk.org (Hao Sun)
Date: Tue, 13 Dec 2022 00:16:49 GMT
Subject: RFR: 8298568: Fastdebug build fails after JDK-8296389
In-Reply-To:
References:
Message-ID:

On Mon, 12 Dec 2022 12:37:51 GMT, Fei Yang wrote:

> This is a trivial change fixing a typo introduced by JDK-8296389.
>
> The correct version is "is_NeverBranch()" instead of "isNeverBranch()".
>
> Testing: Fastdebug builds fine with this fix on linux-aarch64 platform.

LGTM. (I'm not a Reviewer)

-------------

Marked as reviewed by haosun (Author).

PR: https://git.openjdk.org/jdk/pull/11631

From fyang at openjdk.org  Tue Dec 13 00:52:19 2022
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 13 Dec 2022 00:52:19 GMT
Subject: RFR: 8298568: Fastdebug build fails after JDK-8296389
In-Reply-To:
References:
Message-ID:

On Mon, 12 Dec 2022 12:37:51 GMT, Fei Yang wrote:

> This is a trivial change fixing a typo introduced by JDK-8296389.
>
> The correct version is "is_NeverBranch()" instead of "isNeverBranch()".
>
> Testing: Fastdebug builds fine with this fix on linux-aarch64 platform.

Thank you! Let's /integrate

-------------

PR: https://git.openjdk.org/jdk/pull/11631

From fyang at openjdk.org  Tue Dec 13 01:01:40 2022
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 13 Dec 2022 01:01:40 GMT
Subject: Integrated: 8298568: Fastdebug build fails after JDK-8296389
In-Reply-To:
References:
Message-ID:

On Mon, 12 Dec 2022 12:37:51 GMT, Fei Yang wrote:

> This is a trivial change fixing a typo introduced by JDK-8296389.
>
> The correct version is "is_NeverBranch()" instead of "isNeverBranch()".
>
> Testing: Fastdebug builds fine with this fix on linux-aarch64 platform.

This pull request has now been integrated.
Changeset: 173778e2
Author: Fei Yang
URL: https://git.openjdk.org/jdk/commit/173778e2fee58e47d35197b78eb23f46154b5b2b
Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod

8298568: Fastdebug build fails after JDK-8296389

Reviewed-by: rkennke, kvn, haosun

-------------

PR: https://git.openjdk.org/jdk/pull/11631

From ysr at openjdk.org  Tue Dec 13 01:17:20 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Tue, 13 Dec 2022 01:17:20 GMT
Subject: RFR: Shrink tlab to capacity [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 12 Dec 2022 23:13:29 GMT, Kelvin Nilsen wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1368:
>>
>>> 1366: // satisfy the allocation request. The reality is the actual TLAB size is likely to be even smaller, because it will
>>> 1367: // depend on how much memory is available within mutator regions that are not yet fully used.
>>> 1368: HeapWord* result = allocate_memory_under_lock(smaller_req, in_new_region, is_promotion);
>>
>> Can you help me understand the structure here? Would it not have been simpler to keep sufficient state at the point where the attempt to allocate the larger size failed, and we decided we would shrink the size of the request, to just make the smaller allocation request which would be guaranteed to succeed because we held the heap lock at that point already? Is there a reason to give up and reattempt the smaller allocation request afresh?
>>
>> I realize you explicitly added a scope to make this re-attempt outside the scope of the locker and make the recursive call, but am trying to understand the rationale for doing so. Perhaps it's because I am missing the big picture of the work being done here from various callers to this method, but maybe you can help clarify that a bit.
>
> Thanks for your review. I'm adding a comment to clarify the recursive algorithm and the use of the secondary ShenandoahAllocationRequest argument.

ok, thanks!

-------------

PR: https://git.openjdk.org/shenandoah/pull/180

From wkemper at openjdk.org  Tue Dec 13 18:52:17 2022
From: wkemper at openjdk.org (William Kemper)
Date: Tue, 13 Dec 2022 18:52:17 GMT
Subject: RFR: Allow adjusted capacity and used regions size to be equal
Message-ID:

Fix assertion which requires adjusted capacity to be larger than the used regions size (they may be equal).
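Since the patch is a single changed line, the shape of the fix is presumably a relaxed comparison; a hedged before/after sketch follows (the variable and message names are invented for illustration, not taken from the source):

    // Before: fired even when the two quantities were merely equal.
    assert(used_regions_size < adjusted_capacity, "cannot use more than adjusted capacity");
    // After: equality is a legal state.
    assert(used_regions_size <= adjusted_capacity, "cannot use more than adjusted capacity");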
-------------

Commit messages:
 - Allow adjusted capacity and used regions size to be equal

Changes: https://git.openjdk.org/shenandoah/pull/181/files
Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=181&range=00
Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/shenandoah/pull/181.diff
Fetch: git fetch https://git.openjdk.org/shenandoah pull/181/head:pull/181

PR: https://git.openjdk.org/shenandoah/pull/181

From wkemper at openjdk.org  Tue Dec 13 18:55:35 2022
From: wkemper at openjdk.org (William Kemper)
Date: Tue, 13 Dec 2022 18:55:35 GMT
Subject: RFR: Generation sizing fixes
Message-ID: <5u8AOWojAzGbSIL0W9XpeA1hTTXFUA2Dq_crBbJ6Z9o=.4587ca91-fc93-4073-96a2-93f6e63a06ef@github.com>

Two small fixes:
 * Fix windows build
 * Need gc id for resumed old generation marking

-------------

Commit messages:
 - Fix MAX2 arguments
 - Use GCIdMark when resuming old marking

Changes: https://git.openjdk.org/shenandoah/pull/182/files
Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=182&range=00
Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/shenandoah/pull/182.diff
Fetch: git fetch https://git.openjdk.org/shenandoah pull/182/head:pull/182

PR: https://git.openjdk.org/shenandoah/pull/182

From ysr at openjdk.org  Tue Dec 13 23:24:09 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Tue, 13 Dec 2022 23:24:09 GMT
Subject: RFR: Allow adjusted capacity and used regions size to be equal
In-Reply-To:
References:
Message-ID:

On Tue, 13 Dec 2022 18:46:37 GMT, William Kemper wrote:

> Fix assertion which requires adjusted capacity to be larger than the used regions size (they may be equal).

LGTM!

-------------

Marked as reviewed by ysr (Author).

PR: https://git.openjdk.org/shenandoah/pull/181

From ysr at openjdk.org  Tue Dec 13 23:26:09 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Tue, 13 Dec 2022 23:26:09 GMT
Subject: RFR: Generation sizing fixes
In-Reply-To: <5u8AOWojAzGbSIL0W9XpeA1hTTXFUA2Dq_crBbJ6Z9o=.4587ca91-fc93-4073-96a2-93f6e63a06ef@github.com>
References: <5u8AOWojAzGbSIL0W9XpeA1hTTXFUA2Dq_crBbJ6Z9o=.4587ca91-fc93-4073-96a2-93f6e63a06ef@github.com>
Message-ID:

On Tue, 13 Dec 2022 18:48:59 GMT, William Kemper wrote:

> Two small fixes:
> * Fix windows build
> * Need gc id for resumed old generation marking

Marked as reviewed by ysr (Author).

-------------

PR: https://git.openjdk.org/shenandoah/pull/182

From kdnilsen at openjdk.org  Wed Dec 14 00:26:34 2022
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Wed, 14 Dec 2022 00:26:34 GMT
Subject: RFR: Allow adjusted capacity and used regions size to be equal
In-Reply-To:
References:
Message-ID:

On Tue, 13 Dec 2022 18:46:37 GMT, William Kemper wrote:

> Fix assertion which requires adjusted capacity to be larger than the used regions size (they may be equal).

Marked as reviewed by kdnilsen (Committer).
-------------

PR: https://git.openjdk.org/shenandoah/pull/181

From kdnilsen at openjdk.org  Wed Dec 14 00:28:45 2022
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Wed, 14 Dec 2022 00:28:45 GMT
Subject: RFR: Generation sizing fixes
In-Reply-To: <5u8AOWojAzGbSIL0W9XpeA1hTTXFUA2Dq_crBbJ6Z9o=.4587ca91-fc93-4073-96a2-93f6e63a06ef@github.com>
References: <5u8AOWojAzGbSIL0W9XpeA1hTTXFUA2Dq_crBbJ6Z9o=.4587ca91-fc93-4073-96a2-93f6e63a06ef@github.com>
Message-ID: <0Jz3hrh-_sJeoEmTFhaJJIRVLmepJatoKGBipVBD-i0=.3f24f850-06fc-48c2-bc8d-37e79a139946@github.com>

On Tue, 13 Dec 2022 18:48:59 GMT, William Kemper wrote:

> Two small fixes:
> * Fix windows build
> * Need gc id for resumed old generation marking

Marked as reviewed by kdnilsen (Committer).

-------------

PR: https://git.openjdk.org/shenandoah/pull/182

From wkemper at openjdk.org  Wed Dec 14 00:35:35 2022
From: wkemper at openjdk.org (William Kemper)
Date: Wed, 14 Dec 2022 00:35:35 GMT
Subject: RFR: Generation resizing [v3]
In-Reply-To:
References: <8Fse7IxO14Uc0eJJoLMmGXSo8XYD9Qb144mCyrMX3-g=.1d5d7258-f06d-4a03-bf78-8102f22ada7d@github.com>
Message-ID:

On Fri, 9 Dec 2022 00:55:04 GMT, Y. Srinivas Ramakrishna wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 980:
>>
>>> 978: }
>>> 979:
>>> 980: void ShenandoahGeneration::increase_capacity(size_t increment) {
>>
>> Is there some sanity check done on this elsewhere to make sure the increase/decrease makes sense? Perhaps I'll see it in the caller(s) when I get to it.
>
> I see now that you do. Would it still be worthwhile asserting here as well that bounds are respected? Might make the code more maintainable in the face of changes.

I will add asserts here.

>> src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 90:
>>
>>> 88: // allocators by taking the heap lock). The amount of capacity to move
>>> 89: // from one generation to another is controlled by YoungGenerationSizeIncrement
>>> 90: // and defaults to 20% of the heap. The minimum and maximum sizes of the
>>
>> Is the transfer delta always 20%? Wouldn't that cause oscillations about an equilibrium point at steady load? But I should read on to see how this works.
>
> I think the way you use it, it's not 20% of the heap but rather 20% of the free space in the generation that will provide the transfer delta. Maybe reword for clarity?

Yes, it's 20% of the available capacity.

-------------

PR: https://git.openjdk.org/shenandoah/pull/177

From wkemper at openjdk.org  Wed Dec 14 00:49:47 2022
From: wkemper at openjdk.org (William Kemper)
Date: Wed, 14 Dec 2022 00:49:47 GMT
Subject: RFR: Generation resizing [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 8 Dec 2022 23:32:14 GMT, Y. Srinivas Ramakrishna wrote:

>> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
>>
>>  - Merge branch 'shenandoah-master' into mmu-instrumentation
>>  - Remove vestigial lock, do not enroll periodic task while holding threads_lock
>>  - Remove unnecessary logging, clean up imports
>>  - Merge from shenandoah/master
>>  - Document the class responsible for adjusting generation sizes
>>  - Revert unnecessary change
>>  - Remove unused time between cycle tracking
>>  - Remove vestigial mmu tracker instance
>>  - Clamp adjustments to min/max when increment is too large
>>  - Adjust generation sizes from safepoint
>>  - ... and 7 more: https://git.openjdk.org/shenandoah/compare/25469283...50896e31
>
> src/hotspot/share/gc/shenandoah/mode/shenandoahGenerationalMode.cpp line 39:
>
>> 37: }
>> 38:
>> 39: SHENANDOAH_ERGO_OVERRIDE_DEFAULT(GCTimeRatio, 70);
>
> Does this translate to a GC overhead of 1/71*100% = 1.4%? I think it is a confusingly named parameter, but I'm interpreting it based on the description:
> "Adaptive size policy application time to GC time ratio"

Any time the average MMU drops below this number, it attempts to resize the generations.

-------------

PR: https://git.openjdk.org/shenandoah/pull/177

From wkemper at openjdk.org  Wed Dec 14 00:54:44 2022
From: wkemper at openjdk.org (William Kemper)
Date: Wed, 14 Dec 2022 00:54:44 GMT
Subject: RFR: Generation resizing [v4]
In-Reply-To:
References:
Message-ID: <-VY3oESa76Aaivt-cx5T9CjmLrMDsm3lOM6aSm_w3iw=.76afd677-35be-4305-91a6-3338a6935e1c@github.com>

On Thu, 8 Dec 2022 23:57:18 GMT, Y. Srinivas Ramakrishna wrote:

>> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
>>
>>  - Merge branch 'shenandoah-master' into mmu-instrumentation
>>  - Remove vestigial lock, do not enroll periodic task while holding threads_lock
>>  - Remove unnecessary logging, clean up imports
>>  - Merge from shenandoah/master
>>  - Document the class responsible for adjusting generation sizes
>>  - Revert unnecessary change
>>  - Remove unused time between cycle tracking
>>  - Remove vestigial mmu tracker instance
>>  - Clamp adjustments to min/max when increment is too large
>>  - Adjust generation sizes from safepoint
>>  - ... and 7 more: https://git.openjdk.org/shenandoah/compare/25469283...50896e31
>
> src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 96:
>
>> 94: // This is only called by the periodic thread.
>> 95: double process_time_s = process_time_seconds();
>> 96: double elapsed_process_time_s = process_time_s - _initial_process_time_s;
>
> elapsed -> delta

I prefer 'elapsed' here when dealing with time deltas.
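For readers following the arithmetic in these two review threads, here is a rough, self-contained sketch of how process-time and GC-thread-time deltas can be turned into the average MMU that is compared against GCTimeRatio. The composition below is an assumption for illustration; only the delta computation mirrors the quoted snippet. Note that with GCTimeRatio = 70, the implied target GC overhead is about 1/(70+1), roughly the 1.4% asked about above.

    // Hedged sketch: average mutator utilization over a sampling interval,
    // computed from the elapsed process time and elapsed GC-thread time.
    double average_mmu(double process_delta_s, double gc_delta_s) {
      if (process_delta_s <= 0.0) {
        return 1.0;                           // nothing has elapsed yet
      }
      double mutator_s = process_delta_s - gc_delta_s;
      return mutator_s / process_delta_s;     // fraction of time left to mutators
    }

    // Per the reply above ("any time the average MMU drops below this number"),
    // the tracker would then consider resizing generations when, for example:
    //   average_mmu(process_delta, gc_delta) * 100.0 < GCTimeRatio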
-------------

PR: https://git.openjdk.org/shenandoah/pull/177

From wkemper at openjdk.org  Wed Dec 14 00:54:46 2022
From: wkemper at openjdk.org (William Kemper)
Date: Wed, 14 Dec 2022 00:54:46 GMT
Subject: RFR: Generation resizing [v4]
In-Reply-To:
References:
Message-ID:

On Fri, 9 Dec 2022 00:12:56 GMT, Y. Srinivas Ramakrishna wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 99:
>>
>>> 97: _initial_process_time_s = process_time_s;
>>> 98: double verify_time_s = gc_thread_time_seconds();
>>> 99: double verify_elapsed = verify_time_s - _initial_verify_collector_time_s;
>>
>> elapsed -> delta
>
> Why do you use the `verify_` prefix here? I'm sure I am missing something here...

It's vestigial, I'll change it.

-------------

PR: https://git.openjdk.org/shenandoah/pull/177

From mennen at openjdk.org  Wed Dec 14 03:47:06 2022
From: mennen at openjdk.org (Michael Ennen)
Date: Wed, 14 Dec 2022 03:47:06 GMT
Subject: RFR: 8295044: Implementation of Foreign Function and Memory API (Second Preview) [v39]
In-Reply-To:
References: <-V_N0Cvh4J0vKNbBYdFcow9E8yFHRIjya8n69MpDSuY=.9626ee4d-95b6-41e4-b21e-395e79840388@github.com>
Message-ID: <91WlM45Ykemls6D5vtXZMIIqjjECQTLVuJFhTLYXq-I=.4e670ac0-f1ff-480e-b18e-cea98d01bd6f@github.com>

On Mon, 5 Dec 2022 13:46:09 GMT, Maurizio Cimadamore wrote:

>> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix Preview annotation for JEP 434
>
> Note: there are 4 tests failing in x86:
> * MemoryLayoutPrincipalTotalityTest
> * MemoryLayoutTypeRetentionTest
> * TestLargeSegmentCopy
> * TestLinker
>
> These failures are addressed in the dependent PR: https://git.openjdk.org/jdk/pull/11019, which will be integrated immediately after these changes

@mcimadamore This PR made my code in [java-vulkan](https://github.com/brcolow/java-vulkan/commit/171f167782eea538b19b60d5fa73e9f75a112f6d) much cleaner! Nice work!

-------------

PR: https://git.openjdk.org/jdk/pull/10872

From ysr at openjdk.org  Wed Dec 14 08:22:56 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Wed, 14 Dec 2022 08:22:56 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v8]
In-Reply-To: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
Message-ID:

> **Note:**
> This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.)
>
> (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extremem mentioned in the ticket.
>
> (2) Make the instrumentation available only in non-product (optimized) mode until better performance is achieved.
>
> (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above.
>
> The fix to ShenandoahNumberSeq will be separated out and made into a separate pull request on mainline.
>
> **Summary:**
> The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect and how frequently we log the data once we have gathered some experience on how we use this.
>
> **Details of files changed:**
>
> 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled.
> 2. shenandoahHeap.cpp: minor restructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats
> 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code.
> 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq
> 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above).
> 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumulative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments.
> 7. shenandoahScanRemembered.inline.hpp: As in 6, the diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as to update local variables rather than method arguments. The large diffs at (old) line 589 onwards are the git-diff'ism to do with indentation change. Delete some unused methods.
> 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default.
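To illustrate the cost model implied by item 8, here is a hedged, self-contained sketch of the diagnostic-flag guard pattern; every name other than the flag itself is a stand-in for illustration, not the patch's API:

    #include <cstdio>

    // Sketch: both the per-cluster recording and the end-of-scan logging are
    // gated on the flag, so the default (off) configuration pays only a
    // predictable branch.
    static bool ShenandoahEnableCardStats = false;  // diagnostic, default off

    struct CardStats { long dirty_cards = 0, clean_cards = 0, alternations = 0; };

    void record_cluster(CardStats& s, long dirty, long clean, long alts) {
      if (!ShenandoahEnableCardStats) return;       // near-zero cost when disabled
      s.dirty_cards += dirty;
      s.clean_cards += clean;
      s.alternations += alts;
    }

    void log_after_scan(const CardStats& s, int worker_id) {
      if (!ShenandoahEnableCardStats) return;
      std::printf("Worker %d: dirty=%ld clean=%ld alternations=%ld\n",
                  worker_id, s.dirty_cards, s.clean_cards, s.alternations);
    }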
>
> **Format of stats produced and how to interpret them: (sample)**
>
>
> [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning
> [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo:
> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ]
> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ]
> [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo:
> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ]
> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ]
> ...
>
>
> The rows represent the metric that's being tracked, and the columns are, respectively, the minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. The metrics are:
>
> - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread
> - clean_run: as above, but the length of an uninterrupted run of clean cards
> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk
> - max_dirty_run & max_clean_run: similarly, but for the maximum of each
> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned
> - dirty_scans, clean_scans: numbers of objects scanned by the closure
> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk
>
> The data above indicates that at least 75% of the chunks have no alternations at all,
> and cards are almost always mostly clean for this specific benchmark config (extremem).
>
> Comparing worker stats from worker 0 and worker 9 indicates very little difference between
> their statistics, as one might typically expect for well-balanced RS scans.
>
> **Questions:**
>
> 1. Would it make sense to also print, for example, the 1, 10, 90 and 99 percentiles for these metrics, in addition to the quartiles?
> 2. The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information?
> 3. Any suggestions for a more easily consumable format?
> 4. I welcome any other feedback on the pull request.

Y. Srinivas Ramakrishna has updated the pull request incrementally with five additional commits since the last revision:

 - Remove stubs (guarantee/false).

   This still needs formal work on merging decayed stats, but is OK to
   ignore for now because no one currently uses the decayed stats. The
   non-decayed stats also need further review and correction. So this is
   still an interim checkin. To do:
   -- print final summary at exit; consider if a periodic cumulative
      summary might be useful as well (every major collection cycle?)
   -- check correctness of merged data (ignoring decayed statistics for now)
 - Merge branch 'stats_merge' into JVM-1264
 - More merge() implementation.
   -- Need to think about merge of decaying stats in AbsSeq.
   -- Need to add tests.
 - Interim checkin of code w/beginnings of merge() support. Some
   implementations are still stubbed out and need to be written.
 - First cut at merge. More changes to come. May not build yet.

-------------

Changes:
  - all: https://git.openjdk.org/shenandoah/pull/176/files
  - new: https://git.openjdk.org/shenandoah/pull/176/files/d5b337bc..695851da

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=07
 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=06-07

Stats: 84 lines in 5 files changed: 83 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/shenandoah/pull/176.diff
Fetch: git fetch https://git.openjdk.org/shenandoah pull/176/head:pull/176

PR: https://git.openjdk.org/shenandoah/pull/176

From rkennke at openjdk.org  Wed Dec 14 15:47:00 2022
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 14 Dec 2022 15:47:00 GMT
Subject: RFR: 8291555: Replace stack-locking with fast-locking [v8]
In-Reply-To:
References:
Message-ID:

On Fri, 28 Oct 2022 09:32:58 GMT, Roman Kennke wrote:

>> This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable.
>>
>> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>>
>> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks.
>> Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When recursive locking is attempted, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
>>
>> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing it over to the contending thread.
>>
>> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change makes it possible to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR.
>>
>> ### Benchmarks
>>
>> All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better.
>>
>> #### DaCapo/AArch64
>>
>> Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking.
>> The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference?
>>
>> benchmark | baseline | fast-locking | % | size
>> -- | -- | -- | -- | --
>> avrora | 27859 | 27563 | 1.07% | large
>> batik | 20786 | 20847 | -0.29% | large
>> biojava | 27421 | 27334 | 0.32% | default
>> eclipse | 59918 | 60522 | -1.00% | large
>> fop | 3670 | 3678 | -0.22% | default
>> graphchi | 2088 | 2060 | 1.36% | default
>> h2 | 297391 | 291292 | 2.09% | huge
>> jme | 8762 | 8877 | -1.30% | default
>> jython | 18938 | 18878 | 0.32% | default
>> luindex | 1339 | 1325 | 1.06% | default
>> lusearch | 918 | 936 | -1.92% | default
>> pmd | 58291 | 58423 | -0.23% | large
>> sunflow | 32617 | 24961 | 30.67% | large
>> tomcat | 25481 | 25992 | -1.97% | large
>> tradebeans | 314640 | 311706 | 0.94% | huge
>> tradesoap | 107473 | 110246 | -2.52% | huge
>> xalan | 6047 | 5882 | 2.81% | default
>> zxing | 970 | 926 | 4.75% | default
>>
>> #### DaCapo/x86_64
>>
>> The following measurements have been taken on an Intel Xeon Scalable Processor (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above.
>>
>> benchmark | baseline | fast-locking | % | size
>> -- | -- | -- | -- | --
>> avrora | 127690 | 126749 | 0.74% | large
>> batik | 12736 | 12641 | 0.75% | large
>> biojava | 15423 | 15404 | 0.12% | default
>> eclipse | 41174 | 41498 | -0.78% | large
>> fop | 2184 | 2172 | 0.55% | default
>> graphchi | 1579 | 1560 | 1.22% | default
>> h2 | 227614 | 230040 | -1.05% | huge
>> jme | 8591 | 8398 | 2.30% | default
>> jython | 13473 | 13356 | 0.88% | default
>> luindex | 824 | 813 | 1.35% | default
>> lusearch | 962 | 968 | -0.62% | default
>> pmd | 40827 | 39654 | 2.96% | large
>> sunflow | 53362 | 43475 | 22.74% | large
>> tomcat | 27549 | 28029 | -1.71% | large
>> tradebeans | 190757 | 190994 | -0.12% | huge
>> tradesoap | 68099 | 67934 | 0.24% | huge
>> xalan | 7969 | 8178 | -2.56% | default
>> zxing | 1176 | 1148 | 2.44% | default
>>
>> #### Renaissance/AArch64
>>
>> This tests Renaissance/JMH version 0.14.1 on same machines as DaCapo above, with same JVM settings.
>>
>> benchmark | baseline | fast-locking | %
>> -- | -- | --
>> AkkaUct | 2558.832 | 2513.594 | 1.80%
>> Reactors | 14715.626 | 14311.246 | 2.83%
>> Als | 1851.485 | 1869.622 | -0.97%
>> ChiSquare | 1007.788 | 1003.165 | 0.46%
>> GaussMix | 1157.491 | 1149.969 | 0.65%
>> LogRegression | 717.772 | 733.576 | -2.15%
>> MovieLens | 7916.181 | 8002.226 | -1.08%
>> NaiveBayes | 395.296 | 386.611 | 2.25%
>> PageRank | 4294.939 | 4346.333 | -1.18%
>> FjKmeans | 496.076 | 493.873 | 0.45%
>> FutureGenetic | 2578.504 | 2589.255 | -0.42%
>> Mnemonics | 4898.886 | 4903.689 | -0.10%
>> ParMnemonics | 4260.507 | 4210.121 | 1.20%
>> Scrabble | 139.37 | 138.312 | 0.76%
>> RxScrabble | 320.114 | 322.651 | -0.79%
>> Dotty | 1056.543 | 1068.492 | -1.12%
>> ScalaDoku | 3443.117 | 3449.477 | -0.18%
>> ScalaKmeans | 259.384 | 258.648 | 0.28%
>> Philosophers | 24333.311 | 23438.22 | 3.82%
>> ScalaStmBench7 | 1102.43 | 1115.142 | -1.14%
>> FinagleChirper | 6814.192 | 6853.38 | -0.57%
>> FinagleHttp | 4762.902 | 4807.564 | -0.93%
>>
>> #### Renaissance/x86_64
>>
>> benchmark | baseline | fast-locking | %
>> -- | -- | --
>> AkkaUct | 1117.185 | 1116.425 | 0.07%
>> Reactors | 11561.354 | 11812.499 | -2.13%
>> Als | 1580.838 | 1575.318 | 0.35%
>> ChiSquare | 459.601 | 467.109 | -1.61%
>> GaussMix | 705.944 | 685.595 | 2.97%
>> LogRegression | 659.944 | 656.428 | 0.54%
>> MovieLens | 7434.303 | 7592.271 | -2.08%
>> NaiveBayes | 413.482 | 417.369 | -0.93%
>> PageRank | 3259.233 | 3276.589 | -0.53%
>> FjKmeans | 946.429 | 938.991 | 0.79%
>> FutureGenetic | 1760.672 | 1815.272 | -3.01%
>> ParMnemonics | 2016.917 | 2033.101 | -0.80%
>> Scrabble | 147.996 | 150.084 | -1.39%
>> RxScrabble | 177.755 | 177.956 | -0.11%
>> Dotty | 673.754 | 683.919 | -1.49%
>> ScalaDoku | 2193.562 | 1958.419 | 12.01%
>> ScalaKmeans | 165.376 | 168.925 | -2.10%
>> ScalaStmBench7 | 1080.187 | 1049.184 | 2.95%
>> Philosophers | 14268.449 | 13308.87 | 7.21%
>> FinagleChirper | 4722.13 | 4688.3 | 0.72%
>> FinagleHttp | 3497.241 | 3605.118 | -2.99%
>>
>> Some Renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing by running them much more often).
>>
>> I have also run another benchmark, which is a popular Java JVM benchmark, with workloads wrapped in JMH and very slightly modified to run with newer JDKs, but I won't publish the results because I am not sure about the licensing terms. They look similar to the measurements above (i.e. +/- 2%, nothing very suspicious).
>>
>> Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here.
>>
>> ### Testing
>> - [x] tier1 (x86_64, aarch64, x86_32)
>> - [x] tier2 (x86_64, aarch64)
>> - [x] tier3 (x86_64, aarch64)
>> - [x] tier4 (x86_64, aarch64)
>> - [x] jcstress 3-days -t sync -af GLOBAL (x86_64, aarch64)
>
> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits:
>
>  - Merge remote-tracking branch 'upstream/master' into fast-locking
>  - Merge remote-tracking branch 'upstream/master' into fast-locking
>  - Merge remote-tracking branch 'upstream/master' into fast-locking
>  - More RISC-V fixes
>  - Merge remote-tracking branch 'origin/fast-locking' into fast-locking
>  - RISC-V port
>  - Revert "Re-use r0 in call to unlock_object()"
>
>    This reverts commit ebbcb615a788998596f403b47b72cf133cb9de46.
> - Merge remote-tracking branch 'origin/fast-locking' into fast-locking > - Fix number of rt args to complete_monitor_locking_C, remove some comments > - Re-use r0 in call to unlock_object() > - ... and 27 more: https://git.openjdk.org/jdk/compare/4b89fce0...3f0acba4 Closing this in favour of #10907. ------------- PR: https://git.openjdk.org/jdk/pull/10590 From rkennke at openjdk.org Wed Dec 14 15:47:01 2022 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 14 Dec 2022 15:47:01 GMT Subject: Withdrawn: 8291555: Replace stack-locking with fast-locking In-Reply-To: References: Message-ID: On Thu, 6 Oct 2022 10:23:04 GMT, Roman Kennke wrote: > This change replaces the current stack-locking implementation with a fast-locking scheme that retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation. And because of the very racy nature, this turns out to be very complex and involved a variant of the inflation protocol to ensure that the object header is stable. > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations. The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors.
In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables us to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. > > ### Benchmarks > > All benchmarks are run on server-class metal machines. The JVM settings are always: `-Xmx20g -Xms20g -XX:+UseParallelGC`. All benchmarks are ms/ops, less is better. > > #### DaCapo/AArch64 > > Those measurements have been taken on a Graviton2 box with 64 CPU cores (an AWS m6g.metal instance). It is using DaCapo evaluation version, git hash 309e1fa (download file dacapo-evaluation-git+309e1fa.jar). I needed to exclude cassandra, h2o & kafka benchmarks because of incompatibility with JDK20. Benchmarks that showed results far off the baseline or showed high variance have been repeated and I am reporting results with the most bias *against* fast-locking. The sunflow benchmark is really far off the mark - the baseline run with stack-locking exhibited very high run-to-run variance and generally much worse performance, while with fast-locking the variance was very low and the results very stable between runs. I wouldn't trust that benchmark - I mean what is it actually doing that a change in locking shows >30% perf difference? > > benchmark | baseline | fast-locking | % | size > -- | -- | -- | -- | -- > avrora | 27859 | 27563 | 1.07% | large > batik | 20786 | 20847 | -0.29% | large > biojava | 27421 | 27334 | 0.32% | default > eclipse | 59918 | 60522 | -1.00% | large > fop | 3670 | 3678 | -0.22% | default > graphchi | 2088 | 2060 | 1.36% | default > h2 | 297391 | 291292 | 2.09% | huge > jme | 8762 | 8877 | -1.30% | default > jython | 18938 | 18878 | 0.32% | default > luindex | 1339 | 1325 | 1.06% | default > lusearch | 918 | 936 | -1.92% | default > pmd | 58291 | 58423 | -0.23% | large > sunflow | 32617 | 24961 | 30.67% | large > tomcat | 25481 | 25992 | -1.97% | large > tradebeans | 314640 | 311706 | 0.94% | huge > tradesoap | 107473 | 110246 | -2.52% | huge > xalan | 6047 | 5882 | 2.81% | default > zxing | 970 | 926 | 4.75% | default > > #### DaCapo/x86_64 > > The following measurements have been taken on an Intel Xeon Scalable Processor (Cascade Lake 8252C) (an AWS m5zn.metal instance). All the same settings and considerations as in the measurements above.
> > benchmark | baseline | fast-locking | % | size > -- | -- | -- | -- | -- > avrora | 127690 | 126749 | 0.74% | large > batik | 12736 | 12641 | 0.75% | large > biojava | 15423 | 15404 | 0.12% | default > eclipse | 41174 | 41498 | -0.78% | large > fop | 2184 | 2172 | 0.55% | default > graphchi | 1579 | 1560 | 1.22% | default > h2 | 227614 | 230040 | -1.05% | huge > jme | 8591 | 8398 | 2.30% | default > jython | 13473 | 13356 | 0.88% | default > luindex | 824 | 813 | 1.35% | default > lusearch | 962 | 968 | -0.62% | default > pmd | 40827 | 39654 | 2.96% | large > sunflow | 53362 | 43475 | 22.74% | large > tomcat | 27549 | 28029 | -1.71% | large > tradebeans | 190757 | 190994 | -0.12% | huge > tradesoap | 68099 | 67934 | 0.24% | huge > xalan | 7969 | 8178 | -2.56% | default > zxing | 1176 | 1148 | 2.44% | default > > #### Renaissance/AArch64 > > This tests Renaissance/JMH version 0.14.1 on the same machines as DaCapo above, with the same JVM settings. > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 2558.832 | 2513.594 | 1.80% > Reactors | 14715.626 | 14311.246 | 2.83% > Als | 1851.485 | 1869.622 | -0.97% > ChiSquare | 1007.788 | 1003.165 | 0.46% > GaussMix | 1157.491 | 1149.969 | 0.65% > LogRegression | 717.772 | 733.576 | -2.15% > MovieLens | 7916.181 | 8002.226 | -1.08% > NaiveBayes | 395.296 | 386.611 | 2.25% > PageRank | 4294.939 | 4346.333 | -1.18% > FjKmeans | 496.076 | 493.873 | 0.45% > FutureGenetic | 2578.504 | 2589.255 | -0.42% > Mnemonics | 4898.886 | 4903.689 | -0.10% > ParMnemonics | 4260.507 | 4210.121 | 1.20% > Scrabble | 139.37 | 138.312 | 0.76% > RxScrabble | 320.114 | 322.651 | -0.79% > Dotty | 1056.543 | 1068.492 | -1.12% > ScalaDoku | 3443.117 | 3449.477 | -0.18% > ScalaKmeans | 259.384 | 258.648 | 0.28% > Philosophers | 24333.311 | 23438.22 | 3.82% > ScalaStmBench7 | 1102.43 | 1115.142 | -1.14% > FinagleChirper | 6814.192 | 6853.38 | -0.57% > FinagleHttp | 4762.902 | 4807.564 | -0.93% > > #### Renaissance/x86_64 > > benchmark | baseline | fast-locking | % > -- | -- | -- | -- > AkkaUct | 1117.185 | 1116.425 | 0.07% > Reactors | 11561.354 | 11812.499 | -2.13% > Als | 1580.838 | 1575.318 | 0.35% > ChiSquare | 459.601 | 467.109 | -1.61% > GaussMix | 705.944 | 685.595 | 2.97% > LogRegression | 659.944 | 656.428 | 0.54% > MovieLens | 7434.303 | 7592.271 | -2.08% > NaiveBayes | 413.482 | 417.369 | -0.93% > PageRank | 3259.233 | 3276.589 | -0.53% > FjKmeans | 946.429 | 938.991 | 0.79% > FutureGenetic | 1760.672 | 1815.272 | -3.01% > ParMnemonics | 2016.917 | 2033.101 | -0.80% > Scrabble | 147.996 | 150.084 | -1.39% > RxScrabble | 177.755 | 177.956 | -0.11% > Dotty | 673.754 | 683.919 | -1.49% > ScalaDoku | 2193.562 | 1958.419 | 12.01% > ScalaKmeans | 165.376 | 168.925 | -2.10% > ScalaStmBench7 | 1080.187 | 1049.184 | 2.95% > Philosophers | 14268.449 | 13308.87 | 7.21% > FinagleChirper | 4722.13 | 4688.3 | 0.72% > FinagleHttp | 3497.241 | 3605.118 | -2.99% > > Some Renaissance benchmarks are missing: DecTree, DbShootout and Neo4jAnalytics are not compatible with JDK20. The remaining benchmarks show very high run-to-run variance, which I am investigating (and probably addressing by running them much more often). > > I have also run another benchmark, which is a popular Java JVM benchmark, with workloads wrapped in JMH and very slightly modified to run with newer JDKs, but I won't publish the results because I am not sure about the licensing terms. They look similar to the measurements above (i.e. +/- 2%, nothing very suspicious).
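For readers following this thread without the patch open, here is a minimal sketch of the fast-locking scheme the description above lays out: a CAS on only the two low header bits, with ownership recorded on a small per-thread lock stack. All names here (`Obj`, `LockStack`, `try_fast_lock`) and the simplified header encoding are illustrative assumptions, not the identifiers or encoding used in the actual change:

```c++
#include <atomic>
#include <cstdint>

// Illustrative header encoding; the real one lives in HotSpot's markWord.
static const uintptr_t LOCK_MASK   = 0x3;  // two low bits of the object header
static const uintptr_t UNLOCKED    = 0x1;  // neutral/unlocked bit pattern
static const uintptr_t FAST_LOCKED = 0x0;  // 00 == fast-locked, per the description

struct Obj { std::atomic<uintptr_t> header; };

// Small per-thread array of object references; per the description it
// typically stays at 3-5 elements. No overflow handling in this sketch.
struct LockStack {
  Obj* _elems[16];
  int  _top = 0;
  void push(Obj* o) { _elems[_top++] = o; }
  // The common query "does the current thread own me?" is a linear scan.
  bool contains(const Obj* o) const {
    for (int i = 0; i < _top; i++) { if (_elems[i] == o) return true; }
    return false;
  }
};

// Fast path: CAS the two low header bits from unlocked to fast-locked and
// record ownership on the current thread's lock stack.
bool try_fast_lock(Obj* o, LockStack& ls) {
  uintptr_t h = o->header.load(std::memory_order_relaxed);
  uintptr_t expected = (h & ~LOCK_MASK) | UNLOCKED;
  uintptr_t desired  = (h & ~LOCK_MASK) | FAST_LOCKED;
  if (o->header.compare_exchange_strong(expected, desired)) {
    ls.push(o);
    return true;
  }
  return false;  // already locked (incl. recursively) or contended: inflate
}
```

The property the PR argues for is visible in the sketch: the CAS touches only the low bits and ownership lives in the per-thread array, so the upper header bits stay free for Lilliput.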
> > Please let me know if you want me to run any other workloads, or, even better, run them yourself and report here. > > ### Testing > - [x] tier1 (x86_64, aarch64, x86_32) > - [x] tier2 (x86_64, aarch64) > - [x] tier3 (x86_64, aarch64) > - [x] tier4 (x86_64, aarch64) > - [x] jcstress 3-days -t sync -af GLOBAL (x86_64, aarch64) This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/10590 From andrew at openjdk.org Wed Dec 14 16:44:55 2022 From: andrew at openjdk.org (Andrew John Hughes) Date: Wed, 14 Dec 2022 16:44:55 GMT Subject: RFR: Merge jdk8u:master Message-ID: Merge jdk8u332-b02 ------------- Commit messages: - Merge jdk8u332-b02 - Merge - 8273575: memory leak in appendBootClassPath(), paths must be deallocated - 8141508: java.lang.invoke.LambdaConversionException: Invalid receiver type - 8209178: Proxied HttpsURLConnection doesn't send BODY when retrying POST request - 8273341: Update Siphash to version 1.0 - 8273229: Update OS detection code to recognize Windows Server 2022 - Added tag jdk8u332-b01 for changeset b81aa0cb6267 The merge commit only contains trivial merges, so no merge-specific webrevs have been generated. Changes: https://git.openjdk.org/shenandoah-jdk8u/pull/7/files Stats: 536 lines in 10 files changed: 502 ins; 11 del; 23 mod Patch: https://git.openjdk.org/shenandoah-jdk8u/pull/7.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk8u pull/7/head:pull/7 PR: https://git.openjdk.org/shenandoah-jdk8u/pull/7 From wkemper at openjdk.org Wed Dec 14 16:54:31 2022 From: wkemper at openjdk.org (William Kemper) Date: Wed, 14 Dec 2022 16:54:31 GMT Subject: Integrated: Generation sizing fixes In-Reply-To: <5u8AOWojAzGbSIL0W9XpeA1hTTXFUA2Dq_crBbJ6Z9o=.4587ca91-fc93-4073-96a2-93f6e63a06ef@github.com> References: <5u8AOWojAzGbSIL0W9XpeA1hTTXFUA2Dq_crBbJ6Z9o=.4587ca91-fc93-4073-96a2-93f6e63a06ef@github.com> Message-ID: <8h3ssMXXg9m4MZsp8WXOHULK2ANszaMShty_a8GSHtM=.304d95db-06a3-4045-aa52-e58ad3ee3af5@github.com> On Tue, 13 Dec 2022 18:48:59 GMT, William Kemper wrote: > Two small fixes: > * Fix windows build > * Need gc id for resumed old generation marking This pull request has now been integrated. Changeset: 7a3ebbcd Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/7a3ebbcdae1659ba9bb9be04e6419aa3b34cc8c9 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Generation sizing fixes Reviewed-by: ysr, kdnilsen ------------- PR: https://git.openjdk.org/shenandoah/pull/182 From wkemper at openjdk.org Wed Dec 14 16:54:40 2022 From: wkemper at openjdk.org (William Kemper) Date: Wed, 14 Dec 2022 16:54:40 GMT Subject: Integrated: Allow adjusted capacity and used regions size to be equal In-Reply-To: References: Message-ID: On Tue, 13 Dec 2022 18:46:37 GMT, William Kemper wrote: > Fix assertion which requires adjusted capacity to be larger than the used regions size (they may be equal). This pull request has now been integrated. Changeset: 35b26d60 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/35b26d605a61cba864458e4493bf80fe7fda31ad Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Allow adjusted capacity and used regions size to be equal Reviewed-by: ysr, kdnilsen ------------- PR: https://git.openjdk.org/shenandoah/pull/181 From ysr at openjdk.org Thu Dec 15 09:08:24 2022 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Thu, 15 Dec 2022 09:08:24 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v9] In-Reply-To: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: > **Note:** > This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.) > > (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extremem mentioned in the ticket. > > (2) Make available the instrumentation only in non-product (optimized) mode until better performance is achieved. > > (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above. > > The fix to ShenandoahNumberSeq will be separated out and made into a separate pull request on mainline. > > **Summary:** > The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with Extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect as well as how frequently we log the data once we have gathered some experience on how we use this. > > **Details of files changed:** > > 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled. > 2. shenandoahHeap.cpp: minor restructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats > 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code. > 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq > 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above). > 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumulative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments. > 7.
shenandoahScanRemembered.inline.hpp: As in 6, diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as to update local variables rather than method arguments. The large diffs at (old) line 589 onwards are the git-diff'ism to do with indentation change. Delete some unused methods. > 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default. > > **Format of stats produced and how to interpret them: (sample)** > > > [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning > [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo: > [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] > [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] > [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo: > [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] > [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] > ... > > > The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. The metrics are: > > - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread > - clean_run: as above, but the length of an uninterrupted run of clean cards > - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk > - max_dirty_run & max_clean_run: Similarly for the maximum of each.
> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned > - dirty_scans, clean_scans: numbers of objects scanned by the closure > - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk > > The data above indicates that at least 75% of the chunks have no alternations at all, > and cards are almost always mostly clean for this specific benchmark config (Extremem). > > Comparing worker stats from worker 0 and worker 9 indicates very little difference between > their statistics, as one might typically expect for well-balanced RS scans. > > **Questions:** > > 1. Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics as well, in addition to the quartiles? > 2. The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information? > 3. Any suggestions for a more easily consumable format? > 4. I welcome any other feedback on the pull request. Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: Tested and fixed some bugs; printing frequency of cumulative stats controlled by command-line option. Ready for review. ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/176/files - new: https://git.openjdk.org/shenandoah/pull/176/files/695851da..75c09268 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=08 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=07-08 Stats: 64 lines in 7 files changed: 32 ins; 6 del; 26 mod Patch: https://git.openjdk.org/shenandoah/pull/176.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/176/head:pull/176 PR: https://git.openjdk.org/shenandoah/pull/176 From ysr at openjdk.org Thu Dec 15 09:13:39 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 15 Dec 2022 09:13:39 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v10] In-Reply-To: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: > **Note:** > This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.) > > (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extremem mentioned in the ticket. > > (2) Make available the instrumentation only in non-product (optimized) mode until better performance is achieved. > > (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above. > > The fix to ShenandoahNumberSeq will be separated out and made into a separate pull request on mainline. > > **Summary:** > The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with Extremem; more data will be collected, see above).
With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect as well as how frequently we log the data once we have gathered some experience on how we use this. > > **Details of files changed:** > > 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled. > 2. shenandoahHeap.cpp: minor restructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats > 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code. > 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq > 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above). > 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumulative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments. > 7. shenandoahScanRemembered.inline.hpp: As in 6, diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as to update local variables rather than method arguments. The large diffs at (old) line 589 onwards are the git-diff'ism to do with indentation change. Delete some unused methods. > 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default.
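As an aid to reading the per-worker histograms in the sample that follows, this is roughly what tallying one chunk's card metrics amounts to. The sketch below is illustrative only (invented `CardStatsSample`/`tally_chunk` names, a plain boolean array instead of the actual card table and `HdrSeq` plumbing), not the PR's `ShenandoahCardStats` code:

```c++
#include <cstddef>
#include <algorithm>

// Illustrative per-chunk tally of some of the metrics in the log sample below.
struct CardStatsSample {
  size_t dirty_cards = 0, clean_cards = 0;
  size_t max_dirty_run = 0, max_clean_run = 0;
  size_t alternations = 0;
};

// cards[i] == true means card i in this chunk (cluster) is dirty.
CardStatsSample tally_chunk(const bool* cards, size_t n) {
  CardStatsSample s;
  size_t run = 0;
  for (size_t i = 0; i < n; i++) {
    if (i > 0 && cards[i] != cards[i - 1]) {
      s.alternations++;  // transitioned clean<->dirty or dirty<->clean
      run = 0;
    }
    run++;
    if (cards[i]) {
      s.dirty_cards++;
      s.max_dirty_run = std::max(s.max_dirty_run, run);
    } else {
      s.clean_cards++;
      s.max_clean_run = std::max(s.max_clean_run, run);
    }
  }
  // The PR reports runs and counts as percentages of the chunk size
  // before feeding them into per-worker histograms.
  return s;
}
```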
> > **Format of stats produced and how to interpret them: (sample)** > > > [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning > [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo: > [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] > [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] > [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo: > [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] > [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] > ... > > > The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. The metrics are: > > - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread > - clean_run: as above, but the length of an uninterrupted run of clean cards > - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk > - max_dirty_run & max_clean_run: Similarly for the maximum of each. > - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned > - dirty_scans, clean_scans: numbers of objects scanned by the closure > - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk > > The data above indicates that at least 75% of the chunks have no alternations at all, > and cards are almost always mostly clean for this specific benchmark config (extremem). > > Comparing worker stats from worker 0 and worker 9 indicates very little difference between > their statistics, as one might typically expect for well-balanced RS scans. > > **Questions:** > > 1. Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics as well, in addition to the quartiles? > 2.
The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information? > 3. Any suggestions for a more easily consumable format? > 4. I welcome any other feedback on the pull request. Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 49 commits: - Merge branch 'master' into JVM-1264 - Tested and fixed some bugs; printing frequency of cumulative stats controlled by command-line option. Ready for review. - Remove stubs (guarantee/false). This still needs formal work on merging decayed stats, but is OK to ignore for now because no one currently uses the decayed stats. The non-decayed stats also need further review and correction. So this is still an interim checkin. To do: -- print final summary at exit; consider if periodic cumulative summary might be useful as well (every major collection cycle?) -- check correctness of merged data (ignoring decayed statistics for now) - Merge branch 'stats_merge' into JVM-1264 - More merge() implementation. -- Need to think about merge of decaying stats in AbsSeq. -- Need to add tests. - Interim checkin of code w/beginnings of merge() support. Some implementations are still stubbed out and need to be written. - First cut at merge. More changes to come. May not build yet. - jcheck clean - Cumulative card stats separated out for scan_rs and update_refs phases; merge of per-worker stats into phase-specific cumulative stats stubbed out for now until HdrSeq::merge() is done. - Separated out stats for scan_rs and update_refs Still need to carry cumulative stats, and merge stats from each round into cumulative. The latter needs a "merge" method in NumberSeq, which will be a separate PR. - ... and 39 more: https://git.openjdk.org/shenandoah/compare/35b26d60...75933a59 ------------- Changes: https://git.openjdk.org/shenandoah/pull/176/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=09 Stats: 948 lines in 12 files changed: 578 ins; 204 del; 166 mod Patch: https://git.openjdk.org/shenandoah/pull/176.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/176/head:pull/176 PR: https://git.openjdk.org/shenandoah/pull/176 From luhenry at openjdk.org Thu Dec 15 13:27:11 2022 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 15 Dec 2022 13:27:11 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: References: Message-ID: On Wed, 16 Nov 2022 18:18:55 GMT, Claes Redestad wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Missing & 0xff in StringLatin1::hashCode > > I'm getting pulled into other tasks and would request for this to be either accepted as-is, rejected or picked up by someone else to rewrite it to something that can be accepted. > > Obviously I'm biased towards acceptance: While imperfect, it provides improved testing - both functional and performance-wise - and establishes a significantly improved benchmark for more future-proof solutions to beat. There are many ways to iteratively improve upon this solution, some of which would even simplify the implementation. But in the face of upcoming changes that might allow C2 to optimize these kinds of loops without intrinsic support I am not sure spending more time on perfecting the current patch is worth our while.
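For context on the loop under discussion in this thread: it is Java's classic `h = 31*h + v[i]` polynomial hash, and the hand-unrolling breaks the serial dependency on `h` by folding in precomputed powers of 31, which is what makes the loop amenable to pipelining and vectorization. A sketch of the transformation (written here in C++ purely for illustration; the actual change lives in the JDK's Java sources plus a C2 intrinsic):

```c++
#include <cstdint>
#include <cstddef>

// Scalar form: a serial dependency chain on h.
uint32_t hash_scalar(const uint8_t* v, size_t n) {
  uint32_t h = 0;
  for (size_t i = 0; i < n; i++) h = 31 * h + v[i];
  return h;
}

// 4-way unrolled form: h*31^4 + v[i]*31^3 + v[i+1]*31^2 + v[i+2]*31 + v[i+3].
// Same result modulo 2^32, but the four element terms are mutually independent.
uint32_t hash_unrolled(const uint8_t* v, size_t n) {
  uint32_t h = 0;
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    h = h * 923521            // 31^4
      + v[i]     * 29791      // 31^3
      + v[i + 1] * 961        // 31^2
      + v[i + 2] * 31
      + v[i + 3];
  }
  for (; i < n; i++) h = 31 * h + v[i];  // tail elements
  return h;
}
```

Both functions compute the same 32-bit value; the unrolled form merely regroups the arithmetic (31^2 = 961, 31^3 = 29791, 31^4 = 923521), which is the property the benchmarks quoted below are measuring.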
> > Rejecting it might be the reasonable thing to do, too, especially if the C2 loop optimizations @iwanowww points out might be coming around sooner rather than later. Even if that's not coming soon, the PR at hand adds a chunk of complexity for the compiler team to maintain. @cl4es @iwanowww is that change still good to go forward? What else would you like to see for it to be merged? Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/10847 From duke at openjdk.org Thu Dec 15 15:59:11 2022 From: duke at openjdk.org (Ismael Juma) Date: Thu, 15 Dec 2022 15:59:11 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: References: Message-ID: On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. >> >> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. >> >> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. >> >> With the most recent fixes the x64 intrinsic results on my workstation look like this: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ± 0.017 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ± 0.049 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ± 0.221 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ± 7.020 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ± 0.013 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ± 0.122 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ± 0.512 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ± 67.630 ns/op >> >> I.e. no measurable overhead compared to baseline even for `size == 1`. >> >> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. >> >> Benchmark for `Arrays.hashCode`: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 1.884 ± 0.013 ns/op >> ArraysHashCode.bytes 10 avgt 5 6.955 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 87.218 ± 0.595 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9419.591 ± 38.308 ns/op >> ArraysHashCode.chars 1 avgt 5 2.200 ± 0.010 ns/op >> ArraysHashCode.chars 10 avgt 5 6.935 ± 0.034 ns/op >> ArraysHashCode.chars 100 avgt 5 30.216 ± 0.134 ns/op >> ArraysHashCode.chars 10000 avgt 5 1601.629 ± 6.418 ns/op >> ArraysHashCode.ints 1 avgt 5 2.200 ± 0.007 ns/op >> ArraysHashCode.ints 10 avgt 5 6.936 ± 0.034 ns/op >> ArraysHashCode.ints 100 avgt 5 29.412 ± 0.268 ns/op >> ArraysHashCode.ints 10000 avgt 5 1610.578 ± 7.785 ns/op >> ArraysHashCode.shorts 1 avgt 5 1.885 ± 0.012 ns/op >> ArraysHashCode.shorts 10 avgt 5 6.961 ±
0.034 ns/op >> ArraysHashCode.shorts 100 avgt 5 87.095 ± 0.417 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.617 ± 50.089 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 3.213 ± 0.207 ns/op >> ArraysHashCode.bytes 10 avgt 5 8.483 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 90.315 ± 0.655 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9422.094 ± 62.402 ns/op >> ArraysHashCode.chars 1 avgt 5 3.040 ± 0.066 ns/op >> ArraysHashCode.chars 10 avgt 5 8.497 ± 0.074 ns/op >> ArraysHashCode.chars 100 avgt 5 90.074 ± 0.387 ns/op >> ArraysHashCode.chars 10000 avgt 5 9420.474 ± 41.619 ns/op >> ArraysHashCode.ints 1 avgt 5 2.827 ± 0.019 ns/op >> ArraysHashCode.ints 10 avgt 5 7.727 ± 0.043 ns/op >> ArraysHashCode.ints 100 avgt 5 89.405 ± 0.593 ns/op >> ArraysHashCode.ints 10000 avgt 5 9426.539 ± 51.308 ns/op >> ArraysHashCode.shorts 1 avgt 5 3.071 ± 0.062 ns/op >> ArraysHashCode.shorts 10 avgt 5 8.168 ± 0.049 ns/op >> ArraysHashCode.shorts 100 avgt 5 90.399 ± 0.292 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.171 ± 44.474 ns/op >> >> >> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. > > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Missing & 0xff in StringLatin1::hashCode Are the C2 loop optimizations happening any time soon? If not, it seems pretty sensible to take this very significant win for a very common path. We can always remove it once the C2 loop optimizations can achieve results that are as good. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From wkemper at openjdk.org Thu Dec 15 17:55:11 2022 From: wkemper at openjdk.org (William Kemper) Date: Thu, 15 Dec 2022 17:55:11 GMT Subject: RFR: merge openjdk/jdk:master Message-ID: Merge jdk+21-0. There is a small change in the generational mode to use a different API for iterating java threads to resolve a merge conflict. ------------- Commit messages: - Replace use of removed API - Merge tag 'jdk-21+0' into merge-jdk21-0 - 8297642: PhaseIdealLoop::only_has_infinite_loops must detect all loops that never lead to termination - 8298255: JFR provide information about dynamization of number of compiler threads - 8298383: JFR: GenerateJfrFiles.java lacks copyright header - 8298379: JFR: Some UNTIMED events only sets endTime - 8298129: Let checkpoint event sizes grow beyond u4 limit - 8297718: Make NMT free:ing protocol more granular - 8298173: GarbageCollectionNotificationContentTest test failed: no decrease in Eden usage - 8298272: Clean up ProblemList - ... and 163 more: https://git.openjdk.org/shenandoah/compare/35b26d60...63a32877 Changes: https://git.openjdk.org/shenandoah/pull/183/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=183&range=00 Stats: 81449 lines in 1217 files changed: 38340 ins; 35581 del; 7528 mod Patch: https://git.openjdk.org/shenandoah/pull/183.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/183/head:pull/183 PR: https://git.openjdk.org/shenandoah/pull/183 From ysr at openjdk.org Thu Dec 15 21:47:43 2022 From: ysr at openjdk.org (Y.
Srinivas Ramakrishna) Date: Thu, 15 Dec 2022 21:47:43 GMT Subject: RFR: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: On Thu, 15 Dec 2022 17:47:55 GMT, William Kemper wrote: > Merge jdk+21-0. There is a small change in the generational mode to use a different API for iterating java threads to resolve a merge conflict. Thanks for the sync! ------------- Marked as reviewed by ysr (Author). PR: https://git.openjdk.org/shenandoah/pull/183 From wkemper at openjdk.org Thu Dec 15 21:51:33 2022 From: wkemper at openjdk.org (William Kemper) Date: Thu, 15 Dec 2022 21:51:33 GMT Subject: Integrated: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: <2aRvlZHsG6YOBTCy2kcXWCB5t033UQhm5TuMipoZqX4=.9ff84b6d-86e7-46e8-a671-9d0aec95b50d@github.com> On Thu, 15 Dec 2022 17:47:55 GMT, William Kemper wrote: > Merge jdk+21-0. There is a small change in the generational mode to use a different API for iterating java threads to resolve a merge conflict. This pull request has now been integrated. Changeset: 3901a719 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/3901a719a212dd83a7e32c9d3f3a3d298eb6fb81 Stats: 81449 lines in 1217 files changed: 38340 ins; 35581 del; 7528 mod Merge openjdk/jdk:master Reviewed-by: ysr ------------- PR: https://git.openjdk.org/shenandoah/pull/183 From andrew at openjdk.org Fri Dec 16 00:22:38 2022 From: andrew at openjdk.org (Andrew John Hughes) Date: Fri, 16 Dec 2022 00:22:38 GMT Subject: git: openjdk/shenandoah-jdk8u: Added tag jdk8u332-b02 for changeset c84adc4e Message-ID: <382b0ba2-ede8-4017-9334-2650a39a689b@openjdk.org> Tagged by: Andrew John Hughes Date: 2022-02-08 16:47:38 +0000 Changeset: c84adc4e Author: Andrew John Hughes Date: 2022-02-05 16:34:22 +0000 URL: https://git.openjdk.org/shenandoah-jdk8u/commit/c84adc4e7624f263cd06e2df19286bbc4ed82d41 From andrew at openjdk.org Fri Dec 16 00:22:43 2022 From: andrew at openjdk.org (Andrew John Hughes) Date: Fri, 16 Dec 2022 00:22:43 GMT Subject: git: openjdk/shenandoah-jdk8u: Added tag shenandoah8u332-b02 for changeset f02bb443 Message-ID: <3425fc29-78d6-42de-a836-7585cc532604@openjdk.org> Tagged by: Andrew John Hughes Date: 2022-12-16 00:19:03 +0000 Added tag shenandoah8u332-b02 for changeset f02bb443067 Changeset: f02bb443 Author: Andrew John Hughes Date: 2022-11-16 15:32:18 +0000 URL: https://git.openjdk.org/shenandoah-jdk8u/commit/f02bb443067363151d33fc64691846afb3292eb9 From andrew at openjdk.org Fri Dec 16 00:23:16 2022 From: andrew at openjdk.org (Andrew John Hughes) Date: Fri, 16 Dec 2022 00:23:16 GMT Subject: git: openjdk/shenandoah-jdk8u: master: 8 new changesets Message-ID: Changeset: 8d5c7386 Author: Andrew John Hughes Date: 2022-02-01 19:55:42 +0000 URL: https://git.openjdk.org/shenandoah-jdk8u/commit/8d5c7386c619a2602d9731c4adbbb1b01aeb449f Added tag jdk8u332-b01 for changeset b81aa0cb6267 ! .hgtags Changeset: 4618dfdd Author: Matthias Baesken Date: 2021-09-02 11:22:49 +0000 URL: https://git.openjdk.org/shenandoah-jdk8u/commit/4618dfdda5b1d8ac0afbf7c8d5a53fa7b431ab25 8273229: Update OS detection code to recognize Windows Server 2022 Reviewed-by: alanb, dholmes ! hotspot/src/os/windows/vm/os_windows.cpp ! jdk/src/windows/native/java/lang/java_props_md.c Changeset: 83fbd1c6 Author: Coleen Phillimore Date: 2021-11-22 18:08:13 +0000 URL: https://git.openjdk.org/shenandoah-jdk8u/commit/83fbd1c6d8a60400e1140aa0b0bd00a298af0b5d 8273341: Update Siphash to version 1.0 Reviewed-by: dholmes ! hotspot/src/share/vm/classfile/altHashing.cpp ! 
hotspot/src/share/vm/classfile/altHashing.hpp Changeset: 7812e1ac Author: Julia Boes Date: 2019-11-15 11:39:02 +0000 URL: https://git.openjdk.org/shenandoah-jdk8u/commit/7812e1ac0fda027cfe12290ac73cc05c64a89ead 8209178: Proxied HttpsURLConnection doesn't send BODY when retrying POST request Preserve BODY in poster output stream before sending CONNECT request Reviewed-by: bae ! jdk/src/share/classes/sun/net/www/http/HttpClient.java + jdk/test/sun/net/www/http/HttpClient/B8209178.java Changeset: f935c7be Author: Srikanth Adayapalam Date: 2015-11-11 18:46:03 +0000 URL: https://git.openjdk.org/shenandoah-jdk8u/commit/f935c7bef2e58ea681ca3170b7833e9b1cb6a23d 8141508: java.lang.invoke.LambdaConversionException: Invalid receiver type Incorrect handling of intersection type parameter of functional interface descriptor results in call site initialization exception Reviewed-by: mcimadamore ! langtools/src/share/classes/com/sun/tools/javac/comp/LambdaToMethod.java + langtools/test/tools/javac/lambda/methodReference/IntersectionTypeReceiverTest.java Changeset: aae25adc Author: Serguei Spitsyn Date: 2021-09-15 20:00:21 +0000 URL: https://git.openjdk.org/shenandoah-jdk8u/commit/aae25adc4dccef6c55cada1a70cca0ecf1a8b641 8273575: memory leak in appendBootClassPath(), paths must be deallocated Reviewed-by: dholmes, amenkov ! jdk/src/share/instrument/InvocationAdapter.c Changeset: c84adc4e Author: Andrew John Hughes Date: 2022-02-05 16:34:22 +0000 URL: https://git.openjdk.org/shenandoah-jdk8u/commit/c84adc4e7624f263cd06e2df19286bbc4ed82d41 Merge Changeset: f02bb443 Author: Andrew John Hughes Date: 2022-11-16 15:32:18 +0000 URL: https://git.openjdk.org/shenandoah-jdk8u/commit/f02bb443067363151d33fc64691846afb3292eb9 Merge jdk8u332-b02 From iris at openjdk.org Fri Dec 16 00:24:52 2022 From: iris at openjdk.org (Iris Clark) Date: Fri, 16 Dec 2022 00:24:52 GMT Subject: Withdrawn: Merge jdk8u:master In-Reply-To: References: Message-ID: On Wed, 14 Dec 2022 16:39:37 GMT, Andrew John Hughes wrote: > Merge jdk8u332-b02 This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/shenandoah-jdk8u/pull/7 From andrew at openjdk.org Fri Dec 16 00:24:50 2022 From: andrew at openjdk.org (Andrew John Hughes) Date: Fri, 16 Dec 2022 00:24:50 GMT Subject: RFR: Merge jdk8u:master [v2] In-Reply-To: References: Message-ID: <2UqT2UwDRAIB8V54lc4esImuMyTLba9Mq_9RW17hG_4=.119d5ccd-a6fb-49c5-96bc-b3549d5bec30@github.com> > Merge jdk8u332-b02 Andrew John Hughes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
------------- Changes: - all: https://git.openjdk.org/shenandoah-jdk8u/pull/7/files - new: https://git.openjdk.org/shenandoah-jdk8u/pull/7/files/f02bb443..f02bb443 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah-jdk8u&pr=7&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah-jdk8u&pr=7&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/shenandoah-jdk8u/pull/7.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk8u pull/7/head:pull/7 PR: https://git.openjdk.org/shenandoah-jdk8u/pull/7 From andrew at openjdk.org Fri Dec 16 00:37:40 2022 From: andrew at openjdk.org (Andrew John Hughes) Date: Fri, 16 Dec 2022 00:37:40 GMT Subject: RFR: Merge jdk8u:master Message-ID: <7AiNiAh1EJtQcsRIhBIiBESeY4i2w22rAotd7dky4-g=.0c73f373-3815-49c6-a2ae-9fda87352e53@github.com> Merge jdk8u332-b03 ------------- Commit messages: - Merge jdk8u332-b03 - 8280060: The sun/rmi/server/Activation.java class use Thread.dumpStack() - 8037259: xerces update: xpointer update - 8210283: Support git as an SCM alternative in the build - Added tag jdk8u332-b02 for changeset 4eff168ecdd9 The merge commit only contains trivial merges, so no merge-specific webrevs have been generated. Changes: https://git.openjdk.org/shenandoah-jdk8u/pull/8/files Stats: 354 lines in 14 files changed: 238 ins; 27 del; 89 mod Patch: https://git.openjdk.org/shenandoah-jdk8u/pull/8.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk8u pull/8/head:pull/8 PR: https://git.openjdk.org/shenandoah-jdk8u/pull/8 From ysr at openjdk.org Fri Dec 16 03:35:47 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 16 Dec 2022 03:35:47 GMT Subject: RFR: JDK-8298597 : HdrSeq: support for a merge() method Message-ID: Merge functionality on stats (distributions) was needed for the remembered set scan that I was using in some companion work. This PR implements a first cut at that, which is sufficient for our first (and only) use case. Unfortunately, for expediency, I am deferring work on decaying statistics, as a result of which users that want decaying statistics will get incorrect results. In the short term, before I open this draft for review, I'll: - [x] add tests - [x] ensure that if a merge action has been taken on a distribution, then any attempt to access a decayed statistic causes an error - [x] open a linked ticket to take care of the decayed statistics An important goal here was to have an API that would be efficient and correct. The API shape may change when we have considered how to handle decaying statistics. ------------- Commit messages: - Safety tests for decayed stats, until implemented. - gtest for merge. - Vanilla merge test for ShenandoahNumberSeq; needs to be extended some. - Changes based on experience with uses in RS scan stats. - Merge branch 'master' into stats_merge - More merge() implementation. - Interim checkin of code w/beginnings of merge() support. Some - First cut at merge. More changes to come. May not build yet. Changes: https://git.openjdk.org/shenandoah/pull/184/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=184&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8298597 Stats: 227 lines in 5 files changed: 225 ins; 0 del; 2 mod Patch: https://git.openjdk.org/shenandoah/pull/184.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/184/head:pull/184 PR: https://git.openjdk.org/shenandoah/pull/184 From ysr at openjdk.org Fri Dec 16 03:35:51 2022 From: ysr at openjdk.org (Y.
Srinivas Ramakrishna) Date: Fri, 16 Dec 2022 03:35:51 GMT Subject: RFR: JDK-8298597 : HdrSeq: support for a merge() method In-Reply-To: References: Message-ID: On Thu, 15 Dec 2022 19:33:36 GMT, Y. Srinivas Ramakrishna wrote: > Merge functionality on stats (distributions) was needed for the remembered set scan that I was using in some companion work. This PR implements a first cut at that, which is sufficient for our first (and only) use case. > > Unfortunately, for expediency, I am deferring work on decaying statistics, as a result of which users that want decaying statistics will get incorrect results. > > In the short term, before I open this draft for review, I'll: > > - [x] add tests > - [x] ensure that if a merge action has been taken on a distribution, then any attempt to access a decayed statistic causes an error > - [x] open a linked ticket to take care of the decayed statistics > > An important goal here was to have an API that would be efficient and correct. The API shape may change when we have considered how to handle decaying statistics. Will leave these comments here in the draft PR, until the last two steps are completed and the PR opened for formal review. This PR is open for review. src/hotspot/share/gc/shenandoah/shenandoahNumberSeq.cpp line 59: > 57: if (v > 0) { > 58: mag = 0; > 59: while (v > 1) { This is a bug fix that has been independently pushed to tip. You can ignore it and it'll find its way into shenandoah in due course. src/hotspot/share/gc/shenandoah/shenandoahNumberSeq.hpp line 55: > 53: > 54: // Merge this HdrSeq into hdr2, optionally clearing this HdrSeq > 55: void merge(HdrSeq& hdr2, bool clear_this = true); The default setting here is based on the way its only current client (RS scan instrumentation) makes use of it, but I am happy to change it if reviewers feel that might be better for API hygiene reasons. src/hotspot/share/utilities/numberSeq.cpp line 124: > 122: // Decaying stats need a bit more thought > 123: assert(abs2._alpha == _alpha, "Caution: merge incompatible?"); > 124: // guarantee(false, "NYI"); This will expand into setting some breadcrumbs in `abs2`, such that any attempt to query a decayed stat from the object will result in an error. ------------- PR: https://git.openjdk.org/shenandoah/pull/184 From ysr at openjdk.org Fri Dec 16 03:40:33 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 16 Dec 2022 03:40:33 GMT Subject: RFR: JDK-8298597 : HdrSeq: support for a merge() method In-Reply-To: References: Message-ID: On Thu, 15 Dec 2022 19:33:36 GMT, Y. Srinivas Ramakrishna wrote: > Merge functionality on stats (distributions) was needed for the remembered set scan that I was using in some companion work. This PR implements a first cut at that, which is sufficient for our first (and only) use case. > > Unfortunately, for expediency, I am deferring work on decaying statistics, as a result of which users that want decaying statistics will get NaNs instead (or trigger guarantees). > > In the short term, before I open this draft for review, I'll: > > - [x] add tests > - [x] ensure that if a merge action has been taken on a distribution, then any attempt to access a decayed statistic causes an error > - [x] open a linked ticket to take care of the decayed statistics > > An important goal here was to have an API that would be efficient and correct. The API shape may change when we have considered how to handle decaying statistics.
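To clarify the merge semantics under review here: for plain (non-decaying) statistics the merge is exact, since histogram buckets and the count/sum/sum-of-squares accumulators simply add, whereas exponentially decayed statistics have no exact closed-form combination, which is why they are deferred to the linked ticket. An illustrative sketch with assumed field names (not HdrSeq's real internals):

```c++
#include <cstddef>

// Illustrative histogram-backed sequence; HdrSeq's actual layout differs.
struct SeqSketch {
  static const int BUCKETS = 64;
  size_t _hist[BUCKETS] = {};   // value histogram
  int    _num = 0;              // number of samples
  double _sum = 0.0, _sum_of_squares = 0.0;

  // Merge 'this' into 'other', optionally clearing 'this' (mirroring the
  // merge(HdrSeq&, bool clear_this = true) signature quoted above).
  void merge_into(SeqSketch& other, bool clear_this = true) {
    for (int i = 0; i < BUCKETS; i++) other._hist[i] += _hist[i];
    other._num            += _num;   // count, sum and sum-of-squares add
    other._sum            += _sum;   // exactly, so avg/variance stay exact
    other._sum_of_squares += _sum_of_squares;
    // Decayed (EWMA-style) statistics are intentionally NOT merged: two
    // independently decayed averages have no exact combination, so the PR
    // guards those accessors after a merge instead.
    if (clear_this) *this = SeqSketch();
  }
};
```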
test/hotspot/gtest/gc/shenandoah/test_shenandoahNumberSeq.cpp line 1: > 1: /* An earlier version of this test is in tip for an earlier bug fix. I am happy to consult if there is any confusion during a merge from tip. In this specific case, the contents of this file should take precedence. ------------- PR: https://git.openjdk.org/shenandoah/pull/184 From ysr at openjdk.org Fri Dec 16 03:56:17 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 16 Dec 2022 03:56:17 GMT Subject: RFR: JDK-8298597 : HdrSeq: support for a merge() method [v2] In-Reply-To: References: Message-ID: > Merge functionality on stats (distributions) was needed for the remembered set scan that I was using in some companion work. This PR implements a first cut at that, which is sufficient for our first (and only) use case. > > Unfortunately, for expediency, I am deferring work on decaying statistics, as a result of which users that want decaying statistics will get NaNs instead (or trigger guarantees). > > In the short term, before I open this draft for review, I'll: > > - [x] add tests > - [x] ensure that if a merge action has been taken on a distribution, then any attempt to access a decayed statistic causes an error > - [x] open a linked ticket to take care of the decayed statistics > > An important goal here was to have an API that would be efficient and correct. The API shape may change when we have considered how to handle decaying statistics. Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: - Merge branch 'master' into stats_merge - Safety tests for decayed stats, until implemented. - gtest for merge. - Vanilla merge test for ShenandoahNumberSeq; needs to be extended some. - Changes based on experience with uses in RS scan stats. Fixed some bugs. -- We still need to implement a few vanilla tests for the merge method. -- Planning to defer the work on decayed stats (which will be delivered separately in a lower-priority sibling ticket) - Merge branch 'master' into stats_merge - More merge() implementation. -- Need to think about merge of decaying stats in AbsSeq. -- Need to add tests. - Interim checkin of code w/beginnings of merge() support. Some implementations are still stubbed out and need to be written. - First cut at merge. More changes to come. May not build yet. ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/184/files - new: https://git.openjdk.org/shenandoah/pull/184/files/f2263402..06c7e983 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=184&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=184&range=00-01 Stats: 81449 lines in 1217 files changed: 38340 ins; 35581 del; 7528 mod Patch: https://git.openjdk.org/shenandoah/pull/184.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/184/head:pull/184 PR: https://git.openjdk.org/shenandoah/pull/184 From ysr at openjdk.org Fri Dec 16 04:18:14 2022 From: ysr at openjdk.org (Y.
Srinivas Ramakrishna) Date: Fri, 16 Dec 2022 04:18:14 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v11] In-Reply-To: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: <6zlIIGLrIdEUYC9JCBaDBPxCxgYYNKJtlsolG2BY6VI=.97d1e26b-502b-46bc-a40b-2b77d111f4fa@github.com> > **Note:** > This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.) > > (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extremem mentioned in the ticket. > > (2) Make available the instrumentation only in non-product (optimized) mode until better performance is achieved. > > (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above. > > The fix to ShenandoahNumberSeq will be separated out and made into a separate pull request on mainline. > > **Summary:** > The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect as well as how frequently we log the data once we have gathered some experience on how we use this. > > **Details of files changed:** > > 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled. > 2. shenandoahHeap.cpp: minor restructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats > 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code. > 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq > 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above). > 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumulative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments. > 7.
shenandoahScanRemembered.inline.hpp: As in 6, diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as to update local variables rather than method arguments. The large diffs at (old) line 589 onwards are the git-diff'ism to do with indentation change. Delete some unused methods. > 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default. > > **Format of stats produced and how to interpret them: (sample)** > > > [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning > [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo: > [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] > [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] > [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo: > [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] > [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] > ... > > > The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. The metrics are: > > - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread > - clean_run: as above, but the length of an uninterrupted run of clean cards > - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk > - max_dirty_run & max_clean_run: Similarly for the maximum of each.
> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned > - dirty_scans, clean_scans: numbers of objects scanned by the closure > - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk > > The data above indicates that at least 75% of the chunks have no alternations at all, > and cards are almost always mostly clean for this specific benchmark config (extremem). > > Comparing worker stats from worker 0 and worker 9 indicates very little difference between > their statistics, as one might typically expect for well-balanced RS scans. > > **Questions:** > > 1. Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics as well, in addition to the quartiles? > 2. The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information? > 3. Any suggestions for a more easily consumable format? > 4. I welcome any other feedback on the pull request. Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 50 commits: - Merge branch 'master' into JVM-1264 - Merge branch 'master' into JVM-1264 - Tested and fixed some bugs; printing frequency of cumulative stats controlled by command-line option. Ready for review. - Remove stubs (guarantee/false). This still needs formal work on merging decayed stats, but is OK to ignore for now because no one currently uses the decayed stats. The non-decayed stats also need further review and correction. So this is still an interim checkin. To do: -- print final summary at exit; consider if periodic cumulative summary might be useful as well (Every major collection cycles?) -- check correctness of merged data (ignoring decayed statistics for now) - Merge branch 'stats_merge' into JVM-1264 - More merge() implementation. -- Need to think about merge of decaying stats in AbsSeq. -- Need to add tests. - Interim checkin of code w/beginnings of merge() support. Some implementations are still stubbed out and need to be written. - First cut at merge. More changes to come. May not build yet. - jcheck clean - Cumulative card stats separated out for scan_rs and update_refs phases; merge of per-worker stats into phase-specific cumulative stats stubbed out for now until HdrSeq::merge() is done. - ... 
and 40 more: https://git.openjdk.org/shenandoah/compare/3901a719...616547d6 ------------- Changes: https://git.openjdk.org/shenandoah/pull/176/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=10 Stats: 948 lines in 12 files changed: 578 ins; 204 del; 166 mod Patch: https://git.openjdk.org/shenandoah/pull/176.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/176/head:pull/176 PR: https://git.openjdk.org/shenandoah/pull/176 From wkemper at openjdk.org Fri Dec 16 18:26:21 2022 From: wkemper at openjdk.org (William Kemper) Date: Fri, 16 Dec 2022 18:26:21 GMT Subject: RFR: Use CardTable::card_size_in_words rather than hard coded constant Message-ID: Calculation assumed 64 words per card, which does not hold on platforms with 32-bit words ------------- Commit messages: - Use CardTable::card_size_in_words rather than hard coded constant - Add more detail to assertion message Changes: https://git.openjdk.org/shenandoah/pull/186/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=186&range=00 Stats: 17 lines in 2 files changed: 6 ins; 4 del; 7 mod Patch: https://git.openjdk.org/shenandoah/pull/186.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/186/head:pull/186 PR: https://git.openjdk.org/shenandoah/pull/186 From ysr at openjdk.org Fri Dec 16 19:07:28 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 16 Dec 2022 19:07:28 GMT Subject: RFR: [Redirect to new dependent version, see below: https://github.com/ysramakrishna/shenandoah/pull/1 ] [v11] In-Reply-To: <6zlIIGLrIdEUYC9JCBaDBPxCxgYYNKJtlsolG2BY6VI=.97d1e26b-502b-46bc-a40b-2b77d111f4fa@github.com> References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> <6zlIIGLrIdEUYC9JCBaDBPxCxgYYNKJtlsolG2BY6VI=.97d1e26b-502b-46bc-a40b-2b77d111f4fa@github.com> Message-ID: On Fri, 16 Dec 2022 04:18:14 GMT, Y. Srinivas Ramakrishna wrote: >> **Note:** >> This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.) >> >> (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extremem mentioned in the ticket. >> >> (2) Make available the instrumentation only in non-product (optimized) mode until better performance is achieved. >> >> (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above. >> >> The fix to ShenandoahNumberSeq will be separated out and made into a separate pull request on mainline. >> >> **Summary:** >> The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well.
Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect as well as how frequently we log the data once we have gathered some experience on how we use this. >> >> **Details of files changed:** >> >> 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled. >> 2. shenandoahHeap.cpp: minor restructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats >> 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code. >> 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq >> 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above). >> 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumulative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments. >> 7. shenandoahScanRemembered.inline.hpp: As in 6, diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as to update local variables rather than method arguments. The large diffs at (old) line 589 onwards are the git-diff'ism to do with indentation change. Delete some unused methods. >> 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default.
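As an aside for readers following the card-stats discussion: the per-cluster quantities tracked here (dirty/clean card counts, maximum run lengths, alternations) can be computed in a single pass over a cluster's card states, as the sample output and metric definitions quoted next illustrate. The following self-contained sketch is illustrative only; `Card`, `ClusterStats`, and `scan_cluster` are invented names and not the actual `ShenandoahCardStats` code, which additionally normalizes the counts to percentages of the cluster.

#include <cstdint>
#include <cstdio>

// Illustrative only: single-pass computation of per-cluster card stats
// over a toy card array. Not the real process_clusters() instrumentation.
enum Card : uint8_t { CLEAN = 0, DIRTY = 1 };

struct ClusterStats {
  int dirty_cards = 0, clean_cards = 0;
  int max_dirty_run = 0, max_clean_run = 0;
  int alternations = 0;
};

ClusterStats scan_cluster(const Card* cards, int n) {
  ClusterStats s;
  int run = 0;
  for (int i = 0; i < n; i++) {
    if (cards[i] == DIRTY) s.dirty_cards++; else s.clean_cards++;
    if (i > 0 && cards[i] != cards[i - 1]) {
      s.alternations++;  // a clean<->dirty transition ends the current run
      run = 0;
    }
    run++;
    if (cards[i] == DIRTY) { if (run > s.max_dirty_run) s.max_dirty_run = run; }
    else                   { if (run > s.max_clean_run) s.max_clean_run = run; }
  }
  return s;
}

int main() {
  Card c[8] = {DIRTY, DIRTY, CLEAN, CLEAN, CLEAN, DIRTY, CLEAN, CLEAN};
  ClusterStats s = scan_cluster(c, 8);
  printf("dirty=%d clean=%d max_dirty_run=%d max_clean_run=%d alt=%d\n",
         s.dirty_cards, s.clean_cards, s.max_dirty_run, s.max_clean_run,
         s.alternations);  // dirty=3 clean=5 max_dirty_run=2 max_clean_run=3 alt=3
  return 0;
}

In the real collector the per-cluster numbers feed per-worker histograms, which is where the HdrSeq::merge() support discussed in the companion thread comes in.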
>> >> **Format of stats produced and how to interpret them: (sample)** >> >> >> [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning >> [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo: >> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] >> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] >> [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo: >> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] >> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] >> ... >> >> >> The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. The metrics are: >> >> - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread >> - clean_run: as above, but the length of an uninterrupted run of clean cards >> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk >> - max_dirty_run & max_clean_run: Similarly for the maximum of each. >> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned >> - dirty_scans, clean_scans: numbers of objects scanned by the closure >> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk >> >> The data above indicates that at least 75% of the chunks have no alternations at all, >> and cards are almost always mostly clean for this specific benchmark config (extremem). >> >> Comparing worker stats from worker 0 and worker 9 indicates very little difference between >> their statistics, as one might typically expect for well-balanced RS scans. >> >> **Questions:** >> >> 1.
Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics as well, in addition to the quartiles? >> 2. The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information? >> 3. Any suggestions for a more easily consumable format? >> 4. I welcome any other feedback on the pull request. > > Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 50 commits: > > - Merge branch 'master' into JVM-1264 > - Merge branch 'master' into JVM-1264 > - Tested and fixed some bugs; printing frequency of cumulative stats controlled by > command-line option. > > Ready for review. > - Remove stubs (guarantee/false). > This still needs formal work on merging decayed stats, but is OK to > ignore for now because no one currently uses the decayed stats. The > non-decayed stats also need further review and correction. So this is > still an interim checkin. > > To do: > -- print final summary at exit; consider if periodic cumulative summary might be > useful as well (Every major collection cycles?) > -- check correctness of merged data (ignoring decayed statistics for > now) > - Merge branch 'stats_merge' into JVM-1264 > - More merge() implementation. > -- Need to think about merge of decaying stats in AbsSeq. > -- Need to add tests. > - Interim checkin of code w/beginnings of merge() support. Some > implementations are still stubbed out and need to be written. > - First cut at merge. More changes to come. May not build yet. > - jcheck clean > - Cumulative card stats separated out for scan_rs and update_refs phases; > merge of per-worker stats into phase-specific cumulative stats stubbed > out for now until HdrSeq::merge() is done. > - ... and 40 more: https://git.openjdk.org/shenandoah/compare/3901a719...616547d6 Redirect to a version of these changes that is published as a dependent PR: https://github.com/ysramakrishna/shenandoah/pull/1 ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From wkemper at openjdk.org Fri Dec 16 19:44:58 2022 From: wkemper at openjdk.org (William Kemper) Date: Fri, 16 Dec 2022 19:44:58 GMT Subject: RFR: Use CardTable::card_size_in_words rather than hard coded constant [v2] In-Reply-To: References: Message-ID: > Calculation assumed 64 words per card, which does not hold on platforms with 32-bit words William Kemper has updated the pull request incrementally with one additional commit since the last revision: Fix warning ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/186/files - new: https://git.openjdk.org/shenandoah/pull/186/files/0d87f14a..521c1491 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=186&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=186&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/shenandoah/pull/186.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/186/head:pull/186 PR: https://git.openjdk.org/shenandoah/pull/186 From ysr at openjdk.org Fri Dec 16 20:05:24 2022 From: ysr at openjdk.org (Y.
Srinivas Ramakrishna) Date: Fri, 16 Dec 2022 20:05:24 GMT Subject: Withdrawn: [Redirect to new dependent version, see below: https://github.com/ysramakrishna/shenandoah/pull/1 ] In-Reply-To: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: On Thu, 1 Dec 2022 19:55:45 GMT, Y. Srinivas Ramakrishna wrote: > **Note:** > This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.) > > (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extremem mentioned in the ticket. > > (2) Make available the instrumentation only in non-product (optimized) mode until better performance is achieved. > > (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above. > > The fix to ShenandoahNumberSeq will be separated out and made into a separate pull request on mainline. > > **Summary:** > The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect as well as how frequently we log the data once we have gathered some experience on how we use this. > > **Details of files changed:** > > 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled. > 2. shenandoahHeap.cpp: minor restructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats > 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code. > 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq > 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above). > 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumulative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments. > 7.
shenandoahScanRemembered.inline.hpp: As in 6, diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as to update local variables rather than method arguments. The large diffs at (old) line 589 onwards are the git-diff'ism to do with indentation change. Delete some unused methods. > 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default. > > **Format of stats produced and how to interpret them: (sample)** > > > [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning > [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo: > [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] > [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] > [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo: > [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] > [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ] > [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] > [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] > ... > > > The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. The metrics are: > > - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread > - clean_run: as above, but the length of an uninterrupted run of clean cards > - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk > - max_dirty_run & max_clean_run: Similarly for the maximum of each.
> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned > - dirty_scans, clean_scans: numbers of objects scanned by the closure > - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk > > The data above indicates that at least 75% of the chunks have no alternations at all, > and cards are almost always mostly clean for this specific benchmark config (extremem). > > Comparing worker stats from worker 0 and worker 9 indicates very little difference between > their statistics, as one might typically expect for well-balanced RS scans. > > **Questions:** > > 1. Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics as well, in addition to the quartiles? > 2. The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information? > 3. Any suggestions for a more easily consumable format? > 4. I welcome any other feedback on the pull request. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From ysr at openjdk.org Fri Dec 16 20:05:24 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Fri, 16 Dec 2022 20:05:24 GMT Subject: RFR: [Redirect to new dependent version, see below: https://github.com/ysramakrishna/shenandoah/pull/1 ] [v11] In-Reply-To: <6zlIIGLrIdEUYC9JCBaDBPxCxgYYNKJtlsolG2BY6VI=.97d1e26b-502b-46bc-a40b-2b77d111f4fa@github.com> References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> <6zlIIGLrIdEUYC9JCBaDBPxCxgYYNKJtlsolG2BY6VI=.97d1e26b-502b-46bc-a40b-2b77d111f4fa@github.com> Message-ID: On Fri, 16 Dec 2022 04:18:14 GMT, Y. Srinivas Ramakrishna wrote: >> **Note:** >> This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.) >> >> (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extremem mentioned in the ticket. >> >> (2) Make available the instrumentation only in non-product (optimized) mode until better performance is achieved. >> >> (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above. >> >> The fix to ShenandoahNumberSeq will be separated out and made into a separate pull request on mainline. >> >> **Summary:** >> The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan.
I'm happy to refine both the stats that we collect as well as how frequently we log the data once we have gathered some experience on how we use this. >> >> **Details of files changed:** >> >> 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled. >> 2. shenandoahHeap.cpp: minor restructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats >> 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code. >> 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq >> 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above). >> 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumulative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments. >> 7. shenandoahScanRemembered.inline.hpp: As in 6, diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as to update local variables rather than method arguments. The large diffs at (old) line 589 onwards are the git-diff'ism to do with indentation change. Delete some unused methods. >> 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default.
>> >> **Format of stats produced and how to interpret them: (sample)** >> >> >> [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning >> [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo: >> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] >> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] >> [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo: >> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] >> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] >> ... >> >> >> The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. The metrics are: >> >> - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread >> - clean_run: as above, but the length of an uninterrupted run of clean cards >> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk >> - max_dirty_run & max_clean_run: Similarly for the maximum of each. >> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned >> - dirty_scans, clean_scans: numbers of objects scanned by the closure >> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk >> >> The data above indicates that at least 75% of the chunks have no alternations at all, >> and cards are almost always mostly clean for this specific benchmark config (extremem). >> >> Comparing worker stats from worker 0 and worker 9 indicates very little difference between >> their statistics, as one might typically expect for well-balanced RS scans. >> >> **Questions:** >> >> 1.
Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics as well, in addition to the quartiles? >> 2. The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information? >> 3. Any suggestions for a more easily consumable format? >> 4. I welcome any other feedback on the pull request. > > Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 50 commits: > > - Merge branch 'master' into JVM-1264 > - Merge branch 'master' into JVM-1264 > - Tested and fixed some bugs; printing frequency of cumulative stats controlled by > command-line option. > > Ready for review. > - Remove stubs (guarantee/false). > This still needs formal work on merging decayed stats, but is OK to > ignore for now because no one currently uses the decayed stats. The > non-decayed stats also need further review and correction. So this is > still an interim checkin. > > To do: > -- print final summary at exit; consider if periodic cumulative summary might be > useful as well (Every major collection cycles?) > -- check correctness of merged data (ignoring decayed statistics for > now) > - Merge branch 'stats_merge' into JVM-1264 > - More merge() implementation. > -- Need to think about merge of decaying stats in AbsSeq. > -- Need to add tests. > - Interim checkin of code w/beginnings of merge() support. Some > implementations are still stubbed out and need to be written. > - First cut at merge. More changes to come. May not build yet. > - jcheck clean > - Cumulative card stats separated out for scan_rs and update_refs phases; > merge of per-worker stats into phase-specific cumulative stats stubbed > out for now until HdrSeq::merge() is done. > - ... and 40 more: https://git.openjdk.org/shenandoah/compare/3901a719...616547d6 Closing in favor of https://github.com/ysramakrishna/shenandoah/pull/1 ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From sviswanathan at openjdk.org Fri Dec 16 21:43:56 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 16 Dec 2022 21:43:56 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: References: Message-ID: <35kCBSJh0N-kNpJyUDG7jr7p0aejbNDnVbZw6XdxZlM=.f0c1314e-707d-407f-b7bf-ea016d6800f1@github.com> On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. >> >> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. >> >> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. 
>> >> With the most recent fixes the x64 intrinsic results on my workstation look like this: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ± 0.017 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ± 0.049 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ± 0.221 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ± 7.020 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ± 0.013 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ± 0.122 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ± 0.512 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ± 67.630 ns/op >> >> I.e. no measurable overhead compared to baseline even for `size == 1`. >> >> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. >> >> Benchmark for `Arrays.hashCode`: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 1.884 ± 0.013 ns/op >> ArraysHashCode.bytes 10 avgt 5 6.955 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 87.218 ± 0.595 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9419.591 ± 38.308 ns/op >> ArraysHashCode.chars 1 avgt 5 2.200 ± 0.010 ns/op >> ArraysHashCode.chars 10 avgt 5 6.935 ± 0.034 ns/op >> ArraysHashCode.chars 100 avgt 5 30.216 ± 0.134 ns/op >> ArraysHashCode.chars 10000 avgt 5 1601.629 ± 6.418 ns/op >> ArraysHashCode.ints 1 avgt 5 2.200 ± 0.007 ns/op >> ArraysHashCode.ints 10 avgt 5 6.936 ± 0.034 ns/op >> ArraysHashCode.ints 100 avgt 5 29.412 ± 0.268 ns/op >> ArraysHashCode.ints 10000 avgt 5 1610.578 ± 7.785 ns/op >> ArraysHashCode.shorts 1 avgt 5 1.885 ± 0.012 ns/op >> ArraysHashCode.shorts 10 avgt 5 6.961 ± 0.034 ns/op >> ArraysHashCode.shorts 100 avgt 5 87.095 ± 0.417 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.617 ± 50.089 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 3.213 ± 0.207 ns/op >> ArraysHashCode.bytes 10 avgt 5 8.483 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 90.315 ± 0.655 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9422.094 ± 62.402 ns/op >> ArraysHashCode.chars 1 avgt 5 3.040 ± 0.066 ns/op >> ArraysHashCode.chars 10 avgt 5 8.497 ± 0.074 ns/op >> ArraysHashCode.chars 100 avgt 5 90.074 ± 0.387 ns/op >> ArraysHashCode.chars 10000 avgt 5 9420.474 ± 41.619 ns/op >> ArraysHashCode.ints 1 avgt 5 2.827 ± 0.019 ns/op >> ArraysHashCode.ints 10 avgt 5 7.727 ± 0.043 ns/op >> ArraysHashCode.ints 100 avgt 5 89.405 ± 0.593 ns/op >> ArraysHashCode.ints 10000 avgt 5 9426.539 ± 51.308 ns/op >> ArraysHashCode.shorts 1 avgt 5 3.071 ± 0.062 ns/op >> ArraysHashCode.shorts 10 avgt 5 8.168 ± 0.049 ns/op >> ArraysHashCode.shorts 100 avgt 5 90.399 ± 0.292 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.171 ± 44.474 ns/op >> >> >> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.
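For context on what is being unrolled in the benchmarks above: the scalar hash loop `h = 31 * h + (a[i] & 0xff)` can be hand-unrolled so that several elements are folded in per iteration using precomputed powers of 31, which is the shape that lends itself to vectorization (compare the `_arrays_hashcode_powers_of_31` table further down in this thread). What follows is a minimal standalone sketch of the transformation, not the actual JDK or HotSpot code; it is written in C++ with wrap-around unsigned arithmetic standing in for Java's int overflow semantics.

#include <cstdint>
#include <cstdio>
#include <cstring>

// Illustrative only: 4-way hand-unrolled polynomial hash, equivalent to
// folding one element at a time with h = 31*h + a[i].
static uint32_t poly_hash(const uint8_t* a, size_t n) {
  const uint32_t p2 = 31u * 31u, p3 = p2 * 31u, p4 = p3 * 31u;
  uint32_t h = 0;
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {  // body: four elements folded in per step
    h = p4 * h + p3 * a[i] + p2 * a[i + 1] + 31u * a[i + 2] + a[i + 3];
  }
  for (; i < n; i++) {          // tail: remaining 0..3 elements
    h = 31u * h + a[i];
  }
  return h;
}

int main() {
  const char* s = "hello, world";
  size_t n = strlen(s);
  uint32_t expected = 0;        // reference: the plain scalar loop
  for (size_t i = 0; i < n; i++) expected = 31u * expected + (uint8_t)s[i];
  printf("%s\n", poly_hash((const uint8_t*)s, n) == expected ? "match" : "mismatch");
  return 0;
}

The identity behind the unroll: applying h = 31h + x four times gives 31^4*h + 31^3*x0 + 31^2*x1 + 31*x2 + x3, so the per-iteration work becomes independent multiply-adds that a SIMD unit can evaluate in parallel.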
> > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Missing & 0xff in StringLatin1::hashCode src/java.base/share/classes/java/lang/StringUTF16.java line 418: > 416: return 0; > 417: } else { > 418: return ArraysSupport.vectorizedHashCode(value, ArraysSupport.UTF16); Special case for 1 missing here. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From sviswanathan at openjdk.org Fri Dec 16 23:29:57 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 16 Dec 2022 23:29:57 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: References: Message-ID: On Sun, 13 Nov 2022 20:57:44 GMT, Claes Redestad wrote: >> src/hotspot/cpu/x86/x86_64.ad line 12073: >> >>> 12071: legRegD tmp_vec13, rRegI tmp1, rRegI tmp2, rRegI tmp3, rFlagsReg cr) >>> 12072: %{ >>> 12073: predicate(UseAVX >= 2 && ((VectorizedHashCodeNode*)n)->mode() == VectorizedHashCodeNode::LATIN1); >> >> If you represent `VectorizedHashCodeNode::mode()` as an input, it would allow abstracting over supported modes and come up with a single AD instruction.
Take a look at `VectorMaskCmp` for an example (not a perfect one though since it has both _predicate member and constant input which is redundant). > > Thanks for the pointer, I'll check it out! I agree with Vladimir: adding mode as another input will help. Please take a look at the RoundDoubleModeNode. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From sviswanathan at openjdk.org Fri Dec 16 23:29:55 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 16 Dec 2022 23:29:55 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: References: Message-ID: <_2HYrbXOe6zVuLXHywoy_AjCcGMYR266BcwKUZEA5fs=.1e6f7640-7580-4ff3-ace6-f18f27efbb23@github.com> On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. >> >> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. >> >> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. >> >> With the most recent fixes the x64 intrinsic results on my workstation look like this: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ± 0.017 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ± 0.049 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ± 0.221 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ± 7.020 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ± 0.013 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ± 0.122 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ± 0.512 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ± 67.630 ns/op >> >> I.e. no measurable overhead compared to baseline even for `size == 1`. >> >> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. >> >> Benchmark for `Arrays.hashCode`: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 1.884 ± 0.013 ns/op >> ArraysHashCode.bytes 10 avgt 5 6.955 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 87.218 ± 0.595 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9419.591 ± 38.308 ns/op >> ArraysHashCode.chars 1 avgt 5 2.200 ± 0.010 ns/op >> ArraysHashCode.chars 10 avgt 5 6.935 ± 0.034 ns/op >> ArraysHashCode.chars 100 avgt 5 30.216 ± 0.134 ns/op >> ArraysHashCode.chars 10000 avgt 5 1601.629 ± 6.418 ns/op >> ArraysHashCode.ints 1 avgt 5 2.200 ± 0.007 ns/op >> ArraysHashCode.ints 10 avgt 5 6.936 ± 0.034 ns/op >> ArraysHashCode.ints 100 avgt 5 29.412 ± 0.268 ns/op >> ArraysHashCode.ints 10000 avgt 5 1610.578 ± 7.785 ns/op >> ArraysHashCode.shorts 1 avgt 5 1.885 ±
0.012 ns/op >> ArraysHashCode.shorts 10 avgt 5 6.961 ± 0.034 ns/op >> ArraysHashCode.shorts 100 avgt 5 87.095 ± 0.417 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.617 ± 50.089 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 3.213 ± 0.207 ns/op >> ArraysHashCode.bytes 10 avgt 5 8.483 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 90.315 ± 0.655 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9422.094 ± 62.402 ns/op >> ArraysHashCode.chars 1 avgt 5 3.040 ± 0.066 ns/op >> ArraysHashCode.chars 10 avgt 5 8.497 ± 0.074 ns/op >> ArraysHashCode.chars 100 avgt 5 90.074 ± 0.387 ns/op >> ArraysHashCode.chars 10000 avgt 5 9420.474 ± 41.619 ns/op >> ArraysHashCode.ints 1 avgt 5 2.827 ± 0.019 ns/op >> ArraysHashCode.ints 10 avgt 5 7.727 ± 0.043 ns/op >> ArraysHashCode.ints 100 avgt 5 89.405 ± 0.593 ns/op >> ArraysHashCode.ints 10000 avgt 5 9426.539 ± 51.308 ns/op >> ArraysHashCode.shorts 1 avgt 5 3.071 ± 0.062 ns/op >> ArraysHashCode.shorts 10 avgt 5 8.168 ± 0.049 ns/op >> ArraysHashCode.shorts 100 avgt 5 90.399 ± 0.292 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.171 ± 44.474 ns/op >> >> >> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. > > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Missing & 0xff in StringLatin1::hashCode src/hotspot/cpu/x86/stubRoutines_x86.cpp line 230: > 228: #endif // _LP64 > 229: > 230: jint StubRoutines::x86::_arrays_hashcode_powers_of_31[] = This should be declared only for LP64. src/hotspot/cpu/x86/vm_version_x86.cpp line 1671: > 1669: } > 1670: if (UseAVX >= 2) { > 1671: FLAG_SET_ERGO_IF_DEFAULT(UseVectorizedHashCodeIntrinsic, true); This could be just FLAG_SET_DEFAULT instead of FLAG_SET_ERGO_IF_DEFAULT. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From duke at openjdk.org Sat Dec 17 06:07:56 2022 From: duke at openjdk.org (duke) Date: Sat, 17 Dec 2022 06:07:56 GMT Subject: Withdrawn: 8294954: Remove superfluous ResourceMarks when using LogStream In-Reply-To: References: Message-ID: On Fri, 7 Oct 2022 11:19:55 GMT, Johan Sjölen wrote: > Hi, > > I went through all of the places where LogStreams are created and removed the unnecessary ResourceMarks. I also added a ResourceMark in one place, where it was needed because of a call to `::name_and_sig_as_C_string` and moved one to the smallest scope where it is used. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/10602 From ysr at openjdk.org Mon Dec 19 17:46:18 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 19 Dec 2022 17:46:18 GMT Subject: RFR: JDK-8298597 : HdrSeq: support for a merge() method [v3] In-Reply-To: References: Message-ID: > Merge functionality on stats (distributions) was needed for the remembered set scan that I was using in some companion work. This PR implements a first cut at that, which is sufficient for our first (and only) use case. > > Unfortunately, for expediency, I am deferring work on decaying statistics, as a result of which users that want decaying statistics will get NaNs instead (or trigger guarantees).
> > In the short term, before I open this draft for review, I'll: > > - [x] add tests > - [x] ensure that if a merge action has been taken on a distribution, then any attempt to access a decayed statistic causes an error > - [x] open a linked ticket to take care of the decayed statistics > > An important goal here was to have an API that would be efficient and correct. The API shape may change when we have considered how to handle decaying statistics. Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: - Merge branch 'master' into stats_merge - Merge branch 'master' into stats_merge - Safety tests for decayed stats, until implemented. - gtest for merge. - Vanilla merge test for ShenandoahNumberSeq; needs to be extended some. - Changes based on experience with uses in RS scan stats. Fixed some bugs. -- We still need to implement a few vanilla tests for the merge method. -- Planning to defer the work on decayed stats (which will be delivered separately in a lower-priority sibling ticket) - Merge branch 'master' into stats_merge - More merge() implementation. -- Need to think about merge of decaying stats in AbsSeq. -- Need to add tests. - Interim checkin of code w/beginnings of merge() support. Some implementations are still stubbed out and need to be written. - First cut at merge. More changes to come. May not build yet. ------------- Changes: https://git.openjdk.org/shenandoah/pull/184/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=184&range=02 Stats: 180 lines in 5 files changed: 153 ins; 2 del; 25 mod Patch: https://git.openjdk.org/shenandoah/pull/184.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/184/head:pull/184 PR: https://git.openjdk.org/shenandoah/pull/184 From kdnilsen at openjdk.org Mon Dec 19 17:56:30 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 19 Dec 2022 17:56:30 GMT Subject: RFR: JDK-8298597 : HdrSeq: support for a merge() method [v3] In-Reply-To: References: Message-ID: On Mon, 19 Dec 2022 17:46:18 GMT, Y. Srinivas Ramakrishna wrote: >> Merge functionality on stats (distributions) was needed for the remembered set scan that I was using in some companion work. This PR implements a first cut at that, which is sufficient for our first (and only) use case. >> >> Unfortunately, for expediency, I am deferring work on decaying statistics, as a result of which users that want decaying statistics will get NaNs instead (or trigger guarantees). >> >> In the short term, before I open this draft for review, I'll: >> >> - [x] add tests >> - [x] ensure that if a merge action has been taken on a distribution, then any attempt to access a decayed statistic causes an error >> - [x] open a linked ticket to take care of the decayed statistics >> >> An important goal here was to have an API that would be efficient and correct. The API shape may change when we have considered how to handle decaying statistics. > > Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: > > - Merge branch 'master' into stats_merge > - Merge branch 'master' into stats_merge > - Safety tests for decayed stats, until implemented. > - gtest for merge. > - Vanilla merge test for ShenandoahNumberSeq; needs to be extended some. > - Changes based on experience with uses in RS scan stats. > Fixed some bugs. > > -- We still need to implement a few vanilla tests for the merge method.
> -- Planning to defer the work on decayed stats (which will be delivered > separately in a lower-priority sibling ticket) > - Merge branch 'master' into stats_merge > - More merge() implementation. > -- Need to think about merge of decaying stats in AbsSeq. > -- Need to add tests. > - Interim checkin of code w/beginnings of merge() support. Some > implementations are still stubbed out and need to be written. > - First cut at merge. More changes to come. May not build yet. Marked as reviewed by kdnilsen (Committer). Marked as reviewed by kdnilsen (Committer). ------------- PR: https://git.openjdk.org/shenandoah/pull/184 From ysr at openjdk.org Mon Dec 19 18:35:28 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 19 Dec 2022 18:35:28 GMT Subject: RFR: JDK-8298597 : HdrSeq: support for a merge() method [v3] In-Reply-To: References: Message-ID: <2h4E0exn8ZLbvpR395hvvrwacHDF0HHIkrgV_CMDgp8=.58f71ddd-6928-46c0-b0ce-9af41ca26e97@github.com> On Mon, 19 Dec 2022 17:53:34 GMT, Kelvin Nilsen wrote: >> Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: >> >> - Merge branch 'master' into stats_merge >> - Merge branch 'master' into stats_merge >> - Safety tests for decayed stats, until implemented. >> - gtest for merge. >> - Vanilla merge test for ShenandoahNumberSeq; needs to be extended some. >> - Changes based on experience with uses in RS scan stats. >> Fixed some bugs. >> >> -- We still need to implement a few vanilla tests for the merge method. >> -- Planning to defer the work on decayed stats (which will be delivered >> separately in a lower-priority sibling ticket) >> - Merge branch 'master' into stats_merge >> - More merge() implementation. >> -- Need to think about merge of decaying stats in AbsSeq. >> -- Need to add tests. >> - Interim checkin of code w/beginnings of merge() support. Some >> implementations are still stubbed out and need to be written. >> - First cut at merge. More changes to come. May not build yet. > > Marked as reviewed by kdnilsen (Committer). Pending sponsorship; thanks for the review @kdnilsen! ------------- PR: https://git.openjdk.org/shenandoah/pull/184 From ysr at openjdk.org Mon Dec 19 18:35:29 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 19 Dec 2022 18:35:29 GMT Subject: RFR: JDK-8298597 : HdrSeq: support for a merge() method [v3] In-Reply-To: References: Message-ID: On Fri, 16 Dec 2022 03:37:29 GMT, Y. Srinivas Ramakrishna wrote: >> Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: >> >> - Merge branch 'master' into stats_merge >> - Merge branch 'master' into stats_merge >> - Safety tests for decayed stats, until implemented. >> - gtest for merge. >> - Vanilla merge test for ShenandoahNumberSeq; needs to be extended some. >> - Changes based on experience with uses in RS scan stats. >> Fixed some bugs. >> >> -- We still need to implement a few vanilla tests for the merge method. >> -- Planning to defer the work on decayed stats (which will be delivered >> separately in a lower-priority sibling ticket) >> - Merge branch 'master' into stats_merge >> - More merge() implementation. >> -- Need to think about merge of decaying stats in AbsSeq. >> -- Need to add tests. >> - Interim checkin of code w/beginnings of merge() support. Some >> implementations are still stubbed out and need to be written. >> - First cut at merge. More changes to come. 
May not build yet. > > test/hotspot/gtest/gc/shenandoah/test_shenandoahNumberSeq.cpp line 1: > >> 1: /* > > An earlier version of this test is in tip for an earlier bug fix. I am happy to consult if there is any confusion during a merge from tip. In this specific case, the contents of this file should take precedence. merged from master; resolving. ------------- PR: https://git.openjdk.org/shenandoah/pull/184 From wkemper at openjdk.org Mon Dec 19 18:54:12 2022 From: wkemper at openjdk.org (William Kemper) Date: Mon, 19 Dec 2022 18:54:12 GMT Subject: RFR: Use CardTable::card_size_in_words rather than hard coded constant [v3] In-Reply-To: References: Message-ID: > Calculation assumed 64 words per card, which does not hold on 32 bit word platforms William Kemper has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: Do not assume cards will have 64 words ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/186/files - new: https://git.openjdk.org/shenandoah/pull/186/files/521c1491..9ebacb5d Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=186&range=02 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=186&range=01-02 Stats: 20 lines in 2 files changed: 5 ins; 3 del; 12 mod Patch: https://git.openjdk.org/shenandoah/pull/186.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/186/head:pull/186 PR: https://git.openjdk.org/shenandoah/pull/186 From ysr at openjdk.org Mon Dec 19 20:46:17 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 19 Dec 2022 20:46:17 GMT Subject: RFR: Use CardTable::card_size_in_words rather than hard coded constant [v3] In-Reply-To: References: Message-ID: On Mon, 19 Dec 2022 18:54:12 GMT, William Kemper wrote: >> Calculation assumed 64 words per card, which does not hold on 32 bit word platforms > > William Kemper has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Do not assume cards will have 64 words Marked as reviewed by ysr (Author). ------------- PR: https://git.openjdk.org/shenandoah/pull/186 From ysr at openjdk.org Mon Dec 19 20:49:39 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 19 Dec 2022 20:49:39 GMT Subject: RFR: Use CardTable::card_size_in_words rather than hard coded constant [v3] In-Reply-To: References: Message-ID: On Mon, 19 Dec 2022 18:54:12 GMT, William Kemper wrote: >> Calculation assumed 64 words per card, which does not hold on 32 bit word platforms > > William Kemper has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Do not assume cards will have 64 words src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 1033: > 1031: // ShenandoahCardCluster::CardsPerCluster; > 1032: // We can't perform this computation here, because of encapsulation and initialization constraints. We paste > 1033: // the magic number here, and assert that this number matches the intended computation in constructor. Is the portion of the comment beginning "We paste the maginc number ... etc." 
not obsolete now, and should there be deleted? ------------- PR: https://git.openjdk.org/shenandoah/pull/186 From ysr at openjdk.org Mon Dec 19 20:59:17 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 19 Dec 2022 20:59:17 GMT Subject: Integrated: JDK-8298597 : HdrSeq: support for a merge() method In-Reply-To: References: Message-ID: On Thu, 15 Dec 2022 19:33:36 GMT, Y. Srinivas Ramakrishna wrote: > A merge functionality on stats (distributions) was needed for the remembered set scan that I was using in some companion work. This PR implements a first cut at that, which is sufficient for our first (and only) use case. > > Unfortunately, for expediency, I am deferring work on decaying statistics, as a result of which users that want decaying statistics will get NaNs instead (or trigger guarantees). > > In the sort term, before I open this draft for review, I'll: > > - [x] add tests > - [x] ensure that if a merge action has been taken on a distribution, then any attempt to access a decayed statistic causes an error > - [x] open a linked ticket to take care of the decayed statistics > > An important goal here was to have an API that would be efficient and correct. The API shape may change when we have considered how to handle decaying statistics. This pull request has now been integrated. Changeset: bbd4ef34 Author: Y. Srinivas Ramakrishna Committer: Kelvin Nilsen URL: https://git.openjdk.org/shenandoah/commit/bbd4ef345122aeb2277c5f24269bda11846ec6ef Stats: 180 lines in 5 files changed: 153 ins; 2 del; 25 mod 8298597: HdrSeq: support for a merge() method Reviewed-by: kdnilsen ------------- PR: https://git.openjdk.org/shenandoah/pull/184 From wkemper at openjdk.org Mon Dec 19 21:13:18 2022 From: wkemper at openjdk.org (William Kemper) Date: Mon, 19 Dec 2022 21:13:18 GMT Subject: RFR: Use CardTable::card_size_in_words rather than hard coded constant [v3] In-Reply-To: References: Message-ID: On Mon, 19 Dec 2022 20:47:00 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Do not assume cards will have 64 words > > src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 1033: > >> 1031: // ShenandoahCardCluster::CardsPerCluster; >> 1032: // We can't perform this computation here, because of encapsulation and initialization constraints. We paste >> 1033: // the magic number here, and assert that this number matches the intended computation in constructor. > > Is the portion of the comment beginning "We paste the maginc number ... etc." not obsolete now, and should there be deleted? Yes - I'll clean that up. ------------- PR: https://git.openjdk.org/shenandoah/pull/186 From wkemper at openjdk.org Mon Dec 19 21:21:47 2022 From: wkemper at openjdk.org (William Kemper) Date: Mon, 19 Dec 2022 21:21:47 GMT Subject: RFR: Use CardTable::card_size_in_words rather than hard coded constant [v4] In-Reply-To: References: Message-ID: > Calculation assumed 64 words per card, which does not hold on 32 bit word platforms. The number of words per card also depends on `GCCardSizeInBytes` command line parameter. 
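As an aside, the arithmetic behind that statement is just a division by the platform word size: the old constant 64 only holds for 512-byte cards on an 8-byte word. A small stand-alone sketch of the relationship (the names below are local to this example; in the VM the value comes from the card table, not from hand computation):

#include <cstdio>

int main() {
  // 512 is the default value of GCCardSizeInBytes; the flag makes the
  // card size configurable, which is the other reason the constant is
  // unsafe to hard-code.
  const int card_size_in_bytes = 512;
  const int word_sizes[] = { 8, 4 };  // 64-bit vs. 32-bit HeapWordSize
  for (int ws : word_sizes) {
    printf("word size %d bytes -> %d words per card\n",
           ws, card_size_in_bytes / ws);  // 64 on 64-bit, 128 on 32-bit
  }
  return 0;
}

So with the default 512-byte card and a 4-byte word, a card covers 128 words, which is why the hard-coded 64 broke on 32-bit platforms.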
William Kemper has updated the pull request incrementally with one additional commit since the last revision:

  Remove stale comment

-------------

Changes:
  - all: https://git.openjdk.org/shenandoah/pull/186/files
  - new: https://git.openjdk.org/shenandoah/pull/186/files/9ebacb5d..5a4f973d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=186&range=03
 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=186&range=02-03

  Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod
  Patch: https://git.openjdk.org/shenandoah/pull/186.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah pull/186/head:pull/186

PR: https://git.openjdk.org/shenandoah/pull/186

From kdnilsen at openjdk.org Mon Dec 19 23:12:24 2022
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Mon, 19 Dec 2022 23:12:24 GMT
Subject: RFR: Use CardTable::card_size_in_words rather than hard coded constant [v4]
In-Reply-To:
References:
Message-ID:

On Mon, 19 Dec 2022 21:21:47 GMT, William Kemper wrote:

>> Calculation assumed 64 words per card, which does not hold on 32-bit word platforms. The number of words per card also depends on the `GCCardSizeInBytes` command line parameter.
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove stale comment

Thanks.

-------------

Marked as reviewed by kdnilsen (Committer).

PR: https://git.openjdk.org/shenandoah/pull/186

From wkemper at openjdk.org Mon Dec 19 23:22:15 2022
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 19 Dec 2022 23:22:15 GMT
Subject: Integrated: Use CardTable::card_size_in_words rather than hard coded constant
In-Reply-To:
References:
Message-ID: <8Wu-3C0Kw5kxZ-4VVDnAZqZOuq7e3jeIa5rrSsUD-A4=.b68f7daf-10fb-4d59-bc98-1117f02033d1@github.com>

On Fri, 16 Dec 2022 18:20:20 GMT, William Kemper wrote:

> Calculation assumed 64 words per card, which does not hold on 32-bit word platforms. The number of words per card also depends on the `GCCardSizeInBytes` command line parameter.

This pull request has now been integrated.

Changeset: 1a962380
Author: William Kemper
URL: https://git.openjdk.org/shenandoah/commit/1a9623802c3595300fd796539e1a75aa3533cda6
Stats: 16 lines in 2 files changed: 3 ins; 2 del; 11 mod

Use CardTable::card_size_in_words rather than hard coded constant

Reviewed-by: ysr, kdnilsen

-------------

PR: https://git.openjdk.org/shenandoah/pull/186

From wkemper at openjdk.org Mon Dec 19 23:33:18 2022
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 19 Dec 2022 23:33:18 GMT
Subject: RFR: Initial sizing refactor
Message-ID:

Some things to highlight here:
* This change borrows a bit of code from G1 to handle processing of command line arguments used to size the young generation.
* A (hard coded for now) threshold on the difference between young/old time has been added to reduce resizing churn.
* The adaptive heuristic doesn't consider the `soft_tail` anymore. `available` is already adjusted for the soft max capacity.
* `SoftMaxHeapSize` is used to compute the soft max size and max size for the young generation.

-------------

Commit messages:
 - Initial young generation to maximum allowed size
 - Always transfer bytes in multiple of region size
 - Fix invalid assertion
 - Initial soft max size based on young generation's minimum size.
 - Factor out methods to check young generation size limits
 - WIP: Refactor handling of generation parameters

Changes: https://git.openjdk.org/shenandoah/pull/185/files
 Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=185&range=00
  Stats: 362 lines in 12 files changed: 233 ins; 68 del; 61 mod
  Patch: https://git.openjdk.org/shenandoah/pull/185.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah pull/185/head:pull/185

PR: https://git.openjdk.org/shenandoah/pull/185

From wkemper at openjdk.org Mon Dec 19 23:33:18 2022
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 19 Dec 2022 23:33:18 GMT
Subject: RFR: Initial sizing refactor
In-Reply-To:
References:
Message-ID:

On Fri, 16 Dec 2022 00:36:41 GMT, William Kemper wrote:

> Some things to highlight here:
> * This change borrows a bit of code from G1 to handle processing of command line arguments used to size the young generation.
> * A (hard coded for now) threshold on the difference between young/old time has been added to reduce resizing churn.
> * The adaptive heuristic doesn't consider the `soft_tail` anymore. `available` is already adjusted for the soft max capacity.
> * `SoftMaxHeapSize` is used to compute the soft max size and max size for the young generation.

Converted this to a draft because the changes are tripping an assert:

# Internal Error (/codebuild/output/src797/src/s3/00/src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp:986), pid=1289, tid=1312
# assert(regions * ShenandoahHeapRegion::region_size_bytes() <= heap->young_generation()->adjusted_capacity()) failed: Number of young regions cannot exceed adjusted capacity

Ready for review - the assertion noted earlier is unrelated to these changes and will be fixed under a separate PR.

-------------

PR: https://git.openjdk.org/shenandoah/pull/185

From ysr at openjdk.org Mon Dec 19 23:42:04 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Mon, 19 Dec 2022 23:42:04 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v12]
In-Reply-To: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
Message-ID:

> **Note:**
> This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.)
>
> (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extremem mentioned in the ticket.
>
> (2) Make available the instrumentation only in non-product (optimized) mode until better performance is achieved.
>
> (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above.
>
> The fix to ShenandoahNumberSeq will be separated out into its own pull request on mainline.
>
> **Summary:**
> The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect as well as how frequently we log the data once we have gathered some experience on how we use this.
>
> **Details of files changed:**
>
> 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled.
> 2. shenandoahHeap.cpp: minor restructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats
> 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code.
> 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq
> 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above).
> 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumulative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments.
> 7. shenandoahScanRemembered.inline.hpp: As in 6, diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as to update local variables rather than method arguments. The large diffs at (old) line 589 onwards are the git-diff'ism to do with indentation change. Delete some unused methods.
> 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default.
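To make item 7 above concrete: the per-cluster bookkeeping described there amounts to a single pass over the cards of one cluster, maintaining run lengths and counting dirty/clean alternations. Below is a simplified, self-contained sketch of that walk; the names and types are hypothetical stand-ins, not the PR's ShenandoahCardStats code:

#include <cstdio>

// Per-cluster counters gathered in one pass over the card values.
struct ClusterStats {
  int dirty_cards, clean_cards;
  int max_dirty_run, max_clean_run;
  int alternations;
};

ClusterStats scan_cluster(const bool* dirty, int num_cards) {
  ClusterStats s = {0, 0, 0, 0, 0};
  int run = 0;  // length of the current run of same-valued cards
  for (int i = 0; i < num_cards; i++) {
    if (i > 0 && dirty[i] != dirty[i - 1]) {
      s.alternations++;  // transitioned dirty<->clean
      run = 0;
    }
    run++;
    if (dirty[i]) {
      s.dirty_cards++;
      if (run > s.max_dirty_run) s.max_dirty_run = run;
    } else {
      s.clean_cards++;
      if (run > s.max_clean_run) s.max_clean_run = run;
    }
  }
  return s;
}

int main() {
  const bool cards[] = { true, true, false, false, false, true };
  ClusterStats s = scan_cluster(cards, 6);
  printf("dirty=%d clean=%d max_dirty_run=%d max_clean_run=%d alternations=%d\n",
         s.dirty_cards, s.clean_cards, s.max_dirty_run, s.max_clean_run,
         s.alternations);  // dirty=3 clean=3 max_dirty_run=2 max_clean_run=3 alternations=2
  return 0;
}

In the PR, counters of this kind are accumulated into per-worker histograms at the end of each cluster, which is where the "shared stats updates at the end of each cluster processed" cost mentioned in the summary comes from.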
>
> **Format of stats produced and how to interpret them: (sample)**
>
>
> [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning
> [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo:
> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ]
> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ]
> [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo:
> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ]
> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ]
> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ]
> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ]
> ...
>
>
> The rows represent the metric that's being tracked, and the columns are, respectively, the minimum, the 3 quartiles (25%, 50%, 75%), and the maximum. The metrics are:
>
> - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread
> - clean_run: as above, but the length of an uninterrupted run of clean cards
> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk
> - max_dirty_run & max_clean_run: similarly, for the maximum of each
> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned
> - dirty_scans, clean_scans: numbers of objects scanned by the closure
> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk
>
> The data above indicates that at least 75% of the chunks have no alternations at all,
> and cards are almost always mostly clean for this specific benchmark config (extremem).
>
> Comparing worker stats from worker 0 and worker 9 indicates very little difference between
> their statistics, as one might typically expect for well-balanced RS scans.
>
> **Questions:**
>
> 1. Would it make sense to also print, for example, the 1, 10, 90 and 99 percentiles for these metrics, in addition to the quartiles?
> 2. The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information?
> 3. Any suggestions for a more easily consumable format?
> 4. I welcome any other feedback on the pull request.

Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 65 commits:

 - Merge branch 'master' into JVM-1264-dependent
 - Add a previously missed ticket#. Doing it here rather than in parent to
   avoid an otherwise unnecessary re-review touchpoint.
 - Merge branch 'stats_merge' into JVM-1264-dependent
 - Merge branch 'master' into stats_merge
 - jcheck space fix
 - Fix compiler error on windows.
 - Fix some tier1 tests.
 - Remove an unnecessary include, fix some type incorrectness.
 - Merge branch 'JVM-1264' into JVM-1264-dependent
 - Merge branch 'master' into JVM-1264
 - ... and 55 more: https://git.openjdk.org/shenandoah/compare/bbd4ef34...9c5c741f

-------------

Changes: https://git.openjdk.org/shenandoah/pull/176/files
 Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=11
  Stats: 864 lines in 9 files changed: 495 ins; 206 del; 163 mod
  Patch: https://git.openjdk.org/shenandoah/pull/176.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah pull/176/head:pull/176

PR: https://git.openjdk.org/shenandoah/pull/176

From wkemper at openjdk.org Mon Dec 19 23:43:27 2022
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 19 Dec 2022 23:43:27 GMT
Subject: RFR: Shrink tlab to capacity [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 12 Dec 2022 23:17:11 GMT, Kelvin Nilsen wrote:

>> When a TLAB request exceeds the currently available memory within young-gen, the existing behavior is to reject the TLAB request outright. This is recognized as a failed allocation request, which triggers degenerated GC.
>>
>> This change introduces code to reduce the likelihood that too-large TLAB requests will be issued, and when they are issued, it makes an effort to shrink the TLAB request in order to reduce the need for degenerated GC.
>>
>> The impact is difficult to measure because this situation is fairly rare. On one Extremem workload, the TLAB-shrinking code is exercised only once during a 16-minute run involving 500 concurrent GCs, a 45 GiB heap, and a 28 GiB young-gen size. The change reduces the degenerated GCs from 6 to 5.
>>
>> One reason that the remaining 5 degenerated GCs are not addressed by this change is that further work is required to handle a situation in which a requested TLAB is smaller than the available young-gen memory, but available memory is set aside in the evacuation reserve so cannot be provided to a mutator. Future work will address this condition.
>
> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Clarify recursive implementation of allocate_memory_under_lock
>
>   (with a comment)

Looks good. Thank you.

-------------

Marked as reviewed by wkemper (Committer).

PR: https://git.openjdk.org/shenandoah/pull/180

From ysr at openjdk.org Mon Dec 19 23:46:17 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Mon, 19 Dec 2022 23:46:17 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v12]
In-Reply-To:
References:
Message-ID:

On Mon, 19 Dec 2022 23:42:04 GMT, Y. Srinivas Ramakrishna wrote:

>> **Note:**
>> This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.)
>>
>> (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extremem mentioned in the ticket.
>>
>> (2) Make available the instrumentation only in non-product (optimized) mode until better performance is achieved.
>>
>> (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above.
>>
>> The fix to ShenandoahNumberSeq will be separated out into its own pull request on mainline.
>>
>> **Summary:**
>> The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect as well as how frequently we log the data once we have gathered some experience on how we use this.
>>
>> **Details of files changed:**
>>
>> 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled.
>> 2. shenandoahHeap.cpp: minor restructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats
>> 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code.
>> 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq
>> 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above).
>> 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumulative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments.
>> 7. shenandoahScanRemembered.inline.hpp: As in 6, diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as to update local variables rather than method arguments. The large diffs at (old) line 589 onwards are the git-diff'ism to do with indentation change. Delete some unused methods.
>> 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default.
>>
>> **Format of stats produced and how to interpret them: (sample)**
>>
>>
>> [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning
>> [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo:
>> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo:
>> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ]
>> ...
>>
>>
>> The rows represent the metric that's being tracked, and the columns are, respectively, the minimum, the 3 quartiles (25%, 50%, 75%), and the maximum. The metrics are:
>>
>> - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread
>> - clean_run: as above, but the length of an uninterrupted run of clean cards
>> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk
>> - max_dirty_run & max_clean_run: similarly, for the maximum of each
>> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned
>> - dirty_scans, clean_scans: numbers of objects scanned by the closure
>> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk
>>
>> The data above indicates that at least 75% of the chunks have no alternations at all,
>> and cards are almost always mostly clean for this specific benchmark config (extremem).
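A side note on reading these tables: each column is obtained by walking the histogram's bucket counts until the cumulative count reaches the target rank. A generic stand-alone sketch of that lookup follows; the bucket layout here is hypothetical and is not HdrSeq's actual bucket scheme:

#include <cstdio>
#include <cstdint>

// Walk the buckets until the cumulative count crosses p% of the total,
// then report that bucket's representative value.
double percentile(const uint64_t* buckets, const double* bucket_value,
                  int n, uint64_t total, double p) {
  uint64_t target = (uint64_t)(p / 100.0 * total);
  uint64_t seen = 0;
  for (int i = 0; i < n; i++) {
    seen += buckets[i];
    if (seen > target) return bucket_value[i];
  }
  return bucket_value[n - 1];
}

int main() {
  const uint64_t counts[] = { 90, 5, 3, 2 };  // heavily skewed to bucket 0
  const double   values[] = { 0.0, 25.0, 50.0, 100.0 };
  uint64_t total = 100;
  printf("p50=%.2f p99=%.2f\n",
         percentile(counts, values, 4, total, 50.0),   // 0.00
         percentile(counts, values, 4, total, 99.0));  // 100.00
  return 0;
}

A skew like the one in this toy input is exactly what the sample above shows: when most chunks are clean, the minimum and all three quartiles of the dirty metrics sit at zero and only the maximum column is informative.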
>>
>> Comparing worker stats from worker 0 and worker 9 indicates very little difference between
>> their statistics, as one might typically expect for well-balanced RS scans.
>>
>> **Questions:**
>>
>> 1. Would it make sense to also print, for example, the 1, 10, 90 and 99 percentiles for these metrics, in addition to the quartiles?
>> 2. The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information?
>> 3. Any suggestions for a more easily consumable format?
>> 4. I welcome any other feedback on the pull request.
>
> Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 65 commits:
>
>  - Merge branch 'master' into JVM-1264-dependent
>  - Add a previously missed ticket#. Doing it here rather than in parent to
>    avoid an otherwise unnecessary re-review touchpoint.
>  - Merge branch 'stats_merge' into JVM-1264-dependent
>  - Merge branch 'master' into stats_merge
>  - jcheck space fix
>  - Fix compiler error on windows.
>  - Fix some tier1 tests.
>  - Remove an unnecessary include, fix some type incorrectness.
>  - Merge branch 'JVM-1264' into JVM-1264-dependent
>  - Merge branch 'master' into JVM-1264
>  - ... and 55 more: https://git.openjdk.org/shenandoah/compare/bbd4ef34...9c5c741f

Reopening as my dependent PR had an issue with how I had done the PR. Since the parent PR is integrated, I figured it was easiest to update this branch and reopen the original PR. Sorry for the attendant noise.

This PR is now open for review. Thanks!

-------------

PR: https://git.openjdk.org/shenandoah/pull/176

From wkemper at openjdk.org Mon Dec 19 23:46:17 2022
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 19 Dec 2022 23:46:17 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v12]
In-Reply-To:
References:
Message-ID:

On Mon, 19 Dec 2022 23:42:04 GMT, Y. Srinivas Ramakrishna wrote:

>> **Note:**
>> This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.)
>>
>> (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extremem mentioned in the ticket.
>>
>> (2) Make available the instrumentation only in non-product (optimized) mode until better performance is achieved.
>>
>> (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above.
>>
>> The fix to ShenandoahNumberSeq will be separated out into its own pull request on mainline.
>>
>> **Summary:**
>> The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect as well as how frequently we log the data once we have gathered some experience on how we use this.
>>
>> **Details of files changed:**
>>
>> 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled.
>> 2. shenandoahHeap.cpp: minor restructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats
>> 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code.
>> 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq
>> 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above).
>> 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumulative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments.
>> 7. shenandoahScanRemembered.inline.hpp: As in 6, diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as to update local variables rather than method arguments. The large diffs at (old) line 589 onwards are the git-diff'ism to do with indentation change. Delete some unused methods.
>> 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default.
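For readers skimming the details above, the guard pattern in items 1 and 8 is worth spelling out: with the diagnostic flag off, the only cost on the logging path is a branch. A schematic sketch follows; the flag name mirrors the description, while the types and function bodies are stand-ins rather than the PR's code:

// Stand-in for the diagnostic VM flag described above (off by default).
static bool ShenandoahEnableCardStats = false;

struct WorkerCardStats { /* per-metric histograms would live here */ };

static void log_worker_card_stats(int worker_id, const WorkerCardStats& s) {
  // In the real code this would emit the [gc,remset] histogram lines
  // shown in the sample output below.
  (void)worker_id; (void)s;
}

void maybe_log_card_stats(const WorkerCardStats* stats, int num_workers) {
  if (!ShenandoahEnableCardStats) return;  // common, cheap path
  for (int i = 0; i < num_workers; i++) {
    log_worker_card_stats(i, stats[i]);
  }
}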
>>
>> **Format of stats produced and how to interpret them: (sample)**
>>
>>
>> [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning
>> [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo:
>> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo:
>> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ]
>> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ]
>> ...
>>
>>
>> The rows represent the metric that's being tracked, and the columns are, respectively, the minimum, the 3 quartiles (25%, 50%, 75%), and the maximum. The metrics are:
>>
>> - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread
>> - clean_run: as above, but the length of an uninterrupted run of clean cards
>> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk
>> - max_dirty_run & max_clean_run: similarly, for the maximum of each
>> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned
>> - dirty_scans, clean_scans: numbers of objects scanned by the closure
>> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk
>>
>> The data above indicates that at least 75% of the chunks have no alternations at all,
>> and cards are almost always mostly clean for this specific benchmark config (extremem).
>>
>> Comparing worker stats from worker 0 and worker 9 indicates very little difference between
>> their statistics, as one might typically expect for well-balanced RS scans.
>>
>> **Questions:**
>>
>> 1. Would it make sense to also print, for example, the 1, 10, 90 and 99 percentiles for these metrics, in addition to the quartiles?
>> 2. The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information?
>> 3. Any suggestions for a more easily consumable format?
>> 4. I welcome any other feedback on the pull request.
>
> Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 65 commits:
>
>  - Merge branch 'master' into JVM-1264-dependent
>  - Add a previously missed ticket#. Doing it here rather than in parent to
>    avoid an otherwise unnecessary re-review touchpoint.
>  - Merge branch 'stats_merge' into JVM-1264-dependent
>  - Merge branch 'master' into stats_merge
>  - jcheck space fix
>  - Fix compiler error on windows.
>  - Fix some tier1 tests.
>  - Remove an unnecessary include, fix some type incorrectness.
>  - Merge branch 'JVM-1264' into JVM-1264-dependent
>  - Merge branch 'master' into JVM-1264
>  - ... and 55 more: https://git.openjdk.org/shenandoah/compare/bbd4ef34...9c5c741f

src/hotspot/share/gc/shenandoah/shenandoahCardStats.cpp line 57:

> 55:   if (record) {
> 56:     // Update global stats for distribution of dirty/clean card %ge
> 57:     _local_card_stats[DIRTY_CARDS].add((double)_dirty_card_cnt*100/(double)_cards_in_cluster);

typo? `%` -> `a`

-------------

PR: https://git.openjdk.org/shenandoah/pull/176

From ysr at openjdk.org Mon Dec 19 23:50:18 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Mon, 19 Dec 2022 23:50:18 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v12]
In-Reply-To:
References:
Message-ID:

On Mon, 19 Dec 2022 23:43:38 GMT, William Kemper wrote:

>> Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 65 commits:
>>
>>  - Merge branch 'master' into JVM-1264-dependent
>>  - Add a previously missed ticket#. Doing it here rather than in parent to
>>    avoid an otherwise unnecessary re-review touchpoint.
>>  - Merge branch 'stats_merge' into JVM-1264-dependent
>>  - Merge branch 'master' into stats_merge
>>  - jcheck space fix
>>  - Fix compiler error on windows.
>>  - Fix some tier1 tests.
>>  - Remove an unnecessary include, fix some type incorrectness.
>>  - Merge branch 'JVM-1264' into JVM-1264-dependent
>>  - Merge branch 'master' into JVM-1264
>>  - ... and 55 more: https://git.openjdk.org/shenandoah/compare/bbd4ef34...9c5c741f
>
> src/hotspot/share/gc/shenandoah/shenandoahCardStats.cpp line 57:
>
>> 55:   if (record) {
>> 56:     // Update global stats for distribution of dirty/clean card %ge
>> 57:     _local_card_stats[DIRTY_CARDS].add((double)_dirty_card_cnt*100/(double)_cards_in_cluster);
>
> typo? `%` -> `a`

I mean percentage where I said `%ge`. I'll clarify the comments a bit more. Please continue the review and I'll improve some of the documentation comments for clarity.
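For the record, the quantity on the line under discussion is dirty cards as a percentage of the cards in one cluster, which is what gets added to the running distribution. A sketch of the computation, with a guard for the degenerate empty-cluster case, might look like this (hypothetical free function, not the PR's code):

#include <cstddef>
#include <cstdio>

double dirty_card_percentage(size_t dirty_card_cnt, size_t cards_in_cluster) {
  if (cards_in_cluster == 0) return 0.0;  // avoid division by zero
  return (double)dirty_card_cnt * 100.0 / (double)cards_in_cluster;
}

int main() {
  printf("%.2f\n", dirty_card_percentage(13, 64));  // 20.31
  return 0;
}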
------------- PR: https://git.openjdk.org/shenandoah/pull/176 From wkemper at openjdk.org Mon Dec 19 23:50:18 2022 From: wkemper at openjdk.org (William Kemper) Date: Mon, 19 Dec 2022 23:50:18 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v12] In-Reply-To: References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: <_a7yKm6N4ztQ-u70-Ds38Vg2_AepwHAPQxWfZHsTyOY=.3f7615c3-6b02-4915-aa58-b7ebcbdb2b56@github.com> On Mon, 19 Dec 2022 23:42:04 GMT, Y. Srinivas Ramakrishna wrote: >> **Note:** >> This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.) >> >> (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extermem mentioned in the ticket. >> >> (2) Make available the instrumentation only in non-product (optimized) mode until better performance is achieved. >> >> (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above. >> >> The fix to ShenandoahNumberSeq will be separated out and made into a separate pull request on mainline. >> >> **Summary:** >> The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. With the flag disabled there is a small performance impact (measured with extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect as well as how frequently we log the data once we have gathered some experience on how we use this. >> >> **Details of files changed:** >> >> 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled. >> 2. shenandoahHeap.cpp: minor retsructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats >> 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code. >> 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq >> 5. shenandoahScanRemembered.cpp: new class ShenandoahCardStats methods. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above). >> 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumuative running histograms. 
Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments. >> 7. shenandoahScanRemembered.inline.hpp: As in 6, diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as ti update local variables rather than method arguments. The large diffs at (old) line 589 onwards is the git-diff'ism to do with indentation change. Delete some unused methods. >> 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default. >> >> **Format of stats produced and how to interpret them: (sample)** >> >> >> [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning >> [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo: >> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] >> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] >> [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo: >> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] >> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] >> ... >> >> >> The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. The metrics are: >> >> - dirty_run: the length of an uninterrupted run of dirty cards, interpretedas a percentage of a chunk of work assignment (cluster) processed by a thread >> - clean_run: as above, but the length of an uninterrupted run of clean cards >> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk >> - max_dirty_run & max_clean_run: Similarly for the maximum of each. 
>> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned >> - dirty_scans, clean_scans: numbers of objects scanned by the closure >> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk >> >> The data above indicates that at least 75% of the chunks have no alternations at all, >> and cards are almost always mostly clean for this specific benchmark config (extremem). >> >> Comparing worker stats from worker 0 and worker 9 indicates very little difference between >> their statistics, as one might typically expect for well-balanced RS scans. >> >> **Questions:** >> >> 1. Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics as well, in addition to the quartiles? >> 2. The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information? >> 3. Any suggestions for a more easily consumable format? >> 4. I welcome any other feedback on the pull request. > > Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 65 commits: > > - Merge branch 'master' into JVM-1264-dependent > - Add a previously missed ticket#. Doing it here rather than in parent to > avoid an otherwise unnecessary re-review touchpoint. > - Merge branch 'stats_merge' into JVM-1264-dependent > - Merge branch 'master' into stats_merge > - jcheck space fix > - Fix compiler error on windows. > - Fix some tier1 tests. > - Remove an unnecessary include, fix some type incorrectness. > - Merge branch 'JVM-1264' into JVM-1264-dependent > - Merge branch 'master' into JVM-1264 > - ... and 55 more: https://git.openjdk.org/shenandoah/compare/bbd4ef34...9c5c741f src/hotspot/share/gc/shenandoah/shenandoahCardStats.hpp line 61: > 59: _cards_in_cluster(cards_in_cluster), > 60: _local_card_stats(card_stats), > 61: _last_dirty(false), Should it always be the case that `_last_dirty != _last_clean`? Could we use one variable here instead of two? ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From wkemper at openjdk.org Mon Dec 19 23:59:13 2022 From: wkemper at openjdk.org (William Kemper) Date: Mon, 19 Dec 2022 23:59:13 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v12] In-Reply-To: References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: On Mon, 19 Dec 2022 23:47:22 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahCardStats.cpp line 57: >> >>> 55: if (record) { >>> 56: // Update global stats for distribution of dirty/clean card %ge >>> 57: _local_card_stats[DIRTY_CARDS].add((double)_dirty_card_cnt*100/(double)_cards_in_cluster); >> >> typo? `%` -> `a` > > I mean percentage where I said `%ge`. I'll clarify the comments a bit more. Please continue the review and I'll improve some of the documentation comments for clarity. Got it - I had `age` on my brain. ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From ysr at openjdk.org Mon Dec 19 23:59:15 2022 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Mon, 19 Dec 2022 23:59:15 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v12] In-Reply-To: <_a7yKm6N4ztQ-u70-Ds38Vg2_AepwHAPQxWfZHsTyOY=.3f7615c3-6b02-4915-aa58-b7ebcbdb2b56@github.com> References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> <_a7yKm6N4ztQ-u70-Ds38Vg2_AepwHAPQxWfZHsTyOY=.3f7615c3-6b02-4915-aa58-b7ebcbdb2b56@github.com> Message-ID: On Mon, 19 Dec 2022 23:46:19 GMT, William Kemper wrote: >> Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 65 commits: >> >> - Merge branch 'master' into JVM-1264-dependent >> - Add a previously missed ticket#. Doing it here rather than in parent to >> avoid an otherwise unnecessary re-review touchpoint. >> - Merge branch 'stats_merge' into JVM-1264-dependent >> - Merge branch 'master' into stats_merge >> - jcheck space fix >> - Fix compiler error on windows. >> - Fix some tier1 tests. >> - Remove an unnecessary include, fix some type incorrectness. >> - Merge branch 'JVM-1264' into JVM-1264-dependent >> - Merge branch 'master' into JVM-1264 >> - ... and 55 more: https://git.openjdk.org/shenandoah/compare/bbd4ef34...9c5c741f > > src/hotspot/share/gc/shenandoah/shenandoahCardStats.hpp line 61: > >> 59: _cards_in_cluster(cards_in_cluster), >> 60: _local_card_stats(card_stats), >> 61: _last_dirty(false), > > Should it always be the case that `_last_dirty != _last_clean`? Could we use one variable here instead of two? I believe we switch into one of two modes based on the first card we encounter. So there are 3 states: an initial (neither), and then subsequently either dirty or clean. So there are 3 states, which is 2 bits. It's possible I could shrink it to 2 bits by some cleverness, but figured I wouldn't try too hard as this is still all non-product. I'll think some more about it. ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From wkemper at openjdk.org Mon Dec 19 23:59:15 2022 From: wkemper at openjdk.org (William Kemper) Date: Mon, 19 Dec 2022 23:59:15 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v12] In-Reply-To: References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: On Mon, 19 Dec 2022 23:42:04 GMT, Y. Srinivas Ramakrishna wrote: >> **Note:** >> This pull request is a draft to share the diffs with the project team. The following additional work is planned before this is ready to commit. (Thanks to Kevin, Roman, William etc. for feedback & suggestions.) >> >> (1) Collect performance data from SpecJBB and from the pipeline to assess the impact of instrumentation on concurrent remembered set scanning and concurrent update refs phase durations, in addition to the existing data from Extermem mentioned in the ticket. >> >> (2) Make available the instrumentation only in non-product (optimized) mode until better performance is achieved. >> >> (3) Any improvements that come from further feedback on this draft (e.g. better or different logging of the metrics data), or other suggestions that I may have missed mentioning above. >> >> The fix to ShenandoahNumberSeq will be separated out and made into a separate pull request on mainline. >> >> **Summary:** >> The main change is card stats collection during RS scanning. The code is protected by a new diagnostic flag `ShenandoahEnableCardStats`, which is off by default. 
With the flag disabled there is a small performance impact (measured with extremem; more data will be collected, see above). With the flag enabled there is a larger performance impact because of the large number of clusters, with shared stats updates at the end of each cluster processed. Since we expect the loops in process_clusters() to change in the near future, informed by the learnings from these stats, we expect to work further on reducing the cost of the stats collection as well. Currently the stats are logged per thread at the end of each RS scan. I'm happy to refine both the stats that we collect as well as how frequently we log the data once we have gathered some experience on how we use this. >> >> **Details of files changed:** >> >> 1. shenandoahGeneration.cpp: add a call to log info at the end of remembered set scan when card stats are enabled. >> 2. shenandoahHeap.cpp: minor restructuring of a loop for task claiming during update refs; introduce a worker id option to downstream code for card stats >> 3. shenandoahNumberSeq.cpp: fix a minor issue with a boundary condition check in code that tries to find the right bucket to increment. This was triggering an assert in the update code. >> 4. shenandoahNumberSeq.hpp: provide missing allocation spec for BinaryMagnitudeSeq >> 5. shenandoahScanRemembered.cpp: methods of the new class ShenandoahCardStats. Minor restructure of loop for task claiming during RS scanning (akin to the one for update refs in 2 above). >> 6. shenandoahScanRemembered.hpp: Diff looks large because of git-diff'ism having issues with indentation change in restructured if-else branches. Not sure how to make the diffs more easily readable. Updated some documentation comments that were slightly obsolete. New class ShenandoahCardStats and implementation of inline methods. Class ShenandoahScanRemembered keeps cumulative running histograms. Remove some inline declarations for larger methods that we shouldn't force inlining on. Update some old comments. >> 7. shenandoahScanRemembered.inline.hpp: As in 6, the diff looks larger than it should because of the same indentation change. ShenandoahScanRemembered::process_clusters() is the method where the instrumentation probes have been inserted. A couple of variables were renamed for clarity, as well as to update local variables rather than method arguments. The large diffs at (old) line 589 onwards are the git-diff'ism to do with the indentation change. Delete some unused methods. >> 8. shenandoah_globals.hpp: new diagnostic flag `ShenandoahEnableCardStats` protects the stats collection code and is disabled by default.
>> >> **Format of stats produced and how to interpret them: (sample)** >> >> >> [1211.515s][info][gc,task ] GC(7069) Using 10 of 20 workers for Concurrent remembered set scanning >> [1211.529s][info][gc,remset ] GC(7069) Worker 0 Card Stats Histo: >> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1245.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1157.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] >> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] >> [1211.529s][info][gc,remset ] GC(7069) Worker 1 Card Stats Histo: >> [1211.529s][info][gc,remset ] GC(7069) dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_run: [ 0.00 0.00 0.00 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_cards: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_cards: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_dirty_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) max_clean_run: [ 0.00 99.61 99.61 99.61 100.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_objs: [ 0.00 0.00 0.00 0.00 1257.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_objs: [ 0.00 0.00 0.00 0.00 64.00 ] >> [1211.529s][info][gc,remset ] GC(7069) dirty_scans: [ 0.00 0.00 0.00 0.00 1197.00 ] >> [1211.529s][info][gc,remset ] GC(7069) clean_scans: [ 0.00 0.00 0.00 0.00 17.00 ] >> [1211.529s][info][gc,remset ] GC(7069) alternations: [ 0.00 0.00 0.00 0.00 39.00 ] >> ... >> >> >> The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. The metrics are: >> >> - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread >> - clean_run: as above, but the length of an uninterrupted run of clean cards >> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk >> - max_dirty_run & max_clean_run: Similarly for the maximum of each. >> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned >> - dirty_scans, clean_scans: numbers of objects scanned by the closure >> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk >> >> The data above indicates that at least 75% of the chunks have no alternations at all, >> and cards are almost always mostly clean for this specific benchmark config (extremem). >> >> Comparing worker stats from worker 0 and worker 9 indicates very little difference between >> their statistics, as one might typically expect for well-balanced RS scans. >> >> **Questions:** >> >> 1.
Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics as well, in addition to the quartiles? >> 2. The distributions are per worker for the cumulative history of the run. Would data per RS scan or per Refs Update phase provide more useful information? >> 3. Any suggestions for a more easily consumable format? >> 4. I welcome any other feedback on the pull request. > Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 65 commits: > - Merge branch 'master' into JVM-1264-dependent > - Add a previously missed ticket#. Doing it here rather than in parent to > avoid an otherwise unnecessary re-review touchpoint. > - Merge branch 'stats_merge' into JVM-1264-dependent > - Merge branch 'master' into stats_merge > - jcheck space fix > - Fix compiler error on windows. > - Fix some tier1 tests. > - Remove an unnecessary include, fix some type incorrectness. > - Merge branch 'JVM-1264' into JVM-1264-dependent > - Merge branch 'master' into JVM-1264 > - ... and 55 more: https://git.openjdk.org/shenandoah/compare/bbd4ef34...9c5c741f src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 548: > 546: "Enable statistics collection related to clean & dirty cards") \ > 547: \ > 548: notproduct(int, ShenandoahCardStatsLogInterval, 50, \ This isn't really cycles right? It's number of workers that completed a card scan? ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From ysr at openjdk.org Tue Dec 20 00:27:17 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 20 Dec 2022 00:27:17 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v12] In-Reply-To: References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: On Mon, 19 Dec 2022 23:54:52 GMT, William Kemper wrote: >> Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 65 commits: >> >> - Merge branch 'master' into JVM-1264-dependent >> - Add a previously missed ticket#. Doing it here rather than in parent to >> avoid an otherwise unnecessary re-review touchpoint. >> - Merge branch 'stats_merge' into JVM-1264-dependent >> - Merge branch 'master' into stats_merge >> - jcheck space fix >> - Fix compiler error on windows. >> - Fix some tier1 tests. >> - Remove an unnecessary include, fix some type incorrectness. >> - Merge branch 'JVM-1264' into JVM-1264-dependent >> - Merge branch 'master' into JVM-1264 >> - ... and 55 more: https://git.openjdk.org/shenandoah/compare/bbd4ef34...9c5c741f > > src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 548: > >> 546: "Enable statistics collection related to clean & dirty cards") \ >> 547: \ >> 548: notproduct(int, ShenandoahCardStatsLogInterval, 50, \ > > This isn't really cycles right? It's number of workers that completed a card scan? It's a number of card-scan rounds, independently for either RS scan or Update refs, where a round consists of a RS cycle or Update refs cycle by however many worker threads participate. The logging is oblivious to workers; it simply uses whatever number of workers was used. Let me attach an example log in the PR summary to illustrate, but roughly speaking, it's as follows: ... ... ... ... Every `ShenandoahCardStatsLogInterval` such rounds, we also produce a cumulative historical log across all workers and rounds to date, but one each for RS and UR.
Let me know if that makes sense, and if "cycles" makes sense in the documentation for what I have called "rounds" above. ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From ysr at openjdk.org Tue Dec 20 00:36:14 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 20 Dec 2022 00:36:14 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v12] In-Reply-To: References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: On Tue, 20 Dec 2022 00:24:53 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 548: >> >>> 546: "Enable statistics collection related to clean & dirty cards") \ >>> 547: \ >>> 548: notproduct(int, ShenandoahCardStatsLogInterval, 50, \ >> >> This isn't really cycles right? It's number of workers that completed a card scan? > > It's a number of card-scan rounds, independently for either RS scan or Update refs, where a round consists of a RS cycle or Update refs cycle by however many worker threads participate. The logging is oblivious to workers; it simply uses whatever number of workers was used. > > Let me attach an example log in the PR summary to illustrate, but roughly speaking, it's as follows: > > (start of remembered set (RS) scan) > (end of remembered set scan) > > (log of card stats for this round of RS by worker #1) > ... > (log of card stats for this round by RS worker #k1) > ... > (start of update refs (UR) scan) > > (end of update refs scan) > > (log of card stats for this round of UR by worker #1) > ... > (log of card stats for this round of UR by worker #k2) > ... > > Every `ShenandoahCardStatsLogInterval` such rounds, in addition to the per round, per worker stats like we did above, we also produce cumulative statistics across all workers and all rounds to date, but one each for RS and UR. > > Let me know if that makes sense, and if "cycles" makes sense in the documentation for what I have called "rounds" above. See https://github.com/openjdk/shenandoah/pull/176#issuecomment-1342840919. ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From ysr at openjdk.org Tue Dec 20 00:54:22 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 20 Dec 2022 00:54:22 GMT Subject: RFR: Initial sizing refactor In-Reply-To: References: Message-ID: On Fri, 16 Dec 2022 00:36:41 GMT, William Kemper wrote: > Some things to highlight here: > * This change borrows a bit of code from G1 to handle processing of command line arguments used to size the young generation. > * A (hard coded for now) threshold on the difference between young/old time has been added to reduce resizing churn. > * The adaptive heuristic doesn't consider the `soft_tail` anymore. `available` is already adjusted for the soft max capacity. > * `SoftMaxHeapSize` is used to compute the soft max size and max size for the young generation. src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 995: > 993: shenandoah_assert_heaplocked_or_safepoint(); > 994: #ifdef ASSERT > 995: if (generation_mode() == YOUNG) { Why the special treatment of young here and in the next method? Is that the only one where max capacity matters? I might have expected an assertion oblivious to the youth of a generation, which would simply check upon an increment or a decrement that the floor and ceiling (min and max) capacities of that generation were being respected, irrespective of whether it was a young or old generation?
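Illustratively, such a generation-agnostic check might look something like the following sketch (the names assert_capacity_bounds(), min_capacity() and max_capacity() are hypothetical here, not necessarily what this patch uses):

    // Hypothetical sketch only: bounds-check any capacity change the same way
    // for young, old and global generations alike.
    void ShenandoahGeneration::assert_capacity_bounds(size_t proposed) {
      shenandoah_assert_heaplocked_or_safepoint();
      assert(proposed >= min_capacity(), "generation capacity below its floor");
      assert(proposed <= max_capacity(), "generation capacity above its ceiling");
    }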
------------- PR: https://git.openjdk.org/shenandoah/pull/185 From ysr at openjdk.org Tue Dec 20 01:02:15 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 20 Dec 2022 01:02:15 GMT Subject: RFR: Initial sizing refactor In-Reply-To: References: Message-ID: On Fri, 16 Dec 2022 00:36:41 GMT, William Kemper wrote: > Some things to highlight here: > * This change borrows a bit of code from G1 to handle processing of command line arguments used to size the young generation. > * A (hard coded for now) threshold on the difference between young/old time has been added to reduce resizing churn. > * The adaptive heuristic doesn't consider the `soft_tail` anymore. `available` is already adjusted for the soft max capacity. > * `SoftMaxHeapSize` is used to compute the soft max size and max size for the young generation. src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 459: > 457: _young_generation = new ShenandoahYoungGeneration(_max_workers, max_capacity_young, initial_capacity_young); > 458: _old_generation = new ShenandoahOldGeneration(_max_workers, max_capacity_old, initial_capacity_old); > 459: _global_generation = new ShenandoahGlobalGeneration(_max_workers, soft_max_capacity(), soft_max_capacity()); A single line of comment would be helpful here. It sounds as if the idea is that for the so-called global generation (which I assume is identified with the entirety of the committed heap at any time), the max and initial (ceiling and floor) are both set at `soft_max_capacity`? What does that mean? I might have naively expected these to be, respectively, `max_old + max_young` and `initial_old + initial_young` like you had it before. ------------- PR: https://git.openjdk.org/shenandoah/pull/185 From ysr at openjdk.org Tue Dec 20 01:10:15 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 20 Dec 2022 01:10:15 GMT Subject: RFR: Initial sizing refactor In-Reply-To: References: Message-ID: On Fri, 16 Dec 2022 00:36:41 GMT, William Kemper wrote: > Some things to highlight here: > * This change borrows a bit of code from G1 to handle processing of command line arguments used to size the young generation. > * A (hard coded for now) threshold on the difference between young/old time has been added to reduce resizing churn. > * The adaptive heuristic doesn't consider the `soft_tail` anymore. `available` is already adjusted for the soft max capacity. > * `SoftMaxHeapSize` is used to compute the soft max size and max size for the young generation. Overall I really like these refactorings/changes. I've done a quick overview review and left a few comments/suggestions, but will work through some of the remaining details tomorrow. Thanks! ------------- PR: https://git.openjdk.org/shenandoah/pull/185 From ysr at openjdk.org Tue Dec 20 01:15:14 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 20 Dec 2022 01:15:14 GMT Subject: RFR: Initial sizing refactor In-Reply-To: References: Message-ID: On Fri, 16 Dec 2022 00:36:41 GMT, William Kemper wrote: > Some things to highlight here: > * This change borrows a bit of code from G1 to handle processing of command line arguments used to size the young generation. > * A (hard coded for now) threshold on the difference between young/old time has been added to reduce resizing churn. > * The adaptive heuristic doesn't consider the `soft_tail` anymore. `available` is already adjusted for the soft max capacity. > * `SoftMaxHeapSize` is used to compute the soft max size and max size for the young generation.
src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 85: > 83: > 84: void ShenandoahMmuTracker::record(ShenandoahGeneration* generation) { > 85: // This is only called by the control thread or the VM thread. Would it be worthwhile asserting the calling thread's identity here, just to catch the problem early if someone were to try to do it from a different thread in the future? (although I can't imagine why anyone would.) ------------- PR: https://git.openjdk.org/shenandoah/pull/185 From eosterlund at openjdk.org Tue Dec 20 07:12:29 2022 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 20 Dec 2022 07:12:29 GMT Subject: RFR: 8299072: java_lang_ref_Reference::clear_referent should be GC agnostic Message-ID: The current java_lang_ref_Reference::clear_referent implementation performs a raw reference clear. That doesn't work well with upcoming GC algorithms. It should be made GC agnostic by going through the normal access API. ------------- Commit messages: - 8299072: java_lang_ref_Reference::clear_referent should be GC agnostic Changes: https://git.openjdk.org/jdk/pull/11736/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11736&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8299072 Stats: 8 lines in 5 files changed: 5 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/11736.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11736/head:pull/11736 PR: https://git.openjdk.org/jdk/pull/11736 From dholmes at openjdk.org Tue Dec 20 08:15:50 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 20 Dec 2022 08:15:50 GMT Subject: RFR: 8299072: java_lang_ref_Reference::clear_referent should be GC agnostic In-Reply-To: References: Message-ID: <8je3w2XaNdQEAKx0lLHp2T2UXkOUqEV0ks-2TFL2AJE=.fbf307d0-9833-4465-a914-3a7e9f05d12b@github.com>
------------- PR: https://git.openjdk.org/jdk/pull/11736 From kbarrett at openjdk.org Tue Dec 20 09:15:49 2022 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 20 Dec 2022 09:15:49 GMT Subject: RFR: 8299072: java_lang_ref_Reference::clear_referent should be GC agnostic In-Reply-To: References: Message-ID: On Tue, 20 Dec 2022 07:05:34 GMT, Erik ?sterlund wrote: > The current java_lang_ref_Reference::clear_referent implementation performs a raw reference clear. That doesn't work well with upcoming GC algorithms. It should be made GC agnostic by going through the normal access API. Looks good. Mea culpa I think. ------------- PR: https://git.openjdk.org/jdk/pull/11736 From eosterlund at openjdk.org Tue Dec 20 09:20:49 2022 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 20 Dec 2022 09:20:49 GMT Subject: RFR: 8299072: java_lang_ref_Reference::clear_referent should be GC agnostic In-Reply-To: References: Message-ID: On Tue, 20 Dec 2022 09:12:48 GMT, Kim Barrett wrote: >> The current java_lang_ref_Reference::clear_referent implementation performs a raw reference clear. That doesn't work well with upcoming GC algorithms. It should be made GC agnostic by going through the normal access API. > > Looks good. Mea culpa I think. Thanks for the review, @kimbarrett! ------------- PR: https://git.openjdk.org/jdk/pull/11736 From wkemper at openjdk.org Tue Dec 20 19:19:23 2022 From: wkemper at openjdk.org (William Kemper) Date: Tue, 20 Dec 2022 19:19:23 GMT Subject: RFR: Initial sizing refactor In-Reply-To: References: Message-ID: <0aHvoWKYrsxlTTZzxXIC6phNFpSSFvC6U28Wtn5OKA8=.3a0050b6-b648-47c3-a4f3-93fbaae3b222@github.com> On Tue, 20 Dec 2022 00:59:34 GMT, Y. Srinivas Ramakrishna wrote: >> Some things to highlight here: >> * This change borrows a bit of code from G1 to handle processing of command line arguments used to size the young generation. >> * A (hard coded for now) threshold on the difference between young/old time has been added to reduce resizing churn. >> * The adaptive heuristic doesn't consider the `soft_tail` anymore. `available` is already adjusted for the soft max capacity. >> * `SoftMaxHeapSize` is used to compute the soft max size and max size for the young generation. > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 459: > >> 457: _young_generation = new ShenandoahYoungGeneration(_max_workers, max_capacity_young, initial_capacity_young); >> 458: _old_generation = new ShenandoahOldGeneration(_max_workers, max_capacity_old, initial_capacity_old); >> 459: _global_generation = new ShenandoahGlobalGeneration(_max_workers, soft_max_capacity(), soft_max_capacity()); > > A single line of comment here would be helpful here. It sounds as if the idea is that for the so-called global generation (which I assume is identified with the entirety of the committed heap at any time), the initial and max (floor and ceiling) are both set at `soft_max_capacity` ? What does that mean? I might have naively expected this to be, respectivley, `max_old + max_young` and `initial_old + initial_young` like you had it before. I've been thinking of the max capacity as the maximum _allowed_ capacity. For example, the maximum _allowed_ capacity for old would be `total heap - minimum capacity of young`. So, the sum of the maximum allowed for old and young could exceed the total. If that makes sense, I will put the explanation in a comment here. 
------------- PR: https://git.openjdk.org/shenandoah/pull/185 From redestad at openjdk.org Tue Dec 20 19:57:55 2022 From: redestad at openjdk.org (Claes Redestad) Date: Tue, 20 Dec 2022 19:57:55 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: <_2HYrbXOe6zVuLXHywoy_AjCcGMYR266BcwKUZEA5fs=.1e6f7640-7580-4ff3-ace6-f18f27efbb23@github.com> References: <_2HYrbXOe6zVuLXHywoy_AjCcGMYR266BcwKUZEA5fs=.1e6f7640-7580-4ff3-ace6-f18f27efbb23@github.com> Message-ID: On Fri, 16 Dec 2022 22:58:23 GMT, Sandhya Viswanathan wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Missing & 0xff in StringLatin1::hashCode > > src/hotspot/cpu/x86/vm_version_x86.cpp line 1671: > >> 1669: } >> 1670: if (UseAVX >= 2) { >> 1671: FLAG_SET_ERGO_IF_DEFAULT(UseVectorizedHashCodeIntrinsic, true); > > This could be just FLAG_SET_DEFAULT instead of FLAG_SET_ERGO_IF_DEFAULT. Right, it seems HW-dependent intrinsics in generally doesn't mark that they've been enabled ergonomically, rather just make it on "by default" when support is available. > src/java.base/share/classes/java/lang/StringUTF16.java line 418: > >> 416: return 0; >> 417: } else { >> 418: return ArraysSupport.vectorizedHashCode(value, ArraysSupport.UTF16); > > Special case for 1 missing here. Intentionally left out. Array length is always even for `UTF16` arrays, but we could add a case for `2` that'd return `getChar(bytes, 0)` but I didn't see much of a win when I tested this. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From wkemper at openjdk.org Tue Dec 20 20:12:39 2022 From: wkemper at openjdk.org (William Kemper) Date: Tue, 20 Dec 2022 20:12:39 GMT Subject: RFR: Initial sizing refactor [v2] In-Reply-To: References: Message-ID: > Some things to highlight here: > * This change borrows a bit of code from G1 to handle processing of command line arguments used to size the young generation. > * A (hard coded for now) threshold on the difference between young/old time has been added to reduce resizing churn. > * The adaptive heuristic doesn't consider the `soft_tail` anymore. `available` is already adjusted for the soft max capacity. > * `SoftMaxHeapSize` is used to compute the soft max size and max size for the young generation. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Improve assertions and comments ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/185/files - new: https://git.openjdk.org/shenandoah/pull/185/files/193f0975..30caeadc Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=185&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=185&range=00-01 Stats: 46 lines in 5 files changed: 34 ins; 9 del; 3 mod Patch: https://git.openjdk.org/shenandoah/pull/185.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/185/head:pull/185 PR: https://git.openjdk.org/shenandoah/pull/185 From redestad at openjdk.org Tue Dec 20 20:21:57 2022 From: redestad at openjdk.org (Claes Redestad) Date: Tue, 20 Dec 2022 20:21:57 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: <_2HYrbXOe6zVuLXHywoy_AjCcGMYR266BcwKUZEA5fs=.1e6f7640-7580-4ff3-ace6-f18f27efbb23@github.com> References: <_2HYrbXOe6zVuLXHywoy_AjCcGMYR266BcwKUZEA5fs=.1e6f7640-7580-4ff3-ace6-f18f27efbb23@github.com> Message-ID: <_h335iIGqDY-NVIC2k0TYzwb6gZS06ynM76d4-nJaUk=.eb491368-9c6f-4edd-8527-ef8f28c45d20@github.com> On Fri, 16 Dec 2022 23:00:53 GMT, Sandhya Viswanathan wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Missing & 0xff in StringLatin1::hashCode > > src/hotspot/cpu/x86/stubRoutines_x86.cpp line 230: > >> 228: #endif // _LP64 >> 229: >> 230: jint StubRoutines::x86::_arrays_hashcode_powers_of_31[] = > > This should be declared only for LP64. Hmm, I guess same goes for all the new `arrays_hashcode` methods in `c2_MacroAssembler_x86`, since we only wire this up properly on 64-bit. I'll make a pass to put `_LP64` guards around all new methods. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Tue Dec 20 21:11:40 2022 From: redestad at openjdk.org (Claes Redestad) Date: Tue, 20 Dec 2022 21:11:40 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v14] In-Reply-To: References: Message-ID: > Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. > > I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. > > Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. > > The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. > > With the most recent fixes the x64 intrinsic results on my workstation look like this: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ± 0.017 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ± 0.049 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ± 0.221 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ±
7.020 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ± 0.013 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ± 0.122 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ± 0.512 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ± 67.630 ns/op > > I.e. no measurable overhead compared to baseline even for `size == 1`. > > The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. > > Benchmark for `Arrays.hashCode`: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 1.884 ± 0.013 ns/op > ArraysHashCode.bytes 10 avgt 5 6.955 ± 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 87.218 ± 0.595 ns/op > ArraysHashCode.bytes 10000 avgt 5 9419.591 ± 38.308 ns/op > ArraysHashCode.chars 1 avgt 5 2.200 ± 0.010 ns/op > ArraysHashCode.chars 10 avgt 5 6.935 ± 0.034 ns/op > ArraysHashCode.chars 100 avgt 5 30.216 ± 0.134 ns/op > ArraysHashCode.chars 10000 avgt 5 1601.629 ± 6.418 ns/op > ArraysHashCode.ints 1 avgt 5 2.200 ± 0.007 ns/op > ArraysHashCode.ints 10 avgt 5 6.936 ± 0.034 ns/op > ArraysHashCode.ints 100 avgt 5 29.412 ± 0.268 ns/op > ArraysHashCode.ints 10000 avgt 5 1610.578 ± 7.785 ns/op > ArraysHashCode.shorts 1 avgt 5 1.885 ± 0.012 ns/op > ArraysHashCode.shorts 10 avgt 5 6.961 ± 0.034 ns/op > ArraysHashCode.shorts 100 avgt 5 87.095 ± 0.417 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.617 ± 50.089 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 3.213 ± 0.207 ns/op > ArraysHashCode.bytes 10 avgt 5 8.483 ± 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 90.315 ± 0.655 ns/op > ArraysHashCode.bytes 10000 avgt 5 9422.094 ± 62.402 ns/op > ArraysHashCode.chars 1 avgt 5 3.040 ± 0.066 ns/op > ArraysHashCode.chars 10 avgt 5 8.497 ± 0.074 ns/op > ArraysHashCode.chars 100 avgt 5 90.074 ± 0.387 ns/op > ArraysHashCode.chars 10000 avgt 5 9420.474 ± 41.619 ns/op > ArraysHashCode.ints 1 avgt 5 2.827 ± 0.019 ns/op > ArraysHashCode.ints 10 avgt 5 7.727 ± 0.043 ns/op > ArraysHashCode.ints 100 avgt 5 89.405 ± 0.593 ns/op > ArraysHashCode.ints 10000 avgt 5 9426.539 ± 51.308 ns/op > ArraysHashCode.shorts 1 avgt 5 3.071 ± 0.062 ns/op > ArraysHashCode.shorts 10 avgt 5 8.168 ± 0.049 ns/op > ArraysHashCode.shorts 100 avgt 5 90.399 ± 0.292 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.171 ± 44.474 ns/op > > > As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 64 commits: - Pass the constant mode node through, removing need for all but one instruct declarations - FLAG_SET_DEFAULT - Merge branch 'master' into 8282664-polyhash - Merge branch 'master' into 8282664-polyhash - Missing & 0xff in StringLatin1::hashCode - Qualified guess on shenandoahSupport fix-up - Whitespace - Final touch-ups, restored 2-stride with dependency chain breakage - Minor cleanup - Revert accidental ModuleHashes change - ...
and 54 more: https://git.openjdk.org/jdk/compare/8dfb6d76...c9e7c561 ------------- Changes: https://git.openjdk.org/jdk/pull/10847/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=13 Stats: 1021 lines in 33 files changed: 962 ins; 8 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/10847.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847 PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Tue Dec 20 21:13:55 2022 From: redestad at openjdk.org (Claes Redestad) Date: Tue, 20 Dec 2022 21:13:55 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: References: <6lAQI6kDDTGbskylHcWReX8ExaB6qkwgqoai7E6ikZY=.8a69a63c-453d-4bbd-8c76-4d477bfb77fe@github.com> Message-ID: On Mon, 14 Nov 2022 18:28:53 GMT, Vladimir Ivanov wrote: >>> Also, I'd like to note that C2 auto-vectorization support is not too far away from being able to optimize hash code computations. At some point, I was able to achieve some promising results with modest tweaking of SuperWord pass: https://github.com/iwanowww/jdk/blob/superword/notes.txt http://cr.openjdk.java.net/~vlivanov/superword.reduction/webrev.00/ >> >> Intriguing. How far off is this - and do you think it'll be able to match the efficiency we see here with a memoized coefficient table etc? >> >> If we turn this intrinsic into a stub we might also be able to reuse the optimization in other places, including from within the VM (calculating String hashCodes happen in a couple of places, including String deduplication). So I think there are still a few compelling reasons to go the manual route and continue on this path. > >> How far off is this ...? > > Back then it looked way too constrained (tight constraints on code shapes). But I considered it as a generally applicable optimization. > >> ... do you think it'll be able to match the efficiency we see here with a memoized coefficient table etc? > > Yes, it is able to build the constant table at runtime when folding multiplications of constant coefficients produced during loop unrolling and then packing scalars into a constant vector. > > Moreover, briefly looking at the code shape, the vectorizer would produce a more optimal loop shape (pre-loop would align vector accesses and would use 512-bit vectors when available; vector post-loop could help as well). Passing the constant node through as an input as suggested by @iwanowww and @sviswa7 meant we could eliminate most of the `instruct` blocks, removing a significant chunk of code and a little bit of complexity from the proposed patch. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From dholmes at openjdk.org Tue Dec 20 21:21:51 2022 From: dholmes at openjdk.org (David Holmes) Date: Tue, 20 Dec 2022 21:21:51 GMT Subject: RFR: 8299072: java_lang_ref_Reference::clear_referent should be GC agnostic In-Reply-To: References: Message-ID: On Tue, 20 Dec 2022 07:05:34 GMT, Erik ?sterlund wrote: > The current java_lang_ref_Reference::clear_referent implementation performs a raw reference clear. That doesn't work well with upcoming GC algorithms. It should be made GC agnostic by going through the normal access API. Okay - thanks for the explanation. Looks good. ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/11736 From kdnilsen at openjdk.org Tue Dec 20 21:48:31 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 20 Dec 2022 21:48:31 GMT Subject: Integrated: Shrink tlab to capacity In-Reply-To: References: Message-ID: On Fri, 9 Dec 2022 23:23:43 GMT, Kelvin Nilsen wrote: > When a TLAB request exceeds the currently available memory within young-gen, the existing behavior is to reject the TLAB request outright. This is recognized as a failed allocation request, which triggers degenerated GC. > > This change introduces code to reduce the likelihood that too-large TLAB requests will be issued, and when they are issued, it makes an effort to shrink the TLAB request in order to reduce the need for degenerated GC. > > The impact is difficult to measure because this situation is fairly rare. On one Extremem workload, the TLAB-shrinking code is exercised only once during a 16-minute run involving 500 concurrent GCs, a 45 GiB heap, and a 28 GiB young-gen size. The change reduces the degenerated GCs from 6 to 5. > > One reason that the remaining 5 degenerated GCs are not addressed by this change is that further work is required to handle a situation in which a requested TLAB is smaller than the available young-gen memory, but available memory is set aside in the evacuation reserve so cannot be provided to a mutator. Future work will address this condition. This pull request has now been integrated. Changeset: 9114616c Author: Kelvin Nilsen URL: https://git.openjdk.org/shenandoah/commit/9114616c01bdeeddad50bec93869decee90f5a58 Stats: 201 lines in 2 files changed: 88 ins; 39 del; 74 mod Shrink tlab to capacity Reviewed-by: ysr, wkemper ------------- PR: https://git.openjdk.org/shenandoah/pull/180 From sviswanathan at openjdk.org Wed Dec 21 00:14:54 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 21 Dec 2022 00:14:54 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: References: <6lAQI6kDDTGbskylHcWReX8ExaB6qkwgqoai7E6ikZY=.8a69a63c-453d-4bbd-8c76-4d477bfb77fe@github.com> Message-ID: On Tue, 20 Dec 2022 21:11:18 GMT, Claes Redestad wrote: >>> How far off is this ...? >> >> Back then it looked way too constrained (tight constraints on code shapes). But I considered it as a generally applicable optimization. >> >>> ... do you think it'll be able to match the efficiency we see here with a memoized coefficient table etc? >> >> Yes, it is able to build the constant table at runtime when folding multiplications of constant coefficients produced during loop unrolling and then packing scalars into a constant vector. >> >> Moreover, briefly looking at the code shape, the vectorizer would produce a more optimal loop shape (pre-loop would align vector accesses and would use 512-bit vectors when available; vector post-loop could help as well). > > Passing the constant node through as an input as suggested by @iwanowww and @sviswa7 meant we could eliminate most of the `instruct` blocks, removing a significant chunk of code and a little bit of complexity from the proposed patch. @cl4es Thanks for passing the constant node through, the code looks much cleaner now. The attached patch should handle the signed bytes/shorts as well. Please take a look. 
[signed.patch](https://github.com/openjdk/jdk/files/10273480/signed.patch) ------------- PR: https://git.openjdk.org/jdk/pull/10847 From sviswanathan at openjdk.org Wed Dec 21 01:58:59 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 21 Dec 2022 01:58:59 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v14] In-Reply-To: References: Message-ID: <-qoflnp34219qc7cA_xaazdxkbFkEOzdZfCbOeYPCxA=.5f57ad5e-a099-425d-81e1-87e2eda09cf2@github.com> On Tue, 20 Dec 2022 21:11:40 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. >> >> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. >> >> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. >> >> With the most recent fixes the x64 intrinsic results on my workstation look like this: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ± 0.017 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ± 0.049 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ± 0.221 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ± 7.020 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ± 0.013 ns/op >> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ± 0.122 ns/op >> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ± 0.512 ns/op >> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ± 67.630 ns/op >> >> I.e. no measurable overhead compared to baseline even for `size == 1`. >> >> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. >> >> Benchmark for `Arrays.hashCode`: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 1.884 ± 0.013 ns/op >> ArraysHashCode.bytes 10 avgt 5 6.955 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 87.218 ± 0.595 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9419.591 ± 38.308 ns/op >> ArraysHashCode.chars 1 avgt 5 2.200 ± 0.010 ns/op >> ArraysHashCode.chars 10 avgt 5 6.935 ± 0.034 ns/op >> ArraysHashCode.chars 100 avgt 5 30.216 ± 0.134 ns/op >> ArraysHashCode.chars 10000 avgt 5 1601.629 ± 6.418 ns/op >> ArraysHashCode.ints 1 avgt 5 2.200 ± 0.007 ns/op >> ArraysHashCode.ints 10 avgt 5 6.936 ± 0.034 ns/op >> ArraysHashCode.ints 100 avgt 5 29.412 ± 0.268 ns/op >> ArraysHashCode.ints 10000 avgt 5 1610.578 ± 7.785 ns/op >> ArraysHashCode.shorts 1 avgt 5 1.885 ± 0.012 ns/op >> ArraysHashCode.shorts 10 avgt 5 6.961 ± 0.034 ns/op >> ArraysHashCode.shorts 100 avgt 5 87.095 ± 0.417 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.617 ±
50.089 ns/op >> >> Baseline: >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 3.213 ± 0.207 ns/op >> ArraysHashCode.bytes 10 avgt 5 8.483 ± 0.040 ns/op >> ArraysHashCode.bytes 100 avgt 5 90.315 ± 0.655 ns/op >> ArraysHashCode.bytes 10000 avgt 5 9422.094 ± 62.402 ns/op >> ArraysHashCode.chars 1 avgt 5 3.040 ± 0.066 ns/op >> ArraysHashCode.chars 10 avgt 5 8.497 ± 0.074 ns/op >> ArraysHashCode.chars 100 avgt 5 90.074 ± 0.387 ns/op >> ArraysHashCode.chars 10000 avgt 5 9420.474 ± 41.619 ns/op >> ArraysHashCode.ints 1 avgt 5 2.827 ± 0.019 ns/op >> ArraysHashCode.ints 10 avgt 5 7.727 ± 0.043 ns/op >> ArraysHashCode.ints 100 avgt 5 89.405 ± 0.593 ns/op >> ArraysHashCode.ints 10000 avgt 5 9426.539 ± 51.308 ns/op >> ArraysHashCode.shorts 1 avgt 5 3.071 ± 0.062 ns/op >> ArraysHashCode.shorts 10 avgt 5 8.168 ± 0.049 ns/op >> ArraysHashCode.shorts 100 avgt 5 90.399 ± 0.292 ns/op >> ArraysHashCode.shorts 10000 avgt 5 9420.171 ± 44.474 ns/op >> >> >> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. > Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 64 commits: > - Pass the constant mode node through, removing need for all but one instruct declarations > - FLAG_SET_DEFAULT > - Merge branch 'master' into 8282664-polyhash > - Merge branch 'master' into 8282664-polyhash > - Missing & 0xff in StringLatin1::hashCode > - Qualified guess on shenandoahSupport fix-up > - Whitespace > - Final touch-ups, restored 2-stride with dependency chain breakage > - Minor cleanup > - Revert accidental ModuleHashes change > - ... and 54 more: https://git.openjdk.org/jdk/compare/8dfb6d76...c9e7c561 src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3420: > 3418: arrays_hashcode_elload(tmp3, Address(ary1, index, Address::times(elsize), -elsize), eltype, is_string_hashcode); > 3419: addl(result, tmp3); > 3420: jmp(END); This jmp can be removed. ------------- PR: https://git.openjdk.org/jdk/pull/10847 From sviswanathan at openjdk.org Wed Dec 21 01:59:01 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 21 Dec 2022 01:59:01 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13] In-Reply-To: References: <_2HYrbXOe6zVuLXHywoy_AjCcGMYR266BcwKUZEA5fs=.1e6f7640-7580-4ff3-ace6-f18f27efbb23@github.com> Message-ID: <2l2l7-2EKQr3UORegtVHtWN0uf9AUH8awtUPvC0MfS0=.8f94982f-f399-4ffd-b494-117cb73bf606@github.com> On Tue, 20 Dec 2022 19:52:34 GMT, Claes Redestad wrote: >> src/java.base/share/classes/java/lang/StringUTF16.java line 418: >> >>> 416: return 0; >>> 417: } else { >>> 418: return ArraysSupport.vectorizedHashCode(value, ArraysSupport.UTF16); >> >> Special case for 1 missing here. > > Intentionally left out. Array length is always even for `UTF16` arrays. We could add a case for `2` that'd return `getChar(bytes, 0)`, but I didn't see much of a win when I tested this.
I do see a 1.5x gain with this special case added:

    return switch (value.length) {
        case 0 -> 0;
        case 2 -> getChar(value, 0);
        default -> ArraysSupport.vectorizedHashCode(value, ArraysSupport.UTF16);
    };

before: 0.987 ns/op
after: 0.640 ns/op

------------- PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Wed Dec 21 17:01:17 2022 From: redestad at openjdk.org (Claes Redestad) Date: Wed, 21 Dec 2022 17:01:17 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v15] In-Reply-To: References: Message-ID: > Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. > > I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. > > Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. > > The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads. > > With the most recent fixes the x64 intrinsic results on my workstation look like this: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ± 0.017 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ± 0.049 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ± 0.221 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ± 7.020 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ± 0.013 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ± 0.122 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ± 0.512 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ± 67.630 ns/op > > I.e. no measurable overhead compared to baseline even for `size == 1`. > > The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. > > Benchmark for `Arrays.hashCode`: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 1.884 ± 0.013 ns/op > ArraysHashCode.bytes 10 avgt 5 6.955 ± 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 87.218 ± 0.595 ns/op > ArraysHashCode.bytes 10000 avgt 5 9419.591 ± 38.308 ns/op > ArraysHashCode.chars 1 avgt 5 2.200 ± 0.010 ns/op > ArraysHashCode.chars 10 avgt 5 6.935 ± 0.034 ns/op > ArraysHashCode.chars 100 avgt 5 30.216 ± 0.134 ns/op > ArraysHashCode.chars 10000 avgt 5 1601.629 ± 6.418 ns/op > ArraysHashCode.ints 1 avgt 5 2.200 ± 0.007 ns/op > ArraysHashCode.ints 10 avgt 5 6.936 ± 0.034 ns/op > ArraysHashCode.ints 100 avgt 5 29.412 ± 0.268 ns/op > ArraysHashCode.ints 10000 avgt 5 1610.578 ± 7.785 ns/op > ArraysHashCode.shorts 1 avgt 5 1.885 ± 0.012 ns/op > ArraysHashCode.shorts 10 avgt 5 6.961 ± 0.034 ns/op > ArraysHashCode.shorts 100 avgt 5 87.095 ± 0.417 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.617 ± 50.089 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 3.213 ±
0.207 ns/op > ArraysHashCode.bytes 10 avgt 5 8.483 ± 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 90.315 ± 0.655 ns/op > ArraysHashCode.bytes 10000 avgt 5 9422.094 ± 62.402 ns/op > ArraysHashCode.chars 1 avgt 5 3.040 ± 0.066 ns/op > ArraysHashCode.chars 10 avgt 5 8.497 ± 0.074 ns/op > ArraysHashCode.chars 100 avgt 5 90.074 ± 0.387 ns/op > ArraysHashCode.chars 10000 avgt 5 9420.474 ± 41.619 ns/op > ArraysHashCode.ints 1 avgt 5 2.827 ± 0.019 ns/op > ArraysHashCode.ints 10 avgt 5 7.727 ± 0.043 ns/op > ArraysHashCode.ints 100 avgt 5 89.405 ± 0.593 ns/op > ArraysHashCode.ints 10000 avgt 5 9426.539 ± 51.308 ns/op > ArraysHashCode.shorts 1 avgt 5 3.071 ± 0.062 ns/op > ArraysHashCode.shorts 10 avgt 5 8.168 ± 0.049 ns/op > ArraysHashCode.shorts 100 avgt 5 90.399 ± 0.292 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.171 ± 44.474 ns/op > > > As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement. Claes Redestad has updated the pull request incrementally with two additional commits since the last revision: - Handle signed subword arrays, contributed by @sviswa7 - @sviswa7 comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10847/files - new: https://git.openjdk.org/jdk/pull/10847/files/c9e7c561..16733c4d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=13-14 Stats: 51 lines in 3 files changed: 36 ins; 6 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/10847.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847 PR: https://git.openjdk.org/jdk/pull/10847 From redestad at openjdk.org Wed Dec 21 17:04:07 2022 From: redestad at openjdk.org (Claes Redestad) Date: Wed, 21 Dec 2022 17:04:07 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v14] In-Reply-To: <-qoflnp34219qc7cA_xaazdxkbFkEOzdZfCbOeYPCxA=.5f57ad5e-a099-425d-81e1-87e2eda09cf2@github.com> References: <-qoflnp34219qc7cA_xaazdxkbFkEOzdZfCbOeYPCxA=.5f57ad5e-a099-425d-81e1-87e2eda09cf2@github.com> Message-ID: On Wed, 21 Dec 2022 01:02:35 GMT, Sandhya Viswanathan wrote: >> Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 64 commits: >> >> - Pass the constant mode node through, removing need for all but one instruct declarations >> - FLAG_SET_DEFAULT >> - Merge branch 'master' into 8282664-polyhash >> - Merge branch 'master' into 8282664-polyhash >> - Missing & 0xff in StringLatin1::hashCode >> - Qualified guess on shenandoahSupport fix-up >> - Whitespace >> - Final touch-ups, restored 2-stride with dependency chain breakage >> - Minor cleanup >> - Revert accidental ModuleHashes change >> - ... and 54 more: https://git.openjdk.org/jdk/compare/8dfb6d76...c9e7c561 > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 3420: > >> 3418: arrays_hashcode_elload(tmp3, Address(ary1, index, Address::times(elsize), -elsize), eltype, is_string_hashcode); >> 3419: addl(result, tmp3); >> 3420: jmp(END); > > This jmp can be removed. Ok, special-cased for `value.length == 2`, removed the superfluous `jmp`, and committed your patch to implement the vectorization for `short`s and signed `byte`s.
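For context, the scalar idea behind `_arrays_hashcode_powers_of_31` is that with memoized powers of 31 the hash loop can consume 8 elements per step using mutually independent multiply-adds. A self-contained sketch of that shape (illustrative C++ only; the real code is a C2 intrinsic, and the POW31 values below are assumed reductions mod 2^32):

    #include <cstdint>

    // 31^7 .. 31^0, reduced mod 2^32 (31^7 wraps in 32 bits).
    static const uint32_t POW31[8] = {
      1742810335u, 887503681u, 28629151u, 923521u, 29791u, 961u, 31u, 1u
    };

    // One 8-way step: h = h*31^8 + a[0]*31^7 + ... + a[7]*31^0.
    // The eight products do not depend on each other, which is what makes
    // this shape profitable to vectorize.
    static int32_t hash8(int32_t h, const uint16_t* a) {
      uint32_t next = (uint32_t)h * (POW31[0] * 31u);  // h * 31^8 (mod 2^32)
      for (int k = 0; k < 8; k++) {
        next += a[k] * POW31[k];
      }
      return (int32_t)next;
    }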
------------- PR: https://git.openjdk.org/jdk/pull/10847 From ysr at openjdk.org Wed Dec 21 17:16:30 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 21 Dec 2022 17:16:30 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v12] In-Reply-To: References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: On Tue, 20 Dec 2022 00:33:29 GMT, Y. Srinivas Ramakrishna wrote: >> It's a number of card-scan rounds, independently for either RS scan or Update refs, where a round consists of a RS cycle or Update refs cycle by however many worker threads participate. The logging is oblivious to workers; it simply uses whatever number of workers was used. >> >> Let me attach an example log in the PR summary to illustrate, but roughly speaking, it's as follows: >> >> (start of remembered set (RS) scan) >> (end of remembered set scan) >> >> (log of card stats for this round of RS by worker #1) >> ... >> (log of card stats for this round by RS worker #k1) >> ... >> (start of update refs (UR) scan) >> >> (end of update refs scan) >> >> (log of card stats for this round of UR by worker #1) >> ... >> (log of card stats for this round of UR by worker #k2) >> ... >> >> Every `ShenandoahCardStatsLogInterval` such rounds, in addition to the per round, per worker stats like we did above, we also produce cumulative statistics across all workers and all rounds to date, but one each for RS and UR. >> >> Let me know if that makes sense, and if "cycles" makes sense in the documentation for what I have called "rounds" above. > > See https://github.com/openjdk/shenandoah/pull/176#issuecomment-1342840919. I updated the summary comment at the top of the PR at https://github.com/openjdk/shenandoah/pull/176#issue-1471869802 with the new format based on Kelvin's suggestion in https://github.com/openjdk/shenandoah/pull/176#issuecomment-1342840919 above. It shows that the per round stats, separated by worker and scan type (UR or RS), are needed, and that the cumulative stats may have lost some of the nuance present in the per round/per scan type stats. ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From redestad at openjdk.org Wed Dec 21 17:29:23 2022 From: redestad at openjdk.org (Claes Redestad) Date: Wed, 21 Dec 2022 17:29:23 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v16] In-Reply-To: References: Message-ID: > Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. > > I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases. > > Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. > > The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
> With the most recent fixes the x64 intrinsic results on my workstation look like this: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ± 0.017 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ± 0.049 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ± 0.221 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ± 7.020 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ± 0.013 ns/op > StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ± 0.122 ns/op > StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ± 0.512 ns/op > StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ± 67.630 ns/op > > I.e. no measurable overhead compared to baseline even for `size == 1`. > > The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good. > > Benchmark for `Arrays.hashCode`: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 1.884 ± 0.013 ns/op > ArraysHashCode.bytes 10 avgt 5 6.955 ± 0.040 ns/op > ArraysHashCode.bytes 100 avgt 5 87.218 ± 0.595 ns/op > ArraysHashCode.bytes 10000 avgt 5 9419.591 ± 38.308 ns/op > ArraysHashCode.chars 1 avgt 5 2.200 ± 0.010 ns/op > ArraysHashCode.chars 10 avgt 5 6.935 ± 0.034 ns/op > ArraysHashCode.chars 100 avgt 5 30.216 ± 0.134 ns/op > ArraysHashCode.chars 10000 avgt 5 1601.629 ± 6.418 ns/op > ArraysHashCode.ints 1 avgt 5 2.200 ± 0.007 ns/op > ArraysHashCode.ints 10 avgt 5 6.936 ± 0.034 ns/op > ArraysHashCode.ints 100 avgt 5 29.412 ± 0.268 ns/op > ArraysHashCode.ints 10000 avgt 5 1610.578 ± 7.785 ns/op > ArraysHashCode.shorts 1 avgt 5 1.885 ± 0.012 ns/op > ArraysHashCode.shorts 10 avgt 5 6.961 ± 0.034 ns/op > ArraysHashCode.shorts 100 avgt 5 87.095 ± 0.417 ns/op > ArraysHashCode.shorts 10000 avgt 5 9420.617 ± 50.089 ns/op > > Baseline: > > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 3.213 ±
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Treat Op_VectorizedHashCode as other similar Ops in split_unique_types ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10847/files - new: https://git.openjdk.org/jdk/pull/10847/files/16733c4d..62e98e1b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10847&range=14-15 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10847.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10847/head:pull/10847 PR: https://git.openjdk.org/jdk/pull/10847 From ysr at openjdk.org Wed Dec 21 17:29:55 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 21 Dec 2022 17:29:55 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v13] In-Reply-To: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: > **Updated 12/21** > > **Summary:** > The main change is card stats collection during remembered set (RS) and update refs (UR) phases when the card-table is scanned. The code is protected by a new non-product-only flag `ShenandoahEnableCardStats`, which is on by default in debug builds and off in the optimized build. > > We tested the impact of the code with the flag enabled in product mode and felt the impact was non-trivial. We might, in the future, enable the code in product mode if performance can be improved. > > Stats are logged per worker thread at the end of each RS and UR scan. These stats are specific to the most recent round of scanning. Global cumulative stats across all threads (but specific to RS or UR) are also maintained, and these are logged at periodic intervals as determined by the setting of `ShenandoahCardStatsLogInterval`. > > **Format of stats produced and how to interpret them: (sample)** > > The following format is an example from a slowdebug run where the logging is enabled. In this case there are 2 concurrent GC worker threads, and `ShenandoahCardStatsLogInterval` was set at 2. The first two logs show the per-worker stats for those particular scans for each of the two worker threads; the next set shows the same per-worker, per-scan stats, with each scan also followed (because the log interval of 2 has been reached) by cumulative stats for that type of scan (RS or UR) across all workers and all scans of that type.
> > > [560.766s][info][gc,remset ] GC(13) Scan Remembered Set > [560.766s][info][gc,remset ] GC(13) Worker 0 Card Stats Histo: > [560.766s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 53.12 ] > [560.766s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 99.61 99.61 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 53.12 ] > [560.766s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 99.61 99.61 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 53.12 ] > [560.766s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 818.36 1366.00 ] > [560.766s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 8.00 ] > [560.766s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 705.08 1365.00 ] > [560.766s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [560.766s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 16.00 ] > [560.766s][info][gc,remset ] GC(13) Worker 1 Card Stats Histo: > [560.766s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 96.88 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 46.88 ] > [560.766s][info][gc,remset ] GC(13) dirty_cards: [ 18.75 82.81 98.44 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 46.88 ] > [560.766s][info][gc,remset ] GC(13) max_dirty_run: [ 18.75 82.81 98.44 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 46.88 ] > [560.766s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 269.53 353.52 814.45 1366.00 ] > [560.766s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 3.00 ] > [560.766s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 263.67 351.56 671.88 1365.00 ] > [560.766s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [560.766s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 3.00 ] > [560.766s][info][gc ] GC(13) Concurrent remembered set scanning 1150.359ms > [560.766s][info][gc,start ] GC(13) Concurrent marking roots > ... 
> [585.433s][info][gc ] GC(13) Concurrent evacuation 6225.829ms > [585.433s][info][gc,start ] GC(13) Pause Init Update Refs > [585.434s][info][gc ] GC(13) Pause Init Update Refs 0.264ms > [585.434s][info][gc,start ] GC(13) Concurrent update references > [585.434s][info][gc,task ] GC(13) Using 2 of 4 workers for concurrent reference update > [585.567s][info][gc ] Average MMU = 2.925 > [590.583s][info][gc ] Average MMU = 1.509 > [595.600s][info][gc ] Average MMU = 0.835 > [600.618s][info][gc ] Average MMU = 0.447 > [605.635s][info][gc ] Average MMU = 0.253 > [610.651s][info][gc ] Average MMU = 0.114 > [615.669s][info][gc ] Average MMU = 0.130 > [620.686s][info][gc ] Average MMU = 0.129 > [622.209s][info][gc,remset ] GC(13) Update Refs > [622.209s][info][gc,remset ] GC(13) Worker 0 Card Stats Histo: > [622.209s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [622.209s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [622.209s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 3.12 50.00 99.61 100.00 ] > [622.209s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 26.56 92.19 100.00 ] > [622.209s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 1.56 29.69 99.61 100.00 ] > [622.209s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 9.38 70.31 100.00 ] > [622.209s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 50.00 1366.00 ] > [622.209s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 3.98 54.88 64.00 ] > [622.209s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 33.98 1365.00 ] > [622.209s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 1.00 16.00 ] > [622.209s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 2.99 33.00 ] > [622.209s][info][gc,remset ] GC(13) Worker 1 Card Stats Histo: > [622.209s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.209s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc ] GC(13) Concurrent update references 36776.258ms > ... 
> [627.626s][info][gc,remset ] GC(15) Scan Remembered Set
> [627.626s][info][gc,remset ] GC(15) Worker 0 Card Stats Histo:
> [627.626s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 1.56 100.00 ]
> [627.626s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 4.69 100.00 ]
> [627.626s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 0.00 6.25 32.81 100.00 ]
> [627.626s][info][gc,remset ] GC(15) clean_cards: [ 0.00 48.44 90.62 98.44 100.00 ]
> [627.626s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 0.00 3.12 15.62 100.00 ]
> [627.626s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 23.44 60.94 95.31 100.00 ]
> [627.626s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 45.90 164.06 1366.00 ]
> [627.626s][info][gc,remset ] GC(15) clean_objs: [ 0.00 11.91 53.91 60.94 63.00 ]
> [627.626s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 31.84 150.39 1365.00 ]
> [627.626s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 1.00 1.99 11.00 ]
> [627.626s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 1.99 6.00 24.00 ]
> [627.627s][info][gc,remset ] GC(15) Worker 1 Card Stats Histo:
> [627.627s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ]
> [627.627s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ]
> [627.627s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 6.25 99.61 99.61 100.00 ]
> [627.627s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 70.31 100.00 ]
> [627.627s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 3.12 99.61 99.61 100.00 ]
> [627.627s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 53.12 100.00 ]
> [627.627s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 0.00 1365.00 ]
> [627.627s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 40.82 64.00 ]
> [627.627s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 0.00 1364.00 ]
> [627.627s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ]
> [627.627s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ]
> [627.627s][info][gc,remset ] GC(15) Cumulative stats
> [627.627s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 6.25 100.00 ]
> [627.627s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ]
> [627.627s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 40.62 99.61 99.61 100.00 ]
> [627.627s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 31.25 100.00 ]
> [627.627s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 23.44 99.61 99.61 100.00 ]
> [627.627s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 12.50 100.00 ]
> [627.627s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 326.17 1366.00 ]
> [627.627s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 3.98 64.00 ]
> [627.627s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 314.45 1365.00 ]
> [627.627s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ]
> [627.627s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ]
> [627.627s][info][gc ] GC(15) Concurrent remembered set scanning 1119.698ms
> ...
> [631.875s][info][gc,remset ] GC(15) Update Refs > [631.875s][info][gc,remset ] GC(15) Worker 0 Card Stats Histo: > [631.875s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 4.69 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 90.62 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 3.12 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 68.75 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 29.88 1365.00 ] > [631.875s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 52.93 64.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 22.85 1364.00 ] > [631.875s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 11.00 ] > [631.875s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 1.99 24.00 ] > [631.875s][info][gc,remset ] GC(15) Worker 1 Card Stats Histo: > [631.875s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 26.56 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 62.50 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 0.00 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 59.38 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 230.47 818.36 871.09 1366.00 ] > [631.875s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 0.00 63.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 181.64 707.03 796.88 1365.00 ] > [631.875s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] > [631.875s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] > [631.876s][info][gc,remset ] GC(15) Cumulative stats > [631.876s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 6.25 100.00 ] > [631.876s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] > [631.876s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 32.81 99.61 99.61 100.00 ] > [631.876s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 43.75 100.00 ] > [631.876s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 15.62 99.61 99.61 100.00 ] > [631.876s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 20.31 100.00 ] > [631.876s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 20.90 695.31 1366.00 ] > [631.876s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 11.91 64.00 ] > [631.876s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 11.91 562.50 1365.00 ] > [631.876s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] > [631.876s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] > [631.876s][info][gc ] GC(15) Concurrent update references 1953.893ms > ... > > > The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. 
The metrics are: > > - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread > - clean_run: as above, but the length of an uninterrupted run of clean cards > - dirty_cards, clean_cards: as above, but counts of cards as a percentage of the chunk > - max_dirty_run & max_clean_run: Similarly for the maximum of each. > - dirty_objs, clean_objs: these are numbers of objects in any chunk walked or scanned > - dirty_scans, clean_scans: numbers of objects scanned by the closure > - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk > > For example, the last cumulative log data (for UR) above indicates that at least 75% of the chunks have no alternations at all, > and cards are almost always mostly clean for this specific prefix of the run. > > Comparing worker stats from worker 0 and worker 1 indicates that in particular scans they may see different distributions of dirty cards for specific benchmarks, based on their promotion and mutation behavior. > > **Question:** > Would it make sense to also print, for example, the 1, 10, 90 and 99 percentiles for these metrics, in addition to the quartiles, min, and max? Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: Reword some code comments for greater clarity. ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/176/files - new: https://git.openjdk.org/shenandoah/pull/176/files/9c5c741f..1bc59f89 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=12 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=11-12 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/shenandoah/pull/176.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/176/head:pull/176 PR: https://git.openjdk.org/shenandoah/pull/176 From ysr at openjdk.org Wed Dec 21 17:29:56 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 21 Dec 2022 17:29:56 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v12] In-Reply-To: References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: On Mon, 19 Dec 2022 23:56:23 GMT, William Kemper wrote: >> I mean percentage where I said `%ge`. I'll clarify the comments a bit more. Please continue the review and I'll improve some of the documentation comments for clarity. > > Got it - I had `age` on my brain. Slightly reworded some of the code comments for greater clarity. ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From ysr at openjdk.org Wed Dec 21 17:43:22 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 21 Dec 2022 17:43:22 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v12] In-Reply-To: References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> <_a7yKm6N4ztQ-u70-Ds38Vg2_AepwHAPQxWfZHsTyOY=.3f7615c3-6b02-4915-aa58-b7ebcbdb2b56@github.com> Message-ID: <9XSE_9xYUeivBuMphNIxLbN-qviEjMK8KhjFIz9lp9E=.389475e2-b9b6-41cd-9409-8ebb75e40861@github.com> On Mon, 19 Dec 2022 23:52:39 GMT, Y.
Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahCardStats.hpp line 61: >> >>> 59: _cards_in_cluster(cards_in_cluster), >>> 60: _local_card_stats(card_stats), >>> 61: _last_dirty(false), >> >> Should it always be the case that `_last_dirty != _last_clean`? Could we use one variable here instead of two? > > I believe we switch into one of two modes based on the first card we encounter. So there are 3 states: an initial (neither), and then subsequently either dirty or clean. That's 3 states, which takes 2 bits. It's possible I could shrink it to 1 bit with some cleverness, but figured I wouldn't try too hard as this is still all non-product. I'll think some more about it. I re-examined the code with an eye to reducing the use of flags. Although I think it might be possible, I believe it would complicate the structure of the code a bit because of the asymmetric treatment of clean and dirty it would entail. As things stand, the symmetric treatment leads to a more uniform and consistent code structure that makes the code more maintainable. We can revisit the reduction in state separately in the fullness of time. I'll resolve this comment based on the above reasoning. ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From sviswanathan at openjdk.org Wed Dec 21 18:13:54 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 21 Dec 2022 18:13:54 GMT Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v16] In-Reply-To: References: Message-ID: On Wed, 21 Dec 2022 17:29:23 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly, do the null-check outside of the intrinsic for the `Arrays.hashCode` cases. >> >> Having a centralized entry point means it'll be easier to parameterize the factor and start values, which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front. >> >> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic.
We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>> >> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>> >> Benchmark (size) Mode Cnt Score Error Units
>> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ± 0.017 ns/op
>> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ± 0.049 ns/op
>> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ± 0.221 ns/op
>> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ± 7.020 ns/op
>> >> Baseline:
>> >> Benchmark (size) Mode Cnt Score Error Units
>> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ± 0.013 ns/op
>> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ± 0.122 ns/op
>> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ± 0.512 ns/op
>> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ± 67.630 ns/op
>> >> I.e. no measurable overhead compared to baseline even for `size == 1`.
>> >> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>> >> Benchmark for `Arrays.hashCode`:
>> >> Benchmark (size) Mode Cnt Score Error Units
>> ArraysHashCode.bytes 1 avgt 5 1.884 ± 0.013 ns/op
>> ArraysHashCode.bytes 10 avgt 5 6.955 ± 0.040 ns/op
>> ArraysHashCode.bytes 100 avgt 5 87.218 ± 0.595 ns/op
>> ArraysHashCode.bytes 10000 avgt 5 9419.591 ± 38.308 ns/op
>> ArraysHashCode.chars 1 avgt 5 2.200 ± 0.010 ns/op
>> ArraysHashCode.chars 10 avgt 5 6.935 ± 0.034 ns/op
>> ArraysHashCode.chars 100 avgt 5 30.216 ± 0.134 ns/op
>> ArraysHashCode.chars 10000 avgt 5 1601.629 ± 6.418 ns/op
>> ArraysHashCode.ints 1 avgt 5 2.200 ± 0.007 ns/op
>> ArraysHashCode.ints 10 avgt 5 6.936 ± 0.034 ns/op
>> ArraysHashCode.ints 100 avgt 5 29.412 ± 0.268 ns/op
>> ArraysHashCode.ints 10000 avgt 5 1610.578 ± 7.785 ns/op
>> ArraysHashCode.shorts 1 avgt 5 1.885 ± 0.012 ns/op
>> ArraysHashCode.shorts 10 avgt 5 6.961 ± 0.034 ns/op
>> ArraysHashCode.shorts 100 avgt 5 87.095 ± 0.417 ns/op
>> ArraysHashCode.shorts 10000 avgt 5 9420.617 ± 50.089 ns/op
>> >> Baseline:
>> >> Benchmark (size) Mode Cnt Score Error Units
>> ArraysHashCode.bytes 1 avgt 5 3.213 ± 0.207 ns/op
>> ArraysHashCode.bytes 10 avgt 5 8.483 ± 0.040 ns/op
>> ArraysHashCode.bytes 100 avgt 5 90.315 ± 0.655 ns/op
>> ArraysHashCode.bytes 10000 avgt 5 9422.094 ± 62.402 ns/op
>> ArraysHashCode.chars 1 avgt 5 3.040 ± 0.066 ns/op
>> ArraysHashCode.chars 10 avgt 5 8.497 ± 0.074 ns/op
>> ArraysHashCode.chars 100 avgt 5 90.074 ± 0.387 ns/op
>> ArraysHashCode.chars 10000 avgt 5 9420.474 ± 41.619 ns/op
>> ArraysHashCode.ints 1 avgt 5 2.827 ± 0.019 ns/op
>> ArraysHashCode.ints 10 avgt 5 7.727 ± 0.043 ns/op
>> ArraysHashCode.ints 100 avgt 5 89.405 ± 0.593 ns/op
>> ArraysHashCode.ints 10000 avgt 5 9426.539 ± 51.308 ns/op
>> ArraysHashCode.shorts 1 avgt 5 3.071 ± 0.062 ns/op
>> ArraysHashCode.shorts 10 avgt 5 8.168 ± 0.049 ns/op
>> ArraysHashCode.shorts 100 avgt 5 90.399 ± 0.292 ns/op
>> ArraysHashCode.shorts 10000 avgt 5 9420.171 ± 44.474 ns/op
>> >> >> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix the `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements so as not to further delay integration of this enhancement.
> > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Treat Op_VectorizedHashCode as other similar Ops in split_unique_types The PR looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR: https://git.openjdk.org/jdk/pull/10847 From kdnilsen at openjdk.org Wed Dec 21 18:57:25 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 21 Dec 2022 18:57:25 GMT Subject: RFR: Initial sizing refactor [v2] In-Reply-To: References: Message-ID: On Tue, 20 Dec 2022 20:12:39 GMT, William Kemper wrote: >> Some things to highlight here: >> * This change borrows a bit of code from G1 to handle processing of command line arguments used to size the young generation. >> * A (hard coded for now) threshold on the difference between young/old time has been added to reduce resizing churn. >> * The adaptive heuristic doesn't consider the `soft_tail` anymore. `available` is already adjusted for the soft max capacity. >> * `SoftMaxHeapSize` is used to compute the soft max size and max size for the young generation. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Improve assertions and comments Marked as reviewed by kdnilsen (Committer). src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 235: > 233: } > 234: } > 235: I still question whether this is the right trigger for when to enlarge old-gen. A properly running generational GC will spend nearly all its time doing young-gen and very little time doing old-gen. The trigger for enlarging old-gen should be that we experience promotion failures (and/or that we identify at the end of init mark that we have more live data in aged regions than will fit in the current old-gen). Old-gen collection triggers need to be refined when we are auto-sizing. We can't use "percent free in old" or even "time to collect old > time to exhaust old", because we are trying to auto-tune to maintain that the percent free in old is very small. We need a new way to trigger old-gen GCs. Maybe, we trigger old-gen GC (rather than enlarging old-gen) if an old-gen enlargement request would cause us to exceed a target max-size for old-gen? This doesn't have to be addressed in current PR. But I'm just thinking that refinements of these heuristics may be necessary eventually. ------------- PR: https://git.openjdk.org/shenandoah/pull/185 From kdnilsen at openjdk.org Wed Dec 21 18:57:26 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 21 Dec 2022 18:57:26 GMT Subject: RFR: Initial sizing refactor [v2] In-Reply-To: References: Message-ID: On Wed, 21 Dec 2022 18:44:38 GMT, Kelvin Nilsen wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve assertions and comments > > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 235: > >> 233: } >> 234: } >> 235: > > I still question whether this is the right trigger for when to enlarge old-gen. A properly running generational GC will spend nearly all its time doing young-gen and very little time doing old-gen. > > The trigger for enlarging old-gen should be that we experience promotion failures (and/or that we identify at the end of init mark that we have more live data in aged regions than will fit in the current old-gen). > > Old-gen collection triggers need to be refined when we are auto-sizing. 
We can't use "percent free in old" or even "time to collect old > time to exhaust old", because we are trying to auto-tune to maintain that the percent free in old is very small. We need a new way to trigger old-gen GCs. > > Maybe we trigger old-gen GC (rather than enlarging old-gen) if an old-gen enlargement request would cause us to exceed a target max-size for old-gen? > > This doesn't have to be addressed in the current PR. But I'm just thinking that refinements of these heuristics may be necessary eventually. Maybe the target max-size for old-gen is auto-tuned to, e.g., 10% larger than the maximum old-gen live memory following old-gen concurrent mark, with some bias given to more recently observed measurements of old-gen live memory. ------------- PR: https://git.openjdk.org/shenandoah/pull/185 From wkemper at openjdk.org Wed Dec 21 19:17:28 2022 From: wkemper at openjdk.org (William Kemper) Date: Wed, 21 Dec 2022 19:17:28 GMT Subject: RFR: Initial sizing refactor [v2] In-Reply-To: References: Message-ID: On Tue, 20 Dec 2022 00:51:43 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve assertions and comments > > src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 995: > >> 993: shenandoah_assert_heaplocked_or_safepoint(); >> 994: #ifdef ASSERT >> 995: if (generation_mode() == YOUNG) { > > Why the special treatment of young here and in the next method? Is that the only one where max capacity matters? > > I might have expected an assertion oblivious of the youth of a generation, which would simply check upon an increment or a decrement that the floor and ceiling (min and max) capacities of that generation were being respected, irrespective of whether it was a young or old generation? I'll fix this. ------------- PR: https://git.openjdk.org/shenandoah/pull/185 From wkemper at openjdk.org Wed Dec 21 19:17:28 2022 From: wkemper at openjdk.org (William Kemper) Date: Wed, 21 Dec 2022 19:17:28 GMT Subject: RFR: Initial sizing refactor [v2] In-Reply-To: References: Message-ID: On Wed, 21 Dec 2022 18:48:06 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 235: >> >>> 233: } >>> 234: } >>> 235: >> >> I still question whether this is the right trigger for when to enlarge old-gen. A properly running generational GC will spend nearly all its time doing young-gen and very little time doing old-gen. >> >> The trigger for enlarging old-gen should be that we experience promotion failures (and/or that we identify at the end of init mark that we have more live data in aged regions than will fit in the current old-gen). >> >> Old-gen collection triggers need to be refined when we are auto-sizing. We can't use "percent free in old" or even "time to collect old > time to exhaust old", because we are trying to auto-tune to maintain that the percent free in old is very small. We need a new way to trigger old-gen GCs. >> >> Maybe we trigger old-gen GC (rather than enlarging old-gen) if an old-gen enlargement request would cause us to exceed a target max-size for old-gen? >> >> This doesn't have to be addressed in the current PR. But I'm just thinking that refinements of these heuristics may be necessary eventually. > > Maybe the target max-size for old-gen is auto-tuned to, e.g., 10% larger than the maximum old-gen live memory following old-gen concurrent mark, with some bias given to more recently observed measurements of old-gen live memory.
I agree we can wire up more signals to the resizing mechanism. In the scenario you describe, where old generation has become _too small_ and old collections are running _too frequently_, the MMU based resizing would enlarge the old generation. ------------- PR: https://git.openjdk.org/shenandoah/pull/185 From wkemper at openjdk.org Wed Dec 21 19:45:18 2022 From: wkemper at openjdk.org (William Kemper) Date: Wed, 21 Dec 2022 19:45:18 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v12] In-Reply-To: <9XSE_9xYUeivBuMphNIxLbN-qviEjMK8KhjFIz9lp9E=.389475e2-b9b6-41cd-9409-8ebb75e40861@github.com> References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> <_a7yKm6N4ztQ-u70-Ds38Vg2_AepwHAPQxWfZHsTyOY=.3f7615c3-6b02-4915-aa58-b7ebcbdb2b56@github.com> <9XSE_9xYUeivBuMphNIxLbN-qviEjMK8KhjFIz9lp9E=.389475e2-b9b6-41cd-9409-8ebb75e40861@github.com> Message-ID: On Wed, 21 Dec 2022 17:40:44 GMT, Y. Srinivas Ramakrishna wrote: >> I believe we switch into one of two modes based on the first card we encounter. So there are 3 states: an initial (neither), and then subsequently either dirty or clean. So there are 3 states, which is 2 bits. It's possible I could shrink it to 1 bit with some cleverness, but figured I wouldn't try too hard as this is still all non-product. I'll think some more about it. > > I re-examined the code with an eye to reducing the use of flags. Although I think it might be possible, I believe it would complicate the structure of the code a bit because of teh asymmetric treatment of clean and dirty. As things stand the symmetry of the treatment leads to more uniform and consistent code structure that makes the code more maintainable. We can revisit the reduction in state separately in the fullness of time. > > I'll resolve this comment based on the above reasoning. Okay, I was thinking that code of the form: if (_last_dirty) { // ... } else if (_last_clean) { // ... } Could be: if (_last_dirty) { // ... } else { // ... } But, as you point out, this doesn't handle the initial case. ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From wkemper at openjdk.org Wed Dec 21 19:58:17 2022 From: wkemper at openjdk.org (William Kemper) Date: Wed, 21 Dec 2022 19:58:17 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v12] In-Reply-To: References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: On Wed, 21 Dec 2022 17:43:55 GMT, Y. Srinivas Ramakrishna wrote: >> I updated the summary comment at the top of the PR at https://github.com/openjdk/shenandoah/pull/176#issue-1471869802 with the new format based on Kelvin's suggestion in https://github.com/openjdk/shenandoah/pull/176#issuecomment-1342840919 above. It shows that the per round stats, separated by worker and scan type (UR or RS) is needed, and that the cumulative stats may have lost some of the nuance present in the per round/per scan type stats. > > Please let me know if there is still any confusion wrt the documentation of `ShenandoahCardStatsLogInterval` or if you'd prefer a rewording. Thanks! Maybe "Log cumulative card stats every so many scans of the remembered set"? "Cycle" is a bit overloaded. If I read this, I would expect to see a log message every 50 GC cycles, but with (probably) two rset scans per GC cycle, it would be closer to every 25 GC cycles. 
------------- PR: https://git.openjdk.org/shenandoah/pull/176 From ysr at openjdk.org Wed Dec 21 21:46:36 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 21 Dec 2022 21:46:36 GMT Subject: RFR: Initial sizing refactor [v2] In-Reply-To: <0aHvoWKYrsxlTTZzxXIC6phNFpSSFvC6U28Wtn5OKA8=.3a0050b6-b648-47c3-a4f3-93fbaae3b222@github.com> References: <0aHvoWKYrsxlTTZzxXIC6phNFpSSFvC6U28Wtn5OKA8=.3a0050b6-b648-47c3-a4f3-93fbaae3b222@github.com> Message-ID: On Tue, 20 Dec 2022 19:16:45 GMT, William Kemper wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 459: >> >>> 457: _young_generation = new ShenandoahYoungGeneration(_max_workers, max_capacity_young, initial_capacity_young); >>> 458: _old_generation = new ShenandoahOldGeneration(_max_workers, max_capacity_old, initial_capacity_old); >>> 459: _global_generation = new ShenandoahGlobalGeneration(_max_workers, soft_max_capacity(), soft_max_capacity()); >> >> A single line of comment here would be helpful here. It sounds as if the idea is that for the so-called global generation (which I assume is identified with the entirety of the committed heap at any time), the initial and max (floor and ceiling) are both set at `soft_max_capacity` ? What does that mean? I might have naively expected this to be, respectivley, `max_old + max_young` and `initial_old + initial_young` like you had it before. > > I've been thinking of the max capacity as the maximum _allowed_ capacity. For example, the maximum _allowed_ capacity for old would be `total heap - minimum capacity of young`. So, the sum of the maximum allowed for old and young could exceed the total. If that makes sense, I will put the explanation in a comment here. Makes sense; thanks for the updated comments! ------------- PR: https://git.openjdk.org/shenandoah/pull/185 From wkemper at openjdk.org Wed Dec 21 21:56:17 2022 From: wkemper at openjdk.org (William Kemper) Date: Wed, 21 Dec 2022 21:56:17 GMT Subject: Integrated: Initial sizing refactor In-Reply-To: References: Message-ID: On Fri, 16 Dec 2022 00:36:41 GMT, William Kemper wrote: > Some things to highlight here: > * This change borrows a bit of code from G1 to handle processing of command line arguments used to size the young generation. > * A (hard coded for now) threshold on the difference between young/old time has been added to reduce resizing churn. > * The adaptive heuristic doesn't consider the `soft_tail` anymore. `available` is already adjusted for the soft max capacity. > * `SoftMaxHeapSize` is used to compute the soft max size and max size for the young generation. This pull request has now been integrated. Changeset: da950117 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/da9501170820b3e32b903228bd921aaa860e90c0 Stats: 389 lines in 13 files changed: 259 ins; 69 del; 61 mod Initial sizing refactor Reviewed-by: kdnilsen ------------- PR: https://git.openjdk.org/shenandoah/pull/185 From ysr at openjdk.org Wed Dec 21 22:05:20 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 21 Dec 2022 22:05:20 GMT Subject: RFR: Initial sizing refactor [v2] In-Reply-To: References: Message-ID: On Tue, 20 Dec 2022 20:12:39 GMT, William Kemper wrote: >> Some things to highlight here: >> * This change borrows a bit of code from G1 to handle processing of command line arguments used to size the young generation. >> * A (hard coded for now) threshold on the difference between young/old time has been added to reduce resizing churn. 
>> * The adaptive heuristic doesn't consider the `soft_tail` anymore. `available` is already adjusted for the soft max capacity. >> * `SoftMaxHeapSize` is used to compute the soft max size and max size for the young generation. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Improve assertions and comments src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 268: > 266: } > 267: > 268: size_t round_down_to_multiple_of_region_size(size_t bytes) { I could have sworn there was a rounding utility/macro extensively used in sizing code, but the only one I found was a power of 2 rounder. The alternative, if one maintained a log of heap region size (being a power of 2), would be to use a bit-mask here. Anyway, nothing to do here; this looks good for now. ------------- PR: https://git.openjdk.org/shenandoah/pull/185 From ysr at openjdk.org Wed Dec 21 22:20:19 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 21 Dec 2022 22:20:19 GMT Subject: RFR: Initial sizing refactor [v2] In-Reply-To: References: Message-ID: On Tue, 20 Dec 2022 20:12:39 GMT, William Kemper wrote: >> Some things to highlight here: >> * This change borrows a bit of code from G1 to handle processing of command line arguments used to size the young generation. >> * A (hard coded for now) threshold on the difference between young/old time has been added to reduce resizing churn. >> * The adaptive heuristic doesn't consider the `soft_tail` anymore. `available` is already adjusted for the soft max capacity. >> * `SoftMaxHeapSize` is used to compute the soft max size and max size for the young generation. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Improve assertions and comments src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 222: > 220: log_info(gc)("Thread Usr+Sys YOUNG = %.3f, OLD = %.3f, GLOBAL = %.3f", young_time_s, old_time_s, global_time_s); > 221: > 222: if (abs(delta) <= transfer_threshold) { I thought the original idea was to use the difference in MMU's for old and young as the error signal to drive the (direction of the) transfer, rather than the difference in the actual times? Am I misinterpreting what `reset_collection_time` returns? You do refer to it as `thread utilization` (akin to MMU) in the log message below. src/hotspot/share/gc/shenandoah/shenandoahYoungGeneration.hpp line 57: > 55: // Returns true if the young generation is configured to enqueue old > 56: // oops for the old generation mark queues. > 57: bool is_bootstrap_cycle() { Why is this called a `bootstrap cycle`? I must be missing some big-picture background of nomenclature here. ------------- PR: https://git.openjdk.org/shenandoah/pull/185 From ysr at openjdk.org Wed Dec 21 22:20:20 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 21 Dec 2022 22:20:20 GMT Subject: RFR: Initial sizing refactor [v2] In-Reply-To: References: Message-ID: On Wed, 21 Dec 2022 19:14:14 GMT, William Kemper wrote: >> Maybe the target max-size for old-gen is auto-tuned to, e.g., 10% larger than the maximum old-gen live memory following old-gen concurrent mark, with some bias given to more recently observed measurements of old-gen live memory. > > I agree we can wire up more signals to the resizing mechanism. In the scenario you describe, where old generation has become _too small_ and old collections are running _too frequently_, the MMU based resizing would enlarge the old generation.
What do you do before any data is available for one of the MMU trackers? (for example before the first old collection cycle has happened.) I'd assume that the control algorithm wouldn't kick in until the data driving the control signal was valid and available. Where is that done? ------------- PR: https://git.openjdk.org/shenandoah/pull/185 From ysr at openjdk.org Wed Dec 21 22:25:28 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 21 Dec 2022 22:25:28 GMT Subject: RFR: Initial sizing refactor [v2] In-Reply-To: References: Message-ID: <6RCKf5W_EcmS_hB1-ypkoyKoxwMakPl61oJtoUdeoG8=.d9c53e13-be73-4ece-9108-8f2e81810e98@github.com> On Tue, 20 Dec 2022 20:12:39 GMT, William Kemper wrote: >> Some things to highlight here: >> * This change borrows a bit of code from G1 to handle processing of command line arguments used to size the young generation. >> * A (hard coded for now) threshold on the difference between young/old time has been added to reduce resizing churn. >> * The adaptive heuristic doesn't consider the `soft_tail` anymore. `available` is already adjusted for the soft max capacity. >> * `SoftMaxHeapSize` is used to compute the soft max size and max size for the young generation. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Improve assertions and comments General direction looks good to me, as well as the refactoring which greatly improved the structure of the code. I just had one question about the error signal that drives the control actuation for resizing. I'd expected that to be the _difference of mmu averages_ measured whenever new data is available for these control signals, rather than _difference of collection times_, which it looked like to me (although it's possible I misunderstood). Also, it'd be great to compare some basic performance numbers with the control algorithm off vs on to show its efficacy. Thanks! ------------- Marked as reviewed by ysr (Author). PR: https://git.openjdk.org/shenandoah/pull/185 From wkemper at openjdk.org Wed Dec 21 22:26:20 2022 From: wkemper at openjdk.org (William Kemper) Date: Wed, 21 Dec 2022 22:26:20 GMT Subject: RFR: Avoid divide by zero error, improve variable names Message-ID: Depending on when the periodic thread runs the accounting task and how much CPU time the process receives, we may see a very small elapsed process time. Such a time should not be used to compute MMU. ------------- Commit messages: - Avoid divide by zero error, improve variable names Changes: https://git.openjdk.org/shenandoah/pull/188/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=188&range=00 Stats: 12 lines in 1 file changed: 6 ins; 1 del; 5 mod Patch: https://git.openjdk.org/shenandoah/pull/188.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/188/head:pull/188 PR: https://git.openjdk.org/shenandoah/pull/188 From wkemper at openjdk.org Wed Dec 21 22:31:17 2022 From: wkemper at openjdk.org (William Kemper) Date: Wed, 21 Dec 2022 22:31:17 GMT Subject: RFR: Initial sizing refactor [v2] In-Reply-To: References: Message-ID: On Wed, 21 Dec 2022 22:17:54 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve assertions and comments > > src/hotspot/share/gc/shenandoah/shenandoahYoungGeneration.hpp line 57: > >> 55: // Returns true if the young generation is configured to enqueue old >> 56: // oops for the old generation mark queues. 
>> 57: bool is_bootstrap_cycle() { > > Why is this called a `bootstrap cycle`? I must be missing some big picture background of nomenclature here. Every old generation cycle is preceded by a young collection. We call this a bootstrap cycle because it populates the old generation mark queues with old objects it encountered during the marking of young. Otherwise, we'd have to maintain a reverse-remembered set for young->old pointers. ------------- PR: https://git.openjdk.org/shenandoah/pull/185 From wkemper at openjdk.org Wed Dec 21 23:09:17 2022 From: wkemper at openjdk.org (William Kemper) Date: Wed, 21 Dec 2022 23:09:17 GMT Subject: RFR: Initial sizing refactor [v2] In-Reply-To: References: Message-ID: <83ZqvSMvhjIoNc3-w21hG5ssMhCTzdWODEdSKgFYlM4=.b2da43e0-e722-4cd9-a80b-9bd2afc389c5@github.com> On Wed, 21 Dec 2022 22:13:42 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve assertions and comments > > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 222: > >> 220: log_info(gc)("Thread Usr+Sys YOUNG = %.3f, OLD = %.3f, GLOBAL = %.3f", young_time_s, old_time_s, global_time_s); >> 221: >> 222: if (abs(delta) <= transfer_threshold) { > > I thought the original idea was to use the difference in MMU's for old and young as the error signal to drive the (direction of the) transfer, rather than the difference in the actual times? Am I misinterpreting what `reset_collection_time` returns? You do refer to it as `thread utilization` (akin to MMU) in the log message below. I reasoned that the denominator in a comparison between MMU's (i.e., process time) would be the same on both sides of the comparison, so I omitted it. ------------- PR: https://git.openjdk.org/shenandoah/pull/185 From kdnilsen at openjdk.org Wed Dec 21 23:22:17 2022 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 21 Dec 2022 23:22:17 GMT Subject: RFR: Avoid divide by zero error, improve variable names In-Reply-To: References: Message-ID: On Wed, 21 Dec 2022 22:19:13 GMT, William Kemper wrote: > Depending on when the periodic thread runs the accounting task and how much CPU time the process receives, we may see a very small elapsed process time. Such a time should not be used to compute MMU. Marked as reviewed by kdnilsen (Committer). ------------- PR: https://git.openjdk.org/shenandoah/pull/188 From wkemper at openjdk.org Wed Dec 21 23:44:19 2022 From: wkemper at openjdk.org (William Kemper) Date: Wed, 21 Dec 2022 23:44:19 GMT Subject: Integrated: Avoid divide by zero error, improve variable names In-Reply-To: References: Message-ID: On Wed, 21 Dec 2022 22:19:13 GMT, William Kemper wrote: > Depending on when the periodic thread runs the accounting task and how much CPU time the process receives, we may see a very small elapsed process time. Such a time should not be used to compute MMU. This pull request has now been integrated. Changeset: d793fd16 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/d793fd1620e430be6fcba83361b421856f5bbfd5 Stats: 12 lines in 1 file changed: 6 ins; 1 del; 5 mod Avoid divide by zero error, improve variable names Reviewed-by: kdnilsen ------------- PR: https://git.openjdk.org/shenandoah/pull/188 From ysr at openjdk.org Wed Dec 21 23:54:22 2022 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Wed, 21 Dec 2022 23:54:22 GMT Subject: RFR: Avoid divide by zero error, improve variable names In-Reply-To: References: Message-ID: On Wed, 21 Dec 2022 22:19:13 GMT, William Kemper wrote: > Depending on when the periodic thread runs the accounting task and how much CPU time the process receives, we may see a very small elapsed process time. Such a time should not be used to compute MMU. Marked as reviewed by ysr (Author). src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 97: > 95: double process_time_s = process_time_seconds(); > 96: double elapsed_process_time_s = process_time_s - _process_reference_time_s; > 97: if (elapsed_process_time_s <= 0.01) { Is there a different mechanism one could use to check if there was an update? E.g. a change in the count of GC cycles started or ended, rather than relying on the magnitude of the difference, although the magnitude of the difference may still be sufficient. One related question: is this unit "seconds" or some smaller unit? The `_s` implies it's a second and 0.01 s is 10 ms which is not insubstantial... ------------- PR: https://git.openjdk.org/shenandoah/pull/188 From wkemper at openjdk.org Wed Dec 21 23:55:18 2022 From: wkemper at openjdk.org (William Kemper) Date: Wed, 21 Dec 2022 23:55:18 GMT Subject: RFR: Initial sizing refactor [v2] In-Reply-To: References: Message-ID: On Wed, 21 Dec 2022 22:03:03 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve assertions and comments > > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 268: > >> 266: } >> 267: >> 268: size_t round_down_to_multiple_of_region_size(size_t bytes) { > > I could have sworn there was a rounding utility/macro extensively used in sizing code, but the only one I found was a power of 2 rounder. The alternative, if one maintained a log of heap region size (being a power of 2) would be to use a bit-mask here. > > Anyway, nothing to do here; this looks good for now. I wasn't sure if region size was coerced to a power of a 2, but it seems to be the case: size_t ShenandoahHeapRegion::setup_sizes(size_t max_heap_size) { // ... int region_size_log = log2i(region_size); // Recalculate the region size to make sure it's a power of // 2. This means that region_size is the largest power of 2 that's // <= what we've calculated so far. region_size = size_t(1) << region_size_log; I'll make that change. ------------- PR: https://git.openjdk.org/shenandoah/pull/185 From ysr at openjdk.org Thu Dec 22 00:02:21 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 22 Dec 2022 00:02:21 GMT Subject: RFR: Avoid divide by zero error, improve variable names In-Reply-To: References: Message-ID: <5oG7-XLGQR1sNYJHGEDZyuHH-xwiU7Rw7gztUYnMSBA=.c2d69cf1-b488-4214-9723-0337617a7691@github.com> On Wed, 21 Dec 2022 23:49:26 GMT, Y. Srinivas Ramakrishna wrote: >> Depending on when the periodic thread runs the accounting task and how much CPU time the process receives, we may see a very small elapsed process time. Such a time should not be used to compute MMU. > > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 97: > >> 95: double process_time_s = process_time_seconds(); >> 96: double elapsed_process_time_s = process_time_s - _process_reference_time_s; >> 97: if (elapsed_process_time_s <= 0.01) { > > Is there a different mechanism one could use to check if there was an update? E.g. 
a change in the count of GC cycles started or ended, rather than relying on the magnitude of the difference, although the magnitude of the difference may still be sufficient. One related question: is this unit "seconds" or some smaller unit? The `_s` implies it's a second and 0.01 s is 10 ms, which is not insubstantial... ------------- PR: https://git.openjdk.org/shenandoah/pull/188 From wkemper at openjdk.org Wed Dec 21 23:55:18 2022 From: wkemper at openjdk.org (William Kemper) Date: Wed, 21 Dec 2022 23:55:18 GMT Subject: RFR: Initial sizing refactor [v2] In-Reply-To: References: Message-ID: On Wed, 21 Dec 2022 22:03:03 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve assertions and comments > > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 268: > >> 266: } >> 267: >> 268: size_t round_down_to_multiple_of_region_size(size_t bytes) { > > I could have sworn there was a rounding utility/macro extensively used in sizing code, but the only one I found was a power of 2 rounder. The alternative, if one maintained a log of heap region size (being a power of 2), would be to use a bit-mask here. > > Anyway, nothing to do here; this looks good for now. I wasn't sure if region size was coerced to a power of 2, but it seems to be the case: size_t ShenandoahHeapRegion::setup_sizes(size_t max_heap_size) { // ... int region_size_log = log2i(region_size); // Recalculate the region size to make sure it's a power of // 2. This means that region_size is the largest power of 2 that's // <= what we've calculated so far. region_size = size_t(1) << region_size_log; I'll make that change. ------------- PR: https://git.openjdk.org/shenandoah/pull/185 From ysr at openjdk.org Thu Dec 22 00:02:21 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 22 Dec 2022 00:02:21 GMT Subject: RFR: Avoid divide by zero error, improve variable names In-Reply-To: References: Message-ID: <5oG7-XLGQR1sNYJHGEDZyuHH-xwiU7Rw7gztUYnMSBA=.c2d69cf1-b488-4214-9723-0337617a7691@github.com> On Wed, 21 Dec 2022 23:49:26 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 97: >> >>> 95: double process_time_s = process_time_seconds(); >>> 96: double elapsed_process_time_s = process_time_s - _process_reference_time_s; >>> 97: if (elapsed_process_time_s <= 0.01) { >> >> Is there a different mechanism one could use to check if there was an update? E.g. a change in the count of GC cycles started or ended, rather than relying on the magnitude of the difference, although the magnitude of the difference may still be sufficient. One related question: is this unit "seconds" or some smaller unit? The `_s` implies it's a second and 0.01 s is 10 ms, which is not insubstantial... > > In general the synchronous, time-based sampling of the MMU suffers from this issue of high variance by catching the co-initial or co-terminal portion of a cycle.
Have you looked at the variance of MMUs to see if you should sample only at the end of complete cycles, _synchronous with cycles_, rather than using a _time-based sampling trigger_? Anyway, something to think about a little bit. Each has its advantages and disadvantages, but how we gather these stats would matter if used as the basis for the error signal that drives your sizing control loop. I didn't want to use a change in the GC count, because then it would miss 100% MMU (which is something to celebrate). This should be called once every GCPauseIntervalMillis (5000 by default). It's hard to imagine the process didn't have any CPU time over such an interval, but not impossible. Seems more likely the scheduled task thread fell behind and executed the task twice in quick succession. I'm not totally sure how this happens. ------------- PR: https://git.openjdk.org/shenandoah/pull/188 From wkemper at openjdk.org Thu Dec 22 00:27:16 2022 From: wkemper at openjdk.org (William Kemper) Date: Thu, 22 Dec 2022 00:27:16 GMT Subject: RFR: Avoid divide by zero error, improve variable names In-Reply-To: <-yKh8JLLN8TuF-BUHc5yf2wWgTCy2eXrGGqgxWF7kw8=.a502bbf8-b472-4108-a722-11608c3e98e9@github.com> References: <-yKh8JLLN8TuF-BUHc5yf2wWgTCy2eXrGGqgxWF7kw8=.a502bbf8-b472-4108-a722-11608c3e98e9@github.com> Message-ID: <9UYuMzyQNh6yAXe7cmJojAo0Kq5uz26JkcHJDy04fkE=.a4c9f4ad-42a7-44c3-a520-1056e77575f9@github.com> On Wed, 21 Dec 2022 23:59:40 GMT, Y. Srinivas Ramakrishna wrote: >> Depending on when the periodic thread runs the accounting task and how much CPU time the process receives, we may see a very small elapsed process time. Such a time should not be used to compute MMU. > > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 106: > >> 104: double elapsed_collector_time_s = collector_time_s - _collector_reference_time_s; >> 105: _collector_reference_time_s = collector_time_s; >> 106: double minimum_mutator_utilization = ((elapsed_process_time_s - elapsed_collector_time_s) / elapsed_process_time_s) * 100; > > Alternative: You could check the result with `isnan()` and not use it, although I don't know if divide by zero is costly. I'd just as soon catch it earlier and I'm not 100% certain that other values won't also cause this error. Also, `isnan` won't flag the result of division by zero (`isinf` would though). ------------- PR: https://git.openjdk.org/shenandoah/pull/188 From ysr at openjdk.org Thu Dec 22 00:40:16 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 22 Dec 2022 00:40:16 GMT Subject: RFR: Initial sizing refactor [v2] In-Reply-To: <83ZqvSMvhjIoNc3-w21hG5ssMhCTzdWODEdSKgFYlM4=.b2da43e0-e722-4cd9-a80b-9bd2afc389c5@github.com> References: <83ZqvSMvhjIoNc3-w21hG5ssMhCTzdWODEdSKgFYlM4=.b2da43e0-e722-4cd9-a80b-9bd2afc389c5@github.com> Message-ID: On Wed, 21 Dec 2022 23:06:38 GMT, William Kemper wrote: >> src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 222: >> >>> 220: log_info(gc)("Thread Usr+Sys YOUNG = %.3f, OLD = %.3f, GLOBAL = %.3f", young_time_s, old_time_s, global_time_s); >>> 221: >>> 222: if (abs(delta) <= transfer_threshold) { >> >> I thought the original idea was to use the difference in MMU's for old and young as the error signal to drive the (direction of the) transfer, rather than the difference in the actual times? Am I misinterpreting what `reset_collection_time` returns? You do refer to it as `thread utilization` (akin to MMU) in the log message below.
> > I reasoned that the denominator in a comparison between MMU's (i.e., process time) would be the same on both sides of the comparison, so I omitted it. My worry is that using the latest sample might lead to too much variance. Using a TruncatedSeq's decaying average would help with damping the error signal and lead to smoother control over sizing changes. ------------- PR: https://git.openjdk.org/shenandoah/pull/185 From wkemper at openjdk.org Thu Dec 22 01:05:18 2022 From: wkemper at openjdk.org (William Kemper) Date: Thu, 22 Dec 2022 01:05:18 GMT Subject: RFR: Initial sizing refactor [v2] In-Reply-To: References: <83ZqvSMvhjIoNc3-w21hG5ssMhCTzdWODEdSKgFYlM4=.b2da43e0-e722-4cd9-a80b-9bd2afc389c5@github.com> Message-ID: On Thu, 22 Dec 2022 00:37:05 GMT, Y. Srinivas Ramakrishna wrote: >> I reasoned that the denominator in a comparison between MMU's (i.e., process time) would be the same on both sides of the comparison, so I omitted it. > > My worry is that using the latest sample might lead to too much variance. Using a TruncatedSeq's decaying average would help with damping the error signal and lead to smoother control over sizing changes. I'll experiment with that. I was going for responsiveness. Smoothing out the signal may also introduce a delay in resizing. ------------- PR: https://git.openjdk.org/shenandoah/pull/185 From wkemper at openjdk.org Thu Dec 22 01:05:18 2022 From: wkemper at openjdk.org (William Kemper) Date: Thu, 22 Dec 2022 01:05:18 GMT Subject: RFR: Initial sizing refactor [v2] In-Reply-To: References: Message-ID: On Wed, 21 Dec 2022 22:15:58 GMT, Y. Srinivas Ramakrishna wrote: >> I agree we can wire up more signals to the resizing mechanism. In the scenario you describe, where old generation has become _too small_ and old collections are running _too frequently_, the MMU based resizing would enlarge the old generation. > > What do you do before any data is available for one of the MMU trackers? (for example before the first old collection cycle has happened.) I'd assume that the control algorithm wouldn't kick in until the data driving the control signal was valid and available. Where is that done? We generally have a flurry of "learning cycles" at startup, but it makes sense to add an explicit check to avoid premature resizing. To Kelvin's earlier point, it may also make sense to re-initiate the learning phase after resizing the generations. ------------- PR: https://git.openjdk.org/shenandoah/pull/185 From ysr at openjdk.org Thu Dec 22 04:03:16 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 22 Dec 2022 04:03:16 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v12] In-Reply-To: References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: On Wed, 21 Dec 2022 19:55:18 GMT, William Kemper wrote: >> Please let me know if there is still any confusion wrt the documentation of `ShenandoahCardStatsLogInterval` or if you'd prefer a rewording. Thanks! > > Maybe "Log cumulative card stats every so many scans of the remembered set"? "Cycle" is a bit overloaded. If I read this, I would expect to see a log message every 50 GC cycles, but with (probably) two rset scans per GC cycle, it would be closer to every 25 GC cycles. I'll reword along the lines of your suggestion. For the specific example you gave, we will in fact see one cumulative RS log message every 50 RS scans, and one cumulative UR log message every 50 UR scans, thus roughly one each every 50 GC cycles, if you will. 
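To make that cadence concrete, a hedged sketch of the two-counter scheme just described (hypothetical function and variable names; only `ShenandoahCardStatsLogInterval` is the real flag, and its value here is assumed):

    #include <cstdint>

    // Assumed value for illustration; in the JVM this is a command-line flag.
    static const uint32_t ShenandoahCardStatsLogInterval = 50;

    // Sketch: RS and UR scans advance independent counters, so each scan
    // type logs its cumulative histogram once every
    // ShenandoahCardStatsLogInterval scans of that type.
    void maybe_log_cumulative_stats(bool is_remembered_set_scan) {
      static uint32_t rs_scans = 0;
      static uint32_t ur_scans = 0;
      uint32_t& scans = is_remembered_set_scan ? rs_scans : ur_scans;
      if (++scans % ShenandoahCardStatsLogInterval == 0) {
        // emit the cumulative card stats histogram for this scan type
      }
    }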
The case when they may not be in lockstep might be if there were full gc's or degenerated cycles that did one (e.g. RS) but skipped the other (e.g. UR -- can this happen?), because we maintain two independent counters one for RS scans and one for UR scans. ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From ysr at openjdk.org Thu Dec 22 04:16:41 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 22 Dec 2022 04:16:41 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v14] In-Reply-To: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: > **Updated 12/21** > > **Summary:** > The main change is card stats collection during remembered set (RS) and update refs (UR) phases when the card-table is scanned. The code is protected by a new non-product only flag `ShenandoahEnableCardStats`, which is on by default in debug builds and off in the optimized build. > > We tested the impact of the code with the flag enabled in product mode and felt the impact was non-trivial. We might, in the future, enable the code in product mode if performance can be improved. > > Stats are logged per worker thread at the end of each RS and UR scan. These stats are specific to the most recent round of scanning. Global cumulative stats across all threads (but specific to RS or UR) are also maintained, and these are logged at periodic intervals as determined by the setting of `ShenandoahCardStatsLogInterval`. > > **Format of stats produced and how to interpret them: (sample)** > > The following format is an example from a slowdebug run where the logging is enabled. In this case there are 2 concurrent gc worker threads, and `ShenandoahCardStatsLogInterval` was set at 2. The first two logs show the stats for those particular scans for each of the two worker threads, and the next set show the stats for particular scans for the two worker threads, followed by a cumulative one for that type of scan (RS or UR) across all workers and scans of that type, respectively. 
> > > [560.766s][info][gc,remset ] GC(13) Scan Remembered Set > [560.766s][info][gc,remset ] GC(13) Worker 0 Card Stats Histo: > [560.766s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 53.12 ] > [560.766s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 99.61 99.61 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 53.12 ] > [560.766s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 99.61 99.61 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 53.12 ] > [560.766s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 818.36 1366.00 ] > [560.766s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 8.00 ] > [560.766s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 705.08 1365.00 ] > [560.766s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [560.766s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 16.00 ] > [560.766s][info][gc,remset ] GC(13) Worker 1 Card Stats Histo: > [560.766s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 96.88 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 46.88 ] > [560.766s][info][gc,remset ] GC(13) dirty_cards: [ 18.75 82.81 98.44 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 46.88 ] > [560.766s][info][gc,remset ] GC(13) max_dirty_run: [ 18.75 82.81 98.44 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 46.88 ] > [560.766s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 269.53 353.52 814.45 1366.00 ] > [560.766s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 3.00 ] > [560.766s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 263.67 351.56 671.88 1365.00 ] > [560.766s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [560.766s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 3.00 ] > [560.766s][info][gc ] GC(13) Concurrent remembered set scanning 1150.359ms > [560.766s][info][gc,start ] GC(13) Concurrent marking roots > ... 
> [585.433s][info][gc ] GC(13) Concurrent evacuation 6225.829ms > [585.433s][info][gc,start ] GC(13) Pause Init Update Refs > [585.434s][info][gc ] GC(13) Pause Init Update Refs 0.264ms > [585.434s][info][gc,start ] GC(13) Concurrent update references > [585.434s][info][gc,task ] GC(13) Using 2 of 4 workers for concurrent reference update > [585.567s][info][gc ] Average MMU = 2.925 > [590.583s][info][gc ] Average MMU = 1.509 > [595.600s][info][gc ] Average MMU = 0.835 > [600.618s][info][gc ] Average MMU = 0.447 > [605.635s][info][gc ] Average MMU = 0.253 > [610.651s][info][gc ] Average MMU = 0.114 > [615.669s][info][gc ] Average MMU = 0.130 > [620.686s][info][gc ] Average MMU = 0.129 > [622.209s][info][gc,remset ] GC(13) Update Refs > [622.209s][info][gc,remset ] GC(13) Worker 0 Card Stats Histo: > [622.209s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [622.209s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [622.209s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 3.12 50.00 99.61 100.00 ] > [622.209s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 26.56 92.19 100.00 ] > [622.209s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 1.56 29.69 99.61 100.00 ] > [622.209s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 9.38 70.31 100.00 ] > [622.209s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 50.00 1366.00 ] > [622.209s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 3.98 54.88 64.00 ] > [622.209s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 33.98 1365.00 ] > [622.209s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 1.00 16.00 ] > [622.209s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 2.99 33.00 ] > [622.209s][info][gc,remset ] GC(13) Worker 1 Card Stats Histo: > [622.209s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.209s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc ] GC(13) Concurrent update references 36776.258ms > ... 
> (init[627.626s][info][gc,remset ] GC(15) Scan Remembered Set > [627.626s][info][gc,remset ] GC(15) Worker 0 Card Stats Histo: > [627.626s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 1.56 100.00 ] > [627.626s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 4.69 100.00 ] > [627.626s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 0.00 6.25 32.81 100.00 ] > [627.626s][info][gc,remset ] GC(15) clean_cards: [ 0.00 48.44 90.62 98.44 100.00 ] > [627.626s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 0.00 3.12 15.62 100.00 ] > [627.626s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 23.44 60.94 95.31 100.00 ] > [627.626s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 45.90 164.06 1366.00 ] > [627.626s][info][gc,remset ] GC(15) clean_objs: [ 0.00 11.91 53.91 60.94 63.00 ] > [627.626s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 31.84 150.39 1365.00 ] > [627.626s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 1.00 1.99 11.00 ] > [627.626s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 1.99 6.00 24.00 ] > [627.627s][info][gc,remset ] GC(15) Worker 1 Card Stats Histo: > [627.627s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [627.627s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 6.25 99.61 99.61 100.00 ] > [627.627s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 70.31 100.00 ] > [627.627s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 3.12 99.61 99.61 100.00 ] > [627.627s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 53.12 100.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 0.00 1365.00 ] > [627.627s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 40.82 64.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 0.00 1364.00 ] > [627.627s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] > [627.627s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] > [627.627s][info][gc,remset ] GC(15) Cumulative stats > [627.627s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 6.25 100.00 ] > [627.627s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 40.62 99.61 99.61 100.00 ] > [627.627s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 31.25 100.00 ] > [627.627s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 23.44 99.61 99.61 100.00 ] > [627.627s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 12.50 100.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 326.17 1366.00 ] > [627.627s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 3.98 64.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 314.45 1365.00 ] > [627.627s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] > [627.627s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] > [627.627s][info][gc ] GC(15) Concurrent remembered set scanning 1119.698ms > ... 
> [631.875s][info][gc,remset ] GC(15) Update Refs > [631.875s][info][gc,remset ] GC(15) Worker 0 Card Stats Histo: > [631.875s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 4.69 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 90.62 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 3.12 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 68.75 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 29.88 1365.00 ] > [631.875s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 52.93 64.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 22.85 1364.00 ] > [631.875s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 11.00 ] > [631.875s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 1.99 24.00 ] > [631.875s][info][gc,remset ] GC(15) Worker 1 Card Stats Histo: > [631.875s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 26.56 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 62.50 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 0.00 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 59.38 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 230.47 818.36 871.09 1366.00 ] > [631.875s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 0.00 63.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 181.64 707.03 796.88 1365.00 ] > [631.875s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] > [631.875s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] > [631.876s][info][gc,remset ] GC(15) Cumulative stats > [631.876s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 6.25 100.00 ] > [631.876s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] > [631.876s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 32.81 99.61 99.61 100.00 ] > [631.876s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 43.75 100.00 ] > [631.876s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 15.62 99.61 99.61 100.00 ] > [631.876s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 20.31 100.00 ] > [631.876s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 20.90 695.31 1366.00 ] > [631.876s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 11.91 64.00 ] > [631.876s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 11.91 562.50 1365.00 ] > [631.876s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] > [631.876s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] > [631.876s][info][gc ] GC(15) Concurrent update references 1953.893ms > ... > > > The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. 
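As a worked reading of one of these rows (an illustration derived from the GC(13) remembered-set logs above): `dirty_cards: [ 0.00 99.61 99.61 99.61 100.00 ]` for Worker 0 says that the least-dirty chunk that worker scanned had no dirty cards, that a chunk at each of the 25th, 50th and 75th percentiles was about 99.6% dirty, and that the dirtiest chunk was entirely dirty. In other words, at least three quarters of the chunks it scanned were almost completely dirty in that scan.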
The metrics are:
> 
> - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread
> - clean_run: as above, but the length of an uninterrupted run of clean cards
> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk
> - max_dirty_run & max_clean_run: Similarly for the maximum of each.
> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned
> - dirty_scans, clean_scans: numbers of objects scanned by the closure
> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk
> 
> For example, the last cumulative log data (for UR) above indicates that at least 75% of the chunks have no alternations at all,
> and cards are almost always mostly clean for this specific prefix of the run.
> 
> Comparing worker stats from worker 0 and worker 1 indicates that in particular scans they may see different distributions of dirty cards for specific benchmarks based on their promotion and mutation behavior.
> 
> **Question:**
> Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics as well, in addition to the quartiles, min, and max?

Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision:

  A couple of changes based on review feedback.

-------------

Changes:
  - all: https://git.openjdk.org/shenandoah/pull/176/files
  - new: https://git.openjdk.org/shenandoah/pull/176/files/1bc59f89..bfddb220

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=13
 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=12-13

  Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/shenandoah/pull/176.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah pull/176/head:pull/176

PR: https://git.openjdk.org/shenandoah/pull/176

From ysr at openjdk.org Thu Dec 22 04:34:00 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Thu, 22 Dec 2022 04:34:00 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v15]
In-Reply-To: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
Message-ID: 

> **Updated 12/21**
> 
> **Summary:**
> The main change is card stats collection during remembered set (RS) and update refs (UR) phases when the card-table is scanned. The code is protected by a new non-product only flag `ShenandoahEnableCardStats`, which is on by default in debug builds and off in the optimized build.
> 
> We tested the impact of the code with the flag enabled in product mode and felt the impact was non-trivial. We might, in the future, enable the code in product mode if performance can be improved.
> 
> Stats are logged per worker thread at the end of each RS and UR scan. These stats are specific to the most recent round of scanning. Global cumulative stats across all threads (but specific to RS or UR) are also maintained, and these are logged at periodic intervals as determined by the setting of `ShenandoahCardStatsLogInterval`.
> 
> **Format of stats produced and how to interpret them: (sample)**
> 
> The following format is an example from a slowdebug run where the logging is enabled. In this case there are 2 concurrent gc worker threads, and `ShenandoahCardStatsLogInterval` was set at 2.
The first two logs show the stats for those particular scans for each of the two worker threads, and the next set show the stats for particular scans for the two worker threads, followed by a cumulative one for that type of scan (RS or UR) across all workers and scans of that type, respectively. > > > [560.766s][info][gc,remset ] GC(13) Scan Remembered Set > [560.766s][info][gc,remset ] GC(13) Worker 0 Card Stats Histo: > [560.766s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 53.12 ] > [560.766s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 99.61 99.61 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 53.12 ] > [560.766s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 99.61 99.61 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 53.12 ] > [560.766s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 818.36 1366.00 ] > [560.766s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 8.00 ] > [560.766s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 705.08 1365.00 ] > [560.766s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [560.766s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 16.00 ] > [560.766s][info][gc,remset ] GC(13) Worker 1 Card Stats Histo: > [560.766s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 96.88 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 46.88 ] > [560.766s][info][gc,remset ] GC(13) dirty_cards: [ 18.75 82.81 98.44 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 46.88 ] > [560.766s][info][gc,remset ] GC(13) max_dirty_run: [ 18.75 82.81 98.44 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 46.88 ] > [560.766s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 269.53 353.52 814.45 1366.00 ] > [560.766s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 3.00 ] > [560.766s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 263.67 351.56 671.88 1365.00 ] > [560.766s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [560.766s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 3.00 ] > [560.766s][info][gc ] GC(13) Concurrent remembered set scanning 1150.359ms > [560.766s][info][gc,start ] GC(13) Concurrent marking roots > ... 
> [585.433s][info][gc ] GC(13) Concurrent evacuation 6225.829ms > [585.433s][info][gc,start ] GC(13) Pause Init Update Refs > [585.434s][info][gc ] GC(13) Pause Init Update Refs 0.264ms > [585.434s][info][gc,start ] GC(13) Concurrent update references > [585.434s][info][gc,task ] GC(13) Using 2 of 4 workers for concurrent reference update > [585.567s][info][gc ] Average MMU = 2.925 > [590.583s][info][gc ] Average MMU = 1.509 > [595.600s][info][gc ] Average MMU = 0.835 > [600.618s][info][gc ] Average MMU = 0.447 > [605.635s][info][gc ] Average MMU = 0.253 > [610.651s][info][gc ] Average MMU = 0.114 > [615.669s][info][gc ] Average MMU = 0.130 > [620.686s][info][gc ] Average MMU = 0.129 > [622.209s][info][gc,remset ] GC(13) Update Refs > [622.209s][info][gc,remset ] GC(13) Worker 0 Card Stats Histo: > [622.209s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [622.209s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [622.209s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 3.12 50.00 99.61 100.00 ] > [622.209s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 26.56 92.19 100.00 ] > [622.209s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 1.56 29.69 99.61 100.00 ] > [622.209s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 9.38 70.31 100.00 ] > [622.209s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 50.00 1366.00 ] > [622.209s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 3.98 54.88 64.00 ] > [622.209s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 33.98 1365.00 ] > [622.209s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 1.00 16.00 ] > [622.209s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 2.99 33.00 ] > [622.209s][info][gc,remset ] GC(13) Worker 1 Card Stats Histo: > [622.209s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.209s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc ] GC(13) Concurrent update references 36776.258ms > ... 
> (init[627.626s][info][gc,remset ] GC(15) Scan Remembered Set > [627.626s][info][gc,remset ] GC(15) Worker 0 Card Stats Histo: > [627.626s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 1.56 100.00 ] > [627.626s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 4.69 100.00 ] > [627.626s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 0.00 6.25 32.81 100.00 ] > [627.626s][info][gc,remset ] GC(15) clean_cards: [ 0.00 48.44 90.62 98.44 100.00 ] > [627.626s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 0.00 3.12 15.62 100.00 ] > [627.626s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 23.44 60.94 95.31 100.00 ] > [627.626s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 45.90 164.06 1366.00 ] > [627.626s][info][gc,remset ] GC(15) clean_objs: [ 0.00 11.91 53.91 60.94 63.00 ] > [627.626s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 31.84 150.39 1365.00 ] > [627.626s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 1.00 1.99 11.00 ] > [627.626s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 1.99 6.00 24.00 ] > [627.627s][info][gc,remset ] GC(15) Worker 1 Card Stats Histo: > [627.627s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [627.627s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 6.25 99.61 99.61 100.00 ] > [627.627s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 70.31 100.00 ] > [627.627s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 3.12 99.61 99.61 100.00 ] > [627.627s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 53.12 100.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 0.00 1365.00 ] > [627.627s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 40.82 64.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 0.00 1364.00 ] > [627.627s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] > [627.627s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] > [627.627s][info][gc,remset ] GC(15) Cumulative stats > [627.627s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 6.25 100.00 ] > [627.627s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 40.62 99.61 99.61 100.00 ] > [627.627s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 31.25 100.00 ] > [627.627s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 23.44 99.61 99.61 100.00 ] > [627.627s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 12.50 100.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 326.17 1366.00 ] > [627.627s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 3.98 64.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 314.45 1365.00 ] > [627.627s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] > [627.627s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] > [627.627s][info][gc ] GC(15) Concurrent remembered set scanning 1119.698ms > ... 
> [631.875s][info][gc,remset ] GC(15) Update Refs > [631.875s][info][gc,remset ] GC(15) Worker 0 Card Stats Histo: > [631.875s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 4.69 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 90.62 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 3.12 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 68.75 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 29.88 1365.00 ] > [631.875s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 52.93 64.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 22.85 1364.00 ] > [631.875s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 11.00 ] > [631.875s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 1.99 24.00 ] > [631.875s][info][gc,remset ] GC(15) Worker 1 Card Stats Histo: > [631.875s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 26.56 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 62.50 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 0.00 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 59.38 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 230.47 818.36 871.09 1366.00 ] > [631.875s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 0.00 63.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 181.64 707.03 796.88 1365.00 ] > [631.875s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] > [631.875s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] > [631.876s][info][gc,remset ] GC(15) Cumulative stats > [631.876s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 6.25 100.00 ] > [631.876s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] > [631.876s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 32.81 99.61 99.61 100.00 ] > [631.876s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 43.75 100.00 ] > [631.876s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 15.62 99.61 99.61 100.00 ] > [631.876s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 20.31 100.00 ] > [631.876s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 20.90 695.31 1366.00 ] > [631.876s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 11.91 64.00 ] > [631.876s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 11.91 562.50 1365.00 ] > [631.876s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] > [631.876s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] > [631.876s][info][gc ] GC(15) Concurrent update references 1953.893ms > ... > > > The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. 
The metrics are:
> 
> - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread
> - clean_run: as above, but the length of an uninterrupted run of clean cards
> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk
> - max_dirty_run & max_clean_run: Similarly for the maximum of each.
> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned
> - dirty_scans, clean_scans: numbers of objects scanned by the closure
> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk
> 
> For example, the last cumulative log data (for UR) above indicates that at least 75% of the chunks have no alternations at all,
> and cards are almost always mostly clean for this specific prefix of the run.
> 
> Comparing worker stats from worker 0 and worker 1 indicates that in particular scans they may see different distributions of dirty cards for specific benchmarks based on their promotion and mutation behavior.
> 
> **Question:**
> Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics as well, in addition to the quartiles, min, and max?

Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 68 commits:

 - Merge branch 'master' into JVM-1264
 - A couple of changes based on review feedback.
 - Reword some code comments for greater clarity.
 - Merge branch 'master' into JVM-1264-dependent
 - Add a previously missed ticket#. Doing it here rather than in parent to
   avoid an otherwise unnecessary re-review touchpoint.
 - Merge branch 'stats_merge' into JVM-1264-dependent
 - Merge branch 'master' into stats_merge
 - jcheck space fix
 - Fix compiler error on windows.
 - Fix some tier1 tests.
 - ... and 58 more: https://git.openjdk.org/shenandoah/compare/d793fd16...4e5ad4ca

-------------

Changes: https://git.openjdk.org/shenandoah/pull/176/files
Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=14
Stats: 865 lines in 9 files changed: 496 ins; 206 del; 163 mod
Patch: https://git.openjdk.org/shenandoah/pull/176.diff
Fetch: git fetch https://git.openjdk.org/shenandoah pull/176/head:pull/176

PR: https://git.openjdk.org/shenandoah/pull/176

From redestad at openjdk.org Thu Dec 22 13:12:54 2022
From: redestad at openjdk.org (Claes Redestad)
Date: Thu, 22 Dec 2022 13:12:54 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]
In-Reply-To: 
References: <6lAQI6kDDTGbskylHcWReX8ExaB6qkwgqoai7E6ikZY=.8a69a63c-453d-4bbd-8c76-4d477bfb77fe@github.com>
Message-ID: 

On Wed, 21 Dec 2022 00:11:34 GMT, Sandhya Viswanathan wrote:

>> Passing the constant node through as an input as suggested by @iwanowww and @sviswa7 meant we could eliminate most of the `instruct` blocks, removing a significant chunk of code and a little bit of complexity from the proposed patch.
> 
> @cl4es Thanks for passing the constant node through, the code looks much cleaner now. The attached patch should handle the signed bytes/shorts as well. Please take a look.
> [signed.patch](https://github.com/openjdk/jdk/files/10273480/signed.patch)

I ran tests and some quick microbenchmarking to validate @sviswa7's patch to activate vectorization for `short` and `byte` arrays and it looks good:

Before:

Benchmark                   (size)  Mode  Cnt     Score    Error  Units
ArraysHashCode.bytes         10000  avgt    5  7845.586 ± 23.440  ns/op
ArraysHashCode.chars         10000  avgt    5  1203.163 ± 11.995  ns/op
ArraysHashCode.ints          10000  avgt    5  1131.915 ±  7.843  ns/op
ArraysHashCode.multibytes    10000  avgt    5  4136.487 ±  5.790  ns/op
ArraysHashCode.multichars    10000  avgt    5   671.328 ± 17.629  ns/op
ArraysHashCode.multiints     10000  avgt    5   699.051 ±  8.135  ns/op
ArraysHashCode.multishorts   10000  avgt    5  4139.300 ± 10.633  ns/op
ArraysHashCode.shorts        10000  avgt    5  7844.019 ± 26.071  ns/op

After:

Benchmark                   (size)  Mode  Cnt     Score    Error  Units
ArraysHashCode.bytes         10000  avgt    5  1193.208 ±  1.965  ns/op
ArraysHashCode.chars         10000  avgt    5  1193.311 ±  5.941  ns/op
ArraysHashCode.ints          10000  avgt    5  1132.592 ± 10.410  ns/op
ArraysHashCode.multibytes    10000  avgt    5   657.343 ± 25.343  ns/op
ArraysHashCode.multichars    10000  avgt    5   672.668 ±  5.229  ns/op
ArraysHashCode.multiints     10000  avgt    5   697.143 ±  3.929  ns/op
ArraysHashCode.multishorts   10000  avgt    5   666.738 ± 12.236  ns/op
ArraysHashCode.shorts        10000  avgt    5  1193.563 ±  5.449  ns/op

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From eosterlund at openjdk.org Thu Dec 22 14:46:47 2022
From: eosterlund at openjdk.org (Erik Österlund)
Date: Thu, 22 Dec 2022 14:46:47 GMT
Subject: RFR: 8299072: java_lang_ref_Reference::clear_referent should be GC agnostic
In-Reply-To: <8je3w2XaNdQEAKx0lLHp2T2UXkOUqEV0ks-2TFL2AJE=.fbf307d0-9833-4465-a914-3a7e9f05d12b@github.com>
References: <8je3w2XaNdQEAKx0lLHp2T2UXkOUqEV0ks-2TFL2AJE=.fbf307d0-9833-4465-a914-3a7e9f05d12b@github.com>
Message-ID: 

On Tue, 20 Dec 2022 08:12:41 GMT, David Holmes wrote:

>> The current java_lang_ref_Reference::clear_referent implementation performs a raw reference clear. That doesn't work well with upcoming GC algorithms. It should be made GC agnostic by going through the normal access API.
> 
> So `clear_referent` is made GC agnostic, but then all the existing GC's are changed to use the original raw version? Why do they not use the GC agnostic version - performance?

Thanks for the review, @dholmes-ora!

-------------

PR: https://git.openjdk.org/jdk/pull/11736

From wkemper at openjdk.org Thu Dec 22 20:04:25 2022
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 22 Dec 2022 20:04:25 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v15]
In-Reply-To: 
References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
Message-ID: 

On Thu, 22 Dec 2022 04:34:00 GMT, Y. Srinivas Ramakrishna wrote:

>> **Updated 12/21**
>> 
>> **Summary:**
>> The main change is card stats collection during remembered set (RS) and update refs (UR) phases when the card-table is scanned. The code is protected by a new non-product only flag `ShenandoahEnableCardStats`, which is on by default in debug builds and off in the optimized build.
>> 
>> We tested the impact of the code with the flag enabled in product mode and felt the impact was non-trivial. We might, in the future, enable the code in product mode if performance can be improved.
>> 
>> Stats are logged per worker thread at the end of each RS and UR scan. These stats are specific to the most recent round of scanning. Global cumulative stats across all threads (but specific to RS or UR) are also maintained, and these are logged at periodic intervals as determined by the setting of `ShenandoahCardStatsLogInterval`.
>> 
>> **Format of stats produced and how to interpret them: (sample)**
>> 
>> The following format is an example from a slowdebug run where the logging is enabled.
In this case there are 2 concurrent gc worker threads, and `ShenandoahCardStatsLogInterval` was set at 2. The first two logs show the stats for those particular scans for each of the two worker threads, and the next set show the stats for particular scans for the two worker threads, followed by a cumulative one for that type of scan (RS or UR) across all workers and scans of that type, respectively. >> >> >> [560.766s][info][gc,remset ] GC(13) Scan Remembered Set >> [560.766s][info][gc,remset ] GC(13) Worker 0 Card Stats Histo: >> [560.766s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 99.61 100.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 53.12 ] >> [560.766s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 99.61 99.61 99.61 100.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 53.12 ] >> [560.766s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 99.61 99.61 99.61 100.00 ] >> [560.766s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 53.12 ] >> [560.766s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 818.36 1366.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 8.00 ] >> [560.766s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 705.08 1365.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] >> [560.766s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 16.00 ] >> [560.766s][info][gc,remset ] GC(13) Worker 1 Card Stats Histo: >> [560.766s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 96.88 100.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 46.88 ] >> [560.766s][info][gc,remset ] GC(13) dirty_cards: [ 18.75 82.81 98.44 99.61 100.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 46.88 ] >> [560.766s][info][gc,remset ] GC(13) max_dirty_run: [ 18.75 82.81 98.44 99.61 100.00 ] >> [560.766s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 46.88 ] >> [560.766s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 269.53 353.52 814.45 1366.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 3.00 ] >> [560.766s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 263.67 351.56 671.88 1365.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] >> [560.766s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 3.00 ] >> [560.766s][info][gc ] GC(13) Concurrent remembered set scanning 1150.359ms >> [560.766s][info][gc,start ] GC(13) Concurrent marking roots >> ... 
>> [585.433s][info][gc ] GC(13) Concurrent evacuation 6225.829ms >> [585.433s][info][gc,start ] GC(13) Pause Init Update Refs >> [585.434s][info][gc ] GC(13) Pause Init Update Refs 0.264ms >> [585.434s][info][gc,start ] GC(13) Concurrent update references >> [585.434s][info][gc,task ] GC(13) Using 2 of 4 workers for concurrent reference update >> [585.567s][info][gc ] Average MMU = 2.925 >> [590.583s][info][gc ] Average MMU = 1.509 >> [595.600s][info][gc ] Average MMU = 0.835 >> [600.618s][info][gc ] Average MMU = 0.447 >> [605.635s][info][gc ] Average MMU = 0.253 >> [610.651s][info][gc ] Average MMU = 0.114 >> [615.669s][info][gc ] Average MMU = 0.130 >> [620.686s][info][gc ] Average MMU = 0.129 >> [622.209s][info][gc,remset ] GC(13) Update Refs >> [622.209s][info][gc,remset ] GC(13) Worker 0 Card Stats Histo: >> [622.209s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] >> [622.209s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 3.12 100.00 ] >> [622.209s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 3.12 50.00 99.61 100.00 ] >> [622.209s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 26.56 92.19 100.00 ] >> [622.209s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 1.56 29.69 99.61 100.00 ] >> [622.209s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 9.38 70.31 100.00 ] >> [622.209s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 50.00 1366.00 ] >> [622.209s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 3.98 54.88 64.00 ] >> [622.209s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 33.98 1365.00 ] >> [622.209s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 1.00 16.00 ] >> [622.209s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 2.99 33.00 ] >> [622.209s][info][gc,remset ] GC(13) Worker 1 Card Stats Histo: >> [622.209s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.209s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc ] GC(13) Concurrent update references 36776.258ms >> ... 
>> (init[627.626s][info][gc,remset ] GC(15) Scan Remembered Set >> [627.626s][info][gc,remset ] GC(15) Worker 0 Card Stats Histo: >> [627.626s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 1.56 100.00 ] >> [627.626s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 4.69 100.00 ] >> [627.626s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 0.00 6.25 32.81 100.00 ] >> [627.626s][info][gc,remset ] GC(15) clean_cards: [ 0.00 48.44 90.62 98.44 100.00 ] >> [627.626s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 0.00 3.12 15.62 100.00 ] >> [627.626s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 23.44 60.94 95.31 100.00 ] >> [627.626s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 45.90 164.06 1366.00 ] >> [627.626s][info][gc,remset ] GC(15) clean_objs: [ 0.00 11.91 53.91 60.94 63.00 ] >> [627.626s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 31.84 150.39 1365.00 ] >> [627.626s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 1.00 1.99 11.00 ] >> [627.626s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 1.99 6.00 24.00 ] >> [627.627s][info][gc,remset ] GC(15) Worker 1 Card Stats Histo: >> [627.627s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] >> [627.627s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 6.25 99.61 99.61 100.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 70.31 100.00 ] >> [627.627s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 3.12 99.61 99.61 100.00 ] >> [627.627s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 53.12 100.00 ] >> [627.627s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 0.00 1365.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 40.82 64.00 ] >> [627.627s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 0.00 1364.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] >> [627.627s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] >> [627.627s][info][gc,remset ] GC(15) Cumulative stats >> [627.627s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 6.25 100.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] >> [627.627s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 40.62 99.61 99.61 100.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 31.25 100.00 ] >> [627.627s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 23.44 99.61 99.61 100.00 ] >> [627.627s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 12.50 100.00 ] >> [627.627s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 326.17 1366.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 3.98 64.00 ] >> [627.627s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 314.45 1365.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] >> [627.627s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] >> [627.627s][info][gc ] GC(15) Concurrent remembered set scanning 1119.698ms >> ... 
>> [631.875s][info][gc,remset ] GC(15) Update Refs >> [631.875s][info][gc,remset ] GC(15) Worker 0 Card Stats Histo: >> [631.875s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 3.12 100.00 ] >> [631.875s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 4.69 99.61 99.61 100.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 90.62 100.00 ] >> [631.875s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 3.12 99.61 99.61 100.00 ] >> [631.875s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 68.75 100.00 ] >> [631.875s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 29.88 1365.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 52.93 64.00 ] >> [631.875s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 22.85 1364.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 11.00 ] >> [631.875s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 1.99 24.00 ] >> [631.875s][info][gc,remset ] GC(15) Worker 1 Card Stats Histo: >> [631.875s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 26.56 100.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [631.875s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 62.50 99.61 99.61 100.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 0.00 100.00 ] >> [631.875s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 59.38 99.61 99.61 100.00 ] >> [631.875s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [631.875s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 230.47 818.36 871.09 1366.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 0.00 63.00 ] >> [631.875s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 181.64 707.03 796.88 1365.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] >> [631.875s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] >> [631.876s][info][gc,remset ] GC(15) Cumulative stats >> [631.876s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 6.25 100.00 ] >> [631.876s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] >> [631.876s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 32.81 99.61 99.61 100.00 ] >> [631.876s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 43.75 100.00 ] >> [631.876s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 15.62 99.61 99.61 100.00 ] >> [631.876s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 20.31 100.00 ] >> [631.876s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 20.90 695.31 1366.00 ] >> [631.876s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 11.91 64.00 ] >> [631.876s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 11.91 562.50 1365.00 ] >> [631.876s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] >> [631.876s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] >> [631.876s][info][gc ] GC(15) Concurrent update references 1953.893ms >> ... >> >> >> The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. 
The metrics are:
>> 
>> - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread
>> - clean_run: as above, but the length of an uninterrupted run of clean cards
>> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk
>> - max_dirty_run & max_clean_run: Similarly for the maximum of each.
>> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned
>> - dirty_scans, clean_scans: numbers of objects scanned by the closure
>> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk
>> 
>> For example, the last cumulative log data (for UR) above indicates that at least 75% of the chunks have no alternations at all,
>> and cards are almost always mostly clean for this specific prefix of the run.
>> 
>> Comparing worker stats from worker 0 and worker 1 indicates that in particular scans they may see different distributions of dirty cards for specific benchmarks based on their promotion and mutation behavior.
>> 
>> **Question:**
>> Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics as well, in addition to the quartiles, min, and max?
> 
> Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 68 commits:
> 
>  - Merge branch 'master' into JVM-1264
>  - A couple of changes based on review feedback.
>  - Reword some code comments for greater clarity.
>  - Merge branch 'master' into JVM-1264-dependent
>  - Add a previously missed ticket#. Doing it here rather than in parent to
>    avoid an otherwise unnecessary re-review touchpoint.
>  - Merge branch 'stats_merge' into JVM-1264-dependent
>  - Merge branch 'master' into stats_merge
>  - jcheck space fix
>  - Fix compiler error on windows.
>  - Fix some tier1 tests.
>  - ... and 58 more: https://git.openjdk.org/shenandoah/compare/d793fd16...4e5ad4ca

Thank you. Excited to see what we can learn and optimize from these metrics.

-------------

Marked as reviewed by wkemper (Committer).

PR: https://git.openjdk.org/shenandoah/pull/176

From wkemper at openjdk.org Thu Dec 22 20:04:25 2022
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 22 Dec 2022 20:04:25 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v12]
In-Reply-To: 
References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
Message-ID: 

On Thu, 22 Dec 2022 04:00:10 GMT, Y. Srinivas Ramakrishna wrote:

>> Maybe "Log cumulative card stats every so many scans of the remembered set"? "Cycle" is a bit overloaded. If I read this, I would expect to see a log message every 50 GC cycles, but with (probably) two rset scans per GC cycle, it would be closer to every 25 GC cycles.
> 
> I reworded along the lines of your suggestion.
> 
> For the specific example you gave, we will in fact see one cumulative RS log message every 50 RS scans, and one cumulative UR log message every 50 UR scans, thus roughly one each every 50 GC cycles, if you will. The case when they may not be in lockstep might be if there were full gc's or degenerated cycles that did one (e.g. RS) but skipped the other (e.g. UR -- can this happen?), because we maintain two independent counters one for RS scans and one for UR scans.

Okay - thank you.
Yes, Shenandoah will skip evacuation (and update references) if it finds a sufficient number of regions with no live objects after final mark.

-------------

PR: https://git.openjdk.org/shenandoah/pull/176

From ysr at openjdk.org Thu Dec 22 23:50:24 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Thu, 22 Dec 2022 23:50:24 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v15]
In-Reply-To: 
References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
Message-ID: 

On Thu, 22 Dec 2022 20:01:16 GMT, William Kemper wrote:

>> Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 68 commits:
>> 
>>  - Merge branch 'master' into JVM-1264
>>  - A couple of changes based on review feedback.
>>  - Reword some code comments for greater clarity.
>>  - Merge branch 'master' into JVM-1264-dependent
>>  - Add a previously missed ticket#. Doing it here rather than in parent to
>>    avoid an otherwise unnecessary re-review touchpoint.
>>  - Merge branch 'stats_merge' into JVM-1264-dependent
>>  - Merge branch 'master' into stats_merge
>>  - jcheck space fix
>>  - Fix compiler error on windows.
>>  - Fix some tier1 tests.
>>  - ... and 58 more: https://git.openjdk.org/shenandoah/compare/d793fd16...4e5ad4ca
> 
> Thank you. Excited to see what we can learn and optimize from these metrics.

Thank you for your review, @earthling-amzn !

-------------

PR: https://git.openjdk.org/shenandoah/pull/176

From ysr at openjdk.org Thu Dec 22 23:50:24 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Thu, 22 Dec 2022 23:50:24 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v2]
In-Reply-To: 
References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
 <9E6NmFY5877JXtI7RKpqa1r2nXDaEJ7xxLG9q0hEP6U=.03c76ffe-bac9-4401-9091-4ee19d6a394e@github.com>
Message-ID: 

On Thu, 8 Dec 2022 14:42:53 GMT, Kelvin Nilsen wrote:

> Thanks for sharing this code. A few overview comments:
> 
> 1. Yes, I think it would be useful to see the data collected for each mark scan and each update-reference scan independently. Sometimes, abnormal behavior of the application causes spikes in performance, and it would be nice to understand the degree to which remembered set scanning is part of this spike.
> 2. It is also useful to have a cumulative summary of all costs at the end of a run, probably still separating out the mark scans from the update-refs scans.
> 3. Is it possible to eliminate the overhead entirely of this instrumentation by compiling it out for release builds?

@kdnilsen : The above have all been taken care of. Please re-review and approve/sponsor. Thank you!

-------------

PR: https://git.openjdk.org/shenandoah/pull/176

From eosterlund at openjdk.org Fri Dec 23 14:58:37 2022
From: eosterlund at openjdk.org (Erik Österlund)
Date: Fri, 23 Dec 2022 14:58:37 GMT
Subject: RFR: 8299312: Clean up BarrierSetNMethod
Message-ID: 

The terminology in BarrierSetNMethod is not crisp. In platform code we talk about a per-nmethod "guard value", but on shared level we call the same value arm value or disarm value in different contexts. But it really depends on the value whether the nmethod is disarmed or armed. We should embrace the "guard value" terminology and lift it into the shared code level. We also have more functionality than we need on platform level.
The platform level only needs to know how to deoptimize, and how to set/get the guard value of an nmethod. The more specific functionality should be moved to the shared code and be expressed in terms of said setter/getter.

-------------

Commit messages:
 - Fix Shenandoah build
 - 8299312: Clean up BarrierSetNMethod

Changes: https://git.openjdk.org/jdk/pull/11774/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11774&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8299312
Stats: 159 lines in 26 files changed: 10 ins; 73 del; 76 mod
Patch: https://git.openjdk.org/jdk/pull/11774.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/11774/head:pull/11774

PR: https://git.openjdk.org/jdk/pull/11774

From luhenry at openjdk.org Fri Dec 23 22:53:55 2022
From: luhenry at openjdk.org (Ludovic Henry)
Date: Fri, 23 Dec 2022 22:53:55 GMT
Subject: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v16]
In-Reply-To:
References:
Message-ID:

On Wed, 21 Dec 2022 17:29:23 GMT, Claes Redestad wrote:

>> Continuing the work initiated by @luhenry to unroll and then intrinsify polynomial hash loops.
>>
>> I've rewired the library changes to route via a single `@IntrinsicCandidate` method. To make this work I've harmonized how they are invoked so that there's less special handling and checks in the intrinsic. Mainly do the null-check outside of the intrinsic for `Arrays.hashCode` cases.
>>
>> Having a centralized entry point means it'll be easier to parameterize the factor and start values which are now hard-coded (always 31, and a start value of either one for `Arrays` or zero for `String`). It seems somewhat premature to parameterize this up front.
>>
>> The current implementation is performance neutral on microbenchmarks on all tested platforms (x64, aarch64) when not enabling the intrinsic. We do add a few trivial method calls which increase the call stack depth, so surprises cannot be ruled out on complex workloads.
>>
>> With the most recent fixes the x64 intrinsic results on my workstation look like this:
>>
>> Benchmark (size) Mode Cnt Score Error Units
>> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.199 ± 0.017 ns/op
>> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 6.933 ± 0.049 ns/op
>> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 29.935 ± 0.221 ns/op
>> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 1596.982 ± 7.020 ns/op
>>
>> Baseline:
>>
>> Benchmark (size) Mode Cnt Score Error Units
>> StringHashCode.Algorithm.defaultLatin1 1 avgt 5 2.200 ± 0.013 ns/op
>> StringHashCode.Algorithm.defaultLatin1 10 avgt 5 9.424 ± 0.122 ns/op
>> StringHashCode.Algorithm.defaultLatin1 100 avgt 5 90.541 ± 0.512 ns/op
>> StringHashCode.Algorithm.defaultLatin1 10000 avgt 5 9425.321 ± 67.630 ns/op
>>
>> I.e. no measurable overhead compared to baseline even for `size == 1`.
>>
>> The vectorized code now nominally works for all unsigned cases as well as ints, though more testing would be good.
>>
>> Benchmark for `Arrays.hashCode`:
>>
>> Benchmark (size) Mode Cnt Score Error Units
>> ArraysHashCode.bytes 1 avgt 5 1.884 ± 0.013 ns/op
>> ArraysHashCode.bytes 10 avgt 5 6.955 ± 0.040 ns/op
>> ArraysHashCode.bytes 100 avgt 5 87.218 ± 0.595 ns/op
>> ArraysHashCode.bytes 10000 avgt 5 9419.591 ± 38.308 ns/op
>> ArraysHashCode.chars 1 avgt 5 2.200 ± 0.010 ns/op
>> ArraysHashCode.chars 10 avgt 5 6.935 ± 0.034 ns/op
>> ArraysHashCode.chars 100 avgt 5 30.216 ± 0.134 ns/op
>> ArraysHashCode.chars 10000 avgt 5 1601.629 ± 6.418 ns/op
>> ArraysHashCode.ints 1 avgt 5 2.200 ± 0.007 ns/op
>> ArraysHashCode.ints 10 avgt 5 6.936 ± 0.034 ns/op
>> ArraysHashCode.ints 100 avgt 5 29.412 ± 0.268 ns/op
>> ArraysHashCode.ints 10000 avgt 5 1610.578 ± 7.785 ns/op
>> ArraysHashCode.shorts 1 avgt 5 1.885 ± 0.012 ns/op
>> ArraysHashCode.shorts 10 avgt 5 6.961 ± 0.034 ns/op
>> ArraysHashCode.shorts 100 avgt 5 87.095 ± 0.417 ns/op
>> ArraysHashCode.shorts 10000 avgt 5 9420.617 ± 50.089 ns/op
>>
>> Baseline:
>>
>> Benchmark (size) Mode Cnt Score Error Units
>> ArraysHashCode.bytes 1 avgt 5 3.213 ± 0.207 ns/op
>> ArraysHashCode.bytes 10 avgt 5 8.483 ± 0.040 ns/op
>> ArraysHashCode.bytes 100 avgt 5 90.315 ± 0.655 ns/op
>> ArraysHashCode.bytes 10000 avgt 5 9422.094 ± 62.402 ns/op
>> ArraysHashCode.chars 1 avgt 5 3.040 ± 0.066 ns/op
>> ArraysHashCode.chars 10 avgt 5 8.497 ± 0.074 ns/op
>> ArraysHashCode.chars 100 avgt 5 90.074 ± 0.387 ns/op
>> ArraysHashCode.chars 10000 avgt 5 9420.474 ± 41.619 ns/op
>> ArraysHashCode.ints 1 avgt 5 2.827 ± 0.019 ns/op
>> ArraysHashCode.ints 10 avgt 5 7.727 ± 0.043 ns/op
>> ArraysHashCode.ints 100 avgt 5 89.405 ± 0.593 ns/op
>> ArraysHashCode.ints 10000 avgt 5 9426.539 ± 51.308 ns/op
>> ArraysHashCode.shorts 1 avgt 5 3.071 ± 0.062 ns/op
>> ArraysHashCode.shorts 10 avgt 5 8.168 ± 0.049 ns/op
>> ArraysHashCode.shorts 100 avgt 5 90.399 ± 0.292 ns/op
>> ArraysHashCode.shorts 10000 avgt 5 9420.171 ± 44.474 ns/op
>>
>> As we can see the `Arrays` intrinsics are faster for small inputs, and faster on large inputs for `char` and `int` (the ones currently vectorized). I aim to fix `byte` and `short` cases before integrating, though it might be acceptable to hand that off as follow-up enhancements to not further delay integration of this enhancement.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
> Treat Op_VectorizedHashCode as other similar Ops in split_unique_types

Marked as reviewed by luhenry (Committer).

-------------

PR: https://git.openjdk.org/jdk/pull/10847

From kdnilsen at openjdk.org Tue Dec 27 22:10:26 2022
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Tue, 27 Dec 2022 22:10:26 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v15]
In-Reply-To:
References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
Message-ID:

On Thu, 22 Dec 2022 04:34:00 GMT, Y. Srinivas Ramakrishna wrote:

>> **Updated 12/21**
>>
>> **Summary:**
>> The main change is card stats collection during remembered set (RS) and update refs (UR) phases when the card-table is scanned. The code is protected by a new non-product only flag `ShenandoahEnableCardStats`, which is on by default in debug builds and off in the optimized build.
>>
>> We tested the impact of the code with the flag enabled in product mode and felt the impact was non-trivial. We might, in the future, enable the code in product mode if performance can be improved.
>>
>> Stats are logged per worker thread at the end of each RS and UR scan. These stats are specific to the most recent round of scanning. Global cumulative stats across all threads (but specific to RS or UR) are also maintained, and these are logged at periodic intervals as determined by the setting of `ShenandoahCardStatsLogInterval`.
>>
>> **Format of stats produced and how to interpret them: (sample)**
>>
>> The following format is an example from a slowdebug run where the logging is enabled. In this case there are 2 concurrent gc worker threads, and `ShenandoahCardStatsLogInterval` was set at 2.
The first two logs show the stats for those particular scans for each of the two worker threads, and the next set show the stats for particular scans for the two worker threads, followed by a cumulative one for that type of scan (RS or UR) across all workers and scans of that type, respectively. >> >> >> [560.766s][info][gc,remset ] GC(13) Scan Remembered Set >> [560.766s][info][gc,remset ] GC(13) Worker 0 Card Stats Histo: >> [560.766s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 99.61 100.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 53.12 ] >> [560.766s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 99.61 99.61 99.61 100.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 53.12 ] >> [560.766s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 99.61 99.61 99.61 100.00 ] >> [560.766s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 53.12 ] >> [560.766s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 818.36 1366.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 8.00 ] >> [560.766s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 705.08 1365.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] >> [560.766s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 16.00 ] >> [560.766s][info][gc,remset ] GC(13) Worker 1 Card Stats Histo: >> [560.766s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 96.88 100.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 46.88 ] >> [560.766s][info][gc,remset ] GC(13) dirty_cards: [ 18.75 82.81 98.44 99.61 100.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 46.88 ] >> [560.766s][info][gc,remset ] GC(13) max_dirty_run: [ 18.75 82.81 98.44 99.61 100.00 ] >> [560.766s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 46.88 ] >> [560.766s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 269.53 353.52 814.45 1366.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 3.00 ] >> [560.766s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 263.67 351.56 671.88 1365.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] >> [560.766s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 3.00 ] >> [560.766s][info][gc ] GC(13) Concurrent remembered set scanning 1150.359ms >> [560.766s][info][gc,start ] GC(13) Concurrent marking roots >> ... 
>> [585.433s][info][gc ] GC(13) Concurrent evacuation 6225.829ms >> [585.433s][info][gc,start ] GC(13) Pause Init Update Refs >> [585.434s][info][gc ] GC(13) Pause Init Update Refs 0.264ms >> [585.434s][info][gc,start ] GC(13) Concurrent update references >> [585.434s][info][gc,task ] GC(13) Using 2 of 4 workers for concurrent reference update >> [585.567s][info][gc ] Average MMU = 2.925 >> [590.583s][info][gc ] Average MMU = 1.509 >> [595.600s][info][gc ] Average MMU = 0.835 >> [600.618s][info][gc ] Average MMU = 0.447 >> [605.635s][info][gc ] Average MMU = 0.253 >> [610.651s][info][gc ] Average MMU = 0.114 >> [615.669s][info][gc ] Average MMU = 0.130 >> [620.686s][info][gc ] Average MMU = 0.129 >> [622.209s][info][gc,remset ] GC(13) Update Refs >> [622.209s][info][gc,remset ] GC(13) Worker 0 Card Stats Histo: >> [622.209s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] >> [622.209s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 3.12 100.00 ] >> [622.209s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 3.12 50.00 99.61 100.00 ] >> [622.209s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 26.56 92.19 100.00 ] >> [622.209s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 1.56 29.69 99.61 100.00 ] >> [622.209s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 9.38 70.31 100.00 ] >> [622.209s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 50.00 1366.00 ] >> [622.209s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 3.98 54.88 64.00 ] >> [622.209s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 33.98 1365.00 ] >> [622.209s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 1.00 16.00 ] >> [622.209s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 2.99 33.00 ] >> [622.209s][info][gc,remset ] GC(13) Worker 1 Card Stats Histo: >> [622.209s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.209s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc ] GC(13) Concurrent update references 36776.258ms >> ... 
>> (init[627.626s][info][gc,remset ] GC(15) Scan Remembered Set >> [627.626s][info][gc,remset ] GC(15) Worker 0 Card Stats Histo: >> [627.626s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 1.56 100.00 ] >> [627.626s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 4.69 100.00 ] >> [627.626s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 0.00 6.25 32.81 100.00 ] >> [627.626s][info][gc,remset ] GC(15) clean_cards: [ 0.00 48.44 90.62 98.44 100.00 ] >> [627.626s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 0.00 3.12 15.62 100.00 ] >> [627.626s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 23.44 60.94 95.31 100.00 ] >> [627.626s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 45.90 164.06 1366.00 ] >> [627.626s][info][gc,remset ] GC(15) clean_objs: [ 0.00 11.91 53.91 60.94 63.00 ] >> [627.626s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 31.84 150.39 1365.00 ] >> [627.626s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 1.00 1.99 11.00 ] >> [627.626s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 1.99 6.00 24.00 ] >> [627.627s][info][gc,remset ] GC(15) Worker 1 Card Stats Histo: >> [627.627s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] >> [627.627s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 6.25 99.61 99.61 100.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 70.31 100.00 ] >> [627.627s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 3.12 99.61 99.61 100.00 ] >> [627.627s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 53.12 100.00 ] >> [627.627s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 0.00 1365.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 40.82 64.00 ] >> [627.627s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 0.00 1364.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] >> [627.627s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] >> [627.627s][info][gc,remset ] GC(15) Cumulative stats >> [627.627s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 6.25 100.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] >> [627.627s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 40.62 99.61 99.61 100.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 31.25 100.00 ] >> [627.627s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 23.44 99.61 99.61 100.00 ] >> [627.627s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 12.50 100.00 ] >> [627.627s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 326.17 1366.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 3.98 64.00 ] >> [627.627s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 314.45 1365.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] >> [627.627s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] >> [627.627s][info][gc ] GC(15) Concurrent remembered set scanning 1119.698ms >> ... 
>> [631.875s][info][gc,remset ] GC(15) Update Refs >> [631.875s][info][gc,remset ] GC(15) Worker 0 Card Stats Histo: >> [631.875s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 3.12 100.00 ] >> [631.875s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 4.69 99.61 99.61 100.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 90.62 100.00 ] >> [631.875s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 3.12 99.61 99.61 100.00 ] >> [631.875s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 68.75 100.00 ] >> [631.875s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 29.88 1365.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 52.93 64.00 ] >> [631.875s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 22.85 1364.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 11.00 ] >> [631.875s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 1.99 24.00 ] >> [631.875s][info][gc,remset ] GC(15) Worker 1 Card Stats Histo: >> [631.875s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 26.56 100.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [631.875s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 62.50 99.61 99.61 100.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 0.00 100.00 ] >> [631.875s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 59.38 99.61 99.61 100.00 ] >> [631.875s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [631.875s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 230.47 818.36 871.09 1366.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 0.00 63.00 ] >> [631.875s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 181.64 707.03 796.88 1365.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] >> [631.875s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] >> [631.876s][info][gc,remset ] GC(15) Cumulative stats >> [631.876s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 6.25 100.00 ] >> [631.876s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] >> [631.876s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 32.81 99.61 99.61 100.00 ] >> [631.876s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 43.75 100.00 ] >> [631.876s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 15.62 99.61 99.61 100.00 ] >> [631.876s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 20.31 100.00 ] >> [631.876s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 20.90 695.31 1366.00 ] >> [631.876s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 11.91 64.00 ] >> [631.876s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 11.91 562.50 1365.00 ] >> [631.876s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] >> [631.876s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] >> [631.876s][info][gc ] GC(15) Concurrent update references 1953.893ms >> ... >> >> >> The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. 
The metrics are:
>>
>> - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread
>> - clean_run: as above, but the length of an uninterrupted run of clean cards
>> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk
>> - max_dirty_run & max_clean_run: similarly, but for the maximum of each
>> - dirty_objs, clean_objs: the numbers of objects in any chunk walked or scanned
>> - dirty_scans, clean_scans: the numbers of objects scanned by the closure
>> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk
>>
>> For example, the last cumulative log data (for UR) above indicates that at least 75% of the chunks have no alternations at all,
>> and cards are almost always mostly clean for this specific prefix of the run.
>>
>> Comparing worker stats from worker 0 and worker 1 indicates that in particular scans they may see different distributions of dirty cards for specific benchmarks, based on their promotion and mutation behavior.
>>
>> **Question:**
>> Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics, in addition to the quartiles, min, and max?
>
> Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 68 commits:
>
>  - Merge branch 'master' into JVM-1264
>  - A couple of changes based on review feedback.
>  - Reword some code comments for greater clarity.
>  - Merge branch 'master' into JVM-1264-dependent
>  - Add a previously missed ticket#. Doing it here rather than in parent to
>    avoid an otherwise unnecessary re-review touchpoint.
>  - Merge branch 'stats_merge' into JVM-1264-dependent
>  - Merge branch 'master' into stats_merge
>  - jcheck space fix
>  - Fix compiler error on windows.
>  - Fix some tier1 tests.
>  - ... and 58 more: https://git.openjdk.org/shenandoah/compare/d793fd16...4e5ad4ca

Thank you. This is very thorough.

src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 83:

> 81: _rp->set_mark_closure(worker_id, &cl);
> 82: struct ShenandoahRegionChunk assignment;
> 83: while (_work_list->next(&assignment)) {

No need for a code change here, but just want to make clear that we may want to enable cancellation of rem-set scanning at some future time. The primary benefit would be to allow quicker transition to Full GC in the case that we have to degenerate.

src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 321:

> 319: // 3. Non-array objects are precisely dirtied by the interpreter and the compilers
> 320: // (why? Are offsets of a field in an object that expensive to determine?).
> 321: // For such objects that extend over multiple cards, or even multiple clusters,

Historically, we borrowed the card-marking barrier from existing generational GC implementations and did not want to burden ourselves with trying to change it. Presumably, experience with other GCs demonstrates that this works "well enough". It would appear that non-array objects are usually not "extremely large".

src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 323:

> 321: // For such objects that extend over multiple cards, or even multiple clusters,
> 322: // the entire object is scanned by the worker that processes the (dirty) card on
> 323: // which the object's header lies. However, GC workers then precisley dirty the

typo: precisely

-------------

Marked as reviewed by kdnilsen (Committer).

PR: https://git.openjdk.org/shenandoah/pull/176

From kdnilsen at openjdk.org Tue Dec 27 22:10:26 2022
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Tue, 27 Dec 2022 22:10:26 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v15]
In-Reply-To:
References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
Message-ID:

On Tue, 27 Dec 2022 21:53:12 GMT, Kelvin Nilsen wrote:

>> Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 68 commits:
>>
>>  - Merge branch 'master' into JVM-1264
>>  - A couple of changes based on review feedback.
>>  - Reword some code comments for greater clarity.
>>  - Merge branch 'master' into JVM-1264-dependent
>>  - Add a previously missed ticket#. Doing it here rather than in parent to
>>    avoid an otherwise unnecessary re-review touchpoint.
>>  - Merge branch 'stats_merge' into JVM-1264-dependent
>>  - Merge branch 'master' into stats_merge
>>  - jcheck space fix
>>  - Fix compiler error on windows.
>>  - Fix some tier1 tests.
>>  - ... and 58 more: https://git.openjdk.org/shenandoah/compare/d793fd16...4e5ad4ca
>
> src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 83:
>
>> 81: _rp->set_mark_closure(worker_id, &cl);
>> 82: struct ShenandoahRegionChunk assignment;
>> 83: while (_work_list->next(&assignment)) {
>
> No need for a code change here, but just want to make clear that we may want to enable cancellation of rem-set scanning at some future time. The primary benefit would be to allow quicker transition to Full GC in the case that we have to degenerate.

This would also accelerate the transition to degenerated mode, so we can get ourselves out of the STW pause more quickly.

-------------

PR: https://git.openjdk.org/shenandoah/pull/176

From ysr at openjdk.org Tue Dec 27 23:16:32 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Tue, 27 Dec 2022 23:16:32 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v15]
In-Reply-To:
References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
Message-ID: <0kj10C2G8dpG_wSQIV4O-N1Q9mlRNrK8_aTH65jPzHs=.3fd5e140-98a0-4e24-ac88-378f41672395@github.com>

On Tue, 27 Dec 2022 21:54:24 GMT, Kelvin Nilsen wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 83:
>>
>>> 81: _rp->set_mark_closure(worker_id, &cl);
>>> 82: struct ShenandoahRegionChunk assignment;
>>> 83: while (_work_list->next(&assignment)) {
>>
>> No need for a code change here, but just want to make clear that we may want to enable cancellation of rem-set scanning at some future time. The primary benefit would be to allow quicker transition to Full GC in the case that we have to degenerate.
>
> This would also accelerate the transition to degenerated mode, so we can get ourselves out of the STW pause more quickly.

When a work item is picked up, it should be completed since it can't currently be placed back on the work list. Hence the code change, moving the check to the end of the loop. My expectation had been that this would fix the issue with prompt cancellation causing crashes previously. However, testing revealed that this was still causing crashes, so I left the cancellation commented out with the intention of following up on this in the fullness of time.
I'll leave a comment to that effect as you suggest, and mark it with a TODO so it's easily flagged/found.

-------------

PR: https://git.openjdk.org/shenandoah/pull/176

From ysr at openjdk.org Tue Dec 27 23:25:14 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Tue, 27 Dec 2022 23:25:14 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v15]
In-Reply-To:
References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
Message-ID:

On Tue, 27 Dec 2022 22:04:49 GMT, Kelvin Nilsen wrote:

>> Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 68 commits:
>>
>>  - Merge branch 'master' into JVM-1264
>>  - A couple of changes based on review feedback.
>>  - Reword some code comments for greater clarity.
>>  - Merge branch 'master' into JVM-1264-dependent
>>  - Add a previously missed ticket#. Doing it here rather than in parent to
>>    avoid an otherwise unnecessary re-review touchpoint.
>>  - Merge branch 'stats_merge' into JVM-1264-dependent
>>  - Merge branch 'master' into stats_merge
>>  - jcheck space fix
>>  - Fix compiler error on windows.
>>  - Fix some tier1 tests.
>>  - ... and 58 more: https://git.openjdk.org/shenandoah/compare/d793fd16...4e5ad4ca
>
> src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 321:
>
>> 319: // 3. Non-array objects are precisely dirtied by the interpreter and the compilers
>> 320: // (why? Are offsets of a field in an object that expensive to determine?).
>> 321: // For such objects that extend over multiple cards, or even multiple clusters,
>
> Historically, we borrowed the card-marking barrier from existing generational GC implementations and did not want to burden ourselves with trying to change it. Presumably, experience with other GCs demonstrates that this works "well enough". It would appear that non-array objects are usually not "extremely large".

I realize that my comments in lines 324-328 were aspirational, describing changes I wanted to make but which are not in place today. I'll correct those and a few other such "aspirational" comments that got left behind in the code as I was working on subsequent changes.

It is true what you say that non-array objects are usually not very large (unless the result of code generated by frameworks such as, e.g., protobufs).

-------------

PR: https://git.openjdk.org/shenandoah/pull/176

From ysr at openjdk.org Wed Dec 28 00:21:44 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Wed, 28 Dec 2022 00:21:44 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v16]
In-Reply-To:
References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
Message-ID:

> **Updated 12/21**
>
> **Summary:**
> The main change is card stats collection during remembered set (RS) and update refs (UR) phases when the card-table is scanned. The code is protected by a new non-product only flag `ShenandoahEnableCardStats`, which is on by default in debug builds and off in the optimized build.
>
> We tested the impact of the code with the flag enabled in product mode and felt the impact was non-trivial. We might, in the future, enable the code in product mode if performance can be improved.
>
> Stats are logged per worker thread at the end of each RS and UR scan. These stats are specific to the most recent round of scanning.
Global cumulative stats across all threads (but specific to RS or UR) are also maintained, and these are logged at periodic intervals as determined by the setting of `ShenandoahCardStatsLogInterval`. > > **Format of stats produced and how to interpret them: (sample)** > > The following format is an example from a slowdebug run where the logging is enabled. In this case there are 2 concurrent gc worker threads, and `ShenandoahCardStatsLogInterval` was set at 2. The first two logs show the stats for those particular scans for each of the two worker threads, and the next set show the stats for particular scans for the two worker threads, followed by a cumulative one for that type of scan (RS or UR) across all workers and scans of that type, respectively. > > > [560.766s][info][gc,remset ] GC(13) Scan Remembered Set > [560.766s][info][gc,remset ] GC(13) Worker 0 Card Stats Histo: > [560.766s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 53.12 ] > [560.766s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 99.61 99.61 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 53.12 ] > [560.766s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 99.61 99.61 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 53.12 ] > [560.766s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 818.36 1366.00 ] > [560.766s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 8.00 ] > [560.766s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 705.08 1365.00 ] > [560.766s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [560.766s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 16.00 ] > [560.766s][info][gc,remset ] GC(13) Worker 1 Card Stats Histo: > [560.766s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 96.88 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 46.88 ] > [560.766s][info][gc,remset ] GC(13) dirty_cards: [ 18.75 82.81 98.44 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 46.88 ] > [560.766s][info][gc,remset ] GC(13) max_dirty_run: [ 18.75 82.81 98.44 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 46.88 ] > [560.766s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 269.53 353.52 814.45 1366.00 ] > [560.766s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 3.00 ] > [560.766s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 263.67 351.56 671.88 1365.00 ] > [560.766s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [560.766s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 3.00 ] > [560.766s][info][gc ] GC(13) Concurrent remembered set scanning 1150.359ms > [560.766s][info][gc,start ] GC(13) Concurrent marking roots > ... 
> [585.433s][info][gc ] GC(13) Concurrent evacuation 6225.829ms > [585.433s][info][gc,start ] GC(13) Pause Init Update Refs > [585.434s][info][gc ] GC(13) Pause Init Update Refs 0.264ms > [585.434s][info][gc,start ] GC(13) Concurrent update references > [585.434s][info][gc,task ] GC(13) Using 2 of 4 workers for concurrent reference update > [585.567s][info][gc ] Average MMU = 2.925 > [590.583s][info][gc ] Average MMU = 1.509 > [595.600s][info][gc ] Average MMU = 0.835 > [600.618s][info][gc ] Average MMU = 0.447 > [605.635s][info][gc ] Average MMU = 0.253 > [610.651s][info][gc ] Average MMU = 0.114 > [615.669s][info][gc ] Average MMU = 0.130 > [620.686s][info][gc ] Average MMU = 0.129 > [622.209s][info][gc,remset ] GC(13) Update Refs > [622.209s][info][gc,remset ] GC(13) Worker 0 Card Stats Histo: > [622.209s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [622.209s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [622.209s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 3.12 50.00 99.61 100.00 ] > [622.209s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 26.56 92.19 100.00 ] > [622.209s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 1.56 29.69 99.61 100.00 ] > [622.209s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 9.38 70.31 100.00 ] > [622.209s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 50.00 1366.00 ] > [622.209s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 3.98 54.88 64.00 ] > [622.209s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 33.98 1365.00 ] > [622.209s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 1.00 16.00 ] > [622.209s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 2.99 33.00 ] > [622.209s][info][gc,remset ] GC(13) Worker 1 Card Stats Histo: > [622.209s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.209s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc ] GC(13) Concurrent update references 36776.258ms > ... 
> (init[627.626s][info][gc,remset ] GC(15) Scan Remembered Set > [627.626s][info][gc,remset ] GC(15) Worker 0 Card Stats Histo: > [627.626s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 1.56 100.00 ] > [627.626s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 4.69 100.00 ] > [627.626s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 0.00 6.25 32.81 100.00 ] > [627.626s][info][gc,remset ] GC(15) clean_cards: [ 0.00 48.44 90.62 98.44 100.00 ] > [627.626s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 0.00 3.12 15.62 100.00 ] > [627.626s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 23.44 60.94 95.31 100.00 ] > [627.626s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 45.90 164.06 1366.00 ] > [627.626s][info][gc,remset ] GC(15) clean_objs: [ 0.00 11.91 53.91 60.94 63.00 ] > [627.626s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 31.84 150.39 1365.00 ] > [627.626s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 1.00 1.99 11.00 ] > [627.626s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 1.99 6.00 24.00 ] > [627.627s][info][gc,remset ] GC(15) Worker 1 Card Stats Histo: > [627.627s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [627.627s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 6.25 99.61 99.61 100.00 ] > [627.627s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 70.31 100.00 ] > [627.627s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 3.12 99.61 99.61 100.00 ] > [627.627s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 53.12 100.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 0.00 1365.00 ] > [627.627s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 40.82 64.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 0.00 1364.00 ] > [627.627s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] > [627.627s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] > [627.627s][info][gc,remset ] GC(15) Cumulative stats > [627.627s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 6.25 100.00 ] > [627.627s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 40.62 99.61 99.61 100.00 ] > [627.627s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 31.25 100.00 ] > [627.627s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 23.44 99.61 99.61 100.00 ] > [627.627s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 12.50 100.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 326.17 1366.00 ] > [627.627s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 3.98 64.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 314.45 1365.00 ] > [627.627s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] > [627.627s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] > [627.627s][info][gc ] GC(15) Concurrent remembered set scanning 1119.698ms > ... 
> [631.875s][info][gc,remset ] GC(15) Update Refs > [631.875s][info][gc,remset ] GC(15) Worker 0 Card Stats Histo: > [631.875s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 4.69 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 90.62 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 3.12 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 68.75 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 29.88 1365.00 ] > [631.875s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 52.93 64.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 22.85 1364.00 ] > [631.875s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 11.00 ] > [631.875s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 1.99 24.00 ] > [631.875s][info][gc,remset ] GC(15) Worker 1 Card Stats Histo: > [631.875s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 26.56 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 62.50 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 0.00 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 59.38 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 230.47 818.36 871.09 1366.00 ] > [631.875s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 0.00 63.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 181.64 707.03 796.88 1365.00 ] > [631.875s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] > [631.875s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] > [631.876s][info][gc,remset ] GC(15) Cumulative stats > [631.876s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 6.25 100.00 ] > [631.876s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] > [631.876s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 32.81 99.61 99.61 100.00 ] > [631.876s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 43.75 100.00 ] > [631.876s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 15.62 99.61 99.61 100.00 ] > [631.876s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 20.31 100.00 ] > [631.876s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 20.90 695.31 1366.00 ] > [631.876s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 11.91 64.00 ] > [631.876s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 11.91 562.50 1365.00 ] > [631.876s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] > [631.876s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] > [631.876s][info][gc ] GC(15) Concurrent update references 1953.893ms > ... > > > The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. 
The metrics are:
>
> - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread
> - clean_run: as above, but the length of an uninterrupted run of clean cards
> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk
> - max_dirty_run & max_clean_run: similarly, but for the maximum of each
> - dirty_objs, clean_objs: the numbers of objects in any chunk walked or scanned
> - dirty_scans, clean_scans: the numbers of objects scanned by the closure
> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk
>
> For example, the last cumulative log data (for UR) above indicates that at least 75% of the chunks have no alternations at all,
> and cards are almost always mostly clean for this specific prefix of the run.
>
> Comparing worker stats from worker 0 and worker 1 indicates that in particular scans they may see different distributions of dirty cards for specific benchmarks, based on their promotion and mutation behavior.
>
> **Question:**
> Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics, in addition to the quartiles, min, and max?

Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision:

  Fix some comments based on review feedback.

-------------

Changes:
  - all: https://git.openjdk.org/shenandoah/pull/176/files
  - new: https://git.openjdk.org/shenandoah/pull/176/files/4e5ad4ca..cf8c7e54

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=15
 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=176&range=14-15

Stats: 16 lines in 2 files changed: 3 ins; 5 del; 8 mod
Patch: https://git.openjdk.org/shenandoah/pull/176.diff
Fetch: git fetch https://git.openjdk.org/shenandoah pull/176/head:pull/176

PR: https://git.openjdk.org/shenandoah/pull/176

From ysr at openjdk.org Wed Dec 28 00:40:14 2022
From: ysr at openjdk.org (Y. Srinivas Ramakrishna)
Date: Wed, 28 Dec 2022 00:40:14 GMT
Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v16]
In-Reply-To:
References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com>
Message-ID: <5t4esBi2u-J1zWtF1yked2BEwa59Pef-WUWY6_Q-gso=.61f4867a-af39-4d53-b6ad-45f8d37a272d@github.com>

On Wed, 28 Dec 2022 00:21:44 GMT, Y. Srinivas Ramakrishna wrote:

>> **Updated 12/21**
>>
>> **Summary:**
>> The main change is card stats collection during remembered set (RS) and update refs (UR) phases when the card-table is scanned. The code is protected by a new non-product only flag `ShenandoahEnableCardStats`, which is on by default in debug builds and off in the optimized build.
>>
>> We tested the impact of the code with the flag enabled in product mode and felt the impact was non-trivial. We might, in the future, enable the code in product mode if performance can be improved.
>>
>> Stats are logged per worker thread at the end of each RS and UR scan. These stats are specific to the most recent round of scanning. Global cumulative stats across all threads (but specific to RS or UR) are also maintained, and these are logged at periodic intervals as determined by the setting of `ShenandoahCardStatsLogInterval`.
>>
>> **Format of stats produced and how to interpret them: (sample)**
>>
>> The following format is an example from a slowdebug run where the logging is enabled.
In this case there are 2 concurrent gc worker threads, and `ShenandoahCardStatsLogInterval` was set at 2. The first two logs show the stats for those particular scans for each of the two worker threads, and the next set show the stats for particular scans for the two worker threads, followed by a cumulative one for that type of scan (RS or UR) across all workers and scans of that type, respectively. >> >> >> [560.766s][info][gc,remset ] GC(13) Scan Remembered Set >> [560.766s][info][gc,remset ] GC(13) Worker 0 Card Stats Histo: >> [560.766s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 99.61 100.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 53.12 ] >> [560.766s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 99.61 99.61 99.61 100.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 53.12 ] >> [560.766s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 99.61 99.61 99.61 100.00 ] >> [560.766s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 53.12 ] >> [560.766s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 818.36 1366.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 8.00 ] >> [560.766s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 705.08 1365.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] >> [560.766s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 16.00 ] >> [560.766s][info][gc,remset ] GC(13) Worker 1 Card Stats Histo: >> [560.766s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 96.88 100.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 46.88 ] >> [560.766s][info][gc,remset ] GC(13) dirty_cards: [ 18.75 82.81 98.44 99.61 100.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 46.88 ] >> [560.766s][info][gc,remset ] GC(13) max_dirty_run: [ 18.75 82.81 98.44 99.61 100.00 ] >> [560.766s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 46.88 ] >> [560.766s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 269.53 353.52 814.45 1366.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 3.00 ] >> [560.766s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 263.67 351.56 671.88 1365.00 ] >> [560.766s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] >> [560.766s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 3.00 ] >> [560.766s][info][gc ] GC(13) Concurrent remembered set scanning 1150.359ms >> [560.766s][info][gc,start ] GC(13) Concurrent marking roots >> ... 
>> [585.433s][info][gc ] GC(13) Concurrent evacuation 6225.829ms >> [585.433s][info][gc,start ] GC(13) Pause Init Update Refs >> [585.434s][info][gc ] GC(13) Pause Init Update Refs 0.264ms >> [585.434s][info][gc,start ] GC(13) Concurrent update references >> [585.434s][info][gc,task ] GC(13) Using 2 of 4 workers for concurrent reference update >> [585.567s][info][gc ] Average MMU = 2.925 >> [590.583s][info][gc ] Average MMU = 1.509 >> [595.600s][info][gc ] Average MMU = 0.835 >> [600.618s][info][gc ] Average MMU = 0.447 >> [605.635s][info][gc ] Average MMU = 0.253 >> [610.651s][info][gc ] Average MMU = 0.114 >> [615.669s][info][gc ] Average MMU = 0.130 >> [620.686s][info][gc ] Average MMU = 0.129 >> [622.209s][info][gc,remset ] GC(13) Update Refs >> [622.209s][info][gc,remset ] GC(13) Worker 0 Card Stats Histo: >> [622.209s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] >> [622.209s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 3.12 100.00 ] >> [622.209s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 3.12 50.00 99.61 100.00 ] >> [622.209s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 26.56 92.19 100.00 ] >> [622.209s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 1.56 29.69 99.61 100.00 ] >> [622.209s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 9.38 70.31 100.00 ] >> [622.209s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 50.00 1366.00 ] >> [622.209s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 3.98 54.88 64.00 ] >> [622.209s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 33.98 1365.00 ] >> [622.209s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 1.00 16.00 ] >> [622.209s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 2.99 33.00 ] >> [622.209s][info][gc,remset ] GC(13) Worker 1 Card Stats Histo: >> [622.209s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.209s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 0.00 ] >> [622.210s][info][gc ] GC(13) Concurrent update references 36776.258ms >> ... 
>> (init[627.626s][info][gc,remset ] GC(15) Scan Remembered Set >> [627.626s][info][gc,remset ] GC(15) Worker 0 Card Stats Histo: >> [627.626s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 1.56 100.00 ] >> [627.626s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 4.69 100.00 ] >> [627.626s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 0.00 6.25 32.81 100.00 ] >> [627.626s][info][gc,remset ] GC(15) clean_cards: [ 0.00 48.44 90.62 98.44 100.00 ] >> [627.626s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 0.00 3.12 15.62 100.00 ] >> [627.626s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 23.44 60.94 95.31 100.00 ] >> [627.626s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 45.90 164.06 1366.00 ] >> [627.626s][info][gc,remset ] GC(15) clean_objs: [ 0.00 11.91 53.91 60.94 63.00 ] >> [627.626s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 31.84 150.39 1365.00 ] >> [627.626s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 1.00 1.99 11.00 ] >> [627.626s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 1.99 6.00 24.00 ] >> [627.627s][info][gc,remset ] GC(15) Worker 1 Card Stats Histo: >> [627.627s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] >> [627.627s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 6.25 99.61 99.61 100.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 70.31 100.00 ] >> [627.627s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 3.12 99.61 99.61 100.00 ] >> [627.627s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 53.12 100.00 ] >> [627.627s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 0.00 1365.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 40.82 64.00 ] >> [627.627s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 0.00 1364.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] >> [627.627s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] >> [627.627s][info][gc,remset ] GC(15) Cumulative stats >> [627.627s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 6.25 100.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] >> [627.627s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 40.62 99.61 99.61 100.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 31.25 100.00 ] >> [627.627s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 23.44 99.61 99.61 100.00 ] >> [627.627s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 12.50 100.00 ] >> [627.627s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 326.17 1366.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 3.98 64.00 ] >> [627.627s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 314.45 1365.00 ] >> [627.627s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] >> [627.627s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] >> [627.627s][info][gc ] GC(15) Concurrent remembered set scanning 1119.698ms >> ... 
>> [631.875s][info][gc,remset ] GC(15) Update Refs >> [631.875s][info][gc,remset ] GC(15) Worker 0 Card Stats Histo: >> [631.875s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 3.12 100.00 ] >> [631.875s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 4.69 99.61 99.61 100.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 90.62 100.00 ] >> [631.875s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 3.12 99.61 99.61 100.00 ] >> [631.875s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 68.75 100.00 ] >> [631.875s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 29.88 1365.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 52.93 64.00 ] >> [631.875s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 22.85 1364.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 11.00 ] >> [631.875s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 1.99 24.00 ] >> [631.875s][info][gc,remset ] GC(15) Worker 1 Card Stats Histo: >> [631.875s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 26.56 100.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [631.875s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 62.50 99.61 99.61 100.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 0.00 100.00 ] >> [631.875s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 59.38 99.61 99.61 100.00 ] >> [631.875s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 0.00 100.00 ] >> [631.875s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 230.47 818.36 871.09 1366.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 0.00 63.00 ] >> [631.875s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 181.64 707.03 796.88 1365.00 ] >> [631.875s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] >> [631.875s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] >> [631.876s][info][gc,remset ] GC(15) Cumulative stats >> [631.876s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 6.25 100.00 ] >> [631.876s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] >> [631.876s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 32.81 99.61 99.61 100.00 ] >> [631.876s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 43.75 100.00 ] >> [631.876s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 15.62 99.61 99.61 100.00 ] >> [631.876s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 20.31 100.00 ] >> [631.876s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 20.90 695.31 1366.00 ] >> [631.876s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 11.91 64.00 ] >> [631.876s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 11.91 562.50 1365.00 ] >> [631.876s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] >> [631.876s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] >> [631.876s][info][gc ] GC(15) Concurrent update references 1953.893ms >> ... >> >> >> The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. 
The metrics are: >> >> - dirty_run: the length of an uninterrupted run of dirty cards, interpretedas a percentage of a chunk of work assignment (cluster) processed by a thread >> - clean_run: as above, but the length of an uninterrupted run of clean cards >> - dirty_cards, clean_cards: as above, but counts of cards as a percentage of chunk >> - max_dirty_run & max_clean_run: Similarly for the maximum of each. >> - dirty_objs, clean_objs: these are numbers of objects in any chunk walked, or scanned >> - dirty_scans, clean_scans: numbers of objects scanned by the closure >> - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk >> >> For example, the last cumulative log data (for UR) above indicates that at least 75% of the chunks have no alternations at all, >> and cards are almost always mostly clean for this specific prefix of the run. >> >> Comparing worker stats from worker 0 and worker 1 indicates that in particular scans they may see different distributions of dirty cards for specific benchmarks based on their promotion and mutation behavior. >> >> **Question:** >> Would it make sense to print also, for example, the 1, 10, 90 and 99 percentiles for these metrics as well, in addition to the quartiles, min, and max? > > Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: > > Fix some comments based on review feedback. Took care of review feedback. This just needs a sponsor, thanks @kdnilsen / @earthling-amzn ! ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From ysr at openjdk.org Wed Dec 28 00:40:15 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 28 Dec 2022 00:40:15 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v15] In-Reply-To: <0kj10C2G8dpG_wSQIV4O-N1Q9mlRNrK8_aTH65jPzHs=.3fd5e140-98a0-4e24-ac88-378f41672395@github.com> References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> <0kj10C2G8dpG_wSQIV4O-N1Q9mlRNrK8_aTH65jPzHs=.3fd5e140-98a0-4e24-ac88-378f41672395@github.com> Message-ID: On Tue, 27 Dec 2022 23:13:05 GMT, Y. Srinivas Ramakrishna wrote: >> This would also accelerate the transition to degenerated mode, so we can get ourselves out of the STW pause more quickly. > > When a work item is picked up, it should be completed since it can't currently be placed back on the work list. Hence the code change, moving the check to the end of the loop. My expectation had been that this would fix the issue with prompt cancellation causing crashes previously. However, testing revealed that this was still causing crashes, so I left the cancellation commented out with the intention of following up on this in the fullness of time. > > I'll leave a comment to that effect as you suggest, and mark it with a TODO so it's easily flagged/found. Fixed. ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From ysr at openjdk.org Wed Dec 28 00:40:15 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 28 Dec 2022 00:40:15 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v15] In-Reply-To: References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: On Tue, 27 Dec 2022 23:22:52 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 321: >> >>> 319: // 3. 
Non-array objects are precisely dirtied by the interpreter and the compilers >>> 320: // (why? Are offsets of a field in an object that expensive to determine?). >>> 321: // For such objects that extend over multiple cards, or even multiple clusters, >> >> Historically, we borrowed the card-marking barrier from existing generational GC implementations and did not want to burden ourselves with trying to change it. Presumably, experience with other GCs demonstrates that this works "well enough". It would appear that non-array objects are usually not "extremely large". > I realize that my comments in lines 324-328 were aspirational: they described changes I wanted to make but which are not in place today. I'll correct those and a few other such "aspirational" comments that got left behind in the code as I was working on subsequent changes. > > It is true, as you say, that non-array objects are usually not very large (unless they are the result of code generated by frameworks such as protobufs). Fixed. ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From ysr at openjdk.org Wed Dec 28 00:40:15 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 28 Dec 2022 00:40:15 GMT Subject: RFR: JDK-8297796 GenShen: instrument the remembered set scan [v15] In-Reply-To: References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: On Tue, 27 Dec 2022 22:00:32 GMT, Kelvin Nilsen wrote: >> Y. Srinivas Ramakrishna has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 68 commits: >> >> - Merge branch 'master' into JVM-1264 >> - A couple of changes based on review feedback. >> - Reword some code comments for greater clarity. >> - Merge branch 'master' into JVM-1264-dependent >> - Add a previously missed ticket#. Doing it here rather than in parent to >> avoid an otherwise unnecessary re-review touchpoint. >> - Merge branch 'stats_merge' into JVM-1264-dependent >> - Merge branch 'master' into stats_merge >> - jcheck space fix >> - Fix compiler error on windows. >> - Fix some tier1 tests. >> - ... and 58 more: https://git.openjdk.org/shenandoah/compare/d793fd16...4e5ad4ca > > src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 323: > >> 321: // For such objects that extend over multiple cards, or even multiple clusters, >> 322: // the entire object is scanned by the worker that processes the (dirty) card on >> 323: // which the object's header lies. However, GC workers then precisley dirty the > > typo: precisely fixed. ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From ysr at openjdk.org Wed Dec 28 00:44:31 2022 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 28 Dec 2022 00:44:31 GMT Subject: Integrated: JDK-8297796 GenShen: instrument the remembered set scan In-Reply-To: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> References: <3-iFBSR1DHkrBgskzogR_KdmBvQtPQXb3MiHuqd-y7c=.7ae6200d-ed99-4766-b1a5-e331c4dcbb13@github.com> Message-ID: On Thu, 1 Dec 2022 19:55:45 GMT, Y. Srinivas Ramakrishna wrote: > **Updated 12/21** > > **Summary:** > The main change is card stats collection during remembered set (RS) and update refs (UR) phases when the card-table is scanned. The code is protected by a new non-product-only flag `ShenandoahEnableCardStats`, which is on by default in debug builds and off in the optimized build.
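> As a rough illustration only (the launcher path, the application name, and the generational-mode flag spelling below are assumptions for this sketch, not taken from the patch), a debug-build run that surfaces these stats might look like:
>
>   $DEBUG_JDK/bin/java -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational \
>     -XX:+ShenandoahEnableCardStats -XX:ShenandoahCardStatsLogInterval=2 \
>     -Xlog:gc+remset=info MyApp
>
> Since the flag is non-product, this assumes a debug build, where it can be set directly.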
> > We tested the impact of the code with the flag enabled in product mode and found the impact to be non-trivial. We might, in the future, enable the code in product mode if performance can be improved. > > Stats are logged per worker thread at the end of each RS and UR scan. These stats are specific to the most recent round of scanning. Global cumulative stats across all threads (but specific to RS or UR) are also maintained, and these are logged at periodic intervals as determined by the setting of `ShenandoahCardStatsLogInterval`. > > **Format of stats produced and how to interpret them: (sample)** > > The following is an example from a slowdebug run with the logging enabled. In this case there are two concurrent GC worker threads, and `ShenandoahCardStatsLogInterval` was set to 2. The first two blocks show per-worker stats for individual scans; the later blocks show per-worker stats for subsequent scans, each followed by cumulative stats for that type of scan (RS or UR) across all workers and scans of that type. > > > [560.766s][info][gc,remset ] GC(13) Scan Remembered Set > [560.766s][info][gc,remset ] GC(13) Worker 0 Card Stats Histo: > [560.766s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 53.12 ] > [560.766s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 99.61 99.61 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 53.12 ] > [560.766s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 99.61 99.61 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 53.12 ] > [560.766s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 818.36 1366.00 ] > [560.766s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 8.00 ] > [560.766s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 705.08 1365.00 ] > [560.766s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [560.766s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 16.00 ] > [560.766s][info][gc,remset ] GC(13) Worker 1 Card Stats Histo: > [560.766s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 96.88 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 46.88 ] > [560.766s][info][gc,remset ] GC(13) dirty_cards: [ 18.75 82.81 98.44 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 46.88 ] > [560.766s][info][gc,remset ] GC(13) max_dirty_run: [ 18.75 82.81 98.44 99.61 100.00 ] > [560.766s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 46.88 ] > [560.766s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 269.53 353.52 814.45 1366.00 ] > [560.766s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 3.00 ] > [560.766s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 263.67 351.56 671.88 1365.00 ] > [560.766s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [560.766s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 3.00 ] > [560.766s][info][gc ] GC(13) Concurrent remembered set scanning 1150.359ms > [560.766s][info][gc,start ] GC(13) Concurrent marking roots > ...
> [585.433s][info][gc ] GC(13) Concurrent evacuation 6225.829ms > [585.433s][info][gc,start ] GC(13) Pause Init Update Refs > [585.434s][info][gc ] GC(13) Pause Init Update Refs 0.264ms > [585.434s][info][gc,start ] GC(13) Concurrent update references > [585.434s][info][gc,task ] GC(13) Using 2 of 4 workers for concurrent reference update > [585.567s][info][gc ] Average MMU = 2.925 > [590.583s][info][gc ] Average MMU = 1.509 > [595.600s][info][gc ] Average MMU = 0.835 > [600.618s][info][gc ] Average MMU = 0.447 > [605.635s][info][gc ] Average MMU = 0.253 > [610.651s][info][gc ] Average MMU = 0.114 > [615.669s][info][gc ] Average MMU = 0.130 > [620.686s][info][gc ] Average MMU = 0.129 > [622.209s][info][gc,remset ] GC(13) Update Refs > [622.209s][info][gc,remset ] GC(13) Worker 0 Card Stats Histo: > [622.209s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [622.209s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [622.209s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 3.12 50.00 99.61 100.00 ] > [622.209s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 26.56 92.19 100.00 ] > [622.209s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 1.56 29.69 99.61 100.00 ] > [622.209s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 9.38 70.31 100.00 ] > [622.209s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 50.00 1366.00 ] > [622.209s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 3.98 54.88 64.00 ] > [622.209s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 33.98 1365.00 ] > [622.209s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 1.00 16.00 ] > [622.209s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 2.99 33.00 ] > [622.209s][info][gc,remset ] GC(13) Worker 1 Card Stats Histo: > [622.209s][info][gc,remset ] GC(13) dirty_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.209s][info][gc,remset ] GC(13) clean_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) dirty_cards: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) clean_cards: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) max_dirty_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) max_clean_run: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) dirty_objs: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) clean_objs: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) dirty_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) clean_scans: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc,remset ] GC(13) alternations: [ 0.00 0.00 0.00 0.00 0.00 ] > [622.210s][info][gc ] GC(13) Concurrent update references 36776.258ms > ... 
> (init[627.626s][info][gc,remset ] GC(15) Scan Remembered Set > [627.626s][info][gc,remset ] GC(15) Worker 0 Card Stats Histo: > [627.626s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 1.56 100.00 ] > [627.626s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 4.69 100.00 ] > [627.626s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 0.00 6.25 32.81 100.00 ] > [627.626s][info][gc,remset ] GC(15) clean_cards: [ 0.00 48.44 90.62 98.44 100.00 ] > [627.626s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 0.00 3.12 15.62 100.00 ] > [627.626s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 23.44 60.94 95.31 100.00 ] > [627.626s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 45.90 164.06 1366.00 ] > [627.626s][info][gc,remset ] GC(15) clean_objs: [ 0.00 11.91 53.91 60.94 63.00 ] > [627.626s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 31.84 150.39 1365.00 ] > [627.626s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 1.00 1.99 11.00 ] > [627.626s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 1.99 6.00 24.00 ] > [627.627s][info][gc,remset ] GC(15) Worker 1 Card Stats Histo: > [627.627s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [627.627s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 6.25 99.61 99.61 100.00 ] > [627.627s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 70.31 100.00 ] > [627.627s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 3.12 99.61 99.61 100.00 ] > [627.627s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 53.12 100.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 0.00 1365.00 ] > [627.627s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 40.82 64.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 0.00 1364.00 ] > [627.627s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] > [627.627s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] > [627.627s][info][gc,remset ] GC(15) Cumulative stats > [627.627s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 6.25 100.00 ] > [627.627s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 40.62 99.61 99.61 100.00 ] > [627.627s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 31.25 100.00 ] > [627.627s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 23.44 99.61 99.61 100.00 ] > [627.627s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 12.50 100.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 326.17 1366.00 ] > [627.627s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 3.98 64.00 ] > [627.627s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 314.45 1365.00 ] > [627.627s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] > [627.627s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] > [627.627s][info][gc ] GC(15) Concurrent remembered set scanning 1119.698ms > ... 
> [631.875s][info][gc,remset ] GC(15) Update Refs > [631.875s][info][gc,remset ] GC(15) Worker 0 Card Stats Histo: > [631.875s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 3.12 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 4.69 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 90.62 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 3.12 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 68.75 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 0.00 29.88 1365.00 ] > [631.875s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 52.93 64.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 0.00 22.85 1364.00 ] > [631.875s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 11.00 ] > [631.875s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 1.99 24.00 ] > [631.875s][info][gc,remset ] GC(15) Worker 1 Card Stats Histo: > [631.875s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 26.56 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 62.50 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 0.00 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 59.38 99.61 99.61 100.00 ] > [631.875s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 0.00 100.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 230.47 818.36 871.09 1366.00 ] > [631.875s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 0.00 63.00 ] > [631.875s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 181.64 707.03 796.88 1365.00 ] > [631.875s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] > [631.875s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] > [631.876s][info][gc,remset ] GC(15) Cumulative stats > [631.876s][info][gc,remset ] GC(15) dirty_run: [ 0.00 0.00 0.00 6.25 100.00 ] > [631.876s][info][gc,remset ] GC(15) clean_run: [ 0.00 0.00 0.00 1.56 100.00 ] > [631.876s][info][gc,remset ] GC(15) dirty_cards: [ 0.00 32.81 99.61 99.61 100.00 ] > [631.876s][info][gc,remset ] GC(15) clean_cards: [ 0.00 0.00 0.00 43.75 100.00 ] > [631.876s][info][gc,remset ] GC(15) max_dirty_run: [ 0.00 15.62 99.61 99.61 100.00 ] > [631.876s][info][gc,remset ] GC(15) max_clean_run: [ 0.00 0.00 0.00 20.31 100.00 ] > [631.876s][info][gc,remset ] GC(15) dirty_objs: [ 0.00 0.00 20.90 695.31 1366.00 ] > [631.876s][info][gc,remset ] GC(15) clean_objs: [ 0.00 0.00 0.00 11.91 64.00 ] > [631.876s][info][gc,remset ] GC(15) dirty_scans: [ 0.00 0.00 11.91 562.50 1365.00 ] > [631.876s][info][gc,remset ] GC(15) clean_scans: [ 0.00 0.00 0.00 0.00 16.00 ] > [631.876s][info][gc,remset ] GC(15) alternations: [ 0.00 0.00 0.00 0.00 33.00 ] > [631.876s][info][gc ] GC(15) Concurrent update references 1953.893ms > ... > > > The rows represent the metric that's being tracked, and the columns are, respectively, minimum, the 3 quartiles (25%, 50%, 75%) and the maximum. 
The metrics are: > > - dirty_run: the length of an uninterrupted run of dirty cards, interpreted as a percentage of a chunk of work assignment (cluster) processed by a thread > - clean_run: as above, but the length of an uninterrupted run of clean cards > - dirty_cards, clean_cards: as above, but counts of cards as a percentage of the chunk > - max_dirty_run & max_clean_run: as above, but the maximum such run in a chunk > - dirty_objs, clean_objs: the number of objects walked in a chunk > - dirty_scans, clean_scans: the number of objects scanned by the closure > - alternations: the number of times that we transitioned from clean to dirty or dirty to clean in a chunk > > For example, the last cumulative log data (for UR) above indicates that at least 75% of the chunks have no alternations at all, > and cards are almost always mostly clean for this specific prefix of the run. > > Comparing the stats for worker 0 and worker 1 indicates that, in a given scan, workers may see different distributions of dirty cards, depending on a benchmark's promotion and mutation behavior. > > **Question:** > Would it make sense to also print, for example, the 1, 10, 90 and 99 percentiles for these metrics, in addition to the quartiles, min, and max? This pull request has now been integrated. Changeset: 6c8fa0f7 Author: Y. Srinivas Ramakrishna Committer: Kelvin Nilsen URL: https://git.openjdk.org/shenandoah/commit/6c8fa0f735bbc9f80f628145867db8bed5d074f4 Stats: 867 lines in 9 files changed: 497 ins; 209 del; 161 mod 8297796: GenShen: instrument the remembered set scan Reviewed-by: wkemper, kdnilsen ------------- PR: https://git.openjdk.org/shenandoah/pull/176 From wkemper at openjdk.org Thu Dec 29 23:12:57 2022 From: wkemper at openjdk.org (William Kemper) Date: Thu, 29 Dec 2022 23:12:57 GMT Subject: RFR: Merge openjdk/jdk:master Message-ID: This merges tag jdk-21+3 ------------- Commit messages: - Use new constant for two operand lir form - Merge tag 'jdk-21+3' into merge-jdk-21-3 - 8299061: Using lambda to optimize GraphKit::compute_stack_effects() - 8269736: Optimize CDS PatchEmbeddedPointers::do_bit() - 8297724: Loop strip mining prevents some empty loops from being eliminated - 8299015: Ensure that HttpResponse.BodySubscribers.ofFile writes all bytes - 8296275: Write a test to verify setAccelerator method of JMenuItem - 8297682: Use Collections.emptyIterator where applicable - 8299025: BMPImageReader.java readColorPalette could use staggeredReadByteStream - 8299146: No copyright statement on ArtifactResolverException.java - ...
and 100 more: https://git.openjdk.org/shenandoah/compare/6c8fa0f7...1acda05b The webrevs contain the adjustments done while merging with regard to each parent branch: - master: https://webrevs.openjdk.org/?repo=shenandoah&pr=189&range=00.0 - openjdk/jdk:master: https://webrevs.openjdk.org/?repo=shenandoah&pr=189&range=00.1 Changes: https://git.openjdk.org/shenandoah/pull/189/files Stats: 7642 lines in 349 files changed: 4348 ins; 1449 del; 1845 mod Patch: https://git.openjdk.org/shenandoah/pull/189.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/189/head:pull/189 PR: https://git.openjdk.org/shenandoah/pull/189 From wkemper at openjdk.org Thu Dec 29 23:21:22 2022 From: wkemper at openjdk.org (William Kemper) Date: Thu, 29 Dec 2022 23:21:22 GMT Subject: Integrated: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: On Thu, 29 Dec 2022 23:05:57 GMT, William Kemper wrote: > This merges tag jdk-21+3 This pull request has now been integrated. Changeset: 301d8226 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/301d822681afd71e5f817f23aaee30e81200d9ee Stats: 7642 lines in 349 files changed: 4348 ins; 1449 del; 1845 mod Merge openjdk/jdk:master ------------- PR: https://git.openjdk.org/shenandoah/pull/189 From wkemper at openjdk.org Fri Dec 30 00:18:55 2022 From: wkemper at openjdk.org (William Kemper) Date: Fri, 30 Dec 2022 00:18:55 GMT Subject: RFR: Allow heuristic trigger to increase capacity instead of running a collection Message-ID: Before the adaptive heuristic starts a collection, it will attempt to increase the capacity of its generation. If the capacity is increased, the heuristic will re-evaluate the trigger criteria. There is also a change here to attempt to increase the size of the old generation in response to a promotion failure. (See the sketch at the end of this digest.) ------------- Commit messages: - Remove trailing whitespace - Centralize resetting gc learning count, increase old for first promotion failure. - Use consistent assertions for locking or safepoint - WIP: Allow heuristics to resize generation instead of collecting - Capacity changes should also apply to adjusted capacity Changes: https://git.openjdk.org/shenandoah/pull/190/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=190&range=00 Stats: 97 lines in 10 files changed: 73 ins; 9 del; 15 mod Patch: https://git.openjdk.org/shenandoah/pull/190.diff Fetch: git fetch https://git.openjdk.org/shenandoah pull/190/head:pull/190 PR: https://git.openjdk.org/shenandoah/pull/190 From jamil.j.nimeh at oracle.com Mon Dec 19 21:51:58 2022 From: jamil.j.nimeh at oracle.com (Jamil Nimeh) Date: Mon, 19 Dec 2022 21:51:58 -0000 Subject: Calls array and intrinsic stub routines in shenandoahSupport.cpp Message-ID: <98fa25ef-ddf9-20bf-69f9-90f7d54f9588@oracle.com> Hello all, Volodymyr and I have implemented some new intrinsics for JDK 20 (see OpenJDK PRs https://github.com/openjdk/jdk/pull/7702 and https://github.com/openjdk/jdk/pull/10582). Volodymyr recently came across the calls[] array in shenandoahSupport.cpp and was wondering whether our new intrinsics need to be added to this array, and what the impact of having them there (or not) would be. I'm not on this list (and I'm guessing Volodymyr is not either), so if you could please include us directly in the reply, that would be helpful. Thanks, --Jamil
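A minimal sketch of the trigger flow described in the "Allow heuristic trigger to increase capacity" RFR above, in HotSpot-style C++. All names here (AdaptiveHeuristics, trigger_criteria_met, try_expand_generation, should_start_gc) are invented for illustration and are not the names used in the actual patch:

  // Hedged sketch: try to grow the generation before committing to a cycle.
  struct AdaptiveHeuristics {
    bool trigger_criteria_met();   // e.g., allocation pressure vs. available memory
    bool try_expand_generation();  // returns true if this generation's capacity grew

    bool should_start_gc() {
      if (!trigger_criteria_met()) {
        return false;              // no pressure: keep mutators running
      }
      // Before starting a collection, first attempt to increase capacity.
      if (try_expand_generation()) {
        // Capacity changed: re-evaluate the trigger against the new capacity.
        return trigger_criteria_met();
      }
      return true;                 // could not expand: start the collection
    }
  };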