From ayang at openjdk.org Thu Aug 1 07:31:43 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 1 Aug 2024 07:31:43 GMT Subject: RFR: 8337546: Remove unused GCCause::_adaptive_size_policy In-Reply-To: References: Message-ID: On Wed, 31 Jul 2024 11:25:50 GMT, Albert Mingkun Yang wrote: > Trivial removing an unused gc-cause; it was previously used by Parallel only. Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20403#issuecomment-2262242210 From ayang at openjdk.org Thu Aug 1 07:31:43 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 1 Aug 2024 07:31:43 GMT Subject: Integrated: 8337546: Remove unused GCCause::_adaptive_size_policy In-Reply-To: References: Message-ID: <6THEqrfMC8jW6TBFfLMIn8XdDslUFXP9jBtYzc0jOKc=.474e0a7c-827c-4519-948e-db8aecc15722@github.com> On Wed, 31 Jul 2024 11:25:50 GMT, Albert Mingkun Yang wrote: > Trivial removing an unused gc-cause; it was previously used by Parallel only. This pull request has now been integrated. Changeset: cf1230a5 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/cf1230a5f7e5ae4c72ec6243fff1d0b0eb27779a Stats: 13 lines in 4 files changed: 0 ins; 11 del; 2 mod 8337546: Remove unused GCCause::_adaptive_size_policy Reviewed-by: tschatzl, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/20403 From ayang at openjdk.org Thu Aug 1 07:43:56 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 1 Aug 2024 07:43:56 GMT Subject: RFR: 8337641: G1: Remove unused G1CollectedHeap::alloc_highest_free_region Message-ID: Trivial removing dead code. 
------------- Commit messages: - g1-trivial Changes: https://git.openjdk.org/jdk/pull/20415/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20415&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337641 Stats: 18 lines in 2 files changed: 0 ins; 18 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20415.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20415/head:pull/20415 PR: https://git.openjdk.org/jdk/pull/20415 From ayang at openjdk.org Thu Aug 1 07:49:02 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 1 Aug 2024 07:49:02 GMT Subject: RFR: 8337642: Serial: Remove redundant counter update in DefNewGeneration::gc_epilogue Message-ID: Trivial removing an empty method call. (Only subclasses have non-empty method body, which is not used by Serial.) ------------- Commit messages: - s1-trivial Changes: https://git.openjdk.org/jdk/pull/20416/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20416&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337642 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20416/head:pull/20416 PR: https://git.openjdk.org/jdk/pull/20416 From tschatzl at openjdk.org Thu Aug 1 07:53:30 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 1 Aug 2024 07:53:30 GMT Subject: RFR: 8337641: G1: Remove unused G1CollectedHeap::alloc_highest_free_region In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 07:39:31 GMT, Albert Mingkun Yang wrote: > Trivial removing dead code. `G1HeapRegionManager::find_highest_free()` can also be removed. ------------- Changes requested by tschatzl (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20415#pullrequestreview-2211909772 From tschatzl at openjdk.org Thu Aug 1 08:39:32 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 1 Aug 2024 08:39:32 GMT Subject: RFR: 8337642: Serial: Remove redundant counter update in DefNewGeneration::gc_epilogue In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 07:43:05 GMT, Albert Mingkun Yang wrote: > Trivial removing an empty method call. (Only subclasses have non-empty method body, which is not used by Serial.) I disagree with this change: when using `GCPolicyCounters`, the implied contract seems to be that `update_counters` is called at appropriate locations (e.g. `gc_epilogue`), even if empty, exactly to abstract away differences in the collectors wrt to usage. Looking at the users, it rather seems G1 being wrong in not calling this. ------------- PR Review: https://git.openjdk.org/jdk/pull/20416#pullrequestreview-2212007311 From ayang at openjdk.org Thu Aug 1 09:49:47 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 1 Aug 2024 09:49:47 GMT Subject: RFR: 8337641: G1: Remove unused G1CollectedHeap::alloc_highest_free_region [v2] In-Reply-To: References: Message-ID: > Trivial removing dead code. 
Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20415/files - new: https://git.openjdk.org/jdk/pull/20415/files/01b06d59..1ff8da35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20415&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20415&range=00-01 Stats: 27 lines in 2 files changed: 0 ins; 27 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20415.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20415/head:pull/20415 PR: https://git.openjdk.org/jdk/pull/20415 From ayang at openjdk.org Thu Aug 1 09:53:31 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 1 Aug 2024 09:53:31 GMT Subject: RFR: 8337642: Serial: Remove redundant counter update in DefNewGeneration::gc_epilogue In-Reply-To: References: Message-ID: <7-9_ZwueknC-L3QGr2xeMipto28jnnmtsLMGNKs3ouA=.19c3c6bd-2d1d-43e0-b48e-de6df2da3032@github.com> On Thu, 1 Aug 2024 08:37:10 GMT, Thomas Schatzl wrote: > the implied contract seems to be that update_counters is called at appropriate locations Maybe we can remove it from the base class. Callers of this method always live in gc-specific location where the concrete policy-counter type is known. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20416#issuecomment-2262625045 From stefank at openjdk.org Thu Aug 1 12:23:57 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 1 Aug 2024 12:23:57 GMT Subject: RFR: 8337658: ZGC: Move soft reference handling out of the driver loop function Message-ID: The ZDriver code is written to be neat and have a clear outline. The soft reference handling distracts when reading this code. I propose that we hide it a bit. I've also clarified in comments and names that the code is dealing with clearing of *all* references.
------------- Commit messages: - 8337658: ZGC: Move soft reference handling out of the driver loop function Changes: https://git.openjdk.org/jdk/pull/20418/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20418&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337658 Stats: 51 lines in 8 files changed: 20 ins; 4 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/20418.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20418/head:pull/20418 PR: https://git.openjdk.org/jdk/pull/20418 From duke at openjdk.org Thu Aug 1 12:58:30 2024 From: duke at openjdk.org (Joel Sikström) Date: Thu, 1 Aug 2024 12:58:30 GMT Subject: RFR: 8337658: ZGC: Move soft reference handling out of the driver loop function In-Reply-To: References: Message-ID: <-La3_J21R2DpekRekPcRg4yDUUt7QJ5MfsyQjWznr0o=.fc95a797-44f0-4140-af5c-46ca6a2ef0a0@github.com> On Thu, 1 Aug 2024 12:19:04 GMT, Stefan Karlsson wrote: > The ZDriver code is written to be neat and have a clear outline. The soft reference handling distracts when reading this code. I propose that we hide it a bit. > > I've also clarified in comments and names that the code is dealing with clearing of *all* references. I think this change is good and agree that `ZDriverMajor::run_thread()` becomes easier to read. Since the policy is now read and set in the construction of the `ZDriverScopeMajor`, a new getter is needed from `ZGenerationOld` and in turn `ZReferenceProcessor` to retrieve the policy for the gc request. The naming clarifications seem appropriate.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20418#issuecomment-2262971897 From tschatzl at openjdk.org Thu Aug 1 13:35:34 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 1 Aug 2024 13:35:34 GMT Subject: RFR: 8337641: G1: Remove unused G1CollectedHeap::alloc_highest_free_region [v2] In-Reply-To: References: Message-ID: <8h0n7NHo5JaK_BUG_kJAqEe9LxKXvUa3hBx61xvyURM=.cacd30cc-c91a-4bea-88f3-9971810a8961@github.com> On Thu, 1 Aug 2024 09:49:47 GMT, Albert Mingkun Yang wrote: >> Trivial removing dead code. > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review lgtm and trivial. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20415#pullrequestreview-2212776652 From ayang at openjdk.org Thu Aug 1 13:44:35 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 1 Aug 2024 13:44:35 GMT Subject: RFR: 8337641: G1: Remove unused G1CollectedHeap::alloc_highest_free_region [v2] In-Reply-To: References: Message-ID: <5ULVBkCxofwqMvWaQA579_ERMrUF6CskwU049ouXHeU=.94534efb-5531-46be-890f-c221c98c5428@github.com> On Thu, 1 Aug 2024 09:49:47 GMT, Albert Mingkun Yang wrote: >> Trivial removing dead code. > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20415#issuecomment-2263075474 From ayang at openjdk.org Thu Aug 1 13:44:36 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 1 Aug 2024 13:44:36 GMT Subject: Integrated: 8337641: G1: Remove unused G1CollectedHeap::alloc_highest_free_region In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 07:39:31 GMT, Albert Mingkun Yang wrote: > Trivial removing dead code. This pull request has now been integrated. 
Changeset: 022899a7 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/022899a7eb0100bd6d738471f52e5028e3e5f18e Stats: 45 lines in 4 files changed: 0 ins; 45 del; 0 mod 8337641: G1: Remove unused G1CollectedHeap::alloc_highest_free_region Reviewed-by: tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/20415 From duke at openjdk.org Thu Aug 1 18:24:38 2024 From: duke at openjdk.org (duke) Date: Thu, 1 Aug 2024 18:24:38 GMT Subject: Withdrawn: 8331723: Serial: Remove the unused parameter of the method SerialHeap::gc_prologue In-Reply-To: References: Message-ID: On Sun, 12 May 2024 09:27:36 GMT, xiaotaonan wrote: > Serial: Remove the unused parameter of the method SerialHeap::gc_prologue This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/19207 From nprasad at openjdk.org Thu Aug 1 21:13:48 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Thu, 1 Aug 2024 21:13:48 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics [v4] In-Reply-To: References: Message-ID: <6WQVJcdTTpmLHN11SLuikvQGtPYOiB82dFd-cShd-Qk=.5a20a8f4-cab9-4fd2-9f00-93c064fe7ceb@github.com> > **Notes** > Adding logs to get more visibility into how fast a thread resumes from allocation stall. > > **Testing** > * tier 1, tier 2, hotspot_gc tests. > > Example log messages > > 1. Last thread exiting. Performing GC after exiting critical section. Thread "main" 0 locked. > > 2. Thread exiting critical region Thread "main" 0 locked. > > 3. Thread stalled by JNI critical section. Resumed after 586ms. Thread "Thread-0". > > 4. Thread blocked to enter critical region. Resumed after 1240ms. Thread "SIGINT handler". 
Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: Address formating issue and code clean up feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20277/files - new: https://git.openjdk.org/jdk/pull/20277/files/c6b66ceb..c53dc9cf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20277&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20277&range=02-03 Stats: 52 lines in 2 files changed: 23 ins; 29 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20277.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20277/head:pull/20277 PR: https://git.openjdk.org/jdk/pull/20277 From nprasad at openjdk.org Fri Aug 2 02:58:37 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Fri, 2 Aug 2024 02:58:37 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics In-Reply-To: References: Message-ID: On Wed, 24 Jul 2024 08:27:49 GMT, Thomas Schatzl wrote: > It might also be nice to give an example of such a new message in the CR. updated PR summary. Examples are as below 1. Last thread exiting. Performing GC after exiting critical section. Thread "main" 0 locked. 2. Thread exiting critical region Thread "main" 0 locked. 3. Thread stalled by JNI critical section. Resumed after 586ms. Thread "Thread-0". 4. Thread blocked to enter critical region. Resumed after 1240ms. Thread "SIGINT handler". 
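For readers following the review, messages 3 and 4 above have the shape that a scope-based timer produces. What follows is only a minimal sketch under stated assumptions, not the code from the pull request: `ScopedStallLogger`, `format_stall_line`, the `enabled` flag and the `sink` callback are hypothetical names, and `std::chrono` stands in for HotSpot's `Ticks`/`Tickspan`. The elapsed time is computed only when logging is enabled.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper: builds one line in the shape of example messages 3 and 4.
std::string format_stall_line(const std::string& message, long long elapsed_ms,
                              const std::string& thread_name) {
  return message + " Resumed after " + std::to_string(elapsed_ms) +
         "ms. Thread \"" + thread_name + "\".";
}

// Hypothetical stand-in for a scope-based stall logger; not HotSpot code.
class ScopedStallLogger {
  using Clock = std::chrono::steady_clock;

  std::string _message;
  std::string _thread_name;
  bool _enabled;                                  // stand-in for "is debug logging on?"
  std::function<void(const std::string&)> _sink;  // receives the finished line
  Clock::time_point _start;

public:
  ScopedStallLogger(std::string message, std::string thread_name, bool enabled,
                    std::function<void(const std::string&)> sink)
    : _message(std::move(message)),
      _thread_name(std::move(thread_name)),
      _enabled(enabled),
      _sink(std::move(sink)),
      _start(Clock::now()) {}

  ScopedStallLogger(const ScopedStallLogger&) = delete;
  ScopedStallLogger& operator=(const ScopedStallLogger&) = delete;

  ~ScopedStallLogger() {
    if (!_enabled) {
      return;  // nothing is measured or formatted when logging is off
    }
    const auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
        Clock::now() - _start);
    _sink(format_stall_line(_message, elapsed.count(), _thread_name));
  }
};
```

Declaring one such object at the top of the block that may stall is enough; the line is emitted when the scope ends.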
------------- PR Comment: https://git.openjdk.org/jdk/pull/20277#issuecomment-2264414125 From nprasad at openjdk.org Fri Aug 2 02:58:38 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Fri, 2 Aug 2024 02:58:38 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics [v4] In-Reply-To: References: Message-ID: On Wed, 24 Jul 2024 08:23:54 GMT, Thomas Schatzl wrote: >> Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: >> >> Address formating issue and code clean up feedback > > src/hotspot/share/gc/shared/gcLocker.cpp line 124: > >> 122: } >> 123: >> 124: elapsedTimer elapsed_timer; > > In GC code we tend to use the newer `Ticks` and `Tickspan` API, not `elapsedTimer`. Only Parallel GC uses it at this point afaict (or just `os::elapsedTime()`/`os::elapsed_counter()`). > > Maybe it's even worth to add a special class that can be used with scopes to hide all that including the manual call to `log_debug_jni` (automatically done in the destructor). Probably not really useful. Addressed in latest revision. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20277#discussion_r1701133766 From nprasad at openjdk.org Fri Aug 2 02:58:40 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Fri, 2 Aug 2024 02:58:40 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics [v3] In-Reply-To: References: Message-ID: On Wed, 31 Jul 2024 08:10:43 GMT, Stefan Karlsson wrote: >> Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: >> >> Add missing imports and remove unused ones > > src/hotspot/share/gc/shared/gcLocker.hpp line 166: > >> 164: GCLockerTimingDebugLogger(const char* log_message); >> 165: ~GCLockerTimingDebugLogger(); >> 166: }; > > There should be no code after the include guard on line 153. This class should be moved above it. 
With that said, this class is only used in gcLocker.cpp, so there's really no need to expose it through the gcLocker.hpp file, AFAICT. > > Also, note that you are using `/* */` to add a comment about the class, but the rest of the code in this file uses `//`, so I'd prefer to see it changed. > > Also note that GitHub complains that your addition lacks a newline at the end of the file. We recently went over the GC code base and fixed issues like that. Maybe there's a way to configure your editor to add one when making edits to the end of a file? Thanks for the feedback. Addressed in new PR revision. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20277#discussion_r1701134220 From ayang at openjdk.org Fri Aug 2 06:52:00 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 2 Aug 2024 06:52:00 GMT Subject: RFR: 8337721: G1: Remove unused G1CollectedHeap::young_collection_verify_type Message-ID: Trivial removing dead code. ------------- Commit messages: - g1-trivial Changes: https://git.openjdk.org/jdk/pull/20438/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20438&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337721 Stats: 11 lines in 2 files changed: 0 ins; 11 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20438/head:pull/20438 PR: https://git.openjdk.org/jdk/pull/20438 From ayang at openjdk.org Fri Aug 2 06:55:32 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 2 Aug 2024 06:55:32 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics [v4] In-Reply-To: <6WQVJcdTTpmLHN11SLuikvQGtPYOiB82dFd-cShd-Qk=.5a20a8f4-cab9-4fd2-9f00-93c064fe7ceb@github.com> References: <6WQVJcdTTpmLHN11SLuikvQGtPYOiB82dFd-cShd-Qk=.5a20a8f4-cab9-4fd2-9f00-93c064fe7ceb@github.com> Message-ID: On Thu, 1 Aug 2024 21:13:48 GMT, Neethu Prasad wrote: >> **Notes** >> Adding logs to get more visibility into how fast a thread resumes from allocation stall. 
>> >> **Testing** >> * tier 1, tier 2, hotspot_gc tests. >> >> Example log messages >> >> 1. Last thread exiting. Performing GC after exiting critical section. Thread "main" 0 locked. >> >> 2. Thread exiting critical region Thread "main" 0 locked. >> >> 3. Thread stalled by JNI critical section. Resumed after 586ms. Thread "Thread-0". >> >> 4. Thread blocked to enter critical region. Resumed after 1240ms. Thread "SIGINT handler". > > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > Address formating issue and code clean up feedback src/hotspot/share/gc/shared/gcLocker.cpp line 56: > 54: > 55: ~GCLockerTimingDebugLogger() { > 56: const Tickspan elapsed_time = Ticks::now() - _start; Why is this outside the `if` logger-enabled check? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20277#discussion_r1701374847 From tschatzl at openjdk.org Fri Aug 2 07:43:32 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 2 Aug 2024 07:43:32 GMT Subject: RFR: 8337642: Serial: Remove redundant counter update in DefNewGeneration::gc_epilogue In-Reply-To: <7-9_ZwueknC-L3QGr2xeMipto28jnnmtsLMGNKs3ouA=.19c3c6bd-2d1d-43e0-b48e-de6df2da3032@github.com> References: <7-9_ZwueknC-L3QGr2xeMipto28jnnmtsLMGNKs3ouA=.19c3c6bd-2d1d-43e0-b48e-de6df2da3032@github.com> Message-ID: On Thu, 1 Aug 2024 09:50:50 GMT, Albert Mingkun Yang wrote: > > the implied contract seems to be that update_counters is called at appropriate locations > > Maybe we can remove it from the base class. Callers of this method always live in gc-specific location where the concrete policy-counter type is known. Imo the point of this and similar APIs is to avoid the need to think about whether a given collector uses a particular implementation, so that would run counter to the intent of such generic API about handling? I.e. 
that regardless of type of `GCPolicyCounters` that is actually used, one can be sure that everything is fine as long as you call that `update` method, not needing to think about the concrete policy type. Is the single empty call that much of a (performance) issue? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20416#issuecomment-2264771219 From tschatzl at openjdk.org Fri Aug 2 08:01:34 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 2 Aug 2024 08:01:34 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics [v4] In-Reply-To: References: <6WQVJcdTTpmLHN11SLuikvQGtPYOiB82dFd-cShd-Qk=.5a20a8f4-cab9-4fd2-9f00-93c064fe7ceb@github.com> Message-ID: On Fri, 2 Aug 2024 06:52:53 GMT, Albert Mingkun Yang wrote: >> Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: >> >> Address formating issue and code clean up feedback > > src/hotspot/share/gc/shared/gcLocker.cpp line 56: > >> 54: >> 55: ~GCLockerTimingDebugLogger() { >> 56: const Tickspan elapsed_time = Ticks::now() - _start; > > Why is this outside the `if` logger-enabled check? Please move within the `if`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20277#discussion_r1701459168 From tschatzl at openjdk.org Fri Aug 2 08:01:33 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 2 Aug 2024 08:01:33 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics [v4] In-Reply-To: <6WQVJcdTTpmLHN11SLuikvQGtPYOiB82dFd-cShd-Qk=.5a20a8f4-cab9-4fd2-9f00-93c064fe7ceb@github.com> References: <6WQVJcdTTpmLHN11SLuikvQGtPYOiB82dFd-cShd-Qk=.5a20a8f4-cab9-4fd2-9f00-93c064fe7ceb@github.com> Message-ID: On Thu, 1 Aug 2024 21:13:48 GMT, Neethu Prasad wrote: >> **Notes** >> Adding logs to get more visibility into how fast a thread resumes from allocation stall. >> >> **Testing** >> * tier 1, tier 2, hotspot_gc tests. >> >> Example log messages >> >> 1. Last thread exiting. Performing GC after exiting critical section. 
Thread "main" 0 locked. >> >> 2. Thread exiting critical region Thread "main" 0 locked. >> >> 3. Thread stalled by JNI critical section. Resumed after 586ms. Thread "Thread-0". >> >> 4. Thread blocked to enter critical region. Resumed after 1240ms. Thread "SIGINT handler". > > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > Address formating issue and code clean up feedback Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/shared/gcLocker.cpp line 50: > 48: public: > 49: GCLockerTimingDebugLogger(const char* log_message) : > 50: _log_message(log_message) { Indentation of the entire class is one level too deep; the first `private` visibility specifier can be omitted. There are two spaces before `_log_message`. src/hotspot/share/gc/shared/gcLocker.cpp line 53: > 51: assert(_log_message != nullptr, "GC locker debug message must be set."); > 52: _start = Ticks::now(); > 53: } I think this `}` should align with the method name, i.e. the body of this constructor seems to be nested one level too deep. ------------- PR Review: https://git.openjdk.org/jdk/pull/20277#pullrequestreview-2214937023 PR Review Comment: https://git.openjdk.org/jdk/pull/20277#discussion_r1701456382 PR Review Comment: https://git.openjdk.org/jdk/pull/20277#discussion_r1701458868 From tschatzl at openjdk.org Fri Aug 2 08:17:30 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 2 Aug 2024 08:17:30 GMT Subject: RFR: 8337721: G1: Remove unused G1CollectedHeap::young_collection_verify_type In-Reply-To: References: Message-ID: <0SliAGiQTVkw13wR8sq3OVbeOUCpcYyOpiruvzMwfxY=.843fed21-1ae2-4890-a170-36d6e4934719@github.com> On Fri, 2 Aug 2024 06:47:17 GMT, Albert Mingkun Yang wrote: > Trivial removing dead code. Lgtm and trivial. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20438#pullrequestreview-2214974144 From ayang at openjdk.org Fri Aug 2 10:56:38 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 2 Aug 2024 10:56:38 GMT Subject: RFR: 8337721: G1: Remove unused G1CollectedHeap::young_collection_verify_type In-Reply-To: References: Message-ID: <7M8bnjQ8rk7S8SeGPk-gGqKxDfNTIYhA--QnopL4eRI=.6ba40ec8-dd2a-49f8-b890-5c374099f654@github.com> On Fri, 2 Aug 2024 06:47:17 GMT, Albert Mingkun Yang wrote: > Trivial removing dead code. Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20438#issuecomment-2265101741 From ayang at openjdk.org Fri Aug 2 10:56:38 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 2 Aug 2024 10:56:38 GMT Subject: Integrated: 8337721: G1: Remove unused G1CollectedHeap::young_collection_verify_type In-Reply-To: References: Message-ID: On Fri, 2 Aug 2024 06:47:17 GMT, Albert Mingkun Yang wrote: > Trivial removing dead code. This pull request has now been integrated. Changeset: a89b5251 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/a89b525189fbc0559be9edc0de9f4288ca676139 Stats: 11 lines in 2 files changed: 0 ins; 11 del; 0 mod 8337721: G1: Remove unused G1CollectedHeap::young_collection_verify_type Reviewed-by: tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/20438 From kbarrett at openjdk.org Fri Aug 2 19:41:03 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 2 Aug 2024 19:41:03 GMT Subject: RFR: 8337709: Use allocated states for chunking large array processing Message-ID: Please review this change to the G1 young/mixed collector to use allocated states to encode partial array task chunking. States are allocated from per-worker-thread arena+free-list pairs, and released to the free-list for the worker that completed use. They are refcounted to track the number of referring tasks.
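As a rough illustration of the allocation scheme described in this paragraph, here is a minimal sketch, not the code under review. `ChunkState` and `WorkerStatePool` are hypothetical names, a `std::vector` of `unique_ptr` stands in for the arena, and the sketch assumes all pools are created before and destroyed after the collection, so a state recycled onto another worker's free-list never outlives its owning arena.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical names throughout; this is not the code from the pull request.
struct ChunkState {
  std::atomic<std::size_t> refcount{0};  // number of tasks still referring to this state
  std::size_t length = 0;                // total length of the array being processed
};

// One pool per worker thread: an arena that owns the states this worker
// created, plus a free-list of states that this worker may reuse.
// Assumption: every pool outlives every state that is still in use.
class WorkerStatePool {
  std::vector<std::unique_ptr<ChunkState>> _arena;
  std::vector<ChunkState*> _free_list;

public:
  // Reuse a state from the free-list if possible, otherwise grow the arena.
  ChunkState* allocate(std::size_t length, std::size_t initial_refs) {
    ChunkState* state = nullptr;
    if (!_free_list.empty()) {
      state = _free_list.back();
      _free_list.pop_back();
    } else {
      _arena.push_back(std::make_unique<ChunkState>());
      state = _arena.back().get();
    }
    state->length = length;
    state->refcount.store(initial_refs, std::memory_order_relaxed);
    return state;
  }

  // Called on the pool of the worker that finished a task referring to `state`.
  // Returns true if that was the last reference, in which case the state is
  // put on this worker's free-list, whichever worker allocated it.
  bool release(ChunkState* state) {
    if (state->refcount.fetch_sub(1, std::memory_order_acq_rel) == 1) {
      _free_list.push_back(state);
      return true;
    }
    return false;
  }

  std::size_t free_count() const { return _free_list.size(); }
  std::size_t arena_size() const { return _arena.size(); }
};
```

In this sketch a state allocated by one worker and finished by another migrates to the second worker's free-list, which keeps both allocation and release free of cross-thread synchronization apart from the reference count itself.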
Various other approaches (such as a single arena+FreeListAllocator) were tested, but found to have worse performance, though in some cases fewer allocations. The per-worker arena+free-list pair was the only option that didn't show a regression compared to the previous PartialArrayScanTask approach on a stress test. In addition to the changes to ScannerTask to support the new PartialArrayState, it temporarily continues to support PartialArrayScanTask. This is because ParallelGC will continue to use the latter until it is changed to use PartialArrayState. The intent is to update ParallelGC in a followup CR. Testing: mach5 tier1-5 G1 performance suite ------------- Commit messages: - G1 young update - add PartialArrayState - move chunk size inside stepper Changes: https://git.openjdk.org/jdk/pull/20445/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20445&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337709 Stats: 501 lines in 9 files changed: 356 ins; 57 del; 88 mod Patch: https://git.openjdk.org/jdk/pull/20445.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20445/head:pull/20445 PR: https://git.openjdk.org/jdk/pull/20445 From ayang at openjdk.org Mon Aug 5 08:04:30 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 5 Aug 2024 08:04:30 GMT Subject: RFR: 8337642: Serial: Remove redundant counter update in DefNewGeneration::gc_epilogue In-Reply-To: References: <7-9_ZwueknC-L3QGr2xeMipto28jnnmtsLMGNKs3ouA=.19c3c6bd-2d1d-43e0-b48e-de6df2da3032@github.com> Message-ID: On Fri, 2 Aug 2024 07:41:14 GMT, Thomas Schatzl wrote: > so that would run counter to the intent of such generic API about handling One can view `GCPolicyCounters` as a plain data-structure with only getters. The only two non-getter methods are the empty `update_counters` and the unused `kind`. If both are removed, `GCPolicyCounters` doesn't expose any action-related APIs any more. 
> one can be sure that everything is fine as long as you call that update method, That's a false sense of security. The two actual vars, `_tenuring_threshold` and `_desired_survivor_size`, that require updating, are updated in two different places in Serial and G1, after and before young-gc. IOW, different GCs differ enough so that not exposing an `update` API, i.e. treating `GCPolicyCounters` as plain-old-data, offers more flexibility, IMO. > Is the single empty call that much of a (performance) issue? It's more about removing effectively dead code to simplify the logic. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20416#issuecomment-2268424338 From tschatzl at openjdk.org Mon Aug 5 09:18:31 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 5 Aug 2024 09:18:31 GMT Subject: RFR: 8337642: Serial: Remove redundant counter update in DefNewGeneration::gc_epilogue In-Reply-To: References: <7-9_ZwueknC-L3QGr2xeMipto28jnnmtsLMGNKs3ouA=.19c3c6bd-2d1d-43e0-b48e-de6df2da3032@github.com> Message-ID: On Mon, 5 Aug 2024 08:01:27 GMT, Albert Mingkun Yang wrote: > > so that would run counter to the intent of such generic API about handling > > One can view `GCPolicyCounters` as a plain data-structure with only getters. The only two non-getter methods are the empty `update_counters` and the unused `kind`. If both are removed, `GCPolicyCounters` doesn't expose any action-related APIs any more. Then let's do that instead of removing only the call. The `update_counters` API as it is used now does not seem to help at all. The `gc_overhead_limit_exceeded_counter` could also be moved into the Parallel specific class because it's only used there. I object to only removing the call and keeping the bad API; removing them isn't that much more work.
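To make the "plain data structure with only getters" view from this thread concrete, here is a minimal sketch under stated assumptions. `Counter`, `PolicyCounters`, `collector_a_young_gc` and `collector_b_young_gc` are hypothetical simplifications, not the HotSpot classes, and the two collectors are deliberately unnamed: the only point is that each one updates the two values at its own place, with no shared `update` hook on the counters type.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; not the HotSpot classes.
struct Counter {
  long value = 0;
  void set_value(long v) { value = v; }
};

// Exposes only getters; there is no virtual update method to call "just in case".
class PolicyCounters {
  Counter _tenuring_threshold;
  Counter _desired_survivor_size;

public:
  Counter* tenuring_threshold() { return &_tenuring_threshold; }
  Counter* desired_survivor_size() { return &_desired_survivor_size; }
};

// Hypothetical collector A: publishes the values before the young collection.
std::vector<std::string> collector_a_young_gc(PolicyCounters& counters,
                                              long threshold, long survivor_size) {
  std::vector<std::string> steps;
  counters.tenuring_threshold()->set_value(threshold);
  counters.desired_survivor_size()->set_value(survivor_size);
  steps.push_back("update counters");
  steps.push_back("collect");
  return steps;
}

// Hypothetical collector B: publishes the values after the young collection.
std::vector<std::string> collector_b_young_gc(PolicyCounters& counters,
                                              long threshold, long survivor_size) {
  std::vector<std::string> steps;
  steps.push_back("collect");
  counters.tenuring_threshold()->set_value(threshold);
  counters.desired_survivor_size()->set_value(survivor_size);
  steps.push_back("update counters");
  return steps;
}
```

Each collector already knows its concrete counters type at the call site, so nothing is lost by dropping the generic hook.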
------------- PR Comment: https://git.openjdk.org/jdk/pull/20416#issuecomment-2268573514 From ayang at openjdk.org Mon Aug 5 09:41:04 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 5 Aug 2024 09:41:04 GMT Subject: RFR: 8337642: Serial: Remove redundant counter update in DefNewGeneration::gc_epilogue [v2] In-Reply-To: References: Message-ID: > Trivial removing an empty method call. (Only subclasses have non-empty method body, which is not used by Serial.) Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - review - Merge branch 'master' into s1-trivial - s1-trivial ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20416/files - new: https://git.openjdk.org/jdk/pull/20416/files/54f148df..86e46b66 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20416&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20416&range=00-01 Stats: 4398 lines in 184 files changed: 2047 ins; 1389 del; 962 mod Patch: https://git.openjdk.org/jdk/pull/20416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20416/head:pull/20416 PR: https://git.openjdk.org/jdk/pull/20416 From tschatzl at openjdk.org Mon Aug 5 09:46:36 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 5 Aug 2024 09:46:36 GMT Subject: RFR: 8337642: Remove unused APIs of GCPolicyCounters [v2] In-Reply-To: References: Message-ID: On Mon, 5 Aug 2024 09:41:04 GMT, Albert Mingkun Yang wrote: >> Trivial removing an empty method call. (Only subclasses have non-empty method body, which is not used by Serial.) > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - review > - Merge branch 'master' into s1-trivial > - s1-trivial Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20416#pullrequestreview-2218545426 From iwalulya at openjdk.org Mon Aug 5 14:42:45 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 5 Aug 2024 14:42:45 GMT Subject: RFR: 8336086: G1: Use one G1CardSet instance for all young regions [v2] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to assign a single G1CardSet to all young regions. As young regions are collected at the same, and we do not have young-to-young remembered sets, we can maintain a single G1CardSet for all young regions. > > This reduces the memory overhead of the G1CardSets and the time taken to merge per region G1CardSets during GC pause. > > Testing: Tier 1-5 Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: - Albert Review - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet - cleanup - merge - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet - init ------------- Changes: https://git.openjdk.org/jdk/pull/20134/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20134&range=01 Stats: 175 lines in 21 files changed: 142 ins; 10 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/20134.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20134/head:pull/20134 PR: https://git.openjdk.org/jdk/pull/20134 From ayang at openjdk.org Tue Aug 6 07:01:44 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 6 Aug 2024 07:01:44 GMT Subject: Integrated: 8337642: Remove unused APIs of GCPolicyCounters In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 07:43:05 GMT, Albert Mingkun Yang wrote: > Trivial removing an empty method call. (Only subclasses have non-empty method body, which is not used by Serial.) This pull request has now been integrated. Changeset: 0d8ec429 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/0d8ec42969fb60c140aaed7795ea1b9591915b8d Stats: 22 lines in 4 files changed: 0 ins; 22 del; 0 mod 8337642: Remove unused APIs of GCPolicyCounters Reviewed-by: tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/20416 From ayang at openjdk.org Tue Aug 6 07:01:43 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 6 Aug 2024 07:01:43 GMT Subject: RFR: 8337642: Remove unused APIs of GCPolicyCounters [v2] In-Reply-To: References: Message-ID: On Mon, 5 Aug 2024 09:41:04 GMT, Albert Mingkun Yang wrote: >> Trivial removing an empty method call. (Only subclasses have non-empty method body, which is not used by Serial.) > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - review > - Merge branch 'master' into s1-trivial > - s1-trivial Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20416#issuecomment-2270525653 From duke at openjdk.org Tue Aug 6 14:00:02 2024 From: duke at openjdk.org (Joel Sikström) Date: Tue, 6 Aug 2024 14:00:02 GMT Subject: RFR: 8310675: Fix -Wconversion warnings in ZGC code Message-ID: <_eEPqKVsKunCsC5ogqfaHPfgndjqBoJKV1iOsZAYxio=.723085df-731a-4c98-b74a-91575860e1ec@github.com> Fixed `-Wconversion` warnings in ZGC code, either by adding an explicit type cast, changing the type of the variable or calling an equivalent method with other types. The largest change is the addition of `ZStatDurationSample`, which typecasts `Tickspan::value()` to a `uint64_t` and calls `ZStatSample` to make the code more readable. I isolated the `-Wconversion` warnings for ZGC by adding the flag to clangd and displaying the errors in my IDE and going through each file directly associated with ZGC one by one. Tested with tiers 1-3.
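To illustrate the wrapper idea described above, here is a minimal standalone sketch. It is not the HotSpot code: `SpanSketch`, `SamplerSketch`, `record_sample` and `record_duration_sample` are hypothetical stand-ins for `Tickspan`, the sampler, `ZStatSample` and `ZStatDurationSample`, and the non-negativity assert is an assumption added for the sketch. The point is only that the signed-to-unsigned conversion is written once, in one helper, instead of at every call site.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a Tickspan-like type: a signed tick count.
struct SpanSketch {
  int64_t ticks;
  int64_t value() const { return ticks; }
};

// Hypothetical stand-in for a sampler that only stores unsigned values.
struct SamplerSketch {
  std::vector<uint64_t> samples;
};

// Plays the role of ZStatSample in this sketch: accepts an unsigned sample.
inline void record_sample(SamplerSketch& sampler, uint64_t value) {
  sampler.samples.push_back(value);
}

// Plays the role of ZStatDurationSample in this sketch: the single place
// where the signed-to-unsigned conversion is spelled out explicitly.
inline void record_duration_sample(SamplerSketch& sampler, const SpanSketch& duration) {
  // Assumption for this sketch: durations are never negative.
  assert(duration.value() >= 0 && "durations are expected to be non-negative");
  record_sample(sampler, static_cast<uint64_t>(duration.value()));
}
```

With a helper like this, call sites pass the duration object directly and no longer trigger an implicit narrowing or sign conversion that `-Wconversion` would flag.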
------------- Commit messages: - 8310675: Fixed -Wconversion warnings in ZGC Changes: https://git.openjdk.org/jdk/pull/20406/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20406&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310675 Stats: 120 lines in 33 files changed: 5 ins; 0 del; 115 mod Patch: https://git.openjdk.org/jdk/pull/20406.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20406/head:pull/20406 PR: https://git.openjdk.org/jdk/pull/20406 From stefank at openjdk.org Tue Aug 6 14:24:32 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 6 Aug 2024 14:24:32 GMT Subject: RFR: 8310675: Fix -Wconversion warnings in ZGC code In-Reply-To: <_eEPqKVsKunCsC5ogqfaHPfgndjqBoJKV1iOsZAYxio=.723085df-731a-4c98-b74a-91575860e1ec@github.com> References: <_eEPqKVsKunCsC5ogqfaHPfgndjqBoJKV1iOsZAYxio=.723085df-731a-4c98-b74a-91575860e1ec@github.com> Message-ID: <75lHlbAzUEAS7EkEbyGjQrIT2l4qB2_-oQKI6CYNX6k=.59849893-7e20-4594-b93a-5675a6943d97@github.com> On Wed, 31 Jul 2024 13:01:50 GMT, Joel Sikström wrote: > Fixed `-Wconversion` warnings in ZGC code, either by adding an explicit type cast, changing the type of the variable or calling an equivalent method with other types. The largest change is the addition of `ZStatDurationSample`, which typecasts `Tickspan::value()` to a `uint64_t` and calls `ZStatSample` to make the code more readable. > > I isolated the `-Wconversion` warnings for ZGC by adding the flag to clangd and displaying the errors in my IDE and going through each file directly associated with ZGC one by one. > > Tested with tiers 1-3. This change looks good to me. I've looked through the code with Joel and I think that this is a good set of changes to the ZGC code base. When fixing -Wconversion warnings there are always multiple ways to juggle around the types. Some of the added casts could probably be cleaned up by updating non-ZGC code instead (E.g.
TimeHelper), but for Joel's first patch we wanted to limit the changes to the ZGC code base. Note that we're intentionally only fixing the Generational ZGC code and leaving the single-generation code as is. ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20406#pullrequestreview-2221462496 From ayang at openjdk.org Tue Aug 6 14:28:32 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 6 Aug 2024 14:28:32 GMT Subject: RFR: 8310675: Fix -Wconversion warnings in ZGC code In-Reply-To: <_eEPqKVsKunCsC5ogqfaHPfgndjqBoJKV1iOsZAYxio=.723085df-731a-4c98-b74a-91575860e1ec@github.com> References: <_eEPqKVsKunCsC5ogqfaHPfgndjqBoJKV1iOsZAYxio=.723085df-731a-4c98-b74a-91575860e1ec@github.com> Message-ID: On Wed, 31 Jul 2024 13:01:50 GMT, Joel Sikström wrote: > Fixed `-Wconversion` warnings in ZGC code, either by adding an explicit type cast, changing the type of the variable or calling an equivalent method with other types. The largest change is the addition of `ZStatDurationSample`, which typecasts `Tickspan::value()` to a `uint64_t` and calls `ZStatSample` to make the code more readable. > > I isolated the `-Wconversion` warnings for ZGC by adding the flag to clangd and displaying the errors in my IDE and going through each file directly associated with ZGC one by one. > > Tested with tiers 1-3. Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20406#pullrequestreview-2221474896 From nprasad at openjdk.org Tue Aug 6 18:11:05 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Tue, 6 Aug 2024 18:11:05 GMT Subject: RFR: 8336742: Shenandoah: Add more verbose logging/stats for mark termination attempts [v2] In-Reply-To: References: Message-ID: > **Notes** > This PR adds the following > 1. info logging on number of SATB flush attempts > 2. total time spent on handshaking all threads requesting them to flush their SATB buffers.
> > As suggested by William in [JDK-8336742 ](https://bugs.openjdk.org/browse/JDK-83367420), we can use handshake logging to get time spend and other stats for each handshake. > > [4.515s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1035, Total completion time: 597004 ns > [4.517s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1033, Total completion time: 207402 ns > > > **Testing** > 1. tier1, tier2 and hotspot_gc_shenandoah tests. > 2. **-Xlog:gc+stats=info** > > > [4.058s][info][gc,stats ] Concurrent Marking = 0.080 s (a = 5351 us) (n = 15) (lvls, us = 4746, 5000, 5156, 5684, 5988) > [4.058s][info][gc,stats ] SATB Flush Rendezvous = 0.013 s (a = 860 us) (n = 15) (lvls, us = 764, 814, 836, 885, 961) > [4.058s][info][gc,stats ] Pause Final Mark (G) = 0.058 s (a = 3839 us) (n = 15) (lvls, us = 3047, 3320, 3867, 4121, 4930) > [4.058s][info][gc,stats ] Pause Final Mark (N) = 0.054 s (a = 3592 us) (n = 15) (lvls, us = 2812, 3047, 3574, 3887, 4597) > [4.058s][info][gc,stats ] Finish Mark = 0.028 s (a = 1843 us) (n = 15) (lvls, us = 1602, 1641, 1816, 1934, 2045) > [4.058s][info][gc,stats ] Update Region States = 0.006 s (a = 386 us) (n = 15) (lvls, us = 375, 375, 381, 389, 413) > [4.058s][info][gc,stats ] Choose Collection Set = 0.018 s (a = 1186 us) (n = 15) (lvls, us = 609, 619, 1309, 1387, 2109) > [4.058s][info][gc,stats ] Rebuild Free Set = 0.001 s (a = 43 us) (n = 15) (lvls, us = 40, 41, 42, 43, 53) > [4.058s][info][gc,stats ] Concurrent Weak References = 0.007 s (a = 452 us) (n = 15) (lvls, us = 420, 438, 443, 455, 487) > > > on app termination > > > [5.299s][info][gc,stats] GC STATISTICS: > [5.299s][info][gc,stats] "(G)" (gross) pauses include VM time: time to notify and block threads, do the pre- > [5.299s][info][gc,stats] and post-safepoint housekee... 
Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: ShenandoahTimingsTracker to support aggregation of cycle times ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20318/files - new: https://git.openjdk.org/jdk/pull/20318/files/7c3d4a84..6e7fdd5e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20318&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20318&range=00-01 Stats: 31 lines in 5 files changed: 9 ins; 6 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/20318.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20318/head:pull/20318 PR: https://git.openjdk.org/jdk/pull/20318 From shade at openjdk.org Tue Aug 6 18:24:35 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 6 Aug 2024 18:24:35 GMT Subject: RFR: 8336742: Shenandoah: Add more verbose logging/stats for mark termination attempts [v2] In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 18:11:05 GMT, Neethu Prasad wrote: >> **Revision 2 Notes** >> 1. Added time spent on handshaking all threads requesting them to flush their SATB buffers as part of GC stats. >> 2. As mentioned in PR feedback, will raise separate PR to adding logging in ShenandoahTimingsTracker. >> >> **Revision 1 Notes** >> This PR adds the following >> 1. info logging on number of SATB flush attempts >> 3. total time spend on handshaking all threads requesting them to flush their SATB buffers. >> >> As suggested by William in [JDK-8336742 ](https://bugs.openjdk.org/browse/JDK-83367420), we can use handshake logging to get time spend and other stats for each handshake. 
>> >> [4.515s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1035, Total completion time: 597004 ns >> [4.517s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1033, Total completion time: 207402 ns >> >> >> **Testing** >> 1. tier1, tier2 and hotspot_gc_shenandoah tests. >> 2. **-Xlog:gc+stats=info** >> >> >> [37.087s][info][gc,stats] CMR: VM Strong Roots 413 us, workers (us): 64, 57, 52, 47, 38, 31, 30, 25, 20, 21, 17, 10, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [37.087s][info][gc,stats] CMR: CLDG Roots 449 us, workers (us): 4, ---, ---, 406, ---, 15, ---, 4, 4, ---, ---, 17, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [37.087s][info][gc,stats] Concurrent Marking 5002 us >> [37.087s][info][gc,stats] SATB Flush Rendezvous 1748 us >> [37.087s][info][gc,stats] Pause Final Mark (G) 57272 us >> [37.087s][info][gc,stats] Pause Final Mark (N) 56985 us >> [37.087s][info][gc,stats] Finish Mark 387 us >> [37.087s][info][gc,stats] Update Region States 109 us >> [37.087s][info][gc,stats] Choose Collection Set 56395 us >> [37.087s][info][gc,stats] Rebuild Free Set 40 us >> >> >> on app termination >> >> >> [40.640s][info][gc,stats] Concurrent Reset = 0.914 s (a = 65255 us) (n = 14) (lvls, us = 54883, 55859, 63867, 65234, 97096) >> [40.640s][info][gc,stats] Pause Init Mark (G) = 1.755 s (a = 125380 us) (n = 14) (lvls, us = 119141, 123047, 125000, 125000, 128042) >> [40.640s][info][gc,stats] Pause Init Mark (N) = 1.697 s (a = 121241 us... 
> > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > ShenandoahTimingsTracker to support aggregation of cycle times Looks okay, only stylistic comments: src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.cpp line 142: > 140: void ShenandoahPhaseTimings::set_cycle_data(Phase phase, double time, bool should_aggregate_cycles) { > 141: if (should_aggregate_cycles) { > 142: _cycle_data[phase] = _cycle_data[phase] <= 0 ? time : _cycle_data[phase] + time; I *think* `<= 0` is too broad, and assumes things about the value of `uninitialized()`. Check for `uninitialized()` explicitly. src/hotspot/share/gc/shenandoah/shenandoahUtils.cpp line 127: > 125: const double end_time = os::elapsedTime(); > 126: const double phase_elapsed_time = end_time - _start; > 127: _timings->record_phase_time(_phase, phase_elapsed_time, _should_aggregate_cycles); No need to introduce local variables here, right? The expression can stay inlined. src/hotspot/share/gc/shenandoah/shenandoahUtils.hpp line 69: > 67: ShenandoahPhaseTimings::Phase _parent_phase; > 68: double _start; > 69: bool _should_aggregate_cycles; How about simplifying it to `_should_aggregate`? 
src/hotspot/share/gc/shenandoah/shenandoahUtils.hpp line 72: > 70: > 71: public: > 72: ShenandoahTimingsTracker(ShenandoahPhaseTimings::Phase phase, bool should_aggregate_cycles=false); Here and everywhere else, need whitespaces: `bool should_aggregate_cycles = false` ------------- PR Review: https://git.openjdk.org/jdk/pull/20318#pullrequestreview-2221968957 PR Review Comment: https://git.openjdk.org/jdk/pull/20318#discussion_r1705945023 PR Review Comment: https://git.openjdk.org/jdk/pull/20318#discussion_r1705943889 PR Review Comment: https://git.openjdk.org/jdk/pull/20318#discussion_r1705943175 PR Review Comment: https://git.openjdk.org/jdk/pull/20318#discussion_r1705943455 From nprasad at openjdk.org Tue Aug 6 19:23:46 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Tue, 6 Aug 2024 19:23:46 GMT Subject: RFR: 8336742: Shenandoah: Add more verbose logging/stats for mark termination attempts [v3] In-Reply-To: References: Message-ID: > **Revision 2 Notes** > 1. Added time spent on handshaking all threads requesting them to flush their SATB buffers as part of GC stats. > 2. As mentioned in PR feedback, will raise separate PR to adding logging in ShenandoahTimingsTracker. > > **Revision 1 Notes** > This PR adds the following > 1. info logging on number of SATB flush attempts > 3. total time spend on handshaking all threads requesting them to flush their SATB buffers. > > As suggested by William in [JDK-8336742 ](https://bugs.openjdk.org/browse/JDK-83367420), we can use handshake logging to get time spend and other stats for each handshake. > > [4.515s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1035, Total completion time: 597004 ns > [4.517s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1033, Total completion time: 207402 ns > > > **Testing** > 1. tier1, tier2 and hotspot_gc_shenandoah tests. > 2. 
**-Xlog:gc+stats=info** > > > [37.087s][info][gc,stats] CMR: VM Strong Roots 413 us, workers (us): 64, 57, 52, 47, 38, 31, 30, 25, 20, 21, 17, 10, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, > [37.087s][info][gc,stats] CMR: CLDG Roots 449 us, workers (us): 4, ---, ---, 406, ---, 15, ---, 4, 4, ---, ---, 17, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, > [37.087s][info][gc,stats] Concurrent Marking 5002 us > [37.087s][info][gc,stats] SATB Flush Rendezvous 1748 us > [37.087s][info][gc,stats] Pause Final Mark (G) 57272 us > [37.087s][info][gc,stats] Pause Final Mark (N) 56985 us > [37.087s][info][gc,stats] Finish Mark 387 us > [37.087s][info][gc,stats] Update Region States 109 us > [37.087s][info][gc,stats] Choose Collection Set 56395 us > [37.087s][info][gc,stats] Rebuild Free Set 40 us > > > on app termination > > > [40.640s][info][gc,stats] Concurrent Reset = 0.914 s (a = 65255 us) (n = 14) (lvls, us = 54883, 55859, 63867, 65234, 97096) > [40.640s][info][gc,stats] Pause Init Mark (G) = 1.755 s (a = 125380 us) (n = 14) (lvls, us = 119141, 123047, 125000, 125000, 128042) > [40.640s][info][gc,stats] Pause Init Mark (N) = 1.697 s (a = 121241 us) (n = 14) (lvls, us = 117188, 119141, 121094, 121094, 123880) > ... 
Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: Address feedback on code style and uninitialized check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20318/files - new: https://git.openjdk.org/jdk/pull/20318/files/6e7fdd5e..a7c0514a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20318&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20318&range=01-02 Stats: 17 lines in 4 files changed: 1 ins; 3 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/20318.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20318/head:pull/20318 PR: https://git.openjdk.org/jdk/pull/20318 From kbarrett at openjdk.org Wed Aug 7 04:45:38 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 7 Aug 2024 04:45:38 GMT Subject: RFR: 8335925: Serial: Move allocation API from Generation to subclasses [v3] In-Reply-To: References: Message-ID: <7KXk5bzbb7ONFhufUey4cDGNuqMckpBbQTdclprrr1A=.fdff505e-e920-4b0c-a95b-03bf034a8ef2@github.com> On Fri, 26 Jul 2024 10:18:08 GMT, Albert Mingkun Yang wrote: >> Trivial moving methods from parent class to subclasses. The unused second arg is also removed along the way. The API names are descriptive enough so that the accompanying comments are dropped as well. > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - review > - Merge branch 'master' into s1-gen-alloc > - review > - Merge branch 'master' into s1-gen-alloc > - s1-gen-alloc Looks good. Probably the name "Generation" ought to be changed at some point, as part of the described future development. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20084#pullrequestreview-2222616549 From ayang at openjdk.org Wed Aug 7 07:50:38 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 7 Aug 2024 07:50:38 GMT Subject: RFR: 8335925: Serial: Move allocation API from Generation to subclasses [v3] In-Reply-To: References: Message-ID: On Fri, 26 Jul 2024 10:18:08 GMT, Albert Mingkun Yang wrote: >> Trivial moving methods from parent class to subclasses. The unused second arg is also removed along the way. The API names are descriptive enough so that the accompanying comments are dropped as well. > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - review > - Merge branch 'master' into s1-gen-alloc > - review > - Merge branch 'master' into s1-gen-alloc > - s1-gen-alloc Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20084#issuecomment-2272835593 From ayang at openjdk.org Wed Aug 7 07:50:38 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 7 Aug 2024 07:50:38 GMT Subject: Integrated: 8335925: Serial: Move allocation API from Generation to subclasses In-Reply-To: References: Message-ID: On Mon, 8 Jul 2024 20:00:31 GMT, Albert Mingkun Yang wrote: > Trivial moving methods from parent class to subclasses. The unused second arg is also removed along the way. The API names are descriptive enough so that the accompanying comments are dropped as well. This pull request has now been integrated. 
Changeset: 41f784fe Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/41f784fe63f8e06a25e1fe00dc96e398874adf81 Stats: 57 lines in 7 files changed: 3 ins; 35 del; 19 mod 8335925: Serial: Move allocation API from Generation to subclasses Reviewed-by: gli, kbarrett, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/20084 From iwalulya at openjdk.org Wed Aug 7 08:21:34 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 7 Aug 2024 08:21:34 GMT Subject: RFR: 8337709: Use allocated states for chunking large array processing In-Reply-To: References: Message-ID: On Fri, 2 Aug 2024 19:36:47 GMT, Kim Barrett wrote: > Please review this change to the G1 young/mixed collector to use allocated > states to encode partial array task chunking. > > States are allocated from per-worker-thread arena+free-list pairs, and > released to the free-list for the worker that completed use. They are > refcounted to track the number of refering tasks. > > Various other approaches (such as a single arena+FreeListAllocator) were > tested, but found to have worse performance, though in some cases fewer > allocations. The per-worker arena+free-list pair was the only option that > didn't show a regression compared to the previous PartialArrayScanTask > approach on a stress test. > > In addition to the changes to ScannerTask to support the new > PartialArrayState, it temporarily continues to support PartialArrayScanTask. > This is because ParallelGC will continue to use the latter until it is changed > to use PartialArrayState. The intent is to update ParallelGC in a followup CR. > > Testing: > mach5 tier1-5 > G1 performance suite LGTM! Was there any observable impact on G1 performance suite? ------------- Marked as reviewed by iwalulya (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20445#pullrequestreview-2223259036 From shade at openjdk.org Wed Aug 7 08:27:35 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 7 Aug 2024 08:27:35 GMT Subject: RFR: 8336742: Shenandoah: Add more verbose logging/stats for mark termination attempts [v3] In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 19:23:46 GMT, Neethu Prasad wrote: >> **Revision 2 Notes** >> 1. Added time spent on handshaking all threads requesting them to flush their SATB buffers as part of GC stats. >> 2. As mentioned in PR feedback, will raise separate PR to adding logging in ShenandoahTimingsTracker. >> >> **Revision 1 Notes** >> This PR adds the following >> 1. info logging on number of SATB flush attempts >> 3. total time spend on handshaking all threads requesting them to flush their SATB buffers. >> >> As suggested by William in [JDK-8336742 ](https://bugs.openjdk.org/browse/JDK-83367420), we can use handshake logging to get time spend and other stats for each handshake. >> >> [4.515s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1035, Total completion time: 597004 ns >> [4.517s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1033, Total completion time: 207402 ns >> >> >> **Testing** >> 1. tier1, tier2 and hotspot_gc_shenandoah tests. >> 2. 
**-Xlog:gc+stats=info** >> >> >> [37.087s][info][gc,stats] CMR: VM Strong Roots 413 us, workers (us): 64, 57, 52, 47, 38, 31, 30, 25, 20, 21, 17, 10, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [37.087s][info][gc,stats] CMR: CLDG Roots 449 us, workers (us): 4, ---, ---, 406, ---, 15, ---, 4, 4, ---, ---, 17, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [37.087s][info][gc,stats] Concurrent Marking 5002 us >> [37.087s][info][gc,stats] SATB Flush Rendezvous 1748 us >> [37.087s][info][gc,stats] Pause Final Mark (G) 57272 us >> [37.087s][info][gc,stats] Pause Final Mark (N) 56985 us >> [37.087s][info][gc,stats] Finish Mark 387 us >> [37.087s][info][gc,stats] Update Region States 109 us >> [37.087s][info][gc,stats] Choose Collection Set 56395 us >> [37.087s][info][gc,stats] Rebuild Free Set 40 us >> >> >> on app termination >> >> >> [40.640s][info][gc,stats] Concurrent Reset = 0.914 s (a = 65255 us) (n = 14) (lvls, us = 54883, 55859, 63867, 65234, 97096) >> [40.640s][info][gc,stats] Pause Init Mark (G) = 1.755 s (a = 125380 us) (n = 14) (lvls, us = 119141, 123047, 125000, 125000, 128042) >> [40.640s][info][gc,stats] Pause Init Mark (N) = 1.697 s (a = 121241 us... > > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > Address feedback on code style and uninitialized check Looks fine. Consider the remaining nits: src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.cpp line 143: > 141: const double cycle_data = _cycle_data[phase]; > 142: if (should_aggregate) { > 143: _cycle_data[phase] = (cycle_data == uninitialized()) ? time : cycle_data + time; Suggestion: _cycle_data[phase] = (cycle_data == uninitialized()) ? time : (cycle_data + time); src/hotspot/share/gc/shenandoah/shenandoahUtils.hpp line 69: > 67: ShenandoahPhaseTimings::Phase _parent_phase; > 68: double _start; > 69: bool _should_aggregate; Should probably be `const bool`? 
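The aggregation logic under review can be sketched in isolation as follows. This is a simplified illustration, not the Shenandoah source: `PhaseTimingsSketch`, the fixed phase count, the sentinel value `-1.0`, and the non-aggregating `else` branch are all assumptions made for the sketch. What it shows is the reviewer's point that the "no data yet" state is tested against the sentinel explicitly, rather than inferred from `<= 0`.

```cpp
#include <array>
#include <cstddef>

// Hypothetical, simplified model of per-phase cycle timing storage.
class PhaseTimingsSketch {
public:
  static constexpr std::size_t kPhases = 4;

  // Assumed sentinel; the real value returned by uninitialized() is not
  // shown in the thread.
  static constexpr double uninitialized() { return -1.0; }

  PhaseTimingsSketch() { _cycle_data.fill(uninitialized()); }

  void set_cycle_data(std::size_t phase, double time, bool should_aggregate) {
    const double cycle_data = _cycle_data[phase];
    if (should_aggregate) {
      // First sample replaces the sentinel; later samples accumulate.
      _cycle_data[phase] = (cycle_data == uninitialized()) ? time : (cycle_data + time);
    } else {
      // Assumed behavior for this sketch: plain overwrite.
      _cycle_data[phase] = time;
    }
  }

  double cycle_data(std::size_t phase) const { return _cycle_data[phase]; }

private:
  std::array<double, kPhases> _cycle_data;
};
```

Comparing against the sentinel keeps the code correct even if the sentinel's value changes later, which a hard-coded `<= 0` test would not guarantee.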
------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20318#pullrequestreview-2223286637 PR Review Comment: https://git.openjdk.org/jdk/pull/20318#discussion_r1706594323 PR Review Comment: https://git.openjdk.org/jdk/pull/20318#discussion_r1706595588 From shade at openjdk.org Wed Aug 7 11:57:02 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 7 Aug 2024 11:57:02 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions Message-ID: The expected behavior of `CollectedHeap::is_in` is to check whether the object belongs to the committed parts of the heap. This is useful to check if object resides in the parts of the heap the GC knows are not dead. Yet, Shenandoah's check just verifies that oop is within the heap bounds. So `is_in` check for an object that is in trashed/empty region would pass by accident, and we will miss detecting bugs. This should be rectified. I also re-wired assertions/verification code to be clear whether we check for heap bounds or actual in-heap conditions. Additional testing: - [ ] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/20492/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20492&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337981 Stats: 74 lines in 9 files changed: 35 ins; 0 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/20492.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20492/head:pull/20492 PR: https://git.openjdk.org/jdk/pull/20492 From nprasad at openjdk.org Wed Aug 7 13:19:12 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Wed, 7 Aug 2024 13:19:12 GMT Subject: RFR: 8336742: Shenandoah: Add more verbose logging/stats for mark termination attempts [v4] In-Reply-To: References: Message-ID: > **Revision 2 Notes** > 1. 
Added time spent on handshaking all threads requesting them to flush their SATB buffers as part of GC stats. > 2. As mentioned in PR feedback, will raise separate PR to adding logging in ShenandoahTimingsTracker. > > **Revision 1 Notes** > This PR adds the following > 1. info logging on number of SATB flush attempts > 3. total time spend on handshaking all threads requesting them to flush their SATB buffers. > > As suggested by William in [JDK-8336742 ](https://bugs.openjdk.org/browse/JDK-83367420), we can use handshake logging to get time spend and other stats for each handshake. > > [4.515s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1035, Total completion time: 597004 ns > [4.517s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1033, Total completion time: 207402 ns > > > **Testing** > 1. tier1, tier2 and hotspot_gc_shenandoah tests. > 2. **-Xlog:gc+stats=info** > > > [37.087s][info][gc,stats] CMR: VM Strong Roots 413 us, workers (us): 64, 57, 52, 47, 38, 31, 30, 25, 20, 21, 17, 10, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, > [37.087s][info][gc,stats] CMR: CLDG Roots 449 us, workers (us): 4, ---, ---, 406, ---, 15, ---, 4, 4, ---, ---, 17, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, > [37.087s][info][gc,stats] Concurrent Marking 5002 us > [37.087s][info][gc,stats] SATB Flush Rendezvous 1748 us > [37.087s][info][gc,stats] Pause Final Mark (G) 57272 us > [37.087s][info][gc,stats] Pause Final Mark (N) 56985 us > [37.087s][info][gc,stats] Finish Mark 387 us > [37.087s][info][gc,stats] Update Region States 109 us > [37.087s][info][gc,stats] Choose Collection Set 56395 us > [37.087s][info][gc,stats] Rebuild Free Set 40 us > > > on app termination > > > [40.640s][info][gc,stats] Concurrent Reset = 0.914 s (a = 65255 us) (n = 14) (lvls, us = 54883, 55859, 63867, 65234, 97096) > 
[40.640s][info][gc,stats] Pause Init Mark (G) = 1.755 s (a = 125380 us) (n = 14) (lvls, us = 119141, 123047, 125000, 125000, 128042) > [40.640s][info][gc,stats] Pause Init Mark (N) = 1.697 s (a = 121241 us) (n = 14) (lvls, us = 117188, 119141, 121094, 121094, 123880) > ... Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: Address feedback on code style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20318/files - new: https://git.openjdk.org/jdk/pull/20318/files/a7c0514a..9649c2ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20318&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20318&range=02-03 Stats: 5 lines in 3 files changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20318.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20318/head:pull/20318 PR: https://git.openjdk.org/jdk/pull/20318 From duke at openjdk.org Wed Aug 7 13:42:06 2024 From: duke at openjdk.org (Joel Sikström) Date: Wed, 7 Aug 2024 13:42:06 GMT Subject: RFR: 8310675: Fix -Wconversion warnings in ZGC code [v2] In-Reply-To: <_eEPqKVsKunCsC5ogqfaHPfgndjqBoJKV1iOsZAYxio=.723085df-731a-4c98-b74a-91575860e1ec@github.com> References: <_eEPqKVsKunCsC5ogqfaHPfgndjqBoJKV1iOsZAYxio=.723085df-731a-4c98-b74a-91575860e1ec@github.com> Message-ID: > Fixed `-Wconversion` warnings in ZGC code, either by adding an explicit type cast, changing the type of the variable or calling an equivalent method with other types. The largest change is the addition of `ZStatDurationSample`, which typecasts `Tickspan::value()` to a `uint64_t` and calls `ZStatSample` to make the code more readable. > > I isolated the `-Wconversion` warnings for ZGC by adding the flag to clangd and displaying the errors in my IDE and going through each file directly associated with ZGC one by one. > > Tested with tiers 1-3.
Joel Sikström has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8310675 - 8310675: Fixed -Wconversion warnings in ZGC ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20406/files - new: https://git.openjdk.org/jdk/pull/20406/files/5c3206cd..2b28a82f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20406&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20406&range=00-01 Stats: 13259 lines in 509 files changed: 6983 ins; 4162 del; 2114 mod Patch: https://git.openjdk.org/jdk/pull/20406.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20406/head:pull/20406 PR: https://git.openjdk.org/jdk/pull/20406 From stefank at openjdk.org Wed Aug 7 13:42:06 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 7 Aug 2024 13:42:06 GMT Subject: RFR: 8310675: Fix -Wconversion warnings in ZGC code [v2] In-Reply-To: References: <_eEPqKVsKunCsC5ogqfaHPfgndjqBoJKV1iOsZAYxio=.723085df-731a-4c98-b74a-91575860e1ec@github.com> Message-ID: On Wed, 7 Aug 2024 13:39:15 GMT, Joel Sikström wrote: >> Fixed `-Wconversion` warnings in ZGC code, either by adding an explicit type cast, changing the type of the variable or calling an equivalent method with other types. The largest change is the addition of `ZStatDurationSample`, which typecasts `Tickspan::value()` to a `uint64_t` and calls `ZStatSample` to make the code more readable. >> >> I isolated the `-Wconversion` warnings for ZGC by adding the flag to clangd and displaying the errors in my IDE and going through each file directly associated with ZGC one by one. >> >> Tested with tiers 1-3. > > Joel Sikström has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8310675 > - 8310675: Fixed -Wconversion warnings in ZGC Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20406#pullrequestreview-2225135283 From duke at openjdk.org Wed Aug 7 13:59:36 2024 From: duke at openjdk.org (Joel Sikström) Date: Wed, 7 Aug 2024 13:59:36 GMT Subject: RFR: 8310675: Fix -Wconversion warnings in ZGC code [v2] In-Reply-To: References: <_eEPqKVsKunCsC5ogqfaHPfgndjqBoJKV1iOsZAYxio=.723085df-731a-4c98-b74a-91575860e1ec@github.com> Message-ID: On Tue, 6 Aug 2024 14:26:17 GMT, Albert Mingkun Yang wrote: >> Joel Sikström has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8310675 >> - 8310675: Fixed -Wconversion warnings in ZGC > > Marked as reviewed by ayang (Reviewer). Thank you for reviews! @albertnetymk @stefank ------------- PR Comment: https://git.openjdk.org/jdk/pull/20406#issuecomment-2273541083 From duke at openjdk.org Wed Aug 7 13:59:37 2024 From: duke at openjdk.org (duke) Date: Wed, 7 Aug 2024 13:59:37 GMT Subject: RFR: 8310675: Fix -Wconversion warnings in ZGC code [v2] In-Reply-To: References: <_eEPqKVsKunCsC5ogqfaHPfgndjqBoJKV1iOsZAYxio=.723085df-731a-4c98-b74a-91575860e1ec@github.com> Message-ID: On Wed, 7 Aug 2024 13:42:06 GMT, Joel Sikström wrote: >> Fixed `-Wconversion` warnings in ZGC code, either by adding an explicit type cast, changing the type of the variable or calling an equivalent method with other types. The largest change is the addition of `ZStatDurationSample`, which typecasts `Tickspan::value()` to a `uint64_t` and calls `ZStatSample` to make the code more readable.
>> >> I isolated the `-Wconversion` warnings for ZGC by adding the flag to clangd and displaying the errors in my IDE and going through each file directly associated with ZGC one by one. >> >> Tested with tiers 1-3. > > Joel Sikström has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8310675 > - 8310675: Fixed -Wconversion warnings in ZGC @jsikstro Your change (at version 2b28a82f20ce24d33de4fbe90455aa2ed05249e0) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20406#issuecomment-2273543069 From duke at openjdk.org Wed Aug 7 14:05:58 2024 From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 7 Aug 2024 14:05:58 GMT Subject: RFR: 8337939: ZGC: Make assertions and checks less convoluted and explicit Message-ID: There are currently cases where calls to type converters are made only to assert whether the conversion is reasonable or not and then discarding the result. For example, to_zaddress(...) is used to check if the pointer passed to it is a valid zaddress or not, whilst discarding the result of the conversion. Additionally, a call like oopDesc::is_oop(to_oop(o)) is convoluted since a similar check to is_oop() is already done inside to_oop(), which should be a separate operation in its entirety. Asserts/checks in affected places should be separated so that assertion/checking can be explicitly made and not done more than necessary. Tested with tiers 1-7 on linux64 and linux64-debug.
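To illustrate the separation described in the JDK-8337939 request above, here is a minimal sketch, not taken from the patch. It assumes a toy address type; the names `toy_zaddress`, `is_valid_toy_zaddress`, `to_toy_zaddress` and `verify_only`, as well as the 8-byte alignment rule, are hypothetical stand-ins and not HotSpot's `zaddress` API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a typed GC address; NOT HotSpot's zaddress.
enum class toy_zaddress : std::uintptr_t { null = 0 };

// Assumption for illustration only: a valid address is 8-byte aligned.
constexpr std::uintptr_t toy_alignment_mask = 0x7;

// Pure check: answers "is this a valid address?" without converting anything.
inline bool is_valid_toy_zaddress(std::uintptr_t value) {
  return (value & toy_alignment_mask) == 0;
}

// Conversion: asserts validity once, then returns the typed address.
inline toy_zaddress to_toy_zaddress(std::uintptr_t value) {
  assert(is_valid_toy_zaddress(value) && "not a valid address");
  return static_cast<toy_zaddress>(value);
}

// A caller that only wants to verify uses the check directly, instead of
// calling the converter and throwing its result away.
inline bool verify_only(std::uintptr_t value) {
  return is_valid_toy_zaddress(value);
}
```

With this split, verification code states its intent through the check function, and the converter is called only where the converted value is actually used.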
------------- Commit messages: - Update zPage.inline.hpp - 8337939: ZGC: Make assertions and checks less convoluted and explicit Changes: https://git.openjdk.org/jdk/pull/20478/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20478&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337939 Stats: 57 lines in 10 files changed: 32 ins; 8 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/20478.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20478/head:pull/20478 PR: https://git.openjdk.org/jdk/pull/20478 From stefank at openjdk.org Wed Aug 7 14:17:32 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 7 Aug 2024 14:17:32 GMT Subject: RFR: 8337939: ZGC: Make assertions and checks less convoluted and explicit In-Reply-To: References: Message-ID: <_661fPm-naPJKMvyhdmi3r2SktldtzcS7ooXTRkhDwg=.de998533-26c7-4af2-ac6b-363632ad3378@github.com> On Tue, 6 Aug 2024 15:15:57 GMT, Joel Sikström wrote: > There are currently cases where calls to type converters are made only to assert whether the conversion is reasonable or not and then discarding the result. For example, to_zaddress(...) is used to check if the pointer passed to it is a valid zaddress or not, whilst discarding the result of the conversion. > > Additionally, a call like oopDesc::is_oop(to_oop(o)) is convoluted since a similar check to is_oop() is already done inside to_oop(), which should be a separate operation in its entirety. > > Asserts/checks in affected places should be separated so that assertion/checking can be explicitly made and not done more than necessary. > > Tested with tiers 1-7 on linux64 and linux64-debug. Changes requested by stefank (Reviewer). src/hotspot/share/gc/z/zVerify.cpp line 122: > 120: const oop obj = cast_to_oop(o); > 121: guarantee(oopDesc::is_oop(obj), BAD_OOP_ARG(o, p)); > 122: } I pre-reviewed this part, but I realize now that I'd like to update the parameter name for the zaddress. Would you mind updating the code to this?
Suggestion: static void z_verify_root_oop_object(zaddress addr, void* p) { const oop obj = cast_to_oop(addr); guarantee(oopDesc::is_oop(obj), BAD_OOP_ARG(addr, p)); } ------------- PR Review: https://git.openjdk.org/jdk/pull/20478#pullrequestreview-2225333707 PR Review Comment: https://git.openjdk.org/jdk/pull/20478#discussion_r1707089369 From duke at openjdk.org Wed Aug 7 14:18:36 2024 From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 7 Aug 2024 14:18:36 GMT Subject: Integrated: 8310675: Fix -Wconversion warnings in ZGC code In-Reply-To: <_eEPqKVsKunCsC5ogqfaHPfgndjqBoJKV1iOsZAYxio=.723085df-731a-4c98-b74a-91575860e1ec@github.com> References: <_eEPqKVsKunCsC5ogqfaHPfgndjqBoJKV1iOsZAYxio=.723085df-731a-4c98-b74a-91575860e1ec@github.com> Message-ID: On Wed, 31 Jul 2024 13:01:50 GMT, Joel Sikström wrote: > Fixed `-Wconversion` warnings in ZGC code, either by adding an explicit type cast, changing the type of the variable or calling an equivalent method with other types. The largest change is the addition of `ZStatDurationSample`, which typecasts `Tickspan::value()` to a `uint64_t` and calls `ZStatSample` to make the code more readable. > > I isolated the `-Wconversion` warnings for ZGC by adding the flag to clangd and displaying the errors in my IDE and going through each file directly associated with ZGC one by one. > > Tested with tiers 1-3. This pull request has now been integrated.
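The wrapper idea behind `ZStatDurationSample`, as described in the JDK-8310675 summary above, can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the changeset: `ToyTickspan`, `ToySampler` and `sample_duration` are hypothetical names, and the real `Tickspan` and `ZStatSample` types differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a signed tick span; NOT HotSpot's Tickspan.
struct ToyTickspan {
  std::int64_t ticks;
  std::int64_t value() const { return ticks; }
};

// Hypothetical sink that only accepts unsigned samples.
struct ToySampler {
  std::uint64_t total = 0;
  void sample(std::uint64_t v) { total += v; }
};

// Wrapper: the signed-to-unsigned conversion is written once, explicitly,
// so call sites do not each need their own cast.
inline void sample_duration(ToySampler& sampler, const ToyTickspan& span) {
  assert(span.value() >= 0 && "durations are expected to be non-negative");
  sampler.sample(static_cast<std::uint64_t>(span.value()));
}
```

Keeping the cast in one helper means the intent (a duration is never negative) is documented in a single place, and callers pass the span without any conversion noise.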
Changeset: 21f710e7 Author: Joel Sikström Committer: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/21f710e7f6698b12b06cc3685cefa31f5fcff2a2 Stats: 120 lines in 33 files changed: 5 ins; 0 del; 115 mod 8310675: Fix -Wconversion warnings in ZGC code Reviewed-by: stefank, ayang ------------- PR: https://git.openjdk.org/jdk/pull/20406 From shade at openjdk.org Wed Aug 7 14:28:32 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 7 Aug 2024 14:28:32 GMT Subject: RFR: 8336742: Shenandoah: Add more verbose logging/stats for mark termination attempts [v4] In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 13:19:12 GMT, Neethu Prasad wrote: >> **Revision 2 Notes** >> 1. Added time spent on handshaking all threads requesting them to flush their SATB buffers as part of GC stats. >> 2. As mentioned in PR feedback, will raise a separate PR to add logging in ShenandoahTimingsTracker. >> >> **Revision 1 Notes** >> This PR adds the following >> 1. info logging on number of SATB flush attempts >> 2. total time spent on handshaking all threads requesting them to flush their SATB buffers. >> >> As suggested by William in [JDK-8336742 ](https://bugs.openjdk.org/browse/JDK-83367420), we can use handshake logging to get time spent and other stats for each handshake. >> >> [4.515s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1035, Total completion time: 597004 ns >> [4.517s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1033, Total completion time: 207402 ns >> >> >> **Testing** >> 1. tier1, tier2 and hotspot_gc_shenandoah tests. >> 2.
**-Xlog:gc+stats=info** >> >> >> [37.087s][info][gc,stats] CMR: VM Strong Roots 413 us, workers (us): 64, 57, 52, 47, 38, 31, 30, 25, 20, 21, 17, 10, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [37.087s][info][gc,stats] CMR: CLDG Roots 449 us, workers (us): 4, ---, ---, 406, ---, 15, ---, 4, 4, ---, ---, 17, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [37.087s][info][gc,stats] Concurrent Marking 5002 us >> [37.087s][info][gc,stats] SATB Flush Rendezvous 1748 us >> [37.087s][info][gc,stats] Pause Final Mark (G) 57272 us >> [37.087s][info][gc,stats] Pause Final Mark (N) 56985 us >> [37.087s][info][gc,stats] Finish Mark 387 us >> [37.087s][info][gc,stats] Update Region States 109 us >> [37.087s][info][gc,stats] Choose Collection Set 56395 us >> [37.087s][info][gc,stats] Rebuild Free Set 40 us >> >> >> on app termination >> >> >> [40.640s][info][gc,stats] Concurrent Reset = 0.914 s (a = 65255 us) (n = 14) (lvls, us = 54883, 55859, 63867, 65234, 97096) >> [40.640s][info][gc,stats] Pause Init Mark (G) = 1.755 s (a = 125380 us) (n = 14) (lvls, us = 119141, 123047, 125000, 125000, 128042) >> [40.640s][info][gc,stats] Pause Init Mark (N) = 1.697 s (a = 121241 us... > > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > Address feedback on code style Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20318#pullrequestreview-2225370812 From shade at openjdk.org Wed Aug 7 14:57:35 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 7 Aug 2024 14:57:35 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 11:51:25 GMT, Aleksey Shipilev wrote: > The expected behavior of `CollectedHeap::is_in` is to check whether the object belongs to the committed parts of the heap. 
This is useful to check if object resides in the parts of the heap the GC knows are not dead. Yet, Shenandoah's check just verifies that oop is within the heap bounds. So `is_in` check for an object that is in trashed/empty region would pass by accident, and we will miss detecting bugs. This should be rectified. > > I also re-wired assertions/verification code to be clear whether we check for heap bounds or actual in-heap conditions. > > Additional testing: > - [ ] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC` Test failures, there are verifier paths that touch dead Reference.referent, apparently. Figuring it out. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20492#issuecomment-2273672661 From shade at openjdk.org Wed Aug 7 17:07:35 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 7 Aug 2024 17:07:35 GMT Subject: RFR: 8335865: Shenandoah: Improve THP pretouch after JDK-8315923 In-Reply-To: References: Message-ID: <1HLgUoF7ByaXQkgUB3UYK35VzxayzTXZl562fDBWKZ8=.3641cb43-c0f4-44c1-bbce-af168d02ead2@github.com> On Fri, 19 Jul 2024 14:28:24 GMT, Neethu Prasad wrote: > **Notes** > os::pretouch now uses madvise when available and falls back to using the VM page size [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923) > Hence removing code that sets _pretouch_heap_page_size & _pretouch_bitmap_page_size in Shenandoah. > > **Testing** > > * Ran test in Linux 5.10 and Linux 6.x and confirmed that there is no regression. I could not replicate the issue or performance improvement though. [add results] > * Ran [TestTransparentHugePageUsage](https://github.com/openjdk/jdk/commit/a65a89522d2f24b1767e1c74f6689a22ea32ca6a) for Shenandoah and verified that test passed > * Ran tier 1, tier 2 , tier1_gc_shenandoah, tier2_gc_shenandoah, tier3_gc_shenandoah and hotspot_gc_shenandoah. I am approving, since the "problem" appears to be a kernel version between 5.8 and 5.14.
So THP is broken there, and MADV_POPULATE_WRITE is still not available. Reading the JDK-8315923 code, it essentially does what this code was doing, so we do not actually regress anything. I think we only need to confirm using the one-liner I had above that >=5.14 really works, and <5.8 does not regress the speed with which we wire up `AnonHugePages`. src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 287: > 285: // Reserve aux bitmap for use in object_iterate(). We don't commit it here. > 286: size_t aux_bitmap_page_size = bitmap_page_size; > 287: I think this newline is unnecessary. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20254#pullrequestreview-2225728082 PR Review Comment: https://git.openjdk.org/jdk/pull/20254#discussion_r1707457414 From shade at openjdk.org Wed Aug 7 18:48:36 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 7 Aug 2024 18:48:36 GMT Subject: RFR: 8335865: Shenandoah: Improve THP pretouch after JDK-8315923 In-Reply-To: <1HLgUoF7ByaXQkgUB3UYK35VzxayzTXZl562fDBWKZ8=.3641cb43-c0f4-44c1-bbce-af168d02ead2@github.com> References: <1HLgUoF7ByaXQkgUB3UYK35VzxayzTXZl562fDBWKZ8=.3641cb43-c0f4-44c1-bbce-af168d02ead2@github.com> Message-ID: <-pE70sSWPv6wUCDfLwEp1f8ZSbV1N-Gn3lOlcINCxww=.9fe8a8e4-b69b-45cc-a891-7565a6ff8572@github.com> On Wed, 7 Aug 2024 17:04:38 GMT, Aleksey Shipilev wrote: > I think we only need to confirm using the one-liner I had above that >=5.14 really works I confirmed Shenandoah THP+Pretouch works well on my desktop with 5.15, either by default or with `-XX:-UseMadvPopulateWrite`. 
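As a rough illustration of the "madvise when available, otherwise touch pages" behaviour discussed above, here is a minimal sketch. It is not HotSpot's `os::pretouch_memory`; the helper name `toy_pretouch` and its return convention are assumptions made for this example. `MADV_POPULATE_WRITE` is the Linux advice value added in kernel 5.14, and the code is guarded so it still compiles where the header does not define it.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__linux__)
#include <sys/mman.h>
#endif

// Hypothetical helper, NOT HotSpot code.
// Returns true if the kernel populated the range via madvise, false if the
// per-page fallback loop was used instead.
inline bool toy_pretouch(char* start, std::size_t bytes, std::size_t page_size) {
#if defined(__linux__) && defined(MADV_POPULATE_WRITE)
  // Fails on kernels older than 5.14 and for ranges that are not
  // page-aligned; in both cases we drop through to the fallback.
  if (madvise(start, bytes, MADV_POPULATE_WRITE) == 0) {
    return true;
  }
#endif
  // Fallback: write one byte per page, preserving the existing contents.
  volatile char* p = start;
  for (std::size_t off = 0; off < bytes; off += page_size) {
    p[off] = p[off];
  }
  return false;
}
```

Either path leaves the buffer contents unchanged; the only difference is whether the kernel or the loop faults the pages in.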
------------- PR Comment: https://git.openjdk.org/jdk/pull/20254#issuecomment-2274119099 From shade at openjdk.org Wed Aug 7 18:50:47 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 7 Aug 2024 18:50:47 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions [v2] In-Reply-To: References: Message-ID: > The expected behavior of `CollectedHeap::is_in` is to check whether the object belongs to the committed parts of the heap: > https://github.com/openjdk/jdk/blob/d19ba81ce12a99de1114c1bfe67392f5aee2104e/src/hotspot/share/gc/shared/collectedHeap.hpp#L273-L276 > > This is useful to check if object resides in the parts of the heap the GC knows are not dead. Yet, Shenandoah's check just verifies that oop is within the heap bounds. So `is_in` check for an object that is in trashed/empty region would pass by accident, and we will miss detecting bugs. This should be rectified. I believe "committed" is too weak for the test as well, since we really want to know if we can touch the object, i.e. if it is in active region. > > I re-wired assertions/verification code to be clear whether we check for heap bounds or actual in-heap conditions. > > Deeper testing revealed that reference processing code potentially loads a dead referent, but only to null-check it, or ask bitmap about it. Still, more precise `in_heap` check fails asserts in `CompressedOops::decode`. That required a bit of touchup as well. 
> > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC` Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: - Style touchups - Fixing ShenandoahReferenceProcessor - Verifier fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20492/files - new: https://git.openjdk.org/jdk/pull/20492/files/dbab6d43..69c66853 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20492&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20492&range=00-01 Stats: 35 lines in 2 files changed: 22 ins; 7 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20492.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20492/head:pull/20492 PR: https://git.openjdk.org/jdk/pull/20492 From duke at openjdk.org Wed Aug 7 20:10:03 2024 From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Wed, 7 Aug 2024 20:10:03 GMT Subject: RFR: 8337939: ZGC: Make assertions and checks less convoluted and explicit [v2] In-Reply-To: References: Message-ID: > There are currently cases where calls to type converters are made only to assert whether the conversion is reasonable or not and then discarding the result. For example, to_zaddress(...) is used to check if the pointer passed to it is a valid zaddress or not, whilst discarding the result of the conversion. > > Additionally, a call like oopDesc::is_oop(to_oop(o)) is convoluted since a similar check to is_oop() is already done inside to_oop(), which should be a separate operation in its entirety. > > Asserts/checks in affected places should be separated so that assertion/checking can be explicitly made and not done more than necessary. > > Tested with tiers 1-7 on linux64 and linux64-debug. 
Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: Fix zaddress parameter name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20478/files - new: https://git.openjdk.org/jdk/pull/20478/files/70f13835..42044a86 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20478&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20478&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20478.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20478/head:pull/20478 PR: https://git.openjdk.org/jdk/pull/20478 From stefank at openjdk.org Thu Aug 8 07:17:32 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 8 Aug 2024 07:17:32 GMT Subject: RFR: 8337939: ZGC: Make assertions and checks less convoluted and explicit [v2] In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 20:10:03 GMT, Joel Sikström wrote: >> There are currently cases where calls to type converters are made only to assert whether the conversion is reasonable or not and then discarding the result. For example, to_zaddress(...) is used to check if the pointer passed to it is a valid zaddress or not, whilst discarding the result of the conversion. >> >> Additionally, a call like oopDesc::is_oop(to_oop(o)) is convoluted since a similar check to is_oop() is already done inside to_oop(), which should be a separate operation in its entirety. >> >> Asserts/checks in affected places should be separated so that assertion/checking can be explicitly made and not done more than necessary. >> >> Tested with tiers 1-7 on linux64 and linux64-debug. > > Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: > > Fix zaddress parameter name Looks good. ------------- Marked as reviewed by stefank (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20478#pullrequestreview-2227047222 From ayang at openjdk.org Thu Aug 8 07:42:31 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 8 Aug 2024 07:42:31 GMT Subject: RFR: 8337939: ZGC: Make assertions and checks less convoluted and explicit [v2] In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 20:10:03 GMT, Joel Sikström wrote: >> There are currently cases where calls to type converters are made only to assert whether the conversion is reasonable or not and then discarding the result. For example, to_zaddress(...) is used to check if the pointer passed to it is a valid zaddress or not, whilst discarding the result of the conversion. >> >> Additionally, a call like oopDesc::is_oop(to_oop(o)) is convoluted since a similar check to is_oop() is already done inside to_oop(), which should be a separate operation in its entirety. >> >> Asserts/checks in affected places should be separated so that assertion/checking can be explicitly made and not done more than necessary. >> >> Tested with tiers 1-7 on linux64 and linux64-debug. > > Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: > > Fix zaddress parameter name Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20478#pullrequestreview-2227097036 From ayang at openjdk.org Thu Aug 8 08:33:34 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 8 Aug 2024 08:33:34 GMT Subject: RFR: 8336086: G1: Use one G1CardSet instance for all young regions [v2] In-Reply-To: References: Message-ID: On Mon, 5 Aug 2024 14:42:45 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to assign a single G1CardSet to all young regions. As young regions are collected at the same time, and we do not have young-to-young remembered sets, we can maintain a single G1CardSet for all young regions.
>> >> This reduces the memory overhead of the G1CardSets and the time taken to merge per region G1CardSets during GC pause. >> >> Testing: Tier 1-5 > > Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Albert Review > - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet > - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet > - cleanup > - merge > - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet > - init Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20134#pullrequestreview-2227206638 From ayang at openjdk.org Thu Aug 8 08:41:08 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 8 Aug 2024 08:41:08 GMT Subject: RFR: 8338036: Serial: Remove Generation::update_counters Message-ID: Trivial removing redundant code. ------------- Commit messages: - s1-perf-counter Changes: https://git.openjdk.org/jdk/pull/20509/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20509&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338036 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20509/head:pull/20509 PR: https://git.openjdk.org/jdk/pull/20509 From eosterlund at openjdk.org Thu Aug 8 13:29:34 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 8 Aug 2024 13:29:34 GMT Subject: RFR: 8337939: ZGC: Make assertions and checks less convoluted and explicit [v2] In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 20:10:03 GMT, Joel Sikström wrote: >> There are currently cases where calls to type converters are made only to assert whether the conversion is reasonable or not and then discarding the result. For example, to_zaddress(...)
is used to check if the pointer passed to it is a valid zaddress or not, whilst discarding the result of the conversion. >> >> Additionally, a call like oopDesc::is_oop(to_oop(o)) is convoluted since a similar check to is_oop() is already done inside to_oop(), which should be a separate operation in its entirety. >> >> Asserts/checks in affected places should be separated so that assertion/checking can be explicitly made and not done more than necessary. >> >> Tested with tiers 1-7 on linux64 and linux64-debug. > > Joel Sikström has updated the pull request incrementally with one additional commit since the last revision: > > Fix zaddress parameter name Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20478#pullrequestreview-2227880615 From rcastanedalo at openjdk.org Thu Aug 8 14:17:24 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 8 Aug 2024 14:17:24 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v3] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Flatten barrier assembly generation code by removing helpers individual barrier tests and operations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/d722d4c7..20ef68c8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=01-02 Stats: 263 lines in 2 files changed: 77 ins; 116 del; 70 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Thu Aug 8 14:23:36 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 8 Aug 2024 14:23:36 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v3] In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 08:45:45 GMT, Albert Mingkun Yang wrote: >> Note that if we want to optimize the barrier code layout (see the [JEP description](https://openjdk.org/jeps/475), *Candidate optimizations* sub-section), splitting the assembly of each barrier in at least two blocks is necessary, since we need to separate the inline from the out-of-line (barrier stub) code. And since the assembly code has to be split into multiple functions anyway, I think it makes sense to group the code by logical blocks (different barrier tests, queue insertion, etc.), as proposed in this changeset. This also improves code reuse, e.g. the same `generate_queue_insertion` implementation is used for the pre- and post-barriers. >> If you still think there is value in grouping together the blocks that can be grouped together (e.g.
`generate_single_region_test` + `generate_new_val_null_test` + `generate_card_young_test`), I can prototype the refactoring and let the G1 maintainers decide which alternative is more readable/maintainable. > >> This also improves code reuse > > In this area, I think code duplication is less of an issue -- it's more crucial that one can follow the asm flow as if reading real asm. (Ofc, this is subjective; feel free to keep as is.) I'm back from vacation now and resuming my work on this JEP. After some offline discussions, I have pushed a new version (commit 20ef68c81e) without helper functions, except for `generate_queue_insertion()` which is still included. @albertnetymk please have a look and let me know if you find the new style more readable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1709618766 From duke at openjdk.org Thu Aug 8 14:33:09 2024 From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 8 Aug 2024 14:33:09 GMT Subject: RFR: 8337939: ZGC: Make assertions and checks less convoluted and explicit [v3] In-Reply-To: References: Message-ID: > There are currently cases where calls to type converters are made only to assert whether the conversion is reasonable or not and then discarding the result. For example, to_zaddress(...) is used to check if the pointer passed to it is a valid zaddress or not, whilst discarding the result of the conversion. > > Additionally, a call like oopDesc::is_oop(to_oop(o)) is convoluted since a similar check to is_oop() is already done inside to_oop(), which should be a separate operation in its entirety. > > Asserts/checks in affected places should be separated so that assertion/checking can be explicitly made and not done more than necessary. > > Tested with tiers 1-7 on linux64 and linux64-debug. Joel Sikström has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision: - Merge branch 'master' into zgc_assert_check_cleanup - Update copyright years - Fix zaddress parameter name - Update zPage.inline.hpp - 8337939: ZGC: Make assertions and checks less convoluted and explicit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20478/files - new: https://git.openjdk.org/jdk/pull/20478/files/42044a86..426c4be6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20478&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20478&range=01-02 Stats: 7463 lines in 163 files changed: 2134 ins; 4812 del; 517 mod Patch: https://git.openjdk.org/jdk/pull/20478.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20478/head:pull/20478 PR: https://git.openjdk.org/jdk/pull/20478 From stefank at openjdk.org Thu Aug 8 15:12:34 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 8 Aug 2024 15:12:34 GMT Subject: RFR: 8337939: ZGC: Make assertions and checks less convoluted and explicit [v3] In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 14:33:09 GMT, Joel Sikström wrote: >> There are currently cases where calls to type converters are made only to assert whether the conversion is reasonable or not and then discarding the result. For example, to_zaddress(...) is used to check if the pointer passed to it is a valid zaddress or not, whilst discarding the result of the conversion. >> >> Additionally, a call like oopDesc::is_oop(to_oop(o)) is convoluted since a similar check to is_oop() is already done inside to_oop(), which should be a separate operation in its entirety. >> >> Asserts/checks in affected places should be separated so that assertion/checking can be explicitly made and not done more than necessary. >> >> Tested with tiers 1-7 on linux64 and linux64-debug. > > Joel Sikström has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into zgc_assert_check_cleanup > - Update copyright years > - Fix zaddress parameter name > - Update zPage.inline.hpp > - 8337939: ZGC: Make assertions and checks less convoluted and explicit Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20478#pullrequestreview-2228179678 From rcastanedalo at openjdk.org Thu Aug 8 15:37:19 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 8 Aug 2024 15:37:19 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v4] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms.
> > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: Also include HOTSPOT_TARGET_CPU_ARCH-based G1 ADL source file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/20ef68c8..47079ea1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Thu Aug 8 15:37:19 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 8 Aug 2024 15:37:19 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: Message-ID: On Sat, 29 Jun 2024 03:51:29 GMT, Amit Kumar wrote: >> make/hotspot/gensrc/GensrcAdlc.gmk line 205: >> >>> 203: ifeq ($(call check-jvm-feature, g1gc), true) >>> 204: AD_SRC_FILES += $(call uniq, $(wildcard $(foreach d, $(AD_SRC_ROOTS), \ >>> 205: $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/g1/g1_$(HOTSPOT_TARGET_CPU).ad \ >> >> on s390, `g1_s390.ad` file is not compiled with current code. 
>> >> Suggestion: >> >> $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/g1/g1_$(HOTSPOT_TARGET_CPU_ARCH).ad \ > > I guess this one might be better: > > diff --git a/make/hotspot/gensrc/GensrcAdlc.gmk b/make/hotspot/gensrc/GensrcAdlc.gmk > index e34f0725397..ef9c15b2975 100644 > --- a/make/hotspot/gensrc/GensrcAdlc.gmk > +++ b/make/hotspot/gensrc/GensrcAdlc.gmk > @@ -203,6 +203,7 @@ ifeq ($(call check-jvm-feature, compiler2), true) > ifeq ($(call check-jvm-feature, g1gc), true) > AD_SRC_FILES += $(call uniq, $(wildcard $(foreach d, $(AD_SRC_ROOTS), \ > $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/g1/g1_$(HOTSPOT_TARGET_CPU).ad \ > + $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/g1/g1_$(HOTSPOT_TARGET_CPU_ARCH).ad \ > ))) > endif > > > Build is fine with both changes, (tested on Mac-M1) Thanks! I went with the second option (commit 47079ea1) for consistency with other collectors. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1709781421 From shade at openjdk.org Thu Aug 8 16:23:39 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 Aug 2024 16:23:39 GMT Subject: RFR: 8336742: Shenandoah: Add more verbose logging/stats for mark termination attempts [v4] In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 13:19:12 GMT, Neethu Prasad wrote: >> **Revision 2 Notes** >> 1. Added time spent on handshaking all threads requesting them to flush their SATB buffers as part of GC stats. >> 2. As mentioned in PR feedback, will raise separate PR to adding logging in ShenandoahTimingsTracker. >> >> **Revision 1 Notes** >> This PR adds the following >> 1. info logging on number of SATB flush attempts >> 3. total time spend on handshaking all threads requesting them to flush their SATB buffers. >> >> As suggested by William in [JDK-8336742 ](https://bugs.openjdk.org/browse/JDK-83367420), we can use handshake logging to get time spend and other stats for each handshake. 
>> >> [4.515s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1035, Total completion time: 597004 ns >> [4.517s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1033, Total completion time: 207402 ns >> >> >> **Testing** >> 1. tier1, tier2 and hotspot_gc_shenandoah tests. >> 2. **-Xlog:gc+stats=info** >> >> >> [37.087s][info][gc,stats] CMR: VM Strong Roots 413 us, workers (us): 64, 57, 52, 47, 38, 31, 30, 25, 20, 21, 17, 10, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [37.087s][info][gc,stats] CMR: CLDG Roots 449 us, workers (us): 4, ---, ---, 406, ---, 15, ---, 4, 4, ---, ---, 17, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [37.087s][info][gc,stats] Concurrent Marking 5002 us >> [37.087s][info][gc,stats] SATB Flush Rendezvous 1748 us >> [37.087s][info][gc,stats] Pause Final Mark (G) 57272 us >> [37.087s][info][gc,stats] Pause Final Mark (N) 56985 us >> [37.087s][info][gc,stats] Finish Mark 387 us >> [37.087s][info][gc,stats] Update Region States 109 us >> [37.087s][info][gc,stats] Choose Collection Set 56395 us >> [37.087s][info][gc,stats] Rebuild Free Set 40 us >> >> >> on app termination >> >> >> [40.640s][info][gc,stats] Concurrent Reset = 0.914 s (a = 65255 us) (n = 14) (lvls, us = 54883, 55859, 63867, 65234, 97096) >> [40.640s][info][gc,stats] Pause Init Mark (G) = 1.755 s (a = 125380 us) (n = 14) (lvls, us = 119141, 123047, 125000, 125000, 128042) >> [40.640s][info][gc,stats] Pause Init Mark (N) = 1.697 s (a = 121241 us... > > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > Address feedback on code style Marked as reviewed by shade (Reviewer). 
src/hotspot/share/gc/shenandoah/shenandoahUtils.hpp line 67: > 65: ShenandoahPhaseTimings* const _timings; > 66: const ShenandoahPhaseTimings::Phase _phase; > 67: const bool _should_aggregate; One really tiny thing: `_should_aggregate` should be indented like other field names. ------------- PR Review: https://git.openjdk.org/jdk/pull/20318#pullrequestreview-2228346593 PR Review Comment: https://git.openjdk.org/jdk/pull/20318#discussion_r1709865975 From ayang at openjdk.org Thu Aug 8 16:47:40 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 8 Aug 2024 16:47:40 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v4] In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 15:37:19 GMT, Roberto Casta?eda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. 
>> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Also include HOTSPOT_TARGET_CPU_ARCH-based G1 ADL source file Some naming comments/suggestions, up to you. g1_write_barrier_post_c2 generate_c2_post_barrier_stub The latter is the "next" step if slower path is taken. I wonder if it can be renamed to sth like "...write_barrier_post_c2_stub" to make it obvious that they are related. Both "write_barrier_pre" and "pre_write_barrier" exist. 
It's not obvious whether that is intended (to highlight some diff) or not. ------------- Marked as reviewed by ayang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2228393022 From duke at openjdk.org Thu Aug 8 17:44:33 2024 From: duke at openjdk.org (duke) Date: Thu, 8 Aug 2024 17:44:33 GMT Subject: RFR: 8337939: ZGC: Make assertions and checks less convoluted and explicit [v3] In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 14:33:09 GMT, Joel Sikstr?m wrote: >> There are currently cases where calls to type converters are made only to assert whether the conversion is reasonable or not and then discarding the result. For example, to_zaddress(...) is used to check if the pointer passed to it is a valid zaddress or not, whilst discarding the result of the conversion. >> >> Additionally, a call like oopDesc::is_oop(to_oop(o)) is convoluted since a similar check to is_oop() is already done inside to_oop(), which should be a separate operation in its entirety. >> >> Asserts/checks in affected places should be separated so that assertion/checking can be explicitly made and not done more than necessary. >> >> Tested with tiers 1-7 on linux64 and linux64-debug. > > Joel Sikstr?m has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into zgc_assert_check_cleanup > - Update copyright years > - Fix zaddress parameter name > - Update zPage.inline.hpp > - 8337939: ZGC: Make assertions and checks less convoluted and explicit @jsikstro Your change (at version 426c4be69b9d287345378587748b777f16beed2d) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20478#issuecomment-2276344924 From kbarrett at openjdk.org Thu Aug 8 17:56:30 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 8 Aug 2024 17:56:30 GMT Subject: RFR: 8337709: Use allocated states for chunking large array processing In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 08:18:50 GMT, Ivan Walulya wrote: > Was there any observable impact on G1 performance suite? No, it looked like just the usual random noise. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20445#issuecomment-2276365582 From kbarrett at openjdk.org Fri Aug 9 07:07:35 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 9 Aug 2024 07:07:35 GMT Subject: RFR: 8338036: Serial: Remove Generation::update_counters In-Reply-To: References: Message-ID: <6z4eo6-KCKUKXeZv23ifKvLX1ZQACNFZBBcMxO5BC34=.fb5403be-03ea-4aaf-a3dc-ad40f5666275@github.com> On Thu, 8 Aug 2024 08:35:40 GMT, Albert Mingkun Yang wrote: > Trivial removing redundant code. Looks good, and trivial. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20509#pullrequestreview-2229461965 From duke at openjdk.org Fri Aug 9 07:32:41 2024 From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 9 Aug 2024 07:32:41 GMT Subject: RFR: 8337939: ZGC: Make assertions and checks less convoluted and explicit [v3] In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 15:10:24 GMT, Stefan Karlsson wrote: >> Joel Sikström has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision: >> >> - Merge branch 'master' into zgc_assert_check_cleanup >> - Update copyright years >> - Fix zaddress parameter name >> - Update zPage.inline.hpp >> - 8337939: ZGC: Make assertions and checks less convoluted and explicit > > Marked as reviewed by stefank (Reviewer). Thank you for the reviews! @stefank, @albertnetymk, @fisk ------------- PR Comment: https://git.openjdk.org/jdk/pull/20478#issuecomment-2276343869 From duke at openjdk.org Fri Aug 9 07:32:41 2024 From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 9 Aug 2024 07:32:41 GMT Subject: Integrated: 8337939: ZGC: Make assertions and checks less convoluted and explicit In-Reply-To: References: Message-ID: On Tue, 6 Aug 2024 15:15:57 GMT, Joel Sikstr?m wrote: > There are currently cases where calls to type converters are made only to assert whether the conversion is reasonable or not and then discarding the result. For example, to_zaddress(...) is used to check if the pointer passed to it is a valid zaddress or not, whilst discarding the result of the conversion. > > Additionally, a call like oopDesc::is_oop(to_oop(o)) is convoluted since a similar check to is_oop() is already done inside to_oop(), which should be a separate operation in its entirety. > > Asserts/checks in affected places should be separated so that assertion/checking can be explicitly made and not done more than necessary. > > Tested with tiers 1-7 on linux64 and linux64-debug. This pull request has now been integrated. 
Changeset: f74109bd Author: Joel Sikström URL: https://git.openjdk.org/jdk/commit/f74109bd178c92a9dff1ca6fce03b25f51a0384f Stats: 63 lines in 10 files changed: 32 ins; 8 del; 23 mod 8337939: ZGC: Make assertions and checks less convoluted and explicit Reviewed-by: stefank, ayang, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/20478 From ayang at openjdk.org Fri Aug 9 08:28:42 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 9 Aug 2024 08:28:42 GMT Subject: RFR: 8338036: Serial: Remove Generation::update_counters In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 08:35:40 GMT, Albert Mingkun Yang wrote: > Trivial removing redundant code. Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20509#issuecomment-2277422911 From ayang at openjdk.org Fri Aug 9 08:28:42 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 9 Aug 2024 08:28:42 GMT Subject: Integrated: 8338036: Serial: Remove Generation::update_counters In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 08:35:40 GMT, Albert Mingkun Yang wrote: > Trivial removing redundant code. This pull request has now been integrated. Changeset: 6ebd5d74 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/6ebd5d74d57b334e7cf0b1282d7bb469a56fb3d6 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod 8338036: Serial: Remove Generation::update_counters Reviewed-by: kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/20509 From tschatzl at openjdk.org Fri Aug 9 09:43:32 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 9 Aug 2024 09:43:32 GMT Subject: RFR: 8337709: Use allocated states for chunking large array processing In-Reply-To: References: Message-ID: On Fri, 2 Aug 2024 19:36:47 GMT, Kim Barrett wrote: > Please review this change to the G1 young/mixed collector to use allocated > states to encode partial array task chunking.
> > States are allocated from per-worker-thread arena+free-list pairs, and > released to the free-list for the worker that completed use. They are > refcounted to track the number of refering tasks. > > Various other approaches (such as a single arena+FreeListAllocator) were > tested, but found to have worse performance, though in some cases fewer > allocations. The per-worker arena+free-list pair was the only option that > didn't show a regression compared to the previous PartialArrayScanTask > approach on a stress test. > > In addition to the changes to ScannerTask to support the new > PartialArrayState, it temporarily continues to support PartialArrayScanTask. > This is because ParallelGC will continue to use the latter until it is changed > to use PartialArrayState. The intent is to update ParallelGC in a followup CR. > > Testing: > mach5 tier1-5 > G1 performance suite Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20445#pullrequestreview-2229794656 From rcastanedalo at openjdk.org Fri Aug 9 11:48:17 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 9 Aug 2024 11:48:17 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v5] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. 
> > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: Give barrier generation helper functions a more consistent name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/47079ea1..1834bf41 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=03-04 Stats: 455 lines in 3 files changed: 0 ins; 0 del; 455 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Fri Aug 9 11:52:35 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 9 Aug 2024 11:52:35 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v5] In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 11:48:17 GMT, Roberto Casta?eda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. 
>> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... 
> > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Give barrier generation helper functions a more consistent name Thanks for reviewing, Albert! > ``` > g1_write_barrier_post_c2 > generate_c2_post_barrier_stub > ``` > > The latter is the "next" step if slower path is taken. I wonder if it can be renamed to sth like "...write_barrier_post_c2_stub" to make it obvious that they are related. I agree with your suggestion, but will postpone it to a follow-up task to avoid interfering with the ongoing port work (the names are dictated by the platform-independent `G1PreBarrierStubC2::emit_code()` and `G1PostBarrierStubC2::emit_code()` functions, so a name change would affect every platform). > Both "write_barrier_pre" and "pre_write_barrier" exist. It's not obvious whether that is intended (to highlight some diff) or not. This is accidental, as far as I can see. `write_barrier_pre` is the pre-existing name for the interpreter barrier generation functions, I would rather leave it as-is to avoid making this changeset even larger. Instead, I have renamed the helper functions `g1_pre_write_barrier()` and `g1_post_write_barrier()` to `write_barrier_pre()` and `write_barrier_post()`, for consistency (and dropped `g1_` since it is obvious from the context) in commit 1834bf4. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2277770042 From rcastanedalo at openjdk.org Fri Aug 9 12:03:37 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 9 Aug 2024 12:03:37 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> Message-ID: <5Q8PqULlpKfoPLXRqI0ua0dVWAy3zPBqtFpycNwBg0Y=.f2830c84-63ba-43cd-85e3-2245e4ac8917@github.com> On Sun, 21 Jul 2024 08:21:39 GMT, Martin Doerr wrote: >> Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Build barrier data in G1BarrierSetC2::get_store_barrier() by adding, rather than removing, barrier tags > > src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 86: > >> 84: // an indirect memory operand) to reduce C2's scheduling and register >> 85: // allocation pressure (fewer Mach nodes). The same holds for g1StoreN and >> 86: // g1EncodePAndStoreN. > > I'm not convinced that this is beneficial. We're wasting a temp register just for an addition? I agree that using indirect memory operands is the most readable choice, and is slightly less wasteful from a register usage perspective. However, when I tried this choice a couple of months ago, I observed timeouts in some CTW runs, which as far as I remember were caused when LCM processed huge basic blocks with lots of memory writes (e.g. arising from static initializations of large String arrays such as in [here](https://github.com/apache/lucene/blob/ea562f6ef2b32fe6eadf57c6381d9a69acb043c7/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/KStemData1.java#L47-L748)), in combination with C2 stress options. 
In these scenarios, the large number of additional Mach nodes seemed to cause the timeouts. I settled for materializing the store address internally to guard against such corner cases. I did not see any significant performance difference between the two choices in my benchmark results. I would like to study whether LCM can be made more robust in this scenario, which would enable using indirect memory operands here, but I think this would be best addressed in a separate RFE. Would it be OK by now to extend the code comment with the details provided in the above explanation? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1711337413 From duke at openjdk.org Fri Aug 9 12:57:44 2024 From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Fri, 9 Aug 2024 12:57:44 GMT Subject: RFR: 8337938: ZUtils::alloc_aligned allocates without reporting to NMT Message-ID: Replaces usage of posix_memalign/_aligned_malloc with os::malloc and manual alignment to report memory usage to NMT. Manually aligning the memory makes the returned address unfreeable by malloc (as clarified by the added comment), which is reasonable since the memory used by ZUtils::alloc_aligned is never freed. Tested with tiers 1-3. 
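The manual alignment described above can be sketched roughly as follows. This is a minimal illustration and not the actual `ZUtils::alloc_aligned` code: the function name `alloc_aligned_unfreeable` is hypothetical, and `std::malloc` stands in for `os::malloc` (which is what makes the allocation visible to NMT in HotSpot).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper, not the real ZUtils::alloc_aligned.
// std::malloc is an assumed stand-in for HotSpot's os::malloc.
static void* alloc_aligned_unfreeable(size_t alignment, size_t size) {
  // Alignment must be a non-zero power of two for the mask below to work.
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

  // Over-allocate so an aligned block of 'size' bytes always fits.
  void* const raw = std::malloc(size + alignment - 1);
  if (raw == nullptr) {
    return nullptr;
  }

  // Round the raw address up to the next multiple of 'alignment'.
  const uintptr_t addr = reinterpret_cast<uintptr_t>(raw);
  const uintptr_t mask = static_cast<uintptr_t>(alignment) - 1;
  const uintptr_t aligned = (addr + mask) & ~mask;

  // The raw pointer is deliberately dropped: the returned address may
  // differ from it, so it must never be passed to free().
  return reinterpret_cast<void*>(aligned);
}
```

Because the original pointer is not kept, this pattern only makes sense for allocations that live for the whole process, which matches the "never freed" usage described in the PR text.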
------------- Commit messages: - Remove trailing whitespace - 8337938: ZUtils::alloc_aligned allocates without reporting to NMT Changes: https://git.openjdk.org/jdk/pull/20523/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20523&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337938 Stats: 101 lines in 6 files changed: 13 ins; 83 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20523.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20523/head:pull/20523 PR: https://git.openjdk.org/jdk/pull/20523 From stefank at openjdk.org Fri Aug 9 13:30:32 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 9 Aug 2024 13:30:32 GMT Subject: RFR: 8337938: ZUtils::alloc_aligned allocates without reporting to NMT In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 12:47:18 GMT, Joel Sikström wrote: > Replaces usage of posix_memalign/_aligned_malloc with os::malloc and manual alignment to report memory usage to NMT. Manually aligning the memory makes the returned address unfreeable by malloc (as clarified by the added comment), which is reasonable since the memory used by ZUtils::alloc_aligned is never freed. > > Tested with tiers 1-3. Looks good. ------------- Marked as reviewed by stefank (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20523#pullrequestreview-2230191554 From mdoerr at openjdk.org Fri Aug 9 14:08:34 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 9 Aug 2024 14:08:34 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <5Q8PqULlpKfoPLXRqI0ua0dVWAy3zPBqtFpycNwBg0Y=.f2830c84-63ba-43cd-85e3-2245e4ac8917@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <5Q8PqULlpKfoPLXRqI0ua0dVWAy3zPBqtFpycNwBg0Y=.f2830c84-63ba-43cd-85e3-2245e4ac8917@github.com> Message-ID: On Fri, 9 Aug 2024 12:00:26 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 86: >> >>> 84: // an indirect memory operand) to reduce C2's scheduling and register >>> 85: // allocation pressure (fewer Mach nodes). The same holds for g1StoreN and >>> 86: // g1EncodePAndStoreN. >> >> I'm not convinced that this is beneficial. We're wasting a temp register just for an addition? > > I agree that using indirect memory operands is the most readable choice, and is slightly less wasteful from a register usage perspective. However, when I tried this choice a couple of months ago, I observed timeouts in some CTW runs, which as far as I remember were caused when LCM processed huge basic blocks with lots of memory writes (e.g. arising from static initializations of large String arrays such as in [here](https://github.com/apache/lucene/blob/ea562f6ef2b32fe6eadf57c6381d9a69acb043c7/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/KStemData1.java#L47-L748)), in combination with C2 stress options. In these scenarios, the large number of additional Mach nodes seemed to cause the timeouts. I settled for materializing the store address internally to guard against such corner cases. I did not see any significant performance difference between the two choices in my benchmark results.
> > I would like to study whether LCM can be made more robust in this scenario, which would enable using indirect memory operands here, but I think this would be best addressed in a separate RFE. Would it be OK by now to extend the code comment with the details provided in the above explanation? Ok, doing it in a separate RFE is fine with me. This sounds like a C2 problem which should get investigated. It may cause other performance problems, too. Maybe a native profiler can show what takes too much time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1711536279 From nprasad at openjdk.org Fri Aug 9 14:54:26 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Fri, 9 Aug 2024 14:54:26 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics [v5] In-Reply-To: References: Message-ID: > **Notes** > Adding logs to get more visibility into how fast a thread resumes from allocation stall. > > **Testing** > * tier 1, tier 2, hotspot_gc tests. > > Example log messages > > 1. Last thread exiting. Performing GC after exiting critical section. Thread "main" 0 locked. > > 2. Thread exiting critical region Thread "main" 0 locked. > > 3. Thread stalled by JNI critical section. Resumed after 586ms. Thread "Thread-0". > > 4. Thread blocked to enter critical region. Resumed after 1240ms. Thread "SIGINT handler". 
Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: address code style feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20277/files - new: https://git.openjdk.org/jdk/pull/20277/files/c53dc9cf..77fd9d55 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20277&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20277&range=03-04 Stats: 21 lines in 1 file changed: 4 ins; 7 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/20277.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20277/head:pull/20277 PR: https://git.openjdk.org/jdk/pull/20277 From btaylor at openjdk.org Fri Aug 9 17:58:58 2024 From: btaylor at openjdk.org (Ben Taylor) Date: Fri, 9 Aug 2024 17:58:58 GMT Subject: RFR: 8337815: Relax G1EvacStats atomic operations Message-ID: This PR should slightly improve the performance of G1EvacStats by using `memory_order_relaxed` instead of the default `memory_order_conservative`. Since the original bug report says >I doubt it would show on benchmarks, this is a paper-cut issue. I haven't benchmarked this change for performance. The change passes all tests in `gc/g1` locally on x86_64 linux. 
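[Editorial sketch, not part of the original message.] The idea behind relaxing the ordering of pure statistics counters can be illustrated in standard C++. This is only a rough sketch under stated assumptions: `EvacStatsSketch` and `run_workers` are hypothetical names, and `std::atomic` with `std::memory_order_relaxed` stands in for HotSpot's own `Atomic` operations; it is not the actual G1EvacStats code.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a statistics holder; NOT the real G1EvacStats.
struct EvacStatsSketch {
  std::atomic<std::size_t> allocated{0};
  std::atomic<std::size_t> wasted{0};

  // The counters are independent tallies: each update must be atomic, but no
  // other memory accesses need to be ordered around it, so relaxed suffices.
  void add_allocated(std::size_t words) {
    allocated.fetch_add(words, std::memory_order_relaxed);
  }
  void add_wasted(std::size_t words) {
    wasted.fetch_add(words, std::memory_order_relaxed);
  }
};

// Starts `workers` threads that each record `per_worker` one-word allocations,
// joins them, and returns the accumulated total. The join() calls provide the
// synchronization that makes the final relaxed load see every increment.
std::size_t run_workers(EvacStatsSketch& stats, unsigned workers,
                        std::size_t per_worker) {
  std::vector<std::thread> threads;
  for (unsigned i = 0; i < workers; i++) {
    threads.emplace_back([&stats, per_worker] {
      for (std::size_t n = 0; n < per_worker; n++) {
        stats.add_allocated(1);
      }
    });
  }
  for (auto& t : threads) {
    t.join();
  }
  return stats.allocated.load(std::memory_order_relaxed);
}
```

Relaxed ordering keeps each increment atomic, so no updates are lost; what it gives up is any ordering guarantee relative to other memory accesses, which plain counters like these presumably do not rely on.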
------------- Commit messages: - 8337815: Relax G1EvacStats atomic operations Changes: https://git.openjdk.org/jdk/pull/20529/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20529&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337815 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20529.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20529/head:pull/20529 PR: https://git.openjdk.org/jdk/pull/20529 From kbarrett at openjdk.org Sun Aug 11 16:40:30 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 11 Aug 2024 16:40:30 GMT Subject: RFR: 8337815: Relax G1EvacStats atomic operations In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 17:46:43 GMT, Ben Taylor wrote: > This PR should slightly improve the performance of G1EvacStats by using `memory_order_relaxed` instead of the default `memory_order_conservative`. > Since the original bug report says > >>I doubt it would show on benchmarks, this is a paper-cut issue. > > I haven't benchmarked this change for performance. > > The change passes all tests in `gc/g1` locally on x86_64 linux. Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20529#pullrequestreview-2231854153 From kbarrett at openjdk.org Sun Aug 11 18:27:35 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 11 Aug 2024 18:27:35 GMT Subject: RFR: 8337709: Use allocated states for chunking large array processing In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 08:18:50 GMT, Ivan Walulya wrote: >> Please review this change to the G1 young/mixed collector to use allocated >> states to encode partial array task chunking. >> >> States are allocated from per-worker-thread arena+free-list pairs, and >> released to the free-list for the worker that completed use. They are >> refcounted to track the number of refering tasks. 
>> >> Various other approaches (such as a single arena+FreeListAllocator) were >> tested, but found to have worse performance, though in some cases fewer >> allocations. The per-worker arena+free-list pair was the only option that >> didn't show a regression compared to the previous PartialArrayScanTask >> approach on a stress test. >> >> In addition to the changes to ScannerTask to support the new >> PartialArrayState, it temporarily continues to support PartialArrayScanTask. >> This is because ParallelGC will continue to use the latter until it is changed >> to use PartialArrayState. The intent is to update ParallelGC in a followup CR. >> >> Testing: >> mach5 tier1-5 >> G1 performance suite > > LGTM! > > Was there any observable impact on G1 performance suite? Thanks for reviews @walulyai and @tschatzl ------------- PR Comment: https://git.openjdk.org/jdk/pull/20445#issuecomment-2282846972 From kbarrett at openjdk.org Sun Aug 11 18:36:38 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 11 Aug 2024 18:36:38 GMT Subject: Integrated: 8337709: Use allocated states for chunking large array processing In-Reply-To: References: Message-ID: On Fri, 2 Aug 2024 19:36:47 GMT, Kim Barrett wrote: > Please review this change to the G1 young/mixed collector to use allocated > states to encode partial array task chunking. > > States are allocated from per-worker-thread arena+free-list pairs, and > released to the free-list for the worker that completed use. They are > refcounted to track the number of refering tasks. > > Various other approaches (such as a single arena+FreeListAllocator) were > tested, but found to have worse performance, though in some cases fewer > allocations. The per-worker arena+free-list pair was the only option that > didn't show a regression compared to the previous PartialArrayScanTask > approach on a stress test. 
> > In addition to the changes to ScannerTask to support the new > PartialArrayState, it temporarily continues to support PartialArrayScanTask. > This is because ParallelGC will continue to use the latter until it is changed > to use PartialArrayState. The intent is to update ParallelGC in a followup CR. > > Testing: > mach5 tier1-5 > G1 performance suite This pull request has now been integrated. Changeset: 6a3d0452 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/6a3d045221c338fefec9bd59245324eae60b156b Stats: 501 lines in 9 files changed: 356 ins; 57 del; 88 mod 8337709: Use allocated states for chunking large array processing Reviewed-by: iwalulya, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/20445 From kbarrett at openjdk.org Mon Aug 12 05:21:32 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 12 Aug 2024 05:21:32 GMT Subject: RFR: 8337938: ZUtils::alloc_aligned allocates without reporting to NMT In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 12:47:18 GMT, Joel Sikstr?m wrote: > Replaces usage of posix_memalign/_aligned_malloc with os::malloc and manual alignment to report memory usage to NMT. Manually aligning the memory makes the returned address unfreeable by malloc (as clarified by the added comment), which is reasonable since the memory used by ZUtils::alloc_aligned is never freed. > > Tested with tiers 1-3. Looks good, except for some copyrights needing update. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20523#pullrequestreview-2232062157 From amitkumar at openjdk.org Mon Aug 12 05:25:33 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 12 Aug 2024 05:25:33 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v5] In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 11:48:17 GMT, Roberto Casta?eda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. 
>> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Give barrier generation helper functions a more consistent name is there issue if we replace this code: if (in_bytes(SATBMarkQueue::byte_width_of_active()) == 4) { __ ldrw(rscratch1, in_progress); } else { assert(in_bytes(SATBMarkQueue::byte_width_of_active()) == 1, "Assumption"); __ ldrb(rscratch1, in_progress); } in method `G1BarrierSetAssembler::gen_write_ref_array_pre_barrier` with `generate_queue_test_and_insertion(masm, rthread, rscratch1)` ? Though you have to move the `gen_write_ref_array_pre_barrier` on top otherwise compiler wouldn't be able to find it. 
------------- PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2232065079 From duke at openjdk.org Mon Aug 12 06:29:48 2024 From: duke at openjdk.org (Joel Sikström) Date: Mon, 12 Aug 2024 06:29:48 GMT Subject: RFR: 8337938: ZUtils::alloc_aligned allocates without reporting to NMT [v2] In-Reply-To: References: Message-ID: > Replaces usage of posix_memalign/_aligned_malloc with os::malloc and manual alignment to report memory usage to NMT. Manually aligning the memory makes the returned address unfreeable by malloc (as clarified by the added comment), which is reasonable since the memory used by ZUtils::alloc_aligned is never freed. > > Tested with tiers 1-3. Joel Sikström has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into zgc_zutils_alloc_aligned - Updated copyright years - Remove trailing whitespace - 8337938: ZUtils::alloc_aligned allocates without reporting to NMT ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20523/files - new: https://git.openjdk.org/jdk/pull/20523/files/cb942b1b..d227e0de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20523&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20523&range=00-01 Stats: 551 lines in 25 files changed: 357 ins; 77 del; 117 mod Patch: https://git.openjdk.org/jdk/pull/20523.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20523/head:pull/20523 PR: https://git.openjdk.org/jdk/pull/20523 From kbarrett at openjdk.org Mon Aug 12 07:12:37 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 12 Aug 2024 07:12:37 GMT Subject: RFR: 8337938: ZUtils::alloc_aligned allocates without reporting to NMT [v2] In-Reply-To: References: Message-ID: 
<6QCBXfFDwLUTkzz9hezTYFbRwUExGs3HOyNrXEDshks=.08f22e57-c52b-4374-9925-4f66ceed25a0@github.com> On Mon, 12 Aug 2024 06:29:48 GMT, Joel Sikstr?m wrote: >> Replaces usage of posix_memalign/_aligned_malloc with os::malloc and manual alignment to report memory usage to NMT. Manually aligning the memory makes the returned address unfreeable by malloc (as clarified by the added comment), which is reasonable since the memory used by ZUtils::alloc_aligned is never freed. >> >> Tested with tiers 1-3. > > Joel Sikstr?m has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into zgc_zutils_alloc_aligned > - Updated copyright years > - Remove trailing whitespace > - 8337938: ZUtils::alloc_aligned allocates without reporting to NMT Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20523#pullrequestreview-2232190875 From tschatzl at openjdk.org Mon Aug 12 07:44:35 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 12 Aug 2024 07:44:35 GMT Subject: RFR: 8337815: Relax G1EvacStats atomic operations In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 17:46:43 GMT, Ben Taylor wrote: > This PR should slightly improve the performance of G1EvacStats by using `memory_order_relaxed` instead of the default `memory_order_conservative`. > Since the original bug report says > >>I doubt it would show on benchmarks, this is a paper-cut issue. > > I haven't benchmarked this change for performance. > > The change passes all tests in `gc/g1` locally on x86_64 linux. Marked as reviewed by tschatzl (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20529#pullrequestreview-2232247488 From tschatzl at openjdk.org Mon Aug 12 07:45:33 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 12 Aug 2024 07:45:33 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics [v5] In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 14:54:26 GMT, Neethu Prasad wrote: >> **Notes** >> Adding logs to get more visibility into how fast a thread resumes from allocation stall. >> >> **Testing** >> * tier 1, tier 2, hotspot_gc tests. >> >> Example log messages >> >> 1. Last thread exiting. Performing GC after exiting critical section. Thread "main" 0 locked. >> >> 2. Thread exiting critical region Thread "main" 0 locked. >> >> 3. Thread stalled by JNI critical section. Resumed after 586ms. Thread "Thread-0". >> >> 4. Thread blocked to enter critical region. Resumed after 1240ms. Thread "SIGINT handler". > > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > address code style feedback Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20277#pullrequestreview-2232249505 From ayang at openjdk.org Mon Aug 12 08:01:35 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 12 Aug 2024 08:01:35 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics [v5] In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 14:54:26 GMT, Neethu Prasad wrote: >> **Notes** >> Adding logs to get more visibility into how fast a thread resumes from allocation stall. >> >> **Testing** >> * tier 1, tier 2, hotspot_gc tests. >> >> Example log messages >> >> 1. Last thread exiting. Performing GC after exiting critical section. Thread "main" 0 locked. >> >> 2. Thread exiting critical region Thread "main" 0 locked. >> >> 3. Thread stalled by JNI critical section. Resumed after 586ms. Thread "Thread-0". >> >> 4. Thread blocked to enter critical region. 
Resumed after 1240ms. Thread "SIGINT handler". > > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > address code style feedback src/hotspot/share/gc/shared/gcLocker.cpp line 139: > 137: // Wait for _needs_gc to be cleared > 138: while (needs_gc()) { > 139: GCLockerTimingDebugLogger logger("Thread stalled by JNI critical section."); If a spurious wakeup occurs, the logger will be instantiated multiple times, this can lead to confusing log msgs, right? If so, I wonder whether it makes sense to extract `logger` out of the while-iteration. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20277#discussion_r1713314902 From shade at openjdk.org Mon Aug 12 08:24:32 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 Aug 2024 08:24:32 GMT Subject: RFR: 8337815: Relax G1EvacStats atomic operations In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 17:46:43 GMT, Ben Taylor wrote: > This PR should slightly improve the performance of G1EvacStats by using `memory_order_relaxed` instead of the default `memory_order_conservative`. > Since the original bug report says > >>I doubt it would show on benchmarks, this is a paper-cut issue. > > I haven't benchmarked this change for performance. > > The change passes all tests in `gc/g1` locally on x86_64 linux. Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20529#pullrequestreview-2232324912 From rcastanedalo at openjdk.org Mon Aug 12 08:38:37 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 12 Aug 2024 08:38:37 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v5] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 05:23:06 GMT, Amit Kumar wrote: > is there issue if we replace this code: > > ``` > if (in_bytes(SATBMarkQueue::byte_width_of_active()) == 4) { > __ ldrw(rscratch1, in_progress); > } else { > assert(in_bytes(SATBMarkQueue::byte_width_of_active()) == 1, "Assumption"); > __ ldrb(rscratch1, in_progress); > } > ``` > > in method `G1BarrierSetAssembler::gen_write_ref_array_pre_barrier` with `generate_queue_test_and_insertion(masm, rthread, rscratch1)` ? > > Though you have to move the `gen_write_ref_array_pre_barrier` on top otherwise compiler wouldn't be able to find it. Thanks for the suggestion, Amit! This refactoring would work (assuming you mean `generate_pre_barrier_fast_path` instead of `generate_queue_test_and_insertion`); however, I am hesitant to apply it because 1) it would further increase the size of the changelog and hence the burden of reviewing it, and 2) it is not a clear maintainability win: some engineers prefer a little bit of code duplication to preserve the assembly code flow (see discussion [here](https://github.com/openjdk/jdk/pull/19746#discussion_r1645713269)). ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2283395013 From rcastanedalo at openjdk.org Mon Aug 12 08:46:16 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 12 Aug 2024 08:46:16 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v6] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. 
See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. 
> > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: Further motivate the choice of internal store address materialization in x64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/1834bf41..d21104ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=04-05 Stats: 4 lines in 1 file changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Mon Aug 12 08:46:16 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 12 Aug 2024 08:46:16 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <5Q8PqULlpKfoPLXRqI0ua0dVWAy3zPBqtFpycNwBg0Y=.f2830c84-63ba-43cd-85e3-2245e4ac8917@github.com> Message-ID: On Fri, 9 Aug 2024 14:05:43 GMT, Martin Doerr wrote: >> I agree that using indirect memory operands is the most readable choice, and is slightly less wasteful from a register usage perspective. 
However, when I tried this choice a couple of months ago, I observed timeouts in some CTW runs, which as far as I remember were caused when LCM processed huge basic blocks with lots of memory writes (e.g. arising from static initializations of large String arrays such as in [here](https://github.com/apache/lucene/blob/ea562f6ef2b32fe6eadf57c6381d9a69acb043c7/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/KStemData1.java#L47-L748)), in combination with C2 stress options. In these scenarios, the large number of additional Mach nodes seemed to cause the timeouts. I settled for materializing the store address internally to guard against such corner cases. I did not see any significant performance difference between the two choices in my benchmark results. >> >> I would like to study whether LCM can be made more robust in this scenario, which would enable using indirect memory operands here, but I think this would be best addressed in a separate RFE. Would it be OK by now to extend the code comment with the details provided in the above explanation? > > Ok, doing it in a separate RFE is fine with me. This sounds like a C2 problem which should get investigated. It may cause other performance problems, too. Maybe a native profiler can show what takes too much time. Thanks Martin, I have added this to my list of follow-up tasks and extended the comment in the code with some more details (commit d21104ca8). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1713372749 From amitkumar at openjdk.org Mon Aug 12 08:50:35 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 12 Aug 2024 08:50:35 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v5] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 08:35:57 GMT, Roberto Castañeda Lozano wrote: > > is there issue if we replace this code: > > ``` > > if (in_bytes(SATBMarkQueue::byte_width_of_active()) == 4) { > > __ ldrw(rscratch1, in_progress); > > } else { > > assert(in_bytes(SATBMarkQueue::byte_width_of_active()) == 1, "Assumption"); > > __ ldrb(rscratch1, in_progress); > > } > > ``` > > > > in method `G1BarrierSetAssembler::gen_write_ref_array_pre_barrier` with `generate_queue_test_and_insertion(masm, rthread, rscratch1)` ? > > Though you have to move the `gen_write_ref_array_pre_barrier` on top otherwise compiler wouldn't be able to find it. > > Thanks for the suggestion Amit! this refactoring would work (assuming you mean `generate_pre_barrier_fast_path` instead of `generate_queue_test_and_insertion`), however I am hesitant to apply it because 1) it would further increase the size of the changelog and hence the burden of reviewing it and 2) it is not a clear maintainability win: some engineers prefer a little bit of code duplication to preserve the assembly code flow (see discussion [here](https://github.com/openjdk/jdk/pull/19746#discussion_r1645713269)). Ha! Makes sense. Are you planning to rebase it on master? Nothing important, but there were a couple of failures which were fixed after this PR, so rebasing would make the test results a bit cleaner for us. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2283418237 From shade at openjdk.org Mon Aug 12 09:55:40 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 Aug 2024 09:55:40 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics [v5] In-Reply-To: References: Message-ID: <51nqp49EdM1zrS_Fck_dV_DaGPbK6ZXmMa4EwRWd5AE=.70d60821-e292-45ae-8b1b-990dc520600f@github.com> On Mon, 12 Aug 2024 07:59:17 GMT, Albert Mingkun Yang wrote: >> Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: >> >> address code style feedback > > src/hotspot/share/gc/shared/gcLocker.cpp line 139: > >> 137: // Wait for _needs_gc to be cleared >> 138: while (needs_gc()) { >> 139: GCLockerTimingDebugLogger logger("Thread stalled by JNI critical section."); > > If a spurious wakeup occurs, the logger will be instantiated multiple times, this can lead to confusing log msgs, right? If so, I wonder whether it makes sense to extract `logger` out of the while-iteration. Agreed. Same in `GCLocker::jni_lock` below. It would probably take the form of: if (needs_gc()) { GCLockerTracer::inc_stall_count(); log_debug_jni("Allocation failed. Thread stalled by JNI critical section."); GCLockerTimingDebugLogger logger("Thread stalled by JNI critical section."); while (needs_gc()) { ml.wait(); } } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20277#discussion_r1713459073 From duke at openjdk.org Mon Aug 12 11:01:36 2024 From: duke at openjdk.org (Joel Sikström) Date: Mon, 12 Aug 2024 11:01:36 GMT Subject: RFR: 8337938: ZUtils::alloc_aligned allocates without reporting to NMT [v2] In-Reply-To: References: Message-ID: <-fmLaWio8oZygaYx76ekJpOU7ZOAP9NZT943MOf0AdY=.ac0f1ffa-d9a4-4b45-a580-681d194235f8@github.com> On Mon, 12 Aug 2024 10:53:12 GMT, Stefan Karlsson wrote: >> Joel Sikström has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into zgc_zutils_alloc_aligned >> - Updated copyright years >> - Remove trailing whitespace >> - 8337938: ZUtils::alloc_aligned allocates without reporting to NMT > > Marked as reviewed by stefank (Reviewer). Thank you for the reviews! @stefank @kimbarrett ------------- PR Comment: https://git.openjdk.org/jdk/pull/20523#issuecomment-2283655289 From duke at openjdk.org Mon Aug 12 11:01:36 2024 From: duke at openjdk.org (duke) Date: Mon, 12 Aug 2024 11:01:36 GMT Subject: RFR: 8337938: ZUtils::alloc_aligned allocates without reporting to NMT [v2] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 06:29:48 GMT, Joel Sikström wrote: >> Replaces usage of posix_memalign/_aligned_malloc with os::malloc and manual alignment to report memory usage to NMT. Manually aligning the memory makes the returned address unfreeable by malloc (as clarified by the added comment), which is reasonable since the memory used by ZUtils::alloc_aligned is never freed. >> >> Tested with tiers 1-3. > > Joel Sikström has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into zgc_zutils_alloc_aligned > - Updated copyright years > - Remove trailing whitespace > - 8337938: ZUtils::alloc_aligned allocates without reporting to NMT @jsikstro Your change (at version d227e0dee82abb51f23ffbb6c2e199248a273123) is now ready to be sponsored by a Committer. 
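[Editorial sketch, not part of the original message.] The "manual alignment" mentioned in the quoted PR description can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `alloc_aligned_sketch` is a hypothetical helper, `std::malloc` stands in for `os::malloc`, and the real `ZUtils::alloc_aligned` may differ in its details.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper; NOT the real ZUtils::alloc_aligned.
// Assumes `alignment` is a power of two.
void* alloc_aligned_sketch(std::size_t alignment, std::size_t size) {
  // Over-allocate so that an aligned range of `size` bytes is guaranteed to
  // fit somewhere inside the block returned by malloc.
  void* raw = std::malloc(size + alignment - 1);
  if (raw == nullptr) {
    return nullptr;
  }

  // Round the raw address up to the next multiple of `alignment`.
  const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw);
  const std::uintptr_t aligned = (addr + mask) & ~mask;

  // The raw pointer is intentionally dropped here. The returned address may
  // differ from it, so it must never be passed to free(); that is acceptable
  // only because this memory lives for the rest of the process.
  return reinterpret_cast<void*>(aligned);
}
```

Because the aligned address may not be the one the allocator handed out, the caller cannot free it, which matches the trade-off the PR description calls reasonable for memory that is never freed.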
------------- PR Comment: https://git.openjdk.org/jdk/pull/20523#issuecomment-2283657113 From stefank at openjdk.org Mon Aug 12 11:01:36 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 12 Aug 2024 11:01:36 GMT Subject: RFR: 8337938: ZUtils::alloc_aligned allocates without reporting to NMT [v2] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 06:29:48 GMT, Joel Sikström wrote: >> Replaces usage of posix_memalign/_aligned_malloc with os::malloc and manual alignment to report memory usage to NMT. Manually aligning the memory makes the returned address unfreeable by malloc (as clarified by the added comment), which is reasonable since the memory used by ZUtils::alloc_aligned is never freed. >> >> Tested with tiers 1-3. > > Joel Sikström has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into zgc_zutils_alloc_aligned > - Updated copyright years > - Remove trailing whitespace > - 8337938: ZUtils::alloc_aligned allocates without reporting to NMT Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20523#pullrequestreview-2232648180 From duke at openjdk.org Mon Aug 12 11:01:37 2024 From: duke at openjdk.org (Joel Sikström) Date: Mon, 12 Aug 2024 11:01:37 GMT Subject: Integrated: 8337938: ZUtils::alloc_aligned allocates without reporting to NMT In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 12:47:18 GMT, Joel Sikström wrote: > Replaces usage of posix_memalign/_aligned_malloc with os::malloc and manual alignment to report memory usage to NMT. Manually aligning the memory makes the returned address unfreeable by malloc (as clarified by the added comment), which is reasonable since the memory used by ZUtils::alloc_aligned is never freed. > > Tested with tiers 1-3. 
This pull request has now been integrated. Changeset: a6c06307 Author: Joel Sikström Committer: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/a6c0630737bbf2f2e6c64863ff9b43c50c4742b6 Stats: 104 lines in 6 files changed: 13 ins; 83 del; 8 mod 8337938: ZUtils::alloc_aligned allocates without reporting to NMT Reviewed-by: stefank, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/20523 From rcastanedalo at openjdk.org Mon Aug 12 12:13:42 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 12 Aug 2024 12:13:42 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v5] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 08:48:24 GMT, Amit Kumar wrote: > Are you planning to rebase it with master? Nothing important, but there were couple of failures which are fixed after this PR. So will make test result a bit clean for us. Actually, I have refrained to update to the latest mainline changes to avoid interfering with the porting work while it is in progress, but if there is consensus among the port maintainers I would be happy to update the changeset regularly. @TheRealMDoerr @feilongjiang @offamitkumar @snazarkin what do you prefer? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2283802801 From mdoerr at openjdk.org Mon Aug 12 12:25:41 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 12 Aug 2024 12:25:41 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v6] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 08:46:16 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24.
The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. 
>> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Further motivate the choice of internal store address materialization in x64 I'm a bit concerned about regular updates. We should at least check if all platforms are in a good shape before merging. JDK head looks good at the moment, so I'd appreciate an update. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2283828888 From shade at openjdk.org Mon Aug 12 13:53:07 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 Aug 2024 13:53:07 GMT Subject: RFR: 8338202: Shenandoah: Improve handshake closure labels Message-ID: Currently, Shenandoah has a few handshakes that have not very clear names, "ShenandoahRendezvous". Would be good to make them more explicit.
------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/20549/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20549&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338202 Stats: 7 lines in 5 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20549.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20549/head:pull/20549 PR: https://git.openjdk.org/jdk/pull/20549 From rcastanedalo at openjdk.org Mon Aug 12 14:00:36 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 12 Aug 2024 14:00:36 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v6] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 12:23:24 GMT, Martin Doerr wrote: > I'm a bit concerned about regular updates. We should at least check if all platforms are in a good shape before merging. JDK head looks good at the moment, so I'd appreciate an update. OK, I will test and push a merge of jdk-24+10 (Thu Aug 8) in the next days, unless @feilongjiang or @snazarkin object. We can then check in a few weeks if another update is required. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2284064522 From mdoerr at openjdk.org Mon Aug 12 14:06:35 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 12 Aug 2024 14:06:35 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v6] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 08:46:16 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24.
> > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Further motivate the choice of internal store address materialization in x64 src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 203: > 201: // Do we need to load the previous value? > 202: if (obj != noreg) { > 203: __ load_heap_oop(pre_val, Address(obj, 0), noreg, noreg, AS_RAW); How do we handle implicit null checks for which `obj` is null? Note that we may expect the store instruction to trigger SIGSEGV. Does it work correctly if we trigger the SIGSEGV, here in the pre barrier? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1713842991 From rkennke at openjdk.org Mon Aug 12 16:23:31 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 12 Aug 2024 16:23:31 GMT Subject: RFR: 8338202: Shenandoah: Improve handshake closure labels In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 13:47:28 GMT, Aleksey Shipilev wrote: > Currently, Shenandoah has a few handshakes that have not very clear names, "ShenandoahRendezvous". Would be good to make them more explicit.
> > Before: > > > Event: 2.593 Executing VM operation: Shenandoah Init Marking > Event: 2.594 Executing VM operation: Shenandoah Init Marking done > Event: 2.599 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB Handshake) > Event: 2.599 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB Handshake) done > Event: 2.599 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB Handshake) > Event: 2.599 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB Handshake) done > Event: 2.599 Executing VM operation: Shenandoah Final Mark and Start Evacuation > Event: 2.600 Executing VM operation: Shenandoah Final Mark and Start Evacuation done > Event: 2.600 Executing VM operation: HandshakeAllThreads (ShenandoahRendezvous) > Event: 2.600 Executing VM operation: HandshakeAllThreads (ShenandoahRendezvous) done > Event: 2.604 Executing VM operation: HandshakeAllThreads (ShenandoahRendezvous) > Event: 2.604 Executing VM operation: HandshakeAllThreads (ShenandoahRendezvous) done > Event: 2.605 Executing VM operation: CleanClassLoaderDataMetaspaces > Event: 2.606 Executing VM operation: CleanClassLoaderDataMetaspaces done > Event: 2.606 Executing VM operation: Shenandoah Init Update References > Event: 2.606 Executing VM operation: Shenandoah Init Update References done > Event: 2.611 Executing VM operation: HandshakeAllThreads (Shenandoah Update Thread Roots) > Event: 2.611 Executing VM operation: HandshakeAllThreads (Shenandoah Update Thread Roots) done > Event: 2.611 Executing VM operation: Shenandoah Final Update References > Event: 2.611 Executing VM operation: Shenandoah Final Update References done > > > After: > > > Event: 1.043 Executing VM operation: Shenandoah Init Marking > Event: 1.044 Executing VM operation: Shenandoah Init Marking done > Event: 1.050 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB) > Event: 1.050 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB) done > 
Event: 1.050 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB) > Event: 1.050 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB) done > Event: 1.050 Executing VM operation: Shenandoah Final Mark and Start Evacuation > Event: 1.050 Executing VM operation: Shenandoah Final Mark and Start Evacuation done > Event: 1.051 Executing VM operation: HandshakeAllThreads (Shenandoah Concurrent Weak... Looks good to me, thank you! ------------- Marked as reviewed by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20549#pullrequestreview-2233517305 From duke at openjdk.org Mon Aug 12 17:24:36 2024 From: duke at openjdk.org (duke) Date: Mon, 12 Aug 2024 17:24:36 GMT Subject: RFR: 8337815: Relax G1EvacStats atomic operations In-Reply-To: References: Message-ID: <5QnG1iNd_jkaPF8PaoK7zFcXBmD-720JdoozCpwgzJQ=.65d8801d-be2d-4700-b5df-5b2d282bcf25@github.com> On Fri, 9 Aug 2024 17:46:43 GMT, Ben Taylor wrote: > This PR should slightly improve the performance of G1EvacStats by using `memory_order_relaxed` instead of the default `memory_order_conservative`. > Since the original bug report says > >>I doubt it would show on benchmarks, this is a paper-cut issue. > > I haven't benchmarked this change for performance. > > The change passes all tests in `gc/g1` locally on x86_64 linux. @benty-amzn Your change (at version 2c113c71dd08a7f2c15c0f48c1fc3cd27cb5a725) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20529#issuecomment-2284547524 From btaylor at openjdk.org Mon Aug 12 17:29:34 2024 From: btaylor at openjdk.org (Ben Taylor) Date: Mon, 12 Aug 2024 17:29:34 GMT Subject: Integrated: 8337815: Relax G1EvacStats atomic operations In-Reply-To: References: Message-ID: On Fri, 9 Aug 2024 17:46:43 GMT, Ben Taylor wrote: > This PR should slightly improve the performance of G1EvacStats by using `memory_order_relaxed` instead of the default `memory_order_conservative`. 
> Since the original bug report says > >>I doubt it would show on benchmarks, this is a paper-cut issue. > > I haven't benchmarked this change for performance. > > The change passes all tests in `gc/g1` locally on x86_64 linux. This pull request has now been integrated. Changeset: 2ca136a7 Author: Ben Taylor Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/2ca136a7adb6defaea3b7a69d30e6c36bda66e6a Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod 8337815: Relax G1EvacStats atomic operations Reviewed-by: kbarrett, tschatzl, shade ------------- PR: https://git.openjdk.org/jdk/pull/20529 From zgu at openjdk.org Mon Aug 12 17:58:44 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Mon, 12 Aug 2024 17:58:44 GMT Subject: RFR: 8338248: PartialArrayStateAllocator::Impl leaks Arena array Message-ID: <0vEz5GvIeko4KzQYITYb-6KDn8cvv0nanpJCoe19R44=.5a30b0e3-c990-4374-9e66-800f5cdac038@github.com> Simple fix for a memory leak ------------- Commit messages: - 8338248: PartialArrayStateAllocator::Impl leaks Arena array Changes: https://git.openjdk.org/jdk/pull/20557/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20557&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338248 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20557.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20557/head:pull/20557 PR: https://git.openjdk.org/jdk/pull/20557 From kbarrett at openjdk.org Mon Aug 12 18:24:30 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 12 Aug 2024 18:24:30 GMT Subject: RFR: 8338248: PartialArrayStateAllocator::Impl leaks Arena array In-Reply-To: <0vEz5GvIeko4KzQYITYb-6KDn8cvv0nanpJCoe19R44=.5a30b0e3-c990-4374-9e66-800f5cdac038@github.com> References: <0vEz5GvIeko4KzQYITYb-6KDn8cvv0nanpJCoe19R44=.5a30b0e3-c990-4374-9e66-800f5cdac038@github.com> Message-ID: <5RT8WSSVIVZIbQQYNCXYjFV29OJWVO9aKM5eXfxCp64=.b92c2a16-e53a-406d-88bd-468239b4730c@github.com> On Mon, 12 Aug 2024 17:52:55 GMT, 
Zhengyu Gu wrote: > Simple fix for a memory leak Marked as reviewed by kbarrett (Reviewer). src/hotspot/share/gc/shared/partialArrayState.cpp line 104: > 102: } > 103: > 104: FREE_C_HEAP_ARRAY(Arena*, _arenas); Bleh. Thanks for catching this. I don't see a need for the extra blank line, but otherwise this is good. ------------- PR Review: https://git.openjdk.org/jdk/pull/20557#pullrequestreview-2233741819 PR Review Comment: https://git.openjdk.org/jdk/pull/20557#discussion_r1714200858 From shade at openjdk.org Mon Aug 12 18:52:47 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 Aug 2024 18:52:47 GMT Subject: RFR: 8338248: PartialArrayStateAllocator::Impl leaks Arena array [v2] In-Reply-To: References: <0vEz5GvIeko4KzQYITYb-6KDn8cvv0nanpJCoe19R44=.5a30b0e3-c990-4374-9e66-800f5cdac038@github.com> Message-ID: On Mon, 12 Aug 2024 18:49:53 GMT, Zhengyu Gu wrote: >> Simple fix for a memory leak > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Remove empty line Oops. Looks fine. I agree there is no need for an empty line. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20557#pullrequestreview-2233786817 From zgu at openjdk.org Mon Aug 12 18:52:46 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Mon, 12 Aug 2024 18:52:46 GMT Subject: RFR: 8338248: PartialArrayStateAllocator::Impl leaks Arena array [v2] In-Reply-To: <0vEz5GvIeko4KzQYITYb-6KDn8cvv0nanpJCoe19R44=.5a30b0e3-c990-4374-9e66-800f5cdac038@github.com> References: <0vEz5GvIeko4KzQYITYb-6KDn8cvv0nanpJCoe19R44=.5a30b0e3-c990-4374-9e66-800f5cdac038@github.com> Message-ID: > Simple fix for a memory leak Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Remove empty line ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20557/files - new: https://git.openjdk.org/jdk/pull/20557/files/b24a60e0..1f4739dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20557&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20557&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20557.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20557/head:pull/20557 PR: https://git.openjdk.org/jdk/pull/20557 From kbarrett at openjdk.org Mon Aug 12 18:52:47 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 12 Aug 2024 18:52:47 GMT Subject: RFR: 8338248: PartialArrayStateAllocator::Impl leaks Arena array [v2] In-Reply-To: References: <0vEz5GvIeko4KzQYITYb-6KDn8cvv0nanpJCoe19R44=.5a30b0e3-c990-4374-9e66-800f5cdac038@github.com> Message-ID: <9FRgTgbD1tJpCA2U85Rx27BpnyTk1NGitBluGhLnGeE=.a8febcfa-bbad-4d04-b764-c1a72edaf807@github.com> On Mon, 12 Aug 2024 18:49:53 GMT, Zhengyu Gu wrote: >> Simple fix for a memory leak > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Remove empty line Looks good, and trivial. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20557#pullrequestreview-2233789957 From shade at openjdk.org Mon Aug 12 18:58:39 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 Aug 2024 18:58:39 GMT Subject: RFR: 8338248: PartialArrayStateAllocator::Impl leaks Arena array [v2] In-Reply-To: References: <0vEz5GvIeko4KzQYITYb-6KDn8cvv0nanpJCoe19R44=.5a30b0e3-c990-4374-9e66-800f5cdac038@github.com> Message-ID: On Mon, 12 Aug 2024 18:52:46 GMT, Zhengyu Gu wrote: >> Simple fix for a memory leak > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Remove empty line Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20557#pullrequestreview-2233801718 From ysr at openjdk.org Mon Aug 12 20:11:32 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 12 Aug 2024 20:11:32 GMT Subject: RFR: 8338202: Shenandoah: Improve handshake closure labels In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 13:47:28 GMT, Aleksey Shipilev wrote: > Currently, Shenandoah has a few handshakes that have not very clear names, "ShenandoahRendezvous". Would be good to make them more explicit. 
Marked as reviewed by ysr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20549#pullrequestreview-2233923775 From wkemper at openjdk.org Mon Aug 12 20:28:32 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 12 Aug 2024 20:28:32 GMT Subject: RFR: 8338202: Shenandoah: Improve handshake closure labels In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 13:47:28 GMT, Aleksey Shipilev wrote: > Currently, Shenandoah has a few handshakes that have not very clear names, "ShenandoahRendezvous". Would be good to make them more explicit. Marked as reviewed by wkemper (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/20549#pullrequestreview-2233950899 From nprasad at openjdk.org Mon Aug 12 22:39:25 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Mon, 12 Aug 2024 22:39:25 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics [v6] In-Reply-To: References: Message-ID: <-UX_bf4EPxAUOOYQrK69o4o0ewKcMx8ajJog5bdCspE=.b3ce2098-a937-4c81-8aca-1993c14b9e28@github.com> > **Notes** > Adding logs to get more visibility into how fast a thread resumes from allocation stall. > > **Testing** > * tier 1, tier 2, hotspot_gc tests. > > Example log messages > > 1. Last thread exiting. Performing GC after exiting critical section. Thread "main" 0 locked. > > 2. Thread exiting critical region Thread "main" 0 locked. > > 3. Thread stalled by JNI critical section. Resumed after 586ms. Thread "Thread-0". > > 4. Thread blocked to enter critical region. Resumed after 1240ms. Thread "SIGINT handler". Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: address feedback regarding logger potentially getting instantiated multiple times ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20277/files - new: https://git.openjdk.org/jdk/pull/20277/files/77fd9d55..6ca6f29d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20277&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20277&range=04-05 Stats: 21 lines in 1 file changed: 5 ins; 6 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/20277.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20277/head:pull/20277 PR: https://git.openjdk.org/jdk/pull/20277 From zgu at openjdk.org Mon Aug 12 23:03:02 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Mon, 12 Aug 2024 23:03:02 GMT Subject: Integrated: 8338248: PartialArrayStateAllocator::Impl leaks Arena array In-Reply-To: <0vEz5GvIeko4KzQYITYb-6KDn8cvv0nanpJCoe19R44=.5a30b0e3-c990-4374-9e66-800f5cdac038@github.com> References: 
<0vEz5GvIeko4KzQYITYb-6KDn8cvv0nanpJCoe19R44=.5a30b0e3-c990-4374-9e66-800f5cdac038@github.com> Message-ID: On Mon, 12 Aug 2024 17:52:55 GMT, Zhengyu Gu wrote: > Simple fix for a memory leak This pull request has now been integrated. Changeset: e70c9bcc Author: Zhengyu Gu URL: https://git.openjdk.org/jdk/commit/e70c9bccaae375be1ee6812dabc9fbaff01a6ab0 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8338248: PartialArrayStateAllocator::Impl leaks Arena array Reviewed-by: kbarrett, shade ------------- PR: https://git.openjdk.org/jdk/pull/20557 From zgu at openjdk.org Mon Aug 12 23:03:01 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Mon, 12 Aug 2024 23:03:01 GMT Subject: RFR: 8338248: PartialArrayStateAllocator::Impl leaks Arena array [v2] In-Reply-To: References: <0vEz5GvIeko4KzQYITYb-6KDn8cvv0nanpJCoe19R44=.5a30b0e3-c990-4374-9e66-800f5cdac038@github.com> Message-ID: On Mon, 12 Aug 2024 18:52:46 GMT, Zhengyu Gu wrote: >> Simple fix for a memory leak > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Remove empty line Thanks, @kimbarrett @shipilev ------------- PR Comment: https://git.openjdk.org/jdk/pull/20557#issuecomment-2285039310 From ayang at openjdk.org Tue Aug 13 06:37:51 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 13 Aug 2024 06:37:51 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics [v6] In-Reply-To: <-UX_bf4EPxAUOOYQrK69o4o0ewKcMx8ajJog5bdCspE=.b3ce2098-a937-4c81-8aca-1993c14b9e28@github.com> References: <-UX_bf4EPxAUOOYQrK69o4o0ewKcMx8ajJog5bdCspE=.b3ce2098-a937-4c81-8aca-1993c14b9e28@github.com> Message-ID: <6bz0xVqL9XZkIlX4r6JJViPtMs6ZiSNkByQ0UUSyPb0=.fd456e16-7e5d-43ef-bcfd-439ed05f3bcd@github.com> On Mon, 12 Aug 2024 22:39:25 GMT, Neethu Prasad wrote: >> **Notes** >> Adding logs to get more visibility into how fast a thread resumes from allocation stall. >> >> **Testing** >> * tier 1, tier 2, hotspot_gc tests. >> >> Example log messages >> >> 1. 
Last thread exiting. Performing GC after exiting critical section. Thread "main" 0 locked. >> >> 2. Thread exiting critical region Thread "main" 0 locked. >> >> 3. Thread stalled by JNI critical section. Resumed after 586ms. Thread "Thread-0". >> >> 4. Thread blocked to enter critical region. Resumed after 1240ms. Thread "SIGINT handler". > > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > address feedback regarding logger potentially getting instantiated multiple times > 2. Thread exiting critical region Thread "main" 0 locked. Seems that there is period missing after "region" in the PR description. (The code looks correct to me.) src/hotspot/share/gc/shared/gcLocker.cpp line 58: > 56: ResourceMark rm; // JavaThread::name() allocates to convert to UTF8 > 57: const Tickspan elapsed_time = Ticks::now() - _start; > 58: log.debug("%s Resumed after " UINT64_FORMAT "ms. Thread \"%s\".", _log_message, elapsed_time.milliseconds(), Thread::current()->name()); Maybe `UINT64_FORMAT` can be replaced by `%zu`? ------------- PR Review: https://git.openjdk.org/jdk/pull/20277#pullrequestreview-2234616989 PR Review Comment: https://git.openjdk.org/jdk/pull/20277#discussion_r1714739408 From ayang at openjdk.org Tue Aug 13 07:33:15 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 13 Aug 2024 07:33:15 GMT Subject: RFR: 8338280: Parallel: Inline ParallelCompactData::verify_clear Message-ID: Trivial inlining a method to its sole caller. 
------------- Commit messages: - trivial Changes: https://git.openjdk.org/jdk/pull/20561/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20561&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338280 Stats: 9 lines in 2 files changed: 0 ins; 6 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20561.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20561/head:pull/20561 PR: https://git.openjdk.org/jdk/pull/20561 From tschatzl at openjdk.org Tue Aug 13 08:03:51 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 13 Aug 2024 08:03:51 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics [v6] In-Reply-To: <6bz0xVqL9XZkIlX4r6JJViPtMs6ZiSNkByQ0UUSyPb0=.fd456e16-7e5d-43ef-bcfd-439ed05f3bcd@github.com> References: <-UX_bf4EPxAUOOYQrK69o4o0ewKcMx8ajJog5bdCspE=.b3ce2098-a937-4c81-8aca-1993c14b9e28@github.com> <6bz0xVqL9XZkIlX4r6JJViPtMs6ZiSNkByQ0UUSyPb0=.fd456e16-7e5d-43ef-bcfd-439ed05f3bcd@github.com> Message-ID: <8GXGKTUrLt8RihoOCRCZN0SI_LHPrQyQEOZOO_KEyMw=.1f1c1a8f-a927-45f7-a98e-213fe2f15558@github.com> On Tue, 13 Aug 2024 06:33:31 GMT, Albert Mingkun Yang wrote: >> Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: >> >> address feedback regarding logger potentially getting instantiated multiple times > > src/hotspot/share/gc/shared/gcLocker.cpp line 58: > >> 56: ResourceMark rm; // JavaThread::name() allocates to convert to UTF8 >> 57: const Tickspan elapsed_time = Ticks::now() - _start; >> 58: log.debug("%s Resumed after " UINT64_FORMAT "ms. Thread \"%s\".", _log_message, elapsed_time.milliseconds(), Thread::current()->name()); > > Maybe `UINT64_FORMAT` can be replaced by `%zu`? `Tickspan::milliseconds()` returns an `uint64_t` which is not `size_t`; the `z` length modifier is only for `size_t` and would be wrong to use here on 32 bit systems. 
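The point about length modifiers in the reply above can be illustrated in standard C++. This is a sketch under assumptions rather than the code from the PR: it uses the `PRIu64` macro from `<cinttypes>` where the HotSpot source uses `UINT64_FORMAT`, and `format_elapsed` is a hypothetical helper introduced only for this example.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a millisecond count held in a uint64_t.
static std::string format_elapsed(std::uint64_t ms) {
  char buf[64];
  // PRIu64 expands to the conversion specifier that matches uint64_t on
  // the target platform. "%zu" expects a size_t argument instead; on
  // typical 32-bit targets size_t is only 32 bits wide, so passing a
  // uint64_t to "%zu" there would be undefined behavior.
  std::snprintf(buf, sizeof(buf), "Resumed after %" PRIu64 "ms.", ms);
  return std::string(buf);
}
```

The second assertion one might try, a value above 2^32, is exactly the case where a mismatched `%zu` could print garbage on a 32-bit build.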
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20277#discussion_r1714850063 From shade at openjdk.org Tue Aug 13 08:14:53 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 Aug 2024 08:14:53 GMT Subject: RFR: 8338202: Shenandoah: Improve handshake closure labels In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 13:47:28 GMT, Aleksey Shipilev wrote: > Currently, Shenandoah has a few handshakes that have not very clear names, "ShenandoahRendezvous". Would be good to make them more explicit. > > Before: > > > Event: 2.593 Executing VM operation: Shenandoah Init Marking > Event: 2.594 Executing VM operation: Shenandoah Init Marking done > Event: 2.599 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB Handshake) > Event: 2.599 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB Handshake) done > Event: 2.599 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB Handshake) > Event: 2.599 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB Handshake) done > Event: 2.599 Executing VM operation: Shenandoah Final Mark and Start Evacuation > Event: 2.600 Executing VM operation: Shenandoah Final Mark and Start Evacuation done > Event: 2.600 Executing VM operation: HandshakeAllThreads (ShenandoahRendezvous) > Event: 2.600 Executing VM operation: HandshakeAllThreads (ShenandoahRendezvous) done > Event: 2.604 Executing VM operation: HandshakeAllThreads (ShenandoahRendezvous) > Event: 2.604 Executing VM operation: HandshakeAllThreads (ShenandoahRendezvous) done > Event: 2.605 Executing VM operation: CleanClassLoaderDataMetaspaces > Event: 2.606 Executing VM operation: CleanClassLoaderDataMetaspaces done > Event: 2.606 Executing VM operation: Shenandoah Init Update References > Event: 2.606 Executing VM operation: Shenandoah Init Update References done > Event: 2.611 Executing VM operation: HandshakeAllThreads (Shenandoah Update Thread Roots) > Event: 2.611 Executing VM operation: 
HandshakeAllThreads (Shenandoah Update Thread Roots) done > Event: 2.611 Executing VM operation: Shenandoah Final Update References > Event: 2.611 Executing VM operation: Shenandoah Final Update References done > > > After: > > > Event: 1.043 Executing VM operation: Shenandoah Init Marking > Event: 1.044 Executing VM operation: Shenandoah Init Marking done > Event: 1.050 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB) > Event: 1.050 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB) done > Event: 1.050 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB) > Event: 1.050 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB) done > Event: 1.050 Executing VM operation: Shenandoah Final Mark and Start Evacuation > Event: 1.050 Executing VM operation: Shenandoah Final Mark and Start Evacuation done > Event: 1.051 Executing VM operation: HandshakeAllThreads (Shenandoah Concurrent Weak... Thanks all! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20549#issuecomment-2285630273 From shade at openjdk.org Tue Aug 13 08:14:54 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 Aug 2024 08:14:54 GMT Subject: Integrated: 8338202: Shenandoah: Improve handshake closure labels In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 13:47:28 GMT, Aleksey Shipilev wrote: > Currently, Shenandoah has a few handshakes that have not very clear names, "ShenandoahRendezvous". Would be good to make them more explicit. 
> > Before: > > > Event: 2.593 Executing VM operation: Shenandoah Init Marking > Event: 2.594 Executing VM operation: Shenandoah Init Marking done > Event: 2.599 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB Handshake) > Event: 2.599 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB Handshake) done > Event: 2.599 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB Handshake) > Event: 2.599 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB Handshake) done > Event: 2.599 Executing VM operation: Shenandoah Final Mark and Start Evacuation > Event: 2.600 Executing VM operation: Shenandoah Final Mark and Start Evacuation done > Event: 2.600 Executing VM operation: HandshakeAllThreads (ShenandoahRendezvous) > Event: 2.600 Executing VM operation: HandshakeAllThreads (ShenandoahRendezvous) done > Event: 2.604 Executing VM operation: HandshakeAllThreads (ShenandoahRendezvous) > Event: 2.604 Executing VM operation: HandshakeAllThreads (ShenandoahRendezvous) done > Event: 2.605 Executing VM operation: CleanClassLoaderDataMetaspaces > Event: 2.606 Executing VM operation: CleanClassLoaderDataMetaspaces done > Event: 2.606 Executing VM operation: Shenandoah Init Update References > Event: 2.606 Executing VM operation: Shenandoah Init Update References done > Event: 2.611 Executing VM operation: HandshakeAllThreads (Shenandoah Update Thread Roots) > Event: 2.611 Executing VM operation: HandshakeAllThreads (Shenandoah Update Thread Roots) done > Event: 2.611 Executing VM operation: Shenandoah Final Update References > Event: 2.611 Executing VM operation: Shenandoah Final Update References done > > > After: > > > Event: 1.043 Executing VM operation: Shenandoah Init Marking > Event: 1.044 Executing VM operation: Shenandoah Init Marking done > Event: 1.050 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB) > Event: 1.050 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB) done > 
Event: 1.050 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB) > Event: 1.050 Executing VM operation: HandshakeAllThreads (Shenandoah Flush SATB) done > Event: 1.050 Executing VM operation: Shenandoah Final Mark and Start Evacuation > Event: 1.050 Executing VM operation: Shenandoah Final Mark and Start Evacuation done > Event: 1.051 Executing VM operation: HandshakeAllThreads (Shenandoah Concurrent Weak... This pull request has now been integrated. Changeset: ba69ed7c Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/ba69ed7c58fcf99ed18dfd8840125ddcac9460bb Stats: 7 lines in 5 files changed: 0 ins; 0 del; 7 mod 8338202: Shenandoah: Improve handshake closure labels Reviewed-by: rkennke, ysr, wkemper ------------- PR: https://git.openjdk.org/jdk/pull/20549 From rcastanedalo at openjdk.org Tue Aug 13 14:23:59 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 13 Aug 2024 14:23:59 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v6] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 14:03:53 GMT, Martin Doerr wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Further motivate the choice of internal store address materialization in x64 > > src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 203: > >> 201: // Do we need to load the previous value? >> 202: if (obj != noreg) { >> 203: __ load_heap_oop(pre_val, Address(obj, 0), noreg, noreg, AS_RAW); > > How do we handle implicit null checks for which `obj` is null? Note that we may expect the store instruction to trigger SIGSEGV. Does it work correctly if we trigger the SIGSEGV, here in the pre barrier? Good question! I have checked (no pun intended) and it turns out C2 never uses stores (or other memory access operations) with late-expanded barriers to perform implicit null checks. 
This is accidental, due to the fact that all these memory operations use MachTemp nodes in the C2 code generation stage (to reserve registers for their ADL TEMP operands). C2's implicit null check analysis requires that all inputs of a candidate memory operation dominate the null check [1], which fails if the operation uses MachTemp nodes, since these are always placed in the same basic block [2]. Note that this optimization triggers very rarely, if at all, for memory operations in the current early barrier expansion model, since the additional control flow of the barrier code obfuscates the analysis. For late barrier expansion, the analysis could be easily extended to recognize and hoist MachTemp nodes together with their user memory operation that is a candidate to implement the implicit null check [3], but that would require extending the barrier assembly emission step to populate the implicit null exception table correctly. Since this seems non-trivial, and would also affect other garbage collectors (ZGC), I suggest to simply assert for now that we do not generate implicit null checks for memory operations with barrier data (as in [4]), and leave full support for implicit null checks for these G1 and ZGC operations to a future RFE. What do you think? 
[1] https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/lcm.cpp#L310-L328 [2] https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/gcm.cpp#L1397-L1404 [3] https://github.com/robcasloz/jdk/commit/e0ab1e418b81c0acddff2190ca57b3335b5214ba [4] https://github.com/robcasloz/jdk/commit/e0ab1e418b81c0acddff2190ca57b3335b5214ba#diff-554fddca91406a67dc0f8faee12dc30c709181685a0add7f4ba9ae5ace68f192R2031-R2032 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1715387255 From wkemper at openjdk.org Tue Aug 13 16:20:51 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 13 Aug 2024 16:20:51 GMT Subject: RFR: 8336742: Shenandoah: Add more verbose logging/stats for mark termination attempts [v4] In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 13:19:12 GMT, Neethu Prasad wrote: >> **Revision 2 Notes** >> 1. Added time spent on handshaking all threads requesting them to flush their SATB buffers as part of GC stats. >> 2. As mentioned in PR feedback, will raise separate PR to adding logging in ShenandoahTimingsTracker. >> >> **Revision 1 Notes** >> This PR adds the following >> 1. info logging on number of SATB flush attempts >> 3. total time spend on handshaking all threads requesting them to flush their SATB buffers. >> >> As suggested by William in [JDK-8336742 ](https://bugs.openjdk.org/browse/JDK-83367420), we can use handshake logging to get time spend and other stats for each handshake. >> >> [4.515s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1035, Total completion time: 597004 ns >> [4.517s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1033, Total completion time: 207402 ns >> >> >> **Testing** >> 1. tier1, tier2 and hotspot_gc_shenandoah tests. >> 2. 
**-Xlog:gc+stats=info** >> >> >> [37.087s][info][gc,stats] CMR: VM Strong Roots 413 us, workers (us): 64, 57, 52, 47, 38, 31, 30, 25, 20, 21, 17, 10, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [37.087s][info][gc,stats] CMR: CLDG Roots 449 us, workers (us): 4, ---, ---, 406, ---, 15, ---, 4, 4, ---, ---, 17, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [37.087s][info][gc,stats] Concurrent Marking 5002 us >> [37.087s][info][gc,stats] SATB Flush Rendezvous 1748 us >> [37.087s][info][gc,stats] Pause Final Mark (G) 57272 us >> [37.087s][info][gc,stats] Pause Final Mark (N) 56985 us >> [37.087s][info][gc,stats] Finish Mark 387 us >> [37.087s][info][gc,stats] Update Region States 109 us >> [37.087s][info][gc,stats] Choose Collection Set 56395 us >> [37.087s][info][gc,stats] Rebuild Free Set 40 us >> >> >> on app termination >> >> >> [40.640s][info][gc,stats] Concurrent Reset = 0.914 s (a = 65255 us) (n = 14) (lvls, us = 54883, 55859, 63867, 65234, 97096) >> [40.640s][info][gc,stats] Pause Init Mark (G) = 1.755 s (a = 125380 us) (n = 14) (lvls, us = 119141, 123047, 125000, 125000, 128042) >> [40.640s][info][gc,stats] Pause Init Mark (N) = 1.697 s (a = 121241 us... > > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > Address feedback on code style Suggest changing phase name in logs for consistency with https://github.com/openjdk/jdk/pull/20549. src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp line 60: > 58: SHENANDOAH_PAR_PHASE_DO(conc_mark_roots, " CMR: ", f) \ > 59: f(conc_mark, "Concurrent Marking") \ > 60: f(conc_mark_satb_flush_rendezvous, " SATB Flush Rendezvous") \ Could this be "Flush SATB" or "Flush SATB Handshakes" for consistency with https://github.com/openjdk/jdk/pull/20549? ------------- Changes requested by wkemper (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/20318#pullrequestreview-2236015311 PR Review Comment: https://git.openjdk.org/jdk/pull/20318#discussion_r1715576931 From shade at openjdk.org Tue Aug 13 16:24:03 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 Aug 2024 16:24:03 GMT Subject: RFR: 8336742: Shenandoah: Add more verbose logging/stats for mark termination attempts [v4] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 16:17:53 GMT, William Kemper wrote: >> Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: >> >> Address feedback on code style > > src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp line 60: > >> 58: SHENANDOAH_PAR_PHASE_DO(conc_mark_roots, " CMR: ", f) \ >> 59: f(conc_mark, "Concurrent Marking") \ >> 60: f(conc_mark_satb_flush_rendezvous, " SATB Flush Rendezvous") \ > > Could this be "Flush SATB" or "Flush SATB Handshakes" for consistency with https://github.com/openjdk/jdk/pull/20549? `Flush SATB` would be good, I think. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20318#discussion_r1715581820 From wkemper at openjdk.org Tue Aug 13 16:24:58 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 13 Aug 2024 16:24:58 GMT Subject: RFR: 8335865: Shenandoah: Improve THP pretouch after JDK-8315923 In-Reply-To: References: Message-ID: On Fri, 19 Jul 2024 14:28:24 GMT, Neethu Prasad wrote: > **Notes** > os::pretouch is now using madvice now when available and has a fall back to using vm page size [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923) > Hence removing code that sets _pretouch_heap_page_size & _pretouch_bitmap_page_size in Shenandoah. > > **Testing** > > * Ran test in Linux 5.10 and Linux 6.x and confirmed that there is no regression. I could not replicate the issue or performance improvement though. 
[add results] > * Ran [TestTransparentHugePageUsage](https://github.com/openjdk/jdk/commit/a65a89522d2f24b1767e1c74f6689a22ea32ca6a) for Shenandoah and verified that test passed > * Ran tier 1, tier 2 , tier1_gc_shenandoah, tier2_gc_shenandoah, tier3_gc_shenandoah and hotspot_gc_shenandoah. Marked as reviewed by wkemper (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20254#pullrequestreview-2236025646 From nprasad at openjdk.org Tue Aug 13 16:37:14 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Tue, 13 Aug 2024 16:37:14 GMT Subject: RFR: 8336742: Shenandoah: Add more verbose logging/stats for mark termination attempts [v5] In-Reply-To: References: Message-ID: > **Revision 2 Notes** > 1. Added time spent on handshaking all threads requesting them to flush their SATB buffers as part of GC stats. > 2. As mentioned in PR feedback, will raise separate PR to adding logging in ShenandoahTimingsTracker. > > **Revision 1 Notes** > This PR adds the following > 1. info logging on number of SATB flush attempts > 3. total time spend on handshaking all threads requesting them to flush their SATB buffers. > > As suggested by William in [JDK-8336742 ](https://bugs.openjdk.org/browse/JDK-83367420), we can use handshake logging to get time spend and other stats for each handshake. > > [4.515s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1035, Total completion time: 597004 ns > [4.517s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1033, Total completion time: 207402 ns > > > **Testing** > 1. tier1, tier2 and hotspot_gc_shenandoah tests. > 2. 
**-Xlog:gc+stats=info** > > > [4.911s][info][gc,stats] CMR: VM Strong Roots 539 us, workers (us): 102, 89, 89, 79, 46, 40, 40, 25, 24, 1, 2, 2, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, > [4.911s][info][gc,stats] CMR: CLDG Roots 461 us, workers (us): 429, 5, 12, 11, 4, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, > [4.911s][info][gc,stats] Concurrent Marking 4392 us > [4.911s][info][gc,stats] Flush SATB 1035 us > [4.911s][info][gc,stats] Pause Final Mark (G) 2615 us > [4.912s][info][gc,stats] Pause Final Mark (N) 2339 us > [4.912s][info][gc,stats] Finish Mark 780 us > [4.912s][info][gc,stats] Update Region States 109 us > [4.912s][info][gc,stats] Choose Collection Set 1336 us > [4.912s][info][gc,stats] Rebuild Free Set 23 us > > > on app termination > > > 4.924s][info][gc,stats] Concurrent Reset = 0.042 s (a = 1846 us) (n = 23) (lvls, us = 1113, 1660, 1895, 2031, 2674) > [4.924s][info][gc,stats] Pause Init Mark (G) = 0.073 s (a = 3163 us) (n = 23) (lvls, us = 2812, 2949, 3047, 3281, 3790) > [4.924s][info][gc,stats] Pause Init Mark (N) = 0.065 s (a = 2810 us) (n = 23) (lvls, us = 2578, 2676, 2793, 2793, 3260) > [4.924s][info]... 
Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: Rename phase to Flush SATB ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20318/files - new: https://git.openjdk.org/jdk/pull/20318/files/9649c2ca..1609ed50 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20318&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20318&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20318.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20318/head:pull/20318 PR: https://git.openjdk.org/jdk/pull/20318 From nprasad at openjdk.org Tue Aug 13 16:37:15 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Tue, 13 Aug 2024 16:37:15 GMT Subject: RFR: 8336742: Shenandoah: Add more verbose logging/stats for mark termination attempts [v4] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 16:21:23 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp line 60: >> >>> 58: SHENANDOAH_PAR_PHASE_DO(conc_mark_roots, " CMR: ", f) \ >>> 59: f(conc_mark, "Concurrent Marking") \ >>> 60: f(conc_mark_satb_flush_rendezvous, " SATB Flush Rendezvous") \ >> >> Could this be "Flush SATB" or "Flush SATB Handshakes" for consistency with https://github.com/openjdk/jdk/pull/20549? > > `Flush SATB` would be good, I think. 
will rename to "Flush SATB" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20318#discussion_r1715601134 From nprasad at openjdk.org Tue Aug 13 16:47:49 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Tue, 13 Aug 2024 16:47:49 GMT Subject: RFR: 8335865: Shenandoah: Improve THP pretouch after JDK-8315923 In-Reply-To: References: Message-ID: On Fri, 19 Jul 2024 14:28:24 GMT, Neethu Prasad wrote: > **Notes** > os::pretouch is now using madvice now when available and has a fall back to using vm page size [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923) > Hence removing code that sets _pretouch_heap_page_size & _pretouch_bitmap_page_size in Shenandoah. > > **Testing** > > * Ran test in Linux 5.10 and Linux 6.x and confirmed that there is no regression. I could not replicate the issue or performance improvement though. [add results] > * Ran [TestTransparentHugePageUsage](https://github.com/openjdk/jdk/commit/a65a89522d2f24b1767e1c74f6689a22ea32ca6a) for Shenandoah and verified that test passed > * Ran tier 1, tier 2 , tier1_gc_shenandoah, tier2_gc_shenandoah, tier3_gc_shenandoah and hotspot_gc_shenandoah. Thanks for review & approval. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20254#issuecomment-2286682085 From duke at openjdk.org Tue Aug 13 16:47:49 2024 From: duke at openjdk.org (duke) Date: Tue, 13 Aug 2024 16:47:49 GMT Subject: RFR: 8335865: Shenandoah: Improve THP pretouch after JDK-8315923 In-Reply-To: References: Message-ID: On Fri, 19 Jul 2024 14:28:24 GMT, Neethu Prasad wrote: > **Notes** > os::pretouch is now using madvice now when available and has a fall back to using vm page size [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923) > Hence removing code that sets _pretouch_heap_page_size & _pretouch_bitmap_page_size in Shenandoah. > > **Testing** > > * Ran test in Linux 5.10 and Linux 6.x and confirmed that there is no regression. I could not replicate the issue or performance improvement though. 
[add results] > * Ran [TestTransparentHugePageUsage](https://github.com/openjdk/jdk/commit/a65a89522d2f24b1767e1c74f6689a22ea32ca6a) for Shenandoah and verified that test passed > * Ran tier 1, tier 2 , tier1_gc_shenandoah, tier2_gc_shenandoah, tier3_gc_shenandoah and hotspot_gc_shenandoah. @neethu-prasad Your change (at version 736fff5738887dedf1d44a73ec50f6baa3df2fb0) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20254#issuecomment-2286684586 From nprasad at openjdk.org Tue Aug 13 16:50:25 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Tue, 13 Aug 2024 16:50:25 GMT Subject: RFR: 8336742: Shenandoah: Add more verbose logging/stats for mark termination attempts [v6] In-Reply-To: References: Message-ID: > **Revision 2 Notes** > 1. Added time spent on handshaking all threads requesting them to flush their SATB buffers as part of GC stats. > 2. As mentioned in PR feedback, will raise separate PR to adding logging in ShenandoahTimingsTracker. > > **Revision 1 Notes** > This PR adds the following > 1. info logging on number of SATB flush attempts > 3. total time spend on handshaking all threads requesting them to flush their SATB buffers. > > As suggested by William in [JDK-8336742 ](https://bugs.openjdk.org/browse/JDK-83367420), we can use handshake logging to get time spend and other stats for each handshake. > > [4.515s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1035, Total completion time: 597004 ns > [4.517s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1033, Total completion time: 207402 ns > > > **Testing** > 1. tier1, tier2 and hotspot_gc_shenandoah tests. > 2. 
**-Xlog:gc+stats=info** > > > [4.911s][info][gc,stats] CMR: VM Strong Roots 539 us, workers (us): 102, 89, 89, 79, 46, 40, 40, 25, 24, 1, 2, 2, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, > [4.911s][info][gc,stats] CMR: CLDG Roots 461 us, workers (us): 429, 5, 12, 11, 4, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, > [4.911s][info][gc,stats] Concurrent Marking 4392 us > [4.911s][info][gc,stats] Flush SATB 1035 us > [4.911s][info][gc,stats] Pause Final Mark (G) 2615 us > [4.912s][info][gc,stats] Pause Final Mark (N) 2339 us > [4.912s][info][gc,stats] Finish Mark 780 us > [4.912s][info][gc,stats] Update Region States 109 us > [4.912s][info][gc,stats] Choose Collection Set 1336 us > [4.912s][info][gc,stats] Rebuild Free Set 23 us > > > on app termination > > > 4.924s][info][gc,stats] Concurrent Reset = 0.042 s (a = 1846 us) (n = 23) (lvls, us = 1113, 1660, 1895, 2031, 2674) > [4.924s][info][gc,stats] Pause Init Mark (G) = 0.073 s (a = 3163 us) (n = 23) (lvls, us = 2812, 2949, 3047, 3281, 3790) > [4.924s][info][gc,stats] Pause Init Mark (N) = 0.065 s (a = 2810 us) (n = 23) (lvls, us = 2578, 2676, 2793, 2793, 3260) > [4.924s][info]... 
Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: Fix indentation and naming ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20318/files - new: https://git.openjdk.org/jdk/pull/20318/files/1609ed50..da599599 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20318&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20318&range=04-05 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20318.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20318/head:pull/20318 PR: https://git.openjdk.org/jdk/pull/20318 From shade at openjdk.org Tue Aug 13 16:50:25 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 Aug 2024 16:50:25 GMT Subject: RFR: 8336742: Shenandoah: Add more verbose logging/stats for mark termination attempts [v6] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 16:46:54 GMT, Neethu Prasad wrote: >> **Revision 2 Notes** >> 1. Added time spent on handshaking all threads requesting them to flush their SATB buffers as part of GC stats. >> 2. As mentioned in PR feedback, will raise separate PR to adding logging in ShenandoahTimingsTracker. >> >> **Revision 1 Notes** >> This PR adds the following >> 1. info logging on number of SATB flush attempts >> 3. total time spend on handshaking all threads requesting them to flush their SATB buffers. >> >> As suggested by William in [JDK-8336742 ](https://bugs.openjdk.org/browse/JDK-83367420), we can use handshake logging to get time spend and other stats for each handshake. >> >> [4.515s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1035, Total completion time: 597004 ns >> [4.517s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1033, Total completion time: 207402 ns >> >> >> **Testing** >> 1. 
tier1, tier2 and hotspot_gc_shenandoah tests. >> 2. **-Xlog:gc+stats=info** >> >> >> [4.911s][info][gc,stats] CMR: VM Strong Roots 539 us, workers (us): 102, 89, 89, 79, 46, 40, 40, 25, 24, 1, 2, 2, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [4.911s][info][gc,stats] CMR: CLDG Roots 461 us, workers (us): 429, 5, 12, 11, 4, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [4.911s][info][gc,stats] Concurrent Marking 4392 us >> [4.911s][info][gc,stats] Flush SATB 1035 us >> [4.911s][info][gc,stats] Pause Final Mark (G) 2615 us >> [4.912s][info][gc,stats] Pause Final Mark (N) 2339 us >> [4.912s][info][gc,stats] Finish Mark 780 us >> [4.912s][info][gc,stats] Update Region States 109 us >> [4.912s][info][gc,stats] Choose Collection Set 1336 us >> [4.912s][info][gc,stats] Rebuild Free Set 23 us >> >> >> on app termination >> >> >> 4.924s][info][gc,stats] Concurrent Reset = 0.042 s (a = 1846 us) (n = 23) (lvls, us = 1113, 1660, 1895, 2031, 2674) >> [4.924s][info][gc,stats] Pause Init Mark (G) = 0.073 s (a = 3163 us) (n = 23) (lvls, us = 2812, 2949, 3047, 3281, 3790) >> [4.924s][info][gc,stats] Pause Init Mark (N) = 0.065 s (a = 2810 us) (n = 23) ... > > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentation and naming Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20318#pullrequestreview-2236075482 From wkemper at openjdk.org Tue Aug 13 16:55:50 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 13 Aug 2024 16:55:50 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions [v2] In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 18:50:47 GMT, Aleksey Shipilev wrote: >> The expected behavior of `CollectedHeap::is_in` is to check whether the object belongs to the committed parts of the heap: >> https://github.com/openjdk/jdk/blob/d19ba81ce12a99de1114c1bfe67392f5aee2104e/src/hotspot/share/gc/shared/collectedHeap.hpp#L273-L276 >> >> This is useful to check if object resides in the parts of the heap the GC knows are not dead. Yet, Shenandoah's check just verifies that oop is within the heap bounds. So `is_in` check for an object that is in trashed/empty region would pass by accident, and we will miss detecting bugs. This should be rectified. I believe "committed" is too weak for the test as well, since we really want to know if we can touch the object, i.e. if it is in active region. >> >> I re-wired assertions/verification code to be clear whether we check for heap bounds or actual in-heap conditions. >> >> Deeper testing revealed that reference processing code potentially loads a dead referent, but only to null-check it, or ask bitmap about it. Still, more precise `in_heap` check fails asserts in `CompressedOops::decode`. That required a bit of touchup as well. >> >> Additional testing: >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC` >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` > > Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: > > - Style touchups > - Fixing ShenandoahReferenceProcessor > - Verifier fix Changes requested by wkemper (Committer). 
src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 64: > 62: void* _interior_loc; > 63: oop _loc; > 64: ReferenceIterationMode _ref_mode; I don't see where this new field is read. ------------- PR Review: https://git.openjdk.org/jdk/pull/20492#pullrequestreview-2236090705 PR Review Comment: https://git.openjdk.org/jdk/pull/20492#discussion_r1715627146 From wkemper at openjdk.org Tue Aug 13 16:59:48 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 13 Aug 2024 16:59:48 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions [v2] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 16:52:48 GMT, William Kemper wrote: >> Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: >> >> - Style touchups >> - Fixing ShenandoahReferenceProcessor >> - Verifier fix > > src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 64: > >> 62: void* _interior_loc; >> 63: oop _loc; >> 64: ReferenceIterationMode _ref_mode; > > I don't see where this new field is read. Okay, I see now that `reference_iteration_mode` overrides a virtual method defined in `OopIterateClosure`. Perhaps mark it with `override` for readability? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20492#discussion_r1715632948 From wkemper at openjdk.org Tue Aug 13 17:20:50 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 13 Aug 2024 17:20:50 GMT Subject: RFR: 8336742: Shenandoah: Add more verbose logging/stats for mark termination attempts [v6] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 16:50:25 GMT, Neethu Prasad wrote: >> **Revision 2 Notes** >> 1. Added time spent on handshaking all threads requesting them to flush their SATB buffers as part of GC stats. >> 2. As mentioned in PR feedback, will raise separate PR to adding logging in ShenandoahTimingsTracker. >> >> **Revision 1 Notes** >> This PR adds the following >> 1. 
info logging on number of SATB flush attempts >> 3. total time spend on handshaking all threads requesting them to flush their SATB buffers. >> >> As suggested by William in [JDK-8336742 ](https://bugs.openjdk.org/browse/JDK-83367420), we can use handshake logging to get time spend and other stats for each handshake. >> >> [4.515s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1035, Total completion time: 597004 ns >> [4.517s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1033, Total completion time: 207402 ns >> >> >> **Testing** >> 1. tier1, tier2 and hotspot_gc_shenandoah tests. >> 2. **-Xlog:gc+stats=info** >> >> >> [4.911s][info][gc,stats] CMR: VM Strong Roots 539 us, workers (us): 102, 89, 89, 79, 46, 40, 40, 25, 24, 1, 2, 2, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [4.911s][info][gc,stats] CMR: CLDG Roots 461 us, workers (us): 429, 5, 12, 11, 4, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [4.911s][info][gc,stats] Concurrent Marking 4392 us >> [4.911s][info][gc,stats] Flush SATB 1035 us >> [4.911s][info][gc,stats] Pause Final Mark (G) 2615 us >> [4.912s][info][gc,stats] Pause Final Mark (N) 2339 us >> [4.912s][info][gc,stats] Finish Mark 780 us >> [4.912s][info][gc,stats] Update Region States 109 us >> [4.912s][info][gc,stats] Choose Collection Set 1336 us >> [4.912s][info][gc,stats] Rebuild Free Set 23 us >> >> >> on app termination >> >> >> 4.924s][info][gc,stats] Concurrent Reset = 0.042 s (a = 1846 us) (n = 23) (lvls, us = 1113, 1660, 1895, 2031, 2674) >> [4.924s][info][gc,stats] Pause Init Mark (G) = 0.073 s (a = 3163 us) (n = 23) (lvls, us = 2812, 2949, 3047, 3281, 3790) >> [4.924s][info][gc,stats] Pause Init Mark (N) = 0.065 s (a = 2810 us) (n = 23) ... 
> > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentation and naming Thank you! ------------- Marked as reviewed by wkemper (Committer). PR Review: https://git.openjdk.org/jdk/pull/20318#pullrequestreview-2236140224 From rkennke at openjdk.org Tue Aug 13 17:25:50 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 13 Aug 2024 17:25:50 GMT Subject: RFR: 8336742: Shenandoah: Add more verbose logging/stats for mark termination attempts [v6] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 16:50:25 GMT, Neethu Prasad wrote: >> **Revision 2 Notes** >> 1. Added time spent on handshaking all threads requesting them to flush their SATB buffers as part of GC stats. >> 2. As mentioned in PR feedback, will raise separate PR to adding logging in ShenandoahTimingsTracker. >> >> **Revision 1 Notes** >> This PR adds the following >> 1. info logging on number of SATB flush attempts >> 3. total time spend on handshaking all threads requesting them to flush their SATB buffers. >> >> As suggested by William in [JDK-8336742 ](https://bugs.openjdk.org/browse/JDK-83367420), we can use handshake logging to get time spend and other stats for each handshake. >> >> [4.515s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1035, Total completion time: 597004 ns >> [4.517s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1033, Total completion time: 207402 ns >> >> >> **Testing** >> 1. tier1, tier2 and hotspot_gc_shenandoah tests. >> 2. 
**-Xlog:gc+stats=info** >> >> >> [4.911s][info][gc,stats] CMR: VM Strong Roots 539 us, workers (us): 102, 89, 89, 79, 46, 40, 40, 25, 24, 1, 2, 2, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [4.911s][info][gc,stats] CMR: CLDG Roots 461 us, workers (us): 429, 5, 12, 11, 4, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [4.911s][info][gc,stats] Concurrent Marking 4392 us >> [4.911s][info][gc,stats] Flush SATB 1035 us >> [4.911s][info][gc,stats] Pause Final Mark (G) 2615 us >> [4.912s][info][gc,stats] Pause Final Mark (N) 2339 us >> [4.912s][info][gc,stats] Finish Mark 780 us >> [4.912s][info][gc,stats] Update Region States 109 us >> [4.912s][info][gc,stats] Choose Collection Set 1336 us >> [4.912s][info][gc,stats] Rebuild Free Set 23 us >> >> >> on app termination >> >> >> 4.924s][info][gc,stats] Concurrent Reset = 0.042 s (a = 1846 us) (n = 23) (lvls, us = 1113, 1660, 1895, 2031, 2674) >> [4.924s][info][gc,stats] Pause Init Mark (G) = 0.073 s (a = 3163 us) (n = 23) (lvls, us = 2812, 2949, 3047, 3281, 3790) >> [4.924s][info][gc,stats] Pause Init Mark (N) = 0.065 s (a = 2810 us) (n = 23) ... > > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentation and naming Looks good, thank you! ------------- Marked as reviewed by rkennke (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20318#pullrequestreview-2236150805 From nprasad at openjdk.org Tue Aug 13 17:26:00 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Tue, 13 Aug 2024 17:26:00 GMT Subject: Integrated: 8335865: Shenandoah: Improve THP pretouch after JDK-8315923 In-Reply-To: References: Message-ID: On Fri, 19 Jul 2024 14:28:24 GMT, Neethu Prasad wrote: > **Notes** > `os::pretouch` now uses madvise when available and falls back to the VM page size otherwise [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923) > Hence removing code that sets _pretouch_heap_page_size & _pretouch_bitmap_page_size in Shenandoah. > > **Testing** > > * Ran test in Linux 5.10 and Linux 6.x and confirmed that there is no regression. I could not replicate the issue or performance improvement though. [add results] > * Ran [TestTransparentHugePageUsage](https://github.com/openjdk/jdk/commit/a65a89522d2f24b1767e1c74f6689a22ea32ca6a) for Shenandoah and verified that test passed > * Ran tier 1, tier 2, tier1_gc_shenandoah, tier2_gc_shenandoah, tier3_gc_shenandoah and hotspot_gc_shenandoah. This pull request has now been integrated. Changeset: 84c3065e Author: Neethu Prasad Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/84c3065e8004122f3455a8c28c8719b2c8111c17 Stats: 18 lines in 1 file changed: 0 ins; 17 del; 1 mod 8335865: Shenandoah: Improve THP pretouch after JDK-8315923 Reviewed-by: shade, wkemper ------------- PR: https://git.openjdk.org/jdk/pull/20254 From iwalulya at openjdk.org Tue Aug 13 17:44:20 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 13 Aug 2024 17:44:20 GMT Subject: RFR: 8338315: G1: G1CardTableEntryClosure:do_card_ptr remove unused parameter worker_id Message-ID: <6YCCjq764PLV3aGlOE-c-auXEqByRrRLqo4MG9xBwzQ=.7cc330a4-22e7-4f4e-bda5-1d33bb417274@github.com> Please review this trivial change to remove an unused parameter worker_id in G1CardTableEntryClosure:do_card_ptr. 
------------- Commit messages: - cleanup Changes: https://git.openjdk.org/jdk/pull/20569/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20569&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338315 Stats: 6 lines in 3 files changed: 1 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20569.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20569/head:pull/20569 PR: https://git.openjdk.org/jdk/pull/20569 From rkennke at openjdk.org Tue Aug 13 17:59:49 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 13 Aug 2024 17:59:49 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions [v2] In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 18:50:47 GMT, Aleksey Shipilev wrote: >> The expected behavior of `CollectedHeap::is_in` is to check whether the object belongs to the committed parts of the heap: >> https://github.com/openjdk/jdk/blob/d19ba81ce12a99de1114c1bfe67392f5aee2104e/src/hotspot/share/gc/shared/collectedHeap.hpp#L273-L276 >> >> This is useful to check if object resides in the parts of the heap the GC knows are not dead. Yet, Shenandoah's check just verifies that oop is within the heap bounds. So `is_in` check for an object that is in trashed/empty region would pass by accident, and we will miss detecting bugs. This should be rectified. I believe "committed" is too weak for the test as well, since we really want to know if we can touch the object, i.e. if it is in active region. >> >> I re-wired assertions/verification code to be clear whether we check for heap bounds or actual in-heap conditions. >> >> Deeper testing revealed that reference processing code potentially loads a dead referent, but only to null-check it, or ask bitmap about it. Still, more precise `in_heap` check fails asserts in `CompressedOops::decode`. That required a bit of touchup as well. 
>> >> Additional testing: >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC` >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` > > Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: > > - Style touchups > - Fixing ShenandoahReferenceProcessor > - Verifier fix Mostly looks good, I have a few suggestions. src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 748: > 746: } > 747: > 748: bool ShenandoahHeap::is_in_bounds(const void* p) const { Is this the same as is_in_reserved()? Do we even need that new method? src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 759: > 757: // objects during Full GC across the regions in not yet determinate state. > 758: return is_full_gc_move_in_progress() || > 759: heap_region_containing(p)->is_active(); Should this also check against the region bounds? src/hotspot/share/gc/shenandoah/shenandoahReferenceProcessor.cpp line 96: > 94: // Raw referent, it can be dead. You cannot dereference it, only use for nullptr > 95: // and bitmap checks. The decoding uses a special-case inlined CompressedOops::decode > 96: // method that bypasses normal oop-ness checks. If you don't want to be treated like an actual oop, you could return a HeapWord* instead. That's still good enough for null- and bitmap-checking. Not sure if it causes a lot of casting around, if so then it's probably not worth it. ------------- Marked as reviewed by rkennke (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20492#pullrequestreview-2236206495 PR Review Comment: https://git.openjdk.org/jdk/pull/20492#discussion_r1715699307 PR Review Comment: https://git.openjdk.org/jdk/pull/20492#discussion_r1715702322 PR Review Comment: https://git.openjdk.org/jdk/pull/20492#discussion_r1715705338 From nprasad at openjdk.org Tue Aug 13 19:49:51 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Tue, 13 Aug 2024 19:49:51 GMT Subject: RFR: 8336742: Shenandoah: Add more verbose logging/stats for mark termination attempts [v6] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 16:50:25 GMT, Neethu Prasad wrote: >> **Revision 2 Notes** >> 1. Added time spent on handshaking all threads requesting them to flush their SATB buffers as part of GC stats. >> 2. As mentioned in PR feedback, will raise separate PR to adding logging in ShenandoahTimingsTracker. >> >> **Revision 1 Notes** >> This PR adds the following >> 1. info logging on number of SATB flush attempts >> 3. total time spend on handshaking all threads requesting them to flush their SATB buffers. >> >> As suggested by William in [JDK-8336742 ](https://bugs.openjdk.org/browse/JDK-83367420), we can use handshake logging to get time spend and other stats for each handshake. >> >> [4.515s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1035, Total completion time: 597004 ns >> [4.517s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1033, Total completion time: 207402 ns >> >> >> **Testing** >> 1. tier1, tier2 and hotspot_gc_shenandoah tests. >> 2. 
**-Xlog:gc+stats=info** >> >> >> [4.911s][info][gc,stats] CMR: VM Strong Roots 539 us, workers (us): 102, 89, 89, 79, 46, 40, 40, 25, 24, 1, 2, 2, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [4.911s][info][gc,stats] CMR: CLDG Roots 461 us, workers (us): 429, 5, 12, 11, 4, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [4.911s][info][gc,stats] Concurrent Marking 4392 us >> [4.911s][info][gc,stats] Flush SATB 1035 us >> [4.911s][info][gc,stats] Pause Final Mark (G) 2615 us >> [4.912s][info][gc,stats] Pause Final Mark (N) 2339 us >> [4.912s][info][gc,stats] Finish Mark 780 us >> [4.912s][info][gc,stats] Update Region States 109 us >> [4.912s][info][gc,stats] Choose Collection Set 1336 us >> [4.912s][info][gc,stats] Rebuild Free Set 23 us >> >> >> on app termination >> >> >> 4.924s][info][gc,stats] Concurrent Reset = 0.042 s (a = 1846 us) (n = 23) (lvls, us = 1113, 1660, 1895, 2031, 2674) >> [4.924s][info][gc,stats] Pause Init Mark (G) = 0.073 s (a = 3163 us) (n = 23) (lvls, us = 2812, 2949, 3047, 3281, 3790) >> [4.924s][info][gc,stats] Pause Init Mark (N) = 0.065 s (a = 2810 us) (n = 23) ... > > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentation and naming Thanks for review & approval. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20318#issuecomment-2287005181 From duke at openjdk.org Tue Aug 13 19:49:51 2024 From: duke at openjdk.org (duke) Date: Tue, 13 Aug 2024 19:49:51 GMT Subject: RFR: 8336742: Shenandoah: Add more verbose logging/stats for mark termination attempts [v6] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 16:50:25 GMT, Neethu Prasad wrote: >> **Revision 2 Notes** >> 1. Added time spent on handshaking all threads requesting them to flush their SATB buffers as part of GC stats. >> 2. As mentioned in PR feedback, will raise separate PR to adding logging in ShenandoahTimingsTracker. 
>> >> **Revision 1 Notes** >> This PR adds the following >> 1. info logging on number of SATB flush attempts >> 3. total time spend on handshaking all threads requesting them to flush their SATB buffers. >> >> As suggested by William in [JDK-8336742 ](https://bugs.openjdk.org/browse/JDK-83367420), we can use handshake logging to get time spend and other stats for each handshake. >> >> [4.515s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1035, Total completion time: 597004 ns >> [4.517s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1033, Total completion time: 207402 ns >> >> >> **Testing** >> 1. tier1, tier2 and hotspot_gc_shenandoah tests. >> 2. **-Xlog:gc+stats=info** >> >> >> [4.911s][info][gc,stats] CMR: VM Strong Roots 539 us, workers (us): 102, 89, 89, 79, 46, 40, 40, 25, 24, 1, 2, 2, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [4.911s][info][gc,stats] CMR: CLDG Roots 461 us, workers (us): 429, 5, 12, 11, 4, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, >> [4.911s][info][gc,stats] Concurrent Marking 4392 us >> [4.911s][info][gc,stats] Flush SATB 1035 us >> [4.911s][info][gc,stats] Pause Final Mark (G) 2615 us >> [4.912s][info][gc,stats] Pause Final Mark (N) 2339 us >> [4.912s][info][gc,stats] Finish Mark 780 us >> [4.912s][info][gc,stats] Update Region States 109 us >> [4.912s][info][gc,stats] Choose Collection Set 1336 us >> [4.912s][info][gc,stats] Rebuild Free Set 23 us >> >> >> on app termination >> >> >> 4.924s][info][gc,stats] Concurrent Reset = 0.042 s (a = 1846 us) (n = 23) (lvls, us = 1113, 1660, 1895, 2031, 2674) >> [4.924s][info][gc,stats] Pause Init Mark (G) = 0.073 s (a = 3163 us) (n = 23) (lvls, us = 2812, 2949, 3047, 3281, 3790) >> [4.924s][info][gc,stats] Pause Init Mark (N) = 0.065 s (a = 2810 us) (n = 23) ... 
> > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentation and naming @neethu-prasad Your change (at version da599599e711a1b435b3ed6b7088f299a8008f7a) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20318#issuecomment-2287007930 From nprasad at openjdk.org Tue Aug 13 19:58:54 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Tue, 13 Aug 2024 19:58:54 GMT Subject: Integrated: 8336742: Shenandoah: Add more verbose logging/stats for mark termination attempts In-Reply-To: References: Message-ID: On Wed, 24 Jul 2024 19:15:55 GMT, Neethu Prasad wrote: > **Revision 2 Notes** > 1. Added time spent on handshaking all threads requesting them to flush their SATB buffers as part of GC stats. > 2. As mentioned in PR feedback, will raise separate PR to adding logging in ShenandoahTimingsTracker. > > **Revision 1 Notes** > This PR adds the following > 1. info logging on number of SATB flush attempts > 3. total time spend on handshaking all threads requesting them to flush their SATB buffers. > > As suggested by William in [JDK-8336742 ](https://bugs.openjdk.org/browse/JDK-83367420), we can use handshake logging to get time spend and other stats for each handshake. > > [4.515s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1035, Total completion time: 597004 ns > [4.517s][info][handshake ] Handshake "Shenandoah Flush SATB Handshake", Targeted threads: 1036, Executed by requesting thread: 1033, Total completion time: 207402 ns > > > **Testing** > 1. tier1, tier2 and hotspot_gc_shenandoah tests. > 2. 
**-Xlog:gc+stats=info** > > > [4.911s][info][gc,stats] CMR: VM Strong Roots 539 us, workers (us): 102, 89, 89, 79, 46, 40, 40, 25, 24, 1, 2, 2, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, > [4.911s][info][gc,stats] CMR: CLDG Roots 461 us, workers (us): 429, 5, 12, 11, 4, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, ---, > [4.911s][info][gc,stats] Concurrent Marking 4392 us > [4.911s][info][gc,stats] Flush SATB 1035 us > [4.911s][info][gc,stats] Pause Final Mark (G) 2615 us > [4.912s][info][gc,stats] Pause Final Mark (N) 2339 us > [4.912s][info][gc,stats] Finish Mark 780 us > [4.912s][info][gc,stats] Update Region States 109 us > [4.912s][info][gc,stats] Choose Collection Set 1336 us > [4.912s][info][gc,stats] Rebuild Free Set 23 us > > > on app termination > > > 4.924s][info][gc,stats] Concurrent Reset = 0.042 s (a = 1846 us) (n = 23) (lvls, us = 1113, 1660, 1895, 2031, 2674) > [4.924s][info][gc,stats] Pause Init Mark (G) = 0.073 s (a = 3163 us) (n = 23) (lvls, us = 2812, 2949, 3047, 3281, 3790) > [4.924s][info][gc,stats] Pause Init Mark (N) = 0.065 s (a = 2810 us) (n = 23) (lvls, us = 2578, 2676, 2793, 2793, 3260) > [4.924s][info]... This pull request has now been integrated. 
Changeset: 90527a57 Author: Neethu Prasad Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/90527a57848f452be3be089a703cbc2af2d1657a Stats: 23 lines in 5 files changed: 10 ins; 1 del; 12 mod 8336742: Shenandoah: Add more verbose logging/stats for mark termination attempts Reviewed-by: shade, wkemper, rkennke ------------- PR: https://git.openjdk.org/jdk/pull/20318 From mdoerr at openjdk.org Tue Aug 13 20:44:53 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 13 Aug 2024 20:44:53 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v6] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 14:21:01 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 203: >> >>> 201: // Do we need to load the previous value? >>> 202: if (obj != noreg) { >>> 203: __ load_heap_oop(pre_val, Address(obj, 0), noreg, noreg, AS_RAW); >> >> How do we handle implicit null checks for which `obj` is null? Note that we may expect the store instruction to trigger SIGSEGV. Does it work correctly if we trigger the SIGSEGV, here in the pre barrier? > > Good question! I have checked (no pun intended) and it turns out C2 never uses stores (or other memory access operations) with late-expanded barriers to perform implicit null checks. This is accidental, due to the fact that all these memory operations use MachTemp nodes in the C2 code generation stage (to reserve registers for their ADL TEMP operands). C2's implicit null check analysis requires that all inputs of a candidate memory operation dominate the null check [1], which fails if the operation uses MachTemp nodes, since these are always placed in the same basic block [2]. > > Note that this optimization triggers very rarely, if at all, for memory operations in the current early barrier expansion model, since the additional control flow of the barrier code obfuscates the analysis. 
For late barrier expansion, the analysis could be easily extended to recognize and hoist MachTemp nodes together with their user memory operation that is a candidate to implement the implicit null check [3], but that would require extending the barrier assembly emission step to populate the implicit null exception table correctly. Since this seems non-trivial, and would also affect other garbage collectors (ZGC), I suggest to simply assert for now that we do not generate implicit null checks for memory operations with barrier data (as in [4]), and leave full support for implicit null checks for these G1 and ZGC operations to a future RFE. What do you think? > > [1] https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/lcm.cpp#L310-L328 > [2] https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/gcm.cpp#L1397-L1404 > [3] https://github.com/robcasloz/jdk/commit/e0ab1e418b81c0acddff2190ca57b3335b5214ba > [4] https://github.com/robcasloz/jdk/commit/e0ab1e418b81c0acddff2190ca57b3335b5214ba#diff-554fddca91406a67dc0f8faee12dc30c709181685a0add7f4ba9ae5ace68f192R2031-R2032 Thanks for figuring it out! Makes sense. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1715904218 From tschatzl at openjdk.org Wed Aug 14 07:36:48 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 14 Aug 2024 07:36:48 GMT Subject: RFR: 8338315: G1: G1CardTableEntryClosure:do_card_ptr remove unused parameter worker_id In-Reply-To: <6YCCjq764PLV3aGlOE-c-auXEqByRrRLqo4MG9xBwzQ=.7cc330a4-22e7-4f4e-bda5-1d33bb417274@github.com> References: <6YCCjq764PLV3aGlOE-c-auXEqByRrRLqo4MG9xBwzQ=.7cc330a4-22e7-4f4e-bda5-1d33bb417274@github.com> Message-ID: On Tue, 13 Aug 2024 15:56:13 GMT, Ivan Walulya wrote: > Please review this trivial change to remove an unused parameter worker_id in G1CardTableEntryClosure:do_card_ptr. lgtm and trivial. 
------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20569#pullrequestreview-2237381705 From tschatzl at openjdk.org Wed Aug 14 07:37:49 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 14 Aug 2024 07:37:49 GMT Subject: RFR: 8338280: Parallel: Inline ParallelCompactData::verify_clear In-Reply-To: References: Message-ID: <0g6abxqDduQ6-4wgtVMb-tma7bG1vxemujqp2WzdQ2E=.4af9c888-5d8f-4266-849d-5afed69bd07b@github.com> On Tue, 13 Aug 2024 07:28:06 GMT, Albert Mingkun Yang wrote: > Trivial inlining a method to its sole caller. lgtm and trivial. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20561#pullrequestreview-2237383271 From tschatzl at openjdk.org Wed Aug 14 08:04:59 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 14 Aug 2024 08:04:59 GMT Subject: RFR: 8336086: G1: Use one G1CardSet instance for all young regions [v2] In-Reply-To: References: Message-ID: <7eNeS75jdSKQrPi9x_XqB2iOX9St6fIIPq2tuFxtD7A=.d810310c-287b-45e1-8ff8-d6eece492c0d@github.com> On Mon, 5 Aug 2024 14:42:45 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to assign a single G1CardSet to all young regions. As young regions are collected at the same, and we do not have young-to-young remembered sets, we can maintain a single G1CardSet for all young regions. >> >> This reduces the memory overhead of the G1CardSets and the time taken to merge per region G1CardSets during GC pause. >> >> Testing: Tier 1-5 > > Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Albert Review > - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet > - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet > - cleanup > - merge > - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet > - init Apologies for the late review. 
src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 3053: > 3051: > 3052: void G1CollectedHeap::prepare_group_cardsets_for_scan () { > 3053: _young_regions_cardset.reset_table_scanner(4); Please make that "4" a constant like "GroupBucketClaimSize" and put it next to `BucketClaimSize` with an appropriate comment ("claim size for groups should be smaller to facilitate work distribution across less but larger card sets" or so). src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 783: > 781: > 782: // Group cardsets > 783: G1CardSetMemoryManager _card_set_mm; Is it possible to rename this to `_young_regions_card_set_mm` to make it more clear this is the card set memory manager for young regions? ------------- Changes requested by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20134#pullrequestreview-2237426651 PR Review Comment: https://git.openjdk.org/jdk/pull/20134#discussion_r1716472629 PR Review Comment: https://git.openjdk.org/jdk/pull/20134#discussion_r1716478692 From rcastanedalo at openjdk.org Wed Aug 14 08:29:45 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 14 Aug 2024 08:29:45 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v7] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. 
> > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Rename 'HeapRegionBounds' to 'G1HeapRegionBounds' - Merge jdk-24+10 - Further motivate the choice of internal store address materialization in x64 - Give barrier generation helper functions a more consistent name - Also include HOTSPOT_TARGET_CPU_ARCH-based G1 ADL source file - Flatten barrier assembly generation code by removing helpers individual barrier tests and operations - Build barrier data in G1BarrierSetC2::get_store_barrier() by adding, rather than removing, barrier tags - Implement JEP 475 Co-authored-by: Erik Österlund, Siyao Liu, and Kim Barrett ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/d21104ca..88d28b9f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=05-06 Stats: 99129 lines in 2523 files changed: 60137 ins; 27053 del; 11939 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Wed Aug 14 08:29:45 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 14 Aug 2024 08:29:45 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v6] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 13:57:27 GMT, Roberto Castañeda Lozano wrote: > OK, I will test and push a merge of jdk-24+10 (Thu Aug 8) in the next days, unless @feilongjiang or @snazarkin object. We can then check in a few weeks if another update is required. Done. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2288146342 From fjiang at openjdk.org Wed Aug 14 09:12:53 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 14 Aug 2024 09:12:53 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v5] In-Reply-To: References: Message-ID: On Mon, 12 Aug 2024 12:10:28 GMT, Roberto Castañeda Lozano wrote: > > Are you planning to rebase it with master ? Nothing important, but there were couple of failures which are fixed after this PR. So will make test result a bit clean for us. > > Actually, I have refrained to update to the latest mainline changes to avoid interfering with the porting work while it is in progress, but if there is consensus among the port maintainers I would be happy to update the changeset regularly. @TheRealMDoerr @feilongjiang @offamitkumar @snazarkin what do you prefer? I have already merged upstream commits on my local branch, so I'm fine with regular updates. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2288247680 From ayang at openjdk.org Wed Aug 14 09:15:50 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 14 Aug 2024 09:15:50 GMT Subject: RFR: 8338280: Parallel: Inline ParallelCompactData::verify_clear In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 07:28:06 GMT, Albert Mingkun Yang wrote: > Trivial inlining a method to its sole caller. Thanks for review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20561#issuecomment-2288252686 From iwalulya at openjdk.org Wed Aug 14 09:15:52 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 14 Aug 2024 09:15:52 GMT Subject: RFR: 8338315: G1: G1CardTableEntryClosure:do_card_ptr remove unused parameter worker_id In-Reply-To: <6YCCjq764PLV3aGlOE-c-auXEqByRrRLqo4MG9xBwzQ=.7cc330a4-22e7-4f4e-bda5-1d33bb417274@github.com> References: <6YCCjq764PLV3aGlOE-c-auXEqByRrRLqo4MG9xBwzQ=.7cc330a4-22e7-4f4e-bda5-1d33bb417274@github.com> Message-ID: On Tue, 13 Aug 2024 15:56:13 GMT, Ivan Walulya wrote: > Please review this trivial change to remove an unused parameter worker_id in G1CardTableEntryClosure:do_card_ptr. Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20569#issuecomment-2288249364 From iwalulya at openjdk.org Wed Aug 14 09:15:52 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 14 Aug 2024 09:15:52 GMT Subject: Integrated: 8338315: G1: G1CardTableEntryClosure:do_card_ptr remove unused parameter worker_id In-Reply-To: <6YCCjq764PLV3aGlOE-c-auXEqByRrRLqo4MG9xBwzQ=.7cc330a4-22e7-4f4e-bda5-1d33bb417274@github.com> References: <6YCCjq764PLV3aGlOE-c-auXEqByRrRLqo4MG9xBwzQ=.7cc330a4-22e7-4f4e-bda5-1d33bb417274@github.com> Message-ID: <0rIbWTaCJis6MdfRDV3o6RJmkQLUS4B7hwfUgYCnhVk=.00416f9b-a67c-4af5-be23-e57856db4692@github.com> On Tue, 13 Aug 2024 15:56:13 GMT, Ivan Walulya wrote: > Please review this trivial change to remove an unused parameter worker_id in G1CardTableEntryClosure:do_card_ptr. This pull request has now been integrated. 
Changeset: 66bee253 Author: Ivan Walulya URL: https://git.openjdk.org/jdk/commit/66bee2532f849cfb7ab63857ecd7d773c2566722 Stats: 6 lines in 3 files changed: 1 ins; 1 del; 4 mod 8338315: G1: G1CardTableEntryClosure:do_card_ptr remove unused parameter worker_id Reviewed-by: tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/20569 From ayang at openjdk.org Wed Aug 14 09:18:56 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 14 Aug 2024 09:18:56 GMT Subject: Integrated: 8338280: Parallel: Inline ParallelCompactData::verify_clear In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 07:28:06 GMT, Albert Mingkun Yang wrote: > Trivial inlining a method to its sole caller. This pull request has now been integrated. Changeset: 9fe1777f Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/9fe1777fafca30cf60acb5402c7c70800137136e Stats: 9 lines in 2 files changed: 0 ins; 6 del; 3 mod 8338280: Parallel: Inline ParallelCompactData::verify_clear Reviewed-by: tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/20561 From iwalulya at openjdk.org Wed Aug 14 12:26:18 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 14 Aug 2024 12:26:18 GMT Subject: RFR: 8336086: G1: Use one G1CardSet instance for all young regions [v3] In-Reply-To: References: Message-ID: <2NHqncCKqdSOs56Raz5Df2h72UORehowRe5Xf2ZFN4Q=.74dd0f89-e39c-46db-83ff-1cacf560321d@github.com> > Hi all, > > Please review this change to assign a single G1CardSet to all young regions. As young regions are collected at the same, and we do not have young-to-young remembered sets, we can maintain a single G1CardSet for all young regions. > > This reduces the memory overhead of the G1CardSets and the time taken to merge per region G1CardSets during GC pause. > > Testing: Tier 1-5 Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains ten commits: - Thomas Review - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet - Albert Review - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet - cleanup - merge - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet - init ------------- Changes: https://git.openjdk.org/jdk/pull/20134/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20134&range=02 Stats: 183 lines in 21 files changed: 150 ins; 10 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/20134.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20134/head:pull/20134 PR: https://git.openjdk.org/jdk/pull/20134 From rcastanedalo at openjdk.org Wed Aug 14 12:38:51 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 14 Aug 2024 12:38:51 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v5] In-Reply-To: References: Message-ID: <259a7NXcZVtVnc3vlOTN2eF4zPq3U_QBKDLNnvE1OJw=.894d8054-8947-40c2-a62d-1dd387477013@github.com> On Wed, 14 Aug 2024 09:10:10 GMT, Feilong Jiang wrote: > I have already merged upstream commits on my local branch, so I'm fine with regular updates. Thanks, let's go with this version and see if we need a new update in a few weeks (or, perhaps, all platforms have been ported by then?).
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2288628007 From tschatzl at openjdk.org Wed Aug 14 12:59:53 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 14 Aug 2024 12:59:53 GMT Subject: RFR: 8336086: G1: Use one G1CardSet instance for all young regions [v3] In-Reply-To: <2NHqncCKqdSOs56Raz5Df2h72UORehowRe5Xf2ZFN4Q=.74dd0f89-e39c-46db-83ff-1cacf560321d@github.com> References: <2NHqncCKqdSOs56Raz5Df2h72UORehowRe5Xf2ZFN4Q=.74dd0f89-e39c-46db-83ff-1cacf560321d@github.com> Message-ID: On Wed, 14 Aug 2024 12:26:18 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to assign a single G1CardSet to all young regions. As young regions are collected at the same time, and we do not have young-to-young remembered sets, we can maintain a single G1CardSet for all young regions. >> >> This reduces the memory overhead of the G1CardSets and the time taken to merge per region G1CardSets during GC pause. >> >> Testing: Tier 1-5 > > Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: > > - Thomas Review > - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet > - Albert Review > - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet > - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet > - cleanup > - merge > - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet > - init Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20134#pullrequestreview-2238087414 From ayang at openjdk.org Wed Aug 14 13:10:14 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 14 Aug 2024 13:10:14 GMT Subject: RFR: 8338393: Parallel: Remove unused ParallelCompactData::clear_range Message-ID: Trivial removing dead code.
------------- Commit messages: - trivial Changes: https://git.openjdk.org/jdk/pull/20583/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20583&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338393 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20583.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20583/head:pull/20583 PR: https://git.openjdk.org/jdk/pull/20583 From rcastanedalo at openjdk.org Wed Aug 14 13:11:25 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 14 Aug 2024 13:11:25 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v6] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 20:42:36 GMT, Martin Doerr wrote: >> Good question! I have checked (no pun intended) and it turns out C2 never uses stores (or other memory access operations) with late-expanded barriers to perform implicit null checks. This is accidental, due to the fact that all these memory operations use MachTemp nodes in the C2 code generation stage (to reserve registers for their ADL TEMP operands). C2's implicit null check analysis requires that all inputs of a candidate memory operation dominate the null check [1], which fails if the operation uses MachTemp nodes, since these are always placed in the same basic block [2]. >> >> Note that this optimization triggers very rarely, if at all, for memory operations in the current early barrier expansion model, since the additional control flow of the barrier code obfuscates the analysis. For late barrier expansion, the analysis could be easily extended to recognize and hoist MachTemp nodes together with their user memory operation that is a candidate to implement the implicit null check [3], but that would require extending the barrier assembly emission step to populate the implicit null exception table correctly.
Since this seems non-trivial, and would also affect other garbage collectors (ZGC), I suggest simply asserting for now that we do not generate implicit null checks for memory operations with barrier data (as in [4]), and leaving full support for implicit null checks for these G1 and ZGC operations to a future RFE. What do you think? >> >> [1] https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/lcm.cpp#L310-L328 >> [2] https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/gcm.cpp#L1397-L1404 >> [3] https://github.com/robcasloz/jdk/commit/e0ab1e418b81c0acddff2190ca57b3335b5214ba >> [4] https://github.com/robcasloz/jdk/commit/e0ab1e418b81c0acddff2190ca57b3335b5214ba#diff-554fddca91406a67dc0f8faee12dc30c709181685a0add7f4ba9ae5ace68f192R2031-R2032 > > Thanks for figuring it out! Makes sense. Added the assertion in commit 554de779. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1716895420 From rcastanedalo at openjdk.org Wed Aug 14 13:11:25 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 14 Aug 2024 13:11:25 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v8] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Assert that no implicit null checks are generated for memory accesses with barriers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/88d28b9f..554de779 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=06-07 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From tschatzl at openjdk.org Wed Aug 14 14:00:48 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 14 Aug 2024 14:00:48 GMT Subject: RFR: 8338393: Parallel: Remove unused ParallelCompactData::clear_range In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 13:04:40 GMT, Albert Mingkun Yang wrote: > Trivial removing dead code. Lgtm and trivial. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20583#pullrequestreview-2238268319 From shade at openjdk.org Wed Aug 14 16:30:54 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Aug 2024 16:30:54 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions [v2] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 16:57:26 GMT, William Kemper wrote: >> src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 64: >> >>> 62: void* _interior_loc; >>> 63: oop _loc; >>> 64: ReferenceIterationMode _ref_mode; >> >> I don't see where this new field is read. > > Okay, I see now that `reference_iteration_mode` overrides a virtual method defined in `OopIterateClosure`. Perhaps mark it with `override` for readability? Right. Since `override` is all-or-nothing, I had to add it to other methods as well.
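The "all-or-nothing" remark about `override` can be sketched with a toy example. This is only an illustration under assumptions: `ToyClosure` and `ToyVerifyClosure` are hypothetical stand-ins, not the HotSpot `OopIterateClosure` or the Shenandoah verifier classes, and the likely reason for the remark (compilers such as Clang warning via `-Winconsistent-missing-override` when only some overriders in a class are marked) is an assumption rather than something stated in the thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a closure base class with virtual hooks.
struct ToyClosure {
  virtual ~ToyClosure() = default;
  virtual std::string reference_iteration_mode() const { return "default"; }
  virtual void do_oop(int* p) { (void)p; }
};

// Hypothetical subclass that overrides both hooks.
struct ToyVerifyClosure : ToyClosure {
  int visited = 0;

  // Marking this overrider makes the intent visible to readers and lets the
  // compiler reject the method if the base signature ever changes.
  std::string reference_iteration_mode() const override { return "fields"; }

  // Once one overrider in the class is marked, the others are marked too,
  // so that the class stays consistent.
  void do_oop(int* p) override {
    if (p != nullptr) {
      visited++;
    }
  }
};
```

Calling through a `ToyClosure&` that refers to a `ToyVerifyClosure` dispatches to the marked overriders, which is the behavior the reviewer wanted to make obvious at the declaration site.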
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20492#discussion_r1717234461 From shade at openjdk.org Wed Aug 14 16:37:51 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Aug 2024 16:37:51 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions [v2] In-Reply-To: References: Message-ID: On Tue, 13 Aug 2024 17:50:45 GMT, Roman Kennke wrote: >> Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: >> >> - Style touchups >> - Fixing ShenandoahReferenceProcessor >> - Verifier fix > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 748: > >> 746: } >> 747: >> 748: bool ShenandoahHeap::is_in_bounds(const void* p) const { > > Is this the same as is_in_reserved()? Do we even need that new method? Right, we can use `is_in_reserved` instead. In fact, our current method that computes this from the region sizes is not necessary, as the heap regions cover the heap exactly. So we can just ask for `is_in_reserved` everywhere, without any loss. Gonna massage the code a bit around this. > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 759: > >> 757: // objects during Full GC across the regions in not yet determinate state. >> 758: return is_full_gc_move_in_progress() || >> 759: heap_region_containing(p)->is_active(); > > Should this also check against the region bounds? Not sure I understand. The if-condition checks that we are pointing into the heap. This means `heap_region_containing` always returns the region.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20492#discussion_r1717241078 PR Review Comment: https://git.openjdk.org/jdk/pull/20492#discussion_r1717242093 From rkennke at openjdk.org Wed Aug 14 16:42:52 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 14 Aug 2024 16:42:52 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions [v2] In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 16:34:58 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 759: >> >>> 757: // objects during Full GC across the regions in not yet determinate state. >>> 758: return is_full_gc_move_in_progress() || >>> 759: heap_region_containing(p)->is_active(); >> >> Should this also check against the region bounds? > > Not sure I understand. The if-condition checks that we are pointing into heap. This means `heap_region_containing` always returns the region. heap_region_containing() returns the region for which bottom <= p < end, my question is if we should check if bottom <= p < top, in other words if p is within the region's currently allocated part (as opposed to the unallocated tail, if any). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20492#discussion_r1717248921 From wkemper at openjdk.org Wed Aug 14 16:45:49 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 14 Aug 2024 16:45:49 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions [v2] In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 18:50:47 GMT, Aleksey Shipilev wrote: >> The expected behavior of `CollectedHeap::is_in` is to check whether the object belongs to the committed parts of the heap: >> https://github.com/openjdk/jdk/blob/d19ba81ce12a99de1114c1bfe67392f5aee2104e/src/hotspot/share/gc/shared/collectedHeap.hpp#L273-L276 >> >> This is useful to check if object resides in the parts of the heap the GC knows are not dead. 
Yet, Shenandoah's check just verifies that oop is within the heap bounds. So `is_in` check for an object that is in trashed/empty region would pass by accident, and we will miss detecting bugs. This should be rectified. I believe "committed" is too weak for the test as well, since we really want to know if we can touch the object, i.e. if it is in active region. >> >> I re-wired assertions/verification code to be clear whether we check for heap bounds or actual in-heap conditions. >> >> Deeper testing revealed that reference processing code potentially loads a dead referent, but only to null-check it, or ask bitmap about it. Still, more precise `in_heap` check fails asserts in `CompressedOops::decode`. That required a bit of touchup as well. >> >> Additional testing: >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC` >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` > > Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: > > - Style touchups > - Fixing ShenandoahReferenceProcessor > - Verifier fix Marked as reviewed by wkemper (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20492#pullrequestreview-2238701013 From ayang at openjdk.org Wed Aug 14 16:53:54 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 14 Aug 2024 16:53:54 GMT Subject: RFR: 8338393: Parallel: Remove unused ParallelCompactData::clear_range In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 13:04:40 GMT, Albert Mingkun Yang wrote: > Trivial removing dead code. Thanks for review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20583#issuecomment-2289300150 From ayang at openjdk.org Wed Aug 14 16:53:55 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 14 Aug 2024 16:53:55 GMT Subject: Integrated: 8338393: Parallel: Remove unused ParallelCompactData::clear_range In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 13:04:40 GMT, Albert Mingkun Yang wrote: > Trivial removing dead code. This pull request has now been integrated. Changeset: 0e3903f2 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/0e3903f2eb854715acee92cfc5ee2d4a2e800f61 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod 8338393: Parallel: Remove unused ParallelCompactData::clear_range Reviewed-by: tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/20583 From shade at openjdk.org Wed Aug 14 17:20:51 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Aug 2024 17:20:51 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions [v2] In-Reply-To: References: Message-ID: <0U2BXwyW7GJ0egzsAtm0gk9fHjaWMRC4KaYPRfZQk3s=.3719560b-9904-43bb-8303-f02dc2a0ceb3@github.com> On Tue, 13 Aug 2024 17:56:04 GMT, Roman Kennke wrote: >> Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: >> >> - Style touchups >> - Fixing ShenandoahReferenceProcessor >> - Verifier fix > > src/hotspot/share/gc/shenandoah/shenandoahReferenceProcessor.cpp line 96: > >> 94: // Raw referent, it can be dead. You cannot dereference it, only use for nullptr >> 95: // and bitmap checks. The decoding uses a special-case inlined CompressedOops::decode >> 96: // method that bypasses normal oop-ness checks. > > If you don't want to be treated like an actual oop, you could return a HeapWord* instead. That's still good enough for null- and bitmap-checking. Not sure if it causes a lot of casting around, if so then it's probably not worth it. 
Aha, true, let's do `HeapWord*` instead. This clearly shows this is raw memory. I thought it would cause lots of casting around, but I think the marking context actually accepts `HeapWord*` for its real methods, and we only need to massage its API a little bit. In fact, I think it _saves_ a bit on casts now, since we don't have to cast twice: once in `allocated_after_mark_start`, and then for bitmap itself. I have a version for this in the works. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20492#discussion_r1717297480 From shade at openjdk.org Wed Aug 14 17:58:11 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Aug 2024 17:58:11 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions [v3] In-Reply-To: References: Message-ID: > The expected behavior of `CollectedHeap::is_in` is to check whether the object belongs to the committed parts of the heap: > https://github.com/openjdk/jdk/blob/d19ba81ce12a99de1114c1bfe67392f5aee2104e/src/hotspot/share/gc/shared/collectedHeap.hpp#L273-L276 > > This is useful to check if object resides in the parts of the heap the GC knows are not dead. Yet, Shenandoah's check just verifies that oop is within the heap bounds. So `is_in` check for an object that is in trashed/empty region would pass by accident, and we will miss detecting bugs. This should be rectified. I believe "committed" is too weak for the test as well, since we really want to know if we can touch the object, i.e. if it is in active region. > > I re-wired assertions/verification code to be clear whether we check for heap bounds or actual in-heap conditions. > > Deeper testing revealed that reference processing code potentially loads a dead referent, but only to null-check it, or ask bitmap about it. Still, more precise `in_heap` check fails asserts in `CompressedOops::decode`. That required a bit of touchup as well. 
> > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC` > - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: - Merge branch 'master' into JDK-8337981-shenandoah-is-in - Even stronger is_in check - Use is_in_reserved - Drop raw_referent to HeapWord* - Add some overrides - Merge branch 'master' into JDK-8337981-shenandoah-is-in - Style touchups - Fixing ShenandoahReferenceProcessor - Verifier fix - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20492/files - new: https://git.openjdk.org/jdk/pull/20492/files/69c66853..0a0859a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20492&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20492&range=01-02 Stats: 12566 lines in 452 files changed: 5421 ins; 5523 del; 1622 mod Patch: https://git.openjdk.org/jdk/pull/20492.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20492/head:pull/20492 PR: https://git.openjdk.org/jdk/pull/20492 From shade at openjdk.org Wed Aug 14 17:58:12 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Aug 2024 17:58:12 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions [v2] In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 16:40:13 GMT, Roman Kennke wrote: >> Not sure I understand. The if-condition checks that we are pointing into heap. This means `heap_region_containing` always returns the region. > > heap_region_containing() returns the region for which bottom <= p < end, my question is if we should check if bottom <= p < top, in other words if p is within the region's currently allocated part (as opposed to the unallocated tail, if any). 
Right, we should check that as well. Done in new commit, but I have to re-run the tests now, because something might trigger on this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20492#discussion_r1717345515 From rkennke at openjdk.org Wed Aug 14 18:42:54 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 14 Aug 2024 18:42:54 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions [v3] In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 17:58:11 GMT, Aleksey Shipilev wrote: >> The expected behavior of `CollectedHeap::is_in` is to check whether the object belongs to the committed parts of the heap: >> https://github.com/openjdk/jdk/blob/d19ba81ce12a99de1114c1bfe67392f5aee2104e/src/hotspot/share/gc/shared/collectedHeap.hpp#L273-L276 >> >> This is useful to check if object resides in the parts of the heap the GC knows are not dead. Yet, Shenandoah's check just verifies that oop is within the heap bounds. So `is_in` check for an object that is in trashed/empty region would pass by accident, and we will miss detecting bugs. This should be rectified. I believe "committed" is too weak for the test as well, since we really want to know if we can touch the object, i.e. if it is in active region. >> >> I re-wired assertions/verification code to be clear whether we check for heap bounds or actual in-heap conditions. >> >> Deeper testing revealed that reference processing code potentially loads a dead referent, but only to null-check it, or ask bitmap about it. Still, more precise `in_heap` check fails asserts in `CompressedOops::decode`. That required a bit of touchup as well. >> >> Additional testing: >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC` >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' into JDK-8337981-shenandoah-is-in > - Even stronger is_in check > - Use is_in_reserved > - Drop raw_referent to HeapWord* > - Add some overrides > - Merge branch 'master' into JDK-8337981-shenandoah-is-in > - Style touchups > - Fixing ShenandoahReferenceProcessor > - Verifier fix > - Fix Looks good now. You might want to improve the asserts similar to is_in() (up to you). src/hotspot/share/gc/shenandoah/shenandoahAsserts.cpp line 214: > 212: > 213: ShenandoahHeapRegion* obj_reg = heap->heap_region_containing(obj); > 214: if (!heap->is_full_gc_move_in_progress() && !obj_reg->is_active()) { This is essentially ShenandoahHeap::is_in(), right? Might also want to check for obj < region->top() here? Or use is_in() to begin with? src/hotspot/share/gc/shenandoah/shenandoahAsserts.cpp line 249: > 247: // Step 3. Check that forwardee points to correct region, unless we are in Full GC. > 248: if (!heap->is_full_gc_move_in_progress()) { > 249: if (!fwd_reg->is_active()) { Same here? src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 740: > 738: // Now check if we point to a live section in active region. > 739: ShenandoahHeapRegion* r = heap_region_containing(p); > 740: return (r->is_active() && p < r->top()); I'm kinda surprised that there is no ShHeapRegion::is_in() (or variants), but ok. *shrugs* ------------- Marked as reviewed by rkennke (Reviewer). 
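For readers following the `is_in` discussion, the difference between a pure bounds check and the stronger "active region, below top" check can be sketched with a toy model. Everything here is an assumption made for illustration: `ToyHeap` and `ToyRegion` are hypothetical names, addresses are plain integers, and the Full GC special case mentioned in the thread is left out. It is not the Shenandoah implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy region: [bottom, end) is the region, [bottom, top) is
// the currently allocated part.
struct ToyRegion {
  std::uintptr_t bottom;
  std::uintptr_t top;
  std::uintptr_t end;
  bool active;  // false for empty or trashed regions
};

// Hypothetical toy heap made of equally sized, contiguous regions.
class ToyHeap {
 public:
  ToyHeap(std::uintptr_t base, std::size_t region_size, std::size_t num_regions)
      : _base(base), _region_size(region_size) {
    assert(region_size > 0 && num_regions > 0);
    for (std::size_t i = 0; i < num_regions; i++) {
      std::uintptr_t b = base + i * region_size;
      _regions.push_back(ToyRegion{b, b, b + region_size, false});
    }
  }

  // Pure bounds check: is p anywhere inside the reserved address range?
  bool is_in_reserved(std::uintptr_t p) const {
    return p >= _base && p < _base + _region_size * _regions.size();
  }

  // Regions cover the heap exactly, so any in-bounds p maps to a region.
  ToyRegion& region_containing(std::uintptr_t p) {
    assert(is_in_reserved(p));
    return _regions[(p - _base) / _region_size];
  }

  // Stronger check: p must be in bounds, in an active region, and inside
  // the allocated part of that region (p < top).
  bool is_in(std::uintptr_t p) const {
    if (!is_in_reserved(p)) {
      return false;
    }
    const ToyRegion& r = _regions[(p - _base) / _region_size];
    return r.active && p < r.top;
  }

 private:
  std::uintptr_t _base;
  std::size_t _region_size;
  std::vector<ToyRegion> _regions;
};
```

In this model an address in an inactive region, or in the unallocated tail of an active region, passes `is_in_reserved` but fails `is_in`, which is the class of bug the stronger check is meant to expose.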
PR Review: https://git.openjdk.org/jdk/pull/20492#pullrequestreview-2238921696 PR Review Comment: https://git.openjdk.org/jdk/pull/20492#discussion_r1717387502 PR Review Comment: https://git.openjdk.org/jdk/pull/20492#discussion_r1717388712 PR Review Comment: https://git.openjdk.org/jdk/pull/20492#discussion_r1717391321 From duke at openjdk.org Wed Aug 14 22:18:57 2024 From: duke at openjdk.org (Satyen Subramaniam) Date: Wed, 14 Aug 2024 22:18:57 GMT Subject: RFR: 8336914: Shenandoah: Missing verification steps after JDK-8255765 Message-ID: Adding before-update-refs verification step which was removed in previous revision [JDK-8255765](https://bugs.openjdk.org/browse/JDK-8255765) as directed by [JDK-8336914](https://bugs.openjdk.org/browse/JDK-8336914) ------------- Commit messages: - Setting update ref to in-progress after verification - Adding verify_before_updaterefs() to concurrent op_init_updaterefs() Changes: https://git.openjdk.org/jdk/pull/20364/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20364&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336914 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20364.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20364/head:pull/20364 PR: https://git.openjdk.org/jdk/pull/20364 From duke at openjdk.org Wed Aug 14 22:19:01 2024 From: duke at openjdk.org (Satyen Subramaniam) Date: Wed, 14 Aug 2024 22:19:01 GMT Subject: RFR: 8336915: Shenandoah: Remove unused ShenandoahVerifier::verify_after_evacuation Message-ID: Removing the `verify_after_evacuation()` function, since last use was removed in [JDK-8240868](https://bugs.openjdk.org/browse/JDK-8240868) as directed by [JDK-8336915](https://bugs.openjdk.org/browse/JDK-8336915) ------------- Commit messages: - Removing verify_after_evacuation() Changes: https://git.openjdk.org/jdk/pull/20365/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20365&range=00 Issue: 
https://bugs.openjdk.org/browse/JDK-8336915 Stats: 13 lines in 2 files changed: 0 ins; 13 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20365.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20365/head:pull/20365 PR: https://git.openjdk.org/jdk/pull/20365 From shade at openjdk.org Wed Aug 14 22:18:57 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Aug 2024 22:18:57 GMT Subject: RFR: 8336914: Shenandoah: Missing verification steps after JDK-8255765 In-Reply-To: References: Message-ID: <9eeovjoPtca79OrDX2gGPPI-1nR_rT6lGCK0wzrLCmY=.edfd77b7-15ea-4dc9-b7df-dc03b8909cfe@github.com> On Fri, 26 Jul 2024 21:20:27 GMT, Satyen Subramaniam wrote: > Adding before-update-refs verification step which was removed in previous revision [JDK-8255765](https://bugs.openjdk.org/browse/JDK-8255765) as directed by [JDK-8336914](https://bugs.openjdk.org/browse/JDK-8336914) Looks right. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20364#pullrequestreview-2210596385 From shade at openjdk.org Wed Aug 14 22:19:01 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Aug 2024 22:19:01 GMT Subject: RFR: 8336915: Shenandoah: Remove unused ShenandoahVerifier::verify_after_evacuation In-Reply-To: References: Message-ID: On Fri, 26 Jul 2024 21:26:01 GMT, Satyen Subramaniam wrote: > Removing the `verify_after_evacuation()` function, since last use was removed in [JDK-8240868](https://bugs.openjdk.org/browse/JDK-8240868) as directed by [JDK-8336915](https://bugs.openjdk.org/browse/JDK-8336915) This looks fine and trivial. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20365#pullrequestreview-2210573281 From ayang at openjdk.org Thu Aug 15 07:05:15 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 15 Aug 2024 07:05:15 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC Message-ID: Extend `SplitInfo` to support more fine-grained splitting to mitigate the fragmentation issue during full GC. Added comments and diagrams in the process. For easier review, it's best to start with `SplitInfo` and then proceed to see how it is constructed in `summarize_split_space` and consumed in `first_src_addr`. The accompanying diagrams should help create a clear mental image. Test: tier1-8, systemgc micro bm, CacheStress, dacapo, specjbb2005, specjvm2008 ------------- Commit messages: - pgc-split-region Changes: https://git.openjdk.org/jdk/pull/20590/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20590&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338440 Stats: 569 lines in 2 files changed: 210 ins; 143 del; 216 mod Patch: https://git.openjdk.org/jdk/pull/20590.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20590/head:pull/20590 PR: https://git.openjdk.org/jdk/pull/20590 From iwalulya at openjdk.org Thu Aug 15 08:28:59 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 15 Aug 2024 08:28:59 GMT Subject: Integrated: 8336086: G1: Use one G1CardSet instance for all young regions In-Reply-To: References: Message-ID: On Thu, 11 Jul 2024 09:45:37 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to assign a single G1CardSet to all young regions. As young regions are collected at the same time, and we do not have young-to-young remembered sets, we can maintain a single G1CardSet for all young regions. > > This reduces the memory overhead of the G1CardSets and the time taken to merge per region G1CardSets during GC pause. > > Testing: Tier 1-5 This pull request has now been integrated.
Changeset: f536f5ab Author: Ivan Walulya URL: https://git.openjdk.org/jdk/commit/f536f5ab68235d27e9708674f707bcbff7840730 Stats: 183 lines in 21 files changed: 150 ins; 10 del; 23 mod 8336086: G1: Use one G1CardSet instance for all young regions Reviewed-by: tschatzl, ayang ------------- PR: https://git.openjdk.org/jdk/pull/20134 From iwalulya at openjdk.org Thu Aug 15 08:28:58 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 15 Aug 2024 08:28:58 GMT Subject: RFR: 8336086: G1: Use one G1CardSet instance for all young regions [v2] In-Reply-To: References: Message-ID: On Thu, 8 Aug 2024 08:30:31 GMT, Albert Mingkun Yang wrote: >> Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: >> >> - Albert Review >> - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet >> - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet >> - cleanup >> - merge >> - Merge remote-tracking branch 'upstream/master' into YoungOnlyCardSet >> - init > > Marked as reviewed by ayang (Reviewer). Thanks @albertnetymk and @tschatzl for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20134#issuecomment-2290879903 From aboldtch at openjdk.org Thu Aug 15 08:44:48 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 15 Aug 2024 08:44:48 GMT Subject: RFR: 8337658: ZGC: Move soft reference handling out of the driver loop function In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 12:19:04 GMT, Stefan Karlsson wrote: > The ZDriver code is written to be neat and have a clear outline. The soft reference handling distracts when reading this code. I propose that we hide it a bit. > > I've also clarified in comments and names that the code is dealing with clearing of *all* references. lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20418#pullrequestreview-2239968081 From eosterlund at openjdk.org Thu Aug 15 08:56:48 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 15 Aug 2024 08:56:48 GMT Subject: RFR: 8337658: ZGC: Move soft reference handling out of the driver loop function In-Reply-To: References: Message-ID: On Thu, 1 Aug 2024 12:19:04 GMT, Stefan Karlsson wrote: > The ZDriver code is written to be neat and have a clear outline. The soft reference handling distracts when reading this code. I propose that we hide it a bit. > > I've also clarified in comments and names that the code is dealing with clearing of *all* references. This code sure has moved around a lot! Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20418#pullrequestreview-2239990474 From shade at openjdk.org Thu Aug 15 11:13:58 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Aug 2024 11:13:58 GMT Subject: RFR: 8338444: Shenandoah: Remove ShenandoahHumongousThreshold tunable Message-ID: See the bug for rationale. 
Additional testing: - [ ] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` - [ ] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/20593/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20593&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338444 Stats: 246 lines in 10 files changed: 4 ins; 234 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/20593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20593/head:pull/20593 PR: https://git.openjdk.org/jdk/pull/20593 From rkennke at openjdk.org Thu Aug 15 15:32:50 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 15 Aug 2024 15:32:50 GMT Subject: RFR: 8338444: Shenandoah: Remove ShenandoahHumongousThreshold tunable In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 11:10:03 GMT, Aleksey Shipilev wrote: > See the bug for rationale. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC` Yeah I agree, that makes sense. I don't think anybody would want humongous objects, if they can be avoided, there is no advantage in that, afaict. Patch looks good. ------------- Marked as reviewed by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20593#pullrequestreview-2240656481 From wkemper at openjdk.org Thu Aug 15 16:01:48 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 15 Aug 2024 16:01:48 GMT Subject: RFR: 8338444: Shenandoah: Remove ShenandoahHumongousThreshold tunable In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 11:10:03 GMT, Aleksey Shipilev wrote: > See the bug for rationale. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC` Marked as reviewed by wkemper (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20593#pullrequestreview-2240724488 From ysr at openjdk.org Thu Aug 15 16:18:49 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 15 Aug 2024 16:18:49 GMT Subject: RFR: 8338444: Shenandoah: Remove ShenandoahHumongousThreshold tunable In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 11:10:03 GMT, Aleksey Shipilev wrote: > See the bug for rationale. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC` Marked as reviewed by ysr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20593#pullrequestreview-2240762386 From shade at openjdk.org Thu Aug 15 16:48:52 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Aug 2024 16:48:52 GMT Subject: RFR: 8338444: Shenandoah: Remove ShenandoahHumongousThreshold tunable In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 11:10:03 GMT, Aleksey Shipilev wrote: > See the bug for rationale. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC` Thank you all! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20593#issuecomment-2291697545 From shade at openjdk.org Thu Aug 15 16:48:53 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Aug 2024 16:48:53 GMT Subject: Integrated: 8338444: Shenandoah: Remove ShenandoahHumongousThreshold tunable In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 11:10:03 GMT, Aleksey Shipilev wrote: > See the bug for rationale. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC` This pull request has now been integrated. 
Changeset: ef54af39 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/ef54af39883e76c80a3e012ed91b90973da51bb4 Stats: 246 lines in 10 files changed: 4 ins; 234 del; 8 mod 8338444: Shenandoah: Remove ShenandoahHumongousThreshold tunable Reviewed-by: rkennke, wkemper, ysr ------------- PR: https://git.openjdk.org/jdk/pull/20593 From duke at openjdk.org Thu Aug 15 16:49:58 2024 From: duke at openjdk.org (duke) Date: Thu, 15 Aug 2024 16:49:58 GMT Subject: RFR: 8336914: Shenandoah: Missing verification steps after JDK-8255765 In-Reply-To: References: Message-ID: On Fri, 26 Jul 2024 21:20:27 GMT, Satyen Subramaniam wrote: > Adding before-update-refs verification step which was removed in previous revision [JDK-8255765](https://bugs.openjdk.org/browse/JDK-8255765) as directed by [JDK-8336914](https://bugs.openjdk.org/browse/JDK-8336914) @satyenme Your change (at version 56ddb4a5c670bf7eeb4fa4f2436412ddb57a371c) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20364#issuecomment-2291699568 From duke at openjdk.org Thu Aug 15 16:49:58 2024 From: duke at openjdk.org (Satyen Subramaniam) Date: Thu, 15 Aug 2024 16:49:58 GMT Subject: Integrated: 8336914: Shenandoah: Missing verification steps after JDK-8255765 In-Reply-To: References: Message-ID: On Fri, 26 Jul 2024 21:20:27 GMT, Satyen Subramaniam wrote: > Adding before-update-refs verification step which was removed in previous revision [JDK-8255765](https://bugs.openjdk.org/browse/JDK-8255765) as directed by [JDK-8336914](https://bugs.openjdk.org/browse/JDK-8336914) This pull request has now been integrated. 
Changeset: e51e40c2 Author: Satyen Subramaniam Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/e51e40c2b9f51d012c01407e0b8dadaab464753e Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod 8336914: Shenandoah: Missing verification steps after JDK-8255765 Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/20364 From duke at openjdk.org Thu Aug 15 16:50:55 2024 From: duke at openjdk.org (duke) Date: Thu, 15 Aug 2024 16:50:55 GMT Subject: RFR: 8336915: Shenandoah: Remove unused ShenandoahVerifier::verify_after_evacuation In-Reply-To: References: Message-ID: <5sqCY3fLA6fRX-P-k3PzQctEuAHZDpQawVkO418ZXo4=.ed67066d-9ac1-4ad2-8daa-7111f26d6dbd@github.com> On Fri, 26 Jul 2024 21:26:01 GMT, Satyen Subramaniam wrote: > Removing the `verify_after_evacuation()` function, since last use was removed in [JDK-8240868](https://bugs.openjdk.org/browse/JDK-8240868) as directed by [JDK-8336915](https://bugs.openjdk.org/browse/JDK-8336915) @satyenme Your change (at version b28884204d277460da82f309481724f9b87d2c28) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20365#issuecomment-2291701903 From duke at openjdk.org Thu Aug 15 16:50:55 2024 From: duke at openjdk.org (Satyen Subramaniam) Date: Thu, 15 Aug 2024 16:50:55 GMT Subject: Integrated: 8336915: Shenandoah: Remove unused ShenandoahVerifier::verify_after_evacuation In-Reply-To: References: Message-ID: <6b7P_bRKAyfp4a34tdXlaRMAaXD4Cdn6fK9BlEkXHHM=.9d00c3a5-258a-4390-b6ec-4ab285c20f40@github.com> On Fri, 26 Jul 2024 21:26:01 GMT, Satyen Subramaniam wrote: > Removing the `verify_after_evacuation()` function, since last use was removed in [JDK-8240868](https://bugs.openjdk.org/browse/JDK-8240868) as directed by [JDK-8336915](https://bugs.openjdk.org/browse/JDK-8336915) This pull request has now been integrated. 
Changeset: f308b2d5 Author: Satyen Subramaniam Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/f308b2d59672b39ddca502baff50ab20ab781047 Stats: 13 lines in 2 files changed: 0 ins; 13 del; 0 mod 8336915: Shenandoah: Remove unused ShenandoahVerifier::verify_after_evacuation Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/20365 From ayang at openjdk.org Fri Aug 16 07:11:58 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 16 Aug 2024 07:11:58 GMT Subject: RFR: 8338490: Serial: Move Generation::print_on to subclasses Message-ID: Trivial inlining a virtual method to subclasses and some cleanup to related methods. The gc-log is slightly updated due to the change of the name of generations. Log before&after shown below: # baseline [2.417s][debug][gc,heap] GC(0) def new generation total 153600K, used 76645K [0x000000060d800000, 0x0000000617ea0000, 0x00000006b3aa0000) [2.417s][debug][gc,heap] GC(0) eden space 136576K, 56% used [0x000000060d800000, 0x00000006122d9538, 0x0000000615d60000) [2.417s][debug][gc,heap] GC(0) from space 17024K, 0% used [0x0000000615d60000, 0x0000000615d60000, 0x0000000616e00000) [2.417s][debug][gc,heap] GC(0) to space 17024K, 0% used [0x0000000616e00000, 0x0000000616e00000, 0x0000000617ea0000) [2.417s][debug][gc,heap] GC(0) tenured generation total 341376K, used 0K [0x00000006b3aa0000, 0x00000006c8800000, 0x0000000800000000) [2.417s][debug][gc,heap] GC(0) the space 341376K, 0% used [0x00000006b3aa0000, 0x00000006b3aa0000, 0x00000006c8800000) # new [9.846s][debug][gc,heap] GC(0) DefNew total 153600K, used 71165K [0x000000060d800000, 0x0000000617ea0000, 0x00000006b3aa0000) [9.846s][debug][gc,heap] GC(0) eden space 136576K, 52% used [0x000000060d800000, 0x0000000611d7f708, 0x0000000615d60000) [9.846s][debug][gc,heap] GC(0) from space 17024K, 0% used [0x0000000615d60000, 0x0000000615d60000, 0x0000000616e00000) [9.846s][debug][gc,heap] GC(0) to space 17024K, 0% used [0x0000000616e00000, 
0x0000000616e00000, 0x0000000617ea0000) [9.846s][debug][gc,heap] GC(0) Tenured total 341376K, used 0K [0x00000006b3aa0000, 0x00000006c8800000, 0x0000000800000000) [9.846s][debug][gc,heap] GC(0) the space 341376K, 0% used [0x00000006b3aa0000, 0x00000006b3aa0000, 0x00000006c8800000) ------------- Commit messages: - s1-print-on Changes: https://git.openjdk.org/jdk/pull/20607/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20607&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338490 Stats: 55 lines in 7 files changed: 16 ins; 26 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/20607.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20607/head:pull/20607 PR: https://git.openjdk.org/jdk/pull/20607 From rcastanedalo at openjdk.org Fri Aug 16 13:06:55 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 16 Aug 2024 13:06:55 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> Message-ID: On Sun, 21 Jul 2024 08:27:52 GMT, Martin Doerr wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Build barrier data in G1BarrierSetC2::get_store_barrier() by adding, rather than removing, barrier tags > > src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 123: > >> 121: if ((barrier_data() & G1C2BarrierPost) != 0) { >> 122: __ movl($tmp2$$Register, $src$$Register); >> 123: if ((barrier_data() & G1C2BarrierPostNotNull) == 0) { > > `decode_heap_oop` contains a null check in some cases which makes some of your code redundant.
Optimization idea: In case of `(((barrier_data() & G1C2BarrierPostNotNull) == 0) && CompressedOops::base() != nullptr)` use a null check and bail out because there's nothing left to do if it's null. After that, we can always use `decode_heap_oop_not_null`. Also note that the latter supports specifying different src and dst registers which saves the extra move operation. Thanks for the suggestion, Martin! I have prototyped the optimization [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `RemoveRedundantNullChecks`), and in my opinion its expected benefit does not justify the additional complexity, especially since the scope is limited (in my earlier experiments, most of the stores are implemented with `g1EncodePAndStoreN` rather than `g1StoreN`, plus the optimization only applies to a specific compressed OOPs mode). I have run a few general-purpose benchmarks using a non-zero base compressed oops mode and the optimization did not yield any statistically significant improvement, but please let me know if you have any specific benchmark/configuration in mind and I can re-check. > src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 182: > >> 180: $tmp2$$Register /* pre_val */, >> 181: $tmp3$$Register /* tmp */, >> 182: RegSet::of($mem$$Register, $newval$$Register, $oldval$$Register) /* preserve */); > > The only value which can get overwritten is `oldval`. Optimization idea: Pass `oldval` to the SATB barrier. There is no load of the old value required. Thanks, I will test and apply this one: it is simple enough and, in a way, even more intuitive than the current solution. For reference, I prototyped it [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `UseOldValInPreBarriers`). It should also apply to `g1CompareAndSwapP`, right? 
> src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 301: > >> 299: RegSet::of($mem$$Register, $newval$$Register) /* preserve */); >> 300: __ movq($tmp1$$Register, $newval$$Register); >> 301: __ xchgq($newval$$Register, Address($mem$$Register, 0)); > > Optimization idea: Despite its name, `g1_pre_write_barrier` can be moved after the xchg operation because there's no safepoint within this MachNode. This allows avoiding loading the old value twice. Thanks, I also prototyped this [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `UseExchangedValueinPreBarriers`). The atomic barrier implementation in this PR is purposefully simple based on 1) the assumption that the cost of the atomic operations will dominate that of their barriers, and 2) the risk of introducing subtle bugs which can be difficult to catch by regular testing. Because of this, I feel hesitant to introduce this kind of optimizations for atomic operation barriers. But I am happy to reconsider, if you have any specific benchmark/configuration in mind where the benefit could outweigh the cost. 
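The reordering idea quoted above can be pictured with a small standalone C++ sketch. This is not HotSpot code: `Obj`, `SatbQueue`, and both functions are hypothetical stand-ins, and `std::atomic` merely plays the role of the `xchgq` instruction. It only illustrates why the value returned by the exchange could feed the pre-barrier, so that the old value does not have to be loaded separately beforehand.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a heap object; not a HotSpot type.
struct Obj { int id; };

// Toy model of an SATB queue: it records overwritten (old) reference
// values while concurrent marking is active.
struct SatbQueue {
  bool marking_active = false;
  std::vector<Obj*> buffer;

  void enqueue_if_needed(Obj* old_val) {
    if (marking_active && old_val != nullptr) {
      buffer.push_back(old_val);
    }
  }
};

// Shape of the simple approach: the pre-barrier loads the old value
// itself, and the exchange then reads the field a second time.
Obj* exchange_barrier_first(std::atomic<Obj*>& field, Obj* new_val, SatbQueue& q) {
  q.enqueue_if_needed(field.load());
  return field.exchange(new_val);
}

// Shape of the proposed variant: exchange first, then hand the value
// the exchange returned to the pre-barrier. No extra load is needed.
Obj* exchange_then_barrier(std::atomic<Obj*>& field, Obj* new_val, SatbQueue& q) {
  Obj* old_val = field.exchange(new_val);
  q.enqueue_if_needed(old_val);
  return old_val;
}
```

The sketch deliberately ignores safepoints and register preservation, which are the aspects the reply above is cautious about.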
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1719811953 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1719812882 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1719814312 From mdoerr at openjdk.org Fri Aug 16 20:58:53 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 16 Aug 2024 20:58:53 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> Message-ID: <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> On Fri, 16 Aug 2024 13:01:28 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 123: >> >>> 121: if ((barrier_data() & G1C2BarrierPost) != 0) { >>> 122: __ movl($tmp2$$Register, $src$$Register); >>> 123: if ((barrier_data() & G1C2BarrierPostNotNull) == 0) { >> >> `decode_heap_oop` contains a null check in some cases which makes some of your code redundant. Optimization idea: In case of `(((barrier_data() & G1C2BarrierPostNotNull) == 0) && CompressedOops::base() != nullptr)` use a null check and bail out because there's nothing left to do if it's null. After that, we can always use `decode_heap_oop_not_null`. Also note that the latter supports specifying different src and dst registers which saves the extra move operation. > > Thanks for the suggestion, Martin!
I have prototyped the optimization [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `RemoveRedundantNullChecks`), and in my opinion its expected benefit does not justify the additional complexity, especially since the scope is limited (in my earlier experiments, most of the stores are implemented with `g1EncodePAndStoreN` rather than `g1StoreN`, plus the optimization only applies to a specific compressed OOPs mode). I have run a few general-purpose benchmarks using a non-zero base compressed oops mode and the optimization did not yield any statistically significant improvement, but please let me know if you have any specific benchmark/configuration in mind and I can re-check. Thanks for trying! I think I should try it on PPC64. The null check can be integrated into the oop decoding, so all Compressed Oops Modes would benefit. It could be that x86 is less sensitive to such optimizations. >> src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 182: >> >>> 180: $tmp2$$Register /* pre_val */, >>> 181: $tmp3$$Register /* tmp */, >>> 182: RegSet::of($mem$$Register, $newval$$Register, $oldval$$Register) /* preserve */); >> >> The only value which can get overwritten is `oldval`. Optimization idea: Pass `oldval` to the SATB barrier. There is no load of the old value required. > > Thanks, I will test and apply this one: it is simple enough and, in a way, even more intuitive than the current solution. For reference, I prototyped it [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `UseOldValInPreBarriers`). It should also apply to `g1CompareAndSwapP`, right? Exactly. 
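As a rough illustration of the `oldval` point agreed on above, here is a standalone C++ sketch. It is not the ADL code from the PR: `Obj`, `SatbQueue`, and `cas_with_oldval_barrier` are hypothetical names, and `std::atomic` stands in for the compare-and-swap instruction. The assumption it models is that a successful compare-and-swap can only overwrite the expected value, so the pre-barrier can be given that value directly instead of loading the field.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a heap object; not a HotSpot type.
struct Obj { int id; };

// Toy model of an SATB queue that records overwritten reference values
// while concurrent marking is active.
struct SatbQueue {
  bool marking_active = false;
  std::vector<Obj*> buffer;

  void enqueue_if_needed(Obj* old_val) {
    if (marking_active && old_val != nullptr) {
      buffer.push_back(old_val);
    }
  }
};

// The pre-barrier receives `expected` (the "oldval" of the instruction)
// without reading the field. If the CAS fails, nothing was overwritten
// and the enqueue was merely conservative.
bool cas_with_oldval_barrier(std::atomic<Obj*>& field, Obj* expected,
                             Obj* new_val, SatbQueue& q) {
  q.enqueue_if_needed(expected);
  // compare_exchange_strong updates its first argument on failure;
  // `expected` is a by-value parameter, so the caller is unaffected.
  return field.compare_exchange_strong(expected, new_val);
}
```

Enqueueing before a compare-and-swap that may fail records a value that was not actually overwritten; in this toy model that is harmless, only conservative.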
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1720338672 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1720340149 From mdoerr at openjdk.org Fri Aug 16 21:05:51 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 16 Aug 2024 21:05:51 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> Message-ID: <4aS0KysKaIPue1D60nootRPb8m7pdP_2hlPSwGUIk8w=.6b68b278-62ce-4df0-a6c6-c7fee40557aa@github.com> On Fri, 16 Aug 2024 13:03:51 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad line 301: >> >>> 299: RegSet::of($mem$$Register, $newval$$Register) /* preserve */); >>> 300: __ movq($tmp1$$Register, $newval$$Register); >>> 301: __ xchgq($newval$$Register, Address($mem$$Register, 0)); >> >> Optimization idea: Despite its name, `g1_pre_write_barrier` can be moved after the xchg operation because there's no safepoint within this MachNode. This allows avoiding loading the old value twice. > > Thanks, I also prototyped this [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `UseExchangedValueinPreBarriers`). The atomic barrier implementation in this PR is purposefully simple based on 1) the assumption that the cost of the atomic operations will dominate that of their barriers, and 2) the risk of introducing subtle bugs which can be difficult to catch by regular testing. Because of this, I feel hesitant to introduce this kind of optimizations for atomic operation barriers. But I am happy to reconsider, if you have any specific benchmark/configuration in mind where the benefit could outweigh the cost.
Note that we had such an optimization already in C2: https://github.com/openjdk/jdk8u-dev/blob/4106121e0ae42d644e45c6eab9037874110ed670/hotspot/src/share/vm/opto/library_call.cpp#L3114 But, it's probably not a big deal. Maybe I can try it on PPC64 which may be more sensitive to accesses on contended memory. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1720347473 From rcastanedalo at openjdk.org Mon Aug 19 08:53:30 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 19 Aug 2024 08:53:30 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v9] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Pass oldval to the pre-barrier in g1CompareAndExchange/SwapP ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/554de779..92112802 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=07-08 Stats: 28 lines in 3 files changed: 12 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Mon Aug 19 08:53:30 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 19 Aug 2024 08:53:30 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> Message-ID: On Fri, 16 Aug 2024 20:56:08 GMT, Martin Doerr wrote: >> Thanks, I will test and apply this one: it is simple enough and, in a way, even more intuitive than the current solution. For reference, I prototyped it [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `UseOldValInPreBarriers`). It should also apply to `g1CompareAndSwapP`, right? > > Exactly. Done (commit 9211280).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1721433361 From shade at openjdk.org Mon Aug 19 11:20:06 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 Aug 2024 11:20:06 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions [v4] In-Reply-To: References: Message-ID: <1ozQDeJqM5Cr5jHU_vX7I5SW9BKm0ce6JWo4LqBZdcE=.9eb60c15-17ff-4a70-b0c4-4e132720e2d1@github.com> > The expected behavior of `CollectedHeap::is_in` is to check whether the object belongs to the committed parts of the heap: > https://github.com/openjdk/jdk/blob/d19ba81ce12a99de1114c1bfe67392f5aee2104e/src/hotspot/share/gc/shared/collectedHeap.hpp#L273-L276 > > This is useful to check if object resides in the parts of the heap the GC knows are not dead. Yet, Shenandoah's check just verifies that oop is within the heap bounds. So `is_in` check for an object that is in trashed/empty region would pass by accident, and we will miss detecting bugs. This should be rectified. I believe "committed" is too weak for the test as well, since we really want to know if we can touch the object, i.e. if it is in active region. > > I re-wired assertions/verification code to be clear whether we check for heap bounds or actual in-heap conditions. > > Deeper testing revealed that reference processing code potentially loads a dead referent, but only to null-check it, or ask bitmap about it. Still, more precise `in_heap` check fails asserts in `CompressedOops::decode`. That required a bit of touchup as well. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC` > - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 12 additional commits since the last revision: - Review feedback - Merge branch 'master' into JDK-8337981-shenandoah-is-in - Merge branch 'master' into JDK-8337981-shenandoah-is-in - Even stronger is_in check - Use is_in_reserved - Drop raw_referent to HeapWord* - Add some overrides - Merge branch 'master' into JDK-8337981-shenandoah-is-in - Style touchups - Fixing ShenandoahReferenceProcessor - ... and 2 more: https://git.openjdk.org/jdk/compare/ed2af769...2da163d1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20492/files - new: https://git.openjdk.org/jdk/pull/20492/files/0a0859a4..2da163d1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20492&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20492&range=02-03 Stats: 11187 lines in 265 files changed: 7236 ins; 2667 del; 1284 mod Patch: https://git.openjdk.org/jdk/pull/20492.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20492/head:pull/20492 PR: https://git.openjdk.org/jdk/pull/20492 From shade at openjdk.org Mon Aug 19 11:20:08 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 Aug 2024 11:20:08 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions [v3] In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 18:31:32 GMT, Roman Kennke wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 10 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8337981-shenandoah-is-in >> - Even stronger is_in check >> - Use is_in_reserved >> - Drop raw_referent to HeapWord* >> - Add some overrides >> - Merge branch 'master' into JDK-8337981-shenandoah-is-in >> - Style touchups >> - Fixing ShenandoahReferenceProcessor >> - Verifier fix >> - Fix > > src/hotspot/share/gc/shenandoah/shenandoahAsserts.cpp line 214: > >> 212: >> 213: ShenandoahHeapRegion* obj_reg = heap->heap_region_containing(obj); >> 214: if (!heap->is_full_gc_move_in_progress() && !obj_reg->is_active()) { > > This is essentially ShenandoahHeap::is_in(), right? Might also want to check for obj < region->top() here? Or use is_in() to begin with? All right, true. We might just use `is_in` here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20492#discussion_r1721629410 From rcastanedalo at openjdk.org Mon Aug 19 12:19:51 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 19 Aug 2024 12:19:51 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> Message-ID: On Fri, 16 Aug 2024 20:54:25 GMT, Martin Doerr wrote: > The null check can be integrated into the oop decoding, so all Compressed Oops Modes would benefit. But integrating the null check into the zero-based OOP decoding operation would require adding a conditional branch to OOP decoding (as prototyped [here](https://github.com/robcasloz/jdk/blob/ac71b1a02c8c1bd4989f762d27092fd3bf19ccd7/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L5859-L5875)), right? 
This would effectively mean moving the post-barrier null check above the post-barrier inter-region check, which I am not sure is beneficial. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1721697447 From rcastanedalo at openjdk.org Mon Aug 19 12:22:52 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 19 Aug 2024 12:22:52 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <4aS0KysKaIPue1D60nootRPb8m7pdP_2hlPSwGUIk8w=.6b68b278-62ce-4df0-a6c6-c7fee40557aa@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <4aS0KysKaIPue1D60nootRPb8m7pdP_2hlPSwGUIk8w=.6b68b278-62ce-4df0-a6c6-c7fee40557aa@github.com> Message-ID: On Fri, 16 Aug 2024 21:03:14 GMT, Martin Doerr wrote: >> Thanks, I also prototyped this [here](https://github.com/robcasloz/jdk/blob/JDK-8334060-g1-late-barrier-expansion-x64-optimizations/src/hotspot/cpu/x86/gc/g1/g1_x86_64.ad) (guarded by `UseExchangedValueinPreBarriers`). The atomic barrier implementation in this PR is purposefully simple based on 1) the assumption that the cost of the atomic operations will dominate that of their barriers, and 2) the risk of introducing subtle bugs which can be difficult to catch by regular testing. Because of this, I feel hesitant to introduce this kind of optimizations for atomic operation barriers. But I am happy to reconsider, if you have any specific benchmark/configuration in mind where the benefit could outweigh the cost. > > Note that we had such an optimization already in C2: https://github.com/openjdk/jdk8u-dev/blob/4106121e0ae42d644e45c6eab9037874110ed670/hotspot/src/share/vm/opto/library_call.cpp#L3114 > But, it's probably not a big deal. Maybe I can try it on PPC64 which may be more sensitive to accesses on contended memory. Thanks for the reference, I would still prefer to keep this part as is for simplicity. 
We can always optimize the atomic barriers in follow-up work, if a need arises. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1721701794 From duke at openjdk.org Mon Aug 19 13:23:00 2024 From: duke at openjdk.org (duke) Date: Mon, 19 Aug 2024 13:23:00 GMT Subject: Withdrawn: 8331432: Clean up comments in GenArguments::initialize_size_info() In-Reply-To: References: Message-ID: On Tue, 14 May 2024 00:40:33 GMT, xiaotaonan wrote: > Clean up comments in GenArguments::initialize_size_info() This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/19223 From mdoerr at openjdk.org Mon Aug 19 13:45:55 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 19 Aug 2024 13:45:55 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> Message-ID: On Mon, 19 Aug 2024 12:16:44 GMT, Roberto Castañeda Lozano wrote: >> Thanks for trying! I think I should try it on PPC64. The null check can be integrated into the oop decoding, so all Compressed Oops Modes would benefit. It could be that x86 is less sensitive to such optimizations. > >> The null check can be integrated into the oop decoding, so all Compressed Oops Modes would benefit. > > But integrating the null check into the zero-based OOP decoding operation would require adding a conditional branch to OOP decoding (as prototyped [here](https://github.com/robcasloz/jdk/blob/ac71b1a02c8c1bd4989f762d27092fd3bf19ccd7/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L5859-L5875)), right? This would effectively mean moving the post-barrier null check above the post-barrier inter-region check, which I am not sure is beneficial.
In case of heap base != null, a branch already exists which makes the other null check redundant. So, we have null check, region crossing check, another null check. Maybe this compressed oop mode is not important enough. For the other compressed oop modes, yes, this means moving the null check above the region crossing check. On PPC64, the null check can be combined with the shift instruction, so we save one compare instruction. Technically, it would even be possible to use only one branch instruction for both checks, but I'm not sure if it's worth the complexity. I'll think about it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1721813878 From rcastanedalo at openjdk.org Mon Aug 19 14:27:53 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 19 Aug 2024 14:27:53 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> Message-ID: <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> On Mon, 19 Aug 2024 13:43:04 GMT, Martin Doerr wrote: >>> The null check can be integrated into the oop decoding, so all Compressed Oops Modes would benefit. >> >> But integrating the null check into the zero-based OOP decoding operation would require adding a conditional branch to OOP decoding (as prototyped [here](https://github.com/robcasloz/jdk/blob/ac71b1a02c8c1bd4989f762d27092fd3bf19ccd7/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L5859-L5875)), right? This would effectively mean moving the post-barrier null check above the post-barrier inter-region check, which I am not sure is beneficial. > > In case of heap base != null, a branch already exists which makes the other null check redundant.
So, we have null check, region crossing check, another null check. Maybe this compressed oop mode is not important enough. > > For the other compressed oop modes, yes, this means moving the null check above the region crossing check. On PPC64, the null check can be combined with the shift instruction, so we save one compare instruction. Technically, it would even be possible to use only one branch instruction for both checks, but I'm not sure if it's worth the complexity. I'll think about it. OK, thanks. I just ran some benchmarks with zero-based OOP compression ([prototype here](https://github.com/robcasloz/jdk/tree/JDK-8334060-g1-late-barrier-expansion-x64-optimizations)) and could not observe any significant performance effect on three different x64 implementations. I think I will keep the `g1StoreN` implementation as-is in the x64 and aarch64 backends, for simplicity. Again, we can revisit this in follow-up work if need be. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1721881065 From rkennke at openjdk.org Mon Aug 19 17:31:52 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 19 Aug 2024 17:31:52 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions [v4] In-Reply-To: <1ozQDeJqM5Cr5jHU_vX7I5SW9BKm0ce6JWo4LqBZdcE=.9eb60c15-17ff-4a70-b0c4-4e132720e2d1@github.com> References: <1ozQDeJqM5Cr5jHU_vX7I5SW9BKm0ce6JWo4LqBZdcE=.9eb60c15-17ff-4a70-b0c4-4e132720e2d1@github.com> Message-ID: On Mon, 19 Aug 2024 11:20:06 GMT, Aleksey Shipilev wrote: >> The expected behavior of `CollectedHeap::is_in` is to check whether the object belongs to the committed parts of the heap: >> https://github.com/openjdk/jdk/blob/d19ba81ce12a99de1114c1bfe67392f5aee2104e/src/hotspot/share/gc/shared/collectedHeap.hpp#L273-L276 >> >> This is useful to check if object resides in the parts of the heap the GC knows are not dead. Yet, Shenandoah's check just verifies that oop is within the heap bounds. 
So `is_in` check for an object that is in trashed/empty region would pass by accident, and we will miss detecting bugs. This should be rectified. I believe "committed" is too weak for the test as well, since we really want to know if we can touch the object, i.e. if it is in active region. >> >> I re-wired assertions/verification code to be clear whether we check for heap bounds or actual in-heap conditions. >> >> Deeper testing revealed that reference processing code potentially loads a dead referent, but only to null-check it, or ask bitmap about it. Still, more precise `in_heap` check fails asserts in `CompressedOops::decode`. That required a bit of touchup as well. >> >> Additional testing: >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC` >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Review feedback > - Merge branch 'master' into JDK-8337981-shenandoah-is-in > - Merge branch 'master' into JDK-8337981-shenandoah-is-in > - Even stronger is_in check > - Use is_in_reserved > - Drop raw_referent to HeapWord* > - Add some overrides > - Merge branch 'master' into JDK-8337981-shenandoah-is-in > - Style touchups > - Fixing ShenandoahReferenceProcessor > - ... and 2 more: https://git.openjdk.org/jdk/compare/11b06bb1...2da163d1 Looks good, thanks! ------------- Marked as reviewed by rkennke (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20492#pullrequestreview-2246095728 From shade at openjdk.org Mon Aug 19 18:51:52 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 Aug 2024 18:51:52 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions [v2] In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 16:43:02 GMT, William Kemper wrote: >> Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: >> >> - Style touchups >> - Fixing ShenandoahReferenceProcessor >> - Verifier fix > > Marked as reviewed by wkemper (Committer). @earthling-amzn -- this might cause more "fun" down the line with GenShen merges. Tell me when you want this to appear in mainline. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20492#issuecomment-2297219193 From wkemper at openjdk.org Mon Aug 19 23:30:51 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 19 Aug 2024 23:30:51 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions [v2] In-Reply-To: References: Message-ID: On Wed, 14 Aug 2024 16:43:02 GMT, William Kemper wrote: >> Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: >> >> - Style touchups >> - Fixing ShenandoahReferenceProcessor >> - Verifier fix > > Marked as reviewed by wkemper (Committer). > @earthling-amzn -- this might cause more "fun" down the line with GenShen merges. Tell me when you want this to appear in mainline. I don't want the GenShen PR to hold anything up. Feel free to integrate, we'll deal with any repercussions. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20492#issuecomment-2297687138 From gli at openjdk.org Tue Aug 20 08:33:48 2024 From: gli at openjdk.org (Guoxiong Li) Date: Tue, 20 Aug 2024 08:33:48 GMT Subject: RFR: 8337658: ZGC: Move soft reference handling out of the driver loop function In-Reply-To: References: Message-ID: <5CLxq71h2WzrhzVpRdRgw_QWeRnTnBL5YMQ6Zeo_8xU=.ed328b8d-0309-486f-8c36-e3d7f099411e@github.com> On Thu, 1 Aug 2024 12:19:04 GMT, Stefan Karlsson wrote: > The ZDriver code is written to be neat and have a clear outline. The soft reference handling distracts when reading this code. I propose that we hide it a bit. > > I've also clarified in comments and names that the code is dealing with clearing of *all* references. I don't know whether `_uses_clear_all_soft_references_policy` is a good name. I may be confused by the name if I read the code first time. One nit shown below. src/hotspot/share/gc/z/zReferenceProcessor.hpp line 73: > 71: > 72: void set_soft_reference_policy(bool clear_all_soft_references); > 73: bool uses_clear_all_soft_reference_policy() const; The method name should be `uses_clear_all_soft_references_policy`. Please note the letter `s` after the `reference`. ------------- Changes requested by gli (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20418#pullrequestreview-2247312576 PR Review Comment: https://git.openjdk.org/jdk/pull/20418#discussion_r1722903702 From shade at openjdk.org Tue Aug 20 08:43:54 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 20 Aug 2024 08:43:54 GMT Subject: RFR: 8337981: ShenandoahHeap::is_in should check for alive regions [v4] In-Reply-To: <1ozQDeJqM5Cr5jHU_vX7I5SW9BKm0ce6JWo4LqBZdcE=.9eb60c15-17ff-4a70-b0c4-4e132720e2d1@github.com> References: <1ozQDeJqM5Cr5jHU_vX7I5SW9BKm0ce6JWo4LqBZdcE=.9eb60c15-17ff-4a70-b0c4-4e132720e2d1@github.com> Message-ID: On Mon, 19 Aug 2024 11:20:06 GMT, Aleksey Shipilev wrote: >> The expected behavior of `CollectedHeap::is_in` is to check whether the object belongs to the committed parts of the heap: >> https://github.com/openjdk/jdk/blob/d19ba81ce12a99de1114c1bfe67392f5aee2104e/src/hotspot/share/gc/shared/collectedHeap.hpp#L273-L276 >> >> This is useful to check if object resides in the parts of the heap the GC knows are not dead. Yet, Shenandoah's check just verifies that oop is within the heap bounds. So `is_in` check for an object that is in trashed/empty region would pass by accident, and we will miss detecting bugs. This should be rectified. I believe "committed" is too weak for the test as well, since we really want to know if we can touch the object, i.e. if it is in active region. >> >> I re-wired assertions/verification code to be clear whether we check for heap bounds or actual in-heap conditions. >> >> Deeper testing revealed that reference processing code potentially loads a dead referent, but only to null-check it, or ask bitmap about it. Still, more precise `in_heap` check fails asserts in `CompressedOops::decode`. That required a bit of touchup as well. 
>> >> Additional testing: >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC` >> - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Review feedback > - Merge branch 'master' into JDK-8337981-shenandoah-is-in > - Merge branch 'master' into JDK-8337981-shenandoah-is-in > - Even stronger is_in check > - Use is_in_reserved > - Drop raw_referent to HeapWord* > - Add some overrides > - Merge branch 'master' into JDK-8337981-shenandoah-is-in > - Style touchups > - Fixing ShenandoahReferenceProcessor > - ... and 2 more: https://git.openjdk.org/jdk/compare/b2c2bda0...2da163d1 All right, thanks! Integrating now then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20492#issuecomment-2298300216 From shade at openjdk.org Tue Aug 20 08:43:55 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 20 Aug 2024 08:43:55 GMT Subject: Integrated: 8337981: ShenandoahHeap::is_in should check for alive regions In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 11:51:25 GMT, Aleksey Shipilev wrote: > The expected behavior of `CollectedHeap::is_in` is to check whether the object belongs to the committed parts of the heap: > https://github.com/openjdk/jdk/blob/d19ba81ce12a99de1114c1bfe67392f5aee2104e/src/hotspot/share/gc/shared/collectedHeap.hpp#L273-L276 > > This is useful to check if object resides in the parts of the heap the GC knows are not dead. Yet, Shenandoah's check just verifies that oop is within the heap bounds. So `is_in` check for an object that is in trashed/empty region would pass by accident, and we will miss detecting bugs. This should be rectified. 
I believe "committed" is too weak for the test as well, since we really want to know if we can touch the object, i.e. if it is in active region. > > I re-wired assertions/verification code to be clear whether we check for heap bounds or actual in-heap conditions. > > Deeper testing revealed that reference processing code potentially loads a dead referent, but only to null-check it, or ask bitmap about it. Still, more precise `in_heap` check fails asserts in `CompressedOops::decode`. That required a bit of touchup as well. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC` > - [x] Linux AArch64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` This pull request has now been integrated. Changeset: b9d49dce Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/b9d49dcef22ab81a087d890bbac0329a5244a2ef Stats: 118 lines in 12 files changed: 58 ins; 7 del; 53 mod 8337981: ShenandoahHeap::is_in should check for alive regions Reviewed-by: rkennke, wkemper ------------- PR: https://git.openjdk.org/jdk/pull/20492 From gli at openjdk.org Tue Aug 20 08:55:49 2024 From: gli at openjdk.org (Guoxiong Li) Date: Tue, 20 Aug 2024 08:55:49 GMT Subject: RFR: 8338490: Serial: Move Generation::print_on to subclasses In-Reply-To: References: Message-ID: On Fri, 16 Aug 2024 07:07:36 GMT, Albert Mingkun Yang wrote: > Trivial inlining a virtual method to subclasses and some cleanup to related methods. > > The gc-log is slightly updated due to the change of the name of generations. 
Log before&after shown below: > > > # baseline > > [2.417s][debug][gc,heap] GC(0) def new generation total 153600K, used 76645K [0x000000060d800000, 0x0000000617ea0000, 0x00000006b3aa0000) > [2.417s][debug][gc,heap] GC(0) eden space 136576K, 56% used [0x000000060d800000, 0x00000006122d9538, 0x0000000615d60000) > [2.417s][debug][gc,heap] GC(0) from space 17024K, 0% used [0x0000000615d60000, 0x0000000615d60000, 0x0000000616e00000) > [2.417s][debug][gc,heap] GC(0) to space 17024K, 0% used [0x0000000616e00000, 0x0000000616e00000, 0x0000000617ea0000) > [2.417s][debug][gc,heap] GC(0) tenured generation total 341376K, used 0K [0x00000006b3aa0000, 0x00000006c8800000, 0x0000000800000000) > [2.417s][debug][gc,heap] GC(0) the space 341376K, 0% used [0x00000006b3aa0000, 0x00000006b3aa0000, 0x00000006c8800000) > > # new > > [9.846s][debug][gc,heap] GC(0) DefNew total 153600K, used 71165K [0x000000060d800000, 0x0000000617ea0000, 0x00000006b3aa0000) > [9.846s][debug][gc,heap] GC(0) eden space 136576K, 52% used [0x000000060d800000, 0x0000000611d7f708, 0x0000000615d60000) > [9.846s][debug][gc,heap] GC(0) from space 17024K, 0% used [0x0000000615d60000, 0x0000000615d60000, 0x0000000616e00000) > [9.846s][debug][gc,heap] GC(0) to space 17024K, 0% used [0x0000000616e00000, 0x0000000616e00000, 0x0000000617ea0000) > [9.846s][debug][gc,heap] GC(0) Tenured total 341376K, used 0K [0x00000006b3aa0000, 0x00000006c8800000, 0x0000000800000000) > [9.846s][debug][gc,heap] GC(0) the space 341376K, 0% used [0x00000006b3aa0000, 0x00000006b3aa0000, 0x00000006c8800000) src/hotspot/share/gc/serial/tenuredGeneration.cpp line 452: > 450: p2i(_virtual_space.high_boundary())); > 451: > 452: st->print(" the"); The new `st->print(" the");` (new line 452) misses one space. Please note the old line (old line 444) has three spaces. Is it your intention? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20607#discussion_r1722944590 From stefank at openjdk.org Tue Aug 20 09:01:48 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 20 Aug 2024 09:01:48 GMT Subject: RFR: 8337658: ZGC: Move soft reference handling out of the driver loop function In-Reply-To: <5CLxq71h2WzrhzVpRdRgw_QWeRnTnBL5YMQ6Zeo_8xU=.ed328b8d-0309-486f-8c36-e3d7f099411e@github.com> References: <5CLxq71h2WzrhzVpRdRgw_QWeRnTnBL5YMQ6Zeo_8xU=.ed328b8d-0309-486f-8c36-e3d7f099411e@github.com> Message-ID: <8gHhoUWbd1NXNwXEeUbqtPMIr6u_lP8CnKJrarncBZI=.b648e3ab-f275-4dc1-a5e9-e16ec3ff3e2b@github.com> On Tue, 20 Aug 2024 08:30:46 GMT, Guoxiong Li wrote: > I don't know whether _uses_clear_all_soft_references_policy is a good name. I may be confused by the name if I read the code first time. Could you elaborate on what the problem with the name is? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20418#issuecomment-2298341743 From ayang at openjdk.org Tue Aug 20 09:16:20 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 20 Aug 2024 09:16:20 GMT Subject: RFR: 8338490: Serial: Move Generation::print_on to subclasses [v2] In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 08:53:38 GMT, Guoxiong Li wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/gc/serial/tenuredGeneration.cpp line 452: > >> 450: p2i(_virtual_space.high_boundary())); >> 451: >> 452: st->print(" the"); > > The new `st->print(" the");` (new line 452) misses one space. Please note the old line (old line 444) has three spaces. Is it your intention? It was intentional, to make sure `the` is not indented too much. Now that I realize that using three-space was to align "space" in those log line. Reverted. 
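The alignment point discussed above can be illustrated with a small standalone sketch. It is not the HotSpot code: the helper names `space_labels`, `space_column` and `labels_aligned` are hypothetical, and the exact label strings for the eden/from/to lines are an assumption (the whitespace in the quoted log excerpts is collapsed in this archive). It only shows why a three-space prefix before `the` puts the word "space" in the same column as in the other lines.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical label prefixes; each is followed by " space" in the log line.
// The eden/from/to strings are assumed for illustration. Only the
// three-space form of "the" is taken from the review discussion.
std::vector<std::string> space_labels(const std::string& tenured_label) {
  return {"  eden", "  from", "  to  ", tenured_label};
}

// Column at which the word "space" starts once " space" is appended.
std::size_t space_column(const std::string& label) {
  const std::string line = label + " space";
  return line.find("space");
}

// True if every label puts "space" in the same column.
bool labels_aligned(const std::vector<std::string>& labels) {
  if (labels.empty()) {
    return true;
  }
  const std::size_t expected = space_column(labels.front());
  for (const std::string& label : labels) {
    if (space_column(label) != expected) {
      return false;
    }
  }
  return true;
}
```

Under these assumed labels, `labels_aligned(space_labels("   the"))` holds, while the two-space variant `"  the"` shifts "space" one column to the left and breaks the alignment.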
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20607#discussion_r1722975505 From ayang at openjdk.org Tue Aug 20 09:16:20 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 20 Aug 2024 09:16:20 GMT Subject: RFR: 8338490: Serial: Move Generation::print_on to subclasses [v2] In-Reply-To: References: Message-ID: > Trivial inlining a virtual method to subclasses and some cleanup to related methods. > > The gc-log is slightly updated due to the change of the name of generations. Log before&after shown below: > > > # baseline > > [2.417s][debug][gc,heap] GC(0) def new generation total 153600K, used 76645K [0x000000060d800000, 0x0000000617ea0000, 0x00000006b3aa0000) > [2.417s][debug][gc,heap] GC(0) eden space 136576K, 56% used [0x000000060d800000, 0x00000006122d9538, 0x0000000615d60000) > [2.417s][debug][gc,heap] GC(0) from space 17024K, 0% used [0x0000000615d60000, 0x0000000615d60000, 0x0000000616e00000) > [2.417s][debug][gc,heap] GC(0) to space 17024K, 0% used [0x0000000616e00000, 0x0000000616e00000, 0x0000000617ea0000) > [2.417s][debug][gc,heap] GC(0) tenured generation total 341376K, used 0K [0x00000006b3aa0000, 0x00000006c8800000, 0x0000000800000000) > [2.417s][debug][gc,heap] GC(0) the space 341376K, 0% used [0x00000006b3aa0000, 0x00000006b3aa0000, 0x00000006c8800000) > > # new > > [9.846s][debug][gc,heap] GC(0) DefNew total 153600K, used 71165K [0x000000060d800000, 0x0000000617ea0000, 0x00000006b3aa0000) > [9.846s][debug][gc,heap] GC(0) eden space 136576K, 52% used [0x000000060d800000, 0x0000000611d7f708, 0x0000000615d60000) > [9.846s][debug][gc,heap] GC(0) from space 17024K, 0% used [0x0000000615d60000, 0x0000000615d60000, 0x0000000616e00000) > [9.846s][debug][gc,heap] GC(0) to space 17024K, 0% used [0x0000000616e00000, 0x0000000616e00000, 0x0000000617ea0000) > [9.846s][debug][gc,heap] GC(0) Tenured total 341376K, used 0K [0x00000006b3aa0000, 0x00000006c8800000, 0x0000000800000000) > [9.846s][debug][gc,heap] GC(0) the space 
341376K, 0% used [0x00000006b3aa0000, 0x00000006b3aa0000, 0x00000006c8800000) Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20607/files - new: https://git.openjdk.org/jdk/pull/20607/files/e9cb6c93..f9be4f79 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20607&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20607&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20607.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20607/head:pull/20607 PR: https://git.openjdk.org/jdk/pull/20607 From gli at openjdk.org Tue Aug 20 09:51:48 2024 From: gli at openjdk.org (Guoxiong Li) Date: Tue, 20 Aug 2024 09:51:48 GMT Subject: RFR: 8337658: ZGC: Move soft reference handling out of the driver loop function In-Reply-To: <8gHhoUWbd1NXNwXEeUbqtPMIr6u_lP8CnKJrarncBZI=.b648e3ab-f275-4dc1-a5e9-e16ec3ff3e2b@github.com> References: <5CLxq71h2WzrhzVpRdRgw_QWeRnTnBL5YMQ6Zeo_8xU=.ed328b8d-0309-486f-8c36-e3d7f099411e@github.com> <8gHhoUWbd1NXNwXEeUbqtPMIr6u_lP8CnKJrarncBZI=.b648e3ab-f275-4dc1-a5e9-e16ec3ff3e2b@github.com> Message-ID: On Tue, 20 Aug 2024 08:59:34 GMT, Stefan Karlsson wrote: > > I don't know whether _uses_clear_all_soft_references_policy is a good name. I may be confused by the name if I read the code first time. > > Could you elaborate on what the problem with the name is? The two verbs `use` and `clear` are put together, which is curious at first glance. When I dive into the concrete code, I can know the `clear_all` is actually a adjective to describe the `soft_references_policy`. So I don't know whether it would confuse the newbies. Another problem is whether we should use the third person singular form (use `uses` or `use` in `_uses_clear_all_soft_references_policy` here). I search the code in current HotSpot and both forms are used now. 
So I am OK with the word `uses`. Maybe we should unify the name rule in the future. And, my problem may be caused by my poor english, so I will agree with your opinion if you approve this name after reading my comment above. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20418#issuecomment-2298444013 From stefank at openjdk.org Tue Aug 20 10:07:07 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 20 Aug 2024 10:07:07 GMT Subject: RFR: 8337658: ZGC: Move soft reference handling out of the driver loop function [v2] In-Reply-To: References: Message-ID: > The ZDriver code is written to be neat and have a clear outline. The soft reference handling distracts when reading this code. I propose that we hide it a bit. > > I've also clarified in comments and names that the code is dealing with clearing of *all* references. Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Fix inconsistent naming ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20418/files - new: https://git.openjdk.org/jdk/pull/20418/files/5a69a95e..c84a143e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20418&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20418&range=00-01 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20418.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20418/head:pull/20418 PR: https://git.openjdk.org/jdk/pull/20418 From stefank at openjdk.org Tue Aug 20 10:11:49 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 20 Aug 2024 10:11:49 GMT Subject: RFR: 8337658: ZGC: Move soft reference handling out of the driver loop function [v2] In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 10:07:07 GMT, Stefan Karlsson wrote: >> The ZDriver code is written to be neat and have a clear outline. The soft reference handling distracts when reading this code. I propose that we hide it a bit. 
>> >> I've also clarified in comments and names that the code is dealing with clearing of *all* references. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Fix inconsistent naming I fixed the inconsistency between `soft_reference_policy` and `soft_references_policy`. > The two verbs use and clear are put together, which is curious at first glance. When I dive into the concrete code, I can know the clear_all is actually a adjective to describe the soft_references_policy. So I don't know whether it would confuse the newbies. Thanks for the explanation. Yes, the intention is to read this as "uses 'clear all' 'soft reference policy'". I'm going to keep that name, unless someone manages to come up with a more spot-on name that is easy to understand. > Another problem is whether we should use the third person singular form (use uses or use in _uses_clear_all_soft_references_policy here). I search the code in current HotSpot and both forms are used now. So I am OK with the word uses. Maybe we should unify the name rule in the future. To me "use" sounds like a command to do something, and not a property. An alternative could be to change it to "is_using_clear_all_soft_reference_policy". Would that be better? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20418#issuecomment-2298484566 From shade at openjdk.org Tue Aug 20 10:27:21 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 20 Aug 2024 10:27:21 GMT Subject: RFR: 8338662: Shenandoah: Remove excessive ShenandoahVerifier::verify_during_evacuation Message-ID: `ShenandoahVerifier::verify_during_evacuation` is a relaxed version of `ShenandoahVerifier::verify_before_evacuation`. In current code, "during" verification is called shortly after "before" check, which really gains us nothing checking-wise, and only really wastes verification time. 
This is the only "during" verification check we have, all other checks verify things before/after the phases. It makes sense to remove "during evac" verification check for extra debug performance and cleanliness. Additional testing: - [ ] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` - [ ] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/20641/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20641&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338662 Stats: 39 lines in 4 files changed: 4 ins; 34 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20641.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20641/head:pull/20641 PR: https://git.openjdk.org/jdk/pull/20641 From shade at openjdk.org Tue Aug 20 17:05:52 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 20 Aug 2024 17:05:52 GMT Subject: RFR: 8338688: Shenandoah: Avoid calling java_lang_Class accessors in asserts/verifier Message-ID: In GC verification code, we are not always safe to touch the klass directly. This becomes a problem in Lilliput, where loading klass from the from-space is erroneous. Lilliput would replace `obj->klass()` with `obj->forward_safe_klass()` to make it right in GC code. But accessors like `java_lang_Class` would not be fixed. So we are better avoiding using these `java_lang_Class` accessors in GC verification code, and use the loaded `klass` directly. 
Additional tests: - [ ] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` - [ ] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/20651/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20651&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338688 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20651.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20651/head:pull/20651 PR: https://git.openjdk.org/jdk/pull/20651 From rkennke at openjdk.org Tue Aug 20 17:10:03 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 20 Aug 2024 17:10:03 GMT Subject: RFR: 8338688: Shenandoah: Avoid calling java_lang_Class accessors in asserts/verifier In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 16:55:53 GMT, Aleksey Shipilev wrote: > In GC verification code, we are not always safe to touch the klass directly. This becomes a problem in Lilliput, where loading klass from the from-space is erroneous. Lilliput would replace `obj->klass()` with `obj->forward_safe_klass()` to make it right in GC code. But accessors like `java_lang_Class` would not be fixed: they would instead rely on barriers to always be called on to-space objects. > > So we are better avoiding using these `java_lang_Class` accessors in GC verification code, and use the loaded `klass` directly. > > Additional tests: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [ ] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC` Looks good to me, and I verified that it fixes the problems that I've observed in Lilliput. ------------- Marked as reviewed by rkennke (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20651#pullrequestreview-2248582655 From nprasad at openjdk.org Tue Aug 20 22:38:04 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Tue, 20 Aug 2024 22:38:04 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics In-Reply-To: References: Message-ID: On Fri, 2 Aug 2024 02:54:10 GMT, Neethu Prasad wrote: > 2. Thread exiting critical region Re-ran tests and updated examples in PR description. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20277#issuecomment-2299875222 From lmesnik at openjdk.org Wed Aug 21 00:24:07 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 21 Aug 2024 00:24:07 GMT Subject: RFR: 8258483: [TESTBUG] gtest CollectorPolicy.young_scaled_initial_ergo_vm fails if heap is too small Message-ID: <7Qtt8s9VVWzl6VfmXVPR3GQTCKcQA5Fx-fEgsPLCLk8=.f084eb20-96be-4bde-ac50-84ca2fc54951@github.com> The CollectorPolicy.* tests check the SerialGC policy. They might fail if MaxHeapSize is too small. If the heap is not large enough, the VM changes the ergonomic scheme and prints a warning about this. The test does not check this case. GC ergonomics has very different cases and only the main workflow is covered. The goal of the fix is not to improve the test but to make it pass in a reasonable environment and silently pass otherwise. I have updated the test so that it passes if the heap is at least 128M (since it is SerialGC, this seems reasonable for testing in smaller containers) and is skipped otherwise. Testing: tier1 ------------- Commit messages: - typo fixed. - size updated. - gtest has been updated.
Changes: https://git.openjdk.org/jdk/pull/20656/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20656&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8258483 Stats: 12 lines in 1 file changed: 9 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20656.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20656/head:pull/20656 PR: https://git.openjdk.org/jdk/pull/20656 From lmesnik at openjdk.org Wed Aug 21 04:05:02 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 21 Aug 2024 04:05:02 GMT Subject: RFR: 8258483: [TESTBUG] gtest CollectorPolicy.young_scaled_initial_ergo_vm fails if heap is too small In-Reply-To: <7Qtt8s9VVWzl6VfmXVPR3GQTCKcQA5Fx-fEgsPLCLk8=.f084eb20-96be-4bde-ac50-84ca2fc54951@github.com> References: <7Qtt8s9VVWzl6VfmXVPR3GQTCKcQA5Fx-fEgsPLCLk8=.f084eb20-96be-4bde-ac50-84ca2fc54951@github.com> Message-ID: On Wed, 21 Aug 2024 00:16:26 GMT, Leonid Mesnik wrote: > The tests CollectorPolicy.* checks SerialGC policy. They might fail if MaxHeapSize is too small. > > If heap is not enough for then VM change ergonomic scheme and print warning about this. The test is not checking this case. > The GC ergonomic has very different cases and only main workflow is covered. The goal of fix is not to improve test but pass in reasonable environment or silently pass if other. > > I have updated test so it pass if heap is at least 128M (since it is SerialGC, it seems reasonable for testing in smaller containers) or skipped otherwise. > > Testing: tier1 The test failed in GHA, moving to draft state. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20656#issuecomment-2300714000 From ayang at openjdk.org Wed Aug 21 08:58:08 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 21 Aug 2024 08:58:08 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics [v6] In-Reply-To: <-UX_bf4EPxAUOOYQrK69o4o0ewKcMx8ajJog5bdCspE=.b3ce2098-a937-4c81-8aca-1993c14b9e28@github.com> References: <-UX_bf4EPxAUOOYQrK69o4o0ewKcMx8ajJog5bdCspE=.b3ce2098-a937-4c81-8aca-1993c14b9e28@github.com> Message-ID: <4RbcLpEj2K4ymMQ3MwGfF867M1aVINKZBS1hcnlW2Pk=.bb952464-3fa8-45d0-9a58-27dadc36f4f7@github.com> On Mon, 12 Aug 2024 22:39:25 GMT, Neethu Prasad wrote: >> **Notes** >> Adding logs to get more visibility into how fast a thread resumes from allocation stall. >> >> **Testing** >> * tier 1, tier 2, hotspot_gc tests. >> >> Example log messages >> >> 1. Last thread exiting. Performing GC after exiting critical section. Thread "main" 0 locked. >> >> 2. Thread exiting critical region. Thread "main" 0 locked. >> >> 3. Thread stalled by JNI critical section. Resumed after 1294ms. Thread "Thread-0". >> >> 4. Thread blocked to enter critical region. Resumed after 1280ms. Thread "SIGINT handler". > > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > address feedback regarding logger potentially getting instantiated multiple times Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20277#pullrequestreview-2250207354 From gli at openjdk.org Wed Aug 21 11:21:05 2024 From: gli at openjdk.org (Guoxiong Li) Date: Wed, 21 Aug 2024 11:21:05 GMT Subject: RFR: 8337658: ZGC: Move soft reference handling out of the driver loop function [v2] In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 10:07:07 GMT, Stefan Karlsson wrote: >> The ZDriver code is written to be neat and have a clear outline. The soft reference handling distracts when reading this code. 
I propose that we hide it a bit. >> >> I've also clarified in comments and names that the code is dealing with clearing of *all* references. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Fix inconsistent naming Looks good. ------------- Marked as reviewed by gli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20418#pullrequestreview-2250517468 From rkennke at openjdk.org Wed Aug 21 11:17:30 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 21 Aug 2024 11:17:30 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism Message-ID: Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when promotion fails, to indicate that the object has been looked at, but failed promotion. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. A side-effect of this is that we can get rid of the machinery to preserve headers across promotion failures in Serial and G1. (Parallel GC [ab]uses the preserved-headers structure to also find all the forwarded objects for header restoration. This could be changed, I suppose, but it is not trivial.) If you prefer, I could break-out the removal of the preserved-headers stuff into a separate PR. This is in preparation of upstreaming compact object headers, and I intend to push it only once all the parts have been approved. 
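For illustration only, the self-forwarding idea described above can be sketched on a plain 64-bit integer. This is a minimal sketch and not HotSpot code: the `MarkBits` alias, the `kSelfFwdMask` constant and the helper names are hypothetical, and the sketch assumes that "bit #3" means the third-lowest bit (mask `0x4`), sitting directly above the two lock bits.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 64-bit mark word; not HotSpot's markWord class.
using MarkBits = std::uint64_t;

// Assumption: the third-lowest bit (the former biased-locking bit) is reused
// as the 'self-forwarded' flag.
constexpr MarkBits kSelfFwdMask = 0x4;

// Flag the object as self-forwarded without touching any other header bit,
// so class information kept in the upper bits stays readable.
constexpr MarkBits set_self_forwarded(MarkBits mark) {
  return mark | kSelfFwdMask;
}

// True if the self-forwarded flag is set.
constexpr bool is_self_forwarded(MarkBits mark) {
  return (mark & kSelfFwdMask) != 0;
}

// Undoing the marking only needs to clear the flag again; no saved copy of
// the original header is required.
constexpr MarkBits clear_self_forwarded(MarkBits mark) {
  return mark & ~kSelfFwdMask;
}
```

Because setting and clearing the flag leaves every other bit alone, a header marked this way can be restored in place, which is the property that would make a separate preserved-headers table unnecessary for this case.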
Testing: - [x] tier1 - [x] tier2 - [x] tier3 - [x] tier4 - [x] Running in production @ AWS since >1year without troubles ------------- Commit messages: - 8305898: Alternative self-forwarding mechanism Changes: https://git.openjdk.org/jdk/pull/20603/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20603&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305898 Stats: 156 lines in 18 files changed: 66 ins; 66 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/20603.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20603/head:pull/20603 PR: https://git.openjdk.org/jdk/pull/20603 From gli at openjdk.org Wed Aug 21 11:29:04 2024 From: gli at openjdk.org (Guoxiong Li) Date: Wed, 21 Aug 2024 11:29:04 GMT Subject: RFR: 8337658: ZGC: Move soft reference handling out of the driver loop function [v2] In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 10:09:17 GMT, Stefan Karlsson wrote: > To me "use" sounds like a command to do something, and not a property. An alternative could be to change it to "is_using_clear_all_soft_reference_policy". Would that be better? These names seem similar. I approved the current patch just now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20418#issuecomment-2301818591 From gli at openjdk.org Wed Aug 21 11:29:04 2024 From: gli at openjdk.org (Guoxiong Li) Date: Wed, 21 Aug 2024 11:29:04 GMT Subject: RFR: 8338490: Serial: Move Generation::print_on to subclasses [v2] In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 09:16:20 GMT, Albert Mingkun Yang wrote: >> Trivial inlining a virtual method to subclasses and some cleanup to related methods. >> >> The gc-log is slightly updated due to the change of the name of generations. 
Log before&after shown below: >> >> >> # baseline >> >> [2.417s][debug][gc,heap] GC(0) def new generation total 153600K, used 76645K [0x000000060d800000, 0x0000000617ea0000, 0x00000006b3aa0000) >> [2.417s][debug][gc,heap] GC(0) eden space 136576K, 56% used [0x000000060d800000, 0x00000006122d9538, 0x0000000615d60000) >> [2.417s][debug][gc,heap] GC(0) from space 17024K, 0% used [0x0000000615d60000, 0x0000000615d60000, 0x0000000616e00000) >> [2.417s][debug][gc,heap] GC(0) to space 17024K, 0% used [0x0000000616e00000, 0x0000000616e00000, 0x0000000617ea0000) >> [2.417s][debug][gc,heap] GC(0) tenured generation total 341376K, used 0K [0x00000006b3aa0000, 0x00000006c8800000, 0x0000000800000000) >> [2.417s][debug][gc,heap] GC(0) the space 341376K, 0% used [0x00000006b3aa0000, 0x00000006b3aa0000, 0x00000006c8800000) >> >> # new >> >> [9.846s][debug][gc,heap] GC(0) DefNew total 153600K, used 71165K [0x000000060d800000, 0x0000000617ea0000, 0x00000006b3aa0000) >> [9.846s][debug][gc,heap] GC(0) eden space 136576K, 52% used [0x000000060d800000, 0x0000000611d7f708, 0x0000000615d60000) >> [9.846s][debug][gc,heap] GC(0) from space 17024K, 0% used [0x0000000615d60000, 0x0000000615d60000, 0x0000000616e00000) >> [9.846s][debug][gc,heap] GC(0) to space 17024K, 0% used [0x0000000616e00000, 0x0000000616e00000, 0x0000000617ea0000) >> [9.846s][debug][gc,heap] GC(0) Tenured total 341376K, used 0K [0x00000006b3aa0000, 0x00000006c8800000, 0x0000000800000000) >> [9.846s][debug][gc,heap] GC(0) the space 341376K, 0% used [0x00000006b3aa0000, 0x00000006b3aa0000, 0x00000006c8800000) > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Looks good. ------------- Marked as reviewed by gli (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20607#pullrequestreview-2250533517 From rkennke at openjdk.org Wed Aug 21 11:36:39 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 21 Aug 2024 11:36:39 GMT Subject: RFR: 8305896: Alternative full GC forwarding Message-ID: Currently, the full-GC modes of Serial, Parallel, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers are restored. Also, the preserved-headers tables would grow quite large because now all headers would be 'interesting' and would have to be preserved. I propose to use an alternative encoding for full-GC (sliding-GC) so that the forwarding information fits into the lowest 32 bits of the header. The encoding is similar to compressed-oops encoding: it basically subtracts the heap base from the forwardee address, shifts that difference into the right place, and sets the lowest two bits (to indicate 'forwarded' state as usual). The current implementation preserves the upper 32 bits of the mark-word. This leaves 30 bits for encoding the forwardee, enough for 8GB of heap. As soon as we get Tiny Class-Pointers (planned as part of compact headers upstreaming), we only need 22 bits for the narrow Klass*, and can use 40 bits for forwardee encoding. That's enough for 8TB of heap. If somebody wants to run with a larger heap than this, compact headers would be disabled. This change also adds some infrastructure to configure the flags, with the code commented out, to illustrate the intended use. An earlier approach to address the problem has been proposed in https://github.com/openjdk/jdk/pull/13582.
This has been implemented under the assumption that we would only have 30 bits to encode the forwardee. In the light of changed plans, I don't think it's worth the added complexity and risk of slight performance issues. I will revisit it in the future, for 4-byte-headers. I also experimented with a different forwarding approach that would use per-region hashtables, but gave up on it for now, because performance was significantly worse than the sliding forwarding encoding. This is in preparation of upstreaming compact object headers, and I intend to push it only once all the parts have been approved. Testing: - [x] hotspot_gc - [x] tier1 - [x] tier2 - [x] tier3 - [x] tier4 ------------- Depends on: https://git.openjdk.org/jdk/pull/20603 Commit messages: - 8305896O Alternative full GC forwarding Changes: https://git.openjdk.org/jdk/pull/20605/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20605&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305896 Stats: 269 lines in 18 files changed: 191 ins; 21 del; 57 mod Patch: https://git.openjdk.org/jdk/pull/20605.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20605/head:pull/20605 PR: https://git.openjdk.org/jdk/pull/20605 From rkennke at openjdk.org Wed Aug 21 11:48:26 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 21 Aug 2024 11:48:26 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) Message-ID: This is the main body of the JEP 450: Compact Object Headers (Experimental). Main changes: - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan: specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). - Arrays can now store their length at offset 8. - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass.
Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (or at least I could not find it), and also I fear that doing so could mess with optimizations. This may be useful to revisit. OTOH, the approach that I have taken works and is similar to DecodeNKlass and similar instructions. Testing: (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests.) The below testing has been run many times, but not with this exact base version of the JDK. I want to hold off the full testing until we also have the Tiny Class-Pointers PR lined-up, and test with that. - [x] tier1 (x86_64) - [ ] tier2 (x86_64) - [ ] tier3 (x86_64) - [ ] tier4 (x86_64) - [x] tier1 (aarch64) - [ ] tier2 (aarch64) - [ ] tier3 (aarch64) - [ ] tier4 (aarch64) - [x] tier1 (x86_64) +UseCompactObjectHeaders - [ ] tier2 (x86_64) +UseCompactObjectHeaders - [ ] tier3 (x86_64) +UseCompactObjectHeaders - [ ] tier4 (x86_64) +UseCompactObjectHeaders - [x] tier1 (aarch64) +UseCompactObjectHeaders - [ ] tier2 (aarch64) +UseCompactObjectHeaders - [ ] tier3 (aarch64) +UseCompactObjectHeaders - [ ] tier4 (aarch64) +UseCompactObjectHeaders - [x] Running as a backport in production since >1 year. 
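For illustration only, the header layout this message relies on (narrow Klass* in the upper 32 bits of the mark-word, with the lower 32 bits left free for the forwarding encoding of the prerequisite PR #20605) can be sketched on a plain 64-bit integer. This is a minimal sketch and not HotSpot code: every name and constant below is hypothetical, and it assumes 8-byte object alignment and a forwardee that lies less than 8GB above the heap base.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 64-bit mark word; not HotSpot's markWord class.
using MarkBits = std::uint64_t;

constexpr int      kKlassShift   = 32;            // assumed: narrow Klass* in the upper 32 bits
constexpr MarkBits kLowMask      = 0xFFFFFFFFULL; // the lower 32 bits of the header
constexpr MarkBits kForwardedTag = 0x3;           // lowest two bits set: 'forwarded'
constexpr int      kFwdShift     = 2;             // encoded offset sits just above the tag
constexpr int      kLogAlignment = 3;             // assumed 8-byte object alignment

// Read the narrow Klass* from the upper half of the header.
constexpr std::uint32_t narrow_klass(MarkBits mark) {
  return static_cast<std::uint32_t>(mark >> kKlassShift);
}

// Encode a forwardee as an offset from the heap base, counted in 8-byte units,
// in the lower 32 bits. The upper 32 bits are carried over unchanged.
// Precondition (not checked here): forwardee - heap_base < 8GB.
constexpr MarkBits encode_forwarding(MarkBits mark, std::uint64_t heap_base,
                                     std::uint64_t forwardee) {
  return (mark & ~kLowMask)
       | (((forwardee - heap_base) >> kLogAlignment) << kFwdShift)
       | kForwardedTag;
}

// True if both low tag bits are set.
constexpr bool is_forwarded(MarkBits mark) {
  return (mark & kForwardedTag) == kForwardedTag;
}

// Recover the forwardee address from an encoded header.
constexpr std::uint64_t decode_forwardee(MarkBits mark, std::uint64_t heap_base) {
  return heap_base + (((mark & kLowMask) >> kFwdShift) << kLogAlignment);
}
```

With 30 bits of offset counted in 8-byte units, this layout covers 2^30 * 8 bytes = 8GB, which matches the limit stated for the 32-bit case. A forwarded header still answers `narrow_klass()`, so heap iteration can keep reading class information while objects are forwarded.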
------------- Depends on: https://git.openjdk.org/jdk/pull/20605 Commit messages: - 8305894: Implementation: JEP 450: Compact Object Headers (Experimental) Changes: https://git.openjdk.org/jdk/pull/20640/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20640&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305895 Stats: 1668 lines in 104 files changed: 1232 ins; 206 del; 230 mod Patch: https://git.openjdk.org/jdk/pull/20640.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20640/head:pull/20640 PR: https://git.openjdk.org/jdk/pull/20640 From ayang at openjdk.org Wed Aug 21 12:04:11 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 21 Aug 2024 12:04:11 GMT Subject: RFR: 8338490: Serial: Move Generation::print_on to subclasses [v2] In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 09:16:20 GMT, Albert Mingkun Yang wrote: >> Trivial inlining a virtual method to subclasses and some cleanup to related methods. >> >> The gc-log is slightly updated due to the change of the name of generations. 
Log before&after shown below: >> >> >> # baseline >> >> [2.417s][debug][gc,heap] GC(0) def new generation total 153600K, used 76645K [0x000000060d800000, 0x0000000617ea0000, 0x00000006b3aa0000) >> [2.417s][debug][gc,heap] GC(0) eden space 136576K, 56% used [0x000000060d800000, 0x00000006122d9538, 0x0000000615d60000) >> [2.417s][debug][gc,heap] GC(0) from space 17024K, 0% used [0x0000000615d60000, 0x0000000615d60000, 0x0000000616e00000) >> [2.417s][debug][gc,heap] GC(0) to space 17024K, 0% used [0x0000000616e00000, 0x0000000616e00000, 0x0000000617ea0000) >> [2.417s][debug][gc,heap] GC(0) tenured generation total 341376K, used 0K [0x00000006b3aa0000, 0x00000006c8800000, 0x0000000800000000) >> [2.417s][debug][gc,heap] GC(0) the space 341376K, 0% used [0x00000006b3aa0000, 0x00000006b3aa0000, 0x00000006c8800000) >> >> # new >> >> [9.846s][debug][gc,heap] GC(0) DefNew total 153600K, used 71165K [0x000000060d800000, 0x0000000617ea0000, 0x00000006b3aa0000) >> [9.846s][debug][gc,heap] GC(0) eden space 136576K, 52% used [0x000000060d800000, 0x0000000611d7f708, 0x0000000615d60000) >> [9.846s][debug][gc,heap] GC(0) from space 17024K, 0% used [0x0000000615d60000, 0x0000000615d60000, 0x0000000616e00000) >> [9.846s][debug][gc,heap] GC(0) to space 17024K, 0% used [0x0000000616e00000, 0x0000000616e00000, 0x0000000617ea0000) >> [9.846s][debug][gc,heap] GC(0) Tenured total 341376K, used 0K [0x00000006b3aa0000, 0x00000006c8800000, 0x0000000800000000) >> [9.846s][debug][gc,heap] GC(0) the space 341376K, 0% used [0x00000006b3aa0000, 0x00000006b3aa0000, 0x00000006c8800000) > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20607#issuecomment-2301880724 From ayang at openjdk.org Wed Aug 21 12:04:12 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 21 Aug 2024 12:04:12 GMT Subject: Integrated: 8338490: Serial: Move Generation::print_on to subclasses In-Reply-To: References: Message-ID: On Fri, 16 Aug 2024 07:07:36 GMT, Albert Mingkun Yang wrote: > Trivial inlining a virtual method to subclasses and some cleanup to related methods. > > The gc-log is slightly updated due to the change of the name of generations. Log before&after shown below: > > > # baseline > > [2.417s][debug][gc,heap] GC(0) def new generation total 153600K, used 76645K [0x000000060d800000, 0x0000000617ea0000, 0x00000006b3aa0000) > [2.417s][debug][gc,heap] GC(0) eden space 136576K, 56% used [0x000000060d800000, 0x00000006122d9538, 0x0000000615d60000) > [2.417s][debug][gc,heap] GC(0) from space 17024K, 0% used [0x0000000615d60000, 0x0000000615d60000, 0x0000000616e00000) > [2.417s][debug][gc,heap] GC(0) to space 17024K, 0% used [0x0000000616e00000, 0x0000000616e00000, 0x0000000617ea0000) > [2.417s][debug][gc,heap] GC(0) tenured generation total 341376K, used 0K [0x00000006b3aa0000, 0x00000006c8800000, 0x0000000800000000) > [2.417s][debug][gc,heap] GC(0) the space 341376K, 0% used [0x00000006b3aa0000, 0x00000006b3aa0000, 0x00000006c8800000) > > # new > > [9.846s][debug][gc,heap] GC(0) DefNew total 153600K, used 71165K [0x000000060d800000, 0x0000000617ea0000, 0x00000006b3aa0000) > [9.846s][debug][gc,heap] GC(0) eden space 136576K, 52% used [0x000000060d800000, 0x0000000611d7f708, 0x0000000615d60000) > [9.846s][debug][gc,heap] GC(0) from space 17024K, 0% used [0x0000000615d60000, 0x0000000615d60000, 0x0000000616e00000) > [9.846s][debug][gc,heap] GC(0) to space 17024K, 0% used [0x0000000616e00000, 0x0000000616e00000, 0x0000000617ea0000) > [9.846s][debug][gc,heap] GC(0) Tenured total 341376K, used 0K [0x00000006b3aa0000, 0x00000006c8800000, 
0x0000000800000000) > [9.846s][debug][gc,heap] GC(0) the space 341376K, 0% used [0x00000006b3aa0000, 0x00000006b3aa0000, 0x00000006c8800000) This pull request has now been integrated. Changeset: 918cf114 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/918cf114548d0098cf6a8a50032b78ee04d453db Stats: 54 lines in 7 files changed: 16 ins; 26 del; 12 mod 8338490: Serial: Move Generation::print_on to subclasses Reviewed-by: gli ------------- PR: https://git.openjdk.org/jdk/pull/20607 From aph at openjdk.org Wed Aug 21 12:27:05 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 21 Aug 2024 12:27:05 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 10:07:26 GMT, Roman Kennke wrote: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will can now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. > - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (o... 
src/hotspot/cpu/aarch64/c1_MacroAssembler_aarch64.cpp line 184: > 182: } else { > 183: // This assumes that all prototype bits fit in an int32_t > 184: mov(t1, (int32_t)(intptr_t)markWord::prototype().value()); Suggestion: mov(t1, checked_cast<int32_t>((intptr_t)markWord::prototype().value())); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20640#discussion_r1724960170 From aph at openjdk.org Wed Aug 21 13:11:05 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 21 Aug 2024 13:11:05 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: Message-ID: <1JS96-UJBR95NbAe79ETt1c7aJ3vH9-cijwG24ufs0E=.537b0a80-aedd-4b25-9605-264a2f7f60fb@github.com> On Tue, 20 Aug 2024 10:07:26 GMT, Roman Kennke wrote: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word.
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will can now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. > - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (o... src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2575: > 2573: } else { > 2574: lea(dst, Address(obj, index, Address::lsl(scale))); > 2575: ldr(dst, Address(dst, offset)); Suggestion: ldr(dst, Address(dst, index, Address::lsl(scale))); Will this work? Or is dst unaligned? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20640#discussion_r1725025276 From rkennke at openjdk.org Wed Aug 21 13:21:04 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 21 Aug 2024 13:21:04 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) In-Reply-To: <1JS96-UJBR95NbAe79ETt1c7aJ3vH9-cijwG24ufs0E=.537b0a80-aedd-4b25-9605-264a2f7f60fb@github.com> References: <1JS96-UJBR95NbAe79ETt1c7aJ3vH9-cijwG24ufs0E=.537b0a80-aedd-4b25-9605-264a2f7f60fb@github.com> Message-ID: On Wed, 21 Aug 2024 13:08:23 GMT, Andrew Haley wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - The identity hash-code is temporarily narrowed to 25 bits. 
As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will can now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. >> - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral t... > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2575: > >> 2573: } else { >> 2574: lea(dst, Address(obj, index, Address::lsl(scale))); >> 2575: ldr(dst, Address(dst, offset)); > > Suggestion: > > ldr(dst, Address(dst, index, Address::lsl(scale))); > > Will this work? Or is dst unaligned? It ignores the offset, right? Or are you saying that offset must be 0 on that path? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20640#discussion_r1725040269 From yzheng at openjdk.org Wed Aug 21 14:34:04 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 21 Aug 2024 14:34:04 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 10:07:26 GMT, Roman Kennke wrote: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. 
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. > - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (o... src/hotspot/share/opto/library_call.cpp line 4631: > 4629: // vm: see markWord.hpp. > 4630: Node *hash_mask = _gvn.intcon(UseCompactObjectHeaders ? markWord::hash_mask_compact : markWord::hash_mask); > 4631: Node *hash_shift = _gvn.intcon(UseCompactObjectHeaders ? markWord::hash_shift_compact : markWord::hash_shift); Could you please export these two symbols to JVMCI? Thanks!
diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp index 688691fb976..d97fdcb3f44 100644 --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp @@ -792,11 +792,13 @@ declare_constant(InvocationCounter::count_shift) \ \ declare_constant(markWord::hash_shift) \ + declare_constant(markWord::hash_shift_compact) \ declare_constant(markWord::monitor_value) \ \ declare_constant(markWord::lock_mask_in_place) \ declare_constant(markWord::age_mask_in_place) \ declare_constant(markWord::hash_mask) \ + declare_constant(markWord::hash_mask_compact) \ declare_constant(markWord::hash_mask_in_place) \ \ declare_constant(markWord::unlocked_value) \ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20640#discussion_r1725162361 From shade at openjdk.org Wed Aug 21 14:40:05 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 21 Aug 2024 14:40:05 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics [v6] In-Reply-To: <-UX_bf4EPxAUOOYQrK69o4o0ewKcMx8ajJog5bdCspE=.b3ce2098-a937-4c81-8aca-1993c14b9e28@github.com> References: <-UX_bf4EPxAUOOYQrK69o4o0ewKcMx8ajJog5bdCspE=.b3ce2098-a937-4c81-8aca-1993c14b9e28@github.com> Message-ID: On Mon, 12 Aug 2024 22:39:25 GMT, Neethu Prasad wrote: >> **Notes** >> Adding logs to get more visibility into how fast a thread resumes from allocation stall. >> >> **Testing** >> * tier 1, tier 2, hotspot_gc tests. >> >> Example log messages >> >> 1. Last thread exiting. Performing GC after exiting critical section. Thread "main" 0 locked. >> >> 2. Thread exiting critical region. Thread "main" 0 locked. >> >> 3. Thread stalled by JNI critical section. Resumed after 1294ms. Thread "Thread-0". >> >> 4. Thread blocked to enter critical region. Resumed after 1280ms. Thread "SIGINT handler". 
> > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > address feedback regarding logger potentially getting instantiated multiple times Looks good to me, thanks. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20277#pullrequestreview-2251003645 From aph at openjdk.org Wed Aug 21 14:43:06 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 21 Aug 2024 14:43:06 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: <1JS96-UJBR95NbAe79ETt1c7aJ3vH9-cijwG24ufs0E=.537b0a80-aedd-4b25-9605-264a2f7f60fb@github.com> Message-ID: On Wed, 21 Aug 2024 13:18:03 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays can now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. >> - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral t... > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2575: > >> 2573: } else { >> 2574: lea(dst, Address(obj, index, Address::lsl(scale))); >> 2575: ldr(dst, Address(dst, offset)); > > It ignores the offset, right? Or are you saying that offset must be 0 on that path? Sorry, brain fart.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20640#discussion_r1725176978 From ihse at openjdk.org Wed Aug 21 14:50:07 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 21 Aug 2024 14:50:07 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 10:07:26 GMT, Roman Kennke wrote: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. 
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. > - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (o... make/autoconf/jdk-options.m4 line 696: > 694: AVAILABLE=false > 695: else > 696: AC_MSG_RESULT([yes]) You should set `AVAILABLE=true` in this case. Apparently it works anyway, but it will increase clarity of the code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20640#discussion_r1725190726 From shade at openjdk.org Wed Aug 21 15:31:03 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 21 Aug 2024 15:31:03 GMT Subject: RFR: 8338688: Shenandoah: Avoid calling java_lang_Class accessors in asserts/verifier In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 16:55:53 GMT, Aleksey Shipilev wrote: > In GC verification code, we are not always safe to touch the klass directly. This becomes a problem in Lilliput, where loading klass from the from-space is erroneous. Lilliput would replace `obj->klass()` with `obj->forward_safe_klass()` to make it right in GC code.
But accessors like `java_lang_Class` would not be fixed: they would instead rely on barriers to always be called on to-space objects. > > So we are better avoiding using these `java_lang_Class` accessors in GC verification code, and use the loaded `klass` directly. > > Additional tests: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC` Thanks! I think I need a second review, @earthling-amzn ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20651#issuecomment-2302377707 From wkemper at openjdk.org Wed Aug 21 16:13:10 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 21 Aug 2024 16:13:10 GMT Subject: RFR: 8338688: Shenandoah: Avoid calling java_lang_Class accessors in asserts/verifier In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 16:55:53 GMT, Aleksey Shipilev wrote: > In GC verification code, we are not always safe to touch the klass directly. This becomes a problem in Lilliput, where loading klass from the from-space is erroneous. Lilliput would replace `obj->klass()` with `obj->forward_safe_klass()` to make it right in GC code. But accessors like `java_lang_Class` would not be fixed: they would instead rely on barriers to always be called on to-space objects. > > So we are better avoiding using these `java_lang_Class` accessors in GC verification code, and use the loaded `klass` directly. > > Additional tests: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC` Thanks, LGTM. ------------- Marked as reviewed by wkemper (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/20651#pullrequestreview-2251295201 From shade at openjdk.org Wed Aug 21 16:13:11 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 21 Aug 2024 16:13:11 GMT Subject: RFR: 8338688: Shenandoah: Avoid calling java_lang_Class accessors in asserts/verifier In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 16:55:53 GMT, Aleksey Shipilev wrote: > In GC verification code, we are not always safe to touch the klass directly. This becomes a problem in Lilliput, where loading klass from the from-space is erroneous. Lilliput would replace `obj->klass()` with `obj->forward_safe_klass()` to make it right in GC code. But accessors like `java_lang_Class` would not be fixed: they would instead rely on barriers to always be called on to-space objects. > > So we are better avoiding using these `java_lang_Class` accessors in GC verification code, and use the loaded `klass` directly. > > Additional tests: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC` Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20651#issuecomment-2302462734 From shade at openjdk.org Wed Aug 21 16:13:11 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 21 Aug 2024 16:13:11 GMT Subject: Integrated: 8338688: Shenandoah: Avoid calling java_lang_Class accessors in asserts/verifier In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 16:55:53 GMT, Aleksey Shipilev wrote: > In GC verification code, we are not always safe to touch the klass directly. This becomes a problem in Lilliput, where loading klass from the from-space is erroneous. Lilliput would replace `obj->klass()` with `obj->forward_safe_klass()` to make it right in GC code. But accessors like `java_lang_Class` would not be fixed: they would instead rely on barriers to always be called on to-space objects. 
> > So we are better avoiding using these `java_lang_Class` accessors in GC verification code, and use the loaded `klass` directly. > > Additional tests: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC` This pull request has now been integrated. Changeset: e297e881 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/e297e8817f486e4af850c97fcff859c3e9a9e21c Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod 8338688: Shenandoah: Avoid calling java_lang_Class accessors in asserts/verifier Reviewed-by: rkennke, wkemper ------------- PR: https://git.openjdk.org/jdk/pull/20651 From ayang at openjdk.org Wed Aug 21 17:48:09 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 21 Aug 2024 17:48:09 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 20:52:13 GMT, Roman Kennke wrote: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when promotion fails, to indicate that the object has been looked at, but failed promotion. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. > > A side-effect of this is that we can get rid of the machinery to preserve headers across promotion failures in Serial and G1. (Parallel GC [ab]uses the preserved-headers structure to also find all the forwarded objects for header restoration. This could be changed, I suppose, but it is not trivial.) 
If you prefer, I could break-out the removal of the preserved-headers stuff into a separate PR. > > This is in preparation of upstreaming compact object headers, and I intend to push it only once all the parts have been approved. > > Testing: > - [x] tier1 > - [x] tier2 > - [x] tier3 > - [x] tier4 > - [x] Running in production @ AWS since >1year without troubles src/hotspot/share/gc/serial/defNewGeneration.cpp line 703: > 701: struct ResetForwardedMarkWord : ObjectClosure { > 702: void do_object(oop obj) override { > 703: if (obj->is_self_forwarded()) { Why is `is_self_forwarded` treated specially? I'd expect the `is_forwarded` case alone to be enough here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20603#discussion_r1725482115 From ayang at openjdk.org Wed Aug 21 17:57:03 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 21 Aug 2024 17:57:03 GMT Subject: RFR: 8305896: Alternative full GC forwarding In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 22:54:39 GMT, Roman Kennke wrote: > Currently, the full-GC modes of Serial, Parallel, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large because now all headers would be 'interesting' and would have to be preserved. > > I propose to use an alternative encoding for full-GC (sliding-GC) so that the forwarding information fits into the lowest 32 bits of the header. The encoding is similar to compressed-oops encoding: it basically subtracts the forwardee address from the heap-base, shifts that difference into the right place, and sets the lowest two bits (to indicate 'forwarded' state as usual). 
> > The current implementation preserves the upper 32 bits of the mark-word. This leaves 30 bits for encoding the forwardee, enough for 8GB of heap. As soon as we get Tiny Class-Pointers (planned as part of compact headers upstreaming), we only need 22 bits for the narrow Klass*, and can use 40 bits for forwardee encoding. That's enough for 8TB of heap. If somebody wants to run with larger heap than this, compact headers would be disabled. This change also adds some infrastructure to configure the flags, with the code commented out, to illustrate the intended use. > > An earlier approach to address the problem has been proposed in https://github.com/openjdk/jdk/pull/13582. This has been implemented under the assumption that we would only have 30 bits to encode the forwardee. In the light of changed plans, I don't think it's worth the added complexity and risk of slight performance issues. I will revisit it in the future, for 4-byte-headers. > > I also experimented with a different forwarding approach that would use per-region hashtables, but gave up on it for now, because performance was significantly worse than the sliding forwarding encoding. > > This is in preparation of upstreaming compact object headers, and I intend to push it only once all the parts have been approved. > > Testing: > - [x] hotspot_gc > - [x] tier1 > - [x] tier2 > - [x] tier3 > - [x] tier4 src/hotspot/share/gc/shared/gcForwarding.hpp line 33: > 31: #include "oops/oopsHierarchy.hpp" > 32: > 33: class GCForwarding : public AllStatic { This class should have some doc. (Some text from the PR description.) src/hotspot/share/gc/shared/gcForwarding.hpp line 36: > 34: static const int NumKlassBits = 32; // Will be 22 with Tiny Class-Pointers > 35: static const int NUM_LOW_BITS_NARROW = BitsPerWord - NumKlassBits; > 36: static const int NUM_LOW_BITS_WIDE = BitsPerWord; Not obvious why some vars use CamelCapital while others use ALL_CAPS. 
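For readers following the thread, the sliding-forwarding encoding described in the quoted PR text can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code under review: every name here (`encode_forwarding`, `decode_forwardee`, `max_heap_bytes`, the `k...` constants) is hypothetical, and it assumes a 64-bit mark word, 8-byte heap words, and that the upper 32 bits of the mark word are preserved as the description says.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants for illustration; these are not the HotSpot GCForwarding names.
constexpr int      kWordShift    = 3;     // log2 of an 8-byte heap word (assumption)
constexpr int      kTagBits      = 2;     // the lowest two bits carry the 'forwarded' tag
constexpr uint64_t kForwardedTag = 0b11;  // both low bits set, as in the description
constexpr int      kLowBits      = 32;    // bits below the preserved upper half
constexpr int      kOffsetBits   = kLowBits - kTagBits;  // 30 bits left for the forwardee
constexpr uint64_t kLowMask      = (uint64_t{1} << kLowBits) - 1;

// Largest heap (in bytes) addressable with the given number of word-offset bits.
constexpr uint64_t max_heap_bytes(int offset_bits) {
  return (uint64_t{1} << offset_bits) << kWordShift;
}

// Store the forwardee as a word offset from heap_base in the low bits of the
// mark word, keeping the upper 32 bits (where the class information lives).
inline uint64_t encode_forwarding(uint64_t mark, uint64_t heap_base, uint64_t forwardee) {
  uint64_t offset_words = (forwardee - heap_base) >> kWordShift;
  assert(offset_words < (uint64_t{1} << kOffsetBits));  // heap must fit the encoding
  uint64_t low = (offset_words << kTagBits) | kForwardedTag;
  return (mark & ~kLowMask) | low;
}

// Recover the forwardee address from an encoded mark word.
inline uint64_t decode_forwardee(uint64_t mark, uint64_t heap_base) {
  uint64_t offset_words = (mark & kLowMask) >> kTagBits;
  return heap_base + (offset_words << kWordShift);
}
```

With 30 offset bits, `max_heap_bytes(30)` gives 2^30 words of 8 bytes, which matches the 8GB figure in the description; 40 offset bits (22 bits for the narrow Klass*) give the 8TB figure the same way.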
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20605#discussion_r1725487945 PR Review Comment: https://git.openjdk.org/jdk/pull/20605#discussion_r1725489810 From wkemper at openjdk.org Wed Aug 21 18:01:03 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 21 Aug 2024 18:01:03 GMT Subject: RFR: 8338662: Shenandoah: Remove excessive ShenandoahVerifier::verify_during_evacuation In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 10:22:23 GMT, Aleksey Shipilev wrote: > `ShenandoahVerifier::verify_during_evacuation` is a relaxed version of `ShenandoahVerifier::verify_before_evacuation`. In current code, "during" verification is called shortly after "before" check, which really gains us nothing checking-wise, and only really wastes verification time. This is the only "during" verification check we have, all other checks verify things before/after the phases. It makes sense to remove "during evac" verification check for extra debug performance and cleanliness. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [ ] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` Looks okay to me. ------------- Marked as reviewed by wkemper (Committer). PR Review: https://git.openjdk.org/jdk/pull/20641#pullrequestreview-2251541139 From ysr at openjdk.org Wed Aug 21 19:31:04 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 21 Aug 2024 19:31:04 GMT Subject: RFR: 8338662: Shenandoah: Remove excessive ShenandoahVerifier::verify_during_evacuation In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 10:22:23 GMT, Aleksey Shipilev wrote: > `ShenandoahVerifier::verify_during_evacuation` is a relaxed version of `ShenandoahVerifier::verify_before_evacuation`. In current code, "during" verification is called shortly after "before" check, which really gains us nothing checking-wise, and only really wastes verification time. 
This is the only "during" verification check we have, all other checks verify things before/after the phases. It makes sense to remove "during evac" verification check for extra debug performance and cleanliness. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [ ] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` LGTM. ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20641#pullrequestreview-2251773370 From stefank at openjdk.org Wed Aug 21 20:14:03 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 21 Aug 2024 20:14:03 GMT Subject: RFR: 8305896: Alternative full GC forwarding In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 17:50:48 GMT, Albert Mingkun Yang wrote: >> Currently, the full-GC modes of Serial, Parallel, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large because now all headers would be 'interesting' and would have to be preserved. >> >> I propose to use an alternative encoding for full-GC (sliding-GC) so that the forwarding information fits into the lowest 32 bits of the header. The encoding is similar to compressed-oops encoding: it basically subtracts the forwardee address from the heap-base, shifts that difference into the right place, and sets the lowest two bits (to indicate 'forwarded' state as usual). >> >> The current implementation preserves the upper 32 bits of the mark-word. This leaves 30 bits for encoding the forwardee, enough for 8GB of heap. 
As soon as we get Tiny Class-Pointers (planned as part of compact headers upstreaming), we only need 22 bits for the narrow Klass*, and can use 40 bits for forwardee encoding. That's enough for 8TB of heap. If somebody wants to run with larger heap than this, compact headers would be disabled. This change also adds some infrastructure to configure the flags, with the code commented out, to illustrate the intended use. >> >> An earlier approach to address the problem has been proposed in https://github.com/openjdk/jdk/pull/13582. This has been implemented under the assumption that we would only have 30 bits to encode the forwardee. In the light of changed plans, I don't think it's worth the added complexity and risk of slight performance issues. I will revisit it in the future, for 4-byte-headers. >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but gave up on it for now, because performance was significantly worse than the sliding forwarding encoding. >> >> This is in preparation of upstreaming compact object headers, and I intend to push it only once all the parts have been approved. >> >> Testing: >> - [x] hotspot_gc >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [x] tier4 > > src/hotspot/share/gc/shared/gcForwarding.hpp line 36: > >> 34: static const int NumKlassBits = 32; // Will be 22 with Tiny Class-Pointers >> 35: static const int NUM_LOW_BITS_NARROW = BitsPerWord - NumKlassBits; >> 36: static const int NUM_LOW_BITS_WIDE = BitsPerWord; > > Not obvious why some vars use CamelCapital while others use ALL_CAPS. FWIW, in recent discussions around constants in the GC code, there has been a preference to update the GC code to use CamelCase for constants. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20605#discussion_r1725700046 From stefank at openjdk.org Wed Aug 21 20:26:04 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 21 Aug 2024 20:26:04 GMT Subject: RFR: 8305896: Alternative full GC forwarding In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 22:54:39 GMT, Roman Kennke wrote: > Currently, the full-GC modes of Serial, Parallel, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large because now all headers would be 'interesting' and would have to be preserved. > > I propose to use an alternative encoding for full-GC (sliding-GC) so that the forwarding information fits into the lowest 32 bits of the header. The encoding is similar to compressed-oops encoding: it basically subtracts the forwardee address from the heap-base, shifts that difference into the right place, and sets the lowest two bits (to indicate 'forwarded' state as usual). > > The current implementation preserves the upper 32 bits of the mark-word. This leaves 30 bits for encoding the forwardee, enough for 8GB of heap. As soon as we get Tiny Class-Pointers (planned as part of compact headers upstreaming), we only need 22 bits for the narrow Klass*, and can use 40 bits for forwardee encoding. That's enough for 8TB of heap. If somebody wants to run with larger heap than this, compact headers would be disabled. This change also adds some infrastructure to configure the flags, with the code commented out, to illustrate the intended use. 
> > An earlier approach to address the problem has been proposed in https://github.com/openjdk/jdk/pull/13582. This has been implemented under the assumption that we would only have 30 bits to encode the forwardee. In the light of changed plans, I don't think it's worth the added complexity and risk of slight performance issues. I will revisit it in the future, for 4-byte-headers. > > I also experimented with a different forwarding approach that would use per-region hashtables, but gave up on it for now, because performance was significantly worse than the sliding forwarding encoding. > > This is in preparation of upstreaming compact object headers, and I intend to push it only once all the parts have been approved. > > Testing: > - [x] hotspot_gc > - [x] tier1 > - [x] tier2 > - [x] tier3 > - [x] tier4 src/hotspot/share/gc/shared/gcForwarding.cpp line 42: > 40: // FLAG_SET_DEFAULT(UseCompactObjectHeaders, false); > 41: // } > 42: } ZGC doesn't use this, and even if it did, using MaxHeapSize wouldn't work since we have a larger address space than max heap size. Could you move the call to this function out of `GCArguments::initialize_heap_sizes` and into the GCs that will use `GCForwarding`? src/hotspot/share/gc/shared/gcForwarding.inline.hpp line 56: > 54: > 55: bool GCForwarding::is_forwarded(oop obj) { > 56: return obj->mark().is_marked(); Is this intentionally using `is_marked` instead of `is_forwarded`? If it is, could you write a short comment explaining why?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20605#discussion_r1725706745 PR Review Comment: https://git.openjdk.org/jdk/pull/20605#discussion_r1725711518 From stefank at openjdk.org Wed Aug 21 20:37:04 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 21 Aug 2024 20:37:04 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 20:52:13 GMT, Roman Kennke wrote: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when promotion fails, to indicate that the object has been looked at, but failed promotion. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. > > A side-effect of this is that we can get rid of the machinery to preserve headers across promotion failures in Serial and G1. (Parallel GC [ab]uses the preserved-headers structure to also find all the forwarded objects for header restoration. This could be changed, I suppose, but it is not trivial.) If you prefer, I could break-out the removal of the preserved-headers stuff into a separate PR. > > This is in preparation of upstreaming compact object headers, and I intend to push it only once all the parts have been approved. > > Testing: > - [x] tier1 > - [x] tier2 > - [x] tier3 > - [x] tier4 > - [x] Running in production @ AWS since >1year without troubles It might not be clear to reviewers, but the suggested change forces the usage of Lightweight locking on 32-bit JVMs. I think that is OK, especially given that Legacy locking is deprecated. 
However, before approving this PR it would be good to know if this has been communicated to the maintainers of the affected platforms? And with that said, I couldn't find anything in this patch that prevented 32-bit JVMs from starting with Legacy. There's only these asserts: NOT_LP64(assert(LockingMode != LM_LEGACY, "incorrect with LM_LEGACY on 32 bit");) ------------- PR Review: https://git.openjdk.org/jdk/pull/20603#pullrequestreview-2251897087 From lmesnik at openjdk.org Thu Aug 22 04:01:44 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 22 Aug 2024 04:01:44 GMT Subject: RFR: 8258483: [TESTBUG] gtest CollectorPolicy.young_scaled_initial_ergo_vm fails if heap is too small [v2] In-Reply-To: <7Qtt8s9VVWzl6VfmXVPR3GQTCKcQA5Fx-fEgsPLCLk8=.f084eb20-96be-4bde-ac50-84ca2fc54951@github.com> References: <7Qtt8s9VVWzl6VfmXVPR3GQTCKcQA5Fx-fEgsPLCLk8=.f084eb20-96be-4bde-ac50-84ca2fc54951@github.com> Message-ID: > The CollectorPolicy.* tests check the SerialGC policy. They might fail if MaxHeapSize is too small. > > If the heap is too small, the VM changes the ergonomic scheme and prints a warning about this. The test does not check this case. > GC ergonomics has many different cases and only the main workflow is covered. The goal of the fix is not to improve the test but to have it pass in a reasonable environment or silently pass otherwise. > > I have updated the test so that it passes if the heap is at least 128M (since it is SerialGC, it seems reasonable for testing in smaller containers) and is skipped otherwise. > > Testing: tier1, running these tests manually with 90/128/256M to check that they pass in such environments. Leonid Mesnik has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - increased heap - fixed space. - Merge branch 'master' of https://github.com/openjdk/jdk into 8258483 - typo fixed. - size updated.
- gtest has been updated. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20656/files - new: https://git.openjdk.org/jdk/pull/20656/files/19a8501f..f0155c52 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20656&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20656&range=00-01 Stats: 2699 lines in 108 files changed: 1372 ins; 617 del; 710 mod Patch: https://git.openjdk.org/jdk/pull/20656.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20656/head:pull/20656 PR: https://git.openjdk.org/jdk/pull/20656 From rkennke at openjdk.org Thu Aug 22 06:12:06 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 06:12:06 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 17:45:14 GMT, Albert Mingkun Yang wrote: >> Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when promotion fails, to indicate that the object has been looked at, but failed promotion. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. >> >> A side-effect of this is that we can get rid of the machinery to preserve headers across promotion failures in Serial and G1. (Parallel GC [ab]uses the preserved-headers structure to also find all the forwarded objects for header restoration. This could be changed, I suppose, but it is not trivial.) If you prefer, I could break-out the removal of the preserved-headers stuff into a separate PR. 
>> >> This is in preparation of upstreaming compact object headers, and I intend to push it only once all the parts have been approved. >> >> Testing: >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [x] tier4 >> - [x] Running in production @ AWS since >1year without troubles > > src/hotspot/share/gc/serial/defNewGeneration.cpp line 703: > >> 701: struct ResetForwardedMarkWord : ObjectClosure { >> 702: void do_object(oop obj) override { >> 703: if (obj->is_self_forwarded()) { > > Why is `is_self_forwarded` treated specially? I'd expect the `is_forwarded` case alone to be enough here. Because I'd like self-forwarded marks not to be init-ed. Otherwise we'd have to preserve/restore them. Simply unset_self_forwarded() is enough to get them back to the original state. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20603#discussion_r1726400372 From rkennke at openjdk.org Thu Aug 22 06:19:06 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 06:19:06 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism In-Reply-To: References: Message-ID: <-YF6fEvCggHl5AROWyxVD02HXj85Q-DFyZHHX67Sado=.1c9361f5-8e00-4d2b-ab20-30fef3c03a7f@github.com> On Wed, 21 Aug 2024 20:34:45 GMT, Stefan Karlsson wrote: > It might not be clear to reviewers, but the suggested change forces the usage of Lightweight locking on 32-bit JVMs. I think that is OK, especially given that Legacy locking is deprecated. However, before approving this PR it would be good to know if this has been communicated to the maintainers of the affected platforms? > > And with that said, I couldn't find anything in this patch that prevented 32-bit JVMs from starting with Legacy. There's only these asserts: > > ``` > NOT_LP64(assert(LockingMode != LM_LEGACY, "incorrect with LM_LEGACY on 32 bit");) > ``` Right, with this change, we cannot use legacy locking on 32bit platforms anymore, because 1. 
the self-fwd bit would conflict with stack-locks because stack-locks are only 4-byte aligned on those platforms and 2. we no longer preserve headers around self-forwarding. No, I haven't communicated this yet. It might be better to separate out the removal of preserved-headers around self-forwarding, and deal with header preservation and the implications on 32-bit platforms in a separate PR, WDYT? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20603#issuecomment-2303868150 From stefank at openjdk.org Thu Aug 22 07:31:04 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 22 Aug 2024 07:31:04 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism In-Reply-To: <-YF6fEvCggHl5AROWyxVD02HXj85Q-DFyZHHX67Sado=.1c9361f5-8e00-4d2b-ab20-30fef3c03a7f@github.com> References: <-YF6fEvCggHl5AROWyxVD02HXj85Q-DFyZHHX67Sado=.1c9361f5-8e00-4d2b-ab20-30fef3c03a7f@github.com> Message-ID: On Thu, 22 Aug 2024 06:16:45 GMT, Roman Kennke wrote: > It might be better to separate out the removal of preserved-headers around self-forwarding, and deal with header preservation and the implications on 32-bit platforms in a separate PR, WDYT? I'm fine with handling it here in this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20603#issuecomment-2303972901 From rkennke at openjdk.org Thu Aug 22 07:57:35 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 07:57:35 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v2] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Parallel, Shenandoah and G1 GCs forward objects by overwriting the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers are restored.
Also, the preserved-headers tables would grow quite large because now all headers would be 'interesting' and would have to be preserved. > > I propose to use an alternative encoding for full-GC (sliding-GC) so that the forwarding information fits into the lowest 32 bits of the header. The encoding is similar to compressed-oops encoding: it basically subtracts the forwardee address from the heap-base, shifts that difference into the right place, and sets the lowest two bits (to indicate 'forwarded' state as usual). > > The current implementation preserves the upper 32 bits of the mark-word. This leaves 30 bits for encoding the forwardee, enough for 8GB of heap. As soon as we get Tiny Class-Pointers (planned as part of compact headers upstreaming), we only need 22 bits for the narrow Klass*, and can use 40 bits for forwardee encoding. That's enough for 8TB of heap. If somebody wants to run with larger heap than this, compact headers would be disabled. This change also adds some infrastructure to configure the flags, with the code commented out, to illustrate the intended use. > > An earlier approach to address the problem has been proposed in https://github.com/openjdk/jdk/pull/13582. This has been implemented under the assumption that we would only have 30 bits to encode the forwardee. In the light of changed plans, I don't think it's worth the added complexity and risk of slight performance issues. I will revisit it in the future, for 4-byte-headers. > > I also experimented with a different forwarding approach that would use per-region hashtables, but gave up on it for now, because performance was significantly worse than the sliding forwarding encoding. > > This is in preparation of upstreaming compact object headers, and I intend to push it only once all the parts have been approved. 
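The encoding described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: the names `encode_forwarding`, `decode_forwardee` and `max_heap_bytes` are hypothetical, objects are assumed to be 8-byte aligned, and the offset is computed as forwardee minus heap base. It also shows where the 8GB figure (30 bits) and the 8TB figure (40 bits) come from.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants for illustration; not taken from the patch.
constexpr int kTagBits = 2;                      // lowest two bits flag "forwarded"
constexpr std::uint64_t kForwardedTag = 0b11;
constexpr int kOffsetBits = 30;                  // 32 low bits minus the two tag bits
constexpr int kWordShift = 3;                    // assumption: 8-byte object alignment
constexpr std::uint64_t kUpperMask = 0xFFFFFFFF00000000ULL;

// Heap size coverable with the given number of offset bits (offset counted in words).
constexpr std::uint64_t max_heap_bytes(int offset_bits) {
  return (std::uint64_t{1} << offset_bits) << kWordShift;
}

// Store the forwardee as a word offset from the heap base in the low 32 bits,
// keeping the upper 32 bits of the mark word untouched.
inline std::uint64_t encode_forwarding(std::uint64_t mark,
                                       std::uint64_t heap_base,
                                       std::uint64_t forwardee) {
  assert(forwardee >= heap_base);
  assert(((forwardee - heap_base) & ((std::uint64_t{1} << kWordShift) - 1)) == 0);
  std::uint64_t word_offset = (forwardee - heap_base) >> kWordShift;
  assert(word_offset < (std::uint64_t{1} << kOffsetBits));
  return (mark & kUpperMask) | (word_offset << kTagBits) | kForwardedTag;
}

// Recover the forwardee address from the low 32 bits.
inline std::uint64_t decode_forwardee(std::uint64_t mark, std::uint64_t heap_base) {
  std::uint64_t word_offset = (mark & 0xFFFFFFFFULL) >> kTagBits;
  return heap_base + (word_offset << kWordShift);
}
```

Under these assumptions, `max_heap_bytes(30)` is 2^33 bytes (8GB) and `max_heap_bytes(40)` is 2^43 bytes (8TB), matching the limits quoted in the description.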
> > Testing: > - [x] hotspot_gc > - [x] tier1 > - [x] tier2 > - [x] tier3 > - [x] tier4 Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Initialize flags in GC specific paths - Reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20605/files - new: https://git.openjdk.org/jdk/pull/20605/files/b6eafc76..932ee693 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20605&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20605&range=00-01 Stats: 44 lines in 10 files changed: 28 ins; 2 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/20605.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20605/head:pull/20605 PR: https://git.openjdk.org/jdk/pull/20605 From rkennke at openjdk.org Thu Aug 22 08:00:54 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 08:00:54 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v2] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. > - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (o... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains six commits: - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3 - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3 - Explicitely make AVAILABLE=true - Export new hash constants to JVMCI - Improve asserts - 8305894: Implementation: JEP 450: Compact Object Headers (Experimental) ------------- Changes: https://git.openjdk.org/jdk/pull/20640/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20640&range=01 Stats: 1670 lines in 105 files changed: 1234 ins; 208 del; 228 mod Patch: https://git.openjdk.org/jdk/pull/20640.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20640/head:pull/20640 PR: https://git.openjdk.org/jdk/pull/20640 From ayang at openjdk.org Thu Aug 22 08:32:02 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 22 Aug 2024 08:32:02 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 06:09:42 GMT, Roman Kennke wrote: > Otherwise we'd have to preserve/restore them. OK, then it's incorrect to use `init_mark()` for self-fwd objs. This method is for self-fwd objs only. Can we remove the `else if` part? The non-self-fwd objs are essentially dead, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20603#discussion_r1726597935 From rkennke at openjdk.org Thu Aug 22 09:23:03 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 09:23:03 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 08:29:38 GMT, Albert Mingkun Yang wrote: >> Because I'd like self-forwarded marks not to be init-ed. Otherwise we'd have to preserve/restore them. Simply unset_self_forwarded() is enough to get them back to the original state. > >> Otherwise we'd have to preserve/restore them. > > OK, then it's incorrect to use `init_mark()` for self-fwd objs. > > This method is for self-fwd objs only. Can we remove the `else if` part? 
The non-self-fwd objs are essentially dead, right? I wasn't sure about this. What happens to successfully-promoted objects? Are we sure that no references point to their from-space parts? If yes, then I'd remove the else-if part and place an assert there instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20603#discussion_r1726683296 From stefank at openjdk.org Thu Aug 22 09:27:04 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 22 Aug 2024 09:27:04 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism In-Reply-To: References: Message-ID: <256N9BwlYvCrf4A7z19uZKHOGrXf_JqFOi_TyGJLUP0=.0864e85e-ee36-4d3f-a79b-d77711b99eaf@github.com> On Thu, 22 Aug 2024 09:20:30 GMT, Roman Kennke wrote: >>> Otherwise we'd have to preserve/restore them. >> >> OK, then it's incorrect to use `init_mark()` for self-fwd objs. >> >> This method is for self-fwd objs only. Can we remove the `else if` part? The non-self-fwd objs are essentially dead, right? > > I wasn't sure about this. What happens to successfully-promoted objects? Are we sure that no references point to their from-space parts? If yes, then I'd remove the else-if part and place an assert there instead. The else if part is needed when we later turn on UseCompactObjectHeaders, because the "normal" forwarding pointers then destroy the klass pointers, causing the object_iterate to fail to read the size of the objects. 
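To make the difference between the two schemes concrete, here is a minimal sketch, assuming that "bit #3" means the third-lowest bit of the mark word (mask `0b100`, directly above the two lock bits). `MarkBits` and the function names are hypothetical stand-ins, not HotSpot's `markWord` API. Setting and clearing the flag leaves the upper bits intact, whereas writing a forwarding pointer replaces the whole header.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 64-bit mark word; not HotSpot's markWord class.
using MarkBits = std::uint64_t;

// Assumption: the self-forwarded flag is the third-lowest bit.
constexpr MarkBits kSelfForwardedMask = MarkBits{1} << 2;

constexpr bool is_self_forwarded(MarkBits m) {
  return (m & kSelfForwardedMask) != 0;
}

constexpr MarkBits set_self_forwarded(MarkBits m) {
  return m | kSelfForwardedMask;
}

// Clearing the flag is enough to get the original header back.
inline MarkBits unset_self_forwarded(MarkBits m) {
  assert(is_self_forwarded(m));
  return m & ~kSelfForwardedMask;
}

// Classic scheme, for comparison: the header becomes a tagged pointer to the
// object itself, so whatever was stored in the header before is lost.
constexpr MarkBits forward_to_self_classic(std::uint64_t self_addr) {
  return self_addr | 0b11;
}
```

With the flag-based variant, any class information kept in the upper header bits stays readable while the object is marked as self-forwarded.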
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20603#discussion_r1726689537 From ayang at openjdk.org Thu Aug 22 09:35:02 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 22 Aug 2024 09:35:02 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism In-Reply-To: <256N9BwlYvCrf4A7z19uZKHOGrXf_JqFOi_TyGJLUP0=.0864e85e-ee36-4d3f-a79b-d77711b99eaf@github.com> References: <256N9BwlYvCrf4A7z19uZKHOGrXf_JqFOi_TyGJLUP0=.0864e85e-ee36-4d3f-a79b-d77711b99eaf@github.com> Message-ID: On Thu, 22 Aug 2024 09:24:11 GMT, Stefan Karlsson wrote: >> I wasn't sure about this. What happens to successfully-promoted objects? Are we sure that no references point to their from-space parts? If yes, then I'd remove the else-if part and place an assert there instead. > > The else if part is needed when we later turn on UseCompactObjectHeaders, because the "normal" forwarding pointers then destroy the klass pointers, causing the object_iterate to fail to read the size of the objects. > What happens to successfully-promoted objects? ... Successfully-forwarded objs are not live in eden/from spaces, which are the spaces this closure is applied to. One can't have an assert here, because as we iterate over eden/from spaces, we will encounter successfully-forwarded objs, but they should just be skipped. > The else if part is needed when we later turn on UseCompactObjectHeaders... I believe so, but it should be in the UseCompactObjectHeaders PR, not this one.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20603#discussion_r1726703952 From rkennke at openjdk.org Thu Aug 22 09:47:03 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 09:47:03 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism In-Reply-To: References: <256N9BwlYvCrf4A7z19uZKHOGrXf_JqFOi_TyGJLUP0=.0864e85e-ee36-4d3f-a79b-d77711b99eaf@github.com> Message-ID: <3pPYzbW2cEwO49fdLyBuKZhGiCej-Kw-zYuwIpsQyos=.870b6ab7-6672-4d8d-8964-0a374be580c3@github.com> On Thu, 22 Aug 2024 09:32:10 GMT, Albert Mingkun Yang wrote: >> The else if part is needed when we later turn on UseCompactObjectHeaders, because the "normal" forwarding pointers then destroy the klass pointers, causing the object_iterate to fail to read the size of the objects. > >> What happens to successfully-promoted objects? ... > > Successfully-forwarded objs are not live in eden/from spaces, which are the spaces this closure is applied to. One can't have an assert here, because as we iterate over eden/from spaces, we will encounter successfully-forwarded objs, but they should just be skipped. > >> The else if part is needed when we later turn on UseCompactObjectHeaders... > > I believe so, but it should be in the UseCompactObjectHeaders PR, not this one. Ok, thanks for the explanation! I don't think it's needed for the UCOH part - the iterator fetches Klass*/size from the forwardee if it encounters forwarded objects.
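The behaviour the thread converges on (reset self-forwarded headers, skip everything else, no assert) might look roughly like the sketch below. It models a space as a plain vector of header words, which is a simplification chosen for illustration; `reset_self_forwarded` and the bit position are assumptions and this is not the `ResetForwardedMarkWord` closure from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 64-bit mark word; not HotSpot's markWord class.
using MarkBits = std::uint64_t;

constexpr MarkBits kSelfForwardedMask = MarkBits{1} << 2;  // assumption: third-lowest bit
constexpr MarkBits kForwardedTag = 0b11;                   // classic "forwarded" tag

// Walk the headers of one space (eden or from) and clear the self-forwarded
// flag wherever it is set. Headers of untouched or successfully forwarded
// objects are left as they are and simply skipped.
// Returns how many headers were reset.
inline std::size_t reset_self_forwarded(std::vector<MarkBits>& headers) {
  std::size_t reset_count = 0;
  for (MarkBits& header : headers) {
    if ((header & kSelfForwardedMask) != 0) {
      header &= ~kSelfForwardedMask;
      ++reset_count;
    }
  }
  return reset_count;
}
```

An assert in place of the skipped branch would fire during such a walk, since successfully forwarded objects are still encountered while iterating over the space.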
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20603#discussion_r1726724977 From stefank at openjdk.org Thu Aug 22 09:59:04 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 22 Aug 2024 09:59:04 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism In-Reply-To: <3pPYzbW2cEwO49fdLyBuKZhGiCej-Kw-zYuwIpsQyos=.870b6ab7-6672-4d8d-8964-0a374be580c3@github.com> References: <256N9BwlYvCrf4A7z19uZKHOGrXf_JqFOi_TyGJLUP0=.0864e85e-ee36-4d3f-a79b-d77711b99eaf@github.com> <3pPYzbW2cEwO49fdLyBuKZhGiCej-Kw-zYuwIpsQyos=.870b6ab7-6672-4d8d-8964-0a374be580c3@github.com> Message-ID: On Thu, 22 Aug 2024 09:44:12 GMT, Roman Kennke wrote: >>> What happens to successfully-promoted objects? ... >> >> Successfully-forwarded objs are not live in eden/from spaces, which are the spaces this closure is applied to. One can't have an assert here, because as we iterate over eden/from spaces, we will encounter successfully-forwarded objs, but they should just be skipped. >> >>> The else if part is needed when we later turn on UseCompactObjectHeaders... >> >> I believe so, but it should be in the UseCompactObjectHeaders PR, not this one. > > Ok, thanks for the explanation! > > I don't think it's needed for the UCOH part - the iterator fetches Klass*/size from the forwardee if it encounters forwarded objects. You didn't change the iterators for the Serial GC, only for Parallel. But, yes, we can deal with this in the UCOH PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20603#discussion_r1726744403 From ihse at openjdk.org Thu Aug 22 10:33:05 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 22 Aug 2024 10:33:05 GMT Subject: RFR: Implement JEP 450: Compact Object Headers (Experimental) [v2] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 08:00:54 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will can now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. 
>> - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral t... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3 > - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3 > - Explicitely make AVAILABLE=true > - Export new hash constants to JVMCI > - Improve asserts > - 8305894: Implementation: JEP 450: Compact Object Headers (Experimental) Build changes look good. I have not looked at any other changes. @rkennke Note that the Skara bot removed the RFR label when you changed the title to no longer match a JBS issue. This means that no emails will be sent to the corresponding lists. I am not sure if this was intentional on your part. ------------- Marked as reviewed by ihse (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20640#pullrequestreview-2254136529 PR Comment: https://git.openjdk.org/jdk/pull/20640#issuecomment-2304321468 From rkennke at openjdk.org Thu Aug 22 10:34:44 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 10:34:44 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v2] In-Reply-To: References: Message-ID: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when promotion fails, to indicate that the object has been looked at, but failed promotion. 
This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. > > A side-effect of this is that we can get rid of the machinery to preserve headers across promotion failures in Serial and G1. (Parallel GC [ab]uses the preserved-headers structure to also find all the forwarded objects for header restoration. This could be changed, I suppose, but it is not trivial.) If you prefer, I could break-out the removal of the preserved-headers stuff into a separate PR. > > This is in preparation of upstreaming compact object headers, and I intend to push it only once all the parts have been approved. > > Testing: > - [x] tier1 > - [x] tier2 > - [x] tier3 > - [x] tier4 > - [x] Running in production @ AWS since >1year without troubles Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Remove special handling of non-self-fwded objects ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20603/files - new: https://git.openjdk.org/jdk/pull/20603/files/86239af3..ba54b2d9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20603&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20603&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20603.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20603/head:pull/20603 PR: https://git.openjdk.org/jdk/pull/20603 From rkennke at openjdk.org Thu Aug 22 10:36:39 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 10:36:39 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v3] 
In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Parallel, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large because now all headers would be 'interesting' and would have to be preserved. > > I propose to use an alternative encoding for full-GC (sliding-GC) so that the forwarding information fits into the lowest 32 bits of the header. The encoding is similar to compressed-oops encoding: it basically subtracts the forwardee address from the heap-base, shifts that difference into the right place, and sets the lowest two bits (to indicate 'forwarded' state as usual). > > The current implementation preserves the upper 32 bits of the mark-word. This leaves 30 bits for encoding the forwardee, enough for 8GB of heap. As soon as we get Tiny Class-Pointers (planned as part of compact headers upstreaming), we only need 22 bits for the narrow Klass*, and can use 40 bits for forwardee encoding. That's enough for 8TB of heap. If somebody wants to run with larger heap than this, compact headers would be disabled. This change also adds some infrastructure to configure the flags, with the code commented out, to illustrate the intended use. > > An earlier approach to address the problem has been proposed in https://github.com/openjdk/jdk/pull/13582. This has been implemented under the assumption that we would only have 30 bits to encode the forwardee. In the light of changed plans, I don't think it's worth the added complexity and risk of slight performance issues. I will revisit it in the future, for 4-byte-headers. 
> > I also experimented with a different forwarding approach that would use per-region hashtables, but gave up on it for now, because performance was significantly worse than the sliding forwarding encoding. > > This is in preparation of upstreaming compact object headers, and I intend to push it only once all the parts have been approved. > > Testing: > - [x] hotspot_gc > - [x] tier1 > - [x] tier2 > - [x] tier3 > - [x] tier4 Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'JDK-8305898-v4' into JDK-8305896-v2 - Initialize flags in GC specific paths - Reviews - 8305896O Alternative full GC forwarding ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20605/files - new: https://git.openjdk.org/jdk/pull/20605/files/932ee693..d5f735fa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20605&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20605&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20605.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20605/head:pull/20605 PR: https://git.openjdk.org/jdk/pull/20605 From rkennke at openjdk.org Thu Aug 22 11:05:18 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 11:05:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. 
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will can now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. > - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. 
I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (o... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3 - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3 - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3 - Explicitely make AVAILABLE=true - Export new hash constants to JVMCI - Improve asserts - 8305894: Implementation: JEP 450: Compact Object Headers (Experimental) ------------- Changes: https://git.openjdk.org/jdk/pull/20640/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20640&range=02 Stats: 1670 lines in 105 files changed: 1234 ins; 208 del; 228 mod Patch: https://git.openjdk.org/jdk/pull/20640.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20640/head:pull/20640 PR: https://git.openjdk.org/jdk/pull/20640 From rkennke at openjdk.org Thu Aug 22 11:05:18 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 11:05:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v2] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 10:30:00 GMT, Magnus Ihse Bursie wrote: > @rkennke Note that the Skara bot removed the RFR label when you changed the title to no longer match a JBS issue. This means that no emails will be sent to the corresponding lists. I am not sure if this was intentional on your part. Thanks for pointing that out! No it was not intentional. Mark changed the title in the JBS issue, and I copied that over, but forgot the actual issue number. Should be fixed now. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20640#issuecomment-2304384577 From shade at openjdk.org Thu Aug 22 11:42:08 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 22 Aug 2024 11:42:08 GMT Subject: RFR: 8338662: Shenandoah: Remove excessive ShenandoahVerifier::verify_during_evacuation In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 10:22:23 GMT, Aleksey Shipilev wrote: > `ShenandoahVerifier::verify_during_evacuation` is a relaxed version of `ShenandoahVerifier::verify_before_evacuation`. In current code, "during" verification is called shortly after "before" check, which really gains us nothing checking-wise, and only really wastes verification time. This is the only "during" verification check we have, all other checks verify things before/after the phases. It makes sense to remove "during evac" verification check for extra debug performance and cleanliness. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20641#issuecomment-2304454798 From shade at openjdk.org Thu Aug 22 11:42:08 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 22 Aug 2024 11:42:08 GMT Subject: Integrated: 8338662: Shenandoah: Remove excessive ShenandoahVerifier::verify_during_evacuation In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 10:22:23 GMT, Aleksey Shipilev wrote: > `ShenandoahVerifier::verify_during_evacuation` is a relaxed version of `ShenandoahVerifier::verify_before_evacuation`. In current code, "during" verification is called shortly after "before" check, which really gains us nothing checking-wise, and only really wastes verification time. This is the only "during" verification check we have, all other checks verify things before/after the phases. 
It makes sense to remove "during evac" verification check for extra debug performance and cleanliness. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC -XX:+ShenandoahVerify` This pull request has now been integrated. Changeset: 6cf7f9c4 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/6cf7f9c4a76b99ed7aa4dc7ee54692331fc73408 Stats: 39 lines in 4 files changed: 4 ins; 34 del; 1 mod 8338662: Shenandoah: Remove excessive ShenandoahVerifier::verify_during_evacuation Reviewed-by: wkemper, ysr ------------- PR: https://git.openjdk.org/jdk/pull/20641 From iwalulya at openjdk.org Thu Aug 22 12:02:02 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 22 Aug 2024 12:02:02 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 06:59:35 GMT, Albert Mingkun Yang wrote: > Extend `SplitInfo` to support more fine-grained splitting to mitigate the fragmentation issue during full GC. Added comments and diagrams in the process. > > For easier review, it's best to start with `SplitInfo` and then proceed to see how it is constructed in `summarize_split_space` and consumed in `first_src_addr`. The accompanying diagrams should help create a clear mental image. > > Test: tier1-8, systemgc micro bm, CacheStress, dacapo, specjbb2005, specjvm2008 "The effect of this fragmentation can be observed using `runtime/ClassInitErrors/TestOutOfMemoryDuringInit.java`. The Parallel collector takes significantly longer than other collectors, around 30 seconds compared to about 8 seconds. By adding `-Xlog:gc`, one can see that the Parallel collector runs approximately 47 full GCs, whereas others run around 12." Any details on the improvements observed with this patch for this test? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20590#issuecomment-2304491910 From rkennke at openjdk.org Thu Aug 22 14:51:15 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 14:51:15 GMT Subject: Withdrawn: 8305895: Implement JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 10:07:26 GMT, Roman Kennke wrote: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. 
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. > - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (o... This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/20640 From rkennke at openjdk.org Thu Aug 22 14:51:10 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 14:51:10 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v2] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 10:34:44 GMT, Roman Kennke wrote: >> Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when promotion fails, to indicate that the object has been looked at, but failed promotion. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'.
That preserves the crucial class information in the upper bits of the header until the full header gets restored. >> >> A side-effect of this is that we can get rid of the machinery to preserve headers across promotion failures in Serial and G1. (Parallel GC [ab]uses the preserved-headers structure to also find all the forwarded objects for header restoration. This could be changed, I suppose, but it is not trivial.) If you prefer, I could break-out the removal of the preserved-headers stuff into a separate PR. >> >> This is in preparation of upstreaming compact object headers, and I intend to push it only once all the parts have been approved. >> >> Testing: >> - [x] tier1 >> - [x] tier2 >> - [x] tier3 >> - [x] tier4 >> - [x] Running in production @ AWS since >1year without troubles > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Remove special handling of non-self-fwded objects Superseding by #20677 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20603#issuecomment-2304865196 From rkennke at openjdk.org Thu Aug 22 14:51:15 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 14:51:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: <_dUMNfQKbcgEP_r6avLEDuPprVLFitHPaIWTxJ7_ZcU=.c711b62a-6bc1-4e69-85f2-52e38ccfeb87@github.com> On Thu, 22 Aug 2024 11:05:18 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. 
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #20603 and #20605 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - The identity hash-code is temporarily narrowed to 25 bits. As soon as we get Tiny Class-Pointers (planned before the JEP can be integrated, and to be opened for review soon), we will widen the hash-bits back to 31 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays can now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. >> - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that.
I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral t... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3 > - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3 > - Merge branch 'JDK-8305896-v2' into JDK-8305895-v3 > - Explicitely make AVAILABLE=true > - Export new hash constants to JVMCI > - Improve asserts > - 8305894: Implementation: JEP 450: Compact Object Headers (Experimental) Superseding by #20677 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20640#issuecomment-2304864714 From rkennke at openjdk.org Thu Aug 22 14:51:11 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 14:51:11 GMT Subject: Withdrawn: 8305898: Alternative self-forwarding mechanism In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 20:52:13 GMT, Roman Kennke wrote: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when promotion fails, to indicate that the object has been looked at, but failed promotion. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. > > A side-effect of this is that we can get rid of the machinery to preserve headers across promotion failures in Serial and G1. (Parallel GC [ab]uses the preserved-headers structure to also find all the forwarded objects for header restoration. This could be changed, I suppose, but it is not trivial.) 
If you prefer, I could break-out the removal of the preserved-headers stuff into a separate PR. > > This is in preparation of upstreaming compact object headers, and I intend to push it only once all the parts have been approved. > > Testing: > - [x] tier1 > - [x] tier2 > - [x] tier3 > - [x] tier4 > - [x] Running in production @ AWS since >1year without troubles This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/20603 From rkennke at openjdk.org Thu Aug 22 14:53:47 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 14:53:47 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) Message-ID: This is the main body of the JEP 450: Compact Object Headers (Experimental). It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. Main changes: - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word.
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). - Arrays will now store their length at offset 8. - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archives are generated, next to the _nocoops variant. - Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I played with other approaches to implement LoadNKlass. Expanding it as a macro did not easily work, because C2 is missing a way to cast a word-sized integral to a narrow-Klass* (or at least I could not find it), and also I fear that doing so could mess with optimizations. This may be useful to revisit. OTOH, the approach that I have taken works and is similar to DecodeNKlass and similar instructions.
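
As a rough illustration of the bit arithmetic behind the full-GC forwarding item above (a sketch under stated assumptions, not the code in this PR): if the upper 22 bits of the 64-bit mark-word hold the narrow Klass* and, by assumption here, the low 2 bits act as tag bits, then 40 bits remain for a forwarding offset. Counted in 8-byte heap words, 2^40 offsets cover 2^43 bytes, which is where the 8TB limit comes from. All names below (`encode_forwarding`, `decode_forwarding`, `narrow_klass`, the `k...` constants) and the exact bit positions are hypothetical and chosen only to make the numbers in the description add up.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mark-word split, derived only from the numbers quoted above:
// [ 22 bits narrow Klass* | 40 bits forwarding offset | 2 tag bits ]
constexpr int kTagBits   = 2;   // assumption: low bits used as lock/forwarded tag
constexpr int kFwdBits   = 40;  // "We have 40 bits for that"
constexpr int kKlassBits = 22;  // "the relevant (upper 22) bits"
static_assert(kTagBits + kFwdBits + kKlassBits == 64, "fields must fill the word");

constexpr uint64_t kForwardedTag = 0b11;  // assumption: tag value meaning "forwarded"
constexpr uint64_t kHeapWordSize = 8;     // bytes per heap word on 64-bit
constexpr uint64_t kLowMask = (uint64_t{1} << (kTagBits + kFwdBits)) - 1;
constexpr uint64_t kFwdMask = ((uint64_t{1} << kFwdBits) - 1) << kTagBits;

// 2^40 word offsets * 8 bytes per word = 2^43 bytes = 8 TB of encodable heap.
constexpr uint64_t kMaxEncodableHeapBytes = (uint64_t{1} << kFwdBits) * kHeapWordSize;
static_assert(kMaxEncodableHeapBytes == (uint64_t{8} << 40), "8 TB");

// Store the forwardee as a word offset from the heap base, keeping the Klass* bits.
inline uint64_t encode_forwarding(uint64_t mark, uint64_t heap_base, uint64_t to_addr) {
  assert(to_addr >= heap_base);
  assert((to_addr - heap_base) % kHeapWordSize == 0);
  uint64_t word_offset = (to_addr - heap_base) / kHeapWordSize;
  assert(word_offset < (uint64_t{1} << kFwdBits));  // heap must fit in 8 TB
  uint64_t klass_part = mark & ~kLowMask;           // upper 22 bits stay untouched
  return klass_part | (word_offset << kTagBits) | kForwardedTag;
}

// Recover the forwardee address from a forwarded mark-word.
inline uint64_t decode_forwarding(uint64_t mark, uint64_t heap_base) {
  uint64_t word_offset = (mark & kFwdMask) >> kTagBits;
  return heap_base + word_offset * kHeapWordSize;
}

// The class information remains readable while the object is forwarded.
inline uint32_t narrow_klass(uint64_t mark) {
  return static_cast<uint32_t>(mark >> (64 - kKlassBits));
}
```

The point of the sketch is only that the forwarding offset and the Klass* bits can share one word without overwriting each other, so heap parsing can still read the class of a forwarded object.
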
Testing: (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests.) The below testing has been run many times, but not with this exact base version of the JDK. I want to hold off the full testing until we also have the Tiny Class-Pointers PR lined-up, and test with that. - [x] tier1 (x86_64) - [ ] tier2 (x86_64) - [ ] tier3 (x86_64) - [ ] tier4 (x86_64) - [x] tier1 (aarch64) - [ ] tier2 (aarch64) - [ ] tier3 (aarch64) - [ ] tier4 (aarch64) - [x] tier1 (x86_64) +UseCompactObjectHeaders - [ ] tier2 (x86_64) +UseCompactObjectHeaders - [ ] tier3 (x86_64) +UseCompactObjectHeaders - [ ] tier4 (x86_64) +UseCompactObjectHeaders - [x] tier1 (aarch64) +UseCompactObjectHeaders - [ ] tier2 (aarch64) +UseCompactObjectHeaders - [ ] tier3 (aarch64) +UseCompactObjectHeaders - [ ] tier4 (aarch64) +UseCompactObjectHeaders - [x] Running as a backport in production since >1 year. ------------- Commit messages: - 8305895: Implement JEP 450: Compact Object Headers (Experimental) Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305895 Stats: 4526 lines in 187 files changed: 3238 ins; 671 del; 617 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Thu Aug 22 15:00:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 15:00:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v2] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. 
> > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Add missing newline ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/ed032173..18e08c1e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Thu Aug 22 16:23:48 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 16:23:48 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Remove hashcode leftovers from SA ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/18e08c1e..1578ffae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Thu Aug 22 17:59:10 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 17:59:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v4] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix hash_mask_in_place in ClhsdbLongConstant test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/1578ffae..7009e147 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Thu Aug 22 18:18:01 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 18:18:01 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix hash shift for 32 bit builds ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/7009e147..5ffc582f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From ihse at openjdk.org Thu Aug 22 19:29:03 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 22 Aug 2024 19:29:03 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 18:18:01 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. 
>> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix hash shift for 32 bit builds Build changes look good. I have not looked at any other code. ------------- Marked as reviewed by ihse (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2255474321 From duke at openjdk.org Thu Aug 22 19:36:15 2024 From: duke at openjdk.org (duke) Date: Thu, 22 Aug 2024 19:36:15 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics [v6] In-Reply-To: <-UX_bf4EPxAUOOYQrK69o4o0ewKcMx8ajJog5bdCspE=.b3ce2098-a937-4c81-8aca-1993c14b9e28@github.com> References: <-UX_bf4EPxAUOOYQrK69o4o0ewKcMx8ajJog5bdCspE=.b3ce2098-a937-4c81-8aca-1993c14b9e28@github.com> Message-ID: On Mon, 12 Aug 2024 22:39:25 GMT, Neethu Prasad wrote: >> **Notes** >> Adding logs to get more visibility into how fast a thread resumes from allocation stall. >> >> **Testing** >> * tier 1, tier 2, hotspot_gc tests. >> >> Example log messages >> >> 1. Last thread exiting. Performing GC after exiting critical section. Thread "main" 0 locked. >> >> 2. Thread exiting critical region. Thread "main" 0 locked. >> >> 3. Thread stalled by JNI critical section. Resumed after 1294ms. Thread "Thread-0". >> >> 4. Thread blocked to enter critical region. Resumed after 1280ms. Thread "SIGINT handler". > > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > address feedback regarding logger potentially getting instantiated multiple times @neethu-prasad Your change (at version 6ca6f29da58ecd5072192d96f47b34e7706d5aac) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20277#issuecomment-2305488620 From nprasad at openjdk.org Thu Aug 22 19:36:15 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Thu, 22 Aug 2024 19:36:15 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics [v6] In-Reply-To: <-UX_bf4EPxAUOOYQrK69o4o0ewKcMx8ajJog5bdCspE=.b3ce2098-a937-4c81-8aca-1993c14b9e28@github.com> References: <-UX_bf4EPxAUOOYQrK69o4o0ewKcMx8ajJog5bdCspE=.b3ce2098-a937-4c81-8aca-1993c14b9e28@github.com> Message-ID: On Mon, 12 Aug 2024 22:39:25 GMT, Neethu Prasad wrote: >> **Notes** >> Adding logs to get more visibility into how fast a thread resumes from allocation stall. >> >> **Testing** >> * tier 1, tier 2, hotspot_gc tests. >> >> Example log messages >> >> 1. Last thread exiting. Performing GC after exiting critical section. Thread "main" 0 locked. >> >> 2. Thread exiting critical region. Thread "main" 0 locked. >> >> 3. Thread stalled by JNI critical section. Resumed after 1294ms. Thread "Thread-0". >> >> 4. Thread blocked to enter critical region. Resumed after 1280ms. Thread "SIGINT handler". > > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > address feedback regarding logger potentially getting instantiated multiple times Thanks for review & approval. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20277#issuecomment-2305487204 From rkennke at openjdk.org Thu Aug 22 20:08:43 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 22 Aug 2024 20:08:43 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. 
> > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). 
> - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix bit counts in GCForwarding ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/5ffc582f..eaec1117 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From ayang at openjdk.org Thu Aug 22 20:16:05 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 22 Aug 2024 20:16:05 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 16:23:48 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove hashcode leftovers from SA

src/hotspot/share/gc/parallel/mutableSpace.cpp line 232:

> 230:       p += obj->forwardee()->size();
> 231:     } else {
> 232:       p += obj->size();

I feel it's more correct to go through the forwardee for forwarded objs even for the non-COMPACT_HEADERS case. (This method is meant to cover all objs, so should not be perf-critical.) IOW, the `false` case should just be dropped.

src/hotspot/share/gc/serial/defNewGeneration.cpp line 707:

> 705:       } else if (obj->is_forwarded()) {
> 706:         // To restore the klass-bits in the header.
> 707:         obj->forward_safe_init_mark();

I wonder if not modifying successful-forwarded objs is cleaner. Sth like:

    reset_self_forwarded_in_space(space) {
      cur = space->bottom();
      top = space->top();

      while (cur < top) {
        obj = cast_to_oop(cur);

        if (obj->is_self_forwarded()) {
          obj->unset_self_forwarded();
          obj_size = obj->size();
        } else {
          assert(obj->is_forwarded(), "inv");
          obj_size = obj->forwardee()->size();
        }

        cur += obj_size;
      }
    }

    reset_self_forwarded_in_space(eden());
    reset_self_forwarded_in_space(from());

src/hotspot/share/gc/serial/serialArguments.cpp line 33:

> 31: void SerialArguments::initialize_heap_flags_and_sizes() {
> 32:   GenArguments::initialize_heap_flags_and_sizes();
> 33:   GCForwarding::initialize_flags(MaxNewSize + MaxOldSize);

Can one use `MaxHeapSize` here?

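
The space walk suggested above for `defNewGeneration.cpp` can be modelled outside HotSpot. What follows is only a minimal sketch under stated assumptions: `ToyHeader`, `ToySpace`, and their fields are hypothetical stand-ins for the mark-word and for the `eden()`/`from()` spaces, not HotSpot types. It assumes, as the `assert` in the suggestion does, that after a promotion failure every object in the space is either self-forwarded or forwarded to a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy header word. A hypothetical stand-in for the mark-word, NOT the real
// HotSpot layout.
struct ToyHeader {
  std::size_t size_words = 0;            // object size in words; left at 0 in an
                                         // old copy that was forwarded elsewhere
  bool self_forwarded = false;           // models the 'self-forwarding' bit
  const ToyHeader* forwardee = nullptr;  // models the forwarding pointer
};

// A toy space: one slot per heap word. A slot is only meaningful when it is
// the first word of an object.
using ToySpace = std::vector<ToyHeader>;

// Walks the space object by object. Self-forwarded objects get their bit
// cleared and are sized from their own header. Successfully forwarded objects
// are left unmodified and are sized through their forwardee.
// Returns the number of self-forwarded objects that were reset.
std::size_t reset_self_forwarded_in_space(ToySpace& space) {
  std::size_t reset_count = 0;
  std::size_t cur = 0;
  const std::size_t top = space.size();

  while (cur < top) {
    ToyHeader& obj = space[cur];
    std::size_t obj_size = 0;

    if (obj.self_forwarded) {
      obj.self_forwarded = false;
      obj_size = obj.size_words;
      ++reset_count;
    } else {
      assert(obj.forwardee != nullptr && "inv");
      obj_size = obj.forwardee->size_words;
    }

    assert(obj_size > 0);
    cur += obj_size;
  }
  return reset_count;
}
```

Calling it once per space mirrors the two calls on `eden()` and `from()` in the suggestion. Forwarded objects are only read, never written, which is the property the comment asks about.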
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1727547638 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1727524479 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1727548413 From ayang at openjdk.org Thu Aug 22 20:16:07 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 22 Aug 2024 20:16:07 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: References: Message-ID: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> On Thu, 22 Aug 2024 18:18:01 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix hash shift for 32 bit builds src/hotspot/share/gc/shared/gcForwarding.cpp line 37: > 35: size_t max_narrow_heap_size = right_n_bits(NumLowBitsNarrow - Shift); > 36: if (UseCompactObjectHeaders && max_heap_size > max_narrow_heap_size * HeapWordSize) { > 37: FLAG_SET_DEFAULT(UseCompactObjectHeaders, false); Maybe a log-info/warning would be nice. src/hotspot/share/gc/shared/gcForwarding.hpp line 36: > 34: * Implements forwarding for the full-GCs of Serial, Parallel, G1 and Shenandoah in > 35: * a way that preserves upper N bits of object mark-words, which contain crucial > 36: * Klass* information when running with compact headers. 
The encoding is similar to

This doc suggests this forwarding is only for compact-header so I wonder if we can check `UseCompactObjectHeaders` directly instead of heap-size in `GCForwarding::initialize`.

src/hotspot/share/gc/shared/gcForwarding.hpp line 40:

> 38:  * heap-base, shifts that difference into the right place, and sets the lowest two
> 39:  * bits (to indicate 'forwarded' state as usual).
> 40:  */

> "can use 40 bits for forwardee encoding. That's enough for 8TB of heap."

I feel this 8T-constraint is significant and should be in the doc.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1727708193
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1727727638
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1727732496

From gli at openjdk.org Thu Aug 22 23:28:07 2024
From: gli at openjdk.org (Guoxiong Li)
Date: Thu, 22 Aug 2024 23:28:07 GMT
Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC
In-Reply-To:
References:
Message-ID:

On Thu, 15 Aug 2024 06:59:35 GMT, Albert Mingkun Yang wrote:

> Extend `SplitInfo` to support more fine-grained splitting to mitigate the fragmentation issue during full GC. Added comments and diagrams in the process.
>
> For easier review, it's best to start with `SplitInfo` and then proceed to see how it is constructed in `summarize_split_space` and consumed in `first_src_addr`. The accompanying diagrams should help create a clear mental image.
>
> With this patch, the exec time of `runtime/ClassInitErrors/TestOutOfMemoryDuringInit.java` using Parallel drops from ~30s to ~8s, the same as other GCs, and gc-log shows similar number of GC cycles as well.
>
> Test: tier1-8, systemgc micro bm, CacheStress, dacapo, specjbb2005, specjvm2008

Nice improvement. Several suggestions/questions.

src/hotspot/share/gc/parallel/psParallelCompact.cpp line 204:

> 202:   assert(_split_point == nullptr, "inv");
> 203:   assert(_preceding_live_words == 0, "inv");
> 204:   assert(_split_destination_count == 0, "inv");

May be better to use `not clear` uniformly?

src/hotspot/share/gc/parallel/psParallelCompact.cpp line 309:

> 307:   // The total live words on src_region would overflow the target space, so find
> 308:   // the overflowing object and recorde the split point. The invariant is that an
> 309:   // obj should not cross space boundary.

Typo `recorde`.

src/hotspot/share/gc/parallel/psParallelCompact.cpp line 364:

> 362:   HeapWord* new_top = destination - pointer_delta(src_region_start, overflowing_obj);
> 363:
> 364:   // If the overflowing obj were to relocated to its original destination,

Typo `were to relocated to`.

src/hotspot/share/gc/parallel/psParallelCompact.cpp line 393:

> 391:     if (cur_addr >= region_end) {
> 392:       break;
> 393:     }

What about using a `for` statement here? For example:

--- a/src/hotspot/share/gc/parallel/psParallelCompact.cpp
+++ b/src/hotspot/share/gc/parallel/psParallelCompact.cpp
@@ -382,15 +382,11 @@ HeapWord* ParallelCompactData::summarize_split_space(size_t src_region,
   // Obj-iteration to locate the overflowing obj
   HeapWord* region_start = region_to_addr(src_region);
   HeapWord* region_end = region_start + RegionSize;
-  HeapWord* cur_addr = region_start + partial_obj_size;
   size_t live_words = partial_obj_size;
-  while (true) {
-    assert(cur_addr < region_end, "inv");
-    cur_addr = PSParallelCompact::mark_bitmap()->find_obj_beg(cur_addr, region_end);
-    if (cur_addr >= region_end) {
-      break;
-    }
+  for (HeapWord* cur_addr = region_start + partial_obj_size;
+       cur_addr < region_end;
+       cur_addr = PSParallelCompact::mark_bitmap()->find_obj_beg(cur_addr, region_end)) {
     oop obj = cast_to_oop(cur_addr);
     size_t obj_size = obj->size();

src/hotspot/share/gc/parallel/psParallelCompact.cpp line 482:

> 480:       assert(source_next != nullptr, "source_next is null when splitting");
> 481:       *source_next = summarize_split_space(cur_region, split_info, dest_addr,
> 482:                                            target_end, target_next);

Actually, the last parameter `target_next` of the method `summarize_split_space` won't be used at any other places. The bottom of the new space will be used instead (see `PSParallelCompact::summary_phase`). So I think the last parameter `target_next` of the method `summarize_split_space` can be **removed**. Then in `PSParallelCompact::summary_phase`, we can pass a null pointer in this situation (See code below) so that the meaning becomes clearer.

    // method `PSParallelCompact::summary_phase`
    } else if (live > 0) {
      // Attempt to fit part of the source space into the target space.
      HeapWord* next_src_addr = nullptr;
      bool done = _summary_data.summarize(_space_info[id].split_info(),
                                          space->bottom(), space->top(),
                                          &next_src_addr,
                                          *new_top_addr, dst_space_end,
                                          new_top_addr); // <--- here, can pass `nullptr`
      assert(!done, "space should not fit into old gen");
      assert(next_src_addr != nullptr, "sanity");

src/hotspot/share/gc/parallel/psParallelCompact.cpp line 524:

> 522:       log_warning(gc)("Uncleared Region: %u", cur_idx);
> 523:       region(cur_idx)->verify_clear();
> 524:     }

In `PSParallelCompact::clear_data_covering_space --> ParallelCompactData::clear_range`, the `ParallelCompactData::_region_data` is set to `0` directly. I don't know whether it is worth adding two methods `ParallelCompactData::RegionData::is_clear/verify_clear` to verify. And the previous implementation can verify the field `RegionData::_pushed` as well.

src/hotspot/share/gc/parallel/psParallelCompact.cpp line 1560:

> 1558:                                     HeapWord* end,
> 1559:                                     HeapWord* destination,
> 1560:                                     size_t live_words) {

We can pass the precise destination and then the last parameter `live_words` of the method `PSParallelCompact::forward_to_new_addr` can be removed. Just like:

--- a/src/hotspot/share/gc/parallel/psParallelCompact.cpp
+++ b/src/hotspot/share/gc/parallel/psParallelCompact.cpp
@@ -1556,8 +1556,7 @@ void PSParallelCompact::forward_to_new_addr() {
     static void forward_objs_in_range(ParCompactionManager* cm,
                                       HeapWord* start,
                                       HeapWord* end,
-                                      HeapWord* destination,
-                                      size_t live_words) {
+                                      HeapWord* destination) {
       HeapWord* cur_addr = start;
       while (cur_addr < end) {
@@ -1566,14 +1565,13 @@ void PSParallelCompact::forward_to_new_addr() {
           return;
         }
         assert(mark_bitmap()->is_marked(cur_addr), "inv");
-        HeapWord* new_addr = destination + live_words;
         oop obj = cast_to_oop(cur_addr);
-        if (new_addr != cur_addr) {
+        if (destination != cur_addr) {
           cm->preserved_marks()->push_if_necessary(obj, obj->mark());
-          obj->forward_to(cast_to_oop(new_addr));
+          obj->forward_to(cast_to_oop(destination));
         }
         size_t obj_size = obj->size();
-        live_words += obj_size;
+        destination += obj_size;
         cur_addr += obj_size;
       }
     }
@@ -1613,14 +1611,14 @@ void PSParallelCompact::forward_to_new_addr() {
             // Part 1: will be relocated to space-1
             HeapWord* split_destination = split_info.split_destination();
             HeapWord* split_point = split_info.split_point();
-            forward_objs_in_range(cm, region_start + live_words, split_point, split_destination, live_words);
+            forward_objs_in_range(cm, region_start + live_words, split_point, split_destination + live_words);
             // Part 2: will be relocated to space-2
             HeapWord* destination = region_ptr->destination();
-            forward_objs_in_range(cm, split_point, region_end, destination, 0);
+            forward_objs_in_range(cm, split_point, region_end, destination);
           } else {
             HeapWord* destination = region_ptr->destination();
-            forward_objs_in_range(cm, region_start + live_words, region_end, destination, live_words);
+            forward_objs_in_range(cm, region_start + live_words, region_end, destination + live_words);
           }
         }
       }

src/hotspot/share/gc/parallel/psParallelCompact.cpp line 1611:

> 1609:         HeapWord* region_end = region_start + ParallelCompactData::RegionSize;
> 1610:
> 1611:         const SplitInfo& split_info = _space_info[space_id(region_start)].split_info();

Could the variable `SplitInfo` be moved outside/before the `for` loop to avoid duplicated allocations?

src/hotspot/share/gc/parallel/psParallelCompact.cpp line 1998:

> 1996: // source-region contains this location. This location is retrieved by calling
> 1997: // `first_src_addr` on a dest-region.
> 1998: // Conversely, a source-region has a dest-region which holds the destinatino of

Typo `destinatino`.

src/hotspot/share/gc/parallel/psParallelCompact.cpp line 2271:

> 2269:     if (partial_obj_start == obj_start) {
> 2270:       // This obj extends to next region.
> 2271:       obj_end = partial_obj_end(next_region_start);

Question: We know the object start position in this branch. Why can't we use object size (in new line 2271) directly (like new line 2274 shown below)? Why is it not safe?

    // Completely contained in this region; safe to use size().
    obj_end = obj_start + cast_to_oop(obj_start)->size();

src/hotspot/share/gc/parallel/psParallelCompact.hpp line 152:

> 150:   HeapWord* _split_point;
> 151:   size_t _preceding_live_words;
> 152:   uint _split_destination_count;

The names `_split_destination` and `_split_destination_count` may be ambiguous. In `_split_destination`, the prefix `split` means two parts whose destinations locate in different spaces. But in `_split_destination_count`, the prefix `split` means two parts whose destinations locate in different regions but in the same space. I suggest to change the field `_split_destination_count` to `_preceding_destination_count` (or other more appropriate name).

-------------

Changes requested by gli (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20590#pullrequestreview-2253703015 PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1728036224 PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1726498140 PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1726911573 PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1726908715 PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1726876009 PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1726965885 PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1727427367 PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1727325953 PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1727453862 PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1727998329 PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1727773965 From stefank at openjdk.org Fri Aug 23 07:12:07 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 23 Aug 2024 07:12:07 GMT Subject: RFR: 8337658: ZGC: Move soft reference handling out of the driver loop function [v2] In-Reply-To: References: Message-ID: On Tue, 20 Aug 2024 10:07:07 GMT, Stefan Karlsson wrote: >> The ZDriver code is written to be neat and have a clear outline. The soft reference handling distracts when reading this code. I propose that we hide it a bit. >> >> I've also clarified in comments and names that the code is dealing with clearing of *all* references. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Fix inconsistent naming Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20418#issuecomment-2306436896 From stefank at openjdk.org Fri Aug 23 07:12:08 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 23 Aug 2024 07:12:08 GMT Subject: Integrated: 8337658: ZGC: Move soft reference handling out of the driver loop function In-Reply-To: References: Message-ID: <__sooj1X9qyJdT-d9DqxNpSaMT7aw0SaNVc83e2LS0c=.9834f1c0-f53b-449c-b830-db97033b80c5@github.com> On Thu, 1 Aug 2024 12:19:04 GMT, Stefan Karlsson wrote: > The ZDriver code is written to be neat and have a clear outline. The soft reference handling distracts when reading this code. I propose that we hide it a bit. > > I've also clarified in comments and names that the code is dealing with clearing of *all* references. This pull request has now been integrated. Changeset: 9cbf685b Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/9cbf685b0b1ade5e6ddebfeec225b2efb5cf4cfc Stats: 51 lines in 8 files changed: 20 ins; 4 del; 27 mod 8337658: ZGC: Move soft reference handling out of the driver loop function Reviewed-by: gli, aboldtch, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/20418 From ayang at openjdk.org Fri Aug 23 08:47:05 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 23 Aug 2024 08:47:05 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 22:28:04 GMT, Guoxiong Li wrote: >> Extend `SplitInfo` to support more fine-grained splitting to mitigate the fragmentation issue during full GC. Added comments and diagrams in the process. >> >> For easier review, it's best to start with `SplitInfo` and then proceed to see how it is constructed in `summarize_split_space` and consumed in `first_src_addr`. The accompanying diagrams should help create a clear mental image. 
>> >> With this patch, the exec time of `runtime/ClassInitErrors/TestOutOfMemoryDuringInit.java` using Parallel drops from ~30s to ~8s, the same as other GCs, and gc-log shows similar number of GC cycles as well. >> >> Test: tier1-8, systemgc micro bm, CacheStress, dacapo, specjbb2005, specjvm2008 > > src/hotspot/share/gc/parallel/psParallelCompact.cpp line 2271: > >> 2269: if (partial_obj_start == obj_start) { >> 2270: // This obj extends to next region. >> 2271: obj_end = partial_obj_end(next_region_start); > > Question: We know the object start position in this branch. Why can't we use object size (in new line 2271) directly (like new line 2274 shown below)? Why is it not safe? > > > // Completely contained in this region; safe to use size(). > obj_end = obj_start + cast_to_oop(obj_start)->size(); `size()` uses `klass`, which may lie in the next region (depending on the number of left words in this region), which can belong to another worker. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1728611288 From ayang at openjdk.org Fri Aug 23 08:54:03 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 23 Aug 2024 08:54:03 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 12:30:29 GMT, Guoxiong Li wrote: >> Extend `SplitInfo` to support more fine-grained splitting to mitigate the fragmentation issue during full GC. Added comments and diagrams in the process. >> >> For easier review, it's best to start with `SplitInfo` and then proceed to see how it is constructed in `summarize_split_space` and consumed in `first_src_addr`. The accompanying diagrams should help create a clear mental image. >> >> With this patch, the exec time of `runtime/ClassInitErrors/TestOutOfMemoryDuringInit.java` using Parallel drops from ~30s to ~8s, the same as other GCs, and gc-log shows similar number of GC cycles as well. 
>> >> Test: tier1-8, systemgc micro bm, CacheStress, dacapo, specjbb2005, specjvm2008 > > src/hotspot/share/gc/parallel/psParallelCompact.cpp line 524: > >> 522: log_warning(gc)("Uncleared Region: %u", cur_idx); >> 523: region(cur_idx)->verify_clear(); >> 524: } > > In `PSParallelCompact::clear_data_covering_space --> ParallelCompactData::clear_range`, the `ParallelCompactData::_region_data` is set to `0` directly. I don't know whether it is worth adding two methods `ParallelCompactData::RegionData::is_clear/verify_clear` to verify. And the previous implementation can verify the field `RegionData::_pushed` as well. It's mostly for easier debugging. During dev of this patch, if this assertion fails, it's unclear which field is problematic, so I went for this new impl. `RegionData::_pushed` has only a single mutating place, which performs verification already, so I think it's fine not to verify it here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1728622460 From ayang at openjdk.org Fri Aug 23 09:01:05 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 23 Aug 2024 09:01:05 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC In-Reply-To: References: Message-ID: <6Vbb705vYo4NSvCKYiN2iW26COgTrNrVN57YQolEP-U=.0c26183a-1fbc-4a25-92ca-4f49acaf01be@github.com> On Thu, 22 Aug 2024 11:27:03 GMT, Guoxiong Li wrote: >> Extend `SplitInfo` to support more fine-grained splitting to mitigate the fragmentation issue during full GC. Added comments and diagrams in the process. >> >> For easier review, it's best to start with `SplitInfo` and then proceed to see how it is constructed in `summarize_split_space` and consumed in `first_src_addr`. The accompanying diagrams should help create a clear mental image. 
>> >> With this patch, the exec time of `runtime/ClassInitErrors/TestOutOfMemoryDuringInit.java` using Parallel drops from ~30s to ~8s, the same as other GCs, and gc-log shows similar number of GC cycles as well. >> >> Test: tier1-8, systemgc micro bm, CacheStress, dacapo, specjbb2005, specjvm2008 > > src/hotspot/share/gc/parallel/psParallelCompact.cpp line 482: > >> 480: assert(source_next != nullptr, "source_next is null when splitting"); >> 481: *source_next = summarize_split_space(cur_region, split_info, dest_addr, >> 482: target_end, target_next); > > Actually, the last parameter `target_next` of the method `summarize_split_space` won't be used at any other places. The bottom of the new space will be used instead (see `PSParallelCompact::summary_phase`). So I think the last parameter `target_next` of the method `summarize_split_space` can be **removed**. Then in `PSParallelCompact::summary_phase`, we can pass a null pointer in this situation (See code below) so that the meaning becomes clearer. > > > // method `PSParallelCompact::summary_phase` > } else if (live > 0) { > // Attempt to fit part of the source space into the target space. > HeapWord* next_src_addr = nullptr; > bool done = _summary_data.summarize(_space_info[id].split_info(), > space->bottom(), space->top(), > &next_src_addr, > *new_top_addr, dst_space_end, > new_top_addr); // <--- here, can pass `nullptr` > assert(!done, "space should not fit into old gen"); > assert(next_src_addr != nullptr, "sanity"); `target_next` is used in the callee to set up the new-top. In `summarize_split_space`: // Update new top of target space *target_next = new_top; > src/hotspot/share/gc/parallel/psParallelCompact.hpp line 152: > >> 150: HeapWord* _split_point; >> 151: size_t _preceding_live_words; >> 152: uint _split_destination_count; > > The names `_split_destination` and `_split_destination_count` may be ambiguous. 
In `_split_destination`, the prefix `split` means two parts whose destinations locate in different spaces. But in `_split_destination_count`, the prefix `split` means two parts whose destinations locate in different regions but in the same space. I suggest to change the field `_split_destination_count` to `_preceding_destination_count` (or other more appropriate name). I tried to make fields of `class SplitInfo` to match their counterpart in `class RegionData`, except the additional `_split_` prefix, in order to signify these fields are closely related. > But in _split_destination_count, the prefix split means two parts whose destinations locate in different regions but in the same space. Why are they in the "same" space? The purpose of having "split" to support space-boundary so that the first part and the second part are in two spaces. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1728631650 PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1728628594 From ayang at openjdk.org Fri Aug 23 09:14:36 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 23 Aug 2024 09:14:36 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC [v2] In-Reply-To: References: Message-ID: > Extend `SplitInfo` to support more fine-grained splitting to mitigate the fragmentation issue during full GC. Added comments and diagrams in the process. > > For easier review, it's best to start with `SplitInfo` and then proceed to see how it is constructed in `summarize_split_space` and consumed in `first_src_addr`. The accompanying diagrams should help create a clear mental image. > > With this patch, the exec time of `runtime/ClassInitErrors/TestOutOfMemoryDuringInit.java` using Parallel drops from ~30s to ~8s, the same as other GCs, and gc-log shows similar number of GC cycles as well. 
> > Test: tier1-8, systemgc micro bm, CacheStress, dacapo, specjbb2005, specjvm2008 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - review - Merge branch 'master' into pgc-split-region - pgc-split-region ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20590/files - new: https://git.openjdk.org/jdk/pull/20590/files/38cc30c2..a4665329 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20590&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20590&range=00-01 Stats: 15756 lines in 381 files changed: 10446 ins; 3328 del; 1982 mod Patch: https://git.openjdk.org/jdk/pull/20590.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20590/head:pull/20590 PR: https://git.openjdk.org/jdk/pull/20590 From ayang at openjdk.org Fri Aug 23 09:14:37 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 23 Aug 2024 09:14:37 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC [v2] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 11:50:24 GMT, Guoxiong Li wrote: >> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - review >> - Merge branch 'master' into pgc-split-region >> - pgc-split-region > > src/hotspot/share/gc/parallel/psParallelCompact.cpp line 393: > >> 391: if (cur_addr >= region_end) { >> 392: break; >> 393: } > > What about using a `for` statement here? 
For example: > > > --- a/src/hotspot/share/gc/parallel/psParallelCompact.cpp > +++ b/src/hotspot/share/gc/parallel/psParallelCompact.cpp > @@ -382,15 +382,11 @@ HeapWord* ParallelCompactData::summarize_split_space(size_t src_region, > // Obj-iteration to locate the overflowing obj > HeapWord* region_start = region_to_addr(src_region); > HeapWord* region_end = region_start + RegionSize; > - HeapWord* cur_addr = region_start + partial_obj_size; > size_t live_words = partial_obj_size; > > - while (true) { > - assert(cur_addr < region_end, "inv"); > - cur_addr = PSParallelCompact::mark_bitmap()->find_obj_beg(cur_addr, region_end); > - if (cur_addr >= region_end) { > - break; > - } > + for (HeapWord* cur_addr = region_start + partial_obj_size; > + cur_addr < region_end; > + cur_addr = PSParallelCompact::mark_bitmap()->find_obj_beg(cur_addr, region_end)) { > > oop obj = cast_to_oop(cur_addr); > size_t obj_size = obj->size(); The sub-parts of `for` are rather complex, IMO, so I'd prefer the original style. > src/hotspot/share/gc/parallel/psParallelCompact.cpp line 1560: > >> 1558: HeapWord* end, >> 1559: HeapWord* destination, >> 1560: size_t live_words) { > > We can pass the precise destination and then the last parameter `live_words` of the method `PSParallelCompact::forward_to_new_addr` can be removed. 
Just like: > > > --- a/src/hotspot/share/gc/parallel/psParallelCompact.cpp > +++ b/src/hotspot/share/gc/parallel/psParallelCompact.cpp > @@ -1556,8 +1556,7 @@ void PSParallelCompact::forward_to_new_addr() { > static void forward_objs_in_range(ParCompactionManager* cm, > HeapWord* start, > HeapWord* end, > - HeapWord* destination, > - size_t live_words) { > + HeapWord* destination) { > HeapWord* cur_addr = start; > > while (cur_addr < end) { > @@ -1566,14 +1565,13 @@ void PSParallelCompact::forward_to_new_addr() { > return; > } > assert(mark_bitmap()->is_marked(cur_addr), "inv"); > - HeapWord* new_addr = destination + live_words; > oop obj = cast_to_oop(cur_addr); > - if (new_addr != cur_addr) { > + if (destination != cur_addr) { > cm->preserved_marks()->push_if_necessary(obj, obj->mark()); > - obj->forward_to(cast_to_oop(new_addr)); > + obj->forward_to(cast_to_oop(destination)); > } > size_t obj_size = obj->size(); > - live_words += obj_size; > + destination += obj_size; > cur_addr += obj_size; > } > } > @@ -1613,14 +1611,14 @@ void PSParallelCompact::forward_to_new_addr() { > // Part 1: will be relocated to space-1 > HeapWord* split_destination = split_info.split_destination(); > HeapWord* split_point = split_info.split_point(); > - forward_objs_in_range(cm, region_start + live_words, split_point, split_destination, live_words); > + forward_objs_in_range(cm, region_start + live_words, split_point, split_destination + live_words); > > // Part 2: will be relocated to space-2 > HeapWord* destination = region_ptr->destination(); > - forward_objs_in_range(cm, split_point, region_end, destination, 0); > + forward_objs_in_range(cm, split_point, region_end, destination); > } else {... Thank you for the suggestion. I have made minor adjustment so that the arg stays immutable. 
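
(Editorial sketch, not part of the patch.) The exchange above is about letting the caller pass the precise destination while the helper keeps its argument immutable. Below is a minimal, self-contained model of one way to do that, assuming a toy heap in which addresses are plain word indices and the mark bitmap is replaced by an ordered map from object start to object size. `ToyAddr`, `ToyForwarding`, and the map are hypothetical stand-ins; the real `forward_objs_in_range` operates on `HeapWord*` and the mark bitmap, and the adjustment actually made in the PR may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical toy heap: addresses are word indices, and live_objs maps the
// start address of each marked object to its size in words.
using ToyAddr = std::size_t;

struct ToyForwarding {
  ToyAddr from;  // current address of the object
  ToyAddr to;    // address the object will be moved to
};

// Walk the live objects in [start, end) and assign each one a new address,
// packing them contiguously starting at `destination`.
std::vector<ToyForwarding> forward_objs_in_range(
    const std::map<ToyAddr, std::size_t>& live_objs,
    const ToyAddr start,
    const ToyAddr end,
    const ToyAddr destination) {
  std::vector<ToyForwarding> result;
  // Local cursor: the `destination` argument itself is never modified.
  ToyAddr new_addr = destination;
  // lower_bound plays the role of find_obj_beg(cur_addr, end) in this model.
  for (auto it = live_objs.lower_bound(start);
       it != live_objs.end() && it->first < end; ++it) {
    const ToyAddr cur_addr = it->first;
    if (new_addr != cur_addr) {
      // Objects that already sit at their new address are not forwarded.
      result.push_back({cur_addr, new_addr});
    }
    new_addr += it->second;  // the next live object lands right behind this one
  }
  return result;
}
```

With this shape the split case simply calls the helper twice, once per part, each time with the exact address where that part starts in its target space, so no separate `live_words` offset has to be threaded through.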
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1728649848 PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1728645192 From shade at openjdk.org Fri Aug 23 10:26:05 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 23 Aug 2024 10:26:05 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics [v3] In-Reply-To: References: Message-ID: On Wed, 31 Jul 2024 08:12:52 GMT, Stefan Karlsson wrote: >> Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: >> >> Add missing imports and remove unused ones > > Hi Neethu, > > I glanced at this change and saw a couple of nits that I think would be good to clean out before this gets fully reviewed and integrated. @stefank -- I assume you have no more comments? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20277#issuecomment-2306786834 From stefank at openjdk.org Fri Aug 23 10:56:05 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 23 Aug 2024 10:56:05 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding I've looked through the changes to the gc/ directory and have a couple of proposal changes. Please have a look: https://github.com/openjdk/jdk/compare/pr/20677...stefank:jdk:lilliput_review_gc_1 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2306834883 From gli at openjdk.org Fri Aug 23 11:00:04 2024 From: gli at openjdk.org (Guoxiong Li) Date: Fri, 23 Aug 2024 11:00:04 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC [v2] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 09:11:46 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/gc/parallel/psParallelCompact.cpp line 393: >> >>> 391: if (cur_addr >= region_end) { >>> 392: break; >>> 393: } >> >> What about using a `for` statement here? For example: >> >> >> --- a/src/hotspot/share/gc/parallel/psParallelCompact.cpp >> +++ b/src/hotspot/share/gc/parallel/psParallelCompact.cpp >> @@ -382,15 +382,11 @@ HeapWord* ParallelCompactData::summarize_split_space(size_t src_region, >> // Obj-iteration to locate the overflowing obj >> HeapWord* region_start = region_to_addr(src_region); >> HeapWord* region_end = region_start + RegionSize; >> - HeapWord* cur_addr = region_start + partial_obj_size; >> size_t live_words = partial_obj_size; >> >> - while (true) { >> - assert(cur_addr < region_end, "inv"); >> - cur_addr = PSParallelCompact::mark_bitmap()->find_obj_beg(cur_addr, region_end); >> - if (cur_addr >= region_end) { >> - break; >> - } >> + for (HeapWord* cur_addr = region_start + partial_obj_size; >> + cur_addr < region_end; >> + cur_addr = PSParallelCompact::mark_bitmap()->find_obj_beg(cur_addr, region_end)) { >> >> oop obj = cast_to_oop(cur_addr); >> size_t obj_size = obj->size(); > > The sub-parts of `for` are rather complex, IMO, so I'd prefer the original style. Just a suggestion. 
I am also OK with the current code. >> src/hotspot/share/gc/parallel/psParallelCompact.cpp line 482: >> >>> 480: assert(source_next != nullptr, "source_next is null when splitting"); >>> 481: *source_next = summarize_split_space(cur_region, split_info, dest_addr, >>> 482: target_end, target_next); >> >> Actually, the last parameter `target_next` of the method `summarize_split_space` won't be used at any other places. The bottom of the new space will be used instead (see `PSParallelCompact::summary_phase`). So I think the last parameter `target_next` of the method `summarize_split_space` can be **removed**. Then in `PSParallelCompact::summary_phase`, we can pass a null pointer in this situation (See code below) so that the meaning becomes clearer. >> >> >> // method `PSParallelCompact::summary_phase` >> } else if (live > 0) { >> // Attempt to fit part of the source space into the target space. >> HeapWord* next_src_addr = nullptr; >> bool done = _summary_data.summarize(_space_info[id].split_info(), >> space->bottom(), space->top(), >> &next_src_addr, >> *new_top_addr, dst_space_end, >> new_top_addr); // <--- here, can pass `nullptr` >> assert(!done, "space should not fit into old gen"); >> assert(next_src_addr != nullptr, "sanity"); > > `target_next` is used in the callee to set up the new-top. > > In `summarize_split_space`: > > > // Update new top of target space > *target_next = new_top; > Actually, the last parameter target_next of the method summarize_split_space won't be used at any other places. > target_next is used in the callee to set up the new-top. I means that the `target_next` is updated in `summarize_split_space` but the updated `target_next` is never used later. The bottom of the new space will be used instead. 
// method PSParallelCompact::summary_phase } else if (live > 0) { // other code, skip bool done = _summary_data.summarize(_space_info[id].split_info(), space->bottom(), space->top(), &next_src_addr, *new_top_addr, dst_space_end, new_top_addr); // <-- `new_top_addr` is updated, but not unnecessary // other code, skip new_top_addr = _space_info[id].new_top_addr(); // <-- `new_top_addr` is updated again done = _summary_data.summarize(_space_info[id].split_info(), next_src_addr, space->top(), nullptr, space->bottom(), dst_space_end, // <-- the bottom of the new space is used new_top_addr); // other code, skip } A draft diff is shown below: --- a/src/hotspot/share/gc/parallel/psParallelCompact.cpp +++ b/src/hotspot/share/gc/parallel/psParallelCompact.cpp @@ -310,8 +310,7 @@ ParallelCompactData::summarize_dense_prefix(HeapWord* beg, HeapWord* end) HeapWord* ParallelCompactData::summarize_split_space(size_t src_region, SplitInfo& split_info, HeapWord* const destination, - HeapWord* const target_end, - HeapWord** target_next) { + HeapWord* const target_end) { assert(destination <= target_end, "sanity"); assert(destination + _region_data[src_region].data_size() > target_end, "region should not fit into target space"); @@ -373,9 +372,6 @@ HeapWord* ParallelCompactData::summarize_split_space(size_t src_region, } } - // Update new top of target space - *target_next = new_top; - return overflowing_obj; } @@ -397,7 +393,6 @@ HeapWord* ParallelCompactData::summarize_split_space(size_t src_region, if (destination + live_words + obj_size > target_end) { // Found the overflowing obj split_info.record(src_region, cur_addr, live_words); - *target_next = destination + live_words; return cur_addr; } @@ -478,8 +473,8 @@ bool ParallelCompactData::summarize(SplitInfo& split_info, // target space and the rest is copied elsewhere. 
if (dest_addr + words > target_end) { assert(source_next != nullptr, "source_next is null when splitting"); - *source_next = summarize_split_space(cur_region, split_info, dest_addr, - target_end, target_next); + assert(target_next == nullptr, "target_next is not null when splitting"); + *source_next = summarize_split_space(cur_region, split_info, dest_addr, target_end); return false; } @@ -931,7 +926,7 @@ void PSParallelCompact::summary_phase() space->bottom(), space->top(), &next_src_addr, *new_top_addr, dst_space_end, - new_top_addr); + nullptr); assert(!done, "space should not fit into old gen"); assert(next_src_addr != nullptr, "sanity"); diff --git a/src/hotspot/share/gc/parallel/psParallelCompact.hpp b/src/hotspot/share/gc/parallel/psParallelCompact.hpp index 00b9d8e7f8b..3b27bb37bf2 100644 --- a/src/hotspot/share/gc/parallel/psParallelCompact.hpp +++ b/src/hotspot/share/gc/parallel/psParallelCompact.hpp @@ -366,8 +366,7 @@ class ParallelCompactData void summarize_dense_prefix(HeapWord* beg, HeapWord* end); HeapWord* summarize_split_space(size_t src_region, SplitInfo& split_info, - HeapWord* destination, HeapWord* target_end, - HeapWord** target_next); + HeapWord* destination, HeapWord* target_end); size_t live_words_in_space(const MutableSpace* space, HeapWord** full_region_prefix_end = nullptr); > if this assertion fails, it's unclear which field is problematic Ohh...Got it. >> src/hotspot/share/gc/parallel/psParallelCompact.cpp line 2271: >> >>> 2269: if (partial_obj_start == obj_start) { >>> 2270: // This obj extends to next region. >>> 2271: obj_end = partial_obj_end(next_region_start); >> >> Question: We know the object start position in this branch. Why can't we use object size (in new line 2271) directly (like new line 2274 shown below)? Why is it not safe? >> >> >> // Completely contained in this region; safe to use size(). 
>> obj_end = obj_start + cast_to_oop(obj_start)->size(); > > `size()` uses `klass`, which may lie in the next region (depending on the number of left words in this region), which can belong to another worker. Thanks for your explanation. > > But in _split_destination_count, the prefix split means two parts whose destinations locate in different regions but in the same space. > > Why are they in the "same" space? The purpose of having "split" to support space-boundary so that the first part and the second part are in two spaces. The `_split_destination_count` is a size to record how many regions the preceding part occupy. The preceding part is moved to one same space and the second part is moved to another space. So now, it seems you are also misled by this ambiguous name, which is a proof that we need to rename it. // file psParallelCompact.hpp // Number of regions the preceding live words are relocated into. uint split_destination_count() const { return _split_destination_count; } // file psParallelCompact.cpp, method SplitInfo::record // How many regions does the preceding part occupy uint split_destination_count; if (preceding_live_words == 0) { split_destination_count = 0; } else { if (split_destination + preceding_live_words > sd.region_align_up(split_destination)) { split_destination_count = 2; } else { split_destination_count = 1; } } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1728786276 PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1728786233 PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1728786527 PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1728786849 PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1728786723 From ayang at openjdk.org Fri Aug 23 11:24:05 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 23 Aug 2024 11:24:05 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC 
[v2] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 10:57:03 GMT, Guoxiong Li wrote: >> `target_next` is used in the callee to set up the new-top. >> >> In `summarize_split_space`: >> >> >> // Update new top of target space >> *target_next = new_top; > >> Actually, the last parameter target_next of the method summarize_split_space won't be used at any other places. > >> target_next is used in the callee to set up the new-top. > > I means that the `target_next` is updated in `summarize_split_space` but the updated `target_next` is never used later. The bottom of the new space will be used instead. > > > // method PSParallelCompact::summary_phase > > } else if (live > 0) { > // other code, skip > bool done = _summary_data.summarize(_space_info[id].split_info(), > space->bottom(), space->top(), > &next_src_addr, > *new_top_addr, dst_space_end, > new_top_addr); // <-- `new_top_addr` is updated, but not unnecessary > // other code, skip > new_top_addr = _space_info[id].new_top_addr(); // <-- `new_top_addr` is updated again > done = _summary_data.summarize(_space_info[id].split_info(), > next_src_addr, space->top(), > nullptr, > space->bottom(), dst_space_end, // <-- the bottom of the new space is used > new_top_addr); > // other code, skip > } > > > A draft diff is shown below: > > > --- a/src/hotspot/share/gc/parallel/psParallelCompact.cpp > +++ b/src/hotspot/share/gc/parallel/psParallelCompact.cpp > @@ -310,8 +310,7 @@ ParallelCompactData::summarize_dense_prefix(HeapWord* beg, HeapWord* end) > HeapWord* ParallelCompactData::summarize_split_space(size_t src_region, > SplitInfo& split_info, > HeapWord* const destination, > - HeapWord* const target_end, > - HeapWord** target_next) { > + HeapWord* const target_end) { > assert(destination <= target_end, "sanity"); > assert(destination + _region_data[src_region].data_size() > target_end, > "region should not fit into target space"); > @@ -373,9 +372,6 @@ HeapWord* ParallelCompactData::summarize_split_space(size_t 
src_region, > } > } > > - // Update new top of target space > - *target_next = new_top; > - > return overflowing_obj; > } > > @@ -397,7 +393,6 @@ HeapWord* ParallelCompactData::summarize_split_space(size_t src_regio... `gc/InfiniteList.java` (if running Parallel) fails with this patch on my box. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1728814300 From ayang at openjdk.org Fri Aug 23 11:28:03 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 23 Aug 2024 11:28:03 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC [v2] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 10:57:30 GMT, Guoxiong Li wrote: >> I tried to make fields of `class SplitInfo` to match their counterpart in `class RegionData`, except the additional `_split_` prefix, in order to signify these fields are closely related. >> >>> But in _split_destination_count, the prefix split means two parts whose destinations locate in different regions but in the same space. >> >> Why are they in the "same" space? The purpose of having "split" to support space-boundary so that the first part and the second part are in two spaces. > >> > But in _split_destination_count, the prefix split means two parts whose destinations locate in different regions but in the same space. >> >> Why are they in the "same" space? The purpose of having "split" to support space-boundary so that the first part and the second part are in two spaces. > > The `_split_destination_count` is a size to record how many regions the preceding part occupy. The preceding part is moved to one same space and the second part is moved to another space. So now, it seems you are also misled by this ambiguous name, which is a proof that we need to rename it. > > > // file psParallelCompact.hpp > > // Number of regions the preceding live words are relocated into. 
> uint split_destination_count() const { return _split_destination_count; } > > > // file psParallelCompact.cpp, method SplitInfo::record > > // How many regions does the preceding part occupy > uint split_destination_count; > if (preceding_live_words == 0) { > split_destination_count = 0; > } else { > if (split_destination + preceding_live_words > sd.region_align_up(split_destination)) { > split_destination_count = 2; > } else { > split_destination_count = 1; > } > } Since the preceding live words can be relocated to more than one region, I believe you interpret that as "split". However, the word "split" in this class/context is exclusively reserved for "splitting" a region so that "The preceding part is moved to one same space and the second part is moved to another space." Relocating into multiple destination-regions is not "split", otherwise, `RegionData::_destination` should contain "split" as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1728818747 From stefank at openjdk.org Fri Aug 23 11:44:05 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 23 Aug 2024 11:44:05 GMT Subject: RFR: 8336299: Improve GCLocker stall diagnostics [v6] In-Reply-To: <-UX_bf4EPxAUOOYQrK69o4o0ewKcMx8ajJog5bdCspE=.b3ce2098-a937-4c81-8aca-1993c14b9e28@github.com> References: <-UX_bf4EPxAUOOYQrK69o4o0ewKcMx8ajJog5bdCspE=.b3ce2098-a937-4c81-8aca-1993c14b9e28@github.com> Message-ID: <32Jn_Bnz3ikz1-7wL-vNGo63HhNMh-OuvtT3PDaHiVg=.67e36e6f-a7de-47f8-89ca-ea0a46016285@github.com> On Mon, 12 Aug 2024 22:39:25 GMT, Neethu Prasad wrote: >> **Notes** >> Adding logs to get more visibility into how fast a thread resumes from allocation stall. >> >> **Testing** >> * tier 1, tier 2, hotspot_gc tests. >> >> Example log messages >> >> 1. Last thread exiting. Performing GC after exiting critical section. Thread "main" 0 locked. >> >> 2. Thread exiting critical region. Thread "main" 0 locked. >> >> 3. Thread stalled by JNI critical section. 
Resumed after 1294ms. Thread "Thread-0". >> >> 4. Thread blocked to enter critical region. Resumed after 1280ms. Thread "SIGINT handler". > > Neethu Prasad has updated the pull request incrementally with one additional commit since the last revision: > > address feedback regarding logger potentially getting instantiated multiple times No further nits (Not a full review - unclear how to remove the "request for changes" without fully approving the patch) ------------- PR Review: https://git.openjdk.org/jdk/pull/20277#pullrequestreview-2257032123 From gli at openjdk.org Fri Aug 23 12:37:04 2024 From: gli at openjdk.org (Guoxiong Li) Date: Fri, 23 Aug 2024 12:37:04 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC [v2] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 11:25:18 GMT, Albert Mingkun Yang wrote: >>> > But in _split_destination_count, the prefix split means two parts whose destinations locate in different regions but in the same space. >>> >>> Why are they in the "same" space? The purpose of having "split" to support space-boundary so that the first part and the second part are in two spaces. >> >> The `_split_destination_count` is a size to record how many regions the preceding part occupy. The preceding part is moved to one same space and the second part is moved to another space. So now, it seems you are also misled by this ambiguous name, which is a proof that we need to rename it. >> >> >> // file psParallelCompact.hpp >> >> // Number of regions the preceding live words are relocated into. 
>> uint split_destination_count() const { return _split_destination_count; } >> >> >> // file psParallelCompact.cpp, method SplitInfo::record >> >> // How many regions does the preceding part occupy >> uint split_destination_count; >> if (preceding_live_words == 0) { >> split_destination_count = 0; >> } else { >> if (split_destination + preceding_live_words > sd.region_align_up(split_destination)) { >> split_destination_count = 2; >> } else { >> split_destination_count = 1; >> } >> } > > Since the preceding live words can be relocated to more than one region, I believe you interpret that as "split". However, the word "split" in this class/context is exclusively reserved for "splitting" a region so that "The preceding part is moved to one same space and the second part is moved to another space." > > Relocating into multiple destination-regions is not "split", otherwise, `RegionData::_destination` should contain "split" as well. > In _split_destination, the prefix split means two parts whose destinations locate in different spaces. But in _split_destination_count, the prefix split means two parts whose destinations locate in different regions but in the same space. The two `two parts` I mentioned is unclear and ambiguous. To be clearer and be easy to understand, please omit my previous unclear statement, and help judge whether my following understanding is right or not. Thanks. One source region which needs to be split can be divided into three parts (shown below). 
- `first-part-opt-1` contains objects targeted to one region
- `first-part-opt-2` contains objects targeted to another region (but the same space as the target of `first-part-opt-1`)
- `second-part` contains objects targeted to another space
- `first-part-opt-1` and `first-part-opt-2` are optional

source-region:

-----------------------------------------------------------
| first-part-opt-1   | first-part-opt-2   | second-part   |
a--------------------b--------------------c---------------d

Then we have several conclusions:
- the `_split_point` is at the location `c`
- the `_preceding_live_words` is the size of the live objects from `a` to `c`
- the `_split_destination` is the destination of the first object from `a` (not the first object from `c`)
- the `_split_destination_count` has three conditions
  - if the `_preceding_live_words` is `0`, the `_split_destination_count` is `0`, because the `first-part-opt-1` and `first-part-opt-2` don't exist.
  - if `_split_destination + _preceding_live_words` is larger than `region_align_up(_split_destination)`, the `_split_destination_count` is `2`, because both `first-part-opt-1` and `first-part-opt-2` exist.
  - otherwise, the `_split_destination_count` is `1` (only `first-part-opt-1` exists)

> However, the word "split" in this class/context is exclusively reserved for "splitting" a region so that "The preceding part is moved to one same space and the second part is moved to another space."

If my understanding is right, I think the word `split` can be used in the context related to the `c` or `from c to else` or `second-part`. And the word `preceding` can be used in the context related to the `a`, `b` or `from 'a'/'b' to else` or `first-part-opt-1/first-part-opt-2`. What do you think about it?

If you agree with this convention. I propose to change `_split_destination` to `_preceding_destination` and change `_split_destination_count` to `_preceding_destination_count`.

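
(Editorial sketch, not part of the patch.) The three cases for the destination count listed above can be modelled in a few lines. This is a minimal sketch assuming addresses are plain word indices and a made-up region size; `RegionSizeWords` and the free functions below are hypothetical stand-ins for the `ParallelCompactData` helpers, and the comparison is copied from the snippet quoted in this thread rather than from the integrated code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical region size in words, chosen only to keep the numbers small.
constexpr std::size_t RegionSizeWords = 64;

// Round a word address up to the next region boundary. Assumption of this
// sketch: an address already on a boundary maps to itself.
constexpr std::size_t region_align_up(std::size_t addr) {
  return (addr + RegionSizeWords - 1) / RegionSizeWords * RegionSizeWords;
}

// Number of destination regions the preceding live words are relocated into,
// following the comparison quoted in the thread.
constexpr unsigned split_destination_count(std::size_t split_destination,
                                           std::size_t preceding_live_words) {
  if (preceding_live_words == 0) {
    return 0;  // nothing precedes the split point
  }
  if (split_destination + preceding_live_words > region_align_up(split_destination)) {
    return 2;  // the preceding part crosses one region boundary
  }
  return 1;    // the preceding part stays within a single region
}
```

For example, with 64-word regions a destination of 10 and 54 preceding words ends exactly on the boundary and counts as one region, while 55 words spills over and counts as two. One thing the model makes visible: if `region_align_up` maps an already aligned address to itself, as assumed here, the quoted comparison reports 2 for any non-empty preceding part whose destination sits exactly on a region boundary. Whether that case can occur in the real code is not settled by this sketch.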
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1728898444 From ayang at openjdk.org Fri Aug 23 13:26:51 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 23 Aug 2024 13:26:51 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC [v3] In-Reply-To: References: Message-ID: <6_FegF-P_VDqelU_dwTjTIWLgDdIV0faVP7gSZDcdFM=.03c464b1-c757-4f7c-a32e-2ba4fa726dc8@github.com> > Extend `SplitInfo` to support more fine-grained splitting to mitigate the fragmentation issue during full GC. Added comments and diagrams in the process. > > For easier review, it's best to start with `SplitInfo` and then proceed to see how it is constructed in `summarize_split_space` and consumed in `first_src_addr`. The accompanying diagrams should help create a clear mental image. > > With this patch, the exec time of `runtime/ClassInitErrors/TestOutOfMemoryDuringInit.java` using Parallel drops from ~30s to ~8s, the same as other GCs, and gc-log shows similar number of GC cycles as well. 
> > Test: tier1-8, systemgc micro bm, CacheStress, dacapo, specjbb2005, specjvm2008 Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20590/files - new: https://git.openjdk.org/jdk/pull/20590/files/a4665329..6e76fb49 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20590&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20590&range=01-02 Stats: 33 lines in 2 files changed: 12 ins; 12 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/20590.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20590/head:pull/20590 PR: https://git.openjdk.org/jdk/pull/20590 From mdoerr at openjdk.org Fri Aug 23 13:31:06 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 23 Aug 2024 13:31:06 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> Message-ID: On Mon, 19 Aug 2024 14:25:13 GMT, Roberto Castañeda Lozano wrote: >> In case of heap base != null, a branch already exists which makes the other null check redundant. So, we have null check, region crossing check, another null check. Maybe this compressed oop mode is not important enough. >> >> For the other compressed oop modes, yes, this means moving the null check above the region crossing check. On PPC64, the null check can be combined with the shift instruction, so we save one compare instruction. Technically, it would even be possible to use only one branch instruction for both checks, but I'm not sure if it's worth the complexity.
I'll think about it. > > OK, thanks. I just ran some benchmarks with zero-based OOP compression ([prototype here](https://github.com/robcasloz/jdk/tree/JDK-8334060-g1-late-barrier-expansion-x64-optimizations)) and could not observe any significant performance effect on three different x64 implementations. I think I will keep the `g1StoreN` implementation as-is in the x64 and aarch64 backends, for simplicity. Again, we can revisit this in follow-up work if need be. I have an experimental implementation for PPC64. I have moved the oop decoding into `G1BarrierSetAssembler::g1_write_barrier_post_c2`: https://github.com/TheRealMDoerr/jdk/blob/a48598075862f17e7b1cfbec29af4c2431809257/src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp#L476 This has 2 advantages: - Reduce replicated code in the .ad file. - Make the discussed optimization easy. Please take a look. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1728978594 From ayang at openjdk.org Fri Aug 23 13:32:18 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 23 Aug 2024 13:32:18 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC [v4] In-Reply-To: References: Message-ID: > Extend `SplitInfo` to support more fine-grained splitting to mitigate the fragmentation issue during full GC. Added comments and diagrams in the process. > > For easier review, it's best to start with `SplitInfo` and then proceed to see how it is constructed in `summarize_split_space` and consumed in `first_src_addr`. The accompanying diagrams should help create a clear mental image. > > With this patch, the exec time of `runtime/ClassInitErrors/TestOutOfMemoryDuringInit.java` using Parallel drops from ~30s to ~8s, the same as other GCs, and gc-log shows similar number of GC cycles as well. 
> > Test: tier1-8, systemgc micro bm, CacheStress, dacapo, specjbb2005, specjvm2008 Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20590/files - new: https://git.openjdk.org/jdk/pull/20590/files/6e76fb49..73598ba8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20590&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20590&range=02-03 Stats: 9 lines in 1 file changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/20590.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20590/head:pull/20590 PR: https://git.openjdk.org/jdk/pull/20590 From mdoerr at openjdk.org Fri Aug 23 13:36:06 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 23 Aug 2024 13:36:06 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <4aS0KysKaIPue1D60nootRPb8m7pdP_2hlPSwGUIk8w=.6b68b278-62ce-4df0-a6c6-c7fee40557aa@github.com> Message-ID: <6DcMr9PUa8OZEhO861hkhqTMYlKDs96tgPs4Fu1u72I=.9dd2d0d4-9075-4c36-9ab6-ff886e257b4b@github.com> On Mon, 19 Aug 2024 12:20:21 GMT, Roberto Castañeda Lozano wrote: >> Note that we had such an optimization already in C2: https://github.com/openjdk/jdk8u-dev/blob/4106121e0ae42d644e45c6eab9037874110ed670/hotspot/src/share/vm/opto/library_call.cpp#L3114 >> But, it's probably not a big deal. Maybe I can try it on PPC64 which may be more sensitive to accesses on contended memory. > > Thanks for the reference, I would still prefer to keep this part as is for simplicity. We can always optimize the atomic barriers in follow-up work, if a need arises. After thinking more about this, I figured out that we can optimize more when moving the pre_barrier after the cmpxchg.
We can skip all G1 barriers if the cmpxchg fails: https://github.com/TheRealMDoerr/jdk/blob/a48598075862f17e7b1cfbec29af4c2431809257/src/hotspot/cpu/ppc/gc/g1/g1_ppc.ad#L171 This may reduce load on GC queue handling and related work for GC threads. I'm testing this version and I actually like it more than the version I had before. Please take a look. (Note that my final version will need https://github.com/openjdk/jdk/pull/20689 to be integrated and merged into your PR.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1728987173 From gli at openjdk.org Fri Aug 23 14:33:04 2024 From: gli at openjdk.org (Guoxiong Li) Date: Fri, 23 Aug 2024 14:33:04 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC [v4] In-Reply-To: References: Message-ID: <12ltuBDcerrLt-01ke0FeRUBzWM5KCGMC4kVJ4R0fRI=.5bd62c93-6ade-4f4c-9578-de01b80c92d1@github.com> On Fri, 23 Aug 2024 11:21:25 GMT, Albert Mingkun Yang wrote: >>> Actually, the last parameter target_next of the method summarize_split_space won't be used at any other places. >> >>> target_next is used in the callee to set up the new-top. >> >> I means that the `target_next` is updated in `summarize_split_space` but the updated `target_next` is never used later. The bottom of the new space will be used instead. 
>> >> >> // method PSParallelCompact::summary_phase >> >> } else if (live > 0) { >> // other code, skip >> bool done = _summary_data.summarize(_space_info[id].split_info(), >> space->bottom(), space->top(), >> &next_src_addr, >> *new_top_addr, dst_space_end, >> new_top_addr); // <-- `new_top_addr` is updated, but unnecessarily >> // other code, skip >> new_top_addr = _space_info[id].new_top_addr(); // <-- `new_top_addr` is updated again >> done = _summary_data.summarize(_space_info[id].split_info(), >> next_src_addr, space->top(), >> nullptr, >> space->bottom(), dst_space_end, // <-- the bottom of the new space is used >> new_top_addr); >> // other code, skip >> } >> >> >> A draft diff is shown below: >> >> >> --- a/src/hotspot/share/gc/parallel/psParallelCompact.cpp >> +++ b/src/hotspot/share/gc/parallel/psParallelCompact.cpp >> @@ -310,8 +310,7 @@ ParallelCompactData::summarize_dense_prefix(HeapWord* beg, HeapWord* end) >> HeapWord* ParallelCompactData::summarize_split_space(size_t src_region, >> SplitInfo& split_info, >> HeapWord* const destination, >> - HeapWord* const target_end, >> - HeapWord** target_next) { >> + HeapWord* const target_end) { >> assert(destination <= target_end, "sanity"); >> assert(destination + _region_data[src_region].data_size() > target_end, >> "region should not fit into target space"); >> @@ -373,9 +372,6 @@ HeapWord* ParallelCompactData::summarize_split_space(size_t src_region, >> } >> } >> >> - // Update new top of target space >> - *target_next = new_top; >> - >> return overflowing_obj... > > `gc/InfiniteList.java` (if running Parallel) fails with this patch on my box. I see what I missed now. The `new_top_addr` is not just a temporary variable that influences only the summary phase. It actually points to `SpaceInfo::_new_top`, which influences the later phases. So it needs to be updated precisely. Sorry, my mistake.
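To make the aliasing concrete, here is a small standalone sketch of why dropping the write through the out-parameter changes later behaviour. It is only a model under assumptions: `HeapWord` is a placeholder type, `SpaceInfoSketch` and `summarize_split_space_sketch` are hypothetical names, and only the "pointer to the stored new-top" relationship described above is modelled.

```cpp
#include <cassert>

using HeapWord = char;  // placeholder; not the HotSpot type

// Hypothetical stand-in for the per-space bookkeeping: new_top_addr()
// hands out the address of the stored field, not a copy of its value.
struct SpaceInfoSketch {
  HeapWord* _new_top = nullptr;
  HeapWord** new_top_addr() { return &_new_top; }
};

// Hypothetical stand-in for the callee: it publishes the computed new top
// by writing through the out-parameter.
void summarize_split_space_sketch(HeapWord* new_top, HeapWord** target_next) {
  *target_next = new_top;  // this store lands in SpaceInfoSketch::_new_top
}
```

A caller that passes `info.new_top_addr()` sees `info._new_top` change after the call, so removing the store would leave the field stale for any later reader.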
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1729069597 From gli at openjdk.org Fri Aug 23 14:33:03 2024 From: gli at openjdk.org (Guoxiong Li) Date: Fri, 23 Aug 2024 14:33:03 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC [v4] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 13:32:18 GMT, Albert Mingkun Yang wrote: >> Extend `SplitInfo` to support more fine-grained splitting to mitigate the fragmentation issue during full GC. Added comments and diagrams in the process. >> >> For easier review, it's best to start with `SplitInfo` and then proceed to see how it is constructed in `summarize_split_space` and consumed in `first_src_addr`. The accompanying diagrams should help create a clear mental image. >> >> With this patch, the exec time of `runtime/ClassInitErrors/TestOutOfMemoryDuringInit.java` using Parallel drops from ~30s to ~8s, the same as other GCs, and gc-log shows similar number of GC cycles as well. >> >> Test: tier1-8, systemgc micro bm, CacheStress, dacapo, specjbb2005, specjvm2008 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Looks good. ------------- Marked as reviewed by gli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20590#pullrequestreview-2257397301 From lmesnik at openjdk.org Fri Aug 23 16:37:05 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 23 Aug 2024 16:37:05 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. 
>> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). 
>> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding Changes requested by lmesnik (Reviewer). make/Images.gmk line 135: > 133: # > 134: # Param1 - VM variant (e.g., server, client, zero, ...) > 135: # Param2 - _nocoops, _coh, _nocoops_coh, or empty -XX:+UseCompactObjectHeaders seems to be incompatible with the zero VM. The zero VM build starts failing while generating the shared archive with +UseCompactObjectHeaders. Generation should be disabled by default for zero so that the build does not break. ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2257621775 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1729222671 From mli at openjdk.org Fri Aug 23 18:49:05 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 23 Aug 2024 18:49:05 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation.
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp line 170: > 168: mv(tmp1, (int32_t)(intptr_t)markWord::prototype().value()); > 169: sd(tmp1, Address(obj, oopDesc::mark_offset_in_bytes())); > 170: // Todo UseCompactObjectHeaders Can I ask, will this PR fully support RISC-V? src/hotspot/share/oops/oop.inline.hpp line 94: > 92: > 93: void oopDesc::init_mark() { > 94: if (UseCompactObjectHeaders) { It seems that `set_mark(prototype_mark());` alone is fine for both cases? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1729383247 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1728833750 From lmesnik at openjdk.org Fri Aug 23 19:06:04 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 23 Aug 2024 19:06:04 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it.
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding test/hotspot/jtreg/runtime/cds/appcds/TestZGCWithCDS.java line 59: > 57: public static void main(String... args) throws Exception { > 58: String zGenerational = args[0]; > 59: String compactHeaders = "-XX:" + (zGenerational.equals("-XX:+ZGenerational") ? "+" : "-") + "UseCompactObjectHeaders"; The test failing with stdout: [[0.176s][info][cds] trying to map /opt/mach5/mesos/work_dir/slaves/a20696e7-ae7d-4d37-8e9c-83f99ef002cb-S2261/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/f0801999-993f-4e08-b017-08b33a8ec44f/runs/34cc555e-ae8f-4a48-8175-e998194f204b/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_cds_relocation/scratch/5/appcds-18h50m16s773.jsa [0.176s][info][cds] Opened archive /opt/mach5/mesos/work_dir/slaves/a20696e7-ae7d-4d37-8e9c-83f99ef002cb-S2261/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/f0801999-993f-4e08-b017-08b33a8ec44f/runs/34cc555e-ae8f-4a48-8175-e998194f204b/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_cds_relocation/scratch/5/appcds-18h50m16s773.jsa. [0.176s][info][cds] Archive was created with UseCompressedOops = 0, UseCompressedClassPointers = 1 [0.176s][info][cds] The shared archive file's UseCompactObjectHeaders setting (enabled) does not equal the current UseCompactObjectHeaders setting (disabled). [0.176s][info][cds] Initialize static archive failed. [0.176s][info][cds] Unable to map shared spaces [0.176s][error][cds] An error has occurred while processing the shared archive file. [0.176s][error][cds] Unable to map shared spaces Error occurred during initialization of VM Unable to use shared archive. 
]; stderr: [] exitValue = 1 java.lang.RuntimeException: 'Hello World' missing from stdout/stderr at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:252) at TestZGCWithCDS.main(TestZGCWithCDS.java:123) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) at java.base/java.lang.reflect.Method.invoke(Method.java:573) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) at java.base/java.lang.Thread.run(Thread.java:1575) JavaTest Message: Test threw exception: java.lang.RuntimeException JavaTest Message: shutting down test ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1729404477 From zgu at openjdk.org Fri Aug 23 20:03:33 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Fri, 23 Aug 2024 20:03:33 GMT Subject: RFR: 8338922: Parallel: Add task queue stats support in full GC Message-ID: Add capability to print out task queue stats in Parallel Compact GC, like other GCs. ------------- Commit messages: - 8338922: Parallel: Add task queue stats support in full GC Changes: https://git.openjdk.org/jdk/pull/20694/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20694&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338922 Stats: 24 lines in 3 files changed: 22 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20694.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20694/head:pull/20694 PR: https://git.openjdk.org/jdk/pull/20694 From zgu at openjdk.org Sun Aug 25 01:41:07 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Sun, 25 Aug 2024 01:41:07 GMT Subject: RFR: 8338922: Parallel: Add task queue stats support in full GC [v2] In-Reply-To: References: Message-ID: > Add capability to print out task queue stats in Parallel Compact GC, like other GCs. 
Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: v1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20694/files - new: https://git.openjdk.org/jdk/pull/20694/files/3e4592a2..b1a46198 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20694&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20694&range=00-01 Stats: 6 lines in 3 files changed: 2 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20694.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20694/head:pull/20694 PR: https://git.openjdk.org/jdk/pull/20694 From rcastanedalo at openjdk.org Mon Aug 26 07:26:08 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 26 Aug 2024 07:26:08 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> Message-ID: <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> On Fri, 23 Aug 2024 13:28:03 GMT, Martin Doerr wrote: >> OK, thanks. I just ran some benchmarks with zero-based OOP compression ([prototype here](https://github.com/robcasloz/jdk/tree/JDK-8334060-g1-late-barrier-expansion-x64-optimizations)) and could not observe any significant performance effect on three different x64 implementations. I think I will keep the `g1StoreN` implementation as-is in the x64 and aarch64 backends, for simplicity. Again, we can revisit this in follow-up work if need be. > > I have an experimental implementation for PPC64.
I have moved the oop decoding into `G1BarrierSetAssembler::g1_write_barrier_post_c2`: > https://github.com/TheRealMDoerr/jdk/blob/0aedfb0aa1c545319257c0e613066b91404a07ca/src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp#L476 > This has 2 advantages: > - Reduce replicated code in the .ad file. > - Make the discussed optimization easy. Please take a look. Great that you already have an experimental port! Thanks for the heads-up, I agree that the OOP decoding + null check fusion becomes less intrusive, but I still prefer the current decoupled implementation for x64 and aarch64 (even simpler, IMO). In the benchmarks I have run (admittedly, only on x64), I could not observe any positive effect, whereas I found a slight regression in one case using zero-based OOP compression. I have not investigated further, but I wonder if hoisting the null check above the region-crossing test could have a negative impact on branch predictability. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730806021 From mdoerr at openjdk.org Mon Aug 26 07:46:06 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 26 Aug 2024 07:46:06 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> Message-ID: On Mon, 26 Aug 2024 07:23:40 GMT, Roberto Castañeda Lozano wrote: >> I have an experimental implementation for PPC64.
I have moved the oop decoding into `G1BarrierSetAssembler::g1_write_barrier_post_c2`: >> https://github.com/TheRealMDoerr/jdk/blob/0aedfb0aa1c545319257c0e613066b91404a07ca/src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp#L476 >> This has 2 advantages: >> - Reduce replicated code in the .ad file. >> - Make the discussed optimization easy. Please take a look. > > Great that you already have an experimental port! Thanks for the heads-up, I agree that the OOP decoding + null check fusion becomes less intrusive, but I still prefer the current decoupled implementation for x64 and aarch64 (even simpler, IMO). In the benchmarks I have run (admittedly, only on x64), I could not observe any positive effect, whereas I found a slight regression in one case using zero-based OOP compression. I have not investigated further, but I wonder if hoisting the null check above the region-crossing test could have a negative impact on branch predictability. It can be implemented like this: - If oop decoding requires a null check, redirect the branch to jump over the barrier code. - Else insert the null check after the region crossing check. This way, I don't see how it can have a negative effect. But I leave you free to decide about x86 and aarch64. Optimizations could be done later if needed as you already mentioned. Did you also see my comment https://github.com/openjdk/jdk/pull/19746#discussion_r1728987173 ? It's in the "resolved" discussion. 
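The two-way ordering of the null check described above can be modelled in plain C++ as follows. This is a sketch under assumptions, not the PPC64 assembler code: the function name `reaches_post_barrier` and its parameters are hypothetical, addresses are plain integers, and only the order of the null check relative to the region-crossing check is modelled.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model only: returns true when the remaining (card-marking)
// part of the post-barrier would be reached for a narrow-oop store.
bool reaches_post_barrier(std::uint64_t field_addr, std::uint32_t narrow_val,
                          std::uint64_t heap_base, unsigned shift,
                          unsigned log_region_bytes) {
  if (heap_base != 0) {
    // Decoding already needs a null check, so that branch is redirected
    // to jump over the barrier code.
    if (narrow_val == 0) return false;
    std::uint64_t new_val =
        heap_base + (static_cast<std::uint64_t>(narrow_val) << shift);
    return ((field_addr ^ new_val) >> log_region_bytes) != 0;
  }
  // Zero-based mode: decoding is a plain shift, so the null check is
  // inserted after the region-crossing check.
  std::uint64_t new_val = static_cast<std::uint64_t>(narrow_val) << shift;
  if (((field_addr ^ new_val) >> log_region_bytes) == 0) return false;
  return new_val != 0;
}
```

In both modes the same set of stores reaches the barrier; only the position of the null test differs.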
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730832653 From stuefe at openjdk.org Mon Aug 26 08:06:09 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 26 Aug 2024 08:06:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 19:03:19 GMT, Leonid Mesnik wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > test/hotspot/jtreg/runtime/cds/appcds/TestZGCWithCDS.java line 59: > >> 57: public static void main(String... args) throws Exception { >> 58: String zGenerational = args[0]; >> 59: String compactHeaders = "-XX:" + (zGenerational.equals("-XX:+ZGenerational") ? "+" : "-") + "UseCompactObjectHeaders"; > > The test failing with > stdout: [[0.176s][info][cds] trying to map /opt/mach5/mesos/work_dir/slaves/a20696e7-ae7d-4d37-8e9c-83f99ef002cb-S2261/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/f0801999-993f-4e08-b017-08b33a8ec44f/runs/34cc555e-ae8f-4a48-8175-e998194f204b/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_cds_relocation/scratch/5/appcds-18h50m16s773.jsa > [0.176s][info][cds] Opened archive /opt/mach5/mesos/work_dir/slaves/a20696e7-ae7d-4d37-8e9c-83f99ef002cb-S2261/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/f0801999-993f-4e08-b017-08b33a8ec44f/runs/34cc555e-ae8f-4a48-8175-e998194f204b/testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_cds_relocation/scratch/5/appcds-18h50m16s773.jsa. > [0.176s][info][cds] Archive was created with UseCompressedOops = 0, UseCompressedClassPointers = 1 > [0.176s][info][cds] The shared archive file's UseCompactObjectHeaders setting (enabled) does not equal the current UseCompactObjectHeaders setting (disabled). > [0.176s][info][cds] Initialize static archive failed. 
> [0.176s][info][cds] Unable to map shared spaces > [0.176s][error][cds] An error has occurred while processing the shared archive file. > [0.176s][error][cds] Unable to map shared spaces > Error occurred during initialization of VM > Unable to use shared archive. > ]; > stderr: [] > exitValue = 1 > > java.lang.RuntimeException: 'Hello World' missing from stdout/stderr > at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:252) > at TestZGCWithCDS.main(TestZGCWithCDS.java:123) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > at java.base/java.lang.reflect.Method.invoke(Method.java:573) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) > at java.base/java.lang.Thread.run(Thread.java:1575) > > JavaTest Message: Test threw exception: java.lang.RuntimeException > JavaTest Message: shutting down test Roman has two weeks of vacation; I am taking a look at this one. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1730855152 From rcastanedalo at openjdk.org Mon Aug 26 08:32:06 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 26 Aug 2024 08:32:06 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <6DcMr9PUa8OZEhO861hkhqTMYlKDs96tgPs4Fu1u72I=.9dd2d0d4-9075-4c36-9ab6-ff886e257b4b@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <4aS0KysKaIPue1D60nootRPb8m7pdP_2hlPSwGUIk8w=.6b68b278-62ce-4df0-a6c6-c7fee40557aa@github.com> <6DcMr9PUa8OZEhO861hkhqTMYlKDs96tgPs4Fu1u72I=.9dd2d0d4-9075-4c36-9ab6-ff886e257b4b@github.com> Message-ID: On Fri, 23 Aug 2024 13:33:09 GMT, Martin Doerr wrote: >> Thanks for the reference, I would still prefer to keep this part as is for simplicity. We can always optimize the atomic barriers in follow-up work, if a need arises. 
> > After thinking more about this, I figured out that we can optimize more when moving the pre_barrier after the cmpxchg. We can skip all G1 barriers if the cmpxchg fails: > https://github.com/TheRealMDoerr/jdk/blob/0aedfb0aa1c545319257c0e613066b91404a07ca/src/hotspot/cpu/ppc/gc/g1/g1_ppc.ad#L171 > The cmpxchg jumps to no_update on failure. This may reduce load on GC queue handling and related work for GC threads. I'm testing this version and I actually like it more than the version I had before. Please take a look. > > (Note that my final version will need https://github.com/openjdk/jdk/pull/20689 to be integrated and merged into your PR.) Right, that makes sense since for PPC's cmpxchg implementation (unlike x64 or aarch64+LSE) you are already explicitly branching on failure anyway. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730891297 From rcastanedalo at openjdk.org Mon Aug 26 08:41:08 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 26 Aug 2024 08:41:08 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> Message-ID: <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> On Mon, 26 Aug 2024 07:43:39 GMT, Martin Doerr wrote: > This way, I don't see how it can have a negative effect. I agree, this is the implementation I tried out originally (https://github.com/openjdk/jdk/pull/19746#discussion_r1719811953). > Did you also see my comment https://github.com/openjdk/jdk/pull/19746#discussion_r1728987173 ? 
It's in the "resolved" discussion. Yes, thanks, I "unresolved" it now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730905873 From rcastanedalo at openjdk.org Mon Aug 26 08:49:06 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 26 Aug 2024 08:49:06 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> Message-ID: On Mon, 26 Aug 2024 08:38:39 GMT, Roberto Castañeda Lozano wrote: >> It can be implemented like this: >> >> - If oop decoding requires a null check, redirect the branch to jump over the barrier code. >> - Else insert the null check after the region crossing check. >> >> This way, I don't see how it can have a negative effect. But I leave you free to decide about x86 and aarch64. Optimizations could be done later if needed as you already mentioned. >> >> Did you also see my comment https://github.com/openjdk/jdk/pull/19746#discussion_r1728987173 ? It's in the "resolved" discussion. > >> This way, I don't see how it can have a negative effect. > > I agree, this is the implementation I tried out originally (https://github.com/openjdk/jdk/pull/19746#discussion_r1719811953). > >> Did you also see my comment https://github.com/openjdk/jdk/pull/19746#discussion_r1728987173 ? It's in the "resolved" discussion. > > Yes, thanks, I "unresolved" it now.
> I have an experimental implementation for PPC64. An unrelated comment about your PPC64 implementation: did you try running `test/hotspot/jtreg/compiler/gcbarriers/TestG1BarrierGeneration.java`? It expects the ADL instructions that implement `GetAndSetP` and `GetAndSetN` to be called `g1XChgP` and `g1XChgN`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730916202 From mdoerr at openjdk.org Mon Aug 26 09:45:06 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 26 Aug 2024 09:45:06 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> Message-ID: <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> On Mon, 26 Aug 2024 08:46:10 GMT, Roberto Castañeda Lozano wrote: >>> This way, I don't see how it can have a negative effect. >> >> I agree, this is the implementation I tried out originally (https://github.com/openjdk/jdk/pull/19746#discussion_r1719811953). >> >>> Did you also see my comment https://github.com/openjdk/jdk/pull/19746#discussion_r1728987173 ? It's in the "resolved" discussion. >> >> Yes, thanks, I "unresolved" it now. > >> I have an experimental implementation for PPC64. > > An unrelated comment about your PPC64 implementation: did you try running `test/hotspot/jtreg/compiler/gcbarriers/TestG1BarrierGeneration.java`? It expects the ADL instructions that implement `GetAndSetP` and `GetAndSetN` to be called `g1XChgP` and `g1XChgN`.
That one is among the failing tests. Can we agree on better names than `g1XChgP` and `g1XChgN`? They are not very readable IMHO. All the other nodes have nice names. Regardless of whether you implement the compressed oops optimization, I'd reconsider moving the oop decoding into `G1BarrierSetAssembler::g1_write_barrier_post_c2`: it makes the .ad file shorter because you can get rid of the replicated `decode_heap_oop`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730991651 From ayang at openjdk.org Mon Aug 26 12:50:03 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 26 Aug 2024 12:50:03 GMT Subject: RFR: 8338922: Parallel: Add task queue stats support in full GC [v2] In-Reply-To: References: Message-ID: On Sun, 25 Aug 2024 01:41:07 GMT, Zhengyu Gu wrote: >> Add capability to print out task queue stats in Parallel Compact GC, like other GCs. > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > v1 Why is the `#if TASKQUEUE_STATS` at the end of `PSParallelCompact::marking_phase` not enough? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20694#issuecomment-2310129821 From zgu at openjdk.org Mon Aug 26 13:17:09 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Mon, 26 Aug 2024 13:17:09 GMT Subject: RFR: 8338922: Parallel: Add task queue stats support in full GC [v2] In-Reply-To: References: Message-ID: On Sun, 25 Aug 2024 01:41:07 GMT, Zhengyu Gu wrote: >> Add capability to print out task queue stats in Parallel Compact GC, like other GCs. > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > v1 Oops, never mind.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20694#issuecomment-2310190382 From zgu at openjdk.org Mon Aug 26 13:17:10 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Mon, 26 Aug 2024 13:17:10 GMT Subject: Withdrawn: 8338922: Parallel: Add task queue stats support in full GC In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 19:57:54 GMT, Zhengyu Gu wrote: > Add capability to print out task queue stats in Parallel Compact GC, like other GCs. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/20694 From ayang at openjdk.org Mon Aug 26 13:22:03 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 26 Aug 2024 13:22:03 GMT Subject: RFR: 8258483: [TESTBUG] gtest CollectorPolicy.young_scaled_initial_ergo_vm fails if heap is too small [v2] In-Reply-To: References: <7Qtt8s9VVWzl6VfmXVPR3GQTCKcQA5Fx-fEgsPLCLk8=.f084eb20-96be-4bde-ac50-84ca2fc54951@github.com> Message-ID: <5aFWT-n32nea6yxJkTWQVcYa1_0YoEL6uDx60f98yjQ=.3c880f47-6e89-4b86-8d63-7ce0cf5d2ac2@github.com> On Thu, 22 Aug 2024 04:01:44 GMT, Leonid Mesnik wrote: >> The tests CollectorPolicy.* checks SerialGC policy. They might fail if MaxHeapSize is too small. >> >> If heap is not enough for then VM change ergonomic scheme and print warning about this. The test is not checking this case. >> The GC ergonomic has very different cases and only main workflow is covered. The goal of fix is not to improve test but pass in reasonable environment or silently pass if other. >> >> I have updated test so it pass if heap is at least 128M (since it is SerialGC, it seems reasonable for testing in smaller containers) or skipped otherwise. >> >> Testing: tier1, running these tests manually with 90/128/256M to check that they pass in such environment. > > Leonid Mesnik has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: > > - increased heap > - fixed space. > - Merge branch 'master' of https://github.com/openjdk/jdk into 8258483 > - typo fixed. > - size updated. > - gtest has been updated. Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20656#pullrequestreview-2260668385 From rcastanedalo at openjdk.org Mon Aug 26 13:26:07 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 26 Aug 2024 13:26:07 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> Message-ID: On Mon, 26 Aug 2024 09:42:29 GMT, Martin Doerr wrote: > That one is among the failing tests. Can we agree on better names than g1XChgP and g1XChgN? They are not readable very well IMHO. Sure, I agree that `g1GetAndSetP` and `g1GetAndSetN` are more consistent with the corresponding Ideal operation names, will rename the instructions in x64 and aarch64 and update the test expectations. > Regardless if you implement the compressed oops optimization or not, I'd reconsider moving the oop decoding into G1BarrierSetAssembler::g1_write_barrier_post_c2 because it makes the .ad file shorter because you can get rid of the replicated decode_heap_oop. 
Thanks, will try it out. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1731240303 From iwalulya at openjdk.org Mon Aug 26 15:32:05 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 26 Aug 2024 15:32:05 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC [v4] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 13:32:18 GMT, Albert Mingkun Yang wrote: >> Extend `SplitInfo` to support more fine-grained splitting to mitigate the fragmentation issue during full GC. Added comments and diagrams in the process. >> >> For easier review, it's best to start with `SplitInfo` and then proceed to see how it is constructed in `summarize_split_space` and consumed in `first_src_addr`. The accompanying diagrams should help create a clear mental image. >> >> With this patch, the exec time of `runtime/ClassInitErrors/TestOutOfMemoryDuringInit.java` using Parallel drops from ~30s to ~8s, the same as other GCs, and gc-log shows similar number of GC cycles as well. >> >> Test: tier1-8, systemgc micro bm, CacheStress, dacapo, specjbb2005, specjvm2008 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Changes requested by iwalulya (Reviewer). src/hotspot/share/gc/parallel/psParallelCompact.cpp line 129: > 127: ParallelCompactData::RegionData::dc_completed = 0xcU << dc_shift; > 128: > 129: bool ParallelCompactData::RegionData::is_clear() { Do we call this anywhere outside #ifdef ASSERT? src/hotspot/share/gc/parallel/psParallelCompact.cpp line 391: > 389: assert(cur_addr < region_end, "inv"); > 390: cur_addr = PSParallelCompact::mark_bitmap()->find_obj_beg(cur_addr, region_end); > 391: if (cur_addr >= region_end) { Shouldn't this be changed to an assert? Given the: `// There must be an overflowing obj in this region` `ShouldNotReachHere();` if the `break` is executed. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20590#pullrequestreview-2260943971 PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1731418102 PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1731399930 From ayang at openjdk.org Mon Aug 26 17:20:18 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 26 Aug 2024 17:20:18 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC [v5] In-Reply-To: References: Message-ID: > Extend `SplitInfo` to support more fine-grained splitting to mitigate the fragmentation issue during full GC. Added comments and diagrams in the process. > > For easier review, it's best to start with `SplitInfo` and then proceed to see how it is constructed in `summarize_split_space` and consumed in `first_src_addr`. The accompanying diagrams should help create a clear mental image. > > With this patch, the exec time of `runtime/ClassInitErrors/TestOutOfMemoryDuringInit.java` using Parallel drops from ~30s to ~8s, the same as other GCs, and gc-log shows similar number of GC cycles as well. 
> > Test: tier1-8, systemgc micro bm, CacheStress, dacapo, specjbb2005, specjvm2008 Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20590/files - new: https://git.openjdk.org/jdk/pull/20590/files/73598ba8..50cad556 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20590&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20590&range=03-04 Stats: 6 lines in 1 file changed: 0 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20590.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20590/head:pull/20590 PR: https://git.openjdk.org/jdk/pull/20590 From ayang at openjdk.org Mon Aug 26 17:20:19 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 26 Aug 2024 17:20:19 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC [v4] In-Reply-To: References: Message-ID: On Mon, 26 Aug 2024 15:24:54 GMT, Ivan Walulya wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/gc/parallel/psParallelCompact.cpp line 129: > >> 127: ParallelCompactData::RegionData::dc_completed = 0xcU << dc_shift; >> 128: >> 129: bool ParallelCompactData::RegionData::is_clear() { > > Do we call this anywhere outside #ifdef ASSERT? No, it's only for assert. The name seems useful for non-verification, so I left it there. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20590#discussion_r1731551711 From iwalulya at openjdk.org Mon Aug 26 17:32:03 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 26 Aug 2024 17:32:03 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC [v5] In-Reply-To: References: Message-ID: On Mon, 26 Aug 2024 17:20:18 GMT, Albert Mingkun Yang wrote: >> Extend `SplitInfo` to support more fine-grained splitting to mitigate the fragmentation issue during full GC. Added comments and diagrams in the process. >> >> For easier review, it's best to start with `SplitInfo` and then proceed to see how it is constructed in `summarize_split_space` and consumed in `first_src_addr`. The accompanying diagrams should help create a clear mental image. >> >> With this patch, the exec time of `runtime/ClassInitErrors/TestOutOfMemoryDuringInit.java` using Parallel drops from ~30s to ~8s, the same as other GCs, and gc-log shows similar number of GC cycles as well. >> >> Test: tier1-8, systemgc micro bm, CacheStress, dacapo, specjbb2005, specjvm2008 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > assert Marked as reviewed by iwalulya (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20590#pullrequestreview-2261222978 From cjplummer at openjdk.org Mon Aug 26 21:56:04 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 26 Aug 2024 21:56:04 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. 
>> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). 
>> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Oop.java line 85: > 83: > 84: private static Klass getKlass(Mark mark) { > 85: assert(VM.getVM().isCompactObjectHeadersEnabled()); `mark.getKlass()` already does this assert. I don't see any value in this `getKlass()` method. The caller should just call `getMark().getKlass()` rather than `getKlass(getMark())`. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Oop.java line 169: > 167: } else { > 168: visitor.doMetadata(klass, true); > 169: } Why is there no `visitor.doMetadata()` call for the compressed object header case? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1731849434 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1731866842 From kbarrett at openjdk.org Tue Aug 27 00:21:12 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 27 Aug 2024 00:21:12 GMT Subject: RFR: 8311163: Parallel: Improve large object handling during evacuation Message-ID: Please review this change to ParallelGC young generation collection's handling of large objArrays, to now use the infrastructure provided by JDK-8253237 and JDK-8337709. (That's the same infrastructure used by G1 young/mixed collections.) 
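As a rough illustration of the general idea behind handling a large object array in pieces, the following Java sketch splits an array into bounded chunks that can be processed as separate tasks. It is a simplified model under stated assumptions: the names `ChunkedArrayScanSketch`, `Chunk`, `split`, and `scanAll` are hypothetical, the chunks are created eagerly and drained by one thread, and none of this reflects the actual HotSpot infrastructure or its API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative model only: hypothetical names, not the HotSpot partial-array API. */
public final class ChunkedArrayScanSketch {
    /** A task covering the half-open element range [start, end) of one array. */
    static final class Chunk {
        final Object[] array;
        final int start;
        final int end;

        Chunk(Object[] array, int start, int end) {
            this.array = array;
            this.start = start;
            this.end = end;
        }
    }

    /** Splits the array into chunks of at most chunkSize elements and queues them. */
    static Deque<Chunk> split(Object[] array, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        Deque<Chunk> queue = new ArrayDeque<>();
        for (int start = 0; start < array.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, array.length);
            queue.add(new Chunk(array, start, end));
        }
        return queue;
    }

    /** Drains the queue, counting non-null elements as a stand-in for scanning them. */
    static int scanAll(Deque<Chunk> queue) {
        int visited = 0;
        Chunk chunk;
        while ((chunk = queue.poll()) != null) {
            for (int i = chunk.start; i < chunk.end; i++) {
                if (chunk.array[i] != null) {
                    visited++;
                }
            }
        }
        return visited;
    }
}
```

The point of bounding the chunk size is that no single task owns the whole array, so in a parallel collector several workers could share the work on one large array.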
Testing: mach5 tier1-5 ------------- Commit messages: - use PartialArrayStates Changes: https://git.openjdk.org/jdk/pull/20720/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20720&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311163 Stats: 121 lines in 5 files changed: 54 ins; 39 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/20720.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20720/head:pull/20720 PR: https://git.openjdk.org/jdk/pull/20720 From fyang at openjdk.org Tue Aug 27 05:42:05 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 27 Aug 2024 05:42:05 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 18:42:28 GMT, Hamlin Li wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp line 170: > >> 168: mv(tmp1, (int32_t)(intptr_t)markWord::prototype().value()); >> 169: sd(tmp1, Address(obj, oopDesc::mark_offset_in_bytes())); >> 170: // Todo UseCompactObjectHeaders > > Can I ask, will this pr fullly support riscv? @Hamlin-Li : AFAIK, porting to linux-riscv platform has NOT been started yet. To avoid duplicate work, please let me know if anyone is interested or has been working on it :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1732153574 From rcastanedalo at openjdk.org Tue Aug 27 07:30:46 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 27 Aug 2024 07:30:46 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. 
> > We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. 
> > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: Rename g1XChgX to g1GetAndSetX for consistency with Ideal operation names ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/92112802..daf38d3f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=08-09 Stats: 10 lines in 4 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Tue Aug 27 07:38:08 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 27 Aug 2024 07:38:08 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> 
<1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> Message-ID: On Mon, 26 Aug 2024 13:23:16 GMT, Roberto Casta?eda Lozano wrote: > Sure, I agree that g1GetAndSetP and g1GetAndSetN are more consistent with the corresponding Ideal operation names, will rename the instructions in x64 and aarch64 and update the test expectations. Done (commit daf38d3). @offamitkumar @feilongjiang @snazarkin please note that the ADL instructions `g1XChgP` and `g1XChgN` have been renamed to `g1GetAndSetP` and `g1GetAndSetN`, and the same naming is expected across all platforms by the test `compiler/gcbarriers/TestG1BarrierGeneration.java` included in this changeset. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1732301224 From mli at openjdk.org Tue Aug 27 07:46:10 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 27 Aug 2024 07:46:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 05:37:30 GMT, Fei Yang wrote: >> src/hotspot/cpu/riscv/c1_MacroAssembler_riscv.cpp line 170: >> >>> 168: mv(tmp1, (int32_t)(intptr_t)markWord::prototype().value()); >>> 169: sd(tmp1, Address(obj, oopDesc::mark_offset_in_bytes())); >>> 170: // Todo UseCompactObjectHeaders >> >> Can I ask, will this pr fullly support riscv? > > @Hamlin-Li : AFAIK, porting to linux-riscv platform has NOT been started yet. To avoid duplicate work, please let me know if anyone is interested or has been working on it :-) Yes, I'm interested in it. Thanks for raising the discussion. 
:) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1732312058 From ayang at openjdk.org Tue Aug 27 10:23:02 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 27 Aug 2024 10:23:02 GMT Subject: RFR: 8311163: Parallel: Improve large object handling during evacuation In-Reply-To: References: Message-ID: <188kmFVjxpneetUFoIaZdj_p4EqGe0JJ3HIUtJQ0yaA=.a1781866-4d97-43e6-8bb0-bb07ab0d42c4@github.com> On Tue, 27 Aug 2024 00:16:10 GMT, Kim Barrett wrote: > Please review this change to ParallelGC young generation collection's handling > of large objArrays, to now use the infrastructure provided by JDK-8253237 and > JDK-8337709. (That's the same infrastructure used by G1 young/mixed > collections.) > > Testing: mach5 tier1-5 Using `-Xmx3g -XX:+UseParallelGC -XX:ParallelGCThreads=6 DelayInducer.java`, I can see large regression on my box (nproc=12). Have you seen sth similar? # baseline [0.002s][info][gc] Using Parallel [1.252s][info][gc] GC(0) Pause Young (Allocation Failure) 381M->296M(479M) 50.696ms [1.490s][info][gc] GC(1) Pause Young (Allocation Failure) 422M->422M(674M) 165.177ms [1.864s][info][gc] GC(2) Pause Young (Allocation Failure) 673M->674M(925M) 212.106ms [2.243s][info][gc] GC(3) Pause Young (Allocation Failure) 925M->925M(1428M) 229.982ms # new [0.005s][info][gc] Using Parallel [5.504s][info][gc] GC(0) Pause Young (Allocation Failure) 381M->299M(479M) 203.409ms [6.233s][info][gc] GC(1) Pause Young (Allocation Failure) 424M->425M(676M) 620.718ms [7.594s][info][gc] GC(2) Pause Young (Allocation Failure) 676M->676M(928M) 1155.301ms [8.920s][info][gc] GC(3) Pause Young (Allocation Failure) 927M->928M(1430M) 1119.259ms ------------- PR Comment: https://git.openjdk.org/jdk/pull/20720#issuecomment-2312131807 From ayang at openjdk.org Tue Aug 27 10:44:03 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 27 Aug 2024 10:44:03 GMT Subject: RFR: 8311163: Parallel: Improve large object handling during 
evacuation
In-Reply-To:
References:
Message-ID:

On Tue, 27 Aug 2024 00:16:10 GMT, Kim Barrett wrote:

> Please review this change to ParallelGC young generation collection's handling
> of large objArrays, to now use the infrastructure provided by JDK-8253237 and
> JDK-8337709. (That's the same infrastructure used by G1 young/mixed
> collections.)
>
> Testing: mach5 tier1-5

Using `-Xmx3g -XX:+UseParallelGC -XX:ParallelGCThreads=6 DelayInducer.java`, I can see some regression (~10%) in avg gc-pause time on my box (nproc=12). Have you seen sth similar?

# baseline
[0.003s][info][gc] Using Parallel
[1.175s][info][gc] GC(0) Pause Young (Allocation Failure) 381M->299M(479M) 53.655ms
[1.432s][info][gc] GC(1) Pause Young (Allocation Failure) 424M->425M(676M) 185.065ms
[1.821s][info][gc] GC(2) Pause Young (Allocation Failure) 676M->676M(928M) 234.141ms
[2.171s][info][gc] GC(3) Pause Young (Allocation Failure) 927M->928M(1430M) 206.027ms
[7.367s][info][gc] GC(4) Pause Young (Allocation Failure) 1814M->1809M(2312M) 631.159ms
[8.217s][info][gc] GC(5) Pause Full (Allocation Failure) 2311M->732M(2105M) 547.506ms
[10.453s][info][gc] GC(6) Pause Young (Allocation Failure) 1074M->1068M(2105M) 162.718ms
[11.581s][info][gc] GC(7) Pause Young (Allocation Failure) 1410M->994M(2105M) 157.642ms
[12.400s][info][gc] GC(8) Pause Young (Allocation Failure) 1336M->926M(2105M) 114.424ms

# new
[0.002s][info][gc] Using Parallel
[1.176s][info][gc] GC(0) Pause Young (Allocation Failure) 381M->299M(479M) 59.968ms
[1.415s][info][gc] GC(1) Pause Young (Allocation Failure) 425M->425M(676M) 167.393ms
[1.771s][info][gc] GC(2) Pause Young (Allocation Failure) 676M->676M(928M) 199.576ms
[2.151s][info][gc] GC(3) Pause Young (Allocation Failure) 927M->928M(1430M) 236.868ms
[7.397s][info][gc] GC(4) Pause Young (Allocation Failure) 1814M->1809M(2312M) 689.166ms
[8.299s][info][gc] GC(5) Pause Full (Allocation Failure) 2311M->732M(2106M) 603.035ms
[10.560s][info][gc] GC(6) Pause Young (Allocation Failure)
1074M->1068M(2106M) 218.286ms
[11.710s][info][gc] GC(7) Pause Young (Allocation Failure) 1410M->994M(2106M) 183.299ms
[12.590s][info][gc] GC(8) Pause Young (Allocation Failure) 1336M->926M(2106M) 181.675ms

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20720#issuecomment-2312172195

From rcastanedalo at openjdk.org Tue Aug 27 12:39:07 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Tue, 27 Aug 2024 12:39:07 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2]
In-Reply-To:
References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com>
 <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com>
 <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com>
 <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com>
 <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com>
 <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com>
Message-ID:

On Tue, 27 Aug 2024 07:34:57 GMT, Roberto Castañeda Lozano wrote:

>>> That one is among the failing tests. Can we agree on better names than g1XChgP and g1XChgN? They are not readable very well IMHO.
>>
>> Sure, I agree that `g1GetAndSetP` and `g1GetAndSetN` are more consistent with the corresponding Ideal operation names, will rename the instructions in x64 and aarch64 and update the test expectations.
>>
>>> Regardless if you implement the compressed oops optimization or not, I'd reconsider moving the oop decoding into G1BarrierSetAssembler::g1_write_barrier_post_c2 because it makes the .ad file shorter because you can get rid of the replicated decode_heap_oop.
>>
>> Thanks, will try it out.
> >> Sure, I agree that g1GetAndSetP and g1GetAndSetN are more consistent with the corresponding Ideal operation names, will rename the instructions in x64 and aarch64 and update the test expectations. > > Done (commit daf38d3). > > @offamitkumar @feilongjiang @snazarkin please note that the ADL instructions `g1XChgP` and `g1XChgN` have been renamed to `g1GetAndSetP` and `g1GetAndSetN`, and the same naming is expected across all platforms by the test `compiler/gcbarriers/TestG1BarrierGeneration.java` included in this changeset. > Regardless if you implement the compressed oops optimization or not, I'd reconsider moving the oop decoding into G1BarrierSetAssembler::g1_write_barrier_post_c2 because it makes the .ad file shorter because you can get rid of the replicated decode_heap_oop. I tried this refactoring [here](https://github.com/openjdk/jdk/commit/d4e83fd7d77c5415700b33556752e8c8da811dea), thanks again Martin for the suggestion. In my opinion, the result is similar in terms of readability/maintainability because the benefit of removing explicit `decode_heap_oop` operations in the ADL file is somewhat negated by the increased complexity of the `write_barrier_post` and `g1_write_barrier_post_c2` signatures. For aarch64, moving non-destructive `decode_heap_oop` operations would probably require passing both source and destination registers to these functions explicitly, which would make them more complex. I feel that this refactoring is only amortized when the new-value decoding operations are more complex, as in your PPC implementation. 
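For readers less familiar with what the `decode_heap_oop` operations discussed here compute: with compressed oops, a 32-bit narrow oop is expanded to a full address as the heap base plus the narrow value shifted left by the object-alignment shift, with a null check so that narrow 0 stays null. The `_not_null` variant skips that check. The following is only a minimal Java model of that arithmetic, assuming a made-up heap base and shift (`HEAP_BASE`, `SHIFT` and the class name `NarrowOopModel` are hypothetical; the real values are chosen by the JVM at startup, and this is not HotSpot code):

```java
// Illustrative model only; HotSpot implements decoding in platform-specific assembly.
public class NarrowOopModel {
    // Hypothetical values, chosen just for this example.
    static final long HEAP_BASE = 0x8_0000_0000L;
    static final int SHIFT = 3; // log2 of an assumed 8-byte object alignment

    // Decoding when the value is known to be non-null: no null check needed.
    static long decodeNotNull(int narrow) {
        return HEAP_BASE + (Integer.toUnsignedLong(narrow) << SHIFT);
    }

    // General decoding: narrow 0 must map to address 0 (null).
    static long decode(int narrow) {
        return narrow == 0 ? 0L : decodeNotNull(narrow);
    }

    // Inverse operation, again preserving null.
    static int encode(long address) {
        return address == 0L ? 0 : (int) ((address - HEAP_BASE) >>> SHIFT);
    }

    public static void main(String[] args) {
        long address = HEAP_BASE + 0x1000;
        int narrow = encode(address);
        System.out.println(decode(narrow) == address); // true: round trip is lossless
        System.out.println(decode(0) == 0L);           // true: null stays null
    }
}
```

The extra branch in `decode` compared to `decodeNotNull` is the kind of work a not-null variant can avoid when the value is already known to be non-null.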
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1732770143 From zgu at openjdk.org Tue Aug 27 13:41:09 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Tue, 27 Aug 2024 13:41:09 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC [v5] In-Reply-To: References: Message-ID: On Mon, 26 Aug 2024 17:20:18 GMT, Albert Mingkun Yang wrote: >> Extend `SplitInfo` to support more fine-grained splitting to mitigate the fragmentation issue during full GC. Added comments and diagrams in the process. >> >> For easier review, it's best to start with `SplitInfo` and then proceed to see how it is constructed in `summarize_split_space` and consumed in `first_src_addr`. The accompanying diagrams should help create a clear mental image. >> >> With this patch, the exec time of `runtime/ClassInitErrors/TestOutOfMemoryDuringInit.java` using Parallel drops from ~30s to ~8s, the same as other GCs, and gc-log shows similar number of GC cycles as well. >> >> Test: tier1-8, systemgc micro bm, CacheStress, dacapo, specjbb2005, specjvm2008 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > assert LGTM ------------- Marked as reviewed by zgu (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20590#pullrequestreview-2263397509 From ayang at openjdk.org Tue Aug 27 15:21:15 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 27 Aug 2024 15:21:15 GMT Subject: RFR: 8338440: Parallel: Improve fragmentation mitigation in Full GC [v5] In-Reply-To: References: Message-ID: On Mon, 26 Aug 2024 17:20:18 GMT, Albert Mingkun Yang wrote: >> Extend `SplitInfo` to support more fine-grained splitting to mitigate the fragmentation issue during full GC. Added comments and diagrams in the process. >> >> For easier review, it's best to start with `SplitInfo` and then proceed to see how it is constructed in `summarize_split_space` and consumed in `first_src_addr`. 
The accompanying diagrams should help create a clear mental image. >> >> With this patch, the exec time of `runtime/ClassInitErrors/TestOutOfMemoryDuringInit.java` using Parallel drops from ~30s to ~8s, the same as other GCs, and gc-log shows similar number of GC cycles as well. >> >> Test: tier1-8, systemgc micro bm, CacheStress, dacapo, specjbb2005, specjvm2008 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > assert Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20590#issuecomment-2312855043 From ayang at openjdk.org Tue Aug 27 15:21:16 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 27 Aug 2024 15:21:16 GMT Subject: Integrated: 8338440: Parallel: Improve fragmentation mitigation in Full GC In-Reply-To: References: Message-ID: On Thu, 15 Aug 2024 06:59:35 GMT, Albert Mingkun Yang wrote: > Extend `SplitInfo` to support more fine-grained splitting to mitigate the fragmentation issue during full GC. Added comments and diagrams in the process. > > For easier review, it's best to start with `SplitInfo` and then proceed to see how it is constructed in `summarize_split_space` and consumed in `first_src_addr`. The accompanying diagrams should help create a clear mental image. > > With this patch, the exec time of `runtime/ClassInitErrors/TestOutOfMemoryDuringInit.java` using Parallel drops from ~30s to ~8s, the same as other GCs, and gc-log shows similar number of GC cycles as well. > > Test: tier1-8, systemgc micro bm, CacheStress, dacapo, specjbb2005, specjvm2008 This pull request has now been integrated. 
Changeset: 1ff5f8d6 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/1ff5f8d65cf6153e517ee7a242d10536eee0d637 Stats: 568 lines in 2 files changed: 209 ins; 145 del; 214 mod 8338440: Parallel: Improve fragmentation mitigation in Full GC Co-authored-by: Guoxiong Li Reviewed-by: iwalulya, zgu, gli ------------- PR: https://git.openjdk.org/jdk/pull/20590 From nprasad at openjdk.org Tue Aug 27 16:48:14 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Tue, 27 Aug 2024 16:48:14 GMT Subject: Integrated: 8336299: Improve GCLocker stall diagnostics In-Reply-To: References: Message-ID: On Mon, 22 Jul 2024 13:52:41 GMT, Neethu Prasad wrote: > **Notes** > Adding logs to get more visibility into how fast a thread resumes from allocation stall. > > **Testing** > * tier 1, tier 2, hotspot_gc tests. > > Example log messages > > 1. Last thread exiting. Performing GC after exiting critical section. Thread "main" 0 locked. > > 2. Thread exiting critical region. Thread "main" 0 locked. > > 3. Thread stalled by JNI critical section. Resumed after 1294ms. Thread "Thread-0". > > 4. Thread blocked to enter critical region. Resumed after 1280ms. Thread "SIGINT handler". This pull request has now been integrated. 
Changeset: 284c3cde Author: Neethu Prasad URL: https://git.openjdk.org/jdk/commit/284c3cde5e1b7115fb17c51f3ed17c1be95845bc Stats: 43 lines in 1 file changed: 27 ins; 0 del; 16 mod 8336299: Improve GCLocker stall diagnostics Reviewed-by: ayang, shade, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/20277 From mdoerr at openjdk.org Tue Aug 27 17:41:13 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 27 Aug 2024 17:41:13 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> Message-ID: On Tue, 27 Aug 2024 12:36:39 GMT, Roberto Casta?eda Lozano wrote: >>> Sure, I agree that g1GetAndSetP and g1GetAndSetN are more consistent with the corresponding Ideal operation names, will rename the instructions in x64 and aarch64 and update the test expectations. >> >> Done (commit daf38d3). >> >> @offamitkumar @feilongjiang @snazarkin please note that the ADL instructions `g1XChgP` and `g1XChgN` have been renamed to `g1GetAndSetP` and `g1GetAndSetN`, and the same naming is expected across all platforms by the test `compiler/gcbarriers/TestG1BarrierGeneration.java` included in this changeset. > >> Regardless if you implement the compressed oops optimization or not, I'd reconsider moving the oop decoding into G1BarrierSetAssembler::g1_write_barrier_post_c2 because it makes the .ad file shorter because you can get rid of the replicated decode_heap_oop. 
> > I tried this refactoring [here](https://github.com/openjdk/jdk/commit/d4e83fd7d77c5415700b33556752e8c8da811dea), thanks again Martin for the suggestion. In my opinion, the result is similar in terms of readability/maintainability because the benefit of removing explicit `decode_heap_oop` operations in the ADL file is somewhat negated by the increased complexity of the `write_barrier_post` and `g1_write_barrier_post_c2` signatures. For aarch64, moving non-destructive `decode_heap_oop` operations would probably require passing both source and destination registers to these functions explicitly, which would make them more complex. I feel that this refactoring is only amortized when the new-value decoding operations are more complex, as in your PPC implementation. Thanks for trying! I like your refactored version. I prefer moving stuff out of the .ad files. The complexity of the barrier code is not significantly higher. Note that you could use `decode_heap_oop_not_null(Register dst, Register src)` with `tmp2` as dst (`generate_post_barrier_fast_path` can deal with new_val == tmp2). That would save a move instruction in some cases. I haven't looked into the aarch64 code. I leave you free to decide. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1733283320 From kbarrett at openjdk.org Wed Aug 28 08:27:25 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 28 Aug 2024 08:27:25 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: Message-ID: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> On Tue, 27 Aug 2024 07:30:46 GMT, Roberto Casta?eda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. 
The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. 
>> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Rename g1XChgX to g1GetAndSetX for consistency with Ideal operation names I've only looked at the changes in gc directories (shared and cpu-specific). src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 160: > 158: * To reduce the number of updates to the remembered set, the post-barrier > 159: * filters out updates to fields in objects located in the Young Generation, the > 160: * same region as the reference, when the null is being written, or if the card s/the null/null/ src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 166: > 164: * post-barrier completely, if it is possible during compile time to prove the > 165: * object is newly allocated and that no safepoint exists between the allocation > 166: * and the store. It might be worth saying explicitly that this is a compile-time version of the above mentioned young generation filter. src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 229: > 227: } > 228: > 229: void refine_barrier_by_new_val_type(Node* n) { This function should probably be `static`. ------------- Changes requested by kbarrett (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2259069811 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1734167614 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1734196887 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1734207820 From kbarrett at openjdk.org Wed Aug 28 08:27:27 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 28 Aug 2024 08:27:27 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v9] In-Reply-To: References: Message-ID: On Mon, 19 Aug 2024 08:53:30 GMT, Roberto Casta?eda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. 
>> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Pass oldval to the pre-barrier in g1CompareAndExchange/SwapP src/hotspot/cpu/aarch64/gc/g1/g1BarrierSetAssembler_aarch64.cpp line 218: > 216: __ cbz(new_val, done); > 217: } > 218: // Storing region crossing non-null, is card already dirty? s/already dirty/young/ src/hotspot/cpu/aarch64/gc/g1/g1BarrierSetAssembler_aarch64.cpp line 280: > 278: > 279: #undef __ > 280: #define __ masm-> These "changes" to `__` are unnecessary and confusing. We have the same define near the top of the file, unconditionally. 
This one is conditional on COMPILER2, but is left in place at the end of the conditional block, affecting following unconditional code.

src/hotspot/share/opto/memnode.cpp line 3468:

> 3466: // Capture an unaliased, unconditional, simple store into an initializer.
> 3467: // Or, if it is independent of the allocation, hoist it above the allocation.
> 3468: if (ReduceFieldZeroing && ReduceInitialCardMarks && /*can_reshape &&*/

It's not obvious to me how this is related to the late barrier changes.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730194278
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730238757
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1730246320

From kbarrett at openjdk.org Wed Aug 28 08:27:28 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 28 Aug 2024 08:27:28 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10]
In-Reply-To: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com>
References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com>
Message-ID:

On Wed, 28 Aug 2024 08:09:44 GMT, Kim Barrett wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Rename g1XChgX to g1GetAndSetX for consistency with Ideal operation names
>
> src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 166:
>
>> 164: * post-barrier completely, if it is possible during compile time to prove the
>> 165: * object is newly allocated and that no safepoint exists between the allocation
>> 166: * and the store.
>
> It might be worth saying explicitly that this is a compile-time version of the above mentioned young
> generation filter.

We can similarly elide the post-barrier if we can prove at compile-time that the value being written is null. That case isn't handled here though.
Instead that's checked for in `refine_barrier_by_new_val_type` and in `get_store_barrier`. I'm not sure why it's structured that way. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1734201007 From ayang at openjdk.org Wed Aug 28 12:57:45 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 28 Aug 2024 12:57:45 GMT Subject: RFR: 8339160: [BACKOUT] JDK-8338440 Parallel: Improve fragmentation mitigation in Full GC Message-ID: Clean revert of the problematic commit to reduce CI noise. ------------- Commit messages: - Revert "8338440: Parallel: Improve fragmentation mitigation in Full GC" Changes: https://git.openjdk.org/jdk/pull/20746/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20746&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339160 Stats: 570 lines in 2 files changed: 147 ins; 211 del; 212 mod Patch: https://git.openjdk.org/jdk/pull/20746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20746/head:pull/20746 PR: https://git.openjdk.org/jdk/pull/20746 From tschatzl at openjdk.org Wed Aug 28 13:24:18 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 28 Aug 2024 13:24:18 GMT Subject: RFR: 8339160: [BACKOUT] JDK-8338440 Parallel: Improve fragmentation mitigation in Full GC In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 12:53:05 GMT, Albert Mingkun Yang wrote: > Clean revert of the problematic commit to reduce CI noise. Lgtm, clean backout, ship it. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20746#pullrequestreview-2266249306 From ayang at openjdk.org Wed Aug 28 13:30:22 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 28 Aug 2024 13:30:22 GMT Subject: RFR: 8339160: [BACKOUT] JDK-8338440 Parallel: Improve fragmentation mitigation in Full GC In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 12:53:05 GMT, Albert Mingkun Yang wrote: > Clean revert of the problematic commit to reduce CI noise. 
Thanks for review. Merging now to reduce CI noise. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20746#issuecomment-2315318535 From ayang at openjdk.org Wed Aug 28 13:30:22 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 28 Aug 2024 13:30:22 GMT Subject: Integrated: 8339160: [BACKOUT] JDK-8338440 Parallel: Improve fragmentation mitigation in Full GC In-Reply-To: References: Message-ID: <4EvNC_azdw6Df4MvrRgLbcII1tRIzCTDIACC4XeOwko=.5fa9f4d0-02f1-4f61-a037-fa499cabdb8b@github.com> On Wed, 28 Aug 2024 12:53:05 GMT, Albert Mingkun Yang wrote: > Clean revert of the problematic commit to reduce CI noise. This pull request has now been integrated. Changeset: 32c97509 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/32c975098521e830ce706b67e7232a007c0846c7 Stats: 570 lines in 2 files changed: 147 ins; 211 del; 212 mod 8339160: [BACKOUT] JDK-8338440 Parallel: Improve fragmentation mitigation in Full GC Reviewed-by: tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/20746 From rcastanedalo at openjdk.org Wed Aug 28 15:49:22 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 28 Aug 2024 15:49:22 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> Message-ID: <6rvTU-KY2DpLx3sK7zmkaZCBaNELr3FLGItGwUJzNUM=.0e75c7a7-4309-49c5-b48b-5aa642bcbe43@github.com> On Tue, 27 
Aug 2024 17:38:28 GMT, Martin Doerr wrote: >>> Regardless if you implement the compressed oops optimization or not, I'd reconsider moving the oop decoding into G1BarrierSetAssembler::g1_write_barrier_post_c2 because it makes the .ad file shorter because you can get rid of the replicated decode_heap_oop. >> >> I tried this refactoring [here](https://github.com/openjdk/jdk/commit/d4e83fd7d77c5415700b33556752e8c8da811dea), thanks again Martin for the suggestion. In my opinion, the result is similar in terms of readability/maintainability because the benefit of removing explicit `decode_heap_oop` operations in the ADL file is somewhat negated by the increased complexity of the `write_barrier_post` and `g1_write_barrier_post_c2` signatures. For aarch64, moving non-destructive `decode_heap_oop` operations would probably require passing both source and destination registers to these functions explicitly, which would make them more complex. I feel that this refactoring is only amortized when the new-value decoding operations are more complex, as in your PPC implementation. > > Thanks for trying! I like your refactored version. I prefer moving stuff out of the .ad files. The complexity of the barrier code is not significantly higher. Note that you could use `decode_heap_oop_not_null(Register dst, Register src)` with `tmp2` as dst (`generate_post_barrier_fast_path` can deal with new_val == tmp2). That would save a move instruction in some cases. > I haven't looked into the aarch64 code. I leave you free to decide. Thanks, I prototyped the refactored version for both x64 and aarch64 [here](https://github.com/robcasloz/jdk/commit/c1ae871eadac0d44981b7892ac8f7b64e8734283). I do not have a strong opinion for or against this refactoring. @albertnetymk @kimbarrett what do you think about it? (asking since you have recently looked at this code). 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1734924686

From lmesnik at openjdk.org Wed Aug 28 20:21:25 2024
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Wed, 28 Aug 2024 20:21:25 GMT
Subject: Integrated: 8258483: [TESTBUG] gtest CollectorPolicy.young_scaled_initial_ergo_vm fails if heap is too small
In-Reply-To: <7Qtt8s9VVWzl6VfmXVPR3GQTCKcQA5Fx-fEgsPLCLk8=.f084eb20-96be-4bde-ac50-84ca2fc54951@github.com>
References: <7Qtt8s9VVWzl6VfmXVPR3GQTCKcQA5Fx-fEgsPLCLk8=.f084eb20-96be-4bde-ac50-84ca2fc54951@github.com>
Message-ID:

On Wed, 21 Aug 2024 00:16:26 GMT, Leonid Mesnik wrote:

> The CollectorPolicy.* tests check the SerialGC policy. They might fail if MaxHeapSize is too small.
>
> If the heap is not large enough, the VM changes its ergonomic scheme and prints a warning about this. The tests do not check this case.
> GC ergonomics have many different cases and only the main workflow is covered. The goal of the fix is not to improve the tests but to have them pass in a reasonable environment or silently pass otherwise.
>
> I have updated the tests so that they pass if the heap is at least 128M (since it is SerialGC, it seems reasonable for testing in smaller containers) and are skipped otherwise.
>
> Testing: tier1, running these tests manually with 90/128/256M to check that they pass in such environments.

This pull request has now been integrated.
Changeset: d08b5bd9
Author: Leonid Mesnik
URL: https://git.openjdk.org/jdk/commit/d08b5bd9f5f740d75c1acfbd644ce1c822e03833
Stats: 12 lines in 1 file changed: 9 ins; 0 del; 3 mod

8258483: [TESTBUG] gtest CollectorPolicy.young_scaled_initial_ergo_vm fails if heap is too small

Reviewed-by: ayang

-------------

PR: https://git.openjdk.org/jdk/pull/20656

From kbarrett at openjdk.org Thu Aug 29 06:00:18 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Thu, 29 Aug 2024 06:00:18 GMT
Subject: RFR: 8311163: Parallel: Improve large object handling during evacuation
In-Reply-To:
References:
Message-ID: <-UurDpPgJ8q7_zcyk1o8X9PUo4rSxToiYFrtO2b0ugM=.821c2ccd-7edb-41ed-a1bc-0e4cfdd50b65@github.com>

On Tue, 27 Aug 2024 10:40:59 GMT, Albert Mingkun Yang wrote:

> Using `-Xmx3g -XX:+UseParallelGC -XX:ParallelGCThreads=6 DelayInducer.java`, I can see some regression (~10%) in avg gc-pause time on my box (nproc=12). Have you seen sth similar?

No, I'm not seeing anything like that. Indeed, I'm seeing strongly the opposite. Note that my machine has nproc=32, but the results were similar with -XX:ActiveProcessorCount=12.

old
[1.966s][info][gc] GC(0) Pause Young (Allocation Failure) 504M->387M(1930M) 744.238ms
[2.446s][info][gc] GC(1) Pause Young (Allocation Failure) 891M->882M(2283M) 236.505ms
[7.733s][info][gc] GC(2) Pause Young (Allocation Failure) 1686M->1669M(2668M) 1678.807ms
[9.423s][info][gc] GC(3) Pause Full (Allocation Failure) 2523M->89M(2421M) 45.843ms

new
[1.694s][info][gc] GC(0) Pause Young (Allocation Failure) 504M->397M(1930M) 469.406ms
[2.140s][info][gc] GC(1) Pause Young (Allocation Failure) 901M->892M(2283M) 224.942ms
[6.533s][info][gc] GC(2) Pause Young (Allocation Failure) 1682M->1669M(2668M) 761.521ms
[8.253s][info][gc] GC(3) Pause Full (Allocation Failure) 2509M->89M(2408M) 47.130ms

These results are pretty stable across 10-12 runs. Regardless of which variant is used, the number and types of GCs is stable, as are the collection amounts.
And for a given variant the times are fairly stable. For GC(0), the new variant is consistently faster, though the fastest old time and the slowest new time are close or overlap. For GC(1), the two variants are pretty similar. For GC(2), the new variant is consistently more than a factor of 2 faster, and the average difference is closer to 2.5x.

Digging a bit deeper, here are sample times for the Scavenge phase.

old
[7.849s][debug][gc,phases ] GC(2) Scavenge 1928.903ms

new
[6.511s][debug][gc,phases ] GC(2) Scavenge 774.372ms

I haven't found a way to get more detailed timing information there. In particular, I thought there was some logging in the TaskTerminator, but don't see any now.

I rebuilt the variants with -DTASKQUEUE_STATS and ran with gc+task+stats=trace logging, and here's a sample output:

[6.610s][trace][gc,task,stats] GC(2) thr       push      steal    chunked     chunks
[6.610s][trace][gc,task,stats] GC(2) --- ---------- ---------- ---------- ----------
[6.610s][trace][gc,task,stats] GC(2)   0     306340          0         19     306335
[6.610s][trace][gc,task,stats] GC(2)   1     300040          2          7     300041
[6.610s][trace][gc,task,stats] GC(2)   2     303588          1          0     303589
[6.610s][trace][gc,task,stats] GC(2)   3     334770          3          0     334771
[6.610s][trace][gc,task,stats] GC(2)   4     332980          3          0     332981
[6.610s][trace][gc,task,stats] GC(2)   5     436049          1          0     436050

new
[6.657s][trace][gc,task,stats] GC(2)     ----partial array----     arrays      array
[6.657s][trace][gc,task,stats] GC(2) thr       push      steal    chunked     chunks
[6.658s][trace][gc,task,stats] GC(2) --- ---------- ---------- ---------- ----------
[6.658s][trace][gc,task,stats] GC(2)   0     315502          2         18     315503
[6.658s][trace][gc,task,stats] GC(2)   1     314152          1          0     314153
[6.658s][trace][gc,task,stats] GC(2)   2     321092          2          7     321093
[6.658s][trace][gc,task,stats] GC(2)   3     406893          4          0     406893
[6.658s][trace][gc,task,stats] GC(2)   4     326710          1          0     326707
[6.658s][trace][gc,task,stats] GC(2)   5     329418          1          1     329418

The number of partial array steals is about the same. So I don't yet know why the change is so beneficial for GC(2).
------------- PR Comment: https://git.openjdk.org/jdk/pull/20720#issuecomment-2316773823 From ayang at openjdk.org Thu Aug 29 08:40:21 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 29 Aug 2024 08:40:21 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: <6rvTU-KY2DpLx3sK7zmkaZCBaNELr3FLGItGwUJzNUM=.0e75c7a7-4309-49c5-b48b-5aa642bcbe43@github.com> References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> <6rvTU-KY2DpLx3sK7zmkaZCBaNELr3FLGItGwUJzNUM=.0e75c7a7-4309-49c5-b48b-5aa642bcbe43@github.com> Message-ID: On Wed, 28 Aug 2024 15:46:57 GMT, Roberto Castañeda Lozano wrote: >> Thanks for trying! I like your refactored version. I prefer moving stuff out of the .ad files. The complexity of the barrier code is not significantly higher. Note that you could use `decode_heap_oop_not_null(Register dst, Register src)` with `tmp2` as dst (`generate_post_barrier_fast_path` can deal with new_val == tmp2). That would save a move instruction in some cases. >> I haven't looked into the aarch64 code. I leave you free to decide. > > Thanks, I prototyped the refactored version for both x64 and aarch64 [here](https://github.com/robcasloz/jdk/commit/c1ae871eadac0d44981b7892ac8f7b64e8734283). I do not have a strong opinion for or against this refactoring. @albertnetymk @kimbarrett what do you think about it? (asking since you have recently looked at this code).
I find the use of default-arg-value `bool decode_new_val = false` a bit confusing. (I tend to think default-arg-value makes the code less readable in general.) If not using default-arg-value, I suspect the diff will be larger, and I don't see the immediate benefit of this refactoring. Maybe this can be deferred to its own PR if it's really desirable? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1735806805 From rcastanedalo at openjdk.org Thu Aug 29 09:11:23 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 29 Aug 2024 09:11:23 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> <6rvTU-KY2DpLx3sK7zmkaZCBaNELr3FLGItGwUJzNUM=.0e75c7a7-4309-49c5-b48b-5aa642bcbe43@github.com> Message-ID: On Thu, 29 Aug 2024 08:37:24 GMT, Albert Mingkun Yang wrote: >> Thanks, I prototyped the refactored version for both x64 and aarch64 [here](https://github.com/robcasloz/jdk/commit/c1ae871eadac0d44981b7892ac8f7b64e8734283). I do not have a strong opinion for or against this refactoring. @albertnetymk @kimbarrett what do you think about it? (asking since you have recently looked at this code). > > I find the use of default-arg-value `bool decode_new_val = false` a bit confusing. (I tend to think default-arg-value makes the code less readable in general.)
> > If not using default-arg-value, I suspect the diff will be larger, and I don't see the immediate benefit of this refactoring. Maybe this can be deferred to its own PR if it's really desirable? Thanks for looking at it, Albert! Since there is no clear consensus, let's postpone the refactoring. We can come back to it after the JEP is integrated if there is renewed interest. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1735852561 From stefank at openjdk.org Thu Aug 29 09:31:21 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 29 Aug 2024 09:31:21 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. 
Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding src/hotspot/share/cds/archiveHeapWriter.cpp line 214: > 212: oopDesc::set_mark(mem, markWord::prototype()); > 213: oopDesc::release_set_klass(mem, k); > 214: } The `UseCompactObjectHeaders` path calls `get_requested_narrow_klass`, while the `else` part directly uses `k`. Is one of these paths incorrect? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1735881613 From tschatzl at openjdk.org Thu Aug 29 09:45:19 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 29 Aug 2024 09:45:19 GMT Subject: RFR: 8311163: Parallel: Improve large object handling during evacuation In-Reply-To: References: Message-ID: <3HMcFqPLOklSkTRx7DuDf0x6pAZGqA_fQQ2BIn-vzls=.267af0ac-48f1-40a7-a3f0-a5d8ad4efcbb@github.com> On Tue, 27 Aug 2024 00:16:10 GMT, Kim Barrett wrote: > Please review this change to ParallelGC young generation collection's handling > of large objArrays, to now use the infrastructure provided by JDK-8253237 and > JDK-8337709. (That's the same infrastructure used by G1 young/mixed > collections.) > > Testing: mach5 tier1-5 I did some runs with 1, 2, 4, 6, 12 and 25 parallel workers and I can't see regressions either; the changes seem to be better always. The only difference from your runs is that with `-Xmx3g` there are in total four young collections every time. The last one is the one showing the 2x+ speedup. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20720#issuecomment-2317176794 From tschatzl at openjdk.org Thu Aug 29 10:27:18 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 29 Aug 2024 10:27:18 GMT Subject: RFR: 8311163: Parallel: Improve large object handling during evacuation In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 00:16:10 GMT, Kim Barrett wrote: > Please review this change to ParallelGC young generation collection's handling > of large objArrays, to now use the infrastructure provided by JDK-8253237 and > JDK-8337709. (That's the same infrastructure used by G1 young/mixed > collections.) > > Testing: mach5 tier1-5 The old code seems to only ever push a single continuation task, while the new code pushes multiple (after the first one), and tries to keep them at `ParallelGCThreads`. 
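
To illustrate the difference described above, here is a minimal sketch of the two push policies. It is a toy model and not HotSpot code: `simulate`, `Result`, and the one-task-per-chunk accounting are names and simplifications assumed purely for illustration, and the real stepper logic may well differ. With a task limit of 1 only a single continuation ever sits in the queue; with a limit equal to the worker count the queue is topped up so that other workers have something to steal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Toy model, not HotSpot code. One "task" stands for "process one more chunk".
struct Result {
  std::size_t processed;    // chunks processed in total
  std::size_t peak_queued;  // most tasks ever waiting in the queue at once
};

// Simulates processing an array of `chunks` chunks while keeping at most
// `task_limit` follow-up tasks queued (hypothetical policy parameter).
Result simulate(std::size_t chunks, std::size_t task_limit) {
  assert(task_limit > 0);
  if (chunks == 0) return {0, 0};

  std::size_t claimed = 1;  // the discovering worker handles the first chunk
  std::size_t queued = 0;
  std::size_t peak = 0;

  // Top the queue up to the limit, but never beyond the chunks still unclaimed.
  auto refill = [&]() {
    std::size_t want = std::min(task_limit, chunks - claimed);
    queued = std::max(queued, want);
    peak = std::max(peak, queued);
  };

  refill();
  while (queued > 0) {
    --queued;   // some worker pops or steals a task ...
    ++claimed;  // ... and claims the next chunk
    refill();
  }
  return {claimed, peak};
}
```

In this model `simulate(10, 1)` never has more than one task queued, while `simulate(10, 6)` reaches six; both process all ten chunks.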
------------- PR Comment: https://git.openjdk.org/jdk/pull/20720#issuecomment-2317258824 From ayang at openjdk.org Thu Aug 29 11:03:20 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 29 Aug 2024 11:03:20 GMT Subject: RFR: 8311163: Parallel: Improve large object handling during evacuation In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 00:16:10 GMT, Kim Barrett wrote: > Please review this change to ParallelGC young generation collection's handling > of large objArrays, to now use the infrastructure provided by JDK-8253237 and > JDK-8337709. (That's the same infrastructure used by G1 young/mixed > collections.) > > Testing: mach5 tier1-5

Marked as reviewed by ayang (Reviewer).

Using another box (AMD), the improvement becomes clear.

## baseline
[0.003s][info][gc] Using Parallel
[1.464s][info][gc] GC(0) Pause Young (Allocation Failure) 512M->344M(1963M) 586.065ms
[2.163s][info][gc] GC(1) Pause Young (Allocation Failure) 857M->853M(2304M) 298.110ms
[8.208s][info][gc] GC(2) Pause Young (Allocation Failure) 1707M->1669M(2689M) 2986.103ms
[9.941s][info][gc] GC(3) Pause Full (Allocation Failure) 2516M->91M(2485M) 38.478ms

## new
[0.002s][info][gc] Using Parallel
[1.325s][info][gc] GC(0) Pause Young (Allocation Failure) 512M->355M(1963M) 415.916ms
[1.791s][info][gc] GC(1) Pause Young (Allocation Failure) 867M->858M(2304M) 212.690ms
[5.663s][info][gc] GC(2) Pause Young (Allocation Failure) 1700M->1669M(2689M) 821.355ms
[7.088s][info][gc] GC(3) Pause Full (Allocation Failure) 2510M->91M(2475M) 32.170ms

> So I don't yet know why the change is so beneficial for GC(2).

With `-Xlog:gc*=debug`, I can see an expansion during GC(2) -- I guess because the expansion operation is synchronous, not having any chunks in the task-queue essentially blocks other workers.

src/hotspot/share/gc/parallel/psPromotionManager.cpp line 323:

> 321: }
> 322:
> 323: void PSPromotionManager::push_objArray(oop old_obj, oop new_obj, size_t obj_size) {

`obj_size` seems unused.
------------- PR Review: https://git.openjdk.org/jdk/pull/20720#pullrequestreview-2268387400 PR Comment: https://git.openjdk.org/jdk/pull/20720#issuecomment-2317326172 PR Review Comment: https://git.openjdk.org/jdk/pull/20720#discussion_r1735998641 From stuefe at openjdk.org Thu Aug 29 11:40:25 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 29 Aug 2024 11:40:25 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding src/hotspot/share/cds/archiveHeapWriter.hpp line 261: > 259: // at mapping start, these 4G are enough. Therefore, we don't need to shift at all (shift=0). 
> 260: static constexpr int precomputed_narrow_klass_shift = 0; > 261: Reviewer Note: move to ArchiveBuilder ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1736042302 From tschatzl at openjdk.org Thu Aug 29 12:12:20 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 29 Aug 2024 12:12:20 GMT Subject: RFR: 8311163: Parallel: Improve large object handling during evacuation In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 00:16:10 GMT, Kim Barrett wrote: > Please review this change to ParallelGC young generation collection's handling > of large objArrays, to now use the infrastructure provided by JDK-8253237 and > JDK-8337709. (That's the same infrastructure used by G1 young/mixed > collections.) > > Testing: mach5 tier1-5 lgtm ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20720#pullrequestreview-2268535455 From zgu at openjdk.org Thu Aug 29 13:50:51 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Thu, 29 Aug 2024 13:50:51 GMT Subject: RFR: 8339097: Parallel: Compact GC to split array early for task stealing Message-ID: Parallel Compact GC splits a large array into stripes during marking, so that other workers can steal the work. Currently, it only splits a large array into two tasks: it retains one and pushes the remainder into a task queue for task stealing, and depends on the next worker to further split the array if possible, which creates an artificial dependency. I would like to propose having the first worker break up the array, to eliminate the dependency.
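
The two splitting strategies in the description above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PSCompactionManager code: `Stripe`, `LazySplit`, `split_one`, and `split_all` are hypothetical names, and the stride is passed in as a plain parameter rather than read from a VM flag.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open element range [first, second) of the array. Hypothetical type.
using Stripe = std::pair<std::size_t, std::size_t>;

// Lazy variant: carve one stripe off the front and report where the
// remainder starts. The remainder would be pushed as a single task, so
// further splitting only happens once some worker picks that task up.
struct LazySplit {
  Stripe stripe;
  bool has_remainder;
  std::size_t remainder_start;
};

LazySplit split_one(std::size_t len, std::size_t stride, std::size_t beg) {
  assert(stride > 0 && beg <= len);
  std::size_t end = (len - beg < stride) ? len : beg + stride;
  return {{beg, end}, end < len, end};
}

// Eager variant: the first worker produces every stripe in one pass, so
// all of them are available for stealing immediately.
std::vector<Stripe> split_all(std::size_t len, std::size_t stride) {
  assert(stride > 0);
  std::vector<Stripe> stripes;
  for (std::size_t beg = 0; beg < len;) {
    LazySplit s = split_one(len, stride, beg);
    stripes.push_back(s.stripe);
    beg = s.remainder_start;
  }
  return stripes;
}
```

For a 10-element array and a stride of 4, `split_all` yields three stripes up front, whereas `split_one` yields the first stripe plus a remainder starting at index 4. The eager variant trades a larger number of queued tasks for the removal of the chain of dependent splits.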
------------- Commit messages: - 8339097: Parallel: Compact GC to split array early for task stealing Changes: https://git.openjdk.org/jdk/pull/20745/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20745&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339097 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20745.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20745/head:pull/20745 PR: https://git.openjdk.org/jdk/pull/20745 From stuefe at openjdk.org Thu Aug 29 14:23:20 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 29 Aug 2024 14:23:20 GMT Subject: RFR: 8339097: Parallel: Compact GC to split array early for task stealing In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 12:49:46 GMT, Zhengyu Gu wrote: > Parallel Compact GC splits a large array in stripes during marking, so that other workers can steal the work. > > Currently, it only splits a large array into two tasks, retains and pushes remaining into a task queue for task stealing, depends on next worker to further split the array if possible, that creates artificial dependency. > > I would like purpose to have the first worker breaking up the array, to eliminate the dependency. I am no expert, but I thought we do it this way to not artificially blow up the task queue for large arrays and many stripes? I may be wrong. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20745#issuecomment-2317852117 From zgu at openjdk.org Thu Aug 29 14:41:21 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Thu, 29 Aug 2024 14:41:21 GMT Subject: RFR: 8339097: Parallel: Compact GC to split array early for task stealing In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 14:20:38 GMT, Thomas Stuefe wrote: > I am no expert, but I thought we do it this way to not artificially blow up the task queue for large arrays and many stripes? I may be wrong. 
Ran many tests/benchmarks with task queue stats reporting on, almost never overflow, even for marking stack. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20745#issuecomment-2317909788 From kbarrett at openjdk.org Thu Aug 29 19:09:19 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 29 Aug 2024 19:09:19 GMT Subject: RFR: 8339097: Parallel: Compact GC to split array early for task stealing In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 14:38:34 GMT, Zhengyu Gu wrote: > > I am no expert, but I thought we do it this way to not artificially blow up the task queue for large arrays and many stripes? I may be wrong. > > Ran many tests/benchmarks with task queue stats reporting on, almost never overflow, even for marking stack. Also see comment [here](https://github.com/openjdk/jdk/pull/20720#issuecomment-2317258824), similar idea. You've ignored part of that comment - "... tries to keep them at ParallelGCThreads" That's the purpose of PartialArrayTaskStepper, along with the more recent PartialArrayState (to allow the partial array tasks to be in the same queue as the ordinary oop/narrowOop tasks). Please don't make this change. The plan is to change this part of ParallelGC to use that new infrastructure, eliminating the need for ObjArrayTask and the separate taskqueue for them. That will simplify getting tasks, stealing tasks, and termination. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20745#issuecomment-2318656274 From kbarrett at openjdk.org Thu Aug 29 19:09:19 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 29 Aug 2024 19:09:19 GMT Subject: RFR: 8339097: Parallel: Compact GC to split array early for task stealing In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 12:49:46 GMT, Zhengyu Gu wrote: > Parallel Compact GC splits a large array in stripes during marking, so that other workers can steal the work. 
> > Currently, it only splits a large array into two tasks, retains and pushes remaining into a task queue for task stealing, depends on next worker to further split the array if possible, that creates artificial dependency. > > I would like purpose to have the first worker breaking up the array, to eliminate the dependency. Changes requested by kbarrett (Reviewer). src/hotspot/share/gc/parallel/psCompactionManager.inline.hpp line 129: > 127: while (end_index < len) { > 128: cm->push_objarray(obj, end_index); > 129: end_index += (size_t)ObjArrayMarkingStride; Don't make this change. PSCM should instead be changed to use PartialArrayTaskStepper and PartialArrayState, with associated elimination of the separate ObjArrayTask queue. ------------- PR Review: https://git.openjdk.org/jdk/pull/20745#pullrequestreview-2269876411 PR Review Comment: https://git.openjdk.org/jdk/pull/20745#discussion_r1736955852 From zgu at openjdk.org Thu Aug 29 19:39:22 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Thu, 29 Aug 2024 19:39:22 GMT Subject: RFR: 8339097: Parallel: Compact GC to split array early for task stealing In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 19:07:01 GMT, Kim Barrett wrote: >> Parallel Compact GC splits a large array in stripes during marking, so that other workers can steal the work. >> >> Currently, it only splits a large array into two tasks, retains and pushes remaining into a task queue for task stealing, depends on next worker to further split the array if possible, that creates artificial dependency. >> >> I would like purpose to have the first worker breaking up the array, to eliminate the dependency. > > src/hotspot/share/gc/parallel/psCompactionManager.inline.hpp line 129: > >> 127: while (end_index < len) { >> 128: cm->push_objarray(obj, end_index); >> 129: end_index += (size_t)ObjArrayMarkingStride; > > Don't make this change. 
PSCM should instead be changed to use PartialArrayTaskStepper and > PartialArrayState, with associated elimination of the separate ObjArrayTask queue. Okay, eliminating `ObjArrayTask` queue sounds great. I wonder how hard to backport to 17u, because I am really targeting the change to 17u. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20745#discussion_r1737039810 From zgu at openjdk.org Thu Aug 29 19:39:22 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Thu, 29 Aug 2024 19:39:22 GMT Subject: Withdrawn: 8339097: Parallel: Compact GC to split array early for task stealing In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 12:49:46 GMT, Zhengyu Gu wrote: > Parallel Compact GC splits a large array in stripes during marking, so that other workers can steal the work. > > Currently, it only splits a large array into two tasks, retains and pushes remaining into a task queue for task stealing, depends on next worker to further split the array if possible, that creates artificial dependency. > > I would like purpose to have the first worker breaking up the array, to eliminate the dependency. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/20745 From iklam at openjdk.org Thu Aug 29 22:35:22 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 29 Aug 2024 22:35:22 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: <3M4XT4BaiowyWjJhSoFBREh9e-Be2B6L4tHVAXKw5VQ=.7647e788-8d7d-4e05-91f3-509c6fbd0d3c@github.com> On Thu, 29 Aug 2024 09:28:50 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/cds/archiveHeapWriter.cpp line 214: > >> 212: oopDesc::set_mark(mem, markWord::prototype()); >> 213: oopDesc::release_set_klass(mem, k); >> 214: } > > The `UseCompactObjectHeaders` path calls `get_requested_narrow_klass`, while the `else` part directly uses `k`. Is one of these paths incorrect? This seems odd. The original code sets `Universe::objectArrayKlass()` into the object header. This is the value of this class in the current JVM lifetime. Later, `ArchiveHeapWriter::update_header_for_requested_obj()` would change the object's klass to the "requested" address. I.e., where this class will be loaded in a future JVM lifetime when the CDS archive is loaded into memory. It seems the same logic should be used in the `UseCompactObjectHeaders==true` case. BTW (unrelated to this PR) the comment a few lines up is outdated and wrong: Klass* k = Universe::objectArrayKlass(); // already relocated to point to archived klass `k` is the value of the *actual* location of this class in the current JVM lifetime. Please ignore this comment when trying to understand this function. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1737294872 From stefank at openjdk.org Fri Aug 30 07:30:22 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 30 Aug 2024 07:30:22 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 17:30:14 GMT, Albert Mingkun Yang wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove hashcode leftovers from SA > > src/hotspot/share/gc/serial/serialArguments.cpp line 33: > >> 31: void SerialArguments::initialize_heap_flags_and_sizes() { >> 32: GenArguments::initialize_heap_flags_and_sizes(); >> 33: GCForwarding::initialize_flags(MaxNewSize + MaxOldSize); > > Can one use `MaxHeapSize` here? Good catch. This is actually a bug that is causing the CDS tests to fail. The used variables have not yet been initialized at this point. I tried making the suggested change and that fixed at least one of the CDS failures. 
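
Why the size passed to `GCForwarding::initialize_flags` matters can be shown with a little arithmetic. The following is a minimal sketch, assuming the 40-bit figure from the PR description and 8-byte heap words on a 64-bit VM; the names are hypothetical and the actual check in gcForwarding.cpp is written differently.

```cpp
#include <cassert>
#include <cstdint>

// Assumptions for illustration only: 40 usable bits (figure taken from the
// PR description) and 8-byte heap words. All names here are hypothetical.
constexpr unsigned kForwardingBits = 40;
constexpr std::uint64_t kHeapWordBytes = 8;

// Largest heap, in bytes, whose word offsets fit into `bits` bits.
std::uint64_t max_encodable_heap_bytes_for(unsigned bits) {
  assert(bits < 61);  // keep the multiplication below within 64 bits
  return (std::uint64_t{1} << bits) * kHeapWordBytes;
}

std::uint64_t max_encodable_heap_bytes() {
  return max_encodable_heap_bytes_for(kForwardingBits);
}

// True if a heap of the given maximum size can use the compact encoding.
bool heap_fits(std::uint64_t max_heap_bytes) {
  return max_heap_bytes <= max_encodable_heap_bytes();
}
```

With these assumptions the limit is 2^40 words times 8 bytes, i.e. 2^43 bytes or 8 TB, which agrees with the figure in the PR description. A 3 GB heap passes the check; a size argument that has not been initialized yet and is still very large fails it even though the real heap is small.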
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1738101667 From stefank at openjdk.org Fri Aug 30 07:30:23 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 30 Aug 2024 07:30:23 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> References: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> Message-ID: On Thu, 22 Aug 2024 19:36:00 GMT, Albert Mingkun Yang wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix hash shift for 32 bit builds > > src/hotspot/share/gc/shared/gcForwarding.cpp line 37: > >> 35: size_t max_narrow_heap_size = right_n_bits(NumLowBitsNarrow - Shift); >> 36: if (UseCompactObjectHeaders && max_heap_size > max_narrow_heap_size * HeapWordSize) { >> 37: FLAG_SET_DEFAULT(UseCompactObjectHeaders, false); > > Maybe a log-info/warning would be nice. Yes. This silent setting of UseCompactObjectHeaders ended up hiding why we got CDS failures. I would also suggest that we change this to FLAG_SET_ERGO. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1738104783 From stuefe at openjdk.org Fri Aug 30 07:40:23 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 30 Aug 2024 07:40:23 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 16:23:19 GMT, Leonid Mesnik wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > make/Images.gmk line 135: > >> 133: # >> 134: # Param1 - VM variant (e.g., server, client, zero, ...) 
>> 135: # Param2 - _nocoops, _coh, _nocoops_coh, or empty > > The -XX:+UseCompactObjectHeaders flag seems to be incompatible with the zero vm. The zero vm build starts failing while generating the shared archive with +UseCompactObjectHeaders. Generation should be disabled by default for zero so as not to break the build. No, zero works with +COH, but a small change is needed. I'll post a suggestion inline. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1738119614 From stuefe at openjdk.org Fri Aug 30 07:45:19 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 30 Aug 2024 07:45:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 07:25:54 GMT, Stefan Karlsson wrote: >> src/hotspot/share/gc/serial/serialArguments.cpp line 33: >> >>> 31: void SerialArguments::initialize_heap_flags_and_sizes() { >>> 32: GenArguments::initialize_heap_flags_and_sizes(); >>> 33: GCForwarding::initialize_flags(MaxNewSize + MaxOldSize); >> >> Can one use `MaxHeapSize` here? > > Good catch. This is actually a bug that is causing the CDS tests to fail. The used variables have not yet been initialized at this point. I tried making the suggested change and that fixed at least one of the CDS failures. Yes, one must, since MaxNewSize and MaxOldSize are still at their initial values and so are way too large to allow the GC forwarding; CompactObjectHeaders therefore gets automatically disabled for SerialGC. That explains a bunch of the problems @lmesnik saw.
This fixes SerialGC for me: Suggestion: GCForwarding::initialize_flags(MaxHeapSize); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1738123826 From stuefe at openjdk.org Fri Aug 30 07:45:20 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 30 Aug 2024 07:45:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: References: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> Message-ID: <3QGPH52NyrDPne5EgoGx2sx9OeGRu9K72onNNwzMr2M=.8a390b3d-2e8a-470e-8bb7-1ba975070c53@github.com> On Fri, 30 Aug 2024 07:27:45 GMT, Stefan Karlsson wrote: >> src/hotspot/share/gc/shared/gcForwarding.cpp line 37: >> >>> 35: size_t max_narrow_heap_size = right_n_bits(NumLowBitsNarrow - Shift); >>> 36: if (UseCompactObjectHeaders && max_heap_size > max_narrow_heap_size * HeapWordSize) { >>> 37: FLAG_SET_DEFAULT(UseCompactObjectHeaders, false); >> >> Maybe a log-info/warning would be nice. > > Yes. This silent setting of UseCompactObjectHeaders ended up hiding why we got CDS failures. I would also suggest that we change this to FLAG_SET_ERGO. Seems we run all into the same thoughts :) I added Suggestion: FLAG_SET_DEFAULT(UseCompactObjectHeaders, false); warning("Compact object headers require a java heap size smaller than %zu (given: %zu). " "Disabling compact object headers.", max_narrow_heap_size * HeapWordSize, max_heap_size); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1738127194 From stefank at openjdk.org Fri Aug 30 08:10:21 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 30 Aug 2024 08:10:21 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). 
>> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). 
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding src/hotspot/share/cds/filemap.cpp line 2507: > 2505: } > 2506: > 2507: if (compact_headers() != UseCompactObjectHeaders) { (Commenting here, but the comment applies to code a bit above) While debugging CDS, it would have been useful to print the value of UseCompactObjectHeaders. Could we change the code to be: log_info(cds)("Archive was created with UseCompressedOops = %d, UseCompressedClassPointers = %d, UseCompactObjectHeaders = %d", compressed_oops(), compressed_class_pointers(), compact_headers()); src/hotspot/share/cds/filemap.cpp line 2508: > 2506: > 2507: if (compact_headers() != UseCompactObjectHeaders) { > 2508: log_info(cds)("The shared archive file's UseCompactObjectHeaders setting (%s)" Printing on the `info` level mimics what we do when there's a mismatch for compressed classes (and oops), but I wonder if that one is intentional or if it is accidentally printing to 'info' instead of 'warning'. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1738164792 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1738166832 From rcastanedalo at openjdk.org Fri Aug 30 08:22:43 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 30 Aug 2024 08:22:43 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v11] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. 
>
> #### ADL Changes
>
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>
> #### `G1BarrierSetAssembler` Changes
>
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request incrementally with six additional commits since the last revision:

 - Add test to motivate compile-time null checks in 'refine_barrier_by_new_val_type'
 - Remark relation between compiler optimization and barrier filter
 - Make 'refine_barrier_by_new_val_type' static and its input argument 'const'
 - Replace 'the null' with 'null' in comment
 - Remove redundant redefinitions of '__'
 - Replace 'already dirty' with 'young' in post-barrier fast path comment

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/19746/files
 - new: https://git.openjdk.org/jdk/pull/19746/files/daf38d3f..57adcfb0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=09-10

Stats: 39 lines in 4 files changed: 27 ins; 6 del; 6 mod
Patch: https://git.openjdk.org/jdk/pull/19746.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org Fri Aug 30 08:22:44 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 30 Aug 2024 08:22:44 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v9]
In-Reply-To:
References:
Message-ID:

On Sun, 25 Aug 2024 01:53:30 GMT, Kim Barrett wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Pass oldval to the pre-barrier in g1CompareAndExchange/SwapP
>
> src/hotspot/cpu/aarch64/gc/g1/g1BarrierSetAssembler_aarch64.cpp line 218:
>
>> 216:     __ cbz(new_val, done);
>> 217:   }
>> 218:   // Storing region crossing non-null, is card already dirty?
>
> s/already dirty/young/

Done (commit [70c2771](https://github.com/openjdk/jdk/pull/19746/commits/70c2771818834a74a12f8a61de3c77bb69e3e531)), thanks.

> src/hotspot/cpu/aarch64/gc/g1/g1BarrierSetAssembler_aarch64.cpp line 280:
>
>> 278:
>> 279: #undef __
>> 280: #define __ masm->
>
> These "changes" to `__` are unnecessary and confusing. We have the same define near the top of
> the file, unconditionally. This one is conditional on COMPILER2, but is left in place at the end of the
> conditional block, affecting following unconditional code.

Removed now (commit [2dc688b](https://github.com/openjdk/jdk/pull/19746/commits/2dc688baf2a8f446c7579fafce7eab3a953e623a)), thanks.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1738181093
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1738182128

From rcastanedalo at openjdk.org Fri Aug 30 08:22:44 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 30 Aug 2024 08:22:44 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10]
In-Reply-To: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com>
References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com>
Message-ID:

On Wed, 28 Aug 2024 07:50:11 GMT, Kim Barrett wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Rename g1XChgX to g1GetAndSetX for consistency with Ideal operation names
>
> src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 160:
>
>> 158:  * To reduce the number of updates to the remembered set, the post-barrier
>> 159:  * filters out updates to fields in objects located in the Young Generation, the
>> 160:  * same region as the reference, when the null is being written, or if the card
>
> s/the null/null/

Done (commit [d1a2349](https://github.com/openjdk/jdk/pull/19746/commits/d1a2349068194ee598cec2b6afe7aa972781b491)), thanks.
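
The filters named in the quoted source comment can be sketched roughly as below. This is a standalone illustration, not the code in `G1BarrierSetAssembler`. The quoted comment is cut off after "or if the card", so only the three filters that are fully visible are modelled. `kLogRegionBytes`, `same_region` and `needs_post_barrier` are hypothetical names, the region size is an assumed value (real G1 picks it at startup), and the order of the checks is an assumption.

```cpp
#include <cstdint>

// Hypothetical log2 of the heap region size (2 MiB here).
constexpr unsigned kLogRegionBytes = 21;

// Two addresses lie in the same region if they agree in all bits above the
// region-size bits. This relies on regions being aligned to their size.
constexpr bool same_region(std::uintptr_t field_addr, std::uintptr_t new_val) {
  return ((field_addr ^ new_val) >> kLogRegionBytes) == 0;
}

// Returns true if the store still needs remembered-set work after the filters.
// field_in_young stands in for the card-table test that finds a young card.
constexpr bool needs_post_barrier(std::uintptr_t field_addr,
                                  std::uintptr_t new_val,
                                  bool field_in_young) {
  if (same_region(field_addr, new_val)) return false;  // no cross-region reference
  if (new_val == 0) return false;                      // storing null
  if (field_in_young) return false;                    // field is in the young generation
  return true;
}
```

For example, with the assumed 2 MiB regions, a store at address `0x200000` of a reference to `0x400000` crosses a region boundary and needs the barrier unless the field itself is in a young region.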
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1738183062 From rcastanedalo at openjdk.org Fri Aug 30 08:22:44 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 30 Aug 2024 08:22:44 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> Message-ID: On Wed, 28 Aug 2024 08:12:36 GMT, Kim Barrett wrote: >> src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 166: >> >>> 164: * post-barrier completely, if it is possible during compile time to prove the >>> 165: * object is newly allocated and that no safepoint exists between the allocation >>> 166: * and the store. >> >> It might be worth saying explicitly that this is a compile-time version of the above mentioned young >> generation filter. > > We can similarly elide the post-barrier if we can prove at compile-time that the value being written > is null. That case isn't handled here though. Instead that's checked for in > `refine_barrier_by_new_val_type` and in `get_store_barrier`. I'm not sure why it's structured > that way. > It might be worth saying explicitly that this is a compile-time version of the above mentioned young generation filter. Done (commit [72a04c4](https://github.com/openjdk/jdk/pull/19746/commits/72a04c4e8046256ee7e811d66934d5d9e24f4c7c)), thanks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1738184612 From rcastanedalo at openjdk.org Fri Aug 30 08:27:22 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 30 Aug 2024 08:27:22 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> Message-ID: On Fri, 30 Aug 2024 08:19:50 GMT, Roberto Casta?eda Lozano wrote: >> We can similarly elide the post-barrier if we can prove at compile-time that the value being written >> is null. That case isn't handled here though. Instead that's checked for in >> `refine_barrier_by_new_val_type` and in `get_store_barrier`. I'm not sure why it's structured >> that way. > >> It might be worth saying explicitly that this is a compile-time version of the above mentioned young > generation filter. > > Done (commit [72a04c4](https://github.com/openjdk/jdk/pull/19746/commits/72a04c4e8046256ee7e811d66934d5d9e24f4c7c)), thanks. > We can similarly elide the post-barrier if we can prove at compile-time that the value being written is null. That case isn't handled here though. Instead that's checked for in refine_barrier_by_new_val_type and in get_store_barrier. I'm not sure why it's structured that way. The reason why the compile-time null check is performed outside of `g1_can_remove_post_barrier` is for consistency with the [current mainline code](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp#L382-L388). The difference between the current and this changeset's `g1_can_remove_post_barrier` function is minimal, but this is unfortunately obscured in the patch by the temporary `G1_LATE_BARRIER_MIGRATION_SUPPORT`-guarded code. 
`refine_barrier_by_new_val_type` performs a compile-time null check again at the end of C2's platform-independent optimizations (see https://bugs.openjdk.org/secure/attachment/107747/late-expansion.png) to exploit potentially stronger type information that might be revealed only after applying some optimizations. I have added a new test case that illustrates this scenario (commit [57adcfb](https://github.com/openjdk/jdk/pull/19746/commits/57adcfb04b163ba6744389d6258efe4b2ace534d)). I will study if the check in `get_store_barrier` is superseded by that in `refine_barrier_by_new_val_type`. If I can convince myself that this is the case I will consider removing the former. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1738191022 From rcastanedalo at openjdk.org Fri Aug 30 08:27:23 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 30 Aug 2024 08:27:23 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> Message-ID: On Wed, 28 Aug 2024 08:17:14 GMT, Kim Barrett wrote: >> Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename g1XChgX to g1GetAndSetX for consistency with Ideal operation names > > src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 229: > >> 227: } >> 228: >> 229: void refine_barrier_by_new_val_type(Node* n) { > > This function should probably be `static`. Done, thanks (I also made its argument `const`, see commit [29d8a89](https://github.com/openjdk/jdk/pull/19746/commits/29d8a89a9a7fd0c1717330609c6d7cb36b0ff174)). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1738192979 From mdoerr at openjdk.org Fri Aug 30 08:33:23 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 30 Aug 2024 08:33:23 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v11] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 08:22:43 GMT, Roberto Casta?eda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. 
>> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Casta?eda Lozano has updated the pull request incrementally with six additional commits since the last revision: > > - Add test to motivate compile-time null checks in 'refine_barrier_by_new_val_type' > - Remark relation between compiler optimization and barrier filter > - Make 'refine_barrier_by_new_val_type' static and its input argument 'const' > - Replace 'the null' with 'null' in comment > - Remove redundant redefinitions of '__' > - Replace 'already dirty' with 'young' in post-barrier fast path comment Are you planning to merge jdk-24+13? It has a known testbug on PPC64, but that's not a problem. It looks good otherwise. I'll have to rebase the PPC64 implementation after it is merged and I should be able to provide a stable version for this PR afterwards. So, I'd appreciate the update unless @feilongjiang @offamitkumar @snazarkin see any issue on their platforms. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2320483973 From amitkumar at openjdk.org Fri Aug 30 08:53:25 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 30 Aug 2024 08:53:25 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v11] In-Reply-To: References: Message-ID: <5FM9bNeaaI0Lcsto0kfzrcrY4u6SODtf3wqDwmlninw=.367c8d65-c059-4726-a10a-6dd616b643af@github.com> On Fri, 30 Aug 2024 08:22:43 GMT, Roberto Casta?eda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. 
>> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Casta?eda Lozano has updated the pull request incrementally with six additional commits since the last revision: > > - Add test to motivate compile-time null checks in 'refine_barrier_by_new_val_type' > - Remark relation between compiler optimization and barrier filter > - Make 'refine_barrier_by_new_val_type' static and its input argument 'const' > - Replace 'the null' with 'null' in comment > - Remove redundant redefinitions of '__' > - Replace 'already dirty' with 'young' in post-barrier fast path comment On s390x side, we are good. So I don't have issue with merging jdk-24+13. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2320533252 From kbarrett at openjdk.org Fri Aug 30 09:08:54 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 30 Aug 2024 09:08:54 GMT Subject: RFR: 8311163: Parallel: Improve large object handling during evacuation [v2] In-Reply-To: References: Message-ID: > Please review this change to ParallelGC young generation collection's handling > of large objArrays, to now use the infrastructure provided by JDK-8253237 and > JDK-8337709. (That's the same infrastructure used by G1 young/mixed > collections.) > > Testing: mach5 tier1-5 Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: remove unused size arg for push_objArray ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20720/files - new: https://git.openjdk.org/jdk/pull/20720/files/e1db9bca..6ca4dd9a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20720&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20720&range=00-01 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20720.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20720/head:pull/20720 PR: https://git.openjdk.org/jdk/pull/20720 From kbarrett at openjdk.org Fri Aug 30 09:08:55 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 30 Aug 2024 09:08:55 GMT Subject: RFR: 8311163: Parallel: Improve large object handling during evacuation [v2] In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 11:00:57 GMT, Albert Mingkun Yang wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> remove unused size arg for push_objArray > > src/hotspot/share/gc/parallel/psPromotionManager.cpp line 323: > >> 321: } >> 322: >> 323: void PSPromotionManager::push_objArray(oop old_obj, oop new_obj, size_t obj_size) { > > `obj_size` seems unused. 
I'd originally intended to compute the array_length from this argument, but that's kind of messy so I went with the simpler approach of just fetching it from one of the objects. I've removed the argument.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20720#discussion_r1738259596

From rcastanedalo at openjdk.org Fri Aug 30 09:23:23 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 30 Aug 2024 09:23:23 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v5]
In-Reply-To:
References:
Message-ID:

On Wed, 14 Aug 2024 09:10:10 GMT, Feilong Jiang wrote:

>>> Are you planning to rebase it with master? Nothing important, but there were a couple of failures which are fixed after this PR. So it will make the test results a bit cleaner for us.
>>
>> Actually, I have refrained from updating to the latest mainline changes to avoid interfering with the porting work while it is in progress, but if there is consensus among the port maintainers I would be happy to update the changeset regularly. @TheRealMDoerr @feilongjiang @offamitkumar @snazarkin what do you prefer?
>
>> > Are you planning to rebase it with master? Nothing important, but there were a couple of failures which are fixed after this PR. So it will make the test results a bit cleaner for us.
>>
>> Actually, I have refrained from updating to the latest mainline changes to avoid interfering with the porting work while it is in progress, but if there is consensus among the port maintainers I would be happy to update the changeset regularly. @TheRealMDoerr @feilongjiang @offamitkumar @snazarkin what do you prefer?
>
> I have already merged upstream commits on my local branch, so I'm fine with regular updates.

> So, I'd appreciate the update unless @feilongjiang @offamitkumar @snazarkin see any issue on their platforms.

OK, if there are no objections from @feilongjiang or @snazarkin within a couple of days I will prepare an update to jdk-24+13.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2320618425 From tschatzl at openjdk.org Fri Aug 30 10:52:23 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 30 Aug 2024 10:52:23 GMT Subject: RFR: 8311163: Parallel: Improve large object handling during evacuation [v2] In-Reply-To: References: Message-ID: <-Vood5VZzEUfW7xJFj7cYKe5wvdjxvn_AT9gQNVWRSk=.ed9813e1-8996-433e-9d4e-bafd1959b59e@github.com> On Fri, 30 Aug 2024 09:08:54 GMT, Kim Barrett wrote: >> Please review this change to ParallelGC young generation collection's handling >> of large objArrays, to now use the infrastructure provided by JDK-8253237 and >> JDK-8337709. (That's the same infrastructure used by G1 young/mixed >> collections.) >> >> Testing: mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove unused size arg for push_objArray Still good. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20720#pullrequestreview-2271922982 From stefank at openjdk.org Fri Aug 30 11:10:23 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 30 Aug 2024 11:10:23 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 17:12:09 GMT, Albert Mingkun Yang wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove hashcode leftovers from SA > > src/hotspot/share/gc/serial/defNewGeneration.cpp line 707: > >> 705: } else if (obj->is_forwarded()) { >> 706: // To restore the klass-bits in the header. >> 707: obj->forward_safe_init_mark(); > > I wonder if not modifying successful-forwarded objs is cleaner. 
> Something like:
>
>
> reset_self_forwarded_in_space(space) {
>   cur = space->bottom();
>   top = space->top();
>
>   while (cur < top) {
>     obj = cast_to_oop(cur);
>
>     if (obj->is_self_forwarded()) {
>       obj->unset_self_forwarded();
>       obj_size = obj->size();
>     } else {
>       assert(obj->is_forwarded(), "inv");
>       obj_size = obj->forwardee()->size();
>     }
>
>     cur += obj_size;
>   }
> }
>
> reset_self_forwarded_in_space(eden());
> reset_self_forwarded_in_space(from());

I was thinking the same, but there's a problem with that. If we get a promotion failure in the young gen, we are leaving the dead objects marked as forwarded. Then when the Full GC scans these regions with dead objects it will mistakenly think that they have been marked alive because `is_forwarded() == is_gc_marked()`. The code in `phase2_calculate_new_addr` will then break when it looks for `is_gc_marked` objects.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1738433303

From stefank at openjdk.org Fri Aug 30 11:18:20 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Fri, 30 Aug 2024 11:18:20 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 30 Aug 2024 11:07:46 GMT, Stefan Karlsson wrote:

>> src/hotspot/share/gc/serial/defNewGeneration.cpp line 707:
>>
>>> 705:       } else if (obj->is_forwarded()) {
>>> 706:         // To restore the klass-bits in the header.
>>> 707:         obj->forward_safe_init_mark();
>>
>> I wonder if not modifying successful-forwarded objs is cleaner.
>> Something like:
>>
>>
>> reset_self_forwarded_in_space(space) {
>>   cur = space->bottom();
>>   top = space->top();
>>
>>   while (cur < top) {
>>     obj = cast_to_oop(cur);
>>
>>     if (obj->is_self_forwarded()) {
>>       obj->unset_self_forwarded();
>>       obj_size = obj->size();
>>     } else {
>>       assert(obj->is_forwarded(), "inv");
>>       obj_size = obj->forwardee()->size();
>>     }
>>
>>     cur += obj_size;
>>   }
>> }
>>
>> reset_self_forwarded_in_space(eden());
>> reset_self_forwarded_in_space(from());
>
> I was thinking the same, but there's a problem with that. If we get a promotion failure in the young gen, we are leaving the dead objects marked as forwarded. Then when the Full GC scans these regions with dead objects it will mistakenly think that they have been marked alive because `is_forwarded() == is_gc_marked()`. The code in `phase2_calculate_new_addr` will then break when it looks for `is_gc_marked` objects.

FWIW, the ParallelGC does something very similar to what you propose, except that it walks bitmaps instead of parsing the space to find the self-forwarded objects. It then has a check inside object_iterate to make sure that it doesn't expose the dead objects (in eden and the from space) to heap dumpers and histogram printers.

Because of the code above, the SerialGC clears away the information about which objects are dead in eden and the from space, so heap dumpers and histogram printers will include these dead objects. We might want to fix that as a future RFE.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1738444174

From fjiang at openjdk.org Fri Aug 30 13:26:23 2024
From: fjiang at openjdk.org (Feilong Jiang)
Date: Fri, 30 Aug 2024 13:26:23 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v11]
In-Reply-To:
References:
Message-ID:

On Fri, 30 Aug 2024 08:22:43 GMT, Roberto Castañeda Lozano wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms.
>> See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>>
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>>
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>>
>> ## Summary of the Changes
>>
>> ### Platform-Independent Changes (`src/hotspot/share`)
>>
>> These consist mainly of:
>>
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>>
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>>
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>>
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>>
>> #### ADL Changes
>>
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>>
>> #### `G1BarrierSetAssembler` Changes
>>
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with six additional commits since the last revision:
>
> - Add test to motivate compile-time null checks in 'refine_barrier_by_new_val_type'
> - Remark relation between compiler optimization and barrier filter
> - Make 'refine_barrier_by_new_val_type' static and its input argument 'const'
> - Replace 'the null' with 'null' in comment
> - Remove redundant redefinitions of '__'
> - Replace 'already dirty' with 'young' in post-barrier fast path comment

risc-v port looks good too.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2321247648

From rcastanedalo at openjdk.org Fri Aug 30 13:43:24 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 30 Aug 2024 13:43:24 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v9]
In-Reply-To:
References:
Message-ID:

On Sun, 25 Aug 2024 06:15:20 GMT, Kim Barrett wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Pass oldval to the pre-barrier in g1CompareAndExchange/SwapP
>
> src/hotspot/share/opto/memnode.cpp line 3468:
>
>> 3466: // Capture an unaliased, unconditional, simple store into an initializer.
>> 3467: // Or, if it is independent of the allocation, hoist it above the allocation.
>> 3468: if (ReduceFieldZeroing && ReduceInitialCardMarks && /*can_reshape &&*/
>
> It's not obvious to me how this is related to the late barrier changes.

I agree this change is not obvious and deserves an explanation. With `ReduceInitialCardMarks` disabled, a store to a newly allocated object requires a post-barrier. In current mainline code, the post-barrier is expanded early, which allows the store-capturing transformation (a first step to avoid needless zeroing in object initialization) to move the store and its post-barrier apart: the store goes into the initialization sequence of the recently allocated object, whereas the post-barrier itself remains outside. Here is an example in pseudo-code of this transformation for early-expanded GC barriers:

(before store capturing):

    allocate object o
    start initialization of o
      ...
      o.f <- 0
      ...
    end initialization of o
    memory barrier (store-store)
    o.f <- new-val
    post-barrier of o.f <- new-val

(after store capturing):

    allocate object o
    start initialization of o
      ...
      o.f <- new-val
      ...
    end initialization of o
    memory barrier (store-store)
    post-barrier of o.f <- new-val

In late barrier expansion however, the post-barrier is an implicit, inseparable part of the store, so if we have stores with post-barriers we have no other choice than leaving them outside the initialization section. To enforce this, the change simply disables store-capturing analysis in the `!ReduceInitialCardMarks` case, which is the only case where we might find stores with post-barriers on recently allocated objects.

A perhaps more principled solution might be extending store-capturing analysis to reject stores with late-expanded barriers. I will give it a try.
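To make the difference between the two expansion strategies concrete, here is a small stand-alone model of the pseudo-code above. It is only a sketch under stated assumptions, not C2 code: `Kind`, `Op`, `capture_store`, `early_expanded` and `late_expanded` are hypothetical names introduced for illustration, the store-store memory barrier is left out for brevity, and the real analysis in `memnode.cpp` works on C2's ideal graph rather than on a flat list of operations.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model of a straight-line operation sequence; not C2's IR.
enum class Kind { InitZero, InitEnd, Store, PostBarrier };

struct Op {
  Kind kind;
  std::string field;   // field written (or covered) by this operation
  bool fused_barrier;  // true: late expansion, the post-barrier is part of the store
};

// Tries to capture the first store after InitEnd into the initializer, where it
// replaces the zeroing of the same field. Returns true if the store was moved.
// A store with a fused (late-expanded) post-barrier is rejected, because moving
// it would drag the barrier into the initialization section as well.
bool capture_store(std::vector<Op>& ops) {
  std::size_t init_end = ops.size();
  for (std::size_t i = 0; i < ops.size(); i++) {
    if (ops[i].kind == Kind::InitEnd) { init_end = i; break; }
  }
  if (init_end == ops.size()) return false;  // no initialization section found

  for (std::size_t s = init_end + 1; s < ops.size(); s++) {
    if (ops[s].kind != Kind::Store) continue;
    if (ops[s].fused_barrier) return false;  // store and barrier are inseparable
    for (std::size_t z = 0; z < init_end; z++) {
      if (ops[z].kind == Kind::InitZero && ops[z].field == ops[s].field) {
        assert(!ops[s].fused_barrier);
        ops[z] = ops[s];  // the store takes the place of the zeroing
        ops.erase(ops.begin() + static_cast<std::ptrdiff_t>(s));
        return true;
      }
    }
    return false;  // no matching zeroing to replace
  }
  return false;
}

// Early expansion: the store and its post-barrier are separate operations.
std::vector<Op> early_expanded() {
  return {{Kind::InitZero, "f", false},
          {Kind::InitEnd, "", false},
          {Kind::Store, "f", false},
          {Kind::PostBarrier, "f", false}};
}

// Late expansion: the post-barrier is an implicit part of the store.
std::vector<Op> late_expanded() {
  return {{Kind::InitZero, "f", false},
          {Kind::InitEnd, "", false},
          {Kind::Store, "f", true}};
}
```

In this model, `capture_store` on the early-expanded sequence moves the store into the initializer and leaves the post-barrier behind, while on the late-expanded sequence it refuses to move anything. The latter roughly corresponds to the "more principled" variant mentioned above, where the analysis itself rejects stores with late-expanded barriers.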
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1738693695

From rcastanedalo at openjdk.org Fri Aug 30 13:51:23 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 30 Aug 2024 13:51:23 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10]
In-Reply-To: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com>
References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com>
Message-ID:

On Wed, 28 Aug 2024 08:25:06 GMT, Kim Barrett wrote:

> I've only looked at the changes in gc directories (shared and cpu-specific).

Thanks for your suggestions and comments, Kim! I have addressed them now (modulo a couple of follow-up experiments that I will try out in the next few days), please let me know if there are any further questions.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2321323461

From ayang at openjdk.org Fri Aug 30 18:13:23 2024
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Fri, 30 Aug 2024 18:13:23 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 30 Aug 2024 11:15:23 GMT, Stefan Karlsson wrote:

>> I was thinking the same, but there's a problem with that. If we get a promotion failure in the young gen, we are leaving the dead objects marked as forwarded. Then when the Full GC scans these regions with dead objects it will mistakenly think that they have been marked alive because `is_forwarded() == is_gc_marked()`. The code in `phase2_calculate_new_addr` will then break when it looks for `is_gc_marked` objects.
>
> FWIW, the ParallelGC does something very similar to what you propose, except that it walks bitmaps instead of parsing the space to find the self-forwarded objects.
> It then has a check inside object_iterate to make sure that it doesn't expose the dead objects (in eden and the from space) to heap dumpers and histogram printers.
>
> Because of the code above, the SerialGC clears away the information about what objects are dead in eden and the from space, so heap dumpers and histogram printers will include these dead objects. We might want to fix that as a future RFE.

> If we get a promotion failure in the young gen, we are leaving the dead objects marked as forwarded.

True; need to do sth like `obj->init_mark();` for the non-self-forwarded case. The postcondition is that there are no forwarded objs in eden/from.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1739218189

From kbarrett at openjdk.org Sat Aug 31 01:16:24 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Sat, 31 Aug 2024 01:16:24 GMT
Subject: RFR: 8311163: Parallel: Improve large object handling during evacuation
In-Reply-To:
References:
Message-ID:

On Thu, 29 Aug 2024 11:00:19 GMT, Albert Mingkun Yang wrote:

>> Please review this change to ParallelGC young generation collection's handling
>> of large objArrays, to now use the infrastructure provided by JDK-8253237 and
>> JDK-8337709. (That's the same infrastructure used by G1 young/mixed
>> collections.)
>>
>> Testing: mach5 tier1-5
>
> Using another box (AMD), the improvement becomes clear.
>
>
> ## baseline
>
> [0.003s][info][gc] Using Parallel
> [1.464s][info][gc] GC(0) Pause Young (Allocation Failure) 512M->344M(1963M) 586.065ms
> [2.163s][info][gc] GC(1) Pause Young (Allocation Failure) 857M->853M(2304M) 298.110ms
> [8.208s][info][gc] GC(2) Pause Young (Allocation Failure) 1707M->1669M(2689M) 2986.103ms
> [9.941s][info][gc] GC(3) Pause Full (Allocation Failure) 2516M->91M(2485M) 38.478ms
>
> ## new
> [0.002s][info][gc] Using Parallel
> [1.325s][info][gc] GC(0) Pause Young (Allocation Failure) 512M->355M(1963M) 415.916ms
> [1.791s][info][gc] GC(1) Pause Young (Allocation Failure) 867M->858M(2304M) 212.690ms
> [5.663s][info][gc] GC(2) Pause Young (Allocation Failure) 1700M->1669M(2689M) 821.355ms
> [7.088s][info][gc] GC(3) Pause Full (Allocation Failure) 2510M->91M(2475M) 32.170ms
>
>
>> So I don't yet know why the change is so beneficial for GC(2).
>
> With `-Xlog:gc*=debug`, I can see an expansion during GC(2) -- I guess because the expansion operation is synchronous, not having any chunks in the task-queue essentially blocks other workers.

Thanks for reviews @albertnetymk and @tschatzl

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20720#issuecomment-2322646225

From kbarrett at openjdk.org Sat Aug 31 01:16:25 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Sat, 31 Aug 2024 01:16:25 GMT
Subject: Integrated: 8311163: Parallel: Improve large object handling during evacuation
In-Reply-To:
References:
Message-ID:

On Tue, 27 Aug 2024 00:16:10 GMT, Kim Barrett wrote:

> Please review this change to ParallelGC young generation collection's handling
> of large objArrays, to now use the infrastructure provided by JDK-8253237 and
> JDK-8337709. (That's the same infrastructure used by G1 young/mixed
> collections.)
>
> Testing: mach5 tier1-5

This pull request has now been integrated.

Changeset: 4f071ce0
Author: Kim Barrett
URL: https://git.openjdk.org/jdk/commit/4f071ce074b934d5610e213d348cff8326f1499d
Stats: 121 lines in 5 files changed: 54 ins; 39 del; 28 mod

8311163: Parallel: Improve large object handling during evacuation

Reviewed-by: tschatzl, ayang

-------------

PR: https://git.openjdk.org/jdk/pull/20720