From wkemper at openjdk.org Mon Jul 1 15:59:52 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 1 Jul 2024 15:59:52 GMT Subject: Integrated: 8335289: GenShen: Whitebox breakpoint GC requests may cause assertions In-Reply-To: <4mqg5wQ8EsOCwR_d7styAkrBVlxfdmm7v9o-B4G_BjI=.19d03445-aa39-4c8c-9eed-11ed356439ac@github.com> References: <4mqg5wQ8EsOCwR_d7styAkrBVlxfdmm7v9o-B4G_BjI=.19d03445-aa39-4c8c-9eed-11ed356439ac@github.com> Message-ID: On Fri, 28 Jun 2024 22:00:04 GMT, William Kemper wrote: > Clean backport, low risk. Fixes intermittent test failure. This pull request has now been integrated. Changeset: d6eacaff Author: William Kemper URL: https://git.openjdk.org/shenandoah-jdk21u/commit/d6eacaff28c313d72f40ef2b68b65de97fd2fbed Stats: 26 lines in 2 files changed: 20 ins; 4 del; 2 mod 8335289: GenShen: Whitebox breakpoint GC requests may cause assertions Backport-of: 71182e240ce4f4a6e3a8773f61be6b091e2d65e9 ------------- PR: https://git.openjdk.org/shenandoah-jdk21u/pull/63 From wkemper at openjdk.org Mon Jul 1 23:57:37 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 1 Jul 2024 23:57:37 GMT Subject: RFR: 8328235: GenShen: Robustify ShenandoahGCSession and fix missing use [v22] In-Reply-To: References:

Message-ID: <0bfcP-HwsrAGE77lyBmdG4mXj5G09cwsd2l9drSB0-U=.7451d11f-8885-42d1-99f4-4828028b7b26@github.com> On Sat, 29 Jun 2024 06:44:13 GMT, Y. Srinivas Ramakrishna wrote: >> ShenandoahGCSession is intended to create a scope where the ShenandoahHeap's _gc_cause and _gc_generation field reflect the current gc cycle. We now check that we do not overwrite existing non-default settings (respectively _no_gc and nullptr). The destructor of the scope/stack object also resets these fields to their default settings, ensuring intended uses. This uncovered a situation where the scope was not entered when it should have been, which we have now fixed. >> >> A case of flickering of active_generation() was identified when used concurrently by mutators while it was being modified by the controller thread. To deal with this, we have carefully gone through the setting and use of the field, and found that an expedient fix for the race is to split the field into two: >> - _gc_generation is set & cleared by the controller thread whenever it enters and exits a GC scope, and services concurrent gc cycles for young or old generations. >> - _active_generation is set to the value in _gc_generation at the start of each Shenandoah GC safepoint operation so that mutator threads and load barriers always see a consistent value between safepoints. >> >> Asserts check the protocol for setting and clearing these fields. >> >> The protocol for use of the fields is that mutator threads may never use the _gc_generation field since it's subject to asynchronously changing based on actions of the coordinator thread. Mutator threads may only use the _active_generation field which changes synchronously at safepoints. Worker threads will generally use the former, but they may also use the latter as part of the load barrier. 
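The scoping and two-field protocol described above can be sketched as follows. This is a minimal Java analogy under stated assumptions, not the HotSpot C++ code: `Heap`, `GcSession`, and `syncAtSafepoint` are hypothetical names, a try-with-resources block stands in for the C++ stack object's constructor and destructor, and the safepoint is simulated by an explicit call.

```java
import java.util.concurrent.atomic.AtomicReference;

public class GcSessionSketch {
    enum Generation { YOUNG, OLD, GLOBAL }

    /** Hypothetical stand-in for the two heap fields discussed above. */
    static final class Heap {
        // Written only by the controller thread; may change asynchronously.
        private Generation gcGeneration = null;
        // Published only at a (simulated) safepoint; the value mutators may read.
        private final AtomicReference<Generation> activeGeneration = new AtomicReference<>();

        Generation gcGeneration() { return gcGeneration; }
        Generation activeGeneration() { return activeGeneration.get(); }

        /** Simulates the start of a GC safepoint operation. */
        void syncAtSafepoint() { activeGeneration.set(gcGeneration); }
    }

    /** Scope object: sets the field on entry, restores the default on exit. */
    static final class GcSession implements AutoCloseable {
        private final Heap heap;

        GcSession(Heap heap, Generation generation) {
            // Refuse to overwrite a non-default setting, mirroring the new check.
            if (heap.gcGeneration != null) {
                throw new IllegalStateException("GC session already active");
            }
            this.heap = heap;
            heap.gcGeneration = generation;
        }

        @Override
        public void close() { heap.gcGeneration = null; }
    }

    public static void main(String[] args) {
        Heap heap = new Heap();
        try (GcSession session = new GcSession(heap, Generation.YOUNG)) {
            // Controller has entered the scope; mutators have not seen it yet.
            System.out.println(heap.gcGeneration() + " / " + heap.activeGeneration()); // YOUNG / null
            heap.syncAtSafepoint();
            System.out.println(heap.gcGeneration() + " / " + heap.activeGeneration()); // YOUNG / YOUNG
        }
        // Scope exit resets the controller's field; the mutator view is unchanged
        // until the next simulated safepoint.
        System.out.println(heap.gcGeneration() + " / " + heap.activeGeneration()); // null / YOUNG
    }
}
```

The point of the split is visible in the last line of output: the controller-side value can change at any time, while the mutator-side value only moves when `syncAtSafepoint()` is called.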
>> >> An alternative approach would be to not use a global variable for the _gc_generation indirected through the heap, but rather to pass it into the gc closures that do the work for specific phases of the GC that need to know which generation is currently subject to collector actions. This would work as well, but the changes would potentially touch more code. We would still have to have set the variable that is consulted by the load barriers, viz. _active_generation, in a mutator-safe fashion at a safepoint, like we do today. This or other alternative approaches may be investigated in the future to potentially make this protocol more self-contained and robust rather than leaking as it does today into many places in the code. >> >> *Testing*: >> - [x] code pipeline >> - [x] specjbb testing >> - [x] specjbb performance >> - [x] jtreg:hotspot_gc and jtreg:hots... > > Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: > > Disallow mutator threads from reading the asynchronously updated > _gc_generation field of ShHeap. Marked as reviewed by wkemper (Committer). ------------- PR Review: https://git.openjdk.org/shenandoah/pull/407#pullrequestreview-2152416786 From ysr at openjdk.org Tue Jul 2 01:04:35 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 2 Jul 2024 01:04:35 GMT Subject: RFR: 8328235: GenShen: Robustify ShenandoahGCSession and fix missing use [v22] In-Reply-To: References:

Message-ID: On Sat, 29 Jun 2024 06:44:13 GMT, Y. Srinivas Ramakrishna wrote: >> ShenandoahGCSession is intended to create a scope where the ShenandoahHeap's _gc_cause and _gc_generation field reflect the current gc cycle. We now check that we do not overwrite existing non-default settings (respectively _no_gc and nullptr). The destructor of the scope/stack object also resets these fields to their default settings, ensuring intended uses. This uncovered a situation where the scope was not entered when it should have been, which we have now fixed. >> >> A case of flickering of active_generation() was identified when used concurrently by mutators while it was being modified by the controller thread. To deal with this, we have carefully gone through the setting and use of the field, and found that an expedient fix for the race is to split the field into two: >> - _gc_generation is set & cleared by the controller thread whenever it enters and exits a GC scope, and services concurrent gc cycles for young or old generations. >> - _active_generation is set to the value in _gc_generation at the start of each Shenandoah GC safepoint operation so that mutator threads and load barriers always see a consistent value between safepoints. >> >> Asserts check the protocol for setting and clearing these fields. >> >> The protocol for use of the fields is that mutator threads may never use the _gc_generation field since it's subject to asynchronously changing based on actions of the coordinator thread. Mutator threads may only use the _active_generation field which changes synchronously at safepoints. Worker threads will generally use the former, but they may also use the latter as part of the load barrier. 
>> >> An alternative approach would be to not use a global variable for the _gc_generation indirected through the heap, but rather to pass it into the gc closures that do the work for specific phases of the GC that need to know which generation is currently subject to collector actions. This would work as well, but the changes would potentially touch more code. We would still have to have set the variable that is consulted by the load barriers, viz. _active_generation, in a mutator-safe fashion at a safepoint, like we do today. This or other alternative approaches may be investigated in the future to potentially make this protocol more self-contained and robust rather than leaking as it does today into many places in the code. >> >> *Testing*: >> - [x] code pipeline >> - [x] specjbb testing >> - [x] specjbb performance >> - [x] jtreg:hotspot_gc and jtreg:hots... > > Y. Srinivas Ramakrishna has updated the pull request incrementally with one additional commit since the last revision: > > Disallow mutator threads from reading the asynchronously updated > _gc_generation field of ShHeap. Thanks for your review, William! ------------- PR Comment: https://git.openjdk.org/shenandoah/pull/407#issuecomment-2201583087 From ysr at openjdk.org Tue Jul 2 01:04:36 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 2 Jul 2024 01:04:36 GMT Subject: Integrated: 8328235: GenShen: Robustify ShenandoahGCSession and fix missing use In-Reply-To: References: Message-ID: <4SqhOKXjyBoST4Tm4tKxuc6TdzpuuKIr2Bpe1kM8Ycs=.6064b4b0-13c2-4fa5-beaa-3d8c73065d71@github.com> On Fri, 15 Mar 2024 01:45:51 GMT, Y. Srinivas Ramakrishna wrote: > ShenandoahGCSession is intended to create a scope where the ShenandoahHeap's _gc_cause and _gc_generation field reflect the current gc cycle. We now check that we do not overwrite existing non-default settings (respectively _no_gc and nullptr). 
The destructor of the scope/stack object also resets these fields to their default settings, ensuring intended uses. This uncovered a situation where the scope was not entered when it should have been, which we have now fixed. > > A case of flickering of active_generation() was identified when used concurrently by mutators while it was being modified by the controller thread. To deal with this, we have carefully gone through the setting and use of the field, and found that an expedient fix for the race is to split the field into two: > - _gc_generation is set & cleared by the controller thread whenever it enters and exits a GC scope, and services concurrent gc cycles for young or old generations. > - _active_generation is set to the value in _gc_generation at the start of each Shenandoah GC safepoint operation so that mutator threads and load barriers always see a consistent value between safepoints. > > Asserts check the protocol for setting and clearing these fields. > > The protocol for use of the fields is that mutator threads may never use the _gc_generation field since it's subject to asynchronously changing based on actions of the coordinator thread. Mutator threads may only use the _active_generation field which changes synchronously at safepoints. Worker threads will generally use the former, but they may also use the latter as part of the load barrier. > > An alternative approach would be to not use a global variable for the _gc_generation indirected through the heap, but rather to pass it into the gc closures that do the work for specific phases of the GC that need to know which generation is currently subject to collector actions. This would work as well, but the changes would potentially touch more code. We would still have to have set the variable that is consulted by the load barriers, viz. _active_generation, in a mutator-safe fashion at a safepoint, like we do today. 
This or other alternative approaches may be investigated in the future to potentially make this protocol more self-contained and robust rather than leaking as it does today into many places in the code. > > *Testing*: > - [x] code pipeline > - [x] specjbb testing > - [x] specjbb performance > - [x] jtreg:hotspot_gc and jtreg:hotspot:tier1 w/fastdebug > - [x] GHA > > *... This pull request has now been integrated. Changeset: d2102347 Author: Y. Srinivas Ramakrishna URL: https://git.openjdk.org/shenandoah/commit/d2102347ea9c1199221ec33f4e721aefa1193cea Stats: 169 lines in 16 files changed: 135 ins; 1 del; 33 mod 8328235: GenShen: Robustify ShenandoahGCSession and fix missing use Reviewed-by: wkemper ------------- PR: https://git.openjdk.org/shenandoah/pull/407 From duke at openjdk.org Thu Jul 4 02:05:31 2024 From: duke at openjdk.org (duke) Date: Thu, 4 Jul 2024 02:05:31 GMT Subject: Withdrawn: 8321806: Shenandoah: each mutator must see FullGC or GC overhead limit is exceeded before throwing OOM In-Reply-To: References: Message-ID: On Tue, 5 Dec 2023 20:30:54 GMT, Kelvin Nilsen wrote: > Require each thread to observe unproductive Full GC before it throws OOM exception. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/16985 From duke at openjdk.org Fri Jul 5 03:25:29 2024 From: duke at openjdk.org (duke) Date: Fri, 5 Jul 2024 03:25:29 GMT Subject: Withdrawn: 8329204: Diagnostic command for zeroing unused parts of the heap In-Reply-To: References: Message-ID: On Wed, 27 Mar 2024 17:24:34 GMT, Volker Simonis wrote: > Diagnostic command for zeroing unused parts of the heap > > I propose to add a new diagnostic command `System.zero_unused_memory` which zeros out all unused parts of the heap. The name of the command is intentionally GC/heap agnostic because in the future it might be extended to also zero unused parts of the Metaspace and/or CodeCache. 
>
> Currently `System.zero_unused_memory` triggers a full GC and afterwards zeros unused parts of the heap. Zeroing can help snapshotting technologies like [CRIU][1] or [Firecracker][2] to shrink the snapshot size of VMs/containers with running JVM processes because pages which only contain zero bytes can be easily removed from the image by making the image *sparse* (e.g. with [`fallocate -p`][3]).
>
> Note that uncommitting unused heap parts in the JVM doesn't help in the context of virtualization (e.g. KVM/Firecracker) because from the host perspective they are still dirty and can't be easily removed from the snapshot image because they usually contain some non-zero data. More details can be found in my FOSDEM talk ["Zeroing and the semantic gap between host and guest"][4].
>
> Furthermore, removing pages which only contain zero bytes (i.e. "empty pages") from a snapshot image not only decreases the image size but also speeds up the restore process, because empty pages don't have to be read from the image file; they are backed by the kernel zero page until they are used for the first time. This also decreases the initial memory footprint of a restored process.
>
> An additional argument for memory zeroing is security. By zeroing unused heap parts, we can make sure that secrets contained in unreferenced Java objects are deleted. This is currently impossible to achieve from Java: even if a Java program zeroes out arrays with sensitive data after use, it can never guarantee that the corresponding object hasn't already been moved by the GC, leaving an old, unreferenced copy of that data somewhere in the heap.
>
> A prototype implementation for this proposal for Serial, Parallel, G1 and Shenandoah GC is available in the linked pull request.
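The security argument above can be illustrated with the usual application-level pattern. This is a minimal sketch, assuming a hypothetical helper named `wipe`; it uses only `java.util.Arrays.fill` and clears just the array instance the program still holds, which is exactly the limitation the proposal describes.

```java
import java.util.Arrays;

public class ZeroSecretSketch {
    /** Overwrites the caller-visible array with zero chars. */
    static void wipe(char[] secret) {
        Arrays.fill(secret, '\0');
    }

    public static void main(String[] args) {
        char[] password = {'s', '3', 'c', 'r', 'e', 't'};
        try {
            // ... use the password ...
        } finally {
            wipe(password);
        }
        // Only this array instance is known to be cleared. If the GC copied the
        // array earlier, the old copy may still sit in unused heap memory, and
        // the program has no way to reach it.
        System.out.println(password[0] == '\0');
    }
}
```

Zeroing unused heap memory from inside the JVM would close the gap that this pattern leaves open.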
>
> [1]: https://criu.org
> [2]: https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md
> [3]: https://man7.org/linux/man-pages/man1/fallocate.1.html
> [4]: https://fosdem.org/2024/schedule/event/fosdem-2024-3454-zeroing-and-the-semantic-gap-between-host-and-guest/

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/18521

From azafari at openjdk.org Fri Jul 5 11:15:25 2024
From: azafari at openjdk.org (Afshin Zafari)
Date: Fri, 5 Jul 2024 11:15:25 GMT
Subject: RFR: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API [v4]
In-Reply-To:
References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com>
Message-ID:

On Fri, 24 May 2024 13:46:15 GMT, Afshin Zafari wrote:

>> This PR fixes the problems that existed in the original PR (https://github.com/openjdk/jdk/pull/18745). There are two main fixes here:
>> 1- The `ReservedSpace` class is changed so that the `_flag` member never changes after it is set in the ctor. Since reserving memory regions may go through a try-and-fail sequence of reserve-release pairs, changing the `_flag` member at failed releases would lead to incorrect flags in subsequent reserves.
>> Also, some assertions are added to the getters of `ReservedSpace` to check that the region was successfully reserved.
>>
>> 2- In order to have adjacent regions with different flags, CDS reserves a (large) region `R` and then splits it into sub-regions `R1` and `R2` (`R == <---R1---><--R2-->`). At release time, NMT tracks only `R` and ignores the release of `R1` and `R2`. Ignoring these releases is problematic when a requested region `R` is first size-aligned to `R1---R---R2` and then `R1` and `R2` are released (the `chop_extra_memory` function is called for this). In this case, NMT does not track the release of `R1` and `R2`, on the false assumption that a containing `R` will be released. Therefore, `R1` and `R2` remain in the NMT reserved-regions list, and when a new reserve happens in those regions, NMT complains by raising an exception.
>>
>> Tests:
>> mach5 tiers 1-5, {linux-x64, macosx-aarch64, windows-x64, linux-aarch64 } x {debug, non-debug}
>
> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision:
>
>   more fixes.

Withdrawn.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19343#issuecomment-2210686409

From azafari at openjdk.org Fri Jul 5 11:15:26 2024
From: azafari at openjdk.org (Afshin Zafari)
Date: Fri, 5 Jul 2024 11:15:26 GMT
Subject: Withdrawn: 8331539: [REDO] NMT: add/make a mandatory MEMFLAGS argument to family of os::reserve/commit/uncommit memory API
In-Reply-To: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com>
References: <1i0PKv9mCusM6BZqXG8ULe0lRA2Nz2ix4aZHz9otNMM=.b9d2d151-883e-4cb6-be48-4ba45b49ed43@github.com>
Message-ID:

On Wed, 22 May 2024 08:29:05 GMT, Afshin Zafari wrote:

> This PR fixes the problems that existed in the original PR (https://github.com/openjdk/jdk/pull/18745). There are two main fixes here:
> 1- The `ReservedSpace` class is changed so that the `_flag` member never changes after it is set in the ctor. Since reserving memory regions may go through a try-and-fail sequence of reserve-release pairs, changing the `_flag` member at failed releases would lead to incorrect flags in subsequent reserves.
> Also, some assertions are added to the getters of `ReservedSpace` to check that the region was successfully reserved.
>
> 2- In order to have adjacent regions with different flags, CDS reserves a (large) region `R` and then splits it into sub-regions `R1` and `R2` (`R == <---R1---><--R2-->`). At release time, NMT tracks only `R` and ignores the release of `R1` and `R2`. Ignoring these releases is problematic when a requested region `R` is first size-aligned to `R1---R---R2` and then `R1` and `R2` are released (the `chop_extra_memory` function is called for this). In this case, NMT does not track the release of `R1` and `R2`, on the false assumption that a containing `R` will be released. Therefore, `R1` and `R2` remain in the NMT reserved-regions list, and when a new reserve happens in those regions, NMT complains by raising an exception.
>
> Tests:
> mach5 tiers 1-5, {linux-x64, macosx-aarch64, windows-x64, linux-aarch64 } x {debug, non-debug}

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/19343

From duke at openjdk.org Mon Jul 8 05:30:40 2024
From: duke at openjdk.org (duke)
Date: Mon, 8 Jul 2024 05:30:40 GMT
Subject: Withdrawn: 8330171: Lazy W^X switch implementation
In-Reply-To: <9eymaXovxUNFdkAkzojFQP5trwl_yyY0jE2GzcMEjR4=.02ee2ef9-c476-4c7c-9e4a-e021425c38bc@github.com>
References: <9eymaXovxUNFdkAkzojFQP5trwl_yyY0jE2GzcMEjR4=.02ee2ef9-c476-4c7c-9e4a-e021425c38bc@github.com>
Message-ID:

On Fri, 12 Apr 2024 14:40:05 GMT, Sergey Nazarkin wrote:

> An alternative for preemptively switching the W^X thread mode on macOS with an AArch64 CPU. This implementation triggers the switch in response to the SIGBUS signal if the *si_addr* belongs to the CodeCache area. With this approach, it is now feasible to eliminate all WX guards and avoid potentially costly operations. However, no significant improvement or degradation in performance has been observed. Additionally, considering the issue with AsyncGetCallTrace, the patched JVM has been successfully operated with [asgct_bottom](https://github.com/parttimenerd/asgct_bottom) and [async-profiler](https://github.com/async-profiler/async-profiler).
>
> Additional testing:
> - [x] macOS AArch64 server fastdebug *gtest*
> - [ ] macOS AArch64 server fastdebug *jtreg:hotspot:tier4*
> - [ ] Benchmarking
>
> @apangin and @parttimenerd could you please check the patch on your scenarios?

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/18762

From duke at openjdk.org Mon Jul 8 08:54:38 2024
From: duke at openjdk.org (duke)
Date: Mon, 8 Jul 2024 08:54:38 GMT
Subject: RFR: 8331411: Shenandoah: Reconsider spinning duration in ShenandoahLock [v7]
In-Reply-To:
References:

Message-ID:

On Tue, 25 Jun 2024 08:20:43 GMT, Xiaolong Peng wrote:

>> ### Notes
>> While using CAS to acquire the lock, the original implementation sleeps/yields once after spinning 0xFFF times and repeats this until the lock is acquired, i.e. a `(N spins + sleep/yield) loop`. Test results suggest that more spinning results in worse performance, so we decided to change the algorithm to `(N spins) + (yield loop)` and to block the thread immediately if a safepoint is pending. We still needed to determine the best value of N for the spins, so we tested multiple values (0, 0x01, 0x07, 0x0F, 0x1F, 0x3F, 0x7F, 0xFF) and compared the results with the baseline data (the original implementation).
>>
>> #### Test code
>>
>>     public class Alloc {
>>       static final int THREADS = 1280; // 32 threads per CPU core, 40 cores
>>       static final Object[] sinks = new Object[64*THREADS];
>>       static volatile boolean start;
>>       static volatile boolean stop;
>>
>>       public static void main(String... args) throws Throwable {
>>         for (int t = 0; t < THREADS; t++) {
>>           int ft = t;
>>           new Thread(() -> work(ft * 64)).start();
>>         }
>>
>>         Thread.sleep(1000);
>>         start = true;
>>         Thread.sleep(30_000);
>>         stop = true;
>>       }
>>
>>       public static void work(int idx) {
>>         while (!start) { Thread.onSpinWait(); }
>>         while (!stop) {
>>           sinks[idx] = new byte[128];
>>         }
>>       }
>>     }
>>
>>
>> Run it like this and observe TTSP times:
>>
>>
>>     java -Xms256m -Xmx256m -XX:+UseShenandoahGC -XX:-UseTLAB -Xlog:gc -Xlog:safepoint Alloc.java
>>
>>
>> #### Metrics from tests (TTSP, allocation rate)
>> ##### Heavy contention (1280 threads, 32 per CPU core)
>> | Test     | SP polls | Average TTSP | 2% TRIMMEAN | MAX      | MIN   |
>> | -------- | -------- | ------------ | ----------- | -------- | ----- |
>> | baseline | 18       | 3882361      | 3882361     | 43310117 | 49197 |
>> | 0x00     | 168      | 861677       | 589036      | 46937732 | 44005 |
>> | 0x01     | 164      | 627056       | 572697      | 10004767 | 55472 |
>> | 0x07     | 163      | 650578       | 625329      | 5312631  | 53734 |
>> | 0x0F     | 164      | 590398       | 557325      | 6481761  | 56794 |
>> | 0x1F     | 144      | 814400       | 790089      | 5024881  | 56041 |
>> | 0x3F     | 137      | 830288       | 801192      | 5533538  | 54982 |
>> | 0x7F     | 132      | 1101625      | 845626      | 35425614 | 57492 |
>> | 0xFF     | 125      | 1005433      | 970988      | 6193342  | 54362 |
>>
>>
>> ##### Light contention (40 threads, 1 per CPU core)
>> | Spins | SP polls | Average TTSP | 2% T...
>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
>
>   Simplify code with less stacks

@pengxiaolong Your change (at version 6fd8205d851b27640a1f2ee6a29dc42cc811f530) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19570#issuecomment-2192464563

From aturbanov at openjdk.org Mon Jul 8 08:54:38 2024
From: aturbanov at openjdk.org (Andrey Turbanov)
Date: Mon, 8 Jul 2024 08:54:38 GMT
Subject: RFR: 8331411: Shenandoah: Reconsider spinning duration in ShenandoahLock [v7]
In-Reply-To:
References:

Message-ID:

On Tue, 25 Jun 2024 08:20:43 GMT, Xiaolong Peng wrote:

>> ### Notes
>> While using CAS to acquire the lock, the original implementation sleeps/yields once after spinning 0xFFF times and repeats this until the lock is acquired, i.e. a `(N spins + sleep/yield) loop`. Test results suggest that more spinning results in worse performance, so we decided to change the algorithm to `(N spins) + (yield loop)` and to block the thread immediately if a safepoint is pending. We still needed to determine the best value of N for the spins, so we tested multiple values (0, 0x01, 0x07, 0x0F, 0x1F, 0x3F, 0x7F, 0xFF) and compared the results with the baseline data (the original implementation).
>>
>> #### Test code
>>
>>     public class Alloc {
>>       static final int THREADS = 1280; // 32 threads per CPU core, 40 cores
>>       static final Object[] sinks = new Object[64*THREADS];
>>       static volatile boolean start;
>>       static volatile boolean stop;
>>
>>       public static void main(String... args) throws Throwable {
>>         for (int t = 0; t < THREADS; t++) {
>>           int ft = t;
>>           new Thread(() -> work(ft * 64)).start();
>>         }
>>
>>         Thread.sleep(1000);
>>         start = true;
>>         Thread.sleep(30_000);
>>         stop = true;
>>       }
>>
>>       public static void work(int idx) {
>>         while (!start) { Thread.onSpinWait(); }
>>         while (!stop) {
>>           sinks[idx] = new byte[128];
>>         }
>>       }
>>     }
>>
>>
>> Run it like this and observe TTSP times:
>>
>>
>>     java -Xms256m -Xmx256m -XX:+UseShenandoahGC -XX:-UseTLAB -Xlog:gc -Xlog:safepoint Alloc.java
>>
>>
>> #### Metrics from tests (TTSP, allocation rate)
>> ##### Heavy contention (1280 threads, 32 per CPU core)
>> | Test     | SP polls | Average TTSP | 2% TRIMMEAN | MAX      | MIN   |
>> | -------- | -------- | ------------ | ----------- | -------- | ----- |
>> | baseline | 18       | 3882361      | 3882361     | 43310117 | 49197 |
>> | 0x00     | 168      | 861677       | 589036      | 46937732 | 44005 |
>> | 0x01     | 164      | 627056       | 572697      | 10004767 | 55472 |
>> | 0x07     | 163      | 650578       | 625329      | 5312631  | 53734 |
>> | 0x0F     | 164      | 590398       | 557325      | 6481761  | 56794 |
>> | 0x1F     | 144      | 814400       | 790089      | 5024881  | 56041 |
>> | 0x3F     | 137      | 830288       | 801192      | 5533538  | 54982 |
>> | 0x7F     | 132      | 1101625      | 845626      | 35425614 | 57492 |
>> | 0xFF     | 125      | 1005433      | 970988      | 6193342  | 54362 |
>>
>>
>> ##### Light contention (40 threads, 1 per CPU core)
>> | Spins | SP polls | Average TTSP | 2% T...
>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
>
>   Simplify code with less stacks

src/hotspot/share/gc/shenandoah/shenandoahLock.cpp line 47:

> 45: void ShenandoahLock::contended_lock_internal(JavaThread* java_thread) {
> 46:   assert(!ALLOW_BLOCK || java_thread != nullptr, "Must have a Java thread when allowing block.");
> 47:   // Spin this much on multi-processor, do not spin on multi-processor.

>do not spin on multi-processor

Did you mean on **single**-processor?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19570#discussion_r1668249587

From shade at openjdk.org Mon Jul 8 11:04:41 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 8 Jul 2024 11:04:41 GMT
Subject: RFR: 8331411: Shenandoah: Reconsider spinning duration in ShenandoahLock [v7]
In-Reply-To:
References:

Message-ID: <8NhfPejZeG763Q6yny929XiwoBRC23XW8rGO-zSHtMM=.7acf73ff-9921-4537-b895-e94f29a569d8@github.com> On Mon, 8 Jul 2024 08:51:38 GMT, Andrey Turbanov wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Simplify code with less stacks > > src/hotspot/share/gc/shenandoah/shenandoahLock.cpp line 47: > >> 45: void ShenandoahLock::contended_lock_internal(JavaThread* java_thread) { >> 46: assert(!ALLOW_BLOCK || java_thread != nullptr, "Must have a Java thread when allowing block."); >> 47: // Spin this much on multi-processor, do not spin on multi-processor. > >>do not spin on multi-processor > > Did you mean on **single**-processor? Yes, I think so. @pengxiaolong, please do a simple follow-up fix? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19570#discussion_r1668426904 From xpeng at openjdk.org Mon Jul 8 14:34:38 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 8 Jul 2024 14:34:38 GMT Subject: RFR: 8331411: Shenandoah: Reconsider spinning duration in ShenandoahLock [v7] In-Reply-To: <8NhfPejZeG763Q6yny929XiwoBRC23XW8rGO-zSHtMM=.7acf73ff-9921-4537-b895-e94f29a569d8@github.com> References:

<8NhfPejZeG763Q6yny929XiwoBRC23XW8rGO-zSHtMM=.7acf73ff-9921-4537-b895-e94f29a569d8@github.com> Message-ID: On Mon, 8 Jul 2024 11:01:51 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shenandoah/shenandoahLock.cpp line 47: >> >>> 45: void ShenandoahLock::contended_lock_internal(JavaThread* java_thread) { >>> 46: assert(!ALLOW_BLOCK || java_thread != nullptr, "Must have a Java thread when allowing block."); >>> 47: // Spin this much on multi-processor, do not spin on multi-processor. >> >>>do not spin on multi-processor >> >> Did you mean on **single**-processor? > > Yes, I think so. @pengxiaolong, please do a simple follow-up fix? Thanks, I'll fix it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19570#discussion_r1668758318 From duke at openjdk.org Mon Jul 8 16:22:33 2024 From: duke at openjdk.org (duke) Date: Mon, 8 Jul 2024 16:22:33 GMT Subject: RFR: 8335126: Shenandoah: Improve OOM handling In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 17:51:36 GMT, Kelvin Nilsen wrote: > 1. Throw OOM after failed allocation request following a Full GC (rather > than retrying as long as Full GC makes good progress because > repeatedly retrying the allocation request creates brown-out behavior > with no identified benefits on real-world workloads) > > 2. Count a successful allocation following a blocking > handle_allocation_failure() request to be good GC progress. > Otherwise, we increment gc_no_progress_count in full GCs that > have bad progress but successful allocations, and this causes > unwanted failure to even try a full GC in a different thread after > an out-of-memory condition might have been resolved in this thread. > > 3. Count a completed concurrent GC cycle as good progress, regardless > of how much memory it might have been able to reclaim. The fact that > concurrent GC succeeded without allocation failure and without > degeneration is considered good progress. 
Successful concurrent > GCs between Full GCs will reset the gc_no_progress_count to zero. > > 4. Do not count degenerated cycles as having no-progress. If a > degenerated cycle has no progress, it will upgrade to full GC. > The upgraded full GC will evaluate its own progress. We don't > want to count this "same [upgraded] cycle" twice. > > These changes have been tested over a variety of workloads and standard tests. These changes have also been tested with the generational mode of Shenandoah. It appears these changes provide more robust and consistent handling across a diversity of scenarios than the original implementation. @kdnilsen Your change (at version 439d394cab24ea9577550d3e44bbb02c1486dba8) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19912#issuecomment-2214585906 From kdnilsen at openjdk.org Mon Jul 8 16:22:34 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 8 Jul 2024 16:22:34 GMT Subject: RFR: 8335126: Shenandoah: Improve OOM handling In-Reply-To: <45BHtt51LihYveREtNv8vM0saAiwbeqWrSP8cyxm1EM=.39941e81-f71c-4333-b47c-516bb69375fb@github.com> References: <45BHtt51LihYveREtNv8vM0saAiwbeqWrSP8cyxm1EM=.39941e81-f71c-4333-b47c-516bb69375fb@github.com> Message-ID: <3hPxig-gI0Ac-H6ONjjc1NW-dmLW5erepRiVqG8J2dg=.d2c64ab7-5991-4059-a4d5-424211bd1987@github.com> On Thu, 27 Jun 2024 19:02:04 GMT, Aleksey Shipilev wrote: >> 1. Throw OOM after failed allocation request following a Full GC (rather >> than retrying as long as Full GC makes good progress because >> repeatedly retrying the allocation request creates brown-out behavior >> with no identified benefits on real-world workloads) >> >> 2. Count a successful allocation following a blocking >> handle_allocation_failure() request to be good GC progress. 
>> Otherwise, we increment gc_no_progress_count in full GCs that >> have bad progress but successful allocations, and this causes >> unwanted failure to even try a full GC in a different thread after >> an out-of-memory condition might have been resolved in this thread. >> >> 3. Count a completed concurrent GC cycle as good progress, regardless >> of how much memory it might have been able to reclaim. The fact that >> concurrent GC succeeded without allocation failure and without >> degeneration is considered good progress. Successful concurrent >> GCs between Full GCs will reset the gc_no_progress_count to zero. >> >> 4. Do not count degenerated cycles as having no-progress. If a >> degenerated cycle has no progress, it will upgrade to full GC. >> The upgraded full GC will evaluate its own progress. We don't >> want to count this "same [upgraded] cycle" twice. >> >> These changes have been tested over a variety of workloads and standard tests. These changes have also been tested with the generational mode of Shenandoah. It appears these changes provide more robust and consistent handling across a diversity of scenarios than the original implementation. > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 960: > >> 958: // b) We experienced at least one Full GC (whether or not it had good progress) >> 959: // >> 960: // TODO: Rather than require a Full GC before throwing OOMError, it might be more appropriate for handle_alloc_failure() > > Pro-tip: If you find yourself writing a large TODO comment, it should probably be transplanted straight into a new issue. Thanks. I will create an issue for this. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19912#discussion_r1668933461 From xpeng at openjdk.org Mon Jul 8 16:48:44 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 8 Jul 2024 16:48:44 GMT Subject: RFR: 8335904: Fix invalid comment in ShenandoahLock Message-ID: Hi all, This PR is to fix an invalid comment in ShenandoahLock I introduced in https://github.com/openjdk/jdk/pull/19570/files#r1668249587, thank you turbanoff@ for catching this, it is easy to miss since the PR had been closed. This PR should be trivial. Best, Xiaolong. ------------- Commit messages: - 8335904: Fix invalid comment in ShenandoahLock Changes: https://git.openjdk.org/jdk/pull/20079/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20079&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335904 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20079/head:pull/20079 PR: https://git.openjdk.org/jdk/pull/20079 From xpeng at openjdk.org Mon Jul 8 16:52:39 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 8 Jul 2024 16:52:39 GMT Subject: RFR: 8331411: Shenandoah: Reconsider spinning duration in ShenandoahLock [v7] In-Reply-To: References:

<8NhfPejZeG763Q6yny929XiwoBRC23XW8rGO-zSHtMM=.7acf73ff-9921-4537-b895-e94f29a569d8@github.com> Message-ID: On Mon, 8 Jul 2024 14:31:45 GMT, Xiaolong Peng wrote: >> Yes, I think so. @pengxiaolong, please do a simple follow-up fix? > > Thanks, I'll fix it. https://github.com/openjdk/jdk/pull/20079 I have created a bug and PR for it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19570#discussion_r1668969590 From kdnilsen at openjdk.org Mon Jul 8 17:09:48 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 8 Jul 2024 17:09:48 GMT Subject: RFR: 8335126: Shenandoah: Improve OOM handling [v2] In-Reply-To: References: Message-ID: > 1. Throw OOM after failed allocation request following a Full GC (rather > than retrying as long as Full GC makes good progress because > repeatedly retrying the allocation request creates brown-out behavior > with no identified benefits on real-world workloads) > > 2. Count a successful allocation following a blocking > handle_allocation_failure() request to be good GC progress. > Otherwise, we increment gc_no_progress_count in full GCs that > have bad progress but successful allocations, and this causes > unwanted failure to even try a full GC in a different thread after > an out-of-memory condition might have been resolved in this thread. > > 3. Count a completed concurrent GC cycle as good progress, regardless > of how much memory it might have been able to reclaim. The fact that > concurrent GC succeeded without allocation failure and without > degeneration is considered good progress. Successful concurrent > GCs between Full GCs will reset the gc_no_progress_count to zero. > > 4. Do not count degenerated cycles as having no-progress. If a > degenerated cycle has no progress, it will upgrade to full GC. > The upgraded full GC will evaluate its own progress. We don't > want to count this "same [upgraded] cycle" twice. > > These changes have been tested over a variety of workloads and standard tests. 
These changes have also been tested with the generational mode of Shenandoah. It appears these changes provide more robust and consistent handling across a diversity of scenarios than the original implementation. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Change TODO comment to reference JBS issue ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19912/files - new: https://git.openjdk.org/jdk/pull/19912/files/439d394c..24cd0fa6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19912&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19912&range=00-01 Stats: 15 lines in 1 file changed: 0 ins; 14 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19912.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19912/head:pull/19912 PR: https://git.openjdk.org/jdk/pull/19912 From shade at openjdk.org Mon Jul 8 17:48:34 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jul 2024 17:48:34 GMT Subject: RFR: 8335904: Fix invalid comment in ShenandoahLock In-Reply-To: References: Message-ID: On Mon, 8 Jul 2024 16:43:00 GMT, Xiaolong Peng wrote: > Hi all, > This PR is to fix an invalid comment in ShenandoahLock I introduced in https://github.com/openjdk/jdk/pull/19570/files#r1668249587, thank you turbanoff@ for catching this, it is easy to miss since the PR had been closed. > This PR should be trivial. > > Best, > Xiaolong. This looks okay. src/hotspot/share/gc/shenandoah/shenandoahLock.cpp line 47: > 45: void ShenandoahLock::contended_lock_internal(JavaThread* java_thread) { > 46: assert(!ALLOW_BLOCK || java_thread != nullptr, "Must have a Java thread when allowing block."); > 47: // Spin this much on multi-processor, do not spin on single-processor. I think the proper nomenclature is "uniprocessor". We can avoid all this if we say e.g. "// Spin this much, but only on multi-processor systems" ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20079#pullrequestreview-2163983319 PR Review Comment: https://git.openjdk.org/jdk/pull/20079#discussion_r1669040268 From shade at openjdk.org Mon Jul 8 17:49:34 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jul 2024 17:49:34 GMT Subject: RFR: 8335126: Shenandoah: Improve OOM handling [v2] In-Reply-To: References:

Message-ID: On Mon, 8 Jul 2024 17:09:48 GMT, Kelvin Nilsen wrote: >> 1. Throw OOM after failed allocation request following a Full GC (rather >> than retrying as long as Full GC makes good progress because >> repeatedly retrying the allocation request creates brown-out behavior >> with no identified benefits on real-world workloads) >> >> 2. Count a successful allocation following a blocking >> handle_allocation_failure() request to be good GC progress. >> Otherwise, we increment gc_no_progress_count in full GCs that >> have bad progress but successful allocations, and this causes >> unwanted failure to even try a full GC in a different thread after >> an out-of-memory condition might have been resolved in this thread. >> >> 3. Count a completed concurrent GC cycle as good progress, regardless >> of how much memory it might have been able to reclaim. The fact that >> concurrent GC succeeded without allocation failure and without >> degeneration is considered good progress. Successful concurrent >> GCs between Full GCs will reset the gc_no_progress_count to zero. >> >> 4. Do not count degenerated cycles as having no-progress. If a >> degenerated cycle has no progress, it will upgrade to full GC. >> The upgraded full GC will evaluate its own progress. We don't >> want to count this "same [upgraded] cycle" twice. >> >> These changes have been tested over a variety of workloads and standard tests. These changes have also been tested with the generational mode of Shenandoah. It appears these changes provide more robust and consistent handling across a diversity of scenarios than the original implementation. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Change TODO comment to reference JBS issue @kdnilsen Your change (at version 24cd0fa655526ae7abc68630b07ba1eb5b094187) is now ready to be sponsored by a Committer. 
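For readers following the four rules quoted above, here is a minimal sketch of the no-progress bookkeeping they describe. It is not the HotSpot implementation: `ProgressTracker`, its method names, and `should_throw_oom` are hypothetical names chosen for illustration, and resetting the counter after a degenerated cycle that did make progress is an assumption the thread does not confirm.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical stand-in for the gc_no_progress_count bookkeeping.
// All names here are illustrative, not the ShenandoahHeap API.
class ProgressTracker {
  std::atomic<std::size_t> _gc_no_progress_count{0};

public:
  // Rule 2: an allocation that succeeds after a blocking
  // allocation-failure request counts as good progress.
  void on_alloc_success_after_blocking_gc() { _gc_no_progress_count.store(0); }

  // Rule 3: a concurrent cycle that completes without allocation failure
  // or degeneration counts as good progress, whatever it reclaimed.
  void on_concurrent_cycle_completed() { _gc_no_progress_count.store(0); }

  // Rule 4: a degenerated cycle never increments the counter. Without
  // progress it upgrades to a full GC, which reports for itself.
  // Assumption: with progress, the counter is reset.
  void on_degenerated_cycle(bool good_progress) {
    if (good_progress) {
      _gc_no_progress_count.store(0);
    }
  }

  // Only a full GC can record "no progress".
  void on_full_gc(bool good_progress) {
    if (good_progress) {
      _gc_no_progress_count.store(0);
    } else {
      _gc_no_progress_count.fetch_add(1);
    }
  }

  std::size_t no_progress_count() const { return _gc_no_progress_count.load(); }
};

// Rule 1: once an allocation has failed again after a full GC, throw OOM
// rather than retrying for as long as full GCs report good progress.
inline bool should_throw_oom(bool allocation_failed, bool full_gc_completed) {
  return allocation_failed && full_gc_completed;
}
```

The point of the sketch is that only `on_full_gc(false)` ever raises the counter, so an upgraded degenerated cycle is counted once, by the full GC it turns into.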
------------- PR Comment: https://git.openjdk.org/jdk/pull/19912#issuecomment-2214843005 From kdnilsen at openjdk.org Mon Jul 8 18:05:43 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 8 Jul 2024 18:05:43 GMT Subject: Integrated: 8335126: Shenandoah: Improve OOM handling In-Reply-To: References: Message-ID: On Wed, 26 Jun 2024 17:51:36 GMT, Kelvin Nilsen wrote: > 1. Throw OOM after failed allocation request following a Full GC (rather > than retrying as long as Full GC makes good progress because > repeatedly retrying the allocation request creates brown-out behavior > with no identified benefits on real-world workloads) > > 2. Count a successful allocation following a blocking > handle_allocation_failure() request to be good GC progress. > Otherwise, we increment gc_no_progress_count in full GCs that > have bad progress but successful allocations, and this causes > unwanted failure to even try a full GC in a different thread after > an out-of-memory condition might have been resolved in this thread. > > 3. Count a completed concurrent GC cycle as good progress, regardless > of how much memory it might have been able to reclaim. The fact that > concurrent GC succeeded without allocation failure and without > degeneration is considered good progress. Successful concurrent > GCs between Full GCs will reset the gc_no_progress_count to zero. > > 4. Do not count degenerated cycles as having no-progress. If a > degenerated cycle has no progress, it will upgrade to full GC. > The upgraded full GC will evaluate its own progress. We don't > want to count this "same [upgraded] cycle" twice. > > These changes have been tested over a variety of workloads and standard tests. These changes have also been tested with the generational mode of Shenandoah. It appears these changes provide more robust and consistent handling across a diversity of scenarios than the original implementation. This pull request has now been integrated. 
Changeset: 3a87eb5c Author: Kelvin Nilsen URL: https://git.openjdk.org/jdk/commit/3a87eb5c4606ce39970962895315567e8606eba7 Stats: 33 lines in 3 files changed: 13 ins; 1 del; 19 mod 8335126: Shenandoah: Improve OOM handling Reviewed-by: shade, ysr, wkemper, rkennke ------------- PR: https://git.openjdk.org/jdk/pull/19912 From ysr at openjdk.org Mon Jul 8 18:17:55 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 8 Jul 2024 18:17:55 GMT Subject: RFR: 8327000: GenShen: Integrate updated Shenandoah implementation of FreeSet into GenShen In-Reply-To: References: Message-ID: <9sjiWgkSN4iOQKtCZekuBLi0odQe98-1zsElsuLMj4c=.e1d2608c-98e4-4c04-8996-52d405ddaedf@github.com> On Thu, 27 Jun 2024 20:38:32 GMT, Kelvin Nilsen wrote: > This pull request contains a backport of commit 9d2712dd from the openjdk/shenandoah repository. > > The commit being backported was authored by Kelvin Nilsen on 26 Jun 2024 and was reviewed by William Kemper and Y. Srinivas Ramakrishna. LGTM. ------------- Marked as reviewed by ysr (Committer). PR Review: https://git.openjdk.org/shenandoah-jdk21u/pull/62#pullrequestreview-2164045317 From xpeng at openjdk.org Mon Jul 8 18:29:02 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 8 Jul 2024 18:29:02 GMT Subject: RFR: 8335904: Fix invalid comment in ShenandoahLock [v2] In-Reply-To: References: Message-ID: > Hi all, > This PR is to fix an invalid comment in ShenandoahLock I introduced in https://github.com/openjdk/jdk/pull/19570/files#r1668249587, thank you turbanoff@ for catching this, it is easy to miss since the PR had been closed. > This PR should be trivial. > > Best, > Xiaolong. 
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: Update comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20079/files - new: https://git.openjdk.org/jdk/pull/20079/files/9d2c31f1..ad7dc6a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20079&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20079&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20079/head:pull/20079 PR: https://git.openjdk.org/jdk/pull/20079 From xpeng at openjdk.org Mon Jul 8 18:29:02 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 8 Jul 2024 18:29:02 GMT Subject: RFR: 8335904: Fix invalid comment in ShenandoahLock [v2] In-Reply-To: References:

Message-ID: On Mon, 8 Jul 2024 17:43:34 GMT, Aleksey Shipilev wrote: >> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Update comments > > src/hotspot/share/gc/shenandoah/shenandoahLock.cpp line 47: > >> 45: void ShenandoahLock::contended_lock_internal(JavaThread* java_thread) { >> 46: assert(!ALLOW_BLOCK || java_thread != nullptr, "Must have a Java thread when allowing block."); >> 47: // Spin this much on multi-processor, do not spin on single-processor. > > I think the proper nomenclature is "uniprocessor". We can avoid all this if we say e.g. "// Spin this much, but only on multi-processor systems" Thanks, suggestion is taken, I updated the PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20079#discussion_r1669092329 From shade at openjdk.org Mon Jul 8 19:15:35 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jul 2024 19:15:35 GMT Subject: RFR: 8335904: Fix invalid comment in ShenandoahLock [v2] In-Reply-To: References:

Message-ID: On Mon, 8 Jul 2024 18:29:02 GMT, Xiaolong Peng wrote: >> Hi all, >> This PR is to fix an invalid comment in ShenandoahLock I introduced in https://github.com/openjdk/jdk/pull/19570/files#r1668249587, thank you turbanoff@ for catching this, it is easy to miss since the PR had been closed. >> This PR should be trivial. >> >> Best, >> Xiaolong. > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > Update comments @pengxiaolong Your change (at version ad7dc6a36d777ddc8811ee05b8307fb6b55bc763) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20079#issuecomment-2215117981 From xpeng at openjdk.org Mon Jul 8 20:13:35 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 8 Jul 2024 20:13:35 GMT Subject: Integrated: 8335904: Fix invalid comment in ShenandoahLock In-Reply-To: References: Message-ID: On Mon, 8 Jul 2024 16:43:00 GMT, Xiaolong Peng wrote: > Hi all, > This PR is to fix an invalid comment in ShenandoahLock I introduced in https://github.com/openjdk/jdk/pull/19570/files#r1668249587, thank you turbanoff@ for catching this, it is easy to miss since the PR had been closed. > This PR should be trivial. > > Best, > Xiaolong. This pull request has now been integrated. 
Changeset: bb1f8a16 Author: Xiaolong Peng Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/bb1f8a1698553d5962569ac8912edd0d7ef010dd Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8335904: Fix invalid comment in ShenandoahLock Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/20079 From kdnilsen at openjdk.org Mon Jul 8 21:34:56 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 8 Jul 2024 21:34:56 GMT Subject: RFR: 8335930: GenShen: Reserve regions within each generation's freeset until available is sufficient Message-ID: Reserve regions until available memory within each generation's partition is sufficient to satisfy reserve request. ------------- Commit messages: - Reserve until available in partition is sufficient - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Revert "Change behavior of max_old and min_old" - ... 
and 9 more: https://git.openjdk.org/shenandoah/compare/d2102347...b226b32a Changes: https://git.openjdk.org/shenandoah/pull/457/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=457&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335930 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/shenandoah/pull/457.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/457/head:pull/457 PR: https://git.openjdk.org/shenandoah/pull/457 From wkemper at openjdk.org Mon Jul 8 21:53:58 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 8 Jul 2024 21:53:58 GMT Subject: RFR: 8335930: GenShen: Reserve regions within each generation's freeset until available is sufficient In-Reply-To: References: Message-ID: On Mon, 8 Jul 2024 21:30:23 GMT, Kelvin Nilsen wrote: > Reserve regions until available memory within each generation's partition is sufficient to satisfy reserve request. LGTM ------------- Marked as reviewed by wkemper (Committer). PR Review: https://git.openjdk.org/shenandoah/pull/457#pullrequestreview-2164502168 From wkemper at openjdk.org Mon Jul 8 21:54:23 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 8 Jul 2024 21:54:23 GMT Subject: RFR: 8335932: GenShen: Fix old heuristic unit test Message-ID: This isn't really a unit test and it must be run with Shenandoah enabled, but it exercises a hard to reach code path. The test has suffered some bit-rot* of late that needs to be addressed. * We no longer set soft-max-capacity on the old generation. 
------------- Commit messages: - Fix old heuristic test Changes: https://git.openjdk.org/shenandoah/pull/458/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=458&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335932 Stats: 11 lines in 1 file changed: 5 ins; 3 del; 3 mod Patch: https://git.openjdk.org/shenandoah/pull/458.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/458/head:pull/458 PR: https://git.openjdk.org/shenandoah/pull/458 From kdnilsen at openjdk.org Mon Jul 8 22:06:58 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 8 Jul 2024 22:06:58 GMT Subject: RFR: 8335932: GenShen: Fix old heuristic unit test In-Reply-To: References: Message-ID: On Mon, 8 Jul 2024 21:49:51 GMT, William Kemper wrote: > This isn't really a unit test and it must be run with Shenandoah enabled, but it exercises a hard to reach code path. The test has suffered some bit-rot* of late that needs to be addressed. > > * We no longer set soft-max-capacity on the old generation. Marked as reviewed by kdnilsen (Committer). test/hotspot/gtest/gc/shenandoah/test_shenandoahOldHeuristic.cpp line 89: > 87: _heap->heap_region_iterate(&reset); > 88: _heap->old_generation()->set_capacity(ShenandoahHeapRegion::region_size_bytes() * 10); > 89: _heap->old_generation()->set_evacuation_reserve(ShenandoahHeapRegion::region_size_bytes() * 4); I'll approve these changes because I assume you've tested and are satisfied with the results. Might be good to leave some comments in this code that warn the user that failure of this test does not necessarily mean we've broken old heuristics. What causes me some concern is that setting the capacity and reserves of old generation is very "dangerous". There are "assumptions" scattered throughout the implementation related to reserves being sufficient to handle anticipated evacuation, and furthermore, any attempt to take control over capacity and reserves may be defeated by the GC's ergonomics... 
------------- PR Review: https://git.openjdk.org/shenandoah/pull/458#pullrequestreview-2164517412 PR Review Comment: https://git.openjdk.org/shenandoah/pull/458#discussion_r1669383515 From kdnilsen at openjdk.org Mon Jul 8 22:13:46 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 8 Jul 2024 22:13:46 GMT Subject: RFR: 8335932: GenShen: Fix old heuristic unit test In-Reply-To: References:

Message-ID: On Mon, 8 Jul 2024 22:03:51 GMT, Kelvin Nilsen wrote: >> This isn't really a unit test and it must be run with Shenandoah enabled, but it exercises a hard to reach code path. The test has suffered some bit-rot* of late that needs to be addressed. >> >> * We no longer set soft-max-capacity on the old generation. > > test/hotspot/gtest/gc/shenandoah/test_shenandoahOldHeuristic.cpp line 89: > >> 87: _heap->heap_region_iterate(&reset); >> 88: _heap->old_generation()->set_capacity(ShenandoahHeapRegion::region_size_bytes() * 10); >> 89: _heap->old_generation()->set_evacuation_reserve(ShenandoahHeapRegion::region_size_bytes() * 4); > > I'll approve these changes because I assume you've tested and are satisfied with the results. > > Might be good to leave some comments in this code that warn the user that failure of this test does not necessarily mean we've broken old heuristics. > > What causes me some concern is that setting the capacity and reserves of old generation is very "dangerous". There are "assumptions" scattered throughout the implementation related to reserves being sufficient to handle anticipated evacuation, and furthermore, any attempt to take control over capacity and reserves may be defeated by the GC's ergonomics... The fourth argument to finish_rebuild (default value false) is a flag that is supposed to be true when evacuation_reserve() is meaningful. When this flag is true, we also assume that old_evacuation_reserve() and old_promo_reserve() are valid. If this flag is false, the freeset rebuilder will ignore the value of evacuation_reserve. I wonder how we make sure that the evacuation reserve we set in this "unit test" program is not ignored. (In the share-collector-reserves development branch, I get rid of this flag and assume reserves are always valid.) 
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/458#discussion_r1669388696 From wkemper at openjdk.org Mon Jul 8 23:26:10 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 8 Jul 2024 23:26:10 GMT Subject: RFR: 8335932: GenShen: Fix old heuristic unit test [v2] In-Reply-To: References: Message-ID: > This isn't really a unit test and it must be run with Shenandoah enabled, but it exercises a hard to reach code path. The test has suffered some bit-rot* of late that needs to be addressed. > > * We no longer set soft-max-capacity on the old generation. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Skip destructor actions when disabled ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/458/files - new: https://git.openjdk.org/shenandoah/pull/458/files/4e7e2b38..cf114eb5 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=458&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=458&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/458.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/458/head:pull/458 PR: https://git.openjdk.org/shenandoah/pull/458 From wkemper at openjdk.org Tue Jul 9 00:00:45 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 9 Jul 2024 00:00:45 GMT Subject: RFR: 8335932: GenShen: Fix old heuristic unit test [v2] In-Reply-To: References:

Message-ID: On Mon, 8 Jul 2024 22:11:31 GMT, Kelvin Nilsen wrote: >> test/hotspot/gtest/gc/shenandoah/test_shenandoahOldHeuristic.cpp line 89: >> >>> 87: _heap->heap_region_iterate(&reset); >>> 88: _heap->old_generation()->set_capacity(ShenandoahHeapRegion::region_size_bytes() * 10); >>> 89: _heap->old_generation()->set_evacuation_reserve(ShenandoahHeapRegion::region_size_bytes() * 4); >> >> I'll approve these changes because I assume you've tested and are satisfied with the results. >> >> Might be good to leave some comments in this code that warn the user that failure of this test does not necessarily mean we've broken old heuristics. >> >> What causes me some concern is that setting the capacity and reserves of old generation is very "dangerous". There are "assumptions" scattered throughout the implementation related to reserves being sufficient to handle anticipated evacuation, and furthermore, any attempt to take control over capacity and reserves may be defeated by the GC's ergonomics... > > The fourth argument to finish_rebuild (default value false) is a flag that is supposed to be true when evacuation_reserve() is meaningful. When this flag is true, we also assume that old_evacuation_reserve() and old_promo_reserve() are valid. If this flag is false, the freeset rebuilder will ignore the value of evacuation_reserve. > > I wonder how we make sure that the evacuation reserve we set in this "unit test" program is not ignored. > > (In the share-collector-reserves development branch, I get rid of this flag and assume reserves are always valid.) The tests are configuring the old generation with enough capacity and reserves to make sure old regions can be added to the collection set. Indeed, these tests are not true unit tests because they must be run with `-XX:+UseShenandoahGC` and if a _real_ collection happens during test execution, all bets are off. There is a lengthy comment at the top of the file explaining these risks and limitations. 
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/458#discussion_r1669458662 From wkemper at openjdk.org Tue Jul 9 17:31:26 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 9 Jul 2024 17:31:26 GMT Subject: Integrated: 8335932: GenShen: Fix old heuristic unit test In-Reply-To: References: Message-ID: On Mon, 8 Jul 2024 21:49:51 GMT, William Kemper wrote: > This isn't really a unit test and it must be run with Shenandoah enabled, but it exercises a hard to reach code path. The test has suffered some bit-rot* of late that needs to be addressed. > > * We no longer set soft-max-capacity on the old generation. This pull request has now been integrated. Changeset: f0b87713 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/f0b877138a7dd45fc957178de23cffe6e32405a0 Stats: 12 lines in 1 file changed: 6 ins; 3 del; 3 mod 8335932: GenShen: Fix old heuristic unit test Reviewed-by: kdnilsen ------------- PR: https://git.openjdk.org/shenandoah/pull/458 From wkemper at openjdk.org Tue Jul 9 18:34:01 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 9 Jul 2024 18:34:01 GMT Subject: RFR: 8335932: GenShen: Fix old heuristic unit test Message-ID: Clean backport, test only, low risk. 
------------- Commit messages: - Backport f0b877138a7dd45fc957178de23cffe6e32405a0 Changes: https://git.openjdk.org/shenandoah-jdk21u/pull/64/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=64&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335932 Stats: 12 lines in 1 file changed: 6 ins; 3 del; 3 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/64.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/64/head:pull/64 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/64 From wkemper at openjdk.org Tue Jul 9 18:43:36 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 9 Jul 2024 18:43:36 GMT Subject: Integrated: 8335932: GenShen: Fix old heuristic unit test In-Reply-To: References: Message-ID: On Tue, 9 Jul 2024 18:08:10 GMT, William Kemper wrote: > Clean backport, test only, low risk. This pull request has now been integrated. Changeset: 2d6289f7 Author: William Kemper URL: https://git.openjdk.org/shenandoah-jdk21u/commit/2d6289f7a2536bb7e320a26770a59a0f3e77fa77 Stats: 12 lines in 1 file changed: 6 ins; 3 del; 3 mod 8335932: GenShen: Fix old heuristic unit test Backport-of: f0b877138a7dd45fc957178de23cffe6e32405a0 ------------- PR: https://git.openjdk.org/shenandoah-jdk21u/pull/64 From kdnilsen at openjdk.org Tue Jul 9 21:22:44 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 9 Jul 2024 21:22:44 GMT Subject: RFR: 8334315: Shenandoah: reduce GC logging noise Message-ID: Qualify certain log messages as ergo or debug to remove extraneous noise from regular gc logs. 
------------- Commit messages: - Reduce verbosity of gc log messages - Revert "Make GC logging less verbose" - Make GC logging less verbose - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - ... and 9 more: https://git.openjdk.org/jdk/compare/8464ce6d...2a74ecc4 Changes: https://git.openjdk.org/jdk/pull/19795/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19795&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334315 Stats: 17 lines in 1 file changed: 0 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/19795.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19795/head:pull/19795 PR: https://git.openjdk.org/jdk/pull/19795 From wkemper at openjdk.org Tue Jul 9 21:35:14 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 9 Jul 2024 21:35:14 GMT Subject: RFR: 8334315: Shenandoah: reduce GC logging noise In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 15:11:52 GMT, Kelvin Nilsen wrote: > Qualify certain log messages as ergo or debug to remove extraneous noise from regular gc logs. Marked as reviewed by wkemper (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19795#pullrequestreview-2167539722 From ysr at openjdk.org Tue Jul 9 22:01:17 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 9 Jul 2024 22:01:17 GMT Subject: RFR: 8334315: Shenandoah: reduce GC logging noise In-Reply-To: References: Message-ID: <_gAlo8USEPJAToN76JC667RoUbTdq_5z4UfvReAMrvY=.08520b88-301f-4316-b5c2-560b65d2c34b@github.com> On Wed, 19 Jun 2024 15:11:52 GMT, Kelvin Nilsen wrote: > Qualify certain log messages as ergo or debug to remove extraneous noise from regular gc logs. ------------- Marked as reviewed by ysr (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19795#pullrequestreview-2167578144

From xpeng at openjdk.org Tue Jul 9 23:53:23 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Tue, 9 Jul 2024 23:53:23 GMT Subject: RFR: 8335356: Shenandoah: Improve concurrent cleanup locking Message-ID:

Hi all,

This PR improves how ShenandoahFreeSet::recycle_trash uses the heap lock. The original observation mentioned in the [bug](https://bugs.openjdk.org/browse/JDK-8335356) was most likely caused by an uncommitted/reverted change I added when Aleksey and I worked on [JDK-8331411](https://bugs.openjdk.org/browse/JDK-8331411). Even though that change was never committed, the way ShenandoahFreeSet::recycle_trash uses the heap lock is still inefficient, and we think we should improve it.

With the logs added in this commit https://github.com/openjdk/jdk/pull/20086/commits/5688ee2c0754483818a89bb7915f58a7464c2df2, I got some key metrics: the average time to acquire the heap lock is about 450 ~ 900 ns, and the average time to recycle one trash region is about 600 ns (if not batched, it is 1000+ ns, which might be related to branch prediction). The current implementation takes the heap lock once for every trash region; assuming there are 1000 regions to recycle, the time wasted on acquiring the heap lock is more than 0.6ms.

The PR splits the recycling process into two steps:
1. Filter out all the trash regions;
2. Recycle the trash regions in batches.

I can see some benefits which improve the performance:
1. Less time spent acquiring the heap lock, and less contention with mutators/allocators
2. Simpler loops in filtering and batch recycling, which presumably benefits CPU branch prediction

Here are some logs from a test run of the h2 benchmark:

TIP with debug log, [code link](https://github.com/openjdk/jdk/compare/master...pengxiaolong:jdk:JDK-8335356-baseline?expand=1), Average time per region: 2312 ns
[6.013s][info][gc] GC(0) Recycled 0 regions in 58675ns, break down: acquiring lock -> 0, recycling -> 0.
[6.093s][info][gc] GC(0) Recycled 641 regions in 3025757ns, break down: acquiring lock -> 260016, recycling -> 548345.
[9.354s][info][gc] GC(1) Recycled 1 regions in 61793ns, break down: acquiring lock -> 481, recycling -> 1141.
[9.428s][info][gc] GC(1) Recycled 600 regions in 1083206ns, break down: acquiring lock -> 256578, recycling -> 511334.
[12.145s][info][gc] GC(2) Recycled 35 regions in 118390ns, break down: acquiring lock -> 13703, recycling -> 27438.
[12.202s][info][gc] GC(2) Recycled 553 regions in 911747ns, break down: acquiring lock -> 209511, recycling -> 426575.
[15.086s][info][gc] GC(3) Recycled 106 regions in 218396ns, break down: acquiring lock -> 39089, recycling -> 80520.
[15.164s][info][gc] GC(3) Recycled 454 regions in 762128ns, break down: acquiring lock -> 172583, recycling -> 351263.
[18.781s][info][gc] GC(4) Recycled 119 regions in 244275ns, break down: acquiring lock -> 45741, recycling -> 92841.
[18.866s][info][gc] GC(4) Recycled 437 regions in 721638ns, break down: acquiring lock -> 162149, recycling -> 329194.
[20.735s][info][gc] GC(5) Recycled 119 regions in 244292ns, break down: acquiring lock -> 45992, recycling -> 93320.
[20.788s][info][gc] GC(5) Recycled 193 regions in 364782ns, break down: acquiring lock -> 73267, recycling -> 149695.
[21.699s][info][gc] GC(6) Recycled 0 regions in 92333ns, break down: acquiring lock -> 0, recycling -> 0.
[21.856s][info][gc] GC(6) Recycled 552 regions in 1372852ns, break down: acquiring lock -> 302017, recycling -> 621198.
[22.196s][info][gc] GC(7) Recycled 0 regions in 80586ns, break down: acquiring lock -> 0, recycling -> 0.
[22.361s][info][gc] GC(7) Recycled 531 regions in 1433166ns, break down: acquiring lock -> 365550, recycling -> 632302.
[22.720s][info][gc] GC(8) Recycled 0 regions in 74306ns, break down: acquiring lock -> 0, recycling -> 0.
[22.898s][info][gc] GC(8) Recycled 530 regions in 1331485ns, break down: acquiring lock -> 299571, recycling -> 620722.
[23.147s][info][gc] GC(9) Recycled 0 regions in 77732ns, break down: acquiring lock -> 0, recycling -> 0.
[23.331s][info][gc] GC(9) Recycled 531 regions in 1361709ns, break down: acquiring lock -> 311774, recycling -> 653233.
[24.257s][info][gc] GC(10) Recycled 0 regions in 62440ns, break down: acquiring lock -> 0, recycling -> 0.
[24.460s][info][gc] GC(10) Recycled 1480 regions in 3611397ns, break down: acquiring lock -> 906760, recycling -> 1729345.
[25.407s][info][gc] GC(11) Recycled 1 regions in 61455ns, break down: acquiring lock -> 585, recycling -> 1362.
[25.597s][info][gc] GC(11) Recycled 1438 regions in 2313578ns, break down: acquiring lock -> 546099, recycling -> 1126582.

Optimized, but without batching (basically recycles all trash under a single lock acquisition), [code link](https://github.com/openjdk/jdk/pull/20086/commits/5688ee2c0754483818a89bb7915f58a7464c2df2), Average time per region: 560 ns

[6.097s][info][gc] GC(0) Recycled 641 regions in 280280ns, break down: filtering -> 20216, taking heap lock -> 569, recycling -> 259495.
[9.568s][info][gc] GC(1) Recycled 1 regions in 10592ns, break down: filtering -> 8695, taking heap lock -> 568, recycling -> 1329.
[9.643s][info][gc] GC(1) Recycled 600 regions in 260624ns, break down: filtering -> 8023, taking heap lock -> 736, recycling -> 251865.
[12.648s][info][gc] GC(2) Recycled 34 regions in 24651ns, break down: filtering -> 9739, taking heap lock -> 524, recycling -> 14388.
[12.706s][info][gc] GC(2) Recycled 552 regions in 252231ns, break down: filtering -> 26613, taking heap lock -> 624, recycling -> 224994.
[15.579s][info][gc] GC(3) Recycled 102 regions in 50140ns, break down: filtering -> 7846, taking heap lock -> 550, recycling -> 41744.
[15.662s][info][gc] GC(3) Recycled 461 regions in 187735ns, break down: filtering -> 8851, taking heap lock -> 479, recycling -> 178405.
[19.269s][info][gc] GC(4) Recycled 117 regions in 55504ns, break down: filtering -> 8709, taking heap lock -> 548, recycling -> 46247.
[19.360s][info][gc] GC(4) Recycled 437 regions in 187981ns, break down: filtering -> 9582, taking heap lock -> 505, recycling -> 177894.
[21.269s][info][gc] GC(5) Recycled 124 regions in 57666ns, break down: filtering -> 8986, taking heap lock -> 537, recycling -> 48143.
[21.327s][info][gc] GC(5) Recycled 190 regions in 85890ns, break down: filtering -> 7768, taking heap lock -> 494, recycling -> 77628.
[22.367s][info][gc] GC(6) Recycled 547 regions in 378074ns, break down: filtering -> 11634, taking heap lock -> 714, recycling -> 365726.
[22.733s][info][gc] GC(7) Recycled 2 regions in 27277ns, break down: filtering -> 24172, taking heap lock -> 741, recycling -> 2364.
[22.895s][info][gc] GC(7) Recycled 533 regions in 339216ns, break down: filtering -> 12006, taking heap lock -> 778, recycling -> 326432.
[23.213s][info][gc] GC(8) Recycled 1 regions in 28104ns, break down: filtering -> 25781, taking heap lock -> 722, recycling -> 1601.
[23.393s][info][gc] GC(8) Recycled 529 regions in 341289ns, break down: filtering -> 10257, taking heap lock -> 682, recycling -> 330350.
[23.861s][info][gc] GC(9) Recycled 523 regions in 339914ns, break down: filtering -> 12715, taking heap lock -> 746, recycling -> 326453.
[25.120s][info][gc] GC(10) Recycled 1515 regions in 1148153ns, break down: filtering -> 13906, taking heap lock -> 739, recycling -> 1133508.
[25.873s][info][gc] GC(11) Recycled 1 regions in 13233ns, break down: filtering -> 11375, taking heap lock -> 568, recycling -> 1290.
[26.109s][info][gc] GC(11) Recycled 1237 regions in 493244ns, break down: filtering -> 12178, taking heap lock -> 557, recycling -> 480509.

With batch size of 128, [code link](https://github.com/openjdk/jdk/pull/20086/commits/ece3f3c7612560e1b66ca1e56e896488e3c593bc), Average time per region: 533 ns

[6.066s][info][gc] GC(0) Recycled 641 regions in 290048ns, break down: filtering -> 20937, recycling -> 257691, yields -> 6088.
[9.514s][info][gc] GC(1) Recycled 1 regions in 12863ns, break down: filtering -> 9186, recycling -> 1255, yields -> 1487.
[9.591s][info][gc] GC(1) Recycled 601 regions in 285321ns, break down: filtering -> 11941, recycling -> 265095, yields -> 5540.
[12.590s][info][gc] GC(2) Recycled 35 regions in 27005ns, break down: filtering -> 9873, recycling -> 14893, yields -> 1341.
[12.650s][info][gc] GC(2) Recycled 551 regions in 231127ns, break down: filtering -> 10833, recycling -> 212840, yields -> 5054.
[15.504s][info][gc] GC(3) Recycled 101 regions in 54762ns, break down: filtering -> 9759, recycling -> 42579, yields -> 1500.
[15.591s][info][gc] GC(3) Recycled 466 regions in 197675ns, break down: filtering -> 9672, recycling -> 181928, yields -> 4095.
[19.231s][info][gc] GC(4) Recycled 121 regions in 58985ns, break down: filtering -> 8601, recycling -> 47931, yields -> 1565.
[19.322s][info][gc] GC(4) Recycled 439 regions in 186173ns, break down: filtering -> 9754, recycling -> 170179, yields -> 4269.
[21.204s][info][gc] GC(5) Recycled 120 regions in 59352ns, break down: filtering -> 8120, recycling -> 48746, yields -> 1602.
[21.257s][info][gc] GC(5) Recycled 191 regions in 89995ns, break down: filtering -> 8695, recycling -> 77421, yields -> 2596.
[22.291s][info][gc] GC(6) Recycled 550 regions in 344021ns, break down: filtering -> 7845, recycling -> 325227, yields -> 7251.
[22.789s][info][gc] GC(7) Recycled 535 regions in 352193ns, break down: filtering -> 11420, recycling -> 328723, yields -> 8067.
[23.265s][info][gc] GC(8) Recycled 530 regions in 344795ns, break down: filtering -> 12356, recycling -> 321523, yields -> 7389.
[23.731s][info][gc] GC(9) Recycled 526 regions in 268314ns, break down: filtering -> 11727, recycling -> 248182, yields -> 5403.
[24.913s][info][gc] GC(10) Recycled 1520 regions in 1035594ns, break down: filtering -> 15586, recycling -> 995272, yields -> 16585.
[25.827s][info][gc] GC(11) Recycled 1 regions in 12540ns, break down: filtering -> 9191, recycling -> 1162, yields -> 1286.
[25.997s][info][gc] GC(11) Recycled 1406 regions in 593511ns, break down: filtering -> 11703, recycling -> 566761, yields -> 10335.

Batch with timed lock up to 30us, PR version, Average time per region: 1118 ns

[6.103s][info][gc] GC(0) Recycled 0 regions in 9421ns with 0 batches.
[6.189s][info][gc] GC(0) Recycled 641 regions in 570953ns with 18 batches.
[9.481s][info][gc] GC(1) Recycled 1 regions in 11745ns with 1 batches.
[9.552s][info][gc] GC(1) Recycled 597 regions in 495402ns with 16 batches.
[12.295s][info][gc] GC(2) Recycled 35 regions in 36924ns with 1 batches.
[12.353s][info][gc] GC(2) Recycled 546 regions in 443037ns with 14 batches.
[15.226s][info][gc] GC(3) Recycled 100 regions in 92537ns with 3 batches.
[15.310s][info][gc] GC(3) Recycled 463 regions in 423945ns with 14 batches.
[19.031s][info][gc] GC(4) Recycled 118 regions in 107655ns with 4 batches.
[19.121s][info][gc] GC(4) Recycled 440 regions in 359861ns with 12 batches.
[21.094s][info][gc] GC(5) Recycled 125 regions in 108325ns with 4 batches.
[21.155s][info][gc] GC(5) Recycled 191 regions in 192925ns with 6 batches.
[22.038s][info][gc] GC(6) Recycled 0 regions in 23493ns with 0 batches.
[22.213s][info][gc] GC(6) Recycled 574 regions in 748833ns with 23 batches.
[22.548s][info][gc] GC(7) Recycled 1 regions in 24498ns with 1 batches.
[22.716s][info][gc] GC(7) Recycled 532 regions in 684182ns with 21 batches.
[23.232s][info][gc] GC(8) Recycled 1 regions in 14411ns with 1 batches.
[23.455s][info][gc] GC(8) Recycled 528 regions in 715436ns with 22 batches.
[23.766s][info][gc] GC(9) Recycled 0 regions in 12247ns with 0 batches.
[23.982s][info][gc] GC(9) Recycled 703 regions in 685842ns with 22 batches.
[24.557s][info][gc] GC(10) Recycled 1 regions in 15890ns with 1 batches.
[24.760s][info][gc] GC(10) Recycled 1142 regions in 1506585ns with 47 batches.
[25.524s][info][gc] GC(11) Recycled 1 regions in 12424ns with 1 batches.
[25.731s][info][gc] GC(11) Recycled 1262 regions in 1695506ns with 52 batches.

I decided on batching with a timed lock for the following reasons:
1. We can control how long each batch holds the heap lock (set to up to 30us), and therefore control how much it can affect tail latencies in the worst case.
2. Unlike a static batch size, the timed lock makes the algorithm adaptive to different hardware and runtime conditions; the batch size adjusts automatically.

Additional test:
- [x] ```make clean test TEST=hotspot_gc_shenandoah```

==============================
TEST                                              TOTAL  PASS  FAIL ERROR
jtreg:test/hotspot/jtreg:hotspot_gc_shenandoah      261   261     0     0
==============================
TEST SUCCESS

-------------

Commit messages:
 - Code polish
 - Remove logs
 - Increase deadline to 30us
 - Timed lock
 - Remove inline and code polish
 - Polish
 - Remove empty lines
 - Polish the loops for recycling trash regions
 - Fix -Werror=reorder
 - Pre-allocate array for trash regions used in recycle_trash
 - ... and 9 more: https://git.openjdk.org/jdk/compare/a9b7f42f...dac1ae6e

Changes: https://git.openjdk.org/jdk/pull/20086/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20086&range=00
 Issue: https://bugs.openjdk.org/browse/JDK-8335356
 Stats: 18 lines in 2 files changed: 14 ins; 1 del; 3 mod
 Patch: https://git.openjdk.org/jdk/pull/20086.diff
 Fetch: git fetch https://git.openjdk.org/jdk.git pull/20086/head:pull/20086

PR: https://git.openjdk.org/jdk/pull/20086

From ysr at openjdk.org Wed Jul 10 07:37:15 2024
From: ysr at openjdk.org (Y.
Srinivas Ramakrishna) Date: Wed, 10 Jul 2024 07:37:15 GMT Subject: RFR: 8335356: Shenandoah: Improve concurrent cleanup locking In-Reply-To: References: Message-ID: On Mon, 8 Jul 2024 21:38:41 GMT, Xiaolong Peng wrote: > Hi all, > This PR is to improve the usage of heap lock in ShenandoahFreeSet::recycle_trash, the original observation mentioned in the [bug](https://bugs.openjdk.org/browse/JDK-8335356) should be caused by an uncommitted/reverted change I added when Aleksey and I worked on [JDK-8331411](https://bugs.openjdk.org/browse/JDK-8331411). Even the change was not committed, the way ShenandoahFreeSet::recycle_trash using heap lock is still not efficient, we think we should improve it. > With the logs added in this commit https://github.com/openjdk/jdk/pull/20086/commits/5688ee2c0754483818a89bb7915f58a7464c2df2, I got some key metrics: average time to acquire heap lock is about 450 ~ 900 ns, average time to recycle one trash regions is about 600ns (if not batched, it is 1000+ns, might be related to branch prediction). The current implementation takes heap lock once for every trash region, assume there are 1000 regions to recycle, the time wasted on acquiring heap lock is more than 0.6ms. > > The PR splits the recycling process into two steps: 1. Filter out all the trash regions; 2. recycle the trash regions in batches. I can see some benefits which improve the performance: > 1. Less time spent on acquiring heap lock, less contention with mutators/allocators > 2.Simpler loops in filtering and batch recycling, presumably benefit CPU branch prediction > > Here are some logs from test running h2 benchmark: > > TIP with debug log, [code link](https://github.com/openjdk/jdk/compare/master...pengxiaolong:jdk:JDK-8335356-baseline?expand=1), Average time per region: 2312 ns > > [6.013s][info][gc] GC(0) Recycled 0 regions in 58675ns, break down: acquiring lock -> 0, recycling -> 0. 
> [6.093s][info][gc] GC(0) Recycled 641 regions in 3025757ns, break down: acquiring lock -> 260016, recycling -> 548345. > [9.354s][info][gc] GC(1) Recycled 1 regions in 61793ns, break down: acquiring lock -> 481, recycling -> 1141. > [9.428s][info][gc] GC(1) Recycled 600 regions in 1083206ns, break down: acquiring lock -> 256578, recycling -> 511334. > [12.145s][info][gc] GC(2) Recycled 35 regions in 118390ns, break down: acquiring lock -> 13703, recycling -> 27438. > [12.202s][info][gc] GC(2) Recycled 553 regions in 911747ns, break down: acquiring lock -> 209511, recycling -> 426575. > [15.086s][info][gc] GC(3) Recycled 106 regions in 218396ns, break down: acquiring lock -> 39089, recycling -> 80520. > [15.164s][info][gc] GC(3) Recycled 454 regions in 762128ns, break down: acquiring lock -> 172583, recycl... Could you share any visible changes using the three different schemes (and baseline current) with say SPECjbb or such. Ideally, this affects some user-visible score or latency that we can use as a goodness metric that improves. I am a bit leery of why exactly 30 us, and not say 100 us. Also, I am thinking that a straight count might perform as well and the time-based solution almost seems overengineered to me -- or at least I'd like to see evidence that that engineering effort is worth the resulting bang for a service level metric such as latency or throughput. ------------- PR Review: https://git.openjdk.org/jdk/pull/20086#pullrequestreview-2168278271 From shade at openjdk.org Wed Jul 10 07:55:26 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 10 Jul 2024 07:55:26 GMT Subject: RFR: 8334315: Shenandoah: reduce GC logging noise In-Reply-To: References: Message-ID: On Wed, 19 Jun 2024 15:11:52 GMT, Kelvin Nilsen wrote: > Qualify certain log messages as ergo or debug to remove extraneous noise from regular gc logs. Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/19795#pullrequestreview-2168317699 From shade at openjdk.org Wed Jul 10 08:06:15 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 10 Jul 2024 08:06:15 GMT Subject: RFR: 8335356: Shenandoah: Improve concurrent cleanup locking In-Reply-To: References:

Message-ID:

On Wed, 10 Jul 2024 07:34:39 GMT, Y. Srinivas Ramakrishna wrote:

> Also, I am thinking that a straight count might perform as well and the time-based solution almost seems overengineered to me -- or at least I'd like to see evidence that that engineering effort is worth the resulting bang for a service level metric such as latency or throughput.

I suggested the time-based approach to Xiaolong to side-step the discussion about the "reasonable" batch size. A good batch size would vary across machines, heap sizes, and region counts. Since we are doing this whole dance to avoid hoarding the lock for a long time, which would increase tail latencies for allocators waiting for the same lock, it is more reasonable to just track the time directly here.

This is not to mention that fastdebug builds zap the unused heap, which makes cleanup orders of magnitude slower; large batch sizes would then hoard the lock way too much, deviating from the "normal" release behavior. The time-based approach accommodates this as well.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20086#issuecomment-2219822121

From shade at openjdk.org Wed Jul 10 09:02:21 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 10 Jul 2024 09:02:21 GMT
Subject: RFR: 8335356: Shenandoah: Improve concurrent cleanup locking
In-Reply-To:
References:
Message-ID:

On Mon, 8 Jul 2024 21:38:41 GMT, Xiaolong Peng wrote:

> Hi all,
> This PR is to improve the usage of heap lock in ShenandoahFreeSet::recycle_trash, the original observation mentioned in the [bug](https://bugs.openjdk.org/browse/JDK-8335356) should be caused by an uncommitted/reverted change I added when Aleksey and I worked on [JDK-8331411](https://bugs.openjdk.org/browse/JDK-8331411). Even the change was not committed, the way ShenandoahFreeSet::recycle_trash using heap lock is still not efficient, we think we should improve it.
> With the logs added in this commit https://github.com/openjdk/jdk/pull/20086/commits/5688ee2c0754483818a89bb7915f58a7464c2df2, I got some key metrics: average time to acquire heap lock is about 450 ~ 900 ns, average time to recycle one trash regions is about 600ns (if not batched, it is 1000+ns, might be related to branch prediction). The current implementation takes heap lock once for every trash region, assume there are 1000 regions to recycle, the time wasted on acquiring heap lock is more than 0.6ms. > > The PR splits the recycling process into two steps: 1. Filter out all the trash regions; 2. recycle the trash regions in batches. I can see some benefits which improve the performance: > 1. Less time spent on acquiring heap lock, less contention with mutators/allocators > 2.Simpler loops in filtering and batch recycling, presumably benefit CPU branch prediction > > Here are some logs from test running h2 benchmark: > > TIP with debug log, [code link](https://github.com/openjdk/jdk/compare/master...pengxiaolong:jdk:JDK-8335356-baseline?expand=1), Average time per region: 2312 ns > > [6.013s][info][gc] GC(0) Recycled 0 regions in 58675ns, break down: acquiring lock -> 0, recycling -> 0. > [6.093s][info][gc] GC(0) Recycled 641 regions in 3025757ns, break down: acquiring lock -> 260016, recycling -> 548345. > [9.354s][info][gc] GC(1) Recycled 1 regions in 61793ns, break down: acquiring lock -> 481, recycling -> 1141. > [9.428s][info][gc] GC(1) Recycled 600 regions in 1083206ns, break down: acquiring lock -> 256578, recycling -> 511334. > [12.145s][info][gc] GC(2) Recycled 35 regions in 118390ns, break down: acquiring lock -> 13703, recycling -> 27438. > [12.202s][info][gc] GC(2) Recycled 553 regions in 911747ns, break down: acquiring lock -> 209511, recycling -> 426575. > [15.086s][info][gc] GC(3) Recycled 106 regions in 218396ns, break down: acquiring lock -> 39089, recycling -> 80520. 
> [15.164s][info][gc] GC(3) Recycled 454 regions in 762128ns, break down: acquiring lock -> 172583, recycl...

Found an easy workload to demonstrate the impact on max latencies on allocation path.

    public class TimedAlloc {
        static volatile Object sink;

        public static void main(String... args) throws Throwable {
            for (int c = 0; c < 10; c++) {
                run();
            }
        }

        public static void run() {
            long cur = System.nanoTime();
            long end = cur + 3_000_000_000L;

            long sum = 0;
            long max = 0;
            long allocs = 0;

            while (cur < end) {
                long start = System.nanoTime();
                sink = new Object[10000];
                cur = System.nanoTime();
                long v = (cur - start);
                sum += v;
                max = Math.max(max, v);
                allocs++;
            }

            System.out.printf("Allocs: %15d; Avg: %8d, Max: %9d%n", allocs, (sum / allocs), max);
        }
    }

$ java -Xms30g -Xmx30g -XX:+AlwaysPreTouch -XX:+UseShenandoahGC ../TimedAlloc.java

# Baseline
Allocs: 3303444; Avg: 869, Max: 572869
Allocs: 3331117; Avg: 864, Max: 543211
Allocs: 3368134; Avg: 854, Max: 504501
Allocs: 3369721; Avg: 854, Max: 380658
Allocs: 3370053; Avg: 854, Max: 471238
Allocs: 3370687; Avg: 854, Max: 409657
Allocs: 3370261; Avg: 854, Max: 371527
Allocs: 3370627; Avg: 854, Max: 452230
Allocs: 3369164; Avg: 854, Max: 473957
Allocs: 3370499; Avg: 854, Max: 450008

# Patched
Allocs: 3311185; Avg: 866, Max: 58489
Allocs: 3338693; Avg: 861, Max: 81620
Allocs: 3378208; Avg: 852, Max: 81913
Allocs: 3379441; Avg: 852, Max: 69214
Allocs: 3380807; Avg: 851, Max: 186430
Allocs: 3380362; Avg: 851, Max: 61498
Allocs: 3380142; Avg: 851, Max: 84395
Allocs: 3379811; Avg: 851, Max: 81258
Allocs: 3380572; Avg: 851, Max: 55142
Allocs: 3379696; Avg: 852, Max: 81612

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20086#issuecomment-2219949530

From ysr at openjdk.org Wed Jul 10 20:07:44 2024
From: ysr at openjdk.org (Y.
Srinivas Ramakrishna) Date: Wed, 10 Jul 2024 20:07:44 GMT Subject: RFR: 8335356: Shenandoah: Improve concurrent cleanup locking In-Reply-To: References: Message-ID: <-_kQgM-yE1H8ZNX1chP1LcmpPdZYQgf0H7ggPN9XyYU=.24304200-8fa8-410a-aeeb-059095db8d5a@github.com> On Mon, 8 Jul 2024 21:38:41 GMT, Xiaolong Peng wrote: > Hi all, > This PR is to improve the usage of heap lock in ShenandoahFreeSet::recycle_trash, the original observation mentioned in the [bug](https://bugs.openjdk.org/browse/JDK-8335356) should be caused by an uncommitted/reverted change I added when Aleksey and I worked on [JDK-8331411](https://bugs.openjdk.org/browse/JDK-8331411). Even the change was not committed, the way ShenandoahFreeSet::recycle_trash using heap lock is still not efficient, we think we should improve it. > With the logs added in this commit https://github.com/openjdk/jdk/pull/20086/commits/5688ee2c0754483818a89bb7915f58a7464c2df2, I got some key metrics: average time to acquire heap lock is about 450 ~ 900 ns, average time to recycle one trash regions is about 600ns (if not batched, it is 1000+ns, might be related to branch prediction). The current implementation takes heap lock once for every trash region, assume there are 1000 regions to recycle, the time wasted on acquiring heap lock is more than 0.6ms. > > The PR splits the recycling process into two steps: 1. Filter out all the trash regions; 2. recycle the trash regions in batches. I can see some benefits which improve the performance: > 1. 
Less time spent on acquiring heap lock, less contention with mutators/allocators > 2.Simpler loops in filtering and batch recycling, presumably benefit CPU branch prediction > > Here are some logs from test running h2 benchmark: > > TIP with debug log, [code link](https://github.com/openjdk/jdk/compare/master...pengxiaolong:jdk:JDK-8335356-baseline?expand=1), Average time per region: 2312 ns > > [6.013s][info][gc] GC(0) Recycled 0 regions in 58675ns, break down: acquiring lock -> 0, recycling -> 0. > [6.093s][info][gc] GC(0) Recycled 641 regions in 3025757ns, break down: acquiring lock -> 260016, recycling -> 548345. > [9.354s][info][gc] GC(1) Recycled 1 regions in 61793ns, break down: acquiring lock -> 481, recycling -> 1141. > [9.428s][info][gc] GC(1) Recycled 600 regions in 1083206ns, break down: acquiring lock -> 256578, recycling -> 511334. > [12.145s][info][gc] GC(2) Recycled 35 regions in 118390ns, break down: acquiring lock -> 13703, recycling -> 27438. > [12.202s][info][gc] GC(2) Recycled 553 regions in 911747ns, break down: acquiring lock -> 209511, recycling -> 426575. > [15.086s][info][gc] GC(3) Recycled 106 regions in 218396ns, break down: acquiring lock -> 39089, recycling -> 80520. > [15.164s][info][gc] GC(3) Recycled 454 regions in 762128ns, break down: acquiring lock -> 172583, recycl... Marked as reviewed by ysr (Reviewer). Impressive and a nice demonstration of the improvements! Benchmarking with HyperAlloc may also be useful or even just SPECjbb may show some non-linear improvements, who knows? May be worth measuring, perhaps? Running the count-based and time-based on a (slow,fast) x (arm,x86) system to fill the matrix would be great, but may be more effort than worthwhile, but just putting it out there. Good data of actual measured improvements always makes me happy, though! :-) Thanks for the extra effort in collecting the data and sharing it. Reviewed and approved, thank you! 
------------- PR Review: https://git.openjdk.org/jdk/pull/20086#pullrequestreview-2169690869 PR Comment: https://git.openjdk.org/jdk/pull/20086#issuecomment-2220964886 From shade at openjdk.org Wed Jul 10 20:07:46 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 10 Jul 2024 20:07:46 GMT Subject: RFR: 8335356: Shenandoah: Improve concurrent cleanup locking In-Reply-To: References: Message-ID: On Mon, 8 Jul 2024 21:38:41 GMT, Xiaolong Peng wrote: > Hi all, > This PR is to improve the usage of heap lock in ShenandoahFreeSet::recycle_trash, the original observation mentioned in the [bug](https://bugs.openjdk.org/browse/JDK-8335356) should be caused by an uncommitted/reverted change I added when Aleksey and I worked on [JDK-8331411](https://bugs.openjdk.org/browse/JDK-8331411). Even the change was not committed, the way ShenandoahFreeSet::recycle_trash using heap lock is still not efficient, we think we should improve it. > With the logs added in this commit https://github.com/openjdk/jdk/pull/20086/commits/5688ee2c0754483818a89bb7915f58a7464c2df2, I got some key metrics: average time to acquire heap lock is about 450 ~ 900 ns, average time to recycle one trash regions is about 600ns (if not batched, it is 1000+ns, might be related to branch prediction). The current implementation takes heap lock once for every trash region, assume there are 1000 regions to recycle, the time wasted on acquiring heap lock is more than 0.6ms. > > The PR splits the recycling process into two steps: 1. Filter out all the trash regions; 2. recycle the trash regions in batches. I can see some benefits which improve the performance: > 1. 
Less time spent on acquiring heap lock, less contention with mutators/allocators > 2.Simpler loops in filtering and batch recycling, presumably benefit CPU branch prediction > > Here are some logs from test running h2 benchmark: > > TIP with debug log, [code link](https://github.com/openjdk/jdk/compare/master...pengxiaolong:jdk:JDK-8335356-baseline?expand=1), Average time per region: 2312 ns > > [6.013s][info][gc] GC(0) Recycled 0 regions in 58675ns, break down: acquiring lock -> 0, recycling -> 0. > [6.093s][info][gc] GC(0) Recycled 641 regions in 3025757ns, break down: acquiring lock -> 260016, recycling -> 548345. > [9.354s][info][gc] GC(1) Recycled 1 regions in 61793ns, break down: acquiring lock -> 481, recycling -> 1141. > [9.428s][info][gc] GC(1) Recycled 600 regions in 1083206ns, break down: acquiring lock -> 256578, recycling -> 511334. > [12.145s][info][gc] GC(2) Recycled 35 regions in 118390ns, break down: acquiring lock -> 13703, recycling -> 27438. > [12.202s][info][gc] GC(2) Recycled 553 regions in 911747ns, break down: acquiring lock -> 209511, recycling -> 426575. > [15.086s][info][gc] GC(3) Recycled 106 regions in 218396ns, break down: acquiring lock -> 39089, recycling -> 80520. > [15.164s][info][gc] GC(3) Recycled 454 regions in 762128ns, break down: acquiring lock -> 172583, recycl... Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20086#pullrequestreview-2169696730 From xpeng at openjdk.org Wed Jul 10 20:07:48 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Wed, 10 Jul 2024 20:07:48 GMT Subject: RFR: 8335356: Shenandoah: Improve concurrent cleanup locking In-Reply-To: References:

Message-ID: On Wed, 10 Jul 2024 08:59:35 GMT, Aleksey Shipilev wrote: >> Hi all, >> This PR is to improve the usage of heap lock in ShenandoahFreeSet::recycle_trash, the original observation mentioned in the [bug](https://bugs.openjdk.org/browse/JDK-8335356) should be caused by an uncommitted/reverted change I added when Aleksey and I worked on [JDK-8331411](https://bugs.openjdk.org/browse/JDK-8331411). Even the change was not committed, the way ShenandoahFreeSet::recycle_trash using heap lock is still not efficient, we think we should improve it. >> With the logs added in this commit https://github.com/openjdk/jdk/pull/20086/commits/5688ee2c0754483818a89bb7915f58a7464c2df2, I got some key metrics: average time to acquire heap lock is about 450 ~ 900 ns, average time to recycle one trash regions is about 600ns (if not batched, it is 1000+ns, might be related to branch prediction). The current implementation takes heap lock once for every trash region, assume there are 1000 regions to recycle, the time wasted on acquiring heap lock is more than 0.6ms. >> >> The PR splits the recycling process into two steps: 1. Filter out all the trash regions; 2. recycle the trash regions in batches. I can see some benefits which improve the performance: >> 1. Less time spent on acquiring heap lock, less contention with mutators/allocators >> 2.Simpler loops in filtering and batch recycling, presumably benefit CPU branch prediction >> >> Here are some logs from test running h2 benchmark: >> >> TIP with debug log, [code link](https://github.com/openjdk/jdk/compare/master...pengxiaolong:jdk:JDK-8335356-baseline?expand=1), Average time per region: 2312 ns >> >> [6.013s][info][gc] GC(0) Recycled 0 regions in 58675ns, break down: acquiring lock -> 0, recycling -> 0. >> [6.093s][info][gc] GC(0) Recycled 641 regions in 3025757ns, break down: acquiring lock -> 260016, recycling -> 548345. 
>> [9.354s][info][gc] GC(1) Recycled 1 regions in 61793ns, break down: acquiring lock -> 481, recycling -> 1141. >> [9.428s][info][gc] GC(1) Recycled 600 regions in 1083206ns, break down: acquiring lock -> 256578, recycling -> 511334. >> [12.145s][info][gc] GC(2) Recycled 35 regions in 118390ns, break down: acquiring lock -> 13703, recycling -> 27438. >> [12.202s][info][gc] GC(2) Recycled 553 regions in 911747ns, break down: acquiring lock -> 209511, recycling -> 426575. >> [15.086s][info][gc] GC(3) Recycled 106 regions in 218396ns, break down: acquiring lock -> 39089, recycling -> 80520. >> [15.164s][info][gc] GC(3) Recycled 454 regions in 762128ns, brea... > > Found an easy workload to demonstrate the impact on max latencies on allocation path. > > > public class TimedAlloc { > static volatile Object sink; > > public static void main(String... args) throws Throwable { > for (int c = 0; c < 10; c++) { > run(); > } > } > > public static void run() { > long cur = System.nanoTime(); > long end = cur + 3_000_000_000L; > > long sum = 0; > long max = 0; > long allocs = 0; > > while (cur < end) { > long start = System.nanoTime(); > sink = new byte[40000]; > cur = System.nanoTime(); > long v = (cur - start); > sum += v; > max = Math.max(max, v); > allocs++; > } > > System.out.printf("Allocs: %15d; Avg: %8d, Max: %9d%n", allocs, (sum / allocs), max); > } > } > > > > $ java -Xms30g -Xmx30g -XX:+AlwaysPreTouch -XX:+UseShenandoahGC ../TimedAlloc.java > > # Baseline > Allocs: 3294669; Avg: 868, Max: 445998 > Allocs: 3372764; Avg: 852, Max: 528294 > Allocs: 3342978; Avg: 861, Max: 478060 > Allocs: 3341784; Avg: 861, Max: 468640 > Allocs: 3341870; Avg: 861, Max: 494377 > Allocs: 3342338; Avg: 861, Max: 469976 > Allocs: 3340135; Avg: 862, Max: 377933 > Allocs: 3341220; Avg: 862, Max: 511117 > Allocs: 3341673; Avg: 861, Max: 494394 > Allocs: 3341495; Avg: 861, Max: 506392 > > # Patched > Allocs: 3311013; Avg: 867, Max: 81908 > Allocs: 3376562; Avg: 851, Max: 82006 > Allocs: 
3343815; Avg: 861, Max: 36880 > Allocs: 3341872; Avg: 861, Max: 87010 > Allocs: 3341824; Avg: 861, Max: 65734 > Allocs: 3342571; Avg: 861, Max: 137444 > Allocs: 3343636; Avg: 861, Max: 81407 > Allocs: 3345381; Avg: 860, Max: 36505 > Allocs: 3343388; Avg: 861, Max: 80849 > Allocs: 3344334; Avg: 861, Max: 59562 Thanks a lot @shipilev @ysramakrishna! I'll attach more benchmark result if I get some. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20086#issuecomment-2221097945 From duke at openjdk.org Wed Jul 10 20:07:48 2024 From: duke at openjdk.org (duke) Date: Wed, 10 Jul 2024 20:07:48 GMT Subject: RFR: 8335356: Shenandoah: Improve concurrent cleanup locking In-Reply-To: References: Message-ID: <84SUGDIEAZXiA1qIsMqDpo1RLDvEEkj8tnpp7Zh-20Y=.9b4eebec-6879-4c49-a152-1b42f7af1893@github.com> On Mon, 8 Jul 2024 21:38:41 GMT, Xiaolong Peng wrote: > Hi all, > This PR is to improve the usage of heap lock in ShenandoahFreeSet::recycle_trash, the original observation mentioned in the [bug](https://bugs.openjdk.org/browse/JDK-8335356) should be caused by an uncommitted/reverted change I added when Aleksey and I worked on [JDK-8331411](https://bugs.openjdk.org/browse/JDK-8331411). Even the change was not committed, the way ShenandoahFreeSet::recycle_trash using heap lock is still not efficient, we think we should improve it. > With the logs added in this commit https://github.com/openjdk/jdk/pull/20086/commits/5688ee2c0754483818a89bb7915f58a7464c2df2, I got some key metrics: average time to acquire heap lock is about 450 ~ 900 ns, average time to recycle one trash regions is about 600ns (if not batched, it is 1000+ns, might be related to branch prediction). The current implementation takes heap lock once for every trash region, assume there are 1000 regions to recycle, the time wasted on acquiring heap lock is more than 0.6ms. > > The PR splits the recycling process into two steps: 1. Filter out all the trash regions; 2. recycle the trash regions in batches. 
I can see some benefits which improve the performance: > 1. Less time spent on acquiring heap lock, less contention with mutators/allocators > 2.Simpler loops in filtering and batch recycling, presumably benefit CPU branch prediction > > Here are some logs from test running h2 benchmark: > > TIP with debug log, [code link](https://github.com/openjdk/jdk/compare/master...pengxiaolong:jdk:JDK-8335356-baseline?expand=1), Average time per region: 2312 ns > > [6.013s][info][gc] GC(0) Recycled 0 regions in 58675ns, break down: acquiring lock -> 0, recycling -> 0. > [6.093s][info][gc] GC(0) Recycled 641 regions in 3025757ns, break down: acquiring lock -> 260016, recycling -> 548345. > [9.354s][info][gc] GC(1) Recycled 1 regions in 61793ns, break down: acquiring lock -> 481, recycling -> 1141. > [9.428s][info][gc] GC(1) Recycled 600 regions in 1083206ns, break down: acquiring lock -> 256578, recycling -> 511334. > [12.145s][info][gc] GC(2) Recycled 35 regions in 118390ns, break down: acquiring lock -> 13703, recycling -> 27438. > [12.202s][info][gc] GC(2) Recycled 553 regions in 911747ns, break down: acquiring lock -> 209511, recycling -> 426575. > [15.086s][info][gc] GC(3) Recycled 106 regions in 218396ns, break down: acquiring lock -> 39089, recycling -> 80520. > [15.164s][info][gc] GC(3) Recycled 454 regions in 762128ns, break down: acquiring lock -> 172583, recycl... @pengxiaolong Your change (at version dac1ae6e73d6a4d3165a9a7f12a36ce09524f962) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20086#issuecomment-2221099381 From wkemper at openjdk.org Wed Jul 10 20:28:41 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 10 Jul 2024 20:28:41 GMT Subject: Integrated: 8336106: Genshen: Fix use of missing API in Shenandoah Old Heuristic test Message-ID: The recent fixes for the Shenandoah Old Heuristic test used an API that hasn't been backported. 
This change uses a different API available in 21 to achieve the same end. ------------- Commit messages: - Fix old heuristic test for 21 Changes: https://git.openjdk.org/shenandoah-jdk21u/pull/65/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=65&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336106 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/65.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/65/head:pull/65 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/65 From kdnilsen at openjdk.org Wed Jul 10 20:28:47 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 10 Jul 2024 20:28:47 GMT Subject: Integrated: 8336106: Genshen: Fix use of missing API in Shenandoah Old Heuristic test In-Reply-To: References: Message-ID: On Wed, 10 Jul 2024 16:17:40 GMT, William Kemper wrote: > The recent fixes for the Shenandoah Old Heuristic test used an API that hasn't been backported. This change uses a different API available in 21 to achieve the same end. Marked as reviewed by kdnilsen (Committer). ------------- PR Review: https://git.openjdk.org/shenandoah-jdk21u/pull/65#pullrequestreview-2169784648 From wkemper at openjdk.org Wed Jul 10 20:28:51 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 10 Jul 2024 20:28:51 GMT Subject: Integrated: 8336106: Genshen: Fix use of missing API in Shenandoah Old Heuristic test In-Reply-To: References: Message-ID: On Wed, 10 Jul 2024 16:17:40 GMT, William Kemper wrote: > The recent fixes for the Shenandoah Old Heuristic test used an API that hasn't been backported. This change uses a different API available in 21 to achieve the same end. This pull request has now been integrated. 
Changeset: 596fccf7 Author: William Kemper URL: https://git.openjdk.org/shenandoah-jdk21u/commit/596fccf7bd65d897d7a6dd5aa95d0b13e68b0e6c Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8336106: Genshen: Fix use of missing API in Shenandoah Old Heuristic test Reviewed-by: kdnilsen ------------- PR: https://git.openjdk.org/shenandoah-jdk21u/pull/65 From xpeng at openjdk.org Thu Jul 11 06:22:56 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 11 Jul 2024 06:22:56 GMT Subject: RFR: 8335356: Shenandoah: Improve concurrent cleanup locking In-Reply-To: References: Message-ID: On Mon, 8 Jul 2024 21:38:41 GMT, Xiaolong Peng wrote: > Hi all, > This PR is to improve the usage of heap lock in ShenandoahFreeSet::recycle_trash, the original observation mentioned in the [bug](https://bugs.openjdk.org/browse/JDK-8335356) should be caused by an uncommitted/reverted change I added when Aleksey and I worked on [JDK-8331411](https://bugs.openjdk.org/browse/JDK-8331411). Even the change was not committed, the way ShenandoahFreeSet::recycle_trash using heap lock is still not efficient, we think we should improve it. > With the logs added in this commit https://github.com/openjdk/jdk/pull/20086/commits/5688ee2c0754483818a89bb7915f58a7464c2df2, I got some key metrics: average time to acquire heap lock is about 450 ~ 900 ns, average time to recycle one trash regions is about 600ns (if not batched, it is 1000+ns, might be related to branch prediction). The current implementation takes heap lock once for every trash region, assume there are 1000 regions to recycle, the time wasted on acquiring heap lock is more than 0.6ms. > > The PR splits the recycling process into two steps: 1. Filter out all the trash regions; 2. recycle the trash regions in batches. I can see some benefits which improve the performance: > 1. 
Less time spent on acquiring heap lock, less contention with mutators/allocators > 2.Simpler loops in filtering and batch recycling, presumably benefit CPU branch prediction > > Here are some logs from test running h2 benchmark: > > TIP with debug log, [code link](https://github.com/openjdk/jdk/compare/master...pengxiaolong:jdk:JDK-8335356-baseline?expand=1), Average time per region: 2312 ns > > [6.013s][info][gc] GC(0) Recycled 0 regions in 58675ns, break down: acquiring lock -> 0, recycling -> 0. > [6.093s][info][gc] GC(0) Recycled 641 regions in 3025757ns, break down: acquiring lock -> 260016, recycling -> 548345. > [9.354s][info][gc] GC(1) Recycled 1 regions in 61793ns, break down: acquiring lock -> 481, recycling -> 1141. > [9.428s][info][gc] GC(1) Recycled 600 regions in 1083206ns, break down: acquiring lock -> 256578, recycling -> 511334. > [12.145s][info][gc] GC(2) Recycled 35 regions in 118390ns, break down: acquiring lock -> 13703, recycling -> 27438. > [12.202s][info][gc] GC(2) Recycled 553 regions in 911747ns, break down: acquiring lock -> 209511, recycling -> 426575. > [15.086s][info][gc] GC(3) Recycled 106 regions in 218396ns, break down: acquiring lock -> 39089, recycling -> 80520. > [15.164s][info][gc] GC(3) Recycled 454 regions in 762128ns, break down: acquiring lock -> 172583, recycl... 
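
A minimal sketch of the two-step approach described in the quoted PR text (filter out the trash regions first, then recycle them in batches so the lock is taken once per batch rather than once per region). This is not the code from the PR: `Region`, `recycle_trash_batched`, and the `batch_size` parameter are hypothetical names introduced for illustration, and `std::mutex` merely stands in for the Shenandoah heap lock.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a heap region; not HotSpot's ShenandoahHeapRegion.
struct Region {
  bool trash = false;
  void recycle() { trash = false; }
};

// Step 1: collect the trash regions without holding the lock.
// Step 2: recycle them in batches, acquiring the lock once per batch.
// Returns the number of lock acquisitions (for illustration only).
std::size_t recycle_trash_batched(std::vector<Region>& regions,
                                  std::mutex& heap_lock,
                                  std::size_t batch_size) {
  assert(batch_size > 0);

  std::vector<Region*> trash;
  for (Region& r : regions) {
    if (r.trash) trash.push_back(&r);
  }

  std::size_t acquisitions = 0;
  for (std::size_t i = 0; i < trash.size(); i += batch_size) {
    const std::size_t end = std::min(i + batch_size, trash.size());
    std::lock_guard<std::mutex> guard(heap_lock);  // one acquisition per batch
    ++acquisitions;
    for (std::size_t j = i; j < end; ++j) {
      // Re-check under the lock: the unlocked filter pass is only a hint.
      if (trash[j]->trash) trash[j]->recycle();
    }
    // The guard is released here, so waiting threads can take the lock
    // between batches.
  }
  return acquisitions;
}
```

With the figures quoted above (roughly 450 to 900 ns per lock acquisition), 1000 per-region acquisitions would cost about 0.45 to 0.9 ms in locking alone; batching reduces the number of acquisitions to the number of batches.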
Based on Aleksey's benchmark, I wrote a very [simple benchmark](https://github.com/pengxiaolong/benchmarks/blob/main/allocation-latency/src/main/java/personal/xlpeng/benchmarks/allocationlatency/AllocationLatency.java) to generate an HdrHistogram. I ran commands like the ones below to collect the HdrHistogram metrics:

    export JAVA_HOME=/home/xlpeng/repos/jdk-xlpeng/optimized-timed-lock
    export JAVA_OPTS="-Xms30g -Xmx30g -XX:+AlwaysPreTouch -XX:+UseShenandoahGC"
    ./build/distributions/allocation-latency/bin/allocation-latency

    export JAVA_HOME=/home/xlpeng/repos/jdk-xlpeng/baseline
    ./build/distributions/allocation-latency/bin/allocation-latency

Here is the HdrHistogram: ![Histogram](https://github.com/openjdk/jdk/assets/2170530/fe52c38e-6c30-442c-b4c8-2c7ee06c5278)

------------- PR Comment: https://git.openjdk.org/jdk/pull/20086#issuecomment-2222121679 From xpeng at openjdk.org Thu Jul 11 08:50:01 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 11 Jul 2024 08:50:01 GMT Subject: Integrated: 8335356: Shenandoah: Improve concurrent cleanup locking In-Reply-To: References: Message-ID: On Mon, 8 Jul 2024 21:38:41 GMT, Xiaolong Peng wrote: > Hi all, > This PR is to improve the usage of heap lock in ShenandoahFreeSet::recycle_trash, the original observation mentioned in the [bug](https://bugs.openjdk.org/browse/JDK-8335356) should be caused by an uncommitted/reverted change I added when Aleksey and I worked on [JDK-8331411](https://bugs.openjdk.org/browse/JDK-8331411). Even the change was not committed, the way ShenandoahFreeSet::recycle_trash using heap lock is still not efficient, we think we should improve it. > With the logs added in this commit https://github.com/openjdk/jdk/pull/20086/commits/5688ee2c0754483818a89bb7915f58a7464c2df2, I got some key metrics: average time to acquire heap lock is about 450 ~ 900 ns, average time to recycle one trash regions is about 600ns (if not batched, it is 1000+ns, might be related to branch prediction).
The current implementation takes heap lock once for every trash region, assume there are 1000 regions to recycle, the time wasted on acquiring heap lock is more than 0.6ms. > > The PR splits the recycling process into two steps: 1. Filter out all the trash regions; 2. recycle the trash regions in batches. I can see some benefits which improve the performance: > 1. Less time spent on acquiring heap lock, less contention with mutators/allocators > 2.Simpler loops in filtering and batch recycling, presumably benefit CPU branch prediction > > Here are some logs from test running h2 benchmark: > > TIP with debug log, [code link](https://github.com/openjdk/jdk/compare/master...pengxiaolong:jdk:JDK-8335356-baseline?expand=1), Average time per region: 2312 ns > > [6.013s][info][gc] GC(0) Recycled 0 regions in 58675ns, break down: acquiring lock -> 0, recycling -> 0. > [6.093s][info][gc] GC(0) Recycled 641 regions in 3025757ns, break down: acquiring lock -> 260016, recycling -> 548345. > [9.354s][info][gc] GC(1) Recycled 1 regions in 61793ns, break down: acquiring lock -> 481, recycling -> 1141. > [9.428s][info][gc] GC(1) Recycled 600 regions in 1083206ns, break down: acquiring lock -> 256578, recycling -> 511334. > [12.145s][info][gc] GC(2) Recycled 35 regions in 118390ns, break down: acquiring lock -> 13703, recycling -> 27438. > [12.202s][info][gc] GC(2) Recycled 553 regions in 911747ns, break down: acquiring lock -> 209511, recycling -> 426575. > [15.086s][info][gc] GC(3) Recycled 106 regions in 218396ns, break down: acquiring lock -> 39089, recycling -> 80520. > [15.164s][info][gc] GC(3) Recycled 454 regions in 762128ns, break down: acquiring lock -> 172583, recycl... This pull request has now been integrated. 
Changeset: b32e4a68 Author: Xiaolong Peng Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/b32e4a68bca588d908bd81a398eb3171a6876dc5 Stats: 18 lines in 2 files changed: 14 ins; 1 del; 3 mod 8335356: Shenandoah: Improve concurrent cleanup locking Reviewed-by: ysr, shade ------------- PR: https://git.openjdk.org/jdk/pull/20086 From kdnilsen at openjdk.org Thu Jul 11 23:05:24 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 11 Jul 2024 23:05:24 GMT Subject: Integrated: 8335930: GenShen: Reserve regions within each generation's freeset until available is sufficient In-Reply-To: References: Message-ID: On Mon, 8 Jul 2024 21:30:23 GMT, Kelvin Nilsen wrote: > Reserve regions until available memory within each generation's partition is sufficient to satisfy reserve request. This pull request has now been integrated. Changeset: 472e4c74 Author: Kelvin Nilsen URL: https://git.openjdk.org/shenandoah/commit/472e4c74bde1c0ceb76755165e709cefa4fef584 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8335930: GenShen: Reserve regions within each generation's freeset until available is sufficient Reviewed-by: wkemper ------------- PR: https://git.openjdk.org/shenandoah/pull/457 From kdnilsen at openjdk.org Thu Jul 11 23:25:35 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 11 Jul 2024 23:25:35 GMT Subject: RFR: 8327000: GenShen: Integrate updated Shenandoah implementation of FreeSet into GenShen [v2] In-Reply-To: References: Message-ID: > This pull request contains a backport of commit 9d2712dd from the openjdk/shenandoah repository. > > The commit being backported was authored by Kelvin Nilsen on 26 Jun 2024 and was reviewed by William Kemper and Y. Srinivas Ramakrishna. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into backport-kdnilsen-9d2712dd-master - Backport 9d2712ddf39160a618c9c839c46d2510063f45df ------------- Changes: - all: https://git.openjdk.org/shenandoah-jdk21u/pull/62/files - new: https://git.openjdk.org/shenandoah-jdk21u/pull/62/files/7237da32..5b6a8543 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=62&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=62&range=00-01 Stats: 38 lines in 3 files changed: 26 ins; 7 del; 5 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/62.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/62/head:pull/62 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/62 From wkemper at openjdk.org Fri Jul 12 15:19:13 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 12 Jul 2024 15:19:13 GMT Subject: RFR: Merge openjdk/jdk21u-dev:master [v2] In-Reply-To: References: Message-ID: > Merges tag jdk-21.0.4+6 William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
------------- Changes: - all: https://git.openjdk.org/shenandoah-jdk21u/pull/61/files - new: https://git.openjdk.org/shenandoah-jdk21u/pull/61/files/a0b3c6cd..a0b3c6cd Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=61&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=61&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/61.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/61/head:pull/61 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/61 From wkemper at openjdk.org Fri Jul 12 15:19:13 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 12 Jul 2024 15:19:13 GMT Subject: Integrated: Merge openjdk/jdk21u-dev:master In-Reply-To: References: Message-ID: On Thu, 27 Jun 2024 14:17:09 GMT, William Kemper wrote: > Merges tag jdk-21.0.4+6 This pull request has now been integrated. Changeset: 645a4996 Author: William Kemper URL: https://git.openjdk.org/shenandoah-jdk21u/commit/645a49960260b3f945b20e4c270cb6b953d378e4 Stats: 156 lines in 8 files changed: 5 ins; 0 del; 151 mod Merge ------------- PR: https://git.openjdk.org/shenandoah-jdk21u/pull/61 From wkemper at openjdk.org Fri Jul 12 16:13:32 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 12 Jul 2024 16:13:32 GMT Subject: RFR: 8334491: GenShen: Revert changes to Shenandoah defaults Message-ID: <9yoShGWRixxQv9qnZyAyN9oF7zehFPiUp6r1A8UbVfQ=.3a9fa3ad-be2b-4b81-9131-4afb9c2bdf87@github.com> Clean backport. 
------------- Commit messages: - Backport 744fc265aafe2fd59c2ca53c798ee930119f5d8d Changes: https://git.openjdk.org/shenandoah-jdk21u/pull/66/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=66&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334491 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/66.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/66/head:pull/66 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/66 From wkemper at openjdk.org Fri Jul 12 16:21:33 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 12 Jul 2024 16:21:33 GMT Subject: RFR: 8335347: GenShen: Revert change that has adaptive heuristic ignore abbreviated cycles Message-ID: Trivial conflict with not-yet-backported improvements to OOM handling. ------------- Commit messages: - Remove part of upstream change picked up in cherry-pick - Backport 7542e909d1b4fb8a9a29ab5caf61a6be792f0f03 Changes: https://git.openjdk.org/shenandoah-jdk21u/pull/67/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=67&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335347 Stats: 26 lines in 10 files changed: 1 ins; 14 del; 11 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/67.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/67/head:pull/67 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/67 From wkemper at openjdk.org Fri Jul 12 17:22:01 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 12 Jul 2024 17:22:01 GMT Subject: RFR: Merge openjdk/jdk:master Message-ID: <4u_CVXx1SmkbJR8UG4lLfkkg_3-Uai0oJTqhHn1_fSk=.3064d970-3323-4029-a5a3-beeea8c10feb@github.com> Merges tag jdk-24+6 ------------- Commit messages: - 8335946: DTrace code snippets should be generated when DTrace flags are enabled - 8335743: jhsdb jstack cannot print some information on the waiting thread - 8335935: Chained builders not sending transformed models to next 
transforms - 8334481: [JVMCI] add LINK_TO_NATIVE to MethodHandleAccessProvider.IntrinsicMethod - 8335637: Add explicit non-null return value expectations to Object.toString() - 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 - 8331725: ubsan: pc may not always be the entry point for a VtableStub - 8313909: [JVMCI] assert(cp->tag_at(index).is_unresolved_klass()) in lookupKlassInPool - 8336012: Fix usages of jtreg-reserved properties - 8335779: JFR: Hide sleep events - ... and 372 more: https://git.openjdk.org/shenandoah/compare/d8af5894...b363de8c The webrev contains the conflicts with master: - merge conflicts: https://webrevs.openjdk.org/?repo=shenandoah&pr=460&range=00.conflicts Changes: https://git.openjdk.org/shenandoah/pull/460/files Stats: 67266 lines in 1687 files changed: 41791 ins; 18146 del; 7329 mod Patch: https://git.openjdk.org/shenandoah/pull/460.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/460/head:pull/460 PR: https://git.openjdk.org/shenandoah/pull/460 From wkemper at openjdk.org Fri Jul 12 20:34:08 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 12 Jul 2024 20:34:08 GMT Subject: Integrated: 8334491: GenShen: Revert changes to Shenandoah defaults In-Reply-To: <9yoShGWRixxQv9qnZyAyN9oF7zehFPiUp6r1A8UbVfQ=.3a9fa3ad-be2b-4b81-9131-4afb9c2bdf87@github.com> References: <9yoShGWRixxQv9qnZyAyN9oF7zehFPiUp6r1A8UbVfQ=.3a9fa3ad-be2b-4b81-9131-4afb9c2bdf87@github.com> Message-ID: <3Sn7r0Xspdu9gCnJ4G-paMLjsOK4YplG0-pDnX5rljg=.9ce6eb5c-0715-44c0-9cff-7071b4299852@github.com> On Fri, 12 Jul 2024 16:07:49 GMT, William Kemper wrote: > Clean backport. This pull request has now been integrated. 
Changeset: 8e2bc918 Author: William Kemper URL: https://git.openjdk.org/shenandoah-jdk21u/commit/8e2bc918c38b17bf8383a441bf282ec9098b3287 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8334491: GenShen: Revert changes to Shenandoah defaults Backport-of: 744fc265aafe2fd59c2ca53c798ee930119f5d8d ------------- PR: https://git.openjdk.org/shenandoah-jdk21u/pull/66 From wkemper at openjdk.org Fri Jul 12 20:35:17 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 12 Jul 2024 20:35:17 GMT Subject: Integrated: 8335347: GenShen: Revert change that has adaptive heuristic ignore abbreviated cycles In-Reply-To: References: Message-ID: On Fri, 12 Jul 2024 16:15:24 GMT, William Kemper wrote: > Trivial conflict with not-yet-backported improvements to OOM handling. This pull request has now been integrated. Changeset: acb6b5f1 Author: William Kemper URL: https://git.openjdk.org/shenandoah-jdk21u/commit/acb6b5f10e28090dd2987c7f96370f463ea4a69b Stats: 26 lines in 10 files changed: 1 ins; 14 del; 11 mod 8335347: GenShen: Revert change that has adaptive heuristic ignore abbreviated cycles Backport-of: 7542e909d1b4fb8a9a29ab5caf61a6be792f0f03 ------------- PR: https://git.openjdk.org/shenandoah-jdk21u/pull/67 From wkemper at openjdk.org Fri Jul 12 22:49:30 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 12 Jul 2024 22:49:30 GMT Subject: RFR: Merge openjdk/jdk:master [v2] In-Reply-To: <4u_CVXx1SmkbJR8UG4lLfkkg_3-Uai0oJTqhHn1_fSk=.3064d970-3323-4029-a5a3-beeea8c10feb@github.com> References: <4u_CVXx1SmkbJR8UG4lLfkkg_3-Uai0oJTqhHn1_fSk=.3064d970-3323-4029-a5a3-beeea8c10feb@github.com> Message-ID: > Merges tag jdk-24+6 William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 383 commits: - Merge tag 'jdk-24+6' into merge-jdk-24+6 Added tag jdk-24+6 for changeset b363de8c - 8335946: DTrace code snippets should be generated when DTrace flags are enabled Reviewed-by: coleenp, dholmes - 8335743: jhsdb jstack cannot print some information on the waiting thread Reviewed-by: dholmes, cjplummer, kevinw - 8335935: Chained builders not sending transformed models to next transforms Reviewed-by: asotona - 8334481: [JVMCI] add LINK_TO_NATIVE to MethodHandleAccessProvider.IntrinsicMethod Reviewed-by: dnsimon - 8335637: Add explicit non-null return value expectations to Object.toString() Reviewed-by: jpai, alanb, smarks, prappo - 8335409: Can't allocate and retain memory from resource area in frame::oops_interpreted_do oop closure after 8329665 Reviewed-by: dholmes, stuefe, coleenp, shade - 8331725: ubsan: pc may not always be the entry point for a VtableStub Reviewed-by: kvn, mbaesken - 8313909: [JVMCI] assert(cp->tag_at(index).is_unresolved_klass()) in lookupKlassInPool Reviewed-by: yzheng, never - 8336012: Fix usages of jtreg-reserved properties Reviewed-by: jjg - ... and 373 more: https://git.openjdk.org/shenandoah/compare/472e4c74...24e94f53 ------------- Changes: https://git.openjdk.org/shenandoah/pull/460/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=460&range=01 Stats: 67311 lines in 1691 files changed: 41794 ins; 18185 del; 7332 mod Patch: https://git.openjdk.org/shenandoah/pull/460.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/460/head:pull/460 PR: https://git.openjdk.org/shenandoah/pull/460 From ysr at openjdk.org Sat Jul 13 01:04:34 2024 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Sat, 13 Jul 2024 01:04:34 GMT Subject: RFR: 8328235: GenShen: Robustify ShenandoahGCSession and fix missing use Message-ID: 8328235: GenShen: Robustify ShenandoahGCSession and fix missing use ------------- Commit messages: - Backport d2102347ea9c1199221ec33f4e721aefa1193cea Changes: https://git.openjdk.org/shenandoah-jdk21u/pull/68/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah-jdk21u&pr=68&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8328235 Stats: 169 lines in 16 files changed: 135 ins; 1 del; 33 mod Patch: https://git.openjdk.org/shenandoah-jdk21u/pull/68.diff Fetch: git fetch https://git.openjdk.org/shenandoah-jdk21u.git pull/68/head:pull/68 PR: https://git.openjdk.org/shenandoah-jdk21u/pull/68 From kdnilsen at openjdk.org Sat Jul 13 21:25:05 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 13 Jul 2024 21:25:05 GMT Subject: RFR: 8328235: GenShen: Robustify ShenandoahGCSession and fix missing use In-Reply-To: References: Message-ID: On Sat, 13 Jul 2024 00:59:12 GMT, Y. Srinivas Ramakrishna wrote: > Clean backport of commit [d2102347](https://github.com/openjdk/shenandoah/commit/d2102347ea9c1199221ec33f4e721aefa1193cea) from the [openjdk/shenandoah](https://git.openjdk.org/shenandoah) repository. > > The commit being backported was authored by Y. Srinivas Ramakrishna on 2 Jul 2024 and was reviewed by William Kemper. > > **Testing:** *in progress, will be updated when completed* > - [ ] Code pipeline testing > - [x] jtreg local > - [x] [GHA](https://github.com/openjdk-bots/shenandoah-jdk21u/actions/runs/9915888290) Marked as reviewed by kdnilsen (Committer). 
------------- PR Review: https://git.openjdk.org/shenandoah-jdk21u/pull/68#pullrequestreview-2176549175 From kdnilsen at openjdk.org Mon Jul 15 14:29:43 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 15 Jul 2024 14:29:43 GMT Subject: RFR: 8325673: GenShen: Share Reserves between Old and Young Collector [v2] In-Reply-To: References: Message-ID: > Allow young-gen Collector reserve to share memory with old-gen Collector reserve in order to support prompt processing of mixed evacuations, as constrained by ShenandoahOldEvacRatioPercent. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 70 commits: - Remove unreferenced local variable - Fix whitespace - Remove debug instrumentation and deprecated code - Turn off instrumentation - Verifier should only count non-trashed committed regions - Ignore generation soft capacities when adjusting generation sizes Soft capacities are established, for example, by setting NewRatio or New size on the JVM command line. GenShen, for now at least, does not honor these settings. Better performance is obtained by allowing GenShen to expand and shrink generation sizes according to application behavior. This commit also tidies up various aspects of the implementation to make adjustments to generation sizing more consistent: 1. ShenandoahGlobalHeuristics::choose_global_collection_set(): share the reserves between young and old collection to maximize evacuation of garbage-first regions, regardless of whether most garbage is found in old or young 2. ShenandoahConcurrentGC::entry_final_roots(): do not balance generations before invoking finish_rebuild() because finish_rebuild will balance generations. 3. ShenandoahFreeSet::flip_to_old_gc(): invoke force_transfer_to_young() instead of transfer_to_young() so we can override soft-capacity limits 4. 
ShenandoahFullGC::phase5_epilog(): Do not invoke compute_balances() or balance_generations_after_rebuilding_free_set(). Allow the free-set rebuild() implementation to do this work in a more consistent fashion.
5. ShenandoahGeneration::adjust_evacuation_budgets(): replace transfer_to_young() with force_transfer_to_young() to avoid enforcement of soft capacity limits.
6. ShenandoahGenerationSizer::force_transfer_to_young(): new method
7. ShenandoahGenerationalFullGC::balance_generations_after_gc(): establish reserves() so that free-set rebuild() can adjust balance. Do not redundantly force transfer of regions here.
8. ShenandoahGenerationalFullGC::balance_generations_after_rebuilding_free_set(): deprecate this method.
9. ShenandoahGenerationalFullGC::compute_balances(): deprecate this method.
10. ShenandoahGenerationaStatsClosure::validate_usage() (part of Shenandoah Verification): add consistency check for generation capacities
- Fix budgeting error during freeset rebuild Limit the size of old-gen by memory available in the OldCollector set following find_regions_with_alloc_capacity() (rather than limiting the size of old-gen by the total capacity of the OldCollector set, which includes used memory).
- Fix whitespace
- Merge remote-tracking branch 'origin/master' into share-collector-reserves
- Merge branch 'openjdk:master' into master
- ...
and 60 more: https://git.openjdk.org/shenandoah/compare/d2102347...33eacea7 ------------- Changes: https://git.openjdk.org/shenandoah/pull/395/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=395&range=01 Stats: 1318 lines in 23 files changed: 832 ins; 240 del; 246 mod Patch: https://git.openjdk.org/shenandoah/pull/395.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/395/head:pull/395 PR: https://git.openjdk.org/shenandoah/pull/395 From kdnilsen at openjdk.org Mon Jul 15 14:29:43 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 15 Jul 2024 14:29:43 GMT Subject: RFR: 8325673: GenShen: Share Reserves between Old and Young Collector In-Reply-To: References: Message-ID: On Mon, 12 Feb 2024 18:22:42 GMT, Kelvin Nilsen wrote: > Allow young-gen Collector reserve to share memory with old-gen Collector reserve in order to support prompt processing of mixed evacuations, as constrained by ShenandoahOldEvacRatioPercent. After merging, I'm experiencing many unsuccessful GHA tests. These were apparently introduced by the merge. ------------- PR Comment: https://git.openjdk.org/shenandoah/pull/395#issuecomment-2189646618 From wkemper at openjdk.org Mon Jul 15 16:40:23 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 15 Jul 2024 16:40:23 GMT Subject: RFR: 8325673: GenShen: Share Reserves between Old and Young Collector [v2] In-Reply-To: References:

Message-ID: On Mon, 15 Jul 2024 14:29:43 GMT, Kelvin Nilsen wrote: >> Allow young-gen Collector reserve to share memory with old-gen Collector reserve in order to support prompt processing of mixed evacuations, as constrained by ShenandoahOldEvacRatioPercent. > > Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 70 commits: > > - Remove unreferenced local variable > - Fix whitespace > - Remove debug instrumentation and deprecated code > - Turn off instrumentation > - Verifier should only count non-trashed committed regions > - Ignore generation soft capacities when adjusting generation sizes > > Soft capacities are established, for example, by setting NewRatio or New size on the > JVM command line. GenShen, for now at least, does not honor these settings. Better > performance is obtained by allowing GenShen to expand and shrink generation sizes > according to application behavior. > > This commit also tidies up various aspects of the implementation to make adjustments > to generation sizing more consistent: > > 1. ShenandoahGlobalHeuristics::choose_global_collection_set(): share the reserves > between young and old collection to maximize evacuation of garbage-first > regions, regardless of whether most garbage is found in old or young > 2. ShenandoahConcurrentGC::entry_final_roots(): do not balance generations before > invoking finish_rebuild() because finish_rebuild will balance generations. > 3. ShenandoahFreeSet::flip_to_old_gc(): invoke force_transfer_to_young() instead > of transfer_to_young() so we can override soft-capacity limits > 4. ShenandoahFullGC::phase5_epilog(): Do not invoke compute_balances() or > balance_generations_after_rebuilding_free_set(). Allow the free-set > rebuild() implementation to do this work in a more consistent fashion. > 5. 
ShenandoahGeneration::adjust_evacuation_budgets(): replace transfer_to_young()
> with force_transfer_to_young() to avoid enforcement of soft capacity limits.
> 6. ShenandoahGenerationSizer::force_transfer_to_young(): new method
> 7. ShenandoahGenerationalFullGC::balance_generations_after_gc(): establish
> reserves() so that free-set rebuild() can adjust balance. Do not redundantly
> force transfer of regions here.
> 8. ShenandoahGenerationalFullGC::balance_generations_after_rebuilding_free_set():
> deprecate this method.
> 9. ShenandoahGenerationalFullGC::compute_balances(): deprecate this method.
> 10. ShenandoahGenerationaStatsClosure::validate_usage() (part of Shenandoah
> Verification): add consistency check for generation capacities
> - Fix budgeting error during freeset rebuild
>
> Limit the size of old-gen by memory av...

Changes requested by wkemper (Committer).

src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.cpp line 179:

> 177: }
> 178:
> 179: void ShenandoahOldHeuristics::initialize_piggyback_evacs(ShenandoahCollectionSet* collection_set,

Can we continue using `mixed` instead of `piggyback` to describe collections that include young and old regions? I think `mixed` is an accepted term and `piggyback` feels a little bit too colloquial (also not sure if the phrase is commonly known outside of English).

src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.cpp line 185:

> 183: size_t &unfragmented_available,
> 184: size_t &fragmented_available,
> 185: size_t &excess_fragmented_available) {

Should all of these reference parameters be members? The API feels complicated now with these three methods passing the same parameters to each other:

initialize_piggyback_evacs(...)
prime_collection_set(...)
finalize_piggyback_evacs(...)

Could everything just happen within `prime_collection_set` without so many changes to existing callers?
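
One possible reading of this suggestion, as a minimal sketch only: the three `size_t&` parameters become members of the heuristics object, and the initialize/finalize steps become private helpers called from a single entry point. Everything here is hypothetical and is not the GenShen code: the class `OldHeuristicsSketch`, the struct `MixedEvacBudget`, the helper names, and the placeholder arithmetic are all assumptions; only the three field names are taken from the quoted signature.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bundle of the three values that were passed by reference.
struct MixedEvacBudget {
  std::size_t unfragmented_available = 0;
  std::size_t fragmented_available = 0;
  std::size_t excess_fragmented_available = 0;
};

// Hypothetical heuristics class: callers only see prime_collection_set().
class OldHeuristicsSketch {
 public:
  // Single entry point; initialization and finalization happen inside,
  // so existing callers need no extra arguments.
  // Returns the total budget considered (placeholder logic).
  std::size_t prime_collection_set(std::size_t unfragmented,
                                   std::size_t fragmented) {
    initialize_mixed_evacs(unfragmented, fragmented);
    const std::size_t total =
        _budget.unfragmented_available + _budget.fragmented_available;
    finalize_mixed_evacs();
    return total;
  }

  const MixedEvacBudget& budget() const { return _budget; }

 private:
  void initialize_mixed_evacs(std::size_t unfragmented,
                              std::size_t fragmented) {
    _budget.unfragmented_available = unfragmented;
    _budget.fragmented_available = fragmented;
    _budget.excess_fragmented_available = 0;
  }

  // Reset the per-cycle state so nothing leaks into the next cycle.
  void finalize_mixed_evacs() { _budget = MixedEvacBudget(); }

  MixedEvacBudget _budget;
};
```

The trade-off is that the budget becomes per-object state, so it would need to be reset on every cycle, as the sketch does in `finalize_mixed_evacs()`.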
------------- PR Review: https://git.openjdk.org/shenandoah/pull/395#pullrequestreview-2178189515 PR Review Comment: https://git.openjdk.org/shenandoah/pull/395#discussion_r1678109072 PR Review Comment: https://git.openjdk.org/shenandoah/pull/395#discussion_r1678113040 From kdnilsen at openjdk.org Tue Jul 16 03:35:33 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 16 Jul 2024 03:35:33 GMT Subject: RFR: 8325673: GenShen: Share Reserves between Old and Young Collector [v3] In-Reply-To: References: Message-ID: > Allow young-gen Collector reserve to share memory with old-gen Collector reserve in order to support prompt processing of mixed evacuations, as constrained by ShenandoahOldEvacRatioPercent. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Remove declaration of unused variables ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/395/files - new: https://git.openjdk.org/shenandoah/pull/395/files/33eacea7..3e38c8a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=395&range=02 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=395&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/395.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/395/head:pull/395 PR: https://git.openjdk.org/shenandoah/pull/395 From ysr at openjdk.org Tue Jul 16 17:09:16 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 16 Jul 2024 17:09:16 GMT Subject: RFR: 8328235: GenShen: Robustify ShenandoahGCSession and fix missing use In-Reply-To: References: Message-ID: <0IuOTkh9pOsqcITmXLSJPXFwVCqh_zhA50MuvYXDdww=.7607596e-92dc-4f00-b95f-3c9847d249ff@github.com> On Sat, 13 Jul 2024 00:59:12 GMT, Y. 
Srinivas Ramakrishna wrote: > Clean backport of commit [d2102347](https://github.com/openjdk/shenandoah/commit/d2102347ea9c1199221ec33f4e721aefa1193cea) from the [openjdk/shenandoah](https://git.openjdk.org/shenandoah) repository. > > The commit being backported was authored by Y. Srinivas Ramakrishna on 2 Jul 2024 and was reviewed by William Kemper. > > **Testing:** > - [x] Code pipeline testing -- failure orthogonal to the changes here and being independently investigated > - [x] jtreg local > - [x] [GHA](https://github.com/openjdk-bots/shenandoah-jdk21u/actions/runs/9915888290) Thanks Kelvin. ------------- PR Comment: https://git.openjdk.org/shenandoah-jdk21u/pull/68#issuecomment-2231416298 From ysr at openjdk.org Tue Jul 16 17:09:17 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 16 Jul 2024 17:09:17 GMT Subject: Integrated: 8328235: GenShen: Robustify ShenandoahGCSession and fix missing use In-Reply-To: References: Message-ID: On Sat, 13 Jul 2024 00:59:12 GMT, Y. Srinivas Ramakrishna wrote: > Clean backport of commit [d2102347](https://github.com/openjdk/shenandoah/commit/d2102347ea9c1199221ec33f4e721aefa1193cea) from the [openjdk/shenandoah](https://git.openjdk.org/shenandoah) repository. > > The commit being backported was authored by Y. Srinivas Ramakrishna on 2 Jul 2024 and was reviewed by William Kemper. > > **Testing:** > - [x] Code pipeline testing -- failure orthogonal to the changes here and being independently investigated > - [x] jtreg local > - [x] [GHA](https://github.com/openjdk-bots/shenandoah-jdk21u/actions/runs/9915888290) This pull request has now been integrated. Changeset: 80e796b4 Author: Y. 
Srinivas Ramakrishna URL: https://git.openjdk.org/shenandoah-jdk21u/commit/80e796b4974ba6f6c18eafc179b5b5196582084f Stats: 169 lines in 16 files changed: 135 ins; 1 del; 33 mod 8328235: GenShen: Robustify ShenandoahGCSession and fix missing use Reviewed-by: kdnilsen Backport-of: d2102347ea9c1199221ec33f4e721aefa1193cea ------------- PR: https://git.openjdk.org/shenandoah-jdk21u/pull/68 From kdnilsen at openjdk.org Tue Jul 16 20:03:35 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 16 Jul 2024 20:03:35 GMT Subject: RFR: 8325673: GenShen: Share Reserves between Old and Young Collector [v2] In-Reply-To: References:

Message-ID: On Mon, 15 Jul 2024 16:32:59 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 70 commits: >> >> - Remove unreferenced local variable >> - Fix whitespace >> - Remove debug instrumentation and deprecated code >> - Turn off instrumentation >> - Verifier should only count non-trashed committed regions >> - Ignore generation soft capacities when adjusting generation sizes >> >> Soft capacities are established, for example, by setting NewRatio or New size on the >> JVM command line. GenShen, for now at least, does not honor these settings. Better >> performance is obtained by allowing GenShen to expand and shrink generation sizes >> according to application behavior. >> >> This commit also tidies up various aspects of the implementation to make adjustments >> to generation sizing more consistent: >> >> 1. ShenandoahGlobalHeuristics::choose_global_collection_set(): share the reserves >> between young and old collection to maximize evacuation of garbage-first >> regions, regardless of whether most garbage is found in old or young >> 2. ShenandoahConcurrentGC::entry_final_roots(): do not balance generations before >> invoking finish_rebuild() because finish_rebuild will balance generations. >> 3. ShenandoahFreeSet::flip_to_old_gc(): invoke force_transfer_to_young() instead >> of transfer_to_young() so we can override soft-capacity limits >> 4. ShenandoahFullGC::phase5_epilog(): Do not invoke compute_balances() or >> balance_generations_after_rebuilding_free_set(). Allow the free-set >> rebuild() implementation to do this work in a more consistent fashion. >> 5. ShenandoahGeneration::adjust_evacuation_budgets(): replace transfer_to_youn() >> with force_transfer_to_young() to avoid enforcement of soft capacity limits. >> 6. ShenandoahGenerationSizer::force_transfer_to_young(): new method >> 7. 
ShenandoahGenerationalFullGC::balance_generations_after_gc(): establish >> reserves() so that free-set rebuild() can adjust balance. Do not redundantly >> force transfer of regions here. >> 8. ShenandoahGenerationalFullGC::balance_generations_after_rebuilding_free_set(): >> deprecate this method. >> 9. ShenandoahGenerationalFullGC::compute_balances(): deprecate this method. >> 10. ShenandoahGenerationaStatsClosure::validate_usage() (part of Shenandoah >> Verification): add consistency check for generation capacities >> - Fix budget... > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.cpp line 179: > >> 177: } >> 178: >> 179: void ShenandoahOldHeuristics::initialize_piggyback_evacs(ShenandoahCollectionSet* collection_set, > > Can we continue using `mixed` instead of `piggyback` to describe collections that include young and old regions? I think `mixed` is an accepted term and `piggyback` feels a little bit too colloquial (also not sure if the phrase is commonly known outside of English). Good suggestion. Thanks. I'm replacing piggyback with mixed throughout (where piggyback means mixed evac). There are a few places in Shenandoah where piggyback is used for other purposes, and I'm leaving those as is. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/395#discussion_r1680001115 From kdnilsen at openjdk.org Tue Jul 16 23:22:22 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 16 Jul 2024 23:22:22 GMT Subject: RFR: 8325673: GenShen: Share Reserves between Old and Young Collector [v2] In-Reply-To: References:

Message-ID: On Mon, 15 Jul 2024 16:36:38 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 70 commits: >> >> - Remove unreferenced local variable >> - Fix whitespace >> - Remove debug instrumentation and deprecated code >> - Turn off instrumentation >> - Verifier should only count non-trashed committed regions >> - Ignore generation soft capacities when adjusting generation sizes >> >> Soft capacities are established, for example, by setting NewRatio or New size on the >> JVM command line. GenShen, for now at least, does not honor these settings. Better >> performance is obtained by allowing GenShen to expand and shrink generation sizes >> according to application behavior. >> >> This commit also tidies up various aspects of the implementation to make adjustments >> to generation sizing more consistent: >> >> 1. ShenandoahGlobalHeuristics::choose_global_collection_set(): share the reserves >> between young and old collection to maximize evacuation of garbage-first >> regions, regardless of whether most garbage is found in old or young >> 2. ShenandoahConcurrentGC::entry_final_roots(): do not balance generations before >> invoking finish_rebuild() because finish_rebuild will balance generations. >> 3. ShenandoahFreeSet::flip_to_old_gc(): invoke force_transfer_to_young() instead >> of transfer_to_young() so we can override soft-capacity limits >> 4. ShenandoahFullGC::phase5_epilog(): Do not invoke compute_balances() or >> balance_generations_after_rebuilding_free_set(). Allow the free-set >> rebuild() implementation to do this work in a more consistent fashion. >> 5. ShenandoahGeneration::adjust_evacuation_budgets(): replace transfer_to_youn() >> with force_transfer_to_young() to avoid enforcement of soft capacity limits. >> 6. ShenandoahGenerationSizer::force_transfer_to_young(): new method >> 7. 
ShenandoahGenerationalFullGC::balance_generations_after_gc(): establish >> reserves() so that free-set rebuild() can adjust balance. Do not redundantly >> force transfer of regions here. >> 8. ShenandoahGenerationalFullGC::balance_generations_after_rebuilding_free_set(): >> deprecate this method. >> 9. ShenandoahGenerationalFullGC::compute_balances(): deprecate this method. >> 10. ShenandoahGenerationaStatsClosure::validate_usage() (part of Shenandoah >> Verification): add consistency check for generation capacities >> - Fix budget... > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.cpp line 185: > >> 183: size_t &unfragmented_available, >> 184: size_t &fragmented_available, >> 185: size_t &excess_fragmented_available) { > > Should all of these reference parameters be members? The API feels complicated now with these three methods passing the same parameters to each other: > > > initialize_piggyback_evacs(...) > > prime_collection_set(...) > > finalize_piggyback_evacs(...) > > > Could everything just happen within `prime_collection_set` without so many changes to existing callers? Thanks. Good suggestion. Am pursuing this. Looks much cleaner. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/395#discussion_r1680161701 From kdnilsen at openjdk.org Tue Jul 16 23:58:41 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 16 Jul 2024 23:58:41 GMT Subject: RFR: 8325673: GenShen: Share Reserves between Old and Young Collector [v4] In-Reply-To: References: Message-ID: > Allow young-gen Collector reserve to share memory with old-gen Collector reserve in order to support prompt processing of mixed evacuations, as constrained by ShenandoahOldEvacRatioPercent.
Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - Simplify arguments by using instance variables in ShenandoahOldHeuristics - Use mixed evac rather than piggyback to describe old-gen evacuations ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/395/files - new: https://git.openjdk.org/shenandoah/pull/395/files/3e38c8a3..406d347b Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=395&range=03 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=395&range=02-03 Stats: 447 lines in 4 files changed: 51 ins; 310 del; 86 mod Patch: https://git.openjdk.org/shenandoah/pull/395.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/395/head:pull/395 PR: https://git.openjdk.org/shenandoah/pull/395 From kdnilsen at openjdk.org Wed Jul 17 00:35:34 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 17 Jul 2024 00:35:34 GMT Subject: RFR: 8325673: GenShen: Share Reserves between Old and Young Collector [v5] In-Reply-To: References: Message-ID: <9-h8ac6CrQd08gP98fVB4BYMU1Z373uigTMO_C9l6Ls=.8563b787-6d5c-4006-9637-a8a205544aa9@github.com> > Allow young-gen Collector reserve to share memory with old-gen Collector reserve in order to support prompt processing of mixed evacuations, as constrained by ShenandoahOldEvacRatioPercent. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Remove unreferenced variables ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/395/files - new: https://git.openjdk.org/shenandoah/pull/395/files/406d347b..ee2ab01d Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=395&range=04 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=395&range=03-04 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/395.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/395/head:pull/395 PR: https://git.openjdk.org/shenandoah/pull/395 From kemperw at amazon.com Wed Jul 17 23:14:52 2024 From: kemperw at amazon.com (Kemper, William) Date: Wed, 17 Jul 2024 23:14:52 +0000 Subject: Proposal to remove the experimental incremental update mode from Shenandoah Message-ID: <66706abc9ae64a29a46b8f90b3ef4f6f@amazon.com> https://bugs.openjdk.org/browse/JDK-8336685 If there are no strong objections to this, we will remove this mode in JDK24. Thank you, William -------------- next part -------------- An HTML attachment was scrubbed... URL: From shipilev at amazon.de Thu Jul 18 07:13:00 2024 From: shipilev at amazon.de (Aleksey Shipilev) Date: Thu, 18 Jul 2024 09:13:00 +0200 Subject: Proposal to remove the experimental incremental update mode from Shenandoah In-Reply-To: <66706abc9ae64a29a46b8f90b3ef4f6f@amazon.com> References: <66706abc9ae64a29a46b8f90b3ef4f6f@amazon.com> Message-ID: <71e563ca-b789-4f2b-a2f5-7efbb5ca2f08@amazon.de> On 18.07.24 01:14, Kemper, William wrote: > https://bugs.openjdk.org/browse/JDK-8336685 I support this. It does not seem to be (widely, if ever) used, and removal will simplify Shenandoah maintenance. -Aleksey Amazon Web Services Development Center Germany GmbH Krausenstr. 
38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B Sitz: Berlin Ust-ID: DE 365 538 597 From rkennke at amazon.de Thu Jul 18 14:39:50 2024 From: rkennke at amazon.de (Kennke, Roman) Date: Thu, 18 Jul 2024 14:39:50 +0000 Subject: Proposal to remove the experimental incremental update mode from Shenandoah In-Reply-To: <66706abc9ae64a29a46b8f90b3ef4f6f@amazon.com> References: <66706abc9ae64a29a46b8f90b3ef4f6f@amazon.com> Message-ID: > https://bugs.openjdk.org/browse/JDK-8336685 > > If there are no strong objections to this, we will remove this mode in JDK24. I also support the removal of the I-U mode. It's long overdue. Thank you! Roman From kdnilsen at openjdk.org Thu Jul 18 18:00:25 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 18 Jul 2024 18:00:25 GMT Subject: RFR: 8325673: GenShen: Share Reserves between Old and Young Collector [v6] In-Reply-To: References: Message-ID: <9ArkWWvSpWGPA84_TYWwLqi4TKlRYnWJUvL033Z9pT0=.91929a46-fc66-47ee-84ad-ac1f19592a44@github.com> > Allow young-gen Collector reserve to share memory with old-gen Collector reserve in order to support prompt processing of mixed evacuations, as constrained by ShenandoahOldEvacRatioPercent.
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Improve comment ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/395/files - new: https://git.openjdk.org/shenandoah/pull/395/files/ee2ab01d..21a5d328 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=395&range=05 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=395&range=04-05 Stats: 11 lines in 1 file changed: 1 ins; 1 del; 9 mod Patch: https://git.openjdk.org/shenandoah/pull/395.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/395/head:pull/395 PR: https://git.openjdk.org/shenandoah/pull/395 From wkemper at openjdk.org Fri Jul 19 14:17:22 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 19 Jul 2024 14:17:22 GMT Subject: RFR: Merge openjdk/jdk:master Message-ID: Merges tag jdk-24+7 ------------- Commit messages: - 8336587: failure_handler lldb command times out on macosx-aarch64 core file - 8336091: Fix HTML warnings in the generated HTML files - 8335921: Fix HotSpot VM build without JVMTI - 8336300: DateFormatSymbols#getInstanceRef returns non-cached instance - 8336638: Parallel: Remove redundant mangle in PSScavenge::invoke - 8299080: Wrong default value of snippet lang attribute - 8334217: [AIX] Misleading error messages after JDK-8320005 - 8334781: JFR crash: assert(((((JfrTraceIdBits::load(klass)) & ((JfrTraceIdEpoch::this_epoch_method_and_class_bits()))) != 0))) failed: invariant - 8334502: gtest/GTestWrapper.java fails on armhf due to LogDecorations.iso8601_utctime_test - 8336040: Missing closing anchor element in Docs.gmk - ... 
and 452 more: https://git.openjdk.org/shenandoah/compare/d8af5894...21a6cf84 The webrev contains the conflicts with master: - merge conflicts: https://webrevs.openjdk.org/?repo=shenandoah&pr=461&range=00.conflicts Changes: https://git.openjdk.org/shenandoah/pull/461/files Stats: 71971 lines in 1877 files changed: 44597 ins; 19166 del; 8208 mod Patch: https://git.openjdk.org/shenandoah/pull/461.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/461/head:pull/461 PR: https://git.openjdk.org/shenandoah/pull/461 From duke at openjdk.org Fri Jul 19 21:30:54 2024 From: duke at openjdk.org (Henry Lin) Date: Fri, 19 Jul 2024 21:30:54 GMT Subject: RFR: 8333088: ubsan: shenandoahAdaptiveHeuristics.cpp:245:44: runtime error: division by zero Message-ID: <1drlX-RUquqXP9FFKE1X3Uq6pUyRjYPdPwMtCjACUdM=.11f83216-e30e-454d-a62e-65089d6614d3@github.com> Changed comparison to multiplication to prevent division by zero errors reported by ubsan. ------------- Commit messages: - 8333088: fix ubsan:shenandoahAdaptiveHeuristics divide by zero Changes: https://git.openjdk.org/jdk/pull/20161/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20161&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333088 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20161.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20161/head:pull/20161 PR: https://git.openjdk.org/jdk/pull/20161 From shade at openjdk.org Fri Jul 19 21:30:54 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 19 Jul 2024 21:30:54 GMT Subject: RFR: 8333088: ubsan: shenandoahAdaptiveHeuristics.cpp:245:44: runtime error: division by zero In-Reply-To: <1drlX-RUquqXP9FFKE1X3Uq6pUyRjYPdPwMtCjACUdM=.11f83216-e30e-454d-a62e-65089d6614d3@github.com> References: <1drlX-RUquqXP9FFKE1X3Uq6pUyRjYPdPwMtCjACUdM=.11f83216-e30e-454d-a62e-65089d6614d3@github.com> Message-ID: On Fri, 12 Jul 2024 19:55:13 GMT, Henry Lin wrote: > Changed comparison to multiplication to 
prevent division by zero errors reported by ubsan. Looks fine. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20161#pullrequestreview-2178097697 From shade at openjdk.org Fri Jul 19 21:31:02 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 19 Jul 2024 21:31:02 GMT Subject: RFR: 8333728: ubsan: shenandoahFreeSet.cpp:1347:24: runtime error: division by zero In-Reply-To: References: Message-ID: On Fri, 12 Jul 2024 20:50:30 GMT, Henry Lin wrote: > Changed the check to require `linear` to be nonzero to prevent division by zero errors reported by `ubsan`. `linear > 0` implies that `count > 0` so the count variable is no longer necessary. This looks good, thanks. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20163#pullrequestreview-2178114899 From duke at openjdk.org Fri Jul 19 21:31:02 2024 From: duke at openjdk.org (Henry Lin) Date: Fri, 19 Jul 2024 21:31:02 GMT Subject: RFR: 8333728: ubsan: shenandoahFreeSet.cpp:1347:24: runtime error: division by zero Message-ID: Changed the check to require `linear` to be nonzero to prevent division by zero errors reported by `ubsan`. `linear > 0` implies that `count > 0` so the count variable is no longer necessary.
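(Editorial sketch, not part of the original message.) The two division-by-zero fixes in this thread follow familiar patterns: guard the divisor, or rewrite a ratio comparison as a multiplication. The C++ below is a minimal illustration under stated assumptions. The function names `average_per_unit` and `exceeds_ratio` are hypothetical and do not appear in shenandoahFreeSet.cpp or shenandoahAdaptiveHeuristics.cpp, and the actual patches may be shaped differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not HotSpot code): only divide when the divisor
// is known to be nonzero, mirroring the "require `linear` to be nonzero" idea.
double average_per_unit(std::size_t total, std::size_t linear) {
    if (linear > 0) {
        return static_cast<double>(total) / static_cast<double>(linear);
    }
    return 0.0;  // assumption: report 0 when there is nothing to average over
}

// Hypothetical helper (not HotSpot code): "numerator / denominator > threshold"
// rewritten as a multiplication, so no division is ever performed.
// Assumes denominator >= 0; for a negative denominator the inequality would flip.
bool exceeds_ratio(double numerator, double denominator, double threshold) {
    assert(denominator >= 0.0);
    return numerator > threshold * denominator;
}
```

With a zero denominator, `exceeds_ratio` simply compares the numerator against zero instead of dividing, which is the behavior a sanitizer such as ubsan will accept.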
------------- Commit messages: - 8333728: fix ubsan:shenandoahFreeSet.cpp runtime division by zero Changes: https://git.openjdk.org/jdk/pull/20163/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20163&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333728 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20163.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20163/head:pull/20163 PR: https://git.openjdk.org/jdk/pull/20163 From nprasad at openjdk.org Mon Jul 22 13:58:44 2024 From: nprasad at openjdk.org (Neethu Prasad) Date: Mon, 22 Jul 2024 13:58:44 GMT Subject: RFR: 8335865: Shenandoah: Improve THP pretouch after JDK-8315923 Message-ID: **Notes** os::pretouch now uses madvise when available and falls back to using the vm page size [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923) Hence removing code that sets _pretouch_heap_page_size & _pretouch_bitmap_page_size in Shenandoah. **Testing** * Ran test in Linux 5.10 and Linux 6.x and confirmed that there is no regression. I could not replicate the issue or performance improvement though. [add results] * Ran [TestTransparentHugePageUsage](https://github.com/openjdk/jdk/commit/a65a89522d2f24b1767e1c74f6689a22ea32ca6a) for Shenandoah and verified that test passed * Ran tier 1, tier 2, tier1_gc_shenandoah, tier2_gc_shenandoah, tier3_gc_shenandoah and hotspot_gc_shenandoah.
------------- Commit messages: - 8335865: Shenandoah: Improve THP pretouch after JDK-8315923 Changes: https://git.openjdk.org/jdk/pull/20254/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20254&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335865 Stats: 18 lines in 1 file changed: 0 ins; 17 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20254/head:pull/20254 PR: https://git.openjdk.org/jdk/pull/20254 From rkennke at openjdk.org Mon Jul 22 16:01:37 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 22 Jul 2024 16:01:37 GMT Subject: RFR: 8333728: ubsan: shenandoahFreeSet.cpp:1347:24: runtime error: division by zero In-Reply-To: References: Message-ID: <1LqSYeeKEyJ--F9OBKB7w9VQFEJloob-wizI-sxrGUk=.6049a84c-1c5c-4cb7-bb52-d37ab8849cdc@github.com> On Fri, 12 Jul 2024 20:50:30 GMT, Henry Lin wrote: > Changed the check to require `linear` to be nonzero to prevent division by zero errors report by `ubsan`. `linear > 0` implies that `count > 0` so the count variable is no longer necessary. Looks good, thank you! ------------- Marked as reviewed by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20163#pullrequestreview-2191967705 From rkennke at openjdk.org Mon Jul 22 16:01:40 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 22 Jul 2024 16:01:40 GMT Subject: RFR: 8333088: ubsan: shenandoahAdaptiveHeuristics.cpp:245:44: runtime error: division by zero In-Reply-To: <1drlX-RUquqXP9FFKE1X3Uq6pUyRjYPdPwMtCjACUdM=.11f83216-e30e-454d-a62e-65089d6614d3@github.com> References: <1drlX-RUquqXP9FFKE1X3Uq6pUyRjYPdPwMtCjACUdM=.11f83216-e30e-454d-a62e-65089d6614d3@github.com> Message-ID: On Fri, 12 Jul 2024 19:55:13 GMT, Henry Lin wrote: > Changed comparison to multiplication to prevent division by zero errors reported by ubsan. Looks good, thank you! ------------- Marked as reviewed by rkennke (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20161#pullrequestreview-2191965028 From duke at openjdk.org Mon Jul 22 17:31:36 2024 From: duke at openjdk.org (duke) Date: Mon, 22 Jul 2024 17:31:36 GMT Subject: RFR: 8333088: ubsan: shenandoahAdaptiveHeuristics.cpp:245:44: runtime error: division by zero In-Reply-To: <1drlX-RUquqXP9FFKE1X3Uq6pUyRjYPdPwMtCjACUdM=.11f83216-e30e-454d-a62e-65089d6614d3@github.com> References: <1drlX-RUquqXP9FFKE1X3Uq6pUyRjYPdPwMtCjACUdM=.11f83216-e30e-454d-a62e-65089d6614d3@github.com> Message-ID: On Fri, 12 Jul 2024 19:55:13 GMT, Henry Lin wrote: > Changed comparison to multiplication to prevent division by zero errors reported by ubsan. @Henry-Lin-A Your change (at version 75e5435db0300f54ec2d88684f3a6b888badff4c) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20161#issuecomment-2243461370 From duke at openjdk.org Mon Jul 22 17:31:36 2024 From: duke at openjdk.org (Henry Lin) Date: Mon, 22 Jul 2024 17:31:36 GMT Subject: Integrated: 8333088: ubsan: shenandoahAdaptiveHeuristics.cpp:245:44: runtime error: division by zero In-Reply-To: <1drlX-RUquqXP9FFKE1X3Uq6pUyRjYPdPwMtCjACUdM=.11f83216-e30e-454d-a62e-65089d6614d3@github.com> References: <1drlX-RUquqXP9FFKE1X3Uq6pUyRjYPdPwMtCjACUdM=.11f83216-e30e-454d-a62e-65089d6614d3@github.com> Message-ID: <95otRcxDhReKuw9Rwafx2jgE1-4abHcRxmJ0p7sbBdM=.df00951d-5fe8-4595-b552-97a3e54b7d21@github.com> On Fri, 12 Jul 2024 19:55:13 GMT, Henry Lin wrote: > Changed comparison to multiplication to prevent division by zero errors reported by ubsan. This pull request has now been integrated. 
Changeset: 34eea6a5 Author: Henry Lin Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/34eea6a5fa27121bc0e9e8ace0894833d4a9f826 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8333088: ubsan: shenandoahAdaptiveHeuristics.cpp:245:44: runtime error: division by zero Reviewed-by: shade, rkennke ------------- PR: https://git.openjdk.org/jdk/pull/20161 From duke at openjdk.org Mon Jul 22 17:32:36 2024 From: duke at openjdk.org (duke) Date: Mon, 22 Jul 2024 17:32:36 GMT Subject: RFR: 8333728: ubsan: shenandoahFreeSet.cpp:1347:24: runtime error: division by zero In-Reply-To: References: Message-ID: On Fri, 12 Jul 2024 20:50:30 GMT, Henry Lin wrote: > Changed the check to require `linear` to be nonzero to prevent division by zero errors report by `ubsan`. `linear > 0` implies that `count > 0` so the count variable is no longer necessary. @Henry-Lin-A Your change (at version cf6fb062f7fac3e919ce0427b71ea0bf99054ac9) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20163#issuecomment-2243461577 From duke at openjdk.org Mon Jul 22 17:32:36 2024 From: duke at openjdk.org (Henry Lin) Date: Mon, 22 Jul 2024 17:32:36 GMT Subject: Integrated: 8333728: ubsan: shenandoahFreeSet.cpp:1347:24: runtime error: division by zero In-Reply-To: References: Message-ID: On Fri, 12 Jul 2024 20:50:30 GMT, Henry Lin wrote: > Changed the check to require `linear` to be nonzero to prevent division by zero errors report by `ubsan`. `linear > 0` implies that `count > 0` so the count variable is no longer necessary. This pull request has now been integrated. 
Changeset: b5575942 Author: Henry Lin Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/b5575942027281166676678e2081b024ec572644 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod 8333728: ubsan: shenandoahFreeSet.cpp:1347:24: runtime error: division by zero Reviewed-by: shade, rkennke ------------- PR: https://git.openjdk.org/jdk/pull/20163 From wkemper at openjdk.org Mon Jul 22 19:06:08 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 22 Jul 2024 19:06:08 GMT Subject: Integrated: Merge openjdk/jdk:master In-Reply-To: <4u_CVXx1SmkbJR8UG4lLfkkg_3-Uai0oJTqhHn1_fSk=.3064d970-3323-4029-a5a3-beeea8c10feb@github.com> References: <4u_CVXx1SmkbJR8UG4lLfkkg_3-Uai0oJTqhHn1_fSk=.3064d970-3323-4029-a5a3-beeea8c10feb@github.com> Message-ID: On Fri, 12 Jul 2024 17:15:53 GMT, William Kemper wrote: > Merges tag jdk-24+6 This pull request has now been integrated. Changeset: e30fa929 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/e30fa929b19e8b7d6db9073fdf640938d3db39fd Stats: 67311 lines in 1691 files changed: 41794 ins; 18185 del; 7332 mod Merge ------------- PR: https://git.openjdk.org/shenandoah/pull/460 From wkemper at openjdk.org Tue Jul 23 00:18:00 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 23 Jul 2024 00:18:00 GMT Subject: RFR: 8336944: Shenandoah: Should only relativize stack chunks for successful evacuations Message-ID: In some cases, different threads may race to evacuate an object. The race is won by "CAS"ing in the forwarding pointer. The threads that lose this race must "back out" their allocation. The work to relativize stack chunks should only happen for the thread that wins the evacuation race. 
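(Editorial sketch, not part of the original message.) The evacuation race described above can be illustrated with a small, self-contained C++ example. This is not the HotSpot implementation: `Obj`, `forwardee`, `post_processed`, and `evacuate` are hypothetical stand-ins, and `std::atomic` takes the place of the VM's own CAS on the object header. It only shows the general shape: every thread prepares a private copy, one CAS publishes a single winner, and any follow-up work (standing in here for stack chunk relativization) is done only by the thread whose CAS succeeded.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a heap object; real objects keep the
// forwarding pointer in the mark word rather than in a separate field.
struct Obj {
    std::atomic<Obj*> forwardee{nullptr};  // null means "not yet evacuated"
    int payload = 0;
    bool post_processed = false;           // stands in for relativization
};

// Try to evacuate `from` into `private_copy`. Returns the copy that ended up
// being the public one, which is the caller's copy only if it won the race.
Obj* evacuate(Obj* from, Obj* private_copy) {
    assert(from != nullptr && private_copy != nullptr);
    private_copy->payload = from->payload;  // copy into the private allocation

    Obj* expected = nullptr;
    if (from->forwardee.compare_exchange_strong(expected, private_copy)) {
        // We won: our copy is now public, so do the follow-up work here.
        private_copy->post_processed = true;
        return private_copy;
    }
    // We lost: a real collector would back out its allocation here.
    // `expected` now holds the winner's copy, which callers must use.
    return expected;
}
```

In this sketch a losing thread never touches the follow-up work at all, which is the property the proposed change is after.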
------------- Commit messages: - Only relativize stack chunks for the evacuated object that wins the race Changes: https://git.openjdk.org/jdk/pull/20288/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20288&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336944 Stats: 3 lines in 1 file changed: 1 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20288.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20288/head:pull/20288 PR: https://git.openjdk.org/jdk/pull/20288 From ysr at openjdk.org Tue Jul 23 01:33:33 2024 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Tue, 23 Jul 2024 01:33:33 GMT Subject: RFR: 8336944: Shenandoah: Should only relativize stack chunks for successful evacuations In-Reply-To: References: Message-ID: On Mon, 22 Jul 2024 23:56:46 GMT, William Kemper wrote: > In some cases, different threads may race to evacuate an object. The race is won by "CAS"ing in the forwarding pointer. The threads that lose this race must "back out" their allocation. The work to relativize stack chunks should only happen for the thread that wins the evacuation race. ? ------------- Marked as reviewed by ysr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20288#pullrequestreview-2192803782 From shade at openjdk.org Tue Jul 23 09:53:30 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 23 Jul 2024 09:53:30 GMT Subject: RFR: 8336944: Shenandoah: Should only relativize stack chunks for successful evacuations In-Reply-To: References: Message-ID: On Mon, 22 Jul 2024 23:56:46 GMT, William Kemper wrote: > In some cases, different threads may race to evacuate an object. The race is won by "CAS"ing in the forwarding pointer. The threads that lose this race must "back out" their allocation. The work to relativize stack chunks should only happen for the thread that wins the evacuation race. Current code effectively relativizes a private copy of the object, so there is no question about the races over it. 
New code relativizes when that private copy is now public. This looks less safe than before. So my question is then: why is relativizing a private copy problematic? ------------- PR Review: https://git.openjdk.org/jdk/pull/20288#pullrequestreview-2193512736 From rkennke at openjdk.org Tue Jul 23 09:59:37 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 23 Jul 2024 09:59:37 GMT Subject: RFR: 8336944: Shenandoah: Should only relativize stack chunks for successful evacuations In-Reply-To: References:

Message-ID: On Tue, 23 Jul 2024 09:50:57 GMT, Aleksey Shipilev wrote: > Current code effectively relativizes a private copy of the object, so there is no question about the races over it. New code relativizes when that private copy is now public. This looks less safer than before. > > So my question is then: why is relativizing a private copy problematic? IIRC, it requires the Klass*. With Lilliput, it is possible that the copy that lost the CAS copied the fwd-ptr that the winning thread installed. I guess we could follow through that fwd-ptr and grab the Klass* from the winning copy, but why bother? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20288#issuecomment-2244772101 From shade at openjdk.org Tue Jul 23 10:28:32 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 23 Jul 2024 10:28:32 GMT Subject: RFR: 8336944: Shenandoah: Should only relativize stack chunks for successful evacuations In-Reply-To: References:

Message-ID:

On Tue, 23 Jul 2024 09:56:59 GMT, Roman Kennke wrote:

> IIRC, it requires the Klass*. With Lilliput, it is possible that the copy that lost the CAS copied the fwd-ptr that the winning thread installed. I guess we could follow through that fwd-ptr and grab the Klass* from the winning copy, but why bother?

Right, I remembered this one: https://github.com/openjdk/lilliput/blob/fdfcf46a3a4bfcf0f58f9413788aed8acc746203/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp#L1171-L1185

I would sleep a little better if we did it similarly: reinstall the mark word into a private copy if we pulled the fwdptr already, instead of relying on concurrent relativization of the already published copy.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20288#issuecomment-2244835474

From shade at openjdk.org Tue Jul 23 11:18:45 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 23 Jul 2024 11:18:45 GMT
Subject: RFR: 8336944: Shenandoah: Should only relativize stack chunks for successful evacuations
In-Reply-To:
References:
Message-ID:

On Mon, 22 Jul 2024 23:56:46 GMT, William Kemper wrote:

> In some cases, different threads may race to evacuate an object. The race is won by "CAS"ing in the forwarding pointer. The threads that lose this race must "back out" their allocation. The work to relativize stack chunks should only happen for the thread that wins the evacuation race.

I suggest something like:

  // Additional twist for stack chunks
  if (copy_val->is_stackChunk()) {
    // We need to call into relativization code for stack chunks. For that, we need
    // a proper object with all metadata set right. There is a race with fwdptr
    // installation from the thread that might have already won the CAS and over-written
    // the mark word now holding the fwdptr, and we have just picked it up by accident.
    // Additionally, memory copy is not atomic, so whatever we read from that mark word
    // could be partial.
    //
    // Check that original mark is still not forwarded. This means no one has installed
    // the fwdptr yet, and so we overwrite mark in our private copy with a safe value,
    // relativize, and try to install the copy. In all other cases, the installation
    // would fail.
    markWord old_mark = p->mark();
    if (!old_mark.is_marked()) {
      copy_val->set_mark(old_mark);
      ContinuationGCSupport::relativize_stack_chunk(copy_val);
    }
  }

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20288#issuecomment-2244959839

From shade at openjdk.org Tue Jul 23 12:09:32 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 23 Jul 2024 12:09:32 GMT
Subject: RFR: 8336944: Shenandoah: Should only relativize stack chunks for successful evacuations
In-Reply-To:
References:
Message-ID:

On Mon, 22 Jul 2024 23:56:46 GMT, William Kemper wrote:

> In some cases, different threads may race to evacuate an object. The race is won by "CAS"ing in the forwarding pointer. The threads that lose this race must "back out" their allocation. The work to relativize stack chunks should only happen for the thread that wins the evacuation race.

Marked as reviewed by shade (Reviewer).

So, my original concern was that doing relativization on already public `StackChunk` instances exposes us to more risk. But now I am looking around, and seeing that we are actually doing this during marking as well, which is an even more frequent case! https://github.com/openjdk/jdk/blob/e83b4b236eca48d0b75094006f7f888398194fe4/src/hotspot/share/gc/shenandoah/shenandoahMark.inline.hpp#L75-L79

So I think we can go with a simple patch here.

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1185:

> 1183:   // Successfully evacuated. Our copy is now the public one!
> 1184:   shenandoah_assert_correct(nullptr, copy_val);
> 1185:   ContinuationGCSupport::relativize_stack_chunk(copy_val);

Please move it before the assert, so that we are sure relativization did not foobar the `copy_val`. Asserts in these methods verify that we return good things.
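
For illustration only, the pattern under discussion (race to install a forwarding pointer with a CAS, and post-process the copy only in the thread that wins) can be sketched in standalone C++ as below. This is a minimal sketch, not HotSpot code: `Obj`, `fwd`, `relativized` and `try_evacuate` are hypothetical names, and the `relativized` flag merely stands in for the stack-chunk relativization step.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a heap object; not a HotSpot type.
struct Obj {
  std::atomic<Obj*> fwd{nullptr};  // forwarding pointer, null until evacuated
  bool relativized = false;        // stands in for the stack-chunk fix-up
};

// Tries to publish `copy` as the evacuated version of `from`.
// Returns whichever object ended up as the public copy.
Obj* try_evacuate(Obj* from, Obj* copy) {
  assert(from != nullptr && copy != nullptr);
  Obj* expected = nullptr;
  if (from->fwd.compare_exchange_strong(expected, copy)) {
    // Won the race: our copy is now the public one, so post-process it here.
    copy->relativized = true;
    return copy;
  }
  // Lost the race: leave our private copy untouched ("back out")
  // and use the copy installed by the winning thread.
  return expected;
}
```

In this sketch a losing thread never touches its own copy after the failed CAS, which mirrors the intent stated in the PR description.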
------------- PR Review: https://git.openjdk.org/jdk/pull/20288#pullrequestreview-2193791051 PR Comment: https://git.openjdk.org/jdk/pull/20288#issuecomment-2245060685 PR Review Comment: https://git.openjdk.org/jdk/pull/20288#discussion_r1687944160 From wkemper at openjdk.org Tue Jul 23 15:19:32 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 23 Jul 2024 15:19:32 GMT Subject: RFR: 8336944: Shenandoah: Should only relativize stack chunks for successful evacuations In-Reply-To: References:

Message-ID:

On Tue, 23 Jul 2024 09:56:59 GMT, Roman Kennke wrote:

> > Current code effectively relativizes a private copy of the object, so there is no question about the races over it. New code relativizes when that private copy is now public. This looks less safer than before.
> > So my question is then: why is relativizing a private copy problematic?
>
> IIRC, it requires the Klass*. With Lilliput, it is possible that the copy that lost the CAS copied the fwd-ptr that the winning thread installed. I guess we could follow through that fwd-ptr and grab the Klass* from the winning copy, but why bother?

Yes, exactly. In the crash I debugged, the `copy_val` included the forwarding pointer installed by the winning thread.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20288#issuecomment-2245536056

From wkemper at openjdk.org Tue Jul 23 15:19:34 2024
From: wkemper at openjdk.org (William Kemper)
Date: Tue, 23 Jul 2024 15:19:34 GMT
Subject: RFR: 8336944: Shenandoah: Should only relativize stack chunks for successful evacuations
In-Reply-To:
References:

Message-ID: On Tue, 23 Jul 2024 12:06:26 GMT, Aleksey Shipilev wrote: >> In some cases, different threads may race to evacuate an object. The race is won by "CAS"ing in the forwarding pointer. The threads that lose this race must "back out" their allocation. The work to relativize stack chunks should only happen for the thread that wins the evacuation race. > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1185: > >> 1183: // Successfully evacuated. Our copy is now the public one! >> 1184: shenandoah_assert_correct(nullptr, copy_val); >> 1185: ContinuationGCSupport::relativize_stack_chunk(copy_val); > > Please move it before the assert, so that we are sure relativization did not foobar the `copy_val`. Asserts in these methods verify that we return good things. Okay, I had it that way originally, but then changed it to make sure we were relativizing a valid copy. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20288#discussion_r1688249719 From wkemper at openjdk.org Tue Jul 23 16:53:50 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 23 Jul 2024 16:53:50 GMT Subject: RFR: 8336944: Shenandoah: Should only relativize stack chunks for successful evacuations [v2] In-Reply-To: References: Message-ID: <73hYnhJmzNdiODBPRF4Zz3YcmPNHkdAc0QgXszBRPSA=.c45fcbc9-3f86-458e-859b-a29801de7592@github.com> > In some cases, different threads may race to evacuate an object. The race is won by "CAS"ing in the forwarding pointer. The threads that lose this race must "back out" their allocation. The work to relativize stack chunks should only happen for the thread that wins the evacuation race. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Assert that return value is correct after relativization of stack chunk ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20288/files - new: https://git.openjdk.org/jdk/pull/20288/files/ad832e7e..6943b79a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20288&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20288&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20288.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20288/head:pull/20288 PR: https://git.openjdk.org/jdk/pull/20288 From shade at openjdk.org Tue Jul 23 16:53:50 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 23 Jul 2024 16:53:50 GMT Subject: RFR: 8336944: Shenandoah: Should only relativize stack chunks for successful evacuations [v2] In-Reply-To: <73hYnhJmzNdiODBPRF4Zz3YcmPNHkdAc0QgXszBRPSA=.c45fcbc9-3f86-458e-859b-a29801de7592@github.com> References: <73hYnhJmzNdiODBPRF4Zz3YcmPNHkdAc0QgXszBRPSA=.c45fcbc9-3f86-458e-859b-a29801de7592@github.com> Message-ID: On Tue, 23 Jul 2024 16:50:32 GMT, William Kemper wrote: >> In some cases, different threads may race to evacuate an object. The race is won by "CAS"ing in the forwarding pointer. The threads that lose this race must "back out" their allocation. The work to relativize stack chunks should only happen for the thread that wins the evacuation race. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Assert that return value is correct after relativization of stack chunk Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20288#pullrequestreview-2194520892 From wkemper at openjdk.org Tue Jul 23 16:53:50 2024 From: wkemper at openjdk.org (William Kemper) Date: Tue, 23 Jul 2024 16:53:50 GMT Subject: Integrated: 8336944: Shenandoah: Should only relativize stack chunks for successful evacuations In-Reply-To: References: Message-ID: On Mon, 22 Jul 2024 23:56:46 GMT, William Kemper wrote: > In some cases, different threads may race to evacuate an object. The race is won by "CAS"ing in the forwarding pointer. The threads that lose this race must "back out" their allocation. The work to relativize stack chunks should only happen for the thread that wins the evacuation race. This pull request has now been integrated. Changeset: 2f2223d7 Author: William Kemper URL: https://git.openjdk.org/jdk/commit/2f2223d7524c4405cc7ca6ab77da62016bbfa911 Stats: 3 lines in 1 file changed: 1 ins; 2 del; 0 mod 8336944: Shenandoah: Should only relativize stack chunks for successful evacuations Reviewed-by: shade, ysr ------------- PR: https://git.openjdk.org/jdk/pull/20288 From wkemper at openjdk.org Wed Jul 24 18:12:40 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 24 Jul 2024 18:12:40 GMT Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update mode Message-ID: We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development. 
------------- Commit messages: - Remove last vestiges of incremental update mode - Missed test, remove actual IU barrier flag - Remove missed iu_barrier usages for C1 - Update test (all barriers can be enabled now for all modes) - WIP: Remove incremental update mode Changes: https://git.openjdk.org/jdk/pull/20316/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20316&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336685 Stats: 1696 lines in 69 files changed: 4 ins; 1658 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/20316.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20316/head:pull/20316 PR: https://git.openjdk.org/jdk/pull/20316 From shade at openjdk.org Wed Jul 24 18:44:31 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 24 Jul 2024 18:44:31 GMT Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update mode In-Reply-To: References: Message-ID: On Wed, 24 Jul 2024 18:08:46 GMT, William Kemper wrote: > We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development. > > ## Testing > * hotspot_gc_shenandoah > * dacapo > * diluvian > * extremem > * hyperalloc > * specjbb2015 > * specjvm2008 Good riddance. I have to comb through this more accurately tomorrow, but first pass comments below. src/hotspot/cpu/ppc/gc/shenandoah/shenandoahBarrierSetAssembler_ppc.cpp line 571: > 569: /* ==== Apply keep-alive barrier, if required (e.g., to inhibit weak reference resurrection) ==== */ > 570: if (ShenandoahBarrierSet::need_keep_alive_barrier(decorators, type)) { > 571: if (ShenandoahSATBBarrier) { A bit weird to replace IU with SATB barrier here. src/hotspot/share/opto/classes.hpp line 327: > 325: shmacro(ShenandoahWeakCompareAndSwapN) > 326: shmacro(ShenandoahWeakCompareAndSwapP) > 327: I think this newline is unnecessary. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20316#pullrequestreview-2197480695 PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1690256658 PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1690269614 From kdnilsen at openjdk.org Wed Jul 24 18:55:32 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 24 Jul 2024 18:55:32 GMT Subject: RFR: 8336685: Shenandoah: Remove experimental incremental update mode In-Reply-To: References:

Message-ID:

On Wed, 24 Jul 2024 18:25:38 GMT, Aleksey Shipilev wrote:

>> We've reason to believe that this mode is very rarely used and its maintenance has become a burden for future development.
>>
>> ## Testing
>> * hotspot_gc_shenandoah
>> * dacapo
>> * diluvian
>> * extremem
>> * hyperalloc
>> * specjbb2015
>> * specjvm2008
>
> src/hotspot/cpu/ppc/gc/shenandoah/shenandoahBarrierSetAssembler_ppc.cpp line 571:
>
>> 569:   /* ==== Apply keep-alive barrier, if required (e.g., to inhibit weak reference resurrection) ==== */
>> 570:   if (ShenandoahBarrierSet::need_keep_alive_barrier(decorators, type)) {
>> 571:     if (ShenandoahSATBBarrier) {
>
> A bit weird to replace IU with SATB barrier here.

Will need_keep_alive_barrier() always be false in the absence of IU mode support? Can we replace this with an assert?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20316#discussion_r1690285365

From xpeng at openjdk.org Wed Jul 24 19:10:45 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Wed, 24 Jul 2024 19:10:45 GMT
Subject: RFR: 8336640: Shenandoah: Parallel worker use in parallel_heap_region_iterate
Message-ID:

[parallel_heap_region_iterate](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp#L1726-L1734) is used to execute lightweight operations on heap regions, including ShenandoahPrepareForMarkClosure, ShenandoahInitMarkUpdateRegionStateClosure, ShenandoahFinalUpdateRefsUpdateRegionStateClosure, ShenandoahResetUpdateRegionStateClosure and ShenandoahFinalMarkUpdateRegionStateClosure. Since all of these operations are very lightweight, in regular cases without a large number of heap regions the parallelism seems to be overkill, because the cost of orchestrating multiple threads could be more expensive than the work itself; in most cases, a single thread should be more efficient. Also, if multi-threading is needed, we should maximize the utilization of all active workers for best performance.

This PR includes proposed improvements addressing the known issues:
1. Change the default value of ShenandoahParallelRegionStride to 0; when it is 0, Shenandoah automatically derives the stride for best performance.
2. If num_regions is <= 4096, do not use worker threads at all, to avoid the overhead of multi-threading.
3. When num_regions is more than 4096, use worker threads to parallelize the workload, and derive the stride so that the workload is distributed evenly across all active workers.
4. When the number of active workers is 1, do not bother the workers; it is faster to finish the workload in the current thread (avoiding the overhead of multi-thread orchestration).

Here are some time metrics I collected from a test with the TIP version (I added time metrics for parallel_heap_region_iterate):

JVM args: export JAVA_OPTS="-Xms8G -Xmx8G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahParallelRegionStride= -XX:ShenandoahTargetNumRegions= -Xlog:gc*"

|              | 1024 regions | 2048 regions | 4096 regions | 8192 regions | 16384 regions |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------- |
| 1024 stride  | 5785 ns      | 22194 ns     | 20953 ns     | 23008 ns     | 33013 ns      |
| 2048 stride  | N/A          | 6491 ns      | 22476 ns     | 25842 ns     | 34378 ns      |
| 4096 stride  | N/A          | N/A          | 14034 ns     | 28425 ns     | 36324 ns      |
| 8192 stride  | N/A          | N/A          | N/A          | 24359 ns     | 45231 ns      |
| 16384 stride | N/A          | N/A          | N/A          | N/A          | 53679 ns      |

Basically, when we increase the stride, fewer threads are used for the parallel iteration and we get worse latency, which is expected. When the number of regions is the same as the stride, multi-threading is not used; using a single thread to process 4096 regions is much better than using 4 threads (1024 stride).

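
The stride rules listed in this message can be sketched in standalone C++ roughly as follows. This is a hedged illustration under stated assumptions, not the patch itself: `derive_stride` and `use_workers` are hypothetical helper names, `configured_stride` stands in for the value of ShenandoahParallelRegionStride (0 meaning "derive automatically"), and the ceiling division is written with integer arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: picks the stride (regions per work unit).
// configured_stride == 0 means "derive it automatically".
std::size_t derive_stride(std::size_t configured_stride,
                          std::size_t n_regions,
                          unsigned active_workers) {
  assert(active_workers > 0);
  std::size_t stride = configured_stride;
  if (stride == 0 && active_workers > 1) {
    // Below the threshold, splitting the work is not worth the overhead.
    const std::size_t threshold = 4096;
    stride = (n_regions <= threshold)
                 ? threshold
                 // Ceiling division: spread regions evenly over the workers.
                 : (n_regions + active_workers - 1) / active_workers;
  }
  return stride;
}

// Hypothetical helper: true when the iteration would go to worker threads,
// false when it would run in the calling thread.
bool use_workers(std::size_t stride, std::size_t n_regions,
                 unsigned active_workers) {
  return n_regions > stride && active_workers > 1;
}
```

For example, with 8192 regions and 4 active workers this sketch derives a stride of 2048, so each worker would get one equally sized chunk; with 4096 regions or fewer, or with a single active worker, the work stays in the calling thread.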
For the PR, I also tested with the following JVM args: export JAVA_OPTS="-Xms8G -Xmx8G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahTargetNumRegions= -Xlog:gc*"

| Regions | time (ns) |
| ------- | --------- |
| 1024    | 5103      |
| 2048    | 6132      |
| 4096    | 12763     |
| 8192    | 24295     |
| 16384   | 33729     |

Overall, the performance is at or near the best measured above, regardless of the number of heap regions.

Additional test:
- [ ] `make test TEST=hotspot_gc_shenandoah`

-------------

Commit messages:
 - Add empty line
 - clean
 - Fix build error on Windows
 - Revert "Add timing logs for execution of ShenandoahHeapRegionClosure"
 - Remove the default arg value to constructor of ShenandoahParallelHeapRegionTask
 - Auto derive stride for ShenandoahParallelHeapRegionTask when ShenandoahParallelRegionStride is set to 0
 - Dynamic calculate stride
 - Add timing logs for execution of ShenandoahHeapRegionClosure
 - Recalibrate ShenandoahParallelRegionStride value if there is no override from JVM args

Changes: https://git.openjdk.org/jdk/pull/20305/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20305&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8336640
  Stats: 21 lines in 2 files changed: 14 ins; 0 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/20305.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20305/head:pull/20305

PR: https://git.openjdk.org/jdk/pull/20305

From shade at openjdk.org Wed Jul 24 19:10:46 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 24 Jul 2024 19:10:46 GMT
Subject: RFR: 8336640: Shenandoah: Parallel worker use in parallel_heap_region_iterate
In-Reply-To:
References:
Message-ID:

On Wed, 24 Jul 2024 00:42:22 GMT, Xiaolong Peng wrote:

> [parallel_heap_region_iterate](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp#L1726-L1734) is used to execute lightweight operations on heap regions, including ShenandoahPrepareForMarkClosure, ShenandoahInitMarkUpdateRegionStateClosure, 
ShenandoahFinalUpdateRefsUpdateRegionStateClosure, ShenandoahResetUpdateRegionStateClosure and ShenandoahFinalMarkUpdateRegionStateClosure. Since all the operations are very lightweight, in regular cases w/o large number of heap regions, the parallelism seems to be an overkill because the cost of multi-thread orchestrating could be more expensive; In most cases, single thread should be more efficient. Also, if multiple threading is needed, we should maximize the utilization of all active workers for best performance. > > This PR includes proposed improvments addressing the known issues: > 1. Change the default value of ShenandoahParallelRegionStride to 0, when it is 0, Shenandoah will auto derive the value of stride for best performance; > 2. if num_regions is <= 4096, not use worker threads at all to avoid the overhead of multi-threading; > 3. When num_regions is more than 4096, use worker threads to parallelize the workload, derive the value of stride to evenly distribute the workload to all active workers. > 4. 
When number of active workers is 1, don't bother the workers, it is faster to finish the workload in current thread(avoid overhead of multi-threads orchestration) > > There are some time metrics I collected from test with TIP version(I added time metrics for parallel_heap_region_iterate): > > JVM args: export JAVA_OPTS="-Xms8G -Xmx8G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahParallelRegionStride= -XX:ShenandoahTargetNumRegions= -Xlog:gc*" > > | | 1024 regions | 2048 regions | 4096 regions | 8192 regions |16384 regions | > | ----------- | ------------ | ------------ | ------------ | ------------ |------------ | > | 1024 stride | 5785 ns | 22194 ns | 20953 ns | 23008 ns |33013 ns | > | 2048 stride | N/A | 6491 ns | 22476 ns | 25842 ns |34378 ns | > | 4096 stride | N/A | N/A | 14034 ns | 28425 ns |36324 ns | > | 8192 stride | N/A | N/A | N/A | 24359 ns |45231 ns | > | 16384 stride | N/A | N/A | N/A | N/A |53679 ns | > > Basically w... src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1697: > 1695: ShenandoahHeap* const _heap; > 1696: ShenandoahHeapRegionClosure* const _blk; > 1697: size_t _stride; Should be `size_t const _stride;`? src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1729: > 1727: void ShenandoahHeap::parallel_heap_region_iterate(ShenandoahHeapRegionClosure* blk) const { > 1728: assert(blk->is_thread_safe(), "Only thread-safe closures here"); > 1729: const uint active_workers = workers() -> active_workers(); Suggestion: const uint active_workers = workers()->active_workers(); src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1737: > 1735: // not use worker threads to avoid the overhead; otherwise cacluate the stride by num_regions/active_workers > 1736: // to make sure every worker thread will have same amount of workload. > 1737: stride = n_regions <= 4096 ? 
4096 : checked_cast(ceil(checked_cast(n_regions) / checked_cast(active_workers)));

I suggest writing it like this:

  size_t stride = ShenandoahParallelRegionStride;
  if (stride == 0 && active_workers > 1) {
    // Automatically derive the stride to balance the work between threads
    // evenly. Do not try to split work if below the reasonable threshold.
    const size_t threshold = 4096;
    stride = (n_regions <= threshold) ?
             threshold :
             (n_regions + active_workers - 1) / active_workers;
  }

  if (n_regions > stride && active_workers > 1) {

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20305#discussion_r1690045161
PR Review Comment: https://git.openjdk.org/jdk/pull/20305#discussion_r1690045677
PR Review Comment: https://git.openjdk.org/jdk/pull/20305#discussion_r1690214436

From xpeng at openjdk.org Wed Jul 24 19:10:47 2024
From: xpeng at openjdk.org (Xiaolong Peng)
Date: Wed, 24 Jul 2024 19:10:47 GMT
Subject: RFR: 8336640: Shenandoah: Parallel worker use in parallel_heap_region_iterate
In-Reply-To:
References:

Message-ID: <_pbtLdrJiJw6Wlrim13_cg5G8qTlxiqxHLJHCpmmjVc=.928b9c4a-681e-4dd1-8c0a-d531f2000504@github.com> On Wed, 24 Jul 2024 15:42:50 GMT, Aleksey Shipilev wrote: >> [parallel_heap_region_iterate](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp#L1726-L1734) is used to execute lightweight operations on heap regions, including ShenandoahPrepareForMarkClosure, ShenandoahInitMarkUpdateRegionStateClosure, ShenandoahFinalUpdateRefsUpdateRegionStateClosure, ShenandoahResetUpdateRegionStateClosure and ShenandoahFinalMarkUpdateRegionStateClosure. Since all the operations are very lightweight, in regular cases w/o large number of heap regions, the parallelism seems to be an overkill because the cost of multi-thread orchestrating could be more expensive; In most cases, single thread should be more efficient. Also, if multiple threading is needed, we should maximize the utilization of all active workers for best performance. >> >> This PR includes proposed improvments addressing the known issues: >> 1. Change the default value of ShenandoahParallelRegionStride to 0, when it is 0, Shenandoah will auto derive the value of stride for best performance; >> 2. if num_regions is <= 4096, not use worker threads at all to avoid the overhead of multi-threading; >> 3. When num_regions is more than 4096, use worker threads to parallelize the workload, derive the value of stride to evenly distribute the workload to all active workers. >> 4. When number of active workers is 1, don't bother the workers, it is faster to finish the workload in current thread(avoid overhead of multi-threads orchestration) >> >> There are some time metrics I collected from test with TIP version(I added time metrics for parallel_heap_region_iterate): >> >> JVM args: export JAVA_OPTS="-Xms8G -Xmx8G -XX:+AlwaysPreTouch -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:ShenandoahParallelRegionStride= -XX:ShenandoahTargetNumRegions=