From eosterlund at openjdk.java.net Sun Aug 1 12:19:34 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Sun, 1 Aug 2021 12:19:34 GMT Subject: RFR: 8193559: ugly DO_JAVA_THREADS macro should be replaced [v5] In-Reply-To: References: Message-ID: On Fri, 30 Jul 2021 20:53:53 GMT, Daniel D. Daugherty wrote: >> This is a simple rewrite of what is "Possibly the ugliest for loop the world has seen." >> Thanks to @stefank for the draft proposed fix. Thanks to @fisk for providing this >> piece of history that I'm finally getting around to cleaning up. While this macro has >> been with us for a long time, its time has passed... >> >> Tested with Mach5 Tier[1-3]. > > Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: > > kbarrett CR - simplify 'ThreadsList::Iterator::operator!=(Iterator i)' Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4671 From dcubed at openjdk.java.net Sun Aug 1 13:41:34 2021 From: dcubed at openjdk.java.net (Daniel D. Daugherty) Date: Sun, 1 Aug 2021 13:41:34 GMT Subject: RFR: 8193559: ugly DO_JAVA_THREADS macro should be replaced [v5] In-Reply-To: <_ZO1aky7oyRjpY6NTFG_fjGhd9sr0ucpgQ8wO7VXTH8=.8491610a-5043-42f5-a8cc-1e6a906066aa@github.com> References: <_ZO1aky7oyRjpY6NTFG_fjGhd9sr0ucpgQ8wO7VXTH8=.8491610a-5043-42f5-a8cc-1e6a906066aa@github.com> Message-ID: On Sat, 31 Jul 2021 17:37:52 GMT, Kim Barrett wrote: >> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> kbarrett CR - simplify 'ThreadsList::Iterator::operator!=(Iterator i)' > > Looks good. @kimbarrett and @fisk - Thanks for the re-reviews! @dholmes-ora - are you good with the latest version?
------------- PR: https://git.openjdk.java.net/jdk/pull/4671 From dholmes at openjdk.java.net Mon Aug 2 01:53:32 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 2 Aug 2021 01:53:32 GMT Subject: RFR: 8193559: ugly DO_JAVA_THREADS macro should be replaced [v5] In-Reply-To: References: Message-ID: On Fri, 30 Jul 2021 20:53:53 GMT, Daniel D. Daugherty wrote: >> This is a simple rewrite of what is "Possibly the ugliest for loop the world has seen." >> Thanks to @stefank for the draft proposed fix. Thanks to @fisk for providing this >> piece of history that I'm finally getting around to cleaning up. While this macro has >> been with us for a long time, its time has passed... >> >> Tested with Mach5 Tier[1-3]. > > Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: > > kbarrett CR - simplify 'ThreadsList::Iterator::operator!=(Iterator i)' Still LGTM! Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4671 From iwalulya at openjdk.java.net Mon Aug 2 07:22:32 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 2 Aug 2021 07:22:32 GMT Subject: RFR: 8271217: Fix race between G1PeriodicGCTask checks and GC request In-Reply-To: References: Message-ID: <8gQNCSwFUNPwQ0TT6-NA_2bOJAdtI1rMnMykc8VHqLE=.23e93c0c-dbc1-43ad-a0c1-c5b25ce1687b@github.com> On Mon, 26 Jul 2021 18:05:10 GMT, Kim Barrett wrote: > Please review this change to remove a race involving G1PeriodicGCTask. The > task performs various checks to determine whether a periodic GC should be > performed. If they pass, then it requests the GC by calling try_collect. > The problem is that an unrelated GC could occur after the task's checks but > before its request is processed. In that case we don't want to do the > periodic GC after all, but there's nothing to prevent it. 
> > This change lets the task capture the various GC counters as part of its > checking, and pass them to try_collect for use in detecting an intervening > GC. > > Testing: > mach5 tier1 Lgtm! ------------- Marked as reviewed by iwalulya (Committer). PR: https://git.openjdk.java.net/jdk/pull/4905 From thomas.schatzl at oracle.com Mon Aug 2 13:57:41 2021 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 2 Aug 2021 15:57:41 +0200 Subject: Status of JEP-8204088/JDK-8236073 In-Reply-To: References: Message-ID: Hi Man, On 27.07.21 01:39, Man Cao wrote: > Hi all, > > I added the "g1-heap-resizing" label to the issues that have been > brought up in the thread: > https://bugs.openjdk.java.net/issues/?jql=labels+%3D+g1-heap-resizing > . > We agree that most of these issues should be resolved first, before > further working on SoftMaxHeapSize or developing an external component > to automatically set GCTimeRatio/GCCpuPercentage at runtime. Thanks. > > Regarding CurrentMaxHeapSize, it is tracked in > https://bugs.openjdk.java.net/browse/JDK-8204088 > . It seems more > independent of other g1-heap-resizing issues, so perhaps it could be > implemented earlier. On the other hand, once most of the > g1-heap-resizing issues are fixed, SoftMaxHeapSize should behave better. > One main goal of those g1-heap-resizing issues is to fix the problem > that "G1 expands the heap too aggressively". Fwiw, these two are fairly independent issues, and the main problem with the proposed CurrentMaxHeapSize patch has been that the change is not atomic in the sense that the changes simply modify the global heap size variable without regard whether and when it is accessed. That may result in fairly nasty races as different components of the GC may then have a different notion of what the current max heap size actually is at the moment. > > I agree that CurrentMaxHeapSize can address several problems that cannot > be solved by tuning SoftMaxHeapSize and/or GCTimeRatio/GCCpuPercentage.
> However, it may introduce other issues and does not satisfy all of our > requirements. In particular: > - In our experience, having a tight Xmx is prone to GC thrashing, which > could make servers unresponsive and waste CPU resources, and is worse > than getting killed by a container manager due to running out of > container RAM (because a killed process can be restarted easily). I > suspect using CurrentMaxHeapSize instead of SoftMaxHeapSize will > increase the chance of GC thrashing, when it is better to let the > process run out of container RAM limit and get killed and restarted. > > - We found that turning off CompressedOops could waste 30%-40% > heap memory. This means many of our servers are better off keeping Xmx > below 32G. Thus, many of our servers cannot directly use the strategy of > setting Xmx to the maximum memory available, then dynamically adjusting > CurrentMaxHeapSize. Admittedly, this problem exists for both > CurrentMaxHeapSize and SoftMaxHeapSize. We will probably need to develop > a separate strategy to determine when to make Xmx go above 32G. It is > also possible to set -XX:ObjectAlignmentInBytes=16, which can then use > CompressedOops up to 64G of Xmx. > > We are definitely happy to experiment and compare automatically setting > CurrentMaxHeapSize vs SoftMaxHeapSize. Again, SoftMaxHeapSize is not > ready until most g1-heap-resizing issues are fixed. +1 Thomas From dcubed at openjdk.java.net Mon Aug 2 16:01:36 2021 From: dcubed at openjdk.java.net (Daniel D. Daugherty) Date: Mon, 2 Aug 2021 16:01:36 GMT Subject: RFR: 8193559: ugly DO_JAVA_THREADS macro should be replaced [v5] In-Reply-To: References: Message-ID: On Mon, 2 Aug 2021 01:50:53 GMT, David Holmes wrote: >> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> kbarrett CR - simplify 'ThreadsList::Iterator::operator!=(Iterator i)' > > Still LGTM! > > Thanks, > David @dholmes-ora - Thanks for the re-review!
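[Editor's note] For readers curious what the macro-free shape discussed in the 8193559 thread looks like, here is a deliberately simplified sketch. This is not the actual HotSpot ThreadsList code; the class layout and names below are an invented toy model of an STL-style iterator over a thread snapshot, with the 'operator!=' simplification from the review reduced to a plain cursor comparison.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a JavaThread; only an id for demonstration.
struct Thread { int id; };

// Toy model of a thread-list snapshot with an STL-style iterator, so that
// call sites can use a range-based for loop instead of an iteration macro.
class ThreadsList {
  Thread* const* _threads;
  std::size_t _length;
public:
  ThreadsList(Thread* const* threads, std::size_t length)
    : _threads(threads), _length(length) {}

  class Iterator {
    Thread* const* _cur;
  public:
    explicit Iterator(Thread* const* cur) : _cur(cur) {}
    Thread* operator*() const { return *_cur; }
    Iterator& operator++() { ++_cur; return *this; }
    // Simplified comparison in the spirit of the review comment:
    // compare the underlying cursor directly.
    bool operator!=(Iterator i) const { return _cur != i._cur; }
  };

  Iterator begin() const { return Iterator(_threads); }
  Iterator end() const { return Iterator(_threads + _length); }
};

// Example traversal: the loop shape the iterator enables.
int sum_ids(const ThreadsList& list) {
  int sum = 0;
  for (Thread* t : list) {
    sum += t->id;
  }
  return sum;
}
```

With begin()/end() in place, a range-based for loop can replace the old macro at every call site.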
------------- PR: https://git.openjdk.java.net/jdk/pull/4671 From dcubed at openjdk.java.net Mon Aug 2 16:04:41 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Mon, 2 Aug 2021 16:04:41 GMT Subject: Integrated: 8193559: ugly DO_JAVA_THREADS macro should be replaced In-Reply-To: References: Message-ID: On Fri, 2 Jul 2021 16:03:57 GMT, Daniel D. Daugherty wrote: > This is a simple rewrite of what is "Possibly the ugliest for loop the world has seen." > Thanks to @stefank for the draft proposed fix. Thanks to @fisk for providing this > piece of history that I'm finally getting around to cleaning up. While this macro has > been with us for a long time, its time has passed... > > Tested with Mach5 Tier[1-3]. This pull request has now been integrated. Changeset: 0a852363 Author: Daniel D. Daugherty URL: https://git.openjdk.java.net/jdk/commit/0a85236396c667c8d2c890e4384c623b39455075 Stats: 133 lines in 4 files changed: 107 ins; 23 del; 3 mod 8193559: ugly DO_JAVA_THREADS macro should be replaced Co-authored-by: Kim Barrett Reviewed-by: eosterlund, ayang, kbarrett, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/4671 From tschatzl at openjdk.java.net Mon Aug 2 16:15:34 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 2 Aug 2021 16:15:34 GMT Subject: RFR: 8270842: G1: Only young regions need to redirty outside references in remset. In-Reply-To: References: Message-ID: On Wed, 21 Jul 2021 02:13:38 GMT, Hamlin Li wrote: > For evac failure objects in non-young regions (old) of cset, the outside reference remset already recorded in the dirty queue, we only needs to do it for obj in young regions The problem of patch H is that in `G1ScanEvacuatedObjClosure::do_oop_work` the code does not always enqueue cards but only if the reference is already outside of the collection set. 
If inside the collection set, we push it on the task queue, meaning that ultimately we call `G1ParScanThreadState::do_oop_evac` on it; the suggested patch A then unconditionally enqueues the card that we "missed" just before. I experimented a bit how to remove the iteration during self-forwarding pointer removal, one option (not 100% finished, one FIXME and some commented out asserts and the new class isn't as nice as it could be, but it seems to work) is here: https://github.com/tschatzl/jdk/tree/submit/evac-failure-no-scan-during-remove-self-forwards Not sure if I like it, and we would need to test if it actually an improvement (i.e. faster overall - important for region based pinning). So for now the original patch from you seems best to continue with. Initial look at it seems good, but let me push it through our internal testing and re-review it. ------------- PR: https://git.openjdk.java.net/jdk/pull/4853 From stuefe at openjdk.java.net Tue Aug 3 07:31:57 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 3 Aug 2021 07:31:57 GMT Subject: RFR: JDK-8271721: Split gc/g1/TestMixedGCLiveThreshold into separate tests Message-ID: Hi, may I please have reviews for this trivial change: TestMixedGCLiveThreshold executes three tests in one by spawning off three child processes sequentially. In order to make the tests easier to analyze and to be able to execute them concurrently it would be helpful to have three subtests instead. I also added diagnostic output. Note: this was motivated while looking at test errors on ppc similar to JDK-8211081. 
..Thomas ------------- Commit messages: - start Changes: https://git.openjdk.java.net/jdk/pull/4968/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4968&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271721 Stats: 35 lines in 1 file changed: 23 ins; 5 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/4968.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4968/head:pull/4968 PR: https://git.openjdk.java.net/jdk/pull/4968 From tschatzl at openjdk.java.net Tue Aug 3 08:04:30 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 3 Aug 2021 08:04:30 GMT Subject: RFR: JDK-8271721: Split gc/g1/TestMixedGCLiveThreshold into separate tests In-Reply-To: References: Message-ID: On Tue, 3 Aug 2021 07:24:49 GMT, Thomas Stuefe wrote: > Hi, may I please have reviews for this trivial change: > > TestMixedGCLiveThreshold executes three tests in one by spawning off three child processes sequentially. In order to make the tests easier to analyze and to be able to execute them concurrently it would be helpful to have three subtests instead. > > I also added diagnostic output. > > Note: this was motivated while looking at test errors on ppc similar to JDK-8211081. > > ..Thomas Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4968 From eosterlund at openjdk.java.net Tue Aug 3 08:19:56 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Tue, 3 Aug 2021 08:19:56 GMT Subject: [jdk17] RFR: 8271064: ZGC several jvm08 perf regressions after JDK-8268372 Message-ID: After the integration of ZGC dynamic thread selection, there was a regression that popped up on a few benchmarks that all had in common a high allocation rate and very quick GC times. An extra conservative measure for not trusting the heuristics when using few GC threads, did not play very well with this scenario.
The effect of it is back-to-back GC causing thousands of GCs when <100 were really needed. Removing this conservative measure causes the regression to go away, and a bunch of green benchmarks to pop up instead of red ones. This suggests that the conservative measure isn't working very well and does not scale to these scenarios. It is also already taken care of by other conservative measures. I propose it is removed, to solve the regressions. The entire ZGC perf suite was rerun (no red, only green), and tier1-3 was run. There isn't much potential for things to blow up as I just removed one parameter in the heuristic. So if anything nasty was to show up, it would be in the perf runs. ------------- Commit messages: - 8271064: zgc dynamic threads performance regression fix Changes: https://git.openjdk.java.net/jdk17/pull/298/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk17&pr=298&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271064 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk17/pull/298.diff Fetch: git fetch https://git.openjdk.java.net/jdk17 pull/298/head:pull/298 PR: https://git.openjdk.java.net/jdk17/pull/298 From ayang at openjdk.java.net Tue Aug 3 08:27:46 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 3 Aug 2021 08:27:46 GMT Subject: [jdk17] RFR: 8271064: ZGC several jvm08 perf regressions after JDK-8268372 In-Reply-To: References: Message-ID: <1SWdYM3veIQNe-I_wTJL1RRW374Dur4B0Vzux-AWRsI=.6dce58de-46f0-4ab7-9e60-7f59f857e274@github.com> On Tue, 3 Aug 2021 08:13:02 GMT, Erik Österlund wrote: > After the integration of ZGC dynamic thread selection, there was a regression that popped up on a few benchmarks that all had in common a high allocation rate and very quick GC times. An extra conservative measure for not trusting the heuristics when using few GC threads, did not play very well with this scenario.
The effect of it is back-to-back GC causing thousands of GCs when <100 were really needed. Removing this conservative measure causes the regression to go away, and a bunch of green benchmarks to pop up instead of red ones. This suggest that conservative measure isn't working very well and does not scale to these scenarios. It is also already taken care of by other conservative measures. I propose it is removed, to solve the regressions. > The entire ZGC perf suite was rerun (no red, only green), and tier1-3 was run. There isn't much potential for things to blow up as I just removed one parameter in the heuristic. So if anything nasty was to show up, it would be in the perf runs. Marked as reviewed by ayang (Committer). ------------- PR: https://git.openjdk.java.net/jdk17/pull/298 From pliden at openjdk.java.net Tue Aug 3 08:27:46 2021 From: pliden at openjdk.java.net (Per Liden) Date: Tue, 3 Aug 2021 08:27:46 GMT Subject: [jdk17] RFR: 8271064: ZGC several jvm08 perf regressions after JDK-8268372 In-Reply-To: References: Message-ID: On Tue, 3 Aug 2021 08:13:02 GMT, Erik ?sterlund wrote: > After the integration of ZGC dynamic thread selection, there was a regression that popped up on a few benchmarks that all had in common a high allocation rate and very quick GC times. An extra conservative measure for not trusting the heuristics when using few GC threads, did not play very well with this scenario. The effect of it is back-to-back GC causing thousands of GCs when <100 were really needed. Removing this conservative measure causes the regression to go away, and a bunch of green benchmarks to pop up instead of red ones. This suggest that conservative measure isn't working very well and does not scale to these scenarios. It is also already taken care of by other conservative measures. I propose it is removed, to solve the regressions. > The entire ZGC perf suite was rerun (no red, only green), and tier1-3 was run. 
There isn't much potential for things to blow up as I just removed one parameter in the heuristic. So if anything nasty was to show up, it would be in the perf runs. Looks good! ------------- Marked as reviewed by pliden (Reviewer). PR: https://git.openjdk.java.net/jdk17/pull/298 From tschatzl at openjdk.java.net Tue Aug 3 08:33:43 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 3 Aug 2021 08:33:43 GMT Subject: RFR: 8271217: Fix race between G1PeriodicGCTask checks and GC request In-Reply-To: References: Message-ID: <1tGM2Jf-63tbmf5uqGW3M_1g2W_Fo5y3vO5tzjAgyUM=.0dd7a812-1ae7-485e-b9ef-50258468f7b3@github.com> On Mon, 26 Jul 2021 18:05:10 GMT, Kim Barrett wrote: > Please review this change to remove a race involving G1PeriodicGCTask. The > task performs various checks to determine whether a periodic GC should be > performed. If they pass, then it requests the GC by calling try_collect. > The problem is that an unrelated GC could occur after the task's checks but > before its request is processed. In that case we don't want to do the > periodic GC after all, but there's nothing to prevent it. > > This change lets the task capture the various GC counters as part of its > checking, and pass them to try_collect for use in detecting an intervening > GC. > > Testing: > mach5 tier1 Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4905 From stuefe at openjdk.java.net Tue Aug 3 08:37:35 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 3 Aug 2021 08:37:35 GMT Subject: RFR: JDK-8271721: Split gc/g1/TestMixedGCLiveThreshold into separate tests In-Reply-To: References: Message-ID: On Tue, 3 Aug 2021 08:01:37 GMT, Thomas Schatzl wrote: > Lgtm. Thank you, Thomas. Is this trivial enough to be trivial? 
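[Editor's note] The counter-capture idea from the 8271217 review above can be sketched as a toy model. The names and structure below are invented for illustration and the real G1 code paths differ: the periodic task snapshots the GC counters while performing its checks and hands the snapshot to try_collect, which drops the request if the counters have advanced in the meantime.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, much-simplified model of the idea in JDK-8271217: the
// periodic task records the GC counter state during its checks, and the
// request is rejected if another GC has happened since that snapshot.
struct GCCounters {
  uint64_t total_collections;
  bool operator==(const GCCounters& o) const {
    return total_collections == o.total_collections;
  }
};

class Heap {
  GCCounters _counters{0};
public:
  GCCounters counters() const { return _counters; }

  // Some unrelated GC occurring between the task's checks and its request.
  void unrelated_gc() { _counters.total_collections++; }

  // Returns true if the periodic GC was actually performed. The request
  // carries the counters observed during the task's checks; if they no
  // longer match, an intervening GC occurred and the request is dropped.
  bool try_collect(const GCCounters& observed_at_check) {
    if (!(counters() == observed_at_check)) {
      return false;  // lost the race to an intervening GC
    }
    _counters.total_collections++;  // "perform" the periodic GC
    return true;
  }
};
```

The snapshot makes the check-then-request sequence effectively atomic with respect to other GCs, without needing to hold a lock across the two steps.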
------------- PR: https://git.openjdk.java.net/jdk/pull/4968 From tschatzl at openjdk.java.net Tue Aug 3 08:43:36 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 3 Aug 2021 08:43:36 GMT Subject: [jdk17] RFR: 8271064: ZGC several jvm08 perf regressions after JDK-8268372 In-Reply-To: References: Message-ID: <9XPXCzDwN60jCIm6PIib2WOzCFac7MRoBmj9WnJUX58=.94ea3960-aed3-4319-b8ab-1090cc43a16d@github.com> On Tue, 3 Aug 2021 08:13:02 GMT, Erik ?sterlund wrote: > After the integration of ZGC dynamic thread selection, there was a regression that popped up on a few benchmarks that all had in common a high allocation rate and very quick GC times. An extra conservative measure for not trusting the heuristics when using few GC threads, did not play very well with this scenario. The effect of it is back-to-back GC causing thousands of GCs when <100 were really needed. Removing this conservative measure causes the regression to go away, and a bunch of green benchmarks to pop up instead of red ones. This suggest that conservative measure isn't working very well and does not scale to these scenarios. It is also already taken care of by other conservative measures. I propose it is removed, to solve the regressions. > The entire ZGC perf suite was rerun (no red, only green), and tier1-3 was run. There isn't much potential for things to blow up as I just removed one parameter in the heuristic. So if anything nasty was to show up, it would be in the perf runs. Marked as reviewed by tschatzl (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk17/pull/298 From eosterlund at openjdk.java.net Tue Aug 3 08:58:31 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 3 Aug 2021 08:58:31 GMT Subject: [jdk17] RFR: 8271064: ZGC several jvm08 perf regressions after JDK-8268372 In-Reply-To: <1SWdYM3veIQNe-I_wTJL1RRW374Dur4B0Vzux-AWRsI=.6dce58de-46f0-4ab7-9e60-7f59f857e274@github.com> References: <1SWdYM3veIQNe-I_wTJL1RRW374Dur4B0Vzux-AWRsI=.6dce58de-46f0-4ab7-9e60-7f59f857e274@github.com> Message-ID: On Tue, 3 Aug 2021 08:22:58 GMT, Albert Mingkun Yang wrote: >> After the integration of ZGC dynamic thread selection, there was a regression that popped up on a few benchmarks that all had in common a high allocation rate and very quick GC times. An extra conservative measure for not trusting the heuristics when using few GC threads, did not play very well with this scenario. The effect of it is back-to-back GC causing thousands of GCs when <100 were really needed. Removing this conservative measure causes the regression to go away, and a bunch of green benchmarks to pop up instead of red ones. This suggest that conservative measure isn't working very well and does not scale to these scenarios. It is also already taken care of by other conservative measures. I propose it is removed, to solve the regressions. >> The entire ZGC perf suite was rerun (no red, only green), and tier1-3 was run. There isn't much potential for things to blow up as I just removed one parameter in the heuristic. So if anything nasty was to show up, it would be in the perf runs. > > Marked as reviewed by ayang (Committer). 
Thanks for the reviews, @albertnetymk, @pliden and @tschatzl ------------- PR: https://git.openjdk.java.net/jdk17/pull/298 From tschatzl at openjdk.java.net Tue Aug 3 09:05:30 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 3 Aug 2021 09:05:30 GMT Subject: RFR: JDK-8271721: Split gc/g1/TestMixedGCLiveThreshold into separate tests In-Reply-To: References: Message-ID: On Tue, 3 Aug 2021 07:24:49 GMT, Thomas Stuefe wrote: > Hi, may I please have reviews for this trivial change: > > TestMixedGCLiveThreshold executes three tests in one by spawning off three child processes sequentially. In order to make the tests easier to analyze and to be able to execute them concurrently it would be helpful to have three subtests instead. > > I also added diagnostic output. > > Note: this was motivated while looking at test errors on ppc similar to JDK-8211081. > > ..Thomas While it is a very simple change, it does not qualify as "trivial" as per openjdk rules to skip the 24h rule imho. There also seems to be no rush in getting this change in either. ------------- PR: https://git.openjdk.java.net/jdk/pull/4968 From stuefe at openjdk.java.net Tue Aug 3 09:19:29 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 3 Aug 2021 09:19:29 GMT Subject: RFR: JDK-8271721: Split gc/g1/TestMixedGCLiveThreshold into separate tests In-Reply-To: References: Message-ID: On Tue, 3 Aug 2021 07:24:49 GMT, Thomas Stuefe wrote: > Hi, may I please have reviews for this trivial change: > > TestMixedGCLiveThreshold executes three tests in one by spawning off three child processes sequentially. In order to make the tests easier to analyze and to be able to execute them concurrently it would be helpful to have three subtests instead. > > I also added diagnostic output. > > Note: this was motivated while looking at test errors on ppc similar to JDK-8211081. > > ..Thomas Okay, no problem. 
Thanks, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/4968 From pliden at openjdk.java.net Tue Aug 3 09:22:32 2021 From: pliden at openjdk.java.net (Per Liden) Date: Tue, 3 Aug 2021 09:22:32 GMT Subject: RFR: 8270347: ZGC: Adopt release-acquire ordering for forwarding table access In-Reply-To: References: Message-ID: On Tue, 13 Jul 2021 06:52:56 GMT, Xiaowei Lu wrote: > ZGC forwarding table records entries to track the destination of object relocation. Currently, the entry insertion (ZForwarding::insert()) adopts memory_order_conservative to guarantee that (1) the object copy always happens before the installation of forwardee, and (2) the other thread that accesses the same entry (ZForwarding::at() with load_acquire semantic) is able to access the correct contents of the forwarded object. > > Let us consider memory_order_release for the entry insertion in ZForwarding::insert(). Pairing with the entry access in ZForwarding::at(), the forwarding table adopts release-acquire memory ordering. The two statements we mentioned above can also be guaranteed by the release-acquire ordering. > > We performed an experiment on benchmark SPECjvm2008.sunflow on AArch64. The concurrent relocation time is listed below. The optimized version results in shorter average concurrent relocation time. Furthermore, it could benefit the throughput of ZGC. 
> > $ grep "[50]00.*Phase: Concurrent Relocate" optimized.log > [500.506s][info][gc,stats ] Phase: Concurrent Relocate 4.006 / 4.449 4.041 / 5.361 4.041 / 5.361 4.041 / 5.361 ms > [1000.506s][info][gc,stats ] Phase: Concurrent Relocate 4.512 / 5.278 4.213 / 5.278 4.146 / 5.361 4.146 / 5.361 ms > [1500.506s][info][gc,stats ] Phase: Concurrent Relocate 4.831 / 5.524 4.446 / 5.584 4.253 / 5.584 4.253 / 5.584 ms > [2000.506s][info][gc,stats ] Phase: Concurrent Relocate 4.037 / 4.649 4.391 / 5.524 4.281 / 5.584 4.281 / 5.584 ms > [2500.506s][info][gc,stats ] Phase: Concurrent Relocate 4.256 / 4.568 4.198 / 5.022 4.265 / 5.584 4.265 / 5.584 ms > [3000.506s][info][gc,stats ] Phase: Concurrent Relocate 3.032 / 4.424 3.810 / 24.709 4.173 / 24.709 4.173 / 24.709 ms > [3500.506s][info][gc,stats ] Phase: Concurrent Relocate 3.740 / 4.598 3.304 / 4.872 4.050 / 24.709 4.050 / 24.709 ms > > $ grep "[50]00.*Phase: Concurrent Relocate" baseline.log > [500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.634 / 5.191 4.425 / 5.490 4.425 / 5.490 4.425 / 5.490 ms > [1000.545s][info][gc,stats ] Phase: Concurrent Relocate 4.177 / 4.731 4.414 / 5.543 4.400 / 5.543 4.400 / 5.543 ms > [1500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.560 / 4.894 4.441 / 5.543 4.427 / 5.543 4.427 / 5.543 ms > [2000.546s][info][gc,stats ] Phase: Concurrent Relocate 4.509 / 5.100 4.591 / 5.739 4.468 / 5.739 4.468 / 5.739 ms > [2500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.543 / 5.533 4.685 / 5.762 4.511 / 5.762 4.511 / 5.762 ms > [3000.546s][info][gc,stats ] Phase: Concurrent Relocate 4.423 / 4.834 4.635 / 5.895 4.530 / 5.895 4.530 / 5.895 ms > [3500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.152 / 5.243 4.313 / 24.341 4.493 / 24.341 4.493 / 24.341 ms Looks good! ------------- Marked as reviewed by pliden (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4763 From pliden at openjdk.java.net Tue Aug 3 09:27:33 2021 From: pliden at openjdk.java.net (Per Liden) Date: Tue, 3 Aug 2021 09:27:33 GMT Subject: RFR: 8270347: ZGC: Adopt release-acquire ordering for forwarding table access In-Reply-To: References: Message-ID: On Tue, 13 Jul 2021 06:52:56 GMT, Xiaowei Lu wrote: > ZGC forwarding table records entries to track the destination of object relocation. Currently, the entry insertion (ZForwarding::insert()) adopts memory_order_conservative to guarantee that (1) the object copy always happens before the installation of forwardee, and (2) the other thread that accesses the same entry (ZForwarding::at() with load_acquire semantic) is able to access the correct contents of the forwarded object. > > Let us consider memory_order_release for the entry insertion in ZForwarding::insert(). Pairing with the entry access in ZForwarding::at(), the forwarding table adopts release-acquire memory ordering. The two statements we mentioned above can also be guaranteed by the release-acquire ordering. > > We performed an experiment on benchmark SPECjvm2008.sunflow on AArch64. The concurrent relocation time is listed below. The optimized version results in shorter average concurrent relocation time. Furthermore, it could benefit the throughput of ZGC. 
> > $ grep "[50]00.*Phase: Concurrent Relocate" optimized.log > [500.506s][info][gc,stats ] Phase: Concurrent Relocate 4.006 / 4.449 4.041 / 5.361 4.041 / 5.361 4.041 / 5.361 ms > [1000.506s][info][gc,stats ] Phase: Concurrent Relocate 4.512 / 5.278 4.213 / 5.278 4.146 / 5.361 4.146 / 5.361 ms > [1500.506s][info][gc,stats ] Phase: Concurrent Relocate 4.831 / 5.524 4.446 / 5.584 4.253 / 5.584 4.253 / 5.584 ms > [2000.506s][info][gc,stats ] Phase: Concurrent Relocate 4.037 / 4.649 4.391 / 5.524 4.281 / 5.584 4.281 / 5.584 ms > [2500.506s][info][gc,stats ] Phase: Concurrent Relocate 4.256 / 4.568 4.198 / 5.022 4.265 / 5.584 4.265 / 5.584 ms > [3000.506s][info][gc,stats ] Phase: Concurrent Relocate 3.032 / 4.424 3.810 / 24.709 4.173 / 24.709 4.173 / 24.709 ms > [3500.506s][info][gc,stats ] Phase: Concurrent Relocate 3.740 / 4.598 3.304 / 4.872 4.050 / 24.709 4.050 / 24.709 ms > > $ grep "[50]00.*Phase: Concurrent Relocate" baseline.log > [500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.634 / 5.191 4.425 / 5.490 4.425 / 5.490 4.425 / 5.490 ms > [1000.545s][info][gc,stats ] Phase: Concurrent Relocate 4.177 / 4.731 4.414 / 5.543 4.400 / 5.543 4.400 / 5.543 ms > [1500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.560 / 4.894 4.441 / 5.543 4.427 / 5.543 4.427 / 5.543 ms > [2000.546s][info][gc,stats ] Phase: Concurrent Relocate 4.509 / 5.100 4.591 / 5.739 4.468 / 5.739 4.468 / 5.739 ms > [2500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.543 / 5.533 4.685 / 5.762 4.511 / 5.762 4.511 / 5.762 ms > [3000.546s][info][gc,stats ] Phase: Concurrent Relocate 4.423 / 4.834 4.635 / 5.895 4.530 / 5.895 4.530 / 5.895 ms > [3500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.152 / 5.243 4.313 / 24.341 4.493 / 24.341 4.493 / 24.341 ms I'll sponsor the change. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4763 From rrich at openjdk.java.net Tue Aug 3 10:27:30 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Tue, 3 Aug 2021 10:27:30 GMT Subject: RFR: JDK-8271721: Split gc/g1/TestMixedGCLiveThreshold into separate tests In-Reply-To: References: Message-ID: On Tue, 3 Aug 2021 07:24:49 GMT, Thomas Stuefe wrote: > Hi, may I please have reviews for this trivial change: > > TestMixedGCLiveThreshold executes three tests in one by spawning off three child processes sequentially. In order to make the tests easier to analyze and to be able to execute them concurrently it would be helpful to have three subtests instead. > > I also added diagnostic output. > > Note: this was motivated while looking at test errors on ppc similar to JDK-8211081. > > ..Thomas Looks good to me. Cheers, Richard. ------------- Marked as reviewed by rrich (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4968 From stuefe at openjdk.java.net Tue Aug 3 10:34:29 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 3 Aug 2021 10:34:29 GMT Subject: RFR: JDK-8271721: Split gc/g1/TestMixedGCLiveThreshold into separate tests In-Reply-To: References: Message-ID: <3R27llqY_TxPHf57Au4KRdCxHwVFrWcncdVbE0ZK3J4=.98f0b99e-ce9c-4cae-87a3-26e11a0042b9@github.com> On Tue, 3 Aug 2021 10:24:42 GMT, Richard Reingruber wrote: > Looks good to me. > > Cheers, Richard. Thanks! I'll wait the obligatory 24hrs and push tomorrow. ------------- PR: https://git.openjdk.java.net/jdk/pull/4968 From github.com+39413832+weixlu at openjdk.java.net Tue Aug 3 11:29:49 2021 From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu) Date: Tue, 3 Aug 2021 11:29:49 GMT Subject: RFR: 8270347: ZGC: Adopt release-acquire ordering for forwarding table access In-Reply-To: References: Message-ID: On Tue, 3 Aug 2021 09:24:42 GMT, Per Liden wrote: >> ZGC forwarding table records entries to track the destination of object relocation. 
Currently, the entry insertion (ZForwarding::insert()) adopts memory_order_conservative to guarantee that (1) the object copy always happens before the installation of forwardee, and (2) the other thread that accesses the same entry (ZForwarding::at() with load_acquire semantic) is able to access the correct contents of the forwarded object. >> >> Let us consider memory_order_release for the entry insertion in ZForwarding::insert(). Pairing with the entry access in ZForwarding::at(), the forwarding table adopts release-acquire memory ordering. The two statements we mentioned above can also be guaranteed by the release-acquire ordering. >> >> We performed an experiment on benchmark SPECjvm2008.sunflow on AArch64. The concurrent relocation time is listed below. The optimized version results in shorter average concurrent relocation time. Furthermore, it could benefit the throughput of ZGC. >> >> $ grep "[50]00.*Phase: Concurrent Relocate" optimized.log >> [500.506s][info][gc,stats ] Phase: Concurrent Relocate 4.006 / 4.449 4.041 / 5.361 4.041 / 5.361 4.041 / 5.361 ms >> [1000.506s][info][gc,stats ] Phase: Concurrent Relocate 4.512 / 5.278 4.213 / 5.278 4.146 / 5.361 4.146 / 5.361 ms >> [1500.506s][info][gc,stats ] Phase: Concurrent Relocate 4.831 / 5.524 4.446 / 5.584 4.253 / 5.584 4.253 / 5.584 ms >> [2000.506s][info][gc,stats ] Phase: Concurrent Relocate 4.037 / 4.649 4.391 / 5.524 4.281 / 5.584 4.281 / 5.584 ms >> [2500.506s][info][gc,stats ] Phase: Concurrent Relocate 4.256 / 4.568 4.198 / 5.022 4.265 / 5.584 4.265 / 5.584 ms >> [3000.506s][info][gc,stats ] Phase: Concurrent Relocate 3.032 / 4.424 3.810 / 24.709 4.173 / 24.709 4.173 / 24.709 ms >> [3500.506s][info][gc,stats ] Phase: Concurrent Relocate 3.740 / 4.598 3.304 / 4.872 4.050 / 24.709 4.050 / 24.709 ms >> >> $ grep "[50]00.*Phase: Concurrent Relocate" baseline.log >> [500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.634 / 5.191 4.425 / 5.490 4.425 / 5.490 4.425 / 5.490 ms >> 
[1000.545s][info][gc,stats ] Phase: Concurrent Relocate 4.177 / 4.731 4.414 / 5.543 4.400 / 5.543 4.400 / 5.543 ms >> [1500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.560 / 4.894 4.441 / 5.543 4.427 / 5.543 4.427 / 5.543 ms >> [2000.546s][info][gc,stats ] Phase: Concurrent Relocate 4.509 / 5.100 4.591 / 5.739 4.468 / 5.739 4.468 / 5.739 ms >> [2500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.543 / 5.533 4.685 / 5.762 4.511 / 5.762 4.511 / 5.762 ms >> [3000.546s][info][gc,stats ] Phase: Concurrent Relocate 4.423 / 4.834 4.635 / 5.895 4.530 / 5.895 4.530 / 5.895 ms >> [3500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.152 / 5.243 4.313 / 24.341 4.493 / 24.341 4.493 / 24.341 ms > > I'll sponsor the change. @pliden @fisk @stooart-mon Thank you very much for your review. ------------- PR: https://git.openjdk.java.net/jdk/pull/4763 From github.com+39413832+weixlu at openjdk.java.net Tue Aug 3 12:26:32 2021 From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu) Date: Tue, 3 Aug 2021 12:26:32 GMT Subject: Integrated: 8270347: ZGC: Adopt release-acquire ordering for forwarding table access In-Reply-To: References: Message-ID: <5hsNZPJ2EMzLGlq90lFK31srs-Vnu71dqBfFV_s1-nE=.79909d80-e26f-4248-bec8-efdb6cb00870@github.com> On Tue, 13 Jul 2021 06:52:56 GMT, Xiaowei Lu wrote: > ZGC forwarding table records entries to track the destination of object relocation. Currently, the entry insertion (ZForwarding::insert()) adopts memory_order_conservative to guarantee that (1) the object copy always happens before the installation of forwardee, and (2) the other thread that accesses the same entry (ZForwarding::at() with load_acquire semantic) is able to access the correct contents of the forwarded object. > > Let us consider memory_order_release for the entry insertion in ZForwarding::insert(). Pairing with the entry access in ZForwarding::at(), the forwarding table adopts release-acquire memory ordering. 
The two statements we mentioned above can also be guaranteed by the release-acquire ordering. > > We performed an experiment on benchmark SPECjvm2008.sunflow on AArch64. The concurrent relocation time is listed below. The optimized version results in shorter average concurrent relocation time. Furthermore, it could benefit the throughput of ZGC. > > $ grep "[50]00.*Phase: Concurrent Relocate" optimized.log > [500.506s][info][gc,stats ] Phase: Concurrent Relocate 4.006 / 4.449 4.041 / 5.361 4.041 / 5.361 4.041 / 5.361 ms > [1000.506s][info][gc,stats ] Phase: Concurrent Relocate 4.512 / 5.278 4.213 / 5.278 4.146 / 5.361 4.146 / 5.361 ms > [1500.506s][info][gc,stats ] Phase: Concurrent Relocate 4.831 / 5.524 4.446 / 5.584 4.253 / 5.584 4.253 / 5.584 ms > [2000.506s][info][gc,stats ] Phase: Concurrent Relocate 4.037 / 4.649 4.391 / 5.524 4.281 / 5.584 4.281 / 5.584 ms > [2500.506s][info][gc,stats ] Phase: Concurrent Relocate 4.256 / 4.568 4.198 / 5.022 4.265 / 5.584 4.265 / 5.584 ms > [3000.506s][info][gc,stats ] Phase: Concurrent Relocate 3.032 / 4.424 3.810 / 24.709 4.173 / 24.709 4.173 / 24.709 ms > [3500.506s][info][gc,stats ] Phase: Concurrent Relocate 3.740 / 4.598 3.304 / 4.872 4.050 / 24.709 4.050 / 24.709 ms > > $ grep "[50]00.*Phase: Concurrent Relocate" baseline.log > [500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.634 / 5.191 4.425 / 5.490 4.425 / 5.490 4.425 / 5.490 ms > [1000.545s][info][gc,stats ] Phase: Concurrent Relocate 4.177 / 4.731 4.414 / 5.543 4.400 / 5.543 4.400 / 5.543 ms > [1500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.560 / 4.894 4.441 / 5.543 4.427 / 5.543 4.427 / 5.543 ms > [2000.546s][info][gc,stats ] Phase: Concurrent Relocate 4.509 / 5.100 4.591 / 5.739 4.468 / 5.739 4.468 / 5.739 ms > [2500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.543 / 5.533 4.685 / 5.762 4.511 / 5.762 4.511 / 5.762 ms > [3000.546s][info][gc,stats ] Phase: Concurrent Relocate 4.423 / 4.834 4.635 / 5.895 4.530 / 5.895 4.530 / 5.895 ms > 
[3500.545s][info][gc,stats ] Phase: Concurrent Relocate 4.152 / 5.243 4.313 / 24.341 4.493 / 24.341 4.493 / 24.341 ms This pull request has now been integrated. Changeset: bdb50cab Author: Xiaowei Lu Committer: Per Liden URL: https://git.openjdk.java.net/jdk/commit/bdb50cab79056bb2ac9fe1ba0cf0f237317052da Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8270347: ZGC: Adopt release-acquire ordering for forwarding table access Co-authored-by: Hao Tang Reviewed-by: eosterlund, pliden ------------- PR: https://git.openjdk.java.net/jdk/pull/4763 From iwalulya at openjdk.java.net Tue Aug 3 13:32:42 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 3 Aug 2021 13:32:42 GMT Subject: RFR: 8265057: G1: Investigate removal of maintenance of two BOT thresholds Message-ID: Hi all, Please review this change to remove `_next_offset_index` and maintain only one threshold, `_next_offset_threshold`, a direct pointer to the corresponding entry in the BOT. We derive the next_offset_index from `_next_offset_threshold` where required. Performance tests do not show any regression as a result of this change.
Testing: Tier 1-5 ------------- Commit messages: - change asserts - Merge branch 'master' into JDK-8265057_BOT_thresholds - Merge branch 'master' into JDK-8265057_BOT_thresholds - Merge branch 'master' into JDK-8265057_BOT_thresholds - remove _next_offset_index Changes: https://git.openjdk.java.net/jdk/pull/4972/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4972&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8265057 Stats: 30 lines in 3 files changed: 4 ins; 7 del; 19 mod Patch: https://git.openjdk.java.net/jdk/pull/4972.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4972/head:pull/4972 PR: https://git.openjdk.java.net/jdk/pull/4972 From rrich at openjdk.java.net Tue Aug 3 13:38:43 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Tue, 3 Aug 2021 13:38:43 GMT Subject: RFR: 8271722: [TESTBUG] gc/g1/TestMixedGCLiveThreshold.java can fail if G1 Full GC uses >1 workers Message-ID: <4968prrcB2ivxncTpILaWl5ZfeeIyc1AC1HF3kpbKQ8=.713c63d5-264e-4c4d-8de4-1d7c824fb391@github.com> I tried to make this pr dependent on #4968. Hope this works... The change fixes a test issue in gc/g1/TestMixedGCLiveThreshold.java by adding the option `-XX:ParallelGCThreads=1`. This prevents the full gc from running with more than one worker because if it did, live data could be distributed into several regions and one of them could be selected for collection, which is unexpected with `-XX:G1MixedGCLiveThresholdPercent=25`. Please refer to the JBS-Item for more details. Thanks, Richard.
------------- Depends on: https://git.openjdk.java.net/jdk/pull/4968 Commit messages: - 8271722: [TESTBUG] gc/g1/TestMixedGCLiveThreshold.java can fail if G1 Full GC uses >1 workers Changes: https://git.openjdk.java.net/jdk/pull/4971/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4971&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271722 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4971.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4971/head:pull/4971 PR: https://git.openjdk.java.net/jdk/pull/4971 From ayang at openjdk.java.net Tue Aug 3 13:41:29 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 3 Aug 2021 13:41:29 GMT Subject: RFR: 8265057: G1: Investigate removal of maintenance of two BOT thresholds In-Reply-To: References: Message-ID: On Tue, 3 Aug 2021 13:25:07 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to remove `_next_offset_index` and maintain only one threshold, `_next_offset_threshold`, a direct pointer to the corresponding entry in the BOT. We derive the next_offset_index from `_next_offset_threshold` where required. Performance tests do not show any regression as a result of this change. > > Testing: Tier 1-5 Marked as reviewed by ayang (Committer).
------------- PR: https://git.openjdk.java.net/jdk/pull/4972 From ayang at openjdk.java.net Tue Aug 3 14:15:29 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 3 Aug 2021 14:15:29 GMT Subject: RFR: 8271722: [TESTBUG] gc/g1/TestMixedGCLiveThreshold.java can fail if G1 Full GC uses >1 workers In-Reply-To: <4968prrcB2ivxncTpILaWl5ZfeeIyc1AC1HF3kpbKQ8=.713c63d5-264e-4c4d-8de4-1d7c824fb391@github.com> References: <4968prrcB2ivxncTpILaWl5ZfeeIyc1AC1HF3kpbKQ8=.713c63d5-264e-4c4d-8de4-1d7c824fb391@github.com> Message-ID: <6K5s9XmIcqsycp8GQBt2F1pa5PNW2rP1EQPiQa6K1JI=.c63182ad-c02e-4eaa-9f6b-c11ad4cce630@github.com> On Tue, 3 Aug 2021 13:17:48 GMT, Richard Reingruber wrote: > I tried to make this pr dependent on #4968. Hope this works... > > The change fixes a test issue in gc/g1/TestMixedGCLiveThreshold.java by adding > the option `-XX:ParallelGCThreads=1`. This prevents the full gc from running > with more than one worker because if it did, live data could be > distributed into several regions and one of them could be selected for > collection, which is unexpected with `-XX:G1MixedGCLiveThresholdPercent=25`. > > Please refer to the JBS-Item for more details. > > Thanks, Richard. Marked as reviewed by ayang (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4971 From tschatzl at openjdk.java.net Tue Aug 3 14:31:35 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 3 Aug 2021 14:31:35 GMT Subject: RFR: 8265057: G1: Investigate removal of maintenance of two BOT thresholds In-Reply-To: References: Message-ID: On Tue, 3 Aug 2021 13:25:07 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to remove `_next_offset_index` and maintain only one threshold, `_next_offset_threshold`, a direct pointer to the corresponding entry in the BOT. We derive the next_offset_index from `_next_offset_threshold` where required. Performance tests do not show any regression as a result of this change.
> > Testing: Tier 1-5 Lgtm apart from the suggested fixes to some comments. src/hotspot/share/gc/g1/g1BlockOffsetTable.cpp line 267: > 265: // For efficiency, do copy-in/copy-out. > 266: HeapWord* threshold = *threshold_; > 267: size_t index = _bot->index_for_raw(threshold); Superfluous spaces but maybe keep it for consistency in this method/file. src/hotspot/share/gc/g1/g1BlockOffsetTable.cpp line 309: > 307: assert(threshold >= blk_end, "Incorrect offset threshold"); > 308: > 309: // index_ and threshold_ updated here. Stale comment about `_index`. Maybe just "Finally update threshold_." at the end, but simply removing the comment seems fine. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4972 From tschatzl at openjdk.java.net Tue Aug 3 14:41:30 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 3 Aug 2021 14:41:30 GMT Subject: RFR: 8271722: [TESTBUG] gc/g1/TestMixedGCLiveThreshold.java can fail if G1 Full GC uses >1 workers In-Reply-To: <4968prrcB2ivxncTpILaWl5ZfeeIyc1AC1HF3kpbKQ8=.713c63d5-264e-4c4d-8de4-1d7c824fb391@github.com> References: <4968prrcB2ivxncTpILaWl5ZfeeIyc1AC1HF3kpbKQ8=.713c63d5-264e-4c4d-8de4-1d7c824fb391@github.com> Message-ID: On Tue, 3 Aug 2021 13:17:48 GMT, Richard Reingruber wrote: > I tried to make this pr dependent on #4968. Hope this works... > > The change fixes a test issue in gc/g1/TestMixedGCLiveThreshold.java by adding > the option `-XX:ParallelGCThreads=1`. This prevents the full gc from running > with more than one worker because if it did, live data could be > distributed into several regions and one of them could be selected for > collection, which is unexpected with `-XX:G1MixedGCLiveThresholdPercent=25`. > > Please refer to the JBS-Item for more details. > > Thanks, Richard. Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/4971 From iwalulya at openjdk.java.net Tue Aug 3 16:46:51 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 3 Aug 2021 16:46:51 GMT Subject: RFR: 8265057: G1: Investigate removal of maintenance of two BOT thresholds [v2] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to remove `_next_offset_index` and maintain only one threshold, `_next_offset_threshold`, a direct pointer to the corresponding entry in the BOT. We derive the next_offset_index from `_next_offset_threshold` where required. Performance tests do not show any regression as a result of this change. > > Testing: Tier 1-5 Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: reviewer suggested to remove comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4972/files - new: https://git.openjdk.java.net/jdk/pull/4972/files/4d412780..22afbff9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4972&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4972&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4972.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4972/head:pull/4972 PR: https://git.openjdk.java.net/jdk/pull/4972 From github.com+42199859+nikolaytach at openjdk.java.net Tue Aug 3 19:50:31 2021 From: github.com+42199859+nikolaytach at openjdk.java.net (NikolayTach) Date: Tue, 3 Aug 2021 19:50:31 GMT Subject: RFR: 8271722: [TESTBUG] gc/g1/TestMixedGCLiveThreshold.java can fail if G1 Full GC uses >1 workers In-Reply-To: <4968prrcB2ivxncTpILaWl5ZfeeIyc1AC1HF3kpbKQ8=.713c63d5-264e-4c4d-8de4-1d7c824fb391@github.com> References: <4968prrcB2ivxncTpILaWl5ZfeeIyc1AC1HF3kpbKQ8=.713c63d5-264e-4c4d-8de4-1d7c824fb391@github.com> Message-ID: On Tue, 3 Aug 2021 13:17:48 GMT, Richard Reingruber wrote: > I tried to make this pr dependent on #4968. Hope this works...
> > The change fixes a test issue in gc/g1/TestMixedGCLiveThreshold.java by adding > the option `-XX:ParallelGCThreads=1`. This prevents the full gc from running > with more than one worker because if it did, live data could be > distributed into several regions and one of them could be selected for > collection, which is unexpected with `-XX:G1MixedGCLiveThresholdPercent=25`. > > Please refer to the JBS-Item for more details. > > Thanks, Richard. Can I use curl to watch this? ------------- PR: https://git.openjdk.java.net/jdk/pull/4971 From mli at openjdk.java.net Wed Aug 4 01:13:48 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Wed, 4 Aug 2021 01:13:48 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space Message-ID: Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. ------------- Commit messages: - G1: move copy before CAS to improve performance Changes: https://git.openjdk.java.net/jdk/pull/4983/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4983&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271579 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4983.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4983/head:pull/4983 PR: https://git.openjdk.java.net/jdk/pull/4983 From mli at openjdk.java.net Wed Aug 4 01:26:29 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Wed, 4 Aug 2021 01:26:29 GMT Subject: RFR: 8270842: G1: Only young regions need to redirty outside references in remset.
In-Reply-To: References: Message-ID: On Wed, 21 Jul 2021 02:13:38 GMT, Hamlin Li wrote: > For evac failure objects in non-young regions (old) of the cset, the remset for outside references is already recorded via the dirty queue; we only need to do it for objects in young regions Thanks Thomas, looking forward to hearing further results from you. (I'm sorry for my previous comments, I tried to put patch/diff content in the comment, but it seems to be a mess. I started/ended the patch content with a ''', but it does not seem to work as expected.) ------------- PR: https://git.openjdk.java.net/jdk/pull/4853 From jiefu at openjdk.java.net Wed Aug 4 01:38:29 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 4 Aug 2021 01:38:29 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space In-Reply-To: References: Message-ID: <6MsNRNFLeZf0Rvp2RCj5PAko_ZnHV5CA8Lcb77R1Yy4=.c49a12c2-1e1c-4339-a8af-993e38e5159a@github.com> On Wed, 4 Aug 2021 01:06:47 GMT, Hamlin Li wrote: > Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. > > After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. Amazing. So what do you think is the reason for the performance gain? Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From stuefe at openjdk.java.net Wed Aug 4 04:14:32 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 4 Aug 2021 04:14:32 GMT Subject: Integrated: JDK-8271721: Split gc/g1/TestMixedGCLiveThreshold into separate tests In-Reply-To: References: Message-ID: On Tue, 3 Aug 2021 07:24:49 GMT, Thomas Stuefe wrote: > Hi, may I please have reviews for this trivial change: > > TestMixedGCLiveThreshold executes three tests in one by spawning off three child processes sequentially.
In order to make the tests easier to analyze and to be able to execute them concurrently it would be helpful to have three subtests instead. > > I also added diagnostic output. > > Note: this was motivated while looking at test errors on ppc similar to JDK-8211081. > > ..Thomas This pull request has now been integrated. Changeset: 66c653c5 Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/66c653c561b3b5e904579af62e23ff94952bca05 Stats: 35 lines in 1 file changed: 23 ins; 5 del; 7 mod 8271721: Split gc/g1/TestMixedGCLiveThreshold into separate tests Reviewed-by: tschatzl, rrich ------------- PR: https://git.openjdk.java.net/jdk/pull/4968 From rrich at openjdk.java.net Wed Aug 4 04:17:50 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Wed, 4 Aug 2021 04:17:50 GMT Subject: RFR: 8271722: [TESTBUG] gc/g1/TestMixedGCLiveThreshold.java can fail if G1 Full GC uses >1 workers [v2] In-Reply-To: <4968prrcB2ivxncTpILaWl5ZfeeIyc1AC1HF3kpbKQ8=.713c63d5-264e-4c4d-8de4-1d7c824fb391@github.com> References: <4968prrcB2ivxncTpILaWl5ZfeeIyc1AC1HF3kpbKQ8=.713c63d5-264e-4c4d-8de4-1d7c824fb391@github.com> Message-ID: > I tried to make this pr dependent on #4968. Hope this works... > > The change fixes a test issue in gc/g1/TestMixedGCLiveThreshold.java by adding > the option `-XX:ParallelGCThreads=1`. This prevents the full gc from running > with more than one worker because if it did, live data could be > distributed into several regions and one of them could be selected for > collection, which is unexpected with `-XX:G1MixedGCLiveThresholdPercent=25`. > > Please refer to the JBS-Item for more details. > > Thanks, Richard. Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4971/files - new: https://git.openjdk.java.net/jdk/pull/4971/files/7d759975..7d759975 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4971&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4971&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4971.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4971/head:pull/4971 PR: https://git.openjdk.java.net/jdk/pull/4971 From "github.com+73116608+openjdk-notifier[bot]" at openjdk.java.net Wed Aug 4 04:17:50 2021 From: "github.com+73116608+openjdk-notifier[bot]" at openjdk.java.net (openjdk-notifier [bot]) Date: Wed, 4 Aug 2021 04:17:50 GMT Subject: RFR: 8271722: [TESTBUG] gc/g1/TestMixedGCLiveThreshold.java can fail if G1 Full GC uses >1 workers In-Reply-To: <4968prrcB2ivxncTpILaWl5ZfeeIyc1AC1HF3kpbKQ8=.713c63d5-264e-4c4d-8de4-1d7c824fb391@github.com> References: <4968prrcB2ivxncTpILaWl5ZfeeIyc1AC1HF3kpbKQ8=.713c63d5-264e-4c4d-8de4-1d7c824fb391@github.com> Message-ID: On Tue, 3 Aug 2021 13:17:48 GMT, Richard Reingruber wrote: > I tried to make this pr dependent on #4968. Hope this works... > > The change fixes a test issue in gc/g1/TestMixedGCLiveThreshold.java by adding > the option `-XX:ParallelGCThreads=1`. This prevents the full gc from running > with more than one worker because if it did, live data could be > distributed into several regions and one of them could be selected for > collection, which is unexpected with `-XX:G1MixedGCLiveThresholdPercent=25`. > > Please refer to the JBS-Item for more details. > > Thanks, Richard. The dependent pull request has now been integrated, and the target branch of this pull request has been updated. This means that changes from the dependent pull request can start to show up as belonging to this pull request, which may be confusing for reviewers.
To remedy this situation, simply merge the latest changes from the new target branch into this pull request by running commands similar to these in the local repository for your personal fork: git checkout JDK-8271721-split-TestMixedGCLiveThreshold git fetch https://git.openjdk.java.net/jdk master git merge FETCH_HEAD # if there are conflicts, follow the instructions given by git merge git commit -m "Merge master" git push ------------- PR: https://git.openjdk.java.net/jdk/pull/4971 From rrich at openjdk.java.net Wed Aug 4 07:34:29 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Wed, 4 Aug 2021 07:34:29 GMT Subject: RFR: 8271722: [TESTBUG] gc/g1/TestMixedGCLiveThreshold.java can fail if G1 Full GC uses >1 workers In-Reply-To: References: <4968prrcB2ivxncTpILaWl5ZfeeIyc1AC1HF3kpbKQ8=.713c63d5-264e-4c4d-8de4-1d7c824fb391@github.com> Message-ID: On Tue, 3 Aug 2021 19:47:24 GMT, NikolayTach wrote: > > > Can I use curl to watch this? I guess you could... ------------- PR: https://git.openjdk.java.net/jdk/pull/4971 From rrich at openjdk.java.net Wed Aug 4 07:51:07 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Wed, 4 Aug 2021 07:51:07 GMT Subject: RFR: 8271722: [TESTBUG] gc/g1/TestMixedGCLiveThreshold.java can fail if G1 Full GC uses >1 workers [v3] In-Reply-To: <4968prrcB2ivxncTpILaWl5ZfeeIyc1AC1HF3kpbKQ8=.713c63d5-264e-4c4d-8de4-1d7c824fb391@github.com> References: <4968prrcB2ivxncTpILaWl5ZfeeIyc1AC1HF3kpbKQ8=.713c63d5-264e-4c4d-8de4-1d7c824fb391@github.com> Message-ID: > I tried to make this pr dependent on #4968. Hope this works... > > The change fixes a test issue in gc/g1/TestMixedGCLiveThreshold.java by adding > the option `-XX:ParallelGCThreads=1`. This prevents the full gc from running > with more than one worker because if it did, live data could be > distributed into several regions and one of them could be selected for > collection, which is unexpected with `-XX:G1MixedGCLiveThresholdPercent=25`.
> > Please refer to the JBS-Item for more details. > > Thanks, Richard. Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' - 8271722: [TESTBUG] gc/g1/TestMixedGCLiveThreshold.java can fail if G1 Full GC uses >1 workers - start ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4971/files - new: https://git.openjdk.java.net/jdk/pull/4971/files/7d759975..3762a0dc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4971&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4971&range=01-02 Stats: 1876 lines in 109 files changed: 1339 ins; 342 del; 195 mod Patch: https://git.openjdk.java.net/jdk/pull/4971.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4971/head:pull/4971 PR: https://git.openjdk.java.net/jdk/pull/4971 From rrich at openjdk.java.net Wed Aug 4 08:17:41 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Wed, 4 Aug 2021 08:17:41 GMT Subject: RFR: 8271722: [TESTBUG] gc/g1/TestMixedGCLiveThreshold.java can fail if G1 Full GC uses >1 workers [v3] In-Reply-To: References: Message-ID: On Wed, 4 Aug 2021 07:51:07 GMT, Richard Reingruber wrote: >> I tried to make this pr dependent on #4968. Hope this works... >> >> The change fixes a test issue in gc/g1/TestMixedGCLiveThreshold.java by adding >> the option `-XX:ParallelGCThreads=1`. This prevents the full gc from running >> with more than one worker because if it did, live data could be >> distributed into several regions and one of them could be selected for >> collection, which is unexpected with `-XX:G1MixedGCLiveThresholdPercent=25`.
> Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' > - 8271722: [TESTBUG] gc/g1/TestMixedGCLiveThreshold.java can fail if G1 Full GC uses >1 workers > - start > > > The dependent pull request has now been integrated, and the target branch of this pull request has been updated. This means that changes from the dependent pull request can start to show up as belonging to this pull request, which may be confusing for reviewers. To remedy this situation, simply merge the latest changes from the new target branch into this pull request by running commands similar to these in the local repository for your personal fork: > > ```shell > > git checkout JDK-8271721-split-TestMixedGCLiveThreshold > > git fetch https://git.openjdk.java.net/jdk master > > git merge FETCH_HEAD > > # if there are conflicts, follow the instructions given by git merge > > git commit -m "Merge master" > > git push > > ``` Hi @edvbld, this is my first dependent pr. So far it went well but now I've got issues with the procedure above. First: is there a `-b` missing in `git checkout JDK-8271721-split-TestMixedGCLiveThreshold`? I've executed all the steps above (with -b in the first command) but I still see a commit from #4968 in this pr's commits. Besides that I don't quite understand how it would help to create that new branch and push it to my personal fork. I expected merging master into this pr's branch (https://github.com/reinrich/jdk/tree/8271722__TESTBUG__gc_g1_TestMixedGCLiveThreshold_java_can_fail_if_G1_Full_GC_uses__gt_1_workers) would help but that didn't. Can you help to get rid of the 'start' commit by @tstuefe in this pr's commits? Cheers, Richard.
------------- PR: https://git.openjdk.java.net/jdk/pull/4971 From rrich at openjdk.java.net Wed Aug 4 09:08:33 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Wed, 4 Aug 2021 09:08:33 GMT Subject: RFR: 8271722: [TESTBUG] gc/g1/TestMixedGCLiveThreshold.java can fail if G1 Full GC uses >1 workers [v3] In-Reply-To: References: <4968prrcB2ivxncTpILaWl5ZfeeIyc1AC1HF3kpbKQ8=.713c63d5-264e-4c4d-8de4-1d7c824fb391@github.com> Message-ID: On Wed, 4 Aug 2021 08:14:05 GMT, Richard Reingruber wrote: > > > > The dependent pull request has now been integrated, and the target branch of this pull request has been updated. This means that changes from the dependent pull request can start to show up as belonging to this pull request, which may be confusing for reviewers. To remedy this situation, simply merge the latest changes from the new target branch into this pull request by running commands similar to these in the local repository for your personal fork: > > ```shell > > git checkout JDK-8271721-split-TestMixedGCLiveThreshold > > git fetch https://git.openjdk.java.net/jdk master > > git merge FETCH_HEAD > > # if there are conflicts, follow the instructions given by git merge > > git commit -m "Merge master" > > git push > > ``` > > Hi @edvbld, > > this is my first dependent pr. So far it went well but now I've got issues with the procedure above. > > First: is there a `-b` missing in `git checkout JDK-8271721-split-TestMixedGCLiveThreshold`? > > I've executed all the steps above (with -b in the first command) but I still see a commit from #4968 in this pr's commits. > > Besides that I don't quite understand how it would help to create that new branch and push it to my personal fork. > > I expected merging master into this pr's branch (https://github.com/reinrich/jdk/tree/8271722__TESTBUG__gc_g1_TestMixedGCLiveThreshold_java_can_fail_if_G1_Full_GC_uses__gt_1_workers) would help but that didn't.
> > Can you help to get rid of the 'start' commit by @tstuefe in this pr's commits? > > Cheers, Richard. Hi @kevinrushforth and @erikj79, @edvbld seems to be less engaged with skara recently. Maybe you can help me out with the issues described above? Thanks, Richard. ------------- PR: https://git.openjdk.java.net/jdk/pull/4971 From lkorinth at openjdk.java.net Wed Aug 4 09:11:30 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Wed, 4 Aug 2021 09:11:30 GMT Subject: RFR: 8271217: Fix race between G1PeriodicGCTask checks and GC request In-Reply-To: References: Message-ID: On Mon, 26 Jul 2021 18:05:10 GMT, Kim Barrett wrote: > Please review this change to remove a race involving G1PeriodicGCTask. The > task performs various checks to determine whether a periodic GC should be > performed. If they pass, then it requests the GC by calling try_collect. > The problem is that an unrelated GC could occur after the task's checks but > before its request is processed. In that case we don't want to do the > periodic GC after all, but there's nothing to prevent it. > > This change lets the task capture the various GC counters as part of its > checking, and pass them to try_collect for use in detecting an intervening > GC. > > Testing: > mach5 tier1 Looks good to me. If you like, you could initialize the values of the counters in the default constructor (to zero for example). ------------- Marked as reviewed by lkorinth (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4905 From pliden at openjdk.java.net Wed Aug 4 09:28:41 2021 From: pliden at openjdk.java.net (Per Liden) Date: Wed, 4 Aug 2021 09:28:41 GMT Subject: RFR: 8271121: ZGC: stack overflow (segv) when -Xlog:gc+start=debug Message-ID: <1oySXuASSlTRKZ7LKqR4l4wPIq8he_F9P0pNAQD6b9Q=.24ab2082-37b6-45b8-8078-e90a2b62eb8d@github.com> When an allocation stall happens we're trying to access oops (the thread and thread name) while logging. 
We shouldn't do that since it can lead to a recursive allocation stall situation, which eventually will cause a stack overflow. I suggest we simply don't log anything in ZStatCriticalPhase::register_start(). The useful part of the logging happens in ZStatCriticalPhase::register_end() anyway, where it's safe to access oops. ------------- Commit messages: - 8271121: ZGC: stack overflow (segv) when -Xlog:gc+start=debug Changes: https://git.openjdk.java.net/jdk/pull/4990/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4990&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271121 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/4990.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4990/head:pull/4990 PR: https://git.openjdk.java.net/jdk/pull/4990 From jiefu at openjdk.java.net Wed Aug 4 09:40:35 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 4 Aug 2021 09:40:35 GMT Subject: RFR: 8271121: ZGC: stack overflow (segv) when -Xlog:gc+start=debug In-Reply-To: <1oySXuASSlTRKZ7LKqR4l4wPIq8he_F9P0pNAQD6b9Q=.24ab2082-37b6-45b8-8078-e90a2b62eb8d@github.com> References: <1oySXuASSlTRKZ7LKqR4l4wPIq8he_F9P0pNAQD6b9Q=.24ab2082-37b6-45b8-8078-e90a2b62eb8d@github.com> Message-ID: <6mua6zpLPXraq9CJnLgOaHgR1IVeOzmZbU51AZ3gkXY=.fcfee003-b158-4284-9c74-094d303b3997@github.com> On Wed, 4 Aug 2021 09:22:54 GMT, Per Liden wrote: > When an allocation stall happens we're trying to access oops (the thread and thread name) while logging. We shouldn't do that since it can lead to a recursive allocation stall situation, which eventually will cause a stack overflow. I suggest we simply don't log anything in ZStatCriticalPhase::register_start(). The useful part of the logging happens in ZStatCriticalPhase::register_end() anyway, where it's safe to access oops. 
src/hotspot/share/gc/z/zStat.cpp line 762: > 760: _verbose(verbose) {} > 761: > 762: void ZStatCriticalPhase::register_start(const Ticks& start) const { So this method is empty. Shall we just remove it? Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/4990 From pliden at openjdk.java.net Wed Aug 4 09:47:28 2021 From: pliden at openjdk.java.net (Per Liden) Date: Wed, 4 Aug 2021 09:47:28 GMT Subject: RFR: 8271121: ZGC: stack overflow (segv) when -Xlog:gc+start=debug In-Reply-To: <6mua6zpLPXraq9CJnLgOaHgR1IVeOzmZbU51AZ3gkXY=.fcfee003-b158-4284-9c74-094d303b3997@github.com> References: <1oySXuASSlTRKZ7LKqR4l4wPIq8he_F9P0pNAQD6b9Q=.24ab2082-37b6-45b8-8078-e90a2b62eb8d@github.com> <6mua6zpLPXraq9CJnLgOaHgR1IVeOzmZbU51AZ3gkXY=.fcfee003-b158-4284-9c74-094d303b3997@github.com> Message-ID: On Wed, 4 Aug 2021 09:37:03 GMT, Jie Fu wrote: >> When an allocation stall happens we're trying to access oops (the thread and thread name) while logging. We shouldn't do that since it can lead to a recursive allocation stall situation, which eventually will cause a stack overflow. I suggest we simply don't log anything in ZStatCriticalPhase::register_start(). The useful part of the logging happens in ZStatCriticalPhase::register_end() anyway, where it's safe to access oops. > > src/hotspot/share/gc/z/zStat.cpp line 762: > >> 760: _verbose(verbose) {} >> 761: >> 762: void ZStatCriticalPhase::register_start(const Ticks& start) const { > > So this method is empty. > Shall we just remove it? > Thanks. No, because it's pure virtual in the super class. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4990 From ayang at openjdk.java.net Wed Aug 4 09:58:33 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 4 Aug 2021 09:58:33 GMT Subject: RFR: 8271121: ZGC: stack overflow (segv) when -Xlog:gc+start=debug In-Reply-To: <1oySXuASSlTRKZ7LKqR4l4wPIq8he_F9P0pNAQD6b9Q=.24ab2082-37b6-45b8-8078-e90a2b62eb8d@github.com> References: <1oySXuASSlTRKZ7LKqR4l4wPIq8he_F9P0pNAQD6b9Q=.24ab2082-37b6-45b8-8078-e90a2b62eb8d@github.com> Message-ID: On Wed, 4 Aug 2021 09:22:54 GMT, Per Liden wrote: > When an allocation stall happens we're trying to access oops (the thread and thread name) while logging. We shouldn't do that since it can lead to a recursive allocation stall situation, which eventually will cause a stack overflow. I suggest we simply don't log anything in ZStatCriticalPhase::register_start(). The useful part of the logging happens in ZStatCriticalPhase::register_end() anyway, where it's safe to access oops. Marked as reviewed by ayang (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4990 From eosterlund at openjdk.java.net Wed Aug 4 10:02:28 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 4 Aug 2021 10:02:28 GMT Subject: RFR: 8271121: ZGC: stack overflow (segv) when -Xlog:gc+start=debug In-Reply-To: <1oySXuASSlTRKZ7LKqR4l4wPIq8he_F9P0pNAQD6b9Q=.24ab2082-37b6-45b8-8078-e90a2b62eb8d@github.com> References: <1oySXuASSlTRKZ7LKqR4l4wPIq8he_F9P0pNAQD6b9Q=.24ab2082-37b6-45b8-8078-e90a2b62eb8d@github.com> Message-ID: On Wed, 4 Aug 2021 09:22:54 GMT, Per Liden wrote: > When an allocation stall happens we're trying to access oops (the thread and thread name) while logging. We shouldn't do that since it can lead to a recursive allocation stall situation, which eventually will cause a stack overflow. I suggest we simply don't log anything in ZStatCriticalPhase::register_start(). 
The useful part of the logging happens in ZStatCriticalPhase::register_end() anyway, where it's safe to access oops. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4990 From eosterlund at openjdk.java.net Wed Aug 4 10:32:38 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 4 Aug 2021 10:32:38 GMT Subject: [jdk17] Integrated: 8271064: ZGC several jvm08 perf regressions after JDK-8268372 In-Reply-To: References: Message-ID: On Tue, 3 Aug 2021 08:13:02 GMT, Erik Österlund wrote: > After the integration of ZGC dynamic thread selection, there was a regression that popped up on a few benchmarks that all had in common a high allocation rate and very quick GC times. An extra conservative measure for not trusting the heuristics when using few GC threads did not play very well with this scenario. The effect of it is back-to-back GCs causing thousands of GCs when <100 were really needed. Removing this conservative measure causes the regression to go away, and a bunch of green benchmarks to pop up instead of red ones. This suggests that the conservative measure isn't working very well and does not scale to these scenarios. It is also already taken care of by other conservative measures. I propose it is removed, to solve the regressions. > The entire ZGC perf suite was rerun (no red, only green), and tier1-3 was run. There isn't much potential for things to blow up as I just removed one parameter in the heuristic. So if anything nasty was to show up, it would be in the perf runs. This pull request has now been integrated. 
Changeset: 181483b9 Author: Erik Österlund URL: https://git.openjdk.java.net/jdk17/commit/181483b90bcc7d4e44109a14213d4ee2804f7f32 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod 8271064: ZGC several jvm08 perf regressions after JDK-8268372 Reviewed-by: ayang, pliden, tschatzl ------------- PR: https://git.openjdk.java.net/jdk17/pull/298 From iwalulya at openjdk.java.net Wed Aug 4 13:07:35 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 4 Aug 2021 13:07:35 GMT Subject: RFR: 8265057: G1: Investigate removal of maintenance of two BOT thresholds [v2] In-Reply-To: References: Message-ID: On Tue, 3 Aug 2021 13:38:27 GMT, Albert Mingkun Yang wrote: >> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: >> >> reviewer suggested to remove comment > > Marked as reviewed by ayang (Committer). Thanks @albertnetymk and @tschatzl for the reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/4972 From iwalulya at openjdk.java.net Wed Aug 4 13:07:36 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 4 Aug 2021 13:07:36 GMT Subject: Integrated: 8265057: G1: Investigate removal of maintenance of two BOT thresholds In-Reply-To: References: Message-ID: On Tue, 3 Aug 2021 13:25:07 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to remove `_next_offset_index` and maintain only one threshold, `_next_offset_threshold`, a direct pointer to the BOT entry in the BOT. We derive the next_offset_index from _next_offset_threshold where required. Performance tests do not show any regression as a result of this change. > > Testing: Tier 1-5 This pull request has now been integrated. 
Changeset: 0a27f264 Author: Ivan Walulya URL: https://git.openjdk.java.net/jdk/commit/0a27f264da5a21d581e099573e48485bdeea7790 Stats: 31 lines in 3 files changed: 4 ins; 8 del; 19 mod 8265057: G1: Investigate removal of maintenance of two BOT thresholds Reviewed-by: ayang, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/4972 From kbarrett at openjdk.java.net Wed Aug 4 14:51:29 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 4 Aug 2021 14:51:29 GMT Subject: RFR: 8271217: Fix race between G1PeriodicGCTask checks and GC request In-Reply-To: References: Message-ID: On Wed, 4 Aug 2021 09:08:30 GMT, Leo Korinth wrote: > If you like, you could initialize the values of the counters in the default constructor (to zero for example). I think initialized to zero is no better than uninitialized here. Either way, a default-constructed object of this type shouldn't be given to try_collect. In an unpublished development version I had a debug-only is_initialized flag that was set appropriately by the constructors and checked by the accessors, but decided it more clutter than it was worth and dropped it. ------------- PR: https://git.openjdk.java.net/jdk/pull/4905 From kbarrett at openjdk.java.net Wed Aug 4 14:51:29 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 4 Aug 2021 14:51:29 GMT Subject: RFR: 8271217: Fix race between G1PeriodicGCTask checks and GC request In-Reply-To: <8gQNCSwFUNPwQ0TT6-NA_2bOJAdtI1rMnMykc8VHqLE=.23e93c0c-dbc1-43ad-a0c1-c5b25ce1687b@github.com> References: <8gQNCSwFUNPwQ0TT6-NA_2bOJAdtI1rMnMykc8VHqLE=.23e93c0c-dbc1-43ad-a0c1-c5b25ce1687b@github.com> Message-ID: On Mon, 2 Aug 2021 07:19:12 GMT, Ivan Walulya wrote: >> Please review this change to remove a race involving G1PeriodicGCTask. The >> task performs various checks to determine whether a periodic GC should be >> performed. If they pass, then it requests the GC by calling try_collect. 
>> The problem is that an unrelated GC could occur after the task's checks but >> before its request is processed. In that case we don't want to do the >> periodic GC after all, but there's nothing to prevent it. >> >> This change lets the task capture the various GC counters as part of its >> checking, and pass them to try_collect for use in detecting an intervening >> GC. >> >> Testing: >> mach5 tier1 > > Lgtm! Thanks @walulyai , @tschatzl , and @lkorinth for reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/4905 From mdoerr at openjdk.java.net Wed Aug 4 14:52:28 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 4 Aug 2021 14:52:28 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space In-Reply-To: References: Message-ID: On Wed, 4 Aug 2021 01:06:47 GMT, Hamlin Li wrote: > Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. > > After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. Sounds like a good idea. Only doing prefetch immediately before copy looks odd, but this may be a different topic. ------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From kbarrett at openjdk.java.net Wed Aug 4 15:08:06 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 4 Aug 2021 15:08:06 GMT Subject: RFR: 8271217: Fix race between G1PeriodicGCTask checks and GC request [v2] In-Reply-To: References: Message-ID: > Please review this change to remove a race involving G1PeriodicGCTask. The > task performs various checks to determine whether a periodic GC should be > performed. If they pass, then it requests the GC by calling try_collect. > The problem is that an unrelated GC could occur after the task's checks but > before its request is processed. In that case we don't want to do the > periodic GC after all, but there's nothing to prevent it. 
> > This change lets the task capture the various GC counters as part of its > checking, and pass them to try_collect for use in detecting an intervening > GC. > > Testing: > mach5 tier1 Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into g1service_races - eliminate races in periodic gc requests ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4905/files - new: https://git.openjdk.java.net/jdk/pull/4905/files/0152fd58..c48d53b6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4905&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4905&range=00-01 Stats: 38804 lines in 372 files changed: 34765 ins; 2096 del; 1943 mod Patch: https://git.openjdk.java.net/jdk/pull/4905.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4905/head:pull/4905 PR: https://git.openjdk.java.net/jdk/pull/4905 From kbarrett at openjdk.java.net Wed Aug 4 15:08:11 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 4 Aug 2021 15:08:11 GMT Subject: Integrated: 8271217: Fix race between G1PeriodicGCTask checks and GC request In-Reply-To: References: Message-ID: On Mon, 26 Jul 2021 18:05:10 GMT, Kim Barrett wrote: > Please review this change to remove a race involving G1PeriodicGCTask. The > task performs various checks to determine whether a periodic GC should be > performed. If they pass, then it requests the GC by calling try_collect. > The problem is that an unrelated GC could occur after the task's checks but > before its request is processed. In that case we don't want to do the > periodic GC after all, but there's nothing to prevent it. > > This change lets the task capture the various GC counters as part of its > checking, and pass them to try_collect for use in detecting an intervening > GC. 
> > Testing: > mach5 tier1 This pull request has now been integrated. Changeset: 452f7d76 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/452f7d764fc0112cabf0be944e4233173d63f933 Stats: 159 lines in 8 files changed: 126 ins; 13 del; 20 mod 8271217: Fix race between G1PeriodicGCTask checks and GC request Reviewed-by: iwalulya, tschatzl, lkorinth ------------- PR: https://git.openjdk.java.net/jdk/pull/4905 From tschatzl at openjdk.java.net Wed Aug 4 16:10:29 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 4 Aug 2021 16:10:29 GMT Subject: RFR: 8270842: G1: Only young regions need to redirty outside references in remset. In-Reply-To: References: Message-ID: <-FFkrwUQv0BzUS7QD5f4QwAE5WVtGYxhVGKjotJPI4Q=.3b0c15f8-3255-401d-a2fd-45f409400156@github.com> On Wed, 21 Jul 2021 02:13:38 GMT, Hamlin Li wrote: > For evac failure objects in non-young regions (old) of cset, the outside reference remset already recorded in the dirty queue, we only needs to do it for obj in young regions Adding the reason why we need to re-scan only young gen objects is the use of `from->is_young()` in various places to avoid enqueuing too many cards. The exact condition would be something like `from->is_survivor()` (note: this means the newly allocated survivor regions allocated during this gc - remember that at start of the this gc we relabel all previous survivor regions as eden, which will become part of the eden of the next gc). I (force-)pushed a prototype that adds such a label to `G1HeapRegionAttr` to also avoid touching the `HeapRegion` table completely at https://github.com/tschatzl/jdk/tree/submit/evac-failure-no-scan-during-remove-self-forwards. During this investigation a few additional (unrelated, preexisting) issues with the current handling of objects during evacuation failure became kind of obvious, I filed https://bugs.openjdk.java.net/browse/JDK-8271871 and https://bugs.openjdk.java.net/browse/JDK-8271870. 
However I think this change is good as is though and can be replaced later with a more refined version of the suggested prototype. Testing tier1-5 has also been good afaict (still rerunning as I had some apparently different issues on Windows), as well as running gc/g1 jtreg tests with globally injected `-XX:+G1EvacuationFailureALot` (via `JAVA_OPTIONS_`). ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4853 From tschatzl at openjdk.java.net Wed Aug 4 16:20:30 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 4 Aug 2021 16:20:30 GMT Subject: RFR: 8270842: G1: Only young regions need to redirty outside references in remset. In-Reply-To: References: Message-ID: On Wed, 21 Jul 2021 02:13:38 GMT, Hamlin Li wrote: > For evac failure objects in non-young regions (old) of cset, the outside reference remset already recorded in the dirty queue, we only needs to do it for obj in young regions The comment in `RemoveSelfForwardPtrObjClosure::do_object()` before the change needs updating. It's completely incomprehensible to me (at this point) and mostly refers to very old behavior. Please change before pushing. Something like: // During evacuation failure we do not record inter-region // references referencing regions that need a remembered set // update originating from young regions (including eden) that // failed evacuation. Make up for that omission now by rescanning // these failed objects. ------------- Changes requested by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4853 From dcubed at openjdk.java.net Wed Aug 4 16:30:43 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 4 Aug 2021 16:30:43 GMT Subject: [jdk17] RFR: 8271877: ProblemList jdk/jfr/event/gc/detailed/TestEvacuationFailedEvent.java in JDK17 Message-ID: A trivial fix to ProblemList jdk/jfr/event/gc/detailed/TestEvacuationFailedEvent.java in JDK17. 
------------- Commit messages: - 8271877: ProblemList jdk/jfr/event/gc/detailed/TestEvacuationFailedEvent.java in JDK17 Changes: https://git.openjdk.java.net/jdk17/pull/299/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk17&pr=299&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271877 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk17/pull/299.diff Fetch: git fetch https://git.openjdk.java.net/jdk17 pull/299/head:pull/299 PR: https://git.openjdk.java.net/jdk17/pull/299 From dcubed at openjdk.java.net Wed Aug 4 16:41:38 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 4 Aug 2021 16:41:38 GMT Subject: [jdk17] RFR: 8271877: ProblemList jdk/jfr/event/gc/detailed/TestEvacuationFailedEvent.java in JDK17 In-Reply-To: References: Message-ID: <6h0ZGQFB7f30HLr2uXhLoBBxs4TXrs4LIuQGj4jFmZc=.489ed428-8e33-4e5e-a52a-a180f73022db@github.com> On Wed, 4 Aug 2021 16:36:48 GMT, Joe Darcy wrote: >> A trivial fix to ProblemList jdk/jfr/event/gc/detailed/TestEvacuationFailedEvent.java in JDK17. > > Marked as reviewed by darcy (Reviewer). @jddarcy - Thanks for the fast review! ------------- PR: https://git.openjdk.java.net/jdk17/pull/299 From darcy at openjdk.java.net Wed Aug 4 16:41:38 2021 From: darcy at openjdk.java.net (Joe Darcy) Date: Wed, 4 Aug 2021 16:41:38 GMT Subject: [jdk17] RFR: 8271877: ProblemList jdk/jfr/event/gc/detailed/TestEvacuationFailedEvent.java in JDK17 In-Reply-To: References: Message-ID: On Wed, 4 Aug 2021 16:21:22 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList jdk/jfr/event/gc/detailed/TestEvacuationFailedEvent.java in JDK17. Marked as reviewed by darcy (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk17/pull/299 From dcubed at openjdk.java.net Wed Aug 4 16:46:35 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 4 Aug 2021 16:46:35 GMT Subject: [jdk17] Integrated: 8271877: ProblemList jdk/jfr/event/gc/detailed/TestEvacuationFailedEvent.java in JDK17 In-Reply-To: References: Message-ID: On Wed, 4 Aug 2021 16:21:22 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList jdk/jfr/event/gc/detailed/TestEvacuationFailedEvent.java in JDK17. This pull request has now been integrated. Changeset: 5f547e8c Author: Daniel D. Daugherty URL: https://git.openjdk.java.net/jdk17/commit/5f547e8c119e9c0f6a000d2fdc2a693a4e601ba0 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8271877: ProblemList jdk/jfr/event/gc/detailed/TestEvacuationFailedEvent.java in JDK17 Reviewed-by: darcy ------------- PR: https://git.openjdk.java.net/jdk17/pull/299 From dcubed at openjdk.java.net Wed Aug 4 20:58:46 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 4 Aug 2021 20:58:46 GMT Subject: RFR: 8271898: disable os.release_multi_mappings_vm on macOS-X64 Message-ID: A trivial fix to disable os.release_multi_mappings_vm on macOS-X64. 
------------- Commit messages: - 8271898: disable os.release_multi_mappings_vm on macOS-X64 Changes: https://git.openjdk.java.net/jdk/pull/5006/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5006&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271898 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5006.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5006/head:pull/5006 PR: https://git.openjdk.java.net/jdk/pull/5006 From kbarrett at openjdk.java.net Wed Aug 4 21:09:32 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 4 Aug 2021 21:09:32 GMT Subject: RFR: 8271898: disable os.release_multi_mappings_vm on macOS-X64 In-Reply-To: References: Message-ID: On Wed, 4 Aug 2021 20:45:09 GMT, Daniel D. Daugherty wrote: > A trivial fix to disable os.release_multi_mappings_vm on macOS-X64. Looks "good" (actually kind of horrible, but...) Oh, and yes, trivial. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5006 From dcubed at openjdk.java.net Wed Aug 4 21:09:33 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 4 Aug 2021 21:09:33 GMT Subject: RFR: 8271898: disable os.release_multi_mappings_vm on macOS-X64 In-Reply-To: References: Message-ID: On Wed, 4 Aug 2021 21:03:14 GMT, Kim Barrett wrote: >> A trivial fix to disable os.release_multi_mappings_vm on macOS-X64. > > Looks "good" (actually kind of horrible, but...) @kimbarrett - Thanks for the fast review. Yes, it is kind of horrible, but it's what we have for gtest... 
------------- PR: https://git.openjdk.java.net/jdk/pull/5006 From dcubed at openjdk.java.net Wed Aug 4 21:09:33 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 4 Aug 2021 21:09:33 GMT Subject: Integrated: 8271898: disable os.release_multi_mappings_vm on macOS-X64 In-Reply-To: References: Message-ID: <8y-T5jV-DP-Zyqyg0_dbQL-PLMwmMABLCCxElQB_468=.f4e4a6d8-be2e-40ab-9bbd-de712c23addd@github.com> On Wed, 4 Aug 2021 20:45:09 GMT, Daniel D. Daugherty wrote: > A trivial fix to disable os.release_multi_mappings_vm on macOS-X64. This pull request has now been integrated. Changeset: d62fbea7 Author: Daniel D. Daugherty URL: https://git.openjdk.java.net/jdk/commit/d62fbea7b41f150f25ed3a9a037c081cfdc217b6 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod 8271898: disable os.release_multi_mappings_vm on macOS-X64 Reviewed-by: kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/5006 From dcubed at openjdk.java.net Wed Aug 4 21:09:33 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 4 Aug 2021 21:09:33 GMT Subject: RFR: 8271898: disable os.release_multi_mappings_vm on macOS-X64 In-Reply-To: References: Message-ID: On Wed, 4 Aug 2021 20:45:09 GMT, Daniel D. Daugherty wrote: > A trivial fix to disable os.release_multi_mappings_vm on macOS-X64. And thanks for the confirmation that it is trivial. Since this is just a replay of: JDK-8267339 Temporarily disable os.release_multi_mappings_vm on macOS x64 I wasn't too worried about it. 
:-) ------------- PR: https://git.openjdk.java.net/jdk/pull/5006 From jwilhelm at openjdk.java.net Wed Aug 4 23:03:58 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Wed, 4 Aug 2021 23:03:58 GMT Subject: RFR: Merge jdk17 Message-ID: Forwardport JDK 17 -> JDK 18 ------------- Commit messages: - Merge - 8271894: ProblemList javax/swing/JComponent/7154030/bug7154030.java in JDK17 - 8271877: ProblemList jdk/jfr/event/gc/detailed/TestEvacuationFailedEvent.java in JDK17 - 8271064: ZGC several jvm08 perf regressions after JDK-8268372 The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=5009&range=00.0 - jdk17: https://webrevs.openjdk.java.net/?repo=jdk&pr=5009&range=00.1 Changes: https://git.openjdk.java.net/jdk/pull/5009/files Stats: 8 lines in 2 files changed: 6 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5009.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5009/head:pull/5009 PR: https://git.openjdk.java.net/jdk/pull/5009 From jwilhelm at openjdk.java.net Wed Aug 4 23:20:51 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Wed, 4 Aug 2021 23:20:51 GMT Subject: RFR: Merge jdk17 [v2] In-Reply-To: References: Message-ID: > Forwardport JDK 17 -> JDK 18 Jesper Wilhelmsson has updated the pull request incrementally with two additional commits since the last revision: - Merge - Merge ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5009/files - new: https://git.openjdk.java.net/jdk/pull/5009/files/13123975..789f6f8b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5009&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5009&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5009.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5009/head:pull/5009 PR: https://git.openjdk.java.net/jdk/pull/5009 From 
jwilhelm at openjdk.java.net Thu Aug 5 01:05:33 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Thu, 5 Aug 2021 01:05:33 GMT Subject: Integrated: Merge jdk17 In-Reply-To: References: Message-ID: <4r0lRo6diWLXWMsUQvYWoxyKulRmuyd3iej5bcytNFw=.0ad83a92-6e36-4a48-a32a-f844b3e46333@github.com> On Wed, 4 Aug 2021 22:55:08 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 17 -> JDK 18 This pull request has now been integrated. Changeset: cd6b54ec Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/cd6b54ec40f1d60fbdb6c8aee1e6ba662daca58c Stats: 5 lines in 2 files changed: 3 ins; 1 del; 1 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/5009 From kbarrett at openjdk.java.net Thu Aug 5 01:29:29 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 5 Aug 2021 01:29:29 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space In-Reply-To: References: Message-ID: On Wed, 4 Aug 2021 01:06:47 GMT, Hamlin Li wrote: > Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. > > After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. Any explanation for why this has a significant impact? ------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From stuefe at openjdk.java.net Thu Aug 5 03:30:31 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 5 Aug 2021 03:30:31 GMT Subject: RFR: 8271898: disable os.release_multi_mappings_vm on macOS-X64 In-Reply-To: References: Message-ID: On Wed, 4 Aug 2021 21:07:22 GMT, Daniel D. Daugherty wrote: >> A trivial fix to disable os.release_multi_mappings_vm on macOS-X64. > > And thanks for the confirmation that it is trivial. Since this is just a replay of: > > JDK-8267339 Temporarily disable os.release_multi_mappings_vm on macOS x64 > > I wasn't too worried about it. 
:-) Must have been an error when merging the original JDK-8267339 into JDK-8256844. Thanks, @dcubed-ojdk for taking care of this. ------------- PR: https://git.openjdk.java.net/jdk/pull/5006 From rrich at openjdk.java.net Thu Aug 5 07:34:37 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Thu, 5 Aug 2021 07:34:37 GMT Subject: Integrated: 8271722: [TESTBUG] gc/g1/TestMixedGCLiveThreshold.java can fail if G1 Full GC uses >1 workers In-Reply-To: <4968prrcB2ivxncTpILaWl5ZfeeIyc1AC1HF3kpbKQ8=.713c63d5-264e-4c4d-8de4-1d7c824fb391@github.com> References: <4968prrcB2ivxncTpILaWl5ZfeeIyc1AC1HF3kpbKQ8=.713c63d5-264e-4c4d-8de4-1d7c824fb391@github.com> Message-ID: On Tue, 3 Aug 2021 13:17:48 GMT, Richard Reingruber wrote: > I tried to make this pr dependent on #4968. Hope this works... > > The change fixes a test issue in gc/g1/TestMixedGCLiveThreshold.java by adding > the option `-XX:ParallelGCThreads=1`. This prevents the full gc from running > with more than one worker because if it would do that, live data could be > distributed into several regions and one of them could be selected for > collection which is unexpected with `-XX:G1MixedGCLiveThresholdPercent=25`. > > Please refer to the JBS-Item for more details. > > Thanks, Richard. This pull request has now been integrated. 
Changeset: 4abe5311 Author: Richard Reingruber URL: https://git.openjdk.java.net/jdk/commit/4abe5311407c68d04fb0babb87fa279e35d5fabc Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8271722: [TESTBUG] gc/g1/TestMixedGCLiveThreshold.java can fail if G1 Full GC uses >1 workers Reviewed-by: ayang, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/4971 From rrich at openjdk.java.net Thu Aug 5 07:34:37 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Thu, 5 Aug 2021 07:34:37 GMT Subject: RFR: 8271722: [TESTBUG] gc/g1/TestMixedGCLiveThreshold.java can fail if G1 Full GC uses >1 workers [v3] In-Reply-To: References: <4968prrcB2ivxncTpILaWl5ZfeeIyc1AC1HF3kpbKQ8=.713c63d5-264e-4c4d-8de4-1d7c824fb391@github.com> Message-ID: On Wed, 4 Aug 2021 07:51:07 GMT, Richard Reingruber wrote: >> I tried to make this pr dependent on #4968. Hope this works... >> >> The change fixes a test issue in gc/g1/TestMixedGCLiveThreshold.java by adding >> the option `-XX:ParallelGCThreads=1`. This prevents the full gc from running >> with more than one worker because if it would do that, live data could be >> distributed into several regions and one of them could be selected for >> collection which is unexpected with `-XX:G1MixedGCLiveThresholdPercent=25`. >> >> Please refer to the JBS-Item for more details. >> >> Thanks, Richard. > > Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' > - 8271722: [TESTBUG] gc/g1/TestMixedGCLiveThreshold.java can fail if G1 Full GC uses >1 workers > - start Thanks for the reviews Albert and Thomas! 
------------- PR: https://git.openjdk.java.net/jdk/pull/4971 From ayang at openjdk.java.net Thu Aug 5 08:03:39 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 5 Aug 2021 08:03:39 GMT Subject: RFR: 8271896: Remove unnecessary top address checks in BOT Message-ID: Simple change of removing unnecessary top checks. I left the one `HeapRegion::block_size` as an assert, since it's outside BOT. Test: hotspot_gc ------------- Commit messages: - bot-top Changes: https://git.openjdk.java.net/jdk/pull/5013/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5013&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271896 Stats: 5 lines in 3 files changed: 0 ins; 4 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5013.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5013/head:pull/5013 PR: https://git.openjdk.java.net/jdk/pull/5013 From ayang at openjdk.java.net Thu Aug 5 09:18:38 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 5 Aug 2021 09:18:38 GMT Subject: RFR: 8271930: Simplify end_card calculation in G1BlockOffsetTablePart::verify Message-ID: Simplify `end_card` calculation and some general cleanup. 
Test: hotspot_gc ------------- Commit messages: - bot-old Changes: https://git.openjdk.java.net/jdk/pull/5015/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5015&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271930 Stats: 18 lines in 3 files changed: 1 ins; 12 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/5015.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5015/head:pull/5015 PR: https://git.openjdk.java.net/jdk/pull/5015 From tschatzl at openjdk.java.net Thu Aug 5 10:36:29 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 5 Aug 2021 10:36:29 GMT Subject: RFR: 8271896: Remove unnecessary top address checks in BOT In-Reply-To: References: Message-ID: On Thu, 5 Aug 2021 07:58:12 GMT, Albert Mingkun Yang wrote: > Simple change of removing unnecessary top checks. I left the one `HeapRegion::block_size` as an assert, since it's outside BOT. > > Test: hotspot_gc Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5013 From iwalulya at openjdk.java.net Thu Aug 5 10:45:28 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Thu, 5 Aug 2021 10:45:28 GMT Subject: RFR: 8271896: Remove unnecessary top address checks in BOT In-Reply-To: References: Message-ID: On Thu, 5 Aug 2021 07:58:12 GMT, Albert Mingkun Yang wrote: > Simple change of removing unnecessary top checks. I left the one `HeapRegion::block_size` as an assert, since it's outside BOT. > > Test: hotspot_gc Marked as reviewed by iwalulya (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5013 From mli at openjdk.java.net Thu Aug 5 12:41:50 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 5 Aug 2021 12:41:50 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space [v2] In-Reply-To: References: Message-ID: > Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. 
Specjbb shows 3.7% in critical on aarch64, no change in max. > > After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: simplify mark/age adjust ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4983/files - new: https://git.openjdk.java.net/jdk/pull/4983/files/997fdc19..91601879 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4983&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4983&range=00-01 Stats: 11 lines in 1 file changed: 0 ins; 10 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/4983.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4983/head:pull/4983 PR: https://git.openjdk.java.net/jdk/pull/4983 From mli at openjdk.java.net Thu Aug 5 12:41:50 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 5 Aug 2021 12:41:50 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space In-Reply-To: References: Message-ID: On Wed, 4 Aug 2021 01:06:47 GMT, Hamlin Li wrote: > Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. > > After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. Not quite sure, it could be related to the pipeline scheduling at CPU level. BTW, I just uploaded the test result for x86, no regression for x86. 
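For readers following along, the ordering under discussion can be sketched roughly like this. This is not the actual `do_copy_to_survivor_space` code; `Obj`, `copy_to_survivor`, and the atomic `forwardee` field are invented stand-ins for the mark-word forwarding protocol:

```cpp
#include <atomic>
#include <cassert>
#include <cstring>

// Invented stand-in: in HotSpot the forwarding pointer lives in the mark
// word; here a plain atomic field plays that role.
struct Obj {
    std::atomic<Obj*> forwardee{nullptr};
    int payload[4]{};
};

// Copy-before-CAS, the order proposed in the change above: speculatively
// copy into a worker-private destination, then try to publish it. A worker
// that loses the race simply abandons its copy and uses the winner's.
Obj* copy_to_survivor(Obj* from, Obj* dest) {
    Obj* fwd = from->forwardee.load(std::memory_order_acquire);
    if (fwd != nullptr) {
        return fwd;  // already evacuated by someone else
    }
    std::memcpy(dest->payload, from->payload, sizeof(from->payload));
    Obj* expected = nullptr;
    if (from->forwardee.compare_exchange_strong(expected, dest,
                                                std::memory_order_release,
                                                std::memory_order_acquire)) {
        return dest;      // we won; our copy is the forwardee
    }
    return expected;      // we lost; our copy becomes garbage
}

// Single-threaded demo of both paths.
bool demo_copy() {
    Obj from;
    from.payload[0] = 7;
    Obj d1, d2;
    if (copy_to_survivor(&from, &d1) != &d1) return false;  // first caller wins
    if (copy_to_survivor(&from, &d2) != &d1) return false;  // later caller reuses it
    return d1.payload[0] == 7 && d2.payload[0] == 0;
}
```

The previous order (CAS first, then copy) avoids wasted copies for losing workers but places the copy on the far side of the atomic operation; which order is faster is a microarchitectural question, consistent with Hamlin's remark above about CPU pipeline scheduling.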
------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From pliden at openjdk.java.net Thu Aug 5 12:44:32 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 5 Aug 2021 12:44:32 GMT Subject: RFR: 8271121: ZGC: stack overflow (segv) when -Xlog:gc+start=debug In-Reply-To: References: <1oySXuASSlTRKZ7LKqR4l4wPIq8he_F9P0pNAQD6b9Q=.24ab2082-37b6-45b8-8078-e90a2b62eb8d@github.com> Message-ID: On Wed, 4 Aug 2021 09:56:01 GMT, Albert Mingkun Yang wrote: >> When an allocation stall happens we're trying to access oops (the thread and thread name) while logging. We shouldn't do that since it can lead to a recursive allocation stall situation, which eventually will cause a stack overflow. I suggest we simply don't log anything in ZStatCriticalPhase::register_start(). The useful part of the logging happens in ZStatCriticalPhase::register_end() anyway, where it's safe to access oops. > > Marked as reviewed by ayang (Committer). Thanks @albertnetymk and @fisk for reviewing. ------------- PR: https://git.openjdk.java.net/jdk/pull/4990 From pliden at openjdk.java.net Thu Aug 5 12:44:32 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 5 Aug 2021 12:44:32 GMT Subject: Integrated: 8271121: ZGC: stack overflow (segv) when -Xlog:gc+start=debug In-Reply-To: <1oySXuASSlTRKZ7LKqR4l4wPIq8he_F9P0pNAQD6b9Q=.24ab2082-37b6-45b8-8078-e90a2b62eb8d@github.com> References: <1oySXuASSlTRKZ7LKqR4l4wPIq8he_F9P0pNAQD6b9Q=.24ab2082-37b6-45b8-8078-e90a2b62eb8d@github.com> Message-ID: On Wed, 4 Aug 2021 09:22:54 GMT, Per Liden wrote: > When an allocation stall happens we're trying to access oops (the thread and thread name) while logging. We shouldn't do that since it can lead to a recursive allocation stall situation, which eventually will cause a stack overflow. I suggest we simply don't log anything in ZStatCriticalPhase::register_start(). The useful part of the logging happens in ZStatCriticalPhase::register_end() anyway, where it's safe to access oops. 
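The failure mode Per describes is an unbounded recursion, which a small model shows. Everything here is hypothetical; the names mirror the ZGC methods mentioned above, but the bodies are invented, and memory pressure is simulated with a simple depth cutoff:

```cpp
#include <cassert>
#include <cstddef>

static std::size_t stall_depth = 0;
static std::size_t max_depth_seen = 0;
static bool log_accesses_oops = true;  // the buggy configuration

static void register_start();

// Resolving the thread-name oop may need a heap allocation, which can
// stall again while memory pressure persists (modeled by the cutoff).
static void resolve_thread_name() {
    if (stall_depth < 5) {
        register_start();  // recursive allocation stall
    }
}

// Stall-start hook: in the buggy version, logging touches oops.
static void register_start() {
    ++stall_depth;
    if (stall_depth > max_depth_seen) max_depth_seen = stall_depth;
    if (log_accesses_oops) {
        resolve_thread_name();  // unbounded in the real bug: stack overflow
    }
    --stall_depth;
}

// With the fix, register_start logs nothing, so the recursion disappears.
static std::size_t depth_with(bool buggy) {
    log_accesses_oops = buggy;
    max_depth_seen = 0;
    register_start();
    return max_depth_seen;
}
```

In the real bug there is no cutoff: each stall logs, each log may allocate, and each allocation may stall, so the depth is bounded only by the stack.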
This pull request has now been integrated. Changeset: 18dd4d46 Author: Per Liden URL: https://git.openjdk.java.net/jdk/commit/18dd4d469d120276d05e74607d780f01056f1a8b Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod 8271121: ZGC: stack overflow (segv) when -Xlog:gc+start=debug Reviewed-by: ayang, eosterlund ------------- PR: https://git.openjdk.java.net/jdk/pull/4990 From github.com+42899633+eastig at openjdk.java.net Thu Aug 5 13:56:28 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Thu, 5 Aug 2021 13:56:28 GMT Subject: RFR: 8271939: Clean up primitive raw accessors in oopDesc In-Reply-To: References: Message-ID: On Thu, 5 Aug 2021 11:20:58 GMT, Roman Kennke wrote: > We have some _raw accessors in oop.hpp and related code. Those are leftover from when Shenandoah required primitive barriers and are no longer needed. Their uses can be replaced with the regular accessors. > > Testing: > - [x] tier1 > - [x] tier2 > - [x] hotspot_gc LGTM ------------- Marked as reviewed by eastig at github.com (no known OpenJDK username). PR: https://git.openjdk.java.net/jdk/pull/5017 From rkennke at redhat.com Thu Aug 5 14:21:33 2021 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 5 Aug 2021 16:21:33 +0200 Subject: Intentional heap bounds change? Message-ID: <475beb6a-8ca2-7849-bada-cb8288d78d47@redhat.com> Hello GC devs, I see a case where heap may shrink below -Xms. This may be intentional or not. -Xms is initial heap size after all, not necessarily minimum heap size. However, there is also documentation that -Xms *does* provide the lower bounds for heap sizing policy, e.g. [1] I believe this is due to this change: https://github.com/openjdk/jdk/commit/1ed5b22d6e48ffbebaa53cf79df1c5c790aa9d71#diff-74a766b0aa16c688981a4d7b20da1c93315ce72358cac7158e9b50f82c9773a3L567 I.e. instead of adjusting old-gen minimum size with young-gen minimum size, it is set to _gen_alignment. Is this intentional? 
Thanks, Roman [1] https://docs.oracle.com/en/java/javase/11/gctuning/factors-affecting-garbage-collection-performance.html#GUID-B0BFEFCB-F045-4105-BFA4-C97DE81DAC5B From rkennke at openjdk.java.net Thu Aug 5 14:36:44 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 5 Aug 2021 14:36:44 GMT Subject: RFR: 8271951: Consolidate preserved marks overflow stack in SerialGC Message-ID: <_FpPV4VH0_Iyh0qyN6bJ6wyVISaoy5chgr3ARPlPIKU=.8eafc90e-dfc7-44fc-9316-6bb626e8e8f8@github.com> In SerialGC we put preserved marks in space that we may have in the young gen, and failing that, we push them to an overflow stack. The overflow stack is actually two stacks, and replicates the behavior of the in-place preserved marks. These can be consolidated. Testing: - [ ] tier1 - [ ] tier2 - [ ] hotspot_gc ------------- Commit messages: - 8271951: Consolidate preserved marks overflow stack in SerialGC Changes: https://git.openjdk.java.net/jdk/pull/5022/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5022&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271951 Stats: 23 lines in 3 files changed: 2 ins; 10 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/5022.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5022/head:pull/5022 From tschatzl at openjdk.java.net Thu Aug 5 14:59:29 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 5 Aug 2021 14:59:29 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space [v2] In-Reply-To: References: Message-ID: On Thu, 5 Aug 2021 12:41:50 GMT, Hamlin Li wrote: >> Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. >> >> After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor.
> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > simplify mark/age adjust We are currently trying to reproduce these improvements on aarch64 and x64. Also, please revert the change to the age calculation and file this separately. I think I remember the reason for incrementing the age the way it is done is/was that it is supposed to avoid some forced reloads of the mark word which may or may not be important. Either way it is completely unrelated to this change. ------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From dcubed at openjdk.java.net Thu Aug 5 14:59:42 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 5 Aug 2021 14:59:42 GMT Subject: Integrated: 8271953: fix mis-merge in JDK-8271878 Message-ID: A trivial change to fix a mis-merge in JDK-8271878. ------------- Commit messages: - 8271953: fix mis-merge in JDK-8271878 Changes: https://git.openjdk.java.net/jdk/pull/5023/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5023&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271953 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5023.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5023/head:pull/5023 PR: https://git.openjdk.java.net/jdk/pull/5023 From jwilhelm at openjdk.java.net Thu Aug 5 14:59:42 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Thu, 5 Aug 2021 14:59:42 GMT Subject: Integrated: 8271953: fix mis-merge in JDK-8271878 In-Reply-To: References: Message-ID: On Thu, 5 Aug 2021 14:48:16 GMT, Daniel D. Daugherty wrote: > A trivial change to fix a mis-merge in JDK-8271878. Marked as reviewed by jwilhelm (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/5023 From ctornqvi at openjdk.java.net Thu Aug 5 14:59:42 2021 From: ctornqvi at openjdk.java.net (Christian Tornqvist) Date: Thu, 5 Aug 2021 14:59:42 GMT Subject: Integrated: 8271953: fix mis-merge in JDK-8271878 In-Reply-To: References: Message-ID: On Thu, 5 Aug 2021 14:48:16 GMT, Daniel D. Daugherty wrote: > A trivial change to fix a mis-merge in JDK-8271878. Marked as reviewed by ctornqvi (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5023 From dcubed at openjdk.java.net Thu Aug 5 14:59:43 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 5 Aug 2021 14:59:43 GMT Subject: Integrated: 8271953: fix mis-merge in JDK-8271878 In-Reply-To: References: Message-ID: On Thu, 5 Aug 2021 14:52:28 GMT, Jesper Wilhelmsson wrote: >> A trivial change to fix a mis-merge in JDK-8271878. > > Marked as reviewed by jwilhelm (Reviewer). @JesperIRL - Thanks for the fast review! ------------- PR: https://git.openjdk.java.net/jdk/pull/5023 From dcubed at openjdk.java.net Thu Aug 5 14:59:43 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 5 Aug 2021 14:59:43 GMT Subject: Integrated: 8271953: fix mis-merge in JDK-8271878 In-Reply-To: References: Message-ID: <_3Rk-dASucutCCcUZ9K6Vy2AR-wMzBRSIahbNpnzZxs=.974c85d3-f12c-490c-88a5-d0f147bbfc14@github.com> On Thu, 5 Aug 2021 14:48:16 GMT, Daniel D. Daugherty wrote: > A trivial change to fix a mis-merge in JDK-8271878. This pull request has now been integrated. Changeset: 7234a433 Author: Daniel D. 
Daugherty URL: https://git.openjdk.java.net/jdk/commit/7234a433f8ba13d8a4b696a77653b441163d2afa Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod 8271953: fix mis-merge in JDK-8271878 Reviewed-by: jwilhelm, ctornqvi ------------- PR: https://git.openjdk.java.net/jdk/pull/5023 From dcubed at openjdk.java.net Thu Aug 5 15:30:33 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 5 Aug 2021 15:30:33 GMT Subject: Integrated: 8271953: fix mis-merge in JDK-8271878 In-Reply-To: References: Message-ID: On Thu, 5 Aug 2021 14:53:53 GMT, Christian Tornqvist wrote: >> A trivial change to fix a mis-merge in JDK-8271878. > > Marked as reviewed by ctornqvi (Reviewer). @ctornqvi - Thanks for the review! ------------- PR: https://git.openjdk.java.net/jdk/pull/5023 From shade at openjdk.java.net Thu Aug 5 16:11:34 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 5 Aug 2021 16:11:34 GMT Subject: RFR: 8271939: Clean up primitive raw accessors in oopDesc In-Reply-To: References: Message-ID: <5By8v2haGJrhX-3_K3CB7rBxdXaSdsPFpe94WRaS8hM=.415e79a7-8751-45cf-b106-8320ce3aded0@github.com> On Thu, 5 Aug 2021 11:20:58 GMT, Roman Kennke wrote: > We have some _raw accessors in oop.hpp and related code. Those are leftover from when Shenandoah required primitive barriers and are no longer needed. Their uses can be replaced with the regular accessors. > > Testing: > - [x] tier1 > - [x] tier2 > - [x] hotspot_gc Looking at the [RFE](http://hg.openjdk.java.net/jdk/jdk/rev/7c3891b9f1e0) that introduced these, I think `loader_data_raw` is also the candidate for removal? ------------- PR: https://git.openjdk.java.net/jdk/pull/5017 From mli at openjdk.java.net Fri Aug 6 01:37:52 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 6 Aug 2021 01:37:52 GMT Subject: RFR: 8270842: G1: Only young regions need to redirty outside references in remset. 
[v2] In-Reply-To: References: Message-ID: > For evac failure objects in non-young regions (old) of cset, the outside reference remset is already recorded in the dirty queue, so we only need to do it for objects in young regions Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: replace out-of-date comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4853/files - new: https://git.openjdk.java.net/jdk/pull/4853/files/7790e440..88ce3d42 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4853&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4853&range=00-01 Stats: 12 lines in 1 file changed: 0 ins; 7 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/4853.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4853/head:pull/4853 PR: https://git.openjdk.java.net/jdk/pull/4853 From mli at openjdk.java.net Fri Aug 6 01:37:53 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 6 Aug 2021 01:37:53 GMT Subject: RFR: 8270842: G1: Only young regions need to redirty outside references in remset. In-Reply-To: References: Message-ID: <4XhGycrXcPRz_H5Xkl9EH0yZwNRdG3GV1m45Z_mkVgE=.5d5b1f86-fc42-4584-bc4f-43e98023a32b@github.com> On Wed, 21 Jul 2021 02:13:38 GMT, Hamlin Li wrote: > For evac failure objects in non-young regions (old) of cset, the outside reference remset is already recorded in the dirty queue, so we only need to do it for objects in young regions Thanks Thomas, I've updated the comments. Seems the PR is not ready for now, should I continue or do you have some other plan?
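A rough sketch of the resulting filter; all types and names below are invented, the real change lives in G1's evacuation-failure handling, not in code shaped like this:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class RegionKind { Young, Old };

struct Region {
    RegionKind kind;
    std::size_t redirtied_cards = 0;
};

// Hypothetical per-object step while removing self-forwarding pointers:
// only objects in young cset regions redirty their outside references,
// because for old cset regions those entries already sit in the dirty
// card queue (recorded at mutation time).
void handle_evac_failed_obj(Region& r) {
    if (r.kind == RegionKind::Young) {
        r.redirtied_cards += 1;  // stand-in for enqueueing the object's cards
    }
    // Old regions: nothing to do here.
}

std::size_t demo_redirty() {
    std::vector<Region> cset = {{RegionKind::Young}, {RegionKind::Old}, {RegionKind::Young}};
    for (Region& r : cset) handle_evac_failed_obj(r);
    std::size_t total = 0;
    for (const Region& r : cset) total += r.redirtied_cards;
    return total;  // only the two young regions contributed
}
```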
------------- PR: https://git.openjdk.java.net/jdk/pull/4853 From mli at openjdk.java.net Fri Aug 6 02:15:55 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 6 Aug 2021 02:15:55 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space [v3] In-Reply-To: References: Message-ID: > Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. > > After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. Hamlin Li has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4983/files - new: https://git.openjdk.java.net/jdk/pull/4983/files/91601879..997fdc19 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4983&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4983&range=01-02 Stats: 11 lines in 1 file changed: 10 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/4983.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4983/head:pull/4983 PR: https://git.openjdk.java.net/jdk/pull/4983 From mli at openjdk.java.net Fri Aug 6 02:15:55 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 6 Aug 2021 02:15:55 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space [v2] In-Reply-To: References: Message-ID: On Thu, 5 Aug 2021 12:41:50 GMT, Hamlin Li wrote: >> Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. >> >> After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. 
> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > simplify mark/age adjust I see, just reverted the code related to age which is now tracked by https://bugs.openjdk.java.net/browse/JDK-8272070 ------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From mli at openjdk.java.net Fri Aug 6 03:16:31 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 6 Aug 2021 03:16:31 GMT Subject: RFR: 8270842: G1: Only young regions need to redirty outside references in remset. [v2] In-Reply-To: References: Message-ID: <6UkBjiKzPP6eSS6DYniySWglZoJW0e9j-onlyCFpDPA=.129673ec-656d-4668-b454-cc5a09598694@github.com> On Fri, 6 Aug 2021 01:37:52 GMT, Hamlin Li wrote: >> For evac failure objects in non-young regions (old) of cset, the outside reference remset already recorded in the dirty queue, we only needs to do it for obj in young regions > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > replace out-of-date comments Just checked your prototype code, I think it's a good way to go. ------------- PR: https://git.openjdk.java.net/jdk/pull/4853 From tschatzl at openjdk.java.net Fri Aug 6 07:39:30 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 6 Aug 2021 07:39:30 GMT Subject: RFR: 8270842: G1: Only young regions need to redirty outside references in remset. [v2] In-Reply-To: References: Message-ID: <6CgDKk9X6-nrg24NxfXo9zoRHku-iiDZrep406bBB9E=.59f4ae37-76ce-4821-9af5-75b1595cd212@github.com> On Fri, 6 Aug 2021 01:37:52 GMT, Hamlin Li wrote: >> For evac failure objects in non-young regions (old) of cset, the outside reference remset already recorded in the dirty queue, we only needs to do it for obj in young regions > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > replace out-of-date comments Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4853 From ayang at openjdk.java.net Fri Aug 6 08:30:32 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 6 Aug 2021 08:30:32 GMT Subject: RFR: 8271896: Remove unnecessary top address checks in BOT In-Reply-To: References: Message-ID: On Thu, 5 Aug 2021 07:58:12 GMT, Albert Mingkun Yang wrote: > Simple change of removing unnecessary top checks. I left the one `HeapRegion::block_size` as an assert, since it's outside BOT. > > Test: hotspot_gc Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/5013 From ayang at openjdk.java.net Fri Aug 6 08:30:33 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 6 Aug 2021 08:30:33 GMT Subject: Integrated: 8271896: Remove unnecessary top address checks in BOT In-Reply-To: References: Message-ID: On Thu, 5 Aug 2021 07:58:12 GMT, Albert Mingkun Yang wrote: > Simple change of removing unnecessary top checks. I left the one `HeapRegion::block_size` as an assert, since it's outside BOT. > > Test: hotspot_gc This pull request has now been integrated. Changeset: c2b7face Author: Albert Mingkun Yang URL: https://git.openjdk.java.net/jdk/commit/c2b7facea442eda470913546001c9a5e35d18929 Stats: 5 lines in 3 files changed: 0 ins; 4 del; 1 mod 8271896: Remove unnecessary top address checks in BOT Reviewed-by: tschatzl, iwalulya ------------- PR: https://git.openjdk.java.net/jdk/pull/5013 From pliden at openjdk.java.net Fri Aug 6 10:10:46 2021 From: pliden at openjdk.java.net (Per Liden) Date: Fri, 6 Aug 2021 10:10:46 GMT Subject: RFR: 8267186: Add string deduplication support to ZGC Message-ID: This change adds support for string deduplication to ZGC. It's a pretty straight forward change, but to make reviewing even easier it is broken up into two commits: __ZGC: Introduce ZMarkContext__ This commit just moves the `ZMarkCache` into the new `ZMarkContext` class, and we now pass a `ZMarkContext*` around instead of a `ZMarkCache*`. 
The `ZMarkContext` class is a more general container for worker-local data, which in the next commit will be extended to also include the `StringDedup::Requests` queue. __8267186: Add string deduplication support to ZGC__ This commit adds the actual string dedup functionality and enables relevant tests. We use the `deduplication_requested` bit in the `String` object to filter out `Strings` we've already attempted to deduplicate. Testing: - Passes all string dedup tests. - Passes Tier1-7 with ZGC on Linux/x86_64 (with -XX:+UseStringDeduplication enabled by default to get better exposure). ------------- Commit messages: - 8267186: Add string deduplication support to ZGC - ZGC: Introduce ZMarkContext Changes: https://git.openjdk.java.net/jdk/pull/5029/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5029&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267186 Stats: 231 lines in 12 files changed: 211 ins; 1 del; 19 mod Patch: https://git.openjdk.java.net/jdk/pull/5029.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5029/head:pull/5029 From iwalulya at openjdk.java.net Fri Aug 6 11:00:39 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Fri, 6 Aug 2021 11:00:39 GMT Subject: RFR: 8271884: G1CH::_expand_heap_after_alloc_failure is no longer needed Message-ID: Hi, Please review this change to remove _expand_heap_after_alloc_failure. Checks for old_is_full(), survivor_is_full(), and G1CollectedHeap::has_more_regions() should eliminate repeated expansion attempts.
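Ivan's argument can be modeled in a few lines. The sketch below is hypothetical (a toy `Heap` with invented fields, not `G1CollectedHeap`), but it shows why a sticky "don't expand again" boolean is redundant once the fullness checks gate expansion:

```cpp
#include <cassert>
#include <cstddef>

// In the old scheme, a boolean (_expand_heap_after_alloc_failure) was set
// after a failed expansion to suppress further attempts. With a check like
// has_more_regions() in front, a failed or exhausted expansion leaves the
// check false, so repeated attempts cannot happen anyway.
struct Heap {
    std::size_t committed_regions = 2;
    std::size_t max_regions = 3;
    std::size_t expand_attempts = 0;

    bool has_more_regions() const { return committed_regions < max_regions; }

    bool try_expand() {
        ++expand_attempts;
        if (!has_more_regions()) return false;
        ++committed_regions;
        return true;
    }

    // Expansion is only attempted while something is left to commit.
    bool expand_after_alloc_failure() {
        if (!has_more_regions()) return false;
        return try_expand();
    }
};

std::size_t demo_expand() {
    Heap h;
    while (h.expand_after_alloc_failure()) {}
    return h.expand_attempts;  // one successful attempt, then guarded out
}
```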
Testing; tier 1 ------------- Commit messages: - remove _expand_heap_after_alloc_failure Changes: https://git.openjdk.java.net/jdk/pull/5030/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5030&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271884 Stats: 15 lines in 2 files changed: 0 ins; 13 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5030.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5030/head:pull/5030 PR: https://git.openjdk.java.net/jdk/pull/5030 From stefank at openjdk.java.net Fri Aug 6 11:20:28 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Fri, 6 Aug 2021 11:20:28 GMT Subject: RFR: 8271939: Clean up primitive raw accessors in oopDesc In-Reply-To: References: Message-ID: On Thu, 5 Aug 2021 11:20:58 GMT, Roman Kennke wrote: > We have some _raw accessors in oop.hpp and related code. Those are leftover from when Shenandoah required primitive barriers and are no longer needed. Their uses can be replaced with the regular accessors. > > Testing: > - [x] tier1 > - [x] tier2 > - [x] hotspot_gc Marked as reviewed by stefank (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5017 From rkennke at openjdk.java.net Fri Aug 6 11:59:59 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 6 Aug 2021 11:59:59 GMT Subject: RFR: 8271939: Clean up primitive raw accessors in oopDesc [v2] In-Reply-To: References: Message-ID: > We have some _raw accessors in oop.hpp and related code. Those are leftover from when Shenandoah required primitive barriers and are no longer needed. Their uses can be replaced with the regular accessors. 
> > Testing: > - [x] tier1 > - [x] tier2 > - [x] hotspot_gc Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Change loader_data_raw() to regular loader_data() ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5017/files - new: https://git.openjdk.java.net/jdk/pull/5017/files/12b3c772..515c3a8a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5017&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5017&range=00-01 Stats: 6 lines in 4 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/5017.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5017/head:pull/5017 PR: https://git.openjdk.java.net/jdk/pull/5017 From rkennke at openjdk.java.net Fri Aug 6 12:38:38 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 6 Aug 2021 12:38:38 GMT Subject: RFR: 8271946: Cleanup leftovers in Space and subclasses Message-ID: <06djYCvDj8KzCJVZ7KJqS8B9WtRBwl7hN1CMm4BZzXk=.23e2ee84-3f0f-4274-a470-6ea5b2d888cc@github.com> There are a number of methods in (Compactible)Space that seem trivially implemented. E.g. obj_size(), adjust_obj_size(), scanned_block_is_obj() etc. I believe they are leftovers from CMS (not sure) but it looks like they can be replaced by obj->size() or removed altogether. Also, the related templated methods scan_and_forward(), scan_and_adjust_pointers() and scan_and_compact() can be directly and mechanically inlined into their callers. It makes the code more readable.
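The pattern being removed looks roughly like this in miniature (hypothetical classes, not the real `Space`/`CompactibleSpace` code): a virtual `obj_size(obj)` hook whose every remaining implementation is `obj->size()` adds nothing over the direct call, so the loops parameterized over it can be inlined into their only callers:

```cpp
#include <cassert>
#include <cstddef>

struct MiniObj {
    std::size_t words;
    std::size_t size() const { return words; }
};

// Before: the hook existed so another collector could override it; with
// that collector gone, every override is the same trivial delegation.
struct SpaceBefore {
    virtual std::size_t obj_size(const MiniObj* obj) const { return obj->size(); }
    virtual ~SpaceBefore() = default;

    std::size_t live_words(const MiniObj* objs, std::size_t n) const {
        std::size_t sum = 0;
        for (std::size_t i = 0; i < n; i++) sum += obj_size(&objs[i]);
        return sum;
    }
};

// After: the same loop with the indirection mechanically inlined away.
struct SpaceAfter {
    std::size_t live_words(const MiniObj* objs, std::size_t n) const {
        std::size_t sum = 0;
        for (std::size_t i = 0; i < n; i++) sum += objs[i].size();
        return sum;
    }
};

bool demo_space() {
    MiniObj objs[3] = {{2}, {5}, {1}};
    return SpaceBefore{}.live_words(objs, 3) == SpaceAfter{}.live_words(objs, 3);
}
```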
Testing: - [x] hotspot_gc - [ ] tier1 - [ ] tier2 ------------- Commit messages: - 8271946: Cleanup leftovers in Space and subclasses Changes: https://git.openjdk.java.net/jdk/pull/5031/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5031&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271946 Stats: 409 lines in 3 files changed: 160 ins; 244 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/5031.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5031/head:pull/5031 PR: https://git.openjdk.java.net/jdk/pull/5031 From eosterlund at openjdk.java.net Fri Aug 6 13:09:28 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 6 Aug 2021 13:09:28 GMT Subject: RFR: 8267186: Add string deduplication support to ZGC In-Reply-To: References: Message-ID: On Fri, 6 Aug 2021 10:03:05 GMT, Per Liden wrote: > This change adds support for string deduplication to ZGC. It's a pretty straight forward change, but to make reviewing even easier it is broken up into two commits: > > __ZGC: Introduce ZMarkContext__ > This commit just moves the `ZMarkCache` into the new `ZMarkContext` class, and we now pass a `ZMarkContext*` around instead of a `ZMarkCache*`. The `ZMarkContext` class is a more general container for worker-local data, which in the next commit will be extended to also include the `StringDedup::Requests` queue. > > __8267186: Add string deduplication support to ZGC__ > This commits adds the actual string dedup functionality and enables relevant tests. We use the `deduplication_requested` bit in the `String` object to filter out `Strings` we've already attempted to deduplicate. > > Testing: > - Passes all string dedup tests. > - Passes Tier1-7 with ZGC on Linux/x86_64 (with -XX:+UseStringDeduplication enabled by default to get better exposure). 
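The two-commit structure quoted above can be pictured with a small stand-in. Only `ZMarkCache` and `StringDedup::Requests` are real names from the change; everything else here is invented:

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for the worker-local state mentioned in the change.
struct MarkCache { std::size_t hits = 0; };
struct DedupRequests { std::size_t enqueued = 0; };

// One container per marking worker, passed wherever a ZMarkCache* used to
// be passed. Adding new per-worker state (such as the dedup request queue)
// no longer changes every function signature on the mark path.
class MarkContext {
public:
    MarkCache* cache() { return &_cache; }
    DedupRequests* dedup_requests() { return &_dedup_requests; }
private:
    MarkCache _cache;
    DedupRequests _dedup_requests;
};

void mark_object(MarkContext* ctx, bool is_unrequested_string) {
    ctx->cache()->hits++;                   // was: cache->hits++
    if (is_unrequested_string) {
        ctx->dedup_requests()->enqueued++;  // new state, same signature
    }
}

std::size_t demo_context() {
    MarkContext ctx;
    mark_object(&ctx, true);
    mark_object(&ctx, false);
    return ctx.cache()->hits + ctx.dedup_requests()->enqueued;  // 2 marks + 1 request
}
```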
Now that we have a context object for marking, it would be nice if it also contained the stripes and stacks that we pass around everywhere, which we currently determine just a few lines below where the context object is created. ------------- Changes requested by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5029 From pliden at openjdk.java.net Fri Aug 6 14:02:30 2021 From: pliden at openjdk.java.net (Per Liden) Date: Fri, 6 Aug 2021 14:02:30 GMT Subject: RFR: 8267186: Add string deduplication support to ZGC In-Reply-To: References: Message-ID: <1FsmlIjywgG71JFIZhivKhkMzl5ps6bvBvgbFz5gREA=.11ee8f87-56ad-40b5-9734-dadafac9e57e@github.com> On Fri, 6 Aug 2021 13:06:27 GMT, Erik Österlund wrote: > Now that we have a context object for marking, it would be nice if it also contained the stripes and stacks that we pass around everywhere, which we currently determine just a few lines below where the context object is created. Good suggestion, will fix that. ------------- PR: https://git.openjdk.java.net/jdk/pull/5029 From pliden at openjdk.java.net Fri Aug 6 14:11:57 2021 From: pliden at openjdk.java.net (Per Liden) Date: Fri, 6 Aug 2021 14:11:57 GMT Subject: RFR: 8267186: Add string deduplication support to ZGC [v2] In-Reply-To: References: Message-ID: > This change adds support for string deduplication to ZGC. It's a pretty straight forward change, but to make reviewing even easier it is broken up into two commits: > > __ZGC: Introduce ZMarkContext__ > This commit just moves the `ZMarkCache` into the new `ZMarkContext` class, and we now pass a `ZMarkContext*` around instead of a `ZMarkCache*`. The `ZMarkContext` class is a more general container for worker-local data, which in the next commit will be extended to also include the `StringDedup::Requests` queue. > > __8267186: Add string deduplication support to ZGC__ > This commit adds the actual string dedup functionality and enables relevant tests.
We use the `deduplication_requested` bit in the `String` object to filter out `Strings` we've already attempted to deduplicate. > > Testing: > - Passes all string dedup tests. > - Passes Tier1-7 with ZGC on Linux/x86_64 (with -XX:+UseStringDeduplication enabled by default to get better exposure). Per Liden has updated the pull request incrementally with one additional commit since the last revision: Review comments from Erik ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5029/files - new: https://git.openjdk.java.net/jdk/pull/5029/files/5b258cb0..c8cd63ee Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5029&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5029&range=00-01 Stats: 96 lines in 5 files changed: 36 ins; 38 del; 22 mod Patch: https://git.openjdk.java.net/jdk/pull/5029.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5029/head:pull/5029 PR: https://git.openjdk.java.net/jdk/pull/5029 From mli at openjdk.java.net Fri Aug 6 14:19:32 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 6 Aug 2021 14:19:32 GMT Subject: RFR: 8270842: G1: Only young regions need to redirty outside references in remset. [v2] In-Reply-To: References: Message-ID: On Fri, 6 Aug 2021 01:37:52 GMT, Hamlin Li wrote: >> For evac failure objects in non-young regions (old) of cset, the outside reference remset already recorded in the dirty queue, we only needs to do it for obj in young regions > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > replace out-of-date comments Thanks for your review, Thomas. ------------- PR: https://git.openjdk.java.net/jdk/pull/4853 From mli at openjdk.java.net Fri Aug 6 14:19:33 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 6 Aug 2021 14:19:33 GMT Subject: Integrated: 8270842: G1: Only young regions need to redirty outside references in remset. 
In-Reply-To: References: Message-ID: On Wed, 21 Jul 2021 02:13:38 GMT, Hamlin Li wrote: > For evac failure objects in non-young regions (old) of the cset, the outside references are already recorded in the dirty queue for the remset, so we only need to do it for objects in young regions This pull request has now been integrated. Changeset: cc615208 Author: Hamlin Li URL: https://git.openjdk.java.net/jdk/commit/cc61520803513e5aab597322303145562948c9a6 Stats: 15 lines in 1 file changed: 2 ins; 5 del; 8 mod 8270842: G1: Only young regions need to redirty outside references in remset. Reviewed-by: tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/4853 From eosterlund at openjdk.java.net Fri Aug 6 14:26:31 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 6 Aug 2021 14:26:31 GMT Subject: RFR: 8267186: Add string deduplication support to ZGC [v2] In-Reply-To: References: Message-ID: On Fri, 6 Aug 2021 14:11:57 GMT, Per Liden wrote: >> This change adds support for string deduplication to ZGC. It's a pretty straightforward change, but to make reviewing even easier it is broken up into two commits: >> >> __ZGC: Introduce ZMarkContext__ >> This commit just moves the `ZMarkCache` into the new `ZMarkContext` class, and we now pass a `ZMarkContext*` around instead of a `ZMarkCache*`. The `ZMarkContext` class is a more general container for worker-local data, which in the next commit will be extended to also include the `StringDedup::Requests` queue. >> >> __8267186: Add string deduplication support to ZGC__ >> This commit adds the actual string dedup functionality and enables relevant tests. We use the `deduplication_requested` bit in the `String` object to filter out `Strings` we've already attempted to deduplicate. >> >> Testing: >> - Passes all string dedup tests. >> - Passes Tier1-7 with ZGC on Linux/x86_64 (with -XX:+UseStringDeduplication enabled by default to get better exposure).
> > Per Liden has updated the pull request incrementally with one additional commit since the last revision: > > Review comments from Erik Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5029 From mli at openjdk.java.net Fri Aug 6 14:29:35 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 6 Aug 2021 14:29:35 GMT Subject: RFR: 8270454: G1: Simplify region index comparison In-Reply-To: References: Message-ID: On Wed, 21 Jul 2021 02:11:32 GMT, Hamlin Li wrote: > This is a minor code improvement to simplify region index (unit) comparison. kindly reminder. ------------- PR: https://git.openjdk.java.net/jdk/pull/4852 From stefan.karlsson at oracle.com Fri Aug 6 15:47:16 2021 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 6 Aug 2021 17:47:16 +0200 Subject: Intentional heap bounds change? In-Reply-To: <475beb6a-8ca2-7849-bada-cb8288d78d47@redhat.com> References: <475beb6a-8ca2-7849-bada-cb8288d78d47@redhat.com> Message-ID: <2ba5e394-c705-af1b-481c-46e96ba14011@oracle.com> Hi, On 2021-08-05 16:21, Roman Kennke wrote: > Hello GC devs, > > I see a case where heap may shrink below -Xms. This may be intentional > or not. -Xms is initial heap size after all, not necessarily minimum > heap size. However, there is also documentation that -Xms *does* > provide the lower bounds for heap sizing policy, e.g. [1] The documentation around this didn't reflect the actual implementation in HotSpot. As a part of recent cleanup and introduction of the -XX:InitialHeapSize flag, we updated the documentation to reflect the way -Xms is actually working: https://docs.oracle.com/en/java/javase/16/docs/specs/man/java.html | --- -Xms| /size/ Sets the minimum and initial size (in bytes) of the heap. This value must be a multiple of 1024 and greater than 1 MB. Append the letter |k| or |K| to indicate kilobytes, |m| or |M| to indicate megabytes, |g| or |G| to indicate gigabytes. 
The following examples show how to set the size of allocated memory to 6 MB using various units: |-Xms6291456 -Xms6144k -Xms6m| Instead of the |-Xms| option to set both the minimum and initial size of the heap, you can use |-XX:MinHeapSize| to set the minimum size and |-XX:InitialHeapSize| to set the initial size." --- Concretely, the code that parses -Xms sets both MinHeapSize and InitialHeapSize: } else if (match_option(option, "-Xms", &tail)) { julong size = 0; // an initial heap size of 0 means automatically determine ArgsRange errcode = parse_memory_size(tail, &size, 0); ... if (FLAG_SET_CMDLINE(MinHeapSize, (size_t)size) != JVMFlag::SUCCESS) { return JNI_EINVAL; } if (FLAG_SET_CMDLINE(InitialHeapSize, (size_t)size) != JVMFlag::SUCCESS) { return JNI_EINVAL; } > > I believe this is due to this change: > > https://github.com/openjdk/jdk/commit/1ed5b22d6e48ffbebaa53cf79df1c5c790aa9d71#diff-74a766b0aa16c688981a4d7b20da1c93315ce72358cac7158e9b50f82c9773a3L567 > > > I.e. instead of adjusting old-gen minimum size with young-gen minimum > size, it is set to _gen_alignment. > > Is this intentional? Hopefully StefanJ will be able to answer this. The code has this comment, so maybe it is intentional? // Minimum sizes of the generations may be different than // the initial sizes. An inconsistency is permitted here // in the total size that can be specified explicitly by // command line specification of OldSize and NewSize and // also a command line specification of -Xms. Issue a warning // but allow the values to pass. Do you have a command line that reproduces this problem?
StefanK > > Thanks, > Roman > > > [1] > https://docs.oracle.com/en/java/javase/11/gctuning/factors-affecting-garbage-collection-performance.html#GUID-B0BFEFCB-F045-4105-BFA4-C97DE81DAC5B > From github.com+16811675+linade at openjdk.java.net Sat Aug 7 05:03:44 2021 From: github.com+16811675+linade at openjdk.java.net (Yude Lin) Date: Sat, 7 Aug 2021 05:03:44 GMT Subject: RFR: 8272083: G1: Record iterated range for BOT performance during card scan Message-ID: A fix to the problem in 8272083 is to use a per-worker pointer to indicate where the worker has scanned up to, similar to the _scanned_to variable. The difference is that this pointer (I call it _iterated_to) records the end of the object, while _scanned_to records the end of the scan. Since we always scan with increasing addresses, the end of the latest object scanned is also the address up to which the BOT has been fixed. So we avoid having to fix below this address when calling block_start(). This implementation reduces the number of calls to set_offset_array() during scan_heap_roots() by roughly 2-10x (in my casual test with -XX:G1ConcRefinementGreenZone=1000000). What this approach does not solve is random access to the BOT. So far I haven't found anything having this pattern. ------------- Commit messages: - 8272083: G1: Record iterated range for BOT performance during card scan.
Changes: https://git.openjdk.java.net/jdk/pull/5039/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5039&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272083 Stats: 46 lines in 5 files changed: 44 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5039.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5039/head:pull/5039 PR: https://git.openjdk.java.net/jdk/pull/5039 From stefank at openjdk.java.net Sat Aug 7 06:41:29 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Sat, 7 Aug 2021 06:41:29 GMT Subject: RFR: 8271946: Cleanup leftovers in Space and subclasses In-Reply-To: <06djYCvDj8KzCJVZ7KJqS8B9WtRBwl7hN1CMm4BZzXk=.23e2ee84-3f0f-4274-a470-6ea5b2d888cc@github.com> References: <06djYCvDj8KzCJVZ7KJqS8B9WtRBwl7hN1CMm4BZzXk=.23e2ee84-3f0f-4274-a470-6ea5b2d888cc@github.com> Message-ID: On Fri, 6 Aug 2021 12:31:01 GMT, Roman Kennke wrote: > There are a number of methods in (Compactible)Space that seem trivially implemented. E.g. obj_size(), adjust_obj_size(), scanned_block_is_obj() etc. I believe they are leftovers from CMS (not sure) but it looks like they can be replaced by obj->size() or removed altogether. Also, the related templated methods scan_and_forward(), scan_and_adjust_pointers() and scan_and_compact() can be directly and mechanically inlined into their callers. It makes the code more readable. > > Testing: > - [x] hotspot_gc > - [x] tier1 > - [ ] tier2 Marked as reviewed by stefank (Reviewer). src/hotspot/share/gc/shared/space.hpp line 319: > 317: // - scanned_block_size() > 318: // - adjust_obj_size() > 319: // - obj_size() The entire surrounding comment needs to be updated now that these functions have been removed.
------------- PR: https://git.openjdk.java.net/jdk/pull/5031 From stefank at openjdk.java.net Sat Aug 7 06:44:30 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Sat, 7 Aug 2021 06:44:30 GMT Subject: RFR: 8271939: Clean up primitive raw accessors in oopDesc [v2] In-Reply-To: References: Message-ID: On Fri, 6 Aug 2021 11:59:59 GMT, Roman Kennke wrote: >> We have some _raw accessors in oop.hpp and related code. Those are leftover from when Shenandoah required primitive barriers and are no longer needed. Their uses can be replaced with the regular accessors. >> >> Testing: >> - [x] tier1 >> - [x] tier2 >> - [x] hotspot_gc > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Change loader_data_raw() to regular loader_data() Marked as reviewed by stefank (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5017 From rkennke at openjdk.java.net Sat Aug 7 11:08:55 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Sat, 7 Aug 2021 11:08:55 GMT Subject: RFR: 8271946: Cleanup leftovers in Space and subclasses [v2] In-Reply-To: <06djYCvDj8KzCJVZ7KJqS8B9WtRBwl7hN1CMm4BZzXk=.23e2ee84-3f0f-4274-a470-6ea5b2d888cc@github.com> References: <06djYCvDj8KzCJVZ7KJqS8B9WtRBwl7hN1CMm4BZzXk=.23e2ee84-3f0f-4274-a470-6ea5b2d888cc@github.com> Message-ID: > There are a number of methods in (Compactible)Space that seem trivially implemented. E.g. obj_size(), adjust_obj_size(), scanned_block_is_obj() etc. I believe they are leftovers from CMS (not sure) but it looks like they can be replaced by obj->size() or removed altogether. Also, the related templated methods scan_and_forward(), scan_and_adjust_pointers() and scan_and_compact() can be directly and mechanically inlined into their callers. It makes the code more readable.
> > Testing: > - [x] hotspot_gc > - [x] tier1 > - [ ] tier2 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Remove obsolete comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5031/files - new: https://git.openjdk.java.net/jdk/pull/5031/files/41fe28ea..ebaa84b8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5031&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5031&range=00-01 Stats: 15 lines in 1 file changed: 0 ins; 15 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5031.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5031/head:pull/5031 PR: https://git.openjdk.java.net/jdk/pull/5031 From kbarrett at openjdk.java.net Sun Aug 8 02:21:36 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sun, 8 Aug 2021 02:21:36 GMT Subject: RFR: 8270842: G1: Only young regions need to redirty outside references in remset. [v2] In-Reply-To: References: Message-ID: On Fri, 6 Aug 2021 14:14:32 GMT, Hamlin Li wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> replace out-of-date comments > > Thanks for your review, Thomas. @Hamlin-Li two reviews are required for hotspot changes. ------------- PR: https://git.openjdk.java.net/jdk/pull/4853 From mli at openjdk.java.net Mon Aug 9 01:30:36 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Mon, 9 Aug 2021 01:30:36 GMT Subject: RFR: 8270842: G1: Only young regions need to redirty outside references in remset. [v2] In-Reply-To: References: Message-ID: On Sun, 8 Aug 2021 02:18:09 GMT, Kim Barrett wrote: >> Thanks for your review, Thomas. > > @Hamlin-Li two reviews are required for hotspot changes. @kimbarrett Sorry, I will pay more attention. Thanks for reminding. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From kbarrett at openjdk.java.net Mon Aug 9 04:38:31 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 9 Aug 2021 04:38:31 GMT Subject: RFR: 8271884: G1CH::_expand_heap_after_alloc_failure is no longer needed In-Reply-To: References: Message-ID: <8nT1dj4jgEGLG_kWJfRvwVe20Zmv3-pWRKu0IfBx8AM=.cd66254e-9cc2-454c-bdef-ca87b37456b9@github.com> On Fri, 6 Aug 2021 10:43:19 GMT, Ivan Walulya wrote: > Hi, > > Please review this change to remove _expand_heap_after_alloc_failure. Checks for old_is_full(), survivor_is_full(), and G1CollectedHeap::has_more_regions() should eliminate repeated expansion attempts. > > Testing: tier 1 Change looks good. Testing should include verifying there aren't large amounts of output when the appropriate logging is enabled and there are expansion failures. I filed the RFE based on code examination; it would be a pain if I missed something and this flag actually is still needed.
------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5030 From kbarrett at openjdk.java.net Mon Aug 9 04:41:28 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 9 Aug 2021 04:41:28 GMT Subject: RFR: 8270454: G1: Simplify region index comparison In-Reply-To: References: Message-ID: <9kBo7z9owNrW_zoDN9PuS6hoTzU-NSra5XURo8eFld4=.6797a8d3-eb9a-4397-8d30-179355f1db93@github.com> On Wed, 21 Jul 2021 02:11:32 GMT, Hamlin Li wrote: > This is a minor code improvement to simplify region index (unit) comparison. Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4852 From vishalchand2492 at gmail.com Mon Aug 9 07:08:52 2021 From: vishalchand2492 at gmail.com (Vishal Chand) Date: Mon, 9 Aug 2021 12:38:52 +0530 Subject: G1 GC : JEP proposal for increased max region size and card size for very large heaps Message-ID: Greetings! *Problem Statement *- I was investigating G1 GC performance on large heaps ( > 64g). The max. region size is 32m, which gives > 4k regions and card size is 512b (fixed). The time taken by the Post Evacuate Collection Set phase is ~10 percent of the total GC time, while the same phase takes ~5 percent for smaller heaps. I experimented with larger region sizes (64m and 128m) and a larger card size (1024b) to check the impact. *Experiment *- SpecJbb2015 with G1 GC. With 8 groups and 119g memory per backend. I changed the max. configurable region size to 128m and card size to 1024b (requires code changes in the hotspot). The results are shown in the table below.

G1GC                                32m 512b             64m 512b   128m 512b   32m 1024b   64m 1024b          128m 1024b
Heap Size per Backend               119g                 119g       119g        119g        119g               119g
max-jOPS                            273028               273028     274743      273028      273028             273028
critical-jOPS                       *211701 (Baseline)*  212266     212670      212771      *217150 (+2.5%)*   216494
Avg. GC time (ms)                   142.03               136.75     139.04      135.64      133.37             128.62
GC Phases
Pre Evacuate Collection Set (ms)    1.92                 1.381      1.13        1.92        1.32               1.01
Merge Heap Roots (ms)               3.16                 2.56       2.99        2.59        2.02               1.57
Evacuate Collection Set (ms)        124.59               119.6      124.85      125.16      121.49             118.46
Post Evacuate Collection Set (ms)   13.09 (10%)          12.51      11.79       7.71        6.68 (5%)          6.73
Other (ms)                          1.22                 1.15       1.14        1.5         1.14               1.08

*Conclusion *- Critical-jOPS improved by 2.5% by using 64m regions and 1024b cards. The time taken by the Post Evacuate Collection Set phase is reduced from 10% of total GC time to 5% of total GC time. *Remarks *- I'm proposing to have a tunable for larger cards and larger regions for very large heaps (>64g). I would like to contribute to these changes. I've done the POC and am looking for someone to create a JEP for this enhancement and assign me the JEP. Thanks and Regards, Vishal Chand From mli at openjdk.java.net Mon Aug 9 07:27:38 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Mon, 9 Aug 2021 07:27:38 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space [v3] In-Reply-To: References: Message-ID: On Fri, 6 Aug 2021 02:15:55 GMT, Hamlin Li wrote: >> Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. >> >> After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. > > Hamlin Li has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Thanks for the testing, and feedback.
setting in run_multi.sh of specjbb2015: GROUP_COUNT=4 TI_JVM_COUNT=1 JAVA_OPTS_BE="-server -XX:+UseG1GC -Xmx50g -Xms50g -XX:ParallelGCThreads=25" my test env is as below: $ uname -a Linux jvm-97 4.19.36-vhulk1907.1.0.h702.eulerosv2r8.aarch64 #1 SMP Mon Mar 16 00:02:15 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux $ lscpu Architecture: aarch64 Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Thread(s) per core: 1 Core(s) per socket: 64 Socket(s): 2 NUMA node(s): 4 Vendor ID: 0x48 Model: 0 Stepping: 0x1 BogoMIPS: 200.00 L1d cache: 64K L1i cache: 64K L2 cache: 512K L3 cache: 65536K NUMA node0 CPU(s): 0-31 NUMA node1 CPU(s): 32-63 NUMA node2 CPU(s): 64-95 NUMA node3 CPU(s): 96-127 ------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From mli at openjdk.java.net Mon Aug 9 07:30:33 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Mon, 9 Aug 2021 07:30:33 GMT Subject: RFR: 8270454: G1: Simplify region index comparison In-Reply-To: References: Message-ID: On Wed, 21 Jul 2021 02:11:32 GMT, Hamlin Li wrote: > This is a minor code improvement to simplify region index (unit) comparison. I think I still needs another review before push this patch. ------------- PR: https://git.openjdk.java.net/jdk/pull/4852 From ayang at openjdk.java.net Mon Aug 9 07:57:29 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 9 Aug 2021 07:57:29 GMT Subject: RFR: 8270454: G1: Simplify region index comparison In-Reply-To: References: Message-ID: <_6oiSzOLk1rjHqBYXUi4-nRicuxGcWmWYvsQLjEpuvk=.9f4129f1-e9e0-4553-a54a-b15819e6193e@github.com> On Wed, 21 Jul 2021 02:11:32 GMT, Hamlin Li wrote: > This is a minor code improvement to simplify region index (unit) comparison. Marked as reviewed by ayang (Committer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/4852 From github.com+39413832+weixlu at openjdk.java.net Mon Aug 9 08:28:49 2021 From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu) Date: Mon, 9 Aug 2021 08:28:49 GMT Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing Message-ID: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> ZGC utilizes self-healing in load barrier to fix bad references. Currently, this fixing (ZBarrier::self_heal) adopts memory_order_conservative to guarantee that (1) the slow path (relocate, mark, etc., where addresses get healed) always happens before self healing, and (2) the other thread that accesses the same reference is able to access the healed address. Let us consider memory_order_release for ZBarrier::self_heal. For example, Thread 1 is fixing a reference, and Thread 2 attempts to access the same reference. There exists a data dependency in T2 where the access of the pointer happens before the access of the object's content, equaling acquire semantics. Pairing with the release semantic in self healing, this makes up inter-thread acquire-release memory ordering. As a result, the two statements we mentioned above can be guaranteed by the acquire-release ordering. We performed an experiment on benchmark corretto/heapothesys on AArch64. The optimized version results in both (1) shorter average concurrent mark time and (2) shorter average concurrent relocation time. Furthermore, we notice shorter average latency in almost all testcases.
[root at localhost corretto]# grep "00.*Phase: Concurrent Mark " *.log baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Mark 960.359 / 960.359 587.203 / 1248.362 587.203 / 1248.362 587.203 / 1248.362 ms baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Mark 116.748 / 116.748 656.777 / 1736.469 656.777 / 1736.469 656.777 / 1736.469 ms baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Mark 125.460 / 125.460 620.440 / 1736.469 620.440 / 1736.469 620.440 / 1736.469 ms baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Mark 935.295 / 935.295 673.080 / 1736.469 673.080 / 1736.469 673.080 / 1736.469 ms baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Mark 1448.705 / 1448.705 723.484 / 1814.849 723.484 / 1814.849 723.484 / 1814.849 ms baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Mark 1490.123 / 1490.123 796.960 / 1842.794 796.960 / 1842.794 796.960 / 1842.794 ms baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 912.439 / 2183.065 867.799 / 2183.065 867.799 / 2183.065 ms baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Mark 1468.594 / 1468.594 990.044 / 2183.065 912.281 / 2183.065 912.281 / 2183.065 ms baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Mark 137.435 / 137.435 1109.116 / 2276.535 967.470 / 2276.535 967.470 / 2276.535 ms baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Mark 184.093 / 184.093 1172.446 / 2276.535 997.343 / 2276.535 997.343 / 2276.535 ms baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Mark 1537.673 / 1537.673 1211.815 / 2276.535 1013.076 / 2276.535 1013.076 / 2276.535 ms baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1218.085 / 2276.535 1025.443 / 2276.535 1025.443 / 2276.535 ms optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Mark 1053.065 / 1053.065 581.646 / 1249.822 581.646 / 1249.822 581.646 / 1249.822 ms optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Mark 
885.795 / 885.795 573.650 / 1277.782 573.650 / 1277.782 573.650 / 1277.782 ms optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Mark 124.236 / 124.236 641.028 / 1828.124 641.028 / 1828.124 641.028 / 1828.124 ms optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Mark 875.383 / 875.383 666.465 / 1828.124 666.465 / 1828.124 666.465 / 1828.124 ms optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Mark 1937.305 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 ms optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Mark 173.064 / 173.064 771.387 / 1937.305 771.387 / 1937.305 771.387 / 1937.305 ms optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Mark 1832.584 / 1832.584 899.646 / 2048.471 856.838 / 2048.471 856.838 / 2048.471 ms optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Mark 1510.755 / 1510.755 981.807 / 2048.471 893.373 / 2048.471 893.373 / 2048.471 ms optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Mark 1472.737 / 1472.737 1044.755 / 2089.539 927.733 / 2089.539 927.733 / 2089.539 ms optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Mark 1513.077 / 1513.077 1095.827 / 2089.539 947.202 / 2089.539 947.202 / 2089.539 ms optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1073.703 / 2089.539 943.684 / 2089.539 943.684 / 2089.539 ms optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Mark 1337.865 / 1337.865 1119.936 / 2113.895 962.172 / 2113.895 962.172 / 2113.895 ms [root at localhost corretto]# grep "00.*Phase: Concurrent Relocate " *.log baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Relocate 196.522 / 196.522 114.318 / 245.371 114.318 / 245.371 114.318 / 245.371 ms baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Relocate 47.748 / 47.748 130.861 / 331.948 130.861 / 331.948 130.861 / 331.948 ms baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Relocate 56.922 / 56.922 129.174 / 331.948 129.174 / 
331.948 129.174 / 331.948 ms baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Relocate 218.707 / 218.707 137.495 / 331.948 137.495 / 331.948 137.495 / 331.948 ms baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Relocate 197.166 / 197.166 144.216 / 359.644 144.216 / 359.644 144.216 / 359.644 ms baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Relocate 202.118 / 202.118 153.507 / 373.447 153.507 / 373.447 153.507 / 373.447 ms baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 172.241 / 395.113 164.291 / 395.113 164.291 / 395.113 ms baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Relocate 215.121 / 215.121 186.007 / 421.039 173.139 / 421.039 173.139 / 421.039 ms baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Relocate 48.550 / 48.550 203.420 / 421.982 181.899 / 421.982 181.899 / 421.982 ms baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Relocate 53.847 / 53.847 211.774 / 421.982 185.728 / 421.982 185.728 / 421.982 ms baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Relocate 224.489 / 224.489 218.195 / 431.088 188.087 / 431.088 188.087 / 431.088 ms baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 222.852 / 431.088 191.130 / 431.088 191.130 / 431.088 ms optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Relocate 193.811 / 193.811 113.043 / 248.471 113.043 / 248.471 113.043 / 248.471 ms optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Relocate 196.220 / 196.220 117.810 / 248.471 117.810 / 248.471 117.810 / 248.471 ms optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Relocate 48.786 / 48.786 131.753 / 351.890 131.753 / 351.890 131.753 / 351.890 ms optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Relocate 195.302 / 195.302 139.115 / 351.890 139.115 / 351.890 139.115 / 351.890 ms optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Relocate 374.022 / 374.022 155.204 / 374.022 155.204 / 
374.022 155.204 / 374.022 ms optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Relocate 49.222 / 49.222 159.444 / 400.795 159.444 / 400.795 159.444 / 400.795 ms optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Relocate 381.072 / 381.072 182.488 / 409.086 173.140 / 409.086 173.140 / 409.086 ms optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Relocate 223.399 / 223.399 191.774 / 409.086 175.748 / 409.086 175.748 / 409.086 ms optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Relocate 214.184 / 214.184 201.526 / 409.086 181.302 / 409.086 181.302 / 409.086 ms optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Relocate 208.600 / 208.600 207.389 / 420.479 183.756 / 420.479 183.756 / 420.479 ms optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Relocate 209.444 / 209.444 202.367 / 420.479 183.173 / 420.479 183.173 / 420.479 ms optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Relocate 223.841 / 223.841 206.268 / 420.479 185.074 / 420.479 185.074 / 420.479 ms [root at localhost corretto]# grep "average latency:" nohup_baseline.out average latency: 2ms:40us average latency: 6ms:550us average latency: 6ms:543us average latency: 6ms:493us average latency: 928us average latency: 794us average latency: 1ms:403us average latency: 23ms:216us average latency: 775us [root at localhost corretto]# grep "average latency:" nohup_optimized.out average latency: 2ms:48us average latency: 5ms:948us average latency: 5ms:940us average latency: 5ms:875us average latency: 850us average latency: 723us average latency: 1ms:221us average latency: 22ms:653us average latency: 693us ------------- Commit messages: - 8272138: ZGC: Adopt release ordering for self-healing Changes: https://git.openjdk.java.net/jdk/pull/5046/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5046&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272138 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: 
https://git.openjdk.java.net/jdk/pull/5046.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5046/head:pull/5046 PR: https://git.openjdk.java.net/jdk/pull/5046 From tschatzl at openjdk.java.net Mon Aug 9 09:01:35 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 9 Aug 2021 09:01:35 GMT Subject: RFR: 8271939: Clean up primitive raw accessors in oopDesc [v2] In-Reply-To: References: Message-ID: On Fri, 6 Aug 2021 11:59:59 GMT, Roman Kennke wrote: >> We have some _raw accessors in oop.hpp and related code. Those are leftover from when Shenandoah required primitive barriers and are no longer needed. Their uses can be replaced with the regular accessors. >> >> Testing: >> - [x] tier1 >> - [x] tier2 >> - [x] hotspot_gc > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Change loader_data_raw() to regular loader_data() Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5017 From eosterlund at openjdk.java.net Mon Aug 9 09:06:34 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 9 Aug 2021 09:06:34 GMT Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing In-Reply-To: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: On Mon, 9 Aug 2021 08:19:35 GMT, Xiaowei Lu wrote: > ZGC utilizes self-healing in load barrier to fix bad references. Currently, this fixing (ZBarrier::self_heal) adopts memory_order_conservative to guarantee that (1) the slow path (relocate, mark, etc., where addresses get healed) always happens before self healing, and (2) the other thread that accesses the same reference is able to access the healed address. > Let us consider memory_order_release for ZBarrier::self_heal. 
For example, Thread 1 is fixing a reference, and Thread 2 attempts to access the same reference. There exists a data dependency in T2 where the access of the pointer happens before the access of the object's content, equaling acquire semantics. Pairing with the release semantic in self healing, this makes up inter-thread acquire-release memory ordering. As a result, the two statements we mentioned above can be guaranteed by the acquire-release ordering. > We performed an experiment on benchmark corretto/heapothesys on AArch64. The optimized version results in both (1) shorter average concurrent mark time and (2) shorter average concurrent relocation time. Furthermore, we notice shorter average latency in almost all testcases. > > > [root at localhost corretto]# grep "00.*Phase: Concurrent Mark " *.log > baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Mark 960.359 / 960.359 587.203 / 1248.362 587.203 / 1248.362 587.203 / 1248.362 ms > baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Mark 116.748 / 116.748 656.777 / 1736.469 656.777 / 1736.469 656.777 / 1736.469 ms > baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Mark 125.460 / 125.460 620.440 / 1736.469 620.440 / 1736.469 620.440 / 1736.469 ms > baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Mark 935.295 / 935.295 673.080 / 1736.469 673.080 / 1736.469 673.080 / 1736.469 ms > baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Mark 1448.705 / 1448.705 723.484 / 1814.849 723.484 / 1814.849 723.484 / 1814.849 ms > baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Mark 1490.123 / 1490.123 796.960 / 1842.794 796.960 / 1842.794 796.960 / 1842.794 ms > baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 912.439 / 2183.065 867.799 / 2183.065 867.799 / 2183.065 ms > baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Mark 1468.594 / 1468.594 990.044 / 2183.065 912.281 / 2183.065 912.281 / 2183.065 ms > baseline.log:[900.411s][info][gc,stats ] Phase:
Concurrent Mark 137.435 / 137.435 1109.116 / 2276.535 967.470 / 2276.535 967.470 / 2276.535 ms > baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Mark 184.093 / 184.093 1172.446 / 2276.535 997.343 / 2276.535 997.343 / 2276.535 ms > baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Mark 1537.673 / 1537.673 1211.815 / 2276.535 1013.076 / 2276.535 1013.076 / 2276.535 ms > baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1218.085 / 2276.535 1025.443 / 2276.535 1025.443 / 2276.535 ms > optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Mark 1053.065 / 1053.065 581.646 / 1249.822 581.646 / 1249.822 581.646 / 1249.822 ms > optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Mark 885.795 / 885.795 573.650 / 1277.782 573.650 / 1277.782 573.650 / 1277.782 ms > optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Mark 124.236 / 124.236 641.028 / 1828.124 641.028 / 1828.124 641.028 / 1828.124 ms > optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Mark 875.383 / 875.383 666.465 / 1828.124 666.465 / 1828.124 666.465 / 1828.124 ms > optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Mark 1937.305 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 ms > optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Mark 173.064 / 173.064 771.387 / 1937.305 771.387 / 1937.305 771.387 / 1937.305 ms > optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Mark 1832.584 / 1832.584 899.646 / 2048.471 856.838 / 2048.471 856.838 / 2048.471 ms > optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Mark 1510.755 / 1510.755 981.807 / 2048.471 893.373 / 2048.471 893.373 / 2048.471 ms > optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Mark 1472.737 / 1472.737 1044.755 / 2089.539 927.733 / 2089.539 927.733 / 2089.539 ms > optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Mark 1513.077 / 1513.077 1095.827 / 2089.539 947.202 / 2089.539 947.202 / 2089.539 ms 
> optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1073.703 / 2089.539 943.684 / 2089.539 943.684 / 2089.539 ms > optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Mark 1337.865 / 1337.865 1119.936 / 2113.895 962.172 / 2113.895 962.172 / 2113.895 ms > > [root at localhost corretto]# grep "00.*Phase: Concurrent Relocate " *.log > baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Relocate 196.522 / 196.522 114.318 / 245.371 114.318 / 245.371 114.318 / 245.371 ms > baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Relocate 47.748 / 47.748 130.861 / 331.948 130.861 / 331.948 130.861 / 331.948 ms > baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Relocate 56.922 / 56.922 129.174 / 331.948 129.174 / 331.948 129.174 / 331.948 ms > baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Relocate 218.707 / 218.707 137.495 / 331.948 137.495 / 331.948 137.495 / 331.948 ms > baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Relocate 197.166 / 197.166 144.216 / 359.644 144.216 / 359.644 144.216 / 359.644 ms > baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Relocate 202.118 / 202.118 153.507 / 373.447 153.507 / 373.447 153.507 / 373.447 ms > baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 172.241 / 395.113 164.291 / 395.113 164.291 / 395.113 ms > baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Relocate 215.121 / 215.121 186.007 / 421.039 173.139 / 421.039 173.139 / 421.039 ms > baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Relocate 48.550 / 48.550 203.420 / 421.982 181.899 / 421.982 181.899 / 421.982 ms > baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Relocate 53.847 / 53.847 211.774 / 421.982 185.728 / 421.982 185.728 / 421.982 ms > baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Relocate 224.489 / 224.489 218.195 / 431.088 188.087 / 431.088 188.087 / 431.088 ms > baseline.log:[1200.412s][info][gc,stats ] 
Phase: Concurrent Relocate 0.000 / 0.000 222.852 / 431.088 191.130 / 431.088 191.130 / 431.088 ms > optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Relocate 193.811 / 193.811 113.043 / 248.471 113.043 / 248.471 113.043 / 248.471 ms > optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Relocate 196.220 / 196.220 117.810 / 248.471 117.810 / 248.471 117.810 / 248.471 ms > optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Relocate 48.786 / 48.786 131.753 / 351.890 131.753 / 351.890 131.753 / 351.890 ms > optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Relocate 195.302 / 195.302 139.115 / 351.890 139.115 / 351.890 139.115 / 351.890 ms > optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Relocate 374.022 / 374.022 155.204 / 374.022 155.204 / 374.022 155.204 / 374.022 ms > optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Relocate 49.222 / 49.222 159.444 / 400.795 159.444 / 400.795 159.444 / 400.795 ms > optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Relocate 381.072 / 381.072 182.488 / 409.086 173.140 / 409.086 173.140 / 409.086 ms > optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Relocate 223.399 / 223.399 191.774 / 409.086 175.748 / 409.086 175.748 / 409.086 ms > optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Relocate 214.184 / 214.184 201.526 / 409.086 181.302 / 409.086 181.302 / 409.086 ms > optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Relocate 208.600 / 208.600 207.389 / 420.479 183.756 / 420.479 183.756 / 420.479 ms > optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Relocate 209.444 / 209.444 202.367 / 420.479 183.173 / 420.479 183.173 / 420.479 ms > optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Relocate 223.841 / 223.841 206.268 / 420.479 185.074 / 420.479 185.074 / 420.479 ms > > [root at localhost corretto]# grep "average latency:" nohup_baseline.out > average latency: 2ms:40us > average latency: 6ms:550us > average 
latency: 6ms:543us > average latency: 6ms:493us > average latency: 928us > average latency: 794us > average latency: 1ms:403us > average latency: 23ms:216us > average latency: 775us > [root at localhost corretto]# grep "average latency:" nohup_optimized.out > average latency: 2ms:48us > average latency: 5ms:948us > average latency: 5ms:940us > average latency: 5ms:875us > average latency: 850us > average latency: 723us > average latency: 1ms:221us > average latency: 22ms:653us > average latency: 693us The ordering for relocation, not to expose bytes from before a remote thread copy, is already handled appropriately in the forwarding table itself. Marking does not need to happen before self-healing. In other words, the release serves no purpose, as far as I can see. At least not in the jdk-jdk repository. There is a loom dependency in self-heal ordering, but I volunteer to fix that before their next rebase. So I would suggest having relaxed ordering instead, for the CAS. I wonder if that gives you more performance boost. ------------- Changes requested by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5046 From tschatzl at openjdk.java.net Mon Aug 9 09:33:06 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 9 Aug 2021 09:33:06 GMT Subject: RFR: 8272093: Extract evacuation failure injection from G1CollectedHeap Message-ID: Hi all, can I have reviews for this change that extracts out the evacuation failure injection mechanism from `G1CollectedHeap` into its own class? This reduces the amount of code in `G1CollectedHeap` a bit, moving somewhat uninteresting but somewhat important (testing) code out from there. Also documented it a bit. The other reason was that imho it stuck out a bit due to the ifdef's in `G1CollectedHeap`. I checked that gcc does compile out the call to `evacuation_should_fail()` in `G1ParScanThreadState` in product mode. 
Testing: tier1, local testing Thanks, Thomas ------------- Commit messages: - Fix initialization order issue with injector - Extract evac failure injector from G1CollectedHeap Changes: https://git.openjdk.java.net/jdk/pull/5047/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5047&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272093 Stats: 329 lines in 7 files changed: 212 ins; 114 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5047.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5047/head:pull/5047 PR: https://git.openjdk.java.net/jdk/pull/5047 From tschatzl at openjdk.java.net Mon Aug 9 09:50:39 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 9 Aug 2021 09:50:39 GMT Subject: RFR: 8270454: G1: Simplify region index comparison In-Reply-To: References: Message-ID: On Wed, 21 Jul 2021 02:11:32 GMT, Hamlin Li wrote: > This is a minor code improvement to simplify region index (unit) comparison. Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4852 From tschatzl at openjdk.java.net Mon Aug 9 09:49:31 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 9 Aug 2021 09:49:31 GMT Subject: RFR: 8271951: Consolidate preserved marks overflow stack in SerialGC In-Reply-To: <_FpPV4VH0_Iyh0qyN6bJ6wyVISaoy5chgr3ARPlPIKU=.8eafc90e-dfc7-44fc-9316-6bb626e8e8f8@github.com> References: <_FpPV4VH0_Iyh0qyN6bJ6wyVISaoy5chgr3ARPlPIKU=.8eafc90e-dfc7-44fc-9316-6bb626e8e8f8@github.com> Message-ID: On Thu, 5 Aug 2021 14:29:09 GMT, Roman Kennke wrote: > In SerialGC we put preserved marks in space that we may have in the young gen, and failing that, we push it to an overflow stack. The overflow stack is actually two stacks, and replicate behavior of the in-place preserved marks. These can be consolidated. > > Testing: > - [x] tier1 > - [ ] tier2 > - [x] hotspot_gc Lgtm. 
Pre-existing, and just a side note and not directly aimed at you: when looking at the change, there is the `PreservedMark` class in Serial GC, and `PreservedMarks` with its `OopAndMarkWord` inner class and `PreservedMarksSet` in `shared/preservedMarks.hpp` which is not perfect naming. I'll probably file an RFE to look into this. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5022 From tschatzl at openjdk.java.net Mon Aug 9 10:27:42 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 9 Aug 2021 10:27:42 GMT Subject: RFR: 8271880: Tighten condition for excluding regions from collecting cards with cross-references Message-ID: Hi, can I have reviews for this change that by tightening the condition for excluding regions from collecting cards with cross-references allows us to avoid the rescanning of objects that failed evacuation in the fix up self-forwards phase after evacuation failure. I.e. during gc g1 never collects cards/references from the young gen (including eden) for later refinement - which means that we need to rescan all live objects remaining in eden regions for cross-references. The problem or the reason why we did that was that we did not want to add cards to refine from survivor regions (i.e. next gc's young gen) because we just don't need to as we always collect young gen, so references from there need not be recorded in the remembered sets (and actually, if we did, we erroneously marked cards in young gen which later card table verification will not like) - but we did not have that information on hand anywhere already quickly accessible. This change solves that problem by actually recording this information in the region attribute table as "NewSurvivor" type region. "NewSurvivor" because I did want to make explicit that these are the survivor regions from the *new* (next) young generation (i.e.
just survivor) and not the survivor regions of the previous gc (that were turned eden at the start of this gc) but something like "NewYoung" or so would be fine with me as well. Another interesting addition is probably the new assert in `G1ParScanThreadState::enqueue_card_if_tracked` assert(!_g1h->heap_region_containing(o)->in_collection_set(), "Should not try to enqueue reference into collection set region"); This, at this time, verifies the assumption that g1 is not trying to collect references *to* the collection set, i.e. other objects that failed evacuation - after all we later relabel their regions as old *without* a remembered set; we would do otherwise unnecessarily because the reason is that (currently) cset tracking for these regions is enabled (at least during gc - we only later relabel and drop the remembered sets). This might change later if we want to move evacuation failed regions into survivor (or keep their remembered sets for some reason), but for now we filter attempts to add cards in the dcqs for those this way. Testing: tier1-5, gc/g1 with `JAVA_OPTIONS_=-XX:+G1EvacuationFailureALot -XX:+VerifyAfterGC`.
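[Editorial note: the "NewSurvivor" filtering described above can be sketched roughly as follows. All names here are invented for illustration and do not match the actual G1 sources; the point is only that a reference needs a remembered-set card solely when it points into a tracked old region.]

```cpp
// Hypothetical sketch of the region-attribute idea: tag next-GC survivor
// regions explicitly so the card-enqueue path can cheaply skip them.
enum class RegionAttr { TrackedOld, NewSurvivor, InCSet };

bool should_enqueue_card(RegionAttr target) {
  // NewSurvivor regions are the *next* GC's young gen and are always
  // collected, so cards for references into them are never needed;
  // references into the collection set should not be enqueued at all
  // (this is what the quoted assert checks).
  return target == RegionAttr::TrackedOld;
}
```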
Thanks, Thomas ------------- Commit messages: - Some random changes - No scan during self forward Changes: https://git.openjdk.java.net/jdk/pull/5037/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5037&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271880 Stats: 152 lines in 12 files changed: 45 ins; 70 del; 37 mod Patch: https://git.openjdk.java.net/jdk/pull/5037.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5037/head:pull/5037 PR: https://git.openjdk.java.net/jdk/pull/5037 From rkennke at openjdk.java.net Mon Aug 9 10:34:37 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 9 Aug 2021 10:34:37 GMT Subject: RFR: 8271951: Consolidate preserved marks overflow stack in SerialGC In-Reply-To: References: <_FpPV4VH0_Iyh0qyN6bJ6wyVISaoy5chgr3ARPlPIKU=.8eafc90e-dfc7-44fc-9316-6bb626e8e8f8@github.com> Message-ID: On Mon, 9 Aug 2021 09:46:49 GMT, Thomas Schatzl wrote: > Lgtm. > > Pre-existing, and just a side note and not directly aimed at you: when looking at the change, there is the `PreservedMark` class in Serial GC, and `PreservedMarks` with its `OopAndMarkWord` inner class and `PreservedMarksSet` in `shared/preservedMarks.hpp` which is not perfect naming. I'll probably file an RFE to look into this. We might even think about using PreservedMarksSet in Serial GC, if we can allocate it in free parts of the heap. Not sure how feasible that would be though... ------------- PR: https://git.openjdk.java.net/jdk/pull/5022 From rkennke at openjdk.java.net Mon Aug 9 10:35:35 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 9 Aug 2021 10:35:35 GMT Subject: Integrated: 8271939: Clean up primitive raw accessors in oopDesc In-Reply-To: References: Message-ID: On Thu, 5 Aug 2021 11:20:58 GMT, Roman Kennke wrote: > We have some _raw accessors in oop.hpp and related code. Those are leftover from when Shenandoah required primitive barriers and are no longer needed. Their uses can be replaced with the regular accessors. 
> > Testing: > - [x] tier1 > - [x] tier2 > - [x] hotspot_gc This pull request has now been integrated. Changeset: a86ac0d1 Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/a86ac0d1e3a6f02e587362c767abdf62b308d321 Stats: 43 lines in 10 files changed: 0 ins; 30 del; 13 mod 8271939: Clean up primitive raw accessors in oopDesc Reviewed-by: stefank, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/5017 From tschatzl at openjdk.java.net Mon Aug 9 11:37:36 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 9 Aug 2021 11:37:36 GMT Subject: RFR: 8272083: G1: Record iterated range for BOT performance during card scan In-Reply-To: References: Message-ID: On Sat, 7 Aug 2021 04:56:08 GMT, Yude Lin wrote: > A fix to the problem in 8272083 is to use a per-worker pointer to indicate where the worker has scanned up to, similar to the _scanned_to variable. The difference is this pointer (I call it _iterated_to) records the end of the object and _scanned_to records the end of the scan. Since we always scan with increasing addresses, the end of the latest object scanned is also the address where BOT has fixed up to. So we avoid having to fix below this address when calling block_start(). This implementation approximately reduce the number of calls to set_offset_array() during scan_heap_roots() 2-10 times (in my casual test with -XX:G1ConcRefinementGreenZone=1000000). > > What this approach not solving is random access to BOT. So far I haven't found anything having this pattern. Just one initial thought: did you consider distributing the cards to scan from higher addresses to lower ones? Maybe this will remove some of this excess work - i.e. when the other threads will start helping for a given region, the BOT has already been fixed up for them. Another consideration could be fixing the BOT concurrently up to the highest marked card. 
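[Editorial note: the per-worker `_iterated_to` idea quoted above can be modeled with a toy sketch. This is hypothetical code, not the real G1 block offset table; `ToyCardScanner` and its members are invented names that only mirror the described bookkeeping.]

```cpp
#include <cstddef>
#include <cstdint>

// A worker scans with strictly increasing addresses and remembers the end
// of the last object it walked; BOT entries below that watermark are
// already exact, so block_start() queries there need no fix-up.
struct ToyCardScanner {
  std::uintptr_t iterated_to = 0;  // end of the last object scanned
  std::size_t fixups = 0;          // simulated set_offset_array() calls

  void note_object_scanned(std::uintptr_t obj_end) {
    if (obj_end > iterated_to) {
      iterated_to = obj_end;
    }
  }

  // Returns whether answering block_start(addr) required a BOT fix-up.
  bool block_start_needed_fixup(std::uintptr_t addr) {
    if (addr < iterated_to) {
      return false;  // below the watermark: plain lookup only
    }
    ++fixups;        // at/above it: simulate walking and fixing BOT entries
    return true;
  }
};
```

Queries that land below the watermark skip the fix-up entirely, which is where the reported 2-10x reduction in `set_offset_array()` calls would come from.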
------------- PR: https://git.openjdk.java.net/jdk/pull/5039 From ayang at openjdk.java.net Mon Aug 9 11:51:34 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 9 Aug 2021 11:51:34 GMT Subject: RFR: 8272093: Extract evacuation failure injection from G1CollectedHeap In-Reply-To: References: Message-ID: On Mon, 9 Aug 2021 09:25:30 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that extracts out the evacuation failure injection mechanism from `G1CollectedHeap` into its own class? > This reduces the amount of code in `G1CollectedHeap` a bit, moving somewhat uninteresting but somewhat important (testing) code out from there. Also documented it a bit. The other reason was that imho it stuck out a bit due to the ifdef's in `G1CollectedHeap`. > > I checked that gcc does compile out the call to `evacuation_should_fail()` in `G1ParScanThreadState` in product mode. > > Testing: tier1, local testing > > Thanks, > Thomas Marked as reviewed by ayang (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5047 From github.com+39413832+weixlu at openjdk.java.net Mon Aug 9 11:56:31 2021 From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu) Date: Mon, 9 Aug 2021 11:56:31 GMT Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing In-Reply-To: References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: On Mon, 9 Aug 2021 09:04:00 GMT, Erik Österlund wrote:
For example, Thread 1 is fixing a reference, and Thread 2 attempts to access the same reference. There exists data dependency in T2 where access of pointer happens before access of object's content, equaling acquire semantic. Pairing with the release semantic in self healing, this makes up inter-thread acquire-release memory ordering. As a result, the two statements we mentioned above can be guaranteed by the acquire-release ordering. >> We performed an experiment on benchmark corretto/heapothesys on AArch64. The optimized version results in both (1) shorter average concurrent mark time and (2) shorter average concurrent relocation time. Furthermore, we notice shorter average latency in almost all testcases. >> >> >> [root at localhost corretto]# grep "00.*Phase: Concurrent Mark " *.log >> baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Mark 960.359 / 960.359 587.203 / 1248.362 587.203 / 1248.362 587.203 / 1248.362 ms >> baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Mark 116.748 / 116.748 656.777 / 1736.469 656.777 / 1736.469 656.777 / 1736.469 ms >> baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Mark 125.460 / 125.460 620.440 / 1736.469 620.440 / 1736.469 620.440 / 1736.469 ms >> baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Mark 935.295 / 935.295 673.080 / 1736.469 673.080 / 1736.469 673.080 / 1736.469 ms >> baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Mark 1448.705 / 1448.705 723.484 / 1814.849 723.484 / 1814.849 723.484 / 1814.849 ms >> baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Mark 1490.123 / 1490.123 796.960 / 1842.794 796.960 / 1842.794 796.960 / 1842.794 ms >> baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 912.439 / 2183.065 867.799 / 2183.065 867.799 / 2183.065 ms >> baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Mark 1468.594 / 1468.594 990.044 / 2183.065 912.281 / 2183.065 912.281 / 2183.065 ms >>
baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Mark 137.435 / 137.435 1109.116 / 2276.535 967.470 / 2276.535 967.470 / 2276.535 ms >> baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Mark 184.093 / 184.093 1172.446 / 2276.535 997.343 / 2276.535 997.343 / 2276.535 ms >> baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Mark 1537.673 / 1537.673 1211.815 / 2276.535 1013.076 / 2276.535 1013.076 / 2276.535 ms >> baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1218.085 / 2276.535 1025.443 / 2276.535 1025.443 / 2276.535 ms >> optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Mark 1053.065 / 1053.065 581.646 / 1249.822 581.646 / 1249.822 581.646 / 1249.822 ms >> optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Mark 885.795 / 885.795 573.650 / 1277.782 573.650 / 1277.782 573.650 / 1277.782 ms >> optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Mark 124.236 / 124.236 641.028 / 1828.124 641.028 / 1828.124 641.028 / 1828.124 ms >> optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Mark 875.383 / 875.383 666.465 / 1828.124 666.465 / 1828.124 666.465 / 1828.124 ms >> optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Mark 1937.305 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 ms >> optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Mark 173.064 / 173.064 771.387 / 1937.305 771.387 / 1937.305 771.387 / 1937.305 ms >> optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Mark 1832.584 / 1832.584 899.646 / 2048.471 856.838 / 2048.471 856.838 / 2048.471 ms >> optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Mark 1510.755 / 1510.755 981.807 / 2048.471 893.373 / 2048.471 893.373 / 2048.471 ms >> optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Mark 1472.737 / 1472.737 1044.755 / 2089.539 927.733 / 2089.539 927.733 / 2089.539 ms >> optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Mark 1513.077 / 1513.077 
1095.827 / 2089.539 947.202 / 2089.539 947.202 / 2089.539 ms >> optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1073.703 / 2089.539 943.684 / 2089.539 943.684 / 2089.539 ms >> optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Mark 1337.865 / 1337.865 1119.936 / 2113.895 962.172 / 2113.895 962.172 / 2113.895 ms >> >> [root at localhost corretto]# grep "00.*Phase: Concurrent Relocate " *.log >> baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Relocate 196.522 / 196.522 114.318 / 245.371 114.318 / 245.371 114.318 / 245.371 ms >> baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Relocate 47.748 / 47.748 130.861 / 331.948 130.861 / 331.948 130.861 / 331.948 ms >> baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Relocate 56.922 / 56.922 129.174 / 331.948 129.174 / 331.948 129.174 / 331.948 ms >> baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Relocate 218.707 / 218.707 137.495 / 331.948 137.495 / 331.948 137.495 / 331.948 ms >> baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Relocate 197.166 / 197.166 144.216 / 359.644 144.216 / 359.644 144.216 / 359.644 ms >> baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Relocate 202.118 / 202.118 153.507 / 373.447 153.507 / 373.447 153.507 / 373.447 ms >> baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 172.241 / 395.113 164.291 / 395.113 164.291 / 395.113 ms >> baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Relocate 215.121 / 215.121 186.007 / 421.039 173.139 / 421.039 173.139 / 421.039 ms >> baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Relocate 48.550 / 48.550 203.420 / 421.982 181.899 / 421.982 181.899 / 421.982 ms >> baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Relocate 53.847 / 53.847 211.774 / 421.982 185.728 / 421.982 185.728 / 421.982 ms >> baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Relocate 224.489 / 224.489 218.195 / 431.088 
188.087 / 431.088 188.087 / 431.088 ms >> baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 222.852 / 431.088 191.130 / 431.088 191.130 / 431.088 ms >> optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Relocate 193.811 / 193.811 113.043 / 248.471 113.043 / 248.471 113.043 / 248.471 ms >> optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Relocate 196.220 / 196.220 117.810 / 248.471 117.810 / 248.471 117.810 / 248.471 ms >> optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Relocate 48.786 / 48.786 131.753 / 351.890 131.753 / 351.890 131.753 / 351.890 ms >> optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Relocate 195.302 / 195.302 139.115 / 351.890 139.115 / 351.890 139.115 / 351.890 ms >> optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Relocate 374.022 / 374.022 155.204 / 374.022 155.204 / 374.022 155.204 / 374.022 ms >> optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Relocate 49.222 / 49.222 159.444 / 400.795 159.444 / 400.795 159.444 / 400.795 ms >> optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Relocate 381.072 / 381.072 182.488 / 409.086 173.140 / 409.086 173.140 / 409.086 ms >> optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Relocate 223.399 / 223.399 191.774 / 409.086 175.748 / 409.086 175.748 / 409.086 ms >> optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Relocate 214.184 / 214.184 201.526 / 409.086 181.302 / 409.086 181.302 / 409.086 ms >> optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Relocate 208.600 / 208.600 207.389 / 420.479 183.756 / 420.479 183.756 / 420.479 ms >> optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Relocate 209.444 / 209.444 202.367 / 420.479 183.173 / 420.479 183.173 / 420.479 ms >> optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Relocate 223.841 / 223.841 206.268 / 420.479 185.074 / 420.479 185.074 / 420.479 ms >> >> [root at localhost corretto]# grep "average 
latency:" nohup_baseline.out >> average latency: 2ms:40us >> average latency: 6ms:550us >> average latency: 6ms:543us >> average latency: 6ms:493us >> average latency: 928us >> average latency: 794us >> average latency: 1ms:403us >> average latency: 23ms:216us >> average latency: 775us >> [root at localhost corretto]# grep "average latency:" nohup_optimized.out >> average latency: 2ms:48us >> average latency: 5ms:948us >> average latency: 5ms:940us >> average latency: 5ms:875us >> average latency: 850us >> average latency: 723us >> average latency: 1ms:221us >> average latency: 22ms:653us >> average latency: 693us > > The ordering for relocation, not to expose bytes from before a remote thread copy, is already handled appropriately in the forwarding table itself. Marking does not need to happen before self-healing. In other words, the release serves no purpose, as far as I can see. At least not in the jdk-jdk repository. There is a loom dependency in self-heal ordering, but I volunteer to fix that before their next rebase. So I would suggest having relaxed ordering instead, for the CAS. I wonder if that gives you more performance boost. @fisk Thanks for your comment. Yes, there is no need for marking to happen before self healing. But when an object needs to be copied to to-space in relocation phase, I guess self healing should not happen before copy. Otherwise the other thread may get a healed reference which hasn't finished copy yet. Since this pull request (https://github.com/openjdk/jdk/pull/4763) adopts release ordering for forwarding table access, maybe we need the release semantic here in self healing to prevent the reorder. 
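[Editorial note: the acquire-release pairing discussed in this thread can be illustrated with a minimal sketch. This is not ZGC's actual ZBarrier code; `self_heal` and `zpointer` here are stand-ins showing why a release CAS suffices when the reader side has a pointer-to-contents data dependency.]

```cpp
#include <atomic>
#include <cstdint>

using zpointer = std::uintptr_t;

// Thread 1: heal the reference slot once the slow path (copy/mark) is done.
// Returns true if this thread won the race to install the good pointer.
bool self_heal(std::atomic<zpointer>& slot, zpointer bad, zpointer good) {
  // Release on success: all stores made before this CAS (e.g. the relocated
  // object's contents) become visible to any thread that observes 'good' in
  // the slot and then dereferences it; the reader's pointer->contents data
  // dependency supplies the matching acquire side.
  return slot.compare_exchange_strong(bad, good,
                                      std::memory_order_release,
                                      std::memory_order_relaxed);
}
```

Relative to a conservative (full-fence) CAS, this drops the trailing fence on weakly ordered hardware such as AArch64, which matches the shorter mark and relocation times reported above.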
------------- PR: https://git.openjdk.java.net/jdk/pull/5046 From tschatzl at openjdk.java.net Mon Aug 9 12:50:29 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 9 Aug 2021 12:50:29 GMT Subject: RFR: 8271951: Consolidate preserved marks overflow stack in SerialGC In-Reply-To: References: <_FpPV4VH0_Iyh0qyN6bJ6wyVISaoy5chgr3ARPlPIKU=.8eafc90e-dfc7-44fc-9316-6bb626e8e8f8@github.com> Message-ID: On Mon, 9 Aug 2021 10:31:54 GMT, Roman Kennke wrote: > > Pre-existing, and just a side note and not directly aimed at you: when looking at the change, there is the `PreservedMark` class in Serial GC, and `PreservedMarks` with its `OopAndMarkWord` inner class and `PreservedMarksSet` in `shared/preservedMarks.hpp` which is not perfect naming. I'll probably file an RFE to look into this. > > We might even think about using PreservedMarksSet in Serial GC, if we can allocate it in free parts of the heap. Not sure how feasible that would be though... Agree. Filed JDK-8272147. ------------- PR: https://git.openjdk.java.net/jdk/pull/5022 From pliden at openjdk.java.net Mon Aug 9 13:28:48 2021 From: pliden at openjdk.java.net (Per Liden) Date: Mon, 9 Aug 2021 13:28:48 GMT Subject: RFR: 8271862: C2 intrinsic for Reference.refersTo() is often not used Message-ID: While fixing JDK-8270626 it was discovered that the C2 intrinsic for `Reference.refersTo()` is not used as often as one would think. Instead C2 often prefers using the native implementation, which is much slower and defeats the purpose of having an intrinsic in the first place. The problem arises from having `refersTo0()` be virtual, native and @IntrinsicCandidate. This can be worked around by introducing a virtual method layer so that `refersTo0()` can be made `final`. That's what this patch does, by adding a virtual `refersToImpl()` which in turn calls the now final `refersTo0()`.
It should be noted that `Object.clone()` and `Object.hashCode()` are also virtual, native and @IntrinsicCandidate and these methods get special treatment by C2 to get the intrinsic generated more optimally. We might want to do a similar optimization for `Reference.refersTo0()`. In that case, it is suggested that such an improvement could come later and be handled as a separate enhancement, and @kimbarrett has already filed JDK-8272140 to track that. Testing: I found this hard to test without instrumenting Hotspot to tell me that the intrinsic was called instead of the native method. So, for that reason testing was ad-hoc, using an instrumented Hotspot in combination with the test (`vmTestbase/gc/gctests/PhantomReference/phantom002/TestDescription.java`) that initially uncovered the problem. See JDK-8270626 for additional information. ------------- Commit messages: - 8271862: C2 intrinsic for Reference.refersTo() is often not used Changes: https://git.openjdk.java.net/jdk/pull/5052/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5052&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271862 Stats: 14 lines in 2 files changed: 11 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5052.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5052/head:pull/5052 PR: https://git.openjdk.java.net/jdk/pull/5052 From eosterlund at openjdk.java.net Mon Aug 9 13:46:30 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 9 Aug 2021 13:46:30 GMT Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing In-Reply-To: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: On Mon, 9 Aug 2021 08:19:35 GMT, Xiaowei Lu wrote: > ZGC utilizes self-healing in load barrier to fix bad references. 
Currently, this fixing (ZBarrier::self_heal) adopts memory_order_conservative to guarantee that (1) the slow path (relocate, mark, etc., where addresses get healed) always happens before self healing, and (2) the other thread that accesses the same reference is able to access the healed address. > Let us consider memory_order_release for ZBarrier::self_heal. For example, Thread 1 is fixing a reference, and Thread 2 attempts to access the same reference. There exists a data dependency in T2, where the access of the pointer happens before the access of the object's content, equaling an acquire semantic. Pairing with the release semantic in self healing, this makes up inter-thread acquire-release memory ordering. As a result, the two statements we mentioned above can be guaranteed by the acquire-release ordering. > We performed an experiment on benchmark corretto/heapothesys on AArch64. The optimized version results in both (1) shorter average concurrent mark time and (2) shorter average concurrent relocation time. Furthermore, we notice shorter average latency in almost all testcases.
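The proposed pairing can be sketched in Java using release/acquire access modes as an analogue (hypothetical names; ZGC itself does this in C++ with `Atomic` CAS operations):

```java
import java.util.concurrent.atomic.AtomicLong;

// The "healer" writes the object's contents with a plain store, then
// release-stores the healed pointer; a reader that observes the pointer
// with acquire ordering (or via a dependent load in the real barrier)
// is guaranteed to also observe the contents written before it.
class SelfHealSketch {
    long contents;                                  // "object copy", plain write
    final AtomicLong healedPtr = new AtomicLong();  // 0 means "not yet healed"

    void heal(long ptr, long value) {
        contents = value;              // slow path work (relocate/mark) done first
        healedPtr.setRelease(ptr);     // self-heal with release ordering
    }

    long read() {
        long p;
        while ((p = healedPtr.getAcquire()) == 0) {  // pairs with setRelease
            Thread.onSpinWait();
        }
        return contents;               // visible via release/acquire happens-before
    }
}
```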
> > > [root at localhost corretto]# grep "00.*Phase: Concurrent Mark " *.log > baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Mark 960.359 / 960.359 587.203 / 1248.362 587.203 / 1248.362 587.203 / 1248.362 ms > baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Mark 116.748 / 116.748 656.777 / 1736.469 656.777 / 1736.469 656.777 / 1736.469 ms > baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Mark 125.460 / 125.460 620.440 / 1736.469 620.440 / 1736.469 620.440 / 1736.469 ms > baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Mark 935.295 / 935.295 673.080 / 1736.469 673.080 / 1736.469 673.080 / 1736.469 ms > baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Mark 1448.705 / 1448.705 723.484 / 1814.849 723.484 / 1814.849 723.484 / 1814.849 ms > baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Mark 1490.123 / 1490.123 796.960 / 1842.794 796.960 / 1842.794 796.960 / 1842.794 ms > baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 912.439 / 2183.065 867.799 / 2183.065 867.799 / 2183.065 ms > baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Mark 1468.594 / 1468.594 990.044 / 2183.065 912.281 / 2183.065 912.281 / 2183.065 ms > baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Mark 137.435 / 137.435 1109.116 / 2276.535 967.470 / 2276.535 967.470 / 2276.535 ms > baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Mark 184.093 / 184.093 1172.446 / 2276.535 997.343 / 2276.535 997.343 / 2276.535 ms > baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Mark 1537.673 / 1537.673 1211.815 / 2276.535 1013.076 / 2276.535 1013.076 / 2276.535 ms > baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1218.085 / 2276.535 1025.443 / 2276.535 1025.443 / 2276.535 ms > optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Mark 1053.065 / 1053.065 581.646 / 1249.822 581.646 / 1249.822 581.646 / 1249.822 ms > 
optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Mark 885.795 / 885.795 573.650 / 1277.782 573.650 / 1277.782 573.650 / 1277.782 ms > optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Mark 124.236 / 124.236 641.028 / 1828.124 641.028 / 1828.124 641.028 / 1828.124 ms > optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Mark 875.383 / 875.383 666.465 / 1828.124 666.465 / 1828.124 666.465 / 1828.124 ms > optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Mark 1937.305 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 ms > optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Mark 173.064 / 173.064 771.387 / 1937.305 771.387 / 1937.305 771.387 / 1937.305 ms > optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Mark 1832.584 / 1832.584 899.646 / 2048.471 856.838 / 2048.471 856.838 / 2048.471 ms > optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Mark 1510.755 / 1510.755 981.807 / 2048.471 893.373 / 2048.471 893.373 / 2048.471 ms > optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Mark 1472.737 / 1472.737 1044.755 / 2089.539 927.733 / 2089.539 927.733 / 2089.539 ms > optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Mark 1513.077 / 1513.077 1095.827 / 2089.539 947.202 / 2089.539 947.202 / 2089.539 ms > optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1073.703 / 2089.539 943.684 / 2089.539 943.684 / 2089.539 ms > optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Mark 1337.865 / 1337.865 1119.936 / 2113.895 962.172 / 2113.895 962.172 / 2113.895 ms > > [root at localhost corretto]# grep "00.*Phase: Concurrent Relocate " *.log > baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Relocate 196.522 / 196.522 114.318 / 245.371 114.318 / 245.371 114.318 / 245.371 ms > baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Relocate 47.748 / 47.748 130.861 / 331.948 130.861 / 331.948 130.861 / 331.948 ms > 
baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Relocate 56.922 / 56.922 129.174 / 331.948 129.174 / 331.948 129.174 / 331.948 ms > baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Relocate 218.707 / 218.707 137.495 / 331.948 137.495 / 331.948 137.495 / 331.948 ms > baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Relocate 197.166 / 197.166 144.216 / 359.644 144.216 / 359.644 144.216 / 359.644 ms > baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Relocate 202.118 / 202.118 153.507 / 373.447 153.507 / 373.447 153.507 / 373.447 ms > baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 172.241 / 395.113 164.291 / 395.113 164.291 / 395.113 ms > baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Relocate 215.121 / 215.121 186.007 / 421.039 173.139 / 421.039 173.139 / 421.039 ms > baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Relocate 48.550 / 48.550 203.420 / 421.982 181.899 / 421.982 181.899 / 421.982 ms > baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Relocate 53.847 / 53.847 211.774 / 421.982 185.728 / 421.982 185.728 / 421.982 ms > baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Relocate 224.489 / 224.489 218.195 / 431.088 188.087 / 431.088 188.087 / 431.088 ms > baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 222.852 / 431.088 191.130 / 431.088 191.130 / 431.088 ms > optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Relocate 193.811 / 193.811 113.043 / 248.471 113.043 / 248.471 113.043 / 248.471 ms > optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Relocate 196.220 / 196.220 117.810 / 248.471 117.810 / 248.471 117.810 / 248.471 ms > optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Relocate 48.786 / 48.786 131.753 / 351.890 131.753 / 351.890 131.753 / 351.890 ms > optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Relocate 195.302 / 195.302 139.115 / 351.890 139.115 / 351.890 
139.115 / 351.890 ms > optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Relocate 374.022 / 374.022 155.204 / 374.022 155.204 / 374.022 155.204 / 374.022 ms > optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Relocate 49.222 / 49.222 159.444 / 400.795 159.444 / 400.795 159.444 / 400.795 ms > optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Relocate 381.072 / 381.072 182.488 / 409.086 173.140 / 409.086 173.140 / 409.086 ms > optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Relocate 223.399 / 223.399 191.774 / 409.086 175.748 / 409.086 175.748 / 409.086 ms > optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Relocate 214.184 / 214.184 201.526 / 409.086 181.302 / 409.086 181.302 / 409.086 ms > optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Relocate 208.600 / 208.600 207.389 / 420.479 183.756 / 420.479 183.756 / 420.479 ms > optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Relocate 209.444 / 209.444 202.367 / 420.479 183.173 / 420.479 183.173 / 420.479 ms > optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Relocate 223.841 / 223.841 206.268 / 420.479 185.074 / 420.479 185.074 / 420.479 ms > > [root at localhost corretto]# grep "average latency:" nohup_baseline.out > average latency: 2ms:40us > average latency: 6ms:550us > average latency: 6ms:543us > average latency: 6ms:493us > average latency: 928us > average latency: 794us > average latency: 1ms:403us > average latency: 23ms:216us > average latency: 775us > [root at localhost corretto]# grep "average latency:" nohup_optimized.out > average latency: 2ms:48us > average latency: 5ms:948us > average latency: 5ms:940us > average latency: 5ms:875us > average latency: 850us > average latency: 723us > average latency: 1ms:221us > average latency: 22ms:653us > average latency: 693us A thread that copies the object and self heals will conceptually do the following, assuming relaxed memory ordering: copy(); release(); 
cas_forwarding_table(); cas_self_heal(); The release before CASing into the forwarding table acts as a release for both accesses in the scenario when the copy is being published. So in the scenario you describe, the release in the forwarding table is already enough to ensure that anyone reading the self-healed pointer is guaranteed not to observe bytes from before the copy. In the scenario when one thread performs the copy that gets published to the forwarding table, and another thread self-heals the pointer with the value acquired from the forwarding table, we will indeed not have a release to publish the pointer, only an acquire used to read from the forwarding table. However, this is fine, as the new MCA ARMv8 architecture does not allow causal consistency violations like WRC (cf. https://dl.acm.org/doi/pdf/10.1145/3158107 section 4). So we no longer need to use acquire/release to guarantee causal consistency across threads. This would naturally not hold for PPC, but there is no PPC port for ZGC yet. It is interesting though that when loading a self-healed pointer, we do not perform any acquire. That is fine when dereferencing the loaded pointer, as a dependent load, eliding the need for an acquire. And that is indeed fine for the JIT compiled code, because we know it is always a dependent load (or safe in other ways). However, for the C++ code, we cannot *guarantee* that there will be a dependent load in a spec-conforming way. That might be something to look into. In practice, there isn't any good reason why reading an oop and then dereferencing it wouldn't yield a dependent load, but the spec doesn't promise anything and could in theory allow compilers to mess this up. However, having an acquire for every oop load in the runtime does sound a bit costly. The memory_order_consume semantics were supposed to solve this, but I'm not sure if the compilers have yet become good at doing something useful with that, other than just having it be equivalent to acquire.
Might be something to check out in the disassembly to see what it yields. But that is an exercise for another day, as this isn't an issue you are introducing with this patch. Hope this helps explain my thoughts in more detail. ------------- PR: https://git.openjdk.java.net/jdk/pull/5046 From ayang at openjdk.java.net Mon Aug 9 17:43:32 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 9 Aug 2021 17:43:32 GMT Subject: RFR: 8271880: Tighten condition for excluding regions from collecting cards with cross-references In-Reply-To: References: Message-ID: On Fri, 6 Aug 2021 21:05:50 GMT, Thomas Schatzl wrote: > Hi, > > can I have reviews for this change that by tightening the condition for excluding regions from collecting cards with cross-references allows us to avoid the rescanning of objects that failed evacuation in the fix up self-forwards phase after evacuation failure. > > I.e. during gc g1 never collects cards/references from the young gen (including eden) for later refinement - which means that we need to rescan all live objects remaining in eden regions for cross-references. > > The problem or the reason why we did that was that we did not want to add cards to refine from survivor regions (i.e. next gc's young gen) because we just don't need to as we always collect young gen, so references from there need not be recorded in the remembered sets (and actually, if we did, we erroneously marked cards in young gen which later card table verification will not like) - but we did not have that information on hand anywhere already quickly accessible. > > This change solves that problem by actually recording this information in the region attribute table as "NewSurvivor" type region. "NewSurvivor" because I did want to make explicit that these are the survivor regions from the *new* (next) young generation (i.e.
just survivor) and not the survivor regions of the previous gc (that were turned eden at the start of this gc) but something like "NewYoung" or so would be fine with me as well (or certainly just "Survivor", but that might be confusing). > > Another interesting addition is probably the new assert in `G1ParThreadScanState::enqueue_card_if_tracked` > > > assert(!_g1h->heap_region_containing(o)->in_collection_set(), "Should not try to enqueue reference into collection set region"); > > This, at this time, verifies the assumption that g1 is not trying to collect references *to* the collection set, i.e. other objects that failed evacuation - after all we later relabel their regions as old *without* a remembered set; we would do otherwise unnecessarily because the reason is that (currently) cset tracking for these regions is enabled (at least during gc - we only later relabel and drop the remembered sets). > > This might change later if we want to move evacuation failed regions into survivor (or keep their remembered sets for some reason), but for now we filter attempts to add cards in the dcqs for those this way. > > Testing: tier1-5, gc/g1 with `JAVA_OPTIONS_=-XX:+G1EvacuationFailureALot -XX:+VerifyAfterGC`. > > Thanks, > Thomas Changes requested by ayang (Committer). src/hotspot/share/gc/g1/g1OopClosures.hpp line 96: > 94: }; > 95: > 96: ScanningInYoungValues _skip_card_enqueue; The enum name can be `tristate`, right? src/hotspot/share/gc/g1/g1OopClosures.hpp line 114: > 112: }; > 113: > 114: // RAII object to properly set the _scanning_in_young field in G1ScanEvacuatedObjClosure. Obsolete comments; can be dropped completely, IMO. src/hotspot/share/gc/g1/g1OopClosures.hpp line 121: > 119: G1SkipCardEnqueueSetter(G1ScanEvacuatedObjClosure* closure, bool skip_enqueue_cards) : _closure(closure) { > 120: assert(_closure->_skip_card_enqueue == G1ScanEvacuatedObjClosure::Uninitialized, "Must not be set"); > 121: _closure->_skip_card_enqueue = skip_enqueue_cards ?
G1ScanEvacuatedObjClosure::True : G1ScanEvacuatedObjClosure::False; There's no reason for `class G1SkipCardEnqueueSetter`, field `_skip_card_enqueue` and method arg `skip_enqueue_cards` to not follow the same pattern; both alternatives are fine, "skip enqueue cards" or "skip card enqueue". src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 248: > 246: G1HeapRegionAttr dest_attr = _g1h->region_attr(to_array); > 247: assert(!dest_attr.is_in_cset(), "must not scan object from cset here"); > 248: G1SkipCardEnqueueSetter x(&_scanner, dest_attr.is_new_survivor() || dest_attr.is_in_cset()); We just assert `!dest_attr.is_in_cset()`, why still `|| dest_attr.is_in_cset()`? src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 283: > 281: dest_attr = _g1h->region_attr(to_array); > 282: assert(!_g1h->region_attr(to_array).is_in_cset(), "must not scan object from cset here"); > 283: G1SkipCardEnqueueSetter x(&_scanner, dest_attr.is_new_survivor() || dest_attr.is_in_cset()); Is this still WIP? ------------- PR: https://git.openjdk.java.net/jdk/pull/5037 From rkennke at openjdk.java.net Mon Aug 9 18:30:48 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 9 Aug 2021 18:30:48 GMT Subject: RFR: 8272165: Consolidate mark_must_be_preserved() variants Message-ID: We have markWord::mark_must_be_preserved() and markWord::mark_must_be_preserved_for_promotion_failure(). We used to have slightly different implementations when we had BiasedLocking, but now they are identical and used for identical purposes too. Let's remove what we don't need. 
Testing: - [ ] hotspot_gc - [ ] tier1 - [ ] tier2 ------------- Commit messages: - 8272165: Consolidate mark_must_be_preserved() variants Changes: https://git.openjdk.java.net/jdk/pull/5057/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5057&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272165 Stats: 12 lines in 4 files changed: 0 ins; 11 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5057.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5057/head:pull/5057 PR: https://git.openjdk.java.net/jdk/pull/5057 From tschatzl at openjdk.java.net Mon Aug 9 18:42:42 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 9 Aug 2021 18:42:42 GMT Subject: RFR: 8271880: Tighten condition for excluding regions from collecting cards with cross-references In-Reply-To: References: Message-ID: On Mon, 9 Aug 2021 17:38:21 GMT, Albert Mingkun Yang wrote: >> Hi, >> >> can I have reviews for this change that by tightening the condition for excluding regions from collecting cards with cross-references allows us to avoid the rescanning of objects that failed evacuation in the fix up self-forwards phase after evacuation failure. >> >> I.e. during gc g1 never collects cards/references from the young gen (including eden) for later refinement - which means that we need to rescan all live objects remaining in eden regions for cross-references. >> >> The problem or the reason why we did that was that we did not want to add cards to refine from survivor regions (i.e. next gc's young gen) because we just don't need to as we always collect young gen, so references from there need not be recorded in the remembered sets (and actually, if we did, we errorneouosly marked cards in young gen which later card table verification will not like) - but we did not have that information on hand anywhere already quickly accessible. 
>> >> This change solves that problem by actually recording this information in the region attribute table as "NewSurvivor" type region. "NewSurvivor" because I did want to make explicit that these are the survivor regions from the *new* (next) young generation (i.e. just survivor) and not the survivor regions of the previous gc (that were turned eden at the start of this gc) but something like "NewYoung" or so would be fine with me as well (or certainly just "Survivor", but that might be confusing). >> >> Another interesting addition is probably the new assert in `G1ParThreadScanState::enqueue_card_if_tracked` >> >> >> assert(!_g1h->heap_region_containing(o)->in_collection_set(), "Should not try to enqueue reference into collection set region"); >> >> This, at this time, verifies the assumption that g1 is not trying to collect references *to* the collection set, i.e. other objects that failed evacuation - after all we later relabel their regions as old *without* a remembered set; we would do otherwise unnecessarily because the reason is that (currently) cset tracking for these regions is enabled (at least during gc - we only later relabel and drop the remembered sets). >> >> This might change later if we want to move evacuation failed regions into survivor (or keep their remembered sets for some reason), but for now we filter attempts to add cards in the dcqs for those this way. >> >> Testing: tier1-5, gc/g1 with `JAVA_OPTIONS_=-XX+G1EvacuationFailureALot -XX:+VerifyAfterGC`. >> >> Thanks, >> Thomas > > src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 248: > >> 246: G1HeapRegionAttr dest_attr = _g1h->region_attr(to_array); >> 247: assert(!dest_attr.is_in_cset(), "must not scan object from cset here"); >> 248: G1SkipCardEnqueueSetter x(&_scanner, dest_attr.is_new_survivor() || dest_attr.is_in_cset()); > > We just assert `!dest_attr.is_in_cset()`, why still `|| dest_attr.is_in_cset()`? 
`assert(!dest_attr.is_in_cset(), "must not scan object from cset here");` is because this should not happen (yet) as we do not split large objarrays that failed evacuation at this time. I'll remove that one, because something like "not implemented" or "should not happen" isn't great either. `dest_attr.is_new_survivor() || dest_attr.is_in_cset()` is incorrect, just `dest_attr.is_new_survivor()` is correct (then again, it could not have happened yet). > src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 283: > >> 281: dest_attr = _g1h->region_attr(to_array); >> 282: assert(!_g1h->region_attr(to_array).is_in_cset(), "must not scan object from cset here"); >> 283: G1SkipCardEnqueueSetter x(&_scanner, dest_attr.is_new_survivor() || dest_attr.is_in_cset()); > > Is this still WIP? No, but has the same bug as the other similar place. ------------- PR: https://git.openjdk.java.net/jdk/pull/5037 From tschatzl at openjdk.java.net Mon Aug 9 19:07:09 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 9 Aug 2021 19:07:09 GMT Subject: RFR: 8271880: Tighten condition for excluding regions from collecting cards with cross-references [v2] In-Reply-To: References: Message-ID: > Hi, > > can I have reviews for this change that by tightening the condition for excluding regions from collecting cards with cross-references allows us to avoid the rescanning of objects that failed evacuation in the fix up self-forwards phase after evacuation failure. > > I.e. during gc g1 never collects cards/references from the young gen (including eden) for later refinement - which means that we need to rescan all live objects remaining in eden regions for cross-references. > > The problem or the reason why we did that was that we did not want to add cards to refine from survivor regions (i.e. 
next gc's young gen) because we just don't need to as we always collect young gen, so references from there need not be recorded in the remembered sets (and actually, if we did, we erroneously marked cards in young gen which later card table verification will not like) - but we did not have that information on hand anywhere already quickly accessible. > > This change solves that problem by actually recording this information in the region attribute table as "NewSurvivor" type region. "NewSurvivor" because I did want to make explicit that these are the survivor regions from the *new* (next) young generation (i.e. just survivor) and not the survivor regions of the previous gc (that were turned eden at the start of this gc) but something like "NewYoung" or so would be fine with me as well (or certainly just "Survivor", but that might be confusing). > > Another interesting addition is probably the new assert in `G1ParThreadScanState::enqueue_card_if_tracked` > > > assert(!_g1h->heap_region_containing(o)->in_collection_set(), "Should not try to enqueue reference into collection set region"); > > This, at this time, verifies the assumption that g1 is not trying to collect references *to* the collection set, i.e. other objects that failed evacuation - after all we later relabel their regions as old *without* a remembered set; we would do otherwise unnecessarily because the reason is that (currently) cset tracking for these regions is enabled (at least during gc - we only later relabel and drop the remembered sets). > > This might change later if we want to move evacuation failed regions into survivor (or keep their remembered sets for some reason), but for now we filter attempts to add cards in the dcqs for those this way. > > Testing: tier1-5, gc/g1 with `JAVA_OPTIONS_=-XX:+G1EvacuationFailureALot -XX:+VerifyAfterGC`.
> > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5037/files - new: https://git.openjdk.java.net/jdk/pull/5037/files/952df15d..b38c11e4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5037&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5037&range=00-01 Stats: 15 lines in 2 files changed: 4 ins; 2 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/5037.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5037/head:pull/5037 PR: https://git.openjdk.java.net/jdk/pull/5037 From tschatzl at openjdk.java.net Mon Aug 9 19:07:34 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 9 Aug 2021 19:07:34 GMT Subject: RFR: 8272165: Consolidate mark_must_be_preserved() variants In-Reply-To: References: Message-ID: <1zVW6e8aQOIGtp11Y2ACR2T_8G5uobZFTwXDWu8-EK0=.e5b027f0-7430-421b-93d5-dbc47c87ab9a@github.com> On Mon, 9 Aug 2021 18:25:35 GMT, Roman Kennke wrote: > We have markWord::mark_must_be_preserved() and markWord::mark_must_be_preserved_for_promotion_failure(). We used to have slightly different implementations when we had BiasedLocking, but now they are identical and used for identical purposes too. Let's remove what we don't need. > > Testing: > - [ ] hotspot_gc > - [ ] tier1 > - [ ] tier2 Lgtm and trivial removal. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5057 From kbarrett at openjdk.java.net Mon Aug 9 18:17:35 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 9 Aug 2021 18:17:35 GMT Subject: RFR: 8271862: C2 intrinsic for Reference.refersTo() is often not used In-Reply-To: References: Message-ID: On Mon, 9 Aug 2021 13:13:33 GMT, Per Liden wrote: > While fixing JDK-8270626 it was discovered that the C2 intrinsic for `Reference.refersTo()` is not used as often as one would think. 
Instead C2 often prefers using the native implementation, which is much slower which defeats the purpose of having an intrinsic in the first place. > > The problem arise from having `refersTo0()` be virtual, native and @IntrinsicCandidate. This can be worked around by introducing an virtual method layer and so that `refersTo0()` can be made `final`. That's what this patch does, by adding a virtual `refersToImpl()` which in turn calls the now final `refersTo0()`. > > It should be noted that `Object.clone()` and `Object.hashCode()` are also virtual, native and @IntrinsicCandidate and these methods get special treatment by C2 to get the intrinsic generated more optimally. We might want to do a similar optimization for `Reference.refersTo0()`. In that case, it is suggested that such an improvement could come later and be handled as a separate enhancement, and @kimbarrett has already filed JDK-8272140 to track that. > > Testing: > I found this hard to test without instrumenting Hotspot to tell me that the intrinsic was called instead of the native method. So, for that reason testing was ad-hoc, using an instrumented Hotspot in combination with the test (`vmTestbase/gc/gctests/PhantomReference/phantom002/TestDescription.java`) that initially uncovered the problem. See JDK-8270626 for additional information. Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5052 From mchung at openjdk.java.net Mon Aug 9 19:29:32 2021 From: mchung at openjdk.java.net (Mandy Chung) Date: Mon, 9 Aug 2021 19:29:32 GMT Subject: RFR: 8271862: C2 intrinsic for Reference.refersTo() is often not used In-Reply-To: References: Message-ID: On Mon, 9 Aug 2021 13:13:33 GMT, Per Liden wrote: > While fixing JDK-8270626 it was discovered that the C2 intrinsic for `Reference.refersTo()` is not used as often as one would think. 
Instead C2 often prefers using the native implementation, which is much slower which defeats the purpose of having an intrinsic in the first place. > > The problem arise from having `refersTo0()` be virtual, native and @IntrinsicCandidate. This can be worked around by introducing an virtual method layer and so that `refersTo0()` can be made `final`. That's what this patch does, by adding a virtual `refersToImpl()` which in turn calls the now final `refersTo0()`. > > It should be noted that `Object.clone()` and `Object.hashCode()` are also virtual, native and @IntrinsicCandidate and these methods get special treatment by C2 to get the intrinsic generated more optimally. We might want to do a similar optimization for `Reference.refersTo0()`. In that case, it is suggested that such an improvement could come later and be handled as a separate enhancement, and @kimbarrett has already filed JDK-8272140 to track that. > > Testing: > I found this hard to test without instrumenting Hotspot to tell me that the intrinsic was called instead of the native method. So, for that reason testing was ad-hoc, using an instrumented Hotspot in combination with the test (`vmTestbase/gc/gctests/PhantomReference/phantom002/TestDescription.java`) that initially uncovered the problem. See JDK-8270626 for additional information. Looks good to me too. ------------- Marked as reviewed by mchung (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5052 From vlivanov at openjdk.java.net Mon Aug 9 20:03:36 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 9 Aug 2021 20:03:36 GMT Subject: RFR: 8271862: C2 intrinsic for Reference.refersTo() is often not used In-Reply-To: References: Message-ID: On Mon, 9 Aug 2021 13:13:33 GMT, Per Liden wrote: > While fixing JDK-8270626 it was discovered that the C2 intrinsic for `Reference.refersTo()` is not used as often as one would think. 
Instead C2 often prefers using the native implementation, which is much slower which defeats the purpose of having an intrinsic in the first place. > > The problem arise from having `refersTo0()` be virtual, native and @IntrinsicCandidate. This can be worked around by introducing an virtual method layer and so that `refersTo0()` can be made `final`. That's what this patch does, by adding a virtual `refersToImpl()` which in turn calls the now final `refersTo0()`. > > It should be noted that `Object.clone()` and `Object.hashCode()` are also virtual, native and @IntrinsicCandidate and these methods get special treatment by C2 to get the intrinsic generated more optimally. We might want to do a similar optimization for `Reference.refersTo0()`. In that case, it is suggested that such an improvement could come later and be handled as a separate enhancement, and @kimbarrett has already filed JDK-8272140 to track that. > > Testing: > I found this hard to test without instrumenting Hotspot to tell me that the intrinsic was called instead of the native method. So, for that reason testing was ad-hoc, using an instrumented Hotspot in combination with the test (`vmTestbase/gc/gctests/PhantomReference/phantom002/TestDescription.java`) that initially uncovered the problem. See JDK-8270626 for additional information. src/java.base/share/classes/java/lang/ref/Reference.java line 374: > 372: * to call the native implementation over the intrinsic. > 373: */ > 374: boolean refersToImpl(T obj) { I'm curious why can't you get rid of `refersToImpl` (virtual method) and just override `refersTo` on `PhantomReference`. Am I missing something important about keeping `refersTo` final? src/java.base/share/classes/java/lang/ref/Reference.java line 379: > 377: > 378: @IntrinsicCandidate > 379: private native final boolean refersTo0(Object o); No need to have it both `private` and `final`. Just `private` should be enough. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5052 From mchung at openjdk.java.net Mon Aug 9 20:18:31 2021 From: mchung at openjdk.java.net (Mandy Chung) Date: Mon, 9 Aug 2021 20:18:31 GMT Subject: RFR: 8271862: C2 intrinsic for Reference.refersTo() is often not used In-Reply-To: References: Message-ID: On Mon, 9 Aug 2021 19:58:48 GMT, Vladimir Ivanov wrote: >> While fixing JDK-8270626 it was discovered that the C2 intrinsic for `Reference.refersTo()` is not used as often as one would think. Instead C2 often prefers using the native implementation, which is much slower which defeats the purpose of having an intrinsic in the first place. >> >> The problem arise from having `refersTo0()` be virtual, native and @IntrinsicCandidate. This can be worked around by introducing an virtual method layer and so that `refersTo0()` can be made `final`. That's what this patch does, by adding a virtual `refersToImpl()` which in turn calls the now final `refersTo0()`. >> >> It should be noted that `Object.clone()` and `Object.hashCode()` are also virtual, native and @IntrinsicCandidate and these methods get special treatment by C2 to get the intrinsic generated more optimally. We might want to do a similar optimization for `Reference.refersTo0()`. In that case, it is suggested that such an improvement could come later and be handled as a separate enhancement, and @kimbarrett has already filed JDK-8272140 to track that. >> >> Testing: >> I found this hard to test without instrumenting Hotspot to tell me that the intrinsic was called instead of the native method. So, for that reason testing was ad-hoc, using an instrumented Hotspot in combination with the test (`vmTestbase/gc/gctests/PhantomReference/phantom002/TestDescription.java`) that initially uncovered the problem. See JDK-8270626 for additional information. > > src/java.base/share/classes/java/lang/ref/Reference.java line 374: > >> 372: * to call the native implementation over the intrinsic. 
>> 373: */ >> 374: boolean refersToImpl(T obj) { > > I'm curious why you can't get rid of `refersToImpl` (virtual method) and just override `refersTo` on `PhantomReference`. Am I missing something important about keeping `refersTo` final? We don't want user-defined subclasses of `Reference` overriding the `refersTo` implementation accidentally. ------------- PR: https://git.openjdk.java.net/jdk/pull/5052 From ayang at openjdk.java.net Mon Aug 9 20:46:35 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 9 Aug 2021 20:46:35 GMT Subject: RFR: 8271880: Tighten condition for excluding regions from collecting cards with cross-references [v2] In-Reply-To: References: Message-ID: <7z84zNFo_BDVIktycI1LcRBkxFq8HzhuMi18VE5N2JQ=.3f4ff5e1-d560-4515-9240-4c94e98b7f58@github.com> On Mon, 9 Aug 2021 19:07:09 GMT, Thomas Schatzl wrote: >> Hi, >> >> can I have reviews for this change that, by tightening the condition for excluding regions from collecting cards with cross-references, allows us to avoid the rescanning of objects that failed evacuation in the fix-up self-forwards phase after evacuation failure. >> >> I.e. during gc g1 never collects cards/references from the young gen (including eden) for later refinement - which means that we need to rescan all live objects remaining in eden regions for cross-references. >> >> The problem or the reason why we did that was that we did not want to add cards to refine from survivor regions (i.e. next gc's young gen) because we just don't need to as we always collect young gen, so references from there need not be recorded in the remembered sets (and actually, if we did, we erroneously marked cards in young gen which later card table verification will not like) - but we did not have that information on hand anywhere already quickly accessible. >> >> This change solves that problem by actually recording this information in the region attribute table as "NewSurvivor" type region.
"NewSurvivor" because I did want to make explicit that these are the survivor regions from the *new* (next) young generation (i.e. just survivor) and not the survivor regions of the previous gc (that were turned eden at the start of this gc) but something like "NewYoung" or so would be fine with me as well (or certainly just "Survivor", but that might be confusing). >> >> Another interesting addition is probably the new assert in `G1ParThreadScanState::enqueue_card_if_tracked` >> >> >> assert(!_g1h->heap_region_containing(o)->in_collection_set(), "Should not try to enqueue reference into collection set region"); >> >> This, at this time, verifies the assumption that g1 is not trying to collect references *to* the collection set, i.e. other objects that failed evacuation - after all we later relabel their regions as old *without* a remembered set; we would do otherwise unnecessarily because the reason is that (currently) cset tracking for these regions is enabled (at least during gc - we only later relabel and drop the remembered sets). >> >> This might change later if we want to move evacuation failed regions into survivor (or keep their remembered sets for some reason), but for now we filter attempts to add cards in the dcqs for those this way. >> >> Testing: tier1-5, gc/g1 with `JAVA_OPTIONS_=-XX+G1EvacuationFailureALot -XX:+VerifyAfterGC`. >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > ayang review Marked as reviewed by ayang (Committer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/5037 From vlivanov at openjdk.java.net Mon Aug 9 21:16:31 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 9 Aug 2021 21:16:31 GMT Subject: RFR: 8271862: C2 intrinsic for Reference.refersTo() is often not used In-Reply-To: References: Message-ID: On Mon, 9 Aug 2021 20:15:59 GMT, Mandy Chung wrote: >> src/java.base/share/classes/java/lang/ref/Reference.java line 374: >> >>> 372: * to call the native implementation over the intrinsic. >>> 373: */ >>> 374: boolean refersToImpl(T obj) { >> >> I'm curious why can't you get rid of `refersToImpl` (virtual method) and just override `refersTo` on `PhantomReference`. Am I missing something important about keeping `refersTo` final? > > We don't want user-defined subclass of `Reference` overriding the `refersTo` implementation accidentally. Got it. Thanks for the clarification! ------------- PR: https://git.openjdk.java.net/jdk/pull/5052 From kbarrett at openjdk.java.net Tue Aug 10 01:49:29 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 10 Aug 2021 01:49:29 GMT Subject: RFR: 8267186: Add string deduplication support to ZGC [v2] In-Reply-To: References: Message-ID: On Fri, 6 Aug 2021 14:11:57 GMT, Per Liden wrote: >> This change adds support for string deduplication to ZGC. It's a pretty straightforward change, but to make reviewing even easier it is broken up into two commits: >> >> __ZGC: Introduce ZMarkContext__ >> This commit just moves the `ZMarkCache` into the new `ZMarkContext` class, and we now pass a `ZMarkContext*` around instead of a `ZMarkCache*`. The `ZMarkContext` class is a more general container for worker-local data, which in the next commit will be extended to also include the `StringDedup::Requests` queue. >> >> __8267186: Add string deduplication support to ZGC__ >> This commit adds the actual string dedup functionality and enables relevant tests.
We use the `deduplication_requested` bit in the `String` object to filter out `Strings` we've already attempted to deduplicate. >> >> Testing: >> - Passes all string dedup tests. >> - Passes Tier1-7 with ZGC on Linux/x86_64 (with -XX:+UseStringDeduplication enabled by default to get better exposure). > Per Liden has updated the pull request incrementally with one additional commit since the last revision: > Review comments from Erik Looks good, except for the include order in zMarkContext.inline.hpp. src/hotspot/share/gc/z/zMarkContext.inline.hpp line 29: > 27: #include "classfile/javaClasses.inline.hpp" > 28: #include "gc/z/zMarkCache.inline.hpp" > 29: #include "gc/z/zMarkContext.hpp" The associated .hpp file is supposed to be included first by a .inline.hpp file, per JDK-8267464. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5029 From github.com+16811675+linade at openjdk.java.net Tue Aug 10 03:18:29 2021 From: github.com+16811675+linade at openjdk.java.net (Yude Lin) Date: Tue, 10 Aug 2021 03:18:29 GMT Subject: RFR: 8272083: G1: Record iterated range for BOT performance during card scan In-Reply-To: References: Message-ID: On Sat, 7 Aug 2021 04:56:08 GMT, Yude Lin wrote: > A fix to the problem in 8272083 is to use a per-worker pointer to indicate where the worker has scanned up to, similar to the _scanned_to variable. The difference is that this pointer (I call it _iterated_to) records the end of the object, while _scanned_to records the end of the scan. Since we always scan with increasing addresses, the end of the latest object scanned is also the address up to which the BOT has been fixed. So we avoid having to fix below this address when calling block_start(). This implementation approximately reduces the number of calls to set_offset_array() during scan_heap_roots() by 2-10 times (in my casual test with -XX:G1ConcRefinementGreenZone=1000000). > > What this approach does not solve is random access to the BOT.
So far I haven't found anything having this pattern. I had an implementation that does a backward scan. It ended up not as clean as the '_iterated_to' solution. Yes, the '_iterated_to' solution still leaves some excess work. I think it only happens when (correct me if I'm wrong): 1. Worker1 is working on chunk1 and it's about to finish, worker2 starts working on chunk2 that is just behind chunk1; and 2. There is a gc-allocated block that spans chunk1 and chunk2. If worker2 starts with a card that falls into this gc-allocated block, it will have to fix some BOT entries that might have been fixed by worker1. It only happens once, since after this fix, worker2 has its own '_iterated_to' hint. I considered how to do concurrent fixing. Suppose we use a per-region pointer to record where the BOT has been fixed up to, and all workers update it. Still using the above scenario: if worker1 is in the middle of processing chunk1 and worker2 gets to fix something when processing chunk2, then the pointer will be updated to something > chunk1.end(). Then worker1 will not benefit from this pointer anymore since it's not pointing into chunk1. Furthermore, it's hard to guarantee updates are visible to worker2 in time. Worker1 only has useful information for worker2 when worker1 is about to finish chunk1. Worker2 only benefits from this information when it starts on chunk2 before worker1 claims it:

worker2 claims chunk2
worker1 about to finish chunk1
[ ]
worker1 tries to claim chunk2

The window between '[ ]' is small, making it hard for worker2 to benefit.
------------- PR: https://git.openjdk.java.net/jdk/pull/5039 From mli at openjdk.java.net Tue Aug 10 03:22:34 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Tue, 10 Aug 2021 03:22:34 GMT Subject: RFR: 8270454: G1: Simplify region index comparison In-Reply-To: <9kBo7z9owNrW_zoDN9PuS6hoTzU-NSra5XURo8eFld4=.6797a8d3-eb9a-4397-8d30-179355f1db93@github.com> References: <9kBo7z9owNrW_zoDN9PuS6hoTzU-NSra5XURo8eFld4=.6797a8d3-eb9a-4397-8d30-179355f1db93@github.com> Message-ID: <3B1XIgSJL7Le_Js3GNm1KxRwuP9qwGYREC7l1VDXrUE=.e63d2ca0-3954-43cd-8106-f594d573b80c@github.com> On Mon, 9 Aug 2021 04:38:18 GMT, Kim Barrett wrote: >> This is a minor code improvement to simplify region index (unit) comparison. > > Looks good. @kimbarrett @tschatzl @albertnetymk Thanks for your review. ------------- PR: https://git.openjdk.java.net/jdk/pull/4852 From mli at openjdk.java.net Tue Aug 10 03:22:34 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Tue, 10 Aug 2021 03:22:34 GMT Subject: Integrated: 8270454: G1: Simplify region index comparison In-Reply-To: References: Message-ID: <2LxonTKZOJXil515cY9L7k5FBCFeVWH91rVhaUyAe04=.f6151a9d-b859-462c-8193-eadc2dbb7c12@github.com> On Wed, 21 Jul 2021 02:11:32 GMT, Hamlin Li wrote: > This is a minor code improvement to simplify region index (unit) comparison. This pull request has now been integrated. 
Changeset: abdc1074 Author: Hamlin Li URL: https://git.openjdk.java.net/jdk/commit/abdc1074dcefda9012bb4d84c9f34a2dca5ea560 Stats: 8 lines in 1 file changed: 0 ins; 6 del; 2 mod 8270454: G1: Simplify region index comparison Reviewed-by: kbarrett, ayang, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/4852 From github.com+16811675+linade at openjdk.java.net Tue Aug 10 03:25:35 2021 From: github.com+16811675+linade at openjdk.java.net (Yude Lin) Date: Tue, 10 Aug 2021 03:25:35 GMT Subject: RFR: 8272083: G1: Record iterated range for BOT performance during card scan In-Reply-To: References: Message-ID: <7Sj2wSp7gqTDWEMM98OJhd_dYVVKj46iKms5EumS4LM=.ac9eb50f-6254-40b7-93c5-9d91f3724695@github.com> On Sat, 7 Aug 2021 04:56:08 GMT, Yude Lin wrote: > A fix to the problem in 8272083 is to use a per-worker pointer to indicate where the worker has scanned up to, similar to the _scanned_to variable. The difference is this pointer (I call it _iterated_to) records the end of the object and _scanned_to records the end of the scan. Since we always scan with increasing addresses, the end of the latest object scanned is also the address where BOT has fixed up to. So we avoid having to fix below this address when calling block_start(). This implementation approximately reduce the number of calls to set_offset_array() during scan_heap_roots() 2-10 times (in my casual test with -XX:G1ConcRefinementGreenZone=1000000). > > What this approach not solving is random access to BOT. So far I haven't found anything having this pattern. 
Uploaded the backward scan implementation (draft) for reference: https://github.com/linade/jdk/commit/e2c123f0819b5aa0cc7d093d2ac4512ef37bc78a ------------- PR: https://git.openjdk.java.net/jdk/pull/5039 From kbarrett at openjdk.java.net Tue Aug 10 04:25:29 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 10 Aug 2021 04:25:29 GMT Subject: RFR: 8271951: Consolidate preserved marks overflow stack in SerialGC In-Reply-To: <_FpPV4VH0_Iyh0qyN6bJ6wyVISaoy5chgr3ARPlPIKU=.8eafc90e-dfc7-44fc-9316-6bb626e8e8f8@github.com> References: <_FpPV4VH0_Iyh0qyN6bJ6wyVISaoy5chgr3ARPlPIKU=.8eafc90e-dfc7-44fc-9316-6bb626e8e8f8@github.com> Message-ID: <6VKN-rj3f9JQ7ZtcGif2WfVCKwtH4Y5_C1xVEdTDzA8=.e854ba16-a09f-48c4-a3c9-22f6d7050e91@github.com> On Thu, 5 Aug 2021 14:29:09 GMT, Roman Kennke wrote: > In SerialGC we put preserved marks in space that we may have in the young gen, and failing that, we push them to an overflow stack. The overflow stack is actually two stacks, and replicates the behavior of the in-place preserved marks. These can be consolidated. > > Testing: > - [x] tier1 > - [ ] tier2 > - [x] hotspot_gc Looks good. Just the question about PreservedMark::init. src/hotspot/share/gc/serial/markSweep.hpp line 203: > 201: PreservedMark(oop obj, markWord mark) : _obj(obj), _mark(mark) {} > 202: > 203: void init(oop obj, markWord mark) { Do we even need this `init` function anymore? Even if such an assignment is still needed, it could just use assignment from a newly constructed object, e.g. `pm = PreservedMark(obj, mark);`. The only place I found where it is used is preserve_marks, which can use the assignment form instead. ------------- Marked as reviewed by kbarrett (Reviewer).
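The suggestion to drop `init()` in favor of assignment from a temporary can be made concrete with hypothetical stand-ins for `oop` and `markWord` (not the actual HotSpot types):

```cpp
#include <cassert>

// Hypothetical stand-ins so the assignment pattern is runnable here;
// in HotSpot these are real oop/markWord types.
using oop = void*;
using markWord = unsigned long;

class PreservedMark {
  oop _obj;
  markWord _mark;
public:
  PreservedMark() : _obj(nullptr), _mark(0) {}
  PreservedMark(oop obj, markWord mark) : _obj(obj), _mark(mark) {}
  oop obj() const { return _obj; }
  markWord mark() const { return _mark; }
};

// Instead of slot.init(obj, mark), assign a freshly constructed value;
// same effect, one less member function to maintain.
inline void preserve(PreservedMark& slot, oop obj, markWord mark) {
  slot = PreservedMark(obj, mark);
}
```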
PR: https://git.openjdk.java.net/jdk/pull/5022 From github.com+39413832+weixlu at openjdk.java.net Tue Aug 10 07:14:30 2021 From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu) Date: Tue, 10 Aug 2021 07:14:30 GMT Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing In-Reply-To: References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: On Mon, 9 Aug 2021 13:43:31 GMT, Erik Österlund wrote: >> ZGC utilizes self-healing in load barrier to fix bad references. Currently, this fixing (ZBarrier::self_heal) adopts memory_order_conservative to guarantee that (1) the slow path (relocate, mark, etc., where addresses get healed) always happens before self healing, and (2) the other thread that accesses the same reference is able to access the healed address. >> Let us consider memory_order_release for ZBarrier::self_heal. For example, Thread 1 is fixing a reference, and Thread 2 attempts to access the same reference. There exists data dependency in T2 where access of pointer happens before access of object's content, equaling acquire semantic. Pairing with the release semantic in self healing, this makes up inter-thread acquire-release memory ordering. As a result, the two statements we mentioned above can be guaranteed by the acquire-release ordering. >> We performed an experiment on benchmark corretto/heapothesys on AArch64. The optimized version results in both (1) shorter average concurrent mark time and (2) shorter average concurrent relocation time. Furthermore, we notice shorter average latency in almost all testcases.
>> >> >> [root at localhost corretto]# grep "00.*Phase: Concurrent Mark " *.log >> baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Mark 960.359 / 960.359 587.203 / 1248.362 587.203 / 1248.362 587.203 / 1248.362 ms >> baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Mark 116.748 / 116.748 656.777 / 1736.469 656.777 / 1736.469 656.777 / 1736.469 ms >> baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Mark 125.460 / 125.460 620.440 / 1736.469 620.440 / 1736.469 620.440 / 1736.469 ms >> baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Mark 935.295 / 935.295 673.080 / 1736.469 673.080 / 1736.469 673.080 / 1736.469 ms >> baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Mark 1448.705 / 1448.705 723.484 / 1814.849 723.484 / 1814.849 723.484 / 1814.849 ms >> baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Mark 1490.123 / 1490.123 796.960 / 1842.794 796.960 / 1842.794 796.960 / 1842.794 ms >> baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 912.439 / 2183.065 867.799 / 2183.065 867.799 / 2183.065 ms >> baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Mark 1468.594 / 1468.594 990.044 / 2183.065 912.281 / 2183.065 912.281 / 2183.065 ms >> baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Mark 137.435 / 137.435 1109.116 / 2276.535 967.470 / 2276.535 967.470 / 2276.535 ms >> baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Mark 184.093 / 184.093 1172.446 / 2276.535 997.343 / 2276.535 997.343 / 2276.535 ms >> baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Mark 1537.673 / 1537.673 1211.815 / 2276.535 1013.076 / 2276.535 1013.076 / 2276.535 ms >> baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1218.085 / 2276.535 1025.443 / 2276.535 1025.443 / 2276.535 ms >> optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Mark 1053.065 / 1053.065 581.646 / 1249.822 581.646 / 1249.822 581.646 / 1249.822 ms >> 
optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Mark 885.795 / 885.795 573.650 / 1277.782 573.650 / 1277.782 573.650 / 1277.782 ms >> optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Mark 124.236 / 124.236 641.028 / 1828.124 641.028 / 1828.124 641.028 / 1828.124 ms >> optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Mark 875.383 / 875.383 666.465 / 1828.124 666.465 / 1828.124 666.465 / 1828.124 ms >> optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Mark 1937.305 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 ms >> optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Mark 173.064 / 173.064 771.387 / 1937.305 771.387 / 1937.305 771.387 / 1937.305 ms >> optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Mark 1832.584 / 1832.584 899.646 / 2048.471 856.838 / 2048.471 856.838 / 2048.471 ms >> optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Mark 1510.755 / 1510.755 981.807 / 2048.471 893.373 / 2048.471 893.373 / 2048.471 ms >> optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Mark 1472.737 / 1472.737 1044.755 / 2089.539 927.733 / 2089.539 927.733 / 2089.539 ms >> optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Mark 1513.077 / 1513.077 1095.827 / 2089.539 947.202 / 2089.539 947.202 / 2089.539 ms >> optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1073.703 / 2089.539 943.684 / 2089.539 943.684 / 2089.539 ms >> optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Mark 1337.865 / 1337.865 1119.936 / 2113.895 962.172 / 2113.895 962.172 / 2113.895 ms >> >> [root at localhost corretto]# grep "00.*Phase: Concurrent Relocate " *.log >> baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Relocate 196.522 / 196.522 114.318 / 245.371 114.318 / 245.371 114.318 / 245.371 ms >> baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Relocate 47.748 / 47.748 130.861 / 331.948 130.861 / 331.948 130.861 / 331.948 ms >> 
baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Relocate 56.922 / 56.922 129.174 / 331.948 129.174 / 331.948 129.174 / 331.948 ms >> baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Relocate 218.707 / 218.707 137.495 / 331.948 137.495 / 331.948 137.495 / 331.948 ms >> baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Relocate 197.166 / 197.166 144.216 / 359.644 144.216 / 359.644 144.216 / 359.644 ms >> baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Relocate 202.118 / 202.118 153.507 / 373.447 153.507 / 373.447 153.507 / 373.447 ms >> baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 172.241 / 395.113 164.291 / 395.113 164.291 / 395.113 ms >> baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Relocate 215.121 / 215.121 186.007 / 421.039 173.139 / 421.039 173.139 / 421.039 ms >> baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Relocate 48.550 / 48.550 203.420 / 421.982 181.899 / 421.982 181.899 / 421.982 ms >> baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Relocate 53.847 / 53.847 211.774 / 421.982 185.728 / 421.982 185.728 / 421.982 ms >> baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Relocate 224.489 / 224.489 218.195 / 431.088 188.087 / 431.088 188.087 / 431.088 ms >> baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 222.852 / 431.088 191.130 / 431.088 191.130 / 431.088 ms >> optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Relocate 193.811 / 193.811 113.043 / 248.471 113.043 / 248.471 113.043 / 248.471 ms >> optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Relocate 196.220 / 196.220 117.810 / 248.471 117.810 / 248.471 117.810 / 248.471 ms >> optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Relocate 48.786 / 48.786 131.753 / 351.890 131.753 / 351.890 131.753 / 351.890 ms >> optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Relocate 195.302 / 195.302 139.115 / 351.890 
139.115 / 351.890 139.115 / 351.890 ms >> optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Relocate 374.022 / 374.022 155.204 / 374.022 155.204 / 374.022 155.204 / 374.022 ms >> optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Relocate 49.222 / 49.222 159.444 / 400.795 159.444 / 400.795 159.444 / 400.795 ms >> optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Relocate 381.072 / 381.072 182.488 / 409.086 173.140 / 409.086 173.140 / 409.086 ms >> optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Relocate 223.399 / 223.399 191.774 / 409.086 175.748 / 409.086 175.748 / 409.086 ms >> optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Relocate 214.184 / 214.184 201.526 / 409.086 181.302 / 409.086 181.302 / 409.086 ms >> optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Relocate 208.600 / 208.600 207.389 / 420.479 183.756 / 420.479 183.756 / 420.479 ms >> optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Relocate 209.444 / 209.444 202.367 / 420.479 183.173 / 420.479 183.173 / 420.479 ms >> optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Relocate 223.841 / 223.841 206.268 / 420.479 185.074 / 420.479 185.074 / 420.479 ms >> >> [root at localhost corretto]# grep "average latency:" nohup_baseline.out >> average latency: 2ms:40us >> average latency: 6ms:550us >> average latency: 6ms:543us >> average latency: 6ms:493us >> average latency: 928us >> average latency: 794us >> average latency: 1ms:403us >> average latency: 23ms:216us >> average latency: 775us >> [root at localhost corretto]# grep "average latency:" nohup_optimized.out >> average latency: 2ms:48us >> average latency: 5ms:948us >> average latency: 5ms:940us >> average latency: 5ms:875us >> average latency: 850us >> average latency: 723us >> average latency: 1ms:221us >> average latency: 22ms:653us >> average latency: 693us > > A thread that copies the object and self heals will conceptually do the following, assuming relaxed memory 
ordering: > > copy(); > release(); > cas_forwarding_table(); > cas_self_heal(); > > The release before CASing in the forwarding table acts as a release for both accesses, in the scenario when the copy is being published. So in the scenario you describe, the release in the forwarding table is already enough to ensure that anyone reading the self-healed pointer is guaranteed to not observe bytes from before the copy. In the scenario when one thread performs the copy that gets published to the forwarding table, and another thread self-heals the pointer with the value acquired from the forwarding table, we will indeed not have a release to publish the pointer, only an acquire used to read from the forwarding table. However, this is fine, as the new MCA ARMv8 architecture does not allow causal consistency violations like WRC (cf. https://dl.acm.org/doi/pdf/10.1145/3158107 section 4). So we no longer need to use acquire/release to guarantee causal consistency across threads. This would naturally not hold for PPC, but there is no PPC port for ZGC yet. > > It is interesting though that when loading a self-healed pointer, we do not perform any acquire. That is fine when dereferencing the loaded pointer, as a dependent load, eliding the need for an acquire. And that is indeed fine for the JIT compiled code, because we know it is always a dependent load (or safe in other ways). However, for the C++ code, we cannot *guarantee* that there will be a dependent load in a spec-conforming way. That might be something to look into. In practice, there isn't any good reason why reading an oop and then dereferencing it wouldn't yield a dependent load, but the spec doesn't promise anything and could in theory allow compilers to mess this up. However, having an acquire for every oop load in the runtime does sound a bit costly.
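The conceptual sequence at the top of this message can be written with C++ atomics as a toy sketch (invented names; not ZGC's actual barrier code, which uses CAS rather than plain stores):

```cpp
#include <atomic>
#include <cassert>

// Toy sketch of copy(); release(); cas_forwarding_table(); cas_self_heal():
// a release store when publishing to the forwarding table orders the copy
// before any consumer that observes the pointer, and the self-heal store
// needs no stronger ordering. Invented names, not ZGC code.
struct Obj { int data; };

std::atomic<Obj*> forwarding_entry{nullptr};
std::atomic<Obj*> self_healing_ref{nullptr};

Obj* relocate_and_heal(Obj* from) {
  Obj* to = new Obj{from->data};                     // copy()
  // release(): the copy above happens-before any load that observes 'to'
  forwarding_entry.store(to, std::memory_order_release);
  // self-heal: release is sufficient; a reader's dependent load of the
  // healed pointer acts as the pairing acquire side
  self_healing_ref.store(to, std::memory_order_release);
  return to;
}
```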
The memory_order_consume semantics were supposed to solve this, but I'm not sure if the compilers have yet become good at doing something useful with that, other than just having it be equivalent to acquire. Might be something to check out in the disassembly to see what it yields. But that is an exercise for another day, as this isn't an issue you are introducing with this patch. > Hope this helps explain my thoughts in more detail. @fisk Thanks a lot for your detailed explanation. But I'm quite confused about the release in the forwarding table: why is it able to act as a release for both accesses? In my view, since the release is "bonded" to the forwarding table, it only ensures that the copy happens before installing the forwardee, so why does it have anything to do with self healing? From your explanation, I guess the release is not "bonded" to the forwarding table. Instead, it may serve as a membar to block all the CASes afterwards. Thanks again for your patient explanation. ------------- PR: https://git.openjdk.java.net/jdk/pull/5046 From stefank at openjdk.java.net Tue Aug 10 07:27:29 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Tue, 10 Aug 2021 07:27:29 GMT Subject: RFR: 8271946: Cleanup leftovers in Space and subclasses [v2] In-Reply-To: References: <06djYCvDj8KzCJVZ7KJqS8B9WtRBwl7hN1CMm4BZzXk=.23e2ee84-3f0f-4274-a470-6ea5b2d888cc@github.com> Message-ID: On Sat, 7 Aug 2021 11:08:55 GMT, Roman Kennke wrote: >> There are a number of methods in (Compactible)Space that seem trivially implemented. E.g. obj_size(), adjust_obj_size(), scanned_block_is_obj() etc. I believe they are leftovers from CMS (not sure) but looks like they can be replaced by obj->size() or removed altogether. Also, the related templated methods scan_and_forward(), scan_and_adjust_pointers() and scan_and_compact() can be directly and mechanically inlined into their callers. It makes the code more readable.
>> >> Testing: >> - [x] hotspot_gc >> - [x] tier1 >> - [ ] tier2 > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > Remove obsolete comments Marked as reviewed by stefank (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5031 From tschatzl at openjdk.java.net Tue Aug 10 07:43:37 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Aug 2021 07:43:37 GMT Subject: RFR: 8272083: G1: Record iterated range for BOT performance during card scan In-Reply-To: References: Message-ID: On Sat, 7 Aug 2021 04:56:08 GMT, Yude Lin wrote: > A fix to the problem in 8272083 is to use a per-worker pointer to indicate where the worker has scanned up to, similar to the _scanned_to variable. The difference is this pointer (I call it _iterated_to) records the end of the object and _scanned_to records the end of the scan. Since we always scan with increasing addresses, the end of the latest object scanned is also the address where BOT has fixed up to. So we avoid having to fix below this address when calling block_start(). This implementation approximately reduce the number of calls to set_offset_array() during scan_heap_roots() 2-10 times (in my casual test with -XX:G1ConcRefinementGreenZone=1000000). > > What this approach not solving is random access to BOT. So far I haven't found anything having this pattern. From what I understand from a brief look at the backwards scan prototype (I will continue a bit, and I may be wrong), the code actually implements scanning the cards backwards within claimed chunks. I wasn't really suggesting scanning the cards backwards, but claiming the chunks (i.e. the set of cards a given thread will look at) from top to bottom addresses within the region. Or just have the first claim be the top chunk, or, as a first task for a completely untouched region, have the thread explicitly update the BOT.
Or after claiming a chunk, first fix the BOT for that chunk, and then scan through. The first thread claiming the top chunk will do all the work, and by the time the other threads (working on other regions) have finished, the BOT is done and ready for use. AFAIK the BOT update work always needs to be done anyway, so you could imagine that it might be useful to do it as an entirely extra step (as needed). Note that I'm not sure if these ideas are actually any good :P I am also trying to understand a little how serious the problem is in terms of the actual performance hit from the duplicate work. It might be fairly high when not doing refinement at all, as you indicate. The reason is basically that the suggested change bleeds quite a bit into other non-card-scanning code, so I want to at least think about alternatives that may be limited to e.g. just the remset scan code. As for concurrent BOT update, I believe that the existing concurrent refinement code already concurrently updates the BOT as needed, so that shouldn't be an issue (I need to go look in detail). One idea could be that after allocating a PLAB (in old gen only of course) we enqueue some background task to fix these. We know the exact locations and sizes of those, so maybe this somewhat helps. We also know the chunk sizes that are used for scanning, so we know the problematic places. Something like the "root region scan" for marking (but we do not need to wait for it to complete during gc).
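One of the alternatives above (fix the BOT for a chunk right after claiming it, before scanning) can be modeled roughly like this (invented names; not G1's actual chunk-claiming or BOT code):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Rough model: workers claim chunks via an atomic counter; the claimer
// fixes the BOT for its chunk exactly once, then scans it. With this
// scheme the BOT work is never duplicated, however threads interleave.
// Invented names, not the real G1 code.
struct RegionScan {
  std::atomic<std::size_t> next_chunk{0};
  std::vector<bool> bot_fixed;

  explicit RegionScan(std::size_t chunks) : bot_fixed(chunks, false) {}

  // Claim the next unclaimed chunk; returns false when none are left.
  bool claim(std::size_t& idx) {
    idx = next_chunk.fetch_add(1);
    return idx < bot_fixed.size();
  }

  void scan_chunk(std::size_t idx) {
    bot_fixed[idx] = true;  // fix the BOT for this chunk first ...
    // ... then scan the cards of chunk 'idx' (omitted).
  }
};
```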
------------- PR: https://git.openjdk.java.net/jdk/pull/5039 From github.com+7947546+tanghaoth90 at openjdk.java.net Tue Aug 10 07:46:29 2021 From: github.com+7947546+tanghaoth90 at openjdk.java.net (Hao Tang) Date: Tue, 10 Aug 2021 07:46:29 GMT Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing In-Reply-To: References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: On Mon, 9 Aug 2021 13:43:31 GMT, Erik Österlund wrote: >> ZGC utilizes self-healing in load barrier to fix bad references. Currently, this fixing (ZBarrier::self_heal) adopts memory_order_conservative to guarantee that (1) the slow path (relocate, mark, etc., where addresses get healed) always happens before self healing, and (2) the other thread that accesses the same reference is able to access the healed address. >> Let us consider memory_order_release for ZBarrier::self_heal. For example, Thread 1 is fixing a reference, and Thread 2 attempts to access the same reference. There exists data dependency in T2 where access of pointer happens before access of object's content, equaling acquire semantic. Pairing with the release semantic in self healing, this makes up inter-thread acquire-release memory ordering. As a result, the two statements we mentioned above can be guaranteed by the acquire-release ordering. >> We performed an experiment on benchmark corretto/heapothesys on AArch64. The optimized version results in both (1) shorter average concurrent mark time and (2) shorter average concurrent relocation time. Furthermore, we notice shorter average latency in almost all testcases.
>> >> >> [root at localhost corretto]# grep "00.*Phase: Concurrent Mark " *.log >> baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Mark 960.359 / 960.359 587.203 / 1248.362 587.203 / 1248.362 587.203 / 1248.362 ms >> baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Mark 116.748 / 116.748 656.777 / 1736.469 656.777 / 1736.469 656.777 / 1736.469 ms >> baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Mark 125.460 / 125.460 620.440 / 1736.469 620.440 / 1736.469 620.440 / 1736.469 ms >> baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Mark 935.295 / 935.295 673.080 / 1736.469 673.080 / 1736.469 673.080 / 1736.469 ms >> baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Mark 1448.705 / 1448.705 723.484 / 1814.849 723.484 / 1814.849 723.484 / 1814.849 ms >> baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Mark 1490.123 / 1490.123 796.960 / 1842.794 796.960 / 1842.794 796.960 / 1842.794 ms >> baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 912.439 / 2183.065 867.799 / 2183.065 867.799 / 2183.065 ms >> baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Mark 1468.594 / 1468.594 990.044 / 2183.065 912.281 / 2183.065 912.281 / 2183.065 ms >> baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Mark 137.435 / 137.435 1109.116 / 2276.535 967.470 / 2276.535 967.470 / 2276.535 ms >> baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Mark 184.093 / 184.093 1172.446 / 2276.535 997.343 / 2276.535 997.343 / 2276.535 ms >> baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Mark 1537.673 / 1537.673 1211.815 / 2276.535 1013.076 / 2276.535 1013.076 / 2276.535 ms >> baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1218.085 / 2276.535 1025.443 / 2276.535 1025.443 / 2276.535 ms >> optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Mark 1053.065 / 1053.065 581.646 / 1249.822 581.646 / 1249.822 581.646 / 1249.822 ms >> 
optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Mark 885.795 / 885.795 573.650 / 1277.782 573.650 / 1277.782 573.650 / 1277.782 ms >> optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Mark 124.236 / 124.236 641.028 / 1828.124 641.028 / 1828.124 641.028 / 1828.124 ms >> optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Mark 875.383 / 875.383 666.465 / 1828.124 666.465 / 1828.124 666.465 / 1828.124 ms >> optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Mark 1937.305 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 ms >> optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Mark 173.064 / 173.064 771.387 / 1937.305 771.387 / 1937.305 771.387 / 1937.305 ms >> optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Mark 1832.584 / 1832.584 899.646 / 2048.471 856.838 / 2048.471 856.838 / 2048.471 ms >> optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Mark 1510.755 / 1510.755 981.807 / 2048.471 893.373 / 2048.471 893.373 / 2048.471 ms >> optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Mark 1472.737 / 1472.737 1044.755 / 2089.539 927.733 / 2089.539 927.733 / 2089.539 ms >> optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Mark 1513.077 / 1513.077 1095.827 / 2089.539 947.202 / 2089.539 947.202 / 2089.539 ms >> optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1073.703 / 2089.539 943.684 / 2089.539 943.684 / 2089.539 ms >> optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Mark 1337.865 / 1337.865 1119.936 / 2113.895 962.172 / 2113.895 962.172 / 2113.895 ms >> >> [root at localhost corretto]# grep "00.*Phase: Concurrent Relocate " *.log >> baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Relocate 196.522 / 196.522 114.318 / 245.371 114.318 / 245.371 114.318 / 245.371 ms >> baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Relocate 47.748 / 47.748 130.861 / 331.948 130.861 / 331.948 130.861 / 331.948 ms >> 
baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Relocate 56.922 / 56.922 129.174 / 331.948 129.174 / 331.948 129.174 / 331.948 ms >> baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Relocate 218.707 / 218.707 137.495 / 331.948 137.495 / 331.948 137.495 / 331.948 ms >> baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Relocate 197.166 / 197.166 144.216 / 359.644 144.216 / 359.644 144.216 / 359.644 ms >> baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Relocate 202.118 / 202.118 153.507 / 373.447 153.507 / 373.447 153.507 / 373.447 ms >> baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 172.241 / 395.113 164.291 / 395.113 164.291 / 395.113 ms >> baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Relocate 215.121 / 215.121 186.007 / 421.039 173.139 / 421.039 173.139 / 421.039 ms >> baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Relocate 48.550 / 48.550 203.420 / 421.982 181.899 / 421.982 181.899 / 421.982 ms >> baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Relocate 53.847 / 53.847 211.774 / 421.982 185.728 / 421.982 185.728 / 421.982 ms >> baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Relocate 224.489 / 224.489 218.195 / 431.088 188.087 / 431.088 188.087 / 431.088 ms >> baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 222.852 / 431.088 191.130 / 431.088 191.130 / 431.088 ms >> optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Relocate 193.811 / 193.811 113.043 / 248.471 113.043 / 248.471 113.043 / 248.471 ms >> optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Relocate 196.220 / 196.220 117.810 / 248.471 117.810 / 248.471 117.810 / 248.471 ms >> optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Relocate 48.786 / 48.786 131.753 / 351.890 131.753 / 351.890 131.753 / 351.890 ms >> optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Relocate 195.302 / 195.302 139.115 / 351.890 
139.115 / 351.890 139.115 / 351.890 ms >> optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Relocate 374.022 / 374.022 155.204 / 374.022 155.204 / 374.022 155.204 / 374.022 ms >> optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Relocate 49.222 / 49.222 159.444 / 400.795 159.444 / 400.795 159.444 / 400.795 ms >> optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Relocate 381.072 / 381.072 182.488 / 409.086 173.140 / 409.086 173.140 / 409.086 ms >> optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Relocate 223.399 / 223.399 191.774 / 409.086 175.748 / 409.086 175.748 / 409.086 ms >> optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Relocate 214.184 / 214.184 201.526 / 409.086 181.302 / 409.086 181.302 / 409.086 ms >> optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Relocate 208.600 / 208.600 207.389 / 420.479 183.756 / 420.479 183.756 / 420.479 ms >> optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Relocate 209.444 / 209.444 202.367 / 420.479 183.173 / 420.479 183.173 / 420.479 ms >> optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Relocate 223.841 / 223.841 206.268 / 420.479 185.074 / 420.479 185.074 / 420.479 ms >> >> [root at localhost corretto]# grep "average latency:" nohup_baseline.out >> average latency: 2ms:40us >> average latency: 6ms:550us >> average latency: 6ms:543us >> average latency: 6ms:493us >> average latency: 928us >> average latency: 794us >> average latency: 1ms:403us >> average latency: 23ms:216us >> average latency: 775us >> [root at localhost corretto]# grep "average latency:" nohup_optimized.out >> average latency: 2ms:48us >> average latency: 5ms:948us >> average latency: 5ms:940us >> average latency: 5ms:875us >> average latency: 850us >> average latency: 723us >> average latency: 1ms:221us >> average latency: 22ms:653us >> average latency: 693us > > A thread that copies the object and self heals will conceptually do the following, assuming relaxed memory 
ordering:
>
> copy();
> release();
> cas_forwarding_table();
> cas_self_heal();
>
> The release before CASing into the forwarding table acts as a release for both accesses in the scenario when the copy is being published. So in the scenario you describe, the release in the forwarding table is already enough to ensure that anyone reading the self-healed pointer is guaranteed to not observe bytes from before the copy. In the scenario when one thread performs the copy that gets published to the forwarding table, and another thread self-heals the pointer with the value acquired from the forwarding table, we will indeed not have a release to publish the pointer, only an acquire used to read from the forwarding table. However, this is fine, as the new MCA ARMv8 architecture does not allow causal consistency violations like WRC (cf. https://dl.acm.org/doi/pdf/10.1145/3158107 section 4). So we no longer need to use acquire/release to guarantee causal consistency across threads. This would naturally not hold for PPC, but there is no PPC port for ZGC yet. > > It is interesting though that when loading a self-healed pointer, we do not perform any acquire. That is fine when dereferencing the loaded pointer, as a dependent load, eliding the need for an acquire. And that is indeed fine for the JIT compiled code, because we know it is always a dependent load (or safe in other ways). However, for the C++ code, we can not *guarantee* that there will be a dependent load in a spec-conforming way. That might be something to look into. In practice, there isn't any good reason why reading an oop and then dereferencing it wouldn't yield a dependent load, but the spec doesn't promise anything and could in theory allow compilers to mess this up. However, having an acquire for every oop load in the runtime does sound a bit costly. 
The memory_order_consume semantics were supposed to solve this, but I'm not sure if the compilers have yet become good at doing something useful with that, other than just having it be equivalent to acquire. Might be something to check out in the disassembly to see what it yields. But that is an exercise for another day, as this isn't an issue you are introducing with this patch. > > Hope this helps explain my thoughts in more detail. @fisk Hi, Erik. We are wondering if one thread loading a healed pointer can observe that the corresponding copy has not finished yet. Assuming relaxed ordering for `cas_self_heal`, both Thread A and Thread B are loading the same reference. **Thread A**: `load obj.fld; // will relocate the object referenced by obj.fld` Thread A will do the following:
1 copy();
2 cas_forwarding_table(); // release
3 cas_self_heal(); // relaxed
**Thread B**: `load obj.fld; // load the same reference` Thread B may observe the following reordering of **Thread A**:
3 cas_self_heal(); // relaxed
1 copy();
2 cas_forwarding_table(); // release
To our knowledge, release ordering in _line 2_ does not prevent _line 3_ from being reordered before _line 1_, which indicates that the release in the forwarding table is not enough. Perhaps we need to add acquire ordering to _line 2_ or add release ordering to _line 3_. Put another way, as @weixlu said, > Instead, it maybe serves as membar to block all the CASes afterwards. relaxed ordering in _line 2_ along with release ordering in _line 3_ can indeed ensure that Thread B always observes the object copy. Looking forward to your advice. 
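The protocol being debated can be modeled in portable C++ with `std::atomic`. This is a toy sketch only, not ZGC code: `Object`, `field`, `to_space`, and the function names are all invented for illustration. It shows the variant where the self-heal CAS (line 3) carries the release, so a thread that loads the healed pointer is guaranteed to see the completed copy; the reader uses an explicit acquire, standing in for the address dependency the JIT-compiled barrier relies on.

```cpp
#include <atomic>
#include <cassert>

struct Object { int payload; };

std::atomic<Object*> field{nullptr};  // the reference being healed
Object to_space{0};                   // destination of the copy

// Thread A: relocate, then self-heal (lines 1-3 from the discussion above).
void relocate_and_heal(Object* bad, int value) {
    to_space.payload = value;  // 1: copy()
    // 2: cas_forwarding_table() would go here (relaxed in the variant discussed)
    Object* expected = bad;
    // 3: cas_self_heal() with release: publishes the copy together with the
    //    healed pointer, so line 3 cannot be reordered before line 1
    field.compare_exchange_strong(expected, &to_space,
                                  std::memory_order_release,
                                  std::memory_order_relaxed);
}

// Thread B: loads the same reference; acquire pairs with the release above.
int read_payload(Object* bad) {
    Object* p = field.load(std::memory_order_acquire);
    return (p != nullptr && p != bad) ? p->payload : -1;
}
```

If line 3 were relaxed instead, the store of `payload` could be ordered after the self-heal CAS on a weakly ordered machine, which is exactly the reordering Thread B worries about above.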
------------- PR: https://git.openjdk.java.net/jdk/pull/5046 From tschatzl at openjdk.java.net Tue Aug 10 08:12:00 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Aug 2021 08:12:00 GMT Subject: RFR: 8271880: Tighten condition for excluding regions from collecting cards with cross-references [v3] In-Reply-To: References: Message-ID: > Hi, > > can I have reviews for this change that by tightening the condition for excluding regions from collecting cards with cross-references allows us to avoid the rescanning of objects that failed evacuation in the fix up self-forwards phase after evacuation failure. > > I.e. during gc g1 never collects cards/references from the young gen (including eden) for later refinement - which means that we need to rescan all live objects remaining in eden regions for cross-references. > > The problem or the reason why we did that was that we did not want to add cards to refine from survivor regions (i.e. next gc's young gen) because we just don't need to as we always collect young gen, so references from there need not be recorded in the remembered sets (and actually, if we did, we erroneously marked cards in young gen which later card table verification will not like) - but we did not have that information on hand anywhere already quickly accessible. > > This change solves that problem by actually recording this information in the region attribute table as "NewSurvivor" type region. "NewSurvivor" because I did want to make explicit that these are the survivor regions from the *new* (next) young generation (i.e. just survivor) and not the survivor regions of the previous gc (that were turned eden at the start of this gc) but something like "NewYoung" or so would be fine with me as well (or certainly just "Survivor", but that might be confusing). 
> > Another interesting addition is probably the new assert in `G1ParScanThreadState::enqueue_card_if_tracked` > > > assert(!_g1h->heap_region_containing(o)->in_collection_set(), "Should not try to enqueue reference into collection set region"); > > This, at this time, verifies the assumption that g1 is not trying to collect references *to* the collection set, i.e. other objects that failed evacuation - after all we later relabel their regions as old *without* a remembered set; we would otherwise do so unnecessarily because (currently) cset tracking for these regions is enabled (at least during gc - we only later relabel and drop the remembered sets). > > This might change later if we want to move evacuation failed regions into survivor (or keep their remembered sets for some reason), but for now we filter attempts to add cards in the dcqs for those this way. > > Testing: tier1-5, gc/g1 with `JAVA_OPTIONS_=-XX:+G1EvacuationFailureALot -XX:+VerifyAfterGC`. > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Clarify comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5037/files - new: https://git.openjdk.java.net/jdk/pull/5037/files/b38c11e4..b9fd95a9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5037&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5037&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5037.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5037/head:pull/5037 PR: https://git.openjdk.java.net/jdk/pull/5037 From tschatzl at openjdk.java.net Tue Aug 10 08:30:24 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Aug 2021 08:30:24 GMT Subject: RFR: 8271880: Tighten condition for excluding regions from collecting cards with cross-references [v4] In-Reply-To: References: Message-ID: > Hi, > > can I have reviews for this
change that by tightening the condition for excluding regions from collecting cards with cross-references allows us to avoid the rescanning of objects that failed evacuation in the fix up self-forwards phase after evacuation failure. > > I.e. during gc g1 never collects cards/references from the young gen (including eden) for later refinement - which means that we need to rescan all live objects remaining in eden regions for cross-references. > > The problem or the reason why we did that was that we did not want to add cards to refine from survivor regions (i.e. next gc's young gen) because we just don't need to as we always collect young gen, so references from there need not be recorded in the remembered sets (and actually, if we did, we erroneously marked cards in young gen which later card table verification will not like) - but we did not have that information on hand anywhere already quickly accessible. > > This change solves that problem by actually recording this information in the region attribute table as "NewSurvivor" type region. "NewSurvivor" because I did want to make explicit that these are the survivor regions from the *new* (next) young generation (i.e. just survivor) and not the survivor regions of the previous gc (that were turned eden at the start of this gc) but something like "NewYoung" or so would be fine with me as well (or certainly just "Survivor", but that might be confusing). > > Another interesting addition is probably the new assert in `G1ParScanThreadState::enqueue_card_if_tracked` > > > assert(!_g1h->heap_region_containing(o)->in_collection_set(), "Should not try to enqueue reference into collection set region"); > > This, at this time, verifies the assumption that g1 is not trying to collect references *to* the collection set, i.e.
other objects that failed evacuation - after all we later relabel their regions as old *without* a remembered set; we would otherwise do so unnecessarily because (currently) cset tracking for these regions is enabled (at least during gc - we only later relabel and drop the remembered sets). > > This might change later if we want to move evacuation failed regions into survivor (or keep their remembered sets for some reason), but for now we filter attempts to add cards in the dcqs for those this way. > > Testing: tier1-5, gc/g1 with `JAVA_OPTIONS_=-XX:+G1EvacuationFailureALot -XX:+VerifyAfterGC`. > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: More updates to comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5037/files - new: https://git.openjdk.java.net/jdk/pull/5037/files/b9fd95a9..0187da74 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5037&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5037&range=02-03 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5037.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5037/head:pull/5037 PR: https://git.openjdk.java.net/jdk/pull/5037 From pliden at openjdk.java.net Tue Aug 10 08:32:02 2021 From: pliden at openjdk.java.net (Per Liden) Date: Tue, 10 Aug 2021 08:32:02 GMT Subject: RFR: 8267186: Add string deduplication support to ZGC [v3] In-Reply-To: References: Message-ID: > This change adds support for string deduplication to ZGC. It's a pretty straightforward change, but to make reviewing even easier it is broken up into two commits: > > __ZGC: Introduce ZMarkContext__ > This commit just moves the `ZMarkCache` into the new `ZMarkContext` class, and we now pass a `ZMarkContext*` around instead of a `ZMarkCache*`. 
The `ZMarkContext` class is a more general container for worker-local data, which in the next commit will be extended to also include the `StringDedup::Requests` queue. > > __8267186: Add string deduplication support to ZGC__ > This commit adds the actual string dedup functionality and enables relevant tests. We use the `deduplication_requested` bit in the `String` object to filter out `Strings` we've already attempted to deduplicate. > > Testing: > - Passes all string dedup tests. > - Passes Tier1-7 with ZGC on Linux/x86_64 (with -XX:+UseStringDeduplication enabled by default to get better exposure). Per Liden has updated the pull request incrementally with one additional commit since the last revision: Fix include order ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5029/files - new: https://git.openjdk.java.net/jdk/pull/5029/files/c8cd63ee..c164f2c9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5029&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5029&range=01-02 Stats: 3 lines in 1 file changed: 2 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5029.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5029/head:pull/5029 PR: https://git.openjdk.java.net/jdk/pull/5029 From pliden at openjdk.java.net Tue Aug 10 08:32:06 2021 From: pliden at openjdk.java.net (Per Liden) Date: Tue, 10 Aug 2021 08:32:06 GMT Subject: RFR: 8267186: Add string deduplication support to ZGC [v2] In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 01:44:32 GMT, Kim Barrett wrote: >> Per Liden has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments from Erik > > src/hotspot/share/gc/z/zMarkContext.inline.hpp line 29: > >> 27: #include "classfile/javaClasses.inline.hpp" >> 28: #include "gc/z/zMarkCache.inline.hpp" >> 29: #include "gc/z/zMarkContext.hpp" > > The associated .hpp file is supposed to be included first by a .inline.hpp file, per 
JDK-8267464. Ah, forgot about that new rule. Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5029 From tschatzl at openjdk.java.net Tue Aug 10 08:41:08 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Aug 2021 08:41:08 GMT Subject: RFR: 8271880: Tighten condition for excluding regions from collecting cards with cross-references [v5] In-Reply-To: References: Message-ID: > Hi, > > can I have reviews for this change that by tightening the condition for excluding regions from collecting cards with cross-references allows us to avoid the rescanning of objects that failed evacuation in the fix up self-forwards phase after evacuation failure. > > I.e. during gc g1 never collects cards/references from the young gen (including eden) for later refinement - which means that we need to rescan all live objects remaining in eden regions for cross-references. > > The problem or the reason why we did that was that we did not want to add cards to refine from survivor regions (i.e. next gc's young gen) because we just don't need to as we always collect young gen, so references from there need not be recorded in the remembered sets (and actually, if we did, we erroneously marked cards in young gen which later card table verification will not like) - but we did not have that information on hand anywhere already quickly accessible. > > This change solves that problem by actually recording this information in the region attribute table as "NewSurvivor" type region. "NewSurvivor" because I did want to make explicit that these are the survivor regions from the *new* (next) young generation (i.e. just survivor) and not the survivor regions of the previous gc (that were turned eden at the start of this gc) but something like "NewYoung" or so would be fine with me as well (or certainly just "Survivor", but that might be confusing). 
> > Another interesting addition is probably the new assert in `G1ParScanThreadState::enqueue_card_if_tracked` > > > assert(!_g1h->heap_region_containing(o)->in_collection_set(), "Should not try to enqueue reference into collection set region"); > > This, at this time, verifies the assumption that g1 is not trying to collect references *to* the collection set, i.e. other objects that failed evacuation - after all we later relabel their regions as old *without* a remembered set; we would otherwise do so unnecessarily because (currently) cset tracking for these regions is enabled (at least during gc - we only later relabel and drop the remembered sets). > > This might change later if we want to move evacuation failed regions into survivor (or keep their remembered sets for some reason), but for now we filter attempts to add cards in the dcqs for those this way. > > Testing: tier1-5, gc/g1 with `JAVA_OPTIONS_=-XX:+G1EvacuationFailureALot -XX:+VerifyAfterGC`. > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - Merge branch 'master' into evac-failure-no-scan-during-remove-self-forwards - More updates to comment - Clarify comment - ayang review - Some random changes - No scan during self forward ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5037/files - new: https://git.openjdk.java.net/jdk/pull/5037/files/0187da74..880d0067 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5037&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5037&range=03-04 Stats: 6080 lines in 117 files changed: 5342 ins; 338 del; 400 mod Patch: https://git.openjdk.java.net/jdk/pull/5037.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5037/head:pull/5037 PR: https://git.openjdk.java.net/jdk/pull/5037 From ayang at openjdk.java.net Tue Aug 10 08:49:49 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 10 Aug 2021 08:49:49 GMT Subject: RFR: 8272196: Remove unused class ParStrongRootsScope Message-ID: Trivial change of removing an unused class definition. ------------- Commit messages: - pgc Changes: https://git.openjdk.java.net/jdk/pull/5064/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5064&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272196 Stats: 15 lines in 2 files changed: 0 ins; 15 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5064.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5064/head:pull/5064 PR: https://git.openjdk.java.net/jdk/pull/5064 From tschatzl at openjdk.java.net Tue Aug 10 08:55:33 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Aug 2021 08:55:33 GMT Subject: RFR: 8272196: Remove unused class ParStrongRootsScope In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 08:42:57 GMT, Albert Mingkun Yang wrote: > Trivial change of removing an unused class definition. Lgtm and trivial. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5064 From pliden at openjdk.java.net Tue Aug 10 08:59:59 2021 From: pliden at openjdk.java.net (Per Liden) Date: Tue, 10 Aug 2021 08:59:59 GMT Subject: RFR: 8271862: C2 intrinsic for Reference.refersTo() is often not used [v2] In-Reply-To: References: Message-ID: > While fixing JDK-8270626 it was discovered that the C2 intrinsic for `Reference.refersTo()` is not used as often as one would think. Instead C2 often prefers using the native implementation, which is much slower and defeats the purpose of having an intrinsic in the first place. > > The problem arises from having `refersTo0()` be virtual, native and @IntrinsicCandidate. This can be worked around by introducing a virtual method layer so that `refersTo0()` can be made `final`. That's what this patch does, by adding a virtual `refersToImpl()` which in turn calls the now final `refersTo0()`. > > It should be noted that `Object.clone()` and `Object.hashCode()` are also virtual, native and @IntrinsicCandidate and these methods get special treatment by C2 to get the intrinsic generated more optimally. We might want to do a similar optimization for `Reference.refersTo0()`. In that case, it is suggested that such an improvement could come later and be handled as a separate enhancement, and @kimbarrett has already filed JDK-8272140 to track that. > > Testing: > I found this hard to test without instrumenting Hotspot to tell me that the intrinsic was called instead of the native method. So, for that reason testing was ad-hoc, using an instrumented Hotspot in combination with the test (`vmTestbase/gc/gctests/PhantomReference/phantom002/TestDescription.java`) that initially uncovered the problem. See JDK-8270626 for additional information. 
Per Liden has updated the pull request incrementally with one additional commit since the last revision: Private implies final ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5052/files - new: https://git.openjdk.java.net/jdk/pull/5052/files/66743f76..3bc3d5e5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5052&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5052&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5052.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5052/head:pull/5052 PR: https://git.openjdk.java.net/jdk/pull/5052 From pliden at openjdk.java.net Tue Aug 10 09:00:00 2021 From: pliden at openjdk.java.net (Per Liden) Date: Tue, 10 Aug 2021 09:00:00 GMT Subject: RFR: 8271862: C2 intrinsic for Reference.refersTo() is often not used [v2] In-Reply-To: References: Message-ID: On Mon, 9 Aug 2021 19:59:21 GMT, Vladimir Ivanov wrote: >> Per Liden has updated the pull request incrementally with one additional commit since the last revision: >> >> Private implies final > > src/java.base/share/classes/java/lang/ref/Reference.java line 379: > >> 377: >> 378: @IntrinsicCandidate >> 379: private native final boolean refersTo0(Object o); > > No need to have it both `private` and `final`. Just `private` should be enough. Good point. Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5052 From pliden at openjdk.java.net Tue Aug 10 09:02:30 2021 From: pliden at openjdk.java.net (Per Liden) Date: Tue, 10 Aug 2021 09:02:30 GMT Subject: RFR: 8272196: Remove unused class ParStrongRootsScope In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 08:42:57 GMT, Albert Mingkun Yang wrote: > Trivial change of removing an unused class definition. Agree, good and trivial. ------------- Marked as reviewed by pliden (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5064 From eosterlund at openjdk.java.net Tue Aug 10 09:16:30 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 10 Aug 2021 09:16:30 GMT Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing In-Reply-To: References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: On Mon, 9 Aug 2021 13:43:31 GMT, Erik Österlund wrote: >> ZGC utilizes self-healing in load barrier to fix bad references. Currently, this fixing (ZBarrier::self_heal) adopts memory_order_conservative to guarantee that (1) the slow path (relocate, mark, etc., where addresses get healed) always happens before self healing, and (2) the other thread that accesses the same reference is able to access the healed address. >> Let us consider memory_order_release for ZBarrier::self_heal. For example, Thread 1 is fixing a reference, and Thread 2 attempts to access the same reference. There exists data dependency in T2 where access of pointer happens before access of object's content, equaling acquire semantic. Pairing with the release semantic in self healing, this makes up inter-thread acquire-release memory ordering. As a result, the two statements we mentioned above can be guaranteed by the acquire-release ordering. >> We performed an experiment on benchmark corretto/heapothesys on AArch64. The optimized version results in both (1) shorter average concurrent mark time and (2) shorter average concurrent relocation time. Furthermore, we notice shorter average latency in almost all testcases. 
139.115 / 351.890 139.115 / 351.890 ms >> optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Relocate 374.022 / 374.022 155.204 / 374.022 155.204 / 374.022 155.204 / 374.022 ms >> optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Relocate 49.222 / 49.222 159.444 / 400.795 159.444 / 400.795 159.444 / 400.795 ms >> optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Relocate 381.072 / 381.072 182.488 / 409.086 173.140 / 409.086 173.140 / 409.086 ms >> optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Relocate 223.399 / 223.399 191.774 / 409.086 175.748 / 409.086 175.748 / 409.086 ms >> optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Relocate 214.184 / 214.184 201.526 / 409.086 181.302 / 409.086 181.302 / 409.086 ms >> optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Relocate 208.600 / 208.600 207.389 / 420.479 183.756 / 420.479 183.756 / 420.479 ms >> optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Relocate 209.444 / 209.444 202.367 / 420.479 183.173 / 420.479 183.173 / 420.479 ms >> optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Relocate 223.841 / 223.841 206.268 / 420.479 185.074 / 420.479 185.074 / 420.479 ms >> >> [root at localhost corretto]# grep "average latency:" nohup_baseline.out >> average latency: 2ms:40us >> average latency: 6ms:550us >> average latency: 6ms:543us >> average latency: 6ms:493us >> average latency: 928us >> average latency: 794us >> average latency: 1ms:403us >> average latency: 23ms:216us >> average latency: 775us >> [root at localhost corretto]# grep "average latency:" nohup_optimized.out >> average latency: 2ms:48us >> average latency: 5ms:948us >> average latency: 5ms:940us >> average latency: 5ms:875us >> average latency: 850us >> average latency: 723us >> average latency: 1ms:221us >> average latency: 22ms:653us >> average latency: 693us > > A thread that copies the object and self heals will conceptually do the following, assuming relaxed memory 
ordering: > > copy(); > release(); > cas_forwarding_table(); > cas_self_heal(); > > The release before CASing in the forwarding table acts as a release for both accesses, in the scenario when the copy is being published. So in the scenario you describe, the release in the forwarding table is already enough to ensure that anyone reading the self healed pointer is guaranteed to not observe bytes from before the copy. In the scenario when one thread performs the copy that gets published to the forwarding table, and another thread self-heals the pointer with the value acquired from the forwarding table, we will indeed not have a release to publish the pointer, only an acquire used to read from the forwarding table. However, this is fine, as the new MCA ARMv8 architecture does not allow causal consistency violations like WRC (cf. https://dl.acm.org/doi/pdf/10.1145/3158107 section 4). So we no longer need to use acquire/release to guarantee causal consistency across threads. This would naturally not hold for PPC, but there is no PPC port for ZGC yet. > > It is interesting though that when loading a self-healed pointer, we do not perform any acquire. That is fine when dereferencing the loaded pointer, as a dependent load, eliding the need for an acquire. And that is indeed fine for the JIT compiled code, because we know it is always a dependent load (or safe in other ways). However, for the C++ code, we can not *guarantee* that there will be a dependent load in a spec conforming way. That might be something to look into. In practice, there isn't any good reason why reading an oop and then dereferencing it wouldn't yield a dependent load, but the spec doesn't promise anything and could in theory allow compilers to mess this up. However, having an acquire for every oop load in the runtime does sound a bit costly. 
The memory_order_consume semantics were supposed to solve this, but I'm not sure if the compilers have yet become good at doing something useful with that, other than just having it be equivalent to acquire. Might be something to check out in the disassembly to see what it yields. But that is an exercise for another day, as this isn't an issue you are introducing with this patch. > > Hope this helps explain my thoughts in more detail. > @fisk Thanks a lot for your detailed explanation. But I'm quite confused about the release in the forwarding table, why is it able to act as a release for both accesses? In my view, since the release is "bonded" to the forwarding table, it only ensures that the copy happens before installing the forwardee, why does it have something to do with self healing? From your explanation, I guess the release is not "bonded" to the forwarding table. Instead, it maybe serves as a membar to block all the CASes afterwards. Thanks again for your patient explanation. What I wrote kind of assumes that we have an explicit OrderAccess::release(); relaxed_cas(); in the forwarding table. Looks like we currently have a bounded releasing cas instead. The unbounded release is guaranteed to provide release semantics for *any* subsequent store operation. However, a bounded Store-Release (i.e. stlr) instruction only needs to provide the release semantics to the bound store, as you say. So I suppose what I propose is to use release(); relaxed_cas(); in the forwarding table, and then also relaxed_cas(); in the self healing. 
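The proposed scheme (one unbound release() covering both relaxed CASes) can be sketched with std::atomic. This is a hypothetical illustration, not HotSpot/ZGC code; the types and names are invented, and the reader side uses an explicit acquire load where the real JIT-compiled barrier would rely on a dependent load:

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-ins for a relocated object and its heap slot.
struct Copy { int payload; };

int run_heal_demo() {
  std::atomic<Copy*> forwarding_entry{nullptr};  // forwarding table slot
  std::atomic<Copy*> heap_ref{nullptr};          // the reference being healed
  Copy the_copy{0};

  std::thread relocator([&] {
    the_copy.payload = 42;                               // copy();
    std::atomic_thread_fence(std::memory_order_release); // unbound release()
    Copy* e = nullptr;
    // relaxed_cas() into the forwarding table: the fence above already
    // orders the copy before this store...
    forwarding_entry.compare_exchange_strong(e, &the_copy,
                                             std::memory_order_relaxed);
    e = nullptr;
    // ...and before the self-heal store, so it can be relaxed as well.
    heap_ref.compare_exchange_strong(e, &the_copy,
                                     std::memory_order_relaxed);
  });

  Copy* p;
  // The acquire load synchronizes with the release fence through the
  // relaxed store it reads from.
  while ((p = heap_ref.load(std::memory_order_acquire)) == nullptr) {}
  int seen = p->payload;  // cannot observe bytes from before the copy
  relocator.join();
  return seen;
}
```

Here run_heal_demo() always returns 42: the single fence publishes the copy for both the forwarding-table install and the self-heal, so neither CAS needs its own release.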
------------- PR: https://git.openjdk.java.net/jdk/pull/5046 From eosterlund at openjdk.java.net Tue Aug 10 09:32:30 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 10 Aug 2021 09:32:30 GMT Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing In-Reply-To: References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: On Tue, 10 Aug 2021 09:13:58 GMT, Erik Österlund wrote: >> A thread that copies the object and self heals will conceptually do the following, assuming relaxed memory ordering: >> >> copy(); >> release(); >> cas_forwarding_table(); >> cas_self_heal(); >> >> The release before CASing in the forwarding table acts as a release for both accesses, in the scenario when the copy is being published. So in the scenario you describe, the release in the forwarding table is already enough to ensure that anyone reading the self healed pointer is guaranteed to not observe bytes from before the copy. In the scenario when one thread performs the copy that gets published to the forwarding table, and another thread self-heals the pointer with the value acquired from the forwarding table, we will indeed not have a release to publish the pointer, only an acquire used to read from the forwarding table. However, this is fine, as the new MCA ARMv8 architecture does not allow causal consistency violations like WRC (cf. https://dl.acm.org/doi/pdf/10.1145/3158107 section 4). So we no longer need to use acquire/release to guarantee causal consistency across threads. This would naturally not hold for PPC, but there is no PPC port for ZGC yet. >> >> It is interesting though that when loading a self-healed pointer, we do not perform any acquire. That is fine when dereferencing the loaded pointer, as a dependent load, eliding the need for an acquire. And that is indeed fine for the JIT compiled code, because we know it is always a dependent load (or safe in other ways). 
However, for the C++ code, we can not *guarantee* that there will be a dependent load in a spec conforming way. That might be something to look into. In practice, there isn't any good reason why reading an oop and then dereferencing it wouldn't yield a dependent load, but the spec doesn't promise anything and could in theory allow compilers to mess this up. However, having an acquire for every oop load in the runtime does sound a bit costly. The memory_order_consume semantics were supposed to solve this, but I'm not sure if the compilers have yet become good at doing something useful with that, other than just having it be equivalent to acquire. Might be something to check out in the disassembly to see what it yields. But that is an exercise for another day, as this isn't an issue you are introducing with this patch. >> >> Hope this helps explain my thoughts in more detail. > >> @fisk Thanks a lot for your detailed explanation. But I'm quite confused about the release in the forwarding table, why is it able to act as a release for both accesses? In my view, since the release is "bonded" to the forwarding table, it only ensures that the copy happens before installing the forwardee, why does it have something to do with self healing? From your explanation, I guess the release is not "bonded" to the forwarding table. Instead, it maybe serves as a membar to block all the CASes afterwards. Thanks again for your patient explanation. > > What I wrote kind of assumes that we have an explicit OrderAccess::release(); relaxed_cas(); in the forwarding table. Looks like we currently have a bounded releasing cas instead. The unbounded release is guaranteed to provide release semantics for *any* subsequent store operation. However, a bounded Store-Release (i.e. stlr) instruction only needs to provide the release semantics to the bound store, as you say. So I suppose what I propose is to use release(); relaxed_cas(); in the forwarding table, and then also relaxed_cas(); in the self healing. 
Letting the release be done in only one place, and only when actual copying happens. The vast majority of self heals I have found are purely remapping the pointer lazily, which can then dodge any need for release when self healing. Hope this makes sense. > @fisk Hi, Erik. We are wondering if one thread loading a healed pointer can observe that the corresponding copy has not finished yet. Assuming relaxed ordering for `cas_self_heal`, both Thread A and Thread B are loading the same reference. > > **Thread A**: `load obj.fld; // will relocate the object referenced by obj.fld` > thread A will do the following: > > ``` > 1 copy(); > 2 cas_forwarding_table(); // release > 3 cas_self_heal(); // relaxed > ``` > > **Thread B**: `load obj.fld; // load the same reference` > thread B may observe the following reordering of **thread A**: > > ``` > 3 cas_self_heal(); // relaxed > 1 copy(); > 2 cas_forwarding_table(); // release > ``` > > To our knowledge, release ordering in _line 2_ does not prevent _line 3_ from being reordered before _line 1_, which indicates the release in the forwarding table is not enough. Perhaps we need to add acquire ordering to _line 2_ or add release ordering to _line 3_. > > In another way, as @weixlu said, > > > Instead, it maybe serves as a membar to block all the CASes afterwards. > > relaxed ordering in _line 2_ along with release ordering in _line 3_ can indeed ensure thread B always observes the object copy. > > Looking forward to your advice. Yeah so I was kind of assuming the forwarding table installation would be an unbound release(); relaxed_cas(); That way, the release serves a dual purpose: releasing for the table and releasing for the self heal. That way the vast majority of self heals (that only do remapping) won't need to release.
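For comparison, the ordering the patch title refers to (release on the self-heal CAS itself, i.e. "add release ordering to line 3" in the scenario above) can be sketched the same way. Again a hypothetical std::atomic illustration, not ZGC code; the forwarding-table CAS is omitted for brevity and all names are invented:

```cpp
#include <atomic>
#include <thread>

// Hypothetical object copy and the heap slot being healed.
struct Obj { int field; };

int run_release_selfheal_demo() {
  std::atomic<Obj*> heap_ref{nullptr};
  Obj the_copy{0};

  std::thread thread_a([&] {
    the_copy.field = 7;  // 1: copy()
    Obj* e = nullptr;
    // 3: cas_self_heal() with release ordering: the copy (line 1) cannot
    // be reordered after this store, which addresses Thread B's concern.
    heap_ref.compare_exchange_strong(e, &the_copy,
                                     std::memory_order_release);
  });

  // Thread B: the acquire load pairs with the releasing self-heal CAS
  // (in the real barrier, the data-dependent load plays this role).
  Obj* p;
  while ((p = heap_ref.load(std::memory_order_acquire)) == nullptr) {}
  int seen = p->field;  // always observes the completed copy
  thread_a.join();
  return seen;
}
```

With the release on the self-heal store, run_release_selfheal_demo() always returns 7 regardless of how the two threads interleave.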
------------- PR: https://git.openjdk.java.net/jdk/pull/5046 From ayang at openjdk.java.net Tue Aug 10 10:16:35 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 10 Aug 2021 10:16:35 GMT Subject: RFR: 8272196: Remove unused class ParStrongRootsScope In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 08:42:57 GMT, Albert Mingkun Yang wrote: > Trivial change of removing an unused class definition. Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/5064 From ayang at openjdk.java.net Tue Aug 10 10:16:35 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 10 Aug 2021 10:16:35 GMT Subject: Integrated: 8272196: Remove unused class ParStrongRootsScope In-Reply-To: References: Message-ID: <-YF56PKBuc3OWnlCF9OSIoPKHegtCWxqu_uBT67KR0Y=.724a8342-b242-442c-8c43-cd9d1206db88@github.com> On Tue, 10 Aug 2021 08:42:57 GMT, Albert Mingkun Yang wrote: > Trivial change of removing an unused class definition. This pull request has now been integrated. Changeset: f2599ad8 Author: Albert Mingkun Yang URL: https://git.openjdk.java.net/jdk/commit/f2599ad867476f11fdc455084bb64ab6e91fa146 Stats: 15 lines in 2 files changed: 0 ins; 15 del; 0 mod 8272196: Remove unused class ParStrongRootsScope Reviewed-by: tschatzl, pliden ------------- PR: https://git.openjdk.java.net/jdk/pull/5064 From ayang at openjdk.java.net Tue Aug 10 10:17:41 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 10 Aug 2021 10:17:41 GMT Subject: RFR: 8272216: G1: replace G1ParScanThreadState::_dest with a constant Message-ID: Simple change of removing an array with statically-known identical elements. 
Test: hotspot_gc ------------- Commit messages: - dest Changes: https://git.openjdk.java.net/jdk/pull/5065/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5065&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272216 Stats: 17 lines in 2 files changed: 2 ins; 14 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5065.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5065/head:pull/5065 PR: https://git.openjdk.java.net/jdk/pull/5065 From ayang at openjdk.java.net Tue Aug 10 10:48:54 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 10 Aug 2021 10:48:54 GMT Subject: RFR: 8272216: G1: replace G1ParScanThreadState::_dest with a constant [v2] In-Reply-To: References: Message-ID: > Simple change of removing an array with statically-known identical elements. > > Test: hotspot_gc Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5065/files - new: https://git.openjdk.java.net/jdk/pull/5065/files/a134b432..0e9c8e0d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5065&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5065&range=00-01 Stats: 3 lines in 1 file changed: 2 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5065.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5065/head:pull/5065 PR: https://git.openjdk.java.net/jdk/pull/5065 From kbarrett at openjdk.java.net Tue Aug 10 10:48:54 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 10 Aug 2021 10:48:54 GMT Subject: RFR: 8272216: G1: replace G1ParScanThreadState::_dest with a constant [v2] In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 10:46:20 GMT, Albert Mingkun Yang wrote: >> Simple change of removing an array with statically-known identical elements. 
>> >> Test: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Looks good. Just a suggested change to assert placement. src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 361: > 359: } > 360: } > 361: assert(region_attr.is_young() || region_attr.is_old(), "must be either Young or Old"); I think I'd rather have this assert at the beginning of the function. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5065 From ayang at openjdk.java.net Tue Aug 10 10:48:55 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 10 Aug 2021 10:48:55 GMT Subject: RFR: 8272216: G1: replace G1ParScanThreadState::_dest with a constant [v2] In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 10:34:25 GMT, Kim Barrett wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 361: > >> 359: } >> 360: } >> 361: assert(region_attr.is_young() || region_attr.is_old(), "must be either Young or Old"); > > I think I'd rather have this assert at the beginning of the function. Revised. ------------- PR: https://git.openjdk.java.net/jdk/pull/5065 From tschatzl at openjdk.java.net Tue Aug 10 10:55:29 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Aug 2021 10:55:29 GMT Subject: RFR: 8272216: G1: replace G1ParScanThreadState::_dest with a constant [v2] In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 10:48:54 GMT, Albert Mingkun Yang wrote: >> Simple change of removing an array with statically-known identical elements. >> >> Test: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5065 From github.com+16811675+linade at openjdk.java.net Tue Aug 10 11:27:27 2021 From: github.com+16811675+linade at openjdk.java.net (Yude Lin) Date: Tue, 10 Aug 2021 11:27:27 GMT Subject: RFR: 8272083: G1: Record iterated range for BOT performance during card scan In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 07:40:58 GMT, Thomas Schatzl wrote: > From what I understand from a brief look at the backwards scan prototype (I will continue a bit, and I may be wrong), the code actually implements scanning backwards the cards within claimed chunks. I wasn't really suggesting to scan backwards the cards, but claim the chunks (i.e. the set of cards a given thread will look at) from top to bottom addresses within the region. Or just have the first claim be the top chunk, or as a first task for a completely untouched region, have the thread explicitly update the BOT. Or after claiming a chunk, first fix the bot for that chunk, and then scan through. > The prototype reverses both the chunk claiming process and the card scanning process in each chunk. Because I thought within each chunk there could also be repetitive work if we do forward scan. > I am also trying to understand a little how serious the problem is in terms of actual performance hit by the duplicate work. It might be fairly high when not doing refinement at all as you indicate. > I can do a test that compares the scan time with and without the duplicate work removed. And maybe change the concurrent refinement strength to create different groups. It seems the decisions depend on more data. E.g., if concurrent refinement is on, and the de-duplication process isn't very effective in terms of reduction in scan time, then the process itself might be too heavyweight? > As for concurrent BOT update, I believe that the existing concurrent refinement code already concurrently updates the BOT as needed afaik so that shouldn't be an issue (I need to go look in detail). 
One idea could be that after allocating a PLAB (in old gen only of course) we enqueue some background task to fix these. We know the exact locations and sizes of those, maybe this somewhat helps. We also know the chunk sizes that are used for scanning, so we know the problematic places. > Something like the "root region scan" for marking (but we do not need to wait for it to complete during gc). I misunderstood the concurrent update part. Now I see that you are actually suggesting moving almost the entire BOT fixing work out of the gc pause. I think that might further reduce the pause time. If I understand correctly, concurrent refinement fixes part of the BOT, but there are some leftovers. Now the leftovers are also moved to the concurrent phase. I think it's worth some experimentation too!
next gc's young gen) because we just don't need to as we always collect young gen, so references from there need not be recorded in the remembered sets (and actually, if we did, we would erroneously mark cards in young gen, which later card table verification will not like) - but we did not have that information on hand anywhere already quickly accessible. >> >> This change solves that problem by actually recording this information in the region attribute table as "NewSurvivor" type region. "NewSurvivor" because I did want to make explicit that these are the survivor regions from the *new* (next) young generation (i.e. just survivor) and not the survivor regions of the previous gc (that were turned eden at the start of this gc) but something like "NewYoung" or so would be fine with me as well (or certainly just "Survivor", but that might be confusing). >> >> Another interesting addition is probably the new assert in `G1ParScanThreadState::enqueue_card_if_tracked` >> >> >> assert(!_g1h->heap_region_containing(o)->in_collection_set(), "Should not try to enqueue reference into collection set region"); >> >> This, at this time, verifies the assumption that g1 is not trying to collect references *to* the collection set, i.e. other objects that failed evacuation - after all we later relabel their regions as old *without* a remembered set; we would do otherwise unnecessarily because the reason is that (currently) cset tracking for these regions is enabled (at least during gc - we only later relabel and drop the remembered sets). >> >> This might change later if we want to move evacuation failed regions into survivor (or keep their remembered sets for some reason), but for now we filter attempts to add cards in the dcqs for those this way. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into evac-failure-no-scan-during-remove-self-forwards > - More updates to comment > - Clarify comment > - ayang review > - Some random changes > - No scan during self forward Fyi, there is a bug (or at least a crash in another tier1-5 run) in the current code that I'm currently tracking down, so please hold off further reviews for now. Thanks, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/5037 From stefank at openjdk.java.net Tue Aug 10 11:43:30 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Tue, 10 Aug 2021 11:43:30 GMT Subject: RFR: 8267186: Add string deduplication support to ZGC [v3] In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 08:32:02 GMT, Per Liden wrote: >> This change adds support for string deduplication to ZGC. It's a pretty straightforward change, but to make reviewing even easier it is broken up into two commits: >> >> __ZGC: Introduce ZMarkContext__ >> This commit just moves the `ZMarkCache` into the new `ZMarkContext` class, and we now pass a `ZMarkContext*` around instead of a `ZMarkCache*`. The `ZMarkContext` class is a more general container for worker-local data, which in the next commit will be extended to also include the `StringDedup::Requests` queue. >> >> __8267186: Add string deduplication support to ZGC__ >> This commit adds the actual string dedup functionality and enables relevant tests. We use the `deduplication_requested` bit in the `String` object to filter out `Strings` we've already attempted to deduplicate. >> >> Testing: >> - Passes all string dedup tests. >> - Passes Tier1-7 with ZGC on Linux/x86_64 (with -XX:+UseStringDeduplication enabled by default to get better exposure). 
> Per Liden has updated the pull request incrementally with one additional commit since the last revision: > > Fix include order Looks good. ZMarkContext serves multiple purposes in the marking code, and if we (maybe in the future) add code for those purposes then it risks being a class with multiple responsibilities. Therefore, I think it would be good to let this class be a simpler data carrier, and move the extra code/logic in ZMarkContext::try_deduplicate out to ZMark, or maybe even a new ZStringDedup class. ------------- Marked as reviewed by stefank (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5029 From tschatzl at openjdk.java.net Tue Aug 10 12:33:38 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Aug 2021 12:33:38 GMT Subject: RFR: 8272161: Make evacuation failure data structures local to collection Message-ID: Hi all, can I have reviews for this change that moves some evacuation (failure) related data structures into the per-thread evacuation context (`G1ParScanThreadState`) which removes some young collection related code from G1CollectedHeap? 
Testing: tier1-5 Thanks, Thomas ------------- Commit messages: - Minor cleanup - Make evacuation failure data structures local to collection Changes: https://git.openjdk.java.net/jdk/pull/5066/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5066&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272161 Stats: 121 lines in 6 files changed: 42 ins; 45 del; 34 mod Patch: https://git.openjdk.java.net/jdk/pull/5066.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5066/head:pull/5066 PR: https://git.openjdk.java.net/jdk/pull/5066 From tschatzl at openjdk.java.net Tue Aug 10 12:41:32 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Aug 2021 12:41:32 GMT Subject: Withdrawn: 8272161: Make evacuation failure data structures local to collection In-Reply-To: References: Message-ID: <62ActpU0mh1ChR2goCiQUN7r-0pqGushNeU3SvRrcG0=.4b84f97a-b2ba-48da-aaf6-88873d219fb9@github.com> On Tue, 10 Aug 2021 12:27:16 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that moves some evacuation (failure) related data structures into the per-thread evacuation context (`G1ParThreadScanState`) which removes some young collection related code from G1CollectedHeap? > > Testing: tier1-5 > > Thanks, > Thomas This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/5066 From tschatzl at openjdk.java.net Tue Aug 10 12:49:55 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Aug 2021 12:49:55 GMT Subject: RFR: 8272161: Make evacuation failure data structures local to collection [v2] In-Reply-To: References: Message-ID: > Hi all, > > can I have reviews for this change that moves some evacuation (failure) related data structures into the per-thread evacuation context (`G1ParThreadScanState`) which removes some young collection related code from G1CollectedHeap? 
> > Testing: tier1-5 > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Preserved marks stacks must be allocated on C heap. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5066/files - new: https://git.openjdk.java.net/jdk/pull/5066/files/2fc966ac..0def9589 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5066&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5066&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5066.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5066/head:pull/5066 PR: https://git.openjdk.java.net/jdk/pull/5066 From iwalulya at openjdk.java.net Tue Aug 10 13:09:49 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 10 Aug 2021 13:09:49 GMT Subject: RFR: 8267833: Improve G1CardSetInlinePtr::add() Message-ID: Hi all, Please review this cleanup change to reduce the number of comparisons made when adding entries to an inline pointer cardset container. We allow the find to restart from a previously known index. 
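A rough sketch of the "restart the find from a previously known index" idea follows. This is hypothetical illustration code, not the actual G1CardSetInlinePtr: the real container packs entries into one word that is updated with a CAS, which is omitted here, and the field widths below are invented:

```cpp
#include <cstdint>

// Entries are packed into a single 64-bit word. add() resumes its
// duplicate-scan from the index where a previous find() stopped, instead
// of re-comparing entries that were already checked.
struct InlineCardSet {
  static const int kBits = 10;  // invented bits-per-entry
  static const int kMax = 6;    // max entries fitting in the word
  uint64_t word = 0;
  int count = 0;

  // Returns true if card is present among entries [start, count).
  // On a miss, *resume_at is the index from which a later add() may
  // continue scanning without repeating comparisons.
  bool find(uint32_t card, int* resume_at, int start = 0) const {
    for (int i = start; i < count; i++) {
      uint32_t entry = (word >> (i * kBits)) & ((1u << kBits) - 1);
      if (entry == card) { *resume_at = i; return true; }
    }
    *resume_at = count;
    return false;
  }

  // Adds card unless already present; the caller passes the index where
  // an earlier failed find() stopped. Returns false if the container is
  // full (the real code would then evolve to a larger container).
  bool add(uint32_t card, int start = 0) {
    int idx;
    if (find(card, &idx, start)) return true;  // already present
    if (count == kMax) return false;           // container full
    word |= (uint64_t)card << (count * kBits);
    count++;
    return true;
  }
};
```

Passing the previously known index means each retry of add() only compares entries appended since the last attempt, which is the comparison reduction this cleanup is after.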
Testing: Tier 1-3

-------------

Commit messages:
 - cleanup
 - change location of the assert
 - chaange asseert to if check
 - 8267833: Improve G1CardSetInlinePtr::add()

Changes: https://git.openjdk.java.net/jdk/pull/5067/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5067&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8267833
  Stats: 25 lines in 2 files changed: 18 ins; 1 del; 6 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5067.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5067/head:pull/5067

PR: https://git.openjdk.java.net/jdk/pull/5067

From github.com+39413832+weixlu at openjdk.java.net  Tue Aug 10 13:44:04 2021
From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu)
Date: Tue, 10 Aug 2021 13:44:04 GMT
Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing [v2]
In-Reply-To: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com>
References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com>
Message-ID: 
> We performed an experiment on benchmark corretto/heapothesys on AArch64. The optimized version results in both (1) shorter average concurrent mark time and (2) shorter average concurrent relocation time. Furthermore, we notice shorter average latency in almost all testcases. > > > [root at localhost corretto]# grep "00.*Phase: Concurrent Mark " *.log > baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Mark 960.359 / 960.359 587.203 / 1248.362 587.203 / 1248.362 587.203 / 1248.362 ms > baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Mark 116.748 / 116.748 656.777 / 1736.469 656.777 / 1736.469 656.777 / 1736.469 ms > baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Mark 125.460 / 125.460 620.440 / 1736.469 620.440 / 1736.469 620.440 / 1736.469 ms > baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Mark 935.295 / 935.295 673.080 / 1736.469 673.080 / 1736.469 673.080 / 1736.469 ms > baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Mark 1448.705 / 1448.705 723.484 / 1814.849 723.484 / 1814.849 723.484 / 1814.849 ms > baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Mark 1490.123 / 1490.123 796.960 / 1842.794 796.960 / 1842.794 796.960 / 1842.794 ms > baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 912.439 / 2183.065 867.799 / 2183.065 867.799 / 2183.065 ms > baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Mark 1468.594 / 1468.594 990.044 / 2183.065 912.281 / 2183.065 912.281 / 2183.065 ms > baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Mark 137.435 / 137.435 1109.116 / 2276.535 967.470 / 2276.535 967.470 / 2276.535 ms > baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Mark 184.093 / 184.093 1172.446 / 2276.535 997.343 / 2276.535 997.343 / 2276.535 ms > baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Mark 1537.673 / 1537.673 1211.815 / 2276.535 1013.076 / 2276.535 1013.076 / 2276.535 ms > baseline.log:[1200.412s][info][gc,stats ] 
Phase: Concurrent Mark 0.000 / 0.000 1218.085 / 2276.535 1025.443 / 2276.535 1025.443 / 2276.535 ms > optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Mark 1053.065 / 1053.065 581.646 / 1249.822 581.646 / 1249.822 581.646 / 1249.822 ms > optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Mark 885.795 / 885.795 573.650 / 1277.782 573.650 / 1277.782 573.650 / 1277.782 ms > optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Mark 124.236 / 124.236 641.028 / 1828.124 641.028 / 1828.124 641.028 / 1828.124 ms > optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Mark 875.383 / 875.383 666.465 / 1828.124 666.465 / 1828.124 666.465 / 1828.124 ms > optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Mark 1937.305 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 ms > optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Mark 173.064 / 173.064 771.387 / 1937.305 771.387 / 1937.305 771.387 / 1937.305 ms > optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Mark 1832.584 / 1832.584 899.646 / 2048.471 856.838 / 2048.471 856.838 / 2048.471 ms > optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Mark 1510.755 / 1510.755 981.807 / 2048.471 893.373 / 2048.471 893.373 / 2048.471 ms > optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Mark 1472.737 / 1472.737 1044.755 / 2089.539 927.733 / 2089.539 927.733 / 2089.539 ms > optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Mark 1513.077 / 1513.077 1095.827 / 2089.539 947.202 / 2089.539 947.202 / 2089.539 ms > optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1073.703 / 2089.539 943.684 / 2089.539 943.684 / 2089.539 ms > optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Mark 1337.865 / 1337.865 1119.936 / 2113.895 962.172 / 2113.895 962.172 / 2113.895 ms > > [root at localhost corretto]# grep "00.*Phase: Concurrent Relocate " *.log > baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent 
Relocate 196.522 / 196.522 114.318 / 245.371 114.318 / 245.371 114.318 / 245.371 ms > baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Relocate 47.748 / 47.748 130.861 / 331.948 130.861 / 331.948 130.861 / 331.948 ms > baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Relocate 56.922 / 56.922 129.174 / 331.948 129.174 / 331.948 129.174 / 331.948 ms > baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Relocate 218.707 / 218.707 137.495 / 331.948 137.495 / 331.948 137.495 / 331.948 ms > baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Relocate 197.166 / 197.166 144.216 / 359.644 144.216 / 359.644 144.216 / 359.644 ms > baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Relocate 202.118 / 202.118 153.507 / 373.447 153.507 / 373.447 153.507 / 373.447 ms > baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 172.241 / 395.113 164.291 / 395.113 164.291 / 395.113 ms > baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Relocate 215.121 / 215.121 186.007 / 421.039 173.139 / 421.039 173.139 / 421.039 ms > baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Relocate 48.550 / 48.550 203.420 / 421.982 181.899 / 421.982 181.899 / 421.982 ms > baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Relocate 53.847 / 53.847 211.774 / 421.982 185.728 / 421.982 185.728 / 421.982 ms > baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Relocate 224.489 / 224.489 218.195 / 431.088 188.087 / 431.088 188.087 / 431.088 ms > baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 222.852 / 431.088 191.130 / 431.088 191.130 / 431.088 ms > optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Relocate 193.811 / 193.811 113.043 / 248.471 113.043 / 248.471 113.043 / 248.471 ms > optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Relocate 196.220 / 196.220 117.810 / 248.471 117.810 / 248.471 117.810 / 248.471 ms > optimized.log:[300.423s][info][gc,stats 
] Phase: Concurrent Relocate 48.786 / 48.786 131.753 / 351.890 131.753 / 351.890 131.753 / 351.890 ms > optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Relocate 195.302 / 195.302 139.115 / 351.890 139.115 / 351.890 139.115 / 351.890 ms > optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Relocate 374.022 / 374.022 155.204 / 374.022 155.204 / 374.022 155.204 / 374.022 ms > optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Relocate 49.222 / 49.222 159.444 / 400.795 159.444 / 400.795 159.444 / 400.795 ms > optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Relocate 381.072 / 381.072 182.488 / 409.086 173.140 / 409.086 173.140 / 409.086 ms > optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Relocate 223.399 / 223.399 191.774 / 409.086 175.748 / 409.086 175.748 / 409.086 ms > optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Relocate 214.184 / 214.184 201.526 / 409.086 181.302 / 409.086 181.302 / 409.086 ms > optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Relocate 208.600 / 208.600 207.389 / 420.479 183.756 / 420.479 183.756 / 420.479 ms > optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Relocate 209.444 / 209.444 202.367 / 420.479 183.173 / 420.479 183.173 / 420.479 ms > optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Relocate 223.841 / 223.841 206.268 / 420.479 185.074 / 420.479 185.074 / 420.479 ms > > [root at localhost corretto]# grep "average latency:" nohup_baseline.out > average latency: 2ms:40us > average latency: 6ms:550us > average latency: 6ms:543us > average latency: 6ms:493us > average latency: 928us > average latency: 794us > average latency: 1ms:403us > average latency: 23ms:216us > average latency: 775us > [root at localhost corretto]# grep "average latency:" nohup_optimized.out > average latency: 2ms:48us > average latency: 5ms:948us > average latency: 5ms:940us > average latency: 5ms:875us > average latency: 850us > average latency: 723us > average 
latency: 1ms:221us > average latency: 22ms:653us > average latency: 693us Xiaowei Lu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5046/files - new: https://git.openjdk.java.net/jdk/pull/5046/files/3be3aa20..2b05fae1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5046&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5046&range=00-01 Stats: 57454 lines in 930 files changed: 47750 ins; 4467 del; 5237 mod Patch: https://git.openjdk.java.net/jdk/pull/5046.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5046/head:pull/5046 PR: https://git.openjdk.java.net/jdk/pull/5046 From github.com+39413832+weixlu at openjdk.java.net Tue Aug 10 13:44:06 2021 From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu) Date: Tue, 10 Aug 2021 13:44:06 GMT Subject: Withdrawn: 8272138: ZGC: Adopt release ordering for self-healing In-Reply-To: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: On Mon, 9 Aug 2021 08:19:35 GMT, Xiaowei Lu wrote: > ZGC utilizes self-healing in load barrier to fix bad references. Currently, this fixing (ZBarrier::self_heal) adopts memory_order_conservative to guarantee that (1) the slow path (relocate, mark, etc., where addresses get healed) always happens before self healing, and (2) the other thread that accesses the same reference is able to access the healed address. > Let us consider memory_order_release for ZBarrier::self_heal. For example, Thread 1 is fixing a reference, and Thread 2 attempts to access the same reference. There exists data dependency in T2 where access of pointer happens before access of object?s content, equaling acquire semantic. 
> [...]

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5046

From github.com+39413832+weixlu at openjdk.java.net  Tue Aug 10 13:50:57 2021
From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu)
Date: Tue, 10 Aug 2021 13:50:57 GMT
Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing [v3]
In-Reply-To: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com>
References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com>
Message-ID: 

> ZGC utilizes self-healing in load barrier to fix bad references. [...]

Xiaowei Lu has updated the pull request incrementally with one additional commit since the last revision:

  8272138: ZGC: Adopt release ordering for self-healing

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5046/files
  - new: https://git.openjdk.java.net/jdk/pull/5046/files/2b05fae1..348c1c04

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5046&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5046&range=01-02

  Stats: 4 lines in 3 files changed: 2 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5046.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5046/head:pull/5046

PR: https://git.openjdk.java.net/jdk/pull/5046

From iwalulya at openjdk.java.net  Tue Aug 10 13:57:40 2021
From: iwalulya at openjdk.java.net (Ivan Walulya)
Date: Tue, 10 Aug 2021 13:57:40 GMT
Subject: RFR: 8272228: G1: G1CardSetInlinePtr Remove unreliable assertion
Message-ID: 

Hi all,

Please review this cleanup to remove unreliable asserts in G1CardSetInlinePtr.

-------------

Commit messages:
 - remove asserts which always rereturn true

Changes: https://git.openjdk.java.net/jdk/pull/5069/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5069&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8272228
  Stats: 6 lines in 1 file changed: 0 ins; 4 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5069.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5069/head:pull/5069

PR: https://git.openjdk.java.net/jdk/pull/5069

From github.com+39413832+weixlu at openjdk.java.net  Tue Aug 10 14:09:48 2021
From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu)
Date: Tue, 10 Aug 2021 14:09:48 GMT
Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing [v4]
In-Reply-To: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com>
References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com>
Message-ID: 

> ZGC utilizes self-healing in load barrier to fix bad references.
> [...]
baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Relocate 56.922 / 56.922 129.174 / 331.948 129.174 / 331.948 129.174 / 331.948 ms > baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Relocate 218.707 / 218.707 137.495 / 331.948 137.495 / 331.948 137.495 / 331.948 ms > baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Relocate 197.166 / 197.166 144.216 / 359.644 144.216 / 359.644 144.216 / 359.644 ms > baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Relocate 202.118 / 202.118 153.507 / 373.447 153.507 / 373.447 153.507 / 373.447 ms > baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 172.241 / 395.113 164.291 / 395.113 164.291 / 395.113 ms > baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Relocate 215.121 / 215.121 186.007 / 421.039 173.139 / 421.039 173.139 / 421.039 ms > baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Relocate 48.550 / 48.550 203.420 / 421.982 181.899 / 421.982 181.899 / 421.982 ms > baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Relocate 53.847 / 53.847 211.774 / 421.982 185.728 / 421.982 185.728 / 421.982 ms > baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Relocate 224.489 / 224.489 218.195 / 431.088 188.087 / 431.088 188.087 / 431.088 ms > baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 222.852 / 431.088 191.130 / 431.088 191.130 / 431.088 ms > optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Relocate 193.811 / 193.811 113.043 / 248.471 113.043 / 248.471 113.043 / 248.471 ms > optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Relocate 196.220 / 196.220 117.810 / 248.471 117.810 / 248.471 117.810 / 248.471 ms > optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Relocate 48.786 / 48.786 131.753 / 351.890 131.753 / 351.890 131.753 / 351.890 ms > optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Relocate 195.302 / 195.302 139.115 / 351.890 139.115 / 351.890 
139.115 / 351.890 ms > optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Relocate 374.022 / 374.022 155.204 / 374.022 155.204 / 374.022 155.204 / 374.022 ms > optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Relocate 49.222 / 49.222 159.444 / 400.795 159.444 / 400.795 159.444 / 400.795 ms > optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Relocate 381.072 / 381.072 182.488 / 409.086 173.140 / 409.086 173.140 / 409.086 ms > optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Relocate 223.399 / 223.399 191.774 / 409.086 175.748 / 409.086 175.748 / 409.086 ms > optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Relocate 214.184 / 214.184 201.526 / 409.086 181.302 / 409.086 181.302 / 409.086 ms > optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Relocate 208.600 / 208.600 207.389 / 420.479 183.756 / 420.479 183.756 / 420.479 ms > optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Relocate 209.444 / 209.444 202.367 / 420.479 183.173 / 420.479 183.173 / 420.479 ms > optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Relocate 223.841 / 223.841 206.268 / 420.479 185.074 / 420.479 185.074 / 420.479 ms > > [root at localhost corretto]# grep "average latency:" nohup_baseline.out > average latency: 2ms:40us > average latency: 6ms:550us > average latency: 6ms:543us > average latency: 6ms:493us > average latency: 928us > average latency: 794us > average latency: 1ms:403us > average latency: 23ms:216us > average latency: 775us > [root at localhost corretto]# grep "average latency:" nohup_optimized.out > average latency: 2ms:48us > average latency: 5ms:948us > average latency: 5ms:940us > average latency: 5ms:875us > average latency: 850us > average latency: 723us > average latency: 1ms:221us > average latency: 22ms:653us > average latency: 693us Xiaowei Lu has updated the pull request incrementally with one additional commit since the last revision: add annotation ------------- Changes: - all: 
https://git.openjdk.java.net/jdk/pull/5046/files - new: https://git.openjdk.java.net/jdk/pull/5046/files/348c1c04..3f685dee Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5046&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5046&range=02-03 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5046.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5046/head:pull/5046 PR: https://git.openjdk.java.net/jdk/pull/5046 From ayang at openjdk.java.net Tue Aug 10 14:10:32 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 10 Aug 2021 14:10:32 GMT Subject: RFR: 8272228: G1: G1CardSetInlinePtr Remove unreliable assertion In-Reply-To: References: Message-ID: <6PqvNJRDN-xtgODHgovcYnot_s6lnC7ox4lZkDtMD-s=.d10dc856-c528-4344-9bba-331af1cf6fc8@github.com> On Tue, 10 Aug 2021 13:50:42 GMT, Ivan Walulya wrote: > Hi all, > > Please review this cleanup to remove unreliable asserts in G1CardSetInlinePtr. I think the intention here is sth like: `assert((_value & CardSetPtrTypeMask) == CardSetInlinePtr)`. ------------- PR: https://git.openjdk.java.net/jdk/pull/5069 From pliden at openjdk.java.net Tue Aug 10 14:15:30 2021 From: pliden at openjdk.java.net (Per Liden) Date: Tue, 10 Aug 2021 14:15:30 GMT Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing [v3] In-Reply-To: References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: On Tue, 10 Aug 2021 13:50:57 GMT, Xiaowei Lu wrote: >> ZGC utilizes self-healing in load barrier to fix bad references. Currently, this fixing (ZBarrier::self_heal) adopts memory_order_conservative to guarantee that (1) the slow path (relocate, mark, etc., where addresses get healed) always happens before self healing, and (2) the other thread that accesses the same reference is able to access the healed address. >> Let us consider memory_order_release for ZBarrier::self_heal. 
For example, Thread 1 is fixing a reference, and Thread 2 attempts to access the same reference. There exists a data dependency in T2 where the access of the pointer happens before the access of the object's content, equaling an acquire semantic. Pairing with the release semantic in self healing, this makes up inter-thread acquire-release memory ordering. As a result, the two statements we mentioned above can be guaranteed by the acquire-release ordering. >> We performed an experiment on benchmark corretto/heapothesys on AArch64. The optimized version results in both (1) shorter average concurrent mark time and (2) shorter average concurrent relocation time. Furthermore, we notice shorter average latency in almost all testcases. >> >> >> [root at localhost corretto]# grep "00.*Phase: Concurrent Mark " *.log >> baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Mark 960.359 / 960.359 587.203 / 1248.362 587.203 / 1248.362 587.203 / 1248.362 ms >> baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Mark 116.748 / 116.748 656.777 / 1736.469 656.777 / 1736.469 656.777 / 1736.469 ms >> baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Mark 125.460 / 125.460 620.440 / 1736.469 620.440 / 1736.469 620.440 / 1736.469 ms >> baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Mark 935.295 / 935.295 673.080 / 1736.469 673.080 / 1736.469 673.080 / 1736.469 ms >> baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Mark 1448.705 / 1448.705 723.484 / 1814.849 723.484 / 1814.849 723.484 / 1814.849 ms >> baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Mark 1490.123 / 1490.123 796.960 / 1842.794 796.960 / 1842.794 796.960 / 1842.794 ms >> baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 912.439 / 2183.065 867.799 / 2183.065 867.799 / 2183.065 ms >> baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Mark 1468.594 / 1468.594 990.044 / 2183.065 912.281 / 2183.065 912.281 / 2183.065 ms >> 
baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Mark 137.435 / 137.435 1109.116 / 2276.535 967.470 / 2276.535 967.470 / 2276.535 ms >> baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Mark 184.093 / 184.093 1172.446 / 2276.535 997.343 / 2276.535 997.343 / 2276.535 ms >> baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Mark 1537.673 / 1537.673 1211.815 / 2276.535 1013.076 / 2276.535 1013.076 / 2276.535 ms >> baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1218.085 / 2276.535 1025.443 / 2276.535 1025.443 / 2276.535 ms >> optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Mark 1053.065 / 1053.065 581.646 / 1249.822 581.646 / 1249.822 581.646 / 1249.822 ms >> optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Mark 885.795 / 885.795 573.650 / 1277.782 573.650 / 1277.782 573.650 / 1277.782 ms >> optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Mark 124.236 / 124.236 641.028 / 1828.124 641.028 / 1828.124 641.028 / 1828.124 ms >> optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Mark 875.383 / 875.383 666.465 / 1828.124 666.465 / 1828.124 666.465 / 1828.124 ms >> optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Mark 1937.305 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 ms >> optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Mark 173.064 / 173.064 771.387 / 1937.305 771.387 / 1937.305 771.387 / 1937.305 ms >> optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Mark 1832.584 / 1832.584 899.646 / 2048.471 856.838 / 2048.471 856.838 / 2048.471 ms >> optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Mark 1510.755 / 1510.755 981.807 / 2048.471 893.373 / 2048.471 893.373 / 2048.471 ms >> optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Mark 1472.737 / 1472.737 1044.755 / 2089.539 927.733 / 2089.539 927.733 / 2089.539 ms >> optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Mark 1513.077 / 1513.077 
1095.827 / 2089.539 947.202 / 2089.539 947.202 / 2089.539 ms >> optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1073.703 / 2089.539 943.684 / 2089.539 943.684 / 2089.539 ms >> optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Mark 1337.865 / 1337.865 1119.936 / 2113.895 962.172 / 2113.895 962.172 / 2113.895 ms >> >> [root at localhost corretto]# grep "00.*Phase: Concurrent Relocate " *.log >> baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Relocate 196.522 / 196.522 114.318 / 245.371 114.318 / 245.371 114.318 / 245.371 ms >> baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Relocate 47.748 / 47.748 130.861 / 331.948 130.861 / 331.948 130.861 / 331.948 ms >> baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Relocate 56.922 / 56.922 129.174 / 331.948 129.174 / 331.948 129.174 / 331.948 ms >> baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Relocate 218.707 / 218.707 137.495 / 331.948 137.495 / 331.948 137.495 / 331.948 ms >> baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Relocate 197.166 / 197.166 144.216 / 359.644 144.216 / 359.644 144.216 / 359.644 ms >> baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Relocate 202.118 / 202.118 153.507 / 373.447 153.507 / 373.447 153.507 / 373.447 ms >> baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 172.241 / 395.113 164.291 / 395.113 164.291 / 395.113 ms >> baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Relocate 215.121 / 215.121 186.007 / 421.039 173.139 / 421.039 173.139 / 421.039 ms >> baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Relocate 48.550 / 48.550 203.420 / 421.982 181.899 / 421.982 181.899 / 421.982 ms >> baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Relocate 53.847 / 53.847 211.774 / 421.982 185.728 / 421.982 185.728 / 421.982 ms >> baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Relocate 224.489 / 224.489 218.195 / 431.088 
188.087 / 431.088 188.087 / 431.088 ms >> baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 222.852 / 431.088 191.130 / 431.088 191.130 / 431.088 ms >> optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Relocate 193.811 / 193.811 113.043 / 248.471 113.043 / 248.471 113.043 / 248.471 ms >> optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Relocate 196.220 / 196.220 117.810 / 248.471 117.810 / 248.471 117.810 / 248.471 ms >> optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Relocate 48.786 / 48.786 131.753 / 351.890 131.753 / 351.890 131.753 / 351.890 ms >> optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Relocate 195.302 / 195.302 139.115 / 351.890 139.115 / 351.890 139.115 / 351.890 ms >> optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Relocate 374.022 / 374.022 155.204 / 374.022 155.204 / 374.022 155.204 / 374.022 ms >> optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Relocate 49.222 / 49.222 159.444 / 400.795 159.444 / 400.795 159.444 / 400.795 ms >> optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Relocate 381.072 / 381.072 182.488 / 409.086 173.140 / 409.086 173.140 / 409.086 ms >> optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Relocate 223.399 / 223.399 191.774 / 409.086 175.748 / 409.086 175.748 / 409.086 ms >> optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Relocate 214.184 / 214.184 201.526 / 409.086 181.302 / 409.086 181.302 / 409.086 ms >> optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Relocate 208.600 / 208.600 207.389 / 420.479 183.756 / 420.479 183.756 / 420.479 ms >> optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Relocate 209.444 / 209.444 202.367 / 420.479 183.173 / 420.479 183.173 / 420.479 ms >> optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Relocate 223.841 / 223.841 206.268 / 420.479 185.074 / 420.479 185.074 / 420.479 ms >> >> [root at localhost corretto]# grep "average 
latency:" nohup_baseline.out >> average latency: 2ms:40us >> average latency: 6ms:550us >> average latency: 6ms:543us >> average latency: 6ms:493us >> average latency: 928us >> average latency: 794us >> average latency: 1ms:403us >> average latency: 23ms:216us >> average latency: 775us >> [root at localhost corretto]# grep "average latency:" nohup_optimized.out >> average latency: 2ms:48us >> average latency: 5ms:948us >> average latency: 5ms:940us >> average latency: 5ms:875us >> average latency: 850us >> average latency: 723us >> average latency: 1ms:221us >> average latency: 22ms:653us >> average latency: 693us > > Xiaowei Lu has updated the pull request incrementally with one additional commit since the last revision: > > 8272138: ZGC: Adopt release ordering for self-healing src/hotspot/share/gc/z/zRelocate.cpp line 77: > 75: ZUtils::object_copy_disjoint(from_addr, to_addr, size); > 76: > 77: OrderAccess::release(); Could you please replace the calls to `OrderAccess::release()` with a single call here instead:

```
--- a/src/hotspot/share/gc/z/zForwarding.inline.hpp
+++ b/src/hotspot/share/gc/z/zForwarding.inline.hpp
@@ -136,6 +136,8 @@ inline uintptr_t ZForwarding::insert(uintptr_t from_index, uintptr_t to_offset,
   const ZForwardingEntry new_entry(from_index, to_offset);
   const ZForwardingEntry old_entry; // Empty

+  OrderAccess::release();
+
   for (;;) {
     const ZForwardingEntry prev_entry = Atomic::cmpxchg(entries() + *cursor, old_entry, new_entry, memory_order_release);
     if (!prev_entry.populated()) {
```

While keeping the calls close to the copy code has some value, I think I'd prefer to have it in a single place, and close to the corresponding acquire in `at()`. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5046 From pliden at openjdk.java.net Tue Aug 10 14:21:32 2021 From: pliden at openjdk.java.net (Per Liden) Date: Tue, 10 Aug 2021 14:21:32 GMT Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing [v3] In-Reply-To: References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: On Tue, 10 Aug 2021 14:12:30 GMT, Per Liden wrote: >> Xiaowei Lu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8272138: ZGC: Adopt release ordering for self-healing > > src/hotspot/share/gc/z/zRelocate.cpp line 77: > >> 75: ZUtils::object_copy_disjoint(from_addr, to_addr, size); >> 76: >> 77: OrderAccess::release(); > > Could you please replace the calls to `OrderAccess::release()` with a single call here instead: > > ```--- a/src/hotspot/share/gc/z/zForwarding.inline.hpp > +++ b/src/hotspot/share/gc/z/zForwarding.inline.hpp > @@ -136,6 +136,8 @@ inline uintptr_t ZForwarding::insert(uintptr_t from_index, uintptr_t to_offset, > const ZForwardingEntry new_entry(from_index, to_offset); > const ZForwardingEntry old_entry; // Empty > > + OrderAccess::release(); > + > for (;;) { > const ZForwardingEntry prev_entry = Atomic::cmpxchg(entries() + *cursor, old_entry, new_entry, memory_order_relaxed); > if (!prev_entry.populated()) { > > > While keeping the calls close to the copy code has some value, I think I'd prefer to have it in a single place, and close to the corresponding acquire in `at()`. I updated my comment above, since it had a typo (used `memory_order_release` instead of `memory_order_relaxed`). 
------------- PR: https://git.openjdk.java.net/jdk/pull/5046 From github.com+39413832+weixlu at openjdk.java.net Tue Aug 10 14:21:32 2021 From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu) Date: Tue, 10 Aug 2021 14:21:32 GMT Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing [v3] In-Reply-To: References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: On Tue, 10 Aug 2021 14:18:36 GMT, Per Liden wrote: >> src/hotspot/share/gc/z/zRelocate.cpp line 77: >> >>> 75: ZUtils::object_copy_disjoint(from_addr, to_addr, size); >>> 76: >>> 77: OrderAccess::release(); >> >> Could you please replace the calls to `OrderAccess::release()` with a single call here instead: >> >> ```--- a/src/hotspot/share/gc/z/zForwarding.inline.hpp >> +++ b/src/hotspot/share/gc/z/zForwarding.inline.hpp >> @@ -136,6 +136,8 @@ inline uintptr_t ZForwarding::insert(uintptr_t from_index, uintptr_t to_offset, >> const ZForwardingEntry new_entry(from_index, to_offset); >> const ZForwardingEntry old_entry; // Empty >> >> + OrderAccess::release(); >> + >> for (;;) { >> const ZForwardingEntry prev_entry = Atomic::cmpxchg(entries() + *cursor, old_entry, new_entry, memory_order_relaxed); >> if (!prev_entry.populated()) { >> >> >> While keeping the calls close to the copy code has some value, I think I'd prefer to have it in a single place, and close to the corresponding acquire in `at()`. > > I updated my comment above, since it had a typo (used `memory_order_release` instead of `memory_order_relaxed`). OK, it's better to add the fence in one single place. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5046 From iwalulya at openjdk.java.net Tue Aug 10 14:26:33 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 10 Aug 2021 14:26:33 GMT Subject: RFR: 8272228: G1: G1CardSetInlinePtr Remove unreliable assertion In-Reply-To: <6PqvNJRDN-xtgODHgovcYnot_s6lnC7ox4lZkDtMD-s=.d10dc856-c528-4344-9bba-331af1cf6fc8@github.com> References: <6PqvNJRDN-xtgODHgovcYnot_s6lnC7ox4lZkDtMD-s=.d10dc856-c528-4344-9bba-331af1cf6fc8@github.com> Message-ID: On Tue, 10 Aug 2021 14:07:41 GMT, Albert Mingkun Yang wrote: > I think the intention here is sth like: `assert((_value & CardSetPtrTypeMask) == CardSetInlinePtr)`. yes, that might be better than just removing them ------------- PR: https://git.openjdk.java.net/jdk/pull/5069 From github.com+39413832+weixlu at openjdk.java.net Tue Aug 10 14:33:56 2021 From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu) Date: Tue, 10 Aug 2021 14:33:56 GMT Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing [v5] In-Reply-To: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: > ZGC utilizes self-healing in load barrier to fix bad references. Currently, this fixing (ZBarrier::self_heal) adopts memory_order_conservative to guarantee that (1) the slow path (relocate, mark, etc., where addresses get healed) always happens before self healing, and (2) the other thread that accesses the same reference is able to access the healed address. > Let us consider memory_order_release for ZBarrier::self_heal. For example, Thread 1 is fixing a reference, and Thread 2 attempts to access the same reference. There exists a data dependency in T2 where the access of the pointer happens before the access of the object's content, equaling an acquire semantic. 
Pairing with the release semantic in self healing, this makes up inter-thread acquire-release memory ordering. As a result, the two statements we mentioned above can be guaranteed by the acquire-release ordering. > We performed an experiment on benchmark corretto/heapothesys on AArch64. The optimized version results in both (1) shorter average concurrent mark time and (2) shorter average concurrent relocation time. Furthermore, we notice shorter average latency in almost all testcases. > > > [root at localhost corretto]# grep "00.*Phase: Concurrent Mark " *.log > baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Mark 960.359 / 960.359 587.203 / 1248.362 587.203 / 1248.362 587.203 / 1248.362 ms > baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Mark 116.748 / 116.748 656.777 / 1736.469 656.777 / 1736.469 656.777 / 1736.469 ms > baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Mark 125.460 / 125.460 620.440 / 1736.469 620.440 / 1736.469 620.440 / 1736.469 ms > baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Mark 935.295 / 935.295 673.080 / 1736.469 673.080 / 1736.469 673.080 / 1736.469 ms > baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Mark 1448.705 / 1448.705 723.484 / 1814.849 723.484 / 1814.849 723.484 / 1814.849 ms > baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Mark 1490.123 / 1490.123 796.960 / 1842.794 796.960 / 1842.794 796.960 / 1842.794 ms > baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 912.439 / 2183.065 867.799 / 2183.065 867.799 / 2183.065 ms > baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Mark 1468.594 / 1468.594 990.044 / 2183.065 912.281 / 2183.065 912.281 / 2183.065 ms > baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Mark 137.435 / 137.435 1109.116 / 2276.535 967.470 / 2276.535 967.470 / 2276.535 ms > baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Mark 184.093 / 184.093 1172.446 / 2276.535 997.343 / 2276.535 997.343 
/ 2276.535 ms > baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Mark 1537.673 / 1537.673 1211.815 / 2276.535 1013.076 / 2276.535 1013.076 / 2276.535 ms > baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1218.085 / 2276.535 1025.443 / 2276.535 1025.443 / 2276.535 ms > optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Mark 1053.065 / 1053.065 581.646 / 1249.822 581.646 / 1249.822 581.646 / 1249.822 ms > optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Mark 885.795 / 885.795 573.650 / 1277.782 573.650 / 1277.782 573.650 / 1277.782 ms > optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Mark 124.236 / 124.236 641.028 / 1828.124 641.028 / 1828.124 641.028 / 1828.124 ms > optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Mark 875.383 / 875.383 666.465 / 1828.124 666.465 / 1828.124 666.465 / 1828.124 ms > optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Mark 1937.305 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 ms > optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Mark 173.064 / 173.064 771.387 / 1937.305 771.387 / 1937.305 771.387 / 1937.305 ms > optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Mark 1832.584 / 1832.584 899.646 / 2048.471 856.838 / 2048.471 856.838 / 2048.471 ms > optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Mark 1510.755 / 1510.755 981.807 / 2048.471 893.373 / 2048.471 893.373 / 2048.471 ms > optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Mark 1472.737 / 1472.737 1044.755 / 2089.539 927.733 / 2089.539 927.733 / 2089.539 ms > optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Mark 1513.077 / 1513.077 1095.827 / 2089.539 947.202 / 2089.539 947.202 / 2089.539 ms > optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1073.703 / 2089.539 943.684 / 2089.539 943.684 / 2089.539 ms > optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Mark 1337.865 / 
1337.865 1119.936 / 2113.895 962.172 / 2113.895 962.172 / 2113.895 ms > > [root at localhost corretto]# grep "00.*Phase: Concurrent Relocate " *.log > baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Relocate 196.522 / 196.522 114.318 / 245.371 114.318 / 245.371 114.318 / 245.371 ms > baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Relocate 47.748 / 47.748 130.861 / 331.948 130.861 / 331.948 130.861 / 331.948 ms > baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Relocate 56.922 / 56.922 129.174 / 331.948 129.174 / 331.948 129.174 / 331.948 ms > baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Relocate 218.707 / 218.707 137.495 / 331.948 137.495 / 331.948 137.495 / 331.948 ms > baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Relocate 197.166 / 197.166 144.216 / 359.644 144.216 / 359.644 144.216 / 359.644 ms > baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Relocate 202.118 / 202.118 153.507 / 373.447 153.507 / 373.447 153.507 / 373.447 ms > baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 172.241 / 395.113 164.291 / 395.113 164.291 / 395.113 ms > baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Relocate 215.121 / 215.121 186.007 / 421.039 173.139 / 421.039 173.139 / 421.039 ms > baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Relocate 48.550 / 48.550 203.420 / 421.982 181.899 / 421.982 181.899 / 421.982 ms > baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Relocate 53.847 / 53.847 211.774 / 421.982 185.728 / 421.982 185.728 / 421.982 ms > baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Relocate 224.489 / 224.489 218.195 / 431.088 188.087 / 431.088 188.087 / 431.088 ms > baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 222.852 / 431.088 191.130 / 431.088 191.130 / 431.088 ms > optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Relocate 193.811 / 193.811 113.043 / 248.471 113.043 / 
248.471 113.043 / 248.471 ms > optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Relocate 196.220 / 196.220 117.810 / 248.471 117.810 / 248.471 117.810 / 248.471 ms > optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Relocate 48.786 / 48.786 131.753 / 351.890 131.753 / 351.890 131.753 / 351.890 ms > optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Relocate 195.302 / 195.302 139.115 / 351.890 139.115 / 351.890 139.115 / 351.890 ms > optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Relocate 374.022 / 374.022 155.204 / 374.022 155.204 / 374.022 155.204 / 374.022 ms > optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Relocate 49.222 / 49.222 159.444 / 400.795 159.444 / 400.795 159.444 / 400.795 ms > optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Relocate 381.072 / 381.072 182.488 / 409.086 173.140 / 409.086 173.140 / 409.086 ms > optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Relocate 223.399 / 223.399 191.774 / 409.086 175.748 / 409.086 175.748 / 409.086 ms > optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Relocate 214.184 / 214.184 201.526 / 409.086 181.302 / 409.086 181.302 / 409.086 ms > optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Relocate 208.600 / 208.600 207.389 / 420.479 183.756 / 420.479 183.756 / 420.479 ms > optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Relocate 209.444 / 209.444 202.367 / 420.479 183.173 / 420.479 183.173 / 420.479 ms > optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Relocate 223.841 / 223.841 206.268 / 420.479 185.074 / 420.479 185.074 / 420.479 ms > > [root at localhost corretto]# grep "average latency:" nohup_baseline.out > average latency: 2ms:40us > average latency: 6ms:550us > average latency: 6ms:543us > average latency: 6ms:493us > average latency: 928us > average latency: 794us > average latency: 1ms:403us > average latency: 23ms:216us > average latency: 775us > [root at localhost corretto]# grep 
"average latency:" nohup_optimized.out > average latency: 2ms:48us > average latency: 5ms:948us > average latency: 5ms:940us > average latency: 5ms:875us > average latency: 850us > average latency: 723us > average latency: 1ms:221us > average latency: 22ms:653us > average latency: 693us Xiaowei Lu has updated the pull request incrementally with one additional commit since the last revision: replace release() to a single place in insert() ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5046/files - new: https://git.openjdk.java.net/jdk/pull/5046/files/3f685dee..3b8ed5f5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5046&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5046&range=03-04 Stats: 12 lines in 2 files changed: 4 ins; 8 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5046.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5046/head:pull/5046 PR: https://git.openjdk.java.net/jdk/pull/5046 From github.com+39413832+weixlu at openjdk.java.net Tue Aug 10 14:33:57 2021 From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu) Date: Tue, 10 Aug 2021 14:33:57 GMT Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing [v3] In-Reply-To: References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: On Tue, 10 Aug 2021 14:18:47 GMT, Xiaowei Lu wrote: >> I updated my comment above, since it had a typo (used `memory_order_release` instead of `memory_order_relaxed`). > > OK, it's better to add the fence in one single place. > I updated my comment above, since it had a typo (used `memory_order_release` instead of `memory_order_relaxed`). I have replaced it. Thanks for your advice. 
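The single-release-then-relaxed-CAS shape being settled on in this thread can be illustrated with standard C++11 atomics. This is a hedged sketch only: the names (`forwarding_entry`, `object_field`, `relocate_object`, `self_heal`) and the fake addresses are illustrative stand-ins, not the actual HotSpot `OrderAccess`/`ZForwarding` code.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative stand-ins (hypothetical names, not HotSpot's types): one
// forwarding-table slot and one object field holding a stale pointer.
std::atomic<uintptr_t> forwarding_entry{0};
std::atomic<uintptr_t> object_field{0xbad};

// Relocation path: copy the object, then publish with ONE unbound release.
uintptr_t relocate_object(uintptr_t /*bad_addr*/) {
  uintptr_t good_addr = 0x600d;  // pretend the object was copied to here
  // The unbound release fence orders the copy before *any* subsequent store,
  // so it covers both the table install below and the later self heal.
  std::atomic_thread_fence(std::memory_order_release);
  uintptr_t expected = 0;
  forwarding_entry.compare_exchange_strong(expected, good_addr,
                                           std::memory_order_relaxed);
  return forwarding_entry.load(std::memory_order_relaxed);
}

// Self heal: a relaxed CAS suffices, since the fence above already published
// the copy; pure remapping heals (no copy) then need no release at all.
void self_heal(uintptr_t bad_addr, uintptr_t good_addr) {
  uintptr_t expected = bad_addr;
  object_field.compare_exchange_strong(expected, good_addr,
                                       std::memory_order_relaxed);
}
```

A reader thread that loads the healed field and then dereferences it gets the pairing acquire side for free from the address dependency, which is why only the writer needs the fence here.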
------------- PR: https://git.openjdk.java.net/jdk/pull/5046 From github.com+39413832+weixlu at openjdk.java.net Tue Aug 10 14:43:33 2021 From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu) Date: Tue, 10 Aug 2021 14:43:33 GMT Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing In-Reply-To: References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: On Tue, 10 Aug 2021 09:29:04 GMT, Erik Österlund wrote: >>> @fisk Thanks a lot for your detailed explanation. But I'm quite confused about the release in the forwarding table: why is it able to act as a release for both accesses? In my view, since the release is "bonded" to the forwarding table, it only ensures that the copy happens before installing the forwardee, so why does it have anything to do with self healing? From your explanation, I guess the release is not "bonded" to the forwarding table. Instead, maybe it serves as a membar to block all the CASes afterwards. Thanks again for your patient explanation. >> >> What I wrote kind of assumes that we have an explicit OrderAccess::release(); relaxed_cas(); in the forwarding table. Looks like we currently have a bounded releasing cas instead. The unbounded release is guaranteed to provide release semantics for *any* subsequent store operation. However, a bounded Store-Release (i.e. stlr) instruction only needs to provide the release semantics to the bound store, as you say. So I suppose what I propose is to use release(); relaxed_cas(); in the forwarding table, and then also relaxed_cas(); in the self healing. Letting the release be done in only one place, and only when actual copying happens. The vast majority of self heals I have found are purely remapping the pointer lazily, which can then dodge any need for release when self healing. Hope this makes sense. > >> @fisk Hi, Erik. We are wondering if one thread loading a healed pointer can observe that the corresponding copy has not finished yet.
Assuming relaxed ordering for `cas_self_heal`, both Thread A and Thread B are loading the same reference. >> >> **Thread A**: `load obj.fld; // will relocate the object referenced by obj.fld` >> thread A will do the following: >> >> ``` >> 1 copy(); >> 2 cas_forwarding_table(); // release >> 3 cas_self_heal(); // relaxed >> ``` >> >> **Thread B**: `load obj.fld; // load the same reference` >> thread B may observe the following reordering of **thread A**: >> >> ``` >> 3 cas_self_heal(); // relaxed >> 1 copy(); >> 2 cas_forwarding_table(); // release >> ``` >> >> To our knowledge, release ordering in _line 2_ does not prevent _line 3_ from being reordered before _line 1_, which indicates the release in the forwarding table is not enough. Perhaps we need to add acquire ordering to _line 2_ or add release ordering to _line 3_. >> >> In another way, as @weixlu said, >> >> > Instead, maybe it serves as a membar to block all the CASes afterwards. >> >> relaxed ordering in _line 2_ along with release ordering in _line 3_ can indeed ensure thread B always observes the object copy. >> >> Looking forward to your advice. > > Yeah so I was kind of assuming the forwarding table installation would be an unbound release(); relaxed_cas(); > That way, the release serves a dual purpose: releasing for the table and releasing for the self heal. That way the vast majority of self heals (that only do remapping) won't need to release. > > @fisk Thanks a lot for your detailed explanation. But I'm quite confused about the release in the forwarding table: why is it able to act as a release for both accesses? In my view, since the release is "bonded" to the forwarding table, it only ensures that the copy happens before installing the forwardee, so why does it have anything to do with self healing? From your explanation, I guess the release is not "bonded" to the forwarding table. Instead, maybe it serves as a membar to block all the CASes afterwards. Thanks again for your patient explanation.
> > What I wrote kind of assumes that we have an explicit OrderAccess::release(); relaxed_cas(); in the forwarding table. Looks like we currently have a bounded releasing cas instead. The unbounded release is guaranteed to provide release semantics for _any_ subsequent store operation. However, a bounded Store-Release (i.e. stlr) instruction only needs to provide the release semantics to the bound store, as you say. So I suppose what I propose is to use release(); relaxed_cas(); in the forwarding table, and then also relaxed_cas(); in the self healing. Letting the release be done in only one place, and only when actual copying happens. The vast majority of self heals I have found are purely remapping the pointer lazily, which can then dodge any need for release when self healing. Hope this makes sense. Yes, it's better to use OrderAccess::release() with two relaxed_cas() calls, in the forwarding table and in self healing. I have made some changes and will run benchmarks afterwards. I believe your advice helps improve performance. ------------- PR: https://git.openjdk.java.net/jdk/pull/5046 From ayang at openjdk.java.net Tue Aug 10 14:56:58 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 10 Aug 2021 14:56:58 GMT Subject: RFR: 8272228: G1: G1CardSetInlinePtr Remove unreliable assertion [v2] In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 14:54:29 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this cleanup to remove unreliable asserts in G1CardSetInlinePtr. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Albert suggestion The JBS ticket title needs to be updated. ------------- Marked as reviewed by ayang (Committer).
PR: https://git.openjdk.java.net/jdk/pull/5069 From iwalulya at openjdk.java.net Tue Aug 10 14:56:57 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 10 Aug 2021 14:56:57 GMT Subject: RFR: 8272228: G1: G1CardSetInlinePtr Remove unreliable assertion [v2] In-Reply-To: References: Message-ID: > Hi all, > > Please review this cleanup to remove unreliable asserts in G1CardSetInlinePtr. Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Albert suggestion ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5069/files - new: https://git.openjdk.java.net/jdk/pull/5069/files/8cf7e8ee..4fb8670c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5069&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5069&range=00-01 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5069.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5069/head:pull/5069 PR: https://git.openjdk.java.net/jdk/pull/5069 From tschatzl at openjdk.java.net Tue Aug 10 15:09:28 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Aug 2021 15:09:28 GMT Subject: RFR: 8272228: G1: G1CardSetInlinePtr Remove unreliable assertion [v2] In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 14:56:57 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this cleanup to remove unreliable asserts in G1CardSetInlinePtr. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Albert suggestion Lgtm. Please fix the CR title. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5069 From iwalulya at openjdk.java.net Tue Aug 10 15:18:41 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 10 Aug 2021 15:18:41 GMT Subject: RFR: 8272231 G1: Refactor G1CardSet::get_card_set to return G1CardSetHashTableValue* Message-ID: Hi all, Please review this change to return G1CardSetHashTableValue* from G1CardSet::get_card_set. Consequently, we do not have to use G1CardSet::get_or_add_card_set in instances where we only need to get the table entry (such as in G1CardSet::transfer_cards_in_howl). Testing: Tier 1-5. ------------- Commit messages: - refactor G1CardSet::get_card_set Changes: https://git.openjdk.java.net/jdk/pull/5072/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5072&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272231 Stats: 22 lines in 2 files changed: 6 ins; 3 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/5072.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5072/head:pull/5072 PR: https://git.openjdk.java.net/jdk/pull/5072 From ayang at openjdk.java.net Tue Aug 10 16:31:04 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 10 Aug 2021 16:31:04 GMT Subject: RFR: 8272235: G1: update outdated code root fixup Message-ID: <2ZKpnNC6QMTJt0xrG99R33JLrZTdBQIrVN9oUW7r26o=.83c94abd-e8e0-45e2-9c3c-dfdc5f543fec@github.com> Simple change of updating outdated comments and certain renaming. 
Test: hotspot_gc ------------- Commit messages: - nmethod-list Changes: https://git.openjdk.java.net/jdk/pull/5073/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5073&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272235 Stats: 19 lines in 4 files changed: 7 ins; 0 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/5073.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5073/head:pull/5073 PR: https://git.openjdk.java.net/jdk/pull/5073 From rkennke at openjdk.java.net Tue Aug 10 16:47:28 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 10 Aug 2021 16:47:28 GMT Subject: RFR: 8272165: Consolidate mark_must_be_preserved() variants [v2] In-Reply-To: References: Message-ID: > We have markWord::mark_must_be_preserved() and markWord::mark_must_be_preserved_for_promotion_failure(). We used to have slightly different implementations when we had BiasedLocking, but now they are identical and used for identical purposes too. Let's remove what we don't need. > > Testing: > - [x] hotspot_gc > - [x] tier1 > - [x] tier2 Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8272165 - 8272165: Consolidate mark_must_be_preserved() variants ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5057/files - new: https://git.openjdk.java.net/jdk/pull/5057/files/ebeace12..e66ddc7c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5057&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5057&range=00-01 Stats: 8049 lines in 220 files changed: 6586 ins; 688 del; 775 mod Patch: https://git.openjdk.java.net/jdk/pull/5057.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5057/head:pull/5057 PR: https://git.openjdk.java.net/jdk/pull/5057 From tschatzl at openjdk.java.net Tue Aug 10 19:58:27 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Aug 2021 19:58:27 GMT Subject: RFR: 8272235: G1: update outdated code root fixup In-Reply-To: <2ZKpnNC6QMTJt0xrG99R33JLrZTdBQIrVN9oUW7r26o=.83c94abd-e8e0-45e2-9c3c-dfdc5f543fec@github.com> References: <2ZKpnNC6QMTJt0xrG99R33JLrZTdBQIrVN9oUW7r26o=.83c94abd-e8e0-45e2-9c3c-dfdc5f543fec@github.com> Message-ID: On Tue, 10 Aug 2021 15:36:22 GMT, Albert Mingkun Yang wrote: > Simple change of updating outdated comments and certain renaming. > > Test: hotspot_gc Lgtm. I do not know why the "additional builds" would fail, but there is no log either. src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 3724: > 3722: // G1RootProcessor object. By subtracting the WorkGang task from the total > 3723: // time of this scope, we get the "NMethod List Cleanup" time. Such list is > 3724: // constructed during "STW two-phase nmethod root processing", see more in Maybe s/such/this. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5073 From tschatzl at openjdk.java.net Tue Aug 10 20:00:25 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Aug 2021 20:00:25 GMT Subject: RFR: 8272231 G1: Refactor G1CardSet::get_card_set to return G1CardSetHashTableValue* In-Reply-To: References: Message-ID: <5hM1cy_MoHg5TrXVe6VEdBaxsk4HysB3MHz4AvQt9D4=.40d988a4-a178-4428-a9bd-0a1350aab57f@github.com> On Tue, 10 Aug 2021 15:11:02 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to return G1CardSetHashTableValue* from G1CardSet::get_card_set. Consequently, we do not have to use G1CardSet::get_or_add_card_set in instances where we only need to get the table entry (such as in G1CardSet::transfer_cards_in_howl). > > Testing: Tier 1-5. Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5072 From tschatzl at openjdk.java.net Tue Aug 10 20:01:29 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Aug 2021 20:01:29 GMT Subject: RFR: 8267833: Improve G1CardSetInlinePtr::add() In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 13:02:26 GMT, Ivan Walulya wrote: > Hi all, > > Please review this cleanup change to reduce the number of comparisons made when adding entries to an inline pointer cardset container. We allow the find to restart from a previously known index. > > Testing: Tier 1-3 Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5067 From tschatzl at openjdk.java.net Tue Aug 10 20:28:25 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Aug 2021 20:28:25 GMT Subject: RFR: 8271884: G1CH::_expand_heap_after_alloc_failure is no longer needed In-Reply-To: References: Message-ID: On Fri, 6 Aug 2021 10:43:19 GMT, Ivan Walulya wrote: > Hi, > > Please review this change to remove _expand_heap_after_alloc_failure. 
Checks for old_is_full(), survivor_is_full(), and G1CollectedHeap::has_more_regions() should eliminate repeated expansion attempts. > > Testing: tier 1 Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 167: > 165: HeapRegion* res = _hrm.allocate_free_region(type, node_index); > 166: > 167: if (res == NULL && do_expand) { Ivan and I were looking into any adverse impact when removing this, in particular the expansion being called very often without `_expand_heap_after_alloc_failure`. There is indeed an issue with convoying in both the survivor and old gen region allocation; the code that ultimately calls down here looks like the following: if (result == NULL && !old_is_full()) { MutexLocker x(FreeList_lock, Mutex::_no_safepoint_check_flag); result = old_gc_alloc_region()->attempt_allocation_locked(min_word_size, desired_word_size, actual_word_size); if (result == NULL) { set_old_full(); } } (likewise for survivor allocation) After the first check of `old_is_full()` we take the `FreeList_lock` and after that call down to the allocation. This is actually the problem: multiple threads can be waiting for the lock trying to allocate, and all of these waiters will be able to try to expand without that `_expand_heap_after_alloc_failure` flag. The solution is to check `old_is_full()` again after grabbing the lock - which is actually better than only finding out here quite late that there is nothing to allocate.
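The re-check-under-the-lock fix described above can be sketched as follows. This is a minimal single-threaded illustration using standard C++ stand-ins (`OldGenAllocatorSketch`, `free_list_lock`, and the fake region counter are hypothetical names), not the actual G1 code:

```cpp
#include <mutex>

// Hypothetical stand-ins for the G1 allocator state; not the HotSpot types.
struct OldGenAllocatorSketch {
  std::mutex free_list_lock;   // plays the role of FreeList_lock
  bool old_full = false;
  int free_regions = 1;        // pretend the old gen has one free region left

  // Returns a fake region index, or -1 when no region is available
  // (in G1 this is where a futile expansion attempt could otherwise start).
  int attempt_allocation_locked() {
    if (free_regions == 0) return -1;
    free_regions--;
    return 0;
  }

  int allocate() {
    if (old_full) return -1;                  // cheap check outside the lock
    std::lock_guard<std::mutex> x(free_list_lock);
    // Re-check under the lock: threads that queued on the lock while another
    // thread failed and set old_full bail out here, instead of all of them
    // falling through and retrying the allocation/expansion path.
    if (old_full) return -1;
    int result = attempt_allocation_locked();
    if (result == -1) old_full = true;
    return result;
  }
};
```

The second `old_full` check is what breaks the convoy: only the first thread through the lock pays for the failed allocation, and every later waiter returns immediately.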
------------- PR: https://git.openjdk.java.net/jdk/pull/5030 From ayang at openjdk.java.net Tue Aug 10 20:46:57 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 10 Aug 2021 20:46:57 GMT Subject: RFR: 8272235: G1: update outdated code root fixup [v2] In-Reply-To: <2ZKpnNC6QMTJt0xrG99R33JLrZTdBQIrVN9oUW7r26o=.83c94abd-e8e0-45e2-9c3c-dfdc5f543fec@github.com> References: <2ZKpnNC6QMTJt0xrG99R33JLrZTdBQIrVN9oUW7r26o=.83c94abd-e8e0-45e2-9c3c-dfdc5f543fec@github.com> Message-ID: > Simple change of updating outdated comments and certain renaming. > > Test: hotspot_gc Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5073/files - new: https://git.openjdk.java.net/jdk/pull/5073/files/bcd8b8e3..ce687d4b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5073&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5073&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5073.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5073/head:pull/5073 PR: https://git.openjdk.java.net/jdk/pull/5073 From ayang at openjdk.java.net Tue Aug 10 20:47:02 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 10 Aug 2021 20:47:02 GMT Subject: RFR: 8272235: G1: update outdated code root fixup [v2] In-Reply-To: References: <2ZKpnNC6QMTJt0xrG99R33JLrZTdBQIrVN9oUW7r26o=.83c94abd-e8e0-45e2-9c3c-dfdc5f543fec@github.com> Message-ID: On Tue, 10 Aug 2021 19:54:50 GMT, Thomas Schatzl wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 3724: > >> 3722: // G1RootProcessor object. By subtracting the WorkGang task from the total >> 3723: // time of this scope, we get the "NMethod List Cleanup" time. 
Such list is >> 3724: // constructed during "STW two-phase nmethod root processing", see more in > Maybe s/such/this. Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5073 From tschatzl at openjdk.java.net Tue Aug 10 20:57:24 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 10 Aug 2021 20:57:24 GMT Subject: RFR: 8271884: G1CH::_expand_heap_after_alloc_failure is no longer needed In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 20:24:42 GMT, Thomas Schatzl wrote: >> Hi, >> >> Please review this change to remove _expand_heap_after_alloc_failure. Checks for old_is_full(), survivor_is_full(), and G1CollectedHeap::has_more_regions() should eliminate repeated expansion attempts. >> >> Testing: tier 1 > > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 167: > >> 165: HeapRegion* res = _hrm.allocate_free_region(type, node_index); >> 166: >> 167: if (res == NULL && do_expand) { > > Ivan and I were looking into any adverse impact when removing this, in particular the expansion being called very often without `_expand_heap_after_alloc_failure`. > There is indeed an issue with convoying in both the survivor and old gen region allocation; the code that ultimately calls down here looks like the following: > > if (result == NULL && !old_is_full()) { > MutexLocker x(FreeList_lock, Mutex::_no_safepoint_check_flag); > result = old_gc_alloc_region()->attempt_allocation_locked(min_word_size, > desired_word_size, > actual_word_size); > if (result == NULL) { > set_old_full(); > } > } > (likewise for survivor allocation) > After the first check of `old_is_full()` we take the `FreeList_lock` and after that call down to the allocation. This is actually the problem: multiple threads can be waiting for the lock trying to allocate, and all of these waiters will be able to try to expand without that `_expand_heap_after_alloc_failure` flag.
> The solution is to check `old_is_full()` after grabbing the lock again - which is actually better than only finding out here quite late that there is nothing to allocate. We should probably check if there is a similar convoying effect for mutator allocation, possibly in a separate CR. ------------- PR: https://git.openjdk.java.net/jdk/pull/5030 From github.com+7947546+tanghaoth90 at openjdk.java.net Wed Aug 11 03:51:26 2021 From: github.com+7947546+tanghaoth90 at openjdk.java.net (Hao Tang) Date: Wed, 11 Aug 2021 03:51:26 GMT Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing [v5] In-Reply-To: References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: On Tue, 10 Aug 2021 14:33:56 GMT, Xiaowei Lu wrote: >> ZGC utilizes self-healing in load barrier to fix bad references. Currently, this fixing (ZBarrier::self_heal) adopts memory_order_conservative to guarantee that (1) the slow path (relocate, mark, etc., where addresses get healed) always happens before self healing, and (2) the other thread that accesses the same reference is able to access the healed address. >> Let us consider memory_order_release for ZBarrier::self_heal. For example, Thread 1 is fixing a reference, and Thread 2 attempts to access the same reference. There exists data dependency in T2 where access of pointer happens before access of object?s content, equaling acquire semantic. Pairing with the release semantic in self healing, this makes up inter-thread acquire-release memory ordering. As a result, the two statements we mentioned above can be guaranteed by the acquire-release ordering. >> We performed an experiment on benchmark corretto/heapothesys on AArch64. The optimized version results in both (1) shorter average concurrent mark time and (2) shorter average concurrent relocation time. Furthermore, we notice shorter average latency in almost all testcases. 
>> >> >> [root at localhost corretto]# grep "00.*Phase: Concurrent Mark " *.log >> baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Mark 960.359 / 960.359 587.203 / 1248.362 587.203 / 1248.362 587.203 / 1248.362 ms >> baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Mark 116.748 / 116.748 656.777 / 1736.469 656.777 / 1736.469 656.777 / 1736.469 ms >> baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Mark 125.460 / 125.460 620.440 / 1736.469 620.440 / 1736.469 620.440 / 1736.469 ms >> baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Mark 935.295 / 935.295 673.080 / 1736.469 673.080 / 1736.469 673.080 / 1736.469 ms >> baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Mark 1448.705 / 1448.705 723.484 / 1814.849 723.484 / 1814.849 723.484 / 1814.849 ms >> baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Mark 1490.123 / 1490.123 796.960 / 1842.794 796.960 / 1842.794 796.960 / 1842.794 ms >> baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 912.439 / 2183.065 867.799 / 2183.065 867.799 / 2183.065 ms >> baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Mark 1468.594 / 1468.594 990.044 / 2183.065 912.281 / 2183.065 912.281 / 2183.065 ms >> baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Mark 137.435 / 137.435 1109.116 / 2276.535 967.470 / 2276.535 967.470 / 2276.535 ms >> baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Mark 184.093 / 184.093 1172.446 / 2276.535 997.343 / 2276.535 997.343 / 2276.535 ms >> baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Mark 1537.673 / 1537.673 1211.815 / 2276.535 1013.076 / 2276.535 1013.076 / 2276.535 ms >> baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1218.085 / 2276.535 1025.443 / 2276.535 1025.443 / 2276.535 ms >> optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Mark 1053.065 / 1053.065 581.646 / 1249.822 581.646 / 1249.822 581.646 / 1249.822 ms >> 
optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Mark 885.795 / 885.795 573.650 / 1277.782 573.650 / 1277.782 573.650 / 1277.782 ms >> optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Mark 124.236 / 124.236 641.028 / 1828.124 641.028 / 1828.124 641.028 / 1828.124 ms >> optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Mark 875.383 / 875.383 666.465 / 1828.124 666.465 / 1828.124 666.465 / 1828.124 ms >> optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Mark 1937.305 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 ms >> optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Mark 173.064 / 173.064 771.387 / 1937.305 771.387 / 1937.305 771.387 / 1937.305 ms >> optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Mark 1832.584 / 1832.584 899.646 / 2048.471 856.838 / 2048.471 856.838 / 2048.471 ms >> optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Mark 1510.755 / 1510.755 981.807 / 2048.471 893.373 / 2048.471 893.373 / 2048.471 ms >> optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Mark 1472.737 / 1472.737 1044.755 / 2089.539 927.733 / 2089.539 927.733 / 2089.539 ms >> optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Mark 1513.077 / 1513.077 1095.827 / 2089.539 947.202 / 2089.539 947.202 / 2089.539 ms >> optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1073.703 / 2089.539 943.684 / 2089.539 943.684 / 2089.539 ms >> optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Mark 1337.865 / 1337.865 1119.936 / 2113.895 962.172 / 2113.895 962.172 / 2113.895 ms >> >> [root at localhost corretto]# grep "00.*Phase: Concurrent Relocate " *.log >> baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Relocate 196.522 / 196.522 114.318 / 245.371 114.318 / 245.371 114.318 / 245.371 ms >> baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Relocate 47.748 / 47.748 130.861 / 331.948 130.861 / 331.948 130.861 / 331.948 ms >> 
baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Relocate 56.922 / 56.922 129.174 / 331.948 129.174 / 331.948 129.174 / 331.948 ms >> baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Relocate 218.707 / 218.707 137.495 / 331.948 137.495 / 331.948 137.495 / 331.948 ms >> baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Relocate 197.166 / 197.166 144.216 / 359.644 144.216 / 359.644 144.216 / 359.644 ms >> baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Relocate 202.118 / 202.118 153.507 / 373.447 153.507 / 373.447 153.507 / 373.447 ms >> baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 172.241 / 395.113 164.291 / 395.113 164.291 / 395.113 ms >> baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Relocate 215.121 / 215.121 186.007 / 421.039 173.139 / 421.039 173.139 / 421.039 ms >> baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Relocate 48.550 / 48.550 203.420 / 421.982 181.899 / 421.982 181.899 / 421.982 ms >> baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Relocate 53.847 / 53.847 211.774 / 421.982 185.728 / 421.982 185.728 / 421.982 ms >> baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Relocate 224.489 / 224.489 218.195 / 431.088 188.087 / 431.088 188.087 / 431.088 ms >> baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 222.852 / 431.088 191.130 / 431.088 191.130 / 431.088 ms >> optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Relocate 193.811 / 193.811 113.043 / 248.471 113.043 / 248.471 113.043 / 248.471 ms >> optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Relocate 196.220 / 196.220 117.810 / 248.471 117.810 / 248.471 117.810 / 248.471 ms >> optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Relocate 48.786 / 48.786 131.753 / 351.890 131.753 / 351.890 131.753 / 351.890 ms >> optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Relocate 195.302 / 195.302 139.115 / 351.890 
139.115 / 351.890 139.115 / 351.890 ms >> optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Relocate 374.022 / 374.022 155.204 / 374.022 155.204 / 374.022 155.204 / 374.022 ms >> optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Relocate 49.222 / 49.222 159.444 / 400.795 159.444 / 400.795 159.444 / 400.795 ms >> optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Relocate 381.072 / 381.072 182.488 / 409.086 173.140 / 409.086 173.140 / 409.086 ms >> optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Relocate 223.399 / 223.399 191.774 / 409.086 175.748 / 409.086 175.748 / 409.086 ms >> optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Relocate 214.184 / 214.184 201.526 / 409.086 181.302 / 409.086 181.302 / 409.086 ms >> optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Relocate 208.600 / 208.600 207.389 / 420.479 183.756 / 420.479 183.756 / 420.479 ms >> optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Relocate 209.444 / 209.444 202.367 / 420.479 183.173 / 420.479 183.173 / 420.479 ms >> optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Relocate 223.841 / 223.841 206.268 / 420.479 185.074 / 420.479 185.074 / 420.479 ms >> >> [root at localhost corretto]# grep "average latency:" nohup_baseline.out >> average latency: 2ms:40us >> average latency: 6ms:550us >> average latency: 6ms:543us >> average latency: 6ms:493us >> average latency: 928us >> average latency: 794us >> average latency: 1ms:403us >> average latency: 23ms:216us >> average latency: 775us >> [root at localhost corretto]# grep "average latency:" nohup_optimized.out >> average latency: 2ms:48us >> average latency: 5ms:948us >> average latency: 5ms:940us >> average latency: 5ms:875us >> average latency: 850us >> average latency: 723us >> average latency: 1ms:221us >> average latency: 22ms:653us >> average latency: 693us > > Xiaowei Lu has updated the pull request incrementally with one additional commit since the last revision: > 
> replace release() to a single place in insert() src/hotspot/share/gc/z/zForwarding.inline.hpp line 141: > 139: // Make sure that object copy is finished > 140: // before forwarding table installation > 141: OrderAccess::release(); Can we replace it by `OrderAccess::storestore()`? I think LoadStore constraint in `OrderAccess::release()` is not required here. ------------- PR: https://git.openjdk.java.net/jdk/pull/5046 From github.com+39413832+weixlu at openjdk.java.net Wed Aug 11 06:28:42 2021 From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu) Date: Wed, 11 Aug 2021 06:28:42 GMT Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing [v5] In-Reply-To: References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: <58nzCNHermRZrBS09J5J2ObMu4cBUqExCd2DGkzTYTI=.6717427c-0c74-4ddc-bf27-bf1dc2babbbc@github.com> On Tue, 10 Aug 2021 14:33:56 GMT, Xiaowei Lu wrote: >> ZGC utilizes self-healing in load barrier to fix bad references. Currently, this fixing (ZBarrier::self_heal) adopts memory_order_conservative to guarantee that (1) the slow path (relocate, mark, etc., where addresses get healed) always happens before self healing, and (2) the other thread that accesses the same reference is able to access the healed address. >> Let us consider memory_order_release for ZBarrier::self_heal. For example, Thread 1 is fixing a reference, and Thread 2 attempts to access the same reference. There exists data dependency in T2 where access of pointer happens before access of object?s content, equaling acquire semantic. Pairing with the release semantic in self healing, this makes up inter-thread acquire-release memory ordering. As a result, the two statements we mentioned above can be guaranteed by the acquire-release ordering. >> We performed an experiment on benchmark corretto/heapothesys on AArch64. 
The optimized version results in both (1) shorter average concurrent mark time and (2) shorter average concurrent relocation time. Furthermore, we notice shorter average latency in almost all testcases. >> >> >> [root at localhost corretto]# grep "00.*Phase: Concurrent Mark " *.log >> baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Mark 960.359 / 960.359 587.203 / 1248.362 587.203 / 1248.362 587.203 / 1248.362 ms >> baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Mark 116.748 / 116.748 656.777 / 1736.469 656.777 / 1736.469 656.777 / 1736.469 ms >> baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Mark 125.460 / 125.460 620.440 / 1736.469 620.440 / 1736.469 620.440 / 1736.469 ms >> baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Mark 935.295 / 935.295 673.080 / 1736.469 673.080 / 1736.469 673.080 / 1736.469 ms >> baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Mark 1448.705 / 1448.705 723.484 / 1814.849 723.484 / 1814.849 723.484 / 1814.849 ms >> baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Mark 1490.123 / 1490.123 796.960 / 1842.794 796.960 / 1842.794 796.960 / 1842.794 ms >> baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 912.439 / 2183.065 867.799 / 2183.065 867.799 / 2183.065 ms >> baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Mark 1468.594 / 1468.594 990.044 / 2183.065 912.281 / 2183.065 912.281 / 2183.065 ms >> baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Mark 137.435 / 137.435 1109.116 / 2276.535 967.470 / 2276.535 967.470 / 2276.535 ms >> baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Mark 184.093 / 184.093 1172.446 / 2276.535 997.343 / 2276.535 997.343 / 2276.535 ms >> baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Mark 1537.673 / 1537.673 1211.815 / 2276.535 1013.076 / 2276.535 1013.076 / 2276.535 ms >> baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1218.085 / 2276.535 1025.443 
/ 2276.535 1025.443 / 2276.535 ms >> optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Mark 1053.065 / 1053.065 581.646 / 1249.822 581.646 / 1249.822 581.646 / 1249.822 ms >> optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Mark 885.795 / 885.795 573.650 / 1277.782 573.650 / 1277.782 573.650 / 1277.782 ms >> optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Mark 124.236 / 124.236 641.028 / 1828.124 641.028 / 1828.124 641.028 / 1828.124 ms >> optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Mark 875.383 / 875.383 666.465 / 1828.124 666.465 / 1828.124 666.465 / 1828.124 ms >> optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Mark 1937.305 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 ms >> optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Mark 173.064 / 173.064 771.387 / 1937.305 771.387 / 1937.305 771.387 / 1937.305 ms >> optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Mark 1832.584 / 1832.584 899.646 / 2048.471 856.838 / 2048.471 856.838 / 2048.471 ms >> optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Mark 1510.755 / 1510.755 981.807 / 2048.471 893.373 / 2048.471 893.373 / 2048.471 ms >> optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Mark 1472.737 / 1472.737 1044.755 / 2089.539 927.733 / 2089.539 927.733 / 2089.539 ms >> optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Mark 1513.077 / 1513.077 1095.827 / 2089.539 947.202 / 2089.539 947.202 / 2089.539 ms >> optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1073.703 / 2089.539 943.684 / 2089.539 943.684 / 2089.539 ms >> optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Mark 1337.865 / 1337.865 1119.936 / 2113.895 962.172 / 2113.895 962.172 / 2113.895 ms >> >> [root at localhost corretto]# grep "00.*Phase: Concurrent Relocate " *.log >> baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Relocate 196.522 / 196.522 114.318 / 245.371 114.318 
/ 245.371 114.318 / 245.371 ms >> baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Relocate 47.748 / 47.748 130.861 / 331.948 130.861 / 331.948 130.861 / 331.948 ms >> baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Relocate 56.922 / 56.922 129.174 / 331.948 129.174 / 331.948 129.174 / 331.948 ms >> baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Relocate 218.707 / 218.707 137.495 / 331.948 137.495 / 331.948 137.495 / 331.948 ms >> baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Relocate 197.166 / 197.166 144.216 / 359.644 144.216 / 359.644 144.216 / 359.644 ms >> baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Relocate 202.118 / 202.118 153.507 / 373.447 153.507 / 373.447 153.507 / 373.447 ms >> baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 172.241 / 395.113 164.291 / 395.113 164.291 / 395.113 ms >> baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Relocate 215.121 / 215.121 186.007 / 421.039 173.139 / 421.039 173.139 / 421.039 ms >> baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Relocate 48.550 / 48.550 203.420 / 421.982 181.899 / 421.982 181.899 / 421.982 ms >> baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Relocate 53.847 / 53.847 211.774 / 421.982 185.728 / 421.982 185.728 / 421.982 ms >> baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Relocate 224.489 / 224.489 218.195 / 431.088 188.087 / 431.088 188.087 / 431.088 ms >> baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 222.852 / 431.088 191.130 / 431.088 191.130 / 431.088 ms >> optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Relocate 193.811 / 193.811 113.043 / 248.471 113.043 / 248.471 113.043 / 248.471 ms >> optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Relocate 196.220 / 196.220 117.810 / 248.471 117.810 / 248.471 117.810 / 248.471 ms >> optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Relocate 48.786 / 
48.786 131.753 / 351.890 131.753 / 351.890 131.753 / 351.890 ms >> optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Relocate 195.302 / 195.302 139.115 / 351.890 139.115 / 351.890 139.115 / 351.890 ms >> optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Relocate 374.022 / 374.022 155.204 / 374.022 155.204 / 374.022 155.204 / 374.022 ms >> optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Relocate 49.222 / 49.222 159.444 / 400.795 159.444 / 400.795 159.444 / 400.795 ms >> optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Relocate 381.072 / 381.072 182.488 / 409.086 173.140 / 409.086 173.140 / 409.086 ms >> optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Relocate 223.399 / 223.399 191.774 / 409.086 175.748 / 409.086 175.748 / 409.086 ms >> optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Relocate 214.184 / 214.184 201.526 / 409.086 181.302 / 409.086 181.302 / 409.086 ms >> optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Relocate 208.600 / 208.600 207.389 / 420.479 183.756 / 420.479 183.756 / 420.479 ms >> optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Relocate 209.444 / 209.444 202.367 / 420.479 183.173 / 420.479 183.173 / 420.479 ms >> optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Relocate 223.841 / 223.841 206.268 / 420.479 185.074 / 420.479 185.074 / 420.479 ms >> >> [root at localhost corretto]# grep "average latency:" nohup_baseline.out >> average latency: 2ms:40us >> average latency: 6ms:550us >> average latency: 6ms:543us >> average latency: 6ms:493us >> average latency: 928us >> average latency: 794us >> average latency: 1ms:403us >> average latency: 23ms:216us >> average latency: 775us >> [root at localhost corretto]# grep "average latency:" nohup_optimized.out >> average latency: 2ms:48us >> average latency: 5ms:948us >> average latency: 5ms:940us >> average latency: 5ms:875us >> average latency: 850us >> average latency: 723us >> average latency: 
1ms:221us >> average latency: 22ms:653us >> average latency: 693us > > Xiaowei Lu has updated the pull request incrementally with one additional commit since the last revision: > > replace release() to a single place in insert() After changing the bounded release to unbounded and relaxed_cas, concurrent mark/relocate time has decreased by 10%-20% according to corretto/heapothesys benchmark. In contrast, the original optimization only witnessed 5%-10% decrease. ------------- PR: https://git.openjdk.java.net/jdk/pull/5046 From eosterlund at openjdk.java.net Wed Aug 11 06:34:26 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 11 Aug 2021 06:34:26 GMT Subject: RFR: 8272138: ZGC: Adopt release ordering for self-healing [v5] In-Reply-To: References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: On Tue, 10 Aug 2021 14:33:56 GMT, Xiaowei Lu wrote: >> ZGC utilizes self-healing in load barrier to fix bad references. Currently, this fixing (ZBarrier::self_heal) adopts memory_order_conservative to guarantee that (1) the slow path (relocate, mark, etc., where addresses get healed) always happens before self healing, and (2) the other thread that accesses the same reference is able to access the healed address. >> Let us consider memory_order_release for ZBarrier::self_heal. For example, Thread 1 is fixing a reference, and Thread 2 attempts to access the same reference. There exists data dependency in T2 where access of pointer happens before access of object?s content, equaling acquire semantic. Pairing with the release semantic in self healing, this makes up inter-thread acquire-release memory ordering. As a result, the two statements we mentioned above can be guaranteed by the acquire-release ordering. >> We performed an experiment on benchmark corretto/heapothesys on AArch64. 
The optimized version results in both (1) shorter average concurrent mark time and (2) shorter average concurrent relocation time. Furthermore, we notice shorter average latency in almost all testcases. >> >> >> [root at localhost corretto]# grep "00.*Phase: Concurrent Mark " *.log >> baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Mark 960.359 / 960.359 587.203 / 1248.362 587.203 / 1248.362 587.203 / 1248.362 ms >> baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Mark 116.748 / 116.748 656.777 / 1736.469 656.777 / 1736.469 656.777 / 1736.469 ms >> baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Mark 125.460 / 125.460 620.440 / 1736.469 620.440 / 1736.469 620.440 / 1736.469 ms >> baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Mark 935.295 / 935.295 673.080 / 1736.469 673.080 / 1736.469 673.080 / 1736.469 ms >> baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Mark 1448.705 / 1448.705 723.484 / 1814.849 723.484 / 1814.849 723.484 / 1814.849 ms >> baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Mark 1490.123 / 1490.123 796.960 / 1842.794 796.960 / 1842.794 796.960 / 1842.794 ms >> baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 912.439 / 2183.065 867.799 / 2183.065 867.799 / 2183.065 ms >> baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Mark 1468.594 / 1468.594 990.044 / 2183.065 912.281 / 2183.065 912.281 / 2183.065 ms >> baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Mark 137.435 / 137.435 1109.116 / 2276.535 967.470 / 2276.535 967.470 / 2276.535 ms >> baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Mark 184.093 / 184.093 1172.446 / 2276.535 997.343 / 2276.535 997.343 / 2276.535 ms >> baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Mark 1537.673 / 1537.673 1211.815 / 2276.535 1013.076 / 2276.535 1013.076 / 2276.535 ms >> baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1218.085 / 2276.535 1025.443 
/ 2276.535 1025.443 / 2276.535 ms >> optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Mark 1053.065 / 1053.065 581.646 / 1249.822 581.646 / 1249.822 581.646 / 1249.822 ms >> optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Mark 885.795 / 885.795 573.650 / 1277.782 573.650 / 1277.782 573.650 / 1277.782 ms >> optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Mark 124.236 / 124.236 641.028 / 1828.124 641.028 / 1828.124 641.028 / 1828.124 ms >> optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Mark 875.383 / 875.383 666.465 / 1828.124 666.465 / 1828.124 666.465 / 1828.124 ms >> optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Mark 1937.305 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 754.228 / 1937.305 ms >> optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Mark 173.064 / 173.064 771.387 / 1937.305 771.387 / 1937.305 771.387 / 1937.305 ms >> optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Mark 1832.584 / 1832.584 899.646 / 2048.471 856.838 / 2048.471 856.838 / 2048.471 ms >> optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Mark 1510.755 / 1510.755 981.807 / 2048.471 893.373 / 2048.471 893.373 / 2048.471 ms >> optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Mark 1472.737 / 1472.737 1044.755 / 2089.539 927.733 / 2089.539 927.733 / 2089.539 ms >> optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Mark 1513.077 / 1513.077 1095.827 / 2089.539 947.202 / 2089.539 947.202 / 2089.539 ms >> optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 1073.703 / 2089.539 943.684 / 2089.539 943.684 / 2089.539 ms >> optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Mark 1337.865 / 1337.865 1119.936 / 2113.895 962.172 / 2113.895 962.172 / 2113.895 ms >> >> [root at localhost corretto]# grep "00.*Phase: Concurrent Relocate " *.log >> baseline.log:[100.412s][info][gc,stats ] Phase: Concurrent Relocate 196.522 / 196.522 114.318 / 245.371 114.318 
/ 245.371 114.318 / 245.371 ms >> baseline.log:[200.411s][info][gc,stats ] Phase: Concurrent Relocate 47.748 / 47.748 130.861 / 331.948 130.861 / 331.948 130.861 / 331.948 ms >> baseline.log:[300.411s][info][gc,stats ] Phase: Concurrent Relocate 56.922 / 56.922 129.174 / 331.948 129.174 / 331.948 129.174 / 331.948 ms >> baseline.log:[400.411s][info][gc,stats ] Phase: Concurrent Relocate 218.707 / 218.707 137.495 / 331.948 137.495 / 331.948 137.495 / 331.948 ms >> baseline.log:[500.411s][info][gc,stats ] Phase: Concurrent Relocate 197.166 / 197.166 144.216 / 359.644 144.216 / 359.644 144.216 / 359.644 ms >> baseline.log:[600.411s][info][gc,stats ] Phase: Concurrent Relocate 202.118 / 202.118 153.507 / 373.447 153.507 / 373.447 153.507 / 373.447 ms >> baseline.log:[700.411s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 172.241 / 395.113 164.291 / 395.113 164.291 / 395.113 ms >> baseline.log:[800.412s][info][gc,stats ] Phase: Concurrent Relocate 215.121 / 215.121 186.007 / 421.039 173.139 / 421.039 173.139 / 421.039 ms >> baseline.log:[900.411s][info][gc,stats ] Phase: Concurrent Relocate 48.550 / 48.550 203.420 / 421.982 181.899 / 421.982 181.899 / 421.982 ms >> baseline.log:[1000.411s][info][gc,stats ] Phase: Concurrent Relocate 53.847 / 53.847 211.774 / 421.982 185.728 / 421.982 185.728 / 421.982 ms >> baseline.log:[1100.411s][info][gc,stats ] Phase: Concurrent Relocate 224.489 / 224.489 218.195 / 431.088 188.087 / 431.088 188.087 / 431.088 ms >> baseline.log:[1200.412s][info][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 222.852 / 431.088 191.130 / 431.088 191.130 / 431.088 ms >> optimized.log:[100.423s][info][gc,stats ] Phase: Concurrent Relocate 193.811 / 193.811 113.043 / 248.471 113.043 / 248.471 113.043 / 248.471 ms >> optimized.log:[200.423s][info][gc,stats ] Phase: Concurrent Relocate 196.220 / 196.220 117.810 / 248.471 117.810 / 248.471 117.810 / 248.471 ms >> optimized.log:[300.423s][info][gc,stats ] Phase: Concurrent Relocate 48.786 / 
48.786 131.753 / 351.890 131.753 / 351.890 131.753 / 351.890 ms >> optimized.log:[400.423s][info][gc,stats ] Phase: Concurrent Relocate 195.302 / 195.302 139.115 / 351.890 139.115 / 351.890 139.115 / 351.890 ms >> optimized.log:[500.423s][info][gc,stats ] Phase: Concurrent Relocate 374.022 / 374.022 155.204 / 374.022 155.204 / 374.022 155.204 / 374.022 ms >> optimized.log:[600.423s][info][gc,stats ] Phase: Concurrent Relocate 49.222 / 49.222 159.444 / 400.795 159.444 / 400.795 159.444 / 400.795 ms >> optimized.log:[700.423s][info][gc,stats ] Phase: Concurrent Relocate 381.072 / 381.072 182.488 / 409.086 173.140 / 409.086 173.140 / 409.086 ms >> optimized.log:[800.423s][info][gc,stats ] Phase: Concurrent Relocate 223.399 / 223.399 191.774 / 409.086 175.748 / 409.086 175.748 / 409.086 ms >> optimized.log:[900.423s][info][gc,stats ] Phase: Concurrent Relocate 214.184 / 214.184 201.526 / 409.086 181.302 / 409.086 181.302 / 409.086 ms >> optimized.log:[1000.423s][info][gc,stats ] Phase: Concurrent Relocate 208.600 / 208.600 207.389 / 420.479 183.756 / 420.479 183.756 / 420.479 ms >> optimized.log:[1100.423s][info][gc,stats ] Phase: Concurrent Relocate 209.444 / 209.444 202.367 / 420.479 183.173 / 420.479 183.173 / 420.479 ms >> optimized.log:[1200.423s][info][gc,stats ] Phase: Concurrent Relocate 223.841 / 223.841 206.268 / 420.479 185.074 / 420.479 185.074 / 420.479 ms >> >> [root at localhost corretto]# grep "average latency:" nohup_baseline.out >> average latency: 2ms:40us >> average latency: 6ms:550us >> average latency: 6ms:543us >> average latency: 6ms:493us >> average latency: 928us >> average latency: 794us >> average latency: 1ms:403us >> average latency: 23ms:216us >> average latency: 775us >> [root at localhost corretto]# grep "average latency:" nohup_optimized.out >> average latency: 2ms:48us >> average latency: 5ms:948us >> average latency: 5ms:940us >> average latency: 5ms:875us >> average latency: 850us >> average latency: 723us >> average latency: 
1ms:221us >> average latency: 22ms:653us >> average latency: 693us > > Xiaowei Lu has updated the pull request incrementally with one additional commit since the last revision: > > replace release() to a single place in insert() Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5046 From pliden at openjdk.java.net Wed Aug 11 08:26:56 2021 From: pliden at openjdk.java.net (Per Liden) Date: Wed, 11 Aug 2021 08:26:56 GMT Subject: RFR: 8267186: Add string deduplication support to ZGC [v4] In-Reply-To: References: Message-ID: > This change adds support for string deduplication to ZGC. It's a pretty straight forward change, but to make reviewing even easier it is broken up into two commits: > > __ZGC: Introduce ZMarkContext__ > This commit just moves the `ZMarkCache` into the new `ZMarkContext` class, and we now pass a `ZMarkContext*` around instead of a `ZMarkCache*`. The `ZMarkContext` class is a more general container for worker-local data, which in the next commit will be extended to also include the `StringDedup::Requests` queue. > > __8267186: Add string deduplication support to ZGC__ > This commits adds the actual string dedup functionality and enables relevant tests. We use the `deduplication_requested` bit in the `String` object to filter out `Strings` we've already attempted to deduplicate. > > Testing: > - Passes all string dedup tests. > - Passes Tier1-7 with ZGC on Linux/x86_64 (with -XX:+UseStringDeduplication enabled by default to get better exposure). 
Per Liden has updated the pull request incrementally with one additional commit since the last revision: Review comments from StefanK ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5029/files - new: https://git.openjdk.java.net/jdk/pull/5029/files/c164f2c9..ea61a000 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5029&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5029&range=02-03 Stats: 60 lines in 3 files changed: 28 ins; 27 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/5029.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5029/head:pull/5029 PR: https://git.openjdk.java.net/jdk/pull/5029 From pliden at openjdk.java.net Wed Aug 11 08:29:27 2021 From: pliden at openjdk.java.net (Per Liden) Date: Wed, 11 Aug 2021 08:29:27 GMT Subject: RFR: 8272138: ZGC: Adopt relaxed ordering for self-healing [v5] In-Reply-To: References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: <0Jtwh7ieWFPVvTXYCX1bFq-bjFnypGI2AsOxuHRFj6s=.82a5468b-21da-403b-9ff5-488fb46d0403@github.com> On Tue, 10 Aug 2021 14:33:56 GMT, Xiaowei Lu wrote: >> ZGC utilizes self-healing in load barrier to fix references to old objects. Currently, this fixing (ZBarrier::self_heal) adopts memory_order_conservative to guarantee that (1) the slow path (relocate, mark, etc. where we get healed address) always happens before self healing, and (2) the other thread that accesses the same reference is able to access the content at the healed address. >> Let us consider changing the bounded releasing CAS in forwarding table installation to unbounded. In this way, the release (OrderAccess::release()) serves as a membar to block any subsequent store operations (forwarding table installation and self healing). More specifically, we can use release(), relaxed_cas() in the forwarding table and relaxed_cas() in the self healing. 
In the scenario when a thread reads healed address from forwarding table, relaxed_cas() is also fine due to the load-acquire in forwarding table. >> Since release is done only in forwarding table, the majority of self heals, which only remap the pointer, is no longer restricted by the memory order. Performance is visibly improved, demonstrated by benchmark corretto/heapothesys on AArch64. The optimized version decreases both average concurrent mark/relocation time by 15%-20% and 10%-15%, respectively? >> >> baseline >> [1000.412s][info ][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 161.031 / 232.546 147.641 / 289.855 147.641 / 289.855 ms >> [1100.412s][info ][gc,stats ] Phase: Concurrent Relocate 220.739 / 220.739 162.373 / 232.546 149.909 / 289.855 149.909 / 289.855 ms >> [1200.411s][info ][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 157.900 / 232.546 148.912 / 289.855 148.912 / 289.855 ms >> [1000.412s][info ][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 816.505 / 1508.676 759.533 / 1567.506 759.533 / 1567.506 ms >> [1100.412s][info ][gc,stats ] Phase: Concurrent Mark 1262.680 / 1262.680 838.750 / 1508.676 772.495 / 1567.506 772.495 / 1567.506 ms >> [1200.411s][info ][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 813.220 / 1422.825 769.346 / 1567.506 769.346 / 1567.506 ms >> optimized >> [1000.447s][info ][gc,stats ] Phase: Concurrent Relocate 248.035 / 248.035 162.719 / 253.498 124.061 / 273.227 124.061 / 273.227 ms >> [1100.447s][info ][gc,stats ] Phase: Concurrent Relocate 217.434 / 217.434 161.209 / 253.498 126.767 / 273.227 126.767 / 273.227 ms >> [1200.447s][info ][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 164.023 / 253.498 129.314 / 273.227 129.314 / 273.227 ms >> [1000.447s][info ][gc,stats ] Phase: Concurrent Mark 1224.059 / 1224.059 809.838 / 1518.214 582.867 / 1518.214 582.867 / 1518.214 ms >> [1100.447s][info ][gc,stats ] Phase: Concurrent Mark 1302.235 / 1302.235 793.589 / 1518.214 600.215 / 1518.214 600.215 / 1518.214 
ms >> [1200.447s][info ][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 821.320 / 1518.214 615.187 / 1518.214 615.187 / 1518.214 ms > > Xiaowei Lu has updated the pull request incrementally with one additional commit since the last revision: > > replace release() to a single place in insert() Looks good. ------------- Marked as reviewed by pliden (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5046 From pliden at openjdk.java.net Wed Aug 11 08:31:36 2021 From: pliden at openjdk.java.net (Per Liden) Date: Wed, 11 Aug 2021 08:31:36 GMT Subject: RFR: 8267186: Add string deduplication support to ZGC [v3] In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 11:40:55 GMT, Stefan Karlsson wrote: > [...] I think it would be good to let this class be a simpler data carrier, and move the extra code/logic in ZMarkContext::try_deduplicate out to ZMark, or maybe even a new ZStringDedup class. Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5029 From github.com+39413832+weixlu at openjdk.java.net Wed Aug 11 08:34:30 2021 From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu) Date: Wed, 11 Aug 2021 08:34:30 GMT Subject: RFR: 8272138: ZGC: Adopt relaxed ordering for self-healing In-Reply-To: References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: On Tue, 10 Aug 2021 09:29:04 GMT, Erik Österlund wrote: >>> @fisk Thanks a lot for your detailed explanation. But I'm quite confused about the release in the forwarding table, why is it able to act as a release for both accesses? In my view, since the release is "bonded" to the forwarding table, it only ensures that copy happens before installing the forwardee, why does it have something to do with self healing? From your explanation, I guess release is not "bonded" to the forwarding table. Instead, it maybe serves as a membar to block all the CASes afterwards. Thanks again for your patient explanation.
>> >> What I wrote kind of assumes that we have an explicit OrderAccess::release(); relaxed_cas(); in the forwarding table. Looks like we currently have a bounded releasing cas instead. The unbounded release is guaranteed to provide release semantics for *any* subsequent store operation. However, a bounded Store-Release (i.e. stlr) instruction only needs to provide the release semantics to the bound store, as you say. So I suppose what I propose is to use release(); relaxed_cas(); in the forwarding table, and then also relaxed_cas(); in the self healing. Letting the release be done in only one place, and only when actual copying happens. The vast majority of self heals I have found are purely remapping the pointer lazily, which can then dodge any need for release when self healing. Hope this makes sense. > >> @fisk Hi, Erik. We are wondering if one thread loading a healed pointer can observe that the corresponding copy has not finished yet. Assuming relaxed ordering for `cas_self_heal`, both Thread A and Thread B are loading the same reference. >> >> **Thread A**: `load obj.fld; // will relocate the object referenced by obj.fld` >> thread A will do the following: >> >> ``` >> 1 copy(); >> 2 cas_forwarding_table(); // release >> 3 cas_self_heal(); // relaxed >> ``` >> >> **Thread B**: `load obj.fld; // load the same reference` >> thread B may observe the following reordering of **thread A**: >> >> ``` >> 3 cas_self_heal(); // relaxed >> 1 copy(); >> 2 cas_forwarding_table(); // release >> ``` >> >> To our knowledge, release ordering in _line 2_ does not prevent _line 3_ from being reordered before _line 1_, which indicates the release in the forwarding table is not enough. Perhaps we need to add acquire ordering to _line 2_ or add release ordering to _line 3_. >> >> In another way, as @weixlu said, >> >> > Instead, it maybe serves as a membar to block all the CASes afterwards.
>> >> relaxed ordering in _line 2_ along with release ordering in _line 3_ can indeed ensure thread B always observes the object copy. >> >> Looking forward to your advice. > > Yeah so I was kind of assuming the forwarding table installation would be an unbounded release(); relaxed_cas(); > That way, the release serves a dual purpose: releasing for the table and releasing for the self heal. That way the vast majority of self heals (that only do remapping) won't need to release. @fisk @pliden Thanks for your advice and reviews. Could you please sponsor this change? ------------- PR: https://git.openjdk.java.net/jdk/pull/5046 From iwalulya at openjdk.java.net Wed Aug 11 08:39:06 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 11 Aug 2021 08:39:06 GMT Subject: RFR: 8271884: G1CH::_expand_heap_after_alloc_failure is no longer needed [v2] In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 20:54:49 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 167: >> >>> 165: HeapRegion* res = _hrm.allocate_free_region(type, node_index); >>> 166: >>> 167: if (res == NULL && do_expand) { >> >> Ivan and I were looking into any adverse impact when removing this, in particular the expansion being called very often without `_expand_heap_after_alloc_failure`. >> There is indeed an issue with convoying in both the survivor and old gen region allocation; the code that ultimately calls down here looks like the following: >> >> ``` >> if (result == NULL && !old_is_full()) { >> MutexLocker x(FreeList_lock, Mutex::_no_safepoint_check_flag); >> result = old_gc_alloc_region()->attempt_allocation_locked(min_word_size, >> desired_word_size, >> actual_word_size); >> if (result == NULL) { >> set_old_full(); >> } >> } >> ``` >> (likewise for survivor allocation) >> After the first check of `old_is_full()` we get the `FreeList_lock` and after that call down to the allocation.
This is actually the problem: multiple threads can be waiting for the lock trying to allocate, and all of these waiters will be able to try to expand without that `_expand_heap_after_alloc_failure` flag. >> The solution is to check `old_is_full()` after grabbing the lock again - which is actually better than only finding out here quite late that there is nothing to allocate. > > We should probably check if there is a similar convoying effect for mutator allocation, possibly in a separate CR. Thanks @tschatzl for the detailed explanation. Fixed! ------------- PR: https://git.openjdk.java.net/jdk/pull/5030 From iwalulya at openjdk.java.net Wed Aug 11 08:39:03 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 11 Aug 2021 08:39:03 GMT Subject: RFR: 8271884: G1CH::_expand_heap_after_alloc_failure is no longer needed [v2] In-Reply-To: References: Message-ID: > Hi, > > Please review this change to remove _expand_heap_after_alloc_failure. Checks for old_is_full(), survivor_is_full(), and G1CollectedHeap::has_more_regions() should eliminate repeated expansion attempts. 
> > Testing: tier 1 Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: check *_is_full status after acquiring FreeList_lock ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5030/files - new: https://git.openjdk.java.net/jdk/pull/5030/files/df0ef880..096e815f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5030&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5030&range=00-01 Stats: 14 lines in 1 file changed: 4 ins; 0 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/5030.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5030/head:pull/5030 PR: https://git.openjdk.java.net/jdk/pull/5030 From github.com+39413832+weixlu at openjdk.java.net Wed Aug 11 08:46:35 2021 From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu) Date: Wed, 11 Aug 2021 08:46:35 GMT Subject: Integrated: 8272138: ZGC: Adopt relaxed ordering for self-healing In-Reply-To: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: <5zIqvOrMEBsQYQS8nZ-VloZFSV5H3Trzbx4KBYcMDcs=.71e354de-3c20-41ff-b901-ad0259f5cf00@github.com> On Mon, 9 Aug 2021 08:19:35 GMT, Xiaowei Lu wrote: > ZGC utilizes self-healing in load barrier to fix references to old objects. Currently, this fixing (ZBarrier::self_heal) adopts memory_order_conservative to guarantee that (1) the slow path (relocate, mark, etc. where we get healed address) always happens before self healing, and (2) the other thread that accesses the same reference is able to access the content at the healed address. > Let us consider changing the bounded releasing CAS in forwarding table installation to unbounded. In this way, the release (OrderAccess::release()) serves as a membar to block any subsequent store operations (forwarding table installation and self healing).
More specifically, we can use release(), relaxed_cas() in the forwarding table and relaxed_cas() in the self healing. In the scenario when a thread reads the healed address from the forwarding table, relaxed_cas() is also fine due to the load-acquire in the forwarding table. > Since the release is done only in the forwarding table, the majority of self heals, which only remap the pointer, are no longer restricted by the memory order. Performance is visibly improved, as demonstrated by the benchmark corretto/heapothesys on AArch64. The optimized version decreases the average concurrent mark and concurrent relocation times by 15%-20% and 10%-15%, respectively. > > baseline > [1000.412s][info ][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 161.031 / 232.546 147.641 / 289.855 147.641 / 289.855 ms > [1100.412s][info ][gc,stats ] Phase: Concurrent Relocate 220.739 / 220.739 162.373 / 232.546 149.909 / 289.855 149.909 / 289.855 ms > [1200.411s][info ][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 157.900 / 232.546 148.912 / 289.855 148.912 / 289.855 ms > [1000.412s][info ][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 816.505 / 1508.676 759.533 / 1567.506 759.533 / 1567.506 ms > [1100.412s][info ][gc,stats ] Phase: Concurrent Mark 1262.680 / 1262.680 838.750 / 1508.676 772.495 / 1567.506 772.495 / 1567.506 ms > [1200.411s][info ][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 813.220 / 1422.825 769.346 / 1567.506 769.346 / 1567.506 ms > optimized > [1000.447s][info ][gc,stats ] Phase: Concurrent Relocate 248.035 / 248.035 162.719 / 253.498 124.061 / 273.227 124.061 / 273.227 ms > [1100.447s][info ][gc,stats ] Phase: Concurrent Relocate 217.434 / 217.434 161.209 / 253.498 126.767 / 273.227 126.767 / 273.227 ms > [1200.447s][info ][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 164.023 / 253.498 129.314 / 273.227 129.314 / 273.227 ms > [1000.447s][info ][gc,stats ] Phase: Concurrent Mark 1224.059 / 1224.059 809.838 / 1518.214 582.867 / 1518.214 582.867 / 1518.214 ms > [1100.447s][info ][gc,stats
] Phase: Concurrent Mark 1302.235 / 1302.235 793.589 / 1518.214 600.215 / 1518.214 600.215 / 1518.214 ms > [1200.447s][info ][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 821.320 / 1518.214 615.187 / 1518.214 615.187 / 1518.214 ms This pull request has now been integrated. Changeset: 846cc88f Author: Xiaowei Lu Committer: Per Liden URL: https://git.openjdk.java.net/jdk/commit/846cc88f9452a63269130b7fe17f504deaf2a773 Stats: 6 lines in 2 files changed: 4 ins; 0 del; 2 mod 8272138: ZGC: Adopt relaxed ordering for self-healing Co-authored-by: Hao Tang Reviewed-by: eosterlund, pliden ------------- PR: https://git.openjdk.java.net/jdk/pull/5046 From tschatzl at openjdk.java.net Wed Aug 11 09:01:26 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 11 Aug 2021 09:01:26 GMT Subject: RFR: 8272083: G1: Record iterated range for BOT performance during card scan In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 11:24:50 GMT, Yude Lin wrote: > > From what I understand from a brief look at the backwards scan prototype (I will continue a bit, and I may be wrong), the code actually implements scanning backwards the cards within claimed chunks. I wasn't really suggesting to scan backwards the cards, but claim the chunks (i.e. the set of cards a given thread will look at) from top to bottom addresses within the region. Or just have the first claim be the top chunk, or as a first task for a completely untouched region, have the thread explicitly update the BOT. Or after claiming a chunk, first fix the bot for that chunk, and then scan through. > > The prototype reverses both the chunk claiming process and the card scanning process in each chunk. Because I thought within each chunk there could also be repetitive work if we do forward scan. I would not think so. At the first card in that last chunk point the last index for the BOT is up to date, and we are linearly improving the BOT from there. 
> > > I am also trying to understand a little how serious the problem is in terms of actual performance hit by the duplicate work. It might be fairly high when not doing refinement at all as you indicate. > > I can do a test that compares the scan time with and without the duplicate work removed. And maybe change the concurrent refinement strength to create different groups. It seems the decisions depend on more data. E.g., if concurrent refinement is on, and the de-duplicate process isn't very effective in terms of reduction in scan time, then the process itself should be too heavy weight? That would be great to quantify this issue (mostly out of curiosity). > > As for concurrent BOT update, I believe that the existing concurrent refinement code already concurrently updates the BOT as needed afaik so that shouldn't be an issue (I need to go look in detail). One idea could be that after allocating a PLAB (in old gen only of course) we enqueue some background task to fix these. We know the exact locations and sizes of those, maybe this somewhat helps. We also know the chunk sizes that are used for scanning, so we know the problematic places. > > Something like the "root region scan" for marking (but we do not need to wait for it to complete during gc). > > I misunderstood the concurrent update part. Now I see that you are actually suggesting moving almost the entire BOT fixing work out of the gc pause. I think that might further reduce the pause time. If I understand correctly, concurrent refinement fixes part of the BOT, but there are some leftovers. Now that the leftovers are also moved to the concurrent phase. I think it's worth some experiment too! +1 The next step would then be formatting the "holes" within the live objects with dummy objects while we are already doing that, and then we could drop the second bitmap :) See also https://bugs.openjdk.java.net/browse/JDK-8210708 for the idea. Of course, a completely different issue... 
Thanks, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/5039 From tschatzl at openjdk.java.net Wed Aug 11 09:10:26 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 11 Aug 2021 09:10:26 GMT Subject: RFR: 8271884: G1CH::_expand_heap_after_alloc_failure is no longer needed [v2] In-Reply-To: References: Message-ID: On Wed, 11 Aug 2021 08:39:03 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this change to remove _expand_heap_after_alloc_failure. Checks for old_is_full(), survivor_is_full(), and G1CollectedHeap::has_more_regions() should eliminate repeated expansion attempts. >> >> Testing; tier 1 > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > check *_is_full status after acquiring FreeList_lock Lgtm, please add the suggested comments. src/hotspot/share/gc/g1/g1Allocator.cpp line 277: > 275: if (result == NULL && !old_is_full()) { > 276: MutexLocker x(FreeList_lock, Mutex::_no_safepoint_check_flag); > 277: if (!old_is_full()) { Please add a short comment about the reasons for the re-check here (and above) ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5030 From mli at openjdk.java.net Wed Aug 11 09:41:26 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Wed, 11 Aug 2021 09:41:26 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space [v2] In-Reply-To: References: Message-ID: On Thu, 5 Aug 2021 14:56:57 GMT, Thomas Schatzl wrote: >> Hamlin Li has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. > > We are currently trying to reproduce these improvements on aarch64 and x64. > > Also, please revert the change to the age calculation and file this separately. 
I think I remember the reason for incrementing the age the way it is done is/was that it is supposed to avoid some forced reloads of the mark word, which may or may not be important. Either way it is completely unrelated to this change. @tschatzl Hi Thomas, may I ask how the test results on aarch64 and x64 turned out? ------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From tschatzl at openjdk.java.net Wed Aug 11 10:18:27 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 11 Aug 2021 10:18:27 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space [v3] In-Reply-To: References: Message-ID: On Fri, 6 Aug 2021 02:15:55 GMT, Hamlin Li wrote: >> Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. >> >> After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. > > Hamlin Li has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Sorry for the wait, but actually my testing just finished today - I initially had a few issues (needing to redo a few times, and due to the results re-testing to get better results), but I could not see any statistically significant difference in either maxjops or criticaljops (or any other benchmark I have tested) on Oracle cloud instances with Ampere Altra (dual-socket) machines at this time. Also tried just running on a single socket. There is no discernible difference on x64 either. I will discuss with others how to proceed today; personally it may be worth aligning the code with Parallel GC anyway.
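[Editor's note] The copy-before-CAS scheme being benchmarked in this 8271579 thread can be sketched in a few lines. This is a deliberately simplified model — `Obj`, `try_forward` and the single atomic forwarding slot are illustrative stand-ins, not the actual `do_copy_to_survivor_space` code, which also deals with mark words, PLAB allocation and age handling:

```cpp
#include <atomic>

// Toy model: an "object" is a payload plus an atomic forwarding slot.
struct Obj {
  std::atomic<Obj*> forwardee{nullptr};
  int payload = 0;
};

// Copy-before-CAS: speculatively copy into the to-space slot first, then
// try to publish the copy with a single CAS. A thread that loses the race
// simply abandons its copy and uses the winner's forwardee instead.
inline Obj* try_forward(Obj* from, Obj* to_space_slot) {
  to_space_slot->payload = from->payload;  // 1. copy first (speculative)
  Obj* expected = nullptr;
  if (from->forwardee.compare_exchange_strong(expected, to_space_slot)) {
    return to_space_slot;                  // 2. we won: our copy is live
  }
  return expected;                         // 2'. lost: use the winner's copy
}
```

Functionally the CAS-first order also works, since a losing thread only needs the forwarding address rather than the copied contents; which order is faster is precisely what the benchmark runs in this thread are trying to settle.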
------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From iwalulya at openjdk.java.net Wed Aug 11 10:17:40 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 11 Aug 2021 10:17:40 GMT Subject: RFR: 8271884: G1CH::_expand_heap_after_alloc_failure is no longer needed [v3] In-Reply-To: References: Message-ID: > Hi, > > Please review this change to remove _expand_heap_after_alloc_failure. Checks for old_is_full(), survivor_is_full(), and G1CollectedHeap::has_more_regions() should eliminate repeated expansion attempts. > > Testing; tier 1 Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: add comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5030/files - new: https://git.openjdk.java.net/jdk/pull/5030/files/096e815f..aa72921e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5030&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5030&range=01-02 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5030.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5030/head:pull/5030 PR: https://git.openjdk.java.net/jdk/pull/5030 From ayang at openjdk.java.net Wed Aug 11 10:22:29 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 11 Aug 2021 10:22:29 GMT Subject: RFR: 8272216: G1: replace G1ParScanThreadState::_dest with a constant [v2] In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 10:48:54 GMT, Albert Mingkun Yang wrote: >> Simple change of removing an array with statically-known identical elements. >> >> Test: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for the review. 
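[Editor's note] Returning to the 8271884 review comment earlier in this digest — the request to document why `old_is_full()` is tested again after `FreeList_lock` is acquired: the idiom is classic double-checked locking. A minimal sketch, in which the flag, lock and region counter are invented stand-ins rather than the real `G1Allocator` code:

```cpp
#include <atomic>
#include <mutex>

// Stand-ins for the real G1 state.
static std::atomic<bool> old_gen_full{false};
static std::mutex free_list_lock;
static int regions_left = 1;

// Returns a "region" id, or -1 on failure. The unlocked check is only a
// fast-path filter; the decision must be re-made under the lock, because
// another thread may have hit the allocation failure and set the flag
// while we were waiting on the lock.
int allocate_old_region() {
  if (old_gen_full.load(std::memory_order_relaxed)) {
    return -1;                 // fast path: known full, skip the lock
  }
  std::lock_guard<std::mutex> x(free_list_lock);
  if (old_gen_full.load(std::memory_order_relaxed)) {
    return -1;                 // re-check: someone failed before us
  }
  if (regions_left == 0) {
    old_gen_full.store(true, std::memory_order_relaxed);
    return -1;
  }
  return --regions_left;       // hand out a remaining region id
}
```

Without the second check, every thread already queued on the lock would retry (and potentially re-attempt heap expansion) after the first one had failed — the convoying effect the reviewers mention.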
------------- PR: https://git.openjdk.java.net/jdk/pull/5065 From ayang at openjdk.java.net Wed Aug 11 10:22:30 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 11 Aug 2021 10:22:30 GMT Subject: Integrated: 8272216: G1: replace G1ParScanThreadState::_dest with a constant In-Reply-To: References: Message-ID: <8TQ4El4gx423DmD_Ia7pK-Iv33KnAL64bUXZPFER1sc=.881c4685-38a3-4507-8981-287164e3b0b1@github.com> On Tue, 10 Aug 2021 10:10:45 GMT, Albert Mingkun Yang wrote: > Simple change of removing an array with statically-known identical elements. > > Test: hotspot_gc This pull request has now been integrated. Changeset: 0d0f2d07 Author: Albert Mingkun Yang URL: https://git.openjdk.java.net/jdk/commit/0d0f2d07f72cc709618e5e448d43be7704b1ac68 Stats: 18 lines in 2 files changed: 3 ins; 14 del; 1 mod 8272216: G1: replace G1ParScanThreadState::_dest with a constant Reviewed-by: kbarrett, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/5065 From tschatzl at openjdk.java.net Wed Aug 11 10:28:26 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 11 Aug 2021 10:28:26 GMT Subject: RFR: 8271884: G1CH::_expand_heap_after_alloc_failure is no longer needed [v3] In-Reply-To: References: Message-ID: On Wed, 11 Aug 2021 10:17:40 GMT, Ivan Walulya wrote: >> Hi, >> >> Please review this change to remove _expand_heap_after_alloc_failure. Checks for old_is_full(), survivor_is_full(), and G1CollectedHeap::has_more_regions() should eliminate repeated expansion attempts. >> >> Testing; tier 1 > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > add comments Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5030 From rkennke at redhat.com Wed Aug 11 11:04:09 2021 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 11 Aug 2021 13:04:09 +0200 Subject: Intentional heap bounds change? 
In-Reply-To: <2ba5e394-c705-af1b-481c-46e96ba14011@oracle.com> References: <475beb6a-8ca2-7849-bada-cb8288d78d47@redhat.com> <2ba5e394-c705-af1b-481c-46e96ba14011@oracle.com> Message-ID: <1fd77b1a-4ff3-67d6-3285-6c3fc3c2ffc6@redhat.com> Hi, thanks for the comments. I have an example that shows the 'problem': http://cr.openjdk.java.net/~rkennke/Example.java Reproduce it using: java -Xms1g -Xmx4g -XX:+UseParallelOldGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -Xlog:gc*=info,gc=debug,gc+heap*=trace,gc+ergo*=trace:file=gc.log:time,uptimemillis,level,tags Example 300 Check the heap log for heap size entries: grep "Pause Young" gc.log This yields the following: http://cr.openjdk.java.net/~rkennke/gc-heapsize.txt It shows that the heap is indeed shrinking well below -Xms, e.g. ~600MB instead of 1024MB here: GC(12) Pause Young (Allocation Failure) 559M->589M(592M) 4,797ms Thanks for your help! Roman > Hi, > > On 2021-08-05 16:21, Roman Kennke wrote: >> Hello GC devs, >> >> I see a case where heap may shrink below -Xms. This may be intentional >> or not. -Xms is initial heap size after all, not necessarily minimum >> heap size. However, there is also documentation that -Xms *does* >> provide the lower bounds for heap sizing policy, e.g. [1] > > The documentation around this didn't reflect the actual implementation > in HotSpot. As a part of recent cleanup and introduction of the > -XX:InitialHeapSize flag, we updated the documentation to reflect the > way -Xms is actually working: > > https://docs.oracle.com/en/java/javase/16/docs/specs/man/java.html > | > --- > -Xms| /size/ > > Sets the minimum and initial size (in bytes) of the heap. This value > must be a multiple of 1024 and greater than 1 MB. Append the letter > |k| or |K| to indicate kilobytes, |m| or |M| to indicate megabytes, > |g| or |G| to indicate gigabytes. 
The following examples show how to > set the size of allocated memory to 6 MB using various units: > > |-Xms6291456 -Xms6144k -Xms6m| > > Instead of the |-Xms| option to set both the minimum and initial > size of the heap, you can use |-XX:MinHeapSize| to set the minimum > size and |-XX:InitialHeapSize| to set the initial size." > > --- > > > Concretely, the code that parses -Xms sets both MinHeapSize and > InitialHeapSize: > > > } else if (match_option(option, "-Xms", &tail)) { > julong size = 0; > // an initial heap size of 0 means automatically determine > ArgsRange errcode = parse_memory_size(tail, &size, 0); > ... > if (FLAG_SET_CMDLINE(MinHeapSize, (size_t)size) != JVMFlag::SUCCESS) { > return JNI_EINVAL; > } > if (FLAG_SET_CMDLINE(InitialHeapSize, (size_t)size) != JVMFlag::SUCCESS) { > return JNI_EINVAL; > } > > >> >> I believe this is due to this change: >> >> https://github.com/openjdk/jdk/commit/1ed5b22d6e48ffbebaa53cf79df1c5c790aa9d71#diff-74a766b0aa16c688981a4d7b20da1c93315ce72358cac7158e9b50f82c9773a3L567 >> >> >> I.e. instead of adjusting old-gen minimum size with young-gen minimum >> size, it is set to _gen_alignment. >> >> Is this intentional? > Hopefully StefanJ will be able to answer this. The code has this > comment, so maybe it is intentional? > > // Minimum sizes of the generations may be different than > // the initial sizes. An inconsistency is permitted here > // in the total size that can be specified explicitly by > // command line specification of OldSize and NewSize and > // also a command line specification of -Xms. Issue a warning > // but allow the values to pass. > > Do you have a command line that reproduces this problem?
> > StefanK > >> >> Thanks, >> Roman >> >> >> [1] >> https://docs.oracle.com/en/java/javase/11/gctuning/factors-affecting-garbage-collection-performance.html#GUID-B0BFEFCB-F045-4105-BFA4-C97DE81DAC5B >> > From pliden at openjdk.java.net Wed Aug 11 11:09:34 2021 From: pliden at openjdk.java.net (Per Liden) Date: Wed, 11 Aug 2021 11:09:34 GMT Subject: RFR: 8267186: Add string deduplication support to ZGC [v2] In-Reply-To: References: Message-ID: On Fri, 6 Aug 2021 14:23:25 GMT, Erik Österlund wrote: >> Per Liden has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments from Erik > > Looks good. Thanks @fisk, @kimbarrett and @stefank for reviewing! ------------- PR: https://git.openjdk.java.net/jdk/pull/5029 From pliden at openjdk.java.net Wed Aug 11 11:09:37 2021 From: pliden at openjdk.java.net (Per Liden) Date: Wed, 11 Aug 2021 11:09:37 GMT Subject: Integrated: 8267186: Add string deduplication support to ZGC In-Reply-To: References: Message-ID: <-oQYtnrR0yJ5sW9iAMZhTreSDqW2IyXvkLIMebcIY9Q=.467a9f1a-0ecd-4fca-bc84-2e0d172c0242@github.com> On Fri, 6 Aug 2021 10:03:05 GMT, Per Liden wrote: > This change adds support for string deduplication to ZGC. It's a pretty straightforward change, but to make reviewing even easier it is broken up into two commits: > > __ZGC: Introduce ZMarkContext__ > This commit just moves the `ZMarkCache` into the new `ZMarkContext` class, and we now pass a `ZMarkContext*` around instead of a `ZMarkCache*`. The `ZMarkContext` class is a more general container for worker-local data, which in the next commit will be extended to also include the `StringDedup::Requests` queue. > > __8267186: Add string deduplication support to ZGC__ > This commit adds the actual string dedup functionality and enables relevant tests. We use the `deduplication_requested` bit in the `String` object to filter out `Strings` we've already attempted to deduplicate.
> > Testing: > - Passes all string dedup tests. > - Passes Tier1-7 with ZGC on Linux/x86_64 (with -XX:+UseStringDeduplication enabled by default to get better exposure). This pull request has now been integrated. Changeset: abebbe23 Author: Per Liden URL: https://git.openjdk.java.net/jdk/commit/abebbe2335a6dc9b12e5f271bf32cdc54f80b660 Stats: 255 lines in 11 files changed: 219 ins; 9 del; 27 mod 8267186: Add string deduplication support to ZGC Reviewed-by: eosterlund, kbarrett, stefank ------------- PR: https://git.openjdk.java.net/jdk/pull/5029 From pliden at openjdk.java.net Wed Aug 11 11:12:29 2021 From: pliden at openjdk.java.net (Per Liden) Date: Wed, 11 Aug 2021 11:12:29 GMT Subject: RFR: 8271862: C2 intrinsic for Reference.refersTo() is often not used [v2] In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 08:59:59 GMT, Per Liden wrote: >> While fixing JDK-8270626 it was discovered that the C2 intrinsic for `Reference.refersTo()` is not used as often as one would think. Instead C2 often prefers using the native implementation, which is much slower and defeats the purpose of having an intrinsic in the first place. >> >> The problem arises from having `refersTo0()` be virtual, native and @IntrinsicCandidate. This can be worked around by introducing a virtual method layer so that `refersTo0()` can be made `final`. That's what this patch does, by adding a virtual `refersToImpl()` which in turn calls the now final `refersTo0()`. >> >> It should be noted that `Object.clone()` and `Object.hashCode()` are also virtual, native and @IntrinsicCandidate and these methods get special treatment by C2 to get the intrinsic generated more optimally. We might want to do a similar optimization for `Reference.refersTo0()`. In that case, it is suggested that such an improvement could come later and be handled as a separate enhancement, and @kimbarrett has already filed JDK-8272140 to track that.
>> >> Testing: >> I found this hard to test without instrumenting Hotspot to tell me that the intrinsic was called instead of the native method. So, for that reason testing was ad-hoc, using an instrumented Hotspot in combination with the test (`vmTestbase/gc/gctests/PhantomReference/phantom002/TestDescription.java`) that initially uncovered the problem. See JDK-8270626 for additional information. > > Per Liden has updated the pull request incrementally with one additional commit since the last revision: > > Private implies final Thanks all for reviewing! ------------- PR: https://git.openjdk.java.net/jdk/pull/5052 From pliden at openjdk.java.net Wed Aug 11 11:12:29 2021 From: pliden at openjdk.java.net (Per Liden) Date: Wed, 11 Aug 2021 11:12:29 GMT Subject: Integrated: 8271862: C2 intrinsic for Reference.refersTo() is often not used In-Reply-To: References: Message-ID: On Mon, 9 Aug 2021 13:13:33 GMT, Per Liden wrote: > While fixing JDK-8270626 it was discovered that the C2 intrinsic for `Reference.refersTo()` is not used as often as one would think. Instead C2 often prefers using the native implementation, which is much slower and defeats the purpose of having an intrinsic in the first place. > > The problem arises from having `refersTo0()` be virtual, native and @IntrinsicCandidate. This can be worked around by introducing a virtual method layer so that `refersTo0()` can be made `final`. That's what this patch does, by adding a virtual `refersToImpl()` which in turn calls the now final `refersTo0()`.
> > Testing: > I found this hard to test without instrumenting Hotspot to tell me that the intrinsic was called instead of the native method. So, for that reason testing was ad-hoc, using an instrumented Hotspot in combination with the test (`vmTestbase/gc/gctests/PhantomReference/phantom002/TestDescription.java`) that initially uncovered the problem. See JDK-8270626 for additional information. This pull request has now been integrated. Changeset: 3f723ca4 Author: Per Liden URL: https://git.openjdk.java.net/jdk/commit/3f723ca4577b9cffeb6153ee386edd75f1dfb1c6 Stats: 14 lines in 2 files changed: 11 ins; 0 del; 3 mod 8271862: C2 intrinsic for Reference.refersTo() is often not used Reviewed-by: kbarrett, mchung ------------- PR: https://git.openjdk.java.net/jdk/pull/5052 From mli at openjdk.java.net Wed Aug 11 11:18:25 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Wed, 11 Aug 2021 11:18:25 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space [v3] In-Reply-To: References: Message-ID: <-MLyllwdSDXZHH-TUFnSpxNnBPX9DtBl97qiRDAAheg=.258596c7-7678-4f53-b834-c4587e7403a8@github.com> On Fri, 6 Aug 2021 02:15:55 GMT, Hamlin Li wrote: >> Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. >> >> After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. > > Hamlin Li has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Thanks for the feedback. Please kindly let me know your discussion result. 
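[Editor's note] Stepping back to the 8272138 (ZGC relaxed ordering for self-healing) discussion earlier in this digest: the adopted scheme — a single release fence ordering the object copy before the forwarding-table installation, with relaxed CASes after it — can be modelled roughly as below. The names and single-word "objects" are invented for the sketch; this is not ZGC's `ZForwarding`/`ZBarrier` code:

```cpp
#include <atomic>

// Toy model of self-healing: one relocated object, one forwarding-table
// entry and one reference slot that gets "healed" to the new address.
static int to_space_copy;
static std::atomic<int*> forwarding_entry{nullptr};
static std::atomic<int*> reference_slot{nullptr};

int* relocate_and_heal(int value) {
  int* stale = reference_slot.load(std::memory_order_relaxed);
  to_space_copy = value;                        // 1. copy the object
  // 2. One release fence orders the copy before *both* subsequent
  //    stores, so the CASes themselves can be relaxed. Readers pair
  //    this with an acquire load of the forwarding entry, as the
  //    thread above notes for ZForwarding::at.
  std::atomic_thread_fence(std::memory_order_release);
  int* expected = nullptr;                      // 3. install the entry
  forwarding_entry.compare_exchange_strong(expected, &to_space_copy,
                                           std::memory_order_relaxed);
  int* healed = forwarding_entry.load(std::memory_order_relaxed);
  // 4. self-heal the reference with a relaxed CAS; no per-heal release
  //    is needed because step 2 already published the copy.
  reference_slot.compare_exchange_strong(stale, healed,
                                         std::memory_order_relaxed);
  return healed;
}
```

The point of the change being discussed is exactly that the frequent operation (step 4, remapping pointers) carries no ordering cost; only the rare forwarding-table installation pays for the release.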
------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From iwalulya at openjdk.java.net Wed Aug 11 11:59:29 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 11 Aug 2021 11:59:29 GMT Subject: RFR: 8271884: G1CH::_expand_heap_after_alloc_failure is no longer needed [v3] In-Reply-To: <8nT1dj4jgEGLG_kWJfRvwVe20Zmv3-pWRKu0IfBx8AM=.cd66254e-9cc2-454c-bdef-ca87b37456b9@github.com> References: <8nT1dj4jgEGLG_kWJfRvwVe20Zmv3-pWRKu0IfBx8AM=.cd66254e-9cc2-454c-bdef-ca87b37456b9@github.com> Message-ID: On Mon, 9 Aug 2021 04:35:49 GMT, Kim Barrett wrote: >> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: >> >> add comments > > Change looks good. > > Testing should include verifying there isn't large amounts of output when > the appropriate logging is enabled and there are expansion failures. I filed > the RFE based on code-examination; it would be a pain if I missed something > and this flag actually is still needed. Thanks @kimbarrett and @tschatzl for the reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/5030 From iwalulya at openjdk.java.net Wed Aug 11 11:59:30 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 11 Aug 2021 11:59:30 GMT Subject: Integrated: 8271884: G1CH::_expand_heap_after_alloc_failure is no longer needed In-Reply-To: References: Message-ID: On Fri, 6 Aug 2021 10:43:19 GMT, Ivan Walulya wrote: > Hi, > > Please review this change to remove _expand_heap_after_alloc_failure. Checks for old_is_full(), survivor_is_full(), and G1CollectedHeap::has_more_regions() should eliminate repeated expansion attempts. > > Testing; tier 1 This pull request has now been integrated. 
Changeset: cd1751c3 Author: Ivan Walulya URL: https://git.openjdk.java.net/jdk/commit/cd1751c34e974683f3d2734c8ad5823a6ea27295 Stats: 35 lines in 3 files changed: 10 ins; 13 del; 12 mod 8271884: G1CH::_expand_heap_after_alloc_failure is no longer needed Reviewed-by: kbarrett, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/5030 From iwalulya at openjdk.java.net Wed Aug 11 14:04:50 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 11 Aug 2021 14:04:50 GMT Subject: RFR: 8267185: Add string deduplication support to ParallelGC Message-ID: Hi all, Please review this change to add string deduplication support to ParallelGC. Testing: All string deduplication tests, and Tier 1-5. ------------- Commit messages: - 8267185: Add string deduplication support to ParallelGC Changes: https://git.openjdk.java.net/jdk/pull/5085/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5085&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267185 Stats: 219 lines in 18 files changed: 218 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5085.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5085/head:pull/5085 PR: https://git.openjdk.java.net/jdk/pull/5085 From tschatzl at openjdk.java.net Wed Aug 11 14:36:28 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 11 Aug 2021 14:36:28 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space [v3] In-Reply-To: References: Message-ID: On Fri, 6 Aug 2021 02:15:55 GMT, Hamlin Li wrote: >> Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. >> >> After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. > > Hamlin Li has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
So we talked about this internally, and standalone it does not seem to be worth adding given that the gain you report is not reproducible on multiple (current) systems, and together we were not able to come up with an explanation why this suggested order of copy/cas would be better. However it also enables further code improvements by simplifying (mark word) code. So we think that if this change and the mark word change suggested earlier together are at least performance-neutral (which is likely imo), we are going to take in both changes. I'll start some perf runs with both changes applied (I'll just manually re-apply the mark word change you suggested earlier). Hth, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From iwalulya at openjdk.java.net Wed Aug 11 14:36:29 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 11 Aug 2021 14:36:29 GMT Subject: RFR: 8272228: G1: G1CardSetInlinePtr Fix tautological assertion In-Reply-To: <6PqvNJRDN-xtgODHgovcYnot_s6lnC7ox4lZkDtMD-s=.d10dc856-c528-4344-9bba-331af1cf6fc8@github.com> References: <6PqvNJRDN-xtgODHgovcYnot_s6lnC7ox4lZkDtMD-s=.d10dc856-c528-4344-9bba-331af1cf6fc8@github.com> Message-ID: <3UUEPDGo9WfVh5OnyA2AU81bKAArmTwgbMbfsavTeE0=.c1265526-cb42-4532-93a1-b7e6a7c07478@github.com> On Tue, 10 Aug 2021 14:07:41 GMT, Albert Mingkun Yang wrote: >> Hi all, >> >> Please review this cleanup to remove unreliable asserts in G1CardSetInlinePtr. > > I think the intention here is something like: `assert((_value & CardSetPtrTypeMask) == CardSetInlinePtr)`. Thanks @albertnetymk and @tschatzl for the reviews.
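[Editor's note] For context on the assertion discussed above: card-set pointers of this kind carry a type tag in their low bits, so a meaningful assert must mask the payload away and compare only the tag against the expected type. A small illustrative sketch — the constants and helpers below are made up for the example, not G1's actual `G1CardSet` definitions:

```cpp
#include <cstdint>

// Illustrative tag encoding in the low two bits of a word
// (the values are stand-ins, not G1's real constants).
static const uintptr_t CardSetPtrTypeMask = 0x3;
static const uintptr_t CardSetInlinePtr   = 0x0;
static const uintptr_t CardSetArrayPtr    = 0x1;

static uintptr_t make_tagged(uintptr_t bits, uintptr_t tag) {
  return (bits & ~CardSetPtrTypeMask) | tag;
}

// The shape of check the review suggests: mask out the payload and
// compare only the tag bits against the expected type.
static bool is_inline_ptr(uintptr_t value) {
  return (value & CardSetPtrTypeMask) == CardSetInlinePtr;
}
```

An assert written without the mask (or comparing a value against itself) is trivially true regardless of the tag, which is what made the original assertion tautological.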
------------- PR: https://git.openjdk.java.net/jdk/pull/5069 From iwalulya at openjdk.java.net Wed Aug 11 14:36:29 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 11 Aug 2021 14:36:29 GMT Subject: Integrated: 8272228: G1: G1CardSetInlinePtr Fix tautological assertion In-Reply-To: References: Message-ID: <0HhYVYgVmG4X5WKXrS_PevFyKy1W1b_m0Tavf7NZSHo=.7eeb0b27-f3ad-4663-ac97-fc5e0bcc635a@github.com> On Tue, 10 Aug 2021 13:50:42 GMT, Ivan Walulya wrote: > Hi all, > > Please review this cleanup to remove unreliable asserts in G1CardSetInlinePtr. This pull request has now been integrated. Changeset: 61942276 Author: Ivan Walulya URL: https://git.openjdk.java.net/jdk/commit/619422764d55875c1b9687ae7e9ce4dc99b71bf9 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8272228: G1: G1CardSetInlinePtr Fix tautological assertion Reviewed-by: ayang, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/5069 From tschatzl at openjdk.java.net Wed Aug 11 14:43:25 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 11 Aug 2021 14:43:25 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space [v3] In-Reply-To: References: Message-ID: <3MKANDyGNWto-KRVbSqKUWet7-F_AL1vhywi4IMT1uo=.eb62071c-2f09-454c-8d80-5d140dc6415b@github.com> On Fri, 6 Aug 2021 02:15:55 GMT, Hamlin Li wrote: >> Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. >> >> After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. > > Hamlin Li has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
This is the code I'm using: https://github.com/openjdk/jdk/compare/master...tschatzl:mark-word-cleanup-copy-cas?expand=1 ------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From ayang at openjdk.java.net Wed Aug 11 14:46:26 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 11 Aug 2021 14:46:26 GMT Subject: RFR: 8272138: ZGC: Adopt relaxed ordering for self-healing [v5] In-Reply-To: References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> Message-ID: <1Na0dfqrX2J-U92sd0RoX0p-0QXnA2o6Wyc1TELzw_Q=.6e9a9ec5-9934-4749-944c-09906e01ba6e@github.com> On Wed, 11 Aug 2021 03:47:59 GMT, Hao Tang wrote: >> Xiaowei Lu has updated the pull request incrementally with one additional commit since the last revision: >> >> replace release() to a single place in insert() > > src/hotspot/share/gc/z/zForwarding.inline.hpp line 141: > >> 139: // Make sure that object copy is finished >> 140: // before forwarding table installation >> 141: OrderAccess::release(); > > Can we replace it by `OrderAccess::storestore()`? I think LoadStore constraint in `OrderAccess::release()` is not required here. > Furthermore, is it possible to relax `Atomic::load_acquire` in `ZForwarding::at`? I believe further relaxing is possible, but it's unclear whether there will be observable perf improvement. Could be interesting to see some numbers before drawing any conclusion. ------------- PR: https://git.openjdk.java.net/jdk/pull/5046 From zgu at openjdk.java.net Thu Aug 12 00:20:37 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 12 Aug 2021 00:20:37 GMT Subject: RFR: 8272327: Shenandoah: Avoid enqueuing duplicate string candidates Message-ID: Now, there is a new API java_lang_String::test_and_set_deduplication_requested() that can help to identify if a string candidate has been enqueued, so that we can avoid enqueuing duplicate entries. 
Test: - [x] hotspot_gc_shenandoah ------------- Commit messages: - 8272327: Shenandoah: Avoid enqueuing duplicate string candidates Changes: https://git.openjdk.java.net/jdk/pull/5096/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5096&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272327 Stats: 10 lines in 3 files changed: 8 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5096.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5096/head:pull/5096 PR: https://git.openjdk.java.net/jdk/pull/5096 From mli at openjdk.java.net Thu Aug 12 01:16:25 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 12 Aug 2021 01:16:25 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space [v3] In-Reply-To: References: Message-ID: On Fri, 6 Aug 2021 02:15:55 GMT, Hamlin Li wrote: >> Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. >> >> After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. > > Hamlin Li has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Thanks Thomas, then I'll wait for your perf result. ------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From stefan.karlsson at oracle.com Thu Aug 12 11:16:56 2021 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 12 Aug 2021 13:16:56 +0200 Subject: Intentional heap bounds change? 
In-Reply-To: <1fd77b1a-4ff3-67d6-3285-6c3fc3c2ffc6@redhat.com> References: <475beb6a-8ca2-7849-bada-cb8288d78d47@redhat.com> <2ba5e394-c705-af1b-481c-46e96ba14011@oracle.com> <1fd77b1a-4ff3-67d6-3285-6c3fc3c2ffc6@redhat.com> Message-ID: <7d3f0384-aad5-94de-9fda-ed2026b7d2f1@oracle.com> I logged a Bug for this: https://bugs.openjdk.java.net/browse/JDK-8272364 Thanks, StefanK On 2021-08-11 13:04, Roman Kennke wrote: > Hi, > > thanks for the comments. I have an example that shows the 'problem': > > http://cr.openjdk.java.net/~rkennke/Example.java > > Reproduce it using: > > java -Xms1g -Xmx4g -XX:+UseParallelOldGC -XX:MinHeapFreeRatio=10 > -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 > -XX:AdaptiveSizePolicyWeight=90 > -Xlog:gc*=info,gc=debug,gc+heap*=trace,gc+ergo*=trace:file=gc.log:time,uptimemillis,level,tags > Example 300 > > Check the heap log for heap size entries: > > grep "Pause Young" gc.log > > This yields the following: > > http://cr.openjdk.java.net/~rkennke/gc-heapsize.txt > > It shows that the heap is indeed shrinking well below -Xms, e.g. > ~600MB instead of 1024MB here: > > GC(12) Pause Young (Allocation Failure) 559M->589M(592M) 4,797ms > > Thanks for your help! > Roman > > >> Hi, >> >> On 2021-08-05 16:21, Roman Kennke wrote: >>> Hello GC devs, >>> >>> I see a case where heap may shrink below -Xms. This may be >>> intentional or not. -Xms is initial heap size after all, not >>> necessarily minimum heap size. However, there is also documentation >>> that -Xms *does* provide the lower bounds for heap sizing policy, >>> e.g. [1] >> >> The documentation around this didn't reflect the actual >> implementation in HotSpot. As a part of recent cleanup and >> introduction of the -XX:InitialHeapSize flag, we updated the >> documentation to reflect the way -Xms is actually working: >> >> https://docs.oracle.com/en/java/javase/16/docs/specs/man/java.html >> >> --- >> -Xms size >> >> Sets the minimum and initial size (in bytes) of the heap.
This value >> must be a multiple of 1024 and greater than 1 MB. Append the letter >> k or K to indicate kilobytes, m or M to indicate megabytes, >> g or G to indicate gigabytes. The following examples show how to >> set the size of allocated memory to 6 MB using various units: >> >> -Xms6291456 -Xms6144k -Xms6m >> >> Instead of the -Xms option to set both the minimum and initial >> size of the heap, you can use -XX:MinHeapSize to set the minimum >> size and -XX:InitialHeapSize to set the initial size." >> >> --- >> >> >> Concretely, the code that parses -Xms sets both MinHeapSize and >> InitialHeapSize: >> >> >> } else if (match_option(option, "-Xms", &tail)) { >> julong size = 0; >> // an initial heap size of 0 means automatically determine >> ArgsRange errcode = parse_memory_size(tail, &size, 0); >> ... >> if (FLAG_SET_CMDLINE(MinHeapSize, (size_t)size) != >> JVMFlag::SUCCESS) { >> return JNI_EINVAL; >> } >> if (FLAG_SET_CMDLINE(InitialHeapSize, (size_t)size) != >> JVMFlag::SUCCESS) { >> return JNI_EINVAL; >> } >> >> >>> >>> I believe this is due to this change: >>> >>> https://github.com/openjdk/jdk/commit/1ed5b22d6e48ffbebaa53cf79df1c5c790aa9d71#diff-74a766b0aa16c688981a4d7b20da1c93315ce72358cac7158e9b50f82c9773a3L567 >>> >>> >>> I.e. instead of adjusting old-gen minimum size with young-gen >>> minimum size, it is set to _gen_alignment. >>> >>> Is this intentional? >> >> Hopefully StefanJ will be able to answer this. The code has this >> comment, so maybe it is intentional? >> >> // Minimum sizes of the generations may be different than >> // the initial sizes.
An inconsistency is permitted here >> // in the total size that can be specified explicitly by >> // command line specification of OldSize and NewSize and >> // also a command line specification of -Xms. Issue a warning >> // but allow the values to pass. >> >> Do you have a command line that reproduces this problem? >> >> StefanK >> >>> >>> Thanks, >>> Roman >>> >>> >>> [1] >>> https://docs.oracle.com/en/java/javase/11/gctuning/factors-affecting-garbage-collection-performance.html#GUID-B0BFEFCB-F045-4105-BFA4-C97DE81DAC5B >>> >> > From tschatzl at openjdk.java.net Fri Aug 13 06:45:26 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 13 Aug 2021 06:45:26 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space [v3] In-Reply-To: References: Message-ID: On Fri, 6 Aug 2021 02:15:55 GMT, Hamlin Li wrote: >> Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. >> >> After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. > > Hamlin Li has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Lgtm; perf results with the mark word change (see earlier) look good, i.e. no particular perf changes. Please wait for a second review.
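For readers following this RFR, the "copy first, then CAS the forwarding pointer" ordering it adopts can be sketched as below. This is a hand-written illustration using std::atomic; all names are invented, and the real code works on mark words rather than a plain pointer field:

```cpp
#include <atomic>
#include <cassert>

// Illustration of speculative copy followed by CAS installation: each
// worker copies the object into its own destination first, then races to
// publish that copy as the canonical forwardee.
struct ObjHeader {
  std::atomic<void*> forwardee;
  ObjHeader() : forwardee(nullptr) {}
};

// 'my_copy' stands for the destination this thread already copied into.
void* copy_then_cas(ObjHeader* from, void* my_copy) {
  void* expected = nullptr;
  if (from->forwardee.compare_exchange_strong(expected, my_copy)) {
    return my_copy;    // we won the race; our speculative copy is used
  }
  // Another thread installed its copy first; ours would be discarded.
  return expected;     // the CAS wrote the winning value into 'expected'
}
```

The trade-off is that a losing thread has done a wasted copy, which the perf results in the thread suggest is cheaper on average than holding the CAS before the copy.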
Test: hotspot_gc ------------- Commit messages: - cardset Changes: https://git.openjdk.java.net/jdk/pull/5108/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5108&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272439 Stats: 34 lines in 3 files changed: 33 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5108.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5108/head:pull/5108 PR: https://git.openjdk.java.net/jdk/pull/5108 From github.com+7947546+tanghaoth90 at openjdk.java.net Fri Aug 13 09:33:27 2021 From: github.com+7947546+tanghaoth90 at openjdk.java.net (Hao Tang) Date: Fri, 13 Aug 2021 09:33:27 GMT Subject: RFR: 8272138: ZGC: Adopt relaxed ordering for self-healing [v5] In-Reply-To: <1Na0dfqrX2J-U92sd0RoX0p-0QXnA2o6Wyc1TELzw_Q=.6e9a9ec5-9934-4749-944c-09906e01ba6e@github.com> References: <_3RIbGP62hf5po0zd15yPQ-UB8YumCYCincY6yE4wrQ=.2897ea0f-c1ba-4d7a-b7a4-3d17e5ee15ec@github.com> <1Na0dfqrX2J-U92sd0RoX0p-0QXnA2o6Wyc1TELzw_Q=.6e9a9ec5-9934-4749-944c-09906e01ba6e@github.com> Message-ID: On Wed, 11 Aug 2021 14:43:46 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/gc/z/zForwarding.inline.hpp line 141: >> >>> 139: // Make sure that object copy is finished >>> 140: // before forwarding table installation >>> 141: OrderAccess::release(); >> >> Can we replace it by `OrderAccess::storestore()`? I think LoadStore constraint in `OrderAccess::release()` is not required here. >> Furthermore, is it possible to relax `Atomic::load_acquire` in `ZForwarding::at`? > > I believe further relaxing is possible, but it's unclear whether there will be observable perf improvement. Could be interesting to see some numbers before drawing any conclusion. @albertnetymk Thanks for your suggestion. > Can we replace it by `OrderAccess::storestore()`? `OrderAccess::release()` has the same implementation as `OrderAccess::storestore()` on AArch64. Therefore, replacing `OrderAccess::release()` does not yield perf improvement. 
I found an interesting pull request related to StoreStore: https://github.com/openjdk/jdk/pull/427 (not integrated yet). I am not aware that this patch can make any improvement. > Furthermore, is it possible to relax `Atomic::load_acquire` in `ZForwarding::at`? My initial experiment suggests >5% concurrent mark time reduction using `Atomic::load` instead. I am still working on checking the correctness of the relaxation. ------------- PR: https://git.openjdk.java.net/jdk/pull/5046 From tschatzl at openjdk.java.net Fri Aug 13 10:19:21 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 13 Aug 2021 10:19:21 GMT Subject: RFR: 8272439: G1: add documentation to G1CardSetInlinePtr In-Reply-To: References: Message-ID: On Fri, 13 Aug 2021 08:57:26 GMT, Albert Mingkun Yang wrote: > Add some documentation to `class G1CardSetInlinePtr`, and simplify a field initialization in `G1CardSetConfiguration`. > > Test: hotspot_gc Some minor comments, but lgtm otherwise. src/hotspot/share/gc/g1/g1CardSet.hpp line 50: > 48: > 49: class G1CardSetConfiguration { > 50: // #bits required to hold the max card index; log(#cards_per_region) I would prefer a real sentence like `// [Holds] the number of bits required to cover the maximum card index for the regions covered by this card set.`. src/hotspot/share/gc/g1/g1CardSetContainers.hpp line 70: > 68: // |unused| ... | card_index1 | card_index0 |SSS00| > 69: // +------+ +---------------+--------------+-----+ > 70: Unnecessary extra newline ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5108 From ayang at openjdk.java.net Fri Aug 13 11:11:47 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 13 Aug 2021 11:11:47 GMT Subject: RFR: 8272439: G1: add documentation to G1CardSetInlinePtr [v2] In-Reply-To: References: Message-ID: > Add some documentation to `class G1CardSetInlinePtr`, and simplify a field initialization in `G1CardSetConfiguration`. 
> > Test: hotspot_gc Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5108/files - new: https://git.openjdk.java.net/jdk/pull/5108/files/e282d72d..1b441153 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5108&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5108&range=00-01 Stats: 4 lines in 2 files changed: 2 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5108.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5108/head:pull/5108 PR: https://git.openjdk.java.net/jdk/pull/5108 From ayang at openjdk.java.net Fri Aug 13 11:11:49 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 13 Aug 2021 11:11:49 GMT Subject: RFR: 8272439: G1: add documentation to G1CardSetInlinePtr [v2] In-Reply-To: References: Message-ID: On Fri, 13 Aug 2021 10:06:01 GMT, Thomas Schatzl wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/gc/g1/g1CardSet.hpp line 50: > >> 48: >> 49: class G1CardSetConfiguration { >> 50: // #bits required to hold the max card index; log(#cards_per_region) > > I would prefer a real sentence like `// [Holds] the number of bits required to cover the maximum card index for the regions covered by this card set.`. Fixed. > src/hotspot/share/gc/g1/g1CardSetContainers.hpp line 70: > >> 68: // |unused| ... | card_index1 | card_index0 |SSS00| >> 69: // +------+ +---------------+--------------+-----+ >> 70: > > Unnecessary extra newline Removed. 
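The bit-layout diagram quoted in the review above (`|unused| ... | card_index1 | card_index0 |SSS00|`) describes card indices packed into a single word next to a size field and a tag. The following is a rough, self-contained illustration of that packing idea only; the field widths and function names are assumptions, not G1's actual constants or code:

```cpp
#include <cassert>
#include <cstdint>

// Low bits hold a tag plus a size field ("SSS"); card indices of
// bits_per_card bits each are appended above them.
const unsigned kTagBits    = 2;             // container-type tag (assumed)
const unsigned kSizeBits   = 3;             // "SSS": number of stored cards
const unsigned kHeaderBits = kTagBits + kSizeBits;

inline unsigned inline_num_cards(std::uintptr_t v) {
  return (unsigned)((v >> kTagBits) & ((1u << kSizeBits) - 1));
}

inline std::uintptr_t inline_add(std::uintptr_t v, unsigned bits_per_card,
                                 std::uintptr_t card_idx) {
  unsigned num = inline_num_cards(v);
  v |= card_idx << (kHeaderBits + num * bits_per_card);  // append the index
  v &= ~(std::uintptr_t)(((1u << kSizeBits) - 1) << kTagBits);
  v |= (std::uintptr_t)(num + 1) << kTagBits;            // bump size field
  return v;
}

inline bool inline_contains(std::uintptr_t v, unsigned bits_per_card,
                            std::uintptr_t card_idx) {
  std::uintptr_t mask = ((std::uintptr_t)1 << bits_per_card) - 1;
  for (unsigned i = 0; i < inline_num_cards(v); i++) {
    if (((v >> (kHeaderBits + i * bits_per_card)) & mask) == card_idx) {
      return true;
    }
  }
  return false;
}
```

This is why `bits_per_card` appears in the real `add`/`contains`/`find` signatures: the number of indices that fit in one word depends on how many bits a card index needs for the covered region size.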
------------- PR: https://git.openjdk.java.net/jdk/pull/5108 From iwalulya at openjdk.java.net Fri Aug 13 11:56:23 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Fri, 13 Aug 2021 11:56:23 GMT Subject: RFR: 8272161: Make evacuation failure data structures local to collection [v2] In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 12:49:55 GMT, Thomas Schatzl wrote: >> Hi all, >> >> can I have reviews for this change that moves some evacuation (failure) related data structures into the per-thread evacuation context (`G1ParThreadScanState`) which removes some young collection related code from G1CollectedHeap? >> >> Testing: tier1-5 >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > Preserved marks stacks must be allocated on C heap. Lgtm! ------------- Marked as reviewed by iwalulya (Committer). PR: https://git.openjdk.java.net/jdk/pull/5066 From ayang at openjdk.java.net Fri Aug 13 17:21:36 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 13 Aug 2021 17:21:36 GMT Subject: RFR: 8272461: G1: remove empty declaration of cleanup_after_scan_heap_roots Message-ID: Trivial change of removing an empty declaration. 
------------- Commit messages: - trivial Changes: https://git.openjdk.java.net/jdk/pull/5114/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5114&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272461 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5114.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5114/head:pull/5114 PR: https://git.openjdk.java.net/jdk/pull/5114 From kbarrett at openjdk.java.net Sat Aug 14 00:05:21 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 14 Aug 2021 00:05:21 GMT Subject: RFR: 8272461: G1: remove empty declaration of cleanup_after_scan_heap_roots In-Reply-To: References: Message-ID: On Fri, 13 Aug 2021 17:12:47 GMT, Albert Mingkun Yang wrote: > Trivial change of removing an empty declaration. Looks good, and trivial. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5114 From kbarrett at openjdk.java.net Sat Aug 14 01:27:28 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 14 Aug 2021 01:27:28 GMT Subject: RFR: 8272439: G1: add documentation to G1CardSetInlinePtr [v2] In-Reply-To: References: Message-ID: On Fri, 13 Aug 2021 11:11:47 GMT, Albert Mingkun Yang wrote: >> Add some documentation to `class G1CardSetInlinePtr`, and simplify a field initialization in `G1CardSetConfiguration`. >> >> Test: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5108 From ayang at openjdk.java.net Mon Aug 16 07:36:32 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 16 Aug 2021 07:36:32 GMT Subject: RFR: 8272439: G1: add documentation to G1CardSetInlinePtr [v2] In-Reply-To: References: Message-ID: On Fri, 13 Aug 2021 11:11:47 GMT, Albert Mingkun Yang wrote: >> Add some documentation to `class G1CardSetInlinePtr`, and simplify a field initialization in `G1CardSetConfiguration`. >> >> Test: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/5108 From ayang at openjdk.java.net Mon Aug 16 07:37:28 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 16 Aug 2021 07:37:28 GMT Subject: RFR: 8272461: G1: remove empty declaration of cleanup_after_scan_heap_roots In-Reply-To: References: Message-ID: On Fri, 13 Aug 2021 17:12:47 GMT, Albert Mingkun Yang wrote: > Trivial change of removing an empty declaration. Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/5114 From ayang at openjdk.java.net Mon Aug 16 07:37:29 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 16 Aug 2021 07:37:29 GMT Subject: Integrated: 8272461: G1: remove empty declaration of cleanup_after_scan_heap_roots In-Reply-To: References: Message-ID: <_hp8DGrDNrB16loR5_7Yy7wVrkFYkSCBUQQ_w2rEBvY=.21b49268-9670-4eb8-b9c9-d0ed30fbe399@github.com> On Fri, 13 Aug 2021 17:12:47 GMT, Albert Mingkun Yang wrote: > Trivial change of removing an empty declaration. This pull request has now been integrated. 
Changeset: 0209d9f3 Author: Albert Mingkun Yang URL: https://git.openjdk.java.net/jdk/commit/0209d9f382f09840c29ac34b27dd41d2c8676913 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod 8272461: G1: remove empty declaration of cleanup_after_scan_heap_roots Reviewed-by: kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/5114 From ayang at openjdk.java.net Mon Aug 16 07:39:31 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 16 Aug 2021 07:39:31 GMT Subject: Integrated: 8272439: G1: add documentation to G1CardSetInlinePtr In-Reply-To: References: Message-ID: <1yt1ocOgzAOzeHYSbhWyieWcs0bPgQeTFVgbJ2_pew4=.7cf66c23-9f75-457c-8c55-798346a40815@github.com> On Fri, 13 Aug 2021 08:57:26 GMT, Albert Mingkun Yang wrote: > Add some documentation to `class G1CardSetInlinePtr`, and simplify a field initialization in `G1CardSetConfiguration`. > > Test: hotspot_gc This pull request has now been integrated. Changeset: 7a5b37b8 Author: Albert Mingkun Yang URL: https://git.openjdk.java.net/jdk/commit/7a5b37b8ca64650a8d23d64013bf49c9f3a60e2c Stats: 35 lines in 3 files changed: 34 ins; 0 del; 1 mod 8272439: G1: add documentation to G1CardSetInlinePtr Co-authored-by: Thomas Schatzl Reviewed-by: tschatzl, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/5108 From kbarrett at openjdk.java.net Mon Aug 16 07:56:31 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 16 Aug 2021 07:56:31 GMT Subject: RFR: 8267185: Add string deduplication support to ParallelGC In-Reply-To: References: Message-ID: On Wed, 11 Aug 2021 13:56:09 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to add string deduplication support to ParallelGC. > > Testing: All string deduplication tests, and Tier 1-5. Changes requested by kbarrett (Reviewer). 
src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 143: > 141: > 142: void safepoint_synchronize_begin(); > 143: void safepoint_synchronize_end(); I see no callers of these new functions. src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 289: > 287: push_contents(new_obj); > 288: > 289: if (psStringDedup::is_candidate_from_evacuation(o->klass(), new_obj->age(), new_obj_is_tenured)) { I think `is_candidate_from_evacuation` should just take `new_obj` and `new_obj_is_tenured` as arguments, and get the klass and age there-in as needed. Maybe there's enough inlining and such that the compiler can optimize away unneeded accesses, but why not make the compiler's and the reader's job easier? src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 290: > 288: > 289: if (psStringDedup::is_candidate_from_evacuation(o->klass(), new_obj->age(), new_obj_is_tenured)) { > 290: if (!java_lang_String::test_and_set_deduplication_requested(o)) { Setting the flag on the from-object isn't useful; it needs to be on the to-object. But I don't think this is needed anyway. `is_candidate_from_evacuation` shouldn't generate duplicate requests. src/hotspot/share/gc/parallel/psStringDedup.hpp line 41: > 39: uint age, > 40: bool obj_is_tenured) { > 41: return StringDedup::is_enabled_string(klass) && It is better here to first check `StringDedup::is_enabled` and only get the klass if it's actually needed. G1 uses StringDedup::is_enabled_string because it happens to already have the klass in hand. That's not the case here; but the caller is getting the klass just for calling this function. src/hotspot/share/gc/shared/stringdedup/stringDedupConfig.cpp line 119: > 117: if (!UseStringDeduplication) { > 118: return true; > 119: } else if (UseEpsilonGC || UseSerialGC) { This change is a small land mine for the next new GC project. I'd prefer adding !UseParallelGC. Or rewrite to have an `else if (UseXXXGC || ...) { ... } else { // not supported }` structure. 
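The control-flow structure suggested in that last comment can be sketched as follows. The boolean parameters stand in for the HotSpot globals, and the predicate's exact meaning is illustrative only; the point is that "not supported" becomes the explicit fallback rather than an enumerated case:

```cpp
#include <cassert>

// Enumerate the collectors that DO support string deduplication, so that a
// future collector falls into the explicit "not supported" branch instead
// of silently passing the check.
bool dedup_config_ok(bool use_string_dedup,
                     bool use_g1, bool use_shenandoah,
                     bool use_z, bool use_parallel,
                     bool* should_disable) {
  *should_disable = false;
  if (!use_string_dedup) {
    return true;                    // nothing requested; nothing to check
  } else if (use_g1 || use_shenandoah || use_z || use_parallel) {
    return true;                    // collectors with dedup support
  } else {
    *should_disable = true;         // not supported: caller turns dedup off
    return false;
  }
}
```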
------------- PR: https://git.openjdk.java.net/jdk/pull/5085 From ayang at openjdk.java.net Mon Aug 16 07:58:28 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 16 Aug 2021 07:58:28 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space [v3] In-Reply-To: References: Message-ID: On Fri, 6 Aug 2021 02:15:55 GMT, Hamlin Li wrote: >> Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. >> >> After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. > > Hamlin Li has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Marked as reviewed by ayang (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From kbarrett at openjdk.java.net Mon Aug 16 08:44:32 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 16 Aug 2021 08:44:32 GMT Subject: RFR: 8267185: Add string deduplication support to ParallelGC In-Reply-To: References: Message-ID: On Wed, 11 Aug 2021 13:56:09 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to add string deduplication support to ParallelGC. > > Testing: All string deduplication tests, and Tier 1-5. Changes requested by kbarrett (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5085 From kbarrett at openjdk.java.net Mon Aug 16 08:44:33 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 16 Aug 2021 08:44:33 GMT Subject: RFR: 8267185: Add string deduplication support to ParallelGC In-Reply-To: References: Message-ID: On Wed, 11 Aug 2021 15:11:57 GMT, Kim Barrett wrote: >> Hi all, >> >> Please review this change to add string deduplication support to ParallelGC. >> >> Testing: All string deduplication tests, and Tier 1-5. 
> > src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 143: > >> 141: >> 142: void safepoint_synchronize_begin(); >> 143: void safepoint_synchronize_end(); > > I see no callers of these new functions. Oh, never mind. These are overriding virtual function declarations. Please add `virtual`. The modern style would be to use `override`, but gcc warns if some but not all of a class's overridden virtual functions use `override`, and you probably don't want to do that update for this class as part of this change. ------------- PR: https://git.openjdk.java.net/jdk/pull/5085 From kbarrett at openjdk.java.net Mon Aug 16 08:55:25 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 16 Aug 2021 08:55:25 GMT Subject: RFR: 8267833: Improve G1CardSetInlinePtr::add() In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 13:02:26 GMT, Ivan Walulya wrote: > Hi all, > > Please review this cleanup change to reduce the number of comparisons made when adding entries to an inline pointer cardset container. We allow the find to restart from a previously known index. > > Testing: Tier 1-3 Changes requested by kbarrett (Reviewer). src/hotspot/share/gc/g1/g1CardSetContainers.inline.hpp line 54: > 52: uint num_elems = num_cards_in(_value); > 53: if (num_elems > 0) { > 54: cur_idx = find(card_idx, bits_per_card, cur_idx, num_elems); I found the naming here a bit confusing, with `num_elems` here but `end_at` in the find function. Similarly in `contains`. src/hotspot/share/gc/g1/g1CardSetContainers.inline.hpp line 95: > 93: } > 94: > 95: inline bool G1CardSetInlinePtr::contains(uint card_idx, uint bits_per_card) { I think `contains` previously didn't use its `card_idx` argument. Is it needed by any callers? Or was the previous non-use just a performance bug?
src/hotspot/share/gc/g1/g1CardSetContainers.inline.hpp line 97: > 95: inline bool G1CardSetInlinePtr::contains(uint card_idx, uint bits_per_card) { > 96: uint num_elems = num_cards_in(_value); > 97: if(num_elems == 0) { Space after `if` ------------- PR: https://git.openjdk.java.net/jdk/pull/5067 From kbarrett at openjdk.java.net Mon Aug 16 09:10:24 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 16 Aug 2021 09:10:24 GMT Subject: RFR: 8267833: Improve G1CardSetInlinePtr::add() In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 13:02:26 GMT, Ivan Walulya wrote: > Hi all, > > Please review this cleanup change to reduce the number of comparisons made when adding entries to an inline pointer cardset container. We allow the find to restart from a previously known index. > > Testing: Tier 1-3 Changes requested by kbarrett (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5067 From kbarrett at openjdk.java.net Mon Aug 16 09:10:25 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 16 Aug 2021 09:10:25 GMT Subject: RFR: 8267833: Improve G1CardSetInlinePtr::add() In-Reply-To: References: Message-ID: On Mon, 16 Aug 2021 08:51:40 GMT, Kim Barrett wrote: >> Hi all, >> >> Please review this cleanup change to reduce the number of comparisons made when adding entries to an inline pointer cardset container. We allow the find to restart from a previously known index. >> >> Testing: Tier 1-3 > > src/hotspot/share/gc/g1/g1CardSetContainers.inline.hpp line 95: > >> 93: } >> 94: >> 95: inline bool G1CardSetInlinePtr::contains(uint card_idx, uint bits_per_card) { > > I think `contains` previously didn't use its `card_idx` argument. Is it needed by any callers? Or was the previous non-use just a performance bug? D'oh! Of course it's used. It's the thing we're looking for. I just couldn't see it right there in front of me. Never mind.
------------- PR: https://git.openjdk.java.net/jdk/pull/5067 From rkennke at openjdk.java.net Mon Aug 16 09:17:28 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 16 Aug 2021 09:17:28 GMT Subject: RFR: 8272327: Shenandoah: Avoid enqueuing duplicate string candidates In-Reply-To: References: Message-ID: On Thu, 12 Aug 2021 00:12:14 GMT, Zhengyu Gu wrote: > Now, there is a new API java_lang_String::test_and_set_deduplication_requested() that can help to identify if a string candidate has been enqueued, so that we can avoid enqueuing duplicate entries. > > Test: > - [x] hotspot_gc_shenandoah Looks good to me, thanks! ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5096 From kbarrett at openjdk.java.net Mon Aug 16 09:19:26 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 16 Aug 2021 09:19:26 GMT Subject: RFR: 8272231 G1: Refactor G1CardSet::get_card_set to return G1CardSetHashTableValue* In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 15:11:02 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to return G1CardSetHashTableValue* from G1CardSet::get_card_set. Consequently, we do not have to use G1CardSet::get_or_add_card_set in instances where we only need to get the table entry (such as in G1CardSet::transfer_cards_in_howl). > > Testing: Tier 1-5. Changes requested by kbarrett (Reviewer). src/hotspot/share/gc/g1/g1CardSet.cpp line 195: > 193: > 194: class G1CardSetHashTableFound : public StackObj { > 195: G1CardSetHashTableValue* _value = nullptr; Non-static data member initializers (http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2008/n2756.htm) have not (yet) been approved for use in HotSpot. Needs to be initialized in a constructor. 
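The style point raised here — HotSpot has not (yet) approved non-static data member initializers — translates to moving the `= nullptr` into a constructor initializer list. A standalone sketch of the pattern (the types are made up; only the initialization style mirrors the review comment):

```cpp
#include <cassert>

// Hypothetical stand-in for a hash-table value type.
struct TableValue { int payload; };

// StackObj-style "found value" helper: instead of the NSDMI
// `TableValue* _value = nullptr;` at the declaration, the member is
// initialized in the constructor initializer list.
class FoundValueSketch {
  TableValue* _value;
public:
  FoundValueSketch() : _value(nullptr) {}         // ctor-list initialization
  void operator()(TableValue* v) { _value = v; }  // record the found entry
  TableValue* value() const { return _value; }
};
```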
src/hotspot/share/gc/g1/g1CardSet.cpp line 612: > 610: Atomic::add(&howling_array->_num_entries, diff, memory_order_relaxed); > 611: > 612: G1CardSetHashTableValue* table_entry = get_card_set(card_region); Maybe there should be an assert here that `table_entry` is non-null. ------------- PR: https://git.openjdk.java.net/jdk/pull/5072 From kbarrett at openjdk.java.net Mon Aug 16 09:24:28 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 16 Aug 2021 09:24:28 GMT Subject: RFR: 8272235: G1: update outdated code root fixup [v2] In-Reply-To: References: <2ZKpnNC6QMTJt0xrG99R33JLrZTdBQIrVN9oUW7r26o=.83c94abd-e8e0-45e2-9c3c-dfdc5f543fec@github.com> Message-ID: On Tue, 10 Aug 2021 20:46:57 GMT, Albert Mingkun Yang wrote: >> Simple change of updating outdated comments and certain renaming. >> >> Test: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by kbarrett (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5073 From ayang at openjdk.java.net Mon Aug 16 09:46:33 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 16 Aug 2021 09:46:33 GMT Subject: RFR: 8272235: G1: update outdated code root fixup [v2] In-Reply-To: References: <2ZKpnNC6QMTJt0xrG99R33JLrZTdBQIrVN9oUW7r26o=.83c94abd-e8e0-45e2-9c3c-dfdc5f543fec@github.com> Message-ID: On Tue, 10 Aug 2021 20:46:57 GMT, Albert Mingkun Yang wrote: >> Simple change of updating outdated comments and certain renaming. >> >> Test: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for the review. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5073 From ayang at openjdk.java.net Mon Aug 16 09:46:33 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 16 Aug 2021 09:46:33 GMT Subject: Integrated: 8272235: G1: update outdated code root fixup In-Reply-To: <2ZKpnNC6QMTJt0xrG99R33JLrZTdBQIrVN9oUW7r26o=.83c94abd-e8e0-45e2-9c3c-dfdc5f543fec@github.com> References: <2ZKpnNC6QMTJt0xrG99R33JLrZTdBQIrVN9oUW7r26o=.83c94abd-e8e0-45e2-9c3c-dfdc5f543fec@github.com> Message-ID: On Tue, 10 Aug 2021 15:36:22 GMT, Albert Mingkun Yang wrote: > Simple change of updating outdated comments and certain renaming. > > Test: hotspot_gc This pull request has now been integrated. Changeset: 69cc588f Author: Albert Mingkun Yang URL: https://git.openjdk.java.net/jdk/commit/69cc588fce0aef3f6066f2ff313d5319b528d684 Stats: 19 lines in 4 files changed: 7 ins; 0 del; 12 mod 8272235: G1: update outdated code root fixup Reviewed-by: tschatzl, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/5073 From iwalulya at openjdk.java.net Mon Aug 16 09:48:05 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 16 Aug 2021 09:48:05 GMT Subject: RFR: 8267185: Add string deduplication support to ParallelGC [v2] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to add string deduplication support to ParallelGC. > > Testing: All string deduplication tests, and Tier 1-5. 
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Kim review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5085/files - new: https://git.openjdk.java.net/jdk/pull/5085/files/b1fe77ae..cd22f565 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5085&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5085&range=00-01 Stats: 22 lines in 5 files changed: 3 ins; 9 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/5085.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5085/head:pull/5085 PR: https://git.openjdk.java.net/jdk/pull/5085 From iwalulya at openjdk.java.net Mon Aug 16 10:03:52 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 16 Aug 2021 10:03:52 GMT Subject: RFR: 8267833: Improve G1CardSetInlinePtr::add() [v2] In-Reply-To: References: Message-ID: > Hi all, > > Please review this cleanup change to reduce the number of comparisons made when adding entries to an inline pointer cardset container. We allow the find to restart from a previously known index. 
> > Testing: Tier 1-3 Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Kim review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5067/files - new: https://git.openjdk.java.net/jdk/pull/5067/files/e271088a..abf1e63b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5067&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5067&range=00-01 Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/5067.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5067/head:pull/5067 PR: https://git.openjdk.java.net/jdk/pull/5067 From iwalulya at openjdk.java.net Mon Aug 16 10:03:53 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 16 Aug 2021 10:03:53 GMT Subject: RFR: 8267833: Improve G1CardSetInlinePtr::add() [v2] In-Reply-To: References: Message-ID: On Mon, 16 Aug 2021 08:09:25 GMT, Kim Barrett wrote: >> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: >> >> Kim review > > src/hotspot/share/gc/g1/g1CardSetContainers.inline.hpp line 54: > >> 52: uint num_elems = num_cards_in(_value); >> 53: if (num_elems > 0) { >> 54: cur_idx = find(card_idx, bits_per_card, cur_idx, num_elems); > > I found the naming here a bit confusing, with `num_elems` here but `end_at` in the find function. Similarly in `contains`. Fixed, opted to use `num_elems` in find function too. ------------- PR: https://git.openjdk.java.net/jdk/pull/5067 From iwalulya at openjdk.java.net Mon Aug 16 10:25:53 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 16 Aug 2021 10:25:53 GMT Subject: RFR: 8272231 G1: Refactor G1CardSet::get_card_set to return G1CardSetHashTableValue* [v2] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to return G1CardSetHashTableValue* from G1CardSet::get_card_set. 
Consequently, we do not have to use G1CardSet::get_or_add_card_set in instances where we only need to get the table entry (such as in G1CardSet::transfer_cards_in_howl). > > Testing: Tier 1-5. Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Kim review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5072/files - new: https://git.openjdk.java.net/jdk/pull/5072/files/1a0ffeed..02039d48 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5072&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5072&range=00-01 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5072.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5072/head:pull/5072 PR: https://git.openjdk.java.net/jdk/pull/5072 From iwalulya at openjdk.java.net Mon Aug 16 10:35:40 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 16 Aug 2021 10:35:40 GMT Subject: RFR: 8215118: Reduce HeapRegion to only have one member for index in CSet Message-ID: Hi all, Please review this change to maintain only a single member variable to indicate the region's index in a collection set: either the actual collection set for young regions or the optional regions array if the region is considered optional during a mixed collection. 
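The single-member encoding described in the 8215118 request above might look roughly like this sketch. The tag bit, sentinel value, and names are illustrative assumptions, not HotSpot's actual representation: one 32-bit field distinguishes "not in collection set", "in the young collection set at index i", and "in the optional regions array at index i".

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch: fold two old fields (young-cset index and
// optional-cset index) into one member, tagging "optional" with the
// top bit and reserving all-ones for "not in a collection set".
class RegionCSetIndex {
  static const uint32_t NotInCSet   = 0xFFFFFFFF;
  static const uint32_t OptionalBit = 0x80000000;
  uint32_t _index = NotInCSet;
public:
  void set_young(uint32_t i)    { _index = i; }
  void set_optional(uint32_t i) { _index = i | OptionalBit; }
  void clear()                  { _index = NotInCSet; }

  bool in_cset() const     { return _index != NotInCSet; }
  bool is_optional() const { return in_cset() && (_index & OptionalBit) != 0; }
  uint32_t index() const   { return _index & ~OptionalBit; }  // valid if in_cset()
};
```

The point of the refactoring is that a region can only ever be in one of the two arrays at a time, so a single tagged index is enough.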
Testing: Tier 1-7 ------------- Commit messages: - 8215118: Reduce HeapRegion to only have one member for index in CSet Changes: https://git.openjdk.java.net/jdk/pull/5122/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5122&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8215118 Stats: 42 lines in 4 files changed: 25 ins; 3 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/5122.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5122/head:pull/5122 PR: https://git.openjdk.java.net/jdk/pull/5122 From kbarrett at openjdk.java.net Mon Aug 16 11:09:25 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 16 Aug 2021 11:09:25 GMT Subject: RFR: 8272231 G1: Refactor G1CardSet::get_card_set to return G1CardSetHashTableValue* [v2] In-Reply-To: References: Message-ID: On Mon, 16 Aug 2021 10:25:53 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to return G1CardSetHashTableValue* from G1CardSet::get_card_set. Consequently, we do not have to use G1CardSet::get_or_add_card_set in instances where we only need to get the table entry (such as in G1CardSet::transfer_cards_in_howl). >> >> Testing: Tier 1-5. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Kim review Marked as reviewed by kbarrett (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5072 From kbarrett at openjdk.java.net Mon Aug 16 11:10:27 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 16 Aug 2021 11:10:27 GMT Subject: RFR: 8267833: Improve G1CardSetInlinePtr::add() [v2] In-Reply-To: References: Message-ID: On Mon, 16 Aug 2021 10:03:52 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this cleanup change to reduce the number of comparisons made when adding entries to an inline pointer cardset container. We allow the find to restart from a previously known index. 
>> >> Testing: Tier 1-3 > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Kim review Marked as reviewed by kbarrett (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5067 From kbarrett at openjdk.java.net Mon Aug 16 11:16:30 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 16 Aug 2021 11:16:30 GMT Subject: RFR: 8267185: Add string deduplication support to ParallelGC [v2] In-Reply-To: References: Message-ID: <2pUeA601XJNPMguiEVhafhf3P9-MC8iObSsqdMvXKqo=.6422cdb4-1e9c-45c7-a244-63ecd21aba05@github.com> On Mon, 16 Aug 2021 09:48:05 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to add string deduplication support to ParallelGC. >> >> Testing: All string deduplication tests, and Tier 1-5. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Kim review Looks good. src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 290: > 288: > 289: if (psStringDedup::is_candidate_from_evacuation(new_obj, new_obj_is_tenured)) { > 290: _string_dedup_requests.add(o); Update indentation? Or is github not showing the whitespace-only change? ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5085 From iwalulya at openjdk.java.net Mon Aug 16 11:16:31 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 16 Aug 2021 11:16:31 GMT Subject: RFR: 8267185: Add string deduplication support to ParallelGC [v2] In-Reply-To: <2pUeA601XJNPMguiEVhafhf3P9-MC8iObSsqdMvXKqo=.6422cdb4-1e9c-45c7-a244-63ecd21aba05@github.com> References: <2pUeA601XJNPMguiEVhafhf3P9-MC8iObSsqdMvXKqo=.6422cdb4-1e9c-45c7-a244-63ecd21aba05@github.com> Message-ID: <9i934Blv3AsPkbJwd9S-M5dF3uYor6R0r4tsSkZ18Gg=.d8ddad43-ef7e-412c-9099-a91cde3c7140@github.com> On Mon, 16 Aug 2021 11:10:12 GMT, Kim Barrett wrote: >> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: >> >> Kim review > > src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 290: > >> 288: >> 289: if (psStringDedup::is_candidate_from_evacuation(new_obj, new_obj_is_tenured)) { >> 290: _string_dedup_requests.add(o); > > Update indentation? Or is github not showing the whitespace-only change? Need to update the indentation! Thanks ------------- PR: https://git.openjdk.java.net/jdk/pull/5085 From iwalulya at openjdk.java.net Mon Aug 16 11:44:19 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 16 Aug 2021 11:44:19 GMT Subject: RFR: 8267185: Add string deduplication support to ParallelGC [v3] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to add string deduplication support to ParallelGC. > > Testing: All string deduplication tests, and Tier 1-5. 
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: fix indentation ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5085/files - new: https://git.openjdk.java.net/jdk/pull/5085/files/cd22f565..ba73d853 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5085&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5085&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5085.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5085/head:pull/5085 PR: https://git.openjdk.java.net/jdk/pull/5085 From iwalulya at openjdk.java.net Mon Aug 16 13:12:31 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 16 Aug 2021 13:12:31 GMT Subject: RFR: 8267833: Improve G1CardSetInlinePtr::add() [v2] In-Reply-To: References: Message-ID: On Mon, 16 Aug 2021 11:07:47 GMT, Kim Barrett wrote: >> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: >> >> Kim review > > Marked as reviewed by kbarrett (Reviewer). Thanks @kimbarrett and @tschatzl for the reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/5067 From iwalulya at openjdk.java.net Mon Aug 16 13:12:32 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 16 Aug 2021 13:12:32 GMT Subject: Integrated: 8267833: Improve G1CardSetInlinePtr::add() In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 13:02:26 GMT, Ivan Walulya wrote: > Hi all, > > Please review this cleanup change to reduce the number of comparisons made when adding entries to an inline pointer cardset container. We allow the find to restart from a previously known index. > > Testing: Tier 1-3 This pull request has now been integrated. 
Changeset: 83d0e128 Author: Ivan Walulya URL: https://git.openjdk.java.net/jdk/commit/83d0e128e362822584bb51b00576cb754f44e58b Stats: 25 lines in 2 files changed: 18 ins; 1 del; 6 mod 8267833: Improve G1CardSetInlinePtr::add() Reviewed-by: tschatzl, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/5067 From iwalulya at openjdk.java.net Mon Aug 16 13:13:32 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 16 Aug 2021 13:13:32 GMT Subject: Integrated: 8272231 G1: Refactor G1CardSet::get_card_set to return G1CardSetHashTableValue* In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 15:11:02 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to return G1CardSetHashTableValue* from G1CardSet::get_card_set. Consequently, we do not have to use G1CardSet::get_or_add_card_set in instances where we only need to get the table entry (such as in G1CardSet::transfer_cards_in_howl). > > Testing: Tier 1-5. This pull request has now been integrated. Changeset: 0a03481a Author: Ivan Walulya URL: https://git.openjdk.java.net/jdk/commit/0a03481a6566d59b21ea5f802cb1f0028531c9d8 Stats: 24 lines in 2 files changed: 9 ins; 2 del; 13 mod 8272231: G1: Refactor G1CardSet::get_card_set to return G1CardSetHashTableValue* Reviewed-by: tschatzl, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/5072 From iwalulya at openjdk.java.net Mon Aug 16 13:13:31 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 16 Aug 2021 13:13:31 GMT Subject: RFR: 8272231 G1: Refactor G1CardSet::get_card_set to return G1CardSetHashTableValue* [v2] In-Reply-To: References: Message-ID: On Mon, 16 Aug 2021 11:06:01 GMT, Kim Barrett wrote: >> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: >> >> Kim review > > Marked as reviewed by kbarrett (Reviewer). Thanks @kimbarrett and @tschatzl for the reviews! 
------------- PR: https://git.openjdk.java.net/jdk/pull/5072 From ayang at openjdk.java.net Mon Aug 16 13:15:43 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 16 Aug 2021 13:15:43 GMT Subject: RFR: 8272520: Inline GenericTaskQueue::initialize() to the constructor Message-ID: Simple refactoring of GenericTaskQueue initialization API. Test: tier1_gc ------------- Commit messages: - ctor-init Changes: https://git.openjdk.java.net/jdk/pull/5125/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5125&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272520 Stats: 22 lines in 10 files changed: 1 ins; 20 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5125.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5125/head:pull/5125 PR: https://git.openjdk.java.net/jdk/pull/5125 From ayang at openjdk.java.net Mon Aug 16 13:29:59 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 16 Aug 2021 13:29:59 GMT Subject: RFR: 8272521: Remove unused PSPromotionManager::_claimed_stack_breadth Message-ID: Trivial change of removing an unused field. Test: build ------------- Commit messages: - trivial Changes: https://git.openjdk.java.net/jdk/pull/5126/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5126&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272521 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5126.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5126/head:pull/5126 PR: https://git.openjdk.java.net/jdk/pull/5126 From kbarrett at openjdk.java.net Mon Aug 16 14:14:29 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 16 Aug 2021 14:14:29 GMT Subject: RFR: 8272521: Remove unused PSPromotionManager::_claimed_stack_breadth In-Reply-To: References: Message-ID: On Mon, 16 Aug 2021 13:21:36 GMT, Albert Mingkun Yang wrote: > Trivial change of removing an unused field. > > Test: build Looks good, and trivial. 
------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5126 From kbarrett at openjdk.java.net Mon Aug 16 14:33:23 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 16 Aug 2021 14:33:23 GMT Subject: RFR: 8272520: Inline GenericTaskQueue::initialize() to the constructor In-Reply-To: References: Message-ID: On Mon, 16 Aug 2021 13:09:03 GMT, Albert Mingkun Yang wrote: > Simple refactoring of GenericTaskQueue initialization API. > > Test: tier1_gc Looks good. Just a couple minor nits. src/hotspot/share/gc/shared/taskqueue.inline.hpp line 53: > 51: template > 52: inline GenericTaskQueue::GenericTaskQueue() : _last_stolen_queue_id(InvalidQueueId), _seed(17 /* random number */) { > 53: assert(sizeof(Age) == sizeof(size_t), "Depends on this."); [pre-existing] This assert is over-constraining. The real requirement is static-asserted in the Age class. src/hotspot/share/gc/shared/taskqueue.inline.hpp line 54: > 52: inline GenericTaskQueue::GenericTaskQueue() : _last_stolen_queue_id(InvalidQueueId), _seed(17 /* random number */) { > 53: assert(sizeof(Age) == sizeof(size_t), "Depends on this."); > 54: _elems = ArrayAllocator::allocate(N, F); `_elems` could be initialized in the mem-initializer-list. ------------- Marked as reviewed by kbarrett (Reviewer). 
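Kim's note about initializing `_elems` in the mem-initializer-list rather than assigning it in the constructor body can be illustrated with a stripped-down stand-in for the queue (the real GenericTaskQueue uses ArrayAllocator; plain new[] is substituted here, and the class name is invented):

```cpp
#include <cassert>
#include <cstddef>

// Minimal sketch of the suggestion: the backing array is allocated in
// the mem-initializer-list, so _elems is initialized rather than first
// default-constructed and then assigned in the constructor body.
template <typename E, size_t N>
class TaskQueueSketch {
  E* _elems;
public:
  TaskQueueSketch() : _elems(new E[N]) {}  // initialized, not assigned
  ~TaskQueueSketch() { delete[] _elems; }

  E* elems() const { return _elems; }
  size_t capacity() const { return N; }
};
```

Besides being idiomatic C++, this guarantees the member is never observable in an indeterminate state between construction and a later initialize() call, which is what inlining initialize() into the constructor achieves in the actual change.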
PR: https://git.openjdk.java.net/jdk/pull/5125 From ayang at openjdk.java.net Mon Aug 16 14:46:53 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 16 Aug 2021 14:46:53 GMT Subject: RFR: 8272520: Inline GenericTaskQueue::initialize() to the constructor [v2] In-Reply-To: References: Message-ID: On Mon, 16 Aug 2021 14:28:37 GMT, Kim Barrett wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/gc/shared/taskqueue.inline.hpp line 53: > >> 51: template >> 52: inline GenericTaskQueue::GenericTaskQueue() : _last_stolen_queue_id(InvalidQueueId), _seed(17 /* random number */) { >> 53: assert(sizeof(Age) == sizeof(size_t), "Depends on this."); > > [pre-existing] This assert is over-constraining. The real requirement is static-asserted in the Age class. Removed. > src/hotspot/share/gc/shared/taskqueue.inline.hpp line 54: > >> 52: inline GenericTaskQueue::GenericTaskQueue() : _last_stolen_queue_id(InvalidQueueId), _seed(17 /* random number */) { >> 53: assert(sizeof(Age) == sizeof(size_t), "Depends on this."); >> 54: _elems = ArrayAllocator::allocate(N, F); > > `_elems` could be initialized in the mem-initializer-list. Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5125 From ayang at openjdk.java.net Mon Aug 16 14:46:52 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 16 Aug 2021 14:46:52 GMT Subject: RFR: 8272520: Inline GenericTaskQueue::initialize() to the constructor [v2] In-Reply-To: References: Message-ID: > Simple refactoring of GenericTaskQueue initialization API. 
> > Test: tier1_gc Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5125/files - new: https://git.openjdk.java.net/jdk/pull/5125/files/70f8c7c1..5376e921 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5125&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5125&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/5125.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5125/head:pull/5125 PR: https://git.openjdk.java.net/jdk/pull/5125 From zgu at openjdk.java.net Tue Aug 17 00:38:28 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 17 Aug 2021 00:38:28 GMT Subject: Integrated: 8272327: Shenandoah: Avoid enqueuing duplicate string candidates In-Reply-To: References: Message-ID: <00vLItF5y9KgKb3g-eOhSLXTyyF95OwyLTUEwp37IYk=.653b291d-24d8-4d47-b3c8-4ffa37b5758b@github.com> On Thu, 12 Aug 2021 00:12:14 GMT, Zhengyu Gu wrote: > Now, there is a new API java_lang_String::test_and_set_deduplication_requested() that can help to identify if a string candidate has been enqueued, so that we can avoid enqueuing duplicate entries. > > Test: > - [x] hotspot_gc_shenandoah This pull request has now been integrated. 
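The test-and-set API mentioned above can be modeled with a per-object atomic flag. This sketch only models the behavior; the real java_lang_String::test_and_set_deduplication_requested() operates on the String object itself, not a separate struct, and try_enqueue here is an invented helper:

```cpp
#include <atomic>
#include <cassert>

// Model of the avoid-duplicate-enqueue idea: the first caller to flip
// the per-object flag wins and enqueues; later callers see 'true' and
// skip the candidate.
struct StringCandidate {
  std::atomic<bool> dedup_requested{false};

  // Returns the previous value, like a classic test-and-set: 'false'
  // means this caller is first and should enqueue the candidate.
  bool test_and_set_deduplication_requested() {
    return dedup_requested.exchange(true);
  }
};

inline bool try_enqueue(StringCandidate& s) {
  if (s.test_and_set_deduplication_requested()) {
    return false;  // already queued by an earlier pass
  }
  // ... append to the deduplication request buffer here ...
  return true;
}
```

Because exchange() is atomic, two threads racing on the same candidate cannot both enqueue it, which is what makes the duplicate filtering safe without any extra locking.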
Changeset: ee8bf10d Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/ee8bf10d321da8a261ff4eda705cef753b4a7014 Stats: 10 lines in 3 files changed: 8 ins; 0 del; 2 mod 8272327: Shenandoah: Avoid enqueuing duplicate string candidates Reviewed-by: rkennke ------------- PR: https://git.openjdk.java.net/jdk/pull/5096 From mli at openjdk.java.net Tue Aug 17 01:45:51 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Tue, 17 Aug 2021 01:45:51 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space [v4] In-Reply-To: References: Message-ID: > Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. > > After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Closes JDK-8272070: simplify age calculation ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4983/files - new: https://git.openjdk.java.net/jdk/pull/4983/files/997fdc19..a3ebe710 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4983&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4983&range=02-03 Stats: 11 lines in 1 file changed: 0 ins; 10 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/4983.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4983/head:pull/4983 PR: https://git.openjdk.java.net/jdk/pull/4983 From kbarrett at openjdk.java.net Tue Aug 17 02:03:27 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 17 Aug 2021 02:03:27 GMT Subject: RFR: 8272520: Inline GenericTaskQueue::initialize() to the constructor [v2] In-Reply-To: References: Message-ID: <3_bUhZAWBbYbVX953szkIP8Ulmke0865odp6crbELB4=.cb721896-63a9-44d4-bdbd-3c48f1e16d18@github.com> On Mon, 16 Aug 2021 14:46:52 GMT, Albert Mingkun Yang wrote: >> Simple refactoring of GenericTaskQueue 
initialization API. >> >> Test: tier1_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Looks even better. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5125 From iwalulya at openjdk.java.net Tue Aug 17 08:34:27 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 17 Aug 2021 08:34:27 GMT Subject: RFR: 8272521: Remove unused PSPromotionManager::_claimed_stack_breadth In-Reply-To: References: Message-ID: On Mon, 16 Aug 2021 13:21:36 GMT, Albert Mingkun Yang wrote: > Trivial change of removing an unused field. > > Test: build Marked as reviewed by iwalulya (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5126 From iwalulya at openjdk.java.net Tue Aug 17 08:35:31 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 17 Aug 2021 08:35:31 GMT Subject: RFR: 8272520: Inline GenericTaskQueue::initialize() to the constructor [v2] In-Reply-To: References: Message-ID: On Mon, 16 Aug 2021 14:46:52 GMT, Albert Mingkun Yang wrote: >> Simple refactoring of GenericTaskQueue initialization API. >> >> Test: tier1_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by iwalulya (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5125 From ayang at openjdk.java.net Tue Aug 17 12:43:28 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 17 Aug 2021 12:43:28 GMT Subject: RFR: 8272521: Remove unused PSPromotionManager::_claimed_stack_breadth In-Reply-To: References: Message-ID: On Mon, 16 Aug 2021 13:21:36 GMT, Albert Mingkun Yang wrote: > Trivial change of removing an unused field. > > Test: build Thanks for the review. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5126 From ayang at openjdk.java.net Tue Aug 17 12:46:31 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 17 Aug 2021 12:46:31 GMT Subject: Integrated: 8272521: Remove unused PSPromotionManager::_claimed_stack_breadth In-Reply-To: References: Message-ID: On Mon, 16 Aug 2021 13:21:36 GMT, Albert Mingkun Yang wrote: > Trivial change of removing an unused field. > > Test: build This pull request has now been integrated. Changeset: 2ed7b709 Author: Albert Mingkun Yang URL: https://git.openjdk.java.net/jdk/commit/2ed7b709a197f009632580b17e3b1df34c1ffeb7 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod 8272521: Remove unused PSPromotionManager::_claimed_stack_breadth Reviewed-by: kbarrett, iwalulya ------------- PR: https://git.openjdk.java.net/jdk/pull/5126 From ayang at openjdk.java.net Tue Aug 17 12:46:33 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 17 Aug 2021 12:46:33 GMT Subject: RFR: 8272520: Inline GenericTaskQueue::initialize() to the constructor [v2] In-Reply-To: References: Message-ID: On Mon, 16 Aug 2021 14:46:52 GMT, Albert Mingkun Yang wrote: >> Simple refactoring of GenericTaskQueue initialization API. >> >> Test: tier1_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/5125 From ayang at openjdk.java.net Tue Aug 17 12:46:34 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 17 Aug 2021 12:46:34 GMT Subject: Integrated: 8272520: Inline GenericTaskQueue::initialize() to the constructor In-Reply-To: References: Message-ID: On Mon, 16 Aug 2021 13:09:03 GMT, Albert Mingkun Yang wrote: > Simple refactoring of GenericTaskQueue initialization API. > > Test: tier1_gc This pull request has now been integrated. 
Changeset: 2aaf7952 Author: Albert Mingkun Yang URL: https://git.openjdk.java.net/jdk/commit/2aaf795270eb07eb155df9a7f5e1d6901f09d8f0 Stats: 24 lines in 10 files changed: 1 ins; 20 del; 3 mod 8272520: Inline GenericTaskQueue::initialize() to the constructor Reviewed-by: kbarrett, iwalulya ------------- PR: https://git.openjdk.java.net/jdk/pull/5125 From iwalulya at openjdk.java.net Tue Aug 17 13:38:38 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 17 Aug 2021 13:38:38 GMT Subject: RFR: 8267894: Skip work for empty regions in G1 Full GC Message-ID: Hi all, Please review this change to skip doing work on regions that were empty at the beginning of the Full GC in phases that execute on a per-region basis. Testing: Tier 1-3. ------------- Commit messages: - 8267894: Skip work for empty regions in G1 Full GC Changes: https://git.openjdk.java.net/jdk/pull/5143/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5143&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267894 Stats: 26 lines in 5 files changed: 17 ins; 2 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/5143.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5143/head:pull/5143 PR: https://git.openjdk.java.net/jdk/pull/5143 From ayang at openjdk.java.net Tue Aug 17 14:45:46 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 17 Aug 2021 14:45:46 GMT Subject: RFR: 8272579: G1: remove unnecesary null check in G1ParScanThreadStateSet::flush Message-ID: <4EJqMhv6zlk8OXfsMOHA5mxZVLJcHQf3Y7lZXneZcms=.f6d33cb9-9ba9-456b-a404-2785ba0cb896@github.com> Simple change of replacing a null-check with an assertion. 
Test: hotspot_gc ------------- Commit messages: - pss-null Changes: https://git.openjdk.java.net/jdk/pull/5145/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5145&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272579 Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5145.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5145/head:pull/5145 PR: https://git.openjdk.java.net/jdk/pull/5145 From ayang at openjdk.java.net Tue Aug 17 16:19:35 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 17 Aug 2021 16:19:35 GMT Subject: RFR: 8272576: G1: Use more accurate integer type for collection set length Message-ID: <_qBesqCURhe18YG7VNpW-40vlY6B0GcpDOtTjHm1WMM=.a22ac9ee-c415-42dd-8761-ee05454664af@github.com> Simple change of using smaller integer type, `size_t` to `uint`. Also moved the bound-check assertion before array access. Test: hotspot_gc ------------- Commit messages: - uint-length Changes: https://git.openjdk.java.net/jdk/pull/5148/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5148&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272576 Stats: 10 lines in 2 files changed: 2 ins; 2 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/5148.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5148/head:pull/5148 PR: https://git.openjdk.java.net/jdk/pull/5148 From zgu at openjdk.java.net Tue Aug 17 18:39:37 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 17 Aug 2021 18:39:37 GMT Subject: RFR: 8267188: gc/stringdedup/TestStringDeduplicationInterned.java fails with Shenandoah Message-ID: Shenandoah currently enqueues deduplication candidates when it marks objects gray. However, it cannot do so when it scans roots, as doing so could result in a lock rank inversion between the stack watermark lock and the dedup request buffer lock.
As a result, it cannot enqueue as many candidates as it should; I believe JDK-8271834 is due to the same problem. I propose we switch to enqueuing candidates when we mark objects black. We are still not able to enqueue all candidates, only those that have displaced headers or have monitor inflation in progress. Upstream is working on removing displaced headers; we should revisit the logic afterward, or we can choose to deduplicate all strings regardless of their ages. Test: - [x] hotspot_gc_shenandoah - [x] tier1 with -XX:+UseShenandoahGC and -XX:+UseStringDeduplication ------------- Commit messages: - fix format - v0 - Merge branch 'master' into JDK-8267188-string-intern - v0 Changes: https://git.openjdk.java.net/jdk/pull/5151/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5151&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8267188 Stats: 112 lines in 8 files changed: 35 ins; 28 del; 49 mod Patch: https://git.openjdk.java.net/jdk/pull/5151.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5151/head:pull/5151 PR: https://git.openjdk.java.net/jdk/pull/5151 From iwalulya at openjdk.java.net Tue Aug 17 19:47:24 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 17 Aug 2021 19:47:24 GMT Subject: RFR: 8272579: G1: remove unnecesary null check in G1ParScanThreadStateSet::flush In-Reply-To: <4EJqMhv6zlk8OXfsMOHA5mxZVLJcHQf3Y7lZXneZcms=.f6d33cb9-9ba9-456b-a404-2785ba0cb896@github.com> References: <4EJqMhv6zlk8OXfsMOHA5mxZVLJcHQf3Y7lZXneZcms=.f6d33cb9-9ba9-456b-a404-2785ba0cb896@github.com> Message-ID: On Tue, 17 Aug 2021 14:38:09 GMT, Albert Mingkun Yang wrote: > Simple change of replacing a null-check with an assertion. > > Test: hotspot_gc Marked as reviewed by iwalulya (Committer).
------------- PR: https://git.openjdk.java.net/jdk/pull/5145 From kbarrett at openjdk.java.net Wed Aug 18 05:25:25 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 18 Aug 2021 05:25:25 GMT Subject: RFR: 8272579: G1: remove unnecesary null check in G1ParScanThreadStateSet::flush In-Reply-To: <4EJqMhv6zlk8OXfsMOHA5mxZVLJcHQf3Y7lZXneZcms=.f6d33cb9-9ba9-456b-a404-2785ba0cb896@github.com> References: <4EJqMhv6zlk8OXfsMOHA5mxZVLJcHQf3Y7lZXneZcms=.f6d33cb9-9ba9-456b-a404-2785ba0cb896@github.com> Message-ID: On Tue, 17 Aug 2021 14:38:09 GMT, Albert Mingkun Yang wrote: > Simple change of replacing a null-check with an assertion. > > Test: hotspot_gc Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5145 From mli at openjdk.java.net Wed Aug 18 06:00:26 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Wed, 18 Aug 2021 06:00:26 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space [v4] In-Reply-To: References: Message-ID: On Tue, 17 Aug 2021 01:45:51 GMT, Hamlin Li wrote: >> Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. >> >> After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Closes JDK-8272070: simplify age calculation Kind reminder, I think I still need another reviewer?
------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From github.com+3094961+buddyliao at openjdk.java.net Wed Aug 18 07:11:40 2021 From: github.com+3094961+buddyliao at openjdk.java.net (BuddyLiao) Date: Wed, 18 Aug 2021 07:11:40 GMT Subject: RFR: 8272611: Use parallel heap inspection when print classhisto info in gc log Message-ID: 8272611: Use parallel heap inspection when print classhisto info in gc log ------------- Commit messages: - 8272611: Use parallel heap inspection when print classhisto info in gc log - Merge branch 'openjdk:master' into master - 8266193: BasicJMapTest does not include testHistoParallel methods Changes: https://git.openjdk.java.net/jdk/pull/5156/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5156&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272611 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5156.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5156/head:pull/5156 PR: https://git.openjdk.java.net/jdk/pull/5156 From github.com+3094961+buddyliao at openjdk.java.net Wed Aug 18 07:31:43 2021 From: github.com+3094961+buddyliao at openjdk.java.net (BuddyLiao) Date: Wed, 18 Aug 2021 07:31:43 GMT Subject: RFR: 8272611: Use parallel heap inspection when print classhisto info in gc log [v2] In-Reply-To: References: Message-ID: > 8272611: Use parallel heap inspection when print classhisto info in gc log BuddyLiao has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5156/files - new: https://git.openjdk.java.net/jdk/pull/5156/files/a896a21b..ec63957f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5156&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5156&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5156.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5156/head:pull/5156 PR: https://git.openjdk.java.net/jdk/pull/5156 From github.com+3094961+buddyliao at openjdk.java.net Wed Aug 18 07:31:44 2021 From: github.com+3094961+buddyliao at openjdk.java.net (BuddyLiao) Date: Wed, 18 Aug 2021 07:31:44 GMT Subject: Withdrawn: 8272611: Use parallel heap inspection when print classhisto info in gc log In-Reply-To: References: Message-ID: On Wed, 18 Aug 2021 07:04:34 GMT, BuddyLiao wrote: > 8272611: Use parallel heap inspection when print classhisto info in gc log This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/5156 From rbruno at gsd.inesc-id.pt Wed Aug 18 07:36:03 2021 From: rbruno at gsd.inesc-id.pt (Rodrigo Bruno) Date: Wed, 18 Aug 2021 08:36:03 +0100 Subject: Status of JEP-8204088/JDK-8236073 In-Reply-To: References: Message-ID: Hi Thomas, Hi Man, On Mon, 2 Aug 2021 at 16:49, Thomas Schatzl wrote: > Hi Man, > > On 27.07.21 01:39, Man Cao wrote: > > Hi all, > > > > I added the "g1-heap-resizing" label to the issues that have been > > brought up in the thread: > > https://bugs.openjdk.java.net/issues/?jql=labels+%3D+g1-heap-resizing > > . > > We agree that most of these issues should be resolved first, before > > further working on SoftMaxHeapSize or developing an external component > > to automatically set GCTimeRatio/GCCpuPercentage at runtime. > > Thanks. > > > > > Regarding CurrentMaxHeapSize, it is tracked in > > https://bugs.openjdk.java.net/browse/JDK-8204088 > > . 
It seems more > > independent of other g1-heap-resizing issues, so perhaps it could be > > implemented earlier. On the other hand, once most of the > > g1-heap-resizing issues are fixed, SoftMaxHeapSize should behave better. > > One main goal of those g1-heap-resizing issues is to fix the problem > > that "G1 expands the heap too aggressively". > > Fwiw, these two are fairly independent issues, and the main problem with > the proposed CurrentMaxHeapSize patch has been that the change is not > atomic in the sense that the changes simply modify the global heap size > variable without regard whether and when it is accessed. That may result > in fairly nasty races as different components of the GC may then have a > different notion of what the current max heap size actually is at the > moment. > That is true. However, we could easily do the update to CurrentMaxHeapSize inside a VMOperation to ensure an atomic change. This is not in the current patch but it is probably a simple change. Right? > > > > > I agree that CurrentMaxHeapSize can address several problems that cannot > > be solved by tuning SoftMaxHeapSize and/or GCTimeRatio/GCCpuPercentage. > > However, it may introduce other issues and does not satisfy all of our > > requirements. In particular: > > - In our experience, having a tight Xmx is prone to GC thrashing, which > > could make servers unresponsive and wasting CPU resources, and is worse > > than getting killed by a container manager due to running out of > > container RAM (because a killed process can be restarted easily). I > > suspect using CurrentMaxHeapSize instead of SoftMaxHeapSize will > > increase the chance of GC thrashing, when it is better to let the > > process run out of container RAM limit and get killed and restarted. > Agreed. If CurrentMaxHeapSize is used to decrease the size of the heap, then thrashing could be an issue. However, it also gives you the option to increase the size of the heap if the current heap is leading to thrashing.
I understand that in some services it might be easier to restart the service. This is not the case for example in user applications where state might be lost after a restart. > > > > - We found that turning off CompressedOops could waste 30%-40% > > heap memory. This means many of our servers are better off keeping Xmx > > below 32G. Thus, many of our servers cannot directly use the strategy of > > setting Xmx to the maximum memory available, then dynamically adjust > > CurrentMaxHeapSize. Admittedly, this problem exists for both > > CurrentMaxHeapSize or SoftMaxHeapSize. We will probably need to develop > > a separate strategy to determine when to make Xmx go above 32G. It is > > also possible to set -XX:ObjectAlignmentInBytes=16, which can then use > > CompressedOops up to 64G of Xmx. > > > > We are definitely happy to experiment and compare automatically setting > > CurrentMaxHeapSize vs SoftMaxHeapSize. Again, SoftMaxHeapSize is not > > ready until most g1-heap-resizing issues are fixed. > > +1 > +1 > > Thomas > -- rodrigo-bruno.github.io From github.com+3094961+buddyliao at openjdk.java.net Wed Aug 18 07:48:33 2021 From: github.com+3094961+buddyliao at openjdk.java.net (BuddyLiao) Date: Wed, 18 Aug 2021 07:48:33 GMT Subject: RFR: 8272611: Use parallel heap inspection when print classhisto info in gc log Message-ID: I have built it on linux x86_64 and run test-tier1 --------- ### Progress - [x] Change must not contain extraneous whitespace - [x] Commit message must refer to an issue - [ ] Change must be properly reviewed ### Reviewing
Using git Checkout this PR locally: \ `$ git fetch https://git.openjdk.java.net/jdk pull/5158/head:pull/5158` \ `$ git checkout pull/5158` Update a local copy of the PR: \ `$ git checkout pull/5158` \ `$ git pull https://git.openjdk.java.net/jdk pull/5158/head`
Using Skara CLI tools Checkout this PR locally: \ `$ git pr checkout 5158` View PR using the GUI difftool: \ `$ git pr show -t 5158`
Using diff file Download this PR as a diff file: \ https://git.openjdk.java.net/jdk/pull/5158.diff
------------- Commit messages: - 8272611: Use parallel heap inspection when print classhisto info in gc log Changes: https://git.openjdk.java.net/jdk/pull/5158/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5158&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272611 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5158.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5158/head:pull/5158 PR: https://git.openjdk.java.net/jdk/pull/5158 From iwalulya at openjdk.java.net Wed Aug 18 10:32:25 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 18 Aug 2021 10:32:25 GMT Subject: RFR: 8272576: G1: Use more accurate integer type for collection set length In-Reply-To: <_qBesqCURhe18YG7VNpW-40vlY6B0GcpDOtTjHm1WMM=.a22ac9ee-c415-42dd-8761-ee05454664af@github.com> References: <_qBesqCURhe18YG7VNpW-40vlY6B0GcpDOtTjHm1WMM=.a22ac9ee-c415-42dd-8761-ee05454664af@github.com> Message-ID: On Tue, 17 Aug 2021 16:12:28 GMT, Albert Mingkun Yang wrote: > Simple change of using smaller integer type, `size_t` to `uint`. Also moved the bound-check assertion before array access. > > Test: hotspot_gc Marked as reviewed by iwalulya (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5148 From ayang at openjdk.java.net Wed Aug 18 10:40:47 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 18 Aug 2021 10:40:47 GMT Subject: RFR: 8272629: G1: simplify G1GCParPhaseTimesTracker constructor API Message-ID: Simple change of removing an always-true argument in `G1GCParPhaseTimesTracker`. 
Test: hotspot_gc ------------- Commit messages: - must-record Changes: https://git.openjdk.java.net/jdk/pull/5162/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5162&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272629 Stats: 9 lines in 3 files changed: 0 ins; 4 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/5162.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5162/head:pull/5162 PR: https://git.openjdk.java.net/jdk/pull/5162 From sjohanss at openjdk.java.net Wed Aug 18 11:23:25 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 18 Aug 2021 11:23:25 GMT Subject: RFR: 8267894: Skip work for empty regions in G1 Full GC In-Reply-To: References: Message-ID: <3cG75YzDeeDFWeDmgvnqISwE0oz4TDSQ0kGzPMpmue4=.8eff89c0-9399-45e8-b61d-14cfdb813f83@github.com> On Tue, 17 Aug 2021 13:29:59 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to skip doing work on regions that were empty at the beginning of the Full GC in phases that execute on a per-region basis. > > Testing: Tier 1-3. I think the change looks good in general, but maybe we should go with a "named" state for free regions. Either we change the name for the default value in `G1FullGCHeapRegionAttr` or we add a new state that we set for free regions. What do you think? ------------- PR: https://git.openjdk.java.net/jdk/pull/5143 From iwalulya at openjdk.java.net Wed Aug 18 11:30:26 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 18 Aug 2021 11:30:26 GMT Subject: RFR: 8267894: Skip work for empty regions in G1 Full GC In-Reply-To: <3cG75YzDeeDFWeDmgvnqISwE0oz4TDSQ0kGzPMpmue4=.8eff89c0-9399-45e8-b61d-14cfdb813f83@github.com> References: <3cG75YzDeeDFWeDmgvnqISwE0oz4TDSQ0kGzPMpmue4=.8eff89c0-9399-45e8-b61d-14cfdb813f83@github.com> Message-ID: On Wed, 18 Aug 2021 11:20:13 GMT, Stefan Johansson wrote: > I think the change looks good in general, but maybe we should go with a "named" state for free regions. 
Either we change the name for the default value in `G1FullGCHeapRegionAttr` or we add a new state that we set for free regions. What do you think? AFAIK, changing the default value to "Free" instead of "Invalid" would suffice; however, we need to clarify how that is handled by `assert(!is_invalid(obj), "not initialized yet");` which indicates some sort of transition from "Invalid" to "initialized". ------------- PR: https://git.openjdk.java.net/jdk/pull/5143 From sjohanss at openjdk.java.net Wed Aug 18 11:45:27 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 18 Aug 2021 11:45:27 GMT Subject: RFR: 8272629: G1: simplify G1GCParPhaseTimesTracker constructor API In-Reply-To: References: Message-ID: On Wed, 18 Aug 2021 10:31:29 GMT, Albert Mingkun Yang wrote: > Simple change of removing an always-true argument in `G1GCParPhaseTimesTracker`. > > Test: hotspot_gc From what I can tell this argument is still needed since we can do multiple calls to `merge_heap_roots(false)` in `G1CollectedHeap::evacuate_optional_collection_set()`. So for the `G1GCPhaseTimes::OptMergeRS` phase we need to sometimes use `record_or_add_time_secs()` to avoid the assert/overwriting data. Or am I missing something? ------------- PR: https://git.openjdk.java.net/jdk/pull/5162 From sjohanss at openjdk.java.net Wed Aug 18 11:55:28 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 18 Aug 2021 11:55:28 GMT Subject: RFR: 8267894: Skip work for empty regions in G1 Full GC In-Reply-To: References: Message-ID: On Tue, 17 Aug 2021 13:29:59 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to skip doing work on regions that were empty at the beginning of the Full GC in phases that execute on a per-region basis. > > Testing: Tier 1-3. I think the text "not initialized yet" is a bit wrong. We should never call is_skip_marking() for objects in "Invalid" regions, because there can't be any objects there. Same goes for "Free" regions.
So I would change the assert to something like: assert(!is_free(obj), "Should be no objects in free regions."); I agree that changing the default value should work. One benefit of using a new state could be to have a difference between free and not committed regions. But not sure it is worth the effort right now. ------------- PR: https://git.openjdk.java.net/jdk/pull/5143 From sjohanss at openjdk.java.net Wed Aug 18 12:01:30 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 18 Aug 2021 12:01:30 GMT Subject: RFR: 8272576: G1: Use more accurate integer type for collection set length In-Reply-To: <_qBesqCURhe18YG7VNpW-40vlY6B0GcpDOtTjHm1WMM=.a22ac9ee-c415-42dd-8761-ee05454664af@github.com> References: <_qBesqCURhe18YG7VNpW-40vlY6B0GcpDOtTjHm1WMM=.a22ac9ee-c415-42dd-8761-ee05454664af@github.com> Message-ID: On Tue, 17 Aug 2021 16:12:28 GMT, Albert Mingkun Yang wrote: > Simple change of using smaller integer type, `size_t` to `uint`. Also moved the bound-check assertion before array access. > > Test: hotspot_gc Looks good. ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5148 From fmatte at openjdk.java.net Wed Aug 18 12:45:41 2021 From: fmatte at openjdk.java.net (Fairoz Matte) Date: Wed, 18 Aug 2021 12:45:41 GMT Subject: RFR: 8272563: Possible assertion failure in CardTableBarrierSetC1 Message-ID: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. > __ move(addr, tmp); However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. 
If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register move, which causes an assertion failure because a memory-to-register move requires its arguments to have the same size. The fix is to check is_oop() and call the move appropriately. No issues found in local testing and Mach5 tier1-3. ------------- Commit messages: - 8272563: Possible assertion failure in CardTableBarrierSetC1 Changes: https://git.openjdk.java.net/jdk/pull/5164/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5164&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272563 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5164.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5164/head:pull/5164 PR: https://git.openjdk.java.net/jdk/pull/5164 From sjohanss at openjdk.java.net Wed Aug 18 12:51:24 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 18 Aug 2021 12:51:24 GMT Subject: RFR: 8272579: G1: remove unnecesary null check in G1ParScanThreadStateSet::flush In-Reply-To: <4EJqMhv6zlk8OXfsMOHA5mxZVLJcHQf3Y7lZXneZcms=.f6d33cb9-9ba9-456b-a404-2785ba0cb896@github.com> References: <4EJqMhv6zlk8OXfsMOHA5mxZVLJcHQf3Y7lZXneZcms=.f6d33cb9-9ba9-456b-a404-2785ba0cb896@github.com> Message-ID: On Tue, 17 Aug 2021 14:38:09 GMT, Albert Mingkun Yang wrote: > Simple change of replacing a null-check with an assertion. > > Test: hotspot_gc src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 561: > 559: for (uint worker_id = 0; worker_id < _n_workers; ++worker_id) { > 560: G1ParScanThreadState* pss = _states[worker_id]; > 561: assert(pss != nullptr, "must be initialized"); Is it guaranteed that all workers are initialized? As you state in the bug, they are lazily initialized, so I wonder if there are use cases where a worker would not get initialized during the collection. Something like, very little work and a lot of workers.
If there is a guarantee here, does it hold below in `record_unused_optional_region()` as well? We are using the same construct with `continue`there. ------------- PR: https://git.openjdk.java.net/jdk/pull/5145 From ayang at openjdk.java.net Wed Aug 18 13:04:28 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 18 Aug 2021 13:04:28 GMT Subject: RFR: 8272579: G1: remove unnecesary null check in G1ParScanThreadStateSet::flush In-Reply-To: References: <4EJqMhv6zlk8OXfsMOHA5mxZVLJcHQf3Y7lZXneZcms=.f6d33cb9-9ba9-456b-a404-2785ba0cb896@github.com> Message-ID: On Wed, 18 Aug 2021 12:48:20 GMT, Stefan Johansson wrote: >> Simple change of replacing a null-check with an assertion. >> >> Test: hotspot_gc > > src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 561: > >> 559: for (uint worker_id = 0; worker_id < _n_workers; ++worker_id) { >> 560: G1ParScanThreadState* pss = _states[worker_id]; >> 561: assert(pss != nullptr, "must be initialized"); > > Is it guaranteed that all workers are initialized? As you state in the bug, they are lazily initialized, so I wonder if there are use cases where a worker would not get initialized during the collection. Something like, very little work and a lot of workers. > > If there is a guarantee here, does it hold below in `record_unused_optional_region()` as well? We are using the same construct with `continue`there. In `G1CollectedHeap::evacuate_initial_collection_set`: G1ParScanThreadStateSet per_thread_states(this, &rdcqs, workers()->active_workers(), <---- setting up states for workers collection_set()->young_region_length(), collection_set()->optional_region_length()); ... evacuate_initial_collection_set(&per_thread_states, may_do_optional_evacuation); \ -- task_time = run_task_timed(&g1_par_task); <--- launching all workers Therefore, I think the assertion is safe. > does it hold below in record_unused_optional_region() as well? I think so. 
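The invariant argued above — every per-worker state slot is created before flush() runs, so an assert suffices — can be shown with a small detached sketch (the `WorkerStateSet` name and members are hypothetical stand-ins, not the real G1 classes):

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for G1ParScanThreadStateSet (illustrative only):
// slots start out null, are filled lazily once per worker, and every worker
// is known to have run before flush() is called.
class WorkerStateSet {
    std::vector<int*> _states;   // toy per-worker state: just a counter
public:
    explicit WorkerStateSet(unsigned n) : _states(n, nullptr) {}

    // Lazily create the state for one worker (called from the worker task).
    int* state_for_worker(unsigned id) {
        if (_states[id] == nullptr) {
            _states[id] = new int(0);
        }
        return _states[id];
    }

    // Runs only after every worker task has finished, so a null slot would be
    // a bug -- hence an assert rather than a defensive null check + continue.
    int flush() {
        int total = 0;
        for (int* s : _states) {
            assert(s != nullptr && "must be initialized before flush");
            total += *s;
            delete s;
        }
        _states.clear();
        return total;
    }
};
```

The point of the pattern is that the assert documents the launch-before-flush ordering instead of silently tolerating a missing state.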
------------- PR: https://git.openjdk.java.net/jdk/pull/5145 From sjohanss at openjdk.java.net Wed Aug 18 13:18:23 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 18 Aug 2021 13:18:23 GMT Subject: RFR: 8272579: G1: remove unnecesary null check in G1ParScanThreadStateSet::flush [v2] In-Reply-To: References: <4EJqMhv6zlk8OXfsMOHA5mxZVLJcHQf3Y7lZXneZcms=.f6d33cb9-9ba9-456b-a404-2785ba0cb896@github.com> Message-ID: On Wed, 18 Aug 2021 13:01:00 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 561: >> >>> 559: for (uint worker_id = 0; worker_id < _n_workers; ++worker_id) { >>> 560: G1ParScanThreadState* pss = _states[worker_id]; >>> 561: assert(pss != nullptr, "must be initialized"); >> >> Is it guaranteed that all workers are initialized? As you state in the bug, they are lazily initialized, so I wonder if there are use cases where a worker would not get initialized during the collection. Something like, very little work and a lot of workers. >> >> If there is a guarantee here, does it hold below in `record_unused_optional_region()` as well? We are using the same construct with `continue`there. > > In `G1CollectedHeap::evacuate_initial_collection_set`: > > > G1ParScanThreadStateSet per_thread_states(this, > &rdcqs, > workers()->active_workers(), <---- setting up states for workers > collection_set()->young_region_length(), > collection_set()->optional_region_length()); > ... > evacuate_initial_collection_set(&per_thread_states, may_do_optional_evacuation); > \ > -- task_time = run_task_timed(&g1_par_task); <--- launching all workers > > Therefore, I think the assertion is safe. > >> does it hold below in record_unused_optional_region() as well? > > I think so. In that case I would change that one as well, but please run some more testing to ensure it is safe.
------------- PR: https://git.openjdk.java.net/jdk/pull/5145 From iwalulya at openjdk.java.net Wed Aug 18 13:26:47 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 18 Aug 2021 13:26:47 GMT Subject: RFR: 8267894: Skip work for empty regions in G1 Full GC [v2] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to skip doing work on regions that were empty at the beginning of the Full GC in phases that execute on a per-region basis. > > Testing: Tier 1-3. Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: changes proposeed by kstefanj review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5143/files - new: https://git.openjdk.java.net/jdk/pull/5143/files/ea9582fb..f196ea48 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5143&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5143&range=00-01 Stats: 19 lines in 6 files changed: 2 ins; 1 del; 16 mod Patch: https://git.openjdk.java.net/jdk/pull/5143.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5143/head:pull/5143 PR: https://git.openjdk.java.net/jdk/pull/5143 From sjohanss at openjdk.java.net Wed Aug 18 13:39:26 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 18 Aug 2021 13:39:26 GMT Subject: RFR: 8267894: Skip work for empty regions in G1 Full GC [v2] In-Reply-To: References: Message-ID: On Wed, 18 Aug 2021 13:26:47 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to skip doing work on regions that were empty at the beginning of the Full GC in phases that execute on a per-region basis. >> >> Testing: Tier 1-3. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > changes proposeed by kstefanj review Looks good to me. ------------- Marked as reviewed by sjohanss (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5143 From ddong at openjdk.java.net Wed Aug 18 14:14:54 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Wed, 18 Aug 2021 14:14:54 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC Message-ID: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Hi team, Please help review this change to add string deduplication support to SerialGC. Thanks, Denghui ------------- Commit messages: - fix - 8272609: Add string deduplication support to SerialGC Changes: https://git.openjdk.java.net/jdk/pull/5153/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5153&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272609 Stats: 210 lines in 17 files changed: 208 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5153.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5153/head:pull/5153 PR: https://git.openjdk.java.net/jdk/pull/5153 From ayang at openjdk.java.net Wed Aug 18 15:56:48 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 18 Aug 2021 15:56:48 GMT Subject: RFR: 8272579: G1: remove unnecesary null check in G1ParScanThreadStateSet::flush [v2] In-Reply-To: <4EJqMhv6zlk8OXfsMOHA5mxZVLJcHQf3Y7lZXneZcms=.f6d33cb9-9ba9-456b-a404-2785ba0cb896@github.com> References: <4EJqMhv6zlk8OXfsMOHA5mxZVLJcHQf3Y7lZXneZcms=.f6d33cb9-9ba9-456b-a404-2785ba0cb896@github.com> Message-ID: > Simple change of replacing a null-check with an assertion. 
> > Test: hotspot_gc Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5145/files - new: https://git.openjdk.java.net/jdk/pull/5145/files/301f0087..fdaf0b09 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5145&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5145&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5145.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5145/head:pull/5145 PR: https://git.openjdk.java.net/jdk/pull/5145 From ayang at openjdk.java.net Wed Aug 18 15:56:48 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 18 Aug 2021 15:56:48 GMT Subject: RFR: 8272579: G1: remove unnecesary null check in G1ParScanThreadStateSet::flush [v2] In-Reply-To: References: <4EJqMhv6zlk8OXfsMOHA5mxZVLJcHQf3Y7lZXneZcms=.f6d33cb9-9ba9-456b-a404-2785ba0cb896@github.com> Message-ID: <7xX8F2InXeS1JFktqHhau2Pp9vZIXAktfdYQm7oefts=.aa0b10dc-ff78-4d03-8bd2-76427e5e3018@github.com> On Wed, 18 Aug 2021 13:15:53 GMT, Stefan Johansson wrote: >> In `G1CollectedHeap::evacuate_initial_collection_set`: >> >> >> G1ParScanThreadStateSet per_thread_states(this, >> &rdcqs, >> workers()->active_workers(), <---- setting up states for workers >> collection_set()->young_region_length(), >> collection_set()->optional_region_length()); >> ... >> evacuate_initial_collection_set(&per_thread_states, may_do_optional_evacuation); >> \ >> -- task_time = run_task_timed(&g1_par_task); <--- launching all workers >> >> Therefore, I think the assertion is safe. >> >>> does it hold below in record_unused_optional_region() as well? >> >> I think so. > > In that case I would change that one as well, but it please run some more testing to ensure it is safe. Changed; and tier1-3 pass. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5145 From ayang at openjdk.java.net Wed Aug 18 15:59:25 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 18 Aug 2021 15:59:25 GMT Subject: RFR: 8272579: G1: remove unnecesary null check for G1ParScanThreadStateSet::_states slots In-Reply-To: <4EJqMhv6zlk8OXfsMOHA5mxZVLJcHQf3Y7lZXneZcms=.f6d33cb9-9ba9-456b-a404-2785ba0cb896@github.com> References: <4EJqMhv6zlk8OXfsMOHA5mxZVLJcHQf3Y7lZXneZcms=.f6d33cb9-9ba9-456b-a404-2785ba0cb896@github.com> Message-ID: On Tue, 17 Aug 2021 14:38:09 GMT, Albert Mingkun Yang wrote: > Simple change of replacing a null-check with an assertion. > > Test: hotspot_gc I updated the JBS ticket description/title as well to reflect the new patch. ------------- PR: https://git.openjdk.java.net/jdk/pull/5145 From rkennke at openjdk.java.net Wed Aug 18 18:25:28 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 18 Aug 2021 18:25:28 GMT Subject: RFR: 8266519: Cleanup resolve() leftovers from BarrierSet et al In-Reply-To: References: Message-ID: On Tue, 4 May 2021 16:38:30 GMT, Roman Kennke wrote: > Shenandoah used to require a way to resolve oops, but it's long unused and the corresponding code in the access machinery is obsolete. Let's remove it. > > Testing: > - [x] hotspot_gc_shenandoah > - [x] tier1 > - [x] tier2 Ping? Any takers? ------------- PR: https://git.openjdk.java.net/jdk/pull/3862 From kbarrett at openjdk.java.net Thu Aug 19 00:11:25 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 19 Aug 2021 00:11:25 GMT Subject: RFR: 8266519: Cleanup resolve() leftovers from BarrierSet et al In-Reply-To: References: Message-ID: On Wed, 18 Aug 2021 18:22:09 GMT, Roman Kennke wrote: > Ping? Any takers? The only email for this PR I have any record of is the auto-withdraw at the end of June. Weird! I'll have a look.
------------- PR: https://git.openjdk.java.net/jdk/pull/3862 From kbarrett at openjdk.java.net Thu Aug 19 00:20:25 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 19 Aug 2021 00:20:25 GMT Subject: RFR: 8266519: Cleanup resolve() leftovers from BarrierSet et al In-Reply-To: References: Message-ID: On Tue, 4 May 2021 16:38:30 GMT, Roman Kennke wrote: > Shenandoah used to require a way to resolve oops, but it's long unused the the corresponding code in the access machinery is obsolete. Let's remove it. > > Testing: > - [x] hotspot_gc_shenandoah > - [x] tier1 > - [x] tier2 Looks good. The initial RFR email made it into the hotspot-dev mail.openjdk.java.net archive. But it doesn't look like it made it to me for some reason. Was there some kind of email outage around that time? ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3862 From kbarrett at openjdk.java.net Thu Aug 19 00:30:22 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 19 Aug 2021 00:30:22 GMT Subject: RFR: 8272579: G1: remove unnecesary null check for G1ParScanThreadStateSet::_states slots [v2] In-Reply-To: References: <4EJqMhv6zlk8OXfsMOHA5mxZVLJcHQf3Y7lZXneZcms=.f6d33cb9-9ba9-456b-a404-2785ba0cb896@github.com> Message-ID: <_JW6ruvbbwjEQHTMVQ7sOh3phqP_FLdFIkHrLQNIeK0=.6636197c-75de-422a-9fa5-a79d83f5c9bb@github.com> On Wed, 18 Aug 2021 15:56:48 GMT, Albert Mingkun Yang wrote: >> Simple change of replacing a null-check with an assertion. >> >> Test: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Still looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5145 From kbarrett at openjdk.java.net Thu Aug 19 05:28:22 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 19 Aug 2021 05:28:22 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC In-Reply-To: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: <4_Yhnp0uKh5-1BfEt0gYnInlKAnevRxXEeFk-rS2eP8=.d9f82b6e-25c9-4310-b6d8-e299ea9678dc@github.com> On Wed, 18 Aug 2021 02:24:11 GMT, Denghui Dong wrote: > Hi team, > > Please help review this change to add string deduplication support to SerialGC. > > Thanks, > Denghui The changes mostly look okay. But it's not obvious that string deduplication support is useful for SerialGC. The kinds of applications SerialGC is often used for don't really benefit from string deduplication. Can someone suggest a good use-case? One possible benefit is that if Epsilon is the only collector not supporting string deduplication then we might be able to simplify the string deduplication tests, replacing the per-GC test descriptions with just one to require !Epsilon and otherwise use whatever the current GC happens to be. src/hotspot/share/gc/serial/markSweep.hpp line 116: > 114: static SerialOldTracer* _gc_tracer; > 115: > 116: static StringDedup::Requests _string_dedup_requests; Avoid non-local variables with non-trivial / not-constexpr constructors or destructors. Better would be a pointer to a Requests object that is initially NULL and assigned by an initialization function or per collection, as is done for some of the other nearby variables. Yes, there are also some variables here that violate this; that's not a reason to add to that collection. (Note that the Style Guide currently mistakenly refers to "namespace-scoped" rather than "non-local" variables - JDK-8272691.) 
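The rule applied in the comment above — avoid namespace-scope objects whose non-trivial constructors must run during static initialization, and prefer a pointer assigned by an explicit initialization function — looks like this in isolation (generic C++; the `Requests` type below is a made-up stand-in, not the real StringDedup::Requests API):

```cpp
#include <cassert>

// Stand-in for a type with a non-trivial constructor; the real concern is
// that such a constructor may touch other globals, whose construction order
// across translation units is unspecified.
struct Requests {
    unsigned pending;
    Requests() : pending(0) {}
    void add() { ++pending; }
};

// Discouraged: 'static Requests _requests;' at namespace scope -- its
// constructor runs at some unspecified point before main().
//
// Preferred: a zero-initialized pointer (no constructor runs at load time),
// assigned explicitly once the subsystem is set up.
static Requests* _requests = nullptr;

void initialize_requests() {        // e.g. called from GC initialization
    if (_requests == nullptr) {
        _requests = new Requests();
    }
}

Requests* requests() {
    assert(_requests != nullptr && "initialize_requests() must run first");
    return _requests;
}
```

The pointer form also makes the initialization point explicit in the code, rather than leaving it to the C++ runtime's static-initialization phase.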
src/hotspot/share/gc/shared/stringdedup/stringDedupConfig.cpp line 119: > 117: if (!UseStringDeduplication) { > 118: return true; > 119: } else if (!UseG1GC && !UseShenandoahGC && !UseZGC && !UseSerialGC) { This will, of course, have a merge conflict with JDK-8267185. ------------- Changes requested by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5153 From stefank at openjdk.java.net Thu Aug 19 07:21:23 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Thu, 19 Aug 2021 07:21:23 GMT Subject: RFR: 8270041: Consolidate oopDesc::cas_forward_to() and oopDesc::forward_to_atomic() In-Reply-To: <3xGT20phR8n05MxmosIRONZDUtn8T2WTL4uz5MO7FqE=.13e2c107-34ec-4f1d-ba51-f4d21021cea9@github.com> References: <3xGT20phR8n05MxmosIRONZDUtn8T2WTL4uz5MO7FqE=.13e2c107-34ec-4f1d-ba51-f4d21021cea9@github.com> Message-ID: On Wed, 18 Aug 2021 18:36:11 GMT, Roman Kennke wrote: > The two methods do exactly the same, except one returns a bool, the other a value. The only use of cas_forward_to() can be consolidated to use forward_to_atomic() instead. > > Testing: > - [ ] hotspot_gc > - [ ] tier1 > - [ ] tier2 Marked as reviewed by stefank (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5167 From stefank at openjdk.java.net Thu Aug 19 07:43:24 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Thu, 19 Aug 2021 07:43:24 GMT Subject: RFR: 8266519: Cleanup resolve() leftovers from BarrierSet et al In-Reply-To: References: Message-ID: On Tue, 4 May 2021 16:38:30 GMT, Roman Kennke wrote: > Shenandoah used to require a way to resolve oops, but it's long unused the the corresponding code in the access machinery is obsolete. Let's remove it. > > Testing: > - [x] hotspot_gc_shenandoah > - [x] tier1 > - [x] tier2 Marked as reviewed by stefank (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/3862 From kbarrett at openjdk.java.net Thu Aug 19 07:50:25 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 19 Aug 2021 07:50:25 GMT Subject: RFR: 8270041: Consolidate oopDesc::cas_forward_to() and oopDesc::forward_to_atomic() In-Reply-To: <3xGT20phR8n05MxmosIRONZDUtn8T2WTL4uz5MO7FqE=.13e2c107-34ec-4f1d-ba51-f4d21021cea9@github.com> References: <3xGT20phR8n05MxmosIRONZDUtn8T2WTL4uz5MO7FqE=.13e2c107-34ec-4f1d-ba51-f4d21021cea9@github.com> Message-ID: On Wed, 18 Aug 2021 18:36:11 GMT, Roman Kennke wrote: > The two methods do exactly the same, except one returns a bool, the other a value. The only use of cas_forward_to() can be consolidated to use forward_to_atomic() instead. > > Testing: > - [ ] hotspot_gc > - [ ] tier1 > - [ ] tier2 Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5167 From mli at openjdk.java.net Thu Aug 19 08:19:43 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 19 Aug 2021 08:19:43 GMT Subject: RFR: 8254739: G1: Optimize evacuation failure for regions with few failed objects Message-ID: This is an attempt to optimize evacuation failure for regions. I record every evacuation failure object per region (by G1EvacuationFailureObjsInHR), and then iterate (which indeed includes compact/sort/iteration) these objects directly in RemoveSelfForwardPtrHRClosure. I have tested it with the following parameters: - -XX:+ParallelGCThreads=1/32/64 - -XX:G1EvacuationFailureALotInterval=1 - -XX:G1EvacuationFailureALotCount=2/10/100/1000/10000/100000 It saves "Remove Self Forwards" time all the time, and in most conditions it saves "Evacuate Collection Set" time. It brings some performance degradation when -XX:G1EvacuationFailureALotCount is low, such as *2*. To improve this a little, we can record the number of evacuation failure objects per region, and not record these objects when the number hits some limit.
But I'm not sure if it's necessary to do so, as I think such a condition is too extreme to be met in a real environment, although I'm not quite sure. ------------- Commit messages: - fix: reset state after iteration - speed up iterate evac failure objs in one region. Changes: https://git.openjdk.java.net/jdk/pull/5181/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5181&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254739 Stats: 283 lines in 6 files changed: 241 ins; 31 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/5181.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5181/head:pull/5181 PR: https://git.openjdk.java.net/jdk/pull/5181 From sjohanss at openjdk.java.net Thu Aug 19 08:59:23 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 19 Aug 2021 08:59:23 GMT Subject: RFR: 8272579: G1: remove unnecesary null check for G1ParScanThreadStateSet::_states slots [v2] In-Reply-To: References: <4EJqMhv6zlk8OXfsMOHA5mxZVLJcHQf3Y7lZXneZcms=.f6d33cb9-9ba9-456b-a404-2785ba0cb896@github.com> Message-ID: <9w5L3PWGmFrknk6SwuG4MDYsPCYCM1Ht97xB0xym4yo=.9c2b857d-d69a-458d-9660-359f83d9cd40@github.com> On Wed, 18 Aug 2021 15:56:48 GMT, Albert Mingkun Yang wrote: >> Simple change of replacing a null-check with an assertion. >> >> Test: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Looks good now. ------------- Marked as reviewed by sjohanss (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/5145 From ddong at openjdk.java.net Thu Aug 19 09:14:55 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Thu, 19 Aug 2021 09:14:55 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v2] In-Reply-To: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: > Hi team, > > Please help review this change to add string deduplication support to SerialGC. > > Thanks, > Denghui Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: avoid non-local variables with non-trivial / not-constexpr constructors or destructors ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5153/files - new: https://git.openjdk.java.net/jdk/pull/5153/files/5b9a5536..d7a7c68a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5153&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5153&range=00-01 Stats: 6 lines in 5 files changed: 1 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/5153.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5153/head:pull/5153 PR: https://git.openjdk.java.net/jdk/pull/5153 From ddong at openjdk.java.net Thu Aug 19 09:18:22 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Thu, 19 Aug 2021 09:18:22 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v2] In-Reply-To: <4_Yhnp0uKh5-1BfEt0gYnInlKAnevRxXEeFk-rS2eP8=.d9f82b6e-25c9-4310-b6d8-e299ea9678dc@github.com> References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> <4_Yhnp0uKh5-1BfEt0gYnInlKAnevRxXEeFk-rS2eP8=.d9f82b6e-25c9-4310-b6d8-e299ea9678dc@github.com> Message-ID: On Thu, 19 Aug 2021 05:25:03 GMT, Kim Barrett wrote: > > But it's not obvious that string deduplication support is useful for 
SerialGC. The kinds of applications SerialGC is often used for don't really benefit from string deduplication. Can someone suggest a good use-case? > In some serverless application scenarios (such as AWS Lambda), SerialGC is generally used as the default GC algorithm. If there are many string constructions in the application, I think string deduplication may be beneficial. > src/hotspot/share/gc/serial/markSweep.hpp line 116: > >> 114: static SerialOldTracer* _gc_tracer; >> 115: >> 116: static StringDedup::Requests _string_dedup_requests; > > Avoid non-local variables with non-trivial / not-constexpr constructors or destructors. Better would be a pointer to a Requests object that is initially NULL and assigned by an initialization function or per collection, as is done for some of the other nearby variables. Yes, there are also some variables here that violate this; that's not a reason to add to that collection. (Note that the Style Guide currently mistakenly refers to "namespace-scoped" rather than "non-local" variables - JDK-8272691.) Fixed. I made StringDedup::Requests extend ResourceObj since the global operators new and delete are not allowed in HotSpot. Thanks ------------- PR: https://git.openjdk.java.net/jdk/pull/5153 From ayang at openjdk.java.net Thu Aug 19 09:59:31 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 19 Aug 2021 09:59:31 GMT Subject: RFR: 8272579: G1: remove unnecesary null check for G1ParScanThreadStateSet::_states slots [v2] In-Reply-To: References: <4EJqMhv6zlk8OXfsMOHA5mxZVLJcHQf3Y7lZXneZcms=.f6d33cb9-9ba9-456b-a404-2785ba0cb896@github.com> Message-ID: On Wed, 18 Aug 2021 15:56:48 GMT, Albert Mingkun Yang wrote: >> Simple change of replacing a null-check with an assertion. >> >> Test: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for the review.
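[Editor's sketch, not part of the original mail: Kim Barrett's review comment above, preferring a pointer that an initialization function assigns over a namespace-scope object with a non-trivial constructor, can be illustrated with this minimal, hypothetical stand-in. It is not the actual StringDedup::Requests code; all names are invented.]

```cpp
#include <cstddef>

// Stand-in for StringDedup::Requests; the real class is more involved.
struct Requests {
  int flushed = 0;
  void flush() { ++flushed; }
};

// A namespace-scope Requests object would run a non-trivial constructor at
// an unspecified point during static initialization. A plain pointer is
// zero-initialized with no constructor at all, and is assigned from a
// well-defined initialization function instead.
static Requests* g_string_dedup_requests = nullptr;

// Called once from a well-defined point (e.g. GC initialization).
void string_dedup_init() {
  if (g_string_dedup_requests == nullptr) {
    g_string_dedup_requests = new Requests();
  }
}

Requests* string_dedup_requests() { return g_string_dedup_requests; }
```

This sidesteps static-initialization-order problems entirely, which is the motivation behind the style-guide rule the comment cites.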
------------- PR: https://git.openjdk.java.net/jdk/pull/5145 From ayang at openjdk.java.net Thu Aug 19 09:59:32 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 19 Aug 2021 09:59:32 GMT Subject: Integrated: 8272579: G1: remove unnecesary null check for G1ParScanThreadStateSet::_states slots In-Reply-To: <4EJqMhv6zlk8OXfsMOHA5mxZVLJcHQf3Y7lZXneZcms=.f6d33cb9-9ba9-456b-a404-2785ba0cb896@github.com> References: <4EJqMhv6zlk8OXfsMOHA5mxZVLJcHQf3Y7lZXneZcms=.f6d33cb9-9ba9-456b-a404-2785ba0cb896@github.com> Message-ID: On Tue, 17 Aug 2021 14:38:09 GMT, Albert Mingkun Yang wrote: > Simple change of replacing a null-check with an assertion. > > Test: hotspot_gc This pull request has now been integrated. Changeset: 82b2f21d Author: Albert Mingkun Yang URL: https://git.openjdk.java.net/jdk/commit/82b2f21d17828546f154cf31d174e26d0944530f Stats: 8 lines in 1 file changed: 0 ins; 6 del; 2 mod 8272579: G1: remove unnecesary null check for G1ParScanThreadStateSet::_states slots Reviewed-by: iwalulya, kbarrett, sjohanss ------------- PR: https://git.openjdk.java.net/jdk/pull/5145 From ayang at openjdk.java.net Thu Aug 19 10:00:31 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 19 Aug 2021 10:00:31 GMT Subject: RFR: 8272576: G1: Use more accurate integer type for collection set length In-Reply-To: <_qBesqCURhe18YG7VNpW-40vlY6B0GcpDOtTjHm1WMM=.a22ac9ee-c415-42dd-8761-ee05454664af@github.com> References: <_qBesqCURhe18YG7VNpW-40vlY6B0GcpDOtTjHm1WMM=.a22ac9ee-c415-42dd-8761-ee05454664af@github.com> Message-ID: On Tue, 17 Aug 2021 16:12:28 GMT, Albert Mingkun Yang wrote: > Simple change of using smaller integer type, `size_t` to `uint`. Also moved the bound-check assertion before array access. > > Test: hotspot_gc Thanks for the review. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5148 From ayang at openjdk.java.net Thu Aug 19 10:00:31 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 19 Aug 2021 10:00:31 GMT Subject: Integrated: 8272576: G1: Use more accurate integer type for collection set length In-Reply-To: <_qBesqCURhe18YG7VNpW-40vlY6B0GcpDOtTjHm1WMM=.a22ac9ee-c415-42dd-8761-ee05454664af@github.com> References: <_qBesqCURhe18YG7VNpW-40vlY6B0GcpDOtTjHm1WMM=.a22ac9ee-c415-42dd-8761-ee05454664af@github.com> Message-ID: On Tue, 17 Aug 2021 16:12:28 GMT, Albert Mingkun Yang wrote: > Simple change of using smaller integer type, `size_t` to `uint`. Also moved the bound-check assertion before array access. > > Test: hotspot_gc This pull request has now been integrated. Changeset: ab418129 Author: Albert Mingkun Yang URL: https://git.openjdk.java.net/jdk/commit/ab41812951aad9d44fb475d3a8c94b65d9e22b20 Stats: 10 lines in 2 files changed: 2 ins; 2 del; 6 mod 8272576: G1: Use more accurate integer type for collection set length Reviewed-by: iwalulya, sjohanss ------------- PR: https://git.openjdk.java.net/jdk/pull/5148 From rkennke at openjdk.java.net Thu Aug 19 10:05:29 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 19 Aug 2021 10:05:29 GMT Subject: RFR: 8270911: Don't use Access API to access object header word In-Reply-To: <4_GETFuyze8Bs77RVU6GiHXPfNL0W9UVAjQbhj0dXic=.658c3fdf-074e-492c-8940-b96214979770@github.com> References: <4_GETFuyze8Bs77RVU6GiHXPfNL0W9UVAjQbhj0dXic=.658c3fdf-074e-492c-8940-b96214979770@github.com> Message-ID: On Mon, 19 Jul 2021 14:04:27 GMT, Roman Kennke wrote: > Back when Shenandoah required read- and write-barriers on all object accesses, we introduced using the HeapAccess API to access object header words, too. Since JDK 13 (introduction of Shenandoah LRB), this is no longer necessary. I propose to simplify header access by not using the Access API but use ordinary C++ instead. 
> > (N.b: do we want to get rid of all primitive Access API, too? I would do it.) > > Testing: > - [x] tier1 > - [ ] tier2 Closing this in favour of #5166 ------------- PR: https://git.openjdk.java.net/jdk/pull/4827 From rkennke at openjdk.java.net Thu Aug 19 10:05:29 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 19 Aug 2021 10:05:29 GMT Subject: Withdrawn: 8270911: Don't use Access API to access object header word In-Reply-To: <4_GETFuyze8Bs77RVU6GiHXPfNL0W9UVAjQbhj0dXic=.658c3fdf-074e-492c-8940-b96214979770@github.com> References: <4_GETFuyze8Bs77RVU6GiHXPfNL0W9UVAjQbhj0dXic=.658c3fdf-074e-492c-8940-b96214979770@github.com> Message-ID: On Mon, 19 Jul 2021 14:04:27 GMT, Roman Kennke wrote: > Back when Shenandoah required read- and write-barriers on all object accesses, we introduced using the HeapAccess API to access object header words, too. Since JDK 13 (introduction of Shenandoah LRB), this is no longer necessary. I propose to simplify header access by not using the Access API but use ordinary C++ instead. > > (N.b: do we want to get rid of all primitive Access API, too? I would do it.) > > Testing: > - [x] tier1 > - [ ] tier2 This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/4827 From rkennke at openjdk.java.net Thu Aug 19 10:11:44 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 19 Aug 2021 10:11:44 GMT Subject: RFR: 8271951: Consolidate preserved marks overflow stack in SerialGC [v2] In-Reply-To: <_FpPV4VH0_Iyh0qyN6bJ6wyVISaoy5chgr3ARPlPIKU=.8eafc90e-dfc7-44fc-9316-6bb626e8e8f8@github.com> References: <_FpPV4VH0_Iyh0qyN6bJ6wyVISaoy5chgr3ARPlPIKU=.8eafc90e-dfc7-44fc-9316-6bb626e8e8f8@github.com> Message-ID: > In SerialGC we put preserved marks in space that we may have in the young gen, and failing that, we push it to an overflow stack. The overflow stack is actually two stacks, and replicate behavior of the in-place preserved marks. These can be consolidated. 
> > Testing: > - [x] tier1 > - [x] tier2 > - [x] hotspot_gc Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Remove PreservedMark::init() ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5022/files - new: https://git.openjdk.java.net/jdk/pull/5022/files/fb48de4d..c0f0d4ad Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5022&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5022&range=00-01 Stats: 7 lines in 2 files changed: 0 ins; 6 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5022.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5022/head:pull/5022 PR: https://git.openjdk.java.net/jdk/pull/5022 From rkennke at openjdk.java.net Thu Aug 19 10:18:28 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 19 Aug 2021 10:18:28 GMT Subject: RFR: 8271946: Cleanup leftovers in Space and subclasses [v2] In-Reply-To: References: <06djYCvDj8KzCJVZ7KJqS8B9WtRBwl7hN1CMm4BZzXk=.23e2ee84-3f0f-4274-a470-6ea5b2d888cc@github.com> Message-ID: On Sat, 7 Aug 2021 11:08:55 GMT, Roman Kennke wrote: >> There are a number of methods in (Compactible)Space that seem trivially implemented. E.g. obj_size(), adjust_obj_size(), scanned_block_is_obj() etc. I believe they are leftovers from CMS (not sure) but looks like they can be replaced by obj->size() or removed altogether. Also, the the related templated methods scan_and_forward(), scan_and_adjust_pointers() and scan_and_compact() can be directly and mechanically inlined into their callers. It makes the code more readable. >> >> Testing: >> - [x] hotspot_gc >> - [x] tier1 >> - [x] tier2 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Remove obsolete comments Ping? Anybody else who wants to approve this? Thanks! 
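[Editor's sketch, not part of the original mail: a tiny, hypothetical before/after illustration of the trivially-implemented hooks the RFR above proposes to remove. These are simplified stand-ins, not the real Space/oopDesc classes.]

```cpp
#include <cstddef>

// Stand-in for oopDesc: the object already knows its own size.
struct OopDesc {
  size_t _words;
  size_t size() const { return _words; }
};

// Before: a CMS-era hook on the space that only forwards to the object.
struct SpaceWithHook {
  size_t obj_size(const OopDesc* obj) const { return obj->size(); }
};

// After: callers ask the object directly, and the hook can be deleted.
size_t sum_sizes(const OopDesc* objs, size_t n) {
  size_t sum = 0;
  for (size_t i = 0; i < n; i++) {
    sum += objs[i].size();  // was: space->obj_size(&objs[i])
  }
  return sum;
}
```

Since both forms compute the same value, removing the hook is a pure readability cleanup with no behavior change.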
------------- PR: https://git.openjdk.java.net/jdk/pull/5031 From rkennke at openjdk.java.net Thu Aug 19 10:20:27 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 19 Aug 2021 10:20:27 GMT Subject: Integrated: 8272165: Consolidate mark_must_be_preserved() variants In-Reply-To: References: Message-ID: On Mon, 9 Aug 2021 18:25:35 GMT, Roman Kennke wrote: > We have markWord::mark_must_be_preserved() and markWord::mark_must_be_preserved_for_promotion_failure(). We used to have slightly different implementations when we had BiasedLocking, but now they are identical and used for identical purposes too. Let's remove what we don't need. > > Testing: > - [x] hotspot_gc > - [x] tier1 > - [x] tier2 This pull request has now been integrated. Changeset: 03b5e99d Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/03b5e99d998e037f84e9e2395b49321979c0acd8 Stats: 12 lines in 4 files changed: 0 ins; 11 del; 1 mod 8272165: Consolidate mark_must_be_preserved() variants Reviewed-by: tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/5057 From ayang at openjdk.java.net Thu Aug 19 10:31:25 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 19 Aug 2021 10:31:25 GMT Subject: RFR: 8272629: G1: simplify G1GCParPhaseTimesTracker constructor API In-Reply-To: References: Message-ID: On Wed, 18 Aug 2021 10:31:29 GMT, Albert Mingkun Yang wrote: > Simple change of removing an always-true argument in `G1GCParPhaseTimesTracker`. > > Test: hotspot_gc Yes, you are right. I overlooked it. I will close this PR and the ticket. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5162 From ayang at openjdk.java.net Thu Aug 19 10:31:25 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 19 Aug 2021 10:31:25 GMT Subject: Withdrawn: 8272629: G1: simplify G1GCParPhaseTimesTracker constructor API In-Reply-To: References: Message-ID: On Wed, 18 Aug 2021 10:31:29 GMT, Albert Mingkun Yang wrote: > Simple change of removing an always-true argument in `G1GCParPhaseTimesTracker`. > > Test: hotspot_gc This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/5162 From rkennke at openjdk.java.net Thu Aug 19 12:13:21 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 19 Aug 2021 12:13:21 GMT Subject: RFR: 8267188: gc/stringdedup/TestStringDeduplicationInterned.java fails with Shenandoah In-Reply-To: References: Message-ID: On Tue, 17 Aug 2021 18:32:54 GMT, Zhengyu Gu wrote: > Shenandoah currently enqueues deduplication candidates when marking an object gray. > > However, it cannot do so when it scans roots, as that can potentially result in a lock rank inversion between the stack watermark lock and the dedup request buffer lock. > > As a result, it cannot enqueue as many candidates as it should be able to; I believe JDK-8271834 is due to the same problem. I propose we switch to enqueuing candidates when we mark an object black. > > We are still not able to enqueue all candidates, only when they have displaced headers or have monitor inflation in progress. Upstream is working on removing displaced headers; we should revisit the logic afterward, or we can choose to deduplicate all strings regardless of their ages. > > > > Test: > - [x] hotspot_gc_shenandoah > - [x] tier1 with -XX:+UseShenandoahGC and -XX:+UseStringDeduplication Looks good to me! ------------- Marked as reviewed by rkennke (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/5151 From iwalulya at openjdk.java.net Thu Aug 19 13:36:54 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Thu, 19 Aug 2021 13:36:54 GMT Subject: RFR: 8215118: Reduce HeapRegion to only have one member for index in CSet [v2] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to maintain only a single member variable to indicate the region's index in a collection set: either the actual collection set for young regions or the optional regions array if the region is considered optional during a mixed collection. > > Testing: Tier 1-7 Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: after kstefanj discussion ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5122/files - new: https://git.openjdk.java.net/jdk/pull/5122/files/e1d5d026..43cda20b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5122&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5122&range=00-01 Stats: 56 lines in 8 files changed: 8 ins; 28 del; 20 mod Patch: https://git.openjdk.java.net/jdk/pull/5122.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5122/head:pull/5122 PR: https://git.openjdk.java.net/jdk/pull/5122 From ayang at openjdk.java.net Thu Aug 19 13:48:28 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 19 Aug 2021 13:48:28 GMT Subject: RFR: 8267185: Add string deduplication support to ParallelGC [v3] In-Reply-To: References: Message-ID: <07v1G9XXgMjxDlLbrGNOI1GYwMJAsWzVtmCMneRGk-8=.6eab983b-cdc2-485d-abc3-cf8d6100ec46@github.com> On Mon, 16 Aug 2021 11:44:19 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to add string deduplication support to ParallelGC. >> >> Testing: All string deduplication tests, and Tier 1-5. 
> > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > fix indentation The rest look fine. src/hotspot/share/gc/parallel/psStringDedup.cpp line 33: > 31: // reached the deduplication age threshold, i.e. has not previously been a > 32: // candidate during its life in the young generation. > 33: return PSParallelCompact::space_id(cast_from_oop(java_string)) > PSParallelCompact::old_space_id && I wonder if the space-id comparison can be replaced with sth like `PSScavenge::is_obj_in_young(java_string)`; we are checking if this `java_string` live in young-gen, right? ------------- PR: https://git.openjdk.java.net/jdk/pull/5085 From iwalulya at openjdk.java.net Thu Aug 19 13:52:39 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Thu, 19 Aug 2021 13:52:39 GMT Subject: RFR: 8215118: Reduce HeapRegion to only have one member for index in CSet [v3] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to maintain only a single member variable to indicate the region's index in a collection set: either the actual collection set for young regions or the optional regions array if the region is considered optional during a mixed collection. > > Testing: Tier 1-7 Ivan Walulya has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains three commits: - resolve conflict - after kstefanj discussion - 8215118: Reduce HeapRegion to only have one member for index in CSet ------------- Changes: https://git.openjdk.java.net/jdk/pull/5122/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5122&range=02 Stats: 54 lines in 9 files changed: 11 ins; 10 del; 33 mod Patch: https://git.openjdk.java.net/jdk/pull/5122.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5122/head:pull/5122 PR: https://git.openjdk.java.net/jdk/pull/5122 From iwalulya at openjdk.java.net Thu Aug 19 14:31:03 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Thu, 19 Aug 2021 14:31:03 GMT Subject: RFR: 8267185: Add string deduplication support to ParallelGC [v4] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to add string deduplication support to ParallelGC. > > Testing: All string deduplication tests, and Tier 1-5. Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: albertnetymk review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5085/files - new: https://git.openjdk.java.net/jdk/pull/5085/files/ba73d853..90dec08b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5085&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5085&range=02-03 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5085.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5085/head:pull/5085 PR: https://git.openjdk.java.net/jdk/pull/5085 From github.com+1866174+tbzhang at openjdk.java.net Thu Aug 19 15:08:37 2021 From: github.com+1866174+tbzhang at openjdk.java.net (Tongbao Zhang) Date: Thu, 19 Aug 2021 15:08:37 GMT Subject: RFR: 8272417: ZGC: fastdebug build crashes when printing ClassLoaderData Message-ID: Following test fails on fastdebug build with the options: "-XX:+UseZGC -XX:+VerifyAfterGC -Xlog:gc*=trace". 
It crashes on the unknown claim value of the ClassLoaderData public class A { public static void main(String... args) { System.gc(); } } The unknown claim value of the ClassLoaderData is "ClassLoaderData::_claim_other | ClassLoaderData::_claim_strong" which is introduced by the ZVerifyOopClosure. This patch clears the CLD's claimed marks after the ZGC verification ------------- Commit messages: - 8272417: ZGC: fastdebug build crashes when printing ClassLoaderData Changes: https://git.openjdk.java.net/jdk/pull/5107/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5107&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272417 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5107.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5107/head:pull/5107 PR: https://git.openjdk.java.net/jdk/pull/5107 From robilad at openjdk.java.net Thu Aug 19 15:08:37 2021 From: robilad at openjdk.java.net (Dalibor Topic) Date: Thu, 19 Aug 2021 15:08:37 GMT Subject: RFR: 8272417: ZGC: fastdebug build crashes when printing ClassLoaderData In-Reply-To: References: Message-ID: <34T1qfRAU-w_Vo-f-KF9L40i4FX0TvSTPL5hkPrXasQ=.76a48f44-343f-418c-89d8-73198c56e7d9@github.com> On Fri, 13 Aug 2021 07:14:51 GMT, Tongbao Zhang wrote: > Following test fails on fastdebug build with the options: "-XX:+UseZGC -XX:+VerifyAfterGC -Xlog:gc*=trace". It crashes on the unknown claim value of the ClassLoaderData > > public class A { > public static void main(String... args) { > System.gc(); > } > } > > The unknown claim value of the ClassLoaderData is "ClassLoaderData::_claim_other | ClassLoaderData::_claim_strong" which is introduced by the ZVerifyOopClosure. This patch clears the CLD's claimed marks after the ZGC verification Hi, please contact me at Dalibor.Topic at oracle.com so that I can mark your account as verified. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5107 From github.com+1866174+tbzhang at openjdk.java.net Thu Aug 19 15:08:37 2021 From: github.com+1866174+tbzhang at openjdk.java.net (Tongbao Zhang) Date: Thu, 19 Aug 2021 15:08:37 GMT Subject: RFR: 8272417: ZGC: fastdebug build crashes when printing ClassLoaderData In-Reply-To: References: Message-ID: On Fri, 13 Aug 2021 07:14:51 GMT, Tongbao Zhang wrote: > Following test fails on fastdebug build with the options: "-XX:+UseZGC -XX:+VerifyAfterGC -Xlog:gc*=trace". It crashes on the unknown claim value of the ClassLoaderData > > public class A { > public static void main(String... args) { > System.gc(); > } > } > > The unknown claim value of the ClassLoaderData is "ClassLoaderData::_claim_other | ClassLoaderData::_claim_strong" which is introduced by the ZVerifyOopClosure. This patch clears the CLD's claimed marks after the ZGC verification > Hi, please contact me at [Dalibor.Topic at oracle.com](mailto:Dalibor.Topic at oracle.com) so that I can mark your account as verified. Thank you! ------------- PR: https://git.openjdk.java.net/jdk/pull/5107 From ayang at openjdk.java.net Thu Aug 19 15:13:26 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 19 Aug 2021 15:13:26 GMT Subject: RFR: 8267185: Add string deduplication support to ParallelGC [v4] In-Reply-To: References: Message-ID: On Thu, 19 Aug 2021 14:31:03 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to add string deduplication support to ParallelGC. >> >> Testing: All string deduplication tests, and Tier 1-5. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > albertnetymk review Marked as reviewed by ayang (Committer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/5085 From ayang at openjdk.java.net Thu Aug 19 15:22:38 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 19 Aug 2021 15:22:38 GMT Subject: RFR: 8272725: G1: replace type needs_remset_update_t with bool Message-ID: Simple type change in `struct G1HeapRegionAttr`. Test: hotspot_gc ------------- Commit messages: - boolean Changes: https://git.openjdk.java.net/jdk/pull/5188/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5188&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272725 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5188.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5188/head:pull/5188 PR: https://git.openjdk.java.net/jdk/pull/5188 From iwalulya at openjdk.java.net Thu Aug 19 16:11:43 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Thu, 19 Aug 2021 16:11:43 GMT Subject: RFR: 8215118: Reduce HeapRegion to only have one member for index in CSet [v4] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to maintain only a single member variable to indicate the region's index in a collection set: either the actual collection set for young regions or the optional regions array if the region is considered optional during a mixed collection. 
> > Testing: Tier 1-7 Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: error in branch merge ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5122/files - new: https://git.openjdk.java.net/jdk/pull/5122/files/eaaf5683..dde6a82c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5122&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5122&range=02-03 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5122.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5122/head:pull/5122 PR: https://git.openjdk.java.net/jdk/pull/5122 From rkennke at openjdk.java.net Thu Aug 19 19:53:28 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 19 Aug 2021 19:53:28 GMT Subject: Integrated: 8266519: Cleanup resolve() leftovers from BarrierSet et al In-Reply-To: References: Message-ID: On Tue, 4 May 2021 16:38:30 GMT, Roman Kennke wrote: > Shenandoah used to require a way to resolve oops, but it's long unused and the corresponding code in the access machinery is obsolete. Let's remove it. > > Testing: > - [x] hotspot_gc_shenandoah > - [x] tier1 > - [x] tier2 This pull request has now been integrated.
Changeset: 7eccbd4f Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/7eccbd4fde58ea36d6a21a2c4ffa3bc5d0b38c10 Stats: 41 lines in 4 files changed: 0 ins; 40 del; 1 mod 8266519: Cleanup resolve() leftovers from BarrierSet et al Reviewed-by: kbarrett, stefank ------------- PR: https://git.openjdk.java.net/jdk/pull/3862 From rkennke at openjdk.java.net Thu Aug 19 19:54:28 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 19 Aug 2021 19:54:28 GMT Subject: Integrated: 8271951: Consolidate preserved marks overflow stack in SerialGC In-Reply-To: <_FpPV4VH0_Iyh0qyN6bJ6wyVISaoy5chgr3ARPlPIKU=.8eafc90e-dfc7-44fc-9316-6bb626e8e8f8@github.com> References: <_FpPV4VH0_Iyh0qyN6bJ6wyVISaoy5chgr3ARPlPIKU=.8eafc90e-dfc7-44fc-9316-6bb626e8e8f8@github.com> Message-ID: On Thu, 5 Aug 2021 14:29:09 GMT, Roman Kennke wrote: > In SerialGC we put preserved marks in space that we may have in the young gen, and failing that, we push it to an overflow stack. The overflow stack is actually two stacks, and replicate behavior of the in-place preserved marks. These can be consolidated. > > Testing: > - [x] tier1 > - [x] tier2 > - [x] hotspot_gc This pull request has now been integrated. 
Changeset: b40e8f0f Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/b40e8f0f9e719f28cf128d74d834233860e4ab67 Stats: 27 lines in 3 files changed: 0 ins; 14 del; 13 mod 8271951: Consolidate preserved marks overflow stack in SerialGC Reviewed-by: tschatzl, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/5022 From rkennke at openjdk.java.net Thu Aug 19 19:57:28 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 19 Aug 2021 19:57:28 GMT Subject: Integrated: 8270041: Consolidate oopDesc::cas_forward_to() and oopDesc::forward_to_atomic() In-Reply-To: <3xGT20phR8n05MxmosIRONZDUtn8T2WTL4uz5MO7FqE=.13e2c107-34ec-4f1d-ba51-f4d21021cea9@github.com> References: <3xGT20phR8n05MxmosIRONZDUtn8T2WTL4uz5MO7FqE=.13e2c107-34ec-4f1d-ba51-f4d21021cea9@github.com> Message-ID: On Wed, 18 Aug 2021 18:36:11 GMT, Roman Kennke wrote: > The two methods do exactly the same, except one returns a bool, the other a value. The only use of cas_forward_to() can be consolidated to use forward_to_atomic() instead. > > Testing: > - [ ] hotspot_gc > - [ ] tier1 > - [ ] tier2 This pull request has now been integrated. 
Changeset: f4be211a Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/f4be211ae290824cb6c678dcdff0df91a20117d6 Stats: 10 lines in 3 files changed: 0 ins; 9 del; 1 mod 8270041: Consolidate oopDesc::cas_forward_to() and oopDesc::forward_to_atomic() Reviewed-by: stefank, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/5167 From stefank at openjdk.java.net Thu Aug 19 22:08:27 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Thu, 19 Aug 2021 22:08:27 GMT Subject: RFR: 8272417: ZGC: fastdebug build crashes when printing ClassLoaderData In-Reply-To: References: Message-ID: <_u642xJ73W5qtcDDXhLB8a3qbgVNxMJGq0wWXlIb8E4=.63401fae-c7ee-4335-bdf8-c6aab187858b@github.com> On Fri, 13 Aug 2021 07:14:51 GMT, Tongbao Zhang wrote: > Following test fails on fastdebug build with the options: "-XX:+UseZGC -XX:+VerifyAfterGC -Xlog:gc*=trace". It crashes on the unknown claim value of the ClassLoaderData > > public class A { > public static void main(String... args) { > System.gc(); > } > } > > The unknown claim value of the ClassLoaderData is "ClassLoaderData::_claim_other | ClassLoaderData::_claim_strong" which is introduced by the ZVerifyOopClosure. This patch clears the CLD's claimed marks after the ZGC verification I mentioned in the bug report that this is not enough to correctly fix this bug. Could you fix the printing code instead? ------------- Changes requested by stefank (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5107 From kbarrett at openjdk.java.net Thu Aug 19 23:54:26 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 19 Aug 2021 23:54:26 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v2] In-Reply-To: References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: On Thu, 19 Aug 2021 09:14:55 GMT, Denghui Dong wrote: >> Hi team, >> >> Please help review this change to add string deduplication support to SerialGC. 
>> >> Thanks, >> Denghui > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > avoid non-local variables with non-trivial / not-constexpr constructors or destructors Changes requested by kbarrett (Reviewer). src/hotspot/share/gc/shared/stringdedup/stringDedup.hpp line 198: > 196: // is completed the Requests object must be flushed (either explicitly or by > 197: // the destructor). > 198: class StringDedup::Requests : public ResourceObj { Should be `CHeapObj` rather than `ResourceObj`. ------------- PR: https://git.openjdk.java.net/jdk/pull/5153 From kbarrett at openjdk.java.net Fri Aug 20 00:03:29 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 20 Aug 2021 00:03:29 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v2] In-Reply-To: References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: On Thu, 19 Aug 2021 09:14:55 GMT, Denghui Dong wrote: >> Hi team, >> >> Please help review this change to add string deduplication support to SerialGC. >> >> Thanks, >> Denghui > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > avoid non-local variables with non-trivial / not-constexpr constructors or destructors Changes requested by kbarrett (Reviewer). src/hotspot/share/gc/serial/serialStringDedup.hpp line 46: > 44: bool obj_is_tenured) { > 45: return StringDedup::is_enabled() && > 46: java_lang_String::is_instance_inlined(java_string) && Missing `#include` for this function. A bunch of the pre-submit builds on github are failing because of this. To call this function will require including javaClasses.inline.hpp, and ".inline.hpp" files cannot be included by an ordinary ".hpp". Dealing with this will require adding "gc/serial/stringDedup.inline.hpp", moving this definition there, &etc. 
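For illustration, the header split being asked for here typically looks like the sketch below. It is assembled from the fragments quoted in this review, not from the actual patch; the class shape and the trailing condition are assumptions, include guards are omitted for brevity, and the elided checks stay elided.

```cpp
// serialStringDedup.hpp -- plain header: declarations only, and no
// .inline.hpp includes anywhere in it.
class SerialStringDedup : AllStatic {
public:
  static inline bool is_candidate_from_evacuation(oop java_string,
                                                  bool obj_is_tenured);
};

// serialStringDedup.inline.hpp -- includes its own .hpp first (set off by a
// blank line), then the .inline.hpp files the definition itself needs.
#include "gc/serial/serialStringDedup.hpp"

#include "classfile/javaClasses.inline.hpp"
#include "gc/shared/stringdedup/stringDedup.hpp"

inline bool SerialStringDedup::is_candidate_from_evacuation(oop java_string,
                                                            bool obj_is_tenured) {
  return StringDedup::is_enabled() &&
         java_lang_String::is_instance_inlined(java_string)
         /* && ... remaining checks elided ... */;
}
```

This keeps ordinary .hpp clients free of heavyweight transitive includes, while callers that need the inline definition pull in the .inline.hpp explicitly.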
------------- PR: https://git.openjdk.java.net/jdk/pull/5153 From kbarrett at openjdk.java.net Fri Aug 20 00:05:26 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 20 Aug 2021 00:05:26 GMT Subject: RFR: 8267185: Add string deduplication support to ParallelGC [v4] In-Reply-To: References: Message-ID: On Thu, 19 Aug 2021 14:31:03 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to add string deduplication support to ParallelGC. >> >> Testing: All string deduplication tests, and Tier 1-5. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > albertnetymk review Changes requested by kbarrett (Reviewer). src/hotspot/share/gc/parallel/psStringDedup.hpp line 28: > 26: #define SHARE_GC_PARALLEL_PSSTRINGDEDUP_HPP > 27: > 28: #include "classfile/javaClasses.inline.hpp" ".inline.hpp" files must not be included by ".hpp" files. ------------- PR: https://git.openjdk.java.net/jdk/pull/5085 From ddong at openjdk.java.net Fri Aug 20 01:06:58 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 20 Aug 2021 01:06:58 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v3] In-Reply-To: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: > Hi team, > > Please help review this change to add string deduplication support to SerialGC. 
> > Thanks, > Denghui Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: fix build problem ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5153/files - new: https://git.openjdk.java.net/jdk/pull/5153/files/d7a7c68a..564612e0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5153&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5153&range=01-02 Stats: 97 lines in 7 files changed: 49 ins; 41 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/5153.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5153/head:pull/5153 PR: https://git.openjdk.java.net/jdk/pull/5153 From ddong at openjdk.java.net Fri Aug 20 01:44:54 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 20 Aug 2021 01:44:54 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v4] In-Reply-To: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: > Hi team, > > Please help review this change to add string deduplication support to SerialGC. 
> > Thanks, > Denghui Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: fix build error ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5153/files - new: https://git.openjdk.java.net/jdk/pull/5153/files/564612e0..e8bb9e88 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5153&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5153&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5153.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5153/head:pull/5153 PR: https://git.openjdk.java.net/jdk/pull/5153 From kbarrett at openjdk.java.net Fri Aug 20 03:02:23 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 20 Aug 2021 03:02:23 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v4] In-Reply-To: References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: On Fri, 20 Aug 2021 01:44:54 GMT, Denghui Dong wrote: >> Hi team, >> >> Please help review this change to add string deduplication support to SerialGC. >> >> Thanks, >> Denghui > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > fix build error Some more tidying up. Sorry for not noticing a couple of these earlier. src/hotspot/share/gc/serial/serialStringDedup.hpp line 27: > 25: #define SHARE_GC_SERIAL_STRINGDEDUP_HPP > 26: > 27: #include "gc/shared/stringdedup/stringDedup.hpp" There is nothing here that requires this header. Rather, it is needed in the inline.hpp file and should be moved there. src/hotspot/share/gc/serial/serialStringDedup.hpp line 42: > 40: // Candidate selection policy for young during evacuation. > 41: // If to is young then age should be the new (survivor's) age. > 42: // if to is old then age should be the age of the copied from object. 
These comments don't match the arguments. There is no `to` or `age` here. This comment seems to have been just copied from G1StringDedup. src/hotspot/share/gc/serial/serialStringDedup.hpp line 43: > 41: // If to is young then age should be the new (survivor's) age. > 42: // if to is old then age should be the age of the copied from object. > 43: static inline bool is_candidate_from_evacuation(oop java_string, Argument is poorly named. The value might not be, and indeed is unlikely to be, a String. Part of this function's job is to check whether the argument is a String. src/hotspot/share/gc/serial/serialStringDedup.hpp line 46: > 44: bool obj_is_tenured); > 45: }; > 46: #endif // SHARE_GC_SERIAL_STRINGDEDUP_HPP There's typically a blank line before the final `#endif`. src/hotspot/share/gc/serial/serialStringDedup.inline.hpp line 28: > 26: > 27: #include "gc/serial/serialHeap.hpp" > 28: #include "gc/serial/serialStringDedup.hpp" Inline headers are required to include the associated .hpp first. That's typically with a blank line between it and the remainder of the includes for emphasis. https://github.com/openjdk/jdk/blame/ddcd851c43aa97477c7e406490c0c7c7d71ac629/doc/hotspot-style.md#L141-L144 src/hotspot/share/gc/serial/serialStringDedup.inline.hpp line 36: > 34: // reached the deduplication age threshold, i.e. has not previously been a > 35: // candidate during its life in the young generation. > 36: return SerialHeap::heap()->young_gen()->is_in_reserved(java_string) && This depends on implicit includes since there's nothing here to ensure the type of `young_gen()` is in scope. I think it's coming from serialHeap.hpp, but that could be changed in the future to forward-declare the generation class rather than include the associated header. It might be better to have this function in a .cpp file (as it was in an earlier iteration) to limit header exposure in clients. 
This function isn't so performance critical, since before calling it we've already checked that deduplication is enabled and the argument is a String, so being inline isn't as important here as for is_candidate_from_evacuation. ------------- Changes requested by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5153 From ddong at openjdk.java.net Fri Aug 20 03:53:59 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 20 Aug 2021 03:53:59 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v5] In-Reply-To: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: > Hi team, > > Please help review this change to add string deduplication support to SerialGC. > > Thanks, > Denghui Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: tidy ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5153/files - new: https://git.openjdk.java.net/jdk/pull/5153/files/e8bb9e88..e4c7fd29 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5153&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5153&range=03-04 Stats: 49 lines in 3 files changed: 39 ins; 4 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/5153.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5153/head:pull/5153 PR: https://git.openjdk.java.net/jdk/pull/5153 From ddong at openjdk.java.net Fri Aug 20 03:54:08 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 20 Aug 2021 03:54:08 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v4] In-Reply-To: References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: <0vfxgLJecApeTmULVPZ5EVG-uYyKVWul3QFpVZbX4Dw=.7949ac2f-6b25-426c-88ef-27e7cc357f57@github.com> On Fri, 20 Aug 2021 
02:59:01 GMT, Kim Barrett wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> fix build error > > Some more tidying up. Sorry for not noticing a couple of these earlier. @kimbarrett Thanks for your careful review! I have fixed the related problems. > src/hotspot/share/gc/serial/serialStringDedup.hpp line 27: > >> 25: #define SHARE_GC_SERIAL_STRINGDEDUP_HPP >> 26: >> 27: #include "gc/shared/stringdedup/stringDedup.hpp" > > There is nothing here that requires this header. Rather, it is needed in the inline.hpp file and should be moved there. removed > src/hotspot/share/gc/serial/serialStringDedup.hpp line 42: > >> 40: // Candidate selection policy for young during evacuation. >> 41: // If to is young then age should be the new (survivor's) age. >> 42: // if to is old then age should be the age of the copied from object. > > These comments don't match the arguments. There is no `to` or `age` here. This comment seems to have been just copied from G1StringDedup. removed. > src/hotspot/share/gc/serial/serialStringDedup.hpp line 43: > >> 41: // If to is young then age should be the new (survivor's) age. >> 42: // if to is old then age should be the age of the copied from object. >> 43: static inline bool is_candidate_from_evacuation(oop java_string, > > Argument is poorly named. The value might not be, and indeed is unlikely to be, a String. Part of this function's job is to check whether the argument is a String. fixed. Use obj instead. > src/hotspot/share/gc/serial/serialStringDedup.hpp line 46: > >> 44: bool obj_is_tenured); >> 45: }; >> 46: #endif // SHARE_GC_SERIAL_STRINGDEDUP_HPP > > There's typically a blank line before the final `#endif`. fixed > src/hotspot/share/gc/serial/serialStringDedup.inline.hpp line 28: > >> 26: >> 27: #include "gc/serial/serialHeap.hpp" >> 28: #include "gc/serial/serialStringDedup.hpp" > > Inline headers are required to include the associated .hpp first.
That's typically with a blank line between it and the remainder of the includes for emphasis. > https://github.com/openjdk/jdk/blame/ddcd851c43aa97477c7e406490c0c7c7d71ac629/doc/hotspot-style.md#L141-L144 fixed. > src/hotspot/share/gc/serial/serialStringDedup.inline.hpp line 36: > >> 34: // reached the deduplication age threshold, i.e. has not previously been a >> 35: // candidate during its life in the young generation. >> 36: return SerialHeap::heap()->young_gen()->is_in_reserved(java_string) && > > This depends on implicit includes since there's nothing here to ensure the type of `young_gen()` is in scope. I think it's coming from serialHeap.hpp, but that could be changed in the future to forward-declare the generation class rather than include the associated header. It might be better to have this function in a .cpp file (as it was in an earlier iteration) to limit header exposure in clients. This function isn't so performance critical, since before calling it we've already checked that deduplication is enabled and the argument is a String, so being inline isn't as important here as for is_candidate_from_evacuation. Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5153 From ddong at openjdk.java.net Fri Aug 20 05:47:54 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 20 Aug 2021 05:47:54 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v6] In-Reply-To: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: <67iPlmIeW-2gO1rDaGURfBMlVFXUuHO-0GnlSZ2sJI4=.1c947029-1c05-40dd-9c92-31e1cce1c03a@github.com> > Hi team, > > Please help review this change to add string deduplication support to SerialGC. 
> > Thanks, > Denghui Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: fix build problem ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5153/files - new: https://git.openjdk.java.net/jdk/pull/5153/files/e4c7fd29..1ae4e452 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5153&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5153&range=04-05 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5153.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5153/head:pull/5153 PR: https://git.openjdk.java.net/jdk/pull/5153 From kbarrett at openjdk.java.net Fri Aug 20 06:08:27 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 20 Aug 2021 06:08:27 GMT Subject: RFR: 8267185: Add string deduplication support to ParallelGC [v4] In-Reply-To: References: Message-ID: On Thu, 19 Aug 2021 14:31:03 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to add string deduplication support to ParallelGC. >> >> Testing: All string deduplication tests, and Tier 1-5. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > albertnetymk review Changes requested by kbarrett (Reviewer). src/hotspot/share/gc/parallel/psStringDedup.hpp line 39: > 37: // If to is young then age should be the new (survivor's) age. > 38: // if to is old then age should be the age of the copied from object. > 39: static bool is_candidate_from_evacuation(oop obj, This comment was just copied from G1 and isn't right for this implementation. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5085 From iwalulya at openjdk.java.net Fri Aug 20 06:22:54 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Fri, 20 Aug 2021 06:22:54 GMT Subject: RFR: 8267185: Add string deduplication support to ParallelGC [v5] In-Reply-To: References: Message-ID: <2FTKci79OWFuZgQ59O6dkJF2z-t67uwDS5gdmUYsUtQ=.f13e08b2-b654-441c-842c-5577d8231970@github.com> > Hi all, > > Please review this change to add string deduplication support to ParallelGC. > > Testing: All string deduplication tests, and Tier 1-5. Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: kim requested cleanup ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5085/files - new: https://git.openjdk.java.net/jdk/pull/5085/files/90dec08b..bc1cca97 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5085&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5085&range=03-04 Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5085.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5085/head:pull/5085 PR: https://git.openjdk.java.net/jdk/pull/5085 From iwalulya at openjdk.java.net Fri Aug 20 06:22:58 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Fri, 20 Aug 2021 06:22:58 GMT Subject: RFR: 8267185: Add string deduplication support to ParallelGC [v4] In-Reply-To: References: Message-ID: On Fri, 20 Aug 2021 06:05:34 GMT, Kim Barrett wrote: >> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: >> >> albertnetymk review > > src/hotspot/share/gc/parallel/psStringDedup.hpp line 39: > >> 37: // If to is young then age should be the new (survivor's) age. >> 38: // if to is old then age should be the age of the copied from object. 
>> 39: static bool is_candidate_from_evacuation(oop obj, > > This comment was just copied from G1 and isn't right for this implementation. Leftover from an earlier version, fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5085 From thartmann at openjdk.java.net Fri Aug 20 06:28:25 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 20 Aug 2021 06:28:25 GMT Subject: RFR: 8272563: Possible assertion failure in CardTableBarrierSetC1 In-Reply-To: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Wed, 18 Aug 2021 12:37:00 GMT, Fairoz Matte wrote: > This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp > > The method CardTableBarrierSetC1::post_barrier generates a move LIR when the TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >> __ move(addr, tmp); > However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` happens to be spilled to the stack, the `move(addr, tmp)` is compiled to a memory-to-register move, which causes an assertion failure because a memory-to-register move requires its arguments to have the same size. > The fix is to check whether it is_oop() and call the mov appropriately. > > No issues found in local testing and Mach5 tier1-3 Using an additional register + move does not seem right to fix this. The problem is that `addr->type()` is `T_OBJECT` and the type of the virtual register `tmp` is `T_LONG`, right? What about this? LIR_Opr tmp = gen->new_register(addr->type()); Could the reporter provide a reproducer? ------------- Changes requested by thartmann (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/5164 From iwalulya at openjdk.java.net Fri Aug 20 06:55:37 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Fri, 20 Aug 2021 06:55:37 GMT Subject: RFR: 8236176: Parallel GC SplitInfo comment should be updated for shadow regions Message-ID: Hi all, Please review this small change to add a high-level description of how shadow regions are utilized during the compact phase of the parallel GC. There is no special handling of Splitinfo by the shadow regions; the destination computation and the handling of partial_objects are done during the summary phase while the shadow regions optimization only applies to compaction. Additionally, some unused variables are cleaned up. ------------- Commit messages: - cleanup - minor type - Merge branch 'master' into JDK-8236176_Paralle_SplitInfo_shadow_regions - 8236176: Parallel GC SplitInfo comment should be updated for shadow regions Changes: https://git.openjdk.java.net/jdk/pull/5195/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5195&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8236176 Stats: 30 lines in 2 files changed: 19 ins; 10 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5195.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5195/head:pull/5195 PR: https://git.openjdk.java.net/jdk/pull/5195 From github.com+1866174+tbzhang at openjdk.java.net Fri Aug 20 08:05:47 2021 From: github.com+1866174+tbzhang at openjdk.java.net (Tongbao Zhang) Date: Fri, 20 Aug 2021 08:05:47 GMT Subject: RFR: 8272417: ZGC: fastdebug build crashes when printing ClassLoaderData [v2] In-Reply-To: References: Message-ID: <8sEXPfo3ewIJJAv7-zlLWKSq6qMbBf3VUmynfc42Ngk=.8774813e-03ae-47b9-9e15-15c90bfb39db@github.com> > Following test fails on fastdebug build with the options: "-XX:+UseZGC -XX:+VerifyAfterGC -Xlog:gc*=trace". It crashes on the unknown claim value of the ClassLoaderData > > public class A { > public static void main(String... 
args) { > System.gc(); > } > } > > The unknown claim value of the ClassLoaderData is "ClassLoaderData::_claim_other | ClassLoaderData::_claim_strong" which is introduced by the ZVerifyOopClosure. This patch clears the CLD's claimed marks after the ZGC verification Tongbao Zhang has updated the pull request incrementally with one additional commit since the last revision: fix the printing code ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5107/files - new: https://git.openjdk.java.net/jdk/pull/5107/files/bcf609ef..c7912154 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5107&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5107&range=00-01 Stats: 12 lines in 2 files changed: 2 ins; 5 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/5107.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5107/head:pull/5107 PR: https://git.openjdk.java.net/jdk/pull/5107 From stefank at openjdk.java.net Fri Aug 20 08:17:24 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Fri, 20 Aug 2021 08:17:24 GMT Subject: RFR: 8272417: ZGC: fastdebug build crashes when printing ClassLoaderData [v2] In-Reply-To: <8sEXPfo3ewIJJAv7-zlLWKSq6qMbBf3VUmynfc42Ngk=.8774813e-03ae-47b9-9e15-15c90bfb39db@github.com> References: <8sEXPfo3ewIJJAv7-zlLWKSq6qMbBf3VUmynfc42Ngk=.8774813e-03ae-47b9-9e15-15c90bfb39db@github.com> Message-ID: On Fri, 20 Aug 2021 08:05:47 GMT, Tongbao Zhang wrote: >> Following test fails on fastdebug build with the options: "-XX:+UseZGC -XX:+VerifyAfterGC -Xlog:gc*=trace". It crashes on the unknown claim value of the ClassLoaderData >> >> public class A { >> public static void main(String... args) { >> System.gc(); >> } >> } >> >> The unknown claim value of the ClassLoaderData is "ClassLoaderData::_claim_other | ClassLoaderData::_claim_strong" which is introduced by the ZVerifyOopClosure. 
This patch clears the CLD's claimed marks after the ZGC verification > > Tongbao Zhang has updated the pull request incrementally with one additional commit since the last revision: > > fix the printing code Looks good. Thanks. ------------- Marked as reviewed by stefank (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5107 From github.com+1866174+tbzhang at openjdk.java.net Fri Aug 20 08:17:25 2021 From: github.com+1866174+tbzhang at openjdk.java.net (Tongbao Zhang) Date: Fri, 20 Aug 2021 08:17:25 GMT Subject: RFR: 8272417: ZGC: fastdebug build crashes when printing ClassLoaderData [v2] In-Reply-To: <_u642xJ73W5qtcDDXhLB8a3qbgVNxMJGq0wWXlIb8E4=.63401fae-c7ee-4335-bdf8-c6aab187858b@github.com> References: <_u642xJ73W5qtcDDXhLB8a3qbgVNxMJGq0wWXlIb8E4=.63401fae-c7ee-4335-bdf8-c6aab187858b@github.com> Message-ID: On Thu, 19 Aug 2021 22:05:32 GMT, Stefan Karlsson wrote: > I mentioned in the bug report that this is not enough to correctly fix this bug. Could you fix the printing code instead? Yes, thanks for your advice! ------------- PR: https://git.openjdk.java.net/jdk/pull/5107 From tschatzl at openjdk.java.net Fri Aug 20 08:23:21 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 20 Aug 2021 08:23:21 GMT Subject: RFR: 8272725: G1: replace type needs_remset_update_t with bool In-Reply-To: References: Message-ID: On Thu, 19 Aug 2021 15:12:45 GMT, Albert Mingkun Yang wrote: > Simple type change in `struct G1HeapRegionAttr`. > > Test: hotspot_gc The use of `uint8_t` in this case is intentional, to keep control of the size of the table, keeping it small and access times low to minimize cache pollution. The size of `bool` is not defined in C++ after all. This could "waste" a lot of (cache) space. When `G1HeapRegionAttr` was introduced there was a related performance issue with SPARC where this mattered, which is now gone - I do not remember x64/x86 being particularly affected, but I do not know the impact on the "new" ARM64 platform.
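The sizing argument above can be made concrete with a standalone sketch (this is an illustration, not the real `G1HeapRegionAttr`; the field names are assumed): a fixed-width field guarantees the per-entry footprint on every platform, whereas the size of a `bool` member is implementation-defined in C++.

```cpp
#include <cassert>
#include <cstdint>

// Illustration only (not the real G1HeapRegionAttr): fixed-width members
// pin every table entry to a known size on all platforms; a bool member
// carries no such guarantee, since sizeof(bool) is implementation-defined.
struct RegionAttrSketch {
  int8_t  _type;                 // region type tag
  uint8_t _needs_remset_update;  // 0/1 flag with an exact one-byte footprint
};

// The whole per-region entry stays at two bytes, so a table indexed by
// region number remains small and cache-friendly.
static_assert(sizeof(RegionAttrSketch) == 2, "entry must stay two bytes");
```

This matters because, as noted above, the table is consulted for essentially every reference during evacuation, so entry size directly affects cache pollution.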
Anyway, it seems better to me to make sure that in this case we keep the size of the heap region attr controlled. It's being accessed for literally every reference multiple times. ------------- PR: https://git.openjdk.java.net/jdk/pull/5188 From kbarrett at openjdk.java.net Fri Aug 20 08:46:20 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 20 Aug 2021 08:46:20 GMT Subject: RFR: 8272725: G1: replace type needs_remset_update_t with bool In-Reply-To: References: Message-ID: On Thu, 19 Aug 2021 15:12:45 GMT, Albert Mingkun Yang wrote: > Simple type change in `struct G1HeapRegionAttr`. > > Test: hotspot_gc A comment describing the rationale for the specific types might have been nice. I think there might have been something there in the past, but maybe removed with SPARC support. But I agree that this is a place where we really do care about low-level control. ------------- PR: https://git.openjdk.java.net/jdk/pull/5188 From ddong at openjdk.java.net Fri Aug 20 09:19:56 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 20 Aug 2021 09:19:56 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v7] In-Reply-To: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: > Hi team, > > Please help review this change to add string deduplication support to SerialGC. 
> > Thanks, > Denghui Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: tidy ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5153/files - new: https://git.openjdk.java.net/jdk/pull/5153/files/1ae4e452..9b6a1486 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5153&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5153&range=05-06 Stats: 10 lines in 2 files changed: 0 ins; 9 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5153.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5153/head:pull/5153 PR: https://git.openjdk.java.net/jdk/pull/5153 From ayang at openjdk.java.net Fri Aug 20 09:37:25 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 20 Aug 2021 09:37:25 GMT Subject: RFR: 8272725: G1: replace type needs_remset_update_t with bool In-Reply-To: References: Message-ID: <7kA7k-JOr2YbDXLM_e0dSL9VlcBLj522RAjoAtvG2G0=.c2555d23-7b6b-4735-ac16-bc52f01e463b@github.com> On Thu, 19 Aug 2021 15:12:45 GMT, Albert Mingkun Yang wrote: > Simple type change in `struct G1HeapRegionAttr`. > > Test: hotspot_gc I see; thank you for the explanation. How about I pivot this ticket/PR to adding some comment on the use of `uint8_t`? ------------- PR: https://git.openjdk.java.net/jdk/pull/5188 From ddong at openjdk.java.net Fri Aug 20 09:40:58 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 20 Aug 2021 09:40:58 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v8] In-Reply-To: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: > Hi team, > > Please help review this change to add string deduplication support to SerialGC. > > Thanks, > Denghui Denghui Dong has refreshed the contents of this pull request, and previous commits have been removed. 
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: tidy ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5153/files - new: https://git.openjdk.java.net/jdk/pull/5153/files/9b6a1486..86da5446 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5153&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5153&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5153.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5153/head:pull/5153 PR: https://git.openjdk.java.net/jdk/pull/5153 From kbarrett at openjdk.java.net Fri Aug 20 09:41:00 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 20 Aug 2021 09:41:00 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v8] In-Reply-To: References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: On Fri, 20 Aug 2021 09:35:44 GMT, Denghui Dong wrote: >> Hi team, >> >> Please help review this change to add string deduplication support to SerialGC. >> >> Thanks, >> Denghui > > Denghui Dong has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > tidy Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5153 From tschatzl at openjdk.java.net Fri Aug 20 09:58:23 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 20 Aug 2021 09:58:23 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space [v3] In-Reply-To: References: Message-ID: On Mon, 16 Aug 2021 07:55:41 GMT, Albert Mingkun Yang wrote: >> Hamlin Li has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. > > Marked as reviewed by ayang (Committer). There are two reviews (from me and @albertnetymk), and Kim's question about the reasons do not seem relevant any more, so this is good to go imho. ------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From iwalulya at openjdk.java.net Fri Aug 20 10:08:52 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Fri, 20 Aug 2021 10:08:52 GMT Subject: RFR: 8267185: Add string deduplication support to ParallelGC [v6] In-Reply-To: References: Message-ID: <6CKSM0P9lLLNK22qlY0YtwY_SaHjjmc6VAzB1diR53s=.d4cb7d29-542c-4355-bac2-b4afd1353cc6@github.com> > Hi all, > > Please review this change to add string deduplication support to ParallelGC. > > Testing: All string deduplication tests, and Tier 1-5. 
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: more clean up ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5085/files - new: https://git.openjdk.java.net/jdk/pull/5085/files/bc1cca97..2f2a2cde Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5085&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5085&range=04-05 Stats: 52 lines in 3 files changed: 8 ins; 38 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/5085.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5085/head:pull/5085 PR: https://git.openjdk.java.net/jdk/pull/5085 From tschatzl at openjdk.java.net Fri Aug 20 10:09:26 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 20 Aug 2021 10:09:26 GMT Subject: RFR: 8271946: Cleanup leftovers in Space and subclasses [v2] In-Reply-To: References: <06djYCvDj8KzCJVZ7KJqS8B9WtRBwl7hN1CMm4BZzXk=.23e2ee84-3f0f-4274-a470-6ea5b2d888cc@github.com> Message-ID: On Sat, 7 Aug 2021 11:08:55 GMT, Roman Kennke wrote: >> There are a number of methods in (Compactible)Space that seem trivially implemented. E.g. obj_size(), adjust_obj_size(), scanned_block_is_obj() etc. I believe they are leftovers from CMS (not sure) but looks like they can be replaced by obj->size() or removed altogether. Also, the the related templated methods scan_and_forward(), scan_and_adjust_pointers() and scan_and_compact() can be directly and mechanically inlined into their callers. It makes the code more readable. >> >> Testing: >> - [x] hotspot_gc >> - [x] tier1 >> - [x] tier2 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Remove obsolete comments Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5031 From ayang at openjdk.java.net Fri Aug 20 10:13:26 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 20 Aug 2021 10:13:26 GMT Subject: RFR: 8267185: Add string deduplication support to ParallelGC [v6] In-Reply-To: <6CKSM0P9lLLNK22qlY0YtwY_SaHjjmc6VAzB1diR53s=.d4cb7d29-542c-4355-bac2-b4afd1353cc6@github.com> References: Message-ID: On Fri, 20 Aug 2021 10:08:52 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to add string deduplication support to ParallelGC. >> >> Testing: All string deduplication tests, and Tier 1-5. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > more clean up Marked as reviewed by ayang (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5085 From rkennke at openjdk.java.net Fri Aug 20 10:15:26 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 20 Aug 2021 10:15:26 GMT Subject: Integrated: 8271946: Cleanup leftovers in Space and subclasses In-Reply-To: <06djYCvDj8KzCJVZ7KJqS8B9WtRBwl7hN1CMm4BZzXk=.23e2ee84-3f0f-4274-a470-6ea5b2d888cc@github.com> References: <06djYCvDj8KzCJVZ7KJqS8B9WtRBwl7hN1CMm4BZzXk=.23e2ee84-3f0f-4274-a470-6ea5b2d888cc@github.com> Message-ID: <5oAaTXhafIuKuTrTZS-RAbiODRNrZisi3X-va-5ZDUI=.b5641fb0-926b-4814-9df2-eecf2e014836@github.com> On Fri, 6 Aug 2021 12:31:01 GMT, Roman Kennke wrote: > There are a number of methods in (Compactible)Space that seem trivially implemented. E.g. obj_size(), adjust_obj_size(), scanned_block_is_obj() etc. I believe they are leftovers from CMS (not sure) but looks like they can be replaced by obj->size() or removed altogether. Also, the related templated methods scan_and_forward(), scan_and_adjust_pointers() and scan_and_compact() can be directly and mechanically inlined into their callers.
It makes the code more readable. > > Testing: > - [x] hotspot_gc > - [x] tier1 > - [x] tier2 This pull request has now been integrated. Changeset: 92bde673 Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/92bde6738a8984000ffdef010228d5117b2d8313 Stats: 423 lines in 3 files changed: 160 ins; 259 del; 4 mod 8271946: Cleanup leftovers in Space and subclasses Reviewed-by: stefank, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/5031 From kbarrett at openjdk.java.net Fri Aug 20 10:26:23 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 20 Aug 2021 10:26:23 GMT Subject: RFR: 8267185: Add string deduplication support to ParallelGC [v6] In-Reply-To: <6CKSM0P9lLLNK22qlY0YtwY_SaHjjmc6VAzB1diR53s=.d4cb7d29-542c-4355-bac2-b4afd1353cc6@github.com> References: <6CKSM0P9lLLNK22qlY0YtwY_SaHjjmc6VAzB1diR53s=.d4cb7d29-542c-4355-bac2-b4afd1353cc6@github.com> Message-ID: On Fri, 20 Aug 2021 10:08:52 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to add string deduplication support to ParallelGC. >> >> Testing: All string deduplication tests, and Tier 1-5. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > more clean up Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5085 From ayang at openjdk.java.net Fri Aug 20 11:22:48 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 20 Aug 2021 11:22:48 GMT Subject: RFR: 8272725: G1: replace type needs_remset_update_t with bool [v2] In-Reply-To: References: Message-ID: <7fRoxbV6b5GKXZuzfqQ4lXjjFtpXIFQ7F5CglMeB7DA=.791bf0dd-1faa-484e-b73a-f8236cdec8f5@github.com> > Simple type change in `struct G1HeapRegionAttr`. 
> > Test: hotspot_gc Albert Mingkun Yang has updated the pull request incrementally with two additional commits since the last revision: - review - Revert "boolean" This reverts commit 2a23beb0549a928ae3876070dbdcf5a153c0703d. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5188/files - new: https://git.openjdk.java.net/jdk/pull/5188/files/2a23beb0..190215c8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5188&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5188&range=00-01 Stats: 6 lines in 1 file changed: 3 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5188.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5188/head:pull/5188 PR: https://git.openjdk.java.net/jdk/pull/5188 From mli at openjdk.java.net Fri Aug 20 11:30:31 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 20 Aug 2021 11:30:31 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space [v4] In-Reply-To: References: Message-ID: On Tue, 17 Aug 2021 01:45:51 GMT, Hamlin Li wrote: >> Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. >> >> After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Closes JDK-8272070: simplify age calculation Thanks Thomas, I just saw Albert has been a reviewer too. 
Congratulations @albertnetymk ------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From mli at openjdk.java.net Fri Aug 20 11:30:32 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 20 Aug 2021 11:30:32 GMT Subject: Integrated: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space In-Reply-To: References: Message-ID: On Wed, 4 Aug 2021 01:06:47 GMT, Hamlin Li wrote: > Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. > > After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. This pull request has now been integrated. Changeset: d874e961 Author: Hamlin Li URL: https://git.openjdk.java.net/jdk/commit/d874e9616f80324a53f3c8866ce500e55dfa308f Stats: 13 lines in 1 file changed: 1 ins; 11 del; 1 mod 8271579: G1: Move copy before CAS in do_copy_to_survivor_space 8272070: G1: Simplify age calculation after JDK-8271579 Co-authored-by: shoubing ma Reviewed-by: tschatzl, ayang ------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From iwalulya at openjdk.java.net Fri Aug 20 11:33:32 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Fri, 20 Aug 2021 11:33:32 GMT Subject: Integrated: 8267185: Add string deduplication support to ParallelGC In-Reply-To: References: Message-ID: On Wed, 11 Aug 2021 13:56:09 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to add string deduplication support to ParallelGC. > > Testing: All string deduplication tests, and Tier 1-5. This pull request has now been integrated.
Changeset: fb1dfc6f Author: Ivan Walulya URL: https://git.openjdk.java.net/jdk/commit/fb1dfc6f49f62990aa9988e9d6f7ffd1adf45d8e Stats: 182 lines in 17 files changed: 181 ins; 0 del; 1 mod 8267185: Add string deduplication support to ParallelGC Reviewed-by: kbarrett, ayang ------------- PR: https://git.openjdk.java.net/jdk/pull/5085 From iwalulya at openjdk.java.net Fri Aug 20 11:33:29 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Fri, 20 Aug 2021 11:33:29 GMT Subject: RFR: 8267185: Add string deduplication support to ParallelGC [v6] In-Reply-To: References: <6CKSM0P9lLLNK22qlY0YtwY_SaHjjmc6VAzB1diR53s=.d4cb7d29-542c-4355-bac2-b4afd1353cc6@github.com> Message-ID: On Fri, 20 Aug 2021 10:09:58 GMT, Albert Mingkun Yang wrote: >> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: >> >> more clean up > > Marked as reviewed by ayang (Committer). Thanks @albertnetymk and @kimbarrett for the reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/5085 From tschatzl at openjdk.java.net Fri Aug 20 12:58:29 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 20 Aug 2021 12:58:29 GMT Subject: RFR: 8271579: G1: Move copy before CAS in do_copy_to_survivor_space [v4] In-Reply-To: References: Message-ID: On Tue, 17 Aug 2021 01:45:51 GMT, Hamlin Li wrote: >> Propose to move copy before CAS in do_copy_to_survivor_space, as we found this will improve G1 performance. Specjbb shows 3.7% in critical on aarch64, no change in max. >> >> After this change copy_to_survivor in G1 is also aligned with PS's copy_to_survivor. 
> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Closes JDK-8272070: simplify age calculation Just for clarification: the default requirement for a change in the gc-team to integrate is to have at least two "r"eviewers (note the lowercase r - those are people that approved it regardless of role), at least one of which must be a "R"eviewer, i.e. has the openjdk reviewer role (and there are no other open questions/comments). So that formal requirement had already been fulfilled by me and Albert without Albert being a "R"eviewer. ------------- PR: https://git.openjdk.java.net/jdk/pull/4983 From ddong at openjdk.java.net Fri Aug 20 14:26:41 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 20 Aug 2021 14:26:41 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v9] In-Reply-To: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: > Hi team, > > Please help review this change to add string deduplication support to SerialGC. > > Thanks, > Denghui Denghui Dong has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains ten commits: - Merge branch 'master' into JDK-8272609 - tidy - fix build problem - tidy - fix build error - fix build problem - avoid non-local variables with non-trivial / not-constexpr constructors or destructors - fix - 8272609: Add string deduplication support to SerialGC ------------- Changes: https://git.openjdk.java.net/jdk/pull/5153/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5153&range=08 Stats: 248 lines in 19 files changed: 245 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5153.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5153/head:pull/5153 PR: https://git.openjdk.java.net/jdk/pull/5153 From ddong at openjdk.java.net Fri Aug 20 16:02:53 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 20 Aug 2021 16:02:53 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v10] In-Reply-To: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: > Hi team, > > Please help review this change to add string deduplication support to SerialGC. 
> > Thanks, > Denghui Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: fix build error ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5153/files - new: https://git.openjdk.java.net/jdk/pull/5153/files/bb4cdfef..077ca27f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5153&range=09 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5153&range=08-09 Stats: 3 lines in 1 file changed: 2 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5153.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5153/head:pull/5153 PR: https://git.openjdk.java.net/jdk/pull/5153 From iwalulya at openjdk.java.net Fri Aug 20 16:38:26 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Fri, 20 Aug 2021 16:38:26 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v10] In-Reply-To: References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: On Fri, 20 Aug 2021 16:02:53 GMT, Denghui Dong wrote: >> Hi team, >> >> Please help review this change to add string deduplication support to SerialGC. >> >> Thanks, >> Denghui > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > fix build error Marked as reviewed by iwalulya (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5153 From ayang at openjdk.java.net Fri Aug 20 21:27:31 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 20 Aug 2021 21:27:31 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v10] In-Reply-To: References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: On Fri, 20 Aug 2021 16:02:53 GMT, Denghui Dong wrote: >> Hi team, >> >> Please help review this change to add string deduplication support to SerialGC. 
>> >> Thanks, >> Denghui > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > fix build error src/hotspot/share/gc/serial/defNewGeneration.cpp line 606: > 604: assert(heap->no_allocs_since_save_marks(), "save marks have not been newly set."); > 605: > 606: _string_dedup_requests.flush(); I wonder if placing `_string_dedup_requests.flush();` before weak roots processing (`WeakProcessor::weak_oops_do`) makes more sense. The documentation says `flush` should be called when marking is done; for Serial young GC, it's equivalent to when evacuation is done, which is right after ref-processing. src/hotspot/share/gc/serial/genMarkSweep.cpp line 115: > 113: deallocate_stacks(); > 114: > 115: MarkSweep::_string_dedup_requests->flush(); Similarly, `flush` should be called inside `mark_sweep_phase1`, right after ref-processing. src/hotspot/share/gc/serial/markSweep.cpp line 219: > 217: MarkSweep::_gc_timer = new (ResourceObj::C_HEAP, mtGC) STWGCTimer(); > 218: MarkSweep::_gc_tracer = new (ResourceObj::C_HEAP, mtGC) SerialOldTracer(); > 219: MarkSweep::_string_dedup_requests = new StringDedup::Requests(); It's not clear to me why `_string_dedup_requests` is a pointer. Is it possible to define `static StringDedup::Requests _string_dedup_requests;` directly? That way, `Requests` doesn't need to use the inheritance.
------------- PR: https://git.openjdk.java.net/jdk/pull/5153 From kbarrett at openjdk.java.net Sat Aug 21 06:36:30 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 21 Aug 2021 06:36:30 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v10] In-Reply-To: References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: On Fri, 20 Aug 2021 21:17:50 GMT, Albert Mingkun Yang wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> fix build error > > src/hotspot/share/gc/serial/defNewGeneration.cpp line 606: > >> 604: assert(heap->no_allocs_since_save_marks(), "save marks have not been newly set."); >> 605: >> 606: _string_dedup_requests.flush(); > > I wonder if placing `_string_dedup_requests.flush();` before weak roots processing (`WeakProcessor::weak_oops_do`) makes more senses. The documentation says `flush` should be called when marking is done; for Serial young GC, it's equivalent to when evacuation is done, which is right after ref-processing. That documentation is mistaken; that's a left-over from an earlier development version that I forgot to update. Without special additional care, the lifetime also needs to cover final-reference processing. I have a todo item to file a bug against that comment. > src/hotspot/share/gc/serial/markSweep.cpp line 219: > >> 217: MarkSweep::_gc_timer = new (ResourceObj::C_HEAP, mtGC) STWGCTimer(); >> 218: MarkSweep::_gc_tracer = new (ResourceObj::C_HEAP, mtGC) SerialOldTracer(); >> 219: MarkSweep::_string_dedup_requests = new StringDedup::Requests(); > > It's not clear to me why `_string_dedup_requests` is a pointer. Is it possible to define `static StringDedup::Requests _string_dedup_requests;` directly? That way, `Requests` doesn't need to use the inheritance. It's a non-local variable with a non-trivial destructor. 
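Kim's point about non-local variables with non-trivial destructors can be illustrated with a small self-contained sketch. The `Requests` type below is a hypothetical stand-in, not HotSpot's real `StringDedup::Requests`; it only shows the heap-allocate-once-and-never-delete idiom the patch uses:

```cpp
#include <cassert>

// Hypothetical stand-in for StringDedup::Requests -- NOT the real class.
struct Requests {
  int pending = 0;
  void add() { ++pending; }
  void flush() { pending = 0; }
  ~Requests() {
    // Non-trivial destructor. If an instance were a namespace-scope
    // static, this would run during static destruction, at an
    // unspecified point relative to statics in other translation units
    // -- exactly what HotSpot's style rules forbid.
  }
};

// The idiom the patch uses instead: a plain pointer, heap-allocated once
// during initialization and intentionally never deleted, so the
// destructor never runs at an uncontrolled time.
static Requests* g_requests = nullptr;

void initialize_requests() {
  g_requests = new Requests();  // leaked on purpose; lives until exit
}
```

Defining `static StringDedup::Requests _string_dedup_requests;` directly, as the review comment asked about, would make the object subject to this destruction-order problem.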
See https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2021-August/036412.html ------------- PR: https://git.openjdk.java.net/jdk/pull/5153 From github.com+16811675+linade at openjdk.java.net Sat Aug 21 09:28:26 2021 From: github.com+16811675+linade at openjdk.java.net (Yude Lin) Date: Sat, 21 Aug 2021 09:28:26 GMT Subject: RFR: 8272083: G1: Record iterated range for BOT performance during card scan In-Reply-To: References: Message-ID: On Sat, 7 Aug 2021 04:56:08 GMT, Yude Lin wrote: > A fix to the problem in 8272083 is to use a per-worker pointer to indicate where the worker has scanned up to, similar to the _scanned_to variable. The difference is this pointer (I call it _iterated_to) records the end of the object and _scanned_to records the end of the scan. Since we always scan with increasing addresses, the end of the latest object scanned is also the address where BOT has fixed up to. So we avoid having to fix below this address when calling block_start(). This implementation approximately reduces the number of calls to set_offset_array() during scan_heap_roots() 2-10 times (in my casual test with -XX:G1ConcRefinementGreenZone=1000000). > > What this approach does not solve is random access to BOT. So far I haven't found anything having this pattern. (For completeness, I'll restate the problem in 8272083.) The block offset table (BOT) is something that serves the purpose "given a heap word, return the head of the object that covers this heap word". We typically use BOT to find the object that covers the dirty memory marked by a dirty card. We need to update BOT with the information it needs (obj addr, obj size) at allocation. But during gc, we update it with (plab addr, plab size). When allocating within a plab, we don't update the BOT. So after a gc, BOT becomes crude. BOT will refine itself lazily when it's being used, that is, it will walk the plabs and update itself according to the objects it finds during the walk.
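The lazy-refinement behaviour described above can be made concrete with a deliberately tiny model; the card size and field layout below are invented for illustration and are not G1's actual `G1BlockOffsetTable` (which stores compact per-card offsets, not absolute indices). A coarse PLAB-granularity `update` leaves stale entries, and `block_start` repairs them by walking object-by-object:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy heap: an array of words, divided into "cards" of CARD_WORDS words.
constexpr std::size_t CARD_WORDS = 8;

struct ToyBOT {
  // One slot per card: the start of the block covering the card's first word.
  std::vector<std::size_t> entry;

  explicit ToyBOT(std::size_t heap_words)
      : entry(heap_words / CARD_WORDS, 0) {}

  // Record a block [start, start + size): set every card whose first word
  // lies inside the block. Called with (obj addr, obj size) at normal
  // allocation, but only with (plab addr, plab size) during evacuation.
  void update(std::size_t start, std::size_t size) {
    for (std::size_t c = start / CARD_WORDS;
         c <= (start + size - 1) / CARD_WORDS; ++c) {
      if (c * CARD_WORDS >= start) entry[c] = start;
    }
  }

  // "Given a heap word, return the head of the object that covers it."
  // sizes[q] holds the size of the object starting at word q.
  std::size_t block_start(std::size_t addr,
                          const std::vector<std::size_t>& sizes) {
    std::size_t q = entry[addr / CARD_WORDS];  // may only be a PLAB head
    // Lazy refinement: walk objects toward addr, repairing the coarse
    // PLAB-granularity entries as we go.
    while (q + sizes[q] <= addr) {
      std::size_t next = q + sizes[q];
      update(next, sizes[next]);
      q = next;
    }
    return q;
  }
};
```

In this model, repeated `block_start` calls into the same PLAB redo parts of the same walk, which is the duplicated work the proposal aims to remove.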
The current algorithm is such that BOT will sometimes walk the same part of a plab more than once, which is unnecessary. The update operation has an atomic store for each 512 bytes of memory it walks. This affects both concurrent refinement and scanning of heap roots during the pause. Using a dedicated "concurrent BOT fixing phase" we can remove most of the duplicated updates, and also move most of the actually-necessary updates out of the gc pause. The result is something like ![image](https://user-images.githubusercontent.com/16811675/130313687-e2c0ced3-f4fa-439c-b67b-1248d95df441.png) This is a zoom-in of a box plot which shows that the fixing phase (indicated by "fix" in the label) can reduce the total scan heap roots time. It's more effective when the refinement strength is low (indicated by refine=x, where x means the non-adaptive green threshold, and "auto" means using the default adaptive mode). It can also increase the concurrent refinement speed, because it also reduces the work there. The cost is a dedicated concurrent phase, which is described as follows. 1. PLABs are created during evacuation to contain promoted objects from survivor regions and surviving objects from old regions (mixed gc only). In each old region, the area between the old top() and new top(), before and after an evacuation pause, is where the new PLABs are, which we will record. 2. After the pause, a concurrent phase will be scheduled to fix BOT in these areas. Basically a fixing worker will claim a part of the area (we'll call it a chunk), and walk the memory for objects. If an object crosses a BOT boundary, it will update the BOT entry. Some tweaks need to be made to make sure different workers never fix the same BOT entry. 3. If concurrent refinement tries to refine a card, it will probably run into an unfixed part of BOT. We prevent this by requiring the refiner to fix where the card is pointing at.
If another worker has claimed the chunk where the card is pointing at, then refinement can proceed without a guarantee that the chunk is actually fixed. This does not happen often, but might be a problem. 4. If another gc pause is scheduled, we need to wait for the fixing phase to complete, because we assume that "BOT is going to be fixed almost entirely anyway. Might as well do it sooner". We can potentially abort though, and handle the unfixed part in some other way. My thoughts for this solution: 1. Why not use a queue to record precisely where the plabs are? First, the queue costs space. Maybe 1/128 of heap space at worst (e.g., 4 bytes for a plab pointer; a region can have many plabs but only plabs larger than a card are interesting enough, so region_size/512 plabs in total). We can record very large plabs only, to reduce the queue size. Still it costs space. Secondly, the enqueue action costs a little allocation time. Thirdly, if we want to do dummy object filling on top of fixing, as Thomas suggested earlier, it's going to need a heap walk anyway (I imagine). One thing I didn't quite get over is, what do you mean by > One idea could be that after allocating a PLAB (in old gen only of course) we enqueue some background task to fix these. We know the exact locations and sizes of those, maybe this somewhat helps. We also know the chunk sizes that are used for scanning, so we know the problematic places. Thomas? I don't know how knowing the chunk sizes is going to help with narrowing down the problematic places. I feel like I'm being stupid and missing something :D Would you help me understand? Do you think we can work from here?
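The chunk claiming described in step 2 (ensuring different workers never fix the same BOT entry) is commonly implemented with an atomic claim cursor over fixed-size chunks. The sketch below is a minimal illustration under that assumption; the names and structure are invented, not taken from the (as yet unposted) patch:

```cpp
#include <atomic>
#include <cstddef>

// Workers claim disjoint fixed-size chunks of the area [start, end).
// The worker that wins the compare-exchange owns the chunk; a refiner
// that loses the race must not assume the chunk has already been fixed
// (the caveat noted in step 3 above).
struct ChunkClaimer {
  std::size_t start, end, chunk_size;
  std::atomic<std::size_t> next;

  ChunkClaimer(std::size_t s, std::size_t e, std::size_t cs)
      : start(s), end(e), chunk_size(cs), next(s) {}

  // Returns the claimed chunk's start address, or `end` if nothing is left.
  std::size_t claim() {
    std::size_t cur = next.load(std::memory_order_relaxed);
    while (cur < end) {
      if (next.compare_exchange_weak(cur, cur + chunk_size,
                                     std::memory_order_relaxed)) {
        return cur;  // we own [cur, min(cur + chunk_size, end))
      }
      // compare_exchange_weak reloaded `cur` on failure; retry.
    }
    return end;
  }
};
```

Because each chunk start is handed out exactly once, no two workers ever walk (and therefore fix) the same BOT entries, at the cost of one atomic per chunk rather than one per 512-byte card.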
I'll upload the code when it's more polished (supposing the general direction is okay :D) Some experiment details (using `numactl -C 0-15 -m 0 java -Xmx8g -Xms8g -XX:ParallelGCThreads=4 <-XX:G1ConcRefinementGreenZone=$refine> <-XX:-G1UseAdaptiveConcRefinement> -Xlog:gc*=debug:file=$fname -Dspecjbb.controller.type=PRESET -Dspecjbb.controller.preset.ir=8000 -Dspecjbb.controller.preset.duration=600000 -jar specjbb2015.jar -m COMPOSITE`): ![image](https://user-images.githubusercontent.com/16811675/130316831-9b1b8612-71d1-425c-9a11-c73b2b9bb8aa.png) ![image](https://user-images.githubusercontent.com/16811675/130317074-7fec23db-b61d-4696-903f-298f8f47994c.png) ![image](https://user-images.githubusercontent.com/16811675/130316964-4a319f56-55a3-4c4d-9b77-63ddcc323100.png) ![image](https://user-images.githubusercontent.com/16811675/130317006-73dc94c9-08ba-4998-903f-9cd590b47d67.png) ------------- PR: https://git.openjdk.java.net/jdk/pull/5039 From ddong at openjdk.java.net Sat Aug 21 14:39:27 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Sat, 21 Aug 2021 14:39:27 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v10] In-Reply-To: References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: On Fri, 20 Aug 2021 21:24:01 GMT, Albert Mingkun Yang wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> fix build error > > src/hotspot/share/gc/serial/genMarkSweep.cpp line 115: > >> 113: deallocate_stacks(); >> 114: >> 115: MarkSweep::_string_dedup_requests->flush(); > > Similarly, `flush` should be called inside `mark_sweep_phase1`, right after ref-processing. Thanks for your review and Kim's detailed explanation. I think putting this invocation here or the position you suggest should have no effect on the result, so I tend not to make this modification. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5153 From ayang at openjdk.java.net Sat Aug 21 16:43:24 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Sat, 21 Aug 2021 16:43:24 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v10] In-Reply-To: References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: On Sat, 21 Aug 2021 14:36:23 GMT, Denghui Dong wrote: >> src/hotspot/share/gc/serial/genMarkSweep.cpp line 115: >> >>> 113: deallocate_stacks(); >>> 114: >>> 115: MarkSweep::_string_dedup_requests->flush(); >> >> Similarly, `flush` should be called inside `mark_sweep_phase1`, right after ref-processing. > > Thanks for your review and Kim's detailed explanation. > > I think putting this invocation here or the position you suggest should have no effect on the result, so I tend not to make this modification.
> I believe it's desirable to narrow the request window (between the first `add` and `flush`), which offers more prompt string-dedup and memory release (in `flush`). Additionally, it makes more sense to call `flush` right after the complete live objects traversal (strong-marking + final-marking as part of ref-processing), since we will not visit any new objects then. I prefer it where it is. Rather than putting something between phantom reference processing and oopstorage-based weak processing, I would prefer to merge those two. I don't remember if there's an RFE for that. ------------- PR: https://git.openjdk.java.net/jdk/pull/5153 From ayang at openjdk.java.net Sat Aug 21 20:16:30 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Sat, 21 Aug 2021 20:16:30 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v10] In-Reply-To: References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: On Sat, 21 Aug 2021 19:09:35 GMT, Kim Barrett wrote: >> I believe it's desirable to narrow the request window (between the first `add` and `flush`), which offers more prompt string-dedup and memory release (in `flush`). Additionally, it makes more sense to call `flush` right after the complete live objects traversal (strong-marking + final-marking as part of ref-processing), since we will not visit any new objects then. > > I prefer it where it is. Rather than putting something between phantom > reference processing and oopstorage-based weak processing, I would prefer to > merge those two. I don't remember if there's an RFE for that. I still think `flush` should be called sooner, e.g. inside `mark_sweep_phase1`, which is not in conflict with merging phantom-processing and weak root processing. Anyway, this is subjective; feel free to integrate the current version.
------------- PR: https://git.openjdk.java.net/jdk/pull/5153 From ddong at openjdk.java.net Mon Aug 23 02:03:27 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Mon, 23 Aug 2021 02:03:27 GMT Subject: RFR: 8272609: Add string deduplication support to SerialGC [v8] In-Reply-To: References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: On Fri, 20 Aug 2021 09:34:15 GMT, Kim Barrett wrote: >> Denghui Dong has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> tidy > > Looks good. @kimbarrett @walulyai @albertnetymk thanks for the review! ------------- PR: https://git.openjdk.java.net/jdk/pull/5153 From ddong at openjdk.java.net Mon Aug 23 06:15:31 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Mon, 23 Aug 2021 06:15:31 GMT Subject: Integrated: 8272609: Add string deduplication support to SerialGC In-Reply-To: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> References: <-dDHVaPKPNMAFdK-x_C9GIfenCz7RQo7WaaoepQ4VtI=.a00fcb5a-4e89-4f99-8333-4e1c05b71994@github.com> Message-ID: On Wed, 18 Aug 2021 02:24:11 GMT, Denghui Dong wrote: > Hi team, > > Please help review this change to add string deduplication support to SerialGC. > > Thanks, > Denghui This pull request has now been integrated. 
Changeset: e8a289e7 Author: Denghui Dong Committer: Yi Yang URL: https://git.openjdk.java.net/jdk/commit/e8a289e77d70d31f2f7d1a8dea620062dbdb3e2a Stats: 249 lines in 19 files changed: 246 ins; 0 del; 3 mod 8272609: Add string deduplication support to SerialGC Reviewed-by: kbarrett, iwalulya ------------- PR: https://git.openjdk.java.net/jdk/pull/5153 From pliden at openjdk.java.net Mon Aug 23 07:47:31 2021 From: pliden at openjdk.java.net (Per Liden) Date: Mon, 23 Aug 2021 07:47:31 GMT Subject: RFR: 8272417: ZGC: fastdebug build crashes when printing ClassLoaderData [v2] In-Reply-To: <8sEXPfo3ewIJJAv7-zlLWKSq6qMbBf3VUmynfc42Ngk=.8774813e-03ae-47b9-9e15-15c90bfb39db@github.com> References: <8sEXPfo3ewIJJAv7-zlLWKSq6qMbBf3VUmynfc42Ngk=.8774813e-03ae-47b9-9e15-15c90bfb39db@github.com> Message-ID: On Fri, 20 Aug 2021 08:05:47 GMT, Tongbao Zhang wrote: >> Following test fails on fastdebug build with the options: "-XX:+UseZGC -XX:+VerifyAfterGC -Xlog:gc*=trace". It crashes on the unknown claim value of the ClassLoaderData >> >> public class A { >> public static void main(String... args) { >> System.gc(); >> } >> } >> >> The unknown claim value of the ClassLoaderData is "ClassLoaderData::_claim_other | ClassLoaderData::_claim_strong" which is introduced by the ZVerifyOopClosure. This patch clears the CLD's claimed marks after the ZGC verification > > Tongbao Zhang has updated the pull request incrementally with one additional commit since the last revision: > > fix the printing code Looks good. ------------- Marked as reviewed by pliden (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5107 From iwalulya at openjdk.java.net Mon Aug 23 07:59:42 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 23 Aug 2021 07:59:42 GMT Subject: RFR: 8242847: G1 should not clear mark bitmaps with no marks Message-ID: Hi all, Please review this change to bound the range of bitmap clearing for each region using the liveness data collected during marking. 
For the Concurrent Undo Mark cycle, the liveness information (next_top_at_mark_start and live_words) is in sync wrt. the _next_mark_bitmap that needs clearing. Hence, we use these details to clear only bitmaps for regions that were dirtied and need clearing, i.e. only clear between [bottom, ntams), and only clear bitmaps for regions that had at least one bit set (i.e. have some live data). Testing: Tier 1-3. ------------- Commit messages: - add comments and refactor rename - create private functions - space - 8242847: G1 should not clear mark bitmaps with no marks Changes: https://git.openjdk.java.net/jdk/pull/5213/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5213&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8242847 Stats: 62 lines in 5 files changed: 46 ins; 7 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/5213.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5213/head:pull/5213 PR: https://git.openjdk.java.net/jdk/pull/5213 From thomas.schatzl at oracle.com Mon Aug 23 08:21:20 2021 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 23 Aug 2021 10:21:20 +0200 Subject: Status of JEP-8204088/JDK-8236073 In-Reply-To: References: Message-ID: <11ceed1e-fd73-fb78-e636-8b49aaa1f708@oracle.com> Hi Rodrigo, On 18.08.21 09:36, Rodrigo Bruno wrote: > Hi Thomas, Hi Man, > > On Mon, 2 Aug 2021 at 16:49, Thomas Schatzl > wrote: [...] > > Fwiw, these two are fairly independent issues, and the main problem > with > the proposed CurrentMaxHeapSize patch has been that the change is not > atomic in the sense that the changes simply modify the global heap size > variable without regard to whether and when it is accessed. That may > result > in fairly nasty races as different components of the GC may then have a > different notion of what the current max heap size actually is at the > moment. > > > That is true. However, we could easily do the update to > CurrentMaxHeapSize inside a VMOperation to ensure an atomic change.
This > is not in the current patch but it is probably a simple change. Right? Unfortunately, not, at least when I looked last time. There did not seem to be a way to trigger a VM operation based on the change of a variable using a jcmd command. An intermediate step of just polling that variable (in a g1 service task) and then triggering it or just waiting until the next gc (which is probably best, particularly wrt decreases in heap size) could be options. Thanks, Thomas From tschatzl at openjdk.java.net Mon Aug 23 08:28:25 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 23 Aug 2021 08:28:25 GMT Subject: RFR: 8271930: Simplify end_card calculation in G1BlockOffsetTablePart::verify In-Reply-To: References: Message-ID: On Thu, 5 Aug 2021 09:12:38 GMT, Albert Mingkun Yang wrote: > Simplify `end_card` calculation and some general cleanup. > > Test: hotspot_gc Lgtm apart from the minor comment typos. src/hotspot/share/gc/g1/heapRegion.cpp line 717: > 715: } > 716: > 717: // only regions in old generation contain valid BOT Please start the sentence with upper case and end with full stop. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5015 From ayang at openjdk.java.net Mon Aug 23 08:36:52 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 23 Aug 2021 08:36:52 GMT Subject: RFR: 8271930: Simplify end_card calculation in G1BlockOffsetTablePart::verify [v2] In-Reply-To: References: Message-ID: > Simplify `end_card` calculation and some general cleanup.
> > Test: hotspot_gc Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5015/files - new: https://git.openjdk.java.net/jdk/pull/5015/files/82b0533c..e9157aaa Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5015&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5015&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5015.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5015/head:pull/5015 PR: https://git.openjdk.java.net/jdk/pull/5015 From ayang at openjdk.java.net Mon Aug 23 08:36:53 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 23 Aug 2021 08:36:53 GMT Subject: RFR: 8271930: Simplify end_card calculation in G1BlockOffsetTablePart::verify [v2] In-Reply-To: References: Message-ID: On Mon, 23 Aug 2021 08:25:07 GMT, Thomas Schatzl wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/gc/g1/heapRegion.cpp line 717: > >> 715: } >> 716: >> 717: // only regions in old generation contain valid BOT > > Please start the sentence with upper case and end with full stop. Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5015 From tschatzl at openjdk.java.net Mon Aug 23 10:33:30 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 23 Aug 2021 10:33:30 GMT Subject: RFR: 8272083: G1: Record iterated range for BOT performance during card scan In-Reply-To: References: Message-ID: On Sat, 7 Aug 2021 04:56:08 GMT, Yude Lin wrote: > A fix to the problem in 8272083 is to use a per-worker pointer to indicate where the worker has scanned up to, similar to the _scanned_to variable. 
The difference is this pointer (I call it _iterated_to) records the end of the object and _scanned_to records the end of the scan. Since we always scan with increasing addresses, the end of the latest object scanned is also the address where BOT has fixed up to. So we avoid having to fix below this address when calling block_start(). This implementation approximately reduces the number of calls to set_offset_array() during scan_heap_roots() by 2-10 times (in my casual test with -XX:G1ConcRefinementGreenZone=1000000). > > What this approach does not solve is random access to BOT. So far I haven't found anything having this pattern. First, thanks for all the work put into the graphs, they make discussion much easier, and provide a very good argument for pursuing this further so far. Some comments purely based on what is written here: > (For completeness, I'll restate the problem in 8272083.) > > The block offset table (BOT) is something that serves the purpose > "given a heap word, return the head of the object that covers this heap word" > We typically use BOT to find the objects that cover the dirty memory marked by a dirty card. > > We need to update BOT with the information it needs (obj addr, obj size) at allocation. But during gc, we update it with (plab addr, plab size). When allocating within a plab, we don't update the BOT. So after a gc, BOT becomes crude. BOT will refine itself lazily when it's being used, that is, it will walk the plabs and update itself according to the objects it finds during the walk. The current algorithm is such that BOT will sometimes walk the same part of a plab more than once, which is unnecessary. The update operation has an atomic store for each 512-byte memory chunk it walks. > > This affects both concurrent refine and scan heap roots during pause. > > Using a dedicated "concurrent BOT fixing phase" we can remove most of the duplicated updates, and also move most of the actually-necessary updates out of gc pause.
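To make the BOT mechanics restated above concrete, here is a minimal toy model. The names (`ToyBOT`, `update_for_block`) and the flat array layout are invented for illustration only; the actual HotSpot BOT also walks forward object-by-object within a card and uses a more involved encoding, which this sketch leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy block offset table: the heap is an array of words, each card covers
// kCardWords words, and bot[c] stores how many words before card c's
// boundary the covering block starts.
struct ToyBOT {
  static const std::size_t kCardWords = 64;  // 512 bytes at 8 bytes/word
  std::vector<std::size_t> bot;              // one back-offset per card

  explicit ToyBOT(std::size_t heap_words) : bot(heap_words / kCardWords, 0) {}

  // Record a block (an object, or a whole PLAB) [start, start + size):
  // every card boundary inside the block gets the back-offset to 'start'.
  void update_for_block(std::size_t start, std::size_t size) {
    std::size_t first = start / kCardWords + 1;  // first boundary inside
    std::size_t last = (start + size - 1) / kCardWords;
    for (std::size_t c = first; c <= last && c < bot.size(); ++c) {
      bot[c] = c * kCardWords - start;
    }
  }

  // "Given a heap word, return the head of the object that covers this
  // heap word": jump back from the word's card boundary by the offset.
  std::size_t block_start(std::size_t addr) const {
    std::size_t c = addr / kCardWords;
    return c * kCardWords - bot[c];
  }
};

// Small scenario: two blocks, the second crossing three card boundaries.
std::size_t demo_block_start(std::size_t query) {
  ToyBOT t(1024);
  t.update_for_block(0, 100);
  t.update_for_block(100, 200);
  return t.block_start(query);
}
```

The "crude after GC" state in the mail corresponds to calling `update_for_block` once per PLAB rather than once per object; the lazy refinement then redoes these entries at object granularity.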
The result is something like > ![image](https://user-images.githubusercontent.com/16811675/130313687-e2c0ced3-f4fa-439c-b67b-1248d95df441.png) > This is a zoom-in of a box plot which shows the fixing phase (indicated by "fix" in the label) can reduce the total scan heap roots time. It's more effective when the refinement strength is low (indicated by refine=x, x means the non-adaptive green threshold, and "auto" means using the default adaptive mode). > > It can also increase the concurrent refinement speed, because it also reduces the work in there. The cost is a dedicated concurrent phase, which is described as follows. > > 1. PLABs are created during evacuation to contain promoted objects from survivor regions and surviving objects from old regions (mixed gc only). In each old region, the area between the old top() and new top(), before and after an evacuation pause, is where the new PLABs are, which we will record. Note that there are also direct allocations for "larger" objects. > > 2. After the pause, a concurrent phase will be scheduled to fix BOT in these areas. Basically a fixing worker will claim a part of the area (we'll call it a chunk), and walk the memory for objects. If an object crosses a BOT boundary, it will update the BOT entry. Some tweaks need to be made to make sure different workers never fix the same BOT entry.
Not sure how this works, but it depends on how the chunks for BOT-fixup are claimed - maybe the refinement thread could fix the BOT for that chunk first (if unclaimed)? Just an idea, could the refinement threads do that BOT-fixup work first "before" (or during) refining work? E.g. if a refinement thread gets a card, first check if there is a corresponding BOT-fixup chunk to do, and do that first, doing all to be refined cards in that PLAB area at the same time (or don't do that at the same time - there are a few options here). I do not think leaving some work for card scanning is an issue as long as most work is done. > > 4. If another gc pause is scheduled, we need to wait for the fixing phase to complete. Because we assume that "BOT is going to be fixed almost entirely anyway. Might as well do it sooner". We can potentially abort though, and handle the unfixed part in some other way. This paragraph indicates to me that the BOT-fix thread has higher priority than the GC pause, which seems to be the wrong way to look at the problem - in this case I would rather have the GC eat the cost of fixing up the BOT as the BOT fixup seems supplementary. Delaying remaining BOT-fixing work for after the GC pause seems not to be a good idea, this seems to only result in forever piling up work (assuming in steady state the GC is always faster than completing that work). Not completely sure if this is what you meant. Or did you mean that in the pause some extra(?) phase completes the fixup work, which could be another option? However just leaving the remainder of the work for the card scan, even if it sometimes does duplicate work does not seem to be an issue to me. This problem of duplicate work is probably largest on large heaps, where the time between pauses allows for complete concurrent BOT-fixup anyway. > > > My thoughts for this solution: > > 1. Why not use a queue to record precisely where the plabs are? > First, the queue costs space. 
Maybe 1/128 of heap space at worst (e.g., 4 bytes for plab pointer; a region can have many plabs but only plabs larger than a card are interesting enough, so region_size/512 plabs in total). We can record very large plabs only, to reduce the queue size. Still it costs space. Fwiw there is a fairly large default minimum PLAB size (2k plus something) which bounds the amount too; you could also do some simple compression here that would decrease the size of these data points significantly: since it is possibly sufficient to just store the "start card" of that PLAB anyway, i.e. you can actually get away with 16 bit entries similar to the remembered sets. Actually, you could probably directly use a `G1CardSet` to store these, probably with some tuned settings, you might not need an extra data structure at all (depending on the needs on that data structure). The `G1CardSet` is a fairly compact representation. Another option could be to just store a pointer to the first PLAB per region where this work needs to be done (similar to `G1CMRootMemRegions`); the additional PLAB boundaries could be recalculated by using the (known) PLAB size. I.e. `base + x * PLABSize` gives you the (x+1)th PLAB start. Direct allocations are a (small) problem, as they change that boundary, however I think these can be detected assuming the refinement threads do not interfere; given a `cur` pointer within that region, which indicates where other threads need to continue fixing up the BOT (until `top()` of course), if the BOT entry at `card(cur + PLABSize)` does not exactly point to `base + PLABSize` (i.e. the start of the next PLAB), this must have been a single object (or the first object's size is larger than a PLAB), so just adjust `cur` by that single object's size. I *think* the BOT of these single objects is already correct, so it can be skipped.
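The single-pointer-per-region idea above can be sketched as a toy walk. Everything here is illustrative: `plab_starts`, the addresses and sizes, and in particular the side table of direct allocations — the mail proposes detecting direct allocations via the BOT entries instead, which this sketch replaces with an explicit lookup for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// From one recorded pointer per region, the x-th PLAB starts at
// base + x * plab_size; a direct (larger-than-PLAB) allocation is skipped
// by its object size, since its BOT entries are assumed already exact.
std::vector<std::size_t> plab_starts(
    std::size_t base, std::size_t top, std::size_t plab_size,
    const std::map<std::size_t, std::size_t>& direct_allocs) {
  std::vector<std::size_t> starts;
  std::size_t cur = base;
  while (cur < top) {
    std::map<std::size_t, std::size_t>::const_iterator it =
        direct_allocs.find(cur);
    if (it != direct_allocs.end()) {
      cur += it->second;      // single large object: nothing to fix, skip it
    } else {
      starts.push_back(cur);  // a PLAB whose BOT entries still need fixing
      cur += plab_size;
    }
  }
  return starts;
}

// Scenario: region words [0, 100), PLAB size 10, one direct allocation of
// 35 words starting at word 20.
std::size_t demo_plab_count() {
  std::map<std::size_t, std::size_t> direct_allocs;
  direct_allocs[20] = 35;
  return plab_starts(0, 100, 10, direct_allocs).size();
}

std::size_t demo_third_plab_start() {
  std::map<std::size_t, std::size_t> direct_allocs;
  direct_allocs[20] = 35;
  return plab_starts(0, 100, 10, direct_allocs)[2];
}
```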
Another argument against a too elaborate scheme here is that the amount of old gen regions allocated should be fairly small in the normal case due to the generational hypothesis, so the memory consumption for this should not matter. > Secondly, enqueue action costs a little allocation time. Thirdly, if we want to do dummy object filling on top of fixing, as Thomas suggested earlier, it's going to need a heap walk anyway (I imagine). G1 already does a heap walk after marking (rebuild remembered set phase). Only then we also need dummy-object filling (without a mark there is no class unloading, i.e. no objects get invalid). The difference to the BOT-fixup work is that you want to do this BOT-fixup work after every GC (that allocates into old). (This interrupting action also needs to be considered when/if implementing that concurrent reformatting). The dummy-object formatting has only been a note that the same work (BOT-fixup) also needs to be done if implementing the other, so similar/same issues will arise. That is however something to be explored in a different CR/PR when/if done. > One thing I didn't quite get over is, what do you mean by > > > > One idea could be that after allocating a PLAB (in old gen only of course) we enqueue some background task to fix these. We know the exact locations and sizes of those, maybe this somewhat helps. We also know the chunk sizes that are used for scanning, so we know the problematic places. > > Thomas? I don't know how knowing the chunk sizes is going to help with narrowing down the problematic places. I feel like I'm being stupid and missing something :D Would you help me understand? Do you think we can work from here? The "problematic places" were the places where such an overlap of work during card scan may occur, still assuming that we continue to fix up the card table during card scan. I.e. if the chunk size for card scan is e.g.
16k words, one option could be to only fix the BOT for PLABs crossing those 16k boundaries; since we know that during card scan threads are claiming cards per 16k chunks (idk right now what it actually is, probably higher), there can only be overlapped work happening across them. So if we had a good BOT entry at every 16k crossing, every thread during card scan would "never" do duplicate work. At least I believe so :) Of course, if the card scanning would not need to do the fixup, it would be even better as this, as your graphs indicate, decreases pause time. > > I'll upload the code when it's more polished (supposing the general direction is okay :D) So far it sounds good and the results seem good as well. > > Some experiment details (using `numactl -C 0-15 -m 0 java -Xmx8g -Xms8g -XX:ParallelGCThreads=4 <-XX:G1ConcRefinementGreenZone=$refine> <-XX:-G1UseAdaptiveConcRefinement> -Xlog:gc*=debug:file=$fname -Dspecjbb.controller.type=PRESET -Dspecjbb.controller.preset.ir=8000 -Dspecjbb.controller.preset.duration=600000 -jar specjbb2015.jar -m COMPOSITE`): The graphs look promising so far; I'm only a bit concerned about the total pause time outliers (system.gc()?) as they seem a bit high. Also for some reason the command line above only configures 4 gc threads out of 16. I'm looking forward to these patches. :) Thanks, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/5039 From tschatzl at openjdk.java.net Mon Aug 23 10:57:25 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 23 Aug 2021 10:57:25 GMT Subject: RFR: 8272611: Use parallel heap inspection when print classhisto info in gc log In-Reply-To: References: Message-ID: On Wed, 18 Aug 2021 07:35:50 GMT, Bin Liao wrote: > This patch helps to enhance the heap inspection with parallel threads when print classhisto info in gc log.
> > I have built it on linux x86_64 and run test-tier1 successfully > > --------- > ### Progress > - [x] Change must not contain extraneous whitespace > - [x] Commit message must refer to an issue > - [ ] Change must be properly reviewed > > > > ### Reviewing >
Using git > > Checkout this PR locally: \ > `$ git fetch https://git.openjdk.java.net/jdk pull/5158/head:pull/5158` \ > `$ git checkout pull/5158` > > Update a local copy of the PR: \ > `$ git checkout pull/5158` \ > `$ git pull https://git.openjdk.java.net/jdk pull/5158/head` > >
>
Using Skara CLI tools > > Checkout this PR locally: \ > `$ git pr checkout 5158` > > View PR using the GUI difftool: \ > `$ git pr show -t 5158` > >
>
Using diff file > > Download this PR as a diff file: \ > https://git.openjdk.java.net/jdk/pull/5158.diff > >
Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/shared/collectedHeap.cpp line 551: > 549: ResourceMark rm; > 550: LogStream ls(lt); > 551: uint parallel_thread_num = MAX2(1, (uint)os::initial_active_processor_count() * 3 / 8); Where do the magic numbers for the formula come from? Please consider using the ergonomically determined number of threads (`CollectedHeap::safepoint_workers()->active_workers()`)? (Maybe change this in another CR as other reviewers think is better, looking whether there is a significant difference between these numbers) src/hotspot/share/gc/shared/collectedHeap.cpp line 552: > 550: LogStream ls(lt); > 551: uint parallel_thread_num = MAX2(1, (uint)os::initial_active_processor_count() * 3 / 8); > 552: VM_GC_HeapInspection inspector(&ls, false /* ! full gc */, parallel_thread_num); Pre-existing: the comment next to a bool constant in a parameter list should show the name of the parameter, i.e. s/! full_gc/request_full_gc/. Maybe change this as appropriate in other callers too. ------------- PR: https://git.openjdk.java.net/jdk/pull/5158 From tschatzl at openjdk.java.net Mon Aug 23 10:57:26 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 23 Aug 2021 10:57:26 GMT Subject: RFR: 8272611: Use parallel heap inspection when print classhisto info in gc log In-Reply-To: References: Message-ID: On Mon, 23 Aug 2021 10:39:59 GMT, Thomas Schatzl wrote: >> This patch helps to enhance the heap inspection with parallel threads when print classhisto info in gc log. >> >> I have built it on linux x86_64 and run test-tier1 successfully >> >> --------- >> ### Progress >> - [x] Change must not contain extraneous whitespace >> - [x] Commit message must refer to an issue >> - [ ] Change must be properly reviewed >> >> >> >> ### Reviewing >>
> > src/hotspot/share/gc/shared/collectedHeap.cpp line 551: > >> 549: ResourceMark rm; >> 550: LogStream ls(lt); >> 551: uint parallel_thread_num = MAX2(1, (uint)os::initial_active_processor_count() * 3 / 8); > > Where do the magic numbers for the formula come from? Please consider using the ergonomically determined number of threads (`CollectedHeap::safepoint_workers()->active_workers()`)? (Maybe change this in another CR as other reviewers think is better, looking whether there is a significant difference between these numbers) Note: this is probably copy&pasted from `ClassHistogramDCmd::execute`; instead of the copy&paste please at least add something like a static `default_parallel_thread_num()` method to `VM_GC_HeapInspection`. ------------- PR: https://git.openjdk.java.net/jdk/pull/5158 From tschatzl at openjdk.java.net Mon Aug 23 10:58:28 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 23 Aug 2021 10:58:28 GMT Subject: RFR: 8272725: G1: add documentation on needs_remset_update_t vs bool [v2] In-Reply-To: <7fRoxbV6b5GKXZuzfqQ4lXjjFtpXIFQ7F5CglMeB7DA=.791bf0dd-1faa-484e-b73a-f8236cdec8f5@github.com> References: <7fRoxbV6b5GKXZuzfqQ4lXjjFtpXIFQ7F5CglMeB7DA=.791bf0dd-1faa-484e-b73a-f8236cdec8f5@github.com> Message-ID: On Fri, 20 Aug 2021 11:22:48 GMT, Albert Mingkun Yang wrote: >> Simple type change in `struct G1HeapRegionAttr`. >> >> Test: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with two additional commits since the last revision: > > - review > - Revert "boolean" > > This reverts commit 2a23beb0549a928ae3876070dbdcf5a153c0703d. Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5188 From shade at openjdk.java.net Mon Aug 23 11:02:46 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 23 Aug 2021 11:02:46 GMT Subject: RFR: 8272783: Epsilon: Refactor tests to improve performance Message-ID: Epsilon tests run in tier1, and they take considerable time, especially on low-core machines. We can refactor and simplify some of the tests for better performance. The improvements are: - Dropping test heap sizes to 64M..256M, which saves time zeroing/zapping the heap area; - Dropping some configurations, notably out-of-box `-Xbatch -Xcomp` (which makes little sense, as C1/C2 paths would be taken by other configs) Additionally, run configurations were reformatted to avoid long lines. $ make run-test TEST=gc/epsilon # Before real 1m28.297s user 9m2.689s sys 0m18.370s # After real 1m0.396s user 6m27.221s sys 0m13.499s ------------- Commit messages: - 8272783: Epsilon: Refactor tests to improve performance Changes: https://git.openjdk.java.net/jdk/pull/5202/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5202&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272783 Stats: 408 lines in 21 files changed: 271 ins; 29 del; 108 mod Patch: https://git.openjdk.java.net/jdk/pull/5202.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5202/head:pull/5202 PR: https://git.openjdk.java.net/jdk/pull/5202 From tschatzl at openjdk.java.net Mon Aug 23 11:05:36 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 23 Aug 2021 11:05:36 GMT Subject: RFR: 8267894: Skip work for empty regions in G1 Full GC [v2] In-Reply-To: References: Message-ID: On Wed, 18 Aug 2021 13:26:47 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to skip doing work on regions that were empty at the beginning of the Full GC in phases that execute on a per-region basis. >> >> Testing: Tier 1-3. 
> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > changes proposeed by kstefanj review Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5143 From tschatzl at openjdk.java.net Mon Aug 23 11:18:27 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 23 Aug 2021 11:18:27 GMT Subject: RFR: 8236176: Parallel GC SplitInfo comment should be updated for shadow regions In-Reply-To: References: Message-ID: <6NlLePCpeL2d_ZEoEzyukAQovrNBmFC4UA9N9l3noH0=.7a5874e2-0ea4-401e-85c4-b17a65478d8a@github.com> On Fri, 20 Aug 2021 06:48:18 GMT, Ivan Walulya wrote: > Hi all, > > Please review this small change to add a high-level description of how shadow regions are utilized during the compact phase of the parallel GC. There is no special handling of SplitInfo by the shadow regions; the destination computation and the handling of partial_objects are done during the summary phase while the shadow regions optimization only applies to compaction. > > Additionally, some unused variables are cleaned up. Lgtm apart from my comments about some comment typos. src/hotspot/share/gc/parallel/psParallelCompact.hpp line 964: > 962: // region that can be put on the ready list. The regions are atomically added > 963: // and removed from the ready list. > 964: // This comment actually refers to the sentence: "// An alternate strategy is being investigated for this deferral of updating." above. This sentence should be removed as this offers no information about the current algorithm. Fwiw I do not know whether anyone is actually working on that, and if so, a CR in JIRA would be more appropriate. I could not find anything in JIRA about work on this, and given that comment is pretty old, this is just confusing.
Consequently, the > 967: // destination regions are not available for processing until all live objects > 968: // within them are evacuated to their destinations. These dependencies lead to There is a mix of one and two spaces after a full stop (5 times I think) in this and the next paragraph. I recommend using two spaces like in the above paragraphs for uniformity. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5195 From iwalulya at openjdk.java.net Mon Aug 23 11:29:33 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 23 Aug 2021 11:29:33 GMT Subject: RFR: 8271930: Simplify end_card calculation in G1BlockOffsetTablePart::verify [v2] In-Reply-To: References: Message-ID: On Mon, 23 Aug 2021 08:36:52 GMT, Albert Mingkun Yang wrote: >> Simplify `end_card` calculation and some general cleanup. >> >> Test: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Lgtm! ------------- Marked as reviewed by iwalulya (Committer). PR: https://git.openjdk.java.net/jdk/pull/5015 From github.com+1866174+tbzhang at openjdk.java.net Mon Aug 23 11:58:30 2021 From: github.com+1866174+tbzhang at openjdk.java.net (Tongbao Zhang) Date: Mon, 23 Aug 2021 11:58:30 GMT Subject: Integrated: 8272417: ZGC: fastdebug build crashes when printing ClassLoaderData In-Reply-To: References: Message-ID: On Fri, 13 Aug 2021 07:14:51 GMT, Tongbao Zhang wrote: > Following test fails on fastdebug build with the options: "-XX:+UseZGC -XX:+VerifyAfterGC -Xlog:gc*=trace". It crashes on the unknown claim value of the ClassLoaderData > > public class A { > public static void main(String... args) { > System.gc(); > } > } > > The unknown claim value of the ClassLoaderData is "ClassLoaderData::_claim_other | ClassLoaderData::_claim_strong" which is introduced by the ZVerifyOopClosure. 
This patch clears the CLD's claimed marks after the ZGC verification This pull request has now been integrated. Changeset: 741f58c1 Author: Tongbao Zhang Committer: Stefan Karlsson URL: https://git.openjdk.java.net/jdk/commit/741f58c18c3dc49c5e1b793e411d8479770f2772 Stats: 7 lines in 1 file changed: 2 ins; 0 del; 5 mod 8272417: ZGC: fastdebug build crashes when printing ClassLoaderData Reviewed-by: stefank, pliden ------------- PR: https://git.openjdk.java.net/jdk/pull/5107 From tschatzl at openjdk.java.net Mon Aug 23 12:51:32 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 23 Aug 2021 12:51:32 GMT Subject: RFR: 8242847: G1 should not clear mark bitmaps with no marks In-Reply-To: References: Message-ID: On Mon, 23 Aug 2021 07:51:51 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to bound the range of bitmap clearing for each region using the liveness data collected during marking. For the Concurrent Undo Mark cycle, the liveness information (next_top_at_mark_start and live_words) is in sync wrt. the _next_mark_bitmap that needs clearing. Hence, we use these details to clear only bitmaps for regions that were dirtied and need clearing i.e. only clear between [bottom, ntams), and only clear bitmaps for regions that had at least one bit set (i.e. have some live data). > > Testing: Tier 1-3. Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 591: > 589: > 590: bool has_aborted() { > 591: if (_cm != NULL) { Maybe put the predicate `_cm != nullptr` in an extra method, something like `is_clear_concurrent()` (or a better name) would improve readability... src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 614: > 612: return r->next_top_at_mark_start(); > 613: } > 614: return r->end(); If not in the `UndoMark` operation (and doing the work concurrently), we could use the `prev_top_at_mark_start()` here, couldn't we? 
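The bounded clearing proposed in this change — touching only [bottom, ntams), and only for regions whose marking recorded live words — can be sketched with a toy model. The names, the one-bit-per-word bitmap, and the region layout here are invented for illustration; they are not the actual G1 code or its bitmap granularity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy region: a next-mark bitmap, next_top_at_mark_start (ntams) as a word
// index, and the live-word count collected by the (undone) marking.
struct ToyRegion {
  std::vector<bool> next_bitmap;
  std::size_t ntams;
  std::size_t live_words;
};

// Clear only the bitmap range that can contain marks, and skip regions
// that had no live data at all. Returns the number of bits cleared.
std::size_t clear_bitmaps(std::vector<ToyRegion>& regions) {
  std::size_t cleared = 0;
  for (std::size_t r = 0; r < regions.size(); ++r) {
    ToyRegion& reg = regions[r];
    if (reg.live_words == 0) continue;             // no marks: skip region
    for (std::size_t i = 0; i < reg.ntams; ++i) {  // only [bottom, ntams)
      if (reg.next_bitmap[i]) { reg.next_bitmap[i] = false; ++cleared; }
    }
  }
  return cleared;
}

// Scenario: one region with two marks below ntams, one region with no
// live words (its bitmap is untouched and the loop never enters it).
std::size_t demo_cleared_bits() {
  std::vector<ToyRegion> rs(2);
  rs[0].next_bitmap = std::vector<bool>(8, false);
  rs[0].next_bitmap[1] = true;
  rs[0].next_bitmap[3] = true;
  rs[0].ntams = 4;
  rs[0].live_words = 2;
  rs[1].next_bitmap = std::vector<bool>(8, false);
  rs[1].ntams = 8;
  rs[1].live_words = 0;
  return clear_bitmaps(rs);
}
```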
src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 618: > 616: > 617: public: > 618: G1ClearBitmapHRClosure(G1CMBitMap* bitmap, G1ConcurrentMark* cm) : HeapRegionClosure(), _bitmap(bitmap), _cm(cm) { Since this code is so highly specific to clearing `_next_bitmap` (and depending on current state which TAMS is the correct one for a given region) I would prefer if the code did not have configurable `bitmap` too. src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 673: > 671: } > 672: > 673: Unnecessary newline. ------------- PR: https://git.openjdk.java.net/jdk/pull/5213 From tschatzl at openjdk.java.net Mon Aug 23 12:51:32 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 23 Aug 2021 12:51:32 GMT Subject: RFR: 8242847: G1 should not clear mark bitmaps with no marks In-Reply-To: References: Message-ID: On Mon, 23 Aug 2021 12:41:47 GMT, Thomas Schatzl wrote: >> Hi all, >> >> Please review this change to bound the range of bitmap clearing for each region using the liveness data collected during marking. For the Concurrent Undo Mark cycle, the liveness information (next_top_at_mark_start and live_words) is in sync wrt. the _next_mark_bitmap that needs clearing. Hence, we use these details to clear only bitmaps for regions that were dirtied and need clearing i.e. only clear between [bottom, ntams), and only clear bitmaps for regions that had at least one bit set (i.e. have some live data). >> >> Testing: Tier 1-3. > > src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 614: > >> 612: return r->next_top_at_mark_start(); >> 613: } >> 614: return r->end(); > > If not in the `UndoMark` operation (and doing the work concurrently), we could use the `prev_top_at_mark_start()` here, couldn't we? Out of curiosity, did you measure what gain (in saved bitmap distance) does using the TAMSes give here? 
Asking because most TAMSes should be either at the end (=almost the same as top) or bottom of the region, only the current old gen allocation region at the time of the concurrent mark start pause may have a different one (I think). Maybe that additional suggested (by me) optimization is not worth the effort... ------------- PR: https://git.openjdk.java.net/jdk/pull/5213 From tschatzl at openjdk.java.net Mon Aug 23 12:56:29 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 23 Aug 2021 12:56:29 GMT Subject: RFR: 8272783: Epsilon: Refactor tests to improve performance In-Reply-To: References: Message-ID: <4oAWFpscQw47r1D2bPBaMUARka1sdKC0af49G61RiWo=.7fd59cf6-52d9-41cc-90d2-8dfe7443b8ae@github.com> On Fri, 20 Aug 2021 15:31:05 GMT, Aleksey Shipilev wrote: > Epsilon tests run in tier1, and they take considerable time, especially on low-core machines. We can refactor and simplify some of the tests for better performance. > > The improvements are: > - Dropping test heap sizes to 64M..256M, which saves time zeroing/zapping the heap area; > - Dropping some configurations, notably out-of-box `-Xbatch -Xcomp` (which makes little sense, as C1/C2 paths would be taken by other configs) > > Additionally, run configurations were reformatted to avoid long lines. > > > $ make run-test TEST=gc/epsilon > > # Before > real 1m28.297s > user 9m2.689s > sys 0m18.370s > > # After > real 1m0.396s > user 6m27.221s > sys 0m13.499s Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5202 From iwalulya at openjdk.java.net Mon Aug 23 12:59:32 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 23 Aug 2021 12:59:32 GMT Subject: RFR: 8267894: Skip work for empty regions in G1 Full GC In-Reply-To: References: Message-ID: On Wed, 18 Aug 2021 11:51:28 GMT, Stefan Johansson wrote: >> Hi all, >> >> Please review this change to skip doing work on regions that were empty at the beginning of the Full GC in phases that execute on a per-region basis. >> >> Testing: Tier 1-3. > > I think the text "not initialized yet" is a bit wrong. We should never call is_skip_marking() for objects in "Invalid" regions, because there can't be any objects there. Same goes for "Free" regions. So I would change the assert to something like: > > assert(!is_free(obj), "Should be no objects in free regions."); > > I agree that changing the default value should work. One benefit of using a new state could be to have a difference between free and not committed regions. But not sure it is worth the effort right now. Thanks @kstefanj and @tschatzl for the reviews ------------- PR: https://git.openjdk.java.net/jdk/pull/5143 From iwalulya at openjdk.java.net Mon Aug 23 13:02:33 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 23 Aug 2021 13:02:33 GMT Subject: Integrated: 8267894: Skip work for empty regions in G1 Full GC In-Reply-To: References: Message-ID: On Tue, 17 Aug 2021 13:29:59 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to skip doing work on regions that were empty at the beginning of the Full GC in phases that execute on a per-region basis. > > Testing: Tier 1-3. This pull request has now been integrated. 
Changeset: d542745d Author: Ivan Walulya URL: https://git.openjdk.java.net/jdk/commit/d542745dbe0f58f510108d15f7e310ec27f560db Stats: 37 lines in 6 files changed: 19 ins; 3 del; 15 mod 8267894: Skip work for empty regions in G1 Full GC Reviewed-by: sjohanss, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/5143 From rbruno at gsd.inesc-id.pt Mon Aug 23 15:49:21 2021 From: rbruno at gsd.inesc-id.pt (Rodrigo Bruno) Date: Mon, 23 Aug 2021 16:49:21 +0100 Subject: Status of JEP-8204088/JDK-8236073 In-Reply-To: <11ceed1e-fd73-fb78-e636-8b49aaa1f708@oracle.com> References: <11ceed1e-fd73-fb78-e636-8b49aaa1f708@oracle.com> Message-ID: Hi Thomas, On Mon, 23 Aug 2021 at 11:05, Thomas Schatzl wrote: > Hi Rodrigo, > > On 18.08.21 09:36, Rodrigo Bruno wrote: > > Hi Thomas, Hi Man, > > > > On Mon, 2 Aug 2021 at 16:49, Thomas Schatzl > > wrote: > [...] > > > > Fwiw, these two are fairly independent issues, and the main problem > > with > > the proposed CurrentMaxHeapSize patch has been that the change is not > > atomic in the sense that the changes simply modify the global heap > size > > variable without regard whether and when it is accessed. That may > > result > > in fairly nasty races as different components of the GC may then > have a > > different notion of what the current max heap size actually is at the > > moment. > > > > > > That is true. However, we could easily do the update to > > CurrentMaxHeapSize inside a VMOperation to ensure an atomic change. This > > is not in the current patch but it is probably a simple change. Right? > > Unfortunately, not, at least when I looked last time. There did not seem > to be a way to trigger a VM operation based on the change of a variable > using a jcmd command. > > An intermediate step of just polling that variable (in a g1 service > task) and then triggering it or just waiting until the next gc (which is > probably best, particular wrt to decreases in heap size) could be options. > That is a very good point. 
Waiting for the next GC might actually be the best option because we would (if I am not mistaken) always need a GC before a heap expansion/shrinkage. Do you think the patch would be considered if we proposed a newer version fixing this issue? cheers, rodrigo > > Thanks, > Thomas > -- rodrigo-bruno.github.io From tschatzl at openjdk.java.net Mon Aug 23 15:57:24 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 23 Aug 2021 15:57:24 GMT Subject: RFR: 8254739: G1: Optimize evacuation failure for regions with few failed objects In-Reply-To: References: Message-ID: On Thu, 19 Aug 2021 08:11:42 GMT, Hamlin Li wrote: > This is a try to optimize evacuation failure for regions. > I record every evacuation failure object per region (by G1EvacuationFailureObjsInHR), and then iterate (which indeed includes compact/sort/iteration) these objects directly in RemoveSelfForwardPtrHRClosure. > > I have tested it with the following parameters, > > - -XX:+ParallelGCThreads=1/32/64 > - -XX:G1EvacuationFailureALotInterval=1 > - -XX:G1EvacuationFailureALotCount=2/10/100/1000/10000/100000 > > It saves "Remove Self Forwards" time all the time, and in most conditions it saves "Evacuate Collection Set" time. > > It brings some performance degradation when -XX:G1EvacuationFailureALotCount is low, such as *2*. To improve this a little, we can record the number of evacuation failure objects per region, and not record these objects when the number hits some limit. But I'm not sure if it's necessary to do so, as I think such a condition is too extreme to be met in a real environment, although I'm not quite sure. Initial feedback: we do not care about cases like `G1EvacuationFailureALotCount` at this time: for current evacuation failure the amount of failed objects is supposed to be small (due to `G1HeapReservePercent`), and if they are large, we are likely going into a full gc anyway.
When reusing this mechanism for region pinning, there is unfortunately no particular limit on the live objects to be expected apart from something like "1/2 the objects" as this can occur with any region in the young gen (we can just skip selecting old gen regions for evacuation in that case), so performance should be good for that use case, more so than now. One problem I can see now with performance is the use of a linked list for storing the failed objects. Using such a list requires quite a lot of memory allocation (every `Node` object takes twice the needed data, i.e. the reference plus `next` link, plus low-level malloc overhead), and obviously traversing all links is kind of slow. Since the only thing that needs to be done concurrently (and fast) is adding an element, an option could be something like a segmented array of HeapWords (per region?), i.e. a list of arrays of elements. (Further, since this is per `HeapRegion`, the memory cost could be cut in half easily by using 32 bit offsets). That would probably also be faster when "compacting" all these failed arrays into a single array, and faster when deallocating (just a few of these arrays). Something like `G1CardSetAllocator` with an element type of `G1CardSetArray` or the dirty card queue set allocation (`BufferNode::Allocator` for the allocation only part); unfortunately the code isn't generic enough to be used as-is for you. I am also not really convinced that the failed object lists should be attached to `HeapRegion`, I would rather prefer an extra array as this is something that is usually not needed (and only required for regions with failure, which are very few typically), and only needed during young gc. (I intend to add something like a container for `HeapRegion` specific data that is only needed and used during gc at some point, moving similar existing misplaced data out of `HeapRegion` there then). Overall we can later add some cut-off for whole-region iteration as you described, if needed, I think.
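For illustration, the segmented-array idea could look roughly like the following stand-alone sketch. All names, the chunk capacity and the 32-bit offset choice are my own invention, not HotSpot code; it only shows the shape of the data structure: a concurrent add is a per-chunk atomic bump, and a fresh chunk is installed only when one fills up.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch (not HotSpot code): a segmented array of 32-bit
// in-region offsets standing in for a per-object linked list.
class SegmentedOffsetArray {
  static constexpr uint32_t kChunkCap = 256;  // elements per chunk (made up)
  struct Chunk {
    std::atomic<uint32_t> top{0};             // next free slot in this chunk
    uint32_t elems[kChunkCap];
    Chunk* prev;                              // previously filled chunks
    explicit Chunk(Chunk* p) : prev(p) {}
  };
  std::atomic<Chunk*> _cur;

public:
  SegmentedOffsetArray() : _cur(new Chunk(nullptr)) {}
  ~SegmentedOffsetArray() {
    for (Chunk* c = _cur.load(); c != nullptr; ) {
      Chunk* p = c->prev; delete c; c = p;
    }
  }

  // Safe to call from multiple threads (e.g. GC workers recording failures).
  void add(uint32_t offset) {
    while (true) {
      Chunk* c = _cur.load(std::memory_order_acquire);
      uint32_t slot = c->top.fetch_add(1, std::memory_order_relaxed);
      if (slot < kChunkCap) {
        c->elems[slot] = offset;
        return;
      }
      // Chunk overflowed: try to install a fresh one; losers retry.
      Chunk* fresh = new Chunk(c);
      if (!_cur.compare_exchange_strong(c, fresh)) {
        delete fresh;  // another thread already installed a new chunk
      }
    }
  }

  // Single-threaded use after all adds are done: flatten and sort.
  std::vector<uint32_t> sorted() const {
    std::vector<uint32_t> out;
    for (Chunk* c = _cur.load(); c != nullptr; c = c->prev) {
      uint32_t n = c->top.load();
      if (n > kChunkCap) n = kChunkCap;  // top overshoots when a chunk fills
      out.insert(out.end(), c->elems, c->elems + n);
    }
    std::sort(out.begin(), out.end());
    return out;
  }
};
```

The "compacting" step for the remove-self-forwards phase then touches only a handful of chunk allocations instead of one `Node` per failed object.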
Do you have a breakdown of the time taken for the parts of the `G1EvacuationFailureObjsInHR::iterate()` method to see where most time is spent? I.e. what overhead does the linked list impose compared to actual "useful" work? Do you have an idea about the total memory overhead? I can imagine that can be significant in this case. I understand this is a bit hard to measure in a realistic environment because while `-XX:G1EvacuationFailureALot` is good for implementing and debugging, the distribution of failures is probably a lot different than real evacuation failures. Maybe I could help you with finding some options to configure some existing stress tests (GCBasher, ...) to cause evacuation failures? Later I will follow up with a more thorough look. Hth, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/5181 From iwalulya at openjdk.java.net Mon Aug 23 16:07:48 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 23 Aug 2021 16:07:48 GMT Subject: RFR: 8236176: Parallel GC SplitInfo comment should be updated for shadow regions [v2] In-Reply-To: References: Message-ID: > Hi all, > > Please review this small change to add a high-level description of how shadow regions are utilized during the compact phase of the parallel GC. There is no special handling of SplitInfo by the shadow regions; the destination computation and the handling of partial_objects are done during the summary phase while the shadow regions optimization only applies to compaction. > > Additionally, some unused variables are cleaned up.
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Thomas review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5195/files - new: https://git.openjdk.java.net/jdk/pull/5195/files/52fbb6fa..b394ba83 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5195&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5195&range=00-01 Stats: 8 lines in 1 file changed: 0 ins; 1 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/5195.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5195/head:pull/5195 PR: https://git.openjdk.java.net/jdk/pull/5195 From iwalulya at openjdk.java.net Mon Aug 23 16:07:50 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 23 Aug 2021 16:07:50 GMT Subject: RFR: 8236176: Parallel GC SplitInfo comment should be updated for shadow regions [v2] In-Reply-To: <6NlLePCpeL2d_ZEoEzyukAQovrNBmFC4UA9N9l3noH0=.7a5874e2-0ea4-401e-85c4-b17a65478d8a@github.com> References: <6NlLePCpeL2d_ZEoEzyukAQovrNBmFC4UA9N9l3noH0=.7a5874e2-0ea4-401e-85c4-b17a65478d8a@github.com> Message-ID: On Mon, 23 Aug 2021 11:13:22 GMT, Thomas Schatzl wrote: >> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: >> >> Thomas review > > src/hotspot/share/gc/parallel/psParallelCompact.hpp line 964: > >> 962: // region that can be put on the ready list. The regions are atomically added >> 963: // and removed from the ready list. >> 964: // > > This comment actually refers to the sentence: "// An alternate strategy is being investigated for this deferral of updating." above. > > This sentence should be removed as this offers no information about the current algorithm. Fwiw I do not know that anyone is actually working on that, and if so, a CR in JIRA would be more appropriate. I could not find anything in JIRA about work on this, and given that comment is pretty old, this is just confusing. Removed!
------------- PR: https://git.openjdk.java.net/jdk/pull/5195 From jonathanjoo at google.com Tue Aug 24 05:07:30 2021 From: jonathanjoo at google.com (Jonathan Joo) Date: Tue, 24 Aug 2021 01:07:30 -0400 Subject: Status of JEP-8204088/JDK-8236073 In-Reply-To: <939f9a95-9cd2-72b5-6972-99a651eb6e47@oracle.com> References: <84ee0b72-ffd8-3e2a-0461-a5a9d2c7e31b@oracle.com> <3694c54c-9bf5-9c5a-ada1-8f1d7bf737a1@oracle.com> <1119e172-3f29-5616-d3d8-a542b15fdf65@oracle.com> <79b3df05-4e79-420f-f105-b6d11c26dfc1@oracle.com> <02e32674-5095-ee6d-2d8a-2cb7d0f1a1e5@oracle.com> <35ff0404-9cb0-f9bf-0d37-2c3d7b4a0f67@oracle.com> <939f9a95-9cd2-72b5-6972-99a651eb6e47@oracle.com> Message-ID: Apologies for the late reply - just getting around to looking at this again. Thanks for the update! I am continuing to learn more about GC on my end and have been prototyping via putting together all the bits and pieces mentioned in previous emails. I'll definitely reach out to you again once I'm ready to start contributing to fixing up the heap sizing issues upstream. Thank you! ~ Jonathan On Tue, Jul 13, 2021 at 3:53 AM Thomas Schatzl wrote: > Hi all, > > On 21.06.21 13:40, Thomas Schatzl wrote: > > Hi, > > > [...]> > > Another reason to not immediately jump on it is that while I can see > > your point, first I would prefer to have the current issues fixed :) > fwiw, I updated the open repo with the JDK-8236073* changes to latest > and also fixed a calculation bug when shrinking. > > > https://github.com/tschatzl/jdk/tree/8238687-investigate-memory-uncommit-during-young-gc2 > > We still need to look at and investigate out-of-box performance changes > because of this. > > Sorry for doing a force push. I'll try to remember to do "normal" merges > in the future if that is more convenient to you. 
> > Thanks, > Thomas > From github.com+16811675+linade at openjdk.java.net Tue Aug 24 06:13:28 2021 From: github.com+16811675+linade at openjdk.java.net (Yude Lin) Date: Tue, 24 Aug 2021 06:13:28 GMT Subject: RFR: 8272083: G1: Record iterated range for BOT performance during card scan In-Reply-To: References: Message-ID: On Sat, 7 Aug 2021 04:56:08 GMT, Yude Lin wrote: > A fix to the problem in 8272083 is to use a per-worker pointer to indicate where the worker has scanned up to, similar to the _scanned_to variable. The difference is this pointer (I call it _iterated_to) records the end of the object and _scanned_to records the end of the scan. Since we always scan with increasing addresses, the end of the latest object scanned is also the address where BOT has fixed up to. So we avoid having to fix below this address when calling block_start(). This implementation approximately reduces the number of calls to set_offset_array() during scan_heap_roots() by 2-10 times (in my casual test with -XX:G1ConcRefinementGreenZone=1000000). > > What this approach is not solving is random access to BOT. So far I haven't found anything having this pattern. Hi Thomas, >> If concurrent refinement tries to refine a card, it will probably run into an unfixed part of BOT. We prevent this by requiring the refiner to fix where the card is pointing at. If another worker has claimed the chunk where the card is pointing at, then refinement can proceed without a guarantee the chunk is actually fixed. This does not happen often but might be a problem. >> > This paragraph indicates that the refinement thread never fixes up the BOT, and there is some synchronization mechanism between BOT-fix thread and the refinement threads. > > Not sure how this works, but it depends on how the chunks for BOT-fixup are claimed - maybe the refinement thread could fix the BOT for that chunk first (if unclaimed)? Just an idea, could the refinement threads do that BOT-fixup work first "before" (or during) refining work?
> > E.g. if a refinement thread gets a card, first check if there is a corresponding BOT-fixup chunk to do, and do that first, doing all to be refined cards in that PLAB area at the same time (or don't do that at the same time - there are a few options here). > > I do not think leaving some work for card scanning is an issue as long as most work is done. What I meant here is when the refinement thread refines a card, it will first try to claim all the unfixed chunks in the same region, and fix them, up to a point where it knows that "where the card points into has been fixed, I will not run into unfixed BOT if I refine it". So I think it's like what you suggested. Refinement threads and fixer threads are synchronized by the chunk index. If one claims a chunk, the other will look for the next. An issue is that the chunks of a region are claimed from bottom to top, there is no easy way to fix a random one in the middle. It means the refinement can stall for some time but it won't be too long. I imagine, if we are using a queue to manage the BOT-fixing tasks, there is still an order imposed. The refinement thread has to dequeue and fix everything until where the card points into is fixed. But even taking into account the time to do all the fixing, the refinement rate increases (in my experiment). So I think it's worth it? >> If another gc pause is scheduled, we need to wait for the fixing phase to complete. Because we assume that "BOT is going to be fixed almost entirely anyway. Might as well do it sooner". We can potentially abort though, and handle the unfixed part in some other way. >> > This paragraph indicates to me that the BOT-fix thread has higher priority than the GC pause, which seems to be the wrong way to look at the problem - in this case I would rather have the GC eat the cost of fixing up the BOT as the BOT fixup seems supplementary.
> Delaying remaining BOT-fixing work for after the GC pause seems not to be a good idea; this seems to only result in forever piling up work (assuming in steady state the GC is always faster than completing that work). Not completely sure if this is what you meant. > > Or did you mean that in the pause some extra(?) phase completes the fixup work, which could be another option? > > However just leaving the remainder of the work for the card scan, even if it sometimes does duplicate work does not seem to be an issue to me. This problem of duplicate work is probably largest on large heaps, where the time between pauses allows for complete concurrent BOT-fixup anyway. What I did here is letting the gc thread wait for fixing *in the safepoint*, when everything is paused. I think it's equivalent to adding another phase in the gc pause, but it seems I could be wrong. The data shows most of the time the waiting finishes within 0.1ms, but occasionally >10ms or even >100ms, which is causing part of the pause time outlier problem you mentioned. I have yet to understand why. I'm sure something can be engineered to avoid this. But I agree we should probably abort the fixing when gc happens, if we consider the role of fixing as always supplementary to gc. I also realized afterwards that this waiting time should be considered by the pause time policy if we don't abort. > Actually, you could probably directly use a G1CardSet to store these, probably with some tuned settings, you might not need an extra data structure at all (depending on the needs on that data structure). The G1CardSet is a fairly compact representation. > > Another option could be also just store a pointer to the first PLAB per region where this work needs to be done (similar to G1CMRootMemRegions); the additional PLAB boundaries could be recalculated by using the (known) PLAB size. I.e. base + x * PLABSize gives you the (x+1)'th PLAB start. I'll check out these data structures.
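As a toy illustration of the quoted recalculation idea (base + x * PLABSize), here is a stand-alone sketch. All names and sizes are invented, not HotSpot code; it assumes objects never cross a PLAB boundary (retired PLABs are filled up), so a block that does cross the next computed boundary must be a single direct allocation larger than a PLAB, whose BOT entries are already exact and can be skipped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kPlabWords = 1024;   // made-up fixed PLAB size

// Toy region: a correct (already fixed) BOT is modeled by the sorted
// list of true block start addresses (word indices from 0).
struct ToyRegion {
  std::vector<std::size_t> block_starts;   // sorted; models an exact BOT

  // Models block_start(): the last block start <= addr.
  std::size_t block_start(std::size_t addr) const {
    auto it = std::upper_bound(block_starts.begin(), block_starts.end(), addr);
    return *(it - 1);
  }

  // Given 'cur' (a known block start at a PLAB/object boundary), compute
  // the next point where BOT fixup has to resume.
  std::size_t next_fixup_point(std::size_t cur) const {
    std::size_t boundary = cur + kPlabWords;   // i.e. base + x * PLABSize
    if (block_start(boundary) == boundary) {
      return boundary;         // a full PLAB occupies [cur, boundary)
    }
    // The block covering 'boundary' starts before it: a single direct
    // allocation; skip the whole object (its BOT entries are exact).
    auto it = std::upper_bound(block_starts.begin(), block_starts.end(), boundary);
    return it != block_starts.end() ? *it : boundary;  // end-of-data fallback
  }
};
```

In this toy setup, a region containing a PLAB, then a 3000-word direct allocation, then another PLAB walks fixup points PLAB by PLAB while skipping the big object in one step.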
Thanks for the pointer~ > Direct allocations are a (small) problem, as they change that boundary, however I think these can be detected assuming the refinement threads do not interfere; given a cur pointer within that region, which indicates where other threads need to continue fixing up the BOT (until top() of course), if the BOT entry at card(cur + PLABSize) does not exactly point to base + PLABSize (i.e. the start of the next PLAB), this must have been a single object (or the first object's size is larger than a PLAB), so just adjust cur by that single object's size. I think the BOT of these single objects is already correct, so it can be skipped. But I still want to argue against using a queue or other extra data structure. An additional point (on top of space consumption) is that I don't have to make modifications to the allocation code if I just use the old_top()/new_top() scheme. I checked how to modify them. It involves changing too many interfaces in G1AllocRegion/HeapRegion/G1Allocator, in order to pass down the information "am I allocating a plab or a large object?", to where the locked allocation actually happens. (Maybe we can discuss this over code when it's uploaded but) we do want to do the enqueue when the HeapRegion is locked, right? > I.e. if the chunk size for card scan is e.g. 16k words, one option could be to only fix the BOT for PLABs crossing those 16k boundaries; since we know that during card scan threads are claiming cards per 16k chunks (idk right now what it actually is, probably higher), there can only be overlapped work happening across them. I *think*, even if a plab doesn't cross a chunk boundary, if we try to call block_start() with addr1 and addr2 (plab_start < addr1 < addr2) in that order, there is possibly duplicated work. The first call will try to fix [plab_start, addr1], the second call will try to fix [start_of_object_that_covers(addr1), addr2]. There is an overlap between these two ranges.
The reason is the second call is going back by too much. The BOT offset array element at addr2 is not updated when [plab_start, addr1] is fixed. It will go back to start_of_object_that_covers(addr1) (because the BOT element at addr1 is updated from plab_start to the real object that covers addr1). > The graphs look promising so far; I'm only a bit concerned about the total pause time outliers (system.gc()?) as they seem a bit high. Also for some reason the command line above only configures 4 gc threads out of 16. The pause time issue can be partially explained by the waiting time for fixing, which is to be removed as discussed earlier. I tend to use fewer threads for testing. It only has one concurrent gc thread. The results seem less random ; ) Thank you for the detailed and informative reply! Regards, Yude ------------- PR: https://git.openjdk.java.net/jdk/pull/5039 From stefan.johansson at oracle.com Tue Aug 24 09:35:34 2021 From: stefan.johansson at oracle.com (Stefan Johansson) Date: Tue, 24 Aug 2021 11:35:34 +0200 Subject: Intentional heap bounds change? In-Reply-To: <7d3f0384-aad5-94de-9fda-ed2026b7d2f1@oracle.com> References: <475beb6a-8ca2-7849-bada-cb8288d78d47@redhat.com> <2ba5e394-c705-af1b-481c-46e96ba14011@oracle.com> <1fd77b1a-4ff3-67d6-3285-6c3fc3c2ffc6@redhat.com> <7d3f0384-aad5-94de-9fda-ed2026b7d2f1@oracle.com> Message-ID: Thanks for filing the issue Stefan, I took a short look at it and have updated the JBS entry. Cheers, Stefan On 2021-08-12 13:16, Stefan Karlsson wrote: > I logged a Bug for this: > https://bugs.openjdk.java.net/browse/JDK-8272364 > > Thanks, > StefanK > > On 2021-08-11 13:04, Roman Kennke wrote: >> Hi, >> >> thanks for the comments.
I have an example that shows the 'problem': >> >> http://cr.openjdk.java.net/~rkennke/Example.java >> >> Reproduce it using: >> >> java -Xms1g -Xmx4g -XX:+UseParallelOldGC -XX:MinHeapFreeRatio=10 >> -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 >> -XX:AdaptiveSizePolicyWeight=90 >> -Xlog:gc*=info,gc=debug,gc+heap*=trace,gc+ergo*=trace:file=gc.log:time,uptimemillis,level,tags >> Example 300 >> >> Check the heap log for heap size entries: >> >> grep "Pause Young" gc.log >> >> This yields the following: >> >> http://cr.openjdk.java.net/~rkennke/gc-heapsize.txt >> >> It shows that the heap is indeed shrinking well below -Xms, e.g. >> ~600MB instead of 1024MB here: >> >> GC(12) Pause Young (Allocation Failure) 559M->589M(592M) 4,797ms >> >> Thanks for your help! >> Roman >> >> >>> Hi, >>> >>> On 2021-08-05 16:21, Roman Kennke wrote: >>>> Hello GC devs, >>>> >>>> I see a case where heap may shrink below -Xms. This may be >>>> intentional or not. -Xms is initial heap size after all, not >>>> necessarily minimum heap size. However, there is also documentation >>>> that -Xms *does* provide the lower bounds for heap sizing policy, >>>> e.g. [1] >>> >>> The documentation around this didn't reflect the actual >>> implementation in HotSpot. As a part of recent cleanup and >>> introduction of the -XX:InitialHeapSize flag, we updated the >>> documentation to reflect the way -Xms is actually working: >>> >>> https://docs.oracle.com/en/java/javase/16/docs/specs/man/java.html >>> >>> --- >>> -Xms /size/ >>> >>> Sets the minimum and initial size (in bytes) of the heap. This value >>> must be a multiple of 1024 and greater than 1 MB. Append the letter >>> k or K to indicate kilobytes, m or M to indicate megabytes, >>> g or G to indicate gigabytes. The following examples show how to >>> set the size of allocated memory to 6 MB using various units: >>> >>> -Xms6291456 -Xms6144k -Xms6m >>> >>>
Instead of the -Xms option to set both the minimum and initial >>> size of the heap, you can use -XX:MinHeapSize to set the minimum >>> size and -XX:InitialHeapSize to set the initial size." >>> >>> --- >>> >>> >>> Concretely, the code that parses -Xms sets both MinHeapSize and >>> InitialHeapSize: >>> >>> >>> } else if (match_option(option, "-Xms", &tail)) { >>> julong size = 0; >>> // an initial heap size of 0 means automatically determine >>> ArgsRange errcode = parse_memory_size(tail, &size, 0); >>> ... >>> if (FLAG_SET_CMDLINE(MinHeapSize, (size_t)size) != >>> JVMFlag::SUCCESS) { >>> return JNI_EINVAL; >>> } >>> if (FLAG_SET_CMDLINE(InitialHeapSize, (size_t)size) != >>> JVMFlag::SUCCESS) { >>> return JNI_EINVAL; >>> } >>> >>> >>>> >>>> I believe this is due to this change: >>>> >>>> https://github.com/openjdk/jdk/commit/1ed5b22d6e48ffbebaa53cf79df1c5c790aa9d71#diff-74a766b0aa16c688981a4d7b20da1c93315ce72358cac7158e9b50f82c9773a3L567 >>>> >>>> >>>> I.e. instead of adjusting old-gen minimum size with young-gen >>>> minimum size, it is set to _gen_alignment. >>>> >>>> Is this intentional? >>> >>> Hopefully StefanJ will be able to answer this. The code has this >>> comment, so maybe it is intentional? >>> >>> // Minimum sizes of the generations may be different than >>> // the initial sizes. An inconsistency is permitted here >>> // in the total size that can be specified explicitly by >>> // command line specification of OldSize and NewSize and >>> // also a command line specification of -Xms. Issue a warning >>> // but allow the values to pass. >>> >>> Do you have a command line that reproduces this problem?
>>> >>> StefanK >>> >>>> >>>> Thanks, >>>> Roman >>>> >>>> >>>> [1] >>>> https://docs.oracle.com/en/java/javase/11/gctuning/factors-affecting-garbage-collection-performance.html#GUID-B0BFEFCB-F045-4105-BFA4-C97DE81DAC5B >>>> >>>> >>> >> > From iwalulya at openjdk.java.net Tue Aug 24 09:45:47 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 24 Aug 2021 09:45:47 GMT Subject: RFR: 8242847: G1 should not clear mark bitmaps with no marks [v2] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to bound the range of bitmap clearing for each region using the liveness data collected during marking. For the Concurrent Undo Mark cycle, the liveness information (next_top_at_mark_start and live_words) is in sync wrt. the _next_mark_bitmap that needs clearing. Hence, we use these details to clear only bitmaps for regions that were dirtied and need clearing i.e. only clear between [bottom, ntams), and only clear bitmaps for regions that had at least one bit set (i.e. have some live data). > > Testing: Tier 1-3. 
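For illustration, the bounded clearing described in the quoted summary can be sketched as follows. This is a toy model with invented names, not the actual G1 code; the fields mirror the liveness data mentioned above (next_top_at_mark_start and live_words), with one byte standing in for one mark bit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy per-region liveness data collected during marking.
struct RegionInfo {
  std::size_t bottom;       // first bitmap index covered by the region
  std::size_t ntams;        // next-top-at-mark-start, as a bitmap index
  std::size_t live_words;   // liveness collected during marking
};

// Clears only [bottom, ntams) of regions that had at least one bit set;
// returns how many bitmap slots were actually touched.
std::size_t clear_marks(std::vector<unsigned char>& bitmap,
                        const std::vector<RegionInfo>& regions) {
  std::size_t cleared = 0;
  for (const RegionInfo& r : regions) {
    if (r.live_words == 0) {
      continue;             // no mark was ever set in this region: skip it
    }
    for (std::size_t i = r.bottom; i < r.ntams; ++i) {
      bitmap[i] = 0;        // [ntams, region end) was never marked
      ++cleared;
    }
  }
  return cleared;
}
```

With mostly-empty regions this touches only a small fraction of the bitmap instead of clearing it wholesale.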
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: thomas review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5213/files - new: https://git.openjdk.java.net/jdk/pull/5213/files/6ab1f44b..0ce053a3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5213&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5213&range=00-01 Stats: 50 lines in 1 file changed: 18 ins; 8 del; 24 mod Patch: https://git.openjdk.java.net/jdk/pull/5213.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5213/head:pull/5213 PR: https://git.openjdk.java.net/jdk/pull/5213 From iwalulya at openjdk.java.net Tue Aug 24 09:45:48 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 24 Aug 2021 09:45:48 GMT Subject: RFR: 8242847: G1 should not clear mark bitmaps with no marks [v2] In-Reply-To: References: Message-ID: On Mon, 23 Aug 2021 12:47:40 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 614: >> >>> 612: return r->next_top_at_mark_start(); >>> 613: } >>> 614: return r->end(); >> >> If not in the `UndoMark` operation (and doing the work concurrently), we could use the `prev_top_at_mark_start()` here, couldn't we? > > Out of curiosity, did you measure what gain (in saved bitmap distance) does using the TAMSes give here? Asking because most TAMSes should be either at the end (=almost the same as top) or bottom of the region, only the current old gen allocation region at the time of the concurrent mark start pause may have a different one (I think). > > Maybe that additional suggested (by me) optimization is not worth the effort... You are right, only in a few cases is TAMS below end, and even then, it is way smaller than chunk_size_in_words. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5213 From shade at openjdk.java.net Tue Aug 24 10:11:27 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Aug 2021 10:11:27 GMT Subject: RFR: 8272783: Epsilon: Refactor tests to improve performance In-Reply-To: <4oAWFpscQw47r1D2bPBaMUARka1sdKC0af49G61RiWo=.7fd59cf6-52d9-41cc-90d2-8dfe7443b8ae@github.com> References: <4oAWFpscQw47r1D2bPBaMUARka1sdKC0af49G61RiWo=.7fd59cf6-52d9-41cc-90d2-8dfe7443b8ae@github.com> Message-ID: <7hYELMi5OoouUS_hwX56xNkJmMHvEneRXpbMYbKvhlo=.31b58c83-baad-4587-a282-30452d6d291f@github.com> On Mon, 23 Aug 2021 12:53:18 GMT, Thomas Schatzl wrote: >> Epsilon tests run in tier1, and they take considerable time, especially on low-core machines. We can refactor and simplify some of the tests for better performance. >> >> The improvements are: >> - Dropping test heap sizes to 64M..256M, which saves time zeroing/zapping the heap area; >> - Dropping some configurations, notably out-of-box `-Xbatch -Xcomp` (which makes little sense, as C1/C2 paths would be taken by other configs) >> >> Additionally, run configurations were reformatted to avoid long lines. >> >> >> $ make run-test TEST=gc/epsilon >> >> # Before >> real 1m28.297s >> user 9m2.689s >> sys 0m18.370s >> >> # After >> real 1m0.396s >> user 6m27.221s >> sys 0m13.499s > > Lgtm. Thanks for review, @tschatzl! ------------- PR: https://git.openjdk.java.net/jdk/pull/5202 From shade at openjdk.java.net Tue Aug 24 10:11:27 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 24 Aug 2021 10:11:27 GMT Subject: Integrated: 8272783: Epsilon: Refactor tests to improve performance In-Reply-To: References: Message-ID: <3SeF6HBo33t4iocTLVSzJL5W1s_A2hoIDoxBPZ9RYIU=.c8f224fa-d1ac-4a83-a7b9-fcd5a90a8d60@github.com> On Fri, 20 Aug 2021 15:31:05 GMT, Aleksey Shipilev wrote: > Epsilon tests run in tier1, and they take considerable time, especially on low-core machines. 
We can refactor and simplify some of the tests for better performance. > > The improvements are: > - Dropping test heap sizes to 64M..256M, which saves time zeroing/zapping the heap area; > - Dropping some configurations, notably out-of-box `-Xbatch -Xcomp` (which makes little sense, as C1/C2 paths would be taken by other configs) > > Additionally, run configurations were reformatted to avoid long lines. > > > $ make run-test TEST=gc/epsilon > > # Before > real 1m28.297s > user 9m2.689s > sys 0m18.370s > > # After > real 1m0.396s > user 6m27.221s > sys 0m13.499s This pull request has now been integrated. Changeset: 7f80683c Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/7f80683cfeee3c069f48d5bce45fa92b2381b518 Stats: 408 lines in 21 files changed: 271 ins; 29 del; 108 mod 8272783: Epsilon: Refactor tests to improve performance Reviewed-by: tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/5202 From iwalulya at openjdk.java.net Tue Aug 24 10:29:57 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 24 Aug 2021 10:29:57 GMT Subject: RFR: 8242847: G1 should not clear mark bitmaps with no marks [v3] In-Reply-To: References: Message-ID: > Hi all, > > Please review this change to bound the range of bitmap clearing for each region using the liveness data collected during marking. For the Concurrent Undo Mark cycle, the liveness information (next_top_at_mark_start and live_words) is in sync wrt. the _next_mark_bitmap that needs clearing. Hence, we use these details to clear only bitmaps for regions that were dirtied and need clearing i.e. only clear between [bottom, ntams), and only clear bitmaps for regions that had at least one bit set (i.e. have some live data). > > Testing: Tier 1-3. 
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: trailing space ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5213/files - new: https://git.openjdk.java.net/jdk/pull/5213/files/0ce053a3..3c7d1325 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5213&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5213&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5213.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5213/head:pull/5213 PR: https://git.openjdk.java.net/jdk/pull/5213 From tschatzl at openjdk.java.net Tue Aug 24 11:45:26 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 24 Aug 2021 11:45:26 GMT Subject: RFR: 8272083: G1: Record iterated range for BOT performance during card scan In-Reply-To: References: Message-ID: On Sat, 7 Aug 2021 04:56:08 GMT, Yude Lin wrote: > A fix to the problem in 8272083 is to use a per-worker pointer to indicate where the worker has scanned up to, similar to the _scanned_to variable. The difference is this pointer (I call it _iterated_to) records the end of the object and _scanned_to records the end of the scan. Since we always scan with increasing addresses, the end of the latest object scanned is also the address where BOT has fixed up to. So we avoid having to fix below this address when calling block_start(). This implementation approximately reduces the number of calls to set_offset_array() during scan_heap_roots() by 2-10 times (in my casual test with -XX:G1ConcRefinementGreenZone=1000000). > > What this approach is not solving is random access to BOT. So far I haven't found anything having this pattern. Hi, > > > > If concurrent refinement tries to refine a card, it will probably run into an unfixed part of BOT. We prevent this by requiring the refiner to fix where the card is pointing at.
If another worker has claimed the chunk where the card is pointing at, then refinement can proceed without a guarantee that the chunk is actually fixed. This does not happen often, but it might be a problem. > > > > > > This paragraph indicates that the refinement thread never fixes up the BOT, and there is some synchronization mechanism between the BOT-fix thread and the refinement threads. > > Not sure how this works, but it depends on how the chunks for BOT-fixup are claimed - maybe the refinement thread could fix the BOT for that chunk first (if unclaimed)? Just an idea, could the refinement threads do that BOT-fixup work first "before" (or during) refining work? > > E.g. if a refinement thread gets a card, first check if there is a corresponding BOT-fixup chunk to do, and do that first, doing all to-be-refined cards in that PLAB area at the same time (or don't do that at the same time - there are a few options here). > > I do not think leaving some work for card scanning is an issue as long as most work is done. > > What I meant here is when the refinement thread refines a card, it will first try to claim all the unfixed chunks in the same region, and fix them, up to a point where it knows that "where the card points into has been fixed, I will not run into unfixed BOT if I refine it". So I think it's like what you suggested. Refinement threads and fixer threads are synchronized by the chunk index. If one claims a chunk, the other will look for the next. > > An issue is that the chunks of a region are claimed from bottom to top, so there is no easy way to fix a random one in the middle. It means the refinement can stall for some time, but it won't be too long. I imagine, if we are using a queue to manage the BOT-fixing tasks, there is still an order imposed. The refinement thread has to dequeue and fix everything until where the card points into is fixed. But even taking into account the time to do all the fixing, the refinement rate increases (in my experiment).
So I think it's worth it? Fixing the BOT concurrently seems worthwhile so far, no worries. The current work assignment heuristics seem to be suboptimal though, and probably assume too much that there is only a single refinement thread (which isn't true even in your case - any java application thread may also start refining at some point). However the main drawback imho of the (or just one) refinement thread fixing everything (for a region - afaiu) seems to be that this operation might take too long. Our ballpark figure for the time allowance of a single task without checking back for a pending GC should only be up to `G1ConcMarkStepDurationMillis` (default 10ms, but that is rather on the high side nowadays; for another example, the remembered set rebuild uses 256kB chunks, which on machines current at that time typically took < 1ms iirc), and processing all chunks for a region seems to take longer than that given the outliers the graphs showed. It's unfortunate that PLABs can get really large, so even just recording PLAB starts may be too coarse a BOT-fixup granularity. However, before thinking about improving parallelism there, let's solve the initial problem :) The length/size of the PLABs is also the reason why you want that operation parallelized even in the concurrent phase. > > > > If another gc pause is scheduled, we need to wait for the fixing phase to complete. Because we assume that "BOT is going to be fixed almost entirely anyway. Might as well do it sooner". We can potentially abort though, and handle the unfixed part in some other way. > > > > > > This paragraph indicates to me that the BOT-fix thread has higher priority than the GC pause, which seems to be the wrong way to look at the problem - in this case I would rather have the GC eat the cost of fixing up the BOT, as the BOT fixup seems supplementary.
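A rough toy model of the chunk-claiming scheme described in this exchange (invented names, a single region, not the proposed HotSpot code; a real version would also have to resolve the claimed-but-not-yet-fixed race mentioned earlier):

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Toy model: the BOT of one region is split into chunks that are claimed
// strictly bottom-to-top through a single atomic index, which is how the
// fixer and refinement threads stay synchronized.
class BotFixupRegion {
public:
  explicit BotFixupRegion(size_t num_chunks)
    : _next_unclaimed(0), _fixed(num_chunks, 0) {}

  size_t num_chunks() const { return _fixed.size(); }

  // Claim the next chunk bottom-to-top; a result >= num_chunks() means
  // every chunk has already been claimed by someone.
  size_t claim_chunk() { return _next_unclaimed.fetch_add(1); }

  void fix_chunk(size_t idx) {
    // The real code would rewrite the BOT entries covering this chunk.
    _fixed[idx] = 1;
  }

  // Called by a refinement thread before refining a card in chunk
  // `card_chunk`: keep claiming and fixing chunks until that chunk is
  // covered. If another thread claimed it, we proceed - without a
  // guarantee it is already *fixed*, which is the race noted above.
  void ensure_claimed_through(size_t card_chunk) {
    while (_next_unclaimed.load() <= card_chunk) {
      size_t idx = claim_chunk();
      if (idx < num_chunks()) {
        fix_chunk(idx);
      }
    }
  }

  bool is_fixed(size_t idx) const { return _fixed[idx] != 0; }

private:
  std::atomic<size_t> _next_unclaimed;
  std::vector<char> _fixed;  // char instead of bool to avoid the vector<bool> proxy
};
```

Because claims only ever move the index upward, a refinement thread for a card high up in the region may have to fix every lower chunk first - the stall discussed above.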
> > Delaying remaining BOT-fixing work for after the GC pause seems not to be a good idea; this seems to only result in forever piling up work (assuming in steady state the GC is always faster than completing that work). > > Not completely sure if this is what you meant. > > Or did you mean that in the pause some extra(?) phase completes the fixup work, which could be another option? > > However just leaving the remainder of the work for the card scan, even if it sometimes does duplicate work, does not seem to be an issue to me. This problem of duplicate work is probably largest on large heaps, where the time between pauses allows for complete concurrent BOT-fixup anyway. > > What I did here is letting the gc thread wait for fixing _in the safepoint_, when everything is paused. I think it's equivalent to adding another phase in the gc pause, but it seems I could be wrong. The data shows most of the time the waiting finishes within 0.1ms, but occasionally >10ms or even >100ms, which is causing part of the problem you mentioned about the outlier of the pause time. I'm yet to understand why. I'm sure something can be engineered to avoid this.

During GC you should/could use all worker threads allowed for such things, which is faster - and maybe needs additional information to parallelize well.
> > Delaying the quoted text here was garbled; continuing: > > But I agree we should probably abort the fixing when gc happens, if we consider the role of fixing as always supplementary to gc. I also realized afterwards that this waiting time should be considered by the pause time policy if we don't abort. > > Actually, you could probably directly use a G1CardSet to store these, probably with some tuned settings, you might not need an extra data structure at all (depending on the needs on that data structure). The G1CardSet is a fairly compact representation.
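The PLAB-boundary recalculation suggested above ("base + x * PLABSize gives you the (x+1)th PLAB start") can be sketched in a small standalone illustration (invented names, an assumed-constant PLAB size; the direct-allocation detection described in the discussion is left out):

```cpp
#include <cstddef>
#include <vector>

// Word offsets stand in for HeapWord* addresses in this sketch.
typedef size_t HeapWordIdx;

const size_t plab_size_words = 1024;  // assumed-constant PLAB size in words

// base + x * PLABSize gives the (x+1)th PLAB start, so only the first
// PLAB start of a region needs to be recorded.
HeapWordIdx plab_start(HeapWordIdx first_plab_base, size_t x) {
  return first_plab_base + x * plab_size_words;
}

// Recompute every PLAB start from the recorded base up to the region's
// current top - no queue or per-PLAB bookkeeping required.
std::vector<HeapWordIdx> plab_starts_up_to(HeapWordIdx first_plab_base,
                                           HeapWordIdx top) {
  std::vector<HeapWordIdx> starts;
  for (HeapWordIdx cur = first_plab_base; cur < top; cur += plab_size_words) {
    starts.push_back(cur);
  }
  return starts;
}
```

This is the space argument in the thread: storing one pointer per region replaces storing one entry per PLAB, at the cost of handling direct allocations that break the fixed stride.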
Fwiw, the call to `report_promotion_event()` in that method is actually notifying JFR of a new PLAB or direct allocation (note that it passes the `word_sz` of the object that caused that allocation, not the size of the block you got). So it should be fairly easy to add a hook there. Of course it is probably most fruitful to discuss further implementation details on the actual code. > > > I.e. if the chunk size for card scan is e.g. 16k words, one option could be to only fix the BOT for PLABs crossing those 16k boundaries; since we know that during card scan threads are claiming cards per 16k chunks (idk right now what it actually is, probably higher), there can only be overlapped work happening across them. > > I _think_, even if a plab doesn't cross a chunk boundary, if we try to call block_start() with addr1 and addr2 (plab_start < addr1 < addr2) in that order, there is possibly duplicated work. The first call will try to fix [plab_start, addr1], the second call will try to fix [start_of_object_that_covers(addr1), addr2]. There is an overlap between these two ranges. The reason is the second call is going back by too much. The BOT offset array element at addr2 is not updated when [plab_start, addr1] is fixed. It will go back to start_of_object_that_covers(addr1) (because the BOT element at addr1 is updated from plab_start to the real object that covers addr1). That is true - I suggested to have that thread fix the BOT of that (single) entire PLAB (which might still be too large to do in one go; particularly specjbb20xx maxes out PLAB sizes). > > > The graphs look promising so far; I'm only a bit concerned about the total pause time outliers (system.gc()?) as they seem a bit high. Also for some reason the command line above only configures 4 gc threads out of 16. > > The pause time issue can be partially explained by the waiting time for fixing, which is to be removed as discussed earlier. I tend to use fewer threads for testing.
It only has one concurrent gc thread. The results seem less random ; ) > > Thank you for the detailed and informative reply! Thanks, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/5039 From ayang at openjdk.java.net Tue Aug 24 12:18:28 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 24 Aug 2021 12:18:28 GMT Subject: RFR: 8271930: Simplify end_card calculation in G1BlockOffsetTablePart::verify [v2] In-Reply-To: References: Message-ID: On Mon, 23 Aug 2021 08:36:52 GMT, Albert Mingkun Yang wrote: >> Simplify `end_card` calculation and some general cleanup. >> >> Test: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/5015 From ayang at openjdk.java.net Tue Aug 24 12:18:29 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 24 Aug 2021 12:18:29 GMT Subject: Integrated: 8271930: Simplify end_card calculation in G1BlockOffsetTablePart::verify In-Reply-To: References: Message-ID: On Thu, 5 Aug 2021 09:12:38 GMT, Albert Mingkun Yang wrote: > Simplify `end_card` calculation and some general cleanup. > > Test: hotspot_gc This pull request has now been integrated. 
Changeset: 928b9724 Author: Albert Mingkun Yang URL: https://git.openjdk.java.net/jdk/commit/928b9724c98c3377f96f5c3786ef2d8d79485dfe Stats: 18 lines in 3 files changed: 1 ins; 12 del; 5 mod 8271930: Simplify end_card calculation in G1BlockOffsetTablePart::verify Reviewed-by: tschatzl, iwalulya ------------- PR: https://git.openjdk.java.net/jdk/pull/5015 From iwalulya at openjdk.java.net Tue Aug 24 14:30:26 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 24 Aug 2021 14:30:26 GMT Subject: RFR: 8272725: G1: add documentation on needs_remset_update_t vs bool [v2] In-Reply-To: <7fRoxbV6b5GKXZuzfqQ4lXjjFtpXIFQ7F5CglMeB7DA=.791bf0dd-1faa-484e-b73a-f8236cdec8f5@github.com> References: <7fRoxbV6b5GKXZuzfqQ4lXjjFtpXIFQ7F5CglMeB7DA=.791bf0dd-1faa-484e-b73a-f8236cdec8f5@github.com> Message-ID: On Fri, 20 Aug 2021 11:22:48 GMT, Albert Mingkun Yang wrote: >> Simple type change in `struct G1HeapRegionAttr`. >> >> Test: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with two additional commits since the last revision: > > - review > - Revert "boolean" > > This reverts commit 2a23beb0549a928ae3876070dbdcf5a153c0703d. Marked as reviewed by iwalulya (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5188 From iwalulya at openjdk.java.net Tue Aug 24 15:08:38 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 24 Aug 2021 15:08:38 GMT Subject: Withdrawn: 8215118: Reduce HeapRegion to only have one member for index in CSet In-Reply-To: References: Message-ID: On Mon, 16 Aug 2021 10:28:37 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to maintain only a single member variable to indicate the region's index in a collection set: either the actual collection set for young regions or the optional regions array if the region is considered optional during a mixed collection. > > Testing: Tier 1-7 This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5122 From ayang at openjdk.java.net Tue Aug 24 15:24:33 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 24 Aug 2021 15:24:33 GMT Subject: RFR: 8272725: G1: add documentation on needs_remset_update_t vs bool [v2] In-Reply-To: <7fRoxbV6b5GKXZuzfqQ4lXjjFtpXIFQ7F5CglMeB7DA=.791bf0dd-1faa-484e-b73a-f8236cdec8f5@github.com> References: <7fRoxbV6b5GKXZuzfqQ4lXjjFtpXIFQ7F5CglMeB7DA=.791bf0dd-1faa-484e-b73a-f8236cdec8f5@github.com> Message-ID: On Fri, 20 Aug 2021 11:22:48 GMT, Albert Mingkun Yang wrote: >> Simple type change in `struct G1HeapRegionAttr`. >> >> Test: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with two additional commits since the last revision: > > - review > - Revert "boolean" > > This reverts commit 2a23beb0549a928ae3876070dbdcf5a153c0703d. Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/5188 From ayang at openjdk.java.net Tue Aug 24 15:24:34 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 24 Aug 2021 15:24:34 GMT Subject: Integrated: 8272725: G1: add documentation on needs_remset_update_t vs bool In-Reply-To: References: Message-ID: On Thu, 19 Aug 2021 15:12:45 GMT, Albert Mingkun Yang wrote: > Simple type change in `struct G1HeapRegionAttr`. > > Test: hotspot_gc This pull request has now been integrated. 
Changeset: 6e0328f5 Author: Albert Mingkun Yang URL: https://git.openjdk.java.net/jdk/commit/6e0328f5829282b56327aa0128774cb916275d45 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8272725: G1: add documentation on needs_remset_update_t vs bool Reviewed-by: tschatzl, iwalulya ------------- PR: https://git.openjdk.java.net/jdk/pull/5188 From ayang at openjdk.java.net Tue Aug 24 17:58:39 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 24 Aug 2021 17:58:39 GMT Subject: RFR: 8272905: Consolidate discovered lists processing Message-ID: <-BaizreVcGJ9jz4D2W5ETKOW93gquYosYjmOayyRmX8=.6c21108b-7ce0-4270-b4c1-10d16928dea9@github.com> Simple change of merging discovered list processing for four kinds of non-strong refs and some general cleanup around. Test: tier1-3 ------------- Commit messages: - complete-gc Changes: https://git.openjdk.java.net/jdk/pull/5242/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5242&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272905 Stats: 125 lines in 3 files changed: 42 ins; 51 del; 32 mod Patch: https://git.openjdk.java.net/jdk/pull/5242.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5242/head:pull/5242 PR: https://git.openjdk.java.net/jdk/pull/5242 From tschatzl at openjdk.java.net Wed Aug 25 08:30:31 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 25 Aug 2021 08:30:31 GMT Subject: RFR: 8271880: Tighten condition for excluding regions from collecting cards with cross-references [v5] In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 08:41:08 GMT, Thomas Schatzl wrote: >> Hi, >> >> can I have reviews for this change that by tightening the condition for excluding regions from collecting cards with cross-references allows us to avoid the rescanning of objects that failed evacuation in the fix up self-forwards phase after evacuation failure. >> >> I.e. 
during gc g1 never collects cards/references from the young gen (including eden) for later refinement - which means that we need to rescan all live objects remaining in eden regions for cross-references. >> >> The problem or the reason why we did that was that we did not want to add cards to refine from survivor regions (i.e. next gc's young gen) because we just don't need to as we always collect young gen, so references from there need not be recorded in the remembered sets (and actually, if we did, we erroneously marked cards in young gen which later card table verification will not like) - but we did not have that information on hand anywhere already quickly accessible. >> >> This change solves that problem by actually recording this information in the region attribute table as "NewSurvivor" type region. "NewSurvivor" because I did want to make explicit that these are the survivor regions from the *new* (next) young generation (i.e. just survivor) and not the survivor regions of the previous gc (that were turned eden at the start of this gc) but something like "NewYoung" or so would be fine with me as well (or certainly just "Survivor", but that might be confusing). >> >> Another interesting addition is probably the new assert in `G1ParThreadScanState::enqueue_card_if_tracked` >> >> >> assert(!_g1h->heap_region_containing(o)->in_collection_set(), "Should not try to enqueue reference into collection set region"); >> >> This, at this time, verifies the assumption that g1 is not trying to collect references *to* the collection set, i.e. other objects that failed evacuation - after all we later relabel their regions as old *without* a remembered set; we would do otherwise unnecessarily because the reason is that (currently) cset tracking for these regions is enabled (at least during gc - we only later relabel and drop the remembered sets).
>> >> This might change later if we want to move evacuation failed regions into survivor (or keep their remembered sets for some reason), but for now we filter attempts to add cards in the dcqs for those this way. >> >> Testing: tier1-5, gc/g1 with `JAVA_OPTIONS_=-XX+G1EvacuationFailureALot -XX:+VerifyAfterGC`. >> >> Thanks, >> Thomas > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > - Merge branch 'master' into evac-failure-no-scan-during-remove-self-forwards > - More updates to comment > - Clarify comment > - ayang review > - Some random changes > - No scan during self forward

Here is a description of the problem: G1 crashes because of a missing remembered set entry in the young generation for j.l.ref.References' discovered field. The reason is that
- reference processing uses regular write barriers for the discovered fields, enqueuing into the global barrier set DCQS
- since young gen evacuation failed regions still have the "young" card entry set all over, the card is not added to the DCQS
- even if the card would be enqueued, we would lose the mark because G1 later clears the card table, and redirties using the gc local dcqs (which does not contain that card because it has been added to the "wrong" dcqs) -> missing remembered set entry -> crash (or verification error)
- this can also happen in old regions with the change introduced in [JDK-8270842](https://bugs.openjdk.java.net/browse/JDK-8270842)/[PR#4853](https://github.com/openjdk/jdk/pull/4853), but is likely much more rare.

Previously there has been no issue because the fixup self-forwards phase re-enqueued these cards into the "correct" DCQS.
------------- PR: https://git.openjdk.java.net/jdk/pull/5037 From tschatzl at openjdk.java.net Wed Aug 25 08:36:29 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 25 Aug 2021 08:36:29 GMT Subject: RFR: 8271880: Tighten condition for excluding regions from collecting cards with cross-references [v5] In-Reply-To: References: Message-ID: <4i3TuXpzBHD05afCJ83dYBJWUlgXzyB_ioXKiCBrgEI=.1c95658d-5186-49c6-8d2a-19a6a518a018@github.com> On Tue, 10 Aug 2021 08:41:08 GMT, Thomas Schatzl wrote: >> Hi, >> >> can I have reviews for this change that by tightening the condition for excluding regions from collecting cards with cross-references allows us to avoid the rescanning of objects that failed evacuation in the fix up self-forwards phase after evacuation failure. >> >> I.e. during gc g1 never collects cards/references from the young gen (including eden) for later refinement - which means that we need to rescan all live objects remaining in eden regions for cross-references. >> >> The problem or the reason why we did that was that we did not want to add cards to refine from survivor regions (i.e. next gc's young gen) because we just don't need to as we always collect young gen, so references from there need not be recorded in the remembered sets (and actually, if we did, we erroneously marked cards in young gen which later card table verification will not like) - but we did not have that information on hand anywhere already quickly accessible. >> >> This change solves that problem by actually recording this information in the region attribute table as "NewSurvivor" type region. "NewSurvivor" because I did want to make explicit that these are the survivor regions from the *new* (next) young generation (i.e. just survivor) and not the survivor regions of the previous gc (that were turned eden at the start of this gc) but something like "NewYoung" or so would be fine with me as well (or certainly just "Survivor", but that might be confusing).
>> >> Another interesting addition is probably the new assert in `G1ParThreadScanState::enqueue_card_if_tracked` >> >> >> assert(!_g1h->heap_region_containing(o)->in_collection_set(), "Should not try to enqueue reference into collection set region"); >> >> This, at this time, verifies the assumption that g1 is not trying to collect references *to* the collection set, i.e. other objects that failed evacuation - after all we later relabel their regions as old *without* a remembered set; we would do otherwise unnecessarily because the reason is that (currently) cset tracking for these regions is enabled (at least during gc - we only later relabel and drop the remembered sets). >> >> This might change later if we want to move evacuation failed regions into survivor (or keep their remembered sets for some reason), but for now we filter attempts to add cards in the dcqs for those this way. >> >> Testing: tier1-5, gc/g1 with `JAVA_OPTIONS_=-XX+G1EvacuationFailureALot -XX:+VerifyAfterGC`. >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into evac-failure-no-scan-during-remove-self-forwards > - More updates to comment > - Clarify comment > - ayang review > - Some random changes > - No scan during self forward ... and using the normal write barrier is good at that time for objects evacuated somewhere else: either this had been into survivor, where we do not care about card marks (and the "young" card filter works as expected), or they are clean (previously unallocated areas in the old gen region). 
------------- PR: https://git.openjdk.java.net/jdk/pull/5037 From tschatzl at openjdk.java.net Wed Aug 25 09:26:25 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 25 Aug 2021 09:26:25 GMT Subject: RFR: 8242847: G1 should not clear mark bitmaps with no marks [v3] In-Reply-To: References: Message-ID: On Tue, 24 Aug 2021 10:29:57 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to bound the range of bitmap clearing for each region using the liveness data collected during marking. For the Concurrent Undo Mark cycle, the liveness information (next_top_at_mark_start and live_words) is in sync wrt. the _next_mark_bitmap that needs clearing. Hence, we use these details to clear only bitmaps for regions that were dirtied and need clearing i.e. only clear between [bottom, ntams), and only clear bitmaps for regions that had at least one bit set (i.e. have some live data). >> >> Testing: Tier 1-3. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > trailing space Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5213 From mli at openjdk.java.net Wed Aug 25 09:36:25 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Wed, 25 Aug 2021 09:36:25 GMT Subject: RFR: 8254739: G1: Optimize evacuation failure for regions with few failed objects In-Reply-To: References: Message-ID: On Thu, 19 Aug 2021 08:11:42 GMT, Hamlin Li wrote: > This is an attempt to optimize evacuation failure for regions. > I record every evacuation failure object per region (by G1EvacuationFailureObjsInHR), and then iterate (which indeed includes compact/sort/iteration) these objects directly in RemoveSelfForwardPtrHRClosure.
> > I have tested it with the following parameters, > > - -XX:+ParallelGCThreads=1/32/64 > - -XX:G1EvacuationFailureALotInterval=1 > - -XX:G1EvacuationFailureALotCount=2/10/100/1000/10000/100000 > > It saves "Remove Self Forwards" time all the time, and in most conditions it saves "Evacuate Collection Set" time. > > It brings some performance degradation when -XX:G1EvacuationFailureALotCount is low, such as *2*. To improve this a little, we can record the number of evacuation failure objects per region, and not record these objects when the number hits some limit. But I'm not sure if it's necessary to do so, as I think such a condition is too extreme to be met in a real environment, although I'm not quite sure. Thanks a lot for the detailed review and suggestion. > Initial feedback: we do not care about cases like `G1EvacuationFailureALotCount` at this time: for current evacuation failure the amount of failed objects is supposed to be small (due to `G1HeapReservePercent`), and if they are large, we are likely going into a full gc anyway. > When reusing this mechanism for region pinning, there is unfortunately no particular limit on the live objects to be expected apart from something like "1/2 the objects" as this can occur with any region in the young gen (we can just skip selecting old gen regions for evacuation in that case), so performance should be good for that use case, more so than now. > I agree. > One problem I can see now with performance is the use of a linked list for storing the regions. Using such a list requires quite a lot of memory allocation (every `Node` object takes twice the needed data, i.e. the reference plus `next` link, plus low-level malloc overhead), and obviously traversing all links is kind of slow. > Since the only thing that needs to be done concurrently (and fast) is adding an element, an option could be something like a segmented array of HeapWords (per region?), i.e. a list of arrays of elements.
(Further, since this is per `HeapRegion`, the memory cost could be cut in half easily by using 32 bit offsets). That would probably also be faster when "compacting" all these failed arrays into a single array, and faster when deallocating (just a few of these arrays). > Something like `G1CardSetAllocator` with an element type of `G1CardSetArray` or the dirty card queue set allocation (`BufferNode::Allocator` for the allocation only part); unfortunately the code isn't generic enough to be used as-is for you. > I have written a simple version of the "array" for this specific usage; I do not implement a buffer to cache the "node" used in the "array". It seems to get better performance than my previous linked list version. (I simply measured the end-to-end time.) Maybe later I could make it more generic or merge it with the card-set one? If it's feasible, should we do it in a separate issue? > I am also not really convinced that the failed object lists should be attached to `HeapRegion`, I would rather prefer an extra array as this is something that is usually not needed (and only required for regions with failure, which are very few typically), and only needed during young gc. (I intend to add something like a container for `HeapRegion` specific data that is only needed and used during gc at some point, moving similar existing misplaced data out of `HeapRegion` there then). > I agree. We could refactor it thoroughly. Could we do it in a separate issue? Please kindly point to the issue if you already track this with an issue. > > Overall we can add some cut-off for whole-region iteration at some point as you described as needed later I think. > yes. > > Do you have a breakdown of the time taken for the parts of the `G1EvacuationFailureObjsInHR::iterate()` method to see where most time is spent? I.e. what overhead does the linked list impose compared to actual "useful" work? Do you have an idea about the total memory overhead?
I can imagine that can be significant in this case. > I will measure the time of the new implementation with "array", and update the information later. > > I understand this is a bit hard to measure in a realistic environment because, while `-XX:G1EvacuationFailureALot` is good for implementing and debugging, the distribution of failures is probably a lot different from real evacuation failures. Maybe I could help you with finding some options to configure some existing stress tests (GCBasher, ...) to cause evacuation failures? Thanks a lot, this would be greatly helpful. > > Later I will follow up with a more thorough look. > > Hth, > Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/5181 From ayang at openjdk.java.net Wed Aug 25 09:36:27 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 25 Aug 2021 09:36:27 GMT Subject: RFR: 8236176: Parallel GC SplitInfo comment should be updated for shadow regions [v2] In-Reply-To: References: Message-ID: On Mon, 23 Aug 2021 16:07:48 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this small change to add a high-level description of how shadow regions are utilized during the compact phase of the parallel GC. There is no special handling of Splitinfo by the shadow regions; the destination computation and the handling of partial_objects are done during the summary phase while the shadow regions optimization only applies to compaction. >> >> Additionally, some unused variables are cleaned up. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Thomas review Marked as reviewed by ayang (Reviewer).
------------- PR: https://git.openjdk.java.net/jdk/pull/5195 From mli at openjdk.java.net Wed Aug 25 09:51:50 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Wed, 25 Aug 2021 09:51:50 GMT Subject: RFR: 8254739: G1: Optimize evacuation failure for regions with few failed objects [v2] In-Reply-To: References: Message-ID: > This is an attempt to optimize evacuation failure for regions. > I record every evacuation failure object per region (by G1EvacuationFailureObjsInHR), and then iterate (which indeed includes compact/sort/iteration) these objects directly in RemoveSelfForwardPtrHRClosure. > > I have tested it with the following parameters, > > - -XX:+ParallelGCThreads=1/32/64 > - -XX:G1EvacuationFailureALotInterval=1 > - -XX:G1EvacuationFailureALotCount=2/10/100/1000/10000/100000 > > It saves "Remove Self Forwards" time all the time, and in most conditions it saves "Evacuate Collection Set" time. > > It brings some performance degradation when -XX:G1EvacuationFailureALotCount is low, such as *2*. To improve this a little, we can record the number of evacuation failure objects per region, and not record these objects when the number hits some limit. But I'm not sure if it's necessary to do so, as I think such a condition is too extreme to be met in a real environment, although I'm not quite sure.
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Use a segmented array rather than a linked list to record evacuation failure objs in one region ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5181/files - new: https://git.openjdk.java.net/jdk/pull/5181/files/4d88f428..74bc4d96 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5181&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5181&range=00-01 Stats: 249 lines in 3 files changed: 177 ins; 30 del; 42 mod Patch: https://git.openjdk.java.net/jdk/pull/5181.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5181/head:pull/5181 PR: https://git.openjdk.java.net/jdk/pull/5181 From ayang at openjdk.java.net Wed Aug 25 10:08:38 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 25 Aug 2021 10:08:38 GMT Subject: RFR: 8272975: ParallelGC: add documentation to heap memory layout Message-ID: Simple change of adding some documentation. 
------------- Commit messages: - heap-doc Changes: https://git.openjdk.java.net/jdk/pull/5247/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5247&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272975 Stats: 20 lines in 1 file changed: 20 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5247.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5247/head:pull/5247 PR: https://git.openjdk.java.net/jdk/pull/5247 From sjohanss at openjdk.java.net Wed Aug 25 10:13:28 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 25 Aug 2021 10:13:28 GMT Subject: RFR: 8242847: G1 should not clear mark bitmaps with no marks [v3] In-Reply-To: References: Message-ID: <-RwlWxlzG3W6kx_vDdXbh3N1pV246SzbEkWnVvC4dmI=.445a3660-afa2-44a0-a087-44a1cb97ab0d@github.com> On Tue, 24 Aug 2021 10:29:57 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to bound the range of bitmap clearing for each region using the liveness data collected during marking. For the Concurrent Undo Mark cycle, the liveness information (next_top_at_mark_start and live_words) is in sync wrt. the _next_mark_bitmap that needs clearing. Hence, we use these details to clear only bitmaps for regions that were dirtied and need clearing i.e. only clear between [bottom, ntams), and only clear bitmaps for regions that had at least one bit set (i.e. have some live data). >> >> Testing: Tier 1-3. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > trailing space Lgtm ------------- Marked as reviewed by sjohanss (Reviewer). 
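The bounded clearing described in that review can be illustrated with a self-contained sketch (hypothetical code, not the actual G1 bitmap implementation; `ntams` here stands in for next_top_at_mark_start expressed as a bit index):

```cpp
#include <cstddef>
#include <vector>

// Sketch: instead of always clearing a region's whole mark bitmap, skip
// regions that never had a bit set (live_words == 0) and bound the clearing
// of the rest to [bottom, ntams).
struct RegionMarkState {
  std::vector<bool> bitmap;   // one mark bit per heap word
  size_t ntams;               // next-top-at-mark-start, as a bit index
  size_t live_words;          // liveness gathered during marking
};

// Returns how many bit slots were visited, to make the saving visible.
size_t clear_bitmap_bounded(RegionMarkState& r) {
  if (r.live_words == 0) {
    return 0;                 // no mark was ever set in this region: skip it
  }
  for (size_t i = 0; i < r.ntams; i++) {  // only [bottom, ntams)
    r.bitmap[i] = false;
  }
  return r.ntams;
}
```

The point of the Concurrent Undo Mark case is exactly that `ntams` and `live_words` are still in sync with the bitmap that needs clearing, so both bounds are safe to use.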
PR: https://git.openjdk.java.net/jdk/pull/5213 From iwalulya at openjdk.java.net Wed Aug 25 10:22:29 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 25 Aug 2021 10:22:29 GMT Subject: RFR: 8236176: Parallel GC SplitInfo comment should be updated for shadow regions [v2] In-Reply-To: References: Message-ID: On Mon, 23 Aug 2021 16:07:48 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this small change to add a high-level description of how shadow regions are utilized during the compact phase of the parallel GC. There is no special handling of Splitinfo by the shadow regions; the destination computation and the handling of partial_objects are done during the summary phase while the shadow regions optimization only applies to compaction. >> >> Additionally, some unused variables are cleaned up. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Thomas review Thanks for the reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/5195 From iwalulya at openjdk.java.net Wed Aug 25 10:22:30 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 25 Aug 2021 10:22:30 GMT Subject: Integrated: 8236176: Parallel GC SplitInfo comment should be updated for shadow regions In-Reply-To: References: Message-ID: On Fri, 20 Aug 2021 06:48:18 GMT, Ivan Walulya wrote: > Hi all, > > Please review this small change to add a high-level description of how shadow regions are utilized during the compact phase of the parallel GC. There is no special handling of Splitinfo by the shadow regions; the destination computation and the handling of partial_objects are done during the summary phase while the shadow regions optimization only applies to compaction. > > Additionally, some unused variables are cleaned up. This pull request has now been integrated. 
Changeset: 63e062fb Author: Ivan Walulya URL: https://git.openjdk.java.net/jdk/commit/63e062fb78f925782cf4c9641b54f266bcbebc5c Stats: 31 lines in 2 files changed: 19 ins; 11 del; 1 mod 8236176: Parallel GC SplitInfo comment should be updated for shadow regions Reviewed-by: tschatzl, ayang ------------- PR: https://git.openjdk.java.net/jdk/pull/5195 From tschatzl at openjdk.java.net Wed Aug 25 11:16:22 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 25 Aug 2021 11:16:22 GMT Subject: RFR: 8254739: G1: Optimize evacuation failure for regions with few failed objects [v2] In-Reply-To: References: Message-ID: On Wed, 25 Aug 2021 09:51:50 GMT, Hamlin Li wrote: >> This is a try to optimize evcuation failure for regions. >> I record every evacuation failure object per region (by G1EvacuationFailureObjsInHR), and then iterate (which indeed includes compact/sort/iteration) these objects directly in RemoveSelfForwardPtrHRClosure. >> >> I have tested it with following parameters, >> >> - -XX:+ParallelGCThreads=1/32/64 >> - -XX:G1EvacuationFailureALotInterval=1 >> - -XX:G1EvacuationFailureALotCount=2/10/100/1000/10000/100000 >> >> It saves "Remove Self Forwards" time all the time ,and in most condition it saves "Evacuate Collection Set" time. >> >> It brings some performance degradation when -XX:G1EvacuationFailureALotCount is low, such as *2*. To improve this a little, we can record the number evacuation failure object per region, and not record these objects when the number hit some limit. But I'm not sure if it's necessary to do so, as I think such condition is so extreme to be met in real environment, although I'm not quite sure. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Use a segmented array rather than a linked list to record evacuation failure objs in one region Hi, > > One problem I can see now with performance is the use of a linked list for storing the regions. 
Using such a list requires quite a lot of memory allocation (every `Node` object takes twice the needed data, i.e. the reference plus `next` link, plus low-level malloc overhead), and obviously traversing all links is kind of slow. > > Since the only thing that needs to be done concurrently (and fast) is adding an element, an option could be something like a segmented array of HeapWords (per region?), i.e. a list of arrays of elements. (Further, since this is per `HeapRegion` the memory cost could be cut in half easily by using 32 bit offsets). That would probably also be faster when "compacting" all these failed arrays into a single array, and faster when deallocating (just a few of these arrays). > > Something like `G1CardSetAllocator` with an element type of `G1CardSetArray` or the dirty card queue set allocation (`BufferNode::Allocator` for the allocation only part); unfortunately the code isn't generic enough to be used as is for you. > > I have written a simple version of the "array" for this specific usage; I do not implement a buffer to cache the "node" used in the "array". It seems to get better performance than my previous linked list version. (I simply measured the end-to-end time.) > Maybe later I could make it more generic or merge with the card-set one? If it's feasible, should we do it in a separate issue? Yes, we can merge this later; there is the related [JDK-8267834: Refactor G1CardSetAllocator and BufferNode::Allocator to use a common base class](https://bugs.openjdk.java.net/browse/JDK-8267834) that may provide some useful base. > > > I am also not really convinced that the failed object lists should be attached to `HeapRegion`; I would rather prefer an extra array, as this is something that is usually not needed (and only required for regions with failure, which are very few typically), and only needed during young gc.
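For illustration, a minimal single-threaded sketch of the segmented-array shape discussed here — a list of fixed-size chunks of 32-bit offsets with cheap append and bulk iteration (hypothetical code; the real version would need a lock-free append, since GC workers record failures concurrently, and an allocator along the lines of `G1CardSetAllocator` / `BufferNode::Allocator`):

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of a segmented array: each chunk amortizes one allocation over
// kChunkSize elements, unlike a per-element linked-list Node that pays for
// a next pointer plus malloc overhead per entry.
class SegmentedOffsetArray {
  static constexpr size_t kChunkSize = 256;
  struct Chunk {
    uint32_t elems[kChunkSize];
    Chunk*   next;
  };
  Chunk* _head  = nullptr;     // most recently allocated chunk
  size_t _pos   = kChunkSize;  // next free slot in _head; full => allocate
  size_t _count = 0;

public:
  ~SegmentedOffsetArray() {
    while (_head != nullptr) { Chunk* c = _head; _head = c->next; delete c; }
  }
  void append(uint32_t offset) {
    if (_pos == kChunkSize) {  // current chunk full (or none allocated yet)
      Chunk* c = new Chunk();
      c->next = _head;
      _head = c;
      _pos = 0;
    }
    _head->elems[_pos++] = offset;
    _count++;
  }
  size_t count() const { return _count; }
  template<typename Visitor>
  void iterate(Visitor visit) const {
    for (Chunk* c = _head; c != nullptr; c = c->next) {
      size_t n = (c == _head) ? _pos : kChunkSize;  // head is partially full
      for (size_t i = 0; i < n; i++) visit(c->elems[i]);
    }
  }
};
```

Storing 32-bit offsets from the region's bottom instead of full pointers is what halves the per-entry footprint, as suggested above; deallocation also touches only a handful of chunks rather than one node per failed object.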
(I intend to add something like a container for `HeapRegion` specific data that is only needed and used during gc at some point, moving similar existing misplaced data out of `HeapRegion` there then). > > I agree. We could refactor it thoroughly. Could we do it in a separate issue? please kindly point to the issue if you already track this with an issue. I am currently working on separating the G1 young collection algorithm from `G1CollectedHeap` in [JDK-8253343: Extract G1 Young GC algorithm related code from G1CollectedHeap](https://bugs.openjdk.java.net/browse/JDK-8253343) prototype available [here](https://github.com/tschatzl/jdk/tree/submit/8253343-move-young-collector-to-separate-files)), after that I intend to try to localize young-collection only data structures as suggested. I filed [8272978: Factor out g1 young collection specific data structures](https://bugs.openjdk.java.net/browse/JDK-8272978) where a note about this data structure could be added. > > > Overall we can add some cut-off for whole-region iteration at some point as you described as needed later I think. > > yes. I filed JDK-8272977 to not forget on this. > > > Do you have a breakdown of the time taken for the parts of the `G1EvacuationFailureObjsInHR::iterate()` method to see where where most time is spent? I.e. what overhead does the linked list impose compared to actual "useful" work? Do you have an idea about the total memory overhead? I can imagine that can be significant in this case. > > I will measure the time of the new implementation with "array", and update the information later. 
Thanks, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/5181 From shade at openjdk.java.net Wed Aug 25 12:10:24 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 25 Aug 2021 12:10:24 GMT Subject: RFR: 8267188: gc/stringdedup/TestStringDeduplicationInterned.java fails with Shenandoah In-Reply-To: References: Message-ID: On Tue, 17 Aug 2021 18:32:54 GMT, Zhengyu Gu wrote: > Shenandoah currently enqueues deduplication candidates when it marks objects gray. > > However, it can not do so when it scans roots, as it can potentially result in a lock rank inversion between the stack watermark lock and the dedup request buffer lock. > > As a result, it cannot enqueue as many candidates as it should be able to; I believe JDK-8271834 is due to the same problem. I propose we switch to enqueuing candidates when we mark objects black. > > We are still not able to enqueue all candidates: we cannot enqueue those that have displaced headers or have monitor inflation in progress. Upstream is working on removing displaced headers; we should revisit the logic afterward, or we can choose to deduplicate all strings regardless of their age. > > > > Test: > - [x] hotspot_gc_shenandoah > - [x] tier1 with -XX:+UseShenandoahGC and -XX:+UseStringDeduplication Looks fine, modulo a few suggestions. src/hotspot/share/gc/shenandoah/shenandoahMark.cpp line 120: > 118: } > 119: > 120: template Suggestion: template src/hotspot/share/gc/shenandoah/shenandoahMark.inline.hpp line 71: > 69: if (task->is_not_chunked()) { > 70: if (obj->is_instance()) { > 71: dedup_string(obj, req); I do wonder if it is faster to _first_ let the closure process the outbound references, and then do dedup for the object? This way, parallel GC workers can do work while we are calling `dedup_string` here. This is one of the reasons why the old code did the dedup check after the queue push. In other words, move this call to the end of this block. ------------- Marked as reviewed by shade (Reviewer).
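The ordering point in the last comment — let the marking closure publish the object's outbound references first, and only then do the slower dedup work — can be sketched in a toy, single-threaded marking loop (hypothetical illustration, not the actual Shenandoah code):

```cpp
#include <stack>
#include <vector>

// Toy object graph; an object is a dedup candidate if is_string is set.
struct Obj {
  std::vector<Obj*> refs;
  bool is_string = false;
  bool marked    = false;
};

// Marks everything reachable from root. Candidates are enqueued when an
// object is scanned (turned black), not when first discovered (gray), so
// discovery paths such as root scanning never touch the dedup queue.
void mark(Obj* root, std::vector<Obj*>& dedup_out) {
  std::stack<Obj*> work;
  if (root != nullptr && !root->marked) { root->marked = true; work.push(root); }
  while (!work.empty()) {
    Obj* o = work.top(); work.pop();
    // Publish outbound references first: in a parallel marker, other
    // workers could already steal and process these...
    for (Obj* r : o->refs) {
      if (!r->marked) { r->marked = true; work.push(r); }
    }
    // ...while this worker does the potentially slower dedup enqueue.
    if (o->is_string) dedup_out.push_back(o);
  }
}
```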
PR: https://git.openjdk.java.net/jdk/pull/5151 From zgu at openjdk.java.net Wed Aug 25 12:15:53 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 25 Aug 2021 12:15:53 GMT Subject: RFR: 8267188: gc/stringdedup/TestStringDeduplicationInterned.java fails with Shenandoah [v2] In-Reply-To: References: Message-ID: > Shenandoah currently enqueues deduplication candidates when marks object gray. > > However, it can not do so when it scans roots, as it can potentially result lock rank inversion between stack watermark lock and dedup request buffer lock. > > As the result, it can not enqueue as many as candidates as it should be able to, I believe JDK-8271834 is due to the same problem. I purpose we switch to enqueue candidates when we mark object black. > > We are still not able to enqueue all candidates, only when they have displaced headers or have monitor inflating in progress. Upstream is working on removing displaced headers, we should revisit the logic afterward, or we can choose to deduplicate all string regardless their ages. 
> > > > Test: > - [x] hotspot_gc_shenandoah > - [x] tier1 with -XX:+UseShenandoahGC and -XX:+UseStringDeduplication Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Aleksey's comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5151/files - new: https://git.openjdk.java.net/jdk/pull/5151/files/98eedbd5..18e59bd5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5151&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5151&range=00-01 Stats: 3 lines in 2 files changed: 1 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5151.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5151/head:pull/5151 PR: https://git.openjdk.java.net/jdk/pull/5151 From iwalulya at openjdk.java.net Wed Aug 25 12:28:35 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 25 Aug 2021 12:28:35 GMT Subject: RFR: 8159979: During initial mark, preparing all regions for marking may take a significant amount of time Message-ID: Hi all, Please review this change to move preparing regions for marking to the multithreaded execution in G1PrepareEvacuationTask. Additionally, we add more marking details to the humongous region logs. Testing: Tier 1-3. 
------------- Commit messages: - clean up - 8159979: During initial mark, preparing all regions for marking may take a significant amount of time Changes: https://git.openjdk.java.net/jdk/pull/5251/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5251&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8159979 Stats: 19 lines in 2 files changed: 6 ins; 12 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5251.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5251/head:pull/5251 PR: https://git.openjdk.java.net/jdk/pull/5251 From mli at openjdk.java.net Wed Aug 25 12:34:53 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Wed, 25 Aug 2021 12:34:53 GMT Subject: RFR: 8254739: G1: Optimize evacuation failure for regions with few failed objects [v3] In-Reply-To: References: Message-ID: > This is a try to optimize evcuation failure for regions. > I record every evacuation failure object per region (by G1EvacuationFailureObjsInHR), and then iterate (which indeed includes compact/sort/iteration) these objects directly in RemoveSelfForwardPtrHRClosure. > > I have tested it with following parameters, > > - -XX:+ParallelGCThreads=1/32/64 > - -XX:G1EvacuationFailureALotInterval=1 > - -XX:G1EvacuationFailureALotCount=2/10/100/1000/10000/100000 > > It saves "Remove Self Forwards" time all the time ,and in most condition it saves "Evacuate Collection Set" time. > > It brings some performance degradation when -XX:G1EvacuationFailureALotCount is low, such as *2*. To improve this a little, we can record the number evacuation failure object per region, and not record these objects when the number hit some limit. But I'm not sure if it's necessary to do so, as I think such condition is so extreme to be met in real environment, although I'm not quite sure. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Fix compilation issues on some platform. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5181/files - new: https://git.openjdk.java.net/jdk/pull/5181/files/74bc4d96..05f026a0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5181&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5181&range=01-02 Stats: 25 lines in 2 files changed: 5 ins; 6 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/5181.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5181/head:pull/5181 PR: https://git.openjdk.java.net/jdk/pull/5181 From sjohanss at openjdk.java.net Wed Aug 25 12:46:37 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 25 Aug 2021 12:46:37 GMT Subject: RFR: 8272651: G1 heap region info print order changed by JDK-8269914 Message-ID: <31RXx2Yekpu0GXtXT8xFnXBcNbMbWcsLky49Fqp1dOQ=.94639610-dce0-4136-880e-b322c24c34be@github.com> Please review this small change to restore and coordinate the order of the G1 logs. **Summary** Recent changes have slightly re-ordered parts of the G1 logs printed with `-Xlog:gc*` for young and full collections. This change will mostly restore the order but also address some differences we had in the past. The biggest part of this change is making sure the region transitions are printed at the "correct" place. For young collections they are moved to just after the phases and for full collections they are moved to be before the end of the pause log. 
Example logs:

[0,046s][info][gc,start ] GC(0) Pause Young (Concurrent Start) (System.gc())
[0,048s][info][gc,task ] GC(0) Using 24 workers of 28 for evacuation
[0,049s][info][gc,phases ] GC(0) Pre Evacuate Collection Set: 0,1ms
[0,049s][info][gc,phases ] GC(0) Merge Heap Roots: 0,1ms
[0,049s][info][gc,phases ] GC(0) Evacuate Collection Set: 0,4ms
[0,049s][info][gc,phases ] GC(0) Post Evacuate Collection Set: 0,3ms
[0,049s][info][gc,phases ] GC(0) Other: 2,5ms
[0,049s][info][gc,heap ] GC(0) Eden regions: 1->0(5)
[0,049s][info][gc,heap ] GC(0) Survivor regions: 0->1(1)
[0,049s][info][gc,heap ] GC(0) Old regions: 0->0
[0,049s][info][gc,heap ] GC(0) Archive regions: 2->2
[0,049s][info][gc,heap ] GC(0) Humongous regions: 0->0
[0,049s][info][gc,metaspace] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K)
[0,049s][info][gc ] GC(0) Pause Young (Concurrent Start) (System.gc()) 9M->8M(1024M) 3,563ms
[0,049s][info][gc,cpu ] GC(0) User=0,00s Sys=0,00s Real=0,00s

[0,044s][info][gc,start ] GC(0) Pause Full (System.gc())
[0,044s][info][gc,task ] GC(0) Using 3 workers of 28 for full compaction
[0,044s][info][gc,phases,start] GC(0) Phase 1: Mark live objects
[0,044s][info][gc,phases ] GC(0) Phase 1: Mark live objects 0,607ms
[0,044s][info][gc,phases,start] GC(0) Phase 2: Prepare for compaction
[0,045s][info][gc,phases ] GC(0) Phase 2: Prepare for compaction 0,772ms
[0,045s][info][gc,phases,start] GC(0) Phase 3: Adjust pointers
[0,046s][info][gc,phases ] GC(0) Phase 3: Adjust pointers 0,361ms
[0,046s][info][gc,phases,start] GC(0) Phase 4: Compact heap
[0,046s][info][gc,phases ] GC(0) Phase 4: Compact heap 0,142ms
[0,050s][info][gc,heap ] GC(0) Eden regions: 1->0(2)
[0,050s][info][gc,heap ] GC(0) Survivor regions: 0->0(0)
[0,050s][info][gc,heap ] GC(0) Old regions: 0->1
[0,050s][info][gc,heap ] GC(0) Archive regions: 2->2
[0,050s][info][gc,heap ] GC(0) Humongous regions: 0->0
[0,050s][info][gc,metaspace ] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K)
[0,050s][info][gc ] GC(0) Pause Full (System.gc()) 9M->8M(80M) 6,729ms
[0,050s][info][gc,cpu ] GC(0) User=0,01s Sys=0,02s Real=0,01s

**Testing**
Locally run g1 tests and also some additional testing through mach5.

------------- Commit messages: - Fix test - G1FullGCMark with id and cpu-time - 8272651: G1 heap region info print order changed by JDK-8269914 Changes: https://git.openjdk.java.net/jdk/pull/5252/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5252&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272651 Stats: 36 lines in 5 files changed: 18 ins; 11 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/5252.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5252/head:pull/5252 PR: https://git.openjdk.java.net/jdk/pull/5252 From iwalulya at openjdk.java.net Wed Aug 25 14:14:44 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 25 Aug 2021 14:14:44 GMT Subject: RFR: 8159979: During initial mark, preparing all regions for marking may take a significant amount of time [v2] In-Reply-To: References: Message-ID: <5ySGWSwr8hTraDPQd8RtJ0VTEiwVO2ErYXnhoD0Q1mw=.8119a0c0-d563-4aa6-b30c-e6dfb619a3b2@github.com> > Hi all, > > Please review this change to move preparing regions for marking to the multithreaded execution in G1PrepareEvacuationTask. > > Testing: Tier 1-3.
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: remove unrelated changes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5251/files - new: https://git.openjdk.java.net/jdk/pull/5251/files/56f1ce15..37b29959 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5251&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5251&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5251.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5251/head:pull/5251 PR: https://git.openjdk.java.net/jdk/pull/5251 From iwalulya at openjdk.java.net Wed Aug 25 14:38:31 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 25 Aug 2021 14:38:31 GMT Subject: RFR: 8242847: G1 should not clear mark bitmaps with no marks [v3] In-Reply-To: References: Message-ID: <-kv80cYFnxGd6X3EVkoOVkNEYaQA4owO44gvbpuSNNM=.207853c1-e798-42bf-b7cb-34802ec7194d@github.com> On Tue, 24 Aug 2021 10:29:57 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to bound the range of bitmap clearing for each region using the liveness data collected during marking. For the Concurrent Undo Mark cycle, the liveness information (next_top_at_mark_start and live_words) is in sync wrt. the _next_mark_bitmap that needs clearing. Hence, we use these details to clear only bitmaps for regions that were dirtied and need clearing i.e. only clear between [bottom, ntams), and only clear bitmaps for regions that had at least one bit set (i.e. have some live data). >> >> Testing: Tier 1-3. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > trailing space Thanks for the reviews. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5213 From iwalulya at openjdk.java.net Wed Aug 25 14:38:32 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 25 Aug 2021 14:38:32 GMT Subject: Integrated: 8242847: G1 should not clear mark bitmaps with no marks In-Reply-To: References: Message-ID: On Mon, 23 Aug 2021 07:51:51 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to bound the range of bitmap clearing for each region using the liveness data collected during marking. For the Concurrent Undo Mark cycle, the liveness information (next_top_at_mark_start and live_words) is in sync wrt. the _next_mark_bitmap that needs clearing. Hence, we use these details to clear only bitmaps for regions that were dirtied and need clearing i.e. only clear between [bottom, ntams), and only clear bitmaps for regions that had at least one bit set (i.e. have some live data). > > Testing: Tier 1-3. This pull request has now been integrated. Changeset: e36cbd8e Author: Ivan Walulya URL: https://git.openjdk.java.net/jdk/commit/e36cbd8e05774ea9847c69f9987a2242589acf7e Stats: 81 lines in 5 files changed: 57 ins; 8 del; 16 mod 8242847: G1 should not clear mark bitmaps with no marks Reviewed-by: tschatzl, sjohanss ------------- PR: https://git.openjdk.java.net/jdk/pull/5213 From iwalulya at openjdk.java.net Wed Aug 25 14:40:28 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 25 Aug 2021 14:40:28 GMT Subject: Withdrawn: 8159979: During initial mark, preparing all regions for marking may take a significant amount of time In-Reply-To: References: Message-ID: On Wed, 25 Aug 2021 12:20:08 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to move preparing regions for marking to the multithreaded execution in G1PrepareEvacuationTask. > > Testing: Tier 1-3. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5251 From iwalulya at openjdk.java.net Wed Aug 25 14:45:44 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 25 Aug 2021 14:45:44 GMT Subject: RFR: 8272983: G1 Add marking details to eager reclaim logging Message-ID: Hi all, Please review this small change to add details on marking status in both prev and next marking bitmaps to the logging output in G1PrepareRegionsClosure. ------------- Commit messages: - 8272983: G1 Add marking details to eager reclaim logging Changes: https://git.openjdk.java.net/jdk/pull/5254/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5254&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272983 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5254.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5254/head:pull/5254 PR: https://git.openjdk.java.net/jdk/pull/5254 From tschatzl at openjdk.java.net Wed Aug 25 14:47:33 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 25 Aug 2021 14:47:33 GMT Subject: RFR: 8272651: G1 heap region info print order changed by JDK-8269914 In-Reply-To: <31RXx2Yekpu0GXtXT8xFnXBcNbMbWcsLky49Fqp1dOQ=.94639610-dce0-4136-880e-b322c24c34be@github.com> References: <31RXx2Yekpu0GXtXT8xFnXBcNbMbWcsLky49Fqp1dOQ=.94639610-dce0-4136-880e-b322c24c34be@github.com> Message-ID: On Wed, 25 Aug 2021 12:40:16 GMT, Stefan Johansson wrote: > Please review this small change to restore and coordinate the order of the G1 logs. > > **Summary** > Recent changes have slightly re-ordered parts of the G1 logs printed with `-Xlog:gc*` for young and full collections. This change will mostly restore the order but also address some differences we had in the past. The biggest part of this change is making sure the region transitions are printed at the "correct" place. 
For young collections they are moved to just after the phases and for full collections they are moved to be before the end of the pause log. > > Example logs: > > [0,046s][info][gc,start ] GC(0) Pause Young (Concurrent Start) (System.gc()) > [0,048s][info][gc,task ] GC(0) Using 24 workers of 28 for evacuation > [0,049s][info][gc,phases ] GC(0) Pre Evacuate Collection Set: 0,1ms > [0,049s][info][gc,phases ] GC(0) Merge Heap Roots: 0,1ms > [0,049s][info][gc,phases ] GC(0) Evacuate Collection Set: 0,4ms > [0,049s][info][gc,phases ] GC(0) Post Evacuate Collection Set: 0,3ms > [0,049s][info][gc,phases ] GC(0) Other: 2,5ms > [0,049s][info][gc,heap ] GC(0) Eden regions: 1->0(5) > [0,049s][info][gc,heap ] GC(0) Survivor regions: 0->1(1) > [0,049s][info][gc,heap ] GC(0) Old regions: 0->0 > [0,049s][info][gc,heap ] GC(0) Archive regions: 2->2 > [0,049s][info][gc,heap ] GC(0) Humongous regions: 0->0 > [0,049s][info][gc,metaspace] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K) > [0,049s][info][gc ] GC(0) Pause Young (Concurrent Start) (System.gc()) 9M->8M(1024M) 3,563ms > [0,049s][info][gc,cpu ] GC(0) User=0,00s Sys=0,00s Real=0,00s > > > [0,044s][info][gc,start ] GC(0) Pause Full (System.gc()) > [0,044s][info][gc,task ] GC(0) Using 3 workers of 28 for full compaction > [0,044s][info][gc,phases,start] GC(0) Phase 1: Mark live objects > [0,044s][info][gc,phases ] GC(0) Phase 1: Mark live objects 0,607ms > [0,044s][info][gc,phases,start] GC(0) Phase 2: Prepare for compaction > [0,045s][info][gc,phases ] GC(0) Phase 2: Prepare for compaction 0,772ms > [0,045s][info][gc,phases,start] GC(0) Phase 3: Adjust pointers > [0,046s][info][gc,phases ] GC(0) Phase 3: Adjust pointers 0,361ms > [0,046s][info][gc,phases,start] GC(0) Phase 4: Compact heap > [0,046s][info][gc,phases ] GC(0) Phase 4: Compact heap 0,142ms > [0,050s][info][gc,heap ] GC(0) Eden regions: 1->0(2) > [0,050s][info][gc,heap ] GC(0) Survivor regions: 0->0(0) > 
[0,050s][info][gc,heap ] GC(0) Old regions: 0->1 > [0,050s][info][gc,heap ] GC(0) Archive regions: 2->2 > [0,050s][info][gc,heap ] GC(0) Humongous regions: 0->0 > [0,050s][info][gc,metaspace ] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K) > [0,050s][info][gc ] GC(0) Pause Full (System.gc()) 9M->8M(80M) 6,729ms > [0,050s][info][gc,cpu ] GC(0) User=0,01s Sys=0,02s Real=0,01s > > > **Testing** > Locally run g1 tests and also some additional testing through mach5. Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 3045: > 3043: bool concurrent_operation_is_full_mark = false; > 3044: > 3045: // Verification may use the gang workers, so they must be set up before. I see (setup of) the amount of gang workers as input to the garbage collection algorithm, i.e. not the algorithm determines the number of threads, but is determined beforehand before doing the collection. E.g. see G1YoungCollector::collect() in https://github.com/tschatzl/jdk/commit/7fcfdfad9e01e0ae263dc61c1797281a4d5b192f#diff-da5609c228269aa265489241cdb50e89299e79b8691e0e534abf2823d7de2615 src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 3057: > 3055: // Create the heap printer before internal pause timing to have > 3056: // heap information printed as last part of detailed GC log. > 3057: G1HeapPrinterMark hpm(this); This will make this code (and printing) disappear from the "Other" time in pause timing. Some of that printing (and data gathering) might be expensive. ------------- PR: https://git.openjdk.java.net/jdk/pull/5252 From tschatzl at openjdk.java.net Wed Aug 25 14:48:23 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 25 Aug 2021 14:48:23 GMT Subject: RFR: 8272975: ParallelGC: add documentation to heap memory layout In-Reply-To: References: Message-ID: On Wed, 25 Aug 2021 10:03:02 GMT, Albert Mingkun Yang wrote: > Simple change of adding some documentation. 
Lgtm (and trivial if it matters, but please let somebody else have a look at it too). ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5247 From tschatzl at openjdk.java.net Wed Aug 25 16:06:25 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 25 Aug 2021 16:06:25 GMT Subject: RFR: 8272983: G1 Add marking details to eager reclaim logging In-Reply-To: References: Message-ID: On Wed, 25 Aug 2021 14:38:47 GMT, Ivan Walulya wrote: > Hi all, > > Please review this small change to add details on marking status in both prev and next marking bitmaps to the logging output in G1PrepareRegionsClosure. Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5254 From kbarrett at openjdk.java.net Wed Aug 25 17:07:24 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 25 Aug 2021 17:07:24 GMT Subject: RFR: 8272975: ParallelGC: add documentation to heap memory layout In-Reply-To: References: Message-ID: On Wed, 25 Aug 2021 10:03:02 GMT, Albert Mingkun Yang wrote: > Simple change of adding some documentation. Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5247 From zgu at openjdk.java.net Wed Aug 25 17:43:41 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 25 Aug 2021 17:43:41 GMT Subject: RFR: 8267188: gc/stringdedup/TestStringDeduplicationInterned.java fails with Shenandoah [v3] In-Reply-To: References: Message-ID: <3HCBjEN78flEzjy9JCR9Stq7JNgd6v9kbAKjihLUgds=.6a723cd3-8582-484d-93a5-d9b3bb0db77b@github.com> > Shenandoah currently enqueues deduplication candidates when it marks objects gray. > > However, it can not do so when it scans roots, as it can potentially result in a lock rank inversion between the stack watermark lock and the dedup request buffer lock. > > As a result, it cannot enqueue as many candidates as it should be able to; I believe JDK-8271834 is due to the same problem.
I propose we switch to enqueuing candidates when we mark objects black. > > We are still not able to enqueue all candidates, only when they have displaced headers or have monitor inflation in progress. Upstream is working on removing displaced headers; we should revisit the logic afterward, or we can choose to deduplicate all strings regardless of their ages. > > > > Test: > - [x] hotspot_gc_shenandoah > - [x] tier1 with -XX:+UseShenandoahGC and -XX:+UseStringDeduplication Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - Merge branch 'JDK-8267188-string-intern' of github.com:zhengyu123/jdk into JDK-8267188-string-intern - Aleksey's comments - Merge and fix conflicts - fix format - v0 - Merge branch 'master' into JDK-8267188-string-intern - v0 ------------- Changes: https://git.openjdk.java.net/jdk/pull/5151/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5151&range=02 Stats: 112 lines in 8 files changed: 35 ins; 28 del; 49 mod Patch: https://git.openjdk.java.net/jdk/pull/5151.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5151/head:pull/5151 PR: https://git.openjdk.java.net/jdk/pull/5151 From zgu at openjdk.java.net Wed Aug 25 20:19:29 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 25 Aug 2021 20:19:29 GMT Subject: Integrated: 8267188: gc/stringdedup/TestStringDeduplicationInterned.java fails with Shenandoah In-Reply-To: References: Message-ID: On Tue, 17 Aug 2021 18:32:54 GMT, Zhengyu Gu wrote: > Shenandoah currently enqueues deduplication candidates when marking objects gray. > > However, it cannot do so when it scans roots, as it can potentially result in lock rank inversion between the stack watermark lock and the dedup request buffer lock. > > As a result, it cannot enqueue as many candidates as it should be able to; I believe JDK-8271834 is due to the same problem. I propose we switch to enqueuing candidates when we mark objects black.
> > We are still not able to enqueue all candidates, only when they have displaced headers or have monitor inflation in progress. Upstream is working on removing displaced headers; we should revisit the logic afterward, or we can choose to deduplicate all strings regardless of their ages. > > > > Test: > - [x] hotspot_gc_shenandoah > - [x] tier1 with -XX:+UseShenandoahGC and -XX:+UseStringDeduplication This pull request has now been integrated. Changeset: 7212561d Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/7212561dd1ec65d7f31792959f0eaaab6229eaf4 Stats: 112 lines in 8 files changed: 35 ins; 28 del; 49 mod 8267188: gc/stringdedup/TestStringDeduplicationInterned.java fails with Shenandoah Reviewed-by: rkennke, shade ------------- PR: https://git.openjdk.java.net/jdk/pull/5151 From mli at openjdk.java.net Thu Aug 26 03:02:58 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 26 Aug 2021 03:02:58 GMT Subject: RFR: 8254739: G1: Optimize evacuation failure for regions with few failed objects [v4] In-Reply-To: References: Message-ID: > This is an attempt to optimize evacuation failure for regions. > I record every evacuation failure object per region (by G1EvacuationFailureObjsInHR), and then iterate (which indeed includes compact/sort/iteration) these objects directly in RemoveSelfForwardPtrHRClosure. > > I have tested it with the following parameters, > > - -XX:+ParallelGCThreads=1/32/64 > - -XX:G1EvacuationFailureALotInterval=1 > - -XX:G1EvacuationFailureALotCount=2/10/100/1000/10000/100000 > > It saves "Remove Self Forwards" time all the time, and in most conditions it saves "Evacuate Collection Set" time. > > It brings some performance degradation when -XX:G1EvacuationFailureALotCount is low, such as *2*. To improve this a little, we can record the number of evacuation failure objects per region, and not record these objects when the number hits some limit.
But I'm not sure if it's necessary to do so, as I think such a condition is too extreme to be met in a real environment, although I'm not quite sure. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Fix test failures; Fix compilation failures on some platforms ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5181/files - new: https://git.openjdk.java.net/jdk/pull/5181/files/05f026a0..ded82751 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5181&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5181&range=02-03 Stats: 18 lines in 2 files changed: 13 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/5181.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5181/head:pull/5181 PR: https://git.openjdk.java.net/jdk/pull/5181 From mli at openjdk.java.net Thu Aug 26 07:08:02 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 26 Aug 2021 07:08:02 GMT Subject: RFR: 8254739: G1: Optimize evacuation failure for regions with few failed objects [v5] In-Reply-To: References: Message-ID: > This is an attempt to optimize evacuation failure for regions. > I record every evacuation failure object per region (by G1EvacuationFailureObjsInHR), and then iterate (which indeed includes compact/sort/iteration) these objects directly in RemoveSelfForwardPtrHRClosure. > > I have tested it with the following parameters, > > - -XX:+ParallelGCThreads=1/32/64 > - -XX:G1EvacuationFailureALotInterval=1 > - -XX:G1EvacuationFailureALotCount=2/10/100/1000/10000/100000 > > It saves "Remove Self Forwards" time all the time, and in most conditions it saves "Evacuate Collection Set" time. > > It brings some performance degradation when -XX:G1EvacuationFailureALotCount is low, such as *2*. To improve this a little, we can record the number of evacuation failure objects per region, and not record these objects when the number hits some limit.
But I'm not sure if it's necessary to do so, as I think such a condition is too extreme to be met in a real environment, although I'm not quite sure. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Fix test failures; Fix compilation failures on some platforms ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5181/files - new: https://git.openjdk.java.net/jdk/pull/5181/files/ded82751..51d19eb0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5181&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5181&range=03-04 Stats: 24 lines in 2 files changed: 14 ins; 0 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/5181.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5181/head:pull/5181 PR: https://git.openjdk.java.net/jdk/pull/5181 From mli at openjdk.java.net Thu Aug 26 08:52:56 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 26 Aug 2021 08:52:56 GMT Subject: RFR: 8254739: G1: Optimize evacuation failure for regions with few failed objects [v6] In-Reply-To: References: Message-ID: <_626_mS0z4EA6C0_QpGbOQ1bMs3PRVLMNDZ8d1u2bOI=.c23666e3-83f1-4511-8aaf-cfbfb91796ac@github.com> > This is an attempt to optimize evacuation failure for regions. > I record every evacuation failure object per region (by G1EvacuationFailureObjsInHR), and then iterate (which indeed includes compact/sort/iteration) these objects directly in RemoveSelfForwardPtrHRClosure. > > I have tested it with the following parameters, > > - -XX:+ParallelGCThreads=1/32/64 > - -XX:G1EvacuationFailureALotInterval=1 > - -XX:G1EvacuationFailureALotCount=2/10/100/1000/10000/100000 > > It saves "Remove Self Forwards" time all the time, and in most conditions it saves "Evacuate Collection Set" time. > > It brings some performance degradation when -XX:G1EvacuationFailureALotCount is low, such as *2*.
To improve this a little, we can record the number of evacuation failure objects per region, and not record these objects when the number hits some limit. But I'm not sure if it's necessary to do so, as I think such a condition is too extreme to be met in a real environment, although I'm not quite sure. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Fix compilation error on windows ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5181/files - new: https://git.openjdk.java.net/jdk/pull/5181/files/51d19eb0..f4549256 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5181&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5181&range=04-05 Stats: 4 lines in 1 file changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5181.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5181/head:pull/5181 PR: https://git.openjdk.java.net/jdk/pull/5181 From mli at openjdk.java.net Thu Aug 26 10:04:54 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 26 Aug 2021 10:04:54 GMT Subject: RFR: 8254739: G1: Optimize evacuation failure for regions with few failed objects [v7] In-Reply-To: References: Message-ID: > This is an attempt to optimize evacuation failure for regions. > I record every evacuation failure object per region (by G1EvacuationFailureObjsInHR), and then iterate (which indeed includes compact/sort/iteration) these objects directly in RemoveSelfForwardPtrHRClosure. > > I have tested it with the following parameters, > > - -XX:+ParallelGCThreads=1/32/64 > - -XX:G1EvacuationFailureALotInterval=1 > - -XX:G1EvacuationFailureALotCount=2/10/100/1000/10000/100000 > > It saves "Remove Self Forwards" time all the time, and in most conditions it saves "Evacuate Collection Set" time. > > It brings some performance degradation when -XX:G1EvacuationFailureALotCount is low, such as *2*.
To improve this a little, we can record the number of evacuation failure objects per region, and not record these objects when the number hits some limit. But I'm not sure if it's necessary to do so, as I think such a condition is too extreme to be met in a real environment, although I'm not quite sure. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Fix compilation error on windows ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5181/files - new: https://git.openjdk.java.net/jdk/pull/5181/files/f4549256..34aed3f8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5181&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5181&range=05-06 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5181.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5181/head:pull/5181 PR: https://git.openjdk.java.net/jdk/pull/5181 From ayang at openjdk.java.net Thu Aug 26 10:10:29 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 26 Aug 2021 10:10:29 GMT Subject: RFR: 8272975: ParallelGC: add documentation to heap memory layout In-Reply-To: References: Message-ID: On Wed, 25 Aug 2021 10:03:02 GMT, Albert Mingkun Yang wrote: > Simple change of adding some documentation. Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/5247 From ayang at openjdk.java.net Thu Aug 26 10:10:29 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 26 Aug 2021 10:10:29 GMT Subject: Integrated: 8272975: ParallelGC: add documentation to heap memory layout In-Reply-To: References: Message-ID: <27PhEXS9QQuOUm-o01Irk6dhAW0I8Khw5FkEotT5ymU=.9598cd9c-62e3-4b93-a841-035c635c369e@github.com> On Wed, 25 Aug 2021 10:03:02 GMT, Albert Mingkun Yang wrote: > Simple change of adding some documentation. This pull request has now been integrated.
Changeset: 11c9fd82 Author: Albert Mingkun Yang URL: https://git.openjdk.java.net/jdk/commit/11c9fd8298030200727a0e682fd9afb69ca8eb81 Stats: 20 lines in 1 file changed: 20 ins; 0 del; 0 mod 8272975: ParallelGC: add documentation to heap memory layout Co-authored-by: Thomas Schatzl Reviewed-by: tschatzl, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/5247 From mli at openjdk.java.net Thu Aug 26 10:18:24 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 26 Aug 2021 10:18:24 GMT Subject: RFR: 8254739: G1: Optimize evacuation failure for regions with few failed objects [v7] In-Reply-To: References: Message-ID: On Thu, 26 Aug 2021 10:04:54 GMT, Hamlin Li wrote: >> This is an attempt to optimize evacuation failure for regions. >> I record every evacuation failure object per region (by G1EvacuationFailureObjsInHR), and then iterate (which indeed includes compact/sort/iteration) these objects directly in RemoveSelfForwardPtrHRClosure. >> >> I have tested it with the following parameters, >> >> - -XX:+ParallelGCThreads=1/32/64 >> - -XX:G1EvacuationFailureALotInterval=1 >> - -XX:G1EvacuationFailureALotCount=2/10/100/1000/10000/100000 >> >> It saves "Remove Self Forwards" time all the time, and in most conditions it saves "Evacuate Collection Set" time. >> >> It brings some performance degradation when -XX:G1EvacuationFailureALotCount is low, such as *2*. To improve this a little, we can record the number of evacuation failure objects per region, and not record these objects when the number hits some limit. But I'm not sure if it's necessary to do so, as I think such a condition is too extreme to be met in a real environment, although I'm not quite sure.
> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Fix compilation error on windows The test based on the new version (segmented array) shows that most (more than 90%) of the iteration time is spent on "iterate_internal", compaction costs almost no time, and less than 10% of the time is spent on sort. And I also attach the perf data on the JBS bug, for "end to end" time/"pause young" time/"Evacuate Collection Set" time/"Post Evacuate Collection Set" time/"Remove Self Forwards" time. Generally I think the new implementation works well for G1EvacuationFailureALotCount == 1/2/..., not just for G1EvacuationFailureALotCount >= 10. ------------- PR: https://git.openjdk.java.net/jdk/pull/5181 From sjohanss at openjdk.java.net Thu Aug 26 10:37:58 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 26 Aug 2021 10:37:58 GMT Subject: RFR: 8272651: G1 heap region info print order changed by JDK-8269914 In-Reply-To: <31RXx2Yekpu0GXtXT8xFnXBcNbMbWcsLky49Fqp1dOQ=.94639610-dce0-4136-880e-b322c24c34be@github.com> References: <31RXx2Yekpu0GXtXT8xFnXBcNbMbWcsLky49Fqp1dOQ=.94639610-dce0-4136-880e-b322c24c34be@github.com> Message-ID: On Wed, 25 Aug 2021 12:40:16 GMT, Stefan Johansson wrote: > Please review this small change to restore and coordinate the order of the G1 logs. > > **Summary** > Recent changes have slightly re-ordered parts of the G1 logs printed with `-Xlog:gc*` for young and full collections. This change will mostly restore the order but also address some differences we had in the past. The biggest part of this change is making sure the region transitions are printed at the "correct" place. For young collections they are moved to just after the phases and for full collections they are moved to be before the end of the pause log.
> > Example logs: > > [0,046s][info][gc,start ] GC(0) Pause Young (Concurrent Start) (System.gc()) > [0,048s][info][gc,task ] GC(0) Using 24 workers of 28 for evacuation > [0,049s][info][gc,phases ] GC(0) Pre Evacuate Collection Set: 0,1ms > [0,049s][info][gc,phases ] GC(0) Merge Heap Roots: 0,1ms > [0,049s][info][gc,phases ] GC(0) Evacuate Collection Set: 0,4ms > [0,049s][info][gc,phases ] GC(0) Post Evacuate Collection Set: 0,3ms > [0,049s][info][gc,phases ] GC(0) Other: 2,5ms > [0,049s][info][gc,heap ] GC(0) Eden regions: 1->0(5) > [0,049s][info][gc,heap ] GC(0) Survivor regions: 0->1(1) > [0,049s][info][gc,heap ] GC(0) Old regions: 0->0 > [0,049s][info][gc,heap ] GC(0) Archive regions: 2->2 > [0,049s][info][gc,heap ] GC(0) Humongous regions: 0->0 > [0,049s][info][gc,metaspace] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K) > [0,049s][info][gc ] GC(0) Pause Young (Concurrent Start) (System.gc()) 9M->8M(1024M) 3,563ms > [0,049s][info][gc,cpu ] GC(0) User=0,00s Sys=0,00s Real=0,00s > > > [0,044s][info][gc,start ] GC(0) Pause Full (System.gc()) > [0,044s][info][gc,task ] GC(0) Using 3 workers of 28 for full compaction > [0,044s][info][gc,phases,start] GC(0) Phase 1: Mark live objects > [0,044s][info][gc,phases ] GC(0) Phase 1: Mark live objects 0,607ms > [0,044s][info][gc,phases,start] GC(0) Phase 2: Prepare for compaction > [0,045s][info][gc,phases ] GC(0) Phase 2: Prepare for compaction 0,772ms > [0,045s][info][gc,phases,start] GC(0) Phase 3: Adjust pointers > [0,046s][info][gc,phases ] GC(0) Phase 3: Adjust pointers 0,361ms > [0,046s][info][gc,phases,start] GC(0) Phase 4: Compact heap > [0,046s][info][gc,phases ] GC(0) Phase 4: Compact heap 0,142ms > [0,050s][info][gc,heap ] GC(0) Eden regions: 1->0(2) > [0,050s][info][gc,heap ] GC(0) Survivor regions: 0->0(0) > [0,050s][info][gc,heap ] GC(0) Old regions: 0->1 > [0,050s][info][gc,heap ] GC(0) Archive regions: 2->2 > [0,050s][info][gc,heap ] GC(0) Humongous regions: 
0->0 > [0,050s][info][gc,metaspace ] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K) > [0,050s][info][gc ] GC(0) Pause Full (System.gc()) 9M->8M(80M) 6,729ms > [0,050s][info][gc,cpu ] GC(0) User=0,01s Sys=0,02s Real=0,01s > > > **Testing** > Locally run g1 tests and also some additional testing through mach5. Pushed a small change to make sure all JFR events are sent in the correct order. ------------- PR: https://git.openjdk.java.net/jdk/pull/5252 From sjohanss at openjdk.java.net Thu Aug 26 10:37:56 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 26 Aug 2021 10:37:56 GMT Subject: RFR: 8272651: G1 heap region info print order changed by JDK-8269914 [v2] In-Reply-To: <31RXx2Yekpu0GXtXT8xFnXBcNbMbWcsLky49Fqp1dOQ=.94639610-dce0-4136-880e-b322c24c34be@github.com> References: <31RXx2Yekpu0GXtXT8xFnXBcNbMbWcsLky49Fqp1dOQ=.94639610-dce0-4136-880e-b322c24c34be@github.com> Message-ID: > Please review this small change to restore and coordinate the order of the G1 logs. > > **Summary** > Recent changes have slightly re-ordered parts of the G1 logs printed with `-Xlog:gc*` for young and full collections. This change will mostly restore the order but also address some differences we had in the past. The biggest part of this change is making sure the region transitions are printed at the "correct" place. For young collections they are moved to just after the phases and for full collections they are moved to be before the end of the pause log. 
> > Example logs: > > [0,046s][info][gc,start ] GC(0) Pause Young (Concurrent Start) (System.gc()) > [0,048s][info][gc,task ] GC(0) Using 24 workers of 28 for evacuation > [0,049s][info][gc,phases ] GC(0) Pre Evacuate Collection Set: 0,1ms > [0,049s][info][gc,phases ] GC(0) Merge Heap Roots: 0,1ms > [0,049s][info][gc,phases ] GC(0) Evacuate Collection Set: 0,4ms > [0,049s][info][gc,phases ] GC(0) Post Evacuate Collection Set: 0,3ms > [0,049s][info][gc,phases ] GC(0) Other: 2,5ms > [0,049s][info][gc,heap ] GC(0) Eden regions: 1->0(5) > [0,049s][info][gc,heap ] GC(0) Survivor regions: 0->1(1) > [0,049s][info][gc,heap ] GC(0) Old regions: 0->0 > [0,049s][info][gc,heap ] GC(0) Archive regions: 2->2 > [0,049s][info][gc,heap ] GC(0) Humongous regions: 0->0 > [0,049s][info][gc,metaspace] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K) > [0,049s][info][gc ] GC(0) Pause Young (Concurrent Start) (System.gc()) 9M->8M(1024M) 3,563ms > [0,049s][info][gc,cpu ] GC(0) User=0,00s Sys=0,00s Real=0,00s > > > [0,044s][info][gc,start ] GC(0) Pause Full (System.gc()) > [0,044s][info][gc,task ] GC(0) Using 3 workers of 28 for full compaction > [0,044s][info][gc,phases,start] GC(0) Phase 1: Mark live objects > [0,044s][info][gc,phases ] GC(0) Phase 1: Mark live objects 0,607ms > [0,044s][info][gc,phases,start] GC(0) Phase 2: Prepare for compaction > [0,045s][info][gc,phases ] GC(0) Phase 2: Prepare for compaction 0,772ms > [0,045s][info][gc,phases,start] GC(0) Phase 3: Adjust pointers > [0,046s][info][gc,phases ] GC(0) Phase 3: Adjust pointers 0,361ms > [0,046s][info][gc,phases,start] GC(0) Phase 4: Compact heap > [0,046s][info][gc,phases ] GC(0) Phase 4: Compact heap 0,142ms > [0,050s][info][gc,heap ] GC(0) Eden regions: 1->0(2) > [0,050s][info][gc,heap ] GC(0) Survivor regions: 0->0(0) > [0,050s][info][gc,heap ] GC(0) Old regions: 0->1 > [0,050s][info][gc,heap ] GC(0) Archive regions: 2->2 > [0,050s][info][gc,heap ] GC(0) Humongous regions: 
0->0 > [0,050s][info][gc,metaspace ] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K) > [0,050s][info][gc ] GC(0) Pause Full (System.gc()) 9M->8M(80M) 6,729ms > [0,050s][info][gc,cpu ] GC(0) User=0,01s Sys=0,02s Real=0,01s > > > **Testing** > Locally run g1 tests and also some additional testing through mach5. Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: HeapMark needs to be below JFRMark to make sure all GC events are within start and end event. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5252/files - new: https://git.openjdk.java.net/jdk/pull/5252/files/4998d389..56078ab0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5252&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5252&range=00-01 Stats: 10 lines in 1 file changed: 5 ins; 5 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5252.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5252/head:pull/5252 PR: https://git.openjdk.java.net/jdk/pull/5252 From sjohanss at openjdk.java.net Thu Aug 26 10:38:03 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 26 Aug 2021 10:38:03 GMT Subject: RFR: 8272651: G1 heap region info print order changed by JDK-8269914 [v2] In-Reply-To: References: <31RXx2Yekpu0GXtXT8xFnXBcNbMbWcsLky49Fqp1dOQ=.94639610-dce0-4136-880e-b322c24c34be@github.com> Message-ID: On Wed, 25 Aug 2021 13:00:27 GMT, Thomas Schatzl wrote: >> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: >> >> HeapMark needs to be below JFRMark to make sure all GC events are within start and end event. > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 3045: > >> 3043: bool concurrent_operation_is_full_mark = false; >> 3044: >> 3045: // Verification may use the gang workers, so they must be set up before.
> > I see (the setup of) the number of gang workers as input to the garbage collection algorithm, i.e. the algorithm does not determine the number of threads; it is determined beforehand, before doing the collection. > > E.g. see G1YoungCollector::collect() in https://github.com/tschatzl/jdk/commit/7fcfdfad9e01e0ae263dc61c1797281a4d5b192f#diff-da5609c228269aa265489241cdb50e89299e79b8691e0e534abf2823d7de2615 We discussed this as well and compared it with how the `G1FullCollector` handles this, and with this in mind the current proposal is ok. We might want to do more refactoring around this later on, but right now we can see calculating the number of workers as part of the young collector and not only as input. > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 3057: > >> 3055: // Create the heap printer before internal pause timing to have >> 3056: // heap information printed as last part of detailed GC log. >> 3057: G1HeapPrinterMark hpm(this); > This will make this code (and printing) disappear from the "Other" time in pause timing. > > Some of that printing (and data gathering) might be expensive. Thomas and I discussed this and we agreed on this being ok. Even if some part of the printing could take a little time, it is not important to record it as "other time". ------------- PR: https://git.openjdk.java.net/jdk/pull/5252 From ayang at openjdk.java.net Thu Aug 26 11:10:26 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 26 Aug 2021 11:10:26 GMT Subject: RFR: 8272983: G1 Add marking details to eager reclaim logging In-Reply-To: References: Message-ID: On Wed, 25 Aug 2021 14:38:47 GMT, Ivan Walulya wrote: > Hi all, > > Please review this small change to add details on marking status in both prev and next marking bitmaps to the logging output in G1PrepareRegionsClosure. Marked as reviewed by ayang (Reviewer).
------------- PR: https://git.openjdk.java.net/jdk/pull/5254 From iwalulya at openjdk.java.net Thu Aug 26 13:45:27 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Thu, 26 Aug 2021 13:45:27 GMT Subject: RFR: 8272983: G1 Add marking details to eager reclaim logging In-Reply-To: References: Message-ID: On Wed, 25 Aug 2021 14:38:47 GMT, Ivan Walulya wrote: > Hi all, > > Please review this small change to add details on marking status in both prev and next marking bitmaps to the logging output in G1PrepareRegionsClosure. Thanks for the reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/5254 From iwalulya at openjdk.java.net Thu Aug 26 13:45:27 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Thu, 26 Aug 2021 13:45:27 GMT Subject: Integrated: 8272983: G1 Add marking details to eager reclaim logging In-Reply-To: References: Message-ID: On Wed, 25 Aug 2021 14:38:47 GMT, Ivan Walulya wrote: > Hi all, > > Please review this small change to add details on marking status in both prev and next marking bitmaps to the logging output in G1PrepareRegionsClosure. This pull request has now been integrated. 
Changeset: 845e1cea Author: Ivan Walulya URL: https://git.openjdk.java.net/jdk/commit/845e1cea8dcf0e902a2b6d3bf87749108c21c320 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8272983: G1 Add marking details to eager reclaim logging Reviewed-by: tschatzl, ayang ------------- PR: https://git.openjdk.java.net/jdk/pull/5254 From fmatte at openjdk.java.net Thu Aug 26 14:29:26 2021 From: fmatte at openjdk.java.net (Fairoz Matte) Date: Thu, 26 Aug 2021 14:29:26 GMT Subject: RFR: 8272563: Possible assertion failure in CardTableBarrierSetC1 In-Reply-To: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Wed, 18 Aug 2021 12:37:00 GMT, Fairoz Matte wrote: > This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp > > The method CardTableBarrierSetC1::post_barrier generates a move LIR when the TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >> __ move(addr, tmp); > However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register move, which causes an assertion failure because a memory-to-register move requires its arguments to have the same size. > The fix is to check is_oop() and call the move appropriately. > > No issues found in local testing and Mach5 tier1-3 Hi Tobias, Thanks for looking into this thread. There is a reproducer but it has a lot of dependencies on dacapo-9.12-MR1-bach.jar and I am unable to isolate it to a smaller case. I tried with your suggestion but could not resolve the issue.
Below is the original issue reproduced on fastdebug build (Updated the jbs bug) # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/tank/fmatte/bugs/8272563/openjdk-reproduce-5164/src/hotspot/share/c1/c1_LIR.hpp:413), pid=28744, tid=28763 # assert(is_double_stack() && !is_virtual()) failed: type check # # JRE version: OpenJDK Runtime Environment (16.0.2) (fastdebug build 16.0.2-internal+0-adhoc.fmatte.openjdk-reproduce-5164) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 16.0.2-internal+0-adhoc.fmatte.openjdk-reproduce-5164, compiled mode, tiered, compressed oops, compressed class ptrs, serial gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x7202f3] LIR_OprDesc::double_stack_ix() const+0x43 # # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /tank/fmatte/bugs/8272563/openjdk-reproduce-5164/core.28744) # # An error report file with more information is saved as: # error.log # # Compiler replay data is saved as: # /tank/fmatte/bugs/8272563/openjdk-reproduce-5164/replay_pid28744.log # # If you would like to submit a bug report, please visit: # https://bugreport.java.com/bugreport/crash.jsp ------------- PR: https://git.openjdk.java.net/jdk/pull/5164 From ayang at openjdk.java.net Thu Aug 26 14:51:38 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 26 Aug 2021 14:51:38 GMT Subject: RFR: 8273033: SerialGC: remove obsolete comments Message-ID: Trivial change of removing obsolete comments. 
------------- Commit messages: - trivial Changes: https://git.openjdk.java.net/jdk/pull/5267/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5267&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273033 Stats: 7 lines in 1 file changed: 0 ins; 7 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5267.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5267/head:pull/5267 PR: https://git.openjdk.java.net/jdk/pull/5267 From tschatzl at openjdk.java.net Thu Aug 26 17:10:29 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 26 Aug 2021 17:10:29 GMT Subject: RFR: 8272651: G1 heap region info print order changed by JDK-8269914 [v2] In-Reply-To: References: <31RXx2Yekpu0GXtXT8xFnXBcNbMbWcsLky49Fqp1dOQ=.94639610-dce0-4136-880e-b322c24c34be@github.com> Message-ID: On Thu, 26 Aug 2021 10:37:56 GMT, Stefan Johansson wrote: >> Please review this small change to restore and coordinate the order of the G1 logs. >> >> **Summary** >> Recent changes have slightly re-ordered parts of the G1 logs printed with `-Xlog:gc*` for young and full collections. This change will mostly restore the order but also address some differences we had in the past. The biggest part of this change is making sure the region transitions are printed at the "correct" place. For young collections they are moved to just after the phases and for full collections they are moved to be before the end of the pause log. 
>> >> Example logs: >> >> [0,046s][info][gc,start ] GC(0) Pause Young (Concurrent Start) (System.gc()) >> [0,048s][info][gc,task ] GC(0) Using 24 workers of 28 for evacuation >> [0,049s][info][gc,phases ] GC(0) Pre Evacuate Collection Set: 0,1ms >> [0,049s][info][gc,phases ] GC(0) Merge Heap Roots: 0,1ms >> [0,049s][info][gc,phases ] GC(0) Evacuate Collection Set: 0,4ms >> [0,049s][info][gc,phases ] GC(0) Post Evacuate Collection Set: 0,3ms >> [0,049s][info][gc,phases ] GC(0) Other: 2,5ms >> [0,049s][info][gc,heap ] GC(0) Eden regions: 1->0(5) >> [0,049s][info][gc,heap ] GC(0) Survivor regions: 0->1(1) >> [0,049s][info][gc,heap ] GC(0) Old regions: 0->0 >> [0,049s][info][gc,heap ] GC(0) Archive regions: 2->2 >> [0,049s][info][gc,heap ] GC(0) Humongous regions: 0->0 >> [0,049s][info][gc,metaspace] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K) >> [0,049s][info][gc ] GC(0) Pause Young (Concurrent Start) (System.gc()) 9M->8M(1024M) 3,563ms >> [0,049s][info][gc,cpu ] GC(0) User=0,00s Sys=0,00s Real=0,00s >> >> >> [0,044s][info][gc,start ] GC(0) Pause Full (System.gc()) >> [0,044s][info][gc,task ] GC(0) Using 3 workers of 28 for full compaction >> [0,044s][info][gc,phases,start] GC(0) Phase 1: Mark live objects >> [0,044s][info][gc,phases ] GC(0) Phase 1: Mark live objects 0,607ms >> [0,044s][info][gc,phases,start] GC(0) Phase 2: Prepare for compaction >> [0,045s][info][gc,phases ] GC(0) Phase 2: Prepare for compaction 0,772ms >> [0,045s][info][gc,phases,start] GC(0) Phase 3: Adjust pointers >> [0,046s][info][gc,phases ] GC(0) Phase 3: Adjust pointers 0,361ms >> [0,046s][info][gc,phases,start] GC(0) Phase 4: Compact heap >> [0,046s][info][gc,phases ] GC(0) Phase 4: Compact heap 0,142ms >> [0,050s][info][gc,heap ] GC(0) Eden regions: 1->0(2) >> [0,050s][info][gc,heap ] GC(0) Survivor regions: 0->0(0) >> [0,050s][info][gc,heap ] GC(0) Old regions: 0->1 >> [0,050s][info][gc,heap ] GC(0) Archive regions: 2->2 >> 
[0,050s][info][gc,heap ] GC(0) Humongous regions: 0->0 >> [0,050s][info][gc,metaspace ] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K) >> [0,050s][info][gc ] GC(0) Pause Full (System.gc()) 9M->8M(80M) 6,729ms >> [0,050s][info][gc,cpu ] GC(0) User=0,01s Sys=0,02s Real=0,01s >> >> >> **Testing** >> Locally run g1 tests and also some additional testing through mach5. > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > HeapMark needs to be below JFRMark to make sure all GC events are within start and end events. As discussed with Stefan - lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5252 From iveresov at openjdk.java.net Fri Aug 27 03:42:24 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Fri, 27 Aug 2021 03:42:24 GMT Subject: RFR: 8272563: Possible assertion failure in CardTableBarrierSetC1 In-Reply-To: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Wed, 18 Aug 2021 12:37:00 GMT, Fairoz Matte wrote: > This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp > > The method CardTableBarrierSetC1::post_barrier generates a move LIR when the TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >> __ move(addr, tmp); > However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register move, which causes an assertion failure because a memory-to-register move requires its arguments to have the same size. > The fix is to check whether the operand is_oop() and emit the mov appropriately.
> > No issues found in local testing and Mach5 tier1-3 I think we have to somehow tell the register allocator not to spill arguments of type-converting moves. We already have some logic there to prevent spills in logic and arithmetic involving T_OBJECT: https://github.com/openjdk/jdk/blob/d732c3091fea0a7c6d6767227de89002564504e5/src/hotspot/share/c1/c1_LinearScan.cpp#L1135 Perhaps we should also detect the conversion case when `src` and `dst` are of different types and force both to be in registers? Basically insert some logic here (reg-reg move): https://github.com/openjdk/jdk/blob/d732c3091fea0a7c6d6767227de89002564504e5/src/hotspot/share/c1/c1_LinearScan.cpp#L1068 Check the types, and if it's the T_OBJECT->T_LONG situation return `mustHaveRegister`? Maybe there is a better solution, but that's the first thing that came to mind. ------------- PR: https://git.openjdk.java.net/jdk/pull/5164 From mli at openjdk.java.net Fri Aug 27 04:18:42 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 27 Aug 2021 04:18:42 GMT Subject: RFR: 8254167 G1: Record regions where evacuation failed to provide targeted iteration Message-ID: This is a try to optimize evacuation failure for regions. I record evacuation failed regions, and iterate these regions directly rather than iterate the whole cset. The implementation reuses and refactors some of the existing data in g1CollectedHeap (_regions_failed_evacuation, _num_regions_failed_evacuation), records these regions in an array, and iterates this array later in the post-evacuation phase. I have 2 implementations: - 1) CHT - 2) bitmap (reuse _regions_failed_evacuation in g1CollectedHeap) This implementation does not consider work distribution as mentioned in JDK-8254167 yet, but it already seems to get a better and more stable performance gain than the original. We could improve it further later if work distribution is necessary. I will attach the perf data in JBS.
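As a rough illustration of the second variant described above (recording failed regions in a flag-per-region structure and iterating only those), here is a minimal C++ sketch. The class and function names are hypothetical and this is not the actual HotSpot code:

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Sketch of the bitmap variant: one flag per heap region, set atomically
// by whichever worker first observes an evacuation failure in that region.
class FailedRegions {
  std::vector<std::atomic<unsigned char>> _bits;
public:
  explicit FailedRegions(std::size_t max_regions) : _bits(max_regions) {}

  // Returns true only for the first thread that records this region.
  bool record(std::size_t region_index) {
    return _bits[region_index].exchange(1) == 0;
  }

  // Visit each recorded region exactly once, skipping all others.
  template <typename Visitor>
  void iterate(Visitor visit) const {
    for (std::size_t i = 0; i < _bits.size(); i++) {
      if (_bits[i].load() != 0) {
        visit(i);
      }
    }
  }
};

// Demo: duplicate records are idempotent, so only two regions are visited.
std::size_t failed_region_count() {
  FailedRegions fr(16);
  fr.record(3);
  fr.record(7);
  fr.record(3); // a second failure report for region 3 changes nothing
  std::size_t n = 0;
  fr.iterate([&](std::size_t) { n++; });
  return n;
}
```

A real implementation would pack the flags into machine words with atomic bit operations; the point here is only that duplicate records are idempotent and the post-evacuation iteration touches the failed regions rather than the whole collection set.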
------------- Commit messages: - merge with existing classes - Initial commit Changes: https://git.openjdk.java.net/jdk/pull/5272/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5272&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254167 Stats: 218 lines in 6 files changed: 200 ins; 11 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/5272.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5272/head:pull/5272 PR: https://git.openjdk.java.net/jdk/pull/5272 From mli at openjdk.java.net Fri Aug 27 04:18:43 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 27 Aug 2021 04:18:43 GMT Subject: RFR: 8254167 G1: Record regions where evacuation failed to provide targeted iteration In-Reply-To: References: Message-ID: On Fri, 27 Aug 2021 04:09:52 GMT, Hamlin Li wrote: > This is a try to optimize evacuation failure for regions. > I record evacuation failed regions, and iterate these regions directly rather than iterate the whole cset. > The implementation reuses and refactors some of the existing data in g1CollectedHeap (_regions_failed_evacuation, _num_regions_failed_evacuation), records these regions in an array, and iterates this array later in the post-evacuation phase. > > I have 2 implementations: > > - 1) CHT > - 2) bitmap (reuse _regions_failed_evacuation in g1CollectedHeap) > > This implementation does not consider work distribution as mentioned in JDK-8254167 yet, but it already seems to get a better and more stable performance gain than the original. We could improve it further later if work distribution is necessary. > > I will attach the perf data in JBS.
I will push the bitmap version later ------------- PR: https://git.openjdk.java.net/jdk/pull/5272 From mli at openjdk.java.net Fri Aug 27 04:25:26 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 27 Aug 2021 04:25:26 GMT Subject: RFR: 8254167 G1: Record regions where evacuation failed to provide targeted iteration In-Reply-To: References: Message-ID: On Fri, 27 Aug 2021 04:09:52 GMT, Hamlin Li wrote: > This is a try to optimize evacuation failure for regions. > I record evacuation failed regions, and iterate these regions directly rather than iterate the whole cset. > The implementation reuses and refactors some of the existing data in g1CollectedHeap (_regions_failed_evacuation, _num_regions_failed_evacuation), records these regions in an array, and iterates this array later in the post-evacuation phase. > > I have 2 implementations: > > - 1) CHT > - 2) bitmap (reuse _regions_failed_evacuation in g1CollectedHeap) > > This implementation does not consider work distribution as mentioned in JDK-8254167 yet, but it already seems to get a better and more stable performance gain than the original. We could improve it further later if work distribution is necessary. > > I will attach the perf data in JBS. Seems the test failure is not related to this patch, as my local test succeeds, and it shows that there is no Evacuation Failure triggered during the test. ------------- PR: https://git.openjdk.java.net/jdk/pull/5272 From kbarrett at openjdk.java.net Fri Aug 27 08:54:40 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 27 Aug 2021 08:54:40 GMT Subject: RFR: 8273062: Generation::refs_discovery_is_xxx functions are unused Message-ID: Please review this trivial cleanup, removing a couple of unused virtual function declarations from the Generation class: virtual bool refs_discovery_is_atomic() const { return true; } virtual bool refs_discovery_is_mt() const { return false; } These functions are not called or overridden. Testing: local (linux-x64) build.
------------- Commit messages: - remove unused declarations Changes: https://git.openjdk.java.net/jdk/pull/5275/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5275&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273062 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5275.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5275/head:pull/5275 PR: https://git.openjdk.java.net/jdk/pull/5275 From ayang at openjdk.java.net Fri Aug 27 09:00:28 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 27 Aug 2021 09:00:28 GMT Subject: RFR: 8273062: Generation::refs_discovery_is_xxx functions are unused In-Reply-To: References: Message-ID: On Fri, 27 Aug 2021 08:47:24 GMT, Kim Barrett wrote: > Please review this trivial cleanup, removing a couple of unused virtual > function declarations from the Generation class: > > virtual bool refs_discovery_is_atomic() const { return true; } > virtual bool refs_discovery_is_mt() const { return false; } > > These functions are not called or overridden. > > Testing: > local (linux-x64) build. LGTM and trivial. ------------- Marked as reviewed by ayang (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5275 From github.com+3094961+buddyliao at openjdk.java.net Fri Aug 27 09:11:24 2021 From: github.com+3094961+buddyliao at openjdk.java.net (Bin Liao) Date: Fri, 27 Aug 2021 09:11:24 GMT Subject: RFR: 8272611: Use parallel heap inspection when print classhisto info in gc log In-Reply-To: References: Message-ID: On Mon, 23 Aug 2021 10:50:45 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/shared/collectedHeap.cpp line 551: >> >>> 549: ResourceMark rm; >>> 550: LogStream ls(lt); >>> 551: uint parallel_thread_num = MAX2(1, (uint)os::initial_active_processor_count() * 3 / 8); >> >> Where do the magic numbers for the formula come from? 
Please consider using the ergonomically determined number of threads (`CollectedHeap::safepoint_workers()->active_workers()`)? (Maybe change this in another CR as other reviewers think is better, looking whether there is a significant difference between these numbers) > > Note: this is probably copy&pasted from `ClassHistogramDCmd::execute`; instead of the copy&paste please at least add something like a static `default_parallel_thread_num()` method to `VM_GC_HeapInspection`. > Where do the magic numbers for the formula come from? Please consider using the ergonomically determined number of threads (`CollectedHeap::safepoint_workers()->active_workers()`)? (Maybe change this in another CR as other reviewers think is better, looking whether there is a significant difference between these numbers) Thank you for your advice, it seems that (CollectedHeap::safepoint_workers()->active_workers()) looks better. ------------- PR: https://git.openjdk.java.net/jdk/pull/5158 From github.com+3094961+buddyliao at openjdk.java.net Fri Aug 27 09:11:25 2021 From: github.com+3094961+buddyliao at openjdk.java.net (Bin Liao) Date: Fri, 27 Aug 2021 09:11:25 GMT Subject: RFR: 8272611: Use parallel heap inspection when print classhisto info in gc log In-Reply-To: References: Message-ID: On Mon, 23 Aug 2021 10:43:40 GMT, Thomas Schatzl wrote: > Pre-existing: the comment next to a bool constant in a parameter list should show the name of the parameter, i.e. s/! full_gc/request_full_gc/. Maybe change this as appropriate in other callers too. Sounds reasonable, so I will change here in this PR, and the other places in another new pr, is it ok? 
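For a concrete feel of what the quoted heuristic `MAX2(1, (uint)os::initial_active_processor_count() * 3 / 8)` produces, here is a restatement in plain C++; the helper name `default_parallel_thread_num` merely echoes the suggestion above, and this is not the actual HotSpot code:

```cpp
#include <algorithm>
#include <cstdint>

// Restatement of MAX2(1, (uint)os::initial_active_processor_count() * 3 / 8):
// use roughly three eighths of the active processors, but never fewer than
// one worker thread.
inline std::uint32_t default_parallel_thread_num(std::uint32_t active_processors) {
  return std::max(1u, active_processors * 3u / 8u);
}
```

For example, on a 28-processor machine the formula yields 10 parallel threads, while any machine with fewer than 3 active processors still gets one thread.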
------------- PR: https://git.openjdk.java.net/jdk/pull/5158 From kbarrett at openjdk.java.net Fri Aug 27 10:36:33 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 27 Aug 2021 10:36:33 GMT Subject: Integrated: 8273062: Generation::refs_discovery_is_xxx functions are unused In-Reply-To: References: Message-ID: On Fri, 27 Aug 2021 08:47:24 GMT, Kim Barrett wrote: > Please review this trivial cleanup, removing a couple of unused virtual > function declarations from the Generation class: > > virtual bool refs_discovery_is_atomic() const { return true; } > virtual bool refs_discovery_is_mt() const { return false; } > > These functions are not called or overridden. > > Testing: > local (linux-x64) build. This pull request has now been integrated. Changeset: a49a0c58 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/a49a0c58662bef48d02246193d86cc89fb9d030b Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod 8273062: Generation::refs_discovery_is_xxx functions are unused Reviewed-by: ayang ------------- PR: https://git.openjdk.java.net/jdk/pull/5275 From kbarrett at openjdk.java.net Fri Aug 27 10:36:32 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 27 Aug 2021 10:36:32 GMT Subject: RFR: 8273062: Generation::refs_discovery_is_xxx functions are unused In-Reply-To: References: Message-ID: On Fri, 27 Aug 2021 08:57:33 GMT, Albert Mingkun Yang wrote: >> Please review this trivial cleanup, removing a couple of unused virtual >> function declarations from the Generation class: >> >> virtual bool refs_discovery_is_atomic() const { return true; } >> virtual bool refs_discovery_is_mt() const { return false; } >> >> These functions are not called or overridden. >> >> Testing: >> local (linux-x64) build. > > LGTM and trivial. Thanks @albertnetymk for review. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5275 From sjohanss at openjdk.java.net Fri Aug 27 12:32:27 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 27 Aug 2021 12:32:27 GMT Subject: RFR: 8273033: SerialGC: remove obsolete comments In-Reply-To: References: Message-ID: On Thu, 26 Aug 2021 14:44:20 GMT, Albert Mingkun Yang wrote: > Trivial change of removing obsolete comments. ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5267 From sjohanss at openjdk.java.net Fri Aug 27 12:42:28 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 27 Aug 2021 12:42:28 GMT Subject: RFR: 8272161: Make evacuation failure data structures local to collection [v2] In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 12:49:55 GMT, Thomas Schatzl wrote: >> Hi all, >> >> can I have reviews for this change that moves some evacuation (failure) related data structures into the per-thread evacuation context (`G1ParThreadScanState`) which removes some young collection related code from G1CollectedHeap? >> >> Testing: tier1-5 >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > Preserved marks stacks must be allocated on C heap. Lgtm ------------- Marked as reviewed by sjohanss (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/5066 From iwalulya at openjdk.java.net Fri Aug 27 12:49:27 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Fri, 27 Aug 2021 12:49:27 GMT Subject: RFR: 8272651: G1 heap region info print order changed by JDK-8269914 [v2] In-Reply-To: References: <31RXx2Yekpu0GXtXT8xFnXBcNbMbWcsLky49Fqp1dOQ=.94639610-dce0-4136-880e-b322c24c34be@github.com> Message-ID: <6xVr-UcMm2tykxqmPzFpSGT62-pgjdFGdw9zGi7KbmM=.2328159e-aa3a-48af-83d2-645cbde79c89@github.com> On Thu, 26 Aug 2021 10:37:56 GMT, Stefan Johansson wrote: >> Please review this small change to restore and coordinate the order of the G1 logs. >> >> **Summary** >> Recent changes have slightly re-ordered parts of the G1 logs printed with `-Xlog:gc*` for young and full collections. This change will mostly restore the order but also address some differences we had in the past. The biggest part of this change is making sure the region transitions are printed at the "correct" place. For young collections they are moved to just after the phases and for full collections they are moved to be before the end of the pause log. 
>> >> Example logs: >> >> [0,046s][info][gc,start ] GC(0) Pause Young (Concurrent Start) (System.gc()) >> [0,048s][info][gc,task ] GC(0) Using 24 workers of 28 for evacuation >> [0,049s][info][gc,phases ] GC(0) Pre Evacuate Collection Set: 0,1ms >> [0,049s][info][gc,phases ] GC(0) Merge Heap Roots: 0,1ms >> [0,049s][info][gc,phases ] GC(0) Evacuate Collection Set: 0,4ms >> [0,049s][info][gc,phases ] GC(0) Post Evacuate Collection Set: 0,3ms >> [0,049s][info][gc,phases ] GC(0) Other: 2,5ms >> [0,049s][info][gc,heap ] GC(0) Eden regions: 1->0(5) >> [0,049s][info][gc,heap ] GC(0) Survivor regions: 0->1(1) >> [0,049s][info][gc,heap ] GC(0) Old regions: 0->0 >> [0,049s][info][gc,heap ] GC(0) Archive regions: 2->2 >> [0,049s][info][gc,heap ] GC(0) Humongous regions: 0->0 >> [0,049s][info][gc,metaspace] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K) >> [0,049s][info][gc ] GC(0) Pause Young (Concurrent Start) (System.gc()) 9M->8M(1024M) 3,563ms >> [0,049s][info][gc,cpu ] GC(0) User=0,00s Sys=0,00s Real=0,00s >> >> >> [0,044s][info][gc,start ] GC(0) Pause Full (System.gc()) >> [0,044s][info][gc,task ] GC(0) Using 3 workers of 28 for full compaction >> [0,044s][info][gc,phases,start] GC(0) Phase 1: Mark live objects >> [0,044s][info][gc,phases ] GC(0) Phase 1: Mark live objects 0,607ms >> [0,044s][info][gc,phases,start] GC(0) Phase 2: Prepare for compaction >> [0,045s][info][gc,phases ] GC(0) Phase 2: Prepare for compaction 0,772ms >> [0,045s][info][gc,phases,start] GC(0) Phase 3: Adjust pointers >> [0,046s][info][gc,phases ] GC(0) Phase 3: Adjust pointers 0,361ms >> [0,046s][info][gc,phases,start] GC(0) Phase 4: Compact heap >> [0,046s][info][gc,phases ] GC(0) Phase 4: Compact heap 0,142ms >> [0,050s][info][gc,heap ] GC(0) Eden regions: 1->0(2) >> [0,050s][info][gc,heap ] GC(0) Survivor regions: 0->0(0) >> [0,050s][info][gc,heap ] GC(0) Old regions: 0->1 >> [0,050s][info][gc,heap ] GC(0) Archive regions: 2->2 >> 
[0,050s][info][gc,heap ] GC(0) Humongous regions: 0->0 >> [0,050s][info][gc,metaspace ] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K) >> [0,050s][info][gc ] GC(0) Pause Full (System.gc()) 9M->8M(80M) 6,729ms >> [0,050s][info][gc,cpu ] GC(0) User=0,01s Sys=0,02s Real=0,01s >> >> >> **Testing** >> Locally run g1 tests and also some additional testing through mach5. > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > HeapMark needs to be below JFRMark to make sure all GC events are within start and end events. Marked as reviewed by iwalulya (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5252 From iwalulya at openjdk.java.net Fri Aug 27 15:56:36 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Fri, 27 Aug 2021 15:56:36 GMT Subject: RFR: 8237567: Refactor G1-specific code in shared VM_CollectForMetadataAllocation Message-ID: Hi all, Please review this refactoring change to move G1-specific code out of VM_CollectForMetadataAllocation. Testing: Tier 1-3. ------------- Commit messages: - add INCLUDE_G1GC check - 8237567: Refactor G1-specific code in shared VM_CollectForMetadataAllocation Changes: https://git.openjdk.java.net/jdk/pull/5282/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5282&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8237567 Stats: 45 lines in 4 files changed: 17 ins; 27 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5282.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5282/head:pull/5282 PR: https://git.openjdk.java.net/jdk/pull/5282 From mli at openjdk.java.net Sat Aug 28 13:50:50 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Sat, 28 Aug 2021 13:50:50 GMT Subject: RFR: 8254167 G1: Record regions where evacuation failed to provide targeted iteration [v2] In-Reply-To: References: Message-ID: > This is another try to optimize evacuation failure for regions.
> I record evacuation failed regions, and iterate these regions directly rather than iterate the whole cset. > The implementation reuses and refactors some of the existing data in g1CollectedHeap (_regions_failed_evacuation, _num_regions_failed_evacuation), records these regions in an array, and iterates this array later in the post-evacuation phase. > > I have 2 implementations: > > - 1) CHT (Initial version) > - 2) bitmap (latest version, reuse _regions_failed_evacuation in g1CollectedHeap) > > This implementation does not consider work distribution as mentioned in JDK-8254167 yet, but it already seems to get a better and more stable performance gain than the original. We could improve it further later if work distribution is necessary. > > I will attach the perf data in JBS. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: reuse original bitmap rather than CHT ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5272/files - new: https://git.openjdk.java.net/jdk/pull/5272/files/c6cde28c..c790b720 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5272&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5272&range=00-01 Stats: 84 lines in 2 files changed: 18 ins; 52 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/5272.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5272/head:pull/5272 PR: https://git.openjdk.java.net/jdk/pull/5272 From tschatzl at openjdk.java.net Mon Aug 30 08:25:30 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 30 Aug 2021 08:25:30 GMT Subject: RFR: 8272611: Use parallel heap inspection when print classhisto info in gc log In-Reply-To: References: Message-ID: On Fri, 27 Aug 2021 09:07:07 GMT, Bin Liao wrote: >> Note: this is probably copy&pasted from `ClassHistogramDCmd::execute`; instead of the copy&paste please at least add something like a static `default_parallel_thread_num()` method to `VM_GC_HeapInspection`.
>> Where do the magic numbers for the formula come from? Please consider using the ergonomically determined number of threads (`CollectedHeap::safepoint_workers()->active_workers()`)? (Maybe change this in another CR as other reviewers think is better, looking whether there is a significant difference between these numbers) > > Thank you for your advice, it seems that (CollectedHeap::safepoint_workers()->active_workers()) looks better. ------------- PR: https://git.openjdk.java.net/jdk/pull/5158 From tschatzl at openjdk.java.net Mon Aug 30 08:25:31 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 30 Aug 2021 08:25:31 GMT Subject: RFR: 8272611: Use parallel heap inspection when print classhisto info in gc log In-Reply-To: References: Message-ID: On Fri, 27 Aug 2021 09:08:34 GMT, Bin Liao wrote: >> src/hotspot/share/gc/shared/collectedHeap.cpp line 552: >> >>> 550: LogStream ls(lt); >>> 551: uint parallel_thread_num = MAX2(1, (uint)os::initial_active_processor_count() * 3 / 8); >>> 552: VM_GC_HeapInspection inspector(&ls, false /* ! full gc */, parallel_thread_num); >> >> Pre-existing: the comment next to a bool constant in a parameter list should show the name of the parameter, i.e. s/! full_gc/request_full_gc/. Maybe change this as appropriate in other callers too. > >> Pre-existing: the comment next to a bool constant in a parameter list should show the name of the parameter, i.e. s/! full_gc/request_full_gc/. Maybe change this as appropriate in other callers too. > > Sounds reasonable, so I will change here in this PR, and the other places in another new pr, is it ok? Your call.
------------- PR: https://git.openjdk.java.net/jdk/pull/5158 From fmatte at openjdk.java.net Mon Aug 30 08:33:29 2021 From: fmatte at openjdk.java.net (Fairoz Matte) Date: Mon, 30 Aug 2021 08:33:29 GMT Subject: RFR: 8272563: Possible assertion failure in CardTableBarrierSetC1 In-Reply-To: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Wed, 18 Aug 2021 12:37:00 GMT, Fairoz Matte wrote: > This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp > > The method CardTableBarrierSetC1::post_barrier generates a move LIR when the TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >> __ move(addr, tmp); > However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register move, which causes an assertion failure because a memory-to-register move requires its arguments to have the same size. > The fix is to check whether the operand is_oop() and emit the mov appropriately. > > No issues found in local testing and Mach5 tier1-3 Hi Igor, Thanks for looking into this, and thanks for the pointers on handling the spills in the logic involved in T_OBJECT to T_LONG moves, and also on checking that the src and dst must both have registers. These conditions can be handled using asserts/guarantees (as the function doesn't return). But as the original patch suggests, we can have a solution by copying over an extra temp register. In https://github.com/openjdk/jdk/blob/d732c3091fea0a7c6d6767227de89002564504e5/src/hotspot/share/c1/c1_LinearScan.cpp#L1135 it was not handled, and it was enforced that we must have both registers.
So I am not sure if we can also handle this scenario, or whether it can be prevented as done in the c1_LinearScan.cpp file. Thanks, ------------- PR: https://git.openjdk.java.net/jdk/pull/5164 From ayang at openjdk.java.net Mon Aug 30 08:44:29 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 30 Aug 2021 08:44:29 GMT Subject: RFR: 8237567: Refactor G1-specific code in shared VM_CollectForMetadataAllocation In-Reply-To: References: Message-ID: On Fri, 27 Aug 2021 15:48:50 GMT, Ivan Walulya wrote: > Hi all, > > Please review this refactoring change to move G1-specific code out of VM_CollectForMetadataAllocation. > > Testing: Tier 1-3. Marked as reviewed by ayang (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5282 From ayang at openjdk.java.net Mon Aug 30 08:57:31 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 30 Aug 2021 08:57:31 GMT Subject: RFR: 8272651: G1 heap region info print order changed by JDK-8269914 [v2] In-Reply-To: References: <31RXx2Yekpu0GXtXT8xFnXBcNbMbWcsLky49Fqp1dOQ=.94639610-dce0-4136-880e-b322c24c34be@github.com> Message-ID: <0B0EgLAY6pNrOiURd_OOczTvVzM8s1oR05wFNs4JqDI=.6d74f1be-c28c-4ba5-91b4-28147333b2f7@github.com> On Thu, 26 Aug 2021 10:37:56 GMT, Stefan Johansson wrote: >> Please review this small change to restore and coordinate the order of the G1 logs. >> >> **Summary** >> Recent changes have slightly re-ordered parts of the G1 logs printed with `-Xlog:gc*` for young and full collections. This change will mostly restore the order but also address some differences we had in the past. The biggest part of this change is making sure the region transitions are printed at the "correct" place. For young collections they are moved to just after the phases and for full collections they are moved to be before the end of the pause log.
>> >> Example logs: >> >> [0,046s][info][gc,start ] GC(0) Pause Young (Concurrent Start) (System.gc()) >> [0,048s][info][gc,task ] GC(0) Using 24 workers of 28 for evacuation >> [0,049s][info][gc,phases ] GC(0) Pre Evacuate Collection Set: 0,1ms >> [0,049s][info][gc,phases ] GC(0) Merge Heap Roots: 0,1ms >> [0,049s][info][gc,phases ] GC(0) Evacuate Collection Set: 0,4ms >> [0,049s][info][gc,phases ] GC(0) Post Evacuate Collection Set: 0,3ms >> [0,049s][info][gc,phases ] GC(0) Other: 2,5ms >> [0,049s][info][gc,heap ] GC(0) Eden regions: 1->0(5) >> [0,049s][info][gc,heap ] GC(0) Survivor regions: 0->1(1) >> [0,049s][info][gc,heap ] GC(0) Old regions: 0->0 >> [0,049s][info][gc,heap ] GC(0) Archive regions: 2->2 >> [0,049s][info][gc,heap ] GC(0) Humongous regions: 0->0 >> [0,049s][info][gc,metaspace] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K) >> [0,049s][info][gc ] GC(0) Pause Young (Concurrent Start) (System.gc()) 9M->8M(1024M) 3,563ms >> [0,049s][info][gc,cpu ] GC(0) User=0,00s Sys=0,00s Real=0,00s >> >> >> [0,044s][info][gc,start ] GC(0) Pause Full (System.gc()) >> [0,044s][info][gc,task ] GC(0) Using 3 workers of 28 for full compaction >> [0,044s][info][gc,phases,start] GC(0) Phase 1: Mark live objects >> [0,044s][info][gc,phases ] GC(0) Phase 1: Mark live objects 0,607ms >> [0,044s][info][gc,phases,start] GC(0) Phase 2: Prepare for compaction >> [0,045s][info][gc,phases ] GC(0) Phase 2: Prepare for compaction 0,772ms >> [0,045s][info][gc,phases,start] GC(0) Phase 3: Adjust pointers >> [0,046s][info][gc,phases ] GC(0) Phase 3: Adjust pointers 0,361ms >> [0,046s][info][gc,phases,start] GC(0) Phase 4: Compact heap >> [0,046s][info][gc,phases ] GC(0) Phase 4: Compact heap 0,142ms >> [0,050s][info][gc,heap ] GC(0) Eden regions: 1->0(2) >> [0,050s][info][gc,heap ] GC(0) Survivor regions: 0->0(0) >> [0,050s][info][gc,heap ] GC(0) Old regions: 0->1 >> [0,050s][info][gc,heap ] GC(0) Archive regions: 2->2 >> 
[0,050s][info][gc,heap ] GC(0) Humongous regions: 0->0 >> [0,050s][info][gc,metaspace ] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K) >> [0,050s][info][gc ] GC(0) Pause Full (System.gc()) 9M->8M(80M) 6,729ms >> [0,050s][info][gc,cpu ] GC(0) User=0,01s Sys=0,02s Real=0,01s >> >> >> **Testing** >> Locally run g1 tests and also some additional testing through mach5. > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > HeapMark needs to be below JFRMark to make sure all GC events are within start and end events. src/hotspot/share/gc/g1/g1FullCollector.hpp line 61: > 59: > 60: // Full GC Mark that holds GC id and CPU time trace. Needs to be separate > 61: // from the G1FullCollector and G1FullGCScope to get consistent logging. I think saying just "consistent logging" here is too terse; some elaboration (background info and/or with/without this separation comparison) could make the reasoning more explicit. ------------- PR: https://git.openjdk.java.net/jdk/pull/5252 From github.com+7947546+tanghaoth90 at openjdk.java.net Mon Aug 30 09:30:40 2021 From: github.com+7947546+tanghaoth90 at openjdk.java.net (Hao Tang) Date: Mon, 30 Aug 2021 09:30:40 GMT Subject: RFR: 8273122: ZGC: Load forwarding entries without acquire semantics Message-ID: JDK-8272138 introduced an unbound membar release() between the object copy and the forwarding table insertion. Thread A (Relocation): copy(); release(); cas_forwarding_table(); cas_self_heal(); The membar guarantees that the contents of the forwarded object are ready after a forwarding entry is loaded. Since load_object_content(ref) depends on the result of load_forwarding_table(), load_acquire can be safely changed to a simple load.
Thread B (Remapping/Relocation): ref = load_forwarding_table(); // acquire (current version) -> relaxed (our proposal) load_object_content(ref); Our experiment on heapothesys demonstrates >5% time reduction spent on concurrent mark/relocation on AArch64. --------- ### Progress - [x] Change must not contain extraneous whitespace - [x] Commit message must refer to an issue - [ ] Change must be properly reviewed ### Reviewing
Using git Checkout this PR locally: \ `$ git fetch https://git.openjdk.java.net/jdk pull/5298/head:pull/5298` \ `$ git checkout pull/5298` Update a local copy of the PR: \ `$ git checkout pull/5298` \ `$ git pull https://git.openjdk.java.net/jdk pull/5298/head`
Using Skara CLI tools Checkout this PR locally: \ `$ git pr checkout 5298` View PR using the GUI difftool: \ `$ git pr show -t 5298`
Using diff file Download this PR as a diff file: \ https://git.openjdk.java.net/jdk/pull/5298.diff
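The release/relaxed argument in the description above can be sketched in portable C++ atomics. This is a simplified model rather than ZGC's code: the forwarding-table CAS is reduced to a plain release store, and all names are hypothetical. Note that under the strict C++ memory model the dependent load would formally need consume/acquire ordering; the proposal relies on the dependency ordering that hardware such as AArch64 guarantees for address-dependent loads.

```cpp
#include <atomic>
#include <cstdint>

struct Forwardee {
  std::uint64_t payload;
};

std::atomic<Forwardee*> forwarding_entry{nullptr};

// Thread A (relocation): copy(); release(); cas_forwarding_table();
// the CAS that installs the entry is simplified to a plain release store.
void publish(Forwardee* copy, std::uint64_t value) {
  copy->payload = value;                                    // copy()
  forwarding_entry.store(copy, std::memory_order_release);  // install entry
}

// Thread B (remapping/relocation): load the entry with relaxed ordering;
// the payload load is address-dependent on the entry load, which hardware
// such as AArch64 keeps ordered without an acquire barrier.
std::uint64_t read_through_entry() {
  Forwardee* f;
  while ((f = forwarding_entry.load(std::memory_order_relaxed)) == nullptr) {
    // spin until the forwarding entry is installed
  }
  return f->payload;  // dependent load of the forwarded contents
}

// Single-threaded demo so the result is deterministic.
std::uint64_t demo() {
  static Forwardee copy_storage;
  publish(&copy_storage, 42);
  return read_through_entry();
}
```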
------------- Commit messages: - ZGC: Load forwardee without acquire semantics Changes: https://git.openjdk.java.net/jdk/pull/5298/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5298&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273122 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5298.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5298/head:pull/5298 PR: https://git.openjdk.java.net/jdk/pull/5298 From github.com+7947546+tanghaoth90 at openjdk.java.net Mon Aug 30 09:30:40 2021 From: github.com+7947546+tanghaoth90 at openjdk.java.net (Hao Tang) Date: Mon, 30 Aug 2021 09:30:40 GMT Subject: RFR: 8273122: ZGC: Load forwarding entries without acquire semantics In-Reply-To: References: Message-ID: <0jr9hQXZzZdiIlXBpVqmUG0tuqt-iye2Db0m0w_dDdI=.a9932327-c73e-414a-9fa8-539d6a3d99e5@github.com> On Mon, 30 Aug 2021 09:17:12 GMT, Hao Tang wrote: > JDK-8272138 introduced an unbound membar release() between the object copy and the forwarding table insertion. > > Thread A (Relocation): > copy(); > release(); > cas_forwarding_table(); > cas_self_heal(); > > The membar guarantees that the contents of the forwarded object are ready after a forwarding entry is loaded. Since load_object_content(ref) depends on the result of load_forwarding_table(), load_acquire can be safely changed to a simple load. > > Thread B (Remapping/Relocation): > ref = load_forwarding_table(); // acquire (current version) -> relaxed (our proposal) > load_object_content(ref); > > Our experiment on heapothesys demonstrates >5% time reduction spent on concurrent mark/relocation on AArch64. > > --------- > ### Progress > - [x] Change must not contain extraneous whitespace > - [x] Commit message must refer to an issue > - [ ] Change must be properly reviewed > > > > > > ### Reviewing >
Using git > > Checkout this PR locally: \ > `$ git fetch https://git.openjdk.java.net/jdk pull/5298/head:pull/5298` \ > `$ git checkout pull/5298` > > Update a local copy of the PR: \ > `$ git checkout pull/5298` \ > `$ git pull https://git.openjdk.java.net/jdk pull/5298/head` > >
>
our experiment (gc1.log: the baseline; gc2.log: our proposal)

$grep "00.*Phase: Concurrent Mark " gc*
gc1.log:[100.399s][info ][gc,stats ] Phase: Concurrent Mark 789.113 / 789.113 568.992 / 1340.385 568.992 / 1340.385 568.992 / 1340.385 ms
gc1.log:[200.399s][info ][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 608.129 / 1340.385 608.129 / 1340.385 608.129 / 1340.385 ms
gc1.log:[300.399s][info ][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 658.501 / 1340.385 658.501 / 1340.385 658.501 / 1340.385 ms
gc1.log:[400.399s][info ][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 688.056 / 1345.466 688.056 / 1345.466 688.056 / 1345.466 ms
gc1.log:[500.400s][info ][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 706.695 / 1345.466 706.695 / 1345.466 706.695 / 1345.466 ms
gc1.log:[600.400s][info ][gc,stats ] Phase: Concurrent Mark 805.184 / 805.184 740.751 / 1405.568 740.751 / 1405.568 740.751 / 1405.568 ms
gc1.log:[700.399s][info ][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 769.357 / 1405.568 736.740 / 1405.568 736.740 / 1405.568 ms
gc1.log:[800.399s][info ][gc,stats ] Phase: Concurrent Mark 1190.918 / 1190.918 779.011 / 1405.568 737.999 / 1405.568 737.999 / 1405.568 ms
gc1.log:[900.399s][info ][gc,stats ] Phase: Concurrent Mark 168.838 / 168.838 772.214 / 1460.519 736.305 / 1460.519 736.305 / 1460.519 ms
gc2.log:[100.436s][info ][gc,stats ] Phase: Concurrent Mark 179.593 / 179.593 529.779 / 1263.909 529.779 / 1263.909 529.779 / 1263.909 ms
gc2.log:[200.436s][info ][gc,stats ] Phase: Concurrent Mark 154.201 / 154.201 661.044 / 1270.214 661.044 / 1270.214 661.044 / 1270.214 ms
gc2.log:[300.436s][info ][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 654.328 / 1270.214 654.328 / 1270.214 654.328 / 1270.214 ms
gc2.log:[400.436s][info ][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 708.091 / 1371.311 708.091 / 1371.311 708.091 / 1371.311 ms
gc2.log:[500.436s][info ][gc,stats ] Phase: Concurrent Mark 133.302 / 133.302 668.686 / 1371.311 668.686 / 1371.311 668.686 / 1371.311 ms
gc2.log:[600.437s][info ][gc,stats ] Phase: Concurrent Mark 137.578 / 137.578 553.064 / 1371.311 553.064 / 1371.311 553.064 / 1371.311 ms
gc2.log:[700.436s][info ][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 611.669 / 1505.184 600.242 / 1505.184 600.242 / 1505.184 ms
gc2.log:[800.436s][info ][gc,stats ] Phase: Concurrent Mark 0.000 / 0.000 674.987 / 2126.355 671.641 / 2126.355 671.641 / 2126.355 ms
gc2.log:[900.436s][info ][gc,stats ] Phase: Concurrent Mark 1463.835 / 1463.835 691.205 / 2126.355 680.207 / 2126.355 680.207 / 2126.355 ms

$grep "00.*Phase: Concurrent Relocate " gc*
gc1.log:[100.399s][info ][gc,stats ] Phase: Concurrent Relocate 142.406 / 142.406 86.449 / 251.164 86.449 / 251.164 86.449 / 251.164 ms
gc1.log:[200.399s][info ][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 103.633 / 251.164 103.633 / 251.164 103.633 / 251.164 ms
gc1.log:[300.399s][info ][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 119.149 / 251.164 119.149 / 251.164 119.149 / 251.164 ms
gc1.log:[400.399s][info ][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 127.896 / 251.164 127.896 / 251.164 127.896 / 251.164 ms
gc1.log:[500.400s][info ][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 131.994 / 251.164 131.994 / 251.164 131.994 / 251.164 ms
gc1.log:[600.400s][info ][gc,stats ] Phase: Concurrent Relocate 159.275 / 159.275 137.536 / 251.164 137.536 / 251.164 137.536 / 251.164 ms
gc1.log:[700.399s][info ][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 150.155 / 226.174 139.784 / 251.164 139.784 / 251.164 ms
gc1.log:[800.399s][info ][gc,stats ] Phase: Concurrent Relocate 225.072 / 225.072 153.125 / 226.174 141.247 / 251.164 141.247 / 251.164 ms
gc1.log:[900.399s][info ][gc,stats ] Phase: Concurrent Relocate 54.794 / 54.794 151.922 / 228.903 141.573 / 251.164 141.573 / 251.164 ms
gc2.log:[100.436s][info ][gc,stats ] Phase: Concurrent Relocate 49.474 / 49.474 100.802 / 258.853 100.802 / 258.853 100.802 / 258.853 ms
gc2.log:[200.436s][info ][gc,stats ] Phase: Concurrent Relocate 48.347 / 48.347 125.454 / 258.853 125.454 / 258.853 125.454 / 258.853 ms
gc2.log:[300.436s][info ][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 129.807 / 258.853 129.807 / 258.853 129.807 / 258.853 ms
gc2.log:[400.436s][info ][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 136.403 / 258.853 136.403 / 258.853 136.403 / 258.853 ms
gc2.log:[500.436s][info ][gc,stats ] Phase: Concurrent Relocate 52.439 / 52.439 130.421 / 258.853 130.421 / 258.853 130.421 / 258.853 ms
gc2.log:[600.437s][info ][gc,stats ] Phase: Concurrent Relocate 48.181 / 48.181 112.995 / 258.853 112.995 / 258.853 112.995 / 258.853 ms
gc2.log:[700.436s][info ][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 124.456 / 240.284 121.156 / 258.853 121.156 / 258.853 ms
gc2.log:[800.436s][info ][gc,stats ] Phase: Concurrent Relocate 0.000 / 0.000 134.948 / 405.144 132.669 / 405.144 132.669 / 405.144 ms
gc2.log:[900.436s][info ][gc,stats ] Phase: Concurrent Relocate 229.356 / 229.356 137.447 / 405.144 135.168 / 405.144 135.168 / 405.144 ms

-------------

PR: https://git.openjdk.java.net/jdk/pull/5298

From github.com+7947546+tanghaoth90 at openjdk.java.net  Mon Aug 30 09:40:29 2021
From: github.com+7947546+tanghaoth90 at openjdk.java.net (Hao Tang)
Date: Mon, 30 Aug 2021 09:40:29 GMT
Subject: RFR: 8273122: ZGC: Load forwarding entries without acquire semantics
In-Reply-To:
References:
Message-ID:

On Mon, 30 Aug 2021 09:17:12 GMT, Hao Tang wrote:

> JDK-8272138 introduced an unbound membar release() between the object copy and the forwarding table insertion.
>
> Thread A (Relocation):
> copy();
> release();
> cas_forwarding_table();
> cas_self_heal();
>
> The membar guarantees that the contents of the forwarded object are ready after a forwarding entry is loaded. Since load_object_content(ref) depends on the result of load_forwarding_table(), load_acquire can be safely changed to a simple load.
>
> Thread B (Remapping/Relocation):
> ref = load_forwarding_table(); // acquire (current version) -> relaxed (our proposal)
> load_object_content(ref);
>
> Our experiment on heapothesys demonstrates >5% time reduction spent on concurrent mark/relocation on AArch64.
>
> ---------
> ### Progress
> - [x] Change must not contain extraneous whitespace
> - [x] Commit message must refer to an issue
> - [ ] Change must be properly reviewed
>
> ### Reviewing
>
@albertnetymk We have discussed this issue in https://github.com/openjdk/jdk/pull/5046/files . Looking forward to your suggestions. ------------- PR: https://git.openjdk.java.net/jdk/pull/5298 From github.com+39413832+weixlu at openjdk.java.net Mon Aug 30 11:17:37 2021 From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu) Date: Mon, 30 Aug 2021 11:17:37 GMT Subject: RFR: 8273127: Shenandoah: Adopt relaxed order for update oop Message-ID: Shenandoah evacuates object and then updates the reference in load barriers. Currently, memory order release is adopted in atomic_update_oop(), and we propose to use relaxed instead. Since memory order conservative is adopted in updating forwardee, this membar ensures that object copy is finished before updating the reference. In the scenario when a thread reads self-healed address from forwardee, relaxed is also fine due to the data dependency of the self-healed address. This relaxation of memory order brings us some performance improvement. We have run specjbb2015 on AArch64. Critical-JOPS increases by 3% while max-JOPS maintains almost the same. 
baseline_1:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 107513, max-jOPS = 102968, critical-jOPS = 37913
baseline_2:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 106769, max-jOPS = 101742, critical-jOPS = 37151
baseline_3:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 110118, max-jOPS = 101742, critical-jOPS = 37041

optimized_1:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 111708, max-jOPS = 101742, critical-jOPS = 37687
optimized_2:RUN RESULT: hbIR (max attempted) = 147077, hbIR (settled) = 122581, max-jOPS = 104425, critical-jOPS = 39663
optimized_3:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 107286, max-jOPS = 102968, critical-jOPS = 38579

-------------

Commit messages:
 - Shenandoah: Adopt relaxed order for update oop

Changes: https://git.openjdk.java.net/jdk/pull/5299/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5299&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8273127
Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
Patch: https://git.openjdk.java.net/jdk/pull/5299.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/5299/head:pull/5299

PR: https://git.openjdk.java.net/jdk/pull/5299

From shade at openjdk.java.net  Mon Aug 30 11:28:25 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Mon, 30 Aug 2021 11:28:25 GMT
Subject: RFR: 8273127: Shenandoah: Adopt relaxed order for update oop
In-Reply-To:
References:
Message-ID:

On Mon, 30 Aug 2021 11:08:22 GMT, Xiaowei Lu wrote:

> Shenandoah evacuates object and then updates the reference in load barriers. Currently, memory order release is adopted in atomic_update_oop(), and we propose to use relaxed instead. Since memory order conservative is adopted in updating forwardee, this membar ensures that object copy is finished before updating the reference. In the scenario when a thread reads self-healed address from forwardee, relaxed is also fine due to the data dependency of the self-healed address.
> This relaxation of memory order brings us some performance improvement. We have run specjbb2015 on AArch64. Critical-JOPS increases by 3% while max-JOPS maintains almost the same. > > baseline_1:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 107513, max-jOPS = 102968, critical-jOPS = 37913 > baseline_2:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 106769, max-jOPS = 101742, critical-jOPS = 37151 > baseline_3:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 110118, max-jOPS = 101742, critical-jOPS = 37041 > > optimized_1:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 111708, max-jOPS = 101742, critical-jOPS = 37687 > optimized_2:RUN RESULT: hbIR (max attempted) = 147077, hbIR (settled) = 122581, max-jOPS = 104425, critical-jOPS = 39663 > optimized_3:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 107286, max-jOPS = 102968, critical-jOPS = 38579 This conflicts with #2496, which would use "release" when updating the forwardee. I remember trying this, and "relaxed" was not sufficient, with observable failures (IIRC, in `tier1` with `-XX:+UseShenandoahGC`). ------------- PR: https://git.openjdk.java.net/jdk/pull/5299 From github.com+3094961+buddyliao at openjdk.java.net Mon Aug 30 11:31:46 2021 From: github.com+3094961+buddyliao at openjdk.java.net (Bin Liao) Date: Mon, 30 Aug 2021 11:31:46 GMT Subject: RFR: 8272611: Use parallel heap inspection when print classhisto info in gc log [v2] In-Reply-To: References: Message-ID: > This patch helps to enhance the heap inspection with parallel threads when print classhisto info in gc log. > > I have built it on linux x86_64 and run test-tier1 successfully > > --------- > ### Progress > - [x] Change must not contain extraneous whitespace > - [x] Commit message must refer to an issue > - [ ] Change must be properly reviewed > > > > ### Reviewing >
Bin Liao has updated the pull request incrementally with one additional commit since the last revision: 8272611: Use parallel heap inspection when print classhisto info in gc log ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5158/files - new: https://git.openjdk.java.net/jdk/pull/5158/files/56cc6022..6ae1b554 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5158&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5158&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5158.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5158/head:pull/5158 PR: https://git.openjdk.java.net/jdk/pull/5158 From shade at openjdk.java.net Mon Aug 30 12:04:38 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 30 Aug 2021 12:04:38 GMT Subject: RFR: 8273127: Shenandoah: Adopt relaxed order for update oop In-Reply-To: References: Message-ID: On Mon, 30 Aug 2021 11:08:22 GMT, Xiaowei Lu wrote: > Shenandoah evacuates object and then updates the reference in load barriers. Currently, memory order release is adopted in atomic_update_oop(), and we propose to use relaxed instead. Since memory order conservative is adopted in updating forwardee, this membar ensures that object copy is finished before updating the reference. In the scenario when a thread reads self-healed address from forwardee, relaxed is also fine due to the data dependency of the self-healed address. > This relaxation of memory order brings us some performance improvement. We have run specjbb2015 on AArch64. Critical-JOPS increases by 3% while max-JOPS maintains almost the same. 
> > baseline_1:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 107513, max-jOPS = 102968, critical-jOPS = 37913 > baseline_2:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 106769, max-jOPS = 101742, critical-jOPS = 37151 > baseline_3:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 110118, max-jOPS = 101742, critical-jOPS = 37041 > > optimized_1:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 111708, max-jOPS = 101742, critical-jOPS = 37687 > optimized_2:RUN RESULT: hbIR (max attempted) = 147077, hbIR (settled) = 122581, max-jOPS = 104425, critical-jOPS = 39663 > optimized_3:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 107286, max-jOPS = 102968, critical-jOPS = 38579 Actually, let me think about it a bit more. Maybe if we do `OrderAccess::release()` in fwdptr installation (thus still relaxing it from `conservative`), we can use `relaxed` during concurrent heap updates. ------------- PR: https://git.openjdk.java.net/jdk/pull/5299 From tschatzl at openjdk.java.net Mon Aug 30 12:10:33 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 30 Aug 2021 12:10:33 GMT Subject: RFR: 8272161: Make evacuation failure data structures local to collection [v2] In-Reply-To: References: Message-ID: <02HptSHZzdMt7l4cjsTQcJ-8qGLlK74vBM5gJ36IZXc=.a3d23ddd-8cad-469a-862b-75f02eff15ad@github.com> On Fri, 27 Aug 2021 12:39:13 GMT, Stefan Johansson wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> Preserved marks stacks must be allocated on C heap. > > Lgtm Thanks @kstefanj and @walulyai for your reviews. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5066 From tschatzl at openjdk.java.net Mon Aug 30 12:10:34 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 30 Aug 2021 12:10:34 GMT Subject: Integrated: 8272161: Make evacuation failure data structures local to collection In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 12:27:16 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that moves some evacuation (failure) related data structures into the per-thread evacuation context (`G1ParThreadScanState`) which removes some young collection related code from G1CollectedHeap? > > Testing: tier1-5 > > Thanks, > Thomas This pull request has now been integrated. Changeset: bb7aa1c6 Author: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/bb7aa1c6a4856827cf05e976215699725b56b87a Stats: 122 lines in 6 files changed: 43 ins; 45 del; 34 mod 8272161: Make evacuation failure data structures local to collection Reviewed-by: iwalulya, sjohanss ------------- PR: https://git.openjdk.java.net/jdk/pull/5066 From tschatzl at openjdk.java.net Mon Aug 30 12:12:28 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 30 Aug 2021 12:12:28 GMT Subject: RFR: 8272611: Use parallel heap inspection when print classhisto info in gc log [v2] In-Reply-To: References: Message-ID: <5oqzAWWnAjX5GdwSi3QY820X6hAUw10LoeTUF8Osi7o=.2b56f0ae-7dc1-4e54-be53-69c4bdcbea99@github.com> On Mon, 30 Aug 2021 11:31:46 GMT, Bin Liao wrote: >> This patch helps to enhance the heap inspection with parallel threads when print classhisto info in gc log. >> >> I have built it on linux x86_64 and run test-tier1 successfully >> >> --------- >> ### Progress >> - [x] Change must not contain extraneous whitespace >> - [x] Commit message must refer to an issue >> - [ ] Change must be properly reviewed >> >> >> >> ### Reviewing >>
> > Bin Liao has updated the pull request incrementally with one additional commit since the last revision: > > 8272611: Use parallel heap inspection when print classhisto info in gc log Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/shared/collectedHeap.cpp line 552: > 550: LogStream ls(lt); > 551: uint parallel_thread_num = safepoint_workers()->active_workers(); > 552: VM_GC_HeapInspection inspector(&ls, false /* request full gc */, parallel_thread_num); I would prefer the same issue would also be fixed in the other place (`ClassHistogramDCmd::execute`), using a static method as suggested. Otherwise the problem will exist forever. ------------- PR: https://git.openjdk.java.net/jdk/pull/5158 From sjohanss at openjdk.java.net Mon Aug 30 12:15:54 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 30 Aug 2021 12:15:54 GMT Subject: RFR: 8272651: G1 heap region info print order changed by JDK-8269914 [v3] In-Reply-To: <31RXx2Yekpu0GXtXT8xFnXBcNbMbWcsLky49Fqp1dOQ=.94639610-dce0-4136-880e-b322c24c34be@github.com> References: <31RXx2Yekpu0GXtXT8xFnXBcNbMbWcsLky49Fqp1dOQ=.94639610-dce0-4136-880e-b322c24c34be@github.com> Message-ID: > Please review this small change to restore and coordinate the order of the G1 logs. > > **Summary** > Recent changes have slightly re-ordered parts of the G1 logs printed with `-Xlog:gc*` for young and full collections. This change will mostly restore the order but also address some differences we had in the past. The biggest part of this change is making sure the region transitions are printed at the "correct" place. For young collections they are moved to just after the phases and for full collections they are moved to be before the end of the pause log. 
>
> Example logs:
>
> [0,046s][info][gc,start ] GC(0) Pause Young (Concurrent Start) (System.gc())
> [0,048s][info][gc,task ] GC(0) Using 24 workers of 28 for evacuation
> [0,049s][info][gc,phases ] GC(0) Pre Evacuate Collection Set: 0,1ms
> [0,049s][info][gc,phases ] GC(0) Merge Heap Roots: 0,1ms
> [0,049s][info][gc,phases ] GC(0) Evacuate Collection Set: 0,4ms
> [0,049s][info][gc,phases ] GC(0) Post Evacuate Collection Set: 0,3ms
> [0,049s][info][gc,phases ] GC(0) Other: 2,5ms
> [0,049s][info][gc,heap ] GC(0) Eden regions: 1->0(5)
> [0,049s][info][gc,heap ] GC(0) Survivor regions: 0->1(1)
> [0,049s][info][gc,heap ] GC(0) Old regions: 0->0
> [0,049s][info][gc,heap ] GC(0) Archive regions: 2->2
> [0,049s][info][gc,heap ] GC(0) Humongous regions: 0->0
> [0,049s][info][gc,metaspace] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K)
> [0,049s][info][gc ] GC(0) Pause Young (Concurrent Start) (System.gc()) 9M->8M(1024M) 3,563ms
> [0,049s][info][gc,cpu ] GC(0) User=0,00s Sys=0,00s Real=0,00s
>
> [0,044s][info][gc,start ] GC(0) Pause Full (System.gc())
> [0,044s][info][gc,task ] GC(0) Using 3 workers of 28 for full compaction
> [0,044s][info][gc,phases,start] GC(0) Phase 1: Mark live objects
> [0,044s][info][gc,phases ] GC(0) Phase 1: Mark live objects 0,607ms
> [0,044s][info][gc,phases,start] GC(0) Phase 2: Prepare for compaction
> [0,045s][info][gc,phases ] GC(0) Phase 2: Prepare for compaction 0,772ms
> [0,045s][info][gc,phases,start] GC(0) Phase 3: Adjust pointers
> [0,046s][info][gc,phases ] GC(0) Phase 3: Adjust pointers 0,361ms
> [0,046s][info][gc,phases,start] GC(0) Phase 4: Compact heap
> [0,046s][info][gc,phases ] GC(0) Phase 4: Compact heap 0,142ms
> [0,050s][info][gc,heap ] GC(0) Eden regions: 1->0(2)
> [0,050s][info][gc,heap ] GC(0) Survivor regions: 0->0(0)
> [0,050s][info][gc,heap ] GC(0) Old regions: 0->1
> [0,050s][info][gc,heap ] GC(0) Archive regions: 2->2
> [0,050s][info][gc,heap ] GC(0) Humongous regions: 0->0
> [0,050s][info][gc,metaspace ] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K)
> [0,050s][info][gc ] GC(0) Pause Full (System.gc()) 9M->8M(80M) 6,729ms
> [0,050s][info][gc,cpu ] GC(0) User=0,01s Sys=0,02s Real=0,01s
>
> **Testing**
> Locally run g1 tests and also some additional testing through mach5.

Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision:

  Albert review.

-------------

Changes:
 - all: https://git.openjdk.java.net/jdk/pull/5252/files
 - new: https://git.openjdk.java.net/jdk/pull/5252/files/56078ab0..360a1f88

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5252&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5252&range=01-02

Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/5252.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/5252/head:pull/5252

PR: https://git.openjdk.java.net/jdk/pull/5252

From sjohanss at openjdk.java.net  Mon Aug 30 12:15:58 2021
From: sjohanss at openjdk.java.net (Stefan Johansson)
Date: Mon, 30 Aug 2021 12:15:58 GMT
Subject: RFR: 8272651: G1 heap region info print order changed by JDK-8269914 [v2]
In-Reply-To: <0B0EgLAY6pNrOiURd_OOczTvVzM8s1oR05wFNs4JqDI=.6d74f1be-c28c-4ba5-91b4-28147333b2f7@github.com>
References: <31RXx2Yekpu0GXtXT8xFnXBcNbMbWcsLky49Fqp1dOQ=.94639610-dce0-4136-880e-b322c24c34be@github.com> <0B0EgLAY6pNrOiURd_OOczTvVzM8s1oR05wFNs4JqDI=.6d74f1be-c28c-4ba5-91b4-28147333b2f7@github.com>
Message-ID: <-ToMYIafTP5C-mNBorTqf_TIiTADgYTca7gER2G2g2s=.9e33e906-33c0-4c9b-bfe2-e4c8e4357528@github.com>

On Mon, 30 Aug 2021 08:54:09 GMT, Albert Mingkun Yang wrote:

>> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision:
>>
>> HeapMark needs to be below JFRMark to make sure all GC events are within the start and end events.
> > src/hotspot/share/gc/g1/g1FullCollector.hpp line 61: > >> 59: >> 60: // Full GC Mark that holds GC id and CPU time trace. Needs to be separate >> 61: // from the G1FullCollector and G1FullGCScope to get consistent logging. > > I think saying just "consistent logging" here is too terse; some elaboration (background info and/or with/without this separation comparison) could make the reasoning more explicit. Suggestion: // from the G1FullCollector and G1FullGCScope to allow the Full GC logging // to have the same structure as the Young GC logging. Would this be enough? I don't want to add to many details as it might easily get stale. But I wanted the comment to let the reader know why this isn't folded in to the other structures. ------------- PR: https://git.openjdk.java.net/jdk/pull/5252 From sjohanss at openjdk.java.net Mon Aug 30 12:30:27 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 30 Aug 2021 12:30:27 GMT Subject: RFR: 8272093: Extract evacuation failure injection from G1CollectedHeap In-Reply-To: References: Message-ID: <2UvMZtXuJmIuERs2FvIMyvWGvSSH6lYD9DBpMn35M2o=.3065431c-9243-4201-9726-8ba5713952d2@github.com> On Mon, 9 Aug 2021 09:25:30 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that extracts out the evacuation failure injection mechanism from `G1CollectedHeap` into its own class? > This reduces the amount of code in `G1CollectedHeap` a bit, moving somewhat uninteresting but somewhat important (testing) code out from there. Also documented it a bit. The other reason was that imho it stuck out a bit due to the ifdef's in `G1CollectedHeap`. > > I checked that gcc does compile out the call to `evacuation_should_fail()` in `G1ParScanThreadState` in product mode. > > Testing: tier1, local testing > > Thanks, > Thomas Looks good, but I think the members should be left out in product builds. 
src/hotspot/share/gc/g1/g1YoungGCEvacFailureInjector.hpp line 49: > 47: > 48: // The number of evacuations between induced failures. > 49: volatile size_t _evacuation_failure_object_count; Any reason to not guard these with `#ifndef PRODUCT`? ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5047 From tschatzl at openjdk.java.net Mon Aug 30 12:34:27 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 30 Aug 2021 12:34:27 GMT Subject: RFR: 8272093: Extract evacuation failure injection from G1CollectedHeap In-Reply-To: <2UvMZtXuJmIuERs2FvIMyvWGvSSH6lYD9DBpMn35M2o=.3065431c-9243-4201-9726-8ba5713952d2@github.com> References: <2UvMZtXuJmIuERs2FvIMyvWGvSSH6lYD9DBpMn35M2o=.3065431c-9243-4201-9726-8ba5713952d2@github.com> Message-ID: On Mon, 30 Aug 2021 12:21:59 GMT, Stefan Johansson wrote: >> Hi all, >> >> can I have reviews for this change that extracts out the evacuation failure injection mechanism from `G1CollectedHeap` into its own class? >> This reduces the amount of code in `G1CollectedHeap` a bit, moving somewhat uninteresting but somewhat important (testing) code out from there. Also documented it a bit. The other reason was that imho it stuck out a bit due to the ifdef's in `G1CollectedHeap`. >> >> I checked that gcc does compile out the call to `evacuation_should_fail()` in `G1ParScanThreadState` in product mode. >> >> Testing: tier1, local testing >> >> Thanks, >> Thomas > > src/hotspot/share/gc/g1/g1YoungGCEvacFailureInjector.hpp line 49: > >> 47: >> 48: // The number of evacuations between induced failures. >> 49: volatile size_t _evacuation_failure_object_count; > > Any reason to not guard these with `#ifndef PRODUCT`? Will do. I need to merge with master anyway. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5047 From tschatzl at openjdk.java.net Mon Aug 30 13:12:45 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 30 Aug 2021 13:12:45 GMT Subject: RFR: 8272093: Extract evacuation failure injection from G1CollectedHeap [v2] In-Reply-To: References: Message-ID: > Hi all, > > can I have reviews for this change that extracts out the evacuation failure injection mechanism from `G1CollectedHeap` into its own class? > This reduces the amount of code in `G1CollectedHeap` a bit, moving somewhat uninteresting but somewhat important (testing) code out from there. Also documented it a bit. The other reason was that imho it stuck out a bit due to the ifdef's in `G1CollectedHeap`. > > I checked that gcc does compile out the call to `evacuation_should_fail()` in `G1ParScanThreadState` in product mode. > > Testing: tier1, local testing > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: - Merge branch 'master' into 8272093-extract-evac-failure-injector - Also remove injector members from product builds - Fix initialization order issue with injector - Extract evac failure injector from G1CollectedHeap ------------- Changes: https://git.openjdk.java.net/jdk/pull/5047/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5047&range=01 Stats: 330 lines in 7 files changed: 214 ins; 110 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/5047.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5047/head:pull/5047 PR: https://git.openjdk.java.net/jdk/pull/5047 From iwalulya at openjdk.java.net Mon Aug 30 13:36:40 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 30 Aug 2021 13:36:40 GMT Subject: RFR: 8159979: During initial mark, preparing all regions for marking may take a significant amount of time Message-ID: <1aCyJ3XrBfjWFjudPPWbw3mZavoz1nhkQRdzJ1xGOzk=.7aab48b5-233e-4917-864b-a4d4b7c26dd4@github.com> Hi all, Please review this change to move the work done during preparation for a concurrent mark start (pre_concurrent_start) into a G1BatchedGangTask. 
Testing: Tier 1-3 ------------- Commit messages: - 8159979: During initial mark, preparing all regions for marking may take a significant amount of time Changes: https://git.openjdk.java.net/jdk/pull/5302/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5302&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8159979 Stats: 121 lines in 5 files changed: 87 ins; 20 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/5302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5302/head:pull/5302 PR: https://git.openjdk.java.net/jdk/pull/5302 From ayang at openjdk.java.net Mon Aug 30 14:13:32 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 30 Aug 2021 14:13:32 GMT Subject: RFR: 8272651: G1 heap region info print order changed by JDK-8269914 [v3] In-Reply-To: References: <31RXx2Yekpu0GXtXT8xFnXBcNbMbWcsLky49Fqp1dOQ=.94639610-dce0-4136-880e-b322c24c34be@github.com> Message-ID: On Mon, 30 Aug 2021 12:15:54 GMT, Stefan Johansson wrote: >> Please review this small change to restore and coordinate the order of the G1 logs. >> >> **Summary** >> Recent changes have slightly re-ordered parts of the G1 logs printed with `-Xlog:gc*` for young and full collections. This change will mostly restore the order but also address some differences we had in the past. The biggest part of this change is making sure the region transitions are printed at the "correct" place. For young collections they are moved to just after the phases and for full collections they are moved to be before the end of the pause log. 
>>
>> Example logs:
>>
>> [0,046s][info][gc,start ] GC(0) Pause Young (Concurrent Start) (System.gc())
>> [0,048s][info][gc,task ] GC(0) Using 24 workers of 28 for evacuation
>> [0,049s][info][gc,phases ] GC(0) Pre Evacuate Collection Set: 0,1ms
>> [0,049s][info][gc,phases ] GC(0) Merge Heap Roots: 0,1ms
>> [0,049s][info][gc,phases ] GC(0) Evacuate Collection Set: 0,4ms
>> [0,049s][info][gc,phases ] GC(0) Post Evacuate Collection Set: 0,3ms
>> [0,049s][info][gc,phases ] GC(0) Other: 2,5ms
>> [0,049s][info][gc,heap ] GC(0) Eden regions: 1->0(5)
>> [0,049s][info][gc,heap ] GC(0) Survivor regions: 0->1(1)
>> [0,049s][info][gc,heap ] GC(0) Old regions: 0->0
>> [0,049s][info][gc,heap ] GC(0) Archive regions: 2->2
>> [0,049s][info][gc,heap ] GC(0) Humongous regions: 0->0
>> [0,049s][info][gc,metaspace] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K)
>> [0,049s][info][gc ] GC(0) Pause Young (Concurrent Start) (System.gc()) 9M->8M(1024M) 3,563ms
>> [0,049s][info][gc,cpu ] GC(0) User=0,00s Sys=0,00s Real=0,00s
>>
>> [0,044s][info][gc,start ] GC(0) Pause Full (System.gc())
>> [0,044s][info][gc,task ] GC(0) Using 3 workers of 28 for full compaction
>> [0,044s][info][gc,phases,start] GC(0) Phase 1: Mark live objects
>> [0,044s][info][gc,phases ] GC(0) Phase 1: Mark live objects 0,607ms
>> [0,044s][info][gc,phases,start] GC(0) Phase 2: Prepare for compaction
>> [0,045s][info][gc,phases ] GC(0) Phase 2: Prepare for compaction 0,772ms
>> [0,045s][info][gc,phases,start] GC(0) Phase 3: Adjust pointers
>> [0,046s][info][gc,phases ] GC(0) Phase 3: Adjust pointers 0,361ms
>> [0,046s][info][gc,phases,start] GC(0) Phase 4: Compact heap
>> [0,046s][info][gc,phases ] GC(0) Phase 4: Compact heap 0,142ms
>> [0,050s][info][gc,heap ] GC(0) Eden regions: 1->0(2)
>> [0,050s][info][gc,heap ] GC(0) Survivor regions: 0->0(0)
>> [0,050s][info][gc,heap ] GC(0) Old regions: 0->1
>> [0,050s][info][gc,heap ] GC(0) Archive regions: 2->2
[0,050s][info][gc,heap ] GC(0) Humongous regions: 0->0 >> [0,050s][info][gc,metaspace ] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K) >> [0,050s][info][gc ] GC(0) Pause Full (System.gc()) 9M->8M(80M) 6,729ms >> [0,050s][info][gc,cpu ] GC(0) User=0,01s Sys=0,02s Real=0,01s >> >> >> **Testing** >> Locally run g1 tests and also some additional testing through mach5. > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > Albert review. Marked as reviewed by ayang (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5252 From github.com+10835776+stsypanov at openjdk.java.net Mon Aug 30 14:36:05 2021 From: github.com+10835776+stsypanov at openjdk.java.net (=?UTF-8?B?0KHQtdGA0LPQtdC5?= =?UTF-8?B?IA==?= =?UTF-8?B?0KbRi9C/0LDQvdC+0LI=?=) Date: Mon, 30 Aug 2021 14:36:05 GMT Subject: RFR: 8273140: Replace usages of Enum.class.getEnumConstants() with Enum.values() where possible Message-ID: <4N-BgoVLVgZk_uPkRLoY-427bF-kKKwDr1zRHI7o1PI=.dcdbec6e-f3f8-48e4-94ce-a88d7e494c41@github.com> Just a very tiny clean-up. There are some places in JDK code base where we call `Enum.class.getEnumConstants()` to get all the values of the referenced `enum`. This is excessive, less-readable and slower than just calling `Enum.values()` as in `getEnumConstants()` we have volatile access: public T[] getEnumConstants() { T[] values = getEnumConstantsShared(); return (values != null) ? values.clone() : null; } private transient volatile T[] enumConstants; T[] getEnumConstantsShared() { T[] constants = enumConstants; if (constants == null) { /* ... 
*/ } return constants; } Calling values() method is slightly faster: @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"}) public class EnumBenchmark { @Benchmark public Enum[] values() { return Enum.values(); } @Benchmark public Enum[] getEnumConstants() { return Enum.class.getEnumConstants(); } private enum Enum { A, B } } Benchmark Mode Cnt Score Error Units EnumBenchmark.getEnumConstants avgt 15 6,265 ± 0,051 ns/op EnumBenchmark.getEnumConstants:·gc.alloc.rate avgt 15 2434,075 ± 19,568 MB/sec EnumBenchmark.getEnumConstants:·gc.alloc.rate.norm avgt 15 24,002 ± 0,001 B/op EnumBenchmark.getEnumConstants:·gc.churn.G1_Eden_Space avgt 15 2433,709 ± 70,216 MB/sec EnumBenchmark.getEnumConstants:·gc.churn.G1_Eden_Space.norm avgt 15 23,998 ± 0,659 B/op EnumBenchmark.getEnumConstants:·gc.churn.G1_Survivor_Space avgt 15 0,009 ± 0,003 MB/sec EnumBenchmark.getEnumConstants:·gc.churn.G1_Survivor_Space.norm avgt 15 ≈ 10⁻⁴ B/op EnumBenchmark.getEnumConstants:·gc.count avgt 15 210,000 counts EnumBenchmark.getEnumConstants:·gc.time avgt 15 119,000 ms EnumBenchmark.values avgt 15 4,164 ± 0,134 ns/op EnumBenchmark.values:·gc.alloc.rate avgt 15 3665,341 ± 120,721 MB/sec EnumBenchmark.values:·gc.alloc.rate.norm avgt 15 24,002 ± 0,001 B/op EnumBenchmark.values:·gc.churn.G1_Eden_Space avgt 15 3660,512 ± 137,250 MB/sec EnumBenchmark.values:·gc.churn.G1_Eden_Space.norm avgt 15 23,972 ± 0,529 B/op EnumBenchmark.values:·gc.churn.G1_Survivor_Space avgt 15 0,017 ± 0,003 MB/sec EnumBenchmark.values:·gc.churn.G1_Survivor_Space.norm avgt 15 ≈ 10⁻⁴
B/op EnumBenchmark.values:·gc.count avgt 15 262,000 counts EnumBenchmark.values:·gc.time avgt 15 155,000 ms ------------- Commit messages: - 8273140: Replace usages of Enum.class.getEnumConstants() with Enum.values() where possible Changes: https://git.openjdk.java.net/jdk/pull/5303/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5303&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273140 Stats: 13 lines in 4 files changed: 0 ins; 2 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/5303.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5303/head:pull/5303 PR: https://git.openjdk.java.net/jdk/pull/5303 From sjohanss at openjdk.java.net Mon Aug 30 14:40:37 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 30 Aug 2021 14:40:37 GMT Subject: RFR: 8272651: G1 heap region info print order changed by JDK-8269914 [v2] In-Reply-To: <6xVr-UcMm2tykxqmPzFpSGT62-pgjdFGdw9zGi7KbmM=.2328159e-aa3a-48af-83d2-645cbde79c89@github.com> References: <31RXx2Yekpu0GXtXT8xFnXBcNbMbWcsLky49Fqp1dOQ=.94639610-dce0-4136-880e-b322c24c34be@github.com> <6xVr-UcMm2tykxqmPzFpSGT62-pgjdFGdw9zGi7KbmM=.2328159e-aa3a-48af-83d2-645cbde79c89@github.com> Message-ID: On Fri, 27 Aug 2021 12:46:53 GMT, Ivan Walulya wrote: >> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: >> >> HeapMark needs to be below JFRMark to make sure all GC events are within start and end event. > > Marked as reviewed by iwalulya (Committer). Thanks @walulyai, @tschatzl and @albertnetymk for reviewing.
------------- PR: https://git.openjdk.java.net/jdk/pull/5252 From sjohanss at openjdk.java.net Mon Aug 30 14:40:38 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 30 Aug 2021 14:40:38 GMT Subject: Integrated: 8272651: G1 heap region info print order changed by JDK-8269914 In-Reply-To: <31RXx2Yekpu0GXtXT8xFnXBcNbMbWcsLky49Fqp1dOQ=.94639610-dce0-4136-880e-b322c24c34be@github.com> References: <31RXx2Yekpu0GXtXT8xFnXBcNbMbWcsLky49Fqp1dOQ=.94639610-dce0-4136-880e-b322c24c34be@github.com> Message-ID: On Wed, 25 Aug 2021 12:40:16 GMT, Stefan Johansson wrote: > Please review this small change to restore and coordinate the order of the G1 logs. > > **Summary** > Recent changes have slightly re-ordered parts of the G1 logs printed with `-Xlog:gc*` for young and full collections. This change will mostly restore the order but also address some differences we had in the past. The biggest part of this change is making sure the region transitions are printed at the "correct" place. For young collections they are moved to just after the phases and for full collections they are moved to be before the end of the pause log. 
> > Example logs: > > [0,046s][info][gc,start ] GC(0) Pause Young (Concurrent Start) (System.gc()) > [0,048s][info][gc,task ] GC(0) Using 24 workers of 28 for evacuation > [0,049s][info][gc,phases ] GC(0) Pre Evacuate Collection Set: 0,1ms > [0,049s][info][gc,phases ] GC(0) Merge Heap Roots: 0,1ms > [0,049s][info][gc,phases ] GC(0) Evacuate Collection Set: 0,4ms > [0,049s][info][gc,phases ] GC(0) Post Evacuate Collection Set: 0,3ms > [0,049s][info][gc,phases ] GC(0) Other: 2,5ms > [0,049s][info][gc,heap ] GC(0) Eden regions: 1->0(5) > [0,049s][info][gc,heap ] GC(0) Survivor regions: 0->1(1) > [0,049s][info][gc,heap ] GC(0) Old regions: 0->0 > [0,049s][info][gc,heap ] GC(0) Archive regions: 2->2 > [0,049s][info][gc,heap ] GC(0) Humongous regions: 0->0 > [0,049s][info][gc,metaspace] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K) > [0,049s][info][gc ] GC(0) Pause Young (Concurrent Start) (System.gc()) 9M->8M(1024M) 3,563ms > [0,049s][info][gc,cpu ] GC(0) User=0,00s Sys=0,00s Real=0,00s > > > [0,044s][info][gc,start ] GC(0) Pause Full (System.gc()) > [0,044s][info][gc,task ] GC(0) Using 3 workers of 28 for full compaction > [0,044s][info][gc,phases,start] GC(0) Phase 1: Mark live objects > [0,044s][info][gc,phases ] GC(0) Phase 1: Mark live objects 0,607ms > [0,044s][info][gc,phases,start] GC(0) Phase 2: Prepare for compaction > [0,045s][info][gc,phases ] GC(0) Phase 2: Prepare for compaction 0,772ms > [0,045s][info][gc,phases,start] GC(0) Phase 3: Adjust pointers > [0,046s][info][gc,phases ] GC(0) Phase 3: Adjust pointers 0,361ms > [0,046s][info][gc,phases,start] GC(0) Phase 4: Compact heap > [0,046s][info][gc,phases ] GC(0) Phase 4: Compact heap 0,142ms > [0,050s][info][gc,heap ] GC(0) Eden regions: 1->0(2) > [0,050s][info][gc,heap ] GC(0) Survivor regions: 0->0(0) > [0,050s][info][gc,heap ] GC(0) Old regions: 0->1 > [0,050s][info][gc,heap ] GC(0) Archive regions: 2->2 > [0,050s][info][gc,heap ] GC(0) Humongous regions: 
0->0 > [0,050s][info][gc,metaspace ] GC(0) Metaspace: 73K(320K)->73K(320K) NonClass: 70K(192K)->70K(192K) Class: 3K(128K)->3K(128K) > [0,050s][info][gc ] GC(0) Pause Full (System.gc()) 9M->8M(80M) 6,729ms > [0,050s][info][gc,cpu ] GC(0) User=0,01s Sys=0,02s Real=0,01s > > > **Testing** > Locally run g1 tests and also some additional testing through mach5. This pull request has now been integrated. Changeset: f11e099a Author: Stefan Johansson URL: https://git.openjdk.java.net/jdk/commit/f11e099a149adfecc474ba37276ec8672067d090 Stats: 41 lines in 5 files changed: 21 ins; 13 del; 7 mod 8272651: G1 heap region info print order changed by JDK-8269914 Reviewed-by: tschatzl, iwalulya, ayang ------------- PR: https://git.openjdk.java.net/jdk/pull/5252 From iwalulya at openjdk.java.net Mon Aug 30 15:18:49 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 30 Aug 2021 15:18:49 GMT Subject: RFR: 8159979: During initial mark, preparing all regions for marking may take a significant amount of time [v2] In-Reply-To: <1aCyJ3XrBfjWFjudPPWbw3mZavoz1nhkQRdzJ1xGOzk=.7aab48b5-233e-4917-864b-a4d4b7c26dd4@github.com> References: <1aCyJ3XrBfjWFjudPPWbw3mZavoz1nhkQRdzJ1xGOzk=.7aab48b5-233e-4917-864b-a4d4b7c26dd4@github.com> Message-ID: > Hi all, > > Please review this change to move the work done during preparation for a concurrent mark start (pre_concurrent_start) into a G1BatchedGangTask. 
> > Testing: Tier 1-3 Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: typo ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5302/files - new: https://git.openjdk.java.net/jdk/pull/5302/files/19a33196..f7db3b1b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5302&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5302&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5302/head:pull/5302 PR: https://git.openjdk.java.net/jdk/pull/5302 From tschatzl at openjdk.java.net Mon Aug 30 15:47:45 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 30 Aug 2021 15:47:45 GMT Subject: RFR: 8273144: Remove unused top level "Sample Collection Set Candidates" logging Message-ID: Hi all, please review this trivial(?) removal of a log message and associated unused code. This is a leftover from JDK-8017163 and the merge with the BatchedGangTask API (there is a "Sample CSet Candidates" log which is used in post evacuation phase 2). 
Testing: changed TestGCLogMessages test case, compilation Thanks, Thomas ------------- Commit messages: - Remove unused log entry Changes: https://git.openjdk.java.net/jdk/pull/5304/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5304&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273144 Stats: 10 lines in 3 files changed: 0 ins; 10 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5304.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5304/head:pull/5304 PR: https://git.openjdk.java.net/jdk/pull/5304 From iveresov at openjdk.java.net Mon Aug 30 15:50:32 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Mon, 30 Aug 2021 15:50:32 GMT Subject: RFR: 8272563: Possible assertion failure in CardTableBarrierSetC1 In-Reply-To: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Wed, 18 Aug 2021 12:37:00 GMT, Fairoz Matte wrote: > This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp > > The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >> __ move(addr, tmp); > However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. > The fix is to check if it is is_oop() and call the mov appropriately. > > No issues found in local testing and Mach5 tier1-3 Well, an extra temp register won't solve it, right? You're just getting lucky that tmp2 is not getting spilled. 
If that happens you get exactly the same situation as before, don't you? ------------- PR: https://git.openjdk.java.net/jdk/pull/5164 From tschatzl at openjdk.java.net Mon Aug 30 15:57:27 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 30 Aug 2021 15:57:27 GMT Subject: RFR: 8159979: During initial mark, preparing all regions for marking may take a significant amount of time [v2] In-Reply-To: References: <1aCyJ3XrBfjWFjudPPWbw3mZavoz1nhkQRdzJ1xGOzk=.7aab48b5-233e-4917-864b-a4d4b7c26dd4@github.com> Message-ID: <26gtMDd9QqWt-SRYSOp1UDMr_rWQcNp3i0AIdlMezhw=.26035865-9db0-445e-bd08-56b272bd976e@github.com> On Mon, 30 Aug 2021 15:18:49 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to move the work done during preparation for a concurrent mark start (pre_concurrent_start) into a G1BatchedGangTask. >> >> Testing: Tier 1-3 > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > typo The new log messages should be added to TestGCLogMessages.java. ------------- Changes requested by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5302 From tschatzl at openjdk.java.net Mon Aug 30 15:58:39 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 30 Aug 2021 15:58:39 GMT Subject: RFR: 8272093: Extract evacuation failure injection from G1CollectedHeap [v2] In-Reply-To: <2UvMZtXuJmIuERs2FvIMyvWGvSSH6lYD9DBpMn35M2o=.3065431c-9243-4201-9726-8ba5713952d2@github.com> References: <2UvMZtXuJmIuERs2FvIMyvWGvSSH6lYD9DBpMn35M2o=.3065431c-9243-4201-9726-8ba5713952d2@github.com> Message-ID: On Mon, 30 Aug 2021 12:27:52 GMT, Stefan Johansson wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: >> >> - Merge branch 'master' into 8272093-extract-evac-failure-injector >> - Also remove injector members from product builds >> - Fix initialization order issue with injector >> - Extract evac failure injector from G1CollectedHeap > > Looks good, but I think the members should be left out in product builds. Thanks @kstefanj @albertnetymk for your reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/5047 From tschatzl at openjdk.java.net Mon Aug 30 15:58:40 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 30 Aug 2021 15:58:40 GMT Subject: Integrated: 8272093: Extract evacuation failure injection from G1CollectedHeap In-Reply-To: References: Message-ID: <1VTUHwjx44GcKCf5PFqwzN-KcgQ7rPazlWRtMVaVaCA=.dd80b2eb-b5f6-4d34-9011-b947757aba46@github.com> On Mon, 9 Aug 2021 09:25:30 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that extracts out the evacuation failure injection mechanism from `G1CollectedHeap` into its own class? > This reduces the amount of code in `G1CollectedHeap` a bit, moving somewhat uninteresting but somewhat important (testing) code out from there. Also documented it a bit. The other reason was that imho it stuck out a bit due to the ifdef's in `G1CollectedHeap`. > > I checked that gcc does compile out the call to `evacuation_should_fail()` in `G1ParScanThreadState` in product mode. > > Testing: tier1, local testing > > Thanks, > Thomas This pull request has now been integrated. 
Changeset: 7a01ba65 Author: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/7a01ba6528923569c5e8d2e9d138d38e95aa4faf Stats: 330 lines in 7 files changed: 214 ins; 110 del; 6 mod 8272093: Extract evacuation failure injection from G1CollectedHeap Reviewed-by: ayang, sjohanss ------------- PR: https://git.openjdk.java.net/jdk/pull/5047 From shade at openjdk.java.net Mon Aug 30 16:25:26 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 30 Aug 2021 16:25:26 GMT Subject: RFR: 8273127: Shenandoah: Adopt relaxed order for update oop In-Reply-To: References: Message-ID: On Mon, 30 Aug 2021 11:08:22 GMT, Xiaowei Lu wrote: > Shenandoah evacuates object and then updates the reference in load barriers. Currently, memory order release is adopted in atomic_update_oop(), and we propose to use relaxed instead. Since memory order conservative is adopted in updating forwardee, this membar ensures that object copy is finished before updating the reference. In the scenario when a thread reads self-healed address from forwardee, relaxed is also fine due to the data dependency of the self-healed address. > This relaxation of memory order brings us some performance improvement. We have run specjbb2015 on AArch64. Critical-JOPS increases by 3% while max-JOPS maintains almost the same. 
> > baseline_1:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 107513, max-jOPS = 102968, critical-jOPS = 37913 > baseline_2:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 106769, max-jOPS = 101742, critical-jOPS = 37151 > baseline_3:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 110118, max-jOPS = 101742, critical-jOPS = 37041 > > optimized_1:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 111708, max-jOPS = 101742, critical-jOPS = 37687 > optimized_2:RUN RESULT: hbIR (max attempted) = 147077, hbIR (settled) = 122581, max-jOPS = 104425, critical-jOPS = 39663 > optimized_3:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 107286, max-jOPS = 102968, critical-jOPS = 38579 I believe this is fine, yet I am checking other places and running concurrency-heavy tests. The comment deserves to be updated as well, like this: https://cr.openjdk.java.net/~shade/8273127/update-comments.patch -- what do you think? ------------- PR: https://git.openjdk.java.net/jdk/pull/5299 From tschatzl at openjdk.java.net Mon Aug 30 16:53:59 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 30 Aug 2021 16:53:59 GMT Subject: RFR: 8273144: Remove unused top level "Sample Collection Set Candidates" logging [v2] In-Reply-To: References: Message-ID: > Hi all, > > please review this trivial(?) removal of a log message and associated unused code. > > This is a leftover from JDK-8017163 and the merge with the BatchedGangTask API (there is a "Sample CSet Candidates" log which is used in post evacuation phase 2). 
> > Testing: changed TestGCLogMessages test case, compilation > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Fixes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5304/files - new: https://git.openjdk.java.net/jdk/pull/5304/files/09d86250..7241cfc9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5304&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5304&range=00-01 Stats: 68 lines in 1 file changed: 39 ins; 26 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5304.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5304/head:pull/5304 PR: https://git.openjdk.java.net/jdk/pull/5304 From tschatzl at openjdk.java.net Mon Aug 30 17:07:43 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 30 Aug 2021 17:07:43 GMT Subject: RFR: 8273147: Update and restructure TestGCLogMessages log message list Message-ID: Hi all, please review this change that updates the TestGCLogMessages file for more or less recently added log messages. Test: executing test case Thanks, Thomas ------------- Depends on: https://git.openjdk.java.net/jdk/pull/5304 Commit messages: - Fixes Changes: https://git.openjdk.java.net/jdk/pull/5306/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5306&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273147 Stats: 68 lines in 1 file changed: 39 ins; 26 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5306.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5306/head:pull/5306 PR: https://git.openjdk.java.net/jdk/pull/5306 From tschatzl at openjdk.java.net Mon Aug 30 17:08:50 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 30 Aug 2021 17:08:50 GMT Subject: RFR: 8273144: Remove unused top level "Sample Collection Set Candidates" logging [v3] In-Reply-To: References: Message-ID: > Hi all, > > please review this trivial(?) 
removal of a log message and associated unused code. > > This is a leftover from JDK-8017163 and the merge with the BatchedGangTask API (there is a "Sample CSet Candidates" log which is used in post evacuation phase 2). > > Testing: changed TestGCLogMessages test case, compilation > > Thanks, > Thomas Thomas Schatzl has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5304/files - new: https://git.openjdk.java.net/jdk/pull/5304/files/7241cfc9..09d86250 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5304&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5304&range=01-02 Stats: 67 lines in 1 file changed: 25 ins; 38 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/5304.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5304/head:pull/5304 PR: https://git.openjdk.java.net/jdk/pull/5304 From darcy at openjdk.java.net Mon Aug 30 17:57:31 2021 From: darcy at openjdk.java.net (Joe Darcy) Date: Mon, 30 Aug 2021 17:57:31 GMT Subject: RFR: 8273140: Replace usages of Enum.class.getEnumConstants() with Enum.values() where possible In-Reply-To: <4N-BgoVLVgZk_uPkRLoY-427bF-kKKwDr1zRHI7o1PI=.dcdbec6e-f3f8-48e4-94ce-a88d7e494c41@github.com> References: <4N-BgoVLVgZk_uPkRLoY-427bF-kKKwDr1zRHI7o1PI=.dcdbec6e-f3f8-48e4-94ce-a88d7e494c41@github.com> Message-ID: On Mon, 30 Aug 2021 14:26:56 GMT, ?????? ??????? wrote: > Just a very tiny clean-up. > > There are some places in JDK code base where we call `Enum.class.getEnumConstants()` to get all the values of the referenced `enum`. This is excessive, less-readable and slower than just calling `Enum.values()` as in `getEnumConstants()` we have volatile access: > > public T[] getEnumConstants() { > T[] values = getEnumConstantsShared(); > return (values != null) ? 
values.clone() : null; > } > > private transient volatile T[] enumConstants; > > T[] getEnumConstantsShared() { > T[] constants = enumConstants; > if (constants == null) { /* ... */ } > return constants; > } > > Calling values() method is slightly faster: > > @BenchmarkMode(Mode.AverageTime) > @OutputTimeUnit(TimeUnit.NANOSECONDS) > @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"}) > public class EnumBenchmark { > > @Benchmark > public Enum[] values() { > return Enum.values(); > } > > @Benchmark > public Enum[] getEnumConstants() { > return Enum.class.getEnumConstants(); > } > > private enum Enum { > A, > B > } > } > > > Benchmark Mode Cnt Score Error Units > EnumBenchmark.getEnumConstants avgt 15 6,265 ± 0,051 ns/op > EnumBenchmark.getEnumConstants:·gc.alloc.rate avgt 15 2434,075 ± 19,568 MB/sec > EnumBenchmark.getEnumConstants:·gc.alloc.rate.norm avgt 15 24,002 ± 0,001 B/op > EnumBenchmark.getEnumConstants:·gc.churn.G1_Eden_Space avgt 15 2433,709 ± 70,216 MB/sec > EnumBenchmark.getEnumConstants:·gc.churn.G1_Eden_Space.norm avgt 15 23,998 ± 0,659 B/op > EnumBenchmark.getEnumConstants:·gc.churn.G1_Survivor_Space avgt 15 0,009 ± 0,003 MB/sec > EnumBenchmark.getEnumConstants:·gc.churn.G1_Survivor_Space.norm avgt 15 ≈ 10⁻⁴ B/op > EnumBenchmark.getEnumConstants:·gc.count avgt 15 210,000 counts > EnumBenchmark.getEnumConstants:·gc.time avgt 15 119,000 ms > > EnumBenchmark.values avgt 15 4,164 ± 0,134 ns/op > EnumBenchmark.values:·gc.alloc.rate avgt 15 3665,341 ± 120,721 MB/sec > EnumBenchmark.values:·gc.alloc.rate.norm avgt 15 24,002 ± 0,001 B/op > EnumBenchmark.values:·gc.churn.G1_Eden_Space avgt 15 3660,512 ± 137,250 MB/sec > EnumBenchmark.values:·gc.churn.G1_Eden_Space.norm avgt 15 23,972 ± 0,529 B/op > EnumBenchmark.values:·gc.churn.G1_Survivor_Space avgt 15 0,017 ± 0,003 MB/sec > EnumBenchmark.values:·gc.churn.G1_Survivor_Space.norm avgt 15 ≈ 10⁻⁴
B/op > EnumBenchmark.values:·gc.count avgt 15 262,000 counts > EnumBenchmark.values:·gc.time avgt 15 155,000 ms src/java.desktop/share/classes/sun/font/AttributeValues.java line 2: > 1: /* > 2: * Copyright (c) 2004, 2014, Oracle and/or its affiliates. All rights reserved. Per OpenJDK conventions, the original copyright year must be preserved when a file is updated so "Copyright (c) 2004, 2021," in this case. ------------- PR: https://git.openjdk.java.net/jdk/pull/5303 From ayang at openjdk.java.net Mon Aug 30 18:41:33 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 30 Aug 2021 18:41:33 GMT Subject: RFR: 8271880: Tighten condition for excluding regions from collecting cards with cross-references [v5] In-Reply-To: References: Message-ID: On Tue, 10 Aug 2021 08:41:08 GMT, Thomas Schatzl wrote: >> Hi, >> >> can I have reviews for this change that by tightening the condition for excluding regions from collecting cards with cross-references allows us to avoid the rescanning of objects that failed evacuation in the fix up self-forwards phase after evacuation failure. >> >> I.e. during gc g1 never collects cards/references from the young gen (including eden) for later refinement - which means that we need to rescan all live objects remaining in eden regions for cross-references. >> >> The problem or the reason why we did that was that we did not want to add cards to refine from survivor regions (i.e. next gc's young gen) because we just don't need to as we always collect young gen, so references from there need not be recorded in the remembered sets (and actually, if we did, we erroneously marked cards in young gen which later card table verification will not like) - but we did not have that information on hand anywhere already quickly accessible. >> >> This change solves that problem by actually recording this information in the region attribute table as "NewSurvivor" type region.
"NewSurvivor" because I did want to make explicit that these are the survivor regions from the *new* (next) young generation (i.e. just survivor) and not the survivor regions of the previous gc (that were turned eden at the start of this gc) but something like "NewYoung" or so would be fine with me as well (or certainly just "Survivor", but that might be confusing). >> >> Another interesting addition is probably the new assert in `G1ParThreadScanState::enqueue_card_if_tracked` >> >> >> assert(!_g1h->heap_region_containing(o)->in_collection_set(), "Should not try to enqueue reference into collection set region"); >> >> This, at this time, verifies the assumption that g1 is not trying to collect references *to* the collection set, i.e. other objects that failed evacuation - after all we later relabel their regions as old *without* a remembered set; we would do otherwise unnecessarily because the reason is that (currently) cset tracking for these regions is enabled (at least during gc - we only later relabel and drop the remembered sets). >> >> This might change later if we want to move evacuation failed regions into survivor (or keep their remembered sets for some reason), but for now we filter attempts to add cards in the dcqs for those this way. >> >> Testing: tier1-5, gc/g1 with `JAVA_OPTIONS_=-XX+G1EvacuationFailureALot -XX:+VerifyAfterGC`. >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into evac-failure-no-scan-during-remove-self-forwards > - More updates to comment > - Clarify comment > - ayang review > - Some random changes > - No scan during self forward The patch looks fine. Some confusion around assertions and comments. Re // ... 
assert(!dest_attr.is_young() || _g1h->heap_region_containing(to_array)->is_survivor(), "must be"); G1SkipCardEnqueueSetter x(&_scanner, dest_attr.is_young()); Even after reading the preceding comments, I can't follow the assertion logic, or appreciate why it's related to the constructor. I wonder if the following revised assertion/comment is correct and clearer. // Skip the card enqueue iff the objective (to_array) is in a survivor region. However, HeapRegion::is_survivor() is too expensive here. // Instead, we use dest_attr.is_young() because the two values are always equal: successfully allocated young regions must be survivor regions. assert(dest_attr.is_young() == _g1h->heap_region_containing(to_array)->is_survivor(), "must be"); G1SkipCardEnqueueSetter x(&_scanner, dest_attr.is_young()); ------------- Marked as reviewed by ayang (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5037 From iwalulya at openjdk.java.net Mon Aug 30 18:57:27 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 30 Aug 2021 18:57:27 GMT Subject: RFR: 8273144: Remove unused top level "Sample Collection Set Candidates" logging [v3] In-Reply-To: References: Message-ID: <0I5iMeeArWlFVuav3dijQmvtgYx4kepUl2etkgtVRD0=.47499f66-9889-4b17-8043-65862eace821@github.com> On Mon, 30 Aug 2021 17:08:50 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this trivial(?) removal of a log message and associated unused code. >> >> This is a leftover from JDK-8017163 and the merge with the BatchedGangTask API (there is a "Sample CSet Candidates" log which is used in post evacuation phase 2). >> >> Testing: changed TestGCLogMessages test case, compilation >> >> Thanks, >> Thomas > > Thomas Schatzl has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Marked as reviewed by iwalulya (Committer).
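[Editorial note: the review exchange above turns on the difference between asserting an implication (`!a || b`) and an equivalence (`a == b`). A minimal standalone Java sketch of that distinction follows; the predicate names mirror the HotSpot code but the class itself is illustrative, not part of the patch.]

```java
public class AssertionStrengthDemo {
    // Shape of the original assertion: is_young() implies is_survivor().
    public static boolean implication(boolean young, boolean survivor) {
        return !young || survivor;
    }

    // Shape of the revised assertion: the cheap proxy (is_young) and the
    // expensive predicate (is_survivor) must always agree.
    public static boolean equivalence(boolean young, boolean survivor) {
        return young == survivor;
    }

    public static void main(String[] args) {
        // A non-young survivor region satisfies the implication...
        System.out.println(implication(false, true));  // true
        // ...but fails the equivalence, so the revised assert also catches
        // the case where the proxy disagrees with the real predicate.
        System.out.println(equivalence(false, true));  // false
    }
}
```

In other words, the revised assert is strictly stronger: it rejects every state the original rejects, plus the disagreement case shown above.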
------------- PR: https://git.openjdk.java.net/jdk/pull/5304 From iwalulya at openjdk.java.net Mon Aug 30 19:05:31 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 30 Aug 2021 19:05:31 GMT Subject: RFR: 8273147: Update and restructure TestGCLogMessages log message list In-Reply-To: References: Message-ID: On Mon, 30 Aug 2021 17:01:17 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that updates the TestGCLogMessages file for more or less recently added log messages. > > Test: executing test case > > Thanks, > Thomas Lgtm! ------------- Marked as reviewed by iwalulya (Committer). PR: https://git.openjdk.java.net/jdk/pull/5306 From github.com+10835776+stsypanov at openjdk.java.net Mon Aug 30 19:20:53 2021 From: github.com+10835776+stsypanov at openjdk.java.net (=?UTF-8?B?0KHQtdGA0LPQtdC5?= =?UTF-8?B?IA==?= =?UTF-8?B?0KbRi9C/0LDQvdC+0LI=?=) Date: Mon, 30 Aug 2021 19:20:53 GMT Subject: RFR: 8273140: Replace usages of Enum.class.getEnumConstants() with Enum.values() where possible [v2] In-Reply-To: <4N-BgoVLVgZk_uPkRLoY-427bF-kKKwDr1zRHI7o1PI=.dcdbec6e-f3f8-48e4-94ce-a88d7e494c41@github.com> References: <4N-BgoVLVgZk_uPkRLoY-427bF-kKKwDr1zRHI7o1PI=.dcdbec6e-f3f8-48e4-94ce-a88d7e494c41@github.com> Message-ID: > Just a very tiny clean-up. > > There are some places in JDK code base where we call `Enum.class.getEnumConstants()` to get all the values of the referenced `enum`. This is excessive, less-readable and slower than just calling `Enum.values()` as in `getEnumConstants()` we have volatile access: > > public T[] getEnumConstants() { > T[] values = getEnumConstantsShared(); > return (values != null) ? values.clone() : null; > } > > private transient volatile T[] enumConstants; > > T[] getEnumConstantsShared() { > T[] constants = enumConstants; > if (constants == null) { /* ... 
*/ } > return constants; > } > > Calling values() method is slightly faster: > > @BenchmarkMode(Mode.AverageTime) > @OutputTimeUnit(TimeUnit.NANOSECONDS) > @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"}) > public class EnumBenchmark { > > @Benchmark > public Enum[] values() { > return Enum.values(); > } > > @Benchmark > public Enum[] getEnumConstants() { > return Enum.class.getEnumConstants(); > } > > private enum Enum { > A, > B > } > } > > > Benchmark Mode Cnt Score Error Units > EnumBenchmark.getEnumConstants avgt 15 6,265 ± 0,051 ns/op > EnumBenchmark.getEnumConstants:·gc.alloc.rate avgt 15 2434,075 ± 19,568 MB/sec > EnumBenchmark.getEnumConstants:·gc.alloc.rate.norm avgt 15 24,002 ± 0,001 B/op > EnumBenchmark.getEnumConstants:·gc.churn.G1_Eden_Space avgt 15 2433,709 ± 70,216 MB/sec > EnumBenchmark.getEnumConstants:·gc.churn.G1_Eden_Space.norm avgt 15 23,998 ± 0,659 B/op > EnumBenchmark.getEnumConstants:·gc.churn.G1_Survivor_Space avgt 15 0,009 ± 0,003 MB/sec > EnumBenchmark.getEnumConstants:·gc.churn.G1_Survivor_Space.norm avgt 15 ≈ 10⁻⁴ B/op > EnumBenchmark.getEnumConstants:·gc.count avgt 15 210,000 counts > EnumBenchmark.getEnumConstants:·gc.time avgt 15 119,000 ms > > EnumBenchmark.values avgt 15 4,164 ± 0,134 ns/op > EnumBenchmark.values:·gc.alloc.rate avgt 15 3665,341 ± 120,721 MB/sec > EnumBenchmark.values:·gc.alloc.rate.norm avgt 15 24,002 ± 0,001 B/op > EnumBenchmark.values:·gc.churn.G1_Eden_Space avgt 15 3660,512 ± 137,250 MB/sec > EnumBenchmark.values:·gc.churn.G1_Eden_Space.norm avgt 15 23,972 ± 0,529 B/op > EnumBenchmark.values:·gc.churn.G1_Survivor_Space avgt 15 0,017 ± 0,003 MB/sec > EnumBenchmark.values:·gc.churn.G1_Survivor_Space.norm avgt 15 ≈ 10⁻⁴ B/op > EnumBenchmark.values:·gc.count avgt 15 262,000 counts > EnumBenchmark.values:·gc.time avgt 15 155,000 ms Сергей Цыпанов
has updated the pull request incrementally with one additional commit since the last revision: 8273140: Fix copyright year ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5303/files - new: https://git.openjdk.java.net/jdk/pull/5303/files/90319b2f..1cfb0af3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5303&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5303&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5303.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5303/head:pull/5303 PR: https://git.openjdk.java.net/jdk/pull/5303 From github.com+10835776+stsypanov at openjdk.java.net Mon Aug 30 19:20:56 2021 From: github.com+10835776+stsypanov at openjdk.java.net (=?UTF-8?B?0KHQtdGA0LPQtdC5?= =?UTF-8?B?IA==?= =?UTF-8?B?0KbRi9C/0LDQvdC+0LI=?=) Date: Mon, 30 Aug 2021 19:20:56 GMT Subject: RFR: 8273140: Replace usages of Enum.class.getEnumConstants() with Enum.values() where possible [v2] In-Reply-To: References: <4N-BgoVLVgZk_uPkRLoY-427bF-kKKwDr1zRHI7o1PI=.dcdbec6e-f3f8-48e4-94ce-a88d7e494c41@github.com> Message-ID: On Mon, 30 Aug 2021 17:54:45 GMT, Joe Darcy wrote: >> Сергей Цыпанов has updated the pull request incrementally with one additional commit since the last revision: >> >> 8273140: Fix copyright year > > src/java.desktop/share/classes/sun/font/AttributeValues.java line 2: > >> 1: /* >> 2: * Copyright (c) 2004, 2014, Oracle and/or its affiliates. All rights reserved. > > Per OpenJDK conventions, the original copyright year must be preserved when a file is updated so > > "Copyright (c) 2004, 2021," > > in this case.
Done ------------- PR: https://git.openjdk.java.net/jdk/pull/5303 From github.com+10835776+stsypanov at openjdk.java.net Mon Aug 30 19:23:57 2021 From: github.com+10835776+stsypanov at openjdk.java.net (=?UTF-8?B?0KHQtdGA0LPQtdC5?= =?UTF-8?B?IA==?= =?UTF-8?B?0KbRi9C/0LDQvdC+0LI=?=) Date: Mon, 30 Aug 2021 19:23:57 GMT Subject: RFR: 8273140: Replace usages of Enum.class.getEnumConstants() with Enum.values() where possible [v3] In-Reply-To: <4N-BgoVLVgZk_uPkRLoY-427bF-kKKwDr1zRHI7o1PI=.dcdbec6e-f3f8-48e4-94ce-a88d7e494c41@github.com> References: <4N-BgoVLVgZk_uPkRLoY-427bF-kKKwDr1zRHI7o1PI=.dcdbec6e-f3f8-48e4-94ce-a88d7e494c41@github.com> Message-ID: > Just a very tiny clean-up. > > There are some places in JDK code base where we call `Enum.class.getEnumConstants()` to get all the values of the referenced `enum`. This is excessive, less-readable and slower than just calling `Enum.values()` as in `getEnumConstants()` we have volatile access: > > public T[] getEnumConstants() { > T[] values = getEnumConstantsShared(); > return (values != null) ? values.clone() : null; > } > > private transient volatile T[] enumConstants; > > T[] getEnumConstantsShared() { > T[] constants = enumConstants; > if (constants == null) { /* ... */ } > return constants; > } > > Calling values() method is slightly faster: > > @BenchmarkMode(Mode.AverageTime) > @OutputTimeUnit(TimeUnit.NANOSECONDS) > @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"}) > public class EnumBenchmark { > > @Benchmark > public Enum[] values() { > return Enum.values(); > } > > @Benchmark > public Enum[] getEnumConstants() { > return Enum.class.getEnumConstants(); > } > > private enum Enum { > A, > B > } > } > > > Benchmark Mode Cnt Score Error Units > EnumBenchmark.getEnumConstants avgt 15 6,265 ± 0,051 ns/op > EnumBenchmark.getEnumConstants:·gc.alloc.rate avgt 15 2434,075 ± 19,568 MB/sec > EnumBenchmark.getEnumConstants:·gc.alloc.rate.norm avgt 15 24,002 ±
0,001 B/op > EnumBenchmark.getEnumConstants:·gc.churn.G1_Eden_Space avgt 15 2433,709 ± 70,216 MB/sec > EnumBenchmark.getEnumConstants:·gc.churn.G1_Eden_Space.norm avgt 15 23,998 ± 0,659 B/op > EnumBenchmark.getEnumConstants:·gc.churn.G1_Survivor_Space avgt 15 0,009 ± 0,003 MB/sec > EnumBenchmark.getEnumConstants:·gc.churn.G1_Survivor_Space.norm avgt 15 ≈ 10⁻⁴ B/op > EnumBenchmark.getEnumConstants:·gc.count avgt 15 210,000 counts > EnumBenchmark.getEnumConstants:·gc.time avgt 15 119,000 ms > > EnumBenchmark.values avgt 15 4,164 ± 0,134 ns/op > EnumBenchmark.values:·gc.alloc.rate avgt 15 3665,341 ± 120,721 MB/sec > EnumBenchmark.values:·gc.alloc.rate.norm avgt 15 24,002 ± 0,001 B/op > EnumBenchmark.values:·gc.churn.G1_Eden_Space avgt 15 3660,512 ± 137,250 MB/sec > EnumBenchmark.values:·gc.churn.G1_Eden_Space.norm avgt 15 23,972 ± 0,529 B/op > EnumBenchmark.values:·gc.churn.G1_Survivor_Space avgt 15 0,017 ± 0,003 MB/sec > EnumBenchmark.values:·gc.churn.G1_Survivor_Space.norm avgt 15 ≈ 10⁻⁴ B/op > EnumBenchmark.values:·gc.count avgt 15 262,000 counts > EnumBenchmark.values:·gc.time avgt 15 155,000 ms Сергей Цыпанов
has updated the pull request incrementally with one additional commit since the last revision: 8273140: Fix copyright year ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5303/files - new: https://git.openjdk.java.net/jdk/pull/5303/files/1cfb0af3..171ecd4e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5303&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5303&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5303.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5303/head:pull/5303 PR: https://git.openjdk.java.net/jdk/pull/5303 From kcr at openjdk.java.net Mon Aug 30 19:32:29 2021 From: kcr at openjdk.java.net (Kevin Rushforth) Date: Mon, 30 Aug 2021 19:32:29 GMT Subject: Re: RFR: 8273140: Replace usages of Enum.class.getEnumConstants() with Enum.values() where possible [v3] In-Reply-To: References: <4N-BgoVLVgZk_uPkRLoY-427bF-kKKwDr1zRHI7o1PI=.dcdbec6e-f3f8-48e4-94ce-a88d7e494c41@github.com> Message-ID: On Mon, 30 Aug 2021 19:23:57 GMT, Сергей Цыпанов wrote: >> Just a very tiny clean-up. >> >> There are some places in JDK code base where we call `Enum.class.getEnumConstants()` to get all the values of the referenced `enum`. This is excessive, less-readable and slower than just calling `Enum.values()` as in `getEnumConstants()` we have volatile access: >> >> public T[] getEnumConstants() { >> T[] values = getEnumConstantsShared(); >> return (values != null) ? values.clone() : null; >> } >> >> private transient volatile T[] enumConstants; >> >> T[] getEnumConstantsShared() { >> T[] constants = enumConstants; >> if (constants == null) { /* ...
*/ } >> return constants; >> } >> >> Calling values() method is slightly faster: >> >> @BenchmarkMode(Mode.AverageTime) >> @OutputTimeUnit(TimeUnit.NANOSECONDS) >> @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"}) >> public class EnumBenchmark { >> >> @Benchmark >> public Enum[] values() { >> return Enum.values(); >> } >> >> @Benchmark >> public Enum[] getEnumConstants() { >> return Enum.class.getEnumConstants(); >> } >> >> private enum Enum { >> A, >> B >> } >> } >> >> >> Benchmark Mode Cnt Score Error Units >> EnumBenchmark.getEnumConstants avgt 15 6,265 ± 0,051 ns/op >> EnumBenchmark.getEnumConstants:·gc.alloc.rate avgt 15 2434,075 ± 19,568 MB/sec >> EnumBenchmark.getEnumConstants:·gc.alloc.rate.norm avgt 15 24,002 ± 0,001 B/op >> EnumBenchmark.getEnumConstants:·gc.churn.G1_Eden_Space avgt 15 2433,709 ± 70,216 MB/sec >> EnumBenchmark.getEnumConstants:·gc.churn.G1_Eden_Space.norm avgt 15 23,998 ± 0,659 B/op >> EnumBenchmark.getEnumConstants:·gc.churn.G1_Survivor_Space avgt 15 0,009 ± 0,003 MB/sec >> EnumBenchmark.getEnumConstants:·gc.churn.G1_Survivor_Space.norm avgt 15 ≈ 10⁻⁴ B/op >> EnumBenchmark.getEnumConstants:·gc.count avgt 15 210,000 counts >> EnumBenchmark.getEnumConstants:·gc.time avgt 15 119,000 ms >> >> EnumBenchmark.values avgt 15 4,164 ± 0,134 ns/op >> EnumBenchmark.values:·gc.alloc.rate avgt 15 3665,341 ± 120,721 MB/sec >> EnumBenchmark.values:·gc.alloc.rate.norm avgt 15 24,002 ± 0,001 B/op >> EnumBenchmark.values:·gc.churn.G1_Eden_Space avgt 15 3660,512 ± 137,250 MB/sec >> EnumBenchmark.values:·gc.churn.G1_Eden_Space.norm avgt 15 23,972 ± 0,529 B/op >> EnumBenchmark.values:·gc.churn.G1_Survivor_Space avgt 15 0,017 ± 0,003 MB/sec >> EnumBenchmark.values:·gc.churn.G1_Survivor_Space.norm avgt 15 ≈ 10⁻⁴ B/op >> EnumBenchmark.values:·gc.count avgt 15 262,000 counts >> EnumBenchmark.values:·gc.time avgt 15 155,000 ms > Сергей Цыпанов
has updated the pull request incrementally with one additional commit since the last revision: > > 8273140: Fix copyright year You missed two of the copyright year problems. src/java.desktop/share/classes/sun/font/EAttribute.java line 2: > 1: /* > 2: * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved. You need to also fix this copyright year. It must be `2005, 2021,` src/java.sql/share/classes/java/sql/JDBCType.java line 2: > 1: /* > 2: * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved. This one, too. ------------- PR: https://git.openjdk.java.net/jdk/pull/5303 From mli at openjdk.java.net Mon Aug 30 23:09:04 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Mon, 30 Aug 2021 23:09:04 GMT Subject: RFR: 8254167 G1: Record regions where evacuation failed to provide targeted iteration [v3] In-Reply-To: References: Message-ID: > This is another try to optimize evacuation failure for regions. > I record evacuation failed regions, and iterate these regions directly rather than iterate the whole cset. > The implementation reuses and refactors some of the existing data in g1CollectedHeap (_regions_failed_evacuation, _num_regions_failed_evacuation), and records these regions in an array, and iterates this array later in the post evacuation phase. > > I have 2 implementations: > > - 1) CHT (Initial version) > - 2) bitmap (latest version, reuse _regions_failed_evacuation in g1CollectedHeap) > > This implementation does not consider work distribution as mentioned in JDK-8254167 yet. But it seems it already gets a better & more stable performance gain than the original. We could improve it further later if work distribution is necessary. > > I will attach the perf data in JBS.
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Fix missing header files ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5272/files - new: https://git.openjdk.java.net/jdk/pull/5272/files/c790b720..71f35da0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5272&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5272&range=01-02 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5272.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5272/head:pull/5272 PR: https://git.openjdk.java.net/jdk/pull/5272 From fmatte at openjdk.java.net Tue Aug 31 03:30:32 2021 From: fmatte at openjdk.java.net (Fairoz Matte) Date: Tue, 31 Aug 2021 03:30:32 GMT Subject: RFR: 8272563: Possible assertion failure in CardTableBarrierSetC1 In-Reply-To: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Wed, 18 Aug 2021 12:37:00 GMT, Fairoz Matte wrote: > This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp > > The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >> __ move(addr, tmp); > However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. > The fix is to check if it is is_oop() and call the mov appropriately. > > No issues found in local testing and Mach5 tier1-3 Yes, that's the case.
I got your point; I will try to add guarantees to handle this scenario and propose a solution after testing. ------------- PR: https://git.openjdk.java.net/jdk/pull/5164 From iveresov at openjdk.java.net Tue Aug 31 04:39:28 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Tue, 31 Aug 2021 04:39:28 GMT Subject: RFR: 8272563: Possible assertion failure in CardTableBarrierSetC1 In-Reply-To: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Wed, 18 Aug 2021 12:37:00 GMT, Fairoz Matte wrote: > This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp > > The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >> __ move(addr, tmp); > However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. > The fix is to check if it is is_oop() and call the mov appropriately. > > No issues found in local testing and Mach5 tier1-3 Oh, hey! Another much easier idea. Do it with a `lea`. LIR_Address* addr_addr = new LIR_Address(addr, addr->type()); __ leal(addr_addr, tmp); It will force `addr` into a register. And no need to mess with the register allocator.
------------- PR: https://git.openjdk.java.net/jdk/pull/5164 From github.com+39413832+weixlu at openjdk.java.net Tue Aug 31 07:09:26 2021 From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu) Date: Tue, 31 Aug 2021 07:09:26 GMT Subject: RFR: 8273127: Shenandoah: Adopt relaxed order for update oop In-Reply-To: References: Message-ID: On Mon, 30 Aug 2021 16:22:22 GMT, Aleksey Shipilev wrote: >> Shenandoah evacuates object and then updates the reference in load barriers. Currently, memory order release is adopted in atomic_update_oop(), and we propose to use relaxed instead. Since memory order conservative is adopted in updating forwardee, this membar ensures that object copy is finished before updating the reference. In the scenario when a thread reads self-healed address from forwardee, relaxed is also fine due to the data dependency of the self-healed address. >> This relaxation of memory order brings us some performance improvement. We have run specjbb2015 on AArch64. Critical-JOPS increases by 3% while max-JOPS maintains almost the same. >> >> baseline_1:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 107513, max-jOPS = 102968, critical-jOPS = 37913 >> baseline_2:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 106769, max-jOPS = 101742, critical-jOPS = 37151 >> baseline_3:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 110118, max-jOPS = 101742, critical-jOPS = 37041 >> >> optimized_1:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 111708, max-jOPS = 101742, critical-jOPS = 37687 >> optimized_2:RUN RESULT: hbIR (max attempted) = 147077, hbIR (settled) = 122581, max-jOPS = 104425, critical-jOPS = 39663 >> optimized_3:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 107286, max-jOPS = 102968, critical-jOPS = 38579 > > I believe this is fine, yet I am checking other places and running concurrency-heavy tests. 
The comment deserves to be updated as well, like this: https://cr.openjdk.java.net/~shade/8273127/update-comments.patch -- what do you think? @shipilev Thanks for your suggestions. Yes, using memory order release in forwardee installation is reasonable and brings visible performance improvement. If forwardee installation is "release", it is incorrect to use "relaxed" in updating reference, since the reference update could be reordered before the object copy. To fix this, I agree with your suggestion that we can use OrderAccess::release (unbounded release) to replace Atomic::cmpxchg release (bounded release). Unbounded release is enough to ensure correctness in "relaxed" updating reference, but it may have a small performance cost compared to bounded release. Besides, could we use memory_order_acq_rel in Atomic::cmpxchg of fwdptr installation? I guess this is also enough for "relaxed" updating reference, and it is more relaxed than memory_order_conservative by one membar on AArch64. In short, we can 1. use OrderAccess::release in forwardee installation and memory_order_relaxed in updating reference. 2. use memory_order_acq_rel in forwardee installation and memory_order_relaxed in updating reference. I plan to run specjbb on both cases and see which one gives the bigger performance boost. Looking forward to your suggestions! ------------- PR: https://git.openjdk.java.net/jdk/pull/5299 From eosterlund at openjdk.java.net Tue Aug 31 07:24:22 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 31 Aug 2021 07:24:22 GMT Subject: RFR: 8273122: ZGC: Load forwarding entries without acquire semantics In-Reply-To: References: Message-ID: <6J3tUH2qOBL5jPyxK8eSQLcPQKTe5HwsqNZEZ4f4R3I=.16cb838a-d251-4dd5-b3d1-3a89d2e1766f@github.com> On Mon, 30 Aug 2021 09:17:12 GMT, Hao Tang wrote: > JDK-8272138 introduced an unbound membar release() between the object copy and the forwarding table insertion.
> > Thread A (Relocation): > copy(); > release(); > cas_forwarding_table(); > cas_self_heal(); > > The membar guarantees that the contents of the forwarded object are ready after a forwarding entry is loaded. Since load_object_content(ref) depends on the result of load_forwarding_table(), load_acquire can be safely changed to a simple load. > > Thread B (Remapping/Relocation): > ref = load_forwarding_table(); // acquire (current version) -> relaxed (our proposal) > load_object_content(ref); > > Our experiment on heapothesys demonstrates >5% time reduction spent on concurrent mark/relocation on AArch64.
We usually do not relax things to rely on data dependency in C++ code, as the C++ standard gives us no guarantees that the data dependency will be retained in the generated machine code. It only gives threats that it might not work. In practice probably not, but still. So I don't know how I feel about this. Have you tried to use load consume instead? If that yields the same improvement, we can incorporate load_consume into HotSpot. I think that is an interesting experiment worth running first. ------------- PR: https://git.openjdk.java.net/jdk/pull/5298 From shade at openjdk.java.net Tue Aug 31 07:53:27 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 31 Aug 2021 07:53:27 GMT Subject: Re: RFR: 8273122: ZGC: Load forwarding entries without acquire semantics In-Reply-To: <6J3tUH2qOBL5jPyxK8eSQLcPQKTe5HwsqNZEZ4f4R3I=.16cb838a-d251-4dd5-b3d1-3a89d2e1766f@github.com> Message-ID: On Tue, 31 Aug 2021 07:21:02 GMT, Erik Österlund wrote: > We usually do not relax things to rely on data dependency in C++ code, as the C++ standard gives us no guarantees that the data dependency will be retained in the generated machine code. It only gives threats that it might not work. In practice probably not, but still. So I don't know how I feel about this. Have you tried to use load consume instead? If that yields the same improvement, we can incorporate load_consume into HotSpot. I think that is an interesting experiment worth running first. FWIW, Shenandoah faces a similar thing (see #2496), but in a different direction: it started with "relaxed" for fwdptr accesses. Bumping that up to "acquire" for GC code costs quite a bit. I think I convinced myself (to be fair, with a significant amount of handwaving) that while dangerous, it still works.
Yes, "consume" would be nice as the additional safety measure, but I don't think any current C++ compiler properly implements it, and instead it bumps "consume" to "acquire", leaving you at the same performance place. That is to say that "consume" experiments might be futile at this moment. ------------- PR: https://git.openjdk.java.net/jdk/pull/5298 From tschatzl at openjdk.java.net Tue Aug 31 08:31:27 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 31 Aug 2021 08:31:27 GMT Subject: RFR: 8159979: During initial mark, preparing all regions for marking may take a significant amount of time [v2] In-Reply-To: References: <1aCyJ3XrBfjWFjudPPWbw3mZavoz1nhkQRdzJ1xGOzk=.7aab48b5-233e-4917-864b-a4d4b7c26dd4@github.com> Message-ID: On Mon, 30 Aug 2021 15:18:49 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to move the work done during preparation for a concurrent mark start (pre_concurrent_start) into a G1BatchedGangTask. >> >> Testing: Tier 1-3 > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > typo Since I'm messing around with the TestGCLogMessages.java test anyway currently, I'll add the two messages in https://github.com/openjdk/jdk/pull/5306. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5302 From eosterlund at openjdk.java.net Tue Aug 31 08:36:23 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 31 Aug 2021 08:36:23 GMT Subject: RFR: 8273122: ZGC: Load forwarding entries without acquire semantics In-Reply-To: References: <6J3tUH2qOBL5jPyxK8eSQLcPQKTe5HwsqNZEZ4f4R3I=.16cb838a-d251-4dd5-b3d1-3a89d2e1766f@github.com> Message-ID: On Tue, 31 Aug 2021 07:50:27 GMT, Aleksey Shipilev wrote: > > We usually do not relax things to rely on data dependency in C++ code, as the C++ standard gives us no guarantees that the data dependency will be retained in the generated machine code. 
It only gives threats that it might not work. In practice probably not, but still. So I don't know how I feel about this. Have you tried to use load consume instead? If that yields the same improvement, we can incorporate load_consume into HotSpot. I think that is an interesting experiment worth running first. > > FWIW, Shenandoah faces a similar thing (see #2496), but in a different direction: it started with "relaxed" for fwdptr accesses. Bumping that up to "acquire" for GC code costs quite a bit. I think I convinced myself (to be fair, with a significant amount of handwaving) that while dangerous, it still works. Yes, "consume" would be nice as the additional safety measure, but I don't think any current C++ compiler properly implements it, and instead it bumps "consume" to "acquire", leaving you at the same performance place. That is to say that "consume" experiments might be futile at this moment. I also recall reading that due to the complexity of implementing consume correctly, compilers chose to implement it as acquire instead. But that was years ago and I was kind of hoping something would have changed there. But maybe that is still the state of the art a decade after its introduction. It's a pity because all we want is for the compiler to not do anything outrageously weird such as breaking trivial data dependencies with some hypothetical bizarre address guessing optimization, that I'm sure nobody does anyway. Sigh... 
------------- PR: https://git.openjdk.java.net/jdk/pull/5298 From fweimer at openjdk.java.net Tue Aug 31 09:05:25 2021 From: fweimer at openjdk.java.net (Florian Weimer) Date: Tue, 31 Aug 2021 09:05:25 GMT Subject: Re: RFR: 8273122: ZGC: Load forwarding entries without acquire semantics In-Reply-To: References: Message-ID: <3nrIWFtGqAtEPkMyhl-UzPzJak-RCDG7-b-Bn8OJ6ys=.2af5cf5c-dd8b-4210-96e6-abaf97f74179@github.com> On Mon, 30 Aug 2021 09:17:12 GMT, Hao Tang wrote: > JDK-8272138 introduced an unbound membar release() between the object copy and the forwarding table insertion. > > Thread A (Relocation): > copy(); > release(); > cas_forwarding_table(); > cas_self_heal(); > > The membar guarantees that the contents of the forwarded object are ready after a forwarding entry is loaded. Since load_object_content(ref) depends on the result of load_forwarding_table(), load_acquire can be safely changed to a simple load. > > Thread B (Remapping/Relocation): > ref = load_forwarding_table(); // acquire (current version) -> relaxed (our proposal) > load_object_content(ref); > > Our experiment on heapothesys demonstrates >5% time reduction spent on concurrent mark/relocation on AArch64.
As far as I know, Hotspot does not consistently use C++ atomics, so whether the compiler implements consume-as-acquire or any other aspect of the C++ memory model not does not really matter. Consequently, the code is full of undefined data races as far as the compiler is concerned. Compiler writers are aware of that and generally avoid optimizations that would break this memory-model-in-a-library approach (although there is of course no language specification that actually guarantees this). I think that historically, the approach has been that when compiler optimizations make things go wrong, some sort of compiler barrier is added to the code. I assume the same thing could be done to implement an approximation to consume semantics. ------------- PR: https://git.openjdk.java.net/jdk/pull/5298 From ayang at openjdk.java.net Tue Aug 31 09:31:26 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 31 Aug 2021 09:31:26 GMT Subject: RFR: 8159979: During initial mark, preparing all regions for marking may take a significant amount of time [v2] In-Reply-To: References: <1aCyJ3XrBfjWFjudPPWbw3mZavoz1nhkQRdzJ1xGOzk=.7aab48b5-233e-4917-864b-a4d4b7c26dd4@github.com> Message-ID: On Mon, 30 Aug 2021 15:18:49 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to move the work done during preparation for a concurrent mark start (pre_concurrent_start) into a G1BatchedGangTask. >> >> Testing: Tier 1-3 > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > typo Marked as reviewed by ayang (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/5302 From tschatzl at openjdk.java.net Tue Aug 31 09:32:33 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 31 Aug 2021 09:32:33 GMT Subject: RFR: 8237567: Refactor G1-specific code in shared VM_CollectForMetadataAllocation In-Reply-To: References: Message-ID: On Fri, 27 Aug 2021 15:48:50 GMT, Ivan Walulya wrote: > Hi all, > > Please review this refactoring change to move G1 specific code out of VM_CollectForMetadataAllocation. > > Testing: Tier 1-3. While not completely satisfying - maybe adding a predicate to `CollectedHeap` for this to completely remove the G1 dependency? - it's a step in the right direction. Adding more methods that are really specific to one GC are not really satisfying either, so lgtm from me. Please fix the method name `concurrent` vs. `conc` before pushing. src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 1137: > 1135: bool try_collect(GCCause::Cause cause, const G1GCCounters& counters_before); > 1136: > 1137: void initiate_conc_gc_for_metadata_allocation(GCCause::Cause gc_cause); Please type out `concurrent` as other identifiers here do. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5282 From iwalulya at openjdk.java.net Tue Aug 31 09:52:52 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 31 Aug 2021 09:52:52 GMT Subject: RFR: 8237567: Refactor G1-specific code in shared VM_CollectForMetadataAllocation [v2] In-Reply-To: References: Message-ID: <4VKbHcNBHHnF456cJBnFVVRUCjCrWvd2SImljOFuFKw=.6d47f8dd-752c-4b25-9844-c0ee892750c7@github.com> > Hi all, > > Please review this refactoring change to move G1 specific code out of VM_CollectForMetadataAllocation. > > Testing: Tier 1-3. 
Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: Thomas review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5282/files - new: https://git.openjdk.java.net/jdk/pull/5282/files/b8827b66..13c7bf74 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5282&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5282&range=00-01 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5282.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5282/head:pull/5282 PR: https://git.openjdk.java.net/jdk/pull/5282 From tschatzl at openjdk.java.net Tue Aug 31 11:19:35 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 31 Aug 2021 11:19:35 GMT Subject: RFR: 8272985: Reference discovery is confused about atomicity and degree of parallelism Message-ID: Hi all, can I have reviews for this change that fixes some (apparent) confusion between atomicity (of the discovery in the `ReferenceProcessor` sense) vs. the currently selected degree of parallelism? For all four different combinations of atomicity and parallelism the `discovered` link in the `java.lang.ref.Reference` needs to be updated differently using a different kind of access, instead of just two based on atomicity. I'll fix the use of `atomic` in `ReferenceProcessor` in a separate CR to hopefully remove the confusion for the reader too. Testing: tier1-5, internal perf benchmarks without regressions Thanks, Thomas ------------- Commit messages: - Some refactoring - Confused with atomic vs. 
MT - Initial commit Changes: https://git.openjdk.java.net/jdk/pull/5314/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5314&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272985 Stats: 82 lines in 3 files changed: 44 ins; 20 del; 18 mod Patch: https://git.openjdk.java.net/jdk/pull/5314.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5314/head:pull/5314 PR: https://git.openjdk.java.net/jdk/pull/5314 From eosterlund at openjdk.java.net Tue Aug 31 11:28:30 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 31 Aug 2021 11:28:30 GMT Subject: RFR: 8273122: ZGC: Load forwarding entries without acquire semantics In-Reply-To: <3nrIWFtGqAtEPkMyhl-UzPzJak-RCDG7-b-Bn8OJ6ys=.2af5cf5c-dd8b-4210-96e6-abaf97f74179@github.com> References: <3nrIWFtGqAtEPkMyhl-UzPzJak-RCDG7-b-Bn8OJ6ys=.2af5cf5c-dd8b-4210-96e6-abaf97f74179@github.com> Message-ID: <__3vd0zHPnyiWG9EX51xHkw1mQUeRvBY3iKgq6TwkJs=.0db5ad5d-f5d1-467b-bf42-739dc061c8cd@github.com> On Tue, 31 Aug 2021 09:02:32 GMT, Florian Weimer wrote: > As far as I know, Hotspot does not consistently use C++ atomics, so whether the compiler implements consume-as-acquire or any other aspect of the C++ memory model or not does not really matter. Consequently, the code is full of undefined data races as far as the compiler is concerned. Compiler writers are aware of that and generally avoid optimizations that would break this memory-model-in-a-library approach (although there is of course no language specification that actually guarantees this). > > I think that historically, the approach has been that when compiler optimizations make things go wrong, some sort of compiler barrier is added to the code. I assume the same thing could be done to implement an approximation to consume semantics. We have not had access to C++ atomics for a long time. It's only rather recently that we upgraded past C++11. So I don't think the question of consume or acquire is irrelevant. 
If consume was to magically do what we want, in a way that is more standards compliant, then it would be a compelling reason to start upgrading our atomics. Given of course, that consume actually does do something other than to just bind to acquire. ------------- PR: https://git.openjdk.java.net/jdk/pull/5298 From tschatzl at openjdk.java.net Tue Aug 31 11:36:51 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 31 Aug 2021 11:36:51 GMT Subject: RFR: 8272985: Reference discovery is confused about atomicity and degree of parallelism [v2] In-Reply-To: References: Message-ID: > Hi all, > > can I have reviews for this change that fixes some (apparent) confusion between atomicity (of the discovery in the `ReferenceProcessor` sense) vs. the currently selected degree of parallelism? > > For all four different combinations of atomicity and parallelism the `discovered` link in the `java.lang.ref.Reference` needs to be updated differently using a different kind of access, instead of just two based on atomicity. > > I'll fix the use of `atomic` in `ReferenceProcessor` in a separate CR to hopefully remove the confusion for the reader too. 
> > Testing: tier1-5, internal perf benchmarks without regressions > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Some more refactoring ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5314/files - new: https://git.openjdk.java.net/jdk/pull/5314/files/115be125..e213e69c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5314&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5314&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/5314.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5314/head:pull/5314 PR: https://git.openjdk.java.net/jdk/pull/5314 From tschatzl at openjdk.java.net Tue Aug 31 11:40:24 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 31 Aug 2021 11:40:24 GMT Subject: RFR: 8273033: SerialGC: remove obsolete comments In-Reply-To: References: Message-ID: On Thu, 26 Aug 2021 14:44:20 GMT, Albert Mingkun Yang wrote: > Trivial change of removing obsolete comments. Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5267 From tschatzl at openjdk.java.net Tue Aug 31 12:13:45 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 31 Aug 2021 12:13:45 GMT Subject: RFR: 8271880: Tighten condition for excluding regions from collecting cards with cross-references [v6] In-Reply-To: References: Message-ID: <_bJkZ5LEmVAF9BbWwOQM5SUpeHdft19plI1H4ymkVqM=.e6a8c755-89ce-4451-9906-3ccff92cc4b1@github.com> > Hi, > > can I have reviews for this change that by tightening the condition for excluding regions from collecting cards with cross-references allows us to avoid the rescanning of objects that failed evacuation in the fix up self-forwards phase after evacuation failure. > > I.e. 
during gc g1 never collects cards/references from the young gen (including eden) for later refinement - which means that we need to rescan all live objects remaining in eden regions for cross-references. > > The problem or the reason why we did that was that we did not want to add cards to refine from survivor regions (i.e. next gc's young gen) because we just don't need to as we always collect young gen, so references from there need not be recorded in the remembered sets (and actually, if we did, we erroneously marked cards in young gen which later card table verification will not like) - but we did not have that information on hand anywhere already quickly accessible. > > This change solves that problem by actually recording this information in the region attribute table as "NewSurvivor" type region. "NewSurvivor" because I did want to make explicit that these are the survivor regions from the *new* (next) young generation (i.e. just survivor) and not the survivor regions of the previous gc (that were turned eden at the start of this gc) but something like "NewYoung" or so would be fine with me as well (or certainly just "Survivor", but that might be confusing). > > Another interesting addition is probably the new assert in `G1ParThreadScanState::enqueue_card_if_tracked` > > > assert(!_g1h->heap_region_containing(o)->in_collection_set(), "Should not try to enqueue reference into collection set region"); > > This, at this time, verifies the assumption that g1 is not trying to collect references *to* the collection set, i.e. other objects that failed evacuation - after all we later relabel their regions as old *without* a remembered set; we would do otherwise unnecessarily because the reason is that (currently) cset tracking for these regions is enabled (at least during gc - we only later relabel and drop the remembered sets). 
> > This might change later if we want to move evacuation failed regions into survivor (or keep their remembered sets for some reason), but for now we filter attempts to add cards in the dcqs for those this way. > > Testing: tier1-5, gc/g1 with `JAVA_OPTIONS_=-XX+G1EvacuationFailureALot -XX:+VerifyAfterGC`. > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - Merge master - Merge branch 'master' into evac-failure-no-scan-during-remove-self-forwards - More updates to comment - Clarify comment - ayang review - Some random changes - No scan during self forward ------------- Changes: https://git.openjdk.java.net/jdk/pull/5037/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5037&range=05 Stats: 149 lines in 12 files changed: 50 ins; 67 del; 32 mod Patch: https://git.openjdk.java.net/jdk/pull/5037.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5037/head:pull/5037 PR: https://git.openjdk.java.net/jdk/pull/5037 From iwalulya at openjdk.java.net Tue Aug 31 12:33:31 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 31 Aug 2021 12:33:31 GMT Subject: Integrated: 8159979: During initial mark, preparing all regions for marking may take a significant amount of time In-Reply-To: <1aCyJ3XrBfjWFjudPPWbw3mZavoz1nhkQRdzJ1xGOzk=.7aab48b5-233e-4917-864b-a4d4b7c26dd4@github.com> References: <1aCyJ3XrBfjWFjudPPWbw3mZavoz1nhkQRdzJ1xGOzk=.7aab48b5-233e-4917-864b-a4d4b7c26dd4@github.com> Message-ID: On Mon, 30 Aug 2021 13:30:24 GMT, Ivan Walulya wrote: > Hi all, > > Please review this change to move the work done during preparation for a concurrent mark start (pre_concurrent_start) into a G1BatchedGangTask. > > Testing: Tier 1-3 This pull request has now been integrated. 
Changeset: 841e3943 Author: Ivan Walulya URL: https://git.openjdk.java.net/jdk/commit/841e3943c480c95409446bb35fb9a56d7fc48c8a Stats: 121 lines in 5 files changed: 87 ins; 20 del; 14 mod 8159979: During initial mark, preparing all regions for marking may take a significant amount of time Reviewed-by: tschatzl, ayang ------------- PR: https://git.openjdk.java.net/jdk/pull/5302 From iwalulya at openjdk.java.net Tue Aug 31 12:33:31 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 31 Aug 2021 12:33:31 GMT Subject: RFR: 8159979: During initial mark, preparing all regions for marking may take a significant amount of time [v2] In-Reply-To: References: <1aCyJ3XrBfjWFjudPPWbw3mZavoz1nhkQRdzJ1xGOzk=.7aab48b5-233e-4917-864b-a4d4b7c26dd4@github.com> Message-ID: On Mon, 30 Aug 2021 15:18:49 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to move the work done during preparation for a concurrent mark start (pre_concurrent_start) into a G1BatchedGangTask. >> >> Testing: Tier 1-3 > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > typo Thanks for the reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/5302 From ayang at openjdk.java.net Tue Aug 31 12:55:30 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 31 Aug 2021 12:55:30 GMT Subject: RFR: 8273033: SerialGC: remove obsolete comments In-Reply-To: References: Message-ID: On Thu, 26 Aug 2021 14:44:20 GMT, Albert Mingkun Yang wrote: > Trivial change of removing obsolete comments. Thanks for the review. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5267 From ayang at openjdk.java.net Tue Aug 31 12:55:31 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 31 Aug 2021 12:55:31 GMT Subject: Integrated: 8273033: SerialGC: remove obsolete comments In-Reply-To: References: Message-ID: On Thu, 26 Aug 2021 14:44:20 GMT, Albert Mingkun Yang wrote: > Trivial change of removing obsolete comments. This pull request has now been integrated. Changeset: 9bc7cc56 Author: Albert Mingkun Yang URL: https://git.openjdk.java.net/jdk/commit/9bc7cc56513adb9d2e39cd286d2a229c9c285e2d Stats: 7 lines in 1 file changed: 0 ins; 7 del; 0 mod 8273033: SerialGC: remove obsolete comments Reviewed-by: sjohanss, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/5267 From ayang at openjdk.java.net Tue Aug 31 13:07:29 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 31 Aug 2021 13:07:29 GMT Subject: RFR: 8273144: Remove unused top level "Sample Collection Set Candidates" logging [v3] In-Reply-To: References: Message-ID: On Mon, 30 Aug 2021 17:08:50 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this trivial(?) removal of a log message and associated unused code. >> >> This is a leftover from JDK-8017163 and the merge with the BatchedGangTask API (there is a "Sample CSet Candidates" log which is used in post evacuation phase 2). >> >> Testing: changed TestGCLogMessages test case, compilation >> >> Thanks, >> Thomas > > Thomas Schatzl has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Marked as reviewed by ayang (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/5304 From tschatzl at openjdk.java.net Tue Aug 31 13:26:31 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 31 Aug 2021 13:26:31 GMT Subject: RFR: 8273144: Remove unused top level "Sample Collection Set Candidates" logging [v3] In-Reply-To: References: Message-ID: On Mon, 30 Aug 2021 17:08:50 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this trivial(?) removal of a log message and associated unused code. >> >> This is a leftover from JDK-8017163 and the merge with the BatchedGangTask API (there is a "Sample CSet Candidates" log which is used in post evacuation phase 2). >> >> Testing: changed TestGCLogMessages test case, compilation >> >> Thanks, >> Thomas > > Thomas Schatzl has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Thanks for your reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/5304 From tschatzl at openjdk.java.net Tue Aug 31 13:29:30 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 31 Aug 2021 13:29:30 GMT Subject: RFR: 8272905: Consolidate discovered lists processing In-Reply-To: <-BaizreVcGJ9jz4D2W5ETKOW93gquYosYjmOayyRmX8=.6c21108b-7ce0-4270-b4c1-10d16928dea9@github.com> References: <-BaizreVcGJ9jz4D2W5ETKOW93gquYosYjmOayyRmX8=.6c21108b-7ce0-4270-b4c1-10d16928dea9@github.com> Message-ID: On Tue, 24 Aug 2021 17:50:57 GMT, Albert Mingkun Yang wrote: > Simple change of merging discovered list processing for four kinds of non-strong refs and some general cleanup around. > > Test: tier1-3 Looks good to me as far as I can tell apart from the minor suggestion in `RefProcTask::process_discovered_list`, but I'm not too confident about the ins and outs of this code. 
src/hotspot/share/gc/shared/referenceProcessor.cpp line 389: > 387: > 388: if (referent == NULL || iter.is_referent_alive()) { > 389: iter.make_referent_alive(); The handling for `NULL` referents is (just a bit) different for phantom refs than for the others, but seems inconsequential. src/hotspot/share/gc/shared/referenceProcessor.cpp line 453: > 451: } > 452: > 453: assert(do_enqueue_and_clear != (ref_type == REF_FINAL), "Only Final refs are not enqueued"); Instead of this assert and the assignments to `do_enqeue_and_clear` above, `ref_type != REF_FINAL` could be passed to `process_discoved_list_work`. That would tighten the code and make the assert obsolete. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5242 From shade at openjdk.java.net Tue Aug 31 13:30:30 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 31 Aug 2021 13:30:30 GMT Subject: RFR: 8273127: Shenandoah: Adopt relaxed order for update oop In-Reply-To: References: Message-ID: On Mon, 30 Aug 2021 11:08:22 GMT, Xiaowei Lu wrote: > Shenandoah evacuates object and then updates the reference in load barriers. Currently, memory order release is adopted in atomic_update_oop(), and we propose to use relaxed instead. Since memory order conservative is adopted in updating forwardee, this membar ensures that object copy is finished before updating the reference. In the scenario when a thread reads self-healed address from forwardee, relaxed is also fine due to the data dependency of the self-healed address. > This relaxation of memory order brings us some performance improvement. We have run specjbb2015 on AArch64. Critical-JOPS increases by 3% while max-JOPS maintains almost the same. 
> > baseline_1:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 107513, max-jOPS = 102968, critical-jOPS = 37913 > baseline_2:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 106769, max-jOPS = 101742, critical-jOPS = 37151 > baseline_3:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 110118, max-jOPS = 101742, critical-jOPS = 37041 > > optimized_1:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 111708, max-jOPS = 101742, critical-jOPS = 37687 > optimized_2:RUN RESULT: hbIR (max attempted) = 147077, hbIR (settled) = 122581, max-jOPS = 104425, critical-jOPS = 39663 > optimized_3:RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 107286, max-jOPS = 102968, critical-jOPS = 38579 I think forwardee installation is better done with `acq_rel`, given that two GC threads may try to install it, and the failing thread should consume/acquire the other forwardee. Not sure the `OrderAccess::release` provides the same semantics. I have updated #2496 with more discussion and implementation. ------------- PR: https://git.openjdk.java.net/jdk/pull/5299 From iwalulya at openjdk.java.net Tue Aug 31 13:35:31 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 31 Aug 2021 13:35:31 GMT Subject: RFR: 8237567: Refactor G1-specific code in shared VM_CollectForMetadataAllocation [v2] In-Reply-To: <4VKbHcNBHHnF456cJBnFVVRUCjCrWvd2SImljOFuFKw=.6d47f8dd-752c-4b25-9844-c0ee892750c7@github.com> References: <4VKbHcNBHHnF456cJBnFVVRUCjCrWvd2SImljOFuFKw=.6d47f8dd-752c-4b25-9844-c0ee892750c7@github.com> Message-ID: On Tue, 31 Aug 2021 09:52:52 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this refactoring change to move G1 specific code out of VM_CollectForMetadataAllocation. >> >> Testing: Tier 1-3. > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > Thomas review Thanks for the reviews! 
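The acq_rel installation and the acquire-vs-relaxed read being debated above can be sketched with standard C++ atomics. This is an illustrative model only: the `ForwardSlot` type and its method names are invented for the sketch, and `std::atomic` stands in for HotSpot's own `Atomic::`/`OrderAccess` API, which the actual ZGC and Shenandoah code uses.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical model of a forwarding slot; 0 means "not yet forwarded".
struct ForwardSlot {
  std::atomic<uintptr_t> fwd{0};

  // Install a forwardee with an acq_rel CAS: the winning thread releases
  // its object copy, and a losing thread acquires the winner's copy
  // before it may use the returned address.
  uintptr_t install(uintptr_t forwardee) {
    uintptr_t expected = 0;
    if (fwd.compare_exchange_strong(expected, forwardee,
                                    std::memory_order_acq_rel,
                                    std::memory_order_acquire)) {
      return forwardee;  // we won the race
    }
    return expected;     // somebody else won; use their forwardee
  }

  // Reader with acquire: the forwarded object's contents are guaranteed
  // visible before any access through the returned address.
  uintptr_t load_acquire() const {
    return fwd.load(std::memory_order_acquire);
  }

  // Relaxed reader: relies on the address dependency through the loaded
  // pointer, which AArch64/Power hardware honors but which the C++
  // standard only promises under memory_order_consume.
  uintptr_t load_relaxed() const {
    return fwd.load(std::memory_order_relaxed);
  }
};
```

On AArch64 the acquire load typically compiles to `ldar` while the relaxed load is a plain `ldr`; the question in the thread above is whether the address dependency through the returned pointer is enough to justify dropping the stronger load.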
------------- PR: https://git.openjdk.java.net/jdk/pull/5282 From iwalulya at openjdk.java.net Tue Aug 31 13:35:32 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 31 Aug 2021 13:35:32 GMT Subject: Integrated: 8237567: Refactor G1-specific code in shared VM_CollectForMetadataAllocation In-Reply-To: References: Message-ID: On Fri, 27 Aug 2021 15:48:50 GMT, Ivan Walulya wrote: > Hi all, > > Please review this refactoring change to move G1 specific code out of VM_CollectForMetadataAllocation. > > Testing: Tier 1-3. This pull request has now been integrated. Changeset: e6712551 Author: Ivan Walulya URL: https://git.openjdk.java.net/jdk/commit/e67125512f585c8efad2e7685b9bc409c96563d7 Stats: 45 lines in 4 files changed: 17 ins; 27 del; 1 mod 8237567: Refactor G1-specific code in shared VM_CollectForMetadataAllocation Reviewed-by: ayang, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/5282 From shade at openjdk.java.net Tue Aug 31 14:09:35 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 31 Aug 2021 14:09:35 GMT Subject: RFR: 8273122: ZGC: Load forwarding entries without acquire semantics In-Reply-To: <__3vd0zHPnyiWG9EX51xHkw1mQUeRvBY3iKgq6TwkJs=.0db5ad5d-f5d1-467b-bf42-739dc061c8cd@github.com> References: <3nrIWFtGqAtEPkMyhl-UzPzJak-RCDG7-b-Bn8OJ6ys=.2af5cf5c-dd8b-4210-96e6-abaf97f74179@github.com> <__3vd0zHPnyiWG9EX51xHkw1mQUeRvBY3iKgq6TwkJs=.0db5ad5d-f5d1-467b-bf42-739dc061c8cd@github.com> Message-ID: On Tue, 31 Aug 2021 11:25:46 GMT, Erik Österlund wrote: > Given of course, that consume actually does do something other than to just bind to acquire. Today, both GCC trunk and Clang trunk on ARM64 lift `consume` to `acquire`, see https://godbolt.org/z/7jxohsWE1 -- so that path is still unhappy. After spending a fun day in this particular rabbit hole, I tentatively think we are safer leaving the `acquire`-in-lieu-of-`consume` in. In new revision of #2496, we do that for Shenandoah. 
I think relocation phase where forwardings are changing concurrently can subsume the cost of `acquire`. #2496 also tries to optimize the Shenandoah paths that are penalized by `acquires` unnecessarily: heap updates after relocations are done. Maybe a similar partial relaxation would be possible for ZGC, e.g. for concurrent mark that does fixups. The reason why I say "for safety" is because I think I see odd C++ MM cases where not acquiring the fwdptr breaks the apparent release-order / release-acquire transitivity. (Don't ask for details yet, still working that out... and not even sure "consume" fixes it.) It seems to be C++ only problem, as herd7 examples for AArch64 and Power behave like we want. So that one is not necessarily a concern for a hardware-centric low-level not-your-father's-C++ code, but I wonder if we even want to risk it. ------------- PR: https://git.openjdk.java.net/jdk/pull/5298 From ayang at openjdk.java.net Tue Aug 31 14:11:49 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 31 Aug 2021 14:11:49 GMT Subject: RFR: 8272905: Consolidate discovered lists processing [v2] In-Reply-To: <-BaizreVcGJ9jz4D2W5ETKOW93gquYosYjmOayyRmX8=.6c21108b-7ce0-4270-b4c1-10d16928dea9@github.com> References: <-BaizreVcGJ9jz4D2W5ETKOW93gquYosYjmOayyRmX8=.6c21108b-7ce0-4270-b4c1-10d16928dea9@github.com> Message-ID: > Simple change of merging discovered list processing for four kinds of non-strong refs and some general cleanup around. 
> > Test: tier1-3 Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5242/files - new: https://git.openjdk.java.net/jdk/pull/5242/files/cfcbad70..3650b5d6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5242&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5242&range=00-01 Stats: 7 lines in 1 file changed: 1 ins; 5 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5242.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5242/head:pull/5242 PR: https://git.openjdk.java.net/jdk/pull/5242 From mli at openjdk.java.net Tue Aug 31 14:37:44 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Tue, 31 Aug 2021 14:37:44 GMT Subject: RFR: 8254167 G1: Record regions where evacuation failed to provide targeted iteration [v4] In-Reply-To: References: Message-ID: <2rdaiC7TDwMQiq035XVEhD0kbYXiOc9EL3ZxogYI21o=.b1965f40-0ac7-4f2c-aec5-07bf2207d583@github.com> > This is another try to optimize evacuation failure for regions. > I record evacuation failed regions, and iterate these regions directly rather than iterate the whole cset. > The implementation reuses and refactors some of the existing data in g1CollectedHeap (_regions_failed_evacuation, _num_regions_failed_evacuation), and records these regions in an array, and iterates this array later in the post-evacuation phase. > > I have 2 implementations: > > - 1) CHT (Initial version) > - 2) bitmap (latest version, reuse _regions_failed_evacuation in g1CollectedHeap) > > This implementation does not consider work distribution as mentioned in JDK-8254167 yet. But it already seems to get a better and more stable performance gain than the original. We could improve it further later if work distribution is necessary. > > I will attach the perf data in JBS. Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: - merge from master - Fix missing header files - reuse original bitmap rather than CHT - merge with existing classes - Initial commit ------------- Changes: https://git.openjdk.java.net/jdk/pull/5272/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5272&range=03 Stats: 187 lines in 7 files changed: 169 ins; 11 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/5272.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5272/head:pull/5272 PR: https://git.openjdk.java.net/jdk/pull/5272 From fmatte at openjdk.java.net Tue Aug 31 14:54:56 2021 From: fmatte at openjdk.java.net (Fairoz Matte) Date: Tue, 31 Aug 2021 14:54:56 GMT Subject: RFR: 8272563: Possible assertion failure in CardTableBarrierSetC1 [v2] In-Reply-To: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: > This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp > > The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >> __ move(addr, tmp); > However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. > The fix is to check if it is is_oop() and call the mov appropriately. 
> > No issues found in local testing and Mach5 tier1-3 Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: 8272563: Possible assertion failure in CardTableBarrierSetC1 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5164/files - new: https://git.openjdk.java.net/jdk/pull/5164/files/374cd7b8..c023f4bc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5164&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5164&range=00-01 Stats: 6 lines in 1 file changed: 1 ins; 3 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5164.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5164/head:pull/5164 PR: https://git.openjdk.java.net/jdk/pull/5164 From fmatte at openjdk.java.net Tue Aug 31 14:59:27 2021 From: fmatte at openjdk.java.net (Fairoz Matte) Date: Tue, 31 Aug 2021 14:59:27 GMT Subject: RFR: 8272563: Possible assertion failure in CardTableBarrierSetC1 [v2] In-Reply-To: References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Tue, 31 Aug 2021 14:54:56 GMT, Fairoz Matte wrote: >> This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp >> >> The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >>> __ move(addr, tmp); >> However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. >> The fix is to check if it is is_oop() and call the mov appropriately. 
>> >> No issues found in local testing and Mach5 tier1-3 > > Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: > > 8272563: Possible assertion failure in CardTableBarrierSetC1 Hi Igor, Possibly you mean this, ` LIR_Opr addr_opr = LIR_OprFact::address(new LIR_Address(addr, addr->type())); __ leal(addr_opr, tmp);` I have updated the patch with this fix and there is no issue in hs-tier1 to hs-tier3 ------------- PR: https://git.openjdk.java.net/jdk/pull/5164 From tschatzl at openjdk.java.net Tue Aug 31 15:39:31 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 31 Aug 2021 15:39:31 GMT Subject: RFR: 8254167 G1: Record regions where evacuation failed to provide targeted iteration [v4] In-Reply-To: <2rdaiC7TDwMQiq035XVEhD0kbYXiOc9EL3ZxogYI21o=.b1965f40-0ac7-4f2c-aec5-07bf2207d583@github.com> References: <2rdaiC7TDwMQiq035XVEhD0kbYXiOc9EL3ZxogYI21o=.b1965f40-0ac7-4f2c-aec5-07bf2207d583@github.com> Message-ID: On Tue, 31 Aug 2021 14:37:44 GMT, Hamlin Li wrote: >> This is another try to optimize evcuation failure for regions. >> I record evacuation failed regions, and iterate these regions directly rather than iterate the whole cset. >> The implementation reuses and refactors some of the existing data in g1CollectedHeap (_regions_failed_evacuation, _num_regions_failed_evacuation), and records these regions in an array, and iterate this array later in post evacuation phase. >> >> I have 2 implementations: >> >> - 1) CHT (Initial version) >> - 2) bitmap (latest version, reuse _regions_failed_evacuation in g1CollectedHeap) >> >> This implementation does not consider work distribution as mentioned in JDK-8254167 yet. But seems it already get better&stable performance gain than origin. We could improve it further later if work distribution is necessary. >> >> I will attach the perf data in JBS. > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: > > - merge from master > - Fix missing header files > - reuse original bitmap rather than CHT > - merge with existing classes > - Initial commit Please also remove the ".g1CollectedHeap.hpp.swp" file accidentally committed. > This implementation does not consider work distribution as mentioned in JDK-8254167 yet. But it already seems to get a better and more stable performance gain than the original. We could improve it further later if work distribution is necessary. I'll split out this request into a new CR to not forget; because while this is a good step forward, it would be good to know the actual number of objects (or bytes, or whatever) that failed evacuation to better distribute work. That can be added though when tackling that issue. Overall, it looks like what I expected. src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 881: > 879: void preserve_mark_during_evac_failure(uint worker_id, oop obj, markWord m); > 880: > 881: void iterate_evacuation_failure_regions_par(HeapRegionClosure* closure, Please avoid additional clutter of `G1CollectedHeap`. Pass `_evac_failure_regions` to `post_evacuate_collection_set` and then down to `G1ParRemoveSelfForwardPtrsTask` where it calls `par_iterate` directly. I am currently moving stuff, particularly data structures only used in the young collector to a separate class (JDK-8253343). So while it is not avoidable to put `_evac_failure_regions` here, let's avoid further adding young-collector only stuff to `G1CollectedHeap`. I'll of course do those changes (i.e. moving code to `G1YoungCollector`) and further related cleanup when ready. src/hotspot/share/gc/g1/g1EvacFailure.cpp line 277: > 275: // otherwise unreachable object at the very end of the collection. That object > 276: // might cause an evacuation failure in any region in the collection set. > 277: // _g1h->collection_set_par_iterate_all(&rsfp_cl, &_hrclaimer, worker_id); Please remove this debug code. 
Also the comment above is obsolete: now that we know in which regions exactly the evacuation failure happened, this comment does not apply any more. src/hotspot/share/gc/g1/g1EvacuationFailureRegions.cpp line 1: > 1: /* It's a bit weird that all other similar files start with "g1EvacFailure" and this one with "g1EvacuationFailure" - please change according to existing style (or rename all others first in a separate CR). src/hotspot/share/gc/g1/g1EvacuationFailureRegions.cpp line 26: > 24: > 25: #include "precompiled.hpp" > 26: #include "g1EvacuationFailureRegions.hpp" should be `gc/g1/g1Evacu....` src/hotspot/share/gc/g1/g1EvacuationFailureRegions.cpp line 27: > 25: #include "precompiled.hpp" > 26: #include "g1EvacuationFailureRegions.hpp" > 27: #include "gc/g1/g1CollectedHeap.hpp" Includes should be sorted alphabetically. src/hotspot/share/gc/g1/g1EvacuationFailureRegions.cpp line 28: > 26: #include "g1EvacuationFailureRegions.hpp" > 27: #include "gc/g1/g1CollectedHeap.hpp" > 28: #include "gc/g1/g1CollectedHeap.inline.hpp" There is no need to include both `.inline.hpp` and `.hpp`. src/hotspot/share/gc/g1/g1EvacuationFailureRegions.cpp line 38: > 36: > 37: G1EvacuationFailureRegions::~G1EvacuationFailureRegions() { > 38: FREE_C_HEAP_ARRAY(uint, _evac_failure_regions); The include for `FREE_C_HEAP_ARRAY` is missing. src/hotspot/share/gc/g1/g1EvacuationFailureRegions.cpp line 43: > 41: void G1EvacuationFailureRegions::initialize() { > 42: Atomic::store(&_evac_failure_regions_cur_length, 0u); > 43: _max_regions = G1CollectedHeap::heap()->max_reserved_regions(); Please pass in `max_reserved_regions()` - this makes the dependency on it during initialization obvious. src/hotspot/share/gc/g1/g1EvacuationFailureRegions.cpp line 51: > 49: HeapRegionClaimer* _hrclaimer, > 50: uint worker_id) { > 51: assert_at_safepoint(); As far as I can see this is a verbatim copy of `G1CollectionSet::iterate_part_from` (with some minor simplifications). 
Not really happy about that, need to think more about how/if we can avoid this. src/hotspot/share/gc/g1/g1EvacuationFailureRegions.hpp line 35: > 33: > 34: // This class > 35: // 1. records for every region on the heap whether evacuation failed for it. Please write about the class without using a list; at most use an unordered list as there seems to be no ordering here (i.e. at most use bullets). I strongly prefer just two regular sentences. src/hotspot/share/gc/g1/g1EvacuationFailureRegions.hpp line 40: > 38: class G1EvacuationFailureRegions { > 39: > 40: private: unnecessary `private` and newline. src/hotspot/share/gc/g1/g1EvacuationFailureRegions.hpp line 41: > 39: > 40: private: > 41: CHeapBitMap _regions_failed_evacuation; Please keep the existing comments (from `G1CollectedHeap`). src/hotspot/share/gc/g1/g1EvacuationFailureRegions.hpp line 51: > 49: void initialize(); > 50: void reset(); > 51: bool contains(uint region_idx) const; Please add a newline between initialization/reset stuff and other methods. src/hotspot/share/gc/g1/g1EvacuationFailureRegions.hpp line 72: > 70: }; > 71: > 72: Superfluous newline ------------- Changes requested by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5272 From tschatzl at openjdk.java.net Tue Aug 31 15:39:33 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 31 Aug 2021 15:39:33 GMT Subject: RFR: 8254167 G1: Record regions where evacuation failed to provide targeted iteration [v4] In-Reply-To: References: <2rdaiC7TDwMQiq035XVEhD0kbYXiOc9EL3ZxogYI21o=.b1965f40-0ac7-4f2c-aec5-07bf2207d583@github.com> Message-ID: On Tue, 31 Aug 2021 15:15:28 GMT, Thomas Schatzl wrote: >> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: >> >> - merge from master >> - Fix missing header files >> - reuse original bitmap rather than CHT >> - merge with existing classes >> - Initial commit > > src/hotspot/share/gc/g1/g1EvacuationFailureRegions.cpp line 26: > >> 24: >> 25: #include "precompiled.hpp" >> 26: #include "g1EvacuationFailureRegions.hpp" > > should be `gc/g1/g1Evacu....` Please separate precompiled.hpp and the others with a newline. ------------- PR: https://git.openjdk.java.net/jdk/pull/5272 From tschatzl at openjdk.java.net Tue Aug 31 15:49:32 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 31 Aug 2021 15:49:32 GMT Subject: Integrated: 8273144: Remove unused top level "Sample Collection Set Candidates" logging In-Reply-To: References: Message-ID: <46bZTe4ONSoYfhqAlQ5DDeCfq2SFcTDgyWHS7koPM8o=.f3b23dbf-7f51-4483-bec1-ffe4d65f5e6e@github.com> On Mon, 30 Aug 2021 15:38:50 GMT, Thomas Schatzl wrote: > Hi all, > > please review this trivial(?) removal of a log message and associated unused code. > > This is a leftover from JDK-8017163 and the merge with the BatchedGangTask API (there is a "Sample CSet Candidates" log which is used in post evacuation phase 2). > > Testing: changed TestGCLogMessages test case, compilation > > Thanks, > Thomas This pull request has now been integrated. 
Changeset: ba3587e5 Author: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/ba3587e524aeec43a0c4174ddd96b8890a34fa36 Stats: 10 lines in 3 files changed: 0 ins; 10 del; 0 mod 8273144: Remove unused top level "Sample Collection Set Candidates" logging Reviewed-by: iwalulya, ayang ------------- PR: https://git.openjdk.java.net/jdk/pull/5304 From "github.com+73116608+openjdk-notifier[bot]" at openjdk.java.net Tue Aug 31 15:51:52 2021 From: "github.com+73116608+openjdk-notifier[bot]" at openjdk.java.net (openjdk-notifier [bot]) Date: Tue, 31 Aug 2021 15:51:52 GMT Subject: RFR: 8273147: Update and restructure TestGCLogMessages log message list In-Reply-To: References: Message-ID: On Mon, 30 Aug 2021 17:01:17 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that updates the TestGCLogMessages file for more or less recently added log messages. > > Test: executing test case > > Thanks, > Thomas The dependent pull request has now been integrated, and the target branch of this pull request has been updated. This means that changes from the dependent pull request can start to show up as belonging to this pull request, which may be confusing for reviewers. 
To remedy this situation, simply merge the latest changes from the new target branch into this pull request by running commands similar to these in the local repository for your personal fork:

git checkout submit/8273144-sample-collection-set-log
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# if there are conflicts, follow the instructions given by git merge
git commit -m "Merge master"
git push

-------------

PR: https://git.openjdk.java.net/jdk/pull/5306

From tschatzl at openjdk.java.net  Tue Aug 31 15:51:51 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Tue, 31 Aug 2021 15:51:51 GMT
Subject: RFR: 8273147: Update and restructure TestGCLogMessages log message list [v2]
In-Reply-To: 
References: 
Message-ID: <6tZQ3kIMjt6EAo8w3H5rqAJbUKE-qUpNi1UTZQCBwzo=.cba4aa66-43a8-4409-9520-4dcbafc5dfb2@github.com>

> Hi all,
> 
> please review this change that updates the TestGCLogMessages file for more or less recently added log messages.
> 
> Test: executing test case
> 
> Thanks,
>   Thomas

Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5306/files - new: https://git.openjdk.java.net/jdk/pull/5306/files/21168c45..21168c45 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5306&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5306&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5306.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5306/head:pull/5306 PR: https://git.openjdk.java.net/jdk/pull/5306 From tschatzl at openjdk.java.net Tue Aug 31 16:07:58 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 31 Aug 2021 16:07:58 GMT Subject: RFR: 8273147: Update and restructure TestGCLogMessages log message list [v3] In-Reply-To: References: Message-ID: > Hi all, > > please review this change that updates the TestGCLogMessages file for more or less recently added log messages. > > Test: executing test case > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision:

 - Fix comment after accidental removal during merge
 - Merge master
 - Add messages for parallel concurrent mark start initiation phase
 - Fixes
 - Remove unused log entry

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5306/files
  - new: https://git.openjdk.java.net/jdk/pull/5306/files/21168c45..a8f27ca1

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5306&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5306&range=01-02

Stats: 3874 lines in 102 files changed: 1263 ins; 1231 del; 1380 mod
Patch: https://git.openjdk.java.net/jdk/pull/5306.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/5306/head:pull/5306

PR: https://git.openjdk.java.net/jdk/pull/5306

From tschatzl at openjdk.java.net  Tue Aug 31 16:15:29 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Tue, 31 Aug 2021 16:15:29 GMT
Subject: RFR: 8254739: G1: Optimize evacuation failure for regions with few failed objects [v7]
In-Reply-To: 
References: 
Message-ID: 

On Thu, 26 Aug 2021 10:04:54 GMT, Hamlin Li wrote:

>> This is an attempt to optimize evacuation failure for regions.
>> I record every evacuation failure object per region (by G1EvacuationFailureObjsInHR), and then iterate (which indeed includes compact/sort/iteration) these objects directly in RemoveSelfForwardPtrHRClosure.
>> 
>> I have tested it with the following parameters,
>> 
>> - -XX:+ParallelGCThreads=1/32/64
>> - -XX:G1EvacuationFailureALotInterval=1
>> - -XX:G1EvacuationFailureALotCount=2/10/100/1000/10000/100000
>> 
>> It saves "Remove Self Forwards" time all the time, and in most conditions it saves "Evacuate Collection Set" time.
>> 
>> It brings some performance degradation when -XX:G1EvacuationFailureALotCount is low, such as *2*. To improve this a little, we can record the number of evacuation failure objects per region, and not record these objects when the number hits some limit.
>> But I'm not sure whether it's necessary to do so, as I think such a condition is too extreme to be met in a real environment, although I'm not quite sure.
> 
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
> Fix compilation error on windows

Just fyi, I am looking at the change, but it's a relatively significant patch and I want to do some local testing too.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5181

From tschatzl at openjdk.java.net  Tue Aug 31 16:15:36 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Tue, 31 Aug 2021 16:15:36 GMT
Subject: RFR: 8273186: Remove leftover comment about sparse remembered set in G1 HeapRegionRemSet
Message-ID: 

Hi all,

please review this trivial removal of a one-line obsolete comment about sparse remembered sets, which do not exist any more.

Testing: local compilation

Thanks,
  Thomas

-------------

Commit messages:
 - Remove obsolete

Changes: https://git.openjdk.java.net/jdk/pull/5319/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5319&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8273186
Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/5319.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/5319/head:pull/5319

PR: https://git.openjdk.java.net/jdk/pull/5319

From ayang at openjdk.java.net  Tue Aug 31 16:24:25 2021
From: ayang at openjdk.java.net (Albert Mingkun Yang)
Date: Tue, 31 Aug 2021 16:24:25 GMT
Subject: RFR: 8273186: Remove leftover comment about sparse remembered set in G1 HeapRegionRemSet
In-Reply-To: 
References: 
Message-ID: 

On Tue, 31 Aug 2021 16:08:43 GMT, Thomas Schatzl wrote:

> Hi all,
> 
> please review this trivial removal of a one-line obsolete comment about sparse remembered sets, which do not exist any more.
> 
> Testing: local compilation
> 
> Thanks,
>   Thomas

Trivial.

-------------

Marked as reviewed by ayang (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/5319 From iveresov at openjdk.java.net Tue Aug 31 17:00:35 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Tue, 31 Aug 2021 17:00:35 GMT Subject: RFR: 8272563: Possible assertion failure in CardTableBarrierSetC1 [v2] In-Reply-To: References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Tue, 31 Aug 2021 14:54:56 GMT, Fairoz Matte wrote: >> This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp >> >> The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >>> __ move(addr, tmp); >> However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. >> The fix is to check if it is is_oop() and call the mov appropriately. >> >> No issues found in local testing and Mach5 tier1-3 > > Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: > > 8272563: Possible assertion failure in CardTableBarrierSetC1 Good. You don't actually need to check if it's an oop or not. You can just do `leal` for all the types. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5164 From tschatzl at openjdk.java.net Tue Aug 31 19:45:02 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 31 Aug 2021 19:45:02 GMT Subject: RFR: 8273186: Remove leftover comment about sparse remembered set in G1 HeapRegionRemSet In-Reply-To: References: Message-ID: On Tue, 31 Aug 2021 16:21:33 GMT, Albert Mingkun Yang wrote: >> Hi all, >> >> please review this trivial one-line obsolete comment about sparse remembered sets which do not exist any more. >> >> Testing: local compilation >> >> Thanks, >> Thomas > > Trivial. Thanks @albertnetymk ------------- PR: https://git.openjdk.java.net/jdk/pull/5319 From tschatzl at openjdk.java.net Tue Aug 31 19:50:23 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 31 Aug 2021 19:50:23 GMT Subject: Integrated: 8273186: Remove leftover comment about sparse remembered set in G1 HeapRegionRemSet In-Reply-To: References: Message-ID: On Tue, 31 Aug 2021 16:08:43 GMT, Thomas Schatzl wrote: > Hi all, > > please review this trivial one-line obsolete comment about sparse remembered sets which do not exist any more. > > Testing: local compilation > > Thanks, > Thomas This pull request has now been integrated. 
Changeset: c1e0aac8
Author: Thomas Schatzl
URL: https://git.openjdk.java.net/jdk/commit/c1e0aac846861f9bd8a23818a21670a2f649631b
Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod

8273186: Remove leftover comment about sparse remembered set in G1 HeapRegionRemSet

Reviewed-by: ayang

-------------

PR: https://git.openjdk.java.net/jdk/pull/5319

From github.com+10835776+stsypanov at openjdk.java.net  Tue Aug 31 20:04:23 2021
From: github.com+10835776+stsypanov at openjdk.java.net (=?UTF-8?B?0KHQtdGA0LPQtdC5?= =?UTF-8?B?IA==?= =?UTF-8?B?0KbRi9C/0LDQvdC+0LI=?=)
Date: Tue, 31 Aug 2021 20:04:23 GMT
Subject: RFR: 8273140: Replace usages of Enum.class.getEnumConstants() with Enum.values() where possible [v4]
In-Reply-To: <4N-BgoVLVgZk_uPkRLoY-427bF-kKKwDr1zRHI7o1PI=.dcdbec6e-f3f8-48e4-94ce-a88d7e494c41@github.com>
References: <4N-BgoVLVgZk_uPkRLoY-427bF-kKKwDr1zRHI7o1PI=.dcdbec6e-f3f8-48e4-94ce-a88d7e494c41@github.com>
Message-ID: 

> Just a very tiny clean-up.
> 
> There are some places in the JDK code base where we call `Enum.class.getEnumConstants()` to get all the values of the referenced `enum`. This is excessive, less readable and slower than just calling `Enum.values()`, as in `getEnumConstants()` we have volatile access:
> 
>     public T[] getEnumConstants() {
>         T[] values = getEnumConstantsShared();
>         return (values != null) ? values.clone() : null;
>     }
> 
>     private transient volatile T[] enumConstants;
> 
>     T[] getEnumConstantsShared() {
>         T[] constants = enumConstants;
>         if (constants == null) { /* ... */ }
>         return constants;
>     }
> 
> Calling the values() method is slightly faster:
> 
>     @BenchmarkMode(Mode.AverageTime)
>     @OutputTimeUnit(TimeUnit.NANOSECONDS)
>     @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"})
>     public class EnumBenchmark {
> 
>         @Benchmark
>         public Enum[] values() {
>             return Enum.values();
>         }
> 
>         @Benchmark
>         public Enum[] getEnumConstants() {
>             return Enum.class.getEnumConstants();
>         }
> 
>         private enum Enum {
>             A,
>             B
>         }
>     }
> 
> 
> Benchmark                                                        Mode  Cnt     Score     Error   Units
> EnumBenchmark.getEnumConstants                                   avgt   15     6,265 ±   0,051   ns/op
> EnumBenchmark.getEnumConstants:·gc.alloc.rate                    avgt   15  2434,075 ±  19,568  MB/sec
> EnumBenchmark.getEnumConstants:·gc.alloc.rate.norm               avgt   15    24,002 ±   0,001    B/op
> EnumBenchmark.getEnumConstants:·gc.churn.G1_Eden_Space           avgt   15  2433,709 ±  70,216  MB/sec
> EnumBenchmark.getEnumConstants:·gc.churn.G1_Eden_Space.norm      avgt   15    23,998 ±   0,659    B/op
> EnumBenchmark.getEnumConstants:·gc.churn.G1_Survivor_Space       avgt   15     0,009 ±   0,003  MB/sec
> EnumBenchmark.getEnumConstants:·gc.churn.G1_Survivor_Space.norm  avgt   15         ≈ 10⁻⁴        B/op
> EnumBenchmark.getEnumConstants:·gc.count                         avgt   15   210,000            counts
> EnumBenchmark.getEnumConstants:·gc.time                          avgt   15   119,000                ms
> 
> EnumBenchmark.values                                             avgt   15     4,164 ±   0,134   ns/op
> EnumBenchmark.values:·gc.alloc.rate                              avgt   15  3665,341 ± 120,721  MB/sec
> EnumBenchmark.values:·gc.alloc.rate.norm                         avgt   15    24,002 ±   0,001    B/op
> EnumBenchmark.values:·gc.churn.G1_Eden_Space                     avgt   15  3660,512 ± 137,250  MB/sec
> EnumBenchmark.values:·gc.churn.G1_Eden_Space.norm                avgt   15    23,972 ±   0,529    B/op
> EnumBenchmark.values:·gc.churn.G1_Survivor_Space                 avgt   15     0,017 ±   0,003  MB/sec
> EnumBenchmark.values:·gc.churn.G1_Survivor_Space.norm            avgt   15         ≈ 10⁻⁴        B/op
> EnumBenchmark.values:·gc.count                                   avgt   15   262,000            counts
> EnumBenchmark.values:·gc.time                                    avgt   15   155,000                ms

Сергей Цыпанов
has updated the pull request incrementally with one additional commit since the last revision: 8273140: Fix copyright year ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5303/files - new: https://git.openjdk.java.net/jdk/pull/5303/files/171ecd4e..a5f58fe2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5303&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5303&range=02-03 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5303.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5303/head:pull/5303 PR: https://git.openjdk.java.net/jdk/pull/5303 From github.com+10835776+stsypanov at openjdk.java.net Tue Aug 31 20:04:30 2021 From: github.com+10835776+stsypanov at openjdk.java.net (=?UTF-8?B?0KHQtdGA0LPQtdC5?= =?UTF-8?B?IA==?= =?UTF-8?B?0KbRi9C/0LDQvdC+0LI=?=) Date: Tue, 31 Aug 2021 20:04:30 GMT Subject: RFR: 8273140: Replace usages of Enum.class.getEnumConstants() with Enum.values() where possible [v3] In-Reply-To: References: <4N-BgoVLVgZk_uPkRLoY-427bF-kKKwDr1zRHI7o1PI=.dcdbec6e-f3f8-48e4-94ce-a88d7e494c41@github.com> Message-ID: On Mon, 30 Aug 2021 19:28:11 GMT, Kevin Rushforth wrote: >> ?????? ??????? has updated the pull request incrementally with one additional commit since the last revision: >> >> 8273140: Fix copyright year > > src/java.desktop/share/classes/sun/font/EAttribute.java line 2: > >> 1: /* >> 2: * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved. > > You need to also fix this copyright year. It must be `2005, 2021,` Right! Fixed > src/java.sql/share/classes/java/sql/JDBCType.java line 2: > >> 1: /* >> 2: * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved. > > This one, too. 
Done ------------- PR: https://git.openjdk.java.net/jdk/pull/5303 From jiefu at openjdk.java.net Tue Aug 31 23:16:53 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 31 Aug 2021 23:16:53 GMT Subject: RFR: 8159979: During initial mark, preparing all regions for marking may take a significant amount of time [v2] In-Reply-To: References: <1aCyJ3XrBfjWFjudPPWbw3mZavoz1nhkQRdzJ1xGOzk=.7aab48b5-233e-4917-864b-a4d4b7c26dd4@github.com> Message-ID: On Mon, 30 Aug 2021 15:18:49 GMT, Ivan Walulya wrote: >> Hi all, >> >> Please review this change to move the work done during preparation for a concurrent mark start (pre_concurrent_start) into a G1BatchedGangTask. >> >> Testing: Tier 1-3 > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > typo One test fails after this integration: https://github.com/openjdk/jdk/pull/5322 ------------- PR: https://git.openjdk.java.net/jdk/pull/5302
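(A closing aside on the Enum.values() clean-up from JDK-8273140 discussed above: the substitution is safe because `values()` and `Class.getEnumConstants()` return equal, independently cloned arrays. The following minimal, self-contained example illustrates this; the class and enum names are invented for the illustration and do not appear in the patch.)

```java
import java.util.Arrays;

class EnumValuesDemo {
    // Hypothetical enum, invented for this example.
    enum Color { RED, GREEN, BLUE }

    public static void main(String[] args) {
        Color[] viaValues = Color.values();
        Color[] viaReflection = Color.class.getEnumConstants();

        // Both APIs return the same constants, in declaration order.
        System.out.println(Arrays.equals(viaValues, viaReflection));

        // Both return defensive copies: mutating a returned array
        // does not corrupt the enum's internal constant table.
        viaValues[0] = Color.BLUE;
        System.out.println(Color.values()[0]);
    }
}
```

Since every call clones an internal array either way, the usual pattern when the constants are needed repeatedly is to hoist the result into a local variable or a private static final field, which is exactly what makes the cheaper direct `values()` call preferable to the reflective lookup.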