From shade at openjdk.org Mon May 1 08:11:55 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 1 May 2023 08:11:55 GMT Subject: Integrated: Cleanup NumberSeq additions In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 15:31:58 GMT, Aleksey Shipilev wrote: > I have been looking into how to decrease our upstream exposure for `NumberSeq` changes, and it seems to be not solvable in an easy way. Normally, I would add `AbsSeq::add(AbsSeq& other)` methods and get to that from `HdrSeq::add(HdrSeq& other)`, and that almost works. The remaining bit is figuring out the decaying average/variance thingie. > > I spent an hour trying to come up with easily provable equations for _combining_ the decaying average/variance. The decaying average is somewhat easy, although there are boundary problems (decaying average treats 0-th element without adjustments, which requires remembering the `_last` element to compensate). Decaying variance looks remarkably scary in the few approaches I tried. > > If we enter with current code upstream, it would be an uphill battle to implement decays and prove they work. So instead, I took the alternative approach of doing everything inside Shenandoah code, except the decays. This reverts `numberSeq.*` to upstream state. > > Additional testing: > - [x] Linux x86_64 fastdebug `hotspot_gc_shenandoah` > - [x] Linux AArch64 fastdebug `hotspot_gc_shenandoah` > - [x] GitFarm This pull request has now been integrated. Changeset: 58fa1410 Author: Aleksey Shipilev URL: https://git.openjdk.org/shenandoah/commit/58fa1410a8ad1462cbe7c035b7d64f44a0b62c85 Stats: 187 lines in 7 files changed: 71 ins; 69 del; 47 mod Cleanup NumberSeq additions Reviewed-by: ysr, rkennke ------------- PR: https://git.openjdk.org/shenandoah/pull/268 From ysr at openjdk.org Mon May 1 16:48:31 2023 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Mon, 1 May 2023 16:48:31 GMT Subject: RFR: Add generations to freeset [v20] In-Reply-To: References: <5f9OPGjMBja6ARwdpMrfEbycs8zx1VVUGbGnaKl6mtY=.edefcf13-4893-4137-b325-957bf2dd1dc5@github.com> <1PLi18Uydgq4shk2jZhTz6HsOLy5MLJMmYLwM1Zhaas=.0625559b-cb74-4aab-8156-69b364d7d1e1@github.com> Message-ID: On Sat, 29 Apr 2023 13:47:24 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 32: >> >>> 30: #include "gc/shenandoah/shenandoahHeap.hpp" >>> 31: >>> 32: enum MemoryReserve : uint8_t { >> >> Would it tidy up matters to move `ShenandoahSetsOfFree` into its own source file(s)? I realize it's only used by `ShenandoahFreeSet`, but I think it'll keep it all much cleaner. > > My preference for expedience is to keep as is. But if you feel strongly, I'll separate it out. That's fine. Let's leave as is for now, for the reason you state. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/250#discussion_r1181698110 From ysr at openjdk.org Mon May 1 16:48:34 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 1 May 2023 16:48:34 GMT Subject: RFR: Add generations to freeset [v20] In-Reply-To: References: <5f9OPGjMBja6ARwdpMrfEbycs8zx1VVUGbGnaKl6mtY=.edefcf13-4893-4137-b325-957bf2dd1dc5@github.com> <1PLi18Uydgq4shk2jZhTz6HsOLy5MLJMmYLwM1Zhaas=.0625559b-cb74-4aab-8156-69b364d7d1e1@github.com> Message-ID: On Sat, 29 Apr 2023 13:45:09 GMT, Kelvin Nilsen wrote: >> Hmm, wait, the only non-public method is `clear_internal` which is anyway exposed via `clear_all`. >> >> In that case, is the friend accessing a private data member field? >> >> Or perhaps the friend annotation is now obsolete from an earlier round of development and must now be removed. > > You are correct that I no longer need the friend class. 
In a previous version of the code, I had an assert on ShenandoahFreeSets::clear_all() that was not appropriate for all calls (in particular the initialization call) from ShenandoahFreeSet::clear_internal(). I'll make this change. (Hmmm. Or should I put that assert_heaplocked() back into ShenandoahFreeSets::clear_all()?) ok. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/250#discussion_r1181698676 From ysr at openjdk.org Mon May 1 17:00:24 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 1 May 2023 17:00:24 GMT Subject: RFR: Add generations to freeset [v21] In-Reply-To: References: Message-ID: <1l-dBu7t6TpOlrtIQ0AV9paM5vREN8H0et3rTs5BYl4=.0021fa79-65ab-4fb2-b959-c095cab1261f@github.com> On Sat, 29 Apr 2023 14:22:53 GMT, Kelvin Nilsen wrote: >> ShenandoahFreeSet has not yet been modified to deal efficiently with the combination of old-gen and young-gen collection set reserves. This PR makes changes so that we can distinguish between collector_is_free, old_collector_is_free, and mutator_is_free. Further, it endeavors to keep each set of free regions tightly packed, so the range of regions representing each set is small. >> >> In its current form, this no longer fails existing regression tests (except for known problems that are being addressed independently) > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Respond to reviewer feedback I didn't go through all of the details of the allocation strategies to reduce entropy of memory types across the range of regions etc., but this looks good otherwise. Would you be able to share some statistics or performance numbers that indicate the improvement here, either using range statistics or just raw allocation performance (say using a hyperalloc/extremem workload to compare)? Thanks for patiently responding to my nitpickings during the review process! 
src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 272: > 270: // request. Regions become sparsely distributed following a Full GC, which tends to slide all regions to the front of the > 271: // heap rather than allowing survivor regions to remain at the high end of the heap where we intend for them to congregate. > 272: // In the future, we may modify Full GC so that it slides old objects to the end of the heap and young objects to the start Mark with a `TODO:` ? src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 92: > 90: // In other words: > 91: // if the requested which_set is empty: > 92: // left_most() and left_most_empty() return _max, right_most() and right_most_empty() return 0 I think you can elide the "_" underscore in left_most (leftmost is an English word). It also makes `leftmost_empty` easier to read. Would be great if you can make that global substitution in your comments etc. as well before checking in. (Or make it all have the underscore consistently in comments too if you prefer the variant with the underscore.) src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 124: > 122: > 123: // Assure leftmost, rightmost, leftmost_empty, and rightmost_empty bounds are valid for all free sets. > 124: // valid bounds honor all of the following (where max is the number of heap regions): Valid (upper-case V). ------------- Marked as reviewed by ysr (Author). 
PR Review: https://git.openjdk.org/shenandoah/pull/250#pullrequestreview-1407804254 PR Review Comment: https://git.openjdk.org/shenandoah/pull/250#discussion_r1181709565 PR Review Comment: https://git.openjdk.org/shenandoah/pull/250#discussion_r1181705647 PR Review Comment: https://git.openjdk.org/shenandoah/pull/250#discussion_r1181700452 From wkemper at openjdk.org Mon May 1 18:49:35 2023 From: wkemper at openjdk.org (William Kemper) Date: Mon, 1 May 2023 18:49:35 GMT Subject: RFR: Add generations to freeset [v21] In-Reply-To: References: Message-ID: On Sat, 29 Apr 2023 14:22:53 GMT, Kelvin Nilsen wrote: >> ShenandoahFreeSet has not yet been modified to deal efficiently with the combination of old-gen and young-gen collection set reserves. This PR makes changes so that we can distinguish between collector_is_free, old_collector_is_free, and mutator_is_free. Further, it endeavors to keep each set of free regions tightly packed, so the range of regions representing each set is small. >> >> In its current form, this no longer fails existing regression tests (except for known problems that are being addressed independently) > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Respond to reviewer feedback Changes requested by wkemper (Committer). src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 32: > 30: #include "gc/shenandoah/shenandoahHeap.hpp" > 31: > 32: enum FreeMemoryType : uint8_t { Globally visible type should have `Shenandoah` prefix. 
------------- PR Review: https://git.openjdk.org/shenandoah/pull/250#pullrequestreview-1407945260 PR Review Comment: https://git.openjdk.org/shenandoah/pull/250#discussion_r1181794878 From wkemper at openjdk.org Mon May 1 20:50:22 2023 From: wkemper at openjdk.org (William Kemper) Date: Mon, 1 May 2023 20:50:22 GMT Subject: RFR: 8305767: HdrSeq: support for a merge() method In-Reply-To: References: Message-ID: <157rPNsezs9_2jcBSZisxHBMOHF0CHOEz2Ga0LKYd6s=.ef8ae0c2-5700-4f82-9cf9-00f2cb0a4fb5@github.com> On Fri, 7 Apr 2023 23:03:02 GMT, William Kemper wrote: > A merge functionality on stats (distributions) was needed for the remembered set scan that I was using in some companion work. This PR implements a first cut at that, which is sufficient for our first (and only) use case. > > Unfortunately, for expediency, I am deferring work on decaying statistics, as a result of which users that want decaying statistics will get NaNs instead (or trigger guarantees). We are abandoning this change in favor of https://github.com/openjdk/shenandoah/pull/268 - which doesn't change any shared code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13395#issuecomment-1530229190 From wkemper at openjdk.org Mon May 1 20:50:23 2023 From: wkemper at openjdk.org (William Kemper) Date: Mon, 1 May 2023 20:50:23 GMT Subject: Withdrawn: 8305767: HdrSeq: support for a merge() method In-Reply-To: References: Message-ID: On Fri, 7 Apr 2023 23:03:02 GMT, William Kemper wrote: > A merge functionality on stats (distributions) was needed for the remembered set scan that I was using in some companion work. This PR implements a first cut at that, which is sufficient for our first (and only) use case. > > Unfortunately, for expediency, I am deferring work on decaying statistics, as a result of which users that want decaying statistics will get NaNs instead (or trigger guarantees). This pull request has been closed without being integrated. 
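Editorial sketch for the merge discussion in this thread: the non-decaying part of a number sequence can be merged by adding running totals, which is why only the decaying average/variance was deferred. The struct below is a hypothetical stand-in, not HotSpot's `AbsSeq` or `HdrSeq`; the names `SimpleSeq`, `merge`, `avg` and `variance` are assumptions made for illustration, and the variance shown is the population variance rather than whatever estimator the real class uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical illustration only: NOT HotSpot's AbsSeq/HdrSeq.
// Tracks the running totals that a non-decaying sequence needs.
struct SimpleSeq {
  std::size_t num = 0;          // number of samples seen
  double sum = 0.0;             // sum of samples
  double sum_of_squares = 0.0;  // sum of squared samples

  void add(double v) {
    ++num;
    sum += v;
    sum_of_squares += v * v;
  }

  // Merging two sequences: plain running totals simply add up.
  // Decaying averages have no such simple rule, which is the hard part
  // described in the thread.
  void merge(const SimpleSeq& other) {
    num += other.num;
    sum += other.sum;
    sum_of_squares += other.sum_of_squares;
  }

  double avg() const { return num == 0 ? 0.0 : sum / static_cast<double>(num); }

  // Population variance from the totals, clamped at zero against rounding.
  double variance() const {
    if (num == 0) return 0.0;
    double m = avg();
    double v = sum_of_squares / static_cast<double>(num) - m * m;
    return v < 0.0 ? 0.0 : v;
  }
};
```

Merging per-worker sequences this way gives the same average and variance as feeding every sample into a single sequence, up to floating-point rounding.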
------------- PR: https://git.openjdk.org/jdk/pull/13395 From kdnilsen at openjdk.org Mon May 1 22:04:30 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 1 May 2023 22:04:30 GMT Subject: RFR: Add generations to freeset [v22] In-Reply-To: References: Message-ID: <8xQbCuJmQmDbjmYRux95s6VUxREQiC9xAPi94Dlzav0=.22a72e78-6516-4025-8430-5c9903ec8433@github.com> > ShenandoahFreeSet has not yet been modified to deal efficiently with the combination of old-gen and young-gen collection set reserves. This PR makes changes so that we can distinguish between collector_is_free, old_collector_is_free, and mutator_is_free. Further, it endeavors to keep each set of free regions tightly packed, so the range of regions representing each set is small. > > In its current form, this no longer fails existing regression tests (except for known problems that are being addressed independently) Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: More reviewer feedback responses Remove _ from left_most and right_most variable names, Rename MemoryFreeType to ShenandoahMemoryFreeType. 
------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/250/files - new: https://git.openjdk.org/shenandoah/pull/250/files/08917880..d967f503 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=250&range=21 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=250&range=20-21 Stats: 175 lines in 2 files changed: 0 ins; 0 del; 175 mod Patch: https://git.openjdk.org/shenandoah/pull/250.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/250/head:pull/250 PR: https://git.openjdk.org/shenandoah/pull/250 From kdnilsen at openjdk.org Mon May 1 22:10:00 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 1 May 2023 22:10:00 GMT Subject: RFR: Add generations to freeset [v23] In-Reply-To: References: Message-ID: > ShenandoahFreeSet has not yet been modified to deal efficiently with the combination of old-gen and young-gen collection set reserves. This PR makes changes so that we can distinguish between collector_is_free, old_collector_is_free, and mutator_is_free. Further, it endeavors to keep each set of free regions tightly packed, so the range of regions representing each set is small. 
> > In its current form, this no longer fails existing regression tests (except for known problems that are being addressed independently) Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - Reviewer feedback: fix capitalization in comment - Reviewer feedback: add a TODO comment ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/250/files - new: https://git.openjdk.org/shenandoah/pull/250/files/d967f503..b112fdd9 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=250&range=22 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=250&range=21-22 Stats: 4 lines in 2 files changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/shenandoah/pull/250.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/250/head:pull/250 PR: https://git.openjdk.org/shenandoah/pull/250 From kdnilsen at openjdk.org Mon May 1 22:10:06 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 1 May 2023 22:10:06 GMT Subject: RFR: Add generations to freeset [v21] In-Reply-To: <1l-dBu7t6TpOlrtIQ0AV9paM5vREN8H0et3rTs5BYl4=.0021fa79-65ab-4fb2-b959-c095cab1261f@github.com> References: <1l-dBu7t6TpOlrtIQ0AV9paM5vREN8H0et3rTs5BYl4=.0021fa79-65ab-4fb2-b959-c095cab1261f@github.com> Message-ID: On Mon, 1 May 2023 16:43:49 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Respond to reviewer feedback > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 272: > >> 270: // request. Regions become sparsely distributed following a Full GC, which tends to slide all regions to the front of the >> 271: // heap rather than allowing survivor regions to remain at the high end of the heap where we intend for them to congregate. >> 272: // In the future, we may modify Full GC so that it slides old objects to the end of the heap and young objects to the start > > Mark with a `TODO:` ? 
Thanks. Done. > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 92: > >> 90: // In other words: >> 91: // if the requested which_set is empty: >> 92: // left_most() and left_most_empty() return _max, right_most() and right_most_empty() return 0 > > I think you can elide the "_" underscore in left_most (leftmost is an English word). It also makes `leftmost_empty` easier to read. > > Would be great if you can make that global substitution in your comments etc. as well before checking in. (Or make it all have the underscore consistently in comments too if you prefer the variant with the underscore.) Thanks. Done. > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 124: > >> 122: >> 123: // Assure leftmost, rightmost, leftmost_empty, and rightmost_empty bounds are valid for all free sets. >> 124: // valid bounds honor all of the following (where max is the number of heap regions): > > Valid (upper-case V). Thanks. Done. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/250#discussion_r1181926504 PR Review Comment: https://git.openjdk.org/shenandoah/pull/250#discussion_r1181927354 PR Review Comment: https://git.openjdk.org/shenandoah/pull/250#discussion_r1181927451 From kdnilsen at openjdk.org Mon May 1 22:10:09 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 1 May 2023 22:10:09 GMT Subject: RFR: Add generations to freeset [v21] In-Reply-To: References: Message-ID: <70iQDTH-yueUsiP_gm1kxwMIjOQwAOwFgfY6tHhEObw=.4a0e0340-6e16-4d2a-8adb-af094b464c28@github.com> On Mon, 1 May 2023 18:45:20 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Respond to reviewer feedback > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 32: > >> 30: #include "gc/shenandoah/shenandoahHeap.hpp" >> 31: >> 32: enum FreeMemoryType : uint8_t { > > Globally visible type should have `Shenandoah` prefix. Thanks. Done. 
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/250#discussion_r1181926593 From kdnilsen at openjdk.org Mon May 1 22:10:12 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 1 May 2023 22:10:12 GMT Subject: RFR: Add generations to freeset [v20] In-Reply-To: <1PLi18Uydgq4shk2jZhTz6HsOLy5MLJMmYLwM1Zhaas=.0625559b-cb74-4aab-8156-69b364d7d1e1@github.com> References: <5f9OPGjMBja6ARwdpMrfEbycs8zx1VVUGbGnaKl6mtY=.edefcf13-4893-4137-b325-957bf2dd1dc5@github.com> <1PLi18Uydgq4shk2jZhTz6HsOLy5MLJMmYLwM1Zhaas=.0625559b-cb74-4aab-8156-69b364d7d1e1@github.com> Message-ID: On Sat, 29 Apr 2023 00:10:39 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix white space > > src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 44: > >> 42: >> 43: private: >> 44: size_t _max; > > What does `_max` hold? Adding a comment: num_regions ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/250#discussion_r1181927673 From ysr at openjdk.org Mon May 1 22:24:10 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Mon, 1 May 2023 22:24:10 GMT Subject: RFR: Add generations to freeset [v23] In-Reply-To: References: Message-ID: On Mon, 1 May 2023 22:10:00 GMT, Kelvin Nilsen wrote: >> ShenandoahFreeSet has not yet been modified to deal efficiently with the combination of old-gen and young-gen collection set reserves. This PR makes changes so that we can distinguish between collector_is_free, old_collector_is_free, and mutator_is_free. Further, it endeavors to keep each set of free regions tightly packed, so the range of regions representing each set is small. 
>> >> In its current form, this no longer fails existing regression tests (except for known problems that are being addressed independently) > > Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: > > - Reviewer feedback: fix capitalization in comment > - Reviewer feedback: add a TODO comment Marked as reviewed by ysr (Author). ------------- PR Review: https://git.openjdk.org/shenandoah/pull/250#pullrequestreview-1408180513 From wkemper at openjdk.org Mon May 1 23:00:09 2023 From: wkemper at openjdk.org (William Kemper) Date: Mon, 1 May 2023 23:00:09 GMT Subject: RFR: Add generations to freeset [v23] In-Reply-To: References: Message-ID: On Mon, 1 May 2023 22:10:00 GMT, Kelvin Nilsen wrote: >> ShenandoahFreeSet has not yet been modified to deal efficiently with the combination of old-gen and young-gen collection set reserves. This PR makes changes so that we can distinguish between collector_is_free, old_collector_is_free, and mutator_is_free. Further, it endeavors to keep each set of free regions tightly packed, so the range of regions representing each set is small. >> >> In its current form, this no longer fails existing regression tests (except for known problems that are being addressed independently) > > Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: > > - Reviewer feedback: fix capitalization in comment > - Reviewer feedback: add a TODO comment Marked as reviewed by wkemper (Committer). ------------- PR Review: https://git.openjdk.org/shenandoah/pull/250#pullrequestreview-1408202921 From kdnilsen at openjdk.org Mon May 1 23:47:55 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 1 May 2023 23:47:55 GMT Subject: RFR: DRAFT: Expand old on demand [v28] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. 
In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > As currently drafted, there are regression failures. This DRAFT PR is published for the purpose of facilitating a careful code review. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: No need to distinguish generational full gc verification ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/d8f69909..1ca9619d Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=27 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=26-27 Stats: 20 lines in 3 files changed: 0 ins; 19 del; 1 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From kdnilsen at openjdk.org Tue May 2 00:00:01 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 2 May 2023 00:00:01 GMT Subject: RFR: DRAFT: Expand old on demand [v29] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > As currently drafted, there are regression failures. This DRAFT PR is published for the purpose of facilitating a careful code review. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix a comment re: max_young_evacuation_reserve ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/1ca9619d..92271b5c Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=28 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=27-28 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From kdnilsen at openjdk.org Tue May 2 00:45:50 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 2 May 2023 00:45:50 GMT Subject: RFR: DRAFT: Expand old on demand [v30] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > As currently drafted, there are regression failures. This DRAFT PR is published for the purpose of facilitating a careful code review. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Remove iteator over mutator threads as not needed ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/92271b5c..d6a2b2f8 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=29 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=28-29 Stats: 7 lines in 2 files changed: 0 ins; 7 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From kdnilsen at openjdk.org Tue May 2 00:55:11 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 2 May 2023 00:55:11 GMT Subject: Integrated: Add generations to freeset In-Reply-To: References: Message-ID: On Sun, 9 Apr 2023 22:11:36 GMT, Kelvin Nilsen wrote: > ShenandoahFreeSet has not yet been modified to deal efficiently with the combination of old-gen and young-gen collection set reserves. This PR makes changes so that we can distinguish between collector_is_free, old_collector_is_free, and mutator_is_free. Further, it endeavors to keep each set of free regions tightly packed, so the range of regions representing each set is small. > > In its current form, this no longer fails existing regression tests (except for known problems that are being addressed independently) This pull request has now been integrated. 
Changeset: 6f71ef60 Author: Kelvin Nilsen URL: https://git.openjdk.org/shenandoah/commit/6f71ef6092d257df51d45baf0640c8ab85318898 Stats: 991 lines in 8 files changed: 722 ins; 109 del; 160 mod Add generations to freeset Reviewed-by: ysr, wkemper ------------- PR: https://git.openjdk.org/shenandoah/pull/250 From mdoerr at openjdk.org Tue May 2 09:51:47 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 2 May 2023 09:51:47 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v25] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both a FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. 
> > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 30 commits: - Adaptation for JDK-8303002. - Merge remote-tracking branch 'origin' into PPC64_Panama - Revert unintended formatting changes. Fix comment. - Enable remaining foreign tests. - Adaptations for JDK-8304265. - Merge remote-tracking branch 'origin' into PPC64_Panama - Adaptation for JDK-8305668 - Merge remote-tracking branch 'origin' into PPC64_Panama - Move ABIv2CallArranger out of linux subdirectory. ABIv1/2 does match the AIX/linux separation. - Adaptation for JDK-8303022. - ... 
and 20 more: https://git.openjdk.org/jdk/compare/860bf9b3...f5e22be0 ------------- Changes: https://git.openjdk.org/jdk/pull/12708/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=24 Stats: 2465 lines in 69 files changed: 2348 ins; 1 del; 116 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From shade at openjdk.org Tue May 2 12:07:29 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 May 2023 12:07:29 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v15] In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 19:34:53 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. 
Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, where there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). 
>> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Some more changes by @shipilev I think we are very close. Another round of review: src/hotspot/share/gc/shared/gcForwarding.hpp line 39: > 37: > 38: public: > 39: static void initialize(MemRegion heap, size_t region_size_words_shift); Suggestion: static void initialize(MemRegion heap, size_t region_size_words); src/hotspot/share/gc/shared/genCollectedHeap.cpp line 136: > 134: GCInitLogger::print(); > 135: > 136: GCForwarding::initialize(_reserved, SpaceAlignment); The second argument is not "shift" anymore, right? So this should be the actual reserved space size? src/hotspot/share/gc/shared/slidingForwarding.cpp line 131: > 129: assert(val < TABLE_SIZE, "must fit in table: val: " UINT64_FORMAT ", table-size: " UINTX_FORMAT ", table-size-bits: %d", > 130: val, TABLE_SIZE, log2i_exact(TABLE_SIZE)); > 131: return static_cast(val); Want to cast first, and _then_ assert, maybe? 
src/hotspot/share/gc/shared/slidingForwarding.hpp line 68: > 66: * ^------------------------------------------- alternate region select > 67: * ^----------------------------------------- in-region offset > 68: * ^----------------------- compressed class pointer (not handled, but also *not touched* by this code) I think we can invert these: * 64 32 0 * [........................|OOOOOOOOOOOOOOO|A|F|TT] * ^--- normal lock bits, would record "object is forwarded" * ^----- fallback bit (explained below) * ^------- alternate region select * ^----------------------- in-region offset * ^------------------------------------------------ protected area, *not touched* by this code, useful for * compressed class pointer with compact object headers ``` src/hotspot/share/gc/shared/slidingForwarding.hpp line 93: > 91: class SlidingForwarding : public CHeapObj { > 92: private: > 93: static const uintptr_t MARK_LOWER_HALF_MASK = 0xffffffff; This is just `right_n_bits(32)`? ------------- PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1408928430 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182441736 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182440390 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182452646 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182434367 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182438481 From rkennke at openjdk.org Tue May 2 12:50:22 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 12:50:22 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v15] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 11:47:15 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Some more changes by @shipilev > > src/hotspot/share/gc/shared/genCollectedHeap.cpp line 136: > >> 134: GCInitLogger::print(); >> 
135: >> 136: GCForwarding::initialize(_reserved, SpaceAlignment); > > The second argument is not "shift" anymore, right? So this should be the actual reserved space size? I think SpaceAlignment is correct. We want to pass a region-size there, and the (default) region size for Serial should be the space alignment, because that is what eden, survivors and old-space will be aligned at. Unfortunately, Serial GC doesn't generally slide from top to bottom: it starts to slide old into old, then young into old until old is full, then slide the rest into young. Even worse, the survivor spaces are swapped with every GC cycle, so we really don't know that sliding goes top -> bottom. Using 'virtual' regions that align at SpaceAlignment solves the problem, though. (One exception is when the whole heap fits into our 2^28 words range, in which case we can treat the whole heap as single region) That said, I see a bug in the line: GCForwarding::initialize() takes region size *in words* but SpaceAlignment is *in bytes*. I'm fixing that by passing the space alignment in words instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182500382 From rkennke at openjdk.org Tue May 2 13:00:29 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 13:00:29 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v16] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large.
> > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. 
> - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). > > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header.
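A minimal sketch of the header encoding described in the pull request text above, assuming a 64-bit header whose upper 32 bits must be left untouched. Only the bit positions come from the description; the constant and function names are hypothetical and are not taken from the HotSpot sources.

```cpp
#include <cassert>
#include <cstdint>

// Bit layout as given in the PR description. All names are illustrative only.
constexpr uint64_t FORWARDED_BITS      = 0x3;              // bits 0..1: '11' marks "forwarded"
constexpr uint64_t FALLBACK_BIT        = uint64_t(1) << 2; // bit 2: forwardee lives in the fallback table
constexpr int      REGION_SELECT_SHIFT = 3;                // bit 3: target region 0 or 1
constexpr int      OFFSET_SHIFT        = 4;                // bits 4..31: offset in heap words
constexpr int      NUM_OFFSET_BITS     = 32 - OFFSET_SHIFT; // 28 bits
constexpr uint64_t LOWER_HALF_MASK     = 0xffffffffULL;

// Writes the forwarding into the lower 32 bits and keeps the upper 32 bits as they were.
inline uint64_t encode_forwarding(uint64_t header, unsigned region_select, uint64_t offset_words) {
  assert(region_select <= 1);
  assert(offset_words < (uint64_t(1) << NUM_OFFSET_BITS));
  return (header & ~LOWER_HALF_MASK)
       | (offset_words << OFFSET_SHIFT)
       | (uint64_t(region_select) << REGION_SELECT_SHIFT)
       | FORWARDED_BITS;
}

inline bool     is_forwarded(uint64_t h)  { return (h & FORWARDED_BITS) == FORWARDED_BITS; }
inline bool     uses_fallback(uint64_t h) { return (h & FALLBACK_BIT) != 0; }
inline unsigned region_select(uint64_t h) { return unsigned((h >> REGION_SELECT_SHIFT) & 1); }
inline uint64_t offset_words(uint64_t h)  { return (h & LOWER_HALF_MASK) >> OFFSET_SHIFT; }

// target_bases: the two candidate target-region starts (as heap-word indices)
// recorded in the table for the object's source region.
inline uint64_t forwardee_word(uint64_t h, const uint64_t target_bases[2]) {
  return target_bases[region_select(h)] + offset_words(h);
}
```

With 28 offset bits, one region can span at most 2^28 heap words, which would be 2 GiB if a heap word is 8 bytes. That matches the "2^28 words range" mentioned in the review discussion.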
> > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Address @shipilev's review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/74d4ad1f..5892ad5d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=14-15 Stats: 15 lines in 4 files changed: 2 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From jsjolen at openjdk.org Tue May 2 13:09:21 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 2 May 2023 13:09:21 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 In-Reply-To: <-ZV05tb2xNWIBcGc7Nj_TZ6qq3BGrsjlKCT48_GTmQU=.6480f4f9-f1a5-47fa-94d9-51d3968ff711@github.com> References: <-ZV05tb2xNWIBcGc7Nj_TZ6qq3BGrsjlKCT48_GTmQU=.6480f4f9-f1a5-47fa-94d9-51d3968ff711@github.com> Message-ID: On Tue, 21 Mar 2023 13:32:10 GMT, Stuart Monteith wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3.
nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > This looks OK so far. > However, is it your intention to also do aarch64.ad? > aarch64_ad.m4 and aarch64_vector(.ad|_ad.m4) files look clean. Hi @stooart-mon, would you be interested in approving this :)? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/12321#issuecomment-1531443645 From jvernee at openjdk.org Tue May 2 14:06:29 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 2 May 2023 14:06:29 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v24] In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 12:59:33 GMT, Martin Doerr wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert unintended formatting changes. Fix comment. > > Adapted for JDK21, now. All tests have passed. My IDE had changed the formatting which is reverted, now. (I've kept the minor formatting changes in TestDontRelease.java because it looks better.) @TheRealMDoerr I think you already noticed but I have been integrating some followup patches after the JEP was integrated. I've also just integrated: https://github.com/openjdk/jdk/pull/13429 which looks like it has created a merge conflict. The good news is that it should no longer be necessary to enable each test on PPC explicitly, as this now happens automatically as a result of adding the `LINUX_PPC_64_LE` constant to the CABI enum.
------------- PR Comment: https://git.openjdk.org/jdk/pull/12708#issuecomment-1531528368 From jvernee at openjdk.org Tue May 2 14:06:34 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 2 May 2023 14:06:34 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v25] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 09:51:47 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extent, but is not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`.
This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 30 commits: > > - Adaptation for JDK-8303002. > - Merge remote-tracking branch 'origin' into PPC64_Panama > - Revert unintended formatting changes. Fix comment. > - Enable remaining foreign tests. > - Adaptations for JDK-8304265. > - Merge remote-tracking branch 'origin' into PPC64_Panama > - Adaptation for JDK-8305668 > - Merge remote-tracking branch 'origin' into PPC64_Panama > - Move ABIv2CallArranger out of linux subdirectory. ABIv1/2 does match the AIX/linux separation. > - Adaptation for JDK-8303022. > - ... and 20 more: https://git.openjdk.org/jdk/compare/860bf9b3...f5e22be0 On another note, how are you coming along with finding another reviewer? I (still) think it would be good to get someone that is familiar with PPC (particularly the ABI) as a second reviewer. 
test/jdk/java/foreign/TestHFA.java line 31: > 29: * @summary Test passing of Homogeneous Float Aggregates. > 30: * @enablePreview > 31: * @requires ((os.arch == "amd64" | os.arch == "x86_64") & sun.arch.data.model == "64") | os.arch == "aarch64" | os.arch == "ppc64le" | os.arch == "riscv64" This should also check for `jdk.foreign.linker != "UNSUPPORTED"` now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12708#issuecomment-1531534791 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1182592089 From shade at openjdk.org Tue May 2 14:23:27 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 May 2023 14:23:27 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v15] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 12:47:26 GMT, Roman Kennke wrote: >> src/hotspot/share/gc/shared/genCollectedHeap.cpp line 136: >> >>> 134: GCInitLogger::print(); >>> 135: >>> 136: GCForwarding::initialize(_reserved, SpaceAlignment); >> >> The second argument is not "shift" anymore, right? So this should be the actual reserved space size? > > I think SpaceAlignment is correct. We want to pass a region-size there, and the (default) region size for Serial should be the space alignment, because that is what eden, survivors and old-space will be aligned at. Unfortunately, Serial GC doesn't generally slide from top to bottom: it starts to slide old into old, then young into old until old is full, then slide the rest into young. Even worse, the survivor spaces are swapped with every GC cycle, so we really don't know that sliding goes top -> bottom. Using 'virtual' regions that align at SpaceAlignment solves the problem, though. > (One exception is when the whole heap fits into our 2^28 words range, in which case we can treat the whole heap as single region) > That said, I see a bug in the line: GCForwarding::initialize() takes region size *in words* but SpaceAlignment is *in bytes*. 
I'm fixing that by passing the space alignment in words instead. Ah, that is _region size_, okay. `SpaceAlignment` seems okay then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182618027 From mdoerr at openjdk.org Tue May 2 14:24:39 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 2 May 2023 14:24:39 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v26] In-Reply-To: References: Message-ID: <6T74cq3nirARNZCJzJrDyqKifUbmzEg2Ky8hW3Tfh6U=.4bf0350a-380c-4d01-a179-4b66f7d99ddf@github.com> > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but is not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests.
> > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: - Merge remote-tracking branch 'origin' into PPC64_Panama - Prepare for JDK-8304888. (Revert test changes.) - Adaptation for JDK-8303002. - Merge remote-tracking branch 'origin' into PPC64_Panama - Revert unintended formatting changes. Fix comment. - Enable remaining foreign tests. - Adaptations for JDK-8304265. - Merge remote-tracking branch 'origin' into PPC64_Panama - Adaptation for JDK-8305668 - Merge remote-tracking branch 'origin' into PPC64_Panama - ... 
and 22 more: https://git.openjdk.org/jdk/compare/a8bf2acb...e4ddbda0 ------------- Changes: https://git.openjdk.org/jdk/pull/12708/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=25 Stats: 2395 lines in 27 files changed: 2348 ins; 0 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From mdoerr at openjdk.org Tue May 2 14:38:28 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 2 May 2023 14:38:28 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v25] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 14:01:57 GMT, Jorn Vernee wrote: > On another note, how are you coming along with finding another reviewer? I (still) think it would be good to get someone that is familiar with PPC (particularly the ABI) as a second reviewer. A second review is in progress. I have merged your recent changes and all tests are passing. Your updates made my PR much smaller :-) Do you have more changes to wait for, or would you prefer to have this PR integrated soon? Off topic: I have read parts of the Big Endian ABI and we will need a solution for "An aggregate or union smaller than one doubleword in size is padded so that it appears in the least significant bits of the doubleword. All others are padded, if necessary, at their tail." The tail padding seems to be tricky for Big Endian as we currently access the wrong bytes. I think it could be solved by dirty hacks (shifting) in the backend, but that doesn't sound like a good solution. Do you have a good idea for that? Maybe shift or pad in Java?
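To make the "shifting" idea in the message above concrete, here is a small sketch of the arithmetic only. It assumes an aggregate of 1 to 8 bytes stored at the start of a zero-padded 8-byte buffer and read as a big-endian doubleword; the helper names are hypothetical, and this is not how the PR or the linker implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Interpret 8 bytes as a big-endian doubleword, independent of host endianness.
inline uint64_t load_be64(const unsigned char* p) {
  uint64_t v = 0;
  for (int i = 0; i < 8; i++) {
    v = (v << 8) | p[i];
  }
  return v;
}

// A tail-padded aggregate of `size` bytes occupies the most significant bytes
// of the big-endian doubleword. Shifting right by the number of padding bits
// moves it into the least significant bits, as the quoted ABI text asks for
// aggregates smaller than one doubleword.
inline uint64_t to_low_bits(uint64_t tail_padded, size_t size) {
  assert(size >= 1 && size <= 8);
  return tail_padded >> (8 * (8 - size));
}
```

For a 3-byte struct with bytes 0x11, 0x22, 0x33, the tail-padded doubleword is 0x1122330000000000 and the shifted value is 0x0000000000112233.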
------------- PR Comment: https://git.openjdk.org/jdk/pull/12708#issuecomment-1531594500 From shade at openjdk.org Tue May 2 14:51:29 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 May 2023 14:51:29 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v16] In-Reply-To: References: Message-ID: <5K91Rc15LdUc1SOEllnetA3-IS5T_pDYSkEXFIR8M64=.4ba97ce6-1033-490e-a4a6-911a9a870109@github.com> On Tue, 2 May 2023 13:00:29 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. 
>> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
>> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Address @shipilev's review Some more... src/hotspot/share/gc/shared/slidingForwarding.cpp line 40: > 38: SlidingForwarding::SlidingForwarding(MemRegion heap, size_t region_size_words) > 39: : _heap_start(heap.start()), > 40: _num_regions(((heap.end() - heap.start()) / region_size_words) + 1), This one overestimates the number of regions by 1, if heap is covered by regions exactly, right? Seems innocuous, though. src/hotspot/share/gc/shared/slidingForwarding.hpp line 112: > 110: > 111: // How many bits we use for the compressed pointer > 112: static const int NUM_COMPRESSED_BITS = 32 - OFFSET_BITS_SHIFT; Suggestion: // How many bits we use for the offset static const int NUM_OFFSET_BITS = 32 - OFFSET_BITS_SHIFT; src/hotspot/share/gc/shared/slidingForwarding.hpp line 165: > 163: }; > 164: > 165: static const size_t TABLE_SIZE = 128; Any reason why we do `128` here? I think we can take a bit larger table here, given that: a) the footprint would be eaten by chaining anyway; b) we delete the table after use anyway. 1K entries would take about 32K native memory, if I calculate it right. 
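The off-by-one noted above for `_num_regions` can be checked with plain integer arithmetic. The sketch below compares the "divide, then add one" form from the quoted constructor with a ceiling division; the function names are made up for illustration, and sizes are in heap words.

```cpp
#include <cassert>
#include <cstddef>

// Form used in the quoted constructor: floor division plus one.
constexpr size_t regions_floor_plus_one(size_t heap_words, size_t region_words) {
  return heap_words / region_words + 1;
}

// Ceiling division: the smallest number of regions that covers the heap.
constexpr size_t regions_ceil(size_t heap_words, size_t region_words) {
  return (heap_words + region_words - 1) / region_words;
}

// Heap covered exactly by regions: the first form counts one extra region.
static_assert(regions_floor_plus_one(1024, 256) == 5, "one spare region");
static_assert(regions_ceil(1024, 256) == 4, "exact cover");

// Heap with a partial last region: both forms agree.
static_assert(regions_floor_plus_one(1000, 256) == 4, "partial last region");
static_assert(regions_ceil(1000, 256) == 4, "partial last region");
```

The spare entry costs one table slot when the heap is an exact multiple of the region size, which is consistent with the reviewer calling it innocuous.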
------------- PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1409233470 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182631375 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182651630 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182660172 From jvernee at openjdk.org Tue May 2 14:51:29 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 2 May 2023 14:51:29 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v25] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 14:35:22 GMT, Martin Doerr wrote: > Do you have more changes to wait for, or would you prefer to have this PR integrated soon? I don't have anything else in the pipeline at the moment. > Off topic: I have read parts of the Big Endian ABI and we will need a solution for "An aggregate or union smaller than one doubleword in size is padded so that it appears in the least significant bits of the doubleword. All others are padded, if necessary, at their tail." The tail padding seems to be tricky for Big Endian as we currently access the wrong bytes. I think it could be solved by dirty hacks (shifting) in the backend, but that doesn't sound like a good solution. Do you have a good idea for that? Maybe shift or pad in Java? In general the assumption of the linker is that any layouts it is given are correct for the given platform/ABI, i.e. it is up to the user to specify the correct padding (and this is where jextract can help out a lot). We do also check that layouts are 'canonical' now (see https://github.com/openjdk/jdk/pull/13164 & [1]). I think this already guarantees that the necessary trailing padding is present (constraint 3)? Did you see the discussion at [2]? I think we already have Big Endian covered?
[1]: https://github.com/openjdk/jdk/blob/a8bf2acb7db63b508ef169e42a27b9c99178cbb1/src/java.base/share/classes/java/lang/foreign/Linker.java#L200-L209 [2]: https://github.com/openjdk/panama-foreign/pull/806#discussion_r1122138401 ------------- PR Comment: https://git.openjdk.org/jdk/pull/12708#issuecomment-1531615390 From eosterlund at openjdk.org Tue May 2 15:15:26 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 2 May 2023 15:15:26 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v16] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 13:00:29 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region.
>> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. 
I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Address @shipilev's review src/hotspot/share/gc/shared/preservedMarks.cpp line 52: > 50: if (GCForwarding::is_forwarded(obj)) { > 51: elem->set_oop(GCForwarding::forwardee(obj)); > 52: } Is PreservedMarks still useful after moving the spacious forwarding/mark information out from the markWord? I can see that we need it while transitioning to using your new code, but that's about it right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182691096 From stuefe at openjdk.org Tue May 2 15:28:37 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 2 May 2023 15:28:37 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v16] In-Reply-To: References: Message-ID: <9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> On Tue, 2 May 2023 13:00:29 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. 
Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. 
When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. 
>> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Address @shipilev's review Hi Roman, Small general concern: the last-last-ditch-GC fallback table may be impractical cost-wise. How large is that expected to grow? You pay 24+x (~48 on glibc with internal overhead) bytes per forwarded oop. Very easy first-step mitigation: Let the table house the first n (1000-10000) nodes as an inline member array. Allocate nodes from there, only allocate spilloffs from C-heap. Allocation would be a lot faster and cheaper memory-wise, and it's just some lines of code. Cheers, Thomas src/hotspot/share/gc/shared/slidingForwarding.cpp line 35: > 33: > 34: // We cannot use 0, because that may already be a valid base address in zero-based heaps. > 35: // 0x1 is safe because heap base addresses must be aligned by much larger alginemnt typo src/hotspot/share/gc/shared/slidingForwarding.cpp line 44: > 42: _region_size_words_shift(log2i_exact(region_size_words)), > 43: _bases_table(nullptr), > 44: _fallback_table(nullptr) { Assert for sane values for region_size? At least >= word size? src/hotspot/share/gc/shared/slidingForwarding.cpp line 81: > 79: _bases_table = nullptr; > 80: > 81: if (_fallback_table != nullptr) { null check not needed src/hotspot/share/gc/shared/slidingForwarding.cpp line 137: > 135: void FallbackTable::forward_to(HeapWord* from, HeapWord* to) { > 136: size_t idx = home_index(from); > 137: if (_table[idx]._from != nullptr) { Here you need to do a contains check, right?
Because, as you wrote in your answer to Aleksey, forwardings can be rewritten: https://github.com/openjdk/jdk/pull/13582/files#r1180126262 src/hotspot/share/gc/shared/slidingForwarding.hpp line 121: > 119: size_t _region_size_words; > 120: size_t _region_size_words_shift; > 121: HeapWord** _bases_table; Small nit. For clarity, I would prefer if we had a real structure here, e.g.: struct region_forwarding { HeapWord* dest; HeapWord* alt_dest; }; region_forwarding* _table; src/hotspot/share/gc/shared/slidingForwarding.hpp line 168: > 166: FallbackTableEntry _table[TABLE_SIZE]; > 167: > 168: static size_t home_index(HeapWord* from); Nitpicking, but I'd prefer an int or unsigned as return val here. src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 56: > 54: // Primary is free > 55: _bases_table[base_idx] = to_region_base; > 56: } else if (region_contains(_bases_table[base_idx], to)) { Stupid question, could this not just be `else if ( _bases_table[base_idx] == to_region_base)` ? Same below? src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 87: > 85: > 86: HeapWord* SlidingForwarding::decode_forwarding(HeapWord* from, uintptr_t encoded) const { > 87: assert((encoded & markWord::marked_value) == markWord::marked_value, "must be marked as forwarded"); Assert for !FALLBACK too? src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 95: > 93: size_t base_idx = from_idx + alt_region; > 94: > 95: HeapWord* decoded = _bases_table[base_idx] + offset; Maybe assert that table slot != UNUSED_BASE first src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 115: > 113: uintptr_t encoded = encode_forwarding(from_hw, to_hw); > 114: markWord new_header = markWord((from_header.value() & ~MARK_LOWER_HALF_MASK) | encoded); > 115: from->set_mark(new_header); What happens if the header is displaced into an OM? Should we not update the displaced header instead? 
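A minimal sketch of the two table suggestions above (a named struct per source region instead of a bare `HeapWord**`, and comparing region bases directly), not the PR's actual code. `HeapWord` is a stand-in typedef here, and `unused_base()` and `select_target_slot()` are hypothetical names; only the `region_forwarding` fields come from the review comment. It assumes the 0x1 sentinel from the reviewed source marks an unused slot.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for HotSpot's HeapWord (assumption, for illustration only).
using HeapWord = std::uintptr_t;

// Sentinel for "slot not yet assigned". 0 is avoided because it can be a
// valid base in zero-based heaps; 0x1 can never be an aligned region base.
inline HeapWord* unused_base() {
  return reinterpret_cast<HeapWord*>(std::uintptr_t(1));
}

// One entry per source region: the (at most two) target region bases.
struct region_forwarding {
  HeapWord* dest     = unused_base();  // primary target region base
  HeapWord* alt_dest = unused_base();  // alternate target region base
};

// Picks the slot for a target region base, claiming a free slot if needed.
// Returns 0 for the primary slot, 1 for the alternate slot, and -1 when
// both slots already hold other regions (the fallback-table case).
inline int select_target_slot(region_forwarding& entry, HeapWord* to_region_base) {
  assert(to_region_base != unused_base() && "target must be a real region base");
  if (entry.dest == unused_base()) {
    entry.dest = to_region_base;       // primary is free: claim it
    return 0;
  }
  if (entry.dest == to_region_base) {  // plain equality on region bases
    return 0;
  }
  if (entry.alt_dest == unused_base()) {
    entry.alt_dest = to_region_base;   // alternate is free: claim it
    return 1;
  }
  if (entry.alt_dest == to_region_base) {
    return 1;
  }
  return -1;                           // third target region: needs fallback
}
```

Whether plain equality is sufficient depends on callers always passing a region base rather than an arbitrary address inside the region, which is exactly the question raised in the comment on line 56.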
src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 125: > 123: assert(_bases_table != nullptr, "call begin() before asking for forwarding"); > 124: > 125: markWord header = from->mark(); Could this header be displaced? test/hotspot/gtest/gc/shared/test_slidingForwarding.cpp line 40: > 38: return ((uintptr_t(1) << 2) /* fallback */ | 3 /* forwarded */); > 39: } > 40: Could you add a test that forwarding works for displaced Oop+OM ? ------------- Changes requested by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1405661292 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182596471 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182631488 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182633369 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182691084 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182612217 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182688315 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182622823 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182636381 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182640645 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182644415 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182647104 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182706683 From stuefe at openjdk.org Tue May 2 15:28:41 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 2 May 2023 15:28:41 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v10] In-Reply-To: <1rB2fkk813I4tm8B-G2ArcAjSJxWzyYIgf8yBWBVGwc=.8dc9ecb5-24c0-4a86-bf29-4cf6408a1b1b@github.com> References: <1rB2fkk813I4tm8B-G2ArcAjSJxWzyYIgf8yBWBVGwc=.8dc9ecb5-24c0-4a86-bf29-4cf6408a1b1b@github.com> Message-ID: 
<2F7cnbE_2v4qCk1LoBFRI2S9Ky2bcwgPMxHC0Cf0lHU=.0973cc70-d5df-409c-87b0-f21562f1010d@github.com> On Fri, 28 Apr 2023 07:52:53 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. 
The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. 
It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix minimal build test/hotspot/gtest/gc/shared/test_slidingForwarding.cpp line 35: > 33: // Test simple forwarding within the same region. > 34: TEST_VM(SlidingForwarding, simple) { > 35: HeapWord heap[16]; Please initialize array for release build gtests ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1180220812 From rkennke at openjdk.org Tue May 2 15:33:24 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 15:33:24 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v16] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 15:11:59 GMT, Erik Österlund wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Address @shipilev's review > > src/hotspot/share/gc/shared/preservedMarks.cpp line 52: > >> 50: if (GCForwarding::is_forwarded(obj)) { >> 51: elem->set_oop(GCForwarding::forwardee(obj)); >> 52: } > > Is PreservedMarks still useful after moving the spacious forwarding/mark information out from the markWord? I can see that we need it while transitioning to using your new code, but that's about it right? It is still useful. This PR implements a compression that lets the forwarding pointer fit into the lowest 32 bits of the mark-word, but it still essentially uses the mark-word to store that information.
That means that it overwrites the identity hash code and lock bits just the same as the normal implementation, and thus must preserve this information. I *also* prototyped a hash-table-based forwarding which no longer uses the mark-word to store forwarding. However, I found that to be 1. significantly slower and 2. significantly larger. That was a trade-off that I did not want to make at this point, when we 'only' want 64-bit-headers, simply because it's not yet necessary. It *will* become necessary to make that trade-off, or come up with a better overall approach (e.g. use scissor-GC like Parallel GC does, or come up with a better fwd-table like in that paper that you sent me: https://dl.acm.org/doi/abs/10.1145/3546918.3546928) but this needs to be researched. So yeah, the sliding forwarding algorithm is an interim solution but I think it is worth having at this point in time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182713491 From wkemper at openjdk.org Tue May 2 15:46:48 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 2 May 2023 15:46:48 GMT Subject: RFR: Merge openjdk/jdk:master Message-ID: This merges tag jdk-21+20. Please have a look at [this change](https://github.com/openjdk/shenandoah/commit/f9a114efcbfdf5b8b8d605e8b431a9312c3b6111) to resolve conflicts between `CardTable` and `ShenandoahCardTable`.
------------- Commit messages: - Reduce coupling between shenandoah card table and shared card table - Merge tag 'jdk-21+20' into merge-jdk-21+20 - 8305771: SA ClassWriter.java fails to skip overpass methods - 8306951: [BACKOUT] JDK-8305252 make_method_handle_intrinsic may call java code under a lock - 8306409: Open source AWT KeyBoardFocusManger, LightWeightComponent related tests - 8306705: com/sun/jdi/PopAndInvokeTest.java fails with NativeMethodException - 8302182: Update Public Suffix List to 88467c9 - 8305853: java/text/Format/DateFormat/DateFormatRegression.java fails with "Uncaught exception thrown in test method Test4089106" - 8306374: (bf) Improve performance of DirectCharBuffer::append(CharSequence[,int,int]) - 8302328: [s390x] Simplify asm_assert definition - ... and 197 more: https://git.openjdk.org/shenandoah/compare/b442302c...f9a114ef The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.org/?repo=shenandoah&pr=269&range=00.0 - openjdk/jdk:master: https://webrevs.openjdk.org/?repo=shenandoah&pr=269&range=00.1 Changes: https://git.openjdk.org/shenandoah/pull/269/files Stats: 261355 lines in 2002 files changed: 237988 ins; 10398 del; 12969 mod Patch: https://git.openjdk.org/shenandoah/pull/269.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/269/head:pull/269 PR: https://git.openjdk.org/shenandoah/pull/269 From shade at openjdk.org Tue May 2 15:46:48 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 May 2023 15:46:48 GMT Subject: RFR: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: On Tue, 2 May 2023 15:35:45 GMT, William Kemper wrote: > This merges tag jdk-21+20. Please have a look at [this change](https://github.com/openjdk/shenandoah/commit/f9a114efcbfdf5b8b8d605e8b431a9312c3b6111) to resolve conflicts between `CardTable` and `ShenandoahCardTable`. Marked as reviewed by shade (Committer). 
------------- PR Review: https://git.openjdk.org/shenandoah/pull/269#pullrequestreview-1409380546 From mdoerr at openjdk.org Tue May 2 15:52:31 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 2 May 2023 15:52:31 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v26] In-Reply-To: <6T74cq3nirARNZCJzJrDyqKifUbmzEg2Ky8hW3Tfh6U=.4bf0350a-380c-4d01-a179-4b66f7d99ddf@github.com> References: <6T74cq3nirARNZCJzJrDyqKifUbmzEg2Ky8hW3Tfh6U=.4bf0350a-380c-4d01-a179-4b66f7d99ddf@github.com> Message-ID: On Tue, 2 May 2023 14:24:39 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extent, but is not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule").
These cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by the 2nd commit. > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: > > - Merge remote-tracking branch 'origin' into PPC64_Panama > - Prepare for JDK-8304888. (Revert test changes.) > - Adaptation for JDK-8303002. > - Merge remote-tracking branch 'origin' into PPC64_Panama > - Revert unintended formatting changes. Fix comment. > - Enable remaining foreign tests. > - Adaptations for JDK-8304265. > - Merge remote-tracking branch 'origin' into PPC64_Panama > - Adaptation for JDK-8305668 > - Merge remote-tracking branch 'origin' into PPC64_Panama > - ...
and 22 more: https://git.openjdk.org/jdk/compare/a8bf2acb...e4ddbda0 > In general the assumption of the linker is that any layouts it is given are correct for the given platform/ABI. i.e. it is up to the user to specify the correct padding (and this is where jextract can help out a lot). We do also check that layouts are 'canonical' now (see #13164 & [1]). I think this already guarantees that the necessary trailing padding is present (constraint 3)? Did you see the discussion at [2] ? I think we already have Big Endian covered? I had seen that discussion. It appears to work for the TestMiniStruct I had uploaded (passed a structure consisting of 3 bytes). However, I'm getting failures when passing larger structs which have a size >8, but not divisible by 8. E.g. 17 tests of TestDowncallScope are failing. The guarantee was not hit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12708#issuecomment-1531709856 From smonteith at openjdk.org Tue May 2 15:58:31 2023 From: smonteith at openjdk.org (Stuart Monteith) Date: Tue, 2 May 2023 15:58:31 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v3] In-Reply-To: References: Message-ID: <-XDKe5JFXse-84BaSSem2Qi7Xvp5-aTDbIGT-ka7ygY=.4e993996-b649-4cea-864e-7ab0ccd23ba5@github.com> On Tue, 11 Apr 2023 13:22:40 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs.
We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> ``` >> >> Note how `nullptr` participates in a code expression here; we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Fix style > - Merge remote-tracking branch 'origin/master' into JDK-8301493 > - Explicitly cast > - Fixes > - Replace NULL with nullptr in cpu/aarch64 As I'm only an author, I can't sponsor this patch, but it looks OK to me. It is consistent with the NULL->nullptr changes you have been doing elsewhere. There might always be this problem, but there are some remaining instances of NULL that may have appeared since you started this - I've listed them below. codeBuffer_aarch64.cpp: if (cb->stubs()->maybe_expand_to_ensure_remaining(total_requested_size) && cb->blob() == NULL) { gc/shared/barrierSetAssembler_aarch64.cpp: __ cbz(obj, error); // if klass is NULL it is broken stubGenerator_aarch64.cpp: if (bs_nm != NULL) { vm_version_aarch64.cpp: if (virt2 != NULL && strcasestr(line, virt2) != 0) { vm_version_aarch64.cpp: check_info_file(tname_file, "Xen", XenPVHVM, NULL, NoDetectedVirtualization); ------------- PR Comment: https://git.openjdk.org/jdk/pull/12321#issuecomment-1531719170 From wkemper at openjdk.org Tue May 2 16:07:19 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 2 May 2023 16:07:19 GMT Subject: RFR: Merge openjdk/jdk:master [v2] In-Reply-To: References: Message-ID: > This merges tag jdk-21+20.
Please have a look at [this change](https://github.com/openjdk/shenandoah/commit/f9a114efcbfdf5b8b8d605e8b431a9312c3b6111) to resolve conflicts between `CardTable` and `ShenandoahCardTable`. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 273 commits: - Reduce coupling between shenandoah card table and shared card table - Merge tag 'jdk-21+20' into merge-jdk-21+20 Added tag jdk-21+20 for changeset 750bece0 - Remove redundant initialization code Reviewed-by: shade - Usage tracking cleanup Reviewed-by: kdnilsen, ysr - Use state of mark context itself to track stability of old mark bitmap Reviewed-by: ysr, shade - 8306334: Handle preemption of old cycle between filling and bootstrap phases Reviewed-by: kdnilsen - Eliminate TODO in ShenandoahHeuristics::select_aged_regions Reviewed-by: wkemper - Add unique ticket number for each disabled test Reviewed-by: ysr, shade - Specialize ShenandoahGenerationType to avoid generational check penalty on fast paths Reviewed-by: kdnilsen, wkemper - Account for humongous object waste Reviewed-by: ysr, kdnilsen - ... 
and 263 more: https://git.openjdk.org/shenandoah/compare/750bece0...f9a114ef ------------- Changes: https://git.openjdk.org/shenandoah/pull/269/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=269&range=01 Stats: 18852 lines in 215 files changed: 17207 ins; 794 del; 851 mod Patch: https://git.openjdk.org/shenandoah/pull/269.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/269/head:pull/269 PR: https://git.openjdk.org/shenandoah/pull/269 From wkemper at openjdk.org Tue May 2 16:07:25 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 2 May 2023 16:07:25 GMT Subject: Integrated: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: <0ji6n6_llOl_c2Yr51xyGhB6OVuMJM2bHyxEjpuBC2M=.08f2532f-57d4-4aac-a643-2cf5973712de@github.com> On Tue, 2 May 2023 15:35:45 GMT, William Kemper wrote: > This merges tag jdk-21+20. Please have a look at [this change](https://github.com/openjdk/shenandoah/commit/f9a114efcbfdf5b8b8d605e8b431a9312c3b6111) to resolve conflicts between `CardTable` and `ShenandoahCardTable`. This pull request has now been integrated. Changeset: 21982fc5 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/21982fc5813bf0c77e7b880f4f4caa5f5749a1a4 Stats: 261355 lines in 2002 files changed: 237988 ins; 10398 del; 12969 mod Merge openjdk/jdk:master Reviewed-by: shade ------------- PR: https://git.openjdk.org/shenandoah/pull/269 From rkennke at openjdk.org Tue May 2 16:51:13 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 16:51:13 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v17] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. 
Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. 
When that happens, we initialize a fallback table, which is a simple open hash-table, and set bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table.
>
> All the table accesses can be done unsynchronized because:
> - Serial GC is single-threaded anyway
> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions.
> - G1 serial compaction is single-threaded
>
> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers).
>
> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
>
> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32-bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header.
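
Editor's illustration (not part of the original message): the bit layout described in the PR text can be sketched as below. This is a minimal sketch under stated assumptions, not the code from `slidingForwarding.hpp`: the constant and function names are hypothetical, `HeapWord` is modelled as a plain 64-bit word, and the bases table is assumed to be a flat array with two consecutive entries per source region.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for HotSpot's HeapWord: one 64-bit heap word (assumption for this sketch).
using HeapWord = uint64_t;

// Hypothetical constant names; the bit positions follow the PR description.
constexpr uint64_t FORWARDED_BITS   = 0x3;                      // bits 0..1 = '11'
constexpr uint64_t FALLBACK_BIT     = uint64_t(1) << 2;         // bit 2
constexpr unsigned ALT_REGION_SHIFT = 3;                        // bit 3
constexpr unsigned OFFSET_SHIFT     = 4;                        // bits 4..31
constexpr uint64_t OFFSET_MASK      = (uint64_t(1) << 28) - 1;  // 28 offset bits
constexpr uint64_t LOWER_HALF_MASK  = 0xFFFFFFFFull;            // lowest 32 header bits

// Pack "target region 0 or 1" and "offset in heap words" into the low 32 bits.
inline uint64_t encode_forwarding(unsigned alt_region, uint64_t offset_words) {
  assert(alt_region <= 1);
  assert(offset_words <= OFFSET_MASK);
  return (offset_words << OFFSET_SHIFT) |
         (uint64_t(alt_region) << ALT_REGION_SHIFT) |
         FORWARDED_BITS;
}

// Replace only the lower half of the header; the upper 32 bits are preserved.
inline uint64_t install_forwarding(uint64_t header, uint64_t encoded) {
  return (header & ~LOWER_HALF_MASK) | encoded;
}

inline bool is_forwarded(uint64_t header) {
  return (header & FORWARDED_BITS) == FORWARDED_BITS;
}

inline bool is_fallback(uint64_t header) {
  return (header & FALLBACK_BIT) != 0;
}

// Resolve the forwardee: pick one of the two candidate base addresses of the
// source region by plain index arithmetic, then add the word offset.
inline HeapWord* decode_forwarding(uint64_t header,
                                   HeapWord* const* bases_table,
                                   size_t src_region) {
  assert(is_forwarded(header) && !is_fallback(header));
  size_t   alt    = (header >> ALT_REGION_SHIFT) & 1;
  uint64_t offset = (header >> OFFSET_SHIFT) & OFFSET_MASK;
  return bases_table[2 * src_region + alt] + offset;
}
```

With 28 offset bits and 8-byte heap words, an offset can cover at most 2^28 words, i.e. 2 GiB, per target region in this sketch. A header with bit 2 set would bypass `decode_forwarding` and be resolved through the fallback table instead.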
>
> Testing:
> - [x] hotspot_gc -UseAltGCForwarding
> - [x] hotspot_gc +UseAltGCForwarding
> - [x] tier1 -UseAltGCForwarding
> - [x] tier1 +UseAltGCForwarding
> - [x] tier2 -UseAltGCForwarding
> - [x] tier2 +UseAltGCForwarding

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  More @shipilev's review comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13582/files
  - new: https://git.openjdk.org/jdk/pull/13582/files/5892ad5d..494ec9ad

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=16
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=15-16

  Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/13582.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582

PR: https://git.openjdk.org/jdk/pull/13582

From rkennke at openjdk.org Tue May 2 16:54:24 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 2 May 2023 16:54:24 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v16]
In-Reply-To: <9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com>
References: <9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com>
Message-ID:

On Tue, 2 May 2023 15:25:10 GMT, Thomas Stuefe wrote:

> Hi Roman,
>
> Small general concern, the last-last-ditch-GC fallback table may be impractical cost-wise. How large is that expected to grow? You pay 24+x (~48 on glibc with internal overhead) bytes per forwarded oop.
>
> Very easy first-step mitigation: Let the table house the first n (1000-10000) nodes as an inline member array. Allocate nodes from there, only allocate spilloffs from C-heap. Allocation would be a lot faster and cheaper memory-wise, and it's just some lines of code.
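
Editor's illustration (not part of the original thread): the mitigation suggested in the quoted comment, serving the first n nodes from an inline member array and spilling to the C heap only afterwards, could look roughly like the sketch below. All names (`FallbackNode`, `NodePool`, `spilled_count`) are hypothetical and chosen for illustration; this is not the `FallbackTable` code from the PR, and it uses `std::malloc` and `std::vector` rather than HotSpot's own allocation facilities.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// One chained hash-table node: a from/to pair plus the collision-chain link.
struct FallbackNode {
  void*         from;
  void*         to;
  FallbackNode* next;
};

// Hands out the first N nodes from an inline array; later nodes come from malloc.
template <size_t N>
class NodePool {
  FallbackNode               _inline_nodes[N];
  size_t                     _inline_used = 0;
  std::vector<FallbackNode*> _spilled;  // heap nodes, remembered so they can be freed

public:
  NodePool() = default;
  NodePool(const NodePool&) = delete;
  NodePool& operator=(const NodePool&) = delete;

  ~NodePool() {
    for (FallbackNode* n : _spilled) {
      std::free(n);
    }
  }

  // Returns an initialized node, or nullptr if the heap allocation fails.
  FallbackNode* allocate(void* from, void* to) {
    FallbackNode* n;
    if (_inline_used < N) {
      n = &_inline_nodes[_inline_used++];  // cheap path: bump an index
    } else {
      n = static_cast<FallbackNode*>(std::malloc(sizeof(FallbackNode)));
      if (n == nullptr) {
        return nullptr;
      }
      _spilled.push_back(n);
    }
    n->from = from;
    n->to   = to;
    n->next = nullptr;
    return n;
  }

  size_t spilled_count() const { return _spilled.size(); }
};
```

The point of the design is that the common case never touches the allocator: only forwardings beyond the inline capacity pay the per-node malloc overhead mentioned in the comment.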
>

I did some experiments with the only jtreg test that seems to exercise the G1 serial compaction (and thus the fallback-table); the test is gc/stress/TestMultiThreadStressRSet.java. With a fallback-table size of 128, I'd typically end up with several dozen excess nodes, sometimes more than the base table size. Up to a table size of 512 this reduces significantly, but there are still typically one to several dozen extra nodes. When I switched to a table size of 1024, the extra node count dropped to below one dozen in most cases. I'll leave the table size at this value until we find a good reason to extend it, ok?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1531813673

From kdnilsen at openjdk.org Tue May 2 17:24:07 2023
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Tue, 2 May 2023 17:24:07 GMT
Subject: RFR: DRAFT: Expand old on demand [v31]
In-Reply-To:
References:
Message-ID:

> This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway.
>
> As currently drafted, there are regression failures. This DRAFT PR is published for the purpose of facilitating a careful code review.
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Tidy up some comments ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/d6a2b2f8..27a88228 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=30 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=29-30 Stats: 7 lines in 1 file changed: 3 ins; 2 del; 2 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From rkennke at openjdk.org Tue May 2 17:37:28 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 17:37:28 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v16] In-Reply-To: <9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> References: <9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> Message-ID: On Tue, 2 May 2023 14:16:16 GMT, Thomas Stuefe wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Address @shipilev's review > > src/hotspot/share/gc/shared/slidingForwarding.hpp line 121: > >> 119: size_t _region_size_words; >> 120: size_t _region_size_words_shift; >> 121: HeapWord** _bases_table; > > Small nit. For clarity, I would prefer if we had a real structure here, e.g.: > > struct region_forwarding { HeapWord* dest; HeapWord* alt_dest; }; > region_forwarding* _table; Ok, I am changing it. It looks like it's introducing a branch on the decoding-path though. I am not sure if a C++ compiler would optimise it to a branch-free code, though. It's probably a very minor concern. 
> src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 115: > >> 113: uintptr_t encoded = encode_forwarding(from_hw, to_hw); >> 114: markWord new_header = markWord((from_header.value() & ~MARK_LOWER_HALF_MASK) | encoded); >> 115: from->set_mark(new_header); > > What happens if the header is displaced into an OM? Should we not update the displaced header instead? When the header is displaced, it will be recorded in the preserved-marks table. Then we over-write the mark-word with the forwarding. At the end of the GC, we will restore the original mark from the preserved-marks table. This is the same mechanism that is already used in normal uncompressed forwarding. > src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 125: > >> 123: assert(_bases_table != nullptr, "call begin() before asking for forwarding"); >> 124: >> 125: markWord header = from->mark(); > > Could this header be displaced? No. See above. We actually check for that in decode_forwarding(): assert((encoded & markWord::marked_value) == markWord::marked_value, "must be marked as forwarded"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182827422 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182846767 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182847913 From shade at openjdk.org Tue May 2 17:37:28 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 May 2023 17:37:28 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v16] In-Reply-To: References: <9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> Message-ID: On Tue, 2 May 2023 17:12:12 GMT, Roman Kennke wrote: >> src/hotspot/share/gc/shared/slidingForwarding.hpp line 121: >> >>> 119: size_t _region_size_words; >>> 120: size_t _region_size_words_shift; >>> 121: HeapWord** _bases_table; >> >> Small nit. 
For clarity, I would prefer if we had a real structure here, e.g.: >> >> struct region_forwarding { HeapWord* dest; HeapWord* alt_dest; }; >> region_forwarding* _table; > > Ok, I am changing it. It looks like it's introducing a branch on the decoding-path though. I am not sure if a C++ compiler would optimise it to a branch-free code, though. It's probably a very minor concern. No wait, let's keep it as `HeapWord*` array. The fact that alternate selection is just a math addition matters a bit for decoding performance. I think it does not complicate the code all that much to warrant extra abstraction here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182832355 From shade at openjdk.org Tue May 2 17:37:31 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 May 2023 17:37:31 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v16] In-Reply-To: <9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> References: <9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> Message-ID: On Tue, 2 May 2023 14:23:36 GMT, Thomas Stuefe wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Address @shipilev's review > > src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 56: > >> 54: // Primary is free >> 55: _bases_table[base_idx] = to_region_base; >> 56: } else if (region_contains(_bases_table[base_idx], to)) { > > Stupid question, could this not just be `else if ( _bases_table[base_idx] == to_region_base)` ? Same below? (kicks himself a little). Yes. Yes, it can. We would not need `region_contains` method then at all, I think. 
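
Editor's illustration (not part of the original thread): the simplification agreed on in this exchange, comparing the stored base against the target's region base instead of calling `region_contains`, can be sketched as follows. This is a hedged sketch only: the class `BasesTable` and the method `select_target` are hypothetical names, `HeapWord` is modelled as a 64-bit word, and the table is the flat two-entries-per-region array discussed earlier in the review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using HeapWord = uint64_t;  // stand-in for HotSpot's HeapWord (assumption)

class BasesTable {
  HeapWord*              _heap_start;
  size_t                 _region_size_words_shift;
  std::vector<HeapWord*> _bases;  // two slots per source region, nullptr = free

  size_t region_index(const HeapWord* addr) const {
    return size_t(addr - _heap_start) >> _region_size_words_shift;
  }

  HeapWord* region_base(const HeapWord* addr) const {
    return _heap_start + (region_index(addr) << _region_size_words_shift);
  }

public:
  BasesTable(HeapWord* heap_start, size_t num_regions, size_t region_size_words_shift)
    : _heap_start(heap_start),
      _region_size_words_shift(region_size_words_shift),
      _bases(2 * num_regions, nullptr) {}

  // Returns 0 (primary) or 1 (alternate) for the slot that holds the target
  // region of 'to', claiming a free slot if necessary. Returns -1 when both
  // slots already point at other regions, i.e. the fallback table is needed.
  int select_target(const HeapWord* from, const HeapWord* to) {
    size_t    base_idx       = 2 * region_index(from);
    HeapWord* to_region_base = region_base(to);
    for (int alt = 0; alt < 2; alt++) {
      HeapWord*& slot = _bases[base_idx + alt];
      if (slot == nullptr) {
        slot = to_region_base;         // slot is free: claim it
      }
      if (slot == to_region_base) {    // plain equality, no range check needed
        return alt;
      }
    }
    return -1;
  }
};
```

Because every slot stores a region base, equality against `to_region_base` answers the same question as a containment check on `to`, which is why the helper becomes unnecessary.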
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182826854 From rkennke at openjdk.org Tue May 2 17:37:32 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 17:37:32 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v16] In-Reply-To: References: <9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> Message-ID: On Tue, 2 May 2023 17:11:35 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 56: >> >>> 54: // Primary is free >>> 55: _bases_table[base_idx] = to_region_base; >>> 56: } else if (region_contains(_bases_table[base_idx], to)) { >> >> Stupid question, could this not just be `else if ( _bases_table[base_idx] == to_region_base)` ? Same below? > > (kicks himself a little). Yes. Yes, it can. We would not need `region_contains` method then at all, I think. Indeed! Well spotted! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182827529 From rkennke at openjdk.org Tue May 2 17:46:37 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 17:46:37 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v16] In-Reply-To: <9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> References: <9OvQlCjXDbQ_AzJyCc-xYnS-zwv5ib2SxsfQgKINFWM=.86d9e284-1096-4510-a801-dfbb5a8b2880@github.com> Message-ID: <30hPs9U3Wt5ITn5XHdjVPuVcbqK4YWq1Xxfw2LznDYo=.0194177b-9472-41e1-bd2a-056eb80104ff@github.com> On Tue, 2 May 2023 15:11:59 GMT, Thomas Stuefe wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Address @shipilev's review > > src/hotspot/share/gc/shared/slidingForwarding.cpp line 137: > >> 135: void FallbackTable::forward_to(HeapWord* from, HeapWord* to) { >> 136: size_t idx = home_index(from); >> 137: if (_table[idx]._from != nullptr) { > > Here you need to do a contains 
check, right? Because, as you wrote in your answer to Aleksey, forwardings can be rewritten: https://github.com/openjdk/jdk/pull/13582/files#r1180126262

I don't think that ever happens (I think we'd only ever re-forward from normal forwarding to fallback-forwarding once), but I am adding that check for extra sanity.

> test/hotspot/gtest/gc/shared/test_slidingForwarding.cpp line 40:
>
>> 38:   return ((uintptr_t(1) << 2) /* fallback */ | 3 /* forwarded */);
>> 39: }
>> 40:
>
> Could you add a test that forwarding works for displaced Oop+OM ?

Uhhh, that would involve the OM and preserved-marks subsystems. The saving and restoring of 'interesting mark-words' is done outside of the GCForwarding subsystem and is not its responsibility. I'd rather not test for that.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182855383
PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182856956

From rkennke at openjdk.org Tue May 2 18:06:30 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 2 May 2023 18:06:30 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v18]
In-Reply-To:
References:
Message-ID: <06loJyeqlW5aON-IGrWJzY6DQBLkC3kyuxxeCMxq3xI=.da8bbc4c-0cd7-4c7f-9bb2-0b087ce70d11@github.com>

> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large.
>
> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header.
> > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. 
Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers).
>
> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
>
> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32-bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header.
> > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Address @tstuefe's review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/494ec9ad..84181db6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=16-17 Stats: 38 lines in 3 files changed: 8 ins; 6 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From rkennke at openjdk.org Tue May 2 18:21:18 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 18:21:18 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v19] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. 
This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions.
>
> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start addresses of the possible target regions for each source region.
>
> With this, forwarding information would be encoded like this:
> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded.
> - Bit 2: Used for 'fallback'-forwarding (see below)
> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above)
> - Bits 4..31: The number of heap words from the target base address
>
> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, where a special mode called 'serial compaction' acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table.
>
> All the table accesses can be done unsynchronized because:
> - Serial GC is single-threaded anyway
> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions.
> - G1 serial compaction is single-threaded
>
> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes.
Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). > > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. > > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Initialize 'heap' elements in test case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/84181db6..8366454e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=17-18 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From kdnilsen at openjdk.org Tue May 2 22:33:06 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 2 May 2023 22:33:06 GMT Subject: RFR: DRAFT: Expand old on demand [v32] In-Reply-To: References: 
Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > As currently drafted, there are regression failures. This DRAFT PR is published for the purpose of facilitating a careful code review. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 114 commits: - Merge remote-tracking branch 'origin' into expand-old-on-demand - Tidy up some comments - Remove iteator over mutator threads as not needed - Fix a comment re: max_young_evacuation_reserve - No need to distinguish generational full gc verification - Fix ShouldNotReachHere() assert - Remove unnecessary conditional compile directives - Remove unnecessary assert The assert was added at some point during development, but found not to be relevant. - Better code reuse for clearing old-gen triggers - Tidy up loop for selection of mixed-evacuation candidate regions - ... and 104 more: https://git.openjdk.org/shenandoah/compare/21982fc5...42bb8f41 ------------- Changes: https://git.openjdk.org/shenandoah/pull/248/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=31 Stats: 3099 lines in 35 files changed: 1692 ins; 884 del; 523 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From kdnilsen at openjdk.org Tue May 2 22:40:24 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 2 May 2023 22:40:24 GMT Subject: RFR: DRAFT: Expand old on demand [v33] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. 
In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > As currently drafted, there are regression failures. This DRAFT PR is published for the purpose of facilitating a careful code review. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix white space ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/42bb8f41..30415b40 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=32 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=31-32 Stats: 10 lines in 2 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From wkemper at openjdk.org Tue May 2 23:02:05 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 2 May 2023 23:02:05 GMT Subject: RFR: Use distinct "end of cycle" message for each Shenandoah pause Message-ID: This change allows the `GarbageCollectionNotificationInfo::gcAction` to be overridden. Shenandoah now uses this to distinguish the notification raised for each of the pauses during the concurrent cycle. This will make it easier for monitoring software to identify which phases may be having longer than expected pauses. 
------------- Commit messages: - Simplify region aging logic - Allow unique message for each gc notification action Changes: https://git.openjdk.org/shenandoah/pull/270/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=270&range=00 Stats: 88 lines in 12 files changed: 37 ins; 21 del; 30 mod Patch: https://git.openjdk.org/shenandoah/pull/270.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/270/head:pull/270 PR: https://git.openjdk.org/shenandoah/pull/270 From kdnilsen at openjdk.org Tue May 2 23:06:45 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 2 May 2023 23:06:45 GMT Subject: RFR: Use distinct "end of cycle" message for each Shenandoah pause In-Reply-To: References: Message-ID: On Tue, 2 May 2023 22:54:53 GMT, William Kemper wrote: > This change allows the `GarbageCollectionNotificationInfo::gcAction` to be overridden. Shenandoah now uses this to distinguish the notification raised for each of the pauses during the concurrent cycle. This will make it easier for monitoring software to identify which phases may be having longer than expected pauses. src/hotspot/share/gc/shenandoah/shenandoahVMOperations.cpp line 82: > 80: > 81: void VM_ShenandoahFinalRoots::doit() { > 82: ShenandoahGCPauseMark mark(_gc_id, SvcGCMarker::CONCURRENT); Why are we disabling the aging of regions? And why are we disabling the aging cycle? 
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/270#discussion_r1183122172 From wkemper at openjdk.org Tue May 2 23:16:42 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 2 May 2023 23:16:42 GMT Subject: RFR: Alphabetize includes Message-ID: Follow up to feedback on https://github.com/openjdk/shenandoah/pull/269 ------------- Commit messages: - Alphabetize includes Changes: https://git.openjdk.org/shenandoah/pull/271/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=271&range=00 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/271.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/271/head:pull/271 PR: https://git.openjdk.org/shenandoah/pull/271 From wkemper at openjdk.org Tue May 2 23:24:51 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 2 May 2023 23:24:51 GMT Subject: RFR: Use distinct "end of cycle" message for each Shenandoah pause [v2] In-Reply-To: References: Message-ID: <5bLDkWLMsGYeZWadW8pzMJ9u6gzx0av1Wa0hW6HB9lI=.2c8cb44e-1691-4896-92f3-a43e9c133960@github.com> > This change allows the `GarbageCollectionNotificationInfo::gcAction` to be overridden. Shenandoah now uses this to distinguish the notification raised for each of the pauses during the concurrent cycle. This will make it easier for monitoring software to identify which phases may be having longer than expected pauses. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Move region aging into op_final_roots ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/270/files - new: https://git.openjdk.org/shenandoah/pull/270/files/088002da..1c097462 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=270&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=270&range=00-01 Stats: 37 lines in 1 file changed: 19 ins; 18 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/270.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/270/head:pull/270 PR: https://git.openjdk.org/shenandoah/pull/270 From kdnilsen at openjdk.org Tue May 2 23:24:51 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 2 May 2023 23:24:51 GMT Subject: RFR: Use distinct "end of cycle" message for each Shenandoah pause [v2] In-Reply-To: <5bLDkWLMsGYeZWadW8pzMJ9u6gzx0av1Wa0hW6HB9lI=.2c8cb44e-1691-4896-92f3-a43e9c133960@github.com> References: <5bLDkWLMsGYeZWadW8pzMJ9u6gzx0av1Wa0hW6HB9lI=.2c8cb44e-1691-4896-92f3-a43e9c133960@github.com> Message-ID: On Tue, 2 May 2023 23:20:06 GMT, William Kemper wrote: >> This change allows the `GarbageCollectionNotificationInfo::gcAction` to be overridden. Shenandoah now uses this to distinguish the notification raised for each of the pauses during the concurrent cycle. This will make it easier for monitoring software to identify which phases may be having longer than expected pauses. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Move region aging into op_final_roots Marked as reviewed by kdnilsen (Committer). 
------------- PR Review: https://git.openjdk.org/shenandoah/pull/270#pullrequestreview-1409998104 From kdnilsen at openjdk.org Tue May 2 23:24:52 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 2 May 2023 23:24:52 GMT Subject: RFR: Use distinct "end of cycle" message for each Shenandoah pause [v2] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 23:04:03 GMT, Kelvin Nilsen wrote: >> William Kemper has updated the pull request incrementally with one additional commit since the last revision: >> >> Move region aging into op_final_roots > > src/hotspot/share/gc/shenandoah/shenandoahVMOperations.cpp line 82: > >> 80: >> 81: void VM_ShenandoahFinalRoots::doit() { >> 82: ShenandoahGCPauseMark mark(_gc_id, SvcGCMarker::CONCURRENT); > > Why are we disabling the aging of regions? And why are we disabling the aging cycle? In my initial review, I missed that this functionality has been moved to ShenandoahConcurrentGC::entry_final_roots() ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/270#discussion_r1183125829 From wkemper at openjdk.org Tue May 2 23:24:52 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 2 May 2023 23:24:52 GMT Subject: RFR: Use distinct "end of cycle" message for each Shenandoah pause [v2] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 23:12:16 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahVMOperations.cpp line 82: >> >>> 80: >>> 81: void VM_ShenandoahFinalRoots::doit() { >>> 82: ShenandoahGCPauseMark mark(_gc_id, SvcGCMarker::CONCURRENT); >> >> Why are we disabling the aging of regions? And why are we disabling the aging cycle? > > In my initial review, I missed that this functionality has been moved to ShenandoahConcurrentGC::entry_final_roots() I moved it into `ShenandoahConcurrentGC::op_final_roots` to fit the pattern of other phases (`VM_Operation_XYZ` and `entry_XYZ` should just have GC tracking boiler plate). 
Also, with this approach, we don't have to marshal the `is_aging_cycle` around. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/270#discussion_r1183127325 From kdnilsen at openjdk.org Tue May 2 23:24:52 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 2 May 2023 23:24:52 GMT Subject: RFR: Use distinct "end of cycle" message for each Shenandoah pause [v2] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 23:15:37 GMT, William Kemper wrote: >> In my initial review, I missed that this functionality has been moved to ShenandoahConcurrentGC::entry_final_roots() > > I moved it into `ShenandoahConcurrentGC::op_final_roots` to fit the pattern of other phases (`VM_Operation_XYZ` and `entry_XYZ` should just have GC tracking boiler plate). Also, with this approach, we don't have to marshal the `is_aging_cycle` around. Thanks for clarifying. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/270#discussion_r1183128107 From wkemper at openjdk.org Tue May 2 23:24:52 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 2 May 2023 23:24:52 GMT Subject: RFR: Use distinct "end of cycle" message for each Shenandoah pause [v2] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 23:17:24 GMT, Kelvin Nilsen wrote: >> I moved it into `ShenandoahConcurrentGC::op_final_roots` to fit the pattern of other phases (`VM_Operation_XYZ` and `entry_XYZ` should just have GC tracking boiler plate). Also, with this approach, we don't have to marshal the `is_aging_cycle` around. > > Thanks for clarifying. As I think on this a bit more... we should probably also age regions during final update refs. The "final roots" phase only happens on abbreviated cycles. 
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/270#discussion_r1183128208 From kdnilsen at openjdk.org Tue May 2 23:24:52 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 2 May 2023 23:24:52 GMT Subject: RFR: Use distinct "end of cycle" message for each Shenandoah pause [v2] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 23:17:38 GMT, William Kemper wrote: >> Thanks for clarifying. > > As I think on this a bit more... we should probably also age regions during final update refs. The "final roots" phase only happens on abbreviated cycles. Thanks for catching that. We need it in both places or some code path shared between the different cycles. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/270#discussion_r1183128792 From ysr at openjdk.org Wed May 3 01:54:55 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 3 May 2023 01:54:55 GMT Subject: RFR: Use distinct "end of cycle" message for each Shenandoah pause [v2] In-Reply-To: <5bLDkWLMsGYeZWadW8pzMJ9u6gzx0av1Wa0hW6HB9lI=.2c8cb44e-1691-4896-92f3-a43e9c133960@github.com> References: <5bLDkWLMsGYeZWadW8pzMJ9u6gzx0av1Wa0hW6HB9lI=.2c8cb44e-1691-4896-92f3-a43e9c133960@github.com> Message-ID: On Tue, 2 May 2023 23:24:51 GMT, William Kemper wrote: >> This change allows the `GarbageCollectionNotificationInfo::gcAction` to be overridden. Shenandoah now uses this to distinguish the notification raised for each of the pauses during the concurrent cycle. This will make it easier for monitoring software to identify which phases may be having longer than expected pauses. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Move region aging into op_final_roots LGTM. 
src/hotspot/share/gc/shenandoah/shenandoahVMOperations.cpp line 52: > 50: > 51: void VM_ShenandoahInitMark::doit() { > 52: ShenandoahGCPauseMark mark(_gc_id, "Init Mark", SvcGCMarker::CONCURRENT); Not your code, but I am curious how the 3rd argument "reason_type" `{MINOR, FULL, CONCURRENT}` is even ever used. I saw it dutifully passed around and eventually get used only to check whether it was `== FULL` or not, down in `VM_GC_Operation::notify_gc_begin`. src/hotspot/share/services/memoryManager.cpp line 297: > 295: if (is_notification_enabled()) { > 296: const char* message = notificationMessage != nullptr ? notificationMessage : _gc_end_message; > 297: GCNotifier::pushNotification(this, message, GCCause::to_string(cause)); This looks great to me. I am assuming all collectors that do their work in multiple pieces (at least ZGC and possibly G1 too) might want to make use of this functionality. I am guessing this might benefit from an upstream review at some point. ------------- Marked as reviewed by ysr (Author). PR Review: https://git.openjdk.org/shenandoah/pull/270#pullrequestreview-1410061490 PR Review Comment: https://git.openjdk.org/shenandoah/pull/270#discussion_r1183169848 PR Review Comment: https://git.openjdk.org/shenandoah/pull/270#discussion_r1183179079 From shade at openjdk.org Wed May 3 07:38:45 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 May 2023 07:38:45 GMT Subject: RFR: Alphabetize includes In-Reply-To: References: Message-ID: On Tue, 2 May 2023 23:10:41 GMT, William Kemper wrote: > Follow up to feedback on https://github.com/openjdk/shenandoah/pull/269 Marked as reviewed by shade (Committer). 
------------- PR Review: https://git.openjdk.org/shenandoah/pull/271#pullrequestreview-1410329598 From rkennke at openjdk.org Wed May 3 10:54:43 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 3 May 2023 10:54:43 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v20] In-Reply-To: References: Message-ID: <3RFs-ck6jkNa-9dxLYQi00uC-f5K995W8d29V5swQpM=.f3bf90e8-1c09-436a-b414-760b76dac9ed@github.com> > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. 
Look up the base address in the table (see above)
> - Bits 4..31: The number of heap words from the target base address
>
> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is G1, which has a special mode called 'serial compaction' that acts as a last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table.
>
> All the table accesses can be done unsynchronized because:
> - Serial GC is single-threaded anyway
> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker processes a disjoint set of regions.
> - G1 serial compaction is single-threaded
>
> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers).
>
> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
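
For readers following the thread, the bit layout described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: all names (`SlidingTableSketch`, `encode`, `decode`, the `k...` constants) are hypothetical, the heap is modelled as plain word indices rather than addresses, and the fallback path is reduced to setting Bit 2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names throughout; only the bit layout comes from the description.
constexpr uint32_t kForwardedTag = 0x3;       // bits 0-1: '11' marks "forwarded"
constexpr uint32_t kFallbackBit  = 1u << 2;   // bit 2: look in the fallback table
constexpr uint32_t kAltRegionBit = 1u << 3;   // bit 3: target region 0 or 1
constexpr unsigned kOffsetShift  = 4;         // bits 4..31: offset in heap words
constexpr uint32_t kMaxOffset    = (1u << 28) - 1;
constexpr int64_t  kNoTarget     = -1;

struct SlidingTableSketch {
  size_t region_words;            // words per (equal-sized) region
  std::vector<int64_t> targets;   // two slots per source region: target base word index

  SlidingTableSketch(size_t num_regions, size_t words_per_region)
    : region_words(words_per_region), targets(2 * num_regions, kNoTarget) {}

  // Encode the forwarding of the object at word index `from` to word index `to`.
  uint32_t encode(size_t from, size_t to) {
    size_t src = from / region_words;
    int64_t base = static_cast<int64_t>((to / region_words) * region_words);
    for (uint32_t alt = 0; alt < 2; alt++) {
      int64_t& slot = targets[2 * src + alt];
      if (slot == kNoTarget) slot = base;     // first use of this slot
      if (slot == base) {
        uint32_t offset = static_cast<uint32_t>(to - static_cast<size_t>(base));
        assert(offset <= kMaxOffset);
        return kForwardedTag | (alt ? kAltRegionBit : 0u) | (offset << kOffsetShift);
      }
    }
    // A third target region: the real change would consult a fallback table here.
    return kForwardedTag | kFallbackBit;
  }

  // Decode a non-fallback forwarding back into the target word index.
  size_t decode(size_t from, uint32_t bits) const {
    assert((bits & kForwardedTag) == kForwardedTag);
    assert((bits & kFallbackBit) == 0);
    size_t src = from / region_words;
    uint32_t alt = (bits & kAltRegionBit) ? 1 : 0;
    return static_cast<size_t>(targets[2 * src + alt]) + (bits >> kOffsetShift);
  }
};
```

The point of the sketch is that the header only needs one selector bit plus a region-relative offset, because the per-region table supplies the base address; that is what lets the forwarding fit into the low 32 bits.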
> > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. > > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: More Thomas' comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/8366454e..b623db55 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=18-19 Stats: 25 lines in 3 files changed: 7 ins; 5 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From shade at openjdk.org Wed May 3 11:06:59 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 May 2023 11:06:59 GMT Subject: RFR: Remove the remaining unclean upstream difference Message-ID: After this change, the only shared code differences against upstream are the proper GC interface hooks and ProblemLists. 
------------- Commit messages: - Fix Changes: https://git.openjdk.org/shenandoah/pull/272/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=272&range=00 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/shenandoah/pull/272.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/272/head:pull/272 PR: https://git.openjdk.org/shenandoah/pull/272 From stuefe at openjdk.org Wed May 3 11:11:22 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 3 May 2023 11:11:22 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v20] In-Reply-To: <3RFs-ck6jkNa-9dxLYQi00uC-f5K995W8d29V5swQpM=.f3bf90e8-1c09-436a-b414-760b76dac9ed@github.com> References: <3RFs-ck6jkNa-9dxLYQi00uC-f5K995W8d29V5swQpM=.f3bf90e8-1c09-436a-b414-760b76dac9ed@github.com> Message-ID: <9CI4EFHvG-HhtLlu8Kf8x6zgmiKqi7au0zn6iHoVrYw=.1a5248d5-5967-428e-bfeb-76f37f8658ad@github.com> On Wed, 3 May 2023 10:54:43 GMT, Roman Kennke wrote:
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > More Thomas' comments src/hotspot/share/gc/shared/slidingForwarding.cpp line 153: > 151: // Set from and to in new or found entry. > 152: entry->_from = from; > 153: entry->_to = to; Why so complicated?
Proposal:

    while (entry != nullptr && entry->_from != from) {
      entry = entry->_next;
    }
    if (entry == nullptr) {
      FallbackTableEntry* new_entry = NEW_C_HEAP_OBJ(FallbackTableEntry, mtGC);
      new_entry->_next = head;
      new_entry->_from = from;
      head = entry = new_entry;
    }
    entry->_to = to;

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1183545518

From rkennke at openjdk.org Wed May 3 11:16:25 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 3 May 2023 11:16:25 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v20]
In-Reply-To: <9CI4EFHvG-HhtLlu8Kf8x6zgmiKqi7au0zn6iHoVrYw=.1a5248d5-5967-428e-bfeb-76f37f8658ad@github.com>
References: <3RFs-ck6jkNa-9dxLYQi00uC-f5K995W8d29V5swQpM=.f3bf90e8-1c09-436a-b414-760b76dac9ed@github.com> <9CI4EFHvG-HhtLlu8Kf8x6zgmiKqi7au0zn6iHoVrYw=.1a5248d5-5967-428e-bfeb-76f37f8658ad@github.com>
Message-ID:

On Wed, 3 May 2023 11:08:04 GMT, Thomas Stuefe wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>> More Thomas' comments
>
> src/hotspot/share/gc/shared/slidingForwarding.cpp line 153:
>
>> 151: // Set from and to in new or found entry.
>> 152: entry->_from = from;
>> 153: entry->_to = to;
>
> Why so complicated? Proposal:
>
>     while (entry != nullptr && entry->_from != from) {
>       entry = entry->_next;
>     }
>     if (entry == nullptr) {
>       FallbackTableEntry* new_entry = NEW_C_HEAP_OBJ(FallbackTableEntry, mtGC);
>       new_entry->_next = head;
>       new_entry->_from = from;
>       head = entry = new_entry;
>     }
>     entry->_to = to;

Uhm, so this would not change the actual head

Remember that head points into the array. We cannot actually prepend the new entry, we can only insert it as the first linked entry after head. If I see it correctly, it would not actually change the head-entry (the stuff in the array) except for its _to field. Also, the new_entry would not get linked anywhere. Or what am I missing?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1183550376

From stuefe at openjdk.org Wed May 3 11:20:23 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 3 May 2023 11:20:23 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v20]
In-Reply-To:
References: <3RFs-ck6jkNa-9dxLYQi00uC-f5K995W8d29V5swQpM=.f3bf90e8-1c09-436a-b414-760b76dac9ed@github.com> <9CI4EFHvG-HhtLlu8Kf8x6zgmiKqi7au0zn6iHoVrYw=.1a5248d5-5967-428e-bfeb-76f37f8658ad@github.com>
Message-ID: <5fyq8JC1XDQJffYmRtITjnFebGWAuYN8doLD_DoiPN0=.8809283a-4cbc-4e47-9598-aeb8a335c8eb@github.com>

On Wed, 3 May 2023 11:13:47 GMT, Roman Kennke wrote:

>> src/hotspot/share/gc/shared/slidingForwarding.cpp line 153:
>>
>>> 151: // Set from and to in new or found entry.
>>> 152: entry->_from = from;
>>> 153: entry->_to = to;
>>
>> Why so complicated? Proposal:
>>
>>     while (entry != nullptr && entry->_from != from) {
>>       entry = entry->_next;
>>     }
>>     if (entry == nullptr) {
>>       FallbackTableEntry* new_entry = NEW_C_HEAP_OBJ(FallbackTableEntry, mtGC);
>>       new_entry->_next = head;
>>       new_entry->_from = from;
>>       head = entry = new_entry;
>>     }
>>     entry->_to = to;
>
> Uhm, so this would not change the actual head
>
> Remember that head points into the array.
We cannot actually prepend the new entry, we can only insert it as the first linked entry after head. If I see it correctly, it would not actually change the head-entry (the stuff in the array) except for its _to field. Also, the new_entry would not get linked anywhere. Or what am I missing? Ah, sorry, I just realized you inlined the head elements into the table. Okay, never mind then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1183553784 From stuefe at openjdk.org Wed May 3 11:24:28 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 3 May 2023 11:24:28 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v20] In-Reply-To: <3RFs-ck6jkNa-9dxLYQi00uC-f5K995W8d29V5swQpM=.f3bf90e8-1c09-436a-b414-760b76dac9ed@github.com> References: <3RFs-ck6jkNa-9dxLYQi00uC-f5K995W8d29V5swQpM=.f3bf90e8-1c09-436a-b414-760b76dac9ed@github.com> Message-ID: On Wed, 3 May 2023 10:54:43 GMT, Roman Kennke wrote:
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > More Thomas' comments Okay, good so far. src/hotspot/share/gc/shared/slidingForwarding.cpp line 147: > 145: new_entry->_next = head->_next; > 146: new_entry->_from = head->_from; > 147: new_entry->_to = head->_to; You could probably just use assignment here, which does memberwise copy. `*new_entry = *head;` ------------- Marked as reviewed by stuefe (Reviewer).
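
To make the exchange above easier to follow, here is a minimal sketch of a chained hash table whose head entries live inline in the bucket array, which is the property that makes plain prepending impossible. It is an assumption-laden illustration, not the HotSpot implementation: `FallbackEntrySketch`, `FallbackTableSketch`, the hash function and the use of `new`/`delete` (instead of `NEW_C_HEAP_OBJ`) are all hypothetical, and `from == 0` is used as the "unused slot" marker purely for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch; names, hashing and allocation are assumptions.
struct FallbackEntrySketch {
  FallbackEntrySketch* next = nullptr;
  uintptr_t from = 0;   // 0 means "slot unused" in this sketch
  uintptr_t to = 0;
};

class FallbackTableSketch {
  std::vector<FallbackEntrySketch> _table;  // head entries are stored inline here

  size_t index_for(uintptr_t from) const { return (from >> 3) % _table.size(); }

public:
  explicit FallbackTableSketch(size_t buckets) : _table(buckets) {}
  FallbackTableSketch(const FallbackTableSketch&) = delete;
  FallbackTableSketch& operator=(const FallbackTableSketch&) = delete;

  ~FallbackTableSketch() {
    // Only overflow entries were heap-allocated; heads belong to the vector.
    for (FallbackEntrySketch& head : _table) {
      FallbackEntrySketch* e = head.next;
      while (e != nullptr) {
        FallbackEntrySketch* n = e->next;
        delete e;
        e = n;
      }
    }
  }

  void put(uintptr_t from, uintptr_t to) {
    assert(from != 0);
    FallbackEntrySketch* head = &_table[index_for(from)];
    // 1. Update in place if `from` is already somewhere in the chain.
    for (FallbackEntrySketch* e = head; e != nullptr; e = e->next) {
      if (e->from == from) { e->to = to; return; }
    }
    // 2. Head is occupied: move its contents into a new overflow entry that
    //    is linked directly after head. The copy is memberwise, so it also
    //    carries over head's old `next` pointer.
    if (head->from != 0) {
      FallbackEntrySketch* moved = new FallbackEntrySketch(*head);
      head->next = moved;
    }
    // 3. The new mapping always ends up in the inline head slot.
    head->from = from;
    head->to = to;
  }

  // Returns 0 when no mapping for `from` exists.
  uintptr_t get(uintptr_t from) const {
    for (const FallbackEntrySketch* e = &_table[index_for(from)]; e != nullptr; e = e->next) {
      if (e->from == from) return e->to;
    }
    return 0;
  }
};
```

Step 2 is where a single struct copy replaces the three field-by-field assignments mentioned in the review comment.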
PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1410687480 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1183556810 From rkennke at openjdk.org Wed May 3 12:21:44 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 3 May 2023 12:21:44 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v21] In-Reply-To: References: Message-ID: <9ne0qVqlv8GjcVAZ76BIuMLqypEmpAhS-W_cHi_FRfE=.8491768a-6c51-4af9-a07f-d99863d634a5@github.com> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Refactor GCForwarding into SlidingForwarding ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/b623db55..568e5ea3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=19-20 Stats: 343 lines in 20 files changed: 87 ins; 184 del; 72 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From rkennke at openjdk.org Wed May 3 12:34:29 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 3 May 2023 12:34:29 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v22] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large.
> > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. 
> - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). > > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. 
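
The bit layout described above can be sketched as follows. This is only an illustration under stated assumptions: 8-byte heap words, and a per-source-region table entry holding two target base addresses. All names (`TargetBases`, `encode_forwarding`, `decode_forwarding`, and so on) are hypothetical and are not taken from the patch.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical constants mirroring the layout in the description.
constexpr uint32_t  FORWARDED_BITS      = 0x3;       // bits 0..1: '11' marks "forwarded"
constexpr uint32_t  FALLBACK_BIT        = 1u << 2;   // bit 2: forwardee lives in the fallback table
constexpr unsigned  REGION_SELECT_SHIFT = 3;         // bit 3: target region 0 or 1
constexpr unsigned  OFFSET_SHIFT        = 4;         // bits 4..31: offset in heap words
constexpr uint32_t  MAX_OFFSET_WORDS    = (1u << 28) - 1;
constexpr uintptr_t HEAP_WORD_BYTES     = sizeof(void*);  // assumption: one heap word per pointer

// One table entry per source region: start addresses of its two possible targets.
struct TargetBases {
  uintptr_t base[2];
};

// Pack (target region selector, word offset) into the low 32 header bits.
inline uint32_t encode_forwarding(unsigned target_select, uint32_t offset_words) {
  assert(target_select < 2);
  assert(offset_words <= MAX_OFFSET_WORDS);
  return (offset_words << OFFSET_SHIFT) |
         (target_select << REGION_SELECT_SHIFT) |
         FORWARDED_BITS;
}

inline bool is_forwarded(uint32_t low_bits) {
  return (low_bits & FORWARDED_BITS) == FORWARDED_BITS;
}

inline bool uses_fallback(uint32_t low_bits) {
  return (low_bits & FALLBACK_BIT) != 0;
}

// Recover the forwardee address from the low header bits and the table entry
// of the object's source region. Only valid when uses_fallback() is false.
inline uintptr_t decode_forwarding(uint32_t low_bits, const TargetBases& bases) {
  unsigned select       = (low_bits >> REGION_SELECT_SHIFT) & 1u;
  uint32_t offset_words = low_bits >> OFFSET_SHIFT;
  return bases.base[select] + uintptr_t(offset_words) * HEAP_WORD_BYTES;
}
```

With 28 offset bits and 8-byte heap words, such an encoding could address up to 2 GiB within a target region; larger regions would presumably need the fallback path or a different scheme.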
> > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with three additional commits since the last revision: - Place 'public' correctly - Use member assignment, instead of explicitly copying the struct - Set UseAltGCForwarding flag in test case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/568e5ea3..7691eb81 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=20-21 Stats: 8 lines in 3 files changed: 4 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From kdnilsen at openjdk.org Wed May 3 13:26:05 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 3 May 2023 13:26:05 GMT Subject: RFR: Remove the remaining unclean upstream difference In-Reply-To: References: Message-ID: On Wed, 3 May 2023 10:59:18 GMT, Aleksey Shipilev wrote: > After this change, the only shared code differences against upstream are the proper GC interface hooks and ProblemLists. Marked as reviewed by kdnilsen (Committer). ------------- PR Review: https://git.openjdk.org/shenandoah/pull/272#pullrequestreview-1410889254 From shade at openjdk.org Wed May 3 13:32:54 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 May 2023 13:32:54 GMT Subject: Integrated: Remove the remaining unclean upstream difference In-Reply-To: References: Message-ID: On Wed, 3 May 2023 10:59:18 GMT, Aleksey Shipilev wrote: > After this change, the only shared code differences against upstream are the proper GC interface hooks and ProblemLists. 
This pull request has now been integrated.

Changeset: 4e1ad181
Author:    Aleksey Shipilev
URL:       https://git.openjdk.org/shenandoah/commit/4e1ad181461cdaf736bcf622a619c95724ae0ca6
Stats:     1 line in 1 file changed: 0 ins; 0 del; 1 mod

Remove the remaining unclean upstream difference

Reviewed-by: kdnilsen

-------------

PR: https://git.openjdk.org/shenandoah/pull/272

From tschatzl at openjdk.org Wed May 3 13:49:31 2023
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Wed, 3 May 2023 13:49:31 GMT
Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v3]
In-Reply-To:
References:
Message-ID: <8r0Te2Q1VuISH9tDaZaMzNpEL373FmmtBf5A0hO-0ek=.250720c8-bcbf-47f5-a82b-611e93247bd9@github.com>

On Tue, 11 Apr 2023 13:22:40 GMT, Johan Sjölen wrote:

>> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we
>> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>>
>> Here are some typical things to look out for:
>>
>> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
>> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
>> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>>
>> An example of this:
>>
>> ```c++
>> // This function returns null
>> void* ret_null();
>> // This function returns true if *x == nullptr
>> bool is_nullptr(void** x);
>> ```
>>
>> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>>
>> Thanks!
>
> Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains five commits:
>
>  - Fix style
>  - Merge remote-tracking branch 'origin/master' into JDK-8301493
>  - Explicitly cast
>  - Fixes
>  - Replace NULL with nullptr in cpu/aarch64

Remaining `NULL` in

gc/shared/BarrierSetAssembler::check_oop()
codeBuffer_aarch64.cpp/emit_shared_trampolines()
stubGenerator_aarch64.cpp/generate_final_stubs()
vm_version_aarch64.cpp/check_info_file()

-------------

Changes requested by tschatzl (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/12321#pullrequestreview-1410938066

From kdnilsen at openjdk.org Wed May 3 13:51:17 2023
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Wed, 3 May 2023 13:51:17 GMT
Subject: RFR: DRAFT: Expand old on demand [v34]
In-Reply-To:
References:
Message-ID:

> This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway.
>
> As currently drafted, there are regression failures. This DRAFT PR is published for the purpose of facilitating a careful code review.

Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:

  Tidy up after merge from mainline

  Some incompatibilities between the different development branches
  required relaxing assertions that only generational mode accounts for
  usage within generations. Also, we needed a fix in the handling of
  regions promoted in place when they are made free (only if they have a
  sufficiently large alloc_capacity).
------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/30415b40..8c96288b Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=33 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=32-33 Stats: 12 lines in 2 files changed: 2 ins; 9 del; 1 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From rkennke at openjdk.org Wed May 3 14:10:33 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 3 May 2023 14:10:33 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v23] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. 
Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). 
> > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). > > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. > > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Bunch of fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/7691eb81..f30039a0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=21-22 Stats: 13 lines in 3 files changed: 0 ins; 10 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From shade at openjdk.org Wed May 3 14:34:22 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 May 2023 14:34:22 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v23] In-Reply-To: References: Message-ID: <1JFhieDv0YPe9ntcx6S2IFzkMdj6NGQtuoWPcG0KXUU=.eb4cafe0-4086-49d2-9a6d-720aa9b2fe69@github.com> On Wed, 3 May 2023 14:10:33 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. 
Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. 
When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. 
>>
>> Testing:
>>  - [x] hotspot_gc -UseAltGCForwarding
>>  - [x] hotspot_gc +UseAltGCForwarding
>>  - [x] tier1 -UseAltGCForwarding
>>  - [x] tier1 +UseAltGCForwarding
>>  - [x] tier2 -UseAltGCForwarding
>>  - [x] tier2 +UseAltGCForwarding
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Bunch of fixes

Performance: the worst case I can come up with is Serial Full GC that moves the entire heap full of smallest objects, like this:

    public class Retain {
        static final int RETAINED = Integer.getInteger("retained", 10_000_000);
        static final int GCS = Integer.getInteger("gcs", 100);
        static Object[] OBJECTS = new Object[RETAINED];

        public static void main(String... args) {
            for (int t = 0; t < GCS; t++) {
                for (int c = 0; c < RETAINED; c++) {
                    OBJECTS[c] = new Object();
                }
                System.gc();
            }
        }
    }

On my `c6n.8xlarge` instance, with `java -Xmx1g -Xlog:gc -XX:+UseSerialGC Retain.java`, I see:

    baseline:                  364 +- 5 ms
    patched, -AltGCForwarding: 385 +- 3 ms [+6%]
    patched, +AltGCForwarding: 445 +- 5 ms [+22%]

There are regressions even with `-AltGCForwarding`, and judging from the profiles and the point experiments, those are caused by the `AltGCForwarding` flag checks for every `forward_to` and `forwardee`, split evenly between these two paths. But given the very targeted workload above running back-to-back Full GCs intentionally, this regression looks okay. (I think the only way to dodge it would be to template the bunch of GC code and dispatch to it once per GC phase, rather than per oop, which would be very intrusive and serve no practical need, IMO.)

The regression with `+AltGCForwarding` looks impressive in comparison: it is "only" worth three flag checks or so. The code I am seeing in profiles is already quite polished, so we would unlikely squeeze more from it without investing much more time.

I don't think any of this would show up at larger benchmarks running in usual (young, mixed) GC modes.
Indeed, I ran a few point experiments, and there seem to be no visible change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1533133187 From stuefe at openjdk.org Wed May 3 16:02:23 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 3 May 2023 16:02:23 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v23] In-Reply-To: References: Message-ID: On Wed, 3 May 2023 14:10:33 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. 
>> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). 
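
A flag-based dispatch of the kind described above might look like the sketch below. It is an assumption-laden illustration, not the patch's code: `ForwardingFacade`, `legacy_forward` and `sliding_forward` are hypothetical names, the flag is modeled as a plain global, and the sliding path is assumed to leave the upper 32 header bits untouched.

```c++
#include <cstdint>

// Stand-in for the experimental -XX:[+|-]UseAltGCForwarding flag (a plain global here).
static bool UseAltGCForwarding = false;

// Hypothetical legacy path: the whole header is overwritten with the target
// address, tagged '11' in the two lowest bits.
inline uint64_t legacy_forward(uint64_t header, uint64_t target_addr) {
  (void)header;  // old header contents are lost
  return target_addr | 0x3;
}

// Hypothetical sliding path: only the low 32 bits are replaced by the
// already-encoded forwarding value; the upper half is preserved (assumption).
inline uint64_t sliding_forward(uint64_t header, uint32_t encoded_low32) {
  return (header & 0xFFFFFFFF00000000ULL) | encoded_low32;
}

// Facade that picks the implementation on every call, based on the flag.
struct ForwardingFacade {
  static uint64_t forward_to(uint64_t header, uint64_t target_addr, uint32_t encoded_low32) {
    return UseAltGCForwarding ? sliding_forward(header, encoded_low32)
                              : legacy_forward(header, target_addr);
  }
};
```

Checking the flag on every call keeps the call sites simple, at the cost of one branch per forwarded object.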
>> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Bunch of fixes We may squeeze out a bit more performance for the +AltGCForwarding case, eliminate one ref hop, by inlining the bases table into the SlidingForwarding object. Let the table follow the object, and maybe reduce some member sizes (either one of _num_regions, _region_size_word_shift can be 32bit, for instance). At least if we have very few regions that would let the whole table live on the same cache line. Simplest way to do that would be to add a fixed sized array to the object and use it as backing memory if bases table size is <= that array size, otherwise dynamically allocate it. Maybe not worth the work, up to you of course. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1533303946 From stuefe at openjdk.org Wed May 3 16:11:23 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 3 May 2023 16:11:23 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v23] In-Reply-To: References: Message-ID: On Wed, 3 May 2023 14:10:33 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. 
Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. 
When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. 
>>
>> Testing:
>>  - [x] hotspot_gc -UseAltGCForwarding
>>  - [x] hotspot_gc +UseAltGCForwarding
>>  - [x] tier1 -UseAltGCForwarding
>>  - [x] tier1 +UseAltGCForwarding
>>  - [x] tier2 -UseAltGCForwarding
>>  - [x] tier2 +UseAltGCForwarding
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Bunch of fixes

Changes requested by stuefe (Reviewer).

src/hotspot/share/gc/shared/slidingForwarding.cpp line 162:

> 160:   FallbackTableEntry* head = &_table[idx];
> 161:   FallbackTableEntry* entry = head;
> 162:   // Search existing entry in chain starting at idx.

You don't use the head node. You should use the head node before creating a new node.

-------------

PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1411249288
PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1183900419

From gziemski at openjdk.org Wed May 3 16:56:20 2023
From: gziemski at openjdk.org (Gerard Ziemski)
Date: Wed, 3 May 2023 16:56:20 GMT
Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 11 Apr 2023 13:22:40 GMT, Johan Sjölen wrote:

>> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we
>> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>>
>> Here are some typical things to look out for:
>>
>> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
>> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
>> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>>
>> An example of this:
>>
>> ```c++
>> // This function returns null
>> void* ret_null();
>> // This function returns true if *x == nullptr
>> bool is_nullptr(void** x);
>> ```
>>
>> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>>
>> Thanks!
>
> Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>
>  - Fix style
>  - Merge remote-tracking branch 'origin/master' into JDK-8301493
>  - Explicitly cast
>  - Fixes
>  - Replace NULL with nullptr in cpu/aarch64

I only looked at the changes that you did make, not what you could have done and it LGTM.

-------------

Marked as reviewed by gziemski (Committer).

PR Review: https://git.openjdk.org/jdk/pull/12321#pullrequestreview-1411341622

From wkemper at openjdk.org Wed May 3 17:11:53 2023
From: wkemper at openjdk.org (William Kemper)
Date: Wed, 3 May 2023 17:11:53 GMT
Subject: RFR: Use distinct "end of cycle" message for each Shenandoah pause [v2]
In-Reply-To:
References: <5bLDkWLMsGYeZWadW8pzMJ9u6gzx0av1Wa0hW6HB9lI=.2c8cb44e-1691-4896-92f3-a43e9c133960@github.com>
Message-ID: <12JlyIbohMFBndHRXHArTXoHdbajPrPesi_ZVMhXtx8=.b0590afd-f7dc-4e0f-8908-20a901758aaf@github.com>

On Wed, 3 May 2023 01:10:48 GMT, Y. Srinivas Ramakrishna wrote:

>> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Move region aging into op_final_roots
>
> src/hotspot/share/gc/shenandoah/shenandoahVMOperations.cpp line 52:
>
>> 50:
>> 51: void VM_ShenandoahInitMark::doit() {
>> 52:   ShenandoahGCPauseMark mark(_gc_id, "Init Mark", SvcGCMarker::CONCURRENT);
>
> Not your code, but I am curious how the 3rd argument "reason_type" `{MINOR, FULL, CONCURRENT}` is even ever used. I saw it dutifully passed around and eventually get used only to check whether it was `== FULL` or not, down in `VM_GC_Operation::notify_gc_begin`.

Yeah, I traced that too.
It ultimately ends up exposed to DTRACE probes. There are quite a few GC monitoring/event hoops to jump through. > src/hotspot/share/services/memoryManager.cpp line 297: > >> 295: if (is_notification_enabled()) { >> 296: const char* message = notificationMessage != nullptr ? notificationMessage : _gc_end_message; >> 297: GCNotifier::pushNotification(this, message, GCCause::to_string(cause)); > > This looks great to me. I am assuming all collectors that do their work in multiple pieces (at least ZGC and possibly G1 too) might want to make use of this functionality. > > I am guessing this might benefit from an upstream review at some point. Yes, I reckon so. I will target these changes for upstream as well. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/270#discussion_r1183976135 PR Review Comment: https://git.openjdk.org/shenandoah/pull/270#discussion_r1183976619 From wkemper at openjdk.org Wed May 3 17:11:53 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 3 May 2023 17:11:53 GMT Subject: Integrated: Use distinct "end of cycle" message for each Shenandoah pause In-Reply-To: References: Message-ID: On Tue, 2 May 2023 22:54:53 GMT, William Kemper wrote: > This change allows the `GarbageCollectionNotificationInfo::gcAction` to be overridden. Shenandoah now uses this to distinguish the notification raised for each of the pauses during the concurrent cycle. This will make it easier for monitoring software to identify which phases may be having longer than expected pauses. This pull request has now been integrated. 
Changeset: 90a1c9bf Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/90a1c9bfe1572c99c23bd63e686e34a1695c3bce Stats: 89 lines in 12 files changed: 38 ins; 21 del; 30 mod Use distinct "end of cycle" message for each Shenandoah pause Reviewed-by: kdnilsen, ysr ------------- PR: https://git.openjdk.org/shenandoah/pull/270 From wkemper at openjdk.org Wed May 3 17:19:52 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 3 May 2023 17:19:52 GMT Subject: Integrated: Alphabetize includes In-Reply-To: References: Message-ID: On Tue, 2 May 2023 23:10:41 GMT, William Kemper wrote: > Follow up to feedback on https://github.com/openjdk/shenandoah/pull/269 This pull request has now been integrated. Changeset: 84c36e65 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/84c36e65a78c7f41adb61e5c2259340bdb9767d0 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Alphabetize includes Reviewed-by: shade ------------- PR: https://git.openjdk.org/shenandoah/pull/271 From rkennke at openjdk.org Wed May 3 17:47:24 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 3 May 2023 17:47:24 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v24] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. 
> > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. 
Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). > > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. 
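For readers following the bit layout in the description above, here is a minimal sketch of how such an encoding could be packed into and unpacked from the low 32 bits of a header. It covers only the fields listed in the description: bits 0 and 1 as the forwarded marker, bit 2 for fallback, bit 3 as the target-region selector, and bits 4..31 as the offset in heap words. All names (`encode_forwarding`, `decode_forwarding`, `Decoded`, and the constants) are made up for illustration and are not taken from the patch. The per-region table lookup that turns the selector into a base address is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants mirroring the bit layout from the description.
constexpr std::uint32_t FORWARDED_BITS   = 0x3;       // bits 0..1 set to '11'
constexpr std::uint32_t FALLBACK_BIT     = 1u << 2;   // bit 2: look up in fallback table
constexpr std::uint32_t REGION_SELECT_BIT = 1u << 3;  // bit 3: target region 0 or 1
constexpr unsigned      OFFSET_SHIFT     = 4;         // bits 4..31: offset in heap words
constexpr std::uint32_t MAX_OFFSET       = (1u << 28) - 1;

// Illustrative result type for decoding; not part of the patch.
struct Decoded {
  bool          forwarded;
  bool          fallback;
  unsigned      region_select;
  std::uint32_t offset_words;
};

// Pack a region selector (0 or 1) and a word offset into 32 bits.
inline std::uint32_t encode_forwarding(unsigned region_select, std::uint32_t offset_words) {
  assert(region_select <= 1);
  assert(offset_words <= MAX_OFFSET);  // the offset must fit into 28 bits
  return (offset_words << OFFSET_SHIFT) |
         (region_select != 0 ? REGION_SELECT_BIT : 0u) |
         FORWARDED_BITS;
}

// Unpack the fields again; the caller would still need the per-region
// table to turn (region_select, offset_words) into an address.
inline Decoded decode_forwarding(std::uint32_t low_bits) {
  Decoded d;
  d.forwarded     = (low_bits & FORWARDED_BITS) == FORWARDED_BITS;
  d.fallback      = (low_bits & FALLBACK_BIT) != 0;
  d.region_select = (low_bits >> 3) & 1u;
  d.offset_words  = low_bits >> OFFSET_SHIFT;
  return d;
}
```

With 28 offset bits, a single target region can span at most 2^28 heap words under this layout, which is presumably why the description divides the heap into equal-sized regions.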
> > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Flatten SlidingForwarding and use heads of FallbackTable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/f30039a0..fe0915e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=22-23 Stats: 117 lines in 3 files changed: 19 ins; 43 del; 55 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From wkemper at openjdk.org Wed May 3 18:26:16 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 3 May 2023 18:26:16 GMT Subject: RFR: 8307378: Allow collectors to provide specific values for GC notifications' actions Message-ID: At the end of a GC pause, a `GarbageCollectionNotificationInfo` may be emitted. The notification has a `gcAction` field which presently originates from the field `_gc_end_message` in `GCMemoryManager`. Concurrent collectors such as Shenandoah, ZGC and G1 may have more (brief) pauses in their cycle than they have memory managers. This makes it difficult for gc notification listeners to determine the phase of the cycle that emitted the notification. We are proposing a change to allow collectors to define specific values for the `gcAction` to make it easier for notification listeners to classify the gc phase responsible for the notification. 
------------- Commit messages: - Allow collectors to provide specific values for GC notification's action Changes: https://git.openjdk.org/jdk/pull/13785/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13785&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307378 Stats: 42 lines in 8 files changed: 19 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/13785.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13785/head:pull/13785 PR: https://git.openjdk.org/jdk/pull/13785 From wkemper at openjdk.org Wed May 3 18:26:50 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 3 May 2023 18:26:50 GMT Subject: RFR: Remove passing tests from ProblemList.txt Message-ID: <_mVMIv03Z0947HMJS7M-KNqs-KhklMG780ACwlHhA18=.019bccd7-2ff4-4724-b289-fef444047d8a@github.com> Recent test runs suggest that all but three tests are now passing. The remaining failures are believed to be fixed by https://github.com/openjdk/shenandoah/pull/248. ------------- Commit messages: - Merge branch 'shenandoah-master' into some-tests-enabled - Disable still failing tests - Enable all tests so we can see where we stand Changes: https://git.openjdk.org/shenandoah/pull/273/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=273&range=00 Stats: 9 lines in 2 files changed: 0 ins; 6 del; 3 mod Patch: https://git.openjdk.org/shenandoah/pull/273.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/273/head:pull/273 PR: https://git.openjdk.org/shenandoah/pull/273 From kdnilsen at openjdk.org Wed May 3 18:36:14 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 3 May 2023 18:36:14 GMT Subject: RFR: 8307378: Allow collectors to provide specific values for GC notifications' actions In-Reply-To: References: Message-ID: On Wed, 3 May 2023 18:17:20 GMT, William Kemper wrote: > At the end of a GC pause, a `GarbageCollectionNotificationInfo` may be emitted. 
The notification has a `gcAction` field which presently originates from the field `_gc_end_message` in `GCMemoryManager`. Concurrent collectors such as Shenandoah, ZGC and G1 may have more (brief) pauses in their cycle than they have memory managers. This makes it difficult for gc notification listeners to determine the phase of the cycle that emitted the notification. We are proposing a change to allow collectors to define specific values for the `gcAction` to make it easier for notification listeners to classify the gc phase responsible for the notification. Marked as reviewed by kdnilsen (no project role). ------------- PR Review: https://git.openjdk.org/jdk/pull/13785#pullrequestreview-1411557097 From rkennke at openjdk.org Wed May 3 19:19:44 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 3 May 2023 19:19:44 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v25] In-Reply-To: References: Message-ID: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. 
This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. 
Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). > > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. > > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix type narrowing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/fe0915e2..5ee17597 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=23-24 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From tschatzl at openjdk.org Wed May 3 21:42:24 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 3 May 2023 21:42:24 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v25] In-Reply-To: 
<6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> References: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> Message-ID: <7DNp1iiy51WJSWUuywlldghKwLETxFip4jlePA3ybrk=.b0f1243a-fa7b-4446-a6c3-3d7d2080ee3e@github.com> On Wed, 3 May 2023 19:19:44 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. 
Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). 
>> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix type narrowing Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/shared/gc_globals.hpp line 699: > 697: \ > 698: product(bool, UseAltGCForwarding, false, EXPERIMENTAL, \ > 699: "Use alternative GC forwarding that preserves object headers") \ I would strongly prefer if this were not a product flag at this time, but a develop flag. It potentially decreases performance of serial gc full gcs by a significant amount with no upside at all (not that worried about g1 or other concurrent gcs). Can you give me reasons why an end user would ever consciously enable this flag? Using a develop flag is only a minor annoyance for development - we already do that for other features like evacuation failure injection in G1. For end users this would result in (guaranteed) zero performance impact. Only when adding compressed object headers with Lilliput this should be changed to a product flag. I do not know your schedule for upstreaming Lilliput, but if it would miss JDK 21, people would suffer from this for the entire lifetime of JDK 21.... which is an LTS release. (Fwiw I would suggest the same for a non-LTS release, it seems to be worse in this situation though). 
src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 43: > 41: > 42: uint SlidingForwarding::region_index_containing(HeapWord* addr) { > 43: uint index = static_cast<uint>(pointer_delta(addr, _heap_start) >> _region_size_words_shift); I believe it is possible to bias the array pointer to avoid that subtraction of the `_heap_start` like we do e.g. for the card table. See also `G1BiasedArray` or so for a kind of ready-made class implementing this. Not sure it will help a lot, but at least remove the subtraction and the load of the `_heap_start` value. src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 59: > 57: // Primary is free > 58: _bases_table[base_idx] = to_region_base; > 59: } else if (_bases_table[base_idx] == to_region_base) { This probably won't help at all with performance, but I would kind of put the checks for the common cases where the table values are set (particularly the first one) first (I may be wrong about whether this is possible). The `UNUSED_BASE` values in the tables will be encountered exactly once...
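The biasing idea from the comment on line 43 above can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the patch's code and not the interface of the HotSpot class the reviewer mentions: `BiasedRegionTable` and its members are hypothetical names, the entries are plain `int`s, and addresses are byte addresses held in `uintptr_t` rather than `HeapWord*`. The point is only that the `heap_start` subtraction is folded into a precomputed base, so a lookup needs a shift and an add.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-region table whose base is "biased" so that it can be
// indexed directly with (addr >> shift), without subtracting heap_start.
class BiasedRegionTable {
public:
  BiasedRegionTable(std::uintptr_t heap_start, std::size_t num_regions, unsigned region_shift)
    : _entries(num_regions, 0),
      _shift(region_shift),
      _first_region(heap_start >> region_shift),
      // Fold the bias into the base once. Unsigned arithmetic is used so
      // that the intermediate value may wrap without undefined behavior.
      _biased_base(reinterpret_cast<std::uintptr_t>(_entries.data())
                   - _first_region * sizeof(int)) {
    // Assumption: the heap start is aligned to the region size.
    assert((heap_start & ((std::uintptr_t(1) << region_shift) - 1)) == 0);
  }

  // Copying would leave _biased_base pointing at the old storage.
  BiasedRegionTable(const BiasedRegionTable&) = delete;
  BiasedRegionTable& operator=(const BiasedRegionTable&) = delete;

  // Entry for the region containing addr: one shift, one scaled add.
  int& at(std::uintptr_t addr) {
    std::uintptr_t region = addr >> _shift;
    assert(region >= _first_region && region - _first_region < _entries.size());
    return *reinterpret_cast<int*>(_biased_base + region * sizeof(int));
  }

private:
  std::vector<int> _entries;
  unsigned         _shift;
  std::uintptr_t   _first_region;
  std::uintptr_t   _biased_base;
};
```

Whether this saves anything measurable is an open question, as the reviewer notes; the sketch only shows where the subtraction and the load of the heap start would go away.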
------------- PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1411879100 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184309687 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184310440 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184313798 From tschatzl at openjdk.org Wed May 3 22:08:24 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 3 May 2023 22:08:24 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v25] In-Reply-To: <7DNp1iiy51WJSWUuywlldghKwLETxFip4jlePA3ybrk=.b0f1243a-fa7b-4446-a6c3-3d7d2080ee3e@github.com> References: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> <7DNp1iiy51WJSWUuywlldghKwLETxFip4jlePA3ybrk=.b0f1243a-fa7b-4446-a6c3-3d7d2080ee3e@github.com> Message-ID: On Wed, 3 May 2023 21:30:31 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix type narrowing > > src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 43: > >> 41: >> 42: uint SlidingForwarding::region_index_containing(HeapWord* addr) { >> 43: uint index = static_cast<uint>(pointer_delta(addr, _heap_start) >> _region_size_words_shift); > > I believe it is possible to bias the array pointer to avoid that subtraction of the `_heap_start` like we do e.g. for the card table. See also `G1BiasedArray` or so for a kind of ready-made class implementing this. > > Not sure it will help a lot, but at least remove the subtraction and the load of the `_heap_start` value.
Maybe possible ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184338484 From tschatzl at openjdk.org Wed May 3 22:08:25 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 3 May 2023 22:08:25 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v25] In-Reply-To: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> References: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> Message-ID: On Wed, 3 May 2023 19:19:44 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. 
>> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. 
I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix type narrowing src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 82: > 80: > 81: uintptr_t encoded = (offset << OFFSET_BITS_SHIFT) | > 82: (alt_region << ALT_REGION_SHIFT) | While I understand that a `bool` is typically encoded as either `0` or `1` (not sure if it's actually specified somewhere) it would likely make the code cleaner to use a real integer of some type here to me. Also, the shift could be inlined in the assignments above. Like setting `alt_region` to either `0 (<< ALT_REGION_SHIFT)` or `1 << ALT_REGION_SHIFT` directly in the code. This is obviously a nano-optimization that probably won't show up anywhere... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184337013 From dholmes at openjdk.org Wed May 3 22:12:28 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 3 May 2023 22:12:28 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v3] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 13:22:40 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64.
Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> ``` >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Fix style > - Merge remote-tracking branch 'origin/master' into JDK-8301493 > - Explicitly cast > - Fixes > - Replace NULL with nullptr in cpu/aarch64 Looks good - thanks! Three minor suggested changes and three opportunities to remove casts (that I spotted). src/hotspot/cpu/aarch64/icache_aarch64.cpp line 32: > 30: ICache::flush_icache_stub_t* flush_icache_stub) { > 31: // Give anyone who calls this a surprise > 32: *flush_icache_stub = (ICache::flush_icache_stub_t)nullptr; Hopefully don't need the cast any more. src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp line 270: > 268: virtual void pass_object() { > 269: intptr_t* addr = single_slot_addr(); > 270: intptr_t value = *addr == 0 ? (intptr_t)nullptr : (intptr_t)addr; This looks like it should be using 0 (zero) not NULL/nullptr?
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 971: > 969: > 970: isb(); > 971: mov_metadata(rmethod, (Metadata*)nullptr); Shouldn't need cast any more. src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1642: > 1640: void MacroAssembler::null_check(Register reg, int offset) { > 1641: if (needs_explicit_null_check(offset)) { > 1642: // provoke OS null exception if reg = null by Suggest `reg is null` src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1648: > 1646: } else { > 1647: // nothing to do, (later) access of M[reg + offset] > 1648: // will provoke OS null exception if reg = null Suggest `reg is null` src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1424: > 1422: in_ByteSize(-1), > 1423: in_ByteSize(-1), > 1424: (OopMapSet*)nullptr); Shouldn't need cast any more ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/12321#pullrequestreview-1411961600 PR Review Comment: https://git.openjdk.org/jdk/pull/12321#discussion_r1184332635 PR Review Comment: https://git.openjdk.org/jdk/pull/12321#discussion_r1184334682 PR Review Comment: https://git.openjdk.org/jdk/pull/12321#discussion_r1184335467 PR Review Comment: https://git.openjdk.org/jdk/pull/12321#discussion_r1184336073 PR Review Comment: https://git.openjdk.org/jdk/pull/12321#discussion_r1184336226 PR Review Comment: https://git.openjdk.org/jdk/pull/12321#discussion_r1184338414 From ysr at openjdk.org Wed May 3 23:26:49 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 3 May 2023 23:26:49 GMT Subject: RFR: Remove passing tests from ProblemList.txt In-Reply-To: <_mVMIv03Z0947HMJS7M-KNqs-KhklMG780ACwlHhA18=.019bccd7-2ff4-4724-b289-fef444047d8a@github.com> References: <_mVMIv03Z0947HMJS7M-KNqs-KhklMG780ACwlHhA18=.019bccd7-2ff4-4724-b289-fef444047d8a@github.com> Message-ID: On Wed, 3 May 2023 18:20:42 GMT, William Kemper wrote: > Recent test runs suggest that all but three tests are now passing. 
The remaining failures are believed to be fixed by https://github.com/openjdk/shenandoah/pull/248. Exclude two of the tests that fail in github actions... test/hotspot/jtreg/ProblemList.txt line 93: > 91: gc/stress/systemgc/TestSystemGCWithShenandoah.java#generational 8306340 generic-all > 92: gc/stress/gclocker/TestGCLockerWithShenandoah.java#generational 8306341 generic-all > 93: gc/TestAllocHumongousFragment.java#generational 8306342 generic-all Since `TestAllocHumongousFragment` and `TestClassLoaderLeak` still fail in [some configs](https://github.com/earthling-amzn/shenandoah/actions/runs/4874732691), I'd leave them in the problem list until they are fixed, perhaps by reproducing on those platforms. At least one of the flagged failure modes for TestClassLoaderLeak (the failing assert) is likely fixed in Kelvin's upcoming PR, so we can revisit these again following that PR as well. ------------- Changes requested by ysr (Committer). PR Review: https://git.openjdk.org/shenandoah/pull/273#pullrequestreview-1412035739 PR Review Comment: https://git.openjdk.org/shenandoah/pull/273#discussion_r1184395631 From wkemper at openjdk.org Wed May 3 23:51:55 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 3 May 2023 23:51:55 GMT Subject: RFR: Remove passing tests from ProblemList.txt [v2] In-Reply-To: <_mVMIv03Z0947HMJS7M-KNqs-KhklMG780ACwlHhA18=.019bccd7-2ff4-4724-b289-fef444047d8a@github.com> References: <_mVMIv03Z0947HMJS7M-KNqs-KhklMG780ACwlHhA18=.019bccd7-2ff4-4724-b289-fef444047d8a@github.com> Message-ID: > Recent test runs suggest that all but three tests are now passing. The remaining failures are believed to be fixed by https://github.com/openjdk/shenandoah/pull/248. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Disable tests which failed in GHA ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/273/files - new: https://git.openjdk.org/shenandoah/pull/273/files/4c57a0da..98210777 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=273&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=273&range=00-01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/273.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/273/head:pull/273 PR: https://git.openjdk.org/shenandoah/pull/273 From kdnilsen at openjdk.org Thu May 4 01:00:31 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 4 May 2023 01:00:31 GMT Subject: RFR: DRAFT: Expand old on demand [v35] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > As currently drafted, there are regression failures. This DRAFT PR is published for the purpose of facilitating a careful code review. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Update global usage following full gc even when non-generational mode ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/8c96288b..e4c120ea Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=34 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=33-34 Stats: 3 lines in 1 file changed: 1 ins; 2 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From rkennke at openjdk.org Thu May 4 06:01:25 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 06:01:25 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v25] In-Reply-To: <7DNp1iiy51WJSWUuywlldghKwLETxFip4jlePA3ybrk=.b0f1243a-fa7b-4446-a6c3-3d7d2080ee3e@github.com> References: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> <7DNp1iiy51WJSWUuywlldghKwLETxFip4jlePA3ybrk=.b0f1243a-fa7b-4446-a6c3-3d7d2080ee3e@github.com> Message-ID: On Wed, 3 May 2023 21:29:20 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix type narrowing > > src/hotspot/share/gc/shared/gc_globals.hpp line 699: > >> 697: \ >> 698: product(bool, UseAltGCForwarding, false, EXPERIMENTAL, \ >> 699: "Use alternative GC forwarding that preserves object headers") \ > > I would strongly prefer if this were not a product flag at this time, but a develop flag. > > It potentially decreases performance of serial gc full gcs by a significant amount with no upside at all (not that worried about g1 or other concurrent gcs). Can you give me reasons why an end user would ever consciously enable this flag? 
> > Using a develop flag is only a minor annoyance for development - we already do that for other features like evacuation failure injection in G1. For end users this would result in (guaranteed) zero performance impact. > > Only when adding compressed object headers with Lilliput this should be changed to a product flag. > > I do not know your schedule for upstreaming Lilliput, but if it would miss JDK 21, people would suffer from this for the entire lifetime of JDK 21.... which is an LTS release. (Fwiw I would suggest the same for a non-LTS release, it seems to be worse in this situation though). Ok that is reasonable, I will do that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184571039 From rkennke at openjdk.org Thu May 4 06:01:27 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 06:01:27 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v25] In-Reply-To: References: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> <7DNp1iiy51WJSWUuywlldghKwLETxFip4jlePA3ybrk=.b0f1243a-fa7b-4446-a6c3-3d7d2080ee3e@github.com> Message-ID: On Wed, 3 May 2023 22:05:43 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 43: >> >>> 41: >>> 42: uint SlidingForwarding::region_index_containing(HeapWord* addr) { >>> 43: uint index = static_cast<uint>(pointer_delta(addr, _heap_start) >> _region_size_words_shift); >> >> I believe it is possible to bias the array pointer to avoid that subtraction of the `_heap_start` like we do e.g. for the card table. See also `G1BiasedArray` or so for a kind of ready-made class implementing this. >> >> Not sure it will help a lot, but at least remove the subtraction and the load of the `_heap_start` value. > > Maybe possible ;) I don't think so. The biasing in G1 GC (and Shenandoah GC) uses an array to look up per-region stuff (like cset property) without first calculating the actual region index.
Instead, it lets us simply shift an address and use that biased index to address the biased array. Here we don't have an array, we only want the index of the region that contains the address. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184571191 From rkennke at openjdk.org Thu May 4 06:05:25 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 06:05:25 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v25] In-Reply-To: <7DNp1iiy51WJSWUuywlldghKwLETxFip4jlePA3ybrk=.b0f1243a-fa7b-4446-a6c3-3d7d2080ee3e@github.com> References: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> <7DNp1iiy51WJSWUuywlldghKwLETxFip4jlePA3ybrk=.b0f1243a-fa7b-4446-a6c3-3d7d2080ee3e@github.com> Message-ID: On Wed, 3 May 2023 21:35:03 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix type narrowing > > src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 59: > >> 57: // Primary is free >> 58: _bases_table[base_idx] = to_region_base; >> 59: } else if (_bases_table[base_idx] == to_region_base) { > > This probably won't help at all with performance, but I would kind of put the checks for the common cases where the table values are set (particularly the first one) first (I may be wrong about whether this is possible). > The `UNUSED_BASE` values in the tables will be encountered exactly once... I believe we can safely swap the UNUSED check with the primary check. I'll do that.
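The reordering discussed above can be sketched roughly as follows. This is a standalone illustration, not the code in slidingForwarding.inline.hpp: `TargetTable`, `slot_for`, `NEEDS_FALLBACK` and the zero value chosen for `UNUSED_BASE` are assumptions made for the example. It tests the already-populated primary slot first, on the reasoning that an unused slot is seen only once per source region.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed sentinel for "slot not yet used"; real target bases must differ from it.
constexpr uintptr_t UNUSED_BASE = 0;
// Assumed return code meaning "neither slot fits, use the fallback table".
constexpr int NEEDS_FALLBACK = -1;

// Hypothetical stand-in for the per-region table: each source region
// remembers up to two target-region base addresses (primary, alternate).
class TargetTable {
public:
  explicit TargetTable(size_t num_regions)
    : _bases(num_regions * 2, UNUSED_BASE) {}

  // Returns 0 (primary) or 1 (alternate) for the slot holding
  // to_region_base, claiming a free slot if needed.
  int slot_for(size_t from_region, uintptr_t to_region_base) {
    uintptr_t& primary = _bases[from_region * 2];
    uintptr_t& alternate = _bases[from_region * 2 + 1];
    if (primary == to_region_base) return 0;   // common case first
    if (primary == UNUSED_BASE) {              // seen once per region
      primary = to_region_base;
      return 0;
    }
    if (alternate == to_region_base) return 1;
    if (alternate == UNUSED_BASE) {
      alternate = to_region_base;
      return 1;
    }
    return NEEDS_FALLBACK;
  }

private:
  std::vector<uintptr_t> _bases;
};
```

The swap is only safe as long as no real target base can equal the sentinel, which this sketch simply assumes.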
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184573418 From rkennke at openjdk.org Thu May 4 06:09:26 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 06:09:26 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v25] In-Reply-To: References: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> Message-ID: <3DQM2ay8VGdLVhxa2iCqOOS3KX3AvXyoq_w3t228Sm0=.9385d002-3124-40ac-bddf-5340015dfed5@github.com> On Wed, 3 May 2023 22:04:08 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix type narrowing > > src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 82: > >> 80: >> 81: uintptr_t encoded = (offset << OFFSET_BITS_SHIFT) | >> 82: (alt_region << ALT_REGION_SHIFT) | > > While I understand that a `bool` is typically encoded as either `0` or `1` (not sure if it's actually specified somewhere) it would likely make the code cleaner to use a real integer of some type here to me. > > Also, the shift could be inlined in the assignments above. > Like setting `alt_region` to either `0 (<< ALT_REGION_SHIFT)` or `1 << ALT_REGION_SHIFT` directly in the code. > This is obviously a nano-optimization that probably won't show up anywhere... Oh yes, I'll change it to an integral value. I don't see how moving the shift to the assignment would help, and I'd prefer to keep it in the place where we encode the value, I think that is more readable/less confusing. 
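As a side note on the `bool` question: C++ does specify that converting `true` to an integer type yields 1 and `false` yields 0, so the original code was well defined; an integral `alt_region` mainly makes the intent clearer. A minimal sketch of the encoding with an integral selector is below. The names `OFFSET_BITS_SHIFT` and `ALT_REGION_SHIFT` appear in the quoted code, but their values here (4 and 3) are inferred from the bit layout in the PR description, and everything else (`FORWARDED_MARK`, `FALLBACK_SHIFT`, `NUM_OFFSET_BITS`, the function names) is hypothetical.

```cpp
#include <cstdint>

constexpr uintptr_t FORWARDED_MARK = 0x3;  // bits 0..1 set to '11'
constexpr int FALLBACK_SHIFT    = 2;       // bit 2: fallback forwarding
constexpr int ALT_REGION_SHIFT  = 3;       // bit 3: target region 0 or 1
constexpr int OFFSET_BITS_SHIFT = 4;       // bits 4..31: offset in heap words
constexpr int NUM_OFFSET_BITS   = 28;

// alt_region is an integer that must be 0 or 1, not a bool.
inline uintptr_t encode_forwarding(uintptr_t offset_words, uintptr_t alt_region) {
  return (offset_words << OFFSET_BITS_SHIFT) |
         (alt_region << ALT_REGION_SHIFT) |
         FORWARDED_MARK;
}

inline uintptr_t decode_alt_region(uintptr_t encoded) {
  return (encoded >> ALT_REGION_SHIFT) & 1;
}

inline uintptr_t decode_offset(uintptr_t encoded) {
  return (encoded >> OFFSET_BITS_SHIFT) &
         ((uintptr_t(1) << NUM_OFFSET_BITS) - 1);
}

inline bool is_fallback(uintptr_t encoded) {
  return ((encoded >> FALLBACK_SHIFT) & 1) != 0;
}
```

Keeping both shifts in the encode function, as Roman prefers, puts the whole bit layout in one place.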
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184575641 From rkennke at openjdk.org Thu May 4 06:30:01 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 06:30:01 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v26] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. 
Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). 
> > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. > > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Switch back to size_t for some fields - Address @tschatzl's review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/5ee17597..2762f1b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=24-25 Stats: 40 lines in 4 files changed: 4 ins; 4 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From rkennke at openjdk.org Thu May 4 07:04:19 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 07:04:19 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v27] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. 
Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. 
> > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). > > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. 
> > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix release build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/2762f1b1..0cc732ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=25-26 Stats: 4 lines in 2 files changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From shade at openjdk.org Thu May 4 08:30:05 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 May 2023 08:30:05 GMT Subject: RFR: Remove passing tests from ProblemList.txt [v2] In-Reply-To: References: <_mVMIv03Z0947HMJS7M-KNqs-KhklMG780ACwlHhA18=.019bccd7-2ff4-4724-b289-fef444047d8a@github.com> Message-ID: On Wed, 3 May 2023 23:51:55 GMT, William Kemper wrote: >> Recent test runs suggest that all but three tests are now passing. The remaining failures are believed to be fixed by https://github.com/openjdk/shenandoah/pull/248. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Disable tests which failed in GHA Marked as reviewed by shade (Committer). 
------------- PR Review: https://git.openjdk.org/shenandoah/pull/273#pullrequestreview-1412569051 From eosterlund at openjdk.org Thu May 4 09:37:28 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 4 May 2023 09:37:28 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v12] In-Reply-To: References: <7wh9oQNWTga_roKBanu2246zZ53TTL3HyoeAdnVQYpk=.f69973bf-821c-47dc-a8df-f4b47af1c9a6@github.com> <3H0IufrunNdAbYOGOTTp9oLepzSV7so-xIBrWUXoFkU=.980fe927-a804-48be-9c5a-aa7de4b8142e@github.com> Message-ID: On Fri, 28 Apr 2023 17:52:33 GMT, Erik Österlund wrote: >> It seems to be used in a couple of places already: >> >> grep -R ff51afd7ed558ccd src >> src/jdk.jfr/share/classes/jdk/jfr/internal/EventWriterKey.java: z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL; >> src/java.base/share/classes/java/util/SplittableRandom.java: z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL; // MurmurHash3 mix constants >> src/java.base/share/classes/java/util/concurrent/ThreadLocalRandom.java: z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL; >> src/java.base/share/classes/jdk/internal/util/random/RandomSupport.java: z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL; >> src/hotspot/share/gc/shared/slidingForwarding.cpp: val *= 0xff51afd7ed558ccdULL; >> src/hotspot/share/gc/shenandoah/shenandoahEvacOOMHandler.cpp: key *= UINT64_C(0xff51afd7ed558ccd); > > Sounds good then. If you wouldn't mind, I think this one is even better: https://github.com/iwanowww/jdk/blob/ssc.cascading/src/hotspot/share/oops/klass.cpp?plain=1#L309 @iwanowww is using it for faster type checking. Our very own @rose00 is behind this one, so we can definitely use it. It performs very well (~1ns per hash). The hash algo passes BigCrush (as a CBPNRG) and SMHasher (with the right loop to combine the input blocks). It's basically a fantastic hash function, that we are free to use.
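For context, the constant in the grep output above is the first multiplier of the 64-bit finalizer ("fmix64") from MurmurHash3, as the comment in SplittableRandom.java hints. A sketch of the complete finalizer follows. The function name is made up for illustration, and this is not necessarily how slidingForwarding.cpp used the constant before switching to the other hash.

```cpp
#include <cstdint>

// MurmurHash3 64-bit finalizer: alternating xor-shift and multiply steps
// that spread every input bit across the whole 64-bit result.
inline uint64_t murmur3_fmix64(uint64_t k) {
  k ^= k >> 33;
  k *= UINT64_C(0xff51afd7ed558ccd);
  k ^= k >> 33;
  k *= UINT64_C(0xc4ceb9fe1a85ec53);
  k ^= k >> 33;
  return k;
}
```

Each step is invertible, so the function is a bijection on 64-bit values: distinct keys never collide before the result is reduced to a table index.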
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184778234 From rkennke at openjdk.org Thu May 4 10:53:26 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 10:53:26 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v12] In-Reply-To: References: <7wh9oQNWTga_roKBanu2246zZ53TTL3HyoeAdnVQYpk=.f69973bf-821c-47dc-a8df-f4b47af1c9a6@github.com> <3H0IufrunNdAbYOGOTTp9oLepzSV7so-xIBrWUXoFkU=.980fe927-a804-48be-9c5a-aa7de4b8142e@github.com> Message-ID: <4-Ztc15xMXxNX5tiyMT-9YokDi65JlIuVX1TFuC__-s=.64094113-c745-4376-8a88-ab75d5cf0beb@github.com> On Thu, 4 May 2023 09:34:27 GMT, Erik Österlund wrote: >> Sounds good then. > > If you wouldn't mind, I think this one is even better: https://github.com/iwanowww/jdk/blob/ssc.cascading/src/hotspot/share/oops/klass.cpp?plain=1#L309 > @iwanowww is using it for faster type checking. Our very own @rose00 is behind this one, so we can definitely use it. It performs very well (~1ns per hash). The hash algo passes BigCrush (as a CBPNRG) and SMhasher (with the right loop to combine the input blocks). It's basically a fantastic hash function, that we are free to use. Ok that is great stuff! Maybe it'd be useful to move it to a central place with this PR (alternative full GC fwding), because we're going to need it for other purposes, too (other GC tables, i-hash, faster type checking, maybe more?)
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184854210 From eosterlund at openjdk.org Thu May 4 11:00:29 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 4 May 2023 11:00:29 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v12] In-Reply-To: <4-Ztc15xMXxNX5tiyMT-9YokDi65JlIuVX1TFuC__-s=.64094113-c745-4376-8a88-ab75d5cf0beb@github.com> References: <7wh9oQNWTga_roKBanu2246zZ53TTL3HyoeAdnVQYpk=.f69973bf-821c-47dc-a8df-f4b47af1c9a6@github.com> <3H0IufrunNdAbYOGOTTp9oLepzSV7so-xIBrWUXoFkU=.980fe927-a804-48be-9c5a-aa7de4b8142e@github.com> <4-Ztc15xMXxNX5tiyMT-9YokDi65JlIuVX1TFuC__-s=.64094113-c745-4376-8a88-ab75d5cf0beb@github.com> Message-ID: <0DF0LOv43PBl8SDn7spkqCULAqEEwOjjgBZS4ewHF7U=.6d38770e-1849-42db-9274-2adaae264a80@github.com> On Thu, 4 May 2023 10:50:17 GMT, Roman Kennke wrote: >> If you wouldn't mind, I think this one is even better: https://github.com/iwanowww/jdk/blob/ssc.cascading/src/hotspot/share/oops/klass.cpp?plain=1#L309 >> @iwanowww is using it for faster type checking. Our very own @rose00 is behind this one, so we can definitely use it. It performs very well (~1ns per hash). The hash algo passes BigCrush (as a CBPNRG) and SMhasher (with the right loop to combine the input blocks). It's basically a fantastic hash function, that we are free to use. > > Ok that is great stuff! Maybe it'd be useful to move it to a central place with this PR (alternative full GC fwding), because we're going to need it for other purposes, too (other GC tables, i-hash, faster type checking, maybe more?) > > Oh and that code is using __int128 type, how/where do I get that outside of GCC? Yes - great idea. Maybe somewhere in utilities. We might swap to it with ZGC as well when things settle down there.
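On the `__int128` question: one common way to get the upper half of a 64x64-bit product without a 128-bit type is to split both operands into 32-bit halves and sum the partial products. The sketch below shows that technique only; `mul_hi64` is a hypothetical name, and this is not claimed to be the code that was later pushed to the PR.

```cpp
#include <cstdint>

// Returns the upper 64 bits of the 128-bit product a * b,
// using only 64-bit arithmetic.
inline uint64_t mul_hi64(uint64_t a, uint64_t b) {
  const uint64_t mask = UINT64_C(0xFFFFFFFF);
  uint64_t a_lo = a & mask, a_hi = a >> 32;
  uint64_t b_lo = b & mask, b_hi = b >> 32;

  uint64_t lo_lo = a_lo * b_lo;   // contributes to bits 0..63
  uint64_t hi_lo = a_hi * b_lo;   // contributes to bits 32..95
  uint64_t lo_hi = a_lo * b_hi;   // contributes to bits 32..95
  uint64_t hi_hi = a_hi * b_hi;   // contributes to bits 64..127

  // Sum the terms that land in bits 32..95; this cannot overflow 64 bits
  // because the largest possible value is exactly 2^64 - 1.
  uint64_t cross = (lo_lo >> 32) + (hi_lo & mask) + lo_hi;

  return hi_hi + (hi_lo >> 32) + (cross >> 32);
}
```

The lower 64 bits need no helper, since unsigned `a * b` already wraps to them.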
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184860529 From rkennke at openjdk.org Thu May 4 11:27:35 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 11:27:35 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v28] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. 
Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). 
> > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. > > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Use @rose00's fast-hash impl instead of murmur ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/0cc732ed..ad9fb171 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=26-27 Stats: 106 lines in 2 files changed: 93 ins; 12 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From rkennke at openjdk.org Thu May 4 11:27:36 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 11:27:36 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v12] In-Reply-To: <0DF0LOv43PBl8SDn7spkqCULAqEEwOjjgBZS4ewHF7U=.6d38770e-1849-42db-9274-2adaae264a80@github.com> References: <7wh9oQNWTga_roKBanu2246zZ53TTL3HyoeAdnVQYpk=.f69973bf-821c-47dc-a8df-f4b47af1c9a6@github.com> <3H0IufrunNdAbYOGOTTp9oLepzSV7so-xIBrWUXoFkU=.980fe927-a804-48be-9c5a-aa7de4b8142e@github.com> <4-Ztc15xMXxNX5tiyMT-9YokDi65JlIuVX1TFuC__-s=.64094113-c745-4376-8a88-ab75d5cf0beb@github.com> <0DF0LOv43PBl8SDn7spkqCULAqEEwOjjgBZ S4ewHF7U=.6d38770e-1849-42db-9274-2adaae264a80@github.com> Message-ID: On Thu, 4 May 
2023 10:57:07 GMT, Erik Österlund wrote:

>> Ok that is great stuff! Maybe it'd be useful to move it to a central place with this PR (alternative full GC fwding), because we're going to need it for other purposes, too (other GC tables, i-hash, faster type checking, maybe more?)
>>
>> Oh and that code is using __int128 type, how/where do I get that outside of GCC?
>
> Yes - great idea. Maybe somewhere in utilities. We might swap to it with ZGC as well when things settle down there.

Ok, I pushed a change that uses @rose00's better hashing. I added/changed the 128-bit multiplication to (hopefully) make it portable. Let's see what GHA has to say about this ;-)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184882541

From rkennke at openjdk.org Thu May 4 11:40:14 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 4 May 2023 11:40:14 GMT
Subject: RFR: 8307395: Add missing STS to Shenandoah
Message-ID:

Testing in project Lilliput has revealed that Shenandoah GC is lacking one STS. This causes a reliable crash (with Lilliput) when running TestGCBasherWithShenandoah.java with -XX:+UseHeavyMonitors because it touches an already deflated monitor.
Testing (all in Lilliput where it caused the troubles, but applies to upstream as well): - [x] TestGCBasherWithShenandoah.java +UseHeavyMonitors - [x] hotspot_gc_shenandoah +UseHeavyMonitors ------------- Commit messages: - 8307395: Add missing STS to Shenandoah Changes: https://git.openjdk.org/jdk/pull/13799/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13799&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307395 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13799.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13799/head:pull/13799 PR: https://git.openjdk.org/jdk/pull/13799 From rkennke at openjdk.org Thu May 4 11:56:22 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 11:56:22 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v29] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. 
>
> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region.
>
> With this, forwarding information would be encoded like this:
> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded.
> - Bit 2: Used for 'fallback'-forwarding (see below)
> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above)
> - Bits 4..31 The number of heap words from the target base address
>
> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table.
>
> All the table accesses can be done unsynchronized because:
> - Serial GC is single-threaded anyway
> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions.
> - G1 serial compaction is single-threaded
>
> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed.
Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers).
>
> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
>
> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header.
>
> Testing:
> - [x] hotspot_gc -UseAltGCForwarding
> - [x] hotspot_gc +UseAltGCForwarding
> - [x] tier1 -UseAltGCForwarding
> - [x] tier1 +UseAltGCForwarding
> - [x] tier2 -UseAltGCForwarding
> - [x] tier2 +UseAltGCForwarding

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Add usual header include guards

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13582/files
  - new: https://git.openjdk.org/jdk/pull/13582/files/ad9fb171..0f3604aa

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=28
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=27-28

  Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/13582.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582

PR: https://git.openjdk.org/jdk/pull/13582

From jsjolen at openjdk.org Thu May 4 12:14:05 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Thu, 4 May 2023 12:14:05 GMT
Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v4]
In-Reply-To:
References:
Message-ID:

> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory
cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we
> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>
> Here are some typical things to look out for:
>
> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>
> An example of this:
>
> ```c++
> // This function returns null
> void* ret_null();
> // This function returns true if *x == nullptr
> bool is_nullptr(void** x);
> ```
>
> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>
> Thanks!

Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  dholmes' fixes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/12321/files
  - new: https://git.openjdk.org/jdk/pull/12321/files/5cc7a5a7..31b2ed11

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12321&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12321&range=02-03

  Stats: 6 lines in 4 files changed: 0 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/12321.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/12321/head:pull/12321

PR: https://git.openjdk.org/jdk/pull/12321

From shade at openjdk.org Thu May 4 12:24:15 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 4 May 2023 12:24:15 GMT
Subject: RFR: 8307395: Add missing STS to Shenandoah
In-Reply-To:
References:
Message-ID: <9NYMLTyK8H2Ruo-8pZ8UtOR0m_tjv7rXucsYlYIhFUs=.3495923c-6f52-42e1-90b9-9b8930f111d3@github.com>

On Thu, 4 May 2023 11:34:15 GMT, Roman Kennke wrote:

> Testing in project
Lilliput has revealed that Shenandoah GC is lacking one STS. This causes a reliable crash (with Lilliput) when running TestGCBasherWithShenandoah.java with -XX:+UseHeavyMonitors because it touches an already deflated monitor. > > Testing (all in Lilliput where it caused the troubles, but applies to upstream as well): > - [x] TestGCBasherWithShenandoah.java +UseHeavyMonitors > - [x] hotspot_gc_shenandoah +UseHeavyMonitors Looks fine, provided testing is clean. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13799#pullrequestreview-1412969497 From stefank at openjdk.org Thu May 4 13:51:16 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 4 May 2023 13:51:16 GMT Subject: RFR: 8307378: Allow collectors to provide specific values for GC notifications' actions In-Reply-To: References: Message-ID: On Wed, 3 May 2023 18:17:20 GMT, William Kemper wrote: > At the end of a GC pause, a `GarbageCollectionNotificationInfo` may be emitted. The notification has a `gcAction` field which presently originates from the field `_gc_end_message` in `GCMemoryManager`. Concurrent collectors such as Shenandoah, ZGC and G1 may have more (brief) pauses in their cycle than they have memory managers. This makes it difficult for gc notification listeners to determine the phase of the cycle that emitted the notification. We are proposing a change to allow collectors to define specific values for the `gcAction` to make it easier for notification listeners to classify the gc phase responsible for the notification. The proposed patch introduces two ways to inject these messages, which makes the code slightly harder to follow. I wonder if it wouldn't be easier if we just had one, a bit more flexible, way to provide the message. 
What do you think about something like this: https://github.com/stefank/jdk/commit/52e9fe84c2bc14b21824068d71419d0e1f0796c1 ------------- PR Comment: https://git.openjdk.org/jdk/pull/13785#issuecomment-1534816352 From rkennke at openjdk.org Thu May 4 13:56:07 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 13:56:07 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v30] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. 
Look up the base address in the table (see above)
> - Bits 4..31 The number of heap words from the target base address
>
> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table.
>
> All the table accesses can be done unsynchronized because:
> - Serial GC is single-threaded anyway
> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions.
> - G1 serial compaction is single-threaded
>
> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers).
>
> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
> > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. > > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Clamp home index. Duh. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/0f3604aa..c3b9ae9e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=28-29 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From stuefe at openjdk.org Thu May 4 13:56:09 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 4 May 2023 13:56:09 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v30] In-Reply-To: References: Message-ID: <7Sc3jztrzRq8rXZh4l3V4LdyBnR33qvKh8OZCTBQZbY=.9120353e-0db0-4135-81ea-30732ced0fb0@github.com> On Thu, 4 May 2023 13:51:53 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. 
Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. 
When that happens, we initialize a fallback table, which is a simple open hash-table, and set Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table.
>>
>> All the table accesses can be done unsynchronized because:
>> - Serial GC is single-threaded anyway
>> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions.
>> - G1 serial compaction is single-threaded
>>
>> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers).
>>
>> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
>>
>> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header.
>>
>> Testing:
>> - [x] hotspot_gc -UseAltGCForwarding
>> - [x] hotspot_gc +UseAltGCForwarding
>> - [x] tier1 -UseAltGCForwarding
>> - [x] tier1 +UseAltGCForwarding
>> - [x] tier2 -UseAltGCForwarding
>> - [x] tier2 +UseAltGCForwarding
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Clamp home index. Duh.

LGTM. I ok this now; my remaining comments are suggestions - up to you to take them or not. I removed some of my obsolete comments to clear the space.

Tests are missing. A simple way would be to run a selection of our standard GC tests with +UseAltGCForwarding. This is especially important if you follow Thomas' advice and make UseAltGCForwarding a develop switch.

The only other thing that occurred to me is that you could probably change initialization: don't require caller to specify it but calculate it yourself such that the 28-bit offset is maximally used. That would save some memory since the bases table can be smaller. Again, up to you.

src/hotspot/share/gc/shared/slidingForwarding.cpp line 127:

> 125: size_t FallbackTable::home_index(HeapWord* from) {
> 126:   uint64_t val = reinterpret_cast<uint64_t>(from);
> 127:   uint64_t hash = FastHash::get_hash64(val, 0xAAAAAAAAAAAAAAAA);

Use UCONST64(0xAAA..AA) ?

-------------

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1413025879
PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1185044658

From stuefe at openjdk.org Thu May 4 13:56:19 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 4 May 2023 13:56:19 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v29]
In-Reply-To:
References:
Message-ID:

On Thu, 4 May 2023 11:56:22 GMT, Roman Kennke wrote:

>> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location.
Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. 
When that happens, we initialize a fallback table, which is a simple open hash-table, and set Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table.
>>
>> All the table accesses can be done unsynchronized because:
>> - Serial GC is single-threaded anyway
>> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions.
>> - G1 serial compaction is single-threaded
>>
>> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers).
>>
>> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
>>
>> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header.
>>
>> Testing:
>> - [x] hotspot_gc -UseAltGCForwarding
>> - [x] hotspot_gc +UseAltGCForwarding
>> - [x] tier1 -UseAltGCForwarding
>> - [x] tier1 +UseAltGCForwarding
>> - [x] tier2 -UseAltGCForwarding
>> - [x] tier2 +UseAltGCForwarding
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add usual header include guards

src/hotspot/share/gc/shared/slidingForwarding.cpp line 35:

> 33: // We cannot use 0, because that may already be a valid base address in zero-based heaps.
> 34: // 0x1 is safe because heap base addresses must be aligned by much larger alignment
> 35: HeapWord* const SlidingForwarding::UNUSED_BASE = reinterpret_cast<HeapWord*>(0x1);

I am trying to understand under which circumstances a zero heap location would be okay. This is *uncompressed* oops, right? If that were 0, you could just hardcode constexpr 0 in the header.

src/hotspot/share/gc/shared/slidingForwarding.cpp line 111:

> 109:     _table[i]._from = nullptr;
> 110:     _table[i]._to = nullptr;
> 111:   }

It would be enough to set _from to nullptr.

src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 60:

> 58:   } else if (_bases_table[base_idx] == UNUSED_BASE) {
> 59:     // Primary is free
> 60:     _bases_table[base_idx] = to_region_base;

Since the else branch is probably much more common, would it make sense to swap the conditions? Same below.

src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 87:

> 85:   assert(to == decode_forwarding(from, encoded), "must be reversible");
> 86:   return encoded;
> 87: }

Since encoding should produce a 32-bit value, why not return a 32-bit value? Same below, for decoding. Or, at least assert that returned value has no higher bits set.
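
A minimal sketch of the bit layout from the proposal, combined with the suggestion above to return a 32-bit value and check that no higher bits are set. This is not the code under review: `TargetBases`, `encode_forwarding` and `decode_forwarding` as written here are hypothetical stand-ins that use plain integers (addresses counted in heap words) instead of `HeapWord*`, and the fallback path (bit 2) is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in: the two candidate target-region start addresses
// (counted in heap words) recorded for one source region.
struct TargetBases {
  uint64_t base[2];
};

constexpr uint64_t FORWARDED_BITS = 0x3;       // bits 0..1: '11' marks "forwarded"
constexpr uint64_t FALLBACK_BIT   = 1u << 2;   // bit 2: forwardee lives in the fallback table
constexpr unsigned SELECT_SHIFT   = 3;         // bit 3: selects target region 0 or 1
constexpr unsigned OFFSET_SHIFT   = 4;         // bits 4..31: offset in heap words
constexpr uint64_t MAX_OFFSET     = (uint64_t(1) << 28) - 1;

// Encodes the target address 'to' relative to bases.base[select].
uint32_t encode_forwarding(const TargetBases& bases, unsigned select, uint64_t to) {
  assert(select < 2 && "only two target regions per source region");
  assert(to >= bases.base[select] && "target must not precede its region base");
  uint64_t offset = to - bases.base[select];
  assert(offset <= MAX_OFFSET && "offset must fit into 28 bits");
  uint64_t wide = (offset << OFFSET_SHIFT)
                | (uint64_t(select) << SELECT_SHIFT)
                | FORWARDED_BITS;
  assert((wide >> 32) == 0 && "encoding must fit into the low 32 bits");
  return static_cast<uint32_t>(wide);
}

// Reverses encode_forwarding for non-fallback entries.
uint64_t decode_forwarding(const TargetBases& bases, uint32_t encoded) {
  assert((encoded & FORWARDED_BITS) == FORWARDED_BITS && "must be marked as forwarded");
  assert((encoded & FALLBACK_BIT) == 0 && "fallback entries are not handled in this sketch");
  unsigned select = (encoded >> SELECT_SHIFT) & 1u;
  uint64_t offset = encoded >> OFFSET_SHIFT;
  return bases.base[select] + offset;
}
```

With bases {1000, 5000}, forwarding to word 5042 through slot 1 stores offset 42, so the encoded value is `(42 << 4) | (1 << 3) | 3`, and decoding it yields 5042 again.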
src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 90: > 88: > 89: HeapWord* SlidingForwarding::decode_forwarding(HeapWord* from, uintptr_t encoded) { > 90: assert((encoded & markWord::marked_value) == markWord::marked_value, "must be marked as forwarded"); s/marked_value/lock_mask ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184986345 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184989465 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184972889 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184976288 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1184980683 From shade at openjdk.org Thu May 4 14:05:30 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 May 2023 14:05:30 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v30] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 13:56:07 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. 
This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions.
>>
>> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region.
>>
>> With this, forwarding information would be encoded like this:
>> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded.
>> - Bit 2: Used for 'fallback'-forwarding (see below)
>> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above)
>> - Bits 4..31 The number of heap words from the target base address
>>
>> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table.
>>
>> All the table accesses can be done unsynchronized because:
>> - Serial GC is single-threaded anyway
>> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions.
>> - G1 serial compaction is single-threaded
>>
>> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes.
Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Clamp home index. Duh. Another round... 
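
The per-region table from the quoted proposal (two possible target bases per source region, with a fallback when neither fits) can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: `BasesTable` and `claim_slot` are hypothetical names, addresses are plain integers, and only the `UNUSED_BASE` sentinel value of 0x1 is taken from the review snippet quoted earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sentinel for "slot not used yet". The value 0x1 follows the review snippet:
// real region bases are assumed to be aligned, so they can never equal 1.
constexpr uint64_t UNUSED_BASE = 0x1;

// Hypothetical stand-in for the per-region table: two slots per source region.
class BasesTable {
  std::vector<uint64_t> _bases;

public:
  explicit BasesTable(size_t num_regions) : _bases(2 * num_regions, UNUSED_BASE) {}

  // Returns the slot (0 or 1) that holds to_region_base for from_region,
  // claiming a free slot if needed. Returns -1 when both slots already hold
  // other bases, which is the case that would need the fallback table.
  int claim_slot(size_t from_region, uint64_t to_region_base) {
    assert(2 * from_region + 1 < _bases.size() && "region index out of range");
    for (int slot = 0; slot < 2; slot++) {
      uint64_t& entry = _bases[2 * from_region + slot];
      if (entry == to_region_base) {
        return slot;              // this target base is already recorded
      }
      if (entry == UNUSED_BASE) {
        entry = to_region_base;   // slot is free: claim it
        return slot;
      }
    }
    return -1;
  }

  // Reads back the base stored in the given slot of from_region.
  uint64_t base(size_t from_region, int slot) const {
    assert(slot == 0 || slot == 1);
    return _bases[2 * from_region + slot];
  }
};
```

The returned slot number is what would go into bit 3 of the encoding, while a return value of -1 corresponds to setting bit 2 and consulting the fallback hash table instead.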
-------------

PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1409678649

From shade at openjdk.org Thu May 4 14:05:36 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 4 May 2023 14:05:36 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v20]
In-Reply-To: <3RFs-ck6jkNa-9dxLYQi00uC-f5K995W8d29V5swQpM=.f3bf90e8-1c09-436a-b414-760b76dac9ed@github.com>
References: <3RFs-ck6jkNa-9dxLYQi00uC-f5K995W8d29V5swQpM=.f3bf90e8-1c09-436a-b414-760b76dac9ed@github.com>
Message-ID:

On Wed, 3 May 2023 10:54:43 GMT, Roman Kennke wrote:

>> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large.
>>
>> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header.
>>
>> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions.
>>
>> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region.
>>
>> With this, forwarding information would be encoded like this:
>> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded.
>> - Bit 2: Used for 'fallback'-forwarding (see below)
>> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above)
>> - Bits 4..31: The number of heap words from the target base address
>>
>> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table.
>>
>> All the table accesses can be done unsynchronized because:
>> - Serial GC is single-threaded anyway
>> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions.
>> - G1 serial compaction is single-threaded
>>
>> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers).
>>
>> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
>>
>> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header.
>>
>> Testing:
>> - [x] hotspot_gc -UseAltGCForwarding
>> - [x] hotspot_gc +UseAltGCForwarding
>> - [x] tier1 -UseAltGCForwarding
>> - [x] tier1 +UseAltGCForwarding
>> - [x] tier2 -UseAltGCForwarding
>> - [x] tier2 +UseAltGCForwarding
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> More Thomas' comments

src/hotspot/share/gc/g1/g1FullGCOopClosures.inline.hpp line 35:

> 33: #include "gc/g1/g1FullGCMarker.inline.hpp"
> 34: #include "gc/g1/heapRegionRemSet.inline.hpp"
> 35: #include "gc/shared/gcForwarding.inline.hpp"

Copyright years need to be updated.

src/hotspot/share/gc/serial/markSweep.inline.hpp line 33:

> 31: #include "classfile/javaClasses.inline.hpp"
> 32: #include "gc/shared/continuationGCSupport.inline.hpp"
> 33: #include "gc/shared/gcForwarding.inline.hpp"

Copyright years need to be updated.

src/hotspot/share/gc/shared/gc_globals.hpp line 698:

> 696: constraint(GCCardSizeInBytesConstraintFunc,AtParse) \
> 697: \
> 698: product(bool, UseAltGCForwarding, false, EXPERIMENTAL, \

See if copyright years need to be updated.

src/hotspot/share/gc/shared/preservedMarks.cpp line 26:

> 24:
> 25: #include "precompiled.hpp"
> 26: #include "gc/shared/gcForwarding.inline.hpp"

Copyright years need to be updated.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1183540583
PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1183540894
PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1183542696
PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1183542195

From shade at openjdk.org Thu May 4 14:05:39 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 4 May 2023 14:05:39 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v29]
In-Reply-To:
References:
Message-ID:

On Thu, 4 May 2023 11:56:22 GMT, Roman Kennke wrote:

> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Add usual header include guards

src/hotspot/share/gc/shared/gc_globals.hpp line 698:

> 696: constraint(GCCardSizeInBytesConstraintFunc,AtParse) \
> 697: \
> 698: develop(bool, UseAltGCForwarding, false, \

I don't think you can opt-in into `true` in release bits, if this flag is `develop` when the rest of Lilliput arrives. In release bits, all checks involving this flag would fold with `false`. Maybe that's the intent here, as it keeps the release performance at the baseline level, but this makes performance overhead estimations for this PR a bit hard :) [I'll hack the flag back to experimental for tests]

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1185043173

From shade at openjdk.org Thu May 4 14:05:41 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 4 May 2023 14:05:41 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v19]
In-Reply-To:
References:
Message-ID:

On Tue, 2 May 2023 18:21:18 GMT, Roman Kennke wrote:

> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Initialize 'heap' elements in test case

src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 43:

> 41: bool SlidingForwarding::region_contains(HeapWord* region_base, HeapWord* addr) const {
> 42: return (region_base <= addr) && (addr < (region_base + _region_size_words));
> 43: }

Now unused!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1182911404

From rkennke at openjdk.org Thu May 4 14:34:27 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 4 May 2023 14:34:27 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v30]
In-Reply-To: <7Sc3jztrzRq8rXZh4l3V4LdyBnR33qvKh8OZCTBQZbY=.9120353e-0db0-4135-81ea-30732ced0fb0@github.com>
References: <7Sc3jztrzRq8rXZh4l3V4LdyBnR33qvKh8OZCTBQZbY=.9120353e-0db0-4135-81ea-30732ced0fb0@github.com>
Message-ID:

On Thu, 4 May 2023 13:50:36 GMT, Thomas Stuefe wrote:

> LGTM. I ok this now; my remaining comments are suggestions - up to you to take them or not.

Thanks!

> Tests are missing. A simple way would be to run a selection of our standard GC tests with +AltGCForwarding. This is especially important if you follow Thomas' advice and make AltGCForwarding a develop switch.

I run hotspot_gc with UseAltGCForwarding turned on. Not sure if there is an easy way to make this a test task. I could perhaps add a few run configurations to tests that are useful. For example, gc/stress/TestMultiThreadStressRSet.java tended to exercise both the sliding-forwarding and the fallback-forwarding
But would the develop-only switch not complicate this? Because it means we could only run such tests in debug builds.

> The only other thing that occurred to me is that you could probably change initialization: don't require caller to specify it but calculate it yourself such that the 28 bit offset is maximally used. That would save some memory since the bases table can be smaller. Again, up to you.

Yeah maybe. SpaceAlignment should be set by all GCs to a reasonable region-size, we could probably just pick that up. OTOH, we need a little bit of cooperation from the GC here: The whole sliding-forwarding algo relies on the fact that GC workers divide up their work based on their regions, and are essentially single-threaded within their work queues. I'm a bit worried about touching this stuff at this point, and cause another round of reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1534889536

From wkemper at openjdk.org Thu May 4 14:52:15 2023
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 4 May 2023 14:52:15 GMT
Subject: RFR: 8307378: Allow collectors to provide specific values for GC notifications' actions
In-Reply-To:
References:
Message-ID:

On Thu, 4 May 2023 13:48:30 GMT, Stefan Karlsson wrote:

>> At the end of a GC pause, a `GarbageCollectionNotificationInfo` may be emitted. The notification has a `gcAction` field which presently originates from the field `_gc_end_message` in `GCMemoryManager`. Concurrent collectors such as Shenandoah, ZGC and G1 may have more (brief) pauses in their cycle than they have memory managers. This makes it difficult for gc notification listeners to determine the phase of the cycle that emitted the notification. We are proposing a change to allow collectors to define specific values for the `gcAction` to make it easier for notification listeners to classify the gc phase responsible for the notification.
>
> The proposed patch introduces two ways to inject these messages, which makes the code slightly harder to follow. I wonder if it wouldn't be easier if we just had one, a bit more flexible, way to provide the message. What do you think about something like this:
> https://github.com/stefank/jdk/commit/52e9fe84c2bc14b21824068d71419d0e1f0796c1

@stefank - Unifying the code path looks great to me. I was just trying to minimize the changes. Should I cherry pick your commit into the branch for the PR?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13785#issuecomment-1534919498

From stefank at openjdk.org Thu May 4 14:59:15 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 4 May 2023 14:59:15 GMT
Subject: RFR: 8307378: Allow collectors to provide specific values for GC notifications' actions
In-Reply-To:
References:
Message-ID:

On Wed, 3 May 2023 18:17:20 GMT, William Kemper wrote:

> At the end of a GC pause, a `GarbageCollectionNotificationInfo` may be emitted. The notification has a `gcAction` field which presently originates from the field `_gc_end_message` in `GCMemoryManager`. Concurrent collectors such as Shenandoah, ZGC and G1 may have more (brief) pauses in their cycle than they have memory managers. This makes it difficult for gc notification listeners to determine the phase of the cycle that emitted the notification. We are proposing a change to allow collectors to define specific values for the `gcAction` to make it easier for notification listeners to classify the gc phase responsible for the notification.

Yes, that sounds like good plan.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13785#issuecomment-1534930015

From kdnilsen at openjdk.org Thu May 4 16:32:15 2023
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Thu, 4 May 2023 16:32:15 GMT
Subject: RFR: DRAFT: Expand old on demand [v36]
In-Reply-To:
References:
Message-ID:

> This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway.
>
> As currently drafted, there are regression failures. This DRAFT PR is published for the purpose of facilitating a careful code review.

Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:

  Enable more tests on this branch

-------------

Changes:
  - all: https://git.openjdk.org/shenandoah/pull/248/files
  - new: https://git.openjdk.org/shenandoah/pull/248/files/e4c120ea..cda0e412

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=35
 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=34-35

  Stats: 6 lines in 1 file changed: 1 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/shenandoah/pull/248.diff
  Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248

PR: https://git.openjdk.org/shenandoah/pull/248

From rkennke at openjdk.org Thu May 4 16:35:29 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 4 May 2023 16:35:29 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v29]
In-Reply-To:
References:
Message-ID:

On Thu, 4 May 2023 12:53:23 GMT, Thomas Stuefe wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Add usual header include guards
>
> src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 60:
>
>> 58: } else if (_bases_table[base_idx] == UNUSED_BASE) {
>> 59: // Primary is free
>> 60: _bases_table[base_idx] = to_region_base;
>
> Since the else branch is probably much more common, would it make sense to swap the conditions? Same below.

I just swapped that around to what it is now: I think the UNUSED_BASE would be taken exactly once per region, then every other call would find a 'good' target. Which means the if-branch would be much more common and else only taken rarely.
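
The branch-ordering argument in this reply can be illustrated with a simplified sketch. It is not the code from the PR: `TargetSlots`, `select_slot`, `kUnusedBase` and `kFallback` are hypothetical names, and the sentinel is assumed to be a value that no aligned region base can take. The point is only that the "slot already holds this base" test comes first, because the "slot is unused" case is hit at most once per slot.

```cpp
#include <cstdint>

// Hypothetical sentinel: assumed never to equal a real (aligned) region base.
constexpr uintptr_t kUnusedBase = 0x1;
// Hypothetical return value meaning "neither slot fits, use the fallback table".
constexpr int kFallback = -1;

// The two candidate target-region bases recorded for one source region.
struct TargetSlots {
  uintptr_t base[2] = {kUnusedBase, kUnusedBase};
};

// Returns the slot (0 = primary, 1 = alternate) that holds to_region_base,
// claiming a free slot if needed, or kFallback if both slots are taken
// by other regions.
int select_slot(TargetSlots& slots, uintptr_t to_region_base) {
  for (int i = 0; i < 2; i++) {
    if (slots.base[i] == to_region_base) {
      return i;                        // common case: base already recorded
    }
    if (slots.base[i] == kUnusedBase) {
      slots.base[i] = to_region_base;  // rare case: happens once per slot
      return i;
    }
  }
  return kFallback;
}
```

In this sketch the first forwarding out of a source region claims the primary slot, and every later forwarding into the same target region returns from the first comparison.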

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1185251911

From rkennke at openjdk.org Thu May 4 16:41:28 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 4 May 2023 16:41:28 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v29]
In-Reply-To:
References:
Message-ID:

On Thu, 4 May 2023 12:56:11 GMT, Thomas Stuefe wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Add usual header include guards
>
> src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 87:
>
>> 85: assert(to == decode_forwarding(from, encoded), "must be reversible");
>> 86: return encoded;
>> 87: }
>
> Since encoding should produce a 32-bit value, why not return a 32-bit value? Same below, for decoding. Or, at least assert that returned value has no higher bits set.

I'm adding asserts. The encoded value will be or-ed into the mark-word which is 64bit anyway, I don't see much value to use 32bit here and then require up-casting and stuff like that. Type mess tends to make compilers unhappy ;-)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1185257704

From rkennke at openjdk.org Thu May 4 16:46:37 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 4 May 2023 16:46:37 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v29]
In-Reply-To:
References:
Message-ID:

On Thu, 4 May 2023 13:04:33 GMT, Thomas Stuefe wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Add usual header include guards
>
> src/hotspot/share/gc/shared/slidingForwarding.cpp line 35:
>
>> 33: // We cannot use 0, because that may already be a valid base address in zero-based heaps.
>> 34: // 0x1 is safe because heap base addresses must be aligned by much larger alignment
>> 35: HeapWord* const SlidingForwarding::UNUSED_BASE = reinterpret_cast<HeapWord*>(0x1);
>
> I try to understand under which circumstances a zero heap location would be okay. This is *uncompressed* oops, right?
>
> If that were 0, you could just hardcode constexpr 0 in the header.

When running with compressed oops, the heap may be allocated at the zero page, iirc. Yes, this would still be using sliding-forwarding. (*Actually* this may be an optimization opportunity: When JVM runs with compressed oops, we could simply use compressed oops in the forwarding ptr and avoid the whole sliding-forwarding stuff. Maybe as a follow-up?)

> src/hotspot/share/gc/shared/slidingForwarding.cpp line 111:
>
>> 109: _table[i]._from = nullptr;
>> 110: _table[i]._to = nullptr;
>> 111: }
>
> It would be enough to set _from to nullptr.

Yes but I like this to be clean. ;-)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1185262678
PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1185263534

From rkennke at openjdk.org Thu May 4 16:56:34 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 4 May 2023 16:56:34 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v31]
In-Reply-To:
References:
Message-ID:

Roman Kennke has updated the pull request incrementally with three additional commits since the last revision:

 - @shipilev comments
 - @tstuefe review
 - Add some test configs that use +UseAltGCForwarding

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13582/files
  - new: https://git.openjdk.org/jdk/pull/13582/files/c3b9ae9e..f85e4913

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=30
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=29-30

  Stats: 74 lines in 10 files changed: 66 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/13582.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582

PR: https://git.openjdk.org/jdk/pull/13582

From rkennke at openjdk.org Thu May 4 17:23:57 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 4 May 2023 17:23:57 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v32]
In-Reply-To:
References:
Message-ID:

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Fix asserts

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13582/files
  - new: https://git.openjdk.org/jdk/pull/13582/files/f85e4913..0ccd5a0a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=31
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=30-31

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/13582.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582

PR: https://git.openjdk.org/jdk/pull/13582

From ysr at openjdk.org Thu May 4 17:32:50 2023
From: ysr at openjdk.org (Y.
Srinivas Ramakrishna) Date: Thu, 4 May 2023 17:32:50 GMT Subject: RFR: Remove passing tests from ProblemList.txt [v2] In-Reply-To: References: <_mVMIv03Z0947HMJS7M-KNqs-KhklMG780ACwlHhA18=.019bccd7-2ff4-4724-b289-fef444047d8a@github.com> Message-ID: On Wed, 3 May 2023 23:51:55 GMT, William Kemper wrote: >> Recent test runs suggest that all but three tests are now passing. The remaining failures are believed to be fixed by https://github.com/openjdk/shenandoah/pull/248. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Disable tests which failed in GHA Marked as reviewed by ysr (Committer). ------------- PR Review: https://git.openjdk.org/shenandoah/pull/273#pullrequestreview-1413571758 From wkemper at openjdk.org Thu May 4 18:18:20 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 May 2023 18:18:20 GMT Subject: RFR: 8307378: Allow collectors to provide specific values for GC notifications' actions [v2] In-Reply-To: References: Message-ID: > At the end of a GC pause, a `GarbageCollectionNotificationInfo` may be emitted. The notification has a `gcAction` field which presently originates from the field `_gc_end_message` in `GCMemoryManager`. Concurrent collectors such as Shenandoah, ZGC and G1 may have more (brief) pauses in their cycle than they have memory managers. This makes it difficult for gc notification listeners to determine the phase of the cycle that emitted the notification. We are proposing a change to allow collectors to define specific values for the `gcAction` to make it easier for notification listeners to classify the gc phase responsible for the notification. 
William Kemper has updated the pull request incrementally with two additional commits since the last revision: - Fix missed implementation file - Unify message path ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13785/files - new: https://git.openjdk.org/jdk/pull/13785/files/d1792bb6..a85685b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13785&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13785&range=00-01 Stats: 77 lines in 16 files changed: 19 ins; 11 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/13785.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13785/head:pull/13785 PR: https://git.openjdk.org/jdk/pull/13785 From wkemper at openjdk.org Thu May 4 18:26:57 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 May 2023 18:26:57 GMT Subject: RFR: Remove passing tests from ProblemList.txt [v3] In-Reply-To: <_mVMIv03Z0947HMJS7M-KNqs-KhklMG780ACwlHhA18=.019bccd7-2ff4-4724-b289-fef444047d8a@github.com> References: <_mVMIv03Z0947HMJS7M-KNqs-KhklMG780ACwlHhA18=.019bccd7-2ff4-4724-b289-fef444047d8a@github.com> Message-ID: > Recent test runs suggest that all but three tests are now passing. The remaining failures are believed to be fixed by https://github.com/openjdk/shenandoah/pull/248. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Add ticket numbers back to disabled tests ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/273/files - new: https://git.openjdk.org/shenandoah/pull/273/files/98210777..7006645f Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=273&range=02 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=273&range=01-02 Stats: 4 lines in 1 file changed: 1 ins; 1 del; 2 mod Patch: https://git.openjdk.org/shenandoah/pull/273.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/273/head:pull/273 PR: https://git.openjdk.org/shenandoah/pull/273 From wkemper at openjdk.org Thu May 4 18:30:17 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 May 2023 18:30:17 GMT Subject: RFR: 8307378: Allow collectors to provide specific values for GC notifications' actions [v3] In-Reply-To: References: Message-ID: > At the end of a GC pause, a `GarbageCollectionNotificationInfo` may be emitted. The notification has a `gcAction` field which presently originates from the field `_gc_end_message` in `GCMemoryManager`. Concurrent collectors such as Shenandoah, ZGC and G1 may have more (brief) pauses in their cycle than they have memory managers. This makes it difficult for gc notification listeners to determine the phase of the cycle that emitted the notification. We are proposing a change to allow collectors to define specific values for the `gcAction` to make it easier for notification listeners to classify the gc phase responsible for the notification. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Remove trailing whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13785/files - new: https://git.openjdk.org/jdk/pull/13785/files/a85685b8..918c20cf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13785&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13785&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13785.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13785/head:pull/13785 PR: https://git.openjdk.org/jdk/pull/13785 From stefank at openjdk.org Thu May 4 19:55:14 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 4 May 2023 19:55:14 GMT Subject: RFR: 8307378: Allow collectors to provide specific values for GC notifications' actions [v3] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 18:30:17 GMT, William Kemper wrote: >> At the end of a GC pause, a `GarbageCollectionNotificationInfo` may be emitted. The notification has a `gcAction` field which presently originates from the field `_gc_end_message` in `GCMemoryManager`. Concurrent collectors such as Shenandoah, ZGC and G1 may have more (brief) pauses in their cycle than they have memory managers. This makes it difficult for gc notification listeners to determine the phase of the cycle that emitted the notification. We are proposing a change to allow collectors to define specific values for the `gcAction` to make it easier for notification listeners to classify the gc phase responsible for the notification. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Remove trailing whitespace Marked as reviewed by stefank (Reviewer). I see that I dropped the ZGC changes when I rebased my proposed patch. Thanks for fixing that. 
------------- PR Review: https://git.openjdk.org/jdk/pull/13785#pullrequestreview-1413812475 PR Comment: https://git.openjdk.org/jdk/pull/13785#issuecomment-1535329183 From kdnilsen at openjdk.org Thu May 4 19:59:47 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 4 May 2023 19:59:47 GMT Subject: RFR: DRAFT: Expand old on demand [v37] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > As currently drafted, there are regression failures. This DRAFT PR is published for the purpose of facilitating a careful code review. Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - Reorder lines of code to reduce diffs from original code - Remove irrelevant comment ShenandoahGenerationSizer::heap_size_changed() comment said this was not hooked anywhere. More recently, we have figured out two places that need to invoke this service. 
------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/cda0e412..b59ece73 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=36 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=35-36 Stats: 40 lines in 2 files changed: 20 ins; 20 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From jsjolen at openjdk.org Thu May 4 20:13:20 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 4 May 2023 20:13:20 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v3] In-Reply-To: <8r0Te2Q1VuISH9tDaZaMzNpEL373FmmtBf5A0hO-0ek=.250720c8-bcbf-47f5-a82b-611e93247bd9@github.com> References: <8r0Te2Q1VuISH9tDaZaMzNpEL373FmmtBf5A0hO-0ek=.250720c8-bcbf-47f5-a82b-611e93247bd9@github.com> Message-ID: On Wed, 3 May 2023 13:46:31 GMT, Thomas Schatzl wrote: >> Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Fix style >> - Merge remote-tracking branch 'origin/master' into JDK-8301493 >> - Explicitly cast >> - Fixes >> - Replace NULL with nullptr in cpu/aarch64 > > Remaining `NULL` in > > gc/shared/BarrierSetAssembler::check_oop() > codeBuffer_aarch64.cpp/emit_shared_trampolines() > stubGenerator_aarch64.cpp/generate_final_stubs() > vm_version_aarch64.cpp/check_info_file() Thanks @tschatzl, probably new code from when I merged that didn't get a conflict. I did a grep for NULL and checked that no more occurrences are in the code. Running tier1 testing.
------------- PR Comment: https://git.openjdk.org/jdk/pull/12321#issuecomment-1535348805 From jsjolen at openjdk.org Thu May 4 20:13:14 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 4 May 2023 20:13:14 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v5] In-Reply-To: References: Message-ID: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks!
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Missed these NULLs somehow ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12321/files - new: https://git.openjdk.org/jdk/pull/12321/files/31b2ed11..cb6ffb99 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12321&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12321&range=03-04 Stats: 5 lines in 4 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/12321.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12321/head:pull/12321 PR: https://git.openjdk.org/jdk/pull/12321 From kdnilsen at openjdk.org Thu May 4 20:37:11 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 4 May 2023 20:37:11 GMT Subject: RFR: DRAFT: Expand old on demand [v38] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > As currently drafted, there are regression failures. This DRAFT PR is published for the purpose of facilitating a careful code review. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Tidy code for integration Two "global" variables are no longer needed: thread_iteration_roots and MIXED_EVACUATIONS_ENABLED.
------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/b59ece73..b0b61a45 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=37 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=36-37 Stats: 9 lines in 3 files changed: 0 ins; 5 del; 4 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From kdnilsen at openjdk.org Thu May 4 21:59:21 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 4 May 2023 21:59:21 GMT Subject: RFR: DRAFT: Expand old on demand [v39] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > As currently drafted, there are regression failures. This DRAFT PR is published for the purpose of facilitating a careful code review. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Reduce guaranteed young GC interval to 30 seconds It had been set to 15 seconds after observing that the default value of 5 minutes was resulting in degenerated cycles because GC triggers were too lazy. The value of 30 seconds has been observed to not result in degenerated GC with our two PhasedUpdates netflix workloads. 
------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/b0b61a45..ba5616d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=38 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=37-38 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From kdnilsen at openjdk.org Thu May 4 22:16:57 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 4 May 2023 22:16:57 GMT Subject: RFR: DRAFT: Expand old on demand [v40] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > As currently drafted, there are regression failures. This DRAFT PR is published for the purpose of facilitating a careful code review. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix some assertions One assertion was too strong. Several others were too weak. 
------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/ba5616d5..81ae25aa Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=39 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=38-39 Stats: 21 lines in 1 file changed: 0 ins; 11 del; 10 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From kdnilsen at openjdk.org Thu May 4 23:21:56 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 4 May 2023 23:21:56 GMT Subject: RFR: DRAFT: Expand old on demand [v41] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > As currently drafted, there are regression failures. This DRAFT PR is published for the purpose of facilitating a careful code review. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Weaken some assertions As written, these assertions fail if generation->is_global(). It may work to rewrite these assertions using accessor methods used(), humongous_waste(), and affiliated_region_count() instead of the instance field values. 
------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/81ae25aa..53b4b5a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=40 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=39-40 Stats: 20 lines in 1 file changed: 10 ins; 0 del; 10 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From wkemper at openjdk.org Fri May 5 05:46:23 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 5 May 2023 05:46:23 GMT Subject: Integrated: 8307378: Allow collectors to provide specific values for GC notifications' actions In-Reply-To: References: Message-ID: On Wed, 3 May 2023 18:17:20 GMT, William Kemper wrote: > At the end of a GC pause, a `GarbageCollectionNotificationInfo` may be emitted. The notification has a `gcAction` field which presently originates from the field `_gc_end_message` in `GCMemoryManager`. Concurrent collectors such as Shenandoah, ZGC and G1 may have more (brief) pauses in their cycle than they have memory managers. This makes it difficult for gc notification listeners to determine the phase of the cycle that emitted the notification. We are proposing a change to allow collectors to define specific values for the `gcAction` to make it easier for notification listeners to classify the gc phase responsible for the notification. This pull request has now been integrated. 
Changeset: 1b143ba7 Author: William Kemper Committer: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/1b143ba78712e7ac98ca9873c50989b3fba07394 Stats: 88 lines in 19 files changed: 31 ins; 4 del; 53 mod 8307378: Allow collectors to provide specific values for GC notifications' actions Reviewed-by: kdnilsen, stefank ------------- PR: https://git.openjdk.org/jdk/pull/13785 From tschatzl at openjdk.org Fri May 5 06:24:23 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 5 May 2023 06:24:23 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v5] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 20:13:14 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> ``` >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Missed these NULLs somehow Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/12321#pullrequestreview-1414228427 From rkennke at openjdk.org Fri May 5 08:40:30 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 5 May 2023 08:40:30 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v25] In-Reply-To: <7DNp1iiy51WJSWUuywlldghKwLETxFip4jlePA3ybrk=.b0f1243a-fa7b-4446-a6c3-3d7d2080ee3e@github.com> References: <6UvcVsPw9XNkD2wlN6kky_EK8bKyBlqdMt4h9_IONN4=.95d8c0c3-9a31-4edb-bdb5-9d5cd33d7c99@github.com> <7DNp1iiy51WJSWUuywlldghKwLETxFip4jlePA3ybrk=.b0f1243a-fa7b-4446-a6c3-3d7d2080ee3e@github.com> Message-ID: <17SrJiXbiYOKvJUgBACh1QVKgJ4y0NmS4LE0MMU2omc=.2c7d4020-807b-4f48-bc4b-f4418fec7846@github.com> On Wed, 3 May 2023 21:39:19 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix type narrowing > > Changes requested by tschatzl (Reviewer). @tschatzl are you good with this PR now? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1535921703 From jsjolen at openjdk.org Fri May 5 08:57:37 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 5 May 2023 08:57:37 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v5] In-Reply-To: References: Message-ID: <4CR_EF2Ik0yRsvkhQ0aUei5y85LtzHnGv9b4Ef32oEo=.3087e525-3336-4e72-98f8-7ac1c83280de@github.com> On Thu, 4 May 2023 20:13:14 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2.
Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> ``` >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Missed these NULLs somehow Alright, tier1 looks good. Integrating. Thank you for the reviews, good people. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12321#issuecomment-1535938350 From jsjolen at openjdk.org Fri May 5 08:57:41 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 5 May 2023 08:57:41 GMT Subject: Integrated: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 In-Reply-To: References: Message-ID: On Tue, 31 Jan 2023 11:39:27 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
> > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! This pull request has now been integrated. Changeset: 948f3b3c Author: Johan Sjölen URL: https://git.openjdk.org/jdk/commit/948f3b3c24709eca3aa6c3f0db6adb9226d6f9ac Stats: 441 lines in 44 files changed: 0 ins; 0 del; 441 mod 8301493: Replace NULL with nullptr in cpu/aarch64 Reviewed-by: tschatzl, gziemski, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/12321 From shade at openjdk.org Fri May 5 10:51:29 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 5 May 2023 10:51:29 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v32] In-Reply-To: References: Message-ID: <-PrjkCfvvA4FqJh17986ncJIdPZSquKmx18tu7C1V4Y=.7230967e-e67f-43ca-b9b6-58aeb6552acf@github.com> On Thu, 4 May 2023 17:23:57 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions.
This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start addresses of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31: The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, where there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes.
Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix asserts I am okay with it. `develop` = `false` protects us from exposing this code path in release bits. I see no regressions even in my targeted tests with the feature turned off. ------------- Marked as reviewed by shade (Reviewer). 
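The 32-bit forwarding layout quoted above (bits 0 and 1 as the forwarded tag, bit 2 for the fallback table, bit 3 selecting one of two target regions, bits 4..31 holding a heap-word offset) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: the names `TargetPair`, `encode_forwarding`, `decode_forwardee` and the constants are hypothetical, and an 8-byte heap word is assumed.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical constants mirroring the bit layout in the PR description.
constexpr std::uint32_t kForwardedTag   = 0x3;      // bits 0-1: '11' marks a forwarded object
constexpr std::uint32_t kFallbackBit    = 1u << 2;  // bit 2: forwardee lives in the fallback table
constexpr std::uint32_t kRegionSelShift = 3;        // bit 3: selects target region 0 or 1
constexpr std::uint32_t kOffsetShift    = 4;        // bits 4..31: offset in heap words
constexpr std::uint32_t kMaxOffset      = (1u << 28) - 1;
constexpr std::uint64_t kHeapWordSize   = 8;        // assumption: 64-bit heap words

// One table entry per source region: the base addresses of its two
// possible target regions (hypothetical type, for illustration only).
struct TargetPair {
  std::uint64_t base[2];
};

// Pack a target-region index and a word offset into the low 32 header bits.
inline std::uint32_t encode_forwarding(unsigned target_index, std::uint32_t word_offset) {
  assert(target_index < 2);
  assert(word_offset <= kMaxOffset);
  return (word_offset << kOffsetShift) |
         (static_cast<std::uint32_t>(target_index) << kRegionSelShift) |
         kForwardedTag;
}

inline bool is_forwarded(std::uint32_t bits)  { return (bits & 0x3u) == kForwardedTag; }
inline bool uses_fallback(std::uint32_t bits) { return (bits & kFallbackBit) != 0; }

// Recover the forwardee's byte address from the encoded bits and the
// table entry of the object's source region.
inline std::uint64_t decode_forwardee(const TargetPair& entry, std::uint32_t bits) {
  assert(is_forwarded(bits) && !uses_fallback(bits));
  unsigned index = (bits >> kRegionSelShift) & 1u;
  std::uint64_t word_offset = bits >> kOffsetShift;
  return entry.base[index] + word_offset * kHeapWordSize;
}
```

With 28 offset bits and the assumed 8-byte words, this sketch can address up to 2 GB inside one target region; the fallback path (bit 2 set) would consult a separate hash table and is deliberately left out here.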
PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1414565112 From mdoerr at openjdk.org Fri May 5 15:00:10 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 5 May 2023 15:00:10 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v27] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`.
This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Add test case for passing a double value in a GP register. Use better instructions for moving between FP and GP reg. Improve comments. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/e4ddbda0..754a19a0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=25-26 Stats: 110 lines in 4 files changed: 88 ins; 2 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From rkennke at openjdk.org Fri May 5 15:59:23 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 5 May 2023 15:59:23 GMT Subject: Integrated: 8307395: Add missing STS to Shenandoah In-Reply-To: References: Message-ID: On Thu, 4 May 2023 11:34:15 GMT, Roman Kennke wrote: > Testing in project Lilliput has revealed that Shenandoah GC is lacking one STS. This causes a reliable crash (with Lilliput) when running TestGCBasherWithShenandoah.java with -XX:+UseHeavyMonitors because it touches an already deflated monitor. > > Testing (all in Lilliput where it caused the troubles, but applies to upstream as well): > - [x] TestGCBasherWithShenandoah.java +UseHeavyMonitors > - [x] hotspot_gc_shenandoah +UseHeavyMonitors This pull request has now been integrated. 
Changeset: 3968ab5d Author: Roman Kennke URL: https://git.openjdk.org/jdk/commit/3968ab5db5443ce93c9a19ebbc5464f7d91782fc Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8307395: Add missing STS to Shenandoah Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/13799 From tschatzl at openjdk.org Fri May 5 16:16:43 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 5 May 2023 16:16:43 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v32] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 17:23:57 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. 
>> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag.
I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix asserts I would like to request some cleanups before pushing, but please hold off on changing this. I have been working on aggressively strength-reducing operations and using biased arrays (yes, you can, not sure why not?) (performance seems better if you separate the `_bases_table` into two logical ones too), reducing the overhead from baseline to +UseAltGCForwarding from a bit more than 10% to ~7.8% you gave on an Ice Lake machine. Same code also shows improvements on some Zen3 machine (from ~9% to 5.3% - but the result is noisy and the machine isn't "clean"), both on that benchmark you gave. Let me clean up the code a bit and provide to you on Monday so that you can test too if you are interested. I will then also look in detail at the forwarding table. src/hotspot/share/gc/shared/slidingForwarding.cpp line 57: > 55: _region_size_words = round_up_power_of_2(heap.word_size()); > 56: _region_size_words_shift = log2i_exact(_region_size_words); > 57: } Suggestion: _heap_start = heap.start(); if (UseSerialGC && heap.word_size() <= (1 << NUM_OFFSET_BITS)) { // In this case we can treat the whole heap as a single region and // make the encoding very simple.
_num_regions = 1; _region_size_words = round_up_power_of_2(heap.word_size()); } else { _num_regions = align_up(pointer_delta(heap.end(), heap.start()), region_size_words) / region_size_words; _region_size_words = region_size_words; } _region_size_words_shift = log2i_exact(region_size_words); I.e. extract out the common code. src/hotspot/share/gc/shared/slidingForwarding.hpp line 117: > 115: static HeapWord* _heap_start; > 116: static size_t _num_regions; > 117: static size_t _region_size_words; I would prefer that the code, instead of having separate `_heap_start` and `_region_size_words`, would just take a copy of the `MemRegion` passed to it. This would allow making the asserts stronger by using the `contains` method. Both variables only seem to be used in asserts anyway. src/hotspot/share/gc/shared/slidingForwarding.inline.hpp line 77: > 75: > 76: size_t offset = pointer_delta(to, to_region_base); > 77: assert(offset < _region_size_words, "Offset should be within the region. from: " PTR_FORMAT I think for this check, `_region_size_words` should be the size passed to the initialization, not potentially something larger. (See the `SlidingForwarding::initialize()` method for the serial gc special case). ------------- Changes requested by tschatzl (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1415034095 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1186245432 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1186251889 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1186248985 From tschatzl at openjdk.org Fri May 5 16:22:27 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 5 May 2023 16:22:27 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v32] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 17:23:57 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. 
>> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag.
I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix asserts > I have been working on aggressively strength-reducing operations and using biased arrays (yes, you can, not sure why not?) (performance seems better if you separate the `_bases_table` into two logical ones too), reducing the overhead from baseline to +UseAltGCForwarding from a bit more than 10% to ~7.8% you gave on an Ice Lake machine. Same code also shows improvements on some Zen3 machine (from ~9% to 5.3% - but the result is noisy and the machine isn't "clean"), both on that benchmark you gave. > > Let me clean up the code a bit and provide to you on Monday so that you can test too if you are interested. > Actually, current code is at https://github.com/openjdk/jdk/compare/master...tschatzl:jdk:alt-fullgc-forwarding?expand=1 if you want to play with it; the [strength reduction](https://github.com/openjdk/jdk/commit/10e1f43a736b28e18dc951cdd23309a71dc83694) commit does strength reduction and biasing, keeping current code intact, the other two split the `bases_table` into two separate tables and clean up a bit.
Hopefully I wasn't just dreaming or something when testing it (fwiw, I only kind of guarantee that it runs that `Retain` test. Not really tested otherwise). But it tended to crash if I didn't get things right for that test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1536486650 From rkennke at openjdk.org Fri May 5 16:45:46 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 5 May 2023 16:45:46 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v32] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 16:19:01 GMT, Thomas Schatzl wrote: > > I have been working on aggressively strength-reducing operations and using biased arrays (yes, you can, not sure why not?) (performance seems better if you separate the `_bases_table` into two logical ones too), reducing the overhead from baseline to +UseAltGCForwarding from a bit more than 10% to ~7.8% you gave on an Ice Lake machine. Same code also shows improvements on some Zen3 machine (from ~9% to 5.3% - but the result is noisy and the machine isn't "clean"), both on that benchmark you gave. > > Let me clean up the code a bit and provide to you on Monday so that you can test too if you are interested. > > Actually, current code is at https://github.com/openjdk/jdk/compare/master...tschatzl:jdk:alt-fullgc-forwarding?expand=1 if you want to play with it; the [strength reduction](https://github.com/openjdk/jdk/commit/10e1f43a736b28e18dc951cdd23309a71dc83694) commit does strength reduction and biasing, keeping current code intact, the other two split the `bases_table` into two separate tables and clean up a bit. > > Hopefully I wasn't just dreaming or something when testing it (fwiw, I only kind of guarantee that it runs that `Retain` test. Not really tested otherwise). But it tended to crash if I didn't get things right for that test. Nice! Thank you for doing that! I'll give it a test as soon as I get to it (perhaps only on Monday).
Would you be ok if I integrate it into the main PR once my testing is good? Or do you have more things you want to do on that branch? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1536510796 From mdoerr at openjdk.org Sat May 6 19:38:36 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 6 May 2023 19:38:36 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v28] In-Reply-To: References: Message-ID: <9hDHgeACLaNP0lLQ7lXtWN07t6h4DDF5a9aaOTdvyMI=.932783da-eb49-4b9b-843b-fc564c6ffc41@github.com> > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests.
> > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: libTestHFA: Add explicit type conversion to avoid build warning. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/754a19a0..74586ab8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=26-27 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From dholmes at openjdk.org Mon May 8 05:13:33 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 8 May 2023 05:13:33 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v29] In-Reply-To: References: Message-ID: <2GJMfozzvuoZLoQjcqIHSA9ml-_2SllOV6ZJPavK1nc=.6b1d9955-a8f7-49f9-9044-2dbfca2c12d7@github.com> On Thu, 4 May 2023 13:46:18 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Add usual header include guards > > src/hotspot/share/gc/shared/gc_globals.hpp line 698: > >> 696: constraint(GCCardSizeInBytesConstraintFunc,AtParse) \ >> 697: \ >> 698: develop(bool, UseAltGCForwarding, false, \ > > I don't think you can opt-in into `true` in release bits, if this flag is `develop` when the rest of Lilliput arrives. In release bits, all checks involving this flag would fold with `false`. Maybe that's the intent here, as it keeps the release performance at the baseline level, but this makes performance overhead estimations for this PR a bit hard :) [I'll hack the flag back to experimental for tests] If you need this for compact object headers then shouldn't this flag be experimental? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1187028649 From rkennke at openjdk.org Mon May 8 05:44:25 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 05:44:25 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v29] In-Reply-To: <2GJMfozzvuoZLoQjcqIHSA9ml-_2SllOV6ZJPavK1nc=.6b1d9955-a8f7-49f9-9044-2dbfca2c12d7@github.com> References: <2GJMfozzvuoZLoQjcqIHSA9ml-_2SllOV6ZJPavK1nc=.6b1d9955-a8f7-49f9-9044-2dbfca2c12d7@github.com> Message-ID: <_uVyIUcDEzKU9Cf4ajJF8zN_-wR2EMQmkkDY4eoRiks=.354aeecd-00a1-4e72-b62f-3c60c8db0793@github.com> On Mon, 8 May 2023 05:10:11 GMT, David Holmes wrote: >> src/hotspot/share/gc/shared/gc_globals.hpp line 698: >> >>> 696: constraint(GCCardSizeInBytesConstraintFunc,AtParse) \ >>> 697: \ >>> 698: develop(bool, UseAltGCForwarding, false, \ >> >> I don't think you can opt-in into `true` in release bits, if this flag is `develop` when the rest of Lilliput arrives. In release bits, all checks involving this flag would fold with `false`. Maybe that's the intent here, as it keeps the release performance at the baseline level, but this makes performance overhead estimations for this PR a bit hard :) [I'll hack the flag back to experimental for tests] > > If you need this for compact object headers then shouldn't this flag be experimental? @tschatzl requested it to be develop *for this PR* and make it product/experimental in the compact-headers PR, because if the compact-headers PR doesn't make it into 21, then we'd have the slight performance hit of checking the flag (in a loop) in release builds for no benefit. See discussions above. I thought that is reasonable. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1187043439 From ayang at openjdk.org Mon May 8 08:31:41 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 8 May 2023 08:31:41 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v32] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 17:23:57 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. 
Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
>> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix asserts src/hotspot/share/gc/shared/slidingForwarding.hpp line 51: > 49: * maximum of two regions. This is an intuitive property: when we slide the compact region full of data, it can > 50: * only span two adjacent regions. This property allows us to use the off-side table to record the addresses of > 51: * two target regions. The table table holds N*2 entries for N logical regions. For each region, it gives the base "table table" src/hotspot/share/gc/shared/slidingForwarding.hpp line 77: > 75: * 4. Compute the mark word from "offset" and "alternate", write it out > 76: * > 77: * Similarily, looking up the target address, given an original object address generally works as follows: Typo - "Similarly" src/hotspot/share/gc/shared/slidingForwarding.hpp line 159: > 157: * is sufficient because G1 serial compaction is single-threaded. > 158: */ > 159: class FallbackTable : public CHeapObj{ Could this class be placed inside `SlidingForwarding` for better encapsulation? 
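The encoding described in the quoted PR text (bits 0..1 as the forwarded tag, bit 2 for fallback, bit 3 selecting one of two target regions, bits 4..31 as a word offset) can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the code from the PR: addresses are modeled as plain word indices from the heap start, the names `SlidingTable`, `encode`, `decode` and `NO_BASE` are hypothetical, and the fallback path is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of the sliding-forwarding encoding.
// Addresses are word indices relative to the heap start, not real pointers.
namespace sketch {

constexpr uint32_t FORWARDED_BITS = 0x3;       // bits 0..1: '11' marks "forwarded"
constexpr uint32_t FALLBACK_BIT   = 1u << 2;   // bit 2: look up in fallback table (unused here)
constexpr uint32_t ALT_REGION_BIT = 1u << 3;   // bit 3: selects target region 0 or 1
constexpr int      OFFSET_SHIFT   = 4;         // bits 4..31: word offset into target region
constexpr size_t   MAX_OFFSET     = size_t(1) << 28;
constexpr size_t   NO_BASE        = SIZE_MAX;  // marks an unused table entry

struct SlidingTable {
  size_t region_size_words;
  std::vector<size_t> bases;  // two target-region base addresses per source region

  SlidingTable(size_t num_regions, size_t region_words)
    : region_size_words(region_words), bases(num_regions * 2, NO_BASE) {}

  // Record the forwarding of `from` to `to` and return the low 32 header bits.
  uint32_t encode(size_t from, size_t to) {
    size_t from_region = from / region_size_words;
    size_t to_base = (to / region_size_words) * region_size_words;
    size_t* entry = &bases[from_region * 2];
    uint32_t alt = 0;
    if (entry[0] == NO_BASE || entry[0] == to_base) {
      entry[0] = to_base;            // first (or already known) target region
    } else {
      // Sliding compaction should only ever need a second target region.
      assert(entry[1] == NO_BASE || entry[1] == to_base);
      entry[1] = to_base;
      alt = ALT_REGION_BIT;
    }
    size_t offset = to - to_base;
    assert(offset < MAX_OFFSET && offset < region_size_words);
    return static_cast<uint32_t>(offset << OFFSET_SHIFT) | alt | FORWARDED_BITS;
  }

  // Recover the target address of `from` from its encoded header bits.
  size_t decode(size_t from, uint32_t mark) const {
    assert((mark & FORWARDED_BITS) == FORWARDED_BITS);
    assert((mark & FALLBACK_BIT) == 0);  // fallback lookups are not modeled
    size_t from_region = from / region_size_words;
    size_t alt = (mark & ALT_REGION_BIT) ? 1 : 0;
    return bases[from_region * 2 + alt] + (mark >> OFFSET_SHIFT);
  }
};

} // namespace sketch
```

With 1024-word regions, for instance, two objects in source region 2 that slide into regions 0 and 1 occupy the two table slots for region 2, and each decodes back to its target from the 32-bit value alone plus the table.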
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1186754832 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1186754795 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1186754767 From rkennke at openjdk.org Mon May 8 09:58:53 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 09:58:53 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v33] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. 
> - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
> > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. > > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Fix typos - Make FallbackTable an inner class of SlidingForwarding ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/0ccd5a0a..3a8d07b0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=31-32 Stats: 78 lines in 2 files changed: 35 ins; 35 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From tschatzl at openjdk.org Mon May 8 10:15:32 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 8 May 2023 10:15:32 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v33] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 09:58:53 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. 
Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. 
When that happens, we initialize a fallback table, which is a simple open hash-table, and set Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header.
>> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Fix typos > - Make FallbackTable an inner class of SlidingForwarding The https://github.com/openjdk/jdk/compare/master...tschatzl:jdk:alt-fullgc-forwarding?expand=1 branch now contains the promised cleanup. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1538120604 From rkennke at openjdk.org Mon May 8 11:03:53 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 11:03:53 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v34] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. 
> > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31: The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, where there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed.
Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). > > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. > > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 96 additional commits since the last revision: - Merge remote-tracking branch 'tschatzl/alt-fullgc-forwarding' into JDK-8305896 - Cleanup; remove commented out code, renames, "alt_region" -> "alternate" to match documentation, limit _region_word_size to heap size in the optimization case for better checking - cleanup - separate region bases - strength reduction - Fix asserts - @shipilev comments - @tstuefe review - Add some test configs that use +UseAltGCForwarding - Clamp home index. Duh. - ... 
and 86 more: https://git.openjdk.org/jdk/compare/5a30446c...c1ce8a76 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/3a8d07b0..c1ce8a76 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=33 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=32-33 Stats: 24377 lines in 543 files changed: 18238 ins; 2530 del; 3609 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From rkennke at openjdk.org Mon May 8 11:03:53 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 11:03:53 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v33] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 10:12:41 GMT, Thomas Schatzl wrote: > The https://github.com/openjdk/jdk/compare/master...tschatzl:jdk:alt-fullgc-forwarding?expand=1 branch now contains the promised cleanup. Thanks, Thomas! This looks useful. I've merged your branch into this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1538174145 From ayang at openjdk.org Mon May 8 12:30:49 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 8 May 2023 12:30:49 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v34] In-Reply-To: References: Message-ID: <4VUsEsRHm-L4bTnkObCeq4PD9IBSqDqiUMSn_g3Dk0o=.b960b3f2-bf32-4f24-a654-eac3f2a29a08@github.com> On Mon, 8 May 2023 11:03:53 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. 
Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. 
>> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 96 additional commits since the last revision: > > - Merge remote-tracking branch 'tschatzl/alt-fullgc-forwarding' into JDK-8305896 > - Cleanup; remove commented out code, renames, "alt_region" -> "alternate" to match documentation, limit _region_word_size to heap size in the optimization case for better checking > - cleanup > - separate region bases > - strength reduction > - Fix asserts > - @shipilev comments > - @tstuefe review > - Add some test configs that use +UseAltGCForwarding > - Clamp home index. Duh. > - ... and 86 more: https://git.openjdk.org/jdk/compare/580e1066...c1ce8a76 src/hotspot/share/gc/shared/slidingForwarding.hpp line 59: > 57: * > 58: * 64 32 0 > 59: * [........................|OOOOOOOOOOOOOOO|A|F|TT] The number of zero should be 28 == 32 - 4 (A + F + TT), right? (I counted 15 there.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1187387702 From rkennke at openjdk.org Mon May 8 12:41:40 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 12:41:40 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v35] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. 
For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31: The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, where there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e.
-XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). > > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. 
> > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix ASCII art ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/c1ce8a76..c2df2689 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=34 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=33-34 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From rkennke at openjdk.org Mon May 8 12:41:40 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 12:41:40 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v34] In-Reply-To: <4VUsEsRHm-L4bTnkObCeq4PD9IBSqDqiUMSn_g3Dk0o=.b960b3f2-bf32-4f24-a654-eac3f2a29a08@github.com> References: <4VUsEsRHm-L4bTnkObCeq4PD9IBSqDqiUMSn_g3Dk0o=.b960b3f2-bf32-4f24-a654-eac3f2a29a08@github.com> Message-ID: On Mon, 8 May 2023 12:27:54 GMT, Albert Mingkun Yang wrote: > The number of zero should be 28 == 32 - 4 (A + F + TT), right? (I counted 15 there.) Indeed. The number of .s was also too small. I think it was meant to be schematic and not precise. I changed it to be precise. 
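For readers following the ASCII-art discussion, here is a minimal sketch of how the low 32 bits described in the PR text could be packed and unpacked. Only the bit positions (tag in bits 0..1, fallback in bit 2, region select in bit 3, word offset in bits 4..31) come from the description; every name below is hypothetical and is not taken from slidingForwarding.hpp, and the real code also has to resolve the region base addresses from the per-region table.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants; only the bit layout follows the PR description.
constexpr uint32_t FORWARDED_TAG  = 0b11;           // bits 0..1: object is forwarded
constexpr uint32_t FALLBACK_BIT   = 1u << 2;        // bit 2: look up the fallback table
constexpr uint32_t ALT_REGION_BIT = 1u << 3;        // bit 3: target region 0 or 1
constexpr int      OFFSET_SHIFT   = 4;              // bits 4..31: offset in heap words
constexpr uint32_t MAX_OFFSET     = (1u << 28) - 1; // 28 offset bits are available

struct DecodedForwarding {
  bool     fallback;       // true if the forwardee lives in the fallback table
  int      region_select;  // 0 or 1, index into the two target bases of the source region
  uint32_t offset_words;   // heap words from the selected target base
};

// Pack a regular (non-fallback) forwarding into the low 32 header bits.
inline uint32_t encode_forwarding(int region_select, uint32_t offset_words) {
  assert(region_select == 0 || region_select == 1);
  assert(offset_words <= MAX_OFFSET);
  return (offset_words << OFFSET_SHIFT)
       | (region_select != 0 ? ALT_REGION_BIT : 0u)
       | FORWARDED_TAG;
}

// Mark an object whose forwardee must be looked up in the fallback table.
inline uint32_t encode_fallback() {
  return FALLBACK_BIT | FORWARDED_TAG;
}

inline bool is_forwarded(uint32_t low_bits) {
  return (low_bits & 0b11u) == FORWARDED_TAG;
}

// Unpack the fields; the caller still has to add the table-provided base address.
inline DecodedForwarding decode_forwarding(uint32_t low_bits) {
  assert(is_forwarded(low_bits));
  return DecodedForwarding{
    (low_bits & FALLBACK_BIT) != 0,
    (low_bits & ALT_REGION_BIT) != 0 ? 1 : 0,
    low_bits >> OFFSET_SHIFT
  };
}
```

With 28 offset bits, the offset field counts heap words rather than bytes, which is why the encoding depends on each source region having at most two target regions.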
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1187394425 From ayang at openjdk.org Mon May 8 12:46:45 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 8 May 2023 12:46:45 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v35] In-Reply-To: References: Message-ID: <0hIenx25CuYDfGCuxO-MB8ZRO8I3KoYRuFBqLJ-3dr4=.e21615a8-86e3-40e1-993a-34ece37d4ff0@github.com> On Mon, 8 May 2023 12:41:40 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. 
>> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31: The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, where there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
>> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix ASCII art > it can forward objects from one region to a maximum of two regions Those two regions must be adjacent, right? Something like: `_biased_bases[1][from_reg_idx] - _biased_bases[0][from_reg_idx] == _region_size_words`. If that's the case, I don't understand why the to-address is broken into two parts `[offset][alternate region select]`. Is it possible to double the size of the to-region so that "a maximum of two regions" becomes a single, large to-region? (I think this means `_biased_bases` can be shrunk to a 1D array.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1538297593 From rkennke at openjdk.org Mon May 8 12:51:11 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 12:51:11 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v35] In-Reply-To: <0hIenx25CuYDfGCuxO-MB8ZRO8I3KoYRuFBqLJ-3dr4=.e21615a8-86e3-40e1-993a-34ece37d4ff0@github.com> References: <0hIenx25CuYDfGCuxO-MB8ZRO8I3KoYRuFBqLJ-3dr4=.e21615a8-86e3-40e1-993a-34ece37d4ff0@github.com> Message-ID: On Mon, 8 May 2023 12:43:10 GMT, Albert Mingkun Yang wrote: > > it can forward objects from one region to a maximum of two regions > > Those two regions must be adjacent, right? 
Something like: `_biased_bases[1][from_reg_idx] - _biased_bases[0][from_reg_idx] == _region_size_words`. > > If that's the case, I don't understand why the to-address is broken into two parts `[offset][alternate region select]`. Is it possible to double the size of the to-region so that "a maximum of two regions" becomes a single, large to-region? (I think this means `_biased_bases` can be shrunk to a 1D array.) No, the target regions don't have to be adjacent. Parallel full GCs divide up the work across worker threads, assigning from-regions to each worker, and each worker would then claim to-regions as they go and work with that. To-regions of different workers can interleave and are often not adjacent. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1538302131 From ayang at openjdk.org Mon May 8 13:08:43 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 8 May 2023 13:08:43 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v35] In-Reply-To: References: <0hIenx25CuYDfGCuxO-MB8ZRO8I3KoYRuFBqLJ-3dr4=.e21615a8-86e3-40e1-993a-34ece37d4ff0@github.com> Message-ID: On Mon, 8 May 2023 12:47:15 GMT, Roman Kennke wrote: > Parallel full GCs divide up ... Why is Parallel relevant here? The description mentions only "the full-GC modes of Serial, Shenandoah and G1 GCs are", so I assumed this feature doesn't affect/depend on Parallel. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1538327902 From rkennke at openjdk.org Mon May 8 13:25:45 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 13:25:45 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v35] In-Reply-To: References: <0hIenx25CuYDfGCuxO-MB8ZRO8I3KoYRuFBqLJ-3dr4=.e21615a8-86e3-40e1-993a-34ece37d4ff0@github.com> Message-ID: <_LJq_Ls1hcPudH2uB30cf0JDYMfBaRZtv5EQ8Zi68yY=.0d5810ba-2694-4a4a-b5b5-d7a30103625f@github.com> On Mon, 8 May 2023 13:05:14 GMT, Albert Mingkun Yang wrote: > > Parallel full GCs divide up ... 
> > Why is Parallel relevant here? > > The description mentions only "the full-GC modes of Serial, Shenandoah and G1 GCs are", so I assumed this feature doesn't affect/depend on Parallel. Sorry I wasn't clear enough. I meant the full-GCs of G1 and Shenandoah, which use multiple threads to do their work. Parallel GC is indeed not relevant. Also, Serial GC doesn't really compact serially, because it divides the heap into old, young-eden, young-s1 and young-s2 spaces. And btw, even if what you say were true, we could not do this. If we were to collapse each adjacent two regions into one, we would still end up with the same conclusion that for each source region, objects would be forwarded to one or two potential target regions. Let's say we divide the heap into 10 equal regions, then region 0 would logically only compact into the bottom of region 0. But region 1 might compact to region 0 or 1. Depending on how full region 0 becomes, region 2 might compact into region 0 and 1, or region 1 and 2. And so on. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1538354393 From ayang at openjdk.org Mon May 8 13:45:37 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 8 May 2023 13:45:37 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v35] In-Reply-To: <_LJq_Ls1hcPudH2uB30cf0JDYMfBaRZtv5EQ8Zi68yY=.0d5810ba-2694-4a4a-b5b5-d7a30103625f@github.com> References: <0hIenx25CuYDfGCuxO-MB8ZRO8I3KoYRuFBqLJ-3dr4=.e21615a8-86e3-40e1-993a-34ece37d4ff0@github.com> <_LJq_Ls1hcPudH2uB30cf0JDYMfBaRZtv5EQ8Zi68yY=.0d5810ba-2694-4a4a-b5b5-d7a30103625f@github.com> Message-ID: On Mon, 8 May 2023 13:22:18 GMT, Roman Kennke wrote: > I meant the full-GCs of G1 and Shenandoah, which use multiple threads to do their work. I see; regions in the compaction-queue might not be contiguous in heap-space, so to-regions will not necessarily be adjacent. Thank you for the clarification. > And btw, even if what you say were true, we could not do this. 
I meant keep the current from-region size and double only the to-region size. (But my original assumption was wrong, so this doesn't work.)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1538382862

From rkennke at openjdk.org Mon May 8 15:19:02 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 8 May 2023 15:19:02 GMT
Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental)
Message-ID:

This is the main body of the JEP 450: Compact Object Headers (Experimental).

Main changes:
- Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag.
- The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor.
- The identity hash-code is narrowed to 25 bits.
- Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
- Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset).
- CDS can now write and read archives with the compressed header.
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Testing: (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) - [x] tier1 (x86_64) - [x] tier2 (x86_64) - [ ] tier3 (x86_64) - [ ] tier4 (x86_64) - [x] tier1 (aarch64) - [x] tier2 (aarch64) - [ ] tier3 (aarch64) - [ ] tier4 (aarch64) - [ ] tier1 (x86_64) +UseCompactObjectHeaders - [ ] tier2 (x86_64) +UseCompactObjectHeaders - [ ] tier3 (x86_64) +UseCompactObjectHeaders - [ ] tier4 (x86_64) +UseCompactObjectHeaders - [ ] tier1 (aarch64) +UseCompactObjectHeaders - [ ] tier2 (aarch64) +UseCompactObjectHeaders - [ ] tier3 (aarch64) +UseCompactObjectHeaders - [ ] tier4 (aarch64) +UseCompactObjectHeaders ------------- Depends on: https://git.openjdk.org/jdk/pull/13779 Commit messages: - Imporve GetObjectSizeIntrinsicsTest - Some GC fixes - Add BaseOffsets test - Check UseCompactObjectHeaders flag in TestPLABPromotion - Turn off UseCompactObjectHeaders by default - Fix typeArrayOop gtest - Fix OldLayoutCheck test - SA fix - CDS fix - Turn off CDS when UseCompactObjectHeaders is not at default setting - ... 
and 11 more: https://git.openjdk.org/jdk/compare/b9c8ca0f...7b87ae9b Changes: https://git.openjdk.org/jdk/pull/13844/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305895 Stats: 1156 lines in 80 files changed: 925 ins; 71 del; 160 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From wkemper at openjdk.org Mon May 8 17:16:11 2023 From: wkemper at openjdk.org (William Kemper) Date: Mon, 8 May 2023 17:16:11 GMT Subject: Integrated: Remove passing tests from ProblemList.txt In-Reply-To: <_mVMIv03Z0947HMJS7M-KNqs-KhklMG780ACwlHhA18=.019bccd7-2ff4-4724-b289-fef444047d8a@github.com> References: <_mVMIv03Z0947HMJS7M-KNqs-KhklMG780ACwlHhA18=.019bccd7-2ff4-4724-b289-fef444047d8a@github.com> Message-ID: On Wed, 3 May 2023 18:20:42 GMT, William Kemper wrote: > Recent test runs suggest that all but three tests are now passing. The remaining failures are believed to be fixed by https://github.com/openjdk/shenandoah/pull/248. This pull request has now been integrated. Changeset: c966d9ac Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/c966d9acccda5669f61f2d4a5e0a1557fcfcf372 Stats: 4 lines in 2 files changed: 0 ins; 4 del; 0 mod Remove passing tests from ProblemList.txt Reviewed-by: ysr, shade ------------- PR: https://git.openjdk.org/shenandoah/pull/273 From rkennke at openjdk.org Mon May 8 18:40:05 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 18:40:05 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v36] In-Reply-To: References: Message-ID: <9lFPFZ9xr6UI731U-695uYuQoM5BtfoTCG45SH7sXI4=.51e9d5c9-4cb8-4a92-a40e-e23ff6e66f90@github.com> > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. 
Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. 
When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table.
>
> All the table accesses can be done unsynchronized because:
> - Serial GC is single-threaded anyway
> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions.
> - G1 serial compaction is single-threaded
>
> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers).
>
> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately).
>
> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header.
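The fallback path described in the PR text above can be illustrated with a small, self-contained sketch. This is not the code from the PR: the class name `FallbackForwardingSketch`, the constants `kMarkForwarded` and `kMarkFallback`, and the use of `std::unordered_map` in place of the "simple open hash-table" are all assumptions made for illustration, and addresses are modelled as plain word indices rather than `HeapWord*`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical constants mirroring the bit assignment in the description.
constexpr std::uint32_t kMarkForwarded = 0x3;      // bits 0..1: '11' means forwarded
constexpr std::uint32_t kMarkFallback  = 1u << 2;  // bit 2: forwardee lives in the fallback table

// Illustrative stand-in for the fallback table. The PR describes a simple
// open hash-table; std::unordered_map is used here only to keep the sketch short.
class FallbackForwardingSketch {
 public:
  // Record "from -> to" and return the low header bits that would be installed.
  std::uint32_t forward_to(std::size_t from, std::size_t to) {
    _table[from] = to;
    return kMarkFallback | kMarkForwarded;
  }

  // True if the header bits say "forwarded, look in the fallback table".
  static bool is_fallback(std::uint32_t header_bits) {
    return (header_bits & kMarkForwarded) == kMarkForwarded &&
           (header_bits & kMarkFallback) != 0;
  }

  // Resolve the forwardee of an object whose header carries the fallback bit.
  std::size_t forwardee(std::size_t from, std::uint32_t header_bits) const {
    assert(is_fallback(header_bits));
    auto it = _table.find(from);
    assert(it != _table.end());
    return it->second;
  }

 private:
  std::unordered_map<std::size_t, std::size_t> _table;  // from-address -> to-address
};
```

Because the table is keyed by the from-address, no offset needs to fit into the header in this mode; the header only carries the two marker fields. The sketch takes no locks, which matches the statement that G1 serial compaction is single-threaded.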
> > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 99 commits: - Merge remote-tracking branch 'origin/JDK-8305896' into JDK-8305896 - Fix ASCII art - Merge branch 'master' into JDK-8305896 - Merge remote-tracking branch 'tschatzl/alt-fullgc-forwarding' into JDK-8305896 - Cleanup; remove commented out code, renames, "alt_region" -> "alternate" to match documentation, limit _region_word_size to heap size in the optimization case for better checking - cleanup - separate region bases - strength reduction - Fix asserts - @shipilev comments - ... and 89 more: https://git.openjdk.org/jdk/compare/14df5c13...c0147ca3 ------------- Changes: https://git.openjdk.org/jdk/pull/13582/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=35 Stats: 879 lines in 24 files changed: 841 ins; 0 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From kdnilsen at openjdk.org Mon May 8 18:41:49 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 8 May 2023 18:41:49 GMT Subject: RFR: DRAFT: Expand old on demand [v42] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > As currently drafted, there are regression failures. This DRAFT PR is published for the purpose of facilitating a careful code review. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix whitespace ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/53b4b5a2..3b32daf7 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=41 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=40-41 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From shade at openjdk.org Mon May 8 18:45:37 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 May 2023 18:45:37 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v35] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 12:41:40 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. 
This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions.
>>
>> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region.
>>
>> With this, forwarding information would be encoded like this:
>> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded.
>> - Bit 2: Used for 'fallback'-forwarding (see below)
>> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above)
>> - Bits 4..31 The number of heap words from the target base address
>>
>> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table.
>>
>> All the table accesses can be done unsynchronized because:
>> - Serial GC is single-threaded anyway
>> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions.
>> - G1 serial compaction is single-threaded
>>
>> The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes.
Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). >> >> The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). >> >> I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. >> >> Testing: >> - [x] hotspot_gc -UseAltGCForwarding >> - [x] hotspot_gc +UseAltGCForwarding >> - [x] tier1 -UseAltGCForwarding >> - [x] tier1 +UseAltGCForwarding >> - [x] tier2 -UseAltGCForwarding >> - [x] tier2 +UseAltGCForwarding > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix ASCII art Marked as reviewed by shade (Reviewer). 
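As an illustration of the encoding quoted above (bits 0..1 as the forwarded marker, bit 2 for fallback, bit 3 as the target-region selector, bits 4..31 as the word offset), here is a minimal, self-contained sketch. It is not the code under review: the class `SlidingForwardingSketch`, the `k...` constants and the flat `_bases` vector are assumptions made for illustration, and addresses are modelled as word indices rather than `HeapWord*`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constants mirroring the bit layout in the PR description.
constexpr std::uint32_t kForwardedBits = 0x3;       // bits 0..1: '11' marks "forwarded"
constexpr std::uint32_t kFallbackBit   = 1u << 2;   // bit 2: forwardee is in the fallback table
constexpr std::uint32_t kAltRegionBit  = 1u << 3;   // bit 3: selects target region 0 or 1
constexpr int           kOffsetShift   = 4;         // bits 4..31: word offset from target base
constexpr std::size_t   kUnusedBase    = SIZE_MAX;  // table slot not yet assigned

// Illustrative table with two target-region bases per source region.
class SlidingForwardingSketch {
 public:
  SlidingForwardingSketch(std::size_t num_regions, std::size_t region_size_words)
      : _region_size_words(region_size_words),
        _bases(num_regions * 2, kUnusedBase) {}

  // Encode "object at word index `from` slides to word index `to`".
  std::uint32_t encode(std::size_t from, std::size_t to) {
    std::size_t from_region = from / _region_size_words;
    std::size_t to_base = (to / _region_size_words) * _region_size_words;
    std::size_t* entry = &_bases[from_region * 2];
    std::uint32_t alt_bit = 0;
    if (entry[0] == kUnusedBase || entry[0] == to_base) {
      entry[0] = to_base;                 // primary target region
    } else {
      // A third distinct target would need the fallback table instead.
      assert(entry[1] == kUnusedBase || entry[1] == to_base);
      entry[1] = to_base;                 // alternate target region
      alt_bit = kAltRegionBit;
    }
    std::size_t offset = to - to_base;
    assert(offset < (std::size_t(1) << 28));  // must fit into bits 4..31
    return (static_cast<std::uint32_t>(offset) << kOffsetShift) | alt_bit | kForwardedBits;
  }

  // Decode the forwardee of the object at word index `from`.
  std::size_t decode(std::size_t from, std::uint32_t encoded) const {
    assert((encoded & kForwardedBits) == kForwardedBits);
    assert((encoded & kFallbackBit) == 0);    // fallback entries are resolved elsewhere
    std::size_t from_region = from / _region_size_words;
    std::size_t alt = (encoded & kAltRegionBit) != 0 ? 1 : 0;
    return _bases[from_region * 2 + alt] + (encoded >> kOffsetShift);
  }

 private:
  std::size_t _region_size_words;
  std::vector<std::size_t> _bases;  // [region * 2 + alt] -> base word index of target region
};
```

The decode side never needs the target region index itself: the source region is derived from the object's own address, and the single selector bit picks one of the two recorded bases. That is what lets the forwarding information fit into the low 32 bits of the header.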
src/hotspot/share/gc/shared/slidingForwarding.cpp line 60:

> 58: _region_size_words = heap.word_size();
> 59: _region_size_bytes_shift = log2i_exact(round_up_power_of_2(_region_size_words)) + LogHeapWordSize;
> 60: } else {

Indenting:

Suggestion:

    } else {

src/hotspot/share/gc/shared/slidingForwarding.cpp line 86:

> 84: _bases_table = NEW_C_HEAP_ARRAY(HeapWord*, max, mtGC);
> 85: _biased_bases[0] = _bases_table - _heap_start_region_bias;
> 86: _biased_bases[1] = _bases_table + _num_regions - _heap_start_region_bias;

Suggestion:

    HeapWord* biased_start = _bases_table - _heap_start_region_bias;
    _biased_bases[0] = biased_start;
    _biased_bases[1] = biased_start + _num_regions;

-------------

PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1417252142
PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1187728610
PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1187757619

From rkennke at openjdk.org Mon May 8 18:54:44 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 8 May 2023 18:54:44 GMT
Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v2]
In-Reply-To:
References:
Message-ID:

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>
> Main changes:
> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag.
> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there.
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor.
> - The identity hash-code is narrowed to 25 bits.
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset).
> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders.
>
> Testing:
> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.)
> - [x] tier1 (x86_64)
> - [x] tier2 (x86_64)
> - [x] tier3 (x86_64)
> - [ ] tier4 (x86_64)
> - [x] tier1 (aarch64)
> - [x] tier2 (aarch64)
> - [x] tier3 (aarch64)
> - [ ] tier4 (aarch64)
> - [ ] tier1 (x86_64) +UseCompactObjectHeaders
> - [ ] tier2 (x86_64) +UseCompactObjectHeaders
> - [ ] tier3 (x86_64) +UseCompactObjectHeaders
> - [ ] tier4 (x86_64) +UseCompactObjectHeaders
> - [ ] tier1 (aarch64) +UseCompactObjectHeaders
> - [ ] tier2 (aarch64) +UseCompactObjectHeaders
> - [ ] tier3 (aarch64) +UseCompactObjectHeaders
> - [ ] tier4 (aarch64) +UseCompactObjectHeaders

Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 23 commits: - Use new lightweight locking with compact headers - Merge branch 'JDK-8305898' into JDK-8305895 - Imporve GetObjectSizeIntrinsicsTest - Some GC fixes - Add BaseOffsets test - Check UseCompactObjectHeaders flag in TestPLABPromotion - Turn off UseCompactObjectHeaders by default - Fix typeArrayOop gtest - Fix OldLayoutCheck test - SA fix - ... and 13 more: https://git.openjdk.org/jdk/compare/15a8626b...2d580f8d ------------- Changes: https://git.openjdk.org/jdk/pull/13844/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=01 Stats: 1152 lines in 80 files changed: 920 ins; 71 del; 161 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From rkennke at openjdk.org Mon May 8 19:00:40 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 19:00:40 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor.
> - The identity hash-code is narrowed to 25 bits.
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset).
> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders.
>
> Testing:
> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.)
> - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Allow to resolve mark with LW locking ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13844/files - new: https://git.openjdk.org/jdk/pull/13844/files/2d580f8d..a258413b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From kdnilsen at openjdk.org Mon May 8 23:57:40 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 8 May 2023 23:57:40 GMT Subject: RFR: Changes to simplify MU accounting Message-ID: This represents a small part of the expand-old-on-demand branch. By itself, this may result in some performance regression because we no longer use MU (mutator utilization) reports to guide generation sizing decisions. On the other hand, the guidance provided by MU metrics does not seem to be a very accurate predictor of ideal generation sizes. 
------------- Commit messages: - Changes to simplify MU accounting Changes: https://git.openjdk.org/shenandoah/pull/274/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=274&range=00 Stats: 203 lines in 7 files changed: 135 ins; 17 del; 51 mod Patch: https://git.openjdk.org/shenandoah/pull/274.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/274/head:pull/274 PR: https://git.openjdk.org/shenandoah/pull/274 From qamai at openjdk.org Tue May 9 03:02:24 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 9 May 2023 03:02:24 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Mon, 8 May 2023 19:00:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. 
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset).
>> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders.
>>
>> Testing:
>> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.)
>> - [x] tier1 (x86_64)
>> - [x] tier2 (x86_64)
>> - [x] tier3 (x86_64)
>> - [ ] tier4 (x86_64)
>> - [x] tier1 (aarch64)
>> - [x] tier2 (aarch64)
>> - [x] tier3 (aarch64)
>> - [ ] tier4 (aarch64)
>> - [ ] tier1 (x86_64) +UseCompactObjectHeaders
>> - [ ] tier2 (x86_64) +UseCompactObjectHeaders
>> - [ ] tier3 (x86_64) +UseCompactObjectHeaders
>> - [ ] tier4 (x86_64) +UseCompactObjectHeaders
>> - [ ] tier1 (aarch64) +UseCompactObjectHeaders
>> - [ ] tier2 (aarch64) +UseCompactObjectHeaders
>> - [ ] tier3 (aarch64) +UseCompactObjectHeaders
>> - [ ] tier4 (aarch64) +UseCompactObjectHeaders
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Allow to resolve mark with LW locking

I'm not sure if this is trivial or significant, but if you limit the class pointer to 30 bits, and use the upper 2 bits for locking, then you can obtain the class pointer in fewer instructions:

    movl dst, [obj + 4]
    andl dst, 0xBFFFFFFF
    jl slow_path

This exploits the fact that the most significant bit represents a negative number, so it clears the unrelated bit and checks for a valid header at the same time. The sequence is only 2 instructions long after macro fusion, compared to the current value of 3.

This also allows quick class comparisons against constants. Assuming that most instances are in the unlocked state, the comparison when equality is likely can be done as:

    cmpl [obj + 4], con | 0x40000000
    jne slow_path

This can be matched on an `If` so that the `slow_path` can branch to the `IfTrue` label directly, and the fast path has only 1 comparison and 1 conditional jump.

Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13844#issuecomment-1539318097

From rkennke at openjdk.org Tue May 9 09:23:33 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 9 May 2023 09:23:33 GMT
Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3]
In-Reply-To:
References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com>
Message-ID:

On Tue, 9 May 2023 02:59:51 GMT, Quan Anh Mai wrote:

> I'm not sure if this is trivial or significant, but if you limit the class pointer to 30 bits, and use the upper 2 bits for locking, then you can obtain the class pointer in fewer instructions:
>
> ```
> movl dst, [obj + 4]
> andl dst, 0xBFFFFFFF
> jl slow_path
> ```
>
> This exploits the fact that the most significant bit represents a negative number, so it clears the unrelated bit and checks for a valid header at the same time. The sequence is only 2 instructions long after macro fusion, compared to the current value of 3.
>
> This also allows quick class comparisons against constants. Assuming that most instances are in the unlocked state, the comparison when equality is likely can be done as:
>
> ```
> cmpl [obj + 4], con | 0x40000000
> jne slow_path
> ```
>
> This can be matched on an `If` so that the `slow_path` can branch to the `IfTrue` label directly, and the fast path has only 1 comparison and 1 conditional jump.
>
> Thanks.

These are great suggestions!
I would shy away from doing it in this PR, though, because this also affects the locking subsystem and would cause quite intrusive changes and invalidate all the testing that we've done. Let's consider this in the Lilliput project and upstream the optimization separately, ok? Thanks! Roman ------------- PR Comment: https://git.openjdk.org/jdk/pull/13844#issuecomment-1539763841 From rkennke at openjdk.org Tue May 9 09:27:53 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 9 May 2023 09:27:53 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v37] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. 
> > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31: The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, where there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker processes a disjoint set of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag.
I think it would also be used for the self-forwarding change to be proposed soon (and separately). > > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header. > > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/gc/shared/slidingForwarding.cpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/gc/shared/slidingForwarding.cpp Co-authored-by: Aleksey Shipilëv ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/c0147ca3..840d2da7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=36 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=35-36 Stats: 4 lines in 1 file changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From forax at openjdk.org Tue May 9 10:12:18 2023 From: forax at openjdk.org (Rémi Forax) Date: Tue, 9 May 2023 10:12:18 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Mon, 8 May 2023 19:00:40
GMT, Roman Kennke wrote: >> This is the main body of JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.)
>> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow to resolve mark with LW locking This seems great as an intermediary step toward a 32 bits header. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13844#issuecomment-1539850759 From rkennke at openjdk.org Tue May 9 10:52:36 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 9 May 2023 10:52:36 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v33] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 10:12:41 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fix typos >> - Make FallbackTable an inner class of SlidingForwarding > > The https://github.com/openjdk/jdk/compare/master...tschatzl:jdk:alt-fullgc-forwarding?expand=1 branch now contains the promised cleanup. @tschatzl or anybody else: any concerns remaining with this PR? If not, I would integrate it later today (if the GHA are green). 
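For readers following the 8305896 thread, the header encoding described in the PR summary (bits 0 and 1 set to '11', bit 2 for fallback, bit 3 selecting one of two target regions, bits 4..31 holding a heap-word offset) can be sketched roughly as below. This is only an illustration under assumptions: the class name `SlidingForwardingSketch`, the constants, the 8-byte heap word and the per-region table layout are made up here and are not the code in slidingForwarding.cpp. The fallback hash table is not modelled; the sketch merely signals when it would be needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of the encoding; not the HotSpot implementation.
constexpr uint32_t FORWARDED_BITS      = 0x3;      // bits 0..1 == '11': object is forwarded
constexpr uint32_t FALLBACK_BIT        = 1u << 2;  // bit 2: forwardee lives in the fallback table
constexpr uint32_t REGION_SELECT_SHIFT = 3;        // bit 3: target region 0 or 1
constexpr uint32_t OFFSET_SHIFT        = 4;        // bits 4..31: offset in heap words
constexpr size_t   WORD_SIZE           = 8;        // assumed heap-word size in bytes

// One table entry per source region: up to two target-region base addresses.
struct RegionTargets {
  uintptr_t base[2] = {0, 0};
  int used = 0;
};

class SlidingForwardingSketch {
  uintptr_t heap_start_;
  unsigned region_shift_;              // log2 of the region size in bytes
  std::vector<RegionTargets> table_;   // N entries, N = number of regions

  size_t region_index(uintptr_t addr) const {
    return (addr - heap_start_) >> region_shift_;
  }
  uintptr_t region_base(uintptr_t addr) const {
    return heap_start_ + (region_index(addr) << region_shift_);
  }

 public:
  SlidingForwardingSketch(uintptr_t heap_start, size_t heap_bytes, unsigned region_shift)
    : heap_start_(heap_start), region_shift_(region_shift),
      table_((heap_bytes + (size_t(1) << region_shift) - 1) >> region_shift) {}

  // Returns the 32-bit forwarding value for an object moving from 'from' to 'to'.
  uint32_t encode(uintptr_t from, uintptr_t to) {
    RegionTargets& e = table_[region_index(from)];
    uintptr_t target = region_base(to);
    int slot = 0;
    while (slot < e.used && e.base[slot] != target) slot++;
    if (slot == 2) {
      // A third target region: the sliding assumption is broken, so a real
      // implementation would record (from, to) in a fallback table here.
      return FALLBACK_BIT | FORWARDED_BITS;
    }
    if (slot == e.used) { e.base[slot] = target; e.used++; }
    uint32_t offset_words = uint32_t((to - target) / WORD_SIZE);
    return (offset_words << OFFSET_SHIFT) |
           (uint32_t(slot) << REGION_SELECT_SHIFT) |
           FORWARDED_BITS;
  }

  // Recovers the forwardee address from a non-fallback forwarding value.
  uintptr_t decode(uintptr_t from, uint32_t encoded) const {
    assert((encoded & FORWARDED_BITS) == FORWARDED_BITS);
    assert((encoded & FALLBACK_BIT) == 0);
    const RegionTargets& e = table_[region_index(from)];
    int slot = int((encoded >> REGION_SELECT_SHIFT) & 1);
    return e.base[slot] + uintptr_t(encoded >> OFFSET_SHIFT) * WORD_SIZE;
  }
};
```

With 28 offset bits and 8-byte words, a region in this sketch could span up to 2 GB before the offset no longer fits, which is the kind of limit that makes equal-sized regions a precondition.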
------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1539946381 From ayang at openjdk.org Tue May 9 12:32:35 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 9 May 2023 12:32:35 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v38] In-Reply-To: <0FqSomGDy121kJ9qWGl8sv2A6eBYdyQm4gIJ2GRcghc=.d5d6d5fd-395f-4cc7-ac43-d821eb9f1b45@github.com> References: <0FqSomGDy121kJ9qWGl8sv2A6eBYdyQm4gIJ2GRcghc=.d5d6d5fd-395f-4cc7-ac43-d821eb9f1b45@github.com> Message-ID: On Tue, 9 May 2023 10:28:21 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix build Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1418513204 From eosterlund at openjdk.org Tue May 9 12:39:30 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 9 May 2023 12:39:30 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v38] In-Reply-To: <0FqSomGDy121kJ9qWGl8sv2A6eBYdyQm4gIJ2GRcghc=.d5d6d5fd-395f-4cc7-ac43-d821eb9f1b45@github.com> References: <0FqSomGDy121kJ9qWGl8sv2A6eBYdyQm4gIJ2GRcghc=.d5d6d5fd-395f-4cc7-ac43-d821eb9f1b45@github.com> Message-ID: On Tue, 9 May 2023 10:28:21 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix build The heap alignment guarantee blows up in my local testing. How did you test this?
src/hotspot/share/gc/shared/slidingForwarding.cpp line 68: > 66: _region_mask = ~((uintptr_t(1) << _region_size_bytes_shift) - 1); > 67: > 68: guarantee((_heap_start_region_bias << _region_size_bytes_shift) == (uintptr_t)_heap_start, "must be aligned"); This guarantee seems to blow up on Serial when the heap size isn't a power of two. ------------- Changes requested by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1418523391 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1188527103 From rkennke at openjdk.org Tue May 9 13:20:13 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 9 May 2023 13:20:13 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v38] In-Reply-To: References: <0FqSomGDy121kJ9qWGl8sv2A6eBYdyQm4gIJ2GRcghc=.d5d6d5fd-395f-4cc7-ac43-d821eb9f1b45@github.com> Message-ID: <_1zDYe4DUVzBN7GKjklSKgS1_re6uUAxx1RvN1bHumA=.521cc44b-9ced-4540-8d42-913607c906d7@github.com> On Tue, 9 May 2023 12:33:58 GMT, Erik Österlund wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix build > > src/hotspot/share/gc/shared/slidingForwarding.cpp line 68: > >> 66: _region_mask = ~((uintptr_t(1) << _region_size_bytes_shift) - 1); >> 67: >> 68: guarantee((_heap_start_region_bias << _region_size_bytes_shift) == (uintptr_t)_heap_start, "must be aligned"); > > This guarantee seems to blow up on Serial when the heap size isn't a power of two. Apparently this happens when the heap is not aligned properly. I integrated @tschatzl's fix and added a test config to TestSystemGCWithSerial.java that checks this situation.
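The guarantee quoted above boils down to a shift round-trip, which can be sketched in isolation as follows. The free functions `heap_start_is_region_aligned` and `region_mask` are hypothetical names for illustration only, and the sketch assumes the bias is simply the heap start shifted right by the region-size shift; the actual computation of `_heap_start_region_bias` is not shown in this thread.

```cpp
#include <cstdint>

// Hypothetical helper: mirrors the shape of the quoted guarantee.
// Shifting right drops the low 'shift' bits; shifting back left only
// reproduces heap_start if those low bits were all zero, i.e. if the
// heap start is aligned to the region size (1 << shift bytes).
inline bool heap_start_is_region_aligned(uintptr_t heap_start, unsigned shift) {
  uintptr_t bias = heap_start >> shift;
  return (bias << shift) == heap_start;
}

// Hypothetical helper: same expression as the quoted _region_mask line.
// ANDing an address with this mask yields the start of its aligned region.
inline uintptr_t region_mask(unsigned shift) {
  return ~((uintptr_t(1) << shift) - 1);
}
```

Under these assumptions, a heap start of 0x40080000 passes the check for 512 KB regions (shift 19) but fails it for 1 MB regions (shift 20), which is the kind of mismatch the new test configuration is meant to exercise.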
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1188580688 From rkennke at openjdk.org Tue May 9 13:20:10 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 9 May 2023 13:20:10 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v39] In-Reply-To: References: Message-ID: Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: @tschatzl's latest fix, cleanup and a test that checks unaligned heap problem ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/0d001db2..158e8b28 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=38 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=37-38 Stats: 21 lines in 2 files changed: 15 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From eosterlund at openjdk.org Tue May 9 16:17:59 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 9 May 2023 16:17:59 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v39] In-Reply-To: References: Message-ID: <--YtVZQ3mzi7VmHg2BAufLPcCof-22LduGi74dTjNHc=.8cdb3c01-bd7b-4afd-849c-76be3e20f15c@github.com> On Tue, 9 May 2023 13:20:10 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > @tschatzl's latest fix, cleanup and a test that checks unaligned heap problem src/hotspot/share/gc/shared/slidingForwarding.cpp line 166: > 164: // Set from and to in new or found entry. > 165: entry->_from = from; > 166: entry->_to = to; Does _from and _to change between lookups? In other words, if we first create a FallbackTableEntry and then next time find this existing FallbackTableEntry from the first lookup, shouldn't _from and _to be the same? The fact that setting _from and _to is done outside of the if, makes it look like they can and should mutate between lookups. But surely it will be the same values, or have I missed anything? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1188839952 From rrich at openjdk.org Tue May 9 16:17:59 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 9 May 2023 16:17:59 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v21] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 14:42:10 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists).
>> >> Big Endian support is implemented to some extent, but is not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bits (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE).
Update: This issue is resolved by 2nd commit. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Move ABIv2CallArranger out of linux subdirectory. ABIv1/2 does match the AIX/linux separation. src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 185: > 183: > 184: allocated_frame_size = align_up(allocated_frame_size, StackAlignmentInBytes); > 185: _frame_size_slots = allocated_frame_size >> LogBytesPerInt; `VMRegImpl::stack_slot_size` could be used when converting from size in bytes to size in slots. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1149547020 From rrich at openjdk.org Tue May 9 16:18:06 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 9 May 2023 16:18:06 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 10:44:03 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) 
I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: > > - Adaptation for JDK-8305668 > - Merge remote-tracking branch 'origin' into PPC64_Panama > - Move ABIv2CallArranger out of linux subdirectory. ABIv1/2 does match the AIX/linux separation. > - Adaptation for JDK-8303022.
> - Adaptation for JDK-8303684. > - Merge branch 'openjdk:master' into PPC64_Panama > - Merge branch 'master' into PPC64_Panama > - Fix Copyright format. > - Fix storing 32 bit integers into Java frames. Enable TestArrayStructs. > - Allow TestHFA to run on musl. Add Upcalls. > - ... and 14 more: https://git.openjdk.org/jdk/compare/3bba8995...725732a0 src/hotspot/cpu/ppc/frame_ppc.cpp line 219: > 217: UpcallStub* blob = _cb->as_upcall_stub(); > 218: JavaFrameAnchor* jfa = blob->jfa_for_frame(*this); > 219: return jfa->last_Java_sp() == NULL; Suggestion: return jfa->last_Java_sp() == nullptr; I'd suggest to do the same for all occurrences in the patch. src/hotspot/cpu/ppc/methodHandles_ppc.cpp line 316: > 314: // Load the invoker, as NEP -> .invoker > 315: __ verify_oop(nep_reg); > 316: __ ld(temp_target, jdk_internal_foreign_abi_NativeEntryPoint::downcall_stub_address_offset_in_bytes(), nep_reg); Other platforms use `access_load_at`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1177973466 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1177985899 From rrich at openjdk.org Tue May 9 16:18:09 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 9 May 2023 16:18:09 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v24] In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 12:54:55 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). 
>> >> Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE).
Update: This issue is resolved by 2nd commit. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Revert unintended formatting changes. Fix comment. src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 202: > 200: > 201: MacroAssembler* _masm = new MacroAssembler(&buffer); > 202: address start = __ function_entry(); // called by C If `!defined(ABI_ELFv2)` a function descriptor will be emitted here. It will be initialized with `friend_toc` and `friend_env`. But that's not correct for external callers, is it? If so, wouldn't an `Unimplemented()` be better than obscure crashes? src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 236: > 234: __ block_comment("{ receiver "); > 235: __ load_const_optimized(R3_ARG1, (intptr_t)receiver, R0); > 236: __ resolve_jobject(R3_ARG1, tmp, R31, MacroAssembler::PRESERVATION_FRAME_LR_GP_FP_REGS); // kills R31 As a simplification the receiver could be resolved in `UpcallLinker::on_entry` and returned in `JavaThread::_vm_result`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1179416614 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1180394508 From rrich at openjdk.org Tue May 9 16:17:52 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 9 May 2023 16:17:52 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v28] In-Reply-To: <9hDHgeACLaNP0lLQ7lXtWN07t6h4DDF5a9aaOTdvyMI=.932783da-eb49-4b9b-843b-fc564c6ffc41@github.com> References: <9hDHgeACLaNP0lLQ7lXtWN07t6h4DDF5a9aaOTdvyMI=.932783da-eb49-4b9b-843b-fc564c6ffc41@github.com> Message-ID: <17yuvkpMWGUKDER9SSdcPf2AP1b41i5P1Z907AOcfko=.7dc44835-eaaf-4fb1-a494-8109c7448297@github.com> On Sat, 6 May 2023 19:38:36 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". 
>> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size.
Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > libTestHFA: Add explicit type conversion to avoid build warning. Hi Martin, finally I've completed a pass over the hotspot part of the port. This seems a good point to share the few comments and questions I've collected so far. In general the changes do look very good. Hardly any shared code changes. Nice work! Cheers, Richard. src/hotspot/cpu/ppc/vmstorage_ppc.hpp line 81: > 79: case T_BYTE : > 80: case T_SHORT : > 81: case T_INT : segment_mask = REG32_MASK; break; I wonder why the segment_mask depends on `bt` on ppc? 
------------- PR Review: https://git.openjdk.org/jdk/pull/12708#pullrequestreview-1359503718 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1188808130 From rkennke at openjdk.org Tue May 9 16:29:34 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 9 May 2023 16:29:34 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v39] In-Reply-To: <--YtVZQ3mzi7VmHg2BAufLPcCof-22LduGi74dTjNHc=.8cdb3c01-bd7b-4afd-849c-76be3e20f15c@github.com> References: <--YtVZQ3mzi7VmHg2BAufLPcCof-22LduGi74dTjNHc=.8cdb3c01-bd7b-4afd-849c-76be3e20f15c@github.com> Message-ID: On Tue, 9 May 2023 16:15:08 GMT, Erik Österlund wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> @tschatzl's latest fix, cleanup and a test that checks unaligned heap problem > > src/hotspot/share/gc/shared/slidingForwarding.cpp line 166: > >> 164: // Set from and to in new or found entry. >> 165: entry->_from = from; >> 166: entry->_to = to; > > Does _from and _to change between lookups? In other words, if we first create a FallbackTableEntry and then next time find this existing FallbackTableEntry from the first lookup, shouldn't _from and _to be the same? The fact that setting _from and _to is done outside of the if, makes it look like they can and should mutate between lookups. But surely it will be the same values, or have I missed anything? Objects should be forwarded only once. The exception is G1 serial compaction, which re-forwards objects. However, I believe they should initially be forwarded 'normally' (by normal sliding compaction), and then only re-forwarded once, at which point they should land in the fallback-hashtable. I need to check it, but I believe we may be able to simplify this code and assume (and assert) that objects have not yet been forwarded in the fallback-table. Good find!
(BTW, the gtests that I added for this code take advantage of re-forwarding, so those would require a re-write) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1188853376 From wkemper at openjdk.org Tue May 9 16:58:39 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 9 May 2023 16:58:39 GMT Subject: RFR: Increase card offset mask Message-ID: On 32-bit platforms, a 512 byte card covers 128 words. On 64-bit platforms, a 1024 byte card (the maximum allowed) also covers 128 words. This change increases the size of the mask we use when encoding object start offsets to handle these situations. ------------- Commit messages: - Re-problem-list failing pipeline tests - Re-problem-list failing GHA tests - Increase card offset mask to cover 32-bit words and/or 1k cards Changes: https://git.openjdk.org/shenandoah/pull/275/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=275&range=00 Stats: 5 lines in 2 files changed: 2 ins; 1 del; 2 mod Patch: https://git.openjdk.org/shenandoah/pull/275.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/275/head:pull/275 PR: https://git.openjdk.org/shenandoah/pull/275 From wkemper at openjdk.org Tue May 9 17:02:43 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 9 May 2023 17:02:43 GMT Subject: RFR: Merge openjdk/jdk:master Message-ID: Merges tag jdk-21+21 ------------- Commit messages: - Merge tag 'jdk-21+21' into merge-jdk-21-21 - 8306014: Update javax.net.ssl TLS tests to use SSLContextTemplate or SSLEngineTemplate - 8068925: Add @Override in javax.tools classes - 8306836: Remove pinned tag for G1 heap regions - 8306933: C2: "assert(false) failed: infinite loop" failure - 8306042: C2: failed: Missed optimization opportunity in PhaseCCP (adding LShift->Cast->Add notification) - 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity - 8307005: Make CardTableBarrierSet::initialize non-virtual - 8306997: C2: "malformed control flow" 
assert due to missing safepoint on backedge with a switch - 8307106: Allow concurrent GCs to walk CLDG without ClassLoaderDataGraph_lock - ... and 86 more: https://git.openjdk.org/shenandoah/compare/84c36e65...b701c556 The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.org/?repo=shenandoah&pr=276&range=00.0 - openjdk/jdk:master: https://webrevs.openjdk.org/?repo=shenandoah&pr=276&range=00.1 Changes: https://git.openjdk.org/shenandoah/pull/276/files Stats: 34672 lines in 716 files changed: 20167 ins; 8751 del; 5754 mod Patch: https://git.openjdk.org/shenandoah/pull/276.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/276/head:pull/276 PR: https://git.openjdk.org/shenandoah/pull/276 From wkemper at openjdk.org Tue May 9 17:05:28 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 9 May 2023 17:05:28 GMT Subject: RFR: Merge openjdk/jdk:master [v2] In-Reply-To: References: Message-ID: > Merges tag jdk-21+21 William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 278 commits: - Merge tag 'jdk-21+21' into merge-jdk-21-21 Added tag jdk-21+21 for changeset 705ad7d8 - Alphabetize includes Reviewed-by: shade - Use distinct "end of cycle" message for each Shenandoah pause Reviewed-by: kdnilsen, ysr - Remove the remaining unclean upstream difference Reviewed-by: kdnilsen - Merge openjdk/jdk:master Reviewed-by: shade - Add generations to freeset Reviewed-by: ysr, wkemper - Cleanup NumberSeq additions Reviewed-by: ysr, rkennke - Remove redundant initialization code Reviewed-by: shade - Usage tracking cleanup Reviewed-by: kdnilsen, ysr - Use state of mark context itself to track stability of old mark bitmap Reviewed-by: ysr, shade - ... 
and 268 more: https://git.openjdk.org/shenandoah/compare/705ad7d8...b701c556 ------------- Changes: https://git.openjdk.org/shenandoah/pull/276/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=276&range=01 Stats: 19788 lines in 216 files changed: 17933 ins; 888 del; 967 mod Patch: https://git.openjdk.org/shenandoah/pull/276.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/276/head:pull/276 PR: https://git.openjdk.org/shenandoah/pull/276 From wkemper at openjdk.org Tue May 9 17:05:30 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 9 May 2023 17:05:30 GMT Subject: Integrated: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: On Tue, 9 May 2023 16:52:48 GMT, William Kemper wrote: > Merges tag jdk-21+21 This pull request has now been integrated. Changeset: 28f7c5d5 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/28f7c5d501aa2a1ba2a4bacadc31c579fd45613e Stats: 34672 lines in 716 files changed: 20167 ins; 8751 del; 5754 mod Merge openjdk/jdk:master ------------- PR: https://git.openjdk.org/shenandoah/pull/276 From rkennke at openjdk.org Tue May 9 18:45:53 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 9 May 2023 18:45:53 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v40] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. 
> > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new (experimental) flag -XX:[+|-]UseAltGCForwarding. This flag is not really intended to be used by end-users. 
Instead, I intend to programmatically enable it with compact object headers once they arrive (i.e. -XX:+UseCompactObjectHeaders would turn on -XX:+UseAltGCForwarding), and the flag is also useful for testing purposes. Once compact object headers become the default and only implementation, the flag and old implementation could be removed. Also, [JDK-8305898](https://bugs.openjdk.org/browse/JDK-8305898) would also use the same flag to enable an alternative self-forwarding approach (also in support of compact object headers). > > The change also adds a utility class GCForwarding which calls the old or new implementation based on the flag. I think it would also be used for the self-forwarding change to be proposed soon (and separately). > > I also experimented with a different forwarding approach that would use per-region hashtables, but shelved it for now, because performance was significantly worse than the sliding forwarding encoding. It will become useful later when we want to do 32bit compact object headers, because then, the sliding encoding will not be sufficient to hold forwarding pointers in the header.
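Editorial sketch (not part of the original message): the bit layout described above can be illustrated with a small, self-contained C++ model. Everything in it is an assumption made for illustration: the class name `SlidingForwardingSketch`, its methods, the use of plain integers as fake heap addresses, and the 8-byte heap word. It is not the code in `slidingForwarding.cpp`, and the fallback path is only signalled, not implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the encoding described above. Addresses are plain
// integers; nothing is dereferenced.
class SlidingForwardingSketch {
  static constexpr uint32_t  kForwardedBits  = 0x3;      // bits 0..1: '11' means forwarded
  static constexpr uint32_t  kFallbackBit    = 1u << 2;  // bit 2: look up in fallback table
  static constexpr int       kAltRegionShift = 3;        // bit 3: target region 0 or 1
  static constexpr int       kOffsetShift    = 4;        // bits 4..31: offset in heap words
  static constexpr size_t    kWordBytes      = 8;        // assumed 64-bit heap word
  static constexpr uintptr_t kUnused         = ~uintptr_t(0);

  uintptr_t heap_start_;
  size_t region_bytes_;
  std::vector<uintptr_t> bases_;  // two candidate target bases per source region

  size_t region_index(uintptr_t addr) const {
    return (addr - heap_start_) / region_bytes_;
  }

public:
  SlidingForwardingSketch(uintptr_t heap_start, size_t region_bytes, size_t num_regions)
    : heap_start_(heap_start), region_bytes_(region_bytes),
      bases_(2 * num_regions, kUnused) {
    // The word offset within a region must fit into 28 bits.
    assert(region_bytes / kWordBytes < (size_t(1) << 28));
  }

  // Returns the 32-bit value that would be stored in the low header bits.
  uint32_t encode(uintptr_t from, uintptr_t to) {
    uintptr_t to_base = heap_start_ + region_index(to) * region_bytes_;
    uintptr_t* slots = &bases_[2 * region_index(from)];
    uint32_t alt;
    if (slots[0] == to_base || slots[0] == kUnused) {
      slots[0] = to_base; alt = 0;
    } else if (slots[1] == to_base || slots[1] == kUnused) {
      slots[1] = to_base; alt = 1;
    } else {
      // A third target region: the real change would record the forwardee
      // in a fallback hash table; this sketch only sets the marker bit.
      return kFallbackBit | kForwardedBits;
    }
    uint32_t offset_words = static_cast<uint32_t>((to - to_base) / kWordBytes);
    return (offset_words << kOffsetShift) | (alt << kAltRegionShift) | kForwardedBits;
  }

  // Recovers the forwardee address from the source address and encoded bits.
  uintptr_t decode(uintptr_t from, uint32_t encoded) const {
    assert((encoded & kForwardedBits) == kForwardedBits);
    assert((encoded & kFallbackBit) == 0);
    uint32_t alt = (encoded >> kAltRegionShift) & 1u;
    uintptr_t base = bases_[2 * region_index(from) + alt];
    return base + static_cast<uintptr_t>(encoded >> kOffsetShift) * kWordBytes;
  }
};
```

The point of the sketch is that decoding needs only the source address (to pick the table entry) and the 32 encoded bits, which is why the scheme leaves the upper header bits untouched.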
> > Testing: > - [x] hotspot_gc -UseAltGCForwarding > - [x] hotspot_gc +UseAltGCForwarding > - [x] tier1 -UseAltGCForwarding > - [x] tier1 +UseAltGCForwarding > - [x] tier2 -UseAltGCForwarding > - [x] tier2 +UseAltGCForwarding Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix gtest: Align fake-heaps, avoid re-forwardings ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/158e8b28..69c78eba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=39 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=38-39 Stats: 43 lines in 2 files changed: 6 ins; 6 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From kdnilsen at openjdk.org Tue May 9 19:55:08 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 9 May 2023 19:55:08 GMT Subject: RFR: Increase card offset mask In-Reply-To: References: Message-ID: On Tue, 9 May 2023 16:51:30 GMT, William Kemper wrote: > On 32-bit platforms, a 512 byte card covers 128 words. On 64-bit platforms, a 1024 byte card (the maximum allowed) also covers 128 words. This change increases the size of the mask we use when encoding object start offsets to handle these situations. Marked as reviewed by kdnilsen (Committer). src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 394: > 392: // frequently than last byte. This is true when number of clean cards is greater than number of dirty cards. > 393: static const uint16_t ObjectStartsInCardRegion = 0x80; > 394: static const uint16_t FirstStartBits = 0x3f; Nice catch. BTW, the cross_map structure is no longer needed and neither is the last field, and the comment that makes mention of the "last" field can be updated. 
We really only need one byte to represent crossing information of each card in the remembered set. I suggest integrating this as is, and doing further refactoring as a future TODO. ------------- PR Review: https://git.openjdk.org/shenandoah/pull/275#pullrequestreview-1419360655 PR Review Comment: https://git.openjdk.org/shenandoah/pull/275#discussion_r1189069588 From shade at openjdk.org Tue May 9 20:03:34 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 9 May 2023 20:03:34 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Mon, 8 May 2023 19:00:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. 
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) >> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow to resolve mark with LW locking Partial, cursory read... src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2355: > 2353: // Simple test for basic type arrays > 2354: if (UseCompressedClassPointers) { > 2355: __ load_nklass(tmp, src); Is this entire thing a `cmp_klass`? x86 seems to do it with just `cmp_klass`.
src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2360: > 2358: } else { > 2359: __ ldr(tmp, Address(src, oopDesc::klass_offset_in_bytes())); > 2360: __ ldr(rscratch1, Address(dst, oopDesc::klass_offset_in_bytes())); Now that we inlined `src_klass_addr` and `dst_klass_addr` here, should we remove their definitions too? This would highlight if we have any paths that still use those addresses, perhaps by error? src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3581: > 3579: __ sub(r3, r3, BytesPerInt); > 3580: __ cbz(r3, initialize_header); > 3581: } Things like these need to be protected by `UseCompactObjectHeaders`, to make it abundantly clear the legacy paths are unaffected. src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3597: > 3595: __ mov(rscratch1, (intptr_t)markWord::prototype().value()); > 3596: __ str(rscratch1, Address(r0, oopDesc::mark_offset_in_bytes())); > 3597: __ store_klass(r0, r4); // store klass last Where is `__ store_klass_gap(r0, zr)` from the original code? src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp line 191: > 189: xorptr(t1, t1); > 190: movl(Address(obj, arrayOopDesc::length_offset_in_bytes() + sizeof(jint)), t1); > 191: } The relevant block is missing at least in AArch64, should it be there too? src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5157: > 5155: > 5156: void MacroAssembler::store_klass(Register dst, Register src, Register tmp) { > 5157: assert(!UseCompactObjectHeaders, "not with compact headers"); The assert like that should be in all arches? Missing at least in AArch64. src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5249: > 5247: #ifdef _LP64 > 5248: void MacroAssembler::store_klass_gap(Register dst, Register src) { > 5249: assert(!UseCompactObjectHeaders, "Don't use with compact headers"); The assert like that should be in all arches? Missing at least in AArch64. src/hotspot/cpu/x86/macroAssembler_x86.hpp line 377: > 375: > 376: // Compares the Klass pointer of two objects o1 and o2. 
Result is in the condition flags. > 377: // Uses t1 and t2 as temporary registers. Suggestion: // Uses tmp1 and tmp2 as temporary registers. ------------- PR Review: https://git.openjdk.org/jdk/pull/13844#pullrequestreview-1419326848 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189070425 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189047098 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189068421 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189069097 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189073559 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189075770 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189076056 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189076273 From wkemper at openjdk.org Tue May 9 20:43:26 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 9 May 2023 20:43:26 GMT Subject: RFR: Increase card offset mask [v2] In-Reply-To: References: Message-ID: <6h2D0ecBepau5jpQOI1g8OekefJbesmAZf5vDtHfyi8=.759444bb-ed9b-445e-bc48-f89b2c90c1de@github.com> > On 32-bit platforms, a 512 byte card covers 128 words. On 64-bit platforms, a 1024 byte card (the maximum allowed) also covers 128 words. This change increases the size of the mask we use when encoding object start offsets to handle these situations. William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge branch 'openjdk:master' into increase-card-offset-mask - Re-problem-list failing pipeline tests - Re-problem-list failing GHA tests - Increase card offset mask to cover 32-bit words and/or 1k cards ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/275/files - new: https://git.openjdk.org/shenandoah/pull/275/files/9280938f..8848cf2e Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=275&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=275&range=00-01 Stats: 34672 lines in 716 files changed: 20167 ins; 8751 del; 5754 mod Patch: https://git.openjdk.org/shenandoah/pull/275.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/275/head:pull/275 PR: https://git.openjdk.org/shenandoah/pull/275 From wkemper at openjdk.org Tue May 9 20:51:15 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 9 May 2023 20:51:15 GMT Subject: RFR: Increase card offset mask [v2] In-Reply-To: References: Message-ID: <-x09IiVzRQppCQD2nS2pdQOPNj4z0n_YcuF4g6gazAo=.2860996b-82f6-49e7-9b12-200d71493c1e@github.com> On Tue, 9 May 2023 19:52:12 GMT, Kelvin Nilsen wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into increase-card-offset-mask >> - Re-problem-list failing pipeline tests >> - Re-problem-list failing GHA tests >> - Increase card offset mask to cover 32-bit words and/or 1k cards > > src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 394: > >> 392: // frequently than last byte. This is true when number of clean cards is greater than number of dirty cards. 
>> 393: static const uint16_t ObjectStartsInCardRegion = 0x80; >> 394: static const uint16_t FirstStartBits = 0x3f; > > Nice catch. BTW, the cross_map structure is no longer needed and neither is the last field, and the comment that makes mention of the "last" field can be updated. We really only need one byte to represent crossing information of each card in the remembered set. > > I suggest integrating this as is, and doing further refactoring as a future TODO. Hmm, we are definitely using the `last` field in the `cross_map` struct. I'd have to study the code to understand how we could remove it. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/275#discussion_r1189117348 From mdoerr at openjdk.org Wed May 10 10:57:46 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 10:57:46 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v29] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.)
I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Replace NULL by nullptr.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/74586ab8..93060258 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=27-28 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From mdoerr at openjdk.org Wed May 10 11:05:36 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 11:05:36 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v21] In-Reply-To: References: Message-ID: On Mon, 27 Mar 2023 16:54:31 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Move ABIv2CallArranger out of linux subdirectory. ABIv1/2 does match the AIX/linux separation. > > src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 185: > >> 183: >> 184: allocated_frame_size = align_up(allocated_frame_size, StackAlignmentInBytes); >> 185: _frame_size_slots = allocated_frame_size >> LogBytesPerInt; > > `VMRegImpl::stack_slot_size` could be used when converting from size in bytes to size in slots. Yes, I think this would be better readable. But the following code should also be adapted, then: int framesize() const { return (_frame_size_slots >> (LogBytesPerWord - LogBytesPerInt)); } Maybe it makes sense to do some cleanup for all platforms. 
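[Editor's sketch, not part of the original message.] The byte-to-slot and slot-to-word conversions discussed in this reply can be checked in isolation. The following is a minimal sketch, assuming a 64-bit platform with 4-byte stack slots; `kStackSlotSize` and `kStackAlignment` are hypothetical stand-ins for `VMRegImpl::stack_slot_size` and `StackAlignmentInBytes`, and the local `align_up` is a simplified helper, not HotSpot's own.

```cpp
// Stand-ins for HotSpot constants on a 64-bit platform (assumed values).
constexpr int LogBytesPerInt  = 2;   // 4-byte int
constexpr int LogBytesPerWord = 3;   // 8-byte machine word
constexpr int kStackSlotSize  = 4;   // hypothetical mirror of VMRegImpl::stack_slot_size
constexpr int kStackAlignment = 16;  // hypothetical mirror of StackAlignmentInBytes

// Round value up to the next multiple of a power-of-two alignment.
constexpr int align_up(int value, int alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// Shift-based conversion, as in the quoted downcallLinker_ppc.cpp line.
constexpr int bytes_to_slots_shift(int bytes) { return bytes >> LogBytesPerInt; }

// Division-based conversion, as proposed in the review comment.
constexpr int bytes_to_slots_div(int bytes) { return bytes / kStackSlotSize; }

// Slots to words, as in the quoted framesize() accessor.
constexpr int slots_to_words(int slots) {
  return slots >> (LogBytesPerWord - LogBytesPerInt);
}

// Both conversions agree as long as a slot is exactly 4 bytes.
static_assert(align_up(100, kStackAlignment) == 112, "100 bytes round up to 112");
static_assert(bytes_to_slots_shift(112) == bytes_to_slots_div(112), "same slot count");
static_assert(slots_to_words(bytes_to_slots_div(112)) == 14, "28 slots are 14 words");
```

The two forms are interchangeable only while the slot size equals `1 << LogBytesPerInt`, which is why adapting one site and not the other would leave the code inconsistent.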
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189741783 From mdoerr at openjdk.org Wed May 10 11:05:43 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 11:05:43 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: References: Message-ID: On Wed, 26 Apr 2023 14:32:59 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: >> >> - Adaptation for JDK-8305668 >> - Merge remote-tracking branch 'origin' into PPC64_Panama >> - Move ABIv2CallArranger out of linux subdirectory. ABIv1/2 does match the AIX/linux separation. >> - Adaptation for JDK-8303022. >> - Adaptation for JDK-8303684. >> - Merge branch 'openjdk:master' into PPC64_Panama >> - Merge branch 'master' into PPC64_Panama >> - Fix Copyright format. >> - Fix storing 32 bit integers into Java frames. Enable TestArrayStructs. >> - Allow TestHFA to run on musl. Add Upcalls. >> - ... and 14 more: https://git.openjdk.org/jdk/compare/3bba8995...725732a0 > > src/hotspot/cpu/ppc/frame_ppc.cpp line 219: > >> 217: UpcallStub* blob = _cb->as_upcall_stub(); >> 218: JavaFrameAnchor* jfa = blob->jfa_for_frame(*this); >> 219: return jfa->last_Java_sp() == NULL; > > Suggestion: > > return jfa->last_Java_sp() == nullptr; > > I'd suggest to do the same for all occurrences in the patch. Good catch! I've replaced them. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189742225 From mdoerr at openjdk.org Wed May 10 11:16:44 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 11:16:44 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: References: Message-ID: On Wed, 26 Apr 2023 14:41:51 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: >> >> - Adaptation for JDK-8305668 >> - Merge remote-tracking branch 'origin' into PPC64_Panama >> - Move ABIv2CallArranger out of linux subdirectory. ABIv1/2 does match the AIX/linux separation. >> - Adaptation for JDK-8303022. >> - Adaptation for JDK-8303684. >> - Merge branch 'openjdk:master' into PPC64_Panama >> - Merge branch 'master' into PPC64_Panama >> - Fix Copyright format. >> - Fix storing 32 bit integers into Java frames. Enable TestArrayStructs. >> - Allow TestHFA to run on musl. Add Upcalls. >> - ... and 14 more: https://git.openjdk.org/jdk/compare/3bba8995...725732a0 > > src/hotspot/cpu/ppc/methodHandles_ppc.cpp line 316: > >> 314: // Load the invoker, as NEP -> .invoker >> 315: __ verify_oop(nep_reg); >> 316: __ ld(temp_target, jdk_internal_foreign_abi_NativeEntryPoint::downcall_stub_address_offset_in_bytes(), nep_reg); > > Other platforms use `access_load_at`. Interesting. I have no idea why. It does the same but with a more complicated API. I just noticed that other platforms use `NONZERO`. I think I should at least add that. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189752212 From mdoerr at openjdk.org Wed May 10 11:19:34 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 11:19:34 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v24] In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 16:19:46 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert unintended formatting changes. Fix comment. > > src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 202: > >> 200: >> 201: MacroAssembler* _masm = new MacroAssembler(&buffer); >> 202: address start = __ function_entry(); // called by C > > If `!defined(ABI_ELFv2)` a function descriptor will be emitted here. It will be initialized with `friend_toc` and `friend_env`. But that's not correct for external callers, is it? If so, wouldn't an `Unimplemented()` be better than obscure crashes? No, this code is correct and tested (I have a partially working Big Endian patch). `toc` and `env` are loaded by the external caller (C code), but not used by the stub. So, we don't need to initialize them to any specific values. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189755698 From mdoerr at openjdk.org Wed May 10 11:26:37 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 11:26:37 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v24] In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 13:18:27 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert unintended formatting changes. Fix comment. 
> > src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 236: > >> 234: __ block_comment("{ receiver "); >> 235: __ load_const_optimized(R3_ARG1, (intptr_t)receiver, R0); >> 236: __ resolve_jobject(R3_ARG1, tmp, R31, MacroAssembler::PRESERVATION_FRAME_LR_GP_FP_REGS); // kills R31 > > As a simplification the receiver could be resolved in `UpcallLinker::on_entry` and returned in `JavaThread::_vm_result`. This sounds like a nice enhancement proposal for all platforms. The register spilling code in `resolve_jobject` can get lengthy dependent on the selected GC. Doing it in the C code (which we call anyway above) would make the upcall stubs smaller. @JornVernee: What do you think about this idea? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189763910 From shade at openjdk.org Wed May 10 11:28:53 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 10 May 2023 11:28:53 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Wed, 10 May 2023 10:26:14 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow to resolve mark with LW locking > > test/jdk/java/lang/instrument/GetObjectSizeIntrinsicsTest.java line 376: > >> 374: private static long expectedSmallObjSize() { >> 375: long size; >> 376: if (!Platform.is64bit() || WhiteBox.getWhiteBox().getBooleanVMFlag("UseCompactObjectHeaders")) { > > `private static final boolean COMPACT_HEADERS = ...` again? Would make a negation cleaner too. I am surprised other tests are not affected by this... 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189703116 From shade at openjdk.org Wed May 10 11:28:53 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 10 May 2023 11:28:53 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Mon, 8 May 2023 19:00:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16.
#11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) >> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow to resolve mark with LW locking Another round! src/hotspot/cpu/x86/sharedRuntime_x86.cpp line 75: > 73: uintptr_t mask_in_place = UseCompactObjectHeaders ? markWord::hash_mask_in_place_compact : markWord::hash_mask_in_place; > 74: __ shrptr(result, shift); > 75: __ andptr(result, mask); Please conditionalize it with `if (UseCompactObjectHeaders)` to make the distinction between the paths cleaner. src/hotspot/cpu/x86/templateTable_x86.cpp line 4034: > 4032: // The object is initialized before the header. If the object size is > 4033: // zero, go directly to the header initialization. 
> 4034: int header_size = align_up(oopDesc::base_offset_in_bytes(), BytesPerLong); Please conditionalize with `if (UseCompactObjectHeaders)`. src/hotspot/cpu/x86/templateTable_x86.cpp line 4068: > 4066: __ pop(rcx); // get saved klass back in the register. > 4067: __ movptr(rbx, Address(rcx, Klass::prototype_header_offset())); > 4068: __ movptr(Address(rax, oopDesc::mark_offset_in_bytes ()), rbx); Suggestion: __ movptr(Address(rax, oopDesc::mark_offset_in_bytes()), rbx); src/hotspot/cpu/x86/x86_64.ad line 5346: > 5344: __ jcc(Assembler::notZero, stub->entry()); > 5345: __ bind(stub->continuation()); > 5346: __ shrq(dst, markWord::klass_shift); Any reason not to do this thing in `MacroAssembler`? src/hotspot/share/gc/parallel/psOldGen.cpp line 398: > 396: > 397: virtual void do_object(oop obj) { > 398: HeapWord* test_addr = cast_from_oop(obj); I thought this `+1` is specifically to test that `object_start` is able to find the object header when given the interior pointer. See the `guarantee`-s in the next lines. src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 250: > 248: Copy::aligned_disjoint_words(cast_from_oop(o), cast_from_oop(new_obj), new_obj_size); > 249: > 250: if (!new_obj->mark().is_marked()) { Oh, this is the same as for Shenandoah. See the comment there: probably want to condition this on `UseCompactObjectHeaders`. src/hotspot/share/gc/serial/markSweep.cpp line 175: > 173: } > 174: > 175: ContinuationGCSupport::transform_stack_chunk(obj); Add a comment, something like: "// Do the transform while we still have the header intact, which might include important class information". src/hotspot/share/gc/shared/collectedHeap.cpp line 232: > 230: // With compact headers, we can't safely access the class, due > 231: // to possibly forwarded objects. > 232: if (!UseCompactObjectHeaders && is_in(object->klass_raw())) { Looks good, but what this even supposed to check? `object` is not `oop` if its klass field points into Java heap? Huh? 
Was it some CMS shenanigan that stores something in klass word? Or is it just a glorified null check? I'll follow up on that separately. src/hotspot/share/gc/shared/collectedHeap.hpp line 312: > 310: > 311: virtual void fill_with_dummy_object(HeapWord* start, HeapWord* end, bool zap); > 312: static size_t min_dummy_object_size() { Why this change? src/hotspot/share/gc/shared/gc_globals.hpp line 692: > 690: constraint(GCCardSizeInBytesConstraintFunc,AtParse) \ > 691: \ > 692: product(bool, UseAltGCForwarding, false, \ Should it be `EXPERIMENTAL`? src/hotspot/share/gc/shared/memAllocator.cpp line 414: > 412: // concurrent collectors. > 413: if (UseCompactObjectHeaders) { > 414: oopDesc::release_set_mark(mem, _klass->prototype_header()); In other cases, we do `markWord::prototype().set_narrow_klass(nk)` -- it looks safer, as we get the `markWord`-s prototype, and amend it. `_klass->prototype_header` can be removed, I think. src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 326: > 324: oop copy_val = cast_to_oop(copy); > 325: if (!copy_val->mark().is_marked()) { > 326: // If we copied a mark-word that indicates 'forwarded' state, then Ouch. This is only the problem with `UseCompactObjectHeaders`, right? Can additionally conditionalize on that, so that legacy code path stays the same. src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 309: > 307: _loc = obj; > 308: Klass* klass = obj->forward_safe_klass(); > 309: obj->oop_iterate_backwards(this, klass); Why `backwards`? src/hotspot/share/gc/z/zRelocate.cpp line 339: > 337: if (SuspendibleThreadSet::should_yield()) { > 338: SuspendibleThreadSet::yield(); > 339: } These should be upstreamed separately, like we did with Shenandoah STS additions? 
src/hotspot/share/oops/klass.hpp line 675: > 673: void set_is_cloneable(); > 674: > 675: markWord prototype_header() const { Suggestion: markWord prototype_header() const { src/hotspot/share/oops/markWord.hpp line 238: > 236: uintptr_t mask = UseCompactObjectHeaders ? hash_mask_compact : hash_mask; > 237: int shift = UseCompactObjectHeaders ? hash_shift_compact : hash_shift; > 238: uintptr_t tmp = value() & ~mask_in_place; No parenthesis left behind Suggestion: uintptr_t tmp = value() & (~mask_in_place); src/hotspot/share/oops/oop.cpp line 176: > 174: return obj->klass(); > 175: } else > 176: #endif Here and everywhere else: if we add `else` under define, we need to wrap the non-ifdefed code with `{}`. #ifdef _LP64 if (...) { ... } else #endif { ... } src/hotspot/share/oops/oop.hpp line 124: > 122: inline size_t size_given_klass(Klass* klass); > 123: > 124: // The following set of methods is used to access the mark-word and related So, these are done to avoid introducing branches on the paths where objects are definitely _not_ forwarded? Are there fewer places than where we expect forwardings? Maybe the better way would be to make all methods handle the occasional forwarding, and then provide the methods that provide the _fast-path_, like `fast_mark`, `fast_class`, etc? src/hotspot/share/oops/oop.hpp line 130: > 128: // those methods can not deal with the sliding-forwarding that is used > 129: // in Serial, G1 and Shenandoah full-GCs. > 130: inline markWord forward_safe_mark() const; `forward_safe_mark` seems to be only used in `forward_safe_klass`, can be inlined/simplified there? src/hotspot/share/oops/oop.hpp line 346: > 344: // load the narrowKlass from the header. > 345: STATIC_ASSERT(markWord::klass_shift % 8 == 0); > 346: return mark_offset_in_bytes() + markWord::klass_shift / 8; There is a convenient `BitsPerByte` constant for this -- makes it clear we are converting bits to bytes. 
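[Editor's sketch, not part of the original message.] The `BitsPerByte` remark can be illustrated with a small compile-time check. This is a minimal sketch, assuming the narrow Klass* occupies the upper 32 bits of a 64-bit mark word (as the PR description states) and that the mark word sits at offset 0; `kKlassShift` and `kMarkOffsetInBytes` are hypothetical stand-ins for `markWord::klass_shift` and `oopDesc::mark_offset_in_bytes()`.

```cpp
// Assumed layout values for this sketch only.
constexpr int BitsPerByte        = 8;
constexpr int kKlassShift        = 32;  // hypothetical mirror of markWord::klass_shift
constexpr int kMarkOffsetInBytes = 0;   // hypothetical mirror of oopDesc::mark_offset_in_bytes()

// The shift must land on a byte boundary, otherwise the narrow Klass*
// could not be loaded with a plain byte-addressed 32-bit load.
static_assert(kKlassShift % BitsPerByte == 0, "klass field must be byte-aligned");

// Bits-to-bytes conversion written with the named constant instead of a bare 8.
constexpr int klass_offset_in_bytes() {
  return kMarkOffsetInBytes + kKlassShift / BitsPerByte;
}

static_assert(klass_offset_in_bytes() == 4, "upper half of the mark word starts at byte 4");
```

The result of 4 describes the upper 32 bits only on a little-endian layout; on a big-endian target the upper half of the word would sit at the lower address.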
src/hotspot/share/oops/oop.hpp line 362: > 360: // With compact headers, the Klass* field is not used for the Klass* > 361: // and is used for the object fields instead. > 362: assert(sizeof(markWord) == 8, "sanity"); `STATIC_ASSERT`? src/hotspot/share/oops/oop.inline.hpp line 106: > 104: #ifdef _LP64 > 105: if (UseCompactObjectHeaders) { > 106: assert(UseCompressedClassPointers, "expect compressed klass pointers"); Here and in other places in this file, it seems redundant to check `UseCompressedClassPointers`, as argument checking code makes sure we are in the right mode? We don't seem to check it consistently anyway. src/hotspot/share/oops/oop.inline.hpp line 261: > 259: > 260: markWord oopDesc::forward_safe_mark() const { > 261: markWord mrk = mark(); Convention: `mrk` -> `m`. src/hotspot/share/oops/oop.inline.hpp line 491: > 489: template > 490: void oopDesc::oop_iterate_backwards(OopClosureType* cl, Klass* k) { > 491: // We cannot safely access the Klass* with compact headers here. This comment is only about the assert itself, right? If so, then: // In this assert, ... src/hotspot/share/oops/typeArrayKlass.cpp line 231: > 229: > 230: size_t TypeArrayKlass::oop_size(oop obj) const { > 231: assert(obj->is_typeArray(),"must be a type array"); So, these checks are removed because they reach for class, which we cannot use with `UseCompactObjectHeaders`? Better to disable the assert at this time, I think, instead of removing it. src/hotspot/share/oops/typeArrayKlass.inline.hpp line 38: > 36: > 37: inline void TypeArrayKlass::oop_oop_iterate_impl(oop obj, OopIterateClosure* closure) { > 38: // We cannot safely access the Klass* with compact headers. Same thing about assert? "In this assert, ..." 
src/hotspot/share/opto/callnode.cpp line 1579: > 1577: Node* klass_node = in(AllocateNode::KlassNode); > 1578: Node* proto_adr = phase->transform(new AddPNode(klass_node, klass_node, phase->MakeConX(in_bytes(Klass::prototype_header_offset())))); > 1579: mark_node = LoadNode::make(*phase, control, mem, proto_adr, TypeRawPtr::BOTTOM, TypeX_X, TypeX_X->basic_type(), MemNode::unordered); Note to self: This load is probably foldable if we know the klass is constant -- which it almost always is on allocation paths. I'll check and fix if it is not. src/hotspot/share/runtime/arguments.cpp line 3120: > 3118: > 3119: #ifdef _LP64 > 3120: if (!FLAG_IS_DEFAULT(UseCompactObjectHeaders)) { Just `if (UseCompactObjectHeaders)`, or do I miss something? src/hotspot/share/runtime/arguments.cpp line 3128: > 3126: warning("Compact object headers require compressed class pointers. Disabling compact object headers."); > 3127: FLAG_SET_DEFAULT(UseCompactObjectHeaders, false); > 3128: } I think you need to print the warning when the user overrides it, but disable it _even if the user did not override it_ (e.g. there is a flag selection bug somewhere in platform-specific VM code). Suggestion: if (UseCompactObjectHeaders && !UseCompressedClassPointers) { if (FLAG_IS_CMDLINE(UseCompressedClassPointers)) { warning("Compact object headers require compressed class pointers. Disabling compact object headers."); } FLAG_SET_DEFAULT(UseCompactObjectHeaders, false); } src/hotspot/share/runtime/arguments.cpp line 3130: > 3128: } > 3129: > 3130: if (UseCompactObjectHeaders && !UseHeavyMonitors) { Should this check `LockingMode`?
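[Editor's sketch, not part of the original message.] The `arguments.cpp` line 3128 suggestion above separates two decisions: the dependent flag is always switched off when its prerequisite is missing, while the warning is emitted only for a command-line override. The following is a minimal stand-alone model of that control flow; `BoolFlag`, `Flags` and `reconcile` are hypothetical names, not HotSpot APIs, and the command-line check deliberately mirrors the flag tested in the quoted suggestion.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a VM flag: its value plus whether the user set it.
struct BoolFlag {
  bool value;
  bool set_on_cmdline;
};

struct Flags {
  BoolFlag compact_object_headers;
  BoolFlag compressed_class_pointers;
};

// Returns the warnings that would be printed; always repairs the flag state.
inline std::vector<std::string> reconcile(Flags& f) {
  std::vector<std::string> warnings;
  if (f.compact_object_headers.value && !f.compressed_class_pointers.value) {
    // Warn only when the conflict comes from an explicit command-line choice.
    if (f.compressed_class_pointers.set_on_cmdline) {
      warnings.push_back("Compact object headers require compressed class pointers. "
                         "Disabling compact object headers.");
    }
    // Disable unconditionally, so an ergonomics bug elsewhere cannot leave
    // the VM in an inconsistent configuration.
    f.compact_object_headers.value = false;
  }
  return warnings;
}
```

With this shape, a conflict produced by platform-specific flag selection is repaired silently, while a user who explicitly disabled compressed class pointers is told why compact headers were turned off.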
src/hotspot/share/runtime/globals.hpp line 133: > 131: \ > 132: product(bool, UseCompactObjectHeaders, false, EXPERIMENTAL, \ > 133: "Use compact 64-bit object headers in 64-bit VM") \ Suggestion: product(bool, UseCompactObjectHeaders, false, EXPERIMENTAL, \ "Use compact 64-bit object headers in 64-bit VM") \ src/hotspot/share/runtime/globals.hpp line 1067: > 1065: "If true, error data is printed to stdout instead of a file") \ > 1066: \ > 1067: product(bool, UseHeavyMonitors, false, \ Why back to `product`? src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Array.java line 87: > 85: if (VM.getVM().isCompactObjectHeadersEnabled()) { > 86: lengthOffsetInBytes = Oop.getHeaderSize(); > 87: } else if (VM.getVM().isCompressedKlassPointersEnabled()) { Suggestion: } else if (VM.getVM().isCompressedKlassPointersEnabled()) { src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Oop.java line 96: > 94: if (VM.getVM().isCompactObjectHeadersEnabled()) { > 95: assert(VM.getVM().isCompressedKlassPointersEnabled()); > 96: return getKlass(getMark()); Suggestion: assert(VM.getVM().isCompressedKlassPointersEnabled()); return getKlass(getMark()); test/hotspot/jtreg/gc/g1/plab/TestPLABPromotion.java line 77: > 75: private static final int OBJECT_SIZE_SMALL = 10; > 76: private static final int OBJECT_SIZE_MEDIUM = 100; > 77: private static final int OBJECT_SIZE_HIGH = (Platform.is64bit() && WhiteBox.getWhiteBox().getBooleanVMFlag("UseCompactObjectHeaders")) ? 3266 : 3250; Please pull this into a separate flag, `private static final boolean COMPACT_OBJECT_HEADERS = ...` test/hotspot/jtreg/runtime/FieldLayout/BaseOffsets.java line 62: > 60: > 61: // @0: 8 byte header, @8: int field > 62: static final long INT_OFFSET; What is that comment supposed to mean?
Suggestion: // @0: 8 byte header, @8: int field static final long INT_OFFSET; test/jdk/java/lang/instrument/GetObjectSizeIntrinsicsTest.java line 376: > 374: private static long expectedSmallObjSize() { > 375: long size; > 376: if (!Platform.is64bit() || WhiteBox.getWhiteBox().getBooleanVMFlag("UseCompactObjectHeaders")) { `private static final boolean COMPACT_HEADERS = ...` again? Would make a negation cleaner too. ------------- PR Review: https://git.openjdk.org/jdk/pull/13844#pullrequestreview-1420143614 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189593168 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189596357 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189598288 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189602383 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189752465 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189754485 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189766197 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189719706 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189707883 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189707140 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189619618 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189763083 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189763521 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189626626 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189630917 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189651805 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189654298 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189750442 PR Review Comment: 
https://git.openjdk.org/jdk/pull/13844#discussion_r1189747596 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189663098 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189663671 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189673370 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189675459 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189680333 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189743882 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189684020 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189689577 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189671105 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189670590 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189672200 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189634284 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189635121 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189694294 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189696621 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189700925 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189703623 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189702226 From mdoerr at openjdk.org Wed May 10 11:30:36 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 11:30:36 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v28] In-Reply-To: <17yuvkpMWGUKDER9SSdcPf2AP1b41i5P1Z907AOcfko=.7dc44835-eaaf-4fb1-a494-8109c7448297@github.com> References: <9hDHgeACLaNP0lLQ7lXtWN07t6h4DDF5a9aaOTdvyMI=.932783da-eb49-4b9b-843b-fc564c6ffc41@github.com> 
<17yuvkpMWGUKDER9SSdcPf2AP1b41i5P1Z907AOcfko=.7dc44835-eaaf-4fb1-a494-8109c7448297@github.com> Message-ID: On Tue, 9 May 2023 15:48:52 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> libTestHFA: Add explicit type conversion to avoid build warning. > > src/hotspot/cpu/ppc/vmstorage_ppc.hpp line 81: > >> 79: case T_BYTE : >> 80: case T_SHORT : >> 81: case T_INT : segment_mask = REG32_MASK; break; > > I wonder why the segment_mask depends on `bt` on ppc? The usage of the `segment_mask` can be defined for each platform. I'm using it to encode the information if a value on the Java side uses a 32 or 64 bit slot. In case of 32 bit values, the C side requires all 64 register bits to get defined values (ints get sign extended, floats get converted to double-precision format). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189768204 From rkennke at openjdk.org Wed May 10 12:35:47 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 12:35:47 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v4] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor.
> - The identity hash-code is narrowed to 25 bits.
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset).
> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders.
>
> Testing:
> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.)
> - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: @shipilev comments, round 1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13844/files - new: https://git.openjdk.org/jdk/pull/13844/files/a258413b..b39b71b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=02-03 Stats: 47 lines in 6 files changed: 29 ins; 10 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From rkennke at openjdk.org Wed May 10 12:46:40 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 12:46:40 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. 
Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor.
> - The identity hash-code is narrowed to 25 bits.
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset).
> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders.
>
> Testing:
> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.)
> - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13844/files - new: https://git.openjdk.org/jdk/pull/13844/files/b39b71b9..8761447f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From rkennke at openjdk.org Wed May 10 13:04:33 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 13:04:33 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: <7glkxk6JYyrYABi14s2CLrCBPeWy_6AChGZ9Gik-Nmc=.21eed5d4-90b5-4d12-aeb2-dab129c14d0d@github.com> On Wed, 10 May 2023 09:24:13 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow to resolve mark with LW locking > > src/hotspot/share/gc/shared/memAllocator.cpp line 414: > >> 412: // concurrent collectors. 
>> 413: if (UseCompactObjectHeaders) { >> 414: oopDesc::release_set_mark(mem, _klass->prototype_header()); > > In other cases, we do `markWord::prototype().set_narrow_klass(nk)` -- it looks safer, as we get the `markWord`-s prototype, and amend it. `_klass->prototype_header` can be removed, I think. I like _klass->prototype_header() more, and would argue that we should use that instead, here. An object's prototype mark really depends on the Klass of the object, with compact headers, and we would always get the correct prototype out of _klass->prototype_header(). Also, perhaps more importantly, the Klass::prototype_header() is useful because we can load it in generated code with a single instruction, while fetching the markWord::prototype() and amending it *at runtime* would require a whole sequence of instructions. We only use markWord::prototype().set_narrow_klass(nk) in CDS, where the correct encoding of the narrow-klass in the header depends on the relocated Klass* location, and we couldn't safely use Klass::prototype_header(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189873545 From kbarrett at openjdk.org Wed May 10 13:09:31 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 10 May 2023 13:09:31 GMT Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends Message-ID: Please review this renaming of Atomic::fetch_and_add and friends to be consistent with the naming convention recently chosen for atomic bitops. 
That is, make the following name changes for class Atomic and its implementation:

- fetch_and_add => fetch_then_add
- add_and_fetch => add_then_fetch

Testing:
mach5 tier1-3
GHA testing

-------------

Commit messages:
 - rename in tests
 - rename uses
 - rename impl

Changes: https://git.openjdk.org/jdk/pull/13896/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13896&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8307806
  Stats: 156 lines in 39 files changed: 0 ins; 0 del; 156 mod
  Patch: https://git.openjdk.org/jdk/pull/13896.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13896/head:pull/13896

PR: https://git.openjdk.org/jdk/pull/13896

From jvernee at openjdk.org Wed May 10 13:10:22 2023
From: jvernee at openjdk.org (Jorn Vernee)
Date: Wed, 10 May 2023 13:10:22 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22]
In-Reply-To: References: Message-ID:

On Wed, 10 May 2023 11:13:14 GMT, Martin Doerr wrote:

> It does the same but with a more complicated API.

AFAIK it depends on the GC that's being used. `access_load_at` will make sure the right GC barriers are inserted (mostly for concurrent GCs).
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189885698

From jvernee at openjdk.org Wed May 10 13:16:27 2023
From: jvernee at openjdk.org (Jorn Vernee)
Date: Wed, 10 May 2023 13:16:27 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v24]
In-Reply-To: References: Message-ID:

On Wed, 10 May 2023 11:23:04 GMT, Martin Doerr wrote:

>> src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 236:
>>
>>> 234: __ block_comment("{ receiver ");
>>> 235: __ load_const_optimized(R3_ARG1, (intptr_t)receiver, R0);
>>> 236: __ resolve_jobject(R3_ARG1, tmp, R31, MacroAssembler::PRESERVATION_FRAME_LR_GP_FP_REGS); // kills R31
>>
>> As a simplification the receiver could be resolved in `UpcallLinker::on_entry` and returned in `JavaThread::_vm_result`.
>
> This sounds like a nice enhancement proposal for all platforms. The register spilling code in `resolve_jobject` can get lengthy depending on the selected GC. Doing it in the C code (which we call anyway above) would make the upcall stubs smaller.
> @JornVernee: What do you think about this idea?

Seems like a nice idea. The resolution here predates the time when we called into the VM.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189892999

From rrich at openjdk.org Wed May 10 13:27:33 2023
From: rrich at openjdk.org (Richard Reingruber)
Date: Wed, 10 May 2023 13:27:33 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22]
In-Reply-To: References: Message-ID:

On Wed, 10 May 2023 13:07:35 GMT, Jorn Vernee wrote:

>> Interesting. I have no idea why. It does the same but with a more complicated API.
>> I just noticed that other platforms use `NONZERO`. I think I should at least add that.
>
>> It does the same but with a more complicated API.
>
> AFAIK it depends on the GC that's being used.
`access_load_at` will make sure the right GC barriers are inserted (mostly for concurrent GCs). As I see it, the access API is an abstraction to be used instead of raw loads. It hides details. See for instance `TemplateTable::getfield_or_static` on x86 where it is also used. PPC lags behind in making use of the access API. With a fancy new GC the oop in nep_reg could be stale, requiring some processing which would be taken care of by using the access API. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189905194 From mdoerr at openjdk.org Wed May 10 13:27:33 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 13:27:33 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 13:22:48 GMT, Richard Reingruber wrote: >>> It does the same but with a more complicated API. >> >> AFAIK It depends on the GC that's being used. `access_load_at` will make sure the right GC barriers are inserted (mostly for concurrent GCs). > > As I see it, the access API is an abstraction to be used instead of raw loads. It hides details. See for instance `TemplateTable::getfield_or_static` on x86 where it is also used. PPC lags behind in making use of the access API. > With a fancy new GC the oop in nep_reg could be stale, requiring some processing which would be taken care of by using the access API. GC barriers are used when loading or storing an oop. No GC we currently have (not even the generational ones) use barriers for loading a plain address from an oop. The PPC64 implementation of the BarrierSetAssembler currently has `Unimplemented()` for non-oop types and all GCs are implemented. Maybe it was intended for some future GC or other feature which has not yet reached the official repo. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189908142 From rrich at openjdk.org Wed May 10 13:36:34 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 May 2023 13:36:34 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 13:24:55 GMT, Martin Doerr wrote: >> As I see it, the access API is an abstraction to be used instead of raw loads. It hides details. See for instance `TemplateTable::getfield_or_static` on x86 where it is also used. PPC lags behind in making use of the access API. >> With a fancy new GC the oop in nep_reg could be stale, requiring some processing which would be taken care of by using the access API. > > GC barriers are used when loading or storing an oop. No GC we currently have (not even the generational ones) use barriers for loading a plain address from an oop. The PPC64 implementation of the BarrierSetAssembler currently has `Unimplemented()` for non-oop types and all GCs are implemented. > Maybe it was intended for some future GC or other feature which has not yet reached the official repo. You are reasoning about implementation details. By using the provided abstraction you and other maintainers (who might be unfamiliar with them) would not have to do that. Also the assumptions you make introduce a hidden dependency. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189919874 From mdoerr at openjdk.org Wed May 10 13:46:36 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 13:46:36 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: References: Message-ID: <3UOaZ75k_vzmyK8rntwXGjUT_Hd1IAtVRkQ5G3zpTr0=.0b46880f-1476-4688-82cc-df853e3f8bf8@github.com> On Wed, 10 May 2023 13:33:02 GMT, Richard Reingruber wrote: >> GC barriers are used when loading or storing an oop. No GC we currently have (not even the generational ones) use barriers for loading a plain address from an oop. The PPC64 implementation of the BarrierSetAssembler currently has `Unimplemented()` for non-oop types and all GCs are implemented. >> Maybe it was intended for some future GC or other feature which has not yet reached the official repo. > > You are reasoning about implementation details. By using the provided abstraction you and other maintainers (who might be unfamiliar with them) would not have to do that. Also the assumptions you make introduce a hidden dependency. I just figured it out. It was introduced by https://bugs.openjdk.org/browse/JDK-8203172 (on aarch64) which mentions Shenandoah and future GCs. However, the Shenandoah comment says "non-reference load, no additional barrier is needed" and it doesn't use barriers in such a case. So, for the time being, I'll keep the normal load (because `access_load_at` is not ready for non-oop types). But I should add the `NONZERO` check. 
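
The `NONZERO` check mentioned above guards against using a field offset that was never initialized. A rough standalone sketch of that idea follows. It is an illustration only: `checked_nonzero` and `CHECK_NONZERO` are hypothetical names, not the HotSpot macro, and throwing an exception is an assumption made here so the behaviour can be observed outside a VM build.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not HotSpot code): returns the offset unchanged, but
// refuses a zero value, which would typically mean the offset was never set.
inline int checked_nonzero(const char* name, int offset) {
  if (offset == 0) {
    throw std::logic_error(std::string(name) + " should be nonzero");
  }
  return offset;
}

// Stringizes the expression so the failure message names the offending offset.
#define CHECK_NONZERO(x) checked_nonzero(#x, (x))
```

Wrapping the offset expression at the point of use keeps the guard next to the load that depends on it.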
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189934352 From rkennke at openjdk.org Wed May 10 13:48:33 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 13:48:33 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Wed, 10 May 2023 09:36:10 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow to resolve mark with LW locking > > src/hotspot/share/runtime/globals.hpp line 1067: > >> 1065: "If true, error data is printed to stdout instead of a file") \ >> 1066: \ >> 1067: product(bool, UseHeavyMonitors, false, \ > > Why back to `product`? This slipped in for some testing. Will revert. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189936814 From kbarrett at openjdk.org Wed May 10 13:48:52 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 10 May 2023 13:48:52 GMT Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends [v2] In-Reply-To: References: Message-ID: > Please review this renaming of Atomic::fetch_and_add and friends to be > consistent with the naming convention recently chosen for atomic bitops. 
That
> is, make the following name changes for class Atomic and its implementation:
>
> - fetch_and_add => fetch_then_add
> - add_and_fetch => add_then_fetch
>
> Testing:
> mach5 tier1-3
> GHA testing

Kim Barrett has updated the pull request incrementally with one additional commit since the last revision:

  revert accidental Red Hat copyright change

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13896/files
  - new: https://git.openjdk.org/jdk/pull/13896/files/a5da705e..f527511b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13896&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13896&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/13896.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13896/head:pull/13896

PR: https://git.openjdk.org/jdk/pull/13896

From stefank at openjdk.org Wed May 10 13:53:24 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Wed, 10 May 2023 13:53:24 GMT
Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends [v2]
In-Reply-To: References: Message-ID:

On Wed, 10 May 2023 13:48:52 GMT, Kim Barrett wrote:

>> Please review this renaming of Atomic::fetch_and_add and friends to be
>> consistent with the naming convention recently chosen for atomic bitops. That
>> is, make the following name changes for class Atomic and its implementation:
>>
>> - fetch_and_add => fetch_then_add
>> - add_and_fetch => add_then_fetch
>>
>> Testing:
>> mach5 tier1-3
>> GHA testing
>
> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision:
>
>   revert accidental Red Hat copyright change

Marked as reviewed by stefank (Reviewer).
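
For readers unfamiliar with the two operations being renamed in 8307806, a minimal sketch of the distinction follows. It uses standard `std::atomic` rather than HotSpot's `Atomic` class, and the two free functions are hypothetical stand-ins that only borrow the new names. The sketch assumes the usual meaning: the "fetch then add" form yields the value before the addition, the "add then fetch" form yields the value after it.

```cpp
#include <atomic>

// Hypothetical stand-ins, not HotSpot code: they only mirror the new names.

// fetch_then_add: read the old value first, then add.
inline int fetch_then_add(std::atomic<int>& dest, int add_value) {
  return dest.fetch_add(add_value);              // value before the addition
}

// add_then_fetch: add first, then report the updated value.
inline int add_then_fetch(std::atomic<int>& dest, int add_value) {
  return dest.fetch_add(add_value) + add_value;  // value after the addition
}
```

With a counter holding 5, `fetch_then_add(counter, 3)` would return 5 and `add_then_fetch(counter, 3)` would return 8; the counter holds 8 afterwards in both cases.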
------------- PR Review: https://git.openjdk.org/jdk/pull/13896#pullrequestreview-1420697774 From rkennke at openjdk.org Wed May 10 13:53:31 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 13:53:31 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Wed, 10 May 2023 11:11:20 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow to resolve mark with LW locking > > src/hotspot/share/oops/oop.hpp line 124: > >> 122: inline size_t size_given_klass(Klass* klass); >> 123: >> 124: // The following set of methods is used to access the mark-word and related > > So, these are done to avoid introducing branches on the paths where objects are definitely _not_ forwarded? Are there fewer places than where we expect forwardings? Maybe the better way would be to make all methods handle the occasional forwarding, and then provide the methods that provide the _fast-path_, like `fast_mark`, `fast_class`, etc? No, that would not work, because we have different ways to encode forwarding: full-GCs use sliding forwarding, and normal GCs use normal forwarding (with the exception of the forward-failed bit). Here, we wouldn't know which is which. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189944010 From stefank at openjdk.org Wed May 10 13:56:38 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 10 May 2023 13:56:38 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: <3UOaZ75k_vzmyK8rntwXGjUT_Hd1IAtVRkQ5G3zpTr0=.0b46880f-1476-4688-82cc-df853e3f8bf8@github.com> References: <3UOaZ75k_vzmyK8rntwXGjUT_Hd1IAtVRkQ5G3zpTr0=.0b46880f-1476-4688-82cc-df853e3f8bf8@github.com> Message-ID: <5uv2Nqt_IDeyq2NLXG3RziMSIPTeTnwUnDb9GhFaDEc=.a9728f65-3b53-44eb-94d5-87541d215334@github.com> On Wed, 10 May 2023 13:43:41 GMT, Martin Doerr wrote: >> You are reasoning about implementation details. By using the provided abstraction you and other maintainers (who might be unfamiliar with them) would not have to do that. Also the assumptions you make introduce a hidden dependency. > > I just figured it out. It was introduced by https://bugs.openjdk.org/browse/JDK-8203172 (on aarch64) which mentions Shenandoah and future GCs. However, the Shenandoah comment says "non-reference load, no additional barrier is needed" and it doesn't use barriers in such a case. So, for the time being, I'll keep the normal load (because `access_load_at` is not ready for non-oop types). But I should add the `NONZERO` check. FWIW, since Shenandoah changed their load barriers we have been cleaning away the usages of the Access API for loads and stores to primitive values. There's no such support in the C++ Runtime code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189948988 From rrich at openjdk.org Wed May 10 14:19:44 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 May 2023 14:19:44 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v24] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 11:16:44 GMT, Martin Doerr wrote: >> src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 202: >> >>> 200: >>> 201: MacroAssembler* _masm = new MacroAssembler(&buffer); >>> 202: address start = __ function_entry(); // called by C >> >> If `!defined(ABI_ELFv2)` a function descriptor will be emitted here. It will be initialized with `friend_toc` and `friend_env`. But that's not correct for external callers, is it? If so, wouldn't an `Unimplemented()` be better than obscure crashes? > > No, this code is correct and tested (I have a partially working Big Endian patch). `toc` and `env` are loaded by the external caller (C code), but not used by the stub. So, we don't need to initialize them to any specific values. I think I understand. The loaded `toc` and `env` of the stub are never used as Java execution does not use them and native or runtime calls will load corresponding `toc` and `env` of the callee. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189980161 From mdoerr at openjdk.org Wed May 10 14:19:43 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 14:19:43 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v30] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). 
I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists).
>
> Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only.
>
> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017)
>
> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests.
>
> I had to make changes to shared code and code for other platforms:
> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons:
> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that.
> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64.
> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian!
> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope.
Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by the 2nd commit.

Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:

  Add NONZERO check for downcall_stub_address_offset_in_bytes().

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/12708/files
  - new: https://git.openjdk.org/jdk/pull/12708/files/93060258..edcdefba

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=29
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=28-29

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/12708.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708

PR: https://git.openjdk.org/jdk/pull/12708

From rkennke at openjdk.org Wed May 10 14:21:38 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 10 May 2023 14:21:38 GMT
Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3]
In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID:

On Wed, 10 May 2023 10:42:38 GMT, Aleksey Shipilev wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Allow to resolve mark with LW locking
>
> src/hotspot/share/gc/shared/collectedHeap.cpp line 232:
>
>> 230: // With compact headers, we can't safely access the class, due
>> 231: // to possibly forwarded objects.
>> 232: if (!UseCompactObjectHeaders && is_in(object->klass_raw())) {
>
> Looks good, but what is this even supposed to check? `object` is not `oop` if its klass field points into Java heap? Huh? Was it some CMS shenanigan that stores something in klass word? Or is it just a glorified null check?
I'll follow up on that separately.

I have no idea *shrugs*

> src/hotspot/share/gc/shared/collectedHeap.hpp line 312:
>
>> 310:
>> 311: virtual void fill_with_dummy_object(HeapWord* start, HeapWord* end, bool zap);
>> 312: static size_t min_dummy_object_size() {
>
> Why this change?

That's because oopDesc::header_size() can no longer be constexpr, because it depends on UseCompactObjectHeaders.

> test/hotspot/jtreg/runtime/FieldLayout/BaseOffsets.java line 62:
>
>> 60:
>> 61: // @0: 8 byte header, @8: int field
>> 62: static final long INT_OFFSET;
>
> What is that comment supposed to mean?
>
> Suggestion:
>
> // @0: 8 byte header, @8: int field
> static final long INT_OFFSET;

This test is a variation of the OldLayoutCheck test, where the comment is better. It doesn't make much sense here, though. I'm removing it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189983638
PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189982822
PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189980306

From rkennke at openjdk.org Wed May 10 14:29:31 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 10 May 2023 14:29:31 GMT
Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3]
In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID:

On Wed, 10 May 2023 11:13:30 GMT, Aleksey Shipilev wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Allow to resolve mark with LW locking
>
> src/hotspot/share/gc/parallel/psOldGen.cpp line 398:
>
>> 396:
>> 397: virtual void do_object(oop obj) {
>> 398: HeapWord* test_addr = cast_from_oop(obj);
>
> I thought this `+1` is specifically to test that `object_start` is able to find the object header when given the interior pointer. See the `guarantee`-s in the next lines.
Yes, but with compact headers, we could now have 1-word-sized objects, in which case this would fail. I am not sure how to deal with that, TBH. Maybe do the whole test only when !UseCompactObjectHeaders or when object-size is > 1? > src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 326: > >> 324: oop copy_val = cast_to_oop(copy); >> 325: if (!copy_val->mark().is_marked()) { >> 326: // If we copied a mark-word that indicates 'forwarded' state, then > > Ouch. This is only the problem with `UseCompactObjectHeaders`, right? Can additionally conditionalize on that, so that legacy code path stays the same. I'm actually thinking to maybe upstream these parts separately? Because it seems a nice improvement to not even try to relativize a stack-chunk if object is not reachable anyway? > src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 309: > >> 307: _loc = obj; >> 308: Klass* klass = obj->forward_safe_klass(); >> 309: obj->oop_iterate_backwards(this, klass); > > Why `backwards`? Because that's the only oop_iterate() variant that takes a Klass* :-) It seems trivial to add such a forward-variant, though. Prefer that? 
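
The concern about one-word objects in the `psOldGen.cpp` reply above can be illustrated with a small model. The sketch below is not the HotSpot code: `ToyHeap`, `object_start` and `interior_test_addr` are hypothetical, and the heap is just a sorted list of object start offsets measured in words. It shows why probing at `start + 1` only makes sense when the object is at least two words long. Falling back to the start address is one possible variation; the reply itself suggests skipping the test instead.

```cpp
#include <cstddef>
#include <vector>

// Toy heap (hypothetical): objects are laid out back to back, and 'starts'
// holds the word offset of each object in ascending order.
struct ToyHeap {
  std::vector<std::size_t> starts;

  // Returns the start offset of the object covering 'addr' (linear scan).
  std::size_t object_start(std::size_t addr) const {
    std::size_t result = 0;
    for (std::size_t s : starts) {
      if (s <= addr) result = s;
    }
    return result;
  }
};

// Pick the address to probe: an interior pointer when the object has an
// interior, otherwise the object start itself.
inline std::size_t interior_test_addr(std::size_t start, std::size_t size_in_words) {
  return size_in_words > 1 ? start + 1 : start;
}
```

In this model, an unconditional `start + 1` on a one-word object already points at the next object, so a lookup would report the wrong header.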
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189990477 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189993638 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189994714 From jvernee at openjdk.org Wed May 10 14:30:27 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 10 May 2023 14:30:27 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: <5uv2Nqt_IDeyq2NLXG3RziMSIPTeTnwUnDb9GhFaDEc=.a9728f65-3b53-44eb-94d5-87541d215334@github.com> References: <3UOaZ75k_vzmyK8rntwXGjUT_Hd1IAtVRkQ5G3zpTr0=.0b46880f-1476-4688-82cc-df853e3f8bf8@github.com> <5uv2Nqt_IDeyq2NLXG3RziMSIPTeTnwUnDb9GhFaDEc=.a9728f65-3b53-44eb-94d5-87541d215334@github.com> Message-ID: On Wed, 10 May 2023 13:53:53 GMT, Stefan Karlsson wrote: >> I just figured it out. It was introduced by https://bugs.openjdk.org/browse/JDK-8203172 (on aarch64) which mentions Shenandoah and future GCs. However, the Shenandoah comment says "non-reference load, no additional barrier is needed" and it doesn't use barriers in such a case. So, for the time being, I'll keep the normal load (because `access_load_at` is not ready for non-oop types). But I should add the `NONZERO` check. > > FWIW, since Shenandoah changed their load barriers we have been cleaning away the usages of the Access API for loads and stores to primitive values. There's no such support in the C++ Runtime code. Ok, since this is loading a `long` (which represents an address that points into the code cache) I think we're fine without using the access API then? 
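
The exchange above turns on whether a load of a primitive value out of an object needs to go through the GC's access abstraction or can be a plain load. A toy model of that split is sketched below. All types and function names (`BarrierSet`, `ForwardingBarrierSet`, `load_oop_at`, `load_raw_at`) are hypothetical and do not mirror the signature of HotSpot's `access_load_at`. The point is only that a reference load may be post-processed by the collector, while a plain 64-bit value such as a code address is read as-is.

```cpp
#include <cstdint>

struct Object;  // toy heap object, defined below

// Toy barrier interface: only reference loads are routed through the GC.
struct BarrierSet {
  virtual ~BarrierSet() = default;
  // Default: no barrier needed, return the reference unchanged.
  virtual Object* load_barrier(Object* ref) const { return ref; }
};

struct Object {
  Object*      ref_field;   // reference field, subject to barriers
  std::int64_t addr_field;  // primitive field holding a raw address
  Object*      forwardee;   // non-null if a collector moved the object
};

// A concurrent-collector flavour: resolve stale references to the new copy.
struct ForwardingBarrierSet : BarrierSet {
  Object* load_barrier(Object* ref) const override {
    return (ref != nullptr && ref->forwardee != nullptr) ? ref->forwardee : ref;
  }
};

// Reference load: goes through the barrier.
inline Object* load_oop_at(const BarrierSet& bs, const Object* base) {
  return bs.load_barrier(base->ref_field);
}

// Primitive load: a plain read, no GC involvement.
inline std::int64_t load_raw_at(const Object* base) {
  return base->addr_field;
}
```

Under this model the value returned by `load_raw_at` is the same regardless of which barrier set is active, which is the property the thread relies on for the `long` that holds the downcall stub address.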
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189996018 From rkennke at openjdk.org Wed May 10 14:47:42 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 14:47:42 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header.
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: @shipilev review, round 2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13844/files - new: https://git.openjdk.org/jdk/pull/13844/files/8761447f..48e8d104 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=04-05 Stats: 122 lines in 22 files changed: 40 ins; 43 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From shade at openjdk.org Wed May 10 14:57:30 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 10 May 2023 14:57:30 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: <9hXFOyAWzUILh5FpU5ADkbBsC1zpLA-jdKqGR-YyEhQ=.39149851-97be-4d87-94c6-a5c321191d94@github.com> On Wed, 10 May 2023 
14:26:01 GMT, Roman Kennke wrote: >> src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 309: >> >>> 307: _loc = obj; >>> 308: Klass* klass = obj->forward_safe_klass(); >>> 309: obj->oop_iterate_backwards(this, klass); >> >> Why `backwards`? > > Because that's the only oop_iterate() variant that takes a Klass* :-) It seems trivial to add such a forward-variant, though. Prefer that? I guess it is not a bother for ShenandoahVerifier, so unless it is needed anywhere else, there is no need to add another method. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1190033196 From kdnilsen at openjdk.org Wed May 10 15:22:17 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 10 May 2023 15:22:17 GMT Subject: RFR: Increase card offset mask [v2] In-Reply-To: <-x09IiVzRQppCQD2nS2pdQOPNj4z0n_YcuF4g6gazAo=.2860996b-82f6-49e7-9b12-200d71493c1e@github.com> References: <-x09IiVzRQppCQD2nS2pdQOPNj4z0n_YcuF4g6gazAo=.2860996b-82f6-49e7-9b12-200d71493c1e@github.com> Message-ID: On Tue, 9 May 2023 20:48:41 GMT, William Kemper wrote: >> src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 394: >> >>> 392: // frequently than last byte. This is true when number of clean cards is greater than number of dirty cards. >>> 393: static const uint16_t ObjectStartsInCardRegion = 0x80; >>> 394: static const uint16_t FirstStartBits = 0x3f; >> >> Nice catch. BTW, the cross_map structure is no longer needed and neither is the last field, and the comment that makes mention of the "last" field can be updated. We really only need one byte to represent crossing information of each card in the remembered set. >> >> I suggest integrating this as is, and doing further refactoring as a future TODO. > > Hmm, we are definitely using the `last` field in the `cross_map` struct. I'd have to study the code to understand how we could remove it. 
You are right that we are still "maintaining" the last field and there are many references to it in the code. I am thinking that with Ramki's changes to remembered set scanning, it is no longer used in scanning, but I may be remembering this incorrectly. I agree this is not a subtle change and would require some "study" to make sure it is done properly, if at all. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/275#discussion_r1190069097 From shade at openjdk.org Wed May 10 15:54:31 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 10 May 2023 15:54:31 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: <2aa_qbDCJyxHjYxsTECFVv6NZ9-th6JH311liEQZn_8=.b6948717-78a3-47bf-b62b-24a2439ca922@github.com> On Wed, 10 May 2023 14:25:16 GMT, Roman Kennke wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 326: >> >>> 324: oop copy_val = cast_to_oop(copy); >>> 325: if (!copy_val->mark().is_marked()) { >>> 326: // If we copied a mark-word that indicates 'forwarded' state, then >> >> Ouch. This is only the problem with `UseCompactObjectHeaders`, right? Can additionally conditionalize on that, so that legacy code path stays the same. > > I'm actually thinking to maybe upstream these parts separately? Because it seems a nice improvement to not even try to relativize a stack-chunk if object is not reachable anyway? Yes, upstreaming this separately would be cleaner for this PR. It would also highlight any problems with adjusting the lifecycle for relativization of stack chunks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1190109483 From wkemper at openjdk.org Wed May 10 15:58:57 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 10 May 2023 15:58:57 GMT Subject: Integrated: Increase card offset mask In-Reply-To: References: Message-ID: On Tue, 9 May 2023 16:51:30 GMT, William Kemper wrote: > On 32-bit platforms, a 512 byte card covers 128 words. On 64-bit platforms, a 1024 byte card (the maximum allowed) also covers 128 words. This change increases the size of the mask we use when encoding object start offsets to handle these situations. This pull request has now been integrated. Changeset: f969d53d Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/f969d53de1e9e318357d0f04b6270d3c9d012138 Stats: 5 lines in 2 files changed: 2 ins; 1 del; 2 mod Increase card offset mask Reviewed-by: kdnilsen ------------- PR: https://git.openjdk.org/shenandoah/pull/275 From rkennke at openjdk.org Wed May 10 16:23:31 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 16:23:31 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: <_8Kn0tp4jCs7AT6UkMa9BiK-NYWRIPF-AYprF8WcAwU=.b6753c09-89d2-4cd5-bea7-49cb1c55b927@github.com> On Wed, 10 May 2023 10:00:51 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow to resolve mark with LW locking > > src/hotspot/share/runtime/arguments.cpp line 3120: > >> 3118: >> 3119: #ifdef _LP64 >> 3120: if (!FLAG_IS_DEFAULT(UseCompactObjectHeaders)) { > > Just `if (UseCompactObjectHeaders)`, or do I miss something? I've done this on purpose. 
When the default for UseCompactObjectHeaders is false, then CDS archives will be written with legacy headers, and we could not read this when running with +UseCompactObjectHeaders. When the default for UseCompactObjectHeaders is true, then CDS archives will be written with compact headers, and we could not read this when running with -UseCompactObjectHeaders. I (and others) are changing the default of this flag regularly for testing, because that also catches tests that require flagless, and the way this is written would not require changing this line in arguments.cpp too. I guess it would be even more useful if we could detect which setting of the flag has been used when writing a CDS archive, and don't read it if it's not compatible. It would be *even* better if we could detect the setting of the flag when the archive was written, and transform it into whatever the JVM is running with, but that would be too much to ask for this PR, I think. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1190141817 From rkennke at openjdk.org Wed May 10 19:16:58 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 19:16:58 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects.
When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 28 commits: - Merge branch 'JDK-8305898' into JDK-8305895 - @shipilev review, round 2 - Fix build - @shipilev comments, round 1 - Allow to resolve mark with LW locking - Use new lightweight locking with compact headers - Merge branch 'JDK-8305898' into JDK-8305895 - Imporve GetObjectSizeIntrinsicsTest - Some GC fixes - Add BaseOffsets test - ... and 18 more: https://git.openjdk.org/jdk/compare/39c33727...58046e58 ------------- Changes: https://git.openjdk.org/jdk/pull/13844/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=06 Stats: 1183 lines in 81 files changed: 944 ins; 79 del; 160 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From rkennke at openjdk.org Wed May 10 20:31:27 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 20:31:27 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v41] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. 
This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new ... 
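The bit layout listed above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the pull request: the names `encode_forwarding`, `decode_forwarding` and `TargetBases` are hypothetical, the table is reduced to the two base addresses of a single source region, and the fallback path (bit 2) is only checked for, not implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical constants mirroring the layout described in the message.
constexpr uint32_t FORWARDED_BITS = 0x3;       // bits 0 and 1: '11' marks a forwarded object
constexpr uint32_t FALLBACK_BIT   = 1u << 2;   // bit 2: forwardee lives in the fallback table
constexpr int      REGION_SHIFT   = 3;         // bit 3: selects target region 0 or 1
constexpr int      OFFSET_SHIFT   = 4;         // bits 4..31: offset in heap words
constexpr uint32_t MAX_OFFSET     = (1u << 28) - 1;

// One table entry: the start addresses of the two possible target regions
// for a given source region (hypothetical type).
struct TargetBases {
  uintptr_t base[2];
};

// Pack a target-region selector and a word offset into the low 32 header bits.
inline uint32_t encode_forwarding(unsigned target_slot, uint32_t offset_words) {
  assert(target_slot < 2);
  assert(offset_words <= MAX_OFFSET);
  return (offset_words << OFFSET_SHIFT) | (target_slot << REGION_SHIFT) | FORWARDED_BITS;
}

// Recover the forwardee address from the encoded bits and the table entry.
inline uintptr_t decode_forwarding(uint32_t encoded, const TargetBases& bases, size_t word_size) {
  assert((encoded & FORWARDED_BITS) == FORWARDED_BITS);  // must be marked forwarded
  assert((encoded & FALLBACK_BIT) == 0);                 // fallback lookup not modelled here
  unsigned slot   = (encoded >> REGION_SHIFT) & 1u;
  uint32_t offset = encoded >> OFFSET_SHIFT;
  return bases.base[slot] + offset * word_size;
}
```

With 28 offset bits and 8-byte heap words, an offset can span up to 2 GB from the target base, which is presumably why a bounded region size matters for this scheme.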
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Align fake-heap without GCC warnings (duh) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/69c78eba..b0deb2b3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=40 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=39-40 Stats: 8 lines in 2 files changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From wkemper at openjdk.org Wed May 10 20:45:23 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 10 May 2023 20:45:23 GMT Subject: RFR: Assert that region usage accounting is only accessed in generational mode Message-ID: There was one unrelated test failure in the github actions (running G1). ------------- Commit messages: - Use gerund verb form for assert message - Merge remote-tracking branch 'shenandoah/master' into add-used-regions-assert - Assert that used_regions is only accessed for generational mode Changes: https://git.openjdk.org/shenandoah/pull/278/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=278&range=00 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/278.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/278/head:pull/278 PR: https://git.openjdk.org/shenandoah/pull/278 From ysr at openjdk.org Wed May 10 20:55:37 2023 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Wed, 10 May 2023 20:55:37 GMT Subject: RFR: Increase card offset mask [v2] In-Reply-To: References: <-x09IiVzRQppCQD2nS2pdQOPNj4z0n_YcuF4g6gazAo=.2860996b-82f6-49e7-9b12-200d71493c1e@github.com> Message-ID: <_KusHbL2Xg0ba_6X_fgIeDisyVXnKG4PIPHB3w7epmE=.5ba7f5ec-ea0b-45e5-b93f-5b12d826e261@github.com> On Wed, 10 May 2023 15:19:28 GMT, Kelvin Nilsen wrote: >> Hmm, we are definitely using the `last` field in the `cross_map` struct. I'd have to study the code to understand how we could remove it. > > You are right that we are still "maintaining" the last field and there are many references to it in the code. I am thinking that with Ramki's changes to remembered set scanning, it is no longer used in scanning, but I may be remembering this incorrectly. > > I agree this is not a subtle change and would require some "study" to make sure it is done properly, if at all. @kdnilsen : `object_starts` uses the xmap info, which is still used in card scanning in the `block_start` method. The use of a traditional BOT was deferred to future work when I redid the RS scanning earlier this year. There's an open ticket for that work/investigation. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/275#discussion_r1190384053 From ysr at openjdk.org Wed May 10 21:01:37 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 10 May 2023 21:01:37 GMT Subject: RFR: Increase card offset mask [v2] In-Reply-To: <6h2D0ecBepau5jpQOI1g8OekefJbesmAZf5vDtHfyi8=.759444bb-ed9b-445e-bc48-f89b2c90c1de@github.com> References: <6h2D0ecBepau5jpQOI1g8OekefJbesmAZf5vDtHfyi8=.759444bb-ed9b-445e-bc48-f89b2c90c1de@github.com> Message-ID: On Tue, 9 May 2023 20:43:26 GMT, William Kemper wrote: >> On 32-bit platforms, a 512 byte card covers 128 words. On 64-bit platforms, a 1024 byte card (the maximum allowed) also covers 128 words. This change increases the size of the mask we use when encoding object start offsets to handle these situations. 
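The arithmetic in the quoted description, and the static assert suggested later in this thread, can be sketched as below. The function names and the two mask values are hypothetical and chosen only to show the calculation; they are not claimed to be the values used in `shenandoahScanRemembered.hpp`.

```cpp
#include <cstddef>
#include <cstdint>

// Number of heap words covered by one card.
constexpr size_t words_per_card(size_t card_bytes, size_t word_bytes) {
  return card_bytes / word_bytes;
}

// Largest word offset of an object start within a card.
constexpr size_t max_offset_in_card(size_t card_bytes, size_t word_bytes) {
  return words_per_card(card_bytes, word_bytes) - 1;
}

// True if every possible offset survives masking with `mask`.
constexpr bool offset_fits(size_t card_bytes, size_t word_bytes, uint16_t mask) {
  return max_offset_in_card(card_bytes, word_bytes) <= mask;
}

// Both configurations from the description cover 128 words per card.
static_assert(words_per_card(512, 4) == 128, "32-bit words, 512-byte card");
static_assert(words_per_card(1024, 8) == 128, "64-bit words, 1024-byte card");

// Offsets 0..127 need seven bits: a six-bit mask truncates, a seven-bit mask does not.
static_assert(!offset_fits(512, 4, 0x3f), "six-bit mask is too small for 128 words");
static_assert(offset_fits(512, 4, 0x7f), "seven-bit mask covers 128 words");
static_assert(offset_fits(1024, 8, 0x7f), "seven-bit mask covers 128 words");
```

A compile-time check of this shape fails the build when card size or word size changes in a way the mask cannot represent, rather than relying on a runtime assert in the setter.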
> > William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'openjdk:master' into increase-card-offset-mask > - Re-problem-list failing pipeline tests > - Re-problem-list failing GHA tests > - Increase card offset mask to cover 32-bit words and/or 1k cards src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 401: > 399: // If we're setting first_start, assume the card has an object. > 400: inline void set_first_start(size_t card_index, uint8_t value) { > 401: assert(value < FirstStartBits, "Offset into card would be truncated"); `value <= FirstStartBits` ? Can/should replace (or augment, if you prefer) with a static assert using the maximum offset computed using card size and oop size? ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/275#discussion_r1190389207 From ysr at openjdk.org Wed May 10 21:08:25 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 10 May 2023 21:08:25 GMT Subject: RFR: Increase card offset mask [v2] In-Reply-To: References: <6h2D0ecBepau5jpQOI1g8OekefJbesmAZf5vDtHfyi8=.759444bb-ed9b-445e-bc48-f89b2c90c1de@github.com> Message-ID: <_ZcHgPjQTRUHdLEgAudIE3hO5tIF50yLFeAUsLBSIf4=.e6d8d7cb-7dbb-42f4-9486-500ebd9dd579@github.com> On Wed, 10 May 2023 20:58:28 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into increase-card-offset-mask >> - Re-problem-list failing pipeline tests >> - Re-problem-list failing GHA tests >> - Increase card offset mask to cover 32-bit words and/or 1k cards > > src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 401: > >> 399: // If we're setting first_start, assume the card has an object. >> 400: inline void set_first_start(size_t card_index, uint8_t value) { >> 401: assert(value < FirstStartBits, "Offset into card would be truncated"); > > `value <= FirstStartBits` ? > > Can/should replace (or augment, if you prefer) with a static assert using the maximum possible offset computed using card size and heap word size? (dynamic) may be also in `set_last_start`? ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/275#discussion_r1190394955 From kdnilsen at openjdk.org Wed May 10 21:29:16 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 10 May 2023 21:29:16 GMT Subject: RFR: Assert that region usage accounting is only accessed in generational mode In-Reply-To: References: Message-ID: On Wed, 10 May 2023 20:39:49 GMT, William Kemper wrote: > There was one unrelated test failure in the github actions (running G1). Marked as reviewed by kdnilsen (Committer). 
------------- PR Review: https://git.openjdk.org/shenandoah/pull/278#pullrequestreview-1421399369 From mdoerr at openjdk.org Wed May 10 22:32:55 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 22:32:55 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: References: <3UOaZ75k_vzmyK8rntwXGjUT_Hd1IAtVRkQ5G3zpTr0=.0b46880f-1476-4688-82cc-df853e3f8bf8@github.com> <5uv2Nqt_IDeyq2NLXG3RziMSIPTeTnwUnDb9GhFaDEc=.a9728f65-3b53-44eb-94d5-87541d215334@github.com> Message-ID: On Wed, 10 May 2023 14:26:54 GMT, Jorn Vernee wrote: >> FWIW, since Shenandoah changed their load barriers we have been cleaning away the usages of the Access API for loads and stores to primitive values. There's no such support in the C++ Runtime code. > > Ok, since this is loading a `long` (which represents an address that points into the code cache) I think we're fine without using the access API then? Correct. The code had been written for the previous version of Shenandoah (1.0). No current GC uses barriers for non-oop types and the C++ Runtime doesn't support it any more as Stefan pointed out. It is still possible to use the access API on other platforms, but it does nothing more than a plain load/store for non-oop types. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1190447075 From wkemper at openjdk.org Wed May 10 22:34:28 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 10 May 2023 22:34:28 GMT Subject: RFR: Increase card offset mask [v2] In-Reply-To: <_ZcHgPjQTRUHdLEgAudIE3hO5tIF50yLFeAUsLBSIf4=.e6d8d7cb-7dbb-42f4-9486-500ebd9dd579@github.com> References: <6h2D0ecBepau5jpQOI1g8OekefJbesmAZf5vDtHfyi8=.759444bb-ed9b-445e-bc48-f89b2c90c1de@github.com> <_ZcHgPjQTRUHdLEgAudIE3hO5tIF50yLFeAUsLBSIf4=.e6d8d7cb-7dbb-42f4-9486-500ebd9dd579@github.com> Message-ID: On Wed, 10 May 2023 21:06:06 GMT, Y. 
Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 401: >> >>> 399: // If we're setting first_start, assume the card has an object. >>> 400: inline void set_first_start(size_t card_index, uint8_t value) { >>> 401: assert(value < FirstStartBits, "Offset into card would be truncated"); >> >> `value <= FirstStartBits` ? >> >> Can/should replace (or augment, if you prefer) with a static assert using the maximum possible offset computed using card size and heap word size? > > (dynamic) may be also in `set_last_start`? I like the idea of a static assert, will look into that. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/275#discussion_r1190446988 From wkemper at openjdk.org Wed May 10 22:34:29 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 10 May 2023 22:34:29 GMT Subject: RFR: Increase card offset mask [v2] In-Reply-To: References: <6h2D0ecBepau5jpQOI1g8OekefJbesmAZf5vDtHfyi8=.759444bb-ed9b-445e-bc48-f89b2c90c1de@github.com> <_ZcHgPjQTRUHdLEgAudIE3hO5tIF50yLFeAUsLBSIf4=.e6d8d7cb-7dbb-42f4-9486-500ebd9dd579@github.com> Message-ID: On Wed, 10 May 2023 22:30:02 GMT, William Kemper wrote: >> (dynamic) may be also in `set_last_start`? > > I like the idea of a static assert, will look into that. Shouldn't it be `<`? If the offset were `=` to `FirstStartBits` it should be offset `0` in the next card? ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/275#discussion_r1190447711 From wkemper at openjdk.org Wed May 10 22:51:17 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 10 May 2023 22:51:17 GMT Subject: Integrated: Assert that region usage accounting is only accessed in generational mode In-Reply-To: References: Message-ID: <0mjO57NczwXL2KQDkUDZRz2bSYdE77adBJlyehBCE2U=.e9323b16-e718-420f-9056-eae431d5484d@github.com> On Wed, 10 May 2023 20:39:49 GMT, William Kemper wrote: > There was one unrelated test failure in the github actions (running G1). 
This pull request has now been integrated. Changeset: a2ab9979 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/a2ab9979c6fa12e8e7f1feadca248eb31eae07de Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Assert that region usage accounting is only accessed in generational mode Reviewed-by: kdnilsen ------------- PR: https://git.openjdk.org/shenandoah/pull/278 From kdnilsen at openjdk.org Wed May 10 23:01:34 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 10 May 2023 23:01:34 GMT Subject: RFR: DRAFT: Expand old on demand [v42] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 18:41:49 GMT, Kelvin Nilsen wrote: >> This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. >> >> As currently drafted, there are regression failures. This DRAFT PR is published for the purpose of facilitating a careful code review. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Fix whitespace src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 133: > 131: } > 132: if (r->age() >= InitialTenuringThreshold) { > 133: r->save_top_before_promote(); We have to save top before promote, even if we do not promote in place, because evacuator asks "what was garbage before top was saved"? Need to think about this...
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1190459207 From kdnilsen at openjdk.org Wed May 10 23:01:35 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 10 May 2023 23:01:35 GMT Subject: RFR: DRAFT: Expand old on demand [v42] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 22:56:37 GMT, Kelvin Nilsen wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix whitespace > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 133: > >> 131: } >> 132: if (r->age() >= InitialTenuringThreshold) { >> 133: r->save_top_before_promote(); > > We have to save top before promote, even if we do not promote in place, because evacuator asks "what was garbage before top was saved"? Need to think about tis... the evacuator asks what was top at mark end? ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1190460033 From ysr at openjdk.org Thu May 11 01:24:17 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 11 May 2023 01:24:17 GMT Subject: RFR: Changes to simplify MU accounting In-Reply-To: References: Message-ID: On Mon, 8 May 2023 23:52:30 GMT, Kelvin Nilsen wrote: > This represents a small part of the expand-old-on-demand branch. By itself, this may result in some performance regression because we no longer use MU (mutator utilization) reports to guide generation sizing decisions. On the other hand, the guidance provided by MU metrics does not seem to be a very accurate predictor of ideal generation sizes. It seems like you want to rename `ShenandoahMmuTracker` to `ShenandoahMuTracker` :-) Changes look good; just some minor comments below. src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 506: > 504: msg = (generation == YOUNG) ? 
"At end of Concurrent Young GC" : > 505: "At end of Concurrent Bootstrap GC"; > 506: // We only record GC results if GC was successful Can this section on MMU tracking be moved into `SCT::service_concurrent_cycle(ShenandoahHeap*, ShenandoahGeneration*, GCCause&, bool bootstrap)` at (just after) line 728 below? That would be consistent with doing the corresponding old gen work in `SCT::service_cocnurrent_old_cycle()` as you've done below. src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 296: > 294: } > 295: > 296: // With simplified MU implementation, we no longer use _mmu_tracker->average to trigger resizing Delete, rather than comment out. You can just leave a comment to motivate the deletion if you think that would help reviewers. src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 54: > 52: class ShenandoahMmuTracker { > 53: private: > 54: // For reporting utilization during most recent GC cycle May be elaborate a bit (please correct as needed): // These hold the values of the most recent snapshots of cumulative // observations, and are used to compute deltas for specific // recording/reporting epochs at each subsequent snapshot. src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 75: > 73: TruncatedSeq _mmu_average; > 74: > 75: static double process_time_seconds(); It seems like `process_time_seconds()` is no longer needed either. src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 78: > 76: static void fetch_cpu_times(double &gc_time, double &mutator_time); > 77: > 78: void help_record_concurrent(ShenandoahGeneration* generation, uint gcid, const char* msg); Do you want to call this `record_concurrent_helper()` or `..._work()` in line with existing helper method convention? 
src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 87: > 85: void initialize(); > 86: > 87: // At completion of each GC cycle (not including interrupted cycles), we invoke one of the following to record the In the case of GC's that have been canceled, is the work quantum in service of the partial/canceled cycle just dropped on the floor? Should it intentionally be? Should it intentionally not be? src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 90: > 88: // GC utilization during this cycle. > 89: // > 90: // We may redundantly record degen and full, in which case the gcid will repeat. We log these as FULL. Won't `gc_id` also repeat for old_marking_increments too? ------------- Marked as reviewed by ysr (Committer). PR Review: https://git.openjdk.org/shenandoah/pull/274#pullrequestreview-1417731871 PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1188041048 PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1190537254 PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1190535612 PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1189236492 PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1190532083 PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1188057149 PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1190532982 From ysr at openjdk.org Thu May 11 01:24:18 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 11 May 2023 01:24:18 GMT Subject: RFR: Changes to simplify MU accounting In-Reply-To: References: Message-ID: On Thu, 11 May 2023 01:07:11 GMT, Y. Srinivas Ramakrishna wrote: >> This represents a small part of the expand-old-on-demand branch. By itself, this may result in some performance regression because we no longer use MU (mutator utilization) reports to guide generation sizing decisions. 
On the other hand, the guidance provided by MU metrics does not seem to be a very accurate predictor of ideal generation sizes. > > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 90: > >> 88: // GC utilization during this cycle. >> 89: // >> 90: // We may redundantly record degen and full, in which case the gcid will repeat. We log these as FULL. > > Won't `gc_id` also repeat for old_marking_increments too? Or may be I am misunderstanding what "repeat" means here. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1190533222 From ysr at openjdk.org Thu May 11 02:18:17 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 11 May 2023 02:18:17 GMT Subject: RFR: Assert that region usage accounting is only accessed in generational mode In-Reply-To: References: Message-ID: On Wed, 10 May 2023 20:39:49 GMT, William Kemper wrote: > There was one unrelated test failure in the github actions (running G1). Is the idea of the assert then to catch the failure to debug it? Should the `global generation` instance not even exist (i.e. c'tor not be called) in non-generational mode, or can such an object exist, but this and only this method may not be called on it in non-generational mode? Wondering if this is just a question of running the failing GHA test to reproduce the issue? ------------- PR Comment: https://git.openjdk.org/shenandoah/pull/278#issuecomment-1543220585 From dholmes at openjdk.org Thu May 11 02:19:40 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 11 May 2023 02:19:40 GMT Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends [v2] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 13:48:52 GMT, Kim Barrett wrote: >> Please review this renaming of Atomic::fetch_and_add and friends to be >> consistent with the naming convention recently chosen for atomic bitops. 
That >> is, make the following name changes for class Atomic and its implementation: >> >> - fetch_and_add => fetch_then_add >> - add_and_fetch => add_then_fetch >> >> Testing: >> mach5 tier1-3 >> GHA testing > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > revert accidental Red Hat copyright change LGTM! Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13896#pullrequestreview-1421637367 From rrich at openjdk.org Thu May 11 08:28:53 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 11 May 2023 08:28:53 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: References: <3UOaZ75k_vzmyK8rntwXGjUT_Hd1IAtVRkQ5G3zpTr0=.0b46880f-1476-4688-82cc-df853e3f8bf8@github.com> <5uv2Nqt_IDeyq2NLXG3RziMSIPTeTnwUnDb9GhFaDEc=.a9728f65-3b53-44eb-94d5-87541d215334@github.com> Message-ID: On Wed, 10 May 2023 22:30:13 GMT, Martin Doerr wrote: >> Ok, since this is loading a `long` (which represents an address that points into the code cache) I think we're fine without using the access API then? > > Correct. The code had been written for the previous version of Shenandoah (1.0). No current GC uses barriers for non-oop types and the C++ Runtime doesn't support it any more as Stefan pointed out. > It is still possible to use the access API on other platforms, but it does nothing more than a plain load/store for non-oop types. I'm ok with doing a plain access. I don't like the difference to other ports as it will at least waste time of people with less expertise in the area (e.g. new to the project). No need to continue the discussion in this PR though.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1190812209 From fjiang at openjdk.org Thu May 11 09:04:49 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 11 May 2023 09:04:49 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 19:16:58 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. 
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) >> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: > > - Merge branch 'JDK-8305898' into JDK-8305895 > - @shipilev review, round 2 > - Fix build > - @shipilev comments, round 1 > - Allow to resolve mark with LW locking > - Use new lightweight locking with compact headers > - Merge branch 'JDK-8305898' into JDK-8305895 > - Imporve GetObjectSizeIntrinsicsTest > - Some GC fixes > - Add BaseOffsets test > - ... 
and 18 more: https://git.openjdk.org/jdk/compare/39c33727...58046e58 Hi, I'm trying to build a JDK binary based on this PR on Linux-riscv64 (with GCC-11), but I got the following error: === Output from failing command(s) repeated here === * For target buildtools_interim_langtools_modules_java.compiler.interim__the.BUILD_java.compiler.interim_batch: warning: unknown enum constant Feature.STRING_TEMPLATES warning: unknown enum constant Feature.STRING_TEMPLATES warning: unknown enum constant Feature.STRING_TEMPLATES warning: unknown enum constant Feature.STRING_TEMPLATES error: warnings found and -Werror specified 1 error 4 warnings * All command lines available in /home/ubuntu/workspace/jdk/build/linux-riscv64-server-release/make-support/failure-logs. === End of repeated output === Here is my gcc info: ubuntu@ubuntu-93:~/workspace/jdk$ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/riscv64-linux-gnu/11/lto-wrapper Target: riscv64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.3.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=riscv64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --disable-multilib --with-arch=rv64gc --with-abi=lp64d --enable-checking=release --build=riscv64-linux-gnu --host=riscv64-linux-gnu --target=riscv64-linux-gnu --with-build-config=bootstrap-lto-lean
--enable-link-serialization=4 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13844#issuecomment-1543614743 From shade at openjdk.org Thu May 11 09:15:48 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 09:15:48 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 09:01:29 GMT, Feilong Jiang wrote: > === Output from failing command(s) repeated here === > * For target buildtools_interim_langtools_modules_java.compiler.interim__the.BUILD_java.compiler.interim_batch: > warning: unknown enum constant Feature.STRING_TEMPLATES > warning: unknown enum constant Feature.STRING_TEMPLATES > warning: unknown enum constant Feature.STRING_TEMPLATES > warning: unknown enum constant Feature.STRING_TEMPLATES > error: warnings found and -Werror specified This is not a gcc problem, but a javac problem. Please check that your boot JDK is correct, that you have run `make clean`, and that mainline JDK on the same machine does not fail with the same error.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13844#issuecomment-1543632370 From fjiang at openjdk.org Thu May 11 09:30:47 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 11 May 2023 09:30:47 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 09:12:41 GMT, Aleksey Shipilev wrote: > > === Output from failing command(s) repeated here === > > > > * For target buildtools_interim_langtools_modules_java.compiler.interim__the.BUILD_java.compiler.interim_batch: > > warning: unknown enum constant Feature.STRING_TEMPLATES > > warning: unknown enum constant Feature.STRING_TEMPLATES > > warning: unknown enum constant Feature.STRING_TEMPLATES > > warning: unknown enum constant Feature.STRING_TEMPLATES > > error: warnings found and -Werror specified > > This is not a gcc problem, but a javac problem. > > Please check your boot JDK is correct, that you have ran `make clean`, and that mainline JDK on the same machine does not fail with the same error? Thanks for the troubleshooting advice! I have tried the same boot JDK with the mainline JDK code base, and it looks good. P.S. I also tried cross-compiling with this PR, and the building passed without failures. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13844#issuecomment-1543653570 From shade at openjdk.org Thu May 11 09:30:50 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 09:30:50 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: <_8Kn0tp4jCs7AT6UkMa9BiK-NYWRIPF-AYprF8WcAwU=.b6753c09-89d2-4cd5-bea7-49cb1c55b927@github.com> References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> <_8Kn0tp4jCs7AT6UkMa9BiK-NYWRIPF-AYprF8WcAwU=.b6753c09-89d2-4cd5-bea7-49cb1c55b927@github.com> Message-ID: On Wed, 10 May 2023 16:20:18 GMT, Roman Kennke wrote: >> src/hotspot/share/runtime/arguments.cpp line 3120: >> >>> 3118: >>> 3119: #ifdef _LP64 >>> 3120: if (!FLAG_IS_DEFAULT(UseCompactObjectHeaders)) { >> >> Just `if (UseCompactObjectHeaders)`, or do I miss something? > > I've done this on purpose. > When the default for UseCompactObjectHeaders is false, then CDS archives will be written with legacy headers, and we could not read this when running with +UseCompactObjectHeaders. > When the default for UseCompactObjectHeaders is true, then CDS archives will be written with compact headers, and we could not read this when running with -UseCompactObjectHeaders. > I (and others) are changing the default of this flag regularily for testing, because that also catches tests that require flagless, and the way this is written, would not require changing this line in arguments.cpp too. > > I guess it would be even more useful if we could detect which setting of the flag has been used when writing a CDS archive, and don't read it if it's not compatible. > > It would be *even* better, if we could detect the setting of the flag when archive has been written, and transform it into whatever the JVM is running with, but that would be too much to ask for this PR, I think. OK, I see. This makes sense. 
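For readers following the `FLAG_IS_DEFAULT` exchange above, here is a minimal sketch of why a check on the flag's origin keeps working when the build default is flipped, while a check on the flag's value does not. This is not HotSpot code: `BoolFlag`, `make_flag`, `override_flag` and the two `check_*` helpers are hypothetical names, and the sketch assumes that "is default" means "not set explicitly" and that the default CDS archive is always written with the build default.

```cpp
#include <cassert>

// Hypothetical stand-in for a VM flag; not the HotSpot JVMFlag class.
struct BoolFlag {
  bool build_default;  // value compiled into this build
  bool value;          // effective value at runtime
  bool set_by_user;    // true if the flag was given explicitly
};

// A flag nobody touched: it carries the build default.
inline BoolFlag make_flag(bool build_default) {
  return BoolFlag{build_default, build_default, false};
}

// The same flag after an explicit command-line setting.
inline BoolFlag override_flag(BoolFlag f, bool v) {
  f.value = v;
  f.set_by_user = true;
  return f;
}

inline bool is_default(const BoolFlag& f) { return !f.set_by_user; }

// Assumption: the default archive was written with the build default, so it
// is unusable exactly when the effective value differs from that default.
inline bool archive_mismatch(const BoolFlag& f) {
  return f.value != f.build_default;
}

// Models `if (Flag)`: only correct while the build default is false.
inline bool check_by_value(const BoolFlag& f) { return f.value; }

// Models `if (!FLAG_IS_DEFAULT(Flag))`: independent of the build default.
// It is conservative when the user passes the default value explicitly.
inline bool check_by_origin(const BoolFlag& f) { return !is_default(f); }
```

With a build default of `true` and no explicit setting, `check_by_value` fires although the archive matches; with an explicit `false`, it stays silent although the archive does not match. `check_by_origin` gives the intended answer in both cases, which is the property described above for builds where the default is patched for testing.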
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1190894761 From shade at openjdk.org Thu May 11 09:45:51 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 09:45:51 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 19:16:58 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. 
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) >> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: > > - Merge branch 'JDK-8305898' into JDK-8305895 > - @shipilev review, round 2 > - Fix build > - @shipilev comments, round 1 > - Allow to resolve mark with LW locking > - Use new lightweight locking with compact headers > - Merge branch 'JDK-8305898' into JDK-8305895 > - Imporve GetObjectSizeIntrinsicsTest > - Some GC fixes > - Add BaseOffsets test > - ... and 18 more: https://git.openjdk.org/jdk/compare/39c33727...58046e58 > I have tried the same boot JDK with the mainline JDK code base, and it looks good. P.S. I also tried cross-compiling with this PR, and the building passed without failures. OK, but this still points to incompatibility with boot/build JDK. The state of the source tree at this PR does not have any `STRING_TEMPLATES` symbol, it was added with JDK-8285932, so there is no way interim java.compiler in current PR references it, it must come from somewhere else. Do you have a self-built boot JDK that carries JDK-8285932? 
I'd suggest using a more stable JDK 20 as boot JDK. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13844#issuecomment-1543678637 From fjiang at openjdk.org Thu May 11 09:55:47 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 11 May 2023 09:55:47 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 09:42:59 GMT, Aleksey Shipilev wrote: > > I have tried the same boot JDK with the mainline JDK code base, and it looks good. P.S. I also tried cross-compiling with this PR, and the building passed without failures. > > > > OK, but this still points to incompatibility with boot/build JDK. The state of the source tree at this PR does not have any `STRING_TEMPLATES` symbol, it was added with JDK-8285932, so there is no way interim java.compiler in current PR references it, it must come from somewhere else. Do you have a self-built boot JDK that carries JDK-8285932? I'd suggest using a more stable JDK 20 as boot JDK. Yes, the boot JDK I used was based on mainline JDK. That's the problem here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13844#issuecomment-1543693010 From eosterlund at openjdk.org Thu May 11 10:02:57 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 11 May 2023 10:02:57 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 19:16:58 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word.
Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) 
>> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: > > - Merge branch 'JDK-8305898' into JDK-8305895 > - @shipilev review, round 2 > - Fix build > - @shipilev comments, round 1 > - Allow to resolve mark with LW locking > - Use new lightweight locking with compact headers > - Merge branch 'JDK-8305898' into JDK-8305895 > - Imporve GetObjectSizeIntrinsicsTest > - Some GC fixes > - Add BaseOffsets test > - ... and 18 more: https://git.openjdk.org/jdk/compare/39c33727...58046e58 Changes requested by eosterlund (Reviewer). src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 250: > 248: Copy::aligned_disjoint_words(cast_from_oop(o), cast_from_oop(new_obj), new_obj_size); > 249: > 250: if (!new_obj->mark().is_marked()) { For this check to work correctly, we are assuming that Copy::aligned_disjoint_words respects word level atomicity, even though we are using one of the non-atomic copying functions. That doesn't feel safe. 
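To make the concern about word-level atomicity concrete, here is a minimal sketch of a copy loop that does give that guarantee. It is not the HotSpot `Copy` API: `copy_words_relaxed` is a hypothetical helper, and the sketch assumes the object is modelled as an array of `std::atomic<uint64_t>` words. A plain bulk copy is free to move bytes in any granularity, so a header word that another thread updates concurrently could in principle be observed half old and half new; copying each word with one atomic load and one atomic store rules that out.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: copies n words, each with a single relaxed atomic
// load and a single relaxed atomic store. Every destination word is a value
// the source word actually held at some point (no tearing). The copy as a
// whole is still not a consistent snapshot of the source object.
inline void copy_words_relaxed(const std::atomic<uint64_t>* from,
                               std::atomic<uint64_t>* to,
                               std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    uint64_t word = from[i].load(std::memory_order_relaxed);
    to[i].store(word, std::memory_order_relaxed);
  }
}
```

Even with such a loop, the header word in the copy may already hold a forwarding value written by a racing thread, so per-word atomicity alone does not tell the copying thread which class the object has.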
------------- PR Review: https://git.openjdk.org/jdk/pull/13844#pullrequestreview-1422218250 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1190934673 From shade at openjdk.org Thu May 11 10:34:49 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 10:34:49 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: <1XmWtKbvWRLRRE_Aw5xUwdRmakCIZzryIlMUT_kK3VU=.297cd5a9-c505-4f9d-a80f-3a3db89a84f4@github.com> On Thu, 11 May 2023 10:00:00 GMT, Erik Österlund wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: >> >> - Merge branch 'JDK-8305898' into JDK-8305895 >> - @shipilev review, round 2 >> - Fix build >> - @shipilev comments, round 1 >> - Allow to resolve mark with LW locking >> - Use new lightweight locking with compact headers >> - Merge branch 'JDK-8305898' into JDK-8305895 >> - Imporve GetObjectSizeIntrinsicsTest >> - Some GC fixes >> - Add BaseOffsets test >> - ... and 18 more: https://git.openjdk.org/jdk/compare/39c33727...58046e58 > > src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 250: > >> 248: Copy::aligned_disjoint_words(cast_from_oop(o), cast_from_oop(new_obj), new_obj_size); >> 249: >> 250: if (!new_obj->mark().is_marked()) { > > For this check to work correctly, we are assuming that Copy::aligned_disjoint_words respects word level atomicity, even though we are using one of the non-atomic copying functions. That doesn't feel safe. True, it is not exactly safe. I wonder if we can plug this particular leak by doing the following: // Copy obj Copy::aligned_disjoint_words(cast_from_oop(o), cast_from_oop(new_obj), new_obj_size); if (UseCompactObjectHeaders) { // The copy above is not atomic. Make sure we have seen the proper mark // and re-install it into the copy, so that Klass* is guaranteed to be correct.
markWord mark = o->mark_acquire(); if (!mark.is_marked()) { new_obj->set_mark(mark); ContinuationGCSupport::transform_stack_chunk(new_obj); } else { // If we copied a mark-word that indicates 'forwarded' state, the object // installation would not succeed. We cannot access Klass* anymore either. // Skip the transformation. } } else { ContinuationGCSupport::transform_stack_chunk(new_obj); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1190970182 From eosterlund at openjdk.org Thu May 11 10:41:50 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 11 May 2023 10:41:50 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: <1XmWtKbvWRLRRE_Aw5xUwdRmakCIZzryIlMUT_kK3VU=.297cd5a9-c505-4f9d-a80f-3a3db89a84f4@github.com> References: <1XmWtKbvWRLRRE_Aw5xUwdRmakCIZzryIlMUT_kK3VU=.297cd5a9-c505-4f9d-a80f-3a3db89a84f4@github.com> Message-ID: On Thu, 11 May 2023 10:31:22 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 250: >> >>> 248: Copy::aligned_disjoint_words(cast_from_oop(o), cast_from_oop(new_obj), new_obj_size); >>> 249: >>> 250: if (!new_obj->mark().is_marked()) { >> >> For this check to work correctly, we are assuming that Copy::aligned_disjoint_words respects word level atomicity, even though we are using one of the non-atomic copying functions. That doesn't feel safe. > > True, it is not exactly safe. I wonder if we can plug this particular leak by doing the following: > > > // Copy obj > Copy::aligned_disjoint_words(cast_from_oop(o), cast_from_oop(new_obj), new_obj_size); > > if (UseCompactObjectHeaders) { > // The copy above is not atomic. Make sure we have seen the proper mark > // and re-install it into the copy, so that Klass* is guaranteed to be correct. 
> markWord mark = o->mark_acquire(); > if (!mark.is_marked()) { > new_obj->set_mark(mark); > ContinuationGCSupport::transform_stack_chunk(new_obj); > } else { > // If we copied a mark-word that indicates 'forwarded' state, the object > // installation would not succeed. We cannot access Klass* anymore either. > // Skip the transformation. > } > } else { > ContinuationGCSupport::transform_stack_chunk(new_obj); > } The load in mark_acquire can float up above the copying. So I don't think that will work either. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1190977091 From rkennke at openjdk.org Thu May 11 10:52:50 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 11 May 2023 10:52:50 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: <1XmWtKbvWRLRRE_Aw5xUwdRmakCIZzryIlMUT_kK3VU=.297cd5a9-c505-4f9d-a80f-3a3db89a84f4@github.com> Message-ID: On Thu, 11 May 2023 10:38:26 GMT, Erik Österlund wrote: >> True, it is not exactly safe. I wonder if we can plug this particular leak by doing the following: >> >> >> // Copy obj >> Copy::aligned_disjoint_words(cast_from_oop(o), cast_from_oop(new_obj), new_obj_size); >> >> if (UseCompactObjectHeaders) { >> // The copy above is not atomic. Make sure we have seen the proper mark >> // and re-install it into the copy, so that Klass* is guaranteed to be correct. >> markWord mark = o->mark_acquire(); >> if (!mark.is_marked()) { >> new_obj->set_mark(mark); >> ContinuationGCSupport::transform_stack_chunk(new_obj); >> } else { >> // If we copied a mark-word that indicates 'forwarded' state, the object >> // installation would not succeed. We cannot access Klass* anymore either. >> // Skip the transformation. >> } >> } else { >> ContinuationGCSupport::transform_stack_chunk(new_obj); >> } > > The load in mark_acquire can float up above the copying. So I don't think that will work either. Hmm, right.
I guess this is not only about atomicity. It's also possible that we see that it's not marked/forwarded, then ignore the transform_stack_chunk() call, which would be wrong. The problem is that transform_stack_chunk() wants to access the Klass* to check is_stackChunk(). So maybe we need to extract the Klass* from the test_mark and pass it to (a new variant of) ContinuationSupport::transform_stack_chunk() which only uses that class? That should work, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1190988361 From shade at openjdk.org Thu May 11 11:24:51 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 11:24:51 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: <1XmWtKbvWRLRRE_Aw5xUwdRmakCIZzryIlMUT_kK3VU=.297cd5a9-c505-4f9d-a80f-3a3db89a84f4@github.com> Message-ID: On Thu, 11 May 2023 10:49:49 GMT, Roman Kennke wrote: >> The load in mark_acquire can float up above the copying. So I don't think that will work either. > > Hmm, right. I guess this is not only about atomicity. It's also possible that we see that it's not marked/forwarded, then ignore the transform_stack_chunk() call, which would be wrong. > The problem is that transform_stack_chunk() wants to access the Klass* to check is_stackChunk(). So maybe we need to extract the Klass* from the test_mark and pass it to (a new variant of) ContinuationSupport::transform_stack_chunk() which only uses that class? That should work, right? I don't quite see the problem. So the `mark_acquire` floats before the copy, what does it break? If we read the "forwarded" mark early, we know it would fail installation, because there is already a forwarding. No transformation is needed then. If we read the "not forwarded" mark, then there is a chance we are going to install the copy as forwardee, so we need to transform.
But for the case of compact object headers, we need to guarantee that `transform_stack_chunk` sees a consistent `Klass*` from the mark, which the dedicated atomic read of the mark provides. We store that mark in the copy too, thus overwriting any atomicity violations the copy routine might have introduced, and giving the `transform_stack_chunk` call a green light to proceed with the transformation. (Aside: This thing could even be just `mark`, because even the relaxed atomic load would suffice.)

IOW, if the copy routine is not atomic, and it is a problem for the mark, then we can overwrite the mark in the copy with the value we got from the dedicated atomic load.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191018539

From eosterlund at openjdk.org Thu May 11 11:35:55 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Thu, 11 May 2023 11:35:55 GMT
Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7]
In-Reply-To:
References: <1XmWtKbvWRLRRE_Aw5xUwdRmakCIZzryIlMUT_kK3VU=.297cd5a9-c505-4f9d-a80f-3a3db89a84f4@github.com>
Message-ID:

On Thu, 11 May 2023 11:21:16 GMT, Aleksey Shipilev wrote:

>> Hmm, right. I guess this is not only about atomicity. It's also possible that we see that it's not marked/forwarded, then ignore the transform_stack_chunk() call, which would be wrong.
>> The problem is that transform_stack_chunk() wants to access the Klass* to check is_stackChunk(). So maybe we need to extract the Klass* from the test_mark and pass it to (a new variant of) ContinuationSupport::transform_stack_chunk() which only uses that class? That should work, right?
>
> I don't quite see the problem. So the `mark_acquire` floats before the copy, what does it break?
>
> If we read the "forwarded" mark early, we know it would fail installation, because there is already a forwarding. No transformation is needed then.
> > If we read the "not forwarded" mark, then there a chance we are going to install the copy as forwardee, so we need to transform. But for the case of compact object headers, we need to guarantee for `transform_stack_chunk` is seeing a consistent `Klass*` from the mark, which specially atomic read of mark provides. We store that mark in the copy too, thus overwriting any atomicity violations the copy routine might have introduced, and giving `transform_chunk_call` a green light to proceed with transformation. (Aside: This thing could even be just `mark`, because even the relaxed atomic load would suffice.) > > IOW, if the copy routine is not atomic, and it is a problem for mark, then we can overwrite mark in the copy with the value we got with special atomic load. > Hmm, right. I guess this is not only about atomicity. It's also possible that we see that it's not marked/forwarded, then ignore the transform_stack_chunk() call, which would be wrong. > The problem is that transform_stack_chunk() wants to access the Klass* to check is_stackChunk(). So maybe we need to extract the Klass* from the test_mark and pass it to (a new variant of) ContinuationSupport::transform_stack_chunk() which only uses that class? That should work, right? Yes, that would work. Given of course that nobody is reloading the klass somewhere in there. 
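
For readers following the thread, the pattern under discussion can be modelled outside HotSpot. The sketch below is a toy, not the PR's code: `ToyObject`, `CopyResult`, `copy_and_fix_mark`, the '11' forwarded encoding in the low bits, and the narrow klass living in the upper 32 bits are all assumptions made for illustration. It reads the source mark once with an atomic load, re-installs that value in the copy, and derives the klass from the same value instead of reloading it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Toy object: one 64-bit header word followed by two payload words.
// (Hypothetical layout, used only for this illustration.)
struct ToyObject {
  std::atomic<uint64_t> mark;
  uint64_t payload[2];
};

// Assumption: low two bits '11' mean "forwarded"; upper 32 bits hold the
// narrow klass id.
constexpr uint64_t kForwardedMask = 0x3;

inline bool mark_is_forwarded(uint64_t m) {
  return (m & kForwardedMask) == kForwardedMask;
}

struct CopyResult {
  bool proceed;           // false: the source is already forwarded, skip the transform
  uint32_t narrow_klass;  // klass id taken from the one mark value we read
};

inline CopyResult copy_and_fix_mark(const ToyObject& from, ToyObject& copy) {
  // Stand-in for the bulk copy, which gives no atomicity guarantee for the header.
  std::memcpy(copy.payload, from.payload, sizeof(from.payload));

  // Read the source mark exactly once.
  uint64_t m = from.mark.load(std::memory_order_acquire);
  if (mark_is_forwarded(m)) {
    // Installing this copy would fail anyway, so there is nothing to transform.
    return {false, 0};
  }

  // Overwrite whatever the bulk copy left in the header with the value we read,
  // and hand the klass from that same value to the caller (no reload).
  copy.mark.store(m, std::memory_order_relaxed);
  return {true, static_cast<uint32_t>(m >> 32)};
}
```

A caller would run the stack-chunk transformation only when `proceed` is true, using `narrow_klass` rather than re-reading the header of either object.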
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191032958

From eosterlund at openjdk.org Thu May 11 11:39:47 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Thu, 11 May 2023 11:39:47 GMT
Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7]
In-Reply-To:
References: <1XmWtKbvWRLRRE_Aw5xUwdRmakCIZzryIlMUT_kK3VU=.297cd5a9-c505-4f9d-a80f-3a3db89a84f4@github.com>
Message-ID: <9FNzcjOVIHMX7qMkrrqKulZUOoN0Z3n57esvenD5KVw=.d3cc0a37-c18d-4774-bab1-28f21d5fef79@github.com>

On Thu, 11 May 2023 11:33:12 GMT, Erik Österlund wrote:

>> I don't quite see the problem. So the `mark_acquire` floats before the copy, what does it break?
>>
>> If we read the "forwarded" mark early, we know it would fail installation, because there is already a forwarding. No transformation is needed then.
>>
>> If we read the "not forwarded" mark, then there a chance we are going to install the copy as forwardee, so we need to transform. But for the case of compact object headers, we need to guarantee for `transform_stack_chunk` is seeing a consistent `Klass*` from the mark, which specially atomic read of mark provides. We store that mark in the copy too, thus overwriting any atomicity violations the copy routine might have introduced, and giving `transform_chunk_call` a green light to proceed with transformation. (Aside: This thing could even be just `mark`, because even the relaxed atomic load would suffice.)
>>
>> IOW, if the copy routine is not atomic, and it is a problem for mark, then we can overwrite mark in the copy with the value we got with special atomic load.
>
>> Hmm, right. I guess this is not only about atomicity. It's also possible that we see that it's not marked/forwarded, then ignore the transform_stack_chunk() call, which would be wrong.
>> The problem is that transform_stack_chunk() wants to access the Klass* to check is_stackChunk().
So maybe we need to extract the Klass* from the test_mark and pass it to (a new variant of) ContinuationSupport::transform_stack_chunk() which only uses that class? That should work, right? > > Yes, that would work. Given of course that nobody is reloading the klass somewhere in there. > I don't quite see the problem. So the `mark_acquire` floats before the copy, what does it break? > > If we read the "forwarded" mark early, we know it would fail installation, because there is already a forwarding. No transformation is needed then. > > If we read the "not forwarded" mark, then there a chance we are going to install the copy as forwardee, so we need to transform. But for the case of compact object headers, we need to guarantee for `transform_stack_chunk` is seeing a consistent `Klass*` from the mark, which specially atomic read of mark provides. We store that mark in the copy too, thus overwriting any atomicity violations the copy routine might have introduced, and giving `transform_chunk_call` a green light to proceed with transformation. (Aside: This thing could even be just `mark`, because even the relaxed atomic load would suffice.) > > IOW, if the copy routine is not atomic, and it is a problem for mark, then we can overwrite mark in the copy with the value we got with special atomic load. Ah. Yes, I think you are right. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191036699 From shade at openjdk.org Thu May 11 12:01:10 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 12:01:10 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v41] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 20:31:27 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. 
Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. 
When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The c... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Align fake-heap without GCC warnings (duh) A few additional minor things: src/hotspot/share/gc/shared/slidingForwarding.cpp line 152: > 150: // Search existing entry in chain starting at idx. > 151: for (FallbackTableEntry* entry = head; entry != nullptr; entry = entry->_next) { > 152: assert(entry->_from != from,"Don't re-forward entries into the fallback-table"); Suggestion: assert(entry->_from != from, "Don't re-forward entries into the fallback-table"); src/hotspot/share/gc/shared/slidingForwarding.hpp line 62: > 60: * ^----- normal lock bits, would record "object is forwarded" > 61: * ^------ fallback bit (explained below) > 62: * ^------- alternate region select Suggestion: * [................................OOOOOOOOOOOOOOOOOOOOOOOOOOOOAFTT] * ^----- normal lock bits, would record "object is forwarded" * ^------- fallback bit (explained below) * ^-------- alternate region select test/hotspot/gtest/gc/shared/test_preservedMarks.cpp line 61: > 59: #ifndef PRODUCT > 60: FlagSetting fs(UseAltGCForwarding, false); > 61: #endif So, would this test fail in release JDK, but when `UseAltGCForwarding` is `true`? 
If so, maybe do: #ifndef PRODUCT FlagSetting fs(UseAltGCForwarding, false); #else // Should not run this test with alt GC forwarding if (UseAltGCForwarding) return; #endif test/hotspot/jtreg/gc/stress/systemgc/TestSystemGCWithG1.java line 44: > 42: * @requires vm.gc.G1 > 43: * @requires vm.debug > 44: * @requires os.maxMemory > 8g Why `>8g`? The test uses `-Xmx512m`. test/hotspot/jtreg/gc/stress/systemgc/TestSystemGCWithSerial.java line 65: > 63: * @library / > 64: * @requires vm.gc.Serial > 65: * @requires vm.debug This likely requires another requires: * @requires (vm.bits == "64") & (os.maxMemory >= 6G) ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1422199824 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1191051839 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1191057460 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1190936600 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1190929583 PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1190922619 From shade at openjdk.org Thu May 11 12:01:11 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 12:01:11 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v41] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 09:55:32 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Align fake-heap without GCC warnings (duh) > > test/hotspot/jtreg/gc/stress/systemgc/TestSystemGCWithG1.java line 44: > >> 42: * @requires vm.gc.G1 >> 43: * @requires vm.debug >> 44: * @requires os.maxMemory > 8g > > Why `>8g`? The test uses `-Xmx512m`. You'd probably need `(vm.bits == "64")` check too, because you can have a 32-bit system with lots of memory. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1190930271

From shade at openjdk.org Thu May 11 12:29:55 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 11 May 2023 12:29:55 GMT
Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7]
In-Reply-To:
References:
Message-ID:

On Wed, 10 May 2023 19:16:58 GMT, Roman Kennke wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>>
>> Main changes:
>> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag.
>> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor.
>> - The identity hash-code is narrowed to 25 bits.
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset).
>> - CDS can now write and read archives with the compressed header.
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) >> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: > > - Merge branch 'JDK-8305898' into JDK-8305895 > - @shipilev review, round 2 > - Fix build > - @shipilev comments, round 1 > - Allow to resolve mark with LW locking > - Use new lightweight locking with compact headers > - Merge branch 'JDK-8305898' into JDK-8305895 > - Imporve GetObjectSizeIntrinsicsTest > - Some GC fixes > - Add BaseOffsets test > - ... and 18 more: https://git.openjdk.org/jdk/compare/39c33727...58046e58 Another quick read. src/hotspot/cpu/aarch64/aarch64.ad line 7370: > 7368: __ br(Assembler::NE, stub->entry()); > 7369: __ bind(stub->continuation()); > 7370: __ lsr(dst, dst, markWord::klass_shift); Like for x86, should probably go in `macroAssembler`? 
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4325:

> 4323:     ldrw(dst, Address(src, oopDesc::klass_offset_in_bytes()));
> 4324:     return;
> 4325:   }

Long-shot: maybe we should be doing `MacroAssembler::load_compact_klass` specifically for `UseCompactObjectHeaders`, and keep the legacy `UseCompressedClassPointers` paths in the callers. This would limit the exposure to the new code here.

src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 3542:

> 3540:     __ decode_klass_not_null(result, tmp);
> 3541:   } else
> 3542:   if (UseCompressedClassPointers) {

Suggestion:

    } else if (UseCompressedClassPointers) {

src/hotspot/share/oops/objArrayKlass.inline.hpp line 73:

> 71: template <typename T, typename OopClosureType>
> 72: void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) {
> 73:   // We cannot safely access the Klass* with compact headers.

"// In this assert, ...

src/hotspot/share/oops/oop.inline.hpp line 88:

> 86: markWord oopDesc::resolve_mark() const {
> 87:   assert(LockingMode != LM_LEGACY, "Not safe with legacy stack-locking");
> 88:   markWord hdr = mark();

Convention `hdr` -> `m`? Everywhere in new code in this file, I think.
------------- PR Review: https://git.openjdk.org/jdk/pull/13844#pullrequestreview-1422448723 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191062237 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191074323 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191078669 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191095811 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191097686 From rkennke at openjdk.org Thu May 11 12:30:08 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 11 May 2023 12:30:08 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v42] In-Reply-To: References: Message-ID: <9P7Qf2b3x_d9lGYPl0VNxtWxSLj3rREZMK5JOrqVqog=.057c5855-3d1c-413a-9d0a-291aef59e3ce@github.com> > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. 
> > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new ... 
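
The bit layout listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the names `RegionTargets`, `encode_forwarding`, `decode_forwarding`, `is_forwarded` and `uses_fallback` are hypothetical, heap words are assumed to be 8 bytes, and the fallback hash table behind bit 2 is only detected, not implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bit layout of the low 32 header bits, as listed in the description above.
constexpr uint32_t kForwardedBits   = 0x3;      // bits 0..1: '11' means "forwarded"
constexpr uint32_t kFallbackBit     = 1u << 2;  // bit 2: forwardee lives in the fallback table
constexpr uint32_t kRegionSelectBit = 1u << 3;  // bit 3: which of the two target regions
constexpr unsigned kOffsetShift     = 4;        // bits 4..31: offset in heap words
constexpr uint32_t kMaxOffsetWords  = (1u << 28) - 1;

// Assumption for this sketch: 8-byte heap words (64-bit VM).
constexpr size_t kHeapWordSize = 8;

// Hypothetical per-source-region table entry: the start addresses of the two
// possible target regions.
struct RegionTargets {
  uintptr_t base[2];
};

// Pack a target-region selector and a word offset into the low 32 bits.
inline uint32_t encode_forwarding(unsigned select, uint32_t offset_words) {
  assert(select < 2);
  assert(offset_words <= kMaxOffsetWords);
  return (offset_words << kOffsetShift) |
         (select != 0 ? kRegionSelectBit : 0u) |
         kForwardedBits;
}

inline bool is_forwarded(uint32_t low_bits) {
  return (low_bits & kForwardedBits) == kForwardedBits;
}

inline bool uses_fallback(uint32_t low_bits) {
  return (low_bits & kFallbackBit) != 0;
}

// Recover the forwardee address from the encoded bits and the table entry
// of the source region.
inline uintptr_t decode_forwarding(const RegionTargets& targets, uint32_t low_bits) {
  assert(is_forwarded(low_bits) && !uses_fallback(low_bits));
  unsigned select = (low_bits & kRegionSelectBit) != 0 ? 1 : 0;
  uintptr_t offset_words = low_bits >> kOffsetShift;
  return targets.base[select] + offset_words * kHeapWordSize;
}
```

With 28 offset bits and the assumed 8-byte heap words, an encoded offset can reach up to 2 GiB past a target region's base address.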
Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:

 - Some more @shipilev comments
 - Update src/hotspot/share/gc/shared/slidingForwarding.hpp

   Co-authored-by: Aleksey Shipilëv

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13582/files
  - new: https://git.openjdk.org/jdk/pull/13582/files/b0deb2b3..3271b29b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=41
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=40-41

  Stats: 20 lines in 5 files changed: 16 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/13582.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582

PR: https://git.openjdk.org/jdk/pull/13582

From shade at openjdk.org Thu May 11 12:30:02 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 11 May 2023 12:30:02 GMT
Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3]
In-Reply-To:
References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com>
Message-ID:

On Wed, 10 May 2023 14:23:02 GMT, Roman Kennke wrote:

>> src/hotspot/share/gc/parallel/psOldGen.cpp line 398:
>>
>>> 396:
>>> 397:   virtual void do_object(oop obj) {
>>> 398:     HeapWord* test_addr = cast_from_oop<HeapWord*>(obj);
>>
>> I thought this `+1` is specifically to test that `object_start` is able to find the object header when given the interior pointer. See the `guarantee`-s in the next lines.
>
> Yes, but with compact headers, we could now have 1-word-sized objects, in which case this would fail. I am not sure how to deal with that, TBH. Maybe do the whole test only when !UseCompactObjectHeaders or when object-size is > 1?

I think we should keep the original code semantics intact, and `guarantees` might need to check stuff even with zero offset.

    // With compact headers, the objects can be one-word sized.
    size_t int_off = UseCompactObjectHeaders ?
MIN2(1, obj->size() - 1) : 1; HeapWord* test_addr = cast_from_oop(obj) + int_off; >> src/hotspot/share/gc/shared/collectedHeap.cpp line 232: >> >>> 230: // With compact headers, we can't safely access the class, due >>> 231: // to possibly forwarded objects. >>> 232: if (!UseCompactObjectHeaders && is_in(object->klass_raw())) { >> >> Looks good, but what this even supposed to check? `object` is not `oop` if its klass field points into Java heap? Huh? Was it some CMS shenanigan that stores something in klass word? Or is it just a glorified null check? I'll follow up on that separately. > > I have no idea *shrugs* I am dealing with this separately, this is not a concern for this PR. >> src/hotspot/share/gc/shared/collectedHeap.hpp line 312: >> >>> 310: >>> 311: virtual void fill_with_dummy_object(HeapWord* start, HeapWord* end, bool zap); >>> 312: static size_t min_dummy_object_size() { >> >> Why this change? > > That's because oopDesc::header_size() can no longer be constexpr, because it depends on UseCompactObjectHeaders. OK, fine. >> src/hotspot/share/gc/shared/memAllocator.cpp line 414: >> >>> 412: // concurrent collectors. >>> 413: if (UseCompactObjectHeaders) { >>> 414: oopDesc::release_set_mark(mem, _klass->prototype_header()); >> >> In other cases, we do `markWord::prototype().set_narrow_klass(nk)` -- it looks safer, as we get the `markWord`-s prototype, and amend it. `_klass->prototype_header` can be removed, I think. > > I like _klass->prototype_header() more, and would argue that we should use that instead, here. An object's prototype mark really depends on the Klass of the object, with compact headers, and we would always get the correct prototype out of _klass->prototype_header(). Also, perhaps more importantly, the Klass::prototype_header() is useful because we can load it in generated code with a single instruction, while fetching the markWord::prototype() and amending it *at runtime* would require a whole sequence of instructions. 
> > We only use markWord::prototype().set_narrow_klass(nk) in CDS, where the correct encoding of the narrow-klass in the header depends on the relocated Klass* location, and we couldn't safely use Klass::prototype_header(). OK, I see, makes sense. >> src/hotspot/share/oops/oop.hpp line 124: >> >>> 122: inline size_t size_given_klass(Klass* klass); >>> 123: >>> 124: // The following set of methods is used to access the mark-word and related >> >> So, these are done to avoid introducing branches on the paths where objects are definitely _not_ forwarded? Are there fewer places than where we expect forwardings? Maybe the better way would be to make all methods handle the occasional forwarding, and then provide the methods that provide the _fast-path_, like `fast_mark`, `fast_class`, etc? > > No, that would not work, because we have different ways to encode forwarding: full-GCs use sliding forwarding, and normal GCs use normal forwarding (with the exception of the forward-failed bit). Here, we wouldn't know which is which. 
OK ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191086667 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191090403 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191090688 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191093810 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191096550 From shade at openjdk.org Thu May 11 12:30:04 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 12:30:04 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Wed, 10 May 2023 10:31:03 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow to resolve mark with LW locking > > src/hotspot/share/gc/shared/gc_globals.hpp line 692: > >> 690: constraint(GCCardSizeInBytesConstraintFunc,AtParse) \ >> 691: \ >> 692: product(bool, UseAltGCForwarding, false, \ > > Should it be `EXPERIMENTAL`? Comment is still open :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191093000 From rkennke at openjdk.org Thu May 11 12:48:52 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 11 May 2023 12:48:52 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor.
> - The identity hash-code is narrowed to 25 bits.
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset).
> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders.
>
> Testing:
> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.)
> - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: - Merge branch 'JDK-8305898' into JDK-8305895 - Merge branch 'JDK-8305898' into JDK-8305895 - @shipilev review, round 2 - Fix build - @shipilev comments, round 1 - Allow to resolve mark with LW locking - Use new lightweight locking with compact headers - Merge branch 'JDK-8305898' into JDK-8305895 - Imporve GetObjectSizeIntrinsicsTest - Some GC fixes - ... and 19 more: https://git.openjdk.org/jdk/compare/95341f0a...7bd036a7 ------------- Changes: https://git.openjdk.org/jdk/pull/13844/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=07 Stats: 1183 lines in 81 files changed: 944 ins; 79 del; 160 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From rkennke at openjdk.org Thu May 11 13:21:15 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 11 May 2023 13:21:15 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. 
> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor.
> - The identity hash-code is narrowed to 25 bits.
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset).
> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders.
>
> Testing:
> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.)
> - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: More @shipilev comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13844/files - new: https://git.openjdk.org/jdk/pull/13844/files/7bd036a7..698384ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=07-08 Stats: 92 lines in 12 files changed: 41 ins; 14 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From rkennke at openjdk.org Thu May 11 13:28:13 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 11 May 2023 13:28:13 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v10] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. 
Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor.
> - The identity hash-code is narrowed to 25 bits.
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset).
> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders.
>
> Testing:
> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.)
> - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request incrementally with three additional commits since the last revision: - Merge remote-tracking branch 'origin/JDK-8305895' into JDK-8305895 - Move compact klass offset into C2 - Merge branch 'JDK-8305898' into JDK-8305895 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13844/files - new: https://git.openjdk.org/jdk/pull/13844/files/698384ec..e7a0f67c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=08-09 Stats: 75 lines in 17 files changed: 37 ins; 9 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From shade at openjdk.org Thu May 11 14:50:06 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 14:50:06 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v42] In-Reply-To: <9P7Qf2b3x_d9lGYPl0VNxtWxSLj3rREZMK5JOrqVqog=.057c5855-3d1c-413a-9d0a-291aef59e3ce@github.com> References: <9P7Qf2b3x_d9lGYPl0VNxtWxSLj3rREZMK5JOrqVqog=.057c5855-3d1c-413a-9d0a-291aef59e3ce@github.com> Message-ID: On Thu, 11 May 2023 12:30:08 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. 
Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. 
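The bit layout listed above can be sketched roughly as follows, assuming 64-bit heap words. Every name here (`encode_forwarding`, `decode_forwarding`, `TargetBases`, the `k...` constants) is hypothetical and not taken from the patch; only the bit positions come from the description, and the fallback path (bit 2) is deliberately not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical constants; the bit positions follow the description above.
constexpr uint32_t kForwardedTag = 0b11;           // bits 0..1: object is forwarded
constexpr uint32_t kFallbackBit  = 1u << 2;        // bit 2: look up in fallback table (not modeled)
constexpr uint32_t kRegionSelBit = 1u << 3;        // bit 3: selects target region 0 or 1
constexpr int      kOffsetShift  = 4;              // bits 4..31: offset in heap words
constexpr uint32_t kMaxOffset    = (1u << 28) - 1; // largest offset that fits in 28 bits

using HeapWord = uint64_t;  // assumption: one heap word is 64 bits

// One table entry per source region: start addresses of its two possible targets.
struct TargetBases {
  HeapWord* base[2];
};

// Pack (target selector, word offset from target base) into the low 32 header bits.
inline uint32_t encode_forwarding(int target_index, size_t word_offset) {
  assert(target_index == 0 || target_index == 1);
  assert(word_offset <= kMaxOffset);
  return (static_cast<uint32_t>(word_offset) << kOffsetShift)
       | (target_index == 1 ? kRegionSelBit : 0u)
       | kForwardedTag;
}

// Recover the forwardee address from the encoded bits and the region's table entry.
inline HeapWord* decode_forwarding(uint32_t encoded, const TargetBases& bases) {
  assert((encoded & kForwardedTag) == kForwardedTag);
  assert((encoded & kFallbackBit) == 0);  // fallback lookups are outside this sketch
  const int target_index = (encoded & kRegionSelBit) != 0 ? 1 : 0;
  const size_t word_offset = encoded >> kOffsetShift;
  return bases.base[target_index] + word_offset;
}
```

With 28 offset bits and 8-byte heap words, such an encoding could address up to 2 GB within one target region, which is one reading of why the heap has to be divided into equal-sized regions.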
When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table.
>>
>> All the table accesses can be done unsynchronized because:
>> - Serial GC is single-threaded anyway
>> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions.
>> - G1 serial compaction is single-threaded
>>
>> The c...
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
> - Some more @shipilev comments
> - Update src/hotspot/share/gc/shared/slidingForwarding.hpp
>
>   Co-authored-by: Aleksey Shipilëv

The updates look fine, thanks.

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13582#pullrequestreview-1422790198

From wkemper at openjdk.org Thu May 11 15:26:43 2023
From: wkemper at openjdk.org (William Kemper)
Date: Thu, 11 May 2023 15:26:43 GMT
Subject: RFR: Assert that region usage accounting is only accessed in generational mode
In-Reply-To:
References:
Message-ID:

On Wed, 10 May 2023 20:39:49 GMT, William Kemper wrote:

> There was one unrelated test failure in the github actions (running G1).

Yes, it's the latter scenario. The global generation exists in all modes, but the implementation of `used_regions` is only correct for the generational mode. The assert is intended to show that this specific method is only called in generational mode.

-------------

PR Comment: https://git.openjdk.org/shenandoah/pull/278#issuecomment-1544194800

From rkennke at openjdk.org Thu May 11 19:25:46 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 11 May 2023 19:25:46 GMT
Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v11]
In-Reply-To:
References:
Message-ID:

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>
> Main changes:
> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag.
> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor.
> - The identity hash-code is narrowed to 25 bits.
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset).
> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders.
>
> Testing:
> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.)
> - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Fix some uses of klass_offset_in_bytes() - Fix args checking ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13844/files - new: https://git.openjdk.org/jdk/pull/13844/files/e7a0f67c..d83ff0e7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=09-10 Stats: 12 lines in 3 files changed: 5 ins; 4 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From wkemper at openjdk.org Thu May 11 22:20:20 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 11 May 2023 22:20:20 GMT Subject: RFR: Changes to simplify MU accounting In-Reply-To: References: Message-ID: On Mon, 8 May 2023 23:52:30 GMT, Kelvin Nilsen wrote: > This represents a small part of the expand-old-on-demand branch. By itself, this may result in some performance regression because we no longer use MU (mutator utilization) reports to guide generation sizing decisions. On the other hand, the guidance provided by MU metrics does not seem to be a very accurate predictor of ideal generation sizes. Changes requested by wkemper (Committer). 
src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 511: > 509: if (heap->collection_set()->has_old_regions()) { > 510: bool mixed_is_done = (heap->old_heuristics()->unprocessed_old_collection_candidates() == 0); > 511: mmu_tracker->record_mixed(the_generation, GCId::current(), mixed_is_done); Could use `ShenandoahControlThread::get_gc_id` or just access `_gc_id` directly. `GCId::current` has thread local access overhead. src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 96: > 94: } > 95: > 96: void ShenandoahMmuTracker::help_record_concurrent(ShenandoahGeneration* generation, uint gcid, const char *msg) { `help_record_concurrent` is a confusing name because this function is also used to record degenerated and full gc cycles. Maybe just `update_utilization`? src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 106: > 104: _most_recent_timestamp = current; > 105: } else { > 106: double gc_cycle_duration = current - _most_recent_timestamp; Is this cycle duration? or time between cycles? src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 138: > 136: bool has_old_candidates) { > 137: // No special processing for old marking > 138: double duration = os::elapsedTime() - _most_recent_timestamp; We only seem to use `duration` here, none of the following lines produce any used values? src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 158: > 156: uint gcid, bool is_old_bootstrap, bool is_mixed_done) { > 157: if ((gcid == _most_recent_gcid) && _most_recent_is_full) { > 158: // Do nothing. This is a redundant recording for the full gc that just completed. We have a redundant call here because some degenerated cycles become full cycles? Could we arrange for this to only be called once from each code path? 
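One way to read the guard quoted at line 158 is sketched below. `CycleRecorder` and its methods are hypothetical stand-ins, not the `ShenandoahMmuTracker` API; only the condition on `_most_recent_gcid` and `_most_recent_is_full` is taken from the quoted snippet, and the scenario (a degenerated cycle that was upgraded to a full cycle and is then reported a second time) is the one the comment asks about.

```cpp
#include <cassert>

// Hypothetical, simplified model of the redundant-recording guard.
class CycleRecorder {
 public:
  // Record a full GC cycle; always counted.
  void record_full(unsigned gcid) {
    _most_recent_gcid = gcid;
    _most_recent_is_full = true;
    ++_recorded;
  }

  // Record a degenerated cycle. Returns false when the same gcid was
  // already recorded as a full cycle, in which case nothing is counted.
  bool record_degenerated(unsigned gcid) {
    if (gcid == _most_recent_gcid && _most_recent_is_full) {
      return false;  // redundant report for the full gc that just completed
    }
    _most_recent_gcid = gcid;
    _most_recent_is_full = false;
    ++_recorded;
    return true;
  }

  int recorded() const { return _recorded; }

 private:
  unsigned _most_recent_gcid = 0;
  bool _most_recent_is_full = false;
  int _recorded = 0;
};
```

Calling each recording function exactly once per code path, as the comment suggests, would make this guard unnecessary; the sketch only shows what the existing check tolerates.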
src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 192: > 190: double mu = mutator_delta / (_active_processors * time_delta); > 191: double gcu = gc_delta / (_active_processors * time_delta); > 192: log_info(gc)("Periodic Sample: Average GCU = %.3f%%, Average MU = %.3f%%", gcu * 100, mu * 100); These aren't really averages any more right? Just the utilization over the last period? src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 70: > 68: > 69: bool _most_recent_is_full; > 70: bool _doing_mixed_evacuations; This is written, but never read. Is this used somewhere on a different branch as part of the bigger picture? ------------- PR Review: https://git.openjdk.org/shenandoah/pull/274#pullrequestreview-1423472751 PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1191723666 PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1191729620 PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1191726260 PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1191739206 PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1191740981 PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1191741664 PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1191735012 From wkemper at openjdk.org Thu May 11 22:20:22 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 11 May 2023 22:20:22 GMT Subject: RFR: Changes to simplify MU accounting In-Reply-To: References: Message-ID: <1f93r31LVh0byfSQvtlPu8RENp-RxuAk5tizy9D7tNg=.30955864-34f8-49af-9ee0-88326526d9e2@github.com> On Thu, 11 May 2023 01:07:43 GMT, Y. Srinivas Ramakrishna wrote: >> src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 90: >> >>> 88: // GC utilization during this cycle. >>> 89: // >>> 90: // We may redundantly record degen and full, in which case the gcid will repeat. We log these as FULL. 
>> >> Won't `gc_id` also repeat for old_marking_increments too? > > Or may be I am misunderstanding what "repeat" means here. I think only a boot strap cycle and its first old marking increment will have the same `gc_id`. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1191713114 From kdnilsen at openjdk.org Fri May 12 05:33:50 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 12 May 2023 05:33:50 GMT Subject: RFR: DRAFT: Expand old on demand [v43] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > As currently drafted, there are regression failures. This DRAFT PR is published for the purpose of facilitating a careful code review. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 126 commits: - Merge remote-tracking branch 'origin' into expand-old-on-demand - Fix whitespace - Weaken some assertions As written, these assertions fail if generation->is_global(). It may work to rewrite these assertions using accessor methods used(), humongous_waste(), and affiliated_region_count() instead of the instance field values. - Fix some assertions One assertion was too strong. Several others were too weak. - Reduce guaranteed young GC interval to 30 seconds It had been set to 15 seconds after observing that the default value of 5 minutes was resulting in degenerated cycles because GC triggers were too lazy. The value of 30 seconds has been observed to not result in degenerated GC with our two PhasedUpdates netflix workloads. - Tidy code for integration Two "global" variables are no longer needed: thread_iteration_roots and MIXED_EVACUATIONS_ENABLED. 
- Reorder lines of code to reduce diffs from original code
- Remove irrelevant comment

  ShenandoahGenerationSizer::heap_size_changed() comment said this was not hooked anywhere. More recently, we have figured out two places that need to invoke this service.
- Enable more tests on this branch
- Update global usage following full gc even when non-generational mode
- ... and 116 more: https://git.openjdk.org/shenandoah/compare/a2ab9979...b84c8cb1

-------------

Changes: https://git.openjdk.org/shenandoah/pull/248/files
Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=42
Stats: 3099 lines in 34 files changed: 1700 ins; 899 del; 500 mod
Patch: https://git.openjdk.org/shenandoah/pull/248.diff
Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248

PR: https://git.openjdk.org/shenandoah/pull/248

From kbarrett at openjdk.org Fri May 12 09:00:49 2023
From: kbarrett at openjdk.org (Kim Barrett)
Date: Fri, 12 May 2023 09:00:49 GMT
Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends [v3]
In-Reply-To:
References:
Message-ID:

> Please review this renaming of Atomic::fetch_and_add and friends to be
> consistent with the naming convention recently chosen for atomic bitops. That
> is, make the following name changes for class Atomic and its implementation:
>
> - fetch_and_add => fetch_then_add
> - add_and_fetch => add_then_fetch
>
> Testing:
> mach5 tier1-3
> GHA testing

Kim Barrett has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains six commits:

- additional renamings post-genzgc
- Merge branch 'master' into atomic-arith-names
- revert accidental Red Hat copyright change
- rename in tests
- rename uses
- rename impl

-------------

Changes: https://git.openjdk.org/jdk/pull/13896/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13896&range=02
Stats: 165 lines in 46 files changed: 0 ins; 0 del; 165 mod
Patch: https://git.openjdk.org/jdk/pull/13896.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/13896/head:pull/13896

PR: https://git.openjdk.org/jdk/pull/13896

From stefank at openjdk.org Fri May 12 09:00:49 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Fri, 12 May 2023 09:00:49 GMT
Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 12 May 2023 08:56:37 GMT, Kim Barrett wrote:

>> Please review this renaming of Atomic::fetch_and_add and friends to be
>> consistent with the naming convention recently chosen for atomic bitops. That
>> is, make the following name changes for class Atomic and its implementation:
>>
>> - fetch_and_add => fetch_then_add
>> - add_and_fetch => add_then_fetch
>>
>> Testing:
>> mach5 tier1-3
>> GHA testing
>
> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
>
> - additional renamings post-genzgc
> - Merge branch 'master' into atomic-arith-names
> - revert accidental Red Hat copyright change
> - rename in tests
> - rename uses
> - rename impl

Looks good. I don't think you need to wait for David's approval of the ZGC changes.

-------------

Marked as reviewed by stefank (Reviewer).
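For readers unfamiliar with the two operations being renamed, the difference in return value can be sketched with `std::atomic`. The free functions below are hypothetical stand-ins that borrow the new names; they are not HotSpot's `Atomic` class and make no claim about its signatures or memory-ordering parameters.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in: add to dest and return the value it held BEFORE the addition.
template <typename T>
T fetch_then_add(std::atomic<T>& dest, T add_value) {
  return dest.fetch_add(add_value);
}

// Hypothetical stand-in: add to dest and return the value it holds AFTER the addition.
template <typename T>
T add_then_fetch(std::atomic<T>& dest, T add_value) {
  return dest.fetch_add(add_value) + add_value;
}
```

The "then" in each name spells out the order of the two steps, which is presumably the point of aligning with the atomic bitops naming convention.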
PR Review: https://git.openjdk.org/jdk/pull/13896#pullrequestreview-1424068203

From kbarrett at openjdk.org Fri May 12 09:00:52 2023
From: kbarrett at openjdk.org (Kim Barrett)
Date: Fri, 12 May 2023 09:00:52 GMT
Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 10 May 2023 13:48:52 GMT, Kim Barrett wrote:

>> Please review this renaming of Atomic::fetch_and_add and friends to be
>> consistent with the naming convention recently chosen for atomic bitops. That
>> is, make the following name changes for class Atomic and its implementation:
>>
>> - fetch_and_add => fetch_then_add
>> - add_and_fetch => add_then_fetch
>>
>> Testing:
>> mach5 tier1-3
>> GHA testing
>
> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision:
>
> revert accidental Red Hat copyright change

I've merged after the genzgc integration (there were a couple of simple conflicts, where I took the new code but applied the renaming to it). I then searched for and renamed some "new" occurrences of fetch_and_add (some were in gc/x). So not a completely trivial merge. Reran mach5 tier1-3.

@stefank and @dholmes-ora - still okay?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13896#issuecomment-1545405978

From eosterlund at openjdk.org Fri May 12 09:24:57 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Fri, 12 May 2023 09:24:57 GMT
Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v11]
In-Reply-To:
References:
Message-ID:

On Thu, 11 May 2023 19:25:46 GMT, Roman Kennke wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>>
>> Main changes:
>> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag.
>> - The compressed Klass* can now be stored in the mark-word of objects.
In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor.
>> - The identity hash-code is narrowed to 25 bits.
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset).
>> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders.
>>
>> Testing:
>> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.)
>> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Fix some uses of klass_offset_in_bytes() > - Fix args checking Changes requested by eosterlund (Reviewer). src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 253: > 251: // The copy above is not atomic. Make sure we have seen the proper mark > 252: // and re-install it into the copy, so that Klass* is guaranteed to be correct. > 253: markWord mark = o->mark_acquire(); I don't think we need the acquire here, do we? ------------- PR Review: https://git.openjdk.org/jdk/pull/13844#pullrequestreview-1424115328 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192130009 From kbarrett at openjdk.org Fri May 12 09:55:56 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 12 May 2023 09:55:56 GMT Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends [v3] In-Reply-To: References: Message-ID: On Fri, 12 May 2023 08:55:05 GMT, Stefan Karlsson wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - additional renamings post-genzgc >> - Merge branch 'master' into atomic-arith-names >> - revert accidental Red Hat copyright change >> - rename in tests >> - rename uses >> - rename impl > > Looks good. 
I don't think you need to wait for David's approval of the ZGC changes.

Thanks @stefank and @dholmes-ora for reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13896#issuecomment-1545480593

From kbarrett at openjdk.org Fri May 12 09:55:57 2023
From: kbarrett at openjdk.org (Kim Barrett)
Date: Fri, 12 May 2023 09:55:57 GMT
Subject: Integrated: 8307806: Rename Atomic::fetch_and_add and friends
In-Reply-To:
References:
Message-ID:

On Wed, 10 May 2023 08:38:49 GMT, Kim Barrett wrote:

> Please review this renaming of Atomic::fetch_and_add and friends to be
> consistent with the naming convention recently chosen for atomic bitops. That
> is, make the following name changes for class Atomic and its implementation:
>
> - fetch_and_add => fetch_then_add
> - add_and_fetch => add_then_fetch
>
> Testing:
> mach5 tier1-3
> GHA testing

This pull request has now been integrated.

Changeset: f09a0f5c
Author: Kim Barrett
URL: https://git.openjdk.org/jdk/commit/f09a0f5ca787e139f240a33bb12491792b8e7003
Stats: 165 lines in 46 files changed: 0 ins; 0 del; 165 mod

8307806: Rename Atomic::fetch_and_add and friends

Reviewed-by: stefank, dholmes

-------------

PR: https://git.openjdk.org/jdk/pull/13896

From rkennke at openjdk.org Fri May 12 10:33:10 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 12 May 2023 10:33:10 GMT
Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v12]
In-Reply-To:
References:
Message-ID:

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>
> Main changes:
> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag.
> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word.
Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor.
> - The identity hash-code is narrowed to 25 bits.
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset).
> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders.
>
> Testing:
> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.)
> - [x] tier1 (x86_64)
> - [x] tier2 (x86_64)
> - [x] tier3 (x86_64)
> - [ ] tier4 (x86_64)
> - [x] tier1 (aarch64)
> - [x] tier2 (aarch64)
> - [x] tier3 (aarch64)
> - [ ] tier4 (aarch64)
> - [ ] tier1 (x86_64) +UseCompactObjectHeaders
> - [ ] tier2 (x86_64) +UseCompactObjectHeaders
> - [ ] tier3 (x86_64) +UseCompactObjectHeaders
> - [ ] tier4 (x86_64) +UseCompactObjectHeaders
> - [ ] tier1 (aarch64) +UseCompactObjectHeaders
> - [ ] tier2 (aarch64) +UseCompactObjectHeaders
> - [ ] tier3 (aarch64) +UseCompactObjectHeaders
> - [ ] tier4 (aarch64) +UseCompactObjectHeaders

Roman Kennke has updated the pull request incrementally with three additional commits since the last revision:

- Merge remote-tracking branch 'origin/JDK-8305895' into JDK-8305895
- Use plain mark() instead of mark_acquire()
- Re-format some hash-code related code-paths

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/13844/files
- new: https://git.openjdk.org/jdk/pull/13844/files/d83ff0e7..32e00c2e

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=11
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=10-11

Stats: 16 lines in 3 files changed: 5 ins; 0 del; 11 mod
Patch: https://git.openjdk.org/jdk/pull/13844.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844

PR: https://git.openjdk.org/jdk/pull/13844

From rkennke at openjdk.org Fri May 12 10:44:53 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 12 May 2023 10:44:53 GMT
Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v11]
In-Reply-To:
References:
Message-ID:

On Fri, 12 May 2023 09:22:01 GMT, Erik Österlund wrote:

>> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Fix some uses of klass_offset_in_bytes()
>> - Fix args checking
>
> src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 253:
>
>> 251: // The copy above is not atomic.
Make sure we have seen the proper mark >> 252: // and re-install it into the copy, so that Klass* is guaranteed to be correct. >> 253: markWord mark = o->mark_acquire(); > > I don't think we need the acquire here, do we? Right. An atomic load would be sufficient, which is what oopDesc::mark() already does. I changed those code paths to use plain mark(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192208296 From rkennke at openjdk.org Fri May 12 12:10:16 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 12:10:16 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16.
#11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Some hashcode improvements (mostly SA) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13844/files - new: https://git.openjdk.org/jdk/pull/13844/files/32e00c2e..d44247ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=11-12 Stats: 19 lines in 3 files changed: 16 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From kdnilsen at openjdk.org Fri May 12 14:31:11 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 12 May 2023 14:31:11 GMT Subject: RFR: DRAFT: Expand old on demand [v44] In-Reply-To: 
References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > As currently drafted, there are regression failures. This DRAFT PR is published for the purpose of facilitating a careful code review. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Report number of old- and young cset regions when preparing to rebuild free set ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/b84c8cb1..cfd30ef6 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=43 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=42-43 Stats: 19 lines in 2 files changed: 13 ins; 3 del; 3 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From wkemper at openjdk.org Fri May 12 16:18:37 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 12 May 2023 16:18:37 GMT Subject: RFR: Use static assert to validate card offset encoding mask Message-ID: Follow up to https://github.com/openjdk/shenandoah/pull/275 - replace runtime assert with static assert. 
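The difference between the two kinds of assert can be sketched as follows. This is a minimal illustration only: the constant names, the 6-bit offset width and the flag bit are hypothetical and are not taken from the Shenandoah card table code.

```cpp
#include <cstdint>

// Hypothetical encoding: the low 6 bits hold an offset, the top bit is a flag.
constexpr std::uint8_t kOffsetBits = 6;
constexpr std::uint8_t kOffsetMask = (1u << kOffsetBits) - 1;  // 0x3F
constexpr std::uint8_t kFlagBit    = 0x80;

// Compile-time validation: a bad mask breaks the build instead of
// tripping a runtime assert in a debug VM.
static_assert((kOffsetMask & kFlagBit) == 0,
              "offset mask must not overlap the flag bit");
static_assert(kOffsetMask == 0x3F,
              "offset mask must cover exactly the low 6 bits");

// Pack an offset and a flag into one byte; excess offset bits are dropped.
constexpr std::uint8_t encode(std::uint8_t offset, bool flag) {
  return static_cast<std::uint8_t>((offset & kOffsetMask) | (flag ? kFlagBit : 0));
}

// Recover the offset part of an encoded byte.
constexpr std::uint8_t decode_offset(std::uint8_t v) {
  return static_cast<std::uint8_t>(v & kOffsetMask);
}
```

Because both operands are constants, the check costs nothing at runtime and also covers product builds, where runtime asserts are compiled out.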
------------- Commit messages: - Use static assert to validate card offset encoding mask Changes: https://git.openjdk.org/shenandoah/pull/279/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=279&range=00 Stats: 6 lines in 1 file changed: 5 ins; 1 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/279.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/279/head:pull/279 PR: https://git.openjdk.org/shenandoah/pull/279 From coleenp at openjdk.org Fri May 12 16:19:59 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 12 May 2023 16:19:59 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: Message-ID: On Fri, 12 May 2023 12:10:16 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will can now store their length at offset 8. 
Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) >> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Some hashcode improvements (mostly SA) I don't have any comments on the compiler code or gc code, but some other comments and questions. Some of the LP64 preprocessor conditionals are inconsistent in the assembly code. src/hotspot/share/gc/parallel/psPromotionManager.cpp line 293: > 291: > 292: oop old = task.to_source_array(); > 293: assert(old->forward_safe_klass()->is_objArray_klass(), "invariant"); Why sometimes forward_safe_klass()? Shouldn't all calls to klass() be "forward_safe"? How do you know where to put this version of the klass() call? 
src/hotspot/share/memory/universe.cpp line 325: > 323: assert(oopDesc::klass_offset_in_bytes() < static_cast(os::vm_page_size()), > 324: "Klass offset is expected to be less than the page size"); > 325: } This is where you should have else mark_offset_in_bytes() < page size or maybe it should be changed to the needs_explicit_null_check_code(). src/hotspot/share/oops/klass.cpp line 207: > 205: return prototype; > 206: } > 207: This seems like a useful change without UseCompactObjectHeaders as an enhancement and to remove some conditional code. Since we have storage in Klass for it anyway. src/hotspot/share/oops/objArrayKlass.cpp line 160: > 158: size_t ObjArrayKlass::oop_size(oop obj) const { > 159: // In this assert, we cannot safely access the Klass* with compact headers. > 160: assert(UseCompactObjectHeaders || obj->is_objArray(), "must be object array"); Isn't there code that checks oop->is_objArray() before calling this? Would it return true when it's not an objArray? src/hotspot/share/oops/oop.inline.hpp line 126: > 124: > 125: Klass* oopDesc::klass_or_null() const { > 126: #ifdef _LP64 I don't like all these #ifdef _LP64 here. Maybe markWord.inline.hpp can be refactored to not require callers to have this conditional inclusion. src/hotspot/share/runtime/arguments.cpp line 3136: > 3134: if (UseCompactObjectHeaders && !UseCompressedClassPointers) { > 3135: FLAG_SET_DEFAULT(UseCompressedClassPointers, true); > 3136: } Make this a function like set_compact_object_headers_flags(), that checks for FLAG_IS_CMDLINE for the related options and give a warning for them too, and add a test. ------------- Changes requested by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13844#pullrequestreview-1423518398 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192518760 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192536609 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192539910 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192548449 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192552486 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192558364 From coleenp at openjdk.org Fri May 12 16:20:05 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 12 May 2023 16:20:05 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: <-EjzJr1r6Uwq2zLGjVxyetEp4G0pCx0BwWZT_6tD0bo=.c9879b47-1c55-49a1-90e9-389820d6ed5e@github.com> On Thu, 11 May 2023 19:25:46 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. 
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) >> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Fix some uses of klass_offset_in_bytes() > - Fix args checking src/hotspot/cpu/aarch64/c1_MacroAssembler_aarch64.cpp line 330: > 328: } else { > 329: assert(!MacroAssembler::needs_explicit_null_check(oopDesc::klass_offset_in_bytes()), "must add explicit null check"); > 330: } We put this check in Universe::genesis() so it's not needed here for one less conditional. Maybe that check should be this one instead of what we have there. 
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4322: > 4320: assert(UseCompactObjectHeaders, "expects UseCompactObjectHeaders"); > 4321: > 4322: if (!UseCompactObjectHeaders) { I'm confused, why is this conditional here if you asserted it before? I can't imagine this being an untested code path and you need this for safety. If so, this doesn't take CompressedKlassPointers into account. I think it would be better to remove it. If I'm reading this right. Maybe change this assert to a guarantee for testing if you think this is likely. I see why this is. This is inconsistent with x86. You should fix this to match x86 and make it load_narrow_klass(). src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp line 167: > 165: void C1_MacroAssembler::initialize_header(Register obj, Register klass, Register len, Register t1, Register t2) { > 166: assert_different_registers(obj, klass, len, t1, t2); > 167: if (UseCompactObjectHeaders) { Shouldn't this be in _LP64 too like the code just above? src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp line 186: > 184: if (len->is_valid()) { > 185: movl(Address(obj, arrayOopDesc::length_offset_in_bytes()), len); > 186: if (UseCompactObjectHeaders) { This should also be in _LP64 and not have && !UseCompactObjectHeaders. You should restrict this to LP64 in this change. src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp line 323: > 321: } else { > 322: assert(!MacroAssembler::needs_explicit_null_check(oopDesc::klass_offset_in_bytes()), "must add explicit null check"); > 323: } I think this should be removed in favor of the test in Universe::genesis. src/hotspot/cpu/x86/macroAssembler_x86.hpp line 367: > 365: // oop manipulations > 366: #ifdef _LP64 > 367: void load_nklass(Register dst, Register src); Should this be private? Is it only called by load_klass ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191740593 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191744083 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191757984 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191758786 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191759600 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191762136 From rkennke at openjdk.org Fri May 12 16:34:05 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 16:34:05 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: Message-ID: <9qJt_SuVsX0LRXvopL0zka8cOsfCR5aBoNlrkxnjuEM=.4106fe62-6cf4-4421-a355-dbcf55e465e8@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. 
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Remove obsolete code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13844/files - new: https://git.openjdk.org/jdk/pull/13844/files/d44247ca..00f6d401 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=12-13 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git 
pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From coleenp at openjdk.org Fri May 12 16:34:06 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 12 May 2023 16:34:06 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Tue, 9 May 2023 09:20:19 GMT, Roman Kennke wrote: >> I'm not sure if this is trivial or significant, but if you limit the class pointer to 30 bit, and use the upper 2 bits for locking, then you can obtain the class pointer in less instructions: >> >> movl dst, [obj + 4] >> andl dst, 0xBFFFFFFF >> jl slow_path >> >> This exploits the fact that the most significant bit represents a negative number, so it clears the unrelated bit and checks for valid header at the same time, the sequence is only 2 instructions long after macro fusion, compared to the current value of 3. >> >> This also allows quick class comparisons against constants, assuming that most instance is in unlock state, the comparison when equality is likely can be done: >> >> cmpl [obj + 4], con | 0x40000000 >> jne slow_path >> >> This can be matched on an `If` so that the `slow_path` can branch to the `IfTrue` label directly, and the fast path has only 1 comparison and 1 conditional jump. >> >> Thanks. > >> I'm not sure if this is trivial or significant, but if you limit the class pointer to 30 bit, and use the upper 2 bits for locking, then you can obtain the class pointer in less instructions: >> >> ``` >> movl dst, [obj + 4] >> andl dst, 0xBFFFFFFF >> jl slow_path >> ``` >> >> This exploits the fact that the most significant bit represents a negative number, so it clears the unrelated bit and checks for valid header at the same time, the sequence is only 2 instructions long after macro fusion, compared to the current value of 3. 
>> >> This also allows quick class comparisons against constants, assuming that most instance is in unlock state, the comparison when equality is likely can be done: >> >> ``` >> cmpl [obj + 4], con | 0x40000000 >> jne slow_path >> ``` >> >> This can be matched on an `If` so that the `slow_path` can branch to the `IfTrue` label directly, and the fast path has only 1 comparison and 1 conditional jump. >> >> Thanks. > > These are great suggestions! I would shy away from doing it in this PR, though, because this also affects the locking subsystem and would cause quite intrusive changes and invalidate all the testing that we've done. Let's consider this in the Lilliput project and upstream the optimization separately, ok? > > Thanks! > Roman @rkennke Can you merge up with the GenerationalZGC changes because some of our test definitions need it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13844#issuecomment-1545999675 From rkennke at openjdk.org Fri May 12 16:41:59 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 16:41:59 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: <-EjzJr1r6Uwq2zLGjVxyetEp4G0pCx0BwWZT_6tD0bo=.c9879b47-1c55-49a1-90e9-389820d6ed5e@github.com> References: <-EjzJr1r6Uwq2zLGjVxyetEp4G0pCx0BwWZT_6tD0bo=.c9879b47-1c55-49a1-90e9-389820d6ed5e@github.com> Message-ID: On Thu, 11 May 2023 22:50:20 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fix some uses of klass_offset_in_bytes() >> - Fix args checking > > src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp line 186: > >> 184: if (len->is_valid()) { >> 185: movl(Address(obj, arrayOopDesc::length_offset_in_bytes()), len); >> 186: if (UseCompactObjectHeaders) { > > This should also be in _LP64 and not have && !UseCompactObjectHeaders. You should restrict this to LP64 in this change. 
Ok I will put it in _LP64 (even though it is not strictly needed - UseCompactObjectHeaders is hard-wired constant false, so compiler will not include the code, I would expect), but why not check UseCompactObjectHeaders here? The new code is only sensible with compact headers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192591711 From rkennke at openjdk.org Fri May 12 16:45:57 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 16:45:57 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: <-EjzJr1r6Uwq2zLGjVxyetEp4G0pCx0BwWZT_6tD0bo=.c9879b47-1c55-49a1-90e9-389820d6ed5e@github.com> References: <-EjzJr1r6Uwq2zLGjVxyetEp4G0pCx0BwWZT_6tD0bo=.c9879b47-1c55-49a1-90e9-389820d6ed5e@github.com> Message-ID: On Thu, 11 May 2023 22:57:41 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fix some uses of klass_offset_in_bytes() >> - Fix args checking > > src/hotspot/cpu/x86/macroAssembler_x86.hpp line 367: > >> 365: // oop manipulations >> 366: #ifdef _LP64 >> 367: void load_nklass(Register dst, Register src); > > Should this be private? Is it only called by load_klass ? No, it's also used by cmp_klass(). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192595297 From rkennke at openjdk.org Fri May 12 16:49:56 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 16:49:56 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: Message-ID: On Fri, 12 May 2023 15:29:28 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Some hashcode improvements (mostly SA) > > src/hotspot/share/gc/parallel/psPromotionManager.cpp line 293: > >> 291: >> 292: oop old = task.to_source_array(); >> 293: assert(old->forward_safe_klass()->is_objArray_klass(), "invariant"); > > Why sometimes forward_safe_klass()? Shouldn't all calls to klass() be "forward_safe"? How do you know where to put this version of the klass() call? There are situations in GC where the object may or may not be forwarded. This is where those methods are useful, and required for compact headers, because when the object is forwarded the mark-word, and therefore the Klass*, can only be loaded from the forwardee. Originally I put this code at each place in each GC where it happened, but eventually I realized that it is a fairly common pattern, so I moved it into oopDesc. We need to be careful there, though, because there are two different ways to deal with forwarding: full GCs use sliding forwarding (which preserves the Klass* in the mark-word), and normal GCs use normal forwarding, which only preserves the Klass* in the forwardee. The forward_safe_* methods only work with the latter, and are only required there. That is not very difficult to sort out, IMO, because full GCs are in fully separate code paths anyway.
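The "normal forwarding" case described above can be modelled with a small self-contained sketch. Everything here is an assumption made for illustration: the Obj type, the helper names and the exact bit positions are hypothetical and are not the HotSpot implementation. The only details borrowed from the thread are that the class bits sit in the upper 32 bits of the header and that low bits '11' mark a forwarded object.

```cpp
#include <cassert>
#include <cstdint>

// Toy object: a single 64-bit header word (hypothetical layout).
//   not forwarded: [ narrow klass : 32 | other bits : 30 | tag : 2 ]
//   forwarded:     [ address of the copy                 | tag = 11 ]
struct Obj {
  std::uint64_t header;
};

constexpr std::uint64_t kForwardedTag = 0x3;  // low bits '11'
constexpr std::uint64_t kUnlockedTag  = 0x1;  // any tag other than '11'

inline Obj make_obj(std::uint32_t narrow_klass) {
  return Obj{(static_cast<std::uint64_t>(narrow_klass) << 32) | kUnlockedTag};
}

inline bool is_forwarded(const Obj* o) {
  return (o->header & kForwardedTag) == kForwardedTag;
}

inline const Obj* forwardee(const Obj* o) {
  return reinterpret_cast<const Obj*>(
      static_cast<std::uintptr_t>(o->header & ~kForwardedTag));
}

// Overwrites the whole header, so the class bits of 'from' are lost.
inline void forward_to(Obj* from, const Obj* to) {
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(to);
  assert((addr & kForwardedTag) == 0 && "target must leave the tag bits free");
  from->header = static_cast<std::uint64_t>(addr) | kForwardedTag;
}

inline std::uint32_t narrow_klass(const Obj* o) {
  return static_cast<std::uint32_t>(o->header >> 32);
}

// Reads the class bits from the copy when the original has been forwarded.
inline std::uint32_t forward_safe_narrow_klass(const Obj* o) {
  return narrow_klass(is_forwarded(o) ? forwardee(o) : o);
}
```

In this model a plain narrow_klass() call on a forwarded object would return bits of the forwarding pointer, which is why callers that may see forwarded objects need the forward-safe variant.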
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192598914 From rkennke at openjdk.org Fri May 12 16:54:55 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 16:54:55 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: Message-ID: On Fri, 12 May 2023 15:50:14 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Some hashcode improvements (mostly SA) > > src/hotspot/share/oops/klass.cpp line 207: > >> 205: return prototype; >> 206: } >> 207: > > This seems like a useful change without UseCompactObjectHeaders as an enhancement and to remove some conditional code. Since we have storage in Klass for it anyway. Why? This code used to be there with BiasedLocking, and has been removed. I've re-instated it for compact object headers, because the prototype mark for an object now depends on its Klass, but other than that, why would it be useful? The prototype would be just markWord::prototype(). > src/hotspot/share/oops/objArrayKlass.cpp line 160: > >> 158: size_t ObjArrayKlass::oop_size(oop obj) const { >> 159: // In this assert, we cannot safely access the Klass* with compact headers. >> 160: assert(UseCompactObjectHeaders || obj->is_objArray(), "must be object array"); > > Isn't there code that checks oop->is_objArray() before calling this? Would it return true when it's not an objArray? Yes, there is. We're a bit excessive with asserting the klass here. I tried to remain as close as possible with that, so I disabled it only for compact object headers. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192601974 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192602701 From rkennke at openjdk.org Fri May 12 16:59:53 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 16:59:53 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: Message-ID: On Fri, 12 May 2023 16:03:26 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Some hashcode improvements (mostly SA) > > src/hotspot/share/oops/oop.inline.hpp line 126: > >> 124: >> 125: Klass* oopDesc::klass_or_null() const { >> 126: #ifdef _LP64 > > I don't like all these #ifdef _LP64 here. Maybe markWord.inline.hpp can be refactored to not require callers to have this conditional inclusion. The problem is 32-bit builds: there the markWord has only 32 bits, so there are no upper 32 bits in which to store the Klass*. That's why I put all those #ifdefs there. If you take a step back, you'll notice that compact object headers mostly align the header layouts of 64-bit and 32-bit JVMs. There would be a great opportunity here to consolidate all this code and make the whole header a union/struct/bitfield that looks the same on both 32-bit and 64-bit builds. But this conflicts with the current implementation, where we want to be able to switch between the compact and legacy header layouts. Also, going forward, we want to shrink the header even more, to just 32 bits, and still have it switchable with the old layout. Eventually all this will be the same in 32-bit and 64-bit JVMs, but for the time being I think we need to keep it slightly messy to support the legacy layout.
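A rough sketch of why such accessors end up conditionally compiled, assuming a hypothetical HeaderWord wrapper (this is not markWord, and the UINTPTR_MAX test is only a portable stand-in for the _LP64 check): the class-bit accessors can only exist when the header word is 64 bits wide.

```cpp
#include <cstdint>

// Stand-in for HotSpot's _LP64 check: true when pointers are 64 bits wide.
#if UINTPTR_MAX > 0xFFFFFFFFu
constexpr bool kHasKlassBits = true;
#else
constexpr bool kHasKlassBits = false;
#endif

// Hypothetical header wrapper: one machine word.
struct HeaderWord {
  std::uintptr_t value;  // 64 bits on 64-bit builds, 32 bits otherwise

#if UINTPTR_MAX > 0xFFFFFFFFu
  // Only a 64-bit word has upper 32 bits to spare for the class bits.
  static constexpr int kKlassShift = 32;

  std::uint32_t narrow_klass() const {
    return static_cast<std::uint32_t>(value >> kKlassShift);
  }

  // Returns a copy with the class bits replaced and the low 32 bits kept.
  HeaderWord with_narrow_klass(std::uint32_t nk) const {
    const std::uintptr_t low = value & 0xFFFFFFFFu;
    return HeaderWord{low | (static_cast<std::uintptr_t>(nk) << kKlassShift)};
  }
#endif
};
```

On a 32-bit build the two accessors do not exist at all, so every caller has to be guarded as well, which is the clutter the review comment objects to.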
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192606637 From rkennke at openjdk.org Fri May 12 17:05:33 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 17:05:33 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v43] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. 
Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new ... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 109 commits: - Merge remote-tracking branch 'origin/JDK-8305896' into JDK-8305896 - Some more @shipilev comments - Update src/hotspot/share/gc/shared/slidingForwarding.hpp Co-authored-by: Aleksey Shipilëv - Align fake-heap without GCC warnings (duh) - Merge branch 'master' into JDK-8305896 - Fix gtest: Align fake-heaps, avoid re-forwardings - @tschatzl's latest fix, cleanup and a test that checks unaligned heap problem - Fix build - Update src/hotspot/share/gc/shared/slidingForwarding.cpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/gc/shared/slidingForwarding.cpp Co-authored-by: Aleksey Shipilëv - ...
and 99 more: https://git.openjdk.org/jdk/compare/6ebea897...f1ad3421 ------------- Changes: https://git.openjdk.org/jdk/pull/13582/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=42 Stats: 912 lines in 24 files changed: 875 ins; 0 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From rkennke at openjdk.org Fri May 12 17:18:34 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 17:18:34 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. 
#11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. 
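The forwarding encoding proposed for 8305896 earlier in this digest (bits 0 and 1 as the "forwarded" tag, bit 2 for the fallback table, bit 3 to select one of two target regions, bits 4..31 for the heap-word offset) can be sketched roughly as below. This is only an illustration of the bit layout as described in that message, not the code in slidingForwarding.hpp: all names (`TargetPair`, `encode_forwarding`, and so on) are hypothetical, and target region bases are modelled as plain heap-word indices rather than real addresses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants; only the bit positions come from the PR description.
constexpr uint32_t kForwardedTag  = 0x3;             // bits 0-1: '11' means "forwarded"
constexpr uint32_t kFallbackBit   = 1u << 2;         // bit 2: look up forwardee in fallback table
constexpr uint32_t kRegionSelect  = 1u << 3;         // bit 3: target region 0 or 1
constexpr unsigned kOffsetShift   = 4;               // bits 4..31: offset in heap words
constexpr uint32_t kMaxWordOffset = (1u << 28) - 1;  // 28 bits are available for the offset

// Hypothetical table entry: the two possible target-region bases for one
// source region, expressed here as heap-word indices for simplicity.
struct TargetPair {
  uint64_t base[2];
};

// Pack a (target region selector, word offset) pair into the low 32 header bits.
inline uint32_t encode_forwarding(unsigned target_index, uint32_t word_offset) {
  assert(target_index < 2);
  assert(word_offset <= kMaxWordOffset);
  return kForwardedTag
       | (target_index != 0 ? kRegionSelect : 0u)
       | (word_offset << kOffsetShift);
}

inline bool is_forwarded(uint32_t low_bits)  { return (low_bits & 0x3) == kForwardedTag; }
inline bool uses_fallback(uint32_t low_bits) { return (low_bits & kFallbackBit) != 0; }

// Recover the forwardee's heap-word index from the encoded bits and the
// table entry of the object's source region (non-fallback case only).
inline uint64_t decode_forwarding(uint32_t low_bits, const TargetPair& entry) {
  assert(is_forwarded(low_bits) && !uses_fallback(low_bits));
  unsigned target_index = (low_bits & kRegionSelect) != 0 ? 1 : 0;
  return entry.base[target_index] + (low_bits >> kOffsetShift);
}
```

With 28 offset bits, a single target region can span at most 2^28 heap words in this sketch; how the actual change bounds region sizes is not stated in the message, so treat that limit as a property of the sketch only.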
------------- Changes: https://git.openjdk.org/jdk/pull/13844/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=14 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From duke at openjdk.org Fri May 12 17:18:36 2023 From: duke at openjdk.org (duke) Date: Fri, 12 May 2023 17:18:36 GMT Subject: Withdrawn: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: Message-ID: <4DZ-toPVpu6KD4F82nUyCyky74gPt6eMC0JAwTCJbZ0=.8be0cbcd-ed48-404a-979f-39f4d768f578@github.com> On Fri, 5 May 2023 20:29:38 GMT, Roman Kennke wrote: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will can now store their length at offset 8. 
Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/13844 From rkennke at openjdk.org Fri May 12 17:35:14 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 17:35:14 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: On Fri, 12 May 2023 17:18:34 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) 
>> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. I'm sorry, I think I butchered this PR while trying to merge latest upstream through all the dependent PRs. Let's continue the discussion in the new PR #13961. I hope I haven't caused any breakage (for some reason, this PR now shows as "Merged", which worries me. I believe the Skara bot did that. I wonder where it has been merged to.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13844#issuecomment-1546066712 From coleenp at openjdk.org Fri May 12 17:42:15 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 12 May 2023 17:42:15 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: Message-ID: On Fri, 12 May 2023 16:51:07 GMT, Roman Kennke wrote: >> src/hotspot/share/oops/klass.cpp line 207: >> >>> 205: return prototype; >>> 206: } >>> 207: >> >> This seems like a useful change without UseCompactObjectHeaders as an enhancement and to remove some conditional code. Since we have storage in Klass for it anyway. > > Why? This code used to be there with BiasedLocking, and has been removed. I've re-instated it for compact object headers, because the prototype mark for an object now depends on its Klass, but other than that, why would it be useful? The prototype would be just markWord::prototype().
My thought was not to waste a 64-bit field in Klass, and maybe to eliminate some of the "if (CompactObjectHeaders) use the one in Klass, else use MarkWord::prototype()" branches by always using the one in Klass. Minimizing if (CompactObjectHeaders) would be a good thing. In any case, this isn't for this change, just an idea to try to use this field unconditionally. I recognize it from BiasedLocking. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192642191 From rkennke at openjdk.org Fri May 12 19:08:11 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 19:08:11 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) Message-ID: This is the main body of the JEP 450: Compact Object Headers (Experimental). Main changes: - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. - The identity hash-code is narrowed to 25 bits. - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16.
#11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Testing: (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) - [x] tier1 (x86_64) - [x] tier2 (x86_64) - [x] tier3 (x86_64) - [ ] tier4 (x86_64) - [x] tier1 (aarch64) - [x] tier2 (aarch64) - [x] tier3 (aarch64) - [ ] tier4 (aarch64) - [ ] tier1 (x86_64) +UseCompactObjectHeaders - [ ] tier2 (x86_64) +UseCompactObjectHeaders - [ ] tier3 (x86_64) +UseCompactObjectHeaders - [ ] tier4 (x86_64) +UseCompactObjectHeaders - [ ] tier1 (aarch64) +UseCompactObjectHeaders - [ ] tier2 (aarch64) +UseCompactObjectHeaders - [ ] tier3 (aarch64) +UseCompactObjectHeaders - [ ] tier4 (aarch64) +UseCompactObjectHeaders ------------- Depends on: https://git.openjdk.org/jdk/pull/13779 Commit messages: - Consolidate _LP64 #ifdef - Remove obsolete check - Handle klass offset in JVMCI - Disable CDS tests when running with +UseCompactObjectHeaders - Merge branch 'JDK-8305898' into JDK-8305895-v2 - @colenp review comments - Remove obsolete code - Some hashcode improvements (mostly SA) - Merge remote-tracking branch 'origin/JDK-8305895' into JDK-8305895 - Fix some uses of klass_offset_in_bytes() - ... 
and 36 more: https://git.openjdk.org/jdk/compare/880d564a...a6e9f10a Changes: https://git.openjdk.org/jdk/pull/13961/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13961&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305895 Stats: 1306 lines in 98 files changed: 1025 ins; 94 del; 187 mod Patch: https://git.openjdk.org/jdk/pull/13961.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13961/head:pull/13961 PR: https://git.openjdk.org/jdk/pull/13961 From coleenp at openjdk.org Fri May 12 19:08:11 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 12 May 2023 19:08:11 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: Message-ID: On Fri, 12 May 2023 17:27:25 GMT, Roman Kennke wrote: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will can now store their length at offset 8. 
Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders These changes are an improvement. src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp line 193: > 191: movl(Address(obj, arrayOopDesc::length_offset_in_bytes() + sizeof(jint)), t1); > 192: } > 193: #endif This endif should go after UseCompressedClassPointers conditional, and consolidate to one set of #ifdef _LP64. src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5126: > 5124: assert(UseCompactObjectHeaders, "expect compact object headers"); > 5125: > 5126: if (!UseCompactObjectHeaders) { Now this isn't needed, right? 
src/hotspot/share/cds/archiveBuilder.cpp line 726: > 724: k->set_prototype_header(markWord::prototype().set_narrow_klass(nk)); > 725: } > 726: #endif //_LP64 If CDS is turned off for UseCompactObjectHeaders, I don't understand this change or the one to archiveHeapWriter. -Xshare:dump objects would be the wrong size. If CDS is not supported, then there should be something in arguments.cpp that gives an error for that. And write a test for that error of mixing and matching. ------------- PR Review: https://git.openjdk.org/jdk/pull/13961#pullrequestreview-1424925882 PR Review Comment: https://git.openjdk.org/jdk/pull/13961#discussion_r1192648245 PR Review Comment: https://git.openjdk.org/jdk/pull/13961#discussion_r1192646637 PR Review Comment: https://git.openjdk.org/jdk/pull/13961#discussion_r1192661859 From rkennke at openjdk.org Fri May 12 19:08:12 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 19:08:12 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: Message-ID: On Fri, 12 May 2023 18:03:13 GMT, Coleen Phillimore wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) 
>> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > src/hotspot/share/cds/archiveBuilder.cpp line 726: > >> 724: k->set_prototype_header(markWord::prototype().set_narrow_klass(nk)); >> 725: } >> 726: #endif //_LP64 > > If CDS is turned off for UseCompactObjectHeaders, I don't understand this change or the one to archiveHeapWriter. -Xshare:dump objects would be the wrong size. If CDS is not supported, then there should be something in arguments.cpp that gives an error for that. And write a test for that error of mixing and matching. Yeah, we do have code in arguments.cpp that turns off CDS if the wrong setting is used (i.e. the opposite of the default setting). If you hard-code UseCompactObjectHeaders to be true, then the archives will be written in the compact layout, and can be read. That's what the changes in share/cds implement. (Note: I regularly hard-patch the UseCompactObjectHeaders flag to be true for testing, because that also catches all the @flagless tests, and I know that Daniel does that, too.) We disabled the CDS jtreg tests when passing +UseCompactObjectHeaders via the command line, because we would otherwise see a lot of test failures caused by archive format mismatches. I am not sure if that's a good way to deal with that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13961#discussion_r1192698676 From ysr at openjdk.org Fri May 12 21:54:22 2023 From: ysr at openjdk.org (Y.
Srinivas Ramakrishna) Date: Fri, 12 May 2023 21:54:22 GMT Subject: RFR: Use static assert to validate card offset encoding mask In-Reply-To: References: Message-ID: On Fri, 12 May 2023 16:12:16 GMT, William Kemper wrote: > Follow up to https://github.com/openjdk/shenandoah/pull/275 - replace runtime assert with static assert. LGTM; minor documentation suggestion. src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.hpp line 400: > 398: static const int MaxCardSize = NOT_LP64(512) LP64_ONLY(1024); > 399: STATIC_ASSERT((MaxCardSize / HeapWordSize) - 1 <= FirstStartBits); > 400: Maybe leave a reference to the raw values for `MaxCardSize` as coming from the range for `GCCardSizeInBytes` e.g. in `gc_globals.hpp`. ------------- Marked as reviewed by ysr (Committer). PR Review: https://git.openjdk.org/shenandoah/pull/279#pullrequestreview-1425193780 PR Review Comment: https://git.openjdk.org/shenandoah/pull/279#discussion_r1192820463 From ysr at openjdk.org Sat May 13 02:02:50 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Sat, 13 May 2023 02:02:50 GMT Subject: RFR: Expand old on demand [v44] In-Reply-To: References: Message-ID: On Fri, 12 May 2023 14:31:11 GMT, Kelvin Nilsen wrote: >> This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. >> >> This PR now passes all GHA pre-integration tests and other internal CI tests. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Report number of old- and young cset regions when preparing to rebuild free set Skimmed over some of the diffs and left some comments. Haven't gotten into the details yet, but will once the overall structure and shape of the changes become clearer.
Most of the current comments are of form/structure rather than of the details which I will defer until Monday. src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 541: > 539: size_t ShenandoahHeuristics::evac_slack(size_t young_regions_to_be_recycled) { > 540: assert(false, "evac_slack() only implemented for young Adaptive Heuristics"); > 541: return 0; This is a bit awkward. We should decide if this is _needed_ anywhere else other than its only currently intended use case. If it isn't needed anywhere else and shouldn't be called, I'd change the assert to a `guarantee(false)` (or `ShouldNotReachHere()`). If it shouldn't be needed, but it makes sense to have a default implementation and let it return 0, I'd remove the assert entirely and let it just return 0. If it turns out that you can't be sure, I'd go for the former for now. src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.hpp line 176: > 174: virtual void initialize(); > 175: > 176: virtual size_t evac_slack(size_t region_to_be_recycled); Please document the public spec of this method in a comment here. Is the intention here to make this a pure virtual too, or is it a good idea to provide a default implementation for subtypes that don't care? src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.hpp line 83: > 81: > 82: // Flags are set when promotion failure is detected (by gc thread), and cleared when > 83: // old generation collection begins (by control thread). Flags are set and cleared at safepoints. First sentence repeats 78-79. Can these be consolidated into a single block of declarations with a comment block preceding the block? Otherwise, please fix the comment so it doesn't confusingly refer to promotion failure. src/hotspot/share/gc/shenandoah/shenandoahCollectionSet.hpp line 72: > 70: // should be subtracted from what's available. 
> 71: size_t _young_available_bytes_collected; > 72: size_t _old_available_bytes_collected; One typical HotSpot GC terminology is to use the term "free" to denote memory that is available for allocation, "garbage" for memory that is used by dead objects, and "used" for memory that is used by objects whose liveness is unknown (so they can't be collected) or that is wasted to fragmentation (internal or external). I am wondering if it might make sense to use that terminology for brevity and consistency in method and data member names. In that case, it would seem as though the above block might read: // When a region with free memory is added to the collection set, that memory is no longer // available for allocation until the region is evacuated. size_t _young_free_bytes; size_t _old_free_bytes; In addition, you will need to state what these data members represent. Something like: These represent the free space in the young and old generations, respectively. src/hotspot/share/gc/shenandoah/shenandoahControlThread.hpp line 176: > 174: bool in_graceful_shutdown(); > 175: > 176: void service_concurrent_normal_cycle(ShenandoahHeap* heap, Isn't this delta already there in another PR? I'd rebase this PR wrt that as parent, or land that and refresh this so as to avoid presenting the same diffs in two different PRs where possible. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1080: > 1078: // Return the amount of young-gen memory that is about to be reycled > 1079: void ShenandoahFreeSet::prepare_to_rebuild(size_t &young_cset_regions, size_t &old_cset_regions) { > 1080: size_t region_size_bytes = ShenandoahHeapRegion::region_size_bytes(); This variable seems to be unused. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1088: > 1086: // This places regions that have alloc_capacity into the old_collector set if they identify as is_old() or the > 1087: // mutator set otherwise.
> 1088: find_regions_with_alloc_capacity(young_cset_regions, old_cset_regions); This method needs a better name, as does `prepare_to_rebuild`. One way to do this is to write out the API spec for the method clearly in the header file. A natural name should then suggest itself. I think a natural name is `add_regions_with_free_space(&num_young, &num_old)`. Confusingly, we pass in young & old region counts as reference parameters, but talk about old collector set and mutator set. I realize we are identifying young with mutator because mutators allocate only from young, but it helps not to cause confusion, and rather to stick with one consistent nomenclature here and elsewhere, invoking the other nomenclature (gc and mutator) only when absolutely needed for some specific reason. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp line 1091: > 1089: } > 1090: > 1091: void ShenandoahFreeSet::rebuild() { Why is this method called `rebuild`? src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 175: > 173: void flip_to_old_gc(ShenandoahHeapRegion* r); > 174: > 175: void adjust_bounds_for_additional_old_collector_free_region(size_t idx); Where is this defined? The very long name makes me think it could either be renamed or may have been awkwardly factored out. src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 179: > 177: void recompute_bounds(); > 178: void adjust_bounds(); > 179: bool touches_bounds(size_t num) const; These were there before, so why are they appearing as new in the diffs? src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp line 181: > 179: bool touches_bounds(size_t num) const; > 180: > 181: // Used of free set represents the amount of is_mutator_free set that has been consumed since most recent rebuild. That comment appears to belong where `used()` is declared? I also see that all these lines are not new, so I am not quite sure why they are showing up as new diffs introduced in this PR.
src/hotspot/share/gc/shenandoah/shenandoahGeneration.hpp line 191: > 189: // Return the updated value of affiliated_region_count > 190: size_t decrease_affiliated_region_count(size_t delta); > 191: I realize the comments are talking about what is returned in each case, but I'd much rather they were clearer: // {In,De}crement the region count by delta and return its new value. I also feel it's overkill to have the method at line 184. Why not just use a default delta of 1, and do away with the 0-arg versions of these methods? This would also fix the current lack of checking for underflow from 0 in the case of the 0-arg decrement method. The asymmetry between the increase/decrease methods also points out that, although very unlikely, there is nothing that is checking for the reasonableness of the delta in the case of the increment. src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 228: > 226: size_t _regular_regions_promoted_in_place; > 227: size_t _regular_usage_promoted_in_place; > 228: Although these names may in many cases be "self-explanatory" to those familiar with the code, I think they need at least a brief 1-line description each. src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 362: > 360: // Allocation of old GCLABs (aka PLABs) assures that _old_evac_expended + request-size < _old_evac_reserved. If the allocation > 361: // is authorized, increment _old_evac_expended by request size. This allocation ignores old_gen->available(). > 362: This documentation comment seems like a stream-of-consciousness thing. I think we need this comment to move to the right spot. src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 533: > 531: void recycle_trash(); > 532: public: > 533: void rebuild_free_set(bool concurrent); Please document the method's public specification so a caller is aware as to what they should expect the method to do. 
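Returning to the `{in,de}crease_affiliated_region_count` suggestion for shenandoahGeneration.hpp above: a minimal sketch of what a single method pair with a default delta of 1 and bounds checks on both sides might look like. `RegionCounter` and `_max_regions` are hypothetical stand-ins rather than the actual `ShenandoahGeneration` members, and plain `assert` is used here in place of HotSpot's two-argument `assert` macro.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the per-generation bookkeeping; this is NOT the
// real ShenandoahGeneration class, only an illustration of the API shape.
class RegionCounter {
  size_t _affiliated_region_count = 0;
  const size_t _max_regions;  // assumed upper bound, e.g. total heap regions

public:
  explicit RegionCounter(size_t max_regions) : _max_regions(max_regions) {}

  // Increment the region count by delta and return its new value.
  size_t increase_affiliated_region_count(size_t delta = 1) {
    // Reasonableness check: the count may never exceed the assumed bound.
    assert(delta <= _max_regions - _affiliated_region_count);
    _affiliated_region_count += delta;
    return _affiliated_region_count;
  }

  // Decrement the region count by delta and return its new value.
  size_t decrease_affiliated_region_count(size_t delta = 1) {
    // Underflow check, now shared by the former 0-arg and 1-arg variants.
    assert(delta <= _affiliated_region_count);
    _affiliated_region_count -= delta;
    return _affiliated_region_count;
  }

  size_t affiliated_region_count() const { return _affiliated_region_count; }
};
```

With the default argument, callers that previously used the 0-arg versions compile unchanged, and both directions get the same checking.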
src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 710: > 708: // > 709: private: > 710: // How many bytes to transfer between old and young after we have finished recycling collection set regions? General comment (no action needed at this time) about region counts. It appears to me that when talking about quantities that might take positive (`surplus`) or negative (`deficit`) values that are suitably bounded, it might make sense, where possible, to have a single variable of type `ssize_t` and make use of signs for that one variable. Is this such an opportunity? src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 747: > 745: void set_young_lab_region_flags(); > 746: > 747: inline void set_old_region_surplus(size_t surplus) { _old_regions_surplus = surplus; }; I'd avoid the unnecessary inconsistency between `set_old_region_surplus` and `_old_regions_surplus`. I prefer both to use the singular form of `region`. src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 50: > 48: * of the generation that is consuming the most CPU time. The assumption being > 49: * that increasing memory will reduce the collection frequency and raise the > 50: * MMU. I believe these have already been split out into a separate PR. Let's drop these from this PR by producing a dependent PR to avoid having to look at these diffs again. src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 102: > 100: product(uintx, ShenandoahOldGarbageThreshold, 15, EXPERIMENTAL, \ > 101: "How much garbage an old region has to contain before it would " \ > 102: "be taken for collection.") \ "An old region must contain at least this percentage of garbage before being eligible for evacuation." What happens when the user specifies a number like 0 or 100? Does it have the requisite effect? For 0, I would expect the region to be eligible for evacuation even when there is no garbage in it and, for 100, I would expect it to be collected only when it is fully garbage and not otherwise. 
Is this a reasonable expectation? Are there tests that check for this? src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 137: > 135: "(soft) max heap size.") \ > 136: range(0,100) \ > 137: \ Would it be correct to assume that this behaviour is no longer present or, if present, no longer directly controllable in this manner? It'd be great to have a PR comment against each of these knob changes motivating the changes to their default values, or of their no longer being available. src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 206: > 204: "milliseconds. Setting this to 0 disables the feature.") \ > 205: \ > 206: product(uintx, ShenandoahGuaranteedYoungGCInterval, 30*1000, EXPERIMENTAL, \ Why was this made so small? Are there cases where, say, for a quiescent/idle service instance with little or no load, we must still, for some reason, trigger young gc's twice each minute? src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 309: > 307: "evacuation conflicts, at expense of evacuating less on each " \ > 308: "GC cycle. Smaller values increase the risk of evacuation " \ > 309: "failures, which will trigger stop-the-world Full GC passes.") \ The terms `waste` and `conflicts` are not clear to me. Can we write this in a way that someone not familiar with the internals of the collector can understand without having to dig through the code to understand what `waste` and `evacuation conflict` are? Or include just enough for them to be able to use these knobs with some amount of instinct rather than knob-twiddling "because it was there". It looks to me like a "magic number estimate" that the user produces out of thin air that the collector then uses in its accounting, and then real life unfolds in such a manner that this determines risk vs reward or efficiency of space usage. In that sense this is more like a "magic padding factor". 
If so, I'd describe in a line or two what makes the actual space usage more than the size of the evacuated objects, why it's difficult to accurately assess ahead of time, and why a larger value potentially wastes space by allowing more headroom while avoiding running out of space while a smaller value wastes less space while running a greater risk of running out of space in tight situations where it attempted an evacuation. The talk of waste as related to "conflict" is a bit confusing to me at least without knowing what conflict is and why it must lead to wastage of space. This probably needs more discussion to see why we may not be able to learn something better based on measurement as the collections unfold, based on an initial rough estimate to begin with and then refining it with a suitable padding for uncertainty over time. src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 341: > 339: "The maximum proportion of evacuation from old-gen memory, as " \ > 340: "a percent ratio. The default value 75 denotes that no more " \ > 341: "than 75% of the collection set evacuation " \ Is the underlying quantity the number of bytes eligible for evacuation or the number of regions eligible for evacuation? Perhaps make that more clear in the description string. src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 367: > 365: range(0, 100) \ > 366: \ > 367: product(uintx, ShenandoahMaxYoungPercentage, 100, EXPERIMENTAL, \ If a user sets MinYoung == MaxYoung == 100, does one get single-generation Shenandoah? Or is that request rejected? What if both are set to 0? Are there sanity checks or tests that would fail today with such settings? test/hotspot/jtreg/ProblemList.txt line 90: > 88: gc/shenandoah/TestDynamicSoftMaxHeapSize.java#generational 8306333 generic-all > 89: gc/stress/gclocker/TestGCLockerWithShenandoah.java#generational 8306341 generic-all > 90: Nice!! 
test/hotspot/jtreg/ProblemList.txt line 95: > 93: # gc/shenandoah/oom/TestClassLoaderLeak.java 8306336 generic-all > 94: # gc/shenandoah/TestDynamicSoftMaxHeapSize.java#generational 8306333 generic-all > 95: # gc/stress/gclocker/TestGCLockerWithShenandoah.java#generational 8306341 generic-all I'd just delete these when ready to land, and close the tickets as duplicates of the ticket for this PR. ------------- Changes requested by ysr (Committer). PR Review: https://git.openjdk.org/shenandoah/pull/248#pullrequestreview-1425202534 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192832501 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192826572 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192835927 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192844592 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192892333 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192883701 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192884145 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192886058 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192887256 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192887595 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192887717 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192890047 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192890299 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192890578 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192890838 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192891449 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192891752 PR 
Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192893088 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192894139 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192894384 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192894759 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192899320 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192897901 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192897629 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192899482 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1192899582 From dholmes at openjdk.org Sat May 13 21:37:55 2023 From: dholmes at openjdk.org (David Holmes) Date: Sat, 13 May 2023 21:37:55 GMT Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends [v3] In-Reply-To: References: Message-ID: <_sFb2q6j7KzvC0EhPo48GXAe72ZDMG0BcEb5D-zK7X0=.e3da4eed-2e0a-4a3e-9389-b5cfc0ae57e6@github.com> On Fri, 12 May 2023 09:00:49 GMT, Kim Barrett wrote: >> Please review this renaming of Atomic::fetch_and_add and friends to be >> consistent with the naming convention recently chosen for atomic bitops. That >> is, make the following name changes for class Atomic and its implementation: >> >> - fetch_and_add => fetch_then_add >> - add_and_fetch => add_then_fetch >> >> Testing: >> mach5 tier1-3 >> GHA testing > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains six commits: > > - additional renamings post-genzgc > - Merge branch 'master' into atomic-arith-names > - revert accidental Red Hat copyright change > - rename in tests > - rename uses > - rename impl src/hotspot/share/gc/z/zRelocationSet.cpp line 47: > 45: const ZArray* _small; > 46: const ZArray* _medium; > 47: ZArrayParallelIterator _small_iter; Why was this removed? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13896#discussion_r1193039772 From dholmes at openjdk.org Sat May 13 21:37:55 2023 From: dholmes at openjdk.org (David Holmes) Date: Sat, 13 May 2023 21:37:55 GMT Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends [v3] In-Reply-To: <_sFb2q6j7KzvC0EhPo48GXAe72ZDMG0BcEb5D-zK7X0=.e3da4eed-2e0a-4a3e-9389-b5cfc0ae57e6@github.com> References: <_sFb2q6j7KzvC0EhPo48GXAe72ZDMG0BcEb5D-zK7X0=.e3da4eed-2e0a-4a3e-9389-b5cfc0ae57e6@github.com> Message-ID: On Sat, 13 May 2023 21:33:10 GMT, David Holmes wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - additional renamings post-genzgc >> - Merge branch 'master' into atomic-arith-names >> - revert accidental Red Hat copyright change >> - rename in tests >> - rename uses >> - rename impl > > src/hotspot/share/gc/z/zRelocationSet.cpp line 47: > >> 45: const ZArray* _small; >> 46: const ZArray* _medium; >> 47: ZArrayParallelIterator _small_iter; > > Why was this removed? Never mind this seems to be a quirk of the github UI. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13896#discussion_r1193040372 From kdnilsen at openjdk.org Sun May 14 13:57:21 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sun, 14 May 2023 13:57:21 GMT Subject: RFR: Expand old on demand [v45] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. 
In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > This PR now passes all GHA pre-integration tests and other internal CI tests. Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - Misc changes address reviewer feedback and observed performance issues - Minor improvements in comments ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/cfd30ef6..79fede27 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=44 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=43-44 Stats: 271 lines in 11 files changed: 164 ins; 30 del; 77 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From kdnilsen at openjdk.org Sun May 14 14:14:25 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sun, 14 May 2023 14:14:25 GMT Subject: RFR: Expand old on demand [v46] In-Reply-To: References: Message-ID: <9r8TvUE-lCz1ubUtSosAJ0gAtgfFYJTGXLZdzCxeiCI=.dbe5cef1-3821-4c74-8d15-3a3536a9df06@github.com> > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > This PR now passes all GHA pre-integration tests and other internal CI tests. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix whitespace ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/79fede27..d7b63412 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=45 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=44-45 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From wkemper at openjdk.org Mon May 15 17:14:48 2023 From: wkemper at openjdk.org (William Kemper) Date: Mon, 15 May 2023 17:14:48 GMT Subject: RFR: Use static assert to validate card offset encoding mask [v2] In-Reply-To: References: Message-ID: > Follow up to https://github.com/openjdk/shenandoah/pull/275 - replace runtime assert with static assert. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Document source for max card size ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/279/files - new: https://git.openjdk.org/shenandoah/pull/279/files/6c25a665..ff91daba Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=279&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=279&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/shenandoah/pull/279.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/279/head:pull/279 PR: https://git.openjdk.org/shenandoah/pull/279 From wkemper at openjdk.org Mon May 15 17:14:49 2023 From: wkemper at openjdk.org (William Kemper) Date: Mon, 15 May 2023 17:14:49 GMT Subject: Integrated: Use static assert to validate card offset encoding mask In-Reply-To: References: Message-ID: On Fri, 12 May 2023 16:12:16 GMT, William Kemper wrote: > Follow up to https://github.com/openjdk/shenandoah/pull/275 - replace runtime assert with static assert. This pull request has now been integrated. Changeset: e48977a7 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/e48977a7bca71d7bf1d337bb7811f2719ceb3341 Stats: 7 lines in 1 file changed: 6 ins; 1 del; 0 mod Use static assert to validate card offset encoding mask Reviewed-by: ysr ------------- PR: https://git.openjdk.org/shenandoah/pull/279 From kdnilsen at openjdk.org Mon May 15 20:35:48 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 15 May 2023 20:35:48 GMT Subject: RFR: Expand old on demand [v47] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. 
> > This PR now passes all GHA pre-integration tests and other internal CI tests. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Modify API for ShenandoahHeapRegion::set_affiliation Remove the deferred_accounting argument and change the behavior so that this method never adjusts affiliated region counts. ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/d7b63412..f277f977 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=46 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=45-46 Stats: 68 lines in 5 files changed: 11 ins; 25 del; 32 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From kdnilsen at openjdk.org Tue May 16 04:06:27 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 16 May 2023 04:06:27 GMT Subject: RFR: Expand old on demand [v48] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > This PR now passes all GHA pre-integration tests and other internal CI tests. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Some changes regarding affiliated region accounting Do this accounting explicitly rather than inside of ShenandoahHeapRegion::set_affiliation() ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/f277f977..a171c778 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=47 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=46-47 Stats: 51 lines in 3 files changed: 7 ins; 39 del; 5 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From coleenp at openjdk.org Tue May 16 12:58:10 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 16 May 2023 12:58:10 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v43] In-Reply-To: References: Message-ID: On Fri, 12 May 2023 17:05:33 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. 
This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The c... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 109 commits: > > - Merge remote-tracking branch 'origin/JDK-8305896' into JDK-8305896 > - Some more @shipilev comments > - Update src/hotspot/share/gc/shared/slidingForwarding.hpp > > Co-authored-by: Aleksey Shipilëv > - Align fake-heap without GCC warnings (duh) > - Merge branch 'master' into JDK-8305896 > - Fix gtest: Align fake-heaps, avoid re-forwardings > - @tschatzl's latest fix, cleanup and a test that checks unaligned heap problem > - Fix build > - Update src/hotspot/share/gc/shared/slidingForwarding.cpp > > Co-authored-by: Aleksey Shipilëv > - Update src/hotspot/share/gc/shared/slidingForwarding.cpp > > Co-authored-by: Aleksey Shipilëv > - ... and 99 more: https://git.openjdk.org/jdk/compare/6ebea897...f1ad3421 src/hotspot/share/utilities/fastHash.hpp line 30: > 28: #include "memory/allStatic.hpp" > 29: > 30: class FastHash : public AllStatic { Where did this hash come from? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1195122010 From rkennke at openjdk.org Tue May 16 14:18:16 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 16 May 2023 14:18:16 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v43] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 12:54:50 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 109 commits: >> >> - Merge remote-tracking branch 'origin/JDK-8305896' into JDK-8305896 >> - Some more @shipilev comments >> - Update src/hotspot/share/gc/shared/slidingForwarding.hpp >> >> Co-authored-by: Aleksey Shipilëv >> - Align fake-heap without GCC warnings (duh) >> - Merge branch 'master' into JDK-8305896 >> - Fix gtest: Align fake-heaps, avoid re-forwardings >> - @tschatzl's latest fix, cleanup and a test that checks unaligned heap problem >> - Fix build >> - Update src/hotspot/share/gc/shared/slidingForwarding.cpp >> >> Co-authored-by: Aleksey Shipilëv >> - Update src/hotspot/share/gc/shared/slidingForwarding.cpp >> >> Co-authored-by: Aleksey Shipilëv >> - ... and 99 more: https://git.openjdk.org/jdk/compare/6ebea897...f1ad3421 > > src/hotspot/share/utilities/fastHash.hpp line 30: > >> 28: #include "memory/allStatic.hpp" >> 29: >> 30: class FastHash : public AllStatic { > > Where did this hash come from? From here: https://github.com/openjdk/jdk/pull/13582#discussion_r1184778234 apparently by @rose00 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1195236715 From coleenp at openjdk.org Tue May 16 16:57:06 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 16 May 2023 16:57:06 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v32] In-Reply-To: References: Message-ID: On Sat, 6 May 2023 22:44:26 GMT, Albert Mingkun Yang wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix asserts > > src/hotspot/share/gc/shared/slidingForwarding.hpp line 159: > >> 157: * is sufficient because G1 serial compaction is single-threaded. >> 158: */ >> 159: class FallbackTable : public CHeapObj{ > > Could this class be placed inside `SlidingForwarding` for better encapsulation? Why do you write your own hashtable when there's one in utilities/resourceHash.hpp?
There's a put_when_absent() function that is similar to this insert function and faster than put_if_absent(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1195447013 From rkennke at openjdk.org Tue May 16 17:08:09 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 16 May 2023 17:08:09 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v32] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 16:54:00 GMT, Coleen Phillimore wrote: >> src/hotspot/share/gc/shared/slidingForwarding.hpp line 159: >> >>> 157: * is sufficient because G1 serial compaction is single-threaded. >>> 158: */ >>> 159: class FallbackTable : public CHeapObj{ >> >> Could this class be placed inside `SlidingForwarding` for better encapsulation? > > Why do you write your own hashtable when there's one in utilities/resourceHash.hpp? There's a put_when_absent() function that is similar to this insert function and faster than put_if_absent(). Uh, I wasn't aware of it, and somehow I didn't find it when I searched for something like it. resourceHash.hpp perhaps wasn't a name that struck me as a generic hashtable impl. :-) Let me check if it's feasible to just use the one in resourceHash.hpp. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13582#discussion_r1195460418 From kdnilsen at openjdk.org Tue May 16 17:28:52 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 16 May 2023 17:28:52 GMT Subject: RFR: Expand old on demand [v49] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > This PR now passes all GHA pre-integration tests and other internal CI tests.
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Remove unused variable from function This causes a compiler error on Windows builds. ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/a171c778..517136a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=48 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=47-48 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From kdnilsen at openjdk.org Tue May 16 18:03:42 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 16 May 2023 18:03:42 GMT Subject: RFR: Expand old on demand [v50] In-Reply-To: References: Message-ID: <1AATYeLbZorbsULhcJlkDt4qPCcr4GjXzbZmWtt_BNc=.33f6d658-ab86-4645-86e0-0b02e30538d6@github.com> > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > This PR now passes all GHA pre-integration tests and other internal CI tests. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix whitespace ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/517136a2..17ff261a Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=49 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=48-49 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From kdnilsen at openjdk.org Tue May 16 22:43:32 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 16 May 2023 22:43:32 GMT Subject: RFR: Expand old on demand [v51] In-Reply-To: References: Message-ID: <5yR6kYtWYGF7wbbYjXc2Glu37HzpYIZwMkzEm6Rcm_0=.02479ba5-89c8-48a8-9a1f-40452d5452cc@github.com> > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > This PR now passes all GHA pre-integration tests and other internal CI tests. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Use max_capacity() rather than soft_max_capacity() for budgeting ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/17ff261a..2b517dcb Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=50 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=49-50 Stats: 27 lines in 8 files changed: 9 ins; 0 del; 18 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From rkennke at openjdk.org Wed May 17 16:12:10 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 17 May 2023 16:12:10 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v44] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. 
This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new ... 
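The bit layout quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the pull request: the names (`sliding_sketch::Table`, `encode`, `decode`, `TargetBases`), the 8-byte heap word, and the use of 0 as an "unset" marker for a target base are all invented for the example, and the fallback path (bit 2) is only flagged here, not implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the 32-bit forwarding encoding described in the text.
namespace sliding_sketch {

constexpr uint32_t kForwardedMask = 0x3;      // bits 0..1: '11' marks a forwarded object
constexpr uint32_t kFallbackBit   = 1u << 2;  // bit 2: forwardee lives in the fallback table
constexpr uint32_t kAltRegionBit  = 1u << 3;  // bit 3: selects target region 0 or 1
constexpr unsigned kOffsetShift   = 4;        // bits 4..31: offset in heap words
constexpr uint32_t kMaxOffset     = (1u << 28) - 1;
constexpr size_t   kHeapWordSize  = 8;        // assumption: 64-bit heap words

// Per source region: start addresses of the (at most two) target regions.
// 0 means "not yet assigned" in this sketch, so the heap must not start at 0.
struct TargetBases { uintptr_t base[2]; };

class Table {
public:
  Table(uintptr_t heap_start, size_t region_size_bytes, size_t num_regions)
    : _heap_start(heap_start), _region_size(region_size_bytes),
      _targets(num_regions, TargetBases{{0, 0}}) {
    assert(heap_start != 0);
  }

  // Returns the low 32 header bits that encode "object at 'from' moves to 'to'".
  uint32_t encode(uintptr_t from, uintptr_t to) {
    TargetBases& t = _targets[region_index(from)];
    uintptr_t to_base = region_base(to);
    uint32_t alt;
    if (t.base[0] == 0 || t.base[0] == to_base) {
      t.base[0] = to_base; alt = 0;
    } else if (t.base[1] == 0 || t.base[1] == to_base) {
      t.base[1] = to_base; alt = 1;
    } else {
      // A third target region breaks the sliding assumption: signal fallback.
      return kForwardedMask | kFallbackBit;
    }
    uintptr_t offset_words = (to - to_base) / kHeapWordSize;
    assert(offset_words <= kMaxOffset);
    return kForwardedMask | (alt ? kAltRegionBit : 0u)
         | (static_cast<uint32_t>(offset_words) << kOffsetShift);
  }

  // Recovers the forwardee address from the encoded bits (non-fallback only).
  uintptr_t decode(uintptr_t from, uint32_t header) const {
    assert((header & kForwardedMask) == kForwardedMask);
    assert((header & kFallbackBit) == 0);
    size_t alt = (header & kAltRegionBit) ? 1 : 0;
    uintptr_t offset_words = header >> kOffsetShift;
    return _targets[region_index(from)].base[alt] + offset_words * kHeapWordSize;
  }

private:
  size_t region_index(uintptr_t addr) const { return (addr - _heap_start) / _region_size; }
  uintptr_t region_base(uintptr_t addr) const {
    return _heap_start + region_index(addr) * _region_size;
  }

  uintptr_t _heap_start;
  size_t _region_size;
  std::vector<TargetBases> _targets;
};

} // namespace sliding_sketch
```

Because each source region has its own table entry, workers that process disjoint sets of regions never touch the same entry, which is consistent with the unsynchronized access argument in the description.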
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Replace homegrown FallbackTable with a ResourceHashtable based impl ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/f1ad3421..3f7cc376 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=43 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=42-43 Stats: 79 lines in 2 files changed: 2 ins; 59 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From wkemper at openjdk.org Wed May 17 18:04:41 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 17 May 2023 18:04:41 GMT Subject: RFR: Use soft max capacity only for trigger calculations Message-ID: <2rljPAJExG7RWJYcKiLdCr2fvwz0hsZHdFzrogePyx0=.a48e735e-d33d-434c-be40-cfa7a342f46c@github.com> Use the soft max heap capacity only for the trigger's availability calculations. Our current generation sizing calculations cannot tolerate arbitrary changes to capacity. This use of soft max heap is consistent with original non-generational behavior. Here, the soft max heap capacity is applied directly as the heuristics view of the capacity of the young generation. 
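To make the distinction concrete, here is a small sketch of how a soft limit can feed only the trigger's view of available memory while sizing keeps using the hard maximum. The struct and its fields are hypothetical stand-ins chosen for illustration, not the actual `ShenandoahGeneration` members, and the exact formula used in the patch may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a generation's capacity accounting.
struct GenerationSketch {
  size_t max_capacity;       // hard limit, used for generation sizing
  size_t soft_max_capacity;  // soft limit, consulted only by the GC trigger
  size_t used;

  // Memory still available against the hard limit.
  size_t available() const {
    return used >= max_capacity ? 0 : max_capacity - used;
  }

  // Memory still available against the soft limit; this is the value a
  // trigger heuristic would look at in this sketch.
  size_t soft_available() const {
    size_t soft = std::min(soft_max_capacity, max_capacity);
    return used >= soft ? 0 : soft - used;
  }
};
```

With this split, changing the soft limit at runtime shifts when a collection is triggered but leaves the capacity figures used for sizing untouched.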
------------- Commit messages: - Remove now fixed test from problem list - Use soft max capacity only for trigger calculations Changes: https://git.openjdk.org/shenandoah/pull/280/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=280&range=00 Stats: 34 lines in 5 files changed: 8 ins; 6 del; 20 mod Patch: https://git.openjdk.org/shenandoah/pull/280.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/280/head:pull/280 PR: https://git.openjdk.org/shenandoah/pull/280 From kdnilsen at openjdk.org Wed May 17 18:04:43 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 17 May 2023 18:04:43 GMT Subject: RFR: Use soft max capacity only for trigger calculations In-Reply-To: <2rljPAJExG7RWJYcKiLdCr2fvwz0hsZHdFzrogePyx0=.a48e735e-d33d-434c-be40-cfa7a342f46c@github.com> References: <2rljPAJExG7RWJYcKiLdCr2fvwz0hsZHdFzrogePyx0=.a48e735e-d33d-434c-be40-cfa7a342f46c@github.com> Message-ID: On Wed, 17 May 2023 17:49:56 GMT, William Kemper wrote: > Use the soft max heap capacity only for the trigger's availability calculations. Our current generation sizing calculations cannot tolerate arbitrary changes to capacity. This use of soft max heap is consistent with original non-generational behavior. Here, the soft max heap capacity is applied directly as the heuristics view of the capacity of the young generation. Thanks for cleaning this up ------------- Marked as reviewed by kdnilsen (Committer). PR Review: https://git.openjdk.org/shenandoah/pull/280#pullrequestreview-1431271049 From ysr at openjdk.org Wed May 17 20:59:34 2023 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Wed, 17 May 2023 20:59:34 GMT Subject: RFR: Use soft max capacity only for trigger calculations In-Reply-To: <2rljPAJExG7RWJYcKiLdCr2fvwz0hsZHdFzrogePyx0=.a48e735e-d33d-434c-be40-cfa7a342f46c@github.com> References: <2rljPAJExG7RWJYcKiLdCr2fvwz0hsZHdFzrogePyx0=.a48e735e-d33d-434c-be40-cfa7a342f46c@github.com> Message-ID: On Wed, 17 May 2023 17:49:56 GMT, William Kemper wrote: > Use the soft max heap capacity only for the trigger's availability calculations. Our current generation sizing calculations cannot tolerate arbitrary changes to capacity. This use of soft max heap is consistent with original non-generational behavior. Here, the soft max heap capacity is applied directly as the heuristics view of the capacity of the young generation. LGTM! Some minor comments. src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 900: > 898: _affiliated_region_count(0), _humongous_waste(0), _used(0), _bytes_allocated_since_gc_start(0), > 899: _max_capacity(max_capacity), _soft_max_capacity(soft_max_capacity), > 900: _adjusted_capacity(max_capacity), _heuristics(nullptr) { What is `_adjusted_capacity` used for and where? (Given that I see it start out here as being no different from `_max_capacity`, but I'll read on and see if I can find the answer.) src/hotspot/share/gc/shenandoah/shenandoahGeneration.hpp line 114: > 112: virtual size_t available() const; > 113: > 114: size_t soft_available() const; Please write a single line of documentation for the new field. src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 513: > 511: // for old would be total heap - minimum capacity of young. This means the sum of the maximum > 512: // allowed for old and young could exceed the total heap size. It remains the case that the > 513: // _actual_ capacity of young + old = total. Not your change, but it seems like this comment might belong in the _declaration_ of `max_capacity()` (or of `_max_capacity`), rather than here? 
------------- Marked as reviewed by ysr (Committer). PR Review: https://git.openjdk.org/shenandoah/pull/280#pullrequestreview-1431582721 PR Review Comment: https://git.openjdk.org/shenandoah/pull/280#discussion_r1197029505 PR Review Comment: https://git.openjdk.org/shenandoah/pull/280#discussion_r1197032341 PR Review Comment: https://git.openjdk.org/shenandoah/pull/280#discussion_r1197038174 From ysr at openjdk.org Wed May 17 20:59:35 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Wed, 17 May 2023 20:59:35 GMT Subject: RFR: Use soft max capacity only for trigger calculations In-Reply-To: References: <2rljPAJExG7RWJYcKiLdCr2fvwz0hsZHdFzrogePyx0=.a48e735e-d33d-434c-be40-cfa7a342f46c@github.com> Message-ID: On Wed, 17 May 2023 20:39:56 GMT, Y. Srinivas Ramakrishna wrote: >> Use the soft max heap capacity only for the trigger's availability calculations. Our current generation sizing calculations cannot tolerate arbitrary changes to capacity. This use of soft max heap is consistent with original non-generational behavior. Here, the soft max heap capacity is applied directly as the heuristics view of the capacity of the young generation. > > src/hotspot/share/gc/shenandoah/shenandoahGeneration.cpp line 900: > >> 898: _affiliated_region_count(0), _humongous_waste(0), _used(0), _bytes_allocated_since_gc_start(0), >> 899: _max_capacity(max_capacity), _soft_max_capacity(soft_max_capacity), >> 900: _adjusted_capacity(max_capacity), _heuristics(nullptr) { > > What is `_adjusted_capacity` used for and where? (Given that I see it start out here as being no different from `_max_capacity`, but I'll read on and see if I can find the answer.) Never mind, I see now the code that does adjustments `adjust_available()` & `unadjust_available()`, and IIRC this will be gone (or at least done differently) in Kelvin's upcoming `Expand Old On Demand` work. 
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/280#discussion_r1197030734 From rkennke at openjdk.org Wed May 17 21:25:29 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 17 May 2023 21:25:29 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v45] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. 
Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new ... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 111 commits: - Merge branch 'master' into JDK-8305896 - Replace homegrown FallbackTable with a ResourceHashtable based impl - Merge remote-tracking branch 'origin/JDK-8305896' into JDK-8305896 - Some more @shipilev comments - Update src/hotspot/share/gc/shared/slidingForwarding.hpp Co-authored-by: Aleksey Shipilëv - Align fake-heap without GCC warnings (duh) - Merge branch 'master' into JDK-8305896 - Fix gtest: Align fake-heaps, avoid re-forwardings - @tschatzl's latest fix, cleanup and a test that checks unaligned heap problem - Fix build - ...
and 101 more: https://git.openjdk.org/jdk/compare/902585be...bff747fa ------------- Changes: https://git.openjdk.org/jdk/pull/13582/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=44 Stats: 855 lines in 24 files changed: 818 ins; 0 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From wkemper at openjdk.org Thu May 18 17:18:00 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 18 May 2023 17:18:00 GMT Subject: RFR: Use soft max capacity only for trigger calculations [v2] In-Reply-To: References: <2rljPAJExG7RWJYcKiLdCr2fvwz0hsZHdFzrogePyx0=.a48e735e-d33d-434c-be40-cfa7a342f46c@github.com> Message-ID: On Wed, 17 May 2023 20:48:30 GMT, Y. Srinivas Ramakrishna wrote: >> William Kemper has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add comment for new method >> - Re-problem list test with intermittent failures > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 513: > >> 511: // for old would be total heap - minimum capacity of young. This means the sum of the maximum >> 512: // allowed for old and young could exceed the total heap size. It remains the case that the >> 513: // _actual_ capacity of young + old = total. > > Not your change, but it seems like this comment might belong in the _declaration_ of `max_capacity()` (or of `_max_capacity`), rather than here? I believe this code will also be replaced soon. 
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/280#discussion_r1198073198 From wkemper at openjdk.org Thu May 18 17:17:57 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 18 May 2023 17:17:57 GMT Subject: RFR: Use soft max capacity only for trigger calculations [v2] In-Reply-To: <2rljPAJExG7RWJYcKiLdCr2fvwz0hsZHdFzrogePyx0=.a48e735e-d33d-434c-be40-cfa7a342f46c@github.com> References: <2rljPAJExG7RWJYcKiLdCr2fvwz0hsZHdFzrogePyx0=.a48e735e-d33d-434c-be40-cfa7a342f46c@github.com> Message-ID: > Use the soft max heap capacity only for the trigger's availability calculations. Our current generation sizing calculations cannot tolerate arbitrary changes to capacity. This use of soft max heap is consistent with original non-generational behavior. Here, the soft max heap capacity is applied directly as the heuristics view of the capacity of the young generation. William Kemper has updated the pull request incrementally with two additional commits since the last revision: - Add comment for new method - Re-problem list test with intermittent failures ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/280/files - new: https://git.openjdk.org/shenandoah/pull/280/files/6eb01c0f..5b406484 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=280&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=280&range=00-01 Stats: 5 lines in 2 files changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/280.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/280/head:pull/280 PR: https://git.openjdk.org/shenandoah/pull/280 From wkemper at openjdk.org Thu May 18 20:46:44 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 18 May 2023 20:46:44 GMT Subject: Integrated: Use soft max capacity only for trigger calculations In-Reply-To: <2rljPAJExG7RWJYcKiLdCr2fvwz0hsZHdFzrogePyx0=.a48e735e-d33d-434c-be40-cfa7a342f46c@github.com> References: 
<2rljPAJExG7RWJYcKiLdCr2fvwz0hsZHdFzrogePyx0=.a48e735e-d33d-434c-be40-cfa7a342f46c@github.com> Message-ID: On Wed, 17 May 2023 17:49:56 GMT, William Kemper wrote: > Use the soft max heap capacity only for the trigger's availability calculations. Our current generation sizing calculations cannot tolerate arbitrary changes to capacity. This use of soft max heap is consistent with original non-generational behavior. Here, the soft max heap capacity is applied directly as the heuristics view of the capacity of the young generation. This pull request has now been integrated. Changeset: 238fb7a7 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/238fb7a7fe9d9c5dfdba807a84e223a4ed372cb2 Stats: 38 lines in 5 files changed: 12 ins; 5 del; 21 mod Use soft max capacity only for trigger calculations Reviewed-by: kdnilsen, ysr ------------- PR: https://git.openjdk.org/shenandoah/pull/280 From rkennke at openjdk.org Thu May 18 20:48:43 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 18 May 2023 20:48:43 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v46] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. 
For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new ... 
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Remove G1-only assert for fallback forwarding, and comment with explanation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/bff747fa..6bbb8e01 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=45 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=44-45 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From wkemper at openjdk.org Thu May 18 23:23:25 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 18 May 2023 23:23:25 GMT Subject: RFR: Improve mixed collection logging Message-ID: Not including any old regions in a mixed collection should be rare. It could indicate a bug. This change will log the relevant state when this happens. ------------- Commit messages: - Fix typo - Merge remote-tracking branch 'shenandoah/master' into improve-mixed-collection-logging - Log when and why no regions are selected for mixed collection Changes: https://git.openjdk.org/shenandoah/pull/281/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=281&range=00 Stats: 13 lines in 1 file changed: 12 ins; 1 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/281.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/281/head:pull/281 PR: https://git.openjdk.org/shenandoah/pull/281 From ysr at openjdk.org Thu May 18 23:48:28 2023 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Thu, 18 May 2023 23:48:28 GMT Subject: RFR: Improve mixed collection logging In-Reply-To: References: Message-ID: <5iZDb_nHdePgykYBbMl8rNMsNh-EcoH9Z2tDIPLcJDE=.b88e772e-e854-45b1-ac92-932cd17fcf05@github.com> On Thu, 18 May 2023 23:16:55 GMT, William Kemper wrote: > Not including any old regions in a mixed collection should be rare. It could indicate a bug. This change will log the relevant state when this happens. Marked as reviewed by ysr (Committer). ------------- PR Review: https://git.openjdk.org/shenandoah/pull/281#pullrequestreview-1433589341 From kdnilsen at openjdk.org Fri May 19 14:11:39 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 19 May 2023 14:11:39 GMT Subject: RFR: Improve mixed collection logging In-Reply-To: References: Message-ID: On Thu, 18 May 2023 23:16:55 GMT, William Kemper wrote: > Not including any old regions in a mixed collection should be rare. It could indicate a bug. This change will log the relevant state when this happens. There are some other changes to this code in expand-old-on-demand branch as I had observed some sub-optimal behavior during test and debug. If you're chasing an issue here, you might take a peek at that. src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.cpp line 130: > 128: _old_generation->transition_to(ShenandoahOldGeneration::WAITING_FOR_FILL); > 129: } else { > 130: log_info(gc)("No regions selected for mixed collection. " When this fails, it may be useful to understand a bit more about why. For example: heap->get_old_evac_reserve() is the evacuation budget. Knowing how much is live, and how much garbage and "lost evacuation capacity", is in the rejected candidates (or in the first of the rejected candidates) might also be informative. ------------- Marked as reviewed by kdnilsen (Committer).
PR Review: https://git.openjdk.org/shenandoah/pull/281#pullrequestreview-1434497817 PR Review Comment: https://git.openjdk.org/shenandoah/pull/281#discussion_r1199005139 From rrich at openjdk.org Fri May 19 14:25:06 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 19 May 2023 14:25:06 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v30] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 14:19:43 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1.
Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separat... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add NONZERO check for downcall_stub_address_offset_in_bytes(). Hi Martin, I've made a pass over the Java part (except HFA). I found the specs hard to understand but most specs are like this. I'll finish the rest beginning of next week. Cheers, Richard. src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 28: > 26: package jdk.internal.foreign.abi.ppc64; > 27: > 28: import java.lang.foreign.AddressLayout; Imports are not grouped and ordered alphabetically. (Very much as the aarch64 version) src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 73: > 71: public static final int MAX_FLOAT_REGISTER_ARGUMENTS = 13; > 72: > 73: // This is derived from the 64-Bit ELF V2 ABI spec, restricted to what's The comment says ABI V2 but the code seems to handle V1 too. 
src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 158: > 156: class StorageCalculator { > 157: private final boolean forArguments; > 158: private boolean forVarArgs = false; Seems to be not used. src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 221: > 219: // !useABIv2() && layout.byteSize() > 8 && layout.byteSize() % 8 != 0 > 220: > 221: // Allocate individual fields as gp slots (regs and stack). You explained to me, it's not individual (struct) fields that are handled here. Looks like registers and 8 byte stack slots are allocated to completely cover the struct. Would be good if you could change the comment and names in the code to better reflect this. src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/linux/LinuxPPC64leLinker.java line 41: > 39: > 40: public static LinuxPPC64leLinker getInstance() { > 41: if (instance == null) { Other platforms optimized this to return a constant (probably after you forked off the port). ------------- PR Review: https://git.openjdk.org/jdk/pull/12708#pullrequestreview-1428763014 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1195280060 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1195344452 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1195363380 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1198915617 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1199011136 From kdnilsen at openjdk.org Fri May 19 14:26:12 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 19 May 2023 14:26:12 GMT Subject: RFR: Expand old on demand [v52] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. 
> > This PR now passes all GHA pre-integration tests and other internal CI tests. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Improve handoff of generation sizing impacts At the end of a completed GC cycle, we make adjustments in the generation sizes. These adjustments are based on the number of trash regions (the size of the just-evacuated collection set) and the number of regions needed within old-gen to support any mixed evacuation effort or promotion that is anticipated to happen during the next GC cycle. Recycling of trashed regions and transfer of region membership between generations happens after we rebuild the free set. This commit assures that the free set rebuild process knows how much memory will eventually be available in each generation after we finish reclaiming trashed regions and transferring regions between generations. ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/2b517dcb..4298ca7e Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=51 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=50-51 Stats: 191 lines in 7 files changed: 163 ins; 8 del; 20 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From wkemper at openjdk.org Fri May 19 15:24:59 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 19 May 2023 15:24:59 GMT Subject: RFR: Merge openjdk/jdk:master Message-ID: Merges tag jdk-21+23 ------------- Commit messages: - Merge tag 'jdk-21+23' into merge-jdk-21-23 - 8307976: (fs) Files.createDirectories(dir) returns dir::toAbsolutePath instead of dir - 8308239: Tighten up accessibility of nested classes in java.lang.invoke - 8308246: PPC64le build broken after JDK-8304913 - 8307326: Package
jdk.internal.classfile.java.lang.constant become obsolete - 8306304: Fix xlc17 clang warnings in ppc and aix code - 8308043: Deadlock in TestCSLocker.java due to blocking GC while allocating - 8308185: Update Http2TestServerConnection to use SSLSocket.startHandshake() - 8308088: Improve class check in CollectedHeap::is_oop - 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing - ... and 215 more: https://git.openjdk.org/shenandoah/compare/e48977a7...3253f2b1 The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.org/?repo=shenandoah&pr=282&range=00.0 - openjdk/jdk:master: https://webrevs.openjdk.org/?repo=shenandoah&pr=282&range=00.1 Changes: https://git.openjdk.org/shenandoah/pull/282/files Stats: 122055 lines in 2096 files changed: 95833 ins; 10605 del; 15617 mod Patch: https://git.openjdk.org/shenandoah/pull/282.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/282/head:pull/282 PR: https://git.openjdk.org/shenandoah/pull/282 From shade at openjdk.org Fri May 19 15:25:00 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 19 May 2023 15:25:00 GMT Subject: RFR: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: On Fri, 19 May 2023 15:16:19 GMT, William Kemper wrote: > Merges tag jdk-21+23 Looks fine, provided testing is clean. ------------- Marked as reviewed by shade (Committer). PR Review: https://git.openjdk.org/shenandoah/pull/282#pullrequestreview-1434627092 From wkemper at openjdk.org Fri May 19 16:08:27 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 19 May 2023 16:08:27 GMT Subject: RFR: Merge openjdk/jdk:master [v2] In-Reply-To: References: Message-ID: > Merges tag jdk-21+23 William Kemper has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 283 commits: - Merge tag 'jdk-21+23' into merge-jdk-21-23 Added tag jdk-21+23 for changeset 6d4782bc - Use static assert to validate card offset encoding mask Reviewed-by: ysr - Assert that region usage accounting is only accessed in generational mode Reviewed-by: kdnilsen - Increase card offset mask Reviewed-by: kdnilsen - Merge openjdk/jdk:master - Remove passing tests from ProblemList.txt Reviewed-by: ysr, shade - Alphabetize includes Reviewed-by: shade - Use distinct "end of cycle" message for each Shenandoah pause Reviewed-by: kdnilsen, ysr - Remove the remaining unclean upstream difference Reviewed-by: kdnilsen - Merge openjdk/jdk:master Reviewed-by: shade - ... and 273 more: https://git.openjdk.org/shenandoah/compare/6d4782bc...3253f2b1 ------------- Changes: https://git.openjdk.org/shenandoah/pull/282/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=282&range=01 Stats: 19758 lines in 211 files changed: 17921 ins; 893 del; 944 mod Patch: https://git.openjdk.org/shenandoah/pull/282.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/282/head:pull/282 PR: https://git.openjdk.org/shenandoah/pull/282 From wkemper at openjdk.org Fri May 19 16:08:27 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 19 May 2023 16:08:27 GMT Subject: RFR: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: On Fri, 19 May 2023 15:16:19 GMT, William Kemper wrote: > Merges tag jdk-21+23 There is a failure in one test on windows, running G1. I don't think it has to do with our branch. 
------------- PR Comment: https://git.openjdk.org/shenandoah/pull/282#issuecomment-1554792827 From wkemper at openjdk.org Fri May 19 16:08:31 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 19 May 2023 16:08:31 GMT Subject: Integrated: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: On Fri, 19 May 2023 15:16:19 GMT, William Kemper wrote: > Merges tag jdk-21+23 This pull request has now been integrated. Changeset: d7a706ef Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/d7a706efc554d13b17e276708f37e306f4911cd2 Stats: 122055 lines in 2096 files changed: 95833 ins; 10605 del; 15617 mod Merge openjdk/jdk:master Reviewed-by: shade ------------- PR: https://git.openjdk.org/shenandoah/pull/282 From wkemper at openjdk.org Fri May 19 16:09:11 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 19 May 2023 16:09:11 GMT Subject: RFR: Improve mixed collection logging [v2] In-Reply-To: References: Message-ID: > Not including any old regions in a mixed collection should be rare. It could indicate a bug. This change will log the relevant state when this happens. 
William Kemper has updated the pull request incrementally with one additional commit since the last revision: Include old evac reserve in log message ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/281/files - new: https://git.openjdk.org/shenandoah/pull/281/files/f794060a..0a1f172a Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=281&range=01 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=281&range=00-01 Stats: 6 lines in 1 file changed: 2 ins; 0 del; 4 mod Patch: https://git.openjdk.org/shenandoah/pull/281.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/281/head:pull/281 PR: https://git.openjdk.org/shenandoah/pull/281 From wkemper at openjdk.org Fri May 19 16:09:12 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 19 May 2023 16:09:12 GMT Subject: Integrated: Improve mixed collection logging In-Reply-To: References: Message-ID: On Thu, 18 May 2023 23:16:55 GMT, William Kemper wrote: > Not including any old regions in a mixed collection should be rare. It could indicate a bug. This change will log the relevant state when this happens. This pull request has now been integrated. Changeset: 77885b10 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/77885b1085e2bbd027643c77d2e56bada4b24369 Stats: 15 lines in 1 file changed: 14 ins; 1 del; 0 mod Improve mixed collection logging Reviewed-by: ysr, kdnilsen ------------- PR: https://git.openjdk.org/shenandoah/pull/281 From mdoerr at openjdk.org Fri May 19 18:54:19 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 19 May 2023 18:54:19 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". 
> > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `<stdarg.h>` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2.
It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: T... Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Cleanup imports, improve comments, updates from other platforms. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/edcdefba..b1f04382 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=29-30 Stats: 35 lines in 2 files changed: 12 ins; 11 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From mdoerr at openjdk.org Fri May 19 18:54:20 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 19 May 2023 18:54:20 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v30] In-Reply-To: References: Message-ID: <6xmxUDp_QPq_6Nigjth1OqTi3iHBchfPzhtvQs6DkVM=.939c69ae-6bdc-4323-bf9f-3308f7c714fb@github.com> On Tue, 16 May 2023 14:44:25 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Add NONZERO check for downcall_stub_address_offset_in_bytes(). > > src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 28: > >> 26: package jdk.internal.foreign.abi.ppc64; >> 27: >> 28: import java.lang.foreign.AddressLayout; > > Imports are not grouped and ordered alphabetically. > (Very much as the aarch64 version) Done. 
> src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 73: > >> 71: public static final int MAX_FLOAT_REGISTER_ARGUMENTS = 13; >> 72: >> 73: // This is derived from the 64-Bit ELF V2 ABI spec, restricted to what's > > The comment says ABI V2 but the code seems to handle V1 too. It's derived from ABI v2, but v1 is compatible. Added that to the comment. > src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 158: > >> 156: class StorageCalculator { >> 157: private final boolean forArguments; >> 158: private boolean forVarArgs = false; > > Seems to be not used. I had kept it in case another PPC64 OS would need it, but I guess it's unlikely. So, I just removed it. Could get added back easily if needed. > src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 221: > >> 219: // !useABIv2() && layout.byteSize() > 8 && layout.byteSize() % 8 != 0 >> 220: >> 221: // Allocate individual fields as gp slots (regs and stack). > > You explained to me, it's not individual (struct) fields that are handled here. Looks like registers and 8 byte stack slots are allocated to completely cover the struct. Would be good if you could change the comment and names in the code to better reflect this. Adapted to aarch64 implementation (using `MAX_COPY_SIZE`). > src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/linux/LinuxPPC64leLinker.java line 41: > >> 39: >> 40: public static LinuxPPC64leLinker getInstance() { >> 41: if (instance == null) { > > Other platforms optimized this to return a constant (probably after you forked off the port). Good catch. Adapted. 
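For readers unfamiliar with the last point, here is a minimal sketch of what "return a constant" can look like in place of a lazy `if (instance == null)` check. It assumes the initialization-on-demand holder idiom; the class name `DemoLinker` is hypothetical and this is not the code from the PR or from the other platform ports.

```java
// Hypothetical stand-in for a linker class; NOT the JDK's LinuxPPC64leLinker.
final class DemoLinker {
    // Holder idiom: the JVM initializes Holder (and therefore INSTANCE) on first
    // access, exactly once and thread-safely, so getInstance() needs no null check.
    private static final class Holder {
        private static final DemoLinker INSTANCE = new DemoLinker();
    }

    private DemoLinker() {}

    static DemoLinker getInstance() {
        return Holder.INSTANCE; // effectively a constant once Holder is initialized
    }
}
```

Because `INSTANCE` is a `static final` field, the JIT is free to treat it as a constant after class initialization, which is presumably the motivation for the change on the other platforms.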
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1199271594 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1199271941 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1199272912 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1199273422 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1199273918 From kdnilsen at openjdk.org Fri May 19 19:04:17 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 19 May 2023 19:04:17 GMT Subject: RFR: Expand old on demand [v53] In-Reply-To: References: Message-ID: <7BdSNiI6vSsVnMHSKFKccQvla9dWo0TKtxVf5TZc3so=.a562f59f-5cf8-482a-9df8-c8f8b8e9b052@github.com> > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > This PR now passes all GHA pre-integration tests and other internal CI tests. 
Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix typo in young_unaffiliated_regions calculation ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/4298ca7e..3f3f566d Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=52 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=51-52 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From kdnilsen at openjdk.org Sat May 20 23:16:58 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Sat, 20 May 2023 23:16:58 GMT Subject: RFR: Expand old on demand [v54] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > This PR now passes all GHA pre-integration tests and other internal CI tests. 
Kelvin Nilsen has updated the pull request incrementally with four additional commits since the last revision: - Remove extra debugging instrumentation - Only anticipate promotion for younger regions if is_aging_cycle() - Fix some syntax errors related to preceding commit - Better logging and fix fullgc to clear old deficit and surplus ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/3f3f566d..00284a59 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=53 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=52-53 Stats: 152 lines in 7 files changed: 59 ins; 90 del; 3 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From rrich at openjdk.org Mon May 22 08:26:01 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 22 May 2023 08:26:01 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v30] In-Reply-To: <6xmxUDp_QPq_6Nigjth1OqTi3iHBchfPzhtvQs6DkVM=.939c69ae-6bdc-4323-bf9f-3308f7c714fb@github.com> References: <6xmxUDp_QPq_6Nigjth1OqTi3iHBchfPzhtvQs6DkVM=.939c69ae-6bdc-4323-bf9f-3308f7c714fb@github.com> Message-ID: On Fri, 19 May 2023 18:47:49 GMT, Martin Doerr wrote: >> src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 158: >> >>> 156: class StorageCalculator { >>> 157: private final boolean forArguments; >>> 158: private boolean forVarArgs = false; >> >> Seems to be not used. > > I had kept it in case another PPC64 OS would need it, but I guess it's unlikely. So, I just removed it. Could get added back easily if needed. I see. There are other examples of redundant code that might serve a purpose in the future. I honestly don't like that. 
In the case of `forVarArgs` porters can still find it in the aarch64 version :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200139200 From rrich at openjdk.org Mon May 22 08:56:07 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 22 May 2023 08:56:07 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31] In-Reply-To: References: Message-ID: On Fri, 19 May 2023 18:54:19 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `<stdarg.h>` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1.
Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separat... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup imports, improve comments, updates from other platforms. src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 65: > 63: */ > 64: public abstract class CallArranger { > 65: protected abstract boolean useABIv2(); This could also be refactored into a static method with the same trick that is used in `LinuxPPC64leLinker::getInstance`. Callers could be static then and you could delete `CallArranger::ABIv2` and `ABIv2CallArranger`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200197510 From rrich at openjdk.org Mon May 22 12:18:08 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 22 May 2023 12:18:08 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 08:53:21 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Cleanup imports, improve comments, updates from other platforms. > > src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 65: > >> 63: */ >> 64: public abstract class CallArranger { >> 65: protected abstract boolean useABIv2(); > > This could also be refactored into a static method with the same trick that is used in `LinuxPPC64leLinker::getInstance`. Callers could be static then and you could delete `CallArranger::ABIv2` and `ABIv2CallArranger`. Maybe something like? protected static final boolean useABIv2 = CABI.current() == CABI.LINUX_PPC_64_LE; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200432599 From mdoerr at openjdk.org Mon May 22 12:42:08 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 12:42:08 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 12:14:48 GMT, Richard Reingruber wrote: >> src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 65: >> >>> 63: */ >>> 64: public abstract class CallArranger { >>> 65: protected abstract boolean useABIv2(); >> >> This could also be refactored into a static method with the same trick that is used in `LinuxPPC64leLinker::getInstance`. Callers could be static then and you could delete `CallArranger::ABIv2` and `ABIv2CallArranger`. 
> > Maybe something like? > > protected static final boolean useABIv2 = CABI.current() == CABI.LINUX_PPC_64_LE; That would be better to read, but would make the PPC64 CallArranger dependent on the current CABI. Note that there are tests which use import jdk.internal.foreign.abi.aarch64.CallArranger; ... CallArranger.LINUX.getBindings(mt, fd, false); for example. The tests are designed to run on all platforms. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200459227 From rrich at openjdk.org Mon May 22 13:32:02 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 22 May 2023 13:32:02 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 12:38:23 GMT, Martin Doerr wrote: > That would be better to read, but would make the PPC64 CallArranger dependent on the current CABI. Note that there are tests which use > > ``` > import jdk.internal.foreign.abi.aarch64.CallArranger; > ... > CallArranger.LINUX.getBindings(mt, fd, false); > ``` > > for example. The tests are designed to run on all platforms. I see, thanks. Would be nice to have some for PPC64 too :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200520623 From mdoerr at openjdk.org Mon May 22 13:38:04 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 13:38:04 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 13:29:14 GMT, Richard Reingruber wrote: >> That would be better to read, but would make the PPC64 CallArranger dependent on the current CABI. >> Note that there are tests which use >> >> import jdk.internal.foreign.abi.aarch64.CallArranger; >> ... >> CallArranger.LINUX.getBindings(mt, fd, false); >> >> for example. The tests are designed to run on all platforms. 
> >> That would be better to read, but would make the PPC64 CallArranger dependent on the current CABI. Note that there are tests which use >> >> ``` >> import jdk.internal.foreign.abi.aarch64.CallArranger; >> ... >> CallArranger.LINUX.getBindings(mt, fd, false); >> ``` >> >> for example. The tests are designed to run on all platforms. > > I see, thanks. Would be nice to have some for PPC64 too :) Probably, yes. I didn't find time for figuring out what would be useful tests. We could still add some in the future or with the big endian port. Another idea: Would the following be better? `final boolean useABIv2 = (this.getClass() == ABIv2CallArranger.class);` That would also allow getting rid of the method `useABIv2()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200527513 From mdoerr at openjdk.org Mon May 22 13:45:12 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 13:45:12 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 13:34:49 GMT, Martin Doerr wrote: >>> That would be better to read, but would make the PPC64 CallArranger dependent on the current CABI. Note that there are tests which use >>> >>> ``` >>> import jdk.internal.foreign.abi.aarch64.CallArranger; >>> ... >>> CallArranger.LINUX.getBindings(mt, fd, false); >>> ``` >>> >>> for example. The tests are designed to run on all platforms. >> >> I see, thanks. Would be nice to have some for PPC64 too :) > > Probably, yes. I didn't find time for figuring out what would be useful tests. We could still add some in the future or with the big endian port. > Another idea: Would the following be better? > `final boolean useABIv2 = (this.getClass() == ABIv2CallArranger.class);` > That would also allow getting rid of the method `useABIv2()`. 
Or better `final boolean useABIv2 = (this instanceof ABIv2CallArranger);` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200537482 From rrich at openjdk.org Mon May 22 14:02:04 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 22 May 2023 14:02:04 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 13:42:27 GMT, Martin Doerr wrote: >> Probably, yes. I didn't find time for figuring out what would be useful tests. We could still add some in the future or with the big endian port. >> Another idea: Would the following be better? >> `final boolean useABIv2 = (this.getClass() == ABIv2CallArranger.class);` >> That would also allow getting rid of the method `useABIv2()`. > > Or better `final boolean useABIv2 = (this instanceof ABIv2CallArranger);` Yes, good idea. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200558827 From mdoerr at openjdk.org Mon May 22 14:18:56 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 14:18:56 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v32] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `<stdarg.h>` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java).
Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: T... Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Replace abstract method useABIv2().
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/b1f04382..70736be6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=30-31 Stats: 13 lines in 2 files changed: 0 ins; 5 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From mdoerr at openjdk.org Mon May 22 14:19:02 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 14:19:02 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 13:59:05 GMT, Richard Reingruber wrote: >> Or better `final boolean useABIv2 = (this instanceof ABIv2CallArranger);` > > Yes, good idea. Please take a look at https://github.com/openjdk/jdk/pull/12708/commits/70736be631e4f1bf3fd3c0d45ddfc076b74ef9dd ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200571704 From rrich at openjdk.org Mon May 22 14:19:06 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 22 May 2023 14:19:06 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 14:09:22 GMT, Martin Doerr wrote: >> Yes, good idea. > > Please take a look at https://github.com/openjdk/jdk/pull/12708/commits/70736be631e4f1bf3fd3c0d45ddfc076b74ef9dd It looks good. 
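As a rough illustration of the `instanceof` idea agreed on in this thread, the sketch below replaces an abstract `useABIv2()` method with a final field computed from the runtime type. All class names here are hypothetical stand-ins (the thread itself only names `CallArranger` and `ABIv2CallArranger`), and `describe()` is a placeholder added for the example; the linked commit remains the authoritative version.

```java
// Hypothetical, heavily simplified stand-ins; the real CallArranger has far more state.
abstract class DemoCallArranger {
    // Evaluated once per instance. During superclass field initialization, `this`
    // already has the runtime class of the concrete subclass, so no abstract
    // useABIv2() method is needed, and the flag does not depend on the CABI of
    // the machine the code happens to run on.
    final boolean useABIv2 = (this instanceof DemoABIv2CallArranger);

    // Placeholder caller of the flag (an assumption made for this sketch).
    String describe() {
        return useABIv2 ? "ELF ABI v2" : "ELF ABI v1";
    }
}

final class DemoABIv1CallArranger extends DemoCallArranger {}

final class DemoABIv2CallArranger extends DemoCallArranger {}
```

Compared with `CABI.current() == CABI.LINUX_PPC_64_LE`, this keeps the arranger usable from tests on any host platform, which is the concern raised earlier in the thread.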
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200576857 From rkennke at openjdk.org Mon May 22 14:37:18 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 22 May 2023 14:37:18 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v47] In-Reply-To: References: Message-ID: > Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. > > I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. > > It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. > > We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. > > With this, forwarding information would be encoded like this: > - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. > - Bit 2: Used for 'fallback'-forwarding (see below) > - Bit 3: Selects the target region 0 or 1. 
Look up the base address in the table (see above) > - Bits 4..31 The number of heap words from the target base address > > This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. > > All the table accesses can be done unsynchronized because: > - Serial GC is single-threaded anyway > - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. > - G1 serial compaction is single-threaded > > The change introduces a new ... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Specialize full-GC loops to get UseAltGCForwarding flag check out of hot paths ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13582/files - new: https://git.openjdk.org/jdk/pull/13582/files/6bbb8e01..6bbd2952 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=46 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13582&range=45-46 Stats: 567 lines in 28 files changed: 354 ins; 101 del; 112 mod Patch: https://git.openjdk.org/jdk/pull/13582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13582/head:pull/13582 PR: https://git.openjdk.org/jdk/pull/13582 From rkennke at openjdk.org Mon May 22 14:41:09 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 22 May 2023 14:41:09 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v33] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 10:12:41 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request 
incrementally with two additional commits since the last revision: >> >> - Fix typos >> - Make FallbackTable an inner class of SlidingForwarding > > The https://github.com/openjdk/jdk/compare/master...tschatzl:jdk:alt-fullgc-forwarding?expand=1 branch now contains the promised cleanup. @tschatzl @fisk @coleenp @shipilev I've pushed a change that specializes all affected full-GC loops to get the flag-check out of the hot loops. This should get performance in the -UseAltGCForwarding paths back to normal, and show minimal effect (if any) in the +UseAltGCForwarding path. It is a little uglier though. If you prefer the clean and slightly slower version, let me know and I would back it out. Consider that once (after a few release cycles) the Lilliput stuff is stable and default, all that specialized stuff will disappear.. I guess the alternative would be to not put it under flag to begin with. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1557338343 From mdoerr at openjdk.org Mon May 22 16:06:02 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 16:06:02 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v33] In-Reply-To: References: Message-ID: <-4sIEOVwyrDPTp4DKnIJnoWau845QEXn3aTOkS9FLp8=.6c87bc30-c614-4b1c-8a72-51214bff444d@github.com> > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extend, but not complete. E.g. 
structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: T... 
Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Add comment about Register Save Area. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/70736be6..ac5c5dcc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=31-32 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From kdnilsen at openjdk.org Mon May 22 17:10:25 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 22 May 2023 17:10:25 GMT Subject: RFR: Expand old on demand [v55] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > This PR now passes all GHA pre-integration tests and other internal CI tests. Kelvin Nilsen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 142 commits: - Merge remote-tracking branch 'origin' into expand-old-on-demand - Remove extra debugging instrumentation - Only anticipate promotion for younger regions if is_aging_cycle() - Fix some syntax errors related to preceding commit - Better logging and fix fullgc to clear old deficit and surplus - Fix typo in young_unaffiliated_regions calculation - Improve handoff of generation sizing impacts At the end of a completed GC cycle, we make adjustments in the generation sizes. 
These adjustments are based on the number of trash regions (the size of the just-evacuated collection set) and the number of regions needed within old-gen to support any mixed evacuation effort or promotion that is anticipated to happen during the next GC cycle. Recycling of trashed regions and transfer of region membership between generations happen after we rebuild the free set. This commit ensures that the free set rebuild process knows how much memory will eventually be available in each generation after we finish reclaiming trashed regions and transferring regions between generations. - Use max_capacity() rather than soft_max_capacity() for budgeting - Fix whitespace - Remove unused variable from function This causes a compiler error on Windows builds. - ... and 132 more: https://git.openjdk.org/shenandoah/compare/d7a706ef...53c899cd ------------- Changes: https://git.openjdk.org/shenandoah/pull/248/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=54 Stats: 3386 lines in 35 files changed: 1946 ins; 934 del; 506 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From rkennke at openjdk.org Mon May 22 19:58:13 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 22 May 2023 19:58:13 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v47] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 14:37:18 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored.
Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region. >> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. 
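The 32-bit encoding described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the pull request: the names `TargetPair`, `encode_forwarding`, `decode_forwarding`, `is_forwarded` and `uses_fallback` are hypothetical, heap words are assumed to be 8 bytes, and the fallback-table lookup is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; this is NOT the HotSpot implementation.
constexpr uint32_t FORWARDED_TAG = 0x3;            // bits 0..1 == '11': object is forwarded
constexpr uint32_t FALLBACK_BIT  = 1u << 2;        // bit 2: forwardee lives in the fallback table
constexpr uint32_t REGION_BIT    = 1u << 3;        // bit 3: selects target region 0 or 1
constexpr int      OFFSET_SHIFT  = 4;              // bits 4..31: offset in heap words
constexpr uint32_t MAX_OFFSET    = (1u << 28) - 1; // largest offset that fits in 28 bits
constexpr size_t   HEAP_WORD_SIZE = 8;             // assumption: 64-bit heap words

// One table entry per source region: start addresses of its two possible target regions.
struct TargetPair {
  uintptr_t base[2];
};

// Pack the target-region selector and the word offset into the low 32 header bits.
inline uint32_t encode_forwarding(unsigned target_idx, uint32_t word_offset) {
  assert(target_idx < 2);
  assert(word_offset <= MAX_OFFSET);
  return (word_offset << OFFSET_SHIFT) | (target_idx << 3) | FORWARDED_TAG;
}

inline bool is_forwarded(uint32_t bits)  { return (bits & 0x3) == FORWARDED_TAG; }
inline bool uses_fallback(uint32_t bits) { return (bits & FALLBACK_BIT) != 0; }

// Recover the forwardee address from the encoded bits and the source region's table entry.
inline uintptr_t decode_forwarding(const TargetPair& entry, uint32_t bits) {
  assert(is_forwarded(bits) && !uses_fallback(bits));
  unsigned idx = (bits & REGION_BIT) ? 1 : 0;
  uint32_t word_offset = bits >> OFFSET_SHIFT;
  return entry.base[idx] + static_cast<uintptr_t>(word_offset) * HEAP_WORD_SIZE;
}
```

With 28 offset bits and 8-byte words, one target region can span up to 2 GB in this sketch; anything that breaks the two-target assumption would have to set bit 2 and go through the fallback table instead.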
>> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The c... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Specialize full-GC loops to get UseAltGCForwarding flag check out of hot paths Using the same Retain.java program that Aleksey posted earlier, I now get the following numbers: Baseline: 286.9ms -AltGCForwarding: 286.3ms (-0.2%) +AltGCForwarding: 309.1ms (+7.7%) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1557869772 From mdoerr at openjdk.org Mon May 22 21:36:12 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 21:36:12 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v34] In-Reply-To: References: Message-ID: <_rcz557uylyTKjbgSwU4vMDdy7ifSR8g0EUCjy9TmiI=.9bf264af-1f4b-4492-8db7-61b6308f5694@github.com> > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. 
(An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: T... Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 41 commits: - Adaptation for JDK-8308276. - Merge remote-tracking branch 'origin' into PPC64_Panama - Add comment about Register Save Area. - Replace abstract method useABIv2(). 
- Cleanup imports, improve comments, updates from other platforms. - Add NONZERO check for downcall_stub_address_offset_in_bytes(). - Replace NULL by nullptr. - libTestHFA: Add explicit type conversion to avoid build warning. - Add test case for passing a double value in a GP register. Use better instructions for moving between FP and GP reg. Improve comments. - Merge remote-tracking branch 'origin' into PPC64_Panama - ... and 31 more: https://git.openjdk.org/jdk/compare/939344b8...08a5c143 ------------- Changes: https://git.openjdk.org/jdk/pull/12708/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=33 Stats: 2479 lines in 27 files changed: 2432 ins; 0 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From rrich at openjdk.org Mon May 22 22:03:07 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 22 May 2023 22:03:07 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v33] In-Reply-To: <-4sIEOVwyrDPTp4DKnIJnoWau845QEXn3aTOkS9FLp8=.6c87bc30-c614-4b1c-8a72-51214bff444d@github.com> References: <-4sIEOVwyrDPTp4DKnIJnoWau845QEXn3aTOkS9FLp8=.6c87bc30-c614-4b1c-8a72-51214bff444d@github.com> Message-ID: On Mon, 22 May 2023 16:06:02 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extend, but not complete. E.g. 
structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separat... 
>
> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add comment about Register Save Area.

src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 161:

> 159:   // (native_abi_reg_args is native_abi_minframe plus space for 8 argument register spill slots)
> 160:   assert(_abi._shadow_space_bytes == frame::native_abi_minframe_size, "expected space according to ABI");
> 161:   // Note: For ABIv2, we only need (_input_registers.length() > 8) ? _input_registers.length() : 0

This is hard to understand. It should be explained that we allocate a PSA even though ABI V2 only requires it if not all parameters can be passed in registers.

src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 162:

> 160:   assert(_abi._shadow_space_bytes == frame::native_abi_minframe_size, "expected space according to ABI");
> 161:   // Note: For ABIv2, we only need (_input_registers.length() > 8) ? _input_registers.length() : 0
> 162:   int register_save_area_slots = MAX2(_input_registers.length(), 8);

Both specs, ABI V1 and V2, call this "Parameter Save Area"; we should use the same name.
Suggestion: int parameter_save_area_slots = MAX2(_input_registers.length(), 8); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1201132931 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1201128718 From rrich at openjdk.org Mon May 22 22:03:04 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 22 May 2023 22:03:04 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v34] In-Reply-To: <_rcz557uylyTKjbgSwU4vMDdy7ifSR8g0EUCjy9TmiI=.9bf264af-1f4b-4492-8db7-61b6308f5694@github.com> References: <_rcz557uylyTKjbgSwU4vMDdy7ifSR8g0EUCjy9TmiI=.9bf264af-1f4b-4492-8db7-61b6308f5694@github.com> Message-ID: On Mon, 22 May 2023 21:36:12 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. 
Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separat... > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 41 commits: > > - Adaptation for JDK-8308276. > - Merge remote-tracking branch 'origin' into PPC64_Panama > - Add comment about Register Save Area. > - Replace abstract method useABIv2(). > - Cleanup imports, improve comments, updates from other platforms. > - Add NONZERO check for downcall_stub_address_offset_in_bytes(). > - Replace NULL by nullptr. > - libTestHFA: Add explicit type conversion to avoid build warning. 
> - Add test case for passing a double value in a GP register. Use better instructions for moving between FP and GP reg. Improve comments.
> - Merge remote-tracking branch 'origin' into PPC64_Panama
> - ... and 31 more: https://git.openjdk.org/jdk/compare/939344b8...08a5c143

Hi Martin,

there seems to be a mismatch between this PR and the [64-bit ELF ABI V2 for PPC](https://openpowerfoundation.org/specifications/64bitelfabi/). In fact, all dynamically generated calls that have to conform to ABI V2 are affected. I'm giving a short summary of our discussion this afternoon.

Very briefly: ABI V2 states that a Parameter Save Area (PSA) shall be allocated _unless_ all parameters can be passed in registers as indicated by the caller's prototype, whereas the port always allocates a PSA of 8 double words. (Details under "Parameter Save Area" in "2.2.3.3. Optional Save Areas" of ELF ABI V2)

What we're doing is not wrong. It is as if we didn't know the prototype of the call targets. But for most calls [1] we are wasting stack space (and confusing everybody who tries to match the implementation with the spec).

Interestingly, ABI V1 states that a PSA of at least 8 double words is always needed. Looks like we've missed that change.

I have conducted a little experiment and compiled the following test program using Compiler Explorer [2]

    #include <stdint.h>

    int64_t test_callee(int64_t p1, int64_t p2, int64_t p3, int64_t p4);

    int64_t test_call(int64_t p1, int64_t p2, int64_t p3, int64_t p4) {
      return test_callee(p1, p2, p3, p4);
    }

This is the -O2 output for ELF ABI V2 (little endian)
Note: the stdu allocates just the minimal frame of 4 double words without PSA.

    test_call:                              # @test_call
    .Lfunc_gep0:
            addis 2, 12, .TOC.-.Lfunc_gep0@ha
            addi 2, 2, .TOC.-.Lfunc_gep0@l
            mflr 0
            stdu 1, -32(1)
            std 0, 48(1)
            bl test_callee
            nop
            addi 1, 1, 32
            ld 0, 16(1)
            mtlr 0
            blr

This is the -O2 output for ELF ABI V1 (big endian)

    test_call:                              # @test_call
            .quad .Lfunc_begin0
            .quad .TOC.@tocbase
            .quad 0
    .Lfunc_begin0:
            mflr 0
            stdu 1, -112(1)
            std 0, 128(1)
            bl test_callee
            nop
            addi 1, 1, 112
            ld 0, 16(1)
            mtlr 0
            blr

Note: the stdu allocates a much larger frame because it accommodates a PSA of 8 double words.

I'd suggest keeping the current well-tested version but adding comments to the code that a PSA is always allocated even though ABI V2 does not require it. This should also be explained in the JBS item. Furthermore, RFEs should be filed to adopt ABI V2 in the FFM API port and in the hotspot port to PPC64le. There's quite a bit of room for improvement there.

Ironically, I've very recently fixed [JDK-8306111](https://bugs.openjdk.org/browse/JDK-8306111) citing the ABI V2 spec without realizing that the fix is not needed for V2, just for V1.

[1] Exceptions that _do_ require a PSA are calls with a really long or variable parameter list or if a prototype is not available

[2] [Experiment with "Compiler Explorer"](https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYgATKVpMGoAPrnkpJfWQE8Ayo3QBhVLQCuLBhNJOAMngMmAByngBGmMQgABykAA6oCoT2DK4eXj4JSXYCAUGhLBFRsdaYtilCBEzEBGme3lxWmDY5DJXVBHkh4ZExVlU1dRmNCgOdgd2FvdEAlFao7sTI7BwApHoAzIHIHlgA1KsbzqP4ggB0CIfYqxoAgje3gQQAbJKmBHsEmKOmorT0mAgT1e7z2cUae2Bbw%2BcQMkMEIJhG1I8Je0LBkhmhwAQg8HlDQV8fn9aECEejwSiCTC4dSwcjUYiMTMDgB2XF3PZcvbETAERYMT7fAi/MQAiCUsFwuIMuKYnF41kAEQeHDmtE4AFZeN4OFpSKhOM49goFktMAdNnpeARNGq5gBrECSACcZw0Gi4Ls1zxdklZXGeekkyI1HEkOrtBs4vAUIA0pFterVpDgsBgiBQqBYcTokXIlDQObzUWQcTiyFeNiMJlIWAAbnhlgA1PCYADuAHk4oxODw%2BHQvsQ4xAwlGwoFqgBPPu8CfMYhTzthbRlJP9otsQSdhi0GfJuuYFjGYDiA/4XnlevfKOYVRldxfWfkQTNKO0PBhYjT1xYKMEYg8BYWc5ioIxgAUVsO27XtuF4fhBBEMR2CkGRBEUFR1APXRGhrMwLEMT840gOZUDiVo4w4ABaTtrQNa9iEArBiIgOZSnKBwICcIYGl8Bh0C6AoikyRJkgEHiROyFJBJ6KIRmaNcKjGCT5JaJSOhkqY5P6DoVJ0mpNOErg2LNZZ9HVLVIwPQ0ODBVB20iKsLR2E89ggAD3AYB0WQgXBCBIS0Nj0GYbTtGY5gQTAmCwKJW
NIJ0NlZM5NUMTgI1IYDNQTXV9Rs2N40TMLUwzCAkCLXN6DICgIHKksUHLStJBc2sGybTAoK7HtdX7GhaCHEcxwPedp2fYbF2XVdbGfTdGAIHc9yjLBjxMM99QvRTr0o/U7wfJ84JfL4w31D8vx/DAVn1ACgJAvhwMgttOtgnrZCQ8RUIQ%2BQlDUKNdAMPCQHMX5CLCFjSPIlJKJoujUAYpibxIpo1M47i3HqCRYn8CYhN6QN4lE1oVNiLIxIYQyceeRHFIEdpBlR4ZKY4toxjJ7TRl0uneLZgysdkiQTMWMzgtSjhtVIHLeBsuyHOICtXj2ZrgDcjyvJ8vyiGIQLgtC5NwtISLot6OKw3SzLsqjPKrAKpMtF1p0srOF0XWiDQQw0TU9GiRLNVZYWNis3KY0KnWLI4OjxejDhtZtuYGKSBxJCAA%3D%3D) src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 205: > 203: stack = stackAlloc(4, 4); > 204: } else { > 205: stack = stackAlloc(is32Bit ? 4 : 8, STACK_SLOT_SIZE); This looks like a stack slot is always allocated. Please explain that for ABI V2 this is actually only required if it is know from a prototype that not all parameters can be passed in registers and that we plan to change this. ------------- Changes requested by rrich (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/12708#pullrequestreview-1437623272 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1201198779 From mdoerr at openjdk.org Mon May 22 22:29:18 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 22:29:18 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extend, but not complete. E.g. 
structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: T... 
Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Parameter Save Area is the correct name. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/08a5c143..b912155b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=34 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=33-34 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From mdoerr at openjdk.org Mon May 22 22:29:18 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 22:29:18 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v33] In-Reply-To: References: <-4sIEOVwyrDPTp4DKnIJnoWau845QEXn3aTOkS9FLp8=.6c87bc30-c614-4b1c-8a72-51214bff444d@github.com> Message-ID: On Mon, 22 May 2023 21:22:50 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comment about Register Save Area. > > src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 162: > >> 160: assert(_abi._shadow_space_bytes == frame::native_abi_minframe_size, "expected space according to ABI"); >> 161: // Note: For ABIv2, we only need (_input_registers.length() > 8) ? _input_registers.length() : 0 >> 162: int register_save_area_slots = MAX2(_input_registers.length(), 8); > > Both specs, ABI V1 and V2, call this "Parameter Save Area" we should use the same name. > Suggestion: > > int parameter_save_area_slots = MAX2(_input_registers.length(), 8); Thanks! Changed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1201234986 From mdoerr at openjdk.org Mon May 22 22:39:06 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 22:39:06 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v34] In-Reply-To: References: <_rcz557uylyTKjbgSwU4vMDdy7ifSR8g0EUCjy9TmiI=.9bf264af-1f4b-4492-8db7-61b6308f5694@github.com> Message-ID: On Mon, 22 May 2023 21:49:27 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 41 commits: >> >> - Adaptation for JDK-8308276. >> - Merge remote-tracking branch 'origin' into PPC64_Panama >> - Add comment about Register Save Area. >> - Replace abstract method useABIv2(). >> - Cleanup imports, improve comments, updates from other platforms. >> - Add NONZERO check for downcall_stub_address_offset_in_bytes(). >> - Replace NULL by nullptr. >> - libTestHFA: Add explicit type conversion to avoid build warning. >> - Add test case for passing a double value in a GP register. Use better instructions for moving between FP and GP reg. Improve comments. >> - Merge remote-tracking branch 'origin' into PPC64_Panama >> - ... and 31 more: https://git.openjdk.org/jdk/compare/939344b8...08a5c143 > > src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 205: > >> 203: stack = stackAlloc(4, 4); >> 204: } else { >> 205: stack = stackAlloc(is32Bit ? 4 : 8, STACK_SLOT_SIZE); > > This looks like a stack slot is always allocated. Please explain that for ABI V2 this is actually only required if it is know from a prototype that not all parameters can be passed in registers and that we plan to change this. This basically computes the stack layout. We need to count all slots to get the right offset for the registers which actually get written on stack. 
The first such register will hit native_abi_minframe_size + 8 slots. If fewer registers are used, the counted stack slots will not be used. The decision whether we allocate the Parameter Save Area or not is done in the downcall stub and doesn't depend on the stackAllocs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1201251283 From mdoerr at openjdk.org Mon May 22 22:49:04 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 22:49:04 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 22:29:18 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). 
The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separat... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Parameter Save Area is the correct name. Thanks for publishing our discussion, here. The unnecessary PSA affects other areas of hotspot much more than Panama. Yes, we should file an RFE. I think one for hotspot is sufficient as the downcall stub is part of it. I don't think it needs extra treatment. 
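The quoted PR description remarks that storing 8 bytes and loading 4 bytes yields different values on Big Endian than on Little Endian. A minimal sketch of that effect follows; it simulates both byte orders in a plain byte buffer so it behaves the same on any host. All names here are hypothetical and unrelated to the HotSpot sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class ByteOrder { Little, Big };

// Store a 64-bit value into an 8-byte buffer using the given byte order.
void store8(std::uint8_t* buf, std::uint64_t value, ByteOrder order) {
    assert(buf != nullptr);
    for (std::size_t i = 0; i < 8; ++i) {
        std::size_t shift = (order == ByteOrder::Little) ? 8 * i : 8 * (7 - i);
        buf[i] = static_cast<std::uint8_t>(value >> shift);
    }
}

// Load a 32-bit value from the first 4 bytes of the buffer using the given byte order.
std::uint32_t load4(const std::uint8_t* buf, ByteOrder order) {
    assert(buf != nullptr);
    std::uint32_t result = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        std::size_t shift = (order == ByteOrder::Little) ? 8 * i : 8 * (3 - i);
        result |= static_cast<std::uint32_t>(buf[i]) << shift;
    }
    return result;
}

// Store 8 bytes, then load 4 bytes from the same address.
std::uint32_t store8_then_load4(std::uint64_t value, ByteOrder order) {
    std::uint8_t slot[8];
    store8(slot, value, order);
    return load4(slot, order);
}
```

With the value 1, the Little Endian variant reads back 1 (the low half sits at the lowest address), while the Big Endian variant reads back 0 (the high half sits there). This is why the precise size has to be known when moving values on Big Endian.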
------------- PR Comment: https://git.openjdk.org/jdk/pull/12708#issuecomment-1558139323 From rrich at openjdk.org Tue May 23 07:41:05 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 23 May 2023 07:41:05 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35] In-Reply-To: References: Message-ID: <_UnPU5zMoXYMnjctnqw9hvTHbJXltv5w0wEJjRVod54=.0d591e6b-5ae4-4c45-88f4-41227d98c745@github.com> On Mon, 22 May 2023 22:29:18 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. 
>> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separat... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Parameter Save Area is the correct name. src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 163: > 161: // The Parameter Save Area needs to be at least 8 slots for ABIv1. > 162: // ABIv2 allows omitting it when all parameters can get passed in registers. We currently don't optimize this. > 163: // For ABIv2, we only need (_input_registers.length() > 8) ? _input_registers.length() : 0 The PSA is also needed if the parameter list is variable in length. Is the expression `(_input_registers.length() > 8) ? _input_registers.length() : 0` correct in that case too? Otherwise: `ABIv2 allows omitting it if the callee's prototype indicates that stack parameters are not expected.
We currently don't optimize this.` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1201693249 From rrich at openjdk.org Tue May 23 07:49:05 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 23 May 2023 07:49:05 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35] In-Reply-To: <_UnPU5zMoXYMnjctnqw9hvTHbJXltv5w0wEJjRVod54=.0d591e6b-5ae4-4c45-88f4-41227d98c745@github.com> References: <_UnPU5zMoXYMnjctnqw9hvTHbJXltv5w0wEJjRVod54=.0d591e6b-5ae4-4c45-88f4-41227d98c745@github.com> Message-ID: On Tue, 23 May 2023 07:37:37 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Parameter Save Area is the correct name. > > src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 163: > >> 161: // The Parameter Save Area needs to be at least 8 slots for ABIv1. >> 162: // ABIv2 allows omitting it when all parameters can get passed in registers. We currently don't optimize this. >> 163: // For ABIv2, we only need (_input_registers.length() > 8) ? _input_registers.length() : 0 > > The PSA is also needed if the parameter list is variable in length. Is the expression `(_input_registers.length() > 8) ? _input_registers.length() : 0` correct in that case too? > Otherwise: `ABIv2 allows omitting it if the callee's prototype indicates that stack parameters are not expected. We currently don't optimize this.` Ok, I see now. This is not obvious though. There are a few layers of abstraction at play which hide this. A comment is needed. Maybe like this: ```c++ // With ABIv1 a Parameter Save Area of at least 8 double words is always needed. // ABIv2 allows omitting it if the callee's prototype indicates that stack parameters are not expected. // We currently don't optimize this (see DowncallStubGenerator in the backend). 
if (reg == null) return stack; ``` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1201706335 From rrich at openjdk.org Tue May 23 07:54:04 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 23 May 2023 07:54:04 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 22:44:41 GMT, Martin Doerr wrote: > Thanks for publishing our discussion, here. The unnecessary PSA affects other areas of hotspot much more than Panama. Yes, we should file an RFE. I think one for hotspot is sufficient as the downcall stub is part of it. I don't think it needs extra treatment. That's fine. It should have a little list of areas to be revisited. Ad hoc: - Runtime calls by the interpreter, C1, and C2 - Interpreted and compiled JNI calls - FFM API ("Panama") calls - Runtime calls by continuation intrinsics - Runtime calls by GC barriers Subtasks can be generated from that list. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12708#issuecomment-1558725262 From mdoerr at openjdk.org Tue May 23 15:21:02 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 23 May 2023 15:21:02 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v36] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but not complete. E.g.
structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: T... 
Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Improve comments about the Parameter Save Area. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/b912155b..5a00d804 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=35 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=34-35 Stats: 8 lines in 2 files changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From mdoerr at openjdk.org Tue May 23 15:21:11 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 23 May 2023 15:21:11 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35] In-Reply-To: References: <_UnPU5zMoXYMnjctnqw9hvTHbJXltv5w0wEJjRVod54=.0d591e6b-5ae4-4c45-88f4-41227d98c745@github.com> Message-ID: On Tue, 23 May 2023 07:46:08 GMT, Richard Reingruber wrote: >> src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 163: >> >>> 161: // The Parameter Save Area needs to be at least 8 slots for ABIv1. >>> 162: // ABIv2 allows omitting it when all parameters can get passed in registers. We currently don't optimize this. >>> 163: // For ABIv2, we only need (_input_registers.length() > 8) ? _input_registers.length() : 0 >> >> The PSA is also needed if the parameter list is variable in length. Is the expression `(_input_registers.length() > 8) ? _input_registers.length() : 0` correct in that case too? >> Otherwise: `ABIv2 allows omitting it if the callee's prototype indicates that stack parameters are not expected. We currently don't optimize this.` > > Ok, I see now. This is not obvious though. There are a few layers of abstraction at play which hide this. A comment is needed. 
Maybe like this: > ```c++ > // With ABIv1 a Parameter Save Area of at least 8 double words is always needed. > // ABIv2 allows omitting it if the callee's prototype indicates that stack parameters are not expected. > // We currently don't optimize this (see DowncallStubGenerator in the backend). > if (reg == null) return stack; > ``` I believe omitting the PSA is wrong for varargs, but we don't have this information in the backend. So, I think we should simply not optimize it. Reserving 64 Byte stack space should be affordable for a downcall even if it's not always needed. The Java side could compute it, but there's no way to pass this information to the backend. I've improved the comments. Please take a look. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1202235085 From rrich at openjdk.org Tue May 23 16:23:03 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 23 May 2023 16:23:03 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35] In-Reply-To: References: <_UnPU5zMoXYMnjctnqw9hvTHbJXltv5w0wEJjRVod54=.0d591e6b-5ae4-4c45-88f4-41227d98c745@github.com> Message-ID: On Tue, 23 May 2023 12:26:52 GMT, Martin Doerr wrote: > I believe omitting the PSA is wrong for varargs, but we don't have this information in the backend. It is clearly wrong. > So, I think we should simply not optimize it. Already agreed for this version. > Reserving 64 Byte stack space should be affordable for a downcall even if it's not always needed. It is hardly ever needed to begin with. It's just not pretty to allocate the PSA for little test functions. It'll confuse people who compare the downcall stub to what the C compiler produces. They might even think we haven't understood the ABI ;) > The Java side could compute it, but there's no way to pass this information to the backend. I'm not sure about that.
First idea would be to keep the allocated `stack` in `nextStorage` if needed and pass it to the backend. > I've improved the comments. Please take a look. Great! Thanks :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1202652115 From kdnilsen at openjdk.org Tue May 23 16:41:52 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 23 May 2023 16:41:52 GMT Subject: RFR: Expand old on demand [v56] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > This PR now passes all GHA pre-integration tests and other internal CI tests. Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - Fix merge error - Move Collector-free regions to Mutator-free at start of update refs There is no need to maintain a reserve of regions once evacuation has completed. This extends the mutator allocation runway and reduces the likelihood of degenerated GC. 
------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/53c899cd..9195007a Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=55 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=54-55 Stats: 83 lines in 3 files changed: 79 ins; 1 del; 3 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From kdnilsen at openjdk.org Tue May 23 16:49:16 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 23 May 2023 16:49:16 GMT Subject: RFR: Expand old on demand [v57] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > This PR now passes all GHA pre-integration tests and other internal CI tests. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Change ShenandoahFullGCThreshold default This experimental parameter specifies how many degenerated GCs to perform before upgrading the degenerated GC to a Full GC. Prior to this change, the default value was 3. This changes the default to 64. In general, upgrading degenerated GC to Full GC is to be discouraged because Full GC duplicates all of the work performed by concurrent GC before degeneration occurs and because concurrent GC triggered immediately following completion of FullGC is generally ineffective because there has been no time for objects allocated following completion of FullGC to die. In many workload scenarios, this results in a deadly embrace between repeated interleaving of interrupted concurrent GC followed by degenerated GC that upgrades to FullGC.
A series of degenerated GCs is much more effective because the degenerated GC preserves the progress made by concurrent GC rather than duplicating those efforts, and because the degenerated GC leaves floating garbage around which will be cleaned up by the next concurrent GC. On Specjbb, this change improved critical JOPS and max JOPS scores by over 20%. ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/9195007a..b1e5c267 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=56 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=55-56 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From rrich at openjdk.org Tue May 23 16:58:17 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 23 May 2023 16:58:17 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v36] In-Reply-To: References: Message-ID: On Tue, 23 May 2023 15:21:02 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). 
Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separat... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Improve comments about the Parameter Save Area. This is really good to go now. 
(maybe after a final round of tests) Thanks a lot! Richard. ------------- Marked as reviewed by rrich (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/12708#pullrequestreview-1440198377 From kdnilsen at openjdk.org Tue May 23 18:11:46 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 23 May 2023 18:11:46 GMT Subject: RFR: Changes to simplify MU accounting In-Reply-To: References: Message-ID: On Tue, 9 May 2023 01:56:25 GMT, Y. Srinivas Ramakrishna wrote: >> This represents a small part of the expand-old-on-demand branch. By itself, this may result in some performance regression because we no longer use MU (mutator utilization) reports to guide generation sizing decisions. On the other hand, the guidance provided by MU metrics does not seem to be a very accurate predictor of ideal generation sizes. > > src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 506: > >> 504: msg = (generation == YOUNG) ? "At end of Concurrent Young GC" : >> 505: "At end of Concurrent Bootstrap GC"; >> 506: // We only record GC results if GC was successful > > Can this section on MMU tracking be moved into `SCT::service_concurrent_cycle(ShenandoahHeap*, ShenandoahGeneration*, GCCause&, bool bootstrap)` at (just after) line 728 below? > > That would be consistent with doing the corresponding old gen work in `SCT::service_concurrent_old_cycle()` as you've done below. Thanks. I'm moving this code as suggested on the expand-old-on-demand branch. I'm going to close this PR without integration. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1202804935 From kdnilsen at openjdk.org Tue May 23 18:18:42 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 23 May 2023 18:18:42 GMT Subject: RFR: Changes to simplify MU accounting In-Reply-To: References: Message-ID: On Wed, 10 May 2023 00:13:41 GMT, Y. Srinivas Ramakrishna wrote: >> This represents a small part of the expand-old-on-demand branch.
By itself, this may result in some performance regression because we no longer use MU (mutator utilization) reports to guide generation sizing decisions. On the other hand, the guidance provided by MU metrics does not seem to be a very accurate predictor of ideal generation sizes. > > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 75: > >> 73: TruncatedSeq _mmu_average; >> 74: >> 75: static double process_time_seconds(); > > It seems like `process_time_seconds()` is no longer needed either. Thanks. removing from expand-old-on-demand branch. > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 87: > >> 85: void initialize(); >> 86: >> 87: // At completion of each GC cycle (not including interrupted cycles), we invoke one of the following to record the > > In the case of GC's that have been canceled, is the work quantum in service of the partial/canceled cycle just dropped on the floor? Should it intentionally be? Should it intentionally not be? The quantum of GC utilization work that is spent on concurrent GC up to the point of interruption is totaled into the subsequent report for degenerated or full gc when that completes. I'll make this more clear in the comment (on expand-old-on-demand branch). ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1202812896 PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1202810730 From kdnilsen at openjdk.org Tue May 23 18:21:36 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 23 May 2023 18:21:36 GMT Subject: RFR: Changes to simplify MU accounting In-Reply-To: References: Message-ID: On Thu, 11 May 2023 01:05:50 GMT, Y. Srinivas Ramakrishna wrote: >> This represents a small part of the expand-old-on-demand branch. By itself, this may result in some performance regression because we no longer use MU (mutator utilization) reports to guide generation sizing decisions. 
On the other hand, the guidance provided by MU metrics does not seem to be a very accurate predictor of ideal generation sizes. > > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 78: > >> 76: static void fetch_cpu_times(double &gc_time, double &mutator_time); >> 77: >> 78: void help_record_concurrent(ShenandoahGeneration* generation, uint gcid, const char* msg); > > Do you want to call this `record_concurrent_helper()` or `..._work()` in line with existing helper method convention? Thanks. Will change this name on expand-old-on-demand branch. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1202816311 From kdnilsen at openjdk.org Tue May 23 18:26:47 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 23 May 2023 18:26:47 GMT Subject: RFR: Changes to simplify MU accounting In-Reply-To: <1f93r31LVh0byfSQvtlPu8RENp-RxuAk5tizy9D7tNg=.30955864-34f8-49af-9ee0-88326526d9e2@github.com> References: <1f93r31LVh0byfSQvtlPu8RENp-RxuAk5tizy9D7tNg=.30955864-34f8-49af-9ee0-88326526d9e2@github.com> Message-ID: On Thu, 11 May 2023 21:39:16 GMT, William Kemper wrote: >> Or maybe I am misunderstanding what "repeat" means here. > > I think only a bootstrap cycle and its first old marking increment will have the same `gc_id`. I'm updating this comment on the expand-old-on-demand branch to try to make this clearer. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1202822263 From kdnilsen at openjdk.org Tue May 23 19:59:48 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 23 May 2023 19:59:48 GMT Subject: RFR: Changes to simplify MU accounting In-Reply-To: References: Message-ID: On Thu, 11 May 2023 01:12:03 GMT, Y. Srinivas Ramakrishna wrote: >> This represents a small part of the expand-old-on-demand branch. By itself, this may result in some performance regression because we no longer use MU (mutator utilization) reports to guide generation sizing decisions.
On the other hand, the guidance provided by MU metrics does not seem to be a very accurate predictor of ideal generation sizes. > > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 54: > >> 52: class ShenandoahMmuTracker { >> 53: private: >> 54: // For reporting utilization during most recent GC cycle > > May be elaborate a bit (please correct as needed): > > // These hold the values of the most recent snapshots of cumulative > // observations, and are used to compute deltas for specific > // recording/reporting epochs at each subsequent snapshot. Thanks. I've enhanced these comments in the expand-old-on-demand branch. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1202924900 From kdnilsen at openjdk.org Tue May 23 20:03:55 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 23 May 2023 20:03:55 GMT Subject: RFR: Changes to simplify MU accounting In-Reply-To: References: Message-ID: On Thu, 11 May 2023 01:15:04 GMT, Y. Srinivas Ramakrishna wrote: >> This represents a small part of the expand-old-on-demand branch. By itself, this may result in some performance regression because we no longer use MU (mutator utilization) reports to guide generation sizing decisions. On the other hand, the guidance provided by MU metrics does not seem to be a very accurate predictor of ideal generation sizes. > > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 296: > >> 294: } >> 295: >> 296: // With simplified MU implementation, we no longer use _mmu_tracker->average to trigger resizing > > Delete, rather than comment out. You can just leave a comment to motivate the deletion if you think that would help reviewers. this code has been removed from expand-old-on-demand. 
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1202929884 From kdnilsen at openjdk.org Tue May 23 20:07:39 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 23 May 2023 20:07:39 GMT Subject: RFR: Changes to simplify MU accounting In-Reply-To: References: Message-ID: <4Jz6CfqBDqe-UuufiwilT7Ir2E1EzYHZO_9M8G7A9Ok=.d327c8a1-bac1-4eea-a046-98e396025201@github.com> On Thu, 11 May 2023 21:50:11 GMT, William Kemper wrote: >> This represents a small part of the expand-old-on-demand branch. By itself, this may result in some performance regression because we no longer use MU (mutator utilization) reports to guide generation sizing decisions. On the other hand, the guidance provided by MU metrics does not seem to be a very accurate predictor of ideal generation sizes. > > src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 511: > >> 509: if (heap->collection_set()->has_old_regions()) { >> 510: bool mixed_is_done = (heap->old_heuristics()->unprocessed_old_collection_candidates() == 0); >> 511: mmu_tracker->record_mixed(the_generation, GCId::current(), mixed_is_done); > > Could use `ShenandoahControlThread::get_gc_id` or just access `_gc_id` directly. `GCId::current` has thread local access overhead. Thanks for this. Replaced in expand-old-on-demand. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1202935031 From kdnilsen at openjdk.org Tue May 23 20:10:28 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 23 May 2023 20:10:28 GMT Subject: RFR: Changes to simplify MU accounting In-Reply-To: References: Message-ID: <31oYdTJXrpFkc_fyQPJPLmqfIxEKnCDGmu3aGWZMWPE=.f5743890-9e6e-4534-9013-50688a6fb777@github.com> On Thu, 11 May 2023 21:58:17 GMT, William Kemper wrote: >> This represents a small part of the expand-old-on-demand branch. 
By itself, this may result in some performance regression because we no longer use MU (mutator utilization) reports to guide generation sizing decisions. On the other hand, the guidance provided by MU metrics does not seem to be a very accurate predictor of ideal generation sizes. > > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 96: > >> 94: } >> 95: >> 96: void ShenandoahMmuTracker::help_record_concurrent(ShenandoahGeneration* generation, uint gcid, const char *msg) { > > `help_record_concurrent` is a confusing name because this function is also used to record degenerated and full gc cycles. Maybe just `update_utilization`? Thanks. Done in expand-old-on-demand. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1202939234 From kdnilsen at openjdk.org Tue May 23 20:16:36 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 23 May 2023 20:16:36 GMT Subject: RFR: Changes to simplify MU accounting In-Reply-To: References: Message-ID: On Thu, 11 May 2023 21:53:53 GMT, William Kemper wrote: >> This represents a small part of the expand-old-on-demand branch. By itself, this may result in some performance regression because we no longer use MU (mutator utilization) reports to guide generation sizing decisions. On the other hand, the guidance provided by MU metrics does not seem to be a very accurate predictor of ideal generation sizes. > > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 106: > >> 104: _most_recent_timestamp = current; >> 105: } else { >> 106: double gc_cycle_duration = current - _most_recent_timestamp; > > Is this cycle duration? or time between cycles? time-between-cycle ends. i think that's not quite the same as time "between cycles". Of course, duration suggests that GC is working the entire time, which it is not. I think I'll go with "period", as updated in expand-old-on-demand branch. 
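A minimal sketch of the "period" bookkeeping discussed above, assuming the tracker only remembers the timestamp of the previous cycle end. `PeriodTracker`, `record_cycle_end` and `_has_timestamp` are hypothetical names for illustration; only `_most_recent_timestamp` and `current` come from the quoted code, and the first-call handling is an assumption about what the elided `if` condition does.

```cpp
#include <cassert>

// Hypothetical illustration, not the ShenandoahMmuTracker implementation.
// Measures the period between consecutive GC cycle ends. The period spans
// both GC work and GC idle time, which is why "duration" is a misleading name.
class PeriodTracker {
  double _most_recent_timestamp = 0.0;
  bool   _has_timestamp = false;   // assumption: first call only records a timestamp

public:
  // Returns the period (in seconds) since the previous cycle end,
  // or a negative value on the first call, when no period exists yet.
  double record_cycle_end(double current) {
    if (!_has_timestamp) {
      _most_recent_timestamp = current;
      _has_timestamp = true;
      return -1.0;
    }
    assert(current >= _most_recent_timestamp);
    double gc_cycle_period = current - _most_recent_timestamp;
    _most_recent_timestamp = current;
    return gc_cycle_period;
  }
};
```

Returning a negative value on the first call is just one way to signal "no period yet"; the real code may handle that case differently.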
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1202946096 From mdoerr at openjdk.org Tue May 23 20:22:10 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 23 May 2023 20:22:10 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v36] In-Reply-To: References: Message-ID: On Tue, 23 May 2023 15:21:02 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`.
This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separat... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Improve comments about the Parameter Save Area. Thanks a lot for all discussions, feedback and the reviews! This was really helpful! I'm planning to integrate tomorrow. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12708#issuecomment-1560069523 From kdnilsen at openjdk.org Tue May 23 20:30:31 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 23 May 2023 20:30:31 GMT Subject: RFR: Changes to simplify MU accounting In-Reply-To: References: Message-ID: <_Rh0gI0TchVN9ZvDDkBwyME2KzD6YCliFztXhbyZqpo=.4809c0f3-0c96-44fd-8a5f-0704fe448a81@github.com> On Thu, 11 May 2023 22:11:13 GMT, William Kemper wrote: >> This represents a small part of the expand-old-on-demand branch. By itself, this may result in some performance regression because we no longer use MU (mutator utilization) reports to guide generation sizing decisions. 
On the other hand, the guidance provided by MU metrics does not seem to be a very accurate predictor of ideal generation sizes. > > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 138: > >> 136: bool has_old_candidates) { >> 137: // No special processing for old marking >> 138: double duration = os::elapsedTime() - _most_recent_timestamp; > > We only seem to use `duration` here, none of the following lines produce any used values? my mistake. i've corrected the report to make use of all relevant information (in expand-old-on-demand). Thanks. > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 158: > >> 156: uint gcid, bool is_old_bootstrap, bool is_mixed_done) { >> 157: if ((gcid == _most_recent_gcid) && _most_recent_is_full) { >> 158: // Do nothing. This is a redundant recording for the full gc that just completed. > > We have a redundant call here because some degenerated cycles become full cycles? Could we arrange for this to only be called once from each code path? It's complicated. I'll put a todo statement here to suggest future refactoring. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1202962629 PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1202963369 From kdnilsen at openjdk.org Tue May 23 20:33:33 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 23 May 2023 20:33:33 GMT Subject: RFR: Changes to simplify MU accounting In-Reply-To: References: Message-ID: On Thu, 11 May 2023 22:15:25 GMT, William Kemper wrote: >> This represents a small part of the expand-old-on-demand branch. By itself, this may result in some performance regression because we no longer use MU (mutator utilization) reports to guide generation sizing decisions. On the other hand, the guidance provided by MU metrics does not seem to be a very accurate predictor of ideal generation sizes. 
> > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.cpp line 192: > >> 190: double mu = mutator_delta / (_active_processors * time_delta); >> 191: double gcu = gc_delta / (_active_processors * time_delta); >> 192: log_info(gc)("Periodic Sample: Average GCU = %.3f%%, Average MU = %.3f%%", gcu * 100, mu * 100); > > These aren't really averages any more right? Just the utilization over the last period? correct. i'll change the log message in expand-old-on-demand. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1202966802 From kdnilsen at openjdk.org Tue May 23 20:53:29 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 23 May 2023 20:53:29 GMT Subject: RFR: Changes to simplify MU accounting In-Reply-To: References: Message-ID: On Thu, 11 May 2023 22:04:06 GMT, William Kemper wrote: >> This represents a small part of the expand-old-on-demand branch. By itself, this may result in some performance regression because we no longer use MU (mutator utilization) reports to guide generation sizing decisions. On the other hand, the guidance provided by MU metrics does not seem to be a very accurate predictor of ideal generation sizes. > > src/hotspot/share/gc/shenandoah/shenandoahMmuTracker.hpp line 70: > >> 68: >> 69: bool _most_recent_is_full; >> 70: bool _doing_mixed_evacuations; > > This is written, but never read. Is this used somewhere on a different branch as part of the bigger picture? Good catch. It is no longer used. I'll remove it. (Its a remnant of early experiments with using GCU to trigger GC and wanting to treat GCU during mixed evac as different than GCU outside of mixed evac) I couldn't figure out how to get a good trigger from GCU so I'll remove this for now. 
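For reference, the per-period utilization computed by the `mu` and `gcu` lines quoted earlier in this message can be sketched as follows. This is a hedged illustration, not the tracker's code: `Utilization` and `utilization_over_period` are hypothetical names, and the units (CPU seconds for the deltas, wall-clock seconds for the period) are assumptions. As noted in the review, the results are utilizations over the last period, not averages.

```cpp
#include <cassert>

// Hypothetical illustration of utilization over a single sampling period.
struct Utilization {
  double mu;   // mutator utilization, as a fraction of available CPU time
  double gcu;  // GC utilization, as a fraction of available CPU time
};

// Assumptions: mutator_delta and gc_delta are CPU seconds consumed during the
// period; time_delta is the wall-clock length of the period in seconds.
inline Utilization utilization_over_period(double mutator_delta, double gc_delta,
                                           double time_delta, unsigned active_processors) {
  assert(time_delta > 0.0 && active_processors > 0);
  // Total CPU time that was available across all processors in this period.
  double available = active_processors * time_delta;
  return Utilization{ mutator_delta / available, gc_delta / available };
}
```

With 4 processors over a 2 second period, 6 CPU seconds of mutator work and 2 CPU seconds of GC work give `mu = 0.75` and `gcu = 0.25`.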
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/274#discussion_r1202993181 From kdnilsen at openjdk.org Tue May 23 21:29:59 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 23 May 2023 21:29:59 GMT Subject: RFR: Expand old on demand [v58] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > This PR now passes all GHA pre-integration tests and other internal CI tests. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Respond to reviewer feedback on PR 274 ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/b1e5c267..5d2c95ba Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=57 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=56-57 Stats: 107 lines in 4 files changed: 34 ins; 48 del; 25 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From kdnilsen at openjdk.org Tue May 23 21:32:34 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 23 May 2023 21:32:34 GMT Subject: Withdrawn: Changes to simplify MU accounting In-Reply-To: References: Message-ID: On Mon, 8 May 2023 23:52:30 GMT, Kelvin Nilsen wrote: > This represents a small part of the expand-old-on-demand branch. By itself, this may result in some performance regression because we no longer use MU (mutator utilization) reports to guide generation sizing decisions. On the other hand, the guidance provided by MU metrics does not seem to be a very accurate predictor of ideal generation sizes. 
This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/shenandoah/pull/274 From kdnilsen at openjdk.org Tue May 23 21:45:06 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 23 May 2023 21:45:06 GMT Subject: RFR: Expand old on demand [v59] In-Reply-To: References: Message-ID: <5MeFx3SagcoI9B1er1zA921Ga1yAsO43WoMn4B1j35w=.f3e5d230-6d34-4d3e-9ddf-5af293c58764@github.com> > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > This PR now passes all GHA pre-integration tests and other internal CI tests. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix assert to allow transfers to Mutator free set at update refs ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/5d2c95ba..20fe5e38 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=58 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=57-58 Stats: 10 lines in 1 file changed: 7 ins; 0 del; 3 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From kdnilsen at openjdk.org Tue May 23 23:25:42 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 23 May 2023 23:25:42 GMT Subject: RFR: Expand old on demand [v60] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. 
> > This PR now passes all GHA pre-integration tests and other internal CI tests. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix argument types to not loose precision ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/20fe5e38..349f87fb Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=59 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=58-59 Stats: 15 lines in 2 files changed: 0 ins; 0 del; 15 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From mdoerr at openjdk.org Wed May 24 08:42:23 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 24 May 2023 08:42:23 GMT Subject: Integrated: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) In-Reply-To: References: Message-ID: On Wed, 22 Feb 2023 05:31:46 GMT, Martin Doerr wrote: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4.
(An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: T... This pull request has now been integrated.
Changeset: 20f15352 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/20f15352a3014042aa69f7cbfb67de0f7fdddb40 Stats: 2485 lines in 27 files changed: 2438 ins; 0 del; 47 mod 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) Reviewed-by: jvernee, rrich ------------- PR: https://git.openjdk.org/jdk/pull/12708 From gli at openjdk.org Wed May 24 11:03:12 2023 From: gli at openjdk.org (Guoxiong Li) Date: Wed, 24 May 2023 11:03:12 GMT Subject: RFR: 8194823: Serial does not account GCs caused by TLAB allocation in GC overhead limit Message-ID: Hi all, This patch enables the gc overhead limit when allocating TLAB in serial gc. The main modification is at `GenCollectedHeap::allocate_new_tlab` and the other files only adjust the parameters of the method `allocate_new_tlab`. Thanks for the review. Best Regards, -- Guoxiong ------------- Commit messages: - JDK-8194823 Changes: https://git.openjdk.org/jdk/pull/14120/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14120&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8194823 Stats: 35 lines in 17 files changed: 16 ins; 0 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/14120.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14120/head:pull/14120 PR: https://git.openjdk.org/jdk/pull/14120 From coleenp at openjdk.org Wed May 24 12:50:19 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 24 May 2023 12:50:19 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v33] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 14:37:46 GMT, Roman Kennke wrote: > I guess the alternative would be to not put it under flag to begin with. Yes, is there any reason this can't be an improvement in the default case without a flag? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1561079434 From kdnilsen at openjdk.org Wed May 24 13:56:20 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 24 May 2023 13:56:20 GMT Subject: RFR: Expand old on demand [v61] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > This PR now passes all GHA pre-integration tests and other internal CI tests. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Remove irrelevant assert Asserts about soft_max_capacity are not relevant after changes to how this is implemented and how it is used. ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/349f87fb..92608d74 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=60 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=59-60 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From kdnilsen at openjdk.org Wed May 24 14:10:33 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 24 May 2023 14:10:33 GMT Subject: RFR: Expand old on demand [v62] In-Reply-To: References: Message-ID: <_dvYVBNB2R6bxNucITKLQ3gmvK1dHwebHGNZNlCYb-M=.2fbd57c6-a82c-4fd0-873a-5e7424262f3c@github.com> > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. 
> > This PR now passes all GHA pre-integration tests and other internal CI tests. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Keep lock while transferring regions to young In ShenandoahFreeSet::move_collector_sets_to_mutator(), we had released the lock before transferring OldCollector regions to young so that they could become part of the Mutator free set. This commit fixes that error. ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/92608d74..e02032d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=61 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=60-61 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From rkennke at openjdk.org Wed May 24 14:38:23 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 24 May 2023 14:38:23 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v33] In-Reply-To: References: Message-ID: <9xoNQZSU369K49QDHssVieF48owK05KpTSRzGgxFDa0=.449d7f8d-cbfa-409f-ae3f-4aeac773cc89@github.com> On Wed, 24 May 2023 12:47:09 GMT, Coleen Phillimore wrote: > > I guess the alternative would be to not put it under flag to begin with. > > Yes, is there any reason this can't be an improvement in the default case without a flag? Dunno. A 7% worst-case performance degradation in the full-GC of Serial (and perhaps G1 and Shenandoah) might be a concern. Also, it is a design principle of the compact headers JEP that all non-trivial changes should be under a runtime flag, where the 'legacy' case is not measurably affected. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1561280369 From eosterlund at openjdk.org Wed May 24 15:41:26 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 24 May 2023 15:41:26 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v33] In-Reply-To: <9xoNQZSU369K49QDHssVieF48owK05KpTSRzGgxFDa0=.449d7f8d-cbfa-409f-ae3f-4aeac773cc89@github.com> References: <9xoNQZSU369K49QDHssVieF48owK05KpTSRzGgxFDa0=.449d7f8d-cbfa-409f-ae3f-4aeac773cc89@github.com> Message-ID: <5vfFABBVLZ_Vllbn4BBT7qX-KcMXyBljCwNH8SYXnBQ=.1ab444bc-11fb-4df0-a173-de42e454972f@github.com> On Wed, 24 May 2023 14:34:51 GMT, Roman Kennke wrote: > > > I guess the alternative would be to not put it under flag to begin with. > > > > > > Yes, is there any reason this can't be an improvement in the default case without a flag? > > Dunno. A 7% worst-case performance degradation in the full-GC of Serial (and perhaps G1 and Shenandoah) might be a concern. Also, it is a design principle of the compact headers JEP that all non-trivial changes should be under a runtime flag, where the 'legacy' case is not measurably affected. In this case, I'd prefer to not have a flag, if we can convince ourselves that the impact is indeed only 7% worst-case time for Serial and for the G1 + Shenandoah failure modes. I'd rather change the goals in the JEP to say we are willing to throw Serial GC non-failure full GC modes under the bus and risk that taking a bit longer, with the reasoning that if you rely on that level of latency or GC performance, then maybe Serial GC simply isn't for you. Does that make sense?
------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1561394556 From kdnilsen at openjdk.org Wed May 24 16:49:07 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 24 May 2023 16:49:07 GMT Subject: RFR: Expand old on demand [v42] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 18:41:49 GMT, Kelvin Nilsen wrote: >> This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. >> >> This PR now passes all GHA pre-integration tests and other internal CI tests. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Fix whitespace src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 104: > 102: > 103: // Returns bytes of old-gen memory consumed by selected aged regions > 104: size_t ShenandoahHeuristics::select_aged_regions(size_t old_available, size_t num_regions, bool preselected_regions[]) { should probably call this preselected_region_candidates[] src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 222: > 220: } > 221: heap->set_pad_for_promote_in_place(promote_in_place_pad); > 222: heap->set_promotion_potential(promo_potential); An issue: if we're skipping aging cycles, this is overly conservative. src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 285: > 283: bool is_candidate; > 284: // This is our candidate for later consideration. > 285: if (is_generational && collection_set->is_preselected(i)) { Can we hoist the test for is_generational() so we don't have to repeat it so many times?
src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 342: > 340: } > 341: } > 342: heap->reserve_promotable_humongous_regions(humongous_regions_promoted); would be nice to eliminate this, and do all usage accounting at the time regions are re-affiliated (promoted in place). src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 362: > 360: size_t immediate_percent = (total_garbage == 0) ? 0 : (immediate_garbage * 100 / total_garbage); > 361: collection_set->set_immediate_trash(immediate_garbage); > 362: If we could introduce a new phase to do promote-in-place (independent of evacuation phase), then we could avoid some redundancy. What's not implemented currently is that if the only reason we did evacuation was for promote in place, we SHOULD not have to update refs. src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.hpp line 176: > 174: virtual void initialize(); > 175: > 176: virtual size_t evac_slack(size_t region_to_be_recycled); should be regions_to_be_recycled src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 933: > 931: } > 932: > 933: void ShenandoahHeapRegion::set_affiliation(ShenandoahAffiliation new_affiliation, bool defer_affiliated_region_count_updates) { Let's think about whether we need this defer_affiliated_region_count_updates variable. (would be cleaner to not have this) src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 1065: > 1063: heap->set_promoted_reserve(promo_reserve); > 1064: > 1065: // TODO: adjust bounds in the free set Remove this comment: the bounds are adjusted already. src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 1073: > 1071: heap->card_scan()->reset_object_range(bottom(), end()); > 1072: heap->card_scan()->mark_range_as_dirty(bottom(), top() - bottom()); > 1073: We could defer this coalesce-and-fill (c&f) until we start the next old cycle. The only problem is that when we do the next marking cycle, it only c&fs the regions that were in the mixed-evac candidate set.
(so an alternative approach would be to tally these up). ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1190472804 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1190471315 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1190473313 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1190479134 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1190480783 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1190486096 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1190467176 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1190462276 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1190464870 From kdnilsen at openjdk.org Wed May 24 16:49:08 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 24 May 2023 16:49:08 GMT Subject: RFR: Expand old on demand [v42] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 23:45:34 GMT, Kelvin Nilsen wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix whitespace > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 362: > >> 360: size_t immediate_percent = (total_garbage == 0) ? 0 : (immediate_garbage * 100 / total_garbage); >> 361: collection_set->set_immediate_trash(immediate_garbage); >> 362: > > If we could introduce a new phase to do promote-in-place (independent of evacuation phase), then we could avoid some redundancy. What's not implemented currently is that if the only reason we did evacuation was for promote in place, we SHOULD not have to update refs. Ideal: for abbreviated cycles, we introduce a new phase that processes promote-in-place regions and does not do update refs.
> src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.hpp line 176: > >> 174: virtual void initialize(); >> 175: >> 176: virtual size_t evac_slack(size_t region_to_be_recycled); > > should be regions_to_be_recycled also explain what the argument means. > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 933: > >> 931: } >> 932: >> 933: void ShenandoahHeapRegion::set_affiliation(ShenandoahAffiliation new_affiliation, bool defer_affiliated_region_count_updates) { > > Let's think about whether we need this defer_affiliated_region_count_updates variable. > > (would be cleaner to not have this) The reason may be that if we change the affiliations asynchronously during evac, then we have to change the generation capacities as well. During allocations, we ask how much is available in young and/or old, and if we add or remove regions from either generation without adjusting capacities, available will be wrong. > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 1073: > >> 1071: heap->card_scan()->reset_object_range(bottom(), end()); >> 1072: heap->card_scan()->mark_range_as_dirty(bottom(), top() - bottom()); >> 1073: > > We could defer this c&f until we start the next old cycle. The only problem is that when we do the next marking cycle, it only c&fs the regions that were in the mixed-evac candidate set. (so an alternative approach would be to tally these up). Would be better to use an existing coalesce and fill function rather than repeating the code here.
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1190481291 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1190486187 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1190468710 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1190465175 From kdnilsen at openjdk.org Wed May 24 16:49:12 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 24 May 2023 16:49:12 GMT Subject: RFR: Expand old on demand [v44] In-Reply-To: References: Message-ID: On Sat, 13 May 2023 01:33:27 GMT, Y. Srinivas Ramakrishna wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Report number of old- and young cset regions when preparing to rebuild free set > > src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 206: > >> 204: "milliseconds. Setting this to 0 disables the feature.") \ >> 205: \ >> 206: product(uintx, ShenandoahGuaranteedYoungGCInterval, 30*1000, EXPERIMENTAL, \ > > Why was this made so small? Are there cases where, say, for a quiescent/idle service instance with little or no load, we must still, for some reason, trigger young gc's twice each minute? Let me experiment with reverting to original value. There were times when our heuristics were getting flat-footed and we triggered late after long idle spans, but this "fix" may have just masked a different problem that may now be resolved. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1204323814 From kdnilsen at openjdk.org Wed May 24 16:49:13 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 24 May 2023 16:49:13 GMT Subject: RFR: Expand old on demand [v44] In-Reply-To: References: Message-ID: On Wed, 24 May 2023 14:57:30 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp line 206: >> >>> 204: "milliseconds. 
Setting this to 0 disables the feature.") \ >>> 205: \ >>> 206: product(uintx, ShenandoahGuaranteedYoungGCInterval, 30*1000, EXPERIMENTAL, \ >> >> Why was this made so small? Are there cases where, say, for a quiescent/idle service instance with little or no load, we must still, for some reason, trigger young gc's twice each minute? > > Let me experiment with reverting to original value. There were times when our heuristics were getting flat-footed and we triggered late after long idle spans, but this "fix" may have just masked a different problem that may now be resolved. I will revert this change. My test on the Extremem phased update workload reveals that this does cause some degenerated cycles that are avoided when we have the shorter interval. Specifically, that workload has a wait-for-start-of-simulation pause that results in 45 seconds of GC idle time, followed by a late trigger that degenerates during mark and then a back-to-back trigger for the next GC (which is dealing with pent-up demand following the "long" degenerated pause) and this second GC also degenerates during mark. Following these two degenerated GCs, GenShen is able to recalibrate its heuristic triggers and stay ahead of pace throughout the remainder of the 20 minute simulation. I believe the reason we are caught off guard here is that we have not yet "learned" the allocation rate. The first (late) trigger after the 45s idle span is: Free (3096M) is below minimum threshold (3101M) I am reverting this change because I recognize that setting this to 30 seconds is arbitrary and not a robust fix to the bigger triggering problem.
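To illustrate the triggering situation described above, here is a minimal sketch in standard C++. It is not the Shenandoah heuristic code: the names `Trigger` and `evaluate_trigger`, the parameters, and the order of the checks are all assumptions made for illustration. It only shows why, when no allocation rate has been learned yet, a fixed minimum-free threshold is the only check left that can fire.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical trigger kinds; these names are not taken from the Shenandoah sources.
enum class Trigger { None, RateBased, MinThreshold };

// Assumed model: a rate-based check can fire only once an allocation rate
// has been learned; otherwise only the minimum-free backstop remains.
inline Trigger evaluate_trigger(std::uint64_t free_bytes,
                                std::uint64_t min_threshold_bytes,
                                bool rate_known,
                                double alloc_rate_bytes_per_s,
                                double expected_gc_time_s) {
  if (rate_known) {
    // Bytes the mutators are expected to allocate while a GC cycle runs.
    double needed = alloc_rate_bytes_per_s * expected_gc_time_s;
    if (static_cast<double>(free_bytes) < needed) {
      return Trigger::RateBased;
    }
  }
  if (free_bytes < min_threshold_bytes) {
    return Trigger::MinThreshold;
  }
  return Trigger::None;
}

// Usage: the numbers from the log line quoted above, with no learned rate.
inline void usage_example() {
  const std::uint64_t M = 1024ull * 1024ull;
  assert(evaluate_trigger(3096 * M, 3101 * M, false, 0.0, 0.0) == Trigger::MinThreshold);
}
```

In this model, free memory can fall all the way to the minimum threshold before anything fires, which matches the late trigger reported after the idle span.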
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1204498046 From wkemper at openjdk.org Wed May 24 17:17:06 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 24 May 2023 17:17:06 GMT Subject: RFR: Make generational mode experimental Message-ID: Setting `-XX:ShenandoahGCMode=generational` now requires `-XX:+UnlockExperimentalVMOptions`. ------------- Commit messages: - Make generational experimental (now requires -XX:+UnlockExperimentalVMOptions) Changes: https://git.openjdk.org/shenandoah/pull/283/files Webrev: https://webrevs.openjdk.org/?repo=shenandoah&pr=283&range=00 Stats: 7 lines in 4 files changed: 2 ins; 0 del; 5 mod Patch: https://git.openjdk.org/shenandoah/pull/283.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/283/head:pull/283 PR: https://git.openjdk.org/shenandoah/pull/283 From kdnilsen at openjdk.org Wed May 24 17:28:35 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 24 May 2023 17:28:35 GMT Subject: RFR: Make generational mode experimental In-Reply-To: References: Message-ID: On Wed, 24 May 2023 17:09:52 GMT, William Kemper wrote: > Setting `-XX:ShenandoahGCMode=generational` now requires `-XX:+UnlockExperimentalVMOptions`. Thanks. ------------- Marked as reviewed by kdnilsen (Committer). PR Review: https://git.openjdk.org/shenandoah/pull/283#pullrequestreview-1442424666 From wkemper at openjdk.org Wed May 24 17:31:35 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 24 May 2023 17:31:35 GMT Subject: RFR: Expand old on demand [v42] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 18:41:49 GMT, Kelvin Nilsen wrote: >> This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. >> >> This PR now passes all GHA pre-integration tests and other internal CI tests.
> > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Fix whitespace src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 332: > 330: > 331: // Return conservative estimate of how much memory can be allocated before we need to start GC > 332: size_t ShenandoahAdaptiveHeuristics::evac_slack(size_t young_regions_to_be_reclaimed) { I don't understand the dimension of time or allocation rate in this function. The heuristic uses this formula to predict _when_ memory will be exhausted. If you just need to know how much memory is available, less any penalties or spike tolerance, then this could be much simpler. src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 560: > 558: proper_unit_for_byte_size(promo_in_place_potential)); > 559: return true; > 560: } else if (mixed_candidates > 0) { This will basically run mixed evacuations back to back until they're finished... is that what we want? It seems aggressive. src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.cpp line 411: > 409: if (_cannot_expand_trigger) { > 410: ShenandoahHeap* heap = ShenandoahHeap::heap(); > 411: ShenandoahOldGeneration* old_gen = heap->old_generation(); This class holds a pointer to the old generation in `_old_generation` - can skip the `ShenandoahHeap::heap()->old_generation()` calls.
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1191746601 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1191749729 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1191753809 From wkemper at openjdk.org Wed May 24 17:31:35 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 24 May 2023 17:31:35 GMT Subject: RFR: Expand old on demand [v42] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 23:26:48 GMT, Kelvin Nilsen wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix whitespace > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 104: > >> 102: >> 103: // Returns bytes of old-gen memory consumed by selected aged regions >> 104: size_t ShenandoahHeuristics::select_aged_regions(size_t old_available, size_t num_regions, bool preselected_regions[]) { > > should probably call this preselected_region_candidates[] I'd prefer a name that describes why these candidates are "pre-selected". It is also not clear why these regions aren't added to the collection set through the same process as other regions. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1204542743 From wkemper at openjdk.org Wed May 24 18:56:35 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 24 May 2023 18:56:35 GMT Subject: Integrated: Make generational mode experimental In-Reply-To: References: Message-ID: On Wed, 24 May 2023 17:09:52 GMT, William Kemper wrote: > Setting `-XX:ShenandoahGCMode=generational` now requires `-XX:+UnlockExperimentalVMOptions`. This pull request has now been integrated.
Changeset: e33ecfae Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/e33ecfaedff193a62ae4dfebc844b94305a569ad Stats: 7 lines in 4 files changed: 2 ins; 0 del; 5 mod Make generational mode experimental Reviewed-by: kdnilsen ------------- PR: https://git.openjdk.org/shenandoah/pull/283 From kdnilsen at openjdk.org Thu May 25 03:20:31 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 25 May 2023 03:20:31 GMT Subject: RFR: Expand old on demand [v63] In-Reply-To: References: Message-ID: <70BmdacYv_rfcLNdgEQHyvy-UKReJCS2N2eJJA41dXg=.fbf1d7ae-ed3c-4e9c-a3a5-fc4eac53c9b3@github.com> > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > This PR now passes all GHA pre-integration tests and other internal CI tests. Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: - Allow OOM failures to occur more quickly After a change to increase the default number of consecutive degenerated cycles that can occur before a full gc is triggered, the number of allocation retries that must be completed before throwing an OutOfMemory exception increased significantly. This commit provides a mechanism to more quickly decide to throw OOM. - Revert default value of ShenandoahGuaranteedYoungGCInterval The previous change was somewhat arbitrary and masks a bigger problem with triggering GC following periods of idle activity. An alternative approach to triggering should be developed in the future.
------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/e02032d3..f6f35154 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=62 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=61-62 Stats: 22 lines in 4 files changed: 18 ins; 0 del; 4 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From tschatzl at openjdk.org Thu May 25 10:52:00 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 25 May 2023 10:52:00 GMT Subject: RFR: 8194823: Serial GC does not account GCs caused by TLAB allocation in GC overhead limit In-Reply-To: References: Message-ID: On Wed, 24 May 2023 10:54:16 GMT, Guoxiong Li wrote: > Hi all, > > This patch enables the gc overhead limit when allocating TLAB in serial gc. > The main modification is at `GenCollectedHeap::allocate_new_tlab` and the other > files only adjust the parameters of the method `allocate_new_tlab`. > > Thanks for the review. > > Best Regards, > -- Guoxiong Lgtm apart from parameter list formatting issues. Please fix before committing. It's a bit unfortunate that support for this feature for one collector causes so many changes everywhere else. (But also indicates an issue with all other collectors not supporting it ;)). Maybe the parameters could be wrapped into something like an "AllocationRequest", but this is a) a separate issue, and b) needs to be discussed first with others. 
src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 518: > 516: > 517: HeapWord* ParallelScavengeHeap::allocate_new_tlab(size_t min_size, size_t requested_size, size_t* actual_size, > 518: bool* gc_overhead_limit_was_exceeded) { Suggestion: HeapWord* ParallelScavengeHeap::allocate_new_tlab(size_t min_size, size_t requested_size, size_t* actual_size, bool* gc_overhead_limit_was_exceeded) { Parameter list formatting src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 106: > 104: protected: > 105: HeapWord* allocate_new_tlab(size_t min_size, size_t requested_size, size_t* actual_size, > 106: bool* gc_overhead_limit_was_exceeded) override; Suggestion: HeapWord* allocate_new_tlab(size_t min_size, size_t requested_size, size_t* actual_size, bool* gc_overhead_limit_was_exceeded) override; Parameter formatting src/hotspot/share/gc/shared/memAllocator.cpp line 325: > 323: size_t min_tlab_size = ThreadLocalAllocBuffer::compute_min_size(_word_size); > 324: mem = Universe::heap()->allocate_new_tlab(min_tlab_size, new_tlab_size, &allocation._allocated_tlab_size, > 325: &allocation._overhead_limit_exceeded); Suggestion: mem = Universe::heap()->allocate_new_tlab(min_tlab_size, new_tlab_size, &allocation._allocated_tlab_size, &allocation._overhead_limit_exceeded); Parameter formatting src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 535: > 533: > 534: HeapWord* allocate_new_tlab(size_t min_size, size_t requested_size, size_t* actual_size, > 535: bool* gc_overhead_limit_was_exceeded) override; Suggestion: HeapWord* allocate_new_tlab(size_t min_size, size_t requested_size, size_t* actual_size, bool* gc_overhead_limit_was_exceeded) override; src/hotspot/share/gc/x/xCollectedHeap.cpp line 150: > 148: > 149: HeapWord* XCollectedHeap::allocate_new_tlab(size_t min_size, size_t requested_size, size_t* actual_size, > 150: bool* gc_overhead_limit_was_exceeded) { Suggestion: HeapWord* XCollectedHeap::allocate_new_tlab(size_t min_size, size_t requested_size, size_t* 
actual_size, bool* gc_overhead_limit_was_exceeded) { src/hotspot/share/gc/z/zCollectedHeap.cpp line 145: > 143: > 144: HeapWord* ZCollectedHeap::allocate_new_tlab(size_t min_size, size_t requested_size, size_t* actual_size, > 145: bool* gc_overhead_limit_was_exceeded) { Suggestion: HeapWord* ZCollectedHeap::allocate_new_tlab(size_t min_size, size_t requested_size, size_t* actual_size, bool* gc_overhead_limit_was_exceeded) { ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14120#pullrequestreview-1443605666 PR Review Comment: https://git.openjdk.org/jdk/pull/14120#discussion_r1205332523 PR Review Comment: https://git.openjdk.org/jdk/pull/14120#discussion_r1205333052 PR Review Comment: https://git.openjdk.org/jdk/pull/14120#discussion_r1205333593 PR Review Comment: https://git.openjdk.org/jdk/pull/14120#discussion_r1205333968 PR Review Comment: https://git.openjdk.org/jdk/pull/14120#discussion_r1205334232 PR Review Comment: https://git.openjdk.org/jdk/pull/14120#discussion_r1205334518 From gli at openjdk.org Thu May 25 11:29:27 2023 From: gli at openjdk.org (Guoxiong Li) Date: Thu, 25 May 2023 11:29:27 GMT Subject: RFR: 8194823: Serial GC does not account GCs caused by TLAB allocation in GC overhead limit [v2] In-Reply-To: References: Message-ID: > Hi all, > > This patch enables the gc overhead limit when allocating TLAB in serial gc. > The main modification is at `GenCollectedHeap::allocate_new_tlab` and the other > files only adjust the parameters of the method `allocate_new_tlab`. > > Thanks for the review. > > Best Regards, > -- Guoxiong Guoxiong Li has updated the pull request incrementally with one additional commit since the last revision: Fix parameter list formatting issues. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/14120/files - new: https://git.openjdk.org/jdk/pull/14120/files/f19d3213..fd9b3969 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14120&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14120&range=00-01 Stats: 18 lines in 6 files changed: 12 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/14120.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14120/head:pull/14120 PR: https://git.openjdk.org/jdk/pull/14120 From gli at openjdk.org Thu May 25 11:29:28 2023 From: gli at openjdk.org (Guoxiong Li) Date: Thu, 25 May 2023 11:29:28 GMT Subject: RFR: 8194823: Serial GC does not account GCs caused by TLAB allocation in GC overhead limit [v2] In-Reply-To: References: Message-ID: On Thu, 25 May 2023 10:48:49 GMT, Thomas Schatzl wrote: >> Guoxiong Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix parameter list formatting issues. > > Lgtm apart from parameter list formatting issues. Please fix before committing. > > It's a bit unfortunate that support for this feature for one collector causes so many changes everywhere else. (But also indicates an issue with all other collectors not supporting it ;)). Maybe the parameters could be wrapped into something like an "AllocationRequest", but this is a) a separate issue, and b) needs to be discussed first with others. @tschatzl Thanks for the review. I fixed the formatting issues just now. > src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 518: > >> 516: >> 517: HeapWord* ParallelScavengeHeap::allocate_new_tlab(size_t min_size, size_t requested_size, size_t* actual_size, >> 518: bool* gc_overhead_limit_was_exceeded) { > > Suggestion: > > HeapWord* ParallelScavengeHeap::allocate_new_tlab(size_t min_size, > size_t requested_size, > size_t* actual_size, > bool* gc_overhead_limit_was_exceeded) { > > Parameter list formatting Fixed. 
> src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 106: > >> 104: protected: >> 105: HeapWord* allocate_new_tlab(size_t min_size, size_t requested_size, size_t* actual_size, >> 106: bool* gc_overhead_limit_was_exceeded) override; > > Suggestion: > > HeapWord* allocate_new_tlab(size_t min_size, > size_t requested_size, > size_t* actual_size, > bool* gc_overhead_limit_was_exceeded) override; > > Parameter formatting Fixed. > src/hotspot/share/gc/shared/memAllocator.cpp line 325: > >> 323: size_t min_tlab_size = ThreadLocalAllocBuffer::compute_min_size(_word_size); >> 324: mem = Universe::heap()->allocate_new_tlab(min_tlab_size, new_tlab_size, &allocation._allocated_tlab_size, >> 325: &allocation._overhead_limit_exceeded); > > Suggestion: > > mem = Universe::heap()->allocate_new_tlab(min_tlab_size, > new_tlab_size, > &allocation._allocated_tlab_size, > &allocation._overhead_limit_exceeded); > > Parameter formatting Fixed. > src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 535: > >> 533: >> 534: HeapWord* allocate_new_tlab(size_t min_size, size_t requested_size, size_t* actual_size, >> 535: bool* gc_overhead_limit_was_exceeded) override; > > Suggestion: > > HeapWord* allocate_new_tlab(size_t min_size, > size_t requested_size, > size_t* actual_size, > bool* gc_overhead_limit_was_exceeded) override; Fixed. > src/hotspot/share/gc/x/xCollectedHeap.cpp line 150: > >> 148: >> 149: HeapWord* XCollectedHeap::allocate_new_tlab(size_t min_size, size_t requested_size, size_t* actual_size, >> 150: bool* gc_overhead_limit_was_exceeded) { > > Suggestion: > > HeapWord* XCollectedHeap::allocate_new_tlab(size_t min_size, > size_t requested_size, > size_t* actual_size, > bool* gc_overhead_limit_was_exceeded) { Fixed. 
> src/hotspot/share/gc/z/zCollectedHeap.cpp line 145: > >> 143: >> 144: HeapWord* ZCollectedHeap::allocate_new_tlab(size_t min_size, size_t requested_size, size_t* actual_size, >> 145: bool* gc_overhead_limit_was_exceeded) { > > Suggestion: > > HeapWord* ZCollectedHeap::allocate_new_tlab(size_t min_size, > size_t requested_size, > size_t* actual_size, > bool* gc_overhead_limit_was_exceeded) { Fixed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14120#issuecomment-1562734243 PR Review Comment: https://git.openjdk.org/jdk/pull/14120#discussion_r1205370192 PR Review Comment: https://git.openjdk.org/jdk/pull/14120#discussion_r1205370232 PR Review Comment: https://git.openjdk.org/jdk/pull/14120#discussion_r1205370289 PR Review Comment: https://git.openjdk.org/jdk/pull/14120#discussion_r1205370349 PR Review Comment: https://git.openjdk.org/jdk/pull/14120#discussion_r1205370388 PR Review Comment: https://git.openjdk.org/jdk/pull/14120#discussion_r1205370436 From iwalulya at openjdk.org Thu May 25 13:07:58 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 25 May 2023 13:07:58 GMT Subject: RFR: 8194823: Serial GC does not account GCs caused by TLAB allocation in GC overhead limit [v2] In-Reply-To: References: Message-ID: On Thu, 25 May 2023 11:29:27 GMT, Guoxiong Li wrote: >> Hi all, >> >> This patch enables the gc overhead limit when allocating TLAB in serial gc. >> The main modification is at `GenCollectedHeap::allocate_new_tlab` and the other >> files only adjust the parameters of the method `allocate_new_tlab`. >> >> Thanks for the review. >> >> Best Regards, >> -- Guoxiong > > Guoxiong Li has updated the pull request incrementally with one additional commit since the last revision: > > Fix parameter list formatting issues. Marked as reviewed by iwalulya (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/14120#pullrequestreview-1443854022 From ayang at openjdk.org Thu May 25 13:34:57 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 25 May 2023 13:34:57 GMT Subject: RFR: 8194823: Serial GC does not account GCs caused by TLAB allocation in GC overhead limit [v2] In-Reply-To: References: Message-ID: On Thu, 25 May 2023 11:29:27 GMT, Guoxiong Li wrote: >> Hi all, >> >> This patch enables the gc overhead limit when allocating TLAB in serial gc. >> The main modification is at `GenCollectedHeap::allocate_new_tlab` and the other >> files only adjust the parameters of the method `allocate_new_tlab`. >> >> Thanks for the review. >> >> Best Regards, >> -- Guoxiong > > Guoxiong Li has updated the pull request incrementally with one additional commit since the last revision: > > Fix parameter list formatting issues. Do you have a small example to trigger `java.lang.OutOfMemoryError: GC Overhead Limit Exceeded` for Serial GC. I was under the impression that Serial doesn't support `UseGCOverheadLimit`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14120#issuecomment-1562916142 From wkemper at openjdk.org Thu May 25 15:32:13 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 25 May 2023 15:32:13 GMT Subject: RFR: Expand old on demand [v63] In-Reply-To: <70BmdacYv_rfcLNdgEQHyvy-UKReJCS2N2eJJA41dXg=.fbf1d7ae-ed3c-4e9c-a3a5-fc4eac53c9b3@github.com> References: <70BmdacYv_rfcLNdgEQHyvy-UKReJCS2N2eJJA41dXg=.fbf1d7ae-ed3c-4e9c-a3a5-fc4eac53c9b3@github.com> Message-ID: On Thu, 25 May 2023 03:20:31 GMT, Kelvin Nilsen wrote: >> This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. >> >> This PR now passes all GHA pre-integration tests and other internal CI tests. 
> > Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: > > - Allow OOM failures to occur more quickly > > After a change to increase the default number of consecutive > degenerated cycles that can occur before a full gc is triggered, > the number of allocation retries that must be completed before throwing > an OutOfMemory exception increased significantly. This commit provides > a mechanism to more quickly decide to throw OOM. > - Revert default value of ShenandoahGuaranteedYoungGCInterval > > The previous change was somewhat arbitrary and masks a bigger problem > with triggering GC following periods of idle activity. An alternative > approach to triggering should be developed in the future. I think it would be simpler/fewer changes to pull the full gc count from the existing count in `ShenandoahCollectorPolicy`. src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 358: > 356: > 357: // How many full-gc cycles have been completed? > 358: volatile size_t _completed_fullgc_cycles; `ShenandoahCollectorPolicy` already counts full GC cycles (implicit and explicit). We could add a `get_full_gc_count` there without duplicating any functionality. src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 397: > 395: > 396: void finish_fullgc() { > 397: Atomic::add(&_completed_fullgc_cycles, (size_t) 1); `Atomic::inc`? Also - this will only be called on a safepoint at the end of full gc? Does it need atomic access? ------------- Changes requested by wkemper (Committer).
PR Review: https://git.openjdk.org/shenandoah/pull/248#pullrequestreview-1444165349 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1205688396 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1205690065 From gli at openjdk.org Thu May 25 15:34:03 2023 From: gli at openjdk.org (Guoxiong Li) Date: Thu, 25 May 2023 15:34:03 GMT Subject: RFR: 8194823: Serial GC does not account GCs caused by TLAB allocation in GC overhead limit [v2] In-Reply-To: References: Message-ID: On Thu, 25 May 2023 13:31:59 GMT, Albert Mingkun Yang wrote: > Do you have a small example to trigger `java.lang.OutOfMemoryError: GC Overhead Limit Exceeded` for Serial GC. I was under the impression that Serial doesn't support `UseGCOverheadLimit`. I re-read the related code and now I think you are right. Currently, only the parallel GC supports `UseGCOverheadLimit`. In detail, the methods `GCOverheadChecker::check_gc_overhead_limit` and `AdaptiveSizePolicy::check_gc_overhead_limit` are only used by `PSScavenge::invoke_no_policy` and `PSParallelCompact::invoke_no_policy` under the condition **UseAdaptiveSizePolicy is true**. And the `UseAdaptiveSizePolicy` is only used by parallel gc, too. Several problems need to be confirmed before continuing the work: The `UseGCOverheadLimit` is only used when `UseAdaptiveSizePolicy` is true. Is it intentional? If it is intentional and only the parallel GC uses the `UseAdaptiveSizePolicy` now, should I remove the `UseGCOverheadLimit` related code in serial GC? Or should we implement the feature about `UseAdaptiveSizePolicy` and `UseGCOverheadLimit` in serial GC (which seems like a large change)?
------------- PR Comment: https://git.openjdk.org/jdk/pull/14120#issuecomment-1563110554 From kdnilsen at openjdk.org Thu May 25 16:37:31 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 25 May 2023 16:37:31 GMT Subject: RFR: Expand old on demand [v64] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > This PR now passes all GHA pre-integration tests and other internal CI tests. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Remove deprecated code This removes code that is no longer needed in ShenandoahVerifier. ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/f6f35154..94040c93 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=63 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=62-63 Stats: 76 lines in 3 files changed: 0 ins; 68 del; 8 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From kdnilsen at openjdk.org Thu May 25 16:50:02 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 25 May 2023 16:50:02 GMT Subject: RFR: Expand old on demand [v63] In-Reply-To: References: <70BmdacYv_rfcLNdgEQHyvy-UKReJCS2N2eJJA41dXg=.fbf1d7ae-ed3c-4e9c-a3a5-fc4eac53c9b3@github.com> Message-ID: On Thu, 25 May 2023 15:27:02 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with two additional commits since the last revision: >> >> - Allow OOM failures to occur more quickly >> >> After a change to increase the defaulit number of consecutive >> degenerated 
cycles that can occur before a full gc is triggered, >> the number of allocation retries that must be completed before throwing >> an OutOfMemory exception increased significantly. This commit provides >> a mechanism to more quickly decide to throw OOM. >> - Revert default value of ShenandoahGuaranteedYoungGCInterval >> >> The previous change was somewhat arbitrary and masks a bigger problem >> with triggering GC following periods of idle activity. An alternative >> approach to triggering should be developed in the future. > > src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 358: > >> 356: >> 357: // How many full-gc cycles have been completed? >> 358: volatile size_t _completed_fullgc_cycles; > > `ShenandoahCollectorPolicy` already counts full GC cycles (implicit and explicit). We could add a `get_full_gc_count` there without duplicating any functionality. good idea. I searched around and couldn't figure out where this was. I'll pursue this simplification. > src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 397: > >> 395: >> 396: void finish_fullgc() { >> 397: Atomic::add(&_completed_fullgc_cycles, (size_t) 1); > > `Atomic::inc`? Also - this will only be called on a safepoint at the end of full gc? Does it need atomic access? Thanks for looking at this. This is only written from within full gc. Maybe declaring the variable volatile is sufficient to assure that the "many" mutator threads that access the variable will see any change written by full-gc control thread. I followed the Atomic pattern out of an abundance of caution. 
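As a point of comparison for the volatile-versus-atomic question above, here is a minimal sketch in standard C++. It uses `std::atomic` rather than HotSpot's own `Atomic` API, and `FullGcCounter` and `full_gc_since` are hypothetical names introduced only for illustration. In standard C++, `volatile` by itself does not establish ordering between threads, so a counter with one writer and many readers is normally expressed as an atomic. Whether HotSpot's safepoint protocol already makes the atomic update unnecessary for this field is the open question in this thread and is not settled by the sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the counter discussed above; this is standard
// C++ (std::atomic), not HotSpot's Atomic API.
class FullGcCounter {
public:
  // Writer side: assumed to be called once per completed full GC by a single thread.
  void finish_fullgc() {
    _completed.fetch_add(1, std::memory_order_release);
  }

  // Reader side: may be called from any thread.
  std::size_t completed() const {
    return _completed.load(std::memory_order_acquire);
  }

private:
  std::atomic<std::size_t> _completed{0};
};

// True if at least one full GC finished after `snapshot` was taken.
inline bool full_gc_since(const FullGcCounter& counter, std::size_t snapshot) {
  return counter.completed() != snapshot;
}

// Usage: take a snapshot, then check whether a full GC completed afterwards.
inline void usage_example() {
  FullGcCounter counter;
  std::size_t before = counter.completed();
  assert(!full_gc_since(counter, before));
  counter.finish_fullgc();
  assert(full_gc_since(counter, before));
}
```

With a single writer, an atomic increment and a plain atomic store of `old + 1` are equivalent; the increment form is used here only because it stays correct if a second writer is ever added.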
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1205783818 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1205783367 From kdnilsen at openjdk.org Thu May 25 18:29:57 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 25 May 2023 18:29:57 GMT Subject: RFR: Expand old on demand [v42] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 23:23:41 GMT, Kelvin Nilsen wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix whitespace > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 222: > >> 220: } >> 221: heap->set_pad_for_promote_in_place(promote_in_place_pad); >> 222: heap->set_promotion_potential(promo_potential); > > An issue: if we're skipping aging cycles, this is overly conservative. We have changed the accumulation of promo_potential to not include regions that are "tenure_age - 1" if this is not an aging cycle. > src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 1065: > >> 1063: heap->set_promoted_reserve(promo_reserve); >> 1064: >> 1065: // TODO: adjust bounds in the free set > > Remove this comment: the bounds are adjusted already. This comment has been removed. 
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1205874718 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1205871385 From kdnilsen at openjdk.org Thu May 25 18:30:00 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 25 May 2023 18:30:00 GMT Subject: RFR: Expand old on demand [v42] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 23:17:46 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 933: >> >>> 931: } >>> 932: >>> 933: void ShenandoahHeapRegion::set_affiliation(ShenandoahAffiliation new_affiliation, bool defer_affiliated_region_count_updates) { >> >> Let's think about whether we need this defer_affiliated_region_count_updates variable. >> >> (would be cleaner to not have this) > > It may be the reason that if we change the affiliations asynchronousnly during eva, then we have to change the generation capacities as well. > > During allocations, we ask how much is available in young and/or old, and if we add or remove regions from either generation without adjust capacities, available will be wrong. We have refactored this code so that we no longer need the defer_affiliated_region_count_updates argument. It has been removed. >> src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp line 1073: >> >>> 1071: heap->card_scan()->reset_object_range(bottom(), end()); >>> 1072: heap->card_scan()->mark_range_as_dirty(bottom(), top() - bottom()); >>> 1073: >> >> We could defer this c&f until we start the next old cycle. The only problem is that when we do the next marking cycle, it only c&fs the regions that were in the mixed-evac candidate set. (so an alternative approach would be to tally these up). > > Would be better to use an existing coalesce and fill function rather than repeating the code here. I've placed a TODO comment to replace this code with an existing coalesce-and-fill function. 
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1205873779 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1205872835 From kdnilsen at openjdk.org Thu May 25 19:09:33 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 25 May 2023 19:09:33 GMT Subject: RFR: Expand old on demand [v42] In-Reply-To: References: Message-ID: On Wed, 24 May 2023 17:25:14 GMT, William Kemper wrote: >> src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 104: >> >>> 102: >>> 103: // Returns bytes of old-gen memory consumed by selected aged regions >>> 104: size_t ShenandoahHeuristics::select_aged_regions(size_t old_available, size_t num_regions, bool preselected_regions[]) { >> >> should probably call this preslected_region_candidates[] > > I'd prefer a name that described why these candidates are "pre-selected". Also not clear why these regions aren't added to the collection set through the same process as other regions. I've changed the name of preselected_regions[] to candidate_regions_for_promotion_by_copy[]. I've also added a comment to explain (and motivate) the behavior of this function. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1205908284 From kdnilsen at openjdk.org Thu May 25 19:16:30 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 25 May 2023 19:16:30 GMT Subject: RFR: Expand old on demand [v42] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 23:41:39 GMT, Kelvin Nilsen wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix whitespace > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 342: > >> 340: } >> 341: } >> 342: heap->reserve_promotable_humongous_regions(humongous_regions_promoted); > > would be nice to eliminate this, and do all usage accounting at time regions are re-affiliated (promoted in place). 
We may be able to simplify or remove these four reserve_promotable functions in the future. But we need them for now. They identify what work needs to be done during evacuation, and may feed into decisions about how to implement abbreviated gc cycles. When I originally proposed to eliminate these functions, I was thinking we only needed this for updating generation sizes at the end of the GC cycle, but I was wrong. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1205913853 From kdnilsen at openjdk.org Thu May 25 19:16:31 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 25 May 2023 19:16:31 GMT Subject: RFR: Expand old on demand [v42] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 23:46:46 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 362: >> >>> 360: size_t immediate_percent = (total_garbage == 0) ? 0 : (immediate_garbage * 100 / total_garbage); >>> 361: collection_set->set_immediate_trash(immediate_garbage); >>> 362: >> >> If we could introduce a new phase to do promote-in-place (independent of evacuation phase), then we could avoid some redundancy. What's not implemented currently is that if the only reason we did evacuation was for promote in place, we SHOULD not have to update refs. > > Ideal: for abbreviated cycles, we introduce a new phase that processes promote-in-place regions and does not do update refs. Development on this improvement is underway. The improvement will be incorporated after this PR is integrated.
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1205914834 From kdnilsen at openjdk.org Thu May 25 19:52:01 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 25 May 2023 19:52:01 GMT Subject: RFR: Expand old on demand [v42] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 22:25:22 GMT, William Kemper wrote: >> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix whitespace > > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 332: > >> 330: >> 331: // Return conservative estimate of how much memory can be allocated before we need to start GC >> 332: size_t ShenandoahAdaptiveHeuristics::evac_slack(size_t young_regions_to_be_reclaimed) { > > I don't understand the dimension of time or allocation rate in this function. The heuristic uses this formula to predict _when_ memory will be exhausted. If you just need to know how much memory is available, less any penalties or spike tolerance, then this could be much simpler. It's not strictly just the amount of memory available. It's the amount of memory that can be allocated before the next GC trigger. The next GC trigger is supposed to happen with enough additional allocation runway that we can finish that GC before the runway is exhausted. I'm changing the name of the function and adding some comments to hopefully make all of this clearer. > src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 560: > >> 558: proper_unit_for_byte_size(promo_in_place_potential)); >> 559: return true; >> 560: } else if (mixed_candidates > 0) { > > This will basically run mixed evacuations back to back until they're finished... is that what we want? It seems aggressive. Yes. I think it's what we want. It's basically our "compensation" for the fact that we do not have a truly concurrent old-gen collector.
If the old-collector threads were concurrent, they would be consuming all available idle cycles to take care of the mixed evacuations. Certainly, this is an area that we can experiment with and auto-tune if appropriate. My experiments with the Phased-Updates workload suggested that if we have a bunch of mixed evacuation work to be done, we want to get it done as soon as possible. Otherwise, we can get caught flat-footed if mutator demand for new allocations spikes. > src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.cpp line 411: > >> 409: if (_cannot_expand_trigger) { >> 410: ShenandoahHeap* heap = ShenandoahHeap::heap(); >> 411: ShenandoahOldGeneration* old_gen = heap->old_generation(); > > This class holds a pointer to the old generation in `_old_generation` - can skip the `ShenandoahHeap::heap()->old_generation()` calls. Thanks. Replacing throughout. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1205927683 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1205930119 PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1205934980 From kdnilsen at openjdk.org Thu May 25 19:52:04 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 25 May 2023 19:52:04 GMT Subject: RFR: Expand old on demand [v42] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 23:58:51 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.hpp line 176: >> >>> 174: virtual void initialize(); >>> 175: >>> 176: virtual size_t evac_slack(size_t region_to_be_recycled); >> >> should be regions_to_be_recycled > > also explain what the argument means. I'm renaming evac_slack() to bytes_of_allocation_runway_before_gc_trigger(). I'm renaming the argument to young_regions_to_be_reclaimed. I'm adding comments to improve the description of this function and its argument.
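For readers following the "allocation runway" discussion in this thread, here is a minimal sketch of the idea, not the actual Shenandoah implementation. The struct `RunwayInputs`, the function `allocation_runway_bytes`, every field name, and the formula itself are assumptions made for illustration. The only thing taken from the thread is the intent: the result is how much can still be allocated before a GC must be triggered so that the GC can finish before memory runs out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical inputs; none of these names come from the Shenandoah sources.
struct RunwayInputs {
  size_t available_bytes;           // memory currently free for mutator allocation
  size_t young_regions_to_be_reclaimed; // regions expected to be returned shortly
  size_t region_size_bytes;         // size of one heap region
  size_t spike_headroom_bytes;      // reserve held back for allocation spikes
  size_t penalty_bytes;             // reserve held back after recent degenerated/full GCs
  double alloc_rate_bytes_per_sec;  // estimated mutator allocation rate
  double gc_duration_sec;           // estimated duration of the next GC cycle
};

// Returns how many bytes can be allocated before the next GC should be
// triggered. The estimate subtracts the reserves and the memory the mutator
// is expected to consume while the GC itself is running.
inline size_t allocation_runway_bytes(const RunwayInputs& in) {
  assert(in.alloc_rate_bytes_per_sec >= 0.0 && in.gc_duration_sec >= 0.0);
  double usable = static_cast<double>(in.available_bytes)
                + static_cast<double>(in.young_regions_to_be_reclaimed) *
                  static_cast<double>(in.region_size_bytes)
                - static_cast<double>(in.spike_headroom_bytes)
                - static_cast<double>(in.penalty_bytes);
  double consumed_during_gc = in.alloc_rate_bytes_per_sec * in.gc_duration_sec;
  double runway = usable - consumed_during_gc;
  // A non-positive runway means the GC should already have been triggered.
  return runway > 0.0 ? static_cast<size_t>(runway) : 0;
}
```

Under these assumptions the time dimension that William asked about enters only through `alloc_rate_bytes_per_sec * gc_duration_sec`; dropping that term reduces the function to "available memory less reserves".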
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1205926157 From kdnilsen at openjdk.org Thu May 25 19:52:07 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 25 May 2023 19:52:07 GMT Subject: RFR: Expand old on demand [v63] In-Reply-To: References: <70BmdacYv_rfcLNdgEQHyvy-UKReJCS2N2eJJA41dXg=.fbf1d7ae-ed3c-4e9c-a3a5-fc4eac53c9b3@github.com> Message-ID: On Thu, 25 May 2023 16:47:15 GMT, Kelvin Nilsen wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 358: >> >>> 356: >>> 357: // How many full-gc cycles have been completed? >>> 358: volatile size_t _completed_fullgc_cycles; >> >> `ShenandoahCollectorPolicy` already counts full GC cycles (implicit and explicit). We could add a `get_full_gc_count` there without duplicating any functionality. > > good idea. I searched around and couldn't figure out where this was. I'll pursue this simplification. I've made this change as suggested by you. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1205944052 From kdnilsen at openjdk.org Thu May 25 20:15:05 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 25 May 2023 20:15:05 GMT Subject: RFR: Expand old on demand [v65] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > This PR now passes all GHA pre-integration tests and other internal CI tests. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Respond to reviewer feedback This is mostly tidy and cleanup. 
------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/94040c93..da0faa99 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=64 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=63-64 Stats: 85 lines in 12 files changed: 37 ins; 15 del; 33 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From ysr at openjdk.org Thu May 25 23:30:00 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 25 May 2023 23:30:00 GMT Subject: RFR: Expand old on demand [v44] In-Reply-To: References: Message-ID: On Wed, 24 May 2023 16:46:02 GMT, Kelvin Nilsen wrote: >> Let me experiment with reverting to the original value. There were times when our heuristics were getting flat-footed and we triggered late after long idle spans, but this "fix" may have just masked a different problem that may now be resolved. > > I will revert this change. My test on the Extremem phased update workload reveals that this does cause some degenerated cycles that are avoided when we have the shorter interval. Specifically, that workload has a wait-for-start-of-simulation pause that results in 45 seconds of GC idle time, followed by a late trigger that degenerates during mark and then a back-to-back trigger for the next GC (which is dealing with pent-up demand following the "long" degenerated pause) and this second GC also degenerates during mark. Following these two degenerated GCs, GenShen is able to recalibrate its heuristic triggers and stay ahead of pace throughout the remainder of the 20-minute simulation. > > I believe the reason we are caught off guard here is because we have not yet "learned" the allocation rate.
The first (late) trigger after the 45s idle span is: Free (3096M) is below minimum threshold (3101M) > > Am reverting this change because I recognize that setting this to 30 seconds is arbitrary and not a robust fix to the bigger triggering problem. Thanks that makes sense. In the fullness of time (not now), we should consider why we might even need a minor gc cycle occurring even at the rate of once every five minutes even when, say, there is no apparent allocation load. ------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1205810962 From ysr at openjdk.org Thu May 25 23:29:59 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 25 May 2023 23:29:59 GMT Subject: RFR: Expand old on demand [v60] In-Reply-To: References: Message-ID: On Tue, 23 May 2023 23:25:42 GMT, Kelvin Nilsen wrote: >> This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. >> >> This PR now passes all GHA pre-integration tests and other internal CI tests. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Fix argument types to not loose precision src/hotspot/share/gc/shenandoah/shenandoahCollectionSet.cpp line 100: > 98: _young_bytes_to_evacuate += live; > 99: _young_available_bytes_collected += free; > 100: if (r->age() >= InitialTenuringThreshold) { I believe this is called in non-generational case as well (based on testing with tenuring threshold work I am doing). Does the logic work out right for the values of InitialTenuringThreshold and ("young") region age there? I am guessing it doesn't matter because perhaps `_young_bytes_to_promote` isn't used in the non-generational case. Might be good to elide this budgeting code in the non-generational case entirely if it isn't used there. 
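The suggestion above, to skip the promotion budgeting when running non-generationally, could look roughly like the sketch below. This is not the code under review: `RegionSample`, `YoungEvacBudget`, `add_young_region` and the `generational` flag are hypothetical names, and the assumption that live bytes are what gets added to the promotion counter is mine, since the quoted snippet stops at the age check.

```cpp
#include <cstddef>

// Hypothetical stand-in for the per-region values read in the quoted snippet.
struct RegionSample {
  size_t live_bytes;
  size_t free_bytes;
  unsigned age;
};

// Hypothetical accumulator mirroring the three counters named in the review.
struct YoungEvacBudget {
  size_t bytes_to_evacuate = 0;
  size_t available_bytes_collected = 0;
  size_t bytes_to_promote = 0;
};

// Adds one young region to the budget. When `generational` is false the
// whole update is skipped, so region age and the tenuring threshold are
// never consulted in the non-generational case.
inline void add_young_region(YoungEvacBudget& budget, const RegionSample& r,
                             bool generational, unsigned tenuring_threshold) {
  if (!generational) {
    return;
  }
  budget.bytes_to_evacuate += r.live_bytes;
  budget.available_bytes_collected += r.free_bytes;
  if (r.age >= tenuring_threshold) {
    // Assumption: an aged region's live data counts toward promotion.
    budget.bytes_to_promote += r.live_bytes;
  }
}
```

Guarding at the top keeps the non-generational path free of any dependence on `InitialTenuringThreshold`, which is the concern raised in the review comment.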
------------- PR Review Comment: https://git.openjdk.org/shenandoah/pull/248#discussion_r1203291405 From ysr at openjdk.org Thu May 25 23:29:56 2023 From: ysr at openjdk.org (Y. Srinivas Ramakrishna) Date: Thu, 25 May 2023 23:29:56 GMT Subject: RFR: Expand old on demand [v65] In-Reply-To: References: Message-ID: On Thu, 25 May 2023 20:15:05 GMT, Kelvin Nilsen wrote: >> This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. >> >> This PR now passes all GHA pre-integration tests and other internal CI tests. > > Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: > > Respond to reviewer feedback > > This is mostly tidy and cleanup. Let's get this integrated as is, without doing any further extensive refactorings at this stage, and we can deal with more detailed or low-level review feedback in the fullness of time. Such review comments may appear in the future and can be fixed later. Thanks for the improvements in performance that these changes provided! ------------- Marked as reviewed by ysr (Committer). PR Review: https://git.openjdk.org/shenandoah/pull/248#pullrequestreview-1440821330 From ayang at openjdk.org Fri May 26 09:40:57 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 26 May 2023 09:40:57 GMT Subject: RFR: 8194823: Serial GC does not account GCs caused by TLAB allocation in GC overhead limit [v2] In-Reply-To: References: Message-ID: On Thu, 25 May 2023 15:31:29 GMT, Guoxiong Li wrote: > The UseGCOverheadLimit is only used when UseAdaptiveSizePolicy is true. Is it intentional? I can't see any dependency btw them after skimming through their specification. Worth investigation in its own ticket/PR. 
Some background which might be useful, "UseGCOverheadLimit can work independently of UseAdaptiveSizePolicy" from https://bugs.openjdk.org/browse/JDK-8212206 > should I remove the UseGCOverheadLimit related code in serial GC? I am leaning towards removing it, as it is effectively dead code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14120#issuecomment-1564108773 From coleenp at openjdk.org Fri May 26 11:48:17 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 26 May 2023 11:48:17 GMT Subject: RFR: 8305896: Alternative full GC forwarding [v47] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 14:37:18 GMT, Roman Kennke wrote: >> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large. >> >> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header. >> >> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions. >> >> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start addresses of the possible target regions for each source region.
>> >> With this, forwarding information would be encoded like this: >> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded. >> - Bit 2: Used for 'fallback'-forwarding (see below) >> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above) >> - Bits 4..31 The number of heap words from the target base address >> >> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table. >> >> All the table accesses can be done unsynchronized because: >> - Serial GC is single-threaded anyway >> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions. >> - G1 serial compaction is single-threaded >> >> The c... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Specialize full-GC loops to get UseAltGCForwarding flag check out of hot paths Is it 7% slower when not testing the flag? FTR - I ran this patch and the https://github.com/openjdk/jdk/pull/13779 with the flag on and it passes tier 1-4. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1564267051 From kdnilsen at openjdk.org Fri May 26 15:01:27 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 26 May 2023 15:01:27 GMT Subject: RFR: Expand old on demand [v66] In-Reply-To: References: Message-ID: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. 
In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > This PR now passes all GHA pre-integration tests and other internal CI tests. Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision: Fix errors in implementation of get_fullgc_count() ------------- Changes: - all: https://git.openjdk.org/shenandoah/pull/248/files - new: https://git.openjdk.org/shenandoah/pull/248/files/da0faa99..623edab2 Webrevs: - full: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=65 - incr: https://webrevs.openjdk.org/?repo=shenandoah&pr=248&range=64-65 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/shenandoah/pull/248.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/248/head:pull/248 PR: https://git.openjdk.org/shenandoah/pull/248 From wkemper at openjdk.org Fri May 26 15:25:28 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 26 May 2023 15:25:28 GMT Subject: Integrated: Merge openjdk/jdk:master Message-ID: This merges tag jdk-21+24 ------------- Commit messages: - Merge tag 'jdk-21+24' into merge-jdk-21-24 - 8179502: Enhance OCSP, CRL and Certificate Fetch Timeouts - 8306698: Add overloads to MethodTypeDesc::of - 8301154: SunPKCS11 KeyStore deleteEntry results in dangling PrivateKey entries - 8308716: ProblemList java/util/concurrent/ScheduledThreadPoolExecutor/BasicCancelTest.java with genzgc on windows-x64 - 8303942: os::write should write completely - 8306706: Support out-of-line code generation for MachNodes - 8308016: Use snippets in java.io package - 8308116: jdk.test.lib.compiler.InMemoryJavaCompiler.compile does not close files - 8307523: [vectorapi] Optimize MaskFromLongBenchmark.java - ... 
and 85 more: https://git.openjdk.org/shenandoah/compare/e33ecfae...c617ce32 The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.org/?repo=shenandoah&pr=284&range=00.0 - openjdk/jdk:master: https://webrevs.openjdk.org/?repo=shenandoah&pr=284&range=00.1 Changes: https://git.openjdk.org/shenandoah/pull/284/files Stats: 30195 lines in 852 files changed: 18155 ins; 6937 del; 5103 mod Patch: https://git.openjdk.org/shenandoah/pull/284.diff Fetch: git fetch https://git.openjdk.org/shenandoah.git pull/284/head:pull/284 PR: https://git.openjdk.org/shenandoah/pull/284 From wkemper at openjdk.org Fri May 26 15:25:29 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 26 May 2023 15:25:29 GMT Subject: Integrated: Merge openjdk/jdk:master In-Reply-To: References: Message-ID: On Fri, 26 May 2023 15:16:19 GMT, William Kemper wrote: > This merges tag jdk-21+24 This pull request has now been integrated. Changeset: 8d4f58c9 Author: William Kemper URL: https://git.openjdk.org/shenandoah/commit/8d4f58c9004f5e8da356b2561bf88a07772de1d5 Stats: 30195 lines in 852 files changed: 18155 ins; 6937 del; 5103 mod Merge openjdk/jdk:master ------------- PR: https://git.openjdk.org/shenandoah/pull/284 From kdnilsen at openjdk.org Fri May 26 18:43:32 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Fri, 26 May 2023 18:43:32 GMT Subject: Integrated: Expand old on demand In-Reply-To: References: Message-ID: On Thu, 6 Apr 2023 18:13:49 GMT, Kelvin Nilsen wrote: > This PR describes several proposed changes to dynamically adjust the sizes of old-gen and young-gen. In general, the objective is to keep old-gen as small as possible so that there is an abundance of memory available for the young-gen allocation runway. > > This PR now passes all GHA pre-integration tests and other internal CI tests. This pull request has now been integrated. 
Changeset: 57fdc186 Author: Kelvin Nilsen URL: https://git.openjdk.org/shenandoah/commit/57fdc1865d89b0f82d8601a813b9e9eb2d8abec2 Stats: 3502 lines in 37 files changed: 2023 ins; 969 del; 510 mod Expand old on demand Reviewed-by: ysr ------------- PR: https://git.openjdk.org/shenandoah/pull/248 From gli at openjdk.org Sat May 27 11:02:55 2023 From: gli at openjdk.org (Guoxiong Li) Date: Sat, 27 May 2023 11:02:55 GMT Subject: RFR: 8194823: Serial GC does not account GCs caused by TLAB allocation in GC overhead limit [v2] In-Reply-To: References: Message-ID: <6siXqALhd9c5Ej9TqoK4h4NmApvvc5T2RmfBPrnjhXA=.6b7758f2-bbdf-4123-a83b-c0e54aa89549@github.com> On Fri, 26 May 2023 09:38:25 GMT, Albert Mingkun Yang wrote: > I can't see any dependency btw them after skimming through their specification. Worth investigation in its own ticket/PR. > > Some background which might be useful, "UseGCOverheadLimit can work independently of UseAdaptiveSizePolicy" from https://bugs.openjdk.org/browse/JDK-8212206 Filed https://bugs.openjdk.org/browse/JDK-8308983 to follow up. > I am leaning towards removing it, as it is effectively dead code. I also think it is good to remove it. If I hear no objection in the next several days, I will submit another issue&PR to remove it and close this one. Or should I remove it in this PR? This needs to be confirmed here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14120#issuecomment-1565350288 From ayang at openjdk.org Mon May 29 08:13:57 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 29 May 2023 08:13:57 GMT Subject: RFR: 8194823: Serial GC does not account GCs caused by TLAB allocation in GC overhead limit [v2] In-Reply-To: References: Message-ID: <7PQoFkIc--n8CWcJwLSksiSaIs1wk-MXDS39QsuLLQU=.3e8d1d80-12a6-4817-af60-0af3df248988@github.com> On Thu, 25 May 2023 11:29:27 GMT, Guoxiong Li wrote: >> Hi all, >> >> This patch enables the gc overhead limit when allocating TLAB in serial gc.
>> The main modification is at `GenCollectedHeap::allocate_new_tlab` and the other >> files only adjust the parameters of the method `allocate_new_tlab`. >> >> Thanks for the review. >> >> Best Regards, >> -- Guoxiong > > Guoxiong Li has updated the pull request incrementally with one additional commit since the last revision: > > Fix parameter list formatting issues. I think it's best to do the removing in another ticket/PR and link the two tickets. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14120#issuecomment-1566736064 From jsjolen at openjdk.org Mon May 29 11:02:27 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 29 May 2023 11:02:27 GMT Subject: RFR: 8309044: Replace NULL with nullptr, final sweep of hotspot code Message-ID: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com> A final sweep of Hotspot to remove all re-added NULLs. With only 110 changes I'd appreciate if this was considered trivial. 
------------- Commit messages: - Fix remaining work Changes: https://git.openjdk.org/jdk/pull/14198/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14198&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309044 Stats: 110 lines in 63 files changed: 0 ins; 0 del; 110 mod Patch: https://git.openjdk.org/jdk/pull/14198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14198/head:pull/14198 PR: https://git.openjdk.org/jdk/pull/14198 From jsjolen at openjdk.org Mon May 29 11:02:40 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 29 May 2023 11:02:40 GMT Subject: RFR: 8309044: Replace NULL with nullptr, final sweep of hotspot code In-Reply-To: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com> References: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com> Message-ID: On Mon, 29 May 2023 10:09:15 GMT, Johan Sjölen wrote: > A final sweep of Hotspot to remove all re-added NULLs. With only 110 changes I'd appreciate if this was considered trivial. All of the stuff to actually keep. src/hotspot/cpu/aarch64/jvmciCodeInstaller_aarch64.cpp line 125: > 123: > 124: void CodeInstaller::pd_relocate_JavaMethod(CodeBuffer &cbuf, methodHandle& method, jint pc_offset, JVMCI_TRAPS) { > 125: NativeCall* call = nullptr; T src/hotspot/cpu/aarch64/jvmciCodeInstaller_aarch64.cpp line 158: > 156: // Check for proper post_call_nop > 157: NativePostCallNop* nop = nativePostCallNop_at(call->next_instruction_address()); > 158: if (nop == nullptr) { T src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 2297: > 2295: __ ld_ptr(method, array_base_offset + in_bytes(ResolvedIndyEntry::method_offset()), cache); > 2296: > 2297: // The invokedynamic is unresolved iff method is null T src/hotspot/cpu/riscv/gc/shared/barrierSetAssembler_riscv.cpp line 374: > 372: // Make sure klass is 'reasonable', which is not zero.
> 373: __ load_klass(obj, obj, tmp1); // get klass > 374: __ beqz(obj, error); // if klass is null it is broken T src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4019: > 4017: StubRoutines::_forward_exception_entry = generate_forward_exception(); > 4018: > 4019: if (UnsafeCopyMemory::_table == nullptr) { T src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4077: > 4075: > 4076: BarrierSetNMethod* bs_nm = BarrierSet::barrier_set()->barrier_set_nmethod(); > 4077: if (bs_nm != nullptr) { T src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.cpp line 489: > 487: __ load_klass(obj, obj, tmp1); // get klass > 488: __ testptr(obj, obj); > 489: __ jcc(Assembler::zero, error); // if klass is null it is broken T src/hotspot/cpu/x86/interp_masm_x86.cpp line 303: > 301: jcc(Assembler::equal, L); > 302: stop("InterpreterMacroAssembler::call_VM_base:" > 303: " last_sp != null"); T src/hotspot/cpu/x86/jvmciCodeInstaller_x86.cpp line 191: > 189: // Check for proper post_call_nop > 190: NativePostCallNop* nop = nativePostCallNop_at(call->next_instruction_address()); > 191: if (nop == nullptr) { T src/hotspot/share/adlc/output_c.cpp line 279: > 277: int max_stage = 0; > 278: i = 0; > 279: for (pipeline->_reslist.reset(); (resource = pipeline->_reslist.iter()) != nullptr;) { T src/hotspot/share/adlc/output_c.cpp line 305: > 303: templen = 0; > 304: i = 0; > 305: for (pipeline->_reslist.reset(); (resource = pipeline->_reslist.iter()) != nullptr;) { T src/hotspot/share/adlc/output_c.cpp line 368: > 366: const char* resource; > 367: i = 0; > 368: for (pipeline->_reslist.reset(); (resource = pipeline->_reslist.iter()) != nullptr;) { T src/hotspot/share/adlc/output_c.cpp line 393: > 391: > 392: i = 0; > 393: for (pipeline->_reslist.reset(); (resource = pipeline->_reslist.iter()) != nullptr;) { T src/hotspot/share/adlc/output_c.cpp line 1009: > 1007: const char* resource; > 1008: i = 0; > 1009: for (_pipeline->_reslist.reset(); (resource = _pipeline->_reslist.iter()) != nullptr;) { T 
src/hotspot/share/cds/filemap.cpp line 363: > 361: > 362: void SharedClassPathEntry::copy_from(SharedClassPathEntry* ent, ClassLoaderData* loader_data, TRAPS) { > 363: assert(ent != nullptr, "sanity"); T src/hotspot/share/classfile/stringTable.hpp line 150: > 148: static size_t shared_entry_count() NOT_CDS_JAVA_HEAP_RETURN_(0); > 149: static void allocate_shared_strings_array(TRAPS) NOT_CDS_JAVA_HEAP_RETURN; > 150: static oop init_shared_table(const DumpedInternedStrings* dumped_interned_strings) NOT_CDS_JAVA_HEAP_RETURN_(nullptr); T src/hotspot/share/code/compiledIC.hpp line 93: > 91: CompiledICHolder* claim_cached_icholder() { > 92: assert(_is_icholder, ""); > 93: assert(_cached_value != nullptr, "must be non-null"); T src/hotspot/share/code/compiledIC.hpp line 342: > 340: // Code > 341: > 342: // Returns null if CodeBuffer::expand fails T src/hotspot/share/compiler/compileBroker.cpp line 388: > 386: > 387: MonitorLocker locker(MethodCompileQueue_lock); > 388: // If _first is null we have no more compile jobs. 
There are two reasons for T src/hotspot/share/gc/x/xBarrier.cpp line 242: > 240: oop XBarrier::weak_load_barrier_on_oop_field_preloaded(volatile narrowOop* p, oop o) { > 241: ShouldNotReachHere(); > 242: return nullptr; T src/hotspot/share/gc/x/xBarrierSet.inline.hpp line 187: > 185: // No check cast, bulk barrier and bulk copy > 186: XBarrier::load_barrier_on_oop_array(src, length); > 187: return Raw::oop_arraycopy_in_heap(nullptr, 0, src, nullptr, 0, dst, length); T src/hotspot/share/gc/x/xPageTable.inline.hpp line 43: > 41: inline bool XPageTableIterator::next(XPage** page) { > 42: for (XPage* entry; _iter.next(&entry);) { > 43: if (entry != nullptr && entry != _prev) { T src/hotspot/share/gc/z/zHeap.cpp line 383: > 381: > 382: if (addr == zaddress::null) { > 383: st->print_raw_cr("null"); T src/hotspot/share/gc/z/zHeap.cpp line 438: > 436: > 437: if (addr == zaddress::null) { > 438: st->print_raw_cr("null"); T src/hotspot/share/interpreter/linkResolver.cpp line 816: > 814: st->print("%s%s, compile-time-class:%s, method:%s, method_holder:%s, access_flags: ", > 815: prefix, > 816: (klass == nullptr ? 
"" : klass->internal_name()), T src/hotspot/share/jfr/dcmd/jfrDcmds.hpp line 162: > 160: } > 161: static const JavaPermission permission() { > 162: JavaPermission p = {"java.lang.management.ManagementPermission", "monitor", nullptr}; T src/hotspot/share/jfr/dcmd/jfrDcmds.hpp line 187: > 185: } > 186: static const JavaPermission permission() { > 187: JavaPermission p = {"java.lang.management.ManagementPermission", "monitor", nullptr}; T src/hotspot/share/jfr/recorder/checkpoint/types/jfrThreadState.cpp line 125: > 123: } > 124: } > 125: assert(name_str != nullptr, "unexpected null thread name"); T src/hotspot/share/jfr/recorder/repository/jfrRepository.cpp line 147: > 145: const char* const canonical_chunk_path = JfrJavaSupport::c_str(path, jt); > 146: if (nullptr == canonical_chunk_path && !_chunkwriter->is_valid()) { > 147: // new output is null and current output is null T src/hotspot/share/jfr/recorder/stringpool/jfrStringPool.cpp line 119: > 117: } > 118: release(old, thread); > 119: return new_buffer; // might be null T src/hotspot/share/jvmci/jvmciEnv.cpp line 366: > 364: JNIAccessMark jni(this, THREAD); > 365: jthrowable ex = jni()->ExceptionOccurred(); > 366: if (ex != nullptr) { T src/hotspot/share/logging/logAsyncWriter.cpp line 195: > 193: if (self->_initialized) { > 194: Atomic::release_store_fence(&AsyncLogWriter::_instance, self); > 195: // All readers of _instance after the fence see non-null. T src/hotspot/share/logging/logConfiguration.cpp line 481: > 479: const char* output_options, > 480: outputStream* errstream) { > 481: assert(errstream != nullptr, "errstream can not be null"); T src/hotspot/share/logging/logConfiguration.hpp line 64: > 62: static bool _async_mode; > 63: > 64: // Create a new output. Returns null if failed. T src/hotspot/share/logging/logMessageBuffer.hpp line 112: > 110: // using set_prefix(). Lines added to the LogMessageBuffer after a prefix > 111: // function has been set will be prefixed automatically. 
> 112: // Setting this to null will disable prefixing. T src/hotspot/share/logging/logStream.hpp line 103: > 101: : LogStreamImpl(LogTargetHandle(level, LogTagSetMapping::tagset())) {} > 102: > 103: // Constructor to support creation from typed (likely null) pointer. Mostly used by the logging framework. T src/hotspot/share/memory/metaspace.cpp line 878: > 876: "allocation size too large (" SIZE_FORMAT ")", word_size); > 877: > 878: assert(loader_data != nullptr, "Should never pass around a null loader_data. " T src/hotspot/share/memory/metaspace/metachunk.cpp line 280: > 278: > 279: // Test base pointer > 280: assert(base() != nullptr, "Base pointer null"); T src/hotspot/share/memory/metaspace/metaspaceArena.cpp line 346: > 344: > 345: if (p == nullptr) { > 346: UL(info, "allocation failed, returned null."); T src/hotspot/share/memory/universe.cpp line 469: > 467: if (!is_reference_type((BasicType)i)) { > 468: oop m = _basic_type_mirrors[i].resolve(); > 469: assert(m != nullptr, "archived mirrors should not be null"); T src/hotspot/share/memory/virtualspace.cpp line 598: > 596: // Last, desperate try without any placement. 
> 597: if (_base == nullptr) { > 598: log_trace(gc, heap, coops)("Trying to allocate at address null heap of size " SIZE_FORMAT_X, size + noaccess_prefix); T src/hotspot/share/oops/cpCache.cpp line 888: > 886: const bool has_appendix = appendix.not_null(); > 887: > 888: LogStream* log_stream = nullptr; T src/hotspot/share/oops/cpCache.cpp line 906: > 904: objArrayOop resolved_references = constant_pool()->resolved_references(); > 905: assert(appendix_index >= 0 && appendix_index < resolved_references->length(), "oob"); > 906: assert(resolved_references->obj_at(appendix_index) == nullptr, "init just once"); T src/hotspot/share/oops/cpCache.cpp line 914: > 912: resolved_indy_entry_at(index)->fill_in(adapter, adapter->size_of_parameters(), as_TosState(adapter->result_type()), has_appendix); > 913: > 914: if (log_stream != nullptr) { T src/hotspot/share/prims/jvmtiAgent.cpp line 375: > 373: HandleMark hm(thread); > 374: extern struct JavaVM_ main_vm; > 375: const jint err = (*on_load_entry)(&main_vm, const_cast(agent->options()), nullptr); T src/hotspot/share/prims/jvmtiThreadState.cpp line 248: > 246: > 247: // disable VTMS transitions for one virtual thread > 248: // no-op if thread is non-null and not a virtual thread T src/hotspot/share/prims/whitebox.cpp line 1885: > 1883: InstanceKlass* ik = InstanceKlass::cast(java_lang_Class::as_Klass(JNIHandles::resolve(klass))); > 1884: ConstantPool* cp = ik->constants(); > 1885: if (cp->cache() == nullptr) { T src/hotspot/share/prims/whitebox.cpp line 1894: > 1892: InstanceKlass* ik = InstanceKlass::cast(java_lang_Class::as_Klass(JNIHandles::resolve(klass))); > 1893: ConstantPool* cp = ik->constants(); > 1894: if (cp->cache() == nullptr) { T src/hotspot/share/runtime/fieldDescriptor.cpp line 164: > 162: obj->obj_field(offset())->print_value_on(st); > 163: } else { > 164: st->print("null"); T src/hotspot/share/runtime/fieldDescriptor.cpp line 171: > 169: obj->obj_field(offset())->print_value_on(st); > 170: } else { > 171: 
st->print("null"); T src/hotspot/share/runtime/globals.hpp line 632: > 630: \ > 631: develop(bool, InterceptOSException, false, \ > 632: "Start debugger when an implicit OS (e.g. null) " \ T src/hotspot/share/runtime/handles.hpp line 71: > 69: protected: > 70: oop obj() const { return _handle == nullptr ? (oop)nullptr : *_handle; } > 71: oop non_null_obj() const { assert(_handle != nullptr, "resolving null handle"); return *_handle; } T src/hotspot/share/runtime/handles.hpp line 147: > 145: protected: \ > 146: type* obj() const { return _value; } \ > 147: type* non_null_obj() const { assert(_value != nullptr, "resolving null _value"); return _value; } \ T src/hotspot/share/runtime/jniHandles.cpp line 381: > 379: JNIHandleBlock* next = block->_next; > 380: Atomic::dec(&_blocks_allocated); > 381: assert(block->pop_frame_link() == nullptr, "pop_frame_link should be null"); T src/hotspot/share/runtime/jniHandles.inline.hpp line 116: > 114: assert(handle != nullptr, "JNI handle should not be null"); > 115: oop result = resolve_impl(handle); > 116: assert(result != nullptr, "null read from jni handle"); T src/hotspot/share/runtime/objectMonitor.cpp line 539: > 537: // Attempt async deflation protocol. > 538: > 539: // Set a null owner to DEFLATER_MARKER to force any contending thread T src/hotspot/share/runtime/objectMonitor.cpp line 564: > 562: if (Atomic::cmpxchg(&_contentions, 0, INT_MIN) != 0) { > 563: // Contentions was no longer 0 so we lost the race since the > 564: // ObjectMonitor is now busy. Restore owner to null if it is T src/hotspot/share/runtime/objectMonitor.cpp line 669: > 667: ss->print("owner=" INTPTR_FORMAT, p2i(owner_raw())); > 668: } else { > 669: // We report null instead of DEFLATER_MARKER here because is_busy() T src/hotspot/share/runtime/os.cpp line 1131: > 1129: // Handle null first, so later checks don't need to protect against it. 
> 1130: if (addr == nullptr) { > 1131: st->print_cr("0x0 is null"); T src/hotspot/share/runtime/safepoint.cpp line 235: > 233: for (; JavaThread *cur = jtiwh.next(); ) { > 234: ThreadSafepointState *cur_tss = cur->safepoint_state(); > 235: assert(cur_tss->get_next() == nullptr, "Must be null"); T src/hotspot/share/runtime/thread.cpp line 269: > 267: delete metadata_handles(); > 268: > 269: // osthread() can be null, if creation of thread failed. T src/hotspot/share/runtime/thread.hpp line 343: > 341: virtual const char* type_name() const { return "Thread"; } > 342: > 343: // Returns the current thread (ASSERTS if null) T src/hotspot/share/runtime/threadSMR.cpp line 623: > 621: // Shared singleton data for all ThreadsList(0) instances. > 622: // Used by _bootstrap_list to avoid static init time heap allocation. > 623: // No real entries, just the final null terminator. T src/hotspot/share/runtime/threadSMR.cpp line 849: > 847: _protected_java_thread = java_lang_Thread::thread(thread_oop); > 848: assert(_protected_java_thread == nullptr || _tlh.includes(_protected_java_thread), "must be"); > 849: // If we captured a non-null JavaThread* after the _tlh was created T src/hotspot/share/runtime/vmOperations.cpp line 335: > 333: jt->is_exiting() || > 334: jt->is_hidden_from_external_view()) { > 335: // add a null snapshot if skipped T src/hotspot/share/services/threadService.cpp line 1071: > 1069: oop ownerObj = java_util_concurrent_locks_AbstractOwnableSynchronizer::get_owner_threadObj(waitingToLockBlocker); > 1070: currentThread = java_lang_Thread::thread(ownerObj); > 1071: assert(currentThread != nullptr, "AbstractOwnableSynchronizer owning thread is unexpectedly null"); T src/hotspot/share/utilities/concurrentHashTable.inline.hpp line 1295: > 1293: return false; > 1294: } > 1295: assert(_new_table == nullptr || _new_table == POISON_PTR, "Must be null"); T src/hotspot/share/utilities/copy.cpp line 73: > 71: assert(src != nullptr, "address must not be null"); > 72: 
assert(dst != nullptr, "address must not be null"); > 73: assert(elem_size == 2 || elem_size == 4 || elem_size == 8, T src/hotspot/share/utilities/elfFile.cpp line 1827: > 1825: } > 1826: > 1827: // If result is a null, we do not care about the content of the string being read. T src/hotspot/share/utilities/elfFuncDescTable.hpp line 135: > 133: ~ElfFuncDescTable(); > 134: > 135: // return the function address for the function descriptor at 'index' or null on error T src/hotspot/share/utilities/exceptions.cpp line 138: > 136: // therefore the exception oop should be in the oopmap. > 137: void Exceptions::_throw_oop(JavaThread* thread, const char* file, int line, oop exception) { > 138: assert(exception != nullptr, "exception should not be null"); T src/hotspot/share/utilities/exceptions.cpp line 145: > 143: void Exceptions::_throw(JavaThread* thread, const char* file, int line, Handle h_exception, const char* message) { > 144: ResourceMark rm(thread); > 145: assert(h_exception() != nullptr, "exception should not be null"); T src/hotspot/share/utilities/globalDefinitions.cpp line 162: > 160: static_assert((size_t)HeapWordSize >= sizeof(juint), > 161: "HeapWord should be at least as large as juint"); > 162: static_assert(sizeof(nullptr) == sizeof(char*), "null must be same size as pointer"); T src/hotspot/share/utilities/globalDefinitions_gcc.hpp line 135: > 133: > 134: // gcc warns about applying offsetof() to non-POD object or calculating > 135: // offset directly when base address is null. 
The -Wno-invalid-offsetof T src/hotspot/share/utilities/linkedlist.hpp line 174: > 172: > 173: virtual void add(LinkedListNode* node) { > 174: assert(node != nullptr, "null pointer"); T src/hotspot/share/utilities/linkedlist.hpp line 388: > 386: > 387: virtual void add(LinkedListNode* node) { > 388: assert(node != nullptr, "null pointer"); T src/hotspot/share/utilities/lockFreeStack.hpp line 79: > 77: > 78: // Atomically removes the top object from this stack and returns a > 79: // pointer to that object, or null if this stack is empty. Acts as a T src/hotspot/share/utilities/lockFreeStack.hpp line 100: > 98: } > 99: > 100: // Atomically exchange the list of elements with null, returning the old T src/hotspot/share/utilities/lockFreeStack.hpp line 146: > 144: bool empty() const { return top() == nullptr; } > 145: > 146: // Return the most recently pushed element, or null if the stack is empty. T src/hotspot/share/utilities/nonblockingQueue.hpp line 48: > 46: // A queue may temporarily appear to be empty even though elements have been > 47: // added and not removed. For example, after running the following program, > 48: // the value of r may be null. T src/hotspot/share/utilities/nonblockingQueue.hpp line 108: > 106: // Thread-safe attempt to remove and return the first object in the queue. > 107: // Returns true if successful. If successful then *node_ptr is the former > 108: // first object, or null if the queue was empty. If unsuccessful, because T src/hotspot/share/utilities/nonblockingQueue.hpp line 114: > 112: inline bool try_pop(T** node_ptr); > 113: > 114: // Thread-safe remove and return the first object in the queue, or null T src/hotspot/share/utilities/nonblockingQueue.hpp line 116: > 114: // Thread-safe remove and return the first object in the queue, or null > 115: // if the queue was empty. This just iterates on try_pop() until it > 116: // succeeds, returning the (possibly null) element obtained from that. 
T src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 88: > 86: // An append operation atomically exchanges the new tail with the queue tail. > 87: // It then sets the "next" value of the old tail to the head of the list being > 88: // appended. If the old tail is null then the queue was empty, then the T src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 111: > 109: if (old_tail == nullptr) { > 110: // If old_tail is null then the queue was empty, and _head must also be > 111: // null. The correctness of this assertion depends on try_pop clearing T src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 129: > 127: } else { > 128: // A concurrent try_pop has claimed old_tail, so it is no longer in the > 129: // list. The queue was logically empty. _head is either null or T src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 155: > 153: // There are several cases for next_node. > 154: // (1) next_node is the extension of the queue's list. > 155: // (2) next_node is null, because a competing try_pop took old_head. T src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 178: > 176: // report it as such for consistency, though we could report the queue > 177: // was empty. We don't attempt to further help [Clause 2] by also > 178: // trying to set _tail to nullptr, as that would just ensure that one or T src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 194: > 192: // [Clause 2] > 193: // Old_head was the last entry and we've claimed it by setting its next > 194: // value to null. However, this leaves the queue in disarray. Fix up T src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 203: > 201: // also consistent with [Clause 1b]. > 202: > 203: // Attempt to change the queue head from old_head to null. Failure of T src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 208: > 206: Atomic::cmpxchg(&_head, old_head, (T*)nullptr); > 207: > 208: // Attempt to change the queue tail from old_head to null. 
Failure of T src/hotspot/share/utilities/ostream.cpp line 709: > 707: start_log(); > 708: } else { > 709: // and leave xtty as null T src/hotspot/share/utilities/ostream.cpp line 771: > 769: text->print_raw(p->key()); > 770: text->put('='); > 771: assert(p->value() != nullptr, "p->value() is null"); T src/hotspot/share/utilities/unsigned5.hpp line 327: > 325: Writer(const ARR& array) > 326: : _array(const_cast(array)), _limit_ptr(nullptr), _position(0) { > 327: // Note: if _limit_ptr is null, the ARR& is never reassigned, T src/hotspot/share/utilities/utf8.hpp line 76: > 74: // Utility methods > 75: > 76: // Returns null if 'c' it not found. This only works as long T src/hotspot/share/utilities/vmError.hpp line 45: > 43: static char _detail_msg[1024]; > 44: > 45: static Thread* _thread; // null if it's native thread T ------------- PR Review: https://git.openjdk.org/jdk/pull/14198#pullrequestreview-1449252066 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209159785 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209159872 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209160289 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209160378 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209160527 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209160578 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209161770 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209160686 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209160791 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209161299 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209161367 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209161415 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209161470 PR Review Comment: 
https://git.openjdk.org/jdk/pull/14198#discussion_r1209162033 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209162094 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209162234 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209162319 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209162344 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209162424 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209162602 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209162648 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209162687 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209162835 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209162873 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209163006 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209163074 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209163111 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209163185 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209163220 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209163248 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209163287 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209163352 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209163396 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209163558 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209163605 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209163642 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209163724 PR Review Comment: 
https://git.openjdk.org/jdk/pull/14198#discussion_r1209163763 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209163811 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209163873 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209163927 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209164027 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209164087 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209164125 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209164508 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209164848 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209164910 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209164940 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209165038 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209165073 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209165280 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209165354 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209165423 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209165533 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209165626 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209165671 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209165736 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209165819 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209165977 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209166087 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209166196 PR Review Comment: 
https://git.openjdk.org/jdk/pull/14198#discussion_r1209166250 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209166340 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209166433 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209166563 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209166748 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209166817 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209166925 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209166991 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209167061 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209167105 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209167139 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209167208 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209167536 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209167644 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209167683 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209167736 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209167787 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209167918 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209167968 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209168014 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209168057 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209168094 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209168142 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209168185 PR Review Comment: 
https://git.openjdk.org/jdk/pull/14198#discussion_r1209168230 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209168252 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209168310 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209168372 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209168414 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209168454 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209168522 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209168564 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209168609 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209168655 PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209168750 From jsjolen at openjdk.org Mon May 29 11:02:42 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 29 May 2023 11:02:42 GMT Subject: RFR: 8309044: Replace NULL with nullptr, final sweep of hotspot code In-Reply-To: References: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com> Message-ID: On Mon, 29 May 2023 10:19:02 GMT, Johan Sj?len wrote: >> A final sweep of Hotspot to remove all re-added NULLs. With only 110 changes I'd appreciate if this was considered trivial. > > src/hotspot/share/utilities/globalDefinitions.cpp line 162: > >> 160: static_assert((size_t)HeapWordSize >= sizeof(juint), >> 161: "HeapWord should be at least as large as juint"); >> 162: static_assert(sizeof(nullptr) == sizeof(char*), "null must be same size as pointer"); > > T Add both? 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209167397

From amitkumar at openjdk.org Mon May 29 12:27:03 2023
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 29 May 2023 12:27:03 GMT
Subject: RFR: 8309044: Replace NULL with nullptr, final sweep of hotspot code
In-Reply-To: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com>
References: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com>
Message-ID:

On Mon, 29 May 2023 10:09:15 GMT, Johan Sjölen wrote:

> A final sweep of Hotspot to remove all re-added NULLs. With only 110 changes I'd appreciate if this was considered trivial.

not a review, but would you like to check if these could be replaced as well :-)

./cpu/ppc/macroAssembler_ppc.hpp:735: void load_klass_check_null(Register dst, Register src, Label* is_null = NULL);
./cpu/ppc/stubGenerator_ppc.cpp:4700: if (UnsafeCopyMemory::_table == NULL) {
./cpu/riscv/codeBuffer_riscv.cpp:74: if (cb->stubs()->maybe_expand_to_ensure_remaining(total_requested_size) && cb->blob() == NULL) {
./share/include/jvm.h:423: * Find a class from a boot class loader. Returns NULL if class not found.
./share/include/cds.h:72: char* _mapped_base; // Actually mapped address (NULL if this region is not mapped).
./share/opto/runtime.cpp:491: fields[TypeFunc::Parms+0] = NULL; // void

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14198#issuecomment-1567074248

From stefank at openjdk.org Mon May 29 15:45:04 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Mon, 29 May 2023 15:45:04 GMT
Subject: RFR: 8309044: Replace NULL with nullptr, final sweep of hotspot code
In-Reply-To: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com>
References: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com>
Message-ID: <4I1TbflhiAql_f2keh9Pbu7AA0wHem16ntfy8hP-cx4=.ad35b94a-6430-4a7d-87d4-ee019a3962b5@github.com>

On Mon, 29 May 2023 10:09:15 GMT, Johan Sjölen wrote:

> A final sweep of Hotspot to remove all re-added NULLs. With only 110 changes I'd appreciate if this was considered trivial.

Looks good. Though, I'd prefer if we could slightly tweak the following two print lines.

src/hotspot/share/gc/z/zHeap.cpp line 383:

> 381:
> 382: if (addr == zaddress::null) {
> 383: st->print_raw_cr("NULL");

I'd prefer if this were left as either NULL or null.

src/hotspot/share/gc/z/zHeap.cpp line 438:

> 436:
> 437: if (addr == zaddress::null) {
> 438: st->print_raw_cr("nullptr");

I'd prefer if this were left as either NULL or null.

-------------

Marked as reviewed by stefank (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14198#pullrequestreview-1449662288
PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209421642
PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209421699

From dholmes at openjdk.org Tue May 30 01:01:13 2023
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 May 2023 01:01:13 GMT
Subject: RFR: 8309044: Replace NULL with nullptr, final sweep of hotspot code
In-Reply-To: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com>
References: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com>
Message-ID:

On Mon, 29 May 2023 10:09:15 GMT, Johan Sjölen wrote:

> A final sweep of Hotspot to remove all re-added NULLs. With only 110 changes I'd appreciate if this was considered trivial.

Looks good. A few suggested changes below.

Can we now poison NULL so it can't get reintroduced? Or would that potentially break standard headers?

Thanks

src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 2297:

> 2295: __ ld_ptr(method, array_base_offset + in_bytes(ResolvedIndyEntry::method_offset()), cache);
> 2296:
> 2297: // The invokedynamic is unresolved iff method is nullptr

Suggest: null

src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.cpp line 489:

> 487: __ load_klass(obj, obj, tmp1); // get klass
> 488: __ testptr(obj, obj);
> 489: __ jcc(Assembler::zero, error); // if klass is nullptr it is broken

suggest: null

src/hotspot/share/gc/z/zHeap.cpp line 383:

> 381:
> 382: if (addr == zaddress::null) {
> 383: st->print_raw_cr("nullptr");

Suggest: null

-------------

Marked as reviewed by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14198#pullrequestreview-1449932378
PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209610406
PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209610541
PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209610909

From dholmes at openjdk.org Tue May 30 01:01:19 2023
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 May 2023 01:01:19 GMT
Subject: RFR: 8309044: Replace NULL with nullptr, final sweep of hotspot code
In-Reply-To:
References: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com>
Message-ID:

On Mon, 29 May 2023 10:17:06 GMT, Johan Sjölen wrote:

>> A final sweep of Hotspot to remove all re-added NULLs. With only 110 changes I'd appreciate if this was considered trivial.
>
> src/hotspot/share/runtime/globals.hpp line 632:
>
>> 630: \
>> 631: develop(bool, InterceptOSException, false, \
>> 632: "Start debugger when an implicit OS (e.g. null) " \
>
> T

Suggest: null pointer

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209611686

From dholmes at openjdk.org Tue May 30 01:01:16 2023
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 30 May 2023 01:01:16 GMT
Subject: RFR: 8309044: Replace NULL with nullptr, final sweep of hotspot code
In-Reply-To: <4I1TbflhiAql_f2keh9Pbu7AA0wHem16ntfy8hP-cx4=.ad35b94a-6430-4a7d-87d4-ee019a3962b5@github.com>
References: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com> <4I1TbflhiAql_f2keh9Pbu7AA0wHem16ntfy8hP-cx4=.ad35b94a-6430-4a7d-87d4-ee019a3962b5@github.com>
Message-ID: <-Fymi8Igyn7pUXLI9SqWDIabCue12XT0Avjgb071gzU=.2bbe930b-0788-47b6-860d-77ae1f9a12e9@github.com>

On Mon, 29 May 2023 15:36:36 GMT, Stefan Karlsson wrote:

>> A final sweep of Hotspot to remove all re-added NULLs. With only 110 changes I'd appreciate if this was considered trivial.
>
> src/hotspot/share/gc/z/zHeap.cpp line 438:
>
>> 436:
>> 437: if (addr == zaddress::null) {
>> 438: st->print_raw_cr("nullptr");
>
> I'd prefer if this were left as either NULL or null.

Suggest: null

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14198#discussion_r1209611018

From coleenp at openjdk.org Tue May 30 12:16:25 2023
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 30 May 2023 12:16:25 GMT
Subject: RFR: 8305896: Alternative full GC forwarding [v47]
In-Reply-To:
References:
Message-ID:

On Mon, 22 May 2023 14:37:18 GMT, Roman Kennke wrote:

>> Currently, the full-GC modes of Serial, Shenandoah and G1 GCs are forwarding objects by over-writing the object header with the new object location. Unfortunately, for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) this would not work, because the crucial class information is also stored in the header, and we could no longer iterate over objects until the headers would be restored. Also, the preserved-headers tables would grow quite large.
>>
>> I propose to use an alternative algorithm for full-GC (sliding-GC) forwarding that uses a special encoding so that the forwarding information fits into the lowest 32 bits of the header.
>>
>> It exploits the insight that, with sliding GCs, objects from one region will only ever be forwarded to one of two possible target regions. For this to work, we need to divide the heap into equal-sized regions. This is already the case for Shenandoah and G1, and can easily be overlaid for Serial GC, by assuming either the whole heap as a single region (if it fits) or by using SpaceAlignment-sized virtual regions.
>>
>> We also build and maintain a table that has N elements, where N is the number of regions. Each entry is two addresses, which are the start-address of the possible target regions for each source region.
>>
>> With this, forwarding information would be encoded like this:
>> - Bits 0 and 1: same as before, we put in '11' to indicate that the object is forwarded.
>> - Bit 2: Used for 'fallback'-forwarding (see below)
>> - Bit 3: Selects the target region 0 or 1. Look up the base address in the table (see above)
>> - Bits 4..31 The number of heap words from the target base address
>>
>> This works well for all sliding GCs in Serial, G1 and Shenandoah. The exception is in G1, there is a special mode called 'serial compaction' which acts as a last-last-ditch effort to squeeze more space out of the heap by re-forwarding the tails of the compaction chains. Unfortunately, this breaks the assumption of the sliding-forwarding-table. When that happens, we initialize a fallback table, which is a simple open hash-table, and set the Bit 2 in the forwarding to indicate that we shall look up the forwardee in the fallback-table.
>>
>> All the table accesses can be done unsynchronized because:
>> - Serial GC is single-threaded anyway
>> - In G1 and Shenandoah, GC worker threads divide up the work such that each worker does disjoint sets of regions.
>> - G1 serial compaction is single-threaded
>>
>> The c...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Specialize full-GC loops to get UseAltGCForwarding flag check out of hot paths

I ran this patch as of and the #13779 with the flag and it passed tier5-7 linux-x64-debug.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13582#issuecomment-1568327358

From kdnilsen at openjdk.org Tue May 30 17:54:18 2023
From: kdnilsen at openjdk.org (Kelvin Nilsen)
Date: Tue, 30 May 2023 17:54:18 GMT
Subject: RFR: JDK-8307314: Implementation: Generational Shenandoah (Experimental)
Message-ID:

OpenJDK Colleagues:

Please review this proposed integration of Generational mode for Shenandoah GC under https://bugs.openjdk.org/browse/JDK-8307314.
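[Archive annotation on the 8305896 thread quoted above] The sliding-forwarding encoding in Roman Kennke's description (bits 0-1 as the forwarded marker, bit 2 for fallback, bit 3 selecting one of two target regions, bits 4..31 as a heap-word offset) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the pull request: every name in it (`SlidingForwardingSketch`, `TargetPair`, the `k...` constants) is hypothetical, `HeapWord` is a plain integer stand-in, the fallback hash table is left out, and the per-region table is filled lazily on first use, which the quoted text does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: all names here are hypothetical, not HotSpot identifiers.
using HeapWord = uintptr_t;  // stand-in for one heap word

constexpr uint32_t kForwardedMark   = 0x3;      // bits 0-1: '11' means "forwarded"
constexpr uint32_t kFallbackBit     = 1u << 2;  // bit 2: forwardee is in a fallback table
constexpr uint32_t kRegionSelectBit = 1u << 3;  // bit 3: target region 0 or 1
constexpr unsigned kOffsetShift     = 4;        // bits 4..31: offset in heap words
constexpr size_t   kMaxRegionWords  = size_t(1) << 28;

class SlidingForwardingSketch {
  HeapWord* const heap_base_;
  const size_t region_words_;

  // Per source region: start addresses of its (at most) two target regions.
  struct TargetPair { HeapWord* base[2] = {nullptr, nullptr}; };
  std::vector<TargetPair> table_;

  size_t region_index(const HeapWord* p) const {
    return static_cast<size_t>(p - heap_base_) / region_words_;
  }

public:
  SlidingForwardingSketch(HeapWord* heap_base, size_t region_words, size_t num_regions)
    : heap_base_(heap_base), region_words_(region_words), table_(num_regions) {
    // The offset must fit into the 28 bits above the four flag bits.
    assert(region_words > 0 && region_words <= kMaxRegionWords);
  }

  // Returns the value to store in the low 32 bits of the header of 'from'.
  uint32_t encode(const HeapWord* from, HeapWord* to) {
    TargetPair& pair = table_[region_index(from)];
    HeapWord* to_base = heap_base_ + region_index(to) * region_words_;
    for (unsigned sel = 0; sel < 2; sel++) {
      if (pair.base[sel] == nullptr) {
        pair.base[sel] = to_base;  // claim a free slot for this target region
      }
      if (pair.base[sel] == to_base) {
        uint32_t offset = static_cast<uint32_t>(to - to_base);
        return (offset << kOffsetShift) | (sel ? kRegionSelectBit : 0u) | kForwardedMark;
      }
    }
    // A third target region: the caller would record (from, to) in a
    // fallback hash table, which this sketch does not implement.
    return kFallbackBit | kForwardedMark;
  }

  // Recovers the forwardee from a non-fallback encoding.
  HeapWord* decode(const HeapWord* from, uint32_t encoded) const {
    assert((encoded & kForwardedMark) == kForwardedMark);
    assert((encoded & kFallbackBit) == 0);
    const TargetPair& pair = table_[region_index(from)];
    unsigned sel = (encoded & kRegionSelectBit) ? 1 : 0;
    return pair.base[sel] + (encoded >> kOffsetShift);
  }
};
```

With a 64-word toy heap split into four 16-word regions, forwarding an object at word 40 to word 5 would give offset 5 in target slot 0, and a later forwarding from the same source region to word 17 would use slot 1 with offset 1. A third distinct target region sets only the fallback bit, mirroring the G1 serial-compaction case in the quoted text.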
Generational mode of Shenandoah is enabled by adding `-XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational` to a command line that already specifies `-XX:+UseShenandoahGC`. The implementation automatically adjusts the sizes of old generation and young generation to efficiently utilize the entire heap capacity.

Generational mode of Shenandoah resembles G1 in the following regards:

1. Old-generation marking runs concurrently during the time that multiple young generation collections run to completion.
2. After old-generation marking completes, we perform a sequence of mixed collections. Each mixed collection combines collection of young generation with evacuation of a portion of the old-generation regions identified for collection based on old-generation marking information.
3. Unlike G1, young-generation collections and evacuations are entirely concurrent, as with single-generation Shenandoah.
4. As with single-generation Shenandoah, there is no explicit notion of eden and survivor space within the young generation. In practice, regions that were most recently allocated tend to have large amounts of garbage and these regions tend to be collected with very little effort. Young-generation objects that survive garbage collection tend to accumulate in regions that hold survivor objects. These regions tend to have smaller amounts of garbage, and are less likely to be collected. If they survive a sufficient number of young-generation collections, the "survivor" regions are promoted into the old generation.

We expect to refine heuristics as we gain experience with more production workloads. In the future, we plan to remove the "experimental" qualifier from generational mode, at which time we expect that generational mode will become the default mode for Shenandoah.

**Testing**: We continuously run jtreg tiers 1-4 + hotspot_gc_shenandoah, gcstress, jck compiler, jck runtime, Dacapo, SpecJBB, SpecVM, Extremem, HyperAlloc, and multiple AWS production workload simulators. We test on Linux x64 and aarch64, Alpine x64 and aarch64, macOS x64 and aarch64, and Windows x64.

-------------

Commit messages:
 - Revert changes to jcheck configuration
 - Remove early planning docs from PR
 - Merge remote-tracking branch 'shenandoah/master' into merge-generational-shenandoah
 - Expand old on demand
 - Merge openjdk/jdk:master
 - Make generational mode experimental
 - Merge openjdk/jdk:master
 - Improve mixed collection logging
 - Use soft max capacity only for trigger calculations
 - Use static assert to validate card offset encoding mask
 - ... and 281 more: https://git.openjdk.org/jdk/compare/ce5251af...aa85a907

Changes: https://git.openjdk.org/jdk/pull/14185/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14185&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8307314
Stats: 20150 lines in 205 files changed: 18237 ins; 904 del; 1009 mod
Patch: https://git.openjdk.org/jdk/pull/14185.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14185/head:pull/14185

PR: https://git.openjdk.org/jdk/pull/14185

From jsjolen at openjdk.org Tue May 30 19:15:38 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Tue, 30 May 2023 19:15:38 GMT
Subject: RFR: 8309044: Replace NULL with nullptr, final sweep of hotspot code [v2]
In-Reply-To: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com>
References: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com>
Message-ID:

> A final sweep of Hotspot to remove all re-added NULLs. With only 110 changes I'd appreciate if this was considered trivial.

Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision:

 - Align
 - Suggestions

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/14198/files
 - new: https://git.openjdk.org/jdk/pull/14198/files/c7e28571..c9d9c30c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14198&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14198&range=00-01

Stats: 9 lines in 8 files changed: 0 ins; 0 del; 9 mod
Patch: https://git.openjdk.org/jdk/pull/14198.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14198/head:pull/14198

PR: https://git.openjdk.org/jdk/pull/14198

From jsjolen at openjdk.org Tue May 30 19:15:41 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Tue, 30 May 2023 19:15:41 GMT
Subject: RFR: 8309044: Replace NULL with nullptr, final sweep of hotspot code
In-Reply-To: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com>
References: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com>
Message-ID:

On Mon, 29 May 2023 10:09:15 GMT, Johan Sjölen wrote:

> A final sweep of Hotspot to remove all re-added NULLs. With only 110 changes I'd appreciate if this was considered trivial.

Hi, thank you all for the suggestions! I've now applied them. I'll look at integrating tomorrow.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14198#issuecomment-1568940197

From kvn at openjdk.org Tue May 30 19:28:07 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 30 May 2023 19:28:07 GMT
Subject: RFR: 8309044: Replace NULL with nullptr, final sweep of hotspot code [v2]
In-Reply-To:
References: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com>
Message-ID:

On Tue, 30 May 2023 19:15:38 GMT, Johan Sjölen wrote:

>> A final sweep of Hotspot to remove all re-added NULLs. With only 110 changes I'd appreciate if this was considered trivial.
>
> Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision:
>
> - Align
> - Suggestions

Good.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14198#pullrequestreview-1451670515

From amitkumar at openjdk.org Wed May 31 04:27:04 2023
From: amitkumar at openjdk.org (Amit Kumar)
Date: Wed, 31 May 2023 04:27:04 GMT
Subject: RFR: 8309044: Replace NULL with nullptr, final sweep of hotspot code [v2]
In-Reply-To:
References: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com>
Message-ID: <1jUN4aGELW2Mc9vIZlta5Ml1wA9aXRYVYkF2DyU1F8w=.8c984fb2-de2b-4526-aa6e-214d97565d3f@github.com>

On Tue, 30 May 2023 19:15:38 GMT, Johan Sjölen wrote:

>> A final sweep of Hotspot to remove all re-added NULLs. With only 110 changes I'd appreciate if this was considered trivial.
>
> Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision:
>
> - Align
> - Suggestions

Thanks for addressing the comment. Looks Good :-)

-------------

Marked as reviewed by amitkumar (Author).

PR Review: https://git.openjdk.org/jdk/pull/14198#pullrequestreview-1452183165

From jsjolen at openjdk.org Wed May 31 09:23:21 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Wed, 31 May 2023 09:23:21 GMT
Subject: RFR: 8309044: Replace NULL with nullptr, final sweep of hotspot code [v2]
In-Reply-To:
References: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com>
Message-ID:

On Tue, 30 May 2023 19:15:38 GMT, Johan Sjölen wrote:

>> A final sweep of Hotspot to remove all re-added NULLs. With only 110 changes I'd appreciate if this was considered trivial.
>
> Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision:
>
> - Align
> - Suggestions

Right, seems like the Windows CI fails at fetching JTReg.
I'm merging this, thank you all for the reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14198#issuecomment-1569815714

From jsjolen at openjdk.org Wed May 31 09:23:22 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Wed, 31 May 2023 09:23:22 GMT
Subject: Integrated: 8309044: Replace NULL with nullptr, final sweep of hotspot code
In-Reply-To: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com>
References: <3FoMnGeBp8DqkpVb6YGXKxdPsgGz6ej-jrf2U2stVfU=.56a11e19-38dd-420a-a07d-3b025120f194@github.com>
Message-ID:

On Mon, 29 May 2023 10:09:15 GMT, Johan Sjölen wrote:

> A final sweep of Hotspot to remove all re-added NULLs. With only 110 changes I'd appreciate if this was considered trivial.

This pull request has now been integrated.

Changeset: 4f161616
Author: Johan Sjölen
URL: https://git.openjdk.org/jdk/commit/4f16161607edbf69f423ced1d3c24f7af058d46b
Stats: 114 lines in 67 files changed: 0 ins; 0 del; 114 mod

8309044: Replace NULL with nullptr, final sweep of hotspot code

Reviewed-by: stefank, dholmes, kvn, amitkumar

-------------

PR: https://git.openjdk.org/jdk/pull/14198