From rcastanedalo at openjdk.org Mon Sep 2 06:38:07 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 2 Sep 2024 06:38:07 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v12] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). 
These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: - Merge jdk-24+13 - Add test to motivate compile-time null checks in 'refine_barrier_by_new_val_type' - Remark relation between compiler optimization and barrier filter - Make 'refine_barrier_by_new_val_type' static and its input argument 'const' - Replace 'the null' with 'null' in comment - Remove redundant redefinitions of '__' - Replace 'already dirty' with 'young' in post-barrier fast path comment - Rename g1XChgX to g1GetAndSetX for consistency with Ideal operation names - Pass oldval to the pre-barrier in g1CompareAndExchange/SwapP - Assert that no implicit null checks are generated for memory accesses with barriers - ... 
and 8 more: https://git.openjdk.org/jdk/compare/52ffcda1...4ee450ad ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/57adcfb0..4ee450ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=10-11 Stats: 30577 lines in 938 files changed: 18592 ins; 8033 del; 3952 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Mon Sep 2 06:38:07 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 2 Sep 2024 06:38:07 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v11] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 13:23:32 GMT, Feilong Jiang wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with six additional commits since the last revision: >> >> - Add test to motivate compile-time null checks in 'refine_barrier_by_new_val_type' >> - Remark relation between compiler optimization and barrier filter >> - Make 'refine_barrier_by_new_val_type' static and its input argument 'const' >> - Replace 'the null' with 'null' in comment >> - Remove redundant redefinitions of '__' >> - Replace 'already dirty' with 'young' in post-barrier fast path comment > > risc-v port looks good too. > OK, if there are no objections from @feilongjiang or @snazarkin within a couple of days I will prepare an update to jdk-24+13. @TheRealMDoerr done (commit 4ee450a). 
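One of the v12 commits passes `oldval` to the pre-barrier in `g1CompareAndExchange/SwapP`. A plausible reading, sketched here under stated assumptions, is that for a compare-and-swap the expected value can stand in for the value the SATB (snapshot-at-the-beginning) pre-barrier would otherwise reload from the field: if the CAS succeeds, the overwritten value is exactly `oldval`, and if it fails, logging one extra live object is merely conservative. `Obj`, `SatbModel`, and its members are hypothetical names for a C++ simulation, not HotSpot code, and the sketch does not show how the changeset actually emits the barrier.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a heap object.
struct Obj {
  int id;
};

// Hypothetical model of a SATB pre-barrier plus a thread-local SATB queue.
struct SatbModel {
  bool marking_active = false;   // models the "is concurrent marking active?" test
  std::vector<Obj*> satb_queue;  // models the thread-local SATB buffer

  // Pre-barrier: log the value that is about to be overwritten, but only
  // while marking is active and only if that value is non-null.
  void pre_barrier(Obj* pre_val) {
    if (marking_active && pre_val != nullptr) {
      satb_queue.push_back(pre_val);
    }
  }

  // CAS whose pre-barrier logs the expected value (oldval) instead of
  // reloading the field. On success the overwritten value is exactly oldval;
  // on failure nothing is overwritten and the extra entry only keeps a live
  // object marked.
  bool cas_with_pre_barrier(std::atomic<Obj*>& field, Obj* oldval, Obj* newval) {
    pre_barrier(oldval);
    Obj* expected = oldval;  // compare_exchange_strong may overwrite this copy
    return field.compare_exchange_strong(expected, newval);
  }
};
```

In the model, the pre-barrier never needs a separate load of the field, which is the kind of saving the commit title suggests.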
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2323921726 From mbaesken at openjdk.org Mon Sep 2 12:52:47 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 2 Sep 2024 12:52:47 GMT Subject: RFR: 8339300: CollectorPolicy.young_scaled_initial_ergo_vm gtest fails on ppc64 based platforms Message-ID: We currently fail in CollectorPolicy.young_scaled_initial_ergo_vm after [JDK-8258483](https://bugs.openjdk.org/browse/JDK-8258483) came in. AIX / Linux ppc64le show this error: [ RUN ] CollectorPolicy.young_scaled_initial_ergo_vm test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:122: Failure Expected equality of these values: expected Which is: 44695552 NewSize Which is: 41943040 test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:78: Failure Expected: checker->execute() doesn't generate new fatal failures in the current thread. Actual: it does. [ FAILED ] CollectorPolicy.young_scaled_initial_ergo_vm (0 ms) So the decrease from 80M to 40M was too much for these platforms (they slightly differ in ergo/startup behavior). ------------- Commit messages: - JDK-8339300 Changes: https://git.openjdk.org/jdk/pull/20820/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20820&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339300 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20820.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20820/head:pull/20820 PR: https://git.openjdk.org/jdk/pull/20820 From duke at openjdk.org Mon Sep 2 13:14:27 2024 From: duke at openjdk.org (Joel Sikström) Date: Mon, 2 Sep 2024 13:14:27 GMT Subject: RFR: 8339163: ZGC: Race in clearing of remembered sets Message-ID: When a young collection is in the "concurrent mark" phase and is scanning remembered sets (remsets) to find roots into the young gen it "consumes" the remset when it is finished by clearing it (using memset). 
At the same time, an old collection might find a completely empty/garbage page that it will insert into the page cache. Before inserting into the page cache, the page's remset is cleared (using memset). These two operations might interfere, resulting in both threads clearing the memory simultaneously. This race was found in connection with https://bugs.openjdk.org/browse/JDK-8339161, where I experimented with replacing some clears of remsets with frees and got a crash on Windows from memset operating on freed memory. This patch makes sure that remsets are only cleared in the "concurrent mark" phase if not already handled by an old collection. Tested with tiers 1-3 and with a local test that crashes if both threads handle the remset. ------------- Commit messages: - 8339163: Race in clearing of remembered sets Changes: https://git.openjdk.org/jdk/pull/20821/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20821&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339163 Stats: 26 lines in 2 files changed: 14 ins; 9 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20821.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20821/head:pull/20821 PR: https://git.openjdk.org/jdk/pull/20821 From stefank at openjdk.org Mon Sep 2 13:31:17 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 2 Sep 2024 13:31:17 GMT Subject: RFR: 8339163: ZGC: Race in clearing of remembered sets In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 13:09:09 GMT, Joel Sikström wrote: > When a young collection is in the "concurrent mark" phase and is scanning remembered sets (remsets) to find roots into the young gen it "consumes" the remset when it is finished by clearing it (using memset). > > At the same time, an old collection might find a completely empty/garbage page that it will insert into the page cache. Before inserting into the page cache, the page's remset is cleared (using memset). 
> > These two operations might interfere, resulting in both threads clearing the memory simultaneously. > > This race was found in connection with https://bugs.openjdk.org/browse/JDK-8339161, where I experimented with replacing some clears of remsets with frees and got a crash on Windows from memset operating on freed memory. > > This patch makes sure that remsets are only cleared in the "concurrent mark" phase if not already handled by an old collection. > > Tested with tiers 1-3 and with a local test that crashes if both threads handle the remset. Looks good. Great that you found this! ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20821#pullrequestreview-2275671530 From duke at openjdk.org Mon Sep 2 16:21:29 2024 From: duke at openjdk.org (Joel Sikström) Date: Mon, 2 Sep 2024 16:21:29 GMT Subject: RFR: 8339399: ZGC: Remove unnecessary page reset when splitting pages Message-ID: There is no use-case where it is necessary to reset the same page after splitting and thus the reset should be removed. Tested with tiers 1-3. 
------------- Commit messages: - 8339399: ZGC: Remove unnecessary page reset when splitting pages Changes: https://git.openjdk.org/jdk/pull/20824/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20824&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339399 Stats: 11 lines in 2 files changed: 1 ins; 10 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20824.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20824/head:pull/20824 PR: https://git.openjdk.org/jdk/pull/20824 From stefank at openjdk.org Mon Sep 2 16:59:18 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 2 Sep 2024 16:59:18 GMT Subject: RFR: 8339399: ZGC: Remove unnecessary page reset when splitting pages In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 16:15:57 GMT, Joel Sikström wrote: > There is no use-case where it is necessary to reset the same page after splitting and thus the reset should be removed. > > Tested with tiers 1-3. Looks good. ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20824#pullrequestreview-2275990889 From eosterlund at openjdk.org Mon Sep 2 17:17:19 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 2 Sep 2024 17:17:19 GMT Subject: RFR: 8339399: ZGC: Remove unnecessary page reset when splitting pages In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 16:15:57 GMT, Joel Sikström wrote: > There is no use-case where it is necessary to reset the same page after splitting and thus the reset should be removed. > > Tested with tiers 1-3. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20824#pullrequestreview-2276004131 From eosterlund at openjdk.org Mon Sep 2 17:22:18 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 2 Sep 2024 17:22:18 GMT Subject: RFR: 8339163: ZGC: Race in clearing of remembered sets In-Reply-To: References: Message-ID: <7fbmqs1TZO5HZnvMz46ppNfQCv3lnB4Pu9zEeSzuQGY=.14d7dbb3-8243-4299-96b0-32eb85c78aa4@github.com> On Mon, 2 Sep 2024 13:09:09 GMT, Joel Sikström wrote: > When a young collection is in the "concurrent mark" phase and is scanning remembered sets (remsets) to find roots into the young gen it "consumes" the remset when it is finished by clearing it (using memset). > > At the same time, an old collection might find a completely empty/garbage page that it will insert into the page cache. Before inserting into the page cache, the page's remset is cleared (using memset). > > These two operations might interfere, resulting in both threads clearing the memory simultaneously. > > This race was found in connection with https://bugs.openjdk.org/browse/JDK-8339161, where I experimented with replacing some clears of remsets with frees and got a crash on Windows from memset operating on freed memory. > > This patch makes sure that remsets are only cleared in the "concurrent mark" phase if not already handled by an old collection. > > Tested with tiers 1-3 and with a local test that crashes if both threads handle the remset. Looks good! ------------- Marked as reviewed by eosterlund (Reviewer). 
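The fix in 8339163 ensures that a remset is cleared by at most one of the two collections. The thread does not show how the patch achieves this, so the following is only a generic claim-before-clear sketch in C++, not ZGC code: `RemsetModel` and `race_two_clearers` are hypothetical names. A caller must win an atomic claim (a `compare_exchange_strong` on a flag) before it may `memset` the bitmap, so the young-mark path and the old-collection page-cache path can never both clear the same memory.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical model of a page's remembered set: a bitmap plus a flag that
// records whether some collector has already taken responsibility for it.
class RemsetModel {
 public:
  explicit RemsetModel(std::size_t bytes) : bits_(bytes, 0xff) {}

  // Exactly one caller can ever win this claim.
  bool try_claim() {
    bool expected = false;
    return claimed_.compare_exchange_strong(expected, true, std::memory_order_acq_rel);
  }

  // Clears the bitmap only if this caller wins the claim; a losing caller
  // (the other collection racing on the same page) leaves the memory alone.
  bool clear_if_unclaimed() {
    if (!try_claim()) {
      return false;
    }
    std::memset(bits_.data(), 0, bits_.size());
    return true;
  }

  bool is_clear() const {
    for (unsigned char b : bits_) {
      if (b != 0) return false;
    }
    return true;
  }

 private:
  std::vector<unsigned char> bits_;
  std::atomic<bool> claimed_{false};
};

// Races two threads (standing in for the young-mark path and the old
// collection's page-cache path) and returns how many of them cleared.
int race_two_clearers(RemsetModel& remset) {
  std::atomic<int> winners{0};
  auto worker = [&] {
    if (remset.clear_if_unclaimed()) winners.fetch_add(1);
  };
  std::thread t1(worker);
  std::thread t2(worker);
  t1.join();
  t2.join();
  return winners.load();
}
```

Whatever the real mechanism is, the property to preserve is the one the race function checks: exactly one clearer per remset.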
PR Review: https://git.openjdk.org/jdk/pull/20821#pullrequestreview-2276006890 From aboldtch at openjdk.org Mon Sep 2 17:38:23 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 2 Sep 2024 17:38:23 GMT Subject: RFR: 8339399: ZGC: Remove unnecessary page reset when splitting pages In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 16:15:57 GMT, Joel Sikström wrote: > There is no use-case where it is necessary to reset the same page after splitting and thus the reset should be removed. > > Tested with tiers 1-3. lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20824#pullrequestreview-2276017500 From aboldtch at openjdk.org Mon Sep 2 17:39:18 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 2 Sep 2024 17:39:18 GMT Subject: RFR: 8339163: ZGC: Race in clearing of remembered sets In-Reply-To: References: Message-ID: <8_W87_RCGbZgF4P8_vDMeL1qO4MCKeVlgFSDZa-0wuY=.5b9a0480-5c31-42bb-bb19-00da295c43b2@github.com> On Mon, 2 Sep 2024 13:09:09 GMT, Joel Sikström wrote: > When a young collection is in the "concurrent mark" phase and is scanning remembered sets (remsets) to find roots into the young gen it "consumes" the remset when it is finished by clearing it (using memset). > > At the same time, an old collection might find a completely empty/garbage page that it will insert into the page cache. Before inserting into the page cache, the page's remset is cleared (using memset). > > These two operations might interfere, resulting in both threads clearing the memory simultaneously. > > This race was found in connection with https://bugs.openjdk.org/browse/JDK-8339161, where I experimented with replacing some clears of remsets with frees and got a crash on Windows from memset operating on freed memory. > > This patch makes sure that remsets are only cleared in the "concurrent mark" phase if not already handled by an old collection. 
> > Tested with tiers 1-3 and with a local test that crashes if both threads handle the remset. lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20821#pullrequestreview-2276018164 From rcastanedalo at openjdk.org Tue Sep 3 07:26:00 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 3 Sep 2024 07:26:00 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v13] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. 
> > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
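The fast-path/slow-path split described above can be illustrated with the filters that G1's post-barrier has traditionally applied: a cross-region check, a null check, a young-card check, and an already-dirty check. The C++ model below is a simulation under assumed parameters (`kLogRegionBytes`, `kCardShift`, and the card values are illustrative, not read from HotSpot), and `PostBarrierModel` is a hypothetical name. The out-of-line slow path is reduced to dirtying the card and appending it to a queue; the register saving that the real stubs need is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative parameters only; real values depend on the heap configuration.
constexpr unsigned kLogRegionBytes = 20;  // assume 1 MiB heap regions
constexpr unsigned kCardShift = 9;        // assume 512-byte cards
constexpr std::uint8_t kCleanCard = 0xff;
constexpr std::uint8_t kDirtyCard = 0x00;
constexpr std::uint8_t kYoungCard = 0x02;

// Hypothetical model of a card table plus the queue drained by refinement.
struct PostBarrierModel {
  std::vector<std::uint8_t> cards;               // one entry per card in [0, heap_bytes)
  std::vector<std::uintptr_t> dirty_card_queue;  // card indices handed to the slow path

  explicit PostBarrierModel(std::uintptr_t heap_bytes)
      : cards(heap_bytes >> kCardShift, kCleanCard) {}

  // Returns true if the store had to take the slow path.
  bool post_barrier(std::uintptr_t store_addr, std::uintptr_t new_val) {
    // Fast-path filter 1: same-region stores need no remembered-set entry.
    if (((store_addr ^ new_val) >> kLogRegionBytes) == 0) return false;
    // Fast-path filter 2: storing null creates no cross-region reference.
    if (new_val == 0) return false;
    const std::uintptr_t card = store_addr >> kCardShift;
    // Fast-path filter 3: young regions are always collected, so their cards are not refined.
    if (cards[card] == kYoungCard) return false;
    // (A real barrier issues a StoreLoad fence here before re-reading the card.)
    // Fast-path filter 4: an already-dirty card has already been enqueued.
    if (cards[card] == kDirtyCard) return false;
    // Slow path (out of line in real code): dirty the card and enqueue it.
    cards[card] = kDirtyCard;
    dirty_card_queue.push_back(card);
    return true;
  }
};
```

Only the last step needs the out-of-line stub; the four filters are presumably the kind of "barrier tests" the description says were refactored into dedicated functions.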
Roberto Castañeda Lozano has updated the pull request incrementally with four additional commits since the last revision: - Increase test coverage of new-object stores with different type information - Refactor the two post-barrier removal cases into a single expression - Remove unnecessary early null-based post-barrier elision - Make store capturability test G1-specific and more precise ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/4ee450ad..1ea2862f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=11-12 Stats: 88 lines in 5 files changed: 66 ins; 7 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Tue Sep 3 07:26:00 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 3 Sep 2024 07:26:00 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> Message-ID: On Fri, 30 Aug 2024 08:23:44 GMT, Roberto Castañeda Lozano wrote: > I will study if the check in get_store_barrier is superseded by that in refine_barrier_by_new_val_type. If I can convince myself that this is the case I will consider removing the former. This was indeed the case, so I have removed the compile-time null check from `G1BarrierSetC2::get_store_barrier` (commit deac05d7) and simplified the code around it (commit 6f4027bf). I also added a few extra test cases to exercise stores on newly-allocated objects with different nullness information (commit 1ea2862f). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1741555725 From rcastanedalo at openjdk.org Tue Sep 3 07:26:00 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 3 Sep 2024 07:26:00 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> Message-ID: <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com> On Fri, 30 Aug 2024 13:49:10 GMT, Roberto Castañeda Lozano wrote: > Thanks for your suggestions and comments, Kim! I have addressed them now (modulo a couple of follow-up experiments that I will try out in the next days), please let me know if there is any further question. @kimbarrett I have addressed all your comments now (including follow-up enhancements), please re-review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2325782979 From rcastanedalo at openjdk.org Tue Sep 3 07:26:01 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 3 Sep 2024 07:26:01 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v9] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 13:40:24 GMT, Roberto Castañeda Lozano wrote: > A perhaps more principled solution might be extending store-capturing analysis to reject stores with late-expanded barriers. I will give it a try. This option proved to be infeasible because other GCs (ZGC) rely on store capturing for barrier elision. Furthermore, this would prevent eliding G1 barriers that are found to be elidable only after the program is simplified by C2's intermediate optimizations, even if `ReduceInitialCardMarks` is enabled (I found a few such cases, e.g. where range check elimination is the enabling simplification). 
Instead, I have opted to remove the `ReduceInitialCardMarks` condition from `StoreNode::Ideal` and introduce a GC-specific test to determine whether a store can be captured and used for object initialization (commit 6b9954979). For G1, this is true iff the store does not have any barrier or it does have barriers but `ReduceInitialCardMarks` is enabled. For all other GCs, the test is always true, which preserves the original mainline behavior. To summarize, this option makes the logic clearer, improves analysis precision, and isolates the changes to G1. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1741554994 From mdoerr at openjdk.org Tue Sep 3 12:06:27 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Sep 2024 12:06:27 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v13] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 07:26:00 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. 
>> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... 
> > Roberto Castañeda Lozano has updated the pull request incrementally with four additional commits since the last revision: > > - Increase test coverage of new-object stores with different type information > - Refactor the two post-barrier removal cases into a single expression > - Remove unnecessary early null-based post-barrier elision > - Make store capturability test G1-specific and more precise src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 646: > 644: instruct g1LoadPVolatile(iRegPNoSp dst, indirect mem, iRegPNoSp tmp1, iRegPNoSp tmp2, rFlagsReg cr) > 645: %{ > 646: predicate(UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0); Remark: This node should never match because the `referent` is never volatile (same for `g1LoadNVolatile`): https://github.com/openjdk/jdk/blob/7a418fc07464fe359a0b45b6d797c65c573770cb/src/java.base/share/classes/java/lang/ref/Reference.java#L157 Hence, `needs_acquiring_load(n)` and `n->as_Load()->barrier_data() != 0` are never true at the same time. Not sure if this should somehow be reflected in the code. I've inserted `ShouldNotReachHere` on PPC64. 
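This remark can be illustrated with a small C++ model of how mutually exclusive ADL predicates select exactly one instruction variant per load. `LoadNodeModel` and `select_load_instruction` are hypothetical; `g1LoadPVolatile` is the name in the quoted predicate, while `g1LoadP`, `loadP`, and `loadP_volatile` are assumed names for the other aarch64 variants. The `g1LoadPVolatile` branch requires both an acquiring load and non-zero barrier data, a combination that, per the remark, does not arise because `Reference.referent` is not volatile.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified view of a C2 pointer load.
struct LoadNodeModel {
  bool needs_acquiring_load;  // volatile / acquire semantics required
  std::uint8_t barrier_data;  // 0 means "no GC barrier attached"
};

// Models mutually exclusive ADL predicates: exactly one variant matches.
std::string select_load_instruction(const LoadNodeModel& n, bool use_g1) {
  const bool g1_barrier = use_g1 && n.barrier_data != 0;
  if (g1_barrier && n.needs_acquiring_load) {
    // Per the review remark, not expected in practice: the referent field
    // that barrier-carrying loads read is not volatile.
    return "g1LoadPVolatile";
  }
  if (g1_barrier) return "g1LoadP";
  if (n.needs_acquiring_load) return "loadP_volatile";
  return "loadP";
}
```

Under that assumption, a `ShouldNotReachHere()`-style guard in the volatile variant (as done on PPC64) documents the invariant without changing which variant is selected for any load that actually occurs.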
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1741937425 From mdoerr at openjdk.org Tue Sep 3 12:15:27 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Sep 2024 12:15:27 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> <6rvTU-KY2DpLx3sK7zmkaZCBaNELr3FLGItGwUJzNUM=.0e75c7a7-4309-49c5-b48b-5aa642bcbe43@github.com> Message-ID: On Thu, 29 Aug 2024 08:37:24 GMT, Albert Mingkun Yang wrote: >> Thanks, I prototyped the refactored version for both x64 and aarch64 [here](https://github.com/robcasloz/jdk/commit/c1ae871eadac0d44981b7892ac8f7b64e8734283). I do not have a strong opinion for or against this refactoring. @albertnetymk @kimbarrett what do you think about it? (asking since you have recently looked at this code). > > I find the use of default-arg-value `bool decode_new_val = false` a bit confusing. (I tend to think default-arg-value makes the code less readable in general.) > > If not using default-arg-value, I suspect the diff will be larger, and I don't see the immediate benefit of this refactoring. Maybe this can be deferred to its own PR if it's really desirable? @albertnetymk: FYI: The basic idea was to make compressed Oops optimizations easier. It allows using shorter decoding sequences and removing redundant null checks in the fast path. 
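As a generic illustration of the compressed-oops point, assuming a heap-based encoding with a base and a shift: a general decode must map the narrow value 0 back to null, while a decode of a value already known to be non-null can skip that check. If the null filter is applied to the narrow value before decoding, the cheaper non-null decode suffices afterwards. The constants and function names below are hypothetical (HotSpot's macro assemblers make a similar distinction between a general and a not-null decode), and this is not the PPC64 implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical compressed-pointer parameters; HotSpot chooses the real
// encoding at startup based on heap size and placement.
constexpr std::uint64_t kHeapBase = 0x800000000ULL;
constexpr unsigned kShift = 3;

// General decode: narrow 0 must map back to null, so a check is needed.
std::uint64_t decode(std::uint32_t narrow) {
  return narrow == 0 ? 0 : kHeapBase + (static_cast<std::uint64_t>(narrow) << kShift);
}

// Decode for a value already known to be non-null: no null check needed.
std::uint64_t decode_not_null(std::uint32_t narrow) {
  return kHeapBase + (static_cast<std::uint64_t>(narrow) << kShift);
}

// Barrier prologue sketch: filter null on the narrow value first, then use
// the cheaper non-null decode for the rest of the fast path.
bool post_barrier_needs_work(std::uint32_t narrow_new_val, std::uint64_t* decoded) {
  if (narrow_new_val == 0) return false;  // null filter on the compressed value
  *decoded = decode_not_null(narrow_new_val);
  return true;
}
```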
I've implemented it on PPC64: https://github.com/TheRealMDoerr/jdk/blob/ed9c0232f53a15d768804348e1d8a111fed9a19e/src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp#L471 But I'm OK with postponing it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1741950634 From mdoerr at openjdk.org Tue Sep 3 12:20:25 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Sep 2024 12:20:25 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com> References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com> Message-ID: On Tue, 3 Sep 2024 07:22:32 GMT, Roberto Castañeda Lozano wrote: >>> I've only looked at the changes in gc directories (shared and cpu-specific). >> >> Thanks for your suggestions and comments, Kim! I have addressed them now (modulo a couple of follow-up experiments that I will try out in the next days), please let me know if there is any further question. > >> Thanks for your suggestions and comments, Kim! I have addressed them now (modulo a couple of follow-up experiments that I will try out in the next days), please let me know if there is any further question. > > @kimbarrett I have addressed all your comments now (including follow-up enhancements), please re-review. @robcasloz: Thanks for the updates! I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e Do you prefer integrating it soon? Otherwise, I could keep it separate and do more rebasing and testing while this PR evolves further. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2326378191 From iwalulya at openjdk.org Tue Sep 3 13:56:29 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 3 Sep 2024 13:56:29 GMT Subject: RFR: 8339369: G1: TestVerificationInConcurrentCycle.java fails with "Missing rem set entry" when using "-XX:G1RSetUpdatingPauseTimePercent=0 -XX:G1UpdateBufferSize=2" Message-ID: Please review this patch to reset the per-region cardsets in the later phases of the full GC. This ensures that remset verification can proceed without considering whether the cardsets are combined or not. Testing: passes the test cited in the bug report, and tiers 1-3. ------------- Commit messages: - init Changes: https://git.openjdk.org/jdk/pull/20835/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20835&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339369 Stats: 3 lines in 2 files changed: 2 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20835.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20835/head:pull/20835 PR: https://git.openjdk.org/jdk/pull/20835 From mdoerr at openjdk.org Tue Sep 3 14:22:19 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Sep 2024 14:22:19 GMT Subject: RFR: 8339300: CollectorPolicy.young_scaled_initial_ergo_vm gtest fails on ppc64 based platforms In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 12:46:27 GMT, Matthias Baesken wrote: > We currently fail in CollectorPolicy.young_scaled_initial_ergo_vm after [JDK-8258483](https://bugs.openjdk.org/browse/JDK-8258483) came in. 
> AIX / Linux ppc64le show this error:
>
> [ RUN      ] CollectorPolicy.young_scaled_initial_ergo_vm
> test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:122: Failure
> Expected equality of these values:
>   expected
>     Which is: 44695552
>   NewSize
>     Which is: 41943040
>
> test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:78: Failure
> Expected: checker->execute() doesn't generate new fatal failures in the current thread.
>   Actual: it does.
>
> [  FAILED  ] CollectorPolicy.young_scaled_initial_ergo_vm (0 ms)
>
> So the decrease from 80M to 40M was too much for these platforms (they slightly differ in ergo/startup behavior).

LGTM.

-------------

Marked as reviewed by mdoerr (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20820#pullrequestreview-2277598668

From lucy at openjdk.org Tue Sep 3 17:38:18 2024
From: lucy at openjdk.org (Lutz Schmidt)
Date: Tue, 3 Sep 2024 17:38:18 GMT
Subject: RFR: 8339300: CollectorPolicy.young_scaled_initial_ergo_vm gtest fails on ppc64 based platforms
In-Reply-To:
References:
Message-ID: <3cJc7O8zzVLBQ5Vyg88TOtUeRMwm2xBf2XsN0-_G7HA=.e5c77738-44d2-49c9-aa76-46150860903a@github.com>

On Mon, 2 Sep 2024 12:46:27 GMT, Matthias Baesken wrote:

> We currently fail in CollectorPolicy.young_scaled_initial_ergo_vm after [JDK-8258483](https://bugs.openjdk.org/browse/JDK-8258483) came in.
> AIX / Linux ppc64le show this error:
>
> [ RUN      ] CollectorPolicy.young_scaled_initial_ergo_vm
> test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:122: Failure
> Expected equality of these values:
>   expected
>     Which is: 44695552
>   NewSize
>     Which is: 41943040
>
> test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:78: Failure
> Expected: checker->execute() doesn't generate new fatal failures in the current thread.
>   Actual: it does.
>
> [  FAILED  ] CollectorPolicy.young_scaled_initial_ergo_vm (0 ms)
>
> So the decrease from 80M to 40M was too much for these platforms (they slightly differ in ergo/startup behavior).

LGTM.
-------------

Marked as reviewed by lucy (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20820#pullrequestreview-2278070784

From mbaesken at openjdk.org Wed Sep 4 07:12:23 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Wed, 4 Sep 2024 07:12:23 GMT
Subject: RFR: 8339300: CollectorPolicy.young_scaled_initial_ergo_vm gtest fails on ppc64 based platforms
In-Reply-To:
References:
Message-ID:

On Mon, 2 Sep 2024 12:46:27 GMT, Matthias Baesken wrote:

> We currently fail in CollectorPolicy.young_scaled_initial_ergo_vm after [JDK-8258483](https://bugs.openjdk.org/browse/JDK-8258483) came in.
> AIX / Linux ppc64le show this error:
>
> [ RUN      ] CollectorPolicy.young_scaled_initial_ergo_vm
> test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:122: Failure
> Expected equality of these values:
>   expected
>     Which is: 44695552
>   NewSize
>     Which is: 41943040
>
> test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:78: Failure
> Expected: checker->execute() doesn't generate new fatal failures in the current thread.
>   Actual: it does.
>
> [  FAILED  ] CollectorPolicy.young_scaled_initial_ergo_vm (0 ms)
>
> So the decrease from 80M to 40M was too much for these platforms (they slightly differ in ergo/startup behavior).

Hi Lutz and Martin, thanks for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20820#issuecomment-2328083370

From mbaesken at openjdk.org Wed Sep 4 07:12:23 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Wed, 4 Sep 2024 07:12:23 GMT
Subject: Integrated: 8339300: CollectorPolicy.young_scaled_initial_ergo_vm gtest fails on ppc64 based platforms
In-Reply-To:
References:
Message-ID:

On Mon, 2 Sep 2024 12:46:27 GMT, Matthias Baesken wrote:

> We currently fail in CollectorPolicy.young_scaled_initial_ergo_vm after [JDK-8258483](https://bugs.openjdk.org/browse/JDK-8258483) came in.
> AIX / Linux ppc64le show this error:
>
> [ RUN      ] CollectorPolicy.young_scaled_initial_ergo_vm
> test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:122: Failure
> Expected equality of these values:
>   expected
>     Which is: 44695552
>   NewSize
>     Which is: 41943040
>
> test/hotspot/gtest/gc/shared/test_collectorPolicy.cpp:78: Failure
> Expected: checker->execute() doesn't generate new fatal failures in the current thread.
>   Actual: it does.
>
> [  FAILED  ] CollectorPolicy.young_scaled_initial_ergo_vm (0 ms)
>
> So the decrease from 80M to 40M was too much for these platforms (they slightly differ in ergo/startup behavior).

This pull request has now been integrated.

Changeset: f2c992c5
Author: Matthias Baesken
URL: https://git.openjdk.org/jdk/commit/f2c992c5af021ab0ff8429fd261314bc7e01f7df
Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod

8339300: CollectorPolicy.young_scaled_initial_ergo_vm gtest fails on ppc64 based platforms

Reviewed-by: mdoerr, lucy

-------------

PR: https://git.openjdk.org/jdk/pull/20820

From tschatzl at openjdk.org Wed Sep 4 08:04:17 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Wed, 4 Sep 2024 08:04:17 GMT
Subject: RFR: 8339369: G1: TestVerificationInConcurrentCycle.java fails with "Missing rem set entry" when using "-XX:G1RSetUpdatingPauseTimePercent=0 -XX:G1UpdateBufferSize=2"
In-Reply-To:
References:
Message-ID:

On Tue, 3 Sep 2024 12:12:56 GMT, Ivan Walulya wrote:

> Please review this patch to reset the per-region cardsets in the later phases of the full GC. This ensures that remset verification can proceed without considering whether the cardsets are combined or not.
>
> Testing: passes the test cited in the bug report and Tiers 1-3

lgtm.

The original code basically dropped the remsets to the young gen which failed by uninstalling them (the remaining remset is empty after all).

-------------

Marked as reviewed by tschatzl (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20835#pullrequestreview-2279294921

From duke at openjdk.org Wed Sep 4 08:09:34 2024
From: duke at openjdk.org (duke)
Date: Wed, 4 Sep 2024 08:09:34 GMT
Subject: Withdrawn: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing
In-Reply-To:
References:
Message-ID:

On Thu, 20 Jun 2024 11:35:06 GMT, Thomas Stuefe wrote:

> See JBS issue.
>
> It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure.
>
> The patch:
> - exposes os::available_memory via Whitebox
> - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException`
>
> I have some misgivings about this solution, though:
> 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load.
> 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions)
> 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException.
>
> Despite my doubts, I think this is the best we can come up with if we want to have such a test.
>
> Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory.
> However, I fear that this may be easily misused and cause many tests to be excluded without notice.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/19803

From duke at openjdk.org Wed Sep 4 08:51:20 2024
From: duke at openjdk.org (Joel Sikström)
Date: Wed, 4 Sep 2024 08:51:20 GMT
Subject: RFR: 8339163: ZGC: Race in clearing of remembered sets
In-Reply-To:
References:
Message-ID:

On Mon, 2 Sep 2024 13:28:48 GMT, Stefan Karlsson wrote:

>> When a young collection is in the "concurrent mark" phase and is scanning remembered sets (remsets) to find roots into the young gen, it "consumes" the remset when it is finished by clearing it (using memset).
>>
>> At the same time, an old collection might find a completely empty/garbage page that it will insert into the page cache. Before inserting into the page cache, the page's remset is cleared (using memset).
>>
>> These two operations might interfere, resulting in both threads clearing the memory simultaneously.
>>
>> This race was found in connection to https://bugs.openjdk.org/browse/JDK-8339161, where I experimented with replacing some clears of remsets with frees and got a crash on Windows from memset when operating on freed memory.
>>
>> This patch makes sure that remsets are only cleared in the "concurrent mark" phase if not already handled by an old collection.
>>
>> Tested with tiers 1-3 and with a local test that crashes if both threads handle the remset.
>
> Looks good. Great that you found this!

Thank you for the reviews!
@stefank @fisk @xmas92

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20821#issuecomment-2328277567

From duke at openjdk.org Wed Sep 4 08:51:21 2024
From: duke at openjdk.org (duke)
Date: Wed, 4 Sep 2024 08:51:21 GMT
Subject: RFR: 8339163: ZGC: Race in clearing of remembered sets
In-Reply-To:
References:
Message-ID:

On Mon, 2 Sep 2024 13:09:09 GMT, Joel Sikström wrote:

> When a young collection is in the "concurrent mark" phase and is scanning remembered sets (remsets) to find roots into the young gen, it "consumes" the remset when it is finished by clearing it (using memset).
>
> At the same time, an old collection might find a completely empty/garbage page that it will insert into the page cache. Before inserting into the page cache, the page's remset is cleared (using memset).
>
> These two operations might interfere, resulting in both threads clearing the memory simultaneously.
>
> This race was found in connection to https://bugs.openjdk.org/browse/JDK-8339161, where I experimented with replacing some clears of remsets with frees and got a crash on Windows from memset when operating on freed memory.
>
> This patch makes sure that remsets are only cleared in the "concurrent mark" phase if not already handled by an old collection.
>
> Tested with tiers 1-3 and with a local test that crashes if both threads handle the remset.

@jsikstro Your change (at version 109ee7e0fbc088b555f55012e766b7c444ee8fbf) is now ready to be sponsored by a Committer.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20821#issuecomment-2328278552

From duke at openjdk.org Wed Sep 4 08:53:21 2024
From: duke at openjdk.org (Joel Sikström)
Date: Wed, 4 Sep 2024 08:53:21 GMT
Subject: RFR: 8339399: ZGC: Remove unnecessary page reset when splitting pages
In-Reply-To:
References:
Message-ID:

On Mon, 2 Sep 2024 16:56:27 GMT, Stefan Karlsson wrote:

>> There is no use-case where it is necessary to reset the same page after splitting and thus the reset should be removed.
>>
>> Tested with tiers 1-3.
>
> Looks good.

Thank you for the reviews! @stefank @fisk @xmas92

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20824#issuecomment-2328279272

From duke at openjdk.org Wed Sep 4 08:53:22 2024
From: duke at openjdk.org (duke)
Date: Wed, 4 Sep 2024 08:53:22 GMT
Subject: RFR: 8339399: ZGC: Remove unnecessary page reset when splitting pages
In-Reply-To:
References:
Message-ID:

On Mon, 2 Sep 2024 16:15:57 GMT, Joel Sikström wrote:

> There is no use-case where it is necessary to reset the same page after splitting and thus the reset should be removed.
>
> Tested with tiers 1-3.

@jsikstro Your change (at version b6fee02735ad4124e1f6e9eb1ab2654ad7444ddf) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20824#issuecomment-2328282425

From duke at openjdk.org Wed Sep 4 08:58:22 2024
From: duke at openjdk.org (Joel Sikström)
Date: Wed, 4 Sep 2024 08:58:22 GMT
Subject: Integrated: 8339399: ZGC: Remove unnecessary page reset when splitting pages
In-Reply-To:
References:
Message-ID:

On Mon, 2 Sep 2024 16:15:57 GMT, Joel Sikström wrote:

> There is no use-case where it is necessary to reset the same page after splitting and thus the reset should be removed.
>
> Tested with tiers 1-3.

This pull request has now been integrated.
Changeset: a6186051
Author: Joel Sikström
Committer: Stefan Karlsson
URL: https://git.openjdk.org/jdk/commit/a61860511f67038962c54e114599948ca103dae8
Stats: 11 lines in 2 files changed: 1 ins; 10 del; 0 mod

8339399: ZGC: Remove unnecessary page reset when splitting pages

Reviewed-by: stefank, eosterlund, aboldtch

-------------

PR: https://git.openjdk.org/jdk/pull/20824

From rcastanedalo at openjdk.org Wed Sep 4 09:06:45 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 4 Sep 2024 09:06:45 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v14]
In-Reply-To:
References:
Message-ID:

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>
> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>
> ## Summary of the Changes
>
> ### Platform-Independent Changes (`src/hotspot/share`)
>
> These consist mainly of:
>
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
>
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>
> #### ADL Changes
>
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>
> #### `G1BarrierSetAssembler` Changes
>
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:

8334111: Implementation of Late Barrier Expansion for G1: ppc port

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/19746/files
 - new: https://git.openjdk.org/jdk/pull/19746/files/1ea2862f..ed9c0232

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=13
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=12-13

Stats: 1036 lines in 5 files changed: 947 ins; 64 del; 25 mod
Patch: https://git.openjdk.org/jdk/pull/19746.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org Wed Sep 4 09:10:27 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 4 Sep 2024 09:10:27 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10]
In-Reply-To:
References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com>
Message-ID:

On Tue, 3 Sep 2024 12:17:58 GMT, Martin Doerr wrote:

> I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e
> Do you prefer integrating it soon?

That's great, thank you for your work and your comments and suggestions, Martin! I just merged your implementation.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2328319555

From duke at openjdk.org Wed Sep 4 09:12:23 2024
From: duke at openjdk.org (Joel Sikström)
Date: Wed, 4 Sep 2024 09:12:23 GMT
Subject: Integrated: 8339163: ZGC: Race in clearing of remembered sets
In-Reply-To:
References:
Message-ID:

On Mon, 2 Sep 2024 13:09:09 GMT, Joel Sikström wrote:

> When a young collection is in the "concurrent mark" phase and is scanning remembered sets (remsets) to find roots into the young gen, it "consumes" the remset when it is finished by clearing it (using memset).
>
> At the same time, an old collection might find a completely empty/garbage page that it will insert into the page cache. Before inserting into the page cache, the page's remset is cleared (using memset).
>
> These two operations might interfere, resulting in both threads clearing the memory simultaneously.
>
> This race was found in connection to https://bugs.openjdk.org/browse/JDK-8339161, where I experimented with replacing some clears of remsets with frees and got a crash on Windows from memset when operating on freed memory.
>
> This patch makes sure that remsets are only cleared in the "concurrent mark" phase if not already handled by an old collection.
>
> Tested with tiers 1-3 and with a local test that crashes if both threads handle the remset.

This pull request has now been integrated.
Changeset: 7ad61605
Author: Joel Sikström
Committer: Stefan Karlsson
URL: https://git.openjdk.org/jdk/commit/7ad61605f1669f51a97f4f263a7afaa9ab7706be
Stats: 26 lines in 2 files changed: 14 ins; 9 del; 3 mod

8339163: ZGC: Race in clearing of remembered sets

Reviewed-by: stefank, eosterlund, aboldtch

-------------

PR: https://git.openjdk.org/jdk/pull/20821

From kbarrett at openjdk.org Wed Sep 4 19:37:19 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 4 Sep 2024 19:37:19 GMT
Subject: RFR: 8339369: G1: TestVerificationInConcurrentCycle.java fails with "Missing rem set entry" when using "-XX:G1RSetUpdatingPauseTimePercent=0 -XX:G1UpdateBufferSize=2"
In-Reply-To:
References:
Message-ID: <7hxjryXYK8mRcjIc9ph0g0FJEqknP9-UBRZZhizFJSY=.9609fa18-6a9a-485c-a813-3919834ffbd4@github.com>

On Tue, 3 Sep 2024 12:12:56 GMT, Ivan Walulya wrote:

> Please review this patch to reset the per-region cardsets in the later phases of the full GC. This ensures that remset verification can proceed without considering whether the cardsets are combined or not.
>
> Testing: passes the test cited in the bug report and Tiers 1-3

Looks good.

-------------

Marked as reviewed by kbarrett (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20835#pullrequestreview-2281081437

From iwalulya at openjdk.org Thu Sep 5 08:21:00 2024
From: iwalulya at openjdk.org (Ivan Walulya)
Date: Thu, 5 Sep 2024 08:21:00 GMT
Subject: RFR: 8339369: G1: TestVerificationInConcurrentCycle.java fails with "Missing rem set entry" when using "-XX:G1RSetUpdatingPauseTimePercent=0 -XX:G1UpdateBufferSize=2"
In-Reply-To:
References:
Message-ID:

On Wed, 4 Sep 2024 08:01:24 GMT, Thomas Schatzl wrote:

>> Please review this patch to reset the per-region cardsets in the later phases of the full GC. This ensures that remset verification can proceed without considering whether the cardsets are combined or not.
>>
>> Testing: passes the test cited in the bug report and Tiers 1-3
>
> lgtm.
>
> The original code basically dropped the remsets to the young gen which failed by uninstalling them (the remaining remset is empty after all).

Thanks @tschatzl and @kimbarrett for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20835#issuecomment-2330898844

From iwalulya at openjdk.org Thu Sep 5 08:21:02 2024
From: iwalulya at openjdk.org (Ivan Walulya)
Date: Thu, 5 Sep 2024 08:21:02 GMT
Subject: Integrated: 8339369: G1: TestVerificationInConcurrentCycle.java fails with "Missing rem set entry" when using "-XX:G1RSetUpdatingPauseTimePercent=0 -XX:G1UpdateBufferSize=2"
In-Reply-To:
References:
Message-ID: <_Y9Z3b2iSPXZQ82AeH8cH144Jb8zyqnTGIMzW2xqvOY=.d64c1478-bb7c-4b42-99c0-2aec1802e09b@github.com>

On Tue, 3 Sep 2024 12:12:56 GMT, Ivan Walulya wrote:

> Please review this patch to reset the per-region cardsets in the later phases of the full GC. This ensures that remset verification can proceed without considering whether the cardsets are combined or not.
>
> Testing: passes the test cited in the bug report and Tiers 1-3

This pull request has now been integrated.

Changeset: 96a0502d
Author: Ivan Walulya
URL: https://git.openjdk.org/jdk/commit/96a0502d624e3eff1b00a7c63e8b3a27870b475e
Stats: 3 lines in 2 files changed: 2 ins; 1 del; 0 mod

8339369: G1: TestVerificationInConcurrentCycle.java fails with "Missing rem set entry" when using "-XX:G1RSetUpdatingPauseTimePercent=0 -XX:G1UpdateBufferSize=2"

Reviewed-by: tschatzl, kbarrett

-------------

PR: https://git.openjdk.org/jdk/pull/20835

From rcastanedalo at openjdk.org Thu Sep 5 10:05:12 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 5 Sep 2024 10:05:12 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v15]
In-Reply-To:
References:
Message-ID:

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms.
> See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>
> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>
> ## Summary of the Changes
>
> ### Platform-Independent Changes (`src/hotspot/share`)
>
> These consist mainly of:
>
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
>
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>
> #### ADL Changes
>
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>
> #### `G1BarrierSetAssembler` Changes
>
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...

Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:

Remove unnecessary g1LoadXVolatile instructions in aarch64

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/19746/files
 - new: https://git.openjdk.org/jdk/pull/19746/files/ed9c0232..9821e795

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=14
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=13-14

Stats: 71 lines in 2 files changed: 4 ins; 51 del; 16 mod
Patch: https://git.openjdk.org/jdk/pull/19746.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org Thu Sep 5 10:09:55 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 5 Sep 2024 10:09:55 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v13]
In-Reply-To:
References:
Message-ID:

On Tue, 3 Sep 2024 12:04:09 GMT, Martin Doerr wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with four additional commits since the last revision:
>>
>> - Increase test coverage of new-object stores with different type information
>> - Refactor the two post-barrier removal cases into a single expression
>> - Remove unnecessary early null-based post-barrier elision
>> - Make store capturability test G1-specific and more precise
>
> src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 646:
>
>> 644: instruct g1LoadPVolatile(iRegPNoSp dst, indirect mem, iRegPNoSp tmp1, iRegPNoSp tmp2, rFlagsReg cr)
>> 645: %{
>> 646: predicate(UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0);
>
> Remark: This node should never match because the `referent` is never volatile (same for `g1LoadNVolatile`): https://github.com/openjdk/jdk/blob/7a418fc07464fe359a0b45b6d797c65c573770cb/src/java.base/share/classes/java/lang/ref/Reference.java#L157
> Hence, `needs_acquiring_load(n)` and `n->as_Load()->barrier_data() != 0` are never true at the same time.
> Not sure if this should somehow be reflected in the code. I've inserted `ShouldNotReachHere` on PPC64.

Good catch, thanks! I simply removed the `g1LoadXVolatile` patterns and added a comment explaining why they are not needed (commit 9821e795). The matcher should already fail if we ever end up with an erroneous `LoadX` node `n` for which `UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0` holds.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1745185394

From mdoerr at openjdk.org Thu Sep 5 10:45:55 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Thu, 5 Sep 2024 10:45:55 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v13]
In-Reply-To:
References:
Message-ID:

On Thu, 5 Sep 2024 10:07:14 GMT, Roberto Castañeda Lozano wrote:

>> src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 646:
>>
>>> 644: instruct g1LoadPVolatile(iRegPNoSp dst, indirect mem, iRegPNoSp tmp1, iRegPNoSp tmp2, rFlagsReg cr)
>>> 645: %{
>>> 646: predicate(UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0);
>>
>> Remark: This node should never match because the `referent` is never volatile (same for `g1LoadNVolatile`): https://github.com/openjdk/jdk/blob/7a418fc07464fe359a0b45b6d797c65c573770cb/src/java.base/share/classes/java/lang/ref/Reference.java#L157
>> Hence, `needs_acquiring_load(n)` and `n->as_Load()->barrier_data() != 0` are never true at the same time.
>> Not sure if this should somehow be reflected in the code. I've inserted `ShouldNotReachHere` on PPC64.
>
> Good catch, thanks! I simply removed the `g1LoadXVolatile` patterns and added a comment explaining why they are not needed (commit 9821e795). The matcher should already fail if we ever end up with an erroneous `LoadX` node `n` for which `UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0` holds.

Correct. Only the error message may not be so nice ("bad AD file"). PPC64 still has `g1LoadP_acq` and `g1LoadN_acq` which could also be replaced by a comment. But it's not important.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1745230285

From duke at openjdk.org Thu Sep 5 12:23:17 2024
From: duke at openjdk.org (Joel Sikström)
Date: Thu, 5 Sep 2024 12:23:17 GMT
Subject: RFR: 8339579: ZGC: Race results in only one of two remembered sets being cleared
Message-ID:

https://github.com/openjdk/jdk/pull/20821 introduced a fix that skips clearing of a potential remembered set in the young collection in favor of doing it only in the old collection.

The code responsible for clearing the remset (`ZRememberedSet::clear_all()`) is dependent on the `_current` variable in ZRememberedSet not being altered in between the two "clear_*" calls in `ZRememberedSet::clear_all()`. If the remembered set is being cleared in the old collection, and successfully clears the current remset with `clear_current()`, a young collection might then flip/swap the current/previous remembered sets by changing the `_current` value in ZRememberedSet. This would mean that the call to clear_previous() would clean the same bitmap again, resulting in one of the two bitmaps not being cleared at all. This will later crash in an assert checking if both bitmaps are empty/clear when the page is being freed.

Code in question:
```c++
void ZRememberedSet::clear_all() {
  clear_current();
  clear_previous();
}
```

This PR makes sure that clearing is done independently of what the `_current` value is, by accessing the two bitmaps directly.
-------------

Commit messages:
 - 8339579: ZGC: Race results in only one of two remembered sets being cleared

Changes: https://git.openjdk.org/jdk/pull/20869/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20869&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8339579
Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/20869.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20869/head:pull/20869

PR: https://git.openjdk.org/jdk/pull/20869

From stefank at openjdk.org Thu Sep 5 12:44:49 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 5 Sep 2024 12:44:49 GMT
Subject: RFR: 8339579: ZGC: Race results in only one of two remembered sets being cleared
In-Reply-To:
References:
Message-ID: <99UpwbYB_lIljopYQicstFc76LlA0icGylPglJpGW9Q=.8405b963-ea12-4a14-893b-74c645a3b52a@github.com>

On Thu, 5 Sep 2024 12:18:48 GMT, Joel Sikström wrote:

> https://github.com/openjdk/jdk/pull/20821 introduced a fix that skips clearing of a potential remembered set in the young collection in favor of doing it only in the old collection.
>
> The code responsible for clearing the remset (`ZRememberedSet::clear_all()`) is dependent on the `_current` variable in ZRememberedSet not being altered in between the two "clear_*" calls in `ZRememberedSet::clear_all()`. If the remembered set is being cleared in the old collection, and successfully clears the current remset with `clear_current()`, a young collection might then flip/swap the current/previous remembered sets by changing the `_current` value in ZRememberedSet. This would mean that the call to clear_previous() would clean the same bitmap again, resulting in one of the two bitmaps not being cleared at all. This will later crash in an assert checking if both bitmaps are empty/clear when the page is being freed.
>
> Code in question:
> ```c++
> void ZRememberedSet::clear_all() {
>   clear_current();
>   clear_previous();
> }
> ```
>
> This PR makes sure that clearing is done independently of what the `_current` value is, by accessing the two bitmaps directly.

Looks good!

-------------

Marked as reviewed by stefank (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20869#pullrequestreview-2282883401

From sjohanss at openjdk.org Thu Sep 5 13:18:54 2024
From: sjohanss at openjdk.org (Stefan Johansson)
Date: Thu, 5 Sep 2024 13:18:54 GMT
Subject: RFR: 8339579: ZGC: Race results in only one of two remembered sets being cleared
In-Reply-To:
References:
Message-ID:

On Thu, 5 Sep 2024 12:18:48 GMT, Joel Sikström wrote:

> https://github.com/openjdk/jdk/pull/20821 introduced a fix that skips clearing of a potential remembered set in the young collection in favor of doing it only in the old collection.
>
> The code responsible for clearing the remset (`ZRememberedSet::clear_all()`) is dependent on the `_current` variable in ZRememberedSet not being altered in between the two "clear_*" calls in `ZRememberedSet::clear_all()`. If the remembered set is being cleared in the old collection, and successfully clears the current remset with `clear_current()`, a young collection might then flip/swap the current/previous remembered sets by changing the `_current` value in ZRememberedSet. This would mean that the call to clear_previous() would clean the same bitmap again, resulting in one of the two bitmaps not being cleared at all. This will later crash in an assert checking if both bitmaps are empty/clear when the page is being freed.
>
> Code in question:
> ```c++
> void ZRememberedSet::clear_all() {
>   clear_current();
>   clear_previous();
> }
> ```
>
> This PR makes sure that clearing is done independently of what the `_current` value is, by accessing the two bitmaps directly.

Looks good.

-------------

Marked as reviewed by sjohanss (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20869#pullrequestreview-2283029963

From duke at openjdk.org Thu Sep 5 13:42:53 2024
From: duke at openjdk.org (Joel Sikström)
Date: Thu, 5 Sep 2024 13:42:53 GMT
Subject: Integrated: 8339579: ZGC: Race results in only one of two remembered sets being cleared
In-Reply-To:
References:
Message-ID:

On Thu, 5 Sep 2024 12:18:48 GMT, Joel Sikström wrote:

> https://github.com/openjdk/jdk/pull/20821 introduced a fix that skips clearing of a potential remembered set in the young collection in favor of doing it only in the old collection.
>
> The code responsible for clearing the remset (`ZRememberedSet::clear_all()`) is dependent on the `_current` variable in ZRememberedSet not being altered in between the two "clear_*" calls in `ZRememberedSet::clear_all()`. If the remembered set is being cleared in the old collection, and successfully clears the current remset with `clear_current()`, a young collection might then flip/swap the current/previous remembered sets by changing the `_current` value in ZRememberedSet. This would mean that the call to `clear_previous()` would clean the same bitmap again, resulting in one of the two bitmaps not being cleared at all. This will later crash in an assert checking if both bitmaps are empty/clear when the page is being freed.
>
> Code in question:
> ```c++
> void ZRememberedSet::clear_all() {
>   clear_current();
>   clear_previous();
> }
> ```
>
> This PR makes sure that clearing is done independently of what the `_current` value is, by accessing the two bitmaps directly.
>
> Tested with tier5, where the failures/crashes occurred before this fix, and with a reproducer of the crash as well.

This pull request has now been integrated.
Changeset: ab656c3a
Author: Joel Sikström
Committer: Stefan Karlsson
URL: https://git.openjdk.org/jdk/commit/ab656c3aab8157ed8e70bc126881cbadc825de93
Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod

8339579: ZGC: Race results in only one of two remembered sets being cleared

Reviewed-by: stefank, sjohanss

-------------

PR: https://git.openjdk.org/jdk/pull/20869

From duke at openjdk.org Thu Sep 5 14:03:54 2024
From: duke at openjdk.org (Joel Sikström)
Date: Thu, 5 Sep 2024 14:03:54 GMT
Subject: RFR: 8339579: ZGC: Race results in only one of two remembered sets being cleared
In-Reply-To: <99UpwbYB_lIljopYQicstFc76LlA0icGylPglJpGW9Q=.8405b963-ea12-4a14-893b-74c645a3b52a@github.com>
References: <99UpwbYB_lIljopYQicstFc76LlA0icGylPglJpGW9Q=.8405b963-ea12-4a14-893b-74c645a3b52a@github.com>
Message-ID:

On Thu, 5 Sep 2024 12:42:25 GMT, Stefan Karlsson wrote:

>> https://github.com/openjdk/jdk/pull/20821 introduced a fix that skips clearing of a potential remembered set in the young collection in favor of doing it only in the old collection.
>>
>> The code responsible for clearing the remset (`ZRememberedSet::clear_all()`) is dependent on the `_current` variable in ZRememberedSet not being altered in between the two "clear_*" calls in `ZRememberedSet::clear_all()`. If the remembered set is being cleared in the old collection, and successfully clears the current remset with `clear_current()`, a young collection might then flip/swap the current/previous remembered sets by changing the `_current` value in ZRememberedSet. This would mean that the call to `clear_previous()` would clean the same bitmap again, resulting in one of the two bitmaps not being cleared at all. This will later crash in an assert checking if both bitmaps are empty/clear when the page is being freed.
>>
>> Code in question:
>> ```c++
>> void ZRememberedSet::clear_all() {
>>   clear_current();
>>   clear_previous();
>> }
>> ```
>>
>> This PR makes sure that clearing is done independently of what the `_current` value is, by accessing the two bitmaps directly.
>>
>> Tested with tier5, where the failures/crashes occurred before this fix, and with a reproducer of the crash as well.
>
> Looks good!

Thank you for the reviews! @stefank @kstefanj

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20869#issuecomment-2331767183

From fjiang at openjdk.org Thu Sep 5 14:56:02 2024
From: fjiang at openjdk.org (Feilong Jiang)
Date: Thu, 5 Sep 2024 14:56:02 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10]
In-Reply-To:
References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com>
Message-ID:

On Wed, 4 Sep 2024 09:07:23 GMT, Roberto Castañeda Lozano wrote:

>> @robcasloz: Thanks for the updates! I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e
>> Do you prefer integrating it soon? Otherwise, I could keep it separate and do more rebasing and testing while this PR evolves further.
>
>> I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e
> Do you prefer integrating it soon?
>
> That's great, thank you for your work and your comments and suggestions, Martin! I just merged your implementation.

Hi @robcasloz, here is the implementation for RISC-V: https://github.com/feilongjiang/jdk/commit/1c012cfd4a1f4d6e39d6f4798281423f884e97a6
We are still testing the latest changes; results will be updated later.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2331932063

From rcastanedalo at openjdk.org Thu Sep 5 16:06:59 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 5 Sep 2024 16:06:59 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10]
In-Reply-To:
References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com>
Message-ID:

On Wed, 4 Sep 2024 09:07:23 GMT, Roberto Castañeda Lozano wrote:

>> @robcasloz: Thanks for the updates! I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e
>> Do you prefer integrating it soon? Otherwise, I could keep it separate and do more rebasing and testing while this PR evolves further.
>
>> I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e
> Do you prefer integrating it soon?
>
> That's great, thank you for your work and your comments and suggestions, Martin! I just merged your implementation.

> Hi @robcasloz, here is the implementation for RISC-V: [feilongjiang at 1c012cf](https://github.com/feilongjiang/jdk/commit/1c012cfd4a1f4d6e39d6f4798281423f884e97a6)
> We are still testing the latest changes; results will be updated later.

Great, thanks @feilongjiang! Just let me know when you are done with testing and I will merge it into this changeset.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2332119624

From mdoerr at openjdk.org Thu Sep 5 18:18:56 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Thu, 5 Sep 2024 18:18:56 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v15]
In-Reply-To:
References:
Message-ID:

On Thu, 5 Sep 2024 10:05:12 GMT, Roberto Castañeda Lozano wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>>
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>>
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>>
>> ## Summary of the Changes
>>
>> ### Platform-Independent Changes (`src/hotspot/share`)
>>
>> These consist mainly of:
>>
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>>
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>>
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>>
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>>
>> #### ADL Changes
>>
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>>
>> #### `G1BarrierSetAssembler` Changes
>>
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove unnecessary g1LoadXVolatile instructions in aarch64

I've implemented the same cleanup as on aarch64: https://github.com/TheRealMDoerr/jdk/commit/ad662a256034a09156b1b43673d2640a119740b2
Would be nice if you could apply it. Thanks!
In case you want to merge further updates from head, I have no objections.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2332365001

From duke at openjdk.org Thu Sep 5 20:45:57 2024
From: duke at openjdk.org (halkosajtarevic)
Date: Thu, 5 Sep 2024 20:45:57 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v15]
In-Reply-To:
References:
Message-ID:

On Thu, 5 Sep 2024 10:05:12 GMT, Roberto Castañeda Lozano wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>>
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>>
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>>
>> ## Summary of the Changes
>>
>> ### Platform-Independent Changes (`src/hotspot/share`)
>>
>> These consist mainly of:
>>
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>>
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>>
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>>
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>>
>> #### ADL Changes
>>
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>>
>> #### `G1BarrierSetAssembler` Changes
>>
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove unnecessary g1LoadXVolatile instructions in aarch64

Sorry, one maybe dumb question, hopefully matching the context here: Is this whole handling also required if those references point to instances of enums? Or could we do some further optimizations in such cases? Or is that exactly what C2 is doing afterwards?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2332586175

From sjohanss at openjdk.org Fri Sep 6 07:20:20 2024
From: sjohanss at openjdk.org (Stefan Johansson)
Date: Fri, 6 Sep 2024 07:20:20 GMT
Subject: RFR: 8339387: ZGC: Synchronize medium page allocation
Message-ID:

Please review this change to synchronize medium page allocations in ZGC.

**Summary**
In ZGC, objects of a certain size class are allocated in medium-sized pages.
For each age there is a single medium page shared by all mutators. When this page gets full, all threads that try to do a medium page allocation will try to allocate and install a new medium page, but only one will succeed. This can lead to a lot of unnecessary medium page allocations, which in turn can lead to unnecessary page cache flushing.

This change introduces synchronization to only allow a single thread to allocate the medium page in the common case.

**Testing**
* Functional testing through mach5 tier1-7 using ZGC
* Performance testing through aurora to verify that no regressions occur
* Manual testing to verify performance
* Manual testing to verify we avoid page cache flushing

-------------

Commit messages:
 - StefanK comments and reuse of share page addr
 - 8339387: ZGC: Synchronize medium page allocation

Changes: https://git.openjdk.org/jdk/pull/20883/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20883&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8339387
Stats: 57 lines in 2 files changed: 49 ins; 5 del; 3 mod
Patch: https://git.openjdk.org/jdk/pull/20883.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20883/head:pull/20883

PR: https://git.openjdk.org/jdk/pull/20883

From eosterlund at openjdk.org Fri Sep 6 08:25:52 2024
From: eosterlund at openjdk.org (Erik Österlund)
Date: Fri, 6 Sep 2024 08:25:52 GMT
Subject: RFR: 8339387: ZGC: Synchronize medium page allocation
In-Reply-To:
References:
Message-ID:

On Fri, 6 Sep 2024 07:14:11 GMT, Stefan Johansson wrote:

> Please review this change to synchronize medium page allocations in ZGC.
>
> **Summary**
> In ZGC, objects of a certain size class are allocated in medium-sized pages. For each age there is a single medium page shared by all mutators. When this page gets full, all threads that try to do a medium page allocation will try to allocate and install a new medium page, but only one will succeed.
> This can lead to a lot of unnecessary medium page allocations, which in turn can lead to unnecessary page cache flushing.
>
> This change introduces synchronization to only allow a single thread to allocate the medium page in the common case.
>
> **Testing**
> * Functional testing through mach5 tier1-7 using ZGC
> * Performance testing through aurora to verify that no regressions occur
> * Manual testing to verify performance
> * Manual testing to verify we avoid page cache flushing

Looks good! Thanks for fixing.

-------------

Marked as reviewed by eosterlund (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20883#pullrequestreview-2285376634

From sjohanss at openjdk.org Fri Sep 6 08:25:52 2024
From: sjohanss at openjdk.org (Stefan Johansson)
Date: Fri, 6 Sep 2024 08:25:52 GMT
Subject: RFR: 8339387: ZGC: Synchronize medium page allocation
In-Reply-To:
References:
Message-ID: <-eVvccFIXfKeQNnEsbFJpW_C8WUmYx4fArqTUuTBoY4=.8551f684-57ed-4ba3-aff1-db367532a5b9@github.com>

On Fri, 6 Sep 2024 07:14:11 GMT, Stefan Johansson wrote:

> Please review this change to synchronize medium page allocations in ZGC.
>
> **Summary**
> In ZGC, objects of a certain size class are allocated in medium-sized pages. For each age there is a single medium page shared by all mutators. When this page gets full, all threads that try to do a medium page allocation will try to allocate and install a new medium page, but only one will succeed. This can lead to a lot of unnecessary medium page allocations, which in turn can lead to unnecessary page cache flushing.
>
> This change introduces synchronization to only allow a single thread to allocate the medium page in the common case.
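The synchronization described in the summary can be sketched roughly as below. This is a hypothetical C++ model, not the code from the PR: `PageSketch` and `SharedMediumPageSketch` are invented names, pages are plain bump-pointer regions, and a `std::mutex` stands in for whatever lock ZGC actually uses. What it illustrates is the re-check under the lock: a thread that finds the shared page full only allocates a replacement if no other thread has already installed one while it was waiting, so losers of the race retry on the new page instead of each allocating their own.

```c++
#include <atomic>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a ZGC medium page: a bump-pointer region
// that many mutator threads allocate from concurrently.
struct PageSketch {
  const std::size_t capacity;
  std::atomic<std::size_t> top{0};

  explicit PageSketch(std::size_t cap) : capacity(cap) {}

  // Lock-free bump-pointer allocation; returns false when the page is full.
  bool try_alloc(std::size_t size) {
    std::size_t old_top = top.load();
    do {
      if (old_top + size > capacity) {
        return false;
      }
    } while (!top.compare_exchange_weak(old_top, old_top + size));
    return true;
  }
};

// Hypothetical shared-page allocator: only the thread holding _lock
// installs a replacement page, after re-checking that the page it saw
// as full is still the installed one.
class SharedMediumPageSketch {
 public:
  explicit SharedMediumPageSketch(std::size_t page_size) : _page_size(page_size) {}

  bool alloc(std::size_t size) {
    if (size > _page_size) {
      return false;  // oversized requests are out of scope for this sketch
    }
    for (;;) {
      PageSketch* page = _shared.load();
      if (page != nullptr && page->try_alloc(size)) {
        return true;  // fast path, no locking
      }
      std::lock_guard<std::mutex> guard(_lock);
      if (_shared.load() != page) {
        continue;  // another thread already replaced the page; retry on it
      }
      _pages.push_back(std::make_unique<PageSketch>(_page_size));
      _shared.store(_pages.back().get());
      // Loop around and allocate from the freshly installed page.
    }
  }

  std::size_t pages_allocated() {
    std::lock_guard<std::mutex> guard(_lock);
    return _pages.size();
  }

 private:
  const std::size_t _page_size;
  std::atomic<PageSketch*> _shared{nullptr};
  std::mutex _lock;
  std::vector<std::unique_ptr<PageSketch>> _pages;  // owns every page
};
```

For example, with a page size of 100, allocations of 60 and 30 share the first page, and the next allocation of 60 installs exactly one new page.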
>
> **Testing**
> * Functional testing through mach5 tier1-7 using ZGC
> * Performance testing through aurora to verify that no regressions occur
> * Manual testing to verify performance
> * Manual testing to verify we avoid page cache flushing

As mentioned in the summary, this change shows no direct performance improvement in most benchmarks. But looking at memory usage in our logs, we can see improvements in how ZGC uses memory. The statistics logging below, taken from the end of a benchmark run where medium objects are in use, shows some of these improvements. Even if they don't translate into a score improvement, they will improve the latency of some allocation operations.

Baseline:

    [369.264s][info][gc,stats ]                                  Last 10s    Last 10m
    [369.264s][info][gc,stats ]                                  Avg / Max   Avg / Max
    [369.264s][info][gc,stats ]   Memory: Allocation Rate        438 / 950   684 / 2846   684 / 2846   684 / 2846   MB/s
    [369.264s][info][gc,stats ]   Memory: Defragment             0 / 0       18 / 190     18 / 190     18 / 190     ops/s
    [369.264s][info][gc,stats ]   Memory: Page Cache Flush       0 / 0       36 / 380     36 / 380     36 / 380     MB/s
    [369.264s][info][gc,stats ]   Memory: Undo Page Allocation   0 / 1       2 / 71       2 / 71       2 / 71       ops/s

With this change:

    [369.104s][info][gc,stats ]   Memory: Allocation Rate        465 / 620   612 / 1086   612 / 1086   612 / 1086   MB/s
    [369.104s][info][gc,stats ]   Memory: Defragment             0 / 0       0 / 0        0 / 0        0 / 0        ops/s
    [369.104s][info][gc,stats ]   Memory: Page Cache Flush       0 / 0       0 / 0        0 / 0        0 / 0        MB/s
    [369.104s][info][gc,stats ]   Memory: Undo Page Allocation   0 / 0       0 / 8        0 / 8        0 / 8        ops/s

Additional details about the different lines:

**Allocation rate** - The maximum allocation rate is down, because it's not inflated by many unnecessary medium page allocations happening at once.

**Defragment** - ZGC tries to defragment the virtual address space by remapping memory used by small pages from high addresses to low.
This will only happen when the page cache only caches medium and large pages, which might be the case after a set of medium page allocations that are later undone. In this run all such defragmentations were avoided.

**Page Cache Flush** - When there are no medium (or large) pages available in the cache, the cache needs to be flushed to allow the creation of a new page. Without the unnecessary allocations, ZGC is able to avoid flushing in this benchmark.

**Undo Page Allocation** - When a page is allocated but later found not to be needed, we undo the page allocation. This can happen for small pages as well, so we still have some undos. But the ones for medium pages are avoided.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20883#issuecomment-2333513053

From rcastanedalo at openjdk.org Fri Sep 6 08:49:35 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 6 Sep 2024 08:49:35 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16]
In-Reply-To:
References:
Message-ID:

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>
> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>
> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>
> ## Summary of the Changes
>
> ### Platform-Independent Changes (`src/hotspot/share`)
>
> These consist mainly of:
>
> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
> - temporary support for porting the JEP to the remaining platforms.
>
> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>
> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>
> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>
> #### ADL Changes
>
> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>
> #### `G1BarrierSetAssembler` Changes
>
> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...
Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision:

 - Merge remote-tracking branch 'TheRealMDoerr/8334111_PPC64_G1_Barriers_V2' into JDK-8334060-g1-late-barrier-expansion
 - Cleanup g1_ppc.ad

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/19746/files
 - new: https://git.openjdk.org/jdk/pull/19746/files/9821e795..22e07ef0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=15
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=14-15

Stats: 40 lines in 1 file changed: 4 ins; 30 del; 6 mod
Patch: https://git.openjdk.org/jdk/pull/19746.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org Fri Sep 6 08:49:35 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 6 Sep 2024 08:49:35 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v15]
In-Reply-To:
References:
Message-ID:

On Thu, 5 Sep 2024 10:05:12 GMT, Roberto Castañeda Lozano wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>>
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>>
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>>
>> ## Summary of the Changes
>>
>> ### Platform-Independent Changes (`src/hotspot/share`)
>>
>> These consist mainly of:
>>
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>>
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>>
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>>
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>>
>> #### ADL Changes
>>
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>>
>> #### `G1BarrierSetAssembler` Changes
>>
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove unnecessary g1LoadXVolatile instructions in aarch64

> I've implemented the same cleanup as on aarch64: [TheRealMDoerr at ad662a2](https://github.com/TheRealMDoerr/jdk/commit/ad662a256034a09156b1b43673d2640a119740b2)
> Would be nice if you could apply it. Thanks!

Sure, merged now (commit 22e07ef03a).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333553391

From rcastanedalo at openjdk.org Fri Sep 6 09:43:57 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 6 Sep 2024 09:43:57 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v15]
In-Reply-To:
References:
Message-ID: <01AjvN-E7dfMD3IbJHbjLpHwe3VbMuPPloi3vt7Bxxk=.bc1bb5e5-6cd9-414d-8c76-a224fe856dcd@github.com>

On Thu, 5 Sep 2024 20:36:01 GMT, halkosajtarevic wrote:

> Sorry, one maybe dumb question, hopefully matching the context here: Is this whole handling also required if those references point to instances of enums? Or could we do some further optimizations in such cases? Or is that exactly what C2 is doing afterwards?

Hi, do you mean whether G1 requires barriers when writing enum instances into object fields, as in `storeEnum` in this example?

    (...)
    public enum Day {MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY};

    static class MyObject {
        Day day;
    }

    public static void storeEnum(MyObject o, Day d) {
        o.day = d;
    }
    (...)
    MyObject o = new MyObject();
    Day d = Day.TUESDAY;
    storeEnum(o, d);
    (...)

If so, the answer is yes: C2 treats this case as any other object write and generates GC barriers accordingly. Do you have any specific optimization in mind?
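For context on the enum question, a simplified C++ model of the checks a G1 reference-store barrier performs is sketched below. It is an illustration under stated assumptions, not HotSpot code: addresses are plain integers with 0 meaning null, the 1 MB region size and the `Card` states are hypothetical stand-ins, and the StoreLoad fence that the real post-barrier issues before its dirty-card check is omitted. It shows that the runtime filters depend on marking state, the overwritten value, the addresses involved, nullness, and card state, rather than on the class of the stored object.

```c++
#include <cstdint>

// Hypothetical, simplified model of the decisions a G1 barrier makes for a
// reference store `*field = new_val`. Not HotSpot code: addresses are plain
// integers, 0 means null, and the constants below are illustrative only.
constexpr unsigned kLogRegionBytes = 20;  // assume 1 MB heap regions

enum class Card : std::uint8_t { Young, Clean, Dirty };

struct StoreSketch {
  bool marking_active;        // is concurrent marking (SATB) in progress?
  std::uintptr_t field_addr;  // address of the field being written
  std::uintptr_t old_val;     // reference being overwritten
  std::uintptr_t new_val;     // reference being stored
  Card card;                  // card covering field_addr
};

// Pre-barrier (SATB): records the *overwritten* value while marking is
// active, so it depends on old_val rather than on what is being stored.
inline bool pre_barrier_takes_slow_path(const StoreSketch& s) {
  return s.marking_active && s.old_val != 0;
}

// Post-barrier: filters out stores that cannot need remembered-set work.
// The real barrier also issues a StoreLoad fence before the dirty check.
inline bool post_barrier_takes_slow_path(const StoreSketch& s) {
  if (((s.field_addr ^ s.new_val) >> kLogRegionBytes) == 0) {
    return false;  // field and new value are in the same region
  }
  if (s.new_val == 0) {
    return false;  // storing null creates no cross-region reference
  }
  if (s.card == Card::Young) {
    return false;  // field lives in a young region
  }
  if (s.card == Card::Dirty) {
    return false;  // card is already dirty
  }
  return true;     // dirty the card and enqueue it for refinement
}
```

In this model, the outcome of each filter is decided by runtime state that the static type of the stored value (such as `Day`) does not determine on its own.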
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333674779

From duke at openjdk.org Fri Sep 6 10:14:59 2024
From: duke at openjdk.org (halkosajtarevic)
Date: Fri, 6 Sep 2024 10:14:59 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16]
In-Reply-To:
References:
Message-ID:

On Fri, 6 Sep 2024 08:49:35 GMT, Roberto Castañeda Lozano wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>>
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>>
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>>
>> ## Summary of the Changes
>>
>> ### Platform-Independent Changes (`src/hotspot/share`)
>>
>> These consist mainly of:
>>
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>>
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>>
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>>
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Casta?eda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'TheRealMDoerr/8334111_PPC64_G1_Barriers_V2' into JDK-8334060-g1-late-barrier-expansion > - Cleanup g1_ppc.ad Yes exactly, that was what I meant. I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333731725

From mbaesken at openjdk.org Fri Sep 6 10:32:01 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Fri, 6 Sep 2024 10:32:01 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate

The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
shows this error when running with ubsan enabled:

    src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
    #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
    #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
    #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
    #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
    #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
    #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
    #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858

-------------

Commit messages:
 - JDK-8339648

Changes: https://git.openjdk.org/jdk/pull/20888/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20888&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8339648
  Stats: 6 lines in 1 file changed: 4 ins; 1 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20888.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20888/head:pull/20888

PR: https://git.openjdk.org/jdk/pull/20888

From mbaesken at openjdk.org Fri Sep 6 10:38:49 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Fri, 6 Sep 2024 10:38:49 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate

On Fri, 6 Sep 2024 10:26:19 GMT, Matthias Baesken wrote:

> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
> shows this error when running with ubsan enabled
> ...

So we should avoid the division in case the divisor is zero and rewrite the code a bit.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20888#issuecomment-2333770996

From amitkumar at openjdk.org Fri Sep 6 10:43:54 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Fri, 6 Sep 2024 10:43:54 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v15]
In-Reply-To: <01AjvN-E7dfMD3IbJHbjLpHwe3VbMuPPloi3vt7Bxxk=.bc1bb5e5-6cd9-414d-8c76-a224fe856dcd@github.com>
References: <01AjvN-E7dfMD3IbJHbjLpHwe3VbMuPPloi3vt7Bxxk=.bc1bb5e5-6cd9-414d-8c76-a224fe856dcd@github.com>
Message-ID: <5SBKgUwrPmIXH0hA64aKRsYZiHMg0M0uh_IjFq_xdAo=.f323ec69-adf3-4722-a5cb-0c49cfb8c5b1@github.com>

On Fri, 6 Sep 2024 09:40:56 GMT, Roberto Castañeda Lozano wrote:

>> Sorry, one maybe dumb question, hopefully matching the context here:
>> Is this whole handling also required if those references point to instances of enums? ...
>
> Hi, do you mean whether G1 requires barriers when writing enum instances into object fields, as in `storeEnum` in this example?
> ...
> If so, the answer is yes: C2 treats this case as any other object write and generates GC barriers accordingly. Do you have any specific optimization in mind?

Hi @robcasloz, you can pick up s390x patch from here: https://github.com/offamitkumar/jdk/commit/6663433c4aa17925f699eaa8995cdc0cd78c0034

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333779374

From rcastanedalo at openjdk.org Fri Sep 6 12:07:57 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 6 Sep 2024 12:07:57 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16]

On Fri, 6 Sep 2024 10:12:19 GMT, halkosajtarevic wrote:

> I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected.

As far as I understand, from a GC perspective enum instances are handled just like any other class instance, so I do not think there is any special GC barrier optimization opportunity for them. Maybe some GC expert can comment further on this or correct me if I am wrong.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333907222

From duke at openjdk.org Fri Sep 6 12:49:01 2024
From: duke at openjdk.org (Joel Sikström)
Date: Fri, 6 Sep 2024 12:49:01 GMT
Subject: RFR: 8339661: ZGC: Move some page resets and verification to callsites

Currently, `ZPage::reset()` does different things depending on where the page is being reset and what it will be used for. This makes the checks somewhat hard to understand and their edge cases hard to follow.

By moving some of the reset logic that is now part of `ZPage::reset()` to the callsites, the reasons behind some operations become easier to understand when reading the code.

Main highlights:
- Clear logic behind initializing remsets at callsites, now guarded by asserts in `ZPage::remset_initialize()`.
- `ZPage::clone_limited()` retains the value of the top-pointer.
- The kind of verification for remsets is now at callsites:
  - Allocations from the page cache, and only if the page got a remset
  - Old-to-old in-place relocations, where only the inactive remset is checked

-------------

Commit messages:
 - 8339661: ZGC: Move some page resets and verification to callsites

Changes: https://git.openjdk.org/jdk/pull/20890/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20890&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8339661
  Stats: 127 lines in 6 files changed: 34 ins; 64 del; 29 mod
  Patch: https://git.openjdk.org/jdk/pull/20890.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20890/head:pull/20890

PR: https://git.openjdk.org/jdk/pull/20890

From aboldtch at openjdk.org Fri Sep 6 12:51:55 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Fri, 6 Sep 2024 12:51:55 GMT
Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate

On Fri, 6 Sep 2024 10:26:19 GMT, Matthias Baesken wrote:

> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
> shows this error when running with ubsan enabled
> ...

src/hotspot/share/gc/z/zDirector.cpp line 524:

> 522: const double current_old_gc_time_per_bytes_freed = double(old_gc_time) / double(reclaimed_per_old_gc);
> 523: old_garbage_is_cheaper = current_old_gc_time_per_bytes_freed < current_young_gc_time_per_bytes_freed;
> 524: }

Ending up with `old_garbage_is_cheaper == true` when `reclaimed_per_old_gc == 0` seems wrong to me.

Division by 0.0 is weird in C++. Do we even build for systems where it would not be supported? But regardless, I feel like the change here should be more like:

    - const double current_old_gc_time_per_bytes_freed = double(old_gc_time) / double(reclaimed_per_old_gc);
    + const double current_old_gc_time_per_bytes_freed = reclaimed_per_old_gc == 0 ? std::numeric_limits<double>::infinity() : double(old_gc_time) / double(reclaimed_per_old_gc);

This is the behaviour I expect us to currently have, given that `old_gc_time` should be a positive number (`>0.0`). The `!stats._old_stats._cycle._is_time_trustable` check above should protect against `0.0`.

I expect that this division we see happens when we have run a warmup major collection which did not reclaim any memory. And this change would trigger us to try and promote a minor collection to a major collection.

I am no expert on our supported platforms matrix w.r.t. floating-point numbers and `std::numeric_limits<double>::has_infinity`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20888#discussion_r1747065752

From stefank at openjdk.org Fri Sep 6 13:03:48 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Fri, 6 Sep 2024 13:03:48 GMT
Subject: RFR: 8339661: ZGC: Move some page resets and verification to callsites

On Fri, 6 Sep 2024 12:43:28 GMT, Joel Sikström wrote:

> Currently, `ZPage::reset()` does different things depending on where the page is being reset and what it will be used for. ...

Looks good!

-------------

Marked as reviewed by stefank (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20890#pullrequestreview-2286216362

From rcastanedalo at openjdk.org Fri Sep 6 14:15:41 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 6 Sep 2024 14:15:41 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17]

> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms.
> ...

Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:

  s390 port : late barrier expansion

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/22e07ef0..6663433c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=16
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=15-16

  Stats: 896 lines in 8 files changed: 837 ins; 32 del; 27 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org Fri Sep 6 14:15:42 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 6 Sep 2024 14:15:42 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16]

On Fri, 6 Sep 2024 12:04:52 GMT, Roberto Castañeda Lozano wrote:

>> Yes exactly, that was what I meant.
>> I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected.
>
> As far as I understand, from a GC perspective enum instances are handled just like any other class instance, so I do not think there is any special GC barrier optimization opportunity for them. Maybe some GC expert can comment further on this or correct me if I am wrong.

> Hi @robcasloz, you can pick up s390x patch from here: [offamitkumar at 6663433](https://github.com/offamitkumar/jdk/commit/6663433c4aa17925f699eaa8995cdc0cd78c0034)

Done, thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2334130205

From kbarrett at openjdk.org Fri Sep 6 20:26:09 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Fri, 6 Sep 2024 20:26:09 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16]

On Fri, 6 Sep 2024 12:04:52 GMT, Roberto Castañeda Lozano wrote:

> > I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected.
>
> As far as I understand, from a GC perspective enum instances are handled just like any other class instance, so I do not think there is any special GC barrier optimization opportunity for them. Maybe some GC expert can comment further on this or correct me if I am wrong.

@robcasloz is correct: the GCs don't have any special knowledge about enum instances. They are ordinary objects, though probably long-lived so will eventually migrate to the old generation. Trying to do anything special with them seems very unlikely to provide a benefit worth the costs involved.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2334754544

From duke at openjdk.org Fri Sep 6 20:26:10 2024
From: duke at openjdk.org (halkosajtarevic)
Date: Fri, 6 Sep 2024 20:26:10 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16]

On Fri, 6 Sep 2024 20:21:11 GMT, Kim Barrett wrote:

> > > I was just thinking whether it is necessary to do these barriers for enums at all. ...
> >
> > As far as I understand, from a GC perspective enum instances are handled just like any other class instance, ...
>
> @robcasloz is correct, the GCs don't have any special knowledge about enum instances. They are ordinary objects, though probably long-lived so will eventually migrate to the old generation. Trying to do anything special with them seems very unlikely to provide a benefit worth the costs involved.

Thank you very much for the insights!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2334756865

From kbarrett at openjdk.org Sat Sep 7 04:15:14 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Sat, 7 Sep 2024 04:15:14 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17]

On Fri, 6 Sep 2024 14:15:41 GMT, Roberto Castañeda Lozano wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. ...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>
>   s390 port : late barrier expansion

I've reviewed the non-compiler GC changes. I've looked over the compiler changes, but can't claim to have reviewed them. I've also reviewed the x64 changes, and looked over the aarch64 changes.

src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 176:

> 174: __ jcc(Assembler::zero, runtime); // jump to runtime if index == 0 (full buffer)
> 175: // The buffer is not full, store value into it.
> 176: __ subptr(temp, wordSize); // temp := next index

Instead of

    __ testptr(temp, temp);
    __ jcc(Assembler::zero, runtime);
    __ subptr(temp, wordSize);

it seems like this might be better:

    __ subptr(temp, wordSize);
    __ jcc(Assembler::below, runtime);

I think the code in the PR matches what the early expansion generates, so I think a change here can be deferred to a followup.

src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 354:

> 352: __ bind(runtime);
> 353: // save the live input values
> 354: RegSet saved = RegSet::of(store_addr NOT_LP64(COMMA thread));

I was looking at this a while ago, and haven't figured out why we're saving `store_addr` here. Also not sure why we're saving `thread` here for 32bit platforms. Something to think about for the future. Though maybe the 32bit case will be gone by then :)

src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 112:

> 110: // The answer is that stores of different sizes can co-exist
> 111: // in the same sequence of RawMem effects. We sometimes initialize
> 112: // a whole 'tile' of array elements with a single jint or jlong.)

I'm having trouble making sense of this comment. I guess a jlong could be used to null-initialize two 32bit oops/narrowOops? But that doesn't have anything to do with jints.

-------------

Marked as reviewed by kbarrett (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2287188386
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1747741376
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1747824868
PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1747898995

From mdoerr at openjdk.org Sat Sep 7 12:40:10 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Sat, 7 Sep 2024 12:40:10 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17]

On Fri, 6 Sep 2024 14:15:41 GMT, Roberto Castañeda Lozano wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. ...
> > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > s390 port : late barrier expansion I've run the current version through our nightly tests and there were no issues. (Applied on JDK head and tested on x86_64, aarch64 and PPC64 with several OSes.) Performance also looks good on PPC64. No regression observed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2335174688 From fjiang at openjdk.org Mon Sep 9 06:09:12 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 9 Sep 2024 06:09:12 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16] In-Reply-To: References: Message-ID: <2Iqb8t5nI61Zq22PafvY9QUUw_9OZ7oHygSdOY6QCX8=.f1338ef5-d646-45aa-bcb6-54f0dd13bc87@github.com> On Fri, 6 Sep 2024 14:02:58 GMT, Roberto Casta?eda Lozano wrote: >>> I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected. >> >> As far as I understand, from a GC perspective enum instances are handled just like any other class instance, so I do not think there is any special GC barrier optimization opportunity for them. Maybe some GC expert can comment further on this or correct me if I am wrong. > >> Hi @robcasloz, you can pick up s390x patch from here: [offamitkumar at 6663433](https://github.com/offamitkumar/jdk/commit/6663433c4aa17925f699eaa8995cdc0cd78c0034) > > Done, thanks! > > Hi @robcasloz, here is the implementation for RISC-V: [feilongjiang at 1c012cf](https://github.com/feilongjiang/jdk/commit/1c012cfd4a1f4d6e39d6f4798281423f884e97a6) We are still testing the latest changes, results will be updated later. > > Great, thanks @feilongjiang! Just let me know when you are done with testing and I will merge it into this changeset. Tier1-3 & hotspot:tier4 test result is clean on linux-riscv64 platform. 
No regression observed for performance. (Applied on JDK head). ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2337203016 From aboldtch at openjdk.org Mon Sep 9 06:18:08 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 9 Sep 2024 06:18:08 GMT Subject: RFR: 8339387: ZGC: Synchronize medium page allocation In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 07:14:11 GMT, Stefan Johansson wrote: > Please review this change to synchronize medium page allocations in ZGC. > > **Summary** > In ZGC objects of a certain size class are allocated in medium sized pages. For each age there is a single medium page shared by all mutators. When this page gets full all thread that try to do a medium page allocation will try to allocate and install a new medium page, but only one will succeed. This can lead to a lot of unnecessary medium page allocation which in turn can lead to the unnecessary page cache flushing. > > This change introduces synchronization to only a allow a single thread to allocate the medium page in the common case. > > **Testing** > * Functional testing through mach5 tier1-7 using ZGC > * Performance testing through aurora to verify no regression occur > * Manual testing to verify performance > * Manual testing to verify we avoid page cache flushing lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20883#pullrequestreview-2288903518 From sjohanss at openjdk.org Mon Sep 9 06:46:25 2024 From: sjohanss at openjdk.org (Stefan Johansson) Date: Mon, 9 Sep 2024 06:46:25 GMT Subject: RFR: 8339387: ZGC: Synchronize medium page allocation [v2] In-Reply-To: References: Message-ID: > Please review this change to synchronize medium page allocations in ZGC. > > **Summary** > In ZGC objects of a certain size class are allocated in medium sized pages. For each age there is a single medium page shared by all mutators. 
When this page gets full all thread that try to do a medium page allocation will try to allocate and install a new medium page, but only one will succeed. This can lead to a lot of unnecessary medium page allocation which in turn can lead to the unnecessary page cache flushing. > > This change introduces synchronization to only a allow a single thread to allocate the medium page in the common case. > > **Testing** > * Functional testing through mach5 tier1-7 using ZGC > * Performance testing through aurora to verify no regression occur > * Manual testing to verify performance > * Manual testing to verify we avoid page cache flushing Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision: - Review - use explicit null checks - StefanK review - change lock type ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20883/files - new: https://git.openjdk.org/jdk/pull/20883/files/66a9a238..fd5ad8b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20883&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20883&range=00-01 Stats: 19 lines in 2 files changed: 7 ins; 8 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20883.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20883/head:pull/20883 PR: https://git.openjdk.org/jdk/pull/20883 From aboldtch at openjdk.org Mon Sep 9 07:11:04 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 9 Sep 2024 07:11:04 GMT Subject: RFR: 8339387: ZGC: Synchronize medium page allocation [v2] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 06:46:25 GMT, Stefan Johansson wrote: >> Please review this change to synchronize medium page allocations in ZGC. >> >> **Summary** >> In ZGC objects of a certain size class are allocated in medium sized pages. For each age there is a single medium page shared by all mutators. 
When this page gets full, all threads that try to do a medium page allocation will try to allocate and install a new medium page, but only one will succeed. This can lead to a lot of unnecessary medium page allocations, which in turn can lead to unnecessary page cache flushing. >> >> This change introduces synchronization to only allow a single thread to allocate the medium page in the common case. >> >> **Testing** >> * Functional testing through mach5 tier1-7 using ZGC >> * Performance testing through aurora to verify that no regressions occur >> * Manual testing to verify performance >> * Manual testing to verify we avoid page cache flushing > > Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision: > > - Review - use explicit null checks > - StefanK review - change lock type Marked as reviewed by aboldtch (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20883#pullrequestreview-2288987407 From stefank at openjdk.org Mon Sep 9 07:11:05 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 9 Sep 2024 07:11:05 GMT Subject: RFR: 8339387: ZGC: Synchronize medium page allocation [v2] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 06:46:25 GMT, Stefan Johansson wrote: >> Please review this change to synchronize medium page allocations in ZGC. >> >> **Summary** >> In ZGC, objects of a certain size class are allocated in medium-sized pages. For each age there is a single medium page shared by all mutators. When this page gets full, all threads that try to do a medium page allocation will try to allocate and install a new medium page, but only one will succeed. This can lead to a lot of unnecessary medium page allocations, which in turn can lead to unnecessary page cache flushing. >> >> This change introduces synchronization to only allow a single thread to allocate the medium page in the common case.
>> >> **Testing** >> * Functional testing through mach5 tier1-7 using ZGC >> * Performance testing through aurora to verify that no regressions occur >> * Manual testing to verify performance >> * Manual testing to verify we avoid page cache flushing > > Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision: > > - Review - use explicit null checks > - StefanK review - change lock type Looks good! ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20883#pullrequestreview-2288990050 From rcastanedalo at openjdk.org Mon Sep 9 07:44:13 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 9 Sep 2024 07:44:13 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: On Sat, 7 Sep 2024 12:37:54 GMT, Martin Doerr wrote: > I've run the current version through our nightly tests and there were no issues. (Applied on JDK head and tested on x86_64, aarch64 and PPC64 with several OSes.) Performance also looks good on PPC64. No regression observed. Great, thanks for testing Martin!
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2337362381 From mbaesken at openjdk.org Mon Sep 9 07:46:06 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 9 Sep 2024 07:46:06 GMT Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 10:26:19 GMT, Matthias Baesken wrote: > The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational > shows this error when running with ubsan enabled > > src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero > #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491 > #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822 > #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912 > #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29 > #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48 > #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 looks like for clang https://bugs.llvm.org/show_bug.cgi?id=17000#c1 the float division by 0 became defined behavior, but it might be different for other compilers. I think it depends not only on the platform but also on the compiler. See the discussion here https://stackoverflow.com/questions/42926763/the-behaviour-of-floating-point-division-by-zero ------------- PR Comment: https://git.openjdk.org/jdk/pull/20888#issuecomment-2337366232 From rkennke at openjdk.org Mon Sep 9 10:29:55 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 10:29:55 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). 
> > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that were previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we make some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: - Fix compiler/c2/irTests/TestPadding.java for +COH - Simplify arrayOopDesc::length_offset_in_bytes and oopDesc::base_offset_in_bytes - Nit in header_size - GC code tweaks - Fix runtime/cds/appcds/loaderConstraints/DynamicLoaderConstraintsTest.java - Fix jdk/tools/jlink/plugins/CDSPluginTest.java - Cleanup markWord bits and comments - x86_64: Fix loadNKlassCompactHeaders - aarch64: Fix loadNKlassCompactHeaders - Use FLAG_SET_ERGO when turning off UseCompactObjectHeaders - ... and 16 more: https://git.openjdk.org/jdk/compare/b45fe174...49126383 ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=06 Stats: 4465 lines in 189 files changed: 3175 ins; 678 del; 612 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rcastanedalo at openjdk.org Mon Sep 9 11:15:47 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 9 Sep 2024 11:15:47 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v18] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
> > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port to these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
> > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion - riscv port for JEP 475 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/6663433c..94145917 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=16-17 Stats: 860 lines in 4 files changed: 771 ins; 49 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Mon Sep 9 11:15:47 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 9 Sep 2024 11:15:47 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 07:41:06 GMT, Roberto Castañeda Lozano wrote: >> I've run the current version through our nightly tests and there were no issues.
(Applied on JDK head and tested on x86_64, aarch64 and PPC64 with several OSes.) Performance also looks good on PPC64. No regression observed. > > Great, thanks for testing Martin! > > > Hi @robcasloz, here is the implementation for RISC-V: [feilongjiang at 1c012cf](https://github.com/feilongjiang/jdk/commit/1c012cfd4a1f4d6e39d6f4798281423f884e97a6) We are still testing the latest changes; results will be updated later. > > > > > > Great, thanks @feilongjiang! Just let me know when you are done with testing and I will merge it into this changeset. > > Tier1-3 & hotspot:tier4 test results are clean on the linux-riscv64 platform. No performance regression observed. (Applied on JDK head). Thanks @feilongjiang, merged now (commit 94145917). ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2337824882 From sjohanss at openjdk.org Mon Sep 9 11:17:12 2024 From: sjohanss at openjdk.org (Stefan Johansson) Date: Mon, 9 Sep 2024 11:17:12 GMT Subject: RFR: 8339387: ZGC: Synchronize medium page allocation [v2] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 08:22:53 GMT, Erik Österlund wrote: >> Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision: >> >> - Review - use explicit null checks >> - StefanK review - change lock type > > Looks good! Thanks for fixing. Thanks for the reviews @fisk, @xmas92 and @stefank. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20883#issuecomment-2337828395 From sjohanss at openjdk.org Mon Sep 9 11:17:14 2024 From: sjohanss at openjdk.org (Stefan Johansson) Date: Mon, 9 Sep 2024 11:17:14 GMT Subject: Integrated: 8339387: ZGC: Synchronize medium page allocation In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 07:14:11 GMT, Stefan Johansson wrote: > Please review this change to synchronize medium page allocations in ZGC. > > **Summary** > In ZGC, objects of a certain size class are allocated in medium-sized pages.
For each age there is a single medium page shared by all mutators. When this page gets full, all threads that try to do a medium page allocation will try to allocate and install a new medium page, but only one will succeed. This can lead to a lot of unnecessary medium page allocations, which in turn can lead to unnecessary page cache flushing. > > This change introduces synchronization to only allow a single thread to allocate the medium page in the common case. > > **Testing** > * Functional testing through mach5 tier1-7 using ZGC > * Performance testing through aurora to verify that no regressions occur > * Manual testing to verify performance > * Manual testing to verify we avoid page cache flushing This pull request has now been integrated. Changeset: 347d5728 Author: Stefan Johansson URL: https://git.openjdk.org/jdk/commit/347d5728e69ae1f7d1a24820cc2c17bb0b8c0af5 Stats: 47 lines in 2 files changed: 44 ins; 1 del; 2 mod 8339387: ZGC: Synchronize medium page allocation Reviewed-by: aboldtch, stefank, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/20883 From rcastanedalo at openjdk.org Mon Sep 9 11:35:13 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 9 Sep 2024 11:35:13 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 21:33:42 GMT, Kim Barrett wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> s390 port : late barrier expansion > > src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 176: > >> 174: __ jcc(Assembler::zero, runtime); // jump to runtime if index == 0 (full buffer) >> 175: // The buffer is not full, store value into it.
>> 176: __ subptr(temp, wordSize); // temp := next index > > Instead of > > __ testptr(temp, temp); > __ jcc(Assembler::zero, runtime); > __ subptr(temp, wordSize); > > it seems like this might be better > > __ subptr(temp, wordSize); > __ jcc(Assembler::below, runtime); > > I think the code in the PR matches what the early expansion generates, so I think a change here > can be deferred to a followup. Good point, thanks! I made a note for follow-up work. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1750088920 From mbaesken at openjdk.org Mon Sep 9 11:37:41 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 9 Sep 2024 11:37:41 GMT Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate [v2] In-Reply-To: References: Message-ID: > The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational > shows this error when running with ubsan enabled > > src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero > #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491 > #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822 > #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912 > #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29 > #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48 > #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: Adjust division following suggestion by xmas ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20888/files - new: https://git.openjdk.org/jdk/pull/20888/files/c66c089e..21fe3ca7 Webrevs: - full: 
https://webrevs.openjdk.org/?repo=jdk&pr=20888&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20888&range=00-01 Stats: 6 lines in 1 file changed: 1 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20888.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20888/head:pull/20888 PR: https://git.openjdk.org/jdk/pull/20888 From mbaesken at openjdk.org Mon Sep 9 11:41:05 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 9 Sep 2024 11:41:05 GMT Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate [v2] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 11:37:41 GMT, Matthias Baesken wrote: >> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational >> shows this error when running with ubsan enabled >> >> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero >> #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491 >> #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822 >> #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912 >> #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29 >> #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48 >> #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 >> #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Adjust division following suggestion by xmas Hi Axel, I adjusted the code following your suggestion. Btw, is there perhaps already a template function somewhere that performs the division while handling a zero divisor? This is probably not the only place in the codebase where this can happen.
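Matthias's question above asks whether a shared helper already exists for dividing by a possibly-zero divisor. No such helper is named in this thread, so what follows is only a minimal sketch of what one could look like. The name `div_or` and its signature are hypothetical, not an existing HotSpot API. The helper checks the divisor before dividing, so the result no longer depends on how a particular compiler treats floating-point division by zero (see the clang discussion linked earlier), and a sanitizer run has no division by zero to report.

```cpp
#include <type_traits>

// Hypothetical helper, not an existing HotSpot function: returns
// numerator / denominator, or `fallback` when the denominator is zero.
// For a zero divisor the division is never evaluated, so the result does
// not depend on compiler-specific handling of x / 0.0, and UBSan's
// float-divide-by-zero check has nothing to flag.
// Note: -0.0 == 0.0, so a negative zero divisor also yields `fallback`.
template <typename T>
constexpr T div_or(T numerator, T denominator, T fallback) {
  static_assert(std::is_arithmetic<T>::value, "div_or requires an arithmetic type");
  return denominator == T(0) ? fallback : numerator / denominator;
}
```

When the operands have different types, the template argument can be given explicitly, for example `div_or<double>(bytes, rate, 0.0)` (hypothetical variable names). Which fallback value makes a heuristic behave sensibly before any rate has been measured is a policy decision that each call site has to make.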
------------- PR Comment: https://git.openjdk.org/jdk/pull/20888#issuecomment-2337879054 From rcastanedalo at openjdk.org Mon Sep 9 11:48:11 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 9 Sep 2024 11:48:11 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 23:57:59 GMT, Kim Barrett wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> s390 port : late barrier expansion > > src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 354: > >> 352: __ bind(runtime); >> 353: // save the live input values >> 354: RegSet saved = RegSet::of(store_addr NOT_LP64(COMMA thread)); > > I was looking at this a while ago, and haven't figured out why we're saving `store_addr` here. > Also not sure why we're saving `thread` here for 32bit platforms. > Something to think about for the future. Though maybe the 32bit case will be gone by then :) I'm not sure either; this is pre-existing interpreter code in any case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1750105760 From rkennke at openjdk.org Mon Sep 9 11:55:52 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 11:55:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that were previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback in case users unexpectedly observe problems with the new implementation.
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we make some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders.
Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Try to avoid lea in loadNklass (aarch64) - Fix release build error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/49126383..70f492d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=06-07 Stats: 24 lines in 5 files changed: 12 ins; 1 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From tschatzl at openjdk.org Mon Sep 9 12:40:13 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 9 Sep 2024 12:40:13 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 10:29:55 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that were previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it.
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we make some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 26 commits: > > - Fix compiler/c2/irTests/TestPadding.java for +COH > - Simplify arrayOopDesc::length_offset_in_bytes and oopDesc::base_offset_in_bytes > - Nit in header_size > - GC code tweaks > - Fix runtime/cds/appcds/loaderConstraints/DynamicLoaderConstraintsTest.java > - Fix jdk/tools/jlink/plugins/CDSPluginTest.java > - Cleanup markWord bits and comments > - x86_64: Fix loadNKlassCompactHeaders > - aarch64: Fix loadNKlassCompactHeaders > - Use FLAG_SET_ERGO when turning off UseCompactObjectHeaders > - ... and 16 more: https://git.openjdk.org/jdk/compare/b45fe174...49126383 src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 481: > 479: Klass* klass = UseCompactObjectHeaders > 480: ? old_mark.klass() > 481: : old->klass(); To be exact, "promotion" only refers to copying to an older generation, so this comment does not cover objects copied within the generation. Suggestion: // NOTE: With compact headers, it is not safe to load the Klass* from old, because // that would access the mark-word, that might change at any time by concurrent // workers. // This mark word would refer to a forwardee, which may not yet have completed // copying. Therefore we must load the Klass* from the mark-word that we already // loaded. This is safe, because we only enter here if not yet forwarded. src/hotspot/share/gc/parallel/mutableSpace.cpp line 225: > 223: // header-based forwarding during promotion. Full GC doesn't > 224: // use the object header for forwarding at all. > 225: p += obj->forwardee()->size(); Better use `!obj->is_self_forwarded()` here. src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 174: > 172: // may not yet have completed copying. Therefore we must load the Klass* from > 173: // the mark-word that we have already loaded. This is safe, because we have checked > 174: // that this is not yet forwarded in the caller.) The same adjustment is needed as for G1.
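The G1 and Parallel review comments above depend on one rule. With compact headers, the Klass* has to be decoded from the mark-word value the copying thread has already loaded. Re-reading it from the object is unsafe, because another worker may overwrite the header with a forwarding pointer at any moment. The sketch below models that rule with a plain `std::atomic<uint64_t>` header. The bit layout is an assumption loosely based on the PR description, not HotSpot's actual `markWord` definition: the narrow Klass* sits in the upper 22 bits, bit 2 is a self-forwarded flag, and lock bits `0b11` mean "forwarded". All names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical header layout for illustration only (not HotSpot's markWord):
//   bits 63..42 : narrow Klass* (22 bits)
//   bit  2      : self-forwarded flag (promotion failure)
//   bits 1..0   : lock bits; 0b11 means "forwarded"
// Forwardee addresses are assumed 8-byte aligned, so bit 2 is clear in a
// real forwarding pointer.
constexpr int      kKlassShift  = 42;
constexpr uint64_t kLockMask    = 0x3;
constexpr uint64_t kMarkedValue = 0x3;
constexpr uint64_t kSelfFwdBit  = uint64_t(1) << 2;

constexpr uint32_t narrow_klass(uint64_t mark) { return uint32_t(mark >> kKlassShift); }
constexpr bool is_forwarded(uint64_t mark) { return (mark & kLockMask) == kMarkedValue; }
constexpr bool is_self_forwarded(uint64_t mark) { return (mark & kSelfFwdBit) != 0; }

struct ObjHeader {
  std::atomic<uint64_t> mark;
};

// Self-forwarding sets a flag plus the forwarded lock pattern but leaves the
// Klass bits intact, so the class can still be decoded afterwards.
inline uint64_t self_forward(uint64_t mark) {
  assert(!is_forwarded(mark) && "object is already forwarded");
  return mark | kSelfFwdBit | kMarkedValue;
}

// Load the header exactly once and decode everything from that local copy.
// Re-reading obj.mark here could observe a forwarding pointer installed by a
// concurrent worker, whose bits no longer contain the Klass*.
inline bool try_decode_klass(const ObjHeader& obj, uint32_t& klass_out) {
  const uint64_t mark = obj.mark.load(std::memory_order_acquire);
  if (is_forwarded(mark) && !is_self_forwarded(mark)) {
    return false;  // header holds a forwardee address; consult the forwardee
  }
  klass_out = narrow_klass(mark);
  return true;
}
```

Under this layout, a self-forwarded object still yields its class, while an object forwarded to a copy does not. That mirrors why the review suggests testing `is_self_forwarded()` rather than following `forwardee()` blindly.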
src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 711: > 709: // 8 - 32-bit VM > 710: // 12 - 64-bit VM, compressed klass > 711: // 16 - 64-bit VM, normal klass The comment needs to be adapted to include the case for compact object headers. src/hotspot/share/oops/arrayOop.hpp line 83: > 81: // The _length field is not declared in C++. It is allocated after the > 82: // declared nonstatic fields in arrayOopDesc if not compressed, otherwise > 83: // it occupies the second half of the _klass field in oopDesc. Needs update. src/hotspot/share/oops/instanceOop.hpp line 36: > 34: class instanceOopDesc : public oopDesc { > 35: public: > 36: // If compressed, the offset of the fields of the instance may not be aligned. Needs fixing (or removal) wrt compact object headers, or should be moved to the particular case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750046114 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750056160 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750074607 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750080552 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750027009 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750116336 From tschatzl at openjdk.org Mon Sep 9 12:40:14 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 9 Sep 2024 12:40:14 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that were previously missing.
>> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding src/hotspot/share/gc/shared/collectedHeap.cpp line 232: > 230: } > 231: > 232: // With compact headers, we can't safely access the class, due Suggestion: // With compact headers, we can't safely access the klass, due Why is this the case? Because we might not have copied the header yet? Is this method actually ever used while the forwarded object is unstable? Given this is used for verification only afaik, we should make an effort to provide that check. src/hotspot/share/gc/shared/gcForwarding.hpp line 34: > 32: > 33: /* > 34: * Implements forwarding for the full-GCs of Serial, Parallel, G1 and Shenandoah in Suggestion: * Implements forwarding for the Full GCs of Serial, Parallel, G1 and Shenandoah in src/hotspot/share/gc/shared/gcForwarding.hpp line 41: > 39: * bits (to indicate 'forwarded' state as usual). > 40: */ > 41: class GCForwarding : public AllStatic { Since this class is only used for Full GCs, it may be useful to include that information in the name, e.g. something like `FullGCForwarding`, to avoid confusion about why it is not used for other GCs too. (Unless this has been discussed and even rejected by me before). src/hotspot/share/oops/compressedKlass.hpp line 43: > 41: > 42: // Tiny-class-pointer mode > 43: static int _tiny_cp; // -1, 0=true, 1=false Suggestion: static int _tiny_cp; // -1 = uninitialized, 0 = true, 1 = false In addition to that, I am not sure if introducing a new term ("tiny") for compact class header related changes (and just here) makes the code clearer; I would have expected a "_compact_" prefix.
Also all other members use "k"-klass and spell out "klass pointer", so I would prefer to keep that style. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1749995275 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1749980748 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1749987945 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1749969456 From tschatzl at openjdk.org Mon Sep 9 12:40:18 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 9 Sep 2024 12:40:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> On Mon, 9 Sep 2024 11:55:52 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Try to avoid lea in loadNklass (aarch64) > - Fix release build error src/hotspot/share/oops/klass.hpp line 169: > 167: // contention that may happen when a nearby object is modified. > 168: AccessFlags _access_flags; // Access flags. The class/interface distinction is stored here.
> 169: // Some flags created by the JVM, not in the class file itself, Suggestion: markWord _prototype_header; // Used to initialize objects' header with compact headers. Maybe add a comment explaining why this is an instance member. src/hotspot/share/oops/objArrayKlass.inline.hpp line 74: > 72: void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) { > 73: // In this assert, we cannot safely access the Klass* with compact headers. > 74: assert (UseCompactObjectHeaders || obj->is_array(), "obj must be array"); If we can't safely access the `Klass*` here, why is the call to `obj->klass()` below safe? src/hotspot/share/oops/oop.cpp line 157: > 155: bool oopDesc::has_klass_gap() { > 156: // Only has a klass gap when compressed class pointers are used. > 157: // Except when using compact headers. Suggestion: // Only has a klass gap when compressed class pointers are used and not // using compact headers. (Not sure if repeating the fairly simple disjunction below makes sense, but there was a comment before too.) src/hotspot/share/oops/oop.cpp line 230: > 228: // disjunct below to fail if the two comparands are computed across such > 229: // a concurrent change. > 230: return Universe::heap()->is_stw_gc_active() && klass->is_objArray_klass() && is_forwarded() && (UseParallelGC || UseG1GC); Is this still true after the recent changes like JDK-8311163? It might be worth waiting for.
src/hotspot/share/oops/oop.hpp line 103: > 101: static inline void set_klass_gap(HeapWord* mem, int z); > 102: > 103: // size of object header, aligned to platform wordSize Suggestion: // Size of object header, aligned to platform wordSize Pre-existing src/hotspot/share/oops/oop.hpp line 108: > 106: return sizeof(markWord) / HeapWordSize; > 107: } else { > 108: return sizeof(oopDesc) / HeapWordSize; Suggestion: return sizeof(oopDesc) / HeapWordSize; src/hotspot/share/oops/oop.hpp line 134: > 132: inline Klass* forward_safe_klass(markWord m) const; > 133: inline size_t forward_safe_size(); > 134: inline void forward_safe_init_mark(); Given the comment, these methods do not seem "safe" to me. Maybe use "raw" or something similar to better indicate that care must be taken when using them. Maybe "safe" means they should only be used in "safe" contexts, but in Hotspot code iirc we use something like "raw" or "unsafe" for that. src/hotspot/share/oops/oop.hpp line 295: > 293: // this call returns null for that thread; any other thread has the > 294: // value of the forwarding pointer returned and does not modify "this". > 295: inline oop forward_to_atomic(oop p, markWord compare, atomic_memory_order order = memory_order_conservative); Maybe add an assert in the implementation to ensure it is not used for self-forwarding. Same for `forward_to`. src/hotspot/share/oops/oop.hpp line 356: > 354: return mark_offset_in_bytes() + sizeof(markWord) / 2; > 355: } else > 356: #endif Maybe, instead of trying to calculate some random, meaningless value, just use some "random" value directly? I am fine with the existing code, but after first stating directly that "any value" works here, this additional code seems to confuse the message. (Fwiw, the method is also used during Universe initialization).
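The suggested assert for `forward_to_atomic` could look roughly like the following toy model. It is a sketch under stated assumptions, not the actual `oopDesc` implementation: `Obj`, `encode_forwardee` and `decode_forwardee` are hypothetical, and the header is modeled as a plain `std::atomic<uint64_t>` whose low two bits mean "forwarded".

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical toy model, not HotSpot's oopDesc: a header word whose low two
// bits mark "forwarded" and whose remaining bits hold the forwardee address.
constexpr uint64_t kMarked = 0b11;

struct Obj {
  alignas(8) std::atomic<uint64_t> mark{0b01};  // starts "unlocked"
};

inline uint64_t encode_forwardee(const Obj* to) {
  return static_cast<uint64_t>(reinterpret_cast<uintptr_t>(to)) | kMarked;
}

inline Obj* decode_forwardee(uint64_t m) {
  return reinterpret_cast<Obj*>(static_cast<uintptr_t>(m & ~kMarked));
}

// Tries to install a forwarding pointer from `from` to `to`, provided the
// header still equals `compare`. The winning thread gets nullptr; a losing
// thread gets the forwardee that the winner installed (this assumes the only
// competing header update is another forwarding attempt).
inline Obj* forward_to_atomic(Obj* from, Obj* to, uint64_t compare) {
  // The assert suggested in the review: self-forwarding has its own encoding,
  // so this path must never be used to forward an object to itself.
  assert(from != to && "use the self-forwarding path instead");
  uint64_t expected = compare;
  if (from->mark.compare_exchange_strong(expected, encode_forwardee(to))) {
    return nullptr;
  }
  return decode_forwardee(expected);  // expected now holds the current header
}
```

A losing thread recovers the winner's forwardee from the value that `compare_exchange_strong` writes back into `expected`, which matches the contract described in the quoted comment.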
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750118470 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750143956 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750145460 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750150640 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750154114 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750153663 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750157781 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750159516 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750163768 From tschatzl at openjdk.org Mon Sep 9 12:45:07 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 9 Sep 2024 12:45:07 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 11:55:52 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
> > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Try to avoid lea in loadNklass (aarch64) > - Fix release build error Only looked at GC and runtime changes, only very briefly at compiler stuff. ------------- Changes requested by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2289786482 PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2289800458 From rkennke at openjdk.org Mon Sep 9 12:52:07 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 12:52:07 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 18:10:44 GMT, Albert Mingkun Yang wrote: >> FWIW, the ParallelGC does something very similar to what you propose, except that it walks bitmaps instead of parsing the space to find the self-forwarded objects. It then has a check inside object_iterate to make sure that it doesn't expose the dead objects (in eden and the from space) to heap dumpers and histogram printers. >> >> Because of the code above, the SerialGC clears away the information about what objects are dead in eden and the from space, so heap dumpers and histogram printers will include these dead objects. We might want to fix that as a future RFE. > >> If we get a promotion failure in the young gen, we are leaving the dead objects marked as forwarded. > > True; need to do sth like `obj->init_mark();` for the non-self-forwarded case. The postcondition is that no forwarded objs in eden/from.
ParallelGC actually doesn't use bitmaps, it pushes all forwarded objs to preserved-marks-table, and uses that to find forwarded objects, which is why we can't remove the preserved-marks table in ParallelGC (IOW, after this patch, the preserved-marks-stuff in Parallel scavenger is *only* used to find forwarded objects. We might want to think about more efficient solutions for this). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750199051 From rkennke at openjdk.org Mon Sep 9 13:02:08 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 13:02:08 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: <3QGPH52NyrDPne5EgoGx2sx9OeGRu9K72onNNwzMr2M=.8a390b3d-2e8a-470e-8bb7-1ba975070c53@github.com> References: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> <3QGPH52NyrDPne5EgoGx2sx9OeGRu9K72onNNwzMr2M=.8a390b3d-2e8a-470e-8bb7-1ba975070c53@github.com> Message-ID: On Fri, 30 Aug 2024 07:42:39 GMT, Thomas Stuefe wrote: >> Yes. This silent setting of UseCompactObjectHeaders ended up hiding why we got CDS failures. I would also suggest that we change this to FLAG_SET_ERGO. > > Seems we run all into the same thoughts :) > > I added > > Suggestion: > > FLAG_SET_DEFAULT(UseCompactObjectHeaders, false); > warning("Compact object headers require a java heap size smaller than %zu (given: %zu). " > "Disabling compact object headers.", max_narrow_heap_size * HeapWordSize, max_heap_size); That %zu is SIZE_FORMAT, right? This should probably use proper_unit_for_byte_size()/byte_size_in_proper_unit(). 
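The idea behind `proper_unit_for_byte_size()`/`byte_size_in_proper_unit()` is to print sizes in a scaled unit rather than as raw byte counts. The standalone sketch below mimics that idea with hypothetical helpers (`unit_for`, `size_in_unit`, `heap_too_large_message`). The 100x thresholds are an assumption, and the sketch uses `unsigned long long` with `%llu` only to stay self-contained, whereas HotSpot code would pass `size_t` values with `%zu`.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helpers illustrating the idea behind HotSpot's
// proper_unit_for_byte_size()/byte_size_in_proper_unit(): pick a unit so that
// large sizes are printed compactly. The 100x thresholds are an assumption for
// this sketch, not a statement about HotSpot's exact cut-offs.
using u64 = unsigned long long;
constexpr u64 K = 1024;
constexpr u64 M = K * K;
constexpr u64 G = M * K;

inline const char* unit_for(u64 bytes) {
  if (bytes >= 100 * G) return "G";
  if (bytes >= 100 * M) return "M";
  if (bytes >= 100 * K) return "K";
  return "B";
}

inline u64 size_in_unit(u64 bytes) {
  if (bytes >= 100 * G) return bytes / G;
  if (bytes >= 100 * M) return bytes / M;
  if (bytes >= 100 * K) return bytes / K;
  return bytes;
}

// Formats the heap-size warning from the thread with scaled sizes
// instead of raw byte counts.
inline std::string heap_too_large_message(u64 limit, u64 given) {
  char buf[256];
  std::snprintf(buf, sizeof(buf),
                "Compact object headers require a java heap size smaller than %llu%s "
                "(given: %llu%s). Disabling compact object headers.",
                size_in_unit(limit), unit_for(limit),
                size_in_unit(given), unit_for(given));
  return std::string(buf);
}
```

With an 8TB limit and a 16TB heap, the message reads "...smaller than 8192G (given: 16384G)...", which is easier to check at a glance than two 14-digit byte counts.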
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750215510 From rkennke at openjdk.org Mon Sep 9 13:31:10 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 13:31:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> References: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> Message-ID: On Thu, 22 Aug 2024 19:50:21 GMT, Albert Mingkun Yang wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix hash shift for 32 bit builds > > src/hotspot/share/gc/shared/gcForwarding.hpp line 36: > >> 34: * Implements forwarding for the full-GCs of Serial, Parallel, G1 and Shenandoah in >> 35: * a way that preserves upper N bits of object mark-words, which contain crucial >> 36: * Klass* information when running with compact headers. The encoding is similar to > > This doc suggests this forwarding is only for compact-header so I wonder if we can check `UseCompactObjectHeaders` directly instead of heap-size in `GCForwarding::initialize`. Right. The original implementation was more complex and then the consensus was to not sprinkle UseCompactHeaders all over the place, but with that new/simpler implementation it makes sense to simply check the UCOH flag. > src/hotspot/share/gc/shared/gcForwarding.hpp line 40: > >> 38: * heap-base, shifts that difference into the right place, and sets the lowest two >> 39: * bits (to indicate 'forwarded' state as usual). >> 40: */ > >> "can use 40 bits for forwardee encoding. That's enough for 8TB of heap." > > I feel this 8T-constraint is significant and should be in the doc. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750264571 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750265026 From rkennke at openjdk.org Mon Sep 9 14:11:08 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 14:11:08 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: <6I0T4rOjOTj-FZxpspatEo6j1_Num75bCAOBNxsrHI8=.097f731e-7c92-4eac-a379-c2df336cd412@github.com> On Tue, 27 Aug 2024 07:43:07 GMT, Hamlin Li wrote: >> @Hamlin-Li : AFAIK, porting to linux-riscv platform has NOT been started yet. To avoid duplicate work, please let me know if anyone is interested or has been working on it :-) > > Yes, I'm interested in it. Thanks for raising the discussion. :) If anybody is doing it, please send me a patch, or we can do it as a follow-up PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750345203 From rkennke at openjdk.org Mon Sep 9 14:11:10 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 14:11:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 11:38:39 GMT, Hamlin Li wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/oops/oop.inline.hpp line 94: > >> 92: >> 93: void oopDesc::init_mark() { >> 94: if (UseCompactObjectHeaders) { > > Seems only `set_mark(prototype_mark());` is fine for both cases? Right. Done. 
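The simplification agreed above, calling `set_mark(prototype_mark())` unconditionally, works because the choice between the per-klass prototype header (the `_prototype_header` field discussed earlier) and the global default can live inside `prototype_mark()` itself. Here is a hypothetical toy model; `ToyKlass`, `ToyOop` and the flag variable are illustrative stand-ins, not HotSpot types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model, not HotSpot code: with compact headers the
// prototype mark comes from the klass (it carries the narrow klass bits);
// otherwise it is the global default. Either way init_mark() can simply
// store prototype_mark().
constexpr uint64_t kDefaultPrototype = 0b01;  // "unlocked", no hash
constexpr int      kKlassShift       = 42;

bool use_compact_object_headers = false;  // stand-in for the JVM flag

struct ToyKlass {
  uint64_t prototype_header;  // precomputed once per class
};

struct ToyOop {
  uint64_t mark = 0;
  const ToyKlass* klass = nullptr;

  uint64_t prototype_mark() const {
    return use_compact_object_headers ? klass->prototype_header : kDefaultPrototype;
  }

  void init_mark() { mark = prototype_mark(); }  // no flag check needed here
};
```

Keeping the flag check in one accessor means every caller that resets a header gets the right value without repeating the branch.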
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750342555 From rkennke at openjdk.org Mon Sep 9 14:35:08 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 14:35:08 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Mon, 26 Aug 2024 21:52:58 GMT, Chris Plummer wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Oop.java line 169: > >> 167: } else { >> 168: visitor.doMetadata(klass, true); >> 169: } > > Why is there no `visitor.doMetadata()` call for the compact object header case? There is no dedicated klass field anymore; the Klass* is encoded in the mark, and we would need to extract it. What is the purpose of the visitors? Do they need to see the klass/compressedKlass, or is it sufficient to visit the mark-word (which we already do, but as CInt)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750386024 From rcastanedalo at openjdk.org Mon Sep 9 14:44:17 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 9 Sep 2024 14:44:17 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: <_7xTwicd2PDxJZOJ7xLdkHZc18UT-9sVZk-3YFgkMA0=.7db81e94-4f38-44a0-9983-6e391459aab2@github.com> On Sat, 7 Sep 2024 03:57:43 GMT, Kim Barrett wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> s390 port : late barrier expansion > > src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 112: > >> 110: // The answer is that stores of different sizes can co-exist >> 111: // in the same sequence of RawMem effects. We sometimes initialize
We sometimes initialize >> 112: // a whole 'tile' of array elements with a single jint or jlong.) > > I'm having trouble making sense of this comment. I guess a jlong could be used to null-initialize two > 32bit oops/narrowOops? But that doesn't have anything to do with jints. I am not sure the complex overlap test is necessary here, this code was copy-pasted from [MemNode::find_previous_store()](https://github.com/openjdk/jdk/blob/c54fc08aa3c63e4b26dc5edb2436844dfd3bab7c/src/hotspot/share/opto/memnode.cpp#L678) by [JDK-8057737](https://bugs.openjdk.org/browse/JDK-8057737), and in this new context I do not see how we might find stores of different sizes as mentioned in the comment. jlongs could be used to null-initialize two 32-bit OOPs, but such initializing stores are not even visible in C2's intermediate representation at the time `G1BarrierSetC2::g1_can_remove_pre_barrier()` is called. The fact that the comment refers to initializing several array elements with a single jint suggests to me that this code has lost some of its original purpose after being copied into a narrower context (OOP stores after object allocations). But since this code is pre-existing and in the worst case it is just performing some unnecessary work, I suggest to leave it as-is a nd possibly investigate how to simplify it as a follow-up task. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1750400106 From eosterlund at openjdk.org Mon Sep 9 14:47:06 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 9 Sep 2024 14:47:06 GMT Subject: RFR: 8339661: ZGC: Move some page resets and verification to callsites In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 12:43:28 GMT, Joel Sikstr?m wrote: > Currently, `ZPage::reset()` does different things depending on where the page is being reset and what it will be used for. This leads to checks being somewhat hard to understand and to follow the edge-cases of. 
> > By moving some of the reset logic that is now part of `ZPage::reset()` to the callsites, the reasoning behind some operations becomes easier to follow when reading the code. > > Main highlights: > - Clear logic behind initializing remsets at callsites, now guarded by asserts in `ZPage::remset_initialize()`. > - `ZPage::clone_limited()` retains the value of the top-pointer. > - The choice of remset verification is now made at the callsites: > - Allocations from the page cache, and only if the page has a remset > - Old-to-old in-place relocations, where only the inactive remset is checked Nice change! Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20890#pullrequestreview-2290147705 From stefank at openjdk.org Mon Sep 9 14:50:10 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 9 Sep 2024 14:50:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: <4yTWbD93OXGwYYxEQo56smKa5kl_WiPPcMsXSs0eUoQ=.893f54c8-ed2b-4a7f-bf0a-36553a951f47@github.com> On Fri, 30 Aug 2024 08:06:31 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/cds/filemap.cpp line 2507: > >> 2505: } >> 2506: >> 2507: if (compact_headers() != UseCompactObjectHeaders) { > > (Commenting here, but the comment applies to code a bit above) While debugging CDS, it would have been useful to print the value of UseCompactObjectHeaders. > > Could we change the code to be: > > log_info(cds)("Archive was created with UseCompressedOops = %d, UseCompressedClassPointers = %d, UseCompactObjectHeaders = %d", > compressed_oops(), compressed_class_pointers(), compact_headers()); Resolved.
> src/hotspot/share/cds/filemap.cpp line 2508: > >> 2506: >> 2507: if (compact_headers() != UseCompactObjectHeaders) { >> 2508: log_info(cds)("The shared archive file's UseCompactObjectHeaders setting (%s)" > > Printing on the `info` level mimics what we do when there's a mismatch for compressed classes (and oops), but I wonder if that one is intentional or if it is accidentally printing to 'info' instead of 'warning'. @iklam informed me that some of the info levels (including this line) should be converted to warning. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750408043 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750410679 From rkennke at openjdk.org Mon Sep 9 15:04:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 15:04:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: <4yTWbD93OXGwYYxEQo56smKa5kl_WiPPcMsXSs0eUoQ=.893f54c8-ed2b-4a7f-bf0a-36553a951f47@github.com> References: <4yTWbD93OXGwYYxEQo56smKa5kl_WiPPcMsXSs0eUoQ=.893f54c8-ed2b-4a7f-bf0a-36553a951f47@github.com> Message-ID: <-2JWx3F8EdyQ0Uf-mI62ImLXgjgIy9PEydjtKHhx12Q=.4d944301-6f1c-4270-953c-ec6c86df946a@github.com> On Mon, 9 Sep 2024 14:47:28 GMT, Stefan Karlsson wrote: >> src/hotspot/share/cds/filemap.cpp line 2508: >> >>> 2506: >>> 2507: if (compact_headers() != UseCompactObjectHeaders) { >>> 2508: log_info(cds)("The shared archive file's UseCompactObjectHeaders setting (%s)" >> >> Printing on the `info` level mimics what we do when there's a mismatch for compressed classes (and oops), but I wonder if that one is intentional or if it is accidentally printing to 'info' instead of 'warning'. > > @iklam informed me that some of the info levels (including this line) should be converted to warning. Yeah that looks inconsistent with other places where we print a warning instead. I'll change it to warning for the UCOH check. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750430001 From stefank at openjdk.org Mon Sep 9 15:04:12 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 9 Sep 2024 15:04:12 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Mon, 9 Sep 2024 12:21:19 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Try to avoid lea in loadNklass (aarch64) >> - Fix release build error > > src/hotspot/share/oops/oop.hpp line 134: > >> 132: inline Klass* forward_safe_klass(markWord m) const; >> 133: inline size_t forward_safe_size(); >> 134: inline void forward_safe_init_mark(); > > Given the comment these methods do not seem "safe" to me. Maybe use "raw" or something to better indicate that care must be taken to use them. > > Maybe the "safe" refers to use them only in "safe" contexts, but in Hotspot code iirc we use something like "raw" or "unsafe". Restating my earlier comment about this: These functions are mainly used by the GCs. In one of the patches I've cleaned away all usages except for those in Shenandoah. I would prefer to see these completely removed from the oops/ directory and let the GCs decide when and how to perform "safe" reads of these values. > src/hotspot/share/oops/oop.hpp line 356: > >> 354: return mark_offset_in_bytes() + sizeof(markWord) / 2; >> 355: } else >> 356: #endif > > Maybe instead of trying to calculate some random, meaningless value just use some "random" value directly? > I am fine with the existing code, but first stating directly that "any value" works here, this additional code seems to confuse the message. 
(Fwiw, the method is also used during Universe initialization). Just to be clear, the second part of the quoted sentence is important: > could be any value *that is not a valid field offset* ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750428581 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750432186 From tschatzl at openjdk.org Mon Sep 9 15:04:12 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 9 Sep 2024 15:04:12 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Mon, 9 Sep 2024 15:00:09 GMT, Stefan Karlsson wrote: > could be any value that is not a valid field offset I understand that that "random value" needs to satisfy this condition. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750433800 From stefank at openjdk.org Mon Sep 9 15:34:10 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 9 Sep 2024 15:34:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: References: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> <3QGPH52NyrDPne5EgoGx2sx9OeGRu9K72onNNwzMr2M=.8a390b3d-2e8a-470e-8bb7-1ba975070c53@github.com> Message-ID: On Mon, 9 Sep 2024 12:59:36 GMT, Roman Kennke wrote: > That %zu is SIZE_FORMAT, right? Yes. Reviewers have lately encouraged people to use %zu instead of SIZE_FORMAT. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750482486 From stefank at openjdk.org Mon Sep 9 15:34:09 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 9 Sep 2024 15:34:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 12:49:05 GMT, Roman Kennke wrote: >>> If we get a promotion failure in the young gen, we are leaving the dead objects marked as forwarded. >> >> True; need to do sth like `obj->init_mark();` for the non-self-forwarded case. The postcondition is that no forwarded objs in eden/from. > > ParallelGC actually doesn't use bitmaps, it pushes all forwarded objs to preserved-marks-table, and uses that to find forwarded objects, which is why we can't remove the preserved-marks table in ParallelGC (IOW, after this patch, the preserved-marks-stuff in Parallel scavenger is *only* used to find forwarded objects. We might want to think about more efficient solutions for this). (Just to clarify if others are reading this) Right, what I referred to above was how we found the object to forward, which is done via the bitmaps: while (cur_addr < region_end) { cur_addr = mark_bitmap()->find_obj_beg(cur_addr, region_end); If the Parallel Old collector didn't do that, but instead parsed the heap like Serial does, then the Parallel Young collector would also have to fix the from space copies of moved objects when it hits a promotion failure, just like Serial does. This was just meant to point out the differences between the two collectors and why the young GC code is different. I realize that in earlier comments I called the from-space copies of the objects "dead objects", but they are not dead; they are just stale copies that remain discoverable because the promotion failure keeps the eden and from spaces.
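The postcondition discussed above (no forwarded objects left in eden/from after a promotion failure) can be pictured with a toy model of what the thread says Serial does: self-forwarded objects just drop their flag, while stale from-space copies of moved objects get their header rebuilt from the forwardee. This is a minimal sketch. The bit positions, the `ToyObj` type, and every function name are hypothetical. Only the "upper 22 bits hold the narrow Klass*" detail comes from the PR description.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy mark-word layout, assumed for illustration (not HotSpot's markWord):
//   bits 0-1   lock bits; 0b11 means "forwarded to another copy"
//   bit  2     self-forwarded flag (promotion failed, object stayed put)
//   bits 42-63 narrow Klass* (the "upper 22 bits" from the PR description)
constexpr uint64_t kLockMask    = 0b11;
constexpr uint64_t kMarkedValue = 0b11;
constexpr uint64_t kUnlocked    = 0b01;
constexpr uint64_t kSelfFwdBit  = uint64_t{1} << 2;
constexpr int      kKlassShift  = 42;
constexpr uint64_t kKlassMask   = ~uint64_t{0} << kKlassShift;

struct alignas(8) ToyObj {
  uint64_t mark;
};

uint64_t prototype_mark(uint64_t narrow_klass) {
  return (narrow_klass << kKlassShift) | kUnlocked;
}

// Regular forwarding overwrites the whole mark word, klass bits included.
void forward_to(ToyObj* from, ToyObj* to) {
  from->mark = reinterpret_cast<uintptr_t>(to) | kMarkedValue;
}

// Self-forwarding only sets a flag, so the klass bits survive.
void forward_to_self(ToyObj* obj) { obj->mark |= kSelfFwdBit; }

bool is_self_forwarded(const ToyObj* o) { return (o->mark & kSelfFwdBit) != 0; }

bool is_forwarded(const ToyObj* o) {
  return is_self_forwarded(o) || (o->mark & kLockMask) == kMarkedValue;
}

ToyObj* forwardee(const ToyObj* o) {
  return reinterpret_cast<ToyObj*>(static_cast<uintptr_t>(o->mark & ~kLockMask));
}

// After a promotion failure eden/from are kept, so no object there may stay
// marked as forwarded.
void reset_forwarding(const std::vector<ToyObj*>& from_space) {
  for (ToyObj* o : from_space) {
    if (is_self_forwarded(o)) {
      o->mark &= ~kSelfFwdBit;  // object never moved; klass bits are intact
    } else if (is_forwarded(o)) {
      // Stale from-space copy: rebuild its header from the forwardee's klass.
      o->mark = (forwardee(o)->mark & kKlassMask) | kUnlocked;
    }
    assert(!is_forwarded(o));
  }
}
```

The asymmetry is the point of the self-forwarding bit: only the copied objects lose their klass bits, so only they need the forwardee to recover them.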
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750480983 From cjplummer at openjdk.org Mon Sep 9 16:56:08 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 9 Sep 2024 16:56:08 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: <1VACYSoQRtP9m4BJkCVrdFxueC75Kg4Kp3wjGsAA2Dw=.53563f62-70cf-4d93-8d99-69b737812ba6@github.com> On Mon, 26 Aug 2024 21:30:51 GMT, Chris Plummer wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Oop.java line 85: > >> 83: >> 84: private static Klass getKlass(Mark mark) { >> 85: assert(VM.getVM().isCompactObjectHeadersEnabled()); > > `mark.getKlass()` already does this assert. I don't see any value in this `getKlass()` method. The caller should just call `getMark().getKlass()` rather than `getKlass(getMark())`. I'm not sure why this got marked as resolved. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750600652 From cjplummer at openjdk.org Mon Sep 9 16:56:08 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 9 Sep 2024 16:56:08 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 14:32:49 GMT, Roman Kennke wrote: >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Oop.java line 169: >> >>> 167: } else { >>> 168: visitor.doMetadata(klass, true); >>> 169: } >> >> Why is there no `visitor.doMetadata()` call for the compact object header case? > > There is no dedicated klass field anymore, the Klass* is encoded in the mark, and we would need to extract it. What is the purpose of the visitors? 
Do they need to see the klass/compressedKlass, or is it sufficient to visit the mark-word (which we already do, but as CInt). I've been looking into this. It's a bit hard to follow but I think you do need to do something more here. Can you run ClhsdbInspect.java and send me the output. In specific I need to know if the log includes the following: hsdb> + inspect 0x00000007cff154b8 instance of Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject @ 0x00000007cff154b8 (size = 24) _mark: 1 _metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject firstWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80 lastWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80 this$0: Oop for java/util/concurrent/locks/ReentrantLock$NonfairSync @ 0x00000007cff15498 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750598648 From rkennke at openjdk.org Mon Sep 9 17:45:47 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 17:45:47 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: Message-ID: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. 
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders.
Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with six additional commits since the last revision: - Print as warning when UCOH doesn't match in CDS archive - Improve initialization of mark-word in CDS ArchiveHeapWriter - Simplify getKlass() in SA - Simplify oopDesc::init_mark() - Get rid of forward_safe_* methods - GCForwarding touch-ups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/70f492d3..2884499a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=07-08 Stats: 132 lines in 17 files changed: 26 ins; 73 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From cjplummer at openjdk.org Mon Sep 9 18:37:09 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 9 Sep 2024 18:37:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 16:51:35 GMT, Chris Plummer wrote: >> There is no dedicated klass field anymore, the Klass* is encoded in the mark, and we would need to extract it. What is the purpose of the visitors? Do they need to see the klass/compressedKlass, or is it sufficient to visit the mark-word (which we already do, but as CInt). > > I've been looking into this. It's a bit hard to follow but I think you do need to do something more here. Can you run ClhsdbInspect.java and send me the output. 
In specific I need to know if the log includes the following: > > > hsdb> + inspect 0x00000007cff154b8 > instance of Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject @ 0x00000007cff154b8 (size = 24) > _mark: 1 > _metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject > firstWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80 > lastWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80 > this$0: Oop for java/util/concurrent/locks/ReentrantLock$NonfairSync @ 0x00000007cff15498 I pulled your changes and I see one slight difference in the output. The following line is missing: `_metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject` I realize that there is no `_metadata._compressed_klass` when you have compact headers, and that the Klass* is encoded in the `_mark` word, which now looks something like this in the output: _mark: 16294762323640321 So you can say that the Klass* is embedded in the _mark word, but this isn't of much help to SA users. I think what is expected is that the visitor is passed a MetadataField object that when getValue() is called on it, the Klass mirror is returned. Maybe we need a new CompactKlassField type like we currently have a NarrowKlassField field type, and it will do the decoding of the _mark word into a Klass. The current getKlass() is related to this.
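The sample value decodes cleanly under the layout given in the PR description (narrow Klass* in the upper 22 bits of the 64-bit mark word): 16294762323640321 = 3705 · 2^42 + 1. The narrow klass is therefore 3705, and the remaining bits equal 1, the same `_mark: 1` shown in the non-compact output. The sketch below shows the decoding that a hypothetical `CompactKlassField` would need to perform. It is written in C++ rather than SA's Java, and the encoding base and shift used in the test are made-up values.

```cpp
#include <cassert>
#include <cstdint>

// Assumed compact-header layout: narrow Klass* in the upper 22 bits of the
// 64-bit mark word; everything below bit 42 is the "classic" mark.
constexpr int kNarrowKlassBits = 22;
constexpr int kKlassShiftInMark = 64 - kNarrowKlassBits;  // 42

uint32_t narrow_klass_from_mark(uint64_t mark) {
  return static_cast<uint32_t>(mark >> kKlassShiftInMark);
}

uint64_t mark_without_klass(uint64_t mark) {
  return mark & ((uint64_t{1} << kKlassShiftInMark) - 1);
}

// Decoding works like compressed class pointers: base + (narrow << shift).
uint64_t decode_klass(uint32_t narrow, uint64_t encoding_base, int encoding_shift) {
  assert(narrow < (uint32_t{1} << kNarrowKlassBits));
  return encoding_base + (static_cast<uint64_t>(narrow) << encoding_shift);
}
```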
It's a bit hard to follow but I think you do need to do something more here. Can you run ClhsdbInspect.java and send me the output. In specific I need to know if the log includes the following: >> >> >> hsdb> + inspect 0x00000007cff154b8 >> instance of Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject @ 0x00000007cff154b8 (size = 24) >> _mark: 1 >> _metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject >> firstWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80 >> lastWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80 >> this$0: Oop for java/util/concurrent/locks/ReentrantLock$NonfairSync @ 0x00000007cff15498 > > I pulled your changes and I see one slight difference in the output. The following line is missing: > > `_metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject` > > I realize that there is no `_metadata._compressed_klass` when you have compact headers, and that the Klass* is encoded in the `_mark` word, which is now looks something like this in the output: > > _mark: 16294762323640321 > > So you can say that the Klass* is embedded in the _mark work, but this isn't of much help to SA users. I think what is expected is that the visitor is passed a MetadataField object that when getValue() is called on it, the Klass mirror is returned. Maybe we need a new CompactKlassField type like we current have a NarrowKlassField field type, and it will do the decoding of the _mark work into a Klass. The current getKlass() is related to this. Thinking about this a bit more, maybe _mark needs to be a MetadataFile rather than CInt. This is a kind of odd situation. Basically we have a CInt field that is more than just simple bits used as flags or small integers. It also gets you to the Klass*. 
Possibly SA should treat _mark as two separate fields; one that remains a CInt as it currently is and another that treats it as an encoded Klass* like the NarrowKlassField case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750788243 From coleenp at openjdk.org Mon Sep 9 19:55:16 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 19:55:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Mon, 9 Sep 2024 17:45:47 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word.
Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with six additional commits since the last revision: > > - Print as warning when UCOH doesn't match in CDS archive > - Improve initialization of mark-word in CDS ArchiveHeapWriter > - Simplify getKlass() in SA > - Simplify oopDesc::init_mark() > - Get rid of forward_safe_* methods > - GCForwarding touch-ups I reviewed the oops code so far. src/hotspot/share/oops/compressedKlass.cpp line 116: > 114: _range = end - _base; > 115: > 116: DEBUG_ONLY(assert_is_valid_encoding(addr, len, _base, _shift);) Can you refactor so the aarch64 path runs this same code without duplication? 
src/hotspot/share/oops/klass.hpp line 173: > 171: > 172: markWord _prototype_header; // Used to initialize objects' header > 173: I think you should move this up after ClassLoaderData, as there might be an alignment gap (you can run pahole to check). src/hotspot/share/oops/klass.hpp line 718: > 716: > 717: markWord prototype_header() const { > 718: assert(UseCompactObjectHeaders, "only use with compact object headers"); Should this unconditionally return _prototype_header since it's initialized to markWord::prototype_header(), or would that decrease performance for the non-compact headers case? src/hotspot/share/oops/klass.inline.hpp line 54: > 52: } > 53: > 54: inline void Klass::set_prototype_header(markWord header) { Can you put a comment that this is only used when dumping the archive? Because otherwise the Klass::_prototype_header field should always be initialized to the right thing (either with Klass encoded or as markWord::prototype_header()) and doesn't change. src/hotspot/share/oops/markWord.inline.hpp line 90: > 88: ShouldNotReachHere(); > 89: return markWord(); > 90: #endif Is the ifdef _LP64 necessary, since UseCompactObjectHeaders should always be false for 32 bits? src/hotspot/share/oops/oop.inline.hpp line 90: > 88: } else { > 89: return markWord::prototype(); > 90: } Could this be unconditional since prototype_header is initialized for all Klasses? src/hotspot/share/oops/typeArrayKlass.cpp line 175: > 173: size_t TypeArrayKlass::oop_size(oop obj) const { > 174: // In this assert, we cannot safely access the Klass* with compact headers. > 175: assert(UseCompactObjectHeaders || obj->is_typeArray(),"must be a type array"); Why not? I think I'm missing something. Klass should be in the markWord and that should be ok (?)
------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2290316150 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750529270 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750727211 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750730078 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750736547 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750739441 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750842383 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750721069 From coleenp at openjdk.org Mon Sep 9 19:55:19 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 19:55:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 11:55:52 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. 
>> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Try to avoid lea in loadNklass (aarch64) > - Fix release build error src/hotspot/cpu/aarch64/compressedKlass_aarch64.cpp line 147: > 145: #endif > 146: > 147: return true; This should only be in the compressedKlass.cpp file.
src/hotspot/share/oops/compressedKlass.cpp line 214: > 212: ss.print("Class space size (%zu) exceeds the maximum possible size (%zu)", > 213: len, max_encoding_range_size()); > 214: vm_exit_during_initialization(ss.base()); Why does this exit and not turn off compressed klass pointers and compact object headers? src/hotspot/share/oops/compressedKlass.cpp line 222: > 220: return; > 221: } > 222: #endif Why not add null pd_initialize to zero to remove this conditional code? src/hotspot/share/oops/compressedKlass.cpp line 224: > 222: #endif > 223: > 224: if (tiny_classpointer_mode()) { I kind of agree with Thomas Schatzl for this. Maybe it should be compact_classpointer_mode(). It's nice to have a new string for grep, but they're not really that tiny. src/hotspot/share/oops/compressedKlass.cpp line 234: > 232: _range = len; > 233: > 234: constexpr int log_cacheline = 6; Is 6 the log of DEFAULT_CACHE_LINE_SIZE? src/hotspot/share/oops/compressedKlass.cpp line 243: > 241: } else { > 242: > 243: // In legacy mode, we try, in order of preference: Can you not use the word 'legacy' here? Maybe in "non-compact object header mode"... src/hotspot/share/oops/compressedKlass.inline.hpp line 100: > 98: check_valid_klass(k, base(), shift()); > 99: // Also assert that k falls into what we know is the valid Klass range. This is usually smaller > 100: // than the encoding range (e.g. encoding range covers 4G, but we only have 1G class space and a 1G is the default CompressedClassSpaceSize but can be larger, right? So the comment isn't quite accurate. Or with tiny class pointers can it only be 1G? 
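The sizing questions above reduce to simple arithmetic: a narrow Klass* of `b` bits shifted by `s` can address an encoding range of 2^(b+s) bytes. With an assumed 22-bit narrow klass and shift of 10, that is 2^32 bytes = 4 GiB, matching the "encoding range covers 4G" example in the quoted comment. The class space (1 GiB by default, per the comment above, but configurable) only has to fit inside that range. Likewise, 2^6 = 64 bytes, the common cache-line size, which is presumably what `log_cacheline = 6` stands for. This is illustrative arithmetic only. The bit widths and shifts are assumptions, not values read from HotSpot.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative arithmetic only; the bit widths and shifts are assumptions.
constexpr uint64_t encoding_range(int narrow_klass_bits, int shift) {
  return uint64_t{1} << (narrow_klass_bits + shift);
}

constexpr uint64_t kG = uint64_t{1} << 30;

// A class space can only be encoded if it fits inside the encoding range.
bool class_space_fits(uint64_t class_space_len, int narrow_klass_bits, int shift) {
  assert(narrow_klass_bits + shift < 64);
  return class_space_len <= encoding_range(narrow_klass_bits, shift);
}
```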
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750527537 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750511912 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750513660 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750515923 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750520712 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750524690 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750662637 From coleenp at openjdk.org Mon Sep 9 19:55:20 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 19:55:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> References: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> Message-ID: <4cTKmlYUEtFpr2TURf25gd7-_eSb-uF0cC0BmLl6wd0=.b9f0482d-5439-421b-9a29-7a014fb72558@github.com> On Mon, 9 Sep 2024 10:02:53 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/oops/compressedKlass.hpp line 43: > >> 41: >> 42: // Tiny-class-pointer mode >> 43: static int _tiny_cp; // -1, 0=true, 1=false > > Suggestion: > > static int _tiny_cp; // -1 = uninitialized, 0 = true, 1 = false > > In addition to that, I am not sure if introducing a new term ("tiny") for compact class header related changes (and just here) makes the code more clear; I would have expected a "_compact_" prefix. Also all other members use "k"-klass and spell out "klass pointer", so I would prefer to keep that style. I agree with this. 'cp' reads as ConstantPool for me even though this is a different context. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750531167 From stefank at openjdk.org Mon Sep 9 20:07:13 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 9 Sep 2024 20:07:13 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: <1vlOROPKL6eDagcb0xz0MGe6vA6vYBVa4UzVsCZ5Q8I=.d5408660-ee94-4118-b809-d13a3dc500b4@github.com> On Mon, 9 Sep 2024 18:15:38 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with six additional commits since the last revision: >> >> - Print as warning when UCOH doesn't match in CDS archive >> - Improve initialization of mark-word in CDS ArchiveHeapWriter >> - Simplify getKlass() in SA >> - Simplify oopDesc::init_mark() >> - Get rid of forward_safe_* methods >> - GCForwarding touch-ups > > src/hotspot/share/oops/typeArrayKlass.cpp line 175: > >> 173: size_t TypeArrayKlass::oop_size(oop obj) const { >> 174: // In this assert, we cannot safely access the Klass* with compact headers. >> 175: assert(UseCompactObjectHeaders || obj->is_typeArray(),"must be a type array"); > > Why not? I think I'm missing something. Klass should be in the markWord and that should be ok (?) I tracked this down to only (at least in my testing) happen from `size_given_klass` when called from the GC when it is about to copy an object. While that happens another thread can racingly succeed to copy the object and install a forwarding pointer over the old copy. When that happens the klass pointer is broken and the call to oopDesc::is_typeArray() crashes. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750862842 From coleenp at openjdk.org Mon Sep 9 20:23:11 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 20:23:11 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: <1vlOROPKL6eDagcb0xz0MGe6vA6vYBVa4UzVsCZ5Q8I=.d5408660-ee94-4118-b809-d13a3dc500b4@github.com> References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> <1vlOROPKL6eDagcb0xz0MGe6vA6vYBVa4UzVsCZ5Q8I=.d5408660-ee94-4118-b809-d13a3dc500b4@github.com> Message-ID: On Mon, 9 Sep 2024 20:04:22 GMT, Stefan Karlsson wrote: >> src/hotspot/share/oops/typeArrayKlass.cpp line 175: >> >>> 173: size_t TypeArrayKlass::oop_size(oop obj) const { >>> 174: // In this assert, we cannot safely access the Klass* with compact headers. >>> 175: assert(UseCompactObjectHeaders || obj->is_typeArray(),"must be a type array"); >> >> Why not? I think I'm missing something. Klass should be in the markWord and that should be ok (?) > > I tracked this down to only (at least in my testing) happen from `size_given_klass` when called from the GC when it is about to copy an object. While that happens another thread can racingly succeed to copy the object and install a forwarding pointer over the old copy. When that happens the klass pointer is broken and the call to oopDesc::is_typeArray() crashes. I did miss something. I thought the markWord was never overwritten by the forwarding pointer. 
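The race described above suggests a general rule: load the header once, derive the klass (and therefore the object size) from that snapshot, and treat a forwarded header as carrying no klass bits. The sketch below applies that rule to a toy CAS-based forwarding step. It is not HotSpot's copy path; the layout, the `ToyObj` type, and the function names are assumptions. The test exercises only the sequential winner/loser outcomes, not real concurrency.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Toy layout (assumed): narrow Klass* in bits 42-63, lock bits 0-1, and
// 0b11 = forwarded. A forwarding pointer replaces the entire mark word.
constexpr int      kKlassShift  = 42;
constexpr uint64_t kLockMask    = 0b11;
constexpr uint64_t kMarkedValue = 0b11;

struct alignas(8) ToyObj {
  explicit ToyObj(uint64_t m) : mark(m) {}
  std::atomic<uint64_t> mark;
};

bool is_forwarded(uint64_t mark) { return (mark & kLockMask) == kMarkedValue; }

ToyObj* forwardee(uint64_t mark) {
  return reinterpret_cast<ToyObj*>(static_cast<uintptr_t>(mark & ~kLockMask));
}

uint32_t klass_bits(uint64_t mark) {
  assert(!is_forwarded(mark) && "a forwarding pointer carries no klass bits");
  return static_cast<uint32_t>(mark >> kKlassShift);
}

// Load the header once and derive the klass from that snapshot. Re-reading
// obj->mark later could observe a forwarding pointer installed by a
// competing thread.
ToyObj* try_forward(ToyObj* obj, ToyObj* my_copy, uint32_t* klass_out) {
  uint64_t old_mark = obj->mark.load(std::memory_order_acquire);
  if (is_forwarded(old_mark)) {
    return forwardee(old_mark);  // someone else already copied it
  }
  *klass_out = klass_bits(old_mark);
  const uint64_t fwd = reinterpret_cast<uintptr_t>(my_copy) | kMarkedValue;
  if (obj->mark.compare_exchange_strong(old_mark, fwd, std::memory_order_acq_rel,
                                        std::memory_order_acquire)) {
    return my_copy;  // this thread won the race
  }
  // Lost the race: old_mark now holds the winner's forwarding pointer.
  return forwardee(old_mark);
}
```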
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750882259 From rkennke at openjdk.org Tue Sep 10 07:23:13 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 07:23:13 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> References: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> Message-ID: On Mon, 9 Sep 2024 10:16:24 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/gc/shared/gcForwarding.hpp line 41: > >> 39: * bits (to indicate 'forwarded' state as usual). >> 40: */ >> 41: class GCForwarding : public AllStatic { > > Since this class is only used for Full GCs, it may be useful to include that information, i.e. something like `FullGCForwarding` to avoid confusion why it is not used for other GCs too. > (Unless this has been discussed and even rejected by me before). I agree. In fact, that was my original name. It was suggested that I change it to SlidingForwarding when that was the approach we were going to take, but with the new implementation, FullGCForwarding makes the most sense. I'll change it.
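The numbers in the PR description fit together if the 40 forwarding bits hold a heap offset counted in 8-byte words: 2^40 words × 8 bytes = 2^43 bytes = 8 TiB. Together with the two lock bits, the offset occupies the low 42 bits and leaves the upper 22 (narrow Klass*) bits untouched. The sketch below works under exactly those assumptions (a word-granular offset from the heap base, stored just above the lock bits). HotSpot's actual `FullGCForwarding` encoding may differ, and all names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: lock bits 0-1, forwardee offset (in 8-byte words from the
// heap base) in bits 2-41, narrow Klass* in bits 42-63 (left untouched).
constexpr int      kLockBits     = 2;
constexpr int      kOffsetBits   = 40;
constexpr uint64_t kLockMask     = 0b11;
constexpr uint64_t kMarkedValue  = 0b11;
constexpr uint64_t kOffsetMask   = ((uint64_t{1} << kOffsetBits) - 1) << kLockBits;
constexpr uint64_t kWordSize     = 8;
constexpr uint64_t kMaxHeapBytes = (uint64_t{1} << kOffsetBits) * kWordSize;  // 8 TiB

uint64_t encode_forwarding(uint64_t mark, uint64_t heap_base, uint64_t to_addr) {
  assert(to_addr >= heap_base && (to_addr - heap_base) % kWordSize == 0);
  const uint64_t offset_words = (to_addr - heap_base) / kWordSize;
  assert(offset_words < (uint64_t{1} << kOffsetBits));  // heap must be <= 8 TiB
  // Keep the klass bits; replace the offset field and the lock bits.
  return (mark & ~(kOffsetMask | kLockMask)) | (offset_words << kLockBits) | kMarkedValue;
}

uint64_t decode_forwardee(uint64_t mark, uint64_t heap_base) {
  const uint64_t offset_words = (mark & kOffsetMask) >> kLockBits;
  return heap_base + offset_words * kWordSize;
}
```

Storing a base-relative word offset instead of a raw address is what lets the forwardee fit beside the klass bits. This also matches the description's note that heaps above 8TB need a fallback.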
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751400378 From rkennke at openjdk.org Tue Sep 10 07:56:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 07:56:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> References: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> Message-ID: On Mon, 9 Sep 2024 10:21:54 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/gc/shared/collectedHeap.cpp line 232: > >> 230: } >> 231: >> 232: // With compact headers, we can't safely access the class, due > > Suggestion: > > // With compact headers, we can't safely access the klass, due > > > This is the case why? Because we might not have copied the header yet? Is this method actually ever used while the forwarded object is unstable? > Given this is used for verification only afaik, we should make an effort to provide that check. With compact headers, we can't safely access the Klass* when the object has been forwarded, because non-full-GC-forwarding temporarily overwrites the mark-word, and thus the Klass*, with the forwarding pointer, and here we have no way to make a distinction between Full-GC and regular GC forwarding. I improved the code to make the check when the object is not forwarded. Not sure if we could/should do more (e.g. pass around is_full argument to make the distinction, or find the - possibly few - places where we might call is_oop() on from-space objects in regular GC and do the check in a forwardee-safe way?). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751448814 From rkennke at openjdk.org Tue Sep 10 08:36:13 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 08:36:13 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Mon, 9 Sep 2024 14:58:07 GMT, Stefan Karlsson wrote: >> src/hotspot/share/oops/oop.hpp line 134: >> >>> 132: inline Klass* forward_safe_klass(markWord m) const; >>> 133: inline size_t forward_safe_size(); >>> 134: inline void forward_safe_init_mark(); >> >> Given the comment these methods do not seem "safe" to me. Maybe use "raw" or something to better indicate that care must be taken to use them. >> >> Maybe the "safe" refers to use them only in "safe" contexts, but in Hotspot code iirc we use something like "raw" or "unsafe". > > Restating my earlier comment about this: These functions are mainly used by the GCs. In one of the patches I've cleaned away all usages except for those in Shenandoah. I would prefer to see these completely removed from the oops/ directory and let the GCs decide when and how to perform "safe" reads of these values. I've removed those methods. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751514466 From rkennke at openjdk.org Tue Sep 10 08:40:12 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 08:40:12 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Mon, 9 Sep 2024 15:01:10 GMT, Thomas Schatzl wrote: >> Just to be clear, the second part of the quoted sentence is important: >>> could be any value *that is not a valid field offset* > >> could be any value that is not a valid field offset > > I understand that that "random value" needs to satisfy this condition. With compact headers, this value should only be used in C2, and not really as an actual offset. An earlier version of the change had the value in src/hotspot/share/opto/type.hpp instead, and only an assert(!UCOH) in oopDesc::klass_offset_in_bytes(). I think this would be a better solution overall, because it prevents accidental (and wrong) usage of the klass_offset in the runtime. Back then it was rejected by somebody (I don't remember who) because it made the C2 diff a little messier, so I kept it as it is now. I would prefer to reinstate it, though.
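A hypothetical sketch of the split described above: the runtime accessor asserts that compact headers are off, while the compiler keeps its own placeholder constant that only names the Klass* memory slice. All names and values are assumptions for illustration: 8 is the klass field offset in the non-compact 64-bit layout, and 4 is an arbitrary offset inside the 8-byte mark word, which cannot collide with a real field because, per the PR description, instance fields start at offset 8 or later with compact headers.

```cpp
#include <cassert>

// Hypothetical sketch; names and values are illustrative assumptions.
namespace sketch {

// With compact headers, instance fields start at offset 8 or later.
constexpr int kFirstFieldOffsetCompact = 8;

// Compiler-only placeholder that merely identifies the memory slice carrying
// Klass* information; it is never used to load from memory. Any offset inside
// the 8-byte mark word works, because no field can live there.
constexpr int kCompilerKlassSliceOffset = 4;
static_assert(kCompilerKlassSliceOffset < kFirstFieldOffsetCompact,
              "placeholder must not alias a real field offset");

// Runtime accessor: only meaningful when the klass has its own header field.
inline int klass_offset_in_bytes(bool use_compact_object_headers) {
  assert(!use_compact_object_headers && "no separate klass field with compact headers");
  (void)use_compact_object_headers;  // unused when asserts are compiled out
  return 8;  // klass field follows the 8-byte mark word (64-bit, non-compact layout)
}

}  // namespace sketch
```

Keeping the placeholder out of the runtime accessor means any accidental runtime use under compact headers trips the assert in debug builds instead of silently reading the wrong bytes.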
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751522091 From rkennke at openjdk.org Tue Sep 10 08:44:11 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 08:44:11 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Mon, 9 Sep 2024 12:12:23 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Try to avoid lea in loadNklass (aarch64) >> - Fix release build error > > src/hotspot/share/oops/objArrayKlass.inline.hpp line 74: > >> 72: void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) { >> 73: // In this assert, we cannot safely access the Klass* with compact headers. >> 74: assert (UseCompactObjectHeaders || obj->is_array(), "obj must be array"); > > If we can't safely access the `Klass*` here, why is the call to `obj->klass()` below safe? Good question. This comment and assert can probably be removed (same for the similar comment/assert in TypeArrayKlass::oop_oop_iterate_impl()). It could be a leftover from a time when we had to deal with OM and/or stack-locks in the header.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751527745 From mli at openjdk.org Tue Sep 10 08:54:13 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 10 Sep 2024 08:54:13 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: <6I0T4rOjOTj-FZxpspatEo6j1_Num75bCAOBNxsrHI8=.097f731e-7c92-4eac-a379-c2df336cd412@github.com> References: <6I0T4rOjOTj-FZxpspatEo6j1_Num75bCAOBNxsrHI8=.097f731e-7c92-4eac-a379-c2df336cd412@github.com> Message-ID: On Mon, 9 Sep 2024 14:08:53 GMT, Roman Kennke wrote: >> Yes, I'm interested in it. Thanks for raising the discussion. :) > > If anybody is doing it, please send me a patch, or we can do it as a follow-up PR. Thanks. I'll send it to you if I finish it in time; otherwise, I will do it in a separate PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751544394 From lujianping521 at gmail.com Tue Sep 10 09:16:51 2024 From: lujianping521 at gmail.com (鲁建平) Date: Tue, 10 Sep 2024 17:16:51 +0800 Subject: Split Lock Warning with ZGC and -XX:-ClassUnloading on Linux x86_64, JDK 17.0.2 Message-ID: Hi all, When running JDK 17.0.2 on a Linux x86_64 architecture with ZGC and the JVM option -XX:-ClassUnloading, I encounter split lock warnings from the Linux kernel. This issue appears consistently during garbage collection operations. Here is the specific warning message from the kernel: x86/split lock detection: #AC: ZWorker#0/2154775 took a split_lock trap at address: 0x7f50c6e0433c Upon investigating the assembly at this address, I identified the following instruction: 0x00007f50c6e0433c <+76>: lock cmpxchg %rcx,(%rbx) This is part of the function: Dump of assembler code for function _ZN15ZMarkOopClosure6do_oopEPP7oopDesc: 0x00007f50c6e0433c <+76>: lock cmpxchg %rcx,(%rbx) The split lock warning occurs during the execution of the ZWorker thread, which is responsible for concurrent marking in ZGC.
The warning seems to be triggered specifically when class unloading is disabled with -XX:-ClassUnloading. Environment: JDK version: OpenJDK 17.0.2; GC: ZGC with -XX:-ClassUnloading; OS: Linux x86_64. I would like to understand whether this behavior is expected when class unloading is disabled, or whether there are any recommended fixes or workarounds for avoiding the split lock issue during concurrent garbage collection. From rkennke at openjdk.org Tue Sep 10 09:31:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 09:31:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Tue, 10 Sep 2024 08:37:43 GMT, Roman Kennke wrote: >>> could be any value that is not a valid field offset >> >> I understand that that "random value" needs to satisfy this condition. > > With compact headers, this value should only be used in C2, and not really as an actual offset. An earlier version of the change had the value in src/hotspot/share/opto/type.hpp instead, and only an assert(!UCOH) in oopDesc::klass_offset_in_bytes(). I think this would be a better solution overall, because it prevents accidental (and wrong) usage of the klass_offset in the runtime. Back then it was rejected by somebody (I don't remember who) because it made the C2 diff a little messier, so I kept it as it is now. I would prefer to reinstate it, though. > (Fwiw, the method is also used during Universe initialization). Yes, but only in the -UCOH branch.
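On the split-lock report above (ZGC with -XX:-ClassUnloading on JDK 17.0.2): a split lock occurs when a LOCK-prefixed instruction, here `lock cmpxchg %rcx,(%rbx)`, operates on memory that straddles two cache lines, which forces the processor to fall back to a bus lock. The reported address matches the disassembled instruction, so it appears to identify the instruction rather than the operand at `(%rbx)`, whose value is not in the report. The sketch below, assuming 64-byte cache lines, only expresses the condition that would have to hold for that operand; it does not explain why the operand would be misaligned in this case.

```cpp
#include <cstddef>
#include <cstdint>

// Assumes 64-byte cache lines, as is typical on x86_64.
constexpr std::uintptr_t kCacheLineSize = 64;

// True if an access of `size` bytes starting at `addr` spans two cache lines.
// A LOCK-prefixed instruction (such as `lock cmpxchg`) on such an operand is a
// split lock.
constexpr bool crosses_cache_line(std::uintptr_t addr, std::size_t size) {
  return (addr % kCacheLineSize) + size > kCacheLineSize;
}

// A naturally aligned operand (size a power of two, at most 64, and
// addr % size == 0) can never cross a cache-line boundary.
constexpr bool is_naturally_aligned(std::uintptr_t addr, std::size_t size) {
  return addr % size == 0;
}
```

For an 8-byte `cmpxchg`, only an operand address that is not 8-byte aligned can trigger the trap, e.g. an address at offset 60 within a line puts 4 bytes in each of two lines.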
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751604467 From stefank at openjdk.org Tue Sep 10 10:05:10 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 10 Sep 2024 10:05:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Tue, 10 Sep 2024 08:41:16 GMT, Roman Kennke wrote: >> src/hotspot/share/oops/objArrayKlass.inline.hpp line 74: >> >>> 72: void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) { >>> 73: // In this assert, we cannot safely access the Klass* with compact headers. >>> 74: assert (UseCompactObjectHeaders || obj->is_array(), "obj must be array"); >> >> If we can't safely access the `Klass*` here, why is the call to `obj->klass()` below safe? > > Good question. This comment and assert can probably be removed (same for the similar comment/assert in TypeArrayKlass::oop_oop_iterate_impl()). It could be a leftover from a time when we had to deal with OM and/or stack-locks in the header. FWIW, I've been running tests with this assert restored (and the one in TypeArrayKlass) without hitting any problems.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751656595 From rkennke at openjdk.org Tue Sep 10 11:29:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 11:29:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> Message-ID: On Tue, 10 Sep 2024 07:53:23 GMT, Roman Kennke wrote: >> src/hotspot/share/gc/shared/collectedHeap.cpp line 232: >> >>> 230: } >>> 231: >>> 232: // With compact headers, we can't safely access the class, due >> >> Suggestion: >> >> // With compact headers, we can't safely access the klass, due >> >> >> This is the case why? Because we might not have copied the header yet? Is this method actually ever used while the forwarded object is unstable? >> Given this is used for verification only afaik, we should make an effort to provide that check. > > With compact headers, we can't safely access the Klass* when the object has been forwarded, because non-full-GC forwarding temporarily overwrites the mark word, and thus the Klass*, with the forwarding pointer, and here we have no way to distinguish between full-GC and regular-GC forwarding. > > I improved the code to perform the check when the object is not forwarded. I'm not sure whether we could or should do more (e.g. pass around an is_full argument to make the distinction, or find the (possibly few) places where we might call is_oop() on from-space objects in a regular GC and do the check in a forwardee-safe way). Ah, I found it! It seems only the ShenandoahVerifier calls oop_iterate() on from-space objects, which can be forwarded, and the forwarding would clobber the object's Klass*. We're lucky that this iterator doesn't visit the Klass*. I see the following ways out: - The caller must ensure that the oop is ok and Klass* is accessible. I could do that in the ShenandoahVerifier.
It kinda defeats the point, though: we want the verifier to operate on the 'raw' object, not necessarily the forwardee. - The next easy way out would be to use 'this' instead of obj->klass(). That should make sense, because they should always be the same. Using 'this' in the assert (this->is_array_klass()) is kinda bogus, though. And asserting (this == obj->klass()) would be nice, but would have the same problem as before, where we would need to exclude UCOH for the case where Shenandoah needs it. In fact, this is already done in oopDesc::oop_iterate_backwards(), which likewise excludes UCOH. - We could add a hook in the iterator that gives the Klass* for a given oop, which can then be overridden by the actual iterator to do the right thing, e.g. load the Klass* from the forwardee. WDYT? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751770293 From tschatzl at openjdk.org Tue Sep 10 12:02:11 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 10 Sep 2024 12:02:11 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v18] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 11:15:47 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... 
> > Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port for JEP 475 src/hotspot/cpu/aarch64/gc/g1/g1BarrierSetAssembler_aarch64.cpp line 210: > 208: Label& done, > 209: bool new_val_may_be_null) { > 210: // Does store cross heap regions? Suggestion: // Does store cross heap regions? Indentation ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1751721626 From stuefe at openjdk.org Tue Sep 10 12:07:09 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 10 Sep 2024 12:07:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 15:49:57 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Try to avoid lea in loadNklass (aarch64) >> - Fix release build error > > src/hotspot/share/oops/compressedKlass.cpp line 214: > >> 212: ss.print("Class space size (%zu) exceeds the maximum possible size (%zu)", >> 213: len, max_encoding_range_size()); >> 214: vm_exit_during_initialization(ss.base()); > > Why does this exit and not turn off compressed klass pointers and compact object headers? This is tricky. We are already deep in initialization and have already made a couple of decisions based on +UseCompressedClassPointers (e.g. CDS setup). I *think* we could still go with -UseCCP, but I wonder whether this is wise. Note that this error is not new. In the old code, we simply asserted. That left us with undefined behavior in release builds, which remains unresolved. I simply made the error explicit in release too.
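For context on the aarch64 comment quoted earlier in this digest ("Does store cross heap regions?", g1BarrierSetAssembler_aarch64.cpp line 210): the G1 post-barrier first filters out stores whose target field and new value lie in the same heap region, since those need no remembered-set update. Because G1 regions are power-of-two sized and size-aligned, the test reduces to an XOR and a shift. The sketch below expresses that filter in plain C++ under an assumed 1 MiB region size; `crosses_regions` and `kLogRegionBytes` are hypothetical names, and in the real barrier the test is done on registers, with null new values filtered by a separate check.

```cpp
#include <cstdint>

// Assumed region size of 1 MiB (log2 = 20); G1 chooses the real size at startup.
constexpr int kLogRegionBytes = 20;

// Regions are power-of-two sized and aligned, so two addresses share a region
// exactly when all bits above the region-offset bits agree, i.e. when
// (a ^ b) >> log2(region size) is zero.
constexpr bool crosses_regions(std::uintptr_t store_addr, std::uintptr_t new_val) {
  return ((store_addr ^ new_val) >> kLogRegionBytes) != 0;
}
```

A single XOR plus shift keeps this first filter branch-cheap, which matters because it runs on every reference store that reaches the post-barrier.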
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751819814 From stuefe at openjdk.org Tue Sep 10 12:16:11 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 10 Sep 2024 12:16:11 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 15:50:50 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Try to avoid lea in loadNklass (aarch64) >> - Fix release build error > > src/hotspot/share/oops/compressedKlass.cpp line 222: > >> 220: return; >> 221: } >> 222: #endif > > Why not add null pd_initialize to zero to remove this conditional code? I can do that. Added to the backlog (https://wiki.openjdk.org/display/lilliput/JEP-450+Review+Todo) > src/hotspot/share/oops/compressedKlass.cpp line 224: > >> 222: #endif >> 223: >> 224: if (tiny_classpointer_mode()) { > > I kind of agree with Thomas Schatzl for this. Maybe it should be compact_classpointer_mode(). It's nice to have a new string for grep, but they're not really that tiny. Yes, makes sense. Added to the backlog. This code was initially developed somewhat independently of +COH, but now the two parts (tinycp and the rest of COH) depend on each other anyway. I should just use UseCompactObjectHeaders or a flag directly derived from it. > src/hotspot/share/oops/compressedKlass.cpp line 234: > >> 232: _range = len; >> 233: >> 234: constexpr int log_cacheline = 6; > > Is 6 the log of DEFAULT_CACHE_LINE_SIZE? Yes, 2^6 = 64. > src/hotspot/share/oops/compressedKlass.cpp line 243: > >> 241: } else { >> 242: >> 243: // In legacy mode, we try, in order of preference: > > Can you not use the word 'legacy' here? Maybe in "non-compact object header mode"... okay.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751828214 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751831035 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751831994 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751833034 From duke at openjdk.org Tue Sep 10 12:17:07 2024 From: duke at openjdk.org (Joel Sikström) Date: Tue, 10 Sep 2024 12:17:07 GMT Subject: RFR: 8339661: ZGC: Move some page resets and verification to callsites In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 13:01:12 GMT, Stefan Karlsson wrote: >> Currently, `ZPage::reset()` does different things depending on where the page is being reset and what it will be used for. This leads to checks being somewhat hard to understand and to follow the edge-cases of. >> >> By moving some of the reset logic that is now part of `ZPage::reset()` to the callsite, some operations can be made easier to understand the reason behind when reading the code. >> >> Main highlights: >> - Clear logic behind initializing remsets at callsites, now guarded by asserts in `ZPage::remset_initialize()`. >> - `ZPage::clone_limited()` retains the value of the top-pointer. >> - The kind of verification for remsets are now at callsites: >> - Allocations from the page cache, and only if the page got a remset >> - Old-to-old in-place relocations, where only the inactive remset is checked > > Looks good! Thank you for the reviews!
@stefank @fisk ------------- PR Comment: https://git.openjdk.org/jdk/pull/20890#issuecomment-2340530203 From duke at openjdk.org Tue Sep 10 12:17:09 2024 From: duke at openjdk.org (duke) Date: Tue, 10 Sep 2024 12:17:09 GMT Subject: RFR: 8339661: ZGC: Move some page resets and verification to callsites In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 12:43:28 GMT, Joel Sikström wrote: > Currently, `ZPage::reset()` does different things depending on where the page is being reset and what it will be used for. This leads to checks being somewhat hard to understand and to follow the edge-cases of. > > By moving some of the reset logic that is now part of `ZPage::reset()` to the callsite, some operations can be made easier to understand the reason behind when reading the code. > > Main highlights: > - Clear logic behind initializing remsets at callsites, now guarded by asserts in `ZPage::remset_initialize()`. > - `ZPage::clone_limited()` retains the value of the top-pointer. > - The kind of verification for remsets are now at callsites: > - Allocations from the page cache, and only if the page got a remset > - Old-to-old in-place relocations, where only the inactive remset is checked @jsikstro Your change (at version d3378b4f21086b4f2eb84d7bf7ecf2e9007acf8d) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20890#issuecomment-2340532527 From coleenp at openjdk.org Tue Sep 10 12:22:09 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 10 Sep 2024 12:22:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 12:03:59 GMT, Thomas Stuefe wrote: >> src/hotspot/share/oops/compressedKlass.cpp line 214: >> >>> 212: ss.print("Class space size (%zu) exceeds the maximum possible size (%zu)", >>> 213: len, max_encoding_range_size()); >>> 214: vm_exit_during_initialization(ss.base()); >> >> Why does this exit and not turn off compressed klass pointers and compact object headers? > > This is tricky. We are already deep in initialization and have done a couple of decisions based on +UseCompressedClassPointers (e.g. CDS setup). I *think* we could still go with -UseCCP, but I wonder whether this is wise. > > Note that this error is not new. In the old code, we simply asserted. That left us with UB in release builds, which remains unresolved. I simply made the error explicit in release too. Ok, in this case, that's fine if we already asserted. A fatal error is better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751840556 From rkennke at openjdk.org Tue Sep 10 12:42:48 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 12:42:48 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v10] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. 
The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. 
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with six additional commits since the last revision: - More touch-ups, fix Shenandoah oop iterator - Remove asserts in XArrayKlass::oop_oop_iterate() - Various touch-ups - Improve is_oop() - Rename GCForwarding -> FullGCForwarding; some touch-ups - Fix comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/2884499a..5da250cf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=08-09 Stats: 238 lines in 36 files changed: 74 ins; 65 del; 99 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From stuefe at openjdk.org Tue Sep 10 12:42:49 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 10 Sep 2024 12:42:49 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v10] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Mon, 9 Sep 2024 15:59:43 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with six additional commits since the last revision: >> >> - More touch-ups, fix Shenandoah oop iterator >> - Remove asserts in XArrayKlass::oop_oop_iterate() >> - Various touch-ups >> - Improve is_oop() >> - Rename GCForwarding -> FullGCForwarding; some touch-ups >> - Fix comment > > src/hotspot/share/oops/compressedKlass.cpp line 116: > >> 114: _range = end - _base; >> 115: >> 116: DEBUG_ONLY(assert_is_valid_encoding(addr, len, _base, _shift);) > > Can you refactor so the aarch64 path runs this same 
code without duplication? In tinycp mode, aarch64 runs this code though? The aarch64 variant of pd_initialize just returns then. In non-COH mode (preexisting, not touched by this patch) Aarch64 needs its own handling. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751866773 From stuefe at openjdk.org Tue Sep 10 12:42:49 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 10 Sep 2024 12:42:49 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: <4cTKmlYUEtFpr2TURf25gd7-_eSb-uF0cC0BmLl6wd0=.b9f0482d-5439-421b-9a29-7a014fb72558@github.com> References: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> <4cTKmlYUEtFpr2TURf25gd7-_eSb-uF0cC0BmLl6wd0=.b9f0482d-5439-421b-9a29-7a014fb72558@github.com> Message-ID: On Mon, 9 Sep 2024 16:01:10 GMT, Coleen Phillimore wrote: >> src/hotspot/share/oops/compressedKlass.hpp line 43: >> >>> 41: >>> 42: // Tiny-class-pointer mode >>> 43: static int _tiny_cp; // -1, 0=true, 1=false >> >> Suggestion: >> >> static int _tiny_cp; // -1 = uninitialized, 0 = true, 1 = false >> >> In addition to that, I am not sure if introducing a new term ("tiny") for compact class header related changes (and just here) makes the code more clear; I would have expected a "_compact_" prefix. Also all other members use "k"-klass and spell out "klass pointer", so I would prefer to keep that style. > > I agree with this. 'cp' reads as ConstantPool for me even though this is a different context. 
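Thomas Schatzl's suggestion documents `_tiny_cp` as an int-encoded tri-state (-1 = uninitialized, 0 = true, 1 = false), and the replies agree on a compact-oriented name. One way to make the three states self-describing is a scoped enum plus an accessor that refuses to be read before initialization. This is a hypothetical sketch: the enum, class and method names are made up for illustration (`compact_classpointer_mode()` follows Coleen's naming suggestion), and initializing from the `UseCompactObjectHeaders` setting mirrors Thomas Stuefe's remark earlier in this digest that the mode could be derived from that flag.

```cpp
#include <cassert>

// Hypothetical sketch (C++17); names are illustrative, not HotSpot code.
namespace sketch {

// Self-describing replacement for an int tri-state (-1/0/1).
enum class KlassPointerMode { Uninitialized, Compact, NonCompact };

class CompressedKlassPointersSketch {
  static inline KlassPointerMode _mode = KlassPointerMode::Uninitialized;

 public:
  // Derive the mode once from the compact-headers setting.
  static void initialize(bool use_compact_object_headers) {
    assert(_mode == KlassPointerMode::Uninitialized && "initialize only once");
    _mode = use_compact_object_headers ? KlassPointerMode::Compact
                                       : KlassPointerMode::NonCompact;
  }

  // Reading the mode before initialization is a bug, so this asserts instead
  // of silently returning a default.
  static bool compact_classpointer_mode() {
    assert(_mode != KlassPointerMode::Uninitialized && "queried before initialize()");
    return _mode == KlassPointerMode::Compact;
  }
};

}  // namespace sketch
```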
Okay, I will change that ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751867998 From tschatzl at openjdk.org Tue Sep 10 13:03:15 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 10 Sep 2024 13:03:15 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v18] In-Reply-To: References: Message-ID: <6oxoxxR5yGICT3d_4Wip3egvyFiQLWb3HWMxpL41jwY=.9ca3bb60-1e86-4bd2-a3b4-26eadc409eb5@github.com> On Mon, 9 Sep 2024 11:15:47 GMT, Roberto Casta?eda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. 
>> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Casta?eda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port for JEP 475 Marked as reviewed by tschatzl (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2292405233 From rcastanedalo at openjdk.org Tue Sep 10 16:26:58 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 10 Sep 2024 16:26:58 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v19] In-Reply-To: References: Message-ID: <5CDsDluK4CaytgLTPJPuKbz8Ug9mcF10mco0In8ljZM=.3fe332b8-36af-4de5-8cee-73f58a564497@github.com> > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. [...] Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Fix indentation in generate_post_barrier_fast_path Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/94145917..0979e41e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=17-18 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Tue Sep 10 16:26:59 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 10 Sep 2024 16:26:59 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v18] In-Reply-To: <6oxoxxR5yGICT3d_4Wip3egvyFiQLWb3HWMxpL41jwY=.9ca3bb60-1e86-4bd2-a3b4-26eadc409eb5@github.com> References: <6oxoxxR5yGICT3d_4Wip3egvyFiQLWb3HWMxpL41jwY=.9ca3bb60-1e86-4bd2-a3b4-26eadc409eb5@github.com> Message-ID: <7epSurWH76D6t-eSs3neVvSHYRdhdGanYobPU0Y_-SM=.5068c4a5-d220-417d-9d8a-0518bfdc61d8@github.com> On Tue, 10 Sep 2024 13:00:05 GMT, Thomas Schatzl wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion >> - riscv port for JEP 475 > > Marked as reviewed by tschatzl (Reviewer). Thanks for reviewing, @tschatzl!
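The post-barrier fast path touched by the indentation fix above filters out most reference stores before anything is enqueued. The following is a rough, hypothetical model in plain C++, not `G1BarrierSetAssembler` code. The region size, card size, card values, and the names `PostBarrierModel` and `post_barrier` are illustrative assumptions. It follows the classic G1 filter order: skip same-region stores, skip null stores, skip cards of young regions, and skip cards that are already dirty. Only the remaining stores dirty their card and enqueue it for refinement.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of the G1 post-write barrier filters.
// All constants below are illustrative assumptions, not HotSpot's values.
constexpr unsigned kLogRegionBytes = 20;     // assume 1 MB heap regions
constexpr unsigned kCardShift = 9;           // assume 512-byte cards
constexpr std::uint8_t kDirtyCard = 0;
constexpr std::uint8_t kYoungCard = 2;
constexpr std::uint8_t kCleanCard = 0xff;

struct PostBarrierModel {
  std::vector<std::uint8_t> cards;          // one byte per card; heap starts at address 0
  std::vector<std::uintptr_t> dirty_queue;  // stands in for the refinement queue

  // Models the barrier executed after "*field_addr = new_val".
  // Returns true if the slow path (dirty + enqueue) is taken.
  bool post_barrier(std::uintptr_t field_addr, std::uintptr_t new_val) {
    // 1. Field and new value lie in the same region: nothing to remember.
    if (((field_addr ^ new_val) >> kLogRegionBytes) == 0) return false;
    // 2. Storing null creates no cross-region reference.
    if (new_val == 0) return false;
    std::uint8_t& card = cards[field_addr >> kCardShift];
    // 3. Cards of young regions are never refined.
    if (card == kYoungCard) return false;
    // (The real barrier issues a StoreLoad fence here before re-reading the card.)
    // 4. Card already dirty: it has already been enqueued.
    if (card == kDirtyCard) return false;
    // Slow path: dirty the card and enqueue its index for refinement.
    card = kDirtyCard;
    dirty_queue.push_back(field_addr >> kCardShift);
    return true;
  }
};
```

Ordering the register-only tests (region crossing, null) before any card-table load keeps the common case cheap. The concurrency aspect, the StoreLoad fence between the young-card and dirty-card checks, cannot be shown by a single-threaded model like this one.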
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2341418514 From rkennke at openjdk.org Tue Sep 10 19:11:30 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 19:11:30 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then become on-by-default and diagnostic (?), and eventually be deprecated and obsoleted. However, there are a few unknowns in that plan: specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. To be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size), so that it can be fetched from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix FullGCForwarding initialization ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/5da250cf..6abda7bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=09-10 Stats: 8 lines in 7 files changed: 1 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From stuefe at openjdk.org Tue Sep 10 19:11:30 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 10 Sep 2024 19:11:30 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 17:40:03 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last
revision: >> >> - Try to avoid lea in loadNklass (aarch64) >> - Fix release build error > > src/hotspot/share/oops/compressedKlass.inline.hpp line 100: > >> 98: check_valid_klass(k, base(), shift()); >> 99: // Also assert that k falls into what we know is the valid Klass range. This is usually smaller >> 100: // than the encoding range (e.g. encoding range covers 4G, but we only have 1G class space and a > > 1G is the default CompressedClassSpaceSize but can be larger, right? So the comment isn't quite accurate. Or with tiny class pointers can it only be 1G? The comment was misleading; it referred to the 1G default class space. I recently changed class space (in mainline) to be at most 4GB (minus whatever little CDS needs), and for +COH, this is still true. A 22-bit class pointer with a 10-bit shift still gives us a maximum encoding range of 4GB. I will update the comment. (->backlist) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751872461 From duke at openjdk.org Wed Sep 11 08:11:17 2024 From: duke at openjdk.org (Joel Sikström) Date: Wed, 11 Sep 2024 08:11:17 GMT Subject: Integrated: 8339661: ZGC: Move some page resets and verification to callsites In-Reply-To: References: Message-ID: <7-ZqaoZ1s-ga06y8iChA-GQHwjluY_5y97aWJ0lsioc=.3bc5ef62-9879-4a43-bae5-6fe926d705ae@github.com> On Fri, 6 Sep 2024 12:43:28 GMT, Joel Sikström wrote: > Currently, `ZPage::reset()` does different things depending on where the page is being reset and what it will be used for. This makes the checks somewhat hard to understand and their edge cases hard to follow. > > By moving some of the reset logic that is now part of `ZPage::reset()` to the callsites, the reasoning behind some operations becomes easier to see when reading the code. > > Main highlights: > - Clear logic behind initializing remsets at callsites, now guarded by asserts in `ZPage::remset_initialize()`. > - `ZPage::clone_limited()` retains the value of the top-pointer. > - The remset verification is now done at the callsites: > - Allocations from the page cache, and only if the page got a remset > - Old-to-old in-place relocations, where only the inactive remset is checked This pull request has now been integrated. Changeset: ceef161e Author: Joel Sikström Committer: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/ceef161eea51578160b71b20826a9328f9a87a88 Stats: 127 lines in 6 files changed: 34 ins; 64 del; 29 mod 8339661: ZGC: Move some page resets and verification to callsites Reviewed-by: stefank, eosterlund ------------- PR: https://git.openjdk.org/jdk/pull/20890 From epeter at openjdk.org Wed Sep 11 08:28:13 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 11 Sep 2024 08:28:13 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 19:11:30 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). [...]
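The bit widths cited in this thread can be checked with a few compile-time assertions: a 22-bit narrow Klass* with a 10-bit shift (Thomas Stuefe's reply) and 40-bit full-GC forwarding (the PR description). This is a minimal sketch, and the constant names are hypothetical. The sketch also treats forwarding offsets as counts of 8-byte heap words. That is an assumption inferred from the 8TB figure, not something the thread states.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants mirroring the figures quoted in the thread.
constexpr unsigned kNarrowKlassBits = 22;   // narrow Klass* bits in the mark word
constexpr unsigned kNarrowKlassShift = 10;  // decode: base + (narrow << shift)
constexpr unsigned kForwardingBits = 40;    // full-GC forwarding field width
constexpr unsigned kLogHeapWordSize = 3;    // assumption: offsets count 8-byte words

constexpr std::uint64_t kGB = std::uint64_t{1} << 30;
constexpr std::uint64_t kTB = std::uint64_t{1} << 40;

// Bytes addressable by a narrow Klass*: 2^22 values, each 2^10 bytes apart.
constexpr std::uint64_t kKlassEncodingRange =
    std::uint64_t{1} << (kNarrowKlassBits + kNarrowKlassShift);

// Bytes addressable by a 40-bit offset counted in heap words.
constexpr std::uint64_t kForwardingRange =
    std::uint64_t{1} << (kForwardingBits + kLogHeapWordSize);

static_assert(kKlassEncodingRange == 4 * kGB, "2^32 bytes = 4 GB");
static_assert(kForwardingRange == 8 * kTB, "2^43 bytes = 8 TB");
```

A side effect of the 10-bit shift is that every encodable Klass must start on a 1024-byte boundary relative to the encoding base.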
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix FullGCForwarding initialization @rkennke Can you please explain the changes in these tests: test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java You added these IR rule restrictions: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted, which could be concerning because it has a performance impact. I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? Of course, some special tests may just rely on `UseCompactObjectHeaders == false`, but I would need some comments in the tests where you added it to justify why we add the restriction. Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well). ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2342983487 From rcastanedalo at openjdk.org Wed Sep 11 08:30:02 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 11 Sep 2024 08:30:02 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. [...] Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Fix a few style issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/0979e41e..141020e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=18-19 Stats: 7 lines in 3 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Wed Sep 11 08:32:12 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 11 Sep 2024 08:32:12 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v19] In-Reply-To: <5CDsDluK4CaytgLTPJPuKbz8Ug9mcF10mco0In8ljZM=.3fe332b8-36af-4de5-8cee-73f58a564497@github.com> References: <5CDsDluK4CaytgLTPJPuKbz8Ug9mcF10mco0In8ljZM=.3fe332b8-36af-4de5-8cee-73f58a564497@github.com> Message-ID: <8-IYniHv9GgBnsv9w3GggGF1mKKf3MfwxIxGIjEUh3c=.446607ac-5624-4c16-a1a5-a29187526023@github.com> On Tue, 10 Sep 2024 16:26:58 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. [...] > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentation in generate_post_barrier_fast_path > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> I just fixed a few more indentation and code style glitches found by clang-format in commit 141020e6 (thanks @dlunde for helping with the setup). ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2342993484 From aboldtch at openjdk.org Wed Sep 11 09:44:04 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 11 Sep 2024 09:44:04 GMT Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate [v2] In-Reply-To: References: Message-ID: <7yVEKxrYW37Ci3BnoTv836ENs4EoncYnRKolR4ytJTM=.b69666e3-1a76-4ae4-a28c-ef28ae27cac9@github.com> On Mon, 9 Sep 2024 11:37:41 GMT, Matthias Baesken wrote: >> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational >> shows this error when running with ubsan enabled >> >> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero >> #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491 >> #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822 >> #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912 >> #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29 >> #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
src/hotspot/share/gc/shared/concurrentGCThread.cpp:48 >> #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 >> #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Adjust division following suggestion by xmas I am fine with this change. But I am not 100% about the use of `std::numeric_limits::infinity()`. Maybe someone else can chime in. Not sure if there are any other places we have expect division by zero to result in infinity. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20888#pullrequestreview-2296226260 From mbaesken at openjdk.org Wed Sep 11 11:13:04 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 11 Sep 2024 11:13:04 GMT Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate [v2] In-Reply-To: <7yVEKxrYW37Ci3BnoTv836ENs4EoncYnRKolR4ytJTM=.b69666e3-1a76-4ae4-a28c-ef28ae27cac9@github.com> References: <7yVEKxrYW37Ci3BnoTv836ENs4EoncYnRKolR4ytJTM=.b69666e3-1a76-4ae4-a28c-ef28ae27cac9@github.com> Message-ID: On Wed, 11 Sep 2024 09:41:14 GMT, Axel Boldt-Christmas wrote: > I am fine with this change. But I am not 100% about the use of `std::numeric_limits::infinity()`. Maybe someone else can chime in. > > Not sure if there are any other places we have expect division by zero to result in infinity. Thanks for the review ! Seems this exists since c++11 https://en.cppreference.com/w/cpp/types/numeric_limits/infinity so usage should be okay. 
We also find it in libsimdsort (linux only in OpenJDK however) https://github.com/openjdk/jdk/blob/master/src/java.base/linux/native/libsimdsort/xss-common-includes.h#L47 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20888#issuecomment-2343339018 From rkennke at openjdk.org Wed Sep 11 13:37:16 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 11 Sep 2024 13:37:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 08:24:16 GMT, Emanuel Peter wrote: > @rkennke Can you please explain the changes in these tests: [...]
IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but with compressed class pointers disabled via -XX:-UseCompressedClassPointers, though nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively, with compact headers). That is because with compact headers, we are using the 4 bytes after the array length, but long arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might not vectorize at all. Those seemed to be use cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or instead make corresponding tests for compact headers, or improve vectorization to better deal with the offset mismatch, if necessary/possible. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2343693629 From jsjolen at openjdk.org Wed Sep 11 14:00:17 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 11 Sep 2024 14:00:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 19:11:30 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). [...]
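Roman Kennke's explanation of the byte[]/long[] offset mismatch can be reproduced with simple layout arithmetic. The sketch below is a simplified model and an assumption, not HotSpot's `arrayOopDesc` code. The 4-byte length field sits at `length_offset` and elements follow it directly, except that 8-byte elements such as `long` are padded up to an 8-byte boundary. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Round value up to a multiple of alignment (alignment must be a power of two).
constexpr std::size_t align_up(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

constexpr std::size_t kLengthFieldBytes = 4;

// Simplified model: elements start right after the length field, but
// 8-byte elements (long/double) are padded to an 8-byte boundary.
constexpr std::size_t array_base_offset(std::size_t length_offset,
                                        std::size_t elem_size) {
  return elem_size == 8 ? align_up(length_offset + kLengthFieldBytes, 8)
                        : length_offset + kLengthFieldBytes;
}

// Compact headers: 8-byte header, length at offset 8.
static_assert(array_base_offset(8, 1) == 12, "byte[] base, compact headers");
static_assert(array_base_offset(8, 8) == 16, "long[] base, compact headers");
// Default layout: 8-byte mark word + 4-byte narrow Klass*, length at 12.
static_assert(array_base_offset(12, 1) == 16, "byte[] base, default");
static_assert(array_base_offset(12, 8) == 16, "long[] base, default");
```

With the default layout, both element types start at offset 16, so byte and long accesses into the same array payload share alignment. With compact headers, they start at 12 and 16 respectively, which matches the offsets given in the reply above.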
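> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix FullGCForwarding initialization Hi, @caspernorrbin and I are reviewing the Metaspace changes (that is, anything in the `memory` and `metaspace` folders). We have found some minor improvements and nits, but overall the code looks OK. We are finishing up a first round of review now, and will do a second one. Thank you for your hard work and your patience with the review process. src/hotspot/share/memory/classLoaderMetaspace.cpp line 87: > 85: klass_alignment_words, > 86: "class arena"); > 87: } As per my comment in the header file, change the code to this: ```c++ if (class_context != nullptr) { // ... Same as in PR } else { _class_space_arena = _non_class_space_arena; } ``` src/hotspot/share/memory/classLoaderMetaspace.cpp line 115: > 113: if (wastage.is_nonempty()) { > 114: non_class_space_arena()->deallocate(wastage); > 115: } This code reads a bit strangely. I understand *what* it tries to do: it gives back any wasted memory from either the class space arena *or* the non-class space arena to the non-class space arena's freelist. I assume that we do this because any wastage is presumably too small to be used by our new 22-bit class pointers. However, this context will be lost on future readers. There should at least be a comment in the `if (wastage.is_nonempty())` clause explaining what we expect to happen and why. For example: ```c++ // Any wasted memory is presumably too small for any class. // Therefore, give it back to the non-class space arena's free list. ``` src/hotspot/share/memory/classLoaderMetaspace.cpp line 118: > 116: #ifdef ASSERT > 117: if (result.is_nonempty()) { > 118: const bool in_class_arena = class_space_arena() != nullptr ? class_space_arena()->contains(result) : false; This nullptr check is unnecessary if you take my suggestion; otherwise, you should switch to `have_class_space_arena`. src/hotspot/share/memory/classLoaderMetaspace.cpp line 165: > 163: MetaBlock bl(ptr, word_size); > 164: // If the block would be reusable for a Klass, add to class arena, otherwise to > 165: // then non-class arena. Nit: spelling, "the" src/hotspot/share/memory/classLoaderMetaspace.hpp line 81: > 79: metaspace::MetaspaceArena* class_space_arena() const { return _class_space_arena; } > 80: > 81: bool have_class_space_arena() const { return _class_space_arena != nullptr; } This is unnecessary. Instead of having this and having to remember to check for nullness each time, just change `_class_space_arena` to point to the same arena as `_non_class_space_arena` when we run with `-XX:-UseCompressedClassPointers`. src/hotspot/share/memory/metaspace.cpp line 656: > 654: // Adjust size of the compressed class space. > 655: > 656: const size_t res_align = reserve_alignment(); Can you change the name to `root_chunk_size`? src/hotspot/share/memory/metaspace.hpp line 112: > 110: static size_t max_allocation_word_size(); > 111: > 112: // Minimum allocation alignment, in bytes. All MetaData shall be aligned correclty Nit: Spelling, "correctly" src/hotspot/share/memory/metaspace/metablock.hpp line 48: > 46: > 47: MetaWord* base() const { return _base; } > 48: const MetaWord* end() const { return _base + _word_size; } `assert(is_nonempty())` src/hotspot/share/memory/metaspace/metablock.hpp line 51: > 49: size_t word_size() const { return _word_size; } > 50: bool is_empty() const { return _base == nullptr; } > 51: bool is_nonempty() const { return _base != nullptr; } Can `_base == nullptr` but `_word_size != 0`? src/hotspot/share/memory/metaspace/metablock.hpp line 52: > 50: bool is_empty() const { return _base == nullptr; } > 51: bool is_nonempty() const { return _base != nullptr; } > 52: void reset() { _base = nullptr; _word_size = 0; } Is this function really necessary?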
src/hotspot/share/memory/classLoaderMetaspace.cpp line 165:

> 163: MetaBlock bl(ptr, word_size);
> 164: // If the block would be reusable for a Klass, add to class arena, otherwise to
> 165: // then non-class arena.

Nit: spelling, "then" should be "the".

src/hotspot/share/memory/classLoaderMetaspace.hpp line 81:

> 79: metaspace::MetaspaceArena* class_space_arena() const { return _class_space_arena; }
> 80:
> 81: bool have_class_space_arena() const { return _class_space_arena != nullptr; }

This is unnecessary. Instead of having this and having to remember to check for nullness each time, just make `_class_space_arena` point to the same arena as `_non_class_space_arena` when we run with `-XX:-UseCompressedClassPointers`.

src/hotspot/share/memory/metaspace.cpp line 656:

> 654: // Adjust size of the compressed class space.
> 655:
> 656: const size_t res_align = reserve_alignment();

Can you change the name to `root_chunk_size`?

src/hotspot/share/memory/metaspace.hpp line 112:

> 110: static size_t max_allocation_word_size();
> 111:
> 112: // Minimum allocation alignment, in bytes. All MetaData shall be aligned correclty

Nit: spelling, "correctly".

src/hotspot/share/memory/metaspace/metablock.hpp line 48:

> 46:
> 47: MetaWord* base() const { return _base; }
> 48: const MetaWord* end() const { return _base + _word_size; }

Consider adding `assert(is_nonempty())` here.

src/hotspot/share/memory/metaspace/metablock.hpp line 51:

> 49: size_t word_size() const { return _word_size; }
> 50: bool is_empty() const { return _base == nullptr; }
> 51: bool is_nonempty() const { return _base != nullptr; }

Can `_base == nullptr` but `_word_size != 0`?

src/hotspot/share/memory/metaspace/metablock.hpp line 52:

> 50: bool is_empty() const { return _base == nullptr; }
> 51: bool is_nonempty() const { return _base != nullptr; }
> 52: void reset() { _base = nullptr; _word_size = 0; }

Is this function really necessary? According to my IDE, it's only used in tests, and even there the `MetaBlock` isn't used afterwards (so it has no effect).

src/hotspot/share/memory/metaspace/metaspaceArena.hpp line 44:

> 42: class FreeBlocks;
> 43:
> 44: struct ArenaStats;

Nit: Sort?

src/hotspot/share/memory/metaspace/metaspaceArena.hpp line 84:

> 82: // between threads and needs to be synchronized in CLMS.
> 83:
> 84: const size_t _allocation_alignment_words;

Nit: Document this? All other members are documented.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2296528491
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754335269
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754398993
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754343513
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754459464
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754330432
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754619023
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754508321
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754142822
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754142098
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754153662
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754192464
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754197251

From epeter at openjdk.org Wed Sep 11 14:17:16 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 11 Sep 2024 14:17:16 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11]

On Wed, 11 Sep 2024 13:34:28 GMT, Roman Kennke wrote:

> > @rkennke Can you please explain the changes in these tests:
> >
> > ```
> > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
> > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
> > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
> > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
> > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
> > ```
> >
> > You added this IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},`
> >
> > This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact.
> >
> > I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway?
> >
> > Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction.
> >
> > Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well).
>
> IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong.
> It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible.
>
> I will re-evaluate those tests, and add comments or remove the restrictions.

If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ;)

My suggestion is this: go over the examples and check which ones are now OK. For those that are not OK, add a comment and file a bug. I can then analyze those cases later, and see how to write other tests or improve auto-vectorization.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2343797957

From rcastanedalo at openjdk.org Wed Sep 11 14:17:17 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 11 Sep 2024 14:17:17 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11]

On Tue, 10 Sep 2024 19:11:30 GMT, Roman Kennke wrote:
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix FullGCForwarding initialization

src/hotspot/share/memory/metaspace/binList.hpp line 202:

> 200: b_last = b;
> 201: }
> 202: if (UseNewCode)printf("\n");

I guess this line is a leftover to be removed?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754702742

From rcastanedalo at openjdk.org Wed Sep 11 14:50:17 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 11 Sep 2024 14:50:17 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11]

On Tue, 10 Sep 2024 19:11:30 GMT, Roman Kennke wrote:

> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix FullGCForwarding initialization

src/hotspot/share/opto/machnode.cpp line 390:

> 388: t = t->make_ptr();
> 389: }
> 390: if (t->isa_narrowklass() && CompressedKlassPointers::shift() == 0) {

Does this change have any effect? `UseCompressedClassPointers` should be implied by `t->isa_narrowklass()`.
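For context on the question above, here is a simplified model of narrow (compressed) class pointer decoding. It assumes the common base-plus-shifted-offset scheme. The names `KlassEncoding`, `decode_klass`, and `narrow_klass_from_mark` are hypothetical and are not HotSpot's `CompressedKlassPointers` API. A narrow Klass value only exists when class pointers are compressed, which underlies the remark that `t->isa_narrowklass()` already implies `UseCompressedClassPointers`. With a zero shift, decoding reduces to adding the base.

```c++
#include <cstdint>

// Hypothetical description of a compressed class pointer encoding.
struct KlassEncoding {
  std::uint64_t base;  // start of the encoding range
  unsigned shift;      // log2 of the Klass* alignment; 0 means no scaling
};

// Decode: Klass* address = base + (narrow << shift).
// With shift == 0 this is simply base + narrow.
constexpr std::uint64_t decode_klass(std::uint32_t narrow_klass, KlassEncoding e) {
  return e.base + (static_cast<std::uint64_t>(narrow_klass) << e.shift);
}

// Per the PR description, with compact headers the narrow Klass* occupies the
// upper 22 bits of the 64-bit mark word.
constexpr unsigned kNarrowKlassBits = 22;

constexpr std::uint32_t narrow_klass_from_mark(std::uint64_t mark) {
  return static_cast<std::uint32_t>(mark >> (64 - kNarrowKlassBits));
}
```

For example, take a made-up base of `0x800000000`. The narrow value `0x10` then decodes to `0x800000010` with shift 0 and to `0x800004000` with shift 10. Both values are chosen only for illustration.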
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754813751

From stuefe at openjdk.org Wed Sep 11 16:17:16 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 11 Sep 2024 16:17:16 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11]

On Wed, 11 Sep 2024 12:47:30 GMT, Johan Sjölen wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix FullGCForwarding initialization
>
> src/hotspot/share/memory/classLoaderMetaspace.cpp line 115:
>
>> 113: if (wastage.is_nonempty()) {
>> 114: non_class_space_arena()->deallocate(wastage);
>> 115: }
>
> This code reads a bit strangely. I understand *what* it tries to do. It tries to give back any wasted memory from either the class space arena *or* the non class space arena to the non class space arena's freelist. I assume that we do this since any wastage is presumably too small to be used by our new 22-bit class pointers. However, this context will be lost on future readers. It should have at least a comment in the `if (wastage.is_nonempty())` clause explaining what we expect should happen and why. For example:
>
> ```c++
> // Any wasted memory is presumably too small for any class.
> // Therefore, give it back to the non-class space arena's free list.
> ```

Yes. Some background:

- wastage can only occur for larger Klass* alignments (aka the class space arena alignment property), so only for +COH (note to self: maybe assert)
- wastage is, by definition, not aligned to the required Klass* alignment, so it cannot be reused. Yes, it's probably also too small.

Yes, I will write a better comment.
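The arithmetic behind this explanation can be sketched with a simple bump-pointer model. Everything here is hypothetical: `align_up`, `is_aligned`, `AlignedCarve`, and `carve_aligned` are standalone stand-ins, not HotSpot's alignment utilities or `MetaspaceArena` code. The alignments are also assumptions. 1 KiB stands in for a "larger" Klass* alignment, and 8 bytes stands in for a word-sized one.

```c++
#include <cstddef>
#include <cstdint>

// Hypothetical helpers; 'alignment' must be a power of two.
constexpr std::uintptr_t align_up(std::uintptr_t p, std::uintptr_t alignment) {
  return (p + alignment - 1) & ~(alignment - 1);
}

constexpr bool is_aligned(std::uintptr_t p, std::uintptr_t alignment) {
  return (p & (alignment - 1)) == 0;
}

// Result of carving a Klass-aligned block out of a bump-pointer arena whose
// current top is 'top': the bytes in [top, start) are skipped ("wastage").
struct AlignedCarve {
  std::uintptr_t start;        // Klass-aligned start of the new block
  std::uintptr_t waste_begin;  // equal to the old top
  std::size_t waste_bytes;     // 0 if top was already aligned
};

constexpr AlignedCarve carve_aligned(std::uintptr_t top, std::uintptr_t klass_alignment) {
  const std::uintptr_t start = align_up(top, klass_alignment);
  return AlignedCarve{start, top, static_cast<std::size_t>(start - top)};
}
```

In this model, a nonzero gap always begins at an address that is not Klass-aligned, and it is smaller than the alignment, so it can never hold a Klass. The code under review hands it to the non-class arena's free list, which lets ordinary metadata reuse it.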
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1755111131

From stuefe at openjdk.org Wed Sep 11 16:17:16 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 11 Sep 2024 16:17:16 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11]

On Wed, 11 Sep 2024 14:15:12 GMT, Roberto Castañeda Lozano wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix FullGCForwarding initialization
>
> src/hotspot/share/memory/metaspace/binList.hpp line 202:
>
>> 200: b_last = b;
>> 201: }
>> 202: if (UseNewCode)printf("\n");
>
> I guess this line is a leftover to be removed?

Yep thanks for spotting

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1755115905

From stuefe at openjdk.org Wed Sep 11 16:17:17 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Wed, 11 Sep 2024 16:17:17 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11]

On Wed, 11 Sep 2024 16:14:39 GMT, Thomas Stuefe wrote:

>> src/hotspot/share/memory/metaspace/binList.hpp line 202:
>>
>>> 200: b_last = b;
>>> 201: }
>>> 202: if (UseNewCode)printf("\n");
>>
>> I guess this line is a leftover to be removed?
>
> Yep thanks for spotting

So that was causing the empty lines in my logs (facepalm)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1755116656

From rkennke at openjdk.org Wed Sep 11 17:31:54 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 11 Sep 2024 17:31:54 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v12]

Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:

 - Make is_oop() MT-safe
 - Re-enable some vectorization tests

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/20677/files
 - new: https://git.openjdk.org/jdk/pull/20677/files/6abda7bc..b6c11f74

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=10-11

Stats: 32 lines in 6 files changed: 7 ins; 8 del; 17 mod
Patch: https://git.openjdk.org/jdk/pull/20677.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From rkennke at openjdk.org Wed Sep 11 17:38:57 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Wed, 11 Sep 2024 17:38:57 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13]
Message-ID: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

 Revert accidental change of UCOH default

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/20677/files
 - new: https://git.openjdk.org/jdk/pull/20677/files/b6c11f74..9e008ac1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=12
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=11-12

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/20677.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677

From coleenp at openjdk.org Wed Sep 11 21:18:12 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Wed, 11 Sep 2024 21:18:12 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13]
In-Reply-To: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>
References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>

On Wed, 11 Sep 2024 17:38:57 GMT, Roman Kennke wrote:
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Revert accidental change of UCOH default

I was starting to understand the concerns with having prototype_header in Klass. It seems like it would simplify encoding the klass for object allocation. My recent change https://bugs.openjdk.org/browse/JDK-8338526 breaks this. You need to pass a parameter to the Klass() constructor telling it whether to encode the klass pointer or not:

diff --git a/src/hotspot/share/oops/instanceKlass.cpp b/src/hotspot/share/oops/instanceKlass.cpp
index fd198f54fc9..7aa4bd24948 100644
--- a/src/hotspot/share/oops/instanceKlass.cpp
+++ b/src/hotspot/share/oops/instanceKlass.cpp
@@ -511,7 +511,7 @@ InstanceKlass::InstanceKlass() {
 }
 
 InstanceKlass::InstanceKlass(const ClassFileParser& parser, KlassKind kind, ReferenceType reference_type) :
-  Klass(kind),
+  Klass(kind, (!parser.is_interface() && !parser.is_abstract())),
   _nest_members(nullptr),
   _nest_host(nullptr),
   _permitted_subclasses(nullptr),

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2344715540

From stefank at openjdk.org Thu Sep 12 09:37:22 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 12 Sep 2024 09:37:22 GMT
Subject: RFR: 8314842: zgc/genzgc tests ignore vm flags
Message-ID: <7p6dM8Eb8dQSgTA_pDvyOLs3Qu6YB4uJT9LxZbCiwcU=.2fc89050-0ebd-4a4f-8968-7bfb824c3dac@github.com>

Change some ZGC tests to propagate requested vm flags.

I tested this by manually passing in various flags to jtreg. I also ran tier1-5 on Linux.
-------------

Commit messages:
 - 8314842: zgc/genzgc tests ignore vm flags

Changes: https://git.openjdk.org/jdk/pull/20963/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20963&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8314842
Stats: 6 lines in 6 files changed: 0 ins; 0 del; 6 mod
Patch: https://git.openjdk.org/jdk/pull/20963.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20963/head:pull/20963

PR: https://git.openjdk.org/jdk/pull/20963

From rcastanedalo at openjdk.org Thu Sep 12 10:20:15 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 12 Sep 2024 10:20:15 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13]
In-Reply-To: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>
References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com>

On Wed, 11 Sep 2024 17:38:57 GMT, Roman Kennke wrote:

> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Revert accidental change of UCOH default

src/hotspot/share/opto/lcm.cpp line 272:

> 270: const TypePtr* tptr;
> 271: if ((UseCompressedOops || UseCompressedClassPointers) &&
> 272: (CompressedOops::shift() == 0 || CompressedKlassPointers::shift() == 0)) {

Could you explain this change?
It seems like it may affect C2's implicit null check analysis even for `-XX:-UseCompactObjectHeaders`. In particular, for the following configurations, the changed condition evaluates to true before the change and false after it, regardless of whether `UseCompactObjectHeaders` is enabled: (!UseCompressedOops, UseCompressedClassPointers, CompressedKlassPointers::shift() != 0) ( UseCompressedOops, !UseCompressedClassPointers, CompressedOops::shift() != 0) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1756570168 From rcastanedalo at openjdk.org Thu Sep 12 11:49:15 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 12 Sep 2024 11:49:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: On Wed, 11 Sep 2024 17:38:57 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan: specifically, we may want to further reduce compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing the Klass* (or mark-word or size), so that it can be fetched from the forwardee if the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. It also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Revert accidental change of UCOH default src/hotspot/share/cds/filemap.cpp line 2457: > 2455: compressed_oops(), compressed_class_pointers()); > 2456: if (compressed_oops() != UseCompressedOops || compressed_class_pointers() != UseCompressedClassPointers) { > 2457: log_info(cds)("Unable to use shared archive.\nThe saved state of UseCompressedOops and UseCompressedClassPointers is " The promotion of this CDS log line from `info` to `warning` triggers false failures in the `test/hotspot/jtreg/compiler/intrinsics/bmi` tests when running them with `-XX:-UseCompressedClassPointers`. These tests expect the standard output of different JVM runs to be identical, but the timestamps in the log messages tend to differ. I suggest adjusting the test configuration so that log timestamps are simply omitted, as in [this patch](https://github.com/robcasloz/jdk/commit/48f6e90ef6e0a71b55df536ed04a8b72130b5ea9) (feel free to merge it as-is or with any further changes you may find necessary). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1756699774 From tschatzl at openjdk.org Thu Sep 12 12:57:05 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 12 Sep 2024 12:57:05 GMT Subject: RFR: 8314842: zgc/genzgc tests ignore vm flags In-Reply-To: <7p6dM8Eb8dQSgTA_pDvyOLs3Qu6YB4uJT9LxZbCiwcU=.2fc89050-0ebd-4a4f-8968-7bfb824c3dac@github.com> References: <7p6dM8Eb8dQSgTA_pDvyOLs3Qu6YB4uJT9LxZbCiwcU=.2fc89050-0ebd-4a4f-8968-7bfb824c3dac@github.com> Message-ID: On Thu, 12 Sep 2024 09:31:46 GMT, Stefan Karlsson wrote: > Change some ZGC tests to propagate requested vm flags. > > I tested this by manually passing in various flags to jtreg. I also ran tier1-5 on Linux. lgtm ------------- Marked as reviewed by tschatzl (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20963#pullrequestreview-2300218267 From rkennke at openjdk.org Thu Sep 12 13:16:14 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 12 Sep 2024 13:16:14 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:34:28 GMT, Roman Kennke wrote: >> @rkennke Can you please explain the changes in these tests: >> >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java >> test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java >> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java >> test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java >> >> You added these IR rule restrictions: >> `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` >> >> This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. >> >> I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? >> >> Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. >> >> Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well).
> >> @rkennke Can you please explain the changes in these tests: >> >> ``` >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java >> test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java >> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java >> test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java >> ``` >> >> You added these IR rule restrictions: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` >> >> This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. >> >> I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? >> >> Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. >> >> Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well). > > IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, when compressed class pointers are disabled with -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the array length, but long arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might not even vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong.
It would be nice to properly fix those tests, or to make corresponding tests for compact headers instead, or to improve vectorization to better deal with the offset mismatch, if necessary/possible. > > I will re-evaluate those tests, and add comments or remove the restrictions. > > > @rkennke Can you please explain the changes in these tests: > > > ``` > > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java > > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java > > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java > > > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java > > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java > > > ``` > > > You added these IR rule restrictions: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` > > > This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. > > > I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? > > > Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. > > > Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well). > > > > > > IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, when compressed class pointers are disabled with -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively).
That is because with compact headers, we are using the 4 bytes after the array length, but long arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might not even vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or to make corresponding tests for compact headers instead, or to improve vectorization to better deal with the offset mismatch, if necessary/possible. > > I will re-evaluate those tests, and add comments or remove the restrictions. > > If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ;) > > My suggestion is this: go over the examples and check which ones are now ok. For those that are not ok, add a comment and file a bug: I can then analyze those cases later, and see how to write other tests or improve auto-vectorization. Indeed, I could re-enable all tests in: test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java but unfortunately not these others: test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java I think the issue with both of them is that vectorization in those scenarios only works when the operations inside the loop start at an array index that addresses an element at an 8-byte-aligned offset. I have filed https://bugs.openjdk.org/browse/JDK-8340010 to track it.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2346250313 From epeter at openjdk.org Thu Sep 12 13:23:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Sep 2024 13:23:14 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 13:13:01 GMT, Roman Kennke wrote: > > > > @rkennke Can you please explain the changes in these tests: > > > > ``` > > > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java > > > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java > > > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java > > > > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java > > > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java > > > > ``` > > > > You added these IR rule restrictions: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` > > > > This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. > > > > I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? > > > > Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. > > > > Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well).
> > > IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, when compressed class pointers are disabled with -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the array length, but long arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might not even vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or to make corresponding tests for compact headers instead, or to improve vectorization to better deal with the offset mismatch, if necessary/possible. > > > I will re-evaluate those tests, and add comments or remove the restrictions. > > If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ;) > > My suggestion is this: go over the examples and check which ones are now ok. For those that are not ok, add a comment and file a bug: I can then analyze those cases later, and see how to write other tests or improve auto-vectorization.
> > Indeed, I could re-enable all tests in: > > ``` > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java > ``` > > but unfortunately not these others: > > ``` > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java > ``` > > I think the issue with both of them is that vectorization in those scenarios only works when the operations inside the loop start at an array index that addresses an element at an 8-byte-aligned offset. > > I have filed https://bugs.openjdk.org/browse/JDK-8340010 to track it. Excellent, that is what I hoped for! Thanks for filing the bug; I'll look into it once this is integrated. You should probably mark it as "blocked by", not "related to" ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2346266568 From stuefe at openjdk.org Thu Sep 12 15:41:22 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 12 Sep 2024 15:41:22 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: On Thu, 12 Sep 2024 10:17:47 GMT, Roberto Castañeda Lozano wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert accidental change of UCOH default > > src/hotspot/share/opto/lcm.cpp line 272: > >> 270: const TypePtr* tptr; >> 271: if ((UseCompressedOops || UseCompressedClassPointers) && >> 272: (CompressedOops::shift() == 0 || CompressedKlassPointers::shift() == 0)) { > > Could you explain this change? It seems like it may affect C2's implicit null check analysis even for `-XX:-UseCompactObjectHeaders`.
In particular, for the following configurations, the changed condition evaluates to true before the change and false after it, regardless of whether `UseCompactObjectHeaders` is enabled: > > (!UseCompressedOops, UseCompressedClassPointers, CompressedKlassPointers::shift() != 0) > ( UseCompressedOops, !UseCompressedClassPointers, CompressedOops::shift() != 0) Hi @robcasloz The `CompressedKlassPointers` utility class is not usable anymore with `-UseCompressedClassPointers`. One change is that if `UseCompressedClassPointers` is off, `CompressedKlassPointers` stays uninitialized. And that makes more sense than relying on the static initialization values of `CompressedOops::_shift`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1757126946 From stuefe at openjdk.org Thu Sep 12 15:46:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 12 Sep 2024 15:46:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: Message-ID: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com> On Wed, 11 Sep 2024 14:47:07 GMT, Roberto Castañeda Lozano wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert accidental change of UCOH default > > src/hotspot/share/opto/machnode.cpp line 390: > >> 388: t = t->make_ptr(); >> 389: } >> 390: if (t->isa_narrowklass() && CompressedKlassPointers::shift() == 0) { > > Does this change have any effect? `UseCompressedClassPointers` should be implied by `t->isa_narrowklass()`. I don't remember if this change was a reaction to an error or if I just guarded `CompressedKlassPointers::shift()` with +UseCCP because that is the prerequisite now. Probably the latter. I can remove this. Then I should probably assert +UseCCP.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1757135035 From stuefe at openjdk.org Thu Sep 12 16:08:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 12 Sep 2024 16:08:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 15:58:29 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Try to avoid lea in loadNklass (aarch64) >> - Fix release build error > > src/hotspot/cpu/aarch64/compressedKlass_aarch64.cpp line 147: > >> 145: #endif >> 146: >> 147: return true; > > This should only be in the compressedKlass.cpp file. Okay. I will remove the whole `CompressedKlassPointers::pd_initialize` logic. We only need it for one architecture (aarch64) and one case (+UseCCP -UseCOH), so it's probably not worth fanning out across all platforms, including Zero. Instead, I will add a short `ifdef` section to `CompressedKlassPointers::initialize`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1757169570 From coleenp at openjdk.org Thu Sep 12 17:37:15 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 12 Sep 2024 17:37:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 16:04:45 GMT, Thomas Stuefe wrote: >> src/hotspot/cpu/aarch64/compressedKlass_aarch64.cpp line 147: >> >>> 145: #endif >>> 146: >>> 147: return true; >> >> This should only be in the compressedKlass.cpp file. > > Okay. I will remove the whole `CompressedKlassPointers::pd_initialize` logic. We only need it for one architecture (aarch64) and one case (+UseCCP -UseCOH), so it's probably not worth fanning out across all platforms, including Zero. Instead, I will add a short `ifdef` section to `CompressedKlassPointers::initialize`.
Yes, looking at this further, it does seem like a small amount of conditional compilation that sets all the same values that are set in the architecture-independent version. It seems best to move it there. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1757300544 From kdnilsen at openjdk.org Thu Sep 12 20:29:39 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 12 Sep 2024 20:29:39 GMT Subject: RFR: 8339960: Shenandoah: Fix inconsistencies in generational Shenandoah behaviors Message-ID: This fixes some bugs found in recent code review and playback of an assertion failure. See also https://github.com/openjdk/shenandoah/pull/497 ------------- Commit messages: - Use -1 for rightmost interval when range is empty - Check available rather than capacity before logging shortfall - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Revert "Make GC logging less verbose" - Make GC logging less verbose - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - ...
and 13 more: https://git.openjdk.org/jdk/compare/81ff91ef...f1ba63f4 Changes: https://git.openjdk.org/jdk/pull/20974/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20974&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339960 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20974.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20974/head:pull/20974 PR: https://git.openjdk.org/jdk/pull/20974 From lmesnik at openjdk.org Fri Sep 13 01:08:06 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 13 Sep 2024 01:08:06 GMT Subject: RFR: 8314842: zgc/genzgc tests ignore vm flags In-Reply-To: <7p6dM8Eb8dQSgTA_pDvyOLs3Qu6YB4uJT9LxZbCiwcU=.2fc89050-0ebd-4a4f-8968-7bfb824c3dac@github.com> References: <7p6dM8Eb8dQSgTA_pDvyOLs3Qu6YB4uJT9LxZbCiwcU=.2fc89050-0ebd-4a4f-8968-7bfb824c3dac@github.com> Message-ID: On Thu, 12 Sep 2024 09:31:46 GMT, Stefan Karlsson wrote: > Change some ZGC tests to propagate requested vm flags. > > I tested this by manually passing in various flags to jtreg. I also ran tier1-5 on Linux. Marked as reviewed by lmesnik (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20963#pullrequestreview-2301742629 From stefank at openjdk.org Fri Sep 13 05:50:12 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Sep 2024 05:50:12 GMT Subject: RFR: 8314842: zgc/genzgc tests ignore vm flags In-Reply-To: <7p6dM8Eb8dQSgTA_pDvyOLs3Qu6YB4uJT9LxZbCiwcU=.2fc89050-0ebd-4a4f-8968-7bfb824c3dac@github.com> References: <7p6dM8Eb8dQSgTA_pDvyOLs3Qu6YB4uJT9LxZbCiwcU=.2fc89050-0ebd-4a4f-8968-7bfb824c3dac@github.com> Message-ID: <7SojoPqqF1B9dH2Dw7FSmdiGxx_eLJ8-pFksb2TP4k8=.b64e2636-6d9e-4581-8395-422e4351e4a4@github.com> On Thu, 12 Sep 2024 09:31:46 GMT, Stefan Karlsson wrote: > Change some ZGC tests to propagate requested vm flags. > > I tested this by manually passing in various flags to jtreg. I also ran tier1-5 on Linux. Thanks for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20963#issuecomment-2348080550 From stefank at openjdk.org Fri Sep 13 05:50:12 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Sep 2024 05:50:12 GMT Subject: Integrated: 8314842: zgc/genzgc tests ignore vm flags In-Reply-To: <7p6dM8Eb8dQSgTA_pDvyOLs3Qu6YB4uJT9LxZbCiwcU=.2fc89050-0ebd-4a4f-8968-7bfb824c3dac@github.com> References: <7p6dM8Eb8dQSgTA_pDvyOLs3Qu6YB4uJT9LxZbCiwcU=.2fc89050-0ebd-4a4f-8968-7bfb824c3dac@github.com> Message-ID: <5s0GWk65p6QVVc6yXL8F8HCnDp3E1Cs0w6_29EcqUaQ=.01b6f839-312a-439a-9900-43a8908c74e2@github.com> On Thu, 12 Sep 2024 09:31:46 GMT, Stefan Karlsson wrote: > Change some ZGC tests to propagate requested vm flags. > > I tested this by manually passing in various flags to jtreg. I also ran tier1-5 on Linux. This pull request has now been integrated. Changeset: ae75ca05 Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/ae75ca05e450da577e712eb7ed9dd9203616b80b Stats: 6 lines in 6 files changed: 0 ins; 0 del; 6 mod 8314842: zgc/genzgc tests ignore vm flags Reviewed-by: tschatzl, lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/20963 From rcastanedalo at openjdk.org Fri Sep 13 06:46:15 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 13 Sep 2024 06:46:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com> References: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com> Message-ID: On Thu, 12 Sep 2024 15:42:59 GMT, Thomas Stuefe wrote: >> src/hotspot/share/opto/machnode.cpp line 390: >> >>> 388: t = t->make_ptr(); >> >>> 389: } >> >>> 390: if (t->isa_narrowklass() && CompressedKlassPointers::shift() == 0) { >> >> Does this change have any effect? `UseCompressedClassPointers` should be implied by `t->isa_narrowklass()`. > > I don't remember if this change was a reaction to an error or if I just guarded `CompressedKlassPointers::shift()` with +UseCCP because that is the prerequisite now. Probably the latter. I can remove this. Then I should probably assert +UseCCP.
> > I don't remember if this change was a reaction to an error or if I just guarded `CompressedKlassPointers::shift()` with +UseCCP because that is the prerequisite now. Probably the latter. I can remove this. Probably should assert then for +UseCCP. I see, thanks. In that case, I would suggest removing the explicit `UseCompressedClassPointers` test, since it should be implied by `t->isa_narrowklass()`. `check_init()` within `CompressedKlassPointers::shift()` would already fail for the unexpected case where `t->isa_narrowklass() && !UseCompressedClassPointers`, no? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758270661 From rcastanedalo at openjdk.org Fri Sep 13 07:49:15 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 13 Sep 2024 07:49:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: On Thu, 12 Sep 2024 15:38:18 GMT, Thomas Stuefe wrote: >> src/hotspot/share/opto/lcm.cpp line 272: >> >>> 270: const TypePtr* tptr; >>> 271: if ((UseCompressedOops || UseCompressedClassPointers) && >>> 272: (CompressedOops::shift() == 0 || CompressedKlassPointers::shift() == 0)) { >> >> Could you explain this change? It seems like it may affect C2's implicit null check analysis even for `-XX:-UseCompactObjectHeaders`. In particular, for the following configurations, the changed condition evaluates to true before the change and false after it, regardless of whether `UseCompactObjectHeaders` is enabled: >> >> (!UseCompressedOops, UseCompressedClassPointers, CompressedKlassPointers::shift() != 0) >> ( UseCompressedOops, !UseCompressedClassPointers, CompressedOops::shift() != 0) > > Hi @robcasloz > > The `CompressedKlassPointers` utility class is not usable anymore with `-UseCompressedClassPointers`. 
One change is that if `UseCompressedClassPointers` is off, `CompressedKlassPointers` stays uninitialized. And that makes more sense than relying on the static initialization values of `CompressedOops::_shift`. Thanks for the explanation. I wonder whether the test is necessary at all, or whether one could simply use `base->get_ptr_type()` unconditionally, which defaults to `base->bottom_type()->isa_ptr()` anyway for non-compressed pointers. But this simplification would in any case be out of the scope of this changeset. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758356268 From rcastanedalo at openjdk.org Fri Sep 13 07:57:15 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 13 Sep 2024 07:57:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: On Thu, 12 Sep 2024 11:46:35 GMT, Roberto Castañeda Lozano wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert accidental change of UCOH default > > src/hotspot/share/cds/filemap.cpp line 2457: > >> 2455: compressed_oops(), compressed_class_pointers()); >> 2456: if (compressed_oops() != UseCompressedOops || compressed_class_pointers() != UseCompressedClassPointers) { >> 2457: log_info(cds)("Unable to use shared archive.\nThe saved state of UseCompressedOops and UseCompressedClassPointers is " > > The promotion of this CDS log line from `info` to `warning` triggers false failures in the `test/hotspot/jtreg/compiler/intrinsics/bmi` tests when running them with `-XX:-UseCompressedClassPointers`. These tests expect the standard output of different JVM runs to be identical, but the timestamps in the log messages tend to differ.
I suggest adjusting the test configuration so that log timestamps are simply omitted, as in [this patch](https://github.com/robcasloz/jdk/commit/48f6e90ef6e0a71b55df536ed04a8b72130b5ea9) (feel free to merge it as-is or with any further changes you may find necessary). This comment has been marked as "resolved" without any apparent action being taken; is that intentional? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758369787 From rkennke at openjdk.org Fri Sep 13 08:21:54 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 13 Sep 2024 08:21:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: Message-ID: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com> > This is the main body of JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that were previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then become on-by-default and diagnostic (?), and eventually be deprecated and obsoleted. However, there are a few unknowns in that plan: specifically, we may want to further reduce compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects.
In order to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing the Klass* (or mark-word or size), so that it can be fetched from the forwardee if the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. It also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Hide log timestamps in test to prevent false failures ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/9e008ac1..69f1ef1d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=12-13 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Fri Sep 13 08:21:55 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 13 Sep 2024 08:21:55 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: <99QfaesSJzBLGXsBKOdiSwjAdt18pwNMh62Pyhr-6bk=.b27f001b-e3e3-4826-9542-698eef2a9ee3@github.com> On Fri, 13 Sep 2024 07:54:30 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/share/cds/filemap.cpp line 2457: >> >>> 2455: compressed_oops(), compressed_class_pointers()); >>> 2456: if (compressed_oops() != UseCompressedOops || compressed_class_pointers() != UseCompressedClassPointers) { >>> 2457: log_info(cds)("Unable to use shared archive.\nThe saved state of UseCompressedOops and UseCompressedClassPointers is " >> >> The promotion of this CDS log line from `info` to `warning` triggers false failures in the `test/hotspot/jtreg/compiler/intrinsics/bmi` tests when running them with `-XX:-UseCompressedClassPointers`. These tests expect the standard output of different JVM runs to be identical, but the timestamps in the log messages tend to differ.
I suggest adjusting the test configuration so that log timestamps are simply omitted, as in [this patch](https://github.com/robcasloz/jdk/commit/48f6e90ef6e0a71b55df536ed04a8b72130b5ea9) (feel free to merge it as-is or with any further changes you may find necessary). > > This comment has been marked as "resolved" without any apparent action being taken; is that intentional? I have merged your patch locally but forgot to push it. Sorry. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758407575 From tschatzl at openjdk.org Fri Sep 13 08:34:09 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 13 Sep 2024 08:34:09 GMT Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate [v2] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 11:37:41 GMT, Matthias Baesken wrote: >> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational >> shows this error when running with ubsan enabled >> >> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero >> #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491 >> #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822 >> #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912 >> #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29 >> #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48 >> #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 >> #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Adjust division following suggestion by xmas Lgtm but see the additional comment.
src/hotspot/share/gc/z/zDirector.cpp line 490: > 488: > 489: // Calculate the GC cost for each reclaimed byte > 490: const double current_young_gc_time_per_bytes_freed = double(young_gc_time) / double(reclaimed_per_young_gc); Could this division have the same issue? ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20888#pullrequestreview-2302481070 PR Review Comment: https://git.openjdk.org/jdk/pull/20888#discussion_r1758431387 From aboldtch at openjdk.org Fri Sep 13 09:03:09 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 13 Sep 2024 09:03:09 GMT Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate [v2] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 08:31:43 GMT, Thomas Schatzl wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> Adjust division following suggestion by xmas > > src/hotspot/share/gc/z/zDirector.cpp line 490: > >> 488: >> 489: // Calculate the GC cost for each reclaimed byte >> 490: const double current_young_gc_time_per_bytes_freed = double(young_gc_time) / double(reclaimed_per_young_gc); > > Could this division have the same issue? Yes, it could if no memory has been reclaimed at all (since the VM started). Similar issues would occur in the call to `calculate_extra_young_gc_time` below. And there I think the problem is even worse, because we might end up with `inf - inf == -nan`. 
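To make the `inf - inf` concern concrete: in C++, dividing by zero is undefined behavior even for doubles (which is why UBSan reports it), and on IEEE-754 hardware it typically yields +inf for a positive numerator. Subtracting two such infinities gives NaN, and every ordered comparison involving NaN is false, so a heuristic built on it silently stops firing. The standalone sketch below models that without actually dividing by zero. It also shows one hypothetical guard in the spirit of "handle that case separately". The function names and the "return no estimate" policy are assumptions for illustration, not what zDirector.cpp does.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Model the typical IEEE-754 result of x / 0.0 (x > 0) directly, because
// performing the division itself would be undefined behavior in C++.
const double kInf = std::numeric_limits<double>::infinity();

// True if x behaves like NaN: every ordered comparison with NaN is false.
bool compares_as_nan(double x) {
  return !(x < 0.0) && !(x >= 0.0);
}

// Hypothetical guard: report "no estimate" when nothing has been reclaimed,
// instead of dividing by zero and feeding inf/NaN into later arithmetic.
bool gc_time_per_byte_freed(double gc_time, double reclaimed_bytes, double* out) {
  if (reclaimed_bytes <= 0.0) {
    return false;  // degenerate case: the caller skips this heuristic
  }
  *out = gc_time / reclaimed_bytes;
  return true;
}
```

Returning a flag instead of a sentinel value keeps the degenerate case visible at the call site, rather than letting an infinity propagate into later subtractions.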
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20888#discussion_r1758476392 From aboldtch at openjdk.org Fri Sep 13 09:19:04 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 13 Sep 2024 09:19:04 GMT Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate [v2] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 09:00:00 GMT, Axel Boldt-Christmas wrote: >> src/hotspot/share/gc/z/zDirector.cpp line 490: >> >>> 488: >>> 489: // Calculate the GC cost for each reclaimed byte >>> 490: const double current_young_gc_time_per_bytes_freed = double(young_gc_time) / double(reclaimed_per_young_gc); >> >> Could this division have the same issue? > > Yes, it could if no memory has been reclaimed at all (since the VM started). Similar issues would occur in the call to `calculate_extra_young_gc_time` below. And there I think the problem is even worse, because we might end up with `inf - inf == -nan`. The case where we have performed a major collection and no young collection has reclaimed any memory seems like a very degenerate situation. The solution is probably to handle that case separately, and not try to adapt the current heuristics to handle the extreme values. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20888#discussion_r1758502503 From stuefe at openjdk.org Fri Sep 13 09:30:24 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 13 Sep 2024 09:30:24 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 12:19:32 GMT, Coleen Phillimore wrote: >> This is tricky. We are already deep in initialization and have made a couple of decisions based on +UseCompressedClassPointers (e.g. CDS setup). I *think* we could still go with -UseCCP, but I wonder whether this is wise. >> >> Note that this error is not new. In the old code, we simply asserted.
That left us with UB in release builds; I simply made the error explicit in release builds too. > > Ok, in this case, that's fine if we already asserted. A fatal error is better. Actually, a lot of the old code had dusty side corners that were UB. Making narrowKlass smaller than 32 bits exposed a lot of them, and a lot of the changes in and around CompressedKlassPointers are about cleanly making explicit what before had been implicit or just broken (e.g. a clear distinction between encoding range and Klass range, and a clear handling of narrowKlass bit width as a runtime value). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758522844 From stuefe at openjdk.org Fri Sep 13 09:38:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 13 Sep 2024 09:38:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 12:13:58 GMT, Thomas Stuefe wrote: >> src/hotspot/share/oops/compressedKlass.cpp line 243: >> >>> 241: } else { >>> 242: >>> 243: // In legacy mode, we try, in order of preference: >> >> Can you not use the word 'legacy' here? Maybe in "non-compact object header mode"... > > okay. I removed all traces of "legacy" and "tiny", reverting to "standard" or "non-coh" vs "coh".
I would prefer to use the shorthand "coh" in some places since "compact object header mode" is a mouthful and gives me RSI :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758533732 From stefank at openjdk.org Fri Sep 13 09:44:19 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Sep 2024 09:44:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com> References: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com> Message-ID: On Fri, 13 Sep 2024 08:21:54 GMT, Roman Kennke wrote: >> This is the main body of JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan: specifically, we may want to further reduce compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we have made some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word.
Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Hide log timestamps in test to prevent false failures I went over the oops/ directory and added a few cleanup requests and comments. src/hotspot/share/oops/instanceOop.hpp line 43: > 41: } else { > 42: return sizeof(instanceOopDesc); > 43: } This entire function can be removed. It returns the same value as oopDesc::base_offset_in_bytes(), but in a slightly different way. src/hotspot/share/oops/markWord.hpp line 171: > 169: return mask_bits(value(), lock_mask_in_place | self_fwd_mask_in_place) >= static_cast(marked_value); > 170: } > 171: Suggestion to retain code layout.
Suggestion: src/hotspot/share/oops/markWord.inline.hpp line 29: > 27: > 28: #include "oops/markWord.hpp" > 29: #include "oops/compressedOops.inline.hpp" Suggestion: #include "oops/compressedOops.inline.hpp" #include "oops/markWord.hpp" src/hotspot/share/oops/objArrayKlass.cpp line 146: > 144: > 145: size_t ObjArrayKlass::oop_size(oop obj) const { > 146: // In this assert, we cannot safely access the Klass* with compact headers. I would like a comment stating that this assert is turned off because size_given_klass calls oop_size on an object that might be concurrently forwarded. src/hotspot/share/oops/oop.cpp line 158: > 156: // Only has a klass gap when compressed class pointers are used and not > 157: // using compact headers. > 158: return UseCompressedClassPointers && !UseCompactObjectHeaders; This comment can just be removed. src/hotspot/share/oops/oop.hpp line 340: > 338: // field offset. Use an offset halfway into the markWord, as the markWord is never > 339: // partially loaded from C2. > 340: return 4; I asked around to see how people felt about dropping references to mark_offset_in_bytes(), which we know is 0. There was a request to strive to use mark_offset_in_bytes() for clarity. Suggestion: return mark_offset_in_bytes() + 4; src/hotspot/share/oops/oop.hpp line 349: > 347: static int klass_gap_offset_in_bytes() { > 348: assert(has_klass_gap(), "only applicable to compressed klass pointers"); > 349: assert(!UseCompactObjectHeaders, "don't use klass_gap_offset_in_bytes() with compact headers"); This assert is implied by `has_klass_gap()`. I don't see the need to repeat it here. src/hotspot/share/oops/oop.hpp line 363: > 361: return sizeof(markWord) + sizeof(Klass*); > 362: } > 363: } Not a strong request for this PR, but there are many places that calculate almost the same thing, and it might be good to limit the number of places we do similar calculations.
I'm wondering if it wouldn't be better for readability to structure the code as follows:

  static int header_size_in_bytes() {
    if (UseCompactObjectHeaders) {
      return sizeof(markWord);
    } else if (UseCompressedClassPointers) {
      return sizeof(markWord) + sizeof(narrowKlass);
    } else {
      return sizeof(markWord) + sizeof(Klass*);
    }
  }

  // Size of object header, aligned to platform wordSize
  static int header_size() {
    return align_up(header_size_in_bytes(), HeapWordSize) / HeapWordSize;
  }
  ...
  static int base_offset_in_bytes() {
    return header_size_in_bytes();
  }

src/hotspot/share/oops/oop.inline.hpp line 161:

> 159:
> 160: void oopDesc::set_klass_gap(HeapWord* mem, int v) {
> 161:   assert(!UseCompactObjectHeaders, "don't set Klass* gap with compact headers");

We might want to consider just simplifying the function to:

  void oopDesc::set_klass_gap(HeapWord* mem, int v) {
    assert(has_klass_gap(), "precondition");
    *(int*)(((char*)mem) + klass_gap_offset_in_bytes()) = v;
  }

src/hotspot/share/oops/oop.inline.hpp line 295:

> 293: // Used by scavengers
> 294: void oopDesc::forward_to(oop p) {
> 295:   assert(cast_from_oop(p) != this,

Do we really need the cast here?
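Stefan Karlsson's suggested `header_size_in_bytes()` restructuring makes every other header quantity derive from one computation. The standalone sketch below mirrors that shape so the three configurations can be checked side by side. The flag struct and the size constants are stand-ins for a 64-bit VM, assumed for illustration rather than taken from HotSpot headers. The expected values match the PR description: the base offset becomes 8 with compact headers, instead of 12 or 16.

```cpp
#include <cassert>

// Stand-in sizes for a 64-bit VM (assumptions for this sketch).
constexpr int kMarkWordSize = 8;     // sizeof(markWord)
constexpr int kNarrowKlassSize = 4;  // sizeof(narrowKlass)
constexpr int kKlassPtrSize = 8;     // sizeof(Klass*)
constexpr int kHeapWordSize = 8;     // HeapWordSize

// Stand-ins for UseCompactObjectHeaders / UseCompressedClassPointers.
struct Flags {
  bool use_compact_object_headers;
  bool use_compressed_class_pointers;
};

constexpr int align_up(int value, int alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

// Single source of truth for the header size in bytes.
int header_size_in_bytes(Flags f) {
  if (f.use_compact_object_headers) {
    return kMarkWordSize;
  } else if (f.use_compressed_class_pointers) {
    return kMarkWordSize + kNarrowKlassSize;
  } else {
    return kMarkWordSize + kKlassPtrSize;
  }
}

// Header size in heap words, derived from the byte size.
int header_size(Flags f) {
  return align_up(header_size_in_bytes(f), kHeapWordSize) / kHeapWordSize;
}

// Instance fields start right after the (unaligned) header.
int base_offset_in_bytes(Flags f) {
  return header_size_in_bytes(f);
}
```

With compressed class pointers, the 12-byte header still rounds up to 2 heap words. The 4-byte difference between `base_offset_in_bytes()` and the word-aligned size corresponds to the klass gap discussed in the same review.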
------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2302542279 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758503206 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758482703 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758505713 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758479437 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758478106 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758472909 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758474349 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758528515 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758538380 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758540055 From stefank at openjdk.org Fri Sep 13 09:44:20 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Sep 2024 09:44:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Mon, 9 Sep 2024 12:17:17 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Try to avoid lea in loadNklass (aarch64) >> - Fix release build error > > src/hotspot/share/oops/oop.cpp line 230: > >> 228: // disjunct below to fail if the two comparands are computed across such >> 229: // a concurrent change. >> 230: return Universe::heap()->is_stw_gc_active() && klass->is_objArray_klass() && is_forwarded() && (UseParallelGC || UseG1GC); > > Is this still true after the recent changes like JDK-8311163? It might be worth waiting for. 
That bug doesn't fix all cases where the length field is modified. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758477168 From tschatzl at openjdk.org Fri Sep 13 11:15:15 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 13 Sep 2024 11:15:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Fri, 13 Sep 2024 09:00:32 GMT, Stefan Karlsson wrote: >> src/hotspot/share/oops/oop.cpp line 230: >> >>> 228: // disjunct below to fail if the two comparands are computed across such >>> 229: // a concurrent change. >>> 230: return Universe::heap()->is_stw_gc_active() && klass->is_objArray_klass() && is_forwarded() && (UseParallelGC || UseG1GC); >> >> Is this still true after the recent changes like JDK-8311163? It might be worth waiting for. > > That bug doesn't fix all cases where the length field is modified. Which ones are remaining? JDK-8337709 implemented the same change for G1 GC before JDK-8311163. The full collectors/g1 marking do not modify the length fields but have multiple separate queues which is a different issue. It will also be handled by the new `PartialArrayTaskStepper`, but should be of no concern here. If I am not missing some case, this whole method is unnecessary now.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758672296 From stuefe at openjdk.org Fri Sep 13 12:51:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 13 Sep 2024 12:51:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Tue, 10 Sep 2024 12:35:42 GMT, Thomas Stuefe wrote: >> src/hotspot/share/oops/compressedKlass.cpp line 116: >> >>> 114: _range = end - _base; >>> 115: >>> 116: DEBUG_ONLY(assert_is_valid_encoding(addr, len, _base, _shift);) >> >> Can you refactor so the aarch64 path runs this same code without duplication? > > In tinycp mode, aarch64 runs this code though? The aarch64 variant of pd_initialize just returns then. In non-COH mode (preexisting, not touched by this patch) aarch64 needs its own handling. I refactored it: now we should have no duplication (once my patch hits Roman's PR branch) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758800913 From stefank at openjdk.org Fri Sep 13 12:51:18 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Sep 2024 12:51:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Fri, 13 Sep 2024 11:10:58 GMT, Thomas Schatzl wrote: >> That bug doesn't fix all cases where the length field is modified. > > Which ones are remaining? JDK-8337709 implemented the same change for G1 GC before JDK-8311163. > > The full collectors/g1 marking do not modify the length fields but have multiple separate queues which is a different issue. It will also be handled by the new `PartialArrayTaskStepper`, but should be of no concern here.
> > If I am not missing some case, this whole method is unnecessary now. If you've already fixed this for GC then I agree that we could remove this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758805418 From stefank at openjdk.org Fri Sep 13 12:51:18 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Sep 2024 12:51:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Fri, 13 Sep 2024 12:47:09 GMT, Stefan Karlsson wrote: >> Which ones are remaining? JDK-8337709 implemented the same change for G1 GC before JDK-8311163. >> >> The full collectors/g1 marking do not modify the length fields but have multiple separate queues which is a different issue. It will also be handled by the new `PartialArrayTaskStepper`, but should be of no concern here. >> >> If I am not missing some case, this whole method is unnecessary now. > > If you've already fixed this for GC then I agree that we could remove this. This seems like something that should be done as a separate patch that gets pushed before this PR. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758808115 From rkennke at openjdk.org Fri Sep 13 12:56:17 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 13 Sep 2024 12:56:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com> Message-ID: On Fri, 13 Sep 2024 09:39:23 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Hide log timestamps in test to prevent false failures > > src/hotspot/share/oops/oop.inline.hpp line 295: > >> 293: // Used by scavengers >> 294: void oopDesc::forward_to(oop p) { >> 295: assert(cast_from_oop(p) != this, > > Do we really need the cast here? Yes, otherwise the compiler complains about an ambiguous != operator. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758815451 From rkennke at openjdk.org Fri Sep 13 13:03:16 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 13 Sep 2024 13:03:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com> Message-ID: On Fri, 13 Sep 2024 09:31:39 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Hide log timestamps in test to prevent false failures > > src/hotspot/share/oops/oop.hpp line 363: > >> 361: return sizeof(markWord) + sizeof(Klass*); >> 362: } >> 363: } > > Not a strong request for this PR, but there are many places that calculate almost the same thing, and it might be good to limit the number of places we do similar calculations.
> > I'm wondering if it wouldn't be better for readability to structure the code as follows: > > static int header_size_in_bytes() { > if (UseCompactObjectHeaders) { > return sizeof(markWord); > } else if (UseCompressedClassPointers) { > return sizeof(markWord) + sizeof(narrowKlass); > } else { > return sizeof(markWord) + sizeof(Klass*); > } > } > > // Size of object header, aligned to platform wordSize > static int header_size() { > return align_up(header_size_in_bytes(), HeapWordSize) / HeapWordSize; > } > ... > static int base_offset_in_bytes() { > return header_size_in_bytes(); > } Ok. I filed https://bugs.openjdk.org/browse/JDK-8340118 for now; let's see if I can sort this out before integrating this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758825458 From rkennke at openjdk.org Fri Sep 13 13:11:45 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 13 Sep 2024 13:11:45 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: > This is the main body of JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it.
However, there are a few unknowns in that plan: specifically, we may want to further reduce compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we have made some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
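The PR text says that the compressed Klass* lives in the upper 22 bits of the mark-word and that self-forwarding is signalled by a dedicated bit. The sketch below makes that concrete under an assumed layout that the PR text does not spell out: two lock bits at the bottom, the self-forwarded bit directly above them, and the 22-bit narrowKlass at shift 42. Under those assumptions, it shows why a single `>= marked_value` comparison on the masked low bits (as in the `markWord.hpp` snippet quoted in this thread) detects both ordinary forwarding and self-forwarding, while the Klass bits survive untouched. All constants are hypothetical, not copied from HotSpot.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout for this sketch (hypothetical constants):
//   bits 0-1  : lock bits (0b11 == "marked", i.e. forwarded)
//   bit  2    : self-forwarded bit
//   bits 42-63: 22-bit narrowKlass
constexpr uint64_t kLockMaskInPlace = 0x3;
constexpr uint64_t kMarkedValue = 0x3;
constexpr uint64_t kSelfFwdMaskInPlace = 0x4;
constexpr unsigned kKlassShift = 42;
constexpr uint64_t kKlassMask = (uint64_t(1) << 22) - 1;

uint32_t narrow_klass(uint64_t mark) {
  return static_cast<uint32_t>((mark >> kKlassShift) & kKlassMask);
}

// Masking to the low three bits yields 0..7. The values >= 3 are exactly
// 0b011 (lock bits "marked") and 0b1xx (self-forwarded bit set).
bool is_forwarded(uint64_t mark) {
  return (mark & (kLockMaskInPlace | kSelfFwdMaskInPlace)) >= kMarkedValue;
}

// Self-forwarding sets only the flag bit, so the Klass bits stay readable.
uint64_t set_self_forwarded(uint64_t mark) {
  return mark | kSelfFwdMaskInPlace;
}
```

The comparison trick relies on the self-forwarded bit sitting directly above the lock bits. If it were placed elsewhere, a plain mask-and-compare would no longer cover both cases.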
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Various touch-ups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/69f1ef1d..990926f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=13-14 Stats: 25 lines in 8 files changed: 3 ins; 17 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From stefank at openjdk.org Fri Sep 13 13:18:16 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Sep 2024 13:18:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com> Message-ID: On Fri, 13 Sep 2024 12:53:29 GMT, Roman Kennke wrote: >> src/hotspot/share/oops/oop.inline.hpp line 295: >> >>> 293: // Used by scavengers >>> 294: void oopDesc::forward_to(oop p) { >>> 295: assert(cast_from_oop(p) != this, >> >> Do we really need the cast here? > > Yes, otherwise the compiler complains about an ambiguous != operator. OK, we shouldn't need to. It seems like I can silence the compiler by tweaking oopsHierarchy.hpp. I'll deal with that as a follow-up.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758853099 From tschatzl at openjdk.org Fri Sep 13 13:51:16 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 13 Sep 2024 13:51:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Fri, 13 Sep 2024 12:48:53 GMT, Stefan Karlsson wrote: >> If you've already fixed this for GC then I agree that we could remove this. > > This seems like something that should be done as a separate patch that gets pushed before this PR. Will remove in JDK-8340119. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758906485 From kvn at openjdk.org Fri Sep 13 22:12:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 13 Sep 2024 22:12:13 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: <_7xTwicd2PDxJZOJ7xLdkHZc18UT-9sVZk-3YFgkMA0=.7db81e94-4f38-44a0-9983-6e391459aab2@github.com> References: <_7xTwicd2PDxJZOJ7xLdkHZc18UT-9sVZk-3YFgkMA0=.7db81e94-4f38-44a0-9983-6e391459aab2@github.com> Message-ID: <-lhMoCYQAGXWEAQ2ySemYzUh_DjKgqi4pG10NdrHils=.b2bc294a-941d-42aa-a00f-149d9260dfeb@github.com> On Mon, 9 Sep 2024 14:41:25 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 112: >> >>> 110: // The answer is that stores of different sizes can co-exist >>> 111: // in the same sequence of RawMem effects. We sometimes initialize >>> 112: // a whole 'tile' of array elements with a single jint or jlong.) >> >> I'm having trouble making sense of this comment. I guess a jlong could be used to null-initialize two >> 32bit oops/narrowOops? But that doesn't have anything to do with jints.
> > I am not sure the complex overlap test is necessary here: this code was copy-pasted from [MemNode::find_previous_store()](https://github.com/openjdk/jdk/blob/c54fc08aa3c63e4b26dc5edb2436844dfd3bab7c/src/hotspot/share/opto/memnode.cpp#L678) by [JDK-8057737](https://bugs.openjdk.org/browse/JDK-8057737), and in this new context I do not see how we might find stores of different sizes as mentioned in the comment. jlongs could be used to null-initialize two 32-bit OOPs, but such initializing stores are not even visible in C2's intermediate representation at the time `G1BarrierSetC2::g1_can_remove_pre_barrier()` is called. The fact that the comment refers to initializing several array elements with a single jint suggests to me that this code has lost some of its original purpose after being copied into a narrower context (OOP stores after object allocations). But since this code is pre-existing and in the worst case it is just performing some unnecessary work, I suggest leaving it as-is and possibly investigating how to simplify it as a follow-up task. Yes, the comment refers to combined initialization stores: [memnode.cpp#L4925](https://github.com/openjdk/jdk/blob/c54fc08aa3c63e4b26dc5edb2436844dfd3bab7c/src/hotspot/share/opto/memnode.cpp#L4925), which are used only for constant stores of primitive types (integers and floats). There was also a recent change by Emanuel to combine stores into primitive arrays: [JDK-8335390](https://bugs.openjdk.org/browse/JDK-8335390). None of the above affects oop stores. I agree that this code could be left as-is for now and optimized later.
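The "overlap test" in this exchange boils down to whether two stores' byte ranges intersect. That check is what lets a single wide store (for example, a jlong zeroing two adjacent 32-bit slots) be recognized as covering a narrower store at a different offset. The sketch below is a generic half-open interval check for illustration only. The `Store` type is hypothetical, and this is not the code in `G1BarrierSetC2` or `MemNode::find_previous_store()`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a store: byte offset within the object and width in bytes.
struct Store {
  int64_t offset;
  int64_t size;
};

// The half-open ranges [a.offset, a.offset + a.size) and [b.offset, b.offset + b.size)
// intersect iff each one starts before the other one ends.
bool stores_overlap(Store a, Store b) {
  return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

If all relevant stores have the same size and alignment, as with the OOP stores after allocation discussed above, this reduces to an offset equality check. That is consistent with the suspicion that the general test is more than this context needs.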
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759565105 From wkemper at openjdk.org Fri Sep 13 22:57:17 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 13 Sep 2024 22:57:17 GMT Subject: RFR: 8339960: Shenandoah: Fix inconsistencies in generational Shenandoah behaviors In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 20:23:36 GMT, Kelvin Nilsen wrote: > This fixes some bugs found in recent code review and playback of an assertion failure. > > See also https://github.com/openjdk/shenandoah/pull/497 Looks good to me. ------------- Marked as reviewed by wkemper (Committer). PR Review: https://git.openjdk.org/jdk/pull/20974#pullrequestreview-2304276183 From kvn at openjdk.org Fri Sep 13 23:23:19 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 13 Sep 2024 23:23:19 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: References: Message-ID: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> On Wed, 11 Sep 2024 08:30:02 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
> > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Fix a few style issues src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 241: > 239: assert(newval_bottom->isa_ptr() || newval_bottom->isa_narrowoop(), "newval should be an OOP"); > 240: TypePtr::PTR newval_type = newval_bottom->make_ptr()->ptr(); > 241: uint8_t barrier_data = store->barrier_data(); Should you check the barrier data for 0? `is_ptr()` covers a wide set of types, including TypeRawPtr, TypeKlassPtr, and TypeMetadataPtr. Are you filtering them out? src/hotspot/share/gc/g1/g1BarrierSet.cpp line 65: > 63: #else > 64: make_barrier_set_c2(), > 65: #endif I assume this is temporary until all ports are ready (except maybe 32-bit x86). Right? src/hotspot/share/opto/matcher.cpp line 1821: > 1819: if( rule >= _END_INST_CHAIN_RULE || rule < _BEGIN_INST_CHAIN_RULE ) { > 1820: assert(C->node_arena()->contains(s->_leaf) || !has_new_node(s->_leaf), > 1821: "duplicating node that's already been matched"); Why was it removed? src/hotspot/share/opto/matcher.cpp line 2845: > 2843: n->Opcode() == Op_StoreN && > 2844: m->is_EncodeP(); > 2845: } Add a comment that `m` is an input of `n`. I thought about adding an assert too, but I will leave that to you. src/hotspot/share/opto/output.cpp line 2026: > 2024: if (n->is_MachNullCheck()) { > 2025: assert(n->in(1)->as_Mach()->barrier_data() == 0, > 2026: "Implicit null checks on memory accesses with barriers are not yet supported"); I don't see any changes in `lcm.cpp` here that would prevent it.
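kvn's first question concerns how the store's barrier data is refined from the static type of the new value. Below is a minimal sketch of that kind of refinement. It assumes a three-valued nullness and uses hypothetical flag names (`kPreBarrier`, `kPostBarrier`, `kPostNotNull`) rather than the actual constants in `g1BarrierSetC2.hpp`. If the new value is known to be null, the post-barrier can be dropped. If it is known to be non-null, the post-barrier's runtime null check can be skipped. Returning early on zero barrier data, as kvn suggests, makes the function a no-op for stores that carry no barriers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical barrier-data flags (illustrative names, not HotSpot's).
constexpr uint8_t kPreBarrier  = 1 << 0;  // SATB pre-barrier needed
constexpr uint8_t kPostBarrier = 1 << 1;  // card-marking post-barrier needed
constexpr uint8_t kPostNotNull = 1 << 2;  // post-barrier may skip its null check

// Simplified static nullness of the value being stored.
enum class Nullness { Null, NotNull, Unknown };

uint8_t refine_by_new_val(uint8_t barrier_data, Nullness new_val) {
  if (barrier_data == 0) {
    return 0;  // store without barriers: nothing to refine
  }
  switch (new_val) {
    case Nullness::Null:
      // Storing null cannot create a cross-region pointer: drop the post-barrier.
      return static_cast<uint8_t>(barrier_data & ~(kPostBarrier | kPostNotNull));
    case Nullness::NotNull:
      // Known non-null: keep the post-barrier but elide its null check.
      return (barrier_data & kPostBarrier)
                 ? static_cast<uint8_t>(barrier_data | kPostNotNull)
                 : barrier_data;
    case Nullness::Unknown:
    default:
      return barrier_data;
  }
}
```

The pre-barrier is never touched here, because it depends on the old value in memory, not on the new one.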
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759604325 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759604944 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759593453 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759593131 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759605704 From stuefe at openjdk.org Sun Sep 15 06:17:14 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 15 Sep 2024 06:17:14 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: On Wed, 11 Sep 2024 21:15:21 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert accidental change of UCOH default > > I was starting to understand the concerns with having prototype_header in Klass. It seems like it would simplify encoding the klass for object allocation. My recent change https://bugs.openjdk.org/browse/JDK-8338526 breaks this. You need to pass a parameter to Klass() to tell whether to encode the klass pointer or not, and pass this to Klass() constructor. 
> 
> diff --git a/src/hotspot/share/oops/instanceKlass.cpp b/src/hotspot/share/oops/instanceKlass.cpp
> index fd198f54fc9..7aa4bd24948 100644
> --- a/src/hotspot/share/oops/instanceKlass.cpp
> +++ b/src/hotspot/share/oops/instanceKlass.cpp
> @@ -511,7 +511,7 @@ InstanceKlass::InstanceKlass() {
>  }
>  
>  InstanceKlass::InstanceKlass(const ClassFileParser& parser, KlassKind kind, ReferenceType reference_type) :
> -  Klass(kind),
> +  Klass(kind, (!parser.is_interface() && !parser.is_abstract())),
>    _nest_members(nullptr),
>    _nest_host(nullptr),
>    _permitted_subclasses(nullptr),

@coleenp
> I was starting to understand the concerns with having prototype_header in Klass. It seems like it would simplify encoding the klass for object allocation. My recent change https://bugs.openjdk.org/browse/JDK-8338526 breaks this. You need to pass a parameter to Klass() to tell whether to encode the klass pointer or not, and pass this to Klass() constructor.
> 

I solved this differently (Roman will merge this into his PR).

static markWord make_prototype(const Klass* kls) {
  markWord prototype = markWord::prototype();
#ifdef _LP64
  if (UseCompactObjectHeaders) {
    // With compact object headers, the narrow Klass ID is part of the mark word.
    // We therefore seed the mark word with the narrow Klass ID.
    // Note that only those Klass that can be instantiated have a narrow Klass ID.
    // For those who don't, we leave the klass bits empty and assert if someone
    // tries to use those.
    const narrowKlass nk = CompressedKlassPointers::is_encodable(kls) ?
        CompressedKlassPointers::encode(const_cast<Klass*>(kls)) : 0;
    prototype = prototype.set_narrow_klass(nk);
  }
#endif
  return prototype;
}

inline bool CompressedKlassPointers::is_encodable(const void* address) {
  check_init(_base);
  // An address can only be encoded if:
  //
  // 1) the address lies within the klass range.
  // 2) It is suitably aligned to 2^encoding_shift. This only really matters for
  //    +UseCompactObjectHeaders, since the encoding shift can be large (max 10 bits -> 1KB).
  return is_aligned(address, klass_alignment_in_bytes()) &&
         address >= _klass_range_start && address < _klass_range_end;
}

So, we put an nKlass into the prototype if we can. We can if the Klass address is encodable. It is encodable if it lives in the encoded Klass range and is correctly aligned. No need to pass this information via another channel: it's right there, in the Klass address. This works even before Klass is initialized. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2351399143 From stuefe at openjdk.org Sun Sep 15 06:17:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 15 Sep 2024 06:17:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 11:25:41 GMT, Johan Sjölen wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/metaspace/metablock.hpp line 51: > >> 49: size_t word_size() const { return _word_size; } >> 50: bool is_empty() const { return _base == nullptr; } >> 51: bool is_nonempty() const { return _base != nullptr; } > > Can `_base == nullptr` but `_word_size != 0`?
No ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1759973362 From rcastanedalo at openjdk.org Mon Sep 16 06:56:18 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 16 Sep 2024 06:56:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> On Fri, 13 Sep 2024 13:11:45 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word.
>> This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Various touch-ups src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2576: > 2574: } else { > 2575: lea(dst, Address(obj, index, Address::lsl(scale))); > 2576: ldr(dst, Address(dst, offset)); Do you have a reproducer (or, better yet, a test case) that exercises this case? I ran Oracle's internal CI tiers 1-5 and could never hit it. Could this happen for x64 as well?
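The `else` branch Roberto asks about splits a base + scaled index + displacement address into two instructions, because AArch64 has no single load addressing mode that combines a register index with an immediate offset. Below is a minimal sketch of the address arithmetic only. Plain integers stand in for registers, and the function names are hypothetical. It shows that the `lea` + `ldr` pair computes the same effective address as the one-step form would.

```cpp
#include <cassert>
#include <cstdint>

// Effective address obj + (index << scale) + offset, computed in one step.
uint64_t effective_address(uint64_t obj, uint64_t index, unsigned scale, int64_t offset) {
  return obj + (index << scale) + static_cast<uint64_t>(offset);
}

// The same address as the two-instruction sequence computes it:
// first materialize obj + (index << scale) in a temporary (the 'lea'),
// then let the load apply the immediate displacement (the 'ldr').
uint64_t two_step_address(uint64_t obj, uint64_t index, unsigned scale, int64_t offset) {
  uint64_t tmp = obj + (index << scale);       // lea(dst, Address(obj, index, lsl(scale)))
  return tmp + static_cast<uint64_t>(offset);  // ldr(dst, Address(dst, offset))
}
```

Negative displacements work because unsigned arithmetic wraps modulo 2^64, just as the hardware address computation does.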
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1760617744 From rcastanedalo at openjdk.org Mon Sep 16 08:07:18 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 16 Sep 2024 08:07:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 13:11:45 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Various touch-ups > * Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that.
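The mark-word extraction described in the quoted note can be sketched as plain bit arithmetic. The PR description says the narrow Klass* occupies the upper 22 bits of the mark word, so this sketch assumes a 64-bit mark word with a klass shift of 64 - 22 = 42. Decoding uses a hypothetical base and shift; the real values are chosen by `CompressedKlassPointers` at startup. This is not HotSpot's `markWord` API. It only shows what seeding the prototype with a narrow Klass ID and the compact-header `LoadNKlass` expansion would each have to compute.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the narrow Klass ID occupies the upper 22 bits of the 64-bit mark word.
constexpr unsigned kNarrowKlassBits = 22;
constexpr unsigned kKlassShift = 64 - kNarrowKlassBits;  // 42
constexpr uint64_t kLowBitsMask = (uint64_t(1) << kKlassShift) - 1;

// Seed a mark word with a narrow Klass ID, keeping the low (lock/age/hash) bits.
uint64_t set_narrow_klass(uint64_t mark, uint32_t nk) {
  assert(nk < (uint32_t(1) << kNarrowKlassBits) && "narrow klass must fit in 22 bits");
  return (mark & kLowBitsMask) | (uint64_t(nk) << kKlassShift);
}

// Compact-header klass load: after loading the mark word, shift the ID down.
uint32_t narrow_klass_from_mark(uint64_t mark) {
  return static_cast<uint32_t>(mark >> kKlassShift);
}

// Decoding with a hypothetical encoding base and shift: Klass* = base + (nk << shift).
uint64_t decode_klass(uint32_t nk, uint64_t base, unsigned shift) {
  return base + (uint64_t(nk) << shift);
}
```

Because the ID sits in the topmost bits, extraction is a single logical right shift with no masking, which is what makes it cheap to fold into a `LoadNKlass` expansion.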
An alternative that seems promising is to hide the object header klass pointer extraction and make it part of the `LoadNKlass` node semantics, as illustrated in this example: ![alternative-modeling](https://github.com/user-attachments/assets/06243966-3065-4969-a2dd-d05133b36366) `LoadNKlass` nodes can then be expanded into more primitive operations (load and shift for compact headers, load with `klass_offset_in_bytes()` for original headers) within C2's back-end or even during code emission as sketched [here](https://github.com/robcasloz/jdk/commit/6cb4219f101e3be982264071c2cb1d0af1c6d754). @rkennke is this similar to what you tried out ("Expanding it as a macro")? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2352253326 From lucy at openjdk.org Mon Sep 16 08:18:07 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 16 Sep 2024 08:18:07 GMT Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate [v2] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 11:37:41 GMT, Matthias Baesken wrote: >> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational >> shows this error when running with ubsan enabled >> >> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero >> #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491 >> #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822 >> #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912 >> #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29 >> #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48 >> #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 >> #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 > > Matthias Baesken has updated the pull request incrementally with one additional commit since the 
last revision: > > Adjust division following suggestion by xmas Changes requested by lucy (Reviewer). src/hotspot/share/gc/z/zDirector.cpp line 491: > 489: // Calculate the GC cost for each reclaimed byte > 490: const double current_young_gc_time_per_bytes_freed = double(young_gc_time) / double(reclaimed_per_young_gc); > 491: const double current_old_gc_time_per_bytes_freed = reclaimed_per_old_gc == 0 ? std::numeric_limits<double>::infinity() : double(old_gc_time) / double(reclaimed_per_old_gc); How about using some parentheses? To my understanding, the division has a higher precedence than the ternary conditional expression. See: https://en.cppreference.com/w/cpp/language/operator_precedence ------------- PR Review: https://git.openjdk.org/jdk/pull/20888#pullrequestreview-2306000198 PR Review Comment: https://git.openjdk.org/jdk/pull/20888#discussion_r1760715673 From rcastanedalo at openjdk.org Mon Sep 16 08:19:24 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 16 Sep 2024 08:19:24 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Fri, 13 Sep 2024 23:16:32 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix a few style issues > > src/hotspot/share/gc/g1/g1BarrierSet.cpp line 65: > >> 63: #else >> 64: make_barrier_set_c2(), >> 65: #endif > > I assume this is temporary until all ports are ready (except maybe 32-bit x86). Right? Right, all code guarded by `G1_LATE_BARRIER_MIGRATION_SUPPORT` will be removed before integration.
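Lutz's precedence point on the `zDirector.cpp` change (JDK-8339648) can be checked in isolation. In C++ the conditional operator binds more loosely than both `==` and `/`, so the unparenthesized expression already means `(r == 0) ? inf : (t / r)`. The parentheses only make that grouping explicit. Below is a minimal sketch with hypothetical function names; the parameters mirror the roles of `old_gc_time` and `reclaimed_per_old_gc`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// As written in the patch: relies on '?:' having lower precedence than '==' and '/'.
double time_per_byte_unparenthesized(double gc_time, size_t reclaimed) {
  return reclaimed == 0 ? std::numeric_limits<double>::infinity() : double(gc_time) / double(reclaimed);
}

// The same expression with the grouping spelled out, as suggested in the review.
double time_per_byte_parenthesized(double gc_time, size_t reclaimed) {
  return (reclaimed == 0) ? std::numeric_limits<double>::infinity()
                          : (gc_time / double(reclaimed));
}
```

Both versions return infinity for zero reclaimed bytes, so the divide-by-zero reported by UBSan is avoided either way. The parenthesized form is purely a readability choice.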
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1760716721 From rcastanedalo at openjdk.org Mon Sep 16 09:31:13 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 16 Sep 2024 09:31:13 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Fri, 13 Sep 2024 23:18:44 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix a few style issues > > src/hotspot/share/opto/output.cpp line 2026: > >> 2024: if (n->is_MachNullCheck()) { >> 2025: assert(n->in(1)->as_Mach()->barrier_data() == 0, >> 2026: "Implicit null checks on memory accesses with barriers are not yet supported"); > > I don't see any changes in `lcm.cpp` here that would prevent it. I did not make any changes because the current logic in `lcm.cpp` already prevents this, albeit in a rather accidental way: `PhaseCFG::implicit_null_check` requires that all inputs of a candidate memory operation dominate the null check ([here](https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/lcm.cpp#L310-L328)) so that it can be hoisted. This fails if the candidate memory operation has barriers because these always require `MachTemp` nodes, which are placed in the same block as the candidate and break the dominance condition. See a longer explanation [here](https://github.com/openjdk/jdk/pull/19746/files#r1715387255). Should I add a check to `PhaseCFG::implicit_null_check` to discard these memory accesses more explicitly?
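Roberto's explanation hinges on a dominance condition: a memory access can be turned into an implicit null check only if every one of its inputs is available at the null-check point. The toy model below is a heavily simplified illustration, not `PhaseCFG::implicit_null_check`. Blocks are numbered, each has an immediate dominator, and an input defined in the access's own block fails the test. Such an input plays the role of the `MachTemp` operands that barriers require.

```cpp
#include <cassert>
#include <vector>

// Toy CFG: idom[b] is the immediate dominator of block b; the entry block 0
// is its own immediate dominator. Returns true if block a dominates block b.
bool dominates(const std::vector<int>& idom, int a, int b) {
  while (true) {
    if (a == b) return true;
    if (b == idom[b]) return false;  // walked up to the entry block without meeting a
    b = idom[b];
  }
}

// A candidate access can be hoisted to the null check only if the blocks
// defining all of its inputs dominate the null-check block.
bool can_use_implicit_null_check(const std::vector<int>& idom,
                                 const std::vector<int>& input_blocks,
                                 int null_check_block) {
  for (int blk : input_blocks) {
    if (!dominates(idom, blk, null_check_block)) return false;
  }
  return true;
}
```

For a straight-line CFG 0 → 1 (null check) → 2 (candidate access), `idom = {0, 0, 1}`. Inputs defined in blocks 0 and 1 pass, but a temp defined in block 2 alongside the access does not.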
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1760814745 From shade at openjdk.org Mon Sep 16 10:40:16 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 16 Sep 2024 10:40:16 GMT Subject: RFR: 8340183: Shenandoah: Incorrect match for clone barrier in is_gc_barrier_node Message-ID: The name of the call we emit is "shenandoah_clone": https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L806 ...yet we test for "shenandoah_clone_barrier" here: https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L688 I think we are better off checking the call target instead of relying on the call name. This change also eliminates the `shenandoah_cas_obj` matcher, for which no call has been emitted ever since we started doing CAS expansions inline. Additional testing: - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` - [ ] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/21014/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21014&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340183 Stats: 20 lines in 3 files changed: 6 ins; 10 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21014.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21014/head:pull/21014 PR: https://git.openjdk.org/jdk/pull/21014 From shade at openjdk.org Mon Sep 16 10:44:12 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 16 Sep 2024 10:44:12 GMT Subject: RFR: 8340186: Shenandoah: Missing load_reference_barrier_phantom_narrow match in is_shenandoah_lrb_call Message-ID: [JDK-8256999](https://bugs.openjdk.org/browse/JDK-8256999) added `ShenandoahRuntime::load_reference_barrier_phantom_narrow`, but missed adding it to `ShenandoahBarrierSetC2::is_shenandoah_lrb_call`.
It is currently innocuous, as there are no users of `is_shenandoah_lrb_call`, but it will be important when [JDK-8340183](https://bugs.openjdk.org/browse/JDK-8340183) lands. ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/21016/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21016&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340186 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21016.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21016/head:pull/21016 PR: https://git.openjdk.org/jdk/pull/21016 From rkennke at openjdk.org Mon Sep 16 12:38:00 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 16 Sep 2024 12:38:00 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v16] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 53 commits: - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java - Fix loop on aarch64 - clarify obscure assert in metasapce setup - Rework compressedklass encoding - remove stray debug output - Fixes post 8338526 - Merge commit '597788850041e7272a23714c05ba546ee6080856' into JDK-8305895-v4 - Various touch-ups - Hide log timestamps in test to prevent false failures - Revert accidental change of UCOH default - ... and 43 more: https://git.openjdk.org/jdk/compare/59778885...49c87547 ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=15 Stats: 4605 lines in 190 files changed: 3252 ins; 724 del; 629 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Mon Sep 16 13:28:00 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 16 Sep 2024 13:28:00 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v17] In-Reply-To: References: Message-ID: <_5gI7i33xrOgXMTI_04oX9UDGwhVTtSNWoSiNfM3FOM=.b24979b3-dcde-401f-b2d8-9b201d303f57@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 54 commits: - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java - Fix loop on aarch64 - clarify obscure assert in metasapce setup - Rework compressedklass encoding - remove stray debug output - Fixes post 8338526 - Merge commit '597788850041e7272a23714c05ba546ee6080856' into JDK-8305895-v4 - Various touch-ups - Hide log timestamps in test to prevent false failures - ... and 44 more: https://git.openjdk.org/jdk/compare/54595188...2125cd81 ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=16 Stats: 4598 lines in 190 files changed: 3245 ins; 719 del; 634 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Mon Sep 16 13:31:17 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 16 Sep 2024 13:31:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 13:13:01 GMT, Roman Kennke wrote: >>> @rkennke Can you please explain the changes in these tests: >>> >>> ``` >>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java >>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java >>> test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java >>> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java >>> test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java >>> ``` >>> >>> You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` >>> >>> This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. 
>>> >>> I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? >>> >>> Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. >>> >>> Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well). >> >> IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible. >> >> I will re-evaluate those tests, and add comments or remove the restrictions. 
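The array-layout arithmetic behind the byte[]/long[] offset mismatch fits in a few lines. This is a minimal model, not HotSpot code. It assumes a 4-byte length field, 8-byte object alignment, and elements aligned to their own size. `align_up`, `array_base_offset` and `bytes_until_aligned` are hypothetical helpers. With compact headers the length sits at offset 8, as the PR description states, so byte[] data starts at 12 while long[] data is padded to 16. With legacy headers and compressed class pointers, the length sits at 12 and every element type starts at 16.

```cpp
#include <cassert>
#include <cstddef>

// Round x up to a multiple of `alignment` (which must be a power of two).
constexpr std::size_t align_up(std::size_t x, std::size_t alignment) {
  return (x + alignment - 1) & ~(alignment - 1);
}

// Hypothetical model: elements start right after a 4-byte length field,
// padded so that each element is aligned to its own size.
constexpr std::size_t array_base_offset(std::size_t length_offset,
                                        std::size_t element_size) {
  return align_up(length_offset + 4, element_size);
}

// Leading bytes before the first 8-byte-aligned element, assuming the
// object itself starts 8-byte aligned.
constexpr std::size_t bytes_until_aligned(std::size_t base_offset) {
  return align_up(base_offset, 8) - base_offset;
}

// Compact headers: length at offset 8.
static_assert(array_base_offset(8, 1) == 12, "byte[] data starts at 12");
static_assert(array_base_offset(8, 8) == 16, "long[] data starts at 16");

// Legacy headers with compressed class pointers: 8-byte mark word plus a
// 4-byte narrow Klass*, so the length is at 12 and all arrays start at 16.
static_assert(array_base_offset(12, 1) == 16, "byte[] data starts at 16");
static_assert(array_base_offset(12, 8) == 16, "long[] data starts at 16");
```

A non-zero `bytes_until_aligned(12)` is one plausible reason an alignment-sensitive vectorizer would need a pre-loop for byte[] accesses under compact headers, which matches the recollection above. Actual SuperWord behavior is not modeled here.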
> >> > > @rkennke Can you please explain the changes in these tests: >> > > ``` >> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java >> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java >> > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java >> > > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java >> > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java >> > > ``` >> > > You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` >> > > This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. >> > > I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? >> > > Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. >> > > Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well). >> > >> > >> > IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all.
Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible. >> > I will re-evaluate those tests, and add comments or remove the restrictions. >> >> If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ... > `LoadNKlass` nodes can then be expanded into more primitive operations (load and shift for compact headers, load with `klass_offset_in_bytes()` for original headers) within C2's back-end or even during code emission as sketched [here](https://github.com/robcasloz/jdk/commit/6cb4219f101e3be982264071c2cb1d0af1c6d754). @rkennke is this similar to what you tried out ("Expanding it as a macro")? No, this is not what I tried. I tried to completely expand LoadNKlass, and replace it with the lower nodes that load and shift the mark-word right there, in the ideal graph. But your approach is saner: there is so much implicit knowledge about Load(N)Klass, and even klass_offset_in_bytes(), all over the place that it would be very hard to get this right without breaking something.
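The two `LoadNKlass` lowerings mentioned above ("load and shift" for compact headers, a plain load at `klass_offset_in_bytes()` for original headers) can be illustrated with a host-side sketch. It assumes, per the PR description, that the narrow Klass* occupies the upper 22 bits of the 64-bit mark word, which gives a shift of 64 - 22 = 42. It also assumes that the legacy narrow Klass* sits at byte offset 8, right after the mark word. `load_nklass_compact` and `load_nklass_legacy` are hypothetical names. The sketch models the data layout only, not C2 node expansion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumption from the PR description: the narrow Klass* lives in the upper
// 22 bits of the mark word when compact headers are enabled.
constexpr int kNarrowKlassBits = 22;
constexpr int kKlassShift = 64 - kNarrowKlassBits;  // 42

// Compact headers: load the mark word (offset 0), then shift the class bits down.
inline std::uint32_t load_nklass_compact(const unsigned char* obj) {
  std::uint64_t mark;
  std::memcpy(&mark, obj, sizeof(mark));
  return static_cast<std::uint32_t>(mark >> kKlassShift);
}

// Original headers: a plain 4-byte load at the klass offset (assumed to be 8).
inline std::uint32_t load_nklass_legacy(const unsigned char* obj,
                                        std::size_t klass_offset = 8) {
  std::uint32_t nk;
  std::memcpy(&nk, obj + klass_offset, sizeof(nk));
  return nk;
}
```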
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2352926265 From roland at openjdk.org Mon Sep 16 14:07:07 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 16 Sep 2024 14:07:07 GMT Subject: RFR: 8340183: Shenandoah: Incorrect match for clone barrier in is_gc_barrier_node In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 10:35:15 GMT, Aleksey Shipilev wrote: > The name of the call we emit is "shenandoah_clone": > https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L806 > > ...yet we test for "shenandoah_clone_barrier" here: > https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L688 > > I think we are better off polling the call target instead of relying on call name. This change also eliminates `shenandoah_cas_obj` matcher, for which we do not have the emitted call ever since we started doing CAS expansions inline. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [ ] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC` Looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
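The 8340183 problem (the call is emitted as "shenandoah_clone" but matched as "shenandoah_clone_barrier") shows why matching on the call target is more robust than matching on the name. This small model is a sketch only: `CallLeafSketch`, `entry_fn` and `shenandoah_clone_stub` are hypothetical stand-ins, not HotSpot types. Only the two name strings come from the review.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the runtime entry a clone-barrier call targets.
static void shenandoah_clone_stub(void* /*obj*/) {}

using entry_fn = void (*)(void*);

// Hypothetical stand-in for a C2 leaf-call node.
struct CallLeafSketch {
  const char* name;  // debug name given when the call was emitted
  entry_fn entry;    // runtime entry point the call jumps to
};

// Name-based match: silently stops working if the emitted name differs.
inline bool is_clone_barrier_by_name(const CallLeafSketch& call) {
  return std::strcmp(call.name, "shenandoah_clone_barrier") == 0;
}

// Target-based match: compares the actual call target.
inline bool is_clone_barrier_by_target(const CallLeafSketch& call) {
  return call.entry == &shenandoah_clone_stub;
}
```

With a call built as `{"shenandoah_clone", &shenandoah_clone_stub}`, the name check returns false and the target check returns true, which is the mismatch the patch removes.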
PR Review: https://git.openjdk.org/jdk/pull/21014#pullrequestreview-2306783149 From aboldtch at openjdk.org Mon Sep 16 14:17:07 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 16 Sep 2024 14:17:07 GMT Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate [v2] In-Reply-To: References: Message-ID: <-fwIuetM6bjQ93mo3QgorOpFNkxkgJ2SH-LbTT0k2h0=.f37f385d-20cf-4cf8-9496-d7256482726d@github.com> On Mon, 16 Sep 2024 08:15:38 GMT, Lutz Schmidt wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> Adjust division following suggestion by xmas > > src/hotspot/share/gc/z/zDirector.cpp line 491: > >> 489: // Calculate the GC cost for each reclaimed byte >> 490: const double current_young_gc_time_per_bytes_freed = double(young_gc_time) / double(reclaimed_per_young_gc); >> 491: const double current_old_gc_time_per_bytes_freed = reclaimed_per_old_gc == 0 ? std::numeric_limits<double>::infinity() : double(old_gc_time) / double(reclaimed_per_old_gc); > > How about using some parentheses? To my understanding, the division has a higher precedence than the ternary conditional expression. See: https://en.cppreference.com/w/cpp/language/operator_precedence I do not mind parentheses. But the ternary operator has the lowest precedence (if you do not count the `,` which I would almost always say is wrong to use without a surrounding `() / [] / {}`), so to me they seem superfluous. Just to clarify: the intended meaning of this code is exactly what we get from division having the higher precedence. That is: const double current_old_gc_time_per_bytes_freed = ((reclaimed_per_old_gc == 0) ? (std::numeric_limits<double>::infinity()) : (double(old_gc_time) / double(reclaimed_per_old_gc))); _Side Note:_ I also think I prefer immediately invoked lambdas when the ternaries get this long.
const double current_old_gc_time_per_bytes_freed = [&]() { if (reclaimed_per_old_gc == 0) { return std::numeric_limits<double>::infinity(); } return double(old_gc_time) / double(reclaimed_per_old_gc); }(); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20888#discussion_r1761248058 From mbaesken at openjdk.org Mon Sep 16 15:07:42 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 16 Sep 2024 15:07:42 GMT Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate [v3] In-Reply-To: References: Message-ID: <9B02mnKaYLY90CZog5880J_BmLZT01rc6mEwqY6hWU0=.07e8eb6c-e9c3-40ac-98d1-8f99f32a5c88@github.com> > The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational > shows this error when running with ubsan enabled > > src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero > #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491 > #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822 > #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912 > #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29 > #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48 > #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: add parentheses ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20888/files - new: https://git.openjdk.org/jdk/pull/20888/files/21fe3ca7..6902026f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20888&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20888&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20888.diff Fetch: git fetch
https://git.openjdk.org/jdk.git pull/20888/head:pull/20888 PR: https://git.openjdk.org/jdk/pull/20888 From kvn at openjdk.org Mon Sep 16 15:51:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 16 Sep 2024 15:51:15 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Mon, 16 Sep 2024 09:28:30 GMT, Roberto Castañeda Lozano wrote: > Should I add a check to PhaseCFG::implicit_null_check to discard these memory accesses more explicitly? Yes, please. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1761413544 From aboldtch at openjdk.org Mon Sep 16 16:07:05 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 16 Sep 2024 16:07:05 GMT Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate [v3] In-Reply-To: <9B02mnKaYLY90CZog5880J_BmLZT01rc6mEwqY6hWU0=.07e8eb6c-e9c3-40ac-98d1-8f99f32a5c88@github.com> References: <9B02mnKaYLY90CZog5880J_BmLZT01rc6mEwqY6hWU0=.07e8eb6c-e9c3-40ac-98d1-8f99f32a5c88@github.com> Message-ID: On Mon, 16 Sep 2024 15:07:42 GMT, Matthias Baesken wrote: >> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational >> shows this error when running with ubsan enabled >> >> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero >> #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491 >> #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822 >> #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912 >> #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29 >> #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48 >> #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 >> #6 0x7f098b1a9881 in
thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > add parentheses Changes requested by aboldtch (Reviewer). src/hotspot/share/gc/z/zDirector.cpp line 492: > 490: const double current_young_gc_time_per_bytes_freed = double(young_gc_time) / double(reclaimed_per_young_gc); > 491: const double current_old_gc_time_per_bytes_freed = ((reclaimed_per_old_gc == 0) ? (std::numeric_limits<double>::infinity()) > 492: : (double(old_gc_time) / double(reclaimed_per_old_gc))); Suggestion: const double current_old_gc_time_per_bytes_freed = reclaimed_per_old_gc == 0 ? std::numeric_limits<double>::infinity() : (double(old_gc_time) / double(reclaimed_per_old_gc)); Sorry, I probably confused things here. I think this is what was wanted. I just added all the parentheses as a clarification of how this was meant to be parsed by the compiler. ------------- PR Review: https://git.openjdk.org/jdk/pull/20888#pullrequestreview-2307097576 PR Review Comment: https://git.openjdk.org/jdk/pull/20888#discussion_r1761435309 From kvn at openjdk.org Mon Sep 16 16:11:04 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 16 Sep 2024 16:11:04 GMT Subject: RFR: 8340186: Shenandoah: Missing load_reference_barrier_phantom_narrow match in is_shenandoah_lrb_call In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 10:38:12 GMT, Aleksey Shipilev wrote: > [JDK-8256999](https://bugs.openjdk.org/browse/JDK-8256999) added `ShenandoahRuntime::load_reference_barrier_phantom_narrow`, but missed adding it to `ShenandoahBarrierSetC2::is_shenandoah_lrb_call`. It is currently innocuous, as there are no users of `is_shenandoah_lrb_call`, but it will be important when [JDK-8340183](https://bugs.openjdk.org/browse/JDK-8340183) lands. Trivial ;) ------------- Marked as reviewed by kvn (Reviewer).
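Back to the 8339648 review: the two forms of the `zDirector.cpp` guard discussed there, the plain ternary and the immediately invoked lambda, can be compared in isolation. This is a minimal sketch with hypothetical function names. It assumes a `double` GC time and a `size_t` reclaimed-byte count. Both forms return infinity instead of performing the division by zero that UBSan reported.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Ternary form: `/` binds tighter than `?:`, so no extra parentheses are needed.
inline double cost_ternary(double old_gc_time, std::size_t reclaimed_per_old_gc) {
  return reclaimed_per_old_gc == 0 ? std::numeric_limits<double>::infinity()
                                   : old_gc_time / double(reclaimed_per_old_gc);
}

// Immediately invoked lambda form, as suggested in the review.
inline double cost_lambda(double old_gc_time, std::size_t reclaimed_per_old_gc) {
  const double cost = [&]() {
    if (reclaimed_per_old_gc == 0) {
      return std::numeric_limits<double>::infinity();
    }
    return old_gc_time / double(reclaimed_per_old_gc);
  }();
  return cost;
}
```

Both functions compute the same value. The lambda gives up some brevity in exchange for a layout that stays readable as the condition grows.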
PR Review: https://git.openjdk.org/jdk/pull/21016#pullrequestreview-2307108763 From aboldtch at openjdk.org Mon Sep 16 16:21:20 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 16 Sep 2024 16:21:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v17] In-Reply-To: <_5gI7i33xrOgXMTI_04oX9UDGwhVTtSNWoSiNfM3FOM=.b24979b3-dcde-401f-b2d8-9b201d303f57@github.com> References: <_5gI7i33xrOgXMTI_04oX9UDGwhVTtSNWoSiNfM3FOM=.b24979b3-dcde-401f-b2d8-9b201d303f57@github.com> Message-ID: On Mon, 16 Sep 2024 13:28:00 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 54 commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 > - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java > - Fix loop on aarch64 > - clarify obscure assert in metasapce setup > - Rework compressedklass encoding > - remove stray debug output > - Fixes post 8338526 > - Merge commit '597788850041e7272a23714c05ba546ee6080856' into JDK-8305895-v4 > - Various touch-ups > - Hide log timestamps in test to prevent false failures > - ... 
and 44 more: https://git.openjdk.org/jdk/compare/54595188...2125cd81

src/hotspot/cpu/aarch64/aarch64.ad line 6459:

> 6457: format %{ "ldrw $dst, $mem\t# compressed class ptr" %}
> 6458: ins_encode %{
> 6459: __ load_nklass_compact_c2($dst$$Register, $mem$$base$$Register, $mem$$index$$Register, $mem$$scale, $mem$$disp);

I wonder if something along these lines is required here.

Suggestion:

Address addr = mem2address($mem->opcode(), $mem$$base$$Register, $mem$$index, $mem$$scale, $mem$$disp);
__ load_nklass_compact_c2($dst$$Register, __ adjust_compact_object_header_address_c2(addr, rscratch1));

With `adjust_compact_object_header_address_c2` being:

```C++
Address C2_MacroAssembler::adjust_compact_object_header_address_c2(Address addr, Register tmp) {
  // The incoming address is pointing into obj-start + klass_offset_in_bytes. We need to extract
  // obj-start, so that we can load from the object's mark-word instead. Usually the address
  // comes as obj-start in addr.base() and klass_offset_in_bytes in addr.offset().
  if (addr.getMode() != Address::base_plus_offset) {
    lea(tmp, addr);
    addr = Address(tmp, -oopDesc::klass_offset_in_bytes());
  } else {
    addr = Address(addr.base(), addr.offset() - oopDesc::klass_offset_in_bytes());
  }
  return legitimize_address(addr, 8, tmp);
}
```

Maybe we never actually see a `$mem->opcode()` that is not an `lsl` variant, nor an offset too far away to encode as an immediate (the case `legitimize_address` fixes). But it seems like this would at least make those cases correct, while avoiding the `lea` in the common case. Maybe someone with more experience with aarch64 MacroAssembler + .ad files and C2 can give an opinion.
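The two branches of the suggested `adjust_compact_object_header_address_c2` can be modeled without the assembler. This sketch is hypothetical throughout. `AddrSketch` only distinguishes base-plus-immediate from other addressing modes. `klass_offset` is passed in rather than read from `oopDesc::klass_offset_in_bytes()`. The `legitimize_address` step, which re-materializes offsets that do not fit an immediate, is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical model of an aarch64 memory operand.
enum class Mode { base_plus_offset, base_plus_index };

struct AddrSketch {
  Mode mode;
  int base;             // base register number
  int index;            // index register number (unused for base_plus_offset)
  std::int64_t offset;  // byte displacement
};

struct Adjusted {
  AddrSketch addr;
  bool needs_lea;  // true if the original address was first materialized into tmp
};

// Rebase an address pointing at obj + klass_offset so that it points at the
// mark word (obj + 0), mirroring the two branches of the suggested helper.
inline Adjusted adjust_to_mark_word(AddrSketch addr, int klass_offset, int tmp) {
  if (addr.mode != Mode::base_plus_offset) {
    // lea tmp, addr; then use [tmp, #-klass_offset]
    return Adjusted{AddrSketch{Mode::base_plus_offset, tmp, -1,
                               -static_cast<std::int64_t>(klass_offset)},
                    true};
  }
  // Common case: fold the adjustment into the existing immediate, no lea needed.
  addr.offset -= klass_offset;
  return Adjusted{addr, false};
}
```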
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1761455581 From shade at openjdk.org Mon Sep 16 16:25:08 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 16 Sep 2024 16:25:08 GMT Subject: RFR: 8340186: Shenandoah: Missing load_reference_barrier_phantom_narrow match in is_shenandoah_lrb_call In-Reply-To: References: Message-ID: <2JTmLP8qoq6b358-CVLMhT5fgLK3rB_dWqTdNtwYUXg=.6a991137-e68a-4bdc-b58c-3bfb66774e79@github.com> On Mon, 16 Sep 2024 10:38:12 GMT, Aleksey Shipilev wrote: > [JDK-8256999](https://bugs.openjdk.org/browse/JDK-8256999) added `ShenandoahRuntime::load_reference_barrier_phantom_narrow`, but missed adding it to `ShenandoahBarrierSetC2::is_shenandoah_lrb_call`. It is currently innocuous, as there are no users of `is_shenandoah_lrb_call`, but it will be important when [JDK-8340183](https://bugs.openjdk.org/browse/JDK-8340183) lands. Yup :) Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21016#issuecomment-2353368636 From shade at openjdk.org Mon Sep 16 16:25:09 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 16 Sep 2024 16:25:09 GMT Subject: Integrated: 8340186: Shenandoah: Missing load_reference_barrier_phantom_narrow match in is_shenandoah_lrb_call In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 10:38:12 GMT, Aleksey Shipilev wrote: > [JDK-8256999](https://bugs.openjdk.org/browse/JDK-8256999) added `ShenandoahRuntime::load_reference_barrier_phantom_narrow`, but missed adding it to `ShenandoahBarrierSetC2::is_shenandoah_lrb_call`. It is currently innocuous, as there are no users of `is_shenandoah_lrb_call`, but it will be important when [JDK-8340183](https://bugs.openjdk.org/browse/JDK-8340183) lands. This pull request has now been integrated. 
Changeset: 1640bd26 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/1640bd2676d8d183f02b4f5386ce42c47950e356 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8340186: Shenandoah: Missing load_reference_barrier_phantom_narrow match in is_shenandoah_lrb_call Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/21016 From rcastanedalo at openjdk.org Mon Sep 16 16:34:59 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 16 Sep 2024 16:34:59 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v21] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Castañeda Lozano has updated the pull request incrementally with three additional commits since the last revision: - Add missing IR test to test run - Skip barrier refining for non-OOP stores and stores without barrier data - Assert that m is input to n in Matcher::is_encode_and_store_pattern ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/141020e6..653f9acf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=19-20 Stats: 21 lines in 3 files changed: 16 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Mon Sep 16 16:37:32 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 16 Sep 2024 16:37:32 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Fri, 13 Sep 2024 22:51:07 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix a few style issues > > src/hotspot/share/opto/matcher.cpp line 2845: > >> 2843: n->Opcode() == Op_StoreN && >> 2844: m->is_EncodeP(); >> 2845: } > > Add comment that `m` is input of `n`. I thought about adding assert too but I will leave it to you. Added the assertion (commit a480d70b).
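A toy node graph illustrates the contract that the new assertion in `Matcher::is_encode_and_store_pattern` documents. This is a hypothetical sketch, not the actual matcher code. `NodeSketch` and its opcodes are stand-ins. The pattern is reduced to what the quoted diff shows: `n` is a `StoreN` and its input `m` is an `EncodeP`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, minimal stand-in for a C2 node: an opcode plus input edges.
enum Opcode { Op_StoreN, Op_StoreP, Op_EncodeP, Op_Other };

struct NodeSketch {
  Opcode op;
  std::vector<const NodeSketch*> in;
  bool has_input(const NodeSketch* m) const {
    for (const NodeSketch* x : in) {
      if (x == m) return true;
    }
    return false;
  }
};

// The assert documents (and, in debug builds, checks) the caller's contract
// that m feeds n; the return value is the pattern test from the quoted diff.
inline bool is_encode_and_store_pattern(const NodeSketch* n, const NodeSketch* m) {
  assert(n->has_input(m) && "m should be an input of n");
  return n->op == Op_StoreN && m->op == Op_EncodeP;
}
```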
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1761478462 From rcastanedalo at openjdk.org Mon Sep 16 16:49:25 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 16 Sep 2024 16:49:25 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Fri, 13 Sep 2024 23:14:19 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix a few style issues > > src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 241: > >> 239: assert(newval_bottom->isa_ptr() || newval_bottom->isa_narrowoop(), "newval should be an OOP"); >> 240: TypePtr::PTR newval_type = newval_bottom->make_ptr()->ptr(); >> 241: uint8_t barrier_data = store->barrier_data(); > > Should you check barrier data for 0? > `is_ptr()` has wide set of types. It includes TypeRawPtr, TypeKlassPtr and TypeMetadataPtr. Where you filtering them? I added the check and excluded pointers other than OOPs, narrow OOPs, and null pointers (null pointers are needed because null in uncompressed OOP mode is typed as `AnyPtr`) in commit 10bc0d2c. Note that these checks are not strictly required for correctness, because for all other pointers the corresponding barrier data would be 0, and the only potential operations over it would be bit clearing. But I still think they have value in that they communicate more clearly the intent and scope of the optimization.
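The refinement under discussion can be sketched as bit manipulation on a store's barrier data. Everything here is an illustrative assumption: the flag names and values are hypothetical. The rules are a plausible reading of this thread and of G1's post-barrier null filter, not a copy of `G1BarrierSetC2`. They are: leave non-OOP stores and zero barrier data untouched, drop the post-barrier when storing null, and mark it not-null when the new value is known to be non-null. As in the reply above, a null new value qualifies even when it is typed as `AnyPtr`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical barrier-data bits (names and values are illustrative only).
constexpr std::uint8_t kPreBarrier = 1u << 0;
constexpr std::uint8_t kPostBarrier = 1u << 1;
constexpr std::uint8_t kPostBarrierNotNull = 1u << 2;

// Simplified classification of the stored value's type.
enum class ValKind { Oop, NarrowOop, RawPtr, KlassPtr, MetadataPtr, AnyPtr };
enum class Nullness { Null, NotNull, Maybe };

inline std::uint8_t refine_by_new_val(std::uint8_t barrier_data, ValKind kind,
                                      Nullness nullness) {
  // Only OOPs, narrow OOPs, and null (whatever its pointer type) are refined.
  const bool oop_or_null = kind == ValKind::Oop || kind == ValKind::NarrowOop ||
                           nullness == Nullness::Null;
  if (!oop_or_null || barrier_data == 0) {
    return barrier_data;  // nothing to refine
  }
  if (nullness == Nullness::Null) {
    // Storing null cannot create a cross-region reference: drop the post-barrier.
    return static_cast<std::uint8_t>(barrier_data & ~(kPostBarrier | kPostBarrierNotNull));
  }
  if ((barrier_data & kPostBarrier) != 0 && nullness == Nullness::NotNull) {
    // Known non-null: the post-barrier can skip its null test.
    return static_cast<std::uint8_t>(barrier_data | kPostBarrierNotNull);
  }
  return barrier_data;
}
```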
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1761494258 From rkennke at openjdk.org Mon Sep 16 17:53:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 16 Sep 2024 17:53:09 GMT Subject: RFR: 8339960: GenShen: Fix inconsistencies in generational Shenandoah behavior In-Reply-To: References: Message-ID: <82yjeweCEIPcfscwWESC3M8c_UTgXKOAROiVXAKF09k=.4273df48-3931-4836-8da7-c25d8fd5a29b@github.com> On Thu, 12 Sep 2024 20:23:36 GMT, Kelvin Nilsen wrote: > This fixes some bugs found in recent code review and playback of an assertion failure. > > See also https://github.com/openjdk/shenandoah/pull/497 Looks good, thanks! ------------- Marked as reviewed by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20974#pullrequestreview-2307337108 From duke at openjdk.org Mon Sep 16 18:09:22 2024 From: duke at openjdk.org (duke) Date: Mon, 16 Sep 2024 18:09:22 GMT Subject: RFR: 8339960: GenShen: Fix inconsistencies in generational Shenandoah behavior In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 20:23:36 GMT, Kelvin Nilsen wrote: > This fixes some bugs found in recent code review and playback of an assertion failure. > > See also https://github.com/openjdk/shenandoah/pull/497 @kdnilsen Your change (at version f1ba63f4d58161512ad0262783ceda0916aece3c) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20974#issuecomment-2353576294 From kdnilsen at openjdk.org Mon Sep 16 19:18:11 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 16 Sep 2024 19:18:11 GMT Subject: Integrated: 8339960: GenShen: Fix inconsistencies in generational Shenandoah behavior In-Reply-To: References: Message-ID: <6MxEO3TVWiI4HzK-mHqwb32Yq8tRq6Gg6PbxePp8Hl8=.ced31c43-4ff8-45af-9293-819a6cc9ab73@github.com> On Thu, 12 Sep 2024 20:23:36 GMT, Kelvin Nilsen wrote: > This fixes some bugs found in recent code review and playback of an assertion failure. 
> > See also https://github.com/openjdk/shenandoah/pull/497 This pull request has now been integrated. Changeset: 858b4f12 Author: Kelvin Nilsen Committer: Y. Srinivas Ramakrishna URL: https://git.openjdk.org/jdk/commit/858b4f127ad873666f51f4c54c37fa2d7801c32c Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod 8339960: GenShen: Fix inconsistencies in generational Shenandoah behavior Reviewed-by: wkemper, rkennke ------------- PR: https://git.openjdk.org/jdk/pull/20974 From rcastanedalo at openjdk.org Tue Sep 17 05:20:30 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 17 Sep 2024 05:20:30 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v22] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms.
> > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Discard memory accesses with barrier data as implicit null check candidates ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/653f9acf..71a51bfc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=20-21 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Tue Sep 17 05:20:30 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 17 Sep 2024 05:20:30 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Mon, 16 Sep 2024 15:48:32 GMT, Vladimir Kozlov wrote: >> I did not make any changes because the current logic in `lcm.cpp` already prevents this, albeit in a rather accidental way: `PhaseCFG::implicit_null_check` requires that all inputs of a candidate memory operation dominate the null check ([here](https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/lcm.cpp#L310-L328)) so that it can be hoisted. This fails if the candidate memory operation has barriers because these always require `MachTemp` nodes, which are placed in the same block as the candidate and break the dominance condition. See a longer explanation [here](https://github.com/openjdk/jdk/pull/19746/files#r1715387255). >> >> Should I add a check to `PhaseCFG::implicit_null_check` to discard these memory accesses more explicitly?
> >> Should I add a check to PhaseCFG::implicit_null_check to discard these memory accesses more explicitly? > > Yes, please. Done (commit 71a51bfc). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1762318179 From mbaesken at openjdk.org Tue Sep 17 07:28:51 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 17 Sep 2024 07:28:51 GMT Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate [v4] In-Reply-To: References: Message-ID: > The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational > shows this error when running with ubsan enabled > > src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero > #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491 > #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822 > #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912 > #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29 > #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48 > #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: adjust parentheses ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20888/files - new: https://git.openjdk.org/jdk/pull/20888/files/6902026f..7ecdb37f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20888&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20888&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20888.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20888/head:pull/20888 PR: https://git.openjdk.org/jdk/pull/20888 From aboldtch at openjdk.org Tue Sep 
17 07:45:06 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 17 Sep 2024 07:45:06 GMT Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate [v4] In-Reply-To: References: Message-ID: <8oaWmqLYOQgXvxb4I9EFR_Jw7IyPWz6O9_nd9i2YlB4=.30e72075-acda-4eba-9a3b-b3589e22df13@github.com> On Tue, 17 Sep 2024 07:28:51 GMT, Matthias Baesken wrote: >> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational >> shows this error when running with ubsan enabled >> >> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero >> #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491 >> #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822 >> #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912 >> #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29 >> #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48 >> #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 >> #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > adjust parentheses Marked as reviewed by aboldtch (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20888#pullrequestreview-2308661474 From lucy at openjdk.org Tue Sep 17 09:32:11 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 17 Sep 2024 09:32:11 GMT Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate [v4] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 07:28:51 GMT, Matthias Baesken wrote: >> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational >> shows this error when running with ubsan enabled >> >> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero >> #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491 >> #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822 >> #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912 >> #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29 >> #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48 >> #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 >> #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > adjust parentheses Looks good now. Thanks ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20888#pullrequestreview-2309074532 From rkennke at openjdk.org Tue Sep 17 09:35:02 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 17 Sep 2024 09:35:02 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v18] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). 
> > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 57 commits: - fix CompressedClassPointersEncodingScheme yet again for linux aarch64 - Fixes post-8340184 - Merge upstream up to and including 8340184 - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java - Fix loop on aarch64 - clarify obscure assert in metasapce setup - Rework compressedklass encoding - remove stray debug output - Fixes post 8338526 - ... 
and 47 more: https://git.openjdk.org/jdk/compare/7849f252...28a26aed ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=17 Stats: 4518 lines in 190 files changed: 3180 ins; 718 del; 620 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From stuefe at openjdk.org Tue Sep 17 10:02:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Sep 2024 10:02:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 12:25:37 GMT, Johan Sjölen wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/classLoaderMetaspace.hpp line 81: > >> 79: metaspace::MetaspaceArena* class_space_arena() const { return _class_space_arena; } >> 80: >> 81: bool have_class_space_arena() const { return _class_space_arena != nullptr; } > > This is unnecessary. Instead of having this and having to remember to check for nullness each time, just change the `_class_space_arena` to point to the same arena as the `_non_class_space_arena` does when we run with `-XX:-UseCompressedClassPointers` I'd prefer not to. This logic (when -UCCP class space arena is NULL, with the implicit assumption that both are different entities) has been in there forever, and changing that is out of scope for and unrelated to this PR. I am not sure what will break if I change this but don't want to risk test errors at this point (one example, reporting would have to be adapted to recognize that both arenas are the same, and there are plenty of tests that would also need to be fixed). This can be done in a follow-up RFE if necessary.
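To make the reporting concern above concrete, here is a minimal sketch, assuming a simplified model in which an arena only tracks the number of words in use. `ArenaSketch` and `LoaderMetaspaceSketch` are hypothetical stand-ins, not HotSpot's `MetaspaceArena` or `ClassLoaderMetaspace`. It shows that summation code written for the "null class-space arena" convention silently double-counts if the class-space pointer is instead made to alias the non-class arena.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a metaspace arena: only tracks words in use.
struct ArenaSketch {
  std::size_t used_words = 0;
};

// Hypothetical stand-in for a per-class-loader metaspace holding two arenas.
struct LoaderMetaspaceSketch {
  ArenaSketch* non_class_arena;  // always present
  ArenaSketch* class_arena;      // nullptr, or aliasing non_class_arena, depending on design

  // Reporting written for the "null means absent" convention: it assumes
  // the two arenas are distinct entities and only skips a null class arena.
  std::size_t naive_total_used() const {
    assert(non_class_arena != nullptr);
    std::size_t total = non_class_arena->used_words;
    if (class_arena != nullptr) {
      total += class_arena->used_words;
    }
    return total;
  }

  // Reporting adapted to also tolerate the aliasing design.
  std::size_t alias_aware_total_used() const {
    assert(non_class_arena != nullptr);
    std::size_t total = non_class_arena->used_words;
    if (class_arena != nullptr && class_arena != non_class_arena) {
      total += class_arena->used_words;
    }
    return total;
  }
};
```

With `class_arena == nullptr` the naive sum is correct; with the aliasing design it counts the shared arena twice, so every such summation site would need the extra identity check, which is the kind of follow-up adaptation described in the reply above.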
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762917467 From stuefe at openjdk.org Tue Sep 17 10:05:20 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Sep 2024 10:05:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:05:10 GMT, Johan Sjölen wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/classLoaderMetaspace.cpp line 165: > >> 163: MetaBlock bl(ptr, word_size); >> 164: // If the block would be reusable for a Klass, add to class arena, otherwise to >> 165: // then non-class arena. > > Nit: spelling, "the" Okay ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762928041 From stuefe at openjdk.org Tue Sep 17 10:16:24 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Sep 2024 10:16:24 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:50:59 GMT, Johan Sjölen wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/metaspace.cpp line 656: > >> 654: // Adjust size of the compressed class space. >> 655: >> 656: const size_t res_align = reserve_alignment(); > > Can you change the name to `root_chunk_size`? It feels wrong, since this is a deeply hidden implementation detail. I will remove this temporary variable, which will also make the diff smaller. > src/hotspot/share/memory/metaspace.hpp line 112: > >> 110: static size_t max_allocation_word_size(); >> 111: >> 112: // Minimum allocation alignment, in bytes. All MetaData shall be aligned correclty
All MetaData shall be aligned correclty > > Nit: Spelling, "correctly" Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762968742 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762972938 From stuefe at openjdk.org Tue Sep 17 10:23:19 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Sep 2024 10:23:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 11:25:56 GMT, Johan Sj?len wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/metaspace/metablock.hpp line 48: > >> 46: >> 47: MetaWord* base() const { return _base; } >> 48: const MetaWord* end() const { return _base + _word_size; } > > `assert(is_nonempty())` Raises the question of why here and not in other accessors? Note that the only patch via which end() is called already asserts for non-empty-ness (MetaspaceArena::contains). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762985723 From jsjolen at openjdk.org Tue Sep 17 10:31:19 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 17 Sep 2024 10:31:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 09:59:49 GMT, Thomas Stuefe wrote: >> src/hotspot/share/memory/classLoaderMetaspace.hpp line 81: >> >>> 79: metaspace::MetaspaceArena* class_space_arena() const { return _class_space_arena; } >>> 80: >>> 81: bool have_class_space_arena() const { return _class_space_arena != nullptr; } >> >> This is unnecessary. 
Instead of having this and having to remember to check for nullness each time, just change the `_class_space_arena` to point to the same arena as the `_non_class_space_arena` does when we run with `-XX:-UseCompressedClassPointers` > > I'd prefer not to. > > This logic (when -UCCP class space arena is NULL, with the implicit assumption that both are different entities) has been in there forever, and changing that is out of scope for and unrelated to this PR. I am not sure what will break if I change this but don't want to risk test errors at this point (one example, reporting would have to be adapted to recognize that both arenas are the same, and there are plenty of tests that would also need to be fixed). > > This can be done in a follow-up RFE if necessary. OK, that's fine. >> src/hotspot/share/memory/metaspace.cpp line 656: >> >>> 654: // Adjust size of the compressed class space. >>> 655: >>> 656: const size_t res_align = reserve_alignment(); >> >> Can you change the name to `root_chunk_size`? > > It feels wrong, since this is a deeply hidden implementation detail. > > I will remove this temporary variable, which will also make the diff smaller. Sounds OK, I wanted the name change to indicate that "hey, deep impl detail where we use this to mean something else".
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762993568 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762994772 From stuefe at openjdk.org Tue Sep 17 10:31:20 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Sep 2024 10:31:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: <32_SIVHDWyZyYSvbV1jUHc631MTKUP2Thh_M9Q71jrc=.351aed23-599d-4a53-9cc0-0e9c85ecdf03@github.com> On Wed, 11 Sep 2024 11:29:38 GMT, Johan Sjölen wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/metaspace/metablock.hpp line 52: > >> 50: bool is_empty() const { return _base == nullptr; } >> 51: bool is_nonempty() const { return _base != nullptr; } >> 52: void reset() { _base = nullptr; _word_size = 0; } > > Is this function really necessary? According to my IDE it's only used in tests and even then the `MetaBlock` isn't used afterwards (so it has no effect). see test_clms.cpp, test_random function, used in two places there. > src/hotspot/share/memory/metaspace/metaspaceArena.hpp line 84: > >> 82: // between threads and needs to be synchronized in CLMS. >> 83: >> 84: const size_t _allocation_alignment_words; > > Nit: Document this? All other members are documented.
ok ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762993378 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762995731 From stuefe at openjdk.org Tue Sep 17 10:31:23 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Sep 2024 10:31:23 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v18] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 11:40:24 GMT, Johan Sjölen wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 57 commits: >> >> - fix CompressedClassPointersEncodingScheme yet again for linux aarch64 >> - Fixes post-8340184 >> - Merge upstream up to and including 8340184 >> - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 >> - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java >> - Fix loop on aarch64 >> - clarify obscure assert in metasapce setup >> - Rework compressedklass encoding >> - remove stray debug output >> - Fixes post 8338526 >> - ... and 47 more: https://git.openjdk.org/jdk/compare/7849f252...28a26aed > > src/hotspot/share/memory/metaspace/metaspaceArena.hpp line 44: > >> 42: class FreeBlocks; >> 43: >> 44: struct ArenaStats; > > Nit: Sort? ok ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762994972 From jsjolen at openjdk.org Tue Sep 17 10:47:25 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 17 Sep 2024 10:47:25 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v18] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 09:35:02 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). 
>> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 57 commits: > > - fix CompressedClassPointersEncodingScheme yet again for linux aarch64 > - Fixes post-8340184 > - Merge upstream up to and including 8340184 > - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 > - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java > - Fix loop on aarch64 > - clarify obscure assert in metasapce setup > - Rework compressedklass encoding > - remove stray debug output > - Fixes post 8338526 > - ... and 47 more: https://git.openjdk.org/jdk/compare/7849f252...28a26aed Hi, We've gone through the rest of the Metaspace code and looked at the tests. It looks OK to us. Would like to see some style cleanups in the tests, but that can wait as a follow up. test/hotspot/gtest/metaspace/test_clms.cpp line 193: > 191: > 192: { > 193: // Nonclass arena allocation. The style in this source file isn't really up to scratch, especially *these* lines. Anyway, it's in the tests, so I'm OK with this being fixed in a follow up RFE. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2309360771 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1763005291 From mbaesken at openjdk.org Tue Sep 17 12:01:10 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 17 Sep 2024 12:01:10 GMT Subject: RFR: 8339648: ZGC: Division by zero in rule_major_allocation_rate [v4] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 07:28:51 GMT, Matthias Baesken wrote: >> The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational >> shows this error when running with ubsan enabled >> >> src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero >> #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491 >> #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822 >> #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912 >> #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29 >> #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48 >> #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 >> #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > adjust parentheses Thanks for the reviews ! 
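For readers unfamiliar with this class of UBSan report: in C++, dividing by zero is undefined behavior, and UBSan flags it at the point of the division. The integrated fix is a small change to zDirector.cpp whose details are not reproduced in this thread; the sketch below only illustrates the general pattern of testing a divisor that may legitimately be zero (for instance, a rate that has not been sampled yet) before dividing. `ratio_or` is a hypothetical helper sketched with `double`, not ZGC code.

```cpp
// Hypothetical helper, not the actual ZGC patch: returns numerator / denominator,
// or `fallback` when the denominator is zero, so the division that UBSan would
// report as "division by zero" is never executed.
constexpr double ratio_or(double numerator, double denominator, double fallback) {
  return denominator == 0.0 ? fallback : numerator / denominator;
}
```

Which fallback value is appropriate (for example, one that makes a heuristic decline to trigger) is a policy decision for the caller.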
------------- PR Comment: https://git.openjdk.org/jdk/pull/20888#issuecomment-2355506623 From mbaesken at openjdk.org Tue Sep 17 12:01:11 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 17 Sep 2024 12:01:11 GMT Subject: Integrated: 8339648: ZGC: Division by zero in rule_major_allocation_rate In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 10:26:19 GMT, Matthias Baesken wrote: > The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational > shows this error when running with ubsan enabled > > src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero > #0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491 > #1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822 > #2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912 > #3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29 > #4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48 > #5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225 > #6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858 This pull request has now been integrated. Changeset: 80db6e71 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/80db6e71b092867212147bd369a9fda65dbd4b70 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8339648: ZGC: Division by zero in rule_major_allocation_rate Reviewed-by: aboldtch, lucy, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/20888 From rkennke at openjdk.org Tue Sep 17 12:52:03 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 17 Sep 2024 12:52:03 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). 
> > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - CompressedKlassPointers::is_encodable shall be callable with -UseCCP - Johan review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/28a26aed..612d3045 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=17-18 Stats: 39 lines in 7 files changed: 22 ins; 8 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From duke at openjdk.org Tue Sep 17 12:54:15 2024 From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Tue, 17 Sep 2024 12:54:15 GMT Subject: RFR: 8339161: ZGC: Remove unused remembered sets Message-ID: In ZGC, when a page becomes old it needs a remembered set (remset), which stores 2 bits per 8 bytes (64 bits) of heap, a memory overhead of 3.125% (2/64) per page that has an allocated remset. When an old page is freed and inserted into the page cache, it can later be reused as a young page. In this case, the remset is still allocated even though the young page does not need it. This is especially noteworthy for long-running programs, where pages are recycled for long enough that close to all pages end up with an allocated remset.
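The 3.125% figure follows directly from the 2/64 ratio. As a small worked check, assuming exactly 2 remset bits per 8 bytes of page memory and ignoring any per-remset bookkeeping, a 2 MB page would need a 64 KB remset. The helper names below are illustrative only, not ZGC functions.

```cpp
#include <cstddef>

// Assumption taken from the description: 2 remset bits per 8-byte (64-bit) heap word.
constexpr std::size_t kRemsetBitsPerWord = 2;
constexpr std::size_t kHeapWordBytes = 8;
constexpr std::size_t kBitsPerByte = 8;

// Bytes of remset needed to cover a page of the given size.
constexpr std::size_t remset_bytes(std::size_t page_bytes) {
  return (page_bytes / kHeapWordBytes) * kRemsetBitsPerWord / kBitsPerByte;
}

// Remset size relative to page size (0.03125 corresponds to 3.125%).
constexpr double remset_overhead(std::size_t page_bytes) {
  return static_cast<double>(remset_bytes(page_bytes)) /
         static_cast<double>(page_bytes);
}
```

For a 2 MB page: 2,097,152 / 8 = 262,144 words, times 2 bits = 524,288 bits = 65,536 bytes, i.e. 1/32 of the page.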
The attached plot shows remset memory usage for "live" pages in a program that frequently recycles pages using a cache mechanism. Since remsets for young pages are unused, that memory should be considered wasted. ![remset_waste](https://github.com/user-attachments/assets/2a60948b-9297-4554-8fb4-d9f527855c33) The alternative solution that I propose in this PR deletes/frees remsets when an old page is inserted into the page cache, so that memory is not wasted. This would mean that remsets are only stored for old pages that are in use; pages that are not in use, or that are young, should not have an allocated remset. With this change, the line showing remset usage for young pages would disappear from the plot above, and the total remset memory usage would be dictated by the number of "live" old pages. Below is a performance measurement of initializing vs. clearing remsets in the same cache program mentioned above. When deleting remsets, freeing is done by GC threads, so there is no latency impact for mutators. Initializing remsets is on average ~2.3x slower than clearing (mean 0.00258083 ms vs. 0.00111340 ms), but saves 3.125% of memory for pages that do not need a remset. ~89% of the measured initializations are made by GC threads and the rest by mutator threads.

| Operation | min (ms) | max (ms) | mean (ms) |
| ------------ | -------- | -------- | ---------- |
| remset init | 0.000292 | 0.706 | 0.00258083 |
| remset clear | 0.000082 | 0.015 | 0.00111340 |

Tested with tiers 1-7 and a local test making sure there are no remsets for young pages. SPECjbb2015 performance measurements show no statistically significant regression or improvement.
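The proposed policy amounts to a small state machine. Promotion to old allocates a zeroed remset. Insertion into the page cache frees it, rather than clearing it and keeping it around. Reuse as a young page then finds no remset. The sketch below uses a hypothetical `Page` class, not ZGC's `ZPage`/`ZRememberedSet` API. It also assumes the 2-bits-per-word layout described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical, much-simplified page model (not ZGC's ZPage/ZRememberedSet):
// a page owns a remset only while it is old and in use.
class Page {
 public:
  explicit Page(std::size_t size) : size_(size) {}

  bool is_old() const { return old_; }
  bool has_remset() const { return remset_ != nullptr; }

  // Becoming old requires a remset: allocate a zeroed one on demand
  // (2 bits per 8-byte word, packed into 64-bit chunks).
  void become_old() {
    old_ = true;
    if (!has_remset()) {
      const std::size_t bits = size_ / 8 * 2;
      remset_ = std::make_unique<std::vector<std::uint64_t>>((bits + 63) / 64,
                                                             std::uint64_t{0});
    }
  }

  // The proposed policy: free the remset as soon as the page enters the
  // page cache, instead of clearing it and keeping it allocated.
  void insert_into_cache() {
    remset_.reset();
    old_ = false;
  }

  // A cached page that is reused as young never needs a remset.
  void reuse_as_young() {
    assert(!has_remset() && "cached pages must not hold a remset");
    old_ = false;
  }

 private:
  std::size_t size_;
  bool old_ = false;
  std::unique_ptr<std::vector<std::uint64_t>> remset_;
};
```

The trade-off from the table is visible here. Each `become_old()` on a recycled page pays for an allocation plus zeroing, instead of only a clear. In return, cached and young pages hold no remset memory.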
------------- Commit messages: - Merge resize and delete for remsets - 8339161: ZGC: Remove unused remembered sets Changes: https://git.openjdk.org/jdk/pull/20947/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20947&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339161 Stats: 95 lines in 7 files changed: 1 ins; 67 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/20947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20947/head:pull/20947 PR: https://git.openjdk.org/jdk/pull/20947 From kvn at openjdk.org Tue Sep 17 16:12:24 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 17 Sep 2024 16:12:24 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v22] In-Reply-To: References: Message-ID: <9qt_iRfNSfdLuraZ18LqQx_xVt7xNPF9SBRZXwWkIig=.f86597de-6440-4235-9152-55b3a08c0d89@github.com> On Tue, 17 Sep 2024 05:20:30 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
> > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Discard memory accesses with barrier data as implicit null check candidates Looks good to me. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2310210106 From rcastanedalo at openjdk.org Wed Sep 18 07:18:12 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 18 Sep 2024 07:18:12 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v22] In-Reply-To: <9qt_iRfNSfdLuraZ18LqQx_xVt7xNPF9SBRZXwWkIig=.f86597de-6440-4235-9152-55b3a08c0d89@github.com> References: <9qt_iRfNSfdLuraZ18LqQx_xVt7xNPF9SBRZXwWkIig=.f86597de-6440-4235-9152-55b3a08c0d89@github.com> Message-ID: On Tue, 17 Sep 2024 16:09:30 GMT, Vladimir Kozlov wrote: > Looks good to me. Thanks for reviewing, Vladimir! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2357686525 From rcastanedalo at openjdk.org Wed Sep 18 07:49:52 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 18 Sep 2024 07:49:52 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v23] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...
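The description above splits each barrier into inline tests and an out-of-line slow-path stub. The model below illustrates that split for a G1-style post-write barrier. It is a hypothetical C++ sketch, not code from this PR. The region size, card size and card values are placeholder assumptions. Tests 1-3 stand for the inline fast path, and the remainder stands for the stub.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder values for illustration only; the real ones depend on the JVM
// configuration and on G1's card table encoding.
constexpr int kLogRegionBytes = 20;        // 1 MiB regions (assumption)
constexpr int kCardShift = 9;              // 512-byte cards (assumption)
constexpr std::uint8_t kCleanCard = 0xff;  // assumption
constexpr std::uint8_t kDirtyCard = 0;     // assumption
constexpr std::uint8_t kYoungCard = 2;     // assumption

enum class PostBarrier { SameRegion, NullValue, YoungCard, AlreadyDirty, Enqueued };

// Simplified model of a G1-style post-write barrier for `*field_addr = new_val`.
// Tests 1-3 model the inline fast path; the rest models the out-of-line stub.
PostBarrier post_barrier(std::uintptr_t field_addr, std::uintptr_t new_val,
                         std::vector<std::uint8_t>& cards,
                         std::vector<std::size_t>& dirty_card_queue) {
  // 1. Field and new value in the same region: no cross-region reference.
  if (((field_addr ^ new_val) >> kLogRegionBytes) == 0) return PostBarrier::SameRegion;
  // 2. Storing null never creates a cross-region reference.
  if (new_val == 0) return PostBarrier::NullValue;
  const std::size_t card = field_addr >> kCardShift;
  assert(card < cards.size());
  // 3. Cards of young regions are never refined.
  if (cards[card] == kYoungCard) return PostBarrier::YoungCard;
  // Slow path (the real barrier issues a StoreLoad fence before this check).
  if (cards[card] == kDirtyCard) return PostBarrier::AlreadyDirty;
  cards[card] = kDirtyCard;
  dirty_card_queue.push_back(card);
  return PostBarrier::Enqueued;
}
```

Test 2 can be dropped when the compiler can prove the stored value is non-null. That kind of decision becomes easier when the barrier is expanded late, from barrier data attached to each memory access.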
Roberto Castañeda Lozano has updated the pull request incrementally with seven additional commits since the last revision: - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion - Restore some asserts - Default values for tmp regs of G1PostBarrierStubC2 - 8334060: [arm32] Implementation of Late Barrier Expansion for G1 - 8330685: [arm32] share barrier spilling logic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/71a51bfc..13b93bd9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=21-22 Stats: 614 lines in 12 files changed: 521 ins; 36 del; 57 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Wed Sep 18 08:00:30 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 18 Sep 2024 08:00:30 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v23] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 07:49:52 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24.
The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with seven additional commits since the last revision: > > - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms > - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - Restore some asserts > - Default values for tmp regs of G1PostBarrierStubC2 > - 8334060: [arm32] Implementation of Late Barrier Expansion for G1 > - 8330685: [arm32] share barrier spilling logic Thanks for the arm 32-bit port @snazarkin! Merged in commit 3957c03f. Besides the arm 32-bit port, @snazarkin's changeset also makes it possible to use a third temporary register in the platform-independent class `G1PostBarrierStubC2`. This temporary register (`G1PostBarrierStubC2::_tmp3`) is initialized to `noreg` by default in `G1PostBarrierStubC2::initialize_registers`, so no other platform should be affected.
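The compatibility argument rests on two things: a third temporary that defaults to `noreg`, and ports asserting that temporaries they do not use stay unset. The sketch below illustrates both. It uses hypothetical stand-ins for HotSpot's `Register`, `noreg` and the stub class. The real `G1PostBarrierStubC2::initialize_registers` signature is not shown in this thread and may differ.

```cpp
#include <cassert>

// Hypothetical stand-ins for HotSpot's Register and noreg.
struct Register {
  int encoding;
  bool operator==(const Register& other) const { return encoding == other.encoding; }
  bool operator!=(const Register& other) const { return !(*this == other); }
};
constexpr Register noreg{-1};

// Simplified post-barrier stub with up to three temporary registers. Every
// temporary defaults to noreg, so ports that pass only two are unaffected.
class PostBarrierStub {
 public:
  void initialize_registers(Register thread, Register tmp1 = noreg,
                            Register tmp2 = noreg, Register tmp3 = noreg) {
    thread_ = thread;
    tmp1_ = tmp1;
    tmp2_ = tmp2;
    tmp3_ = tmp3;
  }
  Register thread() const { return thread_; }
  Register tmp1() const { return tmp1_; }
  Register tmp2() const { return tmp2_; }
  Register tmp3() const { return tmp3_; }

 private:
  Register thread_ = noreg;
  Register tmp1_ = noreg;
  Register tmp2_ = noreg;
  Register tmp3_ = noreg;
};

// A port that only needs two temporaries can assert that the third stays unset.
void emit_stub_two_tmps(const PostBarrierStub& stub) {
  assert(stub.tmp1() != noreg && stub.tmp2() != noreg);
  assert(stub.tmp3() == noreg && "tmp3 is not used by this port");
}
```

Defaulting to `noreg` leaves existing call sites unchanged. The assertion catches a platform that accidentally starts passing a register it never uses.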
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2357765066 From rkennke at openjdk.org Wed Sep 18 12:11:31 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 18 Sep 2024 12:11:31 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> References: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> Message-ID: On Mon, 16 Sep 2024 06:53:42 GMT, Roberto Castañeda Lozano wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Various touch-ups > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2576: > >> 2574: } else { >> 2575: lea(dst, Address(obj, index, Address::lsl(scale))); >> 2576: ldr(dst, Address(dst, offset)); > > Do you have a reproducer (or, better yet, a test case) that exercises this case? I ran Oracle's internal CI tiers 1-5 and could never hit it. Could this happen for x64 as well? AFAIK, this happens only when using compressed oops with a heap-base in r27. When running with that setting, we would get addresses like r27[nklass] or r27[nklass]+offset, both with scale=8. You would need large heaps, perhaps >4GB, to get this coops setting. The problem with aarch64 is that we can't have an address like r27[nklass]+offset; that's why we need to lea the r27[nklass] part first. Yes, this also happens on x86, but x86 supports rX[nklass]+offset addressing.
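The aarch64 limitation can be shown with a text-only model of the two cases. x86 folds `base + (index << scale) + offset` into one memory operand, but aarch64 offers `[base, index, lsl #scale]` or `[base, #offset]`, never both at once. The sketch below is hypothetical and is not HotSpot's MacroAssembler. It assumes that the `lea` step boils down to an `add` with a shifted register, and it uses `x27` for the heap-base register r27 from the discussion.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, text-only model of the two aarch64 cases discussed above;
// the mnemonics are illustrative and this is not HotSpot's MacroAssembler.
std::vector<std::string> emit_load(const std::string& dst, const std::string& base,
                                   const std::string& index, int scale, int offset) {
  assert(scale >= 0 && scale <= 3);
  const std::string scaled = base + ", " + index + ", lsl #" + std::to_string(scale);
  if (offset == 0) {
    // [base, index, lsl #scale] is a single aarch64 addressing mode.
    return {"ldr " + dst + ", [" + scaled + "]"};
  }
  // No [base, index, lsl #scale, #offset] mode exists: first compute
  // base + (index << scale) into dst (the lea step), then load with an
  // immediate offset.
  return {"add " + dst + ", " + scaled,
          "ldr " + dst + ", [" + dst + ", #" + std::to_string(offset) + "]"};
}
```

For example, `emit_load("x0", "x27", "x1", 3, 8)` yields two instructions, `add x0, x27, x1, lsl #3` and `ldr x0, [x0, #8]`. With an offset of 0, a single `ldr` suffices.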
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1764937842 From rkennke at openjdk.org Wed Sep 18 12:25:50 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 18 Sep 2024 12:25:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v20] In-Reply-To: References: Message-ID: <1o2b4fxBhqrlRqkNwKqZD1mgRNfTM16_NHZweEbd9SI=.1f68868b-1b98-4f78-9d37-2a805ffc932b@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that were previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then be made on-by-default and diagnostic (?), and eventually be deprecated and obsoleted. However, there are a few unknowns in that plan: specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 60 commits: - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 - CompressedKlassPointers::is_encodable shall be callable with -UseCCP - Johan review feedback - fix CompressedClassPointersEncodingScheme yet again for linux aarch64 - Fixes post-8340184 - Merge upstream up to and including 8340184 - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java - Fix loop on aarch64 - clarify obscure assert in metasapce setup - ...
and 50 more: https://git.openjdk.org/jdk/compare/19b2cee4...bb641621 ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=19 Stats: 4525 lines in 190 files changed: 3194 ins; 718 del; 613 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From yzheng at openjdk.org Wed Sep 18 12:25:51 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 18 Sep 2024 12:25:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 12:52:03 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that were previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then be made on-by-default and diagnostic (?), and eventually be deprecated and obsoleted. However, there are a few unknowns in that plan: specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word.
Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - CompressedKlassPointers::is_encodable shall be callable with -UseCCP > - Johan review feedback Could you please cherry-pick https://github.com/mur47x111/jdk/commit/c45ebc2a89d0b25a3dd8cc46386e37a635ff9af2 for the JVMCI support?
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2358324621 From rkennke at openjdk.org Wed Sep 18 12:38:21 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 18 Sep 2024 12:38:21 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 19:04:13 GMT, Chris Plummer wrote: >> I pulled your changes and I see one slight difference in the output. The following line is missing: >> >> `_metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject` >> >> I realize that there is no `_metadata._compressed_klass` when you have compact headers, and that the Klass* is encoded in the `_mark` word, which now looks something like this in the output: >> >> _mark: 16294762323640321 >> >> So you can say that the Klass* is embedded in the _mark word, but this isn't of much help to SA users. I think what is expected is that the visitor is passed a MetadataField object that returns the Klass mirror when getValue() is called on it. Maybe we need a new CompactKlassField type, like the NarrowKlassField field type we currently have, that will do the decoding of the _mark word into a Klass. The current getKlass() is related to this. > > Thinking about this a bit more, maybe _mark needs to be a MetadataField rather than CInt. This is a kind of odd situation. Basically we have a CInt field that is more than just simple bits used as flags or small integers. It also gets you to the Klass*. Possibly SA should treat _mark as two separate fields; one that remains a CInt as it currently is and another that treats it as an encoded Klass* like the NarrowKlassField case. Do you think this needs to be addressed before integration? And if so, could you help with implementation? Or could we do it after integration? If so, please file a follow-up issue.
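Splitting `_mark` into a plain integer view and an encoded-Klass view, as suggested above, comes down to a shift and a decode. The C++ sketch below illustrates this, not SA's Java code. It assumes, following the PR description, that the narrow Klass* occupies the upper 22 bits of the 64-bit mark word, and that the low two bits hold the lock state. The encoding base and shift are made-up placeholders, since HotSpot chooses the real values at startup. Under these assumptions, the `_mark` value quoted above (16294762323640321) splits into narrow Klass 3705 and lock bits `01`.

```cpp
#include <cstdint>

// Assumption from the PR description: the narrow Klass* lives in the upper
// 22 bits of the 64-bit mark word, so the shift is 42. Shifting a 32-bit
// word by 42 would be undefined behavior, so this only makes sense on 64-bit.
constexpr int kKlassBits = 22;
constexpr int kKlassShift = 64 - kKlassBits;

// Placeholder encoding parameters; HotSpot picks the real base and shift at
// startup, so these values are purely illustrative.
constexpr std::uint64_t kEncodingBase = 0x800000000ULL;
constexpr int kEncodingShift = 10;

// Extract the narrow Klass* from a mark word.
constexpr std::uint32_t narrow_klass(std::uint64_t mark) {
  return static_cast<std::uint32_t>(mark >> kKlassShift);
}

// The low two bits of the mark word hold the lock state (01 = unlocked).
constexpr std::uint64_t lock_bits(std::uint64_t mark) { return mark & 0x3; }

// Decode a narrow Klass* into a (fake, illustrative) Klass* address.
constexpr std::uint64_t decode_klass(std::uint32_t nk) {
  return kEncodingBase + (static_cast<std::uint64_t>(nk) << kEncodingShift);
}
```

An SA-side `CompactKlassField` would do the equivalent of `decode_klass(narrow_klass(mark))` using the VM's actual encoding parameters. The existing CInt view of `_mark` could stay as it is.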
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1764976086 From rkennke at openjdk.org Wed Sep 18 12:59:25 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 18 Sep 2024 12:59:25 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Mon, 9 Sep 2024 18:30:21 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with six additional commits since the last revision: >> >> - Print as warning when UCOH doesn't match in CDS archive >> - Improve initialization of mark-word in CDS ArchiveHeapWriter >> - Simplify getKlass() in SA >> - Simplify oopDesc::init_mark() >> - Get rid of forward_safe_* methods >> - GCForwarding touch-ups > > src/hotspot/share/oops/markWord.inline.hpp line 90: > >> 88: ShouldNotReachHere(); >> 89: return markWord(); >> 90: #endif > > Is the ifdef _LP64 necessary, since UseCompactObjectHeaders should always be false for 32 bits? Kind of. The problem is that klass_shift is larger than 31, so shifting a 32-bit value by it would be undefined behavior (UB) and generate a compiler warning. I opted to simply not compile any of that code in 32-bit builds. We could also define klass_shift differently on 32-bit. Long-term (maybe with Lilliput2/4-byte headers?) it would be nice to consolidate the header layout between 32- and 64-bit builds and not make any distinction anywhere, e.g. define markWord (or objectHeader?) in a single way, and use that to extract all the relevant stuff. It's not totally unlikely that we deprecate 32-bit builds before that can happen, though. > src/hotspot/share/oops/oop.inline.hpp line 90: > >> 88: } else { >> 89: return markWord::prototype(); >> 90: } > > Could this be unconditional since prototype_header is initialized for all Klasses?
Yes, but there is an ongoing effort (at Oracle) to get rid of `Klass::_prototype_header` altogether. Let's wait for that and see how it looks then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765003983 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765006669 From rkennke at openjdk.org Wed Sep 18 13:23:44 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 18 Sep 2024 13:23:44 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that were previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then be made on-by-default and diagnostic (?), and eventually be deprecated and obsoleted. However, there are a few unknowns in that plan: specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word.
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: JVMCI support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/bb641621..9ad2e62f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=19-20 Stats: 22 lines in 6 files changed: 16 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From shade at openjdk.org Wed Sep 18 13:55:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 18 Sep 2024 13:55:37 GMT Subject: RFR: 8340381: Shenandoah: Class mirrors verification should check forwarded objects Message-ID: <9vV2xnuP2lgRCLLbB5LWnIg26HtPjS7BOIyt0qaLkwg=.d7975d49-c70b-43e5-89cb-ef1b4f86ac52@github.com> The from-space objects can be effectively dead, and their backlinks to `InstanceKlass*` are no longer updated, so they can point to garbage.
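The fix described above boils down to resolving forwarding before trusting a back-link. The sketch below shows that idea with hypothetical minimal types. They are not Shenandoah's classes, and the PR's actual verifier code is not shown here. A mirror is checked through its forwarded copy, because a stale from-space copy may hold a garbage `Klass` back-link.

```cpp
#include <cassert>

// Hypothetical, minimal model (not Shenandoah's classes): a mirror object
// carries a back-link to its Klass, and an evacuated object records its
// forwardee.
struct Klass;

struct Object {
  Object* forwardee = nullptr;      // set once the object has been evacuated
  Klass* klass_backlink = nullptr;  // may be stale in a from-space copy
};

struct Klass {
  Object* mirror = nullptr;
};

// Follow forwarding, if any, to reach the current copy of obj.
Object* resolve_forwarded(Object* obj) {
  assert(obj != nullptr);
  return obj->forwardee != nullptr ? obj->forwardee : obj;
}

// Verify that k's mirror points back at k, checking the forwarded copy
// rather than a possibly dead from-space copy.
bool verify_mirror(const Klass& k) {
  const Object* mirror = resolve_forwarded(k.mirror);
  return mirror->klass_backlink == &k;
}
```

Checking the from-space copy directly would make the verifier fail, or dereference garbage, even though the live to-space copy is consistent.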
Additional testing: - [x] Some previously failing reproducers are not failing anymore - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/21064/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21064&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340381 Stats: 22 lines in 2 files changed: 9 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/21064.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21064/head:pull/21064 PR: https://git.openjdk.org/jdk/pull/21064 From stuefe at openjdk.org Wed Sep 18 14:00:25 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 18 Sep 2024 14:00:25 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 12:27:14 GMT, Johan Sjölen wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/classLoaderMetaspace.cpp line 87: > >> 85: klass_alignment_words, >> 86: "class arena"); >> 87: } > > As per my comment in the header file, change the code to this: >
> ```c++
> if (class_context != nullptr) {
>   // ... Same as in PR
> } else {
>   _class_space_arena = _non_class_space_arena;
> }
> ```

Rather not, see reasoning under https://github.com/openjdk/jdk/pull/20677/files#r1754330432 > src/hotspot/share/memory/classLoaderMetaspace.cpp line 118: > >> 116: #ifdef ASSERT >> 117: if (result.is_nonempty()) { >> 118: const bool in_class_arena = class_space_arena() != nullptr ? class_space_arena()->contains(result) : false; > > Unnecessary nullptr check if you take my suggestion, or you should switch to `have_class_space_arena`.
See reasoning under https://github.com/openjdk/jdk/pull/20677/files#r1754335269 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765113297 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765113850 From cjplummer at openjdk.org Wed Sep 18 16:41:20 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 18 Sep 2024 16:41:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 12:35:28 GMT, Roman Kennke wrote: >> Thinking about this a bit more, maybe _mark needs to be a MetadataField rather than CInt. This is a kind of odd situation. Basically we have a CInt field that is more than just simple bits used as flags or small integers. It also gets you to the Klass*. Possibly SA should treat _mark as two separate fields; one that remains a CInt as it currently is and another that treats it as an encoded Klass* like the NarrowKlassField case. > > Do you think this needs to be addressed before integration? And if so, could you help with implementation? Or could we do it after integration? In that case, please file a follow-up issue. Ok. I filed [JDK-8340396](https://bugs.openjdk.org/browse/JDK-8340396). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765387764 From rcastanedalo at openjdk.org Wed Sep 18 17:45:51 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 18 Sep 2024 17:45:51 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v24] In-Reply-To: References: Message-ID: <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com> > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24.
The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions.
Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion - Remove redundant comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/13b93bd9..d54d67f1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=22-23 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From aboldtch at openjdk.org Wed Sep 18 18:41:09 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 18 Sep 2024 18:41:09 GMT Subject: RFR: 8339161: ZGC: Remove unused remembered sets In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 12:15:47 GMT, Joel Sikström wrote: > In ZGC, when a page becomes old it needs a remembered set (remset), which stores 2 bits per 8-byte word, a memory overhead of 3.125% (2/64) per page that stores an allocated remset.
> > The attached plot shows remset memory usage for pages that are "live" for a program that frequently recycles pages using a cache mechanism. As remsets for young pages are unused, that memory should be considered wasted. > > ![remset_waste](https://github.com/user-attachments/assets/2a60948b-9297-4554-8fb4-d9f527855c33) > > The alternative solution that I propose in this PR deletes/frees remsets when an old page is inserted into the page cache, so as not to waste memory. This would mean that remsets are only stored for old pages that are in use. Pages that are not in use or are young should not have an allocated remset. With this change, the line showing remset usage for young pages would disappear from the plot above, and the total memory usage would be dictated by the number of "live" old pages. > > Below is a performance measurement of initializing vs. clearing remsets in the same cache program mentioned above. When deleting remsets, freeing is done by GC threads, so there is no latency impact for mutators. Initializing remsets is on average ~2.3x slower than clearing, but also uses 3.125% less memory for pages that do not need a remset. ~89% of the measured initializations are performed by GC threads and the rest by mutator threads. > > | | min (ms) | max (ms) | mean (ms) | > | ------------ | -------- | -------- | ---------- | > | remset init | 0.000292 | 0.706 | 0.00258083 | > | remset clear | 0.000082 | 0.015 | 0.00111340 | > > Tested with tiers 1-7 and a local test making sure there are no remsets for young pages. SPECjbb2015 performance measurements show no statistically significant regression/improvement. lgtm. Nicely done. ------------- Marked as reviewed by aboldtch (Reviewer).
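To make the 3.125% figure and the proposed free-on-cache-insert policy concrete, here is a minimal C++ sketch. `Page`, `make_old` and `insert_into_cache` are hypothetical names, not ZGC's actual `ZPage`/`ZRememberedSet` API; the sketch assumes one remset bit per 8-byte heap word and two bitmaps per page, which is one reading of the 2/64 ratio quoted above.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical page model for illustration only; it is not ZGC's ZPage or
// ZRememberedSet API. Assumes one remset bit per 8-byte heap word and two
// bitmaps per page, i.e. the 2/64 ratio quoted above.
struct Page {
  std::size_t size_bytes;
  bool is_old = false;
  std::vector<std::uint8_t> remset;  // empty == no remset allocated

  explicit Page(std::size_t size) : size_bytes(size) {}

  // 2 bits per 64-bit word == size / 32 bytes, i.e. 3.125% of the page.
  static std::size_t remset_bytes(std::size_t page_bytes) {
    assert(page_bytes % 64 == 0);
    std::size_t words = page_bytes / 8;  // 8-byte heap words covered by the page
    return 2 * words / 8;                // two bitmaps, 8 bits per byte
  }

  // A page that becomes old needs a zeroed remset; allocate it lazily.
  void make_old() {
    is_old = true;
    if (remset.empty()) {
      remset.assign(remset_bytes(size_bytes), 0);
    }
  }
};

// Policy proposed in the PR: free the remset when a page enters the page
// cache, so a page later reused as young carries no remset memory.
void insert_into_cache(Page& page, std::vector<Page*>& cache) {
  page.is_old = false;
  std::vector<std::uint8_t>().swap(page.remset);  // release storage, not just clear()
  cache.push_back(&page);
}
```

For a 2 MiB page this works out to 64 KiB of remset. The swap idiom is used because `clear()` alone keeps the capacity allocated, which is exactly the waste the PR removes.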
PR Review: https://git.openjdk.org/jdk/pull/20947#pullrequestreview-2313491254 From wkemper at openjdk.org Wed Sep 18 21:09:19 2024 From: wkemper at openjdk.org (William Kemper) Date: Wed, 18 Sep 2024 21:09:19 GMT Subject: RFR: 8340400: Shenandoah: Whitebox breakpoint GC requests may cause assertions Message-ID: When a test requests a concurrent GC breakpoint, the calling thread arranges for itself to block until the concurrent GC thread notifies it that the GC has reached the requested breakpoint (phase). The code that handles the whitebox breakpoint request should therefore not block the caller. An attempt was made to do this, but the current code simply has the calling thread spin in a busy loop instead of waiting. What's more, this loop resets the requested GC cause on every iteration, which may lead to GC cycles with a wb_breakpoint cause but no breakpoint set; this violates assertions. ------------- Commit messages: - Do not block whitebox breakpoint requests for gc Changes: https://git.openjdk.org/jdk/pull/21074/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21074&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340400 Stats: 13 lines in 1 file changed: 10 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21074.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21074/head:pull/21074 PR: https://git.openjdk.org/jdk/pull/21074 From coleenp at openjdk.org Thu Sep 19 00:04:45 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 19 Sep 2024 00:04:45 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:23:44 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then become on-by-default and diagnostic (?), and eventually be deprecated and obsoleted. However, there are a few unknowns in that plan: specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > JVMCI support src/hotspot/share/oops/compressedKlass.cpp line 242: > 240: } else { > 241: > 242: // Traditional (non-compact) header mode) Extra ) src/hotspot/share/oops/compressedKlass.hpp line 175: > 173: // 5b) if CDS=off: Calls initialize() - here, we have more freedom and, if we want, can choose an encoding > 174: // base that differs from the reservation base from step (4). That allows us, e.g., to later use > 175: // zero-based encoding. Not for this PR, but is there really any benefit to zero-based encoding for klass IDs? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765888065 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765889975 From coleenp at openjdk.org Thu Sep 19 00:04:46 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 19 Sep 2024 00:04:46 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Wed, 18 Sep 2024 12:56:16 GMT, Roman Kennke wrote: >> src/hotspot/share/oops/oop.inline.hpp line 90: >> >>> 88: } else { >>> 89: return markWord::prototype(); >>> 90: } >> >> Could this be unconditional since prototype_header is initialized for all Klasses? > > yes, but there is an ongoing effort (at Oracle) to get rid of `Klass::_prototype_header` altogether. Let's wait for that and see how it looks then. Yes, I saw that patch. I'm not sure I like the idea of CPU-dependent code also doing the encoding.
There were some C2 changes related to it, and I didn't understand whether that scheme required them. I don't see the downside to having the prototype header pre-encoded in the markWord. Seems simpler. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765893566 From zgu at openjdk.org Thu Sep 19 00:40:21 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Thu, 19 Sep 2024 00:40:21 GMT Subject: RFR: 8340408: Shenandoah: Remove redundant task stats printing code in ShenandoahTaskQueue Message-ID: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com> [JDK-8280397](https://bugs.openjdk.org/browse/JDK-8280397) made the code redundant. Adopt shared implementation. ------------- Commit messages: - 8340408: Shenandoah: Remove redundant task stats printing code in ShenandoahTaskQueue Changes: https://git.openjdk.org/jdk/pull/21077/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21077&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340408 Stats: 49 lines in 4 files changed: 0 ins; 47 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21077.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21077/head:pull/21077 PR: https://git.openjdk.org/jdk/pull/21077 From stefank at openjdk.org Thu Sep 19 05:00:37 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 19 Sep 2024 05:00:37 GMT Subject: RFR: 8339161: ZGC: Remove unused remembered sets In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 12:15:47 GMT, Joel Sikström wrote: > In ZGC, when a page becomes old it needs a remembered set (remset), which stores 2 bits per 8-byte word, a memory overhead of 3.125% (2/64) per page that stores an allocated remset. > > When an old page is potentially freed and inserted into the page cache, it can later be reused as a young page. In this case, the remset is still allocated even though the young page does not need it.
This is especially noteworthy for long-running programs where pages are recycled for a long enough period to have a remset allocated for close to all pages. > > The attached plot shows remset memory usage for pages that are "live" for a program that frequently recycles pages using a cache mechanism. As remsets for young pages are unused, that memory should be considered wasted. > > ![remset_waste](https://github.com/user-attachments/assets/2a60948b-9297-4554-8fb4-d9f527855c33) > > The alternative solution that I propose in this PR deletes/frees remsets when an old page is inserted into the page cache, so as not to waste memory. This would mean that remsets are only stored for old pages that are in use. Pages that are not in use or are young should not have an allocated remset. With this change, the line showing remset usage for young pages would disappear from the plot above, and the total memory usage would be dictated by the number of "live" old pages. > > Below is a performance measurement of initializing vs. clearing remsets in the same cache program mentioned above. When deleting remsets, freeing is done by GC threads, so there is no latency impact for mutators. Initializing remsets is on average ~2.3x slower than clearing, but also uses 3.125% less memory for pages that do not need a remset. ~89% of the measured initializations are performed by GC threads and the rest by mutator threads. > > | | min (ms) | max (ms) | mean (ms) | > | ------------ | -------- | -------- | ---------- | > | remset init | 0.000292 | 0.706 | 0.00258083 | > | remset clear | 0.000082 | 0.015 | 0.00111340 | > > Tested with tiers 1-7 and a local test making sure there are no remsets for young pages. SPECjbb2015 performance measurements show no statistically significant regression/improvement. Looks good! Thanks for fixing. ------------- Marked as reviewed by stefank (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20947#pullrequestreview-2314405140 From stefank at openjdk.org Thu Sep 19 05:06:51 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 19 Sep 2024 05:06:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Wed, 18 Sep 2024 23:59:39 GMT, Coleen Phillimore wrote: >> yes, but there is an ongoing effort (at Oracle) to get rid of `Klass::_prototype_header` altogether. Let's wait for that and see how it looks then. > > Yes, I saw that patch. I'm not sure I like the idea of CPU-dependent code also doing the encoding. There were some C2 changes related to it, and I didn't understand whether that scheme required them. I don't see the downside to having the prototype header pre-encoded in the markWord. Seems simpler. We already have CPU-dependent code for both C1 and the interpreter. Adding CPU-dependent code to C2 doesn't make it significantly worse. My latest patch also refactors the code so that C1, the interpreter, and C2 all call into shared functions in the macro assembler. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766163092 From stefank at openjdk.org Thu Sep 19 05:53:48 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 19 Sep 2024 05:53:48 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:23:44 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders.
All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then become on-by-default and diagnostic (?), and eventually be deprecated and obsoleted. However, there are a few unknowns in that plan: specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header.
However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > JVMCI support

src/hotspot/share/gc/parallel/psParallelCompact.cpp line 787:

> 785: // The gap is always equal to min-fill-size, so nothing to do.
> 786: return;
> 787: }

Reading the comment, it is not obvious that this is correct if you set MinObjectAlignment to something larger than the default value:

void PSParallelCompact::fill_dense_prefix_end(SpaceId id) {
  // Comparing two sizes to decide if filling is required:
  //
  // The size of the filler (min-obj-size) is 2 heap words with the default
  // MinObjAlignment, since both markword and klass take 1 heap word.
  //
  // The size of the gap (if any) right before dense-prefix-end is
  // MinObjAlignment.
  //
  // Need to fill in the gap only if it's smaller than min-obj-size, and the
  // filler obj will extend to next region.

  // Note: If min-fill-size decreases to 1, this whole method becomes redundant.
  if (UseCompactObjectHeaders) {
    // The gap is always equal to min-fill-size, so nothing to do.
    return;
  }
  assert(CollectedHeap::min_fill_size() >= 2, "inv");

src/hotspot/share/oops/compressedKlass.cpp line 231:

> 229: // The reason is that we want to avoid, if possible, shifts larger than
> 230: // a cacheline size.
> 231: _base = addr;

Why is this important?
src/hotspot/share/oops/compressedKlass.hpp line 261: > 259: } > 260: > 261: }; Missing blank line before `#endif` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766185665 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766192688 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766193355 From stefank at openjdk.org Thu Sep 19 05:53:49 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 19 Sep 2024 05:53:49 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: <-UEFgAIQjGBginN0JRoyuwwJLmDssUEQGE_tCP-tRkw=.01ef3f37-01fa-4931-b4f3-571d21252bbd@github.com> On Thu, 19 Sep 2024 05:35:34 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> JVMCI support > > src/hotspot/share/gc/parallel/psParallelCompact.cpp line 787: > >> 785: // The gap is always equal to min-fill-size, so nothing to do. >> 786: return; >> 787: } > > Reading the comment, it is not obvious that this is correct if you set MinObjectAlignment to something larger than the default value: > > void PSParallelCompact::fill_dense_prefix_end(SpaceId id) { > // Comparing two sizes to decide if filling is required: > // > // The size of the filler (min-obj-size) is 2 heap words with the default > // MinObjAlignment, since both markword and klass take 1 heap word. > // > // The size of the gap (if any) right before dense-prefix-end is > // MinObjAlignment. > // > // Need to fill in the gap only if it's smaller than min-obj-size, and the > // filler obj will extend to next region. > > // Note: If min-fill-size decreases to 1, this whole method becomes redundant. > if (UseCompactObjectHeaders) { > // The gap is always equal to min-fill-size, so nothing to do. 
> return; > } > assert(CollectedHeap::min_fill_size() >= 2, "inv"); Style note: The added code is inserted between a comment and the code that the comment refers to. It would be nice to tidy this up. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766186545 From shade at openjdk.org Thu Sep 19 05:58:38 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 19 Sep 2024 05:58:38 GMT Subject: RFR: 8340183: Shenandoah: Incorrect match for clone barrier in is_gc_barrier_node In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 10:35:15 GMT, Aleksey Shipilev wrote: > The name of the call we emit is "shenandoah_clone": > https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L806 > > ...yet we test for "shenandoah_clone_barrier" here: > https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L688 > > I think we are better off checking the call target instead of relying on the call name. This change also eliminates the `shenandoah_cas_obj` matcher, for which we have not emitted a call ever since we started doing CAS expansions inline. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [ ] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC` Need another review here. @rkennke, maybe?
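The mismatch described above ("shenandoah_clone" emitted, "shenandoah_clone_barrier" tested) is the classic failure mode of name-based matching. The minimal C++ sketch below contrasts the two approaches; `CallNode`, `RuntimeEntry` and `clone_barrier_entry` are hypothetical stand-ins, not C2's real node classes or Shenandoah's runtime entry points.

```c++
#include <cassert>
#include <cstring>

// Hypothetical stand-ins: C2's real call nodes carry a name and an entry
// point, but these types and functions are illustrative only.
using RuntimeEntry = void (*)();

struct CallNode {
  const char* name;    // debug name chosen at the emission site
  RuntimeEntry entry;  // runtime function the call targets
};

// Hypothetical runtime entry for the clone barrier.
void clone_barrier_entry() {}

// Fragile: silently stops matching if the emitted name differs.
bool is_clone_barrier_by_name(const CallNode& call) {
  assert(call.name != nullptr);
  return std::strcmp(call.name, "shenandoah_clone_barrier") == 0;
}

// Robust: compares the call target with the known runtime entry.
bool is_clone_barrier_by_target(const CallNode& call) {
  return call.entry == &clone_barrier_entry;
}
```

With a node named "shenandoah_clone" that targets `clone_barrier_entry`, the name-based check returns false while the target-based check returns true, which is the point of matching on the call target.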
------------- PR Comment: https://git.openjdk.org/jdk/pull/21014#issuecomment-2360040048 From shade at openjdk.org Thu Sep 19 08:31:35 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 19 Sep 2024 08:31:35 GMT Subject: RFR: 8340408: Shenandoah: Remove redundant task stats printing code in ShenandoahTaskQueue In-Reply-To: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com> References: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com> Message-ID: <1m4c0G4GgF_uHGxwKvqmilfGjIv1qqsvhCZX3VfKvbo=.5628bbd0-f0c2-4efd-a9f0-c43c0a8ccc64@github.com> On Thu, 19 Sep 2024 00:33:04 GMT, Zhengyu Gu wrote: > [JDK-8280397](https://bugs.openjdk.org/browse/JDK-8280397) made the code redundant. > > Adopt shared implementation. Ah, cool. Thanks. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21077#pullrequestreview-2314809810 From shade at openjdk.org Thu Sep 19 08:45:36 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 19 Sep 2024 08:45:36 GMT Subject: RFR: 8340408: Shenandoah: Remove redundant task stats printing code in ShenandoahTaskQueue In-Reply-To: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com> References: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com> Message-ID: <8ML4TaXKk8S8zA_MiOfZyniZqVP7uyQMaY2SRw5Nsow=.490b258d-3731-4fca-bfb1-07439da5a1a3@github.com> On Thu, 19 Sep 2024 00:33:04 GMT, Zhengyu Gu wrote: > [JDK-8280397](https://bugs.openjdk.org/browse/JDK-8280397) made the code redundant. > > Adopt shared implementation. 
@earthling-amzn, @kdnilsen, @ysramakrishna -- your turn :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21077#issuecomment-2360383565 From mli at openjdk.org Thu Sep 19 10:32:50 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 19 Sep 2024 10:32:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:23:44 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then become on-by-default and diagnostic (?), and eventually be deprecated and obsoleted. However, there are a few unknowns in that plan: specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > JVMCI support src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2529: > 2527: } > 2528: __ decode_klass_not_null(result); > 2529: } else { Could this if/else block be replaced with a simple call to load_klass(...)? src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 3522: > 3520: { > 3521: __ movptr(result, Address(obj, oopDesc::klass_offset_in_bytes())); > 3522: } Could this if/else block be replaced with a simple call to load_klass(...)?
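Since several of the comments above revolve around loading the narrow Klass* from the mark word, the bit budgets in the PR description can be checked with a little arithmetic. The C++ sketch below uses constants derived only from the figures in the text (upper 22 bits for the narrow Klass*, 40 bits for full-GC forwarding, 8-byte heap words); all names are hypothetical and do not correspond to HotSpot's `markWord` or `FullGCForwarding` code.

```c++
#include <cassert>
#include <cstdint>

// Illustrative constants derived only from the figures in the PR text; they
// are not HotSpot's markWord or FullGCForwarding definitions.
constexpr int kMarkWordBits = 64;
constexpr int kNarrowKlassBits = 22;                                 // "upper 22 bits"
constexpr int kNarrowKlassShift = kMarkWordBits - kNarrowKlassBits;  // 42

// Extract the compressed class pointer from a mark word. A real load_klass
// would still have to decode it (base + shift) into a Klass*.
constexpr std::uint32_t narrow_klass_from_mark(std::uint64_t mark) {
  return static_cast<std::uint32_t>(mark >> kNarrowKlassShift);
}

// Full-GC forwarding sketch: encode the target as a word offset from the
// heap base in 40 bits, in the spirit of compressed oops.
constexpr int kForwardingBits = 40;
constexpr std::uint64_t kHeapWordSize = 8;

constexpr std::uint64_t max_encodable_heap_bytes() {
  return (std::uint64_t{1} << kForwardingBits) * kHeapWordSize;  // 2^43 bytes = 8 TiB
}

std::uint64_t encode_forwardee(std::uint64_t heap_base, std::uint64_t target) {
  assert(target >= heap_base && (target - heap_base) % kHeapWordSize == 0);
  std::uint64_t offset_words = (target - heap_base) / kHeapWordSize;
  assert(offset_words < (std::uint64_t{1} << kForwardingBits));
  return offset_words;
}

std::uint64_t decode_forwardee(std::uint64_t heap_base, std::uint64_t encoded) {
  return heap_base + encoded * kHeapWordSize;
}
```

A 40-bit word offset times 8-byte words gives 2^43 bytes, which matches the 8TB limit stated in the description. How HotSpot actually packs the forwarding bits alongside the preserved Klass bits is not modeled here.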
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766587136 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766582255 From rcastanedalo at openjdk.org Thu Sep 19 11:02:50 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 19 Sep 2024 11:02:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 08:04:43 GMT, Roberto Castañeda Lozano wrote: > I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version. What do you think about this @rkennke? Do you agree that an alternative modeling of klass loading in C2 (without any reliance on `oopDesc::klass_offset_in_bytes()`) should be a precondition for a future, non-experimental version of compact headers? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2360673405 From yzheng at openjdk.org Thu Sep 19 11:12:49 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 19 Sep 2024 11:12:49 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Thu, 19 Sep 2024 05:03:42 GMT, Stefan Karlsson wrote: >> Yes, I saw that patch. I'm not sure I like the idea of CPU-dependent code also doing the encoding. There were some C2 changes related to it, and I didn't understand whether that scheme required them. I don't see the downside to having the prototype header pre-encoded in the markWord. Seems simpler.
> > We already have CPU-dependent code for both C1 and the interpreter. Adding CPU-dependent code to C2 doesn't make it significantly worse. My latest patch also refactors the code so that C1, interpreter, and C2 all call into shared functions in the macro assembler. Could you please point me to the C2 change? Is it going to be integrated in this PR? We in Graal have not yet adopted `Klass::_prototype_header` and will hold off if you decide to get rid of it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766642585 From stuefe at openjdk.org Thu Sep 19 11:39:50 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 19 Sep 2024 11:39:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 23:49:34 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> JVMCI support > > src/hotspot/share/oops/compressedKlass.cpp line 242: > >> 240: } else { >> 241: >> 242: // Traditional (non-compact) header mode) > > Extra ) Will fix ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766676702 From rkennke at openjdk.org Thu Sep 19 11:52:34 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 19 Sep 2024 11:52:34 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v22] In-Reply-To: References: Message-ID: <0mWQW50x4UNwdsRE94w3rZVGnppxQeR9fbe4eUrAGtM=.cca89805-ca82-4605-bc11-4f9ac53d2b90@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag.
The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header.
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Simplify LIR_Assembler::emit_load_klass() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/9ad2e62f..b25a4b69 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=20-21 Stats: 28 lines in 2 files changed: 0 ins; 26 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Thu Sep 19 11:52:34 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 19 Sep 2024 11:52:34 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 11:00:20 GMT, Roberto Castañeda Lozano wrote: > > I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version. > > What do you think about this @rkennke? Do you agree on an alternative modeling of klass loading in C2 (without any reliance on `oopDesc::klass_offset_in_bytes()`) being a pre-condition for a future, non-experimental version of compact headers? Yes, that sounds like a good improvement! It'd also clean up C2 considerably - right now there are many places in C2 that rely on klass_offset_in_bytes().
Getting rid of them all would be great, but also seems like a major effort. Could you file an issue to track that future work? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2360756796 From rkennke at openjdk.org Thu Sep 19 11:52:37 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 19 Sep 2024 11:52:37 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 10:29:11 GMT, Hamlin Li wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> JVMCI support > > src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2529: > >> 2527: } >> 2528: __ decode_klass_not_null(result); >> 2529: } else { > > Could this if/else block be replaced with a simple call of load_klass(...)? Yes, will do. > src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 3522: > >> 3520: { >> 3521: __ movptr(result, Address(obj, oopDesc::klass_offset_in_bytes())); >> 3522: } > > Could this if/else block be replaced with a simple call of load_klass(...)? Yes, will do. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766689169 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766689004 From stuefe at openjdk.org Thu Sep 19 11:52:38 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 19 Sep 2024 11:52:38 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 05:44:42 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> JVMCI support > > src/hotspot/share/oops/compressedKlass.cpp line 231: > >> 229: // The reason is that we want to avoid, if possible, shifts larger than >> 230: // a cacheline size. >> 231: _base = addr; > > Why is this important? 
It lessens the cache effects of Klass hyperaligning. > src/hotspot/share/oops/compressedKlass.hpp line 261: > >> 259: } >> 260: >> 261: }; > > Missing blank line before `#endif` Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766684016 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766684491 From stuefe at openjdk.org Thu Sep 19 11:52:39 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 19 Sep 2024 11:52:39 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 11:43:12 GMT, Thomas Stuefe wrote: >> src/hotspot/share/oops/compressedKlass.cpp line 231: >> >>> 229: // The reason is that we want to avoid, if possible, shifts larger than >>> 230: // a cacheline size. >>> 231: _base = addr; >> >> Why is this important? > > It lessens the cache effects of Klass hyperaligning. Note that if we go with my KLUT proposal for post-Lilliput (the GC oop iteration improvements), this will not matter anymore and can be simplified to a fixed shift of 10. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766688756 From stuefe at openjdk.org Thu Sep 19 11:52:40 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 19 Sep 2024 11:52:40 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 23:53:28 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> JVMCI support > > src/hotspot/share/oops/compressedKlass.hpp line 175: > >> 173: // 5b) if CDS=off: Calls initialize() - here, we have more freedom and, if we want, can choose an encoding >> 174: // base that differs from the reservation base from step (4). That allows us, e.g., to later use >> 175: // zero-based encoding. 
> > Not for this, but is there really any benefit to zero-based encoding for klass ids? Yes, I think so. I think the SAP JIT people investigated this when doing the PPC ports. You save at least two instructions, and possibly more, per decode op. You save code size too, since you don't need to materialize the 64-bit base immediate. Especially on x64, this can easily mean 11 fewer bytes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766681110 From stuefe at openjdk.org Thu Sep 19 11:52:42 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 19 Sep 2024 11:52:42 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v18] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 10:36:58 GMT, Johan Sjölen wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 57 commits: >> >> - fix CompressedClassPointersEncodingScheme yet again for linux aarch64 >> - Fixes post-8340184 >> - Merge upstream up to and including 8340184 >> - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 >> - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java >> - Fix loop on aarch64 >> - clarify obscure assert in metasapce setup >> - Rework compressedklass encoding >> - remove stray debug output >> - Fixes post 8338526 >> - ... and 47 more: https://git.openjdk.org/jdk/compare/7849f252...28a26aed > > test/hotspot/gtest/metaspace/test_clms.cpp line 193: > >> 191: >> 192: { >> 193: // Nonclass arena allocation. > > The style in this source file isn't really up to scratch, especially *these* lines. Anyway, it's in the tests, so I'm OK with this being fixed in a follow-up RFE.
Okay, will fix ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766686807 From rkennke at openjdk.org Thu Sep 19 11:57:52 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 19 Sep 2024 11:57:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Thu, 19 Sep 2024 05:03:42 GMT, Stefan Karlsson wrote: >> Yes, I saw that patch. I'm not sure I like the idea of cpu dependent code also doing the encoding. There were some C2 changes related to it that I didn't understand if that scheme required them. I don't see the down side to having the prototype header pre-encoded in the markWord. Seems simpler. > > We already have CPU-dependent code for both C1 and the interpreter. Adding CPU-dependent code to C2 doesn't make it significantly worse. My latest patch also refactors the code so that C1, interpreter, and C2 all call into shared functions in the macro assembler. We haven't decided whether we will get rid of `Klass::_prototype_header` before integrating this PR. @stefank could point you to a WIP branch, if that's helpful. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766697849 From rkennke at openjdk.org Thu Sep 19 12:08:46 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 19 Sep 2024 12:08:46 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v23] In-Reply-To: References: Message-ID: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
> > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4 - review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/b25a4b69..0d8a9236 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=21-22 Stats: 10 lines in 3 files changed: 1 ins; 4 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From shade at openjdk.org Thu Sep 19 12:20:39 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 19 Sep 2024 12:20:39 GMT Subject: RFR: 8340400: Shenandoah: Whitebox breakpoint GC requests may cause assertions In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 21:02:23 GMT, William Kemper wrote: > When a test requests a concurrent GC breakpoint, the calling thread arranges for itself to block until the concurrent GC thread notifies it that the GC has reached the requested breakpoint (phase). The code that handles the whitebox breakpoint request should therefore not block the caller. An attempt was made to do this, but the request just has the caller thread run in a busy loop without waiting. What's more, this loop resets the requested gc cause on every iteration, which may lead to gc cycles with a wb_breakpoint cause, but no breakpoint set - which violates assertions. Yeah, this makes sense. Any tests fail without this patch? 
------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21074#pullrequestreview-2315363086 From coleenp at openjdk.org Thu Sep 19 12:38:48 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 19 Sep 2024 12:38:48 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 11:47:21 GMT, Thomas Stuefe wrote: >> It lessens the cache effects of Klass hyperaligning. > > Note that if we go with my KLUT proposal for post-Lilliput (the GC oop iteration improvements), this will not matter anymore and can be simplified to a fixed shift of 10. Yes, please, not having this code would be really nice. This is difficult code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766753081 From rcastanedalo at openjdk.org Thu Sep 19 13:12:49 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 19 Sep 2024 13:12:49 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: <0gatRiYQ3frDnMftpb_WaDolUwcYvBFh5hAp6jY0dzQ=.21d6518e-7217-477e-954f-69fd52eb713e@github.com> On Thu, 19 Sep 2024 11:42:04 GMT, Roman Kennke wrote: > > > I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version. > > > > > > What do you think about this @rkennke? Do you agree on an alternative modeling of klass loading in C2 (without any reliance on `oopDesc::klass_offset_in_bytes()`) being a pre-condition for a future, non-experimental version of compact headers?
> > Yes, that sounds like a good improvement! It'd also clean up C2 considerably - right now there are many places in C2 that rely on klass_offset_in_bytes(). Getting rid of them all would be great, but also seems like a major effort. Could you file an issue to track that future work? Done: https://bugs.openjdk.org/browse/JDK-8340453. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2360945827 From stefank at openjdk.org Thu Sep 19 13:12:50 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 19 Sep 2024 13:12:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 12:35:30 GMT, Coleen Phillimore wrote: >> Note that if we go with my KLUT proposal for post-Lilliput (the GC oop iteration improvements), this will not matter anymore and can be simplified to a fixed shift of 10. > > Yes, please, not having this code would be really nice. This is difficult code. Do you see any effects of this in anything other than specially crafted microbenchmarks? I wonder if it would be good enough to hard-code this to be 10 for the first integration of Lilliput. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766804699 From stuefe at openjdk.org Thu Sep 19 13:37:52 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 19 Sep 2024 13:37:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 13:08:43 GMT, Stefan Karlsson wrote: >> Yes, please, not having this code would be really nice. This is difficult code. > > Do you see any effects of this in anything other than specially crafted microbenchmarks? I wonder if it would be good enough to hard-code this to be 10 for the first integration of Lilliput.
I will do some benchmarks ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766848371 From zgu at openjdk.org Thu Sep 19 14:05:56 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Thu, 19 Sep 2024 14:05:56 GMT Subject: RFR: 8339668: Parallel: Adopt PartialArrayState to consolidate marking stack in compact GC Message-ID: Please review this patch, which adopts `PartialArrayState`, introduced by [JDK-8337709](https://bugs.openjdk.org/browse/JDK-8337709), to consolidate `_oop_task_queues` and `_objarray_task_queues` into a single `_marking_stacks`. The change mirrors Kim's [JDK-8311163](https://bugs.openjdk.org/browse/JDK-8311163) work; as a result, there are methods that could be consolidated and simplified, but I would like to defer that to a follow-up CR. ------------- Commit messages: - v7 - v6 - v5 - v4 - v3 - Correct marking stride - v2 - tq stats - v1 - v0 Changes: https://git.openjdk.org/jdk/pull/21089/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21089&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339668 Stats: 262 lines in 5 files changed: 152 ins; 44 del; 66 mod Patch: https://git.openjdk.org/jdk/pull/21089.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21089/head:pull/21089 PR: https://git.openjdk.org/jdk/pull/21089 From stefank at openjdk.org Thu Sep 19 14:25:52 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 19 Sep 2024 14:25:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com> On Thu, 19 Sep 2024 11:54:50 GMT, Roman Kennke wrote: >> We already have CPU-dependent code for both C1 and the interpreter. Adding CPU-dependent code to C2 doesn't make it significantly worse.
My latest patch also refactors the code so that C1, interpreter, and C2 all call into shared functions in the macro assembler. > > We haven't decided whether we will get rid of `Klass::_prototype_header` before integrating this PR. @stefank could point you to a WIP branch, if that's helpful. This is my current work-in-progress code: https://github.com/stefank/jdk/compare/pull/20677...stefank:jdk:lilliput_remove_prototype_header_wip_2 I've made some large rewrites and am currently running it through functional testing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766934571 From mli at openjdk.org Thu Sep 19 15:03:53 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 19 Sep 2024 15:03:53 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v23] In-Reply-To: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> References: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> Message-ID: On Thu, 19 Sep 2024 12:08:46 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it.
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
> > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4 > - review feedback Does `MachUEPNode::format` in both aarch64.ad and x86_64.ad need a corresponding change? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2361266175 From rcastanedalo at openjdk.org Thu Sep 19 17:23:50 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 19 Sep 2024 17:23:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> Message-ID: On Wed, 18 Sep 2024 12:08:46 GMT, Roman Kennke wrote: >> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2576: >> >>> 2574: } else { >>> 2575: lea(dst, Address(obj, index, Address::lsl(scale))); >>> 2576: ldr(dst, Address(dst, offset)); >> >> Do you have a reproducer (or, better yet, a test case) that exercises this case? I ran Oracle's internal CI tiers 1-5 and could never hit it. Could this happen for x64 as well? > > AFAIK, this happens only when using compressed oops with a heap-base in r27. When running with that setting, we would get addresses like r27[nklass] or r27[nklass]+offset, both with scale=8. You would need large heaps, perhaps >4GB, to get this coops setting. The problem with aarch64 is that we can't have an address like r27[nklass]+offset; that's why we need to lea the r27[nklass] part first. > Yes, this also happens on x86, but x86 supports rX[nklass]+offset addressing. Thanks @rkennke, I tried running test tiers 1-3 using different compressed OOPs configurations but could not reach this code, unfortunately. Could you provide a reproducer?
The reason I am particularly interested is because I'd like to find whether there could be any problematic interaction with C2's implicit null check optimization. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1767315114 From wkemper at openjdk.org Thu Sep 19 17:57:38 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 19 Sep 2024 17:57:38 GMT Subject: RFR: 8340400: Shenandoah: Whitebox breakpoint GC requests may cause assertions In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 21:02:23 GMT, William Kemper wrote: > When a test requests a concurrent GC breakpoint, the calling thread arranges for itself to block until the concurrent GC thread notifies it that the GC has reached the requested breakpoint (phase). The code that handles the whitebox breakpoint request should therefore not block the caller. An attempt was made to do this, but the request just has the caller thread run in a busy loop without waiting. What's more, this loop resets the requested gc cause on every iteration, which may lead to gc cycles with a wb_breakpoint cause, but no breakpoint set - which violates assertions. TestReferenceShortcutCycle and TestReferenceRefersToShenandoah would fail occasionally in the generational mode. I believe the generational mode was more susceptible to the issue because of differences in the generational mode controller. I don't recall seeing test failures in upstream, but as I read the code I believe the issue _could_ happen to other Shenandoah modes (or otherwise cause tests using whitebox breakpoints to behave in unexpected ways). 
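The handoff problem described in this change (a requester that should not block, and a cause that must not be re-published on every loop iteration) can be sketched as a standalone C++ model using `std::mutex` and `std::condition_variable` instead of HotSpot's `Monitor`. The class and member names are hypothetical, and this is not the integrated Shenandoah patch. The requester publishes the cause once, together with the breakpoint state, so a cycle that observes the breakpoint cause also observes the breakpoint.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical model, not Shenandoah code: a requester hands a GC cause to a
// worker thread without busy-waiting and without re-publishing the cause.
enum class GCCause { None, WbBreakpoint };

class BreakpointRequestModel {
 public:
  // Requester side: publish the cause once and return immediately. Whatever
  // waits for the breakpoint to be reached blocks elsewhere, not in a spin loop.
  void request_breakpoint_gc() {
    std::lock_guard<std::mutex> lock(mu_);
    breakpoint_set_ = true;                    // state the cycle relies on...
    requested_cause_ = GCCause::WbBreakpoint;  // ...published together with the cause
    cv_.notify_one();
  }

  // Worker side: sleep until a request arrives, then consume it exactly once.
  GCCause wait_for_request() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return requested_cause_ != GCCause::None; });
    GCCause cause = requested_cause_;
    requested_cause_ = GCCause::None;
    // Invariant the buggy loop could break: a breakpoint cycle implies a breakpoint.
    assert(cause != GCCause::WbBreakpoint || breakpoint_set_);
    return cause;
  }

  bool breakpoint_set() {
    std::lock_guard<std::mutex> lock(mu_);
    return breakpoint_set_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  GCCause requested_cause_ = GCCause::None;
  bool breakpoint_set_ = false;
};
```

Because the cause and the breakpoint state change under the same lock, the worker can never see one without the other. The predicate overload of `wait` also guards against spurious wakeups.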
------------- PR Comment: https://git.openjdk.org/jdk/pull/21074#issuecomment-2361831211 From wkemper at openjdk.org Thu Sep 19 17:57:39 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 19 Sep 2024 17:57:39 GMT Subject: Integrated: 8340400: Shenandoah: Whitebox breakpoint GC requests may cause assertions In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 21:02:23 GMT, William Kemper wrote: > When a test requests a concurrent GC breakpoint, the calling thread arranges for itself to block until the concurrent GC thread notifies it that the GC has reached the requested breakpoint (phase). The code that handles the whitebox breakpoint request should therefore not block the caller. An attempt was made to do this, but the request just has the caller thread run in a busy loop without waiting. What's more, this loop resets the requested gc cause on every iteration, which may lead to gc cycles with a wb_breakpoint cause, but no breakpoint set - which violates assertions. This pull request has now been integrated. Changeset: 75d5e117 Author: William Kemper URL: https://git.openjdk.org/jdk/commit/75d5e117770590d2432fcfe8d89734c7038d4e55 Stats: 13 lines in 1 file changed: 10 ins; 2 del; 1 mod 8340400: Shenandoah: Whitebox breakpoint GC requests may cause assertions Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/21074 From wkemper at openjdk.org Thu Sep 19 21:50:35 2024 From: wkemper at openjdk.org (William Kemper) Date: Thu, 19 Sep 2024 21:50:35 GMT Subject: RFR: 8340408: Shenandoah: Remove redundant task stats printing code in ShenandoahTaskQueue In-Reply-To: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com> References: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com> Message-ID: On Thu, 19 Sep 2024 00:33:04 GMT, Zhengyu Gu wrote: > [JDK-8280397](https://bugs.openjdk.org/browse/JDK-8280397) made the code redundant. 
> > Adopt the shared implementation. Thanks for this! Can we use the labels as requested in the review comments? src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 228: > 226: finish_mark_work(); > 227: assert(task_queues()->is_empty(), "Should be empty"); > 228: TASKQUEUE_STATS_ONLY(task_queues()->print_and_reset_taskqueue_stats("")); Could we pass `"Finish Mark"` for the label here? src/hotspot/share/gc/shenandoah/shenandoahSTWMark.cpp line 136: > 134: > 135: assert(task_queues()->is_empty(), "Should be empty"); > 136: TASKQUEUE_STATS_ONLY(task_queues()->print_and_reset_taskqueue_stats("")); Could we pass `"Mark"` for the label here? ------------- Changes requested by wkemper (Committer). PR Review: https://git.openjdk.org/jdk/pull/21077#pullrequestreview-2316808410 PR Review Comment: https://git.openjdk.org/jdk/pull/21077#discussion_r1767638510 PR Review Comment: https://git.openjdk.org/jdk/pull/21077#discussion_r1767638220 From rkennke at openjdk.org Fri Sep 20 12:33:50 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 20 Sep 2024 12:33:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> Message-ID: On Thu, 19 Sep 2024 17:20:36 GMT, Roberto Castañeda Lozano wrote: >> AFAIK, this happens only when using compressed oops with a heap-base in r27. When running with that setting, we would get addresses like r27[nklass] or r27[nklass]+offset, both with scale=8. You would need large heaps, perhaps >4GB, to get this coops setting. The problem with aarch64 is that we can't have an address like r27[nklass]+offset; that's why we need to lea the r27[nklass] part first. >> Yes, this also happens on x86, but x86 supports rX[nklass]+offset addressing. > > Thanks @rkennke, I tried running test tiers 1-3 using different compressed OOPs configurations but could not reach this code, unfortunately.
Could you provide a reproducer? I am particularly interested because I'd like to find out whether there could be any problematic interaction with C2's implicit null check optimization. I have tried for a few hours now to reproduce this with a custom testcase, with no success. I am pretty sure that this can happen; that is why I added this code. Originally I had an assert there asserting that index is not used. I do remember that this happens very rarely, and I don't remember the exact condition. Looking at the possible operands in opclass memory, I think this can only happen when we load an nKlass from an address of the form [rX, rY], i.e. the address in rX indexed by rY. This would be odd for loadNKlass, I think, because rY should always be klass_offset_in_bytes. Maybe this is possible when we get odd address merges where we get a PhiNode as the offset/index? I don't know. I agree, this *might* lead to surprising problems with implicit null-checking, if it is expected that the first instruction in loadNKlass provokes the SIGSEGV. A way around this would be to declare an opclass that is a subset of 'memory' that excludes all operands with index, and match on that. I think this would force the lea as a separate instruction and ensure that we never see such a thing in loadNKlass. However, I would not feel very confident doing that without a reproducer. Let me dig a little further.
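The constraint Roman describes can be made concrete with a small model. The following C++ sketch is only an illustration: `MemOperand`, `x86_single_insn`, `aarch64_single_insn`, and `lower_aarch64_load` are hypothetical names, not HotSpot's `Address` class or ADL operand classes. It encodes the general ISA rules (x86-64 accepts base + index*scale + disp32 in one operand; AArch64 loads accept either base + immediate or base + scaled register, but not both) and reports whether the first emitted instruction is the memory access, which is what the implicit null check concern above hinges on.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified memory operand: [base + (index << shift) + disp].
// Not HotSpot's Address class; just enough to reason about encodability.
struct MemOperand {
  bool has_index;
  int shift;     // index scale expressed as a left shift (3 means *8)
  int64_t disp;  // immediate displacement
};

// x86-64: base + index*{1,2,4,8} + disp32 fits in a single instruction.
bool x86_single_insn(const MemOperand& m) {
  bool disp_ok = m.disp >= INT32_MIN && m.disp <= INT32_MAX;
  bool scale_ok = !m.has_index || (m.shift >= 0 && m.shift <= 3);
  return disp_ok && scale_ok;
}

// AArch64 loads have [base, #imm] or [base, index{, lsl #amount}] forms, where
// amount must be 0 or log2(access size); there is no base + index + imm form.
// (Immediate range limits are ignored in this model.)
bool aarch64_single_insn(const MemOperand& m, int access_log2) {
  if (!m.has_index) return true;
  if (m.disp != 0) return false;
  return m.shift == 0 || m.shift == access_log2;
}

struct Lowering {
  int num_insns;           // instructions emitted for the load
  bool first_insn_faults;  // does the first instruction access memory?
};

// If the operand is not directly encodable, an add/lea first materializes
// base + (index << shift); only the second instruction touches memory.
Lowering lower_aarch64_load(const MemOperand& m, int access_log2) {
  if (aarch64_single_insn(m, access_log2)) {
    return {1, true};
  }
  return {2, false};
}
```

For example, assuming the klass load reads the full 8-byte mark word (`access_log2 == 3`), `r27[nklass]` is a single load in this model, while `r27[nklass]+offset` needs a preceding add, so a fault would be raised by the second instruction rather than the first.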
For reference, here is my unsuccessful reproducer: https://gist.github.com/rkennke/8a57610d74fcde07a9390f268ec35738 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768538965 From rkennke at openjdk.org Fri Sep 20 15:29:52 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 20 Sep 2024 15:29:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> Message-ID: On Fri, 20 Sep 2024 12:31:18 GMT, Roman Kennke wrote: >> Thanks @rkennke, I tried running test tiers 1-3 using different compressed OOPs configurations but could not reach this code, unfortunately. Could you provide a reproducer? I am particularly interested because I'd like to find out whether there could be any problematic interaction with C2's implicit null check optimization. > > I have tried for a few hours now to reproduce this with a custom testcase, with no success. > I am pretty sure that this can happen; that is why I added this code. Originally I had an assert there asserting that index is not used. I do remember that this happens very rarely, and I don't remember the exact condition. Looking at the possible operands in opclass memory, I think this can only happen when we load an nKlass from an address of the form [rX, rY], i.e. the address in rX indexed by rY. This would be odd for loadNKlass, I think, because rY should always be klass_offset_in_bytes. Maybe this is possible when we get odd address merges where we get a PhiNode as the offset/index? I don't know. > I agree, this *might* lead to surprising problems with implicit null-checking, if it is expected that the first instruction in loadNKlass provokes the SIGSEGV. A way around this would be to declare an opclass that is a subset of 'memory' that excludes all operands with index, and match on that.
I think this would force the lea as a separate instruction and ensure that we never see such a thing in loadNKlass. However, I would not feel very confident doing that without a reproducer. Let me dig a little further. > > For reference, here is my unsuccessful reproducer: https://gist.github.com/rkennke/8a57610d74fcde07a9390f268ec35738 Something like this is what I have in mind. It seems to pass tier1 tests. I still haven't managed to reproduce the path that requires an index register, though. https://github.com/rkennke/jdk/commit/2c4a7877e4ef94017c8155578d8cfc9342441656 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768816377 From matsaave at openjdk.org Fri Sep 20 17:21:51 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 20 Sep 2024 17:21:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v23] In-Reply-To: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> References: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> Message-ID: On Thu, 19 Sep 2024 12:08:46 GMT, Roman Kennke wrote: >> This is the main body of JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that were previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release; then we will make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it.
However, there are a few unknowns in that plan; specifically, we may want to further shrink compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. To be able to do this, we made some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
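With 22 bits reserved for the narrow Klass*, there are 2^22 = 4,194,304 possible encodings. The following minimal C++ sketch shows what "protecting the upper 22 bits" of the mark-word means in practice, assuming the narrow Klass* occupies bits 42..63 of a 64-bit mark word. The constants and function names are hypothetical, and the mapping from a narrow value back to a `Klass*` (base plus shift) is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout for illustration: a 22-bit narrow Klass* in the
// upper bits of a 64-bit mark word. Constant names are not HotSpot's.
constexpr int kNarrowKlassBits = 22;
constexpr int kKlassShift = 64 - kNarrowKlassBits;  // 42
constexpr uint64_t kKlassMask =
    ((uint64_t(1) << kNarrowKlassBits) - 1) << kKlassShift;

// Extracts the narrow klass from the upper 22 bits.
uint32_t narrow_klass(uint64_t mark) {
  return static_cast<uint32_t>(mark >> kKlassShift);
}

// Installs a narrow klass, preserving the low 42 bits.
uint64_t with_narrow_klass(uint64_t mark, uint32_t nk) {
  assert(nk < (uint32_t(1) << kNarrowKlassBits) && "narrow klass must fit in 22 bits");
  return (mark & ~kKlassMask) | (uint64_t(nk) << kKlassShift);
}

// Any other use of the mark word (locking, hashing, GC forwarding) may only
// replace the low 42 bits; otherwise the object's class would be lost.
uint64_t with_low_bits(uint64_t mark, uint64_t low) {
  assert((low & kKlassMask) == 0 && "low bits would clobber the narrow klass");
  return (mark & kKlassMask) | low;
}
```

This is why the description pairs storing the Klass* in the mark-word with changes to GC forwarding: a forwarding encoding that overwrote the whole mark word would destroy the class information.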
> > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4 > - review feedback CDS changes look good! I have two style comments, but otherwise this makes sense. ------------- Marked as reviewed by matsaave (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2318793061 From matsaave at openjdk.org Fri Sep 20 17:21:53 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 20 Sep 2024 17:21:53 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that were previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release; then we will make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan; specifically, we may want to further shrink compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. To be able to do this, we made some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word.
Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding src/hotspot/share/cds/archiveBuilder.cpp line 677: > 675: // Allocate space for the future InstanceKlass with proper alignment > 676: const size_t alignment = > 677: #ifdef _LP64 I think the text alignment here is a bit confusing. Should lines 678 and 682 be at the same indentation?
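The 8TB limit in the description follows from the field width: 2^40 word offsets times 8 bytes per word gives 2^43 bytes, i.e. 8 TiB. Below is a minimal C++ sketch of such a compressed-oops-like, base-relative forwarding encoding. The field positions (a forwarded tag in the two lowest bits, the 40-bit offset in bits 2..41, the narrow Klass* in bits 42..63) and all names are illustrative assumptions, not the layout used by the PR's GCForwarding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical compressed forwarding: store (forwardee - heap_base) / 8 in a
// 40-bit field of the mark word, leaving the 22 narrow-klass bits above it
// untouched. Field positions here are illustrative only.
constexpr int kFwdBits = 40;
constexpr int kFwdShift = 2;              // two low bits hold a "forwarded" tag
constexpr uint64_t kFwdMask = ((uint64_t(1) << kFwdBits) - 1) << kFwdShift;
constexpr uint64_t kTagMask = 0x3;
constexpr uint64_t kForwardedTag = 0x3;   // assumed tag pattern
constexpr uint64_t kWordSize = 8;

// Largest heap this encoding can address: 2^40 words * 8 bytes = 8 TiB.
constexpr uint64_t kMaxHeapBytes = (uint64_t(1) << kFwdBits) * kWordSize;

uint64_t encode_forwarding(uint64_t mark, uint64_t heap_base, uint64_t forwardee) {
  assert(forwardee >= heap_base && (forwardee - heap_base) % kWordSize == 0);
  uint64_t offset_words = (forwardee - heap_base) / kWordSize;
  assert(offset_words < (uint64_t(1) << kFwdBits) && "heap too large for 40-bit encoding");
  // Keep everything outside the forwarding field and tag (i.e. the klass bits).
  uint64_t keep = mark & ~(kFwdMask | kTagMask);
  return keep | (offset_words << kFwdShift) | kForwardedTag;
}

uint64_t decode_forwarding(uint64_t mark, uint64_t heap_base) {
  assert((mark & kTagMask) == kForwardedTag);
  uint64_t offset_words = (mark & kFwdMask) >> kFwdShift;
  return heap_base + offset_words * kWordSize;
}
```

Storing a word offset rather than a byte address is what buys the extra 3 bits of reach, in the same way compressed oops use an 8-byte shift.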
src/hotspot/share/cds/archiveUtils.cpp line 348: > 346: old_tag = (int)(intptr_t)nextPtr(); > 347: // do_int(&old_tag); > 348: assert(tag == old_tag, "tag doesn't match (%d, expected %d)", old_tag, tag); Is this assert message change a leftover from debugging, or is it meant to be this way? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768946883 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768923643 From coleenp at openjdk.org Fri Sep 20 18:19:51 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 18:19:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v23] In-Reply-To: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> References: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> Message-ID: On Thu, 19 Sep 2024 12:08:46 GMT, Roman Kennke wrote: >> This is the main body of JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that were previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release; then we will make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan; specifically, we may want to further shrink compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
>> - The compressed Klass* can now be stored in the mark-word of objects. To be able to do this, we made some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4 > - review feedback I mostly reviewed the metaspace changes and suggest upstreaming the MetaBlock refactoring ahead of the rest of this patch. I have only one comment about the interpreter code (affecting 4 locations).
src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3636: > 3634: } else { > 3635: __ sub(r3, r3, sizeof(oopDesc)); > 3636: } This looks like something that could be buggy if we're not careful. We had a pass where we cleaned up sizeof(oopDesc) once. Could this be a suitably named function in oopDesc (this is not header_size() anymore, is it)? src/hotspot/cpu/x86/templateTable_x86.cpp line 4121: > 4119: __ movptr(Address(rax, rdx, Address::times_8, sizeof(oopDesc) - 1*oopSize), rcx); > 4120: NOT_LP64(__ movptr(Address(rax, rdx, Address::times_8, sizeof(oopDesc) - 2*oopSize), rcx)); > 4121: } For this and above, I'd rather oopDesc encapsulate the header size for the UseCompactObjectHeaders condition in C++ code, so that we never see sizeof(oopDesc) here. src/hotspot/share/memory/metaspace.cpp line 799: > 797: > 798: // Set up compressed class pointer encoding. > 799: // In CDS=off mode, we give the JVM some leeway to choose a favorable base/shift combination. I don't know why this comment is here; it seems out of place. src/hotspot/share/memory/metaspace/freeBlocks.cpp line 57: > 55: } > 56: } > 57: return p; This answers my prior question. The waste is added back to the block list for non-class arenas as well. src/hotspot/share/memory/metaspace/metablock.hpp line 74: > 72: #define METABLOCKFORMATARGS(__block__) p2i((__block__).base()), (__block__).word_size() > 73: > 74: } // namespace metaspace I am wondering if some of these metaspace changes, that is, the addition of MetaBlock, could be upstreamed ahead of the compact object headers work. Some of it is refactoring so that you can use the wastage to allocate into the class arena, but a lot of it seems neutral to compact object headers; upstreaming it first would reduce this patch and allow different people to focus on just that part.
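A minimal sketch of the kind of encapsulation Coleen asks for: one helper owns the `UseCompactObjectHeaders` decision, so that interpreter templates ask for an offset instead of spelling out `sizeof(oopDesc)`. The struct, constant, and function names are hypothetical, not oopDesc's actual API. The values are the instance base offsets listed in the PR description (8 with compact headers, otherwise 12 or 16 depending on compressed class pointers), assuming a 64-bit VM; the non-compact array length offsets (12 or 16) are likewise an assumption based on the klass-gap layout.

```cpp
#include <cassert>

// Hypothetical flags mirroring the JVM options discussed in the thread.
struct HeaderConfig {
  bool compact_headers;            // -XX:+UseCompactObjectHeaders
  bool compressed_class_pointers;  // -XX:+UseCompressedClassPointers
};

constexpr int kMarkWordBytes = 8;  // 64-bit VM assumed

// Offset where the field layouter may place the first instance field.
int instance_base_offset_in_bytes(const HeaderConfig& c) {
  if (c.compact_headers) {
    return kMarkWordBytes;  // the narrow klass lives inside the mark word
  }
  // Separate klass field after the mark word: 4 bytes if compressed, else 8.
  return kMarkWordBytes + (c.compressed_class_pointers ? 4 : 8);  // 12 or 16
}

// Offset of the array length field (8 with compact headers, as in the PR).
int array_length_offset_in_bytes(const HeaderConfig& c) {
  return instance_base_offset_in_bytes(c);
}
```

Keeping the condition in one place means a future layout change (such as the 4-byte headers mentioned in the description) would touch this helper rather than every template that currently hard-codes `sizeof(oopDesc)`.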
src/hotspot/share/memory/metaspace/metaspaceArena.cpp line 470: > 468: > 469: // Returns true if the given block is contained in this arena > 470: // Returns true if the given block is contained in this arena Here's the same comment twice. ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2318539468 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768775590 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768781956 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768979540 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769008437 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769012842 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769015008 From coleenp at openjdk.org Fri Sep 20 18:19:52 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 18:19:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: <5Mh0IteE4Z7zDseCNMmYKOtyCMTQe0iuAp70kJf8pS0=.8215bcce-6387-46fb-97a0-d1e6a9498b61@github.com> On Wed, 18 Sep 2024 13:57:29 GMT, Thomas Stuefe wrote: >> src/hotspot/share/memory/classLoaderMetaspace.cpp line 87: >> >>> 85: klass_alignment_words, >>> 86: "class arena"); >>> 87: } >> >> As per my comment in the header file, change the code to this: >> >> ```c++ >> if (class_context != nullptr) { >> // ... Same as in PR >> } else { >> _class_space_arena = _non_class_space_arena; >> } > > Rather not, see reasoning under https://github.com/openjdk/jdk/pull/20677/files#r1754330432 Yes, I'd rather _class_space_arena be nullptr if not used. >> src/hotspot/share/memory/classLoaderMetaspace.cpp line 115: >> >>> 113: if (wastage.is_nonempty()) { >>> 114: non_class_space_arena()->deallocate(wastage); >>> 115: } >> >> This code reads a bit strangely. I understand *what* it tries to do. 
It tries to give back any wasted memory from either the class space arena *or* the non-class space arena to the non-class space arena's freelist. I assume that we do this since any wastage is presumably too small to be used by our new 22-bit class pointers. However, this context will be lost on future readers. It should at least have a comment in the `if (wastage.is_nonempty())` clause explaining what we expect to happen and why. For example: >> >> ```c++ >> // Any wasted memory is presumably too small for any class. >> // Therefore, give it back to the non-class space arena's free list. > > Yes. Some background: > > - wastage can only occur for larger Klass* alignments (aka class space arena alignment property), so only for +COH (note to self, maybe assert) > - wastage is, by definition, not aligned to the required Klass* alignment, so it cannot be reused. Yes, it's probably also too small > > Yes, I will write a better comment. Yes, this definitely needs a comment explaining why, since this is how we handle the small chunks of memory wasted because of hyper-aligning Klasses in class space. Line 111 is somewhat surprising, though. I didn't expect there to be wastage from allocating to non-class metaspace. The unnerving bit of this is that CompressedKlassPointers::is_encodable() is true for memory allocated here.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768897591 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768966812 From coleenp at openjdk.org Fri Sep 20 18:19:53 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 18:19:53 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: <5Mh0IteE4Z7zDseCNMmYKOtyCMTQe0iuAp70kJf8pS0=.8215bcce-6387-46fb-97a0-d1e6a9498b61@github.com> References: <5Mh0IteE4Z7zDseCNMmYKOtyCMTQe0iuAp70kJf8pS0=.8215bcce-6387-46fb-97a0-d1e6a9498b61@github.com> Message-ID: On Fri, 20 Sep 2024 17:34:09 GMT, Coleen Phillimore wrote: >> Yes. Some background: >> >> - wastage can only occur for larger Klass* alignments (aka class space arena alignment property), so only for +COH (note to self, maybe assert) >> - wastage is, by definition, not aligned to the required Klass* alignment, so it cannot be reused. Yes, it's probably also too small >> >> Yes, I will write a better comment. > > Yes, this definitely needs a comment explaining why, since this is how we handle the small chunks of memory wasted because of hyper-aligning Klasses in class space. Line 111 is somewhat surprising, though. I didn't expect there to be wastage from allocating to non-class metaspace. > > The unnerving bit of this is that CompressedKlassPointers::is_encodable() is true for memory allocated here. I think this should also assert or be conditionalized on UseCompactObjectHeaders.
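A minimal C++ model of the wastage handling discussed above, assuming a bump-pointer class arena whose allocations must be aligned more strictly than the word size (the hyper-aligned Klass case under +UseCompactObjectHeaders). The gap skipped to reach alignment is not Klass-aligned, so the class arena cannot reuse it, but it remains word-aligned and is handed to a separate free list standing in for the non-class arena. `Block`, `BumpArena`, `FreeList`, and `allocate_klass` are hypothetical stand-ins for MetaBlock, MetaspaceArena, and FreeBlocks, not their real interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical memory block: [base, base + size).
struct Block {
  uintptr_t base;
  size_t size;
  bool is_nonempty() const { return size > 0; }
};

// Bump-pointer arena handing out blocks aligned to `alignment` bytes.
struct BumpArena {
  uintptr_t top;
  uintptr_t end;
  size_t alignment;  // power of two

  // Returns the allocated block (size 0 on failure) and reports the skipped gap.
  Block allocate(size_t size, Block* wastage) {
    uintptr_t aligned = (top + alignment - 1) & ~(uintptr_t(alignment) - 1);
    *wastage = Block{top, 0};
    if (aligned + size > end) {
      return Block{0, 0};
    }
    *wastage = Block{top, aligned - top};
    top = aligned + size;
    return Block{aligned, size};
  }
};

// Word-aligned free list standing in for the non-class arena's free blocks.
struct FreeList {
  std::vector<Block> blocks;
  void deallocate(const Block& b) { blocks.push_back(b); }
};

// The alignment gap is useless for Klass allocation but fine for ordinary
// (non-class) metadata, so it is handed to the non-class free list.
Block allocate_klass(BumpArena& class_arena, FreeList& non_class_free, size_t size) {
  Block wastage{0, 0};
  Block result = class_arena.allocate(size, &wastage);
  if (wastage.is_nonempty()) {
    non_class_free.deallocate(wastage);
  }
  return result;
}
```

In this model, wastage is only ever non-empty when `alignment` exceeds the allocation granularity, which matches Thomas's note that it can only occur for the larger Klass* alignments, and is the kind of condition Coleen suggests asserting.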
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768972448 From xpeng at openjdk.org Fri Sep 20 18:31:55 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 20 Sep 2024 18:31:55 GMT Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer Message-ID: In a simple latency benchmark for memory allocation, I found that ShenandoahPacer contributed quite a lot to the long-tail latency above 10ms. When multiple mutator threads fail to claim budget on the fast path [here](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L230), all of them forcefully claim it and then wait for up to 10ms ([code link](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L239-L277)). The change in this PR greatly reduces ShenandoahPacer's impact on long-tail latency: instead of forcefully claiming budget and then waiting, it attempts to claim again after each 1ms wait, and keeps doing so until either 1/ it has spent 10ms waiting in total, or 2/ it has successfully claimed the budget.
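The retry policy described above can be sketched as follows. This is a hypothetical model, not ShenandoahPacer itself: `PacerModel` and its methods are illustrative names, and the sketch uses `std::atomic` and `std::chrono::steady_clock` in place of HotSpot's `Atomic` and `os::elapsed_counter()`, and `std::this_thread::sleep_for` in place of the pacer's monitor wait, so unlike the real code it cannot be woken early when budget is replenished.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical model of a pacing budget shared by allocating threads.
// GC progress replenishes the budget; forced claims may drive it negative.
struct PacerModel {
  std::atomic<int64_t> budget{0};

  // Claims `words` from the budget. Without `force`, fails if the budget is
  // insufficient; with `force`, always succeeds (possibly going negative).
  bool claim_for_alloc(int64_t words, bool force) {
    int64_t cur = budget.load();
    do {
      if (!force && cur < words) {
        return false;
      }
      // On CAS failure, `cur` is refreshed and the check is repeated.
    } while (!budget.compare_exchange_weak(cur, cur - words));
    return true;
  }

  // Retry in 1 ms slices for up to max_delay, then force the claim.
  // Returns true if a regular claim succeeded, false if it had to be forced.
  bool pace_for_alloc(int64_t words, std::chrono::milliseconds max_delay) {
    if (claim_for_alloc(words, false)) {
      return true;  // fast path
    }
    auto start = std::chrono::steady_clock::now();
    bool claimed = false;
    while (!claimed && std::chrono::steady_clock::now() - start < max_delay) {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
      claimed = claim_for_alloc(words, false);
    }
    if (!claimed) {
      // Out of waiting time: allocate anyway, possibly outpacing the GC.
      claim_for_alloc(words, true);
    }
    return claimed;
  }
};
```

In this model the forced claim only happens after the full delay has elapsed, which is what lets the budget go negative; a regular claim that succeeds after a few 1ms slices returns early instead of sleeping for the whole 10ms.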
Here is the latency comparison for the optimization:
![hdr-histogram-optimize-pacer](https://github.com/user-attachments/assets/811f48c5-87eb-462d-8b27-d15bd08be7b0)

With the optimization, the long-tail latency of the test code below improved from over 20ms to ~10ms on macOS with an M3 chip:

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.atomic.LongAdder;
    import org.HdrHistogram.Histogram; // HdrHistogram library

    public class AllocationLatencyTest { // wrapper class added so the snippet compiles; name is illustrative
        static final int threadCount = Runtime.getRuntime().availableProcessors();
        static final LongAdder totalCount = new LongAdder();
        static volatile byte[] sink;

        public static void main(String[] args) {
            runAllocationTest(100000);
        }

        static void recordTimeToAllocate(final int dataSize, final Histogram histogram) {
            long startTime = System.nanoTime();
            sink = new byte[dataSize];
            long endTime = System.nanoTime();
            histogram.recordValue(endTime - startTime);
        }

        static void runAllocationTest(final int dataSize) {
            final long endTime = System.currentTimeMillis() + 30_000;
            final CountDownLatch startSignal = new CountDownLatch(1);
            final CountDownLatch finished = new CountDownLatch(threadCount);
            final Thread[] threads = new Thread[threadCount];
            final Histogram[] histograms = new Histogram[threadCount];
            final Histogram totalHistogram = new Histogram(3600000000000L, 3);
            for (int i = 0; i < threadCount; i++) {
                final var histogram = new Histogram(3600000000000L, 3);
                histograms[i] = histogram;
                threads[i] = new Thread(() -> {
                    wait(startSignal);
                    do {
                        recordTimeToAllocate(dataSize, histogram);
                    } while (System.currentTimeMillis() < endTime);
                    finished.countDown();
                });
                threads[i].start();
            }
            startSignal.countDown(); // Start the test
            wait(finished);
            for (Histogram histogram : histograms) {
                totalHistogram.add(histogram);
            }
            totalHistogram.outputPercentileDistribution(System.out, 1000.0);
        }

        public static void wait(final CountDownLatch latch) {
            try {
                latch.await();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    }

### Additional test
- [x] MacOS AArch64 server fastdebug, hotspot_gc_shenandoah

------------- Commit messages: - use const - refactor - Clean code - try claim_for_alloc before
calculating total_delay - try claim_for_alloc before calculating total_delay - clean up - 8340490: Shenandoah: Optimize ShenandoahPacer Changes: https://git.openjdk.org/jdk/pull/21099/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21099&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340490 Stats: 41 lines in 3 files changed: 8 ins; 16 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/21099.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21099/head:pull/21099 PR: https://git.openjdk.org/jdk/pull/21099 From shade at openjdk.org Fri Sep 20 18:31:56 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 20 Sep 2024 18:31:56 GMT Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer In-Reply-To: References: Message-ID: <61bsu02-sDgUFx45vjC3xAvAaDtIY405Qm8lh3Edp28=.5a1c2d42-a51c-4014-b5e3-64cbd02786ed@github.com> On Thu, 19 Sep 2024 23:32:14 GMT, Xiaolong Peng wrote: > In a simple latency benchmark for memory allocation, I found that ShenandoahPacer contributed quite a lot to the long-tail latency above 10ms. When multiple mutator threads fail to claim budget on the fast path [here](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L230), all of them forcefully claim it and then wait for up to 10ms ([code link](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L239-L277)). > > The change in this PR greatly reduces ShenandoahPacer's impact on long-tail latency: instead of forcefully claiming budget and then waiting, it attempts to claim again after each 1ms wait, and keeps doing so until either 1/ it has spent 10ms waiting in total, or 2/ it has successfully claimed the budget.
> > Here the latency comparison for the optimization: > ![hdr-histogram-optimize-pacer](https://github.com/user-attachments/assets/811f48c5-87eb-462d-8b27-d15bd08be7b0) > > With the optimization, long tail latency from the test code below has been much improved from over 20ms to ~10ms on MacOS with M3 chip: > > static final int threadCount = Runtime.getRuntime().availableProcessors(); > static final LongAdder totalCount = new LongAdder(); > static volatile byte[] sink; > public static void main(String[] args) { > runAllocationTest(100000); > } > static void recordTimeToAllocate(final int dataSize, final Histogram histogram) { > long startTime = System.nanoTime(); > sink = new byte[dataSize]; > long endTime = System.nanoTime(); > histogram.recordValue(endTime - startTime); > } > > static void runAllocationTest(final int dataSize) { > final long endTime = System.currentTimeMillis() + 30_000; > final CountDownLatch startSignal = new CountDownLatch(1); > final CountDownLatch finished = new CountDownLatch(threadCount); > final Thread[] threads = new Thread[threadCount]; > final Histogram[] histograms = new Histogram[threadCount]; > final Histogram totalHistogram = new Histogram(3600000000000L, 3); > for (int i = 0; i < threadCount; i++) { > final var histogram = new Histogram(3600000000000L, 3); > histograms[i] = histogram; > threads[i] = new Thread(() -> { > wait(startSignal); > do { > recordTimeToAllocate(dataSize, histogram); > } while (System.currentTimeMillis() < e... I am trying to remember why we even bothered to go into negative budget on this path, and then waited for it to recover. I think it is from here: https://mail.openjdk.org/pipermail/shenandoah-dev/2018-April/005559.html. AFAICS, the intent for that fix was to make sure that unsuccessful pacing claim the budget, which this patch also does. And given it apparently improves performance, I don't mind it going in. 
Comprehension question: the actual improvement comes from waiting in 1ms slices, not from anything else? In retrospect, _it is_ silly to wait until the deadline before attempting to claim the pacing budget. I am good with this, assuming performance runs show good results. src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp line 191: > 189: _need_notify_waiters.try_set(); > 190: } > 191: template Newline before `template`, please. src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp line 206: > 204: } > 205: new_val = cur - tax; > 206: } while (Atomic::load(&_budget) == cur && I don't think we need this load, since we have _just_ had another load nearby. This should be enough to resolve the contention issues TTAS pattern tries to avoid. src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp line 256: > 254: double total_delay = 0; > 255: > 256: double start = os::elapsedTime(); While we are here, let's avoid some integer divisions and floating-point math. Try to rewrite this using `jlong os::elapsed_counter()`, which returns integer nanoseconds? Do the math in `jlong`-s. src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp line 257: > 255: > 256: double start = os::elapsedTime(); > 257: while (!claimed) { I suggest we common some exit paths by writing the loop like this: double start_time = os::elapsedTime(); while (!claimed && (os::elapsedTime() - start_time) < max_delay) { // We could instead assist GC, but this would suffice for now. wait(1); claimed = claim_for_alloc(words); } if (!claimed) { // Spent local time budget to wait for enough GC progress. // Force allocating anyway, which may mean we outpace GC, // and start Degenerated GC cycle. claimed = claim_for_alloc(words); assert(claimed, "Should always succeed"); } ShenandoahThreadLocalData::add_paced_time(JavaThread::current(), os::elapsedTime() - start_time); src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp line 265: > 263: // and start Degenerated GC cycle. 
> 264: claimed = claim_for_alloc(words); > 265: assert(claimed, "Should always succeed"); Come to think of it, we don't need to check the return value here. We don't check it in the other place where we call `claim_for_alloc(words);` src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp line 267: > 265: assert(claimed, "Should always succeed"); > 266: } > 267: ShenandoahThreadLocalData::add_paced_time(JavaThread::current(), (double)(os::elapsed_counter() - start_time) / (double) NANOSECS_PER_SEC); We already have `current` (`JavaThread::current()`) in scope here; use that :) I also think the second cast, `(double) NANOSECS_PER_SEC`, is redundant. ------------- PR Review: https://git.openjdk.org/jdk/pull/21099#pullrequestreview-2317722311 Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21099#pullrequestreview-2318988155 PR Review Comment: https://git.openjdk.org/jdk/pull/21099#discussion_r1768272092 PR Review Comment: https://git.openjdk.org/jdk/pull/21099#discussion_r1768271671 PR Review Comment: https://git.openjdk.org/jdk/pull/21099#discussion_r1768281644 PR Review Comment: https://git.openjdk.org/jdk/pull/21099#discussion_r1768291970 PR Review Comment: https://git.openjdk.org/jdk/pull/21099#discussion_r1769025976 PR Review Comment: https://git.openjdk.org/jdk/pull/21099#discussion_r1769027052 From xpeng at openjdk.org Fri Sep 20 18:31:56 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 20 Sep 2024 18:31:56 GMT Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer In-Reply-To: <61bsu02-sDgUFx45vjC3xAvAaDtIY405Qm8lh3Edp28=.5a1c2d42-a51c-4014-b5e3-64cbd02786ed@github.com> References: <61bsu02-sDgUFx45vjC3xAvAaDtIY405Qm8lh3Edp28=.5a1c2d42-a51c-4014-b5e3-64cbd02786ed@github.com> Message-ID: On Fri, 20 Sep 2024 09:46:50 GMT, Aleksey Shipilev wrote: > I am trying to remember why we even bothered to go into negative budget on this path, and then waited for it to recover.
I think it is from here: https://mail.openjdk.org/pipermail/shenandoah-dev/2018-April/005559.html. AFAICS, the intent for that fix was to make sure that unsuccessful pacing claims the budget, which this patch also does. And given it apparently improves performance, I don't mind it going in. > > Comprehension question: the actual improvement comes from waiting in 1ms slices, not from anything else? In retrospect, _it is_ silly to wait until the deadline before attempting to claim the pacing budget. > src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp line 206: > >> 204: } >> 205: new_val = cur - tax; >> 206: } while (Atomic::load(&_budget) == cur && > > I don't think we need this load, since we have _just_ had another load nearby. This should be enough to resolve the contention issues the TTAS pattern tries to avoid. Thanks, reverted the TTAS pattern. > src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp line 257: > >> 255: >> 256: double start = os::elapsedTime(); >> 257: while (!claimed) { > > I suggest we common some exit paths by writing the loop like this: > > > double start_time = os::elapsedTime(); > while (!claimed && (os::elapsedTime() - start_time) < max_delay) { > // We could instead assist GC, but this would suffice for now. > wait(1); > claimed = claim_for_alloc(words); > } > if (!claimed) { > // Spent local time budget to wait for enough GC progress. > // Force allocating anyway, which may mean we outpace GC, > // and start Degenerated GC cycle. > claimed = claim_for_alloc(words); > assert(claimed, "Should always succeed"); > } > ShenandoahThreadLocalData::add_paced_time(JavaThread::current(), os::elapsedTime() - start_time); Thanks, I refactored the code along with the change to use os::elapsed_counter(); the nanoseconds-to-seconds conversion is now only needed once, at the end, when calling ShenandoahThreadLocalData::add_paced_time.
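For readers following the pacing discussion, here is a minimal, self-contained C++ sketch of the loop shape agreed on above: wait in 1 ms slices, retry a non-forced claim after each slice, force the claim only once the local delay budget is spent, and keep the timing in integer nanoseconds until a single conversion to seconds at the end. `PacerSketch`, `try_claim`, `force_claim` and `pace_for_alloc` are hypothetical names, not HotSpot's API, and `std::this_thread::sleep_for` stands in for the monitor `wait(1)`, which in the real pacer can also be woken early by a notification.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical, simplified pacer budget; not HotSpot's ShenandoahPacer.
class PacerSketch {
 public:
  explicit PacerSketch(std::int64_t initial_budget) : _budget(initial_budget) {}

  // Non-forced claim: succeeds only if the budget covers the request.
  bool try_claim(std::int64_t words) {
    std::int64_t cur = _budget.load(std::memory_order_relaxed);
    do {
      if (cur < words) {
        return false;  // not enough budget; the caller decides whether to wait
      }
      // On failure, compare_exchange_weak reloads `cur` with the current value.
    } while (!_budget.compare_exchange_weak(cur, cur - words));
    return true;
  }

  // Forced claim: always succeeds, possibly driving the budget negative.
  void force_claim(std::int64_t words) { _budget.fetch_sub(words); }

  std::int64_t budget() const { return _budget.load(); }

  // Wait in 1 ms slices, retrying the claim after each slice, until it
  // succeeds or max_delay has elapsed; then force the claim.
  // Returns the paced time in seconds.
  double pace_for_alloc(std::int64_t words, std::chrono::nanoseconds max_delay) {
    using clock = std::chrono::steady_clock;
    if (try_claim(words)) {
      return 0.0;  // fast path: no pacing
    }
    const clock::time_point start = clock::now();  // integer ticks, no floating point
    bool claimed = false;
    while (!claimed && (clock::now() - start) < max_delay) {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
      claimed = try_claim(words);
    }
    if (!claimed) {
      // Local delay budget spent: allocate anyway, possibly outpacing GC.
      force_claim(words);
    }
    // Single conversion from integer nanoseconds to seconds, at the end.
    const auto waited =
        std::chrono::duration_cast<std::chrono::nanoseconds>(clock::now() - start);
    return std::chrono::duration<double>(waited).count();
  }

 private:
  std::atomic<std::int64_t> _budget;
};
```

A thread whose fast-path claim fails would call something like `pacer.pace_for_alloc(words, std::chrono::milliseconds(10))`, the 10 ms cap being the one mentioned in the PR description. Because the retry uses a non-forced claim, waiters never push the budget negative until their own deadline expires.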
------------- PR Comment: https://git.openjdk.org/jdk/pull/21099#issuecomment-2364294440 PR Review Comment: https://git.openjdk.org/jdk/pull/21099#discussion_r1769018442 PR Review Comment: https://git.openjdk.org/jdk/pull/21099#discussion_r1769021455 From xpeng at openjdk.org Fri Sep 20 18:47:50 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 20 Sep 2024 18:47:50 GMT Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2] In-Reply-To: References: Message-ID: > In a simple latency benchmark for memory allocation, I found that ShenandoahPacer contributed quite a lot to the long-tail latency (over 10ms): when multiple mutator threads fail to claim budget on the fast path [here](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L230), all of them forcefully claim it and then wait for up to 10ms ([code link](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L239-L277)). > > The change in this PR greatly reduces ShenandoahPacer's impact on long-tail latency: instead of forcefully claiming budget and then waiting, a thread attempts to claim after waiting for 1ms, and keeps doing this until either 1/ it has spent 10ms waiting in total, or 2/ it has successfully claimed the budget.
> > Here is the latency comparison for the optimization: > ![hdr-histogram-optimize-pacer](https://github.com/user-attachments/assets/811f48c5-87eb-462d-8b27-d15bd08be7b0) > > With the optimization, long-tail latency from the test code below has improved from over 20ms to ~10ms on macOS with an M3 chip: > > static final int threadCount = Runtime.getRuntime().availableProcessors(); > static final LongAdder totalCount = new LongAdder(); > static volatile byte[] sink; > public static void main(String[] args) { > runAllocationTest(100000); > } > static void recordTimeToAllocate(final int dataSize, final Histogram histogram) { > long startTime = System.nanoTime(); > sink = new byte[dataSize]; > long endTime = System.nanoTime(); > histogram.recordValue(endTime - startTime); > } > > static void runAllocationTest(final int dataSize) { > final long endTime = System.currentTimeMillis() + 30_000; > final CountDownLatch startSignal = new CountDownLatch(1); > final CountDownLatch finished = new CountDownLatch(threadCount); > final Thread[] threads = new Thread[threadCount]; > final Histogram[] histograms = new Histogram[threadCount]; > final Histogram totalHistogram = new Histogram(3600000000000L, 3); > for (int i = 0; i < threadCount; i++) { > final var histogram = new Histogram(3600000000000L, 3); > histograms[i] = histogram; > threads[i] = new Thread(() -> { > wait(startSignal); > do { > recordTimeToAllocate(dataSize, histogram); > } while (System.currentTimeMillis() < e...
Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: clean up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21099/files - new: https://git.openjdk.org/jdk/pull/21099/files/1de70211..58196a4f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21099&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21099&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21099.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21099/head:pull/21099 PR: https://git.openjdk.org/jdk/pull/21099 From xpeng at openjdk.org Fri Sep 20 18:47:50 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 20 Sep 2024 18:47:50 GMT Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2] In-Reply-To: References: <61bsu02-sDgUFx45vjC3xAvAaDtIY405Qm8lh3Edp28=.5a1c2d42-a51c-4014-b5e3-64cbd02786ed@github.com> Message-ID: <-6wa5ftQJ3WdXiX-SsMY-nXgnTWCl9ZzDTt89akghyM=.7e53fed3-2f4c-4c40-9465-15c97bf8e089@github.com> On Fri, 20 Sep 2024 18:27:14 GMT, Xiaolong Peng wrote: > I am trying to remember why we even bothered to go into negative budget on this path, and then waited for it to recover. I think it is from here: https://mail.openjdk.org/pipermail/shenandoah-dev/2018-April/005559.html. AFAICS, the intent for that fix was to make sure that unsuccessful pacing claims the budget, which this patch also does. And given it apparently improves performance, I don't mind it going in. > > Comprehension question: the actual improvement comes from waiting in 1ms slices, not from anything else? In retrospect, _it is_ silly to wait until the deadline before attempting to claim the pacing budget. It comes primarily from the algorithm change to 1ms slices. The new algorithm with 1ms slices behaves differently, e.g.
when 10 threads see insufficient budget at the same time and each of them claims 100 units of budget. In the old algorithm, all 10 threads forcefully claim the budget, driving it to `-1000` (assuming it started at zero); other mutators then need to release at least `1000` before the waiters can proceed early, otherwise they wait for up to 10ms, even though they may be woken up by the ShenandoahPeriodicPacerNotifyTask. In the new algorithm, each thread tries to claim 100 units every 1ms and does not need to wait for other mutators to release at least `1000`: as soon as enough budget (at least 100) is returned, some thread(s) will win against the others and proceed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21099#issuecomment-2364322181 From xpeng at openjdk.org Fri Sep 20 18:51:35 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 20 Sep 2024 18:51:35 GMT Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2] In-Reply-To: <61bsu02-sDgUFx45vjC3xAvAaDtIY405Qm8lh3Edp28=.5a1c2d42-a51c-4014-b5e3-64cbd02786ed@github.com> References: <61bsu02-sDgUFx45vjC3xAvAaDtIY405Qm8lh3Edp28=.5a1c2d42-a51c-4014-b5e3-64cbd02786ed@github.com> Message-ID: On Fri, 20 Sep 2024 18:27:20 GMT, Aleksey Shipilev wrote: > I am good with this, assuming performance runs show good results. Latency-wise, it is better than the old implementation most of the time. In my specific test with an 8G heap on macOS, throughput is very close to the test with ShenandoahPacing disabled, and about 25%~30% better than with the old implementation.
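The 10-thread example above can be made concrete with a toy, single-threaded C++ model that counts how much budget other mutators must return before the first waiting thread can proceed. The names are hypothetical, and the model ignores real concurrency, the periodic notifier and the 10 ms timeout. It assumes the budget starts at zero and treats the old waiters as proceeding early only once the budget is no longer negative, as the comment describes.

```cpp
#include <cstdint>

// Toy pacer budget; hypothetical, single-threaded model.
struct ToyBudget {
  std::int64_t value = 0;

  // Non-forced claim: fails if the budget cannot cover the request.
  bool try_claim(std::int64_t words) {
    if (value < words) return false;
    value -= words;
    return true;
  }

  // Forced claim: always succeeds, possibly driving the budget negative.
  void force_claim(std::int64_t words) { value -= words; }
};

// Returns how much budget must be returned (in `step` increments) before the
// first of `threads` waiting threads, each needing `words`, can proceed.
std::int64_t release_until_first_proceeds(bool old_scheme, int threads,
                                          std::int64_t words, std::int64_t step) {
  ToyBudget budget;  // starts at zero: every thread misses the fast path
  if (old_scheme) {
    // Old scheme: every thread force-claims up front, then waits.
    for (int i = 0; i < threads; i++) budget.force_claim(words);
  }
  std::int64_t released = 0;
  while (true) {
    budget.value += step;  // other mutators return some budget
    released += step;
    if (old_scheme) {
      // Waiters already hold their claims; they are released early only once
      // the budget is no longer negative.
      if (budget.value >= 0) return released;
    } else {
      // New scheme: a waiter retries a non-forced claim after each 1 ms slice.
      if (budget.try_claim(words)) return released;
    }
  }
}
```

With 10 threads of 100 words each and budget returned 10 units at a time, `release_until_first_proceeds(true, 10, 100, 10)` returns 1000 while `release_until_first_proceeds(false, 10, 100, 10)` returns 100, matching the `-1000` versus "at least 100" contrast in the comment.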
------------- PR Comment: https://git.openjdk.org/jdk/pull/21099#issuecomment-2364332905 From coleenp at openjdk.org Fri Sep 20 19:02:49 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 19:02:49 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Wed, 18 Sep 2024 12:54:34 GMT, Roman Kennke wrote: >> src/hotspot/share/oops/markWord.inline.hpp line 90: >> >>> 88: ShouldNotReachHere(); >>> 89: return markWord(); >>> 90: #endif >> >> Is the ifdef _LP64 necessary, since UseCompactObjectHeaders should always be false for 32 bits? > > Kind of. The problem is that klass_shift is larger than 31, and shifting by it would thus be UB and generate a compiler warning. I opted to simply not compile any of that code in 32-bit builds. We could also define klass_shift differently on 32-bit. > Long-term (maybe with Lilliput2/4-byte-headers?) it would be nice to consolidate the header layout between 32- and 64-bit builds and not make any distinction anywhere, e.g. define markWord (or objectHeader?) in a single way, and use that to extract all the relevant stuff. It's not totally unlikely that we deprecate 32-bit builds before that can happen, though. Ok.
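A minimal illustration of the point about `klass_shift`: with a 32-bit word, shifting by a constant of 32 or more is undefined behaviour and compilers warn about it, so the shift is only compiled where the word is wide enough. `kKlassShift`, its value and `set_klass_bits` are hypothetical placeholders, not the actual `markWord` layout or API. The `UINTPTR_MAX` check plays the role of HotSpot's `#ifdef _LP64`.

```cpp
#include <cstdint>

// Hypothetical layout constant: a narrow klass stored at bit 42 of the header.
// Placeholder value, not the real markWord layout.
constexpr int kKlassShift = 42;

// Shifting a 32-bit value by 32 or more is undefined behaviour (and a compiler
// warning for a constant count), so the shift is only compiled where
// uintptr_t is 64 bits wide, much like guarding code with #ifdef _LP64.
inline std::uintptr_t set_klass_bits(std::uintptr_t header, std::uint32_t narrow_klass) {
#if UINTPTR_MAX > 0xFFFFFFFFu
  return header | (static_cast<std::uintptr_t>(narrow_klass) << kKlassShift);
#else
  (void)narrow_klass;
  return header;  // compact headers are 64-bit only; never reached in practice
#endif
}
```

Defining the shift differently on 32-bit, the alternative Roman mentions, would remove the need for the guard, at the cost of a layout constant that only exists to satisfy the compiler.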
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769069007 From coleenp at openjdk.org Fri Sep 20 19:09:50 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 19:09:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com> References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com> Message-ID: On Thu, 19 Sep 2024 14:22:51 GMT, Stefan Karlsson wrote: >> We haven't decided whether we will get rid of `Klass::_prototype_header` before integrating this PR. @stefank could point you to a WIP branch, if that's helpful. > > This is my current work-in-progress code: > https://github.com/stefank/jdk/compare/pull/20677...stefank:jdk:lilliput_remove_prototype_header_wip_2 > > I've made some large rewrites and I'm currently running it through functional testing. The refactoring is better in this last version with encode_and_store_compact_object_header, although some comments around the C2 version would be good. I still don't know what the C2 version does. Someone else should review that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769075714 From shade at openjdk.org Sat Sep 21 05:54:42 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Sat, 21 Sep 2024 05:54:42 GMT Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2] In-Reply-To: References: <61bsu02-sDgUFx45vjC3xAvAaDtIY405Qm8lh3Edp28=.5a1c2d42-a51c-4014-b5e3-64cbd02786ed@github.com> Message-ID: On Fri, 20 Sep 2024 18:48:45 GMT, Xiaolong Peng wrote: > > I am good with this, assuming performance runs show good results. > > Latency-wise, it is better than the old implementation most of the time.
It is great that it improves targeted tests, and it makes sense from first principles. Run our usual performance pipeline to sanity-check whether this affects any other benchmarks in any meaningful way. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21099#issuecomment-2365015599 From fyang at openjdk.org Sat Sep 21 06:48:45 2024 From: fyang at openjdk.org (Fei Yang) Date: Sat, 21 Sep 2024 06:48:45 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v24] In-Reply-To: <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com> References: <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com> Message-ID: On Wed, 18 Sep 2024 17:45:51 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms.
>> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - Remove redundant comment src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 257: > 255: RegSet::of($res$$Register) /* no_preserve */); > 256: __ mov($tmp1$$Register, $oldval$$Register); > 257: __ mov($tmp2$$Register, $newval$$Register); Hi, I don't quite understand these two register-register moves here.
It seems to me that we could pass `oldval` and `newval` to `cmpxchg` directly, as `cmpxchg` won't modify them, which would save us these two moves. Did I miss anything? Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1769492955 From kbarrett at openjdk.org Sat Sep 21 23:38:43 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 21 Sep 2024 23:38:43 GMT Subject: RFR: 8340573: Remove unused G1ParScanThreadState::_partial_objarray_chunk_size Message-ID: Please review this trivial change to remove unused G1ParScanThreadState::_partial_objarray_chunk_size. Testing: local (linux-x64) clean build ------------- Commit messages: - remove unused _partial_objarray_chunk_size Changes: https://git.openjdk.org/jdk/pull/21117/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21117&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340573 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21117.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21117/head:pull/21117 PR: https://git.openjdk.org/jdk/pull/21117 From stuefe at openjdk.org Sun Sep 22 12:01:51 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 22 Sep 2024 12:01:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 16:56:58 GMT, Matias Saavedra Silva wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/cds/archiveUtils.cpp line 348: > >> 346: old_tag = (int)(intptr_t)nextPtr(); >> 347: // do_int(&old_tag); >> 348: assert(tag == old_tag, "tag doesn't match (%d, expected %d)", old_tag, tag); > > Is this assert message change a leftover from debugging or is it meant to be this way? It's a leftover, but on the other hand it does not hurt.
I found myself re-adding it several times to analyze CDS issues during development, so I decided to just leave it in. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1770536320 From tschatzl at openjdk.org Mon Sep 23 07:15:37 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 23 Sep 2024 07:15:37 GMT Subject: RFR: 8340573: Remove unused G1ParScanThreadState::_partial_objarray_chunk_size In-Reply-To: References: Message-ID: On Sat, 21 Sep 2024 23:34:24 GMT, Kim Barrett wrote: > Please review this trivial change to remove unused G1ParScanThreadState::_partial_objarray_chunk_size. > > Testing: local (linux-x64) clean build Lgtm and trivial. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21117#pullrequestreview-2321288400 From aboldtch at openjdk.org Mon Sep 23 07:28:02 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 23 Sep 2024 07:28:02 GMT Subject: RFR: 8340146: ZGC: TestAllocateHeapAt.java should not run with UseLargePages Message-ID: <1AOCA9rqJqTywzPzJ7jtmw4SJ01kE1LAfvmG1otOH-U=.e17bd4bd-221b-4ae6-a213-57e031f731a1@github.com> TestAllocateHeapAt.java expects that creating the heap file works in the current directory (`.`). But when using persistent hugepages (-XX:+UseLargePages), this would require the file system to be a HugeTLBFS. I propose that we do not allow running these tests with persistent hugepages.
------------- Commit messages: - 8340146: ZGC: TestAllocateHeapAt.java should not run with UseLargePages Changes: https://git.openjdk.org/jdk/pull/21127/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21127&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340146 Stats: 5 lines in 3 files changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21127.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21127/head:pull/21127 PR: https://git.openjdk.org/jdk/pull/21127 From aboldtch at openjdk.org Mon Sep 23 07:32:47 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 23 Sep 2024 07:32:47 GMT Subject: RFR: 8340419: ZGC: Create an UseLargePages adaptation of TestAllocateHeapAt.java Message-ID: [JDK-8340146](https://bugs.openjdk.org/browse/JDK-8340146) / #21127 disables TestAllocateHeapAt.java when running with persistent hugepages because it makes assumptions about the underlying file system. I propose creating a version of this test which instead first checks whether there is an appropriate mount point for a persistent hugepages heap file, and only runs the test if one exists.
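The precondition of the proposed test, finding a suitable mount point, could look roughly like this C++ sketch. It scans text in `/proc/mounts` format (device, mount point, file system type, options, ...) for the first `hugetlbfs` entry. It is an illustration only: the actual jtreg test is written in Java, and `find_hugetlbfs_mount` is a hypothetical name. A real check may also need to match the mount's `pagesize=` option against the configured large page size, and to decode the octal escapes (such as `\040` for a space) that `/proc/mounts` uses in paths.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Returns the first mount point whose file system type is hugetlbfs, given
// text in /proc/mounts format: "device mountpoint fstype options dump pass".
// Octal escapes in paths (e.g. "\040") are not decoded in this sketch.
std::optional<std::string> find_hugetlbfs_mount(const std::string& mounts) {
  std::istringstream in(mounts);
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string device, mount_point, fs_type;
    if ((fields >> device >> mount_point >> fs_type) && fs_type == "hugetlbfs") {
      return mount_point;
    }
  }
  return std::nullopt;  // no suitable mount point: the test would be skipped
}
```

On Linux, one would pass it the contents of `/proc/mounts` (for example read with `std::ifstream` and `std::istreambuf_iterator<char>`) and skip the test when it returns `std::nullopt`.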
------------- Commit messages: - 8340419: ZGC: Create an UseLargePages adaptation of TestAllocateHeapAt.java Changes: https://git.openjdk.org/jdk/pull/21128/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21128&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340419 Stats: 91 lines in 1 file changed: 91 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21128.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21128/head:pull/21128 PR: https://git.openjdk.org/jdk/pull/21128 From tschatzl at openjdk.org Mon Sep 23 07:33:36 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 23 Sep 2024 07:33:36 GMT Subject: RFR: 8340146: ZGC: TestAllocateHeapAt.java should not run with UseLargePages In-Reply-To: <1AOCA9rqJqTywzPzJ7jtmw4SJ01kE1LAfvmG1otOH-U=.e17bd4bd-221b-4ae6-a213-57e031f731a1@github.com> References: <1AOCA9rqJqTywzPzJ7jtmw4SJ01kE1LAfvmG1otOH-U=.e17bd4bd-221b-4ae6-a213-57e031f731a1@github.com> Message-ID: <55EUODB7OmULTHMww9vJX5MrU3_tuMXrwrlm9EsxeiU=.6d6fbdcd-0f35-4381-b2d9-fe6da69c9884@github.com> On Mon, 23 Sep 2024 07:22:44 GMT, Axel Boldt-Christmas wrote: > TestAllocateHeapAt.java expects that creating the heap file works in the current directory (`.`). But when using persistent hugepages (-XX:+UseLargePages), this would require the file system to be a HugeTLBFS. > > I propose that we do not allow running these tests with persistent hugepages. lgtm. ------------- Marked as reviewed by tschatzl (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21127#pullrequestreview-2321326839 From stefank at openjdk.org Mon Sep 23 07:40:34 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 23 Sep 2024 07:40:34 GMT Subject: RFR: 8340419: ZGC: Create an UseLargePages adaptation of TestAllocateHeapAt.java In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:28:15 GMT, Axel Boldt-Christmas wrote: > [JDK-8340146](https://bugs.openjdk.org/browse/JDK-8340146) / #21127 disables TestAllocateHeapAt.java when running with persistent hugepages because it makes assumptions about the underlying file system. > > I propose creating a version of this test which instead first checks whether there is an appropriate mount point for a persistent hugepages heap file, and only runs the test if one exists. Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21128#pullrequestreview-2321340466 From stefank at openjdk.org Mon Sep 23 07:41:34 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 23 Sep 2024 07:41:34 GMT Subject: RFR: 8340146: ZGC: TestAllocateHeapAt.java should not run with UseLargePages In-Reply-To: <1AOCA9rqJqTywzPzJ7jtmw4SJ01kE1LAfvmG1otOH-U=.e17bd4bd-221b-4ae6-a213-57e031f731a1@github.com> References: <1AOCA9rqJqTywzPzJ7jtmw4SJ01kE1LAfvmG1otOH-U=.e17bd4bd-221b-4ae6-a213-57e031f731a1@github.com> Message-ID: On Mon, 23 Sep 2024 07:22:44 GMT, Axel Boldt-Christmas wrote: > TestAllocateHeapAt.java expects that creating the heap file works in the current directory (`.`). But when using persistent hugepages (-XX:+UseLargePages), this would require the file system to be a HugeTLBFS. > > I propose that we do not allow running these tests with persistent hugepages. Marked as reviewed by stefank (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/21127#pullrequestreview-2321341392 From rcastanedalo at openjdk.org Mon Sep 23 07:48:16 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 23 Sep 2024 07:48:16 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v25] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 46 additional commits since the last revision: - Merge jdk-24+16 - Ensure that detected encode-and-store patterns are matched - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion - Remove redundant comment - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion - Restore some asserts - Default values for tmp regs of G1PostBarrierStubC2 - 8334060: [arm32] Implementation of Late Barrier Expansion for G1 - ... and 36 more: https://git.openjdk.org/jdk/compare/bdb0e33c...47c982ba ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/d54d67f1..47c982ba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=23-24 Stats: 170497 lines in 1328 files changed: 155223 ins; 8073 del; 7201 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Mon Sep 23 07:57:52 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 23 Sep 2024 07:57:52 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v25] In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Fri, 13 Sep 2024 22:51:59 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request with a new target base due
to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 46 additional commits since the last revision: >> >> - Merge jdk-24+16 >> - Ensure that detected encode-and-store patterns are matched >> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion >> - Remove redundant comment >> - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms >> - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency >> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion >> - Restore some asserts >> - Default values for tmp regs of G1PostBarrierStubC2 >> - 8334060: [arm32] Implementation of Late Barrier Expansion for G1 >> - ... and 36 more: https://git.openjdk.org/jdk/compare/da906826...47c982ba > > src/hotspot/share/opto/matcher.cpp line 1821: > >> 1819: if( rule >= _END_INST_CHAIN_RULE || rule < _BEGIN_INST_CHAIN_RULE ) { >> 1820: assert(C->node_arena()->contains(s->_leaf) || !has_new_node(s->_leaf), >> 1821: "duplicating node that's already been matched"); > > Why was it removed? The assertion was failing because it was too strict in several cases where the matcher would generate valid code anyway. One of them is when `is_encode_and_store_pattern(n, m)` returns true but `m -> n` cannot be matched by a single `g1EncodePAndStoreN` instruction. Commit 9ad158b6 removes this case by ensuring that `is_encode_and_store_pattern(n, m)` holds only if `m -> n` can indeed be matched. There are other cases (all of them harmless as far as I can see) in which this assertion can fail. I am investigating whether they can be avoided so that the assertion can be restored, and what the impact would be on the "redundant decompression removal" (`g1EncodePAndStoreN`) optimization.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1770925777 From kbarrett at openjdk.org Mon Sep 23 08:05:40 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 23 Sep 2024 08:05:40 GMT Subject: RFR: 8340573: Remove unused G1ParScanThreadState::_partial_objarray_chunk_size In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:12:49 GMT, Thomas Schatzl wrote: >> Please review this trivial change to remove unused G1ParScanThreadState::_partial_objarray_chunk_size. >> >> Testing: local (linux-x64) clean build > > Lgtm and trivial. Thanks for reviewing, @tschatzl ------------- PR Comment: https://git.openjdk.org/jdk/pull/21117#issuecomment-2367484521 From kbarrett at openjdk.org Mon Sep 23 08:05:41 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 23 Sep 2024 08:05:41 GMT Subject: Integrated: 8340573: Remove unused G1ParScanThreadState::_partial_objarray_chunk_size In-Reply-To: References: Message-ID: <-JHhJl5HdvVcX0v4ZpflLd9Kog9RYBnr0mHAHJ8f-RI=.bee5f54e-dd64-417f-86b0-cbf0f88f272b@github.com> On Sat, 21 Sep 2024 23:34:24 GMT, Kim Barrett wrote: > Please review this trivial change to remove unused G1ParScanThreadState::_partial_objarray_chunk_size. > > Testing: local (linux-x64) clean build This pull request has now been integrated. 
Changeset: a07052e8 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/a07052e83d20e107f21fd0d266ab638043531c8a Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod 8340573: Remove unused G1ParScanThreadState::_partial_objarray_chunk_size Reviewed-by: tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/21117 From duke at openjdk.org Mon Sep 23 12:14:38 2024 From: duke at openjdk.org (Joel Sikström) Date: Mon, 23 Sep 2024 12:14:38 GMT Subject: RFR: 8339161: ZGC: Remove unused remembered sets In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 18:38:56 GMT, Axel Boldt-Christmas wrote: >> In ZGC, when a page becomes old it needs a remembered set (remset), which stores 2 bits per 8-byte word, a memory overhead of 3.125% (2/64) for each page that has an allocated remset. >> >> When an old page is potentially freed and inserted into the page cache, it can later be reused as a young page. In this case, the remset is still allocated even though the young page does not need it. This is especially noteworthy for long-running programs where pages are recycled long enough that close to all pages end up with an allocated remset. >> >> The attached plot shows remset memory usage for pages that are "live" for a program that frequently recycles pages using a cache mechanism. As remsets for young pages are unused, that memory should be considered wasted. >> >> ![remset_waste](https://github.com/user-attachments/assets/2a60948b-9297-4554-8fb4-d9f527855c33) >> >> The alternative solution that I propose in this PR deletes/frees remsets when an old page is inserted into the page cache, to avoid wasting memory. This would mean that remsets are only stored for old pages that are in use. Pages that are not in use, or are young, should not have an allocated remset. With this change, the line showing remset usage for young pages would disappear from the plot above, and the total memory usage would be dictated by the number of "live" old pages.
>> >> Below is a performance measurement of initializing vs. clearing remsets in the same cache program mentioned above. When deleting remsets, freeing is done by GC threads, so there is no latency impact on mutators. Initializing remsets is on average ~2.3x slower than clearing, but the approach also uses 3.125% less memory for pages that do not need a remset. ~89% of the measured initializations are made by GC threads and the rest by mutator threads.
>>
>> | Operation    | min (ms) | max (ms) | mean (ms)  |
>> | ------------ | -------- | -------- | ---------- |
>> | remset init  | 0.000292 | 0.706    | 0.00258083 |
>> | remset clear | 0.000082 | 0.015    | 0.00111340 |
>>
>> Tested with tiers 1-7 and a local test making sure there are no remsets for young pages. SPECjbb2015 performance measurements show no statistically significant regression/improvement. > > lgtm. Nicely done. Thank you for the reviews! @xmas92 @stefank ------------- PR Comment: https://git.openjdk.org/jdk/pull/20947#issuecomment-2368030643 From duke at openjdk.org Mon Sep 23 12:14:39 2024 From: duke at openjdk.org (duke) Date: Mon, 23 Sep 2024 12:14:39 GMT Subject: RFR: 8339161: ZGC: Remove unused remembered sets In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 12:15:47 GMT, Joel Sikström wrote: > In ZGC, when a page becomes old it needs a remembered set (remset), which stores 2 bits per 64-bit heap word, a memory overhead of 3.125% (2/64) for each page that has an allocated remset. > > When an old page is potentially freed and inserted into the page cache, it can later be reused as a young page. In this case, the remset is still allocated even though the young page does not need it. This is especially noteworthy for long-running programs, where pages are recycled for long enough that close to all pages end up with an allocated remset. > > The attached plot shows remset memory usage for "live" pages in a program that frequently recycles pages using a cache mechanism.
As remsets for young pages are unused, that memory should be considered wasted. > > ![remset_waste](https://github.com/user-attachments/assets/2a60948b-9297-4554-8fb4-d9f527855c33) > > The alternative solution that I propose in this PR deletes/frees remsets when an old page is inserted into the page cache, so as not to waste memory. This would mean that remsets are only stored for old pages that are in use. Pages that are not in use, or are young, should not have an allocated remset. With this change, the line showing remset usage for young pages would disappear from the plot above, and the total memory usage would be dictated by the number of "live" old pages. > > Below is a performance measurement of initializing vs. clearing remsets in the same cache program mentioned above. When deleting remsets, freeing is done by GC threads, so there is no latency impact on mutators. Initializing remsets is on average ~2.3x slower than clearing, but the approach also uses 3.125% less memory for pages that do not need a remset. ~89% of the measured initializations are made by GC threads and the rest by mutator threads.
>
> | Operation    | min (ms) | max (ms) | mean (ms)  |
> | ------------ | -------- | -------- | ---------- |
> | remset init  | 0.000292 | 0.706    | 0.00258083 |
> | remset clear | 0.000082 | 0.015    | 0.00111340 |
>
> Tested with tiers 1-7 and a local test making sure there are no remsets for young pages. SPECjbb2015 performance measurements show no statistically significant regression/improvement.
@jsikstro Your change (at version af01efcb9fb9567bf1aec73eca91c987626cbe8a) is now ready to be sponsored by a Committer.
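The 3.125% (2/64) figure in the description works out if a remset consists of two bitmaps (for example, a current and a previous one), each holding one bit per 8-byte heap word. The C++ sketch below reproduces that arithmetic under this assumption; the constants, the function names and the 2 MB example page size are illustrative, not taken from ZGC's sources.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout (illustrative): two bitmaps per remset, each with one bit
// per 8-byte heap word.
constexpr std::size_t kBytesPerWord = 8;
constexpr std::size_t kBitmapsPerRemset = 2;

// Size in bytes of the remset for a page of `page_bytes` bytes.
constexpr std::size_t remset_bytes(std::size_t page_bytes) {
  return kBitmapsPerRemset * (page_bytes / kBytesPerWord) / 8;  // 8 bits per byte
}

// Overhead relative to the page size: 2 bits per 64 bits = 2/64 = 3.125%.
constexpr double remset_overhead(std::size_t page_bytes) {
  return static_cast<double>(remset_bytes(page_bytes)) / page_bytes;
}

// Example: a 2 MB page needs 2 * (2 MB / 8) bits = 64 KB of remset.
static_assert(remset_bytes(2 * 1024 * 1024) == 64 * 1024, "64 KB for a 2 MB page");
```

Under these assumptions, every cached or young 2 MB page that keeps its remset holds on to 64 KB that it never uses, which is the memory the change stops retaining.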
------------- PR Comment: https://git.openjdk.org/jdk/pull/20947#issuecomment-2368035079 From duke at openjdk.org Mon Sep 23 12:31:41 2024 From: duke at openjdk.org (Joel Sikström) Date: Mon, 23 Sep 2024 12:31:41 GMT Subject: Integrated: 8339161: ZGC: Remove unused remembered sets In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 12:15:47 GMT, Joel Sikström wrote: > In ZGC, when a page becomes old it needs a remembered set (remset), which stores 2 bits per 64-bit heap word, a memory overhead of 3.125% (2/64) for each page that has an allocated remset. > > When an old page is potentially freed and inserted into the page cache, it can later be reused as a young page. In this case, the remset is still allocated even though the young page does not need it. This is especially noteworthy for long-running programs, where pages are recycled for long enough that close to all pages end up with an allocated remset. > > The attached plot shows remset memory usage for "live" pages in a program that frequently recycles pages using a cache mechanism. As remsets for young pages are unused, that memory should be considered wasted. > > ![remset_waste](https://github.com/user-attachments/assets/2a60948b-9297-4554-8fb4-d9f527855c33) > > The alternative solution that I propose in this PR deletes/frees remsets when an old page is inserted into the page cache, so as not to waste memory. This would mean that remsets are only stored for old pages that are in use. Pages that are not in use, or are young, should not have an allocated remset. With this change, the line showing remset usage for young pages would disappear from the plot above, and the total memory usage would be dictated by the number of "live" old pages. > > Below is a performance measurement of initializing vs. clearing remsets in the same cache program mentioned above. When deleting remsets, freeing is done by GC threads, so there is no latency impact on mutators.
Initializing remsets is on average ~2.3x slower than clearing, but the approach also uses 3.125% less memory for pages that do not need a remset. ~89% of the measured initializations are made by GC threads and the rest by mutator threads.
>
> | Operation    | min (ms) | max (ms) | mean (ms)  |
> | ------------ | -------- | -------- | ---------- |
> | remset init  | 0.000292 | 0.706    | 0.00258083 |
> | remset clear | 0.000082 | 0.015    | 0.00111340 |
>
> Tested with tiers 1-7 and a local test making sure there are no remsets for young pages. SPECjbb2015 performance measurements show no statistically significant regression/improvement.
This pull request has now been integrated. Changeset: 37ec80df Author: Joel Sikström Committer: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/37ec80df8d3b014292fc3d31a1b2aad4e8218ea5 Stats: 95 lines in 7 files changed: 1 ins; 67 del; 27 mod 8339161: ZGC: Remove unused remembered sets Reviewed-by: aboldtch, stefank ------------- PR: https://git.openjdk.org/jdk/pull/20947 From zgu at openjdk.org Mon Sep 23 12:37:36 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Mon, 23 Sep 2024 12:37:36 GMT Subject: RFR: 8340408: Shenandoah: Remove redundant task stats printing code in ShenandoahTaskQueue In-Reply-To: References: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com> Message-ID: On Thu, 19 Sep 2024 21:46:43 GMT, William Kemper wrote: >> [JDK-8280397](https://bugs.openjdk.org/browse/JDK-8280397) made the code redundant. >> >> Adopt shared implementation.
>
> src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 228:
>
>> 226: finish_mark_work();
>> 227: assert(task_queues()->is_empty(), "Should be empty");
>> 228: TASKQUEUE_STATS_ONLY(task_queues()->print_and_reset_taskqueue_stats(""));
>
> Could we pass `"Finish Mark"` for the label here?

The label is used for queue names in other GCs, instead of phases. I passed an empty string to be consistent with the old label.
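`TASKQUEUE_STATS_ONLY(...)` in the quoted code acts as a conditional-compilation wrapper: its argument is only compiled in when task-queue statistics are enabled in the build. The C++ sketch below shows the general pattern, plus the effect of an empty label versus a named one. `DEMO_STATS`, `DEMO_STATS_ONLY` and `DemoQueueSet` are hypothetical names, and the output format is an assumption, not HotSpot's.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical build switch; compile with -DDEMO_STATS=0 to drop the stats code.
#ifndef DEMO_STATS
#define DEMO_STATS 1
#endif

#if DEMO_STATS
#define DEMO_STATS_ONLY(code) code
#else
#define DEMO_STATS_ONLY(code)
#endif

// Hypothetical queue set carrying two counters.
struct DemoQueueSet {
  long pushes = 0;
  long pops = 0;

  // Prints the counters, prefixed by the label unless it is empty, then resets
  // them. Returns the printed line so callers can inspect it.
  std::string print_and_reset_stats(const char* label) {
    char buf[128];
    std::snprintf(buf, sizeof(buf), "%s%spushes=%ld pops=%ld", label,
                  label[0] != '\0' ? ": " : "", pushes, pops);
    std::puts(buf);
    pushes = 0;
    pops = 0;
    return std::string(buf);
  }
};
```

When `DEMO_STATS` is 0, a call such as `DEMO_STATS_ONLY(q.print_and_reset_stats(""));` expands to an empty statement, so release builds pay nothing for the statistics.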
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21077#discussion_r1771319732 From rkennke at openjdk.org Mon Sep 23 14:30:41 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 23 Sep 2024 14:30:41 GMT Subject: RFR: 8340183: Shenandoah: Incorrect match for clone barrier in is_gc_barrier_node In-Reply-To: References: Message-ID: <56gDsD_PGk6_iCgfzeI2NIEC-FpUrjRyW8WUzKy5oXs=.5965cafb-3926-4134-88e6-6d9cf72fef1d@github.com> On Mon, 16 Sep 2024 10:35:15 GMT, Aleksey Shipilev wrote: > The name of the call we emit is "shenandoah_clone": > https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L806 > > ...yet we test for "shenandoah_clone_barrier" here: > https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L688 > > I think we are better off polling the call target instead of relying on call name. This change also eliminates `shenandoah_cas_obj` matcher, for which we do not have the emitted call ever since we started doing CAS expansions inline. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [ ] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC` Looks good, thank you! ------------- Marked as reviewed by rkennke (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21014#pullrequestreview-2322461615 From shade at openjdk.org Mon Sep 23 14:35:41 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Sep 2024 14:35:41 GMT Subject: RFR: 8340183: Shenandoah: Incorrect match for clone barrier in is_gc_barrier_node In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 10:35:15 GMT, Aleksey Shipilev wrote: > The name of the call we emit is "shenandoah_clone": > https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L806 > > ...yet we test for "shenandoah_clone_barrier" here: > https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L688 > > I think we are better off polling the call target instead of relying on call name. This change also eliminates `shenandoah_cas_obj` matcher, for which we do not have the emitted call ever since we started doing CAS expansions inline. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [ ] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC` Thanks! 
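The bug fixed by 8340183 is a name mismatch: the emitted call is named "shenandoah_clone", but the matcher looked for "shenandoah_clone_barrier". The self-contained C++ sketch below contrasts matching a call by name with matching it by call target. `CallNode`, `clone_barrier` and the other names are hypothetical stand-ins, not C2's classes or Shenandoah's runtime entry points.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical runtime entry points standing in for GC barrier stubs.
// The bodies differ so that a linker cannot fold them into one address.
static int g_clone_calls = 0;
static int g_other_calls = 0;
void clone_barrier() { ++g_clone_calls; }
void other_runtime_call() { ++g_other_calls; }

using Entry = void (*)();

// Simplified stand-in for a runtime call node: a target and a debug name.
struct CallNode {
  Entry target;
  const char* name;
};

// Fragile: only works if the string matches what the emitter used.
bool is_clone_barrier_by_name(const CallNode& call) {
  return std::strcmp(call.name, "shenandoah_clone_barrier") == 0;
}

// Robust: compares the actual call target, which cannot drift out of sync.
bool is_clone_barrier_by_target(const CallNode& call) {
  return call.target == &clone_barrier;
}
```

A call emitted as `CallNode{&clone_barrier, "shenandoah_clone"}` is missed by the name check but found by the target check. This mirrors the mismatch described in the PR.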
------------- PR Comment: https://git.openjdk.org/jdk/pull/21014#issuecomment-2368467997 From shade at openjdk.org Mon Sep 23 14:35:42 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Sep 2024 14:35:42 GMT Subject: Integrated: 8340183: Shenandoah: Incorrect match for clone barrier in is_gc_barrier_node In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 10:35:15 GMT, Aleksey Shipilev wrote: > The name of the call we emit is "shenandoah_clone": > https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L806 > > ...yet we test for "shenandoah_clone_barrier" here: > https://github.com/openjdk/jdk/blob/545951889c1ea68646be600decaf2bf4c049600b/src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp#L688 > > I think we are better off polling the call target instead of relying on call name. This change also eliminates `shenandoah_cas_obj` matcher, for which we do not have the emitted call ever since we started doing CAS expansions inline. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [ ] Linux x86_64 server fastdebug, `all` with `-XX:+UseShenandoahGC` This pull request has now been integrated. 
Changeset: ea8f35b9 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/ea8f35b98e618bfa55371e45b3ef61fa5289dd94 Stats: 20 lines in 3 files changed: 6 ins; 10 del; 4 mod 8340183: Shenandoah: Incorrect match for clone barrier in is_gc_barrier_node Reviewed-by: roland, rkennke ------------- PR: https://git.openjdk.org/jdk/pull/21014 From kirk at kodewerk.com Mon Sep 23 15:48:22 2024 From: kirk at kodewerk.com (Kirk Pepperdine) Date: Mon, 23 Sep 2024 08:48:22 -0700 Subject: Aligning the Serial collector with ZGC Message-ID: <49D0FB58-94AB-46B6-A8A6-F9F08773770E@kodewerk.com> Hi, I wanted to surface to the mailing list that we've taken on the task of adding Automated Heap Sizing (AHS), as has been introduced into ZGC (and is currently being introduced into G1, https://github.com/openjdk/jdk/pull/20783), into the Serial collector. The goals of this effort are modeled after the goals for ZGC, and we plan to borrow as much as possible (or as much as makes sense). For example, we would like to alter the default settings for -Xmx and -Xms. Instead of 1/4 of available RAM, the default MaxHeapSize would be set to all available RAM. The collector will make use of memory and CPU pressure, similar to what was introduced in ZGC, to control heap expansion and contraction. Current sizing ergonomics is based on the number of non-daemon threads. Altering this is expected to give the Serial collector a more dynamic ability to uncommit memory no longer in use (and thus be more memory efficient when running in a container). The flags SoftMaxHeapSize and SerialPressure, as well as the level of global memory pressure, would be used to help guide ergonomic choices. This new ergonomic choice should work to minimize GC overhead while avoiding becoming an OOM victim. As part of this, the goal is to provide enough memory, but not at all costs. We see this work being broken down into several steps.
Very roughly, the steps would be:

- Introduce an adaptive size policy that takes into account memory and CPU pressure along with global memory pressure.
- The heap should be large enough to minimize GC overhead, but not so large that it triggers an OOM.
- Introduce -XX:SerialPressure=[0-100] to support this work.
- Introduce a smoothing algorithm to avoid excessive small resizes.
- Introduce the manageable flag SoftMaxHeapSize to define a target heap size, and set the default max heap size to 100% of available RAM.
- Add in the ability to uncommit memory (to reduce global memory pressure).

While working through the details of this work, I noted that there appear to be opportunities to offer new defaults for other settings. For example, when tuning GC I've found that it's best to set the max heap size to the sum of the sizes of Eden, Survivor, and Tenured. The reasoning is that each of these spaces serves a specific purpose in managing object lifecycles and, as such, (with few exceptions) each makes use of a different metric to guide its sizing. Also, unlike ZGC, there is an overhead penalty for having an oversized tenured space. Consequently, there is a sizing sweet spot where overhead is minimized. Too small, and overhead and GC cycle frequency will be very high. As the heap increases past the optimal size, you will tend to see a gradual degradation in GC performance.

For Eden the guiding metric is allocation rate. For Survivor it's life cycle (age table). For Tenured it's live set size. Using these metrics to determine the size of each part, and then using those sizes to calculate a max heap size, has almost always yielded lower GC overhead than setting a heap size and then letting ratios size everything. This may be a separate piece of work, but the intent would be to have ergonomics calculate optimal Eden, Survivor and Tenured sizes. Each young collection is an opportunity to resize Eden and Survivor, whereas a full collection would be used to resize Eden, Survivor and Tenured.
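To make the proposed direction more concrete, here is a hypothetical C++ sketch. It derives Eden from the allocation rate, Survivor from the bytes surviving per the age table, and Tenured from the live set, and sums them into a heap size. It then smooths resize decisions with an exponential moving average plus a dead-band. The formulas, the one-second young-GC interval target, the 1.5x tenured headroom, the EMA weight and the dead-band are all illustrative assumptions; none of them come from the proposal or from HotSpot's Serial collector.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical per-collection inputs.
struct GcMetrics {
  double alloc_rate_bytes_per_sec;  // drives Eden
  double surviving_bytes;           // from the age table, drives Survivor
  double live_set_bytes;            // measured after a full GC, drives Tenured
};

struct GenSizes {
  double eden;
  double survivor;
  double tenured;
  // The max heap size is derived as the sum of the parts.
  double total() const { return eden + survivor + tenured; }
};

// Illustrative policy knobs (assumptions, not proposed defaults).
constexpr double kTargetYoungIntervalSec = 1.0;  // about one young GC per second
constexpr double kTenuredHeadroom = 1.5;         // 50% free space above the live set

GenSizes target_sizes(const GcMetrics& m) {
  return GenSizes{m.alloc_rate_bytes_per_sec * kTargetYoungIntervalSec,
                  m.surviving_bytes,
                  m.live_set_bytes * kTenuredHeadroom};
}

// Exponential moving average plus a relative dead-band: the committed size
// only changes when the smoothed target drifts far enough away from it.
class SmoothedSize {
 public:
  SmoothedSize(double initial, double weight, double dead_band)
      : current_(initial), avg_(initial), weight_(weight), dead_band_(dead_band) {}

  double update(double target) {
    avg_ = weight_ * target + (1.0 - weight_) * avg_;
    if (std::fabs(avg_ - current_) > dead_band_ * current_) {
      current_ = avg_;  // change is large enough: resize
    }
    return current_;  // otherwise keep the current size
  }

 private:
  double current_;
  double avg_;
  double weight_;     // weight of the newest sample, in (0, 1]
  double dead_band_;  // relative change below which no resize happens
};
```

A dead-band like this is one simple way to avoid many small resizes. The actual smoothing algorithm for Serial is still to be designed.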
This may lead to the need to ignore NewRatio and (the soft target) MaxGCPauseMillis. As for testing, I'm currently looking at modifying HyperAlloc to add the ability to alter the shape of the load on the collector over time. All of this is still in its infancy, and we're open to guidance and input. As for the work on G1, an initial patch has been submitted (URL above) and is open for comments. Kind regards, Kirk From wkemper at openjdk.org Mon Sep 23 17:17:34 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 23 Sep 2024 17:17:34 GMT Subject: RFR: 8340408: Shenandoah: Remove redundant task stats printing code in ShenandoahTaskQueue In-Reply-To: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com> References: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com> Message-ID: On Thu, 19 Sep 2024 00:33:04 GMT, Zhengyu Gu wrote: > [JDK-8280397](https://bugs.openjdk.org/browse/JDK-8280397) made the code redundant. > > Adopt shared implementation. Marked as reviewed by wkemper (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21077#pullrequestreview-2322876022 From wkemper at openjdk.org Mon Sep 23 17:17:35 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 23 Sep 2024 17:17:35 GMT Subject: RFR: 8340408: Shenandoah: Remove redundant task stats printing code in ShenandoahTaskQueue In-Reply-To: References: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com> Message-ID: On Mon, 23 Sep 2024 12:35:16 GMT, Zhengyu Gu wrote:
>> src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 228:
>>
>>> 226: finish_mark_work();
>>> 227: assert(task_queues()->is_empty(), "Should be empty");
>>> 228: TASKQUEUE_STATS_ONLY(task_queues()->print_and_reset_taskqueue_stats(""));
>>
>> Could we pass `"Finish Mark"` for the label here?
> > The label is used for queue names in other GCs, instead of phases. I passed an empty string to be consistent with the old label. Okay. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21077#discussion_r1771815824 From shade at openjdk.org Mon Sep 23 17:26:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Sep 2024 17:26:37 GMT Subject: RFR: 8340408: Shenandoah: Remove redundant task stats printing code in ShenandoahTaskQueue In-Reply-To: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com> References: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com> Message-ID: On Thu, 19 Sep 2024 00:33:04 GMT, Zhengyu Gu wrote: > [JDK-8280397](https://bugs.openjdk.org/browse/JDK-8280397) made the code redundant. > > Adopt shared implementation. Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21077#pullrequestreview-2322894077 From aboldtch at openjdk.org Tue Sep 24 05:37:38 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 24 Sep 2024 05:37:38 GMT Subject: RFR: 8340146: ZGC: TestAllocateHeapAt.java should not run with UseLargePages In-Reply-To: <1AOCA9rqJqTywzPzJ7jtmw4SJ01kE1LAfvmG1otOH-U=.e17bd4bd-221b-4ae6-a213-57e031f731a1@github.com> References: <1AOCA9rqJqTywzPzJ7jtmw4SJ01kE1LAfvmG1otOH-U=.e17bd4bd-221b-4ae6-a213-57e031f731a1@github.com> Message-ID: On Mon, 23 Sep 2024 07:22:44 GMT, Axel Boldt-Christmas wrote: > TestAllocateHeapAt.java expects that creating the heap file works in the current directory (`.`). But when using persistent hugepages (-XX:+UseLargePages) this would require the filesystem to be a HugeTLBFS. > > I propose that we do not allow running these tests with persistent hugepages. Thanks for the reviews.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21127#issuecomment-2370228388 From aboldtch at openjdk.org Tue Sep 24 05:37:38 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 24 Sep 2024 05:37:38 GMT Subject: Integrated: 8340146: ZGC: TestAllocateHeapAt.java should not run with UseLargePages In-Reply-To: <1AOCA9rqJqTywzPzJ7jtmw4SJ01kE1LAfvmG1otOH-U=.e17bd4bd-221b-4ae6-a213-57e031f731a1@github.com> References: <1AOCA9rqJqTywzPzJ7jtmw4SJ01kE1LAfvmG1otOH-U=.e17bd4bd-221b-4ae6-a213-57e031f731a1@github.com> Message-ID: On Mon, 23 Sep 2024 07:22:44 GMT, Axel Boldt-Christmas wrote: > TestAllocateHeapAt.java expects that creating the heap file works in the current directory (`.`). But when using persistent hugepages (-XX:+UseLargePages) this would require the filesystem to be a HugeTLBFS. > > I propose that we do not allow running these tests with persistent hugepages. This pull request has now been integrated. Changeset: 4098acc2 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/4098acc200e608369ac1631dcc8513ea797bd59e Stats: 5 lines in 3 files changed: 3 ins; 0 del; 2 mod 8340146: ZGC: TestAllocateHeapAt.java should not run with UseLargePages Reviewed-by: tschatzl, stefank ------------- PR: https://git.openjdk.org/jdk/pull/21127 From rcastanedalo at openjdk.org Tue Sep 24 09:01:53 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 24 Sep 2024 09:01:53 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> Message-ID: On Fri, 20 Sep 2024 15:26:36 GMT, Roman Kennke wrote: >> I have tried for a few hours now to reproduce this using a custom test case, with no success. >> I am pretty sure that this can happen; that is why I added this code. Originally I had an assert there asserting that the index is not used.
I do remember that this happens very rarely, but I don't remember the exact condition. Looking at the possible operands in opclass memory, I think this can only happen when we load an nKlass from an address of the form [rX, rY], i.e. the address in rX indexed by rY. This is an odd thing to happen for loadNKlass, I think, because rY should always be klass_offset_in_bytes. Maybe this is possible when we get odd address merges where we get a PhiNode as the offset/index? I don't know. >> I agree, this *might* lead to surprising problems with implicit null-checking, if it is expected that the first instruction in loadNKlass provokes the SIGSEGV. A way around this would be to declare an opclass that is a subset of 'memory' that excludes all operands with an index, and match on that. I think this would force the lea as a separate instruction and ensure that we never see such a thing in loadNKlass. However, I would not feel very confident doing that without a reproducer. Let me dig a little further. >> >> For reference, here is my unsuccessful reproducer: https://gist.github.com/rkennke/8a57610d74fcde07a9390f268ec35738 > > Something like this is what I have in mind. It seems to pass tier1 tests. I still haven't managed to reproduce the path that requires an index register, though. > https://github.com/rkennke/jdk/commit/2c4a7877e4ef94017c8155578d8cfc9342441656 Thanks for the update! If there is a path requiring an index register, I would agree with limiting the memory opclass to exclude indices as you suggest.
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1772945253 From rkennke at openjdk.org Tue Sep 24 11:42:30 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 24 Sep 2024 11:42:30 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v24] In-Reply-To: References: Message-ID: <7N9vxRKxAK2GCBNlnU5E0Bj0sGV6_T-2QX9fKCCxlWg=.bdee038b-cee3-4c52-825c-d381d3616092@github.com> > This is the main body of JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes:
> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then become on-by-default and diagnostic (?), and eventually be deprecated and obsoleted. However, there are a few unknowns in that plan; specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays will now store their length at offset 8.
> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
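The header arithmetic described in this PR can be checked with a small C++ sketch. It extracts a 22-bit narrow Klass* from the top of the 64-bit mark word, and it encodes a full-GC forwarding pointer as a 40-bit offset in 8-byte heap words, which gives 2^40 words * 8 bytes = 2^43 bytes = 8 TB of encodable heap. The bit position of the forwarding field, the "forwarded" tag value and all names are hypothetical; this is not HotSpot's markWord or forwarding implementation.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (illustrative): the narrow Klass* occupies the upper 22 bits
// of the 64-bit mark word.
constexpr unsigned kKlassBits = 22;
constexpr unsigned kKlassShift = 64 - kKlassBits;  // 42

constexpr std::uint32_t narrow_klass(std::uint64_t mark) {
  return static_cast<std::uint32_t>(mark >> kKlassShift);
}

// Hypothetical full-GC forwarding encoding: a 40-bit offset, counted in 8-byte
// heap words from the heap base, stored just above the 2 lock bits.
constexpr unsigned kFwdBits = 40;
constexpr unsigned kFwdShift = 2;
constexpr std::uint64_t kBytesPerWord = 8;
constexpr std::uint64_t kFwdMask = ((std::uint64_t{1} << kFwdBits) - 1) << kFwdShift;
constexpr std::uint64_t kForwardedTag = 3;  // hypothetical lock-bit pattern
constexpr std::uint64_t kLowBitsMask = (std::uint64_t{1} << kKlassShift) - 1;

// 2^40 words * 8 bytes = 2^43 bytes = 8 TB of encodable heap.
constexpr std::uint64_t kMaxEncodableHeap = (std::uint64_t{1} << kFwdBits) * kBytesPerWord;
static_assert(kMaxEncodableHeap == (std::uint64_t{1} << 43), "8 TB");
// The forwarding field (bits 2..41) must not overlap the klass bits (42..63).
static_assert(kFwdShift + kFwdBits <= kKlassShift, "fields overlap");

// Keeps the klass bits and replaces everything below them with the forwarding
// offset and tag. Assumes forwardee - heap_base < kMaxEncodableHeap.
constexpr std::uint64_t encode_forwarding(std::uint64_t mark, std::uint64_t heap_base,
                                          std::uint64_t forwardee) {
  return (mark & ~kLowBitsMask) |
         (((forwardee - heap_base) / kBytesPerWord) << kFwdShift) | kForwardedTag;
}

constexpr std::uint64_t decode_forwarding(std::uint64_t mark, std::uint64_t heap_base) {
  return heap_base + ((mark & kFwdMask) >> kFwdShift) * kBytesPerWord;
}
```

The second `static_assert` captures why the klass bits need protecting. Once the upper 22 bits belong to the class pointer, any forwarding encoding has to fit in the remaining low bits, and that is what bounds the heap size this scheme can handle.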
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Improve matching of loadNKlassCompactHeaders on aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/0d8a9236..2c4a7877 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=22-23 Stats: 17 lines in 3 files changed: 5 ins; 5 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From zgu at openjdk.org Tue Sep 24 13:19:43 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Tue, 24 Sep 2024 13:19:43 GMT Subject: Integrated: 8340408: Shenandoah: Remove redundant task stats printing code in ShenandoahTaskQueue In-Reply-To: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com> References: <6PiT6kRNGaQjkJTNknVTqMV14SyYXwFosAz9Ts3Emfk=.46c01286-6733-424c-a0d6-e6ff8aa4726c@github.com> Message-ID: On Thu, 19 Sep 2024 00:33:04 GMT, Zhengyu Gu wrote: > [JDK-8280397](https://bugs.openjdk.org/browse/JDK-8280397) made the code redundant. > > Adopt shared implementation. This pull request has now been integrated. 
Changeset: 279086d4 Author: Zhengyu Gu URL: https://git.openjdk.org/jdk/commit/279086d4ce7e05972e099022e8045f39680dd4e8 Stats: 49 lines in 4 files changed: 0 ins; 47 del; 2 mod 8340408: Shenandoah: Remove redundant task stats printing code in ShenandoahTaskQueue Reviewed-by: shade, wkemper ------------- PR: https://git.openjdk.org/jdk/pull/21077 From erik.osterlund at oracle.com Tue Sep 24 13:28:42 2024 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Tue, 24 Sep 2024 13:28:42 +0000 Subject: Aligning the Serial collector with ZGC In-Reply-To: <49D0FB58-94AB-46B6-A8A6-F9F08773770E@kodewerk.com> References: <49D0FB58-94AB-46B6-A8A6-F9F08773770E@kodewerk.com> Message-ID: Hi Kirk, If we all end up having a -XX:{Z, G1, Shenandoah, Serial, Parallel?}GCPressure=[similar, range] flag to hint to the GC to be more or less aggressive, I wonder if we should try to have just a single GCPressure flag for this instead. What do you think? Kind regards, /Erik On 23 Sep 2024, at 17:48, Kirk Pepperdine wrote: Hi, I wanted to surface to the mailing list that we've taken on the task of adding Automated Heap Sizing (AHS), as has been introduced into ZGC (and is currently being introduced into G1, https://github.com/openjdk/jdk/pull/20783), into the Serial collector. The goals of this effort are modeled after the goals for ZGC, and we plan to borrow as much as possible (or as much as makes sense). For example, we would like to alter the default settings for -Xmx and -Xms. Instead of 1/4 of available RAM, the default MaxHeapSize would be set to all available RAM. The collector will make use of memory and CPU pressure, similar to what was introduced in ZGC, to control heap expansion and contraction. Current sizing ergonomics is based on the number of non-daemon threads. Altering this is expected to give the Serial collector a more dynamic ability to uncommit memory no longer in use (and thus be more memory efficient when running in a container).
The flags SoftMaxHeapSize and SerialPressure, as well as the level of global memory pressure, would be used to help guide ergonomic choices. This new ergonomic choice should work to minimize GC overhead while avoiding becoming an OOM victim. As part of this, the goal is to provide enough memory, but not at all costs. We see this work being broken down into several steps. Very roughly, the steps would be:

- Introduce an adaptive size policy that takes into account memory and CPU pressure along with global memory pressure.
- The heap should be large enough to minimize GC overhead, but not so large that it triggers an OOM.
- Introduce -XX:SerialPressure=[0-100] to support this work.
- Introduce a smoothing algorithm to avoid excessive small resizes.
- Introduce the manageable flag SoftMaxHeapSize to define a target heap size, and set the default max heap size to 100% of available RAM.
- Add in the ability to uncommit memory (to reduce global memory pressure).

While working through the details of this work, I noted that there appear to be opportunities to offer new defaults for other settings. For example, when tuning GC I've found that it's best to set the max heap size to the sum of the sizes of Eden, Survivor, and Tenured. The reasoning is that each of these spaces serves a specific purpose in managing object lifecycles and, as such, (with few exceptions) each makes use of a different metric to guide its sizing. Also, unlike ZGC, there is an overhead penalty for having an oversized tenured space. Consequently, there is a sizing sweet spot where overhead is minimized. Too small, and overhead and GC cycle frequency will be very high. As the heap increases past the optimal size, you will tend to see a gradual degradation in GC performance.

For Eden the guiding metric is allocation rate. For Survivor it's life cycle (age table). For Tenured it's live set size.
Using these metrics to determine the size of each part, and then using those sizes to calculate a max heap size, has almost always yielded lower GC overhead than setting a heap size and then letting ratios size everything. This may be a separate piece of work, but the intent would be to have ergonomics calculate optimal Eden, Survivor and Tenured sizes. Each young collection is an opportunity to resize Eden and Survivor, whereas a full collection would be used to resize Eden, Survivor and Tenured. This may lead to the need to ignore NewRatio and (the soft target) MaxGCPauseMillis. As for testing, I'm currently looking at modifying HyperAlloc to add the ability to alter the shape of the load on the collector over time. All of this is still in its infancy, and we're open to guidance and input. As for the work on G1, an initial patch has been submitted (URL above) and is open for comments. Kind regards, Kirk From kirk at kodewerk.com Tue Sep 24 15:27:43 2024 From: kirk at kodewerk.com (Kirk Pepperdine) Date: Tue, 24 Sep 2024 08:27:43 -0700 Subject: Aligning the Serial collector with ZGC In-Reply-To: References: <49D0FB58-94AB-46B6-A8A6-F9F08773770E@kodewerk.com> Message-ID: Hi Erik, I wasn't sure how committed everyone was to ZGCPressure, especially as it's in the JEP. I also wasn't sure how entangled one would want the flags to be. For example, I'm guessing that a good default value for GCPressure would be 1 or 2 for the Serial collector, whereas for G1 I believe Google has settled on 20 IIRC for their version of this flag. I could see 1 or 2 for the Parallel collector, should it be decided that the work be performed on that collector also. But other than that, my first thought was, maybe this could just be GCPressure.
Kind regards, Kirk > On Sep 24, 2024, at 6:28 AM, Erik Osterlund wrote: > > Hi Kirk, > > I wonder, if we all end up having a -XX:{Z, G1, Shenandoah, Serial, Parallel?}GCPressure=[similar, range] flag to hint to the GC to be more or less aggressive, whether we should try to have just a single GCPressure flag for this instead. What do you think? > > Kind regards, > /Erik > >> On 23 Sep 2024, at 17:48, Kirk Pepperdine wrote: >> >> Hi, >> >> I wanted to surface to the mailing list that we've taken on the task of adding Automated Heap Sizing (AHS), as introduced in ZGC (and currently being introduced into G1, https://github.com/openjdk/jdk/pull/20783), to the Serial collector. The goals of this effort are modeled after the goals for ZGC and we plan to borrow as much as possible (or as much as makes sense). For example, we would like to alter the default settings for -Xmx and -Xms. Instead of 1/4, the default MaxHeapSize would be set to available RAM. The collector will use memory and CPU pressure, similar to what was introduced in ZGC, to control heap expansion and contraction. Current sizing ergonomics is based on the number of non-daemon threads. Altering this is expected to give the Serial collector a more dynamic ability to uncommit memory no longer in use (and thus be more memory efficient when running in a container). The flags SoftMaxHeapSize and SerialPressure, as well as the level of global memory pressure, would be used to help guide ergonomic choices. These new ergonomics should work to minimize GC overhead while avoiding becoming an OOM victim. As part of this, the goal is to provide enough memory, but not at all costs. >> >> We see this work being broken down into several steps. Very roughly, the steps would be: >> >> - Introduce an adaptive size policy that takes into account memory and CPU pressure along with global memory pressure. >> - Heap should be large enough to minimize GC overhead but not large enough to trigger OOM.
>> - Introduce -XX:SerialPressure=[0-100] to support this work. >> - Introduce a smoothing algorithm to avoid excessive small resizes. >> >> - Introduce manageable flag SoftMaxHeapSize to define a target heap size and set the default max heap size to 100% of available memory. >> - Add in the ability to uncommit memory (to reduce global memory pressure). >> >> >> While working through the details of this work I noted that there appear to be opportunities to offer new defaults for other settings. For example, when tuning GC I've found that it's best to set max heap size to the sum of the sizes of Eden, Survivor, and Tenured. The reasoning is that each of these spaces serves a specific purpose in managing object lifecycles and as such (with few exceptions) each makes use of a different metric to guide its sizing. Also, unlike ZGC, there is an overhead penalty for having an oversized tenured space. Consequently there is a sizing sweet spot where overhead will be minimized. Too small, and overhead and GC cycle frequency will be very high. As the heap increases past the optimal size, you will tend to see a gradual degradation in GC performance. >> >> For Eden the guiding metric is allocation rate. For Survivor it's object lifecycle (the age table). For Tenured it's live set size. Using these metrics to determine the size of each part, and then using those sizes to calculate a max heap size, has almost always yielded lower GC overheads than setting a heap size and then letting ratios size everything. This may be a separate piece of work, but the intent would be to have ergonomics calculate optimal Eden, Survivor and Tenured sizes. Each young collection is an opportunity to resize Eden and Survivor, whereas a full collection would be used to resize Eden, Survivor and Tenured space. This may lead to the need to ignore NewRatio and (the soft target) MaxGCPauseMillis. >> >> As for testing, I'm currently looking at modifying HyperAlloc to add the ability to alter the shape of the load on the collector over time.
>> >> All of this is still in its infancy and we're open to guidance and input. >> >> As for the work on G1, an initial patch has been submitted (URL above) and is open for comments. >> >> >> Kind regards, >> Kirk From coleenp at openjdk.org Tue Sep 24 15:40:55 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 24 Sep 2024 15:40:55 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v23] In-Reply-To: References: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> Message-ID: On Fri, 20 Sep 2024 18:11:43 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4 >> - review feedback > > src/hotspot/share/memory/metaspace/metablock.hpp line 74: > >> 72: #define METABLOCKFORMATARGS(__block__) p2i((__block__).base()), (__block__).word_size() >> 73: >> 74: } // namespace metaspace > > I am wondering if some of these metaspace changes, that is, the addition of MetaBlock, could be upstreamed ahead of the CompactObjectHeaders. Some of it is refactoring so that you can use the wastage to allocate into class-arena, but a lot of this seems neutral to compact object headers. Upstreaming it separately would reduce this patch and allow different people to focus on just this. For the record, I am fine with these metaspace changes going in with this PR if the timing for that is better.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1773607587 From kvn at openjdk.org Tue Sep 24 20:00:47 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 24 Sep 2024 20:00:47 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v25] In-Reply-To: References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Mon, 23 Sep 2024 07:54:39 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/share/opto/matcher.cpp line 1821: >> >>> 1819: if( rule >= _END_INST_CHAIN_RULE || rule < _BEGIN_INST_CHAIN_RULE ) { >>> 1820: assert(C->node_arena()->contains(s->_leaf) || !has_new_node(s->_leaf), >>> 1821: "duplicating node that's already been matched"); >> >> Why was it removed? > > The assertion was failing because it was too strict in several cases where the matcher would generate valid code anyway. One of them is when `is_encode_and_store_pattern(n, m)` returns true but `m -> n` cannot be matched by a single `g1EncodePAndStoreN` instruction. Commit 9ad158b6 removes this case by ensuring that `is_encode_and_store_pattern(n, m)` holds only if `m -> n` can indeed be matched. > There are other cases (all of them harmless as far as I can see) in which this assertion can fail. I am investigating whether they can be avoided so that the assertion can be restored, and what the impact would be on the "redundant decompression removal" (`g1EncodePAndStoreN`) optimization. I think you can do that as follow-up changes in a separate RFE. I am fine with removal in this JEP code.
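The encode-and-store restriction that this matcher discussion converges on (per the later commit "limit pinning bypass optimization to non-shared EncodeP nodes") can be illustrated with a toy model. Under that restriction, an `EncodeP` is folded into a single `g1EncodePAndStoreN` only when the `StoreN` is its sole user. A shared `EncodeP` is matched separately (as `encodeHeapOop`), and each store becomes a plain `g1StoreN`. `ToyNode`, `can_fuse_encode_and_store` and `select_store` are hypothetical names. This is not C2's `Node` class or the real `is_encode_and_store_pattern`, and it ignores every other condition the real matcher checks.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy IR node: an opcode name plus the nodes that use its result.
struct ToyNode {
  std::string op;
  std::vector<const ToyNode*> uses;
};

// Fuse only if m is an EncodeP whose single user is the StoreN n.
bool can_fuse_encode_and_store(const ToyNode& n, const ToyNode& m) {
  return n.op == "StoreN" && m.op == "EncodeP" &&
         m.uses.size() == 1 && m.uses[0] == &n;
}

// Name of the store instruction chosen for n storing the result of m.
// When fusion is not possible, m is matched on its own (as encodeHeapOop)
// and n only performs the barrier-aware store of the already-encoded value.
std::string select_store(const ToyNode& n, const ToyNode& m) {
  assert(n.op == "StoreN");
  return can_fuse_encode_and_store(n, m) ? "g1EncodePAndStoreN" : "g1StoreN";
}
```

With a single user, the store is fused. Once a second user of the same `EncodeP` appears, both stores fall back to `g1StoreN`. Because the shared node is never cloned in this model, the "duplicating node" invariant is trivially kept.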
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1773999931 From rcastanedalo at openjdk.org Wed Sep 25 04:22:25 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 25 Sep 2024 04:22:25 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v26] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...
Roberto Castañeda Lozano has updated the pull request incrementally with three additional commits since the last revision: - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/47c982ba..6fb36e50 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=24-25 Stats: 104 lines in 5 files changed: 4 ins; 30 del; 70 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Wed Sep 25 04:26:43 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 25 Sep 2024 04:26:43 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v24] In-Reply-To: References: <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com> Message-ID: On Sat, 21 Sep 2024 06:44:21 GMT, Fei Yang wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion >> - Remove redundant comment > > src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 257: > >> 255: RegSet::of($res$$Register) /* no_preserve */); >> 256: __ mov($tmp1$$Register, $oldval$$Register); >> 257: __ mov($tmp2$$Register, $newval$$Register); > > Hi, I don't quite understand these two register-register moves here.
Seems to me that we could pass `oldval` and `newval` to `cmpxchg` directly, as `cmpxchg` won't modify them, which would save us these two moves. Did I miss anything? Thanks. Hi Fei, good catch, thanks. These moves have been around since the changes were initially prototyped, and are indeed unnecessary. Note that micro-optimization of the barrier code for atomic memory accesses has not been a focus of this JEP since we have not found any performance reason to do it at the macro level. Having said that, the moves are just wasteful and (perhaps more importantly) make the code harder to read and maintain, so I just removed them (commit 2c7f374e). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1774393587 From rcastanedalo at openjdk.org Wed Sep 25 04:58:45 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 25 Sep 2024 04:58:45 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v19] In-Reply-To: References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Tue, 24 Sep 2024 19:57:29 GMT, Vladimir Kozlov wrote: > I think you can do that as follow-up changes in a separate RFE. I am fine with removal in this JEP code. Thanks. In the meantime I found the remaining case that was causing assertion failures, and managed to handle it and reintroduce the original assertion. The case is where the output of `m` is shared by multiple nodes `N = {n1, n2, ...}` and there exists at the same time an `n` in `N` such that `is_encode_and_store_pattern(n, m)`, and a different `n` in `N` such that `!is_encode_and_store_pattern(n, m)`. Here is an example of such a case: ![ideal](https://github.com/user-attachments/assets/2122dfe0-757c-4094-b8f7-451f4380af45) Commit f96dfe73 ensures that this case does not trigger the assertion by avoiding cloning `m` in `m -> n` if `m` is shared.
This means that a few encode-and-store patterns that were previously matched by a single `g1EncodePAndStoreN` are now matched with the less optimized `encodeHeapOop` + `g1StoreN`, e.g. in the above example: ![mach-before-after](https://github.com/user-attachments/assets/a6fe5ad2-c0fb-4098-8bed-34593dafbcc5) Luckily, this case is very infrequent, so we will only miss around 1% of all optimization opportunities. In return, we can reintroduce the original assertion and be sure that the original invariants of the matcher are preserved. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1774467183 From rcastanedalo at openjdk.org Wed Sep 25 04:58:45 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 25 Sep 2024 04:58:45 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v19] In-Reply-To: References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Wed, 25 Sep 2024 04:55:35 GMT, Roberto Castañeda Lozano wrote: >> I think you can do that as follow-up changes in a separate RFE. I am fine with removal in this JEP code. > >> I think you can do that as follow-up changes in a separate RFE. I am fine with removal in this JEP code. > > Thanks. In the meantime I found the remaining case that was causing assertion failures, and managed to handle it and reintroduce the original assertion. The case is where the output of `m` is shared by multiple nodes `N = {n1, n2, ...}` and there exists at the same time an `n` in `N` such that `is_encode_and_store_pattern(n, m)`, and a different `n` in `N` such that `!is_encode_and_store_pattern(n, m)`. Here is an example of such a case: > > ![ideal](https://github.com/user-attachments/assets/2122dfe0-757c-4094-b8f7-451f4380af45) > > Commit f96dfe73 ensures that this case does not trigger the assertion by avoiding cloning `m` in `m -> n` if `m` is shared.
This means that a few encode-and-store patterns that were previously matched by a single `g1EncodePAndStoreN` are now matched with the less optimized `encodeHeapOop` + `g1StoreN`, e.g. in the above example: > > ![mach-before-after](https://github.com/user-attachments/assets/a6fe5ad2-c0fb-4098-8bed-34593dafbcc5) > > Luckily, this case is very infrequent, so we will only miss around 1% of all optimization opportunities. In return, we can reintroduce the original assertion and be sure that the original invariants of the matcher are preserved. @TheRealMDoerr: since there are now a few corner cases where we match a StoreN node with g1StoreN even though it stores the output of an EncodeP node, I had to remove the assertions in the x64 and ppc g1StoreN definitions; see above. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1774467652 From fyang at openjdk.org Wed Sep 25 07:36:47 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 25 Sep 2024 07:36:47 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v24] In-Reply-To: References: <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com> Message-ID: On Wed, 25 Sep 2024 04:22:49 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 257: >> >>> 255: RegSet::of($res$$Register) /* no_preserve */); >>> 256: __ mov($tmp1$$Register, $oldval$$Register); >>> 257: __ mov($tmp2$$Register, $newval$$Register); >> >> Hi, I don't quite understand these two register-register moves here. Seems to me that we could pass `oldval` and `newval` to `cmpxchg` directly, as `cmpxchg` won't modify them, which would save us these two moves. Did I miss anything? Thanks. > > Hi Fei, good catch, thanks. These moves have been around since the changes were initially prototyped, and are indeed unnecessary.
Note that micro-optimization of the barrier code for atomic memory accesses has not been a focus of this JEP since we have not found any performance reason to do it at the macro level. Having said that, the moves are just wasteful and (perhaps more importantly) make the code harder to read and maintain, so I just removed them (commit 2c7f374e). Thanks for the update. It now looks cleaner and easier to understand. BTW: it seems that the RISC-V part has a similar issue. I will discuss it with @feilongjiang and hopefully we will come up with a similar fix. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1774695093 From sjohanss at openjdk.org Wed Sep 25 08:05:43 2024 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 25 Sep 2024 08:05:43 GMT Subject: RFR: 8340419: ZGC: Create an UseLargePages adaptation of TestAllocateHeapAt.java In-Reply-To: References: Message-ID: <9ZwyHA5LJ4HJ7S_j9rKB7PqehVDuil-EzwJEOA72zIY=.f6ca79f6-86ec-4815-9fbf-10745ed034bd@github.com> On Mon, 23 Sep 2024 07:28:15 GMT, Axel Boldt-Christmas wrote: > [JDK-8340146](https://bugs.openjdk.org/browse/JDK-8340146) / #21127 disables TestAllocateHeapAt.java when running with persistent hugepages because it makes assumptions about the underlying file systems. > > I propose creating a version of this test which instead first checks if there is an appropriate mount point for a persistent hugepages heap file, and only runs the test if it exists. Looks good. test/hotspot/jtreg/gc/z/TestAllocateHeapAtWithHugeTLBFS.java line 80: > 78: ProcessTools.executeTestJava( > 79: "-XX:+UseZGC", > 80: "-XX:+ZGenerational", Any reason to include `-XX:+ZGenerational`, or should we just skip it? ------------- Marked as reviewed by sjohanss (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21128#pullrequestreview-2327468399 PR Review Comment: https://git.openjdk.org/jdk/pull/21128#discussion_r1774745690 From aboldtch at openjdk.org Wed Sep 25 09:25:36 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 25 Sep 2024 09:25:36 GMT Subject: RFR: 8340419: ZGC: Create an UseLargePages adaptation of TestAllocateHeapAt.java In-Reply-To: <9ZwyHA5LJ4HJ7S_j9rKB7PqehVDuil-EzwJEOA72zIY=.f6ca79f6-86ec-4815-9fbf-10745ed034bd@github.com> References: <9ZwyHA5LJ4HJ7S_j9rKB7PqehVDuil-EzwJEOA72zIY=.f6ca79f6-86ec-4815-9fbf-10745ed034bd@github.com> Message-ID: <-jmkh9ZNe6sDpJ6wTKeR7AB9JFBk5PSnY3ISsoT9ErM=.65995ba4-85e4-44b5-b3e4-594d8d1c8d75@github.com> On Wed, 25 Sep 2024 08:02:39 GMT, Stefan Johansson wrote: >> [JDK-8340146](https://bugs.openjdk.org/browse/JDK-8340146) / #21127 disables TestAllocateHeapAt.java when running with persistent hugepages because it makes assumptions about the underlying file systems. >> >> I propose creating a version of this test which instead first checks if there is an appropriate mount point for a persistent hugepages heap file, and only runs the test if it exists. > > test/hotspot/jtreg/gc/z/TestAllocateHeapAtWithHugeTLBFS.java line 80: > >> 78: ProcessTools.executeTestJava( >> 79: "-XX:+UseZGC", >> 80: "-XX:+ZGenerational", > > Any reason to include `-XX:+ZGenerational`, or should we just skip it? I try to keep this option explicit in tests until it is removed, to avoid assumptions about default values.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21128#discussion_r1774882896 From duke at openjdk.org Wed Sep 25 11:57:40 2024 From: duke at openjdk.org (Joel Sikström) Date: Wed, 25 Sep 2024 11:57:40 GMT Subject: RFR: 8340419: ZGC: Create an UseLargePages adaptation of TestAllocateHeapAt.java In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:28:15 GMT, Axel Boldt-Christmas wrote: > [JDK-8340146](https://bugs.openjdk.org/browse/JDK-8340146) / #21127 disables TestAllocateHeapAt.java when running with persistent hugepages because it makes assumptions about the underlying file systems. > > I propose creating a version of this test which instead first checks if there is an appropriate mount point for a persistent hugepages heap file, and only runs the test if it exists. I think this looks good. ------------- Marked as reviewed by jsikstro at github.com (no known OpenJDK username). PR Review: https://git.openjdk.org/jdk/pull/21128#pullrequestreview-2328027755 From rkennke at openjdk.org Wed Sep 25 12:34:36 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 25 Sep 2024 12:34:36 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v25] In-Reply-To: References: Message-ID: > This is the main body of JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that were previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then become on-by-default and diagnostic (?), and eventually be deprecated and obsoleted.
However, there are a few unknowns in that plan. Specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we have added some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size), so that it can be fetched from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
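The 8TB figure follows from the 40-bit encoding if the forwardee is stored as a heap-base-relative offset counted in 8-byte words: 2^40 words × 8 bytes = 2^43 bytes = 8 TiB. The sketch below illustrates that arithmetic and how such an encoding can leave the upper 22 Klass* bits of the mark-word untouched. It is an illustration under stated assumptions, not HotSpot's implementation. The word-offset interpretation, the exact bit positions (offset in bits 2..41, a `0x3` tag in the two lowest bits) and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using std::uint64_t;

// Assumed layout (illustration only): 64-bit mark-word with the narrow
// Klass* in the upper 22 bits, a 40-bit forwarding offset below it, and a
// 2-bit tag in the lowest bits marking the object as forwarded.
constexpr int kMarkBits = 64;
constexpr int kKlassBits = 22;
constexpr int kTagBits = 2;
constexpr int kOffsetBits = kMarkBits - kKlassBits - kTagBits;  // 40
constexpr uint64_t kForwardedTag = 0x3;  // assumed tag value
constexpr uint64_t kWordSize = 8;        // bytes per heap word

static_assert(kOffsetBits == 40, "layout must leave 40 bits for the offset");

// Largest heap a 40-bit word offset can address: 2^40 words * 8 bytes.
constexpr uint64_t max_encodable_heap_bytes() {
  return (uint64_t{1} << kOffsetBits) * kWordSize;
}

// Store the forwardee as a word offset from heap_base, keeping the Klass bits.
uint64_t encode_forwarding(uint64_t mark, uint64_t heap_base, uint64_t forwardee) {
  assert(forwardee >= heap_base && (forwardee - heap_base) % kWordSize == 0);
  uint64_t word_offset = (forwardee - heap_base) / kWordSize;
  assert(word_offset < (uint64_t{1} << kOffsetBits));  // heap beyond 8 TiB cannot be encoded
  uint64_t klass_mask = ~uint64_t{0} << (kMarkBits - kKlassBits);
  return (mark & klass_mask) | (word_offset << kTagBits) | kForwardedTag;
}

// Recover the forwardee address from an encoded mark-word.
uint64_t decode_forwarding(uint64_t mark, uint64_t heap_base) {
  uint64_t offset_mask = (uint64_t{1} << kOffsetBits) - 1;
  uint64_t word_offset = (mark >> kTagBits) & offset_mask;
  return heap_base + word_offset * kWordSize;
}
```

The same reasoning explains the familiar compressed-oops limit: 32 bits of 8-byte-aligned offset give 2^32 × 8 bytes = 32 GB. Widening the offset to 40 bits scales that to 8 TiB.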
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Enforce lightweight locking on 32-bit platforms ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/2c4a7877..cd69da86 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=23-24 Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Wed Sep 25 12:53:17 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 25 Sep 2024 12:53:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID:
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Allow LM_MONITOR on 32-bit platforms ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/cd69da86..4904d433 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=24-25 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rcastanedalo at openjdk.org Wed Sep 25 13:54:59 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 25 Sep 2024 13:54:59 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com> Message-ID: <2adTLZAwTvFTVNGeR5e9Cef5uNqpsz2haeobLIDZiNI=.cb2bbf0d-5c1b-4583-b4bd-898e0c5cdbb7@github.com> On Fri, 13 Sep 2024 06:43:34 GMT, Roberto Castañeda Lozano wrote: >> I don't remember if this change was a reaction to an error or if I just guarded `CompressedKlassPointers::shift()` with +UseCCP because that is the prerequisite now. Probably the latter. I can remove this. I should probably assert for +UseCCP then. > > I see, thanks. In that case, I would suggest removing the explicit `UseCompressedClassPointers` test, since it should be implied by `t->isa_narrowklass()`. `check_init()` within `CompressedKlassPointers::shift()` would already fail for the unexpected case where `t->isa_narrowklass() && !UseCompressedClassPointers`, no? I think it would be good to remove the explicit `UseCompressedClassPointers` test as argued above (i.e. revert this change), unless there is some other reason to keep it that I am missing.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1775277784 From rcastanedalo at openjdk.org Wed Sep 25 14:19:54 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 25 Sep 2024 14:19:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. 
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
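To make the bit arithmetic in the quoted description concrete, here is a small standalone C++ sketch. It assumes, for illustration only, that the narrow Klass* occupies the upper 22 bits of a 64-bit mark word (bit 42 upward), that the low two bits act as a tag, and that a full-GC forwardee is stored in between as a 40-bit offset counted in 8-byte words from a hypothetical heap base. All names are hypothetical; this is not HotSpot's `markWord` or forwarding code. The `static_assert` checks the "40 bits, up to 8TB" figure: 2^40 words of 8 bytes each is 2^43 bytes.

```cpp
#include <cassert>
#include <cstdint>

using std::uint32_t;
using std::uint64_t;

// Illustrative layout only (not HotSpot's markWord): the narrow Klass*
// occupies the upper 22 bits of the 64-bit mark word.
constexpr int kNarrowKlassBits = 22;
constexpr int kNarrowKlassShift = 64 - kNarrowKlassBits;  // 42
constexpr uint64_t kLowMask = (uint64_t{1} << kNarrowKlassShift) - 1;

// Illustrative forwarding encoding: 2 low tag bits, then a 40-bit offset
// counted in 8-byte words from the start of the heap.
constexpr int kTagBits = 2;
constexpr uint64_t kForwardedTag = 0x3;
constexpr int kForwardingBits = 40;
constexpr uint64_t kWordSize = 8;

// 2^40 words * 8 bytes per word = 2^43 bytes = 8 TiB of encodable heap.
constexpr uint64_t kMaxHeapBytes = (uint64_t{1} << kForwardingBits) * kWordSize;
static_assert(kMaxHeapBytes == (uint64_t{8} << 40), "40-bit word offsets cover 8 TiB");
static_assert(kTagBits + kForwardingBits == kNarrowKlassShift,
              "tag and offset fit exactly below the klass bits");

// Read the narrow klass from the upper bits of a mark word.
uint32_t narrow_klass(uint64_t mark) {
  return static_cast<uint32_t>(mark >> kNarrowKlassShift);
}

// Install a narrow klass, keeping the lower 42 bits unchanged.
uint64_t with_narrow_klass(uint64_t mark, uint32_t nklass) {
  assert(nklass < (uint32_t{1} << kNarrowKlassBits));
  return (mark & kLowMask) | (uint64_t{nklass} << kNarrowKlassShift);
}

// Encode a forwardee by overwriting only the lower 42 bits,
// so the klass bits survive forwarding.
uint64_t forward(uint64_t mark, uint64_t heap_base, uint64_t target) {
  const uint64_t offset = (target - heap_base) / kWordSize;
  assert(offset < (uint64_t{1} << kForwardingBits));
  return (mark & ~kLowMask) | (offset << kTagBits) | kForwardedTag;
}

// Decode the forwardee address from the lower 42 bits.
uint64_t forwardee(uint64_t mark, uint64_t heap_base) {
  const uint64_t offset = (mark & kLowMask) >> kTagBits;
  return heap_base + offset * kWordSize;
}
```

For example, forwarding an object with narrow klass 12345 to `heap_base + 4096` stores the word offset 512 in the low bits and leaves `narrow_klass()` returning 12345, which is the property the description depends on when it says the upper mark-word bits must be protected.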
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow LM_MONITOR on 32-bit platforms src/hotspot/share/opto/memnode.cpp line 2256: > 2254: if (!UseCompactObjectHeaders && alloc != nullptr) { > 2255: return TypeX::make(markWord::prototype().value()); > 2256: } Suggestion: make these four lines conditional on `!UseCompactObjectHeaders`, like so:

    if (!UseCompactObjectHeaders) {
      Node* alloc = is_new_object_mark_load();
      if (alloc != nullptr) {
        return TypeX::make(markWord::prototype().value());
      }
    }

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1775322670 From sjohanss at openjdk.org Wed Sep 25 20:10:44 2024 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 25 Sep 2024 20:10:44 GMT Subject: RFR: 8340426: ZGC: Move defragment out of the allocation path Message-ID: Please review this change to move defragmentation of small pages out of the allocation path. **Summary** In ZGC, small pages are allocated at lower addresses while medium and large pages are allocated at higher addresses. Sometimes, when memory usage is high or the distribution of pages in the cache demands it, small pages can be split out from medium or large pages. When this happens, the small page is defragmented by remapping it to a lower address if it resides at too high an address. This is done to avoid fragmentation, but doing it in the allocation path comes with a cost, since the remapping involves system calls. This change moves the remapping away from the allocation path to when the pages are freed after being garbage collected. This can increase fragmentation a bit, since small pages are allowed to reside at higher addresses for a period of time. The expectation is that, since small pages are frequently collected, this period should be short enough not to cause any problems. I've done detailed experiments with temporary JFR events to look at the cost of doing the defrag/remapping of pages.
One problem is that by default we don't get a lot of defrags (unless we run out of memory), so to better investigate this I also altered the allocation path to force more defragmentation. These tests showed that moving defrag out of the allocation path not only has the positive effect of reducing the time spent allocating, but also spaces out the defragmentation a bit more. The reason for this is that we now defrag when a page is freed by the GC instead of when the page is allocated, and the tests show that many threads often allocate at the same time (and then also defrag at the same time). This in turn leads to many concurrent calls down to the system to remap the memory, which increases the cost of the actual mapping (as seen by adding JFR events). **Additional testing** - Functional testing in mach5 tier1-7 - Sanity performance testing in aurora ------------- Commit messages: - Move statistics to cover all cases - Enable defragment for ZRelocate calls to free_page - 8340426: ZGC: Move defragment out of the allocation path Changes: https://git.openjdk.org/jdk/pull/21191/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21191&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340426 Stats: 75 lines in 5 files changed: 47 ins; 17 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/21191.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21191/head:pull/21191 PR: https://git.openjdk.org/jdk/pull/21191 From thomas.schatzl at oracle.com Thu Sep 26 08:20:11 2024 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Thu, 26 Sep 2024 10:20:11 +0200 Subject: Aligning the Serial collector with ZGC In-Reply-To: <49D0FB58-94AB-46B6-A8A6-F9F08773770E@kodewerk.com> References: <49D0FB58-94AB-46B6-A8A6-F9F08773770E@kodewerk.com> Message-ID: <44230e90-d8bc-491f-a5c4-2c483646fc3e@oracle.com> Hi Kirk, somewhat random comments...
On 23.09.24 17:48, Kirk Pepperdine wrote: > Hi, > > I wanted to surface to the mailing list that we've taken on the task > of adding Automated Heap Sizing (AHS), as has been introduced into ZGC > (and is currently being introduced into G1, > https://github.com/openjdk/jdk/pull/20783), into the Serial > collector. > The goals of this effort are modeled after the goals for ZGC and we > plan to borrow as much as possible (or as much as makes sense). For > example, we would like to alter the default settings for -Xmx and > -Xms. Instead of 1/4, the default MaxHeapSize would be set to > available RAM. The > collector will use memory and CPU pressure, similar to what was > introduced in ZGC, to control heap expansion and contraction. Current > sizing ergonomics is based on the number of non-daemon threads. > Altering this is expected to give the Serial collector a more dynamic > ability to uncommit memory no longer in use (and thus be more memory > efficient when running in a container). The flags SoftMaxHeapSize and > SerialPressure as well as the level of global memory pressure would be > used to help guide ergonomic choices. This new ergonomic choice should > work to minimize GC overhead while avoiding becoming an OOM victim. As > part of this, the goal is to provide enough memory but not at all > costs. > > We see this work being broken down into several steps. Very roughly > the steps would be: > > - Introduce an adaptive size policy that takes into account memory and > CPU pressure along with global memory pressure. > - Heap should be large enough to minimize GC overhead but not > large enough to trigger OOM. (probably meant "small enough" the second time) > - Introduce -XX:SerialPressure=[0-100] to support this work. (Fwiw, with regard to the other discussion, I agree that if we have a flag with the same "meaning" across collectors it might be useful to use the same name). > - introduce a smoothing algorithm to avoid excessive small > resizes.
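One common way to get the smoothing asked for in the last quoted item is an exponential moving average of the sizing policy's raw targets combined with a dead band, so that only sufficiently large changes actually trigger a resize. The C++ sketch below is a hypothetical illustration: `HeapSizeSmoother`, its `alpha` weight and `dead_band` fraction are names and parameters chosen here, not existing Serial GC code or flags.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical smoother for heap-resize decisions (not Serial GC code).
// It averages the raw target sizes proposed after each GC and only moves
// the committed size when the averaged target differs from it by more than
// a relative dead band, so small oscillations cause no commit/uncommit work.
class HeapSizeSmoother {
 public:
  HeapSizeSmoother(double initial_bytes, double alpha, double dead_band)
      : average_(initial_bytes), current_(initial_bytes),
        alpha_(alpha), dead_band_(dead_band) {
    assert(initial_bytes > 0.0);
    assert(alpha > 0.0 && alpha <= 1.0);
    assert(dead_band >= 0.0);
  }

  // Feed the policy's raw target; returns the size to keep committed.
  double update(double raw_target_bytes) {
    // Exponential moving average: weight alpha on the newest sample.
    average_ = alpha_ * raw_target_bytes + (1.0 - alpha_) * average_;
    const double relative_change = std::fabs(average_ - current_) / current_;
    if (relative_change > dead_band_) {
      current_ = average_;  // change is large enough to be worth a resize
    }
    return current_;
  }

  double current() const { return current_; }

 private:
  double average_;    // smoothed target
  double current_;    // size currently committed
  double alpha_;      // EMA weight of the newest sample
  double dead_band_;  // minimum relative change that triggers a resize
};
```

With `alpha = 0.5` and a 10% dead band, a raw target of 1040 against a committed size of 1000 leaves the size unchanged (the average moves only to 1020), while a later target of 2000 moves the average to 1510 and triggers a resize. A larger `alpha` reacts faster; a wider dead band trades sizing precision for fewer resizes.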
One option is to split this further into parts: * list what actions Serial GC could take in reaction to memory pressure on an abstract level, and which of them make sense; from that, see what functionality is needed. * provide functionality that tries to keep some kind of GC/mutator time ratio; I would start by looking at what G1 does, because Serial GC's behaviour is probably closer to G1 than to ZGC, but ymmv. (Obviously improvements are welcome :)) (This may not need to be exposed externally via some GCTimeRatio/GCCPUPercentage/whatever flag name.) * add functionality to calculate memory pressure from the environment; maybe, in a containerized environment, from a manageable flag, as it does not have a global "pressure" view. This could probably be taken from ZGC, at least partially. * some transfer function that translates this external memory pressure, based on "GCPressure" (e.g. that "sigmoid" function plus lots of magic numbers), into a reaction in the gc: e.g. change the gc/mutator pause time goal, start collections, uncommit memory... * (probably) some background thread that continuously calculates and reacts to global pressure (uncommit memory, do a gc, resize heap, ...) because one probably does not want to wait for the next gc to react... * do lots of testing to weed out corner cases > - Introduce manageable flag SoftMaxHeapSize to define a target heap > size and set the default max heap size to 100% of available. I am a bit torn about SoftMaxHeapSize in Serial GC. What do you envision Serial GC would do when the SoftMaxHeapSize has been reached, and what if old gen occupancy permanently stays above that value? The usefulness of SoftMaxHeapSize kind of relies on having a minimally invasive old gen collection that tries to get old gen usage back below that value. Serial GC has no "minimally invasive" way to collect the old generation. It is either Full GC or nothing.
This is the only option for Serial, but always doing Full collections after reaching that threshold seems very heavy-handed, expensive and undesirable to me (ymmv). That reaction would follow the spirit of the flag though. Maybe at the small heaps Serial GC targets this makes sense, and full gc is not that costly anyway. It might be useful to enumerate what actions could be performed on global pressure. > - Add in the ability to uncommit memory (to reduce global memory > pressure). > The following imo outlines a completely separate idea, and should be discussed separately: > > While working through the details of this work I noted that there > appear to be opportunities to offer new defaults for other settings. For > example, [...] That seems to be a more elaborate way of finding the "optimal" generation size for a given heap size (which may follow from what the gc/mutator time ratio algorithm gives you). > > For Eden the guiding metric is allocation rate. For Survivor it's life > cycle (age table). For Tenured it's live set size. Using these metrics > to determine the size of the parts and then using that to calculate a max > heap size has almost always yielded lower GC overheads than setting a > heap size and then letting ratios size everything. This may be a > separate piece of work +1 > but the intent would be to have ergonomics calculate > optimal eden, survivor and tenured sizes. Each young collection is an > opportunity to resize Eden and Survivor whereas a full would be used > to resize Eden, Survivor and Tenured space. This may lead to the need > to ignore NewRatio and (the soft target) MaxGCPauseMillis. Fwiw, the only collector that observes MaxGCPauseMillis is G1, so in the context of Serial GC discussed further above I am confused. Not sure if MaxGCPauseMillis would make sense in Serial GC given that you can't control Full GC pause length. Also, in the context of G1 some of the statements above are hard to understand: e.g.
the text seems to imply that there is a fixed ratio between eden and survivor, which isn't really the case, at least not in the sense of Serial GC. Could you elaborate? Even then, with Serial GC's fixed generation sizes, fine-grained on-the-fly adaptation as somewhat suggested might be harder than usual. Not against doing all that, but it really sounds like separate work. > > As for testing, I'm currently looking at modifying HyperAlloc to add the > ability to alter the shape of the load on the collector over time. > > All of this is still in its infancy and we're open to guidance and > input. > > As for the work on G1, an initial patch has been submitted (URL above) > and is open for comments. > The patch does not seem to implement AHS. It implements CurrentMaxHeapSize, which might be what AHS uses to set max heap size. To implement AHS for G1, roughly at least the following items need to be added/implemented/changed: * remove the use of Min/MaxHeapFreeRatio for heap sizing. These flags completely disregard cpu and heap pressure based heap sizing (they should also be removed from Serial GC - this means deprecating/obsoleting these flags as soon as the last user is gone). * implement CurrentMaxHeapSize, which is a (configurable) hard limit on how much the Java application may allocate (JDK-8204088) in support of AHS. As mentioned, that patch might be an initial discussion base. I do not think we need a JEP for that, but it gives you more publicity. * implement SoftMaxHeapSize in the sense of ZGC, where it is used to guide IHOP (or ZGC's equivalent). Note that I am not sure that SoftMaxHeapSize is absolutely necessary in the context of AHS, but it may be a tool. * the same background functionality as for serial: implement some mechanism to control the heap size based on the decisions of AHS; i.e. start collections to get to heap target, uncommit stuff/enqueue for uncommit etc.
Currently G1 only resizes the heap during Remark and Full GC, which is too limiting to follow current "memory pressure". Maybe use/update Soft/CurrentMaxHeapSize as needed so that GC compacts the heap first; this may take the form of JDK-8238687, which uncommits at every gc, although that is probably still too limiting for an AHS system. Probably other issues will crop up along the way. * do lots of testing to weed out corner cases and hopefully not regress too much from current performance Hth, Thomas From rcastanedalo at openjdk.org Thu Sep 26 09:07:56 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 26 Sep 2024 09:07:56 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects.
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow LM_MONITOR on 32-bit platforms src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5692: > 5690: > 5691: void MacroAssembler::load_klass(Register dst, Register src, Register tmp) { > 5692: BLOCK_COMMENT("load_klass"); I am not sure that the complexity of `MacroAssembler::load_klass` and the two `MacroAssembler::cmp_klass` functions warrants adding block comments, but if you prefer to leave them in, could you use opening and closing comments, as in the other functions in this file (e.g. `MacroAssembler::_verify_oop`)? In that case, please update the comment in the two `MacroAssembler::cmp_klass` functions with a more descriptive name than `cmp_klass 1` and `cmp_klass 2`. src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5726: > 5724: #ifdef _LP64 > 5725: if (UseCompactObjectHeaders) { > 5726: load_nklass_compact(tmp, obj); Suggestion: assert here that `tmp != noreg`, just like in `MacroAssembler::cmp_klass(Register src, Register dst, Register tmp1, Register tmp2)` below. Perhaps also assert that the input registers are different. src/hotspot/cpu/x86/macroAssembler_x86.hpp line 379: > 377: // Uses tmp1 and tmp2 as temporary registers. > 378: void cmp_klass(Register src, Register dst, Register tmp1, Register tmp2); > 379: The naming of these two functions could be made clearer and more consistent with their documentation. Please consider renaming the four-argument `cmp_klass` function to `cmp_klasses_from_objects` or similar. The notion of "source" and "destination" in the parameter names is unclear; I suggest just calling them `obj`, `obj1`, `obj2` etc. Please also make sure that the parameter names are consistent in the declaration and definition (e.g. `dst` vs `obj`).
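The precondition checks suggested for `cmp_klass` (a real temporary register, and no aliasing between the input registers) can be modeled outside the assembler, as in the hypothetical C++ sketch below. `Reg`, `all_different` and `cmp_klass_preconditions_hold` are illustrative stand-ins; inside HotSpot one would more likely write `assert(tmp != noreg, ...)` together with the existing `assert_different_registers` helper.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical stand-in for assembler register handles.
enum class Reg { noreg = -1, rax, rbx, rcx, rdx, rsi, rdi };

// True if no register occurs more than once in the list.
bool all_different(std::initializer_list<Reg> regs) {
  for (const Reg* i = regs.begin(); i != regs.end(); ++i) {
    for (const Reg* j = i + 1; j != regs.end(); ++j) {
      if (*i == *j) {
        return false;
      }
    }
  }
  return true;
}

// Preconditions for comparing the klass of the object in `obj` against the
// klass in `klass` when the compact-header path needs a scratch register:
// `tmp` must be a real register and none of the three may alias another.
bool cmp_klass_preconditions_hold(Reg klass, Reg obj, Reg tmp) {
  return tmp != Reg::noreg && all_different({klass, obj, tmp});
}
```

The pairwise loop is quadratic, which is irrelevant for three or four registers and keeps the check obviously correct.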
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4008: > 4006: #ifdef COMPILER2 > 4007: if ((UseAVX == 2) && EnableX86ECoreOpts && !UseCompactObjectHeaders) { > 4008: generate_string_indexof(StubRoutines::_string_indexof_array); This stub routine should be re-enabled if `UseCompactObjectHeaders` is to become non-experimental and enabled by default in the future. Is there an RFE for this task? src/hotspot/share/opto/memnode.cpp line 1976: > 1974: // The field is Klass::_prototype_header. Return its (constant) value. > 1975: assert(this->Opcode() == Op_LoadX, "must load a proper type from _prototype_header"); > 1976: return TypeX::make(klass->prototype_header()); This code is dead, because by the time we call `load_array_final_field` from `LoadNode::Value` (its only caller), we know that if `UseCompactObjectHeaders`, then `tkls->offset() != in_bytes(Klass::prototype_header_offset())` (or else we would have returned from line 2161). Please remove it, or replace it with an assertion if you prefer. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776676785 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776628929 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776644021 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776663594 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776621766 From stefank at openjdk.org Thu Sep 26 09:14:34 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 26 Sep 2024 09:14:34 GMT Subject: RFR: 8340426: ZGC: Move defragment out of the allocation path In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 20:05:17 GMT, Stefan Johansson wrote: > Please review this change to move defragmentation of small pages out of the allocation path, > > **Summary** > In ZGC small pages are allocated at lower addresses while medium and large pages are allocated at higher addresses.
Thanks for fixing this!
I would like to suggest the following style changes: https://github.com/openjdk/jdk/commit/996688ae541d9fc9f88268f1d090af409c5ee65a https://github.com/openjdk/jdk/compare/master...stefank:jdk:pull/21191 My main motivation for the suggestions is to get rid of the added if / else blocks in the `free_page[s]` functions. Adding them leads to code duplication, non-const initialization of the local variable, and a disproportionate number of lines compared to the rest of the code, which all lead to readability taking a hit, IMHO. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21191#issuecomment-2376393758 From rcastanedalo at openjdk.org Thu Sep 26 09:54:57 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 26 Sep 2024 09:54:57 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: <4sBfv1qLQjGZnrCuHBPuWp1PNkIDFLBjxMo3z_RR0Mo=.38e699ce-30bc-42fe-86b6-988df6700c82@github.com> On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it.
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow LM_MONITOR on 32-bit platforms src/hotspot/cpu/x86/x86_64.ad line 4388: > 4386: effect(KILL cr); > 4387: ins_cost(125); // XXX > 4388: format %{ "movl $dst, $mem\t# compressed klass ptr" %} For consistency with the aarch64 back-end: Suggestion: format %{ "load_nklass_compact $dst, $mem\t# compressed klass ptr" %} ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776747538 From rkennke at openjdk.org Thu Sep 26 11:41:53 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 26 Sep 2024 11:41:53 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 08:55:44 GMT, Roberto Castañeda Lozano wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow LM_MONITOR on 32-bit platforms > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4008: > >> 4006: #ifdef COMPILER2 >> 4007: if ((UseAVX == 2) && EnableX86ECoreOpts && !UseCompactObjectHeaders) { >> 4008: generate_string_indexof(StubRoutines::_string_indexof_array); > > This stub routine should be re-enabled if `UseCompactObjectHeaders` is to become non-experimental and enabled by default in the future. Is there a RFE for this task? This comes from an assert in `LibraryCallKit::inline_string_indexOfI` and I believe we can perhaps remove that assert and the !UCOH clause. I checked a couple of tests that tripped that assert, and they seem to work fine, and I also checked the code in `LibraryCallKit::inline_string_indexOfI` and `generate_string_indexof_stubs()` and could not find anything obvious that requires the base offset to be >=16. I am not sure why that assert is there.
I am now running tier1-4 with that change: https://github.com/rkennke/jdk/commit/7001783e8c11718226506f42b7c1f1fda1af3ad0 If you know (or find) any reason why we need that assert, please let me know. Otherwise I'd remove it, if you don't have objections. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776888460 From shade at openjdk.org Thu Sep 26 11:43:06 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 26 Sep 2024 11:43:06 GMT Subject: RFR: 8341015: OopStorage location decoder crashes accessing non-initalized OopStorage Message-ID: When debugging CDS, I asked for `os::print_location` before OopStorage was initialized, like an error handler would do. This is a fairly unusual situation, which is why we have not seen it before. Anyhow, we entered the new code added by [JDK-8340392](https://bugs.openjdk.org/browse/JDK-8340392), which crashed on an `OopStorage` that was `nullptr`. I think we should null-check `OopStorage` before calling into it. Additional testing: - [x] OopStorageSetTest still passing - [x] Verified the check is now passing in a similar debugging session ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/21204/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21204&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341015 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21204.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21204/head:pull/21204 PR: https://git.openjdk.org/jdk/pull/21204 From rcastanedalo at openjdk.org Thu Sep 26 12:16:52 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 26 Sep 2024 12:16:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow LM_MONITOR on 32-bit platforms src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2570: > 2568: // we get the heapBase in obj, and the narrowOop+klass_offset_in_bytes/sizeof(narrowOop) in index. > 2569: // When that happens, we need to lea the address into a single register, and subtract the > 2570: // klass_offset_in_bytes, to get the address of the mark-word. Parts of this comment are obsolete after commit 2c4a7877; please update the comment. src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 882: > 880: void store_klass(Register dst, Register src); > 881: void cmp_klass(Register oop, Register trial_klass, Register tmp); > 882: void cmp_klass(Register src, Register dst, Register tmp1, Register tmp2); Same suggestion as for the analogous x86 functions: consider renaming the four-argument `cmp_klass` function to `cmp_klasses_from_objects` or similar, and the `src` and `dst` parameters to `oop1` and `oop2` or similar if there is no notion of "source" and "destination".
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776927247 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776942226 From kbarrett at openjdk.org Thu Sep 26 12:19:35 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 26 Sep 2024 12:19:35 GMT Subject: RFR: 8341015: OopStorage location decoder crashes accessing non-initalized OopStorage In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 11:36:26 GMT, Aleksey Shipilev wrote: > When debugging CDS, I asked for `os::print_location` before OopStorage was initialized, like an error handler would do. This is a fairly unusual situation, this is why we have not seen it before. Anyhow, we entered the new code added by [JDK-8340392](https://bugs.openjdk.org/browse/JDK-8340392), which crashed on `OopStorage` that was `nullptr`. I think we should null-check `OopStorage` before calling into it. > > Additional testing: > - [x] OopStorageSetTest still passing > - [x] Verified the check is now passing in similar debugging session Looks good. src/hotspot/share/gc/shared/oopStorageSet.cpp line 89: > 87: const void* aligned_addr = align_down(addr, alignof(oop)); > 88: for (OopStorage* storage : Range()) { > 89: if (storage != nullptr && storage->print_containing((oop*) aligned_addr, st)) { Add a comment? Something like "Might get here while handling error before storage initialization." ------------- Marked as reviewed by kbarrett (Reviewer). 
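The null-check pattern in the diff above can be sketched in isolation. This is a toy model, not HotSpot code: `StorageModel`, `print_location`, and the integer address ranges are hypothetical stand-ins for `OopStorage`, the loop in `oopStorageSet.cpp`, and real oop slots. It shows the suggested comment and the behavior discussed later in the thread: a not-yet-initialized (null) storage is skipped, and an address no storage recognizes yields `false` without printing anything, so other location printers can try.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for an OopStorage: it owns the address range [begin, end).
struct StorageModel {
  std::uintptr_t begin;
  std::uintptr_t end;
  const char* name;

  // Prints and returns true only if addr lies inside this storage.
  bool print_containing(std::uintptr_t addr, std::FILE* st) const {
    if (addr < begin || addr >= end) {
      return false;
    }
    std::fprintf(st, "0x%llx is in %s\n", static_cast<unsigned long long>(addr), name);
    return true;
  }
};

using StorageSet = std::array<const StorageModel*, 3>;

// Returns true if some storage claimed the address, false otherwise.
bool print_location(const StorageSet& storages, std::uintptr_t addr, std::FILE* st) {
  assert(st != nullptr);
  for (const StorageModel* storage : storages) {
    // Might get here while handling an error before storage initialization,
    // so an entry can still be null: skip it instead of dereferencing it.
    if (storage != nullptr && storage->print_containing(addr, st)) {
      return true;
    }
  }
  // Not found: print nothing and let other location printers have a go.
  return false;
}
```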
PR Review: https://git.openjdk.org/jdk/pull/21204#pullrequestreview-2331047091 PR Review Comment: https://git.openjdk.org/jdk/pull/21204#discussion_r1776951002 From shade at openjdk.org Thu Sep 26 12:47:47 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 26 Sep 2024 12:47:47 GMT Subject: RFR: 8341015: OopStorage location decoder crashes accessing non-initalized OopStorage [v2] In-Reply-To: References: Message-ID: > When debugging CDS, I asked for `os::print_location` before OopStorage was initialized, like an error handler would do. This is a fairly unusual situation, this is why we have not seen it before. Anyhow, we entered the new code added by [JDK-8340392](https://bugs.openjdk.org/browse/JDK-8340392), which crashed on `OopStorage` that was `nullptr`. I think we should null-check `OopStorage` before calling into it. > > Additional testing: > - [x] OopStorageSetTest still passing > - [x] Verified the check is now passing in similar debugging session Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Touchups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21204/files - new: https://git.openjdk.org/jdk/pull/21204/files/c2e276c6..73b21b46 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21204&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21204&range=00-01 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21204.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21204/head:pull/21204 PR: https://git.openjdk.org/jdk/pull/21204 From shade at openjdk.org Thu Sep 26 12:47:48 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 26 Sep 2024 12:47:48 GMT Subject: RFR: 8341015: OopStorage location decoder crashes accessing non-initalized OopStorage [v2] In-Reply-To: References: Message-ID: <-74WDew849TqHNEh7XAyT1-bSP_TkCqjY5SnEy509bY=.692c0018-a6ca-4240-b167-294975ab821a@github.com> On Thu, 26 Sep 2024 
12:16:27 GMT, Kim Barrett wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Touchups > > src/hotspot/share/gc/shared/oopStorageSet.cpp line 89: > >> 87: const void* aligned_addr = align_down(addr, alignof(oop)); >> 88: for (OopStorage* storage : Range()) { >> 89: if (storage != nullptr && storage->print_containing((oop*) aligned_addr, st)) { > > Add a comment? Something like "Might get here while handling error before storage initialization." Sure thing, see new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21204#discussion_r1776997613 From tschatzl at openjdk.org Thu Sep 26 12:55:35 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 26 Sep 2024 12:55:35 GMT Subject: RFR: 8341015: OopStorage location decoder crashes accessing non-initalized OopStorage [v2] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 12:47:47 GMT, Aleksey Shipilev wrote: >> When debugging CDS, I asked for `os::print_location` before OopStorage was initialized, like an error handler would do. This is a fairly unusual situation, this is why we have not seen it before. Anyhow, we entered the new code added by [JDK-8340392](https://bugs.openjdk.org/browse/JDK-8340392), which crashed on `OopStorage` that was `nullptr`. I think we should null-check `OopStorage` before calling into it. >> >> Additional testing: >> - [x] OopStorageSetTest still passing >> - [x] Verified the check is now passing in similar debugging session > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Touchups lgtm, maybe it's worth explicitly printing an "uninitialized" message ------------- Marked as reviewed by tschatzl (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21204#pullrequestreview-2331150552 From tschatzl at openjdk.org Thu Sep 26 12:55:35 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 26 Sep 2024 12:55:35 GMT Subject: RFR: 8341015: OopStorage location decoder crashes accessing non-initalized OopStorage [v2] In-Reply-To: <-74WDew849TqHNEh7XAyT1-bSP_TkCqjY5SnEy509bY=.692c0018-a6ca-4240-b167-294975ab821a@github.com> References: <-74WDew849TqHNEh7XAyT1-bSP_TkCqjY5SnEy509bY=.692c0018-a6ca-4240-b167-294975ab821a@github.com> Message-ID: <3suLikiPJ0bF8rh7_2xpcdrlHcQlDw0iuaN-gS1rxWs=.4c850872-1a6f-490b-b340-808a8521bf4a@github.com> On Thu, 26 Sep 2024 12:43:40 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shared/oopStorageSet.cpp line 89: >> >>> 87: const void* aligned_addr = align_down(addr, alignof(oop)); >>> 88: for (OopStorage* storage : Range()) { >>> 89: if (storage != nullptr && storage->print_containing((oop*) aligned_addr, st)) { >> >> Add a comment? Something like "Might get here while handling error before storage initialization." > > Sure thing, see new commit. Another maybe preferable option could be printing "uninitialized" or something. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21204#discussion_r1777011233 From rcastanedalo at openjdk.org Thu Sep 26 13:07:55 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 26 Sep 2024 13:07:55 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 11:39:02 GMT, Roman Kennke wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4008: >> >>> 4006: #ifdef COMPILER2 >>> 4007: if ((UseAVX == 2) && EnableX86ECoreOpts && !UseCompactObjectHeaders) { >>> 4008: generate_string_indexof(StubRoutines::_string_indexof_array); >> >> This stub routine should be re-enabled if `UseCompactObjectHeaders` is to become non-experimental and enabled by default in the future. Is there an RFE for this task? > > This comes from an assert in `LibraryCallKit::inline_string_indexOfI` and I believe we can perhaps remove that assert and the !UCOH clause. I checked a couple of tests that tripped that assert, and they seem to work fine, and I also checked the code in `LibraryCallKit::inline_string_indexOfI` and `generate_string_indexof_stubs()` and could not find anything obvious that requires the base offset to be >=16. I am not sure why that assert is there. I am now running tier1-4 with that change: https://github.com/rkennke/jdk/commit/7001783e8c11718226506f42b7c1f1fda1af3ad0 > > If you know (or find) any reason why we need that assert, please let me know. Otherwise I'd remove it, if you don't have objections. I am not familiar with the `indexOf` implementation, but here is a relevant comment that motivates the assertion: https://github.com/openjdk/jdk/pull/16753#discussion_r1592774634.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777033220 From kbarrett at openjdk.org Thu Sep 26 13:13:37 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 26 Sep 2024 13:13:37 GMT Subject: RFR: 8341015: OopStorage location decoder crashes accessing non-initalized OopStorage [v2] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 12:47:47 GMT, Aleksey Shipilev wrote: >> When debugging CDS, I asked for `os::print_location` before OopStorage was initialized, like an error handler would do. This is a fairly unusual situation, this is why we have not seen it before. Anyhow, we entered the new code added by [JDK-8340392](https://bugs.openjdk.org/browse/JDK-8340392), which crashed on `OopStorage` that was `nullptr`. I think we should null-check `OopStorage` before calling into it. >> >> Additional testing: >> - [x] OopStorageSetTest still passing >> - [x] Verified the check is now passing in similar debugging session > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Touchups Still looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21204#pullrequestreview-2331218638 From kbarrett at openjdk.org Thu Sep 26 13:13:38 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 26 Sep 2024 13:13:38 GMT Subject: RFR: 8341015: OopStorage location decoder crashes accessing non-initalized OopStorage [v2] In-Reply-To: <3suLikiPJ0bF8rh7_2xpcdrlHcQlDw0iuaN-gS1rxWs=.4c850872-1a6f-490b-b340-808a8521bf4a@github.com> References: <-74WDew849TqHNEh7XAyT1-bSP_TkCqjY5SnEy509bY=.692c0018-a6ca-4240-b167-294975ab821a@github.com> <3suLikiPJ0bF8rh7_2xpcdrlHcQlDw0iuaN-gS1rxWs=.4c850872-1a6f-490b-b340-808a8521bf4a@github.com> Message-ID: On Thu, 26 Sep 2024 12:52:51 GMT, Thomas Schatzl wrote: >> Sure thing, see new commit. > > Another maybe preferable option could be printing "uninitialized" or something. 
@tschatzl I don't think printing "uninitialized" or anything like that is really appropriate here. What this code is doing is printing something if-and-only-if the pointer is found to be in an oopstorage block. There aren't any of those if there's no oopstorage yet. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21204#discussion_r1777052593 From rkennke at openjdk.org Thu Sep 26 14:00:58 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 26 Sep 2024 14:00:58 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> On Thu, 26 Sep 2024 13:04:57 GMT, Roberto Castañeda Lozano wrote: >> This comes from an assert in `LibraryCallKit::inline_string_indexOfI` and I believe we can perhaps remove that assert and the !UCOH clause. I checked a couple of tests that tripped that assert, and they seem to work fine, and I also checked the code in `LibraryCallKit::inline_string_indexOfI` and `generate_string_indexof_stubs()` and could not find anything obvious that requires the base offset to be >=16. I am not sure why that assert is there. I am now running tier1-4 with that change: https://github.com/rkennke/jdk/commit/7001783e8c11718226506f42b7c1f1fda1af3ad0 >> >> If you know (or find) any reason why we need that assert, please let me know. Otherwise I'd remove it, if you don't have objections. > > I am not familiar with the `indexOf` implementation, but here is a relevant comment that motivates the assertion: https://github.com/openjdk/jdk/pull/16753#discussion_r1592774634. Ok, this is indeed relevant and helpful. This could segfault if we happen to read from the very first object on the heap. I can solve this by allowing to copy only 8 bytes onto the stack: https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 Does this look correct to you?
Or better to do it as a follow-up? (It passes a couple of indexOf tests; I will run tier1-4 on it.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777134871 From rkennke at openjdk.org Thu Sep 26 14:04:43 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 26 Sep 2024 14:04:43 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v27] In-Reply-To: References: Message-ID: > This is the main body of JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that were previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan: specifically, we may want to further shrink compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
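The 8TB limit quoted above follows from the 40-bit budget if, as with compressed oops, the bits hold an offset from the heap base in 8-byte words: 2^40 words × 8 bytes = 2^43 bytes = 8 TiB. The word granularity is an assumption here (it is the one that makes the numbers agree), and the encode/decode pair below is a hypothetical illustration, not the PR's forwarding code:

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 40 forwarding bits hold a word offset (8-byte words) from the
// heap base, analogous to compressed oops with a shift of 3.
constexpr unsigned kForwardingBits = 40;
constexpr std::uint64_t kWordSize = 8;
constexpr std::uint64_t kMaxEncodableHeapBytes =
    (std::uint64_t{1} << kForwardingBits) * kWordSize;

// 2^40 words * 8 bytes/word = 2^43 bytes = 8 TiB, matching the quoted limit.
static_assert(kMaxEncodableHeapBytes == (std::uint64_t{8} << 40), "8 TiB");

inline std::uint64_t encode_forwardee(std::uint64_t heap_base, std::uint64_t forwardee) {
  assert(forwardee >= heap_base);
  assert((forwardee - heap_base) % kWordSize == 0 && "objects are word-aligned");
  const std::uint64_t offset_words = (forwardee - heap_base) / kWordSize;
  assert(offset_words < (std::uint64_t{1} << kForwardingBits) && "heap larger than 8 TiB");
  return offset_words;
}

inline std::uint64_t decode_forwardee(std::uint64_t heap_base, std::uint64_t encoded) {
  return heap_base + encoded * kWordSize;
}
```

Beyond 8 TiB the word offset no longer fits in 40 bits, which is consistent with the quoted fallback for heaps above 8TB.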
Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - @robcasloz review comments - Improve CollectedHeap::is_oop() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/4904d433..d48f55d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=25-26 Stats: 86 lines in 10 files changed: 20 ins; 21 del; 45 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From shade at openjdk.org Thu Sep 26 14:37:35 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 26 Sep 2024 14:37:35 GMT Subject: RFR: 8341015: OopStorage location decoder crashes accessing non-initalized OopStorage [v2] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 12:53:24 GMT, Thomas Schatzl wrote: > lgtm, maybe it's worth explicitly printing an "uninitialized" message A normal thing to do in these printers is to silently return, letting other printers handle the location. If OopStorage does not recognize the pointer, the downstream NMT and generic SafeFetch code would try to look it up. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21204#issuecomment-2377156417 From rcastanedalo at openjdk.org Thu Sep 26 16:02:55 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 26 Sep 2024 16:02:55 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Thu, 26 Sep 2024 13:58:02 GMT, Roman Kennke wrote: > Does this look correct to you?
Or better to do it as a follow-up? I do not feel confident enough to review this part. If you want to include https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 in this changeset, I would prefer that the original author of JDK-8320448 or at least someone from Intel reviews it, otherwise I think it is fine to leave it as a follow-up enhancement. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777370316 From rkennke at openjdk.org Thu Sep 26 16:18:58 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 26 Sep 2024 16:18:58 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Thu, 26 Sep 2024 15:59:50 GMT, Roberto Castañeda Lozano wrote: >> Ok, this is indeed relevant and helpful. This could segfault if we happen to read from the very first object on the heap. I can solve this by allowing to copy only 8 bytes onto the stack: https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 >> >> Does this look correct to you? Or better to do it as a follow-up? >> (It passes a couple of indexOf tests, will run tier1-4 on it). > >> Does this look correct to you? Or better to do it as a follow-up? > > I do not feel confident enough to review this part. If you want to include https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 in this changeset, I would prefer that the original author of JDK-8320448 or at least someone from Intel reviews it, otherwise I think it is fine to leave it as a follow-up enhancement. @sviswa7 or @asgibbons WDYT about including https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 as part of compact object headers implementation?
Otherwise we would have to disable indexOf intrinsic when running with compact headers, because of the assumption that array headers are >= 16 bytes, which is no longer true with compact headers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777396409 From sgibbons at openjdk.org Thu Sep 26 17:27:50 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 26 Sep 2024 17:27:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Thu, 26 Sep 2024 16:15:39 GMT, Roman Kennke wrote: >>> Does this look correct to you? Or better to do it as a follow-up? >> >> I do not feel confident enough to review this part. If you want to include https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 in this changeset, I would prefer that the original author of JDK-8320448 or at least someone from Intel reviews it, otherwise I think it is fine to leave it as a follow-up enhancement. > > @sviswa7 or @asgibbons WDYT about including https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 as part of compact object headers implementation? Otherwise we would have to disable indexOf intrinsic when running with compact headers, because of the assumption that array headers are >= 16 bytes, which is no longer true with compact headers. @rkennke I reviewed [rkennke@ 097c2af](https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1) and the code looks good to me. I would prefer this approach instead of not generating the IndexOf intrinsic. Should the controlling `if` be conditioned on `UseCompactObjectHeaders` instead of `arrayOopDesc::base_offset_in_bytes`? I can see benefits to either - which provides more clarity? I like the assert as it makes the intention clear (thanks!). 
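Why the `>= 16` header assumption matters can be shown with simple arithmetic. The sketch below is a toy model, not the intrinsic: it assumes the stub may issue a fixed-size load that ends at the last byte of the haystack and therefore starts before the array data, inside the header. With the classic layout (8-byte mark word, 4-byte compressed class pointer, 4-byte length) the byte[] base offset is 16. With compact headers the quoted PR text puts the length at offset 8, so the data is assumed here to start at offset 12. A 16-byte overhang then starts before the object (the first-object-on-the-heap segfault mentioned earlier in the thread), while an 8-byte overhang stays inside the header. The helper names are hypothetical; `max_header_overhang` keys off the base offset rather than `UseCompactObjectHeaders`, which is one of the two options raised above.

```cpp
#include <cassert>

// Toy model (hypothetical): a load of `window` bytes that ends at the last
// byte of `data_len` bytes of array data starts (window - data_len) bytes
// before the data. It stays inside the object if that overhang fits in the
// header, i.e. is no larger than the array base offset.
constexpr bool window_stays_in_object(int base_offset, int data_len, int window) {
  return window - data_len <= base_offset;
}

// Hypothetical policy mirroring "copy only 8 bytes" for short headers:
// allow a 16-byte overhang only when the header is at least 16 bytes.
constexpr int max_header_overhang(int base_offset) {
  return base_offset >= 16 ? 16 : 8;
}

static_assert(window_stays_in_object(16, 0, 16), "legacy 16-byte header");
static_assert(!window_stays_in_object(12, 0, 16), "16-byte overhang escapes a 12-byte header");
static_assert(window_stays_in_object(12, 0, max_header_overhang(12)), "8-byte overhang fits");
```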
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777485078 From xpeng at openjdk.org Thu Sep 26 17:42:35 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 26 Sep 2024 17:42:35 GMT Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2] In-Reply-To: References: <61bsu02-sDgUFx45vjC3xAvAaDtIY405Qm8lh3Edp28=.5a1c2d42-a51c-4014-b5e3-64cbd02786ed@github.com> Message-ID: On Sat, 21 Sep 2024 05:52:10 GMT, Aleksey Shipilev wrote: > > > I am good with this, assuming performance runs show good results. > > > > > > Latency wise, in most time it is better than old impl. > > It is great it improves targeted tests, and it makes sense from the first principles. Run our usual performance pipeline to sanity check if this affects any other benchmarks in any meaningful way. The performance pipeline showed improvements in most DaCapo benchmarks. We did find a very small regression in DaCapo Spring max latency (<1%?), but when we tried to reproduce it on a bare-metal instance we could not reproduce it reliably: sometimes it was better and sometimes worse, so it could just be noise.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21099#issuecomment-2377567597 From kdnilsen at openjdk.org Thu Sep 26 17:57:36 2024 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Thu, 26 Sep 2024 17:57:36 GMT Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2] In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 18:47:50 GMT, Xiaolong Peng wrote: >> In a simple latency benchmark for memory allocation, I found ShenandoahPacer contributed quite a lot to the long tail latency > 10ms when multiple mutator threads fail to claim budget on the fast path [here](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L230): all of them forcefully claim and then wait for up to 10ms ([code link](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L239-L277)). >> >> The change in this PR makes ShenandoahPacer impact long tail latency much less. Instead of forcefully claiming budget and then waiting, it attempts to claim after waiting for 1ms, and keeps doing this until either 1/ it has spent 10ms waiting in total, or 2/ it has successfully claimed the budget.
>> >> Here is the latency comparison for the optimization: >> ![hdr-histogram-optimize-pacer](https://github.com/user-attachments/assets/811f48c5-87eb-462d-8b27-d15bd08be7b0) >> >> With the optimization, long tail latency from the test code below has been much improved from over 20ms to ~10ms on MacOS with an M3 chip: >> >> static final int threadCount = Runtime.getRuntime().availableProcessors(); >> static final LongAdder totalCount = new LongAdder(); >> static volatile byte[] sink; >> public static void main(String[] args) { >> runAllocationTest(100000); >> } >> static void recordTimeToAllocate(final int dataSize, final Histogram histogram) { >> long startTime = System.nanoTime(); >> sink = new byte[dataSize]; >> long endTime = System.nanoTime(); >> histogram.recordValue(endTime - startTime); >> } >> >> static void runAllocationTest(final int dataSize) { >> final long endTime = System.currentTimeMillis() + 30_000; >> final CountDownLatch startSignal = new CountDownLatch(1); >> final CountDownLatch finished = new CountDownLatch(threadCount); >> final Thread[] threads = new Thread[threadCount]; >> final Histogram[] histograms = new Histogram[threadCount]; >> final Histogram totalHistogram = new Histogram(3600000000000L, 3); >> for (int i = 0; i < threadCount; i++) { >> final var histogram = new Histogram(3600000000000L, 3); >> histograms[i] = histogram; >> threads[i] = new Thread(() -> { >> wait(startSignal); >> do { >> recordTimeToAllocate(dataS... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > clean up Marked as reviewed by kdnilsen (Author).
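The retry policy described in the PR can be sketched as follows. This is a simplified, hypothetical model, not `ShenandoahPacer` itself: the budget is a plain atomic counter, and a caller that times out simply gets `false` back, because what the real pacer does once the 10ms cap is reached is not described in the text. Only the 1ms step and the 10ms cap come from the PR description.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Simplified, hypothetical model of a pacing budget (not ShenandoahPacer).
class PacerModel {
 public:
  explicit PacerModel(long budget) : budget_(budget) {}

  // Fast path: claim `words` only if enough budget is available.
  bool try_claim(long words) {
    long cur = budget_.load(std::memory_order_relaxed);
    while (cur >= words) {
      // On failure, compare_exchange_weak reloads `cur` and we re-check.
      if (budget_.compare_exchange_weak(cur, cur - words, std::memory_order_relaxed)) {
        return true;
      }
    }
    return false;
  }

  // Called by the GC side as work progresses.
  void replenish(long words) { budget_.fetch_add(words, std::memory_order_relaxed); }

  // Slow path per the PR description: wait `step`, retry the claim, and give
  // up once `max_wait` has elapsed. Returns whether the budget was claimed.
  bool pace(long words,
            std::chrono::milliseconds step = std::chrono::milliseconds(1),
            std::chrono::milliseconds max_wait = std::chrono::milliseconds(10)) {
    if (try_claim(words)) {
      return true;
    }
    const auto deadline = std::chrono::steady_clock::now() + max_wait;
    while (std::chrono::steady_clock::now() < deadline) {
      std::this_thread::sleep_for(step);
      if (try_claim(words)) {
        return true;
      }
    }
    return false;  // timed out; in this model the caller proceeds without the budget
  }

 private:
  std::atomic<long> budget_;
};
```

Compared with claiming first and then sleeping for up to 10ms, each waiting thread re-checks the budget every millisecond, so it can continue as soon as enough budget has been replenished.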
------------- PR Review: https://git.openjdk.org/jdk/pull/21099#pullrequestreview-2332001130 From xpeng at openjdk.org Thu Sep 26 18:57:35 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Thu, 26 Sep 2024 18:57:35 GMT Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2] In-Reply-To: References: <61bsu02-sDgUFx45vjC3xAvAaDtIY405Qm8lh3Edp28=.5a1c2d42-a51c-4014-b5e3-64cbd02786ed@github.com> Message-ID: On Sat, 21 Sep 2024 05:52:10 GMT, Aleksey Shipilev wrote: >>> I am good with this, assuming performance runs show good results. >> >> Latency wise, in most time it is better than old impl. >> >> In my specific test with 8G heap on MacOS, throughput is very close to the test w/ ShenandoahPacing disabled, and about 25%~30% improvement comparing the old implementation. > >> > I am good with this, assuming performance runs show good results. >> >> Latency wise, in most time it is better than old impl. > > It is great it improves targeted tests, and it makes sense from the first principles. Run our usual performance pipeline to sanity check if this affects any other benchmarks in any meaningful way. @shipilev I need you to review it again, since I pushed a minor refactoring and format changes as per your comments.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21099#issuecomment-2377706480 From shade at openjdk.org Fri Sep 27 07:43:42 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Sep 2024 07:43:42 GMT Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2] In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 18:47:50 GMT, Xiaolong Peng wrote: >> In a simple latency benchmark for memory allocation, I found ShenandoahPacer contributed quite a lot to the long tail latency > 10ms when multiple mutator threads fail to claim budget on the fast path [here](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L230): all of them forcefully claim and then wait for up to 10ms ([code link](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L239-L277)). >> >> The change in this PR makes ShenandoahPacer impact long tail latency much less. Instead of forcefully claiming budget and then waiting, it attempts to claim after waiting for 1ms, and keeps doing this until either 1/ it has spent 10ms waiting in total, or 2/ it has successfully claimed the budget.
>> >> Here is the latency comparison for the optimization: >> ![hdr-histogram-optimize-pacer](https://github.com/user-attachments/assets/811f48c5-87eb-462d-8b27-d15bd08be7b0) >> >> With the optimization, long tail latency from the test code below has been much improved from over 20ms to ~10ms on MacOS with an M3 chip: >> >> static final int threadCount = Runtime.getRuntime().availableProcessors(); >> static final LongAdder totalCount = new LongAdder(); >> static volatile byte[] sink; >> public static void main(String[] args) { >> runAllocationTest(100000); >> } >> static void recordTimeToAllocate(final int dataSize, final Histogram histogram) { >> long startTime = System.nanoTime(); >> sink = new byte[dataSize]; >> long endTime = System.nanoTime(); >> histogram.recordValue(endTime - startTime); >> } >> >> static void runAllocationTest(final int dataSize) { >> final long endTime = System.currentTimeMillis() + 30_000; >> final CountDownLatch startSignal = new CountDownLatch(1); >> final CountDownLatch finished = new CountDownLatch(threadCount); >> final Thread[] threads = new Thread[threadCount]; >> final Histogram[] histograms = new Histogram[threadCount]; >> final Histogram totalHistogram = new Histogram(3600000000000L, 3); >> for (int i = 0; i < threadCount; i++) { >> final var histogram = new Histogram(3600000000000L, 3); >> histograms[i] = histogram; >> threads[i] = new Thread(() -> { >> wait(startSignal); >> do { >> recordTimeToAllocate(dataS... > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > clean up Marked as reviewed by shade (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/21099#pullrequestreview-2333007889 From axel.boldt-christmas at oracle.com Fri Sep 27 08:02:33 2024 From: axel.boldt-christmas at oracle.com (Axel Boldt-Christmas) Date: Fri, 27 Sep 2024 08:02:33 +0000 Subject: RFC: ZGC: Remove Non-Generational Mode Message-ID: <9010A225-7333-48D1-A17F-A21085175D7A@oracle.com> Hi, I have written a draft JEP for removing the non-generational mode of ZGC. The JEP description is available in JBS: https://bugs.openjdk.org/browse/JDK-8335850 Comments and feedback are welcome. // Axel Boldt-Christmas From rkennke at openjdk.org Fri Sep 27 08:27:57 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 27 Sep 2024 08:27:57 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Thu, 26 Sep 2024 17:25:06 GMT, Scott Gibbons wrote: >> @sviswa7 or @asgibbons WDYT about including https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 as part of compact object headers implementation? Otherwise we would have to disable indexOf intrinsic when running with compact headers, because of the assumption that array headers are >= 16 bytes, which is no longer true with compact headers. > > @rkennke I reviewed [rkennke@ 097c2af](https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1) and the code looks good to me. I would prefer this approach instead of not generating the IndexOf intrinsic. > > Should the controlling `if` be conditioned on `UseCompactObjectHeaders` instead of `arrayOopDesc::base_offset_in_bytes`? I can see benefits to either - which provides more clarity? I like the assert as it makes the intention clear (thanks!).
I like to have the functional connection: if - for whatever reason - the array base offset is smaller than 16, we need to deal with that. The reason for this happens to be `UseCompactObjectHeaders`, but that may not be clear to the reader of the code. I could add an `assert(UseCompactObjectHeaders)` in that branch to make that connection clear. Also consider that `UseCompactObjectHeaders` is intended to go away at some point. I wonder if having 2 or 3 branches ahead of the main-loop (which probably doesn't do much, because haystack is <=32 bytes) is a useful approach, or if there may be a better way to get the bytes on the stack? I don't know enough about the implementation to make that judgement. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778230714 From sjohanss at openjdk.org Fri Sep 27 08:34:19 2024 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 27 Sep 2024 08:34:19 GMT Subject: RFR: 8340426: ZGC: Move defragment out of the allocation path [v2] In-Reply-To: References: Message-ID: > Please review this change to move defragmentation of small pages out of the allocation path. > > **Summary** > In ZGC small pages are allocated at lower addresses while medium and large pages are allocated at higher addresses. Sometimes when memory usage is high or the distribution of pages in the cache demands it, small pages can be split out from medium or large pages. When this happens the small page is defragmented by remapping it to a lower address if it resides at too high an address. This is done to avoid fragmentation, but doing it in the allocation path comes with a cost since the remapping involves system calls. > > This change moves the remapping away from the allocation path to when the pages are freed after being garbage collected. This can increase the fragmentation a bit, since small pages are allowed to reside at higher addresses for a period of time.
The expectation is that, since small pages are frequently collected, the period should be short enough not to cause any problems. > > I've done detailed experiments with temporary JFR events to look at the cost of defragmenting/remapping pages. One problem is that by default we don't get many defrags (unless we run out of memory), so to investigate this better I also altered the allocation path to force more defragmentation. These tests showed that moving defrag out of the allocation path not only has the positive effect of reducing the time spent allocating, but also spreads out the defragmentation a bit more. The reason is that we now defrag when a page is freed by the GC instead of when the page is allocated, and the tests show that many threads often allocate at the same time (and then also defrag at the same time). This in turn leads to many concurrent system calls to remap memory, which increases the cost of the actual mapping (as seen with the added JFR events).
> > **Additional testing** > > - Functional testing in mach5 tier1-7 > - Sanity performance testing in aurora Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision: - Additional changes - StefanK review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21191/files - new: https://git.openjdk.org/jdk/pull/21191/files/1e64e361..1fe872d4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21191&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21191&range=00-01 Stats: 36 lines in 2 files changed: 19 ins; 5 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/21191.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21191/head:pull/21191 PR: https://git.openjdk.org/jdk/pull/21191 From sjohanss at openjdk.org Fri Sep 27 08:34:19 2024 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 27 Sep 2024 08:34:19 GMT Subject: RFR: 8340426: ZGC: Move defragment out of the allocation path In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 20:05:17 GMT, Stefan Johansson wrote: > Please review this change to move defragmentation of small pages out of the allocation path [...] StefanK and I discussed his proposal and made some additional changes with regard to naming and structure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21191#issuecomment-2378741094 From aboldtch at openjdk.org Fri Sep 27 09:24:35 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 27 Sep 2024 09:24:35 GMT Subject: RFR: 8340426: ZGC: Move defragment out of the allocation path [v2] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 08:34:19 GMT, Stefan Johansson wrote: >> Please review this change to move defragmentation of small pages out of the allocation path [...] > > Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision: > > - Additional changes > - StefanK review lgtm.
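As a rough illustration of what the ZGC change moves, here is a toy model in which a small page split out at a high address is handed out unchanged on allocation and is only remapped to a low address when it is freed. Everything here is hypothetical (`ToyPage`, `ToyPageAllocator`, `remap_to_low_address`); it is not ZGC's page allocator. The 2 MB small-page size and the single address threshold are simplifying assumptions, and the real code has to deal with the page cache, address-space reservation and the actual remapping system calls.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only (hypothetical names, not ZGC's API): small pages split out of a
// medium/large page may sit at a high address. Instead of remapping them when they
// are allocated, they are moved to a low address when they are freed.
struct ToyPage {
  std::uintptr_t start;
  bool small;
};

class ToyPageAllocator {
 public:
  ToyPageAllocator(std::uintptr_t high_threshold, std::uintptr_t next_low)
      : high_threshold_(high_threshold), next_low_(next_low) {}

  // Allocation path: a small page split from a larger page is handed out as-is,
  // even if it resides above the threshold. No remapping happens here.
  ToyPage alloc_split_small(std::uintptr_t start) { return ToyPage{start, true}; }

  // Free path (after GC): a small page above the threshold is "remapped" to a low
  // address before it is put back into the cache.
  void free_page(ToyPage page) {
    if (page.small && page.start >= high_threshold_) {
      page.start = remap_to_low_address();
      remaps_++;
    }
    cache_.push_back(page);
  }

  int remaps() const { return remaps_; }
  const std::vector<ToyPage>& cache() const { return cache_; }

 private:
  // Stands in for the system calls that remap the memory in the real collector.
  std::uintptr_t remap_to_low_address() {
    std::uintptr_t addr = next_low_;
    next_low_ += kSmallPageSize;
    return addr;
  }

  static constexpr std::uintptr_t kSmallPageSize = 2 * 1024 * 1024;  // assumed 2 MB
  std::uintptr_t high_threshold_;
  std::uintptr_t next_low_;
  int remaps_ = 0;
  std::vector<ToyPage> cache_;
};
```

The sketch only shows where the remap cost lands. `alloc_split_small` never calls `remap_to_low_address`, so allocating threads no longer issue concurrent remap calls. The cost moves to `free_page`, at the price of the page sitting at a high address until it is freed.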
------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21191#pullrequestreview-2333222238 From rkennke at openjdk.org Fri Sep 27 09:41:17 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 27 Sep 2024 09:41:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v28] In-Reply-To: References: Message-ID: > This is the main body of JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that were previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then become on-by-default and diagnostic (?), and eventually be deprecated and obsoleted. However, there are a few unknowns in that plan; specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to do this, we made some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size), so that it can be fetched from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'.
This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Disable TestSplitPacks::test4a, failing on aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/d48f55d6..059b1573 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=26-27 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From shade at openjdk.org Fri Sep 27 09:46:41 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Sep 2024 09:46:41 GMT Subject: RFR: 8341015: OopStorage location decoder crashes accessing non-initalized OopStorage [v2] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 12:47:47 GMT, Aleksey Shipilev wrote: >> When debugging CDS, I asked for `os::print_location` before OopStorage was
initialized, as an error handler would do. This is a fairly unusual situation, which is why we have not seen it before. Anyway, we entered the new code added by [JDK-8340392](https://bugs.openjdk.org/browse/JDK-8340392), which crashed on an `OopStorage` that was `nullptr`. I think we should null-check `OopStorage` before calling into it. >> >> Additional testing: >> - [x] OopStorageSetTest still passing >> - [x] Verified the check now passes in a similar debugging session > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Touchups Thanks for the reviews. I think this is simple enough to push on Friday. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21204#issuecomment-2378876903 From shade at openjdk.org Fri Sep 27 09:46:42 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Sep 2024 09:46:42 GMT Subject: Integrated: 8341015: OopStorage location decoder crashes accessing non-initalized OopStorage In-Reply-To: References: Message-ID: <0Bm-oLEAC72_g1jyOfPU8qOYWDk49HD79ZqmFKVGlaQ=.b2038275-9777-49d5-af2e-aeff6696f88e@github.com> On Thu, 26 Sep 2024 11:36:26 GMT, Aleksey Shipilev wrote: > When debugging CDS, I asked for `os::print_location` before OopStorage was initialized [...] This pull request has now been integrated.
Changeset: 6587909c Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/6587909c7db6482bda92d314096a2a1795900ffd Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8341015: OopStorage location decoder crashes accessing non-initalized OopStorage Reviewed-by: kbarrett, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/21204 From rcastanedalo at openjdk.org Fri Sep 27 14:35:54 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 27 Sep 2024 14:35:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v28] In-Reply-To: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com> References: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com> Message-ID: On Thu, 12 Sep 2024 15:42:59 GMT, Thomas Stuefe wrote: >> src/hotspot/share/opto/machnode.cpp line 390: >> >>> 388: t = t->make_ptr(); >>> 389: } >>> 390: if (t->isa_narrowklass() && CompressedKlassPointers::shift() == 0) { >> >> Does this change have any effect? `UseCompressedClassPointers` should be implied by `t->isa_narrowklass()`. > > I don't remember if this change was a reaction to an error or if I just guarded `CompressedKlassPointers::shift()` with +UseCCP because that is the prerequisite now. Probably the latter. I can remove this. Probably should assert then for +UseCCP. @tstuefe @rkennke what do you think about this suggestion? If there is a known case where `t->isa_narrowklass() && !UseCompressedClassPointers` holds, it should be investigated because it might be a symptom of a larger problem. If there is no such case, I think the explicit `UseCompressedClassPointers` test should be removed to avoid confusion.
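The 8TB figure in the compact object headers description quoted earlier ("We have 40 bits for that, and can encode up to 8TB of heap") follows from the same trick as compressed oops. If the 40 bits hold an offset from the heap base in 8-byte words rather than in bytes, they cover 2^40 × 8 = 2^43 bytes = 8 TiB. The sketch below only checks that arithmetic. The function names are hypothetical, the 8-byte alignment is an assumption, and where the bits actually live in the mark word is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of a compressed-oops-style forwarding encoding (not HotSpot
// code): the forwardee is stored as an offset from the heap base, shifted right by
// the object alignment so that it fits in 40 bits.
constexpr unsigned kOffsetBits = 40;
constexpr unsigned kAlignShift = 3;  // assumes 8-byte object alignment
constexpr std::uint64_t kMaxOffsetWords = std::uint64_t{1} << kOffsetBits;
// 2^40 words * 8 bytes/word = 2^43 bytes = 8 TiB of addressable heap.
constexpr std::uint64_t kMaxHeapBytes = kMaxOffsetWords << kAlignShift;

static_assert(kMaxHeapBytes == 8ULL * 1024 * 1024 * 1024 * 1024,
              "40 bits of 8-byte words cover 8 TiB");

// True if 'addr' is aligned and close enough to 'heap_base' to be encoded.
bool can_encode(std::uint64_t heap_base, std::uint64_t addr) {
  if (addr < heap_base) return false;
  const std::uint64_t offset = addr - heap_base;
  const bool aligned = (offset & ((std::uint64_t{1} << kAlignShift) - 1)) == 0;
  return aligned && (offset >> kAlignShift) < kMaxOffsetWords;
}

// Encodes the forwardee as a word offset; the result fits in kOffsetBits
// whenever can_encode() holds.
std::uint64_t encode_forwardee(std::uint64_t heap_base, std::uint64_t addr) {
  return (addr - heap_base) >> kAlignShift;
}

std::uint64_t decode_forwardee(std::uint64_t heap_base, std::uint64_t bits) {
  return heap_base + (bits << kAlignShift);
}
```

An object 4096 bytes above the heap base encodes as 512. The last encodable address is 8 bytes below `heap_base + 8 TiB`, which is why a larger heap needs a different scheme.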
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778724120 From sgibbons at openjdk.org Fri Sep 27 14:47:51 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 27 Sep 2024 14:47:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Fri, 27 Sep 2024 08:24:50 GMT, Roman Kennke wrote: >> @rkennke I reviewed [rkennke@ 097c2af](https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1) and the code looks good to me. [...] > > I like to have the functional connection: if - for whatever reason - the array base offset is smaller than 16, we need to deal with that. [...] I believe the code in the patch is good enough as-is, especially if `UseCompactObjectHeaders` is slated to go away.
The existing `if` will prevent the < 16 byte header code from being emitted, which is the desired behavior; i.e., if the header size is >= 16, no code will be emitted into the intrinsic for that block. So there will not be an additional branch when the code is executed. I'm good with a comment tying `UseCompactObjectHeaders` to the condition. The comment can be removed when the flag is removed. "Ship it" :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778739517 From xpeng at openjdk.org Fri Sep 27 15:07:39 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 27 Sep 2024 15:07:39 GMT Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2] In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 18:47:50 GMT, Xiaolong Peng wrote: >> In a simple latency benchmark for memory allocation, I found that ShenandoahPacer contributed quite a lot to long-tail latency (> 10ms): when multiple mutator threads fail on the fast path to claim budget [here](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L230), all of them forcefully claim and then wait for up to 10ms ([code link](https://github.com/openjdk/jdk/blob/fdc16a373459cb2311316448c765b1bee5c73694/src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp#L239-L277)). >> >> The change in this PR makes ShenandoahPacer impact long-tail latency much less: instead of forcefully claiming budget and then waiting, it attempts to claim after waiting for 1ms, and keeps doing this until either 1/ it has spent 10ms waiting in total, or 2/ it has successfully claimed the budget.
>>
>> Here is the latency comparison for the optimization:
>> ![hdr-histogram-optimize-pacer](https://github.com/user-attachments/assets/811f48c5-87eb-462d-8b27-d15bd08be7b0)
>>
>> With the optimization, long-tail latency from the test code below improved from over 20ms to ~10ms on macOS with an M3 chip:
>>
>>     static final int threadCount = Runtime.getRuntime().availableProcessors();
>>     static final LongAdder totalCount = new LongAdder();
>>     static volatile byte[] sink;
>>
>>     public static void main(String[] args) {
>>         runAllocationTest(100000);
>>     }
>>
>>     static void recordTimeToAllocate(final int dataSize, final Histogram histogram) {
>>         long startTime = System.nanoTime();
>>         sink = new byte[dataSize];
>>         long endTime = System.nanoTime();
>>         histogram.recordValue(endTime - startTime);
>>     }
>>
>>     static void runAllocationTest(final int dataSize) {
>>         final long endTime = System.currentTimeMillis() + 30_000;
>>         final CountDownLatch startSignal = new CountDownLatch(1);
>>         final CountDownLatch finished = new CountDownLatch(threadCount);
>>         final Thread[] threads = new Thread[threadCount];
>>         final Histogram[] histograms = new Histogram[threadCount];
>>         final Histogram totalHistogram = new Histogram(3600000000000L, 3);
>>         for (int i = 0; i < threadCount; i++) {
>>             final var histogram = new Histogram(3600000000000L, 3);
>>             histograms[i] = histogram;
>>             threads[i] = new Thread(() -> {
>>                 wait(startSignal);
>>                 do {
>>                     recordTimeToAllocate(dataS...
>
> Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision:
>
>   clean up

Thanks all for the reviews!
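The retry scheme described in the ShenandoahPacer PR (wait 1 ms, retry the claim, stop after 10 ms in total or once the budget is claimed) can be sketched as a small toy model. This is not Shenandoah's code; `ToyPacer`, `try_claim`, `force_claim` and `pace` are hypothetical names. The budget is a plain atomic counter, and waiting is a 1 ms sleep rather than a monitor wait. What happens once the 10 ms limit is reached is also an assumption: here the allocation proceeds and takes the budget anyway, so the pacer only delays and never blocks indefinitely.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Toy model of the retry-based pacing described above; not Shenandoah's code.
class ToyPacer {
 public:
  explicit ToyPacer(std::intptr_t budget) : budget_(budget) {}

  // Claims 'words' only if enough budget is available (CAS loop, no overdraft).
  bool try_claim(std::intptr_t words) {
    std::intptr_t cur = budget_.load();
    while (cur >= words) {
      if (budget_.compare_exchange_weak(cur, cur - words)) return true;
      // On failure 'cur' is reloaded with the current value; re-check and retry.
    }
    return false;
  }

  // Unconditional claim; the budget may go negative (assumed fallback after 10 ms).
  void force_claim(std::intptr_t words) { budget_.fetch_sub(words); }

  void add_budget(std::intptr_t words) { budget_.fetch_add(words); }
  std::intptr_t budget() const { return budget_.load(); }

  // Fast path first; then wait in 1 ms steps, retrying the claim after each step,
  // for at most 10 steps (10 ms). Returns the number of waits performed.
  // 'wait_step' is injectable so the logic can be exercised without sleeping.
  int pace(std::intptr_t words,
           const std::function<void()>& wait_step = [] {
             std::this_thread::sleep_for(std::chrono::milliseconds(1));
           }) {
    constexpr int kMaxSteps = 10;  // 10 x 1 ms = 10 ms upper bound
    if (try_claim(words)) return 0;
    for (int step = 1; step <= kMaxSteps; step++) {
      wait_step();
      if (try_claim(words)) return step;
    }
    force_claim(words);  // give up pacing and let the allocation proceed
    return kMaxSteps;
  }

 private:
  std::atomic<std::intptr_t> budget_;
};
```

The difference from the behavior described as the old one is the order of operations: a thread no longer claims first and then sleeps for the full period. Instead it re-checks the budget every millisecond, so it can stop waiting as soon as the GC has replenished enough budget.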
------------- PR Comment: https://git.openjdk.org/jdk/pull/21099#issuecomment-2379487017 From duke at openjdk.org Fri Sep 27 15:07:39 2024 From: duke at openjdk.org (duke) Date: Fri, 27 Sep 2024 15:07:39 GMT Subject: RFR: 8340490: Shenandoah: Optimize ShenandoahPacer [v2] In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 18:47:50 GMT, Xiaolong Peng wrote: >> In a simple latency benchmark for memory allocation, I found ShenandoahPacer contributed quite a lot to the long tail latency > 10ms [...] > > Xiaolong Peng has updated the pull request incrementally with one additional commit since the last revision: > > clean up @pengxiaolong Your change (at version 58196a4f6f9f509525667dba1bd1fb2c2afa3e8e) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21099#issuecomment-2379489972 From rkennke at openjdk.org Fri Sep 27 16:25:54 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 27 Sep 2024 16:25:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Fri, 27 Sep 2024 14:44:35 GMT, Scott Gibbons wrote: >> I like to have the functional connection: if - for whatever reason - the array base offset is smaller than 16, we need to deal with that. [...] > > I believe the code in the patch is good enough as-is, especially if `UseCompactObjectHeaders` is slated to go away. [...] Wait a second, I've probably not been clear. `UseCompactObjectHeaders` is slated to become *on by default* and then to go away. That means that array base offsets <= 16 bytes will become the default.
The generated code will be something like:

    if (haystack_len <= 8) {
      // Copy 8 bytes onto stack
    } else if (haystack_len <= 16) {
      // Copy 16 bytes onto stack
    } else {
      // Copy 32 bytes onto stack
    }

So that is 2 branches in this prologue code instead of the original 1. However, I just noticed that what I proposed is not enough. Consider what happens when haystack_len is 17. This would take the last case and copy 32 bytes. But we can only guarantee that 17 + 8 = 25 bytes are available for copying. If this happens to be the array at the very beginning of the heap (very rare/unlikely), this would segfault. I think I need to mull it over some more to come up with a correct fix. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778874906 From yzheng at openjdk.org Fri Sep 27 16:34:55 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Fri, 27 Sep 2024 16:34:55 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com> References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com> Message-ID: On Thu, 19 Sep 2024 14:22:51 GMT, Stefan Karlsson wrote: >> We haven't decided whether or not we will get rid of `Klass::_prototype_header` before integrating this PR. @stefank could point you to a WIP branch, if that's helpful. > > This is my current work-in-progress code: > https://github.com/stefank/jdk/compare/pull/20677...stefank:jdk:lilliput_remove_prototype_header_wip_2 > > I've made some large rewrites and I'm currently running it through functional testing. If @stefank's patch does not go into this PR, could you please export `Klass::_prototype_header` to JVMCI? Thanks!
diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
index 9d1b8a1cb9f..e462025074f 100644
--- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
+++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
@@ -278,6 +278,7 @@
   nonstatic_field(Klass, _bitmap, uintx) \
   nonstatic_field(Klass, _hash_slot, uint8_t) \
   nonstatic_field(Klass, _misc_flags._flags, u1) \
+  nonstatic_field(Klass, _prototype_header, markWord) \
   \
   nonstatic_field(LocalVariableTableElement, start_bci, u2) \
   nonstatic_field(LocalVariableTableElement, length, u2) \

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778884055 From xpeng at openjdk.org Fri Sep 27 17:08:45 2024 From: xpeng at openjdk.org (Xiaolong Peng) Date: Fri, 27 Sep 2024 17:08:45 GMT Subject: Integrated: 8340490: Shenandoah: Optimize ShenandoahPacer In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 23:32:14 GMT, Xiaolong Peng wrote: > In a simple latency benchmark for memory allocation, I found ShenandoahPacer contributed quite a lot to the long tail latency > 10ms [...] This pull request has now been integrated.
Changeset: 65200a95 Author: Xiaolong Peng Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/65200a9589e46956a2194b20c4c90d003351a539 Stats: 41 lines in 3 files changed: 8 ins; 16 del; 17 mod 8340490: Shenandoah: Optimize ShenandoahPacer Reviewed-by: shade, kdnilsen ------------- PR: https://git.openjdk.org/jdk/pull/21099 From wkemper at openjdk.org Fri Sep 27 21:35:05 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 27 Sep 2024 21:35:05 GMT Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23: runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be represented in type 'long int' Message-ID: Use an unsigned version of `right_n_bits`. ------------- Commit messages: - Use an unsigned variant of right_n_bits Changes: https://git.openjdk.org/jdk/pull/21236/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21236&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332697 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21236.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21236/head:pull/21236 PR: https://git.openjdk.org/jdk/pull/21236 From kirk at kodewerk.com Fri Sep 27 22:55:43 2024 From: kirk at kodewerk.com (Kirk Pepperdine) Date: Fri, 27 Sep 2024 15:55:43 -0700 Subject: Aligning the Serial collector with ZGC In-Reply-To: <44230e90-d8bc-491f-a5c4-2c483646fc3e@oracle.com> References: <49D0FB58-94AB-46B6-A8A6-F9F08773770E@kodewerk.com> <44230e90-d8bc-491f-a5c4-2c483646fc3e@oracle.com> Message-ID: Hi Thomas, I wanted to respond to all of your comments, but I thought better of it, given that one response deserves its own email. The focus is mostly on that one question. > > > > - Introduce an adaptive size policy that takes into account memory and > > CPU pressure along with global memory pressure. > > - Heap should be large enough to minimize GC overhead but not > > large enough to trigger OOM.
> > (probably meant "small enough" the second time) I actually did mean large, but in the context of the OOM killer. But to your point, being smaller while still avoiding OOME is also a concern. > > > - Introduce -XX:SerialPressure=[0-100] to support this work. > > (Fwiw, with regard to the other discussion, I agree that if we have a flag with the same "meaning" across collectors it might be useful to use the same name). I think we have deadly agreement on this one. > > > - introduce a smoothing algorithm to avoid excessive small > > resizes. > > One option is to split this further into parts: > > * list what actions Serial GC could do in reaction to memory pressure on an abstract level, and which make sense; from that see what functionality is needed. I built a chart some time ago and this is an expanded version of it.

GC overhead           Allocation   Global memory   Action    Action (Tenured)
(pause:mutator time)  pressure     pressure        (Eden)    (full collection only)
< target              Low          Low             shrink    shrink
< target              Low          Medium          shrink    shrink
< target              Low          High            shrink    shrink
< target              Medium       Low             hold      shrink
< target              Medium       Medium          shrink    shrink
< target              Medium       High            shrink    shrink
< target              High         Low             shrink    shrink
< target              High         Medium          shrink    shrink
< target              High         High            shrink    shrink
~= target             Low          Low             hold      hold
~= target             Low          Medium          hold      hold
~= target             Low          High            shrink    shrink
~= target             Medium       Low             hold      hold
~= target             Medium       Medium          hold      hold
~= target             Medium       High            shrink    shrink
~= target             High         Low             hold      hold
~= target             High         Medium          hold      hold
~= target             High         High            shrink    shrink
> target              Low          Low             expand    expand
> target              Low          Medium          expand    expand
> target              Low          High            hold      hold
> target              Medium       Low             expand    expand
> target              Medium       Medium          expand    expand
> target              Medium       High            hold      hold
> target              High         Low             expand    expand
> target              High         Medium          expand    expand
> target              High         High            hold      hold

Some of my thoughts used to construct the table:

GC overhead tells us whether the heap is undersized, appropriately sized, or oversized.
Allocation pressure combined with the size of Eden drives the frequency of young generation collections.
Global memory pressure is a measure of the availability of memory.
The goal of resizing is to hit a target GC overhead threshold without risking either an OutOfMemoryError or the OOM killer.
Reducing Full GC activity requires providing enough tenured space to hold the Live Data Set (LDS) as well as minimizing the promotion of transients.
Partial GC frequency is a function of the size of Eden and the allocation pressure. Controlling GC frequency is key to controlling the rate at which transients are promoted.

On heap sizing:

Tenured may be resized at the end of a tenured (full) collection. Eden and Survivor may be resized at the end of either a tenured or a partial collection. The sizes of Eden, Survivor and Tenured will be decided separately. The overall logic is that the heap should have as much memory as the GC needs to run within its overhead targets.

The live set size is used to determine the size of tenured. The heuristic is that tenured should be 1.5x to 2x the LDS. Tenured should be expanded or shrunk to meet this ratio. Expansion should only happen when there is memory to support it.

The decision to resize young is based on:
- whether the GC overhead target is being met,
- the strength of the allocation pressure, and
- the availability of global memory.

Meeting the GC overhead target indicates that the heap is appropriately sized. Under this condition there is no pressure to resize unless there is a shortage of global memory. If there is, a balance should be struck between being a good neighbour by releasing memory and the risks/costs of higher GC overhead. GC overhead under target is an indication that the heap is oversized. In this case it should be safe to reduce the heap size and release memory back to the OS. GC overhead higher than the target indicates that the heap is undersized.
In this case the heap (and likely Eden in particular) should be expanded, assuming there is enough global memory to support the expansion without risking an OOM killer event.

Allocation pressure combined with the size of Eden sets GC frequency. High GC frequency tends to drive up GC overhead. If allocation pressure is high and GC overhead is high, then increasing the size of Eden should reduce GC overhead. Having both allocation pressure and GC overhead be low provides an opportunity to reduce heap size and return memory.

All of the resizing decisions need to be moderated by the availability of (global) memory. If global memory is scarce, then the decision should favour releasing (uncommitting) memory. This may come at the expense of higher GC overhead. Resizing to smaller pool sizes is not without risk, and in the case of young, both high global memory pressure and high allocation pressure add to the risk.

> * provide functionality that tries to keep some kind of GC/mutator time ratio; I would start by looking at what G1 does, because Serial GC's behaviour is probably closer to G1 than ZGC, but ymmv.
> (Obviously improvements are welcome :))

I would agree.

> (This may not need to be exposed externally like some GCTimeRatio/GCCPUPercentage/whatever flag name)
>
> * add functionality to calculate memory pressure from the environment; maybe in a containerized environment from a manageable flag, as it does not have a global "pressure" view. This could probably be taken from ZGC, at least partially.

This is but one area where we are looking to "borrow" from.

> * some transfer function that translates this external memory pressure, based on "GCPressure", (e.g. that "sigmoid" function plus lots of magic numbers) to reaction in the gc: e.g. change the gc/mutator pause time goal, start collections, uncommit memory...

We prototyped our own smoothing function, but I'd defer to the sigmoid function as I'd prefer to share wherever possible.
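The transfer function and the smoothing discussed here can be sketched together. A minimal C++ sketch, assuming pressure is normalized to [0, 1]; the function names, the midpoint 0.7 and steepness 12.0, and the idea of interpolating between a base and a maximum overhead target are illustrative assumptions, not the sigmoid or the constants referred to in the thread:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Logistic (sigmoid) transfer function: maps a pressure reading in [0, 1]
// to a response in (0, 1). 'midpoint' is where the response is 0.5 and
// 'steepness' controls how abruptly it switches; both are tuning knobs.
inline double pressure_response(double pressure, double midpoint, double steepness) {
  return 1.0 / (1.0 + std::exp(-steepness * (pressure - midpoint)));
}

// Relax the GC overhead target as global memory pressure rises, so that the
// resize policy increasingly favours releasing memory over low overhead.
// The midpoint (0.7) and steepness (12.0) are arbitrary example values.
inline double effective_overhead_target(double base_target, double max_target,
                                        double global_pressure) {
  const double r = pressure_response(global_pressure, 0.7, 12.0);
  return base_target + r * (max_target - base_target);
}

// Move the committed size a fraction 'alpha' of the way toward 'target',
// but ignore steps smaller than 'min_delta' so that noise does not cause
// a stream of tiny resizes.
inline std::size_t smoothed_resize(std::size_t current, std::size_t target,
                                   double alpha, std::size_t min_delta) {
  assert(alpha > 0.0 && alpha <= 1.0);
  const double step =
      alpha * (static_cast<double>(target) - static_cast<double>(current));
  if (std::fabs(step) < static_cast<double>(min_delta)) {
    return current;  // inside the dead band: keep the current size
  }
  return static_cast<std::size_t>(static_cast<double>(current) + step);
}
```

The dead band addresses "excessive small resizes" directly, while the sigmoid keeps the reaction to global pressure close to zero until pressure approaches the midpoint and then ramps up quickly.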
> * (probably) some background thread that continuously calculates and reacts to global pressure (uncommit memory, do a gc, resize heap, ...) because one probably does not want to wait for the next gc to react...

I've been trying to avoid an extra background thread and instead backload the work onto the GC thread, but I also recognize that an extra thread may be necessary.

> * do lots of testing to weed out corner cases
>
> > - Introduce manageable flag SoftMaxHeapSize to define a target heap size and set the default max heap size to 100% of available.
>
> I am a bit torn about SoftMaxHeapSize in Serial GC. What do you envision that Serial GC would do when the SoftMaxHeapSize has been reached, and what if old gen occupancy permanently stays above that value?

At the moment, SoftMaxHeapSize is an implementation in ZGC. I'd first like to pull a (rough) spec out of the implementation and then try to answer your question. It's currently not clear to me how this should work with any collector.

> The usefulness of SoftMaxHeapSize kind of relies on having a minimally invasive old gen collection that tries to get old gen usage back below that value.

Well, the LDS is what it is, and running a speculative collection would likely clean up (prematurely) promoted transients... but that's about it. For the concurrent collectors, by contrast, it would clean up both transients and floating garbage. I'm not a fan of speculative collections given all of the time I've spent getting rid of them :-) IMO, DGC-triggered full collections were rarely necessary (all overhead with very little return). The same applied to the G1 patch that speculatively ran to counter to-space overflows, and to running a young gen collection prior to remark with the CMS collector. Long story short: loads of extra overhead with very little to no payback.

> Serial GC has no "minimally invasive" way to collect old generation. It is either Full GC or nothing.
> This is the only option for Serial, but always doing Full collections after reaching that threshold seems very heavy-handed, expensive and undesirable to me (ymmv).
>
> That reaction would follow the spirit of the flag though.
>
> Maybe at the small heaps Serial GC targets, this makes sense, and full gc is not that costly anyway.

Yeah, for small heaps this shouldn't be a big deal. But this is one of the reasons why I believe we should treat young and old separately. We can cheaply and safely return memory from young gen and leave the sizing of tenured until a full collection is really needed. I grant you that this may not be very timely, but I'm not sure that we need this to happen on demand. I think we can wait for natural cycles to take their course. But maybe I'm wrong on this point. We plan to experiment with this.

> It might be useful to enumerate what actions could be performed on global pressure.

That's in the table.

> > - Add in the ability to uncommit memory (to reduce global memory pressure).
>
> The following imo outlines a completely separate idea, and should be discussed separately:
>
> > While working through the details of this work I noted that there appear to be opportunities to offer new defaults for other settings. For example, [...]
>
> That seems to be some more elaborate way of finding "optimal" generation size for a given heap size (which may follow from what the gc/mutator time ratio algorithm gives you).

I'm trying to apply my years of experience tuning 100s of collectors across 100s of applications.

> > For Eden the guiding metric is allocation rate. For Survivor it's life cycle (age table). For Tenured it's live set size. Using these metrics to determine the size of the parts, and then using that to calculate a max heap size, has almost always yielded lower GC overheads than setting a heap size and then letting ratios size everything.
> > This may be a separate piece of work
>
> +1
>
> > but the intent would be to have ergonomics calculate optimal eden, survivor and tenured sizes. Each young collection is an opportunity to resize Eden and Survivor, whereas a full collection would be used to resize Eden, Survivor and Tenured space. This may lead to the need to ignore NewRatio and (the soft target) MaxGCPauseMillis.
>
> Fwiw, the only collector that observes MaxGCPauseMillis is G1; in the context of Serial GC discussed further above I am confused.
>
> Not sure if MaxGCPauseMillis would make sense in Serial GC given that you can't control Full GC pause length.

Agreed. Sizing in Serial is currently controlled by the number of non-daemon threads, and that rarely changes. This implies that pause times are loosely a function of load and LDS size.

> Also, in the context of G1 some of the statements above are hard to understand: e.g. the text seems to imply that there is a fixed ratio between eden and survivor which isn't really the case, at least not in the sense of Serial GC.

Sorry for the confusion; I wasn't trying to imply that the ratio is fixed. What I was trying to do was introduce better default starting settings. When I'm tuning, I tend to set the young-to-tenured ratio to 1 and then set the survivor ratio to 2. This allows me to collect as clean a signal from the collector as possible. I would then make adjustments from that starting point. If we want to resize, then I believe that this starting point would give ergonomics a better chance to stabilize at a more optimal place.

> Could you elaborate?
>
> Even then, with Serial GC's fixed generation sizes, fine-grained on-the-fly adaptation as somewhat suggested might be harder than usual.
>
> Not against doing all that, but it really sounds like separate work.

I believe it might be, as it feels like it falls into the category of auto-tuning.

> > As for testing.
> > I'm currently looking at modifying HyperAlloc to add the ability to alter the shape of the load on the collector over time.
> >
> > All of this is still in its infancy and we're open to guidance and input.
> >
> > As for the work on G1, an initial patch has been submitted (URL above) and is open for comments.
>
> The patch does not seem to implement AHS. It implements CurrentMaxHeapSize which might be what AHS uses to set max heap size.
>
> To implement AHS for G1 roughly at least the following items need to be added/implemented/changed:
>
> * remove the use of Min/MaxHeapFreeRatio for heap sizing. These flags completely disregard cpu and heap pressure based heap sizing (should also be removed from Serial GC - this means deprecating/obsoleting these flags as soon as the last user is gone).
>
> * implement CurrentMaxHeapSize which is a (configurable) hard limit on how much the Java application may allocate (JDK-8204088) in support of AHS. As mentioned, that patch might be an initial discussion base.
> I do not think we need a JEP for that, but it gives you more publicity.
>
> * implement SoftMaxHeapSize in the sense of ZGC where it uses it to guide IHOP (or ZGC's equivalent). Note that I am not sure that SoftMaxHeapSize is something absolutely necessary in the context of AHS, but may be a tool.
>
> * the same background functionality as for serial: implement some mechanism to control the heap size based on the decisions of AHS; i.e. start collections to get to heap target, uncommit stuff/enqueue for uncommit etc.
>
> Currently G1 only resizes the heap during Remark and Full GC which is too limiting to follow current "memory pressure". Maybe use/update Soft/CurrentMaxHeapSize as needed so that GC compacts the heap first; this may either be in the form of JDK-8238687 which uncommits at every gc, which is probably still too limiting for an AHS system.

Got it. I think we're aiming to get all of this done; it's just not written here.
But I appreciate the list as it's helpful.

> Probably other issues will crop up along the way.
>
> * do lots of testing to weed out corner cases and hopefully not regress too much from current performance

I'm hoping that instead of regressing, we reduce GC interference. And I'm happy to avoid a JEP but also happy to write one if it's really needed... and I neither need nor want more publicity, but thanks for the warning. ;-)

Kind regards,
Kirk

From wkemper at openjdk.org Fri Sep 27 23:26:33 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 27 Sep 2024 23:26:33 GMT Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23: runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be represented in type 'long int' [v2] In-Reply-To: References: Message-ID: > Use an unsigned version of `right_n_bits`. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Use template to match type of subtrahend and minuend ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21236/files - new: https://git.openjdk.org/jdk/pull/21236/files/4e33d52f..a3fb5858 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21236&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21236&range=00-01 Stats: 29 lines in 3 files changed: 4 ins; 3 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/21236.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21236/head:pull/21236 PR: https://git.openjdk.org/jdk/pull/21236

From wkemper at openjdk.org Fri Sep 27 23:39:15 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 27 Sep 2024 23:39:15 GMT Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23: runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be represented in type 'long int' [v3] In-Reply-To: References: Message-ID: > Use a template version of `right_n_bits` to
use the same type for minuend and subtrahend. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Fix comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21236/files - new: https://git.openjdk.org/jdk/pull/21236/files/a3fb5858..97d1272b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21236&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21236&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21236.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21236/head:pull/21236 PR: https://git.openjdk.org/jdk/pull/21236 From kbarrett at openjdk.org Sat Sep 28 05:24:42 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 28 Sep 2024 05:24:42 GMT Subject: RFR: 8340945: Ubsan: oopStorage.cpp:374:8: runtime error: applying non-zero offset 18446744073709551168 to null pointer Message-ID: <1uTdfa0c-Nrk1-6t0IPbYRmwy9URAHyz7-V4YHx5hDo=.377970e5-1b31-4658-94fc-e232bfbdec3b@github.com> Please review this change to the OopStorage handling of storage block lookup, now being more careful about pointer arithmetic to avoid UB. As an initial cleanup, renamed OopStorage::find_block_or_null to block_for_ptr, for consistency with the Block function that implements it. Also moved the precondition assert that the argument is non-null into the Block function, where the requirement is located. Changed OopStorage::Block::block_for_ptr to avoid pointer arithmetic that might invoke UB, instead converting the pointer argument to uintptr_t and performing arithmetic on it. Also fixed its description in the header file. Similarly changed OopStorage::Block::active_index_safe to avoid pointer arithmetic, instead converting to uintptr_t for arithmetic. This avoids potential problems when the Block argument is a "false positive" from block_for_ptr. 
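The UBSan report is easier to read with a little arithmetic: 18446744073709551168 is 2^64 - 448, so the failing code stepped 448 bytes backwards from a null pointer. Forming such a pointer is undefined behavior in C++, whereas the same subtraction on `std::uintptr_t` is well-defined unsigned arithmetic that wraps modulo 2^N. A minimal sketch of the integer-based approach, assuming a block made of `kSectionCount` sections of `kSectionSize` bytes with sections aligned to `kSectionSize`; the constants (chosen so the largest backward step is 448 bytes) and `candidate_block_starts` are hypothetical and not taken from the OopStorage sources:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical layout: a block spans kSectionCount sections of kSectionSize
// bytes, and sections are aligned to kSectionSize. With these values the
// largest backward step is 64 * (8 - 1) = 448 bytes.
constexpr std::uintptr_t kSectionSize = 64;  // must be a power of two
constexpr unsigned kSectionCount = 8;

// Candidate start addresses of the block that could contain 'ptr'.
// All arithmetic is on std::uintptr_t, so stepping "below" address 0 simply
// wraps instead of forming an invalid pointer. Candidates still need to be
// validated before being treated as blocks.
inline std::vector<std::uintptr_t> candidate_block_starts(const void* ptr) {
  assert(ptr != nullptr && "callers must reject null before the lookup");
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(ptr);
  const std::uintptr_t section_start = addr & ~(kSectionSize - 1);  // align down
  const std::uintptr_t first = section_start - kSectionSize * (kSectionCount - 1);
  std::vector<std::uintptr_t> candidates;
  candidates.reserve(kSectionCount);
  for (unsigned i = 0; i < kSectionCount; ++i) {
    candidates.push_back(first + i * kSectionSize);
  }
  return candidates;
}
```

Each candidate is only an address; as with the "false positive" case in the description, it has to be checked before it is dereferenced or used as a block.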
Changed OopStorage::allocation_status to check up front for a null argument, immediately returning INVALID_ENTRY in that case. This avoids violating block_for_ptr's precondition that the argument is non-null. Added a gtest for this. Also added a gtest for the potential false-positive case. While updating gtests, removed #ifndef DISABLE_GARBAGE_ALLOCATION_STATUS_TESTS. That macro was included when these tests were first added, because some tests needed to be disabled on Windows, due to SafeFetchN in gtest context not working on that platform. That was later fixed by JDK-8185734. The conditional #define of that macro in test_oopStorage.cpp was removed, but the no longer needed #ifndef was inadvertently not removed. Testing: mach5 tier1-5. Locally (linux-x64) reproduced the reported ubsan failure, and verified it no longer reproduces with these changes. While working on this change I noticed a related issue. The recently added OopStorage::print_containing doesn't verify the block is not a false positive before using it as a block. I'll file a JBS issue for this.
------------- Commit messages: - be more careful Changes: https://git.openjdk.org/jdk/pull/21240/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21240&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340945 Stats: 69 lines in 4 files changed: 37 ins; 4 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/21240.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21240/head:pull/21240 PR: https://git.openjdk.org/jdk/pull/21240 From fjiang at openjdk.org Sat Sep 28 11:55:45 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Sat, 28 Sep 2024 11:55:45 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v23] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 07:57:15 GMT, Roberto Castañeda Lozano wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with seven additional commits since the last revision: >> >> - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms >> - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency >> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion >> - Restore some asserts >> - Default values for tmp regs of G1PostBarrierStubC2 >> - 8334060: [arm32] Implementation of Late Barrier Expansion for G1 >> - 8330685: [arm32] share barrier spilling logic > > Thanks for the arm 32-bits port @snazarkin! Merged in commit 3957c03f. > Besides the arm 32-bits port, @snazarkin's changeset includes adding the possibility to use a third temporary register in the platform-independent class `G1PostBarrierStubC2`.
Hi @robcasloz, riscv port cleanup is available at https://github.com/feilongjiang/jdk/commit/1297f6086e1de62196e2acddf2f7c86a29619dd7, would you please help to apply it? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2380614984 From rsunderbabu at openjdk.org Sun Sep 29 08:39:05 2024 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Sun, 29 Sep 2024 08:39:05 GMT Subject: RFR: 8211400: nsk.share.gc.Memory::getArrayLength returns wrong value Message-ID: Current formula is incorrect since array doesn't use reference for each element. Tested with test groups, vmTestbase_vm_gc_ref vmTestbase_vm_gc_juggle vmTestbase_vm_gc_misc ------------- Commit messages: - 8211400: nsk.share.gc.Memory::getArrayLength returns wrong value Changes: https://git.openjdk.org/jdk/pull/21247/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21247&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8211400 Stats: 7 lines in 1 file changed: 0 ins; 3 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21247.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21247/head:pull/21247 PR: https://git.openjdk.org/jdk/pull/21247 From phh at openjdk.org Sun Sep 29 21:01:34 2024 From: phh at openjdk.org (Paul Hohensee) Date: Sun, 29 Sep 2024 21:01:34 GMT Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23: runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be represented in type 'long int' [v3] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 23:39:15 GMT, William Kemper wrote: >> Use a template version of `right_n_bits` to use the same type for minuend and subtrahend. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix comments Rather than define a new method get_right_n_bits(), why not just replace the definition of right_n_bits() in globalDefinitions.hpp? The C++ compiler will inline and optimize both. 
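The value in the ubsan report, -9223372036854775808, is the minimum signed 64-bit integer, so the failing expression subtracted 1 from a mask base whose top bit was set in a signed 64-bit type. A minimal sketch of a type-parameterized mask helper that performs the shift and the subtraction in the unsigned counterpart of the requested type, in the spirit of the template approach described in the updates above; `right_n_bits_t` is a hypothetical name, and this is not the code from the pull request or from `globalDefinitions.hpp`:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Mask with the low n bits set, returned in type T. The shift and the
// subtraction are done in the unsigned counterpart of T, where wrap-around
// is well defined, so n == 63 for std::int64_t no longer overflows.
// Assumes n >= 0; a negative shift count would be undefined.
template <typename T>
constexpr T right_n_bits_t(int n) {
  static_assert(std::is_integral<T>::value, "integral type required");
  using U = typename std::make_unsigned<T>::type;
  return static_cast<T>(
      (n >= std::numeric_limits<U>::digits ? U(0) : static_cast<U>(U(1) << n)) -
      U(1));
}

// Evaluated at compile time: the low 63 bits of int64_t form its maximum value.
static_assert(right_n_bits_t<std::int64_t>(63) ==
                  std::numeric_limits<std::int64_t>::max(),
              "low 63 bits of int64_t");
```

Because the helper is `constexpr`, calls with constant arguments fold to constants, which fits the review point that the compiler will inline and optimize either form.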
------------- PR Review: https://git.openjdk.org/jdk/pull/21236#pullrequestreview-2335997131 From kbarrett at openjdk.org Mon Sep 30 01:42:45 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 30 Sep 2024 01:42:45 GMT Subject: RFR: 8211400: nsk.share.gc.Memory::getArrayLength returns wrong value In-Reply-To: References: Message-ID: On Sun, 29 Sep 2024 08:33:31 GMT, Ramkumar Sunderbabu wrote: > Current formula is incorrect since array doesn't use reference for each element. > > Tested with test groups, > vmTestbase_vm_gc_ref > vmTestbase_vm_gc_juggle > vmTestbase_vm_gc_misc I've never looked at this file before. Wow! Several problems spotted on just brief skimming! But out of scope for this specific issue. test/hotspot/jtreg/vmTestbase/nsk/share/gc/Memory.java line 166: > 164: */ > 165: public static long getArraySize(int length, long objectSize) { > 166: return getObjectExtraSize() + length * objectSize; pre-existing: Shouldn't that be getArrayExtraSize()? ------------- Changes requested by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21247#pullrequestreview-2336196756 PR Review Comment: https://git.openjdk.org/jdk/pull/21247#discussion_r1780306207 From rcastanedalo at openjdk.org Mon Sep 30 05:02:12 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 30 Sep 2024 05:02:12 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24.
The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions.
Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion - riscv port refactor - Remove temporary support code - Merge jdk-24+17 - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes - Merge jdk-24+16 - Ensure that detected encode-and-store patterns are matched - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion - ...
and 43 more: https://git.openjdk.org/jdk/compare/8ee5f762...14483b83 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/6fb36e50..14483b83 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=25-26 Stats: 19042 lines in 408 files changed: 13042 ins; 3680 del; 2320 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From aboldtch at openjdk.org Mon Sep 30 06:22:46 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 30 Sep 2024 06:22:46 GMT Subject: RFR: 8340419: ZGC: Create an UseLargePages adaptation of TestAllocateHeapAt.java In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:28:15 GMT, Axel Boldt-Christmas wrote: > [JDK-8340146](https://bugs.openjdk.org/browse/JDK-8340146) / #21127 disables TestAllocateHeapAt.java when running with persistent hugepages because it makes assumptions on the underlying file systems. > > I propose creating a version of this test which instead first checks if there is an appropriate mount point for a persistent hugepages heap file, and only runs the test if it exists. Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21128#issuecomment-2382205091 From aboldtch at openjdk.org Mon Sep 30 06:22:47 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 30 Sep 2024 06:22:47 GMT Subject: Integrated: 8340419: ZGC: Create an UseLargePages adaptation of TestAllocateHeapAt.java In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:28:15 GMT, Axel Boldt-Christmas wrote: > [JDK-8340146](https://bugs.openjdk.org/browse/JDK-8340146) / #21127 disables TestAllocateHeapAt.java when running with persistent hugepages because it makes assumptions on the underlying file systems.
> > I propose creating a version of this test which instead first checks if there is an appropriate mount point for a persistent hugepages heap file, and only runs the test if it exists. This pull request has now been integrated. Changeset: 6514aef8 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/6514aef8403fa5fc09e5c064a783ff0f1fccd0cf Stats: 91 lines in 1 file changed: 91 ins; 0 del; 0 mod 8340419: ZGC: Create an UseLargePages adaptation of TestAllocateHeapAt.java Reviewed-by: stefank, sjohanss, jsikstro ------------- PR: https://git.openjdk.org/jdk/pull/21128 From rcastanedalo at openjdk.org Mon Sep 30 07:59:45 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 30 Sep 2024 07:59:45 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v23] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 07:57:15 GMT, Roberto Castañeda Lozano wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with seven additional commits since the last revision: >> >> - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms >> - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency >> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion >> - Restore some asserts >> - Default values for tmp regs of G1PostBarrierStubC2 >> - 8334060: [arm32] Implementation of Late Barrier Expansion for G1 >> - 8330685: [arm32] share barrier spilling logic > > Thanks for the arm 32-bits port @snazarkin! Merged in commit 3957c03f. > Besides the arm 32-bits port, @snazarkin's changeset includes adding the possibility to use a third temporary register in the platform-independent class `G1PostBarrierStubC2`.
This temporary register (`G1PostBarrierStubC2::_tmp3`) is initialized to `noreg` by default in `G1PostBarrierStubC2::initialize_registers`, so no other platform should be affected. > Hi @robcasloz, riscv port cleanup is available at [feilongjiang at 1297f60](https://github.com/feilongjiang/jdk/commit/1297f6086e1de62196e2acddf2f7c86a29619dd7), would you please help to apply it? Done (commit 14483b83), thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2382377364 From rcastanedalo at openjdk.org Mon Sep 30 08:24:52 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 30 Sep 2024 08:24:52 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms.
>> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 53 additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port refactor > - Remove temporary support code > - Merge jdk-24+17 > - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization > - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions > - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes > - Merge jdk-24+16 > - Ensure that detected encode-and-store patterns are matched > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - ... and 43 more: https://git.openjdk.org/jdk/compare/60c13deb...14483b83 I just updated to jdk-24+17 (commit bda4ab21) and removed the temporary support code guarded by `G1_LATE_BARRIER_MIGRATION_SUPPORT` (commit 55a1f621). The current changeset passes all tests specified in the pull request [description](https://github.com/openjdk/jdk/pull/19746#issue-2356905813) and yields benchmark results similar to those of the original submission. @albertnetymk @vnkozlov @tschatzl @kimbarrett could you please re-review? Thanks! 
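The out-of-line stubs discussed in this PR, such as the post-barrier stub whose `_tmp3` register is mentioned above, implement only the slow path of G1's write barriers. The inline fast path is a chain of cheap filters. The following minimal C++ sketch illustrates the filtering order of a G1-style post-write barrier, simulated over integer addresses. It is not HotSpot code. The region size, card size, card encodings, and all names (`CardTableSketch`, `post_barrier`, `PostBarrierOutcome`) are assumptions chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative parameters only (assumptions, not HotSpot's actual values).
constexpr unsigned kLogRegionBytes = 20;  // assume 1 MiB, region-aligned heap regions
constexpr unsigned kCardShift = 9;        // assume 512-byte cards
constexpr uint8_t kCleanCard = 0xff;      // assumed card encodings
constexpr uint8_t kDirtyCard = 0x00;
constexpr uint8_t kYoungCard = 0x02;

enum class PostBarrierOutcome { SameRegion, NullValue, YoungCard, AlreadyDirty, Enqueued };

struct CardTableSketch {
  uintptr_t heap_base;
  std::vector<uint8_t> cards;
  std::vector<size_t> dirty_queue;  // stands in for a thread-local dirty card queue

  size_t index_for(uintptr_t addr) const {
    size_t idx = (addr - heap_base) >> kCardShift;
    assert(idx < cards.size() && "address outside the simulated heap");
    return idx;
  }
};

// Filters run from cheapest to most expensive. Only the final case
// corresponds to the out-of-line slow path (the stub in C2-generated code).
PostBarrierOutcome post_barrier(CardTableSketch& ct, uintptr_t field, uintptr_t new_val) {
  // 1. Field and new value lie in the same region: no cross-region reference.
  if (((field ^ new_val) >> kLogRegionBytes) == 0) return PostBarrierOutcome::SameRegion;
  // 2. Storing null never creates a cross-region reference.
  if (new_val == 0) return PostBarrierOutcome::NullValue;
  size_t idx = ct.index_for(field);
  // 3. Cards covering young regions do not need refinement.
  if (ct.cards[idx] == kYoungCard) return PostBarrierOutcome::YoungCard;
  // In a real barrier this orders the preceding reference store before the
  // card re-read (StoreLoad). Here it only marks where that fence would go.
  std::atomic_thread_fence(std::memory_order_seq_cst);
  // 4. The card is already dirty, so it has already been enqueued.
  if (ct.cards[idx] == kDirtyCard) return PostBarrierOutcome::AlreadyDirty;
  // Slow path: dirty the card and enqueue it for refinement.
  ct.cards[idx] = kDirtyCard;
  ct.dirty_queue.push_back(idx);
  return PostBarrierOutcome::Enqueued;
}
```

Ordering the checks from cheapest to most expensive keeps the common cases inline. Only stores that survive every filter pay for the stub call and its live-register saving.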
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2382431347 From tschatzl at openjdk.org Mon Sep 30 08:31:37 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 30 Sep 2024 08:31:37 GMT Subject: RFR: 8340945: Ubsan: oopStorage.cpp:374:8: runtime error: applying non-zero offset 18446744073709551168 to null pointer In-Reply-To: <1uTdfa0c-Nrk1-6t0IPbYRmwy9URAHyz7-V4YHx5hDo=.377970e5-1b31-4658-94fc-e232bfbdec3b@github.com> References: <1uTdfa0c-Nrk1-6t0IPbYRmwy9URAHyz7-V4YHx5hDo=.377970e5-1b31-4658-94fc-e232bfbdec3b@github.com> Message-ID: On Sat, 28 Sep 2024 05:20:23 GMT, Kim Barrett wrote: > Please review this change to the OopStorage handling of storage block lookup, > now being more careful about pointer arithmetic to avoid UB. > > As an initial cleanup, renamed OopStorage::find_block_or_null to > block_for_ptr, for consistency with the Block function that implements it. > Also moved the precondition assert that the argument is non-null into the > Block function, where the requirement is located. > > Changed OopStorage::Block::block_for_ptr to avoid pointer arithmetic that > might invoke UB, instead converting the pointer argument to uintptr_t and > performing arithmetic on it. Also fixed its description in the header file. > > Similarly changed OopStorage::Block::active_index_safe to avoid pointer > arithmetic, instead converting to uintptr_t for arithmetic. This avoids > potential problems when the Block argument is a "false positive" from > block_for_ptr. > > Changed OopStorage::allocation_status to check up front for a null argument, > immediately returning INVALID_ENTRY in that case. This avoids violating > block_for_ptr's precondition that the argument is non-null. Added a gtest for > this. Also added a gtest for the potential false-positive case. > > While updating gtests, removed #ifndef DISABLE_GARBAGE_ALLOCATION_STATUS_TESTS.
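The description above replaces pointer arithmetic in `block_for_ptr` with arithmetic on `uintptr_t`. The reported UBSan error (a non-zero offset applied to a null pointer) arises because forming a pointer outside any object is undefined behavior in C++, whereas unsigned integer arithmetic wraps in a well-defined way. Below is a hypothetical sketch of that idea, not OopStorage's actual code. The layout constants (8 sections of 8 pointer-sized entries per block, with section-aligned blocks) and the name `candidate_block_starts` are assumptions. With these assumed constants on a 64-bit platform, the backward offset is 7 * 64 = 448 bytes. This happens to match the offset in the report title (2^64 - 448 = 18446744073709551168), which suggests, but does not confirm, a similar layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout for illustration: a block holds kSectionCount sections of
// kSectionEntries pointer-sized entries, and blocks are section-aligned.
constexpr uintptr_t kEntryBytes = sizeof(void*);
constexpr uintptr_t kSectionEntries = 8;
constexpr uintptr_t kSectionCount = 8;
constexpr uintptr_t kSectionBytes = kSectionEntries * kEntryBytes;

// Computes every address at which a block containing `entry` could start.
// All arithmetic is on uintptr_t, where unsigned wraparound is well defined,
// so even a null `entry` yields (meaningless) candidates instead of UB.
// Callers must still validate each candidate before treating it as a block,
// e.g. by checking a known owner field, because some are false positives.
std::vector<uintptr_t> candidate_block_starts(const void* entry) {
  const uintptr_t addr = reinterpret_cast<uintptr_t>(entry);
  const uintptr_t section_start = addr & ~(kSectionBytes - 1);  // align down
  // Start by guessing that `entry` lies in the block's last section, then walk forward.
  uintptr_t candidate = section_start - kSectionBytes * (kSectionCount - 1);
  std::vector<uintptr_t> result;
  for (uintptr_t i = 0; i < kSectionCount; ++i, candidate += kSectionBytes) {
    result.push_back(candidate);
  }
  return result;
}
```

Rejecting a null argument up front, as the change does in `allocation_status`, remains useful. The integer arithmetic avoids UB, but it cannot make a null lookup meaningful.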
> That macro was included when these tests were first added, because some tests > needed to be disabled on Windows, due to SafeFetchN in gtest context not working > on that platform. That was later fixed by JDK-8185734. The conditional #define > of that macro in test_oopStorage.cpp was removed, but the no longer needed > #ifndef was inadvertently not removed. > > Testing: mach5 tier1-5 > Locally (linux-x64) reproduced the reported ubsan failure, and verified it no > longer reproduces with these changes. > > While working on this change I noticed a related issue. The recently added > OopStorage::print_containing doesn't verify the block is not a false positive > before using it as a block. I'll file a JBS issue for this. Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21240#pullrequestreview-2336748455 From rsunderbabu at openjdk.org Mon Sep 30 08:52:11 2024 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Mon, 30 Sep 2024 08:52:11 GMT Subject: RFR: 8211400: nsk.share.gc.Memory::getArrayLength returns wrong value [v2] In-Reply-To: References: Message-ID: > Current formula is incorrect since array doesn't use reference for each element. 
> > Tested with test groups, > vmTestbase_vm_gc_ref > vmTestbase_vm_gc_juggle > vmTestbase_vm_gc_misc Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: review comment fix on getArraySize method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21247/files - new: https://git.openjdk.org/jdk/pull/21247/files/eb3dcde5..d26624f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21247&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21247&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21247.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21247/head:pull/21247 PR: https://git.openjdk.org/jdk/pull/21247 From kbarrett at openjdk.org Mon Sep 30 09:42:35 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 30 Sep 2024 09:42:35 GMT Subject: RFR: 8211400: nsk.share.gc.Memory::getArrayLength returns wrong value [v2] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 08:52:11 GMT, Ramkumar Sunderbabu wrote: >> Current formula is incorrect since array doesn't use reference for each element. >> >> Tested with test groups, >> vmTestbase_vm_gc_ref >> vmTestbase_vm_gc_juggle >> vmTestbase_vm_gc_misc > > Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: > > review comment fix on getArraySize method Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21247#pullrequestreview-2336922963 From tschatzl at openjdk.org Mon Sep 30 09:50:35 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 30 Sep 2024 09:50:35 GMT Subject: RFR: 8211400: nsk.share.gc.Memory::getArrayLength returns wrong value [v2] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 08:52:11 GMT, Ramkumar Sunderbabu wrote: >> Current formula is incorrect since array doesn't use reference for each element. 
>> >> Tested with test groups, >> vmTestbase_vm_gc_ref >> vmTestbase_vm_gc_juggle >> vmTestbase_vm_gc_misc > > Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: > > review comment fix on getArraySize method Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21247#pullrequestreview-2336937758 From duke at openjdk.org Mon Sep 30 09:50:35 2024 From: duke at openjdk.org (duke) Date: Mon, 30 Sep 2024 09:50:35 GMT Subject: RFR: 8211400: nsk.share.gc.Memory::getArrayLength returns wrong value [v2] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 08:52:11 GMT, Ramkumar Sunderbabu wrote: >> Current formula is incorrect since array doesn't use reference for each element. >> >> Tested with test groups, >> vmTestbase_vm_gc_ref >> vmTestbase_vm_gc_juggle >> vmTestbase_vm_gc_misc > > Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: > > review comment fix on getArraySize method @rsunderbabu Your change (at version d26624f56ca8817a4f0de5eb105a3d0e1442c7aa) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21247#issuecomment-2382646344 From tschatzl at openjdk.org Mon Sep 30 10:04:51 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 30 Sep 2024 10:04:51 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: <9o1uzat6Ap4MIn7o6xZhwXKaHsgMRNi_2qyvpcjAlIQ=.311f7fe0-45be-4f39-a28d-3fc1d5ea5471@github.com> On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> ...
> > Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port refactor > - Remove temporary support code > - Merge jdk-24+17 > - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization > - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions > - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes > - Merge jdk-24+16 > - Ensure that detected encode-and-store patterns are matched > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - ... and 43 more: https://git.openjdk.org/jdk/compare/55c0ecf8...14483b83 Still seems good. Mostly only looked at the changes in the GC directory and the barrier code themselves as I do not feel enabled to comment too much on other (compiler) changes. ------------- Marked as reviewed by tschatzl (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2336972915 From rcastanedalo at openjdk.org Mon Sep 30 11:33:47 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 30 Sep 2024 11:33:47 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: <9o1uzat6Ap4MIn7o6xZhwXKaHsgMRNi_2qyvpcjAlIQ=.311f7fe0-45be-4f39-a28d-3fc1d5ea5471@github.com> References: <9o1uzat6Ap4MIn7o6xZhwXKaHsgMRNi_2qyvpcjAlIQ=.311f7fe0-45be-4f39-a28d-3fc1d5ea5471@github.com> Message-ID: On Mon, 30 Sep 2024 10:02:17 GMT, Thomas Schatzl wrote: > Still seems good. > > Mostly only looked at the changes in the GC directory and the barrier code themselves as I do not feel enabled to comment too much on other (compiler) changes. Thanks for re-reviewing, Thomas! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2382930857 From fyang at openjdk.org Mon Sep 30 11:53:52 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 30 Sep 2024 11:53:52 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> ...
> > Roberto Casta?eda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port refactor > - Remove temporary support code > - Merge jdk-24+17 > - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization > - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions > - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes > - Merge jdk-24+16 > - Ensure that detected encode-and-store patterns are matched > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - ... and 43 more: https://git.openjdk.org/jdk/compare/dede1992...14483b83 Updated RISC-V part of the change looks good to me. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2337279856 From rcastanedalo at openjdk.org Mon Sep 30 12:06:48 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 30 Sep 2024 12:06:48 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 11:51:02 GMT, Fei Yang wrote: > Updated RISC-V part of the change looks good to me. Thanks, Fei! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2382997964 From rcastanedalo at openjdk.org Mon Sep 30 12:40:54 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 30 Sep 2024 12:40:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 13:20:14 GMT, Emanuel Peter wrote: > Indeed, I could re-enable all tests in: > > ``` > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java > ``` > > but unfortunately not those others: > > ``` > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java > ``` > > I think the issue with all of them is that vectorization in those scenarios only works when the operations inside the loop start at an array index that addresses an element at 8-byte-aligned offset. > > I have filed https://bugs.openjdk.org/browse/JDK-8340010 to track it.
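The explanation quoted above links vectorization of these tests to whether the loop's first access lands at an 8-byte-aligned offset within the array object. The following sketch illustrates the arithmetic. The `byte[]` base offsets it uses are assumptions for a 64-bit VM with compressed class pointers: 16 bytes with the default header layout and 12 bytes with compact object headers. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Byte offset of element `index`, measured from the start of the array object.
constexpr size_t element_offset(size_t base_offset, size_t index, size_t elem_size) {
  return base_offset + index * elem_size;
}

constexpr bool is_8_byte_aligned(size_t base_offset, size_t index, size_t elem_size) {
  return element_offset(base_offset, index, elem_size) % 8 == 0;
}

// Smallest index >= start whose element begins at an 8-byte-aligned offset,
// or SIZE_MAX if there is none (e.g. 8-byte elements on a 12-byte base).
// Offsets modulo 8 repeat within 8 consecutive indices, so 8 probes suffice.
constexpr size_t first_aligned_index(size_t base_offset, size_t start, size_t elem_size) {
  for (size_t i = start; i < start + 8; ++i) {
    if (is_8_byte_aligned(base_offset, i, elem_size)) return i;
  }
  return SIZE_MAX;
}

// Assumed byte[] base offsets (64-bit VM, compressed class pointers).
constexpr size_t kByteArrayBaseDefault = 16;
constexpr size_t kByteArrayBaseCompact = 12;

static_assert(first_aligned_index(kByteArrayBaseDefault, 0, 1) == 0,
              "with the default layout, index 0 is already aligned");
static_assert(first_aligned_index(kByteArrayBaseCompact, 0, 1) == 4,
              "with compact headers, index 4 is the first aligned element");
```

Under these assumptions, a loop over a `byte[]` that starts at index 0 no longer begins on an 8-byte boundary once compact headers are enabled.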
@rkennke A test run of the current changeset in our internal CI system revealed that the following tests fail (because of missing vectorization) when using `-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -XX:UseSSE=N` with `N <= 3` on an Intel Xeon Platinum 8358 machine: - test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java - test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java - test/hotspot/jtreg/compiler/vectorization/runner/LoopCombinedOpTest.java Here are the failure details: test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java: 1) Method "public static void compiler.c2.irTests.TestVectorizationNotRun.test(byte[],long[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 
test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java: 1) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteByte1(byte[],byte[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 2) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteByte2(byte[],byte[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 
3) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong1(byte[],long[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 4) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong2(byte[],long[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 
5) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong3(byte[],long[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 6) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong5(byte[],long[],int,int)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 
test/hotspot/jtreg/compiler/vectorization/runner/LoopCombinedOpTest.java: 1) Method "public int[] compiler.vectorization.runner.LoopCombinedOpTest.multipleOpsWith2DifferentTypesAndComplexExpression()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "sse2", "true"}, counts={"_#STORE_VECTOR#_", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 2) Method "public int[] compiler.vectorization.runner.LoopCombinedOpTest.multipleOpsWith2DifferentTypesAndInvariant()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "sse2", "true"}, counts={"_#STORE_VECTOR#_", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2383072505 From rsunderbabu at openjdk.org Mon Sep 30 13:46:39 2024 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Mon, 30 Sep 2024 13:46:39 GMT Subject: Integrated: 8211400: nsk.share.gc.Memory::getArrayLength returns wrong value In-Reply-To: References: Message-ID: On Sun, 29 Sep 2024 08:33:31 GMT, Ramkumar Sunderbabu wrote: > Current formula is incorrect since array doesn't use reference for each element. 
> > Tested with test groups, > vmTestbase_vm_gc_ref > vmTestbase_vm_gc_juggle > vmTestbase_vm_gc_misc This pull request has now been integrated. Changeset: 860d49db Author: Ramkumar Sunderbabu Committer: Kim Barrett URL: https://git.openjdk.org/jdk/commit/860d49db22cf352eaf1b3b20fff43d090f0eebc8 Stats: 7 lines in 1 file changed: 0 ins; 3 del; 4 mod 8211400: nsk.share.gc.Memory::getArrayLength returns wrong value Reviewed-by: kbarrett, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/21247 From shade at openjdk.org Mon Sep 30 15:00:10 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 15:00:10 GMT Subject: RFR: 8340183: Shenandoah: LRB node is not matched as GC barrier after JDK-8340183 Message-ID: [JDK-8340183](https://bugs.openjdk.org/browse/JDK-8340183) introduced a regression: `ShenandoahBarrierSetC2::is_gc_barrier_node` is now answering `false` for the actual `ShenandoahLoadReferenceBarrierNode`. The fix reinstates the check for LRB node. Additional testing: - [ ] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` - [x] Linux x86_64 server fastdebug, `jdk/jfr/api/consumer/ ` (100x) -- used to fail intermittently, now it does not ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/21266/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21266&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340183 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21266.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21266/head:pull/21266 PR: https://git.openjdk.org/jdk/pull/21266 From rkennke at openjdk.org Mon Sep 30 15:18:35 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 30 Sep 2024 15:18:35 GMT Subject: RFR: 8341242: Shenandoah: LRB node is not matched as GC barrier after JDK-8340183 In-Reply-To: References: Message-ID: <_pdzl3TfvgJVVZUL9VKDAsUIaulvTgYg7FcKzuAGATg=.4e7ce808-9ff9-4815-a016-08c81dfc1272@github.com> On Mon, 30 Sep 2024 
14:54:30 GMT, Aleksey Shipilev wrote: > [JDK-8340183](https://bugs.openjdk.org/browse/JDK-8340183) introduced a regression: `ShenandoahBarrierSetC2::is_gc_barrier_node` is now answering `false` for the actual `ShenandoahLoadReferenceBarrierNode`. The fix reinstates the check for LRB node. > > Additional testing: > - [ ] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `jdk/jfr/api/consumer/ ` (100x) -- used to fail intermittently, now it does not Looks good! ------------- Marked as reviewed by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21266#pullrequestreview-2337883746 From phh at openjdk.org Mon Sep 30 16:40:35 2024 From: phh at openjdk.org (Paul Hohensee) Date: Mon, 30 Sep 2024 16:40:35 GMT Subject: RFR: 8341242: Shenandoah: LRB node is not matched as GC barrier after JDK-8340183 In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 14:54:30 GMT, Aleksey Shipilev wrote: > [JDK-8340183](https://bugs.openjdk.org/browse/JDK-8340183) introduced a regression: `ShenandoahBarrierSetC2::is_gc_barrier_node` is now answering `false` for the actual `ShenandoahLoadReferenceBarrierNode`. The fix reinstates the check for LRB node. > > Additional testing: > - [ ] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `jdk/jfr/api/consumer/ ` (100x) -- used to fail intermittently, now it does not Marked as reviewed by phh (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21266#pullrequestreview-2338076905 From shade at openjdk.org Mon Sep 30 16:50:46 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 16:50:46 GMT Subject: RFR: 8341242: Shenandoah: LRB node is not matched as GC barrier after JDK-8340183 In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 14:54:30 GMT, Aleksey Shipilev wrote: > [JDK-8340183](https://bugs.openjdk.org/browse/JDK-8340183) introduced a regression: `ShenandoahBarrierSetC2::is_gc_barrier_node` is now answering `false` for the actual `ShenandoahLoadReferenceBarrierNode`. The fix reinstates the check for LRB node. > > Additional testing: > - [ ] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `jdk/jfr/api/consumer/ ` (100x) -- used to fail intermittently, now it does not Thanks! Trivial, right? Restores the code to previous shape. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21266#issuecomment-2383696926 From kvn at openjdk.org Mon Sep 30 16:59:59 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 30 Sep 2024 16:59:59 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> ...
>> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
> > Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port refactor > - Remove temporary support code > - Merge jdk-24+17 > - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization > - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions > - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes > - Merge jdk-24+16 > - Ensure that detected encode-and-store patterns are matched > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - ... and 43 more: https://git.openjdk.org/jdk/compare/ae84aa47...14483b83 Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2338111198 From wkemper at openjdk.org Mon Sep 30 17:01:36 2024 From: wkemper at openjdk.org (William Kemper) Date: Mon, 30 Sep 2024 17:01:36 GMT Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23: runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be represented in type 'long int' [v3] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 23:39:15 GMT, William Kemper wrote: >> Use a template version of `right_n_bits` to use the same type for minuend and subtrahend. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix comments I tried that, but there is a warning at the macro declaration:
// (note: #define used only so that they can be used in enum constant definitions)
#define nth_bit(n) (((n) >= BitsPerWord) ? 0 : (OneBit << (n)))
#define right_n_bits(n) (nth_bit(n) - 1)
There are many usages of `right_n_bits` that use an unnamed enum constant, which will not accept a cast from a numeric type. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21236#issuecomment-2383717052 From rkennke at openjdk.org Mon Sep 30 17:50:54 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 30 Sep 2024 17:50:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Fri, 27 Sep 2024 16:23:15 GMT, Roman Kennke wrote: >> I believe the code in the patch is good enough as-is, especially if `UseCompactObjectHeaders` is slated to go away. The existing `if` will prevent the < 16 byte header code from being emitted, which is the desired behavior - i.e., if the header size is >= 16, there will be no code emitted to the intrinsic for that block. So there will not be an additional branch for the code when it is executed. >> >> I'm good with a comment tying `UseCompactObjectHeaders` to the condition. The comment can be removed when the flag is removed. "Ship it" :-) > > Wait a second, I've probably not been clear. `UseCompactObjectHeaders` is slated to become *on by default* and then slated to go away. That means that array base offsets <= 16 bytes will become the default. The generated code will be something like:
>
> if (haystack_len <= 8) {
>   // Copy 8 bytes onto stack
> } else if (haystack_len <= 16) {
>   // Copy 16 bytes onto stack
> } else {
>   // Copy 32 bytes onto stack
> }
>
> So that is 2 branches in this prologue code instead of originally 1. > > However, I just noticed that what I proposed is not enough. Consider what happens when haystack_len is 17. This would take the last case and copy 32 bytes. But we only have 17+8=25 bytes that we can guarantee to be available for copying.
If this happens to be the array at the very beginning of the heap (very rare/unlikely), this would segfault. > > I think I need to mull over it some more to come up with a correct fix. I changed the header<16 version to be a small loop: https://github.com/rkennke/jdk/commit/bcba264ea5c15581647933db1163ca1dae39b6c5 The idea is the same as before, except it's made as a small loop with a maximum of 4 iterations (backward-branches), and it copies 8 bytes at a time, such that (1) it may copy up to 7 bytes that precede the array and (2) it doesn't run over the end of the array (which would potentially crash). I am not sure if using XMM_TMP1 and XMM_TMP2 there is ok, or if it would encode better to use one of the regular registers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1781535745 From duke at openjdk.org Mon Sep 30 20:46:08 2024 From: duke at openjdk.org (joejackson1993) Date: Mon, 30 Sep 2024 20:46:08 GMT Subject: RFR: 8337389: Parallel: Remove unnecessary forward declarations in psScavenge.hpp Message-ID: <6pGuQAwh4aO2dyQcgugU0uz9ys8hulq1lyAckrKRDK0=.2f124adb-05d5-4a81-97ed-971a7d8b3a6b@github.com> trivial cleanup ------------- Commit messages: - 8337389: Remove unnecessary forward declarations Changes: https://git.openjdk.org/jdk/pull/20393/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20393&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337389 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20393.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20393/head:pull/20393 PR: https://git.openjdk.org/jdk/pull/20393 From zgu at openjdk.org Mon Sep 30 20:46:08 2024 From: zgu at openjdk.org (Zhengyu Gu) Date: Mon, 30 Sep 2024 20:46:08 GMT Subject: RFR: 8337389: Parallel: Remove unnecessary forward declarations in psScavenge.hpp In-Reply-To: <6pGuQAwh4aO2dyQcgugU0uz9ys8hulq1lyAckrKRDK0=.2f124adb-05d5-4a81-97ed-971a7d8b3a6b@github.com> References:
<6pGuQAwh4aO2dyQcgugU0uz9ys8hulq1lyAckrKRDK0=.2f124adb-05d5-4a81-97ed-971a7d8b3a6b@github.com> Message-ID: On Tue, 30 Jul 2024 18:14:36 GMT, joejackson1993 wrote: > trivial cleanup > Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated! I can confirm that Joseph Jackson <[joseph.jackson at servicenow.com](mailto:joseph.jackson at servicenow.com)> is an employee of ServiceNow and is covered by the ServiceNow OCA. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20393#issuecomment-2258945487 From duke at openjdk.org Mon Sep 30 20:46:08 2024 From: duke at openjdk.org (joejackson1993) Date: Mon, 30 Sep 2024 20:46:08 GMT Subject: RFR: 8337389: Parallel: Remove unnecessary forward declarations in psScavenge.hpp In-Reply-To: <6pGuQAwh4aO2dyQcgugU0uz9ys8hulq1lyAckrKRDK0=.2f124adb-05d5-4a81-97ed-971a7d8b3a6b@github.com> References: <6pGuQAwh4aO2dyQcgugU0uz9ys8hulq1lyAckrKRDK0=.2f124adb-05d5-4a81-97ed-971a7d8b3a6b@github.com> Message-ID: On Tue, 30 Jul 2024 18:14:36 GMT, joejackson1993 wrote: > trivial cleanup Still waiting on OCA confirmation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20393#issuecomment-2318817050 From phh at openjdk.org Mon Sep 30 21:47:39 2024 From: phh at openjdk.org (Paul Hohensee) Date: Mon, 30 Sep 2024 21:47:39 GMT Subject: RFR: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23: runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be represented in type 'long int' [v3] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 23:39:15 GMT, William Kemper wrote: >> Use a template version of `right_n_bits` to use the same type for minuend and subtrahend.
> > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Fix comments In that case, I'd put get_right_n_bits() in globalDefinitions.hpp because it's generally useful, and a comment on why, of course. :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21236#issuecomment-2384203382
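For readers following the 8332697 thread, here is a minimal sketch of what a type-generic mask helper along the suggested lines might look like. It assumes C++14 or later. The name `right_n_bits_t` and its signature are hypothetical illustrations and are not taken from the actual patch or from `globalDefinitions.hpp`. The key idea is to do the shift and the `- 1` in the unsigned counterpart of `T`. Unsigned arithmetic is modular, so computing the low 63 bits of a 64-bit word cannot hit the signed `-9223372036854775808 - 1` overflow that ubsan reported.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// Hypothetical, type-generic counterpart of the right_n_bits(n) macro.
// Name and signature are illustrative assumptions, not the actual patch.
// Precondition: n >= 0. Returns a T whose low n bits are set; like the
// macro (where nth_bit(n) is 0 for n >= BitsPerWord), n >= width of T
// yields all bits set.
template <typename T>
constexpr T right_n_bits_t(int n) {
  static_assert(std::is_integral<T>::value, "T must be an integral type");
  using U = typename std::make_unsigned<T>::type;
  constexpr int width = static_cast<int>(sizeof(T) * CHAR_BIT);
  // Shift and subtract in the unsigned type U. Unsigned arithmetic wraps
  // instead of overflowing, so n == width - 1 is well defined even when
  // T is signed.
  return n >= width ? static_cast<T>(~U(0))
                    : static_cast<T>(static_cast<U>(U(1) << n) - U(1));
}
```

Because it is a function template, a helper like this could sit alongside the existing `nth_bit`/`right_n_bits` macros rather than replace them, so the enum-constant uses of the macro mentioned earlier in the thread would be left untouched.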